# Integrating CDH 6 with Kerberos

Author: sysit · Published 2019-04-09

## 1. Prerequisites

* Kerberos is installed and correctly configured; see "Highly Available Kerberos Installation and Configuration".
* The CDH installation is complete; see the "CDH 6.2.0 Installation and Configuration Guide".

## 2. Installing and configuring the Kerberos client

* Install the Kerberos client packages on every node in the cluster:

```
# Install the Kerberos client packages on all cluster nodes, including Cloudera Manager
yum -y install krb5-libs krb5-workstation
# Install an additional package on the cloudera-manager-server node
yum install openldap-clients -y
```

* Copy the krb5.conf file from the KDC server to every Kerberos client:

```
scp krb-1.sysit.cn:/etc/krb5.conf /etc/krb5.conf
```

* Add an administrator account for Cloudera Manager in the KDC:

```
[root@krb-1 ~]# kadmin.local
Authenticating as principal root/admin@SYSIT.CN with password.
kadmin.local: addprinc cloudera-scm/admin@SYSIT.CN
WARNING: no policy specified for cloudera-scm/admin@SYSIT.CN; defaulting to no policy
Enter password for principal "cloudera-scm/admin@SYSIT.CN":
Re-enter password for principal "cloudera-scm/admin@SYSIT.CN":
Principal "cloudera-scm/admin@SYSIT.CN" created.
kadmin.local: exit
```

## 3. Enabling Kerberos in CDH

In Cloudera Manager, go to "Administration" -> "Security" and select "Enable Kerberos". The wizard walks through the following steps:

1. Confirm that every item on the checklist has been completed, then click "Continue".
2. Configure the KDC details: the KDC type, KDC server, KDC realm, encryption types, and the renewable lifetime of the service principals to be created (hdfs, yarn, hbase, hive, and so on). Click "Continue".
3. Letting Cloudera Manager manage krb5.conf is not recommended; click "Continue".
4. Enter Cloudera Manager's Kerberos administrator account. It must match the account created earlier, otherwise an error is reported. Click "Continue".
5. Once the import completes, click "Continue" through the remaining screens.
6. Wait for the installation to finish, click "Continue", and when the wizard reports success, click "Finish".

## 4. Using Kerberos

### 4.1 Creating a user to test Kerberos

Create a principal named admin; the operating system should also have a matching admin account.

```
[root@krb-1 ~]# kadmin.local
Authenticating as principal root/admin@SYSIT.CN with password.
kadmin.local: addprinc admin@SYSIT.CN
WARNING: no policy specified for admin@SYSIT.CN; defaulting to no policy
Enter password for principal "admin@SYSIT.CN":
Re-enter password for principal "admin@SYSIT.CN":
Principal "admin@SYSIT.CN" created.
kadmin.local: exit
```

### 4.2 Accessing HDFS with Kerberos

Without Kerberos authentication, running `hdfs dfs -ls /` produces the following errors:

```
[admin@cdh-1 ~]$ hdfs dfs -ls /
19/04/09 20:54:19 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
19/04/09 20:54:19 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
19/04/09 20:54:19 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "cdh-1.sysit.cn/10.50.101.106"; destination host is: "cdh-2.sysit.cn":8020; , while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-2.sysit.cn/10.50.101.64:8020 after 1 failover attempts. Trying to failover after sleeping for 754ms.
```

After obtaining a Kerberos ticket, run the command again:

```
[admin@cdh-1 ~]$ kinit admin@SYSIT.CN
Password for admin@SYSIT.CN:
[admin@cdh-1 ~]$ hdfs dfs -ls /
Found 2 items
drwxrwxrwt - hdfs supergroup 0 2019-04-09 17:07 /tmp
drwxr-xr-x - hdfs supergroup 0 2019-04-09 17:33 /user
```
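It is worth checking what the ticket cache actually holds at this point. The following is a quick sketch; exact output formatting varies with the krb5 version:

```
# Show the cached ticket-granting ticket and its lifetime.
# Expect "Default principal: admin@SYSIT.CN" and a krbtgt/SYSIT.CN@SYSIT.CN entry.
klist

# Discard the ticket; hdfs commands will fail again until the next kinit.
kdestroy
```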
### 4.3 Running a MapReduce job

```
[admin@cdh-5 ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples-3.0.0-cdh6.2.0.jar pi 10 1
WARNING: Use "yarn jar" to launch YARN applications.
Number of Maps = 10
Samples per Map = 1
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
19/04/09 22:06:34 INFO client.RMProxy: Connecting to ResourceManager at cdh-4.sysit.cn/10.50.101.88:8032
19/04/09 22:06:34 INFO hdfs.DFSClient: Created token for admin: HDFS_DELEGATION_TOKEN owner=admin@SYSIT.CN, renewer=yarn, realUser=, issueDate=1554818795193, maxDate=1555423595193, sequenceNumber=5, masterKeyId=6 on ha-hdfs:nameservice1
19/04/09 22:06:34 INFO security.TokenCache: Got dt for hdfs://nameservice1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for admin: HDFS_DELEGATION_TOKEN owner=admin@SYSIT.CN, renewer=yarn, realUser=, issueDate=1554818795193, maxDate=1555423595193, sequenceNumber=5, masterKeyId=6)
19/04/09 22:06:34 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/admin/.staging/job_1554817971140_0003
19/04/09 22:06:34 INFO input.FileInputFormat: Total input files to process : 10
19/04/09 22:06:35 INFO mapreduce.JobSubmitter: number of splits:10
19/04/09 22:06:35 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/04/09 22:06:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554817971140_0003
19/04/09 22:06:35 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for admin: HDFS_DELEGATION_TOKEN owner=admin@SYSIT.CN, renewer=yarn, realUser=, issueDate=1554818795193, maxDate=1555423595193, sequenceNumber=5, masterKeyId=6)]
19/04/09 22:06:35 INFO conf.Configuration: resource-types.xml not found
19/04/09 22:06:35 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
19/04/09 22:06:35 INFO impl.YarnClientImpl: Submitted application application_1554817971140_0003
19/04/09 22:06:35 INFO mapreduce.Job: The url to track the job: http://cdh-4.sysit.cn:8088/proxy/application_1554817971140_0003/
19/04/09 22:06:35 INFO mapreduce.Job: Running job: job_1554817971140_0003
19/04/09 22:06:44 INFO mapreduce.Job: Job job_1554817971140_0003 running in uber mode : false
19/04/09 22:06:44 INFO mapreduce.Job: map 0% reduce 0%
19/04/09 22:06:51 INFO mapreduce.Job: map 70% reduce 0%
19/04/09 22:06:53 INFO mapreduce.Job: map 100% reduce 0%
19/04/09 22:06:59 INFO mapreduce.Job: map 100% reduce 100%
19/04/09 22:07:00 INFO mapreduce.Job: Job job_1554817971140_0003 completed successfully
19/04/09 22:07:00 INFO mapreduce.Job: Counters: 54
    File System Counters
        FILE: Number of bytes read=65
        FILE: Number of bytes written=2453572
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2630
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=45
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=51174
        Total time spent by all reduces in occupied slots (ms)=3070
        Total time spent by all map tasks (ms)=51174
        Total time spent by all reduce tasks (ms)=3070
        Total vcore-milliseconds taken by all map tasks=51174
        Total vcore-milliseconds taken by all reduce tasks=3070
        Total megabyte-milliseconds taken by all map tasks=52402176
        Total megabyte-milliseconds taken by all reduce tasks=3143680
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=331
        Input split bytes=1450
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=331
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=921
        CPU time spent (ms)=7760
        Physical memory (bytes) snapshot=5548974080
        Virtual memory (bytes) snapshot=28700708864
        Total committed heap usage (bytes)=5287444480
        Peak Map Physical memory (bytes)=544194560
        Peak Map Virtual memory (bytes)=2611851264
        Peak Reduce Physical memory (bytes)=289554432
        Peak Reduce Virtual memory (bytes)=2624212992
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1180
    File Output Format Counters
        Bytes Written=97
Job Finished in 26.141 seconds
Estimated value of Pi is 3.60000000000000000000
```
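The job above relies on the interactive kinit from section 4.2. For cron jobs and other non-interactive workloads, authenticating from a keytab is the usual approach. The following is a hedged sketch; the keytab path is an example of our own, and note that a plain xst regenerates the principal's key (kadmin.local accepts -norandkey to keep the existing password working):

```
# On the KDC: export the principal's key to a keytab file.
# -norandkey keeps the current key, so the password still works (kadmin.local only).
kadmin.local -q "xst -norandkey -k /home/admin/admin.keytab admin@SYSIT.CN"

# On the client: obtain a ticket from the keytab, with no password prompt.
kinit -kt /home/admin/admin.keytab admin@SYSIT.CN

# The job then runs exactly as shown above.
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples-3.0.0-cdh6.2.0.jar pi 10 1
```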
### 4.4 Submitting a Spark job

```
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_2.11-2.4.0-cdh6.2.0.jar 10
19/04/09 22:14:35 INFO spark.SparkContext: Running Spark version 2.4.0-cdh6.2.0
19/04/09 22:14:35 INFO logging.DriverLogger: Added a local log appender at: /tmp/spark-b416ec8a-e17b-4c33-b006-935e26835e24/__driver_logs__/driver.log
19/04/09 22:14:35 INFO spark.SparkContext: Submitted application: Spark Pi
19/04/09 22:14:35 INFO spark.SecurityManager: Changing view acls to: admin
19/04/09 22:14:35 INFO spark.SecurityManager: Changing modify acls to: admin
19/04/09 22:14:35 INFO spark.SecurityManager: Changing view acls groups to:
19/04/09 22:14:35 INFO spark.SecurityManager: Changing modify acls groups to:
19/04/09 22:14:35 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(admin); groups with view permissions: Set(); users with modify permissions: Set(admin); groups with modify permissions: Set()
19/04/09 22:14:35 INFO util.Utils: Successfully started service 'sparkDriver' on port 41703.
19/04/09 22:14:35 INFO spark.SparkEnv: Registering MapOutputTracker
19/04/09 22:14:35 INFO spark.SparkEnv: Registering BlockManagerMaster
19/04/09 22:14:35 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/04/09 22:14:35 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/04/09 22:14:35 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-e2ec9cef-eec1-42d5-9256-37e12085ee1e
19/04/09 22:14:35 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
19/04/09 22:14:35 INFO spark.SparkEnv: Registering OutputCommitCoordinator
19/04/09 22:14:35 INFO util.log: Logging initialized @1936ms
19/04/09 22:14:35 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: 2018-09-05T05:11:46+08:00, git hash: 3ce520221d0240229c862b122d2b06c12a625732
19/04/09 22:14:35 INFO server.Server: Started @2026ms
19/04/09 22:14:35 INFO server.AbstractConnector: Started ServerConnector@2e8ab815{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/04/09 22:14:35 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@70e29e14{/jobs,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@797501a{/jobs/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a15b789{/jobs/job,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6c4f9535{/jobs/job/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5bd1ceca{/stages,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@30c31dd7{/stages/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@499b2a5c{/stages/stage,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@241a53ef{/stages/stage/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@344344fa{/stages/pool,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2db2cd5{/stages/pool/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@70e659aa{/storage,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@615f972{/storage/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@285f09de{/storage/rdd,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73393584{/storage/rdd/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@31500940{/environment,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1827a871{/environment/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@48e64352{/executors,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7249dadf{/executors/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4362d7df{/executors/threadDump,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66238be2{/executors/threadDump/json,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c25b8a7{/static,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ea502e0{/,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@443dbe42{/api,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ed190be{/jobs/job/kill,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@402f80f5{/stages/stage/kill,null,AVAILABLE,@Spark}
19/04/09 22:14:35 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://cdh-5.sysit.cn:4040
19/04/09 22:14:35 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_2.11-2.4.0-cdh6.2.0.jar at spark://cdh-5.sysit.cn:41703/jars/spark-examples_2.11-2.4.0-cdh6.2.0.jar with timestamp 1554819275864
19/04/09 22:14:35 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.cloudera.hive/hive-site.xml
19/04/09 22:14:36 INFO security.HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1094953782_1, ugi=admin@SYSIT.CN (auth:KERBEROS)]] with renewer yarn/cdh-4.sysit.cn@SYSIT.CN
19/04/09 22:14:37 INFO hdfs.DFSClient: Created token for admin: HDFS_DELEGATION_TOKEN owner=admin@SYSIT.CN, renewer=yarn, realUser=, issueDate=1554819277768, maxDate=1555424077768, sequenceNumber=8, masterKeyId=6 on ha-hdfs:nameservice1
19/04/09 22:14:37 INFO security.HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1094953782_1, ugi=admin@SYSIT.CN (auth:KERBEROS)]] with renewer admin@SYSIT.CN
19/04/09 22:14:37 INFO hdfs.DFSClient: Created token for admin: HDFS_DELEGATION_TOKEN owner=admin@SYSIT.CN, renewer=admin, realUser=, issueDate=1554819277849, maxDate=1555424077849, sequenceNumber=9, masterKeyId=6 on ha-hdfs:nameservice1
19/04/09 22:14:37 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400067 for token HDFS_DELEGATION_TOKEN
19/04/09 22:14:37 INFO deploy.SparkHadoopUtil: Updating delegation tokens for current user.
19/04/09 22:14:37 INFO util.Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/04/09 22:14:37 INFO client.RMProxy: Connecting to ResourceManager at cdh-4.sysit.cn/10.50.101.88:8032
19/04/09 22:14:37 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
19/04/09 22:14:37 INFO conf.Configuration: resource-types.xml not found
19/04/09 22:14:37 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
19/04/09 22:14:37 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4285 MB per container)
19/04/09 22:14:37 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
19/04/09 22:14:37 INFO yarn.Client: Setting up container launch context for our AM
19/04/09 22:14:37 INFO yarn.Client: Setting up the launch environment for our AM container
19/04/09 22:14:37 INFO yarn.Client: Preparing resources for our AM container
19/04/09 22:14:37 INFO yarn.Client: Uploading resource file:/tmp/spark-b416ec8a-e17b-4c33-b006-935e26835e24/__spark_conf__2395738594444617053.zip -> hdfs://nameservice1/user/admin/.sparkStaging/application_1554817971140_0005/__spark_conf__.zip
19/04/09 22:14:38 INFO spark.SecurityManager: Changing view acls to: admin
19/04/09 22:14:38 INFO spark.SecurityManager: Changing modify acls to: admin
19/04/09 22:14:38 INFO spark.SecurityManager: Changing view acls groups to:
19/04/09 22:14:38 INFO spark.SecurityManager: Changing modify acls groups to:
19/04/09 22:14:38 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(admin); groups with view permissions: Set(); users with modify permissions: Set(admin); groups with modify permissions: Set()
19/04/09 22:14:38 INFO yarn.Client: Submitting application application_1554817971140_0005 to ResourceManager
19/04/09 22:14:38 INFO impl.YarnClientImpl: Submitted application application_1554817971140_0005
19/04/09 22:14:39 INFO yarn.Client: Application report for application_1554817971140_0005 (state: ACCEPTED)
19/04/09 22:14:39 INFO yarn.Client:
     client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: root.users.admin
     start time: 1554819278543
     final status: UNDEFINED
     tracking URL: http://cdh-4.sysit.cn:8088/proxy/application_1554817971140_0005/
     user: admin
19/04/09 22:14:40 INFO yarn.Client: Application report for application_1554817971140_0005 (state: ACCEPTED)
19/04/09 22:14:41 INFO yarn.Client: Application report for application_1554817971140_0005 (state: ACCEPTED)
19/04/09 22:14:42 INFO yarn.Client: Application report for application_1554817971140_0005 (state: ACCEPTED)
19/04/09 22:14:43 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cdh-4.sysit.cn, PROXY_URI_BASES -> http://cdh-4.sysit.cn:8088/proxy/application_1554817971140_0005), /proxy/application_1554817971140_0005
19/04/09 22:14:43 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
19/04/09 22:14:43 INFO yarn.Client: Application report for application_1554817971140_0005 (state: RUNNING)
19/04/09 22:14:43 INFO yarn.Client:
     client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
     diagnostics: N/A
     ApplicationMaster host: 10.50.101.88
     ApplicationMaster RPC port: -1
     queue: root.users.admin
     start time: 1554819278543
     final status: UNDEFINED
     tracking URL: http://cdh-4.sysit.cn:8088/proxy/application_1554817971140_0005/
     user: admin
19/04/09 22:14:43 INFO cluster.YarnClientSchedulerBackend: Application application_1554817971140_0005 has started running.
19/04/09 22:14:43 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1554817971140_0005 and attemptId None
19/04/09 22:14:43 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39195.
19/04/09 22:14:43 INFO netty.NettyBlockTransferService: Server created on cdh-5.sysit.cn:39195
19/04/09 22:14:43 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/04/09 22:14:43 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, cdh-5.sysit.cn, 39195, None)
19/04/09 22:14:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager cdh-5.sysit.cn:39195 with 366.3 MB RAM, BlockManagerId(driver, cdh-5.sysit.cn, 39195, None)
19/04/09 22:14:43 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, cdh-5.sysit.cn, 39195, None)
19/04/09 22:14:43 INFO storage.BlockManager: external shuffle service port = 7337
19/04/09 22:14:43 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, cdh-5.sysit.cn, 39195, None)
19/04/09 22:14:43 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
19/04/09 22:14:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3850e90c{/metrics/json,null,AVAILABLE,@Spark}
19/04/09 22:14:43 INFO scheduler.EventLoggingListener: Logging events to hdfs://nameservice1/user/spark/applicationHistory/application_1554817971140_0005
19/04/09 22:14:43 INFO util.Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/04/09 22:14:43 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
19/04/09 22:14:43 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.
19/04/09 22:14:43 INFO util.Utils: Extension com.cloudera.spark.lineage.NavigatorAppListener not being initialized.
19/04/09 22:14:44 INFO logging.DriverLogger$DfsAsyncWriter: Started driver log file sync to: /user/spark/driverLogs/application_1554817971140_0005_driver.log
19/04/09 22:14:44 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
19/04/09 22:14:44 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
19/04/09 22:14:44 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
19/04/09 22:14:44 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
19/04/09 22:14:44 INFO scheduler.DAGScheduler: Parents of final stage: List()
19/04/09 22:14:44 INFO scheduler.DAGScheduler: Missing parents: List()
19/04/09 22:14:44 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
19/04/09 22:14:44 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
19/04/09 22:14:44 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 366.3 MB)
19/04/09 22:14:44 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
19/04/09 22:14:44 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1232.0 B, free 366.3 MB)
19/04/09 22:14:44 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on cdh-5.sysit.cn:39195 (size: 1232.0 B, free: 366.3 MB)
19/04/09 22:14:44 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1164
19/04/09 22:14:44 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
19/04/09 22:14:44 INFO cluster.YarnScheduler: Adding task set 0.0 with 10 tasks
19/04/09 22:14:45 INFO spark.ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 3)
19/04/09 22:14:46 INFO spark.ExecutorAllocationManager: Requesting 4 new executors because tasks are backlogged (new desired total will be 7)
19/04/09 22:14:47 INFO spark.ExecutorAllocationManager: Requesting 3 new executors because tasks are backlogged (new desired total will be 10)
19/04/09 22:14:47 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.50.101.88:49890) with ID 1
19/04/09 22:14:47 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
19/04/09 22:14:47 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, cdh-4.sysit.cn, executor 1, partition 0, PROCESS_LOCAL, 7741 bytes)
19/04/09 22:14:47 INFO storage.BlockManagerMasterEndpoint: Registering block manager cdh-4.sysit.cn:41437 with 366.3 MB RAM, BlockManagerId(1, cdh-4.sysit.cn, 41437, None)
19/04/09 22:14:48 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.50.101.88:49898) with ID 2
19/04/09 22:14:48 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 2)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, cdh-4.sysit.cn, executor 2, partition 1, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on cdh-4.sysit.cn:41437 (size: 1232.0 B, free: 366.3 MB)
19/04/09 22:14:48 INFO storage.BlockManagerMasterEndpoint: Registering block manager cdh-4.sysit.cn:34996 with 366.3 MB RAM, BlockManagerId(2, cdh-4.sysit.cn, 34996, None)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, cdh-4.sysit.cn, executor 1, partition 2, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1089 ms on cdh-4.sysit.cn (executor 1) (1/10)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, cdh-4.sysit.cn, executor 1, partition 3, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 65 ms on cdh-4.sysit.cn (executor 1) (2/10)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, cdh-4.sysit.cn, executor 1, partition 4, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 54 ms on cdh-4.sysit.cn (executor 1) (3/10)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, cdh-4.sysit.cn, executor 1, partition 5, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:48 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 51 ms on cdh-4.sysit.cn (executor 1) (4/10)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, cdh-4.sysit.cn, executor 1, partition 6, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 38 ms on cdh-4.sysit.cn (executor 1) (5/10)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, cdh-4.sysit.cn, executor 1, partition 7, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 34 ms on cdh-4.sysit.cn (executor 1) (6/10)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, cdh-4.sysit.cn, executor 1, partition 8, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 129 ms on cdh-4.sysit.cn (executor 1) (7/10)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, cdh-4.sysit.cn, executor 1, partition 9, PROCESS_LOCAL, 7743 bytes)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 44 ms on cdh-4.sysit.cn (executor 1) (8/10)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 40 ms on cdh-4.sysit.cn (executor 1) (9/10)
19/04/09 22:14:49 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on cdh-4.sysit.cn:34996 (size: 1232.0 B, free: 366.3 MB)
19/04/09 22:14:49 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 960 ms on cdh-4.sysit.cn (executor 2) (10/10)
19/04/09 22:14:49 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/04/09 22:14:49 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 5.118 s
19/04/09 22:14:49 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 5.209917 s
Pi is roughly 3.142983142983143
19/04/09 22:14:49 INFO server.AbstractConnector: Stopped Spark@2e8ab815{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/04/09 22:14:49 INFO ui.SparkUI: Stopped Spark web UI at http://cdh-5.sysit.cn:4040
19/04/09 22:14:49 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
19/04/09 22:14:49 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
19/04/09 22:14:49 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
19/04/09 22:14:49 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
19/04/09 22:14:49 INFO cluster.YarnClientSchedulerBackend: Stopped
19/04/09 22:14:49 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/04/09 22:14:49 INFO memory.MemoryStore: MemoryStore cleared
19/04/09 22:14:49 INFO storage.BlockManager: BlockManager stopped
19/04/09 22:14:49 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
19/04/09 22:14:49 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/04/09 22:14:49 INFO spark.SparkContext: Successfully stopped SparkContext
19/04/09 22:14:49 INFO util.ShutdownHookManager: Shutdown hook called
19/04/09 22:14:49 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-b416ec8a-e17b-4c33-b006-935e26835e24
19/04/09 22:14:49 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-3f7a9b49-e3c7-4778-97f0-86a3ca115fcd
```
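For long-running Spark applications, the delegation tokens obtained above eventually expire. Spark on YARN can log in and renew them itself when given a principal and keytab. A sketch, reusing the example keytab from section 4.3:

```
# --principal/--keytab let Spark obtain and renew its own delegation tokens,
# which matters for streaming and other long-running jobs.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --principal admin@SYSIT.CN \
  --keytab /home/admin/admin.keytab \
  /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_2.11-2.4.0-cdh6.2.0.jar 10
```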
### 4.5 Accessing Hive with Kerberos

```
[admin@cdh-5 ~]$ beeline
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.1.1-cdh6.2.0 by Apache Hive
beeline> !connect jdbc:hive2://cdh-1.sysit.cn:10000/default;principal=hive/cdh-1.sysit.cn@SYSIT.CN
Connecting to jdbc:hive2://cdh-1.sysit.cn:10000/default;principal=hive/cdh-1.sysit.cn@SYSIT.CN
Connected to: Apache Hive (version 2.1.1-cdh6.2.0)
Driver: Hive JDBC (version 2.1.1-cdh6.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://cdh-1.sysit.cn:10000/default> show tables;
INFO : Compiling command(queryId=hive_20190409223155_a7a39b73-fb1a-4a31-a90e-41e7c6cdd9e1): show tables
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20190409223155_a7a39b73-fb1a-4a31-a90e-41e7c6cdd9e1); Time taken: 1.423 seconds
INFO : Executing command(queryId=hive_20190409223155_a7a39b73-fb1a-4a31-a90e-41e7c6cdd9e1): show tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20190409223155_a7a39b73-fb1a-4a31-a90e-41e7c6cdd9e1); Time taken: 0.08 seconds
INFO : OK
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (2.296 seconds)
0: jdbc:hive2://cdh-1.sysit.cn:10000/default> create database testdb2;
INFO : Compiling command(queryId=hive_20190409223253_b377092d-e1b9-4587-a3b2-7f70937c9df9): create database testdb2
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20190409223253_b377092d-e1b9-4587-a3b2-7f70937c9df9); Time taken: 0.039 seconds
INFO : Executing command(queryId=hive_20190409223253_b377092d-e1b9-4587-a3b2-7f70937c9df9): create database testdb2
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20190409223253_b377092d-e1b9-4587-a3b2-7f70937c9df9); Time taken: 0.173 seconds
INFO : OK
No rows affected (0.259 seconds)
0: jdbc:hive2://cdh-1.sysit.cn:10000/default> use testdb2;
INFO : Compiling command(queryId=hive_20190409223255_71667cd7-22a8-496f-9375-babc1dfe21bd): use testdb2
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20190409223255_71667cd7-22a8-496f-9375-babc1dfe21bd); Time taken: 0.041 seconds
INFO : Executing command(queryId=hive_20190409223255_71667cd7-22a8-496f-9375-babc1dfe21bd): use testdb2
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20190409223255_71667cd7-22a8-496f-9375-babc1dfe21bd); Time taken: 0.015 seconds
INFO : OK
No rows affected (0.089 seconds)
0: jdbc:hive2://cdh-1.sysit.cn:10000/default> create table emp_part(
. . . . . . . . . . . . . . . . . . . . . . > empno int,
. . . . . . . . . . . . . . . . . . . . . . > empname string,
. . . . . . . . . . . . . . . . . . . . . . > job string,
. . . . . . . . . . . . . . . . . . . . . . > mgr int,
. . . . . . . . . . . . . . . . . . . . . . > hiredate string,
. . . . . . . . . . . . . . . . . . . . . . > salary double,
. . . . . . . . . . . . . . . . . . . . . . > comm double,
. . . . . . . . . . . . . . . . . . . . . . > deptno int)
. . . . . . . . . . . . . . . . . . . . . . > partitioned by (year string, month string)
. . . . . . . . . . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . . . . . . . . . > fields terminated by '\t';
INFO : Compiling command(queryId=hive_20190409223335_66f2d48a-6b73-464f-a0f8-cb07098c4875): create table emp_part( empno int, empname string, job string, mgr int, hiredate string, salary double, comm double, deptno int) partitioned by (year string, month string) row format delimited fields terminated by '\t'
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20190409223335_66f2d48a-6b73-464f-a0f8-cb07098c4875); Time taken: 0.289 seconds
INFO : Executing command(queryId=hive_20190409223335_66f2d48a-6b73-464f-a0f8-cb07098c4875): create table emp_part( empno int, empname string, job string, mgr int, hiredate string, salary double, comm double, deptno int) partitioned by (year string, month string) row format delimited fields terminated by '\t'
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20190409223335_66f2d48a-6b73-464f-a0f8-cb07098c4875); Time taken: 0.293 seconds
INFO : OK
No rows affected (0.692 seconds)
0: jdbc:hive2://cdh-1.sysit.cn:10000/default> show tables;
INFO : Compiling command(queryId=hive_20190409223343_7a7a1372-094f-4597-a396-aeedfa687c68): show tables
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20190409223343_7a7a1372-094f-4597-a396-aeedfa687c68); Time taken: 0.043 seconds
INFO : Executing command(queryId=hive_20190409223343_7a7a1372-094f-4597-a396-aeedfa687c68): show tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20190409223343_7a7a1372-094f-4597-a396-aeedfa687c68); Time taken: 0.022 seconds
INFO : OK
+-----------+
| tab_name  |
+-----------+
| emp_part  |
+-----------+
1 row selected (0.273 seconds)
0: jdbc:hive2://cdh-1.sysit.cn:10000/default>
```
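The session above connects interactively. For scripts, the same JDBC URL can be passed on the beeline command line; a valid Kerberos ticket is still required first. A sketch, again assuming the example keytab from section 4.3:

```
# A valid TGT is still required. Note that the principal in the URL is the
# HiveServer2 service principal, not the connecting user.
kinit -kt /home/admin/admin.keytab admin@SYSIT.CN
beeline -u "jdbc:hive2://cdh-1.sysit.cn:10000/default;principal=hive/cdh-1.sysit.cn@SYSIT.CN" -e "show tables;"
```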