The prebuilt Spark packages for Hadoop do not include Hive support; to use Spark with Hive, you have to build and configure it yourself.
First, install and configure Maven (omitted here).
Download the Spark source code and build it with Maven:
build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package
After the build succeeds, unpack and install Spark (omitted here).
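The "install" step usually means unpacking a distribution tarball on each node. If you only have the source tree, Spark's bundled make-distribution.sh script can produce one using the same profiles as the Maven build above (a sketch; the --name value is arbitrary, and spark-hadoop-2.6 here matches the spark-1.6.0-bin-spark-hadoop-2.6 directory seen on bigdata-3 below):
./make-distribution.sh --name spark-hadoop-2.6 --tgz -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver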
Verify Spark:
bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster examples/target/spark-examples_2.10-1.6.0.jar 10
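Because this runs in yarn-cluster mode, the value of Pi appears in the driver's container log rather than on the console. One way to check it, assuming YARN log aggregation is enabled and <application_id> is the ID printed by spark-submit:
yarn logs -applicationId <application_id> | grep "Pi is roughly"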
On the bigdata-2 node, distribute the MySQL driver:
[root@bigdata-2 ~]# scp /usr/local/hive-1.2.1/lib/mysql-connector-java-5.1.38-bin.jar bigdata-1:/usr/local/spark-1.6.0/lib_managed/jars/
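A quick sanity check (assuming passwordless ssh between the nodes) that the driver landed where the Thrift server will look for it:
[root@bigdata-2 ~]# ssh bigdata-1 ls /usr/local/spark-1.6.0/lib_managed/jars/ | grep mysql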
On the bigdata-2 node, distribute the Hive configuration file:
[root@bigdata-2 ~]# scp /usr/local/hive-1.2.1/conf/hive-site.xml bigdata-1:/usr/local/spark-1.6.0/conf/
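For reference, the part of hive-site.xml that matters here is the metastore connection. A minimal sketch, assuming the MySQL metastore lives on bigdata-2 with hypothetical hive/hive credentials:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://bigdata-2:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>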
On the bigdata-1 node, start the Thrift server (note: the --jars list must be comma-separated with no spaces):
[root@bigdata-1 spark-1.6.0]# sbin/start-thriftserver.sh --master yarn --jars lib_managed/jars/datanucleus-core-3.2.10.jar,lib_managed/jars/datanucleus-api-jdo-3.2.6.jar,lib_managed/jars/datanucleus-rdbms-3.2.9.jar,lib_managed/jars/mysql-connector-java-5.1.38-bin.jar --files conf/hive-site.xml
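Before connecting from another node, confirm the server came up and is listening on the default port 10000 (a sketch; the exact log file name depends on the user and host):
[root@bigdata-1 spark-1.6.0]# tail -n 20 logs/spark-*HiveThriftServer2*.out
[root@bigdata-1 spark-1.6.0]# netstat -tlnp | grep 10000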
On the bigdata-3 node, connect to Hive via Beeline (the Thrift server runs on bigdata-1, so connect to that host rather than 127.0.0.1):
[root@bigdata-3 spark-1.6.0-bin-spark-hadoop-2.6]# ./bin/beeline -u jdbc:hive2://bigdata-1:10000 -n root
List the tables in Hive:
0: jdbc:hive2://bigdata-1:10000> show tables;
Create a table in Hive:
0: jdbc:hive2://bigdata-1:10000> create table test_hive(id int, name string);
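Optionally verify the schema:
0: jdbc:hive2://bigdata-1:10000> desc test_hive;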
Insert data into the Hive table (here, copying rows from an existing table test):
0: jdbc:hive2://bigdata-1:10000> insert into table test_hive select * from test;
View the data in the table:
0: jdbc:hive2://bigdata-1:10000> select * from test_hive;
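Since the Thrift server and the spark-sql CLI share the same metastore, the new table is also visible outside Beeline. A quick cross-check from bigdata-1 (assuming the conf/hive-site.xml distributed above is in place):
[root@bigdata-1 spark-1.6.0]# bin/spark-sql -e "select * from test_hive;"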