Virtual machine 1, named broker1: hadoop, scala, and jdk already installed
Virtual machine 2, named broker2: hadoop, scala, and jdk already installed
Virtual machine 3, named broker3: hadoop, scala, and jdk already installed
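The configuration below refers to the machines by these hostnames, so each VM must be able to resolve them. A minimal /etc/hosts sketch, assuming placeholder IP addresses (substitute the real addresses of your three machines):

```bash
# Run on every VM as root. The IP addresses below are examples only.
cat >> /etc/hosts <<'EOF'
192.168.1.101 broker1
192.168.1.102 broker2
192.168.1.103 broker3
EOF
```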
Spark installation
1. Go to the official Spark website.
2. Choose the Spark 2.1 release, and pick the package pre-built for Hadoop 2.7.
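For example, assuming the 2.1.0 release is wanted, the matching pre-built package can be fetched directly from the Apache archive:

```bash
# Download Spark 2.1.0 pre-built for Hadoop 2.7
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
```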
Spark configuration
1. Upload the downloaded package to the virtual machine.
2. Extract the package.
3. Create a spark user and group:
useradd spark
4. Move the extracted directory to /usr/lib/ and rename it spark.
5. Set ownership of the spark directory to the spark user and group:
chown -R spark:spark /usr/lib/spark
6. Repeat the steps above on the other two virtual machines (see the sketch after this list).
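One way to carry out step 6 without repeating the manual work, assuming root SSH access from broker1 to the other nodes and the hostnames set up earlier:

```bash
# Copy the prepared Spark directory to the other two nodes,
# then create the spark user and fix ownership there.
for host in broker2 broker3; do
  scp -r /usr/lib/spark "$host":/usr/lib/
  ssh "$host" 'useradd spark && chown -R spark:spark /usr/lib/spark'
done
```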
Starting the Spark cluster
1. Edit the Spark environment configuration:
mv /usr/lib/spark/conf/spark-env.sh.template /usr/lib/spark/conf/spark-env.sh
vi /usr/lib/spark/conf/spark-env.sh
Add the following (the hostnames match the machines named above):
export JAVA_HOME=/usr/lib/jdk
export SCALA_HOME=/usr/lib/scala
export SPARK_MASTER_IP=broker1
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
2. Edit the slaves file:
mv /usr/lib/spark/conf/slaves.template /usr/lib/spark/conf/slaves
vi /usr/lib/spark/conf/slaves
Add the following:
broker2
broker3
3. Sync both configuration files to the other two virtual machines (the loop shown earlier works here as well).
4. On broker1, start the master:
/usr/lib/spark/sbin/start-master.sh
5. Still on broker1, start the workers (a quick health check follows this list):
/usr/lib/spark/sbin/start-slaves.sh
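To confirm the cluster came up, a quick check using the default ports (the standalone master's web UI listens on 8080):

```bash
# On broker1 a Master process should be listed; on broker2/broker3 a Worker.
jps
# The master web UI shows the registered workers.
curl http://broker1:8080
```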
Running a small Spark example in the Spark shell
1. Start the Spark shell against the cluster (without --master it would run in local mode):
/usr/lib/spark/bin/spark-shell --master spark://broker1:7077
2. Read a file. Scala string literals use double quotes, and Spark does not expand ~, so give an absolute path that is readable from every worker (here assumed to be /home/spark/temp/gutenburg.txt):
val file = sc.textFile("/home/spark/temp/gutenburg.txt")
3. Count the occurrences of each word in the file and print the result:
val count = file.flatMap(line => line.split(' ')).map(word => (word, 1)).reduceByKey(_ + _)
count.collect().foreach(println)