Prerequisites
System: CentOS 7.3.1611
Python: Anaconda3
Spark: 2.2.1
Method 1 - environment variables
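pyspark decides which Python runs the driver shell from the PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS environment variables, so pointing them at IPython or Jupyter is enough. A minimal sketch, assuming pyspark and Anaconda's ipython/jupyter are already on PATH:

export PYSPARK_DRIVER_PYTHON=ipython
pyspark

Or, to go straight to a notebook:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser'
pyspark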
Method 2 - IPython profile
Create an IPython profile named pyspark:
ipython profile create pyspark
Configure ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py (scripts in the profile's startup/ directory run automatically whenever IPython starts):
import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put pyspark and the bundled py4j on the interpreter path; the py4j
# version matches the Spark distribution (0.10.4 ships with Spark 2.2.x).
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))

# Run Spark's interactive-shell bootstrap, which creates `sc` and `spark`.
exec(open(os.path.join(spark_home, 'python/pyspark/shell.py')).read())
Launch IPython with the pyspark profile:
ipython --profile=pyspark
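If the startup script ran, the usual Spark banner is printed and shell.py has already created sc and spark; a quick smoke test (the numbers here are arbitrary):

sc.parallelize(range(100)).sum()  # expected: 4950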
Method 3 - Jupyter kernel
Create the kernel spec kernel.json:
$ cat .ipython/kernels/pyspark/kernel.json
{
  "display_name": "pySpark (Spark 2.2.1)",
  "language": "python",
  "argv": [
    "/opt/anaconda3/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "CAPTURE_STANDARD_OUT": "true",
    "CAPTURE_STANDARD_ERR": "true",
    "SEND_EMPTY_OUTPUT": "false",
    "SPARK_HOME": "/data/spark-2.2.1-bin-hadoop2.6"
  }
}
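As written, this kernel spec only exports SPARK_HOME; import pyspark will still fail because pyspark is not on the interpreter's path. One option is to reuse the Method 2 bootstrap as the first cell of each notebook (a sketch assuming the same Spark layout as above; the appName value is arbitrary):

import os
import sys

spark_home = os.environ['SPARK_HOME']  # set by kernel.json above
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))

# Create the session/context that pyspark's own shell would normally provide
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('notebook').getOrCreate()
sc = spark.sparkContext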
Launch Jupyter Notebook and choose the "pySpark (Spark 2.2.1)" kernel when creating a new notebook:
jupyter notebook --no-browser
Adjusting the log level
The conf directory contains log4j.properties.template; copying this template creates the log configuration file.
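For example, assuming SPARK_HOME points at the installation above:

cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties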
Default log configuration:
log4j.rootCategory=INFO,console
Lower the log level so that only warnings and more severe messages are shown:
log4j.rootCategory=WARN,console
Shells opened via pyspark (and the other Spark shells) will now produce far less log output.
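The level can also be changed per session, without editing log4j.properties, from inside the shell:

sc.setLogLevel('WARN')  # valid levels include ALL, DEBUG, INFO, WARN, ERROR, OFF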