Prerequisites

OS: CentOS 7.3.1611

Python: Anaconda3

Spark: 2.2.1

Method 1: environment variables
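The pyspark launcher can be pointed at IPython through two driver environment variables that Spark itself reads. A minimal sketch, assuming the install path from the prerequisites:

```shell
# Assumed install path; adjust to your installation.
export SPARK_HOME=/data/spark-2.2.1-bin-hadoop2.6
# Tell the pyspark launcher to start IPython as the driver shell.
export PYSPARK_DRIVER_PYTHON=ipython
# For a Jupyter notebook instead of plain IPython, use:
# export PYSPARK_DRIVER_PYTHON=jupyter
# export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser'
```

Then launch the shell as usual with `$SPARK_HOME/bin/pyspark`.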

Method 2: IPython profile

Create an IPython profile named pyspark:

ipython profile create pyspark

Configure ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:

import os
import sys

# Locate the Spark installation from the environment.
spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put Spark's Python bindings and the bundled Py4J on the path.
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))

# Run Spark's interactive-shell bootstrap, which creates `sc` and `spark`.
exec(open(os.path.join(spark_home, 'python/pyspark/shell.py')).read())

Launch IPython with that profile:

ipython --profile=pyspark

Method 3: Jupyter kernel

Create kernel.json:

$ cat .ipython/kernels/pyspark/kernel.json
{
  "display_name": "pySpark (Spark 2.2.1)",
  "language": "python",
  "argv": [
    "/opt/anaconda3/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "CAPTURE_STANDARD_OUT": "true",
    "CAPTURE_STANDARD_ERR": "true",
    "SEND_EMPTY_OUTPUT": "false",
    "SPARK_HOME": "/data/spark-2.2.1-bin-hadoop2.6",
    "PYTHONPATH": "/data/spark-2.2.1-bin-hadoop2.6/python:/data/spark-2.2.1-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip"
  }
}

PYTHONPATH makes pyspark and the bundled Py4J importable inside the kernel, mirroring the sys.path setup from Method 2.
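Jupyter will refuse to load a malformed kernel spec, so it is worth checking the JSON before launching. A minimal sketch with placeholder values (substitute your actual kernel.json contents):

```python
import json

# Placeholder spec of the same shape as the kernel.json above.
spec_text = '''
{
  "display_name": "pySpark",
  "language": "python",
  "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
  "env": {"SPARK_HOME": "/data/spark"}
}
'''

spec = json.loads(spec_text)  # raises json.JSONDecodeError if malformed
# argv must contain the connection-file placeholder for ipykernel to start.
assert "{connection_file}" in spec["argv"]
print("valid spec:", spec["display_name"])
```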

Launch the Jupyter notebook server:

jupyter notebook --no-browser

Adjusting the log level

The conf directory contains log4j.properties.template; copy this template to create the logging configuration file.
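Concretely, assuming SPARK_HOME is set (the sed edit is one way to make the change below; editing the file by hand works just as well):

```shell
cd "$SPARK_HOME/conf"
cp log4j.properties.template log4j.properties
# Lower the root logger from INFO to WARN in place.
sed -i 's/^log4j.rootCategory=INFO/log4j.rootCategory=WARN/' log4j.properties
```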

Default log configuration:

log4j.rootCategory=INFO,console

Lower the log level so that only warnings and more severe messages are shown:

log4j.rootCategory=WARN,console

Shells opened via pyspark (and the other launchers) will now produce far less log output.
