Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame.

For a complete example, see $SPARK_HOME/examples/src/main/python/sql/datasource.py

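Note that the input is not a typical JSON file. It is line-delimited JSON: each line must contain a separate, self-contained, valid JSON object, for example: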

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}

Read the JSON file:

path = "examples/src/main/resources/people.json"
peopleDF = spark.read.json(path)

Print the inferred schema:

peopleDF.printSchema()

Create a temporary view from the DataFrame:

peopleDF.createOrReplaceTempView("people")

Run a SQL query against the view:

teenagerNamesDF = spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
teenagerNamesDF.show()

Method 2: create a DataFrame from an RDD of JSON strings (one JSON object per string)

jsonStrings = ['{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}']
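# Use the SparkContext of the existing session; a bare `sc` is only predefined in the pyspark shell
otherPeopleRDD = spark.sparkContext.parallelize(jsonStrings)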
otherPeople = spark.read.json(otherPeopleRDD)
otherPeople.show()

results matching ""

    No results matching ""