Spark also supports pulling datasets into a cluster-wide in-memory cache. This is very useful when data is accessed repeatedly. As a simple example, let's cache the linesWithSpark dataset:

Scala

scala> linesWithSpark.cache()
res7: linesWithSpark.type = [value: string]

scala> linesWithSpark.count()
res8: Long = 15

scala> linesWithSpark.count()
res9: Long = 15

Python

>>> linesWithSpark.cache()

>>> linesWithSpark.count()
15

>>> linesWithSpark.count()
15
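Note that cache() is lazy: it only marks the dataset for caching, and the data is actually materialized by the first action (here, the first count()) and reused by later actions. This behavior can be sketched conceptually in plain Python; LazyDataset below is a hypothetical illustration of the idea, not Spark's implementation:

```python
# Conceptual sketch of Spark's lazy cache() semantics (hypothetical class,
# not real Spark code).
class LazyDataset:
    def __init__(self, lines, predicate):
        self.lines = lines
        self.predicate = predicate
        self.cached = None          # holds materialized rows once cached
        self.cache_enabled = False
        self.compute_count = 0      # how many times the filter actually ran

    def cache(self):
        # Like Dataset.cache(): only marks the dataset; computes nothing yet.
        self.cache_enabled = True
        return self

    def count(self):
        # An action: triggers computation, reusing the cache if present.
        if self.cached is not None:
            return len(self.cached)
        self.compute_count += 1
        result = [line for line in self.lines if self.predicate(line)]
        if self.cache_enabled:
            self.cached = result
        return len(result)

lines = ["# Apache Spark", "fast engine", "Spark SQL", "MLlib"]
ds = LazyDataset(lines, lambda line: "Spark" in line).cache()
print(ds.count())        # first action: filter runs, result is cached -> 2
print(ds.count())        # second action: served from the cache -> 2
print(ds.compute_count)  # the filter ran only once -> 1
```

In real Spark the payoff is the same shape but much larger: once linesWithSpark is cached across the cluster's memory, repeated actions skip re-reading and re-filtering the source file.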
