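Spark can also keep a dataset in a cluster-wide cache, which is very useful when the data is accessed repeatedly. The example below marks the linesWithSpark dataset to be cached.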

Scala

scala> linesWithSpark.cache()
res7: linesWithSpark.type = [value: string]

scala> linesWithSpark.count()
res8: Long = 15

scala> linesWithSpark.count()
res9: Long = 15

Python

>>> linesWithSpark.cache()

>>> linesWithSpark.count()
15

>>> linesWithSpark.count()
15
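In the transcripts above, cache() only marks the dataset. The first count() does the actual work, and later calls can reuse the stored result. The following pure-Python sketch is a toy model of that behavior. It is not Spark code: the `CachedDataset` class, its `computations` counter, and the sample lines are all hypothetical names invented for illustration.

```python
class CachedDataset:
    """Toy stand-in for a cacheable dataset; NOT part of the Spark API."""

    def __init__(self, compute):
        self._compute = compute        # function that produces the rows
        self._cache_enabled = False    # set by cache()
        self._rows = None              # stored rows, filled by the first action
        self.computations = 0          # how often the source was recomputed

    def cache(self):
        # Only marks the dataset for caching; nothing is computed yet.
        self._cache_enabled = True
        return self

    def count(self):
        # Serve from the cache when it has already been filled.
        if self._cache_enabled and self._rows is not None:
            return len(self._rows)
        rows = list(self._compute())
        self.computations += 1
        if self._cache_enabled:
            self._rows = rows
        return len(rows)


def lines_with_spark():
    # Hypothetical sample data standing in for the lines of a text file.
    lines = ["Apache Spark", "a fast engine", "Spark SQL"]
    return (line for line in lines if "Spark" in line)


dataset = CachedDataset(lines_with_spark).cache()
first = dataset.count()    # computes the rows and stores them
second = dataset.count()   # answered from the stored rows
print(first, second, dataset.computations)  # prints: 2 2 1
```

Without the call to cache(), this toy class would recompute the rows on every count(), which is the cost that caching avoids for frequently accessed data.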
