Transformations

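Transformation Description
map(func) Returns a new distributed dataset formed by passing each element of the source through func
filter(func) Returns a new dataset formed by selecting the elements for which func returns true
flatMap(func) Maps each input item to zero or more output items
mapPartitions(func) Similar to map, but runs separately on each partition of the RDD
mapPartitionsWithIndex(func) Similar to mapPartitions, but also provides func with the index of the partition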
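sample(withReplacement,fraction,seed) Samples a fraction of the data, with or without replacement, using the given random seed
union(otherDataset) Returns a new dataset containing the union of the elements in the source dataset and the argument
intersection(otherDataset) Returns a new dataset containing the elements common to the source dataset and the argument
distinct([numTasks]) Returns a new dataset containing the distinct elements of the source dataset
groupByKey([numTasks]) On a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs
reduceByKey(func,[numTasks]) On a dataset of (K, V) pairs, returns a dataset of (K, V) pairs in which the values for each key are aggregated with func, of type (V, V) => V
aggregateByKey(zeroValue)(seqOp,combOp,[numTasks]) On a dataset of (K, V) pairs, returns a dataset of (K, U) pairs in which the values for each key are aggregated with the given combine functions and a neutral zero value
sortByKey([ascending],[numTasks]) On a dataset of (K, V) pairs where K is ordered, returns a dataset sorted by key in ascending or descending order
join(otherDataset,[numTasks]) On datasets of (K, V) and (K, W) pairs, returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key
cogroup(otherDataset,[numTasks]) On datasets of (K, V) and (K, W) pairs, returns a dataset of (K, (Iterable<V>, Iterable<W>)) tuples
cartesian(otherDataset) On datasets of types T and U, returns a dataset of all (T, U) pairs
pipe(command,[envVars]) Pipes each partition of the RDD through a shell command
coalesce(numPartitions) Decreases the number of partitions in the RDD to numPartitions
repartition(numPartitions) Reshuffles the data randomly into more or fewer partitions and balances it across them
repartitionAndSortWithinPartitions(partitioner) Repartitions the RDD with the given partitioner and sorts the records by key within each resulting partition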
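
The semantics of a few of the transformations above can be sketched without a cluster. The following is a minimal pure-Python model, not Spark itself: an RDD is represented as a list of partitions (plain lists), evaluation is eager rather than lazy, and the helper names (map_, filter_, flat_map, map_partitions, reduce_by_key) are hypothetical stand-ins for the corresponding RDD methods.

```python
from itertools import chain

# Model assumption: an "RDD" is a list of partitions, each a plain list.
partitions = [["a b", "b c"], ["c c a"]]

def map_(parts, func):
    # map: apply func to every element, keeping the partitioning
    return [[func(x) for x in part] for part in parts]

def filter_(parts, func):
    # filter: keep only the elements for which func returns true
    return [[x for x in part if func(x)] for part in parts]

def flat_map(parts, func):
    # flatMap: each input item yields zero or more output items
    return [list(chain.from_iterable(func(x) for x in part)) for part in parts]

def map_partitions(parts, func):
    # mapPartitions: func receives an iterator over a whole partition
    return [list(func(iter(part))) for part in parts]

def reduce_by_key(parts, func):
    # reduceByKey: combine values per key inside each partition first,
    # then merge the partial results across partitions.
    # Simplification: returns a dict instead of a new partitioned dataset.
    merged = {}
    for part in parts:
        local = {}
        for k, v in part:
            local[k] = func(local[k], v) if k in local else v
        for k, v in local.items():
            merged[k] = func(merged[k], v) if k in merged else v
    return merged

words = flat_map(partitions, str.split)
not_a = filter_(words, lambda w: w != "a")
pairs = map_(words, lambda w: (w, 1))
counts = reduce_by_key(pairs, lambda x, y: x + y)
sizes = map_partitions(words, lambda it: [sum(1 for _ in it)])
```

In this model, counts holds the word counts for the whole dataset and sizes holds one element count per partition, which is the practical difference between map and mapPartitions.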

Actions

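Action Description
reduce(func) Aggregates the elements of the dataset with func, which takes two arguments and returns one; func should be commutative and associative
collect() Returns all elements of the dataset as an array at the driver program
count() Returns the number of elements in the dataset
first() Returns the first element of the dataset
take(n) Returns an array with the first n elements of the dataset
takeSample(withReplacement,num,[seed]) Returns an array with a random sample of num elements, with or without replacement
takeOrdered(n,[ordering]) Returns the first n elements of the dataset using their natural order or a custom comparator
saveAsSequenceFile(path) Writes the elements of a key-value dataset as a Hadoop SequenceFile at the given path
saveAsObjectFile(path) Writes the elements of the dataset using Java serialization, so that they can be loaded with SparkContext.objectFile()
countByKey() On a dataset of (K, V) pairs, returns a map from each key to the number of its occurrences
foreach(func) Runs func on each element of the dataset, usually for side effects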
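
Several of the actions above can be illustrated in the same spirit. This is a minimal pure-Python sketch, not Spark code: a dataset is again modeled as a list of partitions, and the function names (collect, count, first, take, take_ordered, count_by_key, reduce_) are hypothetical stand-ins for the RDD actions. Unlike transformations, each function returns an ordinary value to the caller.

```python
import heapq
from collections import Counter
from functools import reduce as fold
from itertools import chain, islice

# Model assumption: a dataset is a list of partitions (plain lists).
numbers = [[3, 1], [4, 1, 5]]
pairs = [[("b", 2), ("a", 5)], [("a", 1), ("c", 3)]]

def collect(parts):
    # collect: bring every element back as one list
    return list(chain.from_iterable(parts))

def count(parts):
    # count: total number of elements across partitions
    return sum(len(part) for part in parts)

def first(parts):
    # first: the first element in partition order
    return next(chain.from_iterable(parts))

def take(parts, n):
    # take: the first n elements in partition order
    return list(islice(chain.from_iterable(parts), n))

def take_ordered(parts, n, key=None):
    # takeOrdered: the n smallest elements, natural order or custom key
    return heapq.nsmallest(n, chain.from_iterable(parts), key=key)

def count_by_key(parts):
    # countByKey: how many times each key occurs in a (K, V) dataset
    return dict(Counter(k for k, _ in chain.from_iterable(parts)))

def reduce_(parts, func):
    # reduce: fold each non-empty partition, then fold the partial results;
    # this only matches a sequential fold if func is commutative and associative
    partials = [fold(func, part) for part in parts if part]
    return fold(func, partials)

total = reduce_(numbers, lambda x, y: x + y)
smallest = take_ordered(numbers, 2)
key_counts = count_by_key(pairs)
```

The two-stage fold in reduce_ shows why func should be commutative and associative: partial results from different partitions may be combined in any grouping.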
