Transformation
Transformation | Description |
---|---|
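map(func) | Returns a new distributed dataset formed by passing each element of the source through func |
filter(func) | Returns a new dataset containing only the source elements for which func returns true |
flatMap(func) | Similar to map, but each input item can be mapped to 0 or more output items, so func returns a sequence rather than a single item |
mapPartitions(func) | Similar to map, but runs separately on each partition of the RDD |
mapPartitionsWithIndex(func) | Similar to mapPartitions, but also passes func the index of the partition |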
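sample(withReplacement,fraction,seed) | Samples a fraction of the data, with or without replacement, using the given random number generator seed |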
union(otherDataset) | Returns a new dataset containing the union of the elements in the source dataset and otherDataset |
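intersection(otherDataset) | Returns a new dataset containing the elements present in both the source dataset and otherDataset |
distinct([numTasks]) | Returns a new dataset containing the distinct elements of the source dataset |
groupByKey([numTasks]) | On a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs |
reduceByKey(func,[numTasks]) | On a dataset of (K, V) pairs, returns a dataset of (K, V) pairs in which the values for each key are aggregated with func, which must be of type (V, V) => V |
aggregateByKey(zeroValue)(seqOp,combOp,[numTasks]) | On a dataset of (K, V) pairs, returns a dataset of (K, U) pairs in which the values for each key are aggregated using the given combine functions and a neutral zero value |
sortByKey([ascending],[numTasks]) | On a dataset of (K, V) pairs where K is ordered, returns a dataset of (K, V) pairs sorted by key in ascending or descending order, as specified by ascending |
join(otherDataset,[numTasks]) | On datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key |
cogroup(otherDataset,[numTasks]) | On datasets of type (K, V) and (K, W), returns a dataset of (K, (Iterable<V>, Iterable<W>)) tuples |
cartesian(otherDataset) | On datasets of types T and U, returns a dataset of all (T, U) pairs |
pipe(command,[envVars]) | Pipes each partition of the RDD through a shell command; elements are written to the process's stdin, and the lines of its stdout are returned as an RDD of strings |
coalesce(numPartitions) | Decreases the number of partitions in the RDD to numPartitions |
repartition(numPartitions) | Reshuffles the data in the RDD randomly into more or fewer partitions and balances it across them; this always shuffles all data over the network |
repartitionAndSortWithinPartitions(partitioner) | Repartitions the RDD according to the given partitioner and sorts the records by key within each resulting partition |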
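
The semantics of a few of the transformations above can be sketched in plain Python. This is only a minimal model, not Spark code: it assumes a dataset is represented as a list of partitions (each a plain list), and the helper names `flat_map`, `map_partitions_with_index` and `reduce_by_key` are hypothetical stand-ins for the corresponding RDD methods.

```python
from itertools import chain

# Hypothetical stand-in for an RDD: a list of partitions, each a plain list.
partitions = [["a b", "b c"], ["c c d"]]

def flat_map(parts, func):
    """Model of flatMap: func returns zero or more outputs per input item."""
    return [[out for item in part for out in func(item)] for part in parts]

def map_partitions_with_index(parts, func):
    """Model of mapPartitionsWithIndex: func receives (partition index, iterator)."""
    return [list(func(i, iter(part))) for i, part in enumerate(parts)]

def reduce_by_key(parts, func):
    """Model of reduceByKey: merge all values of each key with func."""
    merged = {}
    for key, value in chain.from_iterable(parts):
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

# Split each line into words; one input line yields several output items.
words = flat_map(partitions, lambda line: line.split())

# Pair every word with 1 (this models map), then sum the values per key.
pairs = [[(w, 1) for w in part] for part in words]
counts = reduce_by_key(pairs, lambda a, b: a + b)

# Tag every word with the index of the partition it lives in.
tagged = map_partitions_with_index(words, lambda i, it: ((i, w) for w in it))
```

In this model `counts` maps each word to its number of occurrences, and `tagged` shows which partition each word came from. A real RDD would distribute the partitions across a cluster rather than hold them in one list.
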

Actions
Action | Description |
---|---|
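reduce(func) | Aggregates the elements of the dataset using func, which takes two arguments and returns one; func should be commutative and associative so that it can be computed correctly in parallel |
collect() | Returns all the elements of the dataset as an array at the driver program |
count() | Returns the number of elements in the dataset |
first() | Returns the first element of the dataset (similar to take(1)) |
take(n) | Returns an array with the first n elements of the dataset |
takeSample(withReplacement,num,[seed]) | Returns an array with a random sample of num elements of the dataset, with or without replacement, optionally using the given random number generator seed |
takeOrdered(n,[ordering]) | Returns the first n elements of the RDD using either their natural order or a custom comparator |
saveAsSequenceFile(path) | Writes the elements of the dataset as a Hadoop SequenceFile in the given path; available on RDDs of key-value pairs |
saveAsObjectFile(path) | Writes the elements of the dataset in a simple format using Java serialization |
countByKey() | Only available on RDDs of type (K, V); returns a map of (K, count) pairs with the number of elements for each key |
foreach(func) | Runs func on each element of the dataset |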
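
The results that some of these actions return can likewise be modeled in plain Python. Again this is a sketch under assumptions rather than Spark code: the dataset is a list of partitions holding (key, value) tuples, and `elements`, `take`, `take_ordered` and `count_by_key` are hypothetical helpers that mimic the actions of the same name.

```python
import heapq
from collections import Counter
from functools import reduce
from itertools import chain, islice

# Hypothetical stand-in for a pair RDD: partitions of (key, value) tuples.
partitions = [[("b", 2), ("a", 5)], [("a", 1), ("c", 4)]]

def elements(parts):
    """Iterate over all elements in partition order."""
    return chain.from_iterable(parts)

def take(parts, n):
    """Model of take(n): the first n elements of the dataset."""
    return list(islice(elements(parts), n))

def take_ordered(parts, n, key=None):
    """Model of takeOrdered: the n smallest elements by natural or custom order."""
    return heapq.nsmallest(n, elements(parts), key=key)

def count_by_key(parts):
    """Model of countByKey: number of elements per key."""
    return dict(Counter(k for k, _ in elements(parts)))

# Model of reduce(func): func is commutative and associative (addition).
total = reduce(lambda a, b: a + b, (v for _, v in elements(partitions)))

first_three = take(partitions, 3)
smallest_two = take_ordered(partitions, 2)
largest_value = take_ordered(partitions, 1, key=lambda kv: -kv[1])
per_key = count_by_key(partitions)
```

Unlike the transformations, each of these calls produces an ordinary value (a number, a list or a dict) rather than a new dataset.
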