Transformations
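| Transformation | Description |
|---|---|
| map(func) | Returns a new distributed dataset formed by passing each element of the source through func. |
| filter(func) | Returns a new dataset containing only the elements of the source for which func returns true. |
| flatMap(func) | Similar to map, but each input item can be mapped to 0 or more output items, so func should return a sequence rather than a single item. |
| mapPartitions(func) | Similar to map, but runs separately on each partition of the RDD; func receives an iterator over the partition and returns an iterator. |
| mapPartitionsWithIndex(func) | Similar to mapPartitions, but func also receives the integer index of the partition. |
| sample(withReplacement,fraction,seed) | Samples a fraction of the data, with or without replacement, using the given random number generator seed. |
| union(otherDataset) | Returns a new dataset containing the union of the elements in the source dataset and the argument. |
| intersection(otherDataset) | Returns a new RDD containing the intersection of the elements in the source dataset and the argument. |
| distinct([numTasks]) | Returns a new dataset containing the distinct elements of the source dataset. |
| groupByKey([numTasks]) | Called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable[V]) pairs. |
| reduceByKey(func,[numTasks]) | Called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs in which the values for each key are aggregated with func, which must be of type (V, V) => V. |
| aggregateByKey(zeroValue)(seqOp,combOp,[numTasks]) | Called on a dataset of (K, V) pairs, returns a dataset of (K, U) pairs in which the values for each key are aggregated with the given combine functions and a neutral zero value. The aggregated type U may differ from V. |
| sortByKey([ascending],[numTasks]) | Called on a dataset of (K, V) pairs where K is ordered, returns a dataset of (K, V) pairs sorted by key in ascending or descending order. |
| join(otherDataset,[numTasks]) | Called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. |
| cogroup(otherDataset,[numTasks]) | Called on datasets of type (K, V) and (K, W), returns a dataset of (K, (Iterable[V], Iterable[W])) tuples. |
| cartesian(otherDataset) | Called on datasets of types T and U, returns a dataset of all (T, U) pairs. |
| pipe(command,[envVars]) | Pipes each partition of the RDD through a shell command. Elements are written to the process's stdin, and the lines it writes to stdout are returned as an RDD of strings. |
| coalesce(numPartitions) | Decreases the number of partitions in the RDD to numPartitions. Useful for running operations more efficiently after filtering down a large dataset. |
| repartition(numPartitions) | Reshuffles the data in the RDD randomly to create more or fewer partitions and balance data across them. This always shuffles all data over the network. |
| repartitionAndSortWithinPartitions(partitioner) | Repartitions the RDD according to the given partitioner and sorts records by key within each resulting partition. More efficient than calling repartition and then sorting, because the sort is pushed into the shuffle. |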
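
A few of the transformations above can be chained as in the following minimal sketch. It assumes a Spark 2.x or later dependency on the classpath (or a `spark-shell` session); the `local[2]` master, the application name, and the sample data are illustrative choices, not part of the table. Transformations are lazy, so nothing is computed until the final `collect()` action runs.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; master and app name are illustrative assumptions.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("transformations-sketch")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical input: three short lines spread over two partitions.
val lines = sc.parallelize(Seq("a b a", "b c", "a"), numSlices = 2)

// Each step below only describes the computation; no job runs yet.
val words    = lines.flatMap(_.split(" "))             // one line -> zero or more words
val pairs    = words.map(word => (word, 1))            // word -> (word, 1)
val counts   = pairs.reduceByKey(_ + _)                // sum the values for each key
val frequent = counts.filter { case (_, n) => n >= 2 } // keep words seen at least twice
val sorted   = frequent.sortByKey()                    // ascending by key

// collect() is an action: it triggers the job and returns the results to the driver.
val result = sorted.collect().toList // List((a,3), (b,2))

spark.stop()
```

`reduceByKey` is used here rather than `groupByKey` followed by a sum because it combines values within each partition before the shuffle, which usually moves less data.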
Actions
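| Action | Description |
|---|---|
| reduce(func) | Aggregates the elements of the dataset using func, which takes two arguments and returns one. func should be commutative and associative so that it can be computed correctly in parallel. |
| collect() | Returns all elements of the dataset as an array at the driver program. Best used after a filter or other operation that leaves a sufficiently small subset of the data. |
| count() | Returns the number of elements in the dataset. |
| first() | Returns the first element of the dataset (similar to take(1)). |
| take(n) | Returns an array with the first n elements of the dataset. |
| takeSample(withReplacement,num,[seed]) | Returns an array with a random sample of num elements of the dataset, with or without replacement, optionally using the given random seed. |
| takeOrdered(n,[ordering]) | Returns the first n elements of the RDD using either their natural order or a custom ordering. |
| saveAsSequenceFile(path) | Writes the elements of the dataset as a Hadoop SequenceFile at the given path. Available on RDDs of key-value pairs whose types can be converted to Hadoop Writable. |
| saveAsObjectFile(path) | Writes the elements of the dataset in a simple format using Java serialization; the result can be loaded with SparkContext.objectFile(). |
| countByKey() | Available only on RDDs of (K, V) pairs. Returns a map from each key to the number of elements with that key. |
| foreach(func) | Runs func on each element of the dataset, usually for side effects such as updating an accumulator or writing to an external storage system. |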
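
The sketch below exercises several of the actions above on a small in-memory dataset. It assumes a Spark 2.x or later dependency (or a `spark-shell` session); the `local[2]` master, the application name, and the sample values are illustrative assumptions. Each call triggers a job and returns a plain Scala value to the driver.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; master and app name are illustrative assumptions.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("actions-sketch")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical data: six integers spread over three partitions.
val nums = sc.parallelize(Seq(5, 3, 8, 1, 9, 2), numSlices = 3)

val total      = nums.reduce(_ + _)     // 28
val n          = nums.count()           // 6
val head       = nums.first()           // 5
val firstThree = nums.take(3).toList    // List(5, 3, 8)
val smallest   = nums.takeOrdered(2).toList                        // List(1, 2)
val largest    = nums.takeOrdered(2)(Ordering[Int].reverse).toList // List(9, 8)

// countByKey is only defined on RDDs of key-value pairs.
val byKey = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3))).countByKey() // Map(a -> 2, b -> 1)

// foreach runs on the executors; the order of the printed lines is not deterministic.
nums.foreach(x => println(x))

spark.stop()
```

Because `collect()`, `take(n)`, and `countByKey()` bring their results into driver memory, they are best reserved for results known to be small.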