countByKey
countByKey is an action operator, because its implementation calls collect.
It converts each (key, value) pair into (key, 1) and then calls reduceByKey -> collect -> toMap.
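The chain described above can be modeled without Spark. The following is a minimal sketch on plain Scala collections, not the Spark implementation: `CountByKeySketch` and `countByKeyLocal` are hypothetical names, and `groupBy` plus `sum` stands in for what `reduceByKey(_ + _)` does across partitions.

```scala
// Plain Scala collections stand in for the RDD here; no Spark dependency is needed.
object CountByKeySketch {
  // Hypothetical helper that mirrors mapValues -> reduceByKey -> collect -> toMap.
  def countByKeyLocal[K, V](pairs: Seq[(K, V)]): Map[K, Long] = {
    // Step 1: discard each value and keep a 1L in its place: (key, value) => (key, 1L).
    val ones: Seq[(K, Long)] = pairs.map { case (k, _) => (k, 1L) }
    // Step 2: sum the ones per key (the role reduceByKey(_ + _) plays on an RDD).
    val reduced: Map[K, Long] =
      ones.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).sum) }
    // Step 3: the data is already local, so there is nothing left to collect.
    reduced
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(("a", 10), ("b", 20), ("a", 30), ("c", 40), ("a", 50))
    println(countByKeyLocal(pairs))
  }
}
```

For the sample input, key "a" is counted 3 times and keys "b" and "c" once each. The original values (10, 20, ...) never affect the result, which is why the first step can throw them away.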
In class PairRDDFunctions:
/**
* TODO: counts how many elements each distinct key has
* Count the number of elements for each key, collecting the results to a local Map.
*
* @note This method should only be used if the resulting map is expected to be small, as
* the whole thing is loaded into the driver's memory.
* To handle very large results, consider using rdd.mapValues(_ => 1L).reduceByKey(_ + _), which
* returns an RDD[T, Long] instead of a map.
*/
def countByKey(): Map[K, Long] = self.withScope {
  // TODO: maps (key, value) => (key, 1), then calls reduceByKey; the collect call here triggers a job, which is why this is an action operator
  self.mapValues(_ => 1L).reduceByKey(_ + _).collect().toMap
}
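A usage sketch may help contrast the action with the alternative recommended in the @note. It assumes spark-core is on the classpath; the object name `CountByKeyUsage`, the `local[*]` master, the app name, and the sample data are illustrative choices, not part of the source above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CountByKeyUsage {
  // Action: runs a job right away and returns a local Map held in driver memory.
  def countOnDriver(rdd: RDD[(String, Int)]): scala.collection.Map[String, Long] =
    rdd.countByKey()

  // Transformations only: the result stays distributed and nothing runs
  // until some action (take, collect, saveAsTextFile, ...) is called on it.
  def countDistributed(rdd: RDD[(String, Int)]): RDD[(String, Long)] =
    rdd.mapValues(_ => 1L).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for this sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("countByKey-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(("a", 10), ("b", 20), ("a", 30)))
      println(countOnDriver(rdd))
      countDistributed(rdd).take(10).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

When the number of distinct keys is small, countByKey is the convenient choice. When it may be large, keeping the counts in an RDD avoids pulling the whole result into the driver.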