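Personal review notes

1. Inferring the RDD schema via reflection
To have Spark infer an RDD's schema via reflection, the usual first step is to define a case class: Spark reads the field names and types from the class and, with spark.implicits._ in scope, can implicitly convert an RDD of its instances into a DataFrame. Tuples are also Product types and get the same implicit conversion, which is what the snippet below relies on, passing the column names to toDF explicitly.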
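// Needed for .toDF on RDDs; assumes `spark` (SparkSession) and `sc` (SparkContext) already exist, as in spark-shell
import spark.implicits._
val scoreRdd = sc.textFile("data/data2.txt")
// Each line is expected to hold three space-separated fields: id, name, subject
val value = scoreRdd.map(_.split(" ")).map(x => (x(0), x(1), x(2))).toDF("id", "name", "subject")
// createOrReplaceTempView avoids an error if the view already exists when the snippet is rerun
value.createOrReplaceTempView("student")
spark.sql("select * from student").show()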
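The snippet above uses tuples, so here is a minimal sketch of the case class variant that the section text describes. The Student class, the object name, and the local[*] master are assumptions made for illustration; the input path and column names are taken from the notes. The case class is declared at the top level so that Spark can derive an encoder for it.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class; its field names become the DataFrame column names.
case class Student(id: String, name: String, subject: String)

object ReflectionSchemaExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for trying the example outside spark-shell.
    val spark = SparkSession.builder()
      .appName("ReflectionSchemaExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings the RDD-to-DataFrame conversion into scope

    val studentDF = spark.sparkContext
      .textFile("data/data2.txt")
      .map(_.split(" "))
      .filter(_.length >= 3)                  // skip lines with fewer than three fields
      .map(f => Student(f(0), f(1), f(2)))
      .toDF()                                 // schema is inferred from Student by reflection

    studentDF.printSchema()
    studentDF.createOrReplaceTempView("student")
    spark.sql("SELECT id, name, subject FROM student").show()

    spark.stop()
  }
}
```

With a case class, no column names need to be passed to toDF, because they come from the class fields.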
2. Explicitly defining the RDD schema
Build a StructType schema by hand and use it to convert an RDD of Row objects into a DataFrame.
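import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}
// Schema: three nullable string columns
val schema = StructType(Array(StructField("id", StringType, true),
  StructField("name", StringType, true), StructField("subject", StringType, true)))
// studentRDD is assumed to be an RDD[String] of space-separated lines, e.g. sc.textFile("data/data2.txt")
val rowRDD = studentRDD.map(_.split(" ")).map(attributes => Row(attributes(0), attributes(1), attributes(2)))
val peopleDF = spark.createDataFrame(rowRDD, schema)
peopleDF.createOrReplaceTempView("people")
// The schema has no age column, so select columns that exist
spark.sql("SELECT name, subject FROM people").show()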
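An explicit schema is mostly useful when the column names are only known at runtime. The following is a rough sketch of that case, not part of the original notes: the schemaString value, the object name, and the in-memory sample lines are all made up for illustration, and a local[*] master is assumed.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RuntimeSchemaExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for trying the example outside spark-shell.
    val spark = SparkSession.builder()
      .appName("RuntimeSchemaExample")
      .master("local[*]")
      .getOrCreate()

    // Column names arrive as a string at runtime (hard-coded here for illustration).
    val schemaString = "id name subject"
    val schema = StructType(
      schemaString.split(" ").map(name => StructField(name, StringType, nullable = true))
    )

    // Made-up sample lines standing in for the input file.
    val lines = spark.sparkContext.parallelize(Seq("1 Alice math", "2 Bob physics"))
    val rowRDD = lines.map(_.split(" ")).map(a => Row(a(0), a(1), a(2)))

    // Pair the Row RDD with the schema built above.
    val df = spark.createDataFrame(rowRDD, schema)
    df.printSchema()
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name, subject FROM people").show()

    spark.stop()
  }
}
```

Every column is declared as a nullable string here; a field that should be numeric would need both a different DataType in the schema and a matching conversion when the Row is built.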