Pipeline: file studentinfo --> local Linux --> HDFS --> Hive --> read with Spark
1. Upload the local file to Linux
cd into /dataset/ on the Linux host, run rz -E (from the lrzsz package), and pick the studentinfo file; it is uploaded into the current directory, /dataset/
2. Upload the studentinfo file from /dataset/ on Linux to HDFS
hdfs dfs -mkdir -p /dataset
hdfs dfs -put /dataset/studentinfo /dataset/
3. Create the Hive table student by running the following SQL in hive or beeline
CREATE DATABASE IF NOT EXISTS spark_integrition;
USE spark_integrition;
CREATE EXTERNAL TABLE student
(
name STRING,
age INT,
gpa STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/dataset/hive';
4. Load the HDFS data into the Hive table
LOAD DATA INPATH '/dataset/studentinfo' OVERWRITE INTO TABLE student;
Note: because INPATH (without LOCAL) points at an HDFS path, Hive moves the file into the table's LOCATION ('/dataset/hive') rather than copying it, so /dataset/studentinfo disappears from its original location afterwards.
5. Query the Hive table through Spark SQL: start spark-shell (Spark needs access to the Hive metastore, e.g. hive-site.xml placed in Spark's conf directory) and run
scala> spark.sql("use spark_integrition")
scala> val resultDF = spark.sql("select * from student limit 10")
scala> resultDF.show()
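The spark-shell session above can also be packaged as a small standalone application. The following is a minimal sketch, assuming a Spark build with Hive support and hive-site.xml on the classpath; the object name, app name, and use of getOrCreate defaults are illustrative choices, not part of the original steps:

```scala
import org.apache.spark.sql.SparkSession

object ReadStudent {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() makes the SparkSession talk to the Hive metastore,
    // which spark-shell enables automatically when Hive is configured
    val spark = SparkSession.builder()
      .appName("read-student")   // placeholder application name
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("USE spark_integrition")
    val resultDF = spark.sql("SELECT * FROM student LIMIT 10")
    resultDF.show()

    spark.stop()
  }
}
```

Submit it with spark-submit the same way as any other Spark application; the queries and database name match the spark-shell session above.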