[Big Data] Hive Basic Usage (1)

# Hive DDL Data Definition

1. Show Databases

hive (dyhtest)> show databases;
OK
database_name
default
dyhtest
Time taken: 0.022 seconds, Fetched: 2 row(s)

-- Filter the listed databases with a pattern (fuzzy match)
hive (dyhtest)> show databases like 'db_hive*';
OK
database_name
db_hive
db_hive_1
Time taken: 0.034 seconds, Fetched: 2 row(s)

2. Create a Database

Syntax:

CREATE DATABASE [IF NOT EXISTS] database_name  -- create only if the database does not already exist
[COMMENT database_comment]  -- comment describing the database
[LOCATION hdfs_path]  -- optional explicit HDFS path
[WITH DBPROPERTIES (property_name=property_value, ...)];  -- key-value properties attached to the database

Note:
If LOCATION is not specified, the default is used:

-- A database's default storage path on HDFS is /user/hive/warehouse/*.db

hive (dyhtest)> desc database db_hive;
OK
db_name	comment	location	owner_name	owner_type	parameters
db_hive		hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db	atdyh	USER	
Time taken: 0.027 seconds, Fetched: 1 row(s)
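
LOCATION can also be set explicitly at creation time. A minimal sketch (the database name db_hive2 and the path /db_hive2.db are hypothetical):

-- Create a database at an explicit HDFS path instead of the warehouse default
create database if not exists db_hive2 location '/db_hive2.db';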

-- Create a database
hive (dyhtest)>  create database if not exists mydb   
              > comment "my first db"
              > with dbproperties("createtime"="2021-04-24");
OK
Time taken: 0.077 seconds

-- Verify the database was created
hive (dyhtest)> show databases;
OK
database_name
db_hive
db_hive_1
default
dyhtest
mydb
Time taken: 0.021 seconds, Fetched: 5 row(s)

Note:
To avoid an error when the target database already exists, add an IF NOT EXISTS clause.

-- Fails: the database already exists
hive (dyhtest)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists

-- With IF NOT EXISTS, no error is raised
hive (dyhtest)> create database if not exists  db_hive;
OK
Time taken: 0.018 seconds
hive (dyhtest)> show databases;
OK
database_name
db_hive
db_hive_1
default
dyhtest
Time taken: 0.024 seconds, Fetched: 4 row(s)

3. View Database Details

  • Show database information
hive (dyhtest)> desc database db_hive;
OK
db_name	comment	location	owner_name	owner_type	parameters
db_hive		hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db	atdyh	USER	
Time taken: 0.027 seconds, Fetched: 1 row(s)

  • Show detailed database information with the EXTENDED keyword
hive (dyhtest)> desc database extended mydb ; 
OK
db_name	comment	location	owner_name	owner_type	parameters
mydb	my first db	hdfs://hadoop102:9820/user/hive/warehouse/mydb.db	atdyh	USER	{createtime=2021-04-24}
Time taken: 0.033 seconds, Fetched: 1 row(s)

Note: with the EXTENDED keyword, the properties added when the database was created are also displayed.

  • Switch the current database
hive (dyhtest)> use db_hive;
OK
Time taken: 0.026 seconds
-- The current database switches from dyhtest to db_hive
hive (db_hive)> 

Note:
1. Hive's commands for creating and inspecting databases ultimately just query the metastore database for the relevant information.

-- Connect to MySQL
[atdyh@hadoop102 ~]$ mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 83
Server version: 5.7.28 MySQL Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

-- Find and switch to the metastore database
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| metastore          |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.01 sec)

mysql> use metastore;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
-- List the tables in the metastore database
mysql> show tables;
+-------------------------------+
| Tables_in_metastore           |
+-------------------------------+
| AUX_TABLE                     |
| BUCKETING_COLS                |
| CDS                           |
| COLUMNS_V2                    |
| COMPACTION_QUEUE              |
| COMPLETED_COMPACTIONS         |
| COMPLETED_TXN_COMPONENTS      |
| CTLGS                         |
| DATABASE_PARAMS               |
| DBS                           |
| DB_PRIVS                      |
| DELEGATION_TOKENS             |
| FUNCS                         |
| FUNC_RU                       |
| GLOBAL_PRIVS                  |
| HIVE_LOCKS                    |
| IDXS                          |
| INDEX_PARAMS                  |
| I_SCHEMA                      |
| KEY_CONSTRAINTS               |
| MASTER_KEYS                   |
| MATERIALIZATION_REBUILD_LOCKS |
| METASTORE_DB_PROPERTIES       |
| MIN_HISTORY_LEVEL             |
| MV_CREATION_METADATA          |
| MV_TABLES_USED                |
| NEXT_COMPACTION_QUEUE_ID      |
| NEXT_LOCK_ID                  |
| NEXT_TXN_ID                   |
| NEXT_WRITE_ID                 |
| NOTIFICATION_LOG              |
| NOTIFICATION_SEQUENCE         |
| NUCLEUS_TABLES                |
| PARTITIONS                    |
| PARTITION_EVENTS              |
| PARTITION_KEYS                |
| PARTITION_KEY_VALS            |
| PARTITION_PARAMS              |
| PART_COL_PRIVS                |
| PART_COL_STATS                |
| PART_PRIVS                    |
| REPL_TXN_MAP                  |
| ROLES                         |
| ROLE_MAP                      |
| RUNTIME_STATS                 |
| SCHEMA_VERSION                |
| SDS                           |
| SD_PARAMS                     |
| SEQUENCE_TABLE                |
| SERDES                        |
| SERDE_PARAMS                  |
| SKEWED_COL_NAMES              |
| SKEWED_COL_VALUE_LOC_MAP      |
| SKEWED_STRING_LIST            |
| SKEWED_STRING_LIST_VALUES     |
| SKEWED_VALUES                 |
| SORT_COLS                     |
| TABLE_PARAMS                  |
| TAB_COL_STATS                 |
| TBLS                          |
| TBL_COL_PRIVS                 |
| TBL_PRIVS                     |
| TXNS                          |
| TXN_COMPONENTS                |
| TXN_TO_WRITE_ID               |
| TYPES                         |
| TYPE_FIELDS                   |
| VERSION                       |
| WM_MAPPING                    |
| WM_POOL                       |
| WM_POOL_TO_TRIGGER            |
| WM_RESOURCEPLAN               |
| WM_TRIGGER                    |
| WRITE_SET                     |
+-------------------------------+
74 rows in set (0.00 sec)
-- Inspect DBS, the table that stores database metadata
mysql> show create table DBS;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| DBS   | CREATE TABLE `DBS` (
  `DB_ID` bigint(20) NOT NULL,
  `DESC` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `DB_LOCATION_URI` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  `NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `OWNER_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `OWNER_TYPE` varchar(10) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `CTLG_NAME` varchar(256) NOT NULL DEFAULT 'hive',
  PRIMARY KEY (`DB_ID`),
  UNIQUE KEY `UNIQUE_DATABASE` (`NAME`,`CTLG_NAME`),
  KEY `CTLG_FK1` (`CTLG_NAME`),
  CONSTRAINT `CTLG_FK1` FOREIGN KEY (`CTLG_NAME`) REFERENCES `CTLGS` (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

-- View the table contents: one row per Hive database
-- Columns: id, database comment, location, database name, owner name, owner type, catalog
mysql> select * from DBS;
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
| DB_ID | DESC                  | DB_LOCATION_URI                                        | NAME      | OWNER_NAME | OWNER_TYPE | CTLG_NAME |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
|     1 | Default Hive database | hdfs://hadoop102:9820/user/hive/warehouse              | default   | public     | ROLE       | hive      |
|     6 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db   | dyhtest   | atdyh      | USER       | hive      |
|    11 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db   | db_hive   | atdyh      | USER       | hive      |
|    12 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/db_hive_1.db | db_hive_1 | atdyh      | USER       | hive      |
|    13 | my first db           | hdfs://hadoop102:9820/user/hive/warehouse/mydb.db      | mydb      | atdyh      | USER       | hive      |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
5 rows in set (0.00 sec)

4. Alter a Database

The ALTER DATABASE command sets key-value pairs in a database's DBPROPERTIES to describe the database's properties.

hive (db_hive)> alter database mydb set dbproperties("createtime"="2020-04-24","author"="wyh");
OK
Time taken: 0.098 seconds
-- Verify the change
hive (db_hive)> desc database extended mydb ; 
OK
db_name	comment	location	owner_name	owner_type	parameters
mydb	my first db	hdfs://hadoop102:9820/user/hive/warehouse/mydb.db	atdyh	USER	{createtime=2020-04-24, author=wyh}
Time taken: 0.034 seconds, Fetched: 1 row(s)

Under the hood this operation modifies the metadata, and the change is visible in the metastore (MySQL):


mysql> select * from DBS;
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
| DB_ID | DESC                  | DB_LOCATION_URI                                        | NAME      | OWNER_NAME | OWNER_TYPE | CTLG_NAME |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
|     1 | Default Hive database | hdfs://hadoop102:9820/user/hive/warehouse              | default   | public     | ROLE       | hive      |
|     6 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db   | dyhtest   | atdyh      | USER       | hive      |
|    11 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db   | db_hive   | atdyh      | USER       | hive      |
|    12 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/db_hive_1.db | db_hive_1 | atdyh      | USER       | hive      |
|    13 | my first db           | hdfs://hadoop102:9820/user/hive/warehouse/mydb.db      | mydb      | atdyh      | USER       | hive      |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
5 rows in set (0.00 sec)

-- The modified properties
mysql> select * from DATABASE_PARAMS;
+-------+------------+-------------+
| DB_ID | PARAM_KEY  | PARAM_VALUE |
+-------+------------+-------------+
|    13 | author     | wyh         |
|    13 | createtime | 2020-04-24  |
+-------+------------+-------------+
2 rows in set (0.00 sec)
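
DB_ID links the two tables. A sketch of a join (run in the same metastore MySQL session) that resolves each property back to its database name:

-- Join DBS to DATABASE_PARAMS on DB_ID to see which database owns each property
select d.NAME, p.PARAM_KEY, p.PARAM_VALUE
from DBS d join DATABASE_PARAMS p on d.DB_ID = p.DB_ID;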

5. Drop a Database

  • If the database might not exist, use IF EXISTS to avoid an error
hive (dyhtest)> drop database if  exists db_hive_1;
OK
Time taken: 0.026 seconds
  • If the database is not empty, add CASCADE to force the drop
-- db_hive is not empty
hive (db_hive)> show tables;
OK
tab_name
mytbl
Time taken: 0.032 seconds, Fetched: 1 row(s)
hive (db_hive)> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
-- Cascading drop
hive (db_hive)> drop database db_hive cascade ; 
OK
Time taken: 0.427 seconds
-- Switch to another database
hive (db_hive)> use dyhtest;
OK
Time taken: 0.027 seconds
-- db_hive is gone
hive (dyhtest)> show databases;
OK
database_name
default
dyhtest
mydb
Time taken: 0.019 seconds, Fetched: 3 row(s)
  • Drop an empty database
hive (dyhtest)> drop database db_hive_1;
OK
Time taken: 0.259 seconds

6. Create Tables

  • Create-table syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name   -- EXTERNAL: create an external table
[(col_name data_type [COMMENT col_comment], ...)]  -- column name, column type, column description
[COMMENT table_comment] -- table description
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] -- partition column name and type for a partitioned table
[CLUSTERED BY (col_name, col_name, ...) -- bucketing columns for a bucketed table
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]  -- number of buckets
[ROW FORMAT delimited fields terminated by ... ] -- delimiter between fields within a row
[collection items terminated by ... ] -- delimiter between elements of a collection
[map keys terminated by ... ] -- delimiter between a map's keys and values
[STORED AS file_format] -- file storage format, textfile by default
[LOCATION hdfs_path] -- the table's corresponding path on HDFS
[TBLPROPERTIES (property_name=property_value, ...)] -- table properties
[AS select_statement] -- create the table from a query result
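
A hedged sketch exercising more of the clauses above (the table name emp_demo and its columns are hypothetical): a partitioned table with complex types, showing how the three delimiters work together:

-- Partitioned table with an array column and a map column
create table if not exists emp_demo(
  id int comment 'employee id',
  hobbies array<string> comment 'list of hobbies',
  scores map<string,int> comment 'subject -> score'
)
partitioned by (dt string comment 'date partition')
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
stored as textfile;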
  • Create a table
-- Create a table
hive (dyhtest)> create table if not exists test2(
              >   id int comment "this's id ",
              >   name string  comment "this 's  name"
              > )
              > comment "测试用"
              > row format delimited fields terminated by ','
              > STORED as textfile 
              > TBLPROPERTIES("createtime"="2022-04-24") ;
OK
Time taken: 0.299 seconds
-- Verify the table was created
hive (dyhtest)> desc test2;
OK
col_name	data_type	comment
id                  	int                 	this's id           
name                	string              	this 's  name       
Time taken: 0.055 seconds, Fetched: 2 row(s)


  • View tables
    show tables;
    desc test2;
    desc formatted test2;
hive (dyhtest)> desc test2;
OK
col_name	data_type	comment
id                  	int                 	this's id           
name                	string              	this 's  name       
Time taken: 0.055 seconds, Fetched: 2 row(s)
hive (dyhtest)> desc formatted test2;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	this's id           
name                	string              	this 's  name       
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 15:38:29 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test2	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\",\"name\":\"true\"}}
	bucketing_version   	2                   
	comment             	???                 
	createtime          	2022-04-24          
	numFiles            	0                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	0                   
	transient_lastDdlTime	1655624309          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.163 seconds, Fetched: 35 row(s)
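
Besides desc formatted, show create table prints the full DDL that would recreate the table. (In the output above, the table comment appears as ??? because the CLI did not render the non-ASCII comment string.)

-- Print the complete CREATE TABLE statement for test2
show create table test2;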

Note that viewing a table also reads metadata from the metastore underneath:
1. First query TBLS and find the table's SD_ID

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
|      1 |  1654416053 |     6 |                0 | atdyh | USER       |         0 |     1 | mytbl    | MANAGED_TABLE | NULL               | NULL               |                    |
|      6 |  1654430751 |     6 |                0 | atdyh | USER       |         0 |     6 | test1    | MANAGED_TABLE | NULL               | NULL               |                    |
|      8 |  1654432371 |     6 |                0 | atdyh | USER       |         0 |     8 | test     | MANAGED_TABLE | NULL               | NULL               |                    |
|     12 |  1655624309 |     6 |                0 | atdyh | USER       |         0 |    12 | test2    | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
4 rows in set (0.01 sec)

2. Then query SDS, which holds the table's MapReduce input/output formats, location, and other storage details

mysql> select * from SDS;
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                             | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION                                                   | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
|     1 |     1 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/mytbl |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        1 |
|     6 |     6 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test1 |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        6 |
|     8 |     8 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test  |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        8 |
|    12 |    12 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test2 |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |       12 |
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
4 rows in set (0.00 sec)
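
The two lookups can be combined. A sketch of a join (in the metastore MySQL session) that resolves a table's storage descriptor in one query:

-- TBLS.SD_ID -> SDS.SD_ID gives each table's location and I/O formats
select t.TBL_NAME, s.LOCATION, s.INPUT_FORMAT, s.OUTPUT_FORMAT
from TBLS t join SDS s on t.SD_ID = s.SD_ID
where t.TBL_NAME = 'test2';
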
  • DML - data loading
    There are several ways to load data.
    1. The LOAD method
    load data local inpath '<directory>/<file>' into table <table_name>;
    For example:
    load data local inpath '/opt/module/hive-3.1.2/datas/testdata.txt' into table test2;
-- Prepare the data
[atdyh@hadoop102 datas]$ sudo vim testdata.txt

1001,zhangsan
1002,lisi
1003,wangwu
-- Load the data
hive (dyhtest)> load data local inpath '/opt/module/hive-3.1.2/datas/testdata.txt' into table test2;
Loading data to table dyhtest.test2
OK
Time taken: 0.538 seconds
hive (dyhtest)> select * from test2;
OK
test2.id	test2.name
1001	zhangsan
1002	lisi
1003	wangwu
Time taken: 0.133 seconds, Fetched: 3 row(s)
                     

You can see that the test2 directory now contains the data we just loaded with LOAD.
This suggests the next method: put the prepared data directly into the table's directory.
2. Upload the data directly to the table's path on HDFS

-- Upload the prepared data to HDFS
[atdyh@hadoop102 datas]$ hadoop fs -put testdata.txt  /user/hive/warehouse/dyhtest.db/test2
2022-06-19 16:44:22,471 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[atdyh@hadoop102 datas]$ 

-- Query the data
hive (dyhtest)> select * from test2;
OK
test2.id	test2.name
1001	zhangsan
1002	lisi
1003	wangwu
Time taken: 0.197 seconds, Fetched: 3 row(s)
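
LOAD also works directly from HDFS: without the LOCAL keyword, the file is moved (not copied) from its HDFS path into the table directory. A minimal sketch, assuming testdata.txt has been uploaded to /tmp on HDFS; OVERWRITE replaces the table's existing files instead of appending:

-- Move a file that is already on HDFS into the table
load data inpath '/tmp/testdata.txt' into table test2;
-- Replace the table's current contents instead of appending
load data local inpath '/opt/module/hive-3.1.2/datas/testdata.txt' overwrite into table test2;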

Given this, there is yet another way: specify a LOCATION when creating the table, so the data can be queried as soon as the table exists.
3. Upload the data to HDFS first, then specify LOCATION at table creation

-- Create a directory on HDFS
[atdyh@hadoop102 datas]$ hadoop fs -mkdir /mydata
-- Upload the data
[atdyh@hadoop102 datas]$ hadoop fs -put testdata.txt  /mydata
2022-06-19 16:51:10,787 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[atdyh@hadoop102 datas]$ 

-- Create the table
hive (dyhtest)>  create table if not exists test3(
              >   id int ,
              >   name string 
              > )
              > row format delimited fields terminated by ','
              > location "/mydata" ;
OK
Time taken: 0.14 seconds

-- Query the data
hive (dyhtest)> select * from test3;
OK
test3.id	test3.name
1001	zhangsan
1002	lisi
1003	wangwu
Time taken: 0.144 seconds, Fetched: 3 row(s)
hive (dyhtest)> 

  • Table types
    1. Managed (internal) tables: created without the EXTERNAL keyword
hive (dyhtest)> create table if not exists test4(
              >   id int ,
              >   name string 
              > )
              > row format delimited fields terminated by ',' ;

hive (dyhtest)> desc formatted test4;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 17:15:42 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test4	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	bucketing_version   	2                   
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	36                  
	transient_lastDdlTime	1655630433          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.07 seconds, Fetched: 32 row(s)


desc formatted shows test4's type:
Table Type: MANAGED_TABLE
2. External tables

hive (dyhtest)> create external table if not exists test5(
              >   id int ,
              >   name string 
              > )
              > row format delimited fields terminated by ',' ;
hive (dyhtest)> desc formatted test5;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 17:15:55 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test5	 
Table Type:         	EXTERNAL_TABLE      	 
Table Parameters:	 	 
	EXTERNAL            	TRUE                
	bucketing_version   	2                   
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	36                  
	transient_lastDdlTime	1655630436          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.07 seconds, Fetched: 33 row(s)
              

desc formatted shows test5's type:
Table Type: EXTERNAL_TABLE

  1. Converting between managed and external tables
    a. Managed table to external table
    Syntax:
    alter table <table_name> set tblproperties('EXTERNAL' = 'TRUE');
    The value inside the parentheses must be uppercase.
hive (dyhtest)> alter table test4 set tblproperties('EXTERNAL' = 'TRUE');
OK
Time taken: 0.108 seconds
hive (dyhtest)> desc formatted test4;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 17:15:42 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test4	 
Table Type:         	EXTERNAL_TABLE      	 
Table Parameters:	 	 
	EXTERNAL            	TRUE                
	bucketing_version   	2                   
	last_modified_by    	atdyh               
	last_modified_time  	1655630844          
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	36                  
	transient_lastDdlTime	1655630844          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.075 seconds, Fetched: 35 row(s)

test4 has been converted from a managed table to an external table:
Table Type: EXTERNAL_TABLE
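
As the syntax note above says, the property value is case-sensitive: anything other than uppercase 'TRUE'/'FALSE' is stored as an ordinary property and does not toggle the table type. A minimal sketch:

-- Lowercase 'true' merely sets a literal property; it does not convert the table
alter table test4 set tblproperties('EXTERNAL' = 'true');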

b. External table to managed table

hive (dyhtest)> alter table test5 set tblproperties ('EXTERNAL'='FALSE');
OK
Time taken: 0.094 seconds
hive (dyhtest)> desc formatted test5;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 17:15:55 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test5	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	EXTERNAL            	FALSE               
	bucketing_version   	2                   
	last_modified_by    	atdyh               
	last_modified_time  	1655631058          
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	36                  
	transient_lastDdlTime	1655631058          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.063 seconds, Fetched: 35 row(s)

test5 has been converted from an external table to a managed table:
Table Type: MANAGED_TABLE

Note:
1. To delete both the table and its data:
a. External table: first convert it to a managed table, then drop it
alter table test5 set tblproperties('EXTERNAL'='FALSE');
drop table test5;
b. Managed table: drop it directly
drop table test5;
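
The reason for the conversion step: dropping a managed table removes both the metadata and the HDFS data, while dropping an external table removes only the metadata. A sketch, assuming test5 were still external:

-- Drops only the metastore entry; the files under the table's LOCATION remain on HDFS
drop table test5;
-- The data directory would still exist and could be checked with:
-- hadoop fs -ls /user/hive/warehouse/dyhtest.db/test5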

  • Alter tables
    1. Prepare the data
[atdyh@hadoop102 datas]$ cat emptest.txt 
1001    zhangsan	10000.1
1002	lisi	10000.2
1003	wangwu	10000.3
[atdyh@hadoop102 datas]$ 

2. Create the table and load the data

-- Create the table
hive (dyhtest)> create table emp(
              >   id int , 
              >   name string, 
              >   salary double  
              > ) 
              > row format delimited fields terminated by '\t';  
OK
Time taken: 0.451 seconds
hive (dyhtest)> show tables;
OK
tab_name
emp
mytbl
test
test1
test2
test3
test4
test5
Time taken: 0.061 seconds, Fetched: 8 row(s)

-- Load the data
hive (dyhtest)> load data local inpath '/opt/module/hive-3.1.2/datas/emptest.txt' into table emp;
Loading data to table dyhtest.emp
OK
Time taken: 0.394 seconds

3. Rename a table
Syntax:
alter table <old_table_name> rename to <new_table_name>;

hive (dyhtest)> alter table emp rename to emptest;
OK
Time taken: 0.224 seconds
hive (dyhtest)> show tables;
OK
tab_name
emptest
mytbl
test
test1
test2
test3
test4
test5
Time taken: 0.045 seconds, Fetched: 8 row(s)

4. Column operations
Syntax:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
alter table <table_name> change <old_col_name> <new_col_name> <column_type>;
a. Rename a column

-- Rename the column
hive (dyhtest)>  alter table emptest change column salary sal double ;
OK
Time taken: 0.167 seconds
-- View the result
hive (dyhtest)> show  create table emptest;
OK
createtab_stmt
CREATE TABLE `emptest`(
  `id` int, 
  `name` string, 
  `sal` double)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'='\t', 
  'serialization.format'='\t') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/emptest'
TBLPROPERTIES (
  'bucketing_version'='2', 
  'last_modified_by'='atdyh', 
  'last_modified_time'='1655645129', 
  'transient_lastDdlTime'='1655645129')
Time taken: 0.052 seconds, Fetched: 20 row(s)
		

Note:
1. When a column change also changes the type, the new type must be able to hold the old one. For example, a double column cannot be changed to float; Hive reports that the change is not allowed.
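
A sketch of the failing case. The rejection comes from the metastore's type-compatibility check, which (to my understanding) is controlled by the hive.metastore.disallow.incompatible.col.type.changes setting, true by default:

-- Fails: float cannot safely hold every double value
alter table emptest change column sal sal float;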

b. Add and replace columns
Syntax:
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

alter table <table_name> add|replace columns (<col_name> <data_type> [comment]);

-- Add columns
hive (dyhtest)>  alter table emptest add columns (addr string, deptno int );
OK
Time taken: 0.132 seconds
-- Verify
hive (dyhtest)> select * from emptest;
OK
emptest.id	emptest.name	emptest.sal	emptest.addr	emptest.deptno
NULL	10000.1	NULL	NULL	NULL
1002	lisi	10000.2	NULL	NULL
1003	wangwu	10000.3	NULL	NULL
Time taken: 0.274 seconds, Fetched: 3 row(s)

(The first row's id is NULL and its name is 10000.1 because the first line of emptest.txt separates 1001 and zhangsan with spaces rather than a tab, so the fields do not split as expected.)

Replace columns

-- Replace the columns
hive (dyhtest)> alter table emptest replace columns (empid int, empname string);
OK
Time taken: 0.114 seconds
-- View the data
hive (dyhtest)> select * from emptest;
OK
emptest.empid	emptest.empname
NULL	10000.1
1002	lisi
1003	wangwu
Time taken: 0.149 seconds, Fetched: 3 row(s)

Note: ADD appends a new column after all existing columns (but before any partition columns), while REPLACE replaces the table's entire column list.
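
Since Hive has no direct DROP COLUMN, REPLACE COLUMNS is the usual way to remove one: list every column except the one to drop. A sketch (this changes metadata only; the underlying files are untouched):

-- Effectively drops empname by replacing the column list without it
alter table emptest replace columns (empid int);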
