# Hive DDL (Data Definition)
1. Show databases
hive (dyhtest)> show databases;
OK
database_name
default
dyhtest
Time taken: 0.022 seconds, Fetched: 2 row(s)
--- Filter the databases listed, using pattern matching
hive (dyhtest)> show databases like 'db_hive*';
OK
database_name
db_hive
db_hive_1
Time taken: 0.034 seconds, Fetched: 2 row(s)
2. Create a database
Syntax:
CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
Note: when LOCATION is not specified, a default is used:
--- When you create a database, its default storage path on HDFS is /user/hive/warehouse/*.db.
hive (dyhtest)> desc database db_hive;
OK
db_name comment location owner_name owner_type parameters
db_hive hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db atdyh USER
Time taken: 0.027 seconds, Fetched: 1 row(s)
-- Create a database
hive (dyhtest)> create database if not exists mydb
> comment "my first db"
> with dbproperties("createtime"="2021-04-24");
OK
Time taken: 0.077 seconds
-- Check that it was created
hive (dyhtest)> show databases;
OK
database_name
db_hive
db_hive_1
default
dyhtest
mydb
Time taken: 0.021 seconds, Fetched: 5 row(s)
Note: add IF NOT EXISTS to avoid an error when the database to create already exists.
-- The database already exists
hive (dyhtest)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
--- With IF NOT EXISTS, no error is raised
hive (dyhtest)> create database if not exists db_hive;
OK
Time taken: 0.018 seconds
hive (dyhtest)> show databases;
OK
database_name
db_hive
db_hive_1
default
dyhtest
Time taken: 0.024 seconds, Fetched: 4 row(s)
3. Show database details
hive (dyhtest)> desc database db_hive;
OK
db_name comment location owner_name owner_type parameters
db_hive hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db atdyh USER
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive (dyhtest)> desc database extended mydb ;
OK
db_name comment location owner_name owner_type parameters
mydb my first db hdfs://hadoop102:9820/user/hive/warehouse/mydb.db atdyh USER {createtime=2021-04-24}
Time taken: 0.033 seconds, Fetched: 1 row(s)
Note: with the extended keyword, the properties supplied when the database was created are also displayed.
hive (dyhtest)> use db_hive;
OK
Time taken: 0.026 seconds
-- The current database switches from dyhtest to db_hive
hive (db_hive)>
Note: Hive's commands for creating and viewing databases simply query the metastore for the relevant information.
-- Connect to MySQL
[atdyh@hadoop102 ~]$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 83
Server version: 5.7.28 MySQL Community Server (GPL)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
-- Switch to the metastore database
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| metastore |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.01 sec)
mysql> use metastore;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
--- List the tables in the metastore database
mysql> show tables;
+-------------------------------+
| Tables_in_metastore |
+-------------------------------+
| AUX_TABLE |
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| COMPACTION_QUEUE |
| COMPLETED_COMPACTIONS |
| COMPLETED_TXN_COMPONENTS |
| CTLGS |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| DELEGATION_TOKENS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| HIVE_LOCKS |
| IDXS |
| INDEX_PARAMS |
| I_SCHEMA |
| KEY_CONSTRAINTS |
| MASTER_KEYS |
| MATERIALIZATION_REBUILD_LOCKS |
| METASTORE_DB_PROPERTIES |
| MIN_HISTORY_LEVEL |
| MV_CREATION_METADATA |
| MV_TABLES_USED |
| NEXT_COMPACTION_QUEUE_ID |
| NEXT_LOCK_ID |
| NEXT_TXN_ID |
| NEXT_WRITE_ID |
| NOTIFICATION_LOG |
| NOTIFICATION_SEQUENCE |
| NUCLEUS_TABLES |
| PARTITIONS |
| PARTITION_EVENTS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_COL_STATS |
| PART_PRIVS |
| REPL_TXN_MAP |
| ROLES |
| ROLE_MAP |
| RUNTIME_STATS |
| SCHEMA_VERSION |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TXNS |
| TXN_COMPONENTS |
| TXN_TO_WRITE_ID |
| TYPES |
| TYPE_FIELDS |
| VERSION |
| WM_MAPPING |
| WM_POOL |
| WM_POOL_TO_TRIGGER |
| WM_RESOURCEPLAN |
| WM_TRIGGER |
| WRITE_SET |
+-------------------------------+
74 rows in set (0.00 sec)
--- Look at DBS, the table that stores database metadata
mysql> show create table DBS;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| DBS | CREATE TABLE `DBS` (
`DB_ID` bigint(20) NOT NULL,
`DESC` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`DB_LOCATION_URI` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
`NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`OWNER_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`OWNER_TYPE` varchar(10) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`CTLG_NAME` varchar(256) NOT NULL DEFAULT 'hive',
PRIMARY KEY (`DB_ID`),
UNIQUE KEY `UNIQUE_DATABASE` (`NAME`,`CTLG_NAME`),
KEY `CTLG_FK1` (`CTLG_NAME`),
CONSTRAINT `CTLG_FK1` FOREIGN KEY (`CTLG_NAME`) REFERENCES `CTLGS` (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
--- Its rows hold the Hive databases' information
-- id, database comment, location, database name, owner name, owner type, catalog
mysql> select * from DBS;
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE | CTLG_NAME |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
| 1 | Default Hive database | hdfs://hadoop102:9820/user/hive/warehouse | default | public | ROLE | hive |
| 6 | NULL | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db | dyhtest | atdyh | USER | hive |
| 11 | NULL | hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db | db_hive | atdyh | USER | hive |
| 12 | NULL | hdfs://hadoop102:9820/user/hive/warehouse/db_hive_1.db | db_hive_1 | atdyh | USER | hive |
| 13 | my first db | hdfs://hadoop102:9820/user/hive/warehouse/mydb.db | mydb | atdyh | USER | hive |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
5 rows in set (0.00 sec)
4. Alter a database
With the ALTER DATABASE command, users can set key-value pairs in a database's DBPROPERTIES to describe the database.
hive (db_hive)> alter database mydb set dbproperties("createtime"="2020-04-24","author"="wyh");
OK
Time taken: 0.098 seconds
-- Check that the change took effect
hive (db_hive)> desc database extended mydb ;
OK
db_name comment location owner_name owner_type parameters
mydb my first db hdfs://hadoop102:9820/user/hive/warehouse/mydb.db atdyh USER {createtime=2020-04-24, author=wyh}
Time taken: 0.034 seconds, Fetched: 1 row(s)
Under the hood, this modifies the metadata; the change can be seen in the metastore (MySQL):
mysql> select * from DBS;
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE | CTLG_NAME |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
| 1 | Default Hive database | hdfs://hadoop102:9820/user/hive/warehouse | default | public | ROLE | hive |
| 6 | NULL | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db | dyhtest | atdyh | USER | hive |
| 11 | NULL | hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db | db_hive | atdyh | USER | hive |
| 12 | NULL | hdfs://hadoop102:9820/user/hive/warehouse/db_hive_1.db | db_hive_1 | atdyh | USER | hive |
| 13 | my first db | hdfs://hadoop102:9820/user/hive/warehouse/mydb.db | mydb | atdyh | USER | hive |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
5 rows in set (0.00 sec)
-- The modified properties
mysql> select * from DATABASE_PARAMS;
+-------+------------+-------------+
| DB_ID | PARAM_KEY | PARAM_VALUE |
+-------+------------+-------------+
| 13 | author | wyh |
| 13 | createtime | 2020-04-24 |
+-------+------------+-------------+
2 rows in set (0.00 sec)
5. Drop a database
- If the database to drop may not exist, use IF EXISTS to avoid an error
hive (dyhtest)> drop database if exists db_hive_1;
OK
Time taken: 0.026 seconds
- If the database is not empty, add CASCADE to force the drop
-- db_hive is not empty
hive (db_hive)> show tables;
OK
tab_name
mytbl
Time taken: 0.032 seconds, Fetched: 1 row(s)
hive (db_hive)> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
-- Cascade drop
hive (db_hive)> drop database db_hive cascade ;
OK
Time taken: 0.427 seconds
-- Switch back to dyhtest
hive (db_hive)> use dyhtest;
OK
Time taken: 0.027 seconds
-- db_hive is gone
hive (dyhtest)> show databases;
OK
database_name
default
dyhtest
mydb
Time taken: 0.019 seconds, Fetched: 3 row(s)
6. Create tables
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name -- EXTERNAL: external table
[(col_name data_type [COMMENT col_comment], ...)] -- column name, type, comment, ...
[COMMENT table_comment] -- table comment
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] -- partition columns for a partitioned table
[CLUSTERED BY (col_name, col_name, ...) -- bucketing columns for a bucketed table
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] -- number of buckets
[ROW FORMAT delimited fields terminated by ... ] -- delimiter between fields within a row
[collection items terminated by ... ] -- delimiter between collection elements
[map keys terminated by ... ] -- delimiter between a map's keys and values
[STORED AS file_format] -- file storage format, textfile by default
[LOCATION hdfs_path] -- the table's path on HDFS
[TBLPROPERTIES (property_name=property_value, ...)] -- table properties
[AS select_statement] -- create the table from a query
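The collection and map delimiters in the template above only come into play with complex column types. A minimal sketch (the table and column names here are made up for illustration, not part of the walkthrough):

```sql
-- Hypothetical table with complex types; one text row might look like:
--   songsong,bingbing_lili,xiaosong:18_xiaoxiao:19
create table if not exists person_info(
    name string,
    friends array<string>,        -- elements separated by '_'
    children map<string, int>     -- entries separated by '_', key and value by ':'
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
stored as textfile;
```

Each delimiter clause maps one level of nesting onto the flat text file.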
-- Create a table
hive (dyhtest)> create table if not exists test2(
> id int comment "this's id ",
> name string comment "this 's name"
> )
> comment "测试用"
> row format delimited fields terminated by ','
> STORED as textfile
> TBLPROPERTIES("createtime"="2022-04-24") ;
OK
Time taken: 0.299 seconds
-- Check that the table was created
hive (dyhtest)> desc test2;
OK
col_name data_type comment
id int this's id
name string this 's name
Time taken: 0.055 seconds, Fetched: 2 row(s)
- Viewing tables: show tables; desc test2; desc formatted test2;
hive (dyhtest)> desc formatted test2;
OK
col_name data_type comment
# col_name data_type comment
id int this's id
name string this 's name
# Detailed Table Information
Database: dyhtest
OwnerType: USER
Owner: atdyh
CreateTime: Sun Jun 19 15:38:29 CST 2022
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test2
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\",\"name\":\"true\"}}
bucketing_version 2
comment ???
createtime 2022-04-24
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1655624309
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.163 seconds, Fetched: 35 row(s)
Note: describing a table also reads the metastore under the hood: 1. First look in TBLS to find the SD_ID
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| 1 | 1654416053 | 6 | 0 | atdyh | USER | 0 | 1 | mytbl | MANAGED_TABLE | NULL | NULL | |
| 6 | 1654430751 | 6 | 0 | atdyh | USER | 0 | 6 | test1 | MANAGED_TABLE | NULL | NULL | |
| 8 | 1654432371 | 6 | 0 | atdyh | USER | 0 | 8 | test | MANAGED_TABLE | NULL | NULL | |
| 12 | 1655624309 | 6 | 0 | atdyh | USER | 0 | 12 | test2 | MANAGED_TABLE | NULL | NULL | |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
4 rows in set (0.01 sec)
2. Then look in SDS, which holds the table's MR input/output formats, location, and other storage details
mysql> select * from SDS;
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| 1 | 1 | org.apache.hadoop.mapred.TextInputFormat | | | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/mytbl | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 1 |
| 6 | 6 | org.apache.hadoop.mapred.TextInputFormat | | | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test1 | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 6 |
| 8 | 8 | org.apache.hadoop.mapred.TextInputFormat | | | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 8 |
| 12 | 12 | org.apache.hadoop.mapred.TextInputFormat | | | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test2 | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 12 |
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
4 rows in set (0.00 sec)
- DML: loading data
There are several ways to load data. 1. The load command: load data local inpath '<dir>/<file>' into table <table>; for example: load data local inpath '/opt/module/hive-3.1.2/datas/testdata.txt' into table test2;
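For reference, the full shape of the load statement (the OVERWRITE and PARTITION clauses are not used in this walkthrough):

```sql
-- LOCAL reads from the client filesystem; without it, 'filepath' is an HDFS
-- path and the files are moved, not copied.
-- OVERWRITE replaces the table's existing data instead of appending to it.
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2, ...)];
```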
--- Prepare the data
[atdyh@hadoop102 datas]$ sudo vim testdata.txt
1001,zhangsan
1002,lisi
1003,wangwu
-- Load the data
hive (dyhtest)> load data local inpath '/opt/module/hive-3.1.2/datas/testdata.txt' into table test2;
Loading data to table dyhtest.test2
OK
Time taken: 0.538 seconds
hive (dyhtest)> select * from test2;
OK
test2.id test2.name
1001 zhangsan
1002 lisi
1003 wangwu
Time taken: 0.133 seconds, Fetched: 3 row(s)
The directory for test2 now contains the data we just loaded. This suggests the next approach: put a prepared file directly into the table's directory. 2. Upload the data straight to the table's path on HDFS
-- Upload the prepared file to HDFS
[atdyh@hadoop102 datas]$ hadoop fs -put testdata.txt /user/hive/warehouse/dyhtest.db/test2
2022-06-19 16:44:22,471 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[atdyh@hadoop102 datas]$
-- Query the data
hive (dyhtest)> select * from test2;
OK
test2.id test2.name
1001 zhangsan
1002 lisi
1003 wangwu
Time taken: 0.197 seconds, Fetched: 3 row(s)
By the same logic there is one more approach: specify a location when creating the table, and the table can be queried as soon as it is created. 3. Upload the data to HDFS first, then set location when creating the table
-- Create a directory on HDFS
[atdyh@hadoop102 datas]$ hadoop fs -mkdir /mydata
-- Upload the data
[atdyh@hadoop102 datas]$ hadoop fs -put testdata.txt /mydata
2022-06-19 16:51:10,787 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[atdyh@hadoop102 datas]$
-- Create the table
hive (dyhtest)> create table if not exists test3(
> id int ,
> name string
> )
> row format delimited fields terminated by ','
> location "/mydata" ;
OK
Time taken: 0.14 seconds
-- Query the data
hive (dyhtest)> select * from test3;
OK
test3.id test3.name
1001 zhangsan
1002 lisi
1003 wangwu
Time taken: 0.144 seconds, Fetched: 3 row(s)
hive (dyhtest)>
- Table types
1. Managed (internal) tables: created without the external keyword
hive (dyhtest)> create table if not exists test4(
> id int ,
> name string
> )
> row format delimited fields terminated by ',' ;
hive (dyhtest)> desc formatted test4;
OK
col_name data_type comment
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: dyhtest
OwnerType: USER
Owner: atdyh
CreateTime: Sun Jun 19 17:15:42 CST 2022
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test4
Table Type: MANAGED_TABLE
Table Parameters:
bucketing_version 2
numFiles 1
numRows 0
rawDataSize 0
totalSize 36
transient_lastDdlTime 1655630433
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.07 seconds, Fetched: 32 row(s)
desc formatted shows test4's type: Table Type: MANAGED_TABLE. 2. External tables
hive (dyhtest)> create external table if not exists test5(
> id int ,
> name string
> )
> row format delimited fields terminated by ',' ;
hive (dyhtest)> desc formatted test5;
OK
col_name data_type comment
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: dyhtest
OwnerType: USER
Owner: atdyh
CreateTime: Sun Jun 19 17:15:55 CST 2022
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test5
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
bucketing_version 2
numFiles 1
numRows 0
rawDataSize 0
totalSize 36
transient_lastDdlTime 1655630436
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.07 seconds, Fetched: 33 row(s)
desc formatted shows test5's type: Table Type: EXTERNAL_TABLE
- Converting between managed and external tables
a. Managed to external. Syntax: alter table <table> set tblproperties('EXTERNAL' = 'TRUE'); the property name and value must be uppercase.
hive (dyhtest)> alter table test4 set tblproperties('EXTERNAL' = 'TRUE');
OK
Time taken: 0.108 seconds
hive (dyhtest)> desc formatted test4;
OK
col_name data_type comment
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: dyhtest
OwnerType: USER
Owner: atdyh
CreateTime: Sun Jun 19 17:15:42 CST 2022
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test4
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
bucketing_version 2
last_modified_by atdyh
last_modified_time 1655630844
numFiles 1
numRows 0
rawDataSize 0
totalSize 36
transient_lastDdlTime 1655630844
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.075 seconds, Fetched: 35 row(s)
test4 has changed from a managed table to an external one: Table Type: EXTERNAL_TABLE
b. External to managed
hive (dyhtest)> alter table test5 set tblproperties ('EXTERNAL'='FALSE');
OK
Time taken: 0.094 seconds
hive (dyhtest)> desc formatted test5;
OK
col_name data_type comment
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: dyhtest
OwnerType: USER
Owner: atdyh
CreateTime: Sun Jun 19 17:15:55 CST 2022
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test5
Table Type: MANAGED_TABLE
Table Parameters:
EXTERNAL FALSE
bucketing_version 2
last_modified_by atdyh
last_modified_time 1655631058
numFiles 1
numRows 0
rawDataSize 0
totalSize 36
transient_lastDdlTime 1655631058
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.063 seconds, Fetched: 35 row(s)
test5 has changed from an external table back to a managed one: Table Type: MANAGED_TABLE
Note: to delete both a table and its data: a. External table: first convert it to a managed table, then drop it: alter table test5 set tblproperties('EXTERNAL'='FALSE'); drop table test5; b. Managed table: drop it directly: drop table test5;
1. Prepare the data
[atdyh@hadoop102 datas]$ cat emptest.txt
1001 zhangsan 10000.1
1002 lisi 10000.2
1003 wangwu 10000.3
[atdyh@hadoop102 datas]$
2. Create a table and load data
-- Create the table
hive (dyhtest)> create table emp(
> id int ,
> name string,
> salary double
> )
> row format delimited fields terminated by '\t';
OK
Time taken: 0.451 seconds
hive (dyhtest)> show tables;
OK
tab_name
emp
mytbl
test
test1
test2
test3
test4
test5
Time taken: 0.061 seconds, Fetched: 8 row(s)
-- Load the data
hive (dyhtest)> load data local inpath '/opt/module/hive-3.1.2/datas/emptest.txt' into table emp;
Loading data to table dyhtest.emp
OK
Time taken: 0.394 seconds
3. Rename a table. Syntax: alter table <old_name> rename to <new_name>;
hive (dyhtest)> alter table emp rename to emptest;
OK
Time taken: 0.224 seconds
hive (dyhtest)> show tables;
OK
tab_name
emptest
mytbl
test
test1
test2
test3
test4
test5
Time taken: 0.045 seconds, Fetched: 8 row(s)
4. Column operations. Syntax: ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name], i.e. alter table <table> change <old_col> <new_col> <type>. a. Rename a column
-- Rename the column
hive (dyhtest)> alter table emptest change column salary sal double ;
OK
Time taken: 0.167 seconds
-- Check the result
hive (dyhtest)> show create table emptest;
OK
createtab_stmt
CREATE TABLE `emptest`(
`id` int,
`name` string,
`sal` double)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/emptest'
TBLPROPERTIES (
'bucketing_version'='2',
'last_modified_by'='atdyh',
'last_modified_time'='1655645129',
'transient_lastDdlTime'='1655645129')
Time taken: 0.052 seconds, Fetched: 20 row(s)
Note: 1. When a change involves the column's type, the new type must be at least as wide as the original. For example, a double column cannot be changed to float; the change will be rejected.
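A sketch of the rule above, against the emptest table (the exact error text varies; whether narrowing is blocked is governed by the metastore setting hive.metastore.disallow.incompatible.col.type.changes, which defaults to true):

```sql
-- Narrowing double -> float is an incompatible change and is rejected:
--   alter table emptest change column sal sal float;
--   FAILED: Execution Error ... Unable to alter table.

-- Widening int -> bigint is compatible and succeeds:
alter table emptest change column id id bigint;
```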
b. Add and replace columns. Syntax: ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
i.e. alter table <table> add | replace columns (<col_name> <type> <comment>)
-- Add columns
hive (dyhtest)> alter table emptest add columns (addr string, deptno int );
OK
Time taken: 0.132 seconds
-- Check the result
hive (dyhtest)> select * from emptest;
OK
emptest.id emptest.name emptest.sal emptest.addr emptest.deptno
NULL 10000.1 NULL NULL NULL
1002 lisi 10000.2 NULL NULL
1003 wangwu 10000.3 NULL NULL
Time taken: 0.274 seconds, Fetched: 3 row(s)
Replace columns
-- Replace all columns
hive (dyhtest)> alter table emptest replace columns (empid int, empname string);
OK
Time taken: 0.114 seconds
-- Query the data
hive (dyhtest)> select * from emptest;
OK
emptest.empid emptest.empname
NULL 10000.1
1002 lisi
1003 wangwu
Time taken: 0.149 seconds, Fetched: 3 row(s)
Note: ADD appends new fields after all existing columns (but before any partition columns), while REPLACE replaces all of the table's fields.
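Because Hive has no ALTER TABLE ... DROP COLUMN, REPLACE COLUMNS is the usual way to remove a column: list every column except the one to drop. A sketch on a hypothetical table t(id int, name string, tmp string):

```sql
-- Drop the trailing tmp column by omitting it from the new column list.
-- Only the metastore schema changes; the data files are left as-is, so the
-- dropped trailing field simply stops being read.
alter table t replace columns (id int, name string);
```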