数仓搭建的过程中,需要将业务数据导入到hive中,但是业务表较多,建表成为一个效率瓶颈。可以利用mysql的元数据库INFORMATION_SCHEMA ,其中COLUMNS和 TABLES这两张表记录了我们需要的表名,列名,comemnt等信息。
下面是生产中实际用到的一个案例sql,在使用时可以根据自己公司的规范进行修改: 可以往where table name中添加多张表,实现批量创建
SELECT CONCAT('create table if not exists ods.ods_',schema_table, ' (',column_info,
') comment \'', TABLE_COMMENT, '\' partitioned by (p_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY \'\\001\' stored as orc',
' location \'hdfs://emr-cluster/user/hive/warehouse/fle_ods.db/ods_',schema_table, '\' TBLPROPERTIES ("orc.compress" = "snappy");')
FROM (SELECT TABLE_NAME,
table_schema,
schema_new,
concat(schema_new,'_',TABLE_NAME,'','_di') as schema_table,
TABLE_COMMENT,
group_concat(CONCAT(COLUMN_NAME, ' ', DATA_TYPE, ' comment ', '\'', COLUMN_COMMENT, '\'')) AS column_info
FROM (SELECT t1.TABLE_NAME,
t1.table_schema,
case when t1.table_schema ='test' then 'other_test' when t1.table_schema='order_test' then 'ods_order' else '??' end as schema_new,
t2.TABLE_COMMENT,
COLUMN_NAME,
CASE
WHEN DATA_TYPE = 'varchar' THEN 'string'
WHEN DATA_TYPE = 'int' THEN 'string'
WHEN DATA_TYPE = 'tinyint' THEN 'string'
WHEN DATA_TYPE = 'decimal' THEN 'string'
WHEN DATA_TYPE = 'datetime' THEN 'string'
WHEN DATA_TYPE = 'timestamp' THEN 'string'
WHEN DATA_TYPE = 'float' THEN 'string'
WHEN DATA_TYPE = 'double' THEN 'string'
when DATA_TYPE = 'bigint' THEN 'string'
WHEN DATA_TYPE = 'smallint' THEN 'string'
WHEN DATA_TYPE = 'tinyint' THEN 'string'
WHEN DATA_TYPE = 'date' THEN 'string'
WHEN DATA_TYPE = 'boolean' THEN 'string'
WHEN DATA_TYPE = 'time' THEN 'string'END AS DATA_TYPE,
COLUMN_COMMENT
FROM COLUMNS t1
JOIN TABLES t2 ON t1.TABLE_NAME = t2.TABLE_NAME
WHERE t1.TABLE_NAME in ('test_table_1','table_2')
) t3
GROUP BY TABLE_NAME, TABLE_COMMENT
) t4
|