Hive Lateral View、视图与索引

网友投稿 252 2022-11-16

Hive Lateral View、视图与索引

1.Hive Lateral View

Lateral View 用于和 UDTF 函数(explode、split)结合来使用。 首先通过 UDTF 函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。 主要解决在 select 使用 UDTF 做查询过程中,查询只能包含单个 UDTF,不能包含其 他字段、以及多个 UDTF 的问题 语法:

LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)

hive> select explode(likes) from person7;OKlolbookmovielolbookmovielolbookmovielolbookhive> select id,name,mycol1,mycol2,mycol3 from person7 lateral view explode(likes) mytb1 as mycol1 lateral view explode(address) mytb2 as mycol2,mycol3;OK1 小明 1 lol beijing xisanqi1 小明 1 lol shanghai pudong 1 小明 1 book beijing xisanqi1 小明 1 book shanghai pudong 1 小明 1 movie beijing xisanqi1 小明 1 movie shanghai pudong 2 小明 2 lol beijing xisanqi2 小明 2 lol shanghai pudong 2 小明 2 book beijing xisanqi2 小明 2 book shanghai pudong 2 小明 2 movie beijing xisanqi2 小明 2 movie shanghai pudong 3 小明 3 lol beijing xisanqi3 小明 3 lol shanghai pudong 3 小明 3 book beijing xisanqi3 小明 3 book shanghai pudong 3 小明 3 movie beijing xisanqi3 小明 3 movie shanghai pudong 4 小明 4 lol beijing xisanqi4 小明 4 lol shanghai pudong 4 小明 4 book beijing xisanqi4 小明 4 book shanghai pudong 4 小明 4 movie beijing xisanqi4 小明 4 movie shanghai pudong 5 小明 5 lol beijing xisanqihive> select count(distinct(mycol2)),count(distinct(mycol3)) from person7 lateral view explode(address) mytb2 as mycol2,mycol3;Query ID = root_20211112124842_626f1f8e-0764-4306-ab9d-dc404771b732Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=In order to limit the maximum number of reducers: set hive.exec.reducers.max=In order to set a constant number of reducers: set mapreduce.job.reduces=Starting Job = job_1636690643978_0001, Tracking URL = Command = /opt/hadoop-2.6.5/bin/hadoop job -kill job_1636690643978_0001Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12021-11-12 12:49:07,938 Stage-1 map = 0%, reduce = 0%2021-11-12 12:49:29,804 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.54 sec2021-11-12 12:49:46,263 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.42 secMapReduce Total cumulative CPU time: 5 seconds 420 msecEnded Job = job_1636690643978_0001MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.42 sec HDFS Read: 12719 HDFS Write: 4 SUCCESSTotal MapReduce CPU Time Spent: 5 seconds 420 msecOK2 3Time taken: 66.565 seconds, Fetched: 1 row(s)hive> create table yyy( > id int, > name string);OKTime taken: 2.269 seconds

2.Hive 视图

#mysql 中支持视图删除: CREATE VIEW v_users AS SELECT * FROM myusers; DELETE FROM v_users WHERE id = '1316403900579872';

创建视图:

hive> create view v_psn as select * from person5;OKTime taken: 0.275 seconds

hive> drop view v_psn;OKTime taken: 1.22 seconds

3.Hive 索引

目的:优化查询以及检索性能 创建索引:

hive> create index t1_index > on table person5(name) > as "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler" > with deferred rebuild > in table t1_index_table > comment "table person5 name index comment";OKTime taken: 1.732 seconds

as:指定索引器; in table:指定索引表,若不指定默认生成在 default__person5_t1_index__表中

重建索引(建立索引之后必须重建索引才能生效)

ALTER INDEX t1_index ON person5 REBUILD;

重建完毕之后,再次查询有索引数据:select * from t1_index_table;

hive> select * from t1_index_table;OKNULL hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [513] 10小明 1 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [0] 10小明 2 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [59] 10小明 3 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [118] 10小明 4 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [177] 10小明 5 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [236] 10小明 6 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [290] 10小明 7 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [349] 10小明 8 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [402] 10小明 9 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [455] 10

删除索引

DROP INDEX IF EXISTS t1_index ON person5;

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:java poi 读取单元格null或者空字符串方式
下一篇:hdfs 参数调优
相关文章

 发表评论

暂时没有评论,来抢沙发吧~