Writing the Code to Create a UDTF Function

User submission · 2022-09-14


1) Create the UDTF function: write the code

(1) Create a Maven project: hivefunction

(2) Create the package: com.atguigu.hive.udtf

(3) Add the following dependencies

pom.xml
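The dependency list itself did not survive in the source. Below is a minimal sketch of what the pom.xml plausibly contained, inferred from the imports in the class that follows: the code needs the Hive UDF/UDTF API (hive-exec) and org.json for JSONArray parsing. The version numbers and the provided scope are assumptions, not taken from the original.

<dependencies>
    <!-- Hive UDTF API; supplied by the cluster at runtime (version is an assumption) -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
        <scope>provided</scope>
    </dependency>
    <!-- org.json, used below for JSONArray parsing (version is an assumption) -->
    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20211205</version>
    </dependency>
</dependencies>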

ExplodeJSONArray.java

package com.atguigu.hive.udtf;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.json.JSONArray;

public class ExplodeJSONArray extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        // Declare the output schema: a single STRING column named "items"
        List<String> fieldNames = new ArrayList<>();
        List<ObjectInspector> fieldOIs = new ArrayList<>();
        fieldNames.add("items");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    public void process(Object[] objects) throws HiveException {
        // 1. Get the input value
        String jsonArray = objects[0].toString();
        // 2. Convert the string into a JSON array
        JSONArray actions = new JSONArray(jsonArray);
        // 3. Loop over the array, take each JSON element, and write it out as one row
        for (int i = 0; i < actions.length(); i++) {
            String[] result = new String[1];
            result[0] = actions.getString(i);
            forward(result);
        }
    }

    public void close() throws HiveException {
    }
}

My version

ExplodeJSONArray.java

package com.qc.gmall.hive.udtf;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.json.JSONArray;

// UDTF implementation
public class ExplodeJSONArray extends GenericUDTF {

    private PrimitiveObjectInspector inputOI;

    @Override
    public void close() throws HiveException {
        // no cleanup logic needed
    }

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        if (argOIs.length != 1) {
            throw new UDFArgumentException("explode_json_array takes exactly 1 argument");
        }
        ObjectInspector argOI = argOIs[0];
        if (argOI.getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("explode_json_array only accepts primitive-type arguments");
        }
        // cast to a primitive inspector
        PrimitiveObjectInspector primitiveOI = (PrimitiveObjectInspector) argOI;
        inputOI = primitiveOI;
        if (primitiveOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentException("explode_json_array only accepts a STRING argument");
        }
        List<String> fieldNames = new ArrayList<>();
        List<ObjectInspector> fieldOIs = new ArrayList<>();
        fieldNames.add("item");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        Object arg = args[0];
        String jsonArrayStr = PrimitiveObjectInspectorUtils.getString(arg, inputOI);
        // parse with the org.json utility
        JSONArray jsonArray = new JSONArray(jsonArrayStr);
        // iterate over the array, emitting one row per element
        for (int i = 0; i < jsonArray.length(); i++) {
            String json = jsonArray.getString(i);
            // even with a single column, the row must be wrapped in an array
            String[] result = {json};
            // emit the row via forward()
            forward(result);
        }
    }
}

2) Create the function

(1) Package the project
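A typical Maven invocation from the project root (standard goals, nothing project-specific; the jar lands under target/):

mvn clean package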

(2) Upload hivefunction-1.0-SNAPSHOT.jar to /opt/module on hadoop102, then upload the jar to the HDFS path /user/hive/jars

[atguigu@hadoop102 module]$ hadoop fs -mkdir -p /user/hive/jars
[atguigu@hadoop102 module]$ hadoop fs -put hivefunction-1.0-SNAPSHOT.jar /user/hive/jars

(3) Create a permanent function bound to the compiled Java class

create function explode_json_array
    as 'com.atguigu.hive.udtf.ExplodeJSONArray'
    using jar 'hdfs://hadoop102:8020/user/hive/jars/hivefunction-1.0-SNAPSHOT.jar';
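Once registered, the function can be sanity-checked directly. A quick sketch; the JSON literal and the table/column names (t, actions) below are made up for illustration:

-- each array element becomes its own row
select explode_json_array('["a","b","c"]');

-- typical usage against a table column, via lateral view
select id, action
from t lateral view explode_json_array(actions) tmp as action;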

(4) Note: what should you do after modifying the custom function and rebuilding the jar?

Simply replace the old jar at the HDFS path with the new one, then restart the Hive client.
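For example, overwriting the jar in place with the -f flag (same paths as above):

[atguigu@hadoop102 module]$ hadoop fs -put -f hivefunction-1.0-SNAPSHOT.jar /user/hive/jars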

Official reference

DeveloperGuide UDTF

GenericUDTF Interface

A custom UDTF can be created by extending the GenericUDTF abstract class and then implementing the initialize, process, and possibly close methods. The initialize method is called by Hive to notify the UDTF of the argument types to expect. The UDTF must then return an object inspector corresponding to the row objects that the UDTF will generate. Once initialize() has been called, Hive will give rows to the UDTF using the process() method. While in process(), the UDTF can produce and forward rows to other operators by calling forward(). Lastly, Hive will call the close() method when all the rows have been passed to the UDTF.
