Writing the Code to Create a UDTF Function

User submission · 2022-09-14


1) Create the UDTF function: write the code

(1) Create a Maven project: hivefunction

(2) Create the package: com.atguigu.hive.udtf

(3) Add the following dependencies

pom.xml
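The dependency list itself did not survive in the source. Below is a minimal sketch of what the pom.xml plausibly contained, inferred from the imports in the class that follows: the code needs the Hive UDF/UDTF API (hive-exec) and org.json for JSONArray parsing. The version numbers and the provided scope are assumptions, not taken from the original.

<dependencies>
    <!-- Hive UDTF API; supplied by the cluster at runtime (version is an assumption) -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
        <scope>provided</scope>
    </dependency>
    <!-- org.json, used below for JSONArray parsing (version is an assumption) -->
    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20211205</version>
    </dependency>
</dependencies>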

ExplodeJSONArray.java

package com.atguigu.hive.udtf;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.json.JSONArray;

public class ExplodeJSONArray extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        // Declare the output schema: a single STRING column named "items"
        List<String> fieldNames = new ArrayList<>();
        List<ObjectInspector> fieldOIs = new ArrayList<>();
        fieldNames.add("items");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    public void process(Object[] objects) throws HiveException {
        // 1. Get the input value
        String jsonArray = objects[0].toString();
        // 2. Convert the string into a JSON array
        JSONArray actions = new JSONArray(jsonArray);
        // 3. Loop over the array, take each JSON element, and write it out as one row
        for (int i = 0; i < actions.length(); i++) {
            String[] result = new String[1];
            result[0] = actions.getString(i);
            forward(result);
        }
    }

    public void close() throws HiveException {
    }
}

My version

ExplodeJSONArray.java

package com.qc.gmall.hive.udtf;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.json.JSONArray;

// UDTF implementation
public class ExplodeJSONArray extends GenericUDTF {

    private PrimitiveObjectInspector inputOI;

    @Override
    public void close() throws HiveException {
        // no cleanup logic needed
    }

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        if (argOIs.length != 1) {
            throw new UDFArgumentException("explode_json_array takes exactly 1 argument");
        }
        ObjectInspector argOI = argOIs[0];
        if (argOI.getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("explode_json_array only accepts primitive-type arguments");
        }
        // cast to a primitive inspector
        PrimitiveObjectInspector primitiveOI = (PrimitiveObjectInspector) argOI;
        inputOI = primitiveOI;
        if (primitiveOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentException("explode_json_array only accepts a STRING argument");
        }
        List<String> fieldNames = new ArrayList<>();
        List<ObjectInspector> fieldOIs = new ArrayList<>();
        fieldNames.add("item");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        Object arg = args[0];
        String jsonArrayStr = PrimitiveObjectInspectorUtils.getString(arg, inputOI);
        // parse with the org.json utility
        JSONArray jsonArray = new JSONArray(jsonArrayStr);
        // iterate over the array, emitting one row per element
        for (int i = 0; i < jsonArray.length(); i++) {
            String json = jsonArray.getString(i);
            // even with a single column, the row must be wrapped in an array
            String[] result = {json};
            // emit the row via forward()
            forward(result);
        }
    }
}

2) Create the function

(1) Package the project
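A typical Maven invocation from the project root (standard goals, nothing project-specific; the jar lands under target/):

mvn clean package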

(2) Upload hivefunction-1.0-SNAPSHOT.jar to /opt/module on hadoop102, then upload the jar to the HDFS path /user/hive/jars

[atguigu@hadoop102 module]$ hadoop fs -mkdir -p /user/hive/jars
[atguigu@hadoop102 module]$ hadoop fs -put hivefunction-1.0-SNAPSHOT.jar /user/hive/jars

(3) Create a permanent function bound to the compiled Java class

create function explode_json_array
    as 'com.atguigu.hive.udtf.ExplodeJSONArray'
    using jar 'hdfs://hadoop102:8020/user/hive/jars/hivefunction-1.0-SNAPSHOT.jar';
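Once registered, the function can be sanity-checked directly. A quick sketch; the JSON literal and the table/column names (t, actions) below are made up for illustration:

-- each array element becomes its own row
select explode_json_array('["a","b","c"]');

-- typical usage against a table column, via lateral view
select id, action
from t lateral view explode_json_array(actions) tmp as action;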

(4) Note: what should you do after modifying the custom function and rebuilding the jar?

Simply replace the old jar at the HDFS path with the new one, then restart the Hive client.
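For example, overwriting the jar in place with the -f flag (same paths as above):

[atguigu@hadoop102 module]$ hadoop fs -put -f hivefunction-1.0-SNAPSHOT.jar /user/hive/jars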

Official reference

DeveloperGuide UDTF

GenericUDTF Interface

A custom UDTF can be created by extending the GenericUDTF abstract class and then implementing the initialize, process, and possibly close methods. The initialize method is called by Hive to notify the UDTF of the argument types to expect. The UDTF must then return an object inspector corresponding to the row objects that the UDTF will generate. Once initialize() has been called, Hive will give rows to the UDTF using the process() method. While in process(), the UDTF can produce and forward rows to other operators by calling forward(). Lastly, Hive will call the close() method when all the rows have been passed to the UDTF.
