dremio 的job 分析参考示例

网友投稿 310 2022-11-27

dremio 的job 分析参考示例

dremio 官方专门提供了一个章节,讲解如何进行分析,一些是一个总结,以及学习

预备

下载profiles,以及上传到dremio 中(个人默认space中)因为profiles 默认是json 格式的,我们直接使用dremio 查询分析数据

需要关注的内容:

sql 查询:sql 查询是否符合预期 query 源: 数据是否在需要的数据源中 计划时间: 时间是否符合预期 非默认参数:查看是否有非模式参数使用了 计划数据: 查看是否使用了rows ,row count 会造成比较高昂的广播,构建,构建会将数据加载到内存中 加速数据:包含了反射的选择,系统活跃情况 我们需要重点关注的是线程状态,通过此了解系统的等待,阻塞情况 错误处理:dremio nodes 情况,oom 性能考虑: 计划时间以及执行实现,线程数,row count 以及row,阻塞以及睡眠,准备以及等待时间

operator_type 数据准备

准备操作类型映射operator_type.json 内容如下

{"NAME":"SINGLE_SENDER","ID":0}

{"NAME":"BROADCAST_SENDER","ID":1}

{"NAME":"FILTER","ID":2}

{"NAME":"HASH_AGGREGATE","ID":3}

{"NAME":"HASH_JOIN","ID":4}

{"NAME":"MERGE_JOIN","ID":5}

{"NAME":"HASH_PARTITION_SENDER","ID":6}

{"NAME":"LIMIT","ID":7}

{"NAME":"MERGING_RECEIVER","ID":8}

{"NAME":"ORDERED_PARTITION_SENDER","ID":9}

{"NAME":"PROJECT","ID":10}

{"NAME":"UNORDERED_RECEIVER","ID":11}

{"NAME":"RANGE_SENDER","ID":12}

{"NAME":"SCREEN","ID":13}

{"NAME":"SELECTION_VECTOR_REMOVER","ID":14}

{"NAME":"STREAMING_AGGREGATE","ID":15}

{"NAME":"TOP_N_SORT","ID":16}

{"NAME":"EXTERNAL_SORT","ID":17}

{"NAME":"TRACE","ID":18}

{"NAME":"UNION","ID":19}

{"NAME":"OLD_SORT","ID":20}

{"NAME":"PARQUET_ROW_GROUP_SCAN","ID":21}

{"NAME":"HIVE_SUB_SCAN","ID":22}

{"NAME":"SYSTEM_TABLE_SCAN","ID":23}

{"NAME":"MOCK_SUB_SCAN","ID":24}

{"NAME":"PARQUET_WRITER","ID":25}

{"NAME":"DIRECT_SUB_SCAN","ID":26}

{"NAME":"TEXT_WRITER","ID":27}

{"NAME":"TEXT_SUB_SCAN","ID":28}

{"NAME":"JSON_SUB_SCAN","ID":29}

{"NAME":"INFO_SCHEMA_SUB_SCAN","ID":30}

{"NAME":"COMPLEX_TO_JSON","ID":31}

{"NAME":"PRODUCER_CONSUMER","ID":32}

{"NAME":"HBASE_SUB_SCAN","ID":33}

{"NAME":"WINDOW","ID":34}

{"NAME":"NESTED_LOOP_JOIN","ID":35}

{"NAME":"AVRO_SUB_SCAN","ID":36}

{"NAME":"MONGO_SUB_SCAN","ID":37}

{"NAME":"ELASTICSEARCH_SUB_SCAN","ID":38}

{"NAME":"ELASTICSEARCH_AGGREGATOR_SUB_SCAN","ID":39}

{"NAME":"FLATTEN","ID":40}

{"NAME":"EXCEL_SUB_SCAN","ID":41}

{"NAME":"ARROW_SUB_SCAN","ID":42}

{"NAME":"ARROW_WRITER","ID":43}

{"NAME":"JSON_WRITER","ID":44}

{"NAME":"VALUES_READER","ID":45}

{"NAME":"CONVERT_FROM_JSON","ID":46}

{"NAME":"JDBC_SUB_SCAN","ID":47}

{"NAME":"DICTIONARY_LOOKUP","ID":48}

{"NAME":"WRITER_COMMITTER","ID":49}

{"NAME":"ROUND_ROBIN_SENDER","ID":50}

效果

参考查询

查询行数分析

SELECT majorFragmentId, operatorId, op_type.name, sum(records) records

-- minorFragmentId, nested_4.minorFragmentProfile.maxMemoryUsed AS maxMemoryUsed, nested_4.minorFragmentProfile.memoryUsed AS memoryUsed, nested_4.endpoint.address AS address, nested_4.endpoint.maxDirectMemory AS maxDirectMemory

FROM (

SELECT minorFragmentProfile, nested_3.minorFragmentProfile.endpoint AS endpoint, minorFragmentId, operatorType, operatorId, nested_3.inputProfile.records AS records, majorFragmentId

FROM (

SELECT minorFragmentProfile, nested_2.minorFragmentProfile.minorFragmentId AS minorFragmentId, operatorProfile, nested_2.operatorProfile.operatorType AS operatorType, nested_2.operatorProfile.operatorId AS operatorId, flatten(nested_2.operatorProfile.inputProfile) AS inputProfile, majorFragmentId

FROM (

SELECT fragmentProfile, minorFragmentProfile, flatten(nested_1.minorFragmentProfile.operatorProfile) AS operatorProfile, majorFragmentId

FROM (

SELECT fragmentProfile, flatten(nested_0.fragmentProfile.minorFragmentProfile) AS minorFragmentProfile, nested_0.fragmentProfile.majorFragmentId AS majorFragmentId

FROM (

SELECT flatten(fragmentProfile) AS fragmentProfile

FROM "@dalong"."profile_attempt_0" AS "profile_attempt_0"

nested_0

nested_1

nested_2

nested_3

) nested_4, "@dalong".operator_type op_type

where nested_4.operatorType = op_type.id

group by majorFragmentId, operatorId, op_type.name

order by majorFragmentId, operatorId

效果

查询内存情况

SELECT nested_2.endpoint.address AS address, sum(nested_2.minorFragmentProfile.maxMemoryUsed)/1024/1024 AS maxMemoryUsed_MB

-- nested_2.minorFragmentProfile.memoryUsed AS memoryUsed, nested_2.minorFragmentProfile.maxMemoryUsed AS maxMemoryUsed, nested_2.endpoint.address AS address, nested_2.endpoint.maxDirectMemory AS maxDirectMemory, nested_2.endpoint.provisionId AS provisionId, majorFragmentId

FROM (

SELECT fragmentProfile, minorFragmentProfile, nested_1.minorFragmentProfile.endpoint AS endpoint, majorFragmentId

FROM (

SELECT fragmentProfile, flatten(nested_0.fragmentProfile.minorFragmentProfile) AS minorFragmentProfile, nested_0.fragmentProfile.majorFragmentId AS majorFragmentId

FROM (

SELECT flatten(fragmentProfile) AS fragmentProfile

FROM "@dalong"."profile_attempt_0" AS "profile_attempt_0"

nested_0

nested_1

) nested_2

group by nested_2.endpoint.address

order by nested_2.endpoint.address

效果

说明

对于job 的分析可以结合apache drill 学习,dremio 文档对于这部分介绍还是太简单,推荐结合drill 学习,同时默认32k 数据太小,需要调整下 参考命令

alter system set limits.single_field_size_bytes = 99000;

参考资料

​​​​​​​​​​​​​​​​​https://drill.apache.org/docs/query-profile-column-descriptions/​​

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:汽车USB3.1连接器系统设计注意事项
下一篇:Java MyBatis
相关文章

 发表评论

暂时没有评论,来抢沙发吧~