
Parsing ORC and Parquet data with Apache Druid

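Apache Druid can batch-ingest data from the local file system or from HDFS, and the latest release at the time of writing (0.18) can also parse data in ORC and Parquet format directly. Using this feature requires a small amount of configuration first.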

What the official documentation says

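Apache Druid ships with all of the core extensions (see the appendix at the end of this article). To load an extension, add its name to druid.extensions.loadList in common.runtime.properties. For example, to load the postgresql-metadata-storage and druid-hdfs-storage extensions, use this configuration: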

druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]
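Because the value of druid.extensions.loadList is a JSON array, a short script can check that the extensions a deployment depends on are actually listed. The following is a minimal sketch, not part of Druid itself. The helper names parse_load_list and missing_extensions are hypothetical, and the parser assumes the property sits on a single line with no line continuations.

```python
import json


def parse_load_list(properties_text):
    """Return the extension names in druid.extensions.loadList, or [] if unset.

    Assumption: the property is written on one line, as in the example above.
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip comments and lines that are not key=value pairs.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "druid.extensions.loadList":
            # The value is a JSON array of extension names.
            return json.loads(value.strip())
    return []


def missing_extensions(properties_text, required):
    """Return the required extension names that are absent from the load list."""
    loaded = set(parse_load_list(properties_text))
    return [name for name in required if name not in loaded]


example = (
    'druid.extensions.loadList='
    '["postgresql-metadata-storage", "druid-hdfs-storage"]\n'
)
print(parse_load_list(example))
print(missing_extensions(example, ["druid-hdfs-storage", "druid-orc-extensions"]))
```

Running a check like this against common.runtime.properties before restarting the cluster can catch a misspelled or forgotten extension name early.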
           

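So, to have Druid parse ORC and Parquet data, load the druid-orc-extensions and druid-parquet-extensions extensions. The appendix notes that druid-parquet-extensions requires druid-avro-extensions to be loaded, so that extension is listed as well:

druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "druid-avro-extensions", "druid-orc-extensions", "druid-parquet-extensions"]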
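Once the extensions are loaded, a native batch ingestion task can name the file format in its inputFormat. The sketch below builds such a task spec as a Python dictionary and prints it as JSON. It is an illustration under assumptions, not a spec taken from the original article: the function name build_batch_spec, the data source name, the HDFS path, and the timestamp column are all hypothetical placeholders, and the granularity settings are arbitrary choices.

```python
import json

SUPPORTED_FORMATS = ("orc", "parquet")


def build_batch_spec(data_source, hdfs_paths, file_format, timestamp_column):
    """Build a native parallel batch task spec that reads ORC or Parquet from HDFS."""
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError("file_format must be one of %s" % (SUPPORTED_FORMATS,))
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                # An empty dimension list leaves dimension discovery to Druid.
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # Reading from HDFS relies on the druid-hdfs-storage extension.
                "inputSource": {"type": "hdfs", "paths": hdfs_paths},
                # "orc" needs druid-orc-extensions, "parquet" needs druid-parquet-extensions.
                "inputFormat": {"type": file_format},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


# Hypothetical values, for illustration only.
spec = build_batch_spec("example_events", "hdfs://namenode/data/events/", "parquet", "ts")
print(json.dumps(spec, indent=2))
```

The printed JSON can be saved to a file and submitted as a task to the Overlord. Switching the third argument to "orc" is the only change needed for ORC input in this sketch.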
           

Appendix

Name Description
druid-avro-extensions Support for data in Apache Avro data format.
druid-azure-extensions Microsoft Azure deep storage.
druid-basic-security Support for Basic HTTP authentication and role-based access control.
druid-bloom-filter Support for providing Bloom filters in druid queries.
druid-datasketches Support for approximate counts and set operations with Apache DataSketches.
druid-google-extensions Google Cloud Storage deep storage.
druid-hdfs-storage HDFS deep storage.
druid-histogram Approximate histograms and quantiles aggregator. Deprecated, please use the DataSketches quantiles aggregator from the druid-datasketches extension instead.
druid-kafka-extraction-namespace Apache Kafka-based namespaced lookup. Requires namespace lookup extension.
druid-kafka-indexing-service Supervised exactly-once Apache Kafka ingestion for the indexing service.
druid-kinesis-indexing-service Supervised exactly-once Kinesis ingestion for the indexing service.
druid-kerberos Kerberos authentication for druid processes.
druid-lookups-cached-global A module for lookups providing a jvm-global eager caching for lookups. It provides JDBC and URI implementations for fetching lookup data.
druid-lookups-cached-single Per lookup caching module to support the use cases where a lookup needs to be isolated from the global pool of lookups.
druid-orc-extensions Support for data in Apache ORC data format.
druid-parquet-extensions Support for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded.
druid-protobuf-extensions Support for data in Protobuf data format.
druid-ranger-security Support for access control through Apache Ranger.
druid-s3-extensions Interfacing with data in AWS S3, and using S3 as deep storage.
druid-ec2-extensions Interfacing with AWS EC2 for autoscaling middle managers. (Undocumented.)
druid-stats Statistics related module including variance and standard deviation.
mysql-metadata-storage MySQL metadata store.
postgresql-metadata-storage PostgreSQL metadata store.
simple-client-sslcontext Simple SSLContext provider module to be used by Druid's internal HttpClient when talking to other Druid processes over HTTPS.
druid-pac4j OpenID Connect authentication for druid processes.
aliyun-oss-extensions Aliyun OSS deep storage
ambari-metrics-emitter Ambari Metrics Emitter
druid-cassandra-storage Apache Cassandra deep storage.
druid-cloudfiles-extensions Rackspace Cloudfiles deep storage and firehose.
druid-distinctcount DistinctCount aggregator
druid-redis-cache A cache implementation for Druid based on Redis.
druid-time-min-max Min/Max aggregator for timestamp.
sqlserver-metadata-storage Microsoft SQL Server metadata store.
graphite-emitter Graphite metrics emitter
statsd-emitter StatsD metrics emitter
kafka-emitter Kafka metrics emitter
druid-thrift-extensions Support thrift ingestion
druid-opentsdb-emitter OpenTSDB metrics emitter
materialized-view-selection, materialized-view-maintenance Materialized View
druid-moving-average-query Support for Moving Average and other Aggregate Window Functions in Druid queries.
druid-influxdb-emitter InfluxDB metrics emitter
druid-momentsketch Support for approximate quantile queries using the momentsketch library
druid-tdigestsketch Support for approximate sketch aggregators based on T-Digest
gce-extensions GCE Extensions