
Apache Druid: a high performance real-time analytics database.

Home Page: https://druid.apache.org/

License: Apache License 2.0


druid's Introduction


Website | Twitter | Download | Get Started | Documentation | Community | Build | Contribute | License


Apache Druid

Druid is a high performance real-time analytics database. Druid's main value add is to reduce time to insight and action.

Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.

Getting started

You can get started with Druid using our local or Docker quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in web console (shown below).
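For the JDBC side, queries go through the bundled Avatica driver. Here is a minimal sketch, assuming a Broker at localhost:8082, a datasource named wikipedia, and the Avatica JDBC driver on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DruidJdbcExample
{
  public static void main(String[] args) throws Exception
  {
    // Broker host/port and datasource name are assumptions for illustration.
    String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";
    try (Connection connection = DriverManager.getConnection(url);
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery("SELECT __time, page FROM wikipedia LIMIT 5")) {
      while (resultSet.next()) {
        System.out.println(resultSet.getString("__time") + " " + resultSet.getString("page"));
      }
    }
  }
}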

Load data

data loader Kafka

Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster

management

Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. All powered by SQL system tables, allowing you to see the underlying query for each view.

Issue queries

query view combo

Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most of Druid.

Documentation

See the latest documentation for the current official release. If you need information on a previous release, you can browse previous releases' documentation.

Make documentation and tutorial updates in /docs using Markdown or extended Markdown (MDX). Then, open a pull request.

To build the site locally, you need Node 16.14 or higher. Install Docusaurus 2 with npm install (or yarn install) in the website directory, then run npm start (or yarn start) to launch a local build of the docs.

If you're looking to update non-doc pages like Use Cases, those files are in the druid-website-src repo.

Community

Visit the official project community page to read about getting involved in contributing to Apache Druid, and how we help one another use and operate Druid.

  • Druid users can find help in the druid-user mailing list on Google Groups, and have more technical conversations in #troubleshooting on Slack.
  • Druid development discussions take place in the druid-dev mailing list (dev@druid.apache.org). Subscribe by emailing dev-subscribe@druid.apache.org. For live conversations, join the #dev channel on Slack.

Check out the official community page for details of how to join the community Slack channels.

Find articles written by community members and a calendar of upcoming events on the project site - contribute your own events and articles by submitting a PR in the apache/druid-website-src repository.

Building from source

Please note that JDK 8 or JDK 11 is required to build Druid.

See the latest build guide for instructions on building Apache Druid from source.

Contributing

Please follow the community guidelines for contributing.

For instructions on setting up IntelliJ, see dev/intellij-setup.md.

License

Apache License, Version 2.0


druid's Issues

Remove SIGAR

We currently use SIGAR to pull in some machine-level stats (like CPU usage, disk usage, network usage, etc.). This would be better done via something that polls snmpd rather than embedding native code.
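Not the SNMP approach proposed above, but as a sketch of reading some of these stats without native code: on HotSpot JVMs (Java 7+), CPU load is exposed through the com.sun.management MXBean, though disk and network stats would still need an external poller such as snmpd.

import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class CpuStats
{
  public static void main(String[] args)
  {
    // Cast to the com.sun.management subinterface for the CPU accessors.
    OperatingSystemMXBean os =
        (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
    System.out.println("system cpu load:  " + os.getSystemCpuLoad());
    System.out.println("process cpu load: " + os.getProcessCpuLoad());
  }
}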

IncrementalIndex.add() barfs when InputRow.getDimensions() has duplicates

If InputRow.getDimensions() has duplicates, IncrementalIndex.add() fails with

java.lang.ArrayIndexOutOfBoundsException: 4
at com.metamx.druid.index.v1.IncrementalIndex.add(IncrementalIndex.java:148)
at com.metamx.druid.realtime.Sink.add(Sink.java:98)
at com.metamx.druid.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:176)
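A minimal sketch of the obvious guard, assuming nothing about Druid's internals: de-duplicate the dimension list, keeping first-seen order, before handing it to the index, so repeated names cannot push the per-dimension arrays out of bounds.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public final class Dimensions
{
  // LinkedHashSet drops duplicates while preserving first-seen order.
  public static List<String> dedup(List<String> dimensions)
  {
    return new ArrayList<>(new LinkedHashSet<>(dimensions));
  }
}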

Query documentation

It would be really nice if we actually had documentation on how queries work.

And/Or with only one field doesn't work

Sixtus42 from the IRC channel reports that And/Or filters with only one field do not actually work properly. This is pretty annoying; we should be able to make them work as if the And/Or wrapper just wasn't there.
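A sketch of the suggested behavior, using stand-in filter classes rather than Druid's own: an AND or OR with a single child collapses to that child, so the wrapper behaves as if it just wasn't there.

import java.util.List;

interface Filter {}

class AndFilter implements Filter
{
  final List<Filter> fields;
  AndFilter(List<Filter> fields) { this.fields = fields; }
}

class OrFilter implements Filter
{
  final List<Filter> fields;
  OrFilter(List<Filter> fields) { this.fields = fields; }
}

final class Filters
{
  // Recursively unwrap one-field AND/OR nodes.
  static Filter simplify(Filter filter)
  {
    if (filter instanceof AndFilter && ((AndFilter) filter).fields.size() == 1) {
      return simplify(((AndFilter) filter).fields.get(0));
    }
    if (filter instanceof OrFilter && ((OrFilter) filter).fields.size() == 1) {
      return simplify(((OrFilter) filter).fields.get(0));
    }
    return filter;
  }
}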

Cluster setup documentation

We have some docs describing how to get set up with a single node running Realtime. We need some docs discussing what a cluster looks like and how to fire it up. Some scripts would also probably be helpful.

Realtime nodes don't throttle their intake of messages

If a Firehose produces messages faster than the node can process them (where "process" is defined in terms of the rate at which it can persist checkpoints and merge those checkpoints at the end of each time period), then the node will currently fall over instead of throttling itself back. It would be much nicer if it were to just throttle itself and go stale.
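One way to get that behavior, sketched with a bounded queue and illustrative names (this is not Druid's code): the firehose thread blocks, and intake goes stale, whenever persisting falls behind, instead of memory growing until the node falls over.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ThrottledIntake<T>
{
  private final BlockingQueue<T> pending;

  public ThrottledIntake(int capacity)
  {
    this.pending = new ArrayBlockingQueue<>(capacity);
  }

  // Called from the firehose thread; blocks when the persist side lags.
  public void offerFromFirehose(T row) throws InterruptedException
  {
    pending.put(row);
  }

  // Called from the persist/merge thread.
  public T takeForPersist() throws InterruptedException
  {
    return pending.take();
  }
}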

Druid won't build, can not resolve dependencies

I downloaded the master release of Druid.

When trying to download the POM at https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local/com/metamx/java-util/0.16.1-SNAPSHOT/java-util-0.16.1-SNAPSHOT.pom, the build.sh script yields:

[INFO] ------------------------------------------------------------------------
[INFO] Building druid-common 0.1.26-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local/com/metamx/java-util/0.16.1-SNAPSHOT/java-util-0.16.1-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] druid ............................................. SUCCESS [0.002s]
[INFO] druid-common ...................................... FAILURE [2.649s]
[INFO] druid-index-common ................................ SKIPPED
[INFO] druid-client ...................................... SKIPPED
[INFO] druid-server ...................................... SKIPPED
[INFO] druid-indexer ..................................... SKIPPED
[INFO] druid-realtime .................................... SKIPPED
[INFO] druid-merger ...................................... SKIPPED
[INFO] druid-examples .................................... SKIPPED
[INFO] druid-examples-rand ............................... SKIPPED
[INFO] druid-examples-twitter ............................ SKIPPED
[INFO] druid-services .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.230s
[INFO] Finished at: Mon Jan 07 06:18:05 CET 2013
[INFO] Final Memory: 7M/19M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project druid-common: Could not resolve dependencies for project com.metamx.druid:druid-common:jar:0.1.26-SNAPSHOT: Failed to collect dependencies for [com.metamx:java-util:jar:0.16.1-SNAPSHOT (compile), commons-codec:commons-codec:jar:1.3 (compile), commons-dbcp:commons-dbcp:jar:1.4 (compile), commons-pool:commons-pool:jar:1.6 (compile), org.skife.config:config-magic:jar:0.9 (compile), com.netflix.curator:curator-recipes:jar:1.2.2 (compile), com.netflix.curator:curator-x-discovery:jar:1.2.2 (compile), it.uniroma3.mat:extendedset:jar:1.3.2 (compile), com.google.guava:guava:jar:11.0.1 (compile), org.codehaus.jackson:jackson-core-asl:jar:1.9.9 (compile), org.codehaus.jackson:jackson-mapper-asl:jar:1.9.9 (compile), org.jdbi:jdbi:jar:2.32 (compile), joda-time:joda-time:jar:2.1 (compile), com.google.code.findbugs:jsr305:jar:2.0.1 (compile), log4j:log4j:jar:1.2.16 (compile), mysql:mysql-connector-java:jar:5.1.18 (compile), org.mozilla:rhino:jar:1.7R4 (compile), junit:junit:jar:4.8.1 (test), org.easymock:easymock:jar:3.0 (test), com.metamx:java-util:jar:tests:0.16.0 (test)]: Failed to read artifact descriptor for com.metamx:java-util:jar:0.16.1-SNAPSHOT: Could not transfer artifact com.metamx:java-util:pom:0.16.1-SNAPSHOT from/to pub-libs (https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local): Failed to transfer file: https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local/com/metamx/java-util/0.16.1-SNAPSHOT/java-util-0.16.1-SNAPSHOT.pom. Return code is: 409, ReasonPhrase:Conflict. -> [Help 1]
[ERROR]

Furthermore, if I try loading the XML file manually, I get:

HTTP Status 409 - The repository 'pub-libs-releases-local' rejected the artifact 'pub-libs-releases-local:com/metamx/java-util/0.16.1-SNAPSHOT/java-util-0.16.1-SNAPSHOT.pom' due to its snapshot/release handling policy.

Release : Debian GNU Linux 6.0
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

Small Feature Request: Add a flag to prevent aggs and postAggs from being sent back.

I would like to have a flag that I can set on aggs and postAggs that would remove them from the output.

Proposed:

"suppress": true

(false by default)
This can be added with perfect backward compatibility.

Why?

Sometimes I need helper aggs and postAggs for the calculation that I do not want to show to the user (i.e. if the user downloads the data, those columns need to be removed).

Use case for aggs:

Let's say that I want to compute the average count per row for wikipedia_editstream.
I do not care about the count or sum themselves!
I do not want anything starting with a _ to be sent back.

Query:

{
  "dataSource": "wikipedia_editstream",
  "intervals": [
    "2013-02-26T00:00:00.000/2013-02-27T00:00:00.000"
  ],
  "queryType": "timeseries",
  "granularity": "all",
  "filter": {
    "type": "selector",
    "dimension": "namespace",
    "value": "article"
  },
  "dimension": {
    "type": "default",
    "dimension": "page",
    "outputName": "Page"
  },
  "aggregations": [
    {
      "type": "count",
      "name": "_count"
    },
    {
      "type": "doubleSum",
      "name": "_sum_count",
      "fieldName": "count"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "AvgCount",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "fieldName": "_sum_count"
        },
        {
          "type": "fieldAccess",
          "fieldName": "_count"
        }
      ]
    }
  ]
}

Result:

[ {
  "timestamp" : "2013-02-26T00:00:00.000Z",
  "result" : {
    "_count" : 492635,
    "AvgCount" : 1.0816080871233267,
    "_sum_count" : 532838.0
  }
} ]

Conclusion

By being able to suppress everything that currently starts with an "_", we could save network bandwidth and spare me the work of removing these columns explicitly before sending the data to the user.
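Until such a flag exists, here is a client-side sketch of the workaround described above: drop helper columns whose names start with "_" from each result row before showing or exporting the data.

import java.util.Map;

public final class SuppressHelpers
{
  // Mutates the row in place, removing any "_"-prefixed columns.
  public static void stripHelperColumns(Map<String, Object> resultRow)
  {
    resultRow.keySet().removeIf(name -> name.startsWith("_"));
  }
}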

Test com.metamx.druid.query.extraction.TimeDimExtractionFnTest assert fails

Hi there,

When building druid-client, the com.metamx.druid.query.extraction.TimeDimExtractionFnTest test fails. The build script output and the content of the SureFire report log are pasted below.

Thank you!

PS: Sorry, the XML markup messed everything up in the SureFire report. The most important part might be:
java.lang.AssertionError:
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at com.metamx.druid.query.extraction.TimeDimExtractionFnTest.testQuarterExtraction(TimeDimExtractionFnTest.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:113)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)


[INFO] Compiling 145 source files to /home/scoulibaly/druid/druid-master/client/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ druid-client ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/scoulibaly/druid/druid-master/client/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ druid-client ---
[INFO] Compiling 18 source files to /home/scoulibaly/druid/druid-master/client/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.12.2:test (default-test) @ druid-client ---
[INFO] Surefire report directory: /home/scoulibaly/druid/druid-master/client/target/surefire-reports


T E S T S

Running com.metamx.druid.shard.SingleDimensionShardSpecTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.381 sec
Running com.metamx.druid.query.QuerySegmentSpecTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.583 sec
Running com.metamx.druid.query.search.LexicographicSearchSortSpecTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running com.metamx.druid.query.search.StrlenSearchSortSpecTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running com.metamx.druid.query.QueriesTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
Running com.metamx.druid.query.extraction.TimeDimExtractionFnTest
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.271 sec <<< FAILURE!
Running com.metamx.druid.query.extraction.PartialDimExtractionFnTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running com.metamx.druid.query.extraction.RegexDimExtractionFnTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec
Running com.metamx.druid.query.extraction.SearchQuerySpecDimExtractionFnTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
Running com.metamx.druid.CombiningIterableTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running com.metamx.druid.TimeseriesBinaryFnTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec
Running com.metamx.druid.SearchBinaryFnTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec
Running com.metamx.druid.client.DataSegmentTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.189 sec
Running com.metamx.druid.client.cache.ByteCountingLRUMapTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.292 sec
Running com.metamx.druid.client.cache.MapCacheBrokerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
Running com.metamx.druid.client.cache.MemcachedCacheBrokerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
Running com.metamx.druid.ResultGranularTimestampComparatorTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec

Results :

Failed tests: testQuarterExtraction(com.metamx.druid.query.extraction.TimeDimExtractionFnTest)

Tests run: 41, Failures: 1, Errors: 0, Skipped: 0


DruidMaster stops thinking that there's a server, but doesn't remove and stop the LoadQueuePeon

Someone today reported a cluster where a node went on the fritz, and even after it was destroyed and the announcement znode removed, the LoadQueuePeon continued to persist and do things (namely, when we tried to remove things from the loadQueue path, it kept creating new things).

Looking at the code in DruidMaster, it's hard to imagine how this could happen. The symptoms seen were:

  1. After the node went away, the Master's runs didn't think it was there and did not show it as one of the servers with segments to load/drop
  2. The log line at DruidMaster:658, log.info("Removing listener for server[%s] which is no longer there.", name), did not appear.
  3. When znodes from loadQueue path were deleted, log lines would show up as if the LoadQueuePeon was still doing things

I couldn't build Druid on Ubuntu 12.10 server with OpenJDK.

I have tried many times to build on Ubuntu 12.10 server with OpenJDK, but it fails every time.
What am I doing wrong?

java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.10.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /home/mike/collective/druid/common/src/main/java/com/metamx/druid/VersionedIntervalTimeline.java:[91,74] incompatible types; no instance(s) of type variable(s) K,V exist so that java.util.TreeMap<K,V> conforms to java.util.TreeMap<…,VersionedIntervalTimeline.TimelineEntry>
found : java.util.TreeMap<K,V>
required: java.util.TreeMap<…,VersionedIntervalTimeline.TimelineEntry>
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] druid ............................................. SUCCESS [0.002s]
[INFO] druid-common ...................................... FAILURE [3.222s]
[INFO] druid-index-common ................................ SKIPPED
[INFO] druid-client ...................................... SKIPPED
[INFO] druid-server ...................................... SKIPPED
[INFO] druid-indexer ..................................... SKIPPED
[INFO] druid-realtime .................................... SKIPPED
[INFO] druid-merger ...................................... SKIPPED
[INFO] druid-examples .................................... SKIPPED
[INFO] druid-examples-rand ............................... SKIPPED
[INFO] druid-examples-twitter ............................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.518s
[INFO] Finished at: Tue Oct 30 18:15:05 CDT 2012
[INFO] Final Memory: 11M/239M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project druid-common: Compilation failure
[ERROR] /home/mike/collective/druid/common/src/main/java/com/metamx/druid/VersionedIntervalTimeline.java:[91,74] incompatible types; no instance(s) of type variable(s) K,V exist so that java.util.TreeMap<K,V> conforms to java.util.TreeMap<…,VersionedIntervalTimeline.TimelineEntry>
[ERROR] found : java.util.TreeMap<K,V>
[ERROR] required: java.util.TreeMap<…,VersionedIntervalTimeline.TimelineEntry>
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project druid-common: Compilation failure
/home/mike/collective/druid/common/src/main/java/com/metamx/druid/VersionedIntervalTimeline.java:[91,74] incompatible types; no instance(s) of type variable(s) K,V exist so that java.util.TreeMap<K,V> conforms to java.util.TreeMap<…,VersionedIntervalTimeline.TimelineEntry>
found : java.util.TreeMap<K,V>
required: java.util.TreeMap<…,VersionedIntervalTimeline.TimelineEntry>

BUG: compute node doesn't give up deleted file handles

I just ran out of disk space on a compute node:

  • the node was the only one up, so it got all the segments
  • the other nodes came up, so it deleted some
  • however, the files weren't closed properly so the filesystem didn't release the space

I had to restart the compute node to release the space on the filesystem.

Implement "local" segment loader

The HadoopDruidIndexer, when writing to local disk, stores the segment with loadSpec type "test", which isn't actually usable by the compute nodes. It would be great to have a "local" option instead, which would allow everything to work from the local box.

Zookeeper dependency issue on startup in Master?

It looks like the Master might have the wrong dependencies set up. It seems to be complaining about a ZK method:

java.lang.NoSuchMethodError: org.apache.zookeeper.ZooKeeper.<init>(Ljava/lang/String;ILorg/apache/zookeeper/Watcher;Z)V
at com.netflix.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:27)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:166)
at com.netflix.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
at com.netflix.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
at com.netflix.curator.ConnectionState.reset(ConnectionState.java:210)
at com.netflix.curator.ConnectionState.start(ConnectionState.java:124)
at com.netflix.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:182)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:231)
at com.metamx.druid.initialization.Initialization$3.start(Initialization.java:258)
at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:54)
at com.metamx.druid.http.MasterMain.main(MasterMain.java:201)

Casing bug for dataSource in realtime

Nebrera on the IRC channel today reported that he was ingesting data but couldn't query it. His real-time schema had some capital letters in the dataSource, and although his query matched it, he couldn't get results. I had him adjust it to be all lowercase and it started working. So, we've got a casing bug in the real-time nodes right now.
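A sketch of the kind of normalization that would avoid the mismatch (illustrative only, not Druid's code): fold the dataSource to a canonical case on both the ingestion and query paths so they resolve to the same sink.

import java.util.Locale;

public final class DataSourceNames
{
  public static String normalize(String dataSource)
  {
    return dataSource == null ? null : dataSource.toLowerCase(Locale.ENGLISH);
  }
}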

Switch to Guice for Dependency Injection and allow for modules that don't embed Druid

The DI in Druid is currently done via newing things up in the main() method. This works, but it is less than wonderful for modularization and system configuration.

It would be great if we had a method of attaching modules to Druid that essentially amount to only adding jars to the classpath and adjusting some configuration. I think Guice is probably the best way to allow for this. We can make it such that attaching modules of Druid functionality is the equivalent of creating new Guice modules.

Using Guice will also allow us to inject "business" objects (like an s3Client) into Jackson-deserialized POJOs and simplify a lot of the interfaces and interactions that are allowed in the system.
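A minimal Guice sketch of the idea, with a blob store standing in for a "business" object like the s3Client (all names here are illustrative):

import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;

public class GuiceModuleSketch
{
  interface BlobStore
  {
    void put(String key, byte[] data);
  }

  static class LocalBlobStore implements BlobStore
  {
    @Override
    public void put(String key, byte[] data)
    {
      // Write to local disk; elided.
    }
  }

  // A module like this could live in a separate jar dropped onto the classpath.
  static class StorageModule extends AbstractModule
  {
    @Override
    protected void configure()
    {
      bind(BlobStore.class).to(LocalBlobStore.class);
    }
  }

  public static void main(String[] args)
  {
    Injector injector = Guice.createInjector(new StorageModule());
    BlobStore store = injector.getInstance(BlobStore.class);
    store.put("example-key", new byte[0]);
  }
}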

Unit Tests!

The current code base is missing a lot of unit tests; it's almost as if it isn't tested. This is an artifact of the separation from our proprietary code base: a lot of the unit tests turned out to depend on things that are metamx-specific, so we need to port those tests from our internal repo back into the code base.

Examples do not work.

I have cloned and built the code. Then I proceeded to run the example (following instructions from here: https://github.com/metamx/druid/wiki/RealtimeStandaloneMain ). I see the following error:

    2013-02-21 10:25:16,920 INFO [main] org.skife.config.ConfigurationObjectFactory - Assigning default value [50000] for [druid.query.groupBy.maxIntermediateRows] on [com.metamx.druid.query.group.GroupByQueryEngineConfig#getMaxIntermediateRows()]
    2013-02-21 10:25:17,218 INFO [main] druid.examples.RealtimeStandaloneMain - Throwable caught at startup, committing seppuku
    java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:108)
        at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:54)
        at druid.examples.RealtimeStandaloneMain.main(RealtimeStandaloneMain.java:89)
    Caused by: java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'maxRowsInMemory' (in class com.metamx.druid.realtime.FireDepartmentConfig)
     at [Source: rand_realtime.spec; line: 1, column: 1]
        at com.google.common.base.Throwables.propagate(Throwables.java:156)
        at com.metamx.druid.realtime.RealtimeNode.initializeFireDepartments(RealtimeNode.java:252)
        at com.metamx.druid.realtime.RealtimeNode.doInit(RealtimeNode.java:172)
        at com.metamx.druid.QueryableNode.init(QueryableNode.java:352)
        at com.metamx.druid.realtime.RealtimeNode.start(RealtimeNode.java:202)
        ... 7 more
    Caused by: com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'maxRowsInMemory' (in class com.metamx.druid.realtime.FireDepartmentConfig)
     at [Source: rand_realtime.spec; line: 1, column: 1]
        at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)

    ....

Realtime node cannot start back up if there is a "merged" directory on disk

The realtime nodes periodically merge data and store it in a local directory for pushing up to deep storage. When these directories are left lying around, they screw up the startup logic with exceptions like:

[2013-04-17 15:58:43,178] ERROR RuntimeException aborted realtime processing[rb_flow]: {class=com.metamx.druid.realtime.RealtimeManager, exceptionType=class java.lang.NumberFormatException, exceptionMessage=For input string: "merged"} (co
java.lang.NumberFormatException: For input string: "merged"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at com.metamx.druid.realtime.plumber.RealtimePlumberSchool$1.bootstrapSinksFromDisk(RealtimePlumberSchool.java:377)
at com.metamx.druid.realtime.plumber.RealtimePlumberSchool$1.startJob(RealtimePlumberSchool.java:188)
at com.metamx.druid.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:159)

We need to fix this in two ways:

  1. Make the restart logic resilient to leftover directories (see the sketch after this list)
  2. Store the merged directories somewhere else so that they don't get in the way in the first place
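A sketch of fix (1), with an illustrative body (only the method name mirrors the trace): tolerate non-numeric directory names such as "merged" while bootstrapping, instead of letting Integer.parseInt() abort realtime processing.

import java.io.File;

final class SinkBootstrap
{
  static void bootstrapSinksFromDisk(File sinkDir)
  {
    File[] entries = sinkDir.listFiles();
    if (entries == null) {
      return;
    }
    for (File entry : entries) {
      final int hydrantNumber;
      try {
        hydrantNumber = Integer.parseInt(entry.getName());
      } catch (NumberFormatException e) {
        continue; // a leftover "merged" (or other) directory: skip, don't crash
      }
      System.out.println("loading persisted hydrant " + hydrantNumber);
      // ... load the persisted data for this hydrant ...
    }
  }
}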

Memory configuration settings should be verified at startup

There are a number of memory configuration settings that need to be set properly in order for the system to work. -XX:MaxDirectMemorySize, druid.computation.buffer.size, and druid.processing.numThreads are the ones that made me write this issue.

The initialization process should check that it believes the settings will allow proper operation of the node, and fail fast if they appear to be misconfigured.
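A fail-fast sketch of such a check; the (numThreads + 1) buffer formula and the parameter names are assumptions for illustration, not Druid's actual rule:

final class MemoryCheck
{
  static void verify(long maxDirectMemoryBytes, long computationBufferBytes, int numProcessingThreads)
  {
    // Assumed sizing rule: one buffer per processing thread, plus one spare.
    long required = computationBufferBytes * (numProcessingThreads + 1L);
    if (maxDirectMemoryBytes < required) {
      throw new IllegalStateException(String.format(
          "Need at least %,d bytes of direct memory for druid.computation.buffer.size * "
              + "(druid.processing.numThreads + 1), but -XX:MaxDirectMemorySize allows only %,d",
          required, maxDirectMemoryBytes));
    }
  }
}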

The POM for com.metamx:java-util:jar:0.16.0 is missing?

I just cloned the Druid git repo. Trying to execute build.sh, I get:

Downloading: https://repo.bazaarvoice.com:443/nexus/content/groups/bazaarvoice/com/metamx/java-util/0.16.0/java-util-0.16.0.pom
[WARNING] The POM for com.metamx:java-util:jar:0.16.0 is missing, no dependency information available
Downloading: https://repo.bazaarvoice.com:443/nexus/content/groups/bazaarvoice/it/uniroma3/mat/extendedset/1.3.2/extendedset-1.3.2.pom
[WARNING] The POM for it.uniroma3.mat:extendedset:jar:1.3.2 is missing, no dependency information available
[WARNING] The POM for com.metamx:java-util:jar:tests:0.16.0 is missing, no dependency information available

it then causes an error:

[ERROR] Failed to execute goal on project druid-common: Could not resolve dependencies for project com.metamx.druid:druid-common:jar:0.1.7-SNAPSHOT: The following artifacts could not be resolved: com.metamx:java-util:jar:0.16.0, it.uniroma3.mat:extendedset:jar:1.3.2, com.metamx:java-util:jar:tests:0.16.0: Could not find artifact com.metamx:java-util:jar:0.16.0 in bazaarvoice (https://repo.bazaarvoice.com:443/nexus/content/groups/bazaarvoice) -> [Help 1]

What am I doing wrong?

GroupBy query not multi-threaded?

The GroupBy query is not running in multiple threads because of the mergeRunners() method on GroupByQueryRunnerFactory.

We should add a configuration switch to choose between using one thread or multiple, and have it default to using multiple.
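A sketch of the proposed switch (the property name is illustrative): pick a single-threaded or pooled executor from configuration, defaulting to multi-threaded.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class GroupByExecutors
{
  static ExecutorService create()
  {
    // Defaults to "false", i.e. multi-threaded, as the issue proposes.
    boolean singleThreaded =
        Boolean.parseBoolean(System.getProperty("druid.query.groupBy.singleThreaded", "false"));
    return singleThreaded
           ? Executors.newSingleThreadExecutor()
           : Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
  }
}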

Improve result merging code

The current way that results are merged is sub-optimal when a cache is used. When the cache is not used, the merge can stream off of the servers and return results as they come back. When using the cache, it has to wait for at least some result from every server before it can even start returning results.

We need to eliminate the separate "mergeSequences" calls from the CacheStrategy and the QueryToolChest. The CachingClient should be able to put things in the right order such that only the tool chest's mergeSequences() is required.

Feature Request: Query Monitor

It would be very helpful to have a monitor URL on the broker that reports on currently running queries.

This would help to get a feel for the query load and simplify debugging ("it's not answering; where in the stack did it get stuck?").

HadoopDruidIndexer breaks when timestampColumn is not all lowercase

The Parser converts all the field names into lowercase values, but the HadoopDruidIndexerConfig does not do the same with the timestampColumn field, so when it looks for the actual column, it breaks with an exception like:

Caused by: java.lang.IllegalArgumentException: null timestamp
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at com.metamx.common.parsers.TimestampParser$5.apply(TimestampParser.java:119)
at com.metamx.common.parsers.TimestampParser$5.apply(TimestampParser.java:115)
at com.metamx.druid.indexer.IndexGeneratorJob$IndexGeneratorMapper.map(IndexGeneratorJob.java:189)
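A sketch of the fix, with illustrative names: fold the configured timestampColumn to lower case the same way the Parser folds field names, so the lookup cannot miss.

import java.util.Locale;
import java.util.Map;

final class TimestampLookup
{
  static Object timestampValue(Map<String, Object> parsedRow, String timestampColumn)
  {
    // Match the Parser's lower-casing of field names.
    return parsedRow.get(timestampColumn.toLowerCase(Locale.ENGLISH));
  }
}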

Granularity day and timezoned queries

Hi,

My users need a "per day in UTC+2" granularity.

I can send the request that way, but Druid responds in UTC, i.e. the buckets are always UTC instead of being based on the timezone of the query.
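A zone-aware bucketing sketch using Joda-Time (which Druid depends on): the day bucket for an event is the start of its day in the query's time zone, here UTC+2, rather than in UTC.

import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

final class DayBuckets
{
  // Truncate a timestamp to the start of its day in the given zone.
  static DateTime dayBucket(long timestampMillis, DateTimeZone zone)
  {
    return new DateTime(timestampMillis, zone).withTimeAtStartOfDay();
  }

  public static void main(String[] args)
  {
    System.out.println(dayBucket(System.currentTimeMillis(), DateTimeZone.forOffsetHours(2)));
  }
}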

Improve indexing code to simplify merge and persist operations

The indexing code is getting old, and the method of updating the storage format essentially amounts to writing in an old version and converting up multiple times. This is less than optimal and not a good sign for future maintenance.

We need to look at the indexing code and fix it up to make it better.

Realtime node query results are dependent on the number of persists

These are the relevant details from the mail thread on druid-dev.

The rand example was used with these settings:
"indexGranularity": "minute"
"maxRowsInMemory": 2
"maxGeneratedRows": 6
"nTokens": 2

The result should only contain 2 rows but it contains 6.

The groupBy query result using "all" granularity:

{"version":"v1","timestamp":"2012-12-03T19:00:00.000Z","event":{"target":"a1","rows":1,"eventsSum":1.0,"randomNumberSum":1.1248835325241089,"avg_random":1.1248835325241089}
{"version":"v1","timestamp":"2012-12-03T19:00:00.000Z","event":{"target":"a2","rows":1,"eventsSum":1.0,"randomNumberSum":2.730257511138916,"avg_random":2.730257511138916}
{"version":"v1","timestamp":"2012-12-03T19:00:00.000Z","event":{"target":"a1","rows":1,"eventsSum":1.0,"randomNumberSum":1.773929476737976,"avg_random":1.773929476737976}
{"version":"v1","timestamp":"2012-12-03T19:00:00.000Z","event":{"target":"a2","rows":1,"eventsSum":1.0,"randomNumberSum":2.5145485401153564,"avg_random":2.5145485401153564}
{"version":"v1","timestamp":"2012-12-03T19:00:00.000Z","event":{"target":"a1","rows":1,"eventsSum":1.0,"randomNumberSum":1.8219218254089355,"avg_random":1.8219218254089355}
{"version":"v1","timestamp":"2012-12-03T19:00:00.000Z","event":{"target":"a2","rows":1,"eventsSum":1.0,"randomNumberSum":2.066697597503662,"avg_random":2.066697597503662}

Eric:

Ah, yes, it is dependent on the number of persists. Right now, the Broker will merge those results together if the query were routed through it, but the realtime node is not merging them together. That is something that needs fixing, yes.
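A sketch of the missing merge step, assuming rows shaped like the example output above (everything else is illustrative): partial rows from separate persists that share the same (timestamp, target) key get their aggregates summed, as the Broker already does for results routed through it.

import java.util.HashMap;
import java.util.Map;

final class PartialResultMerger
{
  // Each partial row: {timestamp, target, rows, eventsSum, randomNumberSum}.
  static Map<String, double[]> merge(Iterable<Object[]> partialRows)
  {
    Map<String, double[]> merged = new HashMap<>();
    for (Object[] row : partialRows) {
      String key = row[0] + "|" + row[1]; // the (timestamp, target) grouping key
      double[] sums = merged.get(key);
      if (sums == null) {
        sums = new double[3];
        merged.put(key, sums);
      }
      sums[0] += ((Number) row[2]).doubleValue(); // rows
      sums[1] += ((Number) row[3]).doubleValue(); // eventsSum
      sums[2] += ((Number) row[4]).doubleValue(); // randomNumberSum
    }
    return merged;
  }
}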

Feature Request: size-based rules

It would be really nice to have a rule set like:

  • Keep the newest X GB on the hot cluster
  • Keep everything else on the longtail

If there is more than one data source with a size-based rule (and/or the cluster is smaller than the configured sizes), the values should be considered relative.
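The relative interpretation is simple arithmetic; a sketch with illustrative names: when the hot tier cannot hold every size-based budget, scale each budget by the tier's share of the total request.

final class SizeRules
{
  static long effectiveBudgetBytes(long requestedBytes, long totalRequestedBytes, long hotTierBytes)
  {
    if (totalRequestedBytes <= hotTierBytes) {
      return requestedBytes; // everything fits: honor the configured size
    }
    // Otherwise give each rule its proportional share of the hot tier.
    return (long) ((double) requestedBytes / totalRequestedBytes * hotTierBytes);
  }
}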

Failure to parse a timestamp kills Realtime consumer

When a timestamp cannot be properly parsed, it fails the whole realtime consumer, because the parser throws an IllegalArgumentException instead of the ParseException the code is currently looking for. We need to fix that.
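A sketch of the fix, with ParseException as a stand-in for the exception type the consumer already treats as a skippable bad row: translate the parser's IllegalArgumentException so one bad timestamp does not kill the whole consumer.

final class SafeTimestampParser
{
  // Stand-in for the ParseException the realtime code already catches.
  static class ParseException extends RuntimeException
  {
    ParseException(String message, Throwable cause)
    {
      super(message, cause);
    }
  }

  static long parseMillis(String value)
  {
    try {
      return org.joda.time.DateTime.parse(value).getMillis();
    } catch (IllegalArgumentException e) {
      throw new ParseException("Unparseable timestamp: " + value, e);
    }
  }
}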

HadoopDruidIndexer breaks in weird ways when the interval doesn't match the data

When running HadoopDruidIndexer with uniform granularity, an interval, and no partitioning, it only generates segments for the data that actually exists, but it tries to load up metadata for the entire interval range. Given that it hasn't actually generated segments for most of that range, nothing shows up, which triggers the retry mechanism, which never finds any data and fails all around.

It should do a better job of handling this case.

Master has issues assigning segments and logging on 0.3.27.2

One user experienced a bunch of logs that look like

[2013-04-17 17:22:56,670] WARN Not enough _default_tier servers or node capacity to assign segment[rb_flow_2013-04-17T17:12:00.000Z_2013-04-17T17:13:00.000Z_2013-04-17T17:12:00.000Z_-1]! Expected Replicants[2] (com.metamx.druid.master.rules.LoadRule)
[2013-04-17 17:22:56,670] WARN Not enough _default_tier servers or node capacity to assign segment[rb_flow_2013-04-17T17:09:00.000Z_2013-04-17T17:10:00.000Z_2013-04-17T17:09:00.000Z_-1]! Expected Replicants[2] (com.metamx.druid.master.rules.LoadRule)

And then

[2013-04-17 17:22:56,673] ERROR Caught exception, ignoring so that schedule keeps going.: {class=com.metamx.druid.master.DruidMaster, exceptionType=class java.lang.NullPointerException, exceptionMessage=null} (com.metamx.druid.master.DruidMaster)
java.lang.NullPointerException
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
        at com.google.common.collect.Maps$TransformedEntriesMap.<init>(Maps.java:1165)
        at com.google.common.collect.Maps.transformEntries(Maps.java:1064)
        at com.metamx.druid.master.DruidMasterLogger.run(DruidMasterLogger.java:97)
        at com.metamx.druid.master.DruidMaster$MasterRunnable.run(DruidMaster.java:617)
        at com.metamx.druid.master.DruidMaster$3.call(DruidMaster.java:459)
        at com.metamx.druid.master.DruidMaster$3.call(DruidMaster.java:452)
        at com.metamx.common.concurrent.ScheduledExecutors$2.run(ScheduledExecutors.java:99)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

This happened when running the master. They are only running a single compute node, so it cannot fully replicate, but that should not stop the Master from assigning segments.

It also should not produce an NPE when logging (I'm actually not sure whether that is what's causing the NPE, but it seems like a decent avenue to pursue).

Nodes in zookeeper should have a type

Currently it's not possible to safely tell nodes apart in ZooKeeper. Either I have to rely on druid.service being different for each node type (which is just a naming convention) or probe the node via HTTP GETs. Neither is very nice for client libs.

Proposal:

add {"type": "broker|master" } to the entry stored in zk

EC2AutoScalingStrategy should be more user friendly

EC2AutoScalingStrategy right now takes an AMI ID and a userdata file as configuration, which means that if you want to update the worker AMI/userdata, you have to update the configuration and restart the process for all of the coordinators that you have running.

It should be possible to update this via a UI of some sort, in a way such that I update it once and, as long as that succeeds, other coordinators will see it as well. Perhaps store the values in ZK or MySQL? For the UI, JConsole is sufficient in my mind, but an HTTP UI and RESTful interfaces would be best.
