DRILL-8504: Add Schema Caching to Splunk Plugin #2929
base: master
<dependency>
<groupId>com.github.ben-manes.caffeine</groupId>
<artifactId>caffeine</artifactId>
<version>2.9.3</version>
Can we achieve the same thing using Guava's caching? The reason I ask is that we already have this insanely big dependency tree and Guava is already in it...
But so is caffeine now that I look! So I guess we can ignore this suggestion.
@jnturton Is that a +1? I somehow broke the versioning when I rebased on the current master, but I'll fix before merging.
Sorry, I got pulled away before I could continue but will complete the review today.
@jnturton It looks like the GitHub CI is failing on the Hadoop 2 tests with Hive.
DRILL-8504: Add Schema Caching to Splunk Plugin
Description
Whenever Drill executes a Splunk query, it must retrieve a list of indexes from Splunk. This step can add a considerable amount of time to the planning phase. This PR introduces a simple in-memory cache for the Splunk plugin which caches the list of indexes to avoid having to query Splunk repeatedly to obtain this information.
This PR also makes a few unrelated minor improvements:
Documentation
(Added to README)
For every query that you send to Splunk from Drill, Drill will have to pull schema information from Splunk. If you have a lot of indexes, this process can cause slow planning time. To improve planning time, you can configure Drill to cache the index names so that it does not need to make additional calls to Splunk.
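The caching idea described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the plugin's actual code: the class name `IndexNameCache`, the `get` method, the loader callback, and the injectable clock are all hypothetical, and the size bound here counts entries rather than bytes. The plugin itself depends on Caffeine, which would normally replace this hand-rolled map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of a size-bounded, time-limited cache of index names.
final class IndexNameCache {

  // A cached list of index names plus the time it was loaded.
  private static final class CachedNames {
    final List<String> names;
    final long loadedAtMillis;

    CachedNames(List<String> names, long loadedAtMillis) {
      this.names = names;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final long ttlMillis;      // a negative value disables caching
  private final LongSupplier clock;  // injectable so expiry can be tested
  private final Map<String, CachedNames> entries;

  IndexNameCache(int maxEntries, long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
    // Access-ordered map that drops the least recently used entry once full.
    this.entries = new LinkedHashMap<String, CachedNames>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, CachedNames> eldest) {
        return size() > maxEntries;
      }
    };
  }

  // Returns the cached names for the key, calling the loader only on a miss,
  // after expiry, or when caching is disabled.
  synchronized List<String> get(String key, Supplier<List<String>> loader) {
    if (ttlMillis < 0) {
      return loader.get();
    }
    long now = clock.getAsLong();
    CachedNames cached = entries.get(key);
    if (cached != null && now - cached.loadedAtMillis < ttlMillis) {
      return cached.names;
    }
    List<String> fresh =
        Collections.unmodifiableList(new ArrayList<>(loader.get()));
    entries.put(key, new CachedNames(fresh, now));
    return fresh;
  }
}
```

A caller would pass `System::currentTimeMillis` as the clock and a loader that performs the expensive index lookup, so that repeated planning calls within the expiration window reuse the stored list.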
There are two configuration parameters for the schema caching: maxCacheSize and cacheExpiration. The maxCacheSize defaults to 10k bytes and the cacheExpiration defaults to 1024 minutes. To disable schema caching, simply set the cacheExpiration parameter to a value less than zero.
Testing
Ran all unit tests and tested manually.