DRILL-8504: Add Schema Caching to Splunk Plugin #2929

Open. @cgivre wants to merge 12 commits into base: master

Contributor

@cgivre cgivre commented Jul 24, 2024

DRILL-8504: Add Schema Caching to Splunk Plugin

Description

Whenever Drill executes a Splunk query, it must retrieve a list of indexes from Splunk. This step can add a considerable amount of time to the planning phase. This PR introduces a simple in-memory cache for the Splunk plugin which caches the list of indexes to avoid having to query Splunk repeatedly to obtain this information.

This PR also makes a few unrelated minor improvements:

  • Updates the test container to Splunk version 9.3, which at the time of writing is the most current version. Some unit tests had to be updated as a result.
  • Adds a new config option for the maximum number of columns returned by Splunk.
  • Adds the actual SPL sent to Splunk to the query plan, which can be useful for debugging.

Documentation

(Added to README)
For every query that you send to Splunk from Drill, Drill will have to pull schema information from Splunk. If you have a lot of indexes, this process can cause slow planning time. To improve planning time, you can configure Drill to cache the index names so that it does not need to make additional calls to Splunk.

Two configuration parameters control schema caching: maxCacheSize and cacheExpiration. maxCacheSize defaults to 10k bytes and cacheExpiration defaults to 1024 minutes. To disable schema caching, set cacheExpiration to a value less than zero.
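
The caching behaviour described above can be sketched roughly as follows. This is only an illustration of the semantics, not the code in this PR: the PR uses Caffeine, while the sketch uses plain JDK collections so that it runs without extra dependencies. The class name ExpiringIndexCache, the get method, and the cache key are hypothetical, and the sketch bounds the cache by entry count, whereas the README text describes maxCacheSize in bytes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not the class added by the PR.
class ExpiringIndexCache {

  // One cached index list plus the time at which it was loaded.
  private static final class CachedValue {
    final List<String> indexes;
    final long loadedAtMillis;

    CachedValue(List<String> indexes, long loadedAtMillis) {
      this.indexes = indexes;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final long expirationMillis; // negative means caching is disabled
  private final LongSupplier clock;    // injectable so expiry can be tested
  private final Map<String, CachedValue> entries;

  // Assumption: maxEntries counts entries; the PR describes maxCacheSize in bytes.
  ExpiringIndexCache(final int maxEntries, long expirationMinutes, LongSupplier clock) {
    this.expirationMillis = expirationMinutes < 0 ? -1L : expirationMinutes * 60_000L;
    this.clock = clock;
    // An access-ordered LinkedHashMap drops the least recently used entry when full.
    this.entries = new LinkedHashMap<String, CachedValue>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, CachedValue> eldest) {
        return size() > maxEntries;
      }
    };
  }

  // Returns the cached index list, calling the loader only on a miss or after expiry.
  synchronized List<String> get(String key, Supplier<List<String>> loader) {
    if (expirationMillis < 0) {
      return loader.get(); // caching disabled: always ask the source
    }
    long now = clock.getAsLong();
    CachedValue cached = entries.get(key);
    if (cached != null && now - cached.loadedAtMillis < expirationMillis) {
      return cached.indexes;
    }
    List<String> fresh = loader.get();
    entries.put(key, new CachedValue(fresh, now));
    return fresh;
  }

  public static void main(String[] args) {
    ExpiringIndexCache cache = new ExpiringIndexCache(100, 1024, System::currentTimeMillis);
    // The loader stands in for the call that asks Splunk for its index names.
    Supplier<List<String>> loader = () -> {
      System.out.println("querying Splunk for indexes (simulated)");
      return Arrays.asList("main", "_internal");
    };
    System.out.println(cache.get("splunk", loader)); // first call runs the loader
    System.out.println(cache.get("splunk", loader)); // second call is served from the cache
  }
}
```

Passing the clock in as a parameter is a choice made for the sketch so that expiry can be exercised without waiting; a library such as Caffeine handles expiry and eviction internally.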

Testing

Ran all unit tests and tested manually.

@cgivre cgivre self-assigned this Jul 24, 2024
@cgivre cgivre requested a review from jnturton July 25, 2024 16:29
@cgivre cgivre changed the title Add Index Cache to Splunk Plugin DRILL-8504: Add Schema Caching to Splunk Plugin Jul 29, 2024
@cgivre cgivre marked this pull request as ready for review July 29, 2024 15:00
@cgivre cgivre added the enhancement (PRs that add a new functionality to Drill), doc-impacting (PRs that affect the documentation), and dependencies labels Jul 30, 2024
<dependency>
<groupId>com.github.ben-manes.caffeine</groupId>
<artifactId>caffeine</artifactId>
<version>2.9.3</version>
Contributor

Can we achieve the same thing using Guava's caching? The reason I ask is that we already have this insanely big dependency tree and Guava is already in it...

https://www.baeldung.com/guava-cache

Contributor

But so is Caffeine, now that I look! So I guess we can ignore this suggestion.

Contributor Author

@jnturton Is that a +1? I somehow broke the versioning when I rebased on the current master, but I'll fix it before merging.

Contributor

Sorry, I got pulled away before I could continue but will complete the review today.

Contributor Author

cgivre commented Aug 25, 2024

@jnturton It looks like the GitHub CI is failing on the Hadoop 2 tests with Hive.

Labels: dependencies, doc-impacting (PRs that affect the documentation), enhancement (PRs that add a new functionality to Drill)
2 participants