Support embedded alluxio cache in hive #20658
d4fbec3 to 8398a20
@@ -305,7 +311,7 @@ else if (maxSplitBytes * 2 >= remainingBlockBytes) {
 internalSplit.getFileModifiedTime(),
 internalSplit.getSchema(),
 internalSplit.getPartitionKeys(),
-block.getAddresses(),
+cachingHostAddressProvider.getHosts(internalSplit.getPath(), block.getAddresses()),
Do we need to extend the interface with defaultAddresses? We could also do
block.getAddresses().isEmpty() ? cachingHostAddressProvider.getHosts(internalSplit.getPath()) : block.getAddresses(),
Block addresses are populated when the filesystem is HDFS. When caching is used with HDFS, we still want caching to drive split scheduling decisions rather than HDFS block locality.
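The trade-off in this thread can be sketched roughly as follows. This is an illustration only, not the PR's code: `HostProvider`, `PassThroughHostProvider`, and `PathHashHostProvider` are hypothetical names, hosts are modeled as plain strings, and the simple path-hash selection is an assumed stand-in for whatever host selection the cache actually performs. It shows why passing the default addresses into the provider lets the caching case ignore HDFS block locality, while the non-caching case keeps the connector's own addresses.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the provider discussed in this thread.
// Hosts are plain strings here; the real types in the PR may differ.
interface HostProvider
{
    List<String> getHosts(String splitPath, List<String> defaultAddresses);
}

// Caching disabled: return whatever the connector supplied
// (for example, HDFS block locations, or an empty list on object storage).
final class PassThroughHostProvider
        implements HostProvider
{
    @Override
    public List<String> getHosts(String splitPath, List<String> defaultAddresses)
    {
        return defaultAddresses;
    }
}

// Caching enabled: ignore the defaults and pick a worker deterministically from the
// path, so the same file is always scheduled on the same node even when HDFS reports
// block locations. Plain modulo hashing is an assumption made for this sketch.
final class PathHashHostProvider
        implements HostProvider
{
    private final List<String> workers;

    PathHashHostProvider(List<String> workers)
    {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("workers must not be empty");
        }
        this.workers = List.copyOf(workers);
    }

    @Override
    public List<String> getHosts(String splitPath, List<String> defaultAddresses)
    {
        int index = Math.floorMod(splitPath.hashCode(), workers.size());
        return List.of(workers.get(index));
    }
}
```

With the `isEmpty()` alternative suggested above, a split on HDFS would always carry non-empty block addresses, so the cache-aware host would never be chosen. In this sketch the caching provider receives the defaults and is free to discard them.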
...em-cache-alluxio/src/main/java/io/trino/filesystem/alluxio/AlluxioFileSystemCacheModule.java
b269320 to ee8145b
...no-filesystem/src/main/java/io/trino/filesystem/cache/DefaultCachingHostAddressProvider.java
lib/trino-filesystem/src/main/java/io/trino/filesystem/cache/CachingHostAddressProvider.java
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeSplitManager.java
Allows connector to use its own host addresses for split scheduling when caching is not enabled
ee8145b to 9c476f7
@raunaqmorarka merge it!
Description
Support embedded alluxio cache in hive
Additional context and related issues
Part of #20550
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: