OAK-10341 Tree store (#1577)
* OAK-10341 Tree store

* OAK-10341 Tree store

* OAK-10341 Tree store

* OAK-10341 Tree store

* OAK-10341 Tree store

* OAK-10944: oak-auth-ldap: update commons-pool2 dependency to 2.12.0 (#1576)

* OAK-10705: oak-standalone: update dependencies (#1411)

Updated dependencies.

* OAK-10905 | Add a configurable async checkpoint creator service (#1560)

* OAK-10905 | Add license header to AsyncCheckpointService (#1579)

* OAK-10848: commons: remove use of slf4j.event.Level in SystemPropertySupplier implementation (#1580)

* OAK-10685: remove unused import of java.io.UnsupportedEncodingException

* OAK-10949: blob-cloud, segment-aws: update aws SDK to 1.12.761 (dependencies reference vulnerable amazon ion-java version) (#1581)

* OAK-10954: Update spotbugs plugin to 4.8.6.2 (#1588)

* OAK-10959: webapp: update Tomcat dependency to 9.0.90 (#1589)

* OAK-10960: blob-cloud, segment: update netty version to 4.1.111 (#1590)

* OAK-10945: Remove usage of Guava Function interface (#1578)

* OAK-10945: Remove usage of Guava Function interface (oak-upgrade)

* OAK-10945: Remove usage of Guava Function interface (oak-store-spi)

* OAK-10945: Remove usage of Guava Function interface (oak-it)

* OAK-10945: Remove usage of Guava Function interface (oak-store-document)

* OAK-10945: Remove usage of Guava Function interface (oak-store-composite)

* OAK-10945: Remove usage of Guava Function interface (oak-segment-tar)

* OAK-10945: Remove usage of Guava Function interface (oak-segment-azure)

* OAK-10945: Remove usage of Guava Function interface (oak-security-spi)

* OAK-10945: Remove usage of Guava Function interface (oak-security-search)

* OAK-10945: Remove usage of Guava Function interface (oak-run-commons)

* OAK-10945: Remove usage of Guava Function interface (oak-run)

* OAK-10945: Remove usage of Guava Function interface (oak-lucene)

* OAK-10945: Remove usage of Guava Function interface (oak-jcr)

* OAK-10945: Remove usage of Guava Function interface (oak-exercise)

* OAK-10945: Remove usage of Guava Function interface (oak-core-spi)

* OAK-10945: Remove usage of Guava Function interface (oak-core)

* OAK-10945: Remove usage of Guava Function interface (oak-commons)

* OAK-10945: Remove usage of Guava Function interface (oak-blob-plugins)

* OAK-10945: Remove usage of Guava Function interface (oak-blob-cloud, oak-blob-cloud-azure)

* OAK-10945: Remove usage of Guava Function interface (oak-upgrade) - cleanup

* OAK-10945: Remove usage of Guava Function interface (oak-store-spi) - cleanup

* OAK-10945: Remove usage of Guava Function interface (oak-store-document) - cleanup

* OAK-10945: Remove usage of Guava Function interface (oak-store-composite) - cleanup

* OAK-10945: Remove usage of Guava Function interface (oak-segment-tar) - cleanup

* OAK-10945: Remove usage of Guava Function interface (oak-lucene) - cleanup

* OAK-10945: Remove usage of Guava Function interface (oak-jcr) - cleanup
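All of the OAK-10945 commits above apply the same mechanical change; a representative before/after (an editor-added sketch, not an actual hunk from these commits) looks like this:

    // before: Guava's Function interface
    com.google.common.base.Function<String, String> upperGuava = s -> s.toUpperCase();
    // after: the JDK equivalent from java.util.function
    java.util.function.Function<String, String> upperJdk = s -> s.toUpperCase();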

* OAK-10962: oak-solr-osgi: update zookeeper dependency to 3.9.2 (#1591)

* Update build.yml to disable Sonar for now

...because of failures, as in https://github.com/apache/jackrabbit-oak/actions/runs/10021673018

* OAK-10965 - Make ConsoleIndexingReporter thread safe. (#1592)

* OAK-6762: Convert oak-blob to OSGi R7 annotations (#1413)

done

* OAK-6773: Convert oak-store-composite to OSGi R7 annotations (#1489)

done

* OAK-10951 - Add a new configuration property (#1594)

- "oak.indexer.persistedLinkedList.cacheSize" - sets the cache size of the PersistedLinkedList used to traverse the FFS. This controls the number FFS entries kept in memory.

* OAK-10966 - Avoid object creation in PathUtils.isAncestor (#1596)

* OAK-10964: bump nimbus-jose-jwt dependency to latest (#1593)

* OAK-6773: Convert oak-store-composite to OSGi R7 annotations - fix line ends

* OAK-10803 -- compress/uncompress property, disabled by default (#1526)

Co-authored-by: pirlogea <[email protected]>

* OAK-10974 and OAK-10869: temporarily disabling flaky tests

* OAK-10971 - Add a method to test if a path is a direct ancestor of another: PathUtils.isDirectAncestor()  (#1598)

* OAK-10803: fix NPE when Mongo is unavailable, remove '*' imports (#1601)

* OAK-10976 - Avoid unnecessary call to PathUtils.getName in IndexDefinition (#1603)

* OAK-10904: Close token refresh executor service after access token is no longer needed (#1545)

* OAK-10904: change token refresh executor to use daemon thread

* OAK-10904: add log

* OAK-10904: move storage credential method to concrete class and explicitly call close to shut down the executor

* OAK-10904: wrap code for generating storage credentials in try catch

* OAK-10904: minor changes

* OAK-10904: refactor log

* OAK-10977 - Cleanup IndexDefinition class (#1604)

- replace Guava usages with JDK equivalents
- fix typos
- improve formatting

* OAK-10978 - Skip Azure compaction when there's not enough garbage in the repository (#1606)

* OAK-10966 - Indexing job: create optimized version of PersistedLinkedList (#1595)

* Revert "OAK-10966 - Indexing job: create optimized version of PersistedLinkedList (#1595)"

This reverts commit 8a72ef8.

* Revert "OAK-10978 - Skip Azure compaction when there's not enough garbage in the repository (#1606)"

This reverts commit 5814638.

* Revert "OAK-10977 - Cleanup IndexDefinition class (#1604)"

This reverts commit ce5c7df.

* Revert "OAK-10904: Close token refresh executor service after access token is no longer needed (#1545)"

This reverts commit 632a15b.

* Revert "OAK-10976 - Avoid unnecessary call to PathUtils.getName in IndexDefinition (#1603)"

This reverts commit e411960.

* Revert "OAK-10803: fix NPE when Mongo is unavailable, remove '*' imports (#1601)"

This reverts commit 5dd6344.

* Revert "OAK-10971 - Add a method to test if a path is a direct ancestor of another: PathUtils.isDirectAncestor()  (#1598)"

This reverts commit c416850.

* Revert "OAK-10974 and OAK-10869 : temporarily disabling flaky tests"

This reverts commit aefb990.

* Revert "OAK-10803 -- compress/uncompress property, disabled by default (#1526)"

This reverts commit 25792e7.

* Revert "OAK-6773: Convert oak-store-composite to OSGi R7 annotations - fix line ends"

This reverts commit 818317c.

* Revert "OAK-10964: bump nimbus-jose-jwt dependency to latest (#1593)"

This reverts commit 9528fdd.

* Revert "OAK-10966 - Avoid object creation in PathUtils.isAncestor (#1596)"

This reverts commit b37db4c.

* Revert "OAK-10951 - Add a new configuration property (#1594)"

This reverts commit ebefe01.

* Revert "OAK-6773: Convert oak-store-composite to OSGi R7 annotations (#1489)"

This reverts commit 0521c63.

* Revert "OAK-6762: Convert oak-blob to OSGi R7 annotations (#1413)"

This reverts commit 6c44805.

* Revert "OAK-10965 - Make ConsoleIndexingReporter thread safe. (#1592)"

This reverts commit 8b05cae.

* Revert "Update build.yml to disable Sonar for now"

This reverts commit 8db9183.

* Revert "OAK-10962: oak-solr-osgi: update zookeeper dependency to 3.9.2 (#1591)"

This reverts commit 95ed6c4.

* Revert "OAK-10945: Remove usage of Guava Function interface (#1578)"

This reverts commit 5d56db1.

* Revert "OAK-10960: blob-cloud, segment: update netty version to 4.1.111 (#1590)"

This reverts commit cd35db7.

* Revert "OAK-10959: webapp: update Tomcat dependency to 9.0.90 (#1589)"

This reverts commit 5fb55e0.

* Revert "OAK-10954: Update spotbugs plugin to 4.8.6.2 (#1588)"

This reverts commit b098fd8.

* Revert "OAK-10949: blob-cloud, segment-aws: update aws SDK to 1.12.761 (dependencies reference vulnerable amazon ion-java version) (#1581)"

This reverts commit cafbd29.

* Revert "OAK-10685: remove unused import of java.io.UnsupportedEncodingException"

This reverts commit 36cfcfa.

* Revert "OAK-10848: commons: remove use of slf4j.event.Level in SystemPropertySupplier implementation (#1580)"

This reverts commit ba530fe.

* Revert "OAK-10905 | Add license header to  AsyncCheckpointService (#1579)"

This reverts commit 7f112fa.

* Revert "OAK-10905 | Add a configurable  async checkpoint creator service (#1560)"

This reverts commit f12ccb1.

* Revert "OAK-10705: oak-standalone: update dependencies (#1411)"

This reverts commit dc10a12.

* Revert "OAK-10944: oak-auth-ldap: update commons-pool2 dependency to 2.12.0 (#1576)"

This reverts commit 3d6d009.

* OAK-10341 Tree store (bugfix for loggers)

* OAK-10341 Tree store (PipelinedTreeStoreStrategy.java)

* OAK-10341 Tree store (tests)

* OAK-10341 Tree store (use less memory)

* OAK-10341 Tree store (fix memory cache calculation)

* OAK-10341 Tree store (blob prefetch)

* OAK-10341 Tree store (blob prefetch)

* OAK-10341 Tree store (blob prefetch)

* OAK-10341 Tree store (node prefetch)

* OAK-10341 Tree store (incremental)

* OAK-10341 Tree store (pack files)

* OAK-10341 Tree store (pack files)

* OAK-10341 Tree store (tests)

* OAK-10341 Tree store (incremental)

* OAK-10341 Tree store (compress)

* OAK-10341 Tree store (traverse included paths)

* OAK-10341 Tree store

* OAK-10341 Tree store

* OAK-10341 Tree store

* OAK-10341 Tree store

* OAK-10341 Tree store (use same blob prefetching configuration as for FlatFileStore)

* OAK-10341 Tree store (javadocs)

* OAK-10341 Tree store (code review)

---------

Co-authored-by: Julian Reschke <[email protected]>
Co-authored-by: mbaedke <[email protected]>
Co-authored-by: nit0906 <[email protected]>
Co-authored-by: Nuno Santos <[email protected]>
Co-authored-by: Tushar <[email protected]>
Co-authored-by: ionutzpi <[email protected]>
Co-authored-by: pirlogea <[email protected]>
Co-authored-by: Stefan Egli <[email protected]>
Co-authored-by: Andrei Dulceanu <[email protected]>
10 people authored Sep 6, 2024
1 parent 1e24f55 commit bb4ecd1
Showing 53 changed files with 7,850 additions and 28 deletions.
--------------------------------------------------------------------
@@ -239,6 +239,16 @@ nodeStore, getMongoDocumentStore(), traversalLog))
         return storeList;
     }
 
+    public IndexStore buildTreeStore() throws IOException, CommitFailedException {
+        String old = System.setProperty(FlatFileNodeStoreBuilder.OAK_INDEXER_SORT_STRATEGY_TYPE,
+                FlatFileNodeStoreBuilder.SortStrategyType.PIPELINED_TREE.name());
+        try {
+            return buildFlatFileStore();
+        } finally {
+            System.setProperty(FlatFileNodeStoreBuilder.OAK_INDEXER_SORT_STRATEGY_TYPE, old);
+        }
+    }
+
     public IndexStore buildStore() throws IOException, CommitFailedException {
         return buildFlatFileStore();
     }
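One note on the save-and-restore idiom in buildTreeStore above: System.setProperty returns the previous value, or null when the property was unset, and setting a property to null throws a NullPointerException. A null-safe variant of the same idiom (an editor sketch, not the commit's code) would be:

    String key = FlatFileNodeStoreBuilder.OAK_INDEXER_SORT_STRATEGY_TYPE;
    String old = System.setProperty(key, FlatFileNodeStoreBuilder.SortStrategyType.PIPELINED_TREE.name());
    try {
        return buildFlatFileStore();
    } finally {
        if (old == null) {
            System.clearProperty(key); // the property was not set before
        } else {
            System.setProperty(key, old);
        }
    }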
--------------------------------------------------------------------
File: AheadOfTimeBlobDownloadingFlatFileStore.java

@@ -82,7 +82,15 @@ private AheadOfTimeBlobDownloadingFlatFileStore(FlatFileStore ffs, CompositeInde
         }
     }
 
-    static boolean isEnabledForIndexes(String indexesEnabledPrefix, List<String> indexPaths) {
+    /**
+     * Whether blob downloading is needed for the given indexes.
+     *
+     * @param indexesEnabledPrefix the comma-separated list of prefixes of the index
+     *            definitions that benefit from the download
+     * @param indexPaths the index paths
+     * @return true if any of the indexes start with any of the prefixes
+     */
+    public static boolean isEnabledForIndexes(String indexesEnabledPrefix, List<String> indexPaths) {
         List<String> enableForIndexes = splitAndTrim(indexesEnabledPrefix);
         for (String indexPath : indexPaths) {
             if (enableForIndexes.stream().anyMatch(indexPath::startsWith)) {
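To illustrate the prefix matching described in the new javadoc (the prefix and index paths here are invented for the example, not taken from the commit):

    // true: "/oak:index/lucene-2024" starts with the prefix "/oak:index/lucene-"
    boolean enabled = AheadOfTimeBlobDownloadingFlatFileStore.isEnabledForIndexes(
            "/oak:index/lucene-",
            Arrays.asList("/oak:index/lucene-2024", "/oak:index/uuid"));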
--------------------------------------------------------------------
File: FlatFileNodeStoreBuilder.java

@@ -29,10 +29,14 @@
 import org.apache.jackrabbit.oak.index.indexer.document.CompositeException;
 import org.apache.jackrabbit.oak.index.indexer.document.CompositeIndexer;
 import org.apache.jackrabbit.oak.index.indexer.document.NodeStateEntryTraverserFactory;
+import org.apache.jackrabbit.oak.index.indexer.document.flatfile.pipelined.ConfigHelper;
 import org.apache.jackrabbit.oak.index.indexer.document.flatfile.pipelined.PipelinedStrategy;
+import org.apache.jackrabbit.oak.index.indexer.document.flatfile.pipelined.PipelinedTreeStoreStrategy;
 import org.apache.jackrabbit.oak.index.indexer.document.indexstore.IndexStore;
 import org.apache.jackrabbit.oak.index.indexer.document.indexstore.IndexStoreSortStrategy;
 import org.apache.jackrabbit.oak.index.indexer.document.indexstore.IndexStoreUtils;
+import org.apache.jackrabbit.oak.index.indexer.document.tree.Prefetcher;
+import org.apache.jackrabbit.oak.index.indexer.document.tree.TreeStore;
 import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
 import org.apache.jackrabbit.oak.plugins.document.RevisionVector;
 import org.apache.jackrabbit.oak.plugins.document.mongo.MongoDocumentStore;

@@ -122,7 +126,11 @@ public enum SortStrategyType {
         /**
          * System property {@link #OAK_INDEXER_SORT_STRATEGY_TYPE} if set to this value would result in {@link PipelinedStrategy} being used.
         */
-        PIPELINED
+        PIPELINED,
+        /**
+         * System property {@link #OAK_INDEXER_SORT_STRATEGY_TYPE} if set to this value would result in {@link PipelinedTreeStoreStrategy} being used.
+         */
+        PIPELINED_TREE,
     }
 
     public FlatFileNodeStoreBuilder(File workDir) {

@@ -224,20 +232,52 @@ public IndexStore build(IndexHelper indexHelper, CompositeIndexer indexer) throw
         entryWriter = new NodeStateEntryWriter(blobStore);
         IndexStoreFiles indexStoreFiles = createdSortedStoreFiles();
         File metadataFile = indexStoreFiles.metadataFile;
-        FlatFileStore store = new FlatFileStore(blobStore, indexStoreFiles.storeFiles.get(0), metadataFile,
-                new NodeStateEntryReader(blobStore),
-                unmodifiableSet(preferredPathElements), algorithm);
+        File file = indexStoreFiles.storeFiles.get(0);
+        IndexStore store;
+        if (file.isDirectory()) {
+            store = buildTreeStoreForIndexing(indexHelper, file);
+        } else {
+            store = new FlatFileStore(blobStore, file, metadataFile,
+                    new NodeStateEntryReader(blobStore),
+                    unmodifiableSet(preferredPathElements), algorithm);
+        }
         if (entryCount > 0) {
             store.setEntryCount(entryCount);
         }
         if (indexer == null || indexHelper == null) {
             return store;
         }
-        if (withAheadOfTimeBlobDownloading) {
-            return AheadOfTimeBlobDownloadingFlatFileStore.wrap(store, indexer, indexHelper);
-        } else {
-            return store;
-        }
+        if (withAheadOfTimeBlobDownloading && store instanceof FlatFileStore) {
+            FlatFileStore ffs = (FlatFileStore) store;
+            return AheadOfTimeBlobDownloadingFlatFileStore.wrap(ffs, indexer, indexHelper);
+        }
+        return store;
     }
 
+    public IndexStore buildTreeStoreForIndexing(IndexHelper indexHelper, File file) {
+        TreeStore indexingTreeStore = new TreeStore(
+                "indexing", file,
+                new NodeStateEntryReader(blobStore), 10);
+        indexingTreeStore.setIndexDefinitions(indexDefinitions);
+
+        // use a separate tree store (with a smaller cache)
+        // for prefetching, to avoid cache evictions
+        TreeStore prefetchTreeStore = new TreeStore(
+                "prefetch", file,
+                new NodeStateEntryReader(blobStore), 3);
+        prefetchTreeStore.setIndexDefinitions(indexDefinitions);
+        String blobPrefetchEnableForIndexes = ConfigHelper.getSystemPropertyAsString(
+                AheadOfTimeBlobDownloadingFlatFileStore.BLOB_PREFETCH_ENABLE_FOR_INDEXES_PREFIXES, "");
+        Prefetcher prefetcher = new Prefetcher(prefetchTreeStore, indexingTreeStore);
+        String blobSuffix = "";
+        if (AheadOfTimeBlobDownloadingFlatFileStore.isEnabledForIndexes(
+                blobPrefetchEnableForIndexes, indexHelper.getIndexPaths())) {
+            blobSuffix = ConfigHelper.getSystemPropertyAsString(
+                    AheadOfTimeBlobDownloadingFlatFileStore.BLOB_PREFETCH_BINARY_NODES_SUFFIX, "");
+        }
+        prefetcher.setBlobSuffix(blobSuffix);
+        prefetcher.startPrefetch();
+        return indexingTreeStore;
+    }
+
     public List<IndexStore> buildList(IndexHelper indexHelper, IndexerSupport indexerSupport,

@@ -351,15 +391,24 @@ IndexStoreSortStrategy createSortStrategy(File dir) {
                 log.warn("TraverseWithSortStrategy is deprecated and will be removed in the near future. Use PipelinedStrategy instead.");
                 return new TraverseWithSortStrategy(nodeStateEntryTraverserFactory, preferredPathElements, entryWriter, dir,
                         algorithm, pathPredicate, checkpoint);
-            case PIPELINED:
+            case PIPELINED: {
                 log.info("Using PipelinedStrategy");
                 List<PathFilter> pathFilters = indexDefinitions.stream().map(IndexDefinition::getPathFilter).collect(Collectors.toList());
                 List<String> indexNames = indexDefinitions.stream().map(IndexDefinition::getIndexName).collect(Collectors.toList());
                 indexingReporter.setIndexNames(indexNames);
                 return new PipelinedStrategy(mongoClientURI, mongoDocumentStore, nodeStore, rootRevision,
                         preferredPathElements, blobStore, dir, algorithm, pathPredicate, pathFilters, checkpoint,
                         statisticsProvider, indexingReporter);
-
+            }
+            case PIPELINED_TREE: {
+                log.info("Using PipelinedTreeStoreStrategy");
+                List<PathFilter> pathFilters = indexDefinitions.stream().map(IndexDefinition::getPathFilter).collect(Collectors.toList());
+                List<String> indexNames = indexDefinitions.stream().map(IndexDefinition::getIndexName).collect(Collectors.toList());
+                indexingReporter.setIndexNames(indexNames);
+                return new PipelinedTreeStoreStrategy(mongoClientURI, mongoDocumentStore, nodeStore, rootRevision,
+                        preferredPathElements, blobStore, dir, algorithm, pathPredicate, pathFilters, checkpoint,
+                        statisticsProvider, indexingReporter);
+            }
         }
         throw new IllegalStateException("Not a valid sort strategy value " + sortStrategyType);
     }
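Putting the pieces together, a caller could opt in to the new strategy and the blob prefetch via system properties. The following is an editor sketch (the prefix value is invented; the constants are the ones referenced in the diff above):

    // select the tree-store based pipelined strategy
    System.setProperty(FlatFileNodeStoreBuilder.OAK_INDEXER_SORT_STRATEGY_TYPE,
            FlatFileNodeStoreBuilder.SortStrategyType.PIPELINED_TREE.name());
    // enable ahead-of-time blob prefetch for index definitions matching the prefix
    System.setProperty(
            AheadOfTimeBlobDownloadingFlatFileStore.BLOB_PREFETCH_ENABLE_FOR_INDEXES_PREFIXES,
            "/oak:index/lucene-");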
--------------------------------------------------------------------
File: BoundedHistogram.java

@@ -32,7 +32,7 @@
  * histogram are correct but if the histogram overflowed, it may be missing some entries.
  */
 public class BoundedHistogram {
-    private static final Logger LOG = LoggerFactory.getLogger(PipelinedStrategy.class);
+    private static final Logger LOG = LoggerFactory.getLogger(BoundedHistogram.class);
     private final ConcurrentHashMap<String, LongAdder> histogram = new ConcurrentHashMap<>();
     private volatile boolean overflowed = false;
     private final String histogramName;
--------------------------------------------------------------------
File: ConfigHelper.java

@@ -22,7 +22,7 @@
 import org.slf4j.LoggerFactory;
 
 public class ConfigHelper {
-    private static final Logger LOG = LoggerFactory.getLogger(PipelinedStrategy.class);
+    private static final Logger LOG = LoggerFactory.getLogger(ConfigHelper.class);
 
     public static int getSystemPropertyAsInt(String name, int defaultValue) {
         int result = Integer.getInteger(name, defaultValue);
--------------------------------------------------------------------
@@ -160,7 +160,7 @@ public Result call() throws Exception {
             for (NodeDocument nodeDoc : nodeDocumentBatch) {
                 statistics.incrementMongoDocumentsTraversed();
                 mongoObjectsProcessed++;
-                if (mongoObjectsProcessed % 50000 == 0) {
+                if (mongoObjectsProcessed % 50_000 == 0) {
                     LOG.info("Mongo objects: {}, total entries: {}, current batch: {}, Size: {}/{} MB",
                             mongoObjectsProcessed, totalEntryCount, nseBatch.numberOfEntries(),
                             nseBatch.sizeOfEntriesBytes() / FileUtils.ONE_MB,
(remaining changed files not shown)
