Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OAK-11139 Allow downloading only recently changed nodes from MongoDB #1734

Merged
merged 9 commits into from
Oct 15, 2024

Conversation

thomasmueller
Copy link
Member

No description provided.

}

public void setMinModified(long minModified) {
this.minModified = minModified;
Copy link
Contributor

@tihom88 tihom88 Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oak documentation mentions:
The _modified field contains an indexed low-resolution timestamp when the node was last modified. The time resolution is five seconds. This field is also updated when a branch commit modifies a node.
If I am understanding it correctly anything updated till 4 sec will be set to 0 as per time resolution.
i.e for a 14th second update _modified property will be set to 10 sec.
For a 29th second update _modified property will be set to 25 sec.

So to make sure that we don't miss any update
I think minModified should be
this.minModified = minModified - minModified/5

@stefan-egli , @mreutegg am I correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minModified is the minimum (lower bound) of the "_modified" property.

Copy link
Member Author

@thomasmueller thomasmueller Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything updated till 4 sec will be set to 0

Anything between on January 1st, 1970 midnight (up to ~5 seconds later), will be set to 0.

@@ -76,12 +76,21 @@ public class IndexerSupport {
private File indexDefinitions;
private final String checkpoint;
private File existingDataDumpDir;
private long minModified;
Copy link
Contributor

@tihom88 tihom88 Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use lastModifed instead of minModified as we are using _modified property of mongMK which denotes lastModified timestamp?

Oak document:
The _modified field contains an indexed low-resolution timestamp when the node was last modified.

Copy link
Member Author

@thomasmueller thomasmueller Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Last modified" refers to the last time a node was modified. It refers to the property "jcr:lastModified", which may or may not exist.

"Min modified" means the minimum value (lower bound) of the "_modified" property.

_modified property of mongMK which denotes lastModified timestamp

The "_modified" property does not denote the "jcr:lastModified" timestamp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, _modified is independent of jcr:lastModified. My point was to just add info about what this property maps to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand now! Thanks!

}
treeStore.removeNode(line.path);
break;
case REMOVE_IF_EXISTS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference from a behaviour point of view between the 2 modes is that there is warning logged for DELETE, which doesn't seem to warrant a new mode. But do you forsee different implementations going forward? In that case maybe it's ok to have these different

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's the point: "REMOVE_IF_EXISTS" should not log a warning, but "DELETE" should.

I think both implementations are needed: one for the "diff" command (DELETE), and one for the "top-up" command (REMOVE_IF_EXISTS).

Copy link
Contributor

@amit-jain amit-jain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@nit0906
Copy link
Contributor

nit0906 commented Oct 15, 2024

I have one query on the overall top up implementation - when getting the topup directly from mongo using the _modified filter - we would miss the deleted nodes - right ?

Eventually this would lead to larger and larger index stores ?

We filter the non-existant nodes in queries, so even if this won't effect query results - but would it still impact the performance ? Let's say a bulk delete on a large content repository just before the top up.

@thomasmueller
Copy link
Member Author

thomasmueller commented Oct 15, 2024

we would miss the deleted nodes - right ?

Only for those deleted nodes that the "diff" doesn't see. (My plan is to run both the diff and the top-up.)

Eventually this would lead to larger and larger index stores ?

Until the bugs in the "diff" parts are fixed, yes. But the verification would still notice it, and so we could re-download everything once we see that it's too many.

Let's say a bulk delete on a large content repository just before the top up.

If the bugs in the "diff" logic are not fixed, then I think the behavior is the same as it is now. This is because "diff" is also used for async index update.

If it turns out that the "diff" logic can not be fixed easily, then we could also change the download logic (pipelined strategy) to emit "removed" entries. But I think we don't need to implement this logic currently.

@thomasmueller thomasmueller merged commit e80723e into trunk Oct 15, 2024
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants