-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New diff query and parser #3280
Conversation
76c2bc1
to
0861581
Compare
|
0861581
to
5fbfcda
Compare
5fbfcda
to
2454472
Compare
b29cca8
to
470cf90
Compare
470cf90
to
109f160
Compare
1d4a286
to
4df897b
Compare
109f160
to
a56dcb6
Compare
3f4a3a1
to
d124131
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments, looks good to me though I'd probably need to spend a bit more time to understand everything.
|
||
class DiffSummarySerializer: | ||
async def serialize(self, diff_root: DiffRoot) -> dict[str, Any]: # pylint: disable=unused-argument | ||
return {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will these classes be used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I didn't really tie them into the rest of the skeleton
I've deleted them for now
|
||
class DiffCombiner: | ||
def combine(self, earlier_diff: DiffRoot, later_diff: DiffRoot) -> DiffRoot: # pylint: disable=unused-argument | ||
return earlier_diff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is mostly a placeholder for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes there are a few stub classes in here for me/the team to fill in later
await self.diff_repo.save_diff_root(diff_root=calculated_diffs.diff_branch_diff) | ||
missing_time_range_diffs.append(calculated_diffs.diff_branch_diff) | ||
full_time_range_diffs = calculated_timeframe_diffs + missing_time_range_diffs | ||
full_time_range_diffs.sort(key=lambda dr: dr.from_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to support this sort option on the objects themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might at some point, but right now it is just a list, so I think sorting it with the method is acceptable for now
@@ -0,0 +1,12 @@ | |||
from neo4j.graph import Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially add a from __future__ import annotations
and move this import into a TYPECHECKING block.
Not sure where it will be imported but it's probably better to be sure if we later run agents that doesn't have access to the database then we don't want the neo4j module to be loaded at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. made some updates to do this
New query and Python parsing class for getting a diff for a given
base_branch
,diff_branch
,from_time
andto_time
.The Query
The goal is to get every "full path" (Root -> Node -> Attribute/Relationship -> Value/Other Node) that has been altered on one of the given branches in the given time frame. First, we identify every edge that should be included in the diff. This is actually pretty straightforward b/c we are just filtering edges based on
branch
,from
, andto
like we do in all our cypher queries. Determining which "full paths" to include in the response is somewhat harder and is what most of the lines in the query are working on.There are only 6 basic paths we need to consider about for a given diff
Each path has 4 cypher nodes and 3 cypher edges in it. The query runs in this order
MATCH (p:Node|Attribute|Relationship)-[diff_rel]->(q)
get every cypher edge that is on one of the branches in the diff and has a change within the timeframe of the diff. This should be relatively fast b/c we've got indicesfrom
andbranch
onHAS_ATTRIBUTE
andHAS_VALUE
, but it might be worth adding more on the other edge types in the query:IS_PART_OF
,IS_RELATED
, etc.HAS_VALUE
,IS_VISIBLE
,IS_PROTECTED
,HAS_OWNER
, andHAS_SOURCE
. if the path identified is on thediff_branch
, we need to make sure we include the latestbase_branch
version of that path so that we can correctly setprevious_value
andnew_value
. if the edge in question touches aRelationship
node, we also need to get the far side of that relationship so that we can correctly set thepeer_id
of the relationship being updated in the diff.HAS_ATTRIBUTE
orIS_RELATED
. this includes getting all the paths with the latest edges in the given timeframe below the diff edge, preferring paths on the same branch as thediff_rel
. For example, if aHAS_ATTRIBUTE
edge is included in the diff, we would also want to include all theHAS_VALUE
,IS_VISIBLE
,IS_PROTECTED
,HAS_SOURCE
, andHAS_OWNER
paths that connect to theHAS_ATTRIBUTE
via the linkedAttribute
nodeIS_PART_OF
. this includes getting all the paths with the latest edges in the given timeframe below the diff edge, preferring paths on the same branch as thediff_rel
. For example, if a node is added or deleted on the branch, thenIS_PART_OF
will be included in ourdiff_rel
s and we need to get all the latest paths in the given timeframe with preference for the branch of thediff_rel
The Python parser
The
DiffQueryParser
class and its associated dataclasses ingest every "full path" returned by the query and turn them into a simple hierarchical datastructure (see dataclasses incore.diff.model.path
). Hopefully, this simple internal data structure is enough to handle everything we want to do with the diffRemaining work
This new logic is not actually used anywhere yet. We will need to replace existing diff logic with this new query and class and perhaps make changes along the way.
kind
of the nodes in the diff