Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEATURE: Introduce NodeIdentity #4868

Closed
wants to merge 6 commits into from

Conversation

mhsdesign
Copy link
Member

@mhsdesign mhsdesign commented Feb 2, 2024

Resolves partially: #4564
Required for a new NodeUriBuilder api #4552 which in-turn "should" solve cross cr linking: #4441

The NodeIdentity will contain WorkspaceName instead of the ContentStreamId

For most, if not all use cases, like caching a reference to a node in fusions @cache.context or serializing a node into the frontend and passing it around as query parameter, the workspace name should be used.

That guarantees that a node is still findable, when a workspace will be rebased (which can happen anytime, and thus the csid would be changed

Upgrade instructions

Review instructions

  • Userland should not need to operate on cs ids
  • Events must operate on cs ids
  • getSubgraph arguments
    • WorkspaceName → internally resolved to cs id and will do lookup instantly (CACHE)
    • DimensionSpacePoint
    • VisibilityConstraints
  • NodeFactory::mapNodeRowToNode must accept WorkspaceName and ContentStreamId from outside.
  • => Extend SubgraphIdentity
  • command handlers should still operate on cs ids via getSubgraphByIdentity - INTERNAL
    • by that both apis always pass the WorkspaceName
  • Node should know its
    • ContentRepositoryId
    • WorkspaceName
      • NOT: ContentSubgraphIdentity.
      • NOT: ContentStreamId
    • DimensionSpacePoint it was fetched from
    • VisibilityConstraints
      • so it can be passed as single instance and eel helpers can fetch parents e.t.c in the same context
    • OriginDimensionSpacePoint (as before)
  • WorkspaceName::fromString() must be stricter without colons (so that we could allow in the future for extension, e.g. WorkspaceName::detached(ContentStreamid $csId) (with an internal representation like “cs:”)

#4943 (comment) but maybe manual refresh instead of implicit through handle.

TODOs as discussed with @mhsdesign and @kitsunet today:

  • add workspaceName <-> contentStreamId table to content graph projection
  • add workspaceName -> contentStreamId lookup to ContentGraph::getSubgraph() (with runtime cache in the ContentGraph – the same workspace will resolve to the same CS ID in one request, maybe reset runtime cache via markStale())
  • rewrite ContentGraph::find* queries to use workspaceName instead of contentStreamId (IMO it's totally fine to join the workspaceName <-> cs id table here for those queries. Alternatively we could add a dedicated lookup + runtime cache)
  • remove SubgraphIdentity

Afterwards we have to find out whether there are places that need to be able to override the worspace name -> cs id mapping for tests or CR constraints. If so we could either:

  • Add a separate, internal, ContentGraph::getSubgraphInternal() – but that would be dangerous because it is part of the public interface (even if marked internal)
  • Introduce a ContentsubgraphFactoryInterface that can be used via CR service
  • Provide a way to hook into the runtime cache, e.g. $contentGraph->dangerousSetContentStreamIdForWorkspaceNameFooBar(...)
  • Extract workspaceName<->cs id mapping to external service that can be replaced/hooked into

Checklist

  • Code follows the PSR-2 coding style
  • Tests have been created, run and adjusted as needed
  • The PR is created against the lowest maintained branch
  • Reviewer - PR Title is brief but complete and starts with FEATURE|TASK|BUGFIX
  • Reviewer - The first section explains the change briefly for change-logs
  • Reviewer - Breaking Changes are marked with !!! and have upgrade-instructions

@mhsdesign mhsdesign marked this pull request as draft February 7, 2024 16:23
@mhsdesign mhsdesign force-pushed the feature/NodeIdentityDto branch 2 times, most recently from 4e56a48 to 8912655 Compare February 12, 2024 21:11
@mhsdesign mhsdesign changed the title FEATURE: Introduce NodeIdentity Dto FEATURE: Introduce NodeIdentity and NodeSerializer Feb 12, 2024
@mhsdesign mhsdesign force-pushed the feature/NodeIdentityDto branch from 8912655 to cd7ad38 Compare February 12, 2024 21:18
@mhsdesign mhsdesign marked this pull request as ready for review February 14, 2024 12:52
Copy link
Member

@bwaidelich bwaidelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

My main concern (not with this PR but in general regarding this topic) is the implicit distinction between the "content stream" and the "workspace" view.

E.g. ContentSubgraphIdentity vs NodeIdentity

Copy link
Member

@bwaidelich bwaidelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As recap from the weekly:
I like the NodeIdentity DTO and I think that it really makes sense.

NodeSerializer does not really make sense to me yet and I think it highlights some inconsistencies we currently have.

I could imagine that we could change the Node read model from:

subgraphIdentity:
  contentRepositoryId
  contentStreamId
  dimensionSpacePoint
  visibilityConstraints
nodeAggregateId
originDimensionSpacePoint
...

to

identity
  contentRepositoryId
  workspaceName
  dimensionSpacePoint
  nodeAggregateId
originDimensionSpacePoint
...

But I'm not sure about all implications yet

@kitsunet
Copy link
Member

@bwaidelich the suggested properties for node would prevent us from ever addressing a node that is in a contentstream without workspace attachment. I think it's fine to not plan for this from outside, but we might need to load one eg. for conflict resolution?

@bwaidelich
Copy link
Member

I just discussed this topic with @skurfuerst and he also agrees to getting rid of the CS id in the public APIs as much as possible.

We do probably still need visibilityConstraints in the Node read model because we use them to "stay in the same visibility context" when passing nodes via Fusion.
Personally I would prefer some required Fusion runtime variable for that (similar to request) but that would probably be too big of a change.
visibilityConstraints should not be part of the NodeIdentity though but it could be a property of the Node itself (and SubgraphIdentity can probably removed afterwards).

@kitsunet re

the suggested properties for node would prevent us from ever addressing a node that is in a contentstream without workspace attachment

That's true and might be a potential issue.
Sebastian suggested to introduce something like a VirtualWorkspaceName (or TransientWorkspaceName or some similar name) that reflects a CS id without a corresponding workspace (a bit similar to a detached head in Git)
This could be a concept of the read model exclusively, i.e.

class NodeIdentity
{
    private function __construct(
        // ...
        public WorkspaceName|VirtualWorkspaceName $workspaceName
// ...

For write operations, i.e. commands, we would always require a WorkspaceName

@mhsdesign mhsdesign marked this pull request as draft February 16, 2024 21:21
@mhsdesign mhsdesign changed the title FEATURE: Introduce NodeIdentity and NodeSerializer FEATURE: Introduce NodeIdentity Feb 20, 2024
@mhsdesign
Copy link
Member Author

mhsdesign commented Feb 20, 2024

I implemented some wip stuff, edit moved to https://github.com/mhsdesign/neos-development-collection/tree/feature/NodeIdentityDto-original-draft

Details

ala (my goal was to not have to keep track of the workspaces in the content graph projection)

final class ContentRepository
{
    public function getSubgraph(NodeIdentity $nodeIdentity, VisibilityConstraints $visibilityConstraints): ContentSubgraphInterface
    {
        if ($nodeIdentity->workspaceName instanceof DetachedWorkspaceName) {
            return $this->getContentGraph()->getSubgraph(
                $nodeIdentity->workspaceName->contentStreamId,
                $nodeIdentity->dimensionSpacePoint,
                $visibilityConstraints
            );
        }
        $workspace = $this->getWorkspaceFinder()->findOneByName($nodeIdentity->workspaceName);
        // ...


interface ContentGraphInterface extends ProjectionStateInterface
{
    public function getSubgraph(
        ContentStreamId|Workspace $contentStreamReference,
        DimensionSpacePoint $dimensionSpacePoint,
        VisibilityConstraints $visibilityConstraints
    ): ContentSubgraphInterface;


final readonly class NodeIdentity implements \JsonSerializable
{
    private function __construct(
        public ContentRepositoryId $contentRepositoryId,
        public WorkspaceName|DetachedWorkspaceName $workspaceName,
        public DimensionSpacePoint $dimensionSpacePoint,
        public NodeAggregateId $nodeAggregateId,
    ) {
    }


final class ContentSubgraph implements ContentSubgraphInterface
{
    // todo what if there was already a subgraph with runtime cache constructed and different / no workspace name?
    public function __construct(
        private readonly ContentRepositoryId $contentRepositoryId,
        private readonly ContentStreamId $contentStreamId,
        private readonly ?WorkspaceName $workspaceName,


final class NodeFactory
{
    public function __construct(
        private readonly ContentRepositoryId $contentRepositoryId,
        private readonly NodeTypeManager $nodeTypeManager,
        private readonly PropertyConverter $propertyConverter
    ) {
    }

    /**
     * @param array<string,string> $nodeRow Node Row from projection (<prefix>_node table)
     */
    public function mapNodeRowToNode(
        array $nodeRow,
        DimensionSpacePoint $dimensionSpacePoint,
        VisibilityConstraints $visibilityConstraints,
        ?WorkspaceName $workspaceName,
    ): Node {
        $nodeType = $this->nodeTypeManager->hasNodeType($nodeRow['nodetypename'])
            ? $this->nodeTypeManager->getNodeType($nodeRow['nodetypename'])
            : null;

        return Node::create(
            NodeIdentity::create(
                $this->contentRepositoryId,
                $workspaceName ?? DetachedWorkspaceName::fromContentStreamId(
                    ContentStreamId::fromString($nodeRow['contentstreamid'])
                ),
                $dimensionSpacePoint,
                NodeAggregateId::fromString($nodeRow['nodeaggregateid'])
            ),

@kitsunet
Copy link
Member

The VirtualWorkspaceName idea seems fine, I think that should cover most bases, I like it :)

Re the visibility constraints, IMHO it makes sense to know where exactly that node is from, to refer to other nodes, eg. if I get one of those nodes handed in userland code without them being available on the node I would need to pass that in parallel to not suddenly drop into a different visibility if I need a subtree for this node.

@bwaidelich
Copy link
Member

eg. if I get one of those nodes handed in userland code without them being available on the node I would need to pass that in parallel

IMO that would be the "better" way to solve this: To traverse the graph from a given node, you'd need the visibility constraints, too.
Merging those into the node read model for convenience blurs the responsibilities (e.g. whether they should be part of some caching identifier depends on the scenario).

But the usecase you describe is exactly why we would need it for now because the alternative would be to introduce the visibility constraints as some kind of context to the fusion runtime

@bwaidelich
Copy link
Member

Any update on this one?
It's one of the major outstandig changes AFAIR and it blocks some security related features.
Let me know if I can help!

@mhsdesign
Copy link
Member Author

Jup. #4790 should be merged first as i was about to adjust the node factory and was like wait a second, dont make it harder with the conflicts ;)

@mhsdesign mhsdesign force-pushed the feature/NodeIdentityDto branch from ae02b4c to c9feaca Compare March 18, 2024 23:10
The nodeadress was added 6 years ago without the concept of multiple crs.
It will be replaced by the node identity
@bwaidelich
Copy link
Member

@mhsdesign Thanks for the update and let me know if I can help with this one!

@mhsdesign mhsdesign force-pushed the feature/NodeIdentityDto branch from c9feaca to e37a4b8 Compare March 19, 2024 22:13
@@ -175,6 +196,7 @@ public function mapNodeRowsToNodeAggregate(
// ... so we handle occupation exactly once ...
$nodesByOccupiedDimensionSpacePoints[$occupiedDimensionSpacePoint->hash] = $this->mapNodeRowToNode(
$nodeRow,
WorkspaceName::forLive(), // todo use workspace name in content graph api
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeAggregates also need to operate on workspace names, but for that we would have to adjust the content graph api. That might go in a separate pr.

public function findNodeAggregateById(
    WorkspaceName $workspaceName,
    NodeAggregateId $nodeAggregateId
): ?NodeAggregate;

the cs id would be looked up internally via the workspaces table.
And the cache would be similar flushed on markStale as the cached subgraphs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this would be that the write site heavily relies on the ContentGraph and must work on a specific contentstream id (see ContentStreamIdOverride)

unless we create a custom ContentGraph which will use our content stream, or by actually hacky checking against the state in ContentStreamIdOverride we wont get around that.

@@ -71,6 +71,8 @@ private function handleMoveDimensionSpacePoint(
$streamName = ContentStreamEventStreamName::fromContentStreamId($command->contentStreamId)
->getEventStreamName();

// todo use WorkspaceName here as well but the command doesnt expose it
// https://github.com/neos/neos-development-collection/issues/4942
self::requireDimensionSpacePointToBeEmptyInContentStream(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to find out why #4942 is how it is.
These were the only commands untouched by bernhards refactoring.

Comment on lines +153 to +154
// todo UNSAFE as override via ContentStreamIdOverride will NOT be taken into account!!!
$command->workspaceName,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this MUST not be used as a custom repoint via ContentStreamIdOverride would be ignored.
We possibly have to pass both as touple and create a new ContentGraph by hand.

Comment on lines +59 to 60
// todo use workspace name instead!!!
$usage->contentStreamId,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess adding the workspace to the AssetUsage will not be trivial, so we should just use the workspace finder beforehand and get a workspace that will be internally resolved again to a content stream?

@@ -193,7 +193,7 @@ private function handleChangeNodeAggregateType(
$tetheredNodeName = NodeName::fromString($serializedTetheredNodeName);

$subgraph = $contentRepository->getContentGraph()->getSubgraph(
$node->subgraphIdentity->contentStreamId,
$node->identity->workspaceName,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is illegal as well because of ContentStreamIdOverride.

@mhsdesign
Copy link
Member Author

Thanks for all the discussion here, but this branch is super outdated. I cherry picked all the changes to a new pr and we can continue on a blank page: #5042

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants