[POC] Generate mermaid diagrams from harmonized index schemas #4918

pepopowitz · 2025-02-02T09:20:49Z

Should not be merged as-is. This PR exists for the purpose of soliciting feedback.

Description

My hackathon project, a proof of concept of #4870.

Adds a new page containing diagrams of the main & runtime harmonized indexes. Includes scripts to generate those diagrams from the upstream index schemas.

Preview

https://preview.docs.camunda.cloud/pr-4918/docs/next/self-managed/operational-guides/backup-restore/harmonized-indexes/

What's in this PR

Copies the harmonized index schemas from upstream for ease of working. Long-term, I'd expect us to check out the camunda/camunda repo as part of the workflow.
Adds a script (1-combine-sources) to combine all the JSON files into one big one (or technically two, to retain the distinction between main & runtime indexes). I'm undecided if this would be useful for a long-term workflow.
Adds a script (2-generate-mermaid-diagrams) to generate proper markdown to emit mermaid diagrams for all schemas in a JSON file.
- Note that I would like extra hackathon points for not only writing unit tests during a hackathon, but using TDD to derive this logic.
Adds a script (2-generate-mermaid-output) to dump the generated markdown into a file.
Adds markdown partials to include that output in a new page.
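A minimal sketch of the transformation the generate script performs, for readers who don't want to open the diff. It is not the script in this PR: the function names are hypothetical, and it assumes each schema has already been reduced to a flat mapping of property name to definition.

```python
def entity_block(index_name, properties):
    """Render one index as a Mermaid erDiagram entity (type first, then name)."""
    lines = [f"    {index_name} {{"]
    for field, definition in properties.items():
        # Assumption: a property without an explicit type is treated as an object.
        field_type = definition.get("type", "object")
        lines.append(f"        {field_type} {field}")
    lines.append("    }")
    return "\n".join(lines)


def er_diagram(schemas):
    """Render every index in `schemas` into a single erDiagram body."""
    blocks = [entity_block(name, props) for name, props in schemas.items()]
    return "erDiagram\n" + "\n".join(blocks)


# Hypothetical, trimmed-down input using two properties of camunda-authorization.
schemas = {
    "camunda-authorization": {
        "id": {"type": "keyword"},
        "ownerKey": {"type": "long"},
    }
}
diagram = er_diagram(schemas)
print(diagram)
```

The output would still need to be wrapped in a mermaid code fence before being written to a markdown partial.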

Implementation notes

I put the new page under self-managed/backup and restore because that seemed like the most likely place someone might be interested in knowing what the schema looked like.
The schema definitions aren't truly an "Entity Relationship" diagram; that type of diagram seemed like the best fit that I could find.
The mermaid integration into docusaurus is such that very large diagrams become unreadably small. The diagram is limited to the width of the page, and is itself an SVG, and it gets scaled down to fit the page.
- I intentionally left 6 entities in one diagram so you can see this scaling in action, in the 4th row of entities as you scroll down the page.
- In response, I chose for this prototype to create diagrams of 3 schemas, and stack them on top of each other vertically. I looked briefly into adding scrollbars and making one big scrollable diagram, but I didn't have success with that.
- We do have the ability to control the layout algorithm, but neither of the available algorithms lays out the entities more readably than what you're seeing.
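The "3 schemas per diagram" workaround boils down to batching the index names before rendering. A rough sketch, assuming a hypothetical `chunked` helper and a batch size of 3 as in this prototype:

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive batches of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Index names mentioned in this PR; the real list is longer.
index_names = [
    "camunda-authorization",
    "camunda-group",
    "tasklist-task",
    "operate-decision",
    "operate-decision-requirements",
]

# Each batch would become its own stacked diagram on the page.
groups = list(chunked(index_names, 3))
```

Any relationship that crosses a batch boundary cannot be drawn with this approach, which is the trade-off described for implied relationships.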
There are multiple concepts in the schemas that we'll need to address somehow.
- object types: an index defines nested types in its schema. I chose to represent these using ERD relationships, extracting the nested type into a separate entity (see camunda-authorization index). This is inaccurate by ERD conventions, since the nested type is not a separate index, but it might be a better way to describe things?
  - An example would be grouping a firstName and lastName under a name, or this from our indexes:
```
"permissions": {
  "type": "object",
  "properties": {
    "type": {
      "type": "keyword"
    },
    "resourceIds": {
      "type": "keyword"
    }
  }
}
```
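To make the extraction concrete, here is a sketch of how a nested object could be pulled out into its own entity. The child entity name (`<index>-<field>`) and the function name are assumptions for illustration, not necessarily what the script in this PR does.

```python
def extract_entities(index_name, properties):
    """Split nested object properties into separate entities.

    Returns (entities, relationships), where entities maps an entity name to a
    list of (type, field) pairs and relationships is a list of
    (parent entity, child entity, field name) tuples.
    """
    entities = {index_name: []}
    relationships = []
    for field, definition in properties.items():
        if definition.get("type") == "object" and "properties" in definition:
            # Hypothetical naming scheme for the extracted entity.
            child_name = f"{index_name}-{field}"
            child_entities, child_rels = extract_entities(
                child_name, definition["properties"]
            )
            entities.update(child_entities)
            relationships.append((index_name, child_name, field))
            relationships.extend(child_rels)
        else:
            entities[index_name].append((definition.get("type", "object"), field))
    return entities, relationships


properties = {
    "id": {"type": "keyword"},
    "permissions": {
        "type": "object",
        "properties": {
            "type": {"type": "keyword"},
            "resourceIds": {"type": "keyword"},
        },
    },
}
entities, relationships = extract_entities("camunda-authorization", properties)
```

Each relationship tuple would then be rendered as an ERD relationship line between the index and its extracted entity.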
- join types: in this case, an item in an index references other items in the same index, through a defined relationship. This is not exactly a concept that a traditional Entity Relationship diagram handles out of the box. I tried joining a couple of indexes to themselves (see camunda-group and tasklist-task indexes) to represent this; I left the others as plain fields of type join.
  - Here's an example from a schema:
```
"joinRelation": {
   "type": "join",
   "eager_global_ordinals": true,
   "relations": {
     "processInstance": ["activity", "variable"]
   }
 },
```
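If the self-join treatment is kept, the relationship lines could be derived from the `relations` object. The sketch below is an assumption about how that might look; the function name is hypothetical, the index name used with the `processInstance` fragment is a placeholder, and the `group`/`member` relation for camunda-group is inferred from the diagram label rather than read from the schema.

```python
def join_relationship_lines(index_name, properties):
    """Render each parent/child pair of a join field as a self-relationship."""
    lines = []
    for definition in properties.values():
        if definition.get("type") != "join":
            continue
        for parent, children in definition.get("relations", {}).items():
            # A join relation may name a single child or a list of children.
            if isinstance(children, str):
                children = [children]
            for child in children:
                lines.append(
                    f'    {index_name} ||--o{{ {index_name}: "{parent}:{child}"'
                )
    return lines


# The fragment above, attached to a placeholder index name.
example = {
    "joinRelation": {
        "type": "join",
        "eager_global_ordinals": True,
        "relations": {"processInstance": ["activity", "variable"]},
    }
}
example_lines = join_relationship_lines("example-index", example)

# Assumed relation definition for camunda-group.
group_lines = join_relationship_lines(
    "camunda-group", {"join": {"type": "join", "relations": {"group": "member"}}}
)
```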
- There may be some implied "relationships" across indexes, which I did not represent. For example, the operate-decision index includes a decisionRequirementsKey property, which I think might refer to the key of an item in the operate-decision-requirements index.
  - It would make a more complete diagram to include these implied relationships, if I am understanding them correctly.
  - However, it would destroy my workaround for mermaid/docusaurus's tendency to squish large diagrams, as I would need all entities in one diagram. That isn't a reason not to do it, but including the relationships would mean I'd have to find a different workaround.
  - The implied relationships are not represented in the indexes. If we chose to represent them in the diagrams, I'd need to maintain a list of relationships here, separate from the upstream source. Again, not a reason not to do it, but it does introduce complexity and fragility.
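One way to contain that fragility would be to validate the hand-maintained list against the upstream schemas at generation time. This is only a sketch: the list format, the function name, the property types, and the cardinality are all assumptions, and the one entry reflects my unconfirmed reading of `decisionRequirementsKey`.

```python
# Hand-maintained list: (child index, referencing field, parent index).
# Whether this field really points at the other index is a guess, not a fact.
IMPLIED_RELATIONSHIPS = [
    ("operate-decision", "decisionRequirementsKey", "operate-decision-requirements"),
]


def implied_relationship_lines(relationships, schemas):
    """Render implied relationships, failing loudly if upstream has drifted."""
    lines = []
    for child, field, parent in relationships:
        for index_name in (child, parent):
            if index_name not in schemas:
                raise ValueError(f"unknown index: {index_name}")
        if field not in schemas[child]:
            raise ValueError(f"{child} has no property {field}")
        lines.append(f'    {parent} ||--o{{ {child}: "{field}"')
    return lines


# Trimmed-down, hypothetical schemas; the property types are placeholders.
schemas = {
    "operate-decision": {"decisionRequirementsKey": {"type": "long"}},
    "operate-decision-requirements": {"key": {"type": "long"}},
}
lines = implied_relationship_lines(IMPLIED_RELATIONSHIPS, schemas)
```

A renamed or removed upstream property would then break the generation step instead of silently producing a stale diagram.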

Other changes required before "done"

The generation of the markdown from JSON schemas would happen in a GitHub workflow.
Content for the page needs to be written. https://docs.google.com/document/d/1EFZ19Gx8Nf559pP_Bg8ObFMfGYdlq20age8P_WiSBOY/edit?tab=t.0#heading=h.c447h0byekxu is a good starting point for this. I will likely solicit a technical writer to help me with this 😅

Decisions to be made/feedback I'm interested in

Because I think it will be easier for reviewers, I will post a list of questions in a comment, so that you can reply to it with any of your feedback.

When should this change go live?

Never, at least not in this form!

github-actions · 2025-02-02T09:21:12Z

👋 🤖 🤔 Hello, @pepopowitz! Did you make your changes in all the right places?

These files were changed only in docs/. You might want to duplicate these changes in versioned_docs/version-8.6/.

docs/self-managed/operational-guides/backup-restore/_harmonized-indexes-main.md
docs/self-managed/operational-guides/backup-restore/_harmonized-indexes-runtime.md
docs/self-managed/operational-guides/backup-restore/harmonized-indexes.md

You may have done this intentionally, but we wanted to point it out in case you didn't. You can read more about the versioning within our docs in our documentation guidelines.

github-actions · 2025-02-02T09:39:47Z

The preview environment relating to the commit a6a5600 has successfully been deployed. You can access it at https://preview.docs.camunda.cloud/pr-4918/index.html

pepopowitz · 2025-02-03T20:31:35Z

Things I'm directly seeking feedback on

"Harmonized indexes" feels like a thing we call these internally, and I wonder if a user would know what that meant. Is there a better more simplified name for the page?
How do you feel about the location of the page? Should it be moved somewhere else?
Is this better than generating text-only tables? A downloadable diagram?
What do you think about the object representation I described? Do you think an extracted object in the diagram is confusing, since it technically lives on the original index object? Is it clear from the diagram what's going on?
What do you think about the join alternatives I presented in the description? Do you have a preference? Is there another way to represent this that might be better?
Do you think it would be better to include the implied relationships I described?
How many extra hackathon points will you give me for the unit tests & TDD?

And of course, anything else that's on your mind 😄

pepopowitz · 2025-02-03T20:47:52Z

docs/self-managed/operational-guides/backup-restore/_harmonized-indexes-main.md

+        long memberKey
+        join join
+    }
+    camunda-group ||--o{ camunda-group: "group:member"


I forgot to mention this but I snuck these join treatments into the generated markdown manually, to see what they would look like. The scripts would need to be updated to accommodate these, if we decide we like them.

akeller · 2025-02-05T20:52:28Z

Note that I would like extra hackathon points for not only writing unit tests during a hackathon, but using TDD to derive this logic.

🪙 🙌

akeller · 2025-02-05T21:23:00Z

The mermaid integration into docusaurus is such that very large diagrams become unreadably small. The diagram is limited to the width of the page, and is itself an SVG, and it gets scaled down to fit the page.

Zoomable images are back again, I see. @Sijoma was interested in Mermaid via this issue. He might have a larger example we could try in our docs to see how the experience would be. I fear most of our diagrams will end up being very large 🥲 .

"Harmonized indexes" feels like a thing we call these internally, and I wonder if a user would know what that meant. Is there a better more simplified name for the page?

This is probably a @ChrisKujawa question. It's part of platform unification and (IMO) users only really need to know about it for migrating from the multi-component concept. It's the Camunda platform core indices, but maybe with more capitalization.

How do you feel about the location of the page? Should it be moved somewhere else?

Good question. I think it's part of the architecture (self-managed/reference-architecture/#architecture) but also part of the update guide (self-managed/operational-guides/update-guide/introduction/). I don't know how often it would be referenced outside the context of updating.

Is this better than generating text-only tables? A downloadable diagram?

🤷‍♀️ Immediately, I wondered why we wouldn't offer multiple view/use options, but it doesn't have much to do with this presentation of info.

What do you think about the object representation I described? Do you think an extracted object in the diagram is confusing, since it technically lives on the original index object? Is it clear from the diagram what's going on?

I think this is clear, but I also think I cheated by looking at other representations of this data. I also don't have much feedback on the remaining questions because the diagram looks fairly simple...? But maybe I'm missing something.

How many extra hackathon points will you give me for the unit tests & TDD?

10x

ChrisKujawa · 2025-02-06T09:35:07Z

First of all I want to thank you @pepopowitz that you looked into this and spent your hackday on this topic 🚀 Really cool. 💪🏼

Things I'm directly seeking feedback on

"Harmonized indexes" feels like a thing we call these internally, and I wonder if a user would know what that meant. Is there a better more simplified name for the page?

Yeah, I think it should just be something like "Indices" or maybe "Secondary Storage Schema". For C7 we call the page "Database Schema".

How do you feel about the location of the page? Should it be moved somewhere else?

I think it might make sense to have this separate, maybe even in the Reference section 🤔 But yeah in general we can move it around I guess if we find a better spot.

Is this better than generating text-only tables? A downloadable diagram?

Good question. I was also wondering whether it would work to just generate markdown tables from this. I guess a visual is more interesting, especially if you want to show relations.
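For comparison, the table alternative could look roughly like this. It is a sketch with a hypothetical function name, assuming the same flat property mapping the diagram scripts would consume.

```python
def markdown_table(properties):
    """Render an index's properties as a two-column markdown table."""
    lines = ["| Property | Type |", "| --- | --- |"]
    for field, definition in properties.items():
        # Assumption: a property without an explicit type is an object.
        lines.append(f"| {field} | {definition.get('type', 'object')} |")
    return "\n".join(lines)


table = markdown_table(
    {
        "id": {"type": "keyword"},
        "ownerKey": {"type": "long"},
    }
)
print(table)
```

A table like this has no natural place for relations, which is the main argument for keeping a diagram.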

What do you think about the object representation I described? Do you think an extracted object in the diagram is confusing, since it technically lives on the original index object? Is it clear from the diagram what's going on?

I agree it is a bit confusing, but we can work around it with a different form or something 🤔 In general I liked that it shows this is contained in the index, but yeah, it might not be fully clear.

What do you think about the join alternatives I presented in the description? Do you have a preference? Is there another way to represent this that might be better?

The join isn't really clear from the visualization. What it actually means is that multiple entities can live in the same index/table. Maybe we can visualize this differently, via combined rows or something.

Do you think it would be better to include the implied relationships I described?

I think it would be interesting, but might introduce more complexity.

How many extra hackathon points will you give me for the unit tests & TDD?

At least 10 👍🏼 :D

And of course, anything else that's on your mind 😄

Since, as you described, there is an issue with the size of the images, I was wondering whether we could split them up by use case/context. For example, grouped by identity, decision execution, process execution, task execution, etc. Wdyt?

ChrisKujawa · 2025-02-06T09:38:24Z

docs/self-managed/operational-guides/backup-restore/_harmonized-indexes-main.md

+    camunda-authorization {
+        keyword id
+        long ownerKey
+        keyword ownerType
+        keyword resourceType


❓ One thing I was wondering was whether we could turn the values around, so that we first have the name of the property and then the type? That feels somehow more natural to me. What are your thoughts?

For example this is also done in the C7 ER diagram https://docs.camunda.org/manual/7.22/user-guide/process-engine/database/database-schema/#entity-relationship-diagrams
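In terms of the generated text, this would be a small change in how each attribute line is emitted. A sketch with a hypothetical helper follows; note that Mermaid's ER syntax reads the first token of an attribute as its type and the second as its name, so this only swaps the visual column order.

```python
def attribute_line(field, field_type, name_first=False):
    """Emit one attribute line, optionally with the property name first."""
    first, second = (field, field_type) if name_first else (field_type, field)
    return f"        {first} {second}"


current = attribute_line("ownerKey", "long")
proposed = attribute_line("ownerKey", "long", name_first=True)
```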

ChrisKujawa · 2025-02-06T11:29:04Z

Maybe @aleksander-dytko or @ingorichtsmeier want to give some input here as well :)

aleksander-dytko · 2025-02-07T12:58:14Z

@pepopowitz some thoughts:

Yeah, I think it should just be like Indicies or maybe "Secondary Storage Schema" something. For C7 we call the page Database Schema

I think we should officially introduce "Primary Storage" and "Secondary Storage" in our docs to describe the data pipeline. This would be useful for further reference, e.g. here

Visuals

I believe having a visual representation of the schema is useful for quickly getting oriented in C8. We could first show the list of indices and make each line clickable, linking to the details of the schema.

pepopowitz added 7 commits January 30, 2025 09:50

feat: basic transformation

e9a764b

feat: support nested properties

47f4dbc

feat: support many schemas

9fa4e67

swizzle mermaid components just to see what we get access to

23f1043

capture indexes from upstream

9ed065d

add a script to combine individual json files into one big one

025d497

add scripts to generate markdown partials from index schema JSON

db00cb9

pepopowitz added the hold This issue is parked, do not merge. label Feb 2, 2025

github-actions bot assigned pepopowitz Feb 2, 2025

pepopowitz added the deploy Stand up a temporary docs site with this PR label Feb 2, 2025

github-actions bot temporarily deployed to camunda-docs February 2, 2025 09:36 Destroyed

pepopowitz requested review from ChrisKujawa and akeller February 3, 2025 20:31
