Skip to content

wikibonsai/semtree

Repository files navigation

semtree

A WikiBonsai Project NPM package

“It is important to view knowledge as a sort of semantic tree. Make sure you understand the fundamental principles, i.e., the trunk and big branches before you get into the leaves/details or there is nothing for them to hang on to.”

~ Elon Musk

semtree is a utility to construct a semantic tree from word lists/indexes which may span multiple objects -- the most likely setup being multiple filenames which map to their file content.

semtree itself is essentially a collection of functions to facilitate the cultivation of this tree, namely with lint(), create(), update(), or print(), and handles the build process via a state machine.

This package can be used in conjunction with treehouze and is compatible with [[wikirefs]], caml and yaml syntaxes.

See context for more about why and how this package is useful.

🌳 Cultivate a "semantic tree" or "knowledge bonsai" in your 🎋 WikiBonsai digital garden.

Install

Install with npm:

npm install semtree

Use

Say we have the following two markdown files:

// file: fname-a

- [[node-1]]
  - [[node-1a]]
- [[node-2]]
  - [[node-2a]]
  - [[node-2b]]
- [[fname-b]]
// file: fname-b

- [[node-3]]
- [[node-4]]

If we wanted to create a single tree from both of these files, we can use semtree like so:

import * as semtree from 'semtree';

let opts = {
  wikiLink: true, // defaults to 'true'
};
const rootName: string | undefined = 'fname-a';
// read in files and create a record where
// keys are filenames and values are the file's content
const semTreeText: Record<string, string> = {
  // key: filename; value: file content
  'fname-a':
`- [[node-1]]
  - [[node-1a]]
  - [[node-2]]
    - [[node-2a]]
      - [[node-2b]]
      - [[fname-b]]
`,
  'fname-b':
`- [[node-3]]
- [[node-4]]
`,
};
const tree = semtree.create(semTreeText, rootName, opts);

Which will create a tree that looks like:

graph TD;
  node-1-->node-1a;
  node-1-->node-2;
  node-1-->fname-b;
  node-2-->node-2a;
  node-2-->node-2b;
  fname-b-->node-3;
  fname-b-->node-4;
Loading

Parsing, Syntax, and Validity

Tree requirements are sparse because the idea is to allow the end-user to determine the shape and structure of their tree in their markdown files. This package merely creates a single, virtual tree so as to better present that unified structure to the end-user.

Parsing:

  • Semtree will check for delimiters delineating where the index content is:
    : title : index file
    
    This is some markdown text.
    
    <!--semtree-->
    - [[node-a]]
      - [[node-b]]
      - [[node-c]]
    <!--/semtree-->
    • This content would return the following as the file content relevant for tree-building:
    - [[node-a]]
      - [[node-b]]
      - [[node-c]]
  • If delimiters do not exist any attribute metadata in caml or yaml format will be stripped.
    ---
    subject: semantic tree
    ---
    : title : index file
    
    - [[node-a]]
      - [[node-b]]
      - [[node-c]]
    • Here, the same content would be identified as the tree content since caml and yaml attrs would be stripped:
    - [[node-a]]
      - [[node-b]]
      - [[node-c]]

Syntax:

  • Indentation size defaults to 2 'space's. (see options indentKind and indentSize).
  • Markdown bullets (-*+) are optional (see option mkdnBullet).
  • [[wikilink]] syntax is optional (see option wikiLink).

Validity:

  • Every node in the tree should be unique; e.g. each list-item's text should be unique.
  • Must be a directed-acyclic-graph (DAG).
  • Each level can have any number of nodes.

API

TreeNode

Each node in the tree contains:

export interface TreeNode {
  text: string;
  ancestors: string[];
  children: string[];
  // custom data
  [key: string]: any;
}

ancestors: An array of strings that are the text of other nodes in the tree. Represents ancestors of the current node from the root node following the ancestral path to the current node.

children: An array of strings that are the text of other nodes in the tree. Represents children of the current node.

text: Contains the node text, which should be unique across all nodes in the tree and is used as an identifier in each nodes' other properties ancestors and children.

Finally, custom data is supported.

SemTree

The full SemTree looks like this:

interface SemTree {
  root: string;
  nodes: TreeNode[];
  trunk: string[];
  petioleMap: Record<string, string>;
  orphan: string[];
}

root: The text of the root node.

nodes: Contains a flat array of all the TreeNodes in the tree.

trunk: An array of text names of all the index/branch nodes (which typically correspond to the keys of the content hash).

petioleMap: A hash whose keys are the text names of all the nodes in the tree and the values are the text names of the index/branch node those keys appeared in (e.g. key node-1 yields value fname-a from the example above because node-1 appears in fname-a).

('petiole': "A leaf petiole is a thin stalk that connects a leaf blade to a stem")

orphan: An array of text names of any unprocessed index/branch nodes from the content hash keys not processed after calling create() or update().

create(root: string, content: Record<string, string>, opts: SemTreeOpts): SemTree | string;

Create a tree from a given Record, where keys represent nodes in a tree and values represent multiple values in the tree (such as filenames and their content) and build a tree from them. Will return a tree instance upon successful creation. Will return an error string otherwise, for example if there are duplicates found in the tree.

Parameters

content: Record<string, string>

A Record whose keys are entities (such as files) and values are content strings of those entities.

root: string

Name of the root node of the tree.

opts: SemTreeOpts

Options object -- see options below.

lint(content: string | Record<string, string>, opts: LintOpts): void | string

Lint a file's content or a record of multiple files' file content.

Checks for:

  • Duplicates / cycles
  • Spaces / tabs
  • Inconsistent indentation
  • Over-indentation
  • Markdown bullets
  • WikiLink
  • Lists files that weren't linked in the tree

(Note: Lint line numbers returned will be offset by wherever the target semtree content started within the file. If the content starts at line 5 and the linter says an error occurred on line 1, then the error probably occurs on line 6 of the file.)

Parameters

content: string | Record<string, string>

A content string or a Record whose keys are entities (such as files) and values are content strings of those entities.

opts: LintOpts

Lint options:

indentKind?: 'space' | 'tab'

Kind of indentation -- either 'space's or 'tab's.

indentSize?: number

Number of indentations (spaces or tabs) which represent each level in the tree.

mkdnBullet?: boolean

Whether the linter should check for markdown bullets (-, *, +) and print a warning if any nodes are missing them.

wikiLink?: boolean

Whether the linter should check for [[wikilink]] and print a warning if any nodes are missing them.

root?: string

The root filename is needed to print the names of any orphan (unprocessed / unlinked) index / trunk files.

print(tree: SemTree, print: boolean = true): string | undefined

Print the contents of a tree to console logs and return the string if there was a valid tree to print. Returns undefined if the tree is invalid.

Example output:

bk.how-to-read-a-book
├── demanding-reader
|   └── active-reading
|       ├── reading-comprehension
|       └── the-art-of-reading
└── 4-levels-of-reading
    ├── elementary-reading
    ├── inspectional-reading
    ├── analytical-reading
    └── syntopical-reading

Parameters

tree: SemTree

An instance of a SemTree.

print: boolean = true

Seeing this to false will suppress printing the tree to the console log and just return the string representation.

update(tree: SemTree, subroot: string, content: Record<string, string>, opts?: SemTreeOpts): SemTree | string;

A method to update a subtree within the semantic tree. (Best used to update individual index documents.) The given tree will be directly updated and the updated subtree nodes will be returned separately by update().

Parameters

tree: SemTree

A SemTree object.

content: Record<string, string>

A Record whose keys are entities (such as filenames) and values are content strings of those entities (such as file content).

subroot: string

Name of the subroot node of the subtree to be replaced.

opts: SemTreeOpts

Options object -- see options below.

Options

Config

virtualTrunk: boolean

Whether or not to include the semtree/index files themselves as nodes in the tree. This option is a useful toggle between 'tree-building' (non-virtual to allow for index/trunk file traversal) and 'tree-viewing' (virtual to eliminate unnecessary index/trunk files) states. Default is false. Best used for things like static site generation where updates are not a usual occurrence.

Note: If virtualTrunk is set to true, the resulting tree will not be updatable via the update function.

Text / Lint

delimiter: string = 'semtree'

The delimiter string to look for when identifying semtree indexes within a markdown file. Defaults to 'semtree'.

indentKind: 'space' | 'tab' = 'space'

The kind of whitespace expected for indentation of each level of the tree. The default is 'space'.

indentSize: number = 2

The size of each indentation level in the tree -- corresponds to number of spaces or tabs. The default is 2.

mkdnBullet: boolean = true

Whether or not to expect markdown bullets (- , * , + ).

wikiLink: boolean = true

Whether or not to expect [[wikilink square brackets]]. Default is true.

Functions

Option functions are useful when keeping the state of the tree in-sync with some other source like an index or database.

graft: (parentText: string, childText: string) => void

A function to execute when each node is added to the tree.

prune: (parentText: string, childText: string) => void

A function to execute when each node is removed from the tree.

setRoot: (text: string) => void

A function that can return/operate on the text of the root of the tree when it is being set.

Context

A semantic tree wends through concepts in semantic space, like a melody winds through harmonies in music.

In personal knowledge management (pkm) systems, there are sometimes mechanisms to facilitate the creation and management of hierarchical structures: Tag hierarchies, dynamic tables of contents, note metadata, namespacing, even using the directory system itself, adding a folgezettel to a zettelkasten, are all attempts to create one unified hierarchy from one's atomic notes.

But none of these solutions accommodate the specific aim of trying to build a single "semantic tree" very well: Tag hierarchies and namespacing both suffer from branch length problems -- namespaces generally require the entire branch be spelled out to represent a node accurately, which restricts branch size and thus the size of the whole tree. Metadata pointers is better, but because relationships are built one by one between notes, making large changes to the tree itself is burdensome and visualizing the entire tree at once requires imagination. Using the file directory itself runs into The Duplicate Folder Problem, where using paths to represent branches would contain needless duplicates which correspond to a file of the same name at the same level.

This implementation attempts to ameliorate these issues with the primary focus on facilitating semantic tree cultivation.

Side-Note: If you already have a collection of markdown notes, good candidates for index/tree(trunk) files might be "zettelkasten hubs" or "maps of content" (will likely require some tweaking to fit the model required by this package).