Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor code and simplify file processing #48

Closed
wants to merge 20 commits into from
Closed
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 16 additions & 6 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
"pretest": "npm run build && node ./dist/src/bin/index.js ./__mocks__/content",
"test": "node --experimental-vm-modules node_modules/jest/bin/jest.js",
"build": "tsc --p tsconfig.lib.json",
"dev": "tsc --p tsconfig.lib.json --watch",
"changeset": "changeset",
"prepublishOnly": "npm run build",
"release": "changeset publish"
Expand Down Expand Up @@ -55,6 +56,7 @@
"@changesets/changelog-github": "^0.4.8",
"@changesets/cli": "^2.26.1",
"@types/jest": "^29.5.1",
"@types/node": "^20.8.7",
"@typescript-eslint/eslint-plugin": "^5.59.5",
"@typescript-eslint/parser": "^5.59.5",
"eslint": "^8.40.0",
Expand Down
6 changes: 4 additions & 2 deletions src/bin/index.js → src/bin/index.ts
100755 → 100644
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
#!/usr/bin/env node

import { MarkdownDB } from "../lib/markdowndb.js";
// import MddbFolderWatcher from "./watch.js";
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved

// TODO get these from markdowndb.config.js or something
const dbPath = "markdown.db";
Expand All @@ -18,11 +19,12 @@
},
});

// Ignore top-level await errors
//@ts-ignore

Check failure on line 23 in src/bin/index.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Do not use "@ts-ignore" because it alters compilation errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, let's maybe change "module": "es2020" to "module": "esnext" in tsconfig.lib.json? What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I used that to ignore this error while I was working on the issue but used this approach for compatibility concerns...

await client.init();

//@ts-ignore

Check failure on line 26 in src/bin/index.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Do not use "@ts-ignore" because it alters compilation errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

await client.indexFolder({
folderPath: contentPath,
ignorePatterns: ignorePatterns,
});

process.exit();
Copy link
Member

@olayway olayway Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a hacky solution, but without it, the process doesn't seem to terminate on its own. So until we figure this out, I'd leave it so that users don't need to force it with SIGINT.

116 changes: 116 additions & 0 deletions src/lib/DbQueryManager.ts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend avoiding refactoring the DB here - let's just focus on process.ts as discussed in the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave the refactoring done so far in this PR. It makes the code much easier to reason about.

Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
import { Knex } from "knex";
import { MddbFile, MddbTag, MddbLink } from "./schema.js";
import { FilesQuery } from "./types/DbQueryTypes.js";

export class DbQueryManager {
private db: Knex;

constructor(db: Knex) {
this.db = db;
}

async getFileById(id: string): Promise<MddbFile | null> {
const file = await this.db.from("files").where("_id", id).first();
return new MddbFile(file);
}

async getFileByUrl(url: string): Promise<MddbFile | null> {
const file = await this.db
.from("files")
.where("url_path", encodeURI(url))
.first();
return new MddbFile(file);
}

async getFiles(query?: FilesQuery): Promise<MddbFile[]> {
const { filetypes, tags, extensions, folder, frontmatter } = query || {};

const files = await this.db
// TODO join only if tags are specified ?
.leftJoin("file_tags", "files._id", "file_tags.file")
.where((builder) => {
// TODO temporary solution before we have a proper way to filter files by and assign file types
if (folder) {
builder.whereLike("url_path", `${folder}/%`);
}

if (tags) {
builder.whereIn("tag", tags);
}

if (extensions) {
builder.whereIn("extension", extensions);
}

if (filetypes) {
builder.whereIn("filetype", filetypes);
}

if (frontmatter) {
Object.entries(frontmatter).forEach(([key, value]) => {
if (typeof value === "string" || typeof value === "number") {
builder.whereRaw(`json_extract(metadata, '$.${key}') = ?`, [
value,
]);
} else if (typeof value === "boolean") {
if (value) {
builder.whereRaw(`json_extract(metadata, '$.${key}') = ?`, [
true,
]);
} else {
builder.where(function () {
this.whereRaw(`json_extract(metadata, '$.${key}') = ?`, [
false,
]).orWhereRaw(`json_extract(metadata, '$.${key}') IS NULL`);
});
}
}

// To check if the provided value exists in an array inside the JSON
else {
builder.whereRaw(`json_extract(metadata, '$.${key}') LIKE ?`, [
`%${value}%`,
]);
}
});
}
})
.select("files.*")
.from("files")
.groupBy("_id");

return files.map((file) => new MddbFile(file));
}

async getTags(): Promise<MddbTag[]> {
const tags = await this.db("tags").select();
return tags.map((tag) => new MddbTag(tag));
}

async getLinks(query?: {
fileId: string;
linkType?: "normal" | "embed";
direction?: "forward" | "backward";
}): Promise<MddbLink[]> {
const { fileId, direction = "forward", linkType } = query || {};
const joinKey = direction === "forward" ? "from" : "to";
const where = {
[joinKey]: fileId,
};
if (linkType) {
where["link_type"] = linkType;
}
const dbLinks = await this.db
.select("links.*")
.from("links")
.rightJoin("files", `links.${joinKey}`, "=", "files._id")
.where(where);

const links = dbLinks.map((link) => new MddbLink(link));
return links;
}

_destroyDb() {
this.db.destroy();
}
}
63 changes: 63 additions & 0 deletions src/lib/indexFolderToObjects.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
import { WikiLink, getUniqueValues, recursiveWalkDir } from "../utils/index.js";
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved
import { markdownToObject } from "./markdownToObject.js";
import { File, FileTag, Link, Tag } from "./types/schemaTypes.js";
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved

export function indexFolderToObjects(
folderPath: string,
ignorePatterns?: RegExp[],
pathToUrlResolver?: (filePath: string) => string

Check warning on line 8 in src/lib/indexFolderToObjects.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

'pathToUrlResolver' is defined but never used
) {
const filePathsToIndex = recursiveWalkDir(folderPath);
const files: File[] = [];
const tags: Tag[] = [];
const fileTags: FileTag[] = [];
const links: Link[] = [];
const filteredFilePathsToIndex = filePathsToIndex.filter((filePath) => {
return !(
ignorePatterns && ignorePatterns.some((pattern) => pattern.test(filePath))
);
});

for (const filePath of filteredFilePathsToIndex) {
const fileObject = markdownToObject(folderPath, filePath, filePathsToIndex);

files.push(fileObject.file);
tags.push(...fileObject.tags);

const fileTagsToInsert = fileObject.tags.map((tag) => ({
tag: tag.name,
file: fileObject.file._id,
}));
const uniqueFileTags = getUniqueValues(fileTagsToInsert);
fileTags.push(...uniqueFileTags);
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved

const linksToInsert: Link[] = processWikiLinks(
fileObject.links,
fileObject.file._id,
filePathsToIndex
);
links.push(...linksToInsert);
}

const uniqueTags = getUniqueValues(tags);
return {
files: files,
tags: uniqueTags,
fileTags: fileTags,
links: links,
};
}

function processWikiLinks(
links: WikiLink[],
fileId: string,
filePathsToIndex: string[]
): Link[] {
return links
.map((link) => ({
from: fileId,
to: filePathsToIndex.find((file) => file === link.linkSrc)!,

Check warning on line 59 in src/lib/indexFolderToObjects.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
link_type: link.linkType,
}))
.filter((link) => link.to !== undefined);
}
82 changes: 82 additions & 0 deletions src/lib/markdownToObject.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
import fs from "fs";
import path from "path";
import { parseFile } from "../utils/index.js";
import { FileObject } from "./types/FileObject.js";
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved
import { MddbFile } from "./schema.js";
import {
defaultFilePathToUrl,
} from "./markdowndb.js";
import { generateFileIdFromPath } from "../utils/index.js";
import { getFileExtensionFromPath } from "../utils/index.js";

export function markdownToObject(
folderPath: string,
filePath: string,
filePathsToIndex: string[]
): FileObject {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird that markdownToObject takes "folderPath" as a first argument 🤔
Also, thinking from the "perspective" of markdownToObject function 😅, why am I receiving filePathsToIndex? What "I" actually need are "permalinks" that can be used to correctly resolve wikilinks.

Suggested change
export function markdownToObject(
folderPath: string,
filePath: string,
filePathsToIndex: string[]
): FileObject {
export function fileToObject({
filePath,
folderPath,
permalinks
}:
{
filePath: string,
folderPath: string,
permalinks: string[]
}): FileObject {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so I renamed this function to better reflect what id does...

const id = generateFileIdFromPath(filePath);
const extension = getFileExtensionFromPath(filePath);
const defaultFileJson = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const defaultFileJson = {
const fileObject = {

And below you don't need to create another one (see comment below).

file: {
_id: id,
file_path: filePath,
extension,
url_path: null,
filetype: null,
metadata: null,
},
tags: [],
links: [],
};

// if not md or mdx return this
const isExtensionNotSupported =
!MddbFile.supportedExtensions.includes(extension);
if (isExtensionNotSupported) {
return defaultFileJson;
}

// url_path
const pathRelativeToFolder = path.relative(folderPath, filePath);
// const urlPath = defaultFilePathToUrl(pathRelativeToFolder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// const urlPath = defaultFilePathToUrl(pathRelativeToFolder);

const urlPath = defaultFilePathToUrl(pathRelativeToFolder);
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved

// metadata, tags, links
let source: string, metadata, links;
try {
source = fs.readFileSync(filePath, {
encoding: "utf8",
flag: "r",
});

({ metadata, links } = parseFile(source, {
permalinks: filePathsToIndex,
}));
} catch (e) {
console.error(
`Failed to parse file ${filePath}. Waiting for file changes.`
);

return defaultFileJson;
}

// get tags in the file
const tags = (metadata?.tags || []).map((tag: string) => {
return { name: tag };
});
mohamedsalem401 marked this conversation as resolved.
Show resolved Hide resolved

let fileObject: FileObject = {

Check failure on line 68 in src/lib/markdownToObject.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

'fileObject' is never reassigned. Use 'const' instead
file: {
_id: id,
extension: extension,
url_path: urlPath,
file_path: filePath,
filetype: metadata?.type || null,
metadata: metadata,
},
tags: [...tags],
links: links,
};

return fileObject;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can fs.readFileSync be replaced with async operation?
  2. Magbe tags should also just be returned from the parseFile and destructured like links? Not sure...
  3. You don't need to create a new fileObject object. Use the one you created above and add properties on it. Then return.

A bit more tidy?:

Suggested change
let source: string, metadata, links;
try {
source = fs.readFileSync(filePath, {
encoding: "utf8",
flag: "r",
});
({ metadata, links } = parseFile(source, {
permalinks: filePathsToIndex,
}));
} catch (e) {
console.error(
`Failed to parse file ${filePath}. Waiting for file changes.`
);
return defaultFileJson;
}
// get tags in the file
const tags = (metadata?.tags || []).map((tag: string) => {
return { name: tag };
});
let fileObject: FileObject = {
file: {
_id: id,
extension: extension,
url_path: urlPath,
file_path: filePath,
filetype: metadata?.type || null,
metadata: metadata,
},
tags: [...tags],
links: links,
};
return fileObject;
try {
const source = fs.readFileSync(filePath, {
encoding: "utf8",
flag: "r",
});
const ({ metadata, links } = parseFile(source, {
permalinks: filePathsToIndex,
}));
// get tags in the file
const tags = (metadata?.tags || []).map((tag: string) => {
return { name: tag };
});
fileObject.tags = tags;
fileObject.metadata = metadata;
// ...
} catch (e) {
console.error(
`Failed to parse file ${filePath}. Waiting for file changes.`
);
}
return fileObject;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should consider making the function asynchronous in a separate pull request, as I believe it would require a few modifications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

}
3 changes: 2 additions & 1 deletion src/lib/markdowndb.spec.ts
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
// import knex from "knex";
import { MarkdownDB } from "./markdowndb";
import { File, MddbFile, Table } from "./schema";
import { MddbFile } from "./schema";
import { recursiveWalkDir } from "../utils";
import { File, Table } from "./types/schemaTypes";

/**
* @jest-environment node
Expand Down Expand Up @@ -173,14 +174,14 @@
test("can find file by url path", async () => {
const dbFile = await mddb.getFileByUrl("blog/blog2");
expect(dbFile).not.toBeNull();
expect(dbFile!.url_path).toBe("blog/blog2");

Check warning on line 177 in src/lib/markdowndb.spec.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
});

test("can find file by id", async () => {
const dbFile = await mddb.getFileByUrl("blog/blog2");
const dbFileById = await mddb.getFileById(dbFile!._id);

Check warning on line 182 in src/lib/markdowndb.spec.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
expect(dbFileById).not.toBeNull();
expect(dbFileById!.url_path).toBe("blog/blog2");

Check warning on line 184 in src/lib/markdowndb.spec.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
});
});

Expand Down Expand Up @@ -208,10 +209,10 @@
const toFile = await mddb.getFileByUrl("blog0");

const forwardLinks = await mddb.getLinks({
fileId: fromFile!._id,

Check warning on line 212 in src/lib/markdowndb.spec.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
});
expect(forwardLinks.length).toBe(1);
expect(forwardLinks[0].to).toBe(toFile!._id);

Check warning on line 215 in src/lib/markdowndb.spec.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
});

test("can get all backward links of a file", async () => {
Expand All @@ -220,13 +221,13 @@
const fromFile2 = await mddb.getFileByUrl("blog/blog1");

const backwardLinks = await mddb.getLinks({
fileId: toFile!._id,

Check warning on line 224 in src/lib/markdowndb.spec.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
direction: "backward",
});
const backwardLinksFileIds = backwardLinks.map((l) => l.from);
expect(backwardLinksFileIds).toHaveLength(2);
expect(backwardLinksFileIds).toContain(fromFile1!._id);

Check warning on line 229 in src/lib/markdowndb.spec.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
expect(backwardLinksFileIds).toContain(fromFile2!._id);

Check warning on line 230 in src/lib/markdowndb.spec.ts

View workflow job for this annotation

GitHub Actions / Lint & format check

Forbidden non-null assertion
});
});

Expand Down
Loading
Loading