-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor code and simplify file processing #48
Conversation
|
Co-authored-by: Ola Rubaj <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job 👏. I think some parts of the code (sometimes leftovers from the original code and sometimes yours) can be improved.
In general, apart from all the detailed comments, I'd rethink the structure of the FileObject
that gets returned after parsing MD in markdownToObject
. Atm it looks like this:
export interface FileObject {
file: File;
tags: Tag[];
links: WikiLink[];
}
// OTHER TYPES
export interface File {
_id: string;
file_path: string;
extension: string;
url_path: string | null;
filetype: string | null;
metadata: MetaData | null;
}
export type MetaData = {
[key: string]: any;
};
export interface Tag {
name: string;
}
export interface WikiLink {
linkSrc: string;
linkType: "normal" | "embed";
}
As I mentioned in one of the comments related to tags somewhere in your PR, I think we shouldn't try to mold this object already at the parsing (md->obj) stage into something resembling our schema (Mddb*
classes). This should be a more natural description of file metadata, and only when starting the next stage, i.e. "writing" to the database, specific data from it should be extracted and passed to relevant insert functions. I imagine this could look like this:
export interface FileObject {
_id: string; // btw, shouldn't _id be auto generated by the DB? In this case it shouldn't even be here, only on the file fetched from the DB
file_path: string;
extension: string;
url_path: string | null;
filetype: string | null;
metadata: MetaData | null;
tags: string[]; // <--- instead of {name:string}[]
wikilinks: WikiLink[]
}
And so, what I miss in this PR is a bit of planning ;) I.e. How do we want the object resulting from the parsing stage be structured, and why? All the refactoring (apart from the massive tidying up you did, which is definitely an improvement) should be predicated on this new API you envisioned.
Also, not sure if parse.ts
and markdownToObject
need to be separate. They seem to have overlapping responsibilities. I'd merge these two.
src/bin/index.ts
Outdated
// Ignore top-level await errors | ||
//@ts-ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, let's maybe change "module": "es2020"
to "module": "esnext"
in tsconfig.lib.json
? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I used that to ignore this error while I was working on the issue but used this approach for compatibility concerns...
src/bin/index.ts
Outdated
await client.init(); | ||
|
||
//@ts-ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one comment so far but making the point that we don't want to refactor existing code really at all here ...
src/lib/DbQueryManager.ts
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend avoiding refactoring the DB here - let's just focus on process.ts
as discussed in the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave the refactoring done so far in this PR. It makes the code much easier to reason about.
@mohamedsalem401 What do you think? I'm not sure about it |
The reason why I think they should be independent is that |
Yes, I agree, let's leave them separate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohamedsalem401 Where are the tests? 😄
src/lib/markdownToObject.ts
Outdated
export function markdownToObject( | ||
folderPath: string, | ||
filePath: string, | ||
filePathsToIndex: string[] | ||
): FileObject { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit weird that markdownToObject
takes "folderPath" as a first argument 🤔
Also, thinking from the "perspective" of markdownToObject
function 😅, why am I receiving filePathsToIndex
? What "I" actually need are "permalinks" that can be used to correctly resolve wikilinks.
export function markdownToObject( | |
folderPath: string, | |
filePath: string, | |
filePathsToIndex: string[] | |
): FileObject { | |
export function fileToObject({ | |
filePath, | |
folderPath, | |
permalinks | |
}: | |
{ | |
filePath: string, | |
folderPath: string, | |
permalinks: string[] | |
}): FileObject { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so I renamed this function to better reflect what id does...
src/lib/markdownToObject.ts
Outdated
let source: string, metadata, links; | ||
try { | ||
source = fs.readFileSync(filePath, { | ||
encoding: "utf8", | ||
flag: "r", | ||
}); | ||
|
||
({ metadata, links } = parseFile(source, { | ||
permalinks: filePathsToIndex, | ||
})); | ||
} catch (e) { | ||
console.error( | ||
`Failed to parse file ${filePath}. Waiting for file changes.` | ||
); | ||
|
||
return defaultFileJson; | ||
} | ||
|
||
// get tags in the file | ||
const tags = (metadata?.tags || []).map((tag: string) => { | ||
return { name: tag }; | ||
}); | ||
|
||
let fileObject: FileObject = { | ||
file: { | ||
_id: id, | ||
extension: extension, | ||
url_path: urlPath, | ||
file_path: filePath, | ||
filetype: metadata?.type || null, | ||
metadata: metadata, | ||
}, | ||
tags: [...tags], | ||
links: links, | ||
}; | ||
|
||
return fileObject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
src/utils/resolveLinkToUrlPath.ts
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future: this should probably be renamed to sth like resolveLinkToFilePath
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
const urlPath = pathToUrlResolver(pathRelativeToFolder); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const urlPath = pathToUrlResolver(pathRelativeToFolder); | |
fileObject.url_path = pathToUrlResolver(pathRelativeToFolder); |
let source: string, metadata, links; | ||
try { | ||
source = fs.readFileSync(filePath, { | ||
encoding: "utf8", | ||
flag: "r", | ||
}); | ||
|
||
({ metadata, links } = parseMarkdownContent(source, { | ||
permalinks: filePathsToIndex, | ||
})); | ||
|
||
fileObject.url_path = urlPath; | ||
fileObject.metadata = metadata; | ||
fileObject.filetype = metadata?.type || null; | ||
fileObject.links = links; | ||
if (metadata.tags) { | ||
fileObject.tags = metadata.tags; | ||
} | ||
|
||
return fileObject; | ||
} catch (e) { | ||
console.error(`Error processing file ${filePath}: ${e}`); | ||
return fileObject; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this function at the top:
function readFileContent(filePath: string): string {
return fs.readFileSync(filePath, {
encoding: "utf8",
flag: "r",
});
}
and then:
let source: string, metadata, links; | |
try { | |
source = fs.readFileSync(filePath, { | |
encoding: "utf8", | |
flag: "r", | |
}); | |
({ metadata, links } = parseMarkdownContent(source, { | |
permalinks: filePathsToIndex, | |
})); | |
fileObject.url_path = urlPath; | |
fileObject.metadata = metadata; | |
fileObject.filetype = metadata?.type || null; | |
fileObject.links = links; | |
if (metadata.tags) { | |
fileObject.tags = metadata.tags; | |
} | |
return fileObject; | |
} catch (e) { | |
console.error(`Error processing file ${filePath}: ${e}`); | |
return fileObject; | |
} | |
try { | |
const source = readFileContent(filePath); | |
const ({ metadata, links } = parseMarkdownContent(source, { | |
permalinks: filePathsToIndex, | |
})); | |
fileObject.metadata = metadata; | |
fileObject.filetype = metadata?.type || null; | |
fileObject.links = links; | |
fileObject.tags = metadata?.tags || null; | |
} catch (e) { | |
console.error(`Error processing file ${filePath}: ${e}`); | |
} | |
return fileObject; |
pathToUrlResolver | ||
); | ||
|
||
const file = extractFileSchemeFromObject(fileObject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest "marshalling" (or molding the object into the ready-to-insert-to-db structure) is done by relevant Mddb* classes (in this case MddbFile). So I'd just add the fileObject
to the list and let MddbFile
do the rest. And if you'd really want to do this here, creating a function for it is not necessary, as you could just destructure what's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ready to proceed, but I have a question about the structure of the data in the JSON files. Do you think it should mirror the format of the database? If they should align, we might want to keep things as they are. However, if variations are acceptable, I am more than willing to implement the modifications based on your recommended alternative structure.
import { FileObject } from "../lib/types/FileObject.js"; | ||
|
||
export function extractFileSchemeFromObject(fileObject: FileObject) { | ||
return { | ||
_id: fileObject._id, | ||
file_path: fileObject.file_path, | ||
extension: fileObject.extension, | ||
url_path: fileObject.url_path, | ||
filetype: fileObject.filetype, | ||
metadata: fileObject.metadata, | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned before, I'd move "marshalling" responsibility to relevant Mddb* classes and do it only right before inserting into db.
I believe this pull request (PR) includes numerous changes at the moment. Therefore, I plan to open a new PR specifically for the latest changes, including tests. |
Co-authored-by: Ola Rubaj <[email protected]>
Co-authored-by: Ola Rubaj <[email protected]>
Co-authored-by: Ola Rubaj <[email protected]>
src/lib/indexFolderToObjects.ts
Outdated
const fileObject = readLocalMarkdownFileToObject( | ||
folderPath, | ||
filePath, | ||
filePathsToIndex, | ||
pathToUrlResolver | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit awkward imo... Both the naming, the arguments it takes and unnecesarily adds a layer of abstraction. I wonder if we could get rid of the readLocalMarkdownFileToObject
whatsoever. This could just be:
const fileObject = readLocalMarkdownFileToObject( | |
folderPath, | |
filePath, | |
filePathsToIndex, | |
pathToUrlResolver | |
); | |
const id = generateFileIdFromPath(filePath); | |
const extension = getFileExtensionFromPath(filePath); | |
if (MddbFile.supportedExtensions.includes(extension)) { | |
return { id, extension, filePath }; | |
} | |
const pathRelativeToFolder = path.relative(folderPath, filePath); | |
const urlPath = pathToUrlResolver(pathRelativeToFolder); | |
const data = parseMarkdownFile(filePath) | |
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree.
This could benefit from a slight refactor...
I think it would be better if we add tests to the same PR... Also, please don't merge it to |
OK, I will add them in this pull request and will switch to the new branch |
Pull Request: Main Refactoring and Conversion
Summary of Changes
This pull request introduces significant improvements and refactoring to enhance the MarkdownDB functionality. The key modifications include:
Class Breakdown:
MarkdownDB
class has been refactored into two distinct classes, separating concerns for indexing and querying.Conversion to TypeScript Objects and SQL:
Function Refactoring: