Chunking algorithm for files other than markdown #235
sh-l asked this question in Documentation requests
-
Sorry for the late reply, and thank you for reading that document! However, I have compared PDFs in several formats before and after editing, and found that some references were often modified. This would make comparisons difficult. Nevertheless, I think your use case might be common. Could I ask which PDF format I should use as a reference?
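To put a number on how much of a file survives an edit, one can chunk the old and new versions and count how many chunks of the new version are already known. The sketch below uses content-defined chunking with a Gear-style rolling hash. This is a generic technique swapped in for illustration, not necessarily what LiveSync does for binary files, and `cdc_chunks`, `reuse_ratio`, and the size parameters are hypothetical. Because each cut point depends only on the last few bytes, an insertion shifts later bytes without moving the boundaries after it, so unaffected regions keep their chunk hashes. Changes scattered throughout the file would still touch many chunks.

```python
import hashlib

# Hypothetical parameters chosen for illustration; not LiveSync's settings.
MASK = (1 << 11) - 1   # cut when the low 11 bits of the hash are all zero
MIN_CHUNK = 512        # never cut before this many bytes
MAX_CHUNK = 8192       # always cut at this many bytes

# One pseudo-random 64-bit value per possible byte value, derived deterministically.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big") for b in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a Gear-style rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shifting left ages out old bytes: the low 11 bits depend only on the last 11 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def reuse_ratio(old: bytes, new: bytes) -> float:
    """Fraction of the new version's chunks whose hash already exists for the old version."""
    known = {hashlib.sha256(c).digest() for c in cdc_chunks(old)}
    new_ids = [hashlib.sha256(c).digest() for c in cdc_chunks(new)]
    if not new_ids:
        return 1.0
    return sum(cid in known for cid in new_ids) / len(new_ids)
```

A ratio close to 1.0 means only a few chunks would need to be transferred after the edit. Fixed-size chunking, by contrast, would shift every block that follows an insertion.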
-
If I understand https://docs.vrtmrz.net/LiveSync/hintandtrivia/Data+structure+of+LiveSync correctly, the plugin uses a chunking and diffing algorithm to reduce bandwidth consumption. But the documentation mostly mentions `.md` files. I'm wondering whether the same algorithm is applied to files other than `.md`, and if so, how it splits such files into chunks.

My main use case would be `.pdf` files, and since Obsidian has recently improved its PDF viewer and such, I guess PDF files are also a first-class citizen of Obsidian. I often add, modify, and delete annotations in PDF files via third-party apps. These are very small operations that leave most of the file content unchanged, so they might be a good target for optimization. Also, the internal data structure of a PDF is nicely split into pages, and pages are stored separately from annotations and so on, so I guess PDF files could be chunked along page boundaries.
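The page-boundary idea can be approximated at the level of PDF objects. In an uncompressed PDF, each indirect object is written as `N G obj ... endobj`, and annotations are normally separate objects referenced from a page's `/Annots` array. The sketch below assumes that layout and simply cuts after every `endobj`. `pdf_object_chunks` and `changed_chunks` are hypothetical helpers, not part of LiveSync. Because the chunks are consecutive slices, joining them always reproduces the file, so a spurious `endobj` inside binary stream data only costs efficiency. This approach would help less when objects are packed into compressed object streams (PDF 1.5 and later) or when an app rewrites and renumbers the whole file on save. The cross-reference section at the end, which stores byte offsets, will typically change on every save.

```python
import hashlib
import re

# Hypothetical boundary rule: cut right after each "endobj" keyword (plus its
# line ending, if present). Chunks are consecutive slices, so joining them
# always reproduces the original bytes exactly.
_ENDOBJ = re.compile(rb"endobj(?:\r\n|\r|\n)?")


def pdf_object_chunks(data: bytes) -> list[bytes]:
    """Split raw PDF bytes into one chunk per indirect object, plus a trailing chunk."""
    chunks = []
    start = 0
    for m in _ENDOBJ.finditer(data):
        chunks.append(data[start:m.end()])
        start = m.end()
    if start < len(data):
        chunks.append(data[start:])  # cross-reference table, trailer, %%EOF, etc.
    return chunks


def changed_chunks(old: bytes, new: bytes) -> list[bytes]:
    """Chunks of `new` whose hash does not appear among the chunks of `old`."""
    known = {hashlib.sha256(c).digest() for c in pdf_object_chunks(old)}
    return [c for c in pdf_object_chunks(new) if hashlib.sha256(c).digest() not in known]
```

If a third-party app saves annotations as an incremental update (appending new objects and a new cross-reference section to the end of the file), every chunk before the old cross-reference section would stay identical under this rule.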