FR Smart Chunks Improved handling of common headings #3

Open
LeoLDLeo opened this issue Jan 17, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@LeoLDLeo

Discussed in brianpetro/obsidian-smart-connections#416

Originally posted by Levani307 January 16, 2024
Hello, I am a Smart Connections Supporter and I love using this plugin to find related notes in my Obsidian vault. However, I am encountering a problem. My daily notes follow the same structure and block names (e.g. ## Thoughts, ## Notes Created Today, etc.). Smart Connections matches notes on that shared structure and returns other notes with the same block names, even when their topics and themes are not closely related. If I remove the block names, finding related notes becomes a bit easier, but I lose the benefit of having a consistent format for my daily notes.

Is there a way to make Smart Connections ignore the block names and focus on the content of the notes instead? Or is the only way to remove any structure from my daily notes so that Smart Connections can find more relevant matches? I would appreciate any ideas or help from anyone who has had the same issue. Thank you.

@LeoLDLeo changed the title from "Blok Titles" to "Block Titles" Jan 17, 2024
@eamonnvi

Yes, I hit this issue too. I got around it by serialising my note titles and copying the original title into the note as a footnote (with a bash script, I think) and that prevented the title block outweighing the content in the similarity listing. However, it does make the results a little more opaque to the human eye. It would be good to be able to have a switch to include/exclude the title. Of course, this may just be a measure of my incompetence in using this amazing technology and there may be easier ways of achieving the desired result.

@brianpetro
Owner

@Levani307 @eamonnvi thanks for raising this issue.

I think that being able to toggle inclusion of headings makes a lot of sense, though it would come at the cost of losing context which could also have a negative impact on results.

There are some alternative approaches that I'd like to explore. For example, after the initial embeddings score, you can "re-rank" the results using various methods. This can be anything from simply reducing scores based on shared headings to processing the results through another AI model. Another method would be creating up to three embeddings for the same content: 1) the existing, 2) the content only, and 3) the path (file path plus headings), then calculating a final score based on all three of those.
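The simplest re-ranking idea mentioned above, reducing scores based on shared headings, could look roughly like the sketch below. It is only an illustration, not the plugin's implementation: the `Result` type, the `penalty` parameter, and the use of Jaccard overlap are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical result record; the plugin's real data model may differ.
    path: str            # path of the matched note
    score: float         # initial embedding similarity score
    headings: frozenset  # heading texts found in the matched note


def rerank_by_shared_headings(query_headings, results, penalty=0.2):
    """Lower each score in proportion to its heading overlap with the query note."""
    query_set = {h.strip().lower() for h in query_headings}
    reranked = []
    for r in results:
        result_set = {h.strip().lower() for h in r.headings}
        union = query_set | result_set
        # Jaccard overlap: 0.0 = no shared headings, 1.0 = identical heading sets.
        overlap = len(query_set & result_set) / len(union) if union else 0.0
        adjusted = r.score * (1.0 - penalty * overlap)
        reranked.append(Result(r.path, adjusted, r.headings))
    # Highest adjusted score first.
    return sorted(reranked, key=lambda r: r.score, reverse=True)
```

With a penalty of 0.2, a daily note that shares every heading with the query note loses 20% of its score, so a note on a related topic with a slightly lower raw score can move ahead of it.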

Long-term the best method, and what I have my sights on, is likely a combination of these that is unique to each user. This would look like a reinforcement learning layer that adjusts the weights of the various score inputs based on feedback.
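As a rough illustration of combining the three scores with adjustable weights, here is a minimal sketch. The key names (`full`, `content`, `path`), the learning rate, and the update rule are assumptions for the example; the update is a toy heuristic, not a real reinforcement learning method and not what the plugin does.

```python
def combined_score(scores, weights):
    """Weighted average of the per-embedding similarity scores."""
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


def update_weights(weights, scores, liked, lr=0.1):
    """Return new weights nudged by one piece of user feedback.

    A liked result raises the weight of the inputs that scored it highly;
    a disliked result lowers them. Weights are floored at 0.01 so that no
    input is ever switched off completely.
    """
    direction = 1.0 if liked else -1.0
    return {k: max(0.01, w + direction * lr * scores[k]) for k, w in weights.items()}
```

For a template-like note whose path and headings match strongly but whose content does not, a "not relevant" vote shifts weight away from the path score, so similar notes rank lower afterwards.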

This problem also pops up related to template files, or empty notes with similar headings, being erroneously surfaced. I have a solution for that, which will be implemented relatively shortly (before v2 general release). And I think it might be similarly helpful for this issue. In short, it's a variation of the re-ranking mentioned above.

I'm also going to think about simply adding an option to toggle off the headings in the embeddings. If I think it can be done relatively easily (it probably can be), then I'll do that too.
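A heading toggle of this kind could be as small as the sketch below, which drops Markdown heading lines before the text is sent for embedding. The function names and the `include_headings` flag are hypothetical, and the regular expression only handles `#`-style headings.

```python
import re

# Matches ATX headings such as "## Thoughts". A "#" followed directly by text
# (an Obsidian tag like "#boating") does not match because a space is required.
# Caveat: a "# comment" line inside a fenced code block would also be removed.
HEADING_RE = re.compile(r"^\s{0,3}#{1,6}\s+.*$")


def strip_headings(markdown: str) -> str:
    """Return the note text with heading lines removed."""
    kept = [line for line in markdown.splitlines() if not HEADING_RE.match(line)]
    return "\n".join(kept).strip()


def text_for_embedding(markdown: str, include_headings: bool = True) -> str:
    """Pick the text to embed, depending on the (hypothetical) toggle."""
    return markdown if include_headings else strip_headings(markdown)
```

Turning the toggle off keeps the headings in the note itself and removes them only from the text that gets embedded, so the daily-note format stays intact.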

Thanks for the helpful feedback & support 😊
Brian 🌴

@eamonnvi

eamonnvi commented Jan 17, 2024 via email

@brianpetro added the enhancement label Jan 18, 2024
@brianpetro changed the title from "Block Titles" to "FR Improved handling of common headings" Jan 18, 2024
@brianpetro changed the title from "FR Improved handling of common headings" to "Improved handling of common headings" Jan 19, 2024
@brianpetro
Owner

@eamonnvi Thank you for your feedback and for sharing your use cases. I'm glad to hear that Smart Connections has been helpful for you.

Regarding your question about understanding how context works, context in Smart Connections is determined by the embeddings of the text. Embeddings are vector (numerical) representations of the text that capture its semantic meaning. The model uses these embeddings to calculate the similarity between different pieces of text.

In the case of your experiment with the novel chapters, in v1 it depends on how you asked the question due to the use of the HyDE method. It's kind of like a secondary search query being generated by GPT prior to the actual retrieval.

As for the ADA embedding model, it should create a .json file with the same name as the model. You can check if it's working by looking for this file in the .smart-connections directory. If the file is present, it indicates that the ADA model is being created successfully.
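One quick way to check for that file is to list the JSON files in the folder. This sketch assumes the `.smart-connections` directory sits at the root of the vault; the function name and the vault path are placeholders.

```python
from pathlib import Path


def list_embedding_files(vault_dir):
    """Return the names of the .json files in <vault>/.smart-connections."""
    # Assumption: the folder lives directly under the vault root.
    folder = Path(vault_dir) / ".smart-connections"
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.json"))
```

Calling `list_embedding_files("/path/to/vault")` returns an empty list when the folder is missing or holds no JSON files, which would suggest that no embeddings have been written yet.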

Good resources for learning about embeddings really depend on the specifics of what you're trying to achieve. Common formulas for calculating similarity are "dot product" and "cosine similarity", and linear algebra is the relevant field of study. There's also a wide range of other relevant skills for utilizing embeddings, from chunking methods to retrieval strategy.
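For reference, both formulas fit in a few lines of plain Python. This is a generic sketch of the math, not code taken from the plugin, and it treats embeddings as simple lists of numbers.

```python
import math


def dot(a, b):
    """Dot product: the sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths (range -1 to 1)."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    # Return 0.0 for a zero-length vector instead of dividing by zero.
    return dot(a, b) / norm if norm else 0.0
```

Cosine similarity ignores vector length, so `[1, 2]` and `[2, 4]` count as pointing the same way, while the dot product also grows with magnitude. For embeddings that are already normalized to unit length, the two give the same ranking.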

Thank you for your kind words and support. I'm glad to have you as part of the Smart Connections community. If you have any more questions or need further assistance, feel free to ask.

Brian 🌴

@brianpetro
Owner

Note: 2 possible methods of improvement:

Smart View filter: similar to brianpetro/obsidian-smart-connections#423, exclude or reduce weight of results containing specified headings, impacting results on a per-search basis. Filter may persist, but when removed the results go back to baseline.

Plugin-level configuration: a setting allows exclusion of headings during the block-parsing process, impacting all results until the exclusion is removed and blocks are re-embedded.
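The per-search filter (the first method) might be sketched as follows. The tuple layout of a result and the `exclude` and `weight` parameters are assumptions for illustration, not the Smart View API.

```python
def apply_heading_filter(results, headings, exclude=True, weight=0.5):
    """Filter or down-weight results that contain any of the given headings.

    results:  list of (path, score, headings_in_result) tuples (hypothetical shape)
    headings: heading texts chosen by the user for this search
    """
    if not headings:
        # No filter set: hand back the baseline results unchanged.
        return list(results)
    targets = {h.lower() for h in headings}
    filtered = []
    for path, score, found in results:
        hit = any(h.lower() in targets for h in found)
        if hit and exclude:
            continue  # drop the result entirely
        filtered.append((path, score * weight if hit else score, found))
    # Highest score first.
    return sorted(filtered, key=lambda r: r[1], reverse=True)
```

Because the filter works on a copy of the scored results and never touches the stored embeddings, removing it restores the baseline ranking right away. The plugin-level exclusion, in contrast, changes what gets embedded and so needs a re-embed to undo.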

@brianpetro transferred this issue from brianpetro/obsidian-smart-connections May 18, 2024
@brianpetro changed the title from "Improved handling of common headings" to "FR Smart Chunks Improved handling of common headings" May 18, 2024
3 participants