Web search needs to return more than a snippet. #877
Replies: 2 comments
-
I am thinking the same as you, but somehow the Bing AI chat gets very good results using only the snippet when you ask it a question. I am wondering how they manage to set up their AI to answer so well from the snippet data alone. Also, their snippet is more complete than the snippets returned by the Bing Search API. When I use the method you are proposing, the AI (ChatGPT-4) often answers using only the data I give it, and not what it already knows about the subject. I am trying to solve this issue. For example, if you ask Bing AI to write an article about the best places to visit in England, it will answer using the search results but will also add a lot of information from its own knowledge base. I think Microsoft has done a very good job of integrating search results into their AI, probably using some optimized prompts.
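One thing that may help with the model ignoring its own knowledge is to state explicitly in the prompt that the search results are supplementary. This is an assumption on my part, not how Bing AI is known to work. Below is a minimal C# sketch (.NET 6+ top-level statements); `BuildPrompt` and the instruction wording are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (assumption, not a Semantic Kernel API): presents search snippets as
// supplementary context and explicitly invites the model to add what it already knows.
static string BuildPrompt(string question, IReadOnlyList<string> snippets)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question below. The web search results are supplementary context:");
    sb.AppendLine("use them for recent or specific facts, but also draw on your own general knowledge");
    sb.AppendLine("of the topic, and do not limit your answer to what the results contain.");
    sb.AppendLine();

    // Number each snippet so the model (and you) can tell the sources apart.
    for (int i = 0; i < snippets.Count; i++)
    {
        sb.AppendLine($"[{i + 1}] {snippets[i]}");
    }

    sb.AppendLine();
    sb.AppendLine($"Question: {question}");
    return sb.ToString();
}

string prompt = BuildPrompt(
    "Write an article about the best places to visit in England.",
    new[] { "Example snippet returned by the search API." });

Console.WriteLine(prompt);
```

Whether this actually changes the model's behaviour would need testing against your own queries.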
-
I believe @craigomatic is already working on another repo to solve this, using Playwright's API to return the text of each URL's body tag to the LLM. I ran into exactly the same issue as you and found that the Google API, which only returns snippets, does not provide enough data on a topic for an LLM to formulate a valid response. I plan to look through what Craig has already built here: https://github.com/craigomatic/webscraper-aiplugin/ and expand on it. Hope that helps!
-
At present, both the Bing and Google connectors return only the snippet for each result page. This is problematic because, in my experience, the snippet rarely contains anything of substance. If you want an AI to gain knowledge via a web search, snippets alone do not work.
I appreciate that downloading and parsing pages is more expensive, but this approach is far more likely to yield useful content, and hence better overall performance.
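A minimal sketch of the download-and-parse idea, assuming .NET 6+ top-level statements. `ExtractText` and `FetchPageTextAsync` are hypothetical helpers, not part of the connector, and the regex-based tag stripping is a crude stand-in for a real HTML parser:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: crude HTML-to-text conversion with regular expressions.
// A production version would use a proper HTML parser; this only illustrates the idea.
static string ExtractText(string html)
{
    // Drop <script> and <style> blocks entirely, since their contents are not page text.
    string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1>", " ",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Replace every remaining tag with a space so adjacent words do not run together.
    text = Regex.Replace(text, @"<[^>]+>", " ");

    // Decode entities such as &amp;, then collapse runs of whitespace.
    text = WebUtility.HtmlDecode(text);
    return Regex.Replace(text, @"\s+", " ").Trim();
}

// Hypothetical helper: downloads one result page and returns its text, truncated to a
// character budget (an assumed parameter) so it fits in the model's context window.
static async Task<string> FetchPageTextAsync(HttpClient client, Uri url, int maxChars, CancellationToken ct = default)
{
    string html = await client.GetStringAsync(url, ct);
    string text = ExtractText(html);
    return text.Length <= maxChars ? text : text[..maxChars];
}

string sample = "<html><head><style>p{color:red}</style></head><body><h1>Title</h1>"
              + "<p>Fish &amp; chips</p><script>var x=1;</script></body></html>";

Console.WriteLine(ExtractText(sample)); // prints "Title Fish & chips"
```

The truncation step matters because whole pages can easily exceed the token budget that a snippet never would.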
For reference, the existing code is here: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/Skills/Skills.Web/Bing/BingConnector.cs
and looks like this (with some unimportant lines omitted for brevity):
```csharp
public async Task<IEnumerable<string>> SearchAsync(string query, int count = 1, int offset = 0, CancellationToken cancellationToken = default)
{
    Uri uri = new($"https://api.bing.microsoft.com/v7.0/search?q={Uri.EscapeDataString(query)}&count={count}&offset={offset}");
    // ... (remaining lines omitted: send the request and return the snippet of each result)
}
```
My approach is instead to download the result pages and parse out their content, rather than returning only the snippet.
Initial tests show this to be working well.
So, is this a modification that the team feels belongs in the connector? Or should it be an alternative mode / a different method on the connector?
I honestly don't see the value of the snippets, so possibly I'm misunderstanding the use case.
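If the full-page behaviour became a different method rather than a replacement, it might look roughly like the sketch below (.NET 6+ top-level statements). `SearchPageContentsAsync` and its two delegates are hypothetical: `searchUrls` stands in for a search call that returns result URLs (the existing method returns snippets instead), and `fetchText` stands in for a page downloader.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical "full page" method: runs a search that yields result URLs, downloads each page,
// and returns the page text instead of the snippet. Both delegates are assumed stand-ins:
// searchUrls for a search API call, fetchText for a page downloader.
static async Task<IReadOnlyList<string>> SearchPageContentsAsync(
    string query,
    int count,
    Func<string, int, CancellationToken, Task<IReadOnlyList<Uri>>> searchUrls,
    Func<Uri, CancellationToken, Task<string>> fetchText,
    CancellationToken cancellationToken = default)
{
    IReadOnlyList<Uri> urls = await searchUrls(query, count, cancellationToken);

    // Download the pages concurrently; each fetch costs far more than reading a snippet.
    string[] pages = await Task.WhenAll(urls.Select(url => fetchText(url, cancellationToken)));

    // Drop pages that produced no usable text.
    return pages.Where(page => !string.IsNullOrWhiteSpace(page)).ToList();
}

// Usage with in-memory fakes in place of the real search API and HTTP downloads.
var fakePages = new Dictionary<Uri, string>
{
    [new Uri("https://example.com/a")] = "Full text of page A",
    [new Uri("https://example.com/b")] = "",
};

IReadOnlyList<string> contents = await SearchPageContentsAsync(
    "best places to visit in England",
    2,
    (query, count, ct) => Task.FromResult<IReadOnlyList<Uri>>(fakePages.Keys.Take(count).ToList()),
    (url, ct) => Task.FromResult(fakePages[url]));

Console.WriteLine(contents.Count); // prints 1
```

Keeping it as a separate method would leave the cheaper snippet path intact for callers who do not want to pay for page downloads.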
Thanks