
Fix: bugs when using opensource models #609

Closed
wants to merge 4 commits

Conversation

PaulSZH95

Description

Bug fixes for using open-source models.

Related Issues

#575 #528

Proposed Changes

1 - The clean_up_json function now parses from the first instance of '{' in the LLM output. This accommodates open-source models, which tend to be more verbose (see the first sketch below).

2 - The embed function now decodes the encoded token chunks back to text before embedding. This allows open-source models that use a different tokenizer to still work (see the second sketch below).
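
A minimal sketch of the first change, assuming the function receives the raw LLM output as a string; the fallback when no '{' is present is a hypothetical hardening, not part of this PR:

    def clean_up_json(json_str: str) -> str:
        """Clean up a JSON string emitted by an LLM.

        Verbose open-source models often wrap the JSON payload in
        explanatory text, so slice from the first '{' onward.
        """
        start = json_str.find("{")
        if start == -1:
            # No object found; return unchanged (hypothetical fallback).
            return json_str
        return json_str[start:]

And a sketch of the second change, assuming a tiktoken-style tokenizer and a hypothetical embed_text client call. The point is that token chunks are decoded back to plain text before being sent to the embedding model, so a model with a different tokenizer still receives strings it can handle:

    import tiktoken

    def embed_chunks(text: str, chunk_size: int = 512) -> list[list[float]]:
        """Token-chunk the input, then decode each chunk back to text
        before embedding, so the embedding model applies its own tokenizer."""
        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
        # Decode each token chunk back into a string instead of passing
        # raw token IDs to the embedding endpoint.
        texts = [enc.decode(chunk) for chunk in chunks]
        return [embed_text(t) for t in texts]  # embed_text: hypothetical client call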

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

I have not tested with OpenAI's models, only with llm: Groq and embedding: LM Studio.

@PaulSZH95 requested a review from a team as a code owner July 18, 2024 09:05
@PaulSZH95
Author

@microsoft-github-policy-service agree

@PaulSZH95 requested a review from a team as a code owner July 25, 2024 01:32
@@ -6,6 +6,7 @@

 def clean_up_json(json_str: str):
     """Clean up json string."""
+    json_str = json_str[json_str.index('{'):]
This will raise an error in a global query:

json_str = json_str[json_str.index('{'):]
ValueError: substring not found
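
For reference, a guarded variant (an editor's sketch, not the code in this PR) that avoids the ValueError by falling back to the original string when no '{' is found:

    def clean_up_json(json_str: str) -> str:
        """Clean up json string."""
        brace = json_str.find("{")  # find() returns -1 instead of raising
        return json_str[brace:] if brace != -1 else json_str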

@PaulSZH95 (Author) Aug 1, 2024
With respect to your error: no matter how well you write your JSON parser, you will still encounter errors from time to time.

Reason: the model isn't always able to output JSON in the format you require.

So far the only solution is to retry when errors occur, since errors will still happen occasionally even with a fine-tuned GPT-4 model. This is speaking from experience, but I have not seen any model score 100% on HumanEval benchmarks. P.S. LangChain's approach is also retrying; you probably notice it less because the retries are hidden away unless you opt for verbosity.

A better fix would probably be to retry when faced with a parsing error, rather than changing the parsing logic (see the sketch below).
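
A minimal sketch of that retry approach; call_llm and the prompt are hypothetical stand-ins for the actual LLM client:

    import json

    def query_with_retries(prompt: str, max_retries: int = 3) -> dict:
        """Re-ask the model when its output fails to parse as JSON,
        rather than making the parser tolerate every output format."""
        last_error = None
        for _ in range(max_retries):
            raw = call_llm(prompt)  # hypothetical LLM call
            try:
                # json.JSONDecodeError is a subclass of ValueError.
                return json.loads(clean_up_json(raw))
            except ValueError as err:
                last_error = err  # malformed output; try again
        raise RuntimeError(f"Model never produced valid JSON: {last_error}")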

@natoverse (Collaborator)

We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. Please try again with that version and re-open if this is still an issue.

(This may not resolve embedding formats, but our expectation is that any proxy will translate them to maintain compatibility with the default GraphRAG LLM calls.)

@natoverse closed this Aug 9, 2024