
[BUG] DuckDuckGoSearchRM is not working - AttributeError #231

Open
AIconicInsight opened this issue Oct 24, 2024 · 2 comments

@AIconicInsight

Describe the bug
When using DuckDuckGoSearchRM in the STORMWikiRunner, I always get an attribute error in STORMWikiRunner.run(...): 'AssertionError' object has no attribute 'message'. The underlying error comes from the _text_api function of the DDGS class in the duckduckgo_search library (duckduckgo_search.py): AssertionError: keywords is mandatory.
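
For reference, the underlying assertion can be reproduced directly with duckduckgo_search (a minimal sketch, assuming the library behaves as in the traceback below; the empty string stands in for whatever blank query the retriever ends up passing):

from duckduckgo_search import DDGS

# Sketch only: this reproduces the inner AssertionError, not the
# AttributeError that the backoff giveup handler raises afterwards.
ddgs = DDGS()
try:
    # A blank query trips `assert keywords, "keywords is mandatory"`
    # inside DDGS._text_api, exactly as in the stack trace below.
    ddgs.text("", max_results=3, backend="api")
except AssertionError as e:
    print(f"AssertionError: {e}")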

I used an LLM from Ollama: llama3.1:8b-instruct-fp16.

To Reproduce
See 'Code' section below

Environment:

  • OS: Ubuntu 22.04.5 LTS
  • Python v3.10.0
Code
  
import os
from knowledge_storm import (
    STORMWikiRunnerArguments,
    STORMWikiRunner,
    STORMWikiLMConfigs,
)
from knowledge_storm.lm import OllamaClient
from knowledge_storm.rm import DuckDuckGoSearchRM
from dotenv import load_dotenv

load_dotenv()

output_dir = os.getenv("OUTPUT_DIR")

lm_configs = STORMWikiLMConfigs()
engine_args = STORMWikiRunnerArguments(output_dir=output_dir)

ollama_kwargs = {
    "temperature": 0.7,
    "top_p": 0.9,
}

fast_model_name = os.getenv("FAST_MODEL_NAME")
fast_model_host = os.getenv("FAST_MODEL_HOST")
fast_model_port = int(os.getenv("FAST_MODEL_PORT"))
fast_model_max_tokens = int(os.getenv("FAST_MODEL_MAX_TOKENS"))
strong_model_name = os.getenv("STRONG_MODEL_NAME")
strong_model_host = os.getenv("STRONG_MODEL_HOST")
strong_model_port = int(os.getenv("STRONG_MODEL_PORT"))
strong_model_max_tokens = int(os.getenv("STRONG_MODEL_MAX_TOKENS"))

rm = DuckDuckGoSearchRM()

fast_model = OllamaClient(
    model=fast_model_name,
    url=fast_model_host,
    port=fast_model_port,
    max_tokens=fast_model_max_tokens,
    **ollama_kwargs,
)
strong_model = OllamaClient(
    model=strong_model_name,
    url=strong_model_host,
    port=strong_model_port,
    max_tokens=strong_model_max_tokens,
    **ollama_kwargs,
)

lm_configs.set_conv_simulator_lm(fast_model)
lm_configs.set_question_asker_lm(fast_model)
lm_configs.set_outline_gen_lm(strong_model)
lm_configs.set_article_gen_lm(strong_model)
lm_configs.set_article_polish_lm(strong_model)

runner = STORMWikiRunner(args=engine_args, lm_configs=lm_configs, rm=rm)

topic = "Deep Neural Networks"
runner.run(  # <--- Error is thrown here
    topic=topic,
    do_research=True,
    do_generate_outline=True,
    do_generate_article=True,
    do_polish_article=True,
)

Error Message
{
	"name": "AttributeError",
	"message": "'AssertionError' object has no attribute 'message'",
	"stack": "---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/backoff/_sync.py:105, in retry_exception.<locals>.retry(*args, **kwargs)
    104 try:
--> 105     ret = target(*args, **kwargs)
    106 except exception as e:

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/rm.py:787, in DuckDuckGoSearchRM.request(self, query)
    778 @backoff.on_exception(
    779     backoff.expo,
    780     (Exception,),
   (...)
    785 )
    786 def request(self, query: str):
--> 787     results = self.ddgs.text(
    788         query, max_results=self.k, backend=self.duck_duck_go_backend
    789     )
    790     return results

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/duckduckgo_search/duckduckgo_search.py:243, in DDGS.text(self, keywords, region, safesearch, timelimit, backend, max_results)
    242 if backend == \"api\":
--> 243     results = self._text_api(keywords, region, safesearch, timelimit, max_results)
    244 elif backend == \"html\":

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/duckduckgo_search/duckduckgo_search.py:275, in DDGS._text_api(self, keywords, region, safesearch, timelimit, max_results)
    258 \"\"\"DuckDuckGo text search. Query params: https://duckduckgo.com/params.
    259 
    260 Args:
   (...)
    273     TimeoutException: Inherits from DuckDuckGoSearchException, raised for API request timeouts.
    274 \"\"\"
--> 275 assert keywords, \"keywords is mandatory\"
    277 vqd = self._get_vqd(keywords)

AssertionError: keywords is mandatory

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[6], line 2
      1 topic = \"Deep Neural Networks\"
----> 2 runner.run(
      3     topic=topic,
      4     do_research=True,
      5     do_generate_outline=True,
      6     do_generate_article=True,
      7     do_polish_article=True,
      8 )

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/engine.py:394, in STORMWikiRunner.run(self, topic, ground_truth_url, do_research, do_generate_outline, do_generate_article, do_polish_article, remove_duplicate, callback_handler)
    392 information_table: StormInformationTable = None
    393 if do_research:
--> 394     information_table = self.run_knowledge_curation_module(
    395         ground_truth_url=ground_truth_url, callback_handler=callback_handler
    396     )
    397 # outline generation module
    398 outline: StormArticle = None

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/interface.py:499, in Engine.log_execution_time_and_lm_rm_usage.<locals>.wrapper(*args, **kwargs)
    496 @functools.wraps(func)
    497 def wrapper(*args, **kwargs):
    498     start_time = time.time()
--> 499     result = func(*args, **kwargs)
    500     end_time = time.time()
    501     execution_time = end_time - start_time

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/engine.py:219, in STORMWikiRunner.run_knowledge_curation_module(self, ground_truth_url, callback_handler)
    212 def run_knowledge_curation_module(
    213     self,
    214     ground_truth_url: str = \"None\",
    215     callback_handler: BaseCallbackHandler = None,
    216 ) -> StormInformationTable:
    218     information_table, conversation_log = (
--> 219         self.storm_knowledge_curation_module.research(
    220             topic=self.topic,
    221             ground_truth_url=ground_truth_url,
    222             callback_handler=callback_handler,
    223             max_perspective=self.args.max_perspective,
    224             disable_perspective=False,
    225             return_conversation_log=True,
    226         )
    227     )
    229     FileIOHelper.dump_json(
    230         conversation_log,
    231         os.path.join(self.article_output_dir, \"conversation_log.json\"),
    232     )
    233     information_table.dump_url_to_info(
    234         os.path.join(self.article_output_dir, \"raw_search_results.json\")
    235     )

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:379, in StormKnowledgeCurationModule.research(self, topic, ground_truth_url, callback_handler, max_perspective, disable_perspective, return_conversation_log)
    377 # run conversation
    378 callback_handler.on_information_gathering_start()
--> 379 conversations = self._run_conversation(
    380     conv_simulator=self.conv_simulator,
    381     topic=topic,
    382     ground_truth_url=ground_truth_url,
    383     considered_personas=considered_personas,
    384     callback_handler=callback_handler,
    385 )
    387 information_table = StormInformationTable(conversations)
    388 callback_handler.on_information_gathering_end()

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:340, in StormKnowledgeCurationModule._run_conversation(self, conv_simulator, topic, ground_truth_url, considered_personas, callback_handler)
    338     for future in as_completed(future_to_persona):
    339         persona = future_to_persona[future]
--> 340         conv = future.result()
    341         conversations.append(
    342             (persona, ArticleTextProcessing.clean_up_citation(conv).dlg_history)
    343         )
    345 return conversations

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:438, in Future.result(self, timeout)
    436     raise CancelledError()
    437 elif self._state == FINISHED:
--> 438     return self.__get_result()
    440 self._condition.wait(timeout)
    442 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:390, in Future.__get_result(self)
    388 if self._exception:
    389     try:
--> 390         raise self._exception
    391     finally:
    392         # Break a reference cycle with the exception in self._exception
    393         self = None

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/thread.py:52, in _WorkItem.run(self)
     49     return
     51 try:
---> 52     result = self.fn(*self.args, **self.kwargs)
     53 except BaseException as exc:
     54     self.future.set_exception(exc)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:318, in StormKnowledgeCurationModule._run_conversation.<locals>.run_conv(persona)
    317 def run_conv(persona):
--> 318     return conv_simulator(
    319         topic=topic,
    320         ground_truth_url=ground_truth_url,
    321         persona=persona,
    322         callback_handler=callback_handler,
    323     )

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/dspy/primitives/program.py:26, in Module.__call__(self, *args, **kwargs)
     25 def __call__(self, *args, **kwargs):
---> 26     return self.forward(*args, **kwargs)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:69, in ConvSimulator.forward(self, topic, persona, ground_truth_url, callback_handler)
     67 if user_utterance.startswith(\"Thank you so much for your help!\"):
     68     break
---> 69 expert_output = self.topic_expert(
     70     topic=topic, question=user_utterance, ground_truth_url=ground_truth_url
     71 )
     72 dlg_turn = DialogueTurn(
     73     agent_utterance=expert_output.answer,
     74     user_utterance=user_utterance,
     75     search_queries=expert_output.queries,
     76     search_results=expert_output.searched_results,
     77 )
     78 dlg_history.append(dlg_turn)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/dspy/primitives/program.py:26, in Module.__call__(self, *args, **kwargs)
     25 def __call__(self, *args, **kwargs):
---> 26     return self.forward(*args, **kwargs)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:214, in TopicExpert.forward(self, topic, question, ground_truth_url)
    212 queries = queries[: self.max_search_queries]
    213 # Search
--> 214 searched_results: List[Information] = self.retriever.retrieve(
    215     list(set(queries)), exclude_urls=[ground_truth_url]
    216 )
    217 if len(searched_results) > 0:
    218     # Evaluate: Simplify this part by directly using the top 1 snippet.
    219     info = \"\"

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/interface.py:314, in Retriever.retrieve(self, query, exclude_urls)
    309     return local_to_return
    311 with concurrent.futures.ThreadPoolExecutor(
    312     max_workers=self.max_thread
    313 ) as executor:
--> 314     results = list(executor.map(process_query, queries))
    316 for result in results:
    317     to_return.extend(result)

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:608, in Executor.map.<locals>.result_iterator()
    605 while fs:
    606     # Careful not to keep a reference to the popped future
    607     if timeout is None:
--> 608         yield fs.pop().result()
    609     else:
    610         yield fs.pop().result(end_time - time.monotonic())

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:438, in Future.result(self, timeout)
    436     raise CancelledError()
    437 elif self._state == FINISHED:
--> 438     return self.__get_result()
    440 self._condition.wait(timeout)
    442 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:390, in Future.__get_result(self)
    388 if self._exception:
    389     try:
--> 390         raise self._exception
    391     finally:
    392         # Break a reference cycle with the exception in self._exception
    393         self = None

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/thread.py:52, in _WorkItem.run(self)
     49     return
     51 try:
---> 52     result = self.fn(*self.args, **self.kwargs)
     53 except BaseException as exc:
     54     self.future.set_exception(exc)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/interface.py:295, in Retriever.retrieve.<locals>.process_query(q)
    294 def process_query(q):
--> 295     retrieved_data_list = self.rm(
    296         query_or_queries=[q], exclude_urls=exclude_urls
    297     )
    298     local_to_return = []
    299     for data in retrieved_data_list:

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/dspy/retrieve/retrieve.py:30, in Retrieve.__call__(self, *args, **kwargs)
     29 def __call__(self, *args, **kwargs):
---> 30     return self.forward(*args, **kwargs)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/rm.py:813, in DuckDuckGoSearchRM.forward(self, query_or_queries, exclude_urls)
    809 collected_results = []
    811 for query in queries:
    812     #  list of dicts that will be parsed to return
--> 813     results = self.request(query)
    815     for d in results:
    816         # assert d is dict
    817         if not isinstance(d, dict):

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/backoff/_sync.py:111, in retry_exception.<locals>.retry(*args, **kwargs)
    107 max_tries_exceeded = (tries == max_tries_value)
    108 max_time_exceeded = (max_time_value is not None and
    109                      elapsed >= max_time_value)
--> 111 if giveup(e) or max_tries_exceeded or max_time_exceeded:
    112     _call_handlers(on_giveup, **details, exception=e)
    113     if raise_on_giveup:

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/dsp/modules/mistral.py:28, in giveup_hdlr(details)
     26 def giveup_hdlr(details):
     27     \"\"\"wrapper function that decides when to give up on retry\"\"\"
---> 28     if \"rate limits\" in details.message:
     29         return False
     30     return True

AttributeError: 'AssertionError' object has no attribute 'message'"
}
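
A possible workaround (an untested sketch; SafeDuckDuckGoSearchRM is a hypothetical subclass, and it assumes the root cause is empty queries generated by the LLM) would be to drop blank queries before they ever reach DDGS:

from typing import List, Union

from knowledge_storm.rm import DuckDuckGoSearchRM


class SafeDuckDuckGoSearchRM(DuckDuckGoSearchRM):
    """Hypothetical wrapper that skips blank queries before calling DDGS."""

    def forward(self, query_or_queries: Union[str, List[str]], exclude_urls: List[str] = None):
        queries = (
            [query_or_queries]
            if isinstance(query_or_queries, str)
            else list(query_or_queries)
        )
        # Blank queries trigger "AssertionError: keywords is mandatory"
        # inside duckduckgo_search, so filter them out up front.
        queries = [q for q in queries if q and q.strip()]
        if not queries:
            return []
        return super().forward(queries, exclude_urls or [])


rm = SafeDuckDuckGoSearchRM()  # drop-in replacement for DuckDuckGoSearchRM() above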

I also tried GoogleSearch instead of DuckDuckGoSearchRM. It logs a lot of errors during the search phase, but it does run through.
The code is the same; I only adjusted the rm variable:

rm = GoogleSearch(
    google_search_api_key=os.getenv("GOOGLE_SEARCH_API_KEY"),
    google_cse_id=os.getenv("GOOGLE_CSE_ID"),
)

Generated output:
generated.zip

Google Search Error Logs
  
root : ERROR    : Error occurred while searching query : 
Error while requesting URL('https://www.sciencedirect.com/topics/computer-science/deep-neural-network') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.sciencedirect.com/topics/computer-science/deep-neural-network'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
root : ERROR    : Error occurred while searching query History of deep learning: 
Error while requesting URL('https://ofac.treasury.gov/faqs') - ReadTimeout('The read operation timed out')
root : ERROR    : Error occurred while searching query : 
root : ERROR    : Error occurred while searching query : The read operation timed out
root : ERROR    : Error occurred while searching query Here are the queries I would type in the search box to find information on computing error gradients during backpropagation and handling vanishing or exploding gradients:: The read operation timed out
root : ERROR    : Error occurred while searching query : The read operation timed out
root : ERROR    : Error occurred while searching query backpropagation error gradient computation: The read operation timed out
Error while requesting URL('https://www.tandfonline.com/doi/full/10.1080/10888691.2018.1537791') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.tandfonline.com/doi/full/10.1080/10888691.2018.1537791'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
Error while requesting URL('https://www.sciencedirect.com/science/article/pii/S0268401223000233') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.sciencedirect.com/science/article/pii/S0268401223000233'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
Error while requesting URL('https://www.sciencedirect.com/science/article/pii/S266734522300024X') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.sciencedirect.com/science/article/pii/S266734522300024X'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
root : ERROR    : Error occurred while searching query : 
root : ERROR    : Error occurred while searching query Here are some queries I would use to find relevant information:: [SSL] internal error (_ssl.c:2536)
root : ERROR    : Error occurred while searching query **What makes a Deep Neural Network "Deep"?**: The read operation timed out
root : ERROR    : Error occurred while searching query Here are the Google search queries I would use:: The read operation timed out
root : ERROR    : Error occurred while searching query : 
Error while requesting URL('https://www.sciencedirect.com/topics/computer-science/deep-neural-network') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.sciencedirect.com/topics/computer-science/deep-neural-network'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
root : ERROR    : Error occurred while searching query Here are some Google search queries I would use to find information on the concept of gradient descent and its relation to backpropagation in deep neural networks:: The read operation timed out
root : ERROR    : Error occurred while searching query : The read operation timed out
trafilatura.utils : ERROR    : parsed tree length: 1, wrong data type or not valid HTML
trafilatura.core : ERROR    : empty HTML tree: None
trafilatura.core : WARNING  : discarding data: None
root : ERROR    : Error occurred while searching query : 
root : ERROR    : Error occurred while searching query : [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2536)
trafilatura.utils : ERROR    : parsed tree length: 1, wrong data type or not valid HTML
trafilatura.core : ERROR    : empty HTML tree: None
trafilatura.core : WARNING  : discarding data: None
root : ERROR    : Error occurred while searching query Here are the queries I would use to find information on common loss functions used in deep neural networks:: The read operation timed out
knowledge_storm.interface : INFO     : run_knowledge_curation_module executed in 263.2473 seconds
knowledge_storm.interface : INFO     : run_outline_generation_module executed in 7.4509 seconds
sentence_transformers.SentenceTransformer : INFO     : Use pytorch device_name: cuda
sentence_transformers.SentenceTransformer : INFO     : Load pretrained SentenceTransformer: paraphrase-MiniLM-L6-v2
  
@shaoyijia
Collaborator

Sorry for the delay. Could you check whether the rm can output content for some queries you write yourself?

Given that you are using a quantized model, it may fail to output correct queries.
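
For example, something like this (a quick sketch; the query string is just an illustration) should return a non-empty list of result dicts if the rm itself works:

from knowledge_storm.rm import DuckDuckGoSearchRM

# Standalone check of the retrieval module, independent of the LLM.
rm = DuckDuckGoSearchRM()
results = rm(query_or_queries=["deep neural networks"], exclude_urls=[])
print(results)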

@AIconicInsight
Author

The RM works fine. After some more testing, I narrowed the issue down to the LLMs. I tested Ollama models requiring up to 80 GB of VRAM (both quantized and full-precision). The only model that didn't run into errors was Llama 3.2 3b (fp); however, the output was only the introduction section. I also used the DSPy templates from the Ollama example.

Have you found success with a model from the Ollama model library? If so, which one(s)?

Tested LLMs:

  • Llama 3.1 8b (fp)
  • Llama 3.1 70b (q8)
  • Llama 3.2 3b (fp)
  • Nemotron Mini 4b (fp)
  • Gemma 2 27b (q8)
  • Qwen 2.5 14b (q8)
  • Qwen 2.5 72b (q8)
