[docker, web-qa]: Unable to fetch document using Selenium #50

Open
ifTNT opened this issue Sep 5, 2024 · 1 comment

ifTNT commented Sep 5, 2024

Description:

The WebQA bot is failing to retrieve documents using Selenium. This results in an error message: "An error occurred while trying to fetch the document. Please make sure the submitted document exists and is publicly available."

Steps to Reproduce:

  1. Attempt to summarize the content of the URL: https://management.ntu.edu.tw/IM using WebQA.
  2. Observe the error message: "An error occurred while trying to fetch the document. Please make sure the submitted document exists and is publicly available."

Expected Outcome:

WebQA should successfully fetch and summarize the document from the provided URL.

Environment Details:

  • OS: Arch Linux
  • Docker:
    • Client: Version 26.1.4
    • Server: Version 26.1.4
  • Browser: Firefox 127.0
  • WebQA Version: dev branch (Commit ID: f04a8f)

Additional Context:

The full log:

docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Discovering versions from https:/04:09:32 [7/1509]
.github.io/chrome-for-testing/known-good-versions-with-downloads.json
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Required driver: chromedriver 128.0.6613.119
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Downloading chromedriver 128.0.6613.119 from https://storage.googleapis.com/chrome-for-testing-public/128.0.6613.119/linux64/chromedriver-linux64.zip
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Driver path: /root/.cache/selenium/chromedriver/linux64/128.0.6613.119/chromedriver
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Browser path: /root/.cache/selenium/chrome/linux64/128.0.6613.119/chrome
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.service] DEBUG:    Started executable: `/root/.cache/selenium/chromedriver/linux64/128.0.6613.119/chromedriver` in a child process with pid: 104 using 0 to output -3
docqa-executor-1    | 2024-09-05 20:09:32 [src.crawler  ] WARNING:  Message: Service /root/.cache/selenium/chromedriver/linux64/128.0.6613.119/chromedriver unexpectedly exited. Status code was: 127
docqa-executor-1    |
docqa-executor-1    | 2024-09-05 20:09:32 [asyncio      ] ERROR:    Unclosed client session
docqa-executor-1    | client_session: <aiohttp.client.ClientSession object at 0x77cedaa60d60>
docqa-executor-1    | 2024-09-05 20:09:32 [asyncio      ] ERROR:    Unclosed connector
docqa-executor-1    | connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x77cea18f9f00>, 244810.599)]']
docqa-executor-1    | connector: <aiohttp.connector.TCPConnector object at 0x77cedaa609a0>
docqa-executor-1    | 2024-09-05 20:09:32 [__main__     ] ERROR:    Error when constructing document store.
docqa-executor-1    | Traceback (most recent call last):
docqa-executor-1    |   File "/usr/src/app/docqa/docqa.py", line 155, in doc_qa
docqa-executor-1    |     document_store, docs = await self.document_store_factory.construct_document_store(
docqa-executor-1    |   File "/usr/src/app/docqa/src/document_store_factory.py", line 147, in construct_document_store
docqa-executor-1    |     document_store, docs = await self._construct_document_store(urls, document_store_kwargs, ttl_hash)
docqa-executor-1    |   File "/usr/src/app/docqa/src/document_store_factory.py", line 45, in __await__
docqa-executor-1    |     self.result = yield from self.co.__await__()
docqa-executor-1    |   File "/usr/src/app/docqa/src/document_store_factory.py", line 118, in _construct_document_store
docqa-executor-1    |     if len(docs) == 0: raise RuntimeError("Error fetching documents.")
docqa-executor-1    | RuntimeError: Error fetching documents.
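
A possible way to narrow down the `Status code was: 127` line: when a dynamically linked binary exits with 127 immediately after launch, one common cause is a shared library the loader cannot resolve. The sketch below is an assumption about how one might check this inside the container, not something taken from the WebQA code base. It runs `ldd` on the chromedriver path from the log and lists the entries reported as "not found". The helper names `missing_shared_libs` and `check_binary` are hypothetical.

```python
import subprocess


def missing_shared_libs(ldd_output: str) -> list:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved libraries as "<name> => not found".
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> list:
    """Run ldd on `path` and list its unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_shared_libs(result.stdout)


if __name__ == "__main__":
    # Driver path copied from the log above; run this inside the container.
    driver = "/root/.cache/selenium/chromedriver/linux64/128.0.6613.119/chromedriver"
    print(check_binary(driver))
```

An empty list would suggest the cause lies elsewhere. Any names it prints are libraries that still need to be installed in the image. The same check can be pointed at the Chrome binary path shown in the log.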
@ifTNT ifTNT added the bug Something isn't working label Sep 5, 2024
@ifTNT ifTNT added this to the v0.3.4 milestone Sep 5, 2024
@ifTNT ifTNT self-assigned this Sep 5, 2024
@ifTNT ifTNT closed this as completed Sep 10, 2024
@ifTNT ifTNT reopened this Sep 11, 2024

ifTNT commented Sep 11, 2024

Progress update:

  1. Installed the missing dependency.
  2. A new error message now appears:
[src.crawler  ] WARNING:  Message: session not created: Chrome failed to start: exited normally.
(session not created: DevToolsActivePort file doesn't exist)
(The process started from chrome location /root/.cache/selenium/chrome/linux64/128.0.6613.119/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x6227c47ea86a <unknown>
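
The "DevToolsActivePort file doesn't exist" message often shows up when Chrome is started as root inside a container without the flags usually needed there. The sketch below shows the Chrome arguments commonly tried in that situation. It is an assumption, not the fix applied in this repository, and it is unknown whether `src.crawler` already passes any of these. The function names `container_chrome_args` and `build_driver` are hypothetical.

```python
def container_chrome_args() -> list:
    """Chrome flags commonly used when running as root inside Docker."""
    return [
        "--headless=new",           # no display server in the container
        "--no-sandbox",             # Chrome's sandbox refuses to run as root
        "--disable-dev-shm-usage",  # /dev/shm is small by default in Docker
    ]


def build_driver():
    """Create a Chrome WebDriver with the container flags applied."""
    # Imported lazily so the flag list can be inspected without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in container_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling `build_driver()` inside the `docqa-executor` container and then `driver.get(...)` on the URL from the reproduction steps would show whether the session can be created with these flags.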

ifTNT added a commit that referenced this issue Sep 11, 2024
@ifTNT ifTNT removed this from the v0.3.4 milestone Sep 11, 2024