Downloading BM25 Encoders using Pinecone-Text in Celery is failing #82

Open
2 tasks done
saharaquant opened this issue Aug 26, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@saharaquant

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Initializing a BM25 encoder with `bm25_encoder = BM25Encoder().default()` inside a Celery task raises an error:
'LoggingProxy' object has no attribute 'fileno'

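This happens because wget tries to draw a progress bar during the download and calls `sys.stdout.fileno()` to measure the console width. Inside the Celery task, `sys.stdout` is a `LoggingProxy` object with no `fileno` attribute, and the caller has no way to turn the bar off.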
https://github.com/pinecone-io/pinecone-text/blob/main/pinecone_text/sparse/bm25_encoder.py#L261
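
Until the library exposes such an option, one possible caller-side workaround is to give `sys.stdout` a real file descriptor for the duration of the call. The sketch below is an assumption, not something pinecone-text or Celery provides: `stdout_with_fileno` is a hypothetical helper name, and it has not been verified against every wget or Celery version.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def stdout_with_fileno():
    """Temporarily point sys.stdout at a stream that has a usable fileno().

    Hypothetical helper: prefers the interpreter's original stdout and
    falls back to os.devnull when that stream is missing or has no descriptor.
    """
    saved = sys.stdout
    devnull = None
    try:
        candidate = sys.__stdout__
        try:
            candidate.fileno()
        except (AttributeError, OSError, ValueError):
            # sys.__stdout__ is None or not backed by a descriptor.
            devnull = open(os.devnull, "w")
            candidate = devnull
        sys.stdout = candidate
        yield candidate
    finally:
        # Always restore whatever stream was installed before.
        sys.stdout = saved
        if devnull is not None:
            devnull.close()


# Assumed usage inside a Celery task (requires pinecone-text):
# with stdout_with_fileno():
#     bm25_encoder = BM25Encoder().default()
```

Anything printed inside the `with` block bypasses the task's logging proxy, so the block should wrap only the encoder download.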

Expected Behavior

Provide a way to control whether a progress bar is displayed when downloading the encoder.
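
As a rough illustration of the requested behavior, the sketch below downloads a file with the standard library and only reports progress when asked. `download_file` and its `show_progress` parameter are hypothetical names for this example, not part of pinecone-text or wget.

```python
import sys
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path, show_progress: bool = False) -> Path:
    """Download url to dest; progress reporting is opt-in (hypothetical API)."""

    def report(block_num: int, block_size: int, total_size: int) -> None:
        # Only write() is used, so this works with streams that lack fileno().
        done = block_num * block_size
        if total_size > 0:
            done = min(done, total_size)
        sys.stderr.write(f"\r{done}/{total_size} bytes")

    hook = report if show_progress else None
    urllib.request.urlretrieve(url, str(dest), reporthook=hook)
    return dest
```

With `show_progress=False` no reporting hook is installed at all, so nothing touches `sys.stdout` or `sys.stderr` during the download.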

Steps To Reproduce

Run a minimal Celery task that executes `bm25_encoder = BM25Encoder().default()`.
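
The underlying failure can also be reproduced without Celery or pinecone-text. In the sketch below, `StdoutProxy` is a hypothetical stand-in for Celery's `LoggingProxy`: it accepts writes but has no `fileno()`, which is the call that fails in the log.

```python
import sys


class StdoutProxy:
    """Hypothetical stand-in for a logging proxy: file-like, but no fileno()."""

    def write(self, data):
        return len(data)

    def flush(self):
        pass


def reproduce():
    """Swap in the proxy, call fileno() as wget does, and return the error text."""
    saved = sys.stdout
    sys.stdout = StdoutProxy()
    try:
        sys.stdout.fileno()
    except AttributeError as exc:
        return str(exc)
    finally:
        sys.stdout = saved
    return None
```

Calling `reproduce()` returns the message of an `AttributeError` analogous to the one in the log output, with the stand-in class name in place of `LoggingProxy`.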

Relevant log output

    bm25_encoder = BM25Encoder().default()
  File "/usr/local/lib/python3.10/site-packages/pinecone_text/sparse/bm25_encoder.py", line 261, in default
    wget.download(url, str(tmp_path))
  File "/usr/local/lib/python3.10/site-packages/wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "/usr/local/lib/python3.10/urllib/request.py", line 267, in urlretrieve
    reporthook(blocknum, bs, size)
  File "/usr/local/lib/python3.10/site-packages/wget.py", line 513, in callback_charged
    callback_progress(blocks, block_size, total_size, bar_function=bar)
  File "/usr/local/lib/python3.10/site-packages/wget.py", line 461, in callback_progress
    width = min(100, get_console_width())
  File "/usr/local/lib/python3.10/site-packages/wget.py", line 337, in get_console_width
    ioctl(sys.stdout.fileno(), TIOCGWINSZ, winsize)
AttributeError: 'LoggingProxy' object has no attribute 'fileno'

Environment

- **OS**: Linux
- **Language version**: 3.10.4
- **Pinecone client version**: 0.5.3 (also reproduced on the latest version)

Additional Context

https://github.com/pinecone-io/pinecone-text/blob/main/pinecone_text/sparse/bm25_encoder.py#L261

@saharaquant saharaquant added the bug Something isn't working label Aug 26, 2024
@saharaquant

Anyone?

@github-staff github-staff deleted a comment from saharaquant Aug 27, 2024
@Gayathri-Gobinathan

It takes too long every time I try to load it. Are there any alternative options?

@isaaccs

isaaccs commented Sep 26, 2024

> It takes too long every time I try to load it. Are there any alternative options?

What I did was download punkt_tab, then copy it and rename the copy to punkt before calling BM25.

@Gayathri-Gobinathan

@isaaccs, can you please provide some code for that?
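
One way the workaround described above might look, as a minimal sketch. It assumes punkt_tab refers to the NLTK tokenizer data, that it has already been downloaded and unzipped under `tokenizers/punkt_tab` in an NLTK data directory, and that a copy named `punkt` is what the encoder's tokenizer looks for. `copy_punkt_tab_as_punkt` is a hypothetical helper name, and whether this shortens load time has not been verified here.

```python
import shutil
from pathlib import Path


def copy_punkt_tab_as_punkt(nltk_data_dir: Path) -> Path:
    """Copy tokenizers/punkt_tab to tokenizers/punkt inside an NLTK data directory."""
    tokenizers = Path(nltk_data_dir) / "tokenizers"
    src = tokenizers / "punkt_tab"
    dst = tokenizers / "punkt"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} not found; download punkt_tab first")
    # dirs_exist_ok lets the copy be repeated safely (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Assumed usage, with ~/nltk_data as the data directory:
# copy_punkt_tab_as_punkt(Path.home() / "nltk_data")
# bm25_encoder = BM25Encoder().default()
```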
