BM25Encoder.default() Fails to download stopwords on a cert error #74

Open
2 tasks done
newgolddream opened this issue Feb 12, 2024 · 0 comments
Labels
bug Something isn't working

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Invoking BM25Encoder.default() fails to download its supporting files because of an SSL certificate error:

File "****.py", line 45, in sparseVectorQuery
    bm25 = BM25Encoder.default()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pinecone_text/sparse/bm25_encoder.py", line 261, in default
    wget.download(url, str(tmp_path))
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
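
The traceback ends in URLError raised from https_open, the point where urllib sets up the TLS connection. Below is a minimal diagnostic sketch, assuming (not confirmed in this report) that the failure comes from this Python installation not finding a trusted CA bundle. It uses only the standard library; the helper name describe_default_ca_paths is made up for illustration.

```python
import os
import ssl


def describe_default_ca_paths():
    """Report where this Python build looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    # cafile/capath are None when the configured file or directory is missing.
    capath_has_entries = (
        bool(paths.capath)
        and os.path.isdir(paths.capath)
        and bool(os.listdir(paths.capath))
    )
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # False suggests that default certificate verification has nothing to trust.
        "has_ca_source": paths.cafile is not None or capath_has_entries,
    }


if __name__ == "__main__":
    for key, value in describe_default_ca_paths().items():
        print(f"{key}: {value}")
```

If has_ca_source comes back False, a missing CA bundle is a plausible cause of the error above, though other causes (a proxy, an intercepting certificate) would look similar.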

Expected Behavior

Expected BM25Encoder.default() to download its supporting files successfully when they are not already present in the environment.

Steps To Reproduce

from pinecone_text.sparse import BM25Encoder

def createSparseVectors(chunks):
    # Extract the text of each chunk to form the corpus.
    corpus = [chunk["content"] for chunk in chunks]

    # Fails here while downloading the supporting files.
    bm25 = BM25Encoder.default()  # <<<<<<<<<<<<<<<
    bm25.fit(corpus)

    sparse_vectors = bm25.encode_documents(corpus)

    return sparse_vectors



x = [
    {"content":"apples are red"},
    {"content":"bananas are yellow"}
]
createSparseVectors(x)
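
A possible workaround, offered as a sketch and not as a fix confirmed by the maintainers: point OpenSSL at an explicit CA bundle through the SSL_CERT_FILE environment variable before calling BM25Encoder.default(). The helper name use_ca_bundle is hypothetical, and using certifi.where() as the bundle path assumes certifi is installed.

```python
import os


def use_ca_bundle(bundle_path):
    """Make default SSL contexts created afterwards load CAs from bundle_path."""
    if not os.path.isfile(bundle_path):
        raise FileNotFoundError(f"CA bundle not found: {bundle_path}")
    # OpenSSL reads SSL_CERT_FILE when the default verify paths are loaded,
    # so this must run before the HTTPS request is made.
    os.environ["SSL_CERT_FILE"] = bundle_path
    return bundle_path


# Usage (assumes certifi and pinecone-text are installed):
#   import certifi
#   from pinecone_text.sparse import BM25Encoder
#   use_ca_bundle(certifi.where())
#   bm25 = BM25Encoder.default()
```

This only affects the current process and leaves certificate verification enabled, unlike workarounds that disable verification entirely.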

Relevant log output

No response

Environment

MacOS 13.1
python==3.10.9
pip==24.0
certifi==24.2.2
pinecone-client==3.0.2
pinecone-text==0.8.0

Additional Context

No response

@newgolddream newgolddream added the bug Something isn't working label Feb 12, 2024
@newgolddream newgolddream changed the title BM25Encoder.getDefault() Fails to download stopwords on a cert error BM25Encoder.default() Fails to download stopwords on a cert error Feb 12, 2024