
Gunicorn worker gets terminated due to signal 9 (low memory) #34

Open · owais142002 opened this issue May 30, 2023 · 2 comments
Labels: bug (Something isn't working)

Comments

@owais142002

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

The Gunicorn worker gets terminated due to signal 9 (low memory), even though I have 29 GB of RAM available. As soon as I send a request to my server hosted on Render, the worker gets killed, but everything works fine locally because my machine has good specs. How much should I scale the server, or is there another way to solve this issue?

Expected Behavior

I expected things to work on the server, since they work fine locally.

Steps To Reproduce

Try running the SpladeEncoder on Render or Replit.
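
For context, a minimal sketch of the kind of call involved (assuming the public pinecone-text SpladeEncoder API; the sample text is made up):

from pinecone_text.sparse import SpladeEncoder

# Instantiating the encoder downloads and loads the distilled SPLADE model.
splade = SpladeEncoder()

# Encoding a single document returns a sparse vector: {"indices": [...], "values": [...]}.
doc_vector = splade.encode_documents("An example document about memory usage.")
print(len(doc_vector["indices"]), len(doc_vector["values"]))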

Relevant log output

No response

Environment

- **OS**:
- **Language version**:
- **Pinecone client version**:

Additional Context

No response

@owais142002 added the bug label on May 30, 2023
@miararoy (Collaborator) commented Jun 2, 2023

Hi @beamerboyyyy

Running the SpladeEncoder uses a small, distilled model (link); this model should run in under 8 GB of RAM.

You can see that this library's unit tests (https://github.com/pinecone-io/pinecone-text/blob/main/tests/test_splade.py) run on ubuntu-latest with 7 GB of RAM (see here).

It will be hard to debug this issue without more details. My intuition is that on Render/Replit the model is loaded multiple times in different processes, which makes the machine run out of memory. Check whether there is a parallelism factor, try setting it to 1, and see if the issue still reproduces.
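
For example, a minimal sketch of a single-worker Gunicorn configuration one could test with (a gunicorn.conf.py; the specific values are assumptions for debugging, not verified settings for this deployment):

# gunicorn.conf.py -- check whether multiple workers each load their own copy of the model.

# A single worker process, so the model is loaded only once.
workers = 1

# Load the app (and the model) before forking, so workers share the already-loaded memory pages.
preload_app = True

# If more concurrency is needed, prefer threads over extra processes,
# since threads share the one in-memory model.
threads = 4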

@eudaimoniatech commented Aug 16, 2023

@beamerboyyyy I had the same issue on Kubernetes, and the root cause was that Torch does not release the memory allocated for intermediate tensors. I fixed it this way, and now my ingestion container runs smoothly at around 900 MB of memory allocated during the whole process, without deadly spikes.

def _encode(self, texts: Union[str, List[str]]) -> Union[SparseVector, List[SparseVector]]:
    """
    Args:
        texts: single or list of texts to encode.

    Returns a list of Splade sparse vectors, one for each input text.
    """
    with torch.no_grad():
        inputs = self.tokenizer(
            texts,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=self.max_seq_length,
        ).to(self.device)

        logits = self.model(**inputs).logits
        del inputs  # Explicitly delete the inputs tensor

        inter = torch.log1p(torch.relu(logits))
        token_max = torch.max(inter, dim=1)
        del inter, logits  # Explicitly delete intermediate tensors

        nz_tokens_i, nz_tokens_j = torch.where(token_max.values > 0)

        output = []
        for i in range(token_max.values.shape[0]):
            nz_tokens = nz_tokens_j[nz_tokens_i == i]
            nz_weights = token_max.values[i, nz_tokens]
            output.append({"indices": nz_tokens.cpu().tolist(), "values": nz_weights.cpu().tolist()})

        del token_max, nz_tokens_i, nz_tokens_j  # Explicitly delete tensors

    return output[0] if isinstance(texts, str) else output

Probably not the best solution, but I hope it helps. If someone has better ideas, please share.
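
As a further (unverified) idea along the same lines: if the encoder runs on a GPU, one could also return PyTorch's cached allocator memory between batches. The helper below is a hypothetical sketch, not part of pinecone-text:

import gc

import torch


def release_torch_memory() -> None:
    # Drop unreferenced Python objects (and the tensors they hold) ...
    gc.collect()
    # ... then ask PyTorch to release cached GPU memory back to the driver.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()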
