Is this a new bug?
I have searched the existing issues, and I could not find an existing issue for this bug.
Current Behavior
The Gunicorn worker gets terminated with signal 9 (out of memory), even though I have 29 GB of RAM available. As soon as I send a request to my server hosted on Render, the worker gets killed. Things work fine locally because my machine has good specs. How much should I scale the server, or is there another way to solve this issue?
Expected Behavior
I expected it to work on the server the same way it works locally.
Steps To Reproduce
Try running the SpladeEncoder on Render or Replit.
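For context, this is roughly what my endpoint does, as a minimal sketch (I'm assuming pinecone-text's SpladeEncoder behind a Flask app served by Gunicorn; the route name and payload shape here are illustrative, not my exact code):

# Minimal repro sketch (assumptions: pinecone-text's SpladeEncoder and a
# Flask app served by Gunicorn; route name and payload shape are illustrative).
from flask import Flask, jsonify, request
from pinecone_text.sparse import SpladeEncoder

app = Flask(__name__)
encoder = SpladeEncoder()  # the SPLADE model is loaded into memory here


@app.route("/encode", methods=["POST"])
def encode():
    texts = request.get_json()["texts"]
    # On Render/Replit the worker is killed with signal 9 during this call.
    return jsonify(encoder.encode_documents(texts))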
It will be hard to debug this without more details. My intuition is that on Render/Replit the model is loaded multiple times in different processes, which exhausts the machine's memory. Check whether there is a parallelism factor, set it to 1, and see if the issue still reproduces.
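For example, something like this (a sketch assuming a standard Gunicorn + PyTorch + Transformers setup; adjust it to your app's entry point):

# Sketch: cap thread parallelism before the model is loaded. Put this at the
# top of the module that creates the encoder, before any torch work starts.
import os

os.environ.setdefault("OMP_NUM_THREADS", "1")             # BLAS/OpenMP threads
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")  # HF tokenizers

import torch

torch.set_num_threads(1)          # intra-op parallelism
torch.set_num_interop_threads(1)  # inter-op parallelism

Then start Gunicorn with a single worker so the model is only loaded once, e.g. gunicorn --workers 1 --threads 2 app:app.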
@beamerboyyyy I had the same issue on Kubernetes, and the root cause was that Torch does not release the memory allocated for the tensors. I fixed it this way, and now my ingestion container runs smoothly with around 900 MB of memory allocated during the whole process, without deadly spikes.
def _encode(self, texts: Union[str, List[str]]) -> Union[SparseVector, List[SparseVector]]:
    """
    Args:
        texts: single or list of texts to encode.

    Returns a list of Splade sparse vectors, one for each input text.
    """
    with torch.no_grad():
        inputs = self.tokenizer(
            texts,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=self.max_seq_length,
        ).to(self.device)
        logits = self.model(**inputs).logits
        del inputs  # Explicitly delete the inputs tensor

        inter = torch.log1p(torch.relu(logits))
        token_max = torch.max(inter, dim=1)
        del inter, logits  # Explicitly delete intermediate tensors

        nz_tokens_i, nz_tokens_j = torch.where(token_max.values > 0)

        output = []
        for i in range(token_max.values.shape[0]):
            nz_tokens = nz_tokens_j[nz_tokens_i == i]
            nz_weights = token_max.values[i, nz_tokens]
            output.append({"indices": nz_tokens.cpu().tolist(), "values": nz_weights.cpu().tolist()})

        del token_max, nz_tokens_i, nz_tokens_j  # Explicitly delete tensors

    return output[0] if isinstance(texts, str) else output
Probably not the best solution, but I hope it helps. If someone has better ideas, please share.
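One more thing that can help keep the peak down is encoding in small chunks, so the padded input batch (and therefore peak memory) stays bounded. A minimal sketch (the helper name and chunk size are illustrative, using the public encode_documents API rather than _encode):

from typing import List


def encode_in_chunks(encoder, texts: List[str], chunk_size: int = 32) -> List[dict]:
    # Encode a large list of texts in small chunks so the padded batch tensor
    # never grows to the size of the whole input at once.
    vectors: List[dict] = []
    for start in range(0, len(texts), chunk_size):
        vectors.extend(encoder.encode_documents(texts[start:start + chunk_size]))
    return vectors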