Memory on GPU not cleared after transcription #992

Open

DinnoKoluh opened this issue Sep 6, 2024 · 4 comments

DinnoKoluh commented Sep 6, 2024

Hi, I have a use case where a script that transcribes an audio file still needs to keep running after the transcription process has finished. My question is: why doesn't the GPU memory get fully cleared once transcription is over?

When I delete the faster-whisper model object, about 312 MiB of GPU memory remains occupied, and I don't know by what. Below is a sample screenshot from the nvtop command and the code to replicate this behaviour. The "running" after the transcription process is imitated by the time.sleep(20) call; during those 20 seconds you can see that the GPU memory is still occupied by those 312 MiB, and it is only released when the script fully exits.

[Screenshot: nvtop showing ~312 MiB of GPU memory still occupied after the model is deleted]

from faster_whisper import WhisperModel
import time

def create_model(model_size) -> WhisperModel:
    """
    Instantiates a new Whisper model and returns it.
    """
    transcriber = WhisperModel(
        model_size_or_path=model_size,
        device="cuda",
        device_index=0,
        compute_type="int8",
        cpu_threads=1
    )
    return transcriber

model = create_model("medium")

segments, info = model.transcribe("english4.mp3")
for seg in segments:  # transcribe() returns a lazy generator; iterating runs the actual decoding
    print(seg.text)

print("Finished transcribing")

del model
print("Deleted faster-whisper object")

# Imitate the script continuing to run; nvtop still shows ~312 MiB in use here
print("Wait 20 sec")
time.sleep(20)
MahmoudAshraf97 (Contributor) commented:

Can you try that in a loop, i.e. repeat the experiment, to see whether each created-and-deleted model instance leaves a residue in memory, rather than this being a once-per-runtime issue?
Also try a GPU RAM clearing function such as torch.cuda.empty_cache() or anything similar, to see if it solves the issue.
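
A minimal sketch of that experiment (assuming torch is installed alongside faster-whisper and reusing the english4.mp3 file from above; torch.cuda.mem_get_info() queries the driver directly, so it also sees CTranslate2's allocations, which torch.cuda.memory_allocated() would not):

import gc
import time

import torch
from faster_whisper import WhisperModel

def free_mib() -> float:
    """Free GPU memory in MiB, as reported by the CUDA driver."""
    free_bytes, _total = torch.cuda.mem_get_info(0)
    return free_bytes / 1024**2

print(f"before anything: {free_mib():.0f} MiB free")
for i in range(3):
    model = WhisperModel("medium", device="cuda", compute_type="int8")
    segments, info = model.transcribe("english4.mp3")
    for _ in segments:  # consume the generator so the transcription actually runs
        pass
    del model
    gc.collect()
    torch.cuda.empty_cache()  # only clears PyTorch's caching allocator, not CTranslate2's
    time.sleep(2)
    print(f"after iteration {i}: {free_mib():.0f} MiB free")

If the free number stabilizes after the first iteration instead of shrinking each time, the residue is per-process rather than per-model.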

DinnoKoluh (Author) commented:

For every model (tiny, medium, large-v2, ...), the residue is the same: 312 MiB. I tried using torch.cuda.empty_cache(), but it still doesn't clear the memory.

benniekiss commented:

That's likely the memory held by the CUDA runtime, which, IIRC, can't really be freed unless the entire process is killed.

Do you notice whether loading/unloading the model a few times in a row always leaves the same amount of memory behind? If so, it's very likely the runtime.
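
A quick way to see that baseline (a sketch, assuming torch is available; it never loads a model, it only forces the CUDA context into existence):

import time

import torch

x = torch.zeros(1, device="cuda")  # the first CUDA op creates the per-process CUDA context
del x
torch.cuda.empty_cache()  # the tensor is gone, but the context's few hundred MiB remain

print("CUDA context is alive; check nvtop/nvidia-smi for the baseline usage")
time.sleep(20)  # the reserved memory stays until this process exits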

DinnoKoluh (Author) commented:

Yes, always the same amount of memory for all models.
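
For anyone who needs that memory back while the main script keeps running, a common workaround is to do the transcription in a child process, so the CUDA context is torn down when the child exits. A minimal sketch (the spawn start method is assumed so the child doesn't inherit CUDA state; model size and file name as above):

import multiprocessing as mp

def transcribe(path, queue):
    # Import inside the child so CUDA is initialized only in this process
    from faster_whisper import WhisperModel
    model = WhisperModel("medium", device="cuda", compute_type="int8")
    segments, _info = model.transcribe(path)
    queue.put([seg.text for seg in segments])

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=transcribe, args=("english4.mp3", queue))
    proc.start()
    texts = queue.get()  # read before join() to avoid blocking on a full queue
    proc.join()          # the child exits here and its CUDA context is released
    print("\n".join(texts))

Since the parent process never touches CUDA itself, nvtop should show no residual usage once the child has exited.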
