Memory on GPU not cleared after transcription #992

Open

DinnoKoluh opened this issue Sep 6, 2024 · 4 comments

DinnoKoluh commented Sep 6, 2024

Hi, I have a use case where a script that transcribes an audio file still needs to keep running after the transcription process has finished. My question is: why doesn't the GPU memory get fully cleared once transcription is over?

When I delete the faster-whisper model object, about 312 MiB of GPU memory remains occupied, and I don't know by what. Below is a sample screenshot from the nvtop command and the code to replicate this behaviour. The "running" after the transcription process is imitated by the time.sleep(20) call; during those 20 seconds you can see that the GPU memory is still occupied by those 312 MiB, and it is only released when the script fully exits.

[Screenshot: nvtop showing ~312 MiB of GPU memory still occupied after the model is deleted]

from faster_whisper import WhisperModel
import time

def create_model(model_size) -> WhisperModel:
    """
    Instantiates a new Whisper model and returns it.
    """
    transcriber = WhisperModel(
        model_size_or_path=model_size,
        device="cuda",
        device_index=0,
        compute_type="int8",
        cpu_threads=1
    )
    return transcriber

model = create_model("medium")

segments, info = model.transcribe("english4.mp3")
for seg in segments:  # transcribe() returns a lazy generator; iterating runs the actual decoding
    print(seg.text)

print("Finished transcribing")

del model
print("Deleted faster-whisper object")

# Imitate the script continuing to run; nvtop still shows ~312 MiB in use here
print("Wait 20 sec")
time.sleep(20)
MahmoudAshraf97 (Contributor) commented:

Can you try that in a loop, i.e. repeat the experiment, to see whether each created-and-deleted model instance leaves a residue in memory, rather than this being a once-per-runtime issue?
Also try a GPU RAM clearing function such as torch.cuda.empty_cache() or anything similar, to see if it solves the issue.
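
A minimal sketch of that experiment (assuming torch is installed alongside faster-whisper and reusing the english4.mp3 file from above; torch.cuda.mem_get_info() queries the driver directly, so it also sees CTranslate2's allocations, which torch.cuda.memory_allocated() would not):

import gc
import time

import torch
from faster_whisper import WhisperModel

def free_mib() -> float:
    """Free GPU memory in MiB, as reported by the CUDA driver."""
    free_bytes, _total = torch.cuda.mem_get_info(0)
    return free_bytes / 1024**2

print(f"before anything: {free_mib():.0f} MiB free")
for i in range(3):
    model = WhisperModel("medium", device="cuda", compute_type="int8")
    segments, info = model.transcribe("english4.mp3")
    for _ in segments:  # consume the generator so the transcription actually runs
        pass
    del model
    gc.collect()
    torch.cuda.empty_cache()  # only clears PyTorch's caching allocator, not CTranslate2's
    time.sleep(2)
    print(f"after iteration {i}: {free_mib():.0f} MiB free")

If the free number stabilizes after the first iteration instead of shrinking each time, the residue is per-process rather than per-model.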

DinnoKoluh (Author) commented:

For every model (tiny, medium, large-v2, ...), the residue is the same: 312 MiB. I tried using torch.cuda.empty_cache(), but it still doesn't clear the memory.

benniekiss commented:

That's likely the memory held by the CUDA runtime, which, IIRC, can't really be freed unless the entire process is killed.

Do you notice whether loading/unloading the model a few times in a row always leaves the same amount of memory behind? If so, it's very likely the runtime.
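
A quick way to see that baseline (a sketch, assuming torch is available; it never loads a model, it only forces the CUDA context into existence):

import time

import torch

x = torch.zeros(1, device="cuda")  # the first CUDA op creates the per-process CUDA context
del x
torch.cuda.empty_cache()  # the tensor is gone, but the context's few hundred MiB remain

print("CUDA context is alive; check nvtop/nvidia-smi for the baseline usage")
time.sleep(20)  # the reserved memory stays until this process exits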

DinnoKoluh (Author) commented:

Yes, always the same amount of memory for all models.
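
For anyone who needs that memory back while the main script keeps running, a common workaround is to do the transcription in a child process, so the CUDA context is torn down when the child exits. A minimal sketch (the spawn start method is assumed so the child doesn't inherit CUDA state; model size and file name as above):

import multiprocessing as mp

def transcribe(path, queue):
    # Import inside the child so CUDA is initialized only in this process
    from faster_whisper import WhisperModel
    model = WhisperModel("medium", device="cuda", compute_type="int8")
    segments, _info = model.transcribe(path)
    queue.put([seg.text for seg in segments])

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=transcribe, args=("english4.mp3", queue))
    proc.start()
    texts = queue.get()  # read before join() to avoid blocking on a full queue
    proc.join()          # the child exits here and its CUDA context is released
    print("\n".join(texts))

Since the parent process never touches CUDA itself, nvtop should show no residual usage once the child has exited.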
