CUDA OOM error when "saving batch" #110
Comments
Can confirm this also happens with
Might also be related to the issue I mentioned here, in combination with Japanese / Chinese symbols, since my dataset contains these: #94 (comment). I'll try another iteration on the model later without these languages to see if that makes any difference.
I think VALL-E X is for multi-language support. Not sure if VALL-E can learn multiple languages.
From my testing yesterday, it was able to transfer the dialect from the input sample to the output, even if it's a different language. The OOM issue, I believe, comes from the model implementation not being able to handle the symbols. I've seen there is a PR for adding Chinese language support with a giant wordlist and a G2PBackend, which I believe is needed to convert the symbol language into words that can be properly phonemized internally, because yesterday I tried inference with something like this:
I believe this is what the G2PBackend actually does internally, but I could be mistaken.
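For illustration, a minimal sketch of such a symbol-to-word conversion, assuming the pypinyin package (which may not be what the PR's G2PBackend actually uses):

```python
# pip install pypinyin -- this library choice is an assumption, not necessarily what the PR uses.
from pypinyin import lazy_pinyin

text = "你好世界"
words = lazy_pinyin(text)   # -> ['ni', 'hao', 'shi', 'jie'] (tones dropped)
# The romanized words can then be passed through a regular phonemizer backend.
print(" ".join(words))
```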
Ah I see. Makes sense.
Do you mean we can train it on multiple languages already?
Let me know if I got this right. Does this mean that the input is lang A and the output is lang B? I thought that the language ID would control the accent instead, as in "Learning to Speak Foreign Language Fluently".
I have a PR open for the CommonVoice dataset, and it is currently training on 24 different languages on my AI training machine. Unfortunately, I still cannot be sure it trains on the full dataset, because I hit this OOM error after ~164,500 steps each time; however, I implemented some code and hopefully it is fixed the next time it hits the issue.
I might have understood that wrong as well. I just revisited their GitHub page; I think they actually can control the accent explicitly. In most examples they just switch from English to Chinese and back, so it's hard to tell whether it could, for example, do French with a German accent very well. It probably can.
Further debugging of the issue revealed that the crash seems to happen frequently if there are Cyrillic letters in the batch; processing these takes considerably more time and VRAM. I am also going to provide a PR with exception-handling code later, which prints some verbose output if the error is hit and (tries to) skip broken batches and continue training.
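Not the actual PR, but a minimal sketch of the skip-and-continue idea, assuming a plain PyTorch training step:

```python
import torch

def train_one_batch(model, optimizer, batch):
    """Hypothetical single training step; returns False if the batch had to be skipped."""
    try:
        optimizer.zero_grad()
        loss = model(batch)           # assumed: the model call returns a scalar loss
        loss.backward()
        optimizer.step()
        return True
    except RuntimeError as e:
        # PyTorch reports CUDA OOM as a RuntimeError containing "out of memory".
        if "out of memory" not in str(e):
            raise
        print(f"CUDA OOM on this batch, skipping it: {e}")
        optimizer.zero_grad(set_to_none=True)
        torch.cuda.empty_cache()      # release cached blocks before trying the next batch
        return False
```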
@RuntimeRacer do you think it's useful to preprocess the data into phonemes and then give that to VALL-E? I feel like this would solve a lot of the OOM errors.
@nivibilla I did follow the exact process of dataset preparation for my CommonVoice training; I assume the phoneme conversion has already happened. It apparently IS able to process these letters and symbols, but the generation performance as well as the memory footprint are incredibly worse compared to Latin text. "Elle est toujours utilisée par Réseau ferré italien pour le service de l'infrastructure." -> This takes ~2 seconds in inference and 6.3 GB of VRAM.
Ah right, so in inference we are using a lot of VRAM for phoneme conversion? That's strange.
I'm finding it difficult to understand why there is a VRAM difference when using different languages. When converting to phonemes, why is there a difference in VRAM usage? I assume that after converting, it should make no difference what language it is. Unless the converted phonemes are much longer when it's not English?
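A quick way to sanity-check that, as a sketch assuming the phonemizer package with an espeak-ng backend (language codes depend on the local espeak-ng install):

```python
# pip install phonemizer; espeak-ng must be installed on the system.
from phonemizer import phonemize

samples = {
    "en-us": "The train leaves at seven in the morning.",
    "ru": "Поезд отправляется в семь утра.",
}
for lang, text in samples.items():
    ph = phonemize(text, language=lang, backend="espeak", strip=True)
    # Compare raw character count against the length of the phonemized string.
    print(lang, "chars:", len(text), "phoneme string:", len(ph))
```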
I can share my symbols file later, but I don't think these are very different; I had a look at them before starting training.
Almost forgot I wanted to share my symbols file from phonemization:
try lower |
@lifeiteng I believe it is most likely a charset issue, following my observations: #110 (comment). The duration distribution for CV with 24 languages (including languages with Cyrillic, Chinese and Japanese charsets) I shared in my commit here: https://github.com/lifeiteng/vall-e/pull/111/files#diff-aaf4d0ff4603a6956d6a4834fd5df31c65f62e95cee609f435828504c31a82fa I will share my intermediate training model to allow further testing once CommonVoice epoch 1 has finished.
Have you increased the macro at line 2 in 168ace8?
I'm running into this issue as well. I thought I had stripped out the non-Latin alphabet characters from my dataset, but I still run into the issue. It passes the:
But then it fails on a specific batch. Are you also stripping punctuation? What are you doing to filter out the non-Latin alphabet characters?
So I tested this again. I tried with the value 1024 and also 4096 now, but each time it starts breaking the training as soon as the first Cyrillic sentence appears. I believe this is some encoding-related issue. EDIT: I will check if I can somehow apply this while reading in the datasets: https://pypi.org/project/anyascii/0.1.6/
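A minimal sketch of what applying it while reading the transcripts could look like (the anyascii call is as documented on that page; the surrounding normalization step is just an assumption):

```python
# pip install anyascii
from anyascii import anyascii

def normalize_transcript(text: str) -> str:
    """Transliterate arbitrary Unicode text to plain ASCII before tokenization."""
    return anyascii(text)

print(normalize_transcript("Привет, мир"))  # -> "Privet, mir"
print(normalize_transcript("こんにちは"))    # roughly "konnichiha"; romanization is approximate
```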
@RuntimeRacer Can you post how the text-to-audio ratio looks for your data? I think my OOMs were due to text that was way too long compared to the audio. Most good (English) data that I've spot checked seems to have a ratio around 6.0-6.5, but I've seen it as low as 4.0, too.
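For reference, a minimal sketch of such a ratio check over (transcript, duration) pairs; the exact length measure and the threshold here are assumptions, not values from this thread:

```python
def text_audio_ratio(text: str, duration_sec: float) -> float:
    """Transcript length per second of audio (the exact length measure used above isn't shown)."""
    return len(text) / max(duration_sec, 1e-6)

# Hypothetical filtering pass over (transcript, duration in seconds) pairs:
samples = [
    ("a short sentence", 2.5),
    ("an extremely long transcript paired with a very short clip " * 4, 1.0),
]
MAX_RATIO = 30.0  # placeholder threshold, not a value from this thread
kept = [(t, d) for t, d in samples if text_audio_ratio(t, d) <= MAX_RATIO]
print(f"kept {len(kept)} of {len(samples)} samples")
```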
Just ran into this in the midst of training. I assume maybe the epoch ended and it tried to save something to disk, which is only a few MB in size, however.
I'll continue with lowering max-duration from 80 to 60 now.