Using the same code that worked for me with parler-tts/parler-tts-mini-expresso yields much slower generations (roughly 4-5x) and an error with parler-tts/parler-tts-tiny-v1.
I then used the "specific voice" and "random voice" example scripts from the Hugging Face repo for tiny. Specifically, the error I get is:
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/scott/miniconda3/envs/parler/lib/python3.11/site-packages/soundfile.py", line 342, in write
channels = data.shape[1]
~~~~~~~~~~^^^
IndexError: tuple index out of range
I'm running it on an RTX 3090, Ubuntu 24.04. I just confirmed this problem in a brand new conda env:
conda create -n parler-bare python=3.11
conda activate parler-bare
pip install git+https://github.com/huggingface/parler-tts.git
python
# Run example code blocks from Hugging Face, e.g.
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-tiny-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-tiny-v1")
prompt = "Hey, how are you doing today?"
description = "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
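For what it's worth, the `IndexError` suggests `audio_arr` ended up 0-dimensional: `soundfile.write` only falls back to `data.shape[1]` when the data isn't 1-D, so a fully squeezed single-sample output triggers exactly this traceback. A minimal sketch of the failure shape and a guard, assuming the script above (the `np.atleast_1d` call is my own workaround, not part of the repo's examples):

```python
import numpy as np

# Stand-in for `generation.cpu().numpy().squeeze()` when the model emits
# only a single sample: squeeze() collapses it to a 0-d array, and
# soundfile's write() then hits `data.shape[1]` -> IndexError.
audio_arr = np.zeros((1, 1, 1)).squeeze()
assert audio_arr.ndim == 0  # the shape that breaks sf.write

# Possible guard before sf.write: force at least one dimension.
audio_arr = np.atleast_1d(audio_arr)
print(audio_arr.shape)  # (1,)
```

Of course this only masks the symptom; a one-sample generation from tiny-v1 would still point at an underlying generation problem.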
(I also got the warning "The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results." after the generation = ... line. I was getting (different) attention mask warnings with my working mini-expresso code, though.)
P.S. Thanks for these great models!