Hello, I've been trying to make my own training data, but there don't seem to be many resources on how the data should be formatted. I've compared the LJ001 data and tried to imitate it, including splitting the WAVs and the transcript.csv.
I have tested train.py with the LJ001 data and the trainer works, but when I try with my data it fails, giving me this error:
```
Traceback (most recent call last):
  File "train.py", line 96, in <module>
    g = Graph(); print("Training Graph loaded")
  File "train.py", line 33, in __init__
    self.x, self.y, self.z, self.fnames, self.num_batch = get_batch()
  File "C:\Users\...\tacotron-master\data_load.py", line 116, in get_batch
    dynamic_pad=True)
  File "C:\anaconda3\envs\...\training\bucket_ops.py", line 374, in bucket_by_sequence_length
    raise ValueError("bucket_boundaries must not be empty")
ValueError: bucket_boundaries must not be empty
```
Here is an example of the CSV file; I tried matching the ID, TEXT, LENGTH format (a sketch for generating lines like these follows the example):
SM001-0001|Oh happy fourth of July America|00:00:02
SM001-0002|Ready to fire up the grill and celebrate our victory over the Brits|00:00:03
SM001-0003|Well, I'm not|00:00:01
SM001-0004|Because despite that incredibly convincing American accent, I'm one of those Brits|00:00:04
SM001-0005|now I've acted in film and TV for years|00:00:02
SM001-0006|but my greatest performance is acting like I don't care that every summer you gobble down tube sausages and celebrate kicking our arses|00:00:07
SM001-0007|Or butts as you say incorrectly|00:00:02
SM001-0008|Do you really still have to celebrate your emancipation from us|00:00:02
SM001-0009|I mean that's like your girlfriend breaking up with you and then celebrating with fireworks|00:00:04
SM001-0010|every year for 300 years|00:00:03
SM001-0011|it gets my goat|00:00:01
SM001-0012|but what really gets my goat is imagining how great America would be if we were still in charge|00:00:04
SM001-0013|Oh America if we'd won the war you'd have better comedy news TV programs and way better rude words|00:00:07
SM001-0014|Oh I'm talking fanny, trollop, minger tar, Minjbag, bleeding, sodding, blooming, cocked up, get stuffed|00:00:06
SM001-0015|and of course wanker|00:00:01
SM001-0016|imagine how sophisticated you'd say when you're insulting someone|00:00:03
SM001-0017|Oh Brad your wife's a slag don't piss off your wanker|00:00:04
SM001-0018|see how classy that sounded with our accents and your American self-confidence you'd be unstoppable|00:00:05
SM001-0019|yeah you'd have to pay a few more taxes but you can't put a price on that|00:00:03
SM001-0020|Great Britain two would be the greatest country on Earth|00:00:02
SM001-0021|your lawyers would all wear powdered wigs so criminals really respect them|00:00:04
SM001-0022|and you'd have all the mushy peas you can stuff down your bloody great gobs|00:00:03
SM001-0023|oh and if you get sick you don't need to worry about medical insurance because with a National Health Service a doctor will see you for free in about two years|00:00:08
SM001-0024|plus your taxes will be spent on things you really need like a royal family who do the tough jobs no one else wants to do|00:00:06
SM001-0025|like being driven around in a really nice car while waving|00:00:04
SM001-0026|you'all want to eat some apple pie then shoot some hoops and have hoedown|00:00:03
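For anyone assembling the same kind of file by hand, below is a rough sketch of writing ID|TEXT|LENGTH lines from a folder of split WAVs and a matching sentence list. Everything in it (the `wavs/` folder, `sentences.txt`, the `SM001-` prefix, and the HH:MM:SS length column) is an assumption modelled on the example above, not anything the repo prescribes.

```python
# Hypothetical helper: write an ID|TEXT|LENGTH transcript.csv from split WAVs.
# The folder name, sentence file, ID prefix and HH:MM:SS length format are
# assumptions based on the example above, not taken from the repo.
import os
import wave

wav_dir = "wavs"
with open("sentences.txt", encoding="utf-8") as f:
    sentences = [line.strip() for line in f if line.strip()]

with open("transcript.csv", "w", encoding="utf-8") as out:
    for i, text in enumerate(sentences, start=1):
        clip_id = "SM001-{:04d}".format(i)
        wav_path = os.path.join(wav_dir, clip_id + ".wav")
        with wave.open(wav_path, "rb") as w:
            seconds = w.getnframes() / float(w.getframerate())
        m, s = divmod(int(round(seconds)), 60)
        h, m = divmod(m, 60)
        out.write("{}|{}|{:02d}:{:02d}:{:02d}\n".format(clip_id, text, h, m, s))
```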
So, TL;DR, two questions:

1. Why am I receiving this "bucket_boundaries must not be empty" error when Python finds the CSV and can read it?
2. Based on 1's answer, how can I properly format my data to work with the neural network?
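On question 1, here is a minimal sketch of how that error can arise, assuming data_load.py builds bucket_boundaries from the minimum and maximum text lengths with a fixed step (the variable names and the step of 20 below are illustrative, not copied from the repo):

```python
# Illustrative only: how bucket_boundaries can end up empty when every
# transcript has almost the same length. Names and the step are assumptions.
text_lengths = [149, 150, 151]   # lengths of the encoded transcripts
minlen, maxlen = min(text_lengths), max(text_lengths)

step = 20
bucket_boundaries = [i for i in range(minlen + 1, maxlen - 1, step)]
print(bucket_boundaries)         # -> []  because range(150, 150, 20) is empty

# tf.contrib.training.bucket_by_sequence_length() rejects an empty list and
# raises ValueError("bucket_boundaries must not be empty"), as in the traceback.
```

In other words, the CSV can parse fine and the error still fires: the derived length spread is simply too narrow to produce any bucket boundaries.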
Hi @dbarroso1,
Did you eventually find how to format your data?
I'm at the same stage. I couldn't figure out how to do it properly, but duplicating the transcript.csv file from the LJ dataset and carefully pasting in my own dataset, sentence by sentence, did the trick. Not a particularly sustainable or elegant solution…
I am also facing this bucket error. I checked, and maxlen (151) and minlen (149) are so close that the for loop never iterates, so there is no value in the bucket boundaries. If anyone has solved this problem, kindly help me with it.
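If the boundaries really are built that way, one workaround (a sketch under that assumption, not the repo's own code) is to fall back to a single boundary when the list would otherwise be empty; the more robust fix is to include clips with a wider spread of lengths so the buckets are actually meaningful:

```python
# Sketch of a defensive fallback (assumption: boundaries are derived from
# minlen/maxlen with a fixed step, as described above).
step = 20
bucket_boundaries = [i for i in range(minlen + 1, maxlen - 1, step)]
if not bucket_boundaries:
    # A single boundary still satisfies bucket_by_sequence_length's check and
    # simply puts everything into one or two buckets.
    bucket_boundaries = [maxlen]
```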