v3 Genre ID "jazz fusion" is non-functional #31

Open
xandramax opened this issue May 4, 2020 · 6 comments

@xandramax

Also, this genre is duplicated in v3_genre_ids.txt at id 107 and id 295.

Traceback (most recent call last):
  File "jukebox/sample.py", line 307, in <module>
    fire.Fire(run)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 304, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 268, in save_samples
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "jukebox/sample.py", line 268, in <listcomp>
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "~/jukebox/jukebox/data/labels.py", line 60, in get_batch_labels
    label = self.get_label(**meta)
  File "~/jukebox/jukebox/data/labels.py", line 33, in get_label
    genre_ids = self.ag_processor.get_genre_ids(genre)
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in get_genre_ids
    return [self.genre_ids[word] for word in genres]
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in <listcomp>
    return [self.genre_ids[word] for word in genres]
KeyError: 'fusion'
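
The duplicate entry mentioned at the top of this comment can be checked mechanically. The following is a minimal sketch, not code from the repository: it assumes each line of the id file has the form `name;id` (the `;` separator is an assumption), and `find_duplicate_names` is a hypothetical helper. The sample rows are made up, apart from the two ids reported above.

```python
from collections import defaultdict

def find_duplicate_names(lines, sep=";"):
    """Return {name: [ids...]} for every name listed with more than one id."""
    ids_by_name = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last separator so names containing it still parse.
        name, _, raw_id = line.rpartition(sep)
        ids_by_name[name.lower()].append(int(raw_id))
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}

# Illustrative rows only; the real file would be read with open(...).
sample = ["jazz fusion;107", "sea shanties;12", "jazz fusion;295"]
duplicates = find_duplicate_names(sample)
print(duplicates)
```

To run it against the real file, pass the open file object instead of `sample`, adjusting `sep` if the file uses a different layout.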
@xandramax
Author

xandramax commented May 4, 2020

Opera, Andean Music, Sufi, Baroque, Kirtan, Canterbury, Operatic Pop, Mystic Folk, Anime, Poetry, Ragtime, Appalachian Folk, Religious, Sea Shanties, Christian Hymns, Spirituals, Barbershop, Choral, Gregorian Chant, and Boogie Woogie also fail to load.

Loading artist IDs from ~/jukebox/jukebox/data/ids/v3_artist_ids.txt
Loading artist IDs from ~/jukebox/jukebox/data/ids/v3_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:786432
Downloading from gce
Restored from ~/.cache/jukebox-assets/models/1b_lyrics/prior_level_2.pth.tar
0: Loading prior in eval mode
Traceback (most recent call last):
  File "jukebox/sample.py", line 366, in <module>
    fire.Fire(run)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 363, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 327, in save_samples
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "jukebox/sample.py", line 327, in <listcomp>
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "~/jukebox/jukebox/data/labels.py", line 60, in get_batch_labels
    label = self.get_label(**meta)
  File "~/jukebox/jukebox/data/labels.py", line 33, in get_label
    genre_ids = self.ag_processor.get_genre_ids(genre)
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in get_genre_ids
    return [self.genre_ids[word] for word in genres]
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in <listcomp>
    return [self.genre_ids[word] for word in genres]
KeyError: 'sea'

@mcleavey
Contributor

mcleavey commented May 6, 2020

Thanks. This looks like a historical artifact: we trained the 1B and 5B models separately with different genres, but after the merge the 1B model uses the 5B's genres for the upsamplers. I'll adjust things so the upsamplers won't complain if they see unexpected genre words.
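
For readers following along, a tolerant lookup of this kind could look roughly like the sketch below. This is not the change that landed in the repository: `get_genre_ids_tolerant`, the `fallback_id` parameter, and the toy vocabulary are all hypothetical, and the choice of id 0 as the fallback is an assumption.

```python
import warnings

def get_genre_ids_tolerant(genre_ids, words, fallback_id=0):
    """Look up each word; warn and use fallback_id instead of raising KeyError."""
    ids = []
    for word in words:
        if word in genre_ids:
            ids.append(genre_ids[word])
        else:
            warnings.warn(
                f"Genre word {word!r} not in vocabulary; using id {fallback_id}"
            )
            ids.append(fallback_id)
    return ids

# Toy vocabulary (assumption), standing in for the loaded genre id mapping.
toy_vocab = {"unknown": 0, "jazz": 1}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = get_genre_ids_tolerant(toy_vocab, ["jazz", "fusion"])
print(result, len(caught))
```

The trade-off is that an unknown word no longer stops sampling, so a typo in a genre name only shows up as a warning.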

@kcrosley-leisurelabs

@mcleavey, is there a related issue here with the colab notebook? When I use the colab notebook to load 5b_lyrics and then specify a genre that exists in version 3 (v3_genre_ids.txt) but not in version 2 (v2_genre_ids.txt), the cell where you specify your metas throws an error.

For example, if you try:

metas = [dict(artist = "barry white",
            genre = "coldwave",
            total_length = hps.sample_length,
            offset = 0,
            lyrics = """Some lyrics.
            """,
            ),
          ] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]

This throws a KeyError:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-e02795cb531e> in <module>()
     16             ),
     17           ] * hps.n_samples
---> 18 labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]

3 frames
/usr/local/lib/python3.6/dist-packages/jukebox/data/artist_genre_processor.py in <listcomp>(.0)
     51             # In v2, we convert genre into a bag of words
     52             genres = norm(genre).split("_")
---> 53         return [self.genre_ids[word] for word in genres]
     54 
     55     # get_artist/genre throw error if we ask for non-present values

KeyError: 'coldwave'
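
The traceback shows why multi-word genres fail on a single word: the genre is normalised, split on "_" into a bag of words, and each word is looked up separately. A rough pre-flight check, assuming a simplified stand-in for `norm` (the real normaliser may differ) and a made-up vocabulary, could be sketched as follows. `missing_genre_words` is a hypothetical helper, not part of jukebox.

```python
import re

def norm(text):
    # Stand-in normaliser (assumption): lowercase, then collapse every run
    # of non-alphanumeric characters into a single "_".
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

def missing_genre_words(genre, vocab):
    """Return the bag-of-words pieces of `genre` that are absent from `vocab`."""
    return [word for word in norm(genre).split("_") if word not in vocab]

# Toy vocabulary (assumption), standing in for the keys of the genre id file.
toy_v2_vocab = {"jazz", "pop", "rock"}
print(missing_genre_words("jazz fusion", toy_v2_vocab))
print(missing_genre_words("coldwave", toy_v2_vocab))
print(missing_genre_words("Jazz", toy_v2_vocab))
```

Running a check like this on each genre in metas before calling get_batch_labels would report every unsupported word at once instead of failing on the first one.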

@mcleavey
Contributor

mcleavey commented May 6, 2020

@kcrosley-leisurelabs Yes, the 5B model was trained with the v2 genres (historically, the 5B-without-lyrics came first so was v2, and then we branched out to experiment with a 1B model with lyrics, which became v3). I'm wrapped up with other work this afternoon, but will update names/comments to make this more clear & intuitive.
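
Until the names are cleaned up, the pairing described above can be written down explicitly. This is only a sketch of that pairing for the top-level prior: the table and `genre_id_file_for` are hypothetical, and it does not cover the upsamplers, which used the 5B's genres at the time of this thread.

```python
# Model name -> genre id file, as described in this thread (top-level prior only).
GENRE_ID_FILES = {
    "5b_lyrics": "v2_genre_ids.txt",
    "1b_lyrics": "v3_genre_ids.txt",
}

def genre_id_file_for(model_name):
    """Return the genre id file for a model, or raise a descriptive error."""
    try:
        return GENRE_ID_FILES[model_name]
    except KeyError:
        raise ValueError(
            f"No genre vocabulary recorded for model {model_name!r}"
        ) from None

print(genre_id_file_for("5b_lyrics"))
```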

@kcrosley-leisurelabs

@mcleavey thanks so much for the clarification.

@kcrosley-leisurelabs

So, I'm still kind of confused about this. The 1B model is smaller but has larger numbers of genres and artists? (Can that really be true?)

I notice that the latest commit now complains if one specifies a v3 artist when using 5b_lyrics, whereas it didn't before: it notes that the artist will be mapped to "unknown". (Again, this occurs in the colab notebook. BTW, the notebook shared by @SMarioMan in #40 is vastly superior to the one in the current distro, as it uses Google Drive to store generated samples rather than volatile session storage and also demonstrates how to prime the model.)

Final question: Before the latest updates, I'd been able to specify artists from the v3 list with the 5b_lyrics model and it didn't throw any errors or warnings. Under the hood, was this simply silently mapping them to "unknown" in previous builds?

(Sorry for what might be derpy questions. I'm pretty novice with the AI rocket surgery stuff. ;) )
