Pretrained classifier #1

Open
5 tasks done
pmhalvor opened this issue Nov 20, 2023 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@pmhalvor
Owner

pmhalvor commented Nov 20, 2023

To round off the classifier experiments, we want to test a pretrained music classifier and compare its architecture against our simpler models.

Steps

  • Find classifier (MAEST)
  • Build preprocessing from GTZAN to classifier
  • Evaluate the model out of the box
  • Run fine-tuning similar to previous steps
  • Evaluate fine-tuned model
pmhalvor added the documentation label (Improvements or additions to documentation) Nov 20, 2023
@pmhalvor
Owner Author

pmhalvor commented Nov 20, 2023

Simple classifications:

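import numpy as np
import torch
from scipy.io import wavfile as wav
from torch.nn import CrossEntropyLoss
from transformers import pipeline

pipe = pipeline("audio-classification", "mtg/maest...")

file_paths = np.load("../data/file_paths.npy")
labels = np.load("labels.npy")  # GTZAN labels, assumed to be aligned with file_paths

# Assumption: the Discogs labels ("Genre---Style") are read from the model config.
discogs_labels = list(pipe.model.config.id2label.values())

# Top-level Discogs genres, sorted so the indices are stable across runs.
dlabel = sorted(set(row.split("---")[0] for row in discogs_labels))
glabel = set(labels)

dlabel_to_idx = {label: idx for idx, label in enumerate(dlabel)}
glabel_to_dlabel = {
    # manually map each GTZAN genre to a top-level Discogs genre
}

rate, audio = wav.read(file_paths[0])
audio = audio.astype(np.float32) / 32768.0  # assumes 16-bit PCM

# Pass the sampling rate so the pipeline can resample (this needs torchaudio),
# and ask for a score per label instead of only the top few.
outputs = pipe(
    {"sampling_rate": rate, "raw": audio},
    top_k=len(discogs_labels),
)
# [{"score":0.123, "label": "Electronic---Noise"}, ...]

# Keep the highest style score within each top-level genre.
predictions = np.zeros(len(dlabel_to_idx), dtype=np.float32)
for output in outputs:
    idx = dlabel_to_idx[output["label"].split("---")[0]]
    if output["score"] > predictions[idx]:
        predictions[idx] = output["score"]

# The loss takes no parameters or learning rate; those belong to an optimizer.
criterion = CrossEntropyLoss()
optimizer = torch.optim.Adam(pipe.model.parameters(), lr=0.001)

# Target index of the true genre (needs glabel_to_dlabel to be filled in).
y = torch.tensor([dlabel_to_idx[glabel_to_dlabel[labels[0]]]])

# The pipeline returns plain floats, so this loss is for evaluation only
# and cannot be backpropagated.
loss = criterion(torch.from_numpy(predictions).unsqueeze(0), y)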
        

@pmhalvor
Owner Author

After some experimenting with the above, I think it may be best to build a class around an ASTForAudioClassification instance loaded from pretrained weights. This is the same architecture MAEST uses.

By wrapping the training in a model class, we can easily handle the data transformation steps necessary to convert between GTZAN labels and the Discogs label outputs. This will allow us to compare it against the other models we tested.
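A minimal sketch of that label conversion, assuming the classifier returns a list of `{"score", "label"}` dicts with labels of the form `Genre---Style`. The `GTZAN_TO_DISCOGS` mapping, both helper functions, and the example outputs are hypothetical and only illustrate the idea; the real mapping still has to be written by hand.

```python
# Hypothetical mapping from GTZAN genres to top-level Discogs genres.
# These entries are illustrative, not the final mapping.
GTZAN_TO_DISCOGS = {
    "blues": "Blues",
    "classical": "Classical",
    "hiphop": "Hip Hop",
    "rock": "Rock",
}


def discogs_to_gtzan_scores(outputs, mapping=GTZAN_TO_DISCOGS):
    """Collapse "Genre---Style" scores into one score per GTZAN genre.

    Each GTZAN genre gets the highest score among the Discogs styles whose
    top-level genre it maps to; genres with no matching style score 0.0.
    """
    discogs_to_gtzan = {discogs: gtzan for gtzan, discogs in mapping.items()}
    scores = {gtzan: 0.0 for gtzan in mapping}
    for output in outputs:
        top_level = output["label"].split("---")[0]
        gtzan = discogs_to_gtzan.get(top_level)
        if gtzan is not None:
            scores[gtzan] = max(scores[gtzan], output["score"])
    return scores


def predict_genre(outputs, mapping=GTZAN_TO_DISCOGS):
    """Return the GTZAN genre with the highest collapsed score."""
    scores = discogs_to_gtzan_scores(outputs, mapping)
    return max(scores, key=scores.get)


# Made-up classifier outputs in the assumed output format.
example_outputs = [
    {"score": 0.40, "label": "Rock---Indie Rock"},
    {"score": 0.25, "label": "Blues---Chicago Blues"},
    {"score": 0.55, "label": "Rock---Punk"},
    {"score": 0.10, "label": "Electronic---Noise"},
]
print(predict_genre(example_outputs))  # rock
```

Taking the maximum per genre is one choice among several; summing the style scores within a genre would be another, and which works better on GTZAN is something to test.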

It's recommended to freeze the pretrained layers and only update the new layers during backpropagation, but I'll have to experiment a bit to see what gives the best results.
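A rough sketch of the freezing step in plain PyTorch. `TinyAudioModel` is a made-up stand-in for the pretrained model so the example runs without downloading weights, and `freeze_all_but_head` is a hypothetical helper. I believe ASTForAudioClassification exposes its head as `classifier`, but that should be checked against the installed transformers version.

```python
import torch
from torch import nn


def freeze_all_but_head(model: nn.Module, head_name: str = "classifier"):
    """Freeze every parameter outside `head_name`; return the trainable ones."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_name + ".")
        if param.requires_grad:
            trainable.append(param)
    return trainable


class TinyAudioModel(nn.Module):
    """Stand-in for a pretrained model: a backbone plus a new head."""

    def __init__(self, n_features=16, hidden=32, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):
        return self.classifier(self.backbone(x))


torch.manual_seed(0)
model = TinyAudioModel()
backbone_before = model.backbone[0].weight.clone()
head_before = model.classifier.weight.clone()

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(freeze_all_but_head(model), lr=0.001)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 16)          # fake batch of 4 feature vectors
y = torch.tensor([0, 3, 7, 9])  # fake class indices

# One training step: gradients only flow into the head.
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone_before, model.backbone[0].weight)
head_changed = not torch.equal(head_before, model.classifier.weight)
```

To fine-tune part of the backbone later, the same helper could be relaxed to unfreeze the last few encoder blocks as well, which is one of the variations worth trying.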

The code will look something like this (a rough sketch rather than the final class, which will build on ASTForAudioClassification):

import torch
from sklearn.metrics import f1_score
from torch import nn
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

class AudioClassificationPipeline:
    # model_name is a placeholder kept from the first draft; swap in a real
    # checkpoint id before running.
    def __init__(self, model_name='wav2vec2-base-960', num_labels=8, sampling_rate=16000):
        self.sampling_rate = sampling_rate
        # Audio models take a feature extractor, not a tokenizer.
        self.extractor = AutoFeatureExtractor.from_pretrained(model_name)
        self.encoder = AutoModelForAudioClassification.from_pretrained(model_name)
        # Freeze the pretrained layers; only the new head below is trained.
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.encoder.eval()
        # New head from the pretrained label space to our classes
        # (softmax is applied inside the loss).
        self.fc = nn.Linear(self.encoder.config.num_labels, num_labels)

    def _logits(self, X):
        # X: list of 1-D float waveforms sampled at self.sampling_rate
        inputs = self.extractor(X, sampling_rate=self.sampling_rate, return_tensors="pt", padding=True)
        with torch.no_grad():
            encoded = self.encoder(**inputs).logits
        return self.fc(encoded)

    def fit(self, X, y, epochs=10, lr=0.001):
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(self.fc.parameters(), lr=lr)
        targets = torch.as_tensor(y, dtype=torch.long)
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = criterion(self._logits(X), targets)
            loss.backward()
            optimizer.step()
        return self

    def predict(self, X):
        with torch.no_grad():
            return self._logits(X).argmax(dim=1).numpy()

    def score(self, X, y):
        return f1_score(y, self.predict(X), average="macro")
