Pretrained classifier #1

pmhalvor · 2023-11-20T16:32:22Z

To round off the classifier experiments, we want to test a pretrained music classifier and compare its architecture against our simpler models.

Steps

Find classifier (MAEST)
Build preprocessing from GTZAN to classifier
Evaluate the model out of the box
Run fine-tuning similar to previous steps
Evaluate fine-tuned model
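
One piece of the preprocessing step could be getting GTZAN clips into the shape the classifier expects. The following is a minimal sketch, not the final pipeline. It assumes GTZAN clips are PCM WAV at 22,050 Hz and that the classifier wants mono float32 at 16 kHz; both rates are assumptions to check against the model card, and `prepare_clip` is a hypothetical helper name.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def prepare_clip(samples, source_rate=22_050, target_rate=16_000):
    """Convert a PCM clip to mono float32 at target_rate (rates are assumptions)."""
    audio = np.asarray(samples)
    if np.issubdtype(audio.dtype, np.integer):
        # Scale integer PCM (e.g. int16) into roughly [-1.0, 1.0).
        scale = float(np.iinfo(audio.dtype).max) + 1.0
        audio = audio.astype(np.float32) / scale
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        # scipy.io.wavfile returns (samples, channels) for multi-channel files.
        audio = audio.mean(axis=1)
    # Polyphase resampling with the smallest integer up/down factors.
    factor = gcd(source_rate, target_rate)
    audio = resample_poly(audio, target_rate // factor, source_rate // factor)
    return audio.astype(np.float32)


one_second = np.zeros(22_050, dtype=np.int16)
clip = prepare_clip(one_second)
print(clip.shape, clip.dtype)
```

In the real pipeline the input would come from `wav.read(path)`, whose first return value can be passed as `source_rate` instead of relying on the default.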

pmhalvor · 2023-11-20T21:36:30Z

Simple classifications:

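import numpy as np
import torch
from scipy.io import wavfile as wav
from torch.nn import CrossEntropyLoss
from transformers import pipeline

pipe = pipeline("audio-classification", "mtg/maest...")

file_paths = np.load("../data/file_paths.npy")

# Discogs labels look like "Electronic---Noise"; keep only the parent genre.
# Assumption: the full label list is read from the model config.
discogs_labels = pipe.model.config.id2label.values()
dlabel = sorted({row.split("---")[0] for row in discogs_labels})
glabel = set(np.load("labels.npy"))

dlabel_to_idx = {label: idx for idx, label in enumerate(dlabel)}
glabel_to_dlabel = {
    # manually map
}

_, audio = wav.read(file_paths[0])
# The pipeline expects a 1-D float array; this scaling assumes 16-bit PCM.
audio = audio.astype(np.float32) / 32768.0

# Ask for every label; by default only the top 5 are returned.
outputs = pipe(audio, top_k=len(pipe.model.config.id2label))
# [{"score":0.123, "label": "Electronic---Noise"}, ...]

predictions = np.zeros(len(dlabel_to_idx))

# Keep the highest style score seen for each parent genre.
for output in outputs:
    idx = dlabel_to_idx[output["label"].split("---")[0]]
    if output["score"] > predictions[idx]:
        predictions[idx] = output["score"]

# The loss takes no parameters or learning rate; lr=0.001 belongs on an
# optimizer built over pipe.model.parameters().
criterion = CrossEntropyLoss()

# y: tensor holding the index of the true parent genre (not defined in this snippet).
# CrossEntropyLoss expects raw logits, so pipeline scores are only a rough stand-in.
loss = criterion(torch.from_numpy(predictions).unsqueeze(0), y)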
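
The label handling above can be tried without loading the model. Below is a small sketch of the aggregation and mapping step on its own, assuming pipeline outputs of the form shown in the comment. The helper names, the example scores, and the partial `example_mapping` are made up for illustration; the real GTZAN-to-Discogs mapping still has to be written by hand.

```python
def parent_genre(label):
    """Return the parent genre of a 'Parent---Style' label."""
    return label.split("---")[0]


def aggregate_scores(outputs):
    """Keep the highest style score seen for each parent genre."""
    scores = {}
    for output in outputs:
        parent = parent_genre(output["label"])
        scores[parent] = max(scores.get(parent, 0.0), output["score"])
    return scores


def predict_gtzan(outputs, glabel_to_dlabel):
    """Pick the GTZAN genre whose mapped parent genre scores highest."""
    scores = aggregate_scores(outputs)
    return max(glabel_to_dlabel, key=lambda g: scores.get(glabel_to_dlabel[g], 0.0))


# Hypothetical partial mapping and invented scores, for illustration only.
example_mapping = {"rock": "Rock", "jazz": "Jazz", "hiphop": "Hip Hop"}
example_outputs = [
    {"score": 0.40, "label": "Rock---Style A"},
    {"score": 0.25, "label": "Jazz---Style B"},
    {"score": 0.55, "label": "Rock---Style C"},
]
print(predict_gtzan(example_outputs, example_mapping))  # rock
```

Taking the maximum per parent genre mirrors the loop above; summing the style scores instead would be another reasonable choice to compare.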

pmhalvor · 2023-11-20T22:46:17Z

After some experimenting with the above, I think it may be best to build a class around an ASTForAudioClassification instance loaded from a pretrained checkpoint. This is the same architecture MAEST uses.

By wrapping the training in a model class, we can easily handle the data transformation steps needed to convert GTZAN labels to Discogs label outputs. This will let us compare the results against the other models we tested.

It's recommended to freeze the pretrained layers and only update the new layers during backpropagation, but I'll have to experiment a bit to see what gives the best results.
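
As a rough illustration of the freezing idea, here is a minimal PyTorch sketch. It uses a tiny stand-in backbone so it runs without downloading a checkpoint; the class name `FrozenBackboneClassifier`, the layer sizes, and the choice of Adam are assumptions, not the final design. In practice the backbone would be the pretrained model and the head would map its outputs to our GTZAN classes.

```python
import torch
from torch import nn


class FrozenBackboneClassifier(nn.Module):
    """Frozen pretrained backbone followed by a small trainable head."""

    def __init__(self, backbone, backbone_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        for param in self.backbone.parameters():
            param.requires_grad = False  # exclude pretrained weights from updates
        self.head = nn.Linear(backbone_dim, num_classes)

    def forward(self, x):
        # No gradients are needed through the frozen part.
        with torch.no_grad():
            features = self.backbone(x)
        return self.head(features)


# Stand-in backbone (assumption): replace with the pretrained model.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
model = FrozenBackboneClassifier(backbone, backbone_dim=32, num_classes=8)

# Only the new head's parameters are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 16)          # fake batch of features
y = torch.randint(0, 8, (4,))   # fake class indices
logits = model(x)
loss = criterion(logits, y)
loss.backward()
optimizer.step()
```

Unfreezing the last few backbone layers later only requires flipping `requires_grad` back on for those parameters and rebuilding the optimizer, which makes it easy to compare both setups.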

The code will look something like this (though not exactly, because the example below uses a Pipeline base):

# Rough pseudocode: this does not run as written (see the NOTE comments).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, Wav2Vec2Tokenizer

class AudioClassificationPipeline(Pipeline):
    def __init__(self, model_name='wav2vec2-base-960'):
        # NOTE: sklearn's Pipeline requires a `steps` argument.
        super().__init__()
        self.scaler = StandardScaler()
        self.tokenizer = Wav2Vec2Tokenizer.from_pretrained(model_name)
        self.encoder = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=8)
        # NOTE: LinearLayer is a placeholder for the new trainable output layer.
        self.fc = LinearLayer(8, 8, activation='softmax')

    def fit(self, X, y):
        X_encoded = self.scaler.fit_transform(X)
        X_encoded = self.tokenizer.encode_plus(X_encoded, return_attention_mask=True, max_length=512, padding='max_length', truncation=True)
        # NOTE: models have no encode_plus; this stands in for a forward pass.
        X_encoded = self.encoder.encode_plus(X_encoded, return_attention_mask=True, max_length=512, padding='max_length', truncation=True)
        X_encoded = self.fc(X_encoded)
        return super().fit(X_encoded, y)

    def predict(self, X):
        X_encoded = self.scaler.transform(X)
        X_encoded = self.tokenizer.encode_plus(X_encoded, return_attention_mask=True, max_length=512, padding='max_length', truncation=True)
        # NOTE: same placeholder forward pass as in fit().
        X_encoded = self.encoder.encode_plus(X_encoded, return_attention_mask=True, max_length=512, padding='max_length', truncation=True)
        X_encoded = self.fc(X_encoded)
        return np.argmax(X_encoded, axis=1)

pmhalvor added the documentation label Nov 20, 2023
