Pretrained classifier #1
Comments
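Simple classifications:
import numpy as np
import torch
from scipy.io import wavfile as wav
from torch.nn import CrossEntropyLoss
from transformers import pipeline

pipe = pipeline("audio-classification", "mtg/maest...")
file_paths = np.load("../data/file_paths.npy")
# discogs_labels: the model's "Genre---Style" label strings (defined elsewhere)
# sorted() keeps the genre order stable between runs
dlabel = sorted(set(row.split("---")[0] for row in discogs_labels))
glabel = set(np.load("labels.npy"))
dlabel_to_idx = {
    label: idx
    for (idx, label) in enumerate(dlabel)
}
glabel_to_dlabel = {
    # manually map
}
_, audio = wav.read(file_paths[0])
outputs = pipe(audio)
# [{"score":0.123, "label": "Electronic---Noise"}, ...]
predictions = np.zeros(len(dlabel_to_idx))
for output in outputs:
    # dlabel_to_idx is keyed by top-level genre, so drop the style suffix
    idx = dlabel_to_idx[output["label"].split("---")[0]]
    # keep the highest score seen for each genre
    if output["score"] > predictions[idx]:
        predictions[idx] = output["score"]
# the loss takes no parameters or learning rate; those belong to an optimizer
# (Adam is only an example choice here)
criterion = CrossEntropyLoss()
optimizer = torch.optim.Adam(pipe.model.parameters(), lr=0.001)
# y: tensor holding the target genre index (defined elsewhere)
# the pipeline returns plain floats, so this loss cannot be backpropagated as-is
loss = criterion(torch.from_numpy(predictions).unsqueeze(0), y)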
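The aggregation step in the snippet above (collapsing `Genre---Style` scores to one score per top-level genre, then mapping back to GTZAN) can be tried without loading the model. This is only a minimal sketch under assumptions: `genre_scores`, `predict_glabel`, and the entries in `GLABEL_TO_DLABEL` are hypothetical placeholders rather than the actual mapping, and the sample input just mimics the list-of-dicts shape shown in the comment above.

```python
from typing import Dict, List, Optional

# Hypothetical mapping for illustration only; the real one has to be written by hand.
GLABEL_TO_DLABEL: Dict[str, str] = {
    "rock": "Rock",
    "jazz": "Jazz",
    "hiphop": "Hip Hop",
}


def genre_scores(outputs: List[dict]) -> Dict[str, float]:
    """Collapse "Genre---Style" scores to the best score per top-level genre."""
    scores: Dict[str, float] = {}
    for output in outputs:
        genre = output["label"].split("---")[0]
        scores[genre] = max(scores.get(genre, 0.0), output["score"])
    return scores


def predict_glabel(outputs: List[dict], glabel_to_dlabel: Dict[str, str]) -> Optional[str]:
    """Return the GTZAN label whose mapped Discogs genre has the highest score."""
    scores = genre_scores(outputs)
    candidates = {g: scores.get(d, 0.0) for g, d in glabel_to_dlabel.items()}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Made-up scores in the same shape as the pipeline output.
    sample = [
        {"score": 0.40, "label": "Electronic---Noise"},
        {"score": 0.35, "label": "Rock---Punk"},
        {"score": 0.15, "label": "Rock---Indie Rock"},
        {"score": 0.10, "label": "Jazz---Bop"},
    ]
    print(genre_scores(sample))
    print(predict_glabel(sample, GLABEL_TO_DLABEL))
```

Taking the maximum per genre mirrors the loop above; summing the style scores instead would be another reasonable choice and might behave differently when a genre has many styles.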
After some experimenting with the above, I think it may be best to wrap the training in a model class. That way we can easily handle the data transformation steps needed to convert GTZAN labels to Discogs label outputs, which will let us compare this model against the others we tested. It's recommended to freeze the pretrained layers and only update the new layers during backpropagation, but I'll have to experiment a bit and see what gives the best results. The code will look something like this (though not exactly, because the example below uses a Pipeline base):
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, Wav2Vec2Tokenizer

class AudioClassificationPipeline(Pipeline):
    def __init__(self, model_name='wav2vec2-base-960'):
        # rough sketch: sklearn's Pipeline expects a list of steps here
        super().__init__()
        self.scaler = StandardScaler()
        self.tokenizer = Wav2Vec2Tokenizer.from_pretrained(model_name)
        # for audio input, transformers also provides AutoModelForAudioClassification
        self.encoder = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=8)
        # LinearLayer is a placeholder for the new, trainable classification head
        self.fc = LinearLayer(8, 8, activation='softmax')

    def fit(self, X, y):
        # intended flow: scale -> tokenize -> encode -> classify
        X_encoded = self.scaler.fit_transform(X)
        X_encoded = self.tokenizer.encode_plus(X_encoded, return_attention_mask=True, max_length=512, padding='max_length', truncation=True)
        # placeholder call: the model itself has no encode_plus method
        X_encoded = self.encoder.encode_plus(X_encoded, return_attention_mask=True, max_length=512, padding='max_length', truncation=True)
        X_encoded = self.fc(X_encoded)
        return super().fit(X_encoded, y)

    def predict(self, X):
        # same flow as fit, but reusing the already-fitted scaler
        X_encoded = self.scaler.transform(X)
        X_encoded = self.tokenizer.encode_plus(X_encoded, return_attention_mask=True, max_length=512, padding='max_length', truncation=True)
        X_encoded = self.encoder.encode_plus(X_encoded, return_attention_mask=True, max_length=512, padding='max_length', truncation=True)
        X_encoded = self.fc(X_encoded)
        # index of the highest-scoring class for each sample
        return np.argmax(X_encoded, axis=1)
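The freezing strategy mentioned above (pretrained layers frozen, only the new layers updated) might look roughly like this in PyTorch. It is a minimal sketch under assumptions: `FrozenBackboneClassifier`, `train_step`, and `embed_dim` are hypothetical names, and the `backbone` in the demo is a small stand-in module rather than the actual pretrained model.

```python
import torch
from torch import nn


class FrozenBackboneClassifier(nn.Module):
    """Frozen feature extractor followed by a small trainable head."""

    def __init__(self, backbone: nn.Module, embed_dim: int, num_labels: int):
        super().__init__()
        self.backbone = backbone
        # Freeze the pretrained weights so backpropagation leaves them untouched.
        for param in self.backbone.parameters():
            param.requires_grad = False
        # New layer: the only part that receives gradient updates.
        self.head = nn.Linear(embed_dim, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # no graph is needed for the frozen part
            features = self.backbone(x)
        return self.head(features)  # raw logits, as CrossEntropyLoss expects


def train_step(model, optimizer, criterion, x, y) -> float:
    """Run one optimisation step and return the loss value."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in for the pretrained encoder (assumed to map inputs to a fixed-size embedding).
    backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
    model = FrozenBackboneClassifier(backbone, embed_dim=32, num_labels=8)
    # Hand only the trainable parameters to the optimizer.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=0.001)
    x = torch.randn(4, 16)          # fake batch of 4 feature vectors
    y = torch.tensor([0, 3, 5, 7])  # fake genre indices
    print(train_step(model, optimizer, nn.CrossEntropyLoss(), x, y))
```

Unfreezing some of the later backbone layers afterwards would be the natural next thing to try if the frozen version underfits.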
To round off the classifier experiments, we want to test a pretrained music classifier and compare its architecture against our simpler models.
Steps