This is the repository for the networks that generate audio.
- main.ipynb: This is the go-to file, containing the model, the audio dataset loader and the audio generation. Naturally, the IPython notebook makes it easy to explore the data.
- main_large_dataset.ipynb: This example trains on multiple samples for much longer. Use this once you have main.ipynb working, and experiment with the parameters to achieve different results.
- old_main.py: This is an older file containing the initial vanilla TensorFlow code. It creates the model via the methods from model.py. Higher-level TensorFlow wrappers such as tflearn and slim tended to deliver better results, so we migrated to those in main.ipynb. This file demonstrates much of the usefulness of the AudioDatasetGenerator when running a TensorFlow session, via the get_next_batch and is_new_epoch functions (see the sketch after this list).
- old_model.py: This contained the vanilla TensorFlow code to replicate Sam's Keras model. As mentioned above, higher-level wrappers provided better results, so we moved to those.
- old_phase_gen.py: This is the original method of phase reconstruction. We now use Griffin-Lim, which can be found in audio_dataset_generator.py under the method of the same name; a sketch of the algorithm also follows this list. There is also a phase reconstruction network in progress in the phase_reconstruction.ipynb file, although at the time of writing it achieves poor results.
- assets: This folder contains the audio for training. The current code in main.ipynb instantiates an AudioDatasetGenerator object and passes it the path to the assets folder so it can create the necessary sequences of FFT magnitudes.
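For reference, here is a minimal sketch of that old_main.py training pattern. The exact signatures of get_next_batch and is_new_epoch are assumptions (check audio_dataset_generator.py for the real ones), and the one-layer model is just a stand-in for the real one built in old_model.py:

import tensorflow as tf
from audio_dataset_generator import AudioDatasetGenerator

fft_size, window_size, hop_size = 2048, 1024, 512
sequence_length, sample_rate = 16, 22050
n_bins = fft_size // 2 + 1

dataset = AudioDatasetGenerator(fft_size, window_size, hop_size, sequence_length, sample_rate)
dataset.load("assets", False)

# A sequence of fft magnitude frames in, the next frame out.
x = tf.placeholder(tf.float32, [None, sequence_length, n_bins])
y = tf.placeholder(tf.float32, [None, n_bins])

# Stand-in model: flatten the sequence and regress the next frame.
prediction = tf.layers.dense(tf.reshape(x, [-1, sequence_length * n_bins]), n_bins)
loss = tf.reduce_mean(tf.square(prediction - y))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(10):
        while True:
            batch_x, batch_y = dataset.get_next_batch()  # assumed to return (inputs, targets)
            sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
            if dataset.is_new_epoch():                   # assumed to flag the wrap-around
                break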
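And a minimal sketch of the iterative Griffin-Lim idea itself, written here against librosa's STFT rather than the repo's own implementation in audio_dataset_generator.py:

import numpy as np
import librosa

def griffin_lim(magnitudes, n_fft=2048, hop_length=512, iterations=100):
    # Start from random phase, then repeatedly project between the time
    # domain and the known magnitude spectrogram until the phase settles.
    angles = np.exp(2j * np.pi * np.random.rand(*magnitudes.shape))
    for _ in range(iterations):
        audio = librosa.istft(magnitudes * angles, hop_length=hop_length)
        rebuilt = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
        angles = np.exp(1j * np.angle(rebuilt))
    return librosa.istft(magnitudes * angles, hop_length=hop_length)

Here magnitudes is a (1 + n_fft / 2, frames) array of fft magnitudes, such as the frames the model predicts.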
To access b0rk via SSH:
ssh [email protected]
ssh [email protected]
Check the running Docker containers:
sudo docker ps
If the container is running, copy the container ID and run the Jupyter notebook:
sudo docker exec -it <container id> /run_jupyter.sh
Otherwise, start a new container from the saved Docker image, forwarding the Jupyter notebook port to 8882, then run the notebook:
sudo nvidia-docker run -it -p 8882:8888 golb0rk/lstmsynth /run_jupyter.sh
You can use any port numbers; if it complains, try a different port. I find it easiest to keep a tunnel open (with -f) from igor into b0rk. Then all I have to do is tunnel from my local machine into igor.
SSH tunnel from igor to b0rk (whilst SSHed into igor):
ssh -N -f -L localhost:8883:localhost:8882 [email protected]
Then tunnel from your local machine to igor:
ssh -N -L localhost:8884:localhost:8883 [email protected]
Now localhost:8884 should work in your browser!
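Alternatively, assuming a recent OpenSSH (7.3+), both hops can be done in one command with -J (ProxyJump) from your local machine:
ssh -N -L localhost:8884:localhost:8882 -J [email protected] [email protected]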
Detach from the Docker container with Ctrl+p followed by Ctrl+q (exit or Ctrl+e will halt the container).
Then, from outside the container (on eavi):
sudo docker commit <container id> golb0rk/lstmsynth
To run this locally, it is assumed that Python 3.5, TensorFlow 1.1+ and the latest version of TFLearn are all installed on your system.
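If you still need to set those up, something along these lines should work (the pinned versions are illustrative; adjust to your setup):
pip install tensorflow==1.1.0 tflearn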
Generally the model found in the main.ipynb file is the best place to start hacking around. Have at it!
You can upload your own training data through the Jupyter file browser.
Locate the audio data path set in the main notebook. In this example it has been set to assets/test_samples:
audio_data_path = "assets/test_samples"
Upload a .wav file and confirm the upload.
When using a new set of samples, make sure force_new_dataset is set to True.
When running a new test, you'll need to restart the kernel in the main notebook and re-run all the cells. Tip: in the Jupyter toolbar, go to Cell -> Run All.
There are a few things to note when building networks.
Audio data can be managed using the AudioDatasetGenerator. Care should be taken to remove any existing .npy files from the folder where the target wavs are stored, as the class will attempt to load those first. To force a new dataset, pass True as the force_new_dataset argument of the load method.
dataset = AudioDatasetGenerator(fft_size, window_size, hop_size, sequence_length, sample_rate)
dataset.load(audio_data_path, force_new_dataset)
The above code is fairly trivial; the one parameter that might be unclear is sequence_length. It sets how many frames of FFT magnitudes the model is given for every prediction of the next frame of FFT magnitudes, as the sketch below illustrates.
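As an illustration (the array shapes here are hypothetical), with sequence_length = 4 the inputs and targets pair up like this:

import numpy as np

frames = np.random.rand(100, 1025)  # hypothetical: 100 frames of fft magnitudes
sequence_length = 4

# Each input is a run of sequence_length frames; each target is the frame after it.
x = np.array([frames[i:i + sequence_length]
              for i in range(len(frames) - sequence_length)])
y = frames[sequence_length:]
print(x.shape, y.shape)  # (96, 4, 1025) (96, 1025)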
Remaining TODOs:
- Implement fast Griffin-Lim / implement a good phase reconstruction network.
- Get working instructions for running b0rk's Jupyter notebook locally.
- Modify the current instructions to allow multiple users to modify the same image - currently all IPs are used for the Jupyter forwarding.
- Perhaps try something like seq2seq for the generation of frames.
- Add a variable-layer MLP / highway layers at the end for easier experimentation.
- Experiment with deconvolutions at the end?