For more discussion, please see kohonen.ipynb
cd app/
uvicorn main:app
(default port is 8000)
- See `app/test.py`. It has sample tests for running the network over the following configurations:
- 10X10 for 100N
- 10X10 for 200N
- 10X10 for 500N
- 100X100 for 1000N
docker build . -t kohonen
(assuming your current directory is this project's directory)
docker run -d --name kohonen -p 8000:80 kohonen:latest
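Once the server is up (either through `uvicorn` or through the container, both reachable on port 8000 here), the endpoints can be called over HTTP. The following is a minimal sketch only: the helper names `endpoint_url` and `list_models` are hypothetical, and it assumes that `/list-of-models/` answers a plain GET with a JSON body, which this README does not confirm. It uses the standard library to stay dependency-free; `requests` would work just as well.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the server (or container) is running locally on port 8000.
BASE_URL = "http://localhost:8000"


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path such as '/list-of-models/'."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def list_models(base_url=BASE_URL, timeout=10):
    """Fetch the list of saved models.

    Assumption: the endpoint accepts GET and returns JSON.
    """
    url = endpoint_url("/list-of-models/", base_url)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `list_models()` would return whatever the endpoint responds with. The other endpoints take body parameters defined in `api_params`, so their request shapes should be taken from there.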
- `numpy` is used as the primary library to hold and manipulate data.
- `fastAPI` is used to package the application and expose the `/train/`, `/atrain/`, `/list-of-models/`, `/download/` and `/predict/` endpoints.
- `fastAPI` was chosen for its high performance and because the server is auto-tuned to the number of CPU cores to handle high request load.
- The solution is packaged as a production-ready server application and containerised. The solution files reside under the `app/` folder:
  - `kohonen.py`: contains `Kohonen`, which implements the algorithm.
  - `main.py`: acts as the entry point for `fastAPI`, where all REST API requests arrive.
  - `settings.py`: contains settings, constant values etc. used in the project.
  - `utils/utils.py`: contains helper methods used in the project.
  - `saved_models/`: folder which (can) contain saved models (weights) after training. Ideally, models/weights would be saved on blob storage for better scalability.
    - There are already saved grid/model files (.npy) under the following names/configurations:
      - 10X10 100N
      - 10X10 200N
      - 10X10 500N
      - 100X100 1000N
  - `exceptions.py`: implements the exceptions raised by the Kohonen algorithm.
  - `logs/logging.log`: contains logs captured throughout the application.
  - `api_params`: contains the classes responsible for parsing `fastAPI` requests (body parameters).
  - `app/test.py`: use this file to trigger tests. Uncomment lines to run them from the command line with `python test.py`.
  - `saved_plots/`: folder for plot images (used in test.py) saved for analysis/code-testing purposes.
  - `saved_train_inputs/`: folder for training data (used in test.py) saved as .npy files for analysis/code-testing purposes.
  - `configs/`: folder containing the Python package requirements files.
    - `prod.requirements.text` is used to install requirements when packaging and containerising.
    - `requirements.text` is used to create a local environment for running and testing the application via the command line, Jupyter Lab etc.
      - (Core Libs)
        ! pip install numpy==1.20.1
        ! pip install matplotlib==3.3.4
      - (Packaging/Production Libs)
        ! pip install fastapi==0.63.0
        ! pip install aiofiles==0.6.0
        ! pip install uvicorn==0.13.4
      - (For making HTTP requests to the running application)
        ! pip install requests==2.25.1
- `Dockerfile`: packages this project as a REST API.
  - This is based on the `tiangolo/uvicorn-gunicorn-fastapi:latest` image, which has `python==3.8.6` preinstalled.
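The saved grids under `saved_models/` are `.npy` files, so they can be written and read back with NumPy's own `np.save` and `np.load`. A minimal sketch, assuming a 10X10 grid with 3-dimensional weight vectors; the grid shape, the file name and the temporary directory are illustrative only and are not taken from the project:

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical grid: 10 rows, 10 columns, 3 weights per node.
grid = rng.random((10, 10, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the project uses its own naming scheme.
    path = os.path.join(tmp_dir, "example_grid.npy")
    np.save(path, grid)        # write the weights to disk
    restored = np.load(path)   # read them back as an ndarray

print(restored.shape)
```

The round trip preserves shape and values exactly, which is what makes `.npy` a convenient format for reloading a trained grid before calling a prediction step.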
To make the Kohonen network more efficient, I first tried Python's built-in multiprocessing. However, as the results show (see kohonen.ipynb), the larger model still took more than an hour, i.e. not 'that' efficient.
I revisited the codebase and started attacking its most computationally expensive method, i.e. the (Euclidean) distance calculation in utils.py. After learning more about vectorisation, I managed to convert the method to a vectorised form (the one you now see in the codebase). The result was nothing less than joy! :D
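To illustrate the idea, here is a small sketch that contrasts a loop-based best-matching-unit search with a vectorised one. It is not the code from utils.py: the function names `bmu_loop` and `bmu_vectorised`, the grid shape and the 3-dimensional input are assumptions made for the example.

```python
import numpy as np


def bmu_loop(weights, x):
    """Find the best matching unit with explicit Python loops (slow baseline)."""
    rows, cols, _ = weights.shape
    best, best_dist = (0, 0), float("inf")
    for i in range(rows):
        for j in range(cols):
            # Euclidean distance between one node's weights and the input.
            d = np.sqrt(np.sum((weights[i, j] - x) ** 2))
            if d < best_dist:
                best, best_dist = (i, j), d
    return best


def bmu_vectorised(weights, x):
    """Same search, with all node distances computed in one NumPy call."""
    # Broadcasting subtracts x from every node's weight vector at once.
    dists = np.linalg.norm(weights - x, axis=-1)  # shape: (rows, cols)
    row, col = np.unravel_index(np.argmin(dists), dists.shape)
    return int(row), int(col)


rng = np.random.default_rng(0)
weights = rng.random((10, 10, 3))  # hypothetical 10X10 grid, 3 weights per node
x = rng.random(3)                  # hypothetical input vector

print(bmu_loop(weights, x), bmu_vectorised(weights, x))
```

Both functions return the same grid coordinates; the vectorised version simply moves the per-node arithmetic out of the Python interpreter and into NumPy, which is where the speed-up on large grids comes from.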