Home
Welcome to the DS Repo for Story Squad. It can be a little overwhelming for new cohorts, so here are the basics you need. The recommendations in this Basics section were written for future cohorts on 12/17/21.
- First, here is a link to our DS Video Archive. It has PowerPoints and videos walking through everything from cloning this repo to your computer to building your own custom OCR model from scratch (which you don’t have to do, thanks to Docker!).
- We have a Dockerfile set up, and instructions for using it can be found in the `custom_tesseract_training` folder. That README walks you through connecting to Docker, how and where our data is stored, and how to tune and save a new model. Improving our OCR model is one of our main tasks on the Data Science side of the project, so future cohorts should focus their attention here.
- In the `app` folder, you will find all of the files and endpoints related to the API and how they interact with each other. The folder README gives a visual of the story submission workflow.
- The data we currently have is stored in the `data` folder. In the future we want to move these samples to S3 cloud storage buckets so that they aren’t saved on GitHub. We’ve also been looking into data augmentation techniques to generate more training data.
- The README of the `data management` folder walks through the data cleaning pipeline, which prepares a story submission for use by our model. Currently, this is done manually. An important task for future cohorts will be to automate this process.
- The `notebooks` folder covers a range of topics, including clustering methods, creation of the crop cloud, endpoint relationships, and more. These notebooks can give you background on how the crop cloud and other features were created. If you are working on a notebook that will be valuable to others, you can add it here.
As new cohorts update the repo and add new features, be sure to record your progress and changes here so that onboarding keeps getting easier.
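The note above about data augmentation does not say which techniques were explored. As a rough illustration only, the sketch below shows two common ones for handwriting images: a small random shift and light salt-and-pepper noise. It assumes an image is a plain list of rows of grayscale values (0-255), and the function names `shift_image`, `add_salt_and_pepper`, and `augment` are hypothetical, not part of this repo.

```python
import random

# Hypothetical helpers for illustration; not part of the Story Squad repo.
# An image is a list of rows, each row a list of grayscale values (0-255).


def shift_image(image, dx, dy, fill=255):
    """Translate the image by (dx, dy) pixels, padding exposed areas with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            # Pixels that would land outside the frame are dropped.
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def add_salt_and_pepper(image, amount, rng):
    """Set roughly `amount` (0.0-1.0) of the pixels to pure black or white."""
    noisy = [row[:] for row in image]  # copy so the input is left untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))
    return noisy


def augment(image, copies, seed=0):
    """Return `copies` randomly shifted, lightly noised variants of `image`."""
    rng = random.Random(seed)  # seeded so results are reproducible
    variants = []
    for _ in range(copies):
        dx, dy = rng.randint(-2, 2), rng.randint(-2, 2)
        shifted = shift_image(image, dx, dy)
        variants.append(add_salt_and_pepper(shifted, 0.02, rng))
    return variants
```

In practice an image library would likely replace the nested lists, but the idea is the same: each labeled sample yields several slightly different training images that share the original transcription.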