Niger Little-Poole edited this page Mar 2, 2022 · 10 revisions

Basics

Welcome to the DS repo for Story Squad. The repo can be a little overwhelming for new cohorts, so here are the basics you need. The recommendations in this Basics section were written for future cohorts on 12/17/21.

  • First, here is a link to our DS Video Archive. It has PowerPoint slides and videos walking through everything from cloning this repo to your computer to building your own custom OCR model from scratch (which you don’t have to do, thanks to Docker!).

  • We have a Dockerfile set up, and instructions for using it can be found in the custom_tesseract_training folder. The README in that folder walks you through connecting to Docker, how and where our data is stored, and how to tune and save a new model. Improving our OCR model is one of our main tasks on the Data Science side of the project, so future cohorts should focus their attention here.

  • In the app folder, you will find all of the files and endpoints related to the API and how they interact with each other. The folder README gives a visual of the story submission workflow.

  • Our current data is stored in the data folder. In the future we want to move these samples to S3 cloud storage buckets, so that they aren’t saved on GitHub. We’ve also been looking into data augmentation techniques to generate more training data.

  • The README of the data management folder walks through the data cleaning pipeline, which prepares a story submission for use by our model. Currently, this is done manually. An important task for future cohorts will be to automate this process.

  • The notebooks folder covers a range of topics, including clustering methods, creation of the crop cloud, endpoint relationships, and more. These can give you background on how the crop cloud and other features were created. If you are working on a notebook that will be valuable for others, you can add it here.

  • As new cohorts update the repo and add new features, be sure to record your progress here to give the next cohort a better onboarding experience.
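
For the planned move of the data folder to S3 mentioned above, a first step could be to decide which object key each local sample would get. The sketch below is only an illustration using the Python standard library: the function name `plan_s3_keys` and the `story-samples` prefix are hypothetical and do not exist in the repo, and nothing is uploaded.

```python
from pathlib import Path, PurePosixPath


def plan_s3_keys(data_dir, prefix="story-samples"):
    """Map every file under data_dir to a proposed S3 object key.

    Both the function name and the default prefix are hypothetical;
    adjust them to whatever bucket layout the team agrees on.
    """
    root = Path(data_dir)
    plan = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the folder structure, but always use "/" separators,
            # since S3 keys are plain strings rather than OS paths.
            relative = path.relative_to(root)
            plan[path] = str(PurePosixPath(prefix, *relative.parts))
    return plan
```

Each entry in the returned dictionary pairs a local file with the key it would be stored under. The upload itself could then be done with a client such as boto3, whose S3 client provides `upload_file(Filename, Bucket, Key)`; the bucket name and credentials would need to come from the team's AWS setup.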
