Skip to content

Latest commit

 

History

History
120 lines (91 loc) · 3.66 KB

README.md

File metadata and controls

120 lines (91 loc) · 3.66 KB

ML Datasets 🫙

Event datasets used for training machine learning models.

Data Sources

For each supported language, learning events and assessment events are continuously uploaded from Android devices to the webapp database.

And those datasets are then downloaded to this repository in a daily cron job.

flowchart
    subgraph device [Android device]
        kukariri -- AssessmentEvents --> analytics@{ shape: cyl }
    
        subgraph sg_lit [Literacy apps]
        herufi
        vitabu
        etc_lit[...]
        end
        sg_lit -- LearningEvents --> analytics
    
        subgraph sg_num [Numeracy apps]
        nyas-space-quest
        nambari
        etc_num[...]
        end
        sg_num -- LearningEvents --> analytics
    end

    subgraph rest_api_hin [hin.elimu.ai]
        analytics-->webapp_hin[webapp]@{ shape: cyl }
    end
    webapp_hin-->ml-datasets

    subgraph rest_api_tgl [tgl.elimu.ai]
        webapp_tgl[webapp]@{ shape: cyl }
    end
    webapp_tgl-->ml-datasets

    subgraph rest_api_tha [tha.elimu.ai]
        webapp_tha[webapp]@{ shape: cyl }
    end
    webapp_tha-->ml-datasets

    subgraph rest_api_vie [vie.elimu.ai]
        webapp_vie[webapp]@{ shape: cyl }
    end
    webapp_vie-->ml-datasets

    click kukariri "https://github.com/elimu-ai/kukariri"
    click herufi "https://github.com/elimu-ai/herufi"
    click vitabu "https://github.com/elimu-ai/vitabu"
    click nyas-space-quest "https://github.com/elimu-ai/nyas-space-quest"
    click nambari "https://github.com/elimu-ai/nambari"
    click analytics "https://github.com/elimu-ai/analytics"
    click webapp_hin "https://github.com/elimu-ai/webapp"
    click webapp_tgl "https://github.com/elimu-ai/webapp"
    click webapp_tha "https://github.com/elimu-ai/webapp"
    click webapp_vie "https://github.com/elimu-ai/webapp"
Loading

Machine Learning Operations (MLOps)

When machine learning models are being trained with datasets collected from the elimu.ai Android apps, they should be fetching the data from this repository.

Note

This repository only contains learning events and assessment events. If you are also looking for content datasets, see webapp-lfs .

Daily Updates

You can expect the datasets in this repository to be updated once per day.

Tip

Since datasets in this repository are continuously updated, you should also design your machine learning code to continulously train new versions of your model (e.g. once per night).

Code Usage

Prerequisites:

Dependencies

Install the Python dependencies:

pip install -r requirements.txt

Run

Download datasets:

python download_datasets.py

elimu.ai - Free open-source learning software for out-of-school children 🚀✨

Website 🌐  •  Wiki 📃  •  Projects 👩🏽‍💻  •  Milestones 🎯  •  Community 👋🏽  •  Support 💜