Solution delivery (#5)

* docs: add adrs
* docs: add problem background section to repo (#1)
* docs: add problem background section to repo
* fix: format 3rd party tools
* docs: solution
* docs: notion solution background migration (#4)
* docs: notion solution background migration
* Fix broken diagrams for unsupported syntax.
* fix: rename documents to follow uniform style
* docs: update readme with solution index
* fix: update README index
* docs: add resource section to project

---------

Co-authored-by: ariel.morelli <[email protected]>
Co-authored-by: Ariel Morelli <[email protected]>
Kata-Ceals · Oct 31, 2023 · 206f697 · 206f697
1 parent a223d1b
commit 206f697

Showing 65 changed files with 1,324 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+.vscode/
diff --git a/1.ProblemBackground/0.BusinessGoalsDrivers.md b/1.ProblemBackground/0.BusinessGoalsDrivers.md
@@ -0,0 +1,22 @@
+# Business Goals, Drivers
+
+# Background
+
+[Wildlife.ai](http://Wildlife.ai) is a charitable trust that uses artificial intelligence to accelerate wildlife conservation. As such, the organization promotes community events, seminars and educational activities, all with the aim of protecting biodiversity by leveraging machine learning technology.
+
+They focus on grassroots wildlife conservation projects, so two perspectives need to be considered. From the research institution perspective, the quality of the collected scientific evidence takes the highest priority. From the non-profit perspective, operational costs need to be kept to a minimum in order to sustain the activities.
+
+# Business goals
+
+The proposed architecture starts with the organization’s mission statement: “Using artificial intelligence to accelerate wildlife conservation.” To achieve that, conservation practitioners need accurate, real-time information. One way of providing it is to deploy solutions that help automate repetitive tasks. Species identification and unique animal counting are two challenges in pressing need of a solution, as they are the building blocks for testing more complex scientific hypotheses.
+
+Currently, the Wildlife Watcher camera is deployed for Wētā insect detection. The final solution aims for a modular design that can be adapted to monitor varying species. The proposed architecture should serve those goals in order to enable real-time monitoring of ecosystem health. All of that should be done through an open-source community, while educating and empowering citizen scientists of all ages to contribute to conservation causes.
+
+That is currently achieved through a camera that aims to address the monitoring automation requirements. The camera needs to run a machine learning model on the edge to ease the burden on biologists. To achieve full automation, an iterative process of machine learning model building and evaluation is required. That process includes the collection of model input samples, their annotation with target output labels, model training on the resulting dataset, and storage of the built model as a deployment unit. With such a repository of trained models, the community can decide which models belong to which areas. The models are then sent to the corresponding cameras and configured so that model performance can be evaluated. Once there is enough confidence in the built models, the solutions can be relied on to perform the automation tasks.
+
+# Business drivers
+
+1. Prove that the automated solutions can provide quality wildlife measurements, exceeding competing alternatives, at a lower price point.
+2. The ability to successfully monitor an area enables scientists to control for experiment variables.
+3. Grow the community of biologists, nature enthusiasts, data scientists and open source makers while developing a product that fits their needs.
+4. Ensure artificial intelligence is widely applied to protect biodiversity and accelerate efficient species conservation.
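
The iterative model-building process described under "Business goals" in the file above can be sketched as an ordered set of stages. This is only an illustration: the stage names and the `next_stage` helper are hypothetical, and the assumption that evaluation feeds back into sample collection is an interpretation of "iterative", not something the document specifies.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle steps, in the order the text lists them (names are hypothetical)."""
    COLLECT_SAMPLES = 1
    ANNOTATE = 2
    TRAIN = 3
    STORE_MODEL = 4
    ASSIGN_TO_AREA = 5
    DEPLOY_TO_CAMERA = 6
    EVALUATE = 7


def next_stage(stage: Stage) -> Stage:
    """Return the following step; evaluation loops back to collection (assumption)."""
    stages = list(Stage)
    return stages[(stages.index(stage) + 1) % len(stages)]


if __name__ == "__main__":
    # Walk through one full iteration of the loop.
    stage = Stage.COLLECT_SAMPLES
    for _ in range(len(Stage)):
        print(stage.name)
        stage = next_stage(stage)
```

Modelling the stages explicitly makes it easier to see where the platform, the 3rd party tools and the cameras each take part in the loop.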
diff --git a/1.ProblemBackground/1.Requirements.md b/1.ProblemBackground/1.Requirements.md
@@ -0,0 +1,45 @@
+# Requirements
+
+## Engagement Models
+
+### Users
+
+**Description:** Biologists and nature enthusiasts around the globe.
+
+Receive near-real-time reports of identified species in specific locations.
+
+Organize collected data in a uniform way across multiple projects.
+
+### Community
+
+**Description:** community members from external communities, like iNaturalist and GBIF.
+
+They can take advantage of discoveries and experiments from Wildlife users.
+
+## Architecturally significant business requirements
+
+### 1. Update camera config by an app
+
+An online system must be present in order to connect the mobile app and the cameras.
+
+### 2. External integration
+
+Integration with external tools is required, so an integration token (or any other authentication token) must be stored in the system.
+
+### 3. Store large amounts of data from users
+
+The system must store a large amount of data.
+
+Data includes:
+
+- Photos and video from cameras
+- Labelled photos and videos from 3rd party tools
+- Trained models
+
+### 4. Near-real-time reports
+
+Cameras should report to users once they identify a species.
+
+## Significant Non-Functional requirements
+
+The system must be built from scratch so it doesn’t need to integrate into an existing solution.
diff --git a/1.ProblemBackground/2.ConsiderationsAndDataCriticality.md b/1.ProblemBackground/2.ConsiderationsAndDataCriticality.md
@@ -0,0 +1,20 @@
+# Considerations and Data Criticality
+
+Overview of data criticality and project considerations.
+
+## Considerations
+
+- Cameras don’t need to stream or send photos/videos over the internet; this data is stored on an SD card and retrieved in a manual step.
+- Uploading the model over the internet may not be feasible where there is no connectivity.
+- The camera already has manual steps, like getting data from SD and charging batteries.
+- Cameras will have access to the internet, when available.
+
+## Data criticality
+
+| Data type | Criticality  | Observation |
+| --- | --- | --- |
+| Videos and photos on camera | Critical | Must always be available |
+| Videos and photos on platform | Critical | Must always be available |
+| Labelled data | Critical | Must always be available |
+| Trained models | Critical | Must always be available |
+| Notifications from cameras | Low | May be unavailable because of limited internet connectivity in the wild |
diff --git a/1.ProblemBackground/3.ActorsAndActions.md b/1.ProblemBackground/3.ActorsAndActions.md
@@ -0,0 +1,37 @@
+# Actors & Actions
+
+## Actors & Actions
+
+Identified actors of “Wildlife Watcher” and their actions.
+
+---
+
+**Actor:** User
+
+**Description:** biologists and/or nature enthusiasts
+
+**Actions:**
+
+- Register on the platform
+- Register cameras on the platform
+- Update camera configuration
+- Upload data from cameras
+- Label dataset
+- Train machine learning model based on a labelled dataset
+- Publish selected frames to iNaturalist
+- Publish occurrences to GBIF
+- Receive notifications from the camera
+
+---
+
+**Actor:** Camera
+
+**Description:** wildlife camera
+
+**Actions:**
+
+- Be configurable remotely
+- Send occurrences to the platform over the internet
+- Store photos, videos and metadata on an SD card
+
+---
diff --git a/1.ProblemBackground/4.RaidLog.md b/1.ProblemBackground/4.RaidLog.md
@@ -0,0 +1,7 @@
+# RAID log
+
+Risk and actions log.
+
+| Risks | Description | Actions |
+| --- | --- | --- |
+| Changing camera requirements | The camera is hardware, which means that the platform must continue working with both old and new camera versions, without the need to “release” new cameras | Decisions about the camera are made first, so the architecture is tied to them. |
diff --git a/1.ProblemBackground/5.AnalysesOf3rdPartyTools.md b/1.ProblemBackground/5.AnalysesOf3rdPartyTools.md
@@ -0,0 +1,59 @@
+# Analyses of 3rd party tools
+
+## Labelling platforms
+
+|                 | [Wildlife Insights](https://www.wildlifeinsights.org/)                                                   | [TrapTagger](https://wildeyeconservation.org/traptagger/)                                       | [Trapper](https://gitlab.com/trapper-project/trapper)                                                                     |
+| --------------- | -------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
+| Deployment      | SaaS                                                                                                     | SaaS                                                                                            | Self-hosted                                                                                                               |
+| Scalability     | -                                                                                                        | -                                                                                               | Python + celery                                                                                                           |
+| License         | Proprietary                                                                                              | Open-source                                                                                     | Open-source                                                                                                               |
+| Handle video    | ❌                                                                                                       | ❌                                                                                              | ✅                                                                                                                        |
+| Handle photo    | ✅                                                                                                       | ✅                                                                                              | ✅                                                                                                                        |
+| API             | ✅                                                                                                       | ✅                                                                                              | ✅                                                                                                                        |
+| Manual upload   | ✅                                                                                                       | ✅                                                                                              | ✅                                                                                                                        |
+| Data input      | Via [Google Cloud Platform (GCP)](https://www.wildlifeinsights.org/get-started/upload/bulk-data-uploads) | Via [Amazon S3](https://youtu.be/9vH9rHPnoxk?list=PLz-q4hjV3X_YfKdix0LKovQNyANuZc0-R&t=137)     | Via FTP                                                                                                                   |
+| Deployment info | -                                                                                                        | -                                                                                               | There is no Docker image ready (but there is a Dockerfile), some adjustments in the original repository are necessary [1] |
+|                 |
+| Assumptions     | - Data output is the same as data input                                                                  | - Data output is the same as data input                                                         | - Data output is the same as data input                                                                                   |
+
+[1] Needs Azure to run with cloud-based storage
+
+## Training tools
+
+|                    | [Roboflow](https://roboflow.com/)                                                                                                     | [EdgeImpulse](https://edgeimpulse.com/)                         | [TensorFlow Lite](https://www.tensorflow.org/lite)                                                                                                                                                                                                                           |
+| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Solution           | SaaS                                                                                                                                  | SaaS                                                            | Framework                                                                                                                                                                                                                                                                    |
+| Dataset management | ✅: different data integration sources [here](https://roboflow.com/integrations/data) including S3                                    | ✅ : built-in & 3rd party [1]                                   | ❌ [TFDataset](https://www.tensorflow.org/datasets): additional lib for dataset storage formatting, for management consider [DVC](https://dvc.org/), for integration with S3 and other [cloud storage formats](https://dvc.org/doc/user-guide/data-management/remote-storage) |
+| Dataset labeling   | ✅: on platform labeling [here](https://roboflow.com/annotate) as well as integrations [here](https://roboflow.com/integrations/data) | ⚖️ : Built-in for object detection in images only               | ❌ : Needs integration with 3rd parties                                                                                                                                                                                                                                      |
+| Model training     | ✅: built-in [here](https://roboflow.com/train)                                                                                       | ✅: built-in                                                    | ❌: For training models use [TensorFlow](https://www.tensorflow.org/)                                                                                                                                                                                                        |
+| Model management   | ✅: built-in versioning and re-training                                                                                               | ✅ : built-in versioning and re-training                        | ❌ : Consider [DVC Model Registry](https://dvc.org/doc/use-cases/model-registry) or [MLFlow](https://mlflow.org/)                                                                                                                                                            |
+| Model deployment   | ✅: different deployment targets ([here](https://roboflow.com/deploy) and [here](https://roboflow.com/integrations/deployment))       | ✅ (+ optimization for edge: export to different architectures) | ✅ : Convert the model into an efficient format for edge compute; supported targets can be found [here](https://www.tensorflow.org/lite/microcontrollers)                                                                                                                    |
+
+[1] [Edge Impulse](https://edgeimpulse.com/) : Dataset management
+
+- A storage bucket in the cloud (S3 compatible)
+- Organizational dataset (platform-hosted repository)
+
+[2] [Edge Impulse](https://edgeimpulse.com/) : Model deployment
+
+- Wide range of supported targets + custom integrations for Enterprise
+
+---
+
+## Publishing platforms
+
+### [iNaturalist](https://www.inaturalist.org/)
+
+- Integration can be done using API
+  - **API reference:** [https://www.inaturalist.org/pages/api+reference#auth](https://www.inaturalist.org/pages/api+reference#auth)
+  - **Protocol:** HTTPS
+  - **Authentication:** 3rd party authentication
+  - Images can be uploaded via HTTPS
+
+### [GBIF](https://www.gbif.org/)
+
+- Integration can be done using API
+  - **API reference:** [https://www.gbif.org/developer/summary](https://www.gbif.org/developer/summary)
+  - **Protocol:** HTTPS
+  - **Authentication:** basic auth (`user:password`)
+  - Images can be uploaded via HTTPS
diff --git a/1.ProblemBackground/6.Glossary.md b/1.ProblemBackground/6.Glossary.md
@@ -0,0 +1,10 @@
+# Glossary
+
+- **User:** the platform user
+- **Camera:** wildlife camera
+- **Dataset:** a collection of photos and/or videos, grouped or not.
+- **Labelled dataset:** dataset with labelled photos and/or videos.
+- **Frame:** picture extracted from a video
+- **Occurrences:** depends on the context. In general, an occurrence is an identified animal/species.
+- **GCP:** Google Cloud Platform
+- **ML:** machine learning
diff --git a/1.ProblemBackground/README.md b/1.ProblemBackground/README.md
@@ -0,0 +1,11 @@
+# Problem background
+
+Deep analysis of the problem and requirements.
+
+- [Business Goals, Drivers](./0.BusinessGoalsDrivers.md)
+- [Requirements](./1.Requirements.md)
+- [Considerations & Data Criticality](./2.ConsiderationsAndDataCriticality.md)
+- [Actors & Actions](./3.ActorsAndActions.md)
+- [RAID Log](./4.RaidLog.md)
+- [Analysis of 3rd Party Tools](./5.AnalysesOf3rdPartyTools.md)
+- [Glossary](./6.Glossary.md)
diff --git a/2.SolutionBackground/0.Vision.md b/2.SolutionBackground/0.Vision.md
@@ -0,0 +1,10 @@
+# Vision
+
+The solution consists of an online platform that can:
+
+- Centralize all data: photos, videos, identified animals and machine-learning models
+- Get camera info (e.g. battery and storage level) and adjust camera settings remotely
+- Label data and train models with the help of external tools
+- Ask on iNaturalist for help identifying species
+- Publish species occurrences to GBIF
+- Receive notifications from cameras
diff --git a/2.SolutionBackground/1.ArchitecturePrinciples.md b/2.SolutionBackground/1.ArchitecturePrinciples.md
@@ -0,0 +1,17 @@
+# Architecture principles
+
+Principles that should be applied in the overall architecture of the project.
+
+| Principle              | Reason                                                             | Outcomes                                                            |
+| ---------------------- | ------------------------------------------------------------------ | ------------------------------------------------------------------- |
+| Availability           | - Cameras can send data at any time                                | - System always up and running                                      |
+|                        | - Users from around the globe (access at any time)                 |                                                                     |
+| Reliability            | - Multiple users                                                   | - Users will have access only to their resources                    |
+| Scalability/Elasticity | - Multiple users in parallel                                       | - System can handle multiple data at the same time                  |
+|                        | - Multiple cameras sending data in parallel                        | - Fewer resources are used when traffic is low                      |
+| Security               | - Password and integration tokens will be stored in the platform   | - Secured system                                                    |
+| Extendability          | - New functionality and integrations can be present in the future  | - Enable future improvements in the system                          |
+| Modularity             | - Services and components must be modular                          | - Improvement/changes can be done without touching the whole system |
+| Maintainability        | - Support is done by volunteers with different levels of expertise | - System will                                                       |
+| Responsiveness         | - Notifications must reach users near real-time                    | - Users will be informed faster                                     |
+| Data integrity         | - Data will be erased from the SD card once it is uploaded         | - Data will be present on the platform once it is uploaded          |