Technologies used: Python, Selenium, Apache Airflow, Hadoop Distributed File System, Metabase
Smart farming combines Information and Communication Technologies (ICT) with traditional agricultural practices to improve the quality and quantity of agricultural products. These ICTs include Unmanned Aerial Vehicles (UAVs) or drones, artificial intelligence, robots, and sensors. Smart Farming systems need various data types, such as food prices, sensor readings, weather, images, and video. The data can be structured, semi-structured, or unstructured. Therefore, a versatile system that can integrate Smart Farming data is needed to support various types of analysis. In this study, the authors propose a Smart Farming Data Lake System based on Apache Airflow as the data integration automation technology, the Hadoop Distributed File System (HDFS) as the data storage technology, and Metabase as the dashboard technology. See the Documentation for the full details of the paper.
Below is the architectural design for this final project.
- Python 3
- Java 11
- Selenium 4.9.1
- Pandas 2.0.1
- Hadoop 3.2.1
- Apache Airflow 2.6.1
- Metabase 0.43.1
- Set up the Hadoop environment for HDFS (as a data lake)
- Set up Apache Airflow, used as the orchestrator
- Set up the Metabase environment for Business Intelligence, used to display dashboards of metrics and charts built on the Airflow data.
- Create the DAG file for the web scraper that collects weather info from the BMKG website and food prices from the PIHPS website (a sketch of such a DAG follows this list).
- Create the DAG file that sends commands to a Raspberry Pi (running a Linux-based operating system) to capture an image and record a video.
- Create the DAG file to retrieve the sensor data from InfluxDB through its API.
- Run each DAG on Apache Airflow
- Review the results in Apache Airflow and ensure the data is complete in HDFS
- Review the Apache Airflow performance using Metabase Business Intelligence
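
For illustration, here is a minimal sketch of what a web-scraper DAG file could look like with Airflow 2.6 and Selenium 4.9. This is not the project's actual DAG: the URL, CSS selectors, schedule, and HDFS path are placeholder assumptions, and writing to an `hdfs://` path requires an fsspec/pyarrow HDFS filesystem to be configured on the worker.

```python
# dags/web_scraper_dag.py  (illustrative sketch, not the project's actual DAG)
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from selenium import webdriver
from selenium.webdriver.common.by import By


def scrape_weather():
    """Scrape a BMKG forecast table and land it in HDFS as CSV."""
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no display on the Airflow worker
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://www.bmkg.go.id")  # placeholder; the real DAG targets the forecast page
        rows = driver.find_elements(By.CSS_SELECTOR, "table tr")  # placeholder selector
        records = [
            [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
            for row in rows
        ]
        df = pd.DataFrame(records)
        # Assumes an fsspec/pyarrow HDFS filesystem is configured; path is a placeholder.
        df.to_csv(
            f"hdfs://namenode:9000/smart_farming/weather/{datetime.utcnow():%Y%m%d}.csv",
            index=False,
        )
    finally:
        driver.quit()


with DAG(
    dag_id="Web_Scraper_DAG",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",  # one forecast pull per day
    catchup=False,
) as dag:
    PythonOperator(task_id="scrape_weather", python_callable=scrape_weather)
```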
- Web_Scraper_DAG: Extracts the data automatically from the websites. It consists of weather forecast information for every region each day, split into 6-hour intervals, and food price information for every commodity with the past 5 days of prices from traditional and modern markets.
- Sensor_retrieve_DAG: Extracts the data from the InfluxDB database automatically through its API. The sensor data includes battery, nitrogen, potassium, soil pH, and more, generated every 5 minutes (sketched below).
- Camera_data_capture: Sends commands to the Raspberry Pi automatically over an SSH connection to capture an image and record a video (sketched below).
- SQL Query for Metabase: The queries I created to analyze the Airflow metadata database, such as the average execution time for each DAG, the number of successful or failed tasks, and how often each DAG crashed in a week (prototyped below).
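
A minimal sketch of how the sensor pull in Sensor_retrieve_DAG could work, assuming an InfluxDB 1.x-style `/query` endpoint (an InfluxDB 2.x deployment would use `/api/v2/query` with Flux or the influxdb-client library instead). The host, database, measurement name, and HDFS path are placeholders rather than the project's real values.

```python
# dags/sensor_retrieve_dag.py  (illustrative sketch)
from datetime import datetime

import pandas as pd
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

INFLUXDB_URL = "http://influxdb-host:8086/query"  # assumed host and port
DATABASE = "smart_farming"                        # assumed database name


def retrieve_sensor_data():
    """Query the last 5 minutes of sensor readings and land them in HDFS."""
    params = {
        "db": DATABASE,
        "q": "SELECT * FROM sensor_readings WHERE time > now() - 5m",  # assumed measurement
    }
    response = requests.get(INFLUXDB_URL, params=params, timeout=30)
    response.raise_for_status()
    series = response.json()["results"][0].get("series", [])
    if not series:
        return  # no new readings in this window
    df = pd.DataFrame(series[0]["values"], columns=series[0]["columns"])
    # Assumes an fsspec/pyarrow HDFS filesystem is configured; path is a placeholder.
    df.to_csv(
        f"hdfs://namenode:9000/smart_farming/sensor/{datetime.utcnow():%Y%m%dT%H%M}.csv",
        index=False,
    )


with DAG(
    dag_id="Sensor_retrieve_DAG",
    start_date=datetime(2023, 1, 1),
    schedule="*/5 * * * *",  # matches the 5-minute sensor cadence
    catchup=False,
) as dag:
    PythonOperator(task_id="retrieve_sensor_data", python_callable=retrieve_sensor_data)
```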
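
A minimal sketch of the Camera_data_capture idea using the SSHOperator from the Airflow SSH provider. The connection id, remote paths, and camera commands (raspistill/raspivid here) are assumptions; newer Raspberry Pi OS releases would use libcamera-still/libcamera-vid instead.

```python
# dags/camera_data_capture_dag.py  (illustrative sketch)
from datetime import datetime

from airflow import DAG
from airflow.providers.ssh.operators.ssh import SSHOperator

with DAG(
    dag_id="Camera_data_capture",
    start_date=datetime(2023, 1, 1),
    schedule="@hourly",  # assumed capture cadence
    catchup=False,
) as dag:
    # "raspberry_pi" is an assumed Airflow connection pointing at the Pi over SSH.
    capture_image = SSHOperator(
        task_id="capture_image",
        ssh_conn_id="raspberry_pi",
        command="raspistill -o /home/pi/captures/{{ ts_nodash }}.jpg",
    )

    record_video = SSHOperator(
        task_id="record_video",
        ssh_conn_id="raspberry_pi",
        command="raspivid -t 10000 -o /home/pi/captures/{{ ts_nodash }}.h264",
    )

    capture_image >> record_video
```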
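
Metabase runs its SQL directly against the Airflow metadata database; the sketch below only prototypes the same three metrics from Python with pandas and SQLAlchemy, assuming a PostgreSQL metadata database at a placeholder connection string. It is not the project's actual dashboard queries.

```python
# metabase_metrics_prototype.py  (illustrative sketch)
import pandas as pd
from sqlalchemy import create_engine

# Assumed PostgreSQL Airflow metadata DB; adjust credentials and host to the real deployment.
engine = create_engine("postgresql+psycopg2://airflow:airflow@localhost:5432/airflow")

# Average execution time (seconds) per DAG, from the dag_run table.
avg_runtime = pd.read_sql(
    """
    SELECT dag_id,
           AVG(EXTRACT(EPOCH FROM (end_date - start_date))) AS avg_runtime_seconds
    FROM dag_run
    WHERE state = 'success'
    GROUP BY dag_id;
    """,
    engine,
)

# Count of successful vs. failed task instances per DAG.
task_outcomes = pd.read_sql(
    """
    SELECT dag_id, state, COUNT(*) AS task_count
    FROM task_instance
    WHERE state IN ('success', 'failed')
    GROUP BY dag_id, state;
    """,
    engine,
)

# Failed DAG runs per week, to spot weeks where a DAG kept crashing.
weekly_failures = pd.read_sql(
    """
    SELECT dag_id,
           DATE_TRUNC('week', start_date) AS week,
           COUNT(*) AS failed_runs
    FROM dag_run
    WHERE state = 'failed'
    GROUP BY dag_id, week
    ORDER BY week;
    """,
    engine,
)

print(avg_runtime, task_outcomes, weekly_failures, sep="\n\n")
```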
The outcomes were:
- Weather info, food price, and sensor data datasets in the expected format.
- A dashboard built with Metabase Business Intelligence