Automatic Data Heterogeneous Integration - Airflow, Hadoop, Metabase

Technologies used: Python, Selenium, Apache Airflow, Hadoop Distributed File System, Metabase

Abstract

Smart farming combines information and communication technologies (ICT) with traditional agricultural practices to improve the quality and quantity of agricultural products. These ICTs include unmanned aerial vehicles (UAVs, or drones), artificial intelligence, robots, and sensors. Smart farming systems rely on many data types, such as food prices, sensor readings, weather, images, and video, and the data can be structured, semi-structured, or unstructured. A system that can integrate such heterogeneous smart farming data is therefore needed to support various kinds of analysis. In this study, the authors propose a Smart Farming Data Lake System built on Apache Airflow for data-integration automation, the Hadoop Distributed File System (HDFS) for data storage, and Metabase for dashboards. See the Documentation for the full details in the paper.

Architectural Design

The architectural design for this final project is shown in the Architecture Design diagram.

Prerequisites

  • Python 3
  • Java 11
  • Selenium 4.9.1
  • Pandas 2.0.1
  • Hadoop 3.2.1
  • Apache Airflow 2.6.1
  • Metabase 0.43.1

Process

  1. Set up the Hadoop environment for HDFS (used as the data lake).
  2. Set up Apache Airflow, used as the orchestrator.
  3. Set up the Metabase environment for Business Intelligence, used to display dashboards of metrics and charts built from Airflow data.
  4. Create the DAG file for the web scraper that collects weather information from the BMKG website and food prices from the PIHPS website (a minimal DAG skeleton is sketched after this list).
  5. Create the DAG file that sends commands to a Raspberry Pi (a Linux-based operating system) to capture an image and record video.
  6. Create the DAG file that retrieves sensor data from InfluxDB via its API.
  7. Run each DAG on Apache Airflow.
  8. Review the results in Apache Airflow and verify that the data fully exists in HDFS.
  9. Review Apache Airflow's performance using Metabase Business Intelligence.
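
The sketch below shows the general shape these DAG files take: a minimal Airflow 2.x DAG with a single daily task that writes a payload into HDFS over WebHDFS. The NameNode URL, user, paths, and the `fetch_and_store` stub are illustrative placeholders, not the project's actual code.

```python
# Minimal DAG sketch: one daily task that lands a small CSV payload in
# HDFS via WebHDFS. All names and paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from hdfs import InsecureClient  # WebHDFS client: pip install hdfs


def fetch_and_store():
    # Placeholder for the real extraction logic (scraper, API call, ...).
    payload = "timestamp,value\n2023-06-01T00:00:00,42\n"
    # Hadoop 3.x NameNodes usually expose WebHDFS on port 9870.
    client = InsecureClient("http://namenode:9870", user="hadoop")
    client.write("/data_lake/example/sample.csv", data=payload, overwrite=True)


with DAG(
    dag_id="example_ingest_dag",
    start_date=datetime(2023, 6, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="fetch_and_store", python_callable=fetch_and_store)
```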

Source code: Python scripts (DAG files) & SQL queries for Metabase

  • Web_Scraper_DAG: Extracts data automatically from the websites: weather forecast information for every region, reported daily in 6-hour intervals, and food price information for every commodity, with the past five days of prices from traditional and modern markets (see the scraper sketch below).
  • Sensor_retrieve_DAG: Extracts sensor data automatically from the InfluxDB database via its API; the readings include battery level, nitrogen, potassium, soil pH, and more, generated every 5 minutes (see the InfluxDB sketch below).
  • Camera_data_capture: Sends commands automatically to the Raspberry Pi over an SSH connection to capture an image and record a video (see the SSH sketch below).
  • SQL Query for Metabase: Queries I created to analyze the Airflow metadata database, such as the average execution time of each DAG, how many tasks succeeded or failed, and whether a DAG crashed within a given week (see the SQL sketch below).
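
A minimal sketch of the Selenium extraction step in Web_Scraper_DAG, assuming the forecast is presented as an HTML table; the URL and selectors are placeholders, not the actual BMKG or PIHPS page structure.

```python
# Hypothetical Selenium scraping step: the URL and selectors are
# placeholders, not the real page structure.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # no display inside an Airflow worker
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example-weather-site.test/forecast")  # placeholder URL
    rows = driver.find_elements(By.CSS_SELECTOR, "table tr")
    records = [
        [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
        for row in rows
    ]
    # Drop header rows that have no <td> cells, then persist as CSV.
    pd.DataFrame([r for r in records if r]).to_csv(
        "/tmp/weather_forecast.csv", index=False, header=False
    )
finally:
    driver.quit()
```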
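A sketch of the Sensor_retrieve_DAG extraction step, assuming an InfluxDB 2.x server and the influxdb-client package; the URL, token, org, bucket, and measurement names are placeholders.

```python
# Hypothetical InfluxDB 2.x query: connection details, bucket, and
# measurement names are placeholders.
from influxdb_client import InfluxDBClient

flux = '''
from(bucket: "sensors")
  |> range(start: -5m)
  |> filter(fn: (r) => r._measurement == "soil")
'''

with InfluxDBClient(url="http://influxdb:8086", token="MY_TOKEN", org="farm") as client:
    for table in client.query_api().query(flux):
        for record in table.records:
            print(record.get_time(), record.get_field(), record.get_value())
```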
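A sketch of Camera_data_capture, assuming the apache-airflow-providers-ssh package and an Airflow SSH connection; the connection id and the capture command are placeholders (raspistill is the stock Raspberry Pi camera tool, but the project may use a different one).

```python
# Hypothetical SSH-based capture task: the connection id and command
# are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.ssh.operators.ssh import SSHOperator

with DAG(
    dag_id="camera_data_capture",
    start_date=datetime(2023, 6, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    capture_image = SSHOperator(
        task_id="capture_image",
        ssh_conn_id="raspberry_pi_ssh",  # placeholder Airflow connection
        command="raspistill -o /home/pi/captures/{{ ds_nodash }}.jpg",
    )
```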
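A sketch of the kind of Metabase question described above, written against Airflow's standard dag_run metadata table and assuming a PostgreSQL backend; the exact queries in the repository may differ.

```sql
-- Average run duration and success/failure counts per DAG, last 7 days.
SELECT
    dag_id,
    AVG(EXTRACT(EPOCH FROM (end_date - start_date))) AS avg_duration_seconds,
    COUNT(*) FILTER (WHERE state = 'success') AS successful_runs,
    COUNT(*) FILTER (WHERE state = 'failed')  AS failed_runs
FROM dag_run
WHERE start_date >= NOW() - INTERVAL '7 days'
GROUP BY dag_id
ORDER BY avg_duration_seconds DESC;
```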

Output

The outcome was:
