A cloud-native data pipeline and visualization dashboard for FDA drug approvals, using AWS Lambda (containerized), EventBridge, and S3 for automated data updates. A Python scraper processes over 1,770 records, and a Dash app presents real-time KPIs, charts, and tables. The dashboard is deployed on AWS App Runner with CI/CD pipelines.

Tanguy9862/new-drug-approvals-dashboard

📊 New Drug Approvals Dashboard

An interactive dashboard, built with Dash by Plotly, that visualizes up-to-date information on newly approved drugs. The dashboard refreshes every 24 hours through a fully automated scraper that handles data retrieval and storage, so the information stays current without manual intervention.

🌍 Live Application

Explore the live dashboard here.

🖼️ Dashboard Previews

🧩 Project Overview

🔄 Data Pipeline

This project features a fully automated pipeline leveraging AWS services to scrape, enrich, and store pharmaceutical data. The scraper is containerized and runs on AWS Lambda, triggered at scheduled intervals via EventBridge. The enriched data is then stored in Amazon S3, and the Dash app is hosted on AWS App Runner.

Data Pipeline Schema
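
As a rough illustration of the scheduled trigger described above, the sketch below builds the request parameters for an EventBridge rule that fires every 24 hours and targets a Lambda function. The rule name, target ID, and helper function names are hypothetical and do not come from this repository; only the `put_rule` and `put_targets` calls named in the comments are part of boto3.

```python
def schedule_expression(hours: int) -> str:
    """Build an EventBridge rate expression for a whole number of hours."""
    if hours < 1:
        raise ValueError("hours must be a positive integer")
    # EventBridge expects the singular unit when the value is exactly 1.
    unit = "hour" if hours == 1 else "hours"
    return f"rate({hours} {unit})"


def build_rule_request(rule_name: str, hours: int = 24) -> dict:
    """Keyword arguments for the EventBridge PutRule call (hypothetical helper)."""
    return {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression(hours),
        "State": "ENABLED",
    }


def build_target_request(rule_name: str, lambda_arn: str) -> dict:
    """Keyword arguments for the EventBridge PutTargets call (hypothetical helper)."""
    return {
        "Rule": rule_name,
        # "scraper-lambda" is a placeholder target ID, not a name from this project.
        "Targets": [{"Id": "scraper-lambda", "Arn": lambda_arn}],
    }


# With boto3 installed and AWS credentials configured, the requests could be applied as:
#   events = boto3.client("events")
#   events.put_rule(**build_rule_request("daily-drug-scrape"))
#   events.put_targets(**build_target_request("daily-drug-scrape", lambda_arn))
```

The Lambda function also needs a resource-based permission that allows EventBridge to invoke it; that step is omitted here.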

🤖 Automated Scraper with AI Enrichment

The data pipeline includes a scraper packaged as a container image in AWS ECR and executed by AWS Lambda:

  • Scrapes new drug approval data from Drugs.com.
  • Uses the gpt-4o-mini LLM to classify therapeutic classes and diseases.
  • Scheduled with AWS EventBridge every 24 hours.
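
The steps above can be sketched as a small scrape, enrich, and store flow. This is a minimal sketch, not the actual scraper (see the dedicated repository for that): the record fields, function names, and stub implementations are assumptions made for illustration. In a real deployment, `classify` would call the LLM and `store` would write to S3, for example via boto3's `put_object`.

```python
import json
from typing import Callable, Iterable


def enrich(records: Iterable[dict], classify: Callable[[str], dict]) -> list:
    """Merge the labels returned by `classify` into each scraped record."""
    enriched = []
    for record in records:
        # `classify` stands in for the LLM call; it is assumed to return
        # a dict such as {"therapeutic_class": ..., "disease": ...}.
        labels = classify(record["name"])
        enriched.append({**record, **labels})
    return enriched


def run_pipeline(
    scrape: Callable[[], Iterable[dict]],
    classify: Callable[[str], dict],
    store: Callable[[str], None],
) -> dict:
    """Scrape, enrich, and persist records; return a small status summary."""
    records = list(scrape())
    enriched = enrich(records, classify)
    # Serialize once and hand the payload to the storage callback.
    store(json.dumps(enriched))
    return {"statusCode": 200, "records": len(enriched)}


# Placeholder stubs so the sketch runs without network or AWS access.
def fake_scrape() -> list:
    return [{"name": "ExampleDrug", "company": "Example Pharma"}]


def fake_classify(name: str) -> dict:
    return {"therapeutic_class": "Example class", "disease": "Example disease"}


def handler(event, context):
    """Lambda-style entry point wired to the stubs above."""
    stored = []
    return run_pipeline(fake_scrape, fake_classify, stored.append)
```

Passing the three stages in as callables keeps the orchestration testable locally, since each stage can be swapped for a stub.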

For a detailed understanding of the scraper's workings and to view the source code, refer to the dedicated repository.

🌐 Dash Application

The dashboard is deployed on AWS App Runner, automatically updating when new data is pushed to S3. Key features:

  • ✔ Real-time insights into drug approvals.
  • ✔ Interactive graphs & analytics powered by Dash/Plotly.
  • ✔ Filters for disease categories, companies, and trends.
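
To illustrate the kind of filtering and aggregation that sits behind such features, here is a minimal sketch in plain Python. The field names (`disease`, `company`, `approval_date`) and the sample rows are placeholders invented for this example, not the app's actual schema or data; in the real dashboard this logic would typically live inside Dash callbacks.

```python
from collections import Counter
from typing import Optional

# Placeholder rows, made up purely for illustration.
SAMPLE = [
    {"name": "DrugA", "company": "Acme", "disease": "Oncology", "approval_date": "2023-05-01"},
    {"name": "DrugB", "company": "Acme", "disease": "Cardiology", "approval_date": "2024-01-15"},
    {"name": "DrugC", "company": "Globex", "disease": "Oncology", "approval_date": "2024-03-20"},
]


def filter_approvals(
    records: list,
    disease: Optional[str] = None,
    company: Optional[str] = None,
) -> list:
    """Keep records matching every filter that is set; None means no filter."""
    return [
        r
        for r in records
        if (disease is None or r["disease"] == disease)
        and (company is None or r["company"] == company)
    ]


def approvals_per_year(records: list) -> dict:
    """Count approvals by year, assuming ISO dates (YYYY-MM-DD)."""
    counts = Counter(r["approval_date"][:4] for r in records)
    return dict(sorted(counts.items()))
```

For example, `filter_approvals(SAMPLE, disease="Oncology")` keeps the two oncology rows, and `approvals_per_year` applied to the result gives one approval in each of 2023 and 2024.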

📦 Containerization

  • App Runner hosts the Dash app, with CI/CD triggering a rebuild on every GitHub push.
  • The scraper runs in a Docker container stored in AWS ECR.

🛠️ Installation & Setup

The system is designed for flexibility in deployment:

  • Local Setup: Clone the repository, install dependencies from requirements.txt, and run locally.
  • Cloud Deployment: To deploy on Google Cloud Platform (GCP) or Amazon Web Services (AWS), edit user_config.py to match your environment and make sure the appropriate permissions are set.

If you need to configure the application for a specific environment, check the corresponding branches:
