This project explores a public dataset of food products to develop a scoring system for a nutrition recommendation application. The app evaluates products on three metrics: health score, environmental score, and packaging impact. The work covers cleaning and analyzing the data, designing the scoring methods, and deriving insights that help users make informed food choices.
- Clean and preprocess food product data to address missing values and inconsistencies.
- Analyze data to derive insights for nutritional, environmental, and packaging scores.
- Implement scoring systems to assess health, environmental, and packaging impact.
- Demonstrate the feasibility of a nutrition recommendation app with computed scores.
- Python Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
- Data Processing: Missing value imputation using KNN, statistical analysis, and data normalization.
- Analysis:
  - Univariate: Distribution histograms and normality tests.
  - Bivariate: Correlation analysis and scatter plots.
  - Multivariate: Principal Component Analysis (PCA) for dimensionality reduction.
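
The preprocessing and multivariate steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the column names and values are made up, and `n_neighbors=2` and `n_components=2` are arbitrary choices for a tiny example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical nutrient columns with gaps; the values are invented for illustration.
df = pd.DataFrame({
    "energy_100g": [250.0, 1200.0, np.nan, 1800.0, 900.0, 400.0],
    "fat_100g": [1.0, 15.0, 8.0, np.nan, 10.0, 2.0],
    "sugars_100g": [5.0, 30.0, 12.0, 45.0, np.nan, 6.0],
    "salt_100g": [0.1, 1.2, 0.5, 0.9, 0.7, np.nan],
})

# Fill each missing value from the k most similar rows,
# where similarity is measured on the columns both rows have observed.
imputer = KNNImputer(n_neighbors=2)
imputed = imputer.fit_transform(df)

# Standardize to zero mean and unit variance so no column dominates PCA.
scaled = StandardScaler().fit_transform(imputed)

# Project the standardized features onto the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

print(pca.explained_variance_ratio_)
```

One design point worth noting: `KNNImputer` measures distance on the raw values here, so a large-range column such as energy weighs more heavily when choosing neighbors. Scaling before imputation is a reasonable alternative if that matters for the data at hand.
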
- Health Score:
  - Categories such as vegetables, rice, and milk have the highest scores.
  - More additives and higher palm oil content reduce the health score.
- Environmental Score:
  - Top-scoring countries: Poland, Sweden, and France.
  - The score decreases as packaging impact and shipping distance increase.
- Packaging Score:
  - Fresh products score higher, while plastic-heavy packaging scores lower.
  - Certifications such as FSC and Green Dot are positive indicators.
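
To make the health score finding concrete, here is a small sketch of a penalty-based score that drops as additives and palm oil ingredients increase. It is only an assumption about how such a score could be structured, not the formula used in the project: the function `health_score`, the column names, the penalty weights, and the sample products are all hypothetical.

```python
import pandas as pd


def health_score(products, additive_penalty=5.0, palm_oil_penalty=10.0):
    """Return a 0-100 score that falls as additives and palm oil increase.

    The penalties are placeholder weights, not values from the project.
    """
    raw = (
        products["base_score"]
        - additive_penalty * products["additives_n"]
        - palm_oil_penalty * products["palm_oil_n"]
    )
    # Keep the result inside the 0-100 range.
    return raw.clip(lower=0, upper=100)


# Made-up products for illustration only.
products = pd.DataFrame({
    "product": ["plain rice", "snack bar", "instant noodles"],
    "base_score": [80.0, 80.0, 20.0],
    "additives_n": [0, 3, 4],
    "palm_oil_n": [0, 1, 1],
})
products["health_score"] = health_score(products)
print(products[["product", "health_score"]])
```

With these placeholder weights, two products that start from the same base score diverge once one of them contains additives and palm oil, which mirrors the direction of the effect reported above.
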
- Jupyter Notebooks:
  - Data Cleaning: Handling missing values and anomalies.
  - Data Exploration: Statistical and visualization analysis.
- Presentation:
  - Summarizes key findings and recommendations.
- Proposed App:
  - Scores products on health, environment, and packaging for informed decisions.
- Include detailed packaging and shipping data for precise ecological scoring.
- Enhance prediction models with advanced machine learning techniques.
- Expand analysis to integrate consumer feedback for score validation.
This project showcases expertise in:
- Data cleaning, analysis, and visualization.
- Machine learning techniques for imputation and prediction.
- Application of data science to real-world consumer problems.