This data pipeline project moves data from public APIs into a MySQL database and combines several elements: public API access, data pipelines with MySQL, and Python for data manipulation, machine learning, and visualization.
- 📊 Stock Data API: Used the Nairobi Stock Exchange API on RapidAPI to retrieve historical and real-time stock price data.
- 📰 News API: Integrated the World News API to gather news articles about Safaricom, the company chosen for this project. Requests require an API key, plus query parameters such as source-countries=ke and text=safaricom to narrow the results.
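As a minimal sketch of how such a request URL could be assembled, the snippet below uses only the standard library. The endpoint and parameter names come from the NEWS_API_URL example in this README; the helper name `build_news_url` is hypothetical and not part of the project.

```python
from urllib.parse import urlencode

# Endpoint taken from the NEWS_API_URL example in this README.
BASE_URL = "https://api.worldnewsapi.com/search-news"


def build_news_url(api_key, text="safaricom", source_countries="ke", language="en"):
    """Return the full search URL for the given filters (hypothetical helper)."""
    params = {
        "text": text,
        "source-countries": source_countries,
        "language": language,
        "api-key": api_key,
    }
    # urlencode escapes values, so search terms with spaces stay valid.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_news_url("your_news_api_key_here")
print(url)
```

Building the query from a dictionary keeps the filters easy to change without editing a long URL string by hand.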
- 🗃️ Stocks Data Table: Stores historical stock data including ticker, name, volume, price, change, and date.
- 🗞️ News Data Table: Stores news articles including article ID, title, text, URL, publish date, author, language, source country, and sentiment.
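The two tables could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The column names follow the lists above, while the table names, column types, and primary keys are assumptions rather than the project's actual schema.

```python
import sqlite3

# sqlite3 stands in for MySQL here. Table names, types, and keys are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stocks (
    ticker TEXT NOT NULL,
    name   TEXT,
    volume INTEGER,
    price  REAL,
    change REAL,
    date   TEXT NOT NULL,
    PRIMARY KEY (ticker, date)
);
CREATE TABLE IF NOT EXISTS news (
    article_id     INTEGER PRIMARY KEY,
    title          TEXT,
    text           TEXT,
    url            TEXT,
    publish_date   TEXT,
    author         TEXT,
    language       TEXT,
    source_country TEXT,
    sentiment      REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

In MySQL, `change` is a reserved word, so that column would need backtick quoting or a different name there.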
- ⚙️ Data Extraction: Extracted historical and real-time stock price data from the Nairobi Stock Exchange API on RapidAPI, the stock source configured in .env.
- 🚀 Data Loading: Loaded the extracted data into the MySQL database.
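A loading step along these lines might use a parameterized bulk insert, sketched below. Again sqlite3 stands in for MySQL; the records are made-up placeholders rather than real market data, and `load_stocks` is a hypothetical helper, not the project's function.

```python
import sqlite3

# Placeholder rows for illustration only; these are not real market values.
records = [
    {"ticker": "SCOM", "name": "Safaricom", "volume": 1000,
     "price": 10.0, "change": 0.1, "date": "2024-01-02"},
    {"ticker": "SCOM", "name": "Safaricom", "volume": 1200,
     "price": 10.5, "change": 0.5, "date": "2024-01-03"},
]


def load_stocks(conn, rows):
    """Insert rows, replacing any existing row with the same ticker and date."""
    conn.executemany(
        "INSERT OR REPLACE INTO stocks (ticker, name, volume, price, change, date) "
        "VALUES (:ticker, :name, :volume, :price, :change, :date)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stocks (ticker TEXT, name TEXT, volume INTEGER, price REAL, "
    "change REAL, date TEXT, PRIMARY KEY (ticker, date))"
)
load_stocks(conn, records)
load_stocks(conn, records)  # running the load twice does not duplicate rows
count = conn.execute("SELECT COUNT(*) FROM stocks").fetchone()[0]
print(count)
```

Passing values as parameters instead of formatting them into the SQL string avoids quoting bugs and SQL injection. MySQL drivers typically use `%s` style placeholders rather than the named style shown here.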
- 🛠️ Data Preprocessing: Preprocessed the stock data for machine learning by handling missing values and creating new features.
- 📝 Sentiment Analysis: Performed sentiment analysis on news articles using NLTK and TextBlob, storing sentiment scores in the News Data Table.
- 🔄 Data Merging: Merged stock data with average daily sentiment scores.
- 🤖 Machine Learning Model: Trained a machine learning model using Scikit-learn to predict future stock prices based on historical data and sentiment features.
- 📊 Model Evaluation: Evaluated the model using Mean Squared Error (MSE) and R-squared metrics.
- 📈 Visualization: Visualized the predicted prices against the actual closing prices.
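The merging step above can be illustrated with a small standard-library sketch. The project itself may well use a dataframe library for this; the field names, the sample values, and the choice to fill days without news with a neutral 0.0 are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows; field names are assumptions based on the table descriptions.
news = [
    {"publish_date": "2024-01-02", "sentiment": 0.5},
    {"publish_date": "2024-01-02", "sentiment": 0.25},
    {"publish_date": "2024-01-03", "sentiment": -0.5},
]
stocks = [
    {"date": "2024-01-02", "price": 10.0},
    {"date": "2024-01-03", "price": 10.5},
    {"date": "2024-01-04", "price": 10.25},
]


def daily_sentiment(articles):
    """Average the sentiment scores of all articles published on each day."""
    by_day = defaultdict(list)
    for article in articles:
        by_day[article["publish_date"]].append(article["sentiment"])
    return {day: mean(scores) for day, scores in by_day.items()}


def merge_sentiment(stock_rows, sentiment_by_day, default=0.0):
    """Attach the daily average sentiment to each stock row (left join on date)."""
    return [
        {**row, "avg_sentiment": sentiment_by_day.get(row["date"], default)}
        for row in stock_rows
    ]


merged = merge_sentiment(stocks, daily_sentiment(news))
for row in merged:
    print(row)
```

Keeping every stock row and filling missing sentiment with a default preserves trading days that had no news coverage, which matters when the dataset is small.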
- ⚙️ db_config.py: Contains the database configuration settings.
- 🔄 db_operations.py: Handles database operations such as connecting to the database, executing queries, and fetching results.
- 🚀 load_model.py: Loads the machine learning model, performs data preprocessing, and evaluates the model's performance.
- 📰 load_news_pipeline.py: Implements the pipeline for fetching and processing news data, including sentiment analysis.
- 📊 load_stocks_pipeline.py: Implements the pipeline for fetching and processing stock data from the API.
- 🔧 transform_pipeline.py: Contains functions for processing, transforming and cleaning data for machine learning models.
- .env: Contains sensitive information such as API keys and database credentials. Not included in the repository.
- Training and Testing Set Shapes: features (8, 2) and (3, 2); targets (8,) and (3,)
- MSE: 0.00033765568078515055
- R2: 0.8822131346098291
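For reference, the two metrics reported above can be computed as in the plain-Python sketch below. The function names mirror the scikit-learn metrics of the same name, but this is an illustrative reimplementation with made-up inputs, not the project's evaluation code.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation around the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mse, r2)
```

With a test set of only three samples, as in the shapes listed above, both metrics can shift noticeably from one train/test split to another, so the reported values are best read as indicative.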
- Clone the repository.
git clone https://github.com/mohswell/StockNews--SentimentAnalyis.git
cd StockNews--SentimentAnalyis
- Create a .env file in the root directory of the project and add the following environment variables:
NEWS_API_URL=https://api.worldnewsapi.com/search-news?text=safaricom&source-countries=ke&language=en&api-key=your_news_api_key_here
RAPIDAPI_KEY=your_rapidapi_key_here
RAPIDAPI_HOST=nairobi-stock-exchange-nse.p.rapidapi.com
RAPIDAPI_PATH=/stocks/Safaricom
DB_USER=your_database_user_here
DB_PASSWORD=your_database_password_here
DB_HOST=your_database_host_here
DB_NAME=your_database_name_here
DB_TABLE=your_database_table_here
- Install dependencies (pip install -r requirements.txt).
- Run python load_stocks_pipeline.py to execute the project.
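One way the scripts might read these settings is sketched below. The variable names come from the .env example in this README; `load_settings` is a hypothetical helper, and the sketch assumes the variables have already been placed in the process environment (Python does not read a .env file on its own, so a loader such as python-dotenv would be needed for that).

```python
import os

# Variable names as listed in the .env example in this README.
REQUIRED_KEYS = (
    "NEWS_API_URL", "RAPIDAPI_KEY", "RAPIDAPI_HOST", "RAPIDAPI_PATH",
    "DB_USER", "DB_PASSWORD", "DB_HOST", "DB_NAME", "DB_TABLE",
)


def load_settings(env=None):
    """Return the required settings, failing early if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}


# Demonstration with placeholder values instead of the real environment.
example_env = {key: f"placeholder_{key.lower()}" for key in REQUIRED_KEYS}
settings = load_settings(example_env)
print(settings["DB_NAME"])
```

Checking for missing values at startup surfaces configuration mistakes before any API call or database connection is attempted.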
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Muhammad Said
This project showcases the integration of multiple APIs, data analysis, sentiment analysis, feature engineering, and machine learning for stock price prediction.