Author: Sacha Schwab
MIT license
- Crawl Yahoo Finance Cryptocurrency news articles
- Raw text data is preprocessed and embedded (TF-IDF)
- NLP engine runs keyword extraction based on TF-IDF weights, named entity extraction and sentiment analysis
- HDBSCAN algorithm used for clustering, with currently moderate effectiveness (to be enhanced in upcoming versions).
- Reports are in /main as 'A3_DocumentNumber_X_sacha_schwab' as per the assessment outline
- Code files: (1) 'code_webcralwer.ipynb', (2) 'code_nlp.ipynb'
- Model available under /main/model
- For privacy reasons, the audio-annotated PowerPoint presentation is not available here; it is in the assessment folder in JCU Learn
- Git
- Python 3.7
- Any IDE supporting Jupyter Notebook files
- Schedule daily_jobs/webcrawler.py to run daily (the ipynb version is for grading)
- Schedule daily_jobs/model_update.py to run daily
- TBD: Get the articles connected to a new article by running get_cluster from model_run.py
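
For readers unfamiliar with keyword extraction based on TF-IDF weights, the idea can be sketched as below. This is a minimal illustration using only the standard library, not the code from the notebooks: the sample documents, the function names (`tokenize`, `tfidf_weights`, `top_keywords`) and the smoothed IDF formula are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical sample texts standing in for crawled articles.
DOCS = [
    "Bitcoin price rallies as investors buy bitcoin",
    "Ethereum upgrade lowers fees for investors",
    "Regulators discuss new rules for crypto exchanges",
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per document.

    Weight = term frequency * smoothed inverse document frequency,
    where idf = ln((1 + N) / (1 + df)) + 1 (an assumed variant).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def top_keywords(weights: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    for doc, w in zip(DOCS, tfidf_weights(DOCS)):
        print(top_keywords(w), "<-", doc)
```

Terms that are frequent in one article but rare across the collection receive the highest weights, which is why they serve as keyword candidates. The resulting weight vectors are also the kind of input a clustering step such as HDBSCAN would consume.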