This project identifies misinformation in textual data using Retrieval-Augmented Generation (RAG), combining large language models (LLMs) with a custom knowledge retrieval pipeline for accurate, context-aware misinformation detection. By leveraging external knowledge sources, the system enhances the model's ability to distinguish between truthful and misleading content.
- RAG-based Model: Combines external knowledge retrieval with the generation capabilities of LLMs for enhanced misinformation detection.
- Real-time Information Retrieval: Uses a custom-built information retrieval system to fetch relevant data for context-based analysis.
- High Accuracy: Identifies false claims by cross-checking statements against factual databases.
- Python 3.8+
- Clone the repository:

      git clone https://github.com/JimmyIITR/misInformationIdentificationUsingRAG.git
      cd misInformationIdentificationUsingRAG/usingLangChain
- Install the required libraries using pip:

      pip install -r requirements.txt
- Prepare the Dataset: Ensure your dataset (textual data for analysis) is available in the data/ directory.
- Run the Misinformation Detection:

      streamlit run main.py
The script will process the input data, query the retrieval system, and use the LLM to generate a classification for misinformation.
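As a rough illustration of the dataset step, the sketch below gathers statements from the data/ directory. It assumes plain-text `.txt` files with one statement per line. That layout is an assumption, not something this repository specifies, and `load_statements` is a hypothetical helper rather than a function from the project.

```python
from pathlib import Path
from typing import List


def load_statements(data_dir: str = "data") -> List[str]:
    """Collect non-empty lines from every .txt file directly under data_dir.

    Assumption: one statement per line in UTF-8 text files. The project's
    real dataset format may differ.
    """
    statements: List[str] = []
    # Sort the paths so the order of statements is reproducible across runs.
    for path in sorted(Path(data_dir).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank lines
                statements.append(line)
    return statements
```

If your data is stored as CSV or JSON instead, only the body of the inner loop would need to change.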
"Climate change is a hoax created by scientists to get more funding."
"The statement has been flagged as misinformation based on contextual facts."
The system is based on a Retrieval-Augmented Generation (RAG) architecture, which involves:
- Information Retrieval: The first stage retrieves the most relevant documents or knowledge from a database to provide context.
- Text Generation: The second stage uses an LLM to generate a classification or response, based on both the input text and retrieved context.
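The two stages above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: retrieval is approximated with bag-of-words cosine similarity instead of the repository's custom retrieval system, the LLM is any callable that maps a prompt string to a response string, and every name (`tokenize`, `cosine`, `retrieve`, `build_prompt`, `classify`) is hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(claim: str, documents: List[str], k: int = 2) -> List[str]:
    """Stage 1: return the k documents most similar to the claim."""
    query = tokenize(claim)
    ranked = sorted(documents, key=lambda d: cosine(query, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(claim: str, context: List[str]) -> str:
    """Combine the claim and the retrieved context into one prompt (assumed wording)."""
    evidence = "\n".join(f"- {doc}" for doc in context)
    return (
        "Using only the evidence below, answer MISINFORMATION or NOT_MISINFORMATION.\n"
        f"Evidence:\n{evidence}\n"
        f"Claim: {claim}\n"
        "Answer:"
    )


def classify(
    claim: str, documents: List[str], llm: Callable[[str], str], k: int = 2
) -> str:
    """Stage 2: pass the claim plus retrieved context to the LLM and return its label."""
    context = retrieve(claim, documents, k)
    return llm(build_prompt(claim, context)).strip()


if __name__ == "__main__":
    docs = [
        "Streamlit is a Python framework for building data apps.",
        "Climate change is supported by evidence from many independent scientists.",
    ]
    # Stand-in for a real model call; always returns the same label.
    fake_llm = lambda prompt: "MISINFORMATION"
    print(classify("Climate change is a hoax created by scientists.", docs, fake_llm, k=1))
```

Keeping the LLM behind a plain callable means the retrieval and prompt-building steps can be checked without any model access, and a real client can be swapped in later.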
For detailed documentation and the roadmap of our project, please visit the Wiki of this repository.
To view or download the video demonstration, click here 🎬
Feel free to open an issue. We are glad to help you. ❤️
This project is published under the MIT license.