
A machine learning project leveraging large language models (LLMs) to detect and flag misinformation in textual data. The system uses advanced NLP techniques to analyze and classify content for enhanced digital trust.


Jimmy5467/misInformationIdentificationUsingRAG

 
 


Misinformation Identification with LLM

Project Overview

This project identifies misinformation in textual data using Retrieval-Augmented Generation (RAG), combining large language models (LLMs) with a custom knowledge retrieval pipeline for accurate, context-aware misinformation detection. By leveraging external knowledge sources, the system enhances the model's ability to distinguish between truthful and misleading content.

Features

  • RAG-based Model: Combines retrieval of external knowledge with the generation capabilities of LLMs to improve misinformation detection.
  • Real-time Information Retrieval: Uses a custom-built information retrieval system to fetch relevant data for context-based analysis.
  • High Accuracy: Aims to identify false claims accurately by cross-checking statements against factual databases.
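
The retrieval feature can be pictured with a small, self-contained sketch. It ranks an in-memory list of reference passages by bag-of-words cosine similarity to the claim. This is a stand-in chosen for illustration, not the repository's custom retrieval system; the names `tokenize`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(claim: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the claim (illustrative only)."""
    query = tokenize(claim)
    ranked = sorted(passages, key=lambda p: cosine(query, tokenize(p)), reverse=True)
    return ranked[:k]
```

A production retriever would more likely use dense embeddings and a vector store; word overlap is used here only so the sketch runs with the standard library alone.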

Installation

Prerequisites

  • Python 3.8+

Setup

  1. Clone the repository:

      git clone https://github.com/JimmyIITR/misInformationIdentificationUsingRAG.git
      cd misInformationIdentificationUsingRAG/usingLangChain

  2. Install the required libraries using pip:

      pip install -r requirements.txt

Usage

Running the Script

  1. Prepare the Dataset: Ensure your dataset (textual data for analysis) is available in the data/ directory.

  2. Run the Misinformation Detection:

    streamlit run main.py

This command starts the Streamlit app, which processes the input data, queries the retrieval system, and uses the LLM to classify the text as misinformation or not.
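
The expected layout of the data/ directory from step 1 is not described here, so the following is only a sketch under an assumed convention: plain-text files with one claim per line. The helper name `load_claims` and the `.txt` extension are hypothetical and may not match what main.py actually reads.

```python
from pathlib import Path
from typing import List


def load_claims(data_dir: str = "data") -> List[str]:
    """Collect non-empty lines from every .txt file directly under data_dir.

    Assumed layout (not confirmed by the repository): one claim per line.
    """
    claims: List[str] = []
    # sorted() keeps the file order stable across runs and platforms
    for path in sorted(Path(data_dir).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank lines
                claims.append(line)
    return claims
```

Calling `load_claims("data")` would then return a flat list of strings that can be passed to the detector one at a time.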

Example Input:

"Climate change is a hoax created by scientists to get more funding."

Example Output:

"The statement has been flagged as misinformation based on contextual facts."

Model Architecture

The system is based on a Retrieval-Augmented Generation (RAG) architecture, which involves:

  1. Information Retrieval: The first stage retrieves the most relevant documents or knowledge from a database to provide context.
  2. Text Generation: The second stage uses an LLM to generate a classification or response, based on both the input text and retrieved context.
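
The two stages above can be sketched as a pair of pluggable callables: one that retrieves context and one that wraps the LLM. Everything named here (`build_prompt`, `parse_verdict`, `classify`, the label set, and the prompt wording) is an assumption made for illustration; the repository's actual prompt and model calls may differ.

```python
from typing import Callable, List

# Hypothetical label set; the source does not specify the classes it uses.
LABELS = ("MISINFORMATION", "ACCURATE", "UNVERIFIABLE")


def build_prompt(claim: str, context: List[str]) -> str:
    """Combine the input text and the retrieved context into one prompt."""
    facts = "\n".join(f"- {passage}" for passage in context)
    options = ", ".join(LABELS)
    return (
        f"Using only the context below, answer with one word ({options}).\n"
        f"Context:\n{facts}\n"
        f"Claim: {claim}\n"
        "Answer:"
    )


def parse_verdict(raw_response: str) -> str:
    """Map free-form model output to a label, defaulting to UNVERIFIABLE."""
    text = raw_response.strip().upper()
    for label in LABELS:
        if text.startswith(label):
            return label
    return "UNVERIFIABLE"


def classify(
    claim: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Stage 1: retrieve context. Stage 2: ask the LLM and parse its answer."""
    context = retrieve(claim)
    raw_response = llm(build_prompt(claim, context))
    return parse_verdict(raw_response)
```

Passing the retriever and the model in as functions keeps the sketch independent of any particular vector store or LLM client, and makes it easy to test with stubs. A `MISINFORMATION` label could then be rendered as a message like the one shown under Example Output.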

Documentation

For detailed documentation and the roadmap of our project, please visit the Wiki of this repository.

Video Demonstration

To view or download the video demonstration, click here 🎬

Facing any issues?

Feel free to open an issue. We are glad to help you. ❤️

License

This project is published under the MIT license.




