TayJen/raisontext

RaisonText

RaisonText is an open-source AI library for detecting synthetic (e.g. ChatGPT-generated) text.

Data

We collected and generated 941k samples of human-written (33%) and AI-generated (67%) texts.
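As a rough sanity check, the stated proportions can be turned into approximate per-class counts. This sketch assumes the "941k" figure rounds to 941,000 samples, so the results are estimates rather than exact dataset sizes.

```python
# Approximate per-class sample counts from the stated totals.
# TOTAL is the rounded "941k" figure (an assumption), so results are estimates.
TOTAL = 941_000
HUMAN_PCT = 33
AI_PCT = 67

human = TOTAL * HUMAN_PCT // 100
ai = TOTAL * AI_PCT // 100

print(f"human-written: ~{human:,}")
print(f"AI-generated:  ~{ai:,}")
print(f"AI-to-human ratio: ~{ai / human:.2f}")
```

With roughly two AI-generated samples per human-written one, the classes are imbalanced, so plain accuracy can be misleading as a quality measure.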

Public datasets used:

Generative models used:

  • ChatGPT
  • opt-125m
  • opt-1.3b
  • opt-2.7b
  • llama2-7b
  • llama2-13b

The source name of each sample is saved in the source column.

The dataset may be downloaded here.

Part of the generated data is published on HuggingFace.
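The source column makes it easy to see how many samples each dataset or generator contributed. The sketch below is a minimal illustration that assumes a CSV layout; only the source column comes from this README, while the text column and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for the dataset file. Only the `source` column is
# named in the README; the `text` column and these rows are hypothetical.
SAMPLE_CSV = """text,source
"An example sentence.",opt-125m
"Another example.",llama2-7b
"A third example.",opt-125m
"""


def count_by_source(csv_file):
    """Return a Counter mapping each `source` value to its row count."""
    reader = csv.DictReader(csv_file)
    return Counter(row["source"] for row in reader)


counts = count_by_source(io.StringIO(SAMPLE_CSV))
for source, n in counts.most_common():
    print(f"{source}: {n}")
```

To run it on a real file, pass an open file handle instead of the io.StringIO object, provided the file really is a CSV with a source header.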

How to run

The model and the backend are two independent processes that run separately and interact only through RabbitMQ queues.

pip install -r requirements.txt
cd raisontext

Backend

The backend sends user requests to the backend queue and listens to the model's queue. When a prediction is ready for a particular user, the backend forwards it to the frontend.

uvicorn main:app

Model

The model listens to the backend queue, makes a prediction, and then sends it to the model's queue.
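The round trip between the two components can be sketched without a broker. The example below is an in-process stand-in, not the project's code: it uses Python's queue.Queue in place of RabbitMQ, and the message fields (user_id, text, score) and the dummy_predict scorer are hypothetical.

```python
import json
import queue

# In-process stand-ins for the two RabbitMQ queues. Assumption: the real
# project talks to a broker; queue.Queue is used only so this runs alone.
backend_queue = queue.Queue()  # backend -> model
model_queue = queue.Queue()    # model -> backend


def backend_send(user_id, text):
    """Backend side: publish a user request as a JSON message."""
    backend_queue.put(json.dumps({"user_id": user_id, "text": text}))


def dummy_predict(text):
    """Placeholder scorer, NOT the project's model: share of words over 6 chars."""
    words = text.split()
    return sum(len(w) > 6 for w in words) / len(words) if words else 0.0


def model_step():
    """Model side: consume one request, score it, publish the result."""
    request = json.loads(backend_queue.get())
    result = {"user_id": request["user_id"],
              "score": dummy_predict(request["text"])}
    model_queue.put(json.dumps(result))


def backend_receive():
    """Backend side: read one prediction so it can be passed to the frontend."""
    return json.loads(model_queue.get())


backend_send("user-1", "Some text to classify")
model_step()
reply = backend_receive()
print(reply)
```

Carrying the user identifier in both messages is one way for the backend to match a prediction to the user who asked for it. Whether the project does this is an assumption.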

python logreg_baseline.py

Model weights

Model weights can be found on Google Drive via this link.

Prediction baseline

We reached a ROC AUC score of 0.96 with the baseline model. Extract the model weights archive into the models folder in the repo root.
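For readers unfamiliar with the metric, ROC AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition on made-up labels and scores. It is for illustration only and is not the evaluation code used for the 0.96 figure.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs where the positive
    scores higher; ties count as half. O(n_pos * n_neg), so it only suits
    small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = AI-generated, 0 = human-written (hypothetical encoding).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC: {auc:.3f}")
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.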
