Pip Lib

https://pypi.org/project/knowledgegpt/

To use the library:
pip install knowledgegpt

Before running the project locally

Check the config file and set your own OpenAI API key and your own MongoDB URI.
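As an illustration of the step above, here is a minimal sketch of loading both secrets from environment variables instead of hard-coding them. The environment variable names OPENAI_API_KEY and MONGO_URI and the helper load_settings are assumptions made for this example; only the name SECRET_KEY appears in this README (in the example_config import further down), and the project's real config file may be laid out differently.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the OpenAI key and Mongo URI from environment variables.

    OPENAI_API_KEY and MONGO_URI are hypothetical variable names chosen
    for this sketch; they are not defined by knowledgegpt itself.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("OPENAI_API_KEY", "MONGO_URI") if not env.get(name)]
    if missing:
        # Fail early so a missing secret is noticed before any request is made.
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {
        "SECRET_KEY": env["OPENAI_API_KEY"],  # name used by the README example
        "MONGO_URI": env["MONGO_URI"],        # hypothetical name
    }
```

Passing a plain dict instead of os.environ makes the helper easy to check without touching real credentials.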

knowledgegpt

knowledgegpt is designed to gather information from various sources, including the internet and local data, which can be used to create prompts. These prompts can then be utilized by OpenAI's GPT-3 model to generate answers that are subsequently stored in a database for future reference.

To accomplish this, the text is first transformed into a fixed-size vector using either open source or OpenAI models. When a query is submitted, the text is also transformed into a vector and compared to the stored knowledge embeddings. The most relevant information is then selected and used to generate a prompt context.
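The retrieval step described above can be sketched roughly as follows. This is not knowledgegpt's actual implementation: the function names, the three-dimensional toy vectors, the use of cosine similarity as the comparison, and the prompt layout are all assumptions chosen to keep the example small. Real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: Sequence[float],
                 chunks: List[Tuple[str, Sequence[float]]],
                 k: int = 2) -> List[str]:
    """Return the k chunk texts whose stored vectors are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(query: str, context_chunks: List[str]) -> str:
    """Join the selected chunks into a context block followed by the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Toy "knowledge base": (text, embedding) pairs with made-up vectors.
chunks = [
    ("A bombard is a large cannon.", [0.9, 0.1, 0.0]),
    ("Paris is the capital of France.", [0.0, 0.2, 0.9]),
    ("Bombards were used in sieges.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # made-up embedding of the query

selected = top_k_chunks(query_vec, chunks, k=2)
prompt = build_prompt("What is a bombard?", selected)
print(prompt)
```

In this toy data the two cannon-related chunks score highest, so only they end up in the prompt context; the unrelated sentence is left out.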

knowledgegpt supports various information sources including websites, PDFs, PowerPoint files (PPTX), and documents (Docs). Additionally, it can extract text from YouTube subtitles and audio (using speech-to-text technology) and use it as a source of information. This allows for a diverse range of information to be gathered and used for generating prompts and answers.
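To make the mapping from source type to extractor concrete, here is a small sketch that picks an extractor class name from a path or URL. The helper pick_extractor_name and its lookup table are hypothetical and not part of the library; the class names are the ones used in the usage examples in this README, and the function returns them as plain strings so the sketch runs without knowledgegpt installed.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extractor class names as they appear in this README's examples.
EXTRACTOR_BY_SUFFIX = {
    ".pdf": "PDFExtractor",
    ".pptx": "PowerpointExtractor",
    ".docx": "DocsExtractor",
}


def pick_extractor_name(source: str) -> str:
    """Guess which extractor fits a source (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if "youtube.com" in host or "youtu.be" in host:
            # Subtitles first; YoutubeAudioExtractor is the alternative
            # when no captions are available.
            return "YTSubsExtractor"
        return "WebScrapeExtractor"
    suffix = Path(source).suffix.lower()
    if suffix in EXTRACTOR_BY_SUFFIX:
        return EXTRACTOR_BY_SUFFIX[suffix]
    raise ValueError(f"No extractor known for source: {source!r}")
```

For example, pick_extractor_name("../example.docx") selects the DOCX extractor, while any non-YouTube web URL falls back to the web scraper.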

How to use

RESTful API

uvicorn server:app --reload

How to install the library

pip install knowledgegpt

or, to install from source:

git clone https://github.com/geeks-of-data/knowledge-gpt.git
cd knowledge-gpt
pip install .

Before running for the first time, download the required spaCy model:

python3 -m spacy download en_core_web_sm

How to use the library

# Import the library
from knowledgegpt.extractors.web_scrape_extractor import WebScrapeExtractor

# Import OpenAI and set the API key
import openai
from example_config import SECRET_KEY
openai.api_key = SECRET_KEY

# Define the target website
url = "https://en.wikipedia.org/wiki/Bombard_(weapon)"

# Initialize the WebScrapeExtractor
scrape_website = WebScrapeExtractor(url=url, embedding_extractor="hf", model_lang="en")

# Prompt the OpenAI model
# Note: db is assumed to be a MongoDB handle created beforehand from your Mongo URI; its setup is not shown here.
answer, prompt, messages = scrape_website.extract(query="What is a bombard?", max_tokens=300, to_save=True, mongo_client=db)

# See the answer
print(answer)

# Output: 'A bombard is a type of large cannon used during the 14th to 15th centuries.'

More examples can be found in the examples folder. The snippets below show the basic call pattern for each extractor:

# Imports of the extractor classes are omitted here.
# df, pdf_file_path, ppt_file_path, url and query are placeholders that must be defined by the caller.

# Basic usage
basic_extractor = BaseExtractor(df)
answer, prompt, messages = basic_extractor.extract("What is the title of this PDF?", max_tokens=300)

# PDF extraction
pdf_extractor = PDFExtractor(pdf_file_path, extraction_type="page", embedding_extractor="hf", model_lang="en")
answer, prompt, messages = pdf_extractor.extract(query, max_tokens=1500)

# PPTX extraction
ppt_extractor = PowerpointExtractor(file_path=ppt_file_path, embedding_extractor="hf", model_lang="en")
answer, prompt, messages = ppt_extractor.extract(query, max_tokens=500)

# DOCX extraction
docs_extractor = DocsExtractor(file_path="../example.docx", embedding_extractor="hf", model_lang="en", is_turbo=False)
answer, prompt, messages = docs_extractor.extract(query="What is an object detection system?", max_tokens=300)

# Extraction from a YouTube video (audio)
scrape_yt_audio = YoutubeAudioExtractor(video_id=url, model_lang="tr", embedding_extractor="hf")
answer, prompt, messages = scrape_yt_audio.extract(query=query, max_tokens=1200)

# Extraction from a YouTube video (transcript)
scrape_yt_subs = YTSubsExtractor(video_id=url, embedding_extractor="hf", model_lang="en")
answer, prompt, messages = scrape_yt_subs.extract(query=query, max_tokens=1200)

How to contribute

Open an issue
Fork the repo
Create a new branch
Make your changes
Create a pull request

FEATURES

Extract knowledge from the internet (e.g. Wikipedia)
Extract knowledge from local data sources - PDF
Extract knowledge from local data sources - DOCX
Extract knowledge from local data sources - PPTX
Extract knowledge from YouTube audio (when captions are not available)
Extract knowledge from YouTube transcripts
Library implementation (partially done, initial release)

TODO

(To be extended...)

System Architecture

(To be updated with a better image)

