feat: add structured output to openai #603

Open
wants to merge 2 commits into
base: main

bfdykstra
Contributor

Description

  • Adds a StructuredOutputChatOpenAI class that returns model output as JSON matching a supplied response_schema, so downstream applications can consume it directly.

Simple example usage

import asyncio
import json
import os

from pydantic import BaseModel

from kotaemon.llms import StructuredOutputChatOpenAI


class StructuredAnswer(BaseModel):
    answer: str


structured_llm = StructuredOutputChatOpenAI(
    base_url='https://api.openai.com/v1',
    model='gpt-4o-mini',
    temperature=1,
    api_key=os.environ.get('OPENAI_API_KEY'),
    response_schema=StructuredAnswer,
)


async def main():
    # ainvoke is awaited, so it has to run inside an async function
    answer = await structured_llm.ainvoke('Hello how are you?')
    # answer.content is a JSON string that follows StructuredAnswer
    print(json.loads(answer.content))
    # -> {'answer': "I'm just a computer program, but I'm here and ready to help you! How can I assist you today?"}


asyncio.run(main())
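
Because answer.content is a JSON string, downstream code may want to turn it into a typed object rather than a bare dict. The following is a minimal sketch using only the standard library; the parse_structured_answer helper and the dataclass stand-in for StructuredAnswer are hypothetical and not part of this PR, and the sample string replaces a real API response. With Pydantic v2 installed, StructuredAnswer.model_validate_json(answer.content) would likely be the more direct route.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class StructuredAnswer:
    # Stand-in mirroring the single-field schema from the example above.
    answer: str


def parse_structured_answer(content: str) -> StructuredAnswer:
    """Hypothetical helper: parse a JSON string into a StructuredAnswer."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Check that every field declared on the dataclass is present.
    expected = {f.name for f in fields(StructuredAnswer)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["answer"], str):
        raise ValueError("'answer' must be a string")
    return StructuredAnswer(answer=data["answer"])


# Assumed sample standing in for answer.content; no API call is made here.
sample_content = '{"answer": "I am ready to help."}'
parsed = parse_structured_answer(sample_content)
print(parsed.answer)
```

Validating at the boundary keeps schema mismatches from surfacing later as KeyError or type errors deep inside application code.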

Example usage in a retrieval pipeline

import json
import os

from pydantic import BaseModel

from kotaemon.storages.docstores import LanceDBDocumentStore
from kotaemon.storages.vectorstores import ChromaVectorStore
from kotaemon.embeddings.openai import OpenAIEmbeddings
from ktem.ktem.index.file.pipelines import DocumentRetrievalPipeline
from kotaemon.indices.qa.format_context import PrepareEvidencePipeline
from kotaemon.indices.qa.citation_qa import AnswerWithContextPipeline
from kotaemon.llms.chats.openai import StructuredOutputChatOpenAI, ChatOpenAI

from ktem.ktem.reasoning.simple import FullQAPipeline

from kotaemon.indices.rankings import LLMTrulensScoring

# document store
app_dir = "<path to your app data>/kotaemon/ktem_app_data/"
user_data_dir = app_dir + "user_data/"
doc_store_dir = user_data_dir + "docstore/"
doc_store = LanceDBDocumentStore(path=doc_store_dir, collection_name="index_1")

# vector store
vector_store_dir = user_data_dir + "vectorstore"
vector_store = ChromaVectorStore(path=vector_store_dir, collection_name="index_1")

# LLM used to score retrieved documents
llm = ChatOpenAI(
    base_url='https://api.openai.com/v1',
    model='gpt-4o-mini',
    temperature=0,
    api_key=os.environ.get('OPENAI_API_KEY'),
)
llm_scorer = LLMTrulensScoring(llm=llm)

# embeddings
embedding = OpenAIEmbeddings(
    base_url='https://api.openai.com/v1',
    model='text-embedding-ada-002',
    api_key=os.environ.get('OPENAI_API_KEY'),
    context_length=8191,
)


# document retrieval pipeline
document_retrieval = DocumentRetrievalPipeline(
    embedding=embedding,
    retrieval_mode='vector',  # can be vector or text
    vector_store=vector_store,
    doc_store=doc_store,
    top_k=5,
    rerankers=[],  # can provide rerankers, e.g. [cohere_reranking]
    llm_scorer=llm_scorer,
)

# pipeline that formats retrieved content
evidence_pipeline = PrepareEvidencePipeline()

# schema the model output must follow
class StructuredAnswer(BaseModel):
    answer: str

structured_llm = StructuredOutputChatOpenAI(
    base_url='https://api.openai.com/v1',
    model='gpt-4o-mini',
    temperature=1,
    api_key=os.environ.get('OPENAI_API_KEY'),
    response_schema=StructuredAnswer,
)

# answer questions with provided evidence
answer_pipeline = AnswerWithContextPipeline(
    llm=structured_llm,
    qa_template= (
            "Context: \n{context}\n\n"
            "{question}\n"
        )
)

qa_pipeline = FullQAPipeline(
    retrievers=[document_retrieval],
    evidence_pipeline=evidence_pipeline,
    answering_pipeline=answer_pipeline
)

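prompt = 'This is a prompt'

# Pass the ids of the documents to search in document_ids. This call assumes
# an invoke method has been implemented on the pipeline that returns the
# answer together with the scored documents.
answer, scored_docs = qa_pipeline.invoke(prompt, document_ids=[])

# answer.content is a JSON string that follows StructuredAnswer
parsed_answer = json.loads(answer.content)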
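
The final json.loads call raises json.JSONDecodeError if the content is not valid JSON. A caller that prefers a fallback over an exception could wrap the parse as sketched below; the safe_parse helper is hypothetical and not part of this PR, and the sample strings stand in for answer.content.

```python
import json
from typing import Any, Dict, Optional


def safe_parse(content: str) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return the parsed JSON object, or None on failure."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        # Content was not valid JSON at all.
        return None
    # Valid JSON that is not an object (list, number, ...) is also rejected.
    return data if isinstance(data, dict) else None


# Assumed samples standing in for answer.content.
good = safe_parse('{"answer": "42"}')
bad = safe_parse("not json")
print(good, bad)
```

Returning None lets the calling pipeline decide whether to retry, log, or surface the raw text.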

Type of change

  • New features (non-breaking change).
  • Bug fix (non-breaking change).
  • Breaking change (fix or feature that would cause existing functionality not to work as expected).

Checklist

  • I have performed a self-review of my code.
  • I have added thorough tests if it is a core feature.
  • There is a reference to the original bug report and related work.
  • I have commented on my code, particularly in hard-to-understand areas.
  • The feature is well documented.

@bfdykstra bfdykstra changed the title [Feature] add structured output to openai feat: add structured output to openai Jan 9, 2025