Concept

This is a series of experiments on how effectively an LLM agent can analyze and improve an academic argument. The experiments focus on historical arguments, which make a good testbed for non-technical arguments grounded in verifiable facts. In particular, I noticed that LLM answers to historical questions are promising but so broad that they cannot be assessed effectively without significant further information. Moreover, the claims often break down when an LLM without tools is asked for clarification. I aim to see how far methods such as LLM self-examination, structured data generation, and access to external data can go toward improving a model's effective reasoning capabilities on complex questions.

Tech Stack

The project is a Jupyter Notebook that uses the LangChain API with Google's low-end model gemini-1.5-flash.
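
As a rough illustration of the self-examination idea from the Concept section, the loop below drafts an answer, asks the model to critique it, and then asks for a revision. It is a minimal sketch and not the notebook's actual code: the function name `self_examine`, the prompt wording, and the `rounds` parameter are all assumptions made for illustration. The model is passed in as a plain callable so the sketch runs without any API key; with LangChain, that callable could plausibly wrap a chat model's `invoke` method.

```python
from typing import Callable, List


def self_examine(question: str, ask: Callable[[str], str], rounds: int = 2) -> List[str]:
    """Return the first draft followed by one revised answer per round.

    `ask` is any function that maps a prompt string to a reply string.
    (Hypothetical helper, not taken from the project.)
    """
    # First draft: push the model toward specific, checkable claims.
    answers = [ask(
        "Answer this historical question with specific, checkable claims:\n"
        + question
    )]
    for _ in range(rounds):
        # Step 1: the model critiques its own latest answer.
        critique = ask(
            "List the vague or unsupported claims in this answer:\n"
            + answers[-1]
        )
        # Step 2: the model rewrites the answer using that critique.
        answers.append(ask(
            f"Question: {question}\n"
            f"Previous answer: {answers[-1]}\n"
            f"Critique: {critique}\n"
            "Rewrite the answer to address the critique."
        ))
    return answers


def echo_model(prompt: str) -> str:
    """Stand-in model so the sketch runs offline; replace with a real LLM call."""
    return f"[reply to a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    for step, text in enumerate(self_examine("Why did the Western Roman Empire fall?", echo_model)):
        print(step, text)
```

Keeping every intermediate answer, rather than only the final one, makes it possible to compare drafts and check whether each round actually made the claims more specific.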
