collinskatie/structured_flexible_and_robust

This is the official codebase for our 2022 CogSci paper:

Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks

Katherine M. Collins*, Lionel Wong*, Jiahai Feng, Megan Wei, and Joshua B. Tenenbaum

*Contributed equally

Paper: https://arxiv.org/pdf/2205.05718.pdf

Project Page: https://sites.google.com/view/structured-flexible-and-robust/home

Spreadsheet of Collected Human and GPT-3 Generations: https://docs.google.com/spreadsheets/d/1K7Jspk4Yb-fDl2bucejxgg9mT-1-CPgAfPkv2nOZss8/edit?usp=sharing

Abstract

Human language offers a powerful window into our thoughts -- we tell stories, give explanations, and express our beliefs and goals through words. Abundant evidence also suggests that language plays a developmental role in structuring our learning. Here, we ask: how much of human-like thinking can be captured by learning statistical patterns in language alone? We first contribute a new challenge benchmark for comparing humans and distributional large language models (LLMs). Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization to new, out-of-distribution problems expressed in language. We find that humans are far more robust than LLMs on this benchmark. Next, we propose a hybrid Parse-and-Solve model, which augments distributional LLMs with a structured symbolic reasoning module. We find that this model shows more robust adaptation to out-of-distribution planning problems, demonstrating the promise of hybrid AI models for more human-like reasoning.

Repository Details

Code for "Part I: Linguistic reasoning benchmark for humans and language models" can be found in the Part_I directory.

Human and LLM generations, as well as all stimuli, can be found in the nested data directories for each domain (plans and explanations, respectively). Please keep in mind that our LLM generations were elicited from the GPT-3 variant (davinci-003) prior to any instruction fine-tuning.
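
For convenience, here is a minimal sketch of how these generations could be loaded for analysis. It assumes the data are stored as CSV files and uses pandas; the directory name "Part_I/plans/data" is an illustrative assumption, so please check the actual nested data directories in the repository for the real layout and file names.

    # Minimal sketch: load collected human / GPT-3 generations for one domain.
    # NOTE: "Part_I/plans/data" and the CSV format are assumptions for
    # illustration -- consult the repository's nested data directories for
    # the actual layout ("explanations" is the other domain).
    from pathlib import Path

    import pandas as pd

    DATA_DIR = Path("Part_I/plans/data")  # hypothetical path

    # Gather every CSV found under the domain's data directory.
    frames = [pd.read_csv(p) for p in sorted(DATA_DIR.glob("**/*.csv"))]

    if frames:
        generations = pd.concat(frames, ignore_index=True)
        print(generations.head())
    else:
        print(f"No CSV files found under {DATA_DIR}")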

Code for "Part II: Integrating language with structured reasoning models" can be found in the Part_II directory.

Citing

If you cite this work, please consider using the following BibTeX entry:

@misc{collinsWong2022,
  doi = {10.48550/ARXIV.2205.05718},
  url = {https://arxiv.org/abs/2205.05718},
  author = {Collins, Katherine M. and Wong, Catherine and Feng, Jiahai and Wei, Megan and Tenenbaum, Joshua B.},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), Symbolic Computation (cs.SC), FOS: Computer and information sciences},
  title = {Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks},
  publisher = {CogSci},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
