Dynamic Group Chat with a product manager, a coder, and a human admin.
To validate the necessity of multi-agent dynamic group chat and the effectiveness of the role-play speaker selection policy, we conduct a pilot study comparing a four-agent dynamic group chat system against two alternatives on 12 manually crafted complex tasks. We evaluate the three systems on success rate (higher is better), average number of LLM calls (lower is better), and number of termination failures (lower is better). The results show that the four-agent dynamic group chat system outperforms the other two on all three metrics.
- Install dependencies locally
- Set up the OpenAI model and key in `OAI_CONFIG_LIST` (see the sample below)
- Run `run_experiment.py`: `python /path/to/run_experiment.py`
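For reference, `OAI_CONFIG_LIST` is a JSON file listing model configurations. A minimal sample, assuming the standard AutoGen config format (replace the placeholder with your own key):

```json
[
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key>"
    },
    {
        "model": "gpt-3.5-turbo",
        "api_key": "<your OpenAI API key>"
    }
]
```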
The four-agent group chat system was composed of the following group members: a user proxy to take human inputs, an engineer to write code and fix bugs, a critic to review code and provide feedback, and a code executor to execute code.
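For illustration, a minimal sketch of such a four-agent group chat using the AutoGen API; the agent names, system messages, and `max_round` below are assumptions, not the exact experimental configuration:

```python
import autogen

# Load model configurations from the OAI_CONFIG_LIST file set up above.
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")
llm_config = {"config_list": config_list}

# User proxy: takes human inputs.
user_proxy = autogen.UserProxyAgent(
    name="Admin",
    system_message="A human admin who provides the task and gives feedback.",
    human_input_mode="ALWAYS",
    code_execution_config=False,
)
# Engineer: writes code and fixes bugs.
engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config=llm_config,
    system_message="You write code to solve the task and fix any bugs.",
)
# Critic: reviews code and provides feedback.
critic = autogen.AssistantAgent(
    name="Critic",
    llm_config=llm_config,
    system_message="You review the code and provide feedback.",
)
# Code executor: runs the engineer's code.
executor = autogen.UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",
    system_message="You execute the code written by the engineer.",
    code_execution_config={"work_dir": "coding"},
)

# The group chat manager drives the dynamic speaker selection.
groupchat = autogen.GroupChat(
    agents=[user_proxy, engineer, critic, executor],
    messages=[],
    max_round=20,
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
user_proxy.initiate_chat(manager, message="<task description>")
```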
The first alternative was a two-agent system composed of a user proxy and an engineer. The second alternative was a four-agent group chat system with the same four group members but a different, task-based speaker selection policy.
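To illustrate how the task-based alternative could differ, the sketch below subclasses `GroupChat` and overrides its `select_speaker_msg` hook, assuming a recent pyautogen version that exposes it; the prompt wording is an assumption, not the exact text used in the experiment:

```python
import autogen

class TaskBasedGroupChat(autogen.GroupChat):
    """GroupChat variant with a task-based (rather than role-play)
    speaker selection prompt. Illustrative only."""

    def select_speaker_msg(self, agents=None):
        # Frame speaker selection around task progress instead of
        # asking the LLM to pick the next "role" to play.
        if agents is None:
            agents = self.agents
        names = [agent.name for agent in agents]
        return (
            f"You are coordinating a task among these team members: {names}. "
            "Based on the current progress of the task, select the member "
            "who should act next."
        )
```

Passing a `TaskBasedGroupChat` to `GroupChatManager` in place of the default `GroupChat` would then be the only change between the two policies.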
Number of successes (higher is better) on the 12 tasks.

Model | Two Agent | Group Chat with role-play policy | Group Chat with task-based policy
---|---|---|---
GPT-3.5-turbo | 8 | 9 | 7
GPT-4 | 9 | 11 | 8
Average number of LLM calls and number of termination failures (both lower is better) on the 12 tasks; each cell shows (average LLM calls, termination failures).

Model | Two Agent | Group Chat with role-play policy | Group Chat with task-based policy
---|---|---|---
GPT-3.5-turbo | 9.9, 9 | 5.3, 0 | 4, 0
GPT-4 | 6.8, 3 | 4.5, 0 | 4, 0