Let's add a capstone project! #93

Open
burtenshaw opened this issue Dec 12, 2024 · 5 comments

@burtenshaw (Collaborator)

Some students have asked for a capstone project that allows them to try out their skills and get collective feedback on their work.

We have another discussion going on here about a capstone using a student leaderboard. Also, the evaluation module already has a project in it, which could be converted to a capstone.

Evaluation Capstone

This could use the existing material in module_4 and ask the students to implement an LLM evaluation task. They could then share their evaluation task and results. We could publish the results in a leaderboard on the Hugging Face hub.
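For illustration, the student-facing part could be as small as a script that runs a model over a task with a single correct answer per example and writes a results file. This is only a sketch; the model, dataset, and file names are placeholders rather than proposals:

```python
# Rough sketch only – model, dataset, and file names are placeholders.
import json

from datasets import load_dataset
from transformers import pipeline

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"  # any small model works here
generator = pipeline("text-generation", model=MODEL_ID)

# A tiny GSM8K slice keeps the example cheap to run.
eval_set = load_dataset("openai/gsm8k", "main", split="test[:20]")

correct = 0
for example in eval_set:
    prompt = f"Question: {example['question']}\nAnswer:"
    output = generator(prompt, max_new_tokens=64, do_sample=False)[0]["generated_text"]
    # GSM8K stores the final answer after '#### '; this is a crude containment check.
    gold = example["answer"].split("####")[-1].strip()
    correct += int(gold in output)

results = {"model": MODEL_ID, "task": "gsm8k-subset", "score": correct / len(eval_set)}

# Students would open a PR adding this file; the leaderboard is built from such files.
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
```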

RAG Capstone

@duydl proposed a capstone project on RAG here.

@burtenshaw (Collaborator, Author)

I have added a draft PR that proposes a setup for this issue: #97

@michaelshekasta

I think the leaderboard is great, but we should focus on specific tasks. In my opinion, we should define (or, better yet, find) datasets that have one correct answer.

Here are some examples:

  1. Math problems (I know this will be difficult, but it's easy to understand what I mean)
  2. Counting elements in an image (VLM)
  3. Specific fields such as law, finance, or medicine
  4. Code-related tasks, such as fixing Python code

What do you think @burtenshaw ?

@burtenshaw (Collaborator, Author)

Thanks @michaelshekasta. This sounds like a valid evaluation setup, and the tasks you suggest are a good starting point. These are my main concerns:

  • If we define a set of tasks for students, we might be repeating the work of the Open LLM Leaderboard, and the set would become difficult to maintain.
  • Any set of tasks may limit students with specific focuses. It would be cool if we could allow people to define their own tasks and add those to a set of core tasks.
  • There's a lack of library support for evaluating LLMs on vision tasks, which would make the implementation more involved.

With these concerns in mind, what do you think about this as a proposal to implement on top of #97?

  1. We take a set of core automated tasks that cover the tasks/domains you highlight (minus vision)
  2. We add these tasks to the capstone setup in [MODULE] Capstone project on evaluation #97
  3. We instruct students to do one (or both) of these things:
    • run the evaluation suite and open a PR with their model's results
    • create a custom evaluation task and add it to the suite (sketched below)
  4. We review and validate PRs for results submissions and add them to the leaderboard

IMO, this workflow would satisfy your comment about core, interpretable evaluation, while also supporting a growing, community-driven evaluation suite.
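For concreteness, a community-contributed task could boil down to a dataset id, a prompt template, and an exact-match metric. The class name, fields, and column conventions below are just illustrative and not tied to the setup in #97:

```python
# Hypothetical task format – names and fields are illustrative only.
from dataclasses import dataclass

import evaluate
from datasets import load_dataset
from transformers import pipeline


@dataclass
class CustomEvalTask:
    name: str
    dataset_id: str           # Hub dataset with one unambiguous answer per row
    split: str
    prompt_template: str      # must contain {question}
    question_column: str = "question"
    answer_column: str = "answer"


def run_task(task: CustomEvalTask, model_id: str) -> dict:
    """Run a model over the task and return a leaderboard-ready record."""
    generator = pipeline("text-generation", model=model_id)
    rows = load_dataset(task.dataset_id, split=task.split)
    exact_match = evaluate.load("exact_match")

    predictions, references = [], []
    for row in rows:
        prompt = task.prompt_template.format(question=row[task.question_column])
        out = generator(prompt, max_new_tokens=32, do_sample=False)[0]["generated_text"]
        # Keep only the first line the model generated after the prompt.
        predictions.append(out[len(prompt):].strip().split("\n")[0])
        references.append(str(row[task.answer_column]).strip())

    score = exact_match.compute(predictions=predictions, references=references)["exact_match"]
    return {"task": task.name, "model": model_id, "metric": "exact_match", "score": score}
```

Student-defined tasks would then just be instances of that dataclass, submitted alongside a short description in a PR.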

@michaelshekasta commented Dec 16, 2024

@burtenshaw Thank you for the quick reply! I will break my thoughts into two parts:

Small LLMs – I understand that we have openLB, but I'm unclear about the benefits of using a small LLM in this context. I would assume that larger models would likely perform better than smaller ones. My suggestion is to choose a specific task, such as math, and optimize the model for it.

VLMs – I believe we can also use the VLM leaderboard and select a specific task to optimize. Alternatively, we could use a dataset (such as one built with Stable Diffusion models) and apply the methods from this work to count the relevant elements.

@burtenshaw (Collaborator, Author)

> Small LLMs – I understand that we have openLB, but I'm unclear about the benefits of using a small LLM in this context. I would assume that larger models would likely perform better than smaller ones.

This is correct, and the Open LLM Leaderboard can still be used to compare smaller models. See this filter.

> My suggestion is to choose a specific task, such as math, and optimize the model for it.

I agree with this; I just want to extend it so that others can also choose specific tasks for the student leaderboard.

> VLMs – I believe we can also use the VLM leaderboard and select a specific task to optimize.

OK. The VLM leaderboard uses the VLMEvalKit library. I'm definitely open to other evaluations, which relates to my comment above, as long as it's something we can maintain.

To summarise, I think your proposal is compatible with the PR. We would just need to add a specific core task.
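On the maintainer side, turning validated result files into a Hub-hosted leaderboard could stay very small. As a sketch only (the results folder layout and the repo id are assumptions, not decisions):

```python
# Sketch of the maintainer side – results folder and repo id are placeholders.
import json
from pathlib import Path

from datasets import Dataset

# Assume each merged PR added one JSON file like the results.json sketched above.
records = [json.loads(path.read_text()) for path in Path("results").glob("*.json")]

leaderboard = Dataset.from_list(records).sort("score", reverse=True)

# Publishing as a Hub dataset lets a simple Space render it as the leaderboard.
leaderboard.push_to_hub("smol-course/student-leaderboard")
```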
