Cloud-edge collaborative speculative decoding for LLM based on KubeEdge-Ianvs #126

Open
hsj576 opened this issue Jul 23, 2024 · 15 comments · May be fixed by #156
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@hsj576
Member

hsj576 commented Jul 23, 2024

  • Description:
    • The autoregressive decoding mode of LLMs means that tokens can only be generated serially, which limits inference speed. Speculative decoding uses a lightweight draft model to propose candidate tokens that the target LLM then verifies in a single parallel forward pass, improving inference speed without loss of accuracy. However, existing speculative decoding work does not consider deployment in a cloud-edge distributed environment. This project aims to implement cloud-edge collaborative speculative decoding based on KubeEdge-Ianvs, an open-source cloud-edge collaborative distributed machine learning platform, to further improve LLM inference speed in cloud-edge environments (a minimal sketch of the basic decoding loop is included at the end of this description).
  • Expected outcome:
    • Implement an example of cloud-edge collaborative speculative decoding based on the KubeEdge-Ianvs platform.
    • (Optional) Propose a more efficient cloud-edge collaborative speculative decoding algorithm.
  • Recommended Skills:
    • Familiar with LLM-related technologies and experienced in deploying open-source LLMs locally.
    • Proficient in Python and PyTorch.
    • Experienced in deploying KubeEdge-Ianvs.
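
For readers new to the technique, below is a minimal, illustrative sketch of the greedy-verification variant of speculative decoding (the original papers use a sampling-based acceptance rule; the greedy form is simply easier to read). All names here are placeholders: `draft_model` and `target_model` stand in for a small edge model and a large cloud model, and the toy demo uses random lookup tables rather than real LLMs.

```python
# Illustrative sketch only: greedy-verification speculative decoding.
# `draft_model(seq)` is assumed to return next-token logits for the last position,
# and `target_model(seq)` to return next-token logits for every position.
import torch


def greedy_next(logits: torch.Tensor) -> int:
    """Pick the highest-probability next token."""
    return int(torch.argmax(logits, dim=-1))


def speculative_step(prefix, draft_model, target_model, k=4):
    """Propose k draft tokens, then verify them with one target-model pass."""
    # 1. Draft model proposes k tokens autoregressively (cheap; would run at the edge).
    ctx = list(prefix)
    draft_tokens = []
    for _ in range(k):
        t = greedy_next(draft_model(ctx))
        draft_tokens.append(t)
        ctx.append(t)

    # 2. Target model scores prefix + draft tokens in a single pass (would run in the cloud).
    all_logits = target_model(list(prefix) + draft_tokens)

    # 3. Accept the longest prefix of draft tokens the target would also have chosen,
    #    correcting the first mismatch with the target's own choice.
    accepted = []
    for i, t in enumerate(draft_tokens):
        target_choice = greedy_next(all_logits[len(prefix) - 1 + i])
        if target_choice == t:
            accepted.append(t)
        else:
            accepted.append(target_choice)
            break
    else:
        # Every draft token was accepted: take the target's bonus token as well.
        accepted.append(greedy_next(all_logits[-1]))
    return accepted


if __name__ == "__main__":
    # Toy stand-ins: a shared random lookup table instead of real LLMs,
    # so every draft token is accepted and the step emits k + 1 tokens.
    torch.manual_seed(0)
    vocab = 10
    table = torch.randn(vocab, vocab)
    draft_model = lambda seq: table[seq[-1]]
    target_model = lambda seq: table[torch.tensor(seq)]
    print(speculative_step([1, 2, 3], draft_model, target_model, k=4))
```

In a cloud-edge deployment, the main additional questions are where the network round trip sits in this loop and how many tokens to draft per trip.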
@kairveeehh

Hi @hsj576, I am Kairvee and would like to take up this project under LFX mentorship, as it aligns with my interests and skills. Could you assign it to me and let me know further details?

@hsj576
Member Author

hsj576 commented Aug 1, 2024

> Hi @hsj576, I am Kairvee and would like to take up this project under LFX mentorship, as it aligns with my interests and skills. Could you assign it to me and let me know further details?

The details of the project will be discussed in the weekly SIG AI meeting. Feel free to join us at https://zoom.us/j/4167237304 every Thursday at 16:30 UTC+8.

@kairveeehh

> > Hi @hsj576, I am Kairvee and would like to take up this project under LFX mentorship, as it aligns with my interests and skills. Could you assign it to me and let me know further details?
>
> The details of the project will be discussed in the weekly SIG AI meeting. Feel free to join us at https://zoom.us/j/4167237304 every Thursday at 16:30 UTC+8.

Okay, sure, thank you!

@Ytemiloluwa

Hello @hsj576, my name is Temi and I am an open-source developer. I have a genuine interest in working on this project this fall.

@Ytemiloluwa

> > Hi @hsj576, I am Kairvee and would like to take up this project under LFX mentorship, as it aligns with my interests and skills. Could you assign it to me and let me know further details?
>
> The details of the project will be discussed in the weekly SIG AI meeting. Feel free to join us at https://zoom.us/j/4167237304 every Thursday at 16:30 UTC+8.

It seems I missed the meeting; I wasn't familiar with the time zone.

@kairveeehh

Hi @hsj576,
I need your review and guidance on the way forward for contributing.

Proposed Approach

  1. Speculative Decoding Implementation:

    • I plan to set up a basic speculative decoding pipeline in which a draft model proposes tokens that the target LLM verifies in parallel, thus improving the inference speed of the LLM.
  2. Cloud-Edge Architecture:

    • For cloud-edge collaboration, I will deploy the draft model at the edge to handle initial predictions and the full model in the cloud for verification and refinement. This setup aims to optimize resource usage and reduce latency (a rough sketch of this split is included after this list).
  3. Testing and Optimization:

    • I will benchmark the system to evaluate performance improvements and ensure that the solution does not compromise accuracy. I will also explore possible optimizations to enhance the efficiency of the cloud-edge collaboration.
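
To make the intended edge/cloud split more concrete, here is a rough, non-authoritative sketch of what the edge-side loop could look like, assuming one HTTP round trip per draft batch. The `/verify` endpoint, `CLOUD_VERIFY_URL`, the JSON fields, `EOS_ID`, and the `draft_model` callable are all hypothetical placeholders, not existing Ianvs or project APIs.

```python
# Hypothetical edge-side client for the cloud-edge split sketched above.
# Nothing here is an existing Ianvs interface; the endpoint and payload
# shapes are placeholders for illustration only.
import requests

CLOUD_VERIFY_URL = "http://cloud-node:8080/verify"  # placeholder address
EOS_ID = 2  # placeholder end-of-sequence token id


def edge_generate(prompt_ids, draft_model, max_new_tokens=64, k=4):
    """Draft tokens locally on the edge; verify each batch with one cloud round trip."""
    tokens = list(prompt_ids)
    while len(tokens) - len(prompt_ids) < max_new_tokens:
        # 1. Edge: cheaply propose k candidate tokens with the local draft model.
        draft = draft_model(tokens, k)  # hypothetical: returns a list of k token ids

        # 2. Cloud: the large model verifies all k candidates in a single request,
        #    returning the tokens it accepted (plus a correction or bonus token).
        resp = requests.post(
            CLOUD_VERIFY_URL,
            json={"prefix": tokens, "draft": draft},
            timeout=10,
        )
        resp.raise_for_status()
        accepted = resp.json()["accepted"]

        tokens.extend(accepted)
        if accepted and accepted[-1] == EOS_ID:
            break
    return tokens
```

The benchmarking in step 3 could then report the average number of accepted tokens per round trip, since that largely determines how much cloud traffic and latency the collaboration saves.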

@lazyperson1020

@hsj576 Are there any pre-tests to submit? I am interested in contributing.

@aryan0931

@hsj576 I'm Aryan, and I would like to take this project under LFX mentorship as it aligns perfectly with my skills. Are there any pretests to submit?

@Sid260303

@hsj576, hello sir, I am Siddhant. I would like to work on this project under LFX mentorship, as working on LLMs has been my aim and this opportunity is a great way to kick-start my journey in open source. I also missed the weekly meeting and would like to know more about the project and how I can help.
Also, it would be very kind of you if you could share some resources so we can prepare better for the project.

@hsj576
Member Author

hsj576 commented Aug 7, 2024

I will release a pretest next week.

@Ytemiloluwa

> I will release a pretest next week.

Hello @hsj576, the applications for mentees close on 13th August (Tuesday).

@hsj576
Member Author

hsj576 commented Aug 7, 2024

> > I will release a pretest next week.
>
> Hello @hsj576, the applications for mentees close on 13th August (Tuesday).

OK, I will release the pretest as soon as possible (before 9th August).

@FuryMartin
Contributor

Hi, I hope to take on this project. I would like to highlight my strengths and the contributions I can make to the community.

I have a relatively deep understanding of edge computing and LLMs, and I am familiar with the main strategies for LLM cloud-edge collaboration, as well as the principles and implementation methods of speculative decoding.

Additionally, I have strong programming skills, particularly in Python and PyTorch. Last year, I interned at a large AI company specializing in LLMs.

What's more, I am conducting research on an LLM cloud-edge collaboration strategy using Ianvs. After two months of learning, I have gained a good understanding of Ianvs's architecture, interfaces, and features, and I am now attempting to introduce a new feature that is relevant to this issue. If I am accepted, integrating these insights could lead to a more coherent and rational architectural design.

I eagerly await your consideration of my taking on this task!

@sanggusti

Hi @hsj576

I would like to know more about your weekly meeting. Do you have a calendar link that I can add to my own calendar? I would also like to know more about this project.

Also, regarding collaborative speculative decoding, would you mind sharing the paper for this technique? I could only find a collaborative decoding paper on arXiv (https://arxiv.org/html/2406.12295v1) and could not find one on collaborative speculative decoding.

@hsj576
Member Author

hsj576 commented Aug 9, 2024

Details of the pretest are released in #130.

@MooreZheng added the kind/feature label on Aug 13, 2024.