hardware: Intel iGPU, dGPU and NPU support #470

Open
2 of 3 tasks
xiangyang-95 opened this issue Mar 26, 2024 · 4 comments
Assignees
Labels: category: hardware management (Related to hardware & compute), P1: important (Important feature / fix), type: feature request (A new feature)

Comments

@xiangyang-95

xiangyang-95 commented Mar 26, 2024

Overview

  • Intel's Lunar Lake, which combines a CPU, NPU and iGPU on a single chip, is releasing soon

Tasklist

Original Post

Problem
Unable to offload model inference to Intel integrated GPUs or discrete GPUs.

Success Criteria
Able to offload model inference to Intel integrated GPUs and discrete GPUs.

Additional context
I can contribute documentation on adding Intel GPU support to the nitro inference server.

@xiangyang-95 xiangyang-95 added the type: feature request A new feature label Mar 26, 2024
@rahulunair

rahulunair commented Mar 28, 2024

I am trying to do the same. This would be incredibly useful with the latest Intel Core Ultra chips as well, since they have a unified CPU + GPU architecture.

With this, anyone running jan or nitro directly on the latest Intel Core Ultra CPUs would get a boost from the GPU.

Options:

  1. Intel acceleration support for llama.cpp using ipex-llm as a backend: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html

  2. Direct compilation of llama.cpp with SYCL bindings on devices that support SYCL: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md
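
For option 2, a rough sketch of what a pre-build check could look like, in Python. This is not taken from nitro or from the llama.cpp repository. The helper names `sycl_cmake_args` and `missing_oneapi_tools` are hypothetical, and the CMake flag name is an assumption: llama.cpp's SYCL README has used `LLAMA_SYCL` in older trees and `GGML_SYCL` in newer ones, so check the README of the checkout you are building. The script assumes the oneAPI environment has already been loaded so that `icx`, `icpx` and `sycl-ls` are on `PATH`.

```python
import shutil


def sycl_cmake_args(build_dir="build", flag="GGML_SYCL"):
    """Return a CMake configure command for a SYCL-enabled llama.cpp build.

    Assumption: the flag name depends on the llama.cpp revision
    (older trees: LLAMA_SYCL, newer trees: GGML_SYCL).
    """
    return [
        "cmake", "-B", build_dir,
        f"-D{flag}=ON",
        "-DCMAKE_C_COMPILER=icx",     # Intel oneAPI C compiler
        "-DCMAKE_CXX_COMPILER=icpx",  # Intel oneAPI C++ compiler
    ]


def missing_oneapi_tools(tools=("icx", "icpx", "sycl-ls")):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_oneapi_tools()
    if missing:
        print("oneAPI tools not on PATH:", ", ".join(missing))
    else:
        # Print the configure command instead of running it, so the
        # sketch has no side effects.
        print(" ".join(sycl_cmake_args()))
```

Run from the root of a llama.cpp checkout, this prints either the missing tools or a configure command to copy. Pass `flag="LLAMA_SYCL"` for older revisions.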

@xiangyang-95
Author

xiangyang-95 commented Mar 28, 2024

@rahulunair I have integrated the nitro server with the Intel oneAPI SYCL optimizations in llama.cpp. Stay tuned for the pull request.

@freelerobot
Contributor

closing as dupe of #677

@dan-menlo dan-menlo changed the title feat: Add Intel integrated GPU and discrete GPU support hardware: Intel iGPU, dGPU and NPU support Sep 6, 2024
@dan-menlo
Contributor

dan-menlo commented Sep 6, 2024

I'm re-opening this, given our discussions with Intel. We should evaluate the following possibilities:

  • llama.cpp through SYCL (preferred)
  • engine: Intel OpenVino #677 (which Intel recommended on their call)
  • IPEX-LLM (less preferred than OpenVino)

@dan-menlo dan-menlo reopened this Sep 6, 2024
@dan-menlo dan-menlo moved this from Completed to Planning in Jan & Cortex Sep 6, 2024
@dan-menlo dan-menlo moved this from Planning to Scheduled in Jan & Cortex Sep 29, 2024
@freelerobot freelerobot added category: hardware management Related to hardware & compute and removed type: hardware support labels Oct 17, 2024
@freelerobot freelerobot added the P1: important Important feature / fix label Nov 27, 2024
@freelerobot freelerobot self-assigned this Nov 27, 2024
Projects
Status: Scheduled
Development

No branches or pull requests

5 participants