
sycl : Implemented reorder Q4_K mmvq #13109


Open
wants to merge 4 commits into master

Conversation

sgeor255 (Contributor):

This PR enables the reorder optimization for the Q4_K layout, similarly to #12858. This branch is based on @Alcpz's, so until that is merged the easiest way to review it is to look at the diff for 8cbe2c9.

Some performance numbers on Lunar Lake are below:

  • Q4_K reorder with GGML_SYCL_DISABLE_OPT=0
| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | pp512 | 1586.19 ± 69.35 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | tg128 | 41.23 ± 0.43 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | pp512 | 550.65 ± 1.35 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | tg128 | 17.67 ± 1.05 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | pp512 | 616.41 ± 12.21 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | tg128 | 28.57 ± 0.32 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 508.14 ± 1.50 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 13.75 ± 0.12 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | pp512 | 827.73 ± 26.59 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | tg128 | 21.45 ± 0.17 |

build: 52b1622 (5099)

  • Q4_K reorder with GGML_SYCL_DISABLE_OPT=1
| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | pp512 | 1576.79 ± 80.93 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | tg128 | 36.27 ± 0.43 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | pp512 | 551.82 ± 1.63 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | tg128 | 12.24 ± 1.19 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | pp512 | 586.64 ± 1.65 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | tg128 | 24.04 ± 0.41 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 509.51 ± 0.87 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 10.18 ± 0.04 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | pp512 | 825.29 ± 26.93 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | tg128 | 17.83 ± 0.05 |

build: 52b1622 (5099)

  • TODO
    • Performance on BMG and ARC

@Alcpz changed the title from "sycl : Implemented reorder Q4_0 mmvq" to "sycl : Implemented reorder Q4_K mmvq" on Apr 25, 2025
@github-actions bot added the ggml and SYCL labels on Apr 25, 2025
@@ -3636,22 +3664,65 @@ static void reorder_qw(char *data_device, const int ncols, const int nrows,
sycl::free(tmp_buf, *stream);
}

static void reorder_qw(ggml_tensor * src0, dpct::queue_ptr stream) {
static void reorder_qw_q4_k(char * data_device, size_t size, size_t offset, dpct::queue_ptr stream) {
Collaborator:

Question: Is there a specific reason data_device is declared as a char* instead of a uint8_t*, especially considering it's later cast to uint8_t* as qs_ptr anyway?

@NeoZhangJianyu (Collaborator) left a comment:

  1. Could you share the GPU type used for the test results above?
  2. Have you tested the PR with the local UT?
  3. Could you check the detailed output of a Q4_K LLM?
     I guess the output may be different from that of the legacy code.

// Dispatch becomes obscure with the reorder, MMVQ when the reorder optimization
// is enabled takes precedence over DMMV, the current if-else implementation
// requires disabling DMMV if both conditions are met
|| (reorder && ggml_sycl_supports_reorder_mmvq(src0->type))) {
Collaborator:

I have the same comment and concern:
this change will impact the code path below, which could lead to wrong results.

I suggest this PR only optimize the mmvq() function.
You could open another PR that optimizes the code path, for example by changing this condition.
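To make the precedence described in the hunk's comment concrete, here is a minimal, self-contained sketch; the flags and the pick_path helper are hypothetical stand-ins for the real llama.cpp dispatch conditions, not the actual code:

```cpp
// Illustrative only: models "MMVQ takes precedence over DMMV when the reorder
// optimization is enabled", so DMMV has to step aside when both would apply.
#include <cstdio>

enum class mul_mat_path { DMMV, MMVQ, OTHER };

static mul_mat_path pick_path(bool use_dmmv, bool use_mmvq, bool reorder, bool reorder_mmvq_ok) {
    const bool prefer_reorder_mmvq = reorder && reorder_mmvq_ok;
    if (use_dmmv && !prefer_reorder_mmvq) {   // DMMV only wins if the reordered MMVQ does not apply
        return mul_mat_path::DMMV;
    }
    if (use_mmvq || prefer_reorder_mmvq) {    // the extra "|| (reorder && supported)" clause
        return mul_mat_path::MMVQ;
    }
    return mul_mat_path::OTHER;
}

int main() {
    // Both heuristics hold and the tensor is reordered: MMVQ is chosen.
    std::printf("path = %d\n", static_cast<int>(pick_path(true, true, true, true)));
    return 0;
}
```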

@@ -2968,14 +2994,17 @@ static void ggml_sycl_mul_mat(ggml_backend_sycl_context & ctx, const ggml_tensor
// KQ + KQV multi-batch
ggml_sycl_mul_mat_batched_sycl(ctx, src0, src1, dst);
} else if (use_dequantize_mul_mat_vec) {
ggml_sycl_op_mul_mat(ctx, src0, src1, dst, ggml_sycl_op_dequantize_mul_mat_vec, false);
// save_tensor_txt("1/dst_1.txt", (float*) dst->data, src0->ne[1], sizeof(float), ctx.stream());
constexpr bool convert_src1_to_q8_1 = false;
Collaborator:

Could you follow the solution of PR #13003?
It fixed the base issue of the Q4_0 reorder.

@NeoZhangJianyu (Collaborator):

@sgeor255
Here is a discussion about Q4_K. #13120 (reply in thread)
Could you test that model with this PR?
If the result is good, could you reply with your test result?

We need to promote the SYCL backend in related cases. :)

const int is = 2 * il;
const int n = 4;

item_ct1.barrier(sycl::access::fence_space::local_space);
Contributor:

Let's move this barrier outside this function, to just before the callsite.
Within this function there is no context that explains why the barrier is required.
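A minimal sketch of the pattern being suggested, with hypothetical names rather than the PR's actual kernels: the helper only fills local memory, and the barrier is issued at the callsite, where its purpose is visible:

```cpp
#include <sycl/sycl.hpp>

// Cooperatively fills local memory; deliberately does not hide a barrier.
static void load_scales_to_local(const uint8_t * src, uint8_t * scales_local,
                                 const sycl::nd_item<1> & it, size_t n) {
    for (size_t i = it.get_local_id(0); i < n; i += it.get_local_range(0)) {
        scales_local[i] = src[i];
    }
}

int main() {
    sycl::queue q;
    constexpr size_t n_scales = 12, wg = 32;
    uint8_t * scales = sycl::malloc_shared<uint8_t>(n_scales, q);
    uint8_t * out    = sycl::malloc_shared<uint8_t>(n_scales, q);
    for (size_t i = 0; i < n_scales; ++i) scales[i] = static_cast<uint8_t>(i);

    q.submit([&](sycl::handler & cgh) {
        sycl::local_accessor<uint8_t, 1> scales_local(sycl::range<1>(n_scales), cgh);
        cgh.parallel_for(sycl::nd_range<1>(sycl::range<1>(wg), sycl::range<1>(wg)),
                         [=](sycl::nd_item<1> it) {
            load_scales_to_local(scales, &scales_local[0], it, n_scales);
            // Barrier at the callsite: all work-items must see the filled local
            // buffer before any of them reads it below.
            it.barrier(sycl::access::fence_space::local_space);
            const size_t lid = it.get_local_id(0);
            if (lid < n_scales) {
                out[lid] = scales_local[lid];
            }
        });
    }).wait();

    sycl::free(scales, q);
    sycl::free(out, q);
    return 0;
}
```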

Comment on lines +436 to +439
const ggml_half2 * dm_ptr = reinterpret_cast<const ggml_half2 *>(base + dm_offset);

const float dall = dm_ptr->x();
const float dmin = dm_ptr->y();
@AD2605 (Contributor) commented on Apr 28, 2025:

Suggested change:
-   const ggml_half2 * dm_ptr = reinterpret_cast<const ggml_half2 *>(base + dm_offset);
-   const float dall = dm_ptr->x();
-   const float dmin = dm_ptr->y();
+   auto dm_values = *reinterpret_cast<const ggml_half2 *>(base + dm_offset);
+   const float dall = dm_values.x();
+   const float dmin = dm_values.y();

You can fetch the data you need in one read and avoid making twice the number of trips to memory

Comment on lines +420 to +421
static void dequantize_block_q4_K_reorder(const void * __restrict__ vx, dst_t * __restrict__ yy, uint8_t * scales_local,
const sycl::nd_item<3> & item_ct1, int64_t nb) {
Contributor:

I think we can remove the __restrict__ keyword here. I believe it's a result of translation from the CUDA backend via dpct. __restrict__ often forces the nvcc compiler to use the L1 read-only cache, hence its usage there, but on our side I don't think it serves any purpose, so we can remove it.
Thoughts?

Contributor:

Also, just one more comment to cover removing all other usages of __restrict__ as well, instead of highlighting each one.

Contributor (Author):

It seems __restrict__ is respected by dpcpp and used to determine whether to apply optimizations, so I think I will keep it.

@@ -406,6 +416,35 @@ static void dequantize_block_q4_K(const void * __restrict__ vx, dst_t * __restri
#endif
}

template <typename dst_t>
static void dequantize_block_q4_K_reorder(const void * __restrict__ vx, dst_t * __restrict__ yy, uint8_t * scales_local,
const sycl::nd_item<3> & item_ct1, int64_t nb) {
Contributor:

Also, I think we can start using 1D kernel launches as we start adding kernels manually. 3-dimensional kernels are not a deliberate choice by the backend but rather the result of a direct translation of dim3 from the CUDA backend.
A 3D launch does not serve any purpose here.

Contributor:

Similarly for all other usages of 3D nd_item as well
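A minimal sketch of what the suggestion amounts to, using a hypothetical trivial kernel rather than the PR's dequantize kernel: the dpct-style 3D launch and the equivalent 1D launch map work-items identically, but the 1D form has simpler indexing:

```cpp
#include <sycl/sycl.hpp>

int main() {
    sycl::queue q{sycl::property::queue::in_order()};
    constexpr size_t nblocks = 1024, wg = 64;
    float * data = sycl::malloc_shared<float>(nblocks * wg, q);

    // dpct-translated style: CUDA dim3 becomes a 3D nd_range with two unit
    // dimensions, and the "real" index lives in dimension 2.
    q.parallel_for(sycl::nd_range<3>(sycl::range<3>(1, 1, nblocks * wg), sycl::range<3>(1, 1, wg)),
                   [=](sycl::nd_item<3> it) {
                       data[it.get_global_id(2)] = 1.0f;
                   });

    // Equivalent 1D launch: same total work-items and work-group size.
    q.parallel_for(sycl::nd_range<1>(sycl::range<1>(nblocks * wg), sycl::range<1>(wg)),
                   [=](sycl::nd_item<1> it) {
                       data[it.get_global_id(0)] += 1.0f;
                   });

    q.wait();
    sycl::free(data, q);
    return 0;
}
```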

const float dmin, uint8_t * __restrict__ scales_local,
const sycl::nd_item<3> & item_ct1, int il, int ir) {
const int is = 2 * il;
const int n = 4;
Contributor:

Suggested change:
-   const int n = 4;
+   constexpr int n = 4;

auto * scales_ptr = qs_ptr + QK_K / 2 * nblocks;
auto * dm_ptr = (sycl::half2 *) (scales_ptr + K_SCALE_SIZE * nblocks);

stream->parallel_for(nblocks, [=](auto i) [[sycl::reqd_sub_group_size(WARP_SIZE)]] {
Contributor:

Suggested change:
-   stream->parallel_for(nblocks, [=](auto i) [[sycl::reqd_sub_group_size(WARP_SIZE)]] {
+   stream->parallel_for(nblocks, [=](auto i) {

IGC is the best judge for selecting the subgroup size for kernels where we do not need a particular sub group size.
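Separately, the pointer arithmetic in the hunk above implies a struct-of-arrays layout for the reordered buffer: all packed quants first, then all scale bytes, then all (d, dmin) pairs. A minimal sketch of that offset computation, assuming the standard ggml constants QK_K = 256 and K_SCALE_SIZE = 12 (the struct and function names below are illustrative only):

```cpp
// Illustrative sketch only: computes the region offsets suggested by the
// scales_ptr / dm_ptr arithmetic in the diff, for a buffer of nblocks Q4_K blocks.
#include <cstddef>
#include <cstdio>

constexpr size_t QK_K         = 256;  // values per Q4_K block (standard ggml constant)
constexpr size_t K_SCALE_SIZE = 12;   // scale bytes per Q4_K block (standard ggml constant)

struct q4_k_reordered_offsets {
    size_t qs;      // start of packed 4-bit quants, QK_K / 2 bytes per block
    size_t scales;  // start of packed scales, K_SCALE_SIZE bytes per block
    size_t dm;      // start of packed (d, dmin) half2 pairs, one per block
};

static q4_k_reordered_offsets layout_offsets(size_t nblocks) {
    q4_k_reordered_offsets o;
    o.qs     = 0;
    o.scales = o.qs + QK_K / 2 * nblocks;
    o.dm     = o.scales + K_SCALE_SIZE * nblocks;
    return o;
}

int main() {
    const auto o = layout_offsets(4096);
    std::printf("qs @ %zu, scales @ %zu, dm @ %zu\n", o.qs, o.scales, o.dm);
    return 0;
}
```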

Comment on lines +3711 to +3712
default:
GGML_SYCL_DEBUG("reorder_qw() called with unsupported type");
Contributor:

assert or std::runtime_error? I believe either would be much better than silently doing nothing and producing incorrect output, IMO.
Do we expect this function to be called in cases where we do not support the reorder?
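A minimal sketch of the kind of hard failure being suggested, using illustrative types and names (the real code would presumably use ggml's own type enum and its assert/abort helpers):

```cpp
#include <stdexcept>
#include <string>

enum class reorder_type { Q4_0, Q4_K, UNSUPPORTED_EXAMPLE };

static void reorder_qw_sketch(reorder_type t) {
    switch (t) {
        case reorder_type::Q4_0:
        case reorder_type::Q4_K:
            // ... perform the reorder for supported types ...
            break;
        default:
            // Fail loudly: an unsupported call becomes a visible bug instead of a
            // silent no-op that later shows up as wrong model output.
            throw std::runtime_error("reorder_qw() called with unsupported type " +
                                     std::to_string(static_cast<int>(t)));
    }
}

int main() {
    reorder_qw_sketch(reorder_type::Q4_K);                   // supported: no-op here
    // reorder_qw_sketch(reorder_type::UNSUPPORTED_EXAMPLE); // would throw
    return 0;
}
```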

Labels
ggml (changes relating to the ggml tensor library for machine learning), SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)
5 participants