flashinfer-ai / flashinfer Public

Notifications You must be signed in to change notification settings
Fork 136
Star 1.4k

Code
Issues 36
Pull requests 6
Discussions
Actions
Projects
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Discussions
Actions
Projects
Security
Insights

Issues: flashinfer-ai/flashinfer

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

36 Open 93 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

[Question] Overflow risks when batch size and sequence length grows extremely large

#596 opened Nov 8, 2024 by rchardx

[Feature Request] Add an argument to control the number of CTAs used in attention APIs

#591 opened Nov 7, 2024 by yzh119

Have any plans to optimize the decode kernel for NV-Hopper

#576 opened Oct 31, 2024 by JamesLim-sy

Release binary for torch 2.5

#575 opened Oct 30, 2024 by merrymercy

FlashInfer generating NANs on A100 GPU

#574 opened Oct 30, 2024 by dbarbuzzi

[Feature request] Adding optional cpu_indptr/cpu_qo_indptr parameter to plan method to avoid synchronized device to host copy.

#565 opened Oct 28, 2024 by reyoung

ImportError: cannot import name '_grouped_size_compiled_for_decode_kernels' from 'flashinfer.decode'

#549 opened Oct 23, 2024 by Hutlustc

Runtime error with single_prefill_with_kv_cache while Compilation

#541 opened Oct 20, 2024 by YudiZh

[Question] Sampling kernel only support FP32 now?

#531 opened Oct 15, 2024 by yz-tang

Runtime error with head dim 224

#528 opened Oct 14, 2024 by yz-tang

Have any plans to optimize the prefill kernel for the Hopper architecture?

#521 opened Oct 10, 2024 by alexngng

Question about use_tensor_cores = True or False

#520 opened Oct 10, 2024 by sleepwalker2017

Is lean attention supported by flash infer?

#515 opened Oct 8, 2024 by sleepwalker2017

Compilation issue on old cuda versions

#514 opened Oct 3, 2024 by yzh119

[Feat] Suggest of using MappingUtils to compute coordiantes automatically for different warpSize

#512 opened Sep 27, 2024 by yiakwy-xpu-ml-framework-team

bug: partial unit tests failed

#479 opened Aug 28, 2024 by zhyncs

failed to dispatch head_dim 96

#455 opened Aug 20, 2024 by ZX-ModelCloud

apply_rope_inplace will cause graphbreak due to mutated inputs

#403 opened Jul 28, 2024 by jianc99

[FEAT REQ][CUDA GRAPH] Allow explicit control flag to force enable/disable split KV

#397 opened Jul 26, 2024 by AgrawalAmey

pytorch 2.4 support

#395 opened Jul 25, 2024 by yzh119

1 of 2 tasks

CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-nl8se4dx/flashinfer-0.0.4+cu118torch2.2/include/flashinfer/attention/decode.cuh: line 871 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size)

#249 opened May 16, 2024 by lucasjinreal

Circular import error when importing built-from-source flashinfer

#248 opened May 15, 2024 by vedantroy

multiple definition of `cuda::__3::pipeline...

#245 opened May 14, 2024 by jpf888

Support MLA (Multi-Head Latent Attention) in DeepSeek-v2

#237 opened May 7, 2024 by yzh119

[LoRA] Roadmap of LoRA operators

#199 opened Apr 8, 2024 by yzh119

3 tasks

Previous 1 2 Next

Previous Next

ProTip! Type g i on any issue or pull request to go back to the issue listing page.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly