
[DEV] Transform Codebase from Azure to GitHub #14

Merged · 316 commits · Apr 15, 2024
a33fcc6
update codeql
LeiWang1999 Feb 27, 2024
1dda8d5
fix uint32 zero issue
LeiWang1999 Feb 27, 2024
6358462
initial transparency.
LeiWang1999 Feb 28, 2024
29012ca
enhance transparency.
LeiWang1999 Feb 28, 2024
03fb8f7
Merge branch 'main' into azure/dev
LeiWang1999 Feb 28, 2024
b2373ac
rename transparency
LeiWang1999 Feb 28, 2024
8d92e90
dependabot fix
LeiWang1999 Feb 28, 2024
1430f15
Merge branch 'main' of https://github.com/microsoft/BitBLAS into azur…
LeiWang1999 Feb 28, 2024
376362e
update transparency.
LeiWang1999 Feb 28, 2024
9846883
update plugin
LeiWang1999 Feb 28, 2024
1f96898
Merge branch 'azure/dev' of https://github.com/microsoft/BitBLAS into…
LeiWang1999 Feb 28, 2024
d65b239
Merge branch 'main' into azure/dev
LeiWang1999 Feb 28, 2024
269103d
improve transparency
LeiWang1999 Feb 28, 2024
f6cc10b
Merge branch 'azure/dev' of https://github.com/microsoft/BitBLAS into…
LeiWang1999 Feb 28, 2024
1e93b94
remove redundant transparency
LeiWang1999 Feb 28, 2024
ebbd294
dsl benchmark scripts
LeiWang1999 Feb 28, 2024
45d5da7
del tran
LeiWang1999 Feb 28, 2024
3c54987
update submodule.
LeiWang1999 Feb 28, 2024
1349ac3
remove redundant code.
LeiWang1999 Feb 28, 2024
9022591
Merge branch 'azure/dev' of https://dev.azure.com/msrasrg/LLMInferenc…
LeiWang1999 Feb 28, 2024
71e098c
remove transparency
LeiWang1999 Feb 28, 2024
2371058
fix propagate map issue
LeiWang1999 Feb 29, 2024
8c4da23
implement in register dequantize config
LeiWang1999 Feb 29, 2024
59b9622
optimize target
LeiWang1999 Feb 29, 2024
03b6124
fix tag.
LeiWang1999 Feb 29, 2024
151386d
fix some issues on ampere game device
LeiWang1999 Feb 29, 2024
3435fc9
finetune with data distribution.
LeiWang1999 Feb 29, 2024
7de361e
fill matmul benchmarking scripts
LeiWang1999 Feb 29, 2024
7f5ffd3
refactor use_async_copy to bool value
LeiWang1999 Feb 29, 2024
f3d35da
support af format
LeiWang1999 Mar 1, 2024
e1dd650
format fix
LeiWang1999 Mar 1, 2024
421f135
support propagate input transform for dequantization.
LeiWang1999 Mar 1, 2024
1a9b856
update requirements
LeiWang1999 Mar 1, 2024
f00f413
update requirements.txt
LeiWang1999 Mar 1, 2024
b510675
update af4 related tests.
LeiWang1999 Mar 1, 2024
8269fe3
clean test
LeiWang1999 Mar 1, 2024
96aa7d8
Merge branch 'azure/dev' of https://github.com/LeiWang1999/BitBLAS in…
LeiWang1999 Mar 1, 2024
89115d4
naive support for dynamic zeros
LeiWang1999 Mar 1, 2024
9ec85ba
move to bitdistiller
LeiWang1999 Mar 1, 2024
d6762ff
implement lop3 with zeros cpp test
LeiWang1999 Mar 1, 2024
d2c921a
implement fast decoding with zeros
LeiWang1999 Mar 1, 2024
8bce199
update zero generation support.
LeiWang1999 Mar 1, 2024
6aa00bd
Merge branch 'azure/dev' of https://github.com/LeiWang1999/BitBLAS in…
LeiWang1999 Mar 2, 2024
4e63482
Bump transformers from 4.29.2 to 4.36.0
dependabot[bot] Mar 2, 2024
d1f14b3
Merge pull request #1 from LeiWang1999/dependabot/pip/transformers-4.…
LeiWang1999 Mar 2, 2024
a3e2c6d
Bump pillow from 9.4.0 to 10.2.0
dependabot[bot] Mar 2, 2024
9a10d7f
Bump tornado from 6.2 to 6.3.3
dependabot[bot] Mar 2, 2024
4938541
Bump scipy from 1.5.3 to 1.11.1
dependabot[bot] Mar 2, 2024
2254148
Bump jinja2 from 3.1.2 to 3.1.3
dependabot[bot] Mar 2, 2024
179ef57
Merge pull request #5 from LeiWang1999/dependabot/pip/jinja2-3.1.3
LeiWang1999 Mar 2, 2024
827411c
Merge pull request #4 from LeiWang1999/dependabot/pip/scipy-1.11.1
LeiWang1999 Mar 2, 2024
e1c4263
Merge pull request #3 from LeiWang1999/dependabot/pip/tornado-6.3.3
LeiWang1999 Mar 2, 2024
2a743b6
Merge pull request #2 from LeiWang1999/dependabot/pip/pillow-10.2.0
LeiWang1999 Mar 2, 2024
f423b44
Bump pygments from 2.2.0 to 2.15.0
dependabot[bot] Mar 2, 2024
5d2d1e7
Merge pull request #6 from LeiWang1999/dependabot/pip/pygments-2.15.0
LeiWang1999 Mar 2, 2024
0b5e1db
Bump pygments from 2.13.0 to 2.15.0
dependabot[bot] Mar 2, 2024
3444ffa
Merge pull request #7 from LeiWang1999/dependabot/pip/pygments-2.15.0
LeiWang1999 Mar 2, 2024
d67b47c
Merge branch 'azure/dev' of https://github.com/LeiWang1999/BitBLAS in…
LeiWang1999 Mar 2, 2024
e8afa01
update requirements and matmul.
LeiWang1999 Mar 2, 2024
079420e
support fast decode for int8 related items
LeiWang1999 Mar 2, 2024
4ab5860
improve pass context
LeiWang1999 Mar 2, 2024
d5f63c3
update benchmark related figures.
LeiWang1999 Mar 2, 2024
d246d3f
update benchmark readme
LeiWang1999 Mar 2, 2024
9b1af43
reorganize readme
LeiWang1999 Mar 2, 2024
b44533e
refactor readme
LeiWang1999 Mar 2, 2024
883e776
update benchmark readme
LeiWang1999 Mar 2, 2024
69c79ea
refactor quant linear for bisect
LeiWang1999 Mar 2, 2024
22afeda
update tvm submodule
LeiWang1999 Mar 2, 2024
e610f73
fix blockIdx related
LeiWang1999 Mar 2, 2024
c2e1b2d
update bitdistiller related.
LeiWang1999 Mar 2, 2024
4e33959
update zero type related test
LeiWang1999 Mar 3, 2024
85a841d
implement zero types support
LeiWang1999 Mar 3, 2024
bc9a284
implement zero types support
LeiWang1999 Mar 3, 2024
957774b
fix lop3 permutate issue.
LeiWang1999 Mar 3, 2024
6b11448
fix weight executor bug.
LeiWang1999 Mar 3, 2024
302ab86
improve typing
LeiWang1999 Mar 3, 2024
dcafdc0
resolve performance related items
LeiWang1999 Mar 3, 2024
b9d07b1
add implementation for dequantization with dynamic symbolic
LeiWang1999 Mar 3, 2024
2f17e1a
fix ladder transform related issues.
LeiWang1999 Mar 4, 2024
14e27a4
improve ladder permutation for dequantization
LeiWang1999 Mar 4, 2024
75cd98e
enhance dynamic symbolic for matmul_impl
LeiWang1999 Mar 4, 2024
50d1f90
improve support for dynamic symbolic
LeiWang1999 Mar 4, 2024
981407f
update tvm dependency
LeiWang1999 Mar 4, 2024
72e7d6d
implement operator cache.
LeiWang1999 Mar 4, 2024
6c4516e
refactor print to logging
LeiWang1999 Mar 4, 2024
170eca6
append setup.py and remove tvm pythonpath dependency.
LeiWang1999 Mar 4, 2024
60bb4f8
update ignore
LeiWang1999 Mar 8, 2024
56bc140
improve installation scripts
LeiWang1999 Mar 8, 2024
a97c7bf
update scaling benchmark of 1bit
LeiWang1999 Mar 8, 2024
924b1ab
int8xint1 lop3 support.
LeiWang1999 Mar 8, 2024
97b3905
replace with to_torch_func
LeiWang1999 Mar 10, 2024
27b8ec2
license related fix
LeiWang1999 Mar 11, 2024
dfa4650
update contributing.md
LeiWang1999 Mar 11, 2024
7b61964
autogptq support.
LeiWang1999 Mar 11, 2024
6e7bdaf
Merge branch 'main' of https://github.com/microsoft/BitBLAS into azur…
LeiWang1999 Mar 11, 2024
068b24e
refactor docs
LeiWang1999 Mar 11, 2024
88a5e4a
refactor docs
LeiWang1999 Mar 11, 2024
7b8bfb5
refactor
LeiWang1999 Mar 11, 2024
2e500f9
refactor docs
LeiWang1999 Mar 11, 2024
a7e54e7
typo fix
LeiWang1999 Mar 11, 2024
7a5b0f2
implement disk cache
LeiWang1999 Mar 12, 2024
6f0260e
refactor codegen to get_source
LeiWang1999 Mar 13, 2024
c071921
support get weight shape.
LeiWang1999 Mar 13, 2024
fda8412
Update dependabot.yml
LeiWang1999 Mar 14, 2024
48d9dbd
Update dependabot.yml
LeiWang1999 Mar 14, 2024
d97e0b6
Update dependabot.yml
LeiWang1999 Mar 14, 2024
45f8666
Update dependabot.yml
LeiWang1999 Mar 14, 2024
2e4e764
Update dependabot.yml
LeiWang1999 Mar 14, 2024
150b93d
Update requirements.txt
LeiWang1999 Mar 14, 2024
652a9a4
Update requirements.txt
LeiWang1999 Mar 14, 2024
061027d
Update requirements.txt
LeiWang1999 Mar 14, 2024
56e1614
refactor propagate into transform kind
LeiWang1999 Mar 14, 2024
193245f
Update dependabot.yml
LeiWang1999 Mar 14, 2024
1c7552c
implement scale and zero layout propagation
LeiWang1999 Mar 15, 2024
d52cb34
typo fix
LeiWang1999 Mar 15, 2024
65bfc02
refactor codes
LeiWang1999 Mar 15, 2024
d6c1da6
fix performance issue of dequantize propagate
LeiWang1999 Mar 15, 2024
68b0081
refactor print
LeiWang1999 Mar 17, 2024
e3e2aff
fix gemv scale bugs
LeiWang1999 Mar 17, 2024
76a530e
refactor ops configs
LeiWang1999 Mar 17, 2024
f02fa9a
improve tensor_adapter
LeiWang1999 Mar 17, 2024
874e577
implement trick wrapper for integration
LeiWang1999 Mar 17, 2024
58d735c
code refactor
LeiWang1999 Mar 18, 2024
d9a20b8
Merge branch 'azure/dev' of https://github.com/LeiWang1999/BitBLAS in…
LeiWang1999 Mar 18, 2024
4a558f6
SUPPORT.md commit
LeiWang1999 Mar 19, 2024
8ecd315
spell check
LeiWang1999 Mar 19, 2024
b5df24f
improve for linting
LeiWang1999 Mar 19, 2024
c2090b5
overall lint improvements
LeiWang1999 Mar 19, 2024
3490fd7
Add copyright and license information
LeiWang1999 Mar 19, 2024
6e2baa5
improve contributing
LeiWang1999 Mar 19, 2024
a4ffcef
Fix PYTHONPATH export in installation script and update BitBLAS package
LeiWang1999 Mar 20, 2024
25696d8
Update benchmark section in README.md
LeiWang1999 Mar 20, 2024
9570be6
Update performance benchmarks and integration details
LeiWang1999 Mar 20, 2024
f18b632
Fix typo in README.md
LeiWang1999 Mar 20, 2024
d6dd73a
Refactor index map logging in matmul_analysis.py
LeiWang1999 Mar 21, 2024
ea007e1
Add .ruff_cache to .gitignore
LeiWang1999 Mar 21, 2024
9396aa8
Merge branch 'azure/dev' of https://github.com/LeiWang1999/BitBLAS in…
LeiWang1999 Mar 21, 2024
9dc4510
Add _tir_u32_to_f4_to_f16 function to quantization module
LeiWang1999 Mar 21, 2024
22fa9aa
Update performance benchmark images
LeiWang1999 Mar 21, 2024
f1b57e9
Update benchmark configurations
LeiWang1999 Mar 21, 2024
9c4d2a9
Update benchmark information in README.md
LeiWang1999 Mar 21, 2024
f7faa46
Refactor code for improved performance and readability
LeiWang1999 Mar 21, 2024
743e51d
convolution impl support
LeiWang1999 Mar 21, 2024
685d414
Refactor convolution2d_impl.py and test_auto_normalized_tensorcore.py
LeiWang1999 Mar 22, 2024
da31d2e
Fix code formatting and remove unnecessary code
LeiWang1999 Mar 23, 2024
714788b
Update TensorCore GEMM Performance Comparison
LeiWang1999 Mar 23, 2024
18d8cfd
Update TensorCore GEMM performance comparison on A100 and RTX4090
LeiWang1999 Mar 24, 2024
c00796e
Refactor propagate_inputs method in TensorCorePolicy
LeiWang1999 Mar 25, 2024
bc28ac7
Fix BitBLAS import and remove debug print statements
LeiWang1999 Mar 26, 2024
92fe39c
Add end-to-end integration with Quantize Inference Kernel for AutoGPT…
LeiWang1999 Mar 26, 2024
af09e69
Fix import order and handle exception in benchmark scripts
LeiWang1999 Mar 26, 2024
f3c1b47
Update TVM subproject commit
LeiWang1999 Mar 26, 2024
468424f
Update TileDevice class names in bitblas package
LeiWang1999 Mar 26, 2024
27c00bd
Update imports in roller module
LeiWang1999 Mar 26, 2024
efa1e44
Update images
LeiWang1999 Mar 26, 2024
1736543
Update images
LeiWang1999 Mar 26, 2024
7c6ac71
Update end2end_llama_13b_vllm.png
LeiWang1999 Mar 26, 2024
8ac5893
Update trademark and acknowledgement section
LeiWang1999 Mar 26, 2024
901f8fb
Update benchmark images for consistent GEMM operations
LeiWang1999 Mar 26, 2024
be67f26
Add test case for decoding UInt4 to Float16 with scaling and zeros qu…
LeiWang1999 Mar 30, 2024
5dbcdef
Remove benchmarking code for int4 on a specific target
LeiWang1999 Apr 1, 2024
56a8128
Update image files and add new functions for quantization and rasteri…
LeiWang1999 Apr 1, 2024
7e5c6ed
fix rescale and original lop3.
LeiWang1999 Apr 2, 2024
f5df4cf
Add integration example of FasterTransformers with BitBLAS
LeiWang1999 Apr 2, 2024
243fb59
Update integration example of FasterTransformer with BitBLAS
LeiWang1999 Apr 2, 2024
9d4ddc7
Update requirements-dev.txt and requirements.txt
LeiWang1999 Apr 3, 2024
dff007c
Add LLVM download and extraction functionality
LeiWang1999 Apr 3, 2024
734ce67
Update FasterTransformer.gif
LeiWang1999 Apr 5, 2024
1516d27
Update BitBLAS version and requirements
LeiWang1999 Apr 5, 2024
6ad4562
Update BitBLAS import paths and add support for installing and develo…
LeiWang1999 Apr 6, 2024
1f23cd1
Add GPU intrinsics module for BitBLAS
LeiWang1999 Apr 6, 2024
081d899
Update requirements-dev.txt and requirements.txt
LeiWang1999 Apr 6, 2024
fe83369
Refactor import paths in BitBLAS GPU modules
LeiWang1999 Apr 6, 2024
f397a04
Update installation guide in Installation.md
LeiWang1999 Apr 6, 2024
b59eb60
Refactor MatmulConfig class in matmul.py for improved readability and…
LeiWang1999 Apr 6, 2024
75d764e
Refactor MatmulConfig class in matmul.py for improved readability and…
LeiWang1999 Apr 6, 2024
752966d
Refactor MatmulConfig class in matmul.py for improved readability and…
LeiWang1999 Apr 6, 2024
fcf4c83
Update installation guide and QuickStart link in README.md
LeiWang1999 Apr 6, 2024
d1f002c
Update installation guide and QuickStart link in README.md
LeiWang1999 Apr 6, 2024
2a20950
Append Default Schedule Fallback
LeiWang1999 Apr 7, 2024
6b5b8f9
Refactor requirements-dev.txt and fix newline issue in arch_base.py
LeiWang1999 Apr 7, 2024
8e473fa
Fix typo in check_mit_license.sh
LeiWang1999 Apr 7, 2024
e0b4ce2
improve the target detection.
LeiWang1999 Apr 7, 2024
6550c99
Improve target detection and fix typos in code
LeiWang1999 Apr 7, 2024
efee0e3
Fix auto-inline spacing issue in MatmulTensorizationMMAWithDequantize…
LeiWang1999 Apr 7, 2024
52e54b2
Improve target detection and fix typos in code
LeiWang1999 Apr 7, 2024
6cdd59d
transform to submit
LeiWang1999 Apr 8, 2024
ca78084
Add support for weight_dtype transformation in MatmulWeightOnlyDequan…
LeiWang1999 Apr 8, 2024
8b639d7
Update zeros_type to zeros_mode in code blocks
LeiWang1999 Apr 8, 2024
0cf5a85
update README
xysmlx Apr 8, 2024
a2112c7
update README
xysmlx Apr 8, 2024
a35ee17
Fix import errors and update paths in code
LeiWang1999 Apr 8, 2024
e92bbaa
Merge branch 'azure/dev' of https://dev.azure.com/msrasrg/LLMInferenc…
LeiWang1999 Apr 8, 2024
525ef72
Update variable names in test_bitblas_linear.py and __init__.py
LeiWang1999 Apr 8, 2024
8f14696
Update imports and add new function in quantization and cache modules
LeiWang1999 Apr 8, 2024
18e7b71
Update README with support matrix table
LeiWang1999 Apr 9, 2024
c7bba6e
Update support matrix table and benchmark configurations
LeiWang1999 Apr 9, 2024
50e96e5
Update support matrix table and benchmark configurations
LeiWang1999 Apr 9, 2024
c370284
Update support matrix table and benchmark configurations
LeiWang1999 Apr 9, 2024
6b0a473
Update support matrix table and benchmark configurations
LeiWang1999 Apr 9, 2024
e269f36
Update support matrix table and benchmark configurations
LeiWang1999 Apr 9, 2024
e45bdac
Update import statements and add new functions in quantization and ca…
LeiWang1999 Apr 9, 2024
496a4bd
Fix default dynamic range for M in MatmulConfig
LeiWang1999 Apr 9, 2024
e7c9707
Update support matrix table with new tested platforms and Out_dtype c…
LeiWang1999 Apr 9, 2024
d82e186
Refactor code for mixed-precision matrix multiplication and update su…
LeiWang1999 Apr 9, 2024
b114f42
Refactor code for mixed-precision matrix multiplication and update su…
LeiWang1999 Apr 9, 2024
1f687f6
Update MatmulConfig initialization in QuickStart.md
LeiWang1999 Apr 9, 2024
6c7f36b
Update support matrix table with new tested platforms and INT32/FP16/…
LeiWang1999 Apr 9, 2024
0d056bc
Refactor code for mixed-precision matrix multiplication and update su…
LeiWang1999 Apr 9, 2024
1661dfb
Update link to code implementation in QuickStart.md
LeiWang1999 Apr 9, 2024
be5eb5d
Disable tuning for initial bitblas operator creation
LeiWang1999 Apr 9, 2024
4a719e1
Update linear transformation description in PythonAPI.md
LeiWang1999 Apr 9, 2024
67df65f
Update MatmulConfig in PythonAPI.md
LeiWang1999 Apr 9, 2024
292c5bb
convert af format to nf
LeiWang1999 Apr 9, 2024
8acf34c
Enable hardware-aware tuning for bitblas operators
LeiWang1999 Apr 9, 2024
cf893e9
Refactor code for mixed-precision matrix multiplication and update su…
LeiWang1999 Apr 9, 2024
13c3b8e
Update support matrix table with new tested platforms and INT32/FP16/…
LeiWang1999 Apr 9, 2024
add2972
Update OperatorConfig.md with matrix multiplication configuration det…
LeiWang1999 Apr 9, 2024
77a0caa
code refactor
LeiWang1999 Apr 9, 2024
f095194
Fix capitalization in QuickStart.md
LeiWang1999 Apr 9, 2024
bf6695d
update ReadME
LeiWang1999 Apr 9, 2024
945deb1
Refactor setup.py to remove unnecessary code and improve readability
LeiWang1999 Apr 9, 2024
a6659df
refactor infeatures to infeatures
LeiWang1999 Apr 10, 2024
6a6a92e
update README.md
xysmlx Apr 10, 2024
6b8a34f
Merge remote-tracking branch 'origin/azure/dev' into azure/dev
xysmlx Apr 10, 2024
bf4dc38
Fix incorrect data type mapping in general_matmul.py
LeiWang1999 Apr 10, 2024
55e723c
update doc
xysmlx Apr 10, 2024
bf49925
Refactor variable names in bitblas_linear.py and bitblas_quant_linear.py
LeiWang1999 Apr 10, 2024
612fccc
Merge branch 'azure/dev' of vs-ssh.visualstudio.com:v3/msrasrg/LLMInf…
xysmlx Apr 10, 2024
e3555a9
uncomments some case
LeiWang1999 Apr 10, 2024
84399ca
Add BITBLAS_DATABASE_PATH constant to OperatorCache and update load_g…
LeiWang1999 Apr 10, 2024
1d01c05
Refactor variable names in bitblas_linear.py and bitblas_quant_linear.py
LeiWang1999 Apr 10, 2024
c65290e
Refactor variable names in bitblas_linear.py and bitblas_quant_linear.py
LeiWang1999 Apr 10, 2024
240a4a3
Update dependencies in requirements-dev.txt and requirements.txt
LeiWang1999 Apr 10, 2024
a434cef
Refactor variable names in bitblas_linear.py and bitblas_quant_linear.py
LeiWang1999 Apr 10, 2024
284122c
Fix BITBLAS_DATABASE_PATH constant assignment in OperatorCache
LeiWang1999 Apr 10, 2024
a4de3ed
Refactor variable names in bitblas_linear.py and bitblas_quant_linear.py
LeiWang1999 Apr 10, 2024
626a881
Refactor variable names in bitblas_linear.py and bitblas_quant_linear.py
LeiWang1999 Apr 10, 2024
881badc
Merge branch 'azure/dev' of https://dev.azure.com/msrasrg/LLMInferenc…
LeiWang1999 Apr 10, 2024
a2f5ddf
update install
LeiWang1999 Apr 10, 2024
25c2939
Refactor variable names in setup.py and build_tvm function
LeiWang1999 Apr 11, 2024
36bd0e6
append linear benchmark scripts
LeiWang1999 Apr 11, 2024
ec4720e
simple bug fix
LeiWang1999 Apr 11, 2024
8f55d4f
Update BitBLAS installation instructions for Ubuntu 20.04
LeiWang1999 Apr 11, 2024
851e38a
Refactor variable names and add output print statements for debugging
LeiWang1999 Apr 11, 2024
218d2e6
Refactor variable names and update dependencies
LeiWang1999 Apr 11, 2024
cc6fbde
Update BitBLAS installation instructions for Ubuntu 20.04 and add not…
LeiWang1999 Apr 11, 2024
14d5cf0
Refactor logging handler and set log level in BitBLAS module
LeiWang1999 Apr 15, 2024
970f535
Bump version to 0.0.1
LeiWang1999 Apr 15, 2024
2dc1604
Merge branch 'main' into azure/dev
LeiWang1999 Apr 15, 2024
8 changes: 7 additions & 1 deletion .gitignore
@@ -69,4 +69,10 @@ models/frozenmodels/
.pytest_cache

# .hypothesis
.hypothesis
.hypothesis

# .ruff_cache
.ruff_cache

# .bitblas_database
.bitblas_database
3 changes: 3 additions & 0 deletions 3rdparty/.gitignore
@@ -0,0 +1,3 @@
clang*

llvm*
2 changes: 1 addition & 1 deletion 3rdparty/tvm
Submodule tvm updated 35 files
+34 −0 include/tvm/runtime/c_runtime_api.h
+6 −0 include/tvm/runtime/ndarray.h
+3 −0 include/tvm/tir/schedule/schedule.h
+47 −0 include/tvm/tir/stmt.h
+4 −0 include/tvm/tir/stmt_functor.h
+2 −0 include/tvm/tir/transform.h
+22 −3 python/tvm/tir/schedule/schedule.py
+9 −0 python/tvm/tir/transform/transform.py
+1 −0 src/driver/driver_api.cc
+1 −0 src/relay/printer/text_printer.h
+6 −0 src/relay/printer/tir_text_printer.cc
+7 −0 src/relay/printer/tvmscript_printer.cc
+50 −0 src/runtime/ndarray.cc
+7 −0 src/script/printer/legacy_repr.cc
+1 −0 src/script/printer/tir/stmt.cc
+32 −4 src/target/source/codegen_c.cc
+1 −0 src/target/source/codegen_c.h
+3 −3 src/target/source/codegen_cuda.cc
+5 −0 src/tir/analysis/device_constraint_utils.cc
+12 −0 src/tir/ir/stmt.cc
+8 −0 src/tir/ir/stmt_functor.cc
+4 −0 src/tir/ir/tir_visitor_with_path.cc
+1 −0 src/tir/ir/tir_visitor_with_path.h
+6 −0 src/tir/schedule/concrete_schedule.cc
+1 −0 src/tir/schedule/concrete_schedule.h
+11 −0 src/tir/schedule/primitive.h
+165 −0 src/tir/schedule/primitive/rewrite_buffer_access.cc
+6 −0 src/tir/schedule/schedule.cc
+11 −1 src/tir/schedule/traced_schedule.cc
+1 −0 src/tir/schedule/traced_schedule.h
+123 −0 src/tir/transforms/inject_customized_code.cc
+6 −2 src/tir/transforms/inject_permuted_layout.cc
+6 −0 src/tir/transforms/inject_software_pipeline.cc
+13 −4 src/tir/transforms/lower_device_kernel_launch.cc
+189 −0 tests/python/tir-transform/test_tir_transform_inject_customized_code.py
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -27,6 +27,8 @@ Please ask questions in issues.

All pull requests are very welcome and greatly appreciated! Issues in need of a solution are marked with a [`♥ help`](https://github.com/ianstormtaylor/BitBLAS/issues?q=is%3Aissue+is%3Aopen+label%3A%22%E2%99%A5+help%22) label if you're looking for somewhere to start.

Please run `./format.sh` before submitting a pull request to make sure that your code is formatted correctly.

Please include tests and docs with every pull request!

## Repository Setup
4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,4 @@
recursive-include 3rdparty/tvm *
recursive-exclude 3rdparty/tvm/build *
recursive-exclude 3rdparty/clang* *
recursive-exclude 3rdparty/llvm* *
93 changes: 66 additions & 27 deletions README.md
@@ -1,38 +1,83 @@
# BitBLAS

BitBLAS is a lightweight framework designed to generate high-performance CUDA/HIP code for BLAS operators, featuring swizzling and layout propagation. It achieves performance comparable to vendor libraries across various platforms and hardware. BitBLAS aims to assist algorithm developers working on projects like BitNet, GPTQ, and similar endeavors by enabling the rapid implementation of accelerated kernels and their efficient deployment.
BitBLAS is a library to support mixed-precision BLAS operations on GPUs, for example, the $W_{wdtype}A_{adtype}$ mixed-precision matrix multiplication where $C_{cdtype}[M, N] = A_{adtype}[M, K] \times W_{wdtype}[N, K]$.
BitBLAS aims to support efficient mixed-precision DNN model deployment, especially the $W_{wdtype}A_{adtype}$ quantization in large language models (LLMs), for example, the $W_{INT4}A_{FP16}$ in [GPTQ](https://arxiv.org/abs/2210.17323), the $W_{INT2}A_{FP16}$ in [BitDistiller](https://arxiv.org/abs/2402.10631), the $W_{INT1}A_{INT8}$ and $W_{INT2}A_{INT8}$ in [BitNet](https://arxiv.org/abs/2310.11453) and [BitNet-b1.58](https://arxiv.org/abs/2402.17764). BitBLAS is based on techniques from our accepted submission at OSDI'24.


Some of the key features of BitBLAS include:
- Auto Tensorize compute with TensorCore-like hardware instructions.
- High Performance (Not only FP16xFP16, INT8xINT8, but also FP16xINT4/2/1, INT8xINT4/2/1).
- With the flexible DSL (TIR Script) to effortlessly craft domain-specific kernels for your situations.
- Support for dynamic symbolic shapes through TVM Unity -> generate source code with dynamic shapes.
- BitBLAS first proposed int8xint1 gemv/gemm with 10x/2x speedup over float16xfloat16 on A100; see [op_benchmark_a100_int1_scaling](images/figures/op_benchmark_a100_int1_scaling.png) for detailed input scaling benchmark results.
- High performance matrix multiplication for both GEMV (e.g., the single batch auto-regressive decode phase in LLM) and GEMM (e.g., the batched auto-regressive decode phase and the prefill phase in LLM):
- $W_{wdtype}A_{adtype}$ mixed-precision matrix multiplication including FP16xINT4/2/1, INT8xINT4/2/1, etc. Please check out the [support matrix](#support-matrix) for detailed data type support.
- Matrix multiplication like FP16xFP16 and INT8xINT8.
- Auto-Tensorization for TensorCore-like hardware instructions.
- Implemented [integration](./integration/) with [PyTorch](https://pytorch.org/), [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) and [vLLM](https://github.com/vllm-project/vllm) for LLM deployment. Please check out the [benchmark summary](#benchmark-summary) for detailed end2end LLM inference performance.
- BitBLAS first implemented $W_{INT1}A_{INT8}$ GEMV/GEMM with 10x/2x speedup over $W_{FP16}A_{FP16}$ on A100; please check out [op_benchmark_a100_int1_scaling](images/figures/op_benchmark_a100_int1_scaling.png) for detailed benchmark results.
- Support customizing mixed-precision DNN operations for your specific scenarios via the flexible DSL (TIR Script).
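The $W_{INT4}A_{FP16}$ scheme in the feature list above can be sketched in plain NumPy — a minimal reference of the semantics only (unpacking two 4-bit weights per byte, dequantizing with per-channel scales and zero points, then an FP16 matmul). The function names here are illustrative assumptions, not part of the BitBLAS API:

```python
import numpy as np

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Unpack two unsigned 4-bit values from each uint8 byte (low nibble first)."""
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    return np.stack([low, high], axis=-1).reshape(*packed.shape[:-1], -1)

def dequant_matmul(A: np.ndarray, W_packed: np.ndarray, scales: np.ndarray,
                   zeros: np.ndarray) -> np.ndarray:
    """C[M, N] = A[M, K] @ dequant(W)[N, K].T, mirroring the layout above."""
    W_q = unpack_int4(W_packed).astype(np.float16)          # [N, K], values in [0, 15]
    W = (W_q - zeros[:, None]) * scales[:, None]            # per-output-channel dequant
    return (A.astype(np.float16) @ W.T).astype(np.float16)

# Tiny example: N=2 output channels, K=4 reduction dim, M=1 activation row.
rng = np.random.default_rng(0)
W_packed = rng.integers(0, 256, size=(2, 2), dtype=np.uint8)  # 2 bytes -> K=4 int4 values
scales = np.ones(2, dtype=np.float16)
zeros = np.full(2, 8.0, dtype=np.float16)                     # symmetric around 8
A = np.ones((1, 4), dtype=np.float16)
C = dequant_matmul(A, W_packed, scales, zeros)
print(C.shape)  # (1, 2)
```

A real BitBLAS kernel does not materialize the dequantized weight like this; decoding (e.g., the lop3-based fast decoding in the commits above) is fused into the matmul itself.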

## Integration Example of FasterTransformer with BitBLAS
![FasterTransformer Integration](images/gif/FasterTransformer.gif)


## Benchmark Summary

BitBLAS achieves exceptional performance across a variety of computational patterns. Below are selected results showcasing its capabilities:

- End2End Integration with Quantize Inference Kernel for AutoGPTQ and vLLM.

<div>
<img src="./images/figures/end2end_llama_13b_auto_gptq.png" alt="AutoGPTQ end2end performance of llama13b on A100" style="width: 24%;" />
<img src="./images/figures/end2end_llama_70b_auto_gptq.png" alt="AutoGPTQ end2end performance of llama13b on A100" style="width: 24%;" />
<img src="./images/figures/end2end_llama_13b_vllm.png" alt="vLLM end2end performance of llama13b on A100" style="width: 24%;" />
<img src="./images/figures/end2end_llama_70B_vllm.png" alt="vLLM end2end performance of llama13b on A100" style="width: 24%;" />
</div>

- Weight Only Matmul performance on A100

<div>
<img src="./images/figures/op_benchmark_a100_wq_gemv_e7.png" alt="gemm weight only performance on A100" style="width: 49%;" />
<img src="./images/figures/op_benchmark_a100_wq_gemm_e7.png" alt="gemm weight only performance on A100" style="width: 49%;" />
</div>

## Benchmark
BitBLAS can achieve optimal performance across various compute patterns:

- GTX 3090
- FLOAT16xFLOAT16 with TensorCore ![3090-gemm-fp16](./images/figures/op_benchmark_3090_fp16_gemm.png)
- INT8xINT8 with TensorCore ![3090-gemm-s8](./images/figures/op_benchmark_3090_s8_gemm.png)
- FLOAT16xAF4(LUT4) GEMV ![3090-af4-gemv](./images/figures/op_benchmark_3090_af4_gemv.png)
- FLOAT16xAF4(LUT4) with TensorCore ![3090-af4-gemm](./images/figures/op_benchmark_3090_af4_gemm.png)
- TensorCore FP16/INT8 GEMM Performance Vs. Vendor Library on A100 and RTX4090

- A100
- WeightOnly GEMV ![a100-wq-gemv](./images/figures/op_benchmark_a100_wq_gemv.png)
- WeightOnly GEMM with TensorCore ![a100-wq-gemm](./images/figures/op_benchmark_a100_wq_gemm.png)
<div>
<img src="./images/figures/op_benchmark_consistent_gemm_fp16.png" alt="gemm fp16 performance on 4090 and a100" style="width: 49%;" />
<img src="./images/figures/op_benchmark_consistent_gemm_int8.png" alt="gemm int8 performance on 4090 and a100" style="width: 49%;" />
</div>

See more details in our [benchmark](./benchmark) directory.
For more detailed information on benchmark sets with other formats (NF4/FP4) and other devices (GTX 3090), please refer to the [benchmark](./benchmark/README.md).

## Support Matrix

| **A_dtype** | **W_dtype** | **Accum_dtype** | **Out_dtype** | **BitBLAS<br>Support** | **Tested<br>Platform** |
|:-----------:|:-----------:|:---------------:|:---------------:|:----------------------:|:----------------------:|
| FP16 | FP16 | FP16 | FP16 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | FP4_E2M1 | FP16 | FP16 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | INT8 | FP16 | FP16 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | INT4 | FP16 | FP16 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | INT2 | FP16 | FP16 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | INT1 | FP16 | FP16 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | NF4 | FP16 | FP16 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| INT8 | INT8 | INT32 | FP32/INT32/FP16/INT8 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| INT8 | INT4 | INT32 | FP32/INT32/FP16/INT8 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| INT8 | INT2 | INT32 | FP32/INT32/FP16/INT8 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| INT8 | INT1 | INT32 | FP32/INT32/FP16/INT8 | **√** | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |

We are continuously expanding the support matrix. If you have any specific requirements, please feel free to open an issue or PR.

## Getting Started

- Installation:
To manually install BitBLAS, please check out `maint/scripts/installation.sh`. Also make sure you already have the CUDA toolkit (version >= 11) installed on the system. Alternatively, you can install from source with `python setup.py install` or `pip install .` in the root directory.
- [Installation](./docs/Installation.md):
To install BitBLAS, please check out the document [installation](./docs/Installation.md). Also make sure you already have the CUDA toolkit (version >= 11) installed on the system. Alternatively, you can easily install via `pip install bitblas`.

- [QuickStart](./docs/QuickStart.md): BitBLAS provides two Python APIs to perform mixed-precision matrix multiplication:
- ```bitblas.Matmul``` implements the $W_{wdtype}A_{adtype}$ mixed-precision matrix multiplication of $C_{cdtype}[M, N] = A_{adtype}[M, K] \times W_{wdtype}[N, K]$.
- ```bitblas.Linear``` is a PyTorch ```nn.Linear```-like module to support a Linear of mixed-precision.

- [QuickStart](./docs/QuickStart.md): We provide two primary ways to do code generation: using a high-level DSL (TensorIR Script) or using packed Operators. The quick start guide shows how to use BitBLAS to generate high-performance kernels with both methods.
- [Integration](./integration/): Explore how BitBLAS seamlessly integrates with LLM deployment frameworks through our examples. Discover the ease of integrating BitBLAS with PyTorch, AutoGPTQ, and vLLM in the 3rd-party integration examples.

- [Customization](./docs/ExtendOperatorsWithDSL.md): BitBLAS supports implementing customized mixed-precision DNN operations rather than matrix multiplication with the flexible DSL (TIR Script).

- [3rd Party Integration](./integration/): BitBLAS can also be easily integrated to other frameworks, the integration provides some examples of integrating BitBLAS with PyTorch, AutoGPTQ and vLLM.

## Contributing

@@ -46,9 +91,3 @@ This project has adopted the Microsoft Open Source Code of Conduct. For more inf…

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

## Acknowledgement

We learned a lot from the following projects.

- [Apache TVM](https://github.com/apache/tvm): BitBLAS has adopted TensorIR as our DSL. Additionally, we have customized TVM from the unity branch to incorporate specific features that were required for our project.
- [Microsoft Roller](https://github.com/microsoft/nnfusion/tree/roller): The design and algorithm inspiration for hardware-aware tuning in BitBLAS comes from Roller.
36 changes: 20 additions & 16 deletions SUPPORT.md
@@ -1,25 +1,29 @@
# TODO: The maintainer of this repo has not yet edited this file
# Support

**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?
Welcome to the BitBLAS support page! BitBLAS is a cutting-edge framework designed for generating high-performance CUDA/HIP code for BLAS operators. Whether you're working on projects like BitNet, GPTQ, or similar, BitBLAS is here to accelerate your development with its robust features.

- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/onboardsupport](https://aka.ms/onboardsupport). CSS will work with/help you to determine next steps.
- **Not sure?** Fill out an intake as though the answer were "Yes". CSS will help you decide.
## How to File Issues and Get Help

*Then remove this first heading from this SUPPORT.MD file before publishing your repo.*
### Reporting Bugs or Requesting Features

# Support
If you encounter a bug or have a feature request, we encourage you to file an issue through our GitHub Issues page. Please follow these steps:

1. **Search Existing Issues**: Before creating a new issue, please search the existing ones to avoid duplicates.
2. **Create a New Issue**: If your issue is new, go ahead and file it as a new issue. Provide as much detail as possible to help us understand and address it efficiently.

### Seeking Help and Questions

For questions and help with using BitBLAS, we offer the following channels:

- **GitHub Discussions**: For community support, sharing ideas, and discussing best practices, please visit our [GitHub Discussions](https://github.com/YOUR_REPO/discussions).
- **Stack Overflow**: Use the tag `BitBLAS` when posting questions. This is monitored by our team and the community.

## How to file issues and get help
## Microsoft Support Policy

This project uses GitHub Issues to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.
Support for BitBLAS is primarily provided through the above-mentioned community channels. We strive to address issues and questions in a timely manner, leveraging the collective knowledge and experience of the BitBLAS community.

For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.
## Contributing to BitBLAS

## Microsoft Support Policy
We warmly welcome contributions to the BitBLAS project. Whether it's improving the documentation, adding new features, or fixing bugs, your contributions are invaluable to us. Please refer to our [CONTRIBUTING.md](./CONTRIBUTING.md) file for more details on how to contribute.

Support for this **PROJECT or PRODUCT** is limited to the resources listed above.
Before submitting a pull request, you may need to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. The CLA process is straightforward and only needs to be completed once.