Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make native code portable and add GitHub workflow for building #949

Merged
merged 61 commits into from
Feb 5, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
b263fc6
Make native code portable and add GitHub workflow for building
rickardp Jan 2, 2024
de10f7e
Removed deprecated Python versions
rickardp Jan 2, 2024
0c62fa6
Merge branch 'main' into portable
rickardp Jan 13, 2024
aae5ff7
Update python-package.yml
rickardp Jan 25, 2024
b06590d
Update python-package.yml
rickardp Jan 25, 2024
03744cb
Update python-package.yml
rickardp Jan 25, 2024
648e2f5
Update python-package.yml
rickardp Jan 25, 2024
cba2b1a
Update python-package.yml
rickardp Jan 25, 2024
6f70a5e
Update python-package.yml
rickardp Jan 25, 2024
90fa8b1
Update python-package.yml
rickardp Jan 25, 2024
c815ca0
Update python-package.yml
rickardp Jan 25, 2024
36b1ef2
Do not test on Python 3.13 until released
rickardp Jan 25, 2024
44e3f17
Update python-package.yml
rickardp Jan 25, 2024
6fe8d0c
Update python-package.yml
rickardp Jan 26, 2024
572225e
Update python-package.yml
rickardp Jan 26, 2024
7a8676e
Update python-package.yml
rickardp Jan 26, 2024
771f6db
Merge branch 'main' into portable
rickardp Jan 28, 2024
8dd8d63
Refactor build stage
rickardp Jan 28, 2024
8b1ceb7
Fixed breaking actions change
rickardp Jan 28, 2024
e11867b
Slim down Windows cuda
rickardp Jan 28, 2024
57625db
Create dependabot.yml
rickardp Jan 28, 2024
ada2e9a
Bespoke local dev requirements.txt
rickardp Jan 28, 2024
e0093e9
Enable VS integration
rickardp Jan 28, 2024
23bdf05
Group Dependabot updates
rickardp Jan 28, 2024
87414c3
Cleanup
rickardp Jan 28, 2024
0ee8f7f
Update python-package.yml
rickardp Jan 28, 2024
1f35064
Merge branch 'main' into portable
rickardp Jan 31, 2024
816eee0
Reinstate file that was wrongly merged
rickardp Jan 31, 2024
0528324
Fixed regression caused by new version of download-artifact
rickardp Jan 31, 2024
8152e21
Update python-package.yml
rickardp Feb 1, 2024
9aad25a
Update python-package.yml
rickardp Feb 1, 2024
bcc6780
Fix matrix
rickardp Feb 1, 2024
2951e2c
Update python-package.yml
rickardp Feb 2, 2024
9867392
Merge
rickardp Feb 2, 2024
b773dfb
Pipeline
rickardp Feb 2, 2024
45ad394
Fixed conflict
rickardp Feb 2, 2024
e2e4874
Fixed conflict
rickardp Feb 2, 2024
59a1000
Update CMakeLists.txt
rickardp Feb 2, 2024
41ddd25
Fixed merge error
rickardp Feb 3, 2024
b5b6151
cleanup
rickardp Feb 3, 2024
ca5f14a
cleanup
rickardp Feb 3, 2024
cd8343e
Merge pull request #12 from rickardp/portable-merge
rickardp Feb 3, 2024
2456cf3
Merge remote-tracking branch 'origin/main' into portable
rickardp Feb 3, 2024
b460125
Find CUDA
rickardp Feb 3, 2024
7a605e1
Fix
rickardp Feb 3, 2024
28188a5
Fixing merge error from latest merge from main
rickardp Feb 3, 2024
86b2bd6
Fix setup.py
rickardp Feb 3, 2024
01c3f59
Fixed typo in artifact name
rickardp Feb 3, 2024
e4344b0
Remove linker flags
rickardp Feb 3, 2024
e2e8f87
Merge pull request #13 from rickardp/rickardp-patch-1
rickardp Feb 3, 2024
2ba8be3
Build nocublaslt versions
rickardp Feb 3, 2024
7ed6a01
Merge pull request #14 from rickardp/nocublaslt
rickardp Feb 3, 2024
eba19a8
Merge branch 'main' into portable
rickardp Feb 4, 2024
3288a0f
Fixed formatting
rickardp Feb 4, 2024
fdddb11
Fixed VS Code format on save
rickardp Feb 4, 2024
b7503c9
Ran format on save from VScode
rickardp Feb 4, 2024
fb642a5
Re-saved the json files using the new settings
rickardp Feb 4, 2024
2730dd9
Re-saved CMakeLists.txt to get formatting right
rickardp Feb 4, 2024
2e3a1d8
Add path filter
rickardp Feb 4, 2024
590a27a
Merge branch 'main' into portable
rickardp Feb 5, 2024
927f716
Formatting
rickardp Feb 5, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
201 changes: 201 additions & 0 deletions .github/workflows/python-package.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
name: Python package

on:
push:
branches: [ "*" ]
rickardp marked this conversation as resolved.
Show resolved Hide resolved
pull_request:
branches: [ master ]
rickardp marked this conversation as resolved.
Show resolved Hide resolved
release:
types: [ published ]

jobs:

##
# This job matrix builds the non-CUDA versions of the libraries for all supported platforms.
##
build-shared-libs:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
arch: [x86_64, aarch64]
exclude:
- os: windows-latest # This probably requres arm64 Windows agents
arch: aarch64
runs-on: ${{ matrix.os }} # One day, we could run them on native agents. Azure supports this now but it's planned only for Q3 2023 for hosted agents
steps:
# Check out code
- uses: actions/checkout@v3
rickardp marked this conversation as resolved.
Show resolved Hide resolved
# On Linux we use CMake within Docker
- name: Setup cmake
uses: jwlawson/[email protected]
with:
cmake-version: '3.26.x'
- name: Add msbuild to PATH
uses: microsoft/[email protected]
if: ${{ startsWith(matrix.os, 'windows') }}
# Compile C++ code
- name: Build C++
shell: bash
run: |
set -ex
build_os=${{ matrix.os }}
build_arch=${{ matrix.arch }}
( git clone https://github.com/NVlabs/cub ./dependencies/cub; cd dependencies/cub; git checkout 1.11.0 )
rickardp marked this conversation as resolved.
Show resolved Hide resolved
if [ ${build_os:0:6} == ubuntu -a ${build_arch} == aarch64 ]; then
# Allow cross-compile om aarch64
sudo apt-get install -y gcc-aarch64-linux-gnu binutils-aarch64-linux-gnu
fi
if [ ${build_os:0:5} == macos -a ${build_arch} == aarch64 ]; then
cmake -DCMAKE_OSX_ARCHITECTURES=arm64 -DENABLE_CUDA=OFF .
else
cmake -DENABLE_CUDA=OFF .
fi
if [ ${build_os:0:7} == windows ]; then
pwsh -Command "msbuild bitsandbytes.vcxproj /property:Configuration=Release"
else
make
fi
mkdir -p output/${{ matrix.os }}/${{ matrix.arch }}
( shopt -s nullglob && cp bitsandbytes/*.{so,dylib,dll} output/${{ matrix.os }}/${{ matrix.arch }}/ )
- name: Upload build artifact
uses: actions/upload-artifact@v3
with:
name: shared_library
path: output/*
retention-days: 7
##
# This job matrix builds the CUDA versions of the libraries for platforms that support CUDA (Linux x64/aarch64 + Windows x64)
##
build-shared-libs-cuda:
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
arch: [x86_64, aarch64]
cuda_version: ['12.1.0']
exclude:
- os: windows-latest # This probably requres arm64 Windows agents
arch: aarch64
runs-on: ${{ matrix.os }} # One day, we could run them on native agents. Azure supports this now but it's planned only for Q3 2023 for hosted agents
steps:
# Check out code
- uses: actions/checkout@v3
rickardp marked this conversation as resolved.
Show resolved Hide resolved
# Linux: We use Docker to build cross platform Cuda (aarch64 is built in emulation)
- name: Set up Docker multiarch
if: startsWith(matrix.os, 'ubuntu')
uses: docker/setup-qemu-action@v2
# On Linux we use CMake within Docker
- name: Setup cmake
if: ${{ !startsWith(matrix.os, 'linux') }}
uses: jwlawson/[email protected]
with:
cmake-version: '3.26.x'
# Windows: We install Cuda on the agent (slow)
rickardp marked this conversation as resolved.
Show resolved Hide resolved
- uses: Jimver/[email protected]
if: startsWith(matrix.os, 'windows')
id: cuda-toolkit
with:
cuda: ${{ matrix.cuda_version }}
method: 'local'
#sub-packages: '["nvcc","cudart","nvrtc_dev","cublas_dev","cusparse_dev","visual_studio_integration"]'
- name: Add msbuild to PATH
uses: microsoft/[email protected]
if: ${{ startsWith(matrix.os, 'windows') }}
# Compile C++ code
- name: Build C++
shell: bash
run: |
set -ex
build_os=${{ matrix.os }}
build_arch=${{ matrix.arch }}
( git clone https://github.com/NVlabs/cub ./dependencies/cub; cd dependencies/cub; git checkout 1.11.0 )
rickardp marked this conversation as resolved.
Show resolved Hide resolved
if [ ${build_os:0:6} == ubuntu ]; then
image=nvidia/cuda:${{ matrix.cuda_version }}-devel-ubuntu22.04
echo "Using image $image"
docker run --platform linux/$build_arch -i -w /src -v $PWD:/src $image sh -c \
"apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends cmake \
&& cmake -DENABLE_CUDA=ON . \
&& make"
else
cmake -DENABLE_CUDA=ON .
pwsh -Command "msbuild bitsandbytes.vcxproj /property:Configuration=Release"
fi
mkdir -p output/${{ matrix.os }}/${{ matrix.arch }}
( shopt -s nullglob && cp bitsandbytes/*.{so,dylib,dll} output/${{ matrix.os }}/${{ matrix.arch }}/ )
- name: Upload build artifact
uses: actions/upload-artifact@v3
with:
name: shared_library
path: output/*
retention-days: 7
build-wheels:
needs:
- build-shared-libs
- build-shared-libs-cuda
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
rickardp marked this conversation as resolved.
Show resolved Hide resolved
arch: [x86_64, aarch64]
exclude:
- os: windows-latest # This probably requres arm64 Windows agents
arch: aarch64
runs-on: ${{ matrix.os }}
steps:
# Check out code
- uses: actions/checkout@v3
rickardp marked this conversation as resolved.
Show resolved Hide resolved
# Download shared libraries
- name: Download build artifact
uses: actions/download-artifact@v3
with:
name: shared_library
path: output/
- name: Copy correct platform shared library
shell: bash
run: |
cp output/${{ matrix.os }}/${{ matrix.arch }}/* bitsandbytes/
# Compile C++ code
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
rickardp marked this conversation as resolved.
Show resolved Hide resolved
with:
python-version: ${{ matrix.python-version }}
#
- name: Install Python dependencies
shell: bash
run: |
pip install -r requirements.txt
rickardp marked this conversation as resolved.
Show resolved Hide resolved
# TODO: How to run CUDA tests on GitHub actions?
#- name: Run unit tests
# if: ${{ matrix.arch == 'x86_64' }} # Tests are too slow to run in emulation. Wait for real aarch64 agents
# run: |
# PYTHONPATH=. pytest --log-cli-level=DEBUG tests
- name: Build wheel
shell: bash
run: |
python setup.py bdist_wheel
rickardp marked this conversation as resolved.
Show resolved Hide resolved
- name: Upload build artifact
uses: actions/upload-artifact@v3
with:
name: bdist_wheel
path: dist/bitsandbytes-*.whl
retention-days: 7
publish:
needs: build-wheels
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
rickardp marked this conversation as resolved.
Show resolved Hide resolved
- name: Build dist
run: |
python setup.py sdist
rickardp marked this conversation as resolved.
Show resolved Hide resolved
- name: Download build artifact
uses: actions/download-artifact@v3
with:
name: bdist_wheel
path: dist/
- run: |
ls -lR dist/
- name: Publish to PyPi
if: startsWith(github.ref, 'refs/tags')
uses: pypa/gh-action-pypi-publish@release/v1
with:
password: ${{ secrets.pypi }}
22 changes: 20 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,26 @@
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so
*.dll
*.dylib
*.o
*.obj
*.air
*.metallib

# CMake generated files
CMakeCache.txt
CMakeScripts/
cmake_install.cmake
Makefile
CMakeFiles/
*.sln
*.vcxproj*
*.xcodeproj/
bitsandbytes.dir/
Debug/
Release/

# Distribution / packaging
.Python
Expand Down Expand Up @@ -133,4 +150,5 @@ dmypy.json

dependencies
cuda_build
output/
.vscode/*
121 changes: 121 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
cmake_minimum_required(VERSION 3.22.1)

option(ENABLE_CUDA "Build for CUDA (Nvidia)" OFF)
option(ENABLE_MPS "Build for Metal Performance Shaders (Apple)" OFF)

if(ENABLE_CUDA)
if(APPLE)
message(FATAL_ERROR "CUDA is not supported on macOS" )
endif()
option(NO_CUBLASLT "Don't use CUBLAST" OFF)
if(NO_CUBLASLT)
set(CMAKE_CUDA_ARCHITECTURES 50 52 60 61 70 72)
else()
set(CMAKE_CUDA_ARCHITECTURES 75 80 86 89 90)
endif()
endif()
rickardp marked this conversation as resolved.
Show resolved Hide resolved

if(ENABLE_CUDA)
message("Building CUDA support for ${CMAKE_CUDA_ARCHITECTURES}")
# Find CUDA tools if we are compiling with CUDA
find_package(CUDAToolkit REQUIRED)
if(NO_CUBLASLT)
set(LIBSUFFIX "cuda${CUDAToolkit_VERSION_MAJOR}${CUDAToolkit_VERSION_MINOR}_nocublaslt")
else()
set(LIBSUFFIX "cuda${CUDAToolkit_VERSION_MAJOR}${CUDAToolkit_VERSION_MINOR}")
endif()

project(bitsandbytes LANGUAGES CXX CUDA)
add_compile_definitions(BUILD_CUDA)
set(CMAKE_CUDA_STANDARD 14)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
set(GPU_SOURCES csrc/ops.cu csrc/kernels.cu)
elseif(ENABLE_MPS)
if(NOT APPLE)
message(FATAL_ERROR "MPS is only supported on macOS" )
endif()
message("Building MPS support")
set(LIBSUFFIX "mps")
project(bitsandbytes LANGUAGES CXX OBJCXX)
add_compile_definitions(BUILD_MPS)
set(METAL_SOURCES csrc/mps_kernels.metal)
file(MAKE_DIRECTORY "build")
add_custom_command(OUTPUT "bitsandbytes/bitsandbytes.metallib"
COMMAND xcrun metal -c -o "build/bitsandbytes.air" ${METAL_SOURCES}
COMMAND xcrun metallib "build/bitsandbytes.air" -o "bitsandbytes/bitsandbytes.metallib"
DEPENDS "${METAL_SOURCES}"
COMMENT "Compiling Metal kernels"
VERBATIM)
add_custom_target(metallib DEPENDS "bitsandbytes/bitsandbytes.metallib")
set(GPU_SOURCES csrc/mps_ops.mm)
else()
message("Building with CPU only")
set(LIBSUFFIX "cpu")

project(bitsandbytes LANGUAGES CXX)
set(GPU_SOURCES)
endif()

if(APPLE)
set(CMAKE_OSX_DEPLOYMENT_TARGET 13.1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also user facing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code fails to compile on a lower target. For my understanding, why would you want to change this without chaning source code?

endif()
set(CMAKE_CXX_STANDARD 14)
set(CXX_STANDARD_REQUIRED C++14)
rickardp marked this conversation as resolved.
Show resolved Hide resolved

if(WIN32)
# Mute warnings
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -diag-suppress=177")

# Enable fast math on VC++
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /fp:fast")
rickardp marked this conversation as resolved.
Show resolved Hide resolved

# Export all symbols
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
endif()

# Weird MSVC hacks
if(MSVC)
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} /NODEFAULTLIB:msvcprtd /NODEFAULTLIB:MSVCRTD /NODEFAULTLIB:LIBCMT")
rickardp marked this conversation as resolved.
Show resolved Hide resolved
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /arch:AVX2")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you using AVX2 intrinsic? Otherwise this is also a user decision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Celeron does not support AVX2 for example

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also override user settings of /arch:AVX512 since MSVC takes the last flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also override user settings of /arch:AVX512 since MSVC takes the last flag

The source code uses AVX2 intrinsics. "user" here is developer, i.e. you need to change source code to make use of AVX512.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source code uses AVX2 intrinsics. "user" here is developer, i.e. you need to change source code to make use of AVX512.

That is incorrect. MSVC can compile AVX2 intrinsic without the /ARCH:AVX2 since most intrinsics are visible from immintrin.h. The /ARCH flags just allows the compiler to use these intrinsics itself while optimizing the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry if I was unclear. I just meant that since the AVX2 intrinsics is part of the source code, does it make sense to expose this to the end user compiling from source?

Copy link
Contributor Author

@rickardp rickardp Feb 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I don't mind removing this or making it overridable. I just want to understand how to design the replacement :) For example, should we add a SIMD option that would allow the end user to switch between scalar, AVX2, AVX512 and (on arm ) Neon?

endif()

# Add csrc files
add_library(bitsandbytes SHARED
${GPU_SOURCES}
csrc/common.cpp
csrc/cpu_ops.cpp
csrc/pythonInterface.cpp)

target_include_directories(bitsandbytes PUBLIC
${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}
${CMAKE_CURRENT_SOURCE_DIR}/csrc
${CMAKE_CURRENT_SOURCE_DIR}/include)

if(ENABLE_CUDA)
target_include_directories(bitsandbytes PUBLIC ${CUDA_TOOLKIT_ROOT_DIR}/include)

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --use_fast_math")
rickardp marked this conversation as resolved.
Show resolved Hide resolved

set_target_properties(
bitsandbytes
PROPERTIES
CUDA_SEPARABLE_COMPILATION ON)

target_link_libraries(bitsandbytes CUDA::cudart CUDA::cublas CUDA::cublasLt CUDA::cusparse)
endif()
if(ENABLE_MPS)
add_dependencies(bitsandbytes metallib)
target_link_libraries(bitsandbytes objc "-framework Foundation" "-framework Metal" "-framework MetalPerformanceShaders" "-framework MetalPerformanceShadersGraph")
endif()

set_target_properties(bitsandbytes PROPERTIES OUTPUT_NAME "bitsandbytes_${LIBSUFFIX}")
# Set the output name of the CUDA library
if(MSVC)
set_target_properties(bitsandbytes PROPERTIES LIBRARY_OUTPUT_DIRECTORY_RELEASE bitsandbytes)
set_target_properties(bitsandbytes PROPERTIES LIBRARY_OUTPUT_DIRECTORY_DEBUG bitsandbytes)
set_target_properties(bitsandbytes PROPERTIES RUNTIME_OUTPUT_DIRECTORY_RELEASE bitsandbytes)
set_target_properties(bitsandbytes PROPERTIES RUNTIME_OUTPUT_DIRECTORY_DEBUG bitsandbytes)
endif()

set_target_properties(bitsandbytes PROPERTIES LIBRARY_OUTPUT_DIRECTORY bitsandbytes)
Loading