Not able to install Transformer Engine for fp8 #1526

Closed
palash04 opened this issue Sep 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@palash04

I am following the fp8 setup from here on bare metal (inside a conda env).

But I am getting the following error:

      Building CMake extension transformer_engine
      Running command /opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake -S /tmp/pip-req-build-eibanc7t/transformer_engine/common -B /tmp/pip-req-build-eibanc7t/build/cmake -DPython_EXECUTABLE=/opt/conda/bin/python -DPython_INCLUDE_DIR=/opt/conda/include/python3.11 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311 -Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11 -GNinja
      -- The CUDA compiler identification is NVIDIA 12.4.131
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.4.131")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      CMake Error at /tmp/pip-req-build-eibanc7t/3rdparty/cudnn-frontend/cmake/cuDNN.cmake:3 (find_path):
        Could not find CUDNN_INCLUDE_DIR using the following files: cudnn.h
      Call Stack (most recent call first):
        CMakeLists.txt:40 (include)
      
      
      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 89, in _build_cmake
          subprocess.run(command, cwd=build_dir, check=True)
        File "/opt/conda/lib/python3.11/subprocess.py", line 571, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['/opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-eibanc7t/transformer_engine/common', '-B', '/tmp/pip-req-build-eibanc7t/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-eibanc7t/setup.py", line 174, in <module>
          setuptools.setup(
        File "/opt/conda/lib/python3.11/site-packages/setuptools/__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-req-build-eibanc7t/setup.py", line 54, in run
          super().run()
        File "/opt/conda/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 115, in run
          ext._build_cmake(
        File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 91, in _build_cmake
          raise RuntimeError(f"Error when running CMake: {e}")
      RuntimeError: Error when running CMake: Command '['/opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-eibanc7t/transformer_engine/common', '-B', '/tmp/pip-req-build-eibanc7t/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer_engine

What could be the possible solution to this?
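
The CMake step in the log stops at `find_path` because it cannot locate `cudnn.h`, so a first check is whether the cuDNN headers exist anywhere on the machine. The following is a minimal diagnostic sketch, not part of Transformer Engine. The list of directories and the environment variable names it probes are assumptions chosen for illustration, and whether the build actually honors `CUDNN_PATH` should be confirmed against the `3rdparty/cudnn-frontend/cmake/cuDNN.cmake` file named in the error.

```python
import os
import sysconfig
from pathlib import Path


def candidate_roots(env=None):
    """Return directories that might contain include/cudnn.h.

    The variable names and fixed paths below are assumptions for
    illustration, not a list taken from the Transformer Engine build.
    """
    env = os.environ if env is None else env
    roots = []
    for var in ("CUDNN_PATH", "CUDNN_HOME", "CUDA_HOME", "CONDA_PREFIX"):
        if env.get(var):
            roots.append(Path(env[var]))
    # Layout used by pip-installed cuDNN wheels: site-packages/nvidia/cudnn
    roots.append(Path(sysconfig.get_paths()["platlib"]) / "nvidia" / "cudnn")
    roots += [Path("/usr/local/cuda"), Path("/usr")]
    return roots


def find_cudnn_root(roots):
    """Return the first root whose include/ directory holds cudnn.h, else None."""
    for root in roots:
        if (root / "include" / "cudnn.h").is_file():
            return root
    return None


if __name__ == "__main__":
    root = find_cudnn_root(candidate_roots())
    if root is None:
        print("cudnn.h not found in the checked locations; "
              "the cuDNN headers may not be installed.")
    else:
        print(f"Found cudnn.h under {root}/include")
        print(f"Possible next step: export CUDNN_PATH={root}")
```

If the script reports a location, exporting that directory before rerunning the install is one thing to try. If it finds nothing, the cuDNN development headers are probably missing from the environment altogether.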

@palash04 palash04 added the bug Something isn't working label Sep 16, 2024
@lisicheng-csn

Hello friend, have you solved it? I also ran into the same problem.

@dakinggg
Collaborator

dakinggg commented Nov 1, 2024

I'd suggest opening an issue on the Transformer Engine repo if you are unable to install Transformer Engine. We provide Docker images with Transformer Engine installed, but won't be able to help you with your custom setup.

@dakinggg dakinggg closed this as completed Nov 1, 2024
3 participants