
Support Iluvatar CoreX #8585


Open
honglyua-il wants to merge 2 commits into master from iluvatar_support

Conversation


honglyua-il commented Jun 19, 2025

close #8584

This PR was validated on Iluvatar CoreX GPUs. Install the Iluvatar CoreX Toolkit first, then run:

# Install dependencies
pip install -r requirements.txt
# run
python3 main.py --disable-cuda-malloc

We used the sd_xl_base_1.0 model; results from the default workflow are shown below:

root@848fa421ea4c:~/ComfyUI# python3 main.py --disable-cuda-malloc --listen 0.0.0.0
Checkpoint files will always be loaded safely.
Total VRAM 32716 MB, total RAM 515630 MB
pytorch version: 2.4.1
/usr/local/corex-4.2.0/lib64/python3/dist-packages/xformers/ops/swiglu_op.py:107: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/usr/local/corex-4.2.0/lib64/python3/dist-packages/xformers/ops/swiglu_op.py:128: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(cls, ctx, dx5):
xformers version: 0.0.26.post1
Set vram state to: NORMAL_VRAM
Device: cuda:0 Iluvatar BI-V150 : native
Using pytorch attention
Python version: 3.10.12 (main, Nov 29 2024, 18:13:52) [GCC 9.4.0]
ComfyUI version: 0.3.41
ComfyUI frontend version: 1.22.2
[Prompt Server] web root: /usr/local/lib/python3.10/site-packages/comfyui_frontend_package/static
/usr/local/corex-4.2.0/lib64/python3/dist-packages/flash_attn/ops/fused_dense.py:30: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(
/usr/local/corex-4.2.0/lib64/python3/dist-packages/flash_attn/ops/fused_dense.py:71: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output, *args):

Import times for custom nodes:
   0.0 seconds: /root/ComfyUI/custom_nodes/websocket_image_save.py

Context impl SQLiteImpl.
Will assume non-transactional DDL.
/usr/local/lib/python3.10/site-packages/alembic/config.py:564: DeprecationWarning: No path_separator found in configuration; falling back to legacy splitting on spaces, commas, and colons for prepend_sys_path.  Consider adding path_separator=os to Alembic config.
  util.warn_deprecated(
No target revision found.
/usr/local/corex-4.2.0/lib64/python3/dist-packages/aiohttp/web_urldispatcher.py:202: DeprecationWarning: Bare functions are deprecated, use async ones
  warnings.warn(
Starting server

To see the GUI go to: http://0.0.0.0:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SDXLClipModel
loaded completely 31430.74140625 1560.802734375 True
/root/ComfyUI/comfy/ldm/modules/attention.py:451: UserWarning: Optional attn_mask_ param is not recommended to use. For better performance,1.Assuming causal attention masking, 'is_causal' parameter can be selected.2.Assuming alibi attention masking, 'PT_SDPA_USE_ALIBI_MASK' env can be selected. (Triggered internally at /home/corex/sw_home/apps/pytorch/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:1769.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load SDXL
loaded completely 29856.74970703125 4897.0483474731445 True
100%|███████████████████████████████████████████| 20/20 [00:02<00:00,  7.18it/s]
Requested to load AutoencoderKL
loaded completely 24619.5859375 159.55708122253418 True
Prompt executed in 5.58 seconds

[image: output of the default SDXL workflow]

honglyua-il force-pushed the iluvatar_support branch 2 times, most recently from 46d9466 to da50a8e on June 23, 2025 05:57
cuda_malloc.py Outdated
@@ -50,7 +50,17 @@ def enum_display_devices():
    "GeForce GTX 1650", "GeForce GTX 1630", "Tesla M4", "Tesla M6", "Tesla M10", "Tesla M40", "Tesla M60"
}

def is_ixuca():
    try:
        import torch
Owner


If you import torch before setting the cuda malloc option, it completely breaks cuda malloc on nvidia gpus.
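
For context, ComfyUI's cuda_malloc.py opts into the async allocator by adding backend:cudaMallocAsync to the PYTORCH_CUDA_ALLOC_CONF environment variable, and in practice that variable has to be set before the first import torch. The snippet below is only an illustrative sketch of that ordering constraint, not the repository's exact code:

import os

# The allocator backend is picked up when torch initializes, so the
# environment variable must already be set by the first `import torch`.
env_var = os.environ.get("PYTORCH_CUDA_ALLOC_CONF")
if env_var is None:
    env_var = "backend:cudaMallocAsync"
else:
    env_var += ",backend:cudaMallocAsync"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = env_var

import torch  # importing torch any earlier would lock in the default allocator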

Author


I have updated the code. is_ixuca now uses _load_torch_submodule to detect the platform, the same way the version check code works, and cuda_malloc_supported returns early when is_ixuca() is true.

  1. Created a shared _load_torch_submodule() helper to handle the importlib boilerplate for torch submodules such as version.py and corex.py.
  2. Updated is_ixuca to use the helper instead of importing torch directly, which would break cuda malloc on nvidia gpus.
  3. Made the version check code more concise and robust by reusing the same helper (a rough sketch of the approach follows this list).
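
The following is a rough sketch of what that change could look like; the corex detection heuristic and the function bodies are assumptions based on the description above, not the PR's verbatim code:

import importlib.util
import os

def _load_torch_submodule(module_name):
    # Load a single file from the installed torch package (e.g. torch/version.py
    # or torch/corex.py) without executing torch/__init__.py, so the cuda malloc
    # setup on nvidia gpus is left untouched.
    torch_spec = importlib.util.find_spec("torch")
    if torch_spec is None or not torch_spec.submodule_search_locations:
        return None
    for folder in torch_spec.submodule_search_locations:
        path = os.path.join(folder, module_name + ".py")
        if os.path.isfile(path):
            spec = importlib.util.spec_from_file_location(module_name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            return module
    return None

def is_ixuca():
    # Assumed heuristic: Iluvatar CoreX builds of torch ship a corex submodule,
    # so its presence marks the ixuca platform without importing torch itself.
    return _load_torch_submodule("corex") is not None

def cuda_malloc_supported():
    # Early return for CoreX; the real function would go on to check the
    # NVIDIA device blacklist shown in the diff above.
    if is_ixuca():
        return False
    return True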

honglyua-il force-pushed the iluvatar_support branch 2 times, most recently from 2ec7155 to 49529c7 on June 24, 2025 02:07
…portlib boilerplate like version.py, corex.py and so on.

2. Updated is_ixuca to use the helper instead of importing torch directly, which would break cuda malloc on nvidia gpus.
3. Made the version check code more concise and robust by using the helper.

Successfully merging this pull request may close these issues.

[Feature Request] Request for Iluvatar Corex GPU support