resource: fall back to sysconf when hwloc fails to detect the memory size (for branch-6.1) #6

Merged


@syuu1228 commented Feb 3, 2025

This is a backport of scylladb/seastar#2624.


On the Fedora 41 AMI, on some aarch64 instance types such as m7gd.16xlarge, Seastar programs such as Scylla fail to start with the following error message:

$ /opt/scylladb/bin/scylla --log-to-stdout 1
WARNING: debug mode. Not for benchmarking or production
hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.
scylla: seastar/src/core/resource.cc:683: resources seastar::resource::allocate(configuration &): Assertion `!remain' failed.

It seems that hwloc fails to initialize because /sys/devices/system/cpu/cpu0/topology/ is not available on the instance.
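To confirm this on an affected instance, one can check whether the directory exists. The following is a minimal sketch, not part of the patch: the helper name `has_cpu_topology_dir` and its `sysfs_root` parameter are hypothetical, and it only tests the single path named above, which may not be the only location hwloc probes.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical helper (not from the patch): reports whether the sysfs CPU
// topology directory for cpu0 exists under the given sysfs root.
// The non-throwing overload is used, so a missing or unreadable path simply
// yields false.
inline bool has_cpu_topology_dir(const std::filesystem::path& sysfs_root = "/sys") {
    std::error_code ec;
    return std::filesystem::is_directory(
        sysfs_root / "devices/system/cpu/cpu0/topology", ec);
}
```

Calling `has_cpu_topology_dir()` with the default argument checks the real `/sys`; passing another root makes the check testable against a fake directory tree.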

I debugged src/core/resource.cc to find out why the assertion fired, and found that alloc_from_node() fails because node->total_memory is 0. This is likely a consequence of the hwloc initialization failure described above.

I also found that calculate_memory() goes wrong because machine->total_memory is also 0.

To avoid the error in such environments, we need to fix up the memory size in both machine->total_memory and node->total_memory. We can use sysconf(_SC_PAGESIZE) * sysconf(_SC_PHYS_PAGES) for this, just as the non-hwloc version of allocate() does.
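The fallback idea can be sketched as follows. This is an illustration under stated assumptions rather than the actual change to resource.cc: the helper names `physical_memory_from_sysconf` and `memory_or_fallback` are hypothetical, and the real patch applies the value to the hwloc objects directly.

```cpp
#include <cstdint>
#include <unistd.h>

// Hypothetical helper (not from the patch): total physical memory in bytes,
// computed as sysconf(_SC_PAGESIZE) * sysconf(_SC_PHYS_PAGES).
// Returns 0 if either query fails.
inline std::uint64_t physical_memory_from_sysconf() {
    const long page_size = ::sysconf(_SC_PAGESIZE);
    const long phys_pages = ::sysconf(_SC_PHYS_PAGES);
    if (page_size <= 0 || phys_pages <= 0) {
        return 0;
    }
    return static_cast<std::uint64_t>(page_size) *
           static_cast<std::uint64_t>(phys_pages);
}

// Hypothetical helper: keep the detected size when it is nonzero,
// otherwise use the fallback value.
inline std::uint64_t memory_or_fallback(std::uint64_t detected,
                                        std::uint64_t fallback) {
    return detected != 0 ? detected : fallback;
}
```

With helpers like these, a zero reported by hwloc would be replaced with `memory_or_fallback(reported, physical_memory_from_sysconf())`, while a nonzero value from hwloc is left untouched.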

Fixes scylladb/scylladb#22382
Related scylladb/scylla-pkg#4797

(cherry picked from commit b0a9f89)

@avikivity

@syuu1228 please check the failures

@syuu1228 commented Feb 4, 2025

> @syuu1228 please check the failures

@avikivity I tested ./configure.py --c++-standard 23 --compiler g++ --c-compiler gcc --mode release locally. The failures are not related to this patch: branch-6.1 (or 6.2) fails to compile even without it.
Since branch-2025.1 does not have the compile error, it has probably been fixed in the latest code.

@avikivity avikivity merged commit 882ed7a into scylladb:branch-6.1 Feb 4, 2025
11 of 14 checks passed
@avikivity

Queued submodule update
