Description
We currently try to open /sys/devices/system/cpu/cpuX/cache/indexY/shared_cpu_map for every PU and for every Y between 0 and 9. Since most CPUs have 4 caches per PU, that's usually 6 useless syscalls per PU. That's almost 1ms per PU.
Linux numbers caches from 0 to N-1 internally, but some of them might get skipped when they are added to sysfs (see cache_add_dev() in drivers/base/cacheinfo.c). That means we cannot simply break out of the loop at the first missing index, even though index4 is usually the first one missing.
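To make the cost concrete, here is a minimal sketch of such a probing loop. It is not the actual hwloc code: `probe_cache_indexes()`, its parameters, and the exclusive `max_index` bound are hypothetical names chosen for illustration. It tries every index below the bound, counts the opens that fail, and does not stop at the first missing index because a later one could still exist.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not hwloc code): probe
 * <cpu_dir>/cache/indexY/shared_cpu_map for Y in [0, max_index).
 * Stores the number of successful opens in *found and returns the
 * number of failed opens, i.e. the wasted syscalls. */
static int probe_cache_indexes(const char *cpu_dir, int max_index, int *found)
{
    char path[4096];
    int failed = 0;

    *found = 0;
    for (int y = 0; y < max_index; y++) {
        snprintf(path, sizeof(path), "%s/cache/index%d/shared_cpu_map",
                 cpu_dir, y);
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            /* No break here: sysfs may have skipped this index
             * while a higher one is still present. */
            failed++;
            continue;
        }
        (*found)++;
        close(fd);
    }
    return failed;
}
```

With 4 caches present and 10 indexes tried, this sketch reports 4 successful opens and 6 failed ones per PU.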
Calling stat() on the parent directory might be a good way to find out the total number of indexY subdirectories. That would mean one syscall to avoid 6 syscalls. However, btrfs (used for the fsroot regression tests) has some issues with nlink being wrong (see comments in topology-linux.c).
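The stat() idea could look roughly like the sketch below. The function names are hypothetical, and it assumes the traditional Unix convention that a directory's link count is 2 plus its number of subdirectories, and that every subdirectory of cache/ is an indexY directory. Filesystems that do not follow the convention (btrfs, as far as I know, reports 1 for directories) make the helper return -1 so the caller can fall back to probing.

```c
#include <sys/stat.h>

/* Hypothetical helper: derive a subdirectory count from a link count.
 * Assumes nlink == 2 + number of subdirectories; returns -1 when the
 * value cannot follow that convention. */
static int subdirs_from_nlink(unsigned long nlink)
{
    if (nlink < 2)
        return -1; /* unreliable, e.g. a filesystem reporting 1 */
    return (int)(nlink - 2);
}

/* Hypothetical helper: one stat() on the cache directory instead of
 * several failed open() calls. Returns -1 if the count is unavailable. */
static int count_subdirs_by_nlink(const char *dir)
{
    struct stat st;

    if (stat(dir, &st) < 0 || !S_ISDIR(st.st_mode))
        return -1;
    return subdirs_from_nlink((unsigned long)st.st_nlink);
}
```

Even when the count is reliable, it only tells us how many indexY directories exist, not which numbers were skipped, so the loop would stop after finding that many rather than at a fixed index.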
Reducing the limit to 5 instead of 9 is likely a good start for now. Most current CPUs have 4 caches in sysfs. Some L4 caches exist, but I have never seen them in sysfs since they sit rather outside of the CPUs. Itanium had 5 caches (L2i and L2d), but it's dead. So 5 works fine and gives us one free slot in case newer CPUs bring an additional level.