ERROR: Resolution loop on 'bids/freesurfer/latest' detected, RHEL 9 #677

Open
SomePersonSomeWhereInTheWorld opened this issue Dec 10, 2024 · 32 comments


@SomePersonSomeWhereInTheWorld

SomePersonSomeWhereInTheWorld commented Dec 10, 2024

The modules do not appear in Lmod's module avail output. Testing freesurfer with TCL, the module appears as:
bids/freesurfer/latest/module.tcl

ml bids/freesurfer/latest/module.tcl results in:
ERROR: Resolution loop on 'bids/freesurfer/latest' detected

I've switched between TCL and Lmod. We use the Bright Computing provisioning management system, but I don't think that is related.

To Reproduce
Steps to reproduce the behavior:

git clone https://github.com/singularityhub/singularity-hpc
cd singularity-hpc
pip install -e .[all]
 /usr/local/bin/shpc install bids/freesurfer:latest
INFO:    Using cached SIF image
Module bids/freesurfer:latest was created.
module use ./modules

This was installed within a directory listed in $MODULEPATH (checked with echo $MODULEPATH).

Expected behavior
The module should appear.

apptainer version 1.3.0-1.el9
/usr/local/bin/shpc --version
0.1.28
rpm -qa|grep -i lmod
Lmod-8.7.32-1.el9.x86_64
rpm -qa|grep -i tcl
tcl-8.6.10-7.el9.x86_64
tcl-devel-8.6.10-7.el9.x86_64

Anything else?
If this is a question more than a bug, feel free to move it.

@vsoch
Member

vsoch commented Dec 10, 2024

I'm not great with developing environment modules, but this seems relevant:

[screenshot not preserved]

@marcodelapierre have you seen this before?

@marcodelapierre
Contributor

I am testing with Lmod version 8.7.47 and the latest Singularity-HPC, and cannot reproduce the issue.

Would you be able to provide the output of ml av bids, as well as the contents of the directory modules/bids/freesurfer/latest/ ?

Thank you

@SomePersonSomeWhereInTheWorld
Author

ml av bids

It does not return any values.

ls modules/bids/freesurfer/latest
99-shpc.sh  bin  module.lua

Since we have both TCL and Lmod installed, which option should be used? Is it expected that modules do not load when we use Lmod?

@vsoch
Member

vsoch commented Dec 11, 2024

It should work with either one, but I've never tested shpc in an environment with both!

@SomePersonSomeWhereInTheWorld
Author

Is there any debugging output or logging we can enable to see where the modules are failing to load?

@vsoch
Member

vsoch commented Dec 11, 2024

shpc just uses standard Environment Modules or Lmod, so you'd want to look at the documentation for those projects. @marcodelapierre is much better at these modules than me, so he might have some advice! I do most of my work in the cloud these days, or in user-space Kubernetes.

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Dec 16, 2024

Hello @marcodelapierre is there anything I can do to work around this?

@marcodelapierre
Contributor

I am surprised that ml av bids does not return anything.
Is the root directory of that module tree in the $MODULEPATH? You should always make sure it is, via module use </modules/root/path>.

I would like to see the outcome of that command in the same shell environment that also generates the Error.

When you say you have both TCL + Lmod installed, I am assuming you mean Tcl as the language plus Lmod, and that you do NOT also have Environment Modules installed (the Tcl-based alternative to Lmod). Is this correct?
In other words, I would expect module --version to reveal that you have an Lmod version installed.
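A minimal sketch for checking which implementation a given shell actually initialised. It assumes, without confirmation from this thread, that Lmod's init script exports LMOD_CMD and the Environment Modules init script exports MODULES_CMD; detect_module_system is a hypothetical helper, and module --version remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess which module implementation(s) initialised this shell.
# Assumption: Lmod's init exports LMOD_CMD, Environment Modules' init exports MODULES_CMD.
detect_module_system() {
  local lmod="${LMOD_CMD:-}" envmod="${MODULES_CMD:-}"
  if [ -n "$lmod" ] && [ -n "$envmod" ]; then
    echo "both"
  elif [ -n "$lmod" ]; then
    echo "lmod"
  elif [ -n "$envmod" ]; then
    echo "environment-modules"
  else
    echo "none"
  fi
}
```

Running detect_module_system alongside module --version in a fresh login shell on an affected node would show whether both init scripts ran there.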

With the assumption above, Lmod can handle modules written in both Lua and Tcl. Lua is Lmod's native language, hence preferable because it offers more functionality; Lmod can handle Tcl modules too, though.

As for SHPC, I cannot test at present, but from what I remember of past exploration, Lmod can correctly use both the Lua and the Tcl modules generated by SHPC (as said above, Lua is preferred because it is the native language of Lmod modules).

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Dec 17, 2024

I am surprised that ml av bids does not return anything. Is the root directory of that module tree in the $MODULEPATH? You should always make sure it is, via module use </modules/root/path>.

Yes here's what echo $MODULEPATH returns after I load an existing (non SHPC) module:

/cluster/shared/modulefiles/scotch/7.0.5:/cluster/shared/modulefiles/bids/freesurfer/V30-a43f1f/module.lua:/cluster/shared/modulefiles/bids/freesurfer/V30-a43f1f:/cluster/shared/modulefiles/bids/freesurfer:/etc/scl/modulefiles:/etc/modulefiles:/usr/share/modulefiles:/cluster/shared/modulefiles
If I run ml on another non-SHPC module using its full path, it shows up with the full path, like this:

ml  /cluster/shared/modulefiles/metis/5.1.0
 ml
Currently Loaded Modulefiles:
 1) anaconda/2023.09   2) scotch/7.0.5   3) /cluster/shared/modulefiles/metis/5.1.0  

These also show up as modules:

singularity-hpc/shpc/main/modules/templates/default_version  
singularity-hpc/shpc/main/modules/templates/docker.tcl       
singularity-hpc/shpc/main/modules/templates/singularity.tcl  
singularity-hpc/shpc/main/modules/templates/view_module.tcl  

I would like to see the outcome of that command in the same shell environment that also generates the Error.

Did the above help?

When you say you have both TCL + Lmod installed, I am assuming you mean Tcl as the language plus Lmod, and that you do NOT also have Environment Modules installed (the Tcl-based alternative to Lmod). Is this correct?
Both are installed at the same time:

$ rpm -qa|grep -i envi
environment-modules-5.3.0-1.el9.x86_64
 rpm -qa|grep -i lmod
Lmod-8.7.32-1.el9.x86_64

In other words, I would expect module --version to reveal that you have an Lmod version installed.

 module --version
Modules Release 5.3.0 (2023-05-14)

With the assumption above, Lmod can handle modules written in both Lua and Tcl. Lua is Lmod's native language, hence preferable because it offers more functionality; Lmod can handle Tcl modules too, though.

As for SHPC, I cannot test at present, but from what I remember of past exploration, Lmod can correctly use both the Lua and the Tcl modules generated by SHPC (as said above, Lua is preferred because it is the native language of Lmod modules).

This must be a clue as to why the 4 singularity-hpc modules appear. The ones ending in .tcl appear:

ls -l /cluster/shared/modulefiles/singularity-hpc/shpc/main/modules/templates
total 83
-rw-r--r-- 1 root root  112 Dec 17 09:26 default_version
-rw-r--r-- 1 root root 7094 Dec 17 09:26 docker.lua
-rw-r--r-- 1 root root 8490 Dec 17 09:26 docker.tcl
-rw-r--r-- 1 root root 5200 Dec 17 09:26 docs.md
drwxr-xr-x 2 root root 4096 Dec 17 09:26 includes
-rw-r--r-- 1 root root    0 Dec 17 09:26 __init__.py
-rw-r--r-- 1 root root 8062 Dec 17 09:26 singularity.lua
-rw-r--r-- 1 root root 9799 Dec 17 09:26 singularity.tcl
-rwxr-xr-x 1 root root  405 Dec 17 09:26 test.sh
-rw-r--r-- 1 root root  256 Dec 17 09:26 view_module.lua
-rw-r--r-- 1 root root  307 Dec 17 09:26 view_module.tcl

Attempting to load one of those gets this error:

ml singularity-hpc/shpc/main/modules/templates/singularity.tcl
Loading singularity-hpc/shpc/main/modules/templates/singularity.tcl
  Module ERROR: invalid command name "% include "includes/load_view.tcl" %"
        while executing
    "{% include "includes/load_view.tcl" %}"
        (file "/cluster/shared/modulefiles/singularity-hpc/shpc/main/modules/templates/singularity.tcl" line 45)
    Please contact <root@localhost>

Using the full path to the module.lua file:

 ml /cluster/shared/modulefiles/bids/freesurfer/V30-a43f1f/module.lua 
Module ERROR: Magic cookie '#%Module' missing
  In '/cluster/shared/modulefiles/bids/freesurfer/V30-a43f1f/module.lua'
  Please contact <root@localhost>

@marcodelapierre
Contributor

Mmh OK. I would discourage having both Env Modules and Lmod active in the same shell environment; it adds unnecessary complexity and sources of issues.
Remember that ml originated with Lmod (recent Environment Modules releases provide an ml command too), whereas module exists for both. So in your case you really do have both active, as module --version reports Env Modules even though Lmod is installed.

You should not be seeing these:

singularity-hpc/shpc/main/modules/templates/default_version  
singularity-hpc/shpc/main/modules/templates/docker.tcl       
singularity-hpc/shpc/main/modules/templates/singularity.tcl  
singularity-hpc/shpc/main/modules/templates/view_module.tcl 

I think they come out of using module use on the wrong path.

From your first messages:

cd singularity-hpc
module use ./modules

To correctly use modules, you need to run module use with the modules subdirectory of the SHPC installation.

I would like to see the outcome of ml av bids after you run this.

@marcodelapierre
Contributor

What type of privileges do you have on that machine?

It would be good to be able to disable either Lmod or Env modules setup in the shell environment, to test in a more robust environment.

Sourcing of the corresponding scripts will typically happen either in your .bashrc/.profile in your home directory, in which case you can comment out the corresponding line, or via scripts within the /etc/profile.d/ directory. The latter case is trickier, as it requires root privileges to move sourceable files out of it.
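A rough way to find which startup files do the sourcing is to grep them for the usual init paths. The helper name and the patterns (lmod/init/, Modules/init/, modulecmd) are assumptions based on the paths that appear in this thread; adjust them to your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each given file that appears to source a
# module-system init script. Patterns are assumptions based on common paths
# (Lmod: .../lmod/init/..., Environment Modules: .../Modules/init/...).
find_module_init_sources() {
  local f
  for f in "$@"; do
    # Skip paths that do not exist or are not regular files (symlinks are followed).
    [ -f "$f" ] || continue
    if grep -Eq 'lmod/init/|Modules/init/|modulecmd' "$f"; then
      echo "$f"
    fi
  done
}
```

For example: find_module_init_sources ~/.bashrc ~/.bash_profile ~/.profile /etc/profile.d/*.sh. Because [ -f ] and grep follow symlinks, entries in /etc/profile.d/ that are symlinks are checked through to their targets.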

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Dec 18, 2024

To correctly use modules, you need to run module use with the modules subdirectory of the SHPC installation.

I would like to see the outcome of ml av bids after you run this.

So within /cluster/shared/apps/singularity-hpc I ran

Tue Dec 17 root@node003 $ module use ./modules
Tue Dec 17 root@node003 $ ml av bids
Tue Dec 17 root@node003 $ ml
No Modulefiles Currently Loaded.
echo $MODULEPATH
/cluster/shared/apps/singularity-hpc/modules:/etc/scl/modulefiles:/etc/modulefiles:/usr/share/modulefiles:/cluster/shared/modulefiles

What type of privileges do you have on that machine?

root

It would be good to be able to disable either Lmod or Env modules setup in the shell environment, to test in a more robust environment.

Sure, what would you like me to try?

EDIT: I went to another node. This time I made some progress:

[root@prov singularity-hpc]# module use ./modules
[root@prov singularity-hpc]# ml av bids

---------------------- /cluster/shared/apps/singularity-hpc/modules -----------------------
   bids/freesurfer/latest/module

------------------------------- /cluster/shared/modulefiles -------------------------------
   bids/freesurfer/V30-a43f1f/module (D)

So if I do

ml bids/freesurfer/latest/module
[root@2402-prov-002 singularity-hpc]# ml

Currently Loaded Modules:
  1) anaconda/2023.09   2) bids/freesurfer/latest/module



[root@prov singularity-hpc]# free
free                          freesurfer-inspect-deffile    freesurfer-shell
freesurfer-container          freesurfer-inspect-runscript  freetype-config
freesurfer-exec               freesurfer-run     

Yep, this node has only Lmod-8.7.32-1.el9.x86_64 installed, so having both definitely confuses the module system.

@marcodelapierre
Contributor

Oh great! Looks like the EnvModules vs Lmod clash was indeed causing your issues.

Let us know if you need anything else, or if we can close this one.

@vsoch
Member

vsoch commented Dec 18, 2024

@marcodelapierre 🙌 🙌 !

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Dec 18, 2024

It would be good to be able to disable either Lmod or Env modules setup in the shell environment, to test in a more robust environment.

Can you suggest a way to do this in the nodes that have both?

Oh great! Looks like the EnvModules vs Lmod clash was indeed causing your issues.

Let us know if you need anything else, or if we can close this one.

It looks like both Lmod and environment-modules are in use to get specific modules to load only on compute nodes, so when I try to uninstall environment-modules, the uninstall also wants to remove mpich, openmpi, etc.

It'd be great to know how to work around this.

@marcodelapierre
Contributor

Please start from this comment of mine above:

It would be good to be able to disable either Lmod or Env modules setup in the shell environment, to test in a more robust environment.

Sourcing of the corresponding scripts will typically happen either in your .bashrc/.profile in your home directory, in which case you can comment out the corresponding line, or via scripts within the /etc/profile.d/ directory. The latter case is trickier, as it requires root privileges to move sourceable files out of it.

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Dec 19, 2024

Please start from this comment of mine above:

It would be good to be able to disable either Lmod or Env modules setup in the shell environment, to test in a more robust environment.

Sourcing of the corresponding scripts will typically happen either in your .bashrc/.profile in your home directory, in which case you can comment out the corresponding line, or via scripts within the /etc/profile.d/ directory. The latter case is trickier, as it requires root privileges to move sourceable files out of it.

That's where I'm not clear. Are you suggesting I run the source command on
/cluster/shared/apps/singularity-hpc/modules/bids/freesurfer/latest/module and put it in my .bashrc? That means all users who want this module would need to do the same, unless I put it into a file in /etc/profile.d. Basically, I'm asking for help with what to do next.

@marcodelapierre
Contributor

No worries, let me try and clarify.

Having the RPM packages for Lmod and Env Modules installed is not enough to have them actually configured in the shell environment. Both applications rely on an initialisation script that needs to be sourced when the shell starts. We have evidence that you have both running (ml pointing to Lmod, while module --version reports Env Modules), which implies that the startup scripts for both are sourced at shell startup.

Now, to have only one running, you need to locate where the scripts are sourced, and disable that.

Common setups:

LMOD:

  1. .bashrc, .profile or .bash_profile source the file <LMOD-ROOT>/lmod/init/profile
  2. /etc/profile.d/ contains a copy or symlink of the lmod script file above; often, this is named in a meaningful way, e.g. /etc/profile.d/z00_lmod.sh (otherwise you need to grep the contents to find out)

ENV MODULES:

  1. .bashrc, .profile or .bash_profile source the file <MODULES-ROOT>/init/profile.sh
  2. /etc/profile.d/ contains a copy or symlink of the env modules script file above; often, this is named in a meaningful way, e.g. /etc/profile.d/modules.sh (otherwise you need to grep the contents to find out)

So, what I suggest is to have a look at the following:

  • contents of the files .bashrc, .profile or .bash_profile
  • files within the directory /etc/profile.d/

There, you want to search for the info I mentioned just above. Select the software you want to disable, and then it will be a matter of either:

  • commenting out the line that does the sourcing in .bashrc, .profile or .bash_profile; or
  • moving the script file out of /etc/profile.d/
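For the "comment out the sourcing line" option, a small sketch that keeps a backup first. comment_out_matching is a hypothetical helper; it does a plain substring match (awk's index()), so paths need no regex escaping, and lines that are already comments are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out every non-comment line of FILE that contains
# the literal string PATTERN, after saving a FILE.bak copy.
comment_out_matching() {
  local file="$1" pattern="$2" tmp
  [ -f "$file" ] || return 1
  cp -p -- "$file" "$file.bak"
  tmp=$(mktemp)
  # index() is a plain substring search; already-commented lines are printed as-is.
  awk -v pat="$pattern" '
    index($0, pat) && $0 !~ /^[ \t]*#/ { print "# " $0; next }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example: comment_out_matching ~/.bashrc /usr/share/Modules/init/profile.sh, then open a new login shell and check module --version.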

I hope this can help.

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Dec 19, 2024

OK, I'm getting closer. I'm assuming I want to disable Environment Modules. I see we have:

/etc/alternatives/modules.sh
/etc/profile.d/modules.sh

There are also .csh versions, which I'll ignore for now.

file /etc/alternatives/modules.sh
/etc/alternatives/modules.sh: symbolic link to /usr/share/Modules/init/profile.sh

Which has:

cat  /usr/share/Modules/init/profile.sh 
# shellcheck shell=sh

# get current shell name by querying shell variables or looking at parent
# process name
if [ -n "${BASH:-}" ]; then
   if [ "${BASH##*/}" = 'sh' ]; then
      shell='sh'
   else
      shell='bash'
   fi
elif [ -n "${ZSH_NAME:-}" ]; then
   shell=$ZSH_NAME
else
   shell=$(/usr/bin/basename "$(/usr/bin/ps -p $$ -ocomm=)")
fi

if [ -f "/usr/share/Modules/init/$shell" ]; then
   . "/usr/share/Modules/init/$shell"
else
   . '/usr/share/Modules/init/sh'
fi

And this file has the MODULEPATH in it; I believe this one is for Lmod:

cat /etc/profile.d/00-modulepath.sh
[ -z "$MODULEPATH" ] &&
  [ "$(readlink /etc/alternatives/modules.sh)" = "/usr/share/lmod/lmod/init/profile" -o -f /etc/profile.d/z00_lmod.sh ] &&
  export MODULEPATH=/etc/modulefiles:/usr/share/modulefiles:/cluster/shared/modulefiles || :

Notice that it references /etc/alternatives/modules.sh, which is a symlink to:

file  /etc/alternatives/modules.sh
/etc/alternatives/modules.sh: symbolic link to /usr/share/Modules/init/profile.sh

So can I just use the alternatives command to get around this?

FWIW:

ls /etc/profile.d/z00_lmod.sh
ls: cannot access '/etc/profile.d/z00_lmod.sh': No such file or directory

And this file is what appears to enable Lmod:
/usr/share/lmod/lmod/init/profile

########################################################################
. /etc/profile.d/00-modulepath.sh

#  This is the system wide source file for setting up modules
########################################################################

if [ -z "${LMOD_ALLOW_ROOT_USE+x}" ]; then
  LMOD_ALLOW_ROOT_USE=yes
fi
[...snip...]

@marcodelapierre
Contributor

Looks like /etc/profile.d/00-modulepath.sh is used by both EnvModules and Lmod, it has a conditional to check which one is active.

Yes, you want to disable the env modules sourceable files, so for instance having the alternatives file /etc/alternatives/modules.sh NOT pointing to Env Modules.
Also, for the same purpose /etc/profile.d/modules.sh should not be the Env Modules one.
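To confirm where the alternatives link points before and after changing it, the sketch below reuses the readlink test that /etc/profile.d/00-modulepath.sh already performs. modules_alternative is a hypothetical helper; the two target paths are the ones seen earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which implementation an alternatives link points to.
# Defaults to /etc/alternatives/modules.sh, the link 00-modulepath.sh inspects.
modules_alternative() {
  local link="${1:-/etc/alternatives/modules.sh}" target
  target=$(readlink "$link") || { echo "missing"; return 1; }
  case "$target" in
    */lmod/init/profile)      echo "lmod" ;;
    */Modules/init/profile.sh) echo "environment-modules" ;;
    *)                         echo "other:$target" ;;
  esac
}
```

If both packages register a modules.sh alternatives group (an assumption worth checking with alternatives --display modules.sh), something like alternatives --set modules.sh /usr/share/lmod/lmod/init/profile may be a tidier switch than deleting the link by hand.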

What I don't understand in this setup is where the Lmod script is currently sourced, otherwise you would not be able to use ml.

What I would do:

  1. get rid of /etc/alternatives/modules.sh and /etc/profile.d/modules.sh
  2. check whether Lmod ml is still available - where is this setup? .bashrc/.bash_profile/.profile ? some other .sh file in /etc/profile.d/ ?
  3. if still needed after 2., make sure that Lmod's /usr/share/lmod/lmod/init/profile is sourced somewhere (eg by creating a symlink to /etc/profile.d/z00_lmod.sh)
  4. make sure the conditional /etc/profile.d/00-modulepath.sh works properly with your Lmod-only setup
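Step 3 can be made idempotent, which matters if it is later pushed to many nodes. ensure_symlink is a hypothetical helper that leaves a correct link alone and refuses to overwrite a regular file; the paths in the usage line are the ones suggested above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make LINK a symlink to TARGET. No-op if it already points
# there; refuses to replace a regular file or directory.
ensure_symlink() {
  local target="$1" link="$2"
  if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
    return 0                                  # already correct
  fi
  if [ -e "$link" ] && [ ! -L "$link" ]; then
    echo "refusing to replace non-symlink $link" >&2
    return 1
  fi
  ln -sfn -- "$target" "$link"
}
```

Usage, as root: ensure_symlink /usr/share/lmod/lmod/init/profile /etc/profile.d/z00_lmod.sh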

I hope this helps.

Please be aware that I will be logging off at the end of today, and get back on my work computer on 6 Jan. I will not be responsive in the meanwhile.

All the best,

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Dec 23, 2024

  1. get rid of /etc/alternatives/modules.sh and /etc/profile.d/modules.sh
  2. check whether Lmod ml is still available - where is this setup? .bashrc/.bash_profile/.profile ? some other .sh file in /etc/profile.d/ ?
  3. if still needed after 2., make sure that Lmod's /usr/share/lmod/lmod/init/profile is sourced somewhere (eg by creating a symlink to /etc/profile.d/z00_lmod.sh)
  4. make sure the conditional /etc/profile.d/00-modulepath.sh works properly with your Lmod-only setup

OK, I removed the two suggested files, then created the symlink. However, for bids/freesurfer/latest/module to appear, I had to manually run module use ./modules after cd /cluster/shared/apps/singularity-hpc.

How do I make that module available without having to run module use? And how can I customize the long module name bids/freesurfer/latest/module?

Please be aware that I will be logging off at the end of today, and get back on my work computer on 6 Jan. I will not be responsive in the meanwhile.

Thanks. Perhaps @vsoch has an idea?

@marcodelapierre
Contributor

Hi,

To make the modules available at shell login, I suggest adding module use /cluster/shared/apps/singularity-hpc to your ~/.bash_profile or ~/.profile (depending on which one you have in your home directory).

To customise how modules are grouped within the directory structure, please have a look at the Views functionality in the documentation and see whether it helps in your case.

@SomePersonSomeWhereInTheWorld
Author

To customise how modules are grouped within the directory structure, please have a look at the Views functionality in the documentation and see whether it helps in your case.

What I mean is all of these show up:

 ml avail

------------------ /cluster/shared/apps/singularity-hpc ------------------
   modules/bids/freesurfer/latest/module
   shpc/main/modules/templates/default_version
   shpc/main/modules/templates/docker.tcl
   shpc/main/modules/templates/docker
   shpc/main/modules/templates/includes/default_version
   shpc/main/modules/templates/includes/load_view       (D)
   shpc/main/modules/templates/singularity.tcl
   shpc/main/modules/templates/singularity
   shpc/main/modules/templates/view_module.tcl
   shpc/main/modules/templates/view_module

So how would I hide those?

Also what causes this?

 ml modules/bids/freesurfer/latest/module
Lmod Warning:  MODULEPATH directory:
"/cluster/shared/apps/singularity-hpc" has too many non-modulefiles
(159). Please make sure that modulefiles are in their own directory and not
mixed in with non-modulefiles (e.g. source code)

@marcodelapierre
Contributor

marcodelapierre commented Jan 7, 2025

Oh I see, thanks for clarifying.

Both issues come down to the fact that the directory /cluster/shared/apps/singularity-hpc is in the MODULEPATH variable, which should NOT be the case, because that is the location of the SHPC installation, not a module directory tree.
Most likely this implies that somewhere the following command is executed, module use /cluster/shared/apps/singularity-hpc.

If you are not executing it yourself after the shell starts, then it is executed in some sourced script, such as the usual .bashrc, .bash_profile, etc., or one inside /etc/profile.d/. You should locate that command and remove it.
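A quick sanity check for this situation, as a sketch: check_shpc_modulepath is a hypothetical helper that takes the SHPC install root and flags the root itself appearing on MODULEPATH (which, per the diagnosis here, is what pulls shpc/main/modules/templates into ml avail), as opposed to the modules subdirectory, which is what should be there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check how an SHPC install ROOT appears in a
# colon-separated MODULEPATH string (second argument, default $MODULEPATH).
# Prints "ok", "missing", or a warning; returns 0 only for "ok".
check_shpc_modulepath() {
  local root="${1%/}" mp="${2-${MODULEPATH:-}}" entry status="missing"
  local -a parts=()
  IFS=':' read -r -a parts <<< "$mp"
  for entry in "${parts[@]}"; do
    entry="${entry%/}"                      # ignore a trailing slash
    if [ "$entry" = "$root" ]; then
      echo "WARNING: install root $root is on MODULEPATH; use $root/modules instead"
      return 1
    elif [ "$entry" = "$root/modules" ]; then
      status="ok"
    fi
  done
  echo "$status"
  [ "$status" = "ok" ]
}
```

For example: check_shpc_modulepath /cluster/shared/apps/singularity-hpc checks the current $MODULEPATH.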

@SomePersonSomeWhereInTheWorld SomePersonSomeWhereInTheWorld changed the title ERROR: Resolution loop on 'bids/freesurfer/latest' detected, RHEL 9, Bright Computing ERROR: Resolution loop on 'bids/freesurfer/latest' detected, RHEL 9 Jan 7, 2025
@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Jan 7, 2025

Both issues come down to the fact that the directory /cluster/shared/apps/singularity-hpc is in the MODULEPATH variable, which should NOT be the case, because that is the location of the SHPC installation, not a module directory tree.
Most likely this implies that somewhere the following command is executed, module use /cluster/shared/apps/singularity-hpc.

I only run module use /cluster/shared/apps/singularity-hpc, and then all those extra shpc/main/modules/templates entries appear in ml avail.

So I'm confused now, as previously you told me to:

To make the modules available from shell login, I suggest to add module use /cluster/shared/apps/singularity-hpc in your ~/.bash_profile or ~/.profile (depending on which one you have in your home).

After I run the module use command, $MODULEPATH is:

echo $MODULEPATH
/etc/scl/modulefiles:/etc/modulefiles:/usr/share/modulefiles:/cluster/shared/modulefiles:/usr/share/modulefiles/Linux:/usr/share/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core

so /cluster/shared/apps/singularity-hpc does indeed get added to $MODULEPATH.

Looking at my command history, could one of these have caused $MODULEPATH to be updated?

shpc config set container_base /cluster/shared/apps/shpc-containers
shpc config set module_sys lmod

@marcodelapierre
Contributor

Instead of the one you have just mentioned, the correct module use command should be:

module use /cluster/shared/apps/singularity-hpc/modules

Using this one should fix the incorrect ml avail output, let me know :)
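To make the corrected command persistent without duplicating it on every run, a minimal sketch: append_once is a hypothetical helper that appends a line only if the file does not already contain it verbatim.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE unless FILE already contains it as a
# whole line, so it is safe to re-run (e.g. from a config-management tool).
append_once() {
  local file="$1" line="$2"
  touch -- "$file"
  if grep -qxF -- "$line" "$file"; then
    return 0                                  # already present
  fi
  # If the file does not end with a newline, add one so LINE starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 -- "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}
```

For a single user: append_once ~/.bash_profile 'module use /cluster/shared/apps/singularity-hpc/modules'. For all users, the same line could go into a file of your choosing under /etc/profile.d/, provided it is sourced after the Lmod init script; that ordering is an assumption to verify on your nodes.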

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Jan 8, 2025

Using this one should fix the incorrect ml avail output, let me know :)

Ah now that looks much better:

module use /cluster/shared/apps/singularity-hpc/modules
 ml av
------------------- /cluster/shared/apps/singularity-hpc/modules --------------------
   bids/freesurfer/latest/module

Any ideas on how to fix having both Lmod and environment-modules installed without having to do the steps below on every node? I suppose this could be added to an Ansible playbook, but it just seems clumsy to scale it the way you suggested:

  1. get rid of /etc/alternatives/modules.sh and /etc/profile.d/modules.sh
  2. check whether Lmod ml is still available - where is this setup? .bashrc/.bash_profile/.profile ? some other .sh file in /etc/profile.d/ ?
  3. if still needed after 2., make sure that Lmod's /usr/share/lmod/lmod/init/profile is sourced somewhere (eg by creating a symlink to /etc/profile.d/z00_lmod.sh)
  4. make sure the conditional /etc/profile.d/00-modulepath.sh works properly with your Lmod-only setup

Perhaps this is a feature request: supporting use and configuration with both Lmod and environment-modules installed?

And thanks for your patience and for responding to my issues; I bet this thread will help others in the future.

@marcodelapierre
Contributor

Hi, no worries, these are indeed enjoyable and useful conversations within the community :)

I don't have much experience on the cluster sysadmin side, nor with removing one of the module installations. I would suggest looking for a scriptable, effective way of removing it that can be implemented within your cluster manager; for this second part, consulting with other members of your cluster operations team may help, I reckon.
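One scriptable shape for this, as a sketch only: rollout_lmod_only is a hypothetical function that prints the commands for each node by default (DRY_RUN=1) and runs them over ssh only when DRY_RUN=0. The command list mirrors the steps discussed in this thread and must be adapted to your nodes; a cluster manager or Ansible playbook would normally own this instead.

```shell
#!/usr/bin/env bash
# Hypothetical rollout sketch. Prints "[node] command" lines by default (DRY_RUN=1);
# with DRY_RUN=0 it runs each command on each node over ssh.
# The command list is a placeholder taken from this thread, not a tested procedure.
rollout_lmod_only() {
  local node cmd
  local -a cmds=(
    "rm -f /etc/alternatives/modules.sh"
    "rm -f /etc/profile.d/modules.sh"
    "ln -sfn /usr/share/lmod/lmod/init/profile /etc/profile.d/z00_lmod.sh"
  )
  for node in "$@"; do
    for cmd in "${cmds[@]}"; do
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "[$node] $cmd"
      else
        # Stop at the first failing command on this node and move to the next node.
        ssh -- "$node" "$cmd" || { echo "[$node] failed: $cmd" >&2; break; }
      fi
    done
  done
}
```

Review the dry-run output first (rollout_lmod_only node001 node002), then rerun with DRY_RUN=0 once the list matches your setup.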

Regarding using both module systems at once, as far as I know this is not a limitation of SHPC per se. I may be wrong, but Lmod and EnvModules themselves were designed as alternatives to each other. As such, their shell setups have inherently overlapping and conflicting aspects.
In my experience, I have only seen deployments with one of them installed. However, this is just my experience. I encourage you to reach out to their developer teams (https://github.com/tacc/lmod, https://github.com/envmodules/modules) and gauge their feedback on this.

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Jan 9, 2025

Well, I uninstalled Lmod, and module use /cluster/shared/apps/singularity-hpc/modules runs but does not make freesurfer available. Consequently, creating a symlink to /usr/share/lmod/lmod/init/profile won't work, as Lmod is uninstalled. $MODULEPATH does, however, include the SHPC path:

echo $MODULEPATH
/cluster/shared/apps/singularity-hpc/modules:/etc/scl/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles:/cluster/shared/modulefiles

What else could be preventing the module from appearing?

I also ran shpc config set module_sys tcl, as Lmod is now gone. No difference.

Running the following results in the below error:

ml bids/freesurfer/latest/module
Module ERROR: Magic cookie '#%Module' missing
  In '/cluster/shared/apps/singularity-hpc/modules/bids/freesurfer/latest/module.lua'
  Please contact <root@localhost>

So is an additional step needed for Environment Modules?

EDIT: I see that for Environment Modules I need to use --module-sys tcl, so I ran:
shpc install --module-sys tcl bids/freesurfer

However I get this error:

 module load bids/freesurfer/V30-a43f1f/module.tcl
Loading bids/freesurfer/V30-a43f1f/module.tcl
  Module ERROR: No extra specification allowed on this command
        while executing
    "conflict bids/freesurfer:V30-a43f1f"
        (file "/cluster/shared/apps/singularity-hpc/modules/bids/freesurfer/V30-a43f1f/module.tcl" line 85)
    Please contact <root@localhost>

What's interesting is that the help command works:

 module help bids/freesurfer/latest
-------------------------------------------------------------------
Module Specific Help for /cluster/shared/apps/singularity-hpc/modules/bids/freesurfer/V30-a43f1f/module.tcl:

This module is a singularity container wrapper for bids/freesurfer:V30-a43f1f vV30-a43f1f

So, line 85 in context:

     80 # If we have wrapper base set, honor it, otherwise we use the moduleDir
     81 set wrapperDir "/cluster/shared/apps/singularity-hpc/modules/bids/freesurfer/V30-a43f1f"
     82 
     83 # conflict with modules with the same alias name
     84 conflict freesurfer
     85 conflict bids/freesurfer:V30-a43f1f

Here's a module --debug output:

module --debug use /cluster/shared/apps/singularity-hpc/modules 
DEBUG setState: cmdline set to '/usr/share/Modules/libexec/modulecmd.tcl bash --debug use /cluster/shared/apps/singularity-hpc/modules'
DEBUG setState: machine set to 'x86_64'
DEBUG setConf: tcl_ext_lib set to '/usr/lib64/libtclenvmodules.so'
DEBUG Load Tcl extension library (/usr/lib64/libtclenvmodules.so)
DEBUG setState: tcl_ext_lib_loaded set to '1'
DEBUG setConf: siteconfig set to '/etc/environment-modules/siteconfig.tcl'
DEBUG sourceSiteConfig: Source site configuration (/etc/environment-modules/siteconfig.tcl)
DEBUG setState: siteconfig_loaded set to '1'
DEBUG setConf: locked_configs set to ''
DEBUG setState: supported_shells set to 'sh bash ksh zsh csh tcsh fish cmd tcl perl python ruby lisp cmake r'
DEBUG setState: shell set to 'bash'
DEBUG setState: subcmd set to 'use'
DEBUG setState: subcmd_args set to '/cluster/shared/apps/singularity-hpc/modules'
DEBUG setState: init_error_report set to '1'
DEBUG setConf: verbosity set to 'debug'
DEBUG setState: is_stderr_tty set to '1'
DEBUG setConf: term_background set to 'dark'
DEBUG setConf: colors set to 'hi=1:db=2:tr=2:se=2:er=91:wa=93:me=95:in=94:mp=1;94:di=94:al=96:va=93:sy=95:de=4:cm=92:aL=100:L=90;47:H=2:F=41:nF=43:S=46:sS=44:kL=30;48;5;109'
DEBUG setConf: color set to '1'
DEBUG setConf: pager set to '/usr/bin/less -eFKRX'
DEBUG setState: paginate set to '1'
DEBUG setState: report_format set to 'regular'
DEBUG setState: reportfd set to 'file4'
DEBUG setState: timer set to '0'
DEBUG lappendState: modulefile appended with '{}'
DEBUG parseModuleCommandName: (command=use, cmdvalid=1, cmdempty=0)
DEBUG setConf: avail_indepth set to '1'
DEBUG setConf: search_match set to 'starts_with'
DEBUG parseModuleCommandArgs: (show_oneperline=0, show_mtime=0, show_filter=, search_filter=, search_match=starts_with, dump_state=0, addpath_pos=prepend, not_req=0, tag_list=, otherargs=/cluster/shared/apps/singularity-hpc/modules)
DEBUG setConf: advanced_version_spec set to '1'
DEBUG setConf: variant_shortcut set to ''
DEBUG lappendState: always_read_full_file appended with '1'
DEBUG lappendState: commandname appended with 'use'
DEBUG setState: rc_running set to '1'
DEBUG setConf: ignore_user_rc set to '0'
DEBUG unsetState: rc_running unset
DEBUG setState: cwd set to '/root'
DEBUG lappendState: mode appended with 'load'
DEBUG add-path: (--ignore-refcount MODULEPATH /cluster/shared/apps/singularity-hpc/modules) cmd=prepend-path, mode=load, dflbhv=prepend
DEBUG setState: is_win set to '0'
DEBUG setState: path_separator set to ':'
DEBUG parsePathCommandArgs: (delim=:, allow_dup=0, idx_val=0, ign_refcount=1, bhv=prepend, var=MODULEPATH, val=/cluster/shared/apps/singularity-hpc/modules, nbval=1)
DEBUG unset-env: __MODULES_PUSHENV_MODULEPATH (internal=1, val=)
DEBUG getReferenceCountArray: (var=MODULEPATH, delim=:) got '/cluster/shared/modulefiles 1 /etc/scl/modulefiles 1 /cluster/shared/apps/singularity-hpc/modules 1 /usr/share/modulefiles 1 /etc/modulefiles 1 /usr/share/Modules/modulefiles 1'
DEBUG set-env: MODULEPATH=/cluster/shared/apps/singularity-hpc/modules:/etc/scl/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles:/cluster/shared/modulefiles
DEBUG setState: mode set to ''
DEBUG setState: commandname set to ''
DEBUG setState: always_read_full_file set to ''
DEBUG setState: autoinit set to '0'
DEBUG setState: shelltype set to 'sh'
DEBUG setState: error_count set to '0'
DEBUG setState: return_false set to '0'

Edit: someone on the environment-modules GitHub repository wrote this:

The conflict statement conflict bids/freesurfer:V30-a43f1f is not valid due to the use of the : character which is the extra match specifier.

I believe there is an issue with the generation of this modulefile. If the goal is to say that only one bids/freesurfer module can be loaded, the conflict statement should be conflict bids/freesurfer.

I would suggest to report this to the singularity-hpc team.

Edit: Indeed removing in line 85 the :V30-a43f1f allows the module to load.
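Until the generator emits a valid conflict line, a sketch of the manual workaround applied above: strip_conflict_tags is a hypothetical helper that keeps a .bak copy and drops a ":tag" suffix from the first name on each conflict line. Any edit like this is lost if shpc regenerates the module.

```shell
#!/usr/bin/env bash
# Hypothetical workaround: turn "conflict name:tag" into "conflict name" in a
# generated module.tcl (Env Modules treats ":" as an extra-specifier separator).
# Only the first name on each conflict line is handled; FILE.bak keeps the original.
strip_conflict_tags() {
  local file="$1"
  sed -i.bak -E 's/^([[:space:]]*conflict[[:space:]]+[^[:space:]:]+):[^[:space:]]+/\1/' "$file"
}
```

For example: strip_conflict_tags /cluster/shared/apps/singularity-hpc/modules/bids/freesurfer/V30-a43f1f/module.tcl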

@SomePersonSomeWhereInTheWorld
Author

So on another cluster I get this error:

ml tensorflow/tensorflow/2.18.0rc0/module.tcl 
Lmod has detected the following error:
/share/apps/Miniforge/lib/python3.12/site-packages/modules/tensorflow/tensorflow/2.18.0rc0/module.tcl:
(tensorflow/tensorflow/2.18.0rc0/module.tcl): expected boolean value but got
"bash"
While processing the following module(s):
    Module fullname                             Module Filename
    ---------------                             ---------------
    tensorflow/tensorflow/2.18.0rc0/module.tcl  /share/apps/Miniforge/lib/python3.12/site-packages/modules/tensorflow/tensorflow/2.18.0rc0/module.tcl

bash is mentioned here:

/share/apps/Miniforge/lib/python3.12/site-packages/modules/tensorflow/tensorflow/2.18.0rc0/module.tcl
if { [ module-info shell bash ] } {
  if { [ module-info mode load ] } {
 

  }
  if { [ module-info mode remove ] } {

@SomePersonSomeWhereInTheWorld
Author

SomePersonSomeWhereInTheWorld commented Jan 11, 2025

I was advised, via the environment-modules GitHub repository, to share this:

change the syntax from:

if { [ module-info shell bash ] } 

to

 if { [ module-info shell ] eq {bash} } {

In module.tcl
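A sketch of the suggested rewrite as a one-off fix for already-generated files: fix_module_info_shell is a hypothetical helper that keeps a .bak copy, only touches the one-argument [ module-info shell NAME ] form that Lmod rejected with "expected boolean value but got bash", and is safe to run twice.

```shell
#!/usr/bin/env bash
# Hypothetical workaround: rewrite "[ module-info shell NAME ]" tests in a
# module.tcl into "[ module-info shell ] eq {NAME}", keeping FILE.bak.
# Already-rewritten lines no longer match, so re-running is harmless.
fix_module_info_shell() {
  local file="$1"
  sed -i.bak -E 's/\[[[:space:]]*module-info[[:space:]]+shell[[:space:]]+([A-Za-z]+)[[:space:]]*]/[ module-info shell ] eq {\1}/g' "$file"
}
```

For example: fix_module_info_shell /share/apps/Miniforge/lib/python3.12/site-packages/modules/tensorflow/tensorflow/2.18.0rc0/module.tcl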

@xdelaruelle
Contributor

I was advised, via the environment-modules GitHub repository, to share this:

change the syntax from:

if { [ module-info shell bash ] } 

to

 if { [ module-info shell ] eq {bash} } {

In module.tcl

Hello SHPC team.

For the point mentioned by @SomePersonSomeWhereInTheWorld, I have just opened a pull request: #682.

Regards,
Xavier
