Code Refactor #31
The refactored code can now launch Fluka jobs correctly. Due to the limitations of the CHTC system, the code was refactored based on the following assumptions.
These assumptions led to the following design changes.
The script looks for a certain directory structure within the run directory; specifically, it looks for
On the basis of what is passed, the script looks in the directories provided to determine what calculation should be performed. For example:
This tells the script to look in /home/davisa/fng_str/input for the input decks, that it is a Fluka calculation, and that each calculation should be run 10 times. The script tar.gz's everything within /home/davisa/fng_str and copies it to /squid/davisa, where the precompiled tar.gz's of the gcc and Fluka compilers exist. The script then builds the DAG to control the tasks using the dag_manager. (Since we can no longer tag the post-processing on as a child of this run, the only benefit of using the dag_manager is resubmission of failed runs.)
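As a rough illustration of the staging and DAG steps described above, here is a minimal Python sketch. The paths come from the example; the tarball name, the submit-file name `fluka_job.sub`, and the retry count are assumptions, not the actual script's behaviour:

```python
import subprocess
from pathlib import Path

# Assumed locations and counts; the real script derives these from its arguments.
RUN_DIR = Path("/home/davisa/fng_str")
SQUID_DIR = Path("/squid/davisa")
N_RUNS = 10  # each calculation run 10 times, per the example above

# Stage the run directory: tar.gz everything and place it on squid,
# where the precompiled gcc and Fluka tarballs already live.
tarball = SQUID_DIR / "fng_str.tar.gz"
subprocess.check_call(
    ["tar", "czf", str(tarball), "-C", str(RUN_DIR.parent), RUN_DIR.name]
)

# Write a DAGMan file so condor_dagman can resubmit failed runs;
# RETRY is the standard DAGMan directive for that.
with open("fluka.dag", "w") as dag:
    for i in range(N_RUNS):
        dag.write(f"JOB run{i} fluka_job.sub\n")
        dag.write(f'VARS run{i} run_index="{i}"\n')
        dag.write(f"RETRY run{i} 3\n")
```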
Does any of this change if we have our own submit machine over which we have full control and disk access? I think that's what all the productive HTCondor users do.
I don't know for sure, but I don't think so; it's the I/O of getting all the data to and from the compute nodes that is the issue, which is why we have to put things in squid and then wget them. If we had our own dedicated submit node it would make things easier, in the sense that a lot of the processing could be done there. However, at some point we have to deal with the issue of these large files. Take, for example, one of Tim's ITER FW models, which has 2 or 3 advanced tallies in it as well: the model itself takes 10 minutes to read in cross sections, and then it's another 40 minutes to build the kd-tree. This preprocessing is done in serial on another machine, which means almost an hour just to build the runtpe file for one calculation, where we may consider splitting into 1000 sub-calculations. We can of course parallelise this.

If instead we brought the xs with the calculation, in effect abandoning the idea of a continue run and storing the xs on squid, then there would be several hours of transferring xs data before the run begins, which is not much use either, since we would have several hours of dead time before any useful work is done. An alternative is to pull out the xs data that is needed for the calculation and build a custom xsdir and ace file for each calculation. Another issue we have is one of storage: to even get a big ITER calculation onto Condor will take several tens or even hundreds of GB. From our perspective that isn't really a problem, but for the Condor folks, who winced when I asked for 30 GB, it is.
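For the custom-xsdir alternative, a minimal sketch of the idea, assuming the standard MCNP xsdir layout where each cross-section entry begins with its ZAID token (e.g. 1001.70c); the file names and ZAID set here are hypothetical:

```python
# Hypothetical sketch: keep only the xsdir entries a given input deck
# actually references. A real version would also preserve the xsdir
# header (datapath, atomic weight ratios) and copy the ace files that
# each kept entry points at.
needed_zaids = {"1001.70c", "8016.70c", "26056.70c"}  # assumed example set

with open("xsdir") as full, open("xsdir_custom", "w") as trimmed:
    for line in full:
        tokens = line.split()
        if tokens and tokens[0] in needed_zaids:
            trimmed.write(line)
```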
I think having our own submit node will solve at least 2 problems:

1. We'll have a little bit more control over our environment for building our tools (although it will still have to be compatible with the execute machines).
2. We can put a big hard drive there as a launching/landing pad for the data as it comes and goes.

I think if we're clever, the initial cost of reading data and building the MOAB search trees (do we need both an OBB-tree for DAGMC and a KD-tree for mesh tallies?) is worth it if we can reuse the runtpe for each of the separate jobs. We should perhaps try to do a 2- to 4-way replication by hand and see what the moving parts actually look like.
The reuse of the runtpes is the key, and unfortunately I don't currently see how we can reuse them. In normal MCNP use we can; however, for advanced tallies we have to ensure the output mesh name is unique, and it cannot be reset after the runtpe has been written, hence the need for multiple runtpes. Unless we shift the meshtal setup routine?
What about different subdirectories?
Yeah, that would probably work. It just seemed a bit messy, but that's preferable to slow.
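If the subdirectory route is taken, a minimal sketch of the setup (directory names and the job count are assumptions): each job works in its own directory with its own copy of the shared runtpe, so the meshtal output it writes cannot collide with another job's.

```python
import shutil
from pathlib import Path

SHARED_RUNTPE = Path("runtpe")  # built once, then reused by every job
N_JOBS = 4                      # e.g. the 2- to 4-way hand test suggested above

for i in range(N_JOBS):
    job_dir = Path(f"job_{i:03d}")
    job_dir.mkdir(exist_ok=True)
    # Each job continues from its own copy of the runtpe and writes its
    # meshtal output inside its own directory, avoiding name clashes.
    shutil.copy2(SHARED_RUNTPE, job_dir / "runtpe")
```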
When the alpha version is released for use in the group, it would be beneficial to have an experienced Python user refactor and tidy the code from its current state.