
sort out confusion/duplicates in tarballs handling from TW to sched to Wrapper #8699

Closed
belforte opened this issue Sep 18, 2024 · 7 comments
belforte commented Sep 18, 2024

Currently the TW sends one file to the scheduler: InputFiles.tar.gz

  • expanded by dag_bootstrap_startup.sh
  • content in [*]
  • contains the two code tarballs
    • CMSRunAnalysis.tar.gz (which is shipped to WN)
    • TaskManagerRun.tar.gz
  • also contains the user sandbox.tar.gz

CMSRunAnalysis.tar.gz is expanded by CMSRunAnalysis.sh [1]. It has

  • WMCore.zip
  • CMSRunAnalysis.py, TweakPSet.py and three other .py files which are likely not needed [2]

TaskManagerRun.tar.gz contains files used by scripts running on the scheduler [4]

  • expanded by dag_bootstrap.sh [3]
  • contains files which are duplicates of what's in CMSRunAnalysis.tar.gz [4]
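The nesting described above can be reproduced in a small self-contained sketch (file names are taken from this issue; the contents are placeholders, not the real CRAB files):

```python
import io
import tarfile

def make_tar(names):
    """Return the bytes of a gzipped tarball containing empty files `names`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tf:
        for name in names:
            tf.addfile(tarfile.TarInfo(name), io.BytesIO(b''))
    return buf.getvalue()

# the two code tarballs plus the user sandbox (placeholder members)
cmsrun = make_tar(['WMCore.zip', 'CMSRunAnalysis.py', 'TweakPSet.py'])
tmrun = make_tar(['CRAB3.zip', 'task_process/cache_status.py'])
sandbox = make_tar(['user_code.py'])

# the outer InputFiles.tar.gz wraps the three tarballs above
outer = io.BytesIO()
with tarfile.open(fileobj=outer, mode='w:gz') as tf:
    for name, data in [('CMSRunAnalysis.tar.gz', cmsrun),
                       ('TaskManagerRun.tar.gz', tmrun),
                       ('sandbox.tar.gz', sandbox)]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

outer.seek(0)
with tarfile.open(fileobj=outer, mode='r:gz') as tf:
    members = tf.getnames()
print(members)  # the three nested tarballs
```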

[*]

annotated content

| Action needed | File | Purpose |
| --- | --- | --- |
| N | gWMS-CMSRunAnalysis.sh | the JobWrapper; the command passed to HTC in the JDL when submitting to the global pool |
| N | submit_env.sh | used by the job wrapper |
| N | CMSRunAnalysis.sh | used by the job wrapper to unpack CMSRunAnalysis.tar.gz |
| Y | cmscp.py | used by the job wrapper (could be moved inside CMSRunAnalysis.tar.gz) |
| Y | cmscp.sh | used by the job wrapper (could be moved inside CMSRunAnalysis.tar.gz) |
| Y | RunJobs.dag | the DAG to be run for this task. Also sent via transfer_input_files? REMOVE? |
| N | Job.submit | template for the JDL when submitting to the grid; will be customized into Job.1.submit etc. |
| N | dag_bootstrap.sh | used inside the DAG to set up the env for the PreJob, PostJob and PreDag steps |
| N | AdjustSites.py | called by dag_bootstrap_startup.sh |
| Y | site.ad | list of available sites for each job in classAd format (obsolete?); manipulated by AdjustSites but never used? REMOVE? |
| Y | site.ad.json | same in JSON format. Used in PreJob.py. Not touched in AdjustSites! SHOULD CHANGE? |
| N | datadiscovery.pkl | for automatic splitting |
| N | taskinformation.pkl | for automatic splitting |
| N | taskworkerconfig.pkl | for automatic splitting |
| N | run_and_lumis.tar.gz | input runs and lumis for each job |
| N | input_files.tar.gz | input files for each job |
| Y | debug/ | subdirectory containing the two files below, uploaded by CRABClient to S3. Can remove? |
| Y | debug/crabConfig.py | user config file, useful debugging info |
| Y | debug/originalPSet.py | user PSet file, useful debugging info |
| Y | debug/ | user ScriptExe file (if present), useful debugging info |
| N | sandbox.tar.gz | the user files, uploaded by CRABClient to S3 |
| Y | CMSRunAnalysis.tar.gz | to be transferred to the running job via the HTC file transfer mechanism |
| Y | TaskManagerRun.tar.gz | |
| N | input_dataset_lumis.json | list of lumis in the input dataset, for use by crab report |
| Y | input_dataset_duplicate_lumis.json | retrieved by crab report, but ... needed? |
| N | debug_files.tar.gz | tarball whose content has been expanded into debug/. DUPLICATION |
| N | input_args.json | to be fetched by CRABClient and used in preparelocal |

[1]

tar xmf CMSRunAnalysis.tar.gz || exit 10042

[2]
WMCore.zip
TweakPSet.py
CMSRunAnalysis.py
ServerUtilities.py Not needed
CMSGroupMapper.py Not needed
RESTInteractions.py Not needed

[3]

tar xvfm TaskManagerRun.tar.gz

[4]

CRAB3.zip
TweakPSet.py Not needed
CMSRunAnalysis.py Not needed
task_process/
task_process/CMSRucio.py
task_process/FTS_Transfers.py
task_process/RUCIO_Transfers.py
task_process/cache_status.py
task_process/task_proc_wrapper.sh
ServerUtilities.py
RucioUtils.py
CMSGroupMapper.py
RESTInteractions.py
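A quick way to quantify the duplication between the two code tarballs is to intersect the member lists from [2] and [4] above (lists copied verbatim, "Not needed" annotations dropped):

```python
# Member lists of the two code tarballs, copied from [2] and [4] above.
cmsrun_analysis = {
    'WMCore.zip', 'TweakPSet.py', 'CMSRunAnalysis.py',
    'ServerUtilities.py', 'CMSGroupMapper.py', 'RESTInteractions.py',
}
task_manager_run = {
    'CRAB3.zip', 'TweakPSet.py', 'CMSRunAnalysis.py',
    'task_process/CMSRucio.py', 'task_process/FTS_Transfers.py',
    'task_process/RUCIO_Transfers.py', 'task_process/cache_status.py',
    'task_process/task_proc_wrapper.sh',
    'ServerUtilities.py', 'RucioUtils.py',
    'CMSGroupMapper.py', 'RESTInteractions.py',
}
# files shipped twice: once to the scheduler, once to the WN
duplicates = sorted(cmsrun_analysis & task_manager_run)
print(duplicates)
```

Five of the six CMSRunAnalysis.tar.gz members are duplicated in TaskManagerRun.tar.gz.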

@belforte belforte self-assigned this Sep 18, 2024

belforte commented Sep 24, 2024

fallout action items

[1] see also

def parseJobAd(filename):
""" Parse the jobad file provided as argument and return a dict representing it
SB: why do we have this ? the classAd object returned by classad.parse has
the semantic of a dictionary ! Currently it is only used in cmscp.py
and in job wrapper we are not sure to have HTCondor available.
Note that we also have a parseAd() method inside CMSRunAnalysis.py which should
do finely also in cmscp.py
"""


belforte commented Sep 24, 2024

How files make their way to scheduler's SPOOL_DIR and WN

This is what DagmanCreator has:

inputFiles = ['gWMS-CMSRunAnalysis.sh', 'submit_env.sh', 'CMSRunAnalysis.sh', 'cmscp.py', 'cmscp.sh', 'RunJobs.dag', 'Job.submit', 'dag_bootstrap.sh',
'AdjustSites.py', 'site.ad', 'site.ad.json', 'datadiscovery.pkl', 'taskinformation.pkl', 'taskworkerconfig.pkl',
'run_and_lumis.tar.gz', 'input_files.tar.gz']

but also, quoting a comment in the code:

> Also, this prepareLocal method prepares a single "InputFiles.tar.gz" file with all the input files moved from the TW to the schedd. This is used by the client preparelocal command.

i.e.

tf = tarfile.open('InputFiles.tar.gz', mode='w:gz')
try:
    for ifname in inputFiles + subdags + ['input_args.json']:
        tf.add(ifname)
finally:
    tf.close()

So the code to create InputFiles.tar.gz was possibly created for preparelocal, but eventually
DagmanSubmitter puts this in the DAGJob.jdl

transfer_input_files = InputFiles.tar.gz, subdag.jdl

The way things work is that DagmanCreator returns

return info, params, ["InputFiles.tar.gz"], splitterResult

which gets passed as input to DagmanSubmitter, and in there

followed by
info['inputFilesString'] = ", ".join(inputFiles + ['subdag.jdl'])

and eventually
jobJDL["transfer_input_files"] = str(info['inputFilesString'])

that jobJDL object is saved in the DAGJob.jdl file in the TW's /data/srv/tmp/_<taskname> directory

with open('DAGJob.jdl', 'w', encoding='utf-8') as fd:
print(jobJDL, file=fd)
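The chain above can be condensed into a short sketch (variable names from the issue; this is the data flow the TW code implements, not the actual TW code):

```python
# DagmanCreator returns the single wrapper tarball as the input-file list...
inputFiles = ['InputFiles.tar.gz']

# ...and DagmanSubmitter appends subdag.jdl and writes the result into the JDL
info = {'inputFilesString': ', '.join(inputFiles + ['subdag.jdl'])}
jobJDL = {'transfer_input_files': str(info['inputFilesString'])}

print(jobJDL['transfer_input_files'])  # InputFiles.tar.gz, subdag.jdl
```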

Later on, cmscp.sh and cmscp.py make their way to the grid WN by being placed in the scheduler's SPOOL_DIR and then transferred by HTCondor again to the WN via this line in Job.submit:

transfer_input_files = CMSRunAnalysis.sh, cmscp.py, CMSRunAnalysis.tar.gz, sandbox.tar.gz, run_and_lumis.tar.gz, input_files.tar.gz, submit_env.sh, cmscp.sh

pff...what a mess

@belforte
Copy link
Member Author

belforte commented Sep 25, 2024

a more radical solution to "move cmscp.* inside tarball" would be to add two subdirectories in CRABServer/scripts (1)

scripts/
├── dagman
├── job_wrapper
└── task_process

or maybe better (2)

scripts
├── dagman
│   └── task_process
└── job_wrapper

so that in the tarball-building code

cp -r "${CRABSERVERDIR}/scripts"/{TweakPSet.py,CMSRunAnalysis.py,task_process} .
cp "${CRABSERVERDIR}/src/python"/{ServerUtilities.py,RucioUtils.py,CMSGroupMapper.py,RESTInteractions.py} .
echo "Making TaskManagerRun tarball"
tar zcf "${RUNTIME_WORKDIR}/TaskManagerRun.tar.gz" CRAB3.zip TweakPSet.py CMSRunAnalysis.py task_process ServerUtilities.py RucioUtils.py CMSGroupMapper.py RESTInteractions.py || exit 4
echo "Making CMSRunAnalysis tarball"
tar zcf "${RUNTIME_WORKDIR}/CMSRunAnalysis.tar.gz" WMCore.zip TweakPSet.py CMSRunAnalysis.py ServerUtilities.py CMSGroupMapper.py RESTInteractions.py || exit 4

we would only copy well-named directories around, as is done now for task_process,

and would also move the expansion of the job_wrapper tarball

tar xmf CMSRunAnalysis.tar.gz || exit 10042

to the top wrapper script between these lines

echo "======== PROXY INFORMATION FINISH at $(TZ=GMT date) ========"
echo "======== CMSRunAnalysis.sh at $(TZ=GMT date) STARTING ========"
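A minimal sketch of what layout (2) would buy: the tarball build step could shrink to one add per well-named directory instead of a long per-file list. Directory and file names below are illustrative placeholders following the proposed tree, not the real build code:

```python
import os
import tarfile
import tempfile

# Recreate the proposed layout (2) with placeholder files
workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, 'scripts/dagman/task_process'))
os.makedirs(os.path.join(workdir, 'scripts/job_wrapper'))
open(os.path.join(workdir, 'scripts/dagman/task_process/cache_status.py'), 'w').close()
open(os.path.join(workdir, 'scripts/job_wrapper/CMSRunAnalysis.py'), 'w').close()

tarball = os.path.join(workdir, 'TaskManagerRun.tar.gz')
with tarfile.open(tarball, 'w:gz') as tf:
    # one recursive add() per directory replaces enumerating individual files
    tf.add(os.path.join(workdir, 'scripts/dagman'), arcname='dagman')

with tarfile.open(tarball) as tf:
    names = tf.getnames()
print(names)
```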

@novicecpp I would very much like your opinion

(two comments by @belforte were marked as outdated)


belforte commented Sep 27, 2024

I will close this when #8718 and #8721 are merged.
Further refactoring as per the comment above is left to new issue #8727, and documenting the "tarball movements" as per #8699 (comment) is deferred until I have finalized #7461 and #6544; now tracked in #8728.

@belforte

conditions for closing met :-)
