Chained HTCondor tasks #193

Open
TheRealLoliges486 opened this issue Oct 31, 2024 · 2 comments

@TheRealLoliges486

Question

Hello,

What is the recommended way to run tasks with an HTCondor workflow that depend on other tasks with HTCondor workflows?

Concretely, I have a task called FTest whose subtasks are FTestCategory tasks, which must run on HTCondor.
FTestCategory requires a task called Trees2WS, which in turn consists of Trees2WSSingleProcess subtasks that should also run on HTCondor.

When I execute law run FTest --workers 4, law creates the HTCondor submission for FTestCategory and then, on the node running that job, the HTCondor submission for Trees2WSSingleProcess. This ultimately fails, because on LXPLUS the HTCondor worker nodes themselves cannot access the schedd.
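The failure mode described above can be reproduced with a small toy model in plain Python. Everything here (ToyTask, run, SubmissionError, the host names "submit" and "worker") is hypothetical and is not law's API; the only assumption carried over from the question is that the submit host can reach the schedd while worker nodes cannot. Branches of an HTCondor workflow run on a worker, so any workflow among their requirements ends up being submitted from the worker.

```python
from dataclasses import dataclass, field
from typing import List


class SubmissionError(RuntimeError):
    """Raised when a submission is attempted from a host without schedd access."""


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task; this is not law's API."""
    name: str
    is_condor_workflow: bool = False
    branches: List["ToyTask"] = field(default_factory=list)
    requires: List["ToyTask"] = field(default_factory=list)


def run(task: ToyTask, host: str, log: List[str]) -> None:
    """Resolve requirements on `host`, then run or submit `task`.

    Assumption: only the host named "submit" can reach the schedd,
    mirroring LXPLUS, where worker nodes cannot submit jobs themselves.
    """
    for req in task.requires:
        run(req, host, log)
    if not task.is_condor_workflow:
        log.append(f"run {task.name} on {host}")
        return
    if host != "submit":
        raise SubmissionError(f"{task.name}: cannot submit from {host!r}")
    log.append(f"submit {task.name} from {host}")
    # Every branch becomes a job that executes on a worker node, so its own
    # requirements are resolved there as well.
    for branch in task.branches:
        run(branch, "worker", log)


# Chain from the question: FTest -> FTestCategory (HTCondor) -> Trees2WS
# -> Trees2WSSingleProcess (HTCondor); one branch each for brevity.
single = ToyTask("Trees2WSSingleProcess", is_condor_workflow=True,
                 branches=[ToyTask("Trees2WSSingleProcess[0]")])
trees2ws = ToyTask("Trees2WS", requires=[single])
category = ToyTask("FTestCategory", is_condor_workflow=True,
                   branches=[ToyTask("FTestCategory[0]", requires=[trees2ws])])
ftest = ToyTask("FTest", requires=[category])

log: List[str] = []
try:
    run(ftest, "submit", log)
except SubmissionError as err:
    print(err)  # Trees2WSSingleProcess: cannot submit from 'worker'
print(log)      # ['submit FTestCategory from submit']
```

The outer submission succeeds; the inner one is attempted from inside an FTestCategory job and fails, which matches the behaviour reported on LXPLUS.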

This is the resulting error, which I suspect is caused by the schedd being inaccessible from the Condor nodes:

Traceback (most recent call last):
  File "/afs/cern.ch/user/n/niharrin/cernbox/PhD/Higgs/CMSSW_14_1_0_pre4/src/flashggFinalFit/law/install_dir/lib/python3.9/site-packages/luigi/worker.py", line 210, in run
    new_deps = self._run_get_new_deps()
  File "/afs/cern.ch/user/n/niharrin/cernbox/PhD/Higgs/CMSSW_14_1_0_pre4/src/flashggFinalFit/law/install_dir/lib/python3.9/site-packages/luigi/worker.py", line 138, in _run_get_new_deps
    task_gen = self.task.run()
  File "/afs/cern.ch/user/n/niharrin/cernbox/PhD/Higgs/CMSSW_14_1_0_pre4/src/flashggFinalFit/law/install_dir/lib/python3.9/site-packages/law/workflow/remote.py", line 628, in run
    return self._run_impl()
  File "/afs/cern.ch/user/n/niharrin/cernbox/PhD/Higgs/CMSSW_14_1_0_pre4/src/flashggFinalFit/law/install_dir/lib/python3.9/site-packages/law/workflow/remote.py", line 700, in _run_impl
    self.submit()
  File "/afs/cern.ch/user/n/niharrin/cernbox/PhD/Higgs/CMSSW_14_1_0_pre4/src/flashggFinalFit/law/install_dir/lib/python3.9/site-packages/law/workflow/remote.py", line 882, in submit
    job_ids, submission_data = self._submit_group(submit_jobs)
  File "/afs/cern.ch/user/n/niharrin/cernbox/PhD/Higgs/CMSSW_14_1_0_pre4/src/flashggFinalFit/law/install_dir/lib/python3.9/site-packages/law/contrib/htcondor/workflow.py", line 190, in _submit_group
    c, p = job_id.split(".")
AttributeError: 'Exception' object has no attribute 'split'
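The last frame hints at what happens mechanically: _submit_group expects each job ID to be an HTCondor "cluster.proc" string, but an Exception object (presumably from the failed submission) arrived in its place, so .split(".") raised an AttributeError that hides the original error. The helper below is a hypothetical sketch of guarding that call, not law's actual code; the input values, including the "schedd unreachable" message, are made up for illustration.

```python
from typing import List, Tuple, Union


def split_job_ids(
    job_ids: List[Union[str, Exception]],
) -> Tuple[List[Tuple[str, str]], List[Exception]]:
    """Split "cluster.proc" IDs and collect failed submissions separately.

    Hypothetical helper: it only illustrates guarding the .split(".")
    call seen in the traceback; it is not law's implementation.
    """
    parsed: List[Tuple[str, str]] = []
    errors: List[Exception] = []
    for job_id in job_ids:
        if isinstance(job_id, Exception):
            # A failed submission: keep the original error instead of
            # crashing later with AttributeError on .split().
            errors.append(job_id)
            continue
        cluster, proc = job_id.split(".")
        parsed.append((cluster, proc))
    return parsed, errors


parsed, errors = split_job_ids(["1234.0", Exception("schedd unreachable"), "1234.1"])
print(parsed)  # [('1234', '0'), ('1234', '1')]
print(errors)  # [Exception('schedd unreachable')]
```

Surfacing the collected exceptions would show the underlying submission error (here, most likely the missing schedd access) rather than the AttributeError.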

How do you handle such chained HTCondor workflows?

Thanks a lot!!

@riga
Owner

riga commented Oct 31, 2024

Hi,

Two things before going into the details of the workflow -> task -> workflow pattern:

  1. The error you are seeing is a bug that we also stumbled upon recently. I will hopefully have time late next week to debug it further. It is quite elusive and seems to appear only in edge cases (at least on our end).

  2. To make sure I understand: is this the situation you want to achieve? (Workflows have a purple border.)

flowchart TD
    %% aliases
    ftest(FTest)
    ftestcat1[FTestCategory]
    ftestcat2[FTestCategory]
    t2ws1(Trees2WS)
    t2ws2(Trees2WS)
    t2wss11[Trees2WSSingleProcess]
    t2wss12[Trees2WSSingleProcess]
    t2wss21[Trees2WSSingleProcess]
    t2wss22[Trees2WSSingleProcess]

    %% styles
    classDef WF stroke: #83b, stroke-width: 3px

    %% assign styles
    class ftestcat1 WF
    class ftestcat2 WF
    class t2wss11 WF
    class t2wss12 WF
    class t2wss21 WF
    class t2wss22 WF
    
    %% actual graph
    ftest --> ftestcat1
    ftest --> ftestcat2
    ftestcat1 --> t2ws1
    ftestcat2 --> t2ws2
    t2ws1 --> t2wss11
    t2ws1 --> t2wss12
    t2ws2 --> t2wss21
    t2ws2 --> t2wss22

If not, feel free to change the graph and paste it here on GitHub in a ```mermaid code block.

@TheRealLoliges486
Author

TheRealLoliges486 commented Oct 31, 2024

Hi,

Yes, for now this is the situation I want to achieve. Ideally, Trees2WS should run only once per law invocation, since it produces all the ingredients for FTestCategory.
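Whether Trees2WS runs once depends on task identity. Luigi, which law builds on, derives a task_id from the task family and its significant parameter values, and within one scheduler two requirements with the same task_id are treated as a single task. The sketch below models that rule with a simplified TaskKey; the parameter names "category" and "process" are hypothetical. It assumes Trees2WS carries no category-specific parameter. Separate remote jobs presumably resolve their requirements independently, so this deduplication would not span job boundaries.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class TaskKey:
    """Identity of a task instance: family name plus parameter values.

    Simplified stand-in for luigi's task_id, which is also built from the
    task family and its significant parameters.
    """
    family: str
    params: Tuple[Tuple[str, str], ...] = ()


def collect(graph: Dict[TaskKey, List[TaskKey]], root: TaskKey) -> List[TaskKey]:
    """Return every task reachable from root exactly once, dependencies first."""
    seen: Set[TaskKey] = set()
    order: List[TaskKey] = []

    def visit(key: TaskKey) -> None:
        if key in seen:
            return  # same identity: already scheduled, do not add it again
        seen.add(key)
        for dep in graph.get(key, []):
            visit(dep)
        order.append(key)

    visit(root)
    return order


# Graph from the chart above, with a single parameter-free Trees2WS shared
# by both FTestCategory instances (hypothetical parameter names).
t2ws = TaskKey("Trees2WS")
cat0 = TaskKey("FTestCategory", (("category", "cat0"),))
cat1 = TaskKey("FTestCategory", (("category", "cat1"),))
graph = {
    TaskKey("FTest"): [cat0, cat1],
    cat0: [t2ws],
    cat1: [t2ws],
    t2ws: [TaskKey("Trees2WSSingleProcess", (("process", "p0"),)),
           TaskKey("Trees2WSSingleProcess", (("process", "p1"),))],
}
order = collect(graph, TaskKey("FTest"))
print([k.family for k in order])
```

Both categories point at the same Trees2WS key, so it appears once in the resulting order, after its two Trees2WSSingleProcess dependencies.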
