This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

QUEUING AND RUNNING TIME ON TREE #115

Open
kinow opened this issue Jul 24, 2024 · 9 comments

Comments

@kinow
Member

kinow commented Jul 24, 2024

In GitLab by @vsicardi on Jul 24, 2024, 17:46

Hi,

I have a question about the queuing and running time display in the tree visualization in the new Autosubmit GUI.

None of the jobs run after the Autosubmit GUI update show their queuing or running time.

Here is a screenshot as an example

image

I am aware I can check all the statistics on the dedicated page (https://earth.bsc.es/autosubmitapp/experiment/a7i9/stats?section=SIM&hours=0), but I wonder if this feature has been eliminated on purpose or if there is an issue with the visualization.

I personally find it quite useful, and it has helped me spot errors in experiments' output more than once.

Thank you

@kinow
Member Author

kinow commented Jul 25, 2024

In GitLab by @mcastril on Jul 25, 2024, 18:04

Hi @vsicardi. It is definitely not on purpose, and it may well not depend on the GUI at all.

For example, this other experiment shows queuing and run times in the GUI:

https://earth.bsc.es/autosubmitapp/experiment/a7gu/tree

Did you have Autosubmit up all the time? Did you modify the status with setstatus or recovery at any point? I am trying to guess what could have happened.

@kinow
Member Author

kinow commented Jul 25, 2024

In GitLab by @mcastril on Jul 25, 2024, 18:10

/esarchive/autosubmit/a7i9/tmp/a7i9_18500101_fc1_34_SIM_TOTAL_STATS shows a minimal waiting time, but definitely not 0. The running time is not 0 either.

cat a7i9_18500101_fc1_34_SIM_TOTAL_STATS
20240725000453 20240725000517 20240725014819 COMPLETED
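
For reference, the queuing and running times implied by that line can be worked out as below. This is only a minimal sketch: it assumes the three fields are the submit, start, and finish timestamps in `YYYYMMDDHHMMSS` format, and the helper name `parse_total_stats_line` is made up for illustration (it is not an Autosubmit function).

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"  # assumed timestamp layout of TOTAL_STATS fields


def parse_total_stats_line(line):
    """Return (queue_seconds, run_seconds, status) for one TOTAL_STATS line.

    Assumes the whitespace-separated fields are: submit, start, finish, status.
    """
    submit, start, finish, status = line.split()
    submit_t = datetime.strptime(submit, TS_FORMAT)
    start_t = datetime.strptime(start, TS_FORMAT)
    finish_t = datetime.strptime(finish, TS_FORMAT)
    queue = int((start_t - submit_t).total_seconds())
    run = int((finish_t - start_t).total_seconds())
    return queue, run, status


line = "20240725000453 20240725000517 20240725014819 COMPLETED"
queue, run, status = parse_total_stats_line(line)
print(queue, run, status)  # 24 6182 COMPLETED
```

Under that assumption, the job queued for 24 seconds and ran for about 1 hour 43 minutes, so neither value should be displayed as 0.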

I don't see anything strange in the ASLOG

2024-07-24 20:51:34,093 Successful check job command: ecaccess-job-list 22447256
2024-07-24 20:51:34,094 a7i9_18500101_fc1_33_SIM job seems to have completed: checking...
2024-07-24 20:51:36,846 Job a7i9_18500101_fc1_33_SIM is COMPLETED
2024-07-24 20:51:39,198 a7i9_18500101_fc1_33_SIM_STAT file have been transfered
2024-07-24 20:51:39,403 End of checking
2024-07-24 20:51:39,403 Updating FAILED jobs
2024-07-24 20:51:39,406 Updating WAITING jobs

@kinow
Member Author

kinow commented Jul 26, 2024

In GitLab by @vsicardi on Jul 26, 2024, 11:54

Hi @mcastril,

Thank you for your reply.

Actually, I cannot answer your questions thoroughly. I did not have any particular issues with this experiment; I guess I changed the status of some FAILED jobs, and Autosubmit probably stopped at some point, so I relaunched the experiment. I am sure I never did a recovery. I wish I could give you more details, but I do not know how to retrieve a record of all these actions.

I have only noticed the missing times since the app update; maybe it just happened this once and will not happen again.

@kinow
Member Author

kinow commented Jul 26, 2024

In GitLab by @mcastril on Jul 26, 2024, 12:32

Hi @vsicardi

Thanks a lot. Let me clarify that my second message was meant for Dani, Bruno, or Luiggi, so that they have more information.

I think you answered the questions well. I forgot to say it, but through the logs I could verify that Autosubmit had been running without interruption for a few days, covering the whole period in which the jobs show no running or queuing time.

I am 99% sure that it's not related to the new GUI, but I don't blame you for thinking so. We have to figure out what could have happened, even if it doesn't happen often. If it happened, there is some sort of bug somewhere.

@kinow
Member Author

kinow commented Jul 29, 2024

In GitLab by @vsicardi on Jul 29, 2024, 08:26

Hi,

If it can be of any help, I just noticed that while a job is running, it shows both the queuing time and the running time. They disappear when the job finishes (in both cases, failed or completed).

image

thanks

@kinow
Member Author

kinow commented Jul 31, 2024

In GitLab by @mcastril on Jul 31, 2024, 09:23

Hi @vsicardi. Thank you. I think this information is helpful. This is the only experiment that you are running, right?

If you start any other experiment, it would help us rule out possible sources of the problem.

@kinow
Member Author

kinow commented Aug 16, 2024

In GitLab by @ltenorio on Aug 16, 2024, 15:10

I see that this problem comes from the job_data table of the job_data_a7i9.db file. When a job is COMPLETED, its time values are taken from the last run recorded in this table. However, the last recorded run does not correspond to the current run. You can see this by selecting the Tree View of the last run:

Screenshot_from_2024-08-16_14-58-48

This run only has data up to a7i9_18500101_fc1_21_SIM, so when you go back to the current run from the pkl file, there is no such information for a7i9_18500101_fc1_22_SIM, because this job didn't exist in the last recorded run of the database file.

image

So, the cause of this issue is that the database file doesn't record the data of the last run.
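
A quick way to check for this kind of mismatch could look like the sketch below. It is only an illustration: it assumes the job_data table has `job_name` and `run_id` columns, uses a toy in-memory database in place of job_data_a7i9.db, and the function name `jobs_missing_from_last_run` is hypothetical.

```python
import sqlite3


def jobs_missing_from_last_run(conn, current_jobs):
    """Return the jobs in current_jobs that have no row in the latest run_id."""
    last_run = conn.execute("SELECT MAX(run_id) FROM job_data").fetchone()[0]
    rows = conn.execute(
        "SELECT DISTINCT job_name FROM job_data WHERE run_id = ?", (last_run,)
    ).fetchall()
    recorded = {name for (name,) in rows}
    return [job for job in current_jobs if job not in recorded]


# Toy in-memory database standing in for job_data_a7i9.db (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_data (job_name TEXT, run_id INTEGER)")
conn.executemany(
    "INSERT INTO job_data VALUES (?, ?)",
    [("a7i9_18500101_fc1_20_SIM", 1), ("a7i9_18500101_fc1_21_SIM", 1)],
)

# Jobs known to the current run (for example, from the pkl file).
current = ["a7i9_18500101_fc1_21_SIM", "a7i9_18500101_fc1_22_SIM"]
missing = jobs_missing_from_last_run(conn, current)
print(missing)  # ['a7i9_18500101_fc1_22_SIM']
```

Any job reported as missing would have no time values to display if the times are read only from the last recorded run.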

PS: I saw that, in newer versions of Autosubmit, the function that populates the times was patched to use the TOTAL_STATS file for jobs with COMPLETED status instead of the values from the database. However, this patch was not applied to the API. @dbeltran why was this patch made? (This might not be the cause of the issue.)

Screenshot_from_2024-08-16_14-50-06
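
The idea behind that patch, as I understand it, could be sketched roughly as follows. This is not the actual Autosubmit code: the function names `times_from_total_stats` and `job_times` are hypothetical, and the TOTAL_STATS line is assumed to hold the submit, start, and finish timestamps in `YYYYMMDDHHMMSS` format followed by the status.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"  # assumed TOTAL_STATS timestamp layout


def times_from_total_stats(line):
    """Return (queue_seconds, run_seconds) from one TOTAL_STATS line."""
    submit, start, finish, _status = line.split()
    submit_t = datetime.strptime(submit, TS_FORMAT)
    start_t = datetime.strptime(start, TS_FORMAT)
    finish_t = datetime.strptime(finish, TS_FORMAT)
    queue = int((start_t - submit_t).total_seconds())
    run = int((finish_t - start_t).total_seconds())
    return queue, run


def job_times(status, db_times, total_stats_line):
    """Pick the (queue, run) pair to display for a job.

    db_times: (queue, run) from the last recorded run, or None if absent.
    total_stats_line: last line of the job's TOTAL_STATS file, or None.
    Returns None when no source has data for the job.
    """
    # For COMPLETED jobs, prefer TOTAL_STATS over the database values.
    if status == "COMPLETED" and total_stats_line:
        return times_from_total_stats(total_stats_line)
    return db_times


stats = "20240725000453 20240725000517 20240725014819 COMPLETED"
print(job_times("COMPLETED", None, stats))   # (24, 6182)
print(job_times("RUNNING", (30, 120), None))  # (30, 120)
```

With this ordering, a COMPLETED job that is absent from the last recorded run still gets its times, as long as its TOTAL_STATS file exists.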

@kinow
Member Author

kinow commented Aug 19, 2024

In GitLab by @dbeltran on Aug 19, 2024, 09:15

Hello @ltenorio ,

If I recall correctly, this was done to fix the issues with the stats command, but I am not sure because it is part of the 4.1 mega commit: https://earth.bsc.es/gitlab/es/autosubmit/-/issues/1098. The problem was that the queuing time was 0 (if I recall correctly).

@kinow
Member Author

kinow commented Aug 21, 2024

In GitLab by @mcastril on Aug 21, 2024, 18:29

I would have said that the patch was made to deal with the problems we had in some experiments whose running and queuing time data were not populated. But the patch was developed by Dani and I don't know where it was applied. Dani, are we talking about the same thing here?
