
Commit add6b74

Merge pull request AliceO2Group#292 from nicolaspoffley/master
Explanation of CPU warning in wagon vs train test, operator explanation of derived data scheduling, linked derived data addition.
2 parents 8e7831b + 64978bf commit add6b74


4 files changed (+19, -1 lines)

docs/faq/README.md

+4
@@ -34,3 +34,7 @@ The wagon test runs for about 5 minutes and is then stopped. Therefore, if you h
 ### What is the job error ERROR_EW?

 Hyperloop trains have a so-called express train feature. This feature is based on the fact that the last few percent of jobs usually take the longest time (not in execution time but in being scheduled on a site), so trains can take twice as long in total just to process the last few percent. Therefore, up to 2% of the jobs are removed from the queue so that your train can finish. Those jobs are marked with ERROR_EW in the job overview. If you want maximal statistics and do not mind your train being slow, you can ask the operators for a "slow train" submission.
+
+### Why is it that my train test has a CPU warning but my wagon test was fine?
+
+This usually happens when the wagon test (which runs on a single core) uses so much memory that it does not fit into a single-core job on the grid, so the train needs two cores (more cores means a higher memory allowance). If the devices in the wagon cannot be parallelised well over multiple cores, this leads to more wall time and higher CPU usage, because the cores are underutilised. In this situation, one can either reduce the wagon's memory consumption so that it fits into a single core, or reduce its CPU consumption so that it fits within the dataset's CPU limit.
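The memory-versus-cores trade-off described in this FAQ entry can be illustrated with a small model. This is a minimal sketch, not Hyperloop's actual accounting: it assumes a hypothetical memory allowance of 2 GB per grid core, assumes CPU usage is accounted as allocated cores × wall time, and uses Amdahl's law (a swapped-in model) to estimate wall time from the fraction of the work that parallelises. `WagonProfile`, `cores_needed` and `accounted_cpu_s` are illustrative names only.

```python
import math
from dataclasses import dataclass

# ASSUMPTION: hypothetical memory allowance per grid core, not a Hyperloop value.
MEMORY_PER_CORE_GB = 2.0


@dataclass
class WagonProfile:
    peak_memory_gb: float      # peak memory seen in the single-core wagon test
    wall_time_1core_s: float   # wall time of the single-core wagon test
    parallel_fraction: float   # share of the work that can use extra cores (0..1)


def cores_needed(profile: WagonProfile) -> int:
    """Smallest core count whose combined memory allowance covers the peak memory."""
    return max(1, math.ceil(profile.peak_memory_gb / MEMORY_PER_CORE_GB))


def accounted_cpu_s(profile: WagonProfile, cores: int) -> float:
    """ASSUMED accounting: allocated cores x wall time.

    Wall time on `cores` cores is estimated with Amdahl's law, so idle
    (underutilised) cores still count towards the total.
    """
    serial = 1.0 - profile.parallel_fraction
    wall_s = profile.wall_time_1core_s * (serial + profile.parallel_fraction / cores)
    return cores * wall_s


if __name__ == "__main__":
    wagon = WagonProfile(peak_memory_gb=3.0, wall_time_1core_s=1000.0, parallel_fraction=0.2)
    n = cores_needed(wagon)  # 3.0 GB / 2.0 GB per core -> 2 cores
    print(f"cores: {n}")
    print(f"1 core : {accounted_cpu_s(wagon, 1):.0f} core-seconds")
    print(f"{n} cores: {accounted_cpu_s(wagon, n):.0f} core-seconds")
```

Under these assumptions a wagon with only 20% parallelisable work goes from about 1000 to about 1800 accounted core-seconds when its memory forces a second core, while lowering its peak memory to 2 GB or less brings `cores_needed` back to 1.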

docs/hyperloop/operatordocumentation.md

+12
@@ -57,6 +57,18 @@ There are a number of settings that you can decide on when composing a train:

 * The train will be automatically tested, and its progress can be followed in the _Train Runs_ table, or in the [**Train Runs**](#train-runs) page by clicking on the TRAIN_ID link.

+
+### <a name="wagonscheduling"></a>Scheduling of derived data wagons
+
+* Wagons with derived data can be scheduled by operators to be composed automatically at the next scheduled composition.
+* This is supported for standard and linked derived data wagons on any dataset that has a composition schedule.
+* Multiple standard derived data wagons can be combined into one train automatically by Hyperloop, but linked derived data wagons are run separately.
+* Operators only need to enable or disable the automatic *submission* and *slow train* options. The schedule is determined automatically by Hyperloop (the next scheduled slot of the dataset is used).
+
+<div align="center">
+<img src="../images/scheduledWagon.png" width="40%">
+</div>
+
 ### <a name="stagedsubmission"></a>Staged Submission

 * Short datasets are subsets of a big dataset
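The scheduling rules for derived data wagons can be summarised in code. This is a minimal sketch under stated assumptions, not Hyperloop's implementation: it assumes all scheduled wagons belong to one dataset, that every scheduled standard derived data wagon fits into a single combined train, and that "next slot" means the earliest slot strictly after the current time. `ScheduledWagon`, `next_slot` and `plan_trains` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class DerivedType(Enum):
    STANDARD = "standard"
    LINKED = "linked"


@dataclass
class ScheduledWagon:
    name: str
    derived_type: DerivedType


def next_slot(schedule: list[datetime], now: datetime) -> datetime | None:
    """Earliest composition slot strictly after `now` (assumption), or None."""
    upcoming = [slot for slot in schedule if slot > now]
    return min(upcoming) if upcoming else None


def plan_trains(wagons: list[ScheduledWagon]) -> list[list[str]]:
    """Group the scheduled wagons of one dataset into trains.

    Standard derived data wagons share one combined train (assumed to always
    fit); each linked derived data wagon runs in a train of its own.
    """
    standard = [w.name for w in wagons if w.derived_type is DerivedType.STANDARD]
    linked = [[w.name] for w in wagons if w.derived_type is DerivedType.LINKED]
    return ([standard] if standard else []) + linked


if __name__ == "__main__":
    wagons = [
        ScheduledWagon("taskA", DerivedType.STANDARD),
        ScheduledWagon("taskL", DerivedType.LINKED),
        ScheduledWagon("taskB", DerivedType.STANDARD),
    ]
    schedule = [datetime(2024, 5, 2, 8), datetime(2024, 5, 1, 20)]
    print(next_slot(schedule, datetime(2024, 5, 1, 12)))  # 2024-05-01 20:00:00
    print(plan_trains(wagons))  # [['taskA', 'taskB'], ['taskL']]
```

Keeping slot selection separate from train grouping mirrors the text: operators only toggle submission options, while the slot and the grouping follow from the dataset and the wagon types.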

docs/hyperloop/userdocumentation.md

+3 -1
@@ -125,9 +125,10 @@ You can get to the _All Analyses_ page by using the main menu, or by the link in
 * <a name="wagonderived"></a>In _Derived Data_ the tables produced by the task are displayed. If activated, these are saved to the output if the train is run as a derived data production. The produced derived data can be made available by the operators and serve as input for subsequent trains.

 ### <a name="deriveddatatypes"></a> Derived data types
-* At the moment, there are two types of derived data specifications:
+* There are three types of derived data specifications:
 * Standard derived data (marked with 🗂️) - if the wagon is used in a train, this will produce derived data to be used for further analysis. The results will not be merged across runs and can be used as input for future train runs. Note that standard derived data trains do not submit automatically and may need additional approval. If in doubt, please seek advice before enabling derived data tables in your wagon configuration.
 * Slim derived data (marked with green bordered 🗂️) - similarly to the standard derived data case, if used in a train, this will produce derived data to be used for further analysis. This is reserved for derived data of small output size. The results will be merged across runs and are not available to use in future train runs. The data will be automatically deleted after a preset period of time. You can mark a wagon for running as slim derived data by checking `Ready for slim derived data`.
+* Linked derived data (marked with red bordered 🗂️) - linked derived data trains will also produce derived data to be used for further analysis. Linked derived data has access to the parent AO2D; this is not the case for the other derived data types. Like standard derived data, the results are not merged across runs.

 * For wagons set as ready for slim derived data, two more fields need to be correctly set:
 * Max DF size - This sets the maximal dataframe size in the merging step. It has to be 0 for non-self-contained derived data (which need parent file access).
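The properties of the three derived data types, and the Max DF size rule for slim wagons, can be captured in a small lookup. This is an illustrative sketch only: `DerivedDataType`, `TRAITS`, `SlimWagonConfig` and `validate_slim` are hypothetical names, not part of Hyperloop; `None` marks a property the text does not state (whether linked derived data can serve as input to later trains), and the `self_contained` flag is an assumed stand-in for "does not need parent file access".

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DerivedDataType(Enum):
    STANDARD = "standard"  # plain folder icon
    SLIM = "slim"          # green-bordered folder icon
    LINKED = "linked"      # red-bordered folder icon


@dataclass(frozen=True)
class DerivedDataTraits:
    merged_across_runs: bool
    usable_as_train_input: bool | None  # None: not stated in the documentation


TRAITS = {
    DerivedDataType.STANDARD: DerivedDataTraits(merged_across_runs=False, usable_as_train_input=True),
    DerivedDataType.SLIM: DerivedDataTraits(merged_across_runs=True, usable_as_train_input=False),
    DerivedDataType.LINKED: DerivedDataTraits(merged_across_runs=False, usable_as_train_input=None),
}


@dataclass
class SlimWagonConfig:
    max_df_size: int      # the "Max DF size" field
    self_contained: bool  # ASSUMED flag: False if the data needs parent file access


def validate_slim(config: SlimWagonConfig) -> list[str]:
    """Return the rule violations found in a slim derived data configuration."""
    problems: list[str] = []
    if not config.self_contained and config.max_df_size != 0:
        problems.append("Max DF size must be 0 for non-self-contained derived data")
    return problems


if __name__ == "__main__":
    print(TRAITS[DerivedDataType.SLIM])
    print(validate_slim(SlimWagonConfig(max_df_size=1000, self_contained=False)))
```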
@@ -337,6 +338,7 @@ When a wagon test finishes in warning, this means that the wagon will not be inc
 </div>

 * The CPU usage limit is set per dataset and all trains running on a specific dataset must respect this constraint. If the limit is not respected, the train cannot be composed without PWG approval. Therefore, the user should discuss the details and requirements for this train with the PWG before requesting again. Depending on the amount of total resources, an approval in the Physics Board (PB) may also be needed. The CPU limit of a dataset may be viewed on the dataset page.
+* It is possible for a train to have a CPU warning when composed even though the wagon test had none. This usually happens when the wagon test (which runs on a single core) uses so much memory that it does not fit into a single-core job on the grid, so the train needs two cores (more cores means a higher memory allowance). If the devices in the wagon cannot be parallelised well over multiple cores, this leads to more wall time and higher CPU usage, because the cores are underutilised. In this situation, one can either reduce the wagon's memory consumption so that it fits into a single core, or reduce its CPU consumption so that it fits within the dataset's CPU limit.

 ### 4. <a name="warning-ccdb"></a> Too many CCDB calls

docs/images/scheduledWagon.png

143 KB
