[Autotuner] Improve unit test reliability 1 #2538
base: master
Conversation
Have you considered Python mocking to avoid processes that you have to start and stop?
That is a good idea. I originally intended it to be as close to the real runtime environment as possible, but if tests continue to fail this might be the move.
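A minimal sketch of what that mocking could look like, assuming a hypothetical `run_flow()` helper that today launches a real tool process; `unittest.mock.patch` replaces `subprocess.Popen` so no process is ever started or stopped:

```python
# Sketch: mock out the process launch so the unit test never spawns a real
# tool binary. run_flow() is a hypothetical stand-in for the code under test.
import subprocess
from unittest import mock

def run_flow(cmd):
    """Launch the tool and return its exit code (real implementation)."""
    proc = subprocess.Popen(cmd)
    return proc.wait()

def test_run_flow_without_real_process():
    fake_proc = mock.Mock()
    fake_proc.wait.return_value = 0  # pretend the tool exited cleanly
    with mock.patch("subprocess.Popen", return_value=fake_proc) as popen:
        assert run_flow(["openroad", "-exit"]) == 0
        popen.assert_called_once_with(["openroad", "-exit"])
```

This keeps the test hermetic: flaky startup/teardown of real processes goes away, at the cost of drifting from the real runtime environment, which is the trade-off discussed above.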
I'm thinking it is going to be both. I've been using bazel-orfs in an autotuner-like capacity, and the biggest problem I currently have is error handling and resource management.

@jeffng-or and I launched an exploration run of, for instance, MAX_UNGROUP_SIZE, and we also wanted to do the runs through grt. The MAX_UNGROUP_SIZE run never really completed, but I was able to look at the results I did get and used them to plot the progress. The conclusion was trivial: there is no correct value of MAX_UNGROUP_SIZE. Instead, we have to first create a macro placement with SYNTH_HIERARCHICAL=1, throw away the rest of that run, and use that macro placement in a run with SYNTH_HIERARCHICAL=0.

For the grt runs, the problem is that this part of the flow can't run in parallel with other runs: it makes the servers run out of memory and crash. I plan to fix bazel-orfs so that it has rudimentary knowledge of which steps can run in parallel and which cannot. I think grt, route, and macro placement have to run alone on a server, whereas the other stages can run in parallel. I need instrumentation in bazel-orfs to track the resident set size to see what can run in parallel and what cannot. Possibly I have to do a trial run from start to end, track the memory requirements and CPU usage, and come up with some sort of provisioning plan.
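The scheduling idea in the comment above (memory-heavy stages run alone, light stages in parallel) can be sketched with a semaphore gating the heavy stages. The stage names and the `HEAVY` set are assumptions drawn from the comment, not any bazel-orfs API:

```python
# Sketch: serialize memory-heavy stages (grt, route, macro placement) while
# lighter stages run concurrently. A Semaphore(1) admits one heavy stage
# at a time; everything else goes straight to the pool.
import threading
from concurrent.futures import ThreadPoolExecutor

HEAVY = {"grt", "route", "macro_place"}   # assumed: must run alone
heavy_gate = threading.Semaphore(1)       # one heavy stage at a time

results = []
results_lock = threading.Lock()

def work(name):
    # Placeholder for actually running a flow stage.
    with results_lock:
        results.append(name)

def run_stage(name):
    if name in HEAVY:
        with heavy_gate:   # exclusive slot for memory-heavy work
            work(name)
    else:
        work(name)

with ThreadPoolExecutor(max_workers=4) as pool:
    for stage in ["synth", "floorplan", "grt", "place", "route"]:
        pool.submit(run_stage, stage)
```

A real provisioning plan would replace the static `HEAVY` set with measured resident-set-size data from the trial run described above.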
```sh
success=false
while [[ $retry_count -lt $max_retries ]]; do
  if pip3 cache purge && pip3 install --no-cache-dir -U -r "$script_dir/requirements.txt"; then
```
Is `cache purge` required with the `--no-cache-dir` option, or are they redundant?
They're not mutually exclusive: `pip cache purge` may be needed if the system already had a cache, while `--no-cache-dir` just ensures no future caching is done.
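The shell retry loop in the diff above follows a standard bounded-retry pattern; a minimal Python sketch of the same idea, with a stand-in for the flaky install step (the helper names here are illustrative, not part of the PR):

```python
# Sketch: retry a flaky operation a bounded number of times, mirroring the
# shell loop `while [[ $retry_count -lt $max_retries ]]`. `fn` stands in
# for the pip install step that may fail on network instability.
import time

def retry(fn, max_retries=3, delay=0.0):
    for attempt in range(1, max_retries + 1):
        if fn():
            return True        # success: stop retrying
        time.sleep(delay)      # back off before the next attempt
    return False               # exhausted all retries

# Demo: an operation that fails once, then succeeds on the second attempt.
calls = []
def flaky():
    calls.append(1)
    return len(calls) >= 2

assert retry(flaky)
```

In the PR's shell version, the success/failure flag plays the role of the boolean return value here.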
Signed-off-by: Jack Luar <[email protected]>
Signed-off-by: Jack Luar <[email protected]>
* context: pip install fails on large files due to network instability

Signed-off-by: Jack Luar <[email protected]>
Signed-off-by: Vitor Bandeira <[email protected]>
Signed-off-by: Jack Luar <[email protected]>