Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added long running commands runtime to log files. #5260

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

annapetrosyan26
Copy link
Contributor

Added logging of the runtime and peak memory for the following commands:

  • global_place
  • rtl_macro_placer
  • repair_design
  • repair_timing
  • global_route
  • detailed_route

Updated the regression tests diff checking to ignore runtime lines.

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@annapetrosyan26
Copy link
Contributor Author

@stefanottili @rovinski @oharboe @maliberty @QuantamHD this PR addresses the #5003 issue.
Could you please take a look? The changes have been locally tested.

Copy link
Collaborator

@oharboe oharboe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrote up some feedback on what I can think of.

Also, there was a request to make this non-default option so that logs/reports don't become any more non-idempotent than they already are.

If OpenROAD wants to be an "examplary citizen" for bazel type build systems, then it should be a way to configure OpenROAD logs and reports should to idempotent... This of course means forgoing some types of logging and reporting like running times and memory consumption but this is understood and accepted in a bazel context.


ScopedStatistics::~ScopedStatistics()
{
logger_->report("Runtime {} seconds, memory used {} KB.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output leaves me with questions...

Scope? What took this long and used this much memory. What was the memory usage at start of scope compared to the peak usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request to add runtime info is originated from #3539 (comment) , so the scope is going to be clear, however I agree with your note about start memory consumption, so I suggest the following format for the log:
Runtime: 0.000545613 seconds, memory: start - 96123222 KB, peak - 105279488 KB.

Will this work?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't describe the scope, what was run during this time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mostly would care to know the peak on top of the starting memory consumption and in megabytes. kilobytes is too small a unit in 2024 for humans...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DRT already outputs in this format

[INFO DRT-0267] cpu time = 00:00:04, elapsed time = 00:00:01, memory = 2197.27 (MB), peak = 4031.35 (MB)

Is there a reason to have another?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That prefix is nice and gives some context, but there should be a human description of the scope of what is being measured. In some cases it would be unmabigious from that prefix, but e.g. detailed routing measures for each iteration, for instance. So some further description of context could be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also add the command name before reporting its runtime to be clear about the scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to add a prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command name before reporting its runtime is already added. But by adding prefix like this [INFO DRT-0267] we should also create a unique message ID for each tool as Logger::info requires it, is this your suggestion?

size_t ScopedStatistics::getPeakMemoryUsage()
{
struct rusage rusage;
getrusage(RUSAGE_SELF, &rusage);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error handling for getrusage missing. There may not be access rights to get this information or otherwise it is not available?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, what I want to know is maximum virtual memory, not maximum resident memory set. The only reason why ORFS is settling for maximum resident memory set is because ps doesn't measure maximum virtual memory.

A discussion on cores and max virtual memory vs. maximum resident memory set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can't measure it you can't report it so I'm not sure what is requested here.

I would argue that max virtual memory is not very interesting in 64bit computing. You'll never run out of virtual address space and thrashing is a resident set size issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max virtual address space can be measured in C++ easily enough, it just the time command that doesn't support it, which makes it awkward for ORFS to report it.

I disagree that virtual memory size isn't interesting. resident set size will always be smaller than physical memory and and doesn't give any indication that you have too large a working set and that the system could be thrashing. That said, even if max virtual memory is larger than physical memory, doesn't mean that there will be thrashing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it easily measured?

rss ~= physical memory is a warning sign. vsz > physical memory doesn't say much.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rss says something about what is happening on the computer it is running on, vsz says something about how much memory is required. As an OpenROAD developer, I don't care about my own computer, I only care about how much memory OpenROAD is using. As a user of OpenROAD, I don't care about the computer that is failing, I already know it is failing, I want to know how much memory is needed.

So, I'm interested in knowing vsz, more than rss. I did run tests with rss and it only tells me something about how much memory is used when I'm not overcommitted. If I'm overcommitted, rss doesn't tell me anything as it will be somewhat close to physical memory. It doesn't tell me how much memory I need.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a mac, but when I compile and run this on Ubuntu, I get the output below:

#include <iostream>
#include <unistd.h>

#ifdef __linux__
#include <fstream>
#elif __APPLE__
#include <mach/mach.h>
#endif

void getMemoryUsage(size_t &rss, size_t &vsz) {
#ifdef __linux__
    std::ifstream statm("/proc/self/statm");
    if (!statm.is_open()) {
        std::cerr << "Error opening /proc/self/statm" << std::endl;
        return;
    }

    size_t pageSize = sysconf(_SC_PAGESIZE);
    size_t size, resident, share, text, lib, data, dt;

    statm >> size >> resident >> share >> text >> lib >> data >> dt;

    rss = resident * pageSize; // Convert from pages to bytes
    vsz = size * pageSize;     // Convert from pages to bytes

    statm.close();
#elif __APPLE__
    mach_task_basic_info info;
    mach_msg_type_number_t infoCount = MACH_TASK_BASIC_INFO_COUNT;

    if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO, (task_info_t)&info, &infoCount) != KERN_SUCCESS) {
        std::cerr << "Error getting task info" << std::endl;
        return;
    }

    rss = info.resident_size;
    vsz = info.virtual_size;
#endif
}

int main() {
    size_t rss = 0, vsz = 0;
    getMemoryUsage(rss, vsz);
    std::cout << "RSS: " << rss << " bytes" << std::endl;
    std::cout << "VSZ: " << vsz << " bytes" << std::endl;
    return 0;
}

Output:

$ ./a.out
RSS: 3440640 bytes
VSZ: 6377472 bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more convenient to have both RSS and VSZ information?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my part, I think we need both. Over time, we may develop better understanding of how they interact with OpenROAD and decide we only need rsz or rss and why only one is sufficient, but for now I'm definitely missing rsz when analyzing how much more resources I need to give a build server...

@@ -0,0 +1,42 @@
import os
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hook up to CI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't using pathlib or os a matter of taste ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean with "CI" is continuous integration, I wanted to know if this test is hooked up to the tests run on pull requests in github.

Regarding os vs pathlib, I think pathlib is more recent and a bit more elegant(but a little bit more complex than os) in the Python world but that os is more familiar(I know it better). I have no strong opinion on either for such simple scripts as these. It depends on what my colleagues prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is included in the regression, so it's in CI.

puts " *FAIL*$error_msg"
append_failure $test
incr errors(fail)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this proc is already quite long. Break out a new proc and give the proc a good name so that it is clear what is being tested. Testing for regressions in maximum resident set is bound to have very significant variations...

Copy link
Contributor Author

@annapetrosyan26 annapetrosyan26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding to make this as an option: the request has a pending question which is not answered yet so I made it as default.

@@ -0,0 +1,42 @@
import os
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't using pathlib or os a matter of taste ?


ScopedStatistics::~ScopedStatistics()
{
logger_->report("Runtime {} seconds, memory used {} KB.",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request to add runtime info is originated from #3539 (comment) , so the scope is going to be clear, however I agree with your note about start memory consumption, so I suggest the following format for the log:
Runtime: 0.000545613 seconds, memory: start - 96123222 KB, peak - 105279488 KB.

Will this work?

@stefanottili
Copy link

Would it be possible to put the name of the command in the same line ?
The whole idea was to be able to grep for a keyword to easily get runtime info out of the verbose log file.

@maliberty
Copy link
Member

Also, there was a request to make this non-default option so that logs/reports don't become any more non-idempotent than they already are.

I disagreed then and still do. The goal is to have an informative log file by default. If you want to give up information to get idempotency that should be the non-default option.

@oharboe
Copy link
Collaborator

oharboe commented Jun 19, 2024

Also, there was a request to make this non-default option so that logs/reports don't become any more non-idempotent than they already are.

I disagreed then and still do. The goal is to have an informative log file by default. If you want to give up information to get idempotency that should be the non-default option.

I wanted to mention it, because it came up last time.

I do prefer this sort of information in logs by default and I don't know bazel well enough to understand why or how non-idempotent log artifacts cause a problem or if they are just an eyesore of sorts. I haven't observed any problems as such. Non-idempotent logs don't surprise me and isn't a problem for me. Looking forward to an explanation of the issue here for such build systems. bazel is one example, there are other... Certainly with build times of 24 hours, artifacts are required, ORFS alone doesn't cut it.

@annapetrosyan26
Copy link
Contributor Author

Would it be possible to put the name of the command in the same line ? The whole idea was to be able to grep for a keyword to easily get runtime info out of the verbose log file.

We can put the name of the command in the same line.

@oharboe
Copy link
Collaborator

oharboe commented Jun 19, 2024

Could you show some example output in this PR after the next update?

@annapetrosyan26
Copy link
Contributor Author

Could you show some example output in this PR after the next update?

The log file should contain an information like the example below:
repair_design: runtime = 0.000545613 (seconds), rsz start = 2197.27 (MB), vsz start = 4031.35 (MB), rsz peak = 2197.27 (MB), vsz peak = 4031.35 (MB).
Will this work?

@oharboe
Copy link
Collaborator

oharboe commented Jun 19, 2024

Could you show some example output in this PR after the next update?

The log file should contain an information like the example below: repair_design: runtime = 0.000545613 (seconds), rsz start = 2197.27 (MB), vsz start = 4031.35 (MB), rsz peak = 2197.27 (MB), vsz peak = 4031.35 (MB). Will this work?

  • Round off to integer seconds & megabytes.
  • Now I have to mentally do the math of peak - start, which is the information I care about. So the log line could do this math for me. net usage rsz = 1234/MB, vsz 12552/MB, peak rsz = 13534/MB, vsz 1552/MB
  • If this log line line can be squeezed into 80 chars while being legible, that's a boon...

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/utl/src/timer.cpp Outdated Show resolved Hide resolved
src/utl/src/timer.cpp Outdated Show resolved Hide resolved
src/utl/src/timer.cpp Outdated Show resolved Hide resolved
src/utl/src/timer.cpp Outdated Show resolved Hide resolved
src/utl/src/timer.cpp Outdated Show resolved Hide resolved
src/utl/src/timer.cpp Outdated Show resolved Hide resolved
Signed-off-by: Anna Petrosyan <[email protected]>
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

while (*p < '0' || *p > '9') {
p++;
}
result = (size_t) atoi(p);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this input was human made, I would recommend more error handling, like checking that the number is a valid decimal integer and not something like 00031243, but this is well defined machine generated input, an API to the OS, so I think this is fine as it is.

Copy link
Collaborator

@oharboe oharboe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me. I don't know anything about that tcl testing code/CI integration. Leaving that for others to review...

@@ -62,6 +63,7 @@ NesterovPlace::NesterovPlace(const NesterovPlaceVars& npVars,
utl::Logger* log)
: NesterovPlace()
{
utl::ScopedStatistics stat(log, "global_place");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go in Replace::doNesterovPlace to capture initial placement as well.

@@ -93,6 +94,7 @@ void RepairDesign::repairDesign(double max_wire_length,
bool verbose)
{
init();
utl::ScopedStatistics stat(logger_, "repair_design");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to Resizer::repairDesign. This API might be called internally.

@@ -93,6 +94,7 @@ void RepairHold::repairHold(
const bool verbose)
{
init();
utl::ScopedStatistics stat(logger_, "repair_timing");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to Resizer::repairHold

@@ -132,6 +134,7 @@ void RepairHold::repairHold(const Pin* end_pin,
const int max_passes)
{
init();
utl::ScopedStatistics stat(logger_, "repair_timing");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to Resizer::repairHold

Comment on lines 28 to 36
def main():
script_dir = os.path.dirname(os.path.abspath(__file__))
base_dir = os.path.abspath(os.path.join(script_dir, "../../rsz/test"))
log_path = os.path.join(base_dir, "results", "repair_design4-tcl.log")
regression_script_path = os.path.join(base_dir, "regression")

if not os.path.exists(log_path):
if not run_regression_script(regression_script_path, "repair_design4"):
exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a regression test calling regression again? This will create a conflict with the results of running repair_design4 (especially if run in parallel with this test). It is better to make this test self-contained.


ScopedStatistics::~ScopedStatistics()
{
logger_->report(msg_ + ": runtime {} seconds, usage: rsz = {} MB, vsz = {} MB, peak: rsz = {} MB, vsz = {} MB", static_cast<int>(Timer::elapsed()), (getPeakRSZ() - start_rsz_),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtime is unclear as it could be elapsed or cpu time. I think using elapsed is clearer.

Personally I think peak is sufficient (@oharboe any opinion)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't know the starting memory consumption, we can miss false positives: the problem is not with the algorithm that is reported as too high usage, but something in the setup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't tell you the starting memory - only the end - start

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. pick a way to present two numbers....

start memory is trivially calculated from peak and usagr, true for any cpmbination of these two numbers. most time usage and peak is what is of interested

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usage and peak do not give start. The peak may well be in the middle not at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is what peak should say and why I don't think usage matters much.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Keep it simple. If there is something wrong(large initial memory allocation outside of scope), then it can be debugged otherwise. I think normal operation, determining required RAM, is more important and other information clouds that...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so let's go with the DRT format above (with prefix)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the conversation below it's not clear what the final message should be. Please confirm if it should look like the DRT message you provided.
[INFO DRT-0267] cpu time = 00:00:04, elapsed time = 00:00:01, memory = 2197.27 (MB), peak = 4031.35 (MB)
Otherwise please provide the exact way you would like the message to be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the prefix would be about what is being timed, eg

repair_design: cpu time = 00:00:04, elapsed time = 00:00:01, memory = 2197.27 (MB), peak = 4031.35 (MB)

Though the prefix would be up to the client code.

@stefanottili
Copy link

It would be nice to have one common prefix that one can grep for.

FILE* file = fopen("/proc/self/status", "r");
if (file == nullptr) {
perror("Failed to open /proc/self/status");
exit(EXIT_FAILURE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too harsh a form of error handling. Issuing a logger warning is sufficient.


size_t ScopedStatistics::getMemoryUsage(const char* tag)
{
FILE* file = fopen("/proc/self/status", "r");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

procfs doesn't exist on OSX unfortunately. src/drt/src/db/infra/frTime_helper.cpp attempts to handle OSX

Signed-off-by: Anna Petrosyan <[email protected]>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/utl/include/utl/timer.h Outdated Show resolved Hide resolved
src/utl/include/utl/timer.h Outdated Show resolved Hide resolved
src/utl/src/timer.cpp Outdated Show resolved Hide resolved
src/utl/src/timer.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/utl/src/timer.cpp Outdated Show resolved Hide resolved
Signed-off-by: Anna Petrosyan <[email protected]>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/utl/src/timer.cpp Outdated Show resolved Hide resolved
Signed-off-by: Anna Petrosyan <[email protected]>
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Anna Petrosyan <[email protected]>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/utl/src/timer.cpp Show resolved Hide resolved
Signed-off-by: Anna Petrosyan <[email protected]>
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@annapetrosyan26
Copy link
Contributor Author

@maliberty the appropriate changes have been made. Could you please review them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants