
Shared memory for darwintraced processes #128

Open · wants to merge 24 commits into master from dtsm-darwintrace
Conversation


@MihirLuthra MihirLuthra commented Jun 8, 2019

These changes were made as part of the Speed up trace mode project.
The code has worked stably with all the ports I have tested so far. This PR adds a shared memory cache for the processes into which the darwintrace library is injected.
I have also uploaded a separate repository on GitHub that contains tests for the newly added library and a README that attempts to explain the complete code. The library can be used standalone (by defining STANDALONE_DTSM in the Makefile) if anyone wants to test it.

Any suggestions or feedback would be really helpful.

I am entering speed comparison data in a spreadsheet and will keep updating the table as I run more comparison tests.


pmetzger commented Jun 9, 2019

Can you add something to that spreadsheet to show the percentage speedup/slowdown?


MihirLuthra commented Jun 9, 2019

Can you add something to that spreadsheet to show the percentage speedup/slowdown?

I have added a new column indicating the % speedup/slowdown.
Also, I think there is still plenty of scope for improvement in the newly added library.
Aside from that, I am uncertain about some improvements that could be made. For example, I don't know whether I can keep the same shared memory across all the installation phases of a port; currently I delete the shared memory data after each phase. I think it should be fine to keep the data until the end of the installation, but it would be better if someone could confirm that.


pmetzger commented Jun 9, 2019

Hrm. Looks like all this is a lot of work for only a few percent change on average...


MihirLuthra commented Jun 9, 2019

Hrm. Looks like all this is a lot of work for only a few percent change on average...

So far, yes. I am hoping to see more improvement with the edits I am testing right now; I will comment when I am done. Could you take a look at columns P, Q & R (the results with the updated code) in the spreadsheet? Those results so far are turning out better than before.


MihirLuthra commented Jun 10, 2019

@pmetzger I have made some changes to the percentage calculation formula. Previously it was calculated as ((actual_time_by_old_trace_mode - actual_time_by_modified_trace_mode)/actual_time_by_old_trace_mode) * 100, which isn't really correct. I changed it to ((time_lag_caused_by_old_trace_mode - time_lag_caused_by_new_trace_mode)/time_lag_caused_by_old_trace_mode) * 100. This makes more sense because the modifications to trace mode can never make a build faster than it is without trace mode. For example, in perl5.28 the time taken by the old trace mode is 8m29.826s and the time taken by the new trace mode is 7m15.217s, but the time taken without trace mode is 5m58.364s. So this actually means about a 51% improvement. Am I right on that?

@pmetzger
Member

8.5 minutes vs 7.25 minutes is not a 51% improvement. A 51% improvement would be 4.25 minutes.

@MihirLuthra
Contributor Author

8.5 minutes vs 7.25 minutes is not a 51% improvement. A 51% improvement would be 4.25 minutes.

I understand; I will try to improve the numbers further. Maybe I rushed into a PR too early with just the core idea completed.

@MihirLuthra MihirLuthra force-pushed the dtsm-darwintrace branch 3 times, most recently from 1488ebe to 050129a on June 15, 2019 08:37

mojca commented Jun 21, 2019

8.5 minutes vs 7.25 minutes is not a 51% improvement. A 51% improvement would be 4.25 minutes.

... only if the build without trace mode took 0 minutes.

Since @MihirLuthra is not optimizing the compiler where your calculation would make sense, his numbers seem to be correct. If the build without trace mode is taking 6 minutes and the time in trace mode dropped from 8.5 to 7.25 minutes, this means that the overhead dropped from 2.5 minutes to 1.25 minutes. He's optimising the overhead, not the compilation time, so the assessment sounds correct to me.

I'm not saying that there's no further room for improvement, in particular for the cases where the percentages are negative, just that the numbers sound correct.
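
For reference, spelling out that arithmetic with the rounded figures above:

    overhead of old trace mode = 8.5 - 6 = 2.5 min
    overhead of new trace mode = 7.25 - 6 = 1.25 min
    reduction in overhead      = (2.5 - 1.25) / 2.5 = 50 %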


@raimue raimue left a comment


I have not yet been able to test this, so my review is based solely on the code on GitHub. Please look into the indentation, as it was already hard to read in the GitHub web interface due to the mix of tabs and spaces. If in doubt, please always follow the context of the surrounding code, even if it does not follow your preferred style.

I spotted lots of duplicated code that should be wrapped into its own functions. Otherwise future changes will have to be applied in multiple locations and there will always be the risk of missing one of them.

I have to admit that I have not looked much into dtsharedmemory.c and what it does, but I assume this has already been discussed before this work was started.

* changes should be made.
* See LOWER_LIMIT, UPPER_LIMIT & POSSIBLE_CHARACTERS in dtsharedmemory.h
*/

Member


There is some inconsistency with the indentation here. Seems like the rest of the file uses tabs. I know this is cumbersome, but could you please adapt all hunks to follow the existing context?

Contributor Author


I will correct it. I never noticed because on my machine the code appeared to have consistent indentation (sorry, it's the first time I have encountered such a situation).

@@ -1015,5 +1181,23 @@ bool __darwintrace_is_in_sandbox(const char *path, int flags) {
}
} while (pathIsSymlink);

return __darwintrace_sandbox_check(normPath, flags);
//Check shared memory cache for path
Member


This code already appears twice in this file. Couldn't we add all of this to __darwintrace_sandbox_check for simplicity? Or at least wrap this lookup sequence in its own function?
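
A minimal sketch of what such a helper could look like (the dtsm_cache_search and helper names here are hypothetical placeholders, not the actual functions in this PR):

    // Hypothetical wrapper around the shared-memory cache lookup, so the
    // same sequence does not have to be repeated at every call site.
    // Returns true if the path was found in the cache; *allowed then holds
    // the cached verdict. On a miss the caller falls back to
    // __darwintrace_sandbox_check() and stores the result afterwards.
    static bool __darwintrace_shm_cache_lookup(const char *path, bool *allowed) {
        bool cached_verdict;
        if (!dtsm_cache_search(path, &cached_verdict)) { // placeholder lookup call
            return false;
        }
        *allowed = cached_verdict;
        return true;
    }

Both call sites in __darwintrace_is_in_sandbox could then share this single function.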

Contributor Author


I will fix that before making the next commit for sure. Thanks for the review :) This happened because I moved the __darwintrace_setup() calls around a lot. Surprisingly, __darwintrace_setup() turned out to be the most time-consuming call of all; the earlier assumption was that __darwintrace_is_in_sandbox(), where the registry is checked, would be the most expensive. Because of that I did the setup only on demand, which is how I ended up messing up the code. Also, I have run new tests with the changes I had been discussing on the mailing lists and will post them here tonight. I am currently testing on a rented Mac with an HDD (previous tests were on an SSD), because that is where registry calls should actually be slow, which should have been the motivation for this project in the first place.

return false;
}

//Check shared memory for path
Member


Duplicate code. See below.


# Tell traced processes the file name to use as shared memory and status
set env(DTSM_STATUS_NAME) $dtsm_status_name
set env(DTSM_NAME) $dtsm_name
Member


I would prefer to keep the DARWINTRACE_ prefix for all environment variables for clarity.

@@ -844,9 +851,10 @@ bool __darwintrace_is_in_sandbox(const char *path, int flags) {

if (!filemap)
return true;
}
}
Member


Trailing whitespace.

if ((flags & DT_REPORT) > 0) {
__darwintrace_log_op("sandbox_violation", path);
}
return false;

//Store in shared memory cache
Member


Duplicate code. See below.

case 0:
// file belongs to a foreign port, deny access
if ((flags & DT_REPORT) > 0) {
__darwintrace_log_op("sandbox_violation", path);
}
return false;

//Store in shared memory cache
Member


Duplicate code. See below.

case -1:
// if the file isn't known to MacPorts, allow
// access anyway, but report a sandbox violation.
// TODO find a better solution
if ((flags & DT_REPORT) > 0) {
__darwintrace_log_op("sandbox_unknown", path);
}
return true;

Member


Duplicate code. See below.


@MihirLuthra MihirLuthra Jun 21, 2019


@raimue, this is not the latest commit you are looking at. I made one more commit afterwards (in which I most probably resolved that problem by using flags instead of many bools). It would be great if you could check that one.
Here you can find the latest commit.

//functions in dtsharedmemory.c to access offsets in memory that don't exist
//causing seg faults and bus errors

fprintf(stderr, "dtsharedmemory.c : Failed to reset fd");
Member


Shouldn't this go through some darwintrace function to a logfile instead of stderr?
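
A minimal sketch of funnelling these messages through a single helper, so the destination could later be switched from stderr to a logfile or to darwintrace's own logging (the helper name is hypothetical, not existing darwintrace API):

    #include <stdio.h>

    // Hypothetical single reporting point for dtsharedmemory.c errors.
    // For now it still writes to stderr, but call sites would not need to
    // change if the output is later redirected elsewhere.
    static void dtsm_report_error(const char *msg) {
        fprintf(stderr, "dtsharedmemory.c: %s\n", msg);
    }

    // Usage at the call site shown above:
    // dtsm_report_error("Failed to reset fd");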

Contributor Author


I will remember to fix that too in the next commit.

@MihirLuthra
Contributor Author

8.5 minutes vs 7.25 minutes is not a 51% improvement. A 51% improvement would be 4.25 minutes.

... only if the build without trace mode took 0 minutes.

Since @MihirLuthra is not optimizing the compiler where your calculation would make sense, his numbers seem to be correct. If the build without trace mode is taking 6 minutes and the time in trace mode dropped from 8.5 to 7.25 minutes, this means that the overhead dropped from 2.5 minutes to 1.25 minutes. He's optimising the overhead, not the compilation time, so the assessment sounds correct to me.

I'm not saying that there's no further room for improvement, in particular for the cases where the percentages are negative, just that the numbers sound correct.

@mojca, thanks for making me aware of that. With the new changes none of the ports is giving negative results any more, but I certainly need to optimise it further. I will post the latest tests ASAP.

@MihirLuthra
Contributor Author

@raimue Kindly check the latest code; I think I have fixed most of the issues you pointed out last time.
Also, here are the latest tests, though they are still incomplete.

@raimue raimue requested a review from neverpanic June 29, 2019 18:54

mojca commented Aug 20, 2019

@neverpanic, @jmroot: could you please provide feedback on the revised code?

@MihirLuthra: sorry for the delayed answer. One possible way to run performance tests without overheating your laptop would be to set up builds on Azure CI, which provides 6 hours to finish some builds. I'm not saying it's a small amount of work, but one could start from the existing CI setup of the ports repository, first build base with and without your patches (maybe as different parameters in the build matrix?), install base, then install dependencies from binary packages, and finally time the build of a given port. The port could then be cleaned and rebuilt to gather more statistics. This is not a prerequisite for getting this code merged, just a random idea that might help with performance testing later.

@neverpanic
Member

Just in the interest of documenting the measurements that we just did, here are some numbers.

curl-modified,untraced,75.36,83.06,49.01
curl-modified,untraced,75.48,82.96,48.68
curl-modified,untraced,74.83,80.88,48.21
curl-modified,baseline,84.61,95.44,58.41
curl-modified,baseline,84.19,93.21,57.67
curl-modified,baseline,83.99,93.83,57.80
curl-modified,optimized,81.11,93.37,54.97
curl-modified,optimized,81.86,93.38,55.47
curl-modified,optimized,81.85,93.27,55.23
curl-unmodified,untraced,78.28,85.64,48.21
curl-unmodified,untraced,79.82,88.55,50.73
curl-unmodified,untraced,79.53,86.71,50.39
curl-unmodified,baseline,90.14,102.47,61.22
curl-unmodified,baseline,88.33,100.38,59.54
curl-unmodified,baseline,89.24,99.23,60.02
curl-unmodified,optmized,85.66,98.39,56.66
curl-unmodified,optmized,85.62,99.43,57.08
curl-unmodified,optmized,85.23,97.31,56.25

This was tested on a MacBook Pro (Retina, 15-inch, Mid 2015) with sudo port <-t> destroot curl with a hot cache and no cleanup to do at the beginning of the build.

untraced is without trace mode enabled, baseline is with the current master of trace mode, and optimized is with this PR.

We see an improvement of between 3.5 and 4.7 percentage points in the wallclock (that's between 3.1 and 4.2 % of the tracemode baseline). The overhead of trace mode falls from ~12.5% to 8%.
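
For reference, working the curl-unmodified rows through (assuming the first numeric column is the wallclock time in seconds):

    untraced  ≈ (78.28 + 79.82 + 79.53) / 3 ≈ 79.2 s
    baseline  ≈ (90.14 + 88.33 + 89.24) / 3 ≈ 89.2 s
    optimized ≈ (85.66 + 85.62 + 85.23) / 3 ≈ 85.5 s

    overhead: 89.2 / 79.2 - 1 ≈ 12.7 % (baseline) vs 85.5 / 79.2 - 1 ≈ 8.0 % (optimized), a drop of ≈ 4.7 percentage points
    gain over baseline: (89.2 - 85.5) / 89.2 ≈ 4.2 % of wallclock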

@pmetzger
Member

4% isn't a lot given the complexity. Is that worth it?

@neverpanic
Member

Probably not in its current state, but we've found some things in the Ctrie that aren't as optimal as they should be, so I'm pretty confident this can be improved, and also simplified by maybe removing some of the socket communication functionality and replacing it with direct shared memory access.

@neverpanic
Member

Progress is being made: https://github.com/neverpanic/cctrie. This implementation of a CTrie is significantly faster than what's currently in this PR.


MihirLuthra commented Dec 28, 2019

I thought I should post an update here: the shared memory allocation library to be used alongside the above ctrie repository is ready: https://github.com/MihirLuthra/shm_alloc
Thanks to Clemens, the combination is significantly faster than before. What remains now is writing the Tcl wrappers.


MihirLuthra commented Feb 8, 2020

Old changes have been archived: old branch

Additions:

  1. A shared memory allocation library. This is loaded into the memory of each darwintraced process with __attribute__((constructor)) (see the sketch below).
  2. A Ctrie data structure working with shared memory, which is used for caching paths that were queried from the registry.
  3. A shared memory trie that is used for the trace sandbox. Previously, darwintrace asked tracelib.c for a filemap over the socket and iterated through it. Now, in porttrace.tcl, instead of supplying sandbox bounds to tracelib.c, the prefixes are inserted into this trie.

The above three are combined into a static library that is used by both pextlib and darwintrace.
A shared library wasn't a good candidate for this, as darwintrace would intercept calls made by our own library as well.
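
A minimal sketch of the constructor-based initialization mentioned in point 1 (dtsm_init and dtsm_attach are hypothetical placeholders; DTSM_NAME is the environment variable set in porttrace.tcl in the diff above):

    #include <stdlib.h>

    // Hypothetical initializer: runs automatically in every darwintraced
    // process when the injected library is loaded, before main().
    __attribute__((constructor))
    static void dtsm_init(void) {
        const char *shm_name = getenv("DTSM_NAME");
        if (shm_name == NULL) {
            return; // not running under trace mode; nothing to attach
        }
        // Placeholder for mapping the shared memory region that backs the
        // path cache and the sandbox prefix trie.
        // dtsm_attach(shm_name);
    }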

@MihirLuthra MihirLuthra force-pushed the dtsm-darwintrace branch 5 times, most recently from 42d7c74 to c652fd3 on February 8, 2020 11:17
@MihirLuthra MihirLuthra force-pushed the dtsm-darwintrace branch 3 times, most recently from 057bd6f to 7ad0b01 on February 10, 2020 04:57

mascguy commented Oct 28, 2021

Love the work, it would be great to see a noticeable reduction in trace mode overhead!

@mascguy mascguy self-assigned this Aug 12, 2022

mascguy commented Aug 19, 2022

As someone who's come to depend on trace mode more and more - it's absolutely critical when validating port dependencies and such - I'd love to see this work merged!

The first step is to resolve the merge conflicts, and I'll take care of that.
