-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-30993 Fix file access for index read activity #18144
Conversation
@@ -98,7 +98,7 @@ class CIndexReadSlaveBase : public CSlaveActivity | |||
if (!keyManager) | |||
throw MakeActivityException(&activity, 0, "Callback attempting to read blob with no key manager - index being read remotely?"); | |||
needsBlobCleaning = true; | |||
return (byte *) keyManager->loadBlob(id, dummy, &activity.contextLogger); | |||
return (byte *) keyManager->loadBlob(id, dummy, nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blob stats will be address in this ticket: https://track.hpccsystems.com/browse/HPCC-31002
{ | ||
IPartDescriptor &part = partDescs.item(p++); | ||
IPartDescriptor &part = partDescs.item(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in lines 169-171 is so that the correct contextLogger can be referenced with [p]. If p++ remains here then contextLoggers[p] cannot be used in line 306.
@@ -273,7 +272,7 @@ class CIndexReadSlaveBase : public CSlaveActivity | |||
continue; // try next copy and ultimately failover to local when no more copies | |||
} | |||
ActPrintLog("[part=%d]: reading remote dafilesrv index '%s' (logical file = %s)", partNum, path.str(), logicalFilename.get()); | |||
partNum = p; | |||
partNum = (p+1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change relates to: 55ef2b9#r1426670630
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sensible change. Not new, but might be good to add a comment to clarify this is returning next part #, e.g.:
partNum = p+1; // NB: returning next part number to process.
keyManager = klManager; | ||
partNum = p; | ||
partNum = (p+1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change relates to: 55ef2b9#r1426670630
return createIndexLookup(keyManager); | ||
} | ||
} | ||
keyMergerManager.setown(createKeyMerger(helper->queryDiskRecordSize()->queryRecordAccessor(true), keyIndexSet, seekGEOffset, &contextLogger, helper->hasNewSegmentMonitors(), false)); | ||
//Not tracking jhtree cache stats in KeyMerger at the moment. Future: something to consider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jira has been created to address KeyMerger stats: https://track.hpccsystems.com/browse/HPCC-30999
updateStats(); | ||
lazyIFileIO.clear(); | ||
} | ||
lazyIFileIO.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stats gathering has been moved to gatherActiveStats so that the stats are updated periodically whilst the activity is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - there are some merge clashes, looks like it needs rebasing.
Could you also update commit and PR message and JIRA description with some comments describing what the issue this is fixing and how. Thanks.
961044a
to
650c837
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - there are some merge clashes, looks like it needs rebasing.
Could you also update commit and PR message and JIRA description with some comments describing what the issue this is fixing and how. Thanks.
@shamser - also, the PR is failing (crashing by the looks of it) some regression tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also update commit and PR message and JIRA description with some comments describing what the issue this is fixing and how. Thanks.
am not seeing an updated commit + PR message, and JIRA description ?
Serialize jhtree and file io stats to thor manager. (Correct cost calculation for index reads require jhtree stats, so with Note: this change is dependent on HPCC-31035 Move ctx stats merging to caller from CKeyLevelManager mergeStats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good, some minor comment/questions.
@@ -719,10 +710,66 @@ class CIndexReadSlaveBase : public CSlaveActivity | |||
virtual void gatherActiveStats(CRuntimeStatisticCollection &activeStats) const | |||
{ | |||
PARENT::gatherActiveStats(activeStats); | |||
// *) jhtree stats will have been tracked as follows | |||
// - For superfiles, there will be 1 IContextLogger in contextLoggers[] for each subfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lines 671-673 is creating 1 IContextLogger per physical part not per subfile,
which gatherActiveStats is depending on as it iterates through parts and does:
fileStatItem->merge(contextLoggers[partNum]->queryStats());
You could create only 1 IContextLogger per subfile, and then use mapSubPart to map the part# -> subfile# to reference contextLoggers[subfile]
Not sure worth it, but comment should be changed if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be a very large number of parts? If that is the case, do you think the performance would be impacted enough to make the change worthwhile?
I'd like leave as it is unless it is really worthwhile. Although change looks trivial, it will require significant time consuming re-testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be quite large, e.g. a superfile of 50+ files of 400x files, but each workers only gets a slice of those, and these CStatsContextLogger objects aren't that big.
I would merge as is, and perhaps open a low-prio. JIRA to revisit it on a rainy day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a jira to revisit this: https://track.hpccsystems.com/browse/HPCC-31103
@@ -273,7 +272,7 @@ class CIndexReadSlaveBase : public CSlaveActivity | |||
continue; // try next copy and ultimately failover to local when no more copies | |||
} | |||
ActPrintLog("[part=%d]: reading remote dafilesrv index '%s' (logical file = %s)", partNum, path.str(), logicalFilename.get()); | |||
partNum = p; | |||
partNum = (p+1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sensible change. Not new, but might be good to add a comment to clarify this is returning next part #, e.g.:
partNum = p+1; // NB: returning next part number to process.
IKeyManager * keyManager; | ||
{ | ||
CriticalBlock b(keyManagersCS); | ||
if (!keyManagers.isItem(partNum)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a test (earlier) to exclude (and later handle differently) the merger case?
The test here may mean it passed for part 1, and bails out for part 2+ in the merge case, which will only have 1 IKeyManager in keyManagers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would merging stats even when merger is used be a problem? The merger was going to handled separately with https://track.hpccsystems.com/browse/HPCC-30999
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will cause exceptions, and it looks like stats will be correct in the non-super case.
But not clear to the reader what will happen in localMerge=true case at the moment.
I think worth a comment block that explains current caveat and that it needs refactoring, in if (localMerge) case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added line 728-730.
@jakesmith I've push changes with comment changes. I may need to make further changes in response to my clarification request on your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good, if you could add a comment in gatherActiveStats re. localMerge caveat, and then go ahead and squash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good, please squash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serialize jhtree and file io stats to thor manager. Record jhtree and file io stats for Index Read Activity (to activity scope) (Correct cost calculation for index reads require jhtree stats, so with this change the file access cost for index read should be correct.) Signed-off-by: Shamser Ahmed <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser the changes look good - I didn't follow all the details.
A couple of questions which would help clarify the code, but happy to merge as-is.
{ | ||
CriticalBlock b(keyManagersCS); | ||
if (!keyManagers.isItem(partNum)) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser could this break rather than continuing? partNum will only get larger, and the test above will always fail. (If something else is adding them at the same time I guess it could succeed, but will be blank.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a critical section to protect against changes to KeyManagers.
// - the io stats will be tracked in keyManagers[] | ||
// - (There will be exactly 1 keyManager per part) | ||
// Note: the KeyManagers[], contextLoggers[] and fileStats[] are not necessarilly the same size | ||
// - KeyManagers[] will always be the same size as partDescs[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't seem to be quite true, since line 742 is testing it is true. Worth adding a comment where when it might not be true.
9391425
into
hpcc-systems:candidate-9.4.x
Type of change:
Checklist:
Smoketest:
Testing: