DRILL-2362: Profile Mgmt #1750
@arina-ielchiieva I'm looking for a suggestion on how to manage existing profiles.
@arina-ielchiieva could you please review this PR?
Hi @kkhatua! Thank you for this contribution. I'd like to help move it forward.
I share your concern here. I think we should consider having Drill only write new profiles to partitioned directories. Any partitioning of historical profiles can be done externally by admins, in my opinion, and we can add examples of "housekeeping" scripts for doing that to the Drill documentation. Would you like to do any of the following?
Thanks
Or, if we do want to keep this built-in ability to partition existing profiles, perhaps we should have it launched from a button on the Profiles page in the web UI instead of on Drillbit startup? That would remove the complication of deciding which Drillbit does the work, as well as the worries about slowing down startup or partitioning profiles that nobody wanted partitioned.
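For the external "housekeeping" option discussed above, a minimal sketch of such a script might look like the following. It is an illustration only, not part of this PR: it assumes a local file system, assumes that a profile's modification time is an acceptable stand-in for its query date, and the function name `partition_existing_profiles` is hypothetical. A distributed file system such as HDFS would need different tooling.

```python
import shutil
from datetime import datetime
from pathlib import Path


def partition_existing_profiles(profile_dir, suffix=".sys.drill", dry_run=True):
    """Move top-level profile files into <profile_dir>/yyyy/mm/dd.

    Assumption: the file's modification time (local time zone) decides
    which day directory it belongs to.
    Returns the list of (source, destination) pairs that were planned.
    """
    root = Path(profile_dir)
    moves = []
    # Only files directly under the root are considered, so profiles that
    # are already partitioned are left alone.
    for src in sorted(root.glob(f"*{suffix}")):
        if not src.is_file():
            continue
        day = datetime.fromtimestamp(src.stat().st_mtime).strftime("%Y/%m/%d")
        dest = root / day / src.name
        moves.append((src, dest))
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
    return moves
```

Calling the function with the default `dry_run=True` only reports the planned moves, so an admin can inspect them before running it again with `dry_run=False`.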
NOTE: This PR is a revamp of the work done for DRILL-5270 (PR #1250 and #1654). Those PRs were intended to improve the profile loading time for the WebUI, but did not address the fundamental problem of having too many profiles in the profiles directory.
When Drill displays profiles stored on the file system (Local or Distributed), it loads the entire list of .sys.drill files in the profile directory, then sorts and deserializes them. This can get expensive, since only a single CPU thread does the work.
For example, with a directory of 120K profiles, just fetching the list of files takes about 6 seconds. After that, the time varies with the number of profiles being rendered. Deserializing a standard profile takes about 30 ms on average, which adds roughly 3 seconds for rendering the default 100 profiles.
A user-reported issue confirms this:
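The arithmetic behind these figures can be sketched as a simple cost model. This is only a back-of-the-envelope illustration using the numbers quoted above; the function name `estimate_load_seconds` is hypothetical, and the model assumes the listing and the deserialization run one after another on a single thread.

```python
def estimate_load_seconds(list_seconds, profiles_rendered, deserialize_ms):
    """One directory listing plus one deserialization per rendered profile."""
    return list_seconds + profiles_rendered * deserialize_ms / 1000.0


# Figures from the description: ~6 s to list 120K files,
# ~30 ms per profile, 100 profiles rendered by default.
total = estimate_load_seconds(list_seconds=6.0, profiles_rendered=100, deserialize_ms=30.0)
print(f"{total:.1f} s")  # prints "9.0 s"
```

Under this model the listing cost grows with the size of the directory, while the deserialization cost depends only on how many profiles are rendered.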
DRILL-5028 Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS
Additional JIRAs have been filed asking for these profiles to be managed:
DRILL-2362 Drill should manage Query Profiling archiving
DRILL-2861 enhance drill profile file management
This PR brings the following enhancements to ensure that profiles are better managed.
<profileDir>/yyyy/mm/dd
Reference: https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/work/user/UserWorker.java#L67
e.g. moving from yyyy/mm/dd to yyyy/mm/dd/hh
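To illustrate the two layouts, here is a minimal sketch of how a partition directory could be derived from a timestamp. It is not the PR's implementation: the granularity names (`day`, `hour`), the function `partition_dir`, and the example root `/drill/profiles` are all hypothetical, and which timestamp Drill actually uses for a profile is not stated here.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Hypothetical granularity names mapped to strftime patterns.
PARTITION_FORMATS = {
    "day": "%Y/%m/%d",      # yyyy/mm/dd
    "hour": "%Y/%m/%d/%H",  # yyyy/mm/dd/hh
}


def partition_dir(profile_root, when, granularity="day"):
    """Return <profile_root>/<partition path> for the given timestamp."""
    return PurePosixPath(profile_root) / when.strftime(PARTITION_FORMATS[granularity])


ts = datetime(2019, 4, 15, 13, 5, tzinfo=timezone.utc)
day_dir = partition_dir("/drill/profiles", ts)           # /drill/profiles/2019/04/15
hour_dir = partition_dir("/drill/profiles", ts, "hour")  # /drill/profiles/2019/04/15/13
```

Because the hourly path only appends one more component to the daily path, moving to a finer granularity nests new directories under the existing day directories rather than replacing them.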
diorama, also exists in the root of the profile directory. The purpose of this is to allow users to dump external profiles that can then be rendered and visualized in the WebUI. Currently, the profile needs to be dumped manually; this cannot be done via the WebUI. This could be a future enhancement,