Replace calls to ls and rm with perl functions. #42
Open · rapier1 wants to merge 2 commits into hjmangalam:master from rapier1:master
Conversation
This avoids issues when there are too many cache files for ls to process with a wildcard.
The new method of counting the number of cache files makes this unnecessary.
Thanks very much Chris. Good points - a lot of the code was pulled together quite haphazardly (as you might have noticed).
I'm ambivalent about getting rid of the fpart file number limits, since too many fpart files have an impact on other parts of the filesystem, and there's churn when starting too many rsyncs.
I'm finishing up some other work, but I'll try to merge these over the weekend.
Harry
On Fri, Apr 9, 2021 at 9:42 AM Chris Rapier wrote:
> This avoids issues when there are too many cache files for ls to process with a wildcard. While this doesn't happen all that often, I've run into problems when moving very large data sets, especially when they had files with widely varying sizes. I *think* this means some of the checks for too many cache files can be removed as well, but I just wanted to submit the basics at this point. I haven't seen any notable performance issues even when processing 50,000 cache files.
> I also removed trailing whitespace from some of the lines (M-x delete-trailing-whitespace in emacs).
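For illustration, the approach of replacing `ls` and `rm` shell-outs with Perl built-ins can be sketched roughly like this. The function names are illustrative, not the ones used in parsyncfp itself; the point is that `readdir()` and `unlink()` never build a shell command line, so they cannot hit the kernel's argument-list limit the way `ls cache.*` or `rm cache.*` can.

```perl
#!/usr/bin/env perl
# Sketch: count and remove cache files with Perl built-ins instead of
# shelling out to `ls` and `rm` with a wildcard. Names are illustrative.
use strict;
use warnings;

# Count files in $dir whose names start with $prefix, via readdir()
# rather than `ls $dir/$prefix*`.
sub count_cache_files {
    my ($dir, $prefix) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = grep { /^\Q$prefix\E/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return scalar @files;
}

# Remove the same files with unlink() rather than `rm $dir/$prefix*`.
# Returns the number of files removed.
sub remove_cache_files {
    my ($dir, $prefix) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = grep { /^\Q$prefix\E/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    unlink map { "$dir/$_" } @files;
    return scalar @files;
}
```

Because the directory is scanned one entry at a time, this behaves the same for 50 cache files or 50,000.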
Glad to be of help.
Just so you have an example - we used parsyncfp to move our primary data storage system to a new system, something like 6 to 7 PB of data. Both filesystems were using lustre (which is its own issue). I wrote a wrapper so users could fire off their own runs of parsyncfp as slurm jobs. We ended up using an NP of 16 and a chunksize of -4G to override the cache limit. This was necessary as we had some users with more than 1 PB of data. We dedicated 4 slurm nodes to these jobs and usually had 2 or 3 people per node. These are *beefy* nodes with 64 cores and 128 threads, fully dedicated to parsyncfp tasks. We were seeing throughput peaking at 2500 MB/s and averaging 853 MB/s (though the median was probably closer to 1200). So I don't think we were seeing that much in the way of thrashing, but we did have really good equipment for this. That said, the switch to disable the MAX_FPART works fine.
Chris
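A hypothetical sketch of the kind of wrapper Chris describes: build an sbatch script that runs one parsyncfp transfer so each user can submit their own copy as a slurm job. The parsyncfp option spellings (`--NP`, `--chunksize`, `--startdir`), the argument layout, and all paths below are assumptions for illustration only; check them against `parsyncfp --help` on your installation.

```perl
#!/usr/bin/env perl
# Hypothetical slurm wrapper sketch: generate the text of an sbatch
# script for one parsyncfp transfer. Option names and paths are
# assumptions, not taken from parsyncfp's actual interface.
use strict;
use warnings;

# Return the text of an sbatch script for one transfer.
sub make_sbatch_script {
    my ($src, $dest, $np, $chunksize) = @_;
    return <<"EOF";
#!/bin/bash
#SBATCH --job-name=pfp-transfer
#SBATCH --nodes=1
#SBATCH --exclusive
parsyncfp --NP=$np --chunksize=$chunksize --startdir=$src . $dest
EOF
}

# A user-facing driver would write this to a file and hand it to sbatch:
#   my $script = make_sbatch_script('/lustre/old/project', 'newfs:/lustre/project', 16, '4G');
#   open my $fh, '>', 'pfp.sbatch' or die $!;
#   print $fh $script;
#   close $fh;
#   system('sbatch', 'pfp.sbatch') == 0 or die "sbatch failed";
```

Generating the script as text (rather than submitting directly) makes it easy for users to inspect or tweak their job before it runs.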
Hi Chris,
Your patches are lighter weight than the system calls I had, so I'll include them going forward, but I'm still concerned about eliminating $MAX_FPART_FILES, simply because a novice will use a value that generates literally tens of thousands of them, and unless I'm missing something, that's not something you want, since starting up a bazillion rsyncs takes time as well. There's a tradeoff between early starts (lots of tiny fpart chunks) and late starts (smaller numbers of larger fpart chunks). In fact this is something I'll mention to Ganael (fpart's author) - can fpart be told to chunk X number of small files (for startup) and then switch to larger chunks for the main run?
Also, I've gotten the multihost version running, and after some testing on a fast net I'll probably be releasing it within a week - it's a major reworking of the code, so I'll include your code, but not as a simple pull.
Also, did you get a chance to look at the RoundRobin changes I suggested? Did any of them work the way you wanted?
Best wishes and thanks for your contribution to pfp.
Harry