Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract fread/write_big into f_big #398

Closed
wants to merge 1 commit into from
Closed

Abstract fread/write_big into f_big #398

wants to merge 1 commit into from

Conversation

zlatanvasovic
Copy link
Contributor

No description provided.

@weshinsley
Copy link
Collaborator

Well... I see the point - reducing line numbers and sharing the code...

But conceptually, I'm not quite sure whether we want to merge reading and writing like this. Taking a step back, I can imagine there being ways of making some file reading faster that might not apply to file writing. (See Issue #223 which didn't quite make it into workable code, but shows the kind of read-specific sort of work I'm thinking of).

If we go with it, then I'd prefer the parameter to f_big to be a const containing READ | WRITE (rather than an arbitrary integer 0 or 1). But I'd like more thoughts from others about whether this is the right sort of refactor on this code, or whether a different sort of rewrite is really needed, which would be (slightly) harder to unpick if we merged them. I suspect there's a better more modern answer for both the reads and writes currently being done.

@zlatanvasovic
Copy link
Contributor Author

@weshinsley Even if some change is to be implemented specifically in read or write, most of the code still can be easily abstracted.
It can be done much more idiomatically, but C++ lacks many metaprogramming features, such as code evaluation from a string. About READ | WRITE, any form of parameter you prefer will work. I always allow edits by maintainers, so you can edit it before merging.

@DavidVernest
Copy link
Contributor

We are merging code into a live system - having the maintainers modify the logic is not good practice. It is the developer's responsibility to deliver a cohesive solution before asking for the merge.

@zlatanvasovic
Copy link
Contributor Author

@mstevens-uk I've already implemented the cohesive solution which has all the required logic, as you could have noticed. It's also far more cohesive than the previous solution.

However, what I cannot do is try to change the preference of maintainers for specific variable names or forms. For me having a variable with possible values of 0 and 1 or "READ" and "WRITE" is essentially the same. And as @weshinsley has said, it's just their own preference, not a must.

@matt-gretton-dann
Copy link
Collaborator

The code as it stands needs reworking to cope with errors from fwrite() & fread() - including those that may not be fatal and just indicate a retry. As it is I think if the disk gets full this code will go into an infinite loop when writing to disk.

My question is why can't we just use plain fread/fwrite what do these functions gain us? (Is it simply some Operating Systems perform better if you make the elements as large as possible compared to the number you are outputting?).

So I don't like this PR. But the functions do need reworking - if only to have better error handling.

(And on the 0 or 1 or enum question: I'd prefer a lambda which did the read or write).

@weshinsley
Copy link
Collaborator

Yep, so our original code looks like it is forcing a chunk size of 2Gb for reads/writes, so I too wonder if this overides some defaults (either OS/compiler/some combination) for the chunk size that fread/fwrite use. I can imagine Neil testing this over infiniband with some much larger files than we are used to - full US, or perhaps all of Africa, which we have run at times.

So this PR is just factoring out the code that's common to both the fread_big and fwrite_big functions, since they only differ in two places - f[read] or f[write]. But, if/when error handling is added, will that motivate us back to wanting separate read/write functions again? Presumably there are different potential errors for fread vs fwrite...?

@matt-gretton-dann
Copy link
Collaborator

My ideal would be to replace the f*_big functions with xf* functions which wrap the f* functions handling errors and smoothing over platform issues (similar to the x*alloc I've suggested elsewhere: #406).

On UNIX platforms the errors in particular we should be handling are EAGAIN & EINTR which are indicators that the call got interrupted - not that things went wrong. On Windows whilst the above are valid codes I don't think they are used - instead we should consider an invalid parameter handler.

@zlatanvasovic zlatanvasovic deleted the freadwrite branch July 17, 2020 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants