From 85aaa21a2ca4e57be12d5f537826304877eba180 Mon Sep 17 00:00:00 2001
From: Simon Sapin
Date: Tue, 15 Apr 2014 18:02:47 +0100
Subject: [PATCH 1/3] RFC: Add a size hint method to allow Writer objects to pre-allocate space.

---
 active/0000-writer-reserve-additional.md | 76 ++++++++++++++++++++++++
 1 file changed, 76 insertions(+)
 create mode 100644 active/0000-writer-reserve-additional.md

diff --git a/active/0000-writer-reserve-additional.md b/active/0000-writer-reserve-additional.md
new file mode 100644
index 00000000000..9d289d4494c
--- /dev/null
+++ b/active/0000-writer-reserve-additional.md
@@ -0,0 +1,76 @@
+- Start Date: 2014-04-15
+- RFC PR #:
+- Rust Issue #:
+
+# Summary
+
+Add a method to the `std::io::Writer` trait to inform writers
+of an estimate of how many bytes we’re about to write "soon"
+(though possibly in multiple `write*` method calls).
+Implementations can use this as they see fit,
+including doing nothing (the default).
+
+# Motivation
+
+In cases where the user of a writer makes a number of calls to `write*` methods,
+they may be able to estimate the total number of bytes beforehand.
+Writers that need to "allocate space" somehow before they can write to it
+could use that knowledge to pre-allocate a big chunk of space
+rather than starting small and repeatedly re-allocating as needed.
+
+[rust-encoding](https://github.com/lifthrasiir/rust-encoding) is such a case,
+with estimates based on the size of the input and encoding-specific knowledge.
+At the moment, rust-encoding defines a custom `ByteWriter` trait
+in order to support this.
+Using libstd’s `Writer` instead would improve composability with other libraries.
+
+# Detailed design
+
+In the `std::io::Writer` trait, add:
+
+```rust
+    fn reserve_additional(&mut self, n: uint) {}
+```
+
+Like `flush()`, this method defaults to a no-op
+but may be overridden by implementations.
+
+In the `MemWriter` implementation, add:
+
+```rust
+    fn reserve_additional(&mut self, n: uint) {
+        self.buf.reserve_additional(n)
+    }
+```
+
+Usage example:
+
+```rust
+fn write_chars<I: Iterator<char>, W: Writer>(mut iter: I, output: &mut W) -> IoResult<()> {
+    let (chars_low, chars_high) = iter.size_hint();
+
+    // XXX Should this be used somehow?
+    let bytes_high = chars_high.map(|h| h * 4); // Only non-BMP code points
+    let bytes_low = chars_low; // Only ASCII code points
+    output.reserve_additional(bytes_low);
+    for c in iter {
+        try!(output.write_char(c))
+    }
+    Ok(())
+}
+```
+
+# Alternatives
+
+* Do nothing, if the allocation optimization is judged not worth the API complexity.
+* Rather than a single number, (optionally?) provide a range for the estimate.
+  See `Iterator::size_hint`.
+
+# Unresolved questions
+
+* `BufferedWriter` probably should also override this new method,
+  but the exact desired behavior is not obvious.
+* If `reserve_additional` takes a single number
+  but a user can estimate within a range,
+  should they reserve the lower bound, upper bound, or something else?
+  (`Vec::from_iter` currently only looks at the lower bound from `Iterator::size_hint`.)

From 2f4e32ef7da8dd504a11485440432b837813aa4c Mon Sep 17 00:00:00 2001
From: Simon Sapin
Date: Tue, 29 Apr 2014 16:44:54 +0100
Subject: [PATCH 2/3] Almost completely rewrite the stream size hint RFC:

* Add size_hint on Reader, like on Iterator
* Highlight the precedent of Iterator::size_hint
* Rename reserve_additional to reserve
* Use an estimated range (lower bound and optional upper bound)
  like size_hint, rather than a single estimated number.
---
 active/0000-stream-size-hints.md | 130 +++++++++++++++++++++++
 active/0000-writer-reserve-additional.md | 76 -------------
 2 files changed, 130 insertions(+), 76 deletions(-)
 create mode 100644 active/0000-stream-size-hints.md
 delete mode 100644 active/0000-writer-reserve-additional.md

diff --git a/active/0000-stream-size-hints.md b/active/0000-stream-size-hints.md
new file mode 100644
index 00000000000..daacfb411ed
--- /dev/null
+++ b/active/0000-stream-size-hints.md
@@ -0,0 +1,130 @@
+- Start Date: 2014-04-15
+- RFC PR #:
+- Rust Issue #:
+
+# Summary
+
+Add a `size_hint` method to `Reader` that
+returns an estimated range of how many bytes are remaining to be read,
+similar to the existing `size_hint` method on iterators.
+Add a `reserve` method to `Writer` that
+takes an estimated range of how many bytes will “soon” be written.
+
+
+# Motivation
+
+Just like `Iterator::size_hint` allows e.g. `Vec::from_iter`
+to pre-allocate a big chunk of memory rather than repeatedly reallocating as needed,
+this would help reader users and writer implementations that can similarly
+pre-allocate space.
+
+For writers, the caller may have information about the amount of data being processed
+across many calls to `write*` methods.
+For example, [rust-encoding](https://github.com/lifthrasiir/rust-encoding)
+processes one code point at a time (calling `write_char` repeatedly),
+but can estimate the output size from the size of the input and encoding-specific knowledge.
+
+At the moment, rust-encoding defines a custom `ByteWriter` trait
+in order to support this.
+Using libstd’s `Writer` instead would improve composability with other libraries.
+
+Simplified usage example:
+
+```rust
+fn write_chars<I: Iterator<char>, W: Writer>(mut iter: I, output: &mut W) -> IoResult<()> {
+    let (chars_low, chars_high) = iter.size_hint();
+    let bytes_low = chars_low; // Only ASCII code points
+    let bytes_high = chars_high.map(|h| h * 4); // Only non-BMP code points
+    output.reserve(bytes_low, bytes_high);
+    for c in iter {
+        try!(output.write_char(c))
+    }
+    Ok(())
+}
+```
+
+# Detailed design
+
+The `std::io::Reader` trait gets a new default method:
+
+```rust
+    /// Return a lower bound and upper bound on the estimated
+    /// remaining number of bytes until EOF.
+    ///
+    /// Note: This estimate may be wrong.
+    /// There is no guarantee that EOF will actually be reached within this range.
+    ///
+    /// The common use case for the estimate is pre-allocating space to store the results.
+    #[inline]
+    fn size_hint(&self) -> (uint, Option<uint>) { (0, None) }
+```
+
+This is identical to the `std::iter::Iterator::size_hint` method.
+
+The `Reader::read_to_end` default method is updated
+to pre-allocate the new vector’s capacity based on `self.size_hint()`.
+
+`size_hint` is overridden as appropriate in libstd implementors.
+For example, in `MemReader`:
+
+```rust
+    #[inline]
+    fn size_hint(&self) -> (uint, Option<uint>) {
+        let exact = self.buf.len() - self.pos;
+        (exact, Some(exact))
+    }
+```
+
+The `std::io::Writer` trait gets a new default method:
+
+```rust
+    /// Inform the writer of the lower bound and upper bound
+    /// on the estimated number of bytes that will be written “soon”
+    /// (though possibly in multiple `write*` method calls).
+    /// An upper bound of `None` means that no upper bound is known.
+    ///
+    /// Note: this estimate may be wrong.
+    /// It is valid to write a number of bytes outside the given range.
+    ///
+    /// Implementations can use this information as they see fit,
+    /// including doing nothing (the default).
+    /// The common use case for the estimate is pre-allocating space for the data to be written.
+    #[inline]
+    fn reserve(&mut self, _low: uint, _high: Option<uint>) {}
+```
+
+Like `flush()`, this method defaults to a no-op
+but is meant to be overridden by implementations.
+
+Override `reserve` as appropriate in libstd implementors.
+For example, in `MemWriter` (modeled after `Vec::from_iter`):
+
+```rust
+    #[inline]
+    fn reserve(&mut self, low: uint, _high: Option<uint>) {
+        self.buf.reserve_additional(low)
+    }
+```
+
+`std::io::fs::File::reserve` could use
+[`fallocate`](http://man7.org/linux/man-pages/man2/fallocate.2.html)
+with `FALLOC_FL_KEEP_SIZE` on Linux,
+or equivalent on other systems.
+(Since this is only supposed to be a hint,
+we probably don’t want to change the apparent size of the file.)
+
+# Alternatives
+
+* Do nothing, if the allocation optimization is judged not worth the API complexity.
+* A previous version of this RFC used a single integer as an "estimate" instead
+  of the current lower bound and optional upper bound (like `Iterator::size_hint`).
+* A draft of this version used the `size_hint` name for both readers and writers,
+  but that would have prevented any type from implementing both traits, as `File` does.
+* Define these methods on new, special-purpose traits.
+  This is only practical if we also have specialization: mozilla/rust#7059
+
+# Unresolved questions
+
+* It’s unclear whether or how `BufferedWriter` should override `reserve`.
+* Should `File::reserve` really call `fallocate`,
+  or is that better left to a more explicitly named API?

diff --git a/active/0000-writer-reserve-additional.md b/active/0000-writer-reserve-additional.md
deleted file mode 100644
index 9d289d4494c..00000000000
--- a/active/0000-writer-reserve-additional.md
+++ /dev/null
@@ -1,76 +0,0 @@
-- Start Date: 2014-04-15
-- RFC PR #:
-- Rust Issue #:
-
-# Summary
-
-Add a method to the `std::io::Writer` trait to inform writers
-of an estimate of how many bytes we’re about to write "soon"
-(though possibly in multiple `write*` method calls).
-Implementations can use this as they see fit,
-including doing nothing (the default).
-
-# Motivation
-
-In cases where the user of a writer makes a number of calls to `write*` methods,
-they may be able to estimate the total number of bytes beforehand.
-Writers that need to "allocate space" somehow before they can write to it
-could use that knowledge to pre-allocate a big chunk of space
-rather than starting small and repeatedly re-allocating as needed.
-
-[rust-encoding](https://github.com/lifthrasiir/rust-encoding) is such a case,
-with estimates based on the size of the input and encoding-specific knowledge.
-At the moment, rust-encoding defines a custom `ByteWriter` trait
-in order to support this.
-Using libstd’s `Writer` instead would improve composability with other libraries.
-
-# Detailed design
-
-In the `std::io::Writer` trait, add:
-
-```rust
-    fn reserve_additional(&mut self, n: uint) {}
-```
-
-Like `flush()`, this method defaults to a no-op
-but may be overridden by implementations.
-
-In the `MemWriter` implementation, add:
-
-```rust
-    fn reserve_additional(&mut self, n: uint) {
-        self.buf.reserve_additional(n)
-    }
-```
-
-Usage example:
-
-```rust
-fn write_chars<I: Iterator<char>, W: Writer>(mut iter: I, output: &mut W) -> IoResult<()> {
-    let (chars_low, chars_high) = iter.size_hint();
-
-    // XXX Should this be used somehow?
-    let bytes_high = chars_high.map(|h| h * 4); // Only non-BMP code points
-    let bytes_low = chars_low; // Only ASCII code points
-    output.reserve_additional(bytes_low);
-    for c in iter {
-        try!(output.write_char(c))
-    }
-    Ok(())
-}
-```
-
-# Alternatives
-
-* Do nothing, if the allocation optimization is judged not worth the API complexity.
-* Rather than a single number, (optionally?) provide a range for the estimate.
-  See `Iterator::size_hint`.
-
-# Unresolved questions
-
-* `BufferedWriter` probably should also override this new method,
-  but the exact desired behavior is not obvious.
-* If `reserve_additional` takes a single number
-  but a user can estimate within a range,
-  should they reserve the lower bound, upper bound, or something else?
-  (`Vec::from_iter` currently only looks at the lower bound from `Iterator::size_hint`.)

From 6e76258d96ae31ca1cbdbefbe631ab98bd88bd1b Mon Sep 17 00:00:00 2001
From: Simon Sapin
Date: Tue, 29 Apr 2014 17:38:34 +0100
Subject: [PATCH 3/3] (Stream size hint RFC) Fix wording

---
 active/0000-stream-size-hints.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/active/0000-stream-size-hints.md b/active/0000-stream-size-hints.md
index daacfb411ed..df418766699 100644
--- a/active/0000-stream-size-hints.md
+++ b/active/0000-stream-size-hints.md
@@ -96,7 +96,7 @@ The `std::io::Writer` trait gets a new default method:
 Like `flush()`, this method defaults to a no-op
 but is meant to be overridden by implementations.
 
-Override `reserve` as appropriate in libstd implementors.
+`reserve` is overridden as appropriate in libstd implementors.
 For example, in `MemWriter` (modeled after `Vec::from_iter`):
 
 ```rust
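The RFC's design can be approximated in present-day Rust roughly as follows. This is a sketch under stated assumptions, not the RFC's API: today's `std::io::Read` and `std::io::Write` have no `size_hint` or `reserve` methods, so the traits `SizeHint` and `Reserve` below are hypothetical stand-ins, and `MemReader`, `MemWriter`, and `read_to_end_hinted` are illustrative types and helpers, not the historical libstd ones. `uint` becomes `usize`, and `write_chars` is assumed to produce UTF-8 output.

```rust
use std::io::{self, Read, Write};

// Hypothetical stand-in for the proposed `Reader::size_hint`:
// (lower bound, optional upper bound) on the bytes remaining until EOF.
trait SizeHint: Read {
    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, None) // default: nothing is known
    }
}

// Hypothetical stand-in for the proposed `Writer::reserve`.
trait Reserve: Write {
    fn reserve(&mut self, _low: usize, _high: Option<usize>) {} // default: no-op
}

// Illustrative in-memory reader (not the old libstd `MemReader`).
struct MemReader {
    buf: Vec<u8>,
    pos: usize,
}

impl Read for MemReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let remaining = &self.buf[self.pos..];
        let n = remaining.len().min(out.len());
        out[..n].copy_from_slice(&remaining[..n]);
        self.pos += n;
        Ok(n)
    }
}

impl SizeHint for MemReader {
    // The remaining length is known exactly, so both bounds are equal.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let exact = self.buf.len() - self.pos;
        (exact, Some(exact))
    }
}

// Illustrative in-memory writer (not the old libstd `MemWriter`).
struct MemWriter {
    buf: Vec<u8>,
}

impl Write for MemWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Reserve for MemWriter {
    // Like `Vec::from_iter`, only the lower bound is used.
    fn reserve(&mut self, low: usize, _high: Option<usize>) {
        self.buf.reserve(low);
    }
}

// Read to EOF, pre-allocating from the reader's lower-bound estimate.
fn read_to_end_hinted<R: SizeHint>(reader: &mut R) -> io::Result<Vec<u8>> {
    let (low, _high) = reader.size_hint();
    let mut out = Vec::with_capacity(low);
    reader.read_to_end(&mut out)?;
    Ok(out)
}

// Counterpart of the RFC's `write_chars` example, encoding as UTF-8.
fn write_chars<I, W>(iter: I, output: &mut W) -> io::Result<()>
where
    I: Iterator<Item = char>,
    W: Reserve,
{
    let (chars_low, chars_high) = iter.size_hint();
    let bytes_low = chars_low; // at least 1 byte per char (all ASCII)
    // At most 4 bytes per char; `None` if unknown or if the product overflows.
    let bytes_high = chars_high.and_then(|h| h.checked_mul(4));
    output.reserve(bytes_low, bytes_high);
    let mut tmp = [0u8; 4];
    for c in iter {
        output.write_all(c.encode_utf8(&mut tmp).as_bytes())?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut writer = MemWriter { buf: Vec::new() };
    write_chars("héllo".chars(), &mut writer)?;
    assert_eq!(writer.buf, "héllo".as_bytes());

    let mut reader = MemReader { buf: writer.buf, pos: 0 };
    assert_eq!(reader.size_hint(), (6, Some(6))); // "héllo" is 6 bytes in UTF-8
    let bytes = read_to_end_hinted(&mut reader)?;
    assert_eq!(bytes, "héllo".as_bytes());
    assert_eq!(reader.size_hint(), (0, Some(0)));
    Ok(())
}
```

Reserving only the lower bound, as the RFC's `MemWriter` does following `Vec::from_iter`, avoids committing memory on the basis of an upper bound that may be very loose (here, four bytes per character).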