Skip to content

RFC: Macro to initialize collections #207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

japaric
Copy link
Member

@japaric japaric commented Aug 21, 2014

Rendered view

Out of tree implementation

(This is a consolidation of the ideas that have been floating around rust-lang/rust#14726 into a RFC)

@Valloric
Copy link

This looks very nice; thanks for writing up the RFC!

@Gankra
Copy link
Contributor

Gankra commented Aug 21, 2014

One option is to define Seq like so:

trait Seq<T>: Default {
  fn with_capacity(n: uint) -> Self { 
    Default::default()
  }
  fn add_elem(&mut Self, elem: T);
}

Requiring Default to provide a nice default seems minimally onerous, since basically every collection can and should.

For the naming bikeshed, I will toss out the name collect! in reference to the iterator method of the same name.

@Gankra
Copy link
Contributor

Gankra commented Aug 21, 2014

Another alternative is to create a macro iter! that produces an anonymous iterator with a perfect size_hint over the values. seq! would then just be iter!(1,2,3,4,5).collect(). You wouldn't need to add a new trait, it would Just Work for everything that implements FromIterator which I believe all relevant std collections do. Collections that can preallocate can preallocate, and any other optimizations possible in FromIterator would be available, since the collection gets control over how it is constructed (rather than being constructed iteratively over n insertions).

This also has these benefits:

  • reduces collection api pollution.
  • iter! is more general, with possible usecases elsewhere
  • works for immutable collections like a perfect hash map, or any persistent structure

@treeman
Copy link

treeman commented Aug 21, 2014

+1 Overall, something like this is very needed. Thanks for writing the RFC!

I'm in favor for the implementation via iter! which @gankro suggests. Avoiding an extra trait and working for immutable collections are good benefits.

I don't think we should deprecate vec! over seq! as type interference for Vec will need to be explicit.

let v = vec![1, 2, 3]; // ok
let v = seq![1, 2, 3]; // not ok
let v: Vec<int> = seq![1, 2, 3]; // need to do this

To me vec! feels like a nice shorthand to have.

I like a => b over a: b as a: b reads like "a has type b".

@netvl
Copy link

netvl commented Aug 21, 2014

+1 for the implementation using iter!(). It itself is very useful, and exploiting existing traits is a big win.

@huonw
Copy link
Member

huonw commented Aug 21, 2014

I would be very interested to see an implementation of iter! (or at least a sketch). I'm doubtful about how efficient one can make it e.g. on first thought it seems like it would just have to evaluate everything directly into a backing array or else it would end up with weird borrowing/moving, and that evaluation really needs something like #197 to be able to yield the elements by-value and still be safe (without an allocation).

(I'm very happy to be proved wrong.)

@Gankra
Copy link
Contributor

Gankra commented Aug 21, 2014

@huonw Yeah, I couldn't come up with a satisfying sketch myself. Various pending RFCs might make it more feasible by constructing a MagicIter containing some [T,..n], or a pointer to a [T,..n] allocated elsewhere.. Although it depends on if we want the iterator to be a value that can be returned, or one that's unable to escape the scope it's defined in. Also whether we want all the elements on the stack, or the heap.

As a very silly shim, this works:

macro_rules! iter {
    // List style: seq![1, 2, 3]
    ($($x:expr),*) => ({
        let mut _temp = vec![$($x,)*];

        _temp.move_iter()
    });
    // other style omitted for simplicity
)

fn main() {
    let foo:RingBuf<uint> = iter!(1,2,3,4).collect();
}

@Kimundi
Copy link
Member

Kimundi commented Aug 21, 2014

One possibility would be to use [Option<T>, ..N], as implemented here: http://is.gd/pdN16V

@japaric
Copy link
Member Author

japaric commented Aug 21, 2014

@Kimundi Thanks a really nice macro, I never tought of using a fixed size array like that!

I did some comparisons between seq! and iter!().collect(), which you can find in the vs-iter branch (and should be able to replicate).

Here's the summary:
To generate a Vec<Box<int>> with 100 elements:

  • iter! is 10% slower than seq!
  • Both use the same amount of heap memory
  • Compiling the iter! version seems to take a 13x longer than compiling seq!
  • The iter! binary is slightly (60 KB) bigger than the seq! binary

I'd be great if someone can replicate these measurements, and/or check if my measurements are flawed in some way.

@Gankra
Copy link
Contributor

Gankra commented Aug 21, 2014

I'm curious, is there any reason you went with boxed ints as your testing element?

Otherwise, I'm not totally surprised that this implementation of iter is slower, since it's effectively zeroing out and padding the memory, as well as copying it with the iterator. Could you try out my naive vec!-based implementation as well? It heap allocates and copies the elements, but it doesn't have to zero out any memory or include discriminants (from my cursor skim of move_iter). I'd be interested to see the difference.

I'd also be interested in benching the construction of other structures. In particular, PriorityQueue should have O(n) FromIterator (but looking at the source, no one seems to have implemented that! Gonna go work on it).

@pczarn
Copy link

pczarn commented Aug 22, 2014

A similar macro could initialize SmallVecs of appropriate type, too.

@nikomatsakis
Copy link
Contributor

This generally looks like a nice idea. It would allow us to prototype how well extensible vector syntax would work, which I like.

The name of the trait seems less than ideal. This trait does not represent the general notion of a sequence, but rather just something that can be constructed via append.

I want to correct two misconceptions:

  1. In a post-UFCS world, there will be no distinction between a "static method" and a non-static method, so the definition of the trait is perfect as is. (Or, to put another way, any method that has an &mut Self receiver ought to be usable in method notation or in trait call notation.)
  2. This design works fine with persistent/immutable collections. Generally speaking, a persistent collection will use Rc or Gc internally to share its data (there isn't much point in being persistent if you don't have aliases!). So the insert method can simply update *self with the newly allocated collection containing the new data. Hence it might look like:
pub struct Tree<T> { data: Rc<Data<T>> }
impl<T> Tree<T> {
    fn new() -> Tree<T> { ... }
    fn add(&self, item: T) -> Tree<T> { ... }
}

In that case, you could impl the seq interface just fine:

impl<T> Seq<T> for Tree<T> {
    fn with_capacity(n: uint) -> Tree<T> { Tree::new() }
    fn insert(tree: &mut Tree<T>, element: T) {
       let n = tree.add(element);
       *tree = n;
    }
}

@Gankra
Copy link
Contributor

Gankra commented Aug 23, 2014

@nikomatsakis I agree that persistent strucutures can work with this scheme, but it's likely going to be much less efficient than possible with FromIterator, which opens up a whole new area of potential optimizations. Again, heapify is a great canonical example of where FromIterator can blow iterative construction out of the water.

We could possibly recover this performance with a triplet of functions:

unsafe fn new_unsealed(to_insert: uint) -> Self;
unsafe fn insert_unsealed(&mut self, T: elem);
unsafe fn seal(&mut self);

Which would basically allow the structure to work like FromIterator, but in a a truly iterative manner. In the case of priority_queue, this would look like:

unsafe fn new_unsealed(to_insert: uint) -> Self{
    priority_queue::with_capacity(to_insert);
}

unsafe fn insert_unsealed(&mut self, T: elem) {
    self.data.push(elem); // push into the underlying Vec, without preserving heap invariants
}

unsafe fn seal(&mut self) {
    self.heapify();
}

I mark the methods unsafe because the underlying structure is free to do whatever it pleases, and only guarantees that it's in a sane state when the functions are called in the right sequence.

@SiegeLord
Copy link

I concur with @treeman's point, and to that extent I don't see why it doesn't apply to HashMap and other containers. How possible would it be to have multiple macros, one for sequences and one for maps etc, which would utilize backing traits with default type parameters (assuming those end up working as hints)? I.e.:

let v = seq![1u]; // Made a Vec<uint>
let d: RingBuf<_> = seq![1u]; // Alter the default to a RingBuf
let m = map![1u => 1u]; // Made a HashMap<uint, uint>
let m: RbTree<_> = map![1u => 1u]; // Alter the default

This also resolves the seq being abused for maps. I have no problems with lots of macros in the default crates if they are used for what macros are meant to be used for.

Either way, if that's too crazy, I'm still voting for keeping vec!.

@nikomatsakis
Copy link
Contributor

@gankro something like "seal" would require more of a builder-style interface, I think, wherein the seal method returns a new type representing the "sealed" data. I agree it'd be nice, not quite sure what it looks like. Probably worth experimenting more.

As for having multiple macros, I think if we can have only one, that seems better, particularly if we get the generic API right (and I'm convinced now that the proposed interface is not necessarily right).

@nikomatsakis
Copy link
Contributor

Sorry, didn't finish that thought: I meant to say that if we have a generic API, and only one macro, it's nice because you can use the macro in generic code.

@Ericson2314
Copy link
Contributor

How will this interact with the generic collection traits with HKTs post 1.0? It would be a shame to have a redundant trait hanging around once those land.

@japaric
Copy link
Member Author

japaric commented Dec 24, 2014

Closing/postponing this because (a) we removed all the collections traits, (b) also removed several collections from the stdlib, and (c) not going to ship any generic collection trait with 1.0.

We should revisit this when we start thinking about adding generic collection traits to the stdlib. Which is definitely post 1.0, and maybe even after we get HKT.

@japaric japaric closed this Dec 24, 2014
withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.