
Commit a3786db

auto merge of #17802 : Gankro/rust/collection-docs-redux, r=aturon
Adds a high-level discussion of "what collection should you use for what", as well as some general discussion of correct/efficient usage of the capacity, iterator, and entry APIs. Still building docs to confirm this renders right and the examples are good, but the content can be reviewed now.
2 parents e62ef37 + 1d6eda3 commit a3786db

3 files changed

+322
-3
lines changed

src/libcollections/btree/mod.rs

+2
@@ -15,6 +15,8 @@ pub use self::map::MoveEntries;
1515
pub use self::map::Keys;
1616
pub use self::map::Values;
1717
pub use self::map::Entry;
18+
pub use self::map::Occupied;
19+
pub use self::map::Vacant;
1820
pub use self::map::OccupiedEntry;
1921
pub use self::map::VacantEntry;
2022

src/libcollections/lib.rs

+3
@@ -9,6 +9,9 @@
99
// except according to those terms.
1010

1111
//! Collection types.
12+
//!
13+
//! See [std::collections](../std/collections) for a detailed discussion of collections in Rust.
14+
1215

1316
#![crate_name = "collections"]
1417
#![experimental]

src/libstd/collections/mod.rs

+317-3
@@ -8,9 +8,323 @@
88
// option. This file may not be copied, modified, or distributed
99
// except according to those terms.
1010

11-
/*!
12-
* Collection types.
13-
*/
11+
//! Collection types.
12+
//!
13+
//! Rust's standard collection library provides efficient implementations of the most common
14+
//! general purpose programming data structures. By using the standard implementations,
15+
//! it should be possible for two libraries to communicate without significant data conversion.
16+
//!
17+
//! To get this out of the way: you should probably just use `Vec` or `HashMap`. These two
18+
//! collections cover most use cases for generic data storage and processing. They are
19+
//! exceptionally good at doing what they do. All the other collections in the standard
20+
//! library have specific use cases where they are the optimal choice, but these cases are
21+
//! borderline *niche* in comparison. Even when `Vec` and `HashMap` are technically suboptimal,
22+
//! they're probably a good enough choice to get started.
23+
//!
24+
//! Rust's collections can be grouped into four major categories:
25+
//!
26+
//! * Sequences: `Vec`, `RingBuf`, `DList`, `BitV`
27+
//! * Maps: `HashMap`, `BTreeMap`, `TreeMap`, `TrieMap`, `SmallIntMap`, `LruCache`
28+
//! * Sets: `HashSet`, `BTreeSet`, `TreeSet`, `TrieSet`, `BitVSet`, `EnumSet`
29+
//! * Misc: `PriorityQueue`
30+
//!
31+
//! # When Should You Use Which Collection?
32+
//!
33+
//! These are fairly high-level and quick break-downs of when each collection should be
34+
//! considered. Detailed discussions of strengths and weaknesses of individual collections
35+
//! can be found on their own documentation pages.
36+
//!
37+
//! ### Use a `Vec` when:
38+
//! * You want to collect items to be processed or sent elsewhere later, and don't care about
39+
//! any properties of the actual values being stored.
40+
//! * You want a sequence of elements in a particular order, and will only be appending to
41+
//! (or near) the end.
42+
//! * You want a stack.
43+
//! * You want a resizable array.
44+
//! * You want a heap-allocated array.
45+
//!
46+
//! ### Use a `RingBuf` when:
47+
//! * You want a `Vec` that supports efficient insertion at both ends of the sequence.
48+
//! * You want a queue.
49+
//! * You want a double-ended queue (deque).
50+
//!
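As a rough illustration of the queue and deque bullets above: in current Rust the ring buffer is exposed as `std::collections::VecDeque` (the `RingBuf` name used in this commit was later retired). The sketch below assumes that newer name, and the helper `deque_order` is made up for the example.

```rust
use std::collections::VecDeque;

// Hypothetical helper: pushes at both ends, then drains from the front.
fn deque_order() -> Vec<i32> {
    let mut buf = VecDeque::new();
    buf.push_back(2);
    buf.push_back(3);
    // Insertion at the front does not shift the other elements,
    // unlike `Vec::insert(0, ..)`.
    buf.push_front(1);

    let mut out = Vec::new();
    // Popping from the front gives first-in, first-out queue behaviour.
    while let Some(x) = buf.pop_front() {
        out.push(x);
    }
    out
}

fn main() {
    println!("{:?}", deque_order());
}
```

Using only `push_back` and `pop_front` gives a plain queue; adding `push_front` and `pop_back` gives the double-ended case.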
51+
//! ### Use a `DList` when:
52+
//! * You want a `Vec` or `RingBuf` of unknown size, and can't tolerate inconsistent
53+
//! performance during insertions.
54+
//! * You are *absolutely* certain you *really*, *truly*, want a doubly linked list.
55+
//!
56+
//! ### Use a `HashMap` when:
57+
//! * You want to associate arbitrary keys with an arbitrary value.
58+
//! * You want a cache.
59+
//! * You want a map, with no extra functionality.
60+
//!
61+
//! ### Use a `BTreeMap` when:
62+
//! * You're interested in what the smallest or largest key-value pair is.
63+
//! * You want to find the largest or smallest key that is smaller or larger than something.
64+
//! * You want to be able to get all of the entries in order on-demand.
65+
//! * You want a sorted map.
66+
//!
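The ordered queries listed above might be sketched as follows. This assumes the `BTreeMap` API of current stable Rust (in particular `range`, which may not match what was available at the time of this commit); the sample data and helper names are invented for illustration.

```rust
use std::collections::BTreeMap;

// Hypothetical sample data, inserted out of order on purpose.
fn sample() -> BTreeMap<u32, &'static str> {
    let mut map = BTreeMap::new();
    map.insert(30, "c");
    map.insert(10, "a");
    map.insert(20, "b");
    map
}

// Smallest and largest keys, read from either end of the sorted iterator.
fn min_max(map: &BTreeMap<u32, &'static str>) -> Option<(u32, u32)> {
    let min = *map.keys().next()?;
    let max = *map.keys().next_back()?;
    Some((min, max))
}

// Largest key strictly smaller than `bound`, if any.
fn largest_below(map: &BTreeMap<u32, &'static str>, bound: u32) -> Option<u32> {
    map.range(..bound).next_back().map(|(k, _)| *k)
}

fn main() {
    let map = sample();
    println!("{:?}", min_max(&map));
    println!("{:?}", largest_below(&map, 25));
    // Iteration always visits entries in key order.
    for (k, v) in &map {
        println!("{} => {}", k, v);
    }
}
```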
67+
//! ### Use a `TreeMap` when:
68+
//! * You want a `BTreeMap`, but can't tolerate inconsistent performance.
69+
//! * You want a `BTreeMap`, but have *very large* keys or values.
70+
//! * You want a `BTreeMap`, but have keys that are expensive to compare.
71+
//! * You want a `BTreeMap`, but you accept arbitrary untrusted inputs.
72+
//!
73+
//! ### Use a `TrieMap` when:
74+
//! * You want a `HashMap`, but with many potentially large `uint` keys.
75+
//! * You want a `BTreeMap`, but with potentially large `uint` keys.
76+
//!
77+
//! ### Use a `SmallIntMap` when:
78+
//! * You want a `HashMap`, but with `uint` keys that are known to be small.
79+
//! * You want a `BTreeMap`, but with `uint` keys that are known to be small.
80+
//!
81+
//! ### Use the `Set` variant of any of these `Map`s when:
82+
//! * You just want to remember which keys you've seen.
83+
//! * There is no meaningful value to associate with your keys.
84+
//! * You just want a set.
85+
//!
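A small sketch of the "remember which keys you've seen" case, using `HashSet` from current stable Rust. The helper `first_occurrences` is hypothetical.

```rust
use std::collections::HashSet;

// Hypothetical helper: returns the items of `input` with repeats dropped,
// keeping the order in which each value was first seen.
fn first_occurrences(input: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &x in input {
        // `insert` returns true only if the value was not already in the set.
        if seen.insert(x) {
            out.push(x);
        }
    }
    out
}

fn main() {
    println!("{:?}", first_occurrences(&[1, 2, 1, 3, 2]));
}
```

There is no value worth storing per key here, which is the situation the set variants are meant for.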
86+
//! ### Use a `BitV` when:
87+
//! * You want to store an unbounded number of booleans in a small space.
88+
//! * You want a bitvector.
89+
//!
90+
//! ### Use a `BitVSet` when:
91+
//! * You want a `SmallIntSet`.
92+
//!
93+
//! ### Use an `EnumSet` when:
94+
//! * You want a set of C-like enum values, stored in a single `uint`.
95+
//!
96+
//! ### Use a `PriorityQueue` when:
97+
//! * You want to store a bunch of elements, but only ever want to process the "biggest"
98+
//! or "most important" one at any given time.
99+
//! * You want a priority queue.
100+
//!
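To illustrate processing only the "biggest" element at any given time: in current Rust the standard priority queue is `std::collections::BinaryHeap` (the `PriorityQueue` name used here was later changed). The sketch assumes that newer name, and `largest_first` is a made-up helper.

```rust
use std::collections::BinaryHeap;

// Hypothetical helper: repeatedly takes the largest remaining element.
fn largest_first(items: &[u32]) -> Vec<u32> {
    let mut heap: BinaryHeap<u32> = items.iter().copied().collect();
    let mut out = Vec::new();
    // `pop` always yields the greatest element currently in the heap.
    while let Some(top) = heap.pop() {
        out.push(top);
    }
    out
}

fn main() {
    println!("{:?}", largest_first(&[3, 1, 4, 1, 5]));
}
```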
101+
//! ### Use an `LruCache` when:
102+
//! * You want a cache that discards the least recently used items when it becomes full.
103+
//! * You want a least-recently-used cache.
104+
//!
105+
//! # Correct and Efficient Usage of Collections
106+
//!
107+
//! Of course, knowing which collection is the right one for the job doesn't instantly
108+
//! permit you to use it correctly. Here are some quick tips for efficient and correct
109+
//! usage of the standard collections in general. If you're interested in how to use a
110+
//! specific collection in particular, consult its documentation for detailed discussion
111+
//! and code examples.
112+
//!
113+
//! ## Capacity Management
114+
//!
115+
//! Many collections provide several constructors and methods that refer to "capacity".
116+
//! These collections are generally built on top of an array. Optimally, this array would be
117+
//! exactly the right size to fit only the elements stored in the collection, but for the
118+
//! collection to do this would be very inefficient. If the backing array was exactly the
119+
//! right size at all times, then every time an element is inserted, the collection would
120+
//! have to grow the array to fit it. Due to the way memory is allocated and managed on most
121+
//! computers, this would almost surely require allocating an entirely new array and
122+
//! copying every single element from the old one into the new one. Hopefully you can
123+
//! see that this wouldn't be very efficient to do on every operation.
124+
//!
125+
//! Most collections therefore use an *amortized* allocation strategy. They generally let
126+
//! themselves have a fair amount of unoccupied space so that they only have to grow
127+
//! on occasion. When they do grow, they allocate a substantially larger array to move
128+
//! the elements into so that it will take a while for another grow to be required. While
129+
//! this strategy is great in general, it would be even better if the collection *never*
130+
//! had to resize its backing array. Unfortunately, the collection itself doesn't have
131+
//! enough information to do this on its own. Therefore, it is up to us programmers to give it
132+
//! hints.
133+
//!
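One way to observe the amortized strategy described above is to count how often a `Vec`'s capacity changes while elements are pushed one at a time. This is only a sketch against current stable Rust; the exact growth policy is an implementation detail, so the helper `count_regrowths` (a hypothetical name) reports a count rather than assuming one.

```rust
// Counts how many times the capacity changes while pushing `n` items
// into an initially empty Vec.
fn count_regrowths(n: usize) -> usize {
    let mut v: Vec<u64> = Vec::new();
    let mut last = v.capacity();
    let mut grows = 0;
    for i in 0..n {
        v.push(i as u64);
        if v.capacity() != last {
            // The backing array was replaced by a larger one.
            grows += 1;
            last = v.capacity();
        }
    }
    grows
}

fn main() {
    println!("capacity changed {} times for 1000 pushes", count_regrowths(1000));
}
```

If the backing array were always exactly the right size, the count would equal the number of pushes; with amortized growth it should be far smaller.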
134+
//! Any `with_capacity` constructor will instruct the collection to allocate enough space
135+
//! for the specified number of elements. Ideally this will be for exactly that many
136+
//! elements, but some implementation details may prevent this. `Vec` and `RingBuf` can
137+
//! be relied on to allocate exactly the requested amount, though. Use `with_capacity`
138+
//! when you know exactly how many elements will be inserted, or at least have a
139+
//! reasonable upper-bound on that number.
140+
//!
141+
//! When anticipating a large influx of elements, the `reserve` family of methods can
142+
//! be used to hint to the collection how much room it should make for the coming items.
143+
//! As with `with_capacity`, the precise behavior of these methods will be specific to
144+
//! the collection of interest.
145+
//!
146+
//! For optimal performance, collections will generally avoid shrinking themselves.
147+
//! If you believe that a collection will not soon contain any more elements, or
148+
//! just really need the memory, the `shrink_to_fit` method prompts the collection
149+
//! to shrink the backing array to the minimum size capable of holding its elements.
150+
//!
151+
//! Finally, if ever you're interested in what the actual capacity of the collection is,
152+
//! most collections provide a `capacity` method to query this information on demand.
153+
//! This can be useful for debugging purposes, or for use with the `reserve` methods.
154+
//!
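The capacity methods discussed in this section can be exercised together on a `Vec`. The sketch below uses current stable Rust and only checks relationships that `Vec` documents (for example, that `reserve` leaves room for at least the requested number of additional elements); the helper name `capacity_walkthrough` is invented for illustration.

```rust
// Returns (capacity after with_capacity, capacity after reserve,
//          capacity before shrink, capacity after shrink, final length).
fn capacity_walkthrough() -> (usize, usize, usize, usize, usize) {
    // Ask for room for 100 elements up front.
    let mut v: Vec<u32> = Vec::with_capacity(100);
    let initial = v.capacity();

    // These pushes fit in the space already allocated.
    for i in 0..100 {
        v.push(i);
    }

    // Hint that at least 50 more elements are coming.
    v.reserve(50);
    let reserved = v.capacity();

    // Dropping elements does not give memory back by itself...
    v.truncate(10);
    let before_shrink = v.capacity();

    // ...but shrink_to_fit asks the collection to release the excess.
    v.shrink_to_fit();
    let after_shrink = v.capacity();

    (initial, reserved, before_shrink, after_shrink, v.len())
}

fn main() {
    let (initial, reserved, before, after, len) = capacity_walkthrough();
    println!("with_capacity: {}", initial);
    println!("after reserve: {}", reserved);
    println!("before/after shrink: {} / {} (len {})", before, after, len);
}
```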
155+
//! ## Iterators
156+
//!
157+
//! Iterators are a powerful and robust mechanism used throughout Rust's standard
158+
//! libraries. Iterators provide a sequence of values in a generic, safe, efficient
159+
//! and convenient way. The contents of an iterator are usually *lazily* evaluated,
160+
//! so that only the values that are actually needed are ever actually produced, and
161+
//! no allocation need be done to temporarily store them. Iterators are primarily
162+
//! consumed using a `for` loop, although many functions also take iterators where
163+
//! a collection or sequence of values is desired.
164+
//!
165+
//! All of the standard collections provide several iterators for performing bulk
166+
//! manipulation of their contents. The three primary iterators almost every collection
167+
//! should provide are `iter`, `iter_mut`, and `into_iter`. Some of these are not
168+
//! provided on collections where it would be unsound or unreasonable to provide them.
169+
//!
170+
//! `iter` provides an iterator of immutable references to all the contents of a
171+
//! collection in the most "natural" order. For sequence collections like `Vec`, this
172+
//! means the items will be yielded in increasing order of index starting at 0. For ordered
173+
//! collections like `BTreeMap`, this means that the items will be yielded in sorted order.
174+
//! For unordered collections like `HashMap`, the items will be yielded in whatever order
175+
//! the internal representation made most convenient. This is great for reading through
176+
//! all the contents of the collection.
177+
//!
178+
//! ```
179+
//! let vec = vec![1u, 2, 3, 4];
180+
//! for x in vec.iter() {
181+
//!     println!("vec contained {}", x);
182+
//! }
183+
//! ```
184+
//!
185+
//! `iter_mut` provides an iterator of *mutable* references in the same order as `iter`.
186+
//! This is great for mutating all the contents of the collection.
187+
//!
188+
//! ```
189+
//! let mut vec = vec![1u, 2, 3, 4];
190+
//! for x in vec.iter_mut() {
191+
//!     *x += 1;
192+
//! }
193+
//! ```
194+
//!
195+
//! `into_iter` transforms the actual collection into an iterator over its contents
196+
//! by-value. This is great when the collection itself is no longer needed, and the
197+
//! values are needed elsewhere. Using `extend` with `into_iter` is the main way that
198+
//! contents of one collection are moved into another. Calling `collect` on an iterator
199+
//! itself is also a great way to convert one collection into another. Both of these
200+
//! methods should internally use the capacity management tools discussed in the
201+
//! previous section to do this as efficiently as possible.
202+
//!
203+
//! ```
204+
//! let mut vec1 = vec![1u, 2, 3, 4];
205+
//! let vec2 = vec![10u, 20, 30, 40];
206+
//! vec1.extend(vec2.into_iter());
207+
//! ```
208+
//!
209+
//! ```
210+
//! use std::collections::RingBuf;
211+
//!
212+
//! let vec = vec![1u, 2, 3, 4];
213+
//! let buf: RingBuf<uint> = vec.into_iter().collect();
214+
//! ```
215+
//!
216+
//! Iterators also provide a series of *adapter* methods for performing common tasks to
217+
//! sequences. Among the adapters are functional favorites like `map`, `fold`, `skip`,
218+
//! and `take`. Of particular interest to collections is the `rev` adapter, which
219+
//! reverses any iterator that supports this operation. Most collections provide reversible
220+
//! iterators as the way to iterate over them in reverse order.
221+
//!
222+
//! ```
223+
//! let vec = vec![1u, 2, 3, 4];
224+
//! for x in vec.iter().rev() {
225+
//!     println!("vec contained {}", x);
226+
//! }
227+
//! ```
228+
//!
229+
//! Several other collection methods also return iterators to yield a sequence of results
230+
//! but avoid allocating an entire collection to store the result in. This provides maximum
231+
//! flexibility as `collect` or `extend` can be called to "pipe" the sequence into any
232+
//! collection if desired. Otherwise, the sequence can be looped over with a `for` loop. The
233+
//! iterator can also be discarded after partial use, preventing the computation of the unused
234+
//! items.
235+
//!
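Laziness and partial consumption can be made visible by counting how often an adapter's closure actually runs. This is a sketch in current stable Rust; `lazy_demo` is a hypothetical helper, and the `Cell` counter exists only to observe the calls.

```rust
use std::cell::Cell;

// Returns the collected values plus how many times the `map` closure ran.
fn lazy_demo() -> (Vec<u32>, u32) {
    let calls = Cell::new(0);
    let data = vec![1u32, 2, 3, 4, 5, 6];

    let out: Vec<u32> = data
        .iter()
        .map(|x| {
            // Runs only for items that are actually pulled through the chain.
            calls.set(calls.get() + 1);
            x * 10
        })
        .skip(1) // drop the first mapped item
        .take(2) // stop after two items
        .collect(); // "pipe" the remaining sequence into a Vec

    (out, calls.get())
}

fn main() {
    let (out, calls) = lazy_demo();
    println!("{:?} produced with {} closure calls", out, calls);
}
```

Because `take(2)` stops pulling items early, the closure never runs for the tail of the vector.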
236+
//! ## Entries
237+
//!
238+
//! The `entry` API is intended to provide an efficient mechanism for manipulating
239+
//! the contents of a map conditionally on the presence of a key or not. The primary
240+
//! motivating use case for this is to provide efficient accumulator maps. For instance,
241+
//! if one wishes to maintain a count of the number of times each key has been seen,
242+
//! they will have to perform some conditional logic on whether this is the first time
243+
//! the key has been seen or not. Normally, this would require a `find` followed by an
244+
//! `insert`, effectively duplicating the search effort on each insertion.
245+
//!
246+
//! When a user calls `map.entry(key)`, the map will search for the key and then yield
247+
//! a variant of the `Entry` enum.
248+
//!
249+
//! If a `Vacant(entry)` is yielded, then the key *was not* found. In this case the
250+
//! only valid operation is to `set` the value of the entry. When this is done,
251+
//! the vacant entry is consumed and converted into a mutable reference to the
252+
//! value that was inserted. This allows for further manipulation of the value
253+
//! beyond the lifetime of the search itself. This is useful if complex logic needs to
254+
//! be performed on the value regardless of whether the value was just inserted.
255+
//!
256+
//! If an `Occupied(entry)` is yielded, then the key *was* found. In this case, the user
257+
//! has several options: they can `get`, `set`, or `take` the value of the occupied
258+
//! entry. Additionally, they can convert the occupied entry into a mutable reference
259+
//! to its value, providing symmetry to the vacant `set` case.
260+
//!
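The options available on an occupied entry can be sketched with a reference-count map. Note the assumption: this uses the entry API of current stable Rust, where the operations described above as `set` and `take` are spelled `insert` and `remove`. The helper `release` is hypothetical.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical helper: releases one reference to `key`.
// Returns the remaining count, or None if the key was never present.
fn release(refs: &mut HashMap<&'static str, u32>, key: &'static str) -> Option<u32> {
    match refs.entry(key) {
        // Key not found: nothing to release, and we choose not to insert.
        Entry::Vacant(_) => None,
        Entry::Occupied(mut slot) => {
            if *slot.get() <= 1 {
                // Last reference: take the value out of the map entirely.
                slot.remove();
                Some(0)
            } else {
                // Otherwise mutate the value in place.
                *slot.get_mut() -= 1;
                Some(*slot.get())
            }
        }
    }
}

fn main() {
    let mut refs = HashMap::new();
    refs.insert("a", 2);
    println!("{:?}", release(&mut refs, "a"));
    println!("{:?}", release(&mut refs, "a"));
    println!("{:?}", release(&mut refs, "a"));
}
```

Each call performs a single lookup, whichever branch is taken.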
261+
//! ### Examples
262+
//!
263+
//! Here are the two primary ways in which `entry` is used. First, a simple example
264+
//! where the logic performed on the values is trivial.
265+
//!
266+
//! #### Counting the number of times each character in a string occurs
267+
//!
268+
//! ```
269+
//! use std::collections::btree::{BTreeMap, Occupied, Vacant};
270+
//!
271+
//! let mut count = BTreeMap::new();
272+
//! let message = "she sells sea shells by the sea shore";
273+
//!
274+
//! for c in message.chars() {
275+
//!     match count.entry(c) {
276+
//!         Vacant(entry) => { entry.set(1u); },
277+
//!         Occupied(mut entry) => *entry.get_mut() += 1,
278+
//!     }
279+
//! }
280+
//!
281+
//! assert_eq!(count.find(&'s'), Some(&8));
282+
//!
283+
//! println!("Number of occurrences of each character");
284+
//! for (char, count) in count.iter() {
285+
//!     println!("{}: {}", char, count);
286+
//! }
287+
//! ```
288+
//!
289+
//! When the logic to be performed on the value is more complex, we may simply use
290+
//! the `entry` API to ensure that the value is initialized, and perform the logic
291+
//! afterwards.
292+
//!
293+
//! #### Tracking the inebriation of customers at a bar
294+
//!
295+
//! ```
296+
//! use std::collections::btree::{BTreeMap, Occupied, Vacant};
297+
//!
298+
//! // A client of the bar. They have an id and a blood alcohol level.
299+
//! struct Person { id: u32, blood_alcohol: f32 };
300+
//!
301+
//! // All the orders made to the bar, by client id.
302+
//! let orders = vec![1,2,1,2,3,4,1,2,2,3,4,1,1,1];
303+
//!
304+
//! // Our clients.
305+
//! let mut blood_alcohol = BTreeMap::new();
306+
//!
307+
//! for id in orders.into_iter() {
308+
//!     // If this is the first time we've seen this customer, initialize them
309+
//!     // with no blood alcohol. Otherwise, just retrieve them.
310+
//!     let person = match blood_alcohol.entry(id) {
311+
//!         Vacant(entry) => entry.set(Person{id: id, blood_alcohol: 0.0}),
312+
//!         Occupied(entry) => entry.into_mut(),
313+
//!     };
314+
//!
315+
//!     // Reduce their blood alcohol level. It takes time to order and drink a beer!
316+
//!     person.blood_alcohol *= 0.9;
317+
//!
318+
//!     // Check if they're sober enough to have another beer.
319+
//!     if person.blood_alcohol > 0.3 {
320+
//!         // Too drunk... for now.
321+
//!         println!("Sorry {}, I have to cut you off", person.id);
322+
//!     } else {
323+
//!         // Have another!
324+
//!         person.blood_alcohol += 0.1;
325+
//!     }
326+
//! }
327+
//! ```
14328
15329
#![experimental]
16330
