Reduce typo count. #2510

Open · wants to merge 1 commit into base: main
22 changes: 11 additions & 11 deletions CHANGELOG.md
@@ -29,8 +29,8 @@ Tantivy 0.23 will be backwards compatible with indices created with v0.22 and v0
- modify fastfield range query heuristic [#2375](https://github.com/quickwit-oss/tantivy/pull/2375)(@trinity-1686a)
- add FastFieldRangeQuery for explicit range queries on fast field (for `RangeQuery` it is autodetected) [#2477](https://github.com/quickwit-oss/tantivy/pull/2477)(@PSeitz)

- add format backwards-compatibiliy tests [#2485](https://github.com/quickwit-oss/tantivy/pull/2485)(@PSeitz)
- add columnar format compatibiliy tests [#2433](https://github.com/quickwit-oss/tantivy/pull/2433)(@PSeitz)
- add format backwards-compatibility tests [#2485](https://github.com/quickwit-oss/tantivy/pull/2485)(@PSeitz)
- add columnar format compatibility tests [#2433](https://github.com/quickwit-oss/tantivy/pull/2433)(@PSeitz)
- Improved snippet ranges algorithm [#2474](https://github.com/quickwit-oss/tantivy/pull/2474)(@gezihuzi)
- make find_field_with_default return json fields without path [#2476](https://github.com/quickwit-oss/tantivy/pull/2476)(@trinity-1686a)
- feat(query): Make `BooleanQuery` support `minimum_number_should_match` [#2405](https://github.com/quickwit-oss/tantivy/pull/2405)(@LebranceBW)
@@ -74,7 +74,7 @@ Tantivy 0.22 will be able to read indices created with Tantivy 0.21.
- Fix bug that can cause `get_docids_for_value_range` to panic. [#2295](https://github.com/quickwit-oss/tantivy/pull/2295)(@fulmicoton)
- Avoid 1 document indices by increase min memory to 15MB for indexing [#2176](https://github.com/quickwit-oss/tantivy/pull/2176)(@PSeitz)
- Fix merge panic for JSON fields [#2284](https://github.com/quickwit-oss/tantivy/pull/2284)(@PSeitz)
- Fix bug occuring when merging JSON object indexed with positions. [#2253](https://github.com/quickwit-oss/tantivy/pull/2253)(@fulmicoton)
- Fix bug occurring when merging JSON object indexed with positions. [#2253](https://github.com/quickwit-oss/tantivy/pull/2253)(@fulmicoton)
- Fix empty DateHistogram gap bug [#2183](https://github.com/quickwit-oss/tantivy/pull/2183)(@PSeitz)
- Fix range query end check (fields with less than 1 value per doc are affected) [#2226](https://github.com/quickwit-oss/tantivy/pull/2226)(@PSeitz)
- Handle exclusive out of bounds ranges on fastfield range queries [#2174](https://github.com/quickwit-oss/tantivy/pull/2174)(@PSeitz)
@@ -92,7 +92,7 @@ Tantivy 0.22 will be able to read indices created with Tantivy 0.21.
- Support to deserialize f64 from string [#2311](https://github.com/quickwit-oss/tantivy/pull/2311)(@PSeitz)
- Add a top_hits aggregator [#2198](https://github.com/quickwit-oss/tantivy/pull/2198)(@ditsuke)
- Support bool type in term aggregation [#2318](https://github.com/quickwit-oss/tantivy/pull/2318)(@PSeitz)
- Support ip adresses in term aggregation [#2319](https://github.com/quickwit-oss/tantivy/pull/2319)(@PSeitz)
- Support ip addresses in term aggregation [#2319](https://github.com/quickwit-oss/tantivy/pull/2319)(@PSeitz)
- Support date type in term aggregation [#2172](https://github.com/quickwit-oss/tantivy/pull/2172)(@PSeitz)
- Support escaped dot when addressing field [#2250](https://github.com/quickwit-oss/tantivy/pull/2250)(@PSeitz)

@@ -182,7 +182,7 @@ Tantivy 0.20
- Add PhrasePrefixQuery [#1842](https://github.com/quickwit-oss/tantivy/issues/1842) (@trinity-1686a)
- Add `coerce` option for text and numbers types (convert the value instead of returning an error during indexing) [#1904](https://github.com/quickwit-oss/tantivy/issues/1904) (@PSeitz)
- Add regex tokenizer [#1759](https://github.com/quickwit-oss/tantivy/issues/1759)(@mkleen)
- Move tokenizer API to seperate crate. Having a seperate crate with a stable API will allow us to use tokenizers with different tantivy versions. [#1767](https://github.com/quickwit-oss/tantivy/issues/1767) (@PSeitz)
- Move tokenizer API to separate crate. Having a separate crate with a stable API will allow us to use tokenizers with different tantivy versions. [#1767](https://github.com/quickwit-oss/tantivy/issues/1767) (@PSeitz)
- **Columnar crate**: New fast field handling (@fulmicoton @PSeitz) [#1806](https://github.com/quickwit-oss/tantivy/issues/1806)[#1809](https://github.com/quickwit-oss/tantivy/issues/1809)
- Support for fast fields with optional values. Previously tantivy supported only single-valued and multi-value fast fields. The encoding of optional fast fields is now very compact.
- Fast field Support for JSON (schemaless fast fields). Support multiple types on the same column. [#1876](https://github.com/quickwit-oss/tantivy/issues/1876) (@fulmicoton)
@@ -229,13 +229,13 @@ Tantivy 0.20
- Auto downgrade index record option, instead of vint error [#1857](https://github.com/quickwit-oss/tantivy/issues/1857) (@PSeitz)
- Enable range query on fast field for u64 compatible types [#1762](https://github.com/quickwit-oss/tantivy/issues/1762) (@PSeitz) [#1876]
- sstable
- Isolating sstable and stacker in independant crates. [#1718](https://github.com/quickwit-oss/tantivy/issues/1718) (@fulmicoton)
- Isolating sstable and stacker in independent crates. [#1718](https://github.com/quickwit-oss/tantivy/issues/1718) (@fulmicoton)
- New sstable format [#1943](https://github.com/quickwit-oss/tantivy/issues/1943)[#1953](https://github.com/quickwit-oss/tantivy/issues/1953) (@trinity-1686a)
- Use DeltaReader directly to implement Dictionnary::ord_to_term [#1928](https://github.com/quickwit-oss/tantivy/issues/1928) (@trinity-1686a)
- Use DeltaReader directly to implement Dictionnary::term_ord [#1925](https://github.com/quickwit-oss/tantivy/issues/1925) (@trinity-1686a)
- Add seperate tokenizer manager for fast fields [#2019](https://github.com/quickwit-oss/tantivy/issues/2019) (@PSeitz)
- Use DeltaReader directly to implement Dictionary::ord_to_term [#1928](https://github.com/quickwit-oss/tantivy/issues/1928) (@trinity-1686a)
- Use DeltaReader directly to implement Dictionary::term_ord [#1925](https://github.com/quickwit-oss/tantivy/issues/1925) (@trinity-1686a)
- Add separate tokenizer manager for fast fields [#2019](https://github.com/quickwit-oss/tantivy/issues/2019) (@PSeitz)
- Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. [#1756](https://github.com/quickwit-oss/tantivy/issues/1756) (@adamreichold)
- Added support for madvise when opening an mmaped Index [#2036](https://github.com/quickwit-oss/tantivy/issues/2036) (@fulmicoton)
- Added support for madvise when opening an mmapped Index [#2036](https://github.com/quickwit-oss/tantivy/issues/2036) (@fulmicoton)
- Rename `DatePrecision` to `DateTimePrecision` [#2051](https://github.com/quickwit-oss/tantivy/issues/2051) (@guilload)
- Query Parser
- Quotation mark can now be used for phrase queries. [#2050](https://github.com/quickwit-oss/tantivy/issues/2050) (@fulmicoton)
@@ -274,7 +274,7 @@ Tantivy 0.19
- Add support for phrase slop in query language [#1393](https://github.com/quickwit-oss/tantivy/pull/1393) (@saroh)
- Aggregation
- Add aggregation support for date type [#1693](https://github.com/quickwit-oss/tantivy/pull/1693)(@PSeitz)
- Add support for keyed parameter in range and histgram aggregations [#1424](https://github.com/quickwit-oss/tantivy/pull/1424) (@k-yomo)
- Add support for keyed parameter in range and histogram aggregations [#1424](https://github.com/quickwit-oss/tantivy/pull/1424) (@k-yomo)
- Add aggregation bucket limit [#1363](https://github.com/quickwit-oss/tantivy/pull/1363) (@PSeitz)
- Faster indexing
- [#1610](https://github.com/quickwit-oss/tantivy/pull/1610) (@PSeitz)
2 changes: 1 addition & 1 deletion TODO.txt
@@ -1,7 +1,7 @@
Make schema_builder API fluent.
fix doc serialization and prevent compression problems

u64 , etc. shoudl return Resutl<Option> now that we support optional missing a column is really not an error
u64 , etc. should return Result<Option> now that we support optional missing a column is really not an error
remove fastfield codecs
ditch the first_or_default trick. if it is still useful, improve its implementation.
rename FastFieldReaders::open to load
2 changes: 1 addition & 1 deletion columnar/README.md
@@ -31,7 +31,7 @@ restriction on 50% of the values (e.g. a 64-bit hash). On the other hand, a lot
# Columnar format

This columnar format may have more than one column (with different types) associated to the same `column_name` (see [Coercion rules](#coercion-rules) above).
The `(column_name, columne_type)` couple however uniquely identifies a column.
The `(column_name, column_type)` couple however uniquely identifies a column.
That couple is serialized as a column `column_key`. The format of that key is:
`[column_name][ZERO_BYTE][column_type_header: u8]`
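
For illustration, a minimal sketch of how a key with this layout could be built (not part of this diff; the helper name is hypothetical and the type header byte is just an assumed example value):

```rust
/// Builds a column key with the layout described above:
/// the column name bytes, a ZERO_BYTE separator, then the
/// one-byte column type header.
fn column_key(column_name: &str, column_type_header: u8) -> Vec<u8> {
    let mut key = Vec::with_capacity(column_name.len() + 2);
    key.extend_from_slice(column_name.as_bytes());
    key.push(0u8); // ZERO_BYTE separator
    key.push(column_type_header);
    key
}

fn main() {
    // Assumed type header value, purely for illustration.
    let key = column_key("price", 1u8);
    assert_eq!(key, [b'p', b'r', b'i', b'c', b'e', 0, 1]);
}
```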

6 changes: 3 additions & 3 deletions columnar/src/TODO.md
@@ -10,7 +10,7 @@

# Perf and Size
* remove alloc in `ord_to_term`
+ multivaued range queries restrat frm the beginning all of the time.
+ multivaued range queries restart from the beginning all of the time.
* re-add ZSTD compression for dictionaries
no systematic monotonic mapping
consider removing multilinear
@@ -30,7 +30,7 @@ investigate if should have better errors? io::Error is overused at the moment.
rename rank/select in unit tests
Review the public API via cargo doc
go through TODOs
remove all doc_id occurences -> row_id
remove all doc_id occurrences -> row_id
use the rank & select naming in unit tests branch.
multi-linear -> blockwise
linear codec -> simply a multiplication for the index column
@@ -43,5 +43,5 @@ isolate u128_based and uniform naming
# Other
fix enhance column-cli

# Santa claus
# Santa Claus
autodetect datetime ipaddr, plug customizable tokenizer.
4 changes: 2 additions & 2 deletions columnar/src/column_index/merge/mod.rs
@@ -173,7 +173,7 @@ mod tests {
.into();
let merged_column_index = merge_column_index(&column_indexes[..], &merge_row_order);
let SerializableColumnIndex::Multivalued(start_index_iterable) = merged_column_index else {
panic!("Excpected a multivalued index")
panic!("Expected a multivalued index")
};
let mut output = Vec::new();
serialize_multivalued_index(&start_index_iterable, &mut output).unwrap();
@@ -211,7 +211,7 @@

let merged_column_index = merge_column_index(&column_indexes[..], &merge_row_order);
let SerializableColumnIndex::Multivalued(start_index_iterable) = merged_column_index else {
panic!("Excpected a multivalued index")
panic!("Expected a multivalued index")
};
let mut output = Vec::new();
serialize_multivalued_index(&start_index_iterable, &mut output).unwrap();
2 changes: 1 addition & 1 deletion columnar/src/column_index/mod.rs
@@ -28,7 +28,7 @@ pub enum ColumnIndex {
Full,
Optional(OptionalIndex),
/// In addition, at index num_rows, an extra value is added
/// containing the overal number of values.
/// containing the overall number of values.
Multivalued(MultiValueIndex),
}
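
For intuition, here is a minimal sketch (illustrative only, not part of this diff) of how a multivalued index laid out this way is read: one start offset per row plus that trailing total, so the values of row `i` sit between offsets `i` and `i + 1`. The names below are assumptions, not tantivy's API:

```rust
// Hypothetical helper: the values of `row` live at
// start_offsets[row]..start_offsets[row + 1].
fn values_for_row(start_offsets: &[u32], row: usize) -> std::ops::Range<u32> {
    start_offsets[row]..start_offsets[row + 1]
}

fn main() {
    // 3 rows, 5 values overall; the final entry is the overall number of values.
    let start_offsets = [0u32, 2, 2, 5];
    assert_eq!(values_for_row(&start_offsets, 0), 0..2);
    assert_eq!(values_for_row(&start_offsets, 1), 2..2); // row 1 has no value
    assert_eq!(values_for_row(&start_offsets, 2), 2..5);
}
```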

@@ -184,7 +184,7 @@ impl CompactSpaceBuilder {

let mut covered_space = Vec::with_capacity(self.blanks.len());

// begining of the blanks
// beginning of the blanks
if let Some(first_blank_start) = self.blanks.first().map(RangeInclusive::start) {
if *first_blank_start != 0 {
covered_space.push(0..=first_blank_start - 1);
2 changes: 1 addition & 1 deletion columnar/src/column_values/u64_based/line.rs
@@ -122,7 +122,7 @@ impl Line {
line
}

/// Returns a line that attemps to approximate a function
/// Returns a line that attempts to approximate a function
/// f: i in 0..[ys.num_vals()) -> ys[i].
///
/// - The approximation is always lower than the actual value. Or more rigorously, formally
6 changes: 3 additions & 3 deletions columnar/src/columnar/merge/mod.rs
@@ -25,7 +25,7 @@ use crate::{
/// After merge, all columns belonging to the same category are coerced to
/// the same column type.
///
/// In practise, today, only Numerical colummns are coerced into one type today.
/// In practise, today, only Numerical columns are coerced into one type today.
///
/// See also [README.md].
///
@@ -63,8 +63,8 @@ impl From<ColumnType> for ColumnTypeCategory {
/// `require_columns` makes it possible to ensure that some columns will be present in the
/// resulting columnar. When a required column is a numerical column type, one of two things can
/// happen:
/// - If the required column type is compatible with all of the input columnar, the resulsting
/// merged columnar will simply coerce the input column and use the required column type.
/// - If the required column type is compatible with all of the input columnar, the resulting merged
/// columnar will simply coerce the input column and use the required column type.
/// - If the required column type is incompatible with one of the input columnar, the merged will
/// fail with an InvalidData error.
///
2 changes: 1 addition & 1 deletion columnar/src/columnar/writer/column_operation.rs
@@ -87,7 +87,7 @@ impl<V: SymbolValue> ColumnOperation<V> {
minibuf
}

/// Deserialize a colummn operation.
/// Deserialize a column operation.
/// Returns None if the buffer is empty.
///
/// Panics if the payload is invalid:
2 changes: 1 addition & 1 deletion examples/iterating_docs_and_positions.rs
@@ -28,7 +28,7 @@ fn main() -> tantivy::Result<()> {
let mut index_writer: IndexWriter = index.writer_with_num_threads(1, 50_000_000)?;
index_writer.add_document(doc!(title => "The Old Man and the Sea"))?;
index_writer.add_document(doc!(title => "Of Mice and Men"))?;
index_writer.add_document(doc!(title => "The modern Promotheus"))?;
index_writer.add_document(doc!(title => "The modern Prometheus"))?;
index_writer.commit()?;

let reader = index.reader()?;
12 changes: 6 additions & 6 deletions query-grammar/src/query_grammar.rs
@@ -833,7 +833,7 @@ fn aggregate_infallible_expressions(
if early_operand {
err.push(LenientErrorInternal {
pos: 0,
message: "Found unexpeted boolean operator before term".to_string(),
message: "Found unexpected boolean operator before term".to_string(),
});
}

@@ -856,7 +856,7 @@
_ => Some(Occur::Should),
};
if occur == &Some(Occur::MustNot) && default_op == Some(Occur::Should) {
// if occur is MustNot *and* operation is OR, we synthetize a ShouldNot
// if occur is MustNot *and* operation is OR, we synthesize a ShouldNot
clauses.push(vec![(
Some(Occur::Should),
ast.clone().unary(Occur::MustNot),
@@ -872,7 +872,7 @@
None => None,
};
if occur == &Some(Occur::MustNot) && default_op == Some(Occur::Should) {
// if occur is MustNot *and* operation is OR, we synthetize a ShouldNot
// if occur is MustNot *and* operation is OR, we synthesize a ShouldNot
clauses.push(vec![(
Some(Occur::Should),
ast.clone().unary(Occur::MustNot),
@@ -897,7 +897,7 @@
}
Some(BinaryOperand::Or) => {
if last_occur == Some(Occur::MustNot) {
// if occur is MustNot *and* operation is OR, we synthetize a ShouldNot
// if occur is MustNot *and* operation is OR, we synthesize a ShouldNot
clauses.push(vec![(Some(Occur::Should), last_ast.unary(Occur::MustNot))]);
} else {
clauses.push(vec![(last_occur.or(Some(Occur::Should)), last_ast)]);
@@ -1057,7 +1057,7 @@ mod test {
valid_parse("1", 1.0, "");
valid_parse("0.234234 aaa", 0.234234f64, " aaa");
error_parse(".3332");
// TODO trinity-1686a: I disagree that it should fail, I think it should succeeed,
// TODO trinity-1686a: I disagree that it should fail, I think it should succeed,
// consuming only "1", and leave "." for the next thing (which will likely fail then)
// error_parse("1.");
error_parse("-1.");
@@ -1467,7 +1467,7 @@ mod test {
}

#[test]
fn test_parse_query_to_triming_spaces() {
fn test_parse_query_to_trimming_spaces() {
test_parse_query_to_ast_helper(" abc", "abc");
test_parse_query_to_ast_helper("abc ", "abc");
test_parse_query_to_ast_helper("( a OR abc)", "(?a ?abc)");
2 changes: 1 addition & 1 deletion query-grammar/src/user_input_ast.rs
@@ -267,7 +267,7 @@ impl fmt::Debug for UserInputAst {
match *self {
UserInputAst::Clause(ref subqueries) => {
if subqueries.is_empty() {
// TODO this will break ast reserialization, is writing "( )" enought?
// TODO this will break ast reserialization, is writing "( )" enough?
write!(formatter, "<emptyclause>")?;
} else {
write!(formatter, "(")?;
2 changes: 1 addition & 1 deletion src/aggregation/agg_tests.rs
@@ -870,7 +870,7 @@ fn test_aggregation_on_json_object_mixed_types() {
.add_document(doc!(json => json!({"mixed_type": "blue", "mixed_price": 5.0})))
.unwrap();
index_writer.commit().unwrap();
// => Segment with all boolen
// => Segment with all boolean
index_writer
.add_document(doc!(json => json!({"mixed_type": true, "mixed_price": "no_price"})))
.unwrap();
10 changes: 5 additions & 5 deletions src/aggregation/bucket/term_agg.rs
@@ -25,7 +25,7 @@ use crate::aggregation::{format_date, Key};
use crate::error::DataCorruption;
use crate::TantivyError;

/// Creates a bucket for every unique term and counts the number of occurences.
/// Creates a bucket for every unique term and counts the number of occurrences.
/// Note that doc_count in the response buckets equals term count here.
///
/// If the text is untokenized and single value, that means one term per document and therefore it
@@ -158,7 +158,7 @@ pub struct TermsAggregation {
/// when loading the text.
/// Special Case 1:
/// If we have multiple columns on one field, we need to have a union on the indices on both
/// columns, to find docids without a value. That requires a special missing aggreggation.
/// columns, to find docids without a value. That requires a special missing aggregation.
/// Special Case 2: if the key is of type text and the column is numerical, we also need to use
/// the special missing aggregation, since there is no mechanism in the numerical column to
/// add text.
@@ -364,7 +364,7 @@ impl SegmentTermCollector {
let term_buckets = TermBuckets::default();

if let Some(custom_order) = req.order.as_ref() {
// Validate sub aggregtion exists
// Validate sub aggregation exists
if let OrderTarget::SubAggregation(sub_agg_name) = &custom_order.target {
let (agg_name, _agg_property) = get_agg_name_and_property(sub_agg_name);

@@ -1685,7 +1685,7 @@ mod tests {
res["my_texts"]["buckets"][2]["key"],
serde_json::Value::Null
);
// text field with numner as missing fallback
// text field with number as missing fallback
assert_eq!(res["my_texts2"]["buckets"][0]["key"], "Hello Hello");
assert_eq!(res["my_texts2"]["buckets"][0]["doc_count"], 5);
assert_eq!(res["my_texts2"]["buckets"][1]["key"], 1337.0);
@@ -1859,7 +1859,7 @@ mod tests {
res["my_texts"]["buckets"][2]["key"],
serde_json::Value::Null
);
// text field with numner as missing fallback
// text field with number as missing fallback
assert_eq!(res["my_texts2"]["buckets"][0]["key"], "Hello Hello");
assert_eq!(res["my_texts2"]["buckets"][0]["doc_count"], 4);
assert_eq!(res["my_texts2"]["buckets"][1]["key"], 1337.0);