docs: document the fluent API for schema (#111)

- improve docs for the dialect descriptor
- major improvements for the documentation around mapper to object
Seddryck · Feb 6, 2025 · commit 59697bc
1 parent 973058e

Showing 4 changed files with 154 additions and 39 deletions.
diff --git a/docs/_data/navigation_docs.yml b/docs/_data/navigation_docs.yml
@@ -14,5 +14,6 @@
   - installation
   - basic-usage
   - fluent-api-profile-configuration
+  - fluent-api-schema
   - mapper-object-builder
 
diff --git a/docs/_docs/csv-dialect-descriptor.md b/docs/_docs/csv-dialect-descriptor.md
@@ -2,7 +2,7 @@
 title: CSV dialect descriptor
 tags: [configuration]
 ---
-The `CsvDialectDescriptor` class provides extensive configuration options for tuning the behavior of CSV parsing operations. Below is an explanation of each property and its potential impact on CSV processing. 
+The `CsvDialectDescriptor` class provides extensive configuration options for tuning the behavior of CSV parsing operations. Below is an explanation of each property and its potential impact on CSV processing.
 
 The description of PocketCsvReader is aligned with the [CSV Dialect Specification](https://specs.frictionlessdata.io/csv-dialect/#specification) provided by Frictionless Data.
 

diff --git a/docs/_docs/fluent-api-schema.md b/docs/_docs/fluent-api-schema.md
@@ -0,0 +1,113 @@
+---
+title: Fluent API for Schema
+tags: [configuration]
+---
+
+## Overview
+
+The Fluent API for schema definition in PocketCsvReader provides an expressive way to define the structure of CSV data. This is particularly useful when working with `IDataReader`, where the `GetValue` method returns a boxed `object`. `GetValue` can retrieve any column's value without prior type knowledge, which makes it flexible across data types. Combined with a schema definition, the returned value is cast to the expected type, which minimizes conversion overhead.
+
+Defining a schema ensures that values are correctly interpreted and cast to their expected types, avoiding unnecessary type conversions at runtime.
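
To make the boxing point concrete, here is a minimal sketch of what `IDataReader.GetValue` hands back. It uses an in-memory `DataTable` from `System.Data` as a stand-in data source, not PocketCsvReader itself, so the table and its column names are assumptions chosen only for illustration.

```csharp
using System;
using System.Data;

// In-memory table used as a stand-in source; no CSV library is involved.
var table = new DataTable();
table.Columns.Add("ID", typeof(int));
table.Columns.Add("Description", typeof(string));
table.Rows.Add(42, "Widget");

using IDataReader reader = table.CreateDataReader();
reader.Read();

// GetValue always returns object; the runtime type is whatever the source stored.
object boxed = reader.GetValue(0);

// The unboxing cast succeeds only because the column really holds an int.
int id = (int)boxed;
string description = (string)reader.GetValue(1);

Console.WriteLine($"{boxed.GetType().Name}: {id}, {description}"); // Int32: 42, Widget
```

If the first column had been stored as the string `"42"`, the `(int)` cast would throw an `InvalidCastException`. That is the situation a schema is meant to avoid.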
+
+## Defining a Schema
+
+PocketCsvReader provides two ways to define schemas:
+
+- Indexed Schema: Fields are defined by their position (index) in the dataset.
+- Named Schema: Fields are defined by their column names.
+
+### Creating an Indexed Schema
+
+Indexed schemas are useful when working with CSV files that do not contain headers or when column order is fixed.
+
+Example:
+
+```csharp
+var schema = new SchemaDescriptorBuilder()
+    .Indexed()
+    .WithField<int>()
+    .WithField<string>(x => x.WithName("Description"))
+    .Build();
+```
+
+In this example:
+
+- The first column is an int.
+- The second column is a string with the name "Description".
+
+### Creating a Named Schema
+
+Named schemas provide more flexibility when working with CSV files that contain headers.
+
+Example:
+
+```csharp
+var schema = new SchemaDescriptorBuilder()
+    .Named()
+    .WithField<int>("ID")
+    .WithField<string>("Description")
+    .Build();
+```
+
+Here, the schema explicitly assigns types to fields based on column names.
+
+## Using Field Formatting and Format Descriptors
+
+The `WithFormat()` method specifies a format for fields that require special parsing, such as `DateTime` values. It relies on format descriptor builders such as `IntegerFormatDescriptorBuilder`, `NumberFormatDescriptorBuilder`, and `TemporalFormatDescriptorBuilder` to handle culture-specific formatting details. The format is passed to the parser of the respective type, ensuring correct conversion from text to the expected type.
+
+**Example:**
+
+```csharp
+var schema = new SchemaDescriptorBuilder()
+    .Named()
+    .WithField<DateTime>("Date", x => x.WithFormat("dd/MM/yyyy"))
+    .Build();
+```
+
+In this example, the "Date" field is expected to be in the format `dd/MM/yyyy` (e.g., `25/12/2024`). The parser will use this format to correctly interpret and convert the string into a DateTime object.
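
As a rough illustration of what such a format means to a parser, the sketch below feeds the same `dd/MM/yyyy` string to the standard .NET `DateTime.ParseExact` and `DateTime.TryParseExact` methods. It assumes nothing about how PocketCsvReader invokes the parser internally; it only shows how the format string is interpreted.

```csharp
using System;
using System.Globalization;

const string format = "dd/MM/yyyy";

// Day-first input matches the format and parses cleanly.
DateTime parsed = DateTime.ParseExact("25/12/2024", format, CultureInfo.InvariantCulture);
Console.WriteLine(parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2024-12-25

// Month-first input does not match: 25 is not a valid month.
bool ok = DateTime.TryParseExact(
    "12/25/2024", format, CultureInfo.InvariantCulture, DateTimeStyles.None, out _);
Console.WriteLine(ok); // False
```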
+
+Using `WithFormat()` ensures that structured data such as dates are properly parsed and prevents errors due to mismatched formats. The `TemporalFormatDescriptorBuilder` provides control over date and time separators, ensuring compatibility with different cultural representations.
+
+### Numeric Formatting
+
+The `NumericFieldDescriptorBuilder` allows further customization of numeric fields:
+
+- `.WithDecimalChar(char decimalChar)`: Defines the character used for the decimal separator.
+- `.WithGroupChar(char? groupChar)`: Defines the character used for digit grouping. Passing null removes grouping.
+- `.WithoutGroupChar()`: Explicitly disables grouping.
+
+Example:
+
+```csharp
+var schema = new SchemaDescriptorBuilder()
+    .Named()
+    .WithNumericField<double>("Amount", x => x.WithDecimalChar(',')
+                                               .WithoutGroupChar())
+    .Build();
+```
+
+This defines an "Amount" field as a double, using `,` as the decimal separator and disabling digit grouping.
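
The effect of these two settings can be approximated with the standard `NumberFormatInfo` class. This is a sketch under the assumption that a decimal character of `,` with grouping disabled behaves like the configuration below; the library's actual parsing code may differ.

```csharp
using System;
using System.Globalization;

// Comma as decimal separator; the group separator is set to '.' only to
// avoid two identical separators, since grouping is not allowed below.
var numberFormat = new NumberFormatInfo
{
    NumberDecimalSeparator = ",",
    NumberGroupSeparator = ".",
};

// AllowThousands is deliberately left out, so digit grouping is rejected.
const NumberStyles styles = NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;

double amount = double.Parse("1234,56", styles, numberFormat);
Console.WriteLine(amount.ToString(CultureInfo.InvariantCulture)); // 1234.56

bool grouped = double.TryParse("1.234,56", styles, numberFormat, out _);
Console.WriteLine(grouped); // False
```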
+
+### Custom Field Formatting
+
+```csharp
+var schema = new SchemaDescriptorBuilder()
+    .Named()
+    .WithCustomField<Point>("Location", x => x.WithFormat("x;y"))
+    .Build();
+```
+
+This ensures that the "Location" field is interpreted as a `Point` and formatted accordingly.
+
+When a custom field is assigned, the library automatically looks for a parser: a method named `Parse` that accepts a string (the span to read) as its first argument and an `IFormatProvider` as its last argument. Optionally, the method can take a second argument of type string to receive the format.
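
A custom type that fits this description could look like the sketch below. The `Point` type here is hypothetical and written only for illustration. Making the `Parse` methods static is also an assumption, since the text above specifies only the method name and argument types.

```csharp
using System;
using System.Globalization;

// Usage: call the three-argument shape directly.
Point p = Point.Parse("1.5;-2", "x;y", CultureInfo.InvariantCulture);
Console.WriteLine(p == new Point(1.5, -2)); // True

// Hypothetical custom type; not part of PocketCsvReader.
public readonly record struct Point(double X, double Y)
{
    // Shape without a format: (string, IFormatProvider).
    public static Point Parse(string text, IFormatProvider provider)
        => Parse(text, "x;y", provider);

    // Shape with a format as the second argument: (string, string, IFormatProvider).
    public static Point Parse(string text, string format, IFormatProvider provider)
    {
        // Only the "x;y" layout is understood in this sketch.
        if (format != "x;y")
            throw new FormatException($"Unsupported format '{format}'.");

        string[] parts = text.Split(';');
        if (parts.Length != 2)
            throw new FormatException($"Expected two components in '{text}'.");

        return new Point(
            double.Parse(parts[0], NumberStyles.Float, provider),
            double.Parse(parts[1], NumberStyles.Float, provider));
    }
}
```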
+
+## Benefits of Using a Schema
+
+- Ensures Type Safety: The schema guarantees that values are returned in their expected type.
+- Simplifies Parsing: Eliminates the need for manual type conversion when using `IDataReader.GetValue`.
+- Improves Readability: Fluent API provides a clean and declarative way to define schemas.
+- Customizable Numeric Fields: Allows control over decimal and grouping characters for numeric fields.
+
+## Conclusion
+
+Using the Fluent API for schema definition in PocketCsvReader significantly enhances the usability and reliability of working with CSV data, especially when processing untyped data from an IDataReader. By leveraging indexed or named schemas, developers can streamline their data processing workflows while ensuring type safety and maintainability.
diff --git a/docs/_docs/mapper-object-builder.md b/docs/_docs/mapper-object-builder.md
@@ -4,10 +4,9 @@ tags: [configuration]
 ---
 
 This documentation explains how to use the `SpanMapper<T>` and `SpanObjectBuilder<T>` classes for mapping and parsing flat-file data. Their primary purpose is to configure the mapping of fields from delimited data to a strongly-typed class.
+These features are designed to work with the `To<T>` method, which facilitates conversion of CSV rows into instances of T. Each row's fields are mapped according to the schema defined in `SpanObjectBuilder<T>` or the `SpanMapper<T>` delegate, ensuring accurate transformation into structured objects.
 
-## Delegates
-
-### SpanMapper&lt;T&gt;
+## SpanMapper&lt;T&gt;
 
 ```csharp
 public delegate T SpanMapper<T>(ReadOnlySpan<char> span, IEnumerable<FieldSpan> fieldSpans);
@@ -19,32 +18,44 @@ The `SpanMapper<T>` delegate maps data from a `ReadOnlySpan<char>` representing
 - **`span`**: The source `ReadOnlySpan<char>` containing the delimited row data.
 - **`fieldSpans`**: A collection of `FieldSpan` objects defining the start position and length of each field in the row.
 
-### Parse
+## Class SpanObjectBuilder&lt;T&gt;
+
+The `SpanObjectBuilder<T>` class is designed to instantiate strongly-typed objects (`T`) from delimited flat-file data. It supports default parsers for common data types and allows customization via the `SetParser` method and the `Parse` delegate.
+
+### `Instantiate` a SpanObjectBuilder
 
 ```csharp
-public delegate object? Parse(ReadOnlySpan<char> span);
+public T Instantiate(ReadOnlySpan<char> span, IEnumerable<FieldSpan> fieldSpans)
 ```
 
 **Purpose:**  
-The `Parse` delegate defines a method for parsing a `ReadOnlySpan<char>` into an object of a specific type. It is used to handle custom parsing for various data types in the `SpanObjectBuilder<T>` class.
+Creates an instance of type `T` using constructor injection. The fields in the constructor are populated based on `fieldSpans` and the mapped parsers in `ParserMapping`.
 
-- **`span`**: The `ReadOnlySpan<char>` containing the value to parse.
+- **`span`**: The `ReadOnlySpan<char>` containing the delimited row data.
+- **`fieldSpans`**: A collection of `FieldSpan` objects specifying the position and length of each field.
 
-## Class SpanObjectBuilder&lt;T&gt;
+**Behavior:**
 
-The `SpanObjectBuilder<T>` class is designed to instantiate strongly-typed objects (`T`) from delimited flat-file data using `SpanMapper<T>` and the `Parse` delegates. It supports default parsers for common data types and allows customization via the `SetParser` method.
+1. Identifies the appropriate constructor of `T` by matching the number of fields in `fieldSpans`.
+2. Iterates through each `FieldSpan`, using the associated parser to convert the field to the required type.
+3. If a type lacks a parser, throws an exception.
+4. If parsing fails, throws a `FormatException` with detailed error information.
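
The four steps above can be sketched with plain reflection. Everything in this block is a hypothetical stand-in written for illustration: the `FieldSpan` record, the `Item` target type, the `Parsers` dictionary and the `InstantiateSketch` class are assumptions. The library's own `FieldSpan`, `ParserMapping` and `Instantiate` implementation may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Usage: two fields separated by ';' in a 10-character row.
var spans = new[] { new FieldSpan(0, 5), new FieldSpan(6, 4) };
Item item = InstantiateSketch.Instantiate<Item>("12345;true", spans);
Console.WriteLine(item); // Item { Id = 12345, Active = True }

// Hypothetical stand-ins for the library's types.
public readonly record struct FieldSpan(int Start, int Length);
public record Item(int Id, bool Active);

public static class InstantiateSketch
{
    // Parsers keyed by target type; step 2 looks them up per constructor parameter.
    private static readonly Dictionary<Type, Func<string, object>> Parsers = new()
    {
        [typeof(int)] = s => int.Parse(s, CultureInfo.InvariantCulture),
        [typeof(bool)] = s => bool.Parse(s),
    };

    public static T Instantiate<T>(ReadOnlySpan<char> span, IReadOnlyList<FieldSpan> fieldSpans)
    {
        // Step 1: pick the public constructor whose parameter count matches the field count.
        var ctor = typeof(T).GetConstructors()
            .First(c => c.GetParameters().Length == fieldSpans.Count);
        var parameters = ctor.GetParameters();
        var args = new object[parameters.Length];

        for (int i = 0; i < parameters.Length; i++)
        {
            // Step 3: a type without a registered parser is an error.
            if (!Parsers.TryGetValue(parameters[i].ParameterType, out var parse))
                throw new InvalidOperationException($"No parser for {parameters[i].ParameterType}.");

            // Step 2: slice the field out of the row and parse it.
            string text = span.Slice(fieldSpans[i].Start, fieldSpans[i].Length).ToString();
            try
            {
                args[i] = parse(text);
            }
            catch (Exception ex)
            {
                // Step 4: surface parse failures as FormatException with context.
                throw new FormatException(
                    $"Cannot parse '{text}' as {parameters[i].ParameterType.Name}.", ex);
            }
        }

        return (T)ctor.Invoke(args);
    }
}
```

The sketch converts each slice to a string before parsing to keep the code short. A span-based parser, such as the `Parse` delegate described in this document, would avoid that allocation.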
 
-### Default Parsers
+**Example:**
 
-By default, the `SpanObjectBuilder<T>` supports the following types:
+```csharp
+var builder = new SpanObjectBuilder<MyClass>();
+var spans = new List<FieldSpan>
+{
+    new FieldSpan { Start = 0, Length = 5 },   // Field 1
+    new FieldSpan { Start = 6, Length = 10 }  // Field 2
+};
+var obj = builder.Instantiate("12345     true", spans);
+```
 
-- Strings
-- Numbers (`int`, `long`, `short`, `byte`, `float`, `double`, `decimal`)
-- Booleans
-- Dates (`DateTime`, `DateOnly`, `TimeOnly`, `DateTimeOffset`)
-- Characters (`char`)
+### Specifying the field parsers
 
-### SetParser&lt;T&gt;TField&lt;T&gt;
+#### SetParser&lt;TField&gt;
 
 If you need to parse additional types or override the default behavior, use the `SetParser` method.
 
@@ -61,36 +72,26 @@ var builder = new SpanObjectBuilder<MyClass>();
 builder.SetParser<Guid>(s => Guid.Parse(s));
 ```
 
-### `Instantiate`
+#### Parse delegate
 
 ```csharp
-public T Instantiate(ReadOnlySpan<char> span, IEnumerable<FieldSpan> fieldSpans)
+public delegate object? Parse(ReadOnlySpan<char> span);
 ```
 
 **Purpose:**  
-Creates an instance of type `T` using constructor injection. The fields in the constructor are populated based on `fieldSpans` and the mapped parsers in `ParserMapping`.
-
-- **`span`**: The `ReadOnlySpan<char>` containing the delimited row data.
-- **`fieldSpans`**: A collection of `FieldSpan` objects specifying the position and length of each field.
+The `Parse` delegate defines a method for parsing a `ReadOnlySpan<char>` into an object of a specific type. It is used to handle custom parsing for various data types in the `SpanObjectBuilder<T>` class.
 
-**Behavior:**
+- **`span`**: The `ReadOnlySpan<char>` containing the value to parse.
 
-1. Identifies the appropriate constructor of `T` by matching the number of fields in `fieldSpans`.
-2. Iterates through each `FieldSpan`, using the associated parser to convert the field to the required type.
-3. If a type lacks a parser, throws an exception.
-4. If parsing fails, throws a `FormatException` with detailed error information.
+#### Default Parsers
 
-**Example:**
+By default, the `SpanObjectBuilder<T>` supports the following types:
 
-```csharp
-var builder = new SpanObjectBuilder<MyClass>();
-var spans = new List<FieldSpan>
-{
-    new FieldSpan { Start = 0, Length = 5 },   // Field 1
-    new FieldSpan { Start = 6, Length = 10 }  // Field 2
-};
-var obj = builder.Instantiate("12345     true", spans);
-```
+- Strings
+- Numbers (`int`, `long`, `short`, `byte`, `float`, `double`, `decimal`)
+- Booleans
+- Dates (`DateTime`, `DateOnly`, `TimeOnly`, `DateTimeOffset`)
+- Characters (`char`)
 
 ### To&lt;T&gt; Method