docs: document the fluent API for schema (#111)
- improve docs for the dialect descriptor
- major improvements for the documentation around mapper to object
Seddryck authored Feb 6, 2025
1 parent 973058e commit 59697bc
Showing 4 changed files with 154 additions and 39 deletions.
1 change: 1 addition & 0 deletions docs/_data/navigation_docs.yml
@@ -14,5 +14,6 @@
- installation
- basic-usage
- fluent-api-profile-configuration
- fluent-api-schema
- mapper-object-builder

2 changes: 1 addition & 1 deletion docs/_docs/csv-dialect-descriptor.md
@@ -2,7 +2,7 @@
title: CSV dialect descriptor
tags: [configuration]
---
The `CsvDialectDescriptor` class provides extensive configuration options for tuning the behavior of CSV parsing operations. Below is an explanation of each property and its potential impact on CSV processing.
The `CsvDialectDescriptor` class provides extensive configuration options for tuning the behavior of CSV parsing operations. Below is an explanation of each property and its potential impact on CSV processing.

The description of PocketCsvReader is aligned with the [CSV Dialect Specification](https://specs.frictionlessdata.io/csv-dialect/#specification) provided by Frictionless Data.

113 changes: 113 additions & 0 deletions docs/_docs/fluent-api-schema.md
@@ -0,0 +1,113 @@
---
title: Fluent API for Schema
tags: [configuration]
---

## Overview

The Fluent API for schema definition in PocketCsvReader provides an expressive way to describe the structure of CSV data. This is particularly useful when working with `IDataReader`, whose `GetValue` method returns a boxed `object`. `GetValue` can retrieve any column's value without prior knowledge of its type, which makes it flexible across data types. Combined with a schema definition, the returned value is already cast to the declared type, which minimizes conversion overhead.

Defining a schema ensures that values are correctly interpreted and cast to their expected types, avoiding unnecessary type conversions at runtime.
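
To illustrate the problem a schema addresses, the sketch below consumes an untyped column through `IDataReader.GetValue`. It uses a `DataTable`-backed reader from `System.Data` purely as a stand-in for a CSV reader (an assumption made to keep the example self-contained; it is not PocketCsvReader code).

```csharp
using System;
using System.Data;
using System.Globalization;

// Stand-in for a schema-less CSV source: every column is text.
var table = new DataTable();
table.Columns.Add("ID", typeof(string));
table.Rows.Add("42");

using IDataReader reader = table.CreateDataReader();
reader.Read();

// GetValue returns a boxed object; here its runtime type is string.
object raw = reader.GetValue(0);

// Without a schema, the caller has to convert the text manually.
int id = int.Parse((string)raw, CultureInfo.InvariantCulture);

Console.WriteLine($"{raw.GetType().Name} -> {id}"); // String -> 42
```

With a schema in place, the conversion step is expected to happen inside the reader, so `GetValue` would hand back the boxed `int` directly.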

## Defining a Schema

PocketCsvReader provides two ways to define schemas:

- Indexed Schema: Fields are defined by their position (index) in the dataset.
- Named Schema: Fields are defined by their column names.

### Creating an Indexed Schema

Indexed schemas are useful when working with CSV files that do not contain headers or when column order is fixed.

Example:

```csharp
var schema = new SchemaDescriptorBuilder()
    .Indexed()
    .WithField<int>()
    .WithField<string>(x => x.WithName("Description"))
    .Build();
```

In this example:

- The first column is an int.
- The second column is a string with the name "Description".
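
Conceptually, an indexed schema pairs each column position with a type. The following sketch mimics that idea with a plain array of converters. It does not use PocketCsvReader; the `converters` array and the sample row are hypothetical and only illustrate position-based typing.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for an indexed schema:
// position 0 is an int, position 1 is a string.
var converters = new Func<string, object>[]
{
    s => int.Parse(s, CultureInfo.InvariantCulture),
    s => s,
};

// A header-less row: the only way to identify a field is its position.
string[] fields = "7;Widget".Split(';');
object[] typed = fields.Select((field, i) => converters[i](field)).ToArray();

Console.WriteLine($"{typed[0].GetType().Name}, {typed[1].GetType().Name}"); // Int32, String
```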

### Creating a Named Schema

Named schemas provide more flexibility when working with CSV files that contain headers.

Example:

```csharp
var schema = new SchemaDescriptorBuilder()
    .Named()
    .WithField<int>("ID")
    .WithField<string>("Description")
    .Build();
```

Here, the schema explicitly assigns types to fields based on column names.
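
As a rough illustration of name-based typing, the sketch below looks up a converter through the header row rather than through the column position. It is a self-contained approximation, not PocketCsvReader code; the dictionary and the sample data are hypothetical, and it assumes columns are matched on the exact header text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for a named schema: column name -> converter.
var converters = new Dictionary<string, Func<string, object>>
{
    ["ID"] = s => int.Parse(s, CultureInfo.InvariantCulture),
    ["Description"] = s => s,
};

// The file lists the columns in a different order than the declaration;
// the header row resolves which converter applies to which field.
string[] header = "Description;ID".Split(';');
string[] fields = "Widget;7".Split(';');

var row = new Dictionary<string, object>();
for (var i = 0; i < header.Length; i++)
{
    row[header[i]] = converters[header[i]](fields[i]);
}

Console.WriteLine(row["ID"] is int); // True
```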

## Using Field Formatting and Format Descriptors

The `WithFormat()` method allows specifying a format for fields that require special parsing, such as `DateTime` values, and relies on format descriptor builders like `IntegerFormatDescriptorBuilder`, `NumberFormatDescriptorBuilder`, and `TemporalFormatDescriptorBuilder` to handle culture-specific formatting details. This format is passed to the parser of the respective type, ensuring correct conversion from text to the expected type.

**Example:**

```csharp
var schema = new SchemaDescriptorBuilder()
    .Named()
    .WithField<DateTime>("Date", x => x.WithFormat("dd/MM/yyyy"))
    .Build();
```

In this example, the "Date" field is expected to be in the format `dd/MM/yyyy` (e.g., `25/12/2024`). The parser will use this format to correctly interpret and convert the string into a DateTime object.

Using `WithFormat()` ensures that structured data such as dates are properly parsed and prevents errors due to mismatched formats. The `TemporalFormatDescriptorBuilder` provides control over date and time separators, ensuring compatibility with different cultural representations.
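
Assuming the format string ends up in the standard .NET `DateTime` parser, as the description above suggests, its effect can be reproduced with `DateTime.ParseExact`. This sketch uses only the base class library and shows why the format matters for day-first dates.

```csharp
using System;
using System.Globalization;

// Day-first format: "25/12/2024" is 25 December 2024.
var date = DateTime.ParseExact("25/12/2024", "dd/MM/yyyy", CultureInfo.InvariantCulture);
Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2024-12-25

// The same text does not satisfy a month-first format: there is no month 25.
bool ok = DateTime.TryParseExact("25/12/2024", "MM/dd/yyyy",
    CultureInfo.InvariantCulture, DateTimeStyles.None, out _);
Console.WriteLine(ok); // False
```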

### Numeric Formatting

The `NumericFieldDescriptorBuilder` allows further customization of numeric fields:

- `.WithDecimalChar(char decimalChar)`: Defines the character used for the decimal separator.
- `.WithGroupChar(char? groupChar)`: Defines the character used for digit grouping. Passing null removes grouping.
- `.WithoutGroupChar()`: Explicitly disables grouping.

Example:

```csharp
var schema = new SchemaDescriptorBuilder()
    .Named()
    .WithNumericField<double>("Amount", x => x
        .WithDecimalChar(',')
        .WithoutGroupChar())
    .Build();
```

This defines an "Amount" field as a double, using `,` as the decimal separator and disabling digit grouping.
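
For comparison, the same parsing rules can be expressed with the base class library alone. The sketch below is not how PocketCsvReader implements these options (that is not shown in this page); it only demonstrates what "comma as decimal separator, no grouping" means for the input text.

```csharp
using System;
using System.Globalization;

var format = new NumberFormatInfo
{
    NumberDecimalSeparator = ",",
    // Set only so that it does not clash with the decimal separator.
    NumberGroupSeparator = ".",
};

// NumberStyles.Float does not include AllowThousands, so grouping is rejected.
double amount = double.Parse("1234,56", NumberStyles.Float, format);
Console.WriteLine(amount.ToString(CultureInfo.InvariantCulture)); // 1234.56

bool grouped = double.TryParse("1.234,56", NumberStyles.Float, format, out _);
Console.WriteLine(grouped); // False
```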

### Custom Field Formatting

```csharp
var schema = new SchemaDescriptorBuilder()
    .Named()
    .WithCustomField<Point>("Location", x => x.WithFormat("x;y"))
    .Build();
```

This ensures that the "Location" field is interpreted as a `Point` and parsed using the `x;y` format.

When a custom field is assigned, PocketCsvReader automatically looks for a method named `Parse` that accepts a string (the text to read) as its first argument and an `IFormatProvider` as its last argument. Optionally, the method can take a second argument of type `string` to receive the format.
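
A type that satisfies this convention could look like the sketch below. The `Point` type here is hypothetical and written only for illustration; the shape of `Parse` follows the description above (text first, optional format second, `IFormatProvider` last), and how the library discovers and invokes it is not shown.

```csharp
using System;
using System.Globalization;

var location = Point.Parse("3;4", "x;y", CultureInfo.InvariantCulture);
Console.WriteLine($"{location.X}, {location.Y}"); // 3, 4

// Hypothetical custom type; only the signature of Parse follows the convention.
public readonly record struct Point(int X, int Y)
{
    // text: the value to read; format: the optional format; provider: always last.
    public static Point Parse(string text, string format, IFormatProvider provider)
    {
        // This sketch only understands the "x;y" layout.
        if (format != "x;y")
            throw new FormatException($"Unsupported format '{format}'.");

        var parts = text.Split(';');
        if (parts.Length != 2)
            throw new FormatException($"Expected two values separated by ';' but got '{text}'.");

        return new Point(int.Parse(parts[0], provider), int.Parse(parts[1], provider));
    }
}
```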

## Benefits of Using a Schema

- Ensures Type Safety: The schema guarantees that values are returned in their expected type.
- Simplifies Parsing: Eliminates the need for manual type conversion when using `IDataReader.GetValue`.
- Improves Readability: Fluent API provides a clean and declarative way to define schemas.
- Customizable Numeric Fields: Allows control over decimal and grouping characters for numeric fields.

## Conclusion

Using the Fluent API for schema definition in PocketCsvReader significantly enhances the usability and reliability of working with CSV data, especially when processing untyped data from an `IDataReader`. By leveraging indexed or named schemas, developers can streamline their data processing workflows while ensuring type safety and maintainability.
77 changes: 39 additions & 38 deletions docs/_docs/mapper-object-builder.md
@@ -4,10 +4,9 @@ tags: [configuration]
---

This documentation explains how to use the `SpanMapper<T>` and `SpanObjectBuilder<T>` classes for mapping and parsing flat-file data. Their primary purpose is to configure the mapping of fields from delimited data to a strongly-typed class.
These features are designed to work with the `To<T>` method, which facilitates conversion of CSV rows into instances of T. Each row's fields are mapped according to the schema defined in `SpanObjectBuilder<T>` or the `SpanMapper<T>` delegate, ensuring accurate transformation into structured objects.
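
As a minimal sketch of such a mapping, the example below writes a mapper by hand. It assumes the `SpanMapper<T>` signature documented on this page and a `FieldSpan` exposing `Start` and `Length`; both are redeclared locally as stand-ins so the snippet compiles without the library, and `Person` is a hypothetical target type.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// A mapper that slices two fields out of the row and builds a Person from them.
SpanMapper<Person> mapper = (span, fieldSpans) =>
{
    var fields = fieldSpans.ToArray();
    var name = span.Slice(fields[0].Start, fields[0].Length).ToString();
    var age = int.Parse(span.Slice(fields[1].Start, fields[1].Length),
        NumberStyles.Integer, CultureInfo.InvariantCulture);
    return new Person(name, age);
};

var person = mapper("Alice;30", new[]
{
    new FieldSpan { Start = 0, Length = 5 }, // "Alice"
    new FieldSpan { Start = 6, Length = 2 }, // "30"
});
Console.WriteLine($"{person.Name} {person.Age}"); // Alice 30

// Local stand-ins (assumptions) mirroring the library types described on this page.
public delegate T SpanMapper<T>(ReadOnlySpan<char> span, IEnumerable<FieldSpan> fieldSpans);

public struct FieldSpan
{
    public int Start { get; set; }
    public int Length { get; set; }
}

// Hypothetical target type.
public record Person(string Name, int Age);
```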

## Delegates

### SpanMapper&lt;T&gt;
## SpanMapper&lt;T&gt;

```csharp
public delegate T SpanMapper<T>(ReadOnlySpan<char> span, IEnumerable<FieldSpan> fieldSpans);
@@ -19,32 +18,44 @@ The `SpanMapper<T>` delegate maps data from a `ReadOnlySpan<char>` representing
- **`span`**: The source `ReadOnlySpan<char>` containing the delimited row data.
- **`fieldSpans`**: A collection of `FieldSpan` objects defining the start position and length of each field in the row.

### Parse
## Class SpanObjectBuilder&lt;T&gt;

The `SpanObjectBuilder<T>` class is designed to instantiate strongly-typed objects (`T`) from delimited flat-file data. It supports default parsers for common data types and allows customization via the `SetParser` method and the `Parse` delegate.

### `Instantiate` a SpanObjectBuilder

```csharp
public delegate object? Parse(ReadOnlySpan<char> span);
public T Instantiate(ReadOnlySpan<char> span, IEnumerable<FieldSpan> fieldSpans)
```

**Purpose:**
The `Parse` delegate defines a method for parsing a `ReadOnlySpan<char>` into an object of a specific type. It is used to handle custom parsing for various data types in the `SpanObjectBuilder<T>` class.
Creates an instance of type `T` using constructor injection. The fields in the constructor are populated based on `fieldSpans` and the mapped parsers in `ParserMapping`.

- **`span`**: The `ReadOnlySpan<char>` containing the value to parse.
- **`span`**: The `ReadOnlySpan<char>` containing the delimited row data.
- **`fieldSpans`**: A collection of `FieldSpan` objects specifying the position and length of each field.

## Class SpanObjectBuilder&lt;T&gt;
**Behavior:**

The `SpanObjectBuilder<T>` class is designed to instantiate strongly-typed objects (`T`) from delimited flat-file data using `SpanMapper<T>` and the `Parse` delegates. It supports default parsers for common data types and allows customization via the `SetParser` method.
1. Identifies the appropriate constructor of `T` by matching the number of fields in `fieldSpans`.
2. Iterates through each `FieldSpan`, using the associated parser to convert the field to the required type.
3. If a type lacks a parser, throws an exception.
4. If parsing fails, throws a `FormatException` with detailed error information.

### Default Parsers
**Example:**

By default, the `SpanObjectBuilder<T>` supports the following types:
```csharp
var builder = new SpanObjectBuilder<MyClass>();
var spans = new List<FieldSpan>
{
    new FieldSpan { Start = 0, Length = 5 }, // Field 1: "12345"
    new FieldSpan { Start = 6, Length = 4 }  // Field 2: "true"
};
var obj = builder.Instantiate("12345 true", spans);
```

- Strings
- Numbers (`int`, `long`, `short`, `byte`, `float`, `double`, `decimal`)
- Booleans
- Dates (`DateTime`, `DateOnly`, `TimeOnly`, `DateTimeOffset`)
- Characters (`char`)
### Specifying the field parsers

### SetParser&lt;T&gt;TField&lt;T&gt;
#### SetParser&lt;TField&gt;

If you need to parse additional types or override the default behavior, use the `SetParser` method.

@@ -61,36 +72,26 @@ var builder = new SpanObjectBuilder<MyClass>();
builder.SetParser<Guid>(s => Guid.Parse(s));
```

### `Instantiate`
#### Parse delegate

```csharp
public T Instantiate(ReadOnlySpan<char> span, IEnumerable<FieldSpan> fieldSpans)
public delegate object? Parse(ReadOnlySpan<char> span);
```

**Purpose:**
Creates an instance of type `T` using constructor injection. The fields in the constructor are populated based on `fieldSpans` and the mapped parsers in `ParserMapping`.

- **`span`**: The `ReadOnlySpan<char>` containing the delimited row data.
- **`fieldSpans`**: A collection of `FieldSpan` objects specifying the position and length of each field.
The `Parse` delegate defines a method for parsing a `ReadOnlySpan<char>` into an object of a specific type. It is used to handle custom parsing for various data types in the `SpanObjectBuilder<T>` class.

**Behavior:**
- **`span`**: The `ReadOnlySpan<char>` containing the value to parse.
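
A parser matching this delegate can be as small as a lambda. In the sketch below the delegate is redeclared locally as a stand-in (an assumption so that the snippet runs without the library); `Guid.Parse` accepts a `ReadOnlySpan<char>` directly, so no intermediate string is needed.

```csharp
using System;

// The parsed Guid is boxed because the delegate returns object?.
Parse parseGuid = span => Guid.Parse(span);

object? value = parseGuid("6f9619ff-8b86-d011-b42d-00c04fc964ff");
Console.WriteLine(value is Guid); // True

// Local stand-in mirroring the delegate signature shown above.
public delegate object? Parse(ReadOnlySpan<char> span);
```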

1. Identifies the appropriate constructor of `T` by matching the number of fields in `fieldSpans`.
2. Iterates through each `FieldSpan`, using the associated parser to convert the field to the required type.
3. If a type lacks a parser, throws an exception.
4. If parsing fails, throws a `FormatException` with detailed error information.
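
The steps above can be sketched with plain reflection. This is an approximation under stated assumptions, not the library's implementation: fields are passed as strings rather than spans, the `parsers` dictionary stands in for `ParserMapping`, and `Build` and `Item` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Stand-in for the parser mapping: target type -> parser.
var parsers = new Dictionary<Type, Func<string, object>>
{
    [typeof(int)] = s => int.Parse(s, CultureInfo.InvariantCulture),
    [typeof(bool)] = s => bool.Parse(s),
};

var item = Build<Item>(new[] { "12345", "true" }, parsers);
Console.WriteLine($"{item.Id} {item.Active}"); // 12345 True

static T Build<T>(IReadOnlyList<string> fields,
    IReadOnlyDictionary<Type, Func<string, object>> parsers)
{
    // Step 1: pick the public constructor whose parameter count matches the field count.
    var ctor = typeof(T).GetConstructors()
        .FirstOrDefault(c => c.GetParameters().Length == fields.Count)
        ?? throw new InvalidOperationException(
            $"No constructor of {typeof(T).Name} takes {fields.Count} arguments.");

    var parameters = ctor.GetParameters();
    var args = new object[fields.Count];

    // Step 2: convert each field with the parser registered for the parameter type.
    for (var i = 0; i < fields.Count; i++)
    {
        var type = parameters[i].ParameterType;

        // Step 3: a type without a parser is an error.
        if (!parsers.TryGetValue(type, out var parse))
            throw new InvalidOperationException($"No parser registered for {type.Name}.");

        try
        {
            args[i] = parse(fields[i]);
        }
        catch (Exception ex) when (ex is FormatException or OverflowException)
        {
            // Step 4: surface parsing failures as a FormatException with context.
            throw new FormatException(
                $"Cannot parse '{fields[i]}' as {type.Name} for parameter '{parameters[i].Name}'.", ex);
        }
    }

    return (T)ctor.Invoke(args);
}

// Hypothetical target type with a two-argument constructor.
public record Item(int Id, bool Active);
```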
#### Default Parsers

**Example:**
By default, the `SpanObjectBuilder<T>` supports the following types:

```csharp
var builder = new SpanObjectBuilder<MyClass>();
var spans = new List<FieldSpan>
{
    new FieldSpan { Start = 0, Length = 5 }, // Field 1: "12345"
    new FieldSpan { Start = 6, Length = 4 }  // Field 2: "true"
};
var obj = builder.Instantiate("12345 true", spans);
```
- Strings
- Numbers (`int`, `long`, `short`, `byte`, `float`, `double`, `decimal`)
- Booleans
- Dates (`DateTime`, `DateOnly`, `TimeOnly`, `DateTimeOffset`)
- Characters (`char`)

### To&lt;T&gt; Method
