diff --git a/README.md b/README.md
index 26a365d..79f93bf 100644
--- a/README.md
+++ b/README.md
@@ -11,11 +11,12 @@
 rpft --help

 # Command Line Interface (CLI)

-The CLI supports three subcommands:
+The CLI supports the following subcommands:

-- `create_flows`: create RapidPro flows (in JSON format) from spreadsheets
+- `create_flows`: create RapidPro flows (in JSON format) from spreadsheets using a content index
 - `flows_to_sheets`: convert RapidPro flows (in JSON format) into spreadsheets
 - `convert`: save input spreadsheets as JSON
+- `save_data_sheets`: save input spreadsheets as nested JSON using a content index (an experimental feature that is likely to change)

 Full details of the available options for each can be found via the help feature:
diff --git a/docs/components.md b/docs/components.md
index 3cbd4f8..94c5981 100644
--- a/docs/components.md
+++ b/docs/components.md
@@ -2,10 +2,10 @@

 This toolkit consists of three components.

-The first component ([](/src/rpft/parsers/common)) is RapidPro-agnostic and takes care of reader spreadsheets and converting them into internal data models and other output formats, see [](sheets.md)
+`rpft.parsers.common` is RapidPro-agnostic and takes care of reading spreadsheets and converting them into internal data models and other output formats; see [sheets.md](sheets.md).

-The second component ([](/src/rpft/parsers/creation)) defines data models for a spreadsheet format for RapidPro flows, and process spreadsheets into RapidPro flows (and back) using the first component.
+`rpft.parsers.creation` defines data models for a spreadsheet format for RapidPro flows, and processes spreadsheets into RapidPro flows (and back) using `rpft.parsers.common`.

-The third component ([](/src/rpft/rapidpro)) defines internal representations of RapidPro flows and to read and write to a JSON format that can be import to/exported from RapidPro. It is partially entangled with the second component, as it needs to be aware of the data models of the second component to convert RapidPro flows into the spreadsheet format.
+`rpft.rapidpro` defines internal representations of RapidPro flows, and reads and writes a JSON format that can be imported to and exported from RapidPro. It is partially entangled with `rpft.parsers.creation`, as it needs to be aware of that component's data models to convert RapidPro flows into the spreadsheet format.

-The latter two components are (poorly) documented here: [](rapidpro.md)
+The latter two components are [documented](rapidpro.md).
diff --git a/docs/models.md b/docs/models.md
index 8437f0d..c46f9e6 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -1,20 +1,16 @@
 # Models

-`RowModel`s are subclasses of [`pydantic.BaseModel`]
-(https://docs.pydantic.dev/latest/concepts/models/#basic-model-usage), and may
-contain basic types, lists and other models as attributes, nested arbirarily
-deep. Every `Sheet` can only be parsed in the context of a given `RowModel`
-(which can, however, be automatically inferred from the sheet headers, if desired).
+`RowModel`s are subclasses of [pydantic.BaseModel], and may contain basic types, lists and other models as attributes, nested arbitrarily deep. Every `Sheet` can only be parsed in the context of a given `RowModel` (which can, however, be automatically inferred from the sheet headers, if desired).

-Technically, there is no `RowModel` class, but instead it is called `ParserModel`
-and is defined in [](/src/rpft/parsers/common/rowparser.py). `ParserModel` attributes have to be
-basic types, lists or `ParserModel`s.
-The only addition to `pydantic.BaseModel` are the (optional) methods `header_name_to_field_name`, `field_name_to_header_name` and (for full row models) `header_name_to_field_name_with_context` that allow remapping
-column header names to different model attributes.
+Technically, there is no `RowModel` class; it is instead called `ParserModel` and is defined in `rpft.parsers.common.rowparser`. `ParserModel` attributes have to be basic types, lists or `ParserModel`s. The only additions to `pydantic.BaseModel` are the optional methods:

-Example:
+- `header_name_to_field_name`
+- `field_name_to_header_name`
+- `header_name_to_field_name_with_context` (for full row models)

-```
+These methods allow remapping column header names to different model attributes, for example:
+
+```python
 class SubModel(ParserModel):
     word: str = ""
     number: int = 0
@@ -24,36 +20,29 @@ class MyModel(ParserModel):
     sub: SubModel = SubModel()
 ```

-The headers of a sheet and its content that can be parsed into `MyModel` could for example be:
+The following table could be parsed into an instance of `MyModel`:

 |numbers.1 | numbers.2 | sub.word | sub.number |
 |----------|-----------|----------|------------|
 | 42       | 16        | hello    | 24         |

-with each column containing a basic type (int, int, int, str, int).
-
-
-However, the headers and content could also look like this:
+Each column contains a basic type, in this case, `int`, `int`, `str`, `int`. However, the table could be expressed differently.

 |numbers | sub                   |
 |--------|-----------------------|
 | 42;16  | word;hello\|number;24 |

-With the first column representing a `List[int]` and the second a `SubModel`.
-
-How sheets and their column headers correspond to `RowModel`s is specified in
-[RapidPro sheet specification].
-
-More examples can also be found in the tests:
+The first column has type `List[int]`, the second `SubModel`. How sheets and their column headers correspond to `RowModel`s is specified in the [RapidPro sheet specification].

-- [](/src/rpft/parsers/common/tests/test_rowparser.py)
-- [](/src/rpft/parsers/common/tests/test_full_rows.py)
-- [](/src/rpft/parsers/common/tests/test_differentways.py)
+More examples can be found in the tests:
+- `tests.test_rowparser`
+- `tests.test_full_rows`
+- `tests.test_differentways`

-The `header_name_to_field_name` and related `ParserModel` methods can be used to map column headers to fields of a different name, for example:
+The `header_name_to_field_name` method and the related `ParserModel` methods can be used to map column headers to fields of a different name, for example:

-```
+```python
 class MyModel(ParserModel):
     number: int = 0
     first_name_and_surname: str = ""
@@ -69,36 +58,41 @@ class MyModel(ParserModel):
         return field
 ```

-Then
+Then, the following would be a valid table that can be converted into `MyModel` instances.

 | number | name |
 |--------|------|
 | 42     | John |

-would be a valid table that can be converted into `MyModel` instances.
+The original motivation for this feature was that the original flow sheet format had a column named 'from', which is a keyword in Python and thus could not be used as a field name, so it had to be remapped.
+
+There is also a more complex use case where we have a list of conditions, each condition being a model with multiple attributes, such as value, variable and name (when we think of it from an OOP standpoint). However, the original sheet format had columns 'condition', 'condition\_variable', 'condition\_name', etc., containing a list of the value/variable/name fields respectively, so technically their headers should have been 'condition.\*.value', 'condition.\*.variable' and 'condition.\*.name'. The remapping feature is used to map the short forms to the accurate forms.
+
+Then there is context-specific remapping, where the remapping takes the content of the row into account. In practice, we remap certain headers based on the row type (encoded in a type column). This is used when different row types have different attributes (so really it's arguable whether they should be in a spreadsheet at all), and, for compactness, we map some of their attributes to the same column header. In particular, each row type has a 'main argument', which may be of a different type for each row type, and these all get mapped to the 'message_text' column header.
+
+The module `rpft.parsers.creation.flowrowmodel` shows all of these use cases.

 ## Automatic model inference

-Models of sheets can now be automatically inferred if no explicit model is provided, see [model inference](/src/rpft/parsers/common/model_inference.py)
+Models of sheets can now be automatically inferred if no explicit model is provided; see [model inference].

-This is done exclusively by parsing the header row of a sheet. Headers can be annotated with types (basic types and list; dict and existing models are currently not supported). If no annotation is present, the column is assumed to be a string.
+This is done exclusively by parsing the header row of a sheet. Headers can be annotated with basic types and `list`; `dict` and existing models are currently not supported. If no annotation is present, the column is assumed to be a string.

 Examples of what the data in a column can represent:

-- `field`: `field` is inferred to be a string
-- `field:int`: `field` is inferred to be a int
-- `field:list`: `field` is inferred to be a list
-- `field:List[int]`: `field` is inferred to be a list of integers
-- `field.1`: `field` is inferred to be a list, and this column contains its first entry
-- `field.1:int`: `field` is inferred to be a list of integers, and this column contains its first entry
-- `field.subfield`: `field` is inferred to be another model with one or multiple subfields, and this column contains values for the `subfield` subfield
-- `field.subfield:int`: `field` is inferred to be another model with one or multiple subfields, and this column contains values for the `subfield` subfield which is inferred to be an integer
-- `field.1.subfield`: `field` is inferred to be a list of another model with one or multiple subfields, and this column contains values for the `subfield` subfield of the first list entry
-
-Intermediate models like in the last three examples are created automatically.
+- `field`: no annotation; type assumed to be `str`
+- `field:int`: integer
+- `field:list`: list
+- `field:List[int]`: list of integers
+- `field.1`: first entry in a list
+- `field.1:int`: first entry in list; integer
+- `field.subfield`: subfield; string
+- `field.subfield:int`: integer subfield
+- `field.1.subfield`: list of objects with string subfield; first item of list

-Field name remapping cannot be done when using automated model inference.
+Intermediate models like in the last three examples are created automatically. Field name remapping cannot be done when using automated model inference. `*`-notation is also not currently supported, but could be done in principle.
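+
+As a rough, untested sketch of how inference might be exercised directly (assuming `model_from_headers(name, headers)` from the model inference module returns a pydantic model class, as it is used in the converter sketch at the end of [sheets](sheets.md); the header strings here are illustrative):
+
+```python
+from rpft.parsers.common.model_inference import model_from_headers
+
+# Headers annotated as described above; the unannotated "name" defaults to str.
+MyModel = model_from_headers("mysheet", ["name", "count:int", "sub.word"])
+
+# The inferred class is a pydantic model, so the automatically created
+# intermediate model for "sub" can be populated from a nested dict.
+row = MyModel(name="apple", count=3, sub={"word": "hello"})
+```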
-`*`-notation is also not currently supported, but could be done in principle.

+[model inference]: /src/rpft/parsers/common/model_inference.py
+[pydantic.BaseModel]: https://docs.pydantic.dev/latest/concepts/models/#basic-model-usage
 [RapidPro sheet specification]: https://docs.google.com/document/d/1m2yrzZS8kRGihUkPW0YjMkT_Fmz_L7Gl53WjD0AJRV0/edit?usp=sharing
diff --git a/docs/rapidpro.md b/docs/rapidpro.md
index 6e305f7..d438ab7 100644
--- a/docs/rapidpro.md
+++ b/docs/rapidpro.md
@@ -5,26 +5,24 @@
 Used for parsing collections of flows (with templating). Flow-specific features are omitted here. We only give the general idea of a content index and its parser.

-The [`ContentIndexParser`](/src/rpft/parsers/creation/contentindexparser.py) takes a [`SheetReader`](sheets.md) and looks for one or multiple sheets called `content_index` and processes them (in the order provided). Rows of a content index generally reference other sheets with additional meta information. These may also be, again, content index sheets, which in that case are parsed recursively (and from a parsing order perspective, its rows are parsed right in between the rows above and the rows below of the containing content index).
+The class `rpft.parsers.creation.contentindexparser.ContentIndexParser` takes a [SheetReader](sheets.md), looks for one or multiple sheets called `content_index`, and processes them (in the order provided). Rows of a content index generally reference other sheets with additional meta information. These may themselves be content index sheets, which are then parsed recursively (from a parsing order perspective, their rows are parsed right in between the rows above and the rows below of the containing content index).

-In essence (modulo technicalities), for each type of sheet, the content index maintains dictionaries (one per sheet type) mapping sheet names to the actual sheets. When a content index sheet is processed, each row is inspected and the referenced sheet added to the relevant (type-specific) dictionary. If an entry of the given name already exists, it is overwritten. Thus it is possible to have a parent content index containing some data, and a (later) child content index replacing some of that data. There is also an `ignore_row` type indicating that a previously referenced sheet should be deleted from its respective index.
+In essence, for each type of sheet, the content index maintains dictionaries (one per sheet type) mapping sheet names to the actual sheets. When a content index sheet is processed, each row is inspected and the referenced sheet added to the relevant (type-specific) dictionary. If an entry with a given name already exists, it is overwritten. Thus it is possible to have a parent content index containing some data, and a (later) child content index replacing some of that data. There is also an `ignore_row` type indicating that a previously referenced sheet should be deleted from its respective index.

 Sheets can also be renamed before being added to the respective dict using the `new_name` column. The content index sheet format is detailed in [New features documentation].

-There are two sheet types of particular interest
+There are two sheet types of particular interest.

-- [`DataSheet`](/src/rpft/parsers/creation/contentindexparser.py): Similar to a [`RowDataSheet`](sheets.md), however, assumed that the `RowModel` has an `ID` field, and rather than storing a list of rows, it stores an ordered dict of rows, indexed by their ID.
-- [`TemplateSheet`](/src/rpft/parsers/creation/contentindexparser.py): Wrapper around `tablib.Dataset`, with template arguments.
+- `rpft.parsers.creation.contentindexparser.DataSheet`: Similar to a [RowDataSheet](sheets.md), but assumes that the `RowModel` has an `ID` field, and, rather than storing a list of rows, stores an ordered `dict` of rows, indexed by their ID.
+- `rpft.parsers.creation.contentindexparser.TemplateSheet`: Wrapper around `tablib.Dataset`, with template arguments.

-Note: It may be worthwhile unifying the data structures used here, to be consistent with `Sheet` and `RowDataSheet` documented in [](sheets.md). Also see the discussion there why `DataSheet`s can be exported to nested JSON, while `TemplateSheet`s can only be exported to flat JSON.
+Note: It may be worthwhile unifying the data structures used here, to be consistent with `Sheet` and `RowDataSheet` documented in [sheets](sheets.md). Also see the discussion there of why `DataSheet`s can be exported to nested JSON, while `TemplateSheet`s can only be exported to flat JSON.

-`DataSheet`s are often used to instantiate `TemplateSheet`s, and the ContentIndexParser has mechanisms for this, see [New features documentation]. Furthermore, `DataSheet`s can also be concatenated, filtered and sorted via the `operation` column, see [here](https://docs.google.com/document/d/1Onx2RhNoWKW9BQvFrgTc5R5hcwDy1OMsLKnNB7YxQH0/edit#heading=h.c93jouk7sqq)
+`DataSheet`s are often used to instantiate `TemplateSheet`s, and the `ContentIndexParser` has mechanisms for this; see [New features documentation]. Furthermore, `DataSheet`s can also be concatenated, filtered and sorted via the `operation` column; see [Data sheet operations].

-
-
-Relevant code: `parse_all_flows` in [](/src/rpft/parsers/creation/contentindexparser.py).
+The relevant code is in `rpft.parsers.creation.contentindexparser.ContentIndexParser.parse_all_flows`.

 Examples:

@@ -34,9 +32,7 @@ Examples:

 ## FlowParser

-See `/src/rpft/parsers/creation/flowparser.py` and [RapidPro sheet specification].
-Parser to turn sheets in the standard format (Documentation TBD) into RapidPro flows.
-See `/src/tests/input` and `/src/tests/output` for some examples.
+See `rpft.parsers.creation.flowparser` and the [RapidPro sheet specification]. This parser turns sheets in the standard format into RapidPro flows (documentation TBD). See `/src/tests/input` and `/src/tests/output` for some examples.

 Examples:

@@ -46,11 +42,9 @@ Examples:

 ## RapidPro models

-See `/src/rpft/rapidpro/models`. Models for flows, nodes, etc, with convenience
-functions to assemble RapidPro flows. Each model has a `render` method
-to render the model into a dictionary, that can be exported to a json
-file whose fields are consistent with the format used by RapidPro.
+See `rpft.rapidpro.models`. Models for flows, nodes, etc., with convenience functions to assemble RapidPro flows. Each model has a `render` method to render the model into a dictionary that can be exported to a JSON file whose fields are consistent with the format used by RapidPro.
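+
+As a minimal, untested sketch (using only names that appear elsewhere in this repository: `ContentIndexParser.parse_all()` returns a `RapidProContainer`, and its `render()` output is what `create_flows` writes out as JSON):
+
+```python
+import json
+
+from rpft.parsers.creation.contentindexparser import ContentIndexParser
+
+# sheet_reader: any SheetReader whose sheets include a content_index
+container = ContentIndexParser(sheet_reader).parse_all()
+
+with open("flows.json", "w") as export:
+    json.dump(container.render(), export, indent=4)
+```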
+[Data sheet operations]: https://docs.google.com/document/d/1Onx2RhNoWKW9BQvFrgTc5R5hcwDy1OMsLKnNB7YxQH0/edit#heading=h.c93jouk7sqq
 [RapidPro sheet specification]: https://docs.google.com/document/d/1m2yrzZS8kRGihUkPW0YjMkT_Fmz_L7Gl53WjD0AJRV0/edit?usp=sharing
 [New features documentation]: https://docs.google.com/document/d/1Onx2RhNoWKW9BQvFrgTc5R5hcwDy1OMsLKnNB7YxQH0/edit?usp=sharing
diff --git a/docs/sheets.md b/docs/sheets.md
index bf80d32..b8b56cd 100644
--- a/docs/sheets.md
+++ b/docs/sheets.md
@@ -1,72 +1,52 @@
 # Generic parsers from spreadsheets to data models

-On an idealized and simplified level, we have the following chain of converting
-different representations of data:
+On an idealized and simplified level, we have the following chain of conversions between different representations of data:

 Spreadsheet File: An XLSX, folder of CSVs, ID of a Google Sheet, flat JSON (list of dicts mapping column headers to column entries)

-^
-
+^
 | `SheetReader`: Upon construction, reads a file and then has a dict
-of `Sheet`s indexed by name.
+of `Sheet`s indexed by name.
 v

-`Sheet`: Wrapper around [`tablib.Dataset`]
-(https://tablib.readthedocs.io/en/stable/api.html#dataset-object)
-
-^
-
-| `SheetParser`, `RowParser`, `CellParser`
+`Sheet`: Wrapper around [tablib.Dataset]
+^
+| `SheetParser`, `RowParser`, `CellParser`
 v

 `RowDataSheet`: Wrapper around `List[RowModel]` for some `RowModel`,
-which is a subclass of [`pydantic.BaseModel`]
-(https://docs.pydantic.dev/latest/concepts/models/#basic-model-usage)
-
-^
-
-| pydantic functionality
+which is a subclass of [pydantic.BaseModel]
+^
+| pydantic functionality
 v

 Nested dict/JSON

-`RowModel`s are classes representing the data contained in an individual row.
-Thus, the data in one row is a `RowModel` instance. `RowModel` may contain
-basic types, lists or other `pydantic.BaseModel`s as attributes, and thereby
-their data can be nested.
+`RowModel`s are classes representing the data contained in an individual row. Thus, the data in one row is a `RowModel` instance. `RowModel` may contain basic types, lists or other `pydantic.BaseModel`s as attributes, and thereby their data can be nested.

-In practice, `Sheet` and `RowDataSheet` are used somewhat inconsistently, and in
-their stead sometimes `tablib.Dataset`, `List[RowModel]` or other custom
-classes are used, maybe motivating a refactor in the future. We outline the
-details of the three conversion steps below.
+In practice, `Sheet` and `RowDataSheet` are used somewhat inconsistently, and in their stead sometimes `tablib.Dataset`, `List[RowModel]` or other custom classes are used, maybe motivating a refactor in the future. We outline the details of the three conversion steps below.

 ## Conversion between spreadsheet files and `Sheet`s

-SheetReaders and `Sheet`s are defined in [](/src/rpft/parsers/sheets.py)
+SheetReaders and `Sheet`s are defined in `rpft.parsers.sheets`.

-The `Sheet` class wraps [`tablib.Dataset`]
-(https://tablib.readthedocs.io/en/stable/api.html#dataset-object) (which is
-often referred to as `table`). `Sheet`s also have a `name`.
+The `Sheet` class wraps [tablib.Dataset] (which is often referred to as `table`). `Sheet`s also have a `name`.

 ### Forward direction

-`Sheet`s are produced by SheetReaders that can read different input file
-formats. Currently, `Sheet`s have an additional SheetReader attribute `reader`
-indicating which reader produced it (it's not clear whether this is necessary.
-refactor?).
+`Sheet`s are produced by SheetReaders that can read different input file formats. Currently, `Sheet`s have an additional SheetReader attribute `reader` indicating which reader produced them, which may be useful for error reporting.
+
+`SheetReader`s take a file reference upon construction and provide the following methods to access `Sheet`s by name:

-SheetReaders take a file reference upon construction, and provide
-`reader.get_sheet(name) -> Sheet` and `reader.get_sheets_by_name(name) -> List[Sheet]`
-to access `Sheet`s by their name. While generally the name of a sheet
-within a file is unique, the latter function is useful with the
-`CompositeSheetReader`, which is composed of multiple SheetReaders and thus
-implicitly references multiple files.
+- `get_sheet(name) -> Sheet`
+- `get_sheets_by_name(name) -> List[Sheet]`
+
+While generally the name of a sheet within a file is unique, the latter function is useful with the `CompositeSheetReader`, which is composed of multiple SheetReaders and thus implicitly references multiple files.

 We have subclasses of `AbstractSheetReader` for different formats:

@@ -79,68 +59,45 @@ We have subclasses of `AbstractSheetReader` for different formats:

 ### Reverse direction

-The are currently no SheetWriters, and thus this conversion step is
-one-directional. When we want to write spreadsheets, this is implemented ad-hoc
-by using `tablib.Dataset`'s export functionality (for CSV, XLSX), see for
-example [`RowDataSheet.export`](`/src/rpft/parsers/common/rowdatasheet.py`), or
-`json.dump` for flat JSON.
+There are currently no SheetWriters, and thus this conversion step is one-directional. When we want to write spreadsheets, this is implemented ad hoc using the functionality of `tablib.Dataset` to export to CSV and XLSX. For example, see [RowDataSheet.export](/src/rpft/parsers/common/rowdatasheet.py), or `json.dump` for flat JSON.

 ## Conversion between `Sheet`s and `RowDataSheet`s

-While `Sheet` is a simple table representation of a sheet with rows and columns
-(which have column headers), a [`RowDataSheet`](/src/rpft/parsers/common/rowdatasheet.py)
-represents a list of `RowModel`
-instances. `RowModel`s are subclasses of [`pydantic.BaseModel`]
-(https://docs.pydantic.dev/latest/concepts/models/#basic-model-usage), and may
-contain basic types, lists and other models as attributes, nested arbirarily
-deep. How sheets and their column headers correspond to `RowModel`s is
-documented in more detail in [](models.md).
+While `Sheet` is a simple table representation of a sheet with rows and columns (which have column headers), a [RowDataSheet](/src/rpft/parsers/common/rowdatasheet.py) represents a list of `RowModel` instances. `RowModel`s are subclasses of [pydantic.BaseModel], and may contain basic types, lists and other models as attributes, nested arbitrarily deep. How sheets and their column headers correspond to `RowModel`s is documented in more detail in [models](models.md).

 ### Forward direction

 The conversion from `Sheet` to `RowDataSheet` involves a three-level hierarchy:
-`SheetParser`, `RowParser` and `CellParser`.
+
+- `SheetParser`
+  - `RowParser`
+    - `CellParser`

 #### Sheet parser

-The [`SheetParser`](/src/rpft/parsers/common/sheetparser.py) has a `table`
-(`tablib.Dataset`) and a `RowParser` and provides two functions:
+The [SheetParser](/src/rpft/parsers/common/sheetparser.py) has a `table` (`tablib.Dataset`) and a `RowParser`, and provides two functions:

 - `parse_all`: returns the output as `List[RowModel]`
 - `get_row_data_sheet`: returns the output as `RowDataSheet`

-The `SheetParser` invokes the `RowParser` to convert each of the rows.
-The `RowParser` has the `RowModel` that each row is to be converted to.
+The `SheetParser` invokes the `RowParser` to convert each of the rows. The `RowParser` has the `RowModel` that each row is to be converted to.

 #### Row parser

-A [`RowParser`](`/src/rpft/parsers/common/rowparser.py`) has an
-associated `RowModel` and a `CellParser`.
+A [RowParser](/src/rpft/parsers/common/rowparser.py) has an associated `RowModel` and a `CellParser`.

-It provides a function `parse_row(data)` to convert a spreadsheet
-row into a `RowModel` instance containing the provided data.
-`data` is a `dict[str, str]` mapping column headers to the
-corresponding entry of the spreadsheet in this row, and is provided
-by the `SheetParser`. Column headers determine which field of the
-model the column contains data for, and different ways to address fields
-in the data models are supported, see [](models.md).
+It provides a function `parse_row(data)` to convert a spreadsheet row into a `RowModel` instance containing the provided data. `data` is a `dict[str, str]` mapping column headers to the corresponding entry of the spreadsheet in this row, and is provided by the `SheetParser`. Column headers determine which field of the model the column contains data for, and different ways to address fields in the data models are supported; see [models](models.md).

-The `RowParser` interprets the column headers and if the column contains
-a non-basic type (e.g. a list or a submodel), it invokes the `CellParser`
-to convert the cell content into a nested list, which it then processes
-further to assign values to the model fields.
+The `RowParser` interprets the column headers, and if the column contains a non-basic type (e.g. a list or a submodel), it invokes the `CellParser` to convert the cell content into a nested list, which it then processes further to assign values to the model fields.

 #### Cell parser

-The [`CellParser`](/src/rpft/parsers/common/cellparser.py) has a function
-`parse(value)` that takes a string (the cell content) and converts it into
-a nested list. It uses `|` and `;` characters (in that order) as list
-separators, and `\` can be used as an escape chacter.
+The [CellParser](/src/rpft/parsers/common/cellparser.py) has a function `parse(value)` that takes a string (the cell content) and converts it into a nested list. It uses `|` and `;` characters (in that order) as list separators. `\` can be used as an escape character.

 Examples:

@@ -148,102 +105,155 @@ Examples:
 - `a\;b` --> 'a;b'
 - `a,b|1,2` --> [['a','b'],['1','2']]

-More examples can be found in [](/tests/test_cellparser.py).
+More examples can be found in [/tests/test_cellparser.py](/tests/test_cellparser.py).

 #### Templating

-Cells of a sheet may contain [Jinja2](https://jinja.palletsprojects.com/en/3.1.x/) templates.
-For example, the content of a cell may look like `Hello {{user_name}}!`.
+Cells of a sheet may contain [Jinja2](https://jinja.palletsprojects.com/en/3.1.x/) templates. For example, the content of a cell may look like:

-With a given templating context mapping variable names to values
-(e.g. {"user_name": "Chris"), such a string can be evaluated,
-e.g. to `Hello Chris!`.
+```
+Hello {{user_name}}!
+```
+
+Given a templating context mapping variable names to values, for example:
+
+```python
+{"user_name": "Chris"}
+```
+
+the template above can be evaluated to the string `Hello Chris!`.

-More examples can be found in [](/tests/test_cellparser.py).
+More examples can be found in [/tests/test_cellparser.py](/tests/test_cellparser.py).

 ##### Instantiating templated sheets

-A `SheetParser` as an optional templating context, that it passes down to the `RowParser`
-for each row, which in turn passes it down to the `CellParser` for each cell.
-The `CellParser` will try to evaluate all templates that appear in the cell, and
-throw an error if a variable is undefined (because it is missing from the context).
+A `SheetParser` may have a templating context, which it passes down to the `RowParser` for each row, which in turn passes it down to the `CellParser` for each cell. The `CellParser` will try to evaluate all templates that appear in the cell and throw an error if a variable is undefined (because it is missing from the context).

-Therefore, if a `Sheet` contains templating, we need a templating context in order to
-instantiate the templates and convert the `Sheet` into a `RowDataSheet`.
-It is not possible to convert `Sheet`s with uninstantiated templates into
-`RowDataSheet`s (and thus also nested JSONs). If we want to store such sheets as JSON,
-we have to store it as flat JSON.
+Therefore, if a `Sheet` contains templates, we need a templating context in order to instantiate the templates and convert the `Sheet` into a `RowDataSheet`. It is not possible to convert `Sheet`s with uninstantiated templates into `RowDataSheet`s (and thus also nested JSONs).
-In fact, such a conversion of uninstatiated templates is in principle not possible,
-as the following example shows:
+If we want to store such sheets as JSON, we have to store them as flat JSON. In fact, such a conversion of uninstantiated templates is in principle not possible, as the following example shows. Imagine we have a column encoding a `list` field in our `RowModel`, and the corresponding cell contains:

-Imagine we have a column encoding a `list` field in our `RowModel`, and the
-corresponding cell contains `{% for e in my_list %}{{e}};{% endfor %}`.
-With a context (e.g. `{"my_list": [1, 2, 3]}`), this can be evalued to
-`"1;2;3;"` and processed by the `CellParser` into `[1, 2, 3]`, which the
-`RowParser` then assigns to the field in the `RowModel`. However, uninstantiated,
-this cell contains the string `"{% for e in my_list %}{{e}};{% endfor %}"`,
-and the `RowParser` cannot assign a string to a field of type `list`.
+```
+{% for e in my_list %}{{e}};{% endfor %}
+```

+Given the context:

-##### Control flow and changing context
+```python
+{"my_list": [1, 2, 3]}
+```

-In addition to `parse_all`, the `SheetParser` also offers a function
-`parse_next_row` to parse a sheet row by row. The invoker of `parse_next_row`
-may change the templating context between calls by using `add_to_context`
-and `remove_from_context`. Thus, the invoker may interpret the content of a
-row, and adjust the templating context accordingly before parsing the next row.
-The invoker may also repeat the parsing of rows by setting and returning to
-bookmarks, e.g. to implement for loops, using `create_bookmark`,
-`go_to_bookmark` and `remove_bookmark`.
+the template will be evaluated to `"1;2;3;"`, which will be interpreted by the `CellParser` as the list `[1, 2, 3]`. The `RowParser` then assigns the list to the field in the `RowModel`. However, uninstantiated, this cell contains the raw template as a string, and the `RowParser` cannot assign a string to a field of type `list`.

+##### Control flow and changing context
+
+In addition to `parse_all`, the `SheetParser` also offers a function `parse_next_row` to parse a sheet row by row. The invoker of `parse_next_row` may change the templating context between calls by using `add_to_context` and `remove_from_context`. Thus, the invoker may interpret the content of a row, and adjust the templating context accordingly before parsing the next row. The invoker may also repeat the parsing of rows by setting and returning to bookmarks, e.g. to implement for loops, using `create_bookmark`, `go_to_bookmark` and `remove_bookmark`.
+
 ### Reverse direction

-The `RowDataSheet` class has a method `convert_to_tablib` which converts
-its content to `tablib.Dataset`. It uses the `unparse_row` method of the
-`RowParser` associated to the `RowDataSheet` to turn each `RowModel` instance
-into a `dict[str,str]` mapping column headers to a string representation of
-their content. In the process, it converts complex types into nested lists
-and uses the `CellParser`'s `join_from_lists` method to get a string
-representation of nested lists.
-
-By default, the column headers are chosen in such a way the every column
-contains a basic type: For example, for list fields, we have one column
-per entry.
-But as there are many different sheet representations of a `RowModel`,
-depending on the choice of the headers, the `RowDataSheet` has two more
-optional arguments:
-
-- `target_headers` (`set[str]`): Complex type fields (`RowModel`s, `list`s,
-  `dict`s) whose content should be represented in the output dict as a single
-  entry. A trailing asterisk may be used to specify multiple fields at once,
-  such as `list.*` and `field.*`.
-- `excluded_headers` (`set[str]`): Fields to exclude from the output. Same format
-  as target_headers.
+The `RowDataSheet` class has a method `convert_to_tablib` which converts its content to `tablib.Dataset`. It uses the `unparse_row` method of the `RowParser` associated to the `RowDataSheet` to turn each `RowModel` instance into a `dict[str,str]` mapping column headers to a string representation of their content. In the process, it converts complex types into nested lists and uses the `CellParser`'s `join_from_lists` method to get a string representation of nested lists.
+
+By default, the column headers are chosen in such a way that every column contains a basic type. For example, for list fields, we have one column per entry. As there are many different possible sheet representations of a `RowModel`, depending on the choice of the headers, the `RowDataSheet` has two more optional arguments:
+
+- `target_headers` (`set[str]`): Complex type fields (`RowModel`, `list`, `dict`) whose content should be represented in the output dict as a single entry. A trailing asterisk may be used to specify multiple fields at once, such as `list.*` and `field.*`.
+- `excluded_headers` (`set[str]`): Fields to exclude from the output. Same format as `target_headers`.

 Remark: No templating is supported in the reverse direction.

 ## Conversion between `RowDataSheet`s and Nested JSON

-As `RowModel`s are instances of `pydantic.BaseModel`,
-it is easy to convert them to dict/json:
+As `RowModel`s are instances of `pydantic.BaseModel`, it is easy to convert them to `dict` or JSON:

 Reading/writing a single `RowModel` instance from/to JSON:

 - `RowModel.parse_json(nested_json)`
 - `rowmodelinstance.json()`

-In practice, we have a list of `RowModel`s, which we want to convert into a
-single JSON containing a list of rows (and possibly additional metadata).
-Thus we can use the conversion to dict functions and then process the results
-further as needed:
+In practice, we have a list of `RowModel`s, which we want to convert into a single JSON containing a list of rows (and possibly additional metadata). Thus we can use the conversion-to-dict functions and then process the results further as needed:

 - `RowModel.parse_obj(nested_dict)`
 - `rowmodelinstance.dict()`

-However, no such CLI functionality is implemented, at this point.
+It would be desirable to add a method to `RowDataSheet` to export its content to a nested JSON. The reverse is less straightforward, as we need to store some metadata describing the model somewhere, either via headers, JSON Schema, or a reference to an already defined model.
+
+The CLI command `save_data_sheets` implements exporting all data sheets referenced in a content index as (a single) nested JSON. This is implemented in `save_data_sheets` in [/src/rpft/converters.py](/src/rpft/converters.py), using the [ContentIndexParser](rapidpro.md). However, it uses its own `DataSheet` class via its `to_dict` method. It would be good to unify `DataSheet` and `RowDataSheet`, and provide this as standalone functionality, once it is decided which metadata describing the underlying model needs to be stored.
+
+Below is some (untested) code outlining roughly how this could look:
+
+```python
+import importlib
+
+from rpft.converters import create_sheet_reader
+from rpft.parsers.common.cellparser import CellParser
+from rpft.parsers.common.model_inference import model_from_headers
+from rpft.parsers.common.rowparser import RowParser
+from rpft.parsers.common.sheetparser import SheetParser
+
+
+def convert_to_nested_json(input_file, sheet_format, user_data_model_module_name=None):
+    """
+    Convert source spreadsheet(s) into nested json.
+
+    :param input_file: source spreadsheet to convert
+    :param sheet_format: format of the input spreadsheet
+    :param user_data_model_module_name: see ContentIndexParser
+    :returns: content of the input file converted to nested json.
+    """
+
+    reader = create_sheet_reader(sheet_format, input_file)
+    # reader.sheets: Mapping[str, Sheet]
+    # user_data_model_module_name: we only need to import this once
+    user_models_module = None
+    if user_data_model_module_name:
+        user_models_module = importlib.import_module(
+            user_data_model_module_name
+        )
+    sheets = {}
+    for sheet_name, sheet in reader.sheets.items():
+        data_model_name = ...  # This is not stored anywhere. We need this for each sheet.
+        user_model = infer_model(sheet.name, user_models_module, data_model_name, sheet.table.headers)
+        rows = sheet_to_list_of_nested_dict(sheet, user_model)
+        sheets[sheet_name] = rows
+    return sheets
+
+
+def sheet_to_list_of_nested_dict(sheet, user_model):
+    '''
+    The first three lines of this are common functionality already used in various
+    places, and should be wrapped in a function (and the output should probably be
+    RowDataSheet rather than List[RowModel]).
+    '''
+    row_parser = RowParser(user_model, CellParser())
+    sheet_parser = SheetParser(row_parser, sheet.table)
+    data_rows = sheet_parser.parse_all()  # list of row model
+    return [row.dict() for row in data_rows]
+    # Below is what the content index parser does:
+    # it stores it as a dict rather than list, assuming an ID column
+    # model_instances = OrderedDict((row.ID, row) for row in data_rows)
+    # return DataSheet(model_instances, user_model)
+
+
+def nested_json_to_data_sheet(rows, user_models_module=None, data_model_name=None, headers=None):
+    # rows: a list of nested dicts
+    user_model = infer_model("model_name?", user_models_module, data_model_name, headers)
+    data_rows = []
+    for row in rows:
+        # Remark: there is also parse_raw for json strings and parse_file,
+        # however, I assume these are not applicable here because we have some
+        # meta information inside our files.
+        data_rows.append(user_model.parse_obj(row))
+    return data_rows
+    # Alternatively, using DataSheet again:
+    # model_instances = OrderedDict((row.ID, row) for row in data_rows)
+    # return DataSheet(model_instances, user_model)
+
+
+def infer_model(name, user_models_module=None, data_model_name=None, headers=None):
+    # returns a subclass of https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel
+    if user_models_module and data_model_name:
+        user_model = getattr(user_models_module, data_model_name)
+    else:
+        user_model = model_from_headers(name, headers)
+    return user_model
+```
+
+
+[tablib.Dataset]: https://tablib.readthedocs.io/en/stable/api.html#dataset-object
+[pydantic.BaseModel]: https://docs.pydantic.dev/latest/concepts/models/#basic-model-usage
diff --git a/src/rpft/cli.py b/src/rpft/cli.py
index 98efa3d..f169546 100644
--- a/src/rpft/cli.py
+++ b/src/rpft/cli.py
@@ -38,6 +38,18 @@ def flows_to_sheets(args):
     )


+def save_data_sheets(args):
+    output = converters.save_data_sheets(
+        args.input,
+        None,
+        args.format,
+        data_models=args.datamodels,
+        tags=args.tags,
+    )
+    with open(args.output, "w", encoding="utf-8") as export:
+        json.dump(output, export, indent=4)
+
+
 def create_parser():
     parser = argparse.ArgumentParser(
         description=("create RapidPro flows JSON from spreadsheets"),
@@ -51,6 +63,7 @@
     _add_create_command(sub)
     _add_convert_command(sub)
     _add_flows_to_sheets_command(sub)
+    _add_save_data_sheets_command(sub)

     return parser

@@ -63,6 +76,10 @@
     )
     parser.set_defaults(func=create_flows)

+    _add_content_index_arguments(parser)
+
+
+def _add_content_index_arguments(parser):
     parser.add_argument(
         "--datamodels",
         help=(
@@ -162,5 +179,15 @@
     )


+def _add_save_data_sheets_command(sub):
+    parser = sub.add_parser(
+        "save_data_sheets",
+        help="save data sheets referenced in content index as nested json",
+    )
+
+    parser.set_defaults(func=save_data_sheets)
+    _add_content_index_arguments(parser)
+
+
 if __name__ == "__main__":
     main()
diff --git a/src/rpft/converters.py b/src/rpft/converters.py
index ea03e4b..4443a72 100644
--- a/src/rpft/converters.py
+++ b/src/rpft/converters.py
@@ -28,11 +28,7 @@ def create_flows(input_files, output_file, sheet_format, data_models=None, tags=
     :returns: dict representing the RapidPro import/export format.
     """

-    reader = CompositeSheetReader()
-    for input_file in input_files:
-        sub_reader = create_sheet_reader(sheet_format, input_file)
-        reader.add_reader(sub_reader)
-    parser = ContentIndexParser(reader, data_models, TagMatcher(tags))
+    parser = get_content_index_parser(input_files, sheet_format, data_models, tags)

     flows = parser.parse_all().render()

@@ -43,6 +39,41 @@
     return flows


+def save_data_sheets(input_files, output_file, sheet_format, data_models=None, tags=[]):
+    """
+    Save data sheets as JSON.
+
+    Collect the data sheets referenced in the source content index spreadsheet(s) and
+    save this collection in a single JSON file. Returns the output as a dict.
+
+    :param input_files: list of source spreadsheets
+    :param output_file: (deprecated) path of file to export output to as JSON
+    :param sheet_format: format of the spreadsheets
+    :param data_models: name of module containing supporting Python data classes
+    :param tags: names of tags to be used to filter the source spreadsheets
+    :returns: dict representing the collection of data sheets.
+    """
+
+    parser = get_content_index_parser(input_files, sheet_format, data_models, tags)
+
+    output = parser.data_sheets_to_dict()
+
+    if output_file:
+        with open(output_file, "w") as export:
+            json.dump(output, export, indent=4)
+
+    return output
+
+
+def get_content_index_parser(input_files, sheet_format, data_models, tags):
+    reader = CompositeSheetReader()
+    for input_file in input_files:
+        sub_reader = create_sheet_reader(sheet_format, input_file)
+        reader.add_reader(sub_reader)
+    parser = ContentIndexParser(reader, data_models, TagMatcher(tags))
+    return parser
+
+
 def convert_to_json(input_file, sheet_format):
     """
     Convert source spreadsheet(s) into json.
diff --git a/src/rpft/parsers/creation/contentindexparser.py b/src/rpft/parsers/creation/contentindexparser.py
index c91fc7b..145a945 100644
--- a/src/rpft/parsers/creation/contentindexparser.py
+++ b/src/rpft/parsers/creation/contentindexparser.py
@@ -34,6 +34,15 @@ def __init__(self, rows, row_model):
         self.rows = rows
         self.row_model = row_model

+    def to_dict(self):
+        rows = []
+        for content in self.rows.values():
+            rows.append(content.dict())
+        return {
+            "model": self.row_model.__name__,
+            "rows": rows,
+        }
+

 class ParserError(Exception):
     pass
@@ -336,6 +345,18 @@ def get_node_group(
             "or neither have to be provided."
         )

+    def data_sheets_to_dict(self):
+        sheets = {}
+        for sheet_name, sheet in self.data_sheets.items():
+            sheets[sheet_name] = sheet.to_dict()
+        return {
+            "sheets": sheets,
+            "meta": {
+                "user_models_module": self.user_models_module.__name__,
+                "version": "0.1.0",
+            },
+        }
+
     def parse_all(self):
         rapidpro_container = RapidProContainer()
         self.parse_all_flows(rapidpro_container)
diff --git a/tests/datarowmodels/nestedmodel.py b/tests/datarowmodels/nestedmodel.py
index 4e98168..1e3a17d 100644
--- a/tests/datarowmodels/nestedmodel.py
+++ b/tests/datarowmodels/nestedmodel.py
@@ -1,3 +1,5 @@
+from typing import List
+
 from rpft.parsers.creation.datarowmodel import DataRowModel
 from rpft.parsers.common.rowparser import ParserModel

@@ -14,3 +16,9 @@ class NestedRowModel(DataRowModel):
     # it inherits DataRowModel, which gives it the ID column.
     value1: str = ""
     custom_field: CustomModel = CustomModel()  # Default value is an empty custom model
+
+
+class ListRowModel(DataRowModel):
+    # Because this defines the content of a datasheet,
+    # it inherits DataRowModel, which gives it the ID column.
+ list_value: List[str] = [] diff --git a/tests/test_contentindexparser.py b/tests/test_contentindexparser.py index 3b46568..3489d2f 100644 --- a/tests/test_contentindexparser.py +++ b/tests/test_contentindexparser.py @@ -8,8 +8,6 @@ from tests.mocks import MockSheetReader from tests.utils import Context, traverse_flow -# flake8: noqa: E501 - def csv_join(*args): return "\n".join(args) + "\n" @@ -57,7 +55,10 @@ def check_basic_template_definition(self, ci_sheet): ",send_message,start,Some text", ) - sheet_reader = MockSheetReader(ci_sheet, {"my_template": my_template, "my_template2": my_template}) + sheet_reader = MockSheetReader( + ci_sheet, + {"my_template": my_template, "my_template2": my_template}, + ) ci_parser = ContentIndexParser(sheet_reader) template_sheet = ci_parser.get_template_sheet("my_template") self.assertEqual(template_sheet.table[0][1], "send_message") @@ -173,11 +174,14 @@ def test_ignore_templated_flow_definition(self): render_output = container.render() self.assertEqual(len(render_output["flows"]), 3) names = {flow["name"] for flow in render_output["flows"]} - self.assertEqual(names, { - "bulk_renamed - row1", - "bulk_renamed - row2", - "row2_renamed - row2", - }) + self.assertEqual( + names, + { + "bulk_renamed - row1", + "bulk_renamed - row2", + "row2_renamed - row2", + }, + ) def test_generate_flows(self): ci_sheet = ( @@ -639,33 +643,33 @@ def test_tags(self): class TestOperation(unittest.TestCase): def test_concat(self): # Concatenate two fresh sheets - ci_sheet = ( - "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation.type\n" - "data_sheet,simpleA;simpleB,,,simpledata,SimpleRowModel,concat\n" + ci_sheet = csv_join( + "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation.type", + "data_sheet,simpleA;simpleB,,,simpledata,SimpleRowModel,concat", ) self.check_concat(ci_sheet) def test_concat_implicit(self): # Concatenate two fresh sheets - ci_sheet = ( - "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation.type\n" - "data_sheet,simpleA;simpleB,,,simpledata,SimpleRowModel,\n" + ci_sheet = csv_join( + "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation.type", + "data_sheet,simpleA;simpleB,,,simpledata,SimpleRowModel," ) self.check_concat(ci_sheet) def test_concat2(self): # Concatenate a fresh sheet with an existing sheet - ci_sheet = ( - "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation.type\n" - "data_sheet,simpleA,,,renamedA,SimpleRowModel,\n" - "data_sheet,renamedA;simpleB,,,simpledata,SimpleRowModel,concat\n" + ci_sheet = csv_join( + "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation.type", + "data_sheet,simpleA,,,renamedA,SimpleRowModel,", + "data_sheet,renamedA;simpleB,,,simpledata,SimpleRowModel,concat", ) self.check_concat(ci_sheet) def test_concat3(self): # Concatenate two existing sheets - ci_sheet = ( - "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation.type\n" + ci_sheet = csv_join( + "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation.type", "data_sheet,simpleA,,,renamedA,SimpleRowModel,\n" "data_sheet,simpleB,,,renamedB,SimpleRowModel,\n" "data_sheet,renamedA;renamedB,,,simpledata,SimpleRowModel,concat\n" @@ -699,7 +703,7 @@ def test_filter_fresh(self): # The filter operation is referencing a sheet new (not previously parsed) sheet ci_sheet = ( "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation\n" - "data_sheet,simpleA,,,simpledata,SimpleRowModel,filter|expression;value2=='fruit'\n" + 
"data_sheet,simpleA,,,simpledata,SimpleRowModel,filter|expression;value2=='fruit'\n" # noqa: E501 ) self.check_example1(ci_sheet) @@ -708,7 +712,7 @@ def test_filter_existing(self): ci_sheet = ( "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation\n" "data_sheet,simpleA,,,,SimpleRowModel,\n" - "data_sheet,simpleA,,,simpledata,SimpleRowModel,filter|expression;value2=='fruit'\n" + "data_sheet,simpleA,,,simpledata,SimpleRowModel,filter|expression;value2=='fruit'\n" # noqa: E501 ) self.check_example1(ci_sheet, original="simpleA") @@ -716,7 +720,7 @@ def test_filter_existing_renamed(self): ci_sheet = ( "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation\n" "data_sheet,simpleA,,,renamedA,SimpleRowModel,\n" - "data_sheet,renamedA,,,simpledata,SimpleRowModel,filter|expression;value2=='fruit'\n" + "data_sheet,renamedA,,,simpledata,SimpleRowModel,filter|expression;value2=='fruit'\n" # noqa: E501 ) self.check_example1(ci_sheet, original="renamedA") @@ -758,7 +762,7 @@ def check_filtersort(self, ci_sheet, exp_keys, original=None): def test_filter_fresh2(self): ci_sheet = ( "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation\n" - "data_sheet,simpleA,,,simpledata,SimpleRowModel,\"filter|expression;value1 in ['orange','apple']\"\n" + "data_sheet,simpleA,,,simpledata,SimpleRowModel,\"filter|expression;value1 in ['orange','apple']\"\n" # noqa: E501 ) exp_keys = ["rowA", "rowC"] self.check_filtersort(ci_sheet, exp_keys) @@ -766,7 +770,7 @@ def test_filter_fresh2(self): def test_filter_fresh3(self): ci_sheet = ( "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation\n" - "data_sheet,simpleA,,,simpledata,SimpleRowModel,filter|expression;value1.lower() > 'd'\n" + "data_sheet,simpleA,,,simpledata,SimpleRowModel,filter|expression;value1.lower() > 'd'\n" # noqa: E501 ) exp_keys = ["rowA", "rowB", "rowD"] self.check_filtersort(ci_sheet, exp_keys) @@ -774,7 +778,7 @@ def test_filter_fresh3(self): def test_sort(self): ci_sheet = ( "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation\n" - "data_sheet,simpleA,,,simpledata,SimpleRowModel,sort|expression;value1.lower()\n" + "data_sheet,simpleA,,,simpledata,SimpleRowModel,sort|expression;value1.lower()\n" # noqa: E501 ) exp_keys = ["rowC", "rowD", "rowA", "rowB"] rows = self.check_filtersort(ci_sheet, exp_keys) @@ -788,7 +792,7 @@ def test_sort_existing(self): ci_sheet = ( "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation\n" "data_sheet,simpleA,,,,SimpleRowModel,\n" - "data_sheet,simpleA,,,simpledata,SimpleRowModel,sort|expression;value1.lower()\n" + "data_sheet,simpleA,,,simpledata,SimpleRowModel,sort|expression;value1.lower()\n" # noqa: E501 ) exp_keys = ["rowC", "rowD", "rowA", "rowB"] self.check_filtersort(ci_sheet, exp_keys, original="simpleA") @@ -796,7 +800,7 @@ def test_sort_existing(self): def test_sort_descending(self): ci_sheet = ( "type,sheet_name,data_sheet,data_row_id,new_name,data_model,operation\n" - "data_sheet,simpleA,,,simpledata,SimpleRowModel,sort|expression;value1.lower()|order;descending\n" + "data_sheet,simpleA,,,simpledata,SimpleRowModel,sort|expression;value1.lower()|order;descending\n" # noqa: E501 ) exp_keys = ["rowB", "rowA", "rowD", "rowC"] self.check_filtersort(ci_sheet, exp_keys) @@ -805,7 +809,7 @@ def test_sort_descending(self): class TestModelInference(TestTemplate): def setUp(self): self.ci_sheet = ( - "type,sheet_name,data_sheet,data_row_id,status\n" # noqa: E501 + "type,sheet_name,data_sheet,data_row_id,status\n" 
"template_definition,my_template,,,\n" "create_flow,my_template,mydata,,\n" "data_sheet,mydata,,,\n" @@ -816,7 +820,6 @@ def setUp(self): ",send_message,,{{custom_field.happy}} and {{custom_field.sad}}\n" ) - def check_example(self, sheet_dict): sheet_reader = MockSheetReader(self.ci_sheet, sheet_dict) ci_parser = ContentIndexParser(sheet_reader) @@ -1025,22 +1028,17 @@ def test_parse_triggers(self): self.assertEqual(groups2[0]["uuid"], mygroup_uuid) def test_parse_triggers_without_flow(self): - ci_sheet = ( - "type,sheet_name\n" - "create_triggers,my_triggers\n" - ) + ci_sheet = "type,sheet_name\n" "create_triggers,my_triggers\n" my_triggers = ( "type,keywords,flow,groups,exclude_groups,match_type\n" "K,the word,my_basic_flow,My Group,,\n" ) - sheet_reader = MockSheetReader( - ci_sheet, {"my_triggers": my_triggers} - ) + sheet_reader = MockSheetReader(ci_sheet, {"my_triggers": my_triggers}) ci_parser = ContentIndexParser(sheet_reader) container = ci_parser.parse_all() with self.assertRaises(RapidProTriggerError): - render_output = container.render() + container.render() def test_ignore_triggers(self): ci_sheet = ( @@ -1216,3 +1214,76 @@ def test_with_model(self): ) self.check(ci_parser, "template - a", ["hello georg"]) self.check(ci_parser, "template - b", ["hello chiara"]) + + +class TestSaveAsDict(unittest.TestCase): + def test_save_as_dict(self): + self.maxDiff = None + ci_sheet = ( + "type,sheet_name,data_sheet,data_row_id,new_name,data_model,status\n" + "data_sheet,simpledata,,,simpledata_renamed,ListRowModel,\n" + "create_flow,my_basic_flow,,,,,\n" + "data_sheet,nesteddata,,,,NestedRowModel,\n" + ) + simpledata = csv_join( + "ID,list_value.1,list_value.2", + "rowID,val1,val2", + ) + nesteddata = ( + "ID,value1,custom_field.happy,custom_field.sad\n" + "row1,Value1,Happy1,Sad1\n" + "row2,Value2,Happy2,Sad2\n" + ) + my_basic_flow = csv_join( + "row_id,type,from,message_text", + ",send_message,start,Some text", + ) + + sheet_dict = { + "simpledata": simpledata, + "my_basic_flow": my_basic_flow, + "nesteddata": nesteddata, + } + + sheet_reader = MockSheetReader(ci_sheet, sheet_dict) + ci_parser = ContentIndexParser(sheet_reader, "tests.datarowmodels.nestedmodel") + output = ci_parser.data_sheets_to_dict() + output["meta"].pop("version") + exp = { + "meta": { + "user_models_module": "tests.datarowmodels.nestedmodel", + }, + "sheets": { + "simpledata_renamed": { + "model": "ListRowModel", + "rows": [ + { + "ID": "rowID", + "list_value": ["val1", "val2"], + } + ], + }, + "nesteddata": { + "model": "NestedRowModel", + "rows": [ + { + "ID": "row1", + "value1": "Value1", + "custom_field": { + "happy": "Happy1", + "sad": "Sad1", + }, + }, + { + "ID": "row2", + "value1": "Value2", + "custom_field": { + "happy": "Happy2", + "sad": "Sad2", + }, + }, + ], + }, + }, + } + self.assertEqual(output, exp) diff --git a/tests/test_differentways.py b/tests/test_differentways.py index c3ba6bd..b65eccd 100644 --- a/tests/test_differentways.py +++ b/tests/test_differentways.py @@ -133,7 +133,3 @@ def test_different_ways(self): def test_single_kwarg(self): output_single_kwarg = self.parser.parse_row(input_single_kwarg) self.assertEqual(output_single_kwarg, output_single_kwarg_exp) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/test_full_rows.py b/tests/test_full_rows.py index 7413ace..13e64d9 100644 --- a/tests/test_full_rows.py +++ b/tests/test_full_rows.py @@ -225,7 +225,3 @@ def test_input_5(self): def test_input_6(self): output6 = self.parser.parse_row(input6) 
self.assertEqual(output6, output6_exp) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/test_model_inference.py b/tests/test_model_inference.py index ddb0aa6..9b36d2c 100644 --- a/tests/test_model_inference.py +++ b/tests/test_model_inference.py @@ -148,7 +148,3 @@ class MySubmodel(BaseModel): field1=(List[MySubmodel], [MySubmodel(), MySubmodel()]), ), ) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/test_rowdatasheet.py b/tests/test_rowdatasheet.py index f183dfd..d6a2459 100644 --- a/tests/test_rowdatasheet.py +++ b/tests/test_rowdatasheet.py @@ -117,7 +117,3 @@ def test_export_xlsx(self): RowDataSheet(self.rowparser, [rowA, rowC]).export( outfile, file_format="xlsx" ) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/test_rowparser.py b/tests/test_rowparser.py index d5af207..4a04965 100644 --- a/tests/test_rowparser.py +++ b/tests/test_rowparser.py @@ -1,7 +1,7 @@ import unittest from typing import List -from rpft.parsers.common.rowparser import ParserModel, RowParser, RowParserError +from rpft.parsers.common.rowparser import ParserModel, RowParser from tests.mocks import MockCellParser @@ -128,10 +128,6 @@ def test_convert_two_element(self): out = self.parser.parse_row(inp) self.assertEqual(out, self.onetwoModel) - # inp = {"list_field": "1"} - # with self.assertRaises(ValueError): - # out = self.parser.parse_row(inp) - class ListIntModel(ParserModel): list_field: List[int] = [] @@ -155,7 +151,3 @@ def setUp(self): self.emptyModel = ListModel(**{"list_field" : []}) self.oneModel = ListModel(**{"list_field" : ["1"]}) self.onetwoModel = ListModel(**{"list_field" : ["1", "2"]}) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/test_sheetparser.py b/tests/test_sheetparser.py index 08e6feb..39b54ff 100644 --- a/tests/test_sheetparser.py +++ b/tests/test_sheetparser.py @@ -53,7 +53,3 @@ def test_parse_all(self): self.assertEqual( rows[2], {"field1": "row3f1", "field2": "row3f2", "context": {}} ) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/test_unparse.py b/tests/test_unparse.py index 06a7bbe..ad1b605 100644 --- a/tests/test_unparse.py +++ b/tests/test_unparse.py @@ -387,7 +387,3 @@ def test_exclude(self): "model_with_stuff.str_field": "string", } self.assertEqual(output1, exp1) - - -if __name__ == "__main__": - unittest.main()