New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Align YAML tests and JSON API situation description #716

Closed

fpagnoux opened this issue Aug 23, 2018 · 28 comments

Assignees

Labels

Member

fpagnoux commented Aug 23, 2018 •

edited

Loading

Right now, the description of a situation in a YAML test and in the Web API have a slightly different structure:

In the JSON API, they follow this structure
In the YAML tests, the structure in inherited from the former Web API, which is slightly different.

This is often confusing for users

The test runner also currently depends on Scenarios to parse these YAML files, while we've been trying to uncouple scenarios from the core of OpenFisca.

We should align the test structure to the API one, and process them the same way.

fpagnoux added the flow:team:backlog label

fpagnoux added flow:team:next and removed flow:team:backlog labels

bonjourmauko added flow:team:backlog and removed flow:team:next labels

sandcha added flow:team:next and removed flow:team:backlog labels

bonjourmauko added the kind:feat label

fpagnoux added flow:team:doing and removed flow:team:next labels

fpagnoux self-assigned this

fpagnoux mentioned this issue

Corrige le calcul de l'ASS openfisca/openfisca-france#1172

Merged

7 tasks

Member Author

fpagnoux commented Oct 23, 2018

Solving this issue means defining a new format for YAML tests. Here are some thoughts:

As mentioned in the opening post, we want to avoid differences between the YAML test structure and the JSON request one.
- to avoid confusion
- so that an user can easily just copy-paste a simulation they sent to the API in a YAML test
However, strictly limiting YAML tests syntax to the current Web API syntax is really heavy for test writers:
- Unlike API requests, which are coded "once and for all" and then automatically generated, tests are written manually. Therefore, every piece of boilerplate can become a hassle, and discourage testing. We certainly don't want that!

Trying to balance these two principles, here is a proposition of specification for a new test format:

Member Author

fpagnoux commented Oct 23, 2018 •

edited

Loading

RFC 1 - General structure of a YAML test

- name: "Income tax - basic case"
  description: ...
  ...  # other test-level data, such as error margin, reforms to use, etc.
  input:
    ... # Simulation input
  output:
    ...  # Expected output of the simulation

The main change relatively to today's format is to:

clearly separate input description from test-level data

Member Author

fpagnoux commented Oct 23, 2018 •

edited

Loading

RFC 2 - Entity-less mode

The following format is valid:

- name: "Income tax - basic case"
  input:
    salary:
      2018-10: 2000
  output:
    income_tax:
      2018-10: 300

In this case, we make the assumption that:

There is only one person in the simulation
There is only one instance of each group entity in the simulation (family, household, etc.)
This person has, in each group entity, the first role (e.g. first_parent, principal_caregiver)

Member Author

fpagnoux commented Oct 23, 2018

RFC 3 - Simplified period mode

The following format is valid:

- name: "Income tax - basic case"
  period: 2018-10
  input:
    salary: 2000
  output:
    income_tax: 300

In this case, we make the assumption that:

All values, input or output, for which no period is explicitly defined, is considered to apply for the global test period.

Member Author

fpagnoux commented Oct 23, 2018 •

edited

Loading

RFC 3-alt - Simplified period mode (alternative)

The following format is valid:

- name: "Income tax - basic case"
  input:
    period: 2018-10
    salary: 2000
  output:
    income_tax: 300

In this case, we make the same assumption than in 3, but the period is specified in a different place.

Advantage: The "default period" semantically seem to be part of the input
Disadvantage: Using a default period defined in the input for the output sounds a little funny. We could have a period in each of the two blocks, but that's a little heavy.

Member Author

fpagnoux commented Oct 23, 2018

RFC 4 - Full entities specification

The following format is valid:

- name: "Income tax - couple"
  period: 2017-01
  input:
    persons:
      Alicia:
        salary: 4000
      Javier:
        salary: 2500
    households:
      household_1:
        parents: [Alicia, Javier]
  output:
    persons:
      Alicia:
        income_tax: 600
      Javier:
        income_tax: 375

This is almost a Web API-like JSON, except it's using RFC 3 to simplify periods. Of course, fully specifying periods for all input and output would work too.

Note the repetition of households and household_1 for single-household cases, which may be confusing and heavy.

Member Author

fpagnoux commented Oct 23, 2018

RFC 5 - Full entities specification input, entity-less output

The following format is valid:

- name: "Total taxes - couple"
  period: 2017-01
  input:
    persons:
      Alicia:
        salary: 4000
      Javier:
        salary: 2500
    households:
      household_1:
        parents: [Alicia, Javier]
  output:
    total_taxes: 1050

When there is no ambiguity, the entity structure doesn't need to be repeated in the output.

Note that this works because total_taxes applies to household, and there is only one household in the simulation. We could not use it for income_tax, which is a person variable, with 2 persons in the simulation.

Member Author

fpagnoux commented Oct 23, 2018

RFC 6 - Single-entity simplified mode

The following format is valid:

- name: "Total taxes - couple"
  period: 2017-01
  input:
    persons:
      Alicia:
        salary: 4000
      Javier:
        salary: 2500
    household:
      parents: [Alicia, Javier]
  output:
    total_taxes: 1050

Using the singular name (key field in the entities.py file) for the entity (here household), instead of the plural (households), implies that there is only one entity of this type. This avoid the households/household_1 repetition.

Member Author

fpagnoux commented Oct 23, 2018

RFC 7 - Vectorial input

The following format is valid:

- name: "Income tax - different salaries"
  period: 2018-10
  input:
    salary: [2000, 3000, 5000]
  output:
    income_tax: [300, 450, 750]

In this case, if n is the length of the inputs/outputs, we make the assumption that:

There are n persons in the simulation
There is n instances of each group entity in the simulation (family, household, etc.). Each of these entities contain only one person.
Each person has, in each group entity, the first role (e.g. first_parent, principal_caregiver)

Member Author

fpagnoux commented Oct 23, 2018 •

edited

Loading

RFC 8 - Partial entity specification

In a tax and benefit system that includes two group entities, titled_property and family, specifying only one entity is valid:

- name: "Housing tax - two owners"
  period: 2017-01
  input:
    persons:
      Alicia:
        salary: 4000
      Javier:
        salary: 2500
    titled_property:
      owners: [Alicia, Javier]
      accommodation_size: 100
  output:
    housing_tax: 1050

In this case, for consistency with RFC 7, and because we can't make general assumptions about relationships between entities, we make the assumption that:

Each person belongs to its own family
Each person has, in each family, the first role

Note: France developed a way to deal with partial entity specification in a country-specific way. While this feature is interesting, it suffered from significant drawbacks: opacity, complexity, and dependency to tools (biryani, scenarios) that we would like to avoid in Core. As of October 22nd 2018, this feature was used in YAML tests, but not in the Web API. For now, we are not considering importing this feature in OpenFisca-Core in a foreseeable future.

Member Author

fpagnoux commented Oct 23, 2018

@jvalduvieco, @dlouise64, @Br3nda, @verbman, @guillett, @Morendil, you are the one writing a lot of tests. Please let us know if:

these RFCs seem reasonable to you
some of your use-cases are not covered, and you would like to be able to write YAML tests in a different way.

We intend to move on relatively soon to start implementing, but if you think you have an opinion about this topic and need more time to think about it, let us know.

Note that whatever choices we make here, migrations scripts will be provided to OpenFisca users.

@jvalduvieco, you wanted to be able to use anchors and references in YAML tests, right? We could try to enable this in the PR to come.

fpagnoux mentioned this issue

Utilise PyTest openfisca/openfisca-france#1180

Merged

7 tasks

Member

guillett commented Oct 29, 2018

Thanks @fpagnoux for make easier the OpenFisca onboarding.

A few comments:

I am ok with any format as long as there is a script to migrate;
In case of failing test, please make sure that the entity order is easily accessible (cf. entitiesDescription in /trace)
I work with extensions so I would be happy if I could create tests for a specific extension easily.

Collaborator

sandcha commented Oct 29, 2018 •

edited

Loading

Thanks @fpagnoux!

For openfisca-tunisia, I would be happy to keep the entity less mode (RFC2) and to be able to use a specific extension (in the description, like reforms).

I'm only confused about period definition: isn't it awkward to give one period only to a test calculating, for example, 2017 taxes on 2016 incomes?

Contributor

Morendil commented Oct 29, 2018

@fpagnoux I owe you comments on this, but other things took priority, I'm not forgetting it.

Contributor

jvalduvieco commented Oct 29, 2018

Thanks @fpagnoux !
My comments:

Anchors & references - My need is to avoid repeating most of the test case to just change one of the parameters. YAML Anchors and references seem a good way to compose use cases, extends also seems interesting.
RFC 2, RFC 6, RFC 7, RFC 8 - Feel like those are trying to go down the easy path hiding and making assumptions.. For me this is dangerous as the developer needs to know which assumptions are being made or thing on not specified entities, etc... I prefer making things explicit and simple. This combined with YAML tools like references and extends might be enough to have succinct tests.
All group entities defined in all use cases - I understand this is a requirement due internal core implementation but ties all the tests to all group entities defined in the system. This makes tests fragile and is one of my pain points (for example in our system we have 2nd_grade_families, parents_and_children_families, persons_living_together group entities. persons_living_together only applies to a certain set of formulas but I need to define this for all formulas).
All persons in all entities - This forces me to create other* roles to include non relevant persons for a group entity (for example I need to add persons who are not part of parents_and_children_families with a role like other_persons to satisfy this requirement).

I guess some of this pains might be avoided with a different design, opinions and guidance are welcome.

fpagnoux added flow:team:themes and removed flow:team:doing labels

Morendil added the flow:team:doing label

bonjourmauko added flow:team:themes and removed flow:team:doing labels

Morendil mentioned this issue

Allow non-integral inputs to int variables #780

Merged

Contributor

Morendil commented Nov 26, 2018

It's not necessarily clear to everyone (it's not to me) what conclusions we've come to after this RFC, and from looking at WIP some of these options are being implemented which will make them de facto decisions, so I'd suggest a recap to make it clear where we are heading.

Is there really a lot of value from changing {input,output}_variables to the shorter {input,output}?

What I'm seeing by looking at WIP is that this is, so far, the only change to some test files. But if many existing test files are already in conformity with the new format, and only some of the more complex ones need to be changed, I would strongly favor backward-compatible naming of the keys. This way large models like France are spared a large migration to the new format.

pblayo mentioned this issue

Uniformize tests with the new API #781

Merged

This was referenced Nov 28, 2018

Uniformize tests with the new API openfisca/country-template#66

Merged

Uniformize tests with the new API openfisca/extension-template#18

Merged

Document uniformisation of tests (new API) openfisca/openfisca-doc#172

Merged

Uniformise les tests avec la nouvelle API openfisca/openfisca-france#1224

Merged

Member Author

fpagnoux commented Nov 28, 2018

This way large models like France are spared a large migration to the new format.

I'm don't think we can spare the large migration to France, and this renaming is not the hard part 😓 .

Is there really a lot of value from changing {input,output}_variables to the shorter {input,output}?

Not a huge one, but it makes it really easy to check if a test has already been migrated, which has some advantages for the migration script.

I'd suggest a recap to make it clear where we are heading

RFC 8 and 3alt are dropped
The others will be implemented, as they are massively used it current tests in France, and dropping them would make the migration much more complicated

Contributor

Br3nda commented Nov 28, 2018

Just fyi, we're hoping to do a visualisation of the tests, from the yaml, to help explain the concept to non coders. We'll delay until the new format is somewhat stabilised into core.

fpagnoux added flow:team:reviewing and removed flow:team:themes labels

Contributor

Br3nda commented Nov 28, 2018

Catching up on this thread, we use entities in many tests in Aotearoa (New Zealand's) openfisca rules. We have entitlements that an adult gets if they have, for example, a disabled child in their care. We also have some entitlements to a rebate for a property (a home) based on the occupants. Many other examples. Much of the law we've encoded so far is all about the relationships between entities.

Br3nda closed this as completed

fpagnoux removed the flow:team:reviewing label

Br3nda reopened this

Contributor

Br3nda commented Nov 28, 2018

Gah hit close button by mistake

Morendil added flow:team:reviewing and removed flow:team:reviewing labels

bonjourmauko mentioned this issue

Refactor code using Biryani #691

Closed

Morendil mentioned this issue

Improve design and test coverage of complex simulation construction #790

Merged

sandcha added flow:team:reviewing and removed flow:team:doing labels

fpagnoux closed this as completed

fpagnoux removed the flow:team:reviewing label

Member Author

fpagnoux commented Dec 5, 2018

Done 🎉

guillett mentioned this issue

Remove ambiguity from YAML test files #808

Closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment