Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align YAML tests and JSON API situation description #716

Closed
fpagnoux opened this issue Aug 23, 2018 · 28 comments
Closed

Align YAML tests and JSON API situation description #716

fpagnoux opened this issue Aug 23, 2018 · 28 comments
Assignees
Labels
kind:discovery Issue requires discovery: value, ux and tech

Comments

@fpagnoux
Copy link
Member

fpagnoux commented Aug 23, 2018

Right now, the description of a situation in a YAML test and in the Web API have a slightly different structure:

  • In the JSON API, they follow this structure
  • In the YAML tests, the structure in inherited from the former Web API, which is slightly different.

This is often confusing for users

The test runner also currently depends on Scenarios to parse these YAML files, while we've been trying to uncouple scenarios from the core of OpenFisca.

We should align the test structure to the API one, and process them the same way.

@fpagnoux
Copy link
Member Author

Solving this issue means defining a new format for YAML tests. Here are some thoughts:

  • As mentioned in the opening post, we want to avoid differences between the YAML test structure and the JSON request one.
    • to avoid confusion
    • so that an user can easily just copy-paste a simulation they sent to the API in a YAML test
  • However, strictly limiting YAML tests syntax to the current Web API syntax is really heavy for test writers:
    • Unlike API requests, which are coded "once and for all" and then automatically generated, tests are written manually. Therefore, every piece of boilerplate can become a hassle, and discourage testing. We certainly don't want that!

Trying to balance these two principles, here is a proposition of specification for a new test format:

@fpagnoux
Copy link
Member Author

fpagnoux commented Oct 23, 2018

RFC 1 - General structure of a YAML test

- name: "Income tax - basic case"
  description: ...
  ...  # other test-level data, such as error margin, reforms to use, etc.
  input:
    ... # Simulation input
  output:
    ...  # Expected output of the simulation

The main change relatively to today's format is to:

  • clearly separate input description from test-level data

@fpagnoux
Copy link
Member Author

fpagnoux commented Oct 23, 2018

RFC 2 - Entity-less mode

The following format is valid:

- name: "Income tax - basic case"
  input:
    salary:
      2018-10: 2000
  output:
    income_tax:
      2018-10: 300

In this case, we make the assumption that:

  • There is only one person in the simulation
  • There is only one instance of each group entity in the simulation (family, household, etc.)
  • This person has, in each group entity, the first role (e.g. first_parent, principal_caregiver)

@fpagnoux
Copy link
Member Author

RFC 3 - Simplified period mode

The following format is valid:

- name: "Income tax - basic case"
  period: 2018-10
  input:
    salary: 2000
  output:
    income_tax: 300

In this case, we make the assumption that:

  • All values, input or output, for which no period is explicitly defined, is considered to apply for the global test period.

@fpagnoux
Copy link
Member Author

fpagnoux commented Oct 23, 2018

RFC 3-alt - Simplified period mode (alternative)

The following format is valid:

- name: "Income tax - basic case"
  input:
    period: 2018-10
    salary: 2000
  output:
    income_tax: 300

In this case, we make the same assumption than in 3, but the period is specified in a different place.

  • Advantage: The "default period" semantically seem to be part of the input
  • Disadvantage: Using a default period defined in the input for the output sounds a little funny. We could have a period in each of the two blocks, but that's a little heavy.

@fpagnoux
Copy link
Member Author

RFC 4 - Full entities specification

The following format is valid:

- name: "Income tax - couple"
  period: 2017-01
  input:
    persons:
      Alicia:
        salary: 4000
      Javier:
        salary: 2500
    households:
      household_1:
        parents: [Alicia, Javier]
  output:
    persons:
      Alicia:
        income_tax: 600
      Javier:
        income_tax: 375

This is almost a Web API-like JSON, except it's using RFC 3 to simplify periods. Of course, fully specifying periods for all input and output would work too.

Note the repetition of households and household_1 for single-household cases, which may be confusing and heavy.

@fpagnoux
Copy link
Member Author

RFC 5 - Full entities specification input, entity-less output

The following format is valid:

- name: "Total taxes - couple"
  period: 2017-01
  input:
    persons:
      Alicia:
        salary: 4000
      Javier:
        salary: 2500
    households:
      household_1:
        parents: [Alicia, Javier]
  output:
    total_taxes: 1050

When there is no ambiguity, the entity structure doesn't need to be repeated in the output.

  • Note that this works because total_taxes applies to household, and there is only one household in the simulation. We could not use it for income_tax, which is a person variable, with 2 persons in the simulation.

@fpagnoux
Copy link
Member Author

RFC 6 - Single-entity simplified mode

The following format is valid:

- name: "Total taxes - couple"
  period: 2017-01
  input:
    persons:
      Alicia:
        salary: 4000
      Javier:
        salary: 2500
    household:
      parents: [Alicia, Javier]
  output:
    total_taxes: 1050

Using the singular name (key field in the entities.py file) for the entity (here household), instead of the plural (households), implies that there is only one entity of this type. This avoid the households/household_1 repetition.

@fpagnoux
Copy link
Member Author

RFC 7 - Vectorial input

The following format is valid:

- name: "Income tax - different salaries"
  period: 2018-10
  input:
    salary: [2000, 3000, 5000]
  output:
    income_tax: [300, 450, 750]

In this case, if n is the length of the inputs/outputs, we make the assumption that:

  • There are n persons in the simulation
  • There is n instances of each group entity in the simulation (family, household, etc.). Each of these entities contain only one person.
  • Each person has, in each group entity, the first role (e.g. first_parent, principal_caregiver)

@fpagnoux
Copy link
Member Author

fpagnoux commented Oct 23, 2018

RFC 8 - Partial entity specification

In a tax and benefit system that includes two group entities, titled_property and family, specifying only one entity is valid:

- name: "Housing tax - two owners"
  period: 2017-01
  input:
    persons:
      Alicia:
        salary: 4000
      Javier:
        salary: 2500
    titled_property:
      owners: [Alicia, Javier]
      accommodation_size: 100
  output:
    housing_tax: 1050

In this case, for consistency with RFC 7, and because we can't make general assumptions about relationships between entities, we make the assumption that:

  • Each person belongs to its own family
  • Each person has, in each family, the first role

Note: France developed a way to deal with partial entity specification in a country-specific way. While this feature is interesting, it suffered from significant drawbacks: opacity, complexity, and dependency to tools (biryani, scenarios) that we would like to avoid in Core. As of October 22nd 2018, this feature was used in YAML tests, but not in the Web API. For now, we are not considering importing this feature in OpenFisca-Core in a foreseeable future.

@fpagnoux
Copy link
Member Author

@jvalduvieco, @dlouise64, @Br3nda, @verbman, @guillett, @Morendil, you are the one writing a lot of tests. Please let us know if:

  • these RFCs seem reasonable to you
  • some of your use-cases are not covered, and you would like to be able to write YAML tests in a different way.

We intend to move on relatively soon to start implementing, but if you think you have an opinion about this topic and need more time to think about it, let us know.

Note that whatever choices we make here, migrations scripts will be provided to OpenFisca users.

@jvalduvieco, you wanted to be able to use anchors and references in YAML tests, right? We could try to enable this in the PR to come.

@guillett
Copy link
Member

Thanks @fpagnoux for make easier the OpenFisca onboarding.

A few comments:

@sandcha
Copy link
Collaborator

sandcha commented Oct 29, 2018

Thanks @fpagnoux!

For openfisca-tunisia, I would be happy to keep the entity less mode (RFC2) and to be able to use a specific extension (in the description, like reforms).

I'm only confused about period definition: isn't it awkward to give one period only to a test calculating, for example, 2017 taxes on 2016 incomes?

@Morendil
Copy link
Contributor

@fpagnoux I owe you comments on this, but other things took priority, I'm not forgetting it.

@jvalduvieco
Copy link
Contributor

Thanks @fpagnoux !
My comments:

  • Anchors & references - My need is to avoid repeating most of the test case to just change one of the parameters. YAML Anchors and references seem a good way to compose use cases, extends also seems interesting.
  • RFC 2, RFC 6, RFC 7, RFC 8 - Feel like those are trying to go down the easy path hiding and making assumptions.. For me this is dangerous as the developer needs to know which assumptions are being made or thing on not specified entities, etc... I prefer making things explicit and simple. This combined with YAML tools like references and extends might be enough to have succinct tests.
  • All group entities defined in all use cases - I understand this is a requirement due internal core implementation but ties all the tests to all group entities defined in the system. This makes tests fragile and is one of my pain points (for example in our system we have 2nd_grade_families, parents_and_children_families, persons_living_together group entities. persons_living_together only applies to a certain set of formulas but I need to define this for all formulas).
  • All persons in all entities - This forces me to create other* roles to include non relevant persons for a group entity (for example I need to add persons who are not part of parents_and_children_families with a role like other_persons to satisfy this requirement).

I guess some of this pains might be avoided with a different design, opinions and guidance are welcome.

@Morendil
Copy link
Contributor

It's not necessarily clear to everyone (it's not to me) what conclusions we've come to after this RFC, and from looking at WIP some of these options are being implemented which will make them de facto decisions, so I'd suggest a recap to make it clear where we are heading.

Is there really a lot of value from changing {input,output}_variables to the shorter {input,output}?

What I'm seeing by looking at WIP is that this is, so far, the only change to some test files. But if many existing test files are already in conformity with the new format, and only some of the more complex ones need to be changed, I would strongly favor backward-compatible naming of the keys. This way large models like France are spared a large migration to the new format.

@fpagnoux
Copy link
Member Author

This way large models like France are spared a large migration to the new format.

I'm don't think we can spare the large migration to France, and this renaming is not the hard part 😓 .

Is there really a lot of value from changing {input,output}_variables to the shorter {input,output}?

Not a huge one, but it makes it really easy to check if a test has already been migrated, which has some advantages for the migration script.

I'd suggest a recap to make it clear where we are heading

  • RFC 8 and 3alt are dropped
  • The others will be implemented, as they are massively used it current tests in France, and dropping them would make the migration much more complicated

@Br3nda
Copy link
Contributor

Br3nda commented Nov 28, 2018

Just fyi, we're hoping to do a visualisation of the tests, from the yaml, to help explain the concept to non coders. We'll delay until the new format is somewhat stabilised into core.

@Br3nda
Copy link
Contributor

Br3nda commented Nov 28, 2018

Catching up on this thread, we use entities in many tests in Aotearoa (New Zealand's) openfisca rules. We have entitlements that an adult gets if they have, for example, a disabled child in their care. We also have some entitlements to a rebate for a property (a home) based on the occupants. Many other examples. Much of the law we've encoded so far is all about the relationships between entities.

@Br3nda
Copy link
Contributor

Br3nda commented Nov 28, 2018

Gah hit close button by mistake

@fpagnoux
Copy link
Member Author

fpagnoux commented Dec 5, 2018

Done 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:discovery Issue requires discovery: value, ux and tech
Projects
None yet
Development

No branches or pull requests

8 participants