Make descriptions unique #2020

ErikSchierboom · 2022-04-21T13:00:52Z

Recently, there was some discussion on whether test case descriptions should be unique: #2019 (comment)
IMO there is an unwritten rule that each description should be unique, but it is not documented anywhere I think.
To me, it seems like having unique descriptions is useful for several reasons:

It makes for easier identification of test cases (descriptions are easier to scan than UUIDs)
It is often a prerequisite for test generators that a (full) description is unique

To be clear, when I'm referring to a description being unique, what I mean is that the concatenated description of a test case must be unique. In other words, the uniqueness also takes into account any description of parent test case goupings.

@wolf99 Would something like this be possible to encode in the JSON schema we have?

SaschaMann · 2022-04-21T13:18:40Z

This clashes with the actually documented purpose of descriptions:

Descriptions should not simply explain what each case is (that is redundant information) but also why each case is there. For example, what kinds of implementation mistakes might this case help us find?

There are plenty of scenarios where more than one case has the same reason for existing.

It is often a prerequisite for test generators that a (full) description is unique

I don't think this should be considered. They can break test generators in many ways, e.g. if the test generator can't handle certain characters, numbers etc. and they're also mutable. Any kind of restriction for descriptions based on the needs of generators comes at the cost of the main audience of the canonical data: humans reading and interpreting it.

If uniqueness is required, the UUID or some kind of hash of the test case can trivially be used already by the generators to achieve it.

jiegillet · 2022-04-21T15:04:00Z

I don't think this should be considered. They can break test generators in many ways, e.g. if the test generator can't handle certain characters, numbers etc. and they're also mutable. Any kind of restriction for descriptions based on the needs of generators comes at the cost of the main audience of the canonical data: humans reading and interpreting it.

I think this should be considered. Most generators (or even humans copy-pasting like I did a bunch of times) will use the description as a test name, it's a natural fit, so why wouldn't we make the job easier by having a few person doing the thinking and interpreting here rather than forcing 50 people to do it down the line?

If the documented purpose of description clashes with what people use them for, I think the documentation should adapt.

petertseng · 2022-04-21T17:07:01Z

I can inform you that according to the below, all instances of duplicate descriptions fall under one of the three categories:

although the description by itself repeats another, the cases in question are within different groups, and those groups have different descriptions (-s flag)
although the description repeats another, all but one of the cases in question are reimplemented (-r flag).
state-of-tic-tac-toe

require 'json'

def cases(h)
  # top-level won't have a description, but all lower levels will.
  # therefore we have to unconditionally flat_map cases on the top level.
  # on lower levels we either flat_map cases or just return individuals.
  cases = ->(hh, path = [].freeze) {
    hh['cases']&.flat_map { |c|
      cases[c, (path + [hh['description'].freeze]).freeze]
    } || [hh.merge(path: path)]
  }
  h['cases'].flat_map(&cases)
end

def mark_reimplemented(cases)
  reimpls = cases.map { |c| c['reimplements'] }
  cases.each { |c| c[:reimplemented] = reimpls.include?(c['uuid']) }
end

reimpl = ARGV.delete('-r')
short = ARGV.delete('-s')

maxes = []
freq = Hash.new(0)

Dir.glob("#{__dir__}/exercises/*/canonical-data.json") { |f|
  exercise = File.basename(File.dirname(f))
  descs = mark_reimplemented(cases(JSON.parse(File.read(f)))).filter_map { |c|
    next if c.fetch(:reimplemented) && !reimpl
    arbitrary = ?/
    ((short ? [] : c[:path]) + [c['description']]).join(arbitrary)
  }
  descs.tally.each { |k, v| puts "#{f} #{k} #{v}" if v > 1 }
}

I have absolutely no stake in whether duplicate descriptions should be allowed and my sincere opinion is that I'll be able to deal with it either way, so I won't be weighing in on that; that will have to be left to the interested parties who do have a stake in it.

wolf99 · 2022-04-22T17:51:05Z

Hi @ErikSchierboom
TBH I'm still figuring out exactly how far I can push the "implies" features of the JSON schema spec with regard to the uniqueness requirements.
Your recent comment on the availability (or lack thereof) of JSON schema handling in Nim has given me something to investigate a bit before continuing much more with the schema also. While there is still some benefit to having a schema, it is quite a reduced benefit if the tool cannot use it.

ErikSchierboom · 2022-04-26T06:31:24Z

Okay, thanks!

wolf99 mentioned this issue Apr 22, 2022

Should we use $Schema in config.json? #1972

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make descriptions unique #2020

Make descriptions unique #2020

ErikSchierboom commented Apr 21, 2022

SaschaMann commented Apr 21, 2022 •

edited

Loading

jiegillet commented Apr 21, 2022 •

edited

Loading

petertseng commented Apr 21, 2022

wolf99 commented Apr 22, 2022

ErikSchierboom commented Apr 26, 2022

Make descriptions unique #2020

Make descriptions unique #2020

Comments

ErikSchierboom commented Apr 21, 2022

SaschaMann commented Apr 21, 2022 • edited Loading

jiegillet commented Apr 21, 2022 • edited Loading

petertseng commented Apr 21, 2022

wolf99 commented Apr 22, 2022

ErikSchierboom commented Apr 26, 2022

SaschaMann commented Apr 21, 2022 •

edited

Loading

jiegillet commented Apr 21, 2022 •

edited

Loading