Align Utils::suggestionList() with the reference implementation #1075

Merged (13 commits) on Mar 14, 2022
CHANGELOG.md: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ You can find and compare releases at the [GitHub release page](https://github.co
 - Throw if `Introspection::fromSchema()` returns no data
 - Reorganize abstract class `ASTValidationContext` to interface `ValidationContext`
 - Reorganize AST interfaces related to schema and type extensions
+- Align `Utils::suggestionList()` with the reference implementation (#1075)
 
 ### Added
 
src/Utils/LexicalDistance.php: 133 additions & 0 deletions (new file)
@@ -0,0 +1,133 @@
<?php declare(strict_types=1);

namespace GraphQL\Utils;

/**
 * Computes the lexical distance between strings A and B.
 *
 * The "distance" between two strings is given by counting the minimum number
 * of edits needed to transform string A into string B. An edit can be an
 * insertion, deletion, or substitution of a single character, or a swap of two
 * adjacent characters.
 *
 * Includes a custom alteration from Damerau-Levenshtein to treat case changes
 * as a single edit, which helps identify mis-cased values with an edit distance
 * of 1.
 *
 * This distance can be useful for detecting typos in input or for sorting
 * suggestions by similarity.
 */
class LexicalDistance
{
    private string $input;

    private string $inputLowerCase;

    /**
     * @var array<int>
     */
    private array $inputArray;

    /**
     * @var array<array<int>>
     */
    private array $rows;

    public function __construct(string $input)
    {
        $this->input = $input;
        $this->inputLowerCase = \strtolower($input);
        $this->inputArray = self::stringToArray($this->inputLowerCase);

        $length = \mb_strlen($input);
        $this->rows = [
            \array_fill(0, $length, 0),
            \array_fill(0, $length, 0),
            \array_fill(0, $length, 0),
        ];
    }

    public function measure(string $option, float $threshold): ?int
    {
        if ($this->input === $option) {
            return 0;
        }

        $optionLowerCase = \strtolower($option);

        // Strings that differ only in letter case count as a single edit
        if ($this->inputLowerCase === $optionLowerCase) {
            return 1;
        }

        $a = self::stringToArray($optionLowerCase);
        $b = $this->inputArray;

        if (\count($a) < \count($b)) {
            $tmp = $a;
            $a = $b;
            $b = $tmp;
        }

        $aLength = \count($a);
        $bLength = \count($b);

        if ($aLength - $bLength > $threshold) {
            return null;
        }

        $rows = &$this->rows;
        for ($j = 0; $j <= $bLength; ++$j) {
            $rows[0][$j] = $j;
        }

        for ($i = 1; $i <= $aLength; ++$i) {
            $upRow = &$rows[($i - 1) % 3];
            $currentRow = &$rows[$i % 3];

            $smallestCell = ($currentRow[0] = $i);
            for ($j = 1; $j <= $bLength; ++$j) {
                $cost = $a[$i - 1] === $b[$j - 1] ? 0 : 1;

                $currentCell = \min(
                    $upRow[$j] + 1, // delete
                    $currentRow[$j - 1] + 1, // insert
                    $upRow[$j - 1] + $cost, // substitute
                );

                if ($i > 1 && $j > 1 && $a[$i - 1] === $b[$j - 2] && $a[$i - 2] === $b[$j - 1]) {
                    // transposition
                    $doubleDiagonalCell = $rows[($i - 2) % 3][$j - 2];
                    $currentCell = \min($currentCell, $doubleDiagonalCell + 1);
                }

                if ($currentCell < $smallestCell) {
                    $smallestCell = $currentCell;
                }

                $currentRow[$j] = $currentCell;
            }

            // Early exit: no later row can contain a value smaller than the
            // smallest cell of this row, so the final distance can't be smaller either.
            if ($smallestCell > $threshold) {
                return null;
            }
        }

        $distance = $rows[$aLength % 3][$bLength];

        return $distance <= $threshold ? $distance : null;
    }

    /**
     * @return array<int>
     */
    private static function stringToArray(string $str): array
    {
        $array = [];
        foreach (\mb_str_split($str) as $char) {
            $array[] = \mb_ord($char);
        }

        return $array;
    }
}
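To check the distances expected by the tests by hand, here is a minimal standalone sketch of the same recurrence. `osaDistance` is a hypothetical helper, not part of graphql-php: it fills a full matrix instead of the three rolling rows used by `LexicalDistance`, has no threshold or early exits (so it always returns the exact distance instead of `null`), and splits characters with `preg_split('//u', ...)` instead of `mb_str_split()`/`mb_ord()` so it does not need the mbstring extension. The transposition step makes this the restricted Damerau-Levenshtein distance, also called optimal string alignment.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper (not part of graphql-php): optimal string alignment
 * distance with the extra rule that a case-only difference is one edit.
 */
function osaDistance(string $input, string $option): int
{
    if ($input === $option) {
        return 0;
    }

    $inputLower = strtolower($input);
    $optionLower = strtolower($option);

    // Strings that differ only in letter case count as a single edit
    if ($inputLower === $optionLower) {
        return 1;
    }

    // Split into UTF-8 characters so a multibyte character counts as one
    $a = preg_split('//u', $inputLower, -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('//u', $optionLower, -1, PREG_SPLIT_NO_EMPTY);
    $m = count($a);
    $n = count($b);

    // $d[$i][$j] = distance between the first $i chars of $a and the first $j of $b
    $d = [];
    for ($i = 0; $i <= $m; ++$i) {
        $d[$i][0] = $i;
    }
    for ($j = 0; $j <= $n; ++$j) {
        $d[0][$j] = $j;
    }

    for ($i = 1; $i <= $m; ++$i) {
        for ($j = 1; $j <= $n; ++$j) {
            $cost = $a[$i - 1] === $b[$j - 1] ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i - 1][$j] + 1,        // deletion
                $d[$i][$j - 1] + 1,        // insertion
                $d[$i - 1][$j - 1] + $cost // substitution
            );

            // Swap of two adjacent characters
            if ($i > 1 && $j > 1 && $a[$i - 1] === $b[$j - 2] && $a[$i - 2] === $b[$j - 1]) {
                $d[$i][$j] = min($d[$i][$j], $d[$i - 2][$j - 2] + 1);
            }
        }
    }

    return $d[$m][$n];
}
```

With the threshold used by `suggestionList()` (`mb_strlen($input) * 0.4 + 1`, i.e. 2.6 for a 4-character input), `osaDistance('aaaa', 'aabb')` is 2 and passes, while `osaDistance('aaaa', 'abbb')` is 3 and is rejected, matching the "exceeds threshold" test cases in this PR.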
src/Utils/Utils.php: 8 additions & 16 deletions
@@ -7,7 +7,6 @@
 use function array_reduce;
 use function array_shift;
 use function array_slice;
-use function asort;
 use function count;
 use function dechex;
 use function func_get_args;
@@ -24,7 +23,6 @@
 use function is_scalar;
 use function is_string;
 use function json_encode;
-use function levenshtein;
 use function mb_convert_encoding;
 use function mb_strlen;
 use function mb_substr;
@@ -36,7 +34,6 @@
 use function range;
 use function sprintf;
 use stdClass;
-use function strtolower;
 use function unpack;
 
 class Utils
@@ -348,33 +345,28 @@ static function ($list, $index) use ($selected, $selectedLength): string {
      * Given an invalid input string and a list of valid options, returns a filtered
      * list of valid options sorted based on their similarity with the input.
      *
-     * Includes a custom alteration from Damerau-Levenshtein to treat case changes
-     * as a single edit which helps identify mis-cased values with an edit distance
-     * of 1
-     *
      * @param array<string> $options
      *
      * @return array<int, string>
      */
     public static function suggestionList(string $input, array $options): array
     {
         $optionsByDistance = [];
+        $lexicalDistance = new LexicalDistance($input);
         $threshold = mb_strlen($input) * 0.4 + 1;
         foreach ($options as $option) {
-            if ($input === $option) {
-                $distance = 0;
-            } else {
-                $distance = (strtolower($input) === strtolower($option)
-                    ? 1
-                    : levenshtein($input, $option));
-            }
+            $distance = $lexicalDistance->measure($option, $threshold);
 
-            if ($distance <= $threshold) {
+            if ($distance !== null) {
                 $optionsByDistance[$option] = $distance;
             }
         }
 
-        asort($optionsByDistance);
+        \uksort($optionsByDistance, static function (string $a, string $b) use ($optionsByDistance) {
+            $distanceDiff = $optionsByDistance[$a] - $optionsByDistance[$b];
+
+            return $distanceDiff !== 0 ? $distanceDiff : \strnatcmp($a, $b);
+        });
 
         return array_keys($optionsByDistance);
     }
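The new filtering and ordering rules (keep options within the threshold, sort by distance, break ties with `strnatcmp()`) can be illustrated without the distance computation itself. The sketch below is a hypothetical helper, `rankSuggestions`, that takes precomputed `[option, distance]` pairs. Unlike the PR, it sorts a list with `usort()` rather than calling `uksort()` on an array keyed by option, and it counts characters with `preg_split()` in place of `mb_strlen()`.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper illustrating the threshold and ordering applied by
 * Utils::suggestionList(), given distances that were already computed.
 *
 * @param array<int, array{string, int}> $candidates list of [option, distance] pairs
 *
 * @return array<int, string>
 */
function rankSuggestions(string $input, array $candidates): array
{
    // Character count of the input, not byte count
    $length = count(preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY));
    $threshold = $length * 0.4 + 1;

    // Drop options whose distance exceeds the threshold
    $kept = array_filter(
        $candidates,
        static fn (array $c): bool => $c[1] <= $threshold
    );

    usort($kept, static function (array $x, array $y): int {
        // Smaller distance first; equal distances in natural string order
        return $x[1] <=> $y[1] ?: strnatcmp($x[0], $y[0]);
    });

    return array_map(static fn (array $c): string => $c[0], $kept);
}
```

One design note: the sketch keeps pairs in a list because PHP stores integer-like string keys such as `'123'` as integers, so `array_keys()` on an option-keyed array returns such options as ints rather than strings.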
tests/Type/EnumTypeTest.php: 1 addition & 2 deletions
@@ -348,8 +348,7 @@ public function testDoesNotAcceptValuesWithIncorrectCasing(): void
             '{ colorEnum(fromEnum: green) }',
             null,
             [
-                // Improves upon the reference implementation
-                'message' => 'Value "green" does not exist in "Color" enum. Did you mean the enum value "GREEN"?',
+                'message' => 'Value "green" does not exist in "Color" enum. Did you mean the enum value "GREEN" or "RED"?',
                 'locations' => [new SourceLocation(1, 23)],
             ]
         );
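Why the expected message now also suggests "RED" can be checked by hand. The worked calculation below applies the threshold formula from `suggestionList()` and the rules of `LexicalDistance` to the input `green`, and compares them with the removed code path, which called PHP's case-sensitive `levenshtein()`:

$$
\begin{aligned}
\text{threshold} &= 5 \cdot 0.4 + 1 = 3 \\
d(\texttt{green}, \texttt{GREEN}) &= 1 \\
d(\texttt{green}, \texttt{RED}) &= d(\texttt{green}, \texttt{red}) = 3 \le 3 \\
\texttt{levenshtein}(\texttt{green}, \texttt{RED}) &= 5 > 3
\end{aligned}
$$

After lowercasing, `green` becomes `red` in three edits (delete `g`, substitute the second `e` with `d`, delete `n`), which is exactly the threshold, so RED is kept and sorted after GREEN. Compared case-sensitively, `green` and `RED` share no characters, so the old distance was the full length 5 and RED was filtered out.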
tests/Utils/SuggestionListTest.php: 62 additions & 10 deletions
@@ -7,38 +7,90 @@
 
 class SuggestionListTest extends TestCase
 {
-    // DESCRIBE: suggestionList
-
     /**
+     * @see describe('suggestionList')
      * @see it('Returns results when input is empty')
      */
     public function testResturnsResultsWhenInputIsEmpty(): void
     {
-        self::assertEquals(
-            Utils::suggestionList('', ['a']),
-            ['a']
-        );
+        self::assertEquals(Utils::suggestionList('', ['a']), ['a']);
     }
 
     /**
      * @see it('Returns empty array when there are no options')
      */
     public function testReturnsEmptyArrayWhenThereAreNoOptions(): void
     {
+        self::assertEquals(Utils::suggestionList('input', []), []);
+    }
+
+    /**
+     * @see it('Returns options with small lexical distance')
+     */
+    public function testReturnsOptionsWithSmallLexicalDistance(): void
+    {
+        self::assertEquals(Utils::suggestionList('greenish', ['green']), ['green']);
+        self::assertEquals(Utils::suggestionList('green', ['greenish']), ['greenish']);
+    }
+
+    /**
+     * @see it('Rejects options with distance that exceeds threshold')
+     */
+    public function testRejectsOptionsWithDistanceThatExceedsThreshold(): void
+    {
+        self::assertEquals(Utils::suggestionList('aaaa', ['aaab']), ['aaab']);
+        self::assertEquals(Utils::suggestionList('aaaa', ['aabb']), ['aabb']);
+        self::assertEquals(Utils::suggestionList('aaaa', ['abbb']), []);
+        self::assertEquals(Utils::suggestionList('ab', ['ca']), []);
+    }
+
+    /**
+     * @see it('Returns options with different case')
+     */
+    public function testReturnsOptionsWithDifferentCase(): void
+    {
+        self::assertEquals(
+            Utils::suggestionList('verylongstring', ['VERYLONGSTRING']),
+            ['VERYLONGSTRING']
+        );
+        self::assertEquals(
+            Utils::suggestionList('VERYLONGSTRING', ['verylongstring']),
+            ['verylongstring']
+        );
         self::assertEquals(
-            Utils::suggestionList('input', []),
-            []
+            Utils::suggestionList('VERYLONGSTRING', ['VeryLongString']),
+            ['VeryLongString']
         );
     }
 
     /**
-     * @see it('Returns options sorted based on similarity')
+     * @see it('Returns options with transpositions')
      */
-    public function testReturnsOptionsSortedBasedOnSimilarity(): void
+    public function testReturnsOptionsWithTranspositions(): void
     {
+        self::assertEquals(Utils::suggestionList('agr', ['arg']), ['arg']);
+        self::assertEquals(Utils::suggestionList('214365879', ['123456789']), ['123456789']);
+    }
+
+    /**
+     * @see it('Returns options sorted based on lexical distance')
+     */
+    public function testReturnsOptionsSortedBasedOnLexicalDistance(): void
+    {
         self::assertEquals(
             Utils::suggestionList('abc', ['a', 'ab', 'abc']),
             ['abc', 'ab', 'a']
         );
     }
+
+    /**
+     * @see it('Returns options with the same lexical distance sorted lexicographically')
+     */
+    public function testReturnsOptionsWithTheSameLexicalDistanceSortedLexicographically(): void
+    {
+        self::assertEquals(
+            Utils::suggestionList('a', ['az', 'ax', 'ay']),
+            ['ax', 'ay', 'az']
+        );
+    }
 }