GH-6114: Static path matching #6146
Draft: dantleech wants to merge 32 commits into sebastianbergmann:main from dantleech:gh-6114-source-map-no-fs
+1,088 −14
Commits (32, all by dantleech):
0ec8baf Initial stub implementations and test cases
875a816 Temporary utility method to explore current behavior
e6259a4 Added initial test cases
8bf4038 Add edge case
0d51a58 Fixing tests
00562d8 Implementing FileMatcher
42250d9 Support character groups
dfff3ea Escaping unterminated openening brackets
5966038 Update
5eb851a Tokenizing
7b10051 Progressing
ca252ad Failing unterminated
1a984fd Unterminated bracket
05a1087 Complementation
05e58c4 Negated group
f3837cc Fix complementation
47efb1f Ssquare
633eb7c Skip test
f9a8bc0 Fix CS
901c050 Fix nested brackets
8048d0e Support char classes
a1b7467 Add doc
6a3eaea Add explanation
3b0c666 Add more comments
69209c2 Add comment and additional test case
213b5aa Inplementing the file matcher
2b7bd24 Initial implementation in SourceFilter
87de370 Fix missing types
e07a96f Failing test after rebase
ad68e68 Add missing types
ad0b105 Use the filematcherpattern
39baf16 Apply PHPStan fixes
@@ -0,0 +1,293 @@
<?php declare(strict_types=1);
/*
 * This file is part of PHPUnit.
 *
 * (c) Sebastian Bergmann <sebastian@phpunit.de>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */
namespace PHPUnit\Util;

use function array_key_last;
use function array_pop;
use function count;
use function ctype_alpha;
use function preg_quote;
use function strlen;

/**
 * FileMatcher ultimately attempts to emulate the behavior of `php-file-iterator`,
 * which *mostly* comes down to emulating PHP's glob function on file paths
 * based on POSIX.2:
 *
 * - https://en.wikipedia.org/wiki/Glob_(programming)
 * - https://man7.org/linux/man-pages/man7/glob.7.html
 *
 * The file matcher compiles the regex in three passes:
 *
 * - Tokenise interesting chars in the glob grammar.
 * - Process the tokens, resolving escapes, globstars, complementation,
 *   bracket expressions and character classes.
 * - Map the processed tokens to regular expression segments.
 *
 * @no-named-arguments Parameter names are not covered by the backward compatibility promise for PHPUnit
 *
 * @internal This class is not covered by the backward compatibility promise for PHPUnit
 *
 * @phpstan-type token array{self::T_*,string}
 */
final readonly class FileMatcher
{
    private const string T_BRACKET_OPEN = 'bracket_open';
    private const string T_BRACKET_CLOSE = 'bracket_close';
    private const string T_BANG = 'bang';
    private const string T_HYPHEN = 'hyphen';
    private const string T_ASTERIX = 'asterix';
    private const string T_SLASH = 'slash';
    private const string T_BACKSLASH = 'backslash';
    private const string T_CHAR = 'char';
    private const string T_GREEDY_GLOBSTAR = 'greedy_globstar';
    private const string T_QUERY = 'query';
    private const string T_GLOBSTAR = 'globstar';
    private const string T_COLON = 'colon';
    private const string T_CHAR_CLASS = 'char_class';

    /**
     * Compile a regex for the given glob.
     */
    public static function toRegEx(FileMatcherPattern $pattern): FileMatcherRegex
    {
        $tokens = self::tokenize($pattern->path);
        $tokens = self::processTokens($tokens);

        return self::mapToRegex($tokens);
    }

    /**
     * @param list<token> $tokens
     */
    private static function mapToRegex(array $tokens): FileMatcherRegex
    {
        $regex = '';

        foreach ($tokens as $token) {
            $type = $token[0];
            $regex .= match ($type) {
                // literal char
                self::T_CHAR => preg_quote($token[1]),

                // literal directory separator
                self::T_SLASH => '/',

                // `?` matches any single character
                self::T_QUERY => '.',

                // `!` directly after `[` negates the bracket expression
                self::T_BANG => '^',

                // match any segment up until the next directory separator
                self::T_ASTERIX => '[^/]*',

                // `/**` not followed by `/`: match anything, across separators
                self::T_GREEDY_GLOBSTAR => '.*',

                // `/**/`: zero or more complete directory segments
                self::T_GLOBSTAR => '/([^/]+/)*',
                self::T_BRACKET_OPEN => '[',
                self::T_BRACKET_CLOSE => ']',
                self::T_HYPHEN => '-',
                self::T_COLON => ':',
                self::T_BACKSLASH => '\\',
Review comment: COLON and BACKSLASH are not tested.

                self::T_CHAR_CLASS => '[:' . $token[1] . ':]',
            };
        }

        // allow the match to end at a directory boundary, so a glob naming a
        // directory also matches everything beneath it
        $regex .= '(/|$)';

        return new FileMatcherRegex('{^' . $regex . '}');
    }

    /**
     * @return list<token>
     */
    private static function tokenize(string $glob): array
    {
        $length = strlen($glob);

        $tokens = [];

        for ($i = 0; $i < $length; $i++) {
            $c = $glob[$i];

            $tokens[] = match ($c) {
dantleech marked this conversation as resolved.

                '[' => [self::T_BRACKET_OPEN, $c],
                ']' => [self::T_BRACKET_CLOSE, $c],
                '?' => [self::T_QUERY, $c],
                '-' => [self::T_HYPHEN, $c],
                '!' => [self::T_BANG, $c],
                '*' => [self::T_ASTERIX, $c],
                '/' => [self::T_SLASH, $c],
                '\\' => [self::T_BACKSLASH, $c],
                ':' => [self::T_COLON, $c],
                default => [self::T_CHAR, $c],
            };
        }

        return $tokens;
    }

    /**
     * @param list<token> $tokens
     *
     * @return list<token>
     */
    private static function processTokens(array $tokens): array
    {
        $resolved = [];
        $escaped = false;
        $bracketOpen = false;
        $brackets = [];

        for ($offset = 0; $offset < count($tokens); $offset++) {
            [$type, $char] = $tokens[$offset];
            $nextType = $tokens[$offset + 1][0] ?? null;

            if ($type === self::T_BACKSLASH && false === $escaped) {
                // skip the backslash and set flag to escape next token
                $escaped = true;

                continue;
            }

            if ($escaped === true) {
                // escaped flag is set, so make this a literal char and unset
                // the escaped flag
                $resolved[] = [self::T_CHAR, $char];
                $escaped = false;

                continue;
            }

            // globstar must be preceded by and succeeded by a directory separator
            if (
                $type === self::T_SLASH &&
                $nextType === self::T_ASTERIX && ($tokens[$offset + 2][0] ?? null) === self::T_ASTERIX && ($tokens[$offset + 3][0] ?? null) === self::T_SLASH
            ) {
                $resolved[] = [self::T_GLOBSTAR, '**'];

                // we eat the two `*` and the trailing slash
                $offset += 3;

                continue;
            }

            // greedy globstar (trailing?)
            // TODO: this should probably only apply at the end of the string according to the webmozart implementation and therefore would be "T_TRAILING_GLOBSTAR"
            if (
                $type === self::T_SLASH &&
                ($tokens[$offset + 1][0] ?? null) === self::T_ASTERIX && ($tokens[$offset + 2][0] ?? null) === self::T_ASTERIX
            ) {
                $resolved[] = [self::T_GREEDY_GLOBSTAR, '**'];

                // we eat the two `*` in addition to the slash
                $offset += 2;

                continue;
            }

            // two consecutive `*` that are not surrounded by `/` are invalid,
            // so we interpret them as literals
            if ($type === self::T_ASTERIX && ($tokens[$offset + 1][0] ?? null) === self::T_ASTERIX) {
                $resolved[] = [self::T_CHAR, $char];
                $resolved[] = [self::T_CHAR, $char];

                // we eat the second `*` as well
                $offset++;

                continue;
            }

            // complementation - only parse BANG if it is at the start of a character group
            if ($type === self::T_BANG && isset($resolved[array_key_last($resolved)]) && $resolved[array_key_last($resolved)][0] === self::T_BRACKET_OPEN) {
                $resolved[] = [self::T_BANG, '!'];

                continue;
            }

            // if this was _not_ a bang preceded by a `[` token then convert it
            // to a literal char
            if ($type === self::T_BANG) {
                $resolved[] = [self::T_CHAR, $char];

                continue;
            }

            // https://man7.org/linux/man-pages/man7/glob.7.html
            // > The string enclosed by the brackets cannot be empty; therefore
            // > ']' can be allowed between the brackets, provided that it is
            // > the first character.
            if ($type === self::T_BRACKET_OPEN && $nextType === self::T_BRACKET_CLOSE) {
                $bracketOpen = true;
                $resolved[] = [self::T_BRACKET_OPEN, '['];
                $brackets[] = array_key_last($resolved);
                $resolved[] = [self::T_CHAR, ']'];
                $offset++;

                continue;
            }

            // if we're already in a bracket and the current and next chars
            // are `[:` then start parsing a character class...
            if ($bracketOpen && $type === self::T_BRACKET_OPEN && $nextType === self::T_COLON) {
                // this looks like a named [:character:] class
                $class = '';
                $offset += 2;

                // parse the character class name, without running past the
                // end of an unterminated pattern
                while (isset($tokens[$offset]) && ctype_alpha($tokens[$offset][1])) {
                    $class .= $tokens[$offset++][1];
                }

                // if followed by a `:` then it's a character class
                if (($tokens[$offset][0] ?? null) === self::T_COLON) {
                    $offset++;
                    $resolved[] = [self::T_CHAR_CLASS, $class];

                    continue;
                }

                // otherwise it's a harmless literal: emit the consumed `[` and
                // `:` (plus any name) in order, then step back so the current
                // token is processed normally on the next iteration
                $resolved[] = [self::T_CHAR, '['];
                $resolved[] = [self::T_CHAR, ':' . $class];
                $offset--;

                continue;
            }

            // if bracket is already open and we have another open bracket
            // interpret it as a literal
            if ($bracketOpen === true && $type === self::T_BRACKET_OPEN) {
                $resolved[] = [self::T_CHAR, $char];
dantleech marked this conversation as resolved.

                continue;
            }

            // if we are NOT in an open bracket and we have an open bracket
            // then push the bracket onto the stack and enter bracket-mode.
            if ($bracketOpen === false && $type === self::T_BRACKET_OPEN) {
                $bracketOpen = true;
                $resolved[] = [$type, $char];
                $brackets[] = array_key_last($resolved);

                continue;
            }

            // if we are in a bracket and we get to bracket close then
            // pop the last open bracket off the stack and continue
            //
            // TODO: $bracketOpen === true below is not tested
            if ($bracketOpen === true && $type === self::T_BRACKET_CLOSE) {
                // TODO: this is not tested
                $bracketOpen = false;

                array_pop($brackets);
                $resolved[] = [$type, $char];

                continue;
            }

            $resolved[] = [$type, $char];
Review comment: this handles any "left over tokens" - including
        }

        // for each unterminated bracket, replace it with a literal char
        foreach ($brackets as $unterminatedBracket) {
            $resolved[$unterminatedBracket] = [self::T_CHAR, '['];
        }

        return $resolved;
    }
}
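To make the three compilation passes concrete, the sketch below checks a handful of regexes against sample paths with `preg_match`. The regexes are what we expect `toRegEx()` to emit for the globs named in the comments, traced by hand through `tokenize()`, `processTokens()` and `mapToRegex()` as they appear in this diff; they are our reading of the code, not output captured from a test run. The globs, sample paths and the `$cases`/`$mismatches` variables are illustrative only.

```php
<?php declare(strict_types=1);

// Regexes traced by hand from FileMatcher::toRegEx() as written in this diff
// (an editor's reading, not captured output). Each one is anchored with `^`
// and ends in `(/|$)`, so a glob naming a directory also matches paths below it.
$cases = [
    // glob '/src': plain directory prefix
    '{^/src(/|$)}' => [
        '/src'                      => true,
        '/src/Util/FileMatcher.php' => true,
        '/srcX/Foo.php'             => false,
    ],
    // glob '/src/**/*.php': `/**/` becomes T_GLOBSTAR, `*` stays inside one segment
    '{^/src/([^/]+/)*[^/]*\.php(/|$)}' => [
        '/src/Foo.php'              => true,
        '/src/Util/FileMatcher.php' => true,
        '/lib/Foo.php'              => false,
    ],
    // glob '/a[!b]c': `!` directly after `[` is complementation
    '{^/a[^b]c(/|$)}' => [
        '/axc' => true,
        '/abc' => false,
    ],
    // glob '/a[]b]c': `]` as the first bracket character is a literal
    '{^/a[\]b]c(/|$)}' => [
        '/a]c' => true,
        '/abc' => true,
        '/axc' => false,
    ],
    // glob '/v[[:digit:]]': named POSIX character class
    '{^/v[[:digit:]](/|$)}' => [
        '/v7' => true,
        '/vx' => false,
    ],
    // glob '/a[b': an unterminated `[` is demoted to a literal
    '{^/a\[b(/|$)}' => [
        '/a[b' => true,
        '/ab'  => false,
    ],
];

$mismatches = 0;

foreach ($cases as $regex => $paths) {
    foreach ($paths as $path => $expected) {
        $matched = preg_match($regex, $path) === 1;

        if ($matched !== $expected) {
            $mismatches++;
        }

        printf("%-36s %-28s %s\n", $regex, $path, $matched ? 'match' : 'no match');
    }
}

printf("%d mismatch(es)\n", $mismatches);
```

The trailing `(/|$)` is what lets a bare directory glob such as `/src` act as a prefix filter, while still rejecting siblings like `/srcX`.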
Review comment: this will need to change to account for include/exclude matching on the basename, either in the regex or otherwise.
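The review comment leaves open where basename filtering should live. One possible shape is sketched below. It rests on our own assumption that the include/exclude rules are prefix/suffix checks like the `prefix` and `suffix` attributes of PHPUnit's `<directory>` configuration element. The sketch keeps the glob regex for the path and tests the basename separately. `matchesWithBasename()` is a hypothetical helper, not code from this PR, and `'{^/src(/|$)}'` is simply the regex we expect the matcher to emit for the glob `/src`.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper (not part of the PR): a path is accepted only if the
 * compiled glob regex matches the full path AND its basename carries the
 * required prefix and suffix (assumed to mirror <directory prefix suffix>).
 */
function matchesWithBasename(string $regex, string $path, string $prefix = '', string $suffix = ''): bool
{
    // Path part: reuse a regex as produced by FileMatcher::toRegEx().
    if (preg_match($regex, $path) !== 1) {
        return false;
    }

    // Basename part: checked outside the regex, so the glob need not encode it.
    $basename = basename($path);

    return str_starts_with($basename, $prefix) && str_ends_with($basename, $suffix);
}

var_dump(matchesWithBasename('{^/src(/|$)}', '/src/Util/FileMatcherTest.php', '', 'Test.php')); // bool(true)
var_dump(matchesWithBasename('{^/src(/|$)}', '/src/Util/FileMatcher.php', '', 'Test.php'));     // bool(false)
```

Keeping the basename check outside the regex leaves `mapToRegex()` untouched. Folding the suffix into the regex instead would avoid the second check, but it would couple two concerns the review comment treats as open.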