Generic scope handler interfaces #1031

AndreasArvidsson · 2022-10-12T14:03:51Z

Implements generic scope handlers
Adds "identifier" scope type
Closes Implement generic "every" logic #797
Closes Update scope type modifier to expand on both ends #484

Checklist

Add "identifier" scope type
Switch "word" and "char" to use this new setup, with "identifier" as their parent scope type
Add test for "take next word" when cursor is in the whitespace between two tokens, eg "foo | barBaz"
Add test for "one token backward"
Link issues that this PR closes and clean up related issues
Remove old modifier stages
Figure our grand scopes. Just pass ancestorIndex to getScopesTouchingPosition
I have added tests
I have updated the docs and cheatsheet

src/core/tokenizer.ts

src/processTargets/getModifierStage.ts

src/processTargets/modifiers/ContainingScopeStage.ts

src/processTargets/modifiers/scopeHandlers/TokenScopeHandler.ts

src/processTargets/modifiers/scopeHandlers/scopeHandler.types.ts

src/processTargets/modifiers/scopeHandlers/TokenScopeHandler.ts

pokey · 2022-10-12T21:28:32Z

Ok super helpful to see the code written out; great call to just bang it out.

Here's what I'm thinking. Scope type exposes two functions:

A function that takes a position and returns containing iteration range, just as a range
A function that takes an iteration range and returns a list of scopes

I think it would look as follows

interface ScopeDefinition {
   getIterationRange(editor: TextEditor, position: Position): Range;
   getScopesInIterationRange(editor: TextEditor, range: Range): Scope[];
}

interface Scope {
   domain: Range;
   getTarget(isBackward: boolean): Target;
   getPriority?(): number;
}

Then the base class / intermediary has an interface as follows:

/** For use as return type of {@link ScopeHandler.getTargetsForRange} */
interface IterationScope {
   targets: Target[]
   containingStartIndex: number;
   containingEndIndex: number;
}

class ScopeHandler {
   getTargetsForRange(editor: TextEditor, range: Range, isBackward: boolean): IterationScope {
      const startIterationRange = self.scopeDefinition.getIterationRange(editor, range.start)
      const iterationRange = contains(startIterationRange, range.end) ? startIterationRange : startIterationRange.union(self.scopeDefinition.getIterationRange(editor, range.end)

      const scopes = self.scopeDefinition.getScopesInRange(editor, iterationRange)

      // Then compute containing scope by intersection with domain for non-empty range
      // For empty range, pick weakly containing domain, breaking ties using `scope.getPriority()` if it exists, else prefer right
      // Then call `getTarget()` on the ones we want to return for `targets`
   }
}

Then

containing scope modifier stage is implemented by grabbing the iterationScope.containingIndices and constructing range target
every scope modifier stage is implemented by grabbing the iterationScope.targets and just doing an intersection internally if hasExplicitRange
ordinal scope modifier stage is implemented by calling the every stage, like we do today. I think it is actually good that there is a hard dependency here, because we want every and ordinal to behave identically
relative scope modifier stage is implemented by looking at containing scope and list of targets and doing its magic from there

Note that the only place that we ever refer to hasExplicitRange is in the every scope modifier stage

wdyt?

pokey · 2022-10-12T21:58:39Z

One other question is how we handle "every round". I guess maybe we just have it return an empty range for getIterationRange? I'm not sure we want to do any expansion

Then for getIterationScope, if any pairs start or end inside the iteration range, it returns all top-level pairs that it it intersects with. Top-level here means not contained by any other pair intersecting with iteration range.

If no pairs start or end inside the iteration range, we return the smallest pair containing the iteration range, or maybe just error

So I think this works within the api I defined above; just wanted to write it out to see

I guess one question is how "next round" works, because I believe this algorithm won't return enough for it to see the next one. I wonder if pairs want special treatment for next 🤔. Might be better for them to be able to walk forward and figure it out themselves. Ie go until they hit an opening paren and then take that round.

Maybe "every" isn't really the right abstraction to use for implementing "next" 🤔

Eg it's slightly unfortunate that "next token" breaks when it hits a line boundary

pokey · 2022-10-12T22:03:41Z

For functions, we expand to class in getIterationRange

In getScopesInRange, I think we just check that the range is just one class, and return its functions if so? Otherwise error? Idk

pokey · 2022-10-18T09:43:30Z

Ok what should the following do?

aaa bbb|. ccc

Cursor is the |; saying "two tokens"? Should it refer to bbb. or . ccc? I think probably the consistent thing is bbb., because it's two tokens starting from "token". One could also argue it should always just look to the right, though that breaks down when the cursor is in the middle of a token, which should obviously be that token and the following.

Unfortunately, I don't think it's as simple as containing and then adding as many scopes as necessary. For example:

{
    aaa {bbb} ccc {ddd} eee
    {fff}
}

In this case, if the user says "three curlies line air", I think it should be {bbb} past {fff}, but if we treat "<number> <scope>s" as first expanding to containing, it will expand to the big curly containing everything and then have nothing to add

So my thinking is as follows:

If the input target is empty, then index 0 should refer to containing scope. If the input target is not empty, then index 0 should refer to all intersecting scopes. Wdyt?

for more information, see https://pre-commit.ci

pokey · 2022-10-20T19:28:47Z

fwiw this is what it looks like to add a scope type now: 0818816 cc/ @AndreasArvidsson

for more information, see https://pre-commit.ci

pokey

Ok I think it's good to go!

pokey · 2022-10-21T14:53:51Z

cursorless-talon/src/modifiers/relative_scope.py

+@mod.capture(rule="<user.cursorless_scope_type> {user.cursorless_backward_modifier}")
+def cursorless_relative_scope_one_backward(m) -> dict[str, Any]:


I introduced this one because otherwise you need to say "one tokens backward", which is kinda awkward 😅. Now you just say "token backward". Cheatsheet updated

pokey · 2022-10-21T15:12:13Z

src/processTargets/modifiers/scopeHandlers/CharacterScopeHandler.ts

+
+export default class CharacterScopeHandler extends NestedScopeHandler {
+  public readonly scopeType = { type: "character" } as const;
+  public readonly iterationScopeType = { type: "token" } as const;


Note that I made this token rather than identifier. Having it be identifier broke "every char block" because it would skip over things like .. I think token is a sensible iteration scope anyway: the user can always say "first char identifier"

pokey · 2022-10-21T15:13:52Z

src/processTargets/modifiers/scopeHandlers/NestedScopeHandler.ts

+  constructor(
+    public readonly scopeType: ScopeType,
+    protected languageId: string
+  ) {}


I decided to keep these args for consistency, but note that many nested scope types will just reuse this constructor, so won't need to define anything. See eg IdentifierScopeHandler, CharacterScopeHandler, etc

pokey · 2022-10-21T15:16:02Z

src/processTargets/modifiers/scopeHandlers/WordScopeHandler.ts

+    // FIXME: Switch to using getMatchesInRange once we are able to properly
+    // mock away vscode for the unit tests in subtoken.test.ts


This was a bit annoying, but I wasn't prepared to shove the graph through all the functions necessary to get it to this scope handler for this PR and then add proper VSCode mocking for it. But I think we need to do that sooner rather than later so that we can have unit tests on these handlers rather than having to choose between full end-to-end or having to factor out functions from our handlers so that they can be tested

pokey · 2022-10-21T15:20:33Z

src/processTargets/modifiers/scopeHandlers/index.ts

@@ -0,0 +1,8 @@
+export * from "./NestedScopeHandler";


Update: I forgot to add something to this file and did a direct import instead, and sure enough hit a circular dependency problem. Looks like we do in fact need this file

We should look into a simple plop setup to automate adding these things

pokey · 2022-10-21T15:23:03Z

src/processTargets/modifiers/scopeTypeStages/SubTokenStages.ts

@@ -1,204 +0,0 @@
-import { Range, TextEditor } from "vscode";


pokey · 2022-10-21T15:26:11Z

src/test/suite/fixtures/recorded/subtoken/clearSixthNextWord.yml

@@ -0,0 +1,30 @@
+languageId: plaintext


Notice that word can cross both token and line boundaries now

pokey · 2022-10-21T15:27:51Z

src/test/suite/fixtures/recorded/subtoken/takeEveryWordLine2.yml

@@ -0,0 +1,35 @@
+languageId: plaintext


This used to be one of your word tokeniser unit tests. I made it into an end-to-end test because it now relies on interactions between word tokenizer and identifier tokenizer, so was easier to just test this way. All your other word tokenizer tests were within an identifier boundary so easier to run as unit tests

pokey · 2022-10-21T15:28:14Z

src/test/suite/fixtures/subtoken.fixture.ts

-  {
-    input: "apiV1 api_v_1",
-    expectedOutput: ["api", "V", "1", "api", "v", "1"],
-  },
  {


See above; this one is now an end-to-end test

pokey · 2022-10-21T17:15:59Z

Filed #1052 to track migrating remaining scope types

* Original scope handlers * More stuff * Use proper token stage in sub token stage * Work around for identifier matcher * Use new scope handler in relative scope stage * Use new scope handlers in ordinal scope stage * Update usage of containing indices * Rename * Refactored create target * Clean up * clean up * Add some tests that should pass * Add a bunch more tests * Attempt at different approach to scope handlers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Initial implementation of new idea * More work on this stuff * Rename and add some todos * More stuff * Tweaks * Renames and docstrings * Restructuring * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More jsdocs * Tweaks * Test fixes * Fix error messages * Revert `OrdinalScopeStage` * Tweak * jsdocs; fix import * Don't export legacy types * Fix import * Preparation for surrounding pairs * Naming cleanup * Unify `getLegacyScopeStage` functions * Lots of cleanup * Fix regex `lastIndex` issue * More cleanup * More cleanup * More cleanup * Add `ancestorIndex` in prepartion for #124 * Add more jsdocs * More docs * More docs * docstrings * Improve hierarchical error type * Docs * More minor dog tweaks * More docs * More docs * Doc strings and a couple tests * Remove `isPreferredOver` * Support `ancestorIndex` on api surface * Improved jsdocs * Split and cleanup relative stages * Make scope handler constructor args optional * More legacy type fixes * Add `"identifier"` scope * Working new code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Docs + cheatsheet * Update docs * Add jsdoc * jsdoc * JSDocs * doc tweaks * reflow * Tweaks Co-authored-by: Pokey Rule <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Original scope handlers * More stuff * Use proper token stage in sub token stage * Work around for identifier matcher * Use new scope handler in relative scope stage * Use new scope handlers in ordinal scope stage * Update usage of containing indices * Rename * Refactored create target * Clean up * clean up * Add some tests that should pass * Add a bunch more tests * Attempt at different approach to scope handlers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Initial implementation of new idea * More work on this stuff * Rename and add some todos * More stuff * Tweaks * Renames and docstrings * Restructuring * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More jsdocs * Tweaks * Test fixes * Fix error messages * Revert `OrdinalScopeStage` * Tweak * jsdocs; fix import * Don't export legacy types * Fix import * Preparation for surrounding pairs * Naming cleanup * Unify `getLegacyScopeStage` functions * Lots of cleanup * Fix regex `lastIndex` issue * More cleanup * More cleanup * More cleanup * Add `ancestorIndex` in prepartion for cursorless-dev#124 * Add more jsdocs * More docs * More docs * docstrings * Improve hierarchical error type * Docs * More minor dog tweaks * More docs * More docs * Doc strings and a couple tests * Remove `isPreferredOver` * Support `ancestorIndex` on api surface * Improved jsdocs * Split and cleanup relative stages * Make scope handler constructor args optional * More legacy type fixes * Add `"identifier"` scope * Working new code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Docs + cheatsheet * Update docs * Add jsdoc * jsdoc * JSDocs * doc tweaks * reflow * Tweaks Co-authored-by: Pokey Rule <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

pokey and others added 5 commits October 11, 2022 17:20

Original scope handlers

846bd80

More stuff

ed0d18b

Initial work

b7a4d73

Use proper token stage in sub token stage

6d4dbfe

Work around for identifier matcher

71078ed