Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic code formatting for notebooks #1255

Open
5 tasks
mofojed opened this issue Apr 28, 2023 · 29 comments · May be fixed by #2233
Open
5 tasks

Automatic code formatting for notebooks #1255

mofojed opened this issue Apr 28, 2023 · 29 comments · May be fixed by #2233
Assignees
Labels
enhancement New feature or request

Comments

@mofojed
Copy link
Member

mofojed commented Apr 28, 2023

As a user, I want automatic code formatting in my notebooks so that my code looks pretty.

I would like to have formatting applied by something like black: https://github.com/psf/black.

Ruff wasm Python formatter and linter:

  • if we are building it, need an action to do so and keep it updated, else adopt an active npm version
    (https://github.com/dprint/dprint-plugin-ruff has automation to self update and publish new versions, so maybe use that one?)
  • Should be lazy loaded
  • New settings menu section: "Editor" with options: for formatting on save and lint rules (via a code editor with ruff linting rules they have on the playground?)
  • Format button in notebook overflow, and context menu
  • Settings must be configurable in enterprise by a prop so that orgs can enforce which linting rules apply globally
@mofojed mofojed added enhancement New feature or request triage Issue requires triage labels Apr 28, 2023
@devinrsmith
Copy link
Member

I think this is a great idea. +1 on black.

@vbabich vbabich added this to the Backlog milestone May 3, 2023
@vbabich vbabich removed the triage Issue requires triage label May 3, 2023
@dsmmcken
Copy link
Contributor

dsmmcken commented May 3, 2023

https://github.com/charliermarsh/ruff is just a linter, but I believe they are working on adding an autoformatter.

The list of projects using it is very impressive, so I can imagine it might replace black in the near future.

@devinrsmith
Copy link
Member

Big fan of anything written in rust! Looks like a neat project, I'll definitely keep an eye on it.

astral-sh/ruff#4041

I believe we have plans to add a Ruff formatter, but it's a ways off. For now, black is still required.

That's right. For now, we don't autofix line wrapping, and we do recommend using Ruff alongside Black.

So, it looks like the devs of ruff like black too; my guess is if they eventually get a Ruff formatter, it will have a black-compatible mode. I think black is still our best bet today.

@dsmmcken
Copy link
Contributor

dsmmcken commented May 4, 2023

Black is probably the best choice today, but yeah that could change in the future astral-sh/ruff#1904

We use prettier for our stuff, so them combining black plus some basic config options like prettier allows would be great.

@devinrsmith
Copy link
Member

https://github.com/jpadilla/black-playground
https://black.vercel.app

could be useful starting point?

@devinrsmith
Copy link
Member

It could be useful to use bidirection plugins to support this use case.

@mattrunyon
Copy link
Collaborator

Ruff has a wasm package (have to build it from src) that can be used to format and lint completely client-side

@mattrunyon
Copy link
Collaborator

Not something I will currently continue working on (I have other priorities), but I made a quick POC using ruff as wasm. https://github.com/mattrunyon/web-client-ui/tree/ruff-wasm

I built the wasm from source and just included it as part of the branch. Looked at https://github.com/astral-sh/ruff/blob/main/playground/src/Editor/SourceEditor.tsx for how to call the linter/formatter and map the linter results to actions and markers.

I roughly implemented

  • Format on save (only on ctrl+s shortcut)
  • Linter markers (no debouncing or deferring)
  • Code actions for quick fixing

Would still need to cleanup the code and add

  • Configuring linter settings and format on save via user settings menu
  • Action to add ignore comment for linter rules (noqa comments. See this part of ruff-lsp for some more info)
  • Adding a range formatting provider. Probably just format all the lines before the start of the range, then all lines + range and insert the diff

Some issues on my branch

  • Duplicate code actions. Providers might be registered for every open editor. For autocomplete this doesn't cause an issue as when 1 provider resolves, the others are ignored I think. But for code actions it seems to call all resulting in duplicated code actions.
  • Building prod doesn't copy the wasm file into assets, so it doesn't load properly. The wasm file should probably be part of an assets folder included w/ the console package.
  • Code action lightbulb seems to be off. Not sure if we're doing some CSS to move another element that also moves that button

@mofojed mofojed assigned mattrunyon and unassigned mattrunyon Feb 5, 2024
@devinrsmith
Copy link
Member

Potentially related would be catching syntax errors on the web UI; I don't know if the code formatting tools considered would expose the appropriate syntax error hints, but that would also be a great addition.

@dsmmcken
Copy link
Contributor

dsmmcken commented May 1, 2024

Two questions we should decide an answer on:

  1. Do we enable format on save by default? (my opinion is yes)
  2. Are there any linting rules we turn on beyond the default? Probably, but not sure. We should have our Pythonistas look through the available rule sets.

@dsmmcken
Copy link
Contributor

@jnumainville for investigating any additional default rules sets that are commonly enabled

@mattrunyon
Copy link
Collaborator

https://docs.astral.sh/ruff/settings/#lintflake8-implicit-str-concat I think we should add this rule by default

t = t.update([
  "X=X+1" # missing comma at the end not flagged by other rulesets
  "Y=Y+1"
])

@mattrunyon
Copy link
Collaborator

mattrunyon commented May 13, 2024

We should probably default to no style specific rules and focus only on rules that can show actual problems with the code (like inconsistent tabs/spaces). Or if they are style related, they should be to prevent easy to miss mistakes like the implicit string concatenation. That rule is technically style because it's valid Python.

Another style rule we should consider is warning for unused imports/variables.

Users would be able to add the other rules if they wanted to enforce something like mandatory pep8 naming conventions. We shouldn't enforce those on everyone by default unless we feel that strongly about it.

@mattrunyon
Copy link
Collaborator

I've been looking at the ruff rules https://docs.astral.sh/ruff/rules

I think we should enable

  • Pyflakes (F). These seems to be all actual errors or potential errors at a language level. Like if-tuple which is always true
  • pycodestyle E999 syntax-error. Definitely useful
  • pycodestyle E100 and E200 levels. These are all addressing extra/inconsistent whitespace and don't appear to be style choices, but things that could cause actual parse errors.
  • pycodestyle E711 (none-comparison) might be useful. I'm not sure if x == None can cause issues in Python or it's just recommended to use x is None. If the latter then we maybe leave it off by default since it's just a style choice.
  • pycodestyle W291 (trailing-whitespace) may already be covered by the E100/200s
  • pycodestyle W293 (blank-line-with-whitespace) may be covered by E100/200s
  • pycodestyle W605 (invalid-escape-sequence)
  • isort I002 (missing-required-import) I would need a Python person to tell me if this is actually useful or not in DH
  • flake8-bugbear B002 (unary-prefix-increment-decrement). Unless this is caught by another syntax error rule
  • flake8-bugbear B015 (useless-comparison). Should prevent accidental bugs/errors
  • flake8-bugbear B016 (raise-literal). Might be caught by other rules. Prevents a TypeError
  • flake8-bugbear B018 (useless-expression). Probably prevent accidental bugs
  • flake8-bugbear B020 (loop-variable-overrides-iterator). Prevent accidental bugs
  • flake8-bugbear B023 (function-uses-loop-variable). This is DEFINITELY an issue for developers writing dh.ui components with a loop creating components since Python does not create closures over the loop iterations
  • flake8-bugbear B032 (unintentional-type-annotation)
  • flake8-bugbear B035 (static-key-dict-comprehension). Usually not intended
  • flake8-bugbear B909 (loop-iterator-mutation). Can cause infinite loops if you're not careful
  • flake8-commas COM818 (trailing-comma-on-bare-tuple). Seems like it would catch accidental bugs
  • flake8-implicit-string-concatenation (ISC). I need to check what combo of these rules we would want. There is also a preference (separate from the rules) we probably want on to disable allowing multiline concatenation
  • pylint error (PLE). These seem like actual errors. Not sure how much overlap there is with pyflakes.
  • Ruff-specific rules RUF001 (ambiguous-unicode-character-string)
  • RUF021 (parenthesize-chained-operators). Prevent unclear and/or logic
  • RUF027 (missing-f-string-syntax). Warn if missing the f before what looks like an fstring

Maybe enable, but might not be very useful. I need a Python person to provide more insight

  • flake8-async
  • flake8-bugbear B029 (except-with-empty-tuple). Not sure how common this would be in DH code
  • flake8-builtins (A). All prevent shadowing builtins. Not sure if this is an easy trap in Python
  • flake8-datetimez (DTZ). I'm not sure if we want to warn about any of these. For example, I'm not sure if omitting a timezone in Python leads to an error when you try to put it into an Instant in a table or if it auto adds UTC.
  • flake8-logging (LOG). Idk if this applies to user code in DH at all
  • flake8-return (RET503). Not sure if we want to enforce this. Could catch potential issues
  • pandas-vet (PD). Not sure if these are just style things or actual mistakes in pandas
  • Pylint PLC2401, PLC2403. Prevent somehow getting a non-ascii character in a variable?
  • NumPy-specific rules (NPY). Would numpy users care about these?
  • Perflint (PERF). Some potential performance gains? I disagree with the rules preferring dict comprehension over a for loop claiming comprehensions are more readable. I think only simple comprehensions are more readable.

Probably should not enable, but individual users might want to use these

  • pycodestyle E700 rules are stylistic, but might be something many people want warnings on
  • pep8-naming (N). I think these are all stylistic. There's not actual errors or bugs caused by anything here
  • flake8-comprehensions (C4). If you use lots of comprehensions I imagine this could point out some optimizations

@mattrunyon
Copy link
Collaborator

Formatter settings we should leave as default I think with the following exceptions

  • indent-width should match our monaco width. I think both default to 4
  • quote-style. Default is double. We could use preserve, but probably just leave as double and let users configure

@dsmmcken
Copy link
Contributor

The default is ["E", "F"] which encompasses all the E and F rules, I would propose we only add rules to those, not remove any.

@mattrunyon
Copy link
Collaborator

Many of the E rules outside 1xx/2xx/9xx are more stylistic to me. If @jnumainville wants to chime in I'd like his thoughts on the other E rules.

Also, the default is ["E4", "E7", "E9", "F"]. I don't think E4 and E7 (4xx and 7xx) are actual problems with code

@jnumainville
Copy link

jnumainville commented May 17, 2024

To address some specific points -
On one hand E731 is certainly one I don't think we'd want considering the usefulness of lambdas in React style code.
On the other hand:
E402 could be a code problem if you've accidentally imported something after the fact.
E721 is one that could be a problem if you're subclassing.
E722 could definitely be a code problem by swallowing up exceptions that you shouldn't.

More broadly though, the E rules are PEP8 - these are rules that most people are going to be used to and reasonably expect. I don't think it's unreasonable to have those in a basic rule set, even though some are stylistic choices. Or at least E4 and E7 (besides E731) are reasonable. Seems like E1, E2, E3 rules are unstable? E3 are ones I wouldn't argue as being important.

@devinrsmith
Copy link
Member

Are we planning to provide an easy way to turn off formatting and turn off linting (likely, two different settings)?

@devinrsmith
Copy link
Member

I think @jmao-denver and @chipkent should take a look specifically. I'm hoping that the settings we choose to use for web UI can be adopted by deephaven-core python development, ala deephaven/deephaven-core#3638

@mattrunyon
Copy link
Collaborator

Yes they will be toggles either in the notebook settings (like mini-map) or the user settings (like shortcuts).

Linting config will be user customizable. Formatting has a few options as well, but I think it's only indent size and single/double quotes for strings.

For aligning w/ deephaven-core Python development, the web UI might be a subset of the rules you want there. There are some rules that are more preference/style that we might want in deephaven-core, but not necessarily applied to all users of DH

@chipkent
Copy link
Member

  • isort I002 (missing-required-import) https://docs.astral.sh/ruff/rules/missing-required-import/ looks like it is adding imports to the file. I would be concerned about this happening without careful review.
  • flake8-async -> yes
  • flake8-bugbear B029 -> yes
  • flake8-builtins -> would be good to turn on. We may get a bunch of alerts where we missed things.
  • flake8-datetimez (DTZ) -> probably not. I think we wrote the code to be flexible with what users do, so I'm sure we have some test cases where this will fail.
  • lake8-logging (LOG) -> probably yes.
  • flake8-return (RET503) -> RET501-RET503 yes. RET504-RET508 probably not.
  • pandas-vet (PD) -> probably yes. Confirm with @jmao-denver
  • Pylint PLC2401, PLC2403 -> yes
  • NumPy-specific rules (NPY) -> yes
  • Perflint (PERF) -> yes, but we may have failures that need to be cleaned up
  • pycodestyle -> I would make a style assessment a separate effort since it will have more arguing and contention
  • pep8-naming (N) -> I'm very much in favor of turning this on. We have tried to make the API PEP8 compliant and should continue to do this. Having said that, we may get a bunch of alerts that need to be cleaned.
  • flake8-comprehensions (C4) -> We use a lot of comprehensions, so this is probably good as a performance audit. Not sure how many alerts we'll get.

DM me if you want me to dig in beyond the list you have.

@mattrunyon
Copy link
Collaborator

@chipkent just to clarify the rules list I proposed was for the web UI to show to users in notebooks/terminal. I think most of what you added would probably not be annoying to users, but pep8-naming might be something we leave up to users to add if they want it. Or we can enable it by default and users can disable if they don't care about it. Just depends on what we want the default experience to be in the web UI.

We can also set some rules to warning (yellow squiggly underline) and others to error (red squiggly underline). That would be up to us to determine the defaults as ruff does not assign any default levels to rules.

If we're adding linting rules to deephaven-core then we can definitely be as opinionated as we want and should add things like pep8-naming.

The required imports rule is something you can customize. I think by default it does nothing when on because you have to configure a list of required imports. https://docs.astral.sh/ruff/settings/#lint_isort_required-imports

@chipkent
Copy link
Member

I see. My bad. Let me reasses the list with that in mind.

  • isort I002 (missing-required-import) -> not sure. I'm a bit fuzzy on what this does. You would need to play with it.
  • flake8-async -> yes
  • flake8-bugbear B029 -> yes
  • flake8-builtins -> A001 & A002 yes; A003 maybe
  • flake8-datetimez (DTZ) -> no
  • lake8-logging (LOG) -> yes.
  • flake8-return (RET503) -> RET501-RET503 yes. Others no.
  • pandas-vet (PD) -> PD002 no; PD901 no; others maybe with a lean to no. It would depend on how much info warnings provide. Confirm with @jmao-denver
  • Pylint -> PLC2401 & PLC2403 I would lean to yes, but I have some concerns that international users might want it off. These are both preview features, so that might make us want to lean to no.
  • NumPy-specific rules (NPY) -> yes
  • Perflint (PERF) -> I could go either way. I might lean to yes to potentially avoid some perf problems that DH has to track down.
  • pycodestyle -> E101, E113, E999 yes. Others are probably too style specific. These should avoid us support costs.
  • pep8-naming (N) -> no, but it would be good to have an option to turn this on.
  • flake8-comprehensions (C4) -> I lean to yes. I think this could save support time chasing perf problems. Should get @jmao-denver thoughts.

DM me if you want me to dig in beyond the list you have.

@mattrunyon
Copy link
Collaborator

I002 we don't need. It lets you configure a list of imports that must occur in every file. So you could require every file do from __future__ import typings or import deephaven.special_sauce if that were required for every file

@jmao-denver
Copy link

jmao-denver commented May 31, 2024

I am in agreement with @chipkent's preference. During my discussion with @mattrunyon , I raised the issue of Python version compliance/compatibility check, it seems that that's currently not available with Ruff and the rulesets. But one can argue that it is more important to DH than to the users.

@mofojed
Copy link
Member Author

mofojed commented Jun 4, 2024

Will add to a feature branch first, starting with this PR #2043

@mofojed
Copy link
Member Author

mofojed commented Aug 2, 2024

@rbasralian would like to make sure there is an option to disable/enable auto format on save.

@mofojed mofojed modified the milestones: Backlog, September 2024 Sep 9, 2024
mattrunyon added a commit that referenced this issue Sep 20, 2024
Fixes #1255 

I need to test this doesn't crash if using groovy. Not sure if the ruff
version is available without initializing ruff first or what happens if
it's unavailable.

- Added global setting for minimap, format on save, formatter, and
formatting settings
- Added format on save option and format button to notebook overflow
menu
- Format on save triggers on both shortcut and pressing the save button
- Config editor has JSON schema included
- Config editor cannot be closed by clicking outside the modal, but can
be closed w/ escape key

Something we should add maybe as part of this PR or a final follow-up to
this feature branch is disabling certain/all linting/formatting for the
console. One way we might do this is by registering the console w/ a
specific URI or URI scheme so we can check it when checking if we should
lint. Or include it as a prop to the `MonacoProviders` component. This
would require some changes to the current setup of setting/using the
config on `MonacoProviders` static members/methods

(This might not actually be breaking, it's more just notes for
implementing in DHE)
BREAKING CHANGE:
The app should call `MonacoUtils.init` with a `getWorker` function that
uses the JSON worker in addition to the general fallback worker when
adding support for configuring ruff.
@mattrunyon mattrunyon linked a pull request Sep 20, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants