Proposal: mild mode for the code parser #155
Comments
Hi! I think I understand your sentiment, but there are a number of problems with the proposed approach.

First, Spell Right is supposed to spell "fenced" comments (e.g. docstrings) in Python and other code as well, see here. If it does not work in some language, let me know. And by the way, the parser is already supposed to skip all variables.

Second, the similarity measure. As you may have noticed, my extension uses native spelling APIs, which results in better spelling quality (e.g. short words, abbreviations, case and many other things are taken into consideration, whereas other spellers e.g. only start spelling from three letters to speed things up). But this has its limitations, and one of them is that it does not allow adjusting spelling metrics the way you propose. Of course there are approaches which could, as you propose, infer what is code and what is a string (e.g. there is a nice Bayesian approach to this; I remember an extension which automatically chose the dictionary for an email being written), but that seems a bit of an overshoot, especially when you cannot control anything below the API level.

The main problem with the solution you postulate is that, so far, VSCode does not give extensions access to the document's syntax information. If I could use the parser VSCode uses to colorize code to determine what text is what (variable, keyword, identifier etc.), I could easily apply e.g. CamelCase or snake_case rules for spelling. That would not require a similarity measure; it could be just strict spelling. On the other hand, I am not able to provide parsers fine-grained enough to serve a multitude of code documents. Hence the comments/strings approach, known for a long time e.g. from Visual Studio extensions.

The whole extension originated from VSCode's issue #20266, which points out that there is generally a problem with spelling in VSCode. There is another speller which spells code in a brute-force way: it does not care about the syntax, just spells everything and then eliminates as much as possible of what seems to be OK (keywords in separate dictionaries etc.). I have simply decided on a different approach. But I am not ruling anything out, especially if VSCode ever provides some document syntax support.
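If syntax information were available, the CamelCase/snake_case rules mentioned above might look roughly like the following minimal TypeScript sketch. It is not Spell Right's code: `splitIdentifier` and `misspelledParts` are hypothetical names, and the lowercase `Set<string>` dictionary is an assumed stand-in for the native spelling API.

```typescript
// Hypothetical sketch: none of these names come from Spell Right itself.

// Split a code identifier into the words it is built from, following
// snake_case (underscores/hyphens) and CamelCase conventions.
function splitIdentifier(identifier: string): string[] {
  const words: string[] = [];
  for (const part of identifier.split(/[_-]+/)) {
    const spaced = part
      // "taskInfo" -> "task Info": lowercase letter or digit followed by uppercase
      .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
      // "HTTPServer" -> "HTTP Server": end of an acronym run before a capitalized word
      .replace(/([A-Z])([A-Z][a-z])/g, "$1 $2");
    for (const word of spaced.split(" ")) {
      if (word.length > 0) {
        words.push(word);
      }
    }
  }
  return words;
}

// Return the parts of an identifier that are not in the (lowercase) dictionary.
function misspelledParts(identifier: string, dictionary: Set<string>): string[] {
  return splitIdentifier(identifier).filter(
    (word) => !dictionary.has(word.toLowerCase())
  );
}

const demoDictionary = new Set(["task", "info", "ext"]);
console.log(misspelledParts("TaskUnfoExt", demoDictionary)); // → [ 'Unfo' ]
```

Each part is then checked strictly, so no similarity threshold is needed: a misspelled fragment inside an otherwise valid identifier is still reported.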
Oh, gosh! I have finally understood what you want! One well-commented picture is worth more than ... never mind. This
I gave the thing another thought and I think I have found a much better way to do this: the speller should use the document's symbols! That is, I cannot use the parsers VSCode uses, but there is a way to get the symbols, exactly like the list that you get when you do a Because of some considerations it was a complicated code modification, and I would like to ask you to test the modified extension. You can download it from here: https://drive.google.com/open?id=1IBI8JseAlrQNHGYQVV2O3B--TiSmJLFi Just install it from file using
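The document-symbols idea can be sketched roughly as follows. In a real extension the symbols would presumably come from VS Code's built-in `vscode.executeDocumentSymbolProvider` command, which returns `DocumentSymbol` objects (with `name` and nested `children`) or flat `SymbolInformation` objects, depending on the language's provider. To keep the example self-contained and runnable outside VS Code, `SymbolNode`, `collectSymbolNames` and `wordsToSpell` are hypothetical stand-ins and the symbol tree is hard-coded.

```typescript
// Hypothetical sketch: a minimal stand-in for the shape of VS Code's
// DocumentSymbol (a name plus nested children); not the extension's real code.
interface SymbolNode {
  name: string;
  children: SymbolNode[];
}

// Walk the symbol tree depth-first and collect every symbol name.
function collectSymbolNames(
  symbols: SymbolNode[],
  names: Set<string> = new Set<string>()
): Set<string> {
  for (const symbol of symbols) {
    names.add(symbol.name);
    collectSymbolNames(symbol.children, names);
  }
  return names;
}

// Keep only the words that are not document symbols; only these would be
// passed on to the spelling API.
function wordsToSpell(words: string[], symbols: SymbolNode[]): string[] {
  const symbolNames = collectSymbolNames(symbols);
  return words.filter((word) => !symbolNames.has(word));
}

const outline: SymbolNode[] = [
  {
    name: "TaskInfoExt",
    children: [{ name: "parseArgs", children: [] }],
  },
];
console.log(wordsToSpell(["TaskInfoExt", "parseArgs", "recieve"], outline)); // → [ 'recieve' ]
```

Symbol names are skipped because the language provider already knows they are identifiers, while ordinary words in comments and strings are still spelled.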
Hi @bartosz-antosik,
Is the document you test it with available somewhere? Could I have a similar document to test things?
I also tried uninstalling the extension and then installing it from the provided .vsix, which reduced the number of errors from 184 to 171 (it helped a bit, but not much). But now the reported errors are out of sync with the actual lines of code, and most of the errors still remain.
Something is not OK here (e.g. switching documents sometimes causes problems), but this approach does seem to reduce the number of spelling errors (see the Args names below) and helps to detect real spelling errors (TaskUnfoExt instead of TaskInfoExt). There is a known problem with reading symbols from the document: it sometimes returns an empty list at first. I will investigate it further, but it seems to be the way to go, at least partially.
Another take, which should be better (version 2.3.7): https://drive.google.com/open?id=1_iOS4VgN0Njp8rfhd-jTbMXNt6GmbQ_S Could you please install it and tell me whether it eliminates document symbols correctly?
Thanks, it works much better!
The latest release includes the document-symbols handling. I will deliberate a bit on your other ideas.
Spellright works fine with plain text but becomes hard to use for code (Python, C++, JS, ...) with docstrings, even in the "parser" mode (where only comments and strings are spelled). It would make sense to use a milder, less strict mode when code is parsed, so that most variables are skipped instead of being reported as errors. This could be done by specifying a lower bound on the similarity for typos and treating words that differ too much from every dictionary word as variables. It would be nice to be able to adjust this similarity value ∈ [0, 1] per project, to trade off false alarms against missed typos.
A [pseudo] example of the implementation can be seen here.
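The proposed mild mode boils down to a threshold on a similarity score. A minimal sketch, assuming normalized Levenshtein similarity (1 - edit distance / length of the longer word) as the measure and a plain word list as the dictionary; `classifyWord`, the measure itself and the 0.7 threshold are illustrative choices, not something Spell Right or the native spelling APIs provide.

```typescript
// Illustrative sketch of the proposed "mild mode"; classifyWord, the
// similarity measure and the threshold value are assumptions, not Spell Right APIs.

type Verdict = "correct" | "typo" | "identifier";

// Levenshtein edit distance, computed with a single dynamic-programming row.
function levenshtein(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    row.push(j); // distance from "" to the first j characters of b
  }
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // previous row's value at column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // previous row's value at column j
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Normalized similarity in [0, 1]: 1 means identical strings.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Words close to some dictionary entry (similarity >= threshold) are reported
// as typos; words far from every entry are assumed to be identifiers and ignored.
function classifyWord(word: string, dictionary: string[], threshold: number): Verdict {
  const lower = word.toLowerCase();
  let best = 0;
  for (const entry of dictionary) {
    if (entry === lower) {
      return "correct";
    }
    best = Math.max(best, similarity(lower, entry));
  }
  return best >= threshold ? "typo" : "identifier";
}

const sampleDictionary = ["receive", "parser", "comment", "string"];
// parsr → typo, tmpBuf → identifier, comment → correct
for (const sample of ["parsr", "tmpBuf", "comment"]) {
  console.log(sample, classifyWord(sample, sampleDictionary, 0.7));
}
```

Raising the threshold towards 1 reports fewer words as typos (fewer false alarms, more missed typos); lowering it does the opposite. The linear scan costs one edit-distance computation per dictionary entry, which is fine for a sketch; a real implementation would more likely compare only against the suggestions returned by the spelling backend.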