Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support 5.20 signatures #194

Closed
shumphrey opened this issue Sep 21, 2016 · 38 comments
Closed

Support 5.20 signatures #194

shumphrey opened this issue Sep 21, 2016 · 38 comments

Comments

@shumphrey
Copy link

shumphrey commented Sep 21, 2016

Currently PPI treats perl 5.20 signatures as PPI::Token::Prototype

I had a look through the code to see how this works.
PPI::Token::Whitespace returns a Prototype if the previous tokens imply it should be.

What is the best approach to support Signatures? Should there be a PPI::Token::Signature and should Prototype.pm return 'Signature' if it encounters a non prototype character? [^\$\@\%\\;]

@iynehz
Copy link

iynehz commented May 18, 2017

+1

This issue is affecting Perl::Critic Perl-Critic/Perl-Critic#591

@glauschwuffel
Copy link

Hi,

are there any news on this topic? I'd implement it if the approach described above is the right one.

@kberov
Copy link

kberov commented Jan 24, 2018

For Perl::Critic to not complain when signatures are used, the workaround is still to put
[-Subroutines::ProhibitSubroutinePrototypes] in .perlcriticrc.

@glauschwuffel
Copy link

So, is the approach described above the way to go? If so, I'd go ahead and give it a try.

@kberov
Copy link

kberov commented Jan 26, 2018

@glauschwuffel I don't know.
@adamkennedy should know or someone involved in the project.

@wchristian
Copy link
Member

I'll have a think on this over the weekend.

@EvanCarroll
Copy link

Any progress on this?

@wchristian
Copy link
Member

I never said which weekend. 😆

That said, we have THUNK on it.

And it's pretty fucked. In the completely context-free case there is no accurate way to determine whether it's a prototype of a signature, due to:

  • signatures can contain arbitrary valid perl code
  • prototypes can contain arbitrary characters of almost any type (see Web::Simple)
  • the most common forms of either will look like sub meep (xxxx) {, so any other distinguishing features can be ignored as they'll be edge cases

Thus as far as the goal of "allow the policy to determine whether to complain about it being an old-style policy" goes, PPI can't actually help.

The primary way you should go about it is try and have the policy check for signatures being imported into the code space, at which point all prototypes ARE signatures, and prototypes look like attributes.

PPI could maybe be extended to answer the question if a prototype smells like a signature, which could be a number of factors like:

  • doesn't look like a strict, documented, prototype
  • looks a lot like a basic signature
  • is contained within blocks into which the signatures feature was imported

It must also be kept in mind though that import analysis is fraught on account of imports being chainable through use()/require() of other modules and even the possibility of function calls that use()/require() modules into the caller space.

So, thoughts on this?

@EvanCarroll
Copy link

EvanCarroll commented Aug 15, 2019

Being that signatures won't be experimental forever, perhaps we could simplify smells and look for non-sigils

  • if so, assume it's a signature
  • if not, assume it's a prototype

If this acceptable I'll look at patching.

@kberov
Copy link

kberov commented Aug 15, 2019

"… look for non-sigils…"

  • if so, assume it's a signature
  • if not, assume it's a protype

@jtk18
Copy link

jtk18 commented Aug 15, 2019

I was researching a fix for this issue and I stumbled upon this text here

To avoid ambiguity, when signatures are enabled the special syntax for prototypes is disabled. There is no attempt to guess whether a parenthesised group was intended to be a prototype or a signature.

So, according to that blurb, prototypes have a different syntax when signatures are enabled. I created the following tests:

#!/usr/bin/env perl

use strict;
use warnings;

sub blah($) {
}

blah();

Which outputs:

Not enough arguments for main::blah at ./proto.pl line 9, near "()"
Execution of ./proto.pl aborted due to compilation errors.
#!/usr/bin/env perl

use strict;
use warnings;

use feature qw(signatures);

sub blah($) {
}

blah();

Which outputs:

The signatures feature is experimental at ./signatures.pl line 8.
Too few arguments for subroutine 'main::blah' at ./signatures.pl line 11.

One would have to do a deeper dive for a better confirmation, but this at least shows that enabling the signatures feature changes what message is outputted.

@toddr
Copy link

toddr commented May 5, 2021

Seems like PPR fixed it. It may be that we can only solve 98% of this. That would be better than nothing at all. Perfect is the enemy of good enough.

https://rt.cpan.org/Public/Bug/Display.html?id=125616

@oalders
Copy link
Collaborator

oalders commented May 5, 2021

I think I would be happy with "good enough" on this issue.

@FGasper
Copy link

FGasper commented May 13, 2021

Hi all.

What is needed for motion on this one? This is a continual source of “itch”.

@toddr
Copy link

toddr commented May 13, 2021

Probably a Pull request?

@toddr
Copy link

toddr commented May 13, 2021

@oalders submitted this #257

@oalders
Copy link
Collaborator

oalders commented May 13, 2021

That basically just tries to prevent the file from being reformatted when signatures get parsed. It doesn't do the heavy lifting of actually trying to make sense of the signature.

@jackdeguest
Copy link

Any update on this?
Because a simple code like my $k = sub ($f = foo()) {}; would turn into this PPI structure where PPI fails to recognise this anonymous subroutine with a signature.

PPI::Document
  PPI::Statement::Variable
    PPI::Token::Word  	'my'
    PPI::Token::Whitespace  	' '
    PPI::Token::Symbol  	'$k'
    PPI::Token::Whitespace  	' '
    PPI::Token::Operator  	'='
    PPI::Token::Whitespace  	' '
    PPI::Token::Word  	'sub'
    PPI::Token::Whitespace  	' '
    PPI::Token::Prototype  	'($f = foo()'
  PPI::Statement::UnmatchedBrace
    PPI::Token::Structure  	')'
  PPI::Token::Whitespace  	' '
  PPI::Statement
    PPI::Structure::Constructor  	{ ... }
    PPI::Token::Structure  	';'

@oalders
Copy link
Collaborator

oalders commented Mar 29, 2022

We need:

  1. A proposal on how to do this so that we can get feedback from @wchristian on whether the approach would be acceptable
  2. A volunteer to do the work

@toddr
Copy link

toddr commented Jun 27, 2022

@oalders now PPI has been resolved does something need to be done here or are we ok to close this?

@oalders
Copy link
Collaborator

oalders commented Jun 27, 2022

@toddr nothing has really changed since my previous comment. PPI no longer changes the formatting of signatures when saving a document (which is a good step forward), but it still doesn't really know what a signature is.

My understanding is that this would be a non-trivial (but not impossible) task. I don't know what the status of @wchristian is these days but in the absence of input from @wchristian I would say that if someone can come up with a good proposal and a couple of others could review it before implementation, then we could take this on.

With the way things are moving, I think it's going to be an increasing pain point that PPI can't really deal with this. If there's some company that could donate a developer or sponsor someone to do the work, that might get it moving. (That's not directed at any company in particular).

@EvanCarroll
Copy link

EvanCarroll commented Jun 27, 2022

I would say that if someone can come up with a good proposal and a couple of others could review it before implementation, then we could take this on.

Is this a good proposal:

  • If the string use v5.36; is detected, only interpret prototypes as that which is provided to :prototype.
  • Assume all things in parenthesis in a declaration before definition is a signature.

Example

sub foo :prototype(prototype context) { }
sub bar (signature context) { }
sub baz :prototype(prototype context) (signature context) { }

@oalders
Copy link
Collaborator

oalders commented Jun 27, 2022

There will likely be a lot of cases where PPI needs to parse something, but it doesn't have the use statements, since it can parse fragments of code. I think that has historically been the biggest blocker.

@EvanCarroll
Copy link

EvanCarroll commented Jun 27, 2022

If so, that sounds problematic since a pragma, especially features and experimental, can change the parsing semantics of anything. Perhaps that should be a bigger issue that we look at then.

As to the "fragments of code" I don't see this as being relevant. If the fragment includes the use v5.36 the parsing semantics are changed. So even if PPI parses fragments it would seem the fragment must include the use statement which changes semantics? That is to say, a fragment will parse differently depending on the document it's contained in, so I'm not sure how a fragment can be useful (unless it includes the environment).

@oalders
Copy link
Collaborator

oalders commented Jun 27, 2022

I pass fragments of code to https://metacpan.org/dist/App-PPI-Dumper/view/script/ppi_dumper pretty regularly. If I want to see how PPI parses something without including hundreds of lines of code, I can do that with this tool. There's nothing to stop me from passing

sub foo ($one, $two) { ... }
PPI::Document
  PPI::Statement::Sub
    PPI::Token::Word    'sub'
    PPI::Token::Whitespace      ' '
    PPI::Token::Word    'foo'
    PPI::Token::Whitespace      ' '
    PPI::Token::Prototype       '($one, $two)'
    PPI::Token::Whitespace      ' '
    PPI::Structure::Block       { ... }
      PPI::Token::Whitespace    ' '
      PPI::Statement
        PPI::Token::Operator    '...'
      PPI::Token::Whitespace    ' '
  PPI::Token::Whitespace        '\n'

That's a valid use case.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 27, 2022

PPI cannot depend on context to determine if the signatures feature is enabled. Its only possible job is to successfully parse them as a prototype-or-signature. It's up to perlcritic rules and other higher level code to have context based heuristics to make a further determination.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 27, 2022

There is currently no foolproof method for determining the parsing state of dynamic Perl code with static parsing, which is a common issue with PPI based parsing. There's really nothing that can be done about that except come up with something smarter yet still static, which LeoNerd has had thoughts on.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 27, 2022

There will likely be a lot of cases where PPI needs to parse something, but it doesn't have the use statements, since it can parse fragments of code. I think that has historically been the biggest blocker.

and more commonly, the signatures feature can be enabled by any use statement or BEGIN block. PPI can't possibly approach this problem.

@EvanCarroll
Copy link

EvanCarroll commented Jun 27, 2022

and more commonly, the signatures feature can be enabled by any use statement or BEGIN block. PPI can't possibly approach this problem.

At this point, you may as well just argue against features, experimental, and Perl's versioning. In a different world, we'd not have any of this, and a new version of Perl like 5.32 would just break backwards compat with older perls, and the parsing semantic would obviously be such that it must be different between two versions of Perl.

Now we have a model where we don't have to break backwards compat with older perl. It seems like an awkward position to argue from that we should NOT be able to statically analyze Perl under newer semantics because we either,

  1. Allow the features to be turned on a different way.
  2. Allow the features to be turned off afterward (in the same package).

By any degree, this is another case where Perl in striving to meet the 0.0001% of use cases missing what the users want and moreover what they'd already expect to happen.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 27, 2022

I'm not arguing against anything. This is a PPI-specific problem due to Perl's dynamic parser, and it can't be fixed by hoping.

@oalders
Copy link
Collaborator

oalders commented Jun 27, 2022

By any degree, this is another case where Perl in striving to meet the 0.0001% of use cases missing what the users want and moreover what they'd already expect to happen.

Right, maybe there's a middle ground somewhere. For my use case (at $work), I can basically assume that any internal code is either a) using signatures or b) at least not using prototypes. I'd like for some way to let PPI know about this. I don't know if that's an ENV var or an argument to the constructor or both, but it would be nice to have the option of forcing PPI to DTRT, even if it doesn't have the full context of the code it's dealing with.

@wchristian
Copy link
Member

@EvanCarroll PPI will not, ever, abandon the ability to successfully parse even partial, truncated, corrupted or faulty Perl Documents. The documentation explains why: https://metacpan.org/pod/PPI#DESCRIPTION

If what you want requires abandoning that ability, then you will need to use a different module. Maybe PPR?

Outside of that i am happy to consider proposals and pull requests for how to extend PPI functionality without breaking that ability. Olafs suggestions in the post above strike me as useful, and might even be worth implementing as experimental features.

On the other hand, as @Grinnz correctly indicates, by parsing a sig-or-prot into a relevant ambiguous object, code that actually processes the PPI tree can decide on its own how to treat it. As long as PPI communicates back that a document part could be a signature, the wrapping code can decide that it will indeed treat it as one.

While i'm thinking about it, i think the first useful step would be for PPI to TRY to parse any prototype also as a signature, and if successful, indicate in the prototype object that a signature parse is available.

@EvanCarroll
Copy link

EvanCarroll commented Jun 28, 2022

even partial, truncated, corrupted or faulty Perl Documents

Respectfully, I don't understand what that means. To avoid this discussion which clearly is political, let's assume Perl 5.40 (or whatever) adds Corina. In doing so Perl will see role, class, method elevated to keywords, and a whole new parsing syntax for these keywords. This will almost certainly be hidden behind a feature flag. From a mere fragment of code from Perl 5.40, you will never know whether or not these keywords are provided by Corina or they are made by Depry McDerpFace in a seldom-used Object Constructor module from 1995.

You're again given a few options,

  • Assume they're not Corina, fold your deck, and say you can't tell whether or not these keywords correspond to Corina.
  • Assume they are Corina.
  • Use the information from the feature flags.

Again, I don't see how this is disputable. I know I'm talking to smart people here. But when feature flags change parsing semantics, abstractly it doesn't make sense to parse a fragment divorced from them.

To the extent that you can tinker with the underlying parsing semantics without using the feature flag, the simple answer there is don't support that. That's a much simpler answer then not supporting the language semantics as they're intended to be used.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 28, 2022

This is not a political discussion. We are explaining PPI does not and cannot work with context. There is simply no possibility of acknowledging feature flags with PPI itself.

@adamkennedy
Copy link
Collaborator

  • Assume they are Corina.

If the presence of the barewords role, class and method don't make any real sense in a document OTHER than as the new keywords for most documents most of the time, I'd probably go down the assume they are Corina direction.

Although we never had use statements alter semantics, it's never been completely impossible to do so.

There's really only 3 things that are completely non-negotiable. You can't alter tokenization, so Acme::Bleach and Acme::Pony are out of the question, you can't alter nested structure construction, and you can't alter statement boundaries.

Theoretically, with sufficient work and willingness for some perf hits, from memory I always thought it was possible in principle to type the structures and statements once they had been assembled. But in this case, typing the following doesn't really work.

class {
   ...
}

That's why the lookalike modules always had to put a semi-colon at the end of the look-alike things, because they were really a function, a block parameter, and a statement terminator.

So it's not completely off limits, but I think in practice you are fucked.

@elcamlost
Copy link

Probably, PPI can accept additional options, like "signatures_enabled => 1"? perl-critic and other tools can work with context and pass context knowledge to PPI.

The main goal of this feature for me is enable critic policy that signatures are required. Not the creating universal abitlity to detect if signatures are enabled or not for any code snippet passed to PPI.

So maybe this problem can be solved for bounded context which makes sense for the majority of users.

@wchristian
Copy link
Member

@elcamlost Please review what i said here: #194 (comment) :) The more i think about it, honestly, i find that an option like signatures_enabled is at the wrong level of fine-grained-ness, and an object indicating parse ambiguity, leaving it to the consuming code to decide how to handle it, would be the correct decision.

@EvanCarroll I'll be blunt, while i haven't been able to put much time into PPI recently due to various reasons which ARE politically grounded (hello perl foundation), the limitations of PPI that i explained are not political at all, but raw and technical. As the current primary maintainer i have put many painful hours into considering the nature of Perl parsing and how to adjust to changing realities without breaking existing use cases. That said, yes, keywords are a valid question, but also a completely different beast from signatures.

I've created a ticket with a suggestion for that here: #273

@wchristian
Copy link
Member

I'm closing this, as we now have a proof-of-concept: #280

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests