-
-
Notifications
You must be signed in to change notification settings - Fork 431
Fixing Issue #151
Continuing in the series of how I fix parse bugs encountered.
Issue 151 ultimately was missing a grammar rule, and another too permissive grammar rule was covering for that.
Here are the steps I have took to address issue #151
Step one is get a small test case and compile it. The test case provided in the issue was:
def func(a, b, c):
while a:
if b:
continue
return False
return True
To compile it one can use the compile-file.py program in python-uncompyle6/test/stdlib
$ cd test/stdlib
$ pyenv 2.7.14
$ ./compile-file.py /tmp/issue-151.py
compiling /tmp/issue-151.py to /tmp/issue-151-2.7.pyc
Let's get detailed information in decomplation: assembly code, grammar reductions and the AST tree (which will fail below):
$ ../../bin/uncompyle6 -agt /tmp/issue-151-2.7.pyc > /tmp/issue-151.log 2>&1
It so happens that uncompyle2
decompiles correctly. So let's run that
and compare the grammar. The option that corresponds to t
in uncompyle2
is --showast
:
$ uncompyle2 --showast /tmp/issue-151-2.7.pyc > /tmp/issue-151-good.log
Looking at the tree for that we see:
whilestmt
0 SETUP_LOOP '26'
testexpr
testfalse
expr
3 LOAD_FAST 'a'
jmp_false
6 POP_JUMP_IF_FALSE '25'
return_stmts
POP_BLOCK
COME_FROM
Here I compared the grammar reductions with the tree that uncompyle2 has. Unfortunately it doesn't have an option to track grammar reductions.
Here is how you track this.
In the /tmp/issue-151.log
log will see you see
3 expr ::= LOAD_FAST (2)
3 ret_expr ::= expr (2)
3 assert_expr ::= expr (2)
6 jmp_false ::= POP_JUMP_IF_FALSE (3)
3-6 testfalse ::= expr jmp_false (3)
3 testexpr ::= testfalse (3)
That first LOAD_FAST
matches the load fast shown in the tree above. The token number, 2
is given in parenthesis. (The first instruction is SETUP_LOOP
).
So we have expr ::= LOAD_FAST (2)
in /tmp/issue-151.log
and if you
look in the tree you can find expr
with a child LOAD_FAST
. Note
the 3 in both places refers to offset 3. Then a couple of lines down
in /tmp/issue-151.log
you will see 6 jmp_false ::= POP_JUMP_IF_FALSE (3)
. And you will see in the tree jmp_false
has a
child POP_JUMP_IF_FALSE
in the tree. Then after that in
/tmp/issue-151.log
there is 3-6 testfalse ::= expr jmp_false (3)
. The 3-6 means that this rule spans offset 3 to 6 in bytecode
instructions. But token number 3 in parenthesis is the last token that we have seen. In
particular we have seen SETUP_LOOP LOAD_FAST POP_JUMP_IF_FALSE
.
The reduction rules ret_expr
and assert_expr
don't appear in the
tree. That's okay. What's going on is that there are things the
grammar considered as possibilities. They just didn't pan out though.
Tracing on like this I finally get in issue-151-bad.log
this:
L. 5: 21 expr ::= LOAD_GLOBAL (9)
L. 5: 21 ret_expr ::= expr (9)
L. 5: 21 assert_expr ::= expr (9)
L. 5: 21-24 return ::= ret_expr RETURN_VALUE (10)
L. 3: 9-24 returns ::= _stmts return (10)
L. 5: 21 stmt ::= return (10)
3-24 returns ::= _stmts return (10)
this is where things start to go awry...
L. 3: 9 l_stmts ::= returns (10)
L. 3: 9-24 _stmts ::= _stmts stmt (10)
3-24 _stmts ::= _stmts stmt (10)
3 l_stmts ::= returns (10)
L. 3: 9 l_stmts_opt ::= l_stmts (10)
After the returns ::= _stmts return (10)
is where we expect to see a rule like
whilestmt ::= SETUP_LOOP testexpr return_stmts
. Also notice that the returns
starts
on line three, token 9 which encompases things that should be in testexpr
.
To address the grabbiness I changed:
iflaststmtl ::= testexpr c_stmts_opt
to:
iflaststmtl ::= testexpr c_stmts
Okay, so I guess now I got to explain what some of the nonterminal means.
First returns
. This is from uncompyle6/parser.py
:
# The "returns" nonterminal is a sequence of statements that ends in a RETURN statement.
# In later Python versions with jump optimization, this can cause JUMPs
# that would normally appear to be omitted.
returns ::= return
returns ::= _stmts return
In uncompyle6 it is called return_stmts
. To try to keep nonterminal names closer to the
Python AST names change the name to returns
. The AST symbol would be Return
.
The nonterminal iflaststmtl
an "if" statement which is inside a loop
(that's the "l" suffix. I think the "last" part means it is the last statement of the loop.
The nonterminl c_stmts
are statements that can contain a continue
statement in them. And when there is an _opt
suffix it means it is optional.
So it seems unusual that iflastmtl
could just contain some sort of
test without a body. It's actually more complicated than that because that might not be
true for for other things. So this change might cause a bug in the future.
Lastly, we add a new test case for this particular code. Run inside test
directory::
$ cp /tmp/issue-151.py simple_source/bug27\+/05_while_if_continue.py
$ git add simple_source/bug27+/05_while_if_continue.py
$ ./add-test.py simple_source/bug27+/05_while_if_continue.py
$ git add -f bytecode_2.7/05_while_if_continue.pyc
Ideally though we need a whole separate process for figuring out control flow.
(adding some comments to the beginning of 05_while_if_continue.py
to describe what's up.