Skip to content
rocky edited this page Jan 29, 2018 · 15 revisions

Continuing in the series of how I fix parse bugs encountered.

Issue 151 ultimately was missing a grammar rule, and another too permissive grammar rule was covering for that.

Here are the steps I have took to address issue #151

Step one is get a small test case and compile it. The test case provided in the issue was:

def func(a, b, c):
    while a:
        if b:
            continue
        return False
    return True

To compile it one can use the compile-file.py program in python-uncompyle6/test/stdlib

   $ cd test/stdlib
   $ pyenv 2.7.14
   $ ./compile-file.py /tmp/issue-151.py
compiling /tmp/issue-151.py to /tmp/issue-151-2.7.pyc

Let's get detailed information in decomplation: assembly code, grammar reductions and the AST tree (which will fail below):

   $ ../../bin/uncompyle6 -agt /tmp/issue-151-2.7.pyc > /tmp/issue-151.log 2>&1

It so happens that uncompyle2 decompiles correctly. So let's run that and compare the grammar. The option that corresponds to t in uncompyle2 is --showast:

    $ uncompyle2 --showast /tmp/issue-151-2.7.pyc > /tmp/issue-151-good.log

Looking at the tree for that we see:

      whilestmt

            0	SETUP_LOOP        '26'
         testexpr
            testfalse
               expr
                  3	LOAD_FAST         'a'
               jmp_false
                  6	POP_JUMP_IF_FALSE '25'
         return_stmts
		 POP_BLOCK
		 COME_FROM

Here I compared the grammar reductions with the tree that uncompyle2 has. Unfortunately it doesn't have an option to track grammar reductions.

Here is how you track this.

In the /tmp/issue-151.log log will see you see

         3     expr ::= LOAD_FAST (2)
         3     ret_expr ::= expr (2)
         3     assert_expr ::= expr (2)
         6     jmp_false ::= POP_JUMP_IF_FALSE (3)
         3-6   testfalse ::= expr jmp_false (3)
         3     testexpr ::= testfalse (3)

That first LOAD_FAST matches the load fast shown in the tree above. The token number, 2 is given in parenthesis. (The first instruction is SETUP_LOOP).

So we have expr ::= LOAD_FAST (2) in /tmp/issue-151.log and if you look in the tree you can find expr with a child LOAD_FAST. Note the 3 in both places refers to offset 3. Then a couple of lines down in /tmp/issue-151.log you will see 6 jmp_false ::= POP_JUMP_IF_FALSE (3). And you will see in the tree jmp_false has a child POP_JUMP_IF_FALSE in the tree. Then after that in /tmp/issue-151.log there is 3-6 testfalse ::= expr jmp_false (3). The 3-6 means that this rule spans offset 3 to 6 in bytecode instructions. But token number 3 in parenthesis is the last token that we have seen. In particular we have seen SETUP_LOOP LOAD_FAST POP_JUMP_IF_FALSE.

The reduction rules ret_expr and assert_expr don't appear in the tree. That's okay. What's going on is that there are things the grammar considered as possibilities. They just didn't pan out though.

Tracing on like this I finally get in issue-151-bad.log this:

L.  5:  21     expr ::= LOAD_GLOBAL (9)
L.  5:  21     ret_expr ::= expr (9)
L.  5:  21     assert_expr ::= expr (9)
L.  5:  21-24  return ::= ret_expr RETURN_VALUE (10)
L.  3:   9-24  returns ::= _stmts return (10)
L.  5:  21     stmt ::= return (10)
         3-24  returns ::= _stmts return (10)

this is where things start to go awry...
L.  3:   9     l_stmts ::= returns (10)
L.  3:   9-24  _stmts ::= _stmts stmt (10)
         3-24  _stmts ::= _stmts stmt (10)
         3     l_stmts ::= returns (10)
L.  3:   9     l_stmts_opt ::= l_stmts (10)

After the returns ::= _stmts return (10) is where we expect to see a rule like whilestmt ::= SETUP_LOOP testexpr return_stmts. Also notice that the returns starts on line three, token 9 which encompases things that should be in testexpr.

To address the grabbiness I changed:

 iflaststmtl     ::= testexpr c_stmts_opt

to:

iflaststmtl     ::= testexpr c_stmts

Okay, so I guess now I got to explain what some of the nonterminal means.

First returns. This is from uncompyle6/parser.py:

        # The "returns" nonterminal is a sequence of statements that ends in a RETURN statement.
        # In later Python versions with jump optimization, this can cause JUMPs
        # that would normally appear to be omitted.

        returns ::= return
        returns ::= _stmts return

In uncompyle6 it is called return_stmts. To try to keep nonterminal names closer to the Python AST names change the name to returns. The AST symbol would be Return.

The nonterminal iflaststmtl an "if" statement which is inside a loop (that's the "l" suffix. I think the "last" part means it is the last statement of the loop.

The nonterminl c_stmts are statements that can contain a continue statement in them. And when there is an _opt suffix it means it is optional.

So it seems unusual that iflastmtl could just contain some sort of test without a body. It's actually more complicated than that because that might not be true for for other things. So this change might cause a bug in the future.

Lastly, we add a new test case for this particular code. Run inside test directory::

$ cp /tmp/issue-151.py simple_source/bug27\+/05_while_if_continue.py
$ git add simple_source/bug27+/05_while_if_continue.py
$ ./add-test.py simple_source/bug27+/05_while_if_continue.py
$ git add -f bytecode_2.7/05_while_if_continue.pyc

Ideally though we need a whole separate process for figuring out control flow.

(adding some comments to the beginning of 05_while_if_continue.py to describe what's up.