This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Dev shadow #39

Open: wants to merge 6 commits into base: execution

@ghost ghost commented May 4, 2023

This PR verifies the feasibility of approach 3 (the "inline recording" approach).

ghost commented May 4, 2023

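Comparison of the two design approaches

The last meeting discussed two approaches: 1) simulated execution; 2) code replay.
In later iterations, the "code replay" approach was refined into the "inline recording" approach.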

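"Simulated execution" in brief

  1. On the first execution of a Python function, the function's PyCodeObject is first executed in simulation inside a Python interpreter sandbox. This produces an equivalent second PyCodeObject that contains the static graph, together with a check_fn that serves as the constraint on that static graph. One Python function may correspond to multiple static graphs, and a static graph's constraint should be understood as a cache key: satisfying the constraint is like hitting the cache key. The second PyCodeObject is then run directly.
  2. On later executions of the Python function, check_fn is used to select, from the cached static graphs, the second PyCodeObject that corresponds to the matching graph, which is then executed directly.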
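The "constraint as cache key" idea can be sketched as below. This is a minimal illustration, not code from this PR: `GuardedCache`, its methods, and the lambdas standing in for compiled code objects are all hypothetical names chosen for the example.

```python
from typing import Any, Callable, List, Optional, Tuple

CheckFn = Callable[..., bool]
CompiledFn = Callable[..., Any]


class GuardedCache:
    """Hypothetical cache: each entry pairs a check_fn with compiled code."""

    def __init__(self) -> None:
        self._entries: List[Tuple[CheckFn, CompiledFn]] = []

    def add(self, check_fn: CheckFn, compiled_fn: CompiledFn) -> None:
        self._entries.append((check_fn, compiled_fn))

    def lookup(self, *args: Any) -> Optional[CompiledFn]:
        # The first entry whose constraint is satisfied counts as a cache hit.
        for check_fn, compiled_fn in self._entries:
            if check_fn(*args):
                return compiled_fn
        return None


cache = GuardedCache()
# One Python function may own several entries, one per constraint.
cache.add(lambda x: isinstance(x, int), lambda x: x * 2)
cache.add(lambda x: isinstance(x, str), lambda x: x + x)

hit = cache.lookup(3)     # satisfies the first constraint
miss = cache.lookup(1.5)  # satisfies no constraint, so a new graph would be needed
```

A miss is the point where a real system would have to generate another static graph and register it with a new check_fn.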

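"Inline recording" in brief

  1. On the first execution of a Python function, the function's PyCodeObject is first compiled to produce a second PyCodeObject. Compared with the original PyCodeObject, the second one contains additional "inline recording" instrumentation code. This instrumentation is very dense: recording code may be inserted before and after every instruction of the original PyCodeObject. The second PyCodeObject is then executed, which triggers the recording code. During execution, the recording code ultimately produces a third PyCodeObject and a check_fn, which are cached for later use.
  2. On later executions of the Python function, check_fn is used to select, from the cached static graphs, the third PyCodeObject that corresponds to the matching graph, which is then executed directly.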
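The per-instruction hook calls can be emulated without rewriting any bytecode. The sketch below is an assumption-laden illustration: `Recorder` is a hypothetical stand-in for the hook object, and the loop at the end only imitates the calls that an instrumented code object would make at run time. Instruction names depend on the Python version.

```python
import dis


def simple(x):
    ret = 2 * x
    return ret


class Recorder:
    """Hypothetical hook object, called with the index of an original instruction."""

    def __init__(self, code):
        # Keep the original instructions so each hook call can look up its opcode.
        self.instructions = list(dis.get_instructions(code))
        self.trace = []

    def __call__(self, instruction_index):
        self.trace.append(self.instructions[instruction_index].opname)


recorder = Recorder(simple.__code__)

# Imitate the instrumented code object: one hook call per original instruction.
for index in range(len(recorder.instructions)):
    recorder(index)

print(recorder.trace)
```

In a real implementation the hook would update symbolic state instead of appending opcode names, and the calls would be emitted into the second PyCodeObject rather than driven by a loop.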

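Main differences between the two

  1. The "simulated execution" approach requires implementing a Python interpreter sandbox; the "inline recording" approach does not.
  2. The "simulated execution" approach generates only one PyCodeObject, whereas the "inline recording" approach generates two.
  3. In the "simulated execution" approach, the static graph is generated before the first run, so even the first run executes static-graph code. In the "inline recording" approach, the static graph is generated after the first run, so the first run must execute eagerly and only the second and later runs execute static-graph code.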
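The timing difference in point 3 can be shown with two toy wrappers. Everything here is hypothetical: `compile_stub` merely wraps the function and stands in for static-graph generation, and the `paths` list exists only to show which path each call took.

```python
def compile_stub(fn):
    # Placeholder for static-graph generation; it just wraps the function.
    def compiled(*args):
        return fn(*args)
    return compiled


class SimulateFirst:
    """Toy model of "simulated execution": compile before the first real run."""

    def __init__(self, fn):
        self.fn = fn
        self.compiled = None
        self.paths = []

    def __call__(self, *args):
        if self.compiled is None:
            self.compiled = compile_stub(self.fn)
        self.paths.append("static")
        return self.compiled(*args)


class RecordFirst:
    """Toy model of "inline recording": run eagerly once, then use the compiled result."""

    def __init__(self, fn):
        self.fn = fn
        self.compiled = None
        self.paths = []

    def __call__(self, *args):
        if self.compiled is None:
            self.paths.append("eager")
            result = self.fn(*args)  # the recording would happen during this run
            self.compiled = compile_stub(self.fn)
            return result
        self.paths.append("static")
        return self.compiled(*args)


def double(x):
    return 2 * x


sim, rec = SimulateFirst(double), RecordFirst(double)
results = [sim(3), sim(4), rec(3), rec(4)]
print(sim.paths, rec.paths)
```

Both wrappers return the same values; only the path taken by the first call differs.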

ghost commented May 4, 2023

Implementation approach

  1. Use SymbolicTranslator ( https://github.com/2742195759/paddle-symbolic-trace/pull/39/files#diff-bcbe5aefbe1c4ef064e7cc9edf6fcc28e802e63c166a5bad27b3b69591a5ac7aR14 ) to generate the second PyCodeObject.
  2. Use SymbolicExecutor ( https://github.com/2742195759/paddle-symbolic-trace/pull/39/files#diff-15f4ccc8aa6d7be0845f457ad6ab33714bcba81605e7a17de18bc63115b7c077R11 ) to generate the third PyCodeObject. SymbolicExecutor has two subclasses: a) NormalSymbolicExecutor ( https://github.com/2742195759/paddle-symbolic-trace/pull/39/files#diff-3f203b475efba052ac4d04e88ea0345b37eeb234733bd137d0192f9306334dbfR6 ); b) InitialSymbolicExecutor ( https://github.com/2742195759/paddle-symbolic-trace/pull/39/files#diff-7471e7e10cbef893aacf85c362aaad67890d30a8d906ec0fa88721b70dd0eb59R6 ). The former handles function calls deeper in the call stack; the latter handles the first-level function call of the call stack.
  3. The PyCodeObject generated by SymbolicTranslator contains the SymbolicExecutor inline code, so that it is triggered at run time.
  4. The InitialSymbolicExecutor.pre_RETURN_VALUE function ( https://github.com/2742195759/paddle-symbolic-trace/pull/39/files#diff-7471e7e10cbef893aacf85c362aaad67890d30a8d906ec0fa88721b70dd0eb59R12 ) ultimately generates the third PyCodeObject, which is written into SymbolicTranslatorCache ( https://github.com/2742195759/paddle-symbolic-trace/pull/39/files#diff-d2632668f941b376bba39c2f79c6b5fc1ff701cb2052cec22bf2bc70515a046e ).
  5. In the eval_frame_callback function ( https://github.com/2742195759/paddle-symbolic-trace/pull/39/files#diff-3f1202a855c3cbe633a7c73e3071ca1f477ec4c676ace6e3be840c4871f36c0bR12 ), SymbolicTranslatorCache()() is placed as the entry point of the "inline recording" approach, at a position analogous to InstructionTranslatorCache()(frame) in the "simulated execution" approach.

ghost commented May 4, 2023

Sample run output

The unit test ( https://github.com/2742195759/paddle-symbolic-trace/pull/39/files#diff-86d0716744133ffd6c4b58c1f87ae1df82bf5ecdab579a976fb56d129c5c4a11R20 ) produces the following output:

======================================== [ code object begin ] ========================================
 15           0 LOAD_GLOBAL              0 (InitialSymbolicExecutor_140393472447392)
              2 LOAD_CONST               2 (<code object simple at 0x7fafe70dd3a0, file "test_execution_base.py", line 15>)
              4 CALL_FUNCTION            1
              6 STORE_FAST               2 (symbolic_executor_140393472447392)

 16           8 LOAD_CONST               1 (2)
             10 LOAD_FAST                2 (symbolic_executor_140393472447392)
             12 LOAD_CONST               3 (0)
             14 CALL_FUNCTION            1
             16 POP_TOP
             18 LOAD_FAST                0 (x)
             20 LOAD_FAST                2 (symbolic_executor_140393472447392)
             22 LOAD_CONST               4 (1)
             24 CALL_FUNCTION            1
             26 POP_TOP
             28 BINARY_MULTIPLY
             30 LOAD_FAST                2 (symbolic_executor_140393472447392)
             32 LOAD_CONST               5 (2)
             34 CALL_FUNCTION            1
             36 POP_TOP
             38 STORE_FAST               1 (ret)
             40 LOAD_FAST                2 (symbolic_executor_140393472447392)
             42 LOAD_CONST               6 (3)
             44 CALL_FUNCTION            1
             46 POP_TOP

 17          48 LOAD_FAST                1 (ret)
             50 LOAD_FAST                2 (symbolic_executor_140393472447392)
             52 LOAD_CONST               7 (4)
             54 CALL_FUNCTION            1
             56 POP_TOP
             58 LOAD_FAST                2 (symbolic_executor_140393472447392)
             60 LOAD_METHOD              1 (pre_action)
             62 LOAD_CONST               8 (5)
             64 CALL_METHOD              1
             66 POP_TOP
             68 RETURN_VALUE

Disassembly of <code object simple at 0x7fafe70dd3a0, file "test_execution_base.py", line 15>:
 16           0 LOAD_CONST               1 (2)
              2 LOAD_FAST                0 (x)
              4 BINARY_MULTIPLY
              6 STORE_FAST               1 (ret)

 17           8 LOAD_FAST                1 (ret)
             10 RETURN_VALUE
======================================== [ code object end ] ========================================
======================================== [ code object begin ] ========================================
 15           0 LOAD_GLOBAL              0 (SIR_0)
              2 LOAD_FAST                0 (x)
              4 BUILD_TUPLE              1
              6 CALL_FUNCTION            1
              8 RETURN_VALUE
======================================== [ code object end ] ========================================
Exception ignored in: <generator object _all_is_type.<locals>.<genexpr> at 0x7fafbf213c80>
Traceback (most recent call last):
  File "/root/workspace/paddle-symbolic-trace/symbolic_trace/opcode_translator/transform.py", line 13, in eval_frame_callback
    if not need_skip_path(frame.f_code.co_filename):
  File "/root/workspace/paddle-symbolic-trace/symbolic_trace/opcode_translator/skip_files.py", line 102, in need_skip_path
    if not filepath.startswith("<"):
SystemError: <method 'startswith' of 'str' objects> returned a result with an error set
I0504 09:31:23.127987  3480 interpretercore.cc:282] New Executor is Running.
.
----------------------------------------------------------------------
Ran 1 test in 0.032s

OK

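The two PyCodeObjects above appear because that unit test inherits from DoubleTestCase, whose test method executes the given function twice. The output shows that the first execution used the second PyCodeObject, and the second execution used the third PyCodeObject.
Note: the SystemError was not introduced by this PR and has already been fixed in another PR (PaddlePaddle/Paddle@affdb4a).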

instruction = self.instructions[instruction_index]
opname = instruction.opname
method_name = f"pre_{opname}"
if not hasattr(SymbolicExecutor, method_name):
Collaborator

Are pre_action and post_action mutually exclusive? The condition checked here looks mutually exclusive.

Author

My current understanding is that they should preferably be mutually exclusive. This is of course open to discussion.

Author

@ghost ghost May 5, 2023

For the CALL_FUNCTION family of instructions, both pre_action and post_action should be needed.

def pre_action(self, instruction_index):
    instruction = self.frame.instructions[instruction_index]
    method_name = f"pre_{instruction.opname}"
    assert hasattr(self, method_name)
Collaborator

pre_RETURN_VALUE is implemented here. Does every opcode type need a pre_XXX function, or are they implemented only as needed?

Author

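All control instructions, including jump instructions, RETURN_VALUE, YIELD_VALUE and so on, need a pre_XXX implementation. For the others, implementing the post action is enough.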
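The pre/post split discussed in this thread can be sketched with name-based dispatch. This is a hypothetical toy, not the PR's SymbolicExecutor: `ToyExecutor` and its handlers are invented for illustration, and the opcode list is copied from the disassembly of `simple` in the sample output above.

```python
class ToyExecutor:
    """Hypothetical executor that looks up pre_<OPNAME> / post_<OPNAME> handlers."""

    def __init__(self, opnames):
        self.opnames = opnames
        self.events = []

    # A control instruction must be handled before it runs, because
    # nothing placed after RETURN_VALUE would ever execute.
    def pre_RETURN_VALUE(self):
        self.events.append("pre_RETURN_VALUE")

    # An ordinary instruction can be handled after it has produced its result.
    def post_BINARY_MULTIPLY(self):
        self.events.append("post_BINARY_MULTIPLY")

    def pre_action(self, instruction_index):
        self._dispatch("pre", instruction_index)

    def post_action(self, instruction_index):
        self._dispatch("post", instruction_index)

    def _dispatch(self, phase, instruction_index):
        # Instructions without a handler for this phase are skipped silently.
        handler = getattr(self, f"{phase}_{self.opnames[instruction_index]}", None)
        if handler is not None:
            handler()


opnames = ["LOAD_CONST", "LOAD_FAST", "BINARY_MULTIPLY",
           "STORE_FAST", "LOAD_FAST", "RETURN_VALUE"]
executor = ToyExecutor(opnames)
for index in range(len(opnames)):
    executor.pre_action(index)
    executor.post_action(index)

print(executor.events)
```

With this layout an instruction that needs both hooks, such as the CALL_FUNCTION family mentioned above, would simply define both a pre_ and a post_ handler.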

CLAassistant commented Jun 21, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Tian Chao seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
