+------------------------------------------+
| State Observation                        |
| (vehicle position, task demands, etc.)   | ← [state information correctly included]
+------------------------------------------+
|
v
+------------------------------+
| Actor Network                |
| (policy network πθ(s))       | ← [policy network correctly represented]
+------------------------------+
|
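+-----------------------------------+|+-----------------------------------+
| Probability: local computing (p1) || Probability: offload to UAV (p2)  | ← [ground server option needs to be added]
+-----------------------------------+|+-----------------------------------+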
|
v
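+-------------------------------------------------+
| Action Selection                                |
| p1/p2 offloading decision + resource allocation | ← [resource allocation part needs to be added]
+-------------------------------------------------+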
|
v
+---------------------------+
| Environment Interaction   |
| - Computation delay       |
| - Resource allocation     | ← [environment interaction is correct]
| - Reward feedback         |
+---------------------------+
|
v
+---------------------------+
| Critic Network            |
| (value network V(s))      | ← [value network correctly represented]
+---------------------------+
|
v
+--------------------------------------+
| PPO policy update:                   |
| 1. Policy objective: L^CLIP(θ)       | ← [PPO update mechanism is correct]
| 2. Limit the update size (clip ε)    |
| 3. Use the advantage function A(s,a) |
+--------------------------------------+
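
The update step at the bottom of the diagram can be sketched in plain Python. This is a minimal illustration, not the diagram author's implementation: the function names, the one-step advantage estimate, the discount factor gamma = 0.99, and the clip value ε = 0.2 are all assumptions chosen for the example. The actor's logits are turned into offloading probabilities (p1, p2), an action is sampled, and the clipped surrogate objective is averaged over a batch.

```python
import math
import random


def softmax(logits):
    """Turn actor logits into action probabilities, e.g. [p1, p2]."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Sample an action index (0 = local computing, 1 = offload to UAV)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def one_step_advantage(reward, v_s, v_next, gamma=0.99):
    """Assumed advantage estimate: A(s,a) = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_s


def clipped_surrogate(new_prob, old_prob, advantage, eps=0.2):
    """Per-sample PPO term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    """Batch average of the clipped surrogate; the actor maximizes this."""
    terms = [
        clipped_surrogate(n, o, a, eps)
        for n, o, a in zip(new_probs, old_probs, advantages)
    ]
    return sum(terms) / len(terms)
```

With a positive advantage, the clip caps how much the objective can gain from raising an action's probability; with a negative advantage, the unclipped term is kept when it is the smaller one, so a harmful move away from the old policy is still fully penalized. Adding a ground-server option, as the annotation in the diagram requests, would only require a third logit in `softmax`.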