-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsubtask_tree.py
206 lines (170 loc) · 9.4 KB
/
subtask_tree.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
import json
import base64
from openai import OpenAI
def generate_subtask_tree(llm_api_key, image_path, prompt):
client = OpenAI(api_key=llm_api_key)
with open(image_path, "rb") as image_file:
base64_image = base64.b64encode(image_file.read()).decode("utf-8")
mime_type = "image/jpeg"
if image_path.lower().endswith(".png"):
mime_type = "image/png"
message = f"""You are an advanced reasoning model responsible for decomposing a given image editing task into a structured subtask tree. Your task is to generate a well-formed subtask tree that logically organizes all necessary steps to fulfill the given user prompt. Below are key guidelines and expectations:
1. Understanding the Subtask Tree
A subtask tree is a structured representation of how the given image editing task should be broken down into smaller, logically ordered subtasks. Each node in the tree represents a subtask which is involved in the prompt, and edges represent the ordering like which subtask needs to be completed before or after which.
Each node of the tree represents the subtasks required to complete the task. The tree ensures that all necessary operations are logically ordered, meaning a subtask that depends on another must appear after its dependency.
2. Steps to Generate the Subtask Tree
• Step 1: Identify all relevant subtasks needed to fulfill the given prompt.
• Step 2: Ensure that each subtask is logically ordered, meaning operations dependent on another should be placed later in the path.
• Step 3: Each subtask should be uniquely labeled based on the object it applies to and of the format (Obj1 -> Obj2) where obj1 is to be replaced with obj2 and in case of recoloring (obj -> new color) while with removal just include (obj) which is to be removed. Example: If two objects require replacement, the subtasks should be labeled distinctly, such as Object Replacement (Obj1 -> Obj2).
• Step 4: A tree may involve multiple correct paths like subtask1 and subtask2 are not related on each other so one path can be subtask1->subtask2 and another can be subtask2->subtask1. For such cases the occurrence of these subtasks are repeated. eg. here subtask1 is appearing twice once as a parent node once as a child node. In such cases, you should number them like suubtask1(1) and subtask1(2) and subtask2(1) and subtask2(2) so now it will look like subtask1(1)->subtask2(2), subtask2(1)->subtask1(2)
• Step 5: There also might be multiple possible subtasks for a particular requirement like if a part of task is to replace the cat with a pink dog then the two possible ways are Object Replacement (cat-> pink dog) and another is Object Replacement (cat->dog) -> Object Recoloration (dog->pink)
3. Logical Constraints & Dependencies
When constructing the tree, keep in mind that you take care of the order as well like if a task involves replacing an object with something and then doing some operation on the new object then this operation should always be after the object replacement for this object since we cannot do the operation on the new object till it is actually created and in the image.
4. Input Format
The LLM will receive:
1. An image.
2. A text prompt describing the editing task.
3. A predefined list of subtasks the model supports (provided below).
Supported Subtasks
Here is the complete list of subtasks available for constructing the subtask tree:
Object Detection
Object Segmentation
Object Addition
Object Removal
Background Removal
Landmark Detection
Object Replacement
Image Upscaling
Image Captioning
Changing Scenery
Object Recoloration
Outpainting
Depth Estimation
Image Deblurring
Text Extraction
Text Replacement
Text Removal
Text Addition
Text Redaction
VQuestion Answering based on text
Keyword Highlighting
Sentiment Analysis
Caption Consistency Check
Text Detection
You must strictly use only these subtasks when constructing the tree.
5. Expected Output Format
The model should output the subtask tree in structured JSON format, where each node contains:
• Subtask Name (with object label if applicable)
• Parent Node (Parent node of that subtask)
• Execution Order (logical flow of tasks)
6. Example Inputs & Expected Outputs
Here are some example prompts along with the expected subtask trees:
Example 1
Input Prompt:
“Detect the pedestrians, remove the car and replacement the cat with rabbit and recolor the dog to pink.”
Expected Subtask Tree:
{{
"task": "Detect the pedestrians, remove the car and replacement the cat with rabbit and recolor the dog to pink",
"subtask_tree": [
{{
"subtask": "Object Detection (Pedestrian)(1)",
"parent": []
}},
{{
"subtask": "Object Removal (Car)(2)",
"parent": ["Object Detection (Pedestrian)(1)"]
}},
{{
"subtask": "Object Replacement (Cat -> Rabbit)(3)",
"parent": ["Object Removal (Car)(2)"]
}},
{{
"subtask": "Object Replacement (Cat -> Rabbit)(4)",
"parent": ["Object Detection (Pedestrian)(1)"]
}},
{{
"subtask": "Object Removal (Car)(5)",
"parent": ["Object Replacement (Cat -> Rabbit)(4)"]
}},
{{
"subtask": "Object Recoloration (Dog -> Pink Dog)(6)",
"parent": ["Object Replacement (Cat -> Rabbit)(3)", "Object Removal (Car)(5)"]
}}
]
}}
Example 2
Input Prompt:
“Update the closed signage to open while detecting the trash can and pedestrian crossing for better scene understanding. Also, remove the people for clarity.”
Expected Subtask Tree:
{{
"task": "Update the closed signage to open while detecting the trash can and pedestrian crossing for better scene understanding. Also, remove the people for clarity.",
"subtask_tree": [
{{
"subtask": "Text Replacement (CLOSED -> OPEN)(1)",
"parent": []
}},
{{
"subtask": "Object Detection (Pedestrian Crossing)(2)",
"parent": ["Text Replacement (CLOSED -> OPEN)(1)"]
}},
{{
"subtask": "Object Detection (Trash Can)(3)",
"parent": ["Text Replacement (CLOSED -> OPEN)(1)"]
}},
{{
"subtask": "Object Detection (Pedestrian Crossing)(4)",
"parent": ["Object Detection (Trash Can)(3)"]
}},
{{
"subtask": "Object Detection (Trash Can)(5)",
"parent": ["Object Detection (Pedestrian Crossing)(2)"]
}},
{{
"subtask": "Object Removal (People)(6)",
"parent": ["Object Detection (Pedestrian Crossing)(4)", "Object Detection (Trash Can)(5)"]
}}
]
}}
7. Your Task
Now, using the given input image and prompt, generate a well-structured subtask tree that adheres to the principles outlined above.
• Ensure logical ordering and clear dependencies.
• Label subtasks by object name where needed.
• Structure the output as a JSON-formatted subtask tree.
Input Details
• Image: input image
• Prompt: ["{prompt}"]
• Supported Subtasks: (See the list above)
Now, generate the correct subtask tree.
Before you generate the tree you need to make sure that for every path possible in the subtask tree all the subtasks in that tree are covered and none are skipped. Also if a prompt involves something related to object replacement then just have that you dont need to think about its prerequisites like detecting it or anything bcz it already covers it. Also make sure that there is only one start node so that would mean that there would be only one subtask which doesnt have any parent node. Also, things like the outpainting should be always at first because they will add more details to image which might be related to other subtasks. Also, it is not necessary that the path you choose has the same order as the order of subtasks mentioned in prompt. It can be different based on some dependencies which you might think are there and there might be multiple possible sequences as well like subtask1(1) -> subtask2(1) -> subtask3(1) and subtask1(1) -> subtask3(2) -> subtask2(2) but you need to make sure it is not very long. also if there is a subtask which is there at the exact same location in different possible paths then its number will be same like in this example subtask1 was always at the same location so its number was always same"""
messages = [
{
"role": "user",
"content": [
{
"type": "text",
"text": message + "\n\nYou must respond in a valid JSON format. Do not include any extra text before or after the JSON output."
},
{
"type": "image_url",
"image_url": {
"url": f"data:{mime_type};base64,{base64_image}"
}
}
]
}
]
response = client.chat.completions.create(
model="o1",
messages=messages,
max_tokens=2000
)
content = response.choices[0].message.content
if content.startswith("```json"):
content = content[7:-3]
try:
return json.loads(content)
except json.JSONDecodeError as e:
print("Failed to parse response:")
print("Raw content:", content)
raise ValueError(f"Invalid JSON response: {str(e)}") from e
return subtask_tree