Commit 68463a8
simplify and proofread
1 parent 2190941 commit 68463a8

File tree

6 files changed: +31 −21 lines

Binary image files changed: 1.07 MB, 1.16 MB, 1.18 MB
docs/teaser_tldr.jpg: 701 KB

index.html (+23 −21)
@@ -30,13 +30,14 @@ <h5>ICML 2024</h5>
 </p>
 </font>
 <br>
-<img src="./docs/teaser.jpg" class="teaser-gif" style="width:100%;"><br>
-<h4 style="text-align:center"><em>Multi-Concept Prompt Learning (MCPL) pioneers the novel task of mask-free text-guided learning for multiple prompts from one scene. Our approach not only enhances current methodologies but also paves the way for novel applications, such as facilitating knowledge discovery through natural language-driven interactions between humans and machines. </em></h4>
+<img src="./docs/teaser_tldr.jpg" class="teaser-gif" style="width:100%;"><br>
+<!-- <h4 style="text-align:center"><em>Multi-Concept Prompt Learning (MCPL) pioneers the novel task of mask-free text-guided learning for multiple prompts from one scene. Our approach not only enhances current methodologies but also paves the way for novel applications, such as facilitating knowledge discovery through natural language-driven interactions between humans and machines. </em></h4> -->
+<h4 style="text-align:center"><em>TL;DR: We propose a framework that allows us to discover and manipulate multiple concepts in a given image with partial text instructions. </em></h4>
 </div>
 
-<div class="content">
+<!-- <div class="content">
 <img style="width: 100%;" src="./docs/teaser.gif" alt="teaser.">
-</div>
+</div> -->
 
 <div class="content">
 <h2 style="text-align:center;">Abstract</h2>
@@ -45,10 +46,10 @@ <h2 style="text-align:center;">Abstract</h2>
 
 <div class="content">
 <h2>Learning Multiple Concepts from Single Image and Editing</h2>
-<h3>teddybear and skateboard example</h3>
+<h3>teddy bear and skateboard example</h3>
 <div class="cat-hat-main">
-<div class="intro-text"> Our method learn multiple new concepts and assuring disentangled and precise prompt-concept correlation (verified with per-prompt attention map). <br> </div>
-<div class="intro-text"> We can then modifying each local concept by replacing the prompts/words to generate novel images (click words below to try editing).<br> </div>
+<div class="intro-text"> Our method learns multiple new concepts and ensures a disentangled, precise prompt-concept correlation (click to view per-prompt attention maps). <br> </div>
+<div class="intro-text"> We can then modify each concept by replacing its prompt/word to generate novel images (click the words below to try editing).<br> </div>
 </div>
 <br>
 
@@ -83,7 +84,8 @@ <h3>banana and basket example</h3>
 
 <div class="content">
 <h2>Discovering OOD Concepts from Medical Image and Disentangling</h2>
-</p>Our method opens an avenue for discovering/introducing new concepts the model have not seen before, from abundantly available natural language annotations such as paired textbook figures and captions. </p>
+<div class="body-text"> Our method opens an avenue for discovering/introducing new concepts the model has not seen before, from abundantly available natural-language annotations such as paired textbook figures and captions. <br> </div>
+<br><br>
 <h3>cardiac MRI example</h3>
 <div class="cat-hat-main">
 <div class="intro-text"> We learn out-of-distribution concepts using biomedical figures and their simplified captions. <br> </div>
@@ -133,36 +135,36 @@ <h3>chest X-ray example</h3>
 </div>
 
 <div class="content">
-<h2>Hypothesis generation of disease progression</h2>
-<p>Our method can also help experts/non-experts learn unfamiliar concepts from picture(s) and explore their impacts.</p>
+<h2>Hypothesis Generation of Disease Progression</h2>
+<div class="body-text"> Our method can also help experts and non-experts learn unfamiliar concepts from pictures and explore their impact. <br> </div>
 <div class="skin3d-main">
-<img class="dev-img-demo" id="skin3d-img-ori" src="./docs/melanoma_skin_3d/ms_3d_demo.png">
+<img class="dev-img-demo" id="skin3d-img-ori" src="./docs/melanoma_skin_3d/ms_3d_demo_b.png">
 <img class="dev-img" id="skin3d-img" src="./docs/melanoma_skin_3d/ms_3d_25.jpg">
 <div class="dev-text">
 
-<span style="display: block; text-align: center">Human: "how <span id ="skincancer" style="background-color: #f88000;">skin cancer</span> may develop?" </span><br>
+<span style="display: block; text-align: center">Human: "how may <span id="skincancer" style="background-color: #e55608;">skin cancer</span> develop?" </span><br>
 <input type="range" min="1" max="48" value="26" class="slider" id="range_skin3d">
 
 </div>
 </div>
 <br>
 <div class="skinreal-main">
-<img class="dev-img-demo" id="skinreal-img-ori" src="./docs/melanoma_skin_real/ms_real_demo.png">
+<img class="dev-img-demo" id="skinreal-img-ori" src="./docs/melanoma_skin_real/ms_real_demo_b.png">
 <img class="dev-img" id="skinreal-img" src="./docs/melanoma_skin_real/ms_real_25.jpg">
 <div class="dev-text">
 
-<span style="display: block; text-align: center">Human: "how <span id ="skincancer_real" style="background-color: #f88000;">skin cancer</span> may develop?" </span><br>
+<span style="display: block; text-align: center">Human: "how may <span id="skincancer_real" style="background-color: #e55608;">skin cancer</span> develop?" </span><br>
 <input type="range" min="0" max="47" value="25" class="slider2" id="range_skinreal">
 
 </div>
 </div>
 <br>
 <div class="cxray-main">
-<img class="dev-img-demo" id="cxray-img-ori" src="./docs/chest_xray_cos/chest_xray_demo.png">
+<img class="dev-img-demo" id="cxray-img-ori" src="./docs/chest_xray_cos/chest_xray_demo_b.png">
 <img class="dev-img" id="cxray-img" src="./docs/chest_xray_cos/cxrayl_25.jpg">
 <div class="dev-text">
 
-<span style="display: block; text-align: center">Human: "how <span id ="cxray_real" style="background-color: #f88000;">lung consolidation</span> may develop?" </span><br>
+<span style="display: block; text-align: center">Human: "how may <span id="cxray_real" style="background-color: #e55608;">lung consolidation</span> develop?" </span><br>
 <input type="range" min="0" max="46" value="25" class="slider2" id="range_cxray">
 
 </div>
@@ -171,30 +173,30 @@ <h2>Hypothesis generation of disease progression</h2>
 </div>
 
 <div class="content">
-<h2>Method overview</h2>
+<h2>Method Overview</h2>
 <img class="summary-img" src="./docs/method.png" style="width:100%;">
 <p>
 <i>MCPL</i> takes a sentence (top-left) and a sample image (top-right) as input, feeding them into a pre-trained text-guided diffusion model comprising a text encoder \(c_\phi\) and a denoising network \(\epsilon_\theta\). The string's multiple prompts are encoded into a sequence of embeddings which guide the network to generate images \(\tilde{X}_0\) close to the target one \(X_0\). MCPL focuses on learning multiple learnable prompts (coloured texts), updating only the embeddings \(\{v^*\}\) and \(\{v^\&\}\) of the learnable prompts while keeping \(c_\phi\) and \(\epsilon_\theta\) frozen. We introduce <i>Prompts Contrastive Loss (PromptCL)</i> to help separate multiple concepts within learnable embeddings. We also apply <i>Attention Masking (AttnMask)</i>, using masks based on the average cross-attention of prompts, to refine prompt learning on images. Optionally, we associate each learnable prompt with an adjective (e.g., "brown" and "rolling") to improve control over each learned concept, referred to as <i>Bind adj.</i>
 </p>
 </div>
 
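The frozen-model prompt-learning loop described in the method overview can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `c_phi` and `eps_theta` are stand-in linear layers for the real text encoder and denoiser, the denoising objective is reduced to a single MSE term, PromptCL is replaced by a simple cosine-similarity penalty that pushes the two learnable embeddings apart, and AttnMask is omitted.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 16  # toy embedding size

# Only the new prompt embeddings {v*} and {v&} are learnable.
v_star = torch.nn.Parameter(torch.randn(d))
v_amp = torch.nn.Parameter(torch.randn(d))
v0 = v_star.detach().clone()

# Stand-ins for the frozen text encoder c_phi and denoising network eps_theta.
c_phi = torch.nn.Linear(d, d)
eps_theta = torch.nn.Linear(d, d)
for module in (c_phi, eps_theta):
    for p in module.parameters():
        p.requires_grad_(False)

w_before = c_phi.weight.clone()
target = torch.randn(d)  # stands in for the noise-prediction target

opt = torch.optim.Adam([v_star, v_amp], lr=1e-2)
for _ in range(50):
    opt.zero_grad()
    # Denoising objective: frozen networks conditioned on the learnable prompts.
    rec = F.mse_loss(eps_theta(c_phi(v_star + v_amp)), target)
    # PromptCL stand-in: discourage the two concept embeddings from overlapping.
    cl = F.cosine_similarity(v_star, v_amp, dim=0)
    (rec + 0.1 * cl).backward()
    opt.step()
```

After training, the frozen weights are untouched and only the prompt embeddings have moved, which is the essential property of the scheme.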
 <div class="content">
-<h2>Introducing MCPL-one and MCPL-diverse training strategies</h2>
+<h2>Introducing MCPL-one and MCPL-diverse Training Strategies</h2>
 <img class="summary-img" src="./docs/041_MCv1_MCv2_v2.png" style="width:100%;">
 <p>
 Learning and composing “ball” and “box”. We learned the concepts of “ball” and “box” using different methods (top row) and composed them into unified scenes (bottom row). We compare three learning methods: Textual Inversion (Gal et al., 2022), which learns each concept separately from isolated images (left); MCPL-one, which jointly learns both concepts from uncropped examples using a single prompt string (middle); and MCPL-diverse, which advances this by learning both concepts with per-image specific relationships (right).
 </p>
 </div>
 
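The difference between the two strategies comes down to the prompt strings each one trains on, as a hypothetical sketch makes clear. The placeholder tokens `<ball*>` and `<box&>`, the filenames, and the relation words are illustrative, not the paper's exact format:

```python
# Hypothetical training images and their (illustrative) spatial relations.
images = ["scene_01.jpg", "scene_02.jpg", "scene_03.jpg"]
relations = ["on", "beside", "inside"]

# MCPL-one: every image shares a single prompt string containing both
# learnable tokens.
mcpl_one = {img: "a photo of <ball*> and <box&>" for img in images}

# MCPL-diverse: each image gets its own string encoding that image's
# specific relationship between the two concepts.
mcpl_diverse = {
    img: f"a photo of <ball*> {rel} <box&>" for img, rel in zip(images, relations)
}
```

`mcpl_one` maps all three images to one shared string, while `mcpl_diverse` yields three distinct strings, one per image.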
 <div class="content">
-<h2>Ablation studies</h2>
-<h3>comparing MCPL-diverse versus MCPL-one</h3>
+<h2>Ablation Studies</h2>
+<h3>Comparing MCPL-diverse versus MCPL-one</h3>
 <img class="experiment-img" src="./docs/ablation_per_img_diverse_vs_one_cat_hat.png">
 <p>
 Visual comparison of MCPL-diverse versus MCPL-one on per-image different-concept tasks (cat with different hats example). As MCPL-diverse is specially designed for such tasks, it outperforms MCPL-one, which fails to capture the per-image hat styles.
 </p>
 
-<h3>learning more than two concepts from a single image</h3>
+<h3>Learning more than two concepts from a single image</h3>
 <img class="experiment-img" src="./docs/ours_vs_bas_chicken.png">
 <p>
 A qualitative comparison between our method (MCPL-diverse) and mask-based approaches. MCPL-diverse, which neither uses mask inputs nor updates model parameters, shows decent results, outperforming most mask-based approaches and coming close to the SoTA Break-A-Scene. Images modified from Break-A-Scene (Avrahami et al., 2023).

style.css (+8)
@@ -304,6 +304,14 @@ p code {
 }
 
 
+.body-text {
+float: right;
+font-size: 18px;
+width: 100%;
+margin-left: auto;
+margin-right: auto;
+}
+
 .intro-text {
 float: right;
 font-size: 18px;
