The goal of the models is to reverse the typical direction of a generative text-to-image model: instead of generating an image from a text prompt, the task is to predict the text prompt that produced a given generated image.
Model 1 uses a simple encoder-decoder architecture. Image feature embeddings are obtained with ResNet-50, while a GRU-based model combined with self-attention is used as the decoder.
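A minimal PyTorch sketch of such an architecture is shown below. The dimensions, the use of the image embedding as the GRU's initial hidden state, and the placement of self-attention over the decoder states are all assumptions, since the exact wiring is not specified here:

```python
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class Encoder(nn.Module):
    """ResNet-50 backbone with the classification head removed."""
    def __init__(self, embed_dim=512):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop final fc
        self.proj = nn.Linear(2048, embed_dim)

    def forward(self, images):                    # images: (B, 3, H, W)
        feats = self.cnn(images).flatten(1)       # (B, 2048)
        return self.proj(feats)                   # (B, embed_dim)

class Decoder(nn.Module):
    """GRU decoder with self-attention applied over its hidden states."""
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512, num_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, image_embedding):   # tokens: (B, T)
        h0 = image_embedding.unsqueeze(0)         # image embedding seeds the GRU state (assumption)
        out, _ = self.gru(self.embed(tokens), h0) # (B, T, hidden_dim)
        attn_out, _ = self.attn(out, out, out)    # self-attention over decoder outputs
        return self.fc(attn_out)                  # (B, T, vocab_size) token logits
```

Training would proceed with teacher forcing and a cross-entropy loss over the predicted token logits against the ground-truth prompt tokens.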
Model 2 combines ViT and GPT-2 to generate the final prompts.
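How the two were paired is not detailed here; one common way is Hugging Face's `VisionEncoderDecoderModel`, sketched below. The checkpoint names, the example image path, and the generation settings are illustrative assumptions:

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, GPT2TokenizerFast
from PIL import Image

# Pair a ViT encoder with a GPT-2 decoder; cross-attention layers connecting
# them are added (randomly initialized) and must be fine-tuned on
# (image, prompt) pairs before generation is meaningful.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2"  # assumed checkpoints
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# GPT-2 has no pad token by default; reuse EOS for padding.
tokenizer.pad_token = tokenizer.eos_token
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# After fine-tuning, prompt prediction is a single generate() call.
image = Image.open("generated_image.png").convert("RGB")  # hypothetical input
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```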
Further, I believe the results can be improved by experimenting with other pretrained models such as OFA and CLIP/BLIP architectures. Moreover, data augmentation and better feature extraction can help improve the results.
Some of the generated prompts: