DALL•E 2: panlingual?

Author(s)

Me and three AI: Google Translate, DALL•E 2, ChatGPT

Abstract

DALL•E 2 is a state-of-the-art neural network developed by OpenAI that is capable of generating original images from textual prompts. In this poster, I present a comparison of DALL•E 2's outputs when given the same prompt in various languages.

To conduct the comparison, I first translated the original prompt into several different languages using Google Translate. I then fed each of these translated prompts into DALL•E 2 and recorded the resulting images.

The results show that DALL•E 2's outputs vary depending on the language of the prompt, with the connection between the prompt and the result varying from a good match (English) to no apparent connection to anything in the prompt (Hebrew), or failing entirely (Somali).

Overall, this study highlights the importance of including a broad range of languages in the training data of ML and AI systems. Some of the specific results also raise questions about the potential cultural biases present in such systems and the need for further research in this area.

Methods

The prompt "Boy and girl playing with a soccer ball in a sunny beautiful park with a tree, photographer, professional, 4k" was translated from English into 15 other languages via Google Translate. Each translation was then fed into DALL•E 2 through the https://labs.openai.com web interface, and the resulting images (four per successful prompt) were downloaded. All translated prompts and resulting images are shown here without cherry-picking.

Each image was then scored by how well it matched the original English-language prompt: one point for the presence of each of {Boy, Girl, Play, Soccer ball, Sunny, Park, Tree, Photographer} in the image. As this study concerns the effect of language and not the artistic merits of the images themselves, whenever an image was ambiguous (e.g. is the environment a park or a garden? Is that blurry humanoid a boy or a girl? Do adults count?) it was interpreted generously. The maximum score for any image is 8; with four images per language, the maximum score for any language is 32.
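The scoring rule can be written down as a short sketch. This is only an illustration of the arithmetic described above, not the procedure actually used to fill in the results table (the judgments were made by eye); the names `FEATURES`, `image_score`, and `language_score` are hypothetical.

```python
# Features scored in each image, in the order used in the results table.
FEATURES = ("Boy", "Girl", "Play", "Soccer ball",
            "Sunny", "Park", "Tree", "Photographer")


def image_score(present):
    """Return one point per scored feature judged present in a single image."""
    present = set(present)
    unknown = present - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return len(present)


def language_score(images):
    """Sum the image scores over all images generated for one language."""
    return sum(image_score(present) for present in images)


# The four English images, transcribed from the results table.
english = [
    {"Girl", "Play", "Soccer ball", "Sunny", "Park", "Tree", "Photographer"},
    {"Girl", "Play", "Soccer ball", "Sunny", "Park", "Tree", "Photographer"},
    {"Boy", "Girl", "Play", "Soccer ball", "Sunny", "Park", "Tree"},
    {"Boy", "Girl", "Play", "Soccer ball", "Sunny", "Park", "Tree"},
]

print([image_score(img) for img in english])  # [7, 7, 7, 7]
print(language_score(english))                # 28
```

Representing each image as a set of present features keeps the generous, binary interpretation explicit: a feature either counts or it does not, and nothing else about the image affects the score.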

Prompts and resulting images

English

Prompt: Boy and girl playing with a soccer ball in a sunny beautiful park with a tree, photographer, professional, 4k

Arabic

Prompt: صبي وفتاة يلعبان بكرة القدم في حديقة مشمسة وجميلة مع شجرة ، مصور ، محترف ، 4k

Note: There may be an issue with right-to-left encoding of "4k"

Chinese (simplified)

Prompt: 男孩和女孩在阳光明媚的美丽公园里踢足球,公园里有一棵树,摄影师,专业人士,4k

Esperanto

Prompt: Knabo kaj knabino ludanta kun futbala pilko en suna bela parko kun arbo, fotisto, profesia, 4k

French

Prompt: Garçon et fille jouant avec un ballon de football dans un beau parc ensoleillé avec un arbre, photographe, professionnel, 4k

German

Prompt: Junge und Mädchen spielen mit einem Fußball in einem sonnigen, schönen Park mit einem Baum, Fotograf, Profi, 4k

Greek

Prompt: Αγόρι και κορίτσι παίζουν με μια μπάλα ποδοσφαίρου σε ένα ηλιόλουστο όμορφο πάρκο με ένα δέντρο, φωτογράφος, επαγγελματίας, 4k

Hebrew

Prompt: ילד וילדה משחקים עם כדור כדורגל בפארק שטוף שמש יפהפה עם עץ, צלם, מקצועי, 4k

Note: There may be an issue with right-to-left encoding of "4k"

Korean

Prompt: 나무, 사진작가, 전문가, 4k가 있는 햇살 가득한 아름다운 공원에서 축구공을 가지고 노는 소년 소녀

Norwegian

Prompt: Gutt og jente leker med en fotball i en solrik vakker park med et tre, fotograf, profesjonell, 4k

Russian

Prompt: Мальчик и девочка играют с футбольным мячом в солнечном красивом парке с деревом, фотограф, профессионал, 4k

Somali

Prompt: Wiil iyo gabadh ku ciyaaraya kubbadda cagta meel aad u qurux badan oo qorraxdu leedahay oo geed leh, sawir qaade, xirfadle, 4k

Spanish

Prompt: Niño y niña jugando con una pelota de fútbol en un hermoso parque soleado con un árbol, fotógrafo, profesional, 4k

Swahili

Prompt: Mvulana na msichana wakicheza na mpira wa miguu katika bustani nzuri ya jua yenye mti, mpiga picha, mtaalamu, 4k

Vietnamese

Prompt: Chàng trai và cô gái chơi bóng trong công viên xinh đẹp đầy nắng với một cái cây, nhiếp ảnh gia, chuyên nghiệp, 4k

Zulu

Prompt: Umfana nentombazane badlala ngebhola epaki elihle elinelanga elinesihlahla, umthwebuli wezithombe, uchwepheshe, 4k

Results

Language # Has: Image score Language score
Boy Girl Play Soccer ball Sunny Park Tree Photographer
English 1 0 1 1 1 1 1 1 1 7 28
2 0 1 1 1 1 1 1 1 7
3 1 1 1 1 1 1 1 0 7
4 1 1 1 1 1 1 1 0 7
Arabic 1 0 0 0 0 1 0 1 0 2 8
2 0 0 0 0 1 0 1 0 2
3 0 0 0 0 1 0 1 0 2
4 0 0 0 0 1 0 1 0 2
Chinese 1 1 1 1 1 1 1 1 0 7 28
2 1 1 1 1 1 1 1 0 7
3 1 1 1 1 1 1 1 0 7
4 1 1 1 1 1 1 1 0 7
Esperanto 1 0 0 0 0 1 1 1 0 3 12
2 0 0 0 0 0 1 1 0 2
3 0 0 0 0 1 1 1 0 3
4 0 0 0 1 1 1 1 0 4
French 1 1 1 1 1 1 1 1 0 7 28
2 1 1 1 1 1 1 1 0 7
3 1 1 1 1 1 1 1 0 7
4 1 1 1 1 1 1 1 0 7
German 1 1 1 1 1 1 1 1 0 7 28
2 1 1 1 1 1 1 1 0 7
3 1 1 1 1 1 1 1 0 7
4 1 1 1 1 1 1 1 0 7
Greek 1 0 0 0 0 1 0 1 0 2 8
2 0 0 0 0 1 0 1 0 2
3 0 0 0 0 1 0 1 0 2
4 0 0 0 0 1 0 1 0 2
Hebrew 1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
Korean 1 0 0 0 0 0 0 0 0 0 3
2 0 0 0 0 1 0 0 0 1
3 0 0 0 0 1 0 0 0 1
4 0 0 0 0 1 0 0 0 1
Norwegian 1 1 1 1 1 1 1 1 0 7 28
2 1 1 1 1 1 1 1 0 7
3 1 1 1 1 1 1 1 0 7
4 1 1 1 1 1 1 1 0 7
Russian 1 1 1 1 1 1 1 1 0 7 28
2 1 1 1 1 1 1 1 0 7
3 1 1 1 1 1 1 1 0 7
4 1 1 1 1 1 1 1 0 7
Somali N/A N/A N/A
N/A N/A
N/A N/A
N/A N/A
Spanish 1 1 1 1 1 1 1 1 0 7 27
2 1 0 1 1 1 1 1 0 6
3 1 1 1 1 1 1 1 0 7
4 1 1 1 1 1 1 1 0 7
Swahili 1 0 0 0 0 1 0 0 0 1 8
2 0 0 0 0 1 0 0 0 1
3 1 0 0 0 1 0 1 0 3
4 0 0 0 0 1 1 1 0 3
Vietnamese 1 1 1 0 0 1 1 1 0 5 20
2 1 1 0 0 1 1 1 0 5
3 1 1 0 0 1 1 1 0 5
4 1 1 0 0 1 1 1 0 5
Zulu 1 0 0 0 0 1 0 1 0 2 9
2 0 0 0 0 1 0 1 0 2
3 0 0 0 0 1 0 1 0 2
4 1 0 0 0 1 0 1 0 3
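
To make the language scores easier to compare, they can be expressed as a fraction of the 32-point maximum and sorted. The sketch below only re-uses the language scores reported in the table; the names `MAX_LANGUAGE_SCORE`, `language_scores`, and `ranked` are hypothetical, and Somali is stored as `None` because it has no score.

```python
MAX_LANGUAGE_SCORE = 32  # 4 images x 8 scored features

# Language scores copied from the results table; Somali has no score (N/A).
language_scores = {
    "English": 28, "Arabic": 8, "Chinese": 28, "Esperanto": 12,
    "French": 28, "German": 28, "Greek": 8, "Hebrew": 0,
    "Korean": 3, "Norwegian": 28, "Russian": 28, "Somali": None,
    "Spanish": 27, "Swahili": 8, "Vietnamese": 20, "Zulu": 9,
}


def ranked(scores):
    """Return (language, fraction of maximum) pairs, best first.

    Languages without a score are skipped; ties are ordered alphabetically.
    """
    scored = [
        (language, score / MAX_LANGUAGE_SCORE)
        for language, score in scores.items()
        if score is not None
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


for language, fraction in ranked(language_scores):
    print(f"{language:<12}{fraction:6.1%}")
```

With only four images per language, these fractions are a convenient summary rather than a precise measurement.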

Limitations

As the translated prompts were generated by another AI, namely Google Translate, it is entirely possible that some of these failures are due to mistranslated prompts rather than shortcomings of the DALL•E 2 model itself.

Conclusions

Based on the results of this study, DALL•E 2's performance varies considerably with the language of the prompt. English, Chinese, French, German, Norwegian, Russian, and Spanish are interpreted well by DALL•E 2's language model (each scores 27 or 28 out of 32) and produce detailed and realistic images.

Vietnamese (20 out of 32) and Esperanto (12 out of 32) are interpreted only partially, and the outputs are missing several of the salient details. Zulu, Arabic, Greek, Swahili, and Korean (between 3 and 9 out of 32) are interpreted badly, and the outputs appear to adhere only to stereotypes about the language itself rather than to the content of the text.

Finally, Hebrew and Somali are interpreted catastrophically badly, resulting in nonsensical outputs (Hebrew, which scored 0) or filtered outputs (Somali, which could not be scored at all).

Overall, these findings suggest that language plays a significant role in the performance of AI and ML systems and that further research is needed to understand and address any potential cultural biases present in such systems.