
freshness blog post #29


Merged
merged 6 commits into from
Apr 19, 2025

Conversation

lisadunlap (Contributor):

No description provided.


Given a prompt submitted at time $$t$$, we examine the following:

- Nearest neighbor with all prompts submitted before time $$t$$
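The nearest-neighbor quantity above could be computed along these lines. This is a minimal sketch, not the post's implementation: the function and variable names are my own, and the embeddings are plain Python lists standing in for text-embedding-3-small vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def max_similarity_before(t, prompt_emb, history):
    """Largest similarity between the current prompt and any prompt
    submitted before time t. history is a list of (timestamp, embedding)."""
    past = [emb for ts, emb in history if ts < t]
    if not past:
        return 0.0  # no earlier prompts to compare against
    return max(cosine_similarity(prompt_emb, emb) for emb in past)
```

Shifting the cutoff (e.g. `ts < t - 1` for "at least one day before") gives the other variants in the list.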
Contributor:

We may want to make this clearer, e.g.

- The largest similarity between the current prompt and any prompt submitted before time $$t$$
- The largest similarity between the current prompt and any prompt submitted at least one day before time $$t$$
- ...

Contributor:

Edited in cd539a3


## How do we measure prompt duplicates?

Prompt duplicates are measured by the cosine similarity of their text embeddings (OpenAI's text-embedding-3-small). If the similarity between the embeddings of prompt a and prompt b is greater than or equal to 0.7, we consider them duplicates. This threshold was set by manually looking through examples to determine when two prompts are asking the same thing. A random sample of prompt pairs with their similarities is provided on our hugging face.
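The duplicate check described above can be sketched as follows. The toy vectors here are placeholders; in practice each embedding would come from the text-embedding-3-small API, and the 0.7 threshold is the one the post chose by manual inspection.

```python
import math

DUPLICATE_THRESHOLD = 0.7  # chosen by manually inspecting prompt pairs, per the post

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_duplicate(emb_a, emb_b, threshold=DUPLICATE_THRESHOLD):
    """Two prompts count as duplicates when their embedding similarity >= threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```

For example, near-parallel vectors like `[1.0, 0.0, 0.1]` and `[0.9, 0.1, 0.2]` clear the threshold, while orthogonal ones do not.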
Contributor:

Hugging Face? HuggingFace? Don't know what's the right way to capitalize/space this lol.

Contributor:

Edited in cd539a3


While we do see a downward trend in the proportion of unique prompts over time, the decrease is plateauing. Interestingly, we also see certain dates where prompt freshness is significantly lower than on neighboring dates; we will get to why that is in the next section.
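The per-day freshness curve described above could be computed roughly as follows. This is a hedged sketch with invented helper names, assuming a prompt counts as fresh when its similarity to every earlier prompt stays below the 0.7 threshold from the post.

```python
import math
from collections import defaultdict

THRESHOLD = 0.7  # duplicate threshold from the post

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def fresh_fraction_by_day(prompts):
    """prompts: chronological list of (day, embedding).
    Returns {day: fraction of that day's prompts with no near-duplicate
    among all earlier prompts}."""
    seen = []                      # embeddings of every prompt so far
    fresh = defaultdict(int)
    total = defaultdict(int)
    for day, emb in prompts:
        total[day] += 1
        if all(cosine_similarity(emb, past) < THRESHOLD for past in seen):
            fresh[day] += 1
        seen.append(emb)
    return {d: fresh[d] / total[d] for d in total}
```

On day 1 every prompt is trivially fresh (there is no history yet), which matches the 100%-freshness starting point discussed in the review below; the brute-force scan over `seen` would be replaced by a nearest-neighbor index at real scale.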
Contributor:

Can we say something a little more descriptive here? E.g.

"If you look at the above analysis, the proportion of fresh prompts decreases as a function of $$t$$. This is expected, since as $$t$$ grows, we are comparing new prompts with an ever-larger set of past prompts. For example, when $$t=1$$, there are no previous prompts, so of course, the freshness is 100%.

However, as $$t$$ grows, this number stabilizes to around 70-80% fresh prompts at a similarity threshold of 0.7. This equilibrium represents the fraction of fresh prompts that we expect chatbot arena to generate in the long run."

Contributor:

Edited in cd539a3

@aangelopoulos aangelopoulos merged commit 0b68fd7 into lmarena:main Apr 19, 2025