freshness blog post #29
_posts/2025-03-5-freshness.md
Outdated
Given a prompt submitted at time $$t$$, we examine the following:

- Nearest neighbor with all prompts submitted before time $$t$$
We may want to make this clearer, e.g.
- The largest similarity between the current prompt and any prompt submitted before time $$t$$
- The largest similarity between the current prompt and any prompt submitted at least one day before time $$t$$
- ...
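The suggested definition can be sketched roughly as follows. This is a minimal sketch, assuming embeddings are plain lists of floats and prompts are stored as `(timestamp, embedding)` pairs; `max_prior_similarity` and its `min_gap` parameter are hypothetical names, not code from the post. Leaving `min_gap` at zero gives the first bullet; setting it to one day gives the second.

```python
import math
from datetime import datetime, timedelta


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_prior_similarity(query_emb, query_time, history, min_gap=timedelta(0)):
    """Largest similarity between the query prompt and any prompt in
    `history` submitted before `query_time - min_gap`.

    history: list of (timestamp, embedding) pairs (hypothetical layout).
    Returns None when no earlier prompt qualifies.
    """
    cutoff = query_time - min_gap
    # Strict inequality: a prompt submitted exactly at the cutoff is excluded
    # (assumed boundary convention).
    sims = [cosine_similarity(query_emb, emb) for ts, emb in history if ts < cutoff]
    return max(sims) if sims else None


t = datetime(2025, 3, 5, 12, 0)
history = [
    (datetime(2025, 3, 1), [1.0, 0.0]),        # four days earlier
    (datetime(2025, 3, 5, 9, 0), [3.0, 4.0]),  # same day, three hours earlier
]
print(max_prior_similarity([3.0, 4.0], t, history))                             # 1.0
print(max_prior_similarity([3.0, 4.0], t, history, min_gap=timedelta(days=1)))  # 0.6
```

The toy vectors are placeholders for real embeddings; the one-day variant ignores the same-day near-copy, which is the distinction the two bullets are meant to draw.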
Edited in cd539a3
_posts/2025-03-5-freshness.md
Outdated
## How do we measure prompt duplicates?

Prompt duplicates are measured by the cosine similarity of the text embeddings (OpenAI's text-embedding-3-small). If the similarity between the embeddings of prompt $$a$$ and prompt $$b$$ is greater than or equal to 0.7, we consider the pair a duplicate. This threshold was set by manually looking through examples to determine when two prompts are asking the same thing. A random sample of prompt pairs with their similarities is provided on our hugging face.
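The duplicate rule itself is a single comparison against the threshold. A minimal sketch, assuming the embeddings have already been fetched from text-embedding-3-small; the two-dimensional vectors below are stand-ins for real embeddings, and `is_duplicate` is a hypothetical helper name.

```python
import math

DUPLICATE_THRESHOLD = 0.7  # threshold stated in the post


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_duplicate(emb_a, emb_b, threshold=DUPLICATE_THRESHOLD):
    """True when the two prompt embeddings are at least `threshold` similar."""
    return cosine_similarity(emb_a, emb_b) >= threshold


print(is_duplicate([3.0, 4.0], [4.0, 3.0]))  # similarity 0.96 -> True
print(is_duplicate([1.0, 0.0], [3.0, 4.0]))  # similarity 0.6  -> False
```

Because the comparison is `>=`, a pair sitting exactly on the threshold counts as a duplicate, matching "greater than or equal to 0.7".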
Hugging Face? HuggingFace? Don't know what's the right way to capitalize/space this lol.
Edited in cd539a3
_posts/2025-03-5-freshness.md
Outdated
While we do see a downward trend in the proportion of unique prompts over time, the decrease is plateauing. Interestingly, we also see certain dates where prompt freshness is significantly lower than on neighboring dates; we will get to why that is in the next section.
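A per-day freshness curve like the one plotted could be computed roughly as below. This is a sketch under assumptions, not the post's pipeline: a prompt counts as fresh when no strictly earlier prompt reaches the 0.7 duplicate threshold, prompts are hypothetical `(timestamp, embedding)` pairs, and days are bucketed by calendar date. `daily_freshness` is an illustrative name, and the pairwise scan is quadratic, so it only suits small samples.

```python
import math
from collections import defaultdict
from datetime import datetime


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def daily_freshness(prompts, threshold=0.7):
    """Map each calendar date to the fraction of that day's prompts that are fresh.

    A prompt is fresh if its similarity to every prompt submitted strictly
    earlier is below `threshold` (assumed definition).
    """
    prompts = sorted(prompts, key=lambda p: p[0])
    totals = defaultdict(int)
    fresh = defaultdict(int)
    for i, (ts, emb) in enumerate(prompts):
        earlier = [e for t, e in prompts[:i] if t < ts]
        is_fresh = all(cosine_similarity(emb, e) < threshold for e in earlier)
        day = ts.date()
        totals[day] += 1
        fresh[day] += int(is_fresh)
    return {day: fresh[day] / totals[day] for day in sorted(totals)}


prompts = [
    (datetime(2025, 3, 1, 9), [1.0, 0.0]),
    (datetime(2025, 3, 1, 10), [0.0, 1.0]),
    (datetime(2025, 3, 2, 9), [3.0, 4.0]),    # 0.8 similar to [0, 1]: duplicate
    (datetime(2025, 3, 2, 10), [-1.0, 0.0]),  # no earlier match: fresh
]
print(daily_freshness(prompts))  # day 1 -> 1.0, day 2 -> 0.5
```

In practice the nearest-neighbor lookup would likely go through a vector index rather than a full scan, but the per-day ratio is computed the same way.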
Can we say something a little more descriptive here? E.g.
"If you look at the above analysis, the proportion of fresh prompts decreases as a function of
However, as
Edited in cd539a3
No description provided.