docs
mythz committed May 13, 2024
1 parent 3710c55 commit 4044af9
Showing 2 changed files with 45 additions and 33 deletions.
24 changes: 15 additions & 9 deletions MyApp/_pages/about.md
@@ -8,27 +8,30 @@ Like most developers we're captivated by the amazing things large language model
have to transform the way we interact with and use technology. One of the areas they can be immediately beneficial with
is in getting help in learning how to accomplish a task or solving a particular issue.

Previously we would need to seek out answers by scanning the Internet, reading through documentation and blogs to find
out answers for ourselves. Forums and particularly Stack Overflow have been a great resource for developers in being able
to get help from other developers who have faced similar issues. But the timeliness and quality of the responses can vary
Previously we'd need to seek out answers by scanning the Internet, reading through docs, tutorials and blogs to find
out answers for ourselves. Forums and particularly Stack Overflow have been great resources for developers in being able
to get help from other devs who have faced similar issues. But the timeliness and quality of the responses can vary
based on the popularity of the question and the expertise of the person answering. Answers may also not be 100% relevant
to our specific situation, potentially requiring reading through multiple answers from multiple questions to get the help
we want.

But now, with the advent of large language models, we can get help in a more natural way by simply asking a question in
plain English and getting an immediate response that is tailored to our specific needs.
But with the advent of large language models, we can get help in a more natural way by simply asking a question in
plain English and getting an immediate response that's tailored to our specific needs.

With the rate of progress in both the quality of performance of LLMs and the hardware to run them we expect this to become
the new normal for how most people will get answers to their questions in future.

## Person vs Question

[pvq.app](https://pvq.app) was built to provide a useful platform for other developers in this new age by enlisting the help of the
[pvq.app](https://pvq.app) was created to provide a useful platform for other developers in this new age by enlisting the help of the
best Open Source and Proprietary large language models available to provide immediate and relevant answers to specific questions.
Instead of just using a single LLM to provide answers, we're using multiple models to provide different perspectives
on the same question that we'll use to analyze the strengths of different LLMs at answering different types of questions.

## Initial Base Line

For our initial dataset we've started with the top 100k questions from StackOverflow and created answers for them using
the most popular open LLM's that were ideally suited for answering technical and programming questions:
PvQ's initial dataset started with the **top 100k questions** from StackOverflow and generated **over 1 million answers**
for them using the most popular open LLMs that were ideally suited for answering technical and programming questions, including:

- [Gemma 2B](https://ai.google.dev/gemma) (2B) by Google
- [Qwen 1.5](https://github.com/QwenLM/Qwen1.5) (4B) by Qwen Team
@@ -87,12 +90,15 @@ For new questions asked we'll also include access to the best performing proprie
- [Claude 3 Haiku](https://www.anthropic.com/news/claude-3-haiku) by Anthropic
- [Llama3 70B](https://llama.meta.com/llama3/) (70B) by Meta
- [Command-R](https://cohere.com/blog/command-r) (35B) by Cohere
- [WizardLM2](https://wizardlm.github.io/WizardLM2/) (8x22B) by Microsoft
- [WizardLM2](https://wizardlm.github.io/WizardLM2/) (8x22B) by Microsoft (Mistral AI base model)
- [Claude 3 Sonnet](https://www.anthropic.com/news/claude-3-family) by Anthropic
- [Command-R+](https://cohere.com/blog/command-r-plus-microsoft-azure) (104B) by Cohere
- [GPT 4 Turbo](https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo) by OpenAI
- [Claude 3 Opus](https://www.anthropic.com/claude) by Anthropic

All models were used to answer the **Top 1000 most voted questions** on StackOverflow to evaluate their performance in
answering technical questions on our [Leaderboard](/leaderboard).

## Open Questions and Answers for all

All questions, answers and comments is publicly available for everyone to freely use under the same
54 changes: 30 additions & 24 deletions MyApp/_posts/2024-04-01_pvq-intro.md
@@ -12,27 +12,30 @@ Like most developers we're captivated by the amazing things large language model
have to transform the way we interact with and use technology. One of the areas they can be immediately beneficial with
is in getting help in learning how to accomplish a task or solving a particular issue.

Previously we would need to seek out answers by scanning the Internet, reading through documentation and blogs to find
out answers for ourselves. Forums and particularly Stack Overflow have been a great resource for developers in being able
to get help from other developers who have faced similar issues. But the timeliness and quality of the responses can vary
Previously we'd need to seek out answers by scanning the Internet, reading through docs, tutorials and blogs to find
out answers for ourselves. Forums and particularly Stack Overflow have been great resources for developers in being able
to get help from other devs who have faced similar issues. But the timeliness and quality of the responses can vary
based on the popularity of the question and the expertise of the person answering. Answers may also not be 100% relevant
to our specific situation, potentially requiring reading through multiple answers from multiple questions to get the help
we want.

But now, with the advent of large language models, we can get help in a more natural way by simply asking a question in
plain English and getting an immediate response that is tailored to our specific needs.
But with the advent of large language models, we can get help in a more natural way by simply asking a question in
plain English and getting an immediate response that's tailored to our specific needs.

With the rate of progress in both the quality of performance of LLMs and the hardware to run them we expect this to become
the new normal for how most people will get answers to their questions in future.

## Person vs Question

[pvq.app](https://pvq.app) was built to provide a useful platform for other developers in this new age by enlisting the help of the
best Open Source and Proprietary large language models available to provide immediate and relevant answers to specific questions.
Instead of just using a single LLM to provide answers, we're using multiple models to provide different perspectives
[pvq.app](https://pvq.app) was created to provide a useful platform for other developers in this new age by enlisting the help of the
best Open Source and Proprietary large language models available to provide immediate and relevant answers to specific questions.
Instead of just using a single LLM to provide answers, we're using multiple models to provide different perspectives
on the same question that we'll use to analyze the strengths of different LLMs at answering different types of questions.

## Initial Base Line

For our initial dataset we've started with the top 100k questions from StackOverflow and created answers for them using
the most popular open LLM's that were ideally suited for answering technical and programming questions:
PvQ's initial dataset started with the **top 100k questions** from StackOverflow and generated **over 1 million answers**
for them using the most popular open LLMs that were ideally suited for answering technical and programming questions, including:

- [Gemma 2B](https://ai.google.dev/gemma) (2B) by Google
- [Qwen 1.5](https://github.com/QwenLM/Qwen1.5) (4B) by Qwen Team
@@ -43,25 +46,25 @@ the most popular open LLM's that were ideally suited for answering technical and
- [Gemma 7B](https://ai.google.dev/gemma) (7B) by Google
- [Llama3 8B](https://llama.meta.com/llama3/) (8B) by Meta

For our initial pass we've evaluated how each of these models performed on the StackOverflow dataset and have published
the results on our [Leaderboard](/leaderboard) page which we're also comparing against the highest voted and accepted answers on
For our initial pass we've evaluated how each of these models performed on the StackOverflow dataset and have published
the results on our [Leaderboard](/leaderboard) page which we're also comparing against the highest voted and accepted answers on
StackOverflow to see how well they measure up against the best human answers.

### Continuously Improving Models

After evaluating the initial results we decided to remove the worst performing **Phi 2**, **Gemma 2B** and **Qwen 1.5 4B**
models from our base model lineup and replaced **Phi2** answers with **Phi3**, upgraded **Gemma 2B** to **Gemma 7B** and included the
After evaluating the initial results we decided to remove the worst performing **Phi 2**, **Gemma 2B** and **Qwen 1.5 4B**
models from our base model lineup and replaced **Phi2** answers with **Phi3**, upgraded **Gemma 2B** to **Gemma 7B** and included the
newly released **Llama3 8B** and **70B** models from Meta to our lineup.

We'll be continuously evaluating and upgrading our active models to ensure we're using the best models available.

### Answers are Graded and Ranked

In addition to answering questions, we're also enlisting the help of LLMs to help moderate answers, where all answers
(including user contributed answers) are graded and ranked based on how well and how relevant they answer the
question asked.
In addition to answering questions, we're also enlisting the help of LLMs to help moderate answers, where all answers
(including user contributed answers) are graded and ranked based on how well and how relevant they answer the
question asked.

This information is used to rank the best answers for each question which are surfaced to the top, with its grade
This information is used to rank the best answers for each question which are surfaced to the top, with its grade
displayed alongside answers to provide a review on the quality, relevance and critiques of the answer.

::: {.shadow .hover:shadow-lg}
@@ -91,24 +94,27 @@ For new questions asked we'll also include access to the best performing proprie
- [Claude 3 Haiku](https://www.anthropic.com/news/claude-3-haiku) by Anthropic
- [Llama3 70B](https://llama.meta.com/llama3/) (70B) by Meta
- [Command-R](https://cohere.com/blog/command-r) (35B) by Cohere
- [WizardLM2](https://wizardlm.github.io/WizardLM2/) (8x22B) by Microsoft
- [WizardLM2](https://wizardlm.github.io/WizardLM2/) (8x22B) by Microsoft (Mistral AI base model)
- [Claude 3 Sonnet](https://www.anthropic.com/news/claude-3-family) by Anthropic
- [Command-R+](https://cohere.com/blog/command-r-plus-microsoft-azure) (104B) by Cohere
- [GPT 4 Turbo](https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo) by OpenAI
- [Claude 3 Opus](https://www.anthropic.com/claude) by Anthropic

All models were used to answer the **Top 1000 most voted questions** on StackOverflow to evaluate their performance in
answering technical questions on our [Leaderboard](/leaderboard).

## Open Questions and Answers for all

All questions, answers and comments is publicly available for everyone to freely use under the same
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license used by StackOverflow.
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license used by StackOverflow.

## Help improve Answers

You can help improve the quality of answers by providing any kind of feedback including asking new questions,
up voting good answers, down voting bad ones, reporting inappropriate ones, correcting answers with inaccuracies or
asking the model for further clarifications on answers that are unclear.
up voting good answers, down voting bad ones, reporting inappropriate ones, correcting answers with inaccuracies or
asking the model for further clarifications on answers that are unclear.

The most active users who help curate and improve the quality of questions and answers will have the opportunity to
The most active users who help curate and improve the quality of questions and answers will have the opportunity to
become moderators where they'll have access to all our models.

We also welcome attempts to **Beat Large Language Models** by providing your own answers to questions. We'll rank
@@ -118,7 +124,7 @@ This feedback will feed back into [LeaderBoard](/leaderboard) and improve the qu

## Future Work

After having established the initial base line we'll look towards evaluating different strategies and specialized models
After having established the initial base line we'll look towards evaluating different strategies and specialized models
to see if we're able to improve the quality, ranking and grading of answers that can be provided.

## Feedback ❤️