Prompt Logprobs via echo=True #82
-
I'm looking for an inference server that can generate the logprobs for the input prompt. For older OpenAI models, such as davinci, this was possible by querying the server with logprobs set and echo=True. It has since been deprecated, but I believe this is an important capability of LLMs. This feature is available via vLLM but not via llama.cpp; however, I'm looking for something that runs on macOS. Does this feature exist in optillm?
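For context, here is a minimal sketch of the legacy call being described. The model name (`davinci-002`) and whether it still accepts `echo=True` together with `logprobs` are assumptions; as noted above, OpenAI has deprecated this combination:

```python
from openai import OpenAI

client = OpenAI()

# Score the prompt itself: max_tokens=0 generates nothing, echo=True
# returns the prompt tokens, and logprobs attaches per-token logprobs.
response = client.completions.create(
    model="davinci-002",  # assumed stand-in for the older davinci models
    prompt="The capital of France is Paris.",
    max_tokens=0,
    echo=True,
    logprobs=1,
)

# First entry is None since the first token has no preceding context.
print(response.choices[0].logprobs.token_logprobs)
```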
-
@ciaran-regan-ie logprobs are still available for newer OpenAI models as well (gpt-4o-mini and gpt-4o); just set `logprobs=True` (and optionally `top_logprobs`) in the request.
Are you looking for a solution like vLLM that can give you these for, say, any model loaded from Hugging Face? It is actually supported in llama.cpp as well. For macOS, it is supported in mlx-server (see https://github.com/ml-explore/mlx-examples/blob/8fe9539af76075405b2c3071ba9657aa921d749d/llms/mlx_lm/SERVER.md#request-fields).
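To illustrate the macOS route, here is a hedged sketch of querying the mlx_lm server; the port, model name, and exact `logprobs` semantics are taken from the linked SERVER.md plus assumptions about a local setup:

```python
import requests

# Assumes the mlx_lm server is running locally, e.g.:
#   python -m mlx_lm.server --model mlx-community/Llama-3.2-1B-Instruct-4bit
# It listens on port 8080 by default.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 16,
        "temperature": 0.2,
        # Per the linked SERVER.md, logprobs is an integer giving the
        # number of top tokens to return log probabilities for.
        "logprobs": 3,
    },
)
print(resp.json())
```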
-
Logprobs are now directly supported in optillm with the new v0.0.10 release (#90).

```python
from openai import OpenAI

# The base_url and api_key are placeholders; point the client at your
# running optillm proxy.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")

messages = [{"role": "user", "content": "What is the capital of France?"}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=messages,
    temperature=0.2,
    logprobs=True,
    top_logprobs=3,
)
```
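As a follow-up, here is a sketch of reading the values back, assuming optillm mirrors the OpenAI response shape (which its use of the OpenAI client suggests):

```python
# Each entry covers one generated token; top_logprobs holds the three
# most likely alternatives requested via top_logprobs=3 above.
for token_info in response.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob)
    for alt in token_info.top_logprobs:
        print("  alt:", alt.token, alt.logprob)
```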
@codelion I believe this solves it! Thank you!