no prefill when decoder_input_details=True from InferenceClient #2973

lifeng-jin opened this issue Jan 30, 2025 · 0 comments

System Info

I used the official 3.0.2 Docker image to load a local Llama 3 Instruct model.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I used the official 3.0.2 Docker image to load a local Llama 3 Instruct model, and used InferenceClient to call it (see some interaction here):

output = client.text_generation("Today is a ", max_new_tokens=2, do_sample=True, temperature=1.0, details=True, decoder_input_details=True)

The output is shown below; prefill is empty.

TextGenerationOutput(generated_text='5-minute', details=TextGenerationOutputDetails(finish_reason='length', generated_tokens=2, prefill=[], tokens=[TextGenerationOutputToken(id=20, logprob=-2.1425781, special=False, text='5'), TextGenerationOutputToken(id=24401, logprob=-4.4609375, special=False, text='-minute')], best_of_sequences=None, seed=9305067545921572115, top_tokens=None))
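
For reference, a self-contained sketch of the reproduction. The endpoint URL is an assumption; point it at wherever the TGI container is actually serving:

from huggingface_hub import InferenceClient

# Assumed local endpoint for the 3.0.2 TGI container; adjust host/port as needed.
client = InferenceClient("http://localhost:8080")

output = client.text_generation(
    "Today is a ",
    max_new_tokens=2,
    do_sample=True,
    temperature=1.0,
    details=True,
    decoder_input_details=True,
)

# details.prefill should list one entry per prompt token; here it comes back as [].
print(output.details.prefill)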

Expected behavior

I expect prefill to include the tokens of the prompt as well as their logprobs, as shown in the doc here.
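
For comparison, this is what I expected to be able to do (a sketch; the attribute names follow the token entries shown in the output above):

# Expected: one entry per prompt token, each carrying an id, text, and logprob
# (the very first prompt token's logprob may be None).
for tok in output.details.prefill:
    print(tok.id, tok.text, tok.logprob)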
