Replies: 3 comments
-
So after digging through the C++ source code, the answer is:

logits = generator.get_output("logits")

However, for some reason, at the first step the token with the maximum logit is different from the token the generator actually outputs:

import onnxruntime_genai as og
import numpy as np

prompt = '''<|user|>
Please tell me the time.<|end|>
<|assistant|>'''

model = og.Model("/home/ubuntu/models/Phi-3-mini-4k-instruct-onnx/cuda/cuda-fp16/")
tokenizer = og.Tokenizer(model)
tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.input_ids = tokens
generator = og.Generator(model, params)

# Compare the generated token with the argmax of the logits for the first few steps.
i = 0
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]         # token chosen by the generator
    logits = generator.get_output("logits").squeeze()  # logits for this step
    new_token2 = np.argmax(logits)                      # index of the largest logit
    print(new_token, " ", new_token2)
    i += 1
    if i > 10:
        break
print()

And the result:
-
Created an issue for this, as it looks like it needs to be investigated.
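One thing that may be worth checking while this is investigated (an assumption, not something confirmed in this thread): if the first-step logits cover every prompt position, i.e. have shape (batch, seq_len, vocab_size), then squeeze() leaves a 2-D array and np.argmax without an axis returns an index into the flattened array rather than a token id. The sketch below uses only NumPy with made-up sizes and values to show that effect; it does not call onnxruntime_genai.

```python
import numpy as np

# Hypothetical sizes and values, chosen only for illustration.
seq_len, vocab_size = 4, 10

# Stand-in for a first-step logits tensor of shape (batch, seq_len, vocab_size).
logits = np.zeros((1, seq_len, vocab_size))
logits[0, 1, 7] = 5.0  # largest value overall sits at an earlier prompt position
logits[0, 3, 2] = 3.0  # best token at the last position is token id 2

squeezed = logits.squeeze()                 # shape (seq_len, vocab_size), still 2-D
flat_index = int(np.argmax(squeezed))       # index into the flattened array
last_token = int(np.argmax(squeezed[-1]))   # argmax over the vocabulary at the last position

print(squeezed.shape, flat_index, last_token)  # prints: (4, 10) 17 2
```

If this is what happens at the first step, indexing the last position before taking the argmax would give a comparable token id; on later steps, where only one position is returned, both forms agree.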
-
See #591
-
According to the documentation, generator.get_output() should return the generated logits. In practice, this is the error message I get:

The function expects an input string. However, no matter what I put, the output is array([], dtype=float64).

What is the correct way to use this method?