TextVectorization: output_mode={multi_hot, count} promise int arrays but output floats #711

Open
divyashreepathihalli opened this issue Dec 21, 2023 · 0 comments

Issue filed in Keras by @nicdumz - keras-team/keras#18973

Documentation for output_mode currently reads:

"multi_hot": Outputs a single int array per batch, of either vocab_size or max_tokens size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item.
"count": Like "multi_hot", but the int array contains a count of the number of times the token at that index appeared in the batch item.
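To make the documented contract concrete, here is a minimal pure-Python sketch of what the two modes promise for a single batch item. The `encode` helper and the three-token vocabulary are hypothetical and written only for illustration; this is not how Keras implements the layer, and it simply drops out-of-vocabulary tokens instead of mapping them to an OOV index.

```python
from collections import Counter


def encode(tokens, vocab, output_mode):
    """Hypothetical helper illustrating the documented semantics (not Keras code)."""
    index = {tok: i for i, tok in enumerate(vocab)}
    # Count how often each known token occurs in this batch item.
    counts = Counter(index[t] for t in tokens if t in index)
    if output_mode == "count":
        return [counts.get(i, 0) for i in range(len(vocab))]
    if output_mode == "multi_hot":
        return [1 if counts.get(i, 0) > 0 else 0 for i in range(len(vocab))]
    raise ValueError(f"unsupported output_mode: {output_mode!r}")


vocab = ["foo", "bar", "baz"]          # assumed vocabulary, one slot per token
tokens = "bar baz bar".split()

count_vec = encode(tokens, vocab, "count")
multi_hot_vec = encode(tokens, vocab, "multi_hot")

print(count_vec)      # [0, 2, 1]
print(multi_hot_vec)  # [0, 1, 1]
```

Both results are integer-valued, which is what the wording "int array" would lead a reader to expect from the layer.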

repro

import tensorflow as tf, tensorflow.version as tv

print(f"{tv.VERSION}, {tv.COMPILER_VERSION}, {tv.GIT_VERSION}")

v = tf.keras.layers.TextVectorization(output_mode="count")
v.adapt(["foo", "bar", "baz"])
print(v(["bar baz"]).dtype)

output

2.15.0, Ubuntu Clang 17.0.2 (++20231003073124+b2417f51dbbd-1~exp1~20231003073217.50), v2.15.0-2-g0b15fdfcb3f
<dtype: 'float32'>
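Until the documentation or the behavior changes, one possible workaround is to cast the layer output explicitly. This is only a sketch: it reuses the repro above, and the choice of `tf.int64` as the target dtype is an assumption, not something the documentation prescribes.

```python
import tensorflow as tf

v = tf.keras.layers.TextVectorization(output_mode="count")
v.adapt(["foo", "bar", "baz"])

counts = v(["bar baz"])                 # float32 in the repro above
int_counts = tf.cast(counts, tf.int64)  # explicit cast; int64 is an assumed choice

print(int_counts.dtype)
```

The cast is lossless here because count and multi-hot values are small non-negative whole numbers.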