Whole document embedding #152

aalloul · 2020-09-14T15:29:36Z

Hi there,

I was wondering whether it makes sense to "trick" LASER into treating a whole document, made up of multiple sentences, as a single sentence. That way I'd get a whole-document embedding and wouldn't need to devise an aggregation method.
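To make the two options concrete, here is a minimal sketch comparing them. It does not use LASER at all: `toy_encoder` is a hypothetical stand-in that just buckets characters so the code runs without the model, and `embed_as_one_sentence` / `embed_by_mean_pooling` are illustrative names. Mean pooling is only one possible aggregation method, picked here because it is the simplest.

```python
from typing import Callable, List

Vector = List[float]


def toy_encoder(text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for a sentence encoder (NOT LASER).

    Counts characters into `dim` buckets so the sketch is runnable.
    """
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec


def embed_as_one_sentence(sentences: List[str], encode: Callable[[str], Vector]) -> Vector:
    # Option 1, the "trick": join the whole document into a single line
    # and encode it as if it were one sentence.
    return encode(" ".join(s.strip() for s in sentences))


def embed_by_mean_pooling(sentences: List[str], encode: Callable[[str], Vector]) -> Vector:
    # Option 2, aggregation: encode each sentence separately,
    # then average the vectors dimension by dimension.
    vectors = [encode(s) for s in sentences]
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]


if __name__ == "__main__":
    doc = ["LASER embeds sentences.", "This document has two of them."]
    print(embed_as_one_sentence(doc, toy_encoder))
    print(embed_by_mean_pooling(doc, toy_encoder))
```

Since LASER's embedding script presumably reads one sentence per input line, the "trick" in practice amounts to writing each document on a single line before embedding it.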

I know there's a limit of 12000 tokens on sentences (as per `LASER/source/embed.py`, line 343 in 5767f18: `parser.add_argument('--max-tokens', type=int, default=12000,`), but let's forget about this for now, please :)
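If the limit does come into play later, a rough pre-check is easy to sketch. This assumes the 12000 default quoted above is the relevant cap, and it counts whitespace-separated words only; LASER applies its own tokenization and BPE, so the real token count will likely be higher. `rough_token_count` and `fits_token_limit` are hypothetical helper names.

```python
from typing import List

# Assumption: use the --max-tokens default quoted above as the cap.
MAX_TOKENS = 12000


def rough_token_count(text: str) -> int:
    # Whitespace split is only a proxy: LASER's tokenizer and BPE
    # split words further, so this likely underestimates.
    return len(text.split())


def fits_token_limit(sentences: List[str], limit: int = MAX_TOKENS) -> bool:
    # Check the document as it would look after joining it into one line.
    return rough_token_count(" ".join(sentences)) <= limit


if __name__ == "__main__":
    doc = ["First sentence of the document.", "Second sentence."]
    print(rough_token_count(" ".join(doc)), fits_token_limit(doc))
```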
