Vocab Replace \t to blank issue #33

Open
@NiHaoUCAS

Description

When the corpus is:
how are you \t nice to meet you
and I run the bert-vocab command, the resulting vocab is
['<pad>', '<unk>', '<eos>', '<sos>', '<mask>', 'you', 'are', 'how', 'meet', 'nice', 'to'].
But when I change the corpus to
how are you\tnice to meet you
the result is ['<pad>', '<unk>', '<eos>', '<sos>', '<mask>', 'are', 'how', 'meet', 'to', 'you', 'younice']: the last token becomes younice.
So a space is needed on both sides of the \t; otherwise the words on either side of the tab are merged into one token.
This may not be a bug.
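
A minimal sketch that reproduces both outputs above. It assumes the vocab builder deletes tabs (replaces `\t` with an empty string) before splitting on whitespace, which is what the reported output suggests; I have not confirmed this against the actual bert-vocab code. The names `tokenize_strip_tab`, `tokenize_tab_as_space` and `build_vocab` are hypothetical and only for illustration.

```python
from collections import Counter

SPECIALS = ["<pad>", "<unk>", "<eos>", "<sos>", "<mask>"]

def tokenize_strip_tab(line):
    # Assumed current behaviour: the tab is deleted, so "you\tnice" -> "younice".
    return line.replace("\n", "").replace("\t", "").split()

def tokenize_tab_as_space(line):
    # Possible alternative: treat the tab as a separator like any other whitespace.
    return line.replace("\n", "").replace("\t", " ").split()

def build_vocab(lines, tokenize):
    counter = Counter()
    for line in lines:
        counter.update(tokenize(line))
    # Order by descending frequency, ties broken alphabetically
    # (sort by word first, then a stable sort by count).
    words = sorted(counter.items(), key=lambda kv: kv[0])
    words.sort(key=lambda kv: kv[1], reverse=True)
    return SPECIALS + [word for word, _ in words]

if __name__ == "__main__":
    print(build_vocab(["how are you\tnice to meet you"], tokenize_strip_tab))
    print(build_vocab(["how are you\tnice to meet you"], tokenize_tab_as_space))
```

With `tokenize_tab_as_space`, the corpus without spaces around the tab gives the same vocab as the corpus with spaces, so the tab acts as a word separator instead of gluing its neighbours together.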


    Labels: bug (Something isn't working)
