Sebastian Riedel edited this page Apr 17, 2015 · 1 revision

Exercises

  • Write a regex tokenizer that handles abbreviations such as "Mr.", "PhD." and "MSc." correctly, keeping each one as a single token.
  • Write a regex tokenizer and segmenter that tokenizes and segments lyrics from OHHLA, with one sentence per "bar". Data will be provided.
  • Write a classifier that learns to tokenize. Data is provided, along with a simple tokenizer.manual() method that can be used to create a tokenizer.
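As a possible starting point for the first exercise, here is a minimal sketch in Python using the standard `re` module. It is not the course's reference solution: the names `ABBREVIATIONS`, `TOKEN_PATTERN` and `tokenize` are made up for illustration, and the choice to split all other punctuation into single-character tokens is an assumption.

```python
import re

# Abbreviations from the first exercise that should stay single tokens.
# (Illustrative list; extend as needed.)
ABBREVIATIONS = ["Mr.", "PhD.", "MSc."]

# Alternatives are tried left to right at each position: first the escaped
# abbreviations, then a run of word characters, then any single character
# that is neither a word character nor whitespace (i.e. punctuation).
TOKEN_PATTERN = re.compile(
    "|".join(re.escape(a) for a in ABBREVIATIONS) + r"|\w+|[^\w\s]"
)


def tokenize(text):
    """Return the list of tokens in text; whitespace is skipped."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Mr. Smith has a PhD. in NLP."))
```

Listing the abbreviations before `\w+` is what keeps the trailing period attached; without them, "Mr." would come out as "Mr" followed by ".".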