
2. Getting the Corpus


We need a large corpus to train the word2vec model. All Wikipedia articles written in Turkish are available from the Wikimedia dumps. At the time of writing, the latest available dump is 20180101, which contains all articles up to 01/01/2018. Of course, you can use another corpus to train the word2vec model, but you will need to adapt it to the format the gensim library expects, as explained below.
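As a rough sketch of the download step, the snippet below fetches the compressed dump with Python's standard library. The exact URL and file name are assumptions based on the usual Wikimedia naming convention (trwiki-<date>-pages-articles.xml.bz2); check the dump index page for the file that is actually available.

```python
# Sketch of downloading the Turkish Wikipedia dump.
# The URL below is an assumption based on the standard Wikimedia naming
# convention; verify it at https://dumps.wikimedia.org/trwiki/ before running.
import urllib.request

DUMP_URL = ("https://dumps.wikimedia.org/trwiki/20180101/"
            "trwiki-20180101-pages-articles.xml.bz2")

# Download the compressed dump (several hundred megabytes) into the
# current working directory; later steps read it directly in .bz2 form.
urllib.request.urlretrieve(DUMP_URL, "trwiki-20180101-pages-articles.xml.bz2")
```

Keeping the file compressed is fine: gensim's Wikipedia tooling can read the .xml.bz2 archive directly, so there is no need to extract it before preprocessing.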

Previous: 1. Prerequisites
Next: 3. Preprocessing the Corpus