I trained the YOLOR [1] model on the Manga109 dataset [2] to detect text on manga pages.
You can use the yolor.ipynb file to test the model or train it with your own dataset.
I used the Manga109 dataset [2] to train my model. If you wish to download it, you can send an application via this form. I used Roboflow to manage my dataset (splitting it and applying preprocessing and augmentations) and to download it easily into my Python notebook.
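Roboflow handled the export and split for me, so the snippet below is not the code I ran. It is only a minimal sketch of what those two steps amount to if you prepare your own dataset by hand. It assumes your annotations give pixel corner coordinates (xmin, ymin, xmax, ymax) for each text box and that you want normalized YOLO-style boxes. The function names `to_yolo_box` and `split_pages`, and the 70/20/10 split ratios, are hypothetical choices made for illustration.

```python
import random


def to_yolo_box(xmin, ymin, xmax, ymax, page_width, page_height):
    """Convert pixel corner coordinates to a normalized YOLO-style box.

    Returns (x_center, y_center, width, height), each in the range 0..1.
    """
    x_center = (xmin + xmax) / 2 / page_width
    y_center = (ymin + ymax) / 2 / page_height
    width = (xmax - xmin) / page_width
    height = (ymax - ymin) / page_height
    return x_center, y_center, width, height


def split_pages(pages, train=0.7, valid=0.2, seed=0):
    """Shuffle page names reproducibly and split them into train/valid/test.

    The ratios are illustrative assumptions; whatever is left after the
    train and valid portions goes to the test set.
    """
    pages = sorted(pages)                # fixed starting order
    random.Random(seed).shuffle(pages)   # same seed gives the same shuffle
    n_train = int(len(pages) * train)
    n_valid = int(len(pages) * valid)
    return (
        pages[:n_train],
        pages[n_train:n_train + n_valid],
        pages[n_train + n_valid:],
    )


if __name__ == "__main__":
    # A made-up text box on a made-up 1000x800 page.
    print(to_yolo_box(100, 200, 300, 400, 1000, 800))

    # Made-up page file names.
    names = [f"page_{i:03d}.jpg" for i in range(10)]
    train_set, valid_set, test_set = split_pages(names)
    print(len(train_set), len(valid_set), len(test_set))
```

Fixing the seed keeps the split the same between runs, which makes it easier to compare training runs on identical validation pages.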
You can download the trained weights here. They are the best overall checkpoint from 100 epochs of training.
Below is the result of the trained model for a page:
Read it here.
[1] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Liao. You Only Learn One Representation: Unified Network for Multiple Tasks. 2021.
[2] Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, and Hikaru Ikuta. Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications. IEEE MultiMedia, 27(2), 8-18, 2020.