## Supplementary {.unnumbered}
Many students have expressed particular interest in extracting tables from PDFs, so here I present some tools for that specific use case. If your PDF is actually a set of images (as we discuss in this Unit), you might try 'Nougat' by Meta.
If interested, you can also view the following Colab notebook and video demonstration.
- Video Demonstration of Nougat
<iframe width="560" height="315" src="https://www.youtube.com/embed/SYO_4uhdHKM?si=d1N_QxzwFyxBRV1O" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen>
</iframe>
- [Example Notebook for Nougat](https://colab.research.google.com/drive/1oC7jK8UMEYRDAEPevn5VQweadjN4WyeW?usp=sharing#scrollTo=ortVi_5g3ADU)
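
If you would rather run Nougat locally than in Colab, the sketch below wraps its command-line tool from Python. It assumes you have installed the `nougat-ocr` package so that the `nougat` command is on your PATH; the function names, the `scan.pdf` file, and the `nougat_out` folder are placeholders of mine, not part of Nougat itself.

```python
import subprocess
from pathlib import Path


def nougat_command(pdf_path, out_dir):
    """Build the command line: nougat <pdf> -o <output directory>."""
    return ["nougat", str(pdf_path), "-o", str(out_dir)]


def run_nougat(pdf_path, out_dir="nougat_out"):
    """Run Nougat on one PDF and return the markup files it produced.

    Assumes the `nougat` CLI is installed (pip install nougat-ocr).
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if Nougat exits with an error.
    subprocess.run(nougat_command(pdf_path, out_dir), check=True)
    # Nougat writes its output as .mmd (Mathpix Markdown) files.
    return sorted(Path(out_dir).glob("*.mmd"))


if __name__ == "__main__":
    # "scan.pdf" is a placeholder; point this at your own image-based PDF.
    for result in run_nougat("scan.pdf"):
        print(result)
```

Tables in the resulting `.mmd` files are written as markup text, so you may still need a parsing step to get them into a data frame.
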
If your PDFs have the PostScript content intact (that is, they are text-based rather than scanned images), you might consider the simpler [Camelot package](https://camelot-py.readthedocs.io/en/master/).
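
As a starting point, here is a minimal sketch of a Camelot workflow: read the tables on a page and save each one as a CSV file. It assumes `camelot-py` is installed and that the PDF is text-based; the file name `report.pdf`, the `tables` output folder, and the CSV naming scheme are hypothetical choices for illustration.

```python
from pathlib import Path


def table_csv_path(pdf_path, index, out_dir="tables"):
    """Hypothetical naming scheme: <out_dir>/<pdf name>_table<N>.csv, N from 1."""
    return Path(out_dir) / f"{Path(pdf_path).stem}_table{index + 1}.csv"


def extract_tables_to_csv(pdf_path, pages="1", flavor="lattice", out_dir="tables"):
    """Extract tables with Camelot and write each one to its own CSV file."""
    # Imported here so the naming helper above works without Camelot installed.
    import camelot

    # flavor="lattice" looks for ruled lines between cells;
    # try flavor="stream" for tables laid out with whitespace only.
    tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor=flavor)

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for i, table in enumerate(tables):
        target = table_csv_path(pdf_path, i, out_dir)
        # table.df is a pandas DataFrame holding the extracted cells.
        table.df.to_csv(target, index=False, header=False)
        written.append(target)
    return written


if __name__ == "__main__":
    # "report.pdf" is a placeholder; replace it with your own file.
    for csv_file in extract_tables_to_csv("report.pdf", pages="1"):
        print(csv_file)
```

If no tables come back, switching `flavor` is usually the first thing to try, since the two modes detect table boundaries in different ways.
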