
## Supplementary {.unnumbered}

Many students have expressed particular interest in extracting tables from PDFs, so I present a few tools for that specific use case here. If your PDF is actually a set of images (as we discuss in this Unit), you might try 'Nougat' by Meta.

If you are interested, you can also view the following video demonstration and Colab notebook.

-   Video Demonstration of Nougat

<iframe width="560" height="315" src="https://www.youtube.com/embed/SYO_4uhdHKM?si=d1N_QxzwFyxBRV1O" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen>

</iframe>

-   [Example Notebook for Nougat](https://colab.research.google.com/drive/1oC7jK8UMEYRDAEPevn5VQweadjN4WyeW?usp=sharing#scrollTo=ortVi_5g3ADU)

If your PDFs still have their underlying text content intact (that is, they are text-based rather than scanned images), you might consider the relatively simpler [Camelot package](https://camelot-py.readthedocs.io/en/master/).
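
As a rough illustration of the Camelot route, the sketch below wraps `camelot.read_pdf` in a small helper that returns each detected table as a pandas DataFrame. The helper name `extract_tables` and its input checks are my own additions for this example, not part of Camelot, and the sketch only covers the two classic parsing modes: `lattice` (tables with ruled cell borders) and `stream` (tables separated by whitespace). It assumes `camelot-py` and its dependencies are installed.

```python
from pathlib import Path


def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Return the tables Camelot finds in a text-based PDF as pandas DataFrames.

    Hypothetical helper for illustration; only camelot.read_pdf is Camelot's own API.
    """
    # This sketch deliberately handles only the two classic parsing modes.
    if flavor not in ("lattice", "stream"):
        raise ValueError(f"unsupported flavor: {flavor!r}")
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(str(pdf_path))

    # Imported here so the helper can be defined even where Camelot is missing.
    import camelot  # pip install camelot-py

    # pages accepts strings such as "1", "1,3" or "1-end".
    tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor=flavor)

    # Each detected table exposes its contents as a DataFrame via .df
    return [table.df for table in tables]
```

For example, `extract_tables("report.pdf", pages="1-end", flavor="stream")` would return a list of DataFrames (here `report.pdf` is a placeholder file name), which you could then save with `df.to_csv(...)`. If `lattice` finds no tables, trying `stream` is often a reasonable next step, since many PDFs draw tables without visible borders.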