(auto-research-outputs)=
In this chapter, you'll learn how to automate the inclusion of figures and tables in LaTeX-derived research outputs, including PDFs and slides, plus how to convert those outputs to Microsoft Word documents and more. Much of what you'll see in this chapter applies to a wide range of coding languages.
This chapter has some similarities with another chapter, on {ref}`quarto`. But this chapter puts the LaTeX typesetting language front and centre, because it's the de facto standard for preparing research outputs (most journals have a LaTeX template for submission, for example), and it gives you full control over every aspect of how your outputs look. However, if you don't already know LaTeX, there is a steep-ish learning curve. If you're just looking to create some automated reports using code and text, rather than write pre-prints, working papers, journal articles, or academic-talk style slide decks, the chapter on {ref}`quarto` is going to be a better and easier fit for you.
Automating the inclusion of figures and tables in your research outputs has many benefits:
- once configured, it's clearly easier than manual updates
- your paper can update at the touch of a button
- it helps with creating a reproducible analytical pipeline (for more on these, see the {ref}`wrkflow-rap` chapter)
- it enforces structure on your project
- automation is complementary to other good practices such as version control
Let's now turn to the how.
Let's say you're writing a paper using LaTeX.
Including code outputs is pretty simple, but is slightly different for figures and tables (the two main outputs you might include).
For figures, LaTeX needs to know the path to the figures directory, `outputs/figures`, which would be set like this at the top of the document:
\usepackage{graphicx}
\graphicspath{{outputs/figures/}}
We're imagining here that we have a project structure like this:
code.py
paper.tex
outputs/
    figures/
        chart.pdf
    tables/
        reg_table.tex
Then, whenever you need to include a figure, say `chart.pdf`, you can always do it using
\begin{figure}
\includegraphics[width=\textwidth]{chart.pdf}
\caption{Example figure. \label{fig:example}}
\end{figure}
Let's pretend `chart.pdf` is generated by the most popular Python graphics library, matplotlib. The code in 'code.py' which puts the chart in the 'figures' folder could look something like this:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(range(5), range(5), s=50, c='b')
plt.savefig("outputs/figures/chart.pdf")
The important line here is `plt.savefig("outputs/figures/chart.pdf")` because it says to save the figure in the 'figures' directory. When you re-run your code, the chart ends up in the right place. When you re-compile your LaTeX document, the updated chart is included.
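One practical snag is that `plt.savefig` raises an error if the target directory doesn't exist yet, which can trip up a co-author running the code on a fresh copy of the project. Below is a minimal sketch of one way to guard against this using only the standard library. The helper name `figure_path` is hypothetical (it is not part of matplotlib), and the default folder is the one assumed in the project structure above.

```python
from pathlib import Path


def figure_path(name: str, root: Path = Path("outputs/figures")) -> Path:
    """Return the path for a figure file, creating the folder if needed.

    `figure_path` is a hypothetical helper; `root` defaults to the
    figures directory assumed in the project structure above.
    """
    # parents=True also creates 'outputs/'; exist_ok=True makes re-runs safe
    root.mkdir(parents=True, exist_ok=True)
    return root / name
```

With this in 'code.py', the save line would become `plt.savefig(figure_path("chart.pdf"))`.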
Now let's imagine you've created a table of descriptive statistics such as the one below:
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
table = tips.groupby(["smoker", "time"], observed=True)["tip"].mean().unstack().round(2)
table
This can be turned into a LaTeX table with
table.style.to_latex(caption='A Table', label='tab:descriptive')
Or perhaps you have a regression table, for example
import pandas as pd
from sklearn import datasets
import statsmodels.formula.api as smf
from stargazer.stargazer import Stargazer
diabetes = datasets.load_diabetes()
df = pd.DataFrame(diabetes.data)
df.columns = ['Age', 'Sex', 'BMI', 'ABP', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']
df['target'] = diabetes.target
est = smf.ols('target ~ Age + Sex + BMI + ABP', data=df).fit()
est2 = smf.ols('target ~ Age + Sex + BMI + ABP + S1 + S2', data=df).fit()
reg_results = Stargazer([est, est2])
reg_results
Or, using pyfixest,
import numpy as np
import pandas as pd
#import pylatex as pl # for the latex table; note: not a dependency of pyfixest - needs manual installation
from great_tables import loc, style
from IPython.display import FileLink, display
import pyfixest as pf
data = pf.get_data()
fit1 = pf.feols("Y ~ X1 + X2 | f1", data=data)
fit2 = pf.feols("Y ~ X1 + X2 | f1 + f2", data=data)
fit3 = pf.feols("Y2 ~ X1 + X2 | f1", data=data)
fit4 = pf.feols("Y2 ~ X1 + X2 | f1 + f2", data=data)
pf.etable([fit1, fit2, fit3, fit4,])
which can be cast into LaTeX using `type="tex"`.
tab = pf.etable(
[fit1, fit2, fit3, fit4],
digits=2,
type="tex",
print_tex=True,
)
tab
We'd like to export tables like this into files that can be picked up by our LaTeX document. To do this, we can use
from pathlib import Path
with open(Path('outputs/tables/reg_table.tex'), 'w') as f:
    f.write(table.style.to_latex(caption='A Table', label='tab:descriptive'))
in the first example, and
from pathlib import Path
with open(Path('outputs/tables/reg_table.tex'), 'w') as f:
    f.write(tab)
in the second. Remember that `Path`, a class from the built-in `pathlib` module, builds file paths that work regardless of which operating system you happen to be using at the time. This is especially useful when you have co-authors on different systems!
The code chunk above opens up a file in write mode in the right directory relative to `code.py`, and puts the LaTeX for the table in it. In 'paper.tex', you can then include the table with
\input{outputs/tables/reg_table.tex}
which picks up your table. If you don't want to have to add the full path to the tables directory each time, you can add this near the top of 'paper.tex':
\makeatletter
\providecommand*{\input@path}{}
\g@addto@macro\input@path{{outputs/tables/}}
\makeatother
You then need only write `\input{reg_table.tex}` in your LaTeX document.
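The two export snippets above share the same pattern, so it may be convenient to wrap it in a small function. This is a sketch only: `save_table` is a hypothetical name, and it assumes the table has already been converted to a string of LaTeX (for example by `table.style.to_latex(...)`).

```python
from pathlib import Path


def save_table(tex: str, name: str, root: Path = Path("outputs/tables")) -> Path:
    """Write a LaTeX table string to `root / name` and return the path.

    `save_table` is a hypothetical helper; `root` defaults to the
    tables directory assumed in the project structure above.
    """
    root.mkdir(parents=True, exist_ok=True)  # create outputs/tables/ if missing
    path = root / name
    # An explicit encoding avoids depending on the platform default
    path.write_text(tex, encoding="utf-8")
    return path
```

The descriptive statistics example would then reduce to `save_table(table.style.to_latex(caption='A Table', label='tab:descriptive'), 'reg_table.tex')`.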
When including your research outputs automatically, you may not want your final output to be a PDF (the standard output for LaTeX).
To perform the magic conversion to other document types (and often between types), we'll use the command line tool pandoc, which is absolutely brilliant. It can translate LaTeX into many other document formats.
To use pandoc, first install it following the instructions on the website.
To convert documents, the general syntax for pandoc looks like this:
pandoc mydoc.tex -o mydoc.docx
This is an example where the input is a .tex document and the output, given after `-o`, is a Microsoft Word docx file.
You can try this yourself using the following minimal tex file:
\documentclass{article}
\usepackage[margin=0.7in]{geometry}
\usepackage[parfill]{parskip}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb,amsfonts,amsthm}
\begin{document}
This is some text
And an equation:
\[
u'(c_{t})=\beta(1+r_{t+1})u'(c_{t+1})
\]
\section{Section Heading}
More text
\end{document}
Create a .tex file from the tex code above and convert it to a Word document using **pandoc**.
What's surprising is how effective the conversion to Word is, even if you have figures, equations, and other non-text features.
You can get quite fancy with pandoc. For example, you can translate a whole book's worth of LaTeX into a Word doc complete with a Word style, a bibliography via biblatex, equations, and figures. Nothing can save Word from being painful to use, but pandoc certainly helps. If you want to see a couple of examples, you could check out cookie-cutter-latex-book-manuscript.
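If you would like the conversion to run as part of the same automated pipeline as your Python code, pandoc can be called from Python via the standard library. The following is a sketch under stated assumptions: the function names `pandoc_command` and `convert` are hypothetical, and the only pandoc syntax used is the `pandoc input -o output` form shown above.

```python
import shutil
import subprocess


def pandoc_command(source: str, target: str) -> list[str]:
    """Build the argument list for `pandoc source -o target`."""
    return ["pandoc", source, "-o", target]


def convert(source: str, target: str) -> bool:
    """Run pandoc if it is installed; return True if the conversion ran."""
    if shutil.which("pandoc") is None:  # pandoc is not on the PATH
        return False
    # check=True raises CalledProcessError if pandoc exits with an error
    subprocess.run(pandoc_command(source, target), check=True)
    return True
```

Calling `convert("mydoc.tex", "mydoc.docx")` would then be equivalent to typing the command above in a terminal. Passing the arguments as a list, rather than one string, avoids problems with spaces in file names.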
Beamer slides can be converted in much the same way that documents can. Popular output formats for slides include PDF, HTML (via dzslides, slidy, or revealjs), and .pptx (PowerPoint).
For example, to create revealjs slides, use
pandoc -f latex -t revealjs -s --self-contained -o presentation.html presentation.tex --mathjax
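where `presentation.tex` is the input file. (Self-contained just creates a single, large output HTML file; mathjax enables equations in the HTML. Recent versions of pandoc deprecate `--self-contained` in favour of `--embed-resources --standalone`.) For PowerPoint, the equivalent is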
pandoc -f latex -t pptx -o presentation.pptx presentation.tex
As with the Word style mentioned above, you can use a reference PowerPoint file for style (passed to pandoc with the `--reference-doc` option). Here is a minimal example of the tex code for a beamer presentation:
\documentclass[aspectratio=169]{beamer}
\usepackage[english]{babel}
\usepackage[utf8x]{inputenc}
\mode<presentation>
{
\usetheme{default}
\usecolortheme{default}
\usefonttheme{default}
\setbeamertemplate{caption}[numbered]
}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\title{Title for a minimal beamer presentation}
\author{Author One}
\institute{Name of institution}
\date{\today}
\begin{document}
\begin{frame}
\titlepage
\end{frame}
\section{Section One}
\begin{frame}{Slide with bullet points}
This is a bullet list of two points:
\begin{itemize}
\item Point one
\item Point two
\end{itemize}
\end{frame}
\section{Section Two}
\begin{frame}
Slide with an equation
\[
u'(c_{t})=\beta(1+r_{t+1})u'(c_{t+1})
\]
\end{frame}
\end{document}
Create a .tex beamer file from the tex code above and convert it to a PowerPoint presentation using **pandoc**.
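If you need the same deck in more than one format, the pandoc calls for slides can be generated from one place. The sketch below only builds the commands and does not run them. The helper name `slide_command` is hypothetical, `template.pptx` in the usage note is a placeholder file name, and the flags are those used in the revealjs command above plus pandoc's `--reference-doc` option for a PowerPoint style file.

```python
from pathlib import Path
from typing import Optional


def slide_command(source: str, fmt: str, reference_doc: Optional[str] = None) -> list[str]:
    """Build (but do not run) a pandoc command for converting LaTeX slides.

    `fmt` is "revealjs" or "pptx"; `reference_doc` is an optional
    PowerPoint file whose styles pandoc should copy.
    """
    if fmt == "revealjs":
        target = str(Path(source).with_suffix(".html"))
        # Same flags as the revealjs command shown earlier
        extra = ["-s", "--self-contained", "--mathjax"]
    elif fmt == "pptx":
        target = str(Path(source).with_suffix(".pptx"))
        extra = []
        if reference_doc is not None:
            extra.append(f"--reference-doc={reference_doc}")
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return ["pandoc", "-f", "latex", "-t", fmt, *extra, "-o", target, source]
```

For example, `slide_command("presentation.tex", "pptx", "template.pptx")` returns an argument list that could be passed to `subprocess.run`.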