index.qmd

# Planning

-   9h-10h: Introduction to Computo and Quarto
-   10h-10h30: Coffee break/Discussion
-   10h30-11h30: Hands-on with a toy example
-   11h30-12h30: Follow-up and optional personal article submission

## Learning objectives

-   Understand the benefits of reproducible research
-   Learn how to create a quarto document
-   Learn how to include code, data, and narrative text in a quarto document
-   Learn how to submit a quarto document to Computo
-   How to navigate the Computo submission process (optional)

# Short introduction to Computo and quarto

## Team

::: columns
::: {.column width="60%"}
Editorial board
:::

::: {.column width="10%"}
:::

::: {.column width="20%"}
IT support
:::
:::

::: columns
::: {.column width="20%"}
::: people
![](./people/julien.jpg){width="120"}

#### Julien Chiquet <small>(chief editor)</small>

<small>Stat. learning, DR INRAE<br/> Paris-Saclay University</small>
:::
:::

::: {.column width="20%"}
::: people
![](./people/pierre.jpg){width="120"}

#### Pierre Neuvial

<small>Statistique, DR CNRS<br/> IMT Toulouse<br/></small>
:::
:::

::: {.column width="20%"}
::: people
![](./people/mathurin.jpg){width="120"}

#### Mathurin Massias

<small>Optim./Machine-Learning<br/> CR INRIA Lyon</small>
:::
:::

::: {.column width="10%"}
:::

::: {.column width="20%"}
::: people
![](./people/fradav.jpg){width="120"}

#### Fra.-Dav. Collin

<small>CS/Stats/ML, IR CNRS<br/> IMAG, Montpellier University</small>
:::
:::
:::

::: columns
::: {.column width="20%"}
::: people
![](./people/nelle.jpg){width="120"}

#### Nelle Varoquaux

<small>Machine learning, CR CNRS<br/> Grenoble Alpes University</small>
:::
:::

::: {.column width="20%"}
::: people
![](./people/marie-pierre.jpg){width="120"}

#### Marie-Pierre Étienne

<small>Statistics, MCF<br/> Institut Agro Rennes-Angers</small>
:::
:::

::: {.column width="20%"}
::: people
![](./people/chloe.jpg){width="120"}

#### Chloé Azencott

<small>Machine Learning <br/> CR MinesParisTech</small>
:::
:::

::: {.column width="10%"}
:::

::: {.column width="20%"}
::: people
![](./people/ghislain.jpg){width="120"}

#### Ghislain Durif

<small>Stats/ML/dev, IR CNRS<br/> LBMC, ENS LYON</small>
:::
:::
:::

## What is reproducible research?

Fundamentally, it provides three things:

:::: { layout="[[20,80],[20,80],[20,80]]" layout-valign="center"}

![](img/mix.svg){ width="100" }

:::{.fragment}
Tools to reproduce the results (that’s like cooking)
:::


![](img/recipe.svg){ width="100" }

:::{.fragment}
A “recipe” to reproduce the results (still like cooking)
:::

![](img/head-side-thinking.svg){ width="100" height="100" }

:::{.fragment}
A path to understanding the results and the process that led to them (unlike cooking…^[Even so, we may discuss the fact that blindly following recipes will not make you a good cook.])
:::

::::

## Pre-Computo era

:::::{.r-stack}
::: {layout-ncol=2 .fragment }

::::{}
![](img/gutenberg-press.png)
::::

::::{}
The pdf era and paper submission.

The reproducibility was not a priority:

- Tools has to be bought, installed, and maintained
- Data and code were not shared (social engineering
- Even methodology details are often missing
::::


:::

![](img/perplex-scentist.jpg){ .center .fragment width="600"}

:::::

::: {.notes}
Social engineering is required to get reproducible results: at best you just have to ask the authors, at worst you have to reverse-engineer everything… and have no guarantee whatsoever to get to the same results, when authors are in a "Après moi, le déluge" mood and ditch, forget or let it crumble once the paper is published.
:::

## Pre-Computo era (2)


![](img/minority-report_header.webp){ width="600" fig-align="center" }

:::{.fragment}

And then in the Machine Learning domain, there was [distill.pub](https://distill.pub/2016/misread-tsne/){preview-link="true"} @olah2017research

::::{.incremental}
- state-of-the art visualizations
- paradigm shift in the scientific publication: **“distillation” of complex ideas**
- 100% reproducible (just a git clone and a few standard commands)
::::

:::

:::{ .fragment style="font-size: 1.8em;"}
but…
:::

## Pre-Computo era (3)

:::{ layout="[45,-10,45]"}

![](img/hard-work.webp)

:::::{}

:::{.fragment}
… **engineering was too complex** for the average scientist (a lot of javascript, etc.)
:::

:::{.fragment}
In fact, the distill.pub project was discontinued in 2021 @team2021distill
:::

:::::{.content-hidden unless-format="revealjs"}
::::{.r-stack}

:::{.r-fit-text layout-align="center" style="color: lightgray;"}
distill.pub
:::

![](img/RIP.png){width="400" fig-align="center" .fragment }

::::

:::::
:::::
:::

::: {.notes}
Due to a series of burnouts from the staff
:::

:::::{.content-hidden unless-format="revealjs"}

## The Rise of the Pragmatic { auto-animate=true }

[distill.pub](https://distill.pub)’s goals were right, but they outpaced themselves in terms of development complexity.

:::{.incremental}
- **Computo** is a fresh start with a pragmatic approach
- leverage what the scientific community is already using (Rmarkdown, Jupyter notebooks, etc.)
:::


:::{ .fragment .r-fit-text }
$\Rightarrow$ *bring the community* to the higher standards
:::

## The Rise of the Pragmatic { auto-animate=true auto-animate-easing="ease-in-out }

[distill.pub](https://distill.pub)’s goals were right, but they outpaced themselves in terms of development complexity.

- **Computo** is a fresh start with a pragmatic approach
- leverage what the scientific community is already using (Rmarkdown, Jupyter notebooks, etc.)

:::{ data-id="where-computo-stands" }
![](figures/where-computo-stands1.svg)
:::
:::::

## The Rise of the Pragmatic { auto-animate=true auto-animate-easing="ease-in-out }

[distill.pub](https://distill.pub)’s goals were right, but they outpaced themselves in terms of development complexity.

- **Computo** is a fresh start with a pragmatic approach
- leverage what the scientific community is already using (Rmarkdown, Jupyter notebooks, etc.)

:::{ data-id="where-computo-stands" }
![](figures/where-computo-stands2.svg)
:::

## Origin of Computo (\~ 2020s)

[French Statistical Society](https://www.sfds.asso.fr/) appoints a "publication" committee (led by Julien then Pierre) to develop a new journal

::::{layout-ncol="2"}

:::{.callout-note .fragment}

### Assessment

:::{.incremental .no-bullets}
- 😔 Multiplication of "traditional" journals...
- 😔 No valorization of "negative" results
- 😥 No or not enough valorization of source codes and case studies
- 😱 ↘ of publication quality and time dedicated to each article (on author or reviewer sides) [@hanson2023]
- 😱 Issue with *scientific* reproducibility (analyses, experiments) [@ioannidis2005; @steen2011; @allison2016; @bastian2016; @whitfield2021; @hernandez2023]
:::
:::

::: {.callout-tip .fragment}
### Point of view

-   Need for renewal regarding scientific research implementation
-   Need for higher standards regarding result publications
:::
::::

:::{.fragment layout-align="center" .r-fit-text}
⇝ Emergence of "Computo" idea
:::

## Philosophy

::: callout-note
### Scientific perimeter

Promote contribution in **statistics** and **machine learning** that provide insight into which models or methods are more appropriate to address a specific scientific question
:::

::: callout-tip
### Open access

::: columns
::: {.column width="90%"}
-   ["Diamond" open access](https://en.wikipedia.org/wiki/Open_access#Diamond/platinum_OA) (free to publish and free to read, possible to reuse)
-   🅭 🅯 Content published under [CC-BY](https://creativecommons.org/licenses/by/4.0/) license (attribution, share, adapt)
-   Reviews and discussions available after acceptance for publication (anonymous reviews)

⇝ In accordance with [Budapest Open Access Initiative (BOAI)](https://www.budapestopenaccessinitiative.org/) and [Plan S](https://www.coalition-s.org/addendum-to-the-coalition-s-guidance-on-the-implementation-of-plan-s/principles-and-implementation/)
:::

::: {.column width="10%"}
![](img/open_access_logo.png){width="70"}
:::
:::
:::

::: callout-important
### Reproducible

-   Numerical (statistical) reproducibility is a necessary condition
-   Source code and data should be available, at least partly executed and fully executable
:::

## Note on reproducible research [@desquilbet2019; @hejblum2020; @the_turing_way2022]

#### Why reproduce scientific results?

-   To strengthen their credibility
-   To check for errors (everyone makes errors at some point!!!)
-   To build new research upon them (science is incremental)

#### Issues?

-   Reproduce numerical scientific results is often difficult (technology/environment evolution, source code/environment configuration/software partially available or not available)
-   Waste of time and resources to reproduce existing non-reproducible results

#### Reproducible research?

-   For others but also **for your future self**
-   Improve result credibility
-   Facilitate future research works

## Setup

Official launch at the end of 2021

<center>[![](img/computo_website.png){height="280px"}](https://computo.sfds.asso.fr) [![](img/computo_github-group.png){height="280px"}](https://github.com/computorg/)</center>

### "Economical" model

-   A few tenacious people...
-   Free/Open-source community tools (Pandoc, Quarto, Git forge)
-   Institutional support (INRAE, INRIA, CNRS, SFdS)

## Functioning

::: columns
::: {.column width="65%"}
### Writing system

Notebook and literate programming</br><small>text (markdown) + math ($\LaTeX$) + code (Python/R/Julia), references (bib$\TeX$)</small>

### Publication system

Environment management, Compilation, Multi-format publication (pdf, html)<br><small>Continuous integration/Continuous deployment (CI/CD)</small>

### Reviewing system

-   Anonymous exchange published after acceptance
-   Reviewer pool (you can join)
-   \[*Ongoing switch from Scholastica to Open review*\]
:::

::: {.column width="35%"}
#### Solutions/Prototype

Reproducible articles and computations

[![](img/quarto.png){height="40px"}](https://quarto.org)

Automatic editorial reproducibility

[![](img/github_actions.png){height="80px"}](https://github.com/features/actions)

Scientific validation

[![](img/openreview.png){height="80px"}](https://openreview.net/)
:::
:::

## Note on literate programming

<br>

-   Literate programming [@knuth1984]: notebook including text and code
-   Markup formatting language: e.g. [`markdown`](https://fr.wikipedia.org/wiki/Markdown)
-   Separate content from rendering (≠ ["what you see is what you get"](https://en.wikipedia.org/wiki/WYSIWYG) editors)
-   Rendering includes text, code and results (from code computations)

<br>

````{markdown}
---
title: "My article"
---

We compute 1+1:

```{{r}}
1+1
```
````

## Note on quarto

<br>

::: r-stack
[![](img/quarto.png){height="80px"}](https://quarto.org) <https://quarto.org>
:::

<br>

-   Generalization of [`Rmarkdown`](https://pkgs.rstudio.com/rmarkdown/)
-   Relying on top community tools like universal document converter [`Pandoc`](https://pandoc.org/)
-   Developed and supported by RStudio/Posit
-   Native support of complex documents (website, articles, books) and multiple languages for computations (R, Python, Julia)
-   Management of references, citations, figures, tables, metadata, etc.

## Note on [continuous integration](https://en.wikipedia.org/wiki/Continuous_integration)

-   Implementation in git forges (e.g. [github actions](https://github.com/features/actions) or [gitlab CI/CD](https://about.gitlab.com/topics/ci-cd/))
-   Triggered by commits
-   Automatic tests
-   Automatic deployment: package/software publication, website

[![](img/continuous_integration.jpg){width="600px" fig-align="center"}](https://en.wikipedia.org/wiki/File:Continuous_Integration.jpg)

::: r-stack
<small>Credit: Pratik89Roy [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en) from [Wikimedia](https://en.wikipedia.org/wiki/File:Continuous_Integration.jpg)</small>
:::

## Tools for authors

::: columns
::: {.column width="50%"}
#### Document model

[`quarto` Computo extension](https://computorg.github.io/computo-quarto-extension/) [![](img/computo_repo_quarto_extension.png){height="300px"}](https://github.com/computorg/computo-quarto-extension)
:::

::: {.column width="50%"}
#### Document template

[Git `template` repository](https://computo.sfds.asso.fr/repos/)

with template notebook document + doc + pre-configured compilation and publication setup

[![](img/computo_template_repositories.png)](https://computo.sfds.asso.fr/repos/)
:::
:::

### Locally

::: columns
::: {.column width="55%"}
-   Text editor/IDE (VS Code, Rstudio, NeoVim, etc.)
-   Quarto (compilation)
:::

::: {.column width="45%"}
-   Julia / R / Python code + computations
-   git versioning system
:::
:::

## Author point of view (1/3)

<br>

### Step 0: setup a git repository for your article

::: columns
::: {.column width="58%"}
Startup from a template repository ([R](https://github.com/computorg/template-computo-R), [Python](https://github.com/computorg/template-computo-python), [Julia](https://github.com/computorg/template-computo-julia))

::: callout-tip
You can host your git repository on [**github**](https://github.com) and soon an any **gitlab** forge[^1].
:::
:::

::: {.column width="42%"}
[![](img/computo_use_template.png)](https://github.com/computorg/template-computo-R)
:::
:::

[^1]: with CI/CD support

<br>

### Step 1: write your article

Let's go, locally (same spirit as Jupyter/Rmarkdown notebooks)

## Author point of view (2/3)

### Step 2: configure the environment (dependencies management)

::: panel-tabset
#### Example in Python

::: columns
::: {.column width="75%"}
`venv`: use a virtual environment and generate the `requirements.txt` file
:::

::: {.column width="25%"}
``` yaml
# requirements.txt
jupyter
matplotlib
numpy
```
:::
:::

#### Example in R

::: columns
::: {.column width="58%"}
`renv`: generate the `renv.lock` file
:::

::: {.column width="42%"}
``` r
renv::init()
renv::install("ggplot2")
# or equivalently install.packages("ggplot2")
renv::snapshot()
```
:::
:::

#### Example in Julia

::: columns
::: {.column width="75%"}
`Pkg`: native `Julia` package manager (with generated `Project.toml` et `Manifest.toml` files)
:::

::: {.column width="25%"}
``` julia
add Plots
add IJulia
```
:::
:::
:::

<small>Configuration file versionned and used during CI compilation/publication action</small>

### Step 3: (re)production

A `git push` command will trigger your article compilation (including computations) and publication as a [*github* page](https://pages.github.com)[^2]

[^2]: or as a *gitlab* page when *gitlab* will be supported (soon)

<small>See the preconfigured `.github/workflows/build.yml` file for the *github* action configuration[^3]</small>

[^3]: and soon the `.gitlab-ci.yml` file for the *gitlab* CI/CD configuration

## Author point of view (3/3)

::: columns
::: {.column width="70%"}
<br>

### Step 4: submit your article

If the CI process succeeds, both HTML and PDF versions are published on the [*github-page*](https://computorg.github.io/template-computo-R/) associated to the repository

<br/><br/>

#### <s>Scholastica</s> Open review

<https://openreview.net/group?id=Computo>

Submit:

-   your article PDF (scientific content review)
-   your git repository (source code and reproducibility review)
:::

::: {.column width="30%"}
[![](img/computo_template_rendered.png)](https://github.com/computorg/template-computo-python/)

[![](img/computo_openreview.png)](https://openreview.net/group?id=Computo)
:::
:::

## Editor point of view

::: columns
::: {.column width="70%"}
After a "traditionnal" review process, a 3 step procedure:

1.  Acceptance
2.  Pre-production
3.  Publication in Computo (with a DOI)

including

-   Copy of the author git repository to <https://github.com/computorg/>
-   Final version formatting
-   Review report publication
-   Registration in the journal bibliographic data base
-   Copy of the repository to [Software Heritage](https://archive.softwareheritage.org/browse/search/?q=computorg%2Fpublished&with_visit=true&with_content=true) for archiving
-   Publication of the article on the journal website
:::

::: {.column width="30%"}
[Task-list = github issue](https://github.com/computorg/published-paper-tsne/issues/5)

[![](img/computo_template_issue_editor.png)](https://github.com/computorg/published-paper-tsne/issues/5)
:::
:::

## 2year and a half report

<br/>

🥲 Fully operational + doi, ISSN

🙂 7 published articles articles, 3 in preproduction, 6 under review (more details [here](https://computo.sfds.asso.fr/publications/))

🙂 x presentations (Montpellier, Toronto, Humastica, Grenoble, RR2023, etc.)

🙂 [French reproducible research network](https://www.recherche-reproductible.fr/)

🤯 Difficult to find reviewers

🤔 Institutional support?

🤔 Changing practices in the scientific community?

<br/>

## Discussion

### About several choices

-   [`quarto`](https://quarto.org/): dynamic, agnostic language, [FOSS](https://en.wikipedia.org/wiki/Free_and_open-source_software)[^4], community-based ([`pandoc`](https://pandoc.org/)), Rstudio/Posit support
-   [`github`](https://github.com): dynamic, large user community but not institutional and limited computing resources

[^4]: "free and open-source"

### Comparison/inspiration

-   [Peer Community-In (PCI)](https://peercommunityin.org/)[^5], [EpiSciences](https://www.episciences.org/): different philosophy and/or functioning
-   <https://rescience.github.io/>: "remake" published articles
-   <https://distill.pub> (discontinued): technically more complicated and only ML/AI-oriented

[^5]: Computo is a [PCI-friendly journal](https://peercommunityin.org/pci-friendly-journals/)

## Perspectives

<br>

-   Provision of computing resources (to be able to run all computations)
-   Full *gitlab* support (CI/CD, docker, registry, etc.)
-   Switch to a french institutional gitlab forge?
-   Improve long-term reproducibility stack ([docker](https://www.docker.com/) container, [GUIX](https://guix.gnu.org) fully reproducible environment, only at the end of the publication process, )

<br>

### How to help?

::: columns
::: {.column width="50%"}
-   By submitting[^6] your work!
:::

::: {.column width="50%"}
-   By becoming reviewer[^7]
:::
:::

[^6]: <https://computo.sfds.asso.fr/submit/>

[^7]: contact us at [computo\@sfds.asso.fr](mailto:computo@sfds.asso.fr){.email}

## Regarding the logo

<br>

![](img/computo_concept.png){width="600px" fig-align="center"}

## References {.scrollable .smaller}

::: {#refs}
:::

# Reproducibility considerations


![Scientific and editorial reproducibility](figures/reproducible-sequence-orig.svg)


## Two-fold reproducibility

The global scientific workflow of a reproducible process for a Computo may be split into two types of steps:

**External and Editorial**

## External

External
: Process to obtain (intermediate) results utside of the notebook environment, for a list of reasons (non-exclusive to each other):

- the process is too long to be conducted in a notebook
- the data to be processed is too big to be handled directly in the notebook
- it needs a specific environment (e.g. a cluster, with gpus, etc.)
- it needs to involve specific languages (e.g. C, C++, Fortran, etc.) or build tools (e.g. make, cmake, etc.)

<!-- ## Reproducibility considerations (3)

In our context, **External** is “Computational reproducibility”, where the reproducibility is achieved by providing the code and the environment to run it, but not the results themselves. -->

## Editorial

Editorial
: notebook rendering with the results of the external process

:::{.callout-note title="Requirement"}
If the notebook contains *everything* to produce the final document

$\Rightarrow$ “Direct reproducibility” in the sense that the notebook is the only thing needed to reproduce the results.

Ultimately, the workflow must end with a direct reproducibility step which concludes the whole process.
:::

## Reproducibility considerations (5)

Data transfer
: When applicable, the switch from external to editorial reproducibility is done with a “data transfer” step,

data produced by the external process $\Rightarrow$ transferred to the notebook environment.

:::{.callout-warning title="Requirement"}
Not only the intermediate results are provided, but also **the code to transfer it in the notebook environment.**

There are a variety of software solutions to do so.
:::

## Examples of data transfer solutions

### Intermediate results storage
- Python: [`joblib.Memory`](https://joblib.readthedocs.io/en/latest/memory.html), caching mechanism for python functions, save the results of a function call to disk, and load it back later.
- R : `.RData` file format, can be loaded back in R with the `load()` function.
- If results are small enough, storing them in a text file (e.g. `.csv`, `.tsv`, `.json`, etc.) is also a solution.

### Transfer of the results to the notebook environment
- (`.joblib` directory or `.Rdata` file) could be committed to the git repository, and directly loaded in the notebook environment.
- Alternative, centralize input data (when large enough) and intermediate results on a shared scientific provider (we recommend [Zenodo](https://zenodo.org/) for this purpose), and download them in the notebook environment.

# Workshop

## Quarto

In this workshop, we will learn how to use quarto to create a document that includes code, data, and narrative text. We will also learn how to make the CI (continuous integration) work.

## The main pipeline, step by step

-   Template installation
-   computing environment: renv, conda, etc.
-   Authoring in the qmd
-   rendering locally
-   pushing to github


## Getting started

To get started you will need to clone the mock template for this workshop. The template is available at

[https://github.com/computorg/template-jds2024](https://github.com/computorg/template-jds2024)

:::{ .content-hidden unless-format="revealjs" }
{{< qrcode https://github.com/computorg/template-jds2024 >}}
:::

## Mock repository

https://github.com/computorg/template-jds2024

:::{ .content-hidden unless-format="revealjs" }
{{< qrcode https://github.com/computorg/template-jds2024 >}}
:::

## Creating a repo from a template

1. On GitHub.com, navigate to the main page of the repository.
2. Above the file list, click Use this template.
3. Select Create a new repository.
4. Select `Include all branches`.

## Language version

Make a `git clone` of the repository you just templated and open it in your favorite IDE.

:::{ layout-ncol="2"}

### Python version

Rename the `published-paper-tsne-python.qmd` to `published-paper-tsne.qmd`

### R version

Rename the `published-paper-tsne-R.qmd` to `published-paper-tsne.qmd`

:::


# Conclusion