Reproducibility Workshop

Workshop at IEA 2025 Belfast

Florian Oswald

University of Turin, Collegio Carlo Alberto, SciencesPo Paris, RES Data Editor

30 April, 2025

Agenda

  1. What is expected from you as an author nowadays?
  2. 10 Simple Rules for Reproducibility, compiled by the Econ Data Editors.
  3. The README file.
  4. Some Reproducibility Best Practices.
  5. Let’s start a reproducible research project!

What Is Required From Authors? 🫡

What is Required from Authors?

What Do We Expect

An advanced graduate student should be able to generate

  1. All Figures
  2. All Tables
  3. All in-text numbers

with your package in the most user-friendly way possible.

A priori, our output should be exactly equal to yours. 😬

10 Simple Rules for Reproducibility

  1. Computational Empathy
  2. Make data accessible
  3. Cite data and explain how to access it
  4. Describe software and hardware requirements
  5. Provide all code
  6. Explain how to reproduce your work
  7. Provide a table of all things that can be reproduced
  8. Include all supporting material
  9. Use a permissive license. Any license is better than none.
  10. Re-run everything!

The README File

  1. A plain-text, top-level file which explains everything about your package.
  2. We have a useful template and a template generator.
  3. Here are the minimum requirements for a README at The Economic Journal.

Best Practices

Best Practices

  1. Project Organisation (folder structure)
  2. Code
  3. Data
  4. Output

Best Practices

Project Organisation

  • Folder structure is a first-order concern for your project.

Minimum Requirement

There should be a separation along:

  1. Inputs: data, parameters, etc.
  2. Outputs: numbers, tables, figures
  3. Code
  4. Paper/report, etc.

Example?

Best Practices

Good or Bad?


.
├── 20211107ext_2v1.do
├── 20220120ext_2v1.do
├── 20221101wave1.dta
├── james
│   └── NLSY97
│       └── nlsy97_v2.do
├── mary
│   └── NLSY97
│       └── nlsy97.do
├── matlab_fortran
│   ├── graphs
│   ├── sensitivity1
│   │   ├── data.xlsx
│   │   ├── good_version.do
│   │   └── script.m
│   └── sensitivity2
│       ├── models.f90
│       ├── models.mod
│       └── nrtype.f90
├── readme.do
├── scatter1.eps
├── scatter1_1.eps
├── scatter1_2.eps
├── ts.eps
├── wave1.dta
├── wave2.dta
├── wave2regs.dta
└── wave2regs2.dta




Bad! 👎

  • Subdirectories are not helpful
  • File names are confusing
  • Code/data/output are not separated

Best Practices

Good 👍


.
├── README.md
├── code
│   ├── R
│   │   ├── 0-install.R
│   │   ├── 1-main.R
│   │   ├── 2-figure2.R
│   │   └── 3-table2.R
│   ├── stata
│   │   ├── 1-main.do
│   │   ├── 2-read_raw.do
│   │   ├── 3-figure1.do
│   │   ├── 4-figure3.do
│   │   └── 5-table1.do
│   └── tex
│       ├── appendix.tex
│       └── main.tex
├── data
│   ├── processed
│   └── raw
└── output
    ├── plots
    └── tables


Good.

  • Meaningful subdirectories
  • Top-level README
  • Code/data/output are separated

Best Practices

Example: TIER Protocol structure

Best Practices

Best Project Structure?


Note

There is no unique best way to organize your project: Make it simple, intuitive and helpful.


Important

Ideally your entire project is under version control.

Reproducible Code

Reproducible Code

Question:

How to write reproducible code?

👉 A huge question. Let's start with a few simple things:

  1. Provide a run script which…runs everything. Run it often! (See the sketch below.)
  2. No copy-and-paste in your pipeline! Write results to disk.
  3. Give clear instructions.
  4. Provide a clear way to create the required environment (library installation, etc.).
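
For instance, a minimal run script in R, reusing the file names from the example folder structure above (run_all.R itself is a hypothetical name):

# run_all.R -- running this one file reproduces everything
root <- Sys.getenv("PACKAGE_ROOT")                   # set by the replicator
source(file.path(root, "code", "R", "0-install.R"))  # install dependencies
source(file.path(root, "code", "R", "1-main.R"))     # build analysis data
source(file.path(root, "code", "R", "2-figure2.R"))  # writes output/plots/figure2.pdf
source(file.path(root, "code", "R", "3-table2.R"))   # writes output/tables/table2.tex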

Reproducible Code

No Manual Manipulation

  • "Change this parameter to 0.4, then run the code again" 😖
  • "I computed this number manually" 😖😖

Do This!

  • Use functions, ado-files, programs, macros, subroutines, etc.
  • Use loops and parameters (see the sketch below)
  • Use placeholders for file paths
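
A minimal sketch in R (run_model() and the parameter values are hypothetical):

# loop over a parameter grid instead of editing the code by hand
for (tau in c(0.2, 0.4, 0.6)) {
  res <- run_model(tau)  # run_model() stands in for your analysis function
  saveRDS(res, file.path("output", sprintf("model_tau_%.1f.rds", tau)))
}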

In general, take all necessary steps to ensure cross-platform compatibility of your code.

File paths are such low-hanging fruit 🍇…

Don't build tables by hand.

Reproducible Code

File Paths

👉 Ask the user to set the root of your project via a global variable, an environment variable, or similar:

# in my R session, I do:
Sys.setenv(PACKAGE_ROOT = "/Users/floswald/Downloads/your_package")

# your package uses:
file.path(Sys.getenv("PACKAGE_ROOT"), "data", "wages.csv")

* in my Stata session, I do:
global PACKAGE_ROOT "/Users/floswald/Downloads/your_package"

* your package uses:
use "$PACKAGE_ROOT/data/wages.dta"

Always use forward slashes (/) in Stata file paths, even on a Windows machine!

Reproducible Code

Tables


In General

A little mustache templating goes a long way…

insert {{ x }} here
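
For example, a minimal sketch in R using the whisker package (the numbers and the file name are hypothetical placeholders; in practice they are computed upstream):

library(whisker)  # mustache templating for R

template <- "We estimate an elasticity of {{ elasticity }} (s.e. {{ se }})."
values   <- list(elasticity = "0.42", se = "0.05")   # filled in by your analysis
writeLines(whisker.render(template, values), "output/tables/intext_numbers.tex")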

Reproducible Code

Safe Environments for Running Your Code

No Guarantee

Your code will yield identical results on a different computer only if certain conditions apply.

Protected Environments

👉 You should provide a mechanism which ensures that those conditions do apply.

Reproducible Code 💻
Is Our Daily Bread 🍞

🧪 Reproducibility: Bread Baking vs. Code Execution

🍞 Baking Bread (Chemical Experiment)

Ingredients
  • 500g flour
  • 300ml water
  • 7g dry yeast
  • 10g salt

Instructions
  1. Mix ingredients
  2. Knead dough
  3. Let rise 1 hour at room temperature
  4. Bake at 220°C for 30 minutes

Expected Outcome
  • Well-risen, airy loaf of bread

💻 Running a Script (Computational Experiment)

Dependencies
  • Python 3.10
  • numpy==1.24.0
  • pandas==1.5.3
  • scikit-learn (no version specified)

Instructions
  1. Clone the repository from GitHub
  2. Create and activate a virtual environment
  3. Install dependencies from requirements.txt
  4. Run python train_model.py with default config

Expected Outcome
  • Consistent training accuracy and saved model


⚠️ What Could Possibly Go Wrong?

1. Yeast Inactivation ↔ Library Version Mismatch
   🍞 Water too hot (e.g., 60°C) kills the yeast. No rise.
   💻 scikit-learn was updated → train_test_split() behaves differently, causing changes in results.

2. Cold Proofing ↔ Different OS / File System
   🍞 Room too cold (e.g., 15°C) → dough rises too slowly.
   💻 Path handling fails on Windows vs. Linux (\ vs. /), or line endings cause script errors.

3. High-Altitude Baking ↔ Hardware Differences (e.g., CPU vs. GPU)
   🍞 Lower pressure expands gas too fast; the loaf collapses.
   💻 Numerical precision differs → inconsistent model outputs.

4. Too Much Salt ↔ Missing or Incorrect Environment Variable
   🍞 Excess salt suppresses yeast → poor fermentation.
   💻 DATA_DIR not set → the script fails, or silently loads the wrong input.

Result: a flat, dense, or failed loaf ↔ different outputs, errors, or failed experiments.

Reproducible Code

Safe Environments for Running Your Code

  • At a minimum, your README lists the exact computing environment:

  • OS and software, with the exact versions used (R 4.1, Stata 17/MP, MATLAB 2023b, GNU Fortran (Homebrew GCC 13.2.0))

  • Libraries, with the exact versions used (ggplot2 1.3.4, outreg 2, numpy 1.26.4, boost 1.8.3)

  • Stata: install all libraries into your replication package.

👉 Virtual environments can help.

Reproducible Code

Provide a Virtual Environment

Python via anaconda:

conda create -n py27 python=2.7 numpy=1.15.4 matplotlib
conda activate py27

There are other virtual environment managers in Python.

R via renv:

# in your existing project:
renv::init()     # create a project-local library
renv::snapshot() # record the exact versions in renv.lock ("commit")
renv::restore()  # reinstall exactly those versions from renv.lock ("checkout")

Julia's built-in Pkg manager:

(@v1.10) pkg> activate .
  Activating new project at `~/my-project`

(my-project) pkg> add DataFrames GLM
# creates Project.toml and Manifest.toml in `~/my-project`,
# tracking all dependencies

A Docker 🐳 container provides a fully specified virtual machine (i.e. a dedicated computer for your project); see the sketch below.
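
A minimal Dockerfile sketch (the rocker image tag and the script names are assumptions, not part of any particular package):

# pin the OS and the R version in a single image
FROM rocker/r-ver:4.1.0
WORKDIR /project
COPY . /project
RUN Rscript code/R/0-install.R   # install pinned dependencies at build time
CMD ["Rscript", "run_all.R"]     # docker run <image> reproduces everything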

Reproducible Code

Stata Virtual Environment

  1. Include a version xyz statement in your master script.
  2. User-contributed libraries are not versioned.
  3. You must install all libraries next to your project code. If not, ssc install somelib will install an incompatible version a few years later.
  4. Write a _config.do script forcing Stata to use only libraries installed in a given location (a minimal sketch follows below).
  5. Excellent guidance by Julian Reif.
  6. We will do this later on!
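
In that spirit, a minimal _config.do sketch (the paths are assumptions; adapt them to your package):

* _config.do -- force Stata to load ado files only from inside the package
global ROOT "path/to/your_package"            // set by the replicator
sysdir set PERSONAL "$ROOT/code/ado/personal"
sysdir set PLUS     "$ROOT/code/ado/plus"
sysdir set SITE     "$ROOT/code/ado/site"
* from now on, ssc install somelib installs into code/ado/plus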

Reproducible Code

Note

Such mechanisms can reduce version conflicts amongst your dependencies. To the extent that all versions of those dependencies are still available, this guarantees a stable computing environment.

Data

Data

  • Always keep your raw data intact (i.e. read-only).
  • Generate separate analysis datasets to perform the analysis.
  • Datasets change over time: record the date and version of what you obtained. It might be difficult to obtain the same version in the future.

What about Confidential Data?

  1. If we have instructions for direct access, we try ourselves (time limit: 30 mins).
  2. If not, we try to get access to the authors' or data provider's machine (i.e. their screen).
  3. If not, the data provider may certify the results for us.
  4. If not, you must provide a simulated version of the data.

Output

Output

  • Write both tables and figures to local storage (don't just display them on the console!)
  • The gold standard: include this table in your README.

Output in Paper   Output in Package            Program to execute
Table 1           outputs/tables/table1.tex    code/table1.do
Figure 1          outputs/plots/figure1.pdf    code/figure1.do
Figure 2          outputs/plots/figure2.pdf    code/figure2.do
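
A minimal R sketch (the plot content is a placeholder; the file name follows the table above):

library(ggplot2)

dir.create("outputs/plots", recursive = TRUE, showWarnings = FALSE)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()  # placeholder figure
ggsave("outputs/plots/figure1.pdf", p, width = 6, height = 4)  # write to disk, don't just print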

Output

  • Keep a full pipeline intact at all times: run_all()
  • Have a dedicated output folder which you delete frequently
  • Version your output: during revisions, create separate output locations (rev1, rev2, etc.) so you know exactly which version of the code produced which output.

Break ☕️ 🍰

Hands-On Session 💪🏽

10 Steps to Reproducibility

Step 1: Project Setup and Data Acquisition

  • Create a folder structure: data, code, output, paper

  • Create README.md at the root of this structure

  • Download the example data from Zenodo

    • Save the data citation
    • Copy the data into data/raw
    • Set the data to read-only (see the sketch below)
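
A minimal sketch in R (the data file name is hypothetical; any shell works just as well):

# create the folder skeleton, then protect the raw data
for (d in c("data/raw", "data/processed", "code", "output", "paper"))
  dir.create(d, recursive = TRUE, showWarnings = FALSE)

Sys.chmod("data/raw/example.dta", mode = "0444")  # read-only for everyone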

Step 2: Stata Setup

  • Create folder code/stata
  • Create a run.do file
  • Set up a config.do as well.

Here is an outline of a potential run.do file:

run.do
    - set global variables: paths, full/partial data, etc.
    - call config.do
        - tells Stata where to look for add-ons
    - run the analysis

  • Stata does not crash upon an error in a nested do-file: you must check the logs.
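
Fleshing out that outline, a minimal run.do sketch (paths and script names are assumptions, borrowed from the example folder structure earlier):

* run.do -- entry point: running this file reproduces everything
global ROOT "path/to/your_package"    // set this once
global FULLDATA 1                     // 1 = full data, 0 = partial sample
do "$ROOT/code/stata/config.do"       // point Stata at package-local ado files
do "$ROOT/code/stata/2-read_raw.do"   // read, transform, store data
do "$ROOT/code/stata/3-figure1.do"    // one script per piece of output
do "$ROOT/code/stata/5-table1.do"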

Step 3: Stata Analysis Code

  • Always operate the full pipeline via run.do (you can abbreviate)
  • Read, transform, and store the data
  • Do the analysis proper
  • Never build a table by hand!

Step 4: Document Output in README

  • Add the software packages used to the README
  • Add the OS and Stata version to the README
  • Create a table in the README indicating where each piece of output can be found

Step 5: Write a Paper!

  • The paper should reference objects in /output/ (see the sketch below)
  • Delete /output/ and try to recompile: error. Good!
  • Regenerate /output/
  • Add the total runtime to the README.
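
For instance, a minimal main.tex sketch (the relative paths are assumptions and depend on where you compile):

\documentclass{article}
\usepackage{graphicx}
\begin{document}
% both objects are generated by the pipeline, never typed by hand
\includegraphics[width=\textwidth]{../output/plots/figure1.pdf}
\input{../output/tables/table1.tex}
\end{document}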

Step 6: Add R Code

  • Create /code/R/
  • Add an R script

Step 7: Incorporate R output in Paper

  • The paper should reference the new R objects in /output/
  • Recompile the paper

Step 8: Record R package environment

  • How can we make sure the R package environment is frozen?
  • What about upstream dependencies?
  • Add renv to the /code/R/ folder.
  • Re-run.

Step 9: Add R package citations

  • Cite the software packages you used!
  • Very easy with R (see the sketch below).
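
A minimal sketch (the package names are illustrative):

citation("renv")                        # prints a ready-made citation
knitr::write_bib(c("ggplot2", "renv"),  # BibTeX entries for your paper
                 file = "paper/packages.bib")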

Step 10: Recompile Paper

and submit to a great journal like the EJ! 😉

End 🍻