Uni Turin, Collegio Carlo Alberto, SciencesPo Paris, RES Data Editor
30 April, 2025
README file.
What Do We Expect
An advanced graduate student should be able to generate all of your output with your package in the most user-friendly way possible.
A priori, our output should be exactly equal to yours.
README File
README at The Economic Journal
Minimum Requirement
There should be a separation along:
Example?
.
├── 20211107ext_2v1.do
├── 20220120ext_2v1.do
├── 20221101wave1.dta
├── james
│   └── NLSY97
│       └── nlsy97_v2.do
├── mary
│   └── NLSY97
│       └── nlsy97.do
├── matlab_fortran
│   ├── graphs
│   ├── sensitivity1
│   │   ├── data.xlsx
│   │   ├── good_version.do
│   │   └── script.m
│   └── sensitivity2
│       ├── models.f90
│       ├── models.mod
│       └── nrtype.f90
├── readme.do
├── scatter1.eps
├── scatter1_1.eps
├── scatter1_2.eps
├── ts.eps
├── wave1.dta
├── wave2.dta
├── wave2regs.dta
└── wave2regs2.dta
.
├── README.md
├── code
│   ├── R
│   │   ├── 0-install.R
│   │   ├── 1-main.R
│   │   ├── 2-figure2.R
│   │   └── 3-table2.R
│   ├── stata
│   │   ├── 1-main.do
│   │   ├── 2-read_raw.do
│   │   ├── 3-figure1.do
│   │   ├── 4-figure3.do
│   │   └── 5-table1.do
│   └── tex
│       ├── appendix.tex
│       └── main.tex
├── data
│   ├── processed
│   └── raw
└── output
    ├── plots
    └── tables
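A layout like the one above can be created before any analysis code is written. The following is a minimal Python sketch, not a prescribed tool: the function name `scaffold` and the README placeholder text are hypothetical, and the folder names are simply copied from the example tree.

```python
from pathlib import Path

# Folder names are taken from the example layout above.
FOLDERS = [
    "code/R",
    "code/stata",
    "code/tex",
    "data/processed",
    "data/raw",
    "output/plots",
    "output/tables",
]


def scaffold(root):
    """Create the empty folder skeleton and a README.md stub below root."""
    root = Path(root)
    for folder in FOLDERS:
        # exist_ok=True makes the function safe to re-run on an existing project
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Placeholder text only (hypothetical); an existing README is never overwritten
        readme.write_text("# Replication package\n", encoding="utf-8")
    return root
```

Calling `scaffold("my_package")` twice is harmless, which keeps the skeleton reproducible from the first commit onwards.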
README
Note
There is no unique best way to organize your project: Make it simple, intuitive and helpful.
Important
Ideally your entire project is under version control.
Question:
How to write reproducible code?
Huge question to answer. Let's try with a few simple things first:
No Manual Manipulation.
Do This!
In general, take all necessary steps to ensure cross-platform compatibility of your code.
File paths are such low-hanging fruit...
Don't build tables by hand.
Ask the user to set the root of your project, via global variable, environment variable, or other means.
# in my R, I do
Sys.setenv(PACKAGE_ROOT="/Users/floswald/Downloads/your_package")
# your package uses:
file.path(Sys.getenv("PACKAGE_ROOT"), "data", "wages.csv")
* in my Stata, I do
global PACKAGE_ROOT "/Users/floswald/Downloads/your_package"
* your package uses
use "$PACKAGE_ROOT/data/wages.dta"
Always use forward slashes (/) in Stata, even on a Windows machine!
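The same idea carries over to Python. The sketch below assumes the user has set the `PACKAGE_ROOT` environment variable, as in the R and Stata snippets; the helper name `data_file` is hypothetical. `pathlib` joins path components with the separator of the current operating system, so no slash is hard-coded.

```python
import os
from pathlib import Path


def data_file(name):
    """Return <PACKAGE_ROOT>/data/<name>, built without hard-coded separators."""
    # os.environ[...] raises KeyError if the user has not set PACKAGE_ROOT
    root = Path(os.environ["PACKAGE_ROOT"])
    return root / "data" / name
```

With `PACKAGE_ROOT` set, `data_file("wages.csv")` resolves to the file inside the package on any platform.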
No Guarantee
Your code will yield identical results on a different computer only if certain conditions apply.
Protected Environments
You should provide a mechanism which ensures that those conditions do apply.
Baking Bread (Chemical Experiment) | Running a Script (Computational Experiment) |
---|---|
Ingredients | Dependencies |
- 500g flour | - Python 3.10 |
- 300ml water | - numpy==1.24.0 |
- 7g dry yeast | - pandas==1.5.3 |
- 10g salt | - scikit-learn (no version specified) |
Instructions | Instructions |
1. Mix ingredients | 1. Clone the repository from GitHub |
2. Knead dough | 2. Create and activate a virtual environment |
3. Let rise 1 hour at room temperature | 3. Install dependencies from requirements.txt |
4. Bake at 220°C for 30 minutes | 4. Run python train_model.py with default config |
Expected Outcome | Expected Outcome |
- Well-risen, airy loaf of bread | - Consistent training accuracy and saved model |
Bread Baking (Chemical Experiment) | Running a Script (Computational Experiment) |
---|---|
1. Yeast Inactivation | 1. Library Version Mismatch |
Water too hot (e.g., 60°C) kills the yeast. No rise. | scikit-learn was updated → train_test_split() behaves differently, causing changes in results. |
2. Cold Proofing | 2. Different OS / File System |
Room too cold (e.g., 15°C) → dough rises too slowly. | Path handling fails on Windows vs. Linux (\ vs. /), or line endings cause script errors. |
3. High Altitude Baking | 3. Hardware Differences (e.g., CPU vs. GPU) |
Lower pressure expands gas too fast; loaf collapses. | Numerical precision differs → inconsistent model outputs. |
4. Too Much Salt | 4. Missing or Incorrect Environment Variable |
Excess salt suppresses yeast → poor fermentation. | DATA_DIR not set → script fails or loads wrong input silently. |
Result: Flat, dense, or failed bread | Result: Different outputs, errors, or failed experiments |
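Failure 4 in the table is the easiest one to guard against: stop immediately when the variable is missing instead of falling back to a default. A minimal sketch, assuming the variable is called `DATA_DIR` as in the table; the function name `require_dir` is hypothetical.

```python
import os
from pathlib import Path


def require_dir(var_name="DATA_DIR"):
    """Return the directory named by an environment variable, or fail loudly."""
    value = os.environ.get(var_name)
    if not value:
        # No silent fallback: a guessed folder is how wrong inputs get loaded
        raise RuntimeError(f"{var_name} is not set; refusing to guess an input folder.")
    path = Path(value)
    if not path.is_dir():
        raise FileNotFoundError(f"{var_name}={value!r} is not an existing directory.")
    return path
```

Calling `require_dir()` at the top of a script turns a silent wrong-input problem into an explicit error message for the replicator.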
At a minimum, your README lists the exact computing environment:
- OS, software and which version used (R 4.1, stata 17/MP, matlab 2023b, GNU Fortran (Homebrew GCC 13.2.0))
- Libraries and which exact version used (ggplot2 1.3.4, outreg 2, numpy 1.26.4, boost 1.8.3)
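For Python-based packages, this information does not need to be typed by hand. The sketch below uses only the standard library (`platform`, `sys`, `importlib.metadata`); the function name `environment_report` and the package list in the usage line are illustrative assumptions.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Return README-ready lines with OS, Python version and package versions."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for name in packages:
        try:
            # Version of the installed distribution, e.g. "1.26.4"
            lines.append(f"{name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag the gap instead of silently skipping the package
            lines.append(f"{name}: NOT INSTALLED")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report(["numpy", "pandas", "scikit-learn"])))
```

The printed lines can be pasted into the README, so that the documented versions are the ones that actually produced the results.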
Stata: install all libraries into your replication package.
Virtual Environments can help.
julia built-in Pkg manager:
(@v1.10) pkg> activate .
Activating new project at `~/my-project`
(my-project) pkg> add DataFrames GLM
# created 2 files in `~/my-project`
# tracking all dependencies
Docker container. This provides a fully specified virtual machine (i.e. a dedicated computer for your project).
version xyz statement in master script.
ssc install somelib will install an incompatible version a few years later.
Note
Such mechanisms can reduce version conflicts amongst your dependencies. To the extent that all versions of those dependencies are still available, this guarantees a stable computing environment.
Output in Paper | Output in Package | Program to execute |
---|---|---|
Table 1 | outputs/tables/table1.tex | code/table1.do |
Figure 1 | outputs/plots/figure1.pdf | code/figure1.do |
Figure 2 | outputs/plots/figure2.pdf | code/figure2.do |
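A table like this can also be checked mechanically before submission. The following sketch copies the three rows above into a dictionary and reports every listed file that is missing from the package; the names `OUTPUT_MAP` and `missing_files` are hypothetical.

```python
from pathlib import Path

# Rows copied from the table: output in package -> program that creates it
OUTPUT_MAP = {
    "outputs/tables/table1.tex": "code/table1.do",
    "outputs/plots/figure1.pdf": "code/figure1.do",
    "outputs/plots/figure2.pdf": "code/figure2.do",
}


def missing_files(root, output_map=OUTPUT_MAP):
    """List every output or program named in the table that is absent below root."""
    root = Path(root)
    missing = []
    for output, program in output_map.items():
        for rel in (output, program):
            if not (root / rel).is_file():
                missing.append(rel)
    return missing
```

An empty list from `missing_files(".")` after a full run suggests that the README table and the package contents agree.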
run_all()
rev1, rev2 etc., so you know exactly which version of the code made which output.
10 Steps till Reproducibility
Create a folder structure: data, code, output, paper
Create README.md at root of this structure
Download example data from zenodo
data/raw
code/stata
run.do file
config.do as well.
Here is an outline of a potential run.do file:
run.do
- set global variables: paths, full/partial data etc
- call config.do
- tell stata where to look for add-ons
- run analysis
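If the package is driven from Python rather than from Stata, the same outline can be mirrored in a small driver. This is only a sketch under stated assumptions: it assumes numbered do-files in `code/stata` (as in the example layout) are meant to run in order, that Stata is on the path as `stata-mp`, and that batch mode is invoked with `-b do`. The names `plan_steps` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_steps(root, stata_exe="stata-mp"):
    """Return the batch commands for all numbered do-files, in sorted order."""
    stata_dir = Path(root) / "code" / "stata"
    # Lexicographic sort: fine for 1-9, zero-pad the prefix beyond that (01-, 02-, ...)
    scripts = sorted(stata_dir.glob("[0-9]*-*.do"))
    return [[stata_exe, "-b", "do", str(script)] for script in scripts]


def run_all(root, dry_run=True):
    """Print each command and, unless dry_run is set, execute it and stop on error."""
    for cmd in plan_steps(root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the whole run as soon as one step fails
            subprocess.run(cmd, check=True, cwd=root)
```

The dry run is the default so that a replicator can inspect the execution order before anything is overwritten.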
logs.
run.do (can abbreviate)
README
/output/
/output/ and try to recompile: error. Good!
/output/
R Code
/code/R/
R script
/output/
renv to /code/R/ folder.
R.
and submit to a great journal like the EJ!