Applied Data Science & ML

Hi, my name is Billy. This is where I archive notes on Applied Data Science & ML.

Building the Right Thing Is Hard

Know-how

Needfinding, and why knowing the fix is the easy part.

Jun 30, 2026

Changes to my workflow

What changed in how I work after handing real tasks to a terminal coding agent, and what it replaced.

Apr 12, 2026

Going down a random rabbit hole: From XML Tags to $100M Weight Updates

Why Anthropic leans on XML tags. Following the thread from prompt formatting down to how self-attention was trained.

Mar 4, 2026

Estimating the Distribution of Omitted Variable Bias in Causal Inference

CausalInference

How large would an unobserved confounder need to be to overturn your causal estimate? A survey of methods for bounding omitted variable bias.

Apr 6, 2025

When Linear Regression Gets Massively Confused

DataScienceBasics

Know-how

regression

A mass point (a big cluster of identical values) can wreck linear regression after a log transform. Why it happens and what it breaks.

Mar 15, 2025

Causal Inference: Assessing Overlap in Covariate Distributions

CausalInference

DataScience

DataScienceBasics

howto

Working through Chapter 14 of Imbens & Rubin: the diagnostics for checking covariate overlap between treatment and control, with the formulas spelled out.

Sep 1, 2024

Recommendations as treatments

CausalInference

Recommendations

Joachims et al. reframe recommender systems as policies you can study with causal tools like inverse propensity weighting. My summary.

Jun 17, 2024

Causal Inference cheatsheet

CausalInference

Cheatsheet

A single-page reference for causal inference methods and the load-bearing assumption behind each one, following Facure’s Causal Inference for the Brave and True.

Jan 15, 2024

Relationship of covariance and dot product

DataScienceBasics

Covariance is a dot product of centered variables. The short derivation that connects the statistical and geometric views.

Feb 1, 2022

Deep dive into MLOps.

mlops

What happens after the ML proof-of-concept: the deployment patterns, tradeoffs, and failure modes of putting a model into production.

May 14, 2021

Data engineering: simple and complex data pipelines

data-engineering

Notes on data pipeline patterns I picked up moving from ML work into data engineering, from a Chris Riccomini talk.

Apr 13, 2021

Takeaways from Kaggle’s “Jane Street Market Prediction” competition

kaggle

tips and tricks

What I took away from Kaggle’s Jane Street Market Prediction competition: cross-validation, Keras tuning, and fast inference tricks.

Mar 13, 2021

Not so simple classification.

classification

Binary classification gets called easy. A POC predicting campaign visits showed me where that assumption falls apart.

Feb 14, 2021
