Hi! I’m a Data Scientist with over 15 years of experience in data science, data engineering, and machine learning engineering. My experience thus far has been in the educational technology and journalism domains, the latter focusing primarily on US elections. I have been writing code in Python almost exclusively for over 10 years and cut my teeth on Java, C++, and Perl.

Outside of work, I enjoy spending time with friends and family, live music, hiking with my dog, traveling, working on side programming problems, and studying foreign languages. I’m currently an A1 in French and a very early beginner in Dutch.

I don’t make use of generative AI in any of my writing, so please enjoy everything here in its quirky and imperfect glory.

Current Experience: The Washington Post

Polling Model

Polling Averages

I co-developed our Bayesian Presidential polling averages, which generated over 9 million pageviews and were the most-read Politics story in 2024.
pandas, NumPyro, scikit-learn, AWS Batch, MySQL, SQLAlchemy

Live Model

Live Election Model

I contributed to our model that predicts who will likely win elections as votes are tabulated live. You can find the code for our Live Model on GitHub.
pandas, cvxpy, NumPy, SciPy

Personalization

I also contribute to our site’s Personalization efforts and am currently conducting an extensive literature review and exploratory data analysis to identify the best possible metrics for measuring product success.
pandas, AWS Redshift, scikit-learn, matplotlib, seaborn, statsmodels

See all of my experience here

Side Projects

TROON

Troon Brewing Release Prediction

Troon is a famous brewery in New Jersey that tends to release new beers whenever they are ready, without any advance notice. With this project, I’m trying to predict when the next Troon release will be. (tl;dr: It’s hard, but I’ve learned a lot about Bayesian modeling and uncertainty in the process.)
pandas, PyMC, matplotlib, seaborn, spaCy, scikit-learn, Prophet

Image credit: https://customerstrategy.net

Understanding Survey Weighting

I’m trying to gain a better understanding of survey weighting techniques, including raking and multilevel regression with poststratification.
pandas, PyMC, Bambi, Samplics