I recently finished Flatiron School’s bootcamp in data science, and thinking back on where I was in early 2020, it’s amazing how much stuff has been crammed into my head in the space of a year. Web scraping, data visualization, A/B testing, linear algebra, vector calculus, regression, SVMs, random forests, XGBoost, Docker, AWS, Tableau, and so many different ways that neural networks can be put together. It has been an education with both breadth and depth *within the field of data science.*

There’s kind of a caveat there: as much as I’ve learned about data science in a year of…

A.k.a. the neural network acronym post, this is in fact an announcement for a series of four articles to be published, each covering one of the four major types of modern neural network: **unsupervised pretrained networks**, including autoencoders and generative adversarial networks (GANs); **convolutional neural networks** (CNNs); **recurrent neural networks** (RNNs), including long short-term memory (LSTM) and gated recurrent unit (GRU) models; and **recursive neural networks**. Each of the following four posts, as well as this one, is in no small part a self-study of *Deep Learning: A Practitioner’s Approach* by Patterson and Gibson, and as such, each of…

Somehow, in years of schooling, I’d never heard of beta distributions until I stumbled onto them by accident over at David Robinson’s blog, but they’ve quickly become one of my favorite distributions. I mean, just look at that thing wiggling off to the left there. It’s an incredibly flexible distribution, but that’s not what makes beta distributions really cool. It’s the ease with which you can update your predictions that makes them stand out.

Beta distributions are described by two parameters, alpha and beta (you can see how changing these values affects the shape of the distribution above), and while…
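To make that easy updating concrete, here’s a minimal sketch using `scipy.stats`; the prior parameters and the observed counts below are hypothetical numbers of my own choosing, in the spirit of the batting-average example on Robinson’s blog:

```python
from scipy.stats import beta

# Hypothetical prior: Beta(alpha=81, beta=219), with mean 81/300 = 0.27
a, b = 81, 219

# Observe 100 new Bernoulli trials: 30 successes, 70 failures.
# Thanks to conjugacy, the posterior is simply
# Beta(a + successes, b + failures) -- no integration required.
successes, failures = 30, 70
posterior = beta(a + successes, b + failures)

print(posterior.mean())  # (81 + 30) / (81 + 219 + 100) = 0.2775
```

The whole Bayesian update is one addition per parameter, which is exactly the “ease of updating” that makes the distribution so pleasant to work with.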

Statistics and probability lie at the heart of so very, very many applied sciences today, not least amongst them data science, but it can be a frighteningly jargon-y field; after all, it is, at *its* heart, math. What’s the difference between probability, likelihood, and statistics? What’s Bayes’ theorem all about? What exactly are a prior and a posterior?

A good introduction to probability is Khan Academy, which can get you started with permutations and combinations, random variables, and even basic set theory. I’d call this tier 1 foundational knowledge, and while important, you hit the ceiling on Khan Academy pretty…

If you’ve worked at all with real data, you’ve probably already had to handle cases of missing data. (I wonder what the probability of missing data is in any given natural data. I suspect about as close to certainty as you can get.)

We should be suspicious of any dataset (large or small) which appears perfect.

— David J. Hand

How did you handle it? Row-wise deletion? Column-wise deletion? Imputation? What did you impute? If continuous, did you use the mean, baseline, or a KNN-derived value? …
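For the continuous case, scikit-learn covers both mean imputation and KNN-derived values; the toy matrix below is made up for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Toy feature matrix with missing entries marked as np.nan
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Mean imputation: each nan is replaced by its column's mean
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# KNN imputation: each nan is filled in from the nearest rows,
# measured with a distance that ignores missing coordinates
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)
```

Row-wise or column-wise deletion, by contrast, is just a `dropna` away in pandas, but it throws out every other value in the deleted rows or columns along with the missing ones.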

So you’ve been using GitHub for a while now, and maybe you’re starting to notice that your repositories page (pages???) is fuller than it used to be. Maybe, in fact, it’s become so full that it’s something more akin to a code dump than a meaningful collection of code. At the time of writing this, my GitHub falls decidedly into the code-dump category (yes, that’s me up there), and it feels like it’s time to change that.

After all, your GitHub is in many ways the coding face you present to the rest of the world, and as such, you…

I’ve heard of ROC curves, you’ve heard of ROC curves, we’ve all heard of ROC curves. (If you haven’t heard of ROC curves, you can read about them here.)

They’re the go-to visual assessment of how well a classifier performs on binary data, which is already a very loosey-goosey definition. What do we mean, “how well a classifier performs”––by what metric? For which data? Now you’re asking the right questions.

ROC curves have a nice semantic interpretation: they plot a classifier’s true positive rate (TPR) against its false positive rate (FPR) as the decision threshold varies. That curve lends itself to some useful readings, as does the accompanying…
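Computing those TPR/FPR pairs takes a few lines with scikit-learn; the labels and scores here are a toy example of my own:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical binary labels and classifier scores
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (FPR, TPR) point per decision threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve, a common single-number summary
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```

Plotting `fpr` against `tpr` gives the curve itself; the AUC condenses it to one number, at the cost of hiding *which* thresholds perform well.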

That photo pretty much says it all: the probability of event A given event B is equal to the product of the probability of B given A, the probability of A, and the reciprocal of the probability of B. Actually, I think the image does a nicer job of summing it all up.

Coming from a background in math (as opposed to statistics), I found this formula a bit elusive at first, and temptingly nonsensical if one interprets these conditional probabilities too literally as fractions. …
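To see the formula play out numerically, here is a worked example with hypothetical numbers (a rare condition and an imperfect test, both invented for illustration):

```python
# A = "has the condition", B = "tests positive"
p_a = 0.01              # P(A): prior prevalence
p_b_given_a = 0.95      # P(B|A): test sensitivity
p_b_given_not_a = 0.05  # P(B|~A): false positive rate

# P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.161
```

Even with a 95%-sensitive test, a positive result here only raises the probability from 1% to about 16%, because the false positives from the much larger healthy population dominate the denominator.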

No, I don’t mean AI art, though that is beautiful. I mean the intrinsic, intellectual beauty of the field, before it’s ever taken into the realm of art. I suppose this is going to be my “why data science” post.

My last semester at college, I had the good luck to take a class that left me with so much to think about that I still think about it, years later. The serendipitous combination of my being interested in the material, the whole of the class seeming to get along well, and the whole of the class seeming to adore…

The person I remember most from my first few weeks in Flatiron School’s Data Science Program was my one-on-one technical coach, Eli. He was a thespian at heart who knew how to hold an audience’s attention, but the encouragement he showed those under his tutelage was no act; I have never met a more naturally caring, generous, and supportive person, and I consider it my personal privilege to have known him and to have spent some time under his wing.

He passed earlier this year––a loss for the whole data science community––and I’ve had it on my mind lately to share…

Student of data science. Translator (日本語). Tutor. Bicyclist. Stoic. Tea pot. Seattle.