For self-starters, bootcamp grads, and even experienced professionals

Photo by Nick Morrison on Unsplash.

I recently finished Flatiron School’s bootcamp in data science, and thinking back on where I was in early 2020, it’s amazing how much stuff has been crammed into my head in the space of a year. Web scraping, data visualization, A/B testing, linear algebra, vector calculus, regression, SVMs, random forests, XGBoost, Docker, AWS, Tableau, and so many different ways that neural networks can be put together. It has been an education with both breadth and depth within the field of data science.

There’s kind of a caveat there: as much as I’ve learned about data science in a year of…

An overview of the four archetypes of modern networks

Photo by Rock’n Roll Monkey on Unsplash.

A.k.a. the neural network acronym post, this is in fact an announcement for a series of four articles to be published, each covering one of the four major types of modern neural network: unsupervised pretrained networks, including autoencoders and generative adversarial networks (GANs); convolutional neural networks (CNNs); recurrent neural networks (RNNs), including long short-term memory (LSTM) and gated recurrent units (GRU) models; and recursive neural networks. Each of the following four posts, as well as this one, is in no small part a self study of Deep Learning: A Practitioner’s Approach by Patterson and Gibson, and as such, each of…

Easy updating for fast live predictions

An animation of the beta distribution for different values of its parameters. Gif by Pabloparsil on Wikipedia. CC BY-SA 4.0.

Somehow, in years of schooling, I’d never heard of beta distributions until I stumbled onto them by accident over at David Robinson’s blog, but they’ve quickly become one of my favorite distributions. I mean, just look at that thing wiggling off to the left there. They’re incredibly flexible, but that’s not what makes them really cool. It’s the ease with which you can update your predictions that makes beta distributions stand out.

Beta distributions are described by two parameters, alpha and beta (you can see how changing these values affects the shape of the distribution above), and while…
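To make the "easy updating" idea concrete, here is a minimal sketch of the standard conjugate update for a beta prior with success/failure data: add the observed successes to alpha and the observed failures to beta. The `Beta` class, the prior of (2, 2), and the 7-out-of-10 counts are all made up for illustration and do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Hypothetical helper holding the two shape parameters."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected value of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "Beta":
        # Conjugate update for binomial data: add the counts to the parameters.
        return Beta(self.alpha + successes, self.beta + failures)


# Illustrative numbers only: a weak prior centered on 0.5,
# then 7 successes and 3 failures observed.
prior = Beta(alpha=2.0, beta=2.0)
posterior = prior.update(successes=7, failures=3)

print(posterior)                 # Beta(alpha=9.0, beta=5.0)
print(round(posterior.mean, 3))  # 0.643
```

Because the update is just two additions, it can be repeated every time new observations arrive, which is what makes it attractive for live predictions.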

An excellent resource for core probability and statistics concepts

Screen capture by author.

Statistics and probability lie at the heart of so very, very many applied sciences today, not least amongst them data science, but it can be a frighteningly jargon-y field; after all, it is, at its heart, math. What’s the difference between probability, likelihood, and statistics? What’s Bayes’ theorem all about? What exactly are a prior and a posterior?

A good introduction to probability is Khan Academy, which can get you started with permutations and combinations, random variables, and even basic set theory. I’d call this tier 1 foundational knowledge, and while important, you hit the ceiling on Khan Academy pretty…

The vocabulary you’ll need for a technical interview

Missing data. Image by Author.

If you’ve worked at all with real data, you’ve probably already had to handle cases of missing data. (I wonder what the probability of missing data is in any given natural dataset. I suspect about as close to certainty as you can get.)

We should be suspicious of any dataset (large or small) which appears perfect.

— David J. Hand

How did you handle it? Row-wise deletion? Column-wise deletion? Imputation? What did you impute? If continuous, did you use the mean, baseline, or a KNN-derived value? …
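As a rough illustration of three of the options named above (row-wise deletion, column-wise deletion, and mean imputation), here is a small sketch in plain Python. The toy `rows` data and the helper names `drop_rows`, `drop_columns`, and `impute_mean` are hypothetical; a real project would more likely reach for a dataframe library.

```python
from statistics import mean

# Toy records; None marks a missing value. All values are invented.
rows = [
    {"id": 1, "age": 34, "income": 52000.0},
    {"id": 2, "age": None, "income": 61000.0},
    {"id": 3, "age": 29, "income": None},
    {"id": 4, "age": 41, "income": 58000.0},
]


def drop_rows(rows):
    """Row-wise deletion: keep only fully observed records."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_columns(rows):
    """Column-wise deletion: keep only columns with no missing values."""
    keep = [k for k in rows[0] if all(r[k] is not None for r in rows)]
    return [{k: r[k] for k in keep} for r in rows]


def impute_mean(rows, column):
    """Mean imputation: fill gaps in one column with the observed mean."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


print(len(drop_rows(rows)))      # 2
print(drop_columns(rows)[0])     # {'id': 1}
print(impute_mean(rows, "age")[1]["age"])
```

Each strategy trades something away: deletion discards information, while mean imputation keeps every row but shrinks the variance of the imputed column.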

Keeping your number of repositories manageable

Yikes. Image by author.

So you’ve been using GitHub for a while now, and maybe you’re starting to notice that your repositories page (pages???) is fuller than it used to be. Maybe, in fact, it’s become so full that it’s something more akin to a code dump than a meaningful collection of code. At the time of writing this, my GitHub falls decidedly into the former category (yes, that’s me up there), and it feels like it’s time to change that.

After all, your GitHub is in many ways the coding face you present to the rest of the world, and as such, you…

The unsung hero-twin of ROC curves

Precision-recall plots for several classifiers working with binary data. Image by author.

I’ve heard of ROC curves, you’ve heard of ROC curves, we’ve all heard of ROC curves. (If you haven’t heard of ROC curves, you can read about them here.)

They’re the go-to visual assessment of how well a classifier performs on binary data, which is already a very loosey-goosey definition. What do we mean, “how well a classifier performs”––by what metric? For which data? Now you’re asking the right questions.

ROC curves have a nice semantic interpretation: they’re a classifier’s true positive rate (TPR) plotted against its false positive rate (FPR). The curve itself has some nice interpretations, as does the accompanying…
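For readers who want to see the TPR-versus-FPR definition in action, the following sketch sweeps a decision threshold over a classifier's scores and records one (FPR, TPR) point per threshold. The function name `roc_points` and the six labels and scores are invented for illustration; this is not the article's code.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Start above every score so the first point is (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        # Predict positive whenever the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up binary labels and classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

points = roc_points(y_true, scores)
print(points[0])   # (0.0, 0.0)
print(points[-1])  # (1.0, 1.0)
```

Plotting the second coordinate against the first gives the ROC curve. The sketch assumes at least one positive and one negative label, since otherwise one of the rates is undefined.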

An area analogy

“Maths in neon at Autonomy in Cambridge.” Image by Matt Buck on flickr, licensed under Creative Commons BY-SA 2.0.

That photo pretty much says it all: the probability of event A given event B is equal to the product of the probability of B given A, the probability of A, and the reciprocal of the probability of B. Actually, I think the image does a nicer job of summing it all up.
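Written out symbolically, the sentence above is the usual statement of Bayes' theorem. The notation below is the standard one and is assumed to match the neon sign; it requires that P(B) be nonzero.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```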

Coming from a background in math (as opposed to statistics), I found this formula a bit elusive at first, and temptingly nonsensical if one interprets these conditional probabilities too literally as fractions. …

Who are you calling a sophist?

Photo by Ivana Cajina on Unsplash

No, I don’t mean AI art, though that is beautiful. I mean the intrinsic, intellectual beauty of the field, before it’s ever taken into the realm of art. I suppose this is going to be my “why data science” post.

My last semester at college, I had the good luck to take a class that left me with so much to think about that I still think about it, years later. The serendipitous combination of my being interested in the material, the whole of the class seeming to get along well, and the whole of the class seeming to adore…

How to reel ’em in

Photo by Miguel Henriques. Taken from Unsplash.

The person I remember most from my first few weeks in Flatiron School’s Data Science Program was my one-on-one technical coach, Eli. He was a thespian at heart and knew how to hold an audience’s attention, but the encouragement he showed those under his tutelage was no act; I have never met a more naturally caring, generous, and supportive person, and I consider it my personal privilege to have known him and to have spent some time under his wing.

He passed earlier this year––a loss for the whole data science community––and I’ve had it on my mind lately to share…

S. T. Lanier

Student of data science. Translator (日本語). Tutor. Bicyclist. Stoic. Tea pot. Seattle.
