A.k.a. the neural network acronym post, this is in fact an announcement for a series of four articles to be published, each covering one of the four major types of modern neural network: unsupervised pretrained networks (UPNs), including autoencoders and generative adversarial networks (GANs); convolutional neural networks (CNNs); recurrent neural networks (RNNs), including long short-term memory (LSTM) and gated recurrent unit (GRU) models; and recursive neural networks. Each of the following four posts, as well as this one, is in no small part a self-study of Deep Learning: A Practitioner’s Approach by Patterson and Gibson, and as such each of these articles is, and will continue to be, at least for a time, a living document, subject to revision, edition, perdition. Feel free to read along in the book (see the end for info). Here, a very brief overview of each of the major architectures and their uses.
- Unsupervised pretrained networks (UPNs)
- Convolutional neural networks (CNNs)
- Recurrent neural networks (RNNs)
- Recursive neural networks
Unsupervised pretrained networks (UPNs)
These notably include autoencoders, deep belief networks (DBNs), and generative adversarial networks (GANs).
Most notable are perhaps GANs, as they’re doing a lot of the really fun stuff you’re hearing about in the realm of neural networks. Neural network that makes a real picture out of your cat doodle? GAN. A machine that turns your input text into a Trump-style tweet? GAN. AI art? Often GANs. They work by actually training two neural networks against each other: (1) a generative network, which makes whatever thing you’re making (tweet, image), and (2) a discriminator network, often itself a CNN, which judges how well the generated thing turned out.
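To make that two-network tug-of-war concrete, here’s a minimal sketch in plain NumPy (my own toy example, not the book’s — every name, size, and learning rate here is invented for illustration): a one-layer linear generator tries to mimic samples from a Gaussian, while a logistic-regression discriminator tries to tell real from fake.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data the generator should learn to mimic: samples from N(4, 1).
def real_batch(n):
    return rng.normal(4.0, 1.0, n)

w, b = 0.1, 0.0   # generator: fake sample = w * noise + b
a, c = 0.1, 0.0   # discriminator: D(x) = sigmoid(a * x + c)

lr = 0.05
for step in range(2000):
    z = rng.normal(0.0, 1.0, 64)
    xr, xf = real_batch(64), w * z + b

    # --- Discriminator update: push D(real) -> 1 and D(fake) -> 0 ---
    dr, df = sigmoid(a * xr + c), sigmoid(a * xf + c)
    grad_logit_r = -(1.0 - dr)          # d(-log D(xr)) / d logit
    grad_logit_f = df                   # d(-log(1 - D(xf))) / d logit
    a -= lr * (np.mean(grad_logit_r * xr) + np.mean(grad_logit_f * xf))
    c -= lr * (np.mean(grad_logit_r) + np.mean(grad_logit_f))

    # --- Generator update: fool D (non-saturating loss -log D(fake)) ---
    xf = w * z + b
    df = sigmoid(a * xf + c)
    grad_xf = -(1.0 - df) * a           # d(-log D(xf)) / d fake sample
    w -= lr * np.mean(grad_xf * z)
    b -= lr * np.mean(grad_xf)

fake = w * rng.normal(0.0, 1.0, 1000) + b
print(fake.mean())  # should have drifted from 0 toward the real mean of 4
```

Real GANs replace both one-liners with deep networks, but the loop is the same: the discriminator’s gradient pushes one way, the generator’s pushes back.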
Autoencoders, often used as part of larger neural networks, are mostly used for dimensionality reduction.
“The output of the autoencoder network is a reconstruction of the input data in its most efficient form.”
–– Patterson and Gibson, p. 112
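A quick sketch of that reconstruction idea in plain NumPy (my own toy example, with made-up sizes): a linear autoencoder squeezes 3-D points that secretly lie near a line down to a single number, then reconstructs them, and gradient descent drives the reconstruction error down.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3-D data that secretly lives near a 1-D line: t * direction + noise.
t = rng.normal(size=(200, 1))
direction = np.array([[2.0, -1.0, 0.5]])
X = t @ direction + 0.05 * rng.normal(size=(200, 3))

W_enc = rng.normal(scale=0.1, size=(3, 1))  # encoder: compress 3 -> 1
W_dec = rng.normal(scale=0.1, size=(1, 3))  # decoder: reconstruct 1 -> 3

def loss(X, W_enc, W_dec):
    X_hat = (X @ W_enc) @ W_dec
    return np.mean((X - X_hat) ** 2)

initial = loss(X, W_enc, W_dec)
lr = 0.01
for _ in range(500):
    H = X @ W_enc                        # the 1-D code for each point
    X_hat = H @ W_dec                    # the reconstruction
    err = 2.0 * (X_hat - X) / X.shape[0]
    g_dec = H.T @ err                    # gradient of the squared error
    g_enc = X.T @ (err @ W_dec.T)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

final = loss(X, W_enc, W_dec)
print(initial, "->", final)  # reconstruction error shrinks
```

The 1-D code `H` is the dimensionality reduction: three numbers per point in, one number per point out, with as little information lost as the network can manage.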
Convolutional neural networks (CNNs)
“The goal of a CNN is to learn higher-order features in the data via convolution.”
–– Patterson and Gibson, p. 125
These have their claim to fame in image data, though they can also work well with sound and text data. These are often the main driving engine behind computer vision, image recognition, and image classification. Their most iconic feature and namesake, the convolution, is the process of a sliding “window” moving across the data. This pairs nicely with thinking about images: imagine a much smaller window scanning a much larger image for “features” (maybe horizontal lines, or vertical lines, or diagonal lines, or anything!). That’s a convolution. CNNs do that a whole lot, with a whole bunch of different windows, each looking for a different feature in the (image) data it’s scanning.
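Here’s that sliding window in miniature, in plain NumPy (my own example, not the book’s): a 1×2 “vertical edge” window slides across a toy image and fires exactly where the brightness jumps.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel window over the image, one dot product per position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 6))
image[:, 3:] = 1.0                       # right half bright: a vertical edge
vertical_edge = np.array([[-1.0, 1.0]])  # fires where brightness jumps left-to-right

fmap = convolve2d(image, vertical_edge)
print(fmap)  # all zeros except a column of 1s right at the edge
```

A real CNN learns many such kernels (and stacks layers of them), but each feature map is produced by exactly this loop.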
Other than image data, they have also been applied to MRI data, graph data, and natural language processing (NLP) applications (141).
Recurrent neural networks (RNNs)
These networks are iconic for their applications to data with a time dimension because, unlike the other major network archetypes, RNNs move along the data sequentially, feeding the output from one moment in time back in as part of the input for the next moment in a cyclical pattern (hence “recurrent”). This allows the network to “remember” earlier parts of the input for the sake of predicting later parts.
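That loop is easier to see in code than in prose. A minimal sketch in plain NumPy (my own, with made-up sizes and random untrained weights): the hidden state h is fed back in at every step, so an early input leaves a trace in the final state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 3-dimensional inputs, 4-dimensional hidden state.
W_x = rng.normal(scale=0.5, size=(3, 4))   # input -> hidden
W_h = rng.normal(scale=0.5, size=(4, 4))   # hidden -> hidden (the recurrence)

def run_rnn(sequence):
    h = np.zeros(4)
    for x in sequence:                     # one step per moment in time
        h = np.tanh(x @ W_x + h @ W_h)     # previous h feeds the next step
    return h

# Two sequences that differ ONLY at the very first step, then go silent.
seq_a = [np.array([1.0, 0.0, 0.0])] + [np.zeros(3)] * 5
seq_b = [np.array([0.0, 1.0, 0.0])] + [np.zeros(3)] * 5

# The final hidden states still differ: the network "remembers"
# what it saw six steps earlier.
print(np.allclose(run_rnn(seq_a), run_rnn(seq_b)))  # False
```

A trained RNN adds learned weights, biases, and an output layer, but the memory mechanism is exactly this feedback of h.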
There are two very popular variations of RNNs: long short-term memory (LSTM) networks and gated recurrent units (GRUs). LSTM networks are most iconic for their success generating sentences, while GRUs are still somewhat the new kid on the block, often seen as a leaner LSTM, as they’re considered easier to compute and implement (156).
RNNs as a whole have found application anywhere there is a sequential element to data, including speech and music synthesis, language modeling, and time-series (read: stock) prediction (145).
And as an aside, it’s my opinion that these are by far the most convoluted (not to be confused with convolutional above) of the modern networks. Simple at a high level (they send previous data forward, so that the network “remembers” what it’s seen) but complex at the microscopic level. Both LSTM networks and GRUs come with explanations that resemble diagrams you saw in cell biology. Accordingly, these will probably be the last networks to get covered in this series.
Recursive neural networks
“Recursive Neural Networks, like Recurrent Neural Networks, can deal with variable length input. The primary difference is that Recursive Neural Networks have the ability to model the hierarchical structures in the training dataset…[they] are composed of a shared-weight matrix and binary tree structure that allows the recursive network to learn varying sequences of words or parts of an image. It is useful as a sentence and scene parser.”
–– Patterson and Gibson, p. 160
Gibson, A., & Patterson, J. (2017). Deep Learning: A Practitioner’s Approach. Sebastopol: O’Reilly.