### Introduction to Deep Learning¶

##### By Kai-Zhan Lee¶

Welcome! This guide is a quick introduction to the theory and applications of deep learning.

A brief topic summary:

• Machine Learning Basics
• History of Deep Learning
• Feedforward Neural Networks
• Handwritten Digit Classification

## Before we get started...¶

### Why should you care?¶

(Aside from landing that killer machine learning job and making \$\$\$\$)

Cool Applications:

• Speech synthesis and speech recognition (Siri, Alexa, etc.)

• Identifying those in mental distress through social media (current research)

Deep learning models certain aspects of the human brain.

It helps solve problems that only humans could solve before!

### Deep learning is growing -- fast¶

In 2012, deep learning was considered a nice mathematical escape from reality that only researchers investigated.

Today, it approaches ubiquity in resesarch and industry alike.

Source

### Even this actress...¶

has published a paper on deep learning:

### Just don't be this guy...¶

XKCD, "Machine Learning"

## Machine Learning Background¶

### What is data science? (Thanks Robbie!)¶

Put simply: Finding meaning from data.

### What is artificial intelligence (AI)?¶

"... what we want is a machine that can learn from experience."

- Alan Turing, 1947

Artificial intelligence: a perceiving agent within an environment that takes actions to maximize its chances of achieving a specific set of goals.

Remember PEAS!

• Performance Measure: how well is the agent acting to achieve its goals?
• Environment: what is there besides the agent?
• Actuators: what actions can the agent perform?
• Sensors: what does the agent perceive?

### 4 Kinds of AI¶

- Thinking Acting
Naturally Emotion, belief Running, flying, swimming
Rationally Logic, proofs Decisions, choices

In this talk, we'll examine rational actors.

### What is machine learning?¶

Machine learning lies at the intersection of artificial intelligence and data science.

#### Formulation¶

Assume we have input and output sets $\mathcal{X}$ and $\mathcal{Y}$ and a distribution $\mathcal{D}: \mathcal{X} \times \mathcal{Y} \to \mathbb R$. Given $n$ datapoints identically and independently drawn from this distribution, we attempt to find a function $f: \mathcal{X} \to \mathcal{Y}$ that minimizes training error, or loss.

There are many types of loss:

• Mean squared error: $\mathbb E_{x, y \in \mathcal{D}} \left[\lVert f(x) - y\right \rVert_2^2]$.
• Cross-entropy loss: $\mathbb E_{x, y \in \mathcal{D}} \left[\lVert - y \log f(x)\right\rVert_1]$.

## Deep Learning History¶

### Threshold Logic¶

Deep learning has had a long history of iterative redesigning and improvement.

• 1943: Pioneers in mathematically modelling the brain, Walter Pitts and Warren McCulloch propose the Threshold Logic Unit (TLU), a linear model with adjustable, but non-learnable parameters $t$ and $w_1 \dots w_d$, where $d$ is the dimensionality of the input vector $\vec x$.

### Introduction to Learning¶

• 1947: "... what we want is a machine that can learn from experience." - Alan Turing
• 1957: Frank Rosenblatt invents the Linear Perceptron (LP), making the TLU's parameters learnable.

Is this guaranteed to converge? (i.e. give a solution for $\vec w$?)

### Differential Learning¶

• 1958: David Cox invents Logistic Regression (LR) for perceptrons, using gradient descent to find optimal weights. He uses cross-entropy loss to measure model performance.

## Structure of a Neural Net¶

### So what is deep learning anyway?¶

Deep learning is the study of neural networks, models that "learn from experience and understand the world in terms of a hierarchy of concepts."

A neural network consists of multiple layers. Each layer (aside from the input layer) consists of independent perceptrons that take in the previous layer's output as their input. There are 3 main divisions of layers:

• Input Layer: our input vector $\vec x \in \mathcal{X}$.
• Hidden Layer(s): vector(s) $\vec h_1 ... \vec h_n$ of fixed size. Each element of each vector is its own perceptron with its own weights. We refer to each perceptron as a "node".
• Output Layer: our predicted output vector $\vec{y}_{pred} \in \mathcal{Y}$

### Sample Visualization¶

Each column of circles is a vector; each circle with arrows pointing to it is a node (perceptron) taking input from the arrows' sources (the previous layer). This structure is called a feedforward neural network (FFNN or NN for short), because the input is fed forward through the model and, well, it's a neural network. Like with logistic regression, weights are learned with gradient descent; for each weight, we take the partial derivative $\frac{\partial L}{\partial w}$ and move in the opposite direction, to minimize loss.

This particular NN has two layers: one hidden layer of 5 nodes and one output layer of 2 nodes. Note that the input (a 3-vector) doesn't count as a layer.

## Deep Learning with Keras¶

### MNIST¶

Let's try out the MNIST dataset: a collection of handwritten digits from postal codes.

Python Packages: python-mnist (our data!), numpy (matrix library), matplotlib (a good plotting tool in Python)

In [259]:
import numpy as np
import cv2 as cv
from skimage.measure import block_reduce
from matplotlib import pyplot as plt
from mnist import MNIST
%matplotlib inline



### What does this actually look like?¶

In [212]:
# Show the first MNIST training image
img = plt.imshow(trX[0].reshape(28, 28), cmap='gray')
print('True Label:', trY[0])

True Label: 5


### Setup¶

Training a model in Keras is simple: specify a model structure, optimizer, and loss function, and you're ready to start training!

In [252]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
Dense(512, activation='sigmoid', input_dim=trX.shape[1]),
Dropout(0.5),
Dense(128, activation='sigmoid'),
Dropout(0.5),
Dense(10, activation='softmax')
])
model.fit(trX, trY, validation_split=0.1, epochs=20, batch_size=256)

Train on 48600 samples, validate on 5400 samples
Epoch 1/20
48600/48600 [==============================] - 4s 78us/step - loss: 1.0246 - acc: 0.6840 - val_loss: 0.4162 - val_acc: 0.8846
Epoch 2/20
48600/48600 [==============================] - 4s 75us/step - loss: 0.5051 - acc: 0.8523 - val_loss: 0.3234 - val_acc: 0.9091
Epoch 3/20
48600/48600 [==============================] - 4s 77us/step - loss: 0.4372 - acc: 0.8710 - val_loss: 0.2868 - val_acc: 0.9154
Epoch 4/20
48600/48600 [==============================] - 4s 75us/step - loss: 0.4003 - acc: 0.8790 - val_loss: 0.2705 - val_acc: 0.9230
Epoch 5/20
48600/48600 [==============================] - 4s 82us/step - loss: 0.3719 - acc: 0.8879 - val_loss: 0.2510 - val_acc: 0.9256
Epoch 6/20
48600/48600 [==============================] - 4s 81us/step - loss: 0.3582 - acc: 0.8941 - val_loss: 0.2340 - val_acc: 0.9302
Epoch 7/20
48600/48600 [==============================] - 4s 81us/step - loss: 0.3389 - acc: 0.8981 - val_loss: 0.2264 - val_acc: 0.9343
Epoch 8/20
48600/48600 [==============================] - 4s 82us/step - loss: 0.3262 - acc: 0.9009 - val_loss: 0.2271 - val_acc: 0.9307
Epoch 9/20
48600/48600 [==============================] - 4s 76us/step - loss: 0.3167 - acc: 0.9044 - val_loss: 0.2126 - val_acc: 0.9357
Epoch 10/20
48600/48600 [==============================] - 4s 81us/step - loss: 0.3100 - acc: 0.9064 - val_loss: 0.2061 - val_acc: 0.9383
Epoch 11/20
48600/48600 [==============================] - 4s 77us/step - loss: 0.2980 - acc: 0.9097 - val_loss: 0.2155 - val_acc: 0.9381
Epoch 12/20
48600/48600 [==============================] - 4s 80us/step - loss: 0.2959 - acc: 0.9110 - val_loss: 0.1980 - val_acc: 0.9404
Epoch 13/20
48600/48600 [==============================] - 4s 83us/step - loss: 0.2881 - acc: 0.9134 - val_loss: 0.1903 - val_acc: 0.9435
Epoch 14/20
48600/48600 [==============================] - 4s 77us/step - loss: 0.2792 - acc: 0.9153 - val_loss: 0.1929 - val_acc: 0.9420
Epoch 15/20
48600/48600 [==============================] - 4s 85us/step - loss: 0.2763 - acc: 0.9180 - val_loss: 0.1857 - val_acc: 0.9430
Epoch 16/20
48600/48600 [==============================] - 4s 80us/step - loss: 0.2772 - acc: 0.9156 - val_loss: 0.1891 - val_acc: 0.9430
Epoch 17/20
48600/48600 [==============================] - 4s 86us/step - loss: 0.2687 - acc: 0.9177 - val_loss: 0.1845 - val_acc: 0.9439
Epoch 18/20
48600/48600 [==============================] - 5s 96us/step - loss: 0.2701 - acc: 0.9185 - val_loss: 0.1864 - val_acc: 0.9406
Epoch 19/20
48600/48600 [==============================] - ETA: 0s - loss: 0.2594 - acc: 0.921 - 4s 87us/step - loss: 0.2604 - acc: 0.9212 - val_loss: 0.1801 - val_acc: 0.9470
Epoch 20/20
48600/48600 [==============================] - 4s 82us/step - loss: 0.2532 - acc: 0.9246 - val_loss: 0.1720 - val_acc: 0.9485

Out[252]:
<keras.callbacks.History at 0x1120dd278>
In [254]:
model.evaluate(teX, teY)

10000/10000 [==============================] - 0s 39us/step

Out[254]:
[0.14467686785683037, 0.9552]

### Drawing input¶

Let's try drawing an input and seeing how the network does!

In [354]:
sketch = Sketcher()
img = sketch.get_image()
plt.imshow(img, cmap='gray')
print('Prediction:', model.predict(np.array([img.flatten()])).argmax())

Read image...
Prediction: 2


## Recap¶

• Artificial intelligence consists of performance measure, environment, actuators, and sensors.
• Machine learning is the intersection of AI and data science. It poses the problem of modeling a distribution: mapping input $x \in \mathcal{X}$ to output $y \in \mathcal{Y}$ while minimizing a loss function.
• Logistic regression perceptrons are classifiers that linearly separate data and yield a continuous probability $y \in [0, 1]$.
• Parameters are learned through backpropagation: taking the partial derivative of the loss function with respect to each parameter, and changing the parameters in the opposite direction of the derivative, in order to minimize loss.
• Deep learning is a subset of machine learning that studies neural networks, which hierarchically represent data in layers of increasingly meaning, eventually culminating in a predicted output $y \in \mathcal{Y}$. Optimal parameter settings are learned through backpropagation as well.
• Feedforward neural networks use layers of logistic regression perceptrons to represent meaning.

## Where to next?¶

Machine Learning Resources:

Deep Learning:

Research papers: scholar.google.com is your best friend! Here's a starter set of papers.

The whole internet's out there to help you; feel free to use it!

## Contact¶

#### Kai-Zhan Lee¶

kl2792@columbia.edu

The best way to learn how to do something is... to do it! So please reach out if you want some advice on a deep learning/machine learning project, or if you want to brainstorm ideas for one!

If you have any questions on starting summer research, please feel to reach out as well.