(Aside from landing that killer machine learning job and making \$\$\$\$)

Cool Applications:

Finding things you like (Google, Facebook, etc.)

Speech synthesis and speech recognition (Siri, Alexa, etc.)

Identifying those in mental distress through social media (current research)

Deep learning models certain aspects of the human brain.

It helps solve problems that *only humans could solve before*!

In 2012, deep learning was considered a nice mathematical escape from reality that only researchers investigated.

Today, it approaches ubiquity in research and industry alike.

Seemingly every field has published a paper on deep learning:

*XKCD, "Machine Learning"*

Put simply: Finding meaning from data.

*"... what we want is a machine that can learn from experience."*

- Alan Turing, 1947

Artificial intelligence studies a **perceiving agent** within an **environment** that takes **actions** to maximize its chances of achieving a specific set of **goals**.

Remember **PEAS**!

- **Performance Measure**: how well is the agent acting to achieve its goals?
- **Environment**: what is there besides the agent?
- **Actuators**: what actions can the agent perform?
- **Sensors**: what does the agent perceive?

|  | Thinking | Acting |
|---|---|---|
| Naturally | Emotion, belief | Running, flying, swimming |
| Rationally | Logic, proofs | Decisions, choices |

In this talk, we'll examine **rational actors**.

Machine learning lies at the *intersection* of **artificial intelligence** and **data science**.

Assume we have input and output sets $\mathcal{X}$ and $\mathcal{Y}$ and a distribution $\mathcal{D}: \mathcal{X} \times \mathcal{Y} \to \mathbb R$. Given $n$ datapoints drawn independently and identically (i.i.d.) from this distribution, we attempt to find a function $f: \mathcal{X} \to \mathcal{Y}$ that minimizes training *error*, or *loss*.

There are many types of loss:

- Mean squared error: $\mathbb E_{(x, y) \sim \mathcal{D}} \left[\lVert f(x) - y \rVert_2^2\right]$.
- Cross-entropy loss: $\mathbb E_{(x, y) \sim \mathcal{D}} \left[\lVert -y \log f(x) \rVert_1\right]$.
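As a concrete illustration of these two losses, here is a minimal NumPy sketch (the function names `mse` and `cross_entropy` are ours, not from any library) computing the empirical loss over a small batch of predictions:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error: average squared L2 distance between prediction and target."""
    return np.mean(np.sum((pred - target) ** 2, axis=1))

def cross_entropy(pred, target):
    """Cross-entropy: average of -target * log(pred), with one-hot targets."""
    return np.mean(np.sum(-target * np.log(pred), axis=1))

# Two examples, two classes; predictions are probabilities, targets are one-hot
pred = np.array([[0.9, 0.1], [0.2, 0.8]])
target = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mse(pred, target))            # small, since predictions are close to targets
print(cross_entropy(pred, target))
```

Note that cross-entropy only penalizes the probability assigned to the true class, while MSE penalizes every coordinate of the output.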

Deep learning has had a long history of iterative redesigning and improvement.

- 1943: Pioneers in mathematically modelling the brain, Walter Pitts and Warren McCulloch propose the **Threshold Logic Unit** (TLU), a linear model with adjustable but non-learnable parameters $t$ and $w_1 \dots w_d$, where $d$ is the dimensionality of the input vector $\vec x$.
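A TLU simply fires when the weighted sum of its inputs reaches the threshold $t$. A minimal sketch (the function name `tlu` and the hand-set AND weights are illustrative):

```python
import numpy as np

def tlu(x, w, t):
    """Threshold Logic Unit: output 1 iff the weighted sum w . x reaches threshold t."""
    return int(np.dot(w, x) >= t)

# Hand-set (adjustable but non-learnable) parameters implementing logical AND
w = np.array([1.0, 1.0])
t = 2.0
print(tlu(np.array([1, 1]), w, t))  # 1
print(tlu(np.array([1, 0]), w, t))  # 0
```

The key limitation is in that comment: the parameters are set by hand, not learned from data.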

- 1947: *"... what we want is a machine that can learn from experience."* - Alan Turing

- 1957: Frank Rosenblatt invents the **Linear Perceptron** (LP), making the TLU's parameters learnable.

Is this guaranteed to converge? (i.e. give a solution for $\vec w$?)
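The classical answer is: only when the data are linearly separable (the perceptron convergence theorem). Here is a sketch of Rosenblatt's update rule on separable toy data (the name `train_perceptron` and the data are ours):

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Rosenblatt's rule: w += y_i * x_i whenever example i is misclassified.
    Labels are +1/-1; the bias is folded in as an extra constant feature."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:        # misclassified (or on the boundary)
                w += yi * xi
    return w

# Linearly separable toy data: label is the sign of the first coordinate
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
preds = np.sign(np.hstack([X, np.ones((4, 1))]) @ w)
```

On non-separable data, this loop never settles: some example is always misclassified, so the weights keep changing forever.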

- 1958: David Cox invents **Logistic Regression** (LR) for perceptrons, using **gradient descent** to find optimal weights. He uses *cross-entropy loss* to measure model performance.
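Gradient descent on the cross-entropy loss can be sketched in a few lines. For logistic regression, the gradient of the mean cross-entropy with respect to the weights has the clean closed form $X^\top(\sigma(Xw) - y)/n$ (the names `train_logistic` and the toy data below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, steps=500):
    """Gradient descent on mean cross-entropy loss; labels in {0, 1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # bias as an extra feature
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad                          # step *against* the gradient
    return w

# 1-D toy data: positive inputs are class 1, negative inputs are class 0
X = np.array([[2.0], [1.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w = train_logistic(X, y)
probs = sigmoid(np.hstack([X, np.ones((4, 1))]) @ w)
```

Unlike the perceptron, the output is a continuous probability in $(0, 1)$ rather than a hard label.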

Deep learning is the study of neural networks, models that "learn from experience and understand the world in terms of a hierarchy of concepts."

A neural network consists of *multiple layers*. Each layer (aside from the input layer) consists of *independent* perceptrons that take in the previous layer's output as their input. There are 3 main divisions of layers:

- Input Layer: our input vector $\vec x \in \mathcal{X}$.
- Hidden Layer(s): vector(s) $\vec h_1 \dots \vec h_n$ of fixed size. Each element of each vector is its own perceptron with its own weights. We refer to each perceptron as a "node".
- Output Layer: our predicted output vector $\vec{y}_{pred} \in \mathcal{Y}$

Each column of circles is a vector; each circle with arrows pointing to it is a node (perceptron) taking input from the arrows' sources (the previous layer). This structure is called a **feedforward neural network** (FFNN or NN for short), because the input is fed forward through the model and, well, it's a neural network. Like with logistic regression, weights are learned with **gradient descent**; for each weight, we take the partial derivative $\frac{\partial L}{\partial w}$ and move in the *opposite* direction, to *minimize* loss.

This particular NN has two layers: one hidden layer of 5 nodes and one output layer of 2 nodes. Note that the input (a 3-vector) doesn't count as a layer.
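The forward pass of that 3 → 5 → 2 network can be sketched directly in NumPy (the weights here are randomly initialized for illustration, not trained):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights for a 3 -> 5 -> 2 network, randomly initialized
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)   # hidden layer: 5 nodes
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)   # output layer: 2 nodes

def forward(x):
    h = sigmoid(W1 @ x + b1)       # each hidden node is its own perceptron
    return sigmoid(W2 @ h + b2)    # output nodes take the hidden vector as input

y_pred = forward(np.array([0.5, -1.0, 2.0]))
print(y_pred.shape)  # (2,)
```

Training would then backpropagate $\frac{\partial L}{\partial w}$ through both layers via the chain rule; libraries like Keras (below) handle that bookkeeping for us.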

Let's try out the MNIST dataset: a collection of handwritten digits from postal codes.

Python Packages: `python-mnist` (our data!), `numpy` (matrix library), `matplotlib` (a good plotting tool in Python)

In [259]:

```
import numpy as np
import cv2 as cv
from skimage.measure import block_reduce
from matplotlib import pyplot as plt
from mnist import MNIST
%matplotlib inline
# Load data
loader = MNIST(gz=True)
trX, trY = map(np.array, loader.load_training())
teX, teY = map(np.array, loader.load_testing())
```

In [212]:

```
# Show the first MNIST training image
plt.imshow(trX[0].reshape(28, 28), cmap='gray')
print('True Label:', trY[0])
```

True Label: 5

Training a model in Keras is simple: specify a model structure, optimizer, and loss function, and you're ready to start training!

In [252]:

```
from keras.models import Sequential
from keras.layers import Dense, Dropout
model = Sequential([
Dense(512, activation='sigmoid', input_dim=trX.shape[1]),
Dropout(0.5),
Dense(128, activation='sigmoid'),
Dropout(0.5),
Dense(10, activation='softmax')
])
model.compile('adam', 'sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(trX, trY, validation_split=0.1, epochs=20, batch_size=256)
```

Train on 48600 samples, validate on 5400 samples
Epoch 1/20 - loss: 1.0246 - acc: 0.6840 - val_loss: 0.4162 - val_acc: 0.8846
Epoch 2/20 - loss: 0.5051 - acc: 0.8523 - val_loss: 0.3234 - val_acc: 0.9091
...
Epoch 19/20 - loss: 0.2604 - acc: 0.9212 - val_loss: 0.1801 - val_acc: 0.9470
Epoch 20/20 - loss: 0.2532 - acc: 0.9246 - val_loss: 0.1720 - val_acc: 0.9485

Out[252]:

<keras.callbacks.History at 0x1120dd278>

In [254]:

```
model.evaluate(teX, teY)
```

10000/10000 [==============================] - 0s 39us/step

Out[254]:

[0.14467686785683037, 0.9552]

Let's try drawing an input and seeing how the network does!

In [354]:

```
# Sketcher is an OpenCV drawing helper defined elsewhere in the notebook
sketch = Sketcher()
img = sketch.get_image()
plt.imshow(img, cmap='gray')
print('Prediction:', model.predict(np.array([img.flatten()])).argmax())
```

Read image... Prediction: 2

- *Artificial intelligence* studies agents defined by a **performance measure, environment, actuators, and sensors** (PEAS).
- *Machine learning* is the intersection of AI and data science. It poses the problem of modeling a distribution: **mapping input** $x \in \mathcal{X}$ **to output** $y \in \mathcal{Y}$ while minimizing a **loss function**.
- *Logistic regression perceptrons* are classifiers that **linearly separate** data and yield a **continuous** probability $y \in [0, 1]$. Parameters are learned through **backpropagation**: taking the partial derivative of the loss function with respect to each parameter, and changing the parameter in the opposite direction of the derivative, in order to minimize loss.
- *Deep learning* is a subset of machine learning that studies **neural networks**, which **hierarchically represent data in layers of increasing meaning**, eventually culminating in a predicted output $y \in \mathcal{Y}$. Optimal parameters are likewise learned through **backpropagation**.
- *Feedforward neural networks* use layers of **logistic regression perceptrons** to represent meaning.

Machine Learning Resources:

- Machine Learning, Prof. Nakul Verma (lecture slides and homework online)
- HackerRank
- Coursera

Deep Learning:

- *Neural Networks and Deep Learning*: math-notation-light
- *Deep Learning*: extremely comprehensive but math-heavy
- Coursera
- Experts' Blogs

Research papers: scholar.google.com is your best friend! Here's a starter set of papers.

The whole internet's out there to help you; feel free to use it!

kl2792@columbia.edu

The best way to learn how to do something is... to do it! So please reach out if you want some advice on a deep learning/machine learning project, or if you want to brainstorm ideas for one!

If you have any questions on starting summer research, please feel free to reach out as well.