Source: CS7015/CS6910 of IIT Madras, Prof. Mitesh M. Khapra
In the infancy of the computer era came a brilliant idea that would radically change the course of history.
Let's dive back into history and imagine a world where machines could mimic the complex decision-making processes of a human brain. It was a dream that captivated the minds of pioneers in the field of Artificial Intelligence.
McCulloch-Pitts Neuron
The year was 1943 when a neurophysiologist, Warren McCulloch, and a logician, Walter Pitts, came together to give birth to this vision. Little did they know that their creation would set the stage for one of the most remarkable technological journeys ever embarked upon.
What they proposed was a simplified computational model of the neuron.
Source: towardsdatascience.com/mcculloch-pitts-mode..
Working of an MP-neuron
\(g\) is a function that sums the inputs, while \(f\) makes a decision based on that sum
\(g(x_{1}, x_{2}, ..., x_{n}) = g(x) = \sum^{n}_{i = 1}x_{i}\)
\(y = f(g(x)) = \begin{cases} 1 & \text{if }g(x) \geq \theta \\[10pt] 0 & \text{if } g(x) \lt \theta\end{cases}\)
\(\theta\) is the threshold
Common Boolean functions can be represented using the MP-neuron
Example: AND function
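A quick sketch of that example (assuming two Boolean inputs and a threshold of \(\theta = 2\); these are illustrative choices, not values fixed by the post):

# A minimal sketch of an MP-neuron computing AND.
# Assumes two Boolean inputs and theta = 2 (illustrative values).
def mp_neuron(inputs, theta):
    g = sum(inputs)                 # g(x) = sum of the inputs
    return 1 if g >= theta else 0   # f fires when the sum reaches the threshold

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], theta=2))
# Prints 1 only for the input (1, 1), matching the AND truth table

Setting \(\theta\) equal to the number of inputs makes the neuron fire only when every input is \(1\), which is exactly the AND function.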
To summarize
A single McCulloch-Pitts neuron can be used to represent Boolean functions that are linearly separable.
Perceptron
The MP-neuron was not perfect and had a glaring issue: all inputs were considered equal and hence had equal "weights", whereas in reality our decisions often depend on some inputs more than others.
Enter the Perceptron, proposed by Frank Rosenblatt in 1958. It was a more general computational model than the MP-neuron and introduced fundamental principles that are still in use today:
Weights assigned to inputs
Mechanism to learn those weights
Inputs no longer limited to boolean values
Refined later by Minsky and Papert in 1969
Source: towardsdatascience.com/what-is-a-perceptron..
\(y = \begin{cases} 1 & \text{if } \sum^{n}_{i = 1} w_{i} \cdot x_{i} \geq \theta \\[10pt] 0 & \text{if } \sum^{n}_{i = 1} w_{i} \cdot x_{i} \lt \theta \end{cases}\)
Bringing \(\theta\) to the left side, we get
\(y = \begin{cases} 1 & \text{if } \sum^{n}_{i = 1} w_{i} \cdot x_{i} - \theta \geq 0 \\[10pt] 0 & \text{if } \sum^{n}_{i = 1} w_{i} \cdot x_{i} - \theta \lt 0 \end{cases}\)
However, a more accepted convention is the following
\(y = \begin{cases} 1 & \text{if } \sum^{n}_{i = 0} w_{i} \cdot x_{i} \geq 0 \\[10pt] 0 & \text{if } \sum^{n}_{i = 0} w_{i} \cdot x_{i} \lt 0 \end{cases}\)
where \(x_{0} = 1\) and \(w_{0} = -\theta\) (\(w_{0}\) is the bias and \(x_{0}\) is always \(1\))
Below is a comparison of the MP-neuron and the Perceptron
| MP-neuron | Perceptron |
| --- | --- |
| \(y = \begin{cases} 1 & \text{if }\sum^{n}_{i = 0}x_{i} \geq 0 \\[10pt] 0 & \text{if }\sum^{n}_{i = 0}x_{i} \lt 0 \end{cases}\) | \(y = \begin{cases} 1 & \text{if } \sum^{n}_{i = 0} w_{i} \cdot x_{i} \geq 0 \\[10pt] 0 & \text{if } \sum^{n}_{i = 0} w_{i} \cdot x_{i} \lt 0 \end{cases}\) |
Perceptron Learning Algorithm
initialize weights w randomly;
while !convergence do:
    pick a random input x
    if x has a positive label and w.x < 0:
        w = w + x
    if x has a negative label and w.x >= 0:
        w = w - x
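To make the update rule concrete, here is a minimal sketch of the same loop in plain Python; the toy points, the random initialization, and the fixed iteration count (standing in for a real convergence check) are illustrative assumptions.

import random

# A minimal sketch of the learning loop above.
# The toy dataset and the fixed iteration count are assumptions.
P = [(1.0, 1.0), (2.0, 1.5)]       # points with a positive label
N = [(-1.0, -1.0), (-2.0, -0.5)]   # points with a negative label

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

w = [random.random(), random.random()]   # initialize weights w randomly

for _ in range(100):                     # stand-in for "while !convergence"
    x = random.choice(P + N)             # pick a random input x
    if x in P and dot(w, x) < 0:
        w = [wi + xi for wi, xi in zip(w, x)]   # w = w + x
    if x in N and dot(w, x) >= 0:
        w = [wi - xi for wi, xi in zip(w, x)]   # w = w - x

print(w)

With the \(x_{0} = 1\), \(w_{0} = -\theta\) convention from earlier, the bias is learned the same way by appending a constant \(1\) to every input.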
Quick diversion to learn PyTorch basics
What is PyTorch?
It's an open-source machine learning library and deep learning framework primarily used for building and training neural networks. It provides a range of tools and modules for various types of networks, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and more.
Tensors in PyTorch are the fundamental data structure used to store and manipulate multi-dimensional arrays. They are very similar to NumPy arrays and are a core component of PyTorch's computation graph.
Tensors in PyTorch are designed to work with automatic differentiation, a key feature for training neural networks using techniques like backpropagation (blog post coming soon).
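As a tiny illustration of automatic differentiation (once PyTorch is installed, as described below; the function \(y = x^2\) is just an arbitrary example):

import torch

# A minimal autograd sketch; y = x ** 2 is an arbitrary example function
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2      # PyTorch records this operation in the computation graph
y.backward()    # computes dy/dx and stores it in x.grad
print(x.grad)   # tensor(6.) since dy/dx = 2x = 6 at x = 3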
Install PyTorch
Visit PyTorch's homepage for instructions on how to install it. TL;DR: pip install torch should do the job.
Load PyTorch using the following code
import torch
Scalar (Rank-0 Tensor)
A single numeric value, such as a real number or an integer. It has no dimensions.
a = torch.tensor(1.)
print(a) # tensor(1.)
print(a.shape) # torch.Size([])
Vector (Rank-1 Tensor)
A 1D array, i.e. a vector. It has a single dimension and is used to represent a list of values.
a = torch.tensor([1., 2., 3.])
print(a) # tensor([1., 2., 3.])
print(a.shape) # torch.Size([3])
Matrix (Rank-2 Tensor)
A 2D array, i.e. a matrix. It has two dimensions and is used to represent tabular data or black-and-white images.
a = torch.tensor([
    [1., 2., 3.],
    [2., 3., 4.]
])
print(a.shape) # torch.Size([2, 3])
print(a.ndim) # 2
print(a.dtype) # torch.float32
3D Matrix (Rank-3 Tensor)
A 3D array. It has three dimensions and is used to represent data such as coloured (RGB) images, where the extra dimension holds the colour channels.
a = torch.tensor([
    [
        [1., 2., 3.],
        [2., 3., 4.]
    ],
    [
        [5., 6., 7.],
        [8., 9., 10.]
    ]
])
print(a.shape) # torch.Size([2, 2, 3])
4D Tensor (Rank-4 Tensor)
It has four dimensions and is typically used for higher order data structures in Deep Learning such as video data with time, height, width and colour channels.
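Purely for illustration (the sizes below are assumptions, not from the post), such a tensor could be created like this:

import torch

# A minimal rank-4 tensor sketch; 16 frames of 64x64 pixels with 3 colour
# channels are illustrative sizes
video = torch.zeros(16, 64, 64, 3)   # (time, height, width, colour channels)
print(video.shape)  # torch.Size([16, 64, 64, 3])
print(video.ndim)   # 4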
For the rest of this post, we will stick to rank-3 tensors at most.
Perceptron in PyTorch
A very basic Perceptron in PyTorch (using only torch tensors)
import torch
import pandas as pd
# Load the dataset
df = pd.read_csv(
    "https://raw.githubusercontent.com/kashifulhaque/BSCS3004-Deep-Learning-resources/main/datasets/data.txt",
    sep = "\t"
)
# Isolate the features and labels
X_train = df[["x1", "x2"]].values
y_train = df["label"].values
# Convert the features and labels to torch tensor (from numpy array)
X_train = torch.from_numpy(X_train)
y_train = torch.from_numpy(y_train)
# Convert the train features to 32-bit floats
X_train = X_train.to(torch.float32)
# Define the Perceptron model
class Perceptron:
    def __init__(self, num_features):
        # Number of weights should match the number of features
        self.weights = torch.zeros(num_features)
        self.bias = torch.tensor(0.)

    def forward(self, x):
        # Calculate the weighted sum, z = wx + b
        z = torch.dot(x, self.weights) + self.bias

        # Use the "harsh" Perceptron activation to get prediction
        pred = torch.tensor(1.) if z > 0 else torch.tensor(0.)
        return pred

    def update(self, x, y):
        pred = self.forward(x)
        error = y - pred

        # Perceptron Learning Algorithm
        self.bias += error
        self.weights += error * x
        return error

# Training loop
def train(model, x, y, epochs = 10):
    for epoch in range(epochs):
        error_count = 0
        for x_, y_ in zip(x, y):
            error = model.update(x_, y_)
            error_count += abs(error)
        print(f"Epoch #{epoch + 1}, errors: {error_count}")
ppn = Perceptron(num_features = 2)
train(ppn, X_train, y_train, epochs = 5)
# Evaluate the results
def accuracy(model, x, y):
    correct = 0.0
    for x_, y_ in zip(x, y):
        pred = model.forward(x_)
        correct += int(pred == y_)
    return correct / len(y)
# Print accuracy
accuracy_train = accuracy(ppn, X_train, y_train)
print(accuracy_train)
Training loop output & Accuracy