1. Introduction to Deep Learning - McCulloch-Pitts Neuron & Perceptron


Source: CS7015/CS6910 of IIT Madras, Prof. Mitesh M. Khapra

In the infancy of the computer era came a brilliant idea that would change the course of history in a radical way.

Let's dive back into history and imagine a world where machines could mimic the complex decision-making processes of the human brain. It was a dream that captivated the minds of pioneers in the field of Artificial Intelligence.

McCulloch-Pitts Neuron

The year was 1943 when a neurophysiologist, Warren McCulloch, and a logician, Walter Pitts, came together to give birth to this vision. Little did they know that their creation would set the stage for one of the most remarkable technological journeys ever embarked upon.

What they proposed was a simplified computational model of the biological neuron.

Source: towardsdatascience.com/mcculloch-pitts-mode..

Working of an MP-neuron

\(g\) is a function that aggregates (sums) the inputs, while \(f\) makes a decision based on that sum.

\(g(x_{1}, x_{2}, ..., x_{n}) = g(x) = \sum^{n}_{i = 1}x_{i}\)

\(y = f(g(x)) = \begin{cases} 1 & \text{if }g(x) \geq \theta \\[10pt] 0 & \text{if } g(x) \lt \theta\end{cases}\)

\(\theta\) is the threshold

Common Boolean functions can be represented using the MP-neuron.

Example: AND function
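For AND with two inputs, set the threshold \(\theta = 2\): the output is \(1\) only when \(x_{1} + x_{2} \geq 2\), i.e. when both inputs are \(1\). Below is a minimal sketch in plain Python (the helper name mp_neuron is just for illustration):

# An MP-neuron fires (outputs 1) when the sum of its inputs reaches the threshold
def mp_neuron(inputs, theta):
    return 1 if sum(inputs) >= theta else 0

# AND via an MP-neuron with theta = 2
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron([x1, x2], theta=2))
# Only (1, 1) produces 1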

To summarize

A single McCulloch-Pitts neuron can be used to represent Boolean functions that are linearly separable.

Perceptron

The MP-neuron was not perfect and had a glaring issue: all inputs were considered equal and hence carried equal "weight", whereas in reality our decisions often depend on some inputs more than others.

Enter the Perceptron, proposed by Frank Rosenblatt in 1958. It was a more general computational model than the MP-neuron, and it introduced fundamental principles that are still in use today:

  • Weights assigned to inputs

  • Mechanism to learn those weights

  • Inputs no longer limited to boolean values

The model was later refined and carefully analyzed by Minsky and Papert in 1969.

Source: towardsdatascience.com/what-is-a-perceptron..

\(y = \begin{cases} 1 & \text{if } \sum^{n}_{i = 1} w_{i} \cdot x_{i} \geq \theta \\[10pt] 0 & \text{if } \sum^{n}_{i = 1} w_{i} \cdot x_{i} \lt \theta \end{cases}\)

Bringing \(\theta\) to the left-hand side, we get

\(y = \begin{cases} 1 & \text{if } \sum^{n}_{i = 1} w_{i} \cdot x_{i} - \theta \geq 0 \\[10pt] 0 & \text{if } \sum^{n}_{i = 1} w_{i} \cdot x_{i} - \theta \lt 0 \end{cases}\)

However, a more widely accepted convention is the following:

\(y = \begin{cases} 1 & \text{if } \sum^{n}_{i = 0} w_{i} \cdot x_{i} \geq 0 \\[10pt] 0 & \text{if } \sum^{n}_{i = 0} w_{i} \cdot x_{i} \lt 0 \end{cases}\)

where \(x_{0} = 1\) and \(w_{0} = -\theta\) (\(w_{0}\) is the bias and \(x_{0}\) is always \(1\))
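As a quick sanity check (with arbitrary numbers), take \(n = 2\) and \(\theta = 2\), so \(w_{0} = -2\). The condition \(\sum^{2}_{i = 0} w_{i} \cdot x_{i} \geq 0\) expands to \(w_{1} x_{1} + w_{2} x_{2} - 2 \geq 0\), which is exactly the original condition \(w_{1} x_{1} + w_{2} x_{2} \geq \theta\).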

Below is the difference between the MP-neuron and the Perceptron:

| MP-neuron | Perceptron |
| --- | --- |
| \(y = \begin{cases} 1 & \text{if }\sum^{n}_{i = 0}x_{i} \geq 0 \\[10pt] 0 & \text{if }\sum^{n}_{i = 0}x_{i} \lt 0 \end{cases}\) | \(y = \begin{cases} 1 & \text{if } \sum^{n}_{i = 0} w_{i} \cdot x_{i} \geq 0 \\[10pt] 0 & \text{if } \sum^{n}_{i = 0} w_{i} \cdot x_{i} \lt 0 \end{cases}\) |

Perceptron Learning Algorithm

initialize weights w randomly

while !convergence do:
    pick a random input x
    if x has a positive label and w.x < 0:
        w = w + x
    if x has a negative label and w.x >= 0:
        w = w - x

# convergence: every input is classified correctly, so no update fires
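To make the algorithm concrete, here is a minimal runnable sketch in plain Python; the toy dataset (the OR function, with \(x_{0} = 1\) folded in as the bias input) is made up for illustration:

import random

random.seed(0)

# Toy dataset: the OR function, with x0 = 1 folded in as the bias input
# Each sample is ((x0, x1, x2), label)
data = [
    ((1, 0, 0), 0),
    ((1, 0, 1), 1),
    ((1, 1, 0), 1),
    ((1, 1, 1), 1),
]

w = [random.uniform(-1, 1) for _ in range(3)]  # initialize weights randomly

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

converged = False
while not converged:
    converged = True  # assume convergence until an update fires
    for x, label in random.sample(data, len(data)):  # visit inputs in random order
        if label == 1 and dot(w, x) < 0:
            w = [wi + xi for wi, xi in zip(w, x)]
            converged = False
        elif label == 0 and dot(w, x) >= 0:
            w = [wi - xi for wi, xi in zip(w, x)]
            converged = False

print(w)  # a weight vector that linearly separates the OR data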

Quick diversion to learn PyTorch basics

What is PyTorch?

It's an open-source Machine Learning library and Deep Learning framework primarily used for developing and training neural networks. It provides a range of tools and modules for building various types of networks, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and more.

A tensor is PyTorch's fundamental data structure, used to store and manipulate multi-dimensional arrays. Tensors are very similar to NumPy arrays and are a core component of PyTorch's computation graph.
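For example, a tensor can be created directly from a NumPy array (the values below are arbitrary):

import numpy as np
import torch

# Create a tensor from a NumPy array
arr = np.array([1., 2., 3.])
t = torch.from_numpy(arr)  # shares the underlying memory with arr
print(t)                   # tensor([1., 2., 3.], dtype=torch.float64)
print(t.numpy())           # [1. 2. 3.] - back to a NumPy array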

Tensors in PyTorch are designed to work with automatic differentiation, a key feature for training neural networks using techniques like backpropagation (blog post coming soon).
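As a tiny preview (the values below are arbitrary), a tensor created with requires_grad=True records the operations applied to it, and .backward() computes gradients:

import torch

# requires_grad=True tells autograd to record operations on this tensor
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x  # y = x^2 + 3x

y.backward()        # compute dy/dx
print(x.grad)       # tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2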

Install PyTorch

Visit PyTorch's homepage for instructions on how to install. TLDR: pip install torch should do the job.

Load PyTorch using the following code

import torch

Scalar (Rank-0 Tensor)

A single numeric value, such as a real number or an integer. It has no dimensions.

a = torch.tensor(1.)
print(a)        # tensor(1.)
print(a.shape)  # torch.Size([])

Vector (Rank-1 Tensor)

A 1D array, i.e. a vector. It has a single dimension and is used to represent a list of values.

a = torch.tensor([1., 2., 3.])
print(a)        # tensor([1., 2., 3.])
print(a.shape)  # torch.Size([3])

Matrix (Rank-2 Tensor)

A 2D array, i.e. a matrix. It has two dimensions and is used to represent tabular data or black-and-white images.

a = torch.tensor([
  [1., 2., 3.],
  [2., 3., 4.]
])

print(a.shape)  # torch.Size([2, 3])
print(a.ndim)   # 2
print(a.dtype)  # torch.float32

3D Tensor (Rank-3 Tensor)

A 3D array. It has three dimensions and is used to represent data with three axes, such as coloured images (channels, height, width).

a = torch.tensor([
  [
    [1., 2., 3.],
    [2., 3., 4.]
  ],
  [
    [5., 6., 7.],
    [8., 9., 10.]
  ]
])
print(a.shape)  # torch.Size([2, 2, 3])

4D Tensor (Rank-4 Tensor)

It has four dimensions and is typically used for higher order data structures in Deep Learning such as video data with time, height, width and colour channels.
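For instance (the shape below is arbitrary), a short video clip could be stored as a (time, channels, height, width) tensor:

# e.g. a 10-frame clip of 64x64 RGB images: (time, channels, height, width)
a = torch.zeros(10, 3, 64, 64)
print(a.shape)  # torch.Size([10, 3, 64, 64])
print(a.ndim)   # 4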

We will stick to rank-3 tensors for now.

Perceptron in PyTorch

A very basic Perceptron in PyTorch (using only torch tensors):

import torch
import pandas as pd

# Load the dataset
df = pd.read_csv(
    "https://raw.githubusercontent.com/kashifulhaque/BSCS3004-Deep-Learning-resources/main/datasets/data.txt",
    sep = "\t"
)

# Isolate the features and labels
X_train = df[["x1", "x2"]].values
y_train = df["label"].values

# Convert the features and labels to torch tensor (from numpy array)
X_train = torch.from_numpy(X_train)
y_train = torch.from_numpy(y_train)

# Convert the train features to 32-bit floats
X_train = X_train.to(torch.float32)

# Define the Perceptron model
class Perceptron:
  def __init__(self, num_features):
    # Number of weights should match the number of features
    self.weights = torch.zeros(num_features)
    self.bias = torch.tensor(0.)

  def forward(self, x):
    # Calculate the weighted sum, z = wx + b
    z = torch.dot(x, self.weights) + self.bias

    # Use the "harsh" Perceptron activation to get prediction
    pred = torch.tensor(1.) if z > 0 else torch.tensor(0.)
    return pred

  def update(self, x, y):
    pred = self.forward(x)
    error = y - pred

    # Perceptron Learning Algorithm
    self.bias += error
    self.weights += error * x

    return error

# Training loop
def train(model, x, y, epochs = 10):
  for epoch in range(epochs):
    error_count = 0

    for x_, y_ in zip(x, y):
      error = model.update(x_, y_)
      error_count += abs(error.item())  # .item() converts the 0-dim tensor to a Python number

    print(f"Epoch #{epoch + 1}, errors: {error_count}")

ppn = Perceptron(num_features = 2)
train(ppn, X_train, y_train, epochs = 5)

# Evaluate the results
def accuracy(model, x, y):
  correct = 0.0

  for x_, y_ in zip(x, y):
    pred = model.forward(x_)
    correct += int(pred == y_)

  return correct / len(y)

# Print accuracy
accuracy_train = accuracy(ppn, X_train, y_train)
print(accuracy_train)

Running the script prints the error count for each epoch, followed by the training accuracy.