Introduction to AI, Machine Learning & Deep Learning

What Is Artificial Intelligence (AI)?

Artificial Intelligence (AI) is the broad discipline focused on building systems that can perform tasks requiring human-like intelligence. These include:

  • Understanding and generating language
  • Recognizing objects and patterns
  • Solving problems and making decisions
  • Learning from data and experience

AI includes many subfields:

  • Natural Language Processing (NLP)

Enables machines to understand, analyze, and generate human language. Examples: chatbots, translation, sentiment analysis.

  • Computer Vision

Allows computers to “see” and interpret images and video. Examples: face recognition, autonomous driving, surveillance.

  • Robotics

Combines perception, control, and decision-making to perform tasks autonomously.

  • Expert Systems

Systems that mimic the decision-making abilities of human specialists.

AI’s Role

AI is not designed to replace humans, but to augment our capabilities:

  • Faster decisions
  • Reduced errors
  • Ability to handle massive amounts of data
  • Automation of repetitive tasks

AI is used widely today:

| Domain | AI Use Case |
| --- | --- |
| Healthcare | Diagnosis, personalized treatment, drug discovery |
| Finance | Fraud detection, credit scoring, trading |
| Cybersecurity | Threat detection, anomaly detection |
| Transportation | Route optimization, autonomous driving |

What Is Machine Learning? (Foundations & Intuition)

Machine Learning (ML) is a subset of AI focused on building systems that learn patterns from data rather than being explicitly programmed.

ML algorithms analyze data, detect patterns, and make predictions or decisions based on new input.

Types of ML

  • Supervised Learning (Spam Detection, Image Classification)

    • Model learns from labeled data (e.g., emails marked spam or ham); see the sketch after this list.
  • Unsupervised Learning (Clustering, Anomaly Detection)

    • Model finds structure in unlabeled data (e.g., customer segmentation).
  • Reinforcement Learning (Robotics, Game Playing)

    • Model learns by interacting with an environment and receiving rewards or penalties.
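
As a concrete illustration of supervised learning, here is a minimal sketch using scikit-learn; the four-email dataset is made up for illustration:

```python
# Minimal supervised-learning sketch with scikit-learn (hypothetical toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["Win a FREE prize now", "Meeting moved to 3pm",
          "Claim your reward today", "Lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

vectorizer = CountVectorizer()          # turn text into word-count vectors
X = vectorizer.fit_transform(emails)    # learn the vocabulary from labeled data
model = MultinomialNB().fit(X, labels)  # learn word statistics per class

new_email = vectorizer.transform(["FREE reward inside"])
print(model.predict(new_email))         # expected: [1], i.e. spam
```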

Practical Applications of ML

  • Medical diagnosis
  • Fraud detection
  • Recommendation systems
  • Malware detection
  • Predictive maintenance
  • Autonomous vehicles

ML is the engine that enables AI systems to adapt and improve over time.

What Is Deep Learning (DL)?

Deep Learning (DL) is a subfield of ML that uses neural networks with many layers (deep neural networks) to learn complex patterns.

DL excels with unstructured data:

  • Text
  • Images
  • Audio
  • Video

DL models automatically discover meaningful features — no manual engineering required.

Key characteristics:

  • Hierarchical Feature Learning

Early layers learn simple features (edges, words); deeper layers learn complex concepts (faces, sentiment, intent).

  • End-to-End Learning

Model receives raw data and produces outputs directly.

  • Scalability

DL models improve dramatically as data and compute increase.

Common Deep Learning Architectures

  • Convolutional Neural Networks: A neural network architecture that learns hierarchical spatial features by applying shared convolutional filters over localized regions of structured input data.
  • Recurrent Neural Networks: A neural network designed to model sequential data by maintaining and updating an internal state that captures temporal dependencies across ordered inputs.
  • Transformers: A neural network architecture that models relationships within a sequence using self-attention mechanisms to capture global contextual dependencies without relying on recurrence or convolution.

| Model Type | Best For | Why |
| --- | --- | --- |
| CNNs | Images, spatial data | Detect local patterns & spatial hierarchies |
| RNNs | Sequences | Handle ordered, time-dependent data |
| LSTMs / GRUs | Long text sequences | Better long-term memory |
| Transformers | NLP, vision, multimodal tasks | Self-attention enables global reasoning |

Transformers are now the dominant architecture across NLP and many other ML tasks.

Relationship Between AI, ML, and DL

Here is the hierarchy:

Artificial Intelligence (AI)
    └── Machine Learning (ML)
          └── Deep Learning (DL)

Examples of the hierarchy at work

  • In Computer Vision, Deep CNNs dominate image classification.
  • In NLP, transformers such as BERT and GPT outperform earlier ML approaches across most language tasks.
  • In Autonomous Driving, ML + DL work together:
    • ML for prediction models
    • DL for perception (lanes, objects)

Bringing It Back to This Workshop

You will work primarily with:

  • Machine Learning (ML) for classifying emails
  • Deep Learning (DL) using neural networks
  • Transformers as the modern backbone of NLP

In this lab we use spam detection as a running example because it requires:

  • Understanding language
  • Detecting subtle patterns
  • Learning from real-world data
  • Generalizing to new emails
  • Adapting to evolving spam tactics

By the end of this workshop, you’ll understand how to build an AI system that performs a real-world NLP task using modern deep learning methods.

Why Machine Learning Exists

Traditional programming requires humans to create explicit rules.
This fails in real-world tasks with high variability, such as spam detection.

IF email contains "FREE" → spam
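
Written as literal code, that rule is a one-line check (a hypothetical filter):

```python
def is_spam(email_text: str) -> bool:
    # Brittle hand-written rule: matches one keyword and nothing else.
    return "FREE" in email_text   # "Fr3e", "f-r-e-e", "gratis" all slip through
```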

This breaks immediately when:

  • Spammers change vocabulary
  • New fraud techniques appear
  • Grammar, language, and tone vary

Machine learning solves this by learning rules from examples, not from humans.

graph TD
    A[Email Text] --> B[ML Model Learns Patterns]
    B --> C[Spam Probability]

ML models learn patterns automatically instead of relying on rigid rules, and they discover statistical regularities that no human ever coded.

How Models Learn

All ML models follow this loop:

  1. Input → text
  2. Model predicts → probability of spam
  3. Loss function → evaluates how wrong the prediction was
  4. Backpropagation → computes how each weight influenced error
  5. Gradient descent → updates weights to reduce future errors

Repeating this process thousands of times makes the model learn.
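
Here is a minimal sketch of that five-step loop in PyTorch; the random features and labels stand in for real encoded emails:

```python
import torch
import torch.nn as nn

# Hypothetical data: 100 emails encoded as 20-dim feature vectors.
X = torch.randn(100, 20)
y = torch.randint(0, 2, (100,)).float()

model = nn.Sequential(nn.Linear(20, 1))   # tiny model: one linear layer
loss_fn = nn.BCEWithLogitsLoss()          # measures how wrong a prediction is
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(1000):                  # repeat the loop many times
    logits = model(X).squeeze(1)          # 1. input -> 2. model predicts
    loss = loss_fn(logits, y)             # 3. loss function evaluates the error
    optimizer.zero_grad()
    loss.backward()                       # 4. backpropagation computes gradients
    optimizer.step()                      # 5. gradient descent updates weights
```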

Mathematics (Simple Intuition)

A model is a function:

$$y=f(x)$$

Training aims to minimize:

$$ \mathrm{Loss} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 $$

The optimizer adjusts parameters:

$$ \theta_{\text{new}} = \theta_{\text{old}} - \alpha \frac{\partial \mathrm{Loss}}{\partial \theta} $$
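
A worked sketch of these formulas in NumPy, fitting a single parameter θ to made-up data with the squared-error loss above:

```python
import numpy as np

# Hypothetical data generated by y = 3x plus noise; the model is y_hat = theta * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + np.random.normal(0, 0.1, size=4)

theta, alpha = 0.0, 0.01                              # initial parameter, learning rate
for _ in range(500):
    y_hat = theta * x                                 # prediction: y = f(x)
    grad = (-2.0 / len(x)) * np.sum((y - y_hat) * x)  # d(Loss)/d(theta) for MSE
    theta = theta - alpha * grad                      # theta_new = theta_old - alpha * grad
print(theta)                                          # converges near 3.0
```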

Generalization, Underfitting, Overfitting

Underfitting

  • Model too simple
  • Fails to learn patterns

Overfitting

  • Memorizes training examples
  • Performs poorly on unseen data

Generalization

  • Learns underlying structure
  • Performs well on new data

Concept drift

  • Real-world patterns change (spam evolves)
  • Model must be retrained periodically

Types of Machine Learning Models

In this chapter we will:

  • Understand classical ML vs deep learning
  • Explain manual vs learned features
  • Understand the limitations of traditional NLP methods

Classical Machine Learning

Classical ML models include:

  • Logistic Regression
  • Naive Bayes
  • Support Vector Machines
  • Decision Trees / Random Forests

They rely heavily on manual feature engineering (TF-IDF, Bag-of-Words).

graph LR
    A[Text] --> B[Manual Feature Engineering]
    B --> C[Classical Model]
    C --> D[Spam/Ham]
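
A sketch of this pipeline in scikit-learn; the two-example dataset is made up:

```python
# Classical ML pipeline: manual TF-IDF features + Logistic Regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Assumption: texts is a list of email strings, labels is 0/1 (ham/spam).
texts = ["Win a FREE prize", "Project update attached"]
labels = [1, 0]

pipeline = make_pipeline(
    TfidfVectorizer(),     # manual feature engineering: text -> TF-IDF vectors
    LogisticRegression(),  # classical model: linear decision over those features
)
pipeline.fit(texts, labels)
print(pipeline.predict(["FREE prize waiting"]))  # expected: [1], i.e. spam
```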

Strengths

  • Simple
  • Fast
  • Works well on structured data

Weaknesses

  • Cannot understand meaning
  • Word order ignored
  • Cannot capture context
  • Struggles with long sequences

Why Classical ML Fails for Real NLP

Example:

  • “You won a free iPhone!”
  • “Claim your reward now!”

A TF-IDF model sees:

  • Completely different words
  • No connection between concepts (“won”, “reward”)

Classical ML fails here because meaning ≠ word counts; deep learning models closed this gap.

Neural Networks: Core Concepts

In this chapter we will:

  • Understand neurons, layers, weights, and activations
  • Learn how backpropagation works
  • Understand hierarchical feature learning

What Is a Neural Network?

A neural network is a series of learned transformations, for example a chain like this:

graph TD
    A[Input] --> B[Hidden Layer 1]
    B --> C[Hidden Layer 2]
    C --> D[Output]

Each neuron computes:

$$a = \sigma(Wx + b)$$
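
A sketch of this computation for a small layer in NumPy (all shapes and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # the activation sigma

x = np.array([0.5, -1.2, 0.3])       # input vector (3 features)
W = np.random.randn(4, 3)            # weights: 4 neurons x 3 inputs
b = np.zeros(4)                      # biases, one per neuron

a = sigmoid(W @ x + b)               # a = sigma(Wx + b)
print(a.shape)                       # (4,) -> one activation per neuron
```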

Hierarchical Learning

Deep networks build layers of understanding:

  • Layer 1 → edges, patterns, keywords
  • Layer 2 → sentiment, tone, syntax
  • Layer 3 → intent, meaning (fraud, manipulation)

This hierarchical learning makes deep networks extremely powerful.

How Neural Networks Learn

Neural networks adjust millions of weights using backpropagation. Each weight is updated by how much it contributed to the model’s error.

This allows the network to gradually shift from random noise to pattern recognition.
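
A minimal illustration of this credit assignment using PyTorch autograd (the numbers are arbitrary):

```python
import torch

w = torch.tensor([0.5, -1.0], requires_grad=True)  # two weights
x = torch.tensor([2.0, 3.0])                       # one input example
y_true = torch.tensor(1.0)

y_pred = (w * x).sum()         # simple model: weighted sum
loss = (y_true - y_pred) ** 2  # squared error
loss.backward()                # backpropagation

print(w.grad)  # gradient: how much each weight contributed to the error
```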

Sequence Models: RNN, LSTM, GRU

In this chapter we will:

  • Understand why sequential models were needed
  • Learn how RNNs build memory
  • Understand LSTM and GRU gating mechanisms
  • Recognize bottlenecks that transformers eliminate

RNN — Recurrent Neural Networks

(Figure: Recurrent Neural Network schema)

RNNs introduced the idea of time-dependent memory:

graph LR
    A[Word at t] --> B[RNN Cell]
    B --> C[Word at t+1]
    B --> D[Hidden State Memory]

They track previous context using a hidden state.
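
A bare-bones sketch of that recurrence in NumPy; dimensions and inputs are illustrative:

```python
import numpy as np

# One simple RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
d_in, d_hidden = 8, 16
W_x = np.random.randn(d_hidden, d_in) * 0.1
W_h = np.random.randn(d_hidden, d_hidden) * 0.1
b = np.zeros(d_hidden)

h = np.zeros(d_hidden)                                # hidden state starts empty
sequence = [np.random.randn(d_in) for _ in range(5)]  # 5 word embeddings
for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h + b)              # memory of everything seen so far
```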

Weaknesses

  • Vanishing gradients
  • Slow (sequential processing)
  • Struggles with long-range patterns

LSTM — Long Short-Term Memory

LSTMs added gates (input, forget, output) to regulate information flow.

This allowed:

  • Better long-term memory
  • Less catastrophic forgetting

For years, LSTMs were the backbone of NLP.
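
In practice the gates are rarely implemented by hand; here is a sketch using PyTorch's built-in nn.LSTM (batch size and dimensions are illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)

x = torch.randn(4, 25, 10)    # batch of 4 sequences, 25 tokens, 10-dim embeddings
output, (h_n, c_n) = lstm(x)  # gates regulate what flows into h_n and c_n
print(output.shape)           # torch.Size([4, 25, 32]): hidden state per token
print(h_n.shape, c_n.shape)   # final hidden and cell states: [1, 4, 32]
```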

GRU — Gated Recurrent Unit

The GRU is a simplified LSTM:

  • Fewer gates
  • Faster to train
  • Slightly less expressive

Why Sequence Models Broke at Scale

  • Cannot parallelize
  • Slow to train on large datasets
  • Memory bottlenecks
  • Poor long-term reasoning

These limitations motivated the invention of transformers.

CNNs & Feedforward Networks

In this chapter we will:

  • Understand why CNNs work for textual patterns
  • Explain feature maps
  • Know their limits

Feedforward Networks (MLPs)

MLPs treat input as fixed-length vectors. They ignore order, structure, and variable sequence length.

CNNs — Convolutional Neural Networks

(Figure: Convolutional Neural Network schema)

CNNs slide filters across sequences to detect local patterns:

graph TD
    A[Embedding Sequence] --> B[Convolution Filters]
    B --> C[Feature Maps]
    C --> D[Spam/Ham Output]

CNNs detect:

  • “free offer”
  • “click now”
  • “urgent response”
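
A sketch of a 1-D convolution over token embeddings in PyTorch; a filter of width 2 can fire on bigrams like “free offer” (all dimensions are illustrative):

```python
import torch
import torch.nn as nn

# 25 tokens, each a 10-dim embedding; Conv1d expects (batch, channels, length).
x = torch.randn(1, 10, 25)

conv = nn.Conv1d(in_channels=10, out_channels=16, kernel_size=2)  # width-2 filters
feature_maps = torch.relu(conv(x))        # each filter scans for one local pattern
pooled = feature_maps.max(dim=2).values   # keep the strongest match per filter
print(pooled.shape)                       # torch.Size([1, 16]) -> features for spam/ham
```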

Weaknesses

  • They do not understand long-range dependencies.

Transformers & Self-Attention

In this chapter we will:

  • Understand self-attention
  • Compare transformers with RNN/LSTM
  • Learn why transformers dominate NLP today

The Self-Attention Mechanism

Transformers compare each word with every other word:

graph TD
    A[Tokens] --> B[Self Attention]
    B --> C[Contextualized Representation]
    C --> D[Output]

This allows the model to discover relationships between any two tokens in a single step, regardless of their distance.

Example spam email:

Claim your free prize now before the offer expires.

Attention detects:

  • free ↔ prize
  • offer ↔ expires
  • urgency patterns
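
A minimal sketch of scaled dot-product self-attention in NumPy; the projection matrices are random here, just to show the mechanics:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[1])         # every token scored against every other
    scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: attention weights per token
    return weights @ V                             # each token becomes a weighted mix

X = np.random.randn(9, 16)  # 9 tokens ("Claim your free prize now ..."), 16-dim each
W_q, W_k, W_v = (np.random.randn(16, 16) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)            # (9, 16): every token now carries global context
```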

Multi-Head Attention

Each head focuses on a different pattern:

  • Head 1 → urgency
  • Head 2 → reward structures
  • Head 3 → threat/pressure

This parallel interpretation is why transformers are so strong.

Transformers Architecture

(Figure: Transformer architecture schema)

Why Transformers Replaced LSTMs

| Feature | LSTM | Transformer |
| --- | --- | --- |
| Reads tokens | sequential | parallel |
| Long-range memory | limited | excellent |
| Training speed | slow | fast |
| Scalability | low | extremely high |
| NLP performance | outdated | state-of-the-art |

Embeddings & Representation Learning

In this chapter we will:

  • Understand why text → numbers
  • Learn how embeddings encode meaning
  • Explain vector similarity

Why Text Must Be Converted Into Numbers

Neural networks require numeric vectors. Embeddings map words to dense vectors:

"free" → [0.2, -0.7, 0.1, ...]
"winner" → [0.25, -0.82, 0.06, ...]

Semantic Vector Spaces

Embeddings encode meaning via relative positions:

king - man + woman ≈ queen
free + prize + claim ≈ spam-like semantics

This allows models to:

  • Generalize
  • Understand synonyms
  • Capture context
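
Similarity in this space is typically measured with cosine similarity; a sketch in NumPy with made-up vectors:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical learned vectors: spam-related words end up close together.
free   = np.array([0.20, -0.70, 0.10])
winner = np.array([0.25, -0.82, 0.06])
lunch  = np.array([-0.60, 0.30, 0.90])

print(cosine(free, winner))  # close to 1: similar meanings
print(cosine(free, lunch))   # much lower: unrelated meanings
```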

Embeddings Learned in This Workshop

Your model learns 10-dimensional embeddings from the dataset. They become specialized for spam semantics, such as:

  • urgency
  • reward
  • threats
  • scam structure