Skip to content

Machine Learning Techniques

Types of Machine Learning

Machine learning can be divided into three main categories based on how models learn from data.

1. Supervised Learning

Models learn from labeled data to make predictions.

How it Works

  • Input: Data with known answers (labels)
  • Process: Model learns patterns between features and labels
  • Output: Predictions for new data

Types

  1. Classification (Predicts categories)
from sklearn.ensemble import RandomForestClassifier
# Disease Classification Example
class DiseaseClassifier:
def __init__(self):
self.model = RandomForestClassifier()
def train(self, symptoms, diagnoses):
"""Train with patient symptoms and known diagnoses"""
self.model.fit(symptoms, diagnoses)
def predict(self, patient_symptoms):
"""Predict disease category"""
return self.model.predict([patient_symptoms])[0]
# Example Usage
symptoms = [
[1, 0, 1], # Patient 1: [fever, cough, fatigue]
[0, 1, 0], # Patient 2
[1, 1, 1] # Patient 3
]
diagnoses = ['flu', 'cold', 'flu']
classifier = DiseaseClassifier()
classifier.train(symptoms, diagnoses)
new_patient = [1, 0, 1]
prediction = classifier.predict(new_patient)
  1. Regression (Predicts numbers)
from sklearn.linear_model import LinearRegression
# House Price Prediction
class PricePredictor:
def __init__(self):
self.model = LinearRegression()
def train(self, features, prices):
"""Train with house features and known prices"""
self.model.fit(features, prices)
def predict(self, house_features):
"""Predict house price"""
return self.model.predict([house_features])[0]
# Example Usage
features = [
[1500, 3], # [size, bedrooms]
[2000, 4],
[1200, 2]
]
prices = [300000, 450000, 250000]
predictor = PricePredictor()
predictor.train(features, prices)
new_house = [1800, 3]
price = predictor.predict(new_house)

2. Unsupervised Learning

Models find patterns in unlabeled data.

How it Works

  • Input: Data without labels
  • Process: Model discovers inherent patterns
  • Output: Groups or structures in data

Types

  1. Clustering (Groups similar items)
from sklearn.cluster import KMeans
class CustomerSegmentation:
def __init__(self, n_clusters=3):
self.model = KMeans(n_clusters=n_clusters)
def segment(self, customer_data):
"""Group customers by behavior"""
return self.model.fit_predict(customer_data)
# Example Usage
customers = [
[100, 20], # [spending, visits]
[500, 50],
[50, 10]
]
segmentation = CustomerSegmentation()
segments = segmentation.segment(customers)
  1. Dimensionality Reduction (Simplifies data)
from sklearn.decomposition import PCA
class ImageCompressor:
def __init__(self, n_components=2):
self.model = PCA(n_components=n_components)
def compress(self, images):
"""Reduce image dimensions"""
return self.model.fit_transform(images)
# Example Usage
images = [
[1, 2, 3, 4], # Each row is a flattened image
[2, 3, 4, 5],
[3, 4, 5, 6]
]
compressor = ImageCompressor()
compressed = compressor.compress(images)

3. Reinforcement Learning

Models learn through trial and error with rewards.

How it Works

  • Input: Environment state
  • Process: Agent takes actions and receives rewards
  • Output: Optimal action policy
class GameAgent:
def __init__(self, states, actions):
self.q_table = {} # State-action values
def choose_action(self, state):
"""Choose best action for current state"""
if state not in self.q_table:
self.q_table[state] = [0] * len(self.actions)
return np.argmax(self.q_table[state])
def learn(self, state, action, reward, next_state):
"""Update knowledge based on reward"""
old_value = self.q_table[state][action]
next_max = np.max(self.q_table[next_state])
self.q_table[state][action] = old_value + 0.1 * (reward + 0.9 * next_max - old_value)
# Example Usage
agent = GameAgent(states=['start', 'middle', 'end'], actions=['left', 'right'])
state = 'start'
action = agent.choose_action(state)

Key Differences

  1. Supervised vs Unsupervised

    • Supervised: Has correct answers to learn from
    • Unsupervised: Finds patterns without answers
  2. Supervised vs Reinforcement

    • Supervised: Learns from static examples
    • Reinforcement: Learns from dynamic interaction
  3. Unsupervised vs Reinforcement

    • Unsupervised: Finds structure in data
    • Reinforcement: Learns optimal behavior

When to Use Each

  1. Use Supervised Learning When

    • You have labeled data
    • Need specific predictions
    • Have clear input-output pairs
  2. Use Unsupervised Learning When

    • Data isn’t labeled
    • Want to discover patterns
    • Need to reduce data complexity
  3. Use Reinforcement Learning When

    • Have interactive environment
    • Can define clear rewards
    • Need adaptive behavior

Common Applications

  1. Supervised Learning

    • Spam detection
    • Disease diagnosis
    • Price prediction
    • Image recognition
  2. Unsupervised Learning

    • Customer segmentation
    • Anomaly detection
    • Feature extraction
    • Recommendation systems
  3. Reinforcement Learning

    • Game playing
    • Robot navigation
    • Resource management
    • Trading strategies

Remember: Choose the learning type based on your data availability and problem requirements. Each type has its strengths and ideal use cases.

Supervised Learning

Supervised learning is a type of machine learning where models learn from labeled data to make predictions on new, unseen data.

Key Concepts

  1. Input Data (X): Features or predictors
  2. Output Data (y): Labels or target values
  3. Training: Model learns patterns from X → y mapping
  4. Prediction: Model predicts y for new X

Types of Supervised Learning

1. Classification

  • Predicts categorical outputs
  • Examples: Yes/No, Cat/Dog/Bird
from sklearn.ensemble import RandomForestClassifier
# Email Spam Classification
class SpamClassifier:
def __init__(self):
self.model = RandomForestClassifier()
def train(self, emails, labels):
"""
Train spam classifier
Args:
emails: List of email features (word counts, etc.)
labels: 1 for spam, 0 for not spam
"""
self.model.fit(emails, labels)
def predict(self, email):
"""Predict if email is spam"""
prediction = self.model.predict([email])[0]
return "Spam" if prediction == 1 else "Not Spam"
# Example Usage
features = [[1, 0, 2], [0, 0, 0], [3, 1, 4]] # Word counts: [money, friend, buy]
labels = [1, 0, 1] # 1=spam, 0=not spam
classifier = SpamClassifier()
classifier.train(features, labels)
new_email = [2, 0, 3] # New email features
result = classifier.predict(new_email)
print(f"Email classified as: {result}")

2. Regression

  • Predicts continuous numerical values
  • Examples: Price, Temperature, Age
from sklearn.linear_model import LinearRegression
import numpy as np
class HousePricePredictor:
def __init__(self):
self.model = LinearRegression()
def train(self, features, prices):
"""
Train house price predictor
Args:
features: Array of [size, bedrooms, age]
prices: Array of house prices
"""
self.model.fit(features, prices)
def predict(self, house):
"""Predict house price"""
return self.model.predict([house])[0]
# Example Usage
features = [
[1500, 3, 10], # size, bedrooms, age
[2000, 4, 5],
[1200, 2, 15]
]
prices = [300000, 450000, 250000]
predictor = HousePricePredictor()
predictor.train(features, prices)
new_house = [1800, 3, 8]
price = predictor.predict(new_house)
print(f"Predicted price: ${price:,.2f}")

Common Algorithms

  1. Linear Models

    • Linear Regression
    • Logistic Regression
    • Use: Simple, interpretable predictions
  2. Tree-Based Models

    • Decision Trees
    • Random Forests
    • Use: Complex patterns, non-linear relationships
  3. Neural Networks

    • Deep Learning
    • Multi-layer Perceptron
    • Use: Image, text, complex patterns

Real-World Applications

1. Credit Card Fraud Detection

class FraudDetector:
def __init__(self):
self.model = RandomForestClassifier()
def train(self, transactions, fraud_labels):
"""
Train fraud detector
Args:
transactions: Features [amount, time, location_code, etc.]
fraud_labels: 1 for fraud, 0 for legitimate
"""
self.model.fit(transactions, fraud_labels)
def check_transaction(self, transaction):
"""Check if transaction is fraudulent"""
risk_score = self.model.predict_proba([transaction])[0][1]
return {
'risk_score': risk_score,
'flag_for_review': risk_score > 0.7
}
# Example
detector = FraudDetector()
transaction = [1000, 3, 45] # amount, time, location
result = detector.check_transaction(transaction)
print(f"Risk Score: {result['risk_score']:.2%}")

2. Medical Diagnosis

class DiseasePredictor:
def __init__(self):
self.model = RandomForestClassifier()
def train(self, patient_data, diagnoses):
"""
Train disease predictor
Args:
patient_data: Features [symptoms, age, history, etc.]
diagnoses: Disease labels
"""
self.model.fit(patient_data, diagnoses)
def predict_risk(self, patient):
"""Predict disease risk"""
probability = self.model.predict_proba([patient])[0][1]
return {
'risk_level': 'High' if probability > 0.7 else 'Low',
'probability': probability
}
# Example
symptoms = [1, 0, 1, 0] # fever, cough, fatigue, pain
result = DiseasePredictor().predict_risk(symptoms)
print(f"Risk Level: {result['risk_level']}")

Best Practices

  1. Data Quality

    • Clean, representative data
    • Balanced classes
    • Proper feature scaling
  2. Model Selection

    • Start simple
    • Use cross-validation
    • Consider interpretability needs
  3. Evaluation

    • Use appropriate metrics
    • Test on unseen data
    • Monitor performance

Common Challenges

  1. Overfitting

    • Model learns noise in training data
    • Solution: Regularization, cross-validation
  2. Underfitting

    • Model too simple for data
    • Solution: More features, complex model
  3. Data Issues

    • Missing values
    • Imbalanced classes
    • Noisy labels

Performance Metrics

Classification

from sklearn.metrics import accuracy_score, precision_score, recall_score
def evaluate_classifier(y_true, y_pred):
"""Calculate classification metrics"""
return {
'accuracy': accuracy_score(y_true, y_pred),
'precision': precision_score(y_true, y_pred),
'recall': recall_score(y_true, y_pred)
}

Regression

from sklearn.metrics import mean_squared_error, r2_score
def evaluate_regression(y_true, y_pred):
"""Calculate regression metrics"""
return {
'mse': mean_squared_error(y_true, y_pred),
'r2': r2_score(y_true, y_pred)
}

When to Use

  1. Classification

    • Spam detection
    • Disease diagnosis
    • Customer churn prediction
    • Image recognition
  2. Regression

    • Price prediction
    • Sales forecasting
    • Temperature estimation
    • Resource consumption

Remember: Success in supervised learning depends heavily on data quality and proper model selection for your specific problem.

Unsupervised Learning

Unsupervised learning finds patterns in data without labels. Think of it like sorting clothes by color without knowing the color names.

Main Types

  1. Clustering

    • Groups similar items together
    • Example: Grouping customers by shopping behavior
  2. Dimensionality Reduction

    • Simplifies complex data
    • Example: Compressing images while keeping important features

Simple Examples

# Clustering Example
from sklearn.cluster import KMeans
class CustomerGrouping:
def __init__(self, n_groups=3):
self.model = KMeans(n_clusters=n_groups)
def group_customers(self, data):
"""
Group customers based on behavior
data: Array of [purchase_amount, visit_frequency]
"""
return self.model.fit_predict(data)
# Usage
customers = [
[100, 5], # [spending, visits]
[500, 15],
[50, 2]
]
groups = CustomerGrouping().group_customers(customers)
# Dimensionality Reduction Example
from sklearn.decomposition import PCA
class DataSimplifier:
def __init__(self, n_features=2):
self.model = PCA(n_components=n_features)
def simplify(self, data):
"""
Reduce data complexity
data: Complex data with many features
"""
return self.model.fit_transform(data)

When to Use

  1. Use Clustering When:

    • Finding customer segments
    • Grouping similar documents
    • Detecting anomalies
  2. Use Dimensionality Reduction When:

    • Reducing image size
    • Visualizing high-dimensional data
    • Removing noise from data

Common Algorithms

  1. Clustering

    • K-means: Simple, fast grouping
    • DBSCAN: Finds clusters of any shape
    • Hierarchical: Creates tree of clusters
  2. Dimensionality Reduction

    • PCA: Linear data compression
    • t-SNE: Complex data visualization
    • Autoencoders: Neural network compression

Tips for Success

  1. Data Preparation

    • Scale features to same range
    • Remove outliers if needed
    • Handle missing values
  2. Algorithm Selection

    • Start with simple methods
    • K-means for clustering
    • PCA for dimension reduction
  3. Evaluation

    • Check if groups make sense
    • Validate with domain experts
    • Use visualization tools

Reinforcement Learning

Reinforcement learning is like teaching through trial and error with rewards. Think of training a dog - good behavior gets treats, bad behavior doesn’t.

Core Concepts

  1. Agent: The learner (like a robot or game player)
  2. Environment: Where the agent operates
  3. State: Current situation
  4. Action: What the agent can do
  5. Reward: Feedback on how good an action was

Simple Example

import numpy as np
class SimpleGameAgent:
def __init__(self, states, actions):
self.q_table = {} # Stores state-action values
self.states = states
self.actions = actions
self.learning_rate = 0.1
self.discount = 0.9
def get_action(self, state, explore_rate=0.1):
"""Choose action: explore or exploit"""
if state not in self.q_table:
self.q_table[state] = {a: 0 for a in self.actions}
# Sometimes try random actions (explore)
if np.random.random() < explore_rate:
return np.random.choice(self.actions)
# Usually pick best action (exploit)
return max(self.q_table[state], key=self.q_table[state].get)
def learn(self, state, action, reward, next_state):
"""Update knowledge from experience"""
if next_state not in self.q_table:
self.q_table[next_state] = {a: 0 for a in self.actions}
# Update Q-value based on reward
old_value = self.q_table[state][action]
next_best = max(self.q_table[next_state].values())
new_value = (1 - self.learning_rate) * old_value + \
self.learning_rate * (reward + self.discount * next_best)
self.q_table[state][action] = new_value
# Example Usage
states = ['start', 'middle', 'end']
actions = ['left', 'right']
agent = SimpleGameAgent(states, actions)
current_state = 'start'
action = agent.get_action(current_state)
reward = 1 # Get reward from environment
next_state = 'middle'
agent.learn(current_state, action, reward, next_state)

Common Applications

  1. Games

    • Chess
    • Video games
    • Board games
  2. Robotics

    • Navigation
    • Manipulation
    • Walking
  3. Business

    • Trading
    • Resource management
    • Ad placement

Key Algorithms

  1. Q-Learning

    • Simple table-based learning
    • Good for small problems
    • Easy to understand
  2. Deep Q-Learning

    • Uses neural networks
    • Good for complex problems
    • Handles large state spaces
  3. Policy Gradient

    • Learns actions directly
    • Good for continuous actions
    • Used in robotics

Tips for Success

  1. Start Simple

    • Use basic environments
    • Limit state/action space
    • Test frequently
  2. Handle Exploration

    • Try random actions early
    • Reduce randomness over time
    • Balance explore/exploit
  3. Reward Design

    • Clear goals
    • Immediate feedback
    • Consistent values

Note: Reinforcement learning needs patience. Agents often perform poorly at first but improve with practice.

Equation of a Line, 3D Planes, and Hyperplanes

1. Line Equation (2D)

A line is the simplest form of linear relationship:

y = mx + b
# m = slope (rise/run)
# b = y-intercept

Example:

def line_equation(x, m=2, b=1):
"""
Calculate y value for a line
Args:
x: x coordinate
m: slope
b: y-intercept
Returns:
y coordinate
"""
return m * x + b
# Example: y = 2x + 1
x_points = [0, 1, 2]
y_points = [line_equation(x) for x in x_points]
# y_points = [1, 3, 5]
2. Plane Equation (3D)

A plane in 3D space:

z = ax + by + c
# a, b = slopes in x and y directions
# c = z-intercept

Example:

def plane_equation(x, y, a=1, b=1, c=0):
"""
Calculate z value for a plane
Args:
x, y: coordinates
a, b: slopes
c: z-intercept
Returns:
z coordinate
"""
return a * x + b * y + c
# Example: z = x + y
point = plane_equation(1, 2) # z = 3
3. Hyperplane (N dimensions)

For machine learning with multiple features:

y = w1*x1 + w2*x2 + ... + wn*xn + b
# w = weights
# x = features
# b = bias

Simple Linear Classifier:

import numpy as np
class SimpleClassifier:
def __init__(self, n_features):
self.weights = np.zeros(n_features)
self.bias = 0
def predict(self, features):
"""
Classify using hyperplane
Args:
features: array of feature values
Returns:
1 if above plane, 0 if below
"""
value = np.dot(features, self.weights) + self.bias
return 1 if value > 0 else 0
# Example
classifier = SimpleClassifier(2)
classifier.weights = np.array([1, 1])
classifier.bias = -1
# Point (1, 1) is above plane
print(classifier.predict([1, 1])) # 1
# Point (0, 0) is below plane
print(classifier.predict([0, 0])) # 0

Key Points:

  • Lines separate points in 2D
  • Planes separate points in 3D
  • Hyperplanes separate points in higher dimensions
  • Essential for linear classification and regression

Distance Of a Point to a Plane

The distance from a point to a plane is useful in machine learning for:

  • Finding margins in SVM algorithms
  • Calculating error in geometric models
  • Determining point classifications
Formula
from numpy import array, dot, sqrt
def point_to_plane_distance(point, plane_normal, plane_point):
"""
Calculate shortest distance from point to plane
Args:
point: Array [x, y, z] - Point to measure from
plane_normal: Array [a, b, c] - Normal vector to plane
plane_point: Array [x, y, z] - Any point on plane
Returns:
distance: Absolute distance from point to plane
"""
# Normalize the plane normal vector
n = plane_normal / sqrt(dot(plane_normal, plane_normal))
# Vector from plane to point
v = point - plane_point
# Distance is dot product of v and normalized normal
return abs(dot(v, n))
# Example Usage
point = array([1, 1, 1])
plane_normal = array([0, 0, 1]) # Points up Z-axis
plane_point = array([0, 0, 0]) # Plane at Z=0
distance = point_to_plane_distance(point, plane_normal, plane_point)
# distance = 1.0 (point is 1 unit above XY plane)
Visual Example
  • Plane: z = 0 (XY plane)
  • Point: (1, 1, 1)
  • Distance: 1 unit (straight up from plane)

Note: The normal vector determines plane orientation. It points perpendicular to the plane surface.

Instance Based Learning

Instance Based Learning (IBL) stores training examples to predict new cases, like using past experiences to make decisions.

K-Nearest Neighbors (KNN)

The most common IBL algorithm. Predicts based on closest training examples.

from sklearn.neighbors import KNeighborsClassifier
class SimpleKNN:
def __init__(self, k=3):
self.model = KNeighborsClassifier(n_neighbors=k)
def train(self, X, y):
"""Store training examples"""
self.model.fit(X, y)
def predict(self, x):
"""Predict using k closest neighbors"""
return self.model.predict([x])[0]
# Example: Movie Recommendation
movies = [
[3, 4], # [length_hours, rating]
[1, 5],
[4, 2]
]
likes = ['yes', 'yes', 'no']
recommender = SimpleKNN(k=2)
recommender.train(movies, likes)
new_movie = [2, 4]
prediction = recommender.predict(new_movie)

Key Points

  1. Advantages

    • Simple to understand
    • No training phase
    • Works with new data easily
  2. Disadvantages

    • Slow with large datasets
    • Needs lots of memory
    • Sensitive to irrelevant features

When to Use

  • Small to medium datasets
  • When examples are similar to predictions
  • When quick setup is needed

Note: Choose k (number of neighbors) based on dataset size. Larger k reduces noise sensitivity.

Model Based Learning

Model Based Learning finds patterns in data to make predictions. Think of it like creating a rule book from examples.

Basic Types

  1. Linear Models
from sklearn.linear_model import LinearRegression
class SimpleLinearModel:
def __init__(self):
self.model = LinearRegression()
def train(self, X, y):
"""Learn pattern from data"""
self.model.fit(X, y)
def predict(self, x):
"""Make prediction using learned pattern"""
return self.model.predict([x])[0]
# Example: House Price Prediction
sizes = [[1000], [1500], [2000]] # Square feet
prices = [200000, 300000, 400000] # Prices
predictor = SimpleLinearModel()
predictor.train(sizes, prices)
new_size = [1750]
price = predictor.predict(new_size)
  1. Decision Trees
from sklearn.tree import DecisionTreeClassifier
class SimpleTreeModel:
def __init__(self):
self.model = DecisionTreeClassifier(max_depth=3) # Keep it simple
def train(self, X, y):
"""Learn decision rules"""
self.model.fit(X, y)
def predict(self, x):
"""Make decision using rules"""
return self.model.predict([x])[0]
# Example: Loan Approval
data = [
[50000, 3], # [income, years_employed]
[30000, 1],
[70000, 5]
]
approved = ['yes', 'no', 'yes']
approver = SimpleTreeModel()
approver.train(data, approved)
new_application = [45000, 2]
decision = approver.predict(new_application)

When to Use

  1. Linear Models

    • Simple relationships
    • Need explainable results
    • Limited data available
  2. Decision Trees

    • Complex patterns
    • Need clear rules
    • Have enough data

Tips

  • Start simple
  • Use cross-validation
  • Check model assumptions
  • Monitor performance

Remember: Simpler models often work better than complex ones.

Pickling

Pickling is a way to save Python objects to files and load them back later. Think of it like saving your game progress.

Basic Usage

import pickle
# Save object to file
def save_object(obj, filename):
"""Save any Python object to a file"""
with open(filename, 'wb') as file:
pickle.dump(obj, file)
# Load object from file
def load_object(filename):
"""Load a Python object from a file"""
with open(filename, 'rb') as file:
return pickle.load(file)
# Example Usage
model = {'name': 'MyModel', 'score': 0.95}
save_object(model, 'model.pkl')
loaded_model = load_object('model.pkl')

Common Use Cases

  1. Save ML Models
from sklearn.linear_model import LinearRegression
import numpy as np
# Train and save model
model = LinearRegression()
X = np.array([[1], [2], [3]])
y = np.array([2, 4, 6])
model.fit(X, y)
save_object(model, 'linear_model.pkl')
# Load and use model later
loaded_model = load_object('linear_model.pkl')
prediction = loaded_model.predict([[4]])
  1. Save Data Structures
# Save processed data
data = {
'features': [1, 2, 3],
'labels': ['A', 'B', 'C']
}
save_object(data, 'data.pkl')

Best Practices

  1. File Extensions

    • Use .pkl or .pickle
    • Makes files easily identifiable
  2. Error Handling

def safe_save(obj, filename):
"""Save with error handling"""
try:
save_object(obj, filename)
return True
except Exception as e:
print(f"Error saving: {e}")
return False
  1. Security Note
    • Only unpickle files you trust
    • Pickled files can contain malicious code

Remember: Pickling is for Python objects only. Use JSON for sharing data with other languages.