Machine Learning Techniques

Types of Machine Learning

Machine learning can be divided into three main categories based on how models learn from data.

1. Supervised Learning

Models learn from labeled data to make predictions.

How it Works

Input: Data with known answers (labels)
Process: Model learns patterns between features and labels
Output: Predictions for new data

Types

Classification (Predicts categories)

from sklearn.ensemble import RandomForestClassifier

# Disease Classification Example
class DiseaseClassifier:
    def __init__(self):
        self.model = RandomForestClassifier()

    def train(self, symptoms, diagnoses):
        """Train with patient symptoms and known diagnoses"""
        self.model.fit(symptoms, diagnoses)

    def predict(self, patient_symptoms):
        """Predict disease category"""
        return self.model.predict([patient_symptoms])[0]

# Example Usage
symptoms = [
    [1, 0, 1],  # Patient 1: [fever, cough, fatigue]
    [0, 1, 0],  # Patient 2
    [1, 1, 1]   # Patient 3
]
diagnoses = ['flu', 'cold', 'flu']

classifier = DiseaseClassifier()
classifier.train(symptoms, diagnoses)
new_patient = [1, 0, 1]
prediction = classifier.predict(new_patient)

Regression (Predicts numbers)

from sklearn.linear_model import LinearRegression

# House Price Prediction
class PricePredictor:
    def __init__(self):
        self.model = LinearRegression()

    def train(self, features, prices):
        """Train with house features and known prices"""
        self.model.fit(features, prices)

    def predict(self, house_features):
        """Predict house price"""
        return self.model.predict([house_features])[0]

# Example Usage
features = [
    [1500, 3],  # [size, bedrooms]
    [2000, 4],
    [1200, 2]
]
prices = [300000, 450000, 250000]

predictor = PricePredictor()
predictor.train(features, prices)
new_house = [1800, 3]
price = predictor.predict(new_house)

2. Unsupervised Learning

Models find patterns in unlabeled data.

How it Works

Input: Data without labels
Process: Model discovers inherent patterns
Output: Groups or structures in data

Types

Clustering (Groups similar items)

from sklearn.cluster import KMeans

class CustomerSegmentation:
    def __init__(self, n_clusters=3):
        self.model = KMeans(n_clusters=n_clusters)

    def segment(self, customer_data):
        """Group customers by behavior"""
        return self.model.fit_predict(customer_data)

# Example Usage
customers = [
    [100, 20],  # [spending, visits]
    [500, 50],
    [50, 10]
]
segmentation = CustomerSegmentation()
segments = segmentation.segment(customers)

Dimensionality Reduction (Simplifies data)

from sklearn.decomposition import PCA

class ImageCompressor:
    def __init__(self, n_components=2):
        self.model = PCA(n_components=n_components)

    def compress(self, images):
        """Reduce image dimensions"""
        return self.model.fit_transform(images)

# Example Usage
images = [
    [1, 2, 3, 4],  # Each row is a flattened image
    [2, 3, 4, 5],
    [3, 4, 5, 6]
]
compressor = ImageCompressor()
compressed = compressor.compress(images)

3. Reinforcement Learning

Models learn through trial and error with rewards.

How it Works

Input: Environment state
Process: Agent takes actions and receives rewards
Output: Optimal action policy

class GameAgent:
    def __init__(self, states, actions):
        self.q_table = {}  # State-action values

    def choose_action(self, state):
        """Choose best action for current state"""
        if state not in self.q_table:
            self.q_table[state] = [0] * len(self.actions)
        return np.argmax(self.q_table[state])

    def learn(self, state, action, reward, next_state):
        """Update knowledge based on reward"""
        old_value = self.q_table[state][action]
        next_max = np.max(self.q_table[next_state])
        self.q_table[state][action] = old_value + 0.1 * (reward + 0.9 * next_max - old_value)

# Example Usage
agent = GameAgent(states=['start', 'middle', 'end'], actions=['left', 'right'])
state = 'start'
action = agent.choose_action(state)

Key Differences

Supervised vs Unsupervised
- Supervised: Has correct answers to learn from
- Unsupervised: Finds patterns without answers
Supervised vs Reinforcement
- Supervised: Learns from static examples
- Reinforcement: Learns from dynamic interaction
Unsupervised vs Reinforcement
- Unsupervised: Finds structure in data
- Reinforcement: Learns optimal behavior

When to Use Each

Use Supervised Learning When
- You have labeled data
- Need specific predictions
- Have clear input-output pairs
Use Unsupervised Learning When
- Data isn’t labeled
- Want to discover patterns
- Need to reduce data complexity
Use Reinforcement Learning When
- Have interactive environment
- Can define clear rewards
- Need adaptive behavior

Common Applications

Supervised Learning
- Spam detection
- Disease diagnosis
- Price prediction
- Image recognition
Unsupervised Learning
- Customer segmentation
- Anomaly detection
- Feature extraction
- Recommendation systems
Reinforcement Learning
- Game playing
- Robot navigation
- Resource management
- Trading strategies

Remember: Choose the learning type based on your data availability and problem requirements. Each type has its strengths and ideal use cases.

Supervised Learning

Supervised learning is a type of machine learning where models learn from labeled data to make predictions on new, unseen data.

Key Concepts

Input Data (X): Features or predictors
Output Data (y): Labels or target values
Training: Model learns patterns from X → y mapping
Prediction: Model predicts y for new X

Types of Supervised Learning

1. Classification

Predicts categorical outputs
Examples: Yes/No, Cat/Dog/Bird

from sklearn.ensemble import RandomForestClassifier

# Email Spam Classification
class SpamClassifier:
    def __init__(self):
        self.model = RandomForestClassifier()

    def train(self, emails, labels):
        """
        Train spam classifier

        Args:
            emails: List of email features (word counts, etc.)
            labels: 1 for spam, 0 for not spam
        """
        self.model.fit(emails, labels)

    def predict(self, email):
        """Predict if email is spam"""
        prediction = self.model.predict([email])[0]
        return "Spam" if prediction == 1 else "Not Spam"

# Example Usage
features = [[1, 0, 2], [0, 0, 0], [3, 1, 4]]  # Word counts: [money, friend, buy]
labels = [1, 0, 1]  # 1=spam, 0=not spam

classifier = SpamClassifier()
classifier.train(features, labels)

new_email = [2, 0, 3]  # New email features
result = classifier.predict(new_email)
print(f"Email classified as: {result}")

2. Regression

Predicts continuous numerical values
Examples: Price, Temperature, Age

from sklearn.linear_model import LinearRegression
import numpy as np

class HousePricePredictor:
    def __init__(self):
        self.model = LinearRegression()

    def train(self, features, prices):
        """
        Train house price predictor

        Args:
            features: Array of [size, bedrooms, age]
            prices: Array of house prices
        """
        self.model.fit(features, prices)

    def predict(self, house):
        """Predict house price"""
        return self.model.predict([house])[0]

# Example Usage
features = [
    [1500, 3, 10],  # size, bedrooms, age
    [2000, 4, 5],
    [1200, 2, 15]
]
prices = [300000, 450000, 250000]

predictor = HousePricePredictor()
predictor.train(features, prices)

new_house = [1800, 3, 8]
price = predictor.predict(new_house)
print(f"Predicted price: ${price:,.2f}")

Common Algorithms

Linear Models
- Linear Regression
- Logistic Regression
- Use: Simple, interpretable predictions
Tree-Based Models
- Decision Trees
- Random Forests
- Use: Complex patterns, non-linear relationships
Neural Networks
- Deep Learning
- Multi-layer Perceptron
- Use: Image, text, complex patterns

Real-World Applications

1. Credit Card Fraud Detection

class FraudDetector:
    def __init__(self):
        self.model = RandomForestClassifier()

    def train(self, transactions, fraud_labels):
        """
        Train fraud detector

        Args:
            transactions: Features [amount, time, location_code, etc.]
            fraud_labels: 1 for fraud, 0 for legitimate
        """
        self.model.fit(transactions, fraud_labels)

    def check_transaction(self, transaction):
        """Check if transaction is fraudulent"""
        risk_score = self.model.predict_proba([transaction])[0][1]
        return {
            'risk_score': risk_score,
            'flag_for_review': risk_score > 0.7
        }

# Example
detector = FraudDetector()
transaction = [1000, 3, 45]  # amount, time, location
result = detector.check_transaction(transaction)
print(f"Risk Score: {result['risk_score']:.2%}")

2. Medical Diagnosis

class DiseasePredictor:
    def __init__(self):
        self.model = RandomForestClassifier()

    def train(self, patient_data, diagnoses):
        """
        Train disease predictor

        Args:
            patient_data: Features [symptoms, age, history, etc.]
            diagnoses: Disease labels
        """
        self.model.fit(patient_data, diagnoses)

    def predict_risk(self, patient):
        """Predict disease risk"""
        probability = self.model.predict_proba([patient])[0][1]
        return {
            'risk_level': 'High' if probability > 0.7 else 'Low',
            'probability': probability
        }

# Example
symptoms = [1, 0, 1, 0]  # fever, cough, fatigue, pain
result = DiseasePredictor().predict_risk(symptoms)
print(f"Risk Level: {result['risk_level']}")

Best Practices

Data Quality
- Clean, representative data
- Balanced classes
- Proper feature scaling
Model Selection
- Start simple
- Use cross-validation
- Consider interpretability needs
Evaluation
- Use appropriate metrics
- Test on unseen data
- Monitor performance

Common Challenges

Overfitting
- Model learns noise in training data
- Solution: Regularization, cross-validation
Underfitting
- Model too simple for data
- Solution: More features, complex model
Data Issues
- Missing values
- Imbalanced classes
- Noisy labels

Performance Metrics

Classification

from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate_classifier(y_true, y_pred):
    """Calculate classification metrics"""
    return {
        'accuracy': accuracy_score(y_true, y_pred),
        'precision': precision_score(y_true, y_pred),
        'recall': recall_score(y_true, y_pred)
    }

Regression

from sklearn.metrics import mean_squared_error, r2_score

def evaluate_regression(y_true, y_pred):
    """Calculate regression metrics"""
    return {
        'mse': mean_squared_error(y_true, y_pred),
        'r2': r2_score(y_true, y_pred)
    }

When to Use

Classification
- Spam detection
- Disease diagnosis
- Customer churn prediction
- Image recognition
Regression
- Price prediction
- Sales forecasting
- Temperature estimation
- Resource consumption

Remember: Success in supervised learning depends heavily on data quality and proper model selection for your specific problem.

Unsupervised Learning

Unsupervised learning finds patterns in data without labels. Think of it like sorting clothes by color without knowing the color names.

Main Types

Clustering
- Groups similar items together
- Example: Grouping customers by shopping behavior
Dimensionality Reduction
- Simplifies complex data
- Example: Compressing images while keeping important features

Simple Examples

# Clustering Example
from sklearn.cluster import KMeans

class CustomerGrouping:
    def __init__(self, n_groups=3):
        self.model = KMeans(n_clusters=n_groups)

    def group_customers(self, data):
        """
        Group customers based on behavior
        data: Array of [purchase_amount, visit_frequency]
        """
        return self.model.fit_predict(data)

# Usage
customers = [
    [100, 5],   # [spending, visits]
    [500, 15],
    [50, 2]
]
groups = CustomerGrouping().group_customers(customers)

# Dimensionality Reduction Example
from sklearn.decomposition import PCA

class DataSimplifier:
    def __init__(self, n_features=2):
        self.model = PCA(n_components=n_features)

    def simplify(self, data):
        """
        Reduce data complexity
        data: Complex data with many features
        """
        return self.model.fit_transform(data)

When to Use

Use Clustering When:
- Finding customer segments
- Grouping similar documents
- Detecting anomalies
Use Dimensionality Reduction When:
- Reducing image size
- Visualizing high-dimensional data
- Removing noise from data

Common Algorithms

Clustering
- K-means: Simple, fast grouping
- DBSCAN: Finds clusters of any shape
- Hierarchical: Creates tree of clusters
Dimensionality Reduction
- PCA: Linear data compression
- t-SNE: Complex data visualization
- Autoencoders: Neural network compression

Tips for Success

Data Preparation
- Scale features to same range
- Remove outliers if needed
- Handle missing values
Algorithm Selection
- Start with simple methods
- K-means for clustering
- PCA for dimension reduction
Evaluation
- Check if groups make sense
- Validate with domain experts
- Use visualization tools

Reinforcement Learning

Reinforcement learning is like teaching through trial and error with rewards. Think of training a dog - good behavior gets treats, bad behavior doesn’t.

Core Concepts

Agent: The learner (like a robot or game player)
Environment: Where the agent operates
State: Current situation
Action: What the agent can do
Reward: Feedback on how good an action was

Simple Example

import numpy as np

class SimpleGameAgent:
    def __init__(self, states, actions):
        self.q_table = {}  # Stores state-action values
        self.states = states
        self.actions = actions
        self.learning_rate = 0.1
        self.discount = 0.9

    def get_action(self, state, explore_rate=0.1):
        """Choose action: explore or exploit"""
        if state not in self.q_table:
            self.q_table[state] = {a: 0 for a in self.actions}

        # Sometimes try random actions (explore)
        if np.random.random() < explore_rate:
            return np.random.choice(self.actions)

        # Usually pick best action (exploit)
        return max(self.q_table[state], key=self.q_table[state].get)

    def learn(self, state, action, reward, next_state):
        """Update knowledge from experience"""
        if next_state not in self.q_table:
            self.q_table[next_state] = {a: 0 for a in self.actions}

        # Update Q-value based on reward
        old_value = self.q_table[state][action]
        next_best = max(self.q_table[next_state].values())
        new_value = (1 - self.learning_rate) * old_value + \
                   self.learning_rate * (reward + self.discount * next_best)
        self.q_table[state][action] = new_value

# Example Usage
states = ['start', 'middle', 'end']
actions = ['left', 'right']

agent = SimpleGameAgent(states, actions)
current_state = 'start'
action = agent.get_action(current_state)
reward = 1  # Get reward from environment
next_state = 'middle'
agent.learn(current_state, action, reward, next_state)

Common Applications

Games
- Chess
- Video games
- Board games
Robotics
- Navigation
- Manipulation
- Walking
Business
- Trading
- Resource management
- Ad placement

Key Algorithms

Q-Learning
- Simple table-based learning
- Good for small problems
- Easy to understand
Deep Q-Learning
- Uses neural networks
- Good for complex problems
- Handles large state spaces
Policy Gradient
- Learns actions directly
- Good for continuous actions
- Used in robotics

Tips for Success

Start Simple
- Use basic environments
- Limit state/action space
- Test frequently
Handle Exploration
- Try random actions early
- Reduce randomness over time
- Balance explore/exploit
Reward Design
- Clear goals
- Immediate feedback
- Consistent values

Note: Reinforcement learning needs patience. Agents often perform poorly at first but improve with practice.

Equation of a Line, 3D Planes, and Hyperplanes

1. Line Equation (2D)

A line is the simplest form of linear relationship:

y = mx + b
# m = slope (rise/run)
# b = y-intercept

Example:

def line_equation(x, m=2, b=1):
    """
    Calculate y value for a line
    Args:
        x: x coordinate
        m: slope
        b: y-intercept
    Returns:
        y coordinate
    """
    return m * x + b

# Example: y = 2x + 1
x_points = [0, 1, 2]
y_points = [line_equation(x) for x in x_points]
# y_points = [1, 3, 5]

2. Plane Equation (3D)

A plane in 3D space:

z = ax + by + c
# a, b = slopes in x and y directions
# c = z-intercept

Example:

def plane_equation(x, y, a=1, b=1, c=0):
    """
    Calculate z value for a plane
    Args:
        x, y: coordinates
        a, b: slopes
        c: z-intercept
    Returns:
        z coordinate
    """
    return a * x + b * y + c

# Example: z = x + y
point = plane_equation(1, 2)  # z = 3

3. Hyperplane (N dimensions)

For machine learning with multiple features:

y = w1*x1 + w2*x2 + ... + wn*xn + b
# w = weights
# x = features
# b = bias

Simple Linear Classifier:

import numpy as np

class SimpleClassifier:
    def __init__(self, n_features):
        self.weights = np.zeros(n_features)
        self.bias = 0

    def predict(self, features):
        """
        Classify using hyperplane
        Args:
            features: array of feature values
        Returns:
            1 if above plane, 0 if below
        """
        value = np.dot(features, self.weights) + self.bias
        return 1 if value > 0 else 0

# Example
classifier = SimpleClassifier(2)
classifier.weights = np.array([1, 1])
classifier.bias = -1

# Point (1, 1) is above plane
print(classifier.predict([1, 1]))  # 1

# Point (0, 0) is below plane
print(classifier.predict([0, 0]))  # 0

Key Points:

Lines separate points in 2D

Planes separate points in 3D

Hyperplanes separate points in higher dimensions

Essential for linear classification and regression

Distance Of a Point to a Plane

The distance from a point to a plane is useful in machine learning for:

Finding margins in SVM algorithms
Calculating error in geometric models
Determining point classifications

Formula

from numpy import array, dot, sqrt

def point_to_plane_distance(point, plane_normal, plane_point):
    """
    Calculate shortest distance from point to plane

    Args:
        point: Array [x, y, z] - Point to measure from
        plane_normal: Array [a, b, c] - Normal vector to plane
        plane_point: Array [x, y, z] - Any point on plane

    Returns:
        distance: Absolute distance from point to plane
    """
    # Normalize the plane normal vector
    n = plane_normal / sqrt(dot(plane_normal, plane_normal))

    # Vector from plane to point
    v = point - plane_point

    # Distance is dot product of v and normalized normal
    return abs(dot(v, n))

# Example Usage
point = array([1, 1, 1])
plane_normal = array([0, 0, 1])  # Points up Z-axis
plane_point = array([0, 0, 0])   # Plane at Z=0

distance = point_to_plane_distance(point, plane_normal, plane_point)
# distance = 1.0 (point is 1 unit above XY plane)

Visual Example

Plane: z = 0 (XY plane)
Point: (1, 1, 1)
Distance: 1 unit (straight up from plane)

Note: The normal vector determines plane orientation. It points perpendicular to the plane surface.

Instance Based Learning

Instance Based Learning (IBL) stores training examples to predict new cases, like using past experiences to make decisions.

K-Nearest Neighbors (KNN)

The most common IBL algorithm. Predicts based on closest training examples.

from sklearn.neighbors import KNeighborsClassifier

class SimpleKNN:
    def __init__(self, k=3):
        self.model = KNeighborsClassifier(n_neighbors=k)

    def train(self, X, y):
        """Store training examples"""
        self.model.fit(X, y)

    def predict(self, x):
        """Predict using k closest neighbors"""
        return self.model.predict([x])[0]

# Example: Movie Recommendation
movies = [
    [3, 4],  # [length_hours, rating]
    [1, 5],
    [4, 2]
]
likes = ['yes', 'yes', 'no']

recommender = SimpleKNN(k=2)
recommender.train(movies, likes)
new_movie = [2, 4]
prediction = recommender.predict(new_movie)

Key Points

Advantages
- Simple to understand
- No training phase
- Works with new data easily
Disadvantages
- Slow with large datasets
- Needs lots of memory
- Sensitive to irrelevant features

When to Use

Small to medium datasets
When examples are similar to predictions
When quick setup is needed

Note: Choose k (number of neighbors) based on dataset size. Larger k reduces noise sensitivity.

Model Based Learning

Model Based Learning finds patterns in data to make predictions. Think of it like creating a rule book from examples.

Basic Types

Linear Models

from sklearn.linear_model import LinearRegression

class SimpleLinearModel:
    def __init__(self):
        self.model = LinearRegression()

    def train(self, X, y):
        """Learn pattern from data"""
        self.model.fit(X, y)

    def predict(self, x):
        """Make prediction using learned pattern"""
        return self.model.predict([x])[0]

# Example: House Price Prediction
sizes = [[1000], [1500], [2000]]  # Square feet
prices = [200000, 300000, 400000]  # Prices

predictor = SimpleLinearModel()
predictor.train(sizes, prices)
new_size = [1750]
price = predictor.predict(new_size)

Decision Trees

from sklearn.tree import DecisionTreeClassifier

class SimpleTreeModel:
    def __init__(self):
        self.model = DecisionTreeClassifier(max_depth=3)  # Keep it simple

    def train(self, X, y):
        """Learn decision rules"""
        self.model.fit(X, y)

    def predict(self, x):
        """Make decision using rules"""
        return self.model.predict([x])[0]

# Example: Loan Approval
data = [
    [50000, 3],  # [income, years_employed]
    [30000, 1],
    [70000, 5]
]
approved = ['yes', 'no', 'yes']

approver = SimpleTreeModel()
approver.train(data, approved)
new_application = [45000, 2]
decision = approver.predict(new_application)

When to Use

Linear Models
- Simple relationships
- Need explainable results
- Limited data available
Decision Trees
- Complex patterns
- Need clear rules
- Have enough data

Tips

Start simple
Use cross-validation
Check model assumptions
Monitor performance

Remember: Simpler models often work better than complex ones.

Pickling

Pickling is a way to save Python objects to files and load them back later. Think of it like saving your game progress.

Basic Usage

import pickle

# Save object to file
def save_object(obj, filename):
    """Save any Python object to a file"""
    with open(filename, 'wb') as file:
        pickle.dump(obj, file)

# Load object from file
def load_object(filename):
    """Load a Python object from a file"""
    with open(filename, 'rb') as file:
        return pickle.load(file)

# Example Usage
model = {'name': 'MyModel', 'score': 0.95}
save_object(model, 'model.pkl')
loaded_model = load_object('model.pkl')

Common Use Cases

Save ML Models

from sklearn.linear_model import LinearRegression
import numpy as np

# Train and save model
model = LinearRegression()
X = np.array([[1], [2], [3]])
y = np.array([2, 4, 6])
model.fit(X, y)
save_object(model, 'linear_model.pkl')

# Load and use model later
loaded_model = load_object('linear_model.pkl')
prediction = loaded_model.predict([[4]])

Save Data Structures

# Save processed data
data = {
    'features': [1, 2, 3],
    'labels': ['A', 'B', 'C']
}
save_object(data, 'data.pkl')

Best Practices

File Extensions
- Use .pkl or .pickle
- Makes files easily identifiable
Error Handling

def safe_save(obj, filename):
    """Save with error handling"""
    try:
        save_object(obj, filename)
        return True
    except Exception as e:
        print(f"Error saving: {e}")
        return False

Security Note
- Only unpickle files you trust
- Pickled files can contain malicious code

Remember: Pickling is for Python objects only. Use JSON for sharing data with other languages.