Machine Learning Algorithm
Simple Linear Regression
Simple linear regression is a basic predictive modeling technique that models the relationship between one input variable (X) and one output variable (Y).
How it Works
- The Line Equation
Y = mX + b
- Y: Predicted value (dependent variable)
- X: Input value (independent variable)
- m: Slope (how much Y changes when X changes)
- b: Y-intercept (value of Y when X = 0)
- Finding Best Fit
- Uses “least squares” method
- Minimizes the sum of squared differences between predicted and actual Y values
- Lower error = better fit
Example
Predicting house prices based on square footage:
- X = Square footage (input)
- Y = House price (prediction)
- m = Price increase per square foot
- b = Base price
When to Use
- One input variable, one output variable
- Data shows roughly linear pattern
- Quick insights needed
- Basic predictions
Limitations
- Only handles linear relationships
- Sensitive to outliers
- Too simple for complex problems
Code Example
# Basic implementation using sklearn
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4]])  # Input data
y = np.array([2, 4, 6, 8])          # Output data

model = LinearRegression()
model.fit(X, y)

# Predict new value
prediction = model.predict([[5]])
Real World Example: House Price Prediction
Let’s predict house prices using square footage:
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Sample data
house_data = {
    'sqft': [1200, 1500, 1800, 2200, 2500],
    'price': [150000, 175000, 210000, 250000, 290000]
}
df = pd.DataFrame(house_data)

# Prepare data
X = df[['sqft']].values
y = df['price'].values

# Train model
model = LinearRegression()
model.fit(X, y)

# Get equation components
slope = model.coef_[0]
intercept = model.intercept_
print(f"Price = {slope:.2f} × sqft + {intercept:.2f}")

# Predict price for a 2000 sqft house
new_house = [[2000]]
predicted_price = model.predict(new_house)
print(f"Predicted price for 2000 sqft: ${predicted_price[0]:,.2f}")
What This Shows:
- Each square foot increases price by a fixed amount (slope)
- Base price is the intercept
- Model learns from existing house prices
- Can predict prices for new houses
Output Example:
Price = 110.23 × sqft + 15000.00
Predicted price for 2000 sqft: $235,460.00
Cost Function
The cost function helps us measure how well our linear regression line fits the data. Think of it as a “wrongness score” - the lower the score, the better the fit.
How it Works
- Mean Squared Error (MSE)
MSE = (1/n) * Σ(y_actual - y_predicted)²
- n: Number of data points
- y_actual: Real value
- y_predicted: Model’s prediction
- Σ: Sum everything
- Why Square the Errors?
- Makes all errors positive
- Penalizes big mistakes more
- Easier to calculate the minimum
Visual Example
import numpy as np
import matplotlib.pyplot as plt

# Sample data
X = np.array([1, 2, 3, 4, 5])
y_actual = np.array([2, 4, 5, 4, 5])

# Bad fit line
m_bad = 0.5
b_bad = 1
y_bad = m_bad * X + b_bad

# Good fit line
m_good = 0.8
b_good = 1.5
y_good = m_good * X + b_good

# Calculate MSE
mse_bad = np.mean((y_actual - y_bad)**2)
mse_good = np.mean((y_actual - y_good)**2)

print(f"Bad fit MSE: {mse_bad:.2f}")
print(f"Good fit MSE: {mse_good:.2f}")
Finding the Best Line
- Start with random slope (m) and intercept (b)
- Calculate MSE
- Adjust m and b to reduce MSE
- Repeat until MSE can’t get lower
Code Example
from sklearn.metrics import mean_squared_error
# Sample data
X = np.array([[1], [2], [3], [4]])
y_true = np.array([2, 4, 6, 8])

# Train model
model = LinearRegression()
model.fit(X, y_true)

# Make predictions
y_pred = model.predict(X)

# Calculate cost
mse = mean_squared_error(y_true, y_pred)
print(f"Model's MSE: {mse:.2f}")
Key Points
- Lower cost = better fit
- Perfect fit has cost of 0
- Used to train the model
- Helps prevent overfitting
Convergence Algorithm
Gradient descent helps find the best line by gradually adjusting the slope and intercept. Think of it like walking downhill to find the lowest point.
How it Works
- Basic Steps
For each step:
1. Calculate current error
2. Find direction of steepest descent
3. Take a small step in that direction
4. Repeat until improvement is minimal
- Learning Rate (α)
- Controls step size
- Too large: might overshoot
- Too small: takes too long
- Typical values: 0.01 to 0.1
Simple Implementation
import numpy as np
def gradient_descent(X, y, learning_rate=0.01, epochs=1000):
    m = 0        # Initial slope
    b = 0        # Initial intercept
    n = len(X)   # Number of data points

    for _ in range(epochs):
        # Current predictions
        y_pred = m * X + b

        # Calculate gradients
        dm = (-2/n) * sum(X * (y - y_pred))
        db = (-2/n) * sum(y - y_pred)

        # Update parameters
        m = m - learning_rate * dm
        b = b - learning_rate * db

    return m, b

# Example usage
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

final_m, final_b = gradient_descent(X, y)
print(f"Final equation: y = {final_m:.2f}x + {final_b:.2f}")
Convergence Types
- Batch Gradient Descent
- Uses all data points
- More stable
- Slower for large datasets
- Stochastic Gradient Descent
- Uses one random point
- Faster but noisier
- Better for large datasets
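The batch version is what the gradient_descent function above implements. For the stochastic variant, here is a minimal sketch (not from the original notes) that updates m and b from one randomly chosen point per pass, using the same toy data:

import numpy as np

def stochastic_gradient_descent(X, y, learning_rate=0.01, epochs=200):
    # Sketch: update parameters from one data point at a time
    m, b = 0.0, 0.0
    rng = np.random.default_rng(0)

    for _ in range(epochs):
        for i in rng.permutation(len(X)):    # visit points in random order
            error = y[i] - (m * X[i] + b)    # error on this single point
            m += learning_rate * 2 * error * X[i]
            b += learning_rate * 2 * error

    return m, b

# Example usage
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
print(stochastic_gradient_descent(X, y))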
Stopping Conditions
- Maximum iterations reached
- Error change is very small
- Gradient becomes very small
Common Issues and Solutions
- Not Converging
- Reduce learning rate
- Normalize input data
- Check for data issues
- Slow Convergence
- Increase learning rate
- Use momentum
- Try different initialization
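One of the fixes listed above is momentum. As an illustrative sketch (an assumption, not part of the original notes), the update keeps a running velocity so consistent gradients accumulate while noisy ones cancel out:

def gradient_descent_momentum(X, y, learning_rate=0.01, beta=0.9, epochs=1000):
    # Velocities vm and vb smooth the raw gradients over time
    m = b = 0.0
    vm = vb = 0.0
    n = len(X)

    for _ in range(epochs):
        y_pred = m * X + b
        dm = (-2/n) * np.sum(X * (y - y_pred))
        db = (-2/n) * np.sum(y - y_pred)

        # Momentum: blend previous velocity with the current gradient
        vm = beta * vm + (1 - beta) * dm
        vb = beta * vb + (1 - beta) * db

        m -= learning_rate * vm
        b -= learning_rate * vb

    return m, b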
Code with Early Stopping
def gradient_descent_with_stopping(X, y, learning_rate=0.01, tolerance=1e-6, max_epochs=1000):
    m = b = 0
    prev_cost = float('inf')

    for epoch in range(max_epochs):
        y_pred = m * X + b
        cost = np.mean((y - y_pred) ** 2)

        # Check for convergence
        if abs(prev_cost - cost) < tolerance:
            print(f"Converged at epoch {epoch}")
            break

        # Update parameters
        dm = (-2/len(X)) * sum(X * (y - y_pred))
        db = (-2/len(X)) * sum(y - y_pred)

        m -= learning_rate * dm
        b -= learning_rate * db
        prev_cost = cost

    return m, b
Key Points
- Automatically finds best parameters
- Learning rate is crucial
- May need multiple runs
- Works for many ML algorithms
Multiple Linear Regression
Multiple linear regression predicts an outcome using two or more input variables. Think of it as simple linear regression with more features.
How it Works
- The Equation
Y = b + m₁X₁ + m₂X₂ + ... + mₙXₙ
- Y: Predicted value
- b: Base value (intercept)
- m₁, m₂, etc.: Coefficients for each feature
- X₁, X₂, etc.: Input features
Real World Example: House Price Prediction
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Sample data
house_data = {
    'sqft': [1200, 1500, 1800, 2200, 2500],
    'bedrooms': [2, 3, 3, 4, 4],
    'age': [5, 10, 15, 5, 8],
    'price': [150000, 175000, 210000, 250000, 290000]
}
df = pd.DataFrame(house_data)

# Prepare features and target
X = df[['sqft', 'bedrooms', 'age']]
y = df['price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Show coefficients
for feature, coef in zip(X.columns, model.coef_):
    print(f"{feature}: ${coef:,.2f} impact")
print(f"Base price: ${model.intercept_:,.2f}")

# Predict new house
new_house = [[2000, 3, 10]]  # 2000 sqft, 3 beds, 10 years old
prediction = model.predict(new_house)
print(f"\nPredicted price: ${prediction[0]:,.2f}")
Feature Selection
Good features are:
- Related to what you’re predicting
- Independent from each other
- Actually available in real use
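To check the "independent from each other" point before fitting, one quick approach is a correlation matrix. A small sketch, assuming the house-price DataFrame df from the example above:

# Pairwise correlations between candidate features (values near ±1 suggest redundancy)
corr = df[['sqft', 'bedrooms', 'age']].corr()
print(corr)

# Flag highly correlated pairs (0.9 is an arbitrary illustrative threshold)
for a in corr.columns:
    for b in corr.columns:
        if a < b and abs(corr.loc[a, b]) > 0.9:
            print(f"Consider dropping one of: {a}, {b}")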
Data Preparation
- Handle Missing Values
# Fill missing values
df.fillna(df.mean(), inplace=True)
- Scale Features
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Model Evaluation
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np

# Make predictions
y_pred = model.predict(X_test)

# Calculate metrics
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f"R² Score: {r2:.2f}")
print(f"RMSE: ${rmse:,.2f}")
Key Points
- More features = more complex model
- Features should be meaningful
- Watch for multicollinearity
- Scale features if needed
- Check model assumptions
Limitations
- Assumes linear relationships
- Sensitive to outliers
- Can overfit with too many features
- Features should be independent
Performance Metrics
Performance metrics help us understand how well our model is performing. Here are the key metrics for regression models.
Common Metrics
- Mean Squared Error (MSE)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, y_pred)
- Measures average squared difference between predictions and actual values
- Penalizes larger errors more
- Always positive, lower is better
- Root Mean Squared Error (RMSE)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
- Square root of MSE
- Same units as target variable
- Easier to interpret than MSE
- R-squared (R²)
from sklearn.metrics import r2_score
r2 = r2_score(y_true, y_pred)
- Shows percentage of variance explained
- Range: 0 to 1 (higher is better)
- 0.7 means model explains 70% of variance
Complete Example
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

def evaluate_model(y_true, y_pred):
    # Calculate metrics
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)

    # Print results
    print(f"MSE: {mse:.2f}")
    print(f"RMSE: {rmse:.2f}")
    print(f"R²: {r2:.2f}")
    print(f"MAE: {mae:.2f}")

    return mse, rmse, r2, mae

# Example usage
y_true = np.array([10, 20, 30, 40, 50])
y_pred = np.array([12, 18, 31, 38, 51])
evaluate_model(y_true, y_pred)
Cross-Validation
from sklearn.model_selection import cross_val_score
def cv_evaluate(model, X, y, cv=5):
    # Get cross-validation scores
    scores = cross_val_score(model, X, y, cv=cv)

    print(f"CV Scores: {scores}")
    print(f"Mean Score: {scores.mean():.2f}")
    print(f"Std Dev: {scores.std():.2f}")
Visualization
import matplotlib.pyplot as plt
def plot_predictions(y_true, y_pred):
    plt.scatter(y_true, y_pred)
    plt.plot([y_true.min(), y_true.max()],
             [y_true.min(), y_true.max()], 'r--', lw=2)
    plt.xlabel('Actual Values')
    plt.ylabel('Predictions')
    plt.title('Actual vs Predicted')
    plt.show()
When to Use Each Metric
- Use RMSE when:
- You need error in same units as target
- Large errors are particularly bad
- Use R² when:
- Explaining model to non-technical people
- Comparing different models
- Use Cross-validation when:
- Dataset is small
- Need reliable performance estimate
Key Points
- Use multiple metrics
- Consider your audience
- Check for overfitting
- Validate on test data
- Compare to baseline
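For the "compare to baseline" point, a simple reference is a model that always predicts the training mean. A minimal sketch using scikit-learn's DummyRegressor, reusing the train/test split and fitted model from the examples above:

from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Baseline that always predicts the mean of y_train
baseline = DummyRegressor(strategy='mean')
baseline.fit(X_train, y_train)

baseline_rmse = np.sqrt(mean_squared_error(y_test, baseline.predict(X_test)))
model_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

print(f"Baseline RMSE: {baseline_rmse:,.2f}")
print(f"Model RMSE: {model_rmse:,.2f}")  # Should be clearly lower than the baseline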
MSE, MAE and RMSE
These are the three most important error metrics for regression models. Let’s understand each one simply.
Mean Absolute Error (MAE)
MAE = (1/n) * Σ|y_true - y_pred|
What it means:
- Average of absolute differences between predictions and actual values
- Easier to understand
- All errors weighted equally
- Same unit as your data
from sklearn.metrics import mean_absolute_error
# Example
y_true = [10, 20, 30]
y_pred = [12, 18, 35]

mae = mean_absolute_error(y_true, y_pred)
print(f"MAE: {mae}")  # Shows average error in original units
Mean Squared Error (MSE)
MSE = (1/n) * Σ(y_true - y_pred)²
What it means:
- Square the errors before averaging
- Penalizes large errors more
- Units are squared (if predicting dollars, MSE is in dollars²)
- Always positive
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, y_pred)
print(f"MSE: {mse}")
Root Mean Square Error (RMSE)
RMSE = √MSE
What it means:
- Square root of MSE
- Back to original units
- Still penalizes large errors
- Most commonly used metric
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE: {rmse}")
Complete Example
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def compare_metrics(y_true, y_pred):
    # Calculate all metrics
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)

    print("Example predictions vs actual:")
    for t, p in zip(y_true, y_pred):
        print(f"Actual: {t}, Predicted: {p}, Difference: {abs(t-p)}")

    print(f"\nMAE: {mae:.2f}")
    print(f"MSE: {mse:.2f}")
    print(f"RMSE: {rmse:.2f}")

# Test with house prices (in thousands)
actual = [200, 300, 400, 500]
predicted = [180, 320, 390, 510]
compare_metrics(actual, predicted)
When to Use Each
Use MAE when:
- You need simple interpretation
- All errors equally important
- Outliers are not a big concern
Use MSE when:
- Large errors are more important
- You’re training models
- You don’t need interpretable units
Use RMSE when:
- You want interpretable units
- Large errors matter more
- Comparing different models
Key Points
- MAE is most interpretable
- RMSE is most popular
- MSE is best for training
- Always use same metric when comparing models
Overfitting and Underfitting
Understanding when your model learns too much or too little from the data.
What Are They?
- Underfitting
- Model is too simple
- Doesn’t capture important patterns
- Poor performance on both training and test data
- Like memorizing only basic rules
- Overfitting
- Model is too complex
- Learns noise in training data
- Great on training data, poor on test data
- Like memorizing answers instead of understanding
Visual Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Generate sample data
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3*X + np.sin(X)*2 + np.random.normal(0, 1.5, (100, 1))

# Three models
def plot_fits():
    # Underfit: straight line
    underfit = LinearRegression()
    underfit.fit(X, y)
    y_under = underfit.predict(X)

    # Good fit: polynomial degree 3
    good = PolynomialFeatures(degree=3)
    X_good = good.fit_transform(X)
    model_good = LinearRegression().fit(X_good, y)
    y_good = model_good.predict(X_good)

    # Overfit: polynomial degree 15
    overfit = PolynomialFeatures(degree=15)
    X_over = overfit.fit_transform(X)
    model_over = LinearRegression().fit(X_over, y)
    y_over = model_over.predict(X_over)

    # Plot
    plt.scatter(X, y, color='gray', alpha=0.5, label='Data')
    plt.plot(X, y_under, 'r-', label='Underfit')
    plt.plot(X, y_good, 'g-', label='Good fit')
    plt.plot(X, y_over, 'b-', label='Overfit')
    plt.legend()
    plt.show()
plot_fits()
How to Detect
- Underfitting Signs:
- High training error
- High validation error
- Model makes very simple predictions
- Overfitting Signs:
- Low training error
- High validation error
- Model makes complex, wiggly predictions
Solutions
For Underfitting:
# Add more features
from sklearn.preprocessing import PolynomialFeatures

# Create more complex features
poly = PolynomialFeatures(degree=2)
X_more_features = poly.fit_transform(X)

# Try more complex model
from sklearn.ensemble import RandomForestRegressor
complex_model = RandomForestRegressor(n_estimators=100)
For Overfitting:
# Add regularization
from sklearn.linear_model import Ridge, Lasso

# L2 regularization
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)

# L1 regularization
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Use cross-validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
Prevention Techniques
- Cross Validation
from sklearn.model_selection import train_test_split
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train and evaluate
model.fit(X_train, y_train)
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)

print(f"Training score: {train_score:.2f}")
print(f"Testing score: {test_score:.2f}")
- Learning Curves
from sklearn.model_selection import learning_curve
def plot_learning_curve(model, X, y):
    train_sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=5, n_jobs=-1,
        train_sizes=np.linspace(0.1, 1.0, 10))

    plt.plot(train_sizes, train_scores.mean(axis=1), label='Training score')
    plt.plot(train_sizes, val_scores.mean(axis=1), label='Cross-validation score')
    plt.xlabel('Training examples')
    plt.ylabel('Score')
    plt.legend()
    plt.show()
Key Points
- Balance is crucial
- Use validation data
- Start simple, add complexity slowly
- Monitor training vs validation performance
- Use regularization when needed
Linear Regression with Ordinary Least Squares (OLS)
OLS is the most common method to find the best-fitting line in linear regression. It minimizes the sum of squared differences between predictions and actual values.
How OLS Works
- The Basic Idea
- Find line that minimizes squared errors
- Squared errors = (actual - predicted)²
- Has a mathematical solution (no iteration needed)
- The Formula
β = (X^T X)^(-1) X^T y
Where:
- β: Coefficients (slope and intercept)
- X: Input features
- y: Target values
- ^T: Transpose
- ^(-1): Matrix inverse
Simple Implementation
import numpy as np
def simple_ols(X, y):
    # Add column of 1s for intercept
    X = np.column_stack([np.ones(len(X)), X])

    # Calculate coefficients
    beta = np.linalg.inv(X.T @ X) @ X.T @ y

    # Return intercept and slope
    return beta[0], beta[1]

# Example usage
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

intercept, slope = simple_ols(X, y)
print(f"y = {slope:.2f}x + {intercept:.2f}")
Using Statsmodels (More Detailed)
import statsmodels.api as sm
def detailed_ols(X, y):
    # Add constant
    X = sm.add_constant(X)

    # Fit model
    model = sm.OLS(y, X).fit()

    # Print summary
    print(model.summary().tables[1])

    return model

# Example with house prices
X = np.array([1500, 1800, 2000, 2200, 2500])             # Square footage
y = np.array([150000, 180000, 210000, 220000, 250000])   # Prices
model = detailed_ols(X, y)
Using Scikit-learn (Simple)
from sklearn.linear_model import LinearRegression
def sklearn_ols(X, y):
    # Reshape X if needed
    if X.ndim == 1:
        X = X.reshape(-1, 1)

    # Fit model
    model = LinearRegression()
    model.fit(X, y)

    print(f"Slope: {model.coef_[0]:.2f}")
    print(f"Intercept: {model.intercept_:.2f}")
    print(f"R² Score: {model.score(X, y):.2f}")

    return model

# Example usage
model = sklearn_ols(X, y)
Assumptions of OLS
- Linearity
- Relationship is actually linear
- Check with scatter plots
- Independence
- Observations are independent
- No time series patterns
- Normality
- Residuals are normally distributed
- Check with histogram
- Equal Variance
- Spread of residuals is constant
- Check with residual plot
Checking Assumptions
def check_assumptions(model, X, y):
    # Get predictions and residuals
    y_pred = model.predict(X)
    residuals = y - y_pred

    # Plot residuals
    plt.figure(figsize=(10, 4))

    # Residual plot
    plt.subplot(121)
    plt.scatter(y_pred, residuals)
    plt.axhline(y=0, color='r', linestyle='--')
    plt.xlabel('Predicted')
    plt.ylabel('Residuals')

    # Histogram of residuals
    plt.subplot(122)
    plt.hist(residuals, bins=20)
    plt.xlabel('Residuals')
    plt.ylabel('Frequency')

    plt.tight_layout()
    plt.show()
Key Points
- Simple and fast
- Has exact solution
- Works well for linear data
- Check assumptions
- Use with small/medium datasets
Linear Regression with Regularization
Regularization helps prevent overfitting by adding a penalty for large coefficients. Think of it as making the model simpler.
Types of Regularization
- Ridge (L2)
Cost = MSE + α * (sum of squared coefficients)
- Shrinks coefficients toward zero
- Never makes them exactly zero
- Good for handling multicollinearity
- Lasso (L1)
Cost = MSE + α * (sum of absolute coefficients)
- Can make coefficients exactly zero
- Good for feature selection
- Simpler models
Simple Example
from sklearn.linear_model import Ridge, Lasso
import numpy as np

# Sample data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([2, 3, 4, 5])

# Ridge regression
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
print("Ridge coefficients:", ridge.coef_)

# Lasso regression
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)
print("Lasso coefficients:", lasso.coef_)
Real World Example: House Price Prediction
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Prepare data
house_data = {
    'sqft': [1200, 1500, 1800, 2200, 2500],
    'bedrooms': [2, 3, 3, 4, 4],
    'age': [5, 10, 15, 5, 8],
    'price': [150000, 175000, 210000, 250000, 290000]
}
df = pd.DataFrame(house_data)

# Scale features
X = df[['sqft', 'bedrooms', 'age']]
y = df['price']
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)

# Try different alpha values
alphas = [0.1, 1.0, 10.0]
for alpha in alphas:
    # Ridge
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train, y_train)

    # Print coefficients
    print(f"\nRidge (alpha={alpha})")
    for name, coef in zip(X.columns, ridge.coef_):
        print(f"{name}: {coef:.2f}")
Finding Best Alpha
from sklearn.model_selection import cross_val_score
def find_best_alpha(X, y, alphas):
    best_score = -float('inf')
    best_alpha = None

    for alpha in alphas:
        model = Ridge(alpha=alpha)
        scores = cross_val_score(model, X, y, cv=5)
        avg_score = scores.mean()

        if avg_score > best_score:
            best_score = avg_score
            best_alpha = alpha

    return best_alpha, best_score
When to Use Each
Use Ridge when:
- All features might be important
- Features are correlated
- Want to reduce coefficients
Use Lasso when:
- Need feature selection
- Want simpler model
- Some features might be useless
Elastic Net
from sklearn.linear_model import ElasticNet
# Combines Ridge and Lasso
elastic = ElasticNet(alpha=1.0, l1_ratio=0.5)
elastic.fit(X_train, y_train)
Key Points
- Prevents overfitting
- Makes models more stable
- Scale features first
- Try different alpha values
- Use cross-validation
Simple Polynomial Regression
Polynomial regression handles curved relationships by adding powers of X (like X², X³) to linear regression. Think of it as making linear regression flexible enough to fit curves.
How it Works
- Basic Idea
y = b + m₁x + m₂x² + m₃x³ + ...
- b: Base value (intercept)
- x: Input feature
- x²,x³: Powers of x
- m₁,m₂,m₃: Coefficients
Simple Implementation
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

def polynomial_regression(X, y, degree=2):
    # Convert X to polynomial features
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X.reshape(-1, 1))

    # Fit model
    model = LinearRegression()
    model.fit(X_poly, y)

    return model, poly

# Example usage
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 8, 16, 32])  # Exponential pattern
model, poly = polynomial_regression(X, y, degree=2)
Visual Example
import matplotlib.pyplot as plt
def plot_polynomial_fit(X, y, degree):
    # Fit model
    model, poly = polynomial_regression(X, y, degree)

    # Generate smooth points for curve
    X_smooth = np.linspace(X.min(), X.max(), 100)
    X_smooth_poly = poly.transform(X_smooth.reshape(-1, 1))
    y_smooth = model.predict(X_smooth_poly)

    # Plot
    plt.scatter(X, y, color='blue', label='Data')
    plt.plot(X_smooth, y_smooth, color='red', label=f'Degree {degree}')
    plt.legend()
    plt.show()

    return model

# Example with different degrees
degrees = [1, 2, 3]
for degree in degrees:
    plot_polynomial_fit(X, y, degree)
Real World Example: Temperature Curve
# Daily temperature data
hours = np.array([0, 4, 8, 12, 16, 20, 24])
temp = np.array([15, 13, 18, 25, 23, 18, 15])

def fit_temperature_curve():
    # Fit polynomial model
    model, poly = polynomial_regression(hours, temp, degree=3)

    # Generate smooth curve
    hours_smooth = np.linspace(0, 24, 100)
    hours_poly = poly.transform(hours_smooth.reshape(-1, 1))
    temp_smooth = model.predict(hours_poly)

    # Plot
    plt.scatter(hours, temp, label='Actual')
    plt.plot(hours_smooth, temp_smooth, 'r-', label='Predicted')
    plt.xlabel('Hour of Day')
    plt.ylabel('Temperature (°C)')
    plt.legend()
    plt.show()
Choosing the Right Degree
- Too Low (Underfitting)
- Line too rigid
- Misses important patterns
- High error on both training and test
- Too High (Overfitting)
- Line too wiggly
- Fits noise in data
- Perfect on training, bad on test
def find_best_degree(X, y, max_degree=10):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    best_score = -float('inf')
    best_degree = 1

    for degree in range(1, max_degree + 1):
        model, poly = polynomial_regression(X_train, y_train, degree)

        # Transform test data
        X_test_poly = poly.transform(X_test.reshape(-1, 1))
        score = model.score(X_test_poly, y_test)

        if score > best_score:
            best_score = score
            best_degree = degree

    return best_degree, best_score
When to Use
Good For:
- Curved relationships
- Temperature cycles
- Growth patterns
- Physical processes
Not Good For:
- Linear relationships (use simple linear)
- Too many features
- Very noisy data
Key Points
- Start with low degrees (2 or 3)
- Check for overfitting
- Scale features if needed
- Use cross-validation
- Balance complexity vs accuracy
Pipeline in Polynomial Regression
A pipeline combines multiple steps (like scaling, polynomial features, and regression) into one clean workflow. Think of it as an assembly line for your data.
Basic Pipeline Structure
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression

def create_poly_pipeline(degree=2):
    return Pipeline([
        ('scale', StandardScaler()),                  # Step 1: Scale features
        ('poly', PolynomialFeatures(degree=degree)),  # Step 2: Create polynomial features
        ('regression', LinearRegression())            # Step 3: Fit regression
    ])

# Simple usage
X = np.array([[1], [2], [3], [4]])
y = np.array([1, 4, 9, 16])  # y = x²

model = create_poly_pipeline(degree=2)
model.fit(X, y)
Complete Example with Cross-Validation
from sklearn.model_selection import cross_val_score
def find_best_polynomial(X, y, max_degree=5):
    best_score = float('-inf')
    best_degree = 1

    for degree in range(1, max_degree + 1):
        # Create pipeline
        pipeline = create_poly_pipeline(degree)

        # Get cross-validation scores
        scores = cross_val_score(pipeline, X, y, cv=5)
        avg_score = scores.mean()

        print(f"Degree {degree}: Score = {avg_score:.3f}")

        if avg_score > best_score:
            best_score = avg_score
            best_degree = degree

    return best_degree, best_score

# Example usage
best_degree, best_score = find_best_polynomial(X, y)
print(f"\nBest degree: {best_degree}")
Real-World Example: House Price Prediction
def house_price_pipeline():
    # Sample data
    house_data = {
        'size': [1000, 1500, 1200, 1700, 2000],
        'price': [200000, 300000, 250000, 350000, 450000]
    }
    X = np.array(house_data['size']).reshape(-1, 1)
    y = np.array(house_data['price'])

    # Create and fit pipeline
    pipeline = create_poly_pipeline(degree=2)
    pipeline.fit(X, y)

    # Make predictions
    sizes = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
    predictions = pipeline.predict(sizes)

    # Plot results
    plt.scatter(X, y, color='blue', label='Actual')
    plt.plot(sizes, predictions, color='red', label='Predicted')
    plt.xlabel('House Size (sq ft)')
    plt.ylabel('Price ($)')
    plt.legend()
    plt.show()
Benefits of Using Pipeline
- Cleaner Code
- All steps in one place
- No data leakage
- Easy to reproduce
- Automatic Order
- Steps run in correct sequence
- No manual data passing
- Handles transformations automatically
- Easy Cross-Validation
from sklearn.model_selection import GridSearchCV

# Search for best parameters
param_grid = {
    'poly__degree': [1, 2, 3, 4],
    'regression__fit_intercept': [True, False]
}
grid_search = GridSearchCV(create_poly_pipeline(), param_grid, cv=5)
grid_search.fit(X, y)
Common Pipeline Steps
- Data Scaling
- StandardScaler
- MinMaxScaler
- RobustScaler
- Feature Creation
- PolynomialFeatures
- Custom transformers
- Model Fitting
- LinearRegression
- Ridge
- Lasso
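Any of these steps can be swapped into the same three-step pattern. As a hedged sketch (not from the original notes), here is the pipeline with MinMaxScaler and Ridge in place of StandardScaler and LinearRegression:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

# Same structure as create_poly_pipeline, with alternative scaling and model steps
alt_pipeline = Pipeline([
    ('scale', MinMaxScaler()),               # Scale features to [0, 1]
    ('poly', PolynomialFeatures(degree=2)),  # Create polynomial features
    ('regression', Ridge(alpha=1.0))         # Regularized regression step
])

alt_pipeline.fit(X, y)
print(alt_pipeline.predict([[5]]))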
Key Points
- Always scale before polynomial features
- Use cross-validation to avoid overfitting
- Start with simple pipelines
- Add steps as needed
- Great for reproducibility
Ridge Regression
Ridge regression prevents overfitting by adding a penalty for large coefficients. Think of it as making the model prefer smaller, more reasonable numbers.
How it Works
- Basic Formula
Cost = MSE + α * (sum of squared coefficients)
- MSE: Regular error term
- α (alpha): Controls penalty strength
- Higher α = smaller coefficients
Simple Implementation
from sklearn.linear_model import Ridge
import numpy as np

def ridge_regression(X, y, alpha=1.0):
    # Create and fit model
    model = Ridge(alpha=alpha)
    model.fit(X, y)

    return model

# Example usage
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([2, 3, 4, 5])

model = ridge_regression(X, y)
print("Coefficients:", model.coef_)
Visual Example: Effect of Alpha
def plot_ridge_coefficients(X, y):
    alphas = [0.1, 1.0, 10.0, 100.0]
    coefficients = []

    for alpha in alphas:
        model = Ridge(alpha=alpha)
        model.fit(X, y)
        coefficients.append(model.coef_)

    # Plot how coefficients change with alpha
    plt.figure(figsize=(10, 6))
    for i in range(X.shape[1]):
        plt.plot(alphas, [c[i] for c in coefficients], label=f'Feature {i+1}')

    plt.xscale('log')
    plt.xlabel('Alpha')
    plt.ylabel('Coefficient Value')
    plt.legend()
    plt.title('Ridge Coefficients vs Alpha')
    plt.show()
Real-World Example: House Price Prediction
def house_price_ridge():
    # Sample data with multiple features
    data = {
        'size': [1000, 1500, 1200, 1700, 2000],
        'bedrooms': [2, 3, 2, 3, 4],
        'age': [5, 10, 15, 8, 3],
        'price': [200000, 300000, 250000, 350000, 450000]
    }

    # Prepare data
    X = np.array([[s, b, a] for s, b, a in zip(data['size'],
                                               data['bedrooms'],
                                               data['age'])])
    y = np.array(data['price'])

    # Compare different alphas
    alphas = [0.1, 1.0, 10.0]
    for alpha in alphas:
        model = ridge_regression(X, y, alpha)
        print(f"\nAlpha = {alpha}")
        print("Size impact: ${:,.2f}".format(model.coef_[0]))
        print("Bedroom impact: ${:,.2f}".format(model.coef_[1]))
        print("Age impact: ${:,.2f}".format(model.coef_[2]))
Finding Best Alpha
from sklearn.model_selection import cross_val_score
def find_best_alpha(X, y, alphas=[0.1, 1.0, 10.0, 100.0]):
    best_score = -float('inf')
    best_alpha = None

    for alpha in alphas:
        model = Ridge(alpha=alpha)
        scores = cross_val_score(model, X, y, cv=5)
        avg_score = scores.mean()

        print(f"Alpha {alpha}: Score = {avg_score:.3f}")

        if avg_score > best_score:
            best_score = avg_score
            best_alpha = alpha

    return best_alpha, best_score
When to Use Ridge
Good For:
- Many correlated features
- All features might be important
- Want to reduce coefficient size
- Prevent overfitting
Not Good For:
- Feature selection (use Lasso instead)
- Very sparse data
- When you need exactly zero coefficients
Key Points
- Keeps all features
- Reduces impact of less important features
- Need to scale features first
- Choose alpha using cross-validation
- More stable than Lasso
Common Workflow
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def ridge_workflow(X, y):
    # Create pipeline
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('ridge', Ridge(alpha=1.0))
    ])

    # Fit and predict
    pipeline.fit(X, y)

    return pipeline
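As an alternative to looping over alphas by hand, scikit-learn's RidgeCV can pick alpha with built-in cross-validation. A minimal sketch, assuming X and y are the scaled features and target prepared earlier:

from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# RidgeCV tries each candidate alpha (efficient leave-one-out CV by default)
ridge_cv = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', RidgeCV(alphas=[0.1, 1.0, 10.0, 100.0]))
])
ridge_cv.fit(X, y)

print("Chosen alpha:", ridge_cv.named_steps['ridge'].alpha_)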
Lasso Regression
Lasso regression helps select important features by setting some coefficients to exactly zero. Think of it as a feature selector that removes less important variables.
How it Works
- Basic Formula
Cost = MSE + α * (sum of absolute coefficients)
- MSE: Regular error term
- α (alpha): Controls feature selection
- Higher α = more coefficients become zero
Simple Implementation
from sklearn.linear_model import Lasso
import numpy as np

def lasso_regression(X, y, alpha=1.0):
    # Create and fit model
    model = Lasso(alpha=alpha)
    model.fit(X, y)

    return model

# Example usage
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([2, 3, 4, 5])

model = lasso_regression(X, y)
print("Coefficients:", model.coef_)
Visual Example: Feature Selection
def plot_lasso_path(X, y):
    alphas = np.logspace(-4, 1, 100)
    coefs = []

    for alpha in alphas:
        model = Lasso(alpha=alpha)
        model.fit(X, y)
        coefs.append(model.coef_)

    # Plot coefficient paths
    plt.figure(figsize=(10, 6))
    for feature_idx in range(X.shape[1]):
        plt.plot(alphas, [c[feature_idx] for c in coefs],
                 label=f'Feature {feature_idx+1}')

    plt.xscale('log')
    plt.xlabel('Alpha')
    plt.ylabel('Coefficient Value')
    plt.legend()
    plt.title('Lasso Path: Coefficients vs Alpha')
    plt.show()
Real-World Example: House Price Features
def house_price_lasso():
    # Sample data with many features
    data = {
        'size': [1000, 1500, 1200, 1700, 2000],
        'bedrooms': [2, 3, 2, 3, 4],
        'age': [5, 10, 15, 8, 3],
        'bathrooms': [1, 2, 1, 2, 2],
        'garage': [1, 1, 0, 2, 2],
        'price': [200000, 300000, 250000, 350000, 450000]
    }

    # Prepare data
    features = ['size', 'bedrooms', 'age', 'bathrooms', 'garage']
    X = np.array([[data[f][i] for f in features]
                  for i in range(len(data['price']))])
    y = np.array(data['price'])

    # Try different alphas
    alphas = [0.1, 1.0, 10.0]
    for alpha in alphas:
        model = lasso_regression(X, y, alpha)
        print(f"\nAlpha = {alpha}")
        for feature, coef in zip(features, model.coef_):
            if abs(coef) > 0:  # Only show non-zero coefficients
                print(f"{feature}: ${coef:,.2f}")
Finding Important Features
def identify_important_features(X, y, feature_names, alpha=1.0):
    # Fit Lasso
    model = Lasso(alpha=alpha)
    model.fit(X, y)

    # Get non-zero coefficients
    important_features = []
    for name, coef in zip(feature_names, model.coef_):
        if abs(coef) > 0:
            important_features.append((name, coef))

    # Sort by absolute coefficient value
    important_features.sort(key=lambda x: abs(x[1]), reverse=True)

    return important_features

# Example usage
features = ['size', 'bedrooms', 'age', 'bathrooms', 'garage']
important = identify_important_features(X, y, features)
for feature, impact in important:
    print(f"{feature}: ${impact:,.2f}")
When to Use Lasso
Good For:
- Feature selection
- Many irrelevant features
- Want simpler models
- Need to identify key variables
Not Good For:
- Correlated features (use Ridge)
- When all features matter
- Small datasets
Key Points
- Eliminates unimportant features
- Produces sparse models
- Scale features before using
- Try multiple alpha values
- Good for feature selection
Complete Workflow
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def lasso_workflow(X, y, alpha=1.0):
    # Create pipeline
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('lasso', Lasso(alpha=alpha))
    ])

    # Find best alpha using cross-validation
    # (reuses the find_best_alpha helper from the Ridge section; swap Ridge
    # for Lasso inside that helper to score Lasso models directly)
    alphas = [0.1, 1.0, 10.0]
    best_alpha, best_score = find_best_alpha(X, y, alphas)

    # Update pipeline with best alpha
    pipeline.set_params(lasso__alpha=best_alpha)
    pipeline.fit(X, y)

    return pipeline, best_alpha
Elastic Net Regression
Elastic Net combines Ridge and Lasso regression to get the best of both worlds. It can both select features and handle correlated variables.
How it Works
- Basic Formula
Cost = MSE + α * (r * L1 + (1-r) * L2)
- MSE: Regular error term
- α: Overall penalty strength
- r: Mix ratio (1 = Lasso, 0 = Ridge)
- L1: Sum of absolute coefficients (Lasso)
- L2: Sum of squared coefficients (Ridge)
Simple Implementation
from sklearn.linear_model import ElasticNet
import numpy as np

def elastic_net(X, y, alpha=1.0, l1_ratio=0.5):
    # Create and fit model
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
    model.fit(X, y)

    return model

# Example usage
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([2, 3, 4, 5])

model = elastic_net(X, y)
print("Coefficients:", model.coef_)
Real-World Example: House Prices
def house_price_elastic():
    # Sample data
    data = {
        'size': [1000, 1500, 1200, 1700, 2000],
        'bedrooms': [2, 3, 2, 3, 4],
        'age': [5, 10, 15, 8, 3],
        'bathrooms': [1, 2, 1, 2, 2],
        'price': [200000, 300000, 250000, 350000, 450000]
    }

    # Prepare data
    features = ['size', 'bedrooms', 'age', 'bathrooms']
    X = np.array([[data[f][i] for f in features]
                  for i in range(len(data['price']))])
    y = np.array(data['price'])

    # Try different combinations
    alphas = [0.1, 1.0]
    l1_ratios = [0.2, 0.5, 0.8]

    for alpha in alphas:
        for l1_ratio in l1_ratios:
            model = elastic_net(X, y, alpha, l1_ratio)
            print(f"\nAlpha={alpha}, L1 ratio={l1_ratio}")
            for feature, coef in zip(features, model.coef_):
                print(f"{feature}: ${coef:,.2f}")
Finding Best Parameters
from sklearn.model_selection import GridSearchCV
def find_best_params(X, y):
    # Parameter grid
    param_grid = {
        'alpha': [0.1, 0.5, 1.0],
        'l1_ratio': [0.1, 0.5, 0.7, 0.9]
    }

    # Create model
    model = ElasticNet()

    # Grid search
    grid = GridSearchCV(model, param_grid, cv=5)
    grid.fit(X, y)

    print("Best parameters:", grid.best_params_)
    return grid.best_estimator_
Complete Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def elastic_net_pipeline(X, y):
    # Create pipeline
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('elastic', ElasticNet())
    ])

    # Parameter grid
    param_grid = {
        'elastic__alpha': [0.1, 1.0, 10.0],
        'elastic__l1_ratio': [0.1, 0.5, 0.9]
    }

    # Find best parameters
    grid = GridSearchCV(pipeline, param_grid, cv=5)
    grid.fit(X, y)

    return grid.best_estimator_
When to Use Elastic Net
Good For:
- Correlated features
- Feature selection needed
- Want balance between Ridge and Lasso
- Medium to large datasets
Not Good For:
- Very small datasets
- When you need simple interpretation
- When pure Ridge or Lasso works well
Key Points
- Combines Ridge and Lasso benefits
- More flexible than either alone
- Two parameters to tune (α and r)
- Scale features before using
- Good default choice for regression
Quick Tips
- Start with l1_ratio = 0.5
- Try different alpha values
- Use cross-validation
- Scale your features
- Check feature importance
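Following these tips, scikit-learn's ElasticNetCV can search alpha and l1_ratio by cross-validation in one step. A minimal sketch, assuming X and y are the features and target from the examples above:

from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

# Scale features first, then let ElasticNetCV pick alpha and l1_ratio
X_scaled = StandardScaler().fit_transform(X)

enet_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.9],   # candidate Lasso/Ridge mixes
    alphas=[0.1, 1.0, 10.0],    # candidate penalty strengths
    cv=3                        # small cv because the example datasets are tiny
)
enet_cv.fit(X_scaled, y)

print("Best alpha:", enet_cv.alpha_)
print("Best l1_ratio:", enet_cv.l1_ratio_)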
Types of Cross-Validation
Cross-validation helps test how well your model works on new data by splitting your data in different ways.
K-Fold Cross-Validation
from sklearn.model_selection import KFold
import numpy as np

def k_fold_example(X, y, k=5):
    # Create K-Fold splitter
    kf = KFold(n_splits=k, shuffle=True)

    scores = []
    for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
        # Split data
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]

        # Train and evaluate
        model = LinearRegression()
        model.fit(X_train, y_train)
        score = model.score(X_test, y_test)
        scores.append(score)

        print(f"Fold {fold+1} Score: {score:.3f}")

    print(f"Average Score: {np.mean(scores):.3f}")
Leave-One-Out Cross-Validation
from sklearn.model_selection import LeaveOneOut
def leave_one_out_example(X, y):
    # Good for small datasets
    loo = LeaveOneOut()
    scores = []

    for train_idx, test_idx in loo.split(X):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]

        model = LinearRegression()
        model.fit(X_train, y_train)
        # Note: R² is not well defined on a single held-out point;
        # a per-fold error such as MSE is usually safer here
        score = model.score(X_test, y_test)
        scores.append(score)

    return np.mean(scores)
Stratified K-Fold
from sklearn.model_selection import StratifiedKFold
def stratified_kfold_example(X, y, k=5):
    # Good for imbalanced classification
    skf = StratifiedKFold(n_splits=k, shuffle=True)

    for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]

        # Check distribution
        print(f"Fold {fold+1} distribution:")
        print(f"Train: {np.bincount(y_train)}")
        print(f"Test: {np.bincount(y_test)}\n")
Time Series Split
from sklearn.model_selection import TimeSeriesSplit
def time_series_split_example(X, y, n_splits=5):
    # Good for time series data
    tscv = TimeSeriesSplit(n_splits=n_splits)

    for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
        print(f"Fold {fold+1}:")
        print(f"Train: index {min(train_idx)} to {max(train_idx)}")
        print(f"Test: index {min(test_idx)} to {max(test_idx)}\n")
Complete Example
def compare_cv_methods(X, y):
    # Sample data
    X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
    y = np.array([2, 4, 5, 4, 5, 6, 7, 6, 8, 9])

    # 1. K-Fold
    print("K-Fold CV:")
    k_fold_example(X, y)

    # 2. Leave-One-Out
    print("\nLeave-One-Out CV:")
    loo_score = leave_one_out_example(X, y)
    print(f"Score: {loo_score:.3f}")

    # 3. Time Series
    print("\nTime Series CV:")
    time_series_split_example(X, y)
When to Use Each Method
- K-Fold (Default Choice)
- General purpose
- Medium to large datasets
- Random data order
- Leave-One-Out
- Very small datasets
- When you need exact results
- Computationally expensive
- Stratified K-Fold
- Classification problems
- Imbalanced classes
- Need to maintain class ratios
- Time Series Split
- Time series data
- Sequential data
- When order matters
Quick Implementation
from sklearn.model_selection import cross_val_score
def quick_cv(model, X, y, cv_type='kfold', n_splits=5):
    if cv_type == 'kfold':
        cv = KFold(n_splits=n_splits, shuffle=True)
    elif cv_type == 'loo':
        cv = LeaveOneOut()
    elif cv_type == 'stratified':
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True)
    elif cv_type == 'timeseries':
        cv = TimeSeriesSplit(n_splits=n_splits)

    scores = cross_val_score(model, X, y, cv=cv)
    print(f"Scores: {scores}")
    print(f"Mean: {scores.mean():.3f}")
    print(f"Std: {scores.std():.3f}")
Key Points
- Always shuffle data (except time series)
- Use stratified for classification
- K-Fold is good default choice
- Consider data size and type
- Check score distribution