Part 2: Deep Learning-based Time Series Forecasting - N-BEATS and DeepAR

Welcome to Part 2 of our time series forecasting series! In this post, we’ll explore advanced deep learning models that have revolutionized time series forecasting: N-BEATS and DeepAR. These models represent significant advances beyond traditional statistical methods and basic neural networks.

📖 Table of Contents

  1. Introduction to Deep Learning in Time Series
  2. N-BEATS: Neural Basis Expansion Analysis for Time Series
  3. DeepAR: Deep Autoregressive Models
  4. Hands-on Implementation
  5. Model Comparison and Selection
  6. Next Steps

1. Introduction to Deep Learning in Time Series

Why Deep Learning for Time Series?

Traditional statistical methods like ARIMA and Prophet have limitations:

  • Linear assumptions: Cannot capture complex non-linear patterns
  • Fixed patterns: Struggle with evolving seasonality and trends
  • Limited covariates: External regressors can be bolted on (e.g., ARIMAX, Prophet regressors), but their effects stay linear and largely hand-engineered

Deep learning models offer several advantages:

  • Non-linear modeling: Can capture complex temporal relationships
  • Feature learning: Automatically discovers relevant patterns
  • Scalability: Handle large datasets and multiple variables
  • Flexibility: Adapt to various time series characteristics

Key Challenges in Deep Learning Time Series

  1. Temporal dependencies: Long-range relationships
  2. Seasonality: Multiple seasonal patterns
  3. Non-stationarity: Changing statistical properties
  4. Missing data: Irregular sampling and gaps

2. N-BEATS: Neural Basis Expansion Analysis for Time Series

What is N-BEATS?

N-BEATS (Neural Basis Expansion Analysis for Time Series) is a deep neural architecture specifically designed for time series forecasting. It was introduced by Oreshkin et al. in 2019 and has shown remarkable performance on various forecasting tasks.

Key Features of N-BEATS

🏗️ Architecture Design

  • Deep neural network: Multiple fully connected layers
  • Residual connections: Skip connections for better gradient flow
  • Basis expansion: Decomposes time series into interpretable components
  • No recurrent connections: Pure feedforward architecture

📊 Interpretability

  • Trend component: Long-term patterns
  • Seasonality component: Repeating patterns
  • Residual component: Remaining variations

N-BEATS Architecture

Input → Block 1 → Block 2 → ... → Block N → Output
         ↓         ↓                ↓
    Trend + Seasonality + Residual

Each block consists of:

  1. Fully connected layers with ReLU activation
  2. Basis expansion for trend and seasonality (a minimal sketch follows this list)
  3. Residual connections for gradient flow
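
To make the basis-expansion idea concrete, here is a minimal sketch (not the paper's implementation) of how a block's predicted coefficients (theta) can be mapped to interpretable components: a polynomial basis produces the trend forecast and a Fourier basis produces the seasonality forecast.

import math
import torch

def trend_basis(theta, horizon):
    # theta: (batch, degree + 1) polynomial coefficients predicted by a block
    t = torch.linspace(0, 1, horizon)                                   # normalized forecast time steps
    powers = torch.stack([t ** i for i in range(theta.shape[1])])       # (degree + 1, horizon)
    return theta @ powers                                               # (batch, horizon) trend forecast

def seasonality_basis(theta, horizon):
    # theta: (batch, 2 * num_harmonics) Fourier coefficients predicted by a block
    num_harmonics = theta.shape[1] // 2
    t = torch.linspace(0, 1, horizon)
    waves = [torch.cos(2 * math.pi * (i + 1) * t) for i in range(num_harmonics)]
    waves += [torch.sin(2 * math.pi * (i + 1) * t) for i in range(num_harmonics)]
    basis = torch.stack(waves)                                          # (2 * num_harmonics, horizon)
    return theta @ basis                                                # (batch, horizon) seasonal forecast

# Example: cubic trend (4 coefficients) plus 2 harmonics over a 12-step horizon
theta_trend = torch.randn(8, 4)
theta_seasonal = torch.randn(8, 4)
forecast = trend_basis(theta_trend, 12) + seasonality_basis(theta_seasonal, 12)
print(forecast.shape)  # torch.Size([8, 12])

In the full architecture each interpretable block emits both a backcast and a forecast through such bases; the simplified implementation in Section 4 skips this step for brevity.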

Advantages of N-BEATS

  • Interpretable: Clear trend and seasonality decomposition
  • Fast training: No recurrent connections
  • Scalable: Handles multiple time series
  • Robust: Good performance across different domains

3. DeepAR: Deep Autoregressive Models

What is DeepAR?

DeepAR is a probabilistic forecasting model that combines a recurrent neural network with an autoregressive, likelihood-based output. It was developed at Amazon and is particularly effective for forecasting many related time series, since a single global model is trained across all of them.

Key Features of DeepAR

🔄 Autoregressive Nature

  • Sequential prediction: Each prediction depends on previous values
  • Probabilistic output: Provides uncertainty estimates
  • Multiple time series: Can handle related series simultaneously

🧠 Neural Architecture

  • LSTM cells: Capture temporal dependencies in a hidden state
  • Likelihood output: Predicts the parameters of a distribution (e.g., Gaussian or negative binomial) at each step
  • Conditional generation: Generates forecasts step by step by sampling and feeding samples back in

DeepAR Architecture

Input → LSTM → Distribution Head → Output Distribution
         ↓            ↓                    ↓
    Hidden State → μ, σ (likelihood) → Samples / Quantiles

Advantages of DeepAR

  • Probabilistic: Provides uncertainty quantification
  • Multi-series: Handles related time series
  • Flexible: Incorporates external variables
  • Scalable: Efficient for large datasets

4. Hands-on Implementation

Setting Up the Environment

First, let’s install the required packages:

pip install torch torchvision torchaudio
pip install numpy pandas matplotlib seaborn
pip install scikit-learn

Implementing a Simplified N-BEATS

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

class NBEATSBlock(nn.Module):
    """A simplified N-BEATS-style block: a small MLP that maps the lookback
    window directly to a forecast. The basis expansion and backcast branch of
    the original architecture are omitted here for brevity."""
    def __init__(self, input_size, hidden_size, output_size):
        super(NBEATSBlock, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

class NBEATS(nn.Module):
    """Simplified N-BEATS: every block sees the raw lookback window, and the
    per-block forecasts are concatenated and projected. (The full model instead
    stacks blocks residually, subtracting each block's backcast from its input
    and summing the block forecasts.)"""
    def __init__(self, input_size, hidden_size, num_blocks, forecast_horizon):
        super(NBEATS, self).__init__()
        self.num_blocks = num_blocks
        self.forecast_horizon = forecast_horizon

        # Create blocks
        self.blocks = nn.ModuleList([
            NBEATSBlock(input_size, hidden_size, forecast_horizon)
            for _ in range(num_blocks)
        ])

        # Final projection combines the per-block forecasts
        self.final_projection = nn.Linear(num_blocks * forecast_horizon, forecast_horizon)

    def forward(self, x):
        # x: (batch_size, input_size), i.e. one lookback window per series
        block_outputs = []

        for block in self.blocks:
            block_out = block(x)
            block_outputs.append(block_out)

        # Concatenate all block outputs
        concatenated = torch.cat(block_outputs, dim=1)

        # Final projection
        output = self.final_projection(concatenated)
        return output

# Example usage
input_size = 24  # Lookback window
hidden_size = 64
num_blocks = 3
forecast_horizon = 12

model = NBEATS(input_size, hidden_size, num_blocks, forecast_horizon)
print(f"Model parameters: {sum(p.numel() for p in model.parameters())}")

Implementing a Simplified DeepAR

class DeepAR(nn.Module):
    """Simplified DeepAR-style model: an LSTM encoder with a multi-step point
    forecast head plus one-step Gaussian parameters (mu, sigma) for a
    probabilistic forecast. Expects input of shape (batch, seq_len, input_size)."""
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(DeepAR, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # LSTM layers
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.1
        )
        
        # Output projection
        self.output_projection = nn.Linear(hidden_size, output_size)
        
        # Distribution parameters (for probabilistic forecasting)
        self.mu_projection = nn.Linear(hidden_size, 1)
        self.sigma_projection = nn.Linear(hidden_size, 1)
        
    def forward(self, x, hidden=None):
        # LSTM forward pass
        lstm_out, hidden = self.lstm(x, hidden)
        
        # Get last output
        last_output = lstm_out[:, -1, :]
        
        # Point forecast
        point_forecast = self.output_projection(last_output)
        
        # Distribution parameters
        mu = self.mu_projection(last_output)
        sigma = torch.exp(self.sigma_projection(last_output))  # Ensure positive
        
        return point_forecast, mu, sigma, hidden
    
    def init_hidden(self, batch_size, device):
        return (torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
                torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device))

# Example usage
input_size = 1       # Univariate series: one feature per time step
hidden_size = 64
num_layers = 2
output_size = 12     # Forecast horizon for the point-forecast head

model = DeepAR(input_size, hidden_size, num_layers, output_size)
print(f"DeepAR parameters: {sum(p.numel() for p in model.parameters())}")
# Note: a lookback window of 24 steps is passed as a tensor of shape (batch, 24, 1)

Training and Evaluation

def train_model(model, train_loader, val_loader, epochs=100, lr=0.001):
    """Generic training loop for point-forecast models that return a single
    tensor, such as the simplified N-BEATS above. The DeepAR sketch returns a
    tuple and is better trained with a likelihood loss (see the sketch below)."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    train_losses = []
    val_losses = []
    
    for epoch in range(epochs):
        # Training
        model.train()
        train_loss = 0
        for batch_x, batch_y in train_loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            
            optimizer.zero_grad()
            output = model(batch_x)
            loss = criterion(output, batch_y)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
        
        # Validation
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                output = model(batch_x)
                loss = criterion(output, batch_y)
                val_loss += loss.item()
        
        train_losses.append(train_loss / len(train_loader))
        val_losses.append(val_loss / len(val_loader))
        
        if epoch % 10 == 0:
            print(f'Epoch {epoch}: Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}')
    
    return train_losses, val_losses

# Plot training curves
def plot_training_curves(train_losses, val_losses):
    plt.figure(figsize=(10, 6))
    plt.plot(train_losses, label='Training Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training and Validation Loss')
    plt.legend()
    plt.grid(True)
    plt.show()
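
The generic loop above fits point-forecast models that return a single tensor, such as the simplified N-BEATS. The DeepAR sketch returns a tuple and is more naturally trained with a likelihood loss; here is a minimal, illustrative training step using the one-step Gaussian parameters (it assumes batch_x is shaped (batch, lookback, 1) and batch_y holds the next observed value):

def deepar_training_step(model, optimizer, batch_x, batch_y):
    """One illustrative training step for the simplified DeepAR sketch.
    batch_x: (batch, lookback, 1), batch_y: (batch, 1) next observed value."""
    model.train()
    optimizer.zero_grad()
    _, mu, sigma, _ = model(batch_x)
    # Gaussian negative log-likelihood of the observed next value
    nll = -torch.distributions.Normal(mu, sigma).log_prob(batch_y).mean()
    nll.backward()
    optimizer.step()
    return nll.item()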

Data Preparation

def prepare_time_series_data(data, lookback=24, forecast_horizon=12):
    """
    Prepare time series data for deep learning models
    
    Args:
        data: 1D numpy array of time series
        lookback: Number of past observations to use
        forecast_horizon: Number of future steps to predict
    
    Returns:
        X: Input features (batch_size, lookback)
        y: Target values (batch_size, forecast_horizon)
    """
    X, y = [], []
    
    for i in range(len(data) - lookback - forecast_horizon + 1):
        X.append(data[i:i+lookback])
        y.append(data[i+lookback:i+lookback+forecast_horizon])
    
    return np.array(X), np.array(y)

# Example with synthetic data
np.random.seed(42)
t = np.arange(1000)
trend = 0.01 * t
seasonal = 10 * np.sin(2 * np.pi * t / 24)  # Daily seasonality
noise = np.random.normal(0, 1, 1000)
data = trend + seasonal + noise

# Prepare data
X, y = prepare_time_series_data(data, lookback=24, forecast_horizon=12)

# Split into train/validation
split_idx = int(0.8 * len(X))
X_train, X_val = X[:split_idx], X[split_idx:]
y_train, y_val = y[:split_idx], y[split_idx:]

# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train)
y_train = torch.FloatTensor(y_train)
X_val = torch.FloatTensor(X_val)
y_val = torch.FloatTensor(y_val)

# Create data loaders
from torch.utils.data import DataLoader, TensorDataset

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
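
With the loaders in place, everything connects as follows. This short illustrative run uses the simplified NBEATS class and the train_model loop defined earlier; the hyperparameters are arbitrary:

# Train the simplified N-BEATS model on the synthetic series
nbeats_model = NBEATS(input_size=24, hidden_size=64, num_blocks=3, forecast_horizon=12)
train_losses, val_losses = train_model(nbeats_model, train_loader, val_loader, epochs=50)
plot_training_curves(train_losses, val_losses)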

5. Model Comparison and Selection

When to Use N-BEATS

✅ Best for:

  • Interpretable forecasts: Need trend and seasonality decomposition
  • Fast training: Quick model development and iteration
  • Multiple time series: Handle many series efficiently
  • Point forecasts: Single value predictions

❌ Not ideal for:

  • Probabilistic forecasts: Need uncertainty quantification
  • Very long sequences: Limited by lookback window
  • External variables: Cannot incorporate additional features

When to Use DeepAR

✅ Best for:

  • Probabilistic forecasts: Need uncertainty estimates
  • Multiple related series: Leverage relationships between series
  • External variables: Incorporate additional features
  • Long sequences: Handle very long time series

❌ Not ideal for:

  • Fast training: Slower to train because of the recurrent, sequential computation
  • Interpretability: Less interpretable than N-BEATS
  • Real-time inference: Step-by-step sampling of forecasts can be slow

Performance Comparison

| Aspect             | N-BEATS | DeepAR |
|--------------------|---------|--------|
| Training Speed     | ⭐⭐⭐⭐⭐ | ⭐⭐⭐   |
| Inference Speed    | ⭐⭐⭐⭐⭐ | ⭐⭐⭐   |
| Interpretability   | ⭐⭐⭐⭐⭐ | ⭐⭐⭐   |
| Uncertainty        | ⭐⭐     | ⭐⭐⭐⭐⭐ |
| Multi-series       | ⭐⭐⭐⭐   | ⭐⭐⭐⭐⭐ |
| External Variables | ⭐⭐     | ⭐⭐⭐⭐⭐ |

6. Next Steps

Advanced Topics to Explore

  1. Attention Mechanisms
    • Transformer-based models for time series
    • Self-attention for long-range dependencies
  2. Multi-horizon Forecasting
    • Direct vs. recursive forecasting
    • Hierarchical forecasting
  3. Ensemble Methods
    • Combining multiple models
    • Stacking and blending strategies
  4. Real-world Applications
    • Financial time series
    • IoT sensor data
    • Energy consumption forecasting

Practical Tips

  1. Data Preprocessing
    • Handle missing values appropriately
    • Normalize/standardize your data
    • Consider seasonal decomposition
  2. Hyperparameter Tuning
    • Use cross-validation for time series
    • Grid search or Bayesian optimization
    • Monitor for overfitting
  3. Model Evaluation
    • Use appropriate metrics (MSE, MAE, MAPE); a small sketch follows this list
    • Consider business context
    • Validate on out-of-sample data
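
To make the metrics in item 3 concrete, here is a small sketch of the standard point-forecast metrics on held-out data (MAPE is undefined when actual values are near zero, and the predictions variable in the usage comment is a placeholder for your model's output on the validation windows):

import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return MSE, MAE, and MAPE (%) for point forecasts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # breaks down near zero actuals
    return mse, mae, mape

# Example: mse, mae, mape = forecast_metrics(y_val.numpy(), predictions)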

Code Repository

All the code examples from this post are available in our GitHub repository. You can find:

  • Complete implementations of N-BEATS and DeepAR
  • Training and evaluation scripts
  • Example datasets and notebooks
  • Performance benchmarks

🔗 Series Navigation

← Previous: Part 1: Time Series Forecasting Basics - ARIMA to Prophet

Next →: Part 3: Transformer-Based Time Series Forecasting Models


This post is part of our comprehensive series on the evolution of time series forecasting. In the next part, we’ll explore cutting-edge models like Informer, Autoformer, and PatchTST that are pushing the boundaries of what’s possible in time series forecasting.

Happy forecasting! 🚀