Part 2: Deep Learning-based Time Series Forecasting - N-BEATS and DeepAR

Welcome to Part 2 of our time series forecasting series! In this post, we’ll explore advanced deep learning models that have revolutionized time series forecasting: N-BEATS and DeepAR. These models represent significant advances beyond traditional statistical methods and basic neural networks.

📖 Table of Contents

  1. Introduction to Deep Learning in Time Series
  2. N-BEATS: Neural Basis Expansion Analysis for Time Series
  3. DeepAR: Deep Autoregressive Models
  4. Hands-on Implementation
  5. Model Comparison and Selection
  6. Next Steps

1. Introduction to Deep Learning in Time Series

Why Deep Learning for Time Series?

Traditional statistical methods like ARIMA and Prophet have limitations:

  • Linear assumptions: Cannot capture complex non-linear patterns
  • Fixed patterns: Struggle with evolving seasonality and trends
  • Limited covariates: External regressors can be bolted on (e.g., ARIMAX, Prophet regressors), but their effects stay linear and largely hand-engineered

Deep learning models offer several advantages:

  • Non-linear modeling: Can capture complex temporal relationships
  • Feature learning: Automatically discovers relevant patterns
  • Scalability: Handle large datasets and multiple variables
  • Flexibility: Adapt to various time series characteristics

Key Challenges in Deep Learning Time Series

  1. Temporal dependencies: Long-range relationships
  2. Seasonality: Multiple seasonal patterns
  3. Non-stationarity: Changing statistical properties
  4. Missing data: Irregular sampling and gaps

2. N-BEATS: Neural Basis Expansion Analysis for Time Series

What is N-BEATS?

N-BEATS (Neural Basis Expansion Analysis for Time Series) is a deep neural architecture specifically designed for time series forecasting. It was introduced by Oreshkin et al. in 2019 and has shown remarkable performance on various forecasting tasks.

Key Features of N-BEATS

🏗️ Architecture Design

  • Deep neural network: Multiple fully connected layers
  • Residual connections: Skip connections for better gradient flow
  • Basis expansion: Decomposes time series into interpretable components
  • No recurrent connections: Pure feedforward architecture

📊 Interpretability

  • Trend component: Long-term patterns
  • Seasonality component: Repeating patterns
  • Residual component: Remaining variations

N-BEATS Architecture

Input → Block 1 → Block 2 → ... → Block N → Output
         ↓         ↓                ↓
    Trend + Seasonality + Residual

Each block consists of:

  1. Fully connected layers with ReLU activation
  2. Basis expansion for trend and seasonality (a minimal sketch follows this list)
  3. Residual connections for gradient flow
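
To make the basis-expansion idea concrete, here is a minimal sketch (not the paper's implementation) of how a block's predicted coefficients (theta) can be mapped to interpretable components: a polynomial basis produces the trend forecast and a Fourier basis produces the seasonality forecast.

import math
import torch

def trend_basis(theta, horizon):
    # theta: (batch, degree + 1) polynomial coefficients predicted by a block
    t = torch.linspace(0, 1, horizon)                                   # normalized forecast time steps
    powers = torch.stack([t ** i for i in range(theta.shape[1])])       # (degree + 1, horizon)
    return theta @ powers                                               # (batch, horizon) trend forecast

def seasonality_basis(theta, horizon):
    # theta: (batch, 2 * num_harmonics) Fourier coefficients predicted by a block
    num_harmonics = theta.shape[1] // 2
    t = torch.linspace(0, 1, horizon)
    waves = [torch.cos(2 * math.pi * (i + 1) * t) for i in range(num_harmonics)]
    waves += [torch.sin(2 * math.pi * (i + 1) * t) for i in range(num_harmonics)]
    basis = torch.stack(waves)                                          # (2 * num_harmonics, horizon)
    return theta @ basis                                                # (batch, horizon) seasonal forecast

# Example: cubic trend (4 coefficients) plus 2 harmonics over a 12-step horizon
theta_trend = torch.randn(8, 4)
theta_seasonal = torch.randn(8, 4)
forecast = trend_basis(theta_trend, 12) + seasonality_basis(theta_seasonal, 12)
print(forecast.shape)  # torch.Size([8, 12])

In the full architecture each interpretable block emits both a backcast and a forecast through such bases; the simplified implementation in Section 4 skips this step for brevity.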

Advantages of N-BEATS

  • Interpretable: Clear trend and seasonality decomposition
  • Fast training: No recurrent connections
  • Scalable: Handles multiple time series
  • Robust: Good performance across different domains

3. DeepAR: Deep Autoregressive Models

What is DeepAR?

DeepAR is a probabilistic forecasting model that combines a recurrent neural network with an autoregressive, likelihood-based output. It was developed at Amazon and is particularly effective for forecasting many related time series, since a single global model is trained across all of them.

Key Features of DeepAR

🔄 Autoregressive Nature

  • Sequential prediction: Each prediction depends on previous values
  • Probabilistic output: Provides uncertainty estimates
  • Multiple time series: Can handle related series simultaneously

🧠 Neural Architecture

  • LSTM cells: Capture temporal dependencies in a hidden state
  • Likelihood output: Predicts the parameters of a distribution (e.g., Gaussian or negative binomial) at each step
  • Conditional generation: Generates forecasts step by step by sampling and feeding samples back in

DeepAR Architecture

Input → LSTM → Distribution Head → Output Distribution
         ↓            ↓                    ↓
    Hidden State → μ, σ (likelihood) → Samples / Quantiles

Advantages of DeepAR

  • Probabilistic: Provides uncertainty quantification
  • Multi-series: Handles related time series
  • Flexible: Incorporates external variables
  • Scalable: Efficient for large datasets

4. Hands-on Implementation

Setting Up the Environment

First, let’s install the required packages:

pip install torch torchvision torchaudio
pip install numpy pandas matplotlib seaborn
pip install scikit-learn

Implementing a Simplified N-BEATS

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

class NBEATSBlock(nn.Module):
    """A simplified N-BEATS-style block: a small MLP that maps the lookback
    window directly to a forecast. The basis expansion and backcast branch of
    the original architecture are omitted here for brevity."""
    def __init__(self, input_size, hidden_size, output_size):
        super(NBEATSBlock, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

class NBEATS(nn.Module):
    """Simplified N-BEATS: every block sees the raw lookback window, and the
    per-block forecasts are concatenated and projected. (The full model instead
    stacks blocks residually, subtracting each block's backcast from its input
    and summing the block forecasts.)"""
    def __init__(self, input_size, hidden_size, num_blocks, forecast_horizon):
        super(NBEATS, self).__init__()
        self.num_blocks = num_blocks
        self.forecast_horizon = forecast_horizon

        # Create blocks
        self.blocks = nn.ModuleList([
            NBEATSBlock(input_size, hidden_size, forecast_horizon)
            for _ in range(num_blocks)
        ])

        # Final projection combines the per-block forecasts
        self.final_projection = nn.Linear(num_blocks * forecast_horizon, forecast_horizon)

    def forward(self, x):
        # x: (batch_size, input_size), i.e. one lookback window per series
        block_outputs = []

        for block in self.blocks:
            block_out = block(x)
            block_outputs.append(block_out)

        # Concatenate all block outputs
        concatenated = torch.cat(block_outputs, dim=1)

        # Final projection
        output = self.final_projection(concatenated)
        return output

# Example usage
input_size = 24  # Lookback window
hidden_size = 64
num_blocks = 3
forecast_horizon = 12

model = NBEATS(input_size, hidden_size, num_blocks, forecast_horizon)
print(f"Model parameters: {sum(p.numel() for p in model.parameters())}")

Implementing a Simplified DeepAR

class DeepAR(nn.Module):
    """Simplified DeepAR-style model: an LSTM encoder with a multi-step point
    forecast head plus one-step Gaussian parameters (mu, sigma) for a
    probabilistic forecast. Expects input of shape (batch, seq_len, input_size)."""
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(DeepAR, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # LSTM layers
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.1
        )
        
        # Output projection
        self.output_projection = nn.Linear(hidden_size, output_size)
        
        # Distribution parameters (for probabilistic forecasting)
        self.mu_projection = nn.Linear(hidden_size, 1)
        self.sigma_projection = nn.Linear(hidden_size, 1)
        
    def forward(self, x, hidden=None):
        # LSTM forward pass
        lstm_out, hidden = self.lstm(x, hidden)
        
        # Get last output
        last_output = lstm_out[:, -1, :]
        
        # Point forecast
        point_forecast = self.output_projection(last_output)
        
        # Distribution parameters
        mu = self.mu_projection(last_output)
        sigma = torch.exp(self.sigma_projection(last_output))  # Ensure positive
        
        return point_forecast, mu, sigma, hidden
    
    def init_hidden(self, batch_size, device):
        return (torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
                torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device))

# Example usage
input_size = 1       # Univariate series: one feature per time step
hidden_size = 64
num_layers = 2
output_size = 12     # Forecast horizon for the point-forecast head

model = DeepAR(input_size, hidden_size, num_layers, output_size)
print(f"DeepAR parameters: {sum(p.numel() for p in model.parameters())}")
# Note: a lookback window of 24 steps is passed as a tensor of shape (batch, 24, 1)

Training and Evaluation

def train_model(model, train_loader, val_loader, epochs=100, lr=0.001):
    """Generic training loop for point-forecast models that return a single
    tensor, such as the simplified N-BEATS above. The DeepAR sketch returns a
    tuple and is better trained with a likelihood loss (see the sketch below)."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    train_losses = []
    val_losses = []
    
    for epoch in range(epochs):
        # Training
        model.train()
        train_loss = 0
        for batch_x, batch_y in train_loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            
            optimizer.zero_grad()
            output = model(batch_x)
            loss = criterion(output, batch_y)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
        
        # Validation
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                output = model(batch_x)
                loss = criterion(output, batch_y)
                val_loss += loss.item()
        
        train_losses.append(train_loss / len(train_loader))
        val_losses.append(val_loss / len(val_loader))
        
        if epoch % 10 == 0:
            print(f'Epoch {epoch}: Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}')
    
    return train_losses, val_losses

# Plot training curves
def plot_training_curves(train_losses, val_losses):
    plt.figure(figsize=(10, 6))
    plt.plot(train_losses, label='Training Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training and Validation Loss')
    plt.legend()
    plt.grid(True)
    plt.show()
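
The generic loop above fits point-forecast models that return a single tensor, such as the simplified N-BEATS. The DeepAR sketch returns a tuple and is more naturally trained with a likelihood loss; here is a minimal, illustrative training step using the one-step Gaussian parameters (it assumes batch_x is shaped (batch, lookback, 1) and batch_y holds the next observed value):

def deepar_training_step(model, optimizer, batch_x, batch_y):
    """One illustrative training step for the simplified DeepAR sketch.
    batch_x: (batch, lookback, 1), batch_y: (batch, 1) next observed value."""
    model.train()
    optimizer.zero_grad()
    _, mu, sigma, _ = model(batch_x)
    # Gaussian negative log-likelihood of the observed next value
    nll = -torch.distributions.Normal(mu, sigma).log_prob(batch_y).mean()
    nll.backward()
    optimizer.step()
    return nll.item()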

Data Preparation

def prepare_time_series_data(data, lookback=24, forecast_horizon=12):
    """
    Prepare time series data for deep learning models
    
    Args:
        data: 1D numpy array of time series
        lookback: Number of past observations to use
        forecast_horizon: Number of future steps to predict
    
    Returns:
        X: Input features (batch_size, lookback)
        y: Target values (batch_size, forecast_horizon)
    """
    X, y = [], []
    
    for i in range(len(data) - lookback - forecast_horizon + 1):
        X.append(data[i:i+lookback])
        y.append(data[i+lookback:i+lookback+forecast_horizon])
    
    return np.array(X), np.array(y)

# Example with synthetic data
np.random.seed(42)
t = np.arange(1000)
trend = 0.01 * t
seasonal = 10 * np.sin(2 * np.pi * t / 24)  # Daily seasonality
noise = np.random.normal(0, 1, 1000)
data = trend + seasonal + noise

# Prepare data
X, y = prepare_time_series_data(data, lookback=24, forecast_horizon=12)

# Split into train/validation
split_idx = int(0.8 * len(X))
X_train, X_val = X[:split_idx], X[split_idx:]
y_train, y_val = y[:split_idx], y[split_idx:]

# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train)
y_train = torch.FloatTensor(y_train)
X_val = torch.FloatTensor(X_val)
y_val = torch.FloatTensor(y_val)

# Create data loaders
from torch.utils.data import DataLoader, TensorDataset

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
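
With the loaders in place, everything connects as follows. This short illustrative run uses the simplified NBEATS class and the train_model loop defined earlier; the hyperparameters are arbitrary:

# Train the simplified N-BEATS model on the synthetic series
nbeats_model = NBEATS(input_size=24, hidden_size=64, num_blocks=3, forecast_horizon=12)
train_losses, val_losses = train_model(nbeats_model, train_loader, val_loader, epochs=50)
plot_training_curves(train_losses, val_losses)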

5. Model Comparison and Selection

When to Use N-BEATS

✅ Best for:

  • Interpretable forecasts: Need trend and seasonality decomposition
  • Fast training: Quick model development and iteration
  • Multiple time series: Handle many series efficiently
  • Point forecasts: Single value predictions

❌ Not ideal for:

  • Probabilistic forecasts: Need uncertainty quantification
  • Very long sequences: Limited by lookback window
  • External variables: Cannot incorporate additional features

When to Use DeepAR

✅ Best for:

  • Probabilistic forecasts: Need uncertainty estimates
  • Multiple related series: Leverage relationships between series
  • External variables: Incorporate additional features
  • Long sequences: Handle very long time series

❌ Not ideal for:

  • Fast training: Slower to train because of the recurrent, sequential computation
  • Interpretability: Less interpretable than N-BEATS
  • Real-time inference: Step-by-step sampling of forecasts can be slow

Performance Comparison

| Aspect             | N-BEATS | DeepAR |
|--------------------|---------|--------|
| Training Speed     | ⭐⭐⭐⭐⭐ | ⭐⭐⭐   |
| Inference Speed    | ⭐⭐⭐⭐⭐ | ⭐⭐⭐   |
| Interpretability   | ⭐⭐⭐⭐⭐ | ⭐⭐⭐   |
| Uncertainty        | ⭐⭐     | ⭐⭐⭐⭐⭐ |
| Multi-series       | ⭐⭐⭐⭐   | ⭐⭐⭐⭐⭐ |
| External Variables | ⭐⭐     | ⭐⭐⭐⭐⭐ |

6. Next Steps

Advanced Topics to Explore

  1. Attention Mechanisms
    • Transformer-based models for time series
    • Self-attention for long-range dependencies
  2. Multi-horizon Forecasting
    • Direct vs. recursive forecasting
    • Hierarchical forecasting
  3. Ensemble Methods
    • Combining multiple models
    • Stacking and blending strategies
  4. Real-world Applications
    • Financial time series
    • IoT sensor data
    • Energy consumption forecasting

Practical Tips

  1. Data Preprocessing
    • Handle missing values appropriately
    • Normalize/standardize your data
    • Consider seasonal decomposition
  2. Hyperparameter Tuning
    • Use cross-validation for time series
    • Grid search or Bayesian optimization
    • Monitor for overfitting
  3. Model Evaluation
    • Use appropriate metrics (MSE, MAE, MAPE); a small sketch follows this list
    • Consider business context
    • Validate on out-of-sample data
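
To make the metrics in item 3 concrete, here is a small sketch of the standard point-forecast metrics on held-out data (MAPE is undefined when actual values are near zero, and the predictions variable in the usage comment is a placeholder for your model's output on the validation windows):

import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return MSE, MAE, and MAPE (%) for point forecasts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # breaks down near zero actuals
    return mse, mae, mape

# Example: mse, mae, mape = forecast_metrics(y_val.numpy(), predictions)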

Code Repository

All the code examples from this post are available in our GitHub repository. You can find:

  • Complete implementations of N-BEATS and DeepAR
  • Training and evaluation scripts
  • Example datasets and notebooks
  • Performance benchmarks

🔗 Series Navigation

← Previous: Part 1: Time Series Forecasting Basics - ARIMA to Prophet

Next →: Part 3: Transformer-Based Time Series Forecasting Models


This post is part of our comprehensive series on the evolution of time series forecasting. In the next part, we’ll explore cutting-edge models like Informer, Autoformer, and PatchTST that are pushing the boundaries of what’s possible in time series forecasting.

Happy forecasting! 🚀