Part 2: Deep Learning-based Time Series Forecasting - N-BEATS and DeepAR
Welcome to Part 2 of our time series forecasting series! In this post, we’ll explore advanced deep learning models that have revolutionized time series forecasting: N-BEATS and DeepAR. These models represent significant advances beyond traditional statistical methods and basic neural networks.
📖 Table of Contents
- Introduction to Deep Learning in Time Series
- N-BEATS: Neural Basis Expansion Analysis for Time Series
- DeepAR: Deep Autoregressive Models
- Hands-on Implementation
- Model Comparison and Selection
- Next Steps
1. Introduction to Deep Learning in Time Series
Why Deep Learning for Time Series?
Traditional statistical methods like ARIMA and Prophet have limitations:
- Linear assumptions: Cannot capture complex non-linear patterns
- Fixed patterns: Struggle with evolving seasonality and trends
- Limited features: Cannot incorporate external variables effectively
Deep learning models offer several advantages:
- Non-linear modeling: Can capture complex temporal relationships
- Feature learning: Automatically discovers relevant patterns
- Scalability: Handle large datasets and multiple variables
- Flexibility: Adapt to various time series characteristics
Key Challenges in Deep Learning Time Series
- Temporal dependencies: Long-range relationships
- Seasonality: Multiple seasonal patterns
- Non-stationarity: Changing statistical properties
- Missing data: Irregular sampling and gaps
2. N-BEATS: Neural Basis Expansion Analysis for Time Series
What is N-BEATS?
N-BEATS (Neural Basis Expansion Analysis for Time Series) is a deep neural architecture specifically designed for time series forecasting. It was introduced by Oreshkin et al. in 2019 and has shown remarkable performance on various forecasting tasks.
Key Features of N-BEATS
🏗️ Architecture Design
- Deep neural network: Multiple fully connected layers
- Residual connections: Skip connections for better gradient flow
- Basis expansion: Decomposes time series into interpretable components
- No recurrent connections: Pure feedforward architecture
📊 Interpretability
- Trend component: Long-term patterns
- Seasonality component: Repeating patterns
- Residual component: Remaining variations
N-BEATS Architecture
Input → Block 1 → Block 2 → ... → Block N → Output

Each block produces a partial forecast (trend, seasonality, or residual component), and the partial forecasts are combined into the final output.
Each block consists of:
- Fully connected layers with ReLU activation
- Basis expansion for trend and seasonality (a minimal sketch follows this list)
- Residual connections for gradient flow
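To make the basis-expansion idea concrete, here is a minimal sketch (not the actual N-BEATS block) of how a 12-step forecast can be written as a weighted sum of fixed basis functions: low-order polynomials for trend and Fourier terms for seasonality. In N-BEATS, the fully connected layers produce the coefficients (theta); random values stand in for them here:

import numpy as np

horizon = 12
t = np.arange(horizon) / horizon                 # normalized forecast time steps

# Trend basis: low-order polynomials (1, t, t^2)
trend_basis = np.stack([t**0, t**1, t**2])       # shape (3, horizon)

# Seasonality basis: a few Fourier terms
seasonal_basis = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                           np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)])  # shape (4, horizon)

# In N-BEATS, a fully connected stack maps the input window to these coefficients;
# random placeholders stand in for them here
theta_trend = np.random.randn(3)
theta_seasonal = np.random.randn(4)

trend_forecast = theta_trend @ trend_basis             # interpretable trend component
seasonal_forecast = theta_seasonal @ seasonal_basis    # interpretable seasonal component
forecast = trend_forecast + seasonal_forecast

Because the basis functions are fixed and smooth, the coefficients directly expose how much of the forecast comes from trend versus seasonality, which is where the interpretability claim comes from.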
Advantages of N-BEATS
- Interpretable: Clear trend and seasonality decomposition
- Fast training: No recurrent connections
- Scalable: Handles multiple time series
- Robust: Good performance across different domains
3. DeepAR: Deep Autoregressive Models
What is DeepAR?
DeepAR is a probabilistic forecasting model that combines deep learning with autoregressive processes. It was developed by Amazon and is particularly effective for forecasting multiple related time series.
Key Features of DeepAR
🔄 Autoregressive Nature
- Sequential prediction: Each prediction depends on previous values
- Probabilistic output: Provides uncertainty estimates
- Multiple time series: Can handle related series simultaneously
🧠 Neural Architecture
- LSTM/GRU cells: Capture temporal dependencies
- Likelihood output: Projects the hidden state to the parameters of a probability distribution (e.g., the mean and scale of a Gaussian)
- Conditional generation: Generate forecasts step by step
DeepAR Architecture
Input (past values + covariates) → LSTM/GRU → Hidden State → Distribution Parameters (μ, σ) → Forecast Samples
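Conceptually, DeepAR factorizes the forecast distribution autoregressively: the likelihood of the future values is a product of per-step terms p(z_t | θ(h_t)), where the hidden state h_t is updated from the previous hidden state, the previous target value z_{t-1}, and any covariates x_t. Training maximizes this likelihood over historical data; forecasting draws sample paths by feeding each sampled value back into the network, which is what produces the uncertainty bands.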
Advantages of DeepAR
- Probabilistic: Provides uncertainty quantification
- Multi-series: Handles related time series
- Flexible: Incorporates external variables
- Scalable: Efficient for large datasets
4. Hands-on Implementation
Setting Up the Environment
First, let’s install the required packages:
pip install torch torchvision torchaudio
pip install numpy pandas matplotlib seaborn
pip install scikit-learn
Implementing N-BEATS
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

class NBEATSBlock(nn.Module):
    """A single fully connected block (simplified: no explicit basis expansion or backcast)."""
    def __init__(self, input_size, hidden_size, output_size):
        super(NBEATSBlock, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

class NBEATS(nn.Module):
    """Simplified N-BEATS-style model: each block maps the lookback window to a partial
    forecast, and the partial forecasts are combined by a final projection. (The original
    paper additionally uses backcast/residual stacking and fixed trend/seasonality bases.)"""
    def __init__(self, input_size, hidden_size, num_blocks, forecast_horizon):
        super(NBEATS, self).__init__()
        self.num_blocks = num_blocks
        self.forecast_horizon = forecast_horizon

        # Create blocks
        self.blocks = nn.ModuleList([
            NBEATSBlock(input_size, hidden_size, forecast_horizon)
            for _ in range(num_blocks)
        ])

        # Final projection combines the per-block forecasts
        self.final_projection = nn.Linear(num_blocks * forecast_horizon, forecast_horizon)

    def forward(self, x):
        block_outputs = []
        for block in self.blocks:
            block_out = block(x)
            block_outputs.append(block_out)

        # Concatenate all block outputs
        concatenated = torch.cat(block_outputs, dim=1)

        # Final projection
        output = self.final_projection(concatenated)
        return output

# Example usage
input_size = 24        # Lookback window
hidden_size = 64
num_blocks = 3
forecast_horizon = 12

model = NBEATS(input_size, hidden_size, num_blocks, forecast_horizon)
print(f"Model parameters: {sum(p.numel() for p in model.parameters())}")
Implementing DeepAR
class DeepAR(nn.Module):
    """Simplified DeepAR-style model: an LSTM encoder with a multi-step point-forecast head
    and Gaussian distribution heads (mu, sigma) for the next step.
    Inputs are expected with shape (batch, seq_len, input_size)."""
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(DeepAR, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # LSTM layers
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.1
        )

        # Output projection (multi-step point forecast)
        self.output_projection = nn.Linear(hidden_size, output_size)

        # Distribution parameters (for probabilistic forecasting)
        self.mu_projection = nn.Linear(hidden_size, 1)
        self.sigma_projection = nn.Linear(hidden_size, 1)

    def forward(self, x, hidden=None):
        # LSTM forward pass
        lstm_out, hidden = self.lstm(x, hidden)

        # Get last output
        last_output = lstm_out[:, -1, :]

        # Point forecast
        point_forecast = self.output_projection(last_output)

        # Distribution parameters for the next step
        mu = self.mu_projection(last_output)
        sigma = torch.exp(self.sigma_projection(last_output))  # Ensure positive scale

        return point_forecast, mu, sigma, hidden

    def init_hidden(self, batch_size, device):
        return (torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
                torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device))

# Example usage
input_size = 24
hidden_size = 64
num_layers = 2
output_size = 12

model = DeepAR(input_size, hidden_size, num_layers, output_size)
print(f"DeepAR parameters: {sum(p.numel() for p in model.parameters())}")
Training and Evaluation
def train_model(model, train_loader, val_loader, epochs=100, lr=0.001):
    # Note: this loop assumes the model returns a single forecast tensor per batch
    # (as the N-BEATS model above does). The probabilistic DeepAR model returns
    # distribution parameters and would use a likelihood loss instead (see the sketch below).
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    train_losses = []
    val_losses = []

    for epoch in range(epochs):
        # Training
        model.train()
        train_loss = 0
        for batch_x, batch_y in train_loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)

            optimizer.zero_grad()
            output = model(batch_x)
            loss = criterion(output, batch_y)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        # Validation
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                batch_x, batch_y = batch_x.to(device), batch_y.to(device)
                output = model(batch_x)
                loss = criterion(output, batch_y)
                val_loss += loss.item()

        train_losses.append(train_loss / len(train_loader))
        val_losses.append(val_loss / len(val_loader))

        if epoch % 10 == 0:
            print(f'Epoch {epoch}: Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}')

    return train_losses, val_losses
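The loop above uses MSE with a model that returns a single forecast tensor (the N-BEATS example). For the DeepAR-style model, which returns distribution parameters, the usual objective is the negative log-likelihood; here is a minimal sketch of a Gaussian NLL training step, under the same input-shape assumptions as the model above:

import math

# Sketch of a probabilistic training step for the DeepAR-style model above.
# Assumes batch_x has shape (batch, seq_len, input_size) and batch_y holds the
# next observed value for each window, shape (batch, 1).
def gaussian_nll(y, mu, sigma):
    # Negative log-likelihood of y under N(mu, sigma^2), averaged over the batch
    return torch.mean(0.5 * math.log(2 * math.pi) + torch.log(sigma)
                      + (y - mu) ** 2 / (2 * sigma ** 2))

def deepar_training_step(model, optimizer, batch_x, batch_y):
    optimizer.zero_grad()
    _, mu, sigma, _ = model(batch_x)   # ignore the point-forecast head here
    loss = gaussian_nll(batch_y, mu, sigma)
    loss.backward()
    optimizer.step()
    return loss.item()

# PyTorch also provides nn.GaussianNLLLoss, which takes (mu, target, sigma**2)
# and can replace the hand-written gaussian_nll above.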
# Plot training curves
def plot_training_curves(train_losses, val_losses):
    plt.figure(figsize=(10, 6))
    plt.plot(train_losses, label='Training Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training and Validation Loss')
    plt.legend()
    plt.grid(True)
    plt.show()
Data Preparation
def prepare_time_series_data(data, lookback=24, forecast_horizon=12):
    """
    Prepare time series data for deep learning models

    Args:
        data: 1D numpy array of time series
        lookback: Number of past observations to use
        forecast_horizon: Number of future steps to predict

    Returns:
        X: Input features (num_windows, lookback)
        y: Target values (num_windows, forecast_horizon)
    """
    X, y = [], []
    for i in range(len(data) - lookback - forecast_horizon + 1):
        X.append(data[i:i+lookback])
        y.append(data[i+lookback:i+lookback+forecast_horizon])
    return np.array(X), np.array(y)

# Example with synthetic data
np.random.seed(42)
t = np.arange(1000)
trend = 0.01 * t
seasonal = 10 * np.sin(2 * np.pi * t / 24)   # Daily seasonality (period of 24 steps)
noise = np.random.normal(0, 1, 1000)
data = trend + seasonal + noise

# Prepare data
X, y = prepare_time_series_data(data, lookback=24, forecast_horizon=12)

# Split into train/validation (keep temporal order)
split_idx = int(0.8 * len(X))
X_train, X_val = X[:split_idx], X[split_idx:]
y_train, y_val = y[:split_idx], y[split_idx:]

# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train)
y_train = torch.FloatTensor(y_train)
X_val = torch.FloatTensor(X_val)
y_val = torch.FloatTensor(y_val)

# Create data loaders
from torch.utils.data import DataLoader, TensorDataset
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
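With the loaders in place, the pieces can be wired together. A minimal end-to-end run with the simplified N-BEATS model (hyperparameters are illustrative, not tuned):

# End-to-end run with the simplified N-BEATS model defined earlier
model = NBEATS(input_size=24, hidden_size=64, num_blocks=3, forecast_horizon=12)
train_losses, val_losses = train_model(model, train_loader, val_loader, epochs=50, lr=0.001)
plot_training_curves(train_losses, val_losses)

# Inspect a forecast on one validation window
model.eval()
with torch.no_grad():
    pred = model(X_val[:1].to(next(model.parameters()).device))
print("Forecast:", pred.squeeze().cpu().numpy())
print("Actual:  ", y_val[0].numpy())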
5. Model Comparison and Selection
When to Use N-BEATS
✅ Best for:
- Interpretable forecasts: Need trend and seasonality decomposition
- Fast training: Quick model development and iteration
- Multiple time series: Handle many series efficiently
- Point forecasts: Single value predictions
❌ Not ideal for:
- Probabilistic forecasts: Need uncertainty quantification
- Very long sequences: Limited by lookback window
- External variables: Cannot incorporate additional features
When to Use DeepAR
✅ Best for:
- Probabilistic forecasts: Need uncertainty estimates
- Multiple related series: Leverage relationships between series
- External variables: Incorporate additional features
- Long sequences: Handle very long time series
❌ Not ideal for:
- Fast training: Slower due to sequential nature
- Interpretability: Less interpretable than N-BEATS
- Real-time inference: Sequential generation can be slow
Performance Comparison
| Aspect | N-BEATS | DeepAR |
| --- | --- | --- |
| Training Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Inference Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Interpretability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Uncertainty | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Multi-series | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| External Variables | ⭐⭐ | ⭐⭐⭐⭐⭐ |
6. Next Steps
Advanced Topics to Explore
- Attention Mechanisms
  - Transformer-based models for time series
  - Self-attention for long-range dependencies
- Multi-horizon Forecasting
  - Direct vs. recursive forecasting
  - Hierarchical forecasting
- Ensemble Methods
  - Combining multiple models
  - Stacking and blending strategies
- Real-world Applications
  - Financial time series
  - IoT sensor data
  - Energy consumption forecasting
Practical Tips
- Data Preprocessing
  - Handle missing values appropriately
  - Normalize/standardize your data
  - Consider seasonal decomposition
- Hyperparameter Tuning
  - Use cross-validation appropriate for time series (e.g., rolling-origin splits)
  - Grid search or Bayesian optimization
  - Monitor for overfitting
- Model Evaluation
  - Use appropriate metrics (MSE, MAE, MAPE); a minimal sketch follows this list
  - Consider business context
  - Validate on out-of-sample data
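For the evaluation metrics mentioned above, here is a minimal sketch of MSE, MAE, and MAPE on held-out forecasts (MAPE is only meaningful when the true values are not near zero):

import numpy as np

def evaluate_forecast(y_true, y_pred):
    """y_true, y_pred: arrays of shape (num_windows, forecast_horizon)."""
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    # A small epsilon guards against division by zero in MAPE
    mape = np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + 1e-8))) * 100
    return {"MSE": mse, "MAE": mae, "MAPE (%)": mape}

# Example:
# metrics = evaluate_forecast(y_val.numpy(), model(X_val).detach().numpy())
# print(metrics)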
Code Repository
All the code examples from this post are available in our GitHub repository. You can find:
- Complete implementations of N-BEATS and DeepAR
- Training and evaluation scripts
- Example datasets and notebooks
- Performance benchmarks
🔗 Series Navigation
← Previous: Part 1: Time Series Forecasting Basics - ARIMA to Prophet
Next →: Part 3: Transformer-Based Time Series Forecasting Models
This post is part of our comprehensive series on the evolution of time series forecasting. In the next part, we’ll explore cutting-edge models like Informer, Autoformer, and PatchTST that are pushing the boundaries of what’s possible in time series forecasting.
Happy forecasting! 🚀