Part 4: Latest Generative AI Models - TimeGPT, Lag-Llama, Moirai, Chronos
Explore innovative time series forecasting models using large language models and implement them in practice.
📖 Table of Contents
- The Rise of LLM-based Time Series Forecasting
- TimeGPT: Nixtla’s Time Series Foundation Model
- Lag-Llama: Open Source Alternative
- Moirai: Multi-series Forecasting
- Chronos: Amazon’s Time Series Model
- Hands-on: Lag-Llama Implementation
- Model Comparison and Selection Guide
- Next Steps and Future Outlook
- Learning Summary
🚀 The Rise of LLM-based Time Series Forecasting
Limitations of Previous Models
The models we learned in Parts 1-3 each had their strengths, but shared common limitations:
- Domain Specialization: Optimized for specific time series patterns
- Data Dependency: Requires large amounts of domain-specific training data
- Generalization Limits: Difficulty adapting to new time series types
- Limited Multi-tasking: Hard to perform various prediction tasks simultaneously
LLM’s Revolutionary Approach
Large Language Models (LLMs) presented a new paradigm to overcome these limitations:
- Universality: Learning from diverse domain time series data
- Few-shot Learning: Excellent performance with minimal data
- Multi-tasking: Performing multiple prediction tasks simultaneously
- Natural Language Interface: Intuitive querying and explainability
Core Technical Innovations
- Tokenization: Converting time series data into tokens
- Causal Attention: Modeling temporal dependencies in time series (see the mask sketch after this list)
- Multi-scale Learning: Learning patterns at various time scales
- Instruction Following: Prediction based on natural language instructions
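To make the causal-attention idea concrete, here is a minimal sketch (my own illustration, not taken from any of the models below) of the boolean mask that stops each time step from attending to future steps:
import torch

def causal_mask(seq_len):
    """True above the diagonal = future positions that must not be attended to."""
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

print(causal_mask(4))
# Row i may only attend to positions 0..i, so the model conditions only on past values,
# exactly like an autoregressive forecaster. This boolean format matches the attn_mask
# convention used by PyTorch's nn.Transformer modules.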
🤖 TimeGPT: Nixtla’s Time Series Foundation Model
Core Features of TimeGPT
TimeGPT, developed by Nixtla, is the first foundation model built specifically for time series forecasting.
1. Large-scale Pre-training
Training Data: 100+ billion time series data points
Domains: Finance, Retail, Manufacturing, Energy, Healthcare, etc.
Geographic Scope: Worldwide
Time Range: 2020-2024
2. Zero-shot Prediction Capability
Thanks to its pre-trained knowledge, TimeGPT can forecast previously unseen time series without any additional training:
# Example: immediate prediction for a new time series
# (illustrative pseudocode; the publicly documented client API is shown below)
forecast = timegpt.predict(
    series=unseen_timeseries,
    horizon=30,  # 30-day forecast
    freq="D"     # daily data
)
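For reference, the publicly documented Nixtla client exposes a very similar call. The exact import path and parameters may differ between versions, so treat this as a sketch and check the current docs:
# pip install nixtla
from nixtla import NixtlaClient
import pandas as pd

client = NixtlaClient(api_key="YOUR_API_KEY")  # placeholder key for illustration

# The input frame needs a timestamp column and a value column, e.g. 'ds' and 'y'
df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=100, freq="D"),
    "y": range(100),
})
fcst = client.forecast(df=df, h=30, freq="D", time_col="ds", target_col="y")
print(fcst.head())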
3. Natural Language Interface
# Request a prediction with natural-language context (illustrative pseudocode)
result = timegpt.predict_with_context(
    series=stock_prices,
    context="Please predict stock prices for the next week, considering market volatility.",
    horizon=7
)
TimeGPT Architecture
Input Time Series → Tokenization → Transformer Encoder → Decoder → Predictions
        ↓                ↓                  ↓               ↓
     Raw Data       Time Tokens         Attention        Forecast
Tokenization Strategy
- Value Binning: Converting continuous values to discrete tokens (sketched after this list)
- Temporal Encoding: Including time information in tokens
- Context Tokens: Expressing metadata as tokens
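A rough sketch of the value-binning step (the bin edges and vocabulary size here are arbitrary choices for illustration, not TimeGPT’s actual settings):
import numpy as np

values = np.array([0.2, 1.5, -0.7, 2.1])              # already-normalized series values
edges = np.linspace(-3, 3, 255)                        # 255 edges -> 256 possible token ids
tokens = np.digitize(values, edges)                    # continuous values -> discrete tokens
centers = np.concatenate(([-3.0], (edges[:-1] + edges[1:]) / 2, [3.0]))
approx = centers[tokens]                               # approximate inverse (de-tokenization)
print(tokens, approx)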
🦙 Lag-Llama: Open Source Alternative
Background of Lag-Llama
Lag-Llama is an open-source foundation model for probabilistic time series forecasting, developed by an academic-industry research team (including researchers from the Mila-Québec AI Institute and ServiceNow Research) and distributed via Hugging Face. It was created as an alternative to TimeGPT, which is a commercial model available only through an API.
1. Open Source Accessibility
- Fully Open Source: Model weights and code publicly available
- Local Execution: Run locally without API dependencies
- Customization: Modify models according to needs
2. LLaMA-based Architecture
Time Series Input → Lag Features → LLaMA-style Decoder → Predictions
        ↓                ↓                 ↓
     Raw Data        Lag Tokens        Transformer
3. Innovation of Lag Features
Lag-Llama converts lag values of time series into tokens:
# Lag feature creation example
import pandas as pd

def create_lag_features(series, max_lags=24):
    """Create lag features from a time series"""
    lags = []
    for lag in range(1, max_lags + 1):
        lags.append(series.shift(lag).rename(f"lag_{lag}"))
    return pd.concat(lags, axis=1)
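Applied to a daily series, for example (the variable names here are just for illustration):
import numpy as np
import pandas as pd

sales = pd.Series(np.random.randn(365).cumsum(),
                  index=pd.date_range("2024-01-01", periods=365, freq="D"))
lag_df = create_lag_features(sales, max_lags=24)
print(lag_df.shape)   # (365, 24); rows 0-23 contain NaN because there is not enough history yet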
Advantages of Lag-Llama
- Transparency: Public model structure and training process
- Cost Efficiency: No API costs
- Privacy: Data doesn’t leave local environment
- Scalability: Customizable for various domains
🌊 Moirai: Multi-series Forecasting
Core Concept of Moirai
Moirai, developed by Salesforce AI Research, is a universal forecasting model designed to handle many different time series, frequencies, and prediction tasks with a single model.
1. Multi-tasking Architecture
Time Series 1 ┐
Time Series 2 ├─→ Shared Encoder → Task-specific Heads → Predictions
Time Series 3 ┘
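As a toy sketch of the shared-encoder idea above (this is my own simplification, not Moirai’s actual architecture, which uses masked encoding and any-variate attention):
import torch
import torch.nn as nn

class SharedEncoderForecaster(nn.Module):
    """Toy multi-series model: one shared encoder, one output head per series."""
    def __init__(self, window, n_series, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(window, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_series)])

    def forward(self, x, series_id):
        # x: (batch, window) history of one series; series_id picks its head
        h = self.encoder(x)               # shared representation across series
        return self.heads[series_id](h)   # series-specific one-step forecast

toy = SharedEncoderForecaster(window=24, n_series=3)
print(toy(torch.randn(8, 24), series_id=1).shape)   # torch.Size([8, 1])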
2. Cross-series Learning
- Common Pattern Learning: Discovering shared patterns across time series
- Knowledge Transfer: Learning from one time series helps others
- Efficiency: Processing multiple time series with a single model
3. Hierarchical Forecasting
Country Level ┐
Region Level  ├─→ Hierarchical Prediction
City Level    ┘
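A simple bottom-up illustration of this hierarchy (real systems use more sophisticated reconciliation; the names and numbers below are made up):
import pandas as pd

city_fcst = pd.DataFrame({
    "region": ["North", "North", "South"],
    "city":   ["A", "B", "C"],
    "yhat":   [120.0, 80.0, 150.0],      # hypothetical city-level forecasts
})
region_fcst = city_fcst.groupby("region")["yhat"].sum()   # region = sum of its cities
country_fcst = region_fcst.sum()                          # country = sum of regions
print(region_fcst.to_dict(), country_fcst)                # {'North': 200.0, 'South': 150.0} 350.0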
⏰ Chronos: Amazon’s Time Series Model
Chronos’s Innovative Approach
Chronos is a token-based time series forecasting model developed by Amazon (AWS AI Labs).
1. Token-based Forecasting
Chronos converts time series values into discrete tokens and processes them the way a language model processes text:
# Chronos-style tokenization example
import numpy as np

def tokenize_timeseries(series, vocab_size=512):
    """Convert a time series into discrete tokens"""
    # Value normalization
    normalized = (series - series.mean()) / series.std()
    # Quantization into vocab_size bins
    quantized = np.digitize(normalized, np.linspace(-3, 3, vocab_size - 1))
    return quantized
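Running this on a toy series produces integer token ids that a language-model backbone can consume:
series = 100 + 5 * np.sin(np.linspace(0, 10, 200))   # toy series with values around 100
tokens = tokenize_timeseries(series, vocab_size=512)
print(tokens[:10])   # integer token ids in the range 0..511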
2. Multi-scale Learning
- Short-term Patterns: Daily, weekly variations
- Medium-term Patterns: Monthly, quarterly trends
- Long-term Patterns: Annual seasonality, structural changes
3. Instruction Tuning
In principle, token-based models like Chronos can be extended with instruction tuning so that prediction tasks are specified in natural language (this is a research direction rather than a feature of the released Chronos checkpoints):
# Example natural-language instructions
instructions = [
    "Predict sales for the next 7 days",
    "Provide monthly prediction considering seasonality",
    "Make stable prediction excluding outliers"
]
🛠️ Hands-on: Lag-Llama Implementation
1. Environment Setup
# Install required libraries
# pip install transformers torch datasets evaluate
# pip install pandas numpy matplotlib seaborn
# pip install scikit-learn
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')
# Visualization settings
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
2. About the Official Lag-Llama Checkpoint
# Note: the official Lag-Llama weights are published on the Hugging Face Hub as
# "time-series-foundation-models/Lag-Llama", but they are distributed as a plain
# PyTorch checkpoint intended to be loaded through the lag-llama package (a
# GluonTS-based estimator), not through transformers' AutoModel/AutoTokenizer.
# In this hands-on we build and train a simplified Lag-Llama-style model from
# scratch (step 5), so no pre-trained weights need to be downloaded here.
model_name = "time-series-foundation-models/Lag-Llama"
print(f"Reference checkpoint: {model_name}")
3. Prepare Time Series Data
def generate_complex_multivariate_timeseries(n=2000, n_series=5, seed=42):
    """Generate complex multivariate time series data"""
    np.random.seed(seed)
    # Time index
    t = np.arange(n)
    # Common trend
    common_trend = 0.02 * t + 0.0001 * t**2
    # Series-specific characteristics
    series_data = []
    for i in range(n_series):
        # Series-specific unique patterns
        series_trend = common_trend + 0.01 * i * t
        series_seasonal = 10 * np.sin(2 * np.pi * t / 365 + i * np.pi / 3)
        series_noise = np.random.normal(0, 1 + 0.1 * i, n)
        # Cross-series correlation
        if i > 0:
            correlation = 0.3 * np.sin(2 * np.pi * t / 100 + i)
            series_data.append(series_trend + series_seasonal + series_noise + correlation)
        else:
            series_data.append(series_trend + series_seasonal + series_noise)
    # Create DataFrame
    df = pd.DataFrame(np.array(series_data).T,
                      columns=[f'series_{i+1}' for i in range(n_series)],
                      index=pd.date_range('2020-01-01', periods=n, freq='D'))
    return df
# Generate multivariate time series
multivariate_ts = generate_complex_multivariate_timeseries(2000, 5)

# Visualization
plt.figure(figsize=(15, 10))
for i, col in enumerate(multivariate_ts.columns):
    plt.subplot(3, 2, i + 1)
    plt.plot(multivariate_ts.index, multivariate_ts[col], alpha=0.7)
    plt.title(f'{col} Time Series')
    plt.ylabel('Value')
    plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print(f"Number of series: {len(multivariate_ts.columns)}")
print(f"Data length: {len(multivariate_ts)}")
print(f"Value range: {multivariate_ts.min().min():.2f} ~ {multivariate_ts.max().max():.2f}")
4. Create Lag Features
def create_lag_features(data, max_lags=24, forecast_horizon=7):
    """Create lag features for the Lag-Llama-style model"""
    lag_features = {}
    targets = {}
    for col in data.columns:
        series = data[col].values
        # Create lag features (column j holds the value lagged by j+1 steps)
        lags = []
        for lag in range(1, max_lags + 1):
            lag_values = np.roll(series, lag)
            lag_values[:lag] = np.nan  # The first `lag` positions have no history
            lags.append(lag_values)
        lag_features[col] = np.column_stack(lags)
        # Create targets (values forecast_horizon steps ahead)
        targets[col] = series[forecast_horizon:]
    return lag_features, targets

# Create lag features
max_lags = 24
forecast_horizon = 7
lag_features, targets = create_lag_features(multivariate_ts, max_lags, forecast_horizon)

print(f"Lag features shape: {lag_features['series_1'].shape}")
print(f"Targets shape: {targets['series_1'].shape}")
5. Implement Lag-Llama Model
class LagLlamaPredictor(nn.Module):
    """Simplified Lag-Llama-style prediction model: each lag value is treated as one token"""
    def __init__(self, input_dim, hidden_dim=512, output_dim=1, num_layers=6):
        super(LagLlamaPredictor, self).__init__()
        self.input_dim = input_dim      # number of lags = sequence length
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        # Input embedding: project each scalar lag value to hidden_dim
        self.input_embedding = nn.Linear(1, hidden_dim)
        # Transformer layers
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim,
            nhead=8,
            dim_feedforward=hidden_dim * 4,
            dropout=0.1,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Output layer
        self.output_projection = nn.Linear(hidden_dim, output_dim)
        # Dropout
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        # x: (batch_size, input_dim) -> one token per lag value
        x = x.unsqueeze(-1)               # (batch_size, input_dim, 1)
        x = self.input_embedding(x)       # (batch_size, input_dim, hidden_dim)
        x = self.dropout(x)
        # Transformer encoder over the lag sequence
        x = self.transformer(x)
        # Use the representation of the final token in the lag sequence
        x = x[:, -1, :]                   # (batch_size, hidden_dim)
        # Output prediction
        output = self.output_projection(x)
        return output

# Create model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LagLlamaPredictor(
    input_dim=max_lags,
    hidden_dim=512,
    output_dim=1,
    num_layers=6
).to(device)

print(f"Lag-Llama model parameters: {sum(p.numel() for p in model.parameters())}")
6. Data Preprocessing and Training
def prepare_training_data(lag_features, targets, train_ratio=0.8):
    """Prepare training data"""
    all_X = []
    all_y = []
    for col in lag_features.keys():
        X = lag_features[col]
        y = targets[col]
        # Match lengths first (targets are forecast_horizon steps shorter than the lag matrix)
        min_len = min(len(X), len(y))
        X = X[:min_len]
        y = y[:min_len]
        # Remove rows whose lag window contains NaN
        valid_indices = ~np.isnan(X).any(axis=1)
        X = X[valid_indices]
        y = y[valid_indices]
        all_X.append(X)
        all_y.append(y)

    # Combine all time series
    X = np.vstack(all_X)
    y = np.hstack(all_y)

    # Train/validation split
    split_idx = int(len(X) * train_ratio)
    X_train, X_val = X[:split_idx], X[split_idx:]
    y_train, y_val = y[:split_idx], y[split_idx:]

    return X_train, X_val, y_train, y_val

# Prepare data
X_train, X_val, y_train, y_val = prepare_training_data(lag_features, targets)

# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train).to(device)
X_val = torch.FloatTensor(X_val).to(device)
y_train = torch.FloatTensor(y_train).unsqueeze(1).to(device)
y_val = torch.FloatTensor(y_val).unsqueeze(1).to(device)

print(f"Training data: {X_train.shape}, {y_train.shape}")
print(f"Validation data: {X_val.shape}, {y_val.shape}")
# Training function
def train_model(model, X_train, y_train, X_val, y_val, epochs=100, lr=0.001, batch_size=256):
    """Train the model with simple mini-batch gradient descent"""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=10, factor=0.5)

    train_losses = []
    val_losses = []

    for epoch in range(epochs):
        # Training (mini-batches keep memory usage manageable)
        model.train()
        permutation = torch.randperm(X_train.size(0))
        epoch_loss = 0.0
        for start in range(0, X_train.size(0), batch_size):
            idx = permutation[start:start + batch_size]
            optimizer.zero_grad()
            batch_pred = model(X_train[idx])
            batch_loss = criterion(batch_pred, y_train[idx])
            batch_loss.backward()
            optimizer.step()
            epoch_loss += batch_loss.item() * len(idx)
        train_loss = epoch_loss / X_train.size(0)

        # Validation
        model.eval()
        with torch.no_grad():
            val_pred = model(X_val)
            val_loss = criterion(val_pred, y_val).item()

        train_losses.append(train_loss)
        val_losses.append(val_loss)

        # Learning rate scheduling
        scheduler.step(val_loss)

        if (epoch + 1) % 20 == 0:
            print(f"Epoch [{epoch+1}/{epochs}] - Train Loss: {train_loss:.6f}, Val Loss: {val_loss:.6f}")

    return train_losses, val_losses
# Train model
print("=== Starting Lag-Llama Model Training ===")
train_losses, val_losses = train_model(model, X_train, y_train, X_val, y_val, epochs=100)
# Visualize training process
plt.figure(figsize=(12, 6))
plt.plot(train_losses, label='Training Loss', alpha=0.7)
plt.plot(val_losses, label='Validation Loss', alpha=0.7)
plt.title('Lag-Llama Model Training Process')
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
7. Prediction and Performance Evaluation
def evaluate_model(model, X_test, y_test, model_name="Lag-Llama"):
    """Evaluate model performance"""
    model.eval()
    with torch.no_grad():
        predictions = model(X_test)
    predictions = predictions.cpu().numpy().flatten()
    y_test = y_test.cpu().numpy().flatten()

    # Calculate performance metrics
    mse = mean_squared_error(y_test, predictions)
    mae = mean_absolute_error(y_test, predictions)
    rmse = np.sqrt(mse)

    print(f"\n{model_name} Performance:")
    print(f"  MSE: {mse:.4f}")
    print(f"  MAE: {mae:.4f}")
    print(f"  RMSE: {rmse:.4f}")

    return predictions, (mse, mae, rmse)
# Evaluate performance
predictions, metrics = evaluate_model(model, X_val, y_val)

# Visualize prediction results
y_val_np = y_val.cpu().numpy().flatten()   # move targets to CPU once for plotting
plt.figure(figsize=(15, 8))

# First 200 samples: prediction vs actual
sample_size = 200
plt.subplot(2, 1, 1)
plt.plot(y_val_np[:sample_size], label='Actual', alpha=0.7)
plt.plot(predictions[:sample_size], label='Lag-Llama Prediction', alpha=0.7)
plt.title('Lag-Llama: Prediction vs Actual (First 200 Samples)')
plt.xlabel('Sample')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)

# Scatter plot
plt.subplot(2, 1, 2)
plt.scatter(y_val_np, predictions, alpha=0.5)
plt.plot([y_val_np.min(), y_val_np.max()], [y_val_np.min(), y_val_np.max()], 'r--', lw=2)
plt.xlabel('Actual Value')
plt.ylabel('Predicted Value')
plt.title('Actual vs Predicted Scatter Plot')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
8. Multi-series Prediction
def predict_multiple_series(model, multivariate_ts, max_lags=24, forecast_horizon=7):
    """Predict the next value for each time series"""
    predictions = {}
    for col in multivariate_ts.columns:
        series = multivariate_ts[col].values
        # Build the lag window from the most recent data
        # (reversed so that lag 1 = most recent value, matching the training layout)
        recent_data = series[-max_lags:][::-1].copy()
        recent_data = recent_data.reshape(1, -1)

        # Prediction
        model.eval()
        with torch.no_grad():
            X = torch.FloatTensor(recent_data).to(device)
            pred = model(X)
            predictions[col] = pred.cpu().numpy().flatten()[0]
    return predictions

# Multi-series prediction
forecast_predictions = predict_multiple_series(model, multivariate_ts)

print("=== Multi-series Prediction Results ===")
for col, pred in forecast_predictions.items():
    print(f"{col}: {pred:.4f}")

# Visualize prediction results
plt.figure(figsize=(15, 10))
for i, (col, pred) in enumerate(forecast_predictions.items()):
    plt.subplot(3, 2, i + 1)
    # Most recent 100 days of data
    recent_data = multivariate_ts[col].tail(100)
    plt.plot(recent_data.index, recent_data.values, label='Past Data', alpha=0.7)
    # Prediction (shown as a horizontal reference line)
    plt.axhline(y=pred, color='red', linestyle='--',
                label=f'Prediction: {pred:.2f}')
    plt.title(f'{col} - Next Prediction: {pred:.2f}')
    plt.ylabel('Value')
    plt.legend()
    plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
📊 Model Comparison and Selection Guide
Performance Comparison
Model | Advantages | Disadvantages | Use Cases |
---|---|---|---|
TimeGPT | Best performance, natural language interface | API dependency, cost | Commercial services, prototyping |
Lag-Llama | Open source, local execution | Performance limitations, complex setup | Research, personal projects |
Moirai | Multi-series, efficiency | Complex architecture | Large-scale multivariate prediction |
Chronos | Token-based, flexibility | Discretization can lose precision | Experimental research |
Selection Guide
1. By Project Scale
- Small Scale: Lag-Llama (open source, free)
- Medium Scale: TimeGPT (API, fast implementation)
- Large Scale: Moirai (multi-series, efficiency)
2. By Requirements
- Accuracy Priority: TimeGPT
- Cost Efficiency: Lag-Llama
- Multi-series: Moirai
- Experimental Research: Chronos
3. Technical Considerations
- Data Privacy: Lag-Llama (local execution)
- Rapid Prototyping: TimeGPT (API)
- Customization: Lag-Llama (open source)
- Scalability: Moirai (multi-tasking)
🚀 Next Steps and Future Outlook
Technology Development Directions
- Larger Models: Time series models with trillions of parameters
- Multimodal: Integration of text, images, and time series
- Real-time Learning: Continuous learning from streaming data
- Explainability: Natural language explanations of predictions
Practical Application Considerations
- Data Quality: LLMs also need high-quality data
- Cost Management: Monitor API usage
- Model Updates: Regular retraining needed
- Performance Monitoring: Continuous performance tracking
Future Research Directions
- Few-shot Learning: Learning with even less data
- Transfer Learning: Knowledge transfer across domains
- Causal Learning: Causality-based prediction
- Uncertainty Quantification: Quantifying uncertainty
📚 Learning Summary
What We Learned in This Part
- The Rise of LLM-based Time Series Forecasting
- Limitations of previous models
- LLM’s revolutionary approach
- Core technical innovations
- Key Models
- TimeGPT: Nixtla’s commercial model
- Lag-Llama: Open source alternative
- Moirai: Multi-series forecasting
- Chronos: Amazon’s token-based model
- Practical Implementation
- Lag-Llama model implementation
- Multi-series prediction
- Performance evaluation and comparison
Key Concepts Summary
Concept | Description | Importance |
---|---|---|
Tokenization | Converting time series to tokens | ⭐⭐⭐⭐⭐ |
Few-shot Learning | Learning with minimal data | ⭐⭐⭐⭐⭐ |
Multi-task Learning | Performing multiple tasks simultaneously | ⭐⭐⭐⭐ |
Zero-shot Prediction | Prediction without additional training | ⭐⭐⭐⭐⭐ |
Practical Application Considerations
- Model Selection: Choose appropriate model based on requirements
- Cost Management: Balance API costs and performance
- Data Preparation: Importance of high-quality data
- Performance Monitoring: Continuous model performance tracking
🔗 Series Navigation
← Previous: Part 3: Transformer-Based Time Series Forecasting Models
Next →: Series Complete! 🎉
Series Complete: We have systematically learned the evolution of time series forecasting from ARIMA to the latest LLM models. Now you can build powerful time series forecasting systems in practice using various tools and techniques! 🚀
Through this series, we experienced the past, present, and future of time series forecasting. Understanding the characteristics and pros/cons of each model, we developed practical capabilities to apply them to real data! 🎯