Part 4: Latest Generative AI Models - TimeGPT, Lag-Llama, Moirai, Chronos

Explore the latest foundation models for time series forecasting, inspired by large language models, and put one of them into practice.

📖 Table of Contents

  1. The Rise of LLM-based Time Series Forecasting
  2. TimeGPT: Nixtla’s Time Series Model
  3. Lag-Llama: Open Source Alternative
  4. Moirai: Multi-series Forecasting
  5. Chronos: Amazon’s Time Series Model
  6. Hands-on: Lag-Llama Implementation
  7. Model Comparison and Selection Guide
  8. Next Steps and Future Outlook
  9. Learning Summary

🚀 The Rise of LLM-based Time Series Forecasting

Limitations of Previous Models

The models we covered in Parts 1-3 each have their strengths, but they share some common limitations:

  1. Domain Specialization: Optimized for specific time series patterns
  2. Data Dependency: Requires large amounts of domain-specific training data
  3. Generalization Limits: Difficulty adapting to new time series types
  4. Limited Multi-tasking: Hard to perform various prediction tasks simultaneously

LLM’s Revolutionary Approach

Large Language Models (LLMs) presented a new paradigm to overcome these limitations:

  • Universality: Learning from diverse domain time series data
  • Few-shot Learning: Excellent performance with minimal data
  • Multi-tasking: Performing multiple prediction tasks simultaneously
  • Natural Language Interface: Intuitive querying and explainability

Core Technical Innovations

  1. Tokenization: Converting time series data into tokens
  2. Causal Attention: Modeling temporal dependencies in time series (see the mask sketch after this list)
  3. Multi-scale Learning: Learning patterns at various time scales
  4. Instruction Following: Prediction based on natural language instructions
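
To make item 2 concrete, here is a minimal sketch, assuming PyTorch: a causal (lower-triangular) attention mask ensures that each time step can attend only to past observations, which is how attention-based forecasters respect temporal order.

import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask where True marks positions that must NOT be attended to (the future)."""
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Example: a 5-step sequence. Pass this as `attn_mask` to nn.MultiheadAttention
# (or as `mask` to nn.TransformerEncoder) so that step t only sees steps <= t.
print(causal_mask(5))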

🤖 TimeGPT: Nixtla’s Time Series Model

Core Features of TimeGPT

TimeGPT, developed by Nixtla, is the first large-scale foundation model built specifically for time series forecasting.

1. Large-scale Pre-training

  • Training Data: 100+ billion time series data points
  • Domains: Finance, retail, manufacturing, energy, healthcare, and more
  • Geographic Scope: Worldwide
  • Time Range: 2020-2024

2. Zero-shot Prediction Capability

Based on its pre-trained knowledge, TimeGPT can forecast new time series without any additional training:

# Example: immediate prediction for a previously unseen time series
# (illustrative pseudocode; see the SDK sketch below for a concrete call)
forecast = timegpt.predict(
    series=unseen_timeseries,
    horizon=30,  # predict 30 days ahead
    freq="D"     # daily data
)
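
For a concrete call, here is a minimal sketch assuming Nixtla’s Python SDK (the nixtla package); the API key is a placeholder and argument names may differ slightly between SDK versions:

# pip install nixtla
import numpy as np
import pandas as pd
from nixtla import NixtlaClient

client = NixtlaClient(api_key="YOUR_API_KEY")  # placeholder key

# The SDK expects a long-format DataFrame with a timestamp column and a target column
df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=100, freq="D"),
    "y": np.random.randn(100).cumsum(),
})

forecast_df = client.forecast(df=df, h=30, freq="D", time_col="ds", target_col="y")
print(forecast_df.head())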

3. Natural Language Interface

# Conceptual example of conditioning a forecast on a natural-language description
# (illustrative pseudocode, not an actual SDK call)
result = timegpt.predict_with_context(
    series=stock_prices,
    context="Please predict stock prices for the next week, considering market volatility.",
    horizon=7
)

TimeGPT Architecture

Input Time Series → Tokenization → Transformer Encoder → Decoder → Predictions
     ↓                ↓                ↓              ↓
  Raw Data      Time Tokens    Attention      Forecast

Tokenization Strategy

  1. Value Binning: Converting continuous values to discrete tokens
  2. Temporal Encoding: Including time information in tokens
  3. Context Tokens: Expressing metadata as tokens
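
As a small illustration of points 2 and 3, calendar information and series-level metadata can be extracted as extra features that are then mapped to tokens alongside the binned values (a generic pandas sketch, not TimeGPT’s internal scheme):

import pandas as pd

def temporal_context_features(index: pd.DatetimeIndex, series_id: str) -> pd.DataFrame:
    """Derive calendar features and attach series metadata (illustrative only)."""
    return pd.DataFrame({
        "day_of_week": index.dayofweek,              # 0 = Monday
        "month": index.month,
        "is_month_end": index.is_month_end.astype(int),
        "series_id": series_id,                      # metadata a context token could encode
    }, index=index)

idx = pd.date_range("2024-01-01", periods=5, freq="D")
print(temporal_context_features(idx, "store_42_sales"))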

🦙 Lag-Llama: Open Source Alternative

Background of Lag-Llama

Lag-Llama was developed by a group of academic and industry researchers (including teams from Mila-Québec and ServiceNow Research) as a fully open-source alternative, addressing the limitation that TimeGPT is a commercial model available only through an API. Its pretrained weights are distributed via the Hugging Face Hub.

1. Open Source Accessibility

  • Fully Open Source: Model weights and code publicly available
  • Local Execution: Run locally without API dependencies
  • Customization: Modify models according to needs

2. LLaMA-based Architecture

Time Series Input → Lag Features → LLaMA-style Decoder → Predictions
       ↓                ↓                 ↓
    Raw Data        Lag Tokens       Transformer

3. Innovation of Lag Features

Lag-Llama builds its model inputs from lagged values of the time series:

# Lag feature creation example
def create_lag_features(series, max_lags=24):
    """Create lag features from a pandas Series"""
    lags = [series.shift(lag).rename(f"lag_{lag}") for lag in range(1, max_lags + 1)]
    return pd.concat(lags, axis=1)

Advantages of Lag-Llama

  1. Transparency: Public model structure and training process
  2. Cost Efficiency: No API costs
  3. Privacy: Data doesn’t leave local environment
  4. Scalability: Customizable for various domains

🌊 Moirai: Multi-series Forecasting

Core Concept of Moirai

Moirai, developed by Salesforce AI Research, is a universal forecasting model: a single pretrained transformer designed to handle time series with different frequencies, different numbers of variables, and different domains.

1. Multi-tasking Architecture

Time Series 1 ┐
Time Series 2 ├─→ Shared Encoder → Task-specific Heads → Predictions
Time Series 3 ┘

2. Cross-series Learning

  • Common Pattern Learning: Discovering shared patterns across time series
  • Knowledge Transfer: Learning from one time series helps others
  • Efficiency: Processing multiple time series with a single model
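
A minimal PyTorch sketch of the shared-encoder idea from the diagram above (a conceptual illustration, not Moirai’s actual architecture): several series share one encoder, while each series gets its own prediction head.

import torch
import torch.nn as nn

class SharedEncoderForecaster(nn.Module):
    """One shared encoder, one small head per series (illustrative only)."""
    def __init__(self, context_len: int, n_series: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(context_len, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_series)])

    def forward(self, x):                      # x: (batch, n_series, context_len)
        out = []
        for i, head in enumerate(self.heads):
            shared = self.encoder(x[:, i, :])  # shared weights across all series
            out.append(head(shared))           # series-specific head
        return torch.cat(out, dim=-1)          # (batch, n_series)

model = SharedEncoderForecaster(context_len=24, n_series=3)
print(model(torch.randn(8, 3, 24)).shape)      # torch.Size([8, 3])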

3. Hierarchical Forecasting

Country Level ┐
Region Level  ├─→ Hierarchical Prediction
City Level    ┘
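
One common way to make hierarchical forecasts coherent is bottom-up aggregation: forecast at the lowest level and sum upward. A tiny sketch of the general technique (not specific to Moirai):

import numpy as np

# Hypothetical city-level forecasts (rows: horizon steps, columns: cities)
city_forecasts = np.array([[10.0, 12.0, 8.0],
                           [11.0, 13.0, 9.0]])

# Hierarchy: which city columns belong to which region
regions = {"region_A": [0, 1], "region_B": [2]}

region_forecasts = {r: city_forecasts[:, cols].sum(axis=1) for r, cols in regions.items()}
country_forecast = city_forecasts.sum(axis=1)

print(region_forecasts)   # coherent with the city level by construction
print(country_forecast)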

⏰ Chronos: Amazon’s Time Series Model

Chronos’s Innovative Approach

Chronos is a token-based time series forecasting model developed by Amazon (AWS AI Labs).

1. Token-based Forecasting

Converting time series to tokens and processing like language models:

# Simplified Chronos-style tokenization example
# (Chronos itself uses mean scaling and a fixed quantization grid;
#  this sketch uses z-score scaling to illustrate the idea)
def tokenize_timeseries(series, vocab_size=512):
    """Convert a time series into discrete value tokens"""
    # Scale the values
    normalized = (series - series.mean()) / series.std()
    
    # Quantize into vocab_size bins spanning roughly +/- 3 standard deviations
    quantized = np.digitize(normalized, 
                            np.linspace(-3, 3, vocab_size - 1))
    
    return quantized

2. Multi-scale Learning

  • Short-term Patterns: Daily, weekly variations
  • Medium-term Patterns: Monthly, quarterly trends
  • Long-term Patterns: Annual seasonality, structural changes
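
To inspect patterns at several time scales in practice, the same series can be resampled to coarser frequencies; this is a generic pandas sketch of the idea, not a description of Chronos internals:

import numpy as np
import pandas as pd

daily = pd.Series(np.random.randn(730).cumsum(),
                  index=pd.date_range("2023-01-01", periods=730, freq="D"))

weekly = daily.resample("W").mean()       # short-term view
monthly = daily.resample("30D").mean()    # medium-term view
quarterly = daily.resample("90D").mean()  # long-term view

print(len(daily), len(weekly), len(monthly), len(quarterly))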

3. Probabilistic Forecasting

Because Chronos is trained with a language-modeling objective over value tokens, it outputs a distribution over the next token rather than a single number. Sampling several token trajectories and mapping them back to values yields probabilistic forecasts (for example, a median path plus prediction intervals). Note that, unlike an instruction-tuned LLM, Chronos does not take natural language prompts; it operates purely on the tokenized series.
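
A minimal usage sketch, assuming the open-source chronos-forecasting package and its documented ChronosPipeline interface (model name and argument names may vary by version):

# pip install chronos-forecasting
import numpy as np
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small", device_map="cpu")

context = torch.tensor(np.sin(np.arange(200) / 10.0), dtype=torch.float32)
samples = pipeline.predict(context, prediction_length=24)   # (1, num_samples, 24)

low, median, high = np.quantile(samples[0].numpy(), [0.1, 0.5, 0.9], axis=0)
print(median[:5])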

🛠️ Hands-on: Lag-Llama Implementation

1. Environment Setup

# Install required libraries
# pip install transformers torch datasets evaluate
# pip install pandas numpy matplotlib seaborn
# pip install scikit-learn

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Visualization settings
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

2. Download the Pretrained Lag-Llama Checkpoint (Optional)

# The official Lag-Llama checkpoint is not a standard transformers model, so it
# cannot be loaded with AutoModel/AutoTokenizer. It is published as a PyTorch
# checkpoint on the Hugging Face Hub and is normally used through the lag-llama
# repository's GluonTS-based estimator. The filename below is the one used in the
# official demos; treat it as an assumption if the repository layout changes.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="time-series-foundation-models/Lag-Llama",
                            filename="lag-llama.ckpt")
print(f"Checkpoint downloaded to: {ckpt_path}")

# In the rest of this hands-on we build and train a simplified Lag-Llama-style
# predictor from scratch, so downloading the pretrained checkpoint is optional.

3. Prepare Time Series Data

def generate_complex_multivariate_timeseries(n=2000, n_series=5, seed=42):
    """Generate complex multivariate time series data"""
    np.random.seed(seed)
    
    # Time index
    t = np.arange(n)
    
    # Common trend
    common_trend = 0.02 * t + 0.0001 * t**2
    
    # Series-specific characteristics
    series_data = []
    for i in range(n_series):
        # Series-specific unique patterns
        series_trend = common_trend + 0.01 * i * t
        series_seasonal = 10 * np.sin(2 * np.pi * t / 365 + i * np.pi/3)
        series_noise = np.random.normal(0, 1 + 0.1 * i, n)
        
        # Cross-series correlation
        if i > 0:
            correlation = 0.3 * np.sin(2 * np.pi * t / 100 + i)
            series_data.append(series_trend + series_seasonal + series_noise + correlation)
        else:
            series_data.append(series_trend + series_seasonal + series_noise)
    
    # Create DataFrame
    df = pd.DataFrame(np.array(series_data).T, 
                     columns=[f'series_{i+1}' for i in range(n_series)],
                     index=pd.date_range('2020-01-01', periods=n, freq='D'))
    
    return df

# Generate multivariate time series
multivariate_ts = generate_complex_multivariate_timeseries(2000, 5)

# Visualization
plt.figure(figsize=(15, 10))
for i, col in enumerate(multivariate_ts.columns):
    plt.subplot(3, 2, i+1)
    plt.plot(multivariate_ts.index, multivariate_ts[col], alpha=0.7)
    plt.title(f'{col} Time Series')
    plt.ylabel('Value')
    plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

print(f"Number of series: {len(multivariate_ts.columns)}")
print(f"Data length: {len(multivariate_ts)}")
print(f"Value range: {multivariate_ts.min().min():.2f} ~ {multivariate_ts.max().max():.2f}")

4. Create Lag Features

def create_lag_features(data, max_lags=24, forecast_horizon=7):
    """Create lag features and horizon-aligned targets for Lag-Llama-style training"""
    lag_features = {}
    targets = {}
    
    for col in data.columns:
        series = data[col].values
        n = len(series)
        
        X_rows, y_rows = [], []
        # Row for time t: the last max_lags observations (oldest -> newest) as features,
        # and the value forecast_horizon steps ahead as the target
        for t in range(max_lags, n - forecast_horizon + 1):
            X_rows.append(series[t - max_lags:t])
            y_rows.append(series[t + forecast_horizon - 1])
        
        lag_features[col] = np.array(X_rows)
        targets[col] = np.array(y_rows)
    
    return lag_features, targets

# Create lag features
max_lags = 24
forecast_horizon = 7
lag_features, targets = create_lag_features(multivariate_ts, max_lags, forecast_horizon)

print(f"Lag features shape: {lag_features['series_1'].shape}")
print(f"Targets shape: {targets['series_1'].shape}")

5. Implement Lag-Llama Model

class LagLlamaPredictor(nn.Module):
    """Simplified Lag-Llama-style predictor: each lag value becomes one token in a Transformer sequence"""
    
    def __init__(self, input_dim, hidden_dim=512, output_dim=1, num_layers=6):
        super(LagLlamaPredictor, self).__init__()
        
        self.input_dim = input_dim    # number of lag tokens (sequence length)
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        
        # Each scalar lag value is embedded as one token
        self.input_embedding = nn.Linear(1, hidden_dim)
        
        # Learned positional embedding so the model can tell lag positions apart
        self.positional_embedding = nn.Parameter(torch.randn(1, input_dim, hidden_dim) * 0.02)
        
        # Transformer layers
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim,
            nhead=8,
            dim_feedforward=hidden_dim * 4,
            dropout=0.1,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        
        # Output layer
        self.output_projection = nn.Linear(hidden_dim, output_dim)
        
        # Dropout
        self.dropout = nn.Dropout(0.1)
        
    def forward(self, x):
        # x: (batch_size, num_lags) -> one token per lag value
        x = x.unsqueeze(-1)                      # (batch_size, num_lags, 1)
        x = self.input_embedding(x)              # (batch_size, num_lags, hidden_dim)
        x = x + self.positional_embedding
        x = self.dropout(x)
        
        # Transformer encoder over the lag sequence
        x = self.transformer(x)
        
        # Use the representation of the most recent lag (last position)
        x = x[:, -1, :]                          # (batch_size, hidden_dim)
        
        # Output prediction
        output = self.output_projection(x)
        
        return output

# Create model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LagLlamaPredictor(
    input_dim=max_lags,
    hidden_dim=512,
    output_dim=1,
    num_layers=6
).to(device)

print(f"Lag-Llama model parameters: {sum(p.numel() for p in model.parameters())}")

6. Data Preprocessing and Training

def prepare_training_data(lag_features, targets, train_ratio=0.8):
    """Prepare training data, splitting each series in time before stacking"""
    X_train_list, X_val_list, y_train_list, y_val_list = [], [], [], []
    
    for col in lag_features.keys():
        X = lag_features[col]
        y = targets[col]
        
        # Defensive check: drop any rows with missing lag values
        valid = ~np.isnan(X).any(axis=1)
        X, y = X[valid], y[valid]
        
        # Temporal split within each series (earlier part for training)
        split_idx = int(len(X) * train_ratio)
        X_train_list.append(X[:split_idx])
        X_val_list.append(X[split_idx:])
        y_train_list.append(y[:split_idx])
        y_val_list.append(y[split_idx:])
    
    # Combine all time series
    X_train = np.vstack(X_train_list)
    X_val = np.vstack(X_val_list)
    y_train = np.hstack(y_train_list)
    y_val = np.hstack(y_val_list)
    
    return X_train, X_val, y_train, y_val

# Prepare data
X_train, X_val, y_train, y_val = prepare_training_data(lag_features, targets)

# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train).to(device)
X_val = torch.FloatTensor(X_val).to(device)
y_train = torch.FloatTensor(y_train).unsqueeze(1).to(device)
y_val = torch.FloatTensor(y_val).unsqueeze(1).to(device)

print(f"Training data: {X_train.shape}, {y_train.shape}")
print(f"Validation data: {X_val.shape}, {y_val.shape}")

# Training function
def train_model(model, X_train, y_train, X_val, y_val, epochs=100, lr=0.001, batch_size=256):
    """Train the model with simple mini-batch gradient descent"""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=10, factor=0.5)
    
    train_losses = []
    val_losses = []
    
    for epoch in range(epochs):
        # Training (mini-batches keep memory usage manageable)
        model.train()
        permutation = torch.randperm(X_train.size(0), device=X_train.device)
        epoch_loss = 0.0
        
        for start in range(0, X_train.size(0), batch_size):
            idx = permutation[start:start + batch_size]
            optimizer.zero_grad()
            batch_pred = model(X_train[idx])
            batch_loss = criterion(batch_pred, y_train[idx])
            batch_loss.backward()
            optimizer.step()
            epoch_loss += batch_loss.item() * len(idx)
        
        train_loss = epoch_loss / X_train.size(0)
        
        # Validation
        model.eval()
        with torch.no_grad():
            val_pred = model(X_val)
            val_loss = criterion(val_pred, y_val)
        
        train_losses.append(train_loss)
        val_losses.append(val_loss.item())
        
        # Learning rate scheduling
        scheduler.step(val_loss)
        
        if (epoch + 1) % 20 == 0:
            print(f"Epoch [{epoch+1}/{epochs}] - Train Loss: {train_loss:.6f}, Val Loss: {val_loss.item():.6f}")
    
    return train_losses, val_losses

# Train model
print("=== Starting Lag-Llama Model Training ===")
train_losses, val_losses = train_model(model, X_train, y_train, X_val, y_val, epochs=100)

# Visualize training process
plt.figure(figsize=(12, 6))
plt.plot(train_losses, label='Training Loss', alpha=0.7)
plt.plot(val_losses, label='Validation Loss', alpha=0.7)
plt.title('Lag-Llama Model Training Process')
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

7. Prediction and Performance Evaluation

def evaluate_model(model, X_test, y_test, model_name="Lag-Llama"):
    """Evaluate model performance"""
    model.eval()
    with torch.no_grad():
        predictions = model(X_test)
        predictions = predictions.cpu().numpy().flatten()
        y_test = y_test.cpu().numpy().flatten()
    
    # Calculate performance metrics
    mse = mean_squared_error(y_test, predictions)
    mae = mean_absolute_error(y_test, predictions)
    rmse = np.sqrt(mse)
    
    print(f"\n{model_name} Performance:")
    print(f"  MSE: {mse:.4f}")
    print(f"  MAE: {mae:.4f}")
    print(f"  RMSE: {rmse:.4f}")
    
    return predictions, (mse, mae, rmse)

# Evaluate performance
predictions, metrics = evaluate_model(model, X_val, y_val)

# Visualize prediction results
plt.figure(figsize=(15, 8))

# First 200 samples: prediction vs actual
sample_size = 200
plt.subplot(2, 1, 1)
plt.plot(y_val[:sample_size].cpu().numpy(), label='Actual', alpha=0.7)
plt.plot(predictions[:sample_size], label='Lag-Llama Prediction', alpha=0.7)
plt.title('Lag-Llama: Prediction vs Actual (First 200 Samples)')
plt.xlabel('Sample')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)

# Scatter plot
plt.subplot(2, 1, 2)
y_val_np = y_val.cpu().numpy().flatten()
plt.scatter(y_val_np, predictions, alpha=0.5)
plt.plot([y_val_np.min(), y_val_np.max()], [y_val_np.min(), y_val_np.max()], 'r--', lw=2)
plt.xlabel('Actual Value')
plt.ylabel('Predicted Value')
plt.title('Actual vs Predicted Scatter Plot')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

8. Multi-series Prediction

def predict_multiple_series(model, multivariate_ts, max_lags=24, forecast_horizon=7):
    """Predict one value (forecast_horizon steps ahead) for each series"""
    predictions = {}
    
    for col in multivariate_ts.columns:
        series = multivariate_ts[col].values
        
        # Use the most recent max_lags observations (oldest -> newest, matching training)
        recent_data = series[-max_lags:]
        recent_data = recent_data.reshape(1, -1)
        
        # Prediction
        model.eval()
        with torch.no_grad():
            X = torch.FloatTensor(recent_data).to(device)
            pred = model(X)
            predictions[col] = pred.cpu().numpy().flatten()[0]
    
    return predictions

# Multi-series prediction
forecast_predictions = predict_multiple_series(model, multivariate_ts)

print("=== Multi-series Prediction Results ===")
for col, pred in forecast_predictions.items():
    print(f"{col}: {pred:.4f}")

# Visualize prediction results
plt.figure(figsize=(15, 10))
for i, (col, pred) in enumerate(forecast_predictions.items()):
    plt.subplot(3, 2, i+1)
    
    # Recent 100 days data
    recent_data = multivariate_ts[col].tail(100)
    plt.plot(recent_data.index, recent_data.values, label='Past Data', alpha=0.7)
    
    # Prediction (a single value, forecast_horizon steps ahead of the last observation)
    plt.axhline(y=pred, color='red', linestyle='--', 
                label=f'Prediction: {pred:.2f}')
    
    plt.title(f'{col} - Next Prediction: {pred:.2f}')
    plt.ylabel('Value')
    plt.legend()
    plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

📊 Model Comparison and Selection Guide

Performance Comparison

| Model | Advantages | Disadvantages | Use Cases |
|---|---|---|---|
| TimeGPT | High zero-shot accuracy, simple API | API dependency, cost | Commercial services, prototyping |
| Lag-Llama | Open source, local execution | Lower accuracy, more setup | Research, personal projects |
| Moirai | Multi-series support, efficiency | Complex architecture | Large-scale multivariate prediction |
| Chronos | Token-based, flexible | Dependent on Amazon's releases | Experimental research |

Selection Guide

1. By Project Scale

  • Small Scale: Lag-Llama (open source, free)
  • Medium Scale: TimeGPT (API, fast implementation)
  • Large Scale: Moirai (multi-series, efficiency)

2. By Requirements

  • Accuracy Priority: TimeGPT
  • Cost Efficiency: Lag-Llama
  • Multi-series: Moirai
  • Experimental Research: Chronos

3. Technical Considerations

  • Data Privacy: Lag-Llama (local execution)
  • Rapid Prototyping: TimeGPT (API)
  • Customization: Lag-Llama (open source)
  • Scalability: Moirai (multi-tasking)

🚀 Next Steps and Future Outlook

Technology Development Directions

  1. Larger Models: Time series models with trillions of parameters
  2. Multimodal: Integration of text, images, and time series
  3. Real-time Learning: Continuous learning from streaming data
  4. Explainability: Natural language explanations of predictions

Practical Application Considerations

  1. Data Quality: LLMs also need high-quality data
  2. Cost Management: Monitor API usage
  3. Model Updates: Regular retraining needed
  4. Performance Monitoring: Continuous performance tracking

Future Research Directions

  1. Few-shot Learning: Learning with even less data
  2. Transfer Learning: Knowledge transfer across domains
  3. Causal Learning: Causality-based prediction
  4. Uncertainty Quantification: Quantifying uncertainty

📚 Learning Summary

What We Learned in This Part

  1. The Rise of LLM-based Time Series Forecasting
    • Limitations of previous models
    • LLM’s revolutionary approach
    • Core technical innovations
  2. Key Models
    • TimeGPT: Nixtla’s commercial model
    • Lag-Llama: Open source alternative
    • Moirai: Multi-series forecasting
    • Chronos: Amazon’s token-based model
  3. Practical Implementation
    • Lag-Llama model implementation
    • Multi-series prediction
    • Performance evaluation and comparison

Key Concepts Summary

| Concept | Description | Importance |
|---|---|---|
| Tokenization | Converting time series into discrete tokens | ⭐⭐⭐⭐⭐ |
| Few-shot Learning | Learning from minimal data | ⭐⭐⭐⭐⭐ |
| Multi-task Learning | Performing multiple tasks simultaneously | ⭐⭐⭐⭐ |
| Zero-shot Prediction | Predicting without additional training | ⭐⭐⭐⭐⭐ |

Practical Application Considerations

  1. Model Selection: Choose appropriate model based on requirements
  2. Cost Management: Balance API costs and performance
  3. Data Preparation: Importance of high-quality data
  4. Performance Monitoring: Continuous model performance tracking

🔗 Series Navigation

← Previous: Part 3: Transformer-Based Time Series Forecasting Models

Next →: Series Complete! 🎉


Series Complete: We have systematically learned the evolution of time series forecasting from ARIMA to the latest LLM models. Now you can build powerful time series forecasting systems in practice using various tools and techniques! 🚀

Through this series, we experienced the past, present, and future of time series forecasting. Understanding the characteristics and pros/cons of each model, we developed practical capabilities to apply them to real data! 🎯