📚 Blog Archive
Browse all technical posts by category or date, or find them with search
What is a Data Lakehouse?
How the lakehouse combines the advantages of data lakes and data warehouses
Limitations of Hive Metastore and the Emergence of Lakehouse
Learn about the structural limitations of the Hadoop Hive Metastore and the Lakehouse architecture that emerged as a result.
Lakehouse Table Formats: Delta Lake, Apache Iceberg, Apache Hudi
Detailed analysis and comparison of the table formats at the core of the modern data lakehouse
Kubernetes Local Setup Guide for macOS - Using Docker Desktop and Minikube
Step-by-step guide to installing and configuring a local Kubernetes cluster on macOS using Docker Desktop and Minikube.
What is Kubernetes? - The Core of Container Orchestration
Learn about Kubernetes' background, core concepts, key features, and its role in modern cloud-native applications.
Part 1: Fundamentals of Time Series Forecasting - From ARIMA to Prophet
Learn the basic concepts of time series data and traditional statistical methods, through to the emergence of Prophet, and implement them in code.
Evolution of Time Series Forecasting: From Traditional Methods to Latest AI Models
From ARIMA to TimeGPT, a guide to the evolution of time series forecasting technology and its latest trends.
Part 2: Deep Learning-based Time Series Forecasting - N-BEATS and DeepAR
Explore advanced deep learning models for time series forecasting, including N-BEATS and DeepAR, with hands-on implementation using PyTorch.
Part 3: Transformer-Based Time Series Forecasting Models
Explore state-of-the-art transformer-based time series forecasting models including Informer, Autoformer, FEDformer, and PatchTST with hands-on practice.
Part 4: Latest Generative AI Models - TimeGPT, Lag-Llama, Moirai, Chronos
Explore innovative time series forecasting models using large language models and implement them in practice.
Apache Airflow Advanced Guide: From DAG Optimization to Monitoring
Learn advanced features and best practices of Apache Airflow commonly used in production environments and apply them to real projects.
Apache Kafka Python Guide: Real-time Streaming and Data Processing
Learn real-time streaming development and data processing techniques using Apache Kafka with Python and apply them to real projects.
Apache Kafka Real-time Streaming Guide: From Producer to Consumer
Learn core concepts and practical applications of Apache Kafka for processing large-scale real-time data and apply them to real projects.
Complete Apache Spark Mastery Series: Everything About Big Data Processing
From Apache Spark's origins to advanced performance tuning - a complete guide series for big data processing.
Part 1: Apache Spark Basics and Core Concepts - From RDD to DataFrame
Learn Apache Spark's basic structure and core concepts including RDD, DataFrame, and Spark SQL through hands-on practice.
Part 2: Apache Spark Large-scale Batch Processing and UDF Usage - Real-world Project
Advanced batch processing techniques in Apache Spark, UDF writing, and production environment setup using Docker and Kubernetes.
Part 3: Apache Spark Real-time Streaming Processing and Kafka Integration - Real-world Project
Build real-time data processing and analysis systems using Apache Spark Streaming, Structured Streaming, and Kafka integration.
Part 4: Apache Spark Monitoring and Performance Tuning - Production Environment Completion
Finish the production environment setup with Apache Spark performance monitoring, profiling, memory optimization, and cluster tuning.
Complete Apache Flink Mastery Series: Everything About True Streaming Processing
From Apache Flink's core concepts to production deployment - a complete guide series for true real-time streaming processing.
Part 1: Apache Flink Basics and Core Concepts - The Beginning of True Streaming Processing
Learn Apache Flink's basic structure and core concepts including DataStream API, state management, and time processing through hands-on practice.
Part 2: Apache Flink Advanced Streaming Processing and State Management - Production-grade Real-time Systems
Learn advanced state management, checkpointing, savepoints, and complex time processing strategies in Apache Flink, and implement advanced patterns that can be applied directly to real-world scenarios.
Part 4: Apache Flink Production Deployment and Performance Optimization - Enterprise Operations Mastery
Complete guide to deploying Apache Flink on Kubernetes in production environments, optimizing performance, and implementing monitoring and disaster recovery strategies.
Part 1: Change Data Capture and Debezium Practical Implementation - Complete Real-time Data Synchronization
From CDC core concepts to building real-time data synchronization systems with Debezium, a complete guide to event-driven architecture.
Part 2: Kafka Connect and Production CDC Operations - Enterprise Real-time Data Pipeline
Advanced Kafka Connect architecture, custom connector development, large-scale CDC pipeline operation strategies, performance optimization and disaster recovery.
Part 1: Apache Iceberg Fundamentals and Table Format - The Beginning of Modern Data Lakehouse
Learn the fundamentals of the modern data lakehouse, from Apache Iceberg's core concepts to its table format, schema evolution, and partitioning strategies.
Part 2: Apache Iceberg Advanced Features and Performance Optimization - Production-grade Data Platform
Learn the advanced features needed for production environments, including advanced partitioning strategies, compaction and cleanup operations, query performance optimization, and metadata management with version control.
Part 3: Apache Iceberg and Big Data Ecosystem Integration - Enterprise Data Platform
Complete guide to Apache Iceberg integration with Spark, Flink, and Presto/Trino, comparison with Delta Lake and Hudi, cloud storage optimization, and building a large-scale data lakehouse through practical projects.
Part 1: HyperLogLog Fundamentals and Cardinality Estimation - Efficient Unique Value Counting in Big Data
A complete guide to the HyperLogLog algorithm, from principles to practical applications, for efficiently estimating cardinality in large-scale data.
Part 2: HyperLogLog Production Application and Optimization - Building Production-grade BI Systems
Part 3: HyperLogLog and Advanced Probabilistic Algorithms - Completion of Modern BI Analytics
Part 1: Time Series Database Fundamentals and Architecture - Complete Guide to Modern TDB
Complete guide to Time Series Database fundamentals, architecture, and optimization principles. Learn about InfluxDB, TimescaleDB, Prometheus, and practical implementation strategies.
Part 2: Time Series Database Advanced Features and Optimization - Building Production-grade TDB Systems
Complete guide to advanced TDB features, distributed architecture, high availability, and performance tuning for production environments.
Part 3: Time Series Database Integration and Deployment - Completing the Modern TDB Ecosystem
Complete guide to TDB integration with other systems, cloud-native architecture, latest trends, and actual production deployment strategies for the modern TDB ecosystem.
Complete Guide to Data Quality Management with dbt - Core of Modern Data Pipelines
Data quality management using dbt on major data platforms. A complete practical guide with Snowflake, BigQuery, Redshift, and more.
Complete Guide to BA (Business Analytics) Terminology - Essential Concepts for Data Analysts
A comprehensive guide to core terminology in the Business Analytics field. Covering everything from analytical techniques to business metrics and tools.