📚 Blog Archive

Browse all technical posts by category or date, or use search.

Apache-Spark Apache-Flink Apache-Kafka Time-Series Kubernetes Data-Engineering Python Machine-Learning AI Big-Data
Total 35 posts

What is Data Lakehouse?

The lakehouse combines the advantages of data lakes and data warehouses

Limitations of Hive Metastore and the Emergence of Lakehouse

Learn about the structural limitations of the Hadoop Hive Metastore and the lakehouse architecture that emerged in response.

Lakehouse Table Formats: Delta Lake, Apache Iceberg, Apache Hudi

A detailed analysis and comparison of the table formats at the core of the modern data lakehouse

Kubernetes Local Setup Guide for macOS - Using Docker Desktop and Minikube

A step-by-step guide to installing and configuring a local Kubernetes cluster on macOS using Docker Desktop and Minikube.

What is Kubernetes? - The Core of Container Orchestration

Learn about Kubernetes' background, core concepts, key features, and its role in modern cloud-native applications.

Part 1: Fundamentals of Time Series Forecasting - From ARIMA to Prophet

Learn the basic concepts of time series data and the traditional statistical methods, from ARIMA through Prophet, and implement them in working code.

📚 Time series forecasting Series Part 2

Evolution of Time Series Forecasting: From Traditional Methods to Latest AI Models

From ARIMA to TimeGPT: a systematic guide to the evolution of time series forecasting technology and the latest trends.

📚 Time series forecasting Series

Part 2: Deep Learning-based Time Series Forecasting - N-BEATS and DeepAR

Explore advanced deep learning models for time series forecasting, including N-BEATS and DeepAR, with hands-on implementation using PyTorch.

📚 Evolution of time series forecasting Series Part 3

Part 3: Transformer-Based Time Series Forecasting Models

Explore state-of-the-art transformer-based time series forecasting models including Informer, Autoformer, FEDformer, and PatchTST with hands-on practice.

📚 Time series forecasting Series Part 4

Part 4: Latest Generative AI Models - TimeGPT, Lag-Llama, Moirai, Chronos

Explore innovative time series forecasting models using large language models and implement them in practice.

📚 Time series forecasting Series Part 5

Apache Airflow Advanced Guide: From DAG Optimization to Monitoring

Learn advanced features and best practices of Apache Airflow commonly used in production environments and apply them to real projects.

Apache Kafka Python Guide: Real-time Streaming and Data Processing

Learn real-time streaming development and data processing techniques using Apache Kafka with Python and apply them to real projects.

Apache Kafka Real-time Streaming Guide: From Producer to Consumer

Learn core concepts and practical applications of Apache Kafka for processing large-scale real-time data and apply them to real projects.

Complete Apache Spark Mastery Series: Everything About Big Data Processing

From Apache Spark's origins to advanced performance tuning - a complete guide series for big data processing.

Part 1: Apache Spark Basics and Core Concepts - From RDD to DataFrame

Learn Apache Spark's basic structure and core concepts including RDD, DataFrame, and Spark SQL through hands-on practice.

📚 Apache spark complete guide Series Part 2

Part 2: Apache Spark Large-scale Batch Processing and UDF Usage - Real-world Project

Advanced batch processing techniques in Apache Spark, writing UDFs, and setting up a production environment with Docker and Kubernetes.

📚 Apache spark complete guide Series Part 3

Part 3: Apache Spark Real-time Streaming Processing and Kafka Integration - Real-world Project

Build real-time data processing and analysis systems using Apache Spark Streaming, Structured Streaming, and Kafka integration.

📚 Apache spark complete guide Series Part 4

Part 4: Apache Spark Monitoring and Performance Tuning - Production Environment Completion

Complete your production environment setup with Apache Spark performance monitoring, profiling, memory optimization, and cluster tuning.

📚 Apache spark complete guide Series Part 5

Complete Apache Flink Mastery Series: Everything About True Streaming Processing

From Apache Flink's core concepts to production deployment - a complete guide series for true real-time streaming processing.

Part 1: Apache Flink Basics and Core Concepts - The Beginning of True Streaming Processing

Learn Apache Flink's basic structure and core concepts including DataStream API, state management, and time processing through hands-on practice.

📚 Apache flink complete guide Series Part 2

Part 2: Apache Flink Advanced Streaming Processing and State Management - Production-grade Real-time Systems

Learn advanced state management, checkpointing, savepoints, and complex time processing strategies in Apache Flink, and implement advanced patterns that can be applied directly to real-world scenarios.

📚 Apache flink complete guide Series Part 3

Part 4: Apache Flink Production Deployment and Performance Optimization - Enterprise Operations Mastery

Complete guide to deploying Apache Flink on Kubernetes in production environments, optimizing performance, and implementing monitoring and disaster recovery strategies.

📚 Apache flink complete guide Series Part 5

Part 1: Change Data Capture and Debezium Practical Implementation - Complete Real-time Data Synchronization

From CDC core concepts to building real-time data synchronization systems with Debezium, a complete guide to event-driven architecture.

📚 Change data capture complete guide Series Part 2

Part 2: Kafka Connect and Production CDC Operations - Enterprise Real-time Data Pipeline

Advanced Kafka Connect architecture, custom connector development, large-scale CDC pipeline operation strategies, performance optimization and disaster recovery.

📚 Change data capture complete guide Series Part 3

Part 1: Apache Iceberg Fundamentals and Table Format - The Beginning of Modern Data Lakehouse

Learn the fundamentals of the modern data lakehouse, from Apache Iceberg's core concepts to its table format, schema evolution, and partitioning strategies.

📚 Apache iceberg complete guide Series Part 2

Part 2: Apache Iceberg Advanced Features and Performance Optimization - Production-grade Data Platform

Learn the advanced features needed in production environments, including advanced partitioning strategies, compaction and cleanup operations, query performance optimization, and metadata management with version control.

📚 Apache iceberg complete guide Series Part 3

Part 3: Apache Iceberg and Big Data Ecosystem Integration - Enterprise Data Platform

A complete guide to Apache Iceberg integration with Spark, Flink, and Presto/Trino, comparison with Delta Lake and Hudi, cloud storage optimization, and building a large-scale data lakehouse through practical projects.

📚 Apache iceberg complete guide Series Part 4

Part 1: HyperLogLog Fundamentals and Cardinality Estimation - Efficient Unique Value Counting in Big Data

A complete guide to the HyperLogLog algorithm, from principles to practical applications, for efficiently estimating cardinality in large-scale data.

📚 Modern bi engineering Series Part 2

Part 2: HyperLogLog Production Application and Optimization - Building Production-grade BI Systems

📚 Modern bi engineering Series Part 3

Part 3: HyperLogLog and Advanced Probabilistic Algorithms - Completion of Modern BI Analytics

📚 Modern bi engineering Series Part 4

Part 1: Time Series Database Fundamentals and Architecture - Complete Guide to Modern TDB

Complete guide to Time Series Database fundamentals, architecture, and optimization principles. Learn about InfluxDB, TimescaleDB, Prometheus, and practical implementation strategies.

📚 Time series database master Series Part 2

Part 2: Time Series Database Advanced Features and Optimization - Building Production-grade TDB Systems

Complete guide to advanced TDB features, distributed architecture, high availability, and performance tuning for production environments.

📚 Time series database master Series Part 3

Part 3: Time Series Database Integration and Deployment - Completing the Modern TDB Ecosystem

A complete guide to TDB integration with other systems, cloud-native architecture, the latest trends, and production deployment strategies.

📚 Time series database master Series Part 4

Complete Guide to Data Quality Management with dbt - Core of Modern Data Pipelines

Everything about data quality management using dbt and major data platforms. A complete practical guide with Snowflake, BigQuery, Redshift, and more.

📚 Modern data stack Series Part 2

Complete Guide to BA (Business Analytics) Terminology - Essential Concepts for Data Analysts

A comprehensive guide to core terminology in the Business Analytics field, covering everything from analytical techniques to business metrics and tools.

📚 Modern bi analytics Series Part 2