From Batch to Micro-Batch Streaming: Lessons Learned the Hard Way in a Delta Index Pipeline



Posted on Mon May 4 2026 | 4:30 pm


This article describes how a production delta-index pipeline migrated from scheduled batch to micro-batch Spark Structured Streaming. It covers why record-level streaming was rejected, how partition-based watermarks replaced fragile S3 completion markers, overlap-window correctness, and restart-as-design strategies for better predictability in object-store–based ingestion systems.




Search
Side Widget
You can put anything you want inside of these side widgets. They are easy to use, and feature the new Bootstrap 4 card containers!