You’re running a massive logistics operation. A sudden downpour hits North Bangalore at 5 PM, and demand for vehicles instantly triples. Your operations team needs to decide right now whether to apply surge pricing to incentivise drivers to move to that area.
But their dashboard is showing data from 15 minutes ago. In logistics, 15 minutes isn’t a delay; it’s a different reality.
This was our daily battle at Porter. We were making mission-critical decisions using a rear-view mirror. The problem wasn’t just slow dashboards; it was a fundamental architectural mismatch that was putting our production systems and our business intelligence at odds.
This article isn’t just about how we fixed it. It’s about the architectural principle of workload separation and how choosing the right tool for the right job moved us from lagging indicators to real-time operational control.
The Core Problem: Using a Bank Teller to Do an Analyst’s Job
Our most critical dashboards, Surge (for supply/demand balancing) and Spot (for system health), were querying replicas of our production OLTP (Online Transaction Processing) databases.
Think of an OLTP database like a bank teller. It’s optimised for thousands of small, fast, individual transactions: “Deposit $50,” “Withdraw $20,” “Check balance.” It’s designed for consistency and speed on row-level operations.
Our analytics dashboards, however, were asking questions like a bank analyst: “What was the average transaction volume across all branches in the Koramangala area every minute for the past three hours?” This requires scanning and aggregating millions of rows.
Asking your bank teller to do an analyst’s job leads to predictable chaos:
- Query Contention: The analyst’s heavy query slows down the entire line of customers waiting for the teller. For us, this meant analytics queries were causing replica lag, threatening the stability of our customer-facing applications.
- Stale Data: To prevent this, we throttled the analyst, only letting them ask questions every 15 minutes. This made the system stable, but the data was useless for real-time decisions.
We weren’t facing a database problem; we were facing a systems design problem. We needed a dedicated OLAP (Online Analytical Processing) engine, our “bank analyst”, built for this exact purpose.
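To make the mismatch concrete, here is a minimal sketch of the kind of “analyst” question our dashboards were effectively asking the OLTP replica. The table and column names are illustrative, not our actual schema:

```python
# A sketch of the heavy aggregation a real-time dashboard fires at a Postgres
# replica. Table and column names (orders, pickup_zone, fare_amount) are
# hypothetical placeholders.
import psycopg2

ANALYST_QUERY = """
    SELECT date_trunc('minute', created_at) AS minute_bucket,
           pickup_zone,
           count(*)         AS orders,
           avg(fare_amount) AS avg_fare
    FROM orders
    WHERE created_at >= now() - interval '3 hours'
    GROUP BY 1, 2
    ORDER BY 1;
"""

def fetch_demand_snapshot(dsn: str):
    # Scans and aggregates every order in the window -- exactly the workload an
    # OLTP replica tuned for point reads and single-row writes is not built for.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(ANALYST_QUERY)
        return cur.fetchall()
```

Run every few minutes across millions of rows, a query like this is what produces the contention and replica lag described above.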
The “How”: Architecting for Real-Time Analytics with Apache Pinot
After evaluating several OLAP technologies, we chose Apache Pinot. It is purpose-built for user-facing analytics requiring high-throughput ingestion and sub-second query latency. The decision wasn’t just about speed; it was about an architecture that could decouple our analytical workloads from our transactional ones.

Here is the high-level architecture we implemented: change events flow from our production Postgres database, through Debezium and Kafka, into Apache Pinot, which serves the dashboards.

Let’s break down the data flow:
- Change Data Capture (CDC): We use Debezium to capture row-level changes from our production Postgres database’s write-ahead log (WAL). This is non-intrusive: rather than querying the tables, it simply streams changes from the log as they happen, adding negligible load to the primary database. (A sketch of registering such a connector follows this list.)
- Streaming with Apache Kafka: These change events are published to a Kafka topic. Kafka acts as a durable, scalable buffer that decouples the source database from our analytics engine. If Pinot goes down for maintenance, the data is safely queued in Kafka, waiting to be consumed.
- Ingestion into Apache Pinot: Pinot subscribes to the Kafka topic, ingesting the data in near real-time. Its columnar storage format and advanced indexing (like star-tree and sorted indexes) are what allow it to perform massive aggregations in milliseconds. (A rough sketch of the corresponding table config also appears below.)
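To make the CDC step concrete, here is a minimal sketch of registering a Debezium Postgres connector through the Kafka Connect REST API. Hostnames, credentials, and table names are placeholders, and the exact config keys can differ slightly between Debezium versions:

```python
# Register a Debezium Postgres connector with Kafka Connect's REST API.
# All hostnames, credentials, and table names below are placeholders.
import requests

connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",            # Postgres logical decoding plugin
        "database.hostname": "postgres-primary",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "********",
        "database.dbname": "porter",
        "topic.prefix": "porter",             # topics become porter.<schema>.<table> (Debezium 2.x naming)
        "table.include.list": "public.orders",
    },
}

resp = requests.post("http://kafka-connect:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
```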
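On the Pinot side, a realtime table config tells Pinot which Kafka topic to consume and which indexes to build. The sketch below shows only the rough shape of such a config posted to the Pinot controller; column names are illustrative and the exact keys vary by Pinot version. In practice the Debezium change envelope is usually flattened first (for example with the ExtractNewRecordState transform) so Pinot receives plain row JSON:

```python
# A sketch of a Pinot REALTIME table that consumes the CDC topic and builds a
# star-tree index for fast aggregations. Assumes a matching Pinot schema named
# "orders" already exists; field names are illustrative, not our real schema.
import requests

table_config = {
    "tableName": "orders",
    "tableType": "REALTIME",
    "segmentsConfig": {"timeColumnName": "created_at", "schemaName": "orders"},
    "tableIndexConfig": {
        "sortedColumn": ["pickup_zone"],
        "starTreeIndexConfigs": [{
            # Pre-aggregate order counts and fare sums by zone and vehicle type.
            "dimensionsSplitOrder": ["pickup_zone", "vehicle_type"],
            "functionColumnPairs": ["COUNT__*", "SUM__fare_amount"],
        }],
        "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "porter.public.orders",
            "stream.kafka.broker.list": "kafka:9092",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.factory.class.name":
                "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name":
                "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
        },
    },
}

# The Pinot controller exposes a REST endpoint for creating tables.
requests.post("http://pinot-controller:9000/tables", json=table_config, timeout=10).raise_for_status()
```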
Strategic Decision: Build vs. Buy
We deliberately chose StarTree Cloud (a managed Pinot service). While we have the talent to run Pinot ourselves, that would mean dedicating engineers to managing Zookeeper, Helix, and the nuances of scaling a distributed database. By outsourcing the infrastructure management, we freed up our team to focus on the higher-value problem: building data products that leverage real-time insights. This is a critical trade-off every data leader should evaluate.
The “So What?”: The Strategic Impact of Real-Time
The technical shift unlocked immense operational value.
- Surge Dashboard: End-to-end data latency dropped from a 15-minute refresh cycle to ~1-2 seconds. Our operations team can now observe supply and demand elasticity as it happens, adjusting incentives with surgical precision rather than swinging a sledgehammer 15 minutes too late. (A sketch of the kind of query such a dashboard issues follows this list.)
- Spot Dashboard: We now have sub-second visibility into driver availability and system performance metrics. This is the difference between discovering a problem from a customer complaint versus catching an anomaly in the system before it ever impacts an order.
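Under the hood, these dashboards simply issue SQL to Pinot’s broker. A minimal sketch of what such a query might look like, with illustrative host, table, and column names (and a time column assumed to be in epoch millis):

```python
# Pull live demand per zone for the last 15 minutes from Pinot's broker.
# Host, table, and column names are placeholders for illustration.
import requests

PINOT_SQL = """
    SELECT pickup_zone, COUNT(*) AS active_orders
    FROM orders
    WHERE created_at >= ago('PT15M')
    GROUP BY pickup_zone
    ORDER BY COUNT(*) DESC
    LIMIT 50
"""

resp = requests.post(
    "http://pinot-broker:8099/query/sql",   # Pinot broker SQL endpoint
    json={"sql": PINOT_SQL},
    timeout=5,
)
resp.raise_for_status()
for pickup_zone, active_orders in resp.json()["resultTable"]["rows"]:
    print(pickup_zone, active_orders)
```

Because the star-tree index has already pre-aggregated counts along these dimensions, a query like this returns in milliseconds rather than minutes.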
Quantifiable Impact & Trade-offs:
- Benefit: We’ve fundamentally de-risked our production environment. The number of “analytics-induced replica lag” incidents has dropped to zero. Engineering and operations teams now work in harmony, not in conflict.
- Benefit: While direct revenue attribution is complex, we estimate that faster, more accurate surge decisions have improved driver availability in high-demand zones by 5-7% during peak hours, directly impacting order fulfilment rates.
- Trade-off (Cost): This architecture introduces new costs: the managed Pinot service license and Kafka cluster expenses. However, we project that this cost is easily offset by the operational efficiencies and the avoidance of even a single major production incident.
- Trade-off (Complexity): We now manage a streaming data pipeline, which requires new monitoring and skills. It’s a more complex system, but it’s the right kind of complexity for the problem we’re solving.
Common Anti-Patterns to Avoid:
- The Monolithic Approach: Don’t try to solve every analytics problem with one database. The tool you use for deep, complex business intelligence (like a data warehouse) is likely not the right tool for real-time operational dashboards.
- Ignoring the Pipeline: A fast OLAP engine is useless if your data pipeline is batch-based. Real-time insights demand a real-time, end-to-end architecture from source to dashboard.
What’s Next: From Dashboards to Products
This journey was never just about dashboards. It was about building a real-time analytics foundation. Having proven the architecture, we are now extending its capabilities:
- Vehicle operator/Vendor App Insights: Powering real-time performance metrics directly within our vehicle operator/vendor app.
- Location Recency Monitoring: Building systems to monitor the freshness of the location data network-wide, a critical input for our dispatch algorithms.
The core lesson is this: your data’s value decays with time. By moving from a 15-minute batch mindset to a real-time streaming architecture, we’ve not only solved our immediate dashboarding pains but have also unlocked a new class of data-driven products.
So, here’s the question we leave you with:
👉 How much value is trapped in the latency of your current data architecture?