I spend most of my time at the UI boundary, which means I call send() a lot more often than I think about what happens after it.
That changed earlier this year. I was tracing behavior in systems my frontend work depends on and kept feeling the same gap: I knew what an SDK promised the browser, but not what the pipeline behind that promise actually had to survive. Network jitter. Retries. Out-of-order delivery. Bursty traffic. Storage costs that only become visible once somebody has to pay them for real.
send() makes the whole thing look simple. It isn't.
So I went looking. I spent time reading four production event collectors closely: RudderStack, Snowplow, PostHog, and Lytics. Then I built a smaller pipeline in Python, eventkit, to see whether the patterns I thought I was seeing were real or whether I was just getting attached to architecture diagrams.
The broad answer is that the patterns were real. The more useful answer is which patterns survived.
the question behind the question
The question I started with sounded narrow: what happens after send()?
But that wasn't really the question. The real question was: what does a production event pipeline have to get right before anybody can trust the numbers, profiles, or decisions that come out the other side?
That question reaches a lot further than most frontend engineers ever get to see directly. We ship instrumentation. We assume it lands somewhere coherent. We assume retries don't duplicate revenue events, that identities don't drift apart, and that a dashboard spike is real rather than just some buffer flushing late. Usually that trust is implicit. I wanted a cleaner mental model of what was earning it.
I also wanted the model to be concrete enough that I could use it the next time something smelled wrong in the data.
how I studied it
I approached this the same way I learned to approach work in a chemistry lab: read the literature first, then build something small enough to break.
Here, the literature was production code.
I picked four systems because they sit in different parts of the design space:
- RudderStack treats collection as a durable routing problem.
- Snowplow cares deeply about schemas and downstream trust.
- PostHog splits the stack across Python and Rust, which makes its tradeoffs unusually visible.
- Lytics was useful to study because I could trace how the same event stream serves two different constraints at once: analytical storage and low-latency profile updates.
I wasn't trying to memorize implementation details. I was looking for repeated boundaries.
Across all four systems, I kept asking the same questions:
- Where does validation happen?
- How do events stay in order without global coordination?
- Where does durability actually begin?
- What gets buffered, and what happens on crash?
- Why does an event land in one storage system first instead of another?
Then I built eventkit in Python, which I hadn't really worked in before. That part was deliberate. If the patterns still held in an unfamiliar ecosystem, they were more likely to be structural than accidental.
The build helped because production systems can make anything look inevitable in hindsight. Implementation removes that illusion very quickly.
the four boundaries that kept repeating
This is the part I wish someone had given me earlier. Once I stopped staring at vendor-specific details, four boundaries kept showing up.
1. collection and validation are not the same job
This was the first pattern that became impossible to unsee.
RudderStack validates lightly at ingestion. Snowplow collects permissively, then validates hard in Enrich. Lytics accepts heterogeneous events and resolves a lot of meaning later in processing. PostHog normalizes aggressively and enforces limits partly to protect the system, not just the data.
The implementations differ, but the boundary is the same: the HTTP edge exists to accept and preserve events quickly, not to do all the semantic judgment up front.
That matters because these systems are not only defending data quality. They are defending against data loss.
Once I built even a naive version of the pipeline, the reason became obvious. If collection is blocked on deep semantic correctness, every schema edge case becomes an ingestion risk. If collection is fast and durability comes first, you can be stricter downstream in places where failures are observable, recoverable, and replayable.
The cleanest version of the principle is simple:
Accept fast. Make durable. Validate where failure is easier to inspect than to lose.
That does not mean validation is optional. It means collection and validation are solving different problems, and production systems stop getting weird once you treat them that way.
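To make the split concrete, here is a minimal sketch of that boundary. Everything in it is illustrative — the field names, the in-memory `durable_log` standing in for a real write-ahead log, the `dead_letter` list — none of it is code from the four systems above:

```python
import json

# Sketch of the boundary: the edge accepts anything that is structurally
# an event, durability comes before acknowledgment, and semantic
# validation happens later where failures are inspectable.

durable_log = []   # stand-in for a write-ahead log or durable queue
dead_letter = []   # invalid events land here instead of being dropped

def accept(raw: bytes) -> bool:
    """Edge check: is this parseable at all? Nothing more."""
    try:
        event = json.loads(raw)
    except ValueError:
        return False                  # the only hard rejection at the edge
    durable_log.append(event)         # make durable before acknowledging
    return True                       # ack: the event cannot be lost now

def validate(event: dict) -> bool:
    """Downstream semantic check, free to be as strict as it likes."""
    return isinstance(event.get("type"), str) and "user_id" in event

def process():
    """Downstream pass: strict validation with a replayable failure path."""
    for event in durable_log:
        if validate(event):
            pass                      # enrich, route, store
        else:
            dead_letter.append(event) # observable and replayable, not lost
```

The point of the sketch is the asymmetry: `accept` can only fail on unparseable bytes, while `validate` can fail on anything, because by the time it runs the event is already safe to inspect or replay.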
2. ordering is usually solved locally, not globally
The second recurring pattern was about sequence.
Events don't arrive in the order they were produced. Retries, queues, buffering, mobile reconnects, and clocks all ruin that fantasy almost immediately. But a lot of downstream state still depends on order. Identify before track. Consent before capture. Profile update before audience evaluation.
The interesting thing is that production systems mostly do not solve this with global coordination. They solve it with deterministic routing.
RudderStack hashes users to workers. PostHog leans on Kafka partitions. Lytics hashes identity references into partitions with bounded dedup windows. Snowplow is the outlier here; it reconstructs order logically, through timestamps and storage semantics, rather than through strict ingest routing.
What survived for me from all of that was this: most event systems only need partition-local ordering, not universal ordering.
That sounds obvious once you say it plainly, but it changes what kinds of architectures feel necessary. You do not need one giant ordered stream for the universe. You need all events for the same entity, or whatever unit of correctness you care about, to keep landing in the same lane.
In eventkit, that ended up as simple hash-based routing with identity fallback. Nothing fancy. But the pattern held. Once events for the same person stayed on the same partition, a lot of ordering anxiety collapsed into something smaller and more manageable.
The problem never becomes easy. Hot partitions still exist. Missing identity still exists. Duplicates still exist. But the shape of the problem becomes local enough to reason about.
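In case it helps to see the shape, here is roughly what hash-based routing with identity fallback can look like. The field names and partition count are my assumptions for illustration, not eventkit's actual schema:

```python
import hashlib

# Sketch of deterministic, partition-local routing: all events for the
# same entity keep landing in the same lane, so ordering only has to
# hold within a partition.

NUM_PARTITIONS = 8

def routing_key(event: dict) -> str:
    # Prefer a stable identity; fall back to an anonymous id so events
    # without a known user still route deterministically.
    return event.get("user_id") or event.get("anonymous_id") or "unrouted"

def partition_for(event: dict) -> int:
    # A stable hash (not Python's salted built-in hash()) so routing
    # survives process restarts and works across machines.
    digest = hashlib.sha256(routing_key(event).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```

Note the deliberate choice of `hashlib` over Python's built-in `hash()`: the built-in is salted per process, which would silently break the "same entity, same lane" guarantee across restarts.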
3. buffering is where throughput, latency, and fear meet
This was the most visceral part of the build.
Batching sounds like a throughput optimization until you actually have to decide where the events sit while they wait.
Memory is fast. Memory is also a lie if the process dies.
All four systems buffer, but they don't all buffer in the same place:
- RudderStack uses in-memory queues before durable writes.
- Snowplow buffers before sinks like S3 or Kinesis.
- PostHog largely delegates buffering to Kafka.
- Lytics layers buffering across the pipeline rather than pretending one buffer can solve every constraint at once.
The architectural pressure underneath all of them is the same. Buffer too aggressively and you widen the crash window. Flush too aggressively and you pay for it in write amplification, latency, or both.
The strongest recurring pattern was dual-trigger batching:
- flush on size
- flush on time
That sounds like a small design choice. It is not. It is one of those boring, durable production patterns because it handles uneven traffic honestly. Busy partitions flush by size. Quiet partitions still flush eventually. Neither throughput nor latency gets to become the only truth.
When I built eventkit, buffering was the place where the system stopped being theoretical. It is one thing to nod along while reading that in-memory events are ephemeral. It is another to stare at your own buffer and realize you do not have a satisfying answer to, "What happens if the process crashes right now?"
That question changed the design immediately.
4. storage follows access patterns, not ideology
The last boundary was storage.
This one gets flattened in a lot of architecture conversations because people want one storage answer. But these systems do not converge on one storage answer. They converge on a storage posture.
RudderStack persists durably, then fans out asynchronously. Snowplow treats object storage as the long-lived archive and warehouses as downstream consumers. PostHog writes into ClickHouse because real-time analytical access is central to what the product is. Lytics separates the storage path that is optimized for analytics from the one optimized for low-latency profile access.
That distinction matters because storage is never just about where bytes go. It is about which questions need fast answers and which ones can tolerate batch delay.
What held across all of them was this: the storage system should match the access pattern it serves.
Object storage is great when you care about durability, replay, and cost. Warehouses are great when the question is analytical and batch-friendly. Specialized low-latency stores make sense when the system needs to answer a user-facing question now, not in fifteen minutes.
The mistake would be to force one storage system to carry all of those constraints at once just because it feels architecturally tidy.
Production systems are less tidy than that. They are more honest.
building eventkit is where the patterns stopped being abstract
Reading got me to a hypothesis. Building got me to the point where the hypothesis had to survive contact with reality.
The first version of eventkit looked exactly like the kind of system a person writes when they understand the shape of a problem but haven't yet been punished by it.
There was an HTTP endpoint. An adapter transformed raw events into a canonical shape. A sequencer hashed events into partitions. A buffer flushed on size and time. Data landed in storage. Tests passed. The architecture diagram looked respectable.
And then the obvious question showed up:
What happens if the process crashes while accepted events are still in memory?
I remember how quickly the answer collapsed. The architecture was fine right up until that moment, and then it wasn't. That is one of the useful things about building after studying: you find out which parts of your understanding were descriptive and which parts were actually structural.
The fix pushed the design toward something much closer to what I'd seen in production. In eventkit's case, that meant a durable ring buffer using SQLite WAL before acknowledgment, with background publication happening afterward.
That change did three things at once:
- it closed the acceptance window where events could vanish silently
- it reduced the amount of work the request path had to do
- it made the acknowledgment boundary mean something real
Then cost showed up.
Firestore was fine for getting the pipeline moving. It was not a storage posture I wanted to carry at higher volume. Once I pushed on the shape of the writes and the cost of the read patterns, object storage plus warehouse loading started to look much more like the grown-up answer. Again, not because some reference architecture said so. Because the pressure from the system said so.
Then observability showed up.
As soon as the system became asynchronous enough to feel realistic, the "I'll just print some logs" phase ended. Structured logs, metrics, queue depth, flush latency, and publication visibility all stopped feeling optional.
That's the part I keep coming back to. Production patterns are not elegant because smart people like elegant systems. They are elegant because the same pressures keep stripping away everything else.
what changed for me
I started this work wanting a clearer backend mental model as a frontend engineer. I ended it with something more practical than that.
I now think about every send() as the start of a long-lived process with four explicit questions attached to it:
- When is this event actually durable?
- What gives it ordering guarantees, if anything?
- Where can it sit temporarily, and what happens if that layer dies?
- What storage path is this event entering, and why that one?
That sounds small, but it changes the kind of debugging questions I ask.
If a dashboard number looks inflated, I think duplicates, replay semantics, or dedup windows before I think "analytics is weird."
If a profile looks incoherent, I think identity routing and validation boundaries before I think "the SDK must have misfired."
If something in the pipeline feels flaky, I try to place the flake: validation, sequencing, buffering, or storage. Once those boundaries are visible, the debugging surface gets a lot smaller.
The build changed something else too. It made me trust comparative study more when it's paired with implementation. Reading four codebases in a row can make patterns feel cleaner than they are. Building gives the patterns weight. It forces you to pay for your ideas.
That loop is the part I'm keeping:
- study the real systems
- notice what converges
- build a smaller version
- let failure tell you what was actually essential
what I think the post is really about
This post started as a question about event collection. It ended up being a question about how production systems reveal themselves.
They don't usually reveal themselves through abstractions. They reveal themselves through pressure.
A production event pipeline has to survive retries, burstiness, missing identity, delayed clocks, crashes, and storage bills. Once you look at enough of them, the common shape becomes hard to miss. Not because all companies copy each other. Because the pressures are real enough that they keep forcing similar boundaries.
That was the part I wanted to understand.
And honestly, as someone who spends most days on the browser side of the line, it feels good to understand it a little better now.
If you want to look at the smaller implementation that came out of this study, eventkit is here.
FAQ
What happens after send() in production event collectors?
Events move through four recurring boundaries: validation, sequencing, buffering, and storage. Different systems implement them differently, but the shape stays consistent because the pressures stay consistent.
Why do production systems separate collection from validation?
Because the HTTP edge has to protect against data loss first. Strict semantic validation is safer downstream, where failures can be inspected, replayed, and corrected without blocking ingestion.
How do event collectors preserve ordering without global coordination?
Most use deterministic routing based on identity. Events for the same user or entity land on the same partition or worker, which processes them sequentially. That gives you partition-local order without trying to globally order the whole system.
Why is buffering such a critical design point?
Because buffering is where throughput, latency, and durability collide. Memory is fast but fragile. Production systems use bounded buffers, dual flush triggers, and durable queues or logs so they can batch efficiently without pretending in-memory state is safe.
Why use tiered storage instead of one storage system for everything?
Because different questions need different answers. Object storage is cheap and durable. Warehouses are good for analytical queries. Low-latency stores make sense for real-time profile or product behavior. Production systems match storage to access patterns instead of forcing one database to do every job badly.
What did building eventkit add beyond reading the systems?
Building forced the patterns to survive failure. It surfaced the same practical constraints the production systems had already solved: crash windows, buffering risk, durability boundaries, and storage cost tradeoffs.