The Event-Driven Architecture

1. Introduction

In the modern age of data, it is pivotal to view data as a continuous series of real-time streaming events rather than as something kept in static repositories. At Ingenious, we've embraced this philosophy, building our infrastructure around the powerful event streaming paradigm.

2. The Philosophy Behind Event Systems

Our core belief is that as companies consolidate their data into an event streaming platform, they'll naturally transition to processing these event streams in real time. Integrating real-time data across the various parts of the company allows data to be joined and summarized on the fly. Whether you are building a fraud detection system or a ride-sharing platform, modern applications require data to be combined, summarized, and updated continuously as new events are generated.
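
To make this concrete, here is a minimal sketch of a continuously updated summary over an event stream, written with Kafka Streams. The topic names, keys, and the idea of counting ride requests per city are purely illustrative assumptions, not our actual pipeline.

    // Minimal Kafka Streams sketch: a summary (count per key) that is updated
    // continuously as new events arrive. Topic names and types are illustrative.
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    import java.util.Properties;

    public class RideCountStream {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ride-count-sketch");   // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Each record is one "ride requested" event, keyed by city (illustrative).
            KStream<String, String> rides = builder.stream("ride-requested");
            // A continuously updated count per city: the summary changes with every new event.
            KTable<String, Long> ridesPerCity = rides.groupByKey().count();
            ridesPerCity.toStream().to("rides-per-city", Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }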

3. The Influence of The Log

Ingenious's work is heavily inspired by innovations at LinkedIn, with the open-source Kafka system being a standout. In our foundational years around 2017, a particular article (https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying) greatly influenced our direction. Instead of an n-to-m interaction scheme, in which every service is intertwined with every other in a complex mesh, we envisioned Ingenious utilizing a centralized event log. This log would serve as the central nervous system for message exchange between services.

4. Understanding Events at Ingenious

Our perception of events has been shaped by Martin Fowler's article titled "What do you mean by “Event-Driven”?" (Link). Initial forays into Event-Sourcing led us to conclude that it wasn't the right pattern for us. We didn't want to utilize events merely for notifications. Instead, Ingenious adopted the Event-Carried State Transfer approach. This approach aims to:

"[…] update clients of a system in such a way that they don't need to contact the source system in order to do further work."

Advantages of this pattern include:

  • Enhanced resilience, allowing recipient systems to function even if the primary system is unavailable.

  • Reduced latency, negating the need for remote calls to access customer data.

  • No concerns about system load from numerous consuming systems.

However, there are challenges:

  • The recipient faces increased complexity, managing state and ensuring idempotency, rather than simply requesting more information from the sender.
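
To illustrate the difference between a bare notification and Event-Carried State Transfer, here is a minimal sketch. The class and field names are hypothetical and only show the shape of the payloads, not our actual domain model.

    // A notification-only event forces the consumer to call back to the source
    // system; an event that carries the full state does not.
    public class CustomerEvents {

        // Notification only: the consumer still has to ask the customer service
        // what actually changed.
        record CustomerChangedNotification(String customerId) { }

        // Event-Carried State Transfer: the event carries the complete current
        // state, so consumers can update their local copy without contacting
        // the source system.
        record CustomerState(
                String customerId,
                String name,
                String email,
                String billingAddress,
                long version) { }   // a version/sequence number helps consumers stay idempotent
    }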

5. The Importance of Schema

From our inception, it was evident that a rigorous schema was essential for our event exchanges, mirroring our strict adherence to database schemas. Consistency was paramount. We selected the Avro format for its robustness and reliability.

5.1 What is Avro?

Avro is a renowned open-source data serialization system that facilitates data exchange across platforms, languages, and frameworks. It defines a binary format for data and maps it to your chosen programming language.

"Apache Avro™ is the leading serialization format for record data, and the first choice for streaming data pipelines. It offers excellent schema evolution, and boasts implementations in numerous languages." - Avro Official

For a deeper dive into Avro's benefits, refer to this article: Avro and Kafka Data.
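
As a small illustration, here is a minimal sketch of defining an Avro schema and building a record with Avro's generic Java API. The schema, namespace, and field names are assumptions made up for this example.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public class AvroSketch {
        // Illustrative schema: a record carrying the state of a customer.
        private static final String SCHEMA_JSON = """
            {
              "type": "record",
              "name": "CustomerState",
              "namespace": "com.example.events",
              "fields": [
                {"name": "customerId", "type": "string"},
                {"name": "email",      "type": "string"},
                {"name": "version",    "type": "long"}
              ]
            }
            """;

        public static void main(String[] args) {
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
            GenericRecord record = new GenericData.Record(schema);
            record.put("customerId", "42");
            record.put("email", "jane@example.com");
            record.put("version", 7L);
            // In a pipeline this record would be written with Avro's binary encoder
            // or handed to an Avro-aware Kafka serializer.
            System.out.println(record);
        }
    }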

5.2 Schema Registry

To manage our schemas, we employ the Confluent Schema Registry.

What Is Schema Registry?

Schema Registry is an external server process separate from the Kafka brokers. It's tasked with maintaining a database of schemas written into topics within its cluster. This database is stored in an internal Kafka topic and cached in the Schema Registry for swift access. It can run in a redundant, high-availability configuration to ensure uptime even if one instance fails.
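
As a minimal sketch of how a producer talks to the Schema Registry, the configuration below uses Confluent's Avro serializer, which registers the record's schema (if it is new) and embeds the schema id in each message so consumers can look it up. Broker and registry addresses and the topic name are assumptions for this example.

    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class RegistryProducerSketch {
        public static KafkaProducer<String, GenericRecord> buildProducer() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");                      // assumption
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081");             // assumption
            return new KafkaProducer<>(props);
        }

        public static void send(KafkaProducer<String, GenericRecord> producer,
                                GenericRecord customerState) {
            // The serializer registers/looks up the schema in the registry and writes
            // only the schema id plus the Avro-encoded payload into the message.
            producer.send(new ProducerRecord<>("customer-state", customerState));
        }
    }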

6. Kafka and Its Implementation at Ingenious

We began our journey with Kafka in late 2017, establishing principles that still hold true in 2022:

  • Information Exchange: Information is exchanged between products via Kafka events that publish object state.

  • Change Data Capture (CDC): Change Data Capture techniques are pivotal for Ingenious. CDC harnesses the duality of tables and events: a table represents the state of the data at a specific moment, while events describe the transitions that led to that state. By replaying the events over time, one can reconstruct the table or state. This is why we capture changes as events, which can later be used to recreate or synchronize state across systems.

  • Outbox Pattern: We utilize the Outbox Pattern (Link): database table triggers write changes to an outbox log table, and a custom Java library follows this log stream and produces the corresponding Kafka events (a simplified sketch follows this list).
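
As a simplified sketch of the outbox idea (not our actual trigger-based Java library), the relay below polls an outbox table that is written inside the business transaction and publishes each row to Kafka before marking it as dispatched. Table, column, and topic names are illustrative assumptions.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class OutboxRelaySketch {
        public void relayOnce(Connection db, KafkaProducer<String, String> producer) throws Exception {
            try (PreparedStatement select = db.prepareStatement(
                    "SELECT id, topic, event_key, payload FROM outbox WHERE dispatched = false ORDER BY id")) {
                ResultSet rows = select.executeQuery();
                while (rows.next()) {
                    long id = rows.getLong("id");
                    // Publish the event exactly as it was written in the business transaction.
                    producer.send(new ProducerRecord<>(
                                    rows.getString("topic"), rows.getString("event_key"), rows.getString("payload")))
                            .get();   // wait for the broker's ack before marking the row as dispatched
                    try (PreparedStatement update = db.prepareStatement(
                            "UPDATE outbox SET dispatched = true WHERE id = ?")) {
                        update.setLong(1, id);
                        update.executeUpdate();
                    }
                }
            }
        }
    }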

7. Advantages of Using a Log for Data Integration Between Services

Utilizing an event-log for data integration between different services offers numerous compelling benefits:

  1. Unified Data Flow: An event-log provides a centralized and consistent mechanism for data flow across multiple services, ensuring that all services are consuming data in a unified manner.

  2. Temporal Data Storage: Logs store events sequentially based on time, allowing for chronological processing and analysis. This time-ordered sequence is beneficial for understanding the evolution of data and system states.

  3. Immutability: Once data is written to the log, it cannot be changed, ensuring that the historical record remains consistent. This immutability guarantees the integrity of data and provides a reliable source of truth.

  4. Decoupled Services: Logs enable services to be decoupled from each other. Producers write data to the log, and consumers read from it. This separation ensures that services can evolve independently without impacting others.

  5. Fault Tolerance and Recovery: With logs, if a service fails, it can recover by reprocessing events from the log (see the replay sketch after this list). This capability ensures system resilience and reduces downtime.

  6. Scalability: Logs can be distributed across multiple servers or clusters, allowing systems to scale horizontally and handle vast amounts of data efficiently.

  7. Simplified Data Integration: Logs provide a standardized way to integrate data across heterogeneous systems, reducing the complexity traditionally associated with ETL (Extract, Transform, Load) processes.

  8. Real-time Data Sync: Logs enable real-time data synchronization across services. As soon as data is written to the log, it's available for other services to consume, ensuring that all parts of the system are updated promptly.

  9. Audit and Compliance: The sequential nature of logs makes it easier to audit changes, track data lineage, and ensure compliance with data governance policies.
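
Point 5 above mentions recovery by reprocessing the log. As a minimal sketch of such a replay, the consumer below rewinds to the beginning of a topic and rebuilds its local state from every event; the topic name, group id, and addresses are illustrative assumptions.

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.time.Duration;
    import java.util.Collection;
    import java.util.List;
    import java.util.Properties;

    public class ReplayConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");        // assumption
            props.put("group.id", "customer-projection-rebuild");    // illustrative
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("customer-state"), new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }

                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        consumer.seekToBeginning(partitions);   // rewind: replay the whole log
                    }
                });
                while (true) {                                   // in practice, stop once caught up
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Here a real consumer would apply the event to its local state/table.
                        System.out.printf("%s -> %s%n", record.key(), record.value());
                    }
                }
            }
        }
    }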