Apache Kafka provides a log-structured stream data structure, called a "topic," which it stores durably across the nodes in a cluster. Topics can be partitioned and replicated, supporting multi-tenant workloads as well as recovery in the face of failure. We built an application that ingests basic financial ledger data from a REST API and produces several derived reports from the raw events. For example, ledger account events are persisted in one topic, while ledger entries (credits and debits) are persisted in another. Because Kafka stores the entries durably, we can stream them, join them with their referenced accounts, and produce updated balances almost as soon as new entries are recorded.
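At its core, the balance derivation amounts to folding signed ledger entries per account. As a minimal sketch of that logic in plain Java (not the actual Streams topology; record and field names here are illustrative, not the real application's):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analog of the streaming aggregation: fold ledger entries
// (credits positive, debits negative) into a running balance per account.
public class LedgerBalances {
    public record Entry(String accountId, long amountCents) {}

    // Returns the latest balance per account after applying all entries in order.
    public static Map<String, Long> applyEntries(List<Entry> entries) {
        Map<String, Long> balances = new HashMap<>();
        for (Entry e : entries) {
            balances.merge(e.accountId(), e.amountCents(), Long::sum);
        }
        return balances;
    }

    public static void main(String[] args) {
        var entries = List.of(
            new Entry("acct-1", 10_000),  // credit $100.00
            new Entry("acct-1", -2_500),  // debit   $25.00
            new Entry("acct-2", 5_000));
        System.out.println(applyEntries(entries)); // acct-1 -> 7500, acct-2 -> 5000
    }
}
```

In the real topology, the same fold runs incrementally as each entry arrives, rather than over a list after the fact.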
Kafka Streams provides state stores that allow our application to serve the latest account balances on demand in response to an ordinary GET request. We also built a WebSocket integration that pushes batches of updates to subscribed clients, such as a demo web application, so that balance changes can be observed without user interaction or page refreshes. The same model could power mobile push notification systems for Android or iOS applications as well.
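On the push side, batching means coalescing rapid changes so that each WebSocket message carries only the latest balance per account. A minimal plain-Java sketch of that coalescing step, with hypothetical names (the real application's batching may differ):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Coalesce a stream of per-account balance updates into one batch:
// if the same account changes several times before a push, only the
// latest balance is sent. Names are illustrative, not the real app's API.
public class UpdateBatcher {
    private final Map<String, Long> pending = new LinkedHashMap<>();

    public void onBalanceChanged(String accountId, long newBalanceCents) {
        pending.put(accountId, newBalanceCents); // keep only the latest value
    }

    // Drain the current batch, e.g. on a fixed timer, for the WebSocket push.
    public Map<String, Long> drainBatch() {
        Map<String, Long> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```

A scheduled task (or the Streams punctuator, in a Streams-native design) would call `drainBatch()` periodically and serialize the result to subscribed clients.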
In addition to simple map/filter/reduce operations, Kafka Streams offers sophisticated time-windowing operators. Our demo application produces an "aged receivable report" showing 30-day buckets of the Accounts Receivable balance based on the original general ledger entries. Rather than recalculating these buckets on demand when a report is requested, Kafka Streams tracks the time windows for us in 1-day intervals, recording each ledger entry into the appropriate bucket. Then, when our application asks the state store for the latest values, we pick the disjoint windows our user requests. Windowed state stores also support automatic retention periods, so we do not have to remove old windows manually (past a 120-day lookback period, for example).
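The roll-up from 1-day windows to 30-day aging buckets is simple arithmetic on window ages. A hedged plain-Java sketch of that step, assuming daily totals keyed by epoch day (names and shapes are illustrative, not the application's actual types):

```java
import java.util.Map;
import java.util.TreeMap;

// Roll daily windowed totals up into 30-day aging buckets (0-29 days old,
// 30-59 days old, ...) relative to a reporting date, dropping anything past
// the 120-day lookback. Mirrors reading 1-day windows from the state store.
public class AgedReceivables {
    // dailyTotals: epoch day -> receivable total recorded that day (cents)
    public static Map<Integer, Long> ageBuckets(Map<Long, Long> dailyTotals,
                                                long reportEpochDay) {
        Map<Integer, Long> buckets = new TreeMap<>();
        for (var e : dailyTotals.entrySet()) {
            long ageDays = reportEpochDay - e.getKey();
            if (ageDays < 0 || ageDays >= 120) continue; // outside lookback
            int bucket = (int) (ageDays / 30);  // 0 = current, 1 = 30-59 days, ...
            buckets.merge(bucket, e.getValue(), Long::sum);
        }
        return buckets;
    }
}
```

Because the daily windows are disjoint, summing them into coarser buckets never double-counts an entry, which is why the report can be assembled cheaply at query time.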
There are tradeoffs with Kafka. We initially experimented with an application that performed real-time analysis of invoice payables against contract agreements. However, allowing a change on either side (the payable or the agreement) to update the derived records is not yet well supported by Kafka; in this case, a traditional OLAP store may work better. Additionally, Kafka Streams is geared towards pre-defined reports: state stores are simple key-value stores and do not support filtering beyond simple key range queries. Thus, ad-hoc or exploratory reporting is still best supported by traditional database systems.
HOW IT WAS DONE
- A Kafka cluster was deployed using Docker
- Kafka topics modelling a financial ledger were created within the local cluster
- A Spring Boot application was developed that emits events to Kafka topics in response to REST API calls
- A Kafka Streams topology was created to produce aggregated output to topics and state stores
- The Spring Boot application was updated to stream output topics over WebSockets to the browser, and to allow querying state stores via REST
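For the first step, a local single-broker cluster can be brought up with a minimal Docker Compose file. This sketch assumes the official `apache/kafka` image with its default single-node KRaft configuration; the image tag and port mapping are assumptions, not the project's actual deployment:

```yaml
# Minimal single-broker Kafka for local development (illustrative only).
# The apache/kafka image runs a default single-node KRaft broker on 9092.
services:
  kafka:
    image: apache/kafka:3.7.0
    ports:
      - "9092:9092"
```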