Author: Dr. William Bain, CEO, ScaleOut Software.
Modern enterprise applications are under constant pressure to respond instantly, scale seamlessly, and deliver reliable results. From retail and financial services to transportation and logistics, organisations increasingly rely on real-time insights to drive rapid decision-making. However, neither databases nor the cloud-based services that access them, such as serverless functions, were designed to handle the high volume of data access created by large workloads. These systems struggle with both throughput and latency.
For the past two decades, distributed caches, also known as in-memory data grids, have been used to address such challenges. By keeping fast-changing data in memory and distributing it in a cluster of physical or virtual servers, they have dramatically reduced access latency and offloaded databases.
Why today’s architectures fall short
While highly effective for more than two decades, distributed caches have their limitations. By treating stored data as opaque binary large objects (BLOBs), they can incur increasingly high access latency and begin to stress network architectures as workloads and object sizes grow. Cloud-based serverless functions cannot easily incorporate distributed, in-memory data caching into their event-driven architectures.
To address the overhead imposed by BLOB storage, distributed caches have evolved into “data structure stores,” which access objects with APIs that perform specific actions implemented by the distributed cache. For example, cached objects might hold hash tables or sorted sets. The approach streamlines access and boosts application performance.
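The difference can be sketched in a few lines. The following is a hypothetical, self-contained model (not ScaleOut's or any real store's client API): with a BLOB, every update ships the whole serialized object across the network, while a data structure store executes a small typed operation, such as a sorted-set add or top-N query, in place and returns only the result.

```java
import java.util.*;
import java.util.stream.*;

/*
 * Hypothetical sketch of the data-structure-store idea. The class below
 * stands in for state held inside the cache service; clients send one
 * small command (zadd, top) instead of round-tripping a serialized BLOB.
 * Names and signatures are invented for illustration.
 */
class SortedSetCache {
    // Server-side state: member -> score, kept in the cache process.
    private final Map<String, Double> scores = new HashMap<>();

    // One small command travels to the cache instead of the whole object.
    public void zadd(String member, double score) {
        scores.put(member, score);
    }

    // The cache computes the answer in place; only the top-N list returns.
    public List<String> top(int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In a real deployment the operation executes on the cache server that owns the key, which is what eliminates the serialize-transfer-deserialize cycle of BLOB access.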
However, current data structure stores have limitations that prevent application developers from taking full advantage of the concept. Because they offer only a limited set of built-in data structures, they can’t handle specific use cases, such as implementing an object that holds a mortgage application. Adding new data structures can be difficult: developers typically need to either write extensions in a scripting language like Lua or code and link extensions into the cache service in C. These techniques can be complex, hard to maintain, and insecure.
Turning a distributed cache into an active engine
To address the limitations of today’s data structure stores, ScaleOut Software has recently introduced ScaleOut Active Caching, an extension to its distributed caching infrastructure that embeds and executes application code. It lets developers deploy application-defined modules – data structures and the code that manages them – into ScaleOut’s distributed cache. The modules boost application performance, offload clients, and reduce network overhead. They also let a distributed cache take over the role of serverless functions in processing event messages.
An API module lets developers build and deploy custom, strongly-typed data structures written in C# or Java. API modules extend the concept of built-in data structures to embed application-specific code. They customise cache accesses to meet specific business needs by migrating this functionality from clients into the distributed cache. Because these modules run on all cache servers, they automatically scale performance and eliminate unnecessary data motion. API modules run in separate processes from the cache service to provide isolation and increased security.
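As a rough illustration of the pattern, the sketch below shows a custom, strongly-typed structure plus the operations a module might expose on it. The interface and all names here are invented for this article and are not ScaleOut's actual API: the point is that domain logic (here, a mortgage application, the use case built-in structures can't cover) runs inside the cache, next to the data.

```java
import java.util.*;

/*
 * Hypothetical API-module sketch. A developer defines a typed object and
 * the operations on it; the cache invokes those operations on whichever
 * server owns the key, so the object never leaves the cache to be updated.
 * All names below are illustrative, not a real product API.
 */
interface ApiModule<T> {
    T create();                                   // build a new instance on first access
    Object invoke(T obj, String op, Object arg);  // run an operation in place
}

// A domain-specific cached object, beyond any built-in data structure.
class MortgageApplication {
    double income;
    double requestedAmount;
    final List<String> documents = new ArrayList<>();
}

class MortgageModule implements ApiModule<MortgageApplication> {
    public MortgageApplication create() { return new MortgageApplication(); }

    public Object invoke(MortgageApplication app, String op, Object arg) {
        switch (op) {
            case "setIncome":   app.income = (Double) arg; return null;
            case "setAmount":   app.requestedAmount = (Double) arg; return null;
            case "addDocument": app.documents.add((String) arg); return null;
            // Business logic executes inside the cache, next to the data
            // (the 4.5x income multiple is an arbitrary example rule).
            case "isAffordable": return app.requestedAmount <= app.income * 4.5;
            default: throw new IllegalArgumentException(op);
        }
    }
}
```

Because such a module is deployed to every cache server, each server handles the operations for the objects it owns, which is how throughput scales with the cluster.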
A second module type, the message module, also deploys application-specific data structures and code and, like an API module, accesses and updates cached objects. However, instead of fielding API calls from client applications, message modules ingest and process messages. They connect to messaging hubs, such as Kafka, AWS SQS, or a built-in REST service, to receive messages from other services as part of an event-driven architecture.
When used in the cloud, message modules can replace serverless functions by handling incoming messages directly in the distributed cache. They reduce delays in accessing live data and avoid the need to access a persistence store for every message. The distributed cache can integrate with a variety of persistence stores, like DynamoDB and Cosmos DB, to automatically retrieve and update stored data. Message modules also solve the problem of synchronising access to a persistence store by multiple serverless functions.
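The flow just described can be modelled in miniature. This is a hypothetical sketch with invented names, not ScaleOut's actual message-module API: a handler is invoked per event, updates the cached object in place, and the cache, rather than a fleet of serverless functions, coordinates the read-through load and write-through persistence, so concurrent handlers do not race on the backing store.

```java
import java.util.*;

/*
 * Hypothetical message-module sketch. The two maps stand in for the
 * distributed cache and a persistence store (e.g. DynamoDB); onMessage
 * stands in for the handler the cache invokes for each event arriving
 * from Kafka, SQS, or REST. All names are illustrative.
 */
class MessageModuleSketch {
    // Stand-in for the distributed cache: key -> cached value.
    private final Map<String, Integer> cache = new HashMap<>();
    // Stand-in for the persistence store behind the cache.
    final Map<String, Integer> store = new HashMap<>();

    // Invoked once per incoming event message.
    void onMessage(String key, int delta) {
        // Read-through: on a cache miss, load the object from the store.
        int current = cache.computeIfAbsent(key, k -> store.getOrDefault(k, 0));
        int updated = current + delta;
        cache.put(key, updated);
        // Write-through: the cache persists the change once, in order,
        // instead of many functions contending for the store.
        store.put(key, updated);
    }

    int get(String key) { return cache.getOrDefault(key, 0); }
}
```

A real deployment would partition keys across servers and batch or delay the writes, but the synchronisation benefit is the same: one owner per key serialises its updates.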
Active caching in action
The value of ScaleOut Active Caching becomes clear when it is applied to scenarios that require both scale and business agility.
Consider an e-commerce company specialising in apparel. Using API modules, the company can deploy custom logic to the distributed cache that improves the customer experience and provides live business insights. E-commerce companies typically store shopping carts in the cache to keep their sites fast under heavy workloads. Instead of treating shopping carts as generic objects, the apparel company can now enrich its shopping cart logic with specialised information, such as garment type, material, and style. This data can help the company calculate which clothing categories are trending by region or season, track the performance of active promotions, and generate personalised product recommendations based on each shopper’s browsing behaviour. The result is immediate feedback for shoppers and the business.
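A minimal sketch of the enrichment described above, with types and fields invented for illustration: cart items carry garment metadata, and a tally of categories by region can be computed where the carts live. Because an API module runs on every cache server, each server can tally its local carts and the per-server results can be merged, yielding live trending data without moving cart objects out of the cache.

```java
import java.util.*;

/*
 * Hypothetical sketch of enriched shopping-cart analytics. GarmentItem
 * and CartAnalytics are illustrative types, not a product API; in an
 * API module this tally would run inside each cache server over the
 * carts that server owns.
 */
class GarmentItem {
    final String category;  // e.g. "dresses"
    final String region;    // shopper's region
    GarmentItem(String category, String region) {
        this.category = category;
        this.region = region;
    }
}

class CartAnalytics {
    // Count how often each clothing category appears in carts for a region.
    static Map<String, Long> trendingByRegion(List<GarmentItem> items, String region) {
        Map<String, Long> counts = new HashMap<>();
        for (GarmentItem it : items) {
            if (it.region.equals(region)) {
                counts.merge(it.category, 1L, Long::sum);
            }
        }
        return counts;
    }
}
```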
Message modules benefit industries that must process thousands of events every second while maintaining high responsiveness. For example, airlines use event processing to track the countless operations that keep their complex systems running. When unexpected conditions arise, like flight cancellations due to weather, the volume of events can increase quickly. By using message modules to manage flight and passenger objects, airlines can use the speed and scalability of distributed caching to rebook passengers efficiently while automatically persisting changes. This eliminates the overhead and complexity of using serverless functions, which lack fast, in-memory storage and must compete to access persistent data stores.
Final thoughts
Having evolved from passive storage for fast-changing data into an active, intelligent infrastructure that powers the next generation of live applications, distributed caching has now reached an inflection point. By bringing application data and logic together, ScaleOut Active Caching accelerates performance and enables developers to build scalable systems tailored to their needs. Now available in ScaleOut Product Suite Version 6, it introduces a powerful new way to put distributed caching to work.