Is BPM Dead, Long Live Microservices?

With the massive uptake of Microservices Architecture -industry wide- and with it, the adoption of patterns such as Event Sourcing, CQRS and Saga as the means for Microservices to asynchronously communicate with each and effectively "choreograph" business processes, it might seem as if the days of process orchestration using BPM engines (e.g. Oracle Process Cloud now also part of Oracle Integration Cloud, Pega, Appian, etc) or BPEL (or BPEL-like) engines are over.

Although the use of choreography and associated patterns (such as the aforementioned) makes tons of sense in many use cases, I've come across a number of them where choreography can be impractical.

Some examples:
  • Data needs to be collected and aggregated from multiple services -e.g. check the Composition pattern. Note that this pattern doesn't necessarily implies that an orchestration is required. Could be that data is collected and aggregated (not transformed) into a single response. But if data collected from multiple sources needs to also be transformed into a common response payload, then it feels pretty close to one of the typical use cases for orchestration.
  • The process is human-centric and can't be fully automated. Basically at some point a human has to take an action in other for the process to complete (e.g. approval of a credit card application, or a credit check) -BPM/Orchestration tools tend to be quite good at this.
  • There is a need to have very clear visibility of the end to end business processes. In traditional BPM tools, this is fairly straight forward, with Choreography / Events, although possible to monitor individual events, a form of correlation would be required to build an end to end view on the status of a business process.
It was "perhaps" for some of these reasons, that Netflix developed their own process orchestration engine called Netflix Conductor (now also open sourced). Their reason for developing this tool, with their own words:

"With peer to peer task choreography, we found it was harder to scale with growing business needs and complexities" --read this link for the complete article.

And it's not just Netflix, the like sof Camunda, Zeebe, Baker, seem to have spotted the need for such microservices oriented process engines, and thus their solutions fits well with this architectural style.

This is also one of the reasons why has the concept of semi-decoupled services, in other words, a service that's not entirely independent either because it runs on a share runtime or because it conducts an orchestration and therefore has runtime-coupling to other services. This compared to a fully-decoupled service that only implements choreography to interact with other services (aka Microservice).

Sample Use Case
In order to better illustrate what's being said, take the following sample use case:
  • A simple Credit Check process that determines if a customer credit score is adequate or not for a given transaction
  • In certain scenarios (e.g. just above threshold credit score), a manual human intervention is required to accept or reject an application.
  • The process can be implemented in a number of ways:
    1. As an orchestrated (synchronous) business process
    2. As a choreographed (asynchronous) business process - no process engine
    3. Choreographed but with process engine
let's have a deeper look:

1) Orchestrated (Synchronous) Credit Check Process
An orchestrated business process implemented with a traditional process engine tool of choice and synchronous as both request/reply would be within the same HTTP thread. As it's notable in the diagram, this is not really dramatically different from a traditional SOA architecture. The process could be a BPMN 2.0 engine or a BPEL orchestration tool (as many support human workflows).

Main advantage of this approach is that process metrics are clearly visible end to end plus it would be fairly straight forward to implement, including important capabilities such as exception and compensation handling. However, performance and scalability wise, as all HTTP requests are synchronous, if threads can't be served rapidly they could accumulate becoming a bottle neck (e.g. hang threads).

It's worth also noting these advantages are true so long that a process engine is already available for use, and it supports REST as entry point to trigger a process. Shall this not be the case, the effort involved in standing up new infrastructure would probably counter the benefits and other options might be more viable.

2) Choreographed (Asynchronous) Credit Check Process –no process engine
This alternative would be a fully choreographed thus asynchronous business process, purely based on a Microservices Architecture. No process engine used in this option. Services are completely independent and only communicate to each other using events via an event hub.

The main benefit of this option is the flexibility, extensibility and adaptability it delivers. Because all interactions would be via events, services are decoupled from one another, so therefore can be developed, deployed, tested and scaled independently. furthermore, this architecture, if done well, could scale to handle very large throughput as each service can independently scale.

However this approach would be complex to implement given the increased number of services and events. Coordinating the sequence of actions > events requires careful modelling. Also getting end to end visibility of the business process wouldn't be straight forward unless additional tolling or custom solutions are adopted. In addition, for the human workflow bits, a custom additional web application would have to be developed, no simple task if the approval workflows are completed (at this point the question would come: why reinvent the wheel when process orchestration engine do this out of the box and quite well?).

Lastly, it's also worth noting that in order to be able to effectively "call back" to the consumer application, some sort of callback handler implemented for example using Websockets or Server-Side Events would be required. This isn't necessarily simple and therefore would add to the complexity.

3) Choreographed (Asynchronous) Credit Check Process –with process engine
This option can be seen as the best of both worlds. Event Sourcing is still adopted as a pattern, however instead of having only services react to events in order to accomplished all desired process steps, a business process implemented in a process engine is adopted such as it can execute all desired steps by publishing or subscribing to the relevant steps. Using modern process engines such as the aforementioned, the process it self could either be compact enough to be deployed to its own runtime meaning it would effectively be Microservice in its own right, or like it's the case in Netflix Conductor, multiple (work) microservices could independently interact (via events) by interacting with Conductor's worklist and queue services.

Alternatively only a portion of the process could be implemented in the process engine e.g. the Human Workflow bit given that this feature alone would safe considerable effort if the tool does it out of the box.  Key to get this right, is to ensure that the process itself can be map to a single bounded context, and doesn't multiple ones as that would break one of the main principles of domain-driven design, fundamental in Microservices Architectures. As it was well stated by Bernd Rücker from Camunda (another great BPM engine that aligns well to Microservices Architectures), a much better way to define business processes is to "cut the end-to-end process into appropriate pieces which fit into the bounded contexts" and therefore aligning well to Microservices Architectures.

There are many benefits in this approach. For starting, it would be simpler (thought not simple) to implement to the previous option -but not as scalable and flexible. However because better visibility of the business process analytics would be available, it would perhaps compensate for the drawbacks.  In addition to the fact that be-spoking a custom web app for approvals wouldn't be required, certainly an option to consider.

Lastly, as as per previous approach, a call back handler would also be required in this approach.

Comparing the 3 approaches
The following table makes it simpler to visualise the pros/cons of each option:

Orchestrated (Synchronous) Credit Check Process
Choreographed (Asynchronous) Credit Check Process –no process engine
Choreographed (Asynchronous) Credit Check Process –with process engine
(++) Less complex. Fairly straight forward to implement. Known pattern.
(--) Complex to implement (many moving pieces) with additional technologies and considerations. Increased number of services and events to handle + plus a custom web application for human workflow.
(+-) Simpler not simple specially when compared to a purely choreographed process.
Scalability (ability to scale and handle high-throughout)
(--) Process can become the bottle neck if many parallel threads need to be handled.
(++) Very scalable and can handle large throughputs as each service can scale fully independently. Fully decoupled architecture.
(+-) Even though Services can scale independently, process ”could” become a bottleneck (depending heavily on what process engine is used). However because process is asynchronous it could handle more parallel threads than a synchronous option.
Visibility of end to end process
(++) Process can be monitored end to end. Which can be very useful from a business standpoint.
(+-) Visibility of an end to end process not as straight but still possible if right tooling is used.
(+) Process can be monitored almost end to end. Which can be very useful from a business standpoint.
Flexibility (ability to independently change/deploy components without affecting others)
(+-) Any change to an API consumed by the process will directly impact the process itself. This could be avoided by adding a virtualisation layer in between, but would result in additional complexity.
(++) Very flexible. Runtime decoupling via events. Almost all components can be evolved without impacting others (provided events are kept consistent)
(+) Good flexibility. Runtime decoupling via events. Almost all components can be evolved without impacting others (provided events are kept consistent) however majority of process still dependent in central process engine.

There are no silver bullets. No exceptions in this case. However once again, I think Netflix with its Conductor "microservice" orchestrator is changing the ball game on what we thought would be acceptable in a Microservices Architecture.

That said and answering the question in the title of this article, I don't think the days of BPM / process orchestration are dead per say. What I do think though, is that the way process orchestrations are implemented and processes modelled should (and probably will) change to to be more microservices / event oriented therefore be able to take part in a Choreography.

However it is equally important that the anatomy of the underlying architecture of more traditional process engines also change (evolve) not just to cope much higher levels of throughput but also to support the distributed deployment models prevalent in distributed system and microservices. In addition it would be desirable specific processes could be packaged and deployed within its own independent runtime and within its own bounded context.  On the meantime though, I hope that some of the approaches described in this article provides some inspiration on how to try and combine two paradigms that until recently (at least to me) seemed completely incompatible.

Lastly, I would like to thank Lucas Jellema, Lonneke Dickmans and specially Guido Schmutz and Sven Bernhardt for their valuable contributions to this article.