Microservices workflow orchestration

A recurring pattern in software architecture is the need to trigger a process or workflow that is implemented across multiple microservices and then report the results to the user when the process completes.

In a previous project, I faced this issue when building a SaaS application in the Intelligent Document Processing (IDP) space. The application was supposed to take a collection of scanned pages, split it into documents, and perform several document understanding tasks on each document. The pipeline mixed per-page-bundle, per-page and per-document processing steps.

Given the desire to develop each step independently and to scale each step independently (e.g. page OCR consumes more resources than other tasks), I designed a system around a message bus (RabbitMQ) and individual workers that pull requests from message queues.

Unfortunately, there aren’t a whole lot of easy-to-use solutions available for this type of design. Googling for “rabbitmq workflow orchestration”, the most helpful link I get is an article that recommends the use of BPMN for this type of design. That approach is rather centered on the Java ecosystem. For my use case I needed something that worked well in Python and would preferably be language agnostic. I ended up building a custom solution for this company.

However, in design conversations about system architecture I often see scenarios where a similar tool would be desirable. That motivated me to start working on a new workflow orchestration project. It is still a work in progress, but it is able to execute the kind of workflows that I’ve encountered in the past.

Workflows can be declared in simple YAML syntax as a set of steps with dependencies. Sub-tasks can be triggered (as in my original requirement of processing a collection of pages, with per-page and per-document steps), and workflows can mix services written in different programming languages.
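
As a rough illustration of what "a set of steps with dependencies" means in practice, the sketch below models a workflow as a plain Python dictionary (standing in for the parsed YAML) and derives a valid execution order with the standard-library graphlib module. The step names (split, ocr, classify, extract) and the depends_on key are hypothetical and chosen for this example only; they are not the project's actual schema.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow declaration, shown as the dict a YAML parser
# might produce. Step names and the "depends_on" key are illustrative.
WORKFLOW = {
    "name": "document-processing",
    "steps": [
        {"name": "split", "depends_on": []},
        {"name": "ocr", "depends_on": ["split"]},
        {"name": "classify", "depends_on": ["ocr"]},
        {"name": "extract", "depends_on": ["ocr", "classify"]},
    ],
}


def execution_order(workflow: dict) -> list[str]:
    """Return one valid order in which the steps can run."""
    graph = {step["name"]: set(step["depends_on"]) for step in workflow["steps"]}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
```

A real orchestrator would not run the steps strictly in sequence; it would dispatch every step whose dependencies are already satisfied. The ordering above is only meant to show how the dependency declarations constrain execution.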

There are existing open source orchestration tools to manage batch workflows such as Apache Airflow and Argo Workflows. In these systems, for each batch job, a new process instance is created and passed command line arguments that specify the workflow parameters.

This project provides similar functionality for online microservices. For instance, in machine learning use cases it is common that loading the inference process takes on the order of 30 s to 1 min, while processing a single user request takes on the order of 10-100 ms. A system designed to process tens or hundreds of requests per minute can’t afford a batch approach that pays the loading cost on every request.
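
To make the cost difference concrete, here is a back-of-the-envelope calculation. The specific figures (100 requests per minute, 100 ms per request, 30 s load time) are picked from the ranges quoted above purely for illustration, and the function name is made up for this sketch.

```python
def busy_seconds_per_minute(requests_per_minute: int,
                            request_seconds: float,
                            load_seconds: float = 0.0) -> float:
    """Total compute time consumed per minute of wall-clock time.

    load_seconds is paid once per request in the batch model and is
    zero for a service that is already loaded.
    """
    return requests_per_minute * (load_seconds + request_seconds)


# Illustrative figures taken from the ranges quoted in the text.
RATE = 100       # requests per minute
REQUEST = 0.1    # 100 ms of processing per request
LOAD = 30.0      # 30 s to load the inference process

batch = busy_seconds_per_minute(RATE, REQUEST, LOAD)
online = busy_seconds_per_minute(RATE, REQUEST)

print(f"batch:  {batch:.0f} s of compute per minute")
print(f"online: {online:.0f} s of compute per minute")
```

Under these assumptions the batch model burns roughly 3010 s of compute for every minute of traffic, which means about 50 processes running in parallel just to keep up, while a pre-loaded service needs roughly 10 s and fits comfortably on a single worker.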

Instead, all microservices are pre-loaded and managed as standard online services, except that instead of receiving REST requests from a load balancer, they receive requests from and post responses to an AMQP message queue. This is done so that the logic of determining the next step in the workflow does not have to be distributed across the individual services. Debugging is also simplified, as the workflow manager tracks the state of each user request.
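
The division of responsibilities can be sketched with a small single-process simulation. This is not the project's code: Python's queue.Queue stands in for the AMQP queues, the workflow is a hypothetical linear chain of three steps, and all names (STEPS, worker, WorkflowManager, run) are invented for the example. The point is only that workers know nothing about what comes next, while the manager holds the per-request state and picks the next step.

```python
import queue

# Hypothetical linear workflow; each step has its own request queue.
STEPS = ["split", "ocr", "classify"]

# In a real deployment these would be AMQP queues; queue.Queue is a stand-in.
request_queues = {step: queue.Queue() for step in STEPS}
response_queue = queue.Queue()


def worker(step: str) -> None:
    """Handle one message for `step`. The worker only knows its own task."""
    message = request_queues[step].get()
    result = f"{step}({message['payload']})"  # placeholder for the real work
    response_queue.put({"request_id": message["request_id"],
                        "step": step, "payload": result})


class WorkflowManager:
    """Tracks per-request state and decides which step runs next."""

    def __init__(self) -> None:
        self.state = {}  # request_id -> list of completed step names

    def start(self, request_id: str, payload: str) -> None:
        self.state[request_id] = []
        self._dispatch(request_id, STEPS[0], payload)

    def _dispatch(self, request_id: str, step: str, payload: str) -> None:
        request_queues[step].put({"request_id": request_id, "payload": payload})

    def handle_response(self):
        """Consume one response; return the final payload when done, else None."""
        msg = response_queue.get()
        done = self.state[msg["request_id"]]
        done.append(msg["step"])
        if len(done) == len(STEPS):
            return msg["payload"]
        self._dispatch(msg["request_id"], STEPS[len(done)], msg["payload"])
        return None


def run(request_id: str, payload: str):
    """Drive one request through all steps (single-threaded simulation)."""
    manager = WorkflowManager()
    manager.start(request_id, payload)
    result = None
    for step in STEPS:
        worker(step)
        result = manager.handle_response()
    return result


if __name__ == "__main__":
    print(run("r1", "pages"))
```

Because the manager's state dictionary records which steps have completed for each request, inspecting it at any point shows where a request currently is, which is the debugging benefit mentioned above.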

I’m looking forward to getting some feedback. Drop me a line if you think it could be useful for a problem you are working on, or if you have feature requests.