Transactional Infrastructure – Next generation cloud native architecture to enable innovative pricing models for SaaS products

Advances in PaaS and aPaaS offerings from all the major cloud service providers have enabled rapid adoption of the microservices architecture paradigm by SaaS vendors big and small. Furthermore, application of the Domain-Driven Design methodology has ensured that microservices are well aligned with business models decomposed into well-defined bounded contexts. This has also allowed SaaS vendors and customers alike to assemble complementary services from different providers to support new business models and/or value-added services. However, most vendors still base their pricing strategies on a standard multi-tiered per-user subscription model.

Per-use pricing models based on the serverless deployment mechanisms provided by the cloud providers (AWS Lambda, Azure Functions, etc.) are certainly helping SaaS vendors reduce their fixed infrastructure costs and pass the benefits on to their customers. However, the complexities involved in tracking per-subscriber use of shared resources, even at the function level, are formidable without further tweaks to the serverless deployment options for microservices. These complexities increase significantly for deployments in which multiple microservices share containers managed through platforms such as Kubernetes or aPaaS offerings such as Pivotal Cloud Foundry.

To address these complexities we introduce the concept of “Transactional Infrastructure”. It allows the well-understood best practices of designing multi-tenant systems to be incorporated into the architecture of cloud-native applications based on microservices. We shall illustrate this concept in the context of a hypothetical BIaaS product offering hosted on AWS.

BIaaS System Architecture

The figure above illustrates the typical components that would be involved in a BIaaS product offering. For the sake of this article we will ignore accessibility challenges with the source data systems by assuming that a simple VPN tunnel is sufficient to gain pull access to the source data stores. A push-based mechanism would need a software appliance deployed behind the corporate firewall, and the computing resources required for it would be the sole responsibility of the customer.

For the purpose of this article, we define a transaction as a distinct outcome that delivers tangible value to a business user who subscribes to the service. Consequently, we shall ignore the exploratory work associated with the initial configuration required to onboard a new customer, which would be charged as a one-time onboarding fee. We can also ignore subsequent configuration changes and ad-hoc queries performed by the customer, although the latter could still be brought within the ambit of a transaction. Thus, a transaction is a repeatable process that is executed either at scheduled times or on demand by any authorised user of the subscribing organisation or an individual.
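To make the definition concrete, the identifying attributes of such a transaction could be modelled as a small value object. This is a minimal sketch; the type and field names are illustrative, not part of any AWS API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

@dataclass(frozen=True)
class TransactionContext:
    """Identifies one billable, repeatable unit of work for a subscriber."""
    subscriber_id: str            # the subscribing organisation or individual
    transaction_type: str         # e.g. "scheduled-refresh" or "on-demand-report"
    transaction_id: str = field(default_factory=lambda: uuid4().hex)
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Example: an on-demand report run for a hypothetical subscriber "acme-corp"
ctx = TransactionContext("acme-corp", "on-demand-report")
```

Making the context immutable (`frozen=True`) means it can safely be attached to every event a transaction generates.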

A typical implementation of such a system using microservices deployed as Lambda functions on AWS might be as depicted in the figure below.

BIaaS on AWS with SWF

The AWS Simple Workflow Service (SWF) is used to indicate an imperative style of programming implemented to support asynchronous, loosely coupled execution. AWS Redshift is proposed as the OLAP data store, although other AWS offerings such as DynamoDB or the Relational Database Service (RDS) could also be considered. Secured S3 buckets are proposed to address the security concerns associated with a multi-tenant BLOB store. It is conceivable that instead of S3 the extracted data could be uploaded directly to a staging area in Redshift for further processing. Auxiliary AWS services such as IAM, Cognito, CloudWatch, SNS, and SES are proposed to address other common multi-tenancy concerns and notification requirements.

While this architecture does support asynchronous, loosely coupled execution, the imperative style of implementing business logic favours direct synchronous calls amongst the various microservices, with the workflow engine providing the overall orchestration function.

Alternatively, a truly asynchronous execution model can be implemented by adopting an event-driven, streams-based architecture, replacing the SWF service with a queue management service such as AWS Simple Queue Service (SQS) or Kafka. This is illustrated in the figure below.

BIaaS on AWS with SQS

A detailed discussion of event-driven architecture and reactive programming techniques is beyond the scope of this article; interested readers are encouraged to explore other articles on the topic.

As would be apparent from such articles, an event-driven architecture combined with reactive programming techniques is the solution of choice for optimising resource utilisation and, as we shall see, is more suitable for incorporating instrumentation to track per-transaction resource usage. Hence, in the rest of this article we shall focus on the event-driven architecture, although most of the techniques apply equally to the imperative model.

A trivial approach to accommodating multi-tenancy in a microservices-based architecture would be to create a complete deployment stack image and simply deploy it across different AWS accounts, each dedicated to a single subscribing user and/or organisation. The AWS CloudFormation, OpsWorks, and CodeDeploy services can be leveraged to support this deployment strategy across computing resources available as EC2 instances, the Elastic Container Service (ECS), and Lambda functions, combined with various storage and other services. Resource consumption can then easily be tracked at a per-subscriber level and billed at cost plus a managed-services overhead. However, this requires a certain fixed capacity to be reserved for each subscriber that cannot be leveraged for other subscribers, so the strategy is not sufficient to serve as a market differentiator for a SaaS vendor.

On the other hand, if all the components are deployed within a single AWS account, all resources can be optimally leveraged across the available load at any given time, helping the SaaS vendor minimise infrastructure costs. However, no suitable instrumentation services are available at the moment to help the vendor track resource utilisation even at a per-subscriber level, let alone at a per-transaction level. Services such as AWS CloudWatch provide instrumentation at a very coarse level, and fine-grained monitoring services such as AWS X-Ray or Zipkin are primarily distributed tracing and performance monitoring mechanisms that are not equipped to handle transaction context and/or resource utilisation.

The challenges outlined above can be addressed by incorporating the transaction context as a first-class concern within the microservices architecture and by extending the deployment automation mechanisms into add-on, vendor-specific deployment services built on the low-level Infrastructure-as-Code APIs exposed by all the cloud service providers. These vendor-specific deployment services automatically inject the subscriber and transaction context into the deployment metadata so that every invocation of the business services captures this context within the event data as well as the trace logs. A “transaction analysis” service can then scan and analyse the events generated and processed by the business services to determine the resource utilisation, and thus the costs, associated with each subscriber account and/or the specific transactions of interest. The resulting data can be passed along to accounting systems to assist with customer invoicing and reconciliation, taking into account the specific SLAs and usage-tier discounts the customer may have signed up for. The figure below illustrates the resulting “transactional infrastructure” architecture.
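The context-injection step can be sketched as a small pure function that stamps every business event with the subscription context before it is published. The `_context` envelope key is an assumption of this sketch; any agreed-upon key that the downstream analysis functions know to look for would serve.

```python
def enrich_event(event: dict, subscriber_id: str, transaction_id: str) -> dict:
    """Return a copy of a business event with the subscription context embedded.

    The original event is left untouched so the enrichment step is safe to
    retry in an at-least-once delivery pipeline.
    """
    enriched = dict(event)
    enriched["_context"] = {
        "subscriber_id": subscriber_id,
        "transaction_id": transaction_id,
    }
    return enriched

# Every event flowing through the queue now carries enough information for a
# transaction-analysis service to attribute its cost. Names are hypothetical.
raw = {"type": "extract.completed", "rows": 125000}
tagged = enrich_event(raw, "acme-corp", "txn-0042")
```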

BIaaS on AWS with Trans Infra

As part of the customer onboarding process the appropriate Deployment functions are invoked to create subscriber-specific Secure S3 buckets and other elements of the “Subscription” context, incorporating user identities federated using AWS Cognito. When a transaction is initiated by a specific user this subscription context is utilised appropriately; e.g. for an extract transaction it determines the S3 bucket into which the data is to be staged. Subsequently, all events generated are enriched by the Event Enrichment functions to embed the subscription context. The processing of these enriched events in turn embeds the subscription context into the trace logs captured in X-Ray. The Event Analysis functions then scan all the events on a continual or periodic basis and use the embedded subscription context to extract the corresponding trace logs from X-Ray, generating usage data aggregated to a one-minute scale for compute resources and a one-GB scale for storage resources. Finally, this resource usage data can be exported to accounting systems for invoicing purposes.
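The roll-up performed by the Event Analysis functions might look something like the sketch below: compute time rounded up to whole minutes and storage to whole GBs, per (subscriber, transaction) pair. The trace-record shape is hypothetical, not the actual X-Ray segment format.

```python
import math
from collections import defaultdict

def aggregate_usage(trace_records):
    """Roll raw trace records up to billable units per (subscriber, transaction)."""
    totals = defaultdict(lambda: {"compute_ms": 0, "storage_bytes": 0})
    for r in trace_records:
        key = (r["subscriber_id"], r["transaction_id"])
        totals[key]["compute_ms"] += r.get("duration_ms", 0)
        totals[key]["storage_bytes"] += r.get("bytes_stored", 0)

    GB = 1024 ** 3
    # Round up to the one-minute / one-GB billing granularity described above.
    return {
        key: {
            "compute_minutes": math.ceil(t["compute_ms"] / 60000),
            "storage_gb": math.ceil(t["storage_bytes"] / GB),
        }
        for key, t in totals.items()
    }

records = [
    {"subscriber_id": "acme", "transaction_id": "t1", "duration_ms": 95000},
    {"subscriber_id": "acme", "transaction_id": "t1", "bytes_stored": 3 * 1024 ** 3},
]
usage = aggregate_usage(records)
# usage[("acme", "t1")] → {"compute_minutes": 2, "storage_gb": 3}
```

The per-transaction totals in this form map directly onto line items in an invoice export.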

Thus, the proposed “Transactional Infrastructure” paradigm can help SaaS vendors, and even enterprises, gain detailed insights into their resource utilisation costs and leverage them for competitive advantage and/or internal efficiencies, thereby contributing to both the top and bottom lines.


To pair or not to pair: How to improve the productivity of a distributed Agile team?

In industry, productivity is defined as the effectiveness of effort as measured in terms of output per unit of input. Clearly, productivity is a key metric that every business venture aims to improve upon, and hence it is essential to choose a production methodology that helps a company optimize it. For the purpose of this article, in the context of the software development industry, we shall define the unit of input to be one person-hour of effort by a member of a development team. We thus exclude all other overheads such as executive management, marketing, sales, support, and the contributions of administrative staff.

All commercially successful products in the market today rely on geographically distributed design and manufacturing teams. So it is no surprise that the software product industry has also been quick to adopt the distributed model for pretty much the same reasons as the rest of the product manufacturing industry. The challenge, however, is that due to its ephemeral nature, a software product does not lend itself to a typical lifecycle of market analysis, design, development, quality assurance, distribution, sales, and support. This has led to the success of methodologies such as Agile that aim to compress the development lifecycle by requiring the members of a development team to collaborate very closely on all aspects of the product. This high degree of collaboration required across a distributed team not only complicates the measurement of the productivity metric but also requires adoption of unique strategies to improve upon the same.

A brief history of Software Development Methodologies

In the early days of modern computing software development was very closely coupled to the associated hardware on which it was intended to run. This allowed the development team to make a fair number of assumptions about the operating environment such as the operating system and available resources. This rigidity allowed for very formalized product development methodologies to be implemented rather successfully since the expected use and behavior of a product could be predicted with a high degree of accuracy. However, with the advent of the Internet followed by the World Wide Web, software has increasingly been decoupled from the hardware platform and this has led to a drastic change in how a software product is defined let alone how it is designed and implemented. The growing role of mobile devices in our daily lives has further changed the software development process.

The Agile software development methodology evolved out of this need to address the unpredictability of how the requirements for a software product change during its lifecycle from inception to design to implementation. It aims to avoid analysis paralysis and minimize waste of development effort by empowering the development team to start building working versions of the product using the shortest possible iteration cycles typically no more than two calendar weeks in duration. The methodology as it is known today is an amalgamation of a number of methodologies that preceded it unified by the Agile Manifesto published in 2001:

“We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

Individuals and interactions over Processes and tools

Working software over Comprehensive documentation

Customer collaboration over Contract negotiation

Responding to change over Following a plan

That is, while there is value in the items on the right, we value the items on the left more.” – Beck, Kent; et al. (2001). “Manifesto for Agile Software Development”. Agile Alliance.

Of the four manifesto items we shall focus on the first two items in the context of evaluating and improving productivity.

Pair Programming – Pros and Cons

By choosing to emphasize individuals and interactions over processes and tools, Agile requires that team members be highly motivated, allowing the team itself to be self-organizing. More importantly, collaboration between team members should be as efficient as possible, which promotes the practices of co-location and pair programming. Let us look at how these concepts can be applied to improve the productivity of distributed teams.

Co-location is the obvious challenge with a distributed team. However, it can be addressed by adopting a number of techniques and tools to ensure that the team members are collaborating as efficiently as they would be had they been co-located. Furthermore, as long as a suitable overlap is maintained between the working hours of a distributed team, productivity can be improved due to a longer “working day” for interdependent tasks.

The pair programming model helps address one of the key challenges of the agile development methodology: the lack of detailed elaboration of the problem statement for a task (so as to avoid “analysis paralysis”). A pair of programmers is able to work through a complex problem statement by leveraging their collective understanding and considering multiple solutions objectively. Pair programming also improves learning through constant knowledge sharing at a level of detail that is otherwise simply not possible. Rotating team members into different pairs not only further improves the collective knowledge of the team but also helps in team building and consequently job satisfaction, resulting in a highly motivated team. So then, is productivity a moot point? Interestingly, the jury still seems to be out on that!

A large number of studies have evaluated the impact of pair programming on the overall productivity of a team (1–4). While the general consensus seems to be that the total effort expended on a task by the pair either increases or remains the same, the benefits realized are harder to quantify in terms of code quality metrics such as defects and code readability. Furthermore, since team members have varying degrees of experience and expertise, the various pairing combinations, such as expert–expert, expert–novice, and novice–novice, may result in non-optimal situations such as high costs, “watching the master”, or disengagement. These challenges are further magnified when practicing pair programming across distributed teams.

Alternatives to Pair Programming

It is interesting to note that while the Agile manifesto emphasizes producing working software, it does not necessarily limit documentation to user stories. However, documentation typically is restricted to user stories, since they allow a clear hand-off point for a task from the non-technical members of the team, such as Product Owners and/or Story Authors, to the technical members, such as the Developers and QA Engineers. Unfortunately this also limits design activities to information architecture and user experience. Software architecture is often considered redundant since it typically focuses on creating models rather than working code. Furthermore, the scaffolding features provided in practically all technology stacks automate the generation of CRUD code for entities in most scenarios. Consequently, techniques such as Model Driven Architecture and Round Trip Engineering are rarely adopted in Agile projects, especially if all members of the team are co-located.

However, for distributed teams the adoption of appropriate tools can enable Model Driven Architecture and Round Trip Engineering techniques to improve productivity by optimally leveraging the varying levels of expertise available at different locations. Product owners co-located with subject matter experts can collaborate on architecture models that address complex aspects of the product at an abstract level, adopting techniques such as Domain-Driven Design and using modelling languages such as UML to generate models that are automatically converted to compilable code and handed over to the remote team, which then continues with techniques such as Test-Driven Development. Round trip engineering is then employed to allow updates to the models, if any, to be jointly reviewed at the end of each sprint/iteration. Similarly, Rapid Application Development (RAD) platforms such as WaveMaker and OutSystems, and Application Platform as a Service (aPaaS) providers such as Mendix, can be used to generate skeletons that can then be elaborated upon by remote teams.

This approach can be further extended once suitable maturity has been reached so as to implement the Software Factory model wherein there is higher emphasis on assembling new product features using reusable code widgets. Software product development can thus benefit from the high productivity usually associated with product factories.

To pair or not to pair: Conclusion

It is clear that there is no silver bullet that will guarantee optimal productivity across all scenarios. Hence we shall conclude by laying out some guidelines for choosing an approach especially when working with distributed teams.

To start with, it is important to baseline the current productivity of your team(s) using an appropriate metric, such as velocity or control charts based on cycle time, over a suitable time horizon. Once a baseline has been arrived at, set a realistic and tangible goal for the improvement desired over the chosen time period. More often than not a blanket statement is made about improving productivity, and that is a recipe for disaster since it is bound to make the team(s) feel under-appreciated. The challenge here is that budgetary constraints are rarely exposed to the entire team, and hence it is difficult to justify the need for productivity improvements in the first place.

Assuming that the current model is based neither on pair programming nor on software factories, the choice of a new methodology must take into account the existing culture of, and skill distribution within, the team. Pair programming puts specific demands on “the workstation” in terms of space and hardware, such as a large and preferably dual-monitor setup. Occupant density also plays an important role in preventing high noise levels that would otherwise be far too distracting for developers, who can no longer don their favorite headsets to block out ambient noise. Finally, a conscious effort must be made to ensure that pairing is truly a two-way street by encouraging different pairing techniques, such as ping-pong pairing, and by ensuring that pairs are changed frequently. It may also be possible to pair a developer with a story author or a QA engineer while employing techniques such as Behavior-Driven Development (BDD) or Acceptance Test-Driven Development (ATDD) using tools such as Cucumber. In such a scenario the non-developer member of the pair focuses on developing tests using the Gherkin language while the developer focuses on the coding tasks.

Adoption of Model Driven Architecture and RAD platforms requires the procurement of specialized tools to enable effective round trip engineering, adding to the cost burden of the project. Some of the latest frameworks, such as Spring 4 and Lombok, significantly reduce the boilerplate code required to support typical architectural concerns such as logging, data access, security, and exposing web service endpoints, thereby allowing developers to focus on business logic. Hence, investments can instead be made into “up-skilling” the team to use these new frameworks and new language features, such as lambda functions and reactive streams, while allowing capacity to refactor legacy code bases. Finally, a strong governance model based on feature branching and promiscuous merging should be implemented to ensure that Continuous Integration (CI) practices are continually improved upon to achieve lower defect leakage.

In conclusion, classical pair programming techniques are less likely to provide significant productivity benefits than the adoption of automation tools and modern frameworks, especially across distributed teams.


1. Cockburn, Alistair; Williams, Laurie (2000), The Costs and Benefits of Pair Programming, Proceedings of the First International Conference on Extreme Programming and Flexible Processes in Software Engineering (XP2000).

2. Williams, Laurie; Kessler, Robert (2003), Pair Programming Illuminated, Addison-Wesley, pp. 27–28, ISBN 0-201-74576-3.

3. Lui, Kim Man (2006), Pair programming productivity: Novice–novice vs. expert–expert, International Journal of Human–Computer Studies, 64 (9): 915–925.

4. Arisholm, Erik; Hans Gallis; Tore Dybå; Dag I.K. Sjøberg (2007), Evaluating Pair Programming with Respect to System Complexity and Programmer Expertise, IEEE Transactions on Software Engineering, 33 (2): 65–86.

Best practices for implementing Web Services based APIs


Application Programming Interfaces (APIs) have evolved in pace with computing paradigms, from shared statically linked libraries to completely decoupled web service endpoints. A common thread across all these forms is that the key to large-scale adoption of an API lies in the ease of orchestrating calls across the multiple methods/functions it contains. This allows an API to be highly granular and thereby amenable to a wide spectrum of usage scenarios. But, as any API developer will attest, this is easier said than done, primarily because most classical methods of API distribution, including Software Development Kits (SDKs), do not allow this orchestration metadata to be exposed in a dynamic manner that is easy to consume. The best recourse is to include API documentation and to rely on the reflection functionality available within the underlying programming language.

Most modern Integrated Development Environments (IDEs) leverage this embedded documentation, along with frameworks that rely on reflection, such as the JavaBeans specification, to provide the “IntelliSense” functionality that so many of us have come to rely upon. But this still does not provide any insight into the best sequence in which the functions within an API should be invoked to accomplish a particular task. Some APIs, notably the DirectX APIs, introduced the concept of a pipeline wherein you could register a series of callback functions to be invoked in the desired sequence, but the programmer still needs to rely on sample code to determine the sequencing of calls required to set up the pipeline. The web-services-based paradigm for developing and hosting APIs, however, has changed the picture significantly and holds the promise of API discoverability and explorability in a unique manner.

At first glance, developing APIs based on web service endpoints offers no benefits beyond loose coupling of the interface from the implementation, combined with distributed computing that allows for higher scalability. Discoverability of web-services-based APIs can be achieved through online registries that rely on an open format called APIs.json to expose suitable metadata about the APIs; this is very similar to repositories for static APIs such as Maven. However, a key benefit of implementing APIs as web services is the ability to include additional metadata in the response body that can be auto-interrogated to determine the next service call to be made. One such framework is based on the Hypertext Application Language (HAL) specification, and it allows for API explorability if the following best practices are followed:

  • Use of Links & Actions in responses: This could be used to allow for a dynamic and possibly configurable flow for API calls. For RESTful web services the HATEOAS constraint is a great way to implement this functionality.
  • Expose metadata as a separate resource or introduce a meta tag in the response: This can be leveraged to reduce the response size, either when multiple items share a set of attributes related to, say, taxonomy, or when contextually unnecessary information is always being included in the response. This is to be used differently from attribute selection via query parameters and requires a clear definition of what constitutes metadata.
  • Consider the use of the OData protocol for exposing data via web services as this allows for programming by convention rather than by contract.
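The first practice above, links and actions in responses, can be sketched as a minimal HAL-style envelope. HAL only mandates the `_links` structure with `href` entries; the resource fields and link relation names below are illustrative.

```python
def hal_response(resource: dict, links: dict) -> dict:
    """Wrap a resource in a minimal HAL envelope with navigational links."""
    return {**resource, "_links": {rel: {"href": href} for rel, href in links.items()}}

# A report resource that tells the client what it can do next, instead of the
# client hard-coding the call sequence (the HATEOAS idea in practice).
report = hal_response(
    {"id": "rpt-17", "status": "queued"},
    {
        "self": "/reports/rpt-17",
        "cancel": "/reports/rpt-17/cancel",
        "results": "/reports/rpt-17/results",
    },
)
```

A client that always follows the advertised relations, rather than constructing URLs itself, continues to work even if the server reorganises its URL space.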

In addition to the above best practices that would simplify API orchestration the following best practices should also be followed to allow for suitable instrumentation in web services based APIs and to ensure highest standards of security, scalability, and backward compatibility:

  • Make call tracing and debugging more efficient by requiring each request to include a timestamp and by assigning a request id to each request received. The request id should be returned in the response and logging of the same along with the timestamp should be encouraged within the client app.
  • Enable response caching, ideally, through the use of ETags or other mechanisms as might be applicable within a given context.
  • Prefer the use of JSON Web Tokens (JWTs) via OAuth 2.0 for implementing security.
  • Support expiry and renewal of JWTs via developing appropriate client SDKs.
  • Prefer URL-based API versioning, since that helps with DevOps automation, but query-parameter-based versioning can also be supported on an as-required basis. SDK versioning may be applied as well.
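The request-id and timestamp guideline in the first bullet can be sketched as a thin wrapper around a handler. In a real service this would live in framework middleware; the request and response shapes here are assumptions of the sketch.

```python
import time
from uuid import uuid4

def handle_request(request: dict, handler) -> dict:
    """Ensure every request carries a request id and timestamp, echoed back
    in the response so the client can log and correlate the call."""
    request_id = request.get("request_id") or uuid4().hex
    received_at = time.time()
    body = handler(request)
    return {
        "request_id": request_id,   # echoed for client-side correlation
        "received_at": received_at,
        "body": body,
    }

# The client supplied its own id, so the same value comes back.
response = handle_request(
    {"request_id": "req-123", "path": "/reports"},
    lambda req: {"reports": []},
)
```

If the client omits the id, the server generates one, so every log line on both sides can still be tied to a single call.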

Of course, the use of API hosting platforms such as Apigee can certainly help with adhering to a number of the best practices defined above, especially when combined with a microservices architecture paradigm for the implementation.

Taming the public cloud beast: Your monthly bill

Beyond a doubt, the public clouds have been a godsend to practically all the startups out there today. The ongoing price wars between the major players (Amazon, Microsoft, Google, IBM) have meant that the price per unit of compute/storage/network capacity has been on the decline. Even adoption amongst traditional large enterprises is on the increase, with success stories being written on the hour, every hour. How then does one explain articles such as the one below?

Here’s why this startup ditched Amazon Web Services by John Cook

And this is not a one-off case. Other similar articles are available, albeit they don’t exactly show up on the first page of most search queries pertaining to cloud pricing/costs, thanks to the excellent SEO efforts of all the big providers.

The bottom line: just as dining out every day at an unlimited buffet leads to obesity, ad-hoc usage of cloud computing resources leads to a bloated bill that can take many a startup by surprise. Is the answer then to simply jump ship and switch over to a private cloud, or worse yet, a traditional infrastructure model? All the players in the private cloud space are trying hard to convince you to do so; there is an excellent white paper from Eucalyptus to help you take the leap. Maybe not just yet, if some of the strategies outlined below are adopted appropriately.

Utilize compute resources for the shortest possible time: Throughout the history of the World Wide Web, the one golden rule followed religiously has been: always be available, lest your loyal consumer ditch you the instant your site is down. Given that public clouds rely on sharing the same physical resources across multiple customers, it should come as no surprise that for the longest time the cheapest pricing plan available was the one wherein you spun up reserved instances with a guaranteed uptime of 99.995% or better. Not only do you end up paying an upfront charge, but the costs can also spiral as you keep adding nodes. To add to your woes, as application and data complexity increases along with the upsurge in customers, you start spinning up the more expensive high-end instances.

Invest instead in re-architecting your applications to utilize micro instances by adopting a microservices-based approach, or better still, invest in building up your in-house DevOps skills to leverage the on-demand and spot pricing plans. The latest platform offerings, such as AWS Lambda from Amazon and Bluemix from IBM, provide a host of ready-to-use services that can be leveraged on an as-needed basis. On top of that, the newest auto-scaling offerings from some of the providers also allow you to spin up container-based compute instances instead of entire VMs.
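The economic appeal of the function-level model is that a handler is stateless, short-lived, and billed only for the milliseconds it runs. A minimal AWS-Lambda-style handler in Python might look like the sketch below; the business logic is a placeholder.

```python
def lambda_handler(event, context=None):
    """Do one small unit of work per invocation and exit immediately.

    On a serverless platform this function is invoked once per event, so idle
    capacity costs nothing; locally it is just an ordinary function call.
    """
    items = event.get("items", [])
    return {"processed": len(items), "status": "ok"}

# Simulate one invocation with a hypothetical event payload.
result = lambda_handler({"items": [1, 2, 3]})
```

The same code that runs cheaply at low volume scales out horizontally at high volume, which is what makes the pay-per-invocation model viable for a startup.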

Have a crystal clear strategy for processing raw usage data and/or archive it as quickly as you can: Success in boosting site traffic, which invariably leads to more business, brings with it a deluge of raw usage data that in turn holds the secrets to the next chapter of your growth. Hence it is very tempting to hold on to as much usage data as you can; plus, there may not be a clean separation between transactional and raw usage data. All the cloud providers leverage this aspect of the growth phase of any startup to drive up your monthly spend. It is therefore critical to watch your storage needs very closely and adapt to increasing raw usage data very quickly.

To start with, ensure that you can clearly demarcate between transactional data and all other data generated until the time the transaction is actually completed. Also make sure you can easily sift between anonymous usage data and data associated with a known, logged-in customer. Store all usage data using object-based storage services such as AWS S3, limiting each bucket to a relatively short time duration, say five minutes, and employ data aggregation to reduce data volume by aggregating to a longer duration, say one hour. The key here is not to try to convert the data to a full-fledged data warehouse/mart schema at this stage. Once the raw data has been thus processed it should be archived on a daily basis using solutions such as AWS Glacier. If you don’t have a strategy to further utilize the semi-processed usage data to populate a data warehouse, then archive that as well, say on a weekly or monthly basis.
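The five-minute-to-hourly aggregation step above can be sketched as a simple roll-up. The event shape, `(epoch_seconds, count)` pairs, is an assumption of this sketch, not any particular provider's format.

```python
from collections import Counter

def hourly_rollup(events):
    """Aggregate raw usage events into hourly counts, trading fine-grained
    detail for a much smaller storage footprint before archival."""
    hours = Counter()
    for ts, count in events:
        hours[ts - (ts % 3600)] += count  # floor timestamp to the hour
    return dict(hours)

# Twelve five-minute buckets collapse into a single hourly record.
raw = [(3600 + i * 300, 10) for i in range(12)]
rolled = hourly_rollup(raw)
# rolled → {3600: 120}
```

Running this before the data lands in Glacier keeps the archive small while preserving the trend information you are most likely to query later.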

Reduce network traffic to the compute instances and between different availability zones: This is probably the most easily overlooked aspect of your monthly bill. Most savvy startups will quickly utilize a CDN for static content and script caching, thereby reducing network traffic to the compute instances hosting the web applications. But as your overall cloud infrastructure grows and you start spanning availability zones for high availability and disaster recovery, the corresponding increase in network traffic across availability zones will start adding up quickly. Luckily, your startup will have to be wildly successful before this component of the monthly bill requires too much attention, and by that time you will be able to afford the high-end talent required to optimize the architecture further.

The kind of monthly spend on public clouds described in the article referenced at the start represents a dream come true for most startups just out of the gate, but it is always a good idea to adopt the right strategies and architectures to manage your monthly spend from the very beginning, when even a thousand bucks out of your pocket can seem like a million. Furthermore, the right architecture will help you eventually transition to a hybrid cloud model at the right time in the future with the least amount of effort and risk.

This blog was first published on the ContractIQ site on December 17, 2014.

Architecting solutions for Cluster Computing as opposed to Cloud Computing

Recently, while evaluating storage options as part of a consulting engagement, I came across the Isilon offering from EMC, and some of the articles in the associated literature talked about the use of Isilon for cluster computing. Given that the emphasis is still on storage, specifically HDFS, it was intriguing that the possibility of compute functions being delegated to nodes, à la MapReduce, was discussed quite a bit. Further reading into what is considered cluster computing got me to the Wikipedia article on Computer Clusters.

So it is quite clear what the difference is between cloud computing and cluster computing, to the extent that we can even safely say that cluster computing is a subset of cloud computing, especially given the offerings from Amazon Web Services such as Elastic MapReduce and the newly launched Lambda. Hence, in this blog article I will focus instead on how a solution needs to be architected to leverage cluster computing effectively and get the best bang for the buck out of cloud computing.

Let’s start by addressing the biggest challenge with implementing cluster computing: co-location of data on the compute node. While this is an easy problem to solve when utilizing the MapReduce paradigm, it represents a real challenge when using cluster computing to achieve scalability in typical usage scenarios. Although the use of technologies such as InfiniBand may be an option in some cases, a cost-benefit analysis would render it impractical for most typical business applications.

One immediate option is to utilize a microservices-based architecture. But it is clear from the description in the seminal article by Martin Fowler that it does not address co-location of transaction data, although he does talk about decentralized data management and polyglot persistence. Clearly it is not really meant to allow for easy adoption in a cluster computing scenario. Interestingly, though, there is a reference to the Enterprise Service Bus as an example of smart endpoints, and that is what got me thinking about extending the concept to cluster computing.

The trick, then, is to apply the event-based programming model to the microservices architecture, leveraging in turn the smart-endpoints aspect. All the transaction data needs to be embedded in the event, combined with any contextual state data. Through the use of interceptors or other adapters, the data can be deserialized into the appropriate service-specific representation. This is key, since the service need not, and actually should not, be built to consume the event data structure.
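A minimal sketch of this interceptor pattern might look as follows (the envelope fields, `Invoice` type, and service names are hypothetical, chosen only to illustrate the separation):

```python
import json
from dataclasses import dataclass

@dataclass
class Invoice:
    """Service-specific representation; the service never sees the raw event."""
    customer_id: str
    amount: float

def invoice_adapter(raw_event: str) -> Invoice:
    """Interceptor: deserialize the event envelope into the service's own type."""
    envelope = json.loads(raw_event)
    payload = envelope["transaction"]          # all transaction data travels in the event
    return Invoice(customer_id=payload["customer_id"], amount=payload["amount"])

def billing_service(invoice: Invoice) -> str:
    """The microservice works only with its own representation."""
    return f"billed {invoice.customer_id}: {invoice.amount:.2f}"

event = json.dumps({
    "type": "order.completed",
    "context": {"tenant": "acme"},             # contextual state rides along too
    "transaction": {"customer_id": "c-42", "amount": 19.99},
})
result = billing_service(invoice_adapter(event))   # → "billed c-42: 19.99"
```

Because the event carries everything the service needs, the node handling it requires no local data, which is precisely what sidesteps the co-location problem.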

While the approach described above would require you to invest significantly in setting up the requisite infrastructure components to provision compute nodes on the fly to handle events, the recent release of the AWS Lambda service provides us with an opportunity to apply this concept more easily, albeit with some new terminology: microservices are implemented as AWS Lambda functions! It would be very interesting to figure out if argument reduction is supported! Check this blog again in a few weeks to find out…
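In Lambda terms, the adapter simply becomes the handler's first few lines; the `(event, context)` signature is Lambda's standard Python contract, while the event shape and service function here are illustrative assumptions:

```python
def handler(event, context=None):
    """Lambda-style entry point: adapt the incoming event, then delegate
    to service logic that knows nothing about the event's wire format."""
    order = {
        "id": event["order_id"],
        "total": sum(item["qty"] * item["price"] for item in event["items"]),
    }
    return process_order(order)

def process_order(order):
    """The microservice proper, working only on its own representation."""
    return {"order_id": order["id"], "total": round(order["total"], 2)}
```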

Virtual Machines, Containers, and now LXD: What’s best for me?

First there were virtual machine instances running on bare-metal or paravirtual hypervisors, then came containers allowing for better utilization of virtual machine instances, and now we have the Linux Container Daemon (LXD), pronounced “lex-dee”. As the tag line goes, “The new hypervisor isn’t a hypervisor, and it’s much, much faster”; one quickly realizes that hyperbole is not where your troubles end! Instead, your choices for virtualization just got trickier to sort out.

So here are some simplifying rules to get going quickly:

1. If you have the luxury of hosting your application entirely on the public cloud, stick to the good old fashioned VMs managed using the auto-scale functionality provided by the cloud platform of your choice.

2. If you have the luxury of creating your private cloud using paid software such as the vCloud suite, stick to the good old fashioned VMs managed using the components of the suite such as vRealize.

3. If you are stuck with the free bare-metal hypervisors such as vSphere ESXi or Hyper-V (not entirely free), or with paravirtual hypervisors such as KVM, then get the right DevOps skills on your team and use Docker containers managed with Fig for development and Chef for production environments.

4. If you are the brave spirit capable of dealing with hardcore hardware and low-level kernel configuration matters, opt for the Metal As A Service (MAAS) offering from Canonical combined with Juju.

5. If you are lucky enough to have the hardware set up exactly as required for running OpenStack/CloudStack, and have the chops to customize the provided management apps, then by all means rock your world by replicating most of the basic AWS offerings within your datacenter.

6. Finally, if you are truly bored out of your mind by all the mundane options listed above, have the distinct air of “Been there – Done that” around you, and carry the appropriate management support in your back pocket, venture into the brand-new entrants such as LXD!

Of course you might also just be lucky enough to be in my position: Make recommendations and then sit back to enjoy the show! And, of course, come back to this blog for more …

Developer Workstations – The untapped Private Cloud

The “Infrastructure as a Service” (IaaS) model for cloud computing has matured significantly in recent years, with a large number of providers offering a variety of public, private, and hybrid clouds at very competitive prices. However, adoption of the “Platform as a Service” (PaaS) model is still lagging. The primary reason for this is quite easy to understand: most developers and delivery managers are least impacted by the IaaS model, and hence they can continue developing and delivering software the same way as before. Transitioning to a PaaS model, on the other hand, may involve significant changes to the software development and delivery process, and thus represents a significant risk to the overall success of the project.

Challenges to PaaS adoption in Development Teams

One of the key challenges in adopting a PaaS model is the cost of making the target platform available to the entire development team for all functions. Instead, the typical use case is to implement Continuous Integration using PaaS offerings from vendors such as CloudMunch, OpenShift, Heroku, etc. But this still prevents developers from having access to all the features of the target platform, thereby constraining innovation to a small set of “experts”. Clearly, lack of infrastructure should not be a limiting factor for innovation and productivity.

Underutilization of Developer Workstations

Due to the exponential growth in computing power combined with the ever-decreasing cost of hardware, every developer is typically provided with either a laptop or a desktop that has sufficient computing power to locally host an application server, a database, and any other required software for their own private use. Doing so allows for a number of benefits, from telecommuting to ease of adoption of the Agile development methodology. Furthermore, lack of consistent high-speed broadband access, whether due to poor infrastructure in developing countries or network congestion over the airwaves in developed nations, limits access to shared enterprise PaaS resources for developers who are rarely tethered to a desk. On the other hand, synchronization through a centralized code repository requires very little bandwidth. Finally, these local environments will eventually start varying from each other, and more importantly from the target environment, thereby introducing the risk that submissions from different team members may not play well together in the CI environment.

Cloud Infrastructure Management Frameworks to the rescue …

Private clouds based on open-source software such as OpenStack or CloudStack may offer a solution wherein each developer workstation is a virtual machine host onto which an image of the target platform can be launched on demand. These images can be auto-generated as part of the daily CI cycle. A developer would thus have access to the overall platform, either to try out new features or to make configuration changes to resolve certain issues. The same infrastructure can also be leveraged to provide additional computing resources for the CI cycle or for load testing.

But not without some additional innovation

At present, the primary constraint of the open-source cloud management frameworks is the homogeneity of the host hardware, which severely limits the use of developer workstations as hosts. Furthermore, the use of bare-metal hypervisors is not feasible, and the most popular operating systems for developer workstations are not ideally suited for hosting type 2 hypervisors. Instead, as a first step, it is recommended that an approach based on desktop virtualization products such as VirtualBox or VMware Player be adopted. As part of the daily CI build, a new version of the target environment can be packaged as an appliance and made available to developers for download. Thus, in addition to getting the latest code at the start of the day, developers can also get the latest target environment and fire it up on their workstations. Additionally, some basic agent software can be developed to allow the developer to add their local guest OS instance to a resource pool for use in intensive computing tasks such as load testing. Simultaneously, the open-source frameworks can be extended to allow for a more heterogeneous mix of hosts.
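The daily appliance refresh could be sketched along these lines (the CI URL, version strings, and file naming are hypothetical; `VBoxManage import` is VirtualBox's standard command for importing an OVA appliance):

```python
def appliance_update_plan(local_version: str, latest_version: str,
                          ova_url_template: str = "https://ci.example.com/appliances/target-env-{v}.ova"):
    """Decide whether the developer's local appliance is stale; if so, return
    the download URL and the VBoxManage import command needed to refresh it."""
    if local_version == latest_version:
        return None  # already on the latest CI build; nothing to do
    ova_name = f"target-env-{latest_version}.ova"
    return {
        "download": ova_url_template.format(v=latest_version),
        "import_cmd": ["VBoxManage", "import", ova_name],
    }
```

An agent running at developer login could call this, fetch the OVA when a plan is returned, and run the import command, so each workstation starts the day on the same target environment as CI.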

This blog was originally published on January 10, 2013.