Transactional Infrastructure – A next-generation cloud-native architecture to enable innovative pricing models for SaaS products

The advances in PaaS and aPaaS offerings from all the major cloud service providers have enabled rapid adoption of the microservices architecture paradigm by SaaS vendors big and small. Furthermore, the application of the Domain-Driven Design methodology has ensured that microservices are well aligned with business models decomposed into well-defined bounded contexts. This has also allowed SaaS vendors and customers alike to assemble complementary services from different providers and integrate them to support new business models and/or value-added services. However, most vendors still continue to base their pricing strategies on a standard multi-tiered per-user subscription model.

The per-use pricing models enabled by the serverless deployment mechanisms offered by the cloud providers (AWS Lambda, Azure Functions, etc.) are certainly helping SaaS vendors reduce their fixed infrastructure costs and pass the benefits on to customers; but the complexities involved in tracking per-subscriber use of shared resources, even at the function level, are formidable without further tweaks to the serverless deployment options for microservices. These complexities increase significantly when multiple microservices are deployed into shared containers managed through platforms such as Kubernetes or aPaaS offerings such as Pivotal Cloud Foundry.

To address these complexities we introduce the concept of “Transactional Infrastructure”. It allows the well-understood best practices for designing multi-tenant systems to be incorporated into the architecture of cloud-native applications based on microservices. We shall illustrate this concept in the context of a hypothetical BIaaS (Business Intelligence as a Service) product offering hosted on AWS.

BIaaS System Architecture

The figure above illustrates the typical components involved in a BIaaS product offering. For the sake of this article we will ignore accessibility challenges with the source data systems by assuming that a simple VPN tunnel is sufficient to gain pull access to the source data stores. A push-based mechanism would require a software appliance to be deployed behind the corporate firewall, and the computing resources required for it would be the sole responsibility of the customer.

For the purpose of this article, we define a transaction as a distinct outcome that delivers tangible value to a business user who subscribes to this service. Consequently, we shall ignore the exploratory work associated with the initial configuration required to onboard a new customer onto the service, which would be charged as a one-time onboarding fee. We can also ignore subsequent configuration changes and ad-hoc queries performed by the customer, although the latter could still be brought within the ambit of a transaction. Thus, a transaction is a repeatable process executed either at scheduled times or on demand by any authorised user of the subscribing organisation or an individual.

A typical implementation of such a system using microservices deployed as Lambda functions on AWS might be as depicted in the figure below.

BIaaS on AWS with SWF

The AWS Simple Workflow Service (SWF) is used here to indicate an imperative style of programming implemented to support asynchronous, loosely coupled execution. AWS Redshift is proposed as the OLAP data store, although other AWS offerings such as DynamoDB or the Relational Database Service (RDS) can also be considered. The AWS Simple Storage Service, appropriately secured (“Secure S3”), is proposed to address the security concerns associated with a multi-tenant BLOB store. It is conceivable that instead of S3 the extracted data could be uploaded directly to a staging area in Redshift for further processing. The auxiliary AWS services such as IAM, Cognito, CloudWatch, SNS, and SES are proposed to address other common multi-tenancy concerns and notification requirements.

While this architecture does support asynchronous, loosely coupled execution, the imperative style of implementing business logic favours direct synchronous calls amongst the various microservices, with the workflow engine providing the overall orchestration function.

Alternatively, a truly asynchronous execution model can be implemented by adopting an event-driven, streams-based architecture, replacing the SWF service with a queue management service such as the AWS Simple Queue Service (SQS) or Kafka (https://kafka.apache.org/). This is illustrated in the figure below.

BIaaS on AWS with SQS
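
As a rough illustration of how a transaction might enter such a queue-based pipeline, the sketch below publishes a transaction-initiation event to SQS using boto3. The queue name, payload shape, and message attributes are assumptions for illustration rather than a prescribed schema.

```python
# Hedged sketch: publishing a transaction-initiation event to SQS with boto3.
# The queue name, payload shape, and attribute names are illustrative assumptions.
import json
import uuid

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.get_queue_url(QueueName="biaas-transaction-events")["QueueUrl"]


def publish_transaction_event(subscriber_id: str, transaction_type: str, payload: dict) -> str:
    """Publish an event that downstream microservices consume asynchronously."""
    event = {
        "eventId": str(uuid.uuid4()),
        "transactionType": transaction_type,  # e.g. "extract", "transform", "report"
        "subscriberId": subscriber_id,
        "payload": payload,
    }
    response = sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(event),
        MessageAttributes={
            "subscriberId": {"DataType": "String", "StringValue": subscriber_id},
            "transactionType": {"DataType": "String", "StringValue": transaction_type},
        },
    )
    return response["MessageId"]
```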

A detailed discussion of event-driven architecture and reactive programming techniques is beyond the scope of this article; interested readers are referred to other articles on the topic, some of which are listed below:

https://www.infoq.com/news/2017/11/event-sourcing-microservices
https://www.infoq.com/news/2015/06/ddd-events-microservices
https://www.infoq.com/presentations/cloud-native-kafka-netflix
https://www.infoq.com/presentations/reactive-ddd-distributed-systems

As these articles make apparent, an event-driven implementation architecture and reactive programming techniques are the solution of choice for optimising resource utilisation and, as we shall see, are more suitable for incorporating instrumentation to track per-transaction resource usage. Hence, in the rest of this article we shall focus on the event-driven architecture, although most of the techniques apply equally to the workflow-based variant.

A trivial approach to accommodating multi-tenancy in a microservices-based architecture would be to create a complete deployment stack image and simply deploy it across different AWS accounts, each dedicated to a single subscribing user and/or organisation. The AWS CloudFormation, OpsWorks, and CodeDeploy services can be leveraged to support this deployment strategy across computing resources available as EC2 instances, the Elastic Container Service (ECS), and Lambda functions, combined with various storage and other services. Resource consumption can then easily be tracked at a per-subscriber level and billed at cost plus a managed-services overhead. However, this requires a certain fixed capacity to be reserved for each subscriber that cannot be leveraged for other subscribers. Thus, this strategy is not sufficient to serve as a market differentiator for a SaaS vendor.
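
For concreteness, a minimal sketch of this stack-per-subscriber approach is shown below, assuming a shared CloudFormation template parameterised by subscriber; the template file, parameter keys, and naming conventions are hypothetical.

```python
# Hedged sketch: one CloudFormation stack per subscriber, created from a shared template.
# The template file and its SubscriberId/Tier parameters are hypothetical.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")


def deploy_subscriber_stack(subscriber_id: str, tier: str) -> str:
    with open("biaas-stack-template.yaml") as template:
        template_body = template.read()
    response = cfn.create_stack(
        StackName=f"biaas-{subscriber_id}",
        TemplateBody=template_body,
        Parameters=[
            {"ParameterKey": "SubscriberId", "ParameterValue": subscriber_id},
            {"ParameterKey": "Tier", "ParameterValue": tier},
        ],
        # Assumes the template creates per-subscriber IAM roles.
        Capabilities=["CAPABILITY_NAMED_IAM"],
        Tags=[{"Key": "subscriberId", "Value": subscriber_id}],
    )
    return response["StackId"]
```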

On the other hand, if all the components are deployed within a single AWS account, all resources can be optimally leveraged across the entire available load at any given time, helping the SaaS vendor minimise infrastructure costs. However, no suitable instrumentation services are currently available to help the vendor track resource utilisation even at a per-subscriber level, let alone at a per-transaction level. Services such as AWS CloudWatch provide instrumentation at a very coarse level, and fine-grained monitoring services such as AWS X-Ray or Zipkin (https://zipkin.io/) are primarily distributed tracing and performance monitoring mechanisms that are not equipped to handle transaction context and/or resource utilisation.

The challenges outlined above can be addressed by treating the transaction context as a first-class concern within the microservices architecture and by extending the deployment automation mechanisms into add-on, vendor-specific deployment services built on the low-level Infrastructure-as-Code APIs exposed by all the cloud services providers. These vendor-specific deployment services automatically inject the subscriber and transaction context into the deployment metadata so that every invocation of the business services can capture this context within the event data as well as the trace logs. A “transaction analysis” service can then scan and analyse the events being generated and processed by the business services to determine the resource utilisation, and thus the costs, associated with each subscriber account and/or the specific transactions of interest. The resulting data can be passed along to accounting systems to assist with customer invoicing and reconciliation, taking into account the specific SLAs and usage-tier discounts that the customer may have signed up for. The figure below illustrates the resulting “transactional infrastructure” architecture.

BIaaS on AWS with Trans Infra
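
One possible shape of such a vendor-specific deployment service, sketched with boto3, is to stamp the deployed business functions with the subscriber context through environment variables and resource tags so that every invocation can emit it; the function names, variable keys, and tag names below are assumptions, not a prescribed mechanism.

```python
# Hedged sketch: a vendor-specific "deployment service" that stamps an already deployed
# Lambda function with subscriber context via environment variables and resource tags.
# Function names, variable keys, and tag names are illustrative assumptions.
import boto3

lam = boto3.client("lambda", region_name="us-east-1")


def inject_subscription_context(function_name: str, function_arn: str,
                                subscriber_id: str, plan: str) -> None:
    # Environment variables let the business code attach the context to every event it emits.
    lam.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": {"SUBSCRIBER_ID": subscriber_id, "PLAN": plan}},
    )
    # Tags let the transaction analysis service attribute resource usage per subscriber.
    lam.tag_resource(
        Resource=function_arn,
        Tags={"subscriberId": subscriber_id, "plan": plan},
    )
```

A per-transaction context can be attached in a similar fashion at invocation time, for example as message attributes on the triggering event.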

As part of the customer onboarding process, the appropriate Deployment functions are invoked to create subscriber-specific Secure S3 buckets and other appropriate “Subscription” context, incorporating user identities federated using AWS Cognito. When a transaction is initiated by a specific user, this subscription context is utilised appropriately; for example, for an extract transaction it determines the S3 bucket into which the data is to be staged. Subsequently, all events generated are enriched by the Event Enrichment functions to embed the subscription context. The processing of these enriched events in turn embeds the subscription context into the trace logs captured in X-Ray. The Event Analysis functions then scan all the events on a continual or periodic basis and use the embedded subscription context to extract the corresponding trace logs from X-Ray, generating usage data aggregated to a one-minute scale for compute resources and a one-GB scale for storage resources. Finally, this resource usage data can be exported to accounting systems for invoicing purposes.
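
A minimal sketch of an Event Enrichment function is shown below, assuming an SQS-triggered Lambda with active X-Ray tracing enabled; the queue names, the context lookup, and the field names are illustrative assumptions.

```python
# Hedged sketch of an "Event Enrichment" Lambda: it consumes raw transaction events from
# SQS, embeds the subscription context, annotates the X-Ray trace, and forwards the
# enriched event. Queue names, the context lookup, and field names are assumptions.
import json
import os

import boto3
from aws_xray_sdk.core import xray_recorder

sqs = boto3.client("sqs")
ENRICHED_QUEUE_URL = os.environ["ENRICHED_QUEUE_URL"]  # hypothetical configuration


def lookup_subscription_context(subscriber_id: str) -> dict:
    # In a real system this would come from a subscription store (e.g. DynamoDB).
    return {"subscriberId": subscriber_id, "plan": "standard", "s3Bucket": f"biaas-{subscriber_id}"}


def handler(event, context):
    for record in event["Records"]:  # SQS-triggered Lambda event shape
        body = json.loads(record["body"])
        subscriber_id = body["subscriberId"]
        body["subscriptionContext"] = lookup_subscription_context(subscriber_id)

        # Embed the context in the trace (assumes active tracing is enabled) so the Event
        # Analysis functions can correlate X-Ray usage data to subscriber and transaction.
        xray_recorder.begin_subsegment("enrich-event")
        xray_recorder.put_annotation("subscriberId", subscriber_id)
        xray_recorder.put_annotation("transactionType", body.get("transactionType", "unknown"))
        xray_recorder.end_subsegment()

        sqs.send_message(QueueUrl=ENRICHED_QUEUE_URL, MessageBody=json.dumps(body))
```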

Thus, the proposed “Transactional Infrastructure” paradigm can help SaaS vendors, and even enterprises, gain detailed insights into their resource utilisation costs and apply them to gain competitive advantages and/or internal efficiencies, thereby contributing to both the top and the bottom lines.

To pair or not to pair: How to improve the productivity of a distributed Agile team?

In the context of industry, productivity is defined as the effectiveness of effort, measured in terms of output per unit of input. Clearly, productivity is a key metric that every business venture aims to improve upon, and hence it is essential to choose a production methodology that helps a company optimize productivity. For the purpose of this article, in the context of the software development industry, we shall define the unit of input as one person-hour of effort by a member of a development team. We shall thus exclude all other overheads such as executive management, marketing, sales, support, and the contributions of the administrative staff.

All commercially successful products in the market today rely on geographically distributed design and manufacturing teams. So it is no surprise that the software product industry has also been quick to adopt the distributed model, for much the same reasons as the rest of the product manufacturing industry. The challenge, however, is that due to its ephemeral nature, a software product does not lend itself to the typical lifecycle of market analysis, design, development, quality assurance, distribution, sales, and support. This has led to the success of methodologies such as Agile that aim to compress the development lifecycle by requiring the members of a development team to collaborate very closely on all aspects of the product. The high degree of collaboration required across a distributed team not only complicates the measurement of the productivity metric but also requires adoption of unique strategies to improve upon it.

A brief history of Software Development Methodologies

In the early days of modern computing, software development was very closely coupled to the hardware on which it was intended to run. This allowed the development team to make a fair number of assumptions about the operating environment, such as the operating system and the available resources. This rigidity allowed very formalized product development methodologies to be implemented rather successfully, since the expected use and behavior of a product could be predicted with a high degree of accuracy. However, with the advent of the Internet followed by the World Wide Web, software has increasingly been decoupled from the hardware platform, and this has led to a drastic change in how a software product is defined, let alone how it is designed and implemented. The growing role of mobile devices in our daily lives has further changed the software development process.

The Agile software development methodology evolved out of the need to address the unpredictability of how the requirements for a software product change during its lifecycle from inception to design to implementation. It aims to avoid analysis paralysis and minimize wasted development effort by empowering the development team to start building working versions of the product using the shortest possible iteration cycles, typically no more than two calendar weeks in duration. The methodology as it is known today is an amalgamation of a number of methodologies that preceded it, unified by the Agile Manifesto published in 2001:

“We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

Individuals and interactions over Processes and tools

Working software over Comprehensive documentation

Customer collaboration over Contract negotiation

Responding to change over Following a plan

That is, while there is value in the items on the right, we value the items on the left more.” – Beck, Kent; et al. (2001). “Manifesto for Agile Software Development”. Agile Alliance.

Of the four manifesto items, we shall focus on the first two in the context of evaluating and improving productivity.

Pair Programming – Pros and Cons

By choosing to emphasize individuals and interactions over processes and tools, the adoption of Agile requires that the team members be highly motivated, allowing the team itself to be self-organizing. More importantly, collaboration between team members should be as efficient as possible, thereby promoting the practices of co-location and pair programming. Let us look at how these concepts can be applied to improve the productivity of distributed teams.

Co-location is the obvious challenge with a distributed team. However, it can be addressed by adopting a number of techniques and tools to ensure that the team members collaborate as efficiently as they would had they been co-located. Furthermore, as long as a suitable overlap is maintained between the working hours of a distributed team, productivity can be improved due to a longer “working day” for interdependent tasks.

The pair programming model helps address one of the key challenges with the agile development methodology: the lack of detailed elaboration of the problem statement for a task, a consequence of avoiding “analysis paralysis”. A pair of programmers is able to work through a complex problem statement by leveraging their collective understanding and considering multiple solutions in an objective manner. Pair programming also improves learning due to the constant knowledge sharing happening at a level of detail that is otherwise simply not possible. Rotating team members into different pairs not only further improves the collective knowledge of the team but also helps with team building, and consequently job satisfaction, resulting in a highly motivated team. So is productivity then a moot point? Interestingly, the jury still seems to be out on that!

A large number of studies have evaluated the impact of pair programming on the overall productivity of a team (1–4). While the general consensus seems to be that the total effort expended on a task by the pair either increases or remains the same, the benefits realized are harder to quantify in terms of code quality metrics such as defects and code readability. Furthermore, since team members have varying degrees of experience and expertise, the various pairing combinations such as “Expert–Expert”, “Expert–Novice”, and “Novice–Novice” may result in non-optimal situations such as high costs, “watching the master”, or disengagement. These challenges are further magnified when practicing pair programming across distributed teams.

Alternatives to Pair Programming

It is interesting to note that while the Agile manifesto emphasizes producing working software, it does not necessarily limit documentation to user stories alone. However, documentation is typically restricted to user stories since they provide a clear hand-off point for a task from the non-technical members of the team, such as Product Owners and/or Story Authors, to the technical members, such as the Developers and the QA Engineers. Unfortunately, this also limits the design activities to information architecture and user experience. Software architecture is often considered redundant since it typically focuses on creating models rather than working code. Furthermore, the scaffolding features provided in practically all technology stacks automate the generation of code for CRUD operations on entities in most scenarios. Consequently, techniques such as Model Driven Architecture and Round Trip Engineering are rarely adopted in Agile projects, especially if all members of the team are co-located.

For distributed teams, however, the adoption of appropriate tools can enable Model Driven Architecture and Round Trip Engineering techniques that improve productivity by optimally leveraging the varying levels of expertise available at different locations. Product owners co-located with subject matter experts can collaborate on architecture models to address complex aspects of the product at an abstract level, adopting techniques such as Domain-Driven Design and using modelling languages such as UML to generate models that are automatically converted into “compilable” code and handed over to the remote team, which then continues with techniques such as Test Driven Development. Round trip engineering is then employed to allow any updates to the models to be jointly reviewed at the end of each sprint/iteration. Similarly, Rapid Application Development (RAD) platforms such as WaveMaker and OutSystems, and Application Platform as a Service (aPaaS) providers such as Mendix, can be used to generate skeletons that can then be elaborated upon by remote teams.

This approach can be further extended, once suitable maturity has been reached, to implement the Software Factory model, wherein there is a higher emphasis on assembling new product features from reusable code widgets. Software product development can thus benefit from the high productivity usually associated with product factories.

To pair or not to pair: Conclusion

It is clear that there is no silver bullet that will guarantee optimal productivity across all scenarios. Hence we shall conclude by laying out some guidelines for choosing an approach especially when working with distributed teams.

To start with, it is important to first baseline the current productivity of your team(s) using an appropriate metric, such as velocity or control charts based on cycle time, over a suitable time horizon. Once a suitable baseline has been arrived at, set a realistic and tangible goal for the improvement desired over the chosen time period. More often than not, a blanket statement is made about improving productivity, and that is a recipe for disaster since it is bound to make the team(s) feel under-appreciated. The challenge here is that budgetary constraints are rarely exposed to the entire team, and hence it is difficult to justify the need for productivity improvements in the first place.

Assuming that the current model is based neither on pair programming nor on software factories, the choice of a new methodology must take into account the existing culture of, and skill distribution within, the team. Pair programming puts certain specific demands on “the workstation” in terms of space and hardware, such as a large and preferably dual-monitor setup. Occupant density also plays an important role in preventing high noise levels that would otherwise be far too distracting for developers, who can no longer don their favorite headsets to block out ambient noise. Finally, a conscious effort must be made to ensure that pairing is truly a two-way street by encouraging different pairing techniques, such as ping-pong pairing, and by ensuring that pairs are changed frequently. It might also be possible to pair a developer with a story author or a QA engineer while employing techniques such as Behavior Driven Development (BDD) or Acceptance Test Driven Development (ATDD) using tools such as Cucumber. In such a scenario, the non-developer member of the pair would focus on developing tests using the Gherkin language while the developer focuses on the coding tasks.

Adoption of Model Driven Architecture and RAD platforms requires the procurement of specialized tools to enable effective round trip engineering, resulting in an additional cost burden on the project. Some of the latest frameworks, such as Spring 4 and Lombok, significantly reduce the boilerplate code required to support typical architectural concerns such as logging, data access, security, and exposing web service end points, thereby allowing developers to focus on business logic development. Hence, investments can instead be made in “up-skilling” the team to use these new frameworks and new language features such as lambda functions and reactive streams, and in allowing for capacity to refactor legacy code bases. Finally, a strong governance model based on feature-based branching and promiscuous merging should be implemented to ensure that Continuous Integration (CI) practices are continually improved upon to achieve lower defect leakage.

In conclusion, classical pair programming techniques are less likely to provide significant productivity benefits than the adoption of automation tools and modern frameworks, especially across distributed teams.

References:

1. Cockburn, Alistair; Williams, Laurie (2000), The Costs and Benefits of Pair Programming, Proceedings of the First International Conference on Extreme Programming and Flexible Processes in Software Engineering (XP2000).

2. Williams, Laurie; Kessler, Robert (2003), Pair Programming Illuminated, Addison-Wesley, pp. 27–28, ISBN 0-201-74576-3.

3. Lui, Kim Man (2006), Pair programming productivity: Novice–novice vs. expert–expert, International Journal of Human–Computer Studies, 64 (9): 915–925.

4. Arisholm, Erik; Hans Gallis; Tore Dybå; Dag I.K. Sjøberg (2007), Evaluating Pair Programming with Respect to System Complexity and Programmer Expertise, IEEE Transactions on Software Engineering, 33 (2): 65–86.

Best practices for implementing Web Services based APIs

 

Application Programming Interfaces (APIs) have evolved in step with computing paradigms, from shared statically linked libraries to completely decoupled web service end points. A common thread across all these forms, however, is that the key to large-scale adoption of an API lies in the ease of orchestrating calls across the multiple methods/functions it contains. This allows the API to be highly granular in nature and thereby amenable to use across a wide spectrum of usage scenarios. But as any API developer will attest, this is easier said than done, primarily because most classical methods of API distribution, including Software Development Kits (SDKs), do not allow this metadata to be exposed in a dynamic manner that is easy to consume. The best recourse is to include API documentation and to rely on the reflection functionality available within the underlying programming language.

Most modern Integrated Development Environments (IDEs) leverage this embedded documentation, along with frameworks that rely on reflection such as the Java Bean specifications, to provide the “IntelliSense” functionality that so many of us have come to rely upon. But this still does not provide any insight into the best sequence in which the functions within an API should be invoked to accomplish a particular task. Some APIs, notably the DirectX APIs, introduced the concept of a pipeline wherein you could register a series of callback functions to be invoked in the desired sequence, but the programmer still needs to rely on sample code to determine the sequencing of function calls required to set up the pipeline. However, the web services based paradigm for developing and hosting APIs has changed the picture significantly and holds the promise of enabling API discoverability and explorability in a unique manner.

At first glance, developing APIs based on web service end points offers no benefits beyond loose coupling of the interface from the implementation, combined with distributed computing that allows for higher scalability. Discoverability of web services based APIs can be achieved through the use of online registries such as APIs.io that rely on an open format called APIs.json to expose suitable metadata about the APIs; this is very similar to repositories for static APIs such as Maven. However, a key benefit to be realized from implementing APIs as web services is the ability to include additional metadata in the response body that can be auto-interrogated to determine the next service call to be made. One such framework is based on the Hypertext Application Language (HAL) specification, and it allows for API explorability if the following best practices are followed:

  • Use of Links & Actions in responses: This can be used to allow for a dynamic, and possibly configurable, flow of API calls. For RESTful web services the HATEOAS constraint is a great way to implement this functionality (a minimal sketch follows this list).
  • Expose metadata as a separate resource or introduce a meta tag in the response: This can be leveraged to reduce the response size, either when multiple items share a set of attributes (related to, say, taxonomy) or when contextually unnecessary information is always being included in the response. This is to be used differently from attribute selection via query parameters and will require a clear definition of what constitutes metadata.
  • Consider the use of the OData protocol for exposing data via web services, as this allows for programming by convention rather than by contract.
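
As a rough illustration of the first practice above, the sketch below returns a HAL-style payload whose _links advertise the next permissible calls. It is a minimal Flask example; the resource, relations, and fields are invented for illustration and do not represent any particular product API.

```python
# Hedged sketch: a HAL-style response whose _links advertise the next calls a client may
# make. The resource, relations, and fields are invented for illustration.
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/orders/<order_id>", methods=["GET"])
def get_order(order_id):
    order = {"id": order_id, "status": "placed", "total": 42.50}
    # Links double as a lightweight state machine: only the actions valid for the
    # current state need to be advertised.
    order["_links"] = {
        "self": {"href": f"/orders/{order_id}"},
        "payment": {"href": f"/orders/{order_id}/payment", "method": "POST"},
        "cancel": {"href": f"/orders/{order_id}", "method": "DELETE"},
    }
    return jsonify(order)


if __name__ == "__main__":
    app.run(port=8080)
```

A client can then follow the advertised relations instead of hard-coding the call sequence.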

In addition to the above best practices that simplify API orchestration, the following practices should also be followed to allow for suitable instrumentation in web services based APIs and to ensure the highest standards of security, scalability, and backward compatibility:

  • Make call tracing and debugging more efficient by requiring each request to include a timestamp and by assigning a request id to each request received. The request id should be returned in the response, and logging it along with the timestamp should be encouraged within the client app (see the sketch after this list).
  • Enable response caching, ideally through the use of ETags, or via other mechanisms as applicable within a given context.
  • Prefer the use of JSON Web Tokens (JWTs) via OAuth 2.0 for implementing security (http://self-issued.info/docs/draft-ietf-oauth-json-web-token.html). Additional links:
    http://tools.ietf.org/html/draft-ietf-oauth-jwt-bearer-07
    https://developers.google.com/accounts/docs/OAuth2ServiceAccount
  • Support expiry and renewal of JWTs by developing appropriate client SDKs.
  • Prefer URL based API versioning since that helps with DevOps automation, but query parameter based versioning can also be supported on an as-required basis. SDK versioning may be applied as well.
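
The sketch below combines two of the practices above: assigning a request id that is echoed back to the client, and validating a JWT bearer token. It is a minimal example using Flask and PyJWT; the header names, shared secret, and claims are assumptions for illustration only.

```python
# Hedged sketch: per-request ids echoed back to the client plus JWT validation with PyJWT.
# Header names, the shared secret, and the token claims are illustrative assumptions.
import time
import uuid

import jwt  # PyJWT
from flask import Flask, g, jsonify, request

app = Flask(__name__)
JWT_SECRET = "replace-with-a-real-key"  # assumption: HS256 shared secret


@app.before_request
def assign_request_id():
    g.request_id = request.headers.get("X-Request-Id", str(uuid.uuid4()))
    g.received_at = time.time()


@app.after_request
def echo_request_id(response):
    # Returning the id lets the client log it alongside its own timestamp for call tracing.
    response.headers["X-Request-Id"] = g.request_id
    return response


@app.route("/v1/reports", methods=["GET"])  # URL based versioning as suggested above
def list_reports():
    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
    try:
        claims = jwt.decode(token, JWT_SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        return jsonify({"error": "invalid token"}), 401
    return jsonify({"subject": claims.get("sub"), "reports": []})
```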

Of course, the use of API hosting platforms such as Apigee can certainly help in adhering to a number of the best practices defined above, especially when combined with a microservices architecture paradigm for the implementation.

Taming the public cloud beast: Your monthly bill

Beyond a doubt, the public clouds have been a godsend to practically every startup out there today. The ongoing price wars between the major players (Amazon, Microsoft, Google, IBM) have meant that the price per unit of compute/storage/network capacity has been on the decline. Even adoption amongst traditional large enterprises is on the increase, with success stories being written on the hour, every hour. So then how does one explain articles such as the one below?

Here’s why this startup ditched Amazon Web Services by John Cook

And this is not a one-off case. Other similar articles are available, albeit they don’t exactly show up on the first page of most search queries pertaining to cloud pricing/costs, thanks to the excellent SEO efforts of all the big providers.

The bottom line: just like dining out every day at an unlimited buffet leads to obesity, ad-hoc usage of cloud computing resources leads to a bloated bill that can take many a startup by surprise. Is the answer then to simply jump ship and switch over to a private cloud, or worse yet, a traditional infrastructure model? All of the players in the private cloud space are trying hard to convince you to do so, and there is an excellent white paper from Eucalyptus to help you take the leap. Maybe not just yet, if some of the strategies outlined below are adopted appropriately.

Utilize compute resources for the shortest possible time: Throughout the history of the World Wide Web, the one golden rule followed religiously has been: always be available, lest your loyal consumer ditch you the instant your site is down. Given that public clouds rely on sharing the same physical resources across multiple customers, it should come as no surprise that the cheapest pricing plan available for the longest time was the one wherein you spun up reserved instances with a guaranteed uptime of 99.995% or better. Not only do you end up paying an upfront charge, but the costs can also spiral as you keep adding nodes. To add to your woes, as the application and data complexity increases along with the upsurge in customers, you start spinning up the more expensive high-end instances.

Invest instead in re-architecting your applications to utilize micro instances by adopting a microservices based approach, or better still, invest in building up your in-house DevOps skills to leverage the on-demand and spot pricing plans. The latest platform offerings, such as AWS Lambda from Amazon and Bluemix from IBM, provide a host of ready-to-use services that can be leveraged on an as-needed basis. In addition, the newest auto-scaling offerings from some of the providers also allow you to spin up container based compute instances instead of entire VMs.
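
For example, spot capacity can be requested programmatically instead of reserving instances up front; the boto3 sketch below is illustrative only, and the AMI id, security group, instance type, and price ceiling are placeholders rather than recommendations.

```python
# Hedged sketch: requesting spot capacity with boto3 instead of reserving instances up
# front. The AMI id, security group, instance type, and price ceiling are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.request_spot_instances(
    SpotPrice="0.01",  # maximum price you are willing to pay per instance-hour
    InstanceCount=2,
    Type="one-time",
    LaunchSpecification={
        "ImageId": "ami-00000000",  # placeholder AMI
        "InstanceType": "t2.micro",
        "SecurityGroupIds": ["sg-00000000"],  # placeholder security group
    },
)
for req in response["SpotInstanceRequests"]:
    print(req["SpotInstanceRequestId"], req["State"])
```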

Have a crystal clear strategy for processing raw usage data and/or archive it as quickly as you can: Success in boosting site traffic, which invariably leads to more business, brings with it a deluge of raw usage data that in turn holds the secrets to the next chapter of your growth. Hence it is very tempting to hold on to as much usage data as you can. Moreover, there may not be a clean separation between transactional and raw usage data. All the cloud providers leverage this aspect of the growth phase of any startup to drive up your monthly spend. Hence it is critical to watch your storage needs very closely and adapt to increasing raw usage data very quickly.

To start with, ensure that you can clearly demarcate between transactional data and all other data generated until the time the transaction is actually completed. Also make sure you can easily sift anonymous usage data from that associated with a known, logged-in customer. Store all usage data using object based storage services such as AWS S3, limiting each bucket to a relatively short time duration, say five minutes, and employ data aggregation to reduce data volume by aggregating to a longer time duration, say one hour. The key here is not to try to convert the data to a full-fledged data warehouse/mart schema at this stage. Once the raw data has been thus processed, it should be archived on a daily basis using solutions such as AWS Glacier. If you don’t have a strategy to further utilize the semi-processed usage data to populate a data warehouse, then archive that as well, say on a weekly or monthly basis.
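
A minimal sketch of such an archival policy, assuming the aggregated data lands under a known prefix in a staging bucket, is an S3 lifecycle rule that transitions objects to Glacier and eventually expires them; the bucket name, prefix, and retention periods below are assumptions.

```python
# Hedged sketch: an S3 lifecycle rule that moves semi-processed usage data to Glacier
# after a day and expires it after a year. Bucket name, prefix, and periods are assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="usage-data-staging",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-usage-data",
                "Filter": {"Prefix": "aggregated/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```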

Reduce network traffic to the compute instances and between different availability zones: This is probably the most easily overlooked aspect of your monthly bill. Most savvy startups will quickly utilize a CDN for static content and script caching, thereby reducing network traffic to the compute instances hosting the web applications. But as your overall cloud infrastructure grows and you start spanning availability zones to ensure high availability and disaster recovery, the corresponding increase in network traffic across availability zones will start adding up quickly. Luckily, your startup will have to be wildly successful before this component of the monthly bill requires too much attention, and by that time you will be able to afford the real high-end talent required to optimize the architecture further.

The kind of monthly spend on public clouds described in the article referenced at the start represents a dream come true for most startups just out of the gate, but it is always a good idea to start adopting the right strategies and architectures to manage your monthly spend from the very beginning, when even a thousand bucks out of your pocket can seem like a million. Furthermore, the right architecture will help you eventually transition to a hybrid cloud model at the right time in the future with the least amount of effort and risk.

This blog was first published on the ContractIQ site at http://blog.contractiq.com/taming-the-public-cloud-beast-your-monthly-cloud-computing-bill/ on December 17, 2014.

Architecting solutions for Cluster Computing as opposed to Cloud Computing

Recently, while evaluating storage options as part of a consulting engagement, I came across the Isilon offering from EMC, and some of the articles in the associated literature talked about the use of Isilon for cluster computing. Given that the emphasis is still on storage, specifically HDFS, it was intriguing that the possibility of compute functions being delegated to nodes à la MapReduce was discussed quite a bit. Further reading into what is considered to be cluster computing got me to the Wikipedia article on Computer Clusters.

The difference between cloud computing and cluster computing is thus quite clear, to the extent that we can safely say that cluster computing is a subset of cloud computing, especially given offerings from Amazon Web Services such as Elastic MapReduce and the newly launched Lambda. Hence, in this blog article I will focus instead on how a solution needs to be architected to leverage cluster computing effectively and get the best bang for the buck out of cloud computing.

Let’s start by addressing the biggest challenge with implementing cluster computing: co-location of data on the compute node. While this is an easy problem to solve when utilizing the MapReduce paradigm, it represents a real challenge when using cluster computing to achieve scalability in typical usage scenarios. Although the use of technologies such as InfiniBand may be an option in some cases, a cost-benefit analysis would render it useless for most typical business applications.

One immediate option is to utilize a microservices based architecture. But it is clear from the description in the seminal article by Martin Fowler that it does not address co-location of transaction data, although he does talk about decentralized data management and polyglot persistence. Clearly, it is not really meant to allow for easy adoption in a cluster computing scenario. Interestingly, though, there is a reference to the Enterprise Service Bus as an example of smart end points, and that is what got me thinking about extending the concept to cluster computing.

The trick, then, is to apply the event based programming model to the microservices architecture concept, leveraging in turn the smart end points aspect. All the transaction data needs to be embedded in the event, combined with any contextual state data. Through the use of interceptors or other adapters, the data can be deserialized into the appropriate service specific representation. This is key, since the service need not, and actually should not, be built to consume the event data structure.
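
A minimal sketch of this idea is shown below: the event carries the full transaction plus contextual state, and a small adapter deserializes it into the representation a hypothetical pricing service expects, so the service itself never depends on the event schema. All names are invented for illustration.

```python
# Hedged sketch: the event carries the transaction plus contextual state, and an adapter
# deserializes it into the representation a hypothetical pricing service expects.
# All names are invented for illustration.
import json
from dataclasses import dataclass


@dataclass
class PricingRequest:
    """Service specific representation used by the hypothetical pricing service."""
    customer_id: str
    sku: str
    quantity: int
    currency: str


def pricing_adapter(raw_event: str) -> PricingRequest:
    """Interceptor/adapter: maps the generic event payload to the service's own type."""
    event = json.loads(raw_event)
    data, ctx = event["transaction"], event["context"]
    return PricingRequest(
        customer_id=ctx["customerId"],
        sku=data["sku"],
        quantity=int(data["quantity"]),
        currency=ctx.get("currency", "USD"),
    )


def price(request: PricingRequest) -> float:
    # The service consumes only its own representation, never the event structure.
    unit_price = 9.99  # placeholder lookup
    return unit_price * request.quantity


event = json.dumps({
    "transaction": {"sku": "WIDGET-1", "quantity": 3},
    "context": {"customerId": "c-42", "currency": "EUR"},
})
print(price(pricing_adapter(event)))
```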

While the approach described above would require you to invest significantly in setting up the requisite infrastructure components to provision compute nodes on the fly to handle events, the recent release of the AWS Lambda service provides us with an opportunity to apply this concept more easily, albeit with some new terminology: microservices are implemented as AWS Lambda functions! It would be very interesting to figure out if argument reduction is supported! Check this blog again in a few weeks to find out…

Virtual Machines, Containers, and now LXD: What’s best for me?

First there were virtual machine instances running on bare metal or para-virtual hypervisors, then came containers allowing for better utilization of virtual machine instances, and now we have the Linux Container Daemon (LXD), pronounced “lex-dee”. As the tag line goes, “The new hypervisor isn’t a hypervisor, and it’s much, much faster”; one quickly realizes that hyperboles are not where your troubles end. Instead, your choices for virtualization just got trickier to sort out.

So here are some simplifying rules to get going quickly:

1. If you have the luxury of hosting your application entirely on the public cloud, stick to the good old fashioned VMs managed using the auto-scale functionality provided by the cloud platform of your choice.

2. If you have the luxury of creating your private cloud using paid software such as the vCloud suite, stick to the good old fashioned VMs managed using the components of the suite such as vRealize.

3. If you are stuck with free bare metal hypervisors such as vSphere ESXi or Hyper-V (not entirely free), or with paravirtual hypervisors such as KVM, then get the right DevOps skills on your team and use Docker containers managed using Fig for development and Chef for production environments.

4. If you are a brave spirit capable of dealing with hardcore hardware and low-level kernel configuration matters, opt for the Metal As A Service (MAAS) offering from Canonical combined with Juju.

5. If you are lucky enough to have the hardware set up exactly as required for running OpenStack/CloudStack and have the chops to customize the provided management apps, then by all means rock your world by replicating most of the basic AWS offerings within your datacenter.

6. Finally, if you are truly bored out of your mind by all the mundane options listed above and have the distinct air of “been there, done that” around you, with the appropriate management support in your back pocket, venture into brand new entrants such as LXD!

Of course you might also just be lucky enough to be in my position: Make recommendations and then sit back to enjoy the show! And, of course, come back to this blog for more …

Developer Workstations – The untapped Private Cloud

The “Infrastructure as a Service” (IaaS) model for cloud computing has matured significantly in recent years, with a large number of providers offering a variety of public, private, and hybrid clouds at very competitive prices. However, adoption of the “Platform as a Service” (PaaS) model is still lagging. The primary reason for this is quite easy to understand: most developers and delivery managers are least impacted by the IaaS model, and hence they can continue developing and delivering software the same way as before. Transitioning to a PaaS model, on the other hand, may involve significant changes to the software development and delivery process and thus represents a significant risk to the overall success of a project.

Challenges to PaaS adoption in Development Teams

One of the key challenges in adopting a PaaS model is the cost of making the target platform available to the entire development team for all functions. Instead, the typical use case is to implement Continuous Integration using PaaS offerings from vendors such as CloudMunch, OpenShift, Heroku, etc. But this still prevents developers from having access to all the features of the target platform, thereby constraining innovation to a small set of “experts”. Clearly, lack of infrastructure should not be a limiting factor for increased innovation and productivity.

Underutilization of Developer Workstations

Due to the exponential growth in computing power combined with the ever decreasing cost of hardware, every developer is typically provided with either a laptop or a desktop that has sufficient computing power to host an application server, a database, and any other required software locally for their own private use. Doing so allows for a number of benefits, from telecommuting to ease of adoption of the Agile development methodology. Furthermore, the lack of consistent high speed broadband access, whether due to poor infrastructure in developing countries or network congestion over the airwaves in developed nations, limits access to shared enterprise PaaS resources for developers who are rarely tethered to a desk; synchronization through a centralized code repository, on the other hand, requires very little bandwidth. Finally, these local environments will eventually start varying from each other, and more importantly from the target environment, thereby introducing the risk that submissions from different team members may not play well together in the CI environment.

Cloud Infrastructure Management Frameworks to the rescue …

Private clouds based on open source software such as OpenStack or CloudStack may offer a solution wherein each developer workstation is a virtual machine host onto which an image of the target platform can be launched on demand. These images can be auto-generated as part of the daily CI cycle. A developer would thus have access to the overall platform, either to try out new features or to make configuration changes to resolve certain issues. The same infrastructure can also be leveraged to provide additional computing resources for the CI cycle or for load testing.

But not without some additional innovation

At present, the primary constraint of the open source cloud management frameworks is the homogeneity of the host hardware, which severely limits the use of developer workstations as hosts. Furthermore, the use of bare metal hypervisors is not feasible, and the most popular operating systems for developer workstations are not ideally suited to hosting type 2 hypervisors. Instead, as a first step, it is recommended that an approach based on desktop virtualization products such as VirtualBox or VMware Player be adopted. As part of the daily CI build, a new version of the target environment can be packaged as an appliance and made available to developers for download. Thus, in addition to getting the latest code at the start of the day, developers can also get the latest target environment and fire it up on their workstations. Additionally, some basic agent software can be developed to allow the developer to add their local guest OS instance to a resource pool for use in intensive computing tasks such as load testing. Simultaneously, the open source frameworks can be extended to allow for a more heterogeneous mix of hosts.
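
As a rough sketch of the appliance packaging step, assuming VirtualBox is used and the CI build maintains a VM named for the target environment, the nightly job could export it as an OVA with VBoxManage; the VM name and output path below are assumptions.

```python
# Hedged sketch: exporting the nightly CI build of the target environment as an OVA
# appliance with VirtualBox's command line tool. The VM name and output path are assumptions.
import datetime
import subprocess


def export_target_environment(vm_name: str = "target-env-ci") -> str:
    stamp = datetime.date.today().isoformat()
    ova_path = f"{vm_name}-{stamp}.ova"
    # VBoxManage export packages the VM (disks plus settings) as a portable appliance.
    subprocess.run(["VBoxManage", "export", vm_name, "-o", ova_path], check=True)
    return ova_path


if __name__ == "__main__":
    print("Appliance written to", export_target_environment())
```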

This blog was originally published at http://www.compassitesinc.com/blogs/developer-workstations on January 10, 2013