Best practices for implementing Web Services based APIs


Application Programming Interfaces (APIs) have evolved in pace with the computing paradigms from shared statically linked libraries to completely decoupled web service end points. However a common thread across all these forms is that the key to large scale adoption of an API lies in the ease of orchestration across calls to multiple methods/functions contained within. This allows for the API to be highly granular in nature thereby being amenable to use across a wide spectrum of usage scenarios. But as any API developer will attest to this is easier said than done primarily because most classical methods for API distribution including Software Development Kits (SDKs) do not allow for this metadata to be exposed in a dynamic manner that is easy to consume. The best recourse is to include API documentation and to rely on reflection functionality available within the underlying programming language.

Most of the modern Integrated Development Environments (IDEs) leverage this embedded documentation and frameworks that rely on reflection such as the Java Bean Specifications to provide the “IntelliSense” functionality that so many of us today and come to rely upon. But this does not still provide any insights in to the best sequence in which functions within an API are to be invoked so as to accomplish a particular task. Some APIs, notably the DirectX APIs, introduced the concept of a pipeline wherein you could register a series of callback functions that would be invoked in the desired sequence but the programmer still needs to rely on sample code so as to determine the sequencing of function calls required to setup the pipeline. However, the web services based paradigm for developing and hosting APIs has changed the picture significantly and holds the promise to allow for API discoverability and explorability in an unique manner.

At first glance developing APIs based on web service end points offers no further benefits than loose coupling of the interface from the implementation combined with distributed computing that allows for higher scalability. Discoverability of web services based APIs can be achieved through the use of online registries such as that rely on an open format called APIs.json to expose suitable metadata about the APIs. This is very similar to repositories for static APIs such as Maven. However, a key benefit to be realized from implementing APIs based on web services is to include additional metadata in the response body that can be auto-interrogated to determine the next service call to be made. One such framework is based on the Hypertext Application Language (HAL) specification and it allows for API explorability if the following best practices are followed:

  • Use of Links & Actions in responses: This could be used to allow for a dynamic and possibly configurable flow for API calls. For RESTful web services the HATEOAS constraint is a great way to implement this functionality.
  • Expose metadata as a separate resource or introduce a meta tag in the response: This could be leveraged to reduce the response size either when multiple items share a bunch of attributes related to say taxonomy or when contextually unnecessary information is always being included in the response. This is to be used differently from attribute selection via query parameters and will require a clear definition of what would constitute as metadata.
  • Consider the use of the OData protocol for exposing data via web services as this allows for programming by convention rather than by contract.

In addition to the above best practices that would simplify API orchestration the following best practices should also be followed to allow for suitable instrumentation in web services based APIs and to ensure highest standards of security, scalability, and backward compatibility:

  • Make call tracing and debugging more efficient by requiring each request to include a timestamp and by assigning a request id to each request received. The request id should be returned in the response and logging of the same along with the timestamp should be encouraged within the client app.
  • Enable response caching, ideally, through the use of ETags or other mechanisms as might be applicable within a given context.
  • Prefer the use of JSON Web Tokens (JWTs) via OAuth 2.0 for implementing security ( Additional links:
  • Support expiry and renewal of JWTs via developing appropriate client SDKs.
  • Prefer URL based API versioning since that would help with DevOps automation but query parameter based versioning can also be supported on an as required basis. Also, SDK versioning may be applied as well.

Of course the use of API hosting platforms such as Apigee can certainly help adhere with a number of the best practices defined above when combined with microservices architecture paradigm for implementation.

Architecting solutions for Cluster Computing as opposed to Cloud Computing

Recently, while evaluating storage options as part of a consulting engagement, I came across the Isilion offering from EMC and some of the articles in the associated literature talked about the use of Isilion for cluster computing. Given that the emphasis is still on storage, specifically HDFS, it was intriguing that the possibility of compute functions being delegated to nodes a la Map Reduce was discussed quite a bit. Further reading in to what is considered to be cluster computing got me to the Wikipedia article on Computer Clusters.

So it is quite clear as to what the difference is between cloud computing and cluster computing to the extent that we can even safely say the cluster computing is a subset of cloud computing especially given the offerings from Amazon Web Services such as Elastic Map Reduce and the newly launched Lamda. Hence in this blog article I will focus instead on how a solution needs to be architected to leverage cluster computing effectively to get the best bang for the buck out of cloud computing.

Lets start by addressing the biggest challenge with implementing cluster computing: co-location of data on the compute node. While this is an easy problem to solve while utilizing the Map Reduce paradigm it represents a real challenge to use cluster computing for achieving scalability in the typical usage scenarios. Although the use of technologies such as InfiniBand may be an option in some cases the cost benefit analysis would render it useless for most of the typical business applications.

One immediate option is to utilize microservices based architecture. But it is clear from the description in the seminal article by Martin Fowler that it does not address co-location of transaction data although he does talk about decentralized data management and polygot persistence. Clearly is not really meant to allow for easy adoption in a cluster computing scenario. Interestingly though there is a reference to Enterprise Service Bus as an example of smart end points and that is what got me thinking about extending the concept to cluster computing.

The trick then is to apply the event based programming model to the microservices architecture concept leveraging in turn the smart end points aspect. All the transaction data needs to be embedded in the event combined with any contextual state data. Through the use of interceptors or other adapters the data can be deserialized in to the appropriate service specific representation. This is key since the service need not and actually should not be built to consume the event data structure.

While the approach described above would require you to invest significantly in setting up the requisite infrastructure components to provision compute nodes on the fly to handle events, given the recent release of the AWS Lambda service provides us with an opportunity to apply this concept more easily albeit with some new terminology: microservices are implemented as AWS Lamda functions! It would be very interesting to figure out if argument reduction is supported! Check this blog again in a few weeks to find out…