Best practices for implementing Web Services based APIs

 

Application Programming Interfaces (APIs) have evolved in pace with computing paradigms, from shared, statically linked libraries to completely decoupled web service end points. However, a common thread across all these forms is that the key to large-scale adoption of an API lies in the ease of orchestration across calls to the multiple methods and functions it contains. This allows the API to be highly granular, which makes it amenable to a wide spectrum of usage scenarios. But, as any API developer will attest, this is easier said than done, primarily because most classical methods for API distribution, including Software Development Kits (SDKs), do not allow this orchestration metadata to be exposed in a dynamic manner that is easy to consume. The best recourse is to include API documentation and to rely on the reflection functionality available within the underlying programming language.

Most modern Integrated Development Environments (IDEs) leverage this embedded documentation, along with frameworks that rely on reflection such as the JavaBeans specification, to provide the “IntelliSense” functionality that so many of us have come to rely upon today. But this still does not provide any insight into the best sequence in which functions within an API are to be invoked to accomplish a particular task. Some APIs, notably the DirectX APIs, introduced the concept of a pipeline wherein you could register a series of callback functions that would be invoked in the desired sequence, but the programmer still needs to rely on sample code to determine the sequence of function calls required to set up the pipeline. However, the web services based paradigm for developing and hosting APIs has changed the picture significantly and holds the promise of API discoverability and explorability in a unique manner.

At first glance, developing APIs based on web service end points offers no benefits beyond loose coupling of the interface from the implementation, combined with distributed computing that allows for higher scalability. Discoverability of web services based APIs can be achieved through the use of online registries such as APIs.io that rely on an open format called APIs.json to expose suitable metadata about the APIs. This is very similar to repositories for static APIs such as Maven. However, a key benefit of implementing APIs based on web services is the ability to include additional metadata in the response body that can be auto-interrogated to determine the next service call to be made. One such framework is based on the Hypertext Application Language (HAL) specification, and it allows for API explorability if the following best practices are followed:

  • Use of Links & Actions in responses: This can be used to allow for a dynamic and possibly configurable flow of API calls. For RESTful web services, the HATEOAS constraint is a great way to implement this functionality.
  • Expose metadata as a separate resource or introduce a meta tag in the response: This can be leveraged to reduce the response size, either when multiple items share a set of attributes (related to, say, taxonomy) or when contextually unnecessary information is always being included in the response. This is to be used differently from attribute selection via query parameters and requires a clear definition of what constitutes metadata.
  • Consider the use of the OData protocol for exposing data via web services, as this allows for programming by convention rather than by contract.
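
To make the first practice concrete, here is a minimal sketch of how a client might interrogate a HAL-style response to discover which calls it can make next. The order resource, the link relation names ("payment", "cancel"), and the URLs are hypothetical; only the `_links`, `self`, and `href` keys come from the HAL format itself.

```python
import json

# Hypothetical HAL-style response for an order resource. The relation
# names and URLs are illustrative assumptions, not part of any real API.
ORDER_RESPONSE = json.loads("""
{
  "_links": {
    "self":    {"href": "/orders/123"},
    "payment": {"href": "/orders/123/payment"},
    "cancel":  {"href": "/orders/123/cancel"}
  },
  "status": "awaiting_payment",
  "total": 30.0
}
""")


def available_relations(resource):
    """Return the link relations a client may follow next, excluding 'self'."""
    return sorted(rel for rel in resource.get("_links", {}) if rel != "self")


def href_for(resource, rel):
    """Look up the URL for a link relation, or None if it is not offered."""
    link = resource.get("_links", {}).get(rel)
    return link["href"] if link else None


if __name__ == "__main__":
    print(available_relations(ORDER_RESPONSE))
    print(href_for(ORDER_RESPONSE, "payment"))
```

Because the client asks the response which relations are on offer instead of hard-coding URLs, the server can add, remove, or reorder steps in the flow without breaking existing callers.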

In addition to the above best practices, which simplify API orchestration, the following best practices should also be followed to allow for suitable instrumentation in web services based APIs and to ensure the highest standards of security, scalability, and backward compatibility:

  • Make call tracing and debugging more efficient by requiring each request to include a timestamp and by assigning a request id to each request received. The request id should be returned in the response and logging of the same along with the timestamp should be encouraged within the client app.
  • Enable response caching, ideally, through the use of ETags or other mechanisms as might be applicable within a given context.
  • Prefer the use of JSON Web Tokens (JWTs) via OAuth 2.0 for implementing security (http://self-issued.info/docs/draft-ietf-oauth-json-web-token.html). Additional links:
    http://tools.ietf.org/html/draft-ietf-oauth-jwt-bearer-07
    https://developers.google.com/accounts/docs/OAuth2ServiceAccount
  • Support expiry and renewal of JWTs by developing appropriate client SDKs.
  • Prefer URL based API versioning, since that helps with DevOps automation, but query parameter based versioning can also be supported on an as-required basis. SDK versioning may be applied as well.
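
The request id, timestamp, and ETag practices above can be sketched together in a single framework-agnostic handler. This is only an illustration under stated assumptions: the `X-Request-Id` and `X-Request-Timestamp` header names and the `handle_get` function are hypothetical choices, while `ETag` and `If-None-Match` are the standard HTTP headers.

```python
import hashlib
import json
import uuid


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (quoted, as HTTP expects)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def handle_get(resource: dict, request_headers: dict) -> tuple:
    """Return (status, headers, body) for a GET on `resource`.

    Header names other than ETag / If-None-Match are assumptions made
    for this sketch.
    """
    # Assign an id to every request and echo it back so that client-side
    # and server-side logs can be correlated.
    headers = {"X-Request-Id": str(uuid.uuid4())}

    # Require the caller to send a timestamp. A real service would also
    # validate its format and log it alongside the request id.
    if request_headers.get("X-Request-Timestamp") is None:
        return 400, headers, b'{"error": "X-Request-Timestamp is required"}'

    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    headers["ETag"] = etag

    # If the client already holds this representation, skip the body.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body
```

A client would store the `ETag` from the first 200 response and send it back as `If-None-Match` on later requests; an unchanged resource then costs only a 304 with an empty body.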

Of course, the use of API hosting platforms such as Apigee can certainly help you adhere to a number of the best practices defined above when combined with a microservices architecture for the implementation.

Taming the public cloud beast: Your monthly bill

Beyond a doubt, the public clouds have been a godsend to practically all the startups out there today. Plus, the ongoing price wars between the major players (Amazon, Microsoft, Google, and IBM) have meant that the price per unit of compute, storage, and network capacity has been on the decline. Even adoption amongst the traditional large enterprises is on the increase, with success stories being written on the hour, every hour. So then how does one explain articles such as the one below?

Here’s why this startup ditched Amazon Web Services by John Cook

And this is not an odd-one-out case. Other similar articles are available, although they don’t exactly show up on the first page of most search queries pertaining to cloud pricing and costs, thanks to the excellent SEO efforts of all the big providers.

The bottom line: Just as dining out every day at an unlimited buffet leads to obesity, ad-hoc usage of cloud computing resources leads to a bloated bill that can take many a startup by surprise. Is the answer then to simply jump ship and switch over to a private cloud or, worse yet, a traditional infrastructure model? All of the players in the private cloud space are trying hard to convince you to do so. Here is an excellent white paper from Eucalyptus to help you take the leap. But maybe not just yet, if some of the strategies outlined below are adopted appropriately.

Utilize compute resources for the shortest possible time: Throughout the history of the World Wide Web, the one golden rule followed religiously has been: be always available, lest your loyal consumer ditch you the instant your site is down. Given that public clouds rely on sharing the same physical resources across multiple customers, it should come as no surprise that for the longest time the cheapest pricing plan available was the one wherein you spun up reserved instances with a guaranteed uptime of 99.995% or better. Not only do you end up paying an upfront charge, but the costs can also spiral rapidly as you keep adding nodes. To add to your woes, as application and data complexity increases along with the upsurge in customers, you start spinning up the more expensive high-end instances.

Invest instead in re-architecting your applications to utilize micro instances by adopting a microservices based approach or, better still, invest in building up your in-house DevOps skills to leverage the on-demand and spot pricing plans. The latest PaaS offerings, such as AWS Lambda from Amazon and BlueMix from IBM, provide a host of ready-to-use microservices that can be leveraged on an as-needed basis. In addition, the newest auto-scaling offerings from some of the providers allow you to spin up container based compute instances instead of entire VMs.

Have a crystal clear strategy for processing raw usage data and/or archive it as quickly as you can: Success in boosting site traffic, which invariably leads to more business, brings with it a deluge of raw usage data that in turn holds the secrets to the next chapter of your growth. Hence it is very tempting to hold on to as much usage data as you can. Plus, there may not be a clean separation between transactional and raw usage data. All the cloud providers leverage this aspect of the growth phase of any startup to drive up your monthly spend. Hence it is critical to watch your storage needs very closely and adapt to increasing raw usage data very quickly.

To start with, ensure that you can clearly demarcate between transactional data and all other data generated until the time the transaction is actually completed. Also make sure you can easily distinguish between anonymous usage data and that associated with a known, logged-in customer. Store all usage data using object based storage services such as AWS S3, limiting each bucket to a relatively short time duration (say, five minutes), and employ data aggregation to reduce data volume by rolling it up to a longer time duration (say, one hour). The key here is not to try to convert the data to a full-fledged data warehouse/mart schema at this stage. Once the raw data has been processed in this way, it should be archived on a daily basis using solutions such as AWS Glacier. If you don’t have a strategy to further utilize the semi-processed usage data to populate a data warehouse, then archive that as well, say on a weekly or monthly basis.
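
As a rough illustration of the aggregation step, the sketch below rolls five-minute usage counts up into hourly totals. The record layout (bucket start time, event name, count) and the function names are assumptions made for this example; in practice the records would be read from wherever the raw usage data is stored.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def roll_up(five_minute_counts):
    """Sum (bucket_start, event_name, count) records into hourly totals.

    Returns a dict keyed by (hour_start, event_name).
    """
    hourly = Counter()
    for bucket_start, event, count in five_minute_counts:
        hourly[(hour_bucket(bucket_start), event)] += count
    return dict(hourly)


# Hypothetical five-minute records for illustration only.
SAMPLE = [
    (datetime(2014, 12, 17, 9, 0), "page_view", 120),
    (datetime(2014, 12, 17, 9, 5), "page_view", 95),
    (datetime(2014, 12, 17, 9, 55), "signup", 3),
    (datetime(2014, 12, 17, 10, 0), "page_view", 140),
]

if __name__ == "__main__":
    for (hour, event), total in sorted(roll_up(SAMPLE).items()):
        print(hour.isoformat(), event, total)
```

Twelve five-minute records per event collapse into one hourly record, so the volume kept in the more expensive storage tier shrinks accordingly while the raw records move to the archive.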

Reduce network traffic to the compute instances and between different availability zones: This is probably the most easily overlooked aspect of your monthly bill. Most savvy startups will quickly utilize a CDN for static content and script caching, thereby reducing network traffic to the compute instances hosting the web applications. But as your overall cloud infrastructure grows and you start spanning availability zones to ensure high availability and disaster recovery, the corresponding increase in network traffic across availability zones will start adding up quickly. Luckily, your startup will have to be wildly successful before this component of the monthly bill requires too much attention, and by that time you will be able to afford the high-end talent required to optimize the architecture further.

The kind of monthly spend on public clouds described in the article referenced at the start of this post represents a dream come true for most startups just out of the gate, but it is always a good idea to start adopting the right strategies and architectures to manage your monthly spend from the very beginning, when even a thousand bucks out of your pocket can seem like a million. Furthermore, the right architecture will help you eventually transition to a hybrid cloud model at the right time in the future with the least amount of effort and risk.

This blog was first published on the ContractIQ site at http://blog.contractiq.com/taming-the-public-cloud-beast-your-monthly-cloud-computing-bill/ on December 17, 2014.