WebSocket Streaming APIs vs Webhooks

When it comes to push technology, there are lots of options out there. And I do mean lots. The market is hot, as you can clearly see in the following picture from my Event driven API Strategies presentation at the Nordic APIs 2019 Summit.

However, which technology and/or approach to adopt depends on the use case in question and on the targeted business benefits for both API producers and consumers. For example, we at Oracle Hospitality recently announced a GraphQL Subscriptions / WebSocket-based Streaming API for business event push. Details of the announcement here.

Overall, our Streaming API is being extremely well received, and there is a huge amount of excitement about it in our vendor and customer communities alike. This is great to see, given the amount of time and effort that was put into delivering this strategy.

That being said, there have been a few questions as to why we didn't opt for the more traditional Webhook approach. This article by James Neate offers a great explanation of the reasoning behind our tech strategy (I highly recommend it). In this blog post, however, I want to expand on the specifics of why we favoured GraphQL subscriptions / WebSockets over Webhooks.

Here are our eight main reasons:

  1. Webhooks require an API server on the receiver side so that events can be pushed to it. In practice, this means the event consumer must expose an endpoint (aka a callback URL) that the event producer can invoke at any point in time to push events. This adds runtime infrastructure that needs its own API runtime governance, close monitoring and security, all of which naturally incurs additional operating costs. With WebSocket, however, there is no need to implement an API server on the consumer side, just an API client, e.g. using the Apollo GraphQL Subscription library if GraphQL is being used. An important remark: because we're talking about an API client that keeps a long-lived connection open, and not an API server that handles short requests, serverless infrastructure charged by execution time may not be desirable. Instead, consider a different deployment model, for example a container running in Kubernetes.
  2. Because callback URLs are public endpoints, networks and firewalls have to be configured to accept external calls. It also means that such endpoints are exposed to external security threats and therefore have to be adequately secured with infrastructure such as a WAF and API gateways. With Gartner predicting that API-based attacks will become the most frequent attack vector for applications, this matters a lot.
  3. In a Webhook approach, the API lifecycle becomes more complicated. This is because the event producer has to define an API spec which must be adopted (to the letter) by all event consumers. Any deviation in structure or behaviour by a consumer will almost certainly result in issues. Moreover, this introduces an additional dimension of complexity in API change management. The server must carefully coordinate any changes made to the spec so that callbacks don't fail, for example because a consumer is on an older version of the spec and therefore doesn't recognise a new field added to the payload or a new HTTP header. In a WebSocket approach, on the flip side, the API consumer binds to the server API spec. Therefore, so long as the event producer follows good API management practices around versioning, handling change will be simpler on both ends. Change management becomes particularly simple with GraphQL subscriptions, as subscriptions also benefit from the great schema evolvability features available with GraphQL (e.g. an event consumer decides when to start consuming a newly added field).
  4. The Webhook server will most likely always push full event payloads, even if the consumer isn't interested in everything. WebSocket streaming APIs implemented with GraphQL subscriptions, on the other hand, work just like GraphQL: you only get the data you are interested in, which makes them very efficient. Basically, users have the ability to 'cherry pick' the event data they're interested in.
  5. Related to the previous point, with Webhooks every event pushed by the event producer is a new synchronous HTTP call, which isn't very efficient. In a WebSocket approach, the server reuses a single TCP connection to push events.
  6. Implementing features such as event playback with Webhooks can be very complicated. For example, how does the Webhook server know that the consumer wants to play back, say, 3 hours' worth of events? The event producer needs additional infrastructure so that the event consumer can instruct the server to play back events. With WebSocket, however, this is much easier to implement, because during the initial handshake the event consumer has the opportunity to pass additional information to the server, such as an offset number. This is exactly what we did in our Oracle Hospitality Streaming API: we enable consumers to pass the offset, and the server plays back events from there. Currently we support playing back up to 7 days' worth of events.
  7. Another important factor to consider is how to deal with back pressure. With Webhooks there isn't a simple mechanism to deal with it. This is mainly because the server can't easily tell that the client is suffering from back pressure and therefore doesn't slow down the flow of events being pushed. A common strategy with Webhooks is to avoid back pressure altogether by scaling the Webhook API. But this strategy may not always work as expected (especially during sudden peaks) and can introduce additional challenges, such as maintaining message sequence when processing events in parallel. With WebSockets this is a lot simpler, because the consumer has the ability to switch off the connection, deal with the backlog of events and then switch it back on once it's ready to handle more events. This, combined with the ability to play back events, means that the consumer has better options to deal with back pressure.
  8. And lastly, there is the popularity factor. Although this is a subjective (and for some a controversial) factor, it's clear that GraphQL and WebSocket are both increasing exponentially in popularity, which can't be said for REST and/or Webhooks. So if you're implementing a brand new streaming API that will be around for a while, as in our case, it is important to consider subjective factors like this too. For many reasons that I hope you appreciate without me having to write them down :)
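
To make reasons 6 and 7 a little more concrete, here is a minimal Python sketch of how replay-by-offset and "switch off, catch up, switch back on" can work together. It is only an illustration under stated assumptions, not the Oracle Hospitality Streaming API or any real client library: `EventLog`, `Consumer`, `max_backlog` and `consume_all` are hypothetical names, and an in-memory list stands in for both the WebSocket connection and the server-side event retention.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical in-memory stand-in for the producer's retained events."""
    events: list = field(default_factory=list)

    def publish(self, payload):
        self.events.append(payload)

    def stream_from(self, offset):
        """Yield (offset, payload) pairs, starting at the requested offset."""
        for position in range(offset, len(self.events)):
            yield position, self.events[position]


@dataclass
class Consumer:
    """Hypothetical consumer that disconnects once its backlog is full."""
    max_backlog: int
    next_offset: int = 0
    processed: list = field(default_factory=list)

    def connect_and_drain(self, log):
        """Simulate one connection: read until the backlog is full, then stop."""
        backlog = []
        # The offset plays the role of the value passed during the handshake.
        for position, payload in log.stream_from(self.next_offset):
            if len(backlog) >= self.max_backlog:
                break  # back pressure: "switch off" the connection
            backlog.append(payload)
            self.next_offset = position + 1
        # Work through the backlog while disconnected.
        self.processed.extend(backlog)
        return len(backlog)


def consume_all(log, consumer):
    """Reconnect from the last offset until no new events are delivered."""
    connections = 0
    while consumer.connect_and_drain(log) > 0:
        connections += 1
    return connections


log = EventLog()
for number in range(7):
    log.publish(f"event-{number}")

consumer = Consumer(max_backlog=3)
connections = consume_all(log, consumer)
print(connections, consumer.next_offset, consumer.processed)
```

Because the consumer remembers the next offset it needs, disconnecting under load loses nothing: each reconnection resumes where the previous one stopped, and events are still processed in order. A real client would send that offset to the server when opening the connection, and the server would only be able to honour it within its retention window.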