When you start an asynchronous operation, you typically want to be called back when the operation completes. If the operation was successful, you want your callback to use its result to carry on with what you were doing at the time you started the asynchronous operation. If there was an error, you want to trigger some error handling code.
But there's more to a {@code Deferred} than a single callback. You can add an arbitrary number of callbacks, which effectively allows you to build complex processing pipelines in a really simple and elegant way.
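For instance, here is a minimal sketch of a two-stage pipeline, assuming a {@code Deferred<String>} obtained from some hypothetical asynchronous operation, along with the {@code addCallback} method and the {@link Callback} interface:

  final Deferred<String> d = startAsyncOperation();  // Hypothetical async call.
  d.addCallback(new Callback<Integer, String>() {
    // 1st callback: transforms the String result into its length.
    public Integer call(final String result) {
      return result.length();
    }
  }).addCallback(new Callback<Object, Integer>() {
    // 2nd callback: receives whatever the 1st callback returned.
    public Object call(final Integer length) {
      System.out.println("Result was " + length + " characters long");
      return length;
    }
  });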
When the user of your client library invokes {@code get}, you assemble a request and send it out to the remote server through a socket. Before sending it to the socket, you create a {@code Deferred} and you store it somewhere, for example in a map, to keep an association between the request and this {@code Deferred}. You then return this {@code Deferred} to the user; this is how they will access the deferred result as soon as the RPC completes.
Sooner or later, the RPC will complete (successfully or not), and your socket will become readable (or maybe closed, in the event of a failure). Let's assume for now that everything works as expected, and thus the socket is readable, so you read the response from the socket. At this point you extract the result of the remote {@code get} call, and you hand it out to the {@code Deferred} you created for this request (remember, you had to store it somewhere, so you could give it the deferred result once you have it). The {@code Deferred} then stores this result and triggers any callback that may have been added to it. The expectation is that the user of your client library, after calling your {@code get} method, will add a {@link Callback} to the {@code Deferred} you gave them. This way, when the deferred result becomes available, you'll invoke it with the result as its argument.
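Put together, the client library could look roughly like this sketch (the {@code sendRequest} helper and the key type are hypothetical details):

  // Maps the ID of every in-flight RPC to the Deferred for its result.
  private final ConcurrentHashMap<Integer, Deferred<byte[]>> pending =
    new ConcurrentHashMap<Integer, Deferred<byte[]>>();
  private final AtomicInteger rpcids = new AtomicInteger();

  public Deferred<byte[]> get(final String key) {
    final int rpcid = rpcids.incrementAndGet();
    final Deferred<byte[]> d = new Deferred<byte[]>();
    pending.put(rpcid, d);    // Keep the request -> Deferred association.
    sendRequest(rpcid, key);  // Hypothetical: serialize & write to the socket.
    return d;                 // The user will add their Callback to this.
  }

  // Invoked once a complete response has been read from the socket.
  void handleResponse(final int rpcid, final byte[] result) {
    final Deferred<byte[]> d = pending.remove(rpcid);
    d.callback(result);  // Hand out the deferred result; triggers the chain.
  }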
So far what we've explained is nothing more than a {@code Future} with a callback associated with it. But there's more to {@code Deferred} than just this. Let's assume now that someone else wants to build a caching layer on top of your client library, to avoid repeatedly {@code get}ting the same value over and over again through the network. Users who want to use the cache will invoke {@code get} on the caching library instead of calling your client library directly.
Let's assume that the caching library already has a result cached for a {@code get} call. It will create a {@code Deferred}, immediately hand it the cached result, and return this {@code Deferred} to the user. The user will add a {@link Callback} to it, which will be invoked immediately since the deferred result is already available. So the entire {@code get} call completed virtually instantaneously and entirely from the same thread. There was no context switch (no other thread involved, no I/O and whatnot), nothing ever blocked, everything just happened really quickly.
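A sketch of that cache-hit path, assuming a static factory along the lines of {@code Deferred.fromResult} and a hypothetical {@code cache} map:

  public Deferred<byte[]> get(final String key) {
    final byte[] cached = cache.get(key);  // Hypothetical cache lookup.
    if (cached != null) {
      // The Deferred is born with its result already set: any callback the
      // user adds will be invoked immediately, from this very thread.
      return Deferred.fromResult(cached);
    }
    return remoteGet(key);  // Cache miss: see the sketch further below.
  }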
Now let's assume that the caching library has a cache miss and needs to do a remote {@code get} call using the original client library described earlier. The RPC is sent out to the remote server and the client library returns a {@code Deferred} to the caching library. This is where things become exciting. The caching library can then add its own callback to the {@code Deferred} before returning it to the user. This callback will take the result that came back from the remote server, add it to the cache, and return it. As usual, the user then adds their own callback to process the result. So now the {@code Deferred} has two callbacks associated with it:
              1st callback       2nd callback
  Deferred:   add to cache  -->  user callback

When the RPC completes, the original client library will de-serialize the result from the wire and hand it out to the {@code Deferred}. The first callback will be invoked, which will add the result to the cache of the caching library. Then whatever the first callback returns will be passed on to the second callback. It turns out that the caching callback returns the {@code get} response unchanged, so that will be passed on to the user callback.
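The cache-miss half referenced above could then look like this sketch, where {@code client} is the original low-level client library and {@code cache} the same hypothetical map:

  private Deferred<byte[]> remoteGet(final String key) {
    // Issue the remote call, and insert our caching callback ahead of
    // whatever callback the user will add to the Deferred we return.
    return client.get(key).addCallback(new Callback<byte[], byte[]>() {
      public byte[] call(final byte[] result) {
        cache.put(key, result);  // 1st callback: add to cache.
        return result;           // Return the get response unchanged.
      }
    });
  }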
Now it's very important to understand that the first callback could have returned another arbitrary value, and that's what would have been passed to the second callback. This may sound weird at first but it's actually the key behind {@code Deferred}.
To illustrate why, let's complicate things a bit more. Let's assume the remote service that serves those {@code get} requests is a fairly simple and low-level storage service (think {@code memcached}), so it only works with byte arrays; it doesn't care what the contents are. So the original client library is only de-serializing the byte array from the network and handing that byte array to the {@code Deferred}.
Now you're writing a higher-level library that uses this storage system to store some of your custom objects. So when you get the byte array from the server, you need to further de-serialize it into some kind of object. Users of your higher-level library don't care about what kind of remote storage system you use; the only thing they care about is {@code get}ting those objects asynchronously. Your higher-level library is built on top of the original low-level library that does the RPC communication.
When the users of the higher-level library call {@code get}, you call {@code get} on the lower-level library, which issues an RPC call and returns a {@code Deferred} to the higher-level library. The higher-level library then adds a first callback to further de-serialize the byte array into an object. Then the user of the higher-level library adds their own callback that does something with that object. So now we have something that looks like this:
              1st callback                    2nd callback
  Deferred:   de-serialize to an object  -->  user callback
When the result comes in from the network, the byte array is de-serialized from the socket. The first callback is invoked and its argument is the initial result, the byte array. So the first callback further de-serializes it into some object that it returns. The second callback is then invoked and its argument is the result of the previous callback, that is the de-serialized object.
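The higher-level library's {@code get} is then a thin wrapper that chains the extra de-serialization step (a sketch; {@code MyObject}, its {@code deserialize} method, and {@code client} are hypothetical):

  public Deferred<MyObject> get(final String key) {
    // Issue the low-level RPC; its Deferred carries a byte array. Chain a
    // callback that turns that byte array into the object the user wants.
    return client.get(key).addCallback(new Callback<MyObject, byte[]>() {
      public MyObject call(final byte[] bytes) {
        return MyObject.deserialize(bytes);  // 1st callback: de-serialize.
      }
    });
  }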
Now back to the caching library, which has nothing to do with the higher-level library. All it does is, given an object that implements some interface with a {@code get} method, keep a map of whatever arguments {@code get} receives to the {@code Object} that was cached for that particular {@code get} call. Thanks to the way the callback chain works, it's possible to use the caching library together with the higher-level library transparently: users who want caching simply stack the caching library on top of the higher-level library. Now when they call {@code get} on the caching library, and there's a cache miss, here's what happens, step by step:
              1st callback        2nd callback        3rd callback
  Deferred:   de-serialize  -->   add to cache  -->   user callback
    result:   (none available)

Once the response comes back, the first callback is invoked; it de-serializes the object and returns it. The current result of the {@code Deferred} becomes the de-serialized object. The current state of the {@code Deferred} is as follows:
              2nd callback        3rd callback
  Deferred:   add to cache  -->   user callback
    result:   de-serialized object

Because there are more callbacks in the chain, the {@code Deferred} invokes the next one and gives it the current result (the de-serialized object) as its argument. The callback adds that object to its cache and returns it unchanged.
              3rd callback
  Deferred:   user callback
    result:   de-serialized object

Finally, the user's callback is invoked with the object as its argument.
  Deferred:   (no more callbacks)
    result:   (whatever the user's callback returned)

If you think this is becoming interesting, read on; you haven't reached the most interesting thing about {@code Deferred} yet.
This is where what's probably the most useful feature of {@code Deferred} comes in. Let's assume now that the remote storage service is distributed, so that before {@code get}ting a piece of data, the low-level library must first ask an index server where that piece of data lives. When the user calls {@code get}, the low-level library issues this first {@code lookup} RPC to the index server, which creates a {@code Deferred}. The low-level {@code get} code adds a first callback to process the {@code lookup} response and then returns that {@code Deferred} to the user.
              1st callback       2nd callback
  Deferred:   index lookup  -->  user callback
    result:   (none available)

Eventually, the {@code lookup} RPC completes, and the {@code Deferred} is given the {@code lookup} response. So before triggering the first callback, the {@code Deferred} will be in this state:
              1st callback       2nd callback
  Deferred:   index lookup  -->  user callback
    result:   lookup response

The first callback runs and now knows where to find the piece of data initially requested. It issues the {@code get} request to the right storage node. Doing so creates another {@code Deferred}, let's call it {@code (B)}, which is then returned by the {@code index lookup} callback. And this is where the magic happens. Now we're in this state:
  (A)          2nd callback    |  (B)
  Deferred:    user callback   |  Deferred:  (no more callbacks)
    result:    Deferred (B)    |    result:  (none available)

Because a callback returned a {@code Deferred}, we can't invoke the user callback just yet: the user doesn't want their callback to receive a {@code Deferred}, they want it to receive a byte array. The current callback gets paused and stops processing the callback chain. This callback chain needs to be resumed whenever the {@code Deferred} of the {@code get} call ({@code (B)}) completes. In order to achieve that, a callback is added to that other {@code Deferred} that will resume the execution of the callback chain.
  (A)          2nd callback    |  (B)          1st callback
  Deferred:    user callback   |  Deferred:    resume (A)
    result:    Deferred (B)    |    result:    (none available)

Once {@code (A)} has added the callback on {@code (B)}, it can return immediately; there's no need to wait, block a thread, or anything like that. So the whole process of receiving the {@code lookup} response and sending out the {@code get} RPC happened really quickly, without blocking anything.
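In code, the {@code index lookup} callback that creates {@code (B)} could be written with a pattern like this sketch, assuming an {@code addCallbackDeferring} variant for callbacks that themselves return a {@code Deferred} ({@code lookUpIndex}, {@code getFromNode}, and {@code LookupResponse} are hypothetical):

  public Deferred<byte[]> get(final String key) {
    // Deferred (A): completes when the index lookup completes.
    final Deferred<LookupResponse> lookup = lookUpIndex(key);
    return lookup.addCallbackDeferring(
      new Callback<Deferred<byte[]>, LookupResponse>() {
        public Deferred<byte[]> call(final LookupResponse where) {
          // Returning Deferred (B) pauses (A)'s callback chain until (B)
          // completes; (B)'s result then becomes (A)'s current result.
          return getFromNode(where, key);  // Hypothetical 2nd RPC: Deferred (B).
        }
      });
  }

Returning a {@code Deferred} from a plain {@code addCallback} has the same pausing effect; the deferring variant merely gives the compiler the right result type.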
Now when the {@code get} response comes back from the network, the RPC layer de-serializes the byte array, as usual, and hands it to {@code (B)}:
  (A)          2nd callback    |  (B)          1st callback
  Deferred:    user callback   |  Deferred:    resume (A)
    result:    Deferred (B)    |    result:    byte array

{@code (B)}'s first and only callback is going to set the result of {@code (A)} and resume {@code (A)}'s callback chain.
  (A)          2nd callback    |  (B)          1st callback
  Deferred:    user callback   |  Deferred:    resume (A)
    result:    byte array      |    result:    byte array

So now {@code (A)} resumes its callback chain and invokes the user's callback with the byte array as its argument, which is what they wanted.
  (A)                            |  (B)          1st callback
  Deferred:    (no more cb)      |  Deferred:    resume (A)
    result:    (return value of  |    result:    byte array
                the user's cb)   |

Then {@code (B)} moves on to the next callback in its chain, but there are none, so {@code (B)} is done too.
  (A)                            |  (B)
  Deferred:    (no more cb)      |  Deferred:    (no more cb)
    result:    (return value of  |    result:    byte array
                the user's cb)   |

The whole process of reading the {@code get} response, resuming the initial {@code Deferred}, and executing the second {@code Deferred} happened all in the same thread, sequentially, and without blocking anything (provided that the user's callback didn't block, as it must not).
What we've done is essentially equivalent to dynamically building an implicit finite state machine to handle the life cycle of the {@code get}request. This simple API allows you to build arbitrarily complex processing pipelines that make dynamic decisions at each stage of the pipeline as to what to do next.
When a callback or an errback (a callback invoked to handle a failure rather than a result) itself throws an exception, it is caught by the {@code Deferred} and becomes the current result, which means that the next errback in the chain will be invoked with that exception as its argument. Note that {@code Deferred} will only catch {@link Exception}s, not any {@link Throwable} or {@link Error}.
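For example, an errback can recover from a failure by returning a regular value, which then flows to the next callback as if nothing had gone wrong (a sketch, assuming the {@code addErrback} method; {@code client} is hypothetical):

  client.get("mykey").addErrback(new Callback<byte[], Exception>() {
    public byte[] call(final Exception e) {
      // Invoked if the RPC failed or if an earlier callback threw: the
      // exception is the current result of the Deferred at this point.
      return new byte[0];  // Recover by substituting an empty result.
    }
  }).addCallback(new Callback<Object, byte[]>() {
    public Object call(final byte[] result) {
      // Receives either the real response or the empty recovery value.
      return result;
    }
  });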
Note that you must not use a {@code Deferred} as a monitor: doing

  synchronized (some_deferred) { ... }

(or anything equivalent) voids your warranty.