When you start an asynchronous operation, you typically want to be called back when the operation completes. If the operation was successful, you want your callback to use its result to carry on with what you were doing at the time you started the asynchronous operation. If there was an error, you want to trigger some error handling code.
But there's more to a {@code Deferred} than a single callback. You can add an arbitrary number of callbacks, which effectively allows you to build complex processing pipelines in a really simple and elegant way.
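For instance, here is a minimal sketch of a two-stage pipeline, assuming a {@code Deferred<String>} obtained from some hypothetical asynchronous operation, along with the {@code addCallback} method and the {@link Callback} interface:

  final Deferred<String> d = startAsyncOperation();  // Hypothetical async call.
  d.addCallback(new Callback<Integer, String>() {
    // 1st callback: transforms the String result into its length.
    public Integer call(final String result) {
      return result.length();
    }
  }).addCallback(new Callback<Object, Integer>() {
    // 2nd callback: receives whatever the 1st callback returned.
    public Object call(final Integer length) {
      System.out.println("Result was " + length + " characters long");
      return length;
    }
  });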
When the user of your client library invokes {@code get}, you assemble a request and send it out to the remote server through a socket. Before sending it to the socket, you create a {@code Deferred} and you store it somewhere, for example in a map, to keep an association between the request and this {@code Deferred}. You then return this {@code Deferred} to the user; this is how they will access the deferred result as soon as the RPC completes.
Sooner or later, the RPC will complete (successfully or not), and your socket will become readable (or maybe closed, in the event of a failure). Let's assume for now that everything works as expected, and thus the socket is readable, so you read the response from the socket. At this point you extract the result of the remote {@code get} call, and you hand it out to the {@code Deferred} you created for this request (remember, you had to store it somewhere, so you could give it the deferred result once you have it). The {@code Deferred} then stores this result and triggers any callback that may have been added to it. The expectation is that the user of your client library, after calling your {@code get} method, will add a {@link Callback} to the {@code Deferred} you gave them. This way, when the deferred result becomes available, you'll invoke it with the result as its argument.
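Put together, the client library could look roughly like this sketch (the {@code sendRequest} helper and the key type are hypothetical details):

  // Maps the ID of every in-flight RPC to the Deferred for its result.
  private final ConcurrentHashMap<Integer, Deferred<byte[]>> pending =
    new ConcurrentHashMap<Integer, Deferred<byte[]>>();
  private final AtomicInteger rpcids = new AtomicInteger();

  public Deferred<byte[]> get(final String key) {
    final int rpcid = rpcids.incrementAndGet();
    final Deferred<byte[]> d = new Deferred<byte[]>();
    pending.put(rpcid, d);    // Keep the request -> Deferred association.
    sendRequest(rpcid, key);  // Hypothetical: serialize & write to the socket.
    return d;                 // The user will add their Callback to this.
  }

  // Invoked once a complete response has been read from the socket.
  void handleResponse(final int rpcid, final byte[] result) {
    final Deferred<byte[]> d = pending.remove(rpcid);
    d.callback(result);  // Hand out the deferred result; triggers the chain.
  }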
So far what we've explained is nothing more than a {@code Future} with a callback associated with it. But there's more to {@code Deferred} than just this. Let's assume now that someone else wants to build a caching layer on top of your client library, to avoid repeatedly {@code get}ting the same value over and over again through the network. Users who want to use the cache will invoke {@code get} on the caching library instead of calling your client library directly.
Let's assume that the caching library already has a result cached for a {@code get} call. It will create a {@code Deferred}, immediately hand it the cached result, and return this {@code Deferred} to the user. The user will add a {@link Callback} to it, which will be invoked immediately since the deferred result is already available. So the entire {@code get} call completed virtually instantaneously and entirely from the same thread. There was no context switch (no other thread involved, no I/O and whatnot), nothing ever blocked, everything just happened really quickly.
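A sketch of that cache-hit path, assuming a static factory along the lines of {@code Deferred.fromResult} and a hypothetical {@code cache} map:

  public Deferred<byte[]> get(final String key) {
    final byte[] cached = cache.get(key);  // Hypothetical cache lookup.
    if (cached != null) {
      // The Deferred is born with its result already set: any callback the
      // user adds will be invoked immediately, from this very thread.
      return Deferred.fromResult(cached);
    }
    return remoteGet(key);  // Cache miss: see the sketch further below.
  }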
Now let's assume that the caching library has a cache miss and needs to do a remote {@code get} call using the original client library described earlier. The RPC is sent out to the remote server and the client library returns a {@code Deferred} to the caching library. This is where things become exciting. The caching library can then add its own callback to the {@code Deferred} before returning it to the user. This callback will take the result that came back from the remote server, add it to the cache, and return it. As usual, the user then adds their own callback to process the result. So now the {@code Deferred} has two callbacks associated with it:
              1st callback       2nd callback
  Deferred:   add to cache  -->  user callback

When the RPC completes, the original client library will de-serialize the result from the wire and hand it out to the {@code Deferred}. The first callback will be invoked, which will add the result to the cache of the caching library. Then whatever the first callback returns will be passed on to the second callback. It turns out that the caching callback returns the {@code get} response unchanged, so that will be passed on to the user callback.
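The cache-miss half referenced above could then look like this sketch, where {@code client} is the original low-level client library and {@code cache} the same hypothetical map:

  private Deferred<byte[]> remoteGet(final String key) {
    // Issue the remote call, and insert our caching callback ahead of
    // whatever callback the user will add to the Deferred we return.
    return client.get(key).addCallback(new Callback<byte[], byte[]>() {
      public byte[] call(final byte[] result) {
        cache.put(key, result);  // 1st callback: add to cache.
        return result;           // Return the get response unchanged.
      }
    });
  }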
Now it's very important to understand that the first callback could have returned another arbitrary value, and that's what would have been passed to the second callback. This may sound weird at first but it's actually the key behind {@code Deferred}.
To illustrate why, let's complicate things a bit more. Let's assume the remote service that serves those {@code get} requests is a fairly simple and low-level storage service (think {@code memcached}), so it only works with byte arrays; it doesn't care what the contents are. So the original client library is only de-serializing the byte array from the network and handing that byte array to the {@code Deferred}.
Now you're writing a higher-level library that uses this storage system to store some of your custom objects. So when you get the byte array from the server, you need to further de-serialize it into some kind of object. Users of your higher-level library don't care about what kind of remote storage system you use; the only thing they care about is {@code get}ting those objects asynchronously. Your higher-level library is built on top of the original low-level library that does the RPC communication.
When the users of the higher-level library call {@code get}, you call {@code get} on the lower-level library, which issues an RPC call and returns a {@code Deferred} to the higher-level library. The higher-level library then adds a first callback to further de-serialize the byte array into an object. Then the user of the higher-level library adds their own callback that does something with that object. So now we have something that looks like this:
              1st callback                    2nd callback
  Deferred:   de-serialize to an object  -->  user callback
When the result comes in from the network, the byte array is de-serialized from the socket. The first callback is invoked and its argument is the initial result, the byte array. So the first callback further de-serializes it into some object that it returns. The second callback is then invoked and its argument is the result of the previous callback, that is the de-serialized object.
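The higher-level library's {@code get} is then a thin wrapper that chains the extra de-serialization step (a sketch; {@code MyObject}, its {@code deserialize} method, and {@code client} are hypothetical):

  public Deferred<MyObject> get(final String key) {
    // Issue the low-level RPC; its Deferred carries a byte array. Chain a
    // callback that turns that byte array into the object the user wants.
    return client.get(key).addCallback(new Callback<MyObject, byte[]>() {
      public MyObject call(final byte[] bytes) {
        return MyObject.deserialize(bytes);  // 1st callback: de-serialize.
      }
    });
  }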
Now back to the caching library, which has nothing to do with the higher-level library. All it does is, given an object that implements some interface with a {@code get} method, keep a map of whatever arguments {@code get} receives to the {@code Object} that was cached for that particular {@code get} call. Thanks to the way the callback chain works, it's possible to use the caching library together with the higher-level library transparently: users who want caching simply stack the caching library on top of the higher-level library. Now when they call {@code get} on the caching library, and there's a cache miss, here's what happens, step by step:
              1st callback        2nd callback        3rd callback
  Deferred:   de-serialize  -->   add to cache  -->   user callback
    result:   (none available)

Once the response comes back, the first callback is invoked; it de-serializes the object and returns it. The current result of the {@code Deferred} becomes the de-serialized object. The current state of the {@code Deferred} is as follows:
              2nd callback        3rd callback
  Deferred:   add to cache  -->   user callback
    result:   de-serialized object

Because there are more callbacks in the chain, the {@code Deferred} invokes the next one and gives it the current result (the de-serialized object) as its argument. The callback adds that object to its cache and returns it unchanged.
              3rd callback
  Deferred:   user callback
    result:   de-serialized object

Finally, the user's callback is invoked with the object as its argument.
  Deferred:   (no more callbacks)
    result:   (whatever the user's callback returned)

If you think this is becoming interesting, read on; you haven't reached the most interesting thing about {@code Deferred} yet.
This is where what's probably the most useful feature of {@code Deferred} comes in. Let's assume now that the remote storage service is distributed, so that before {@code get}ting a piece of data, the low-level library must first ask an index server where that piece of data lives. When the user calls {@code get}, the low-level library issues this first {@code lookup} RPC to the index server, which creates a {@code Deferred}. The low-level {@code get} code adds a first callback to process the {@code lookup} response and then returns that {@code Deferred} to the user.
              1st callback       2nd callback
  Deferred:   index lookup  -->  user callback
    result:   (none available)

Eventually, the {@code lookup} RPC completes, and the {@code Deferred} is given the {@code lookup} response. So before triggering the first callback, the {@code Deferred} will be in this state:
              1st callback       2nd callback
  Deferred:   index lookup  -->  user callback
    result:   lookup response

The first callback runs and now knows where to find the piece of data initially requested. It issues the {@code get} request to the right storage node. Doing so creates another {@code Deferred}, let's call it {@code (B)}, which is then returned by the {@code index lookup} callback. And this is where the magic happens. Now we're in this state:
  (A)          2nd callback    |  (B)
  Deferred:    user callback   |  Deferred:  (no more callbacks)
    result:    Deferred (B)    |    result:  (none available)

Because a callback returned a {@code Deferred}, we can't invoke the user callback just yet: the user doesn't want their callback to receive a {@code Deferred}, they want it to receive a byte array. The current callback gets paused and stops processing the callback chain. This callback chain needs to be resumed whenever the {@code Deferred} of the {@code get} call ({@code (B)}) completes. In order to achieve that, a callback is added to that other {@code Deferred} that will resume the execution of the callback chain.
  (A)          2nd callback    |  (B)          1st callback
  Deferred:    user callback   |  Deferred:    resume (A)
    result:    Deferred (B)    |    result:    (none available)

Once {@code (A)} has added the callback on {@code (B)}, it can return immediately; there's no need to wait, block a thread, or anything like that. So the whole process of receiving the {@code lookup} response and sending out the {@code get} RPC happened really quickly, without blocking anything.
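In code, the {@code index lookup} callback that creates {@code (B)} could be written with a pattern like this sketch, assuming an {@code addCallbackDeferring} variant for callbacks that themselves return a {@code Deferred} ({@code lookUpIndex}, {@code getFromNode}, and {@code LookupResponse} are hypothetical):

  public Deferred<byte[]> get(final String key) {
    // Deferred (A): completes when the index lookup completes.
    final Deferred<LookupResponse> lookup = lookUpIndex(key);
    return lookup.addCallbackDeferring(
      new Callback<Deferred<byte[]>, LookupResponse>() {
        public Deferred<byte[]> call(final LookupResponse where) {
          // Returning Deferred (B) pauses (A)'s callback chain until (B)
          // completes; (B)'s result then becomes (A)'s current result.
          return getFromNode(where, key);  // Hypothetical 2nd RPC: Deferred (B).
        }
      });
  }

Returning a {@code Deferred} from a plain {@code addCallback} has the same pausing effect; the deferring variant merely gives the compiler the right result type.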
Now when the {@code get} response comes back from the network, the RPC layer de-serializes the byte array, as usual, and hands it to {@code (B)}:
  (A)          2nd callback    |  (B)          1st callback
  Deferred:    user callback   |  Deferred:    resume (A)
    result:    Deferred (B)    |    result:    byte array

{@code (B)}'s first and only callback is going to set the result of {@code (A)} and resume {@code (A)}'s callback chain.
  (A)          2nd callback    |  (B)          1st callback
  Deferred:    user callback   |  Deferred:    resume (A)
    result:    byte array      |    result:    byte array

So now {@code (A)} resumes its callback chain and invokes the user's callback with the byte array as its argument, which is what they wanted.
  (A)                            |  (B)          1st callback
  Deferred:    (no more cb)      |  Deferred:    resume (A)
    result:    (return value of  |    result:    byte array
                the user's cb)   |

Then {@code (B)} moves on to the next callback in its chain, but there are none, so {@code (B)} is done too.
  (A)                            |  (B)
  Deferred:    (no more cb)      |  Deferred:    (no more cb)
    result:    (return value of  |    result:    byte array
                the user's cb)   |

The whole process of reading the {@code get} response, resuming the initial {@code Deferred}, and executing the second {@code Deferred} happened all in the same thread, sequentially, and without blocking anything (provided that the user's callback didn't block, as it must not).
What we've done is essentially equivalent to dynamically building an implicit finite state machine to handle the life cycle of the {@code get}request. This simple API allows you to build arbitrarily complex processing pipelines that make dynamic decisions at each stage of the pipeline as to what to do next.
When a callback or an errback (a callback invoked to handle a failure rather than a result) itself throws an exception, it is caught by the {@code Deferred} and becomes the current result, which means that the next errback in the chain will be invoked with that exception as its argument. Note that {@code Deferred} will only catch {@link Exception}s, not any {@link Throwable} or {@link Error}.
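For example, an errback can recover from a failure by returning a regular value, which then flows to the next callback as if nothing had gone wrong (a sketch, assuming the {@code addErrback} method; {@code client} is hypothetical):

  client.get("mykey").addErrback(new Callback<byte[], Exception>() {
    public byte[] call(final Exception e) {
      // Invoked if the RPC failed or if an earlier callback threw: the
      // exception is the current result of the Deferred at this point.
      return new byte[0];  // Recover by substituting an empty result.
    }
  }).addCallback(new Callback<Object, byte[]>() {
    public Object call(final byte[] result) {
      // Receives either the real response or the empty recovery value.
      return result;
    }
  });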
Note that you must not use a {@code Deferred} as a monitor: doing

  synchronized (some_deferred) { ... }

(or anything equivalent) voids your warranty.