Redis Transactions
In the world of concurrent and parallel programming, there will be some point where you will write something that requires access to a resource without any other thread or process accessing it. In Redis this is actually very common; multiple readers, multiple writers, all dealing with the same data structures. Redis includes a combination of five commands for handling optimistic data locking. Those commands are WATCH, UNWATCH, MULTI, EXEC, and DISCARD.The typical path for code that intends to modify data in Redis that requires checking values before updating will have a 4-step process. First you WATCH data that you want to modify/check on. Next you check your data. If it isn't what you need to continue, you UNWATCH and return. If it was what you were looking for, you start a MULTI transaction, send your commands, then EXEC them. Below is a simple example that transfers money between two accounts, making sure to prevent overdrafts.
def transfer_funds(conn, sender, recipient, amount): pipe = conn.pipeline(True) while 1: try: pipe.watch(sender) if pipe.hget(sender, 'money') < amount: pipe.unwatch() return False pipe.multi() pipe.hincrby(sender, 'money', -amount) pipe.hincrby(recipient, 'money', amount) pipe.execute() return True except redis.exceptions.WatchError: passThe only command not used above is the DISCARD command, which will discard any commands passed after MULTI. As you can see, we will attempt to loop through the above money transfer until it either fails due to no money, or succeeds. This is fine, but if you have some account that is updated often, or if you have some shared data structure that is constantly being updated, you can fall into the except clause and have to retry the transaction repeatedly.
Locking
In other software, rather than using WATCH/MULTI/EXEC, there is a primitive called a Lock, which allows us to gain exclusive access to a resource. You acquire the lock, perform your operation, then release the lock. Because of the variety of commands in Redis, you can build a lock using the available tools. Unfortunately, using those tools correctly is not easy. In my research about locks in Redis, I haven't found a single implementation available online that is 100% correct (until now). Some of the problems with locks that I have seen are as follows:- A process acquired a lock, operated on data, but took too long and the lock was automatically released. The process doesn't know that it lost the lock, or may even release the lock that some other process has since acquired.
- A process acquired a lock for an operation that takes a long time, and crashed. Other processes that want the lock don't know what process had the lock, so cannot detect that the process failed, and wastes time waiting for the lock to be released.
- One process had a lock, but it timed out. Other processes try to acquire the lock simultaneously, and multiple processes are able to get the lock.
- Because of a combination of #1 and #3, many processes now hold the believed exclusive lock, leading to data corruption or other incorrect behavior.
Building a mostly correct lock in Redis is pretty easy. Building a completely correct lock in Redis isn't actually much more difficult, but it requires being extra careful about the operations we use to build it (in the book, we first build a simple lock to show the basics, then add in the full functionality, here we will jump to the fully-featured lock).
The first part of making sure that no other code can run is to "acquire" the lock. The natural building block to use for acquiring a lock is the SETNX command, which will only set a value if the value doesn't already exist. We will set the value to be a unique identifier to ensure that no other process can get the lock. If we were able to set the value (we have acquired the lock), then we immediately set the expiration of the key to ensure that if we take too long with our operation, the lock is eventually released. But if our client happens to crash (and the worst place for it to crash for us is between SETNX and EXPIRE), we still want the lock to eventually time out. To handle that situation, any time a client fails to get the lock, the client will check the expiration on the lock, and if it's not set, set it. Because clients are going to be checking and setting timeouts if they fail to get a lock, the lock will always have a timeout, and will eventually expire, letting other clients get a timed out lock.
But what if multiple clients set the expiration time at the same time? That is fine, they will be running essentially at the same time, so the expiration will be set for the same time.
def acquire_lock(conn, lockname, identifier, atime=10, ltime=10): end = time.time() + atime while end > time.time(): if conn.setnx(lockname, identifier): conn.expire(lockname, ltime) return identifier elif not conn.ttl(lockname): conn.expire(lockname, ltime) time.sleep(.001) return FalseTo release the lock, we have to be at least as careful as when acquiring the lock. Between the time when we acquired the lock and when we are trying to release it, someone may have done bad things to the lock. To release the lock, we actually need to WATCH the lock key, then check to make sure that the value is still the same as what we set it to before we delete it. This also prevents us from releasing a lock multiple times.
def release_lock(conn, lockname, identifier): pipe = conn.pipeline(True) while True: try: pipe.watch(lockname) if pipe.get(lockname) == identifier: pipe.multi() pipe.delete(lockname) pipe.execute() return True pipe.unwatch() break except redis.exceptions.WatchError: pass # we lost the lock return FalseAnd as a bonus, a Python context manager with the updated transfer_funds() code using it...
import contextlib, uuid class LockException(Exception): pass @contextlib.contextmanager def redis_lock(conn, lockname, atime=10, ltime=10): identifier = str(uuid.uuid4()) if acquire_lock(**locals()) != identifier: raise LockException("could not acquire lock") try: yield identifier finally: if not release_lock(conn, lockname, identifier): raise LockException("lock was lost") def transfer_funds(conn, sender, recipient, amount): with redis_lock(conn, 'lock:' + sender): if conn.hget(sender, 'money') < amount: return False pipe = conn.pipeline(True) pipe.hincrby(sender, 'money', -amount) pipe.hincrby(recipient, 'money', amount) pipe.execute() return TrueIf you generated your identifier correctly (UUID like we did, or IP address + process id + thread id, etc.), this lock is correct. Not almost correct, not mostly correct, not a little bit wrong. Completely correct. Other locks that I've seen for Redis have one of a couple different mistakes, usually either accidentally resetting the timeout when you shouldn't, or deleting the lock when you shouldn't. Those kinds of errors lead to the 4 problems I listed earlier.
Hey Josh,
ReplyDeleteThanks for the article.
The new EVAL command would be able to achieve your "simplified locking" scheme.
lock (setnx with expire)
local val = redis.call('setnx', KEYS[1], KEYS[2])
if (val == 1)
then
redis.call('expire', KEYS[1], KEYS[3])
return KEYS[2]
end
return 0
unlock (delete if current)
local val = redis.call('get', KEYS[1])
if (val == KEYS[2])
then
redis.call('del', KEYS[1])
return 1
end
return 0
Thoughts ?
Indeed, with EVAL/EVALSHA, locks can be cleaned up. See chapter 11 section 2 of my book, Redis in Action, for the version that I use. It is similar to what you describe, but uses get+setex instead of setnx+expire.
DeleteHi Josiah,
ReplyDeletethe first of the problems you're describing in your intro is the following:
[..]
1. A process acquired a lock, operated on data, but took too long and the lock was automatically released. The process doesn't know that it lost the lock, or may even release the lock that some other process has since acquired.
[..]
I don't see how you address this issue in your design. Here's an example:
1. Client A tries to acquire the lock and succeeds, at time 0.
2. Client A immediately sets an expiration to let's say, 30 seconds
3. Client A starts the critical section which is quite heavy, thus taking totally 40 seconds
4. Somewhere in the middle (let's say at time 34) comes Client B, trying to acquire the lock
5. Since the lock created by client A expired 4 seconds ago (at time 30), B's call to setnx succeeds and it acquires the lock
6. Client A never finds out and between times 34 and 40, both A and B occupy the critical section
7. At least A does not release B's lock, but that's the least of our problems right now. Our biggest one is that A and B were in the critical section for 6 seconds
Am I missing something? If not, I don't see any way around this, except for a "refresh_lock" issued by A at intervals necessarily smaller than the expiration time.
Of course client A could always set an incredibly high expiration time, but that would practically:
1. Lower the possibility that A runs late and occupies the critical section longer than the (incredibly high) timeout, thus losing the lock. But stil, "lower" means "tends to zero but it's not zero", so the lock is not completely safe, but is "very safe", with "very" being quantified as the probability that something goes really wrong and A needs (more) incredibly large amount of time to leave the critical section.
2. Had a terrible impact on concurrency. In case A crashed, then all other clients would be locked out waiting for an incredibly high expiration to take place.
If you want locks to time out properly, the critical observation is that you must set your lock timeout to be longer than the time your process must be inside the critical section.
DeleteAs an alternative to picking longer lock times is to pick a shorter time and to "refresh" the lock periodically. Something like https://gist.github.com/josiahcarlson/5453257 , which is an answer to a similar question from the Manning forums: http://www.manning-sandbox.com/thread.jspa?threadID=56702&tstart=0 .
Of course then you are going to need to pick a refresh timeout that is long enough to prevent a slow refresh from losing the lock, but not so long as to block other clients that want the lock.
Building a lock in Redis has a variety of trade-offs, and the locks that can be built in Redis aren't necessarily a useful solution for 100% of applications that could find a use for locks. If you know what timeouts to use for initial and refresh timeouts, that can help locks in Redis be useful for more applications. But if you don't know what timeouts to use, or if you can't make refresh calls, then they aren't going to work for you.
Thanks for the nice post. Our first locking algorithm had problems described by you. Also in new Redis version 'SET' command has many useful options such as key expiration and nx. Definitely I'll buy your book in Kindle version.
ReplyDeleteHi,
ReplyDeleteI want to set a lock for a particular KEY.
can some one tell me how to do that.
-Thanks,
Please see this message in the forum (locks are semantics over data, you can't actually prevent writing to a key in Redis): https://forums.manning.com/posts/list/39216.page
Delete