stiemannkj1 February 2016

Does double-checked locking work with a final Map in Java?

I'm trying to implement a thread-safe Map cache, and I want the cached Strings to be lazily initialized. Here's my first pass at an implementation:

public class ExampleClass {

    private static final Map<String, String> CACHED_STRINGS = new HashMap<String, String>();

    public String getText(String key) {

        String string = CACHED_STRINGS.get(key);

        if (string == null) {

            synchronized (CACHED_STRINGS) {

                string = CACHED_STRINGS.get(key);

                if (string == null) {
                    string = createString();
                    CACHED_STRINGS.put(key, string);
                }
            }
        }

        return string;
    }
}

After writing this code, Netbeans warned me about "double-checked locking," so I started researching it. I found The "Double-Checked Locking is Broken" Declaration and read it, but I'm unsure if my implementation falls prey to the issues it mentioned. It seems like all the issues mentioned in the article are related to object instantiation with the new operator within the synchronized block. I'm not using the new operator, and Strings are immutable, so I'm not sure that if the article is relevant to this situation or not. Is this a thread-safe way to cache strings in a HashMap? Does the thread-safety depend on what action is taken in the createString() method?

Answers


Jarrod Roberson February 2016

Non-trivial problem domains:

Concurrency is easy to do and hard to do correctly.

Caching is easy to do and hard to do correctly.

Both are right up there with Encryption in the category of hard to get right without an intimate understanding of the problem domain and its many subtle side effects and behaviors.

Combine them and you get a problem an order of magnitude harder than either one.

This is a non-trivial problem that your naive implementation will not solve in a bug free manner. The HashMap you are using is not going to threadsafe if any accesses are not checked and serialized, it will not be performant and will cause lots of contention that will cause lot of blocking and latency depending on the use.

The proper way to implement a lazy loading cache is to use something like Guava Cache with a Cache Loader it takes care of all the concurrency and cache race conditions for you transparently. A cursory glance through the source code shows how they do it.


weston February 2016

No it's not correct because the first access is done out side of a sync block.

It's somewhat down to how get and put might be implemented. You must bare in mind that they are not atomic operations.

For example, what if they were implemented like this:

public T get(string key){
    Entry e = findEntry(key);
    return e.value;
}

public void put(string key, string value){
    Entry e = addNewEntry(key);
    //danger for get while in-between these lines
    e.value = value;
}

private Entry addNewEntry(key){
   Entry entry = new Entry(key, ""); //a new entry starts with empty string not null!
   addToBuckets(entry); //now it's findable by get
   return entry; 
}

Now the get might not return null when the put operation is still in progress, and the whole getText method could return the wrong value.

The example is a bit convoluted, but you can see that correct behaviour of your code relies on the inner workings of the map class. That's not good.

And while you can look that code up, you cannot account for compiler, JIT and processor optimisations and inlining which effectively can change the order of operations just like the wacky but correct way I chose to write that map implementation.


Prim February 2016

Consider use of a concurrent hashmap and the method Map.computeIfAbsent() which takes a function to call to compute a default value if key is absent from the map.

Map<String, String> cache = new ConcurrentHashMap<>(  );
cache.computeIfAbsent( "key", key -> "ComputedDefaultValue" );

Javadoc: If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.


user2023577 February 2016

No, and ConcurrentHashMap would not help.

Recap: the double check idiom is typically about assigning a new instance to a variable/field; it is broken because the compiler can reorder instructions, meaning the field can be assigned with a partially constructed object.

For your setup, you have a distinct issue: the map.get() is not safe from the put() which may be occurring thus possibly rehashing the table. Using a Concurrent hash map fixes ONLY that but not the risk of a false positive (that you think the map has no entry but it is actually being made). The issue is not so much a partially constructed object but the duplication of work.

As for the avoidable guava cacheloader: this is just a lazy-init callback that you give to the map so it can create the object if missing. This is essentially the same as putting all the 'if null' code inside the lock, which is certainly NOT going to be faster than good old direct synchronization. (The only times it makes sense to use a cacheloader is for pluggin-in a factory of such missing objects while you are passing the map to classes who don't know how to make missing objects and don't want to be told how).

Post Status

Asked in February 2016
Viewed 2,863 times
Voted 13
Answered 4 times

Search




Leave an answer