We run Ignite in server mode inside our JVM. Ignite goes into a deadlock in the following scenario. I have added the thread stacks at the end of this question.
a. Create a cache with write-through enabled.
b. In the CacheWriter.write() implementation:
1. Wait for a second to allow step c to be invoked.
2. Try to read from another cache.
c. While step b is executing, trigger a thread that creates a new cache.
d. On executing the above scenario, Ignite goes into a deadlock because:
1. A read lock has been acquired by the cache.put() operation.
2. When cache creation is triggered in a separate thread, a Partition Map Exchange (PME) is also started.
3. PME tries to acquire all 16 locks, but waits because one read lock is already held.
4. While reading from the cache, cache.get() cannot complete because it waits for the current Partition Map Exchange to complete.
We have faced this issue in production, and the above scenario is just its reproducer. The write-through implementation is only trying to read from a cache, and the cache creation is happening in a totally different thread.
Why is Ignite blocking all cache.get() operations for PME when it does not even hold all the required locks? Shouldn’t the call be blocked only after the PME operation has all the locks?
Why does PME stop everything? If I create cache A, then only operations related to cache A or its cache group should be stopped.
Also, is there any solution to this deadlock?
Thread executing cache.put() and write-through:
"main" #1 prio=5 os_prio=0 tid=0x0000000003505000 nid=0x43f4 waiting on condition [0x000000000334b000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4870)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGet(GridCacheAdapter.java:4830)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1463)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1128)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:688)
at ReadWriteThroughInterceptor.write(ReadWriteThroughInterceptor.java:70)
at org.apache.ignite.internal.processors.cache.GridCacheLoaderWriterStore.write(GridCacheLoaderWriterStore.java:121)
at org.apache.ignite.internal.processors.cache.store.GridCacheStoreManagerAdapter.put(GridCacheStoreManagerAdapter.java:585)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.update(GridCacheMapEntry.java:6468)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6239)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5923)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4041)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3935)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2039)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1923)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1734)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1717)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:441)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2327)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2553)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2016)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1833)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1692)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:300)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:481)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:441)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:249)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1147)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:615)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2571)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2550)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1337)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:868)
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest.writeToCache(WriteReadThroughTest.java:54)
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest.lambda$runTest$0(WriteReadThroughTest.java:26)
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest$$Lambda$1095/2028767654.execute(Unknown Source)
at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:50)
at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:37)
at org.junit.jupiter.api.Assertions.assertDoesNotThrow(Assertions.java:3060)
at WriteReadThroughTest.runTest(WriteReadThroughTest.java:24)
PME thread waiting for locks:
"exchange-worker-#39" #56 prio=5 os_prio=0 tid=0x0000000022b91800 nid=0x450 waiting on condition [0x000000002866e000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076e73b428> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:897)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1222)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lockInterruptibly(ReentrantReadWriteLock.java:998)
at org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.lock0(StripedCompositeReadWriteLock.java:192)
at org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.lockInterruptibly(StripedCompositeReadWriteLock.java:172)
at org.apache.ignite.internal.util.IgniteUtils.writeLock(IgniteUtils.java:10487)
at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.updateTopologyVersion(GridDhtPartitionTopologyImpl.java:272)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updateTopologies(GridDhtPartitionsExchangeFuture.java:1269)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1028)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3370)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3197)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
at java.lang.Thread.run(Thread.java:748)
Technically, you have answered your own question, which is great work, to be honest.
You are not supposed to have blocking calls in your write-through cache store implementation that might conflict with PME or cause thread pool starvation.
You have to remember that PME is a show-stopper mechanism: the entire user load is stopped. In short, that is required to ensure ACID guarantees. The lock is indeed divided into multiple parts to speed up processing, i.e. allowing up to 16 threads to perform cache operations concurrently. But a PME needs exclusive control over the cluster, so it acquires the write lock on every one of those parts.
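To make the "16 parts" idea concrete, here is a minimal sketch in plain Java of a striped read/write lock. It is a simplified model and not Ignite's StripedCompositeReadWriteLock: the class name, the method names and the explicit stripe index are all assumptions made for illustration. A reader takes a single stripe, while the writer (standing in for PME) has to take every stripe in order.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of a striped read/write lock.
final class StripedLockSketch {
    private final ReentrantReadWriteLock[] stripes;

    StripedLockSketch(int concurrency) {
        stripes = new ReentrantReadWriteLock[concurrency];
        for (int i = 0; i < concurrency; i++)
            stripes[i] = new ReentrantReadWriteLock();
    }

    // A reader (a cache operation) locks one stripe only.
    void readLock(int stripe) { stripes[stripe].readLock().lock(); }

    void readUnlock(int stripe) { stripes[stripe].readLock().unlock(); }

    // The writer needs every stripe. Returns how many stripes were taken
    // before the first one that could not be acquired within the timeout.
    // On partial success the stripes taken so far are released again.
    int tryWriteLockAll(long timeoutMs) throws InterruptedException {
        int acquired = 0;
        for (ReentrantReadWriteLock s : stripes) {
            if (!s.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS))
                break;
            acquired++;
        }
        if (acquired < stripes.length)
            for (int i = 0; i < acquired; i++)
                stripes[i].writeLock().unlock();
        return acquired;
    }

    // One reader holds stripe 5 while another thread tries to take all 16.
    static int stripesTakenWhileReaderActive() throws InterruptedException {
        StripedLockSketch lock = new StripedLockSketch(16);
        lock.readLock(5); // stands in for cache.put() still running

        int[] taken = new int[1];
        Thread exchange = new Thread(() -> {
            try {
                taken[0] = lock.tryWriteLockAll(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        exchange.start();
        exchange.join();

        lock.readUnlock(5);
        return taken[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stripes taken: " + stripesTakenWhileReaderActive());
    }
}
```

The sketch uses a timed tryLock so that it terminates. The exchange worker in the thread dump above uses lockInterruptibly, so in the real scenario it simply stays parked on the stripe held by the reader.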
Shouldn’t the call be blocked only after PME operation has all the locks?
Yes, that's indeed how it's supposed to work. But in your case, PME tries to get the write lock while a read lock is still held, so it waits for that read lock to be released, and all further read lock requests are queued behind the pending write lock.
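The queuing behaviour can be observed with a plain java.util.concurrent ReentrantReadWriteLock, which is the lock type that appears in the exchange worker's stack trace. The following is only a sketch of the pattern using JDK classes, not Ignite code; the class and method names are hypothetical. One thread holds a read lock, a second thread queues for the write lock, and a third thread then asks for a read lock with a timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo: a queued writer makes later readers wait.
final class QueuedWriterSketch {

    // Returns true if a second reader obtained the read lock
    // while a writer was already waiting in the queue.
    static boolean secondReaderGetsLock() throws InterruptedException {
        // Default (non-fair) mode, as in the thread dump.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Stands in for cache.put() holding the read lock.
        lock.readLock().lock();

        // Stands in for PME asking for the write lock.
        Thread writer = new Thread(() -> {
            try {
                lock.writeLock().lockInterruptibly();
                lock.writeLock().unlock();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Wait until the writer is actually queued on the lock.
        while (!lock.hasQueuedThread(writer))
            Thread.sleep(10);

        // Stands in for a further cache operation needing a read lock.
        boolean[] got = new boolean[1];
        Thread reader = new Thread(() -> {
            try {
                if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    got[0] = true;
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        reader.join();

        // The first reader finishes, so the writer can proceed.
        lock.readLock().unlock();
        writer.join();
        return got[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second reader got the lock: " + secondReaderGetsLock());
    }
}
```

If the first reader never released its lock because it was itself waiting for the writer's work to finish, as write() does when it calls cache.get() during a PME, none of the three threads could make progress.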
Also is there any solution to solve this deadlock?
But still, it all depends on your use case.