Atlassian Cache crashes Confluence/Jira upon app disable

Hi,

I have an app that uses atlassian-cache-api version 4.0.0. I’ve observed this problem in both a Confluence app and a Jira app. When I install a new version of the app, uninstall it, or disable it, the whole instance dies. I see errors like the following:

confluence-7.3.3 | Exception in thread "com.atlassian.upm.core.async.AutoProgressIncrementer" java.util.concurrent.CompletionException: com.atlassian.vcache.ExternalCacheException: Failed due to UNCLASSIFIED_FAILURE
confluence-7.3.3 |  at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
confluence-7.3.3 |  at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
confluence-7.3.3 |  at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:932)
confluence-7.3.3 |  at java.base/java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:946)
confluence-7.3.3 |  at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2266)
confluence-7.3.3 |  at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:143)
confluence-7.3.3 |  at com.atlassian.confluence.impl.concurrency.CompletionStageUtils.foldResult(CompletionStageUtils.java:19)
confluence-7.3.3 |  at com.atlassian.confluence.impl.vcache.SynchronousExternalCache.put(SynchronousExternalCache.java:184)
confluence-7.3.3 |  at com.atlassian.confluence.impl.vcache.SynchronousExternalCache.put(SynchronousExternalCache.java:168)
confluence-7.3.3 |  at com.atlassian.confluence.impl.vcache.SynchronousExternalCache.put(SynchronousExternalCache.java:151)
confluence-7.3.3 |  at com.atlassian.confluence.impl.vcache.SynchronousExternalCache.put(SynchronousExternalCache.java:138)
confluence-7.3.3 |  at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister.store(ConfluenceCachingBandanaPersister.java:143)
confluence-7.3.3 |  at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister.store(ConfluenceCachingBandanaPersister.java:118)
confluence-7.3.3 |  at com.atlassian.bandana.DefaultBandanaManager.setValue(DefaultBandanaManager.java:48)
confluence-7.3.3 |  at jdk.internal.reflect.GeneratedMethodAccessor667.invoke(Unknown Source)
confluence-7.3.3 |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
confluence-7.3.3 |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
confluence-7.3.3 |  at com.atlassian.plugin.util.ContextClassLoaderSettingInvocationHandler.invoke(ContextClassLoaderSettingInvocationHandler.java:26)
confluence-7.3.3 |  at com.sun.proxy.$Proxy571.setValue(Unknown Source)
confluence-7.3.3 |  at jdk.internal.reflect.GeneratedMethodAccessor667.invoke(Unknown Source)
confluence-7.3.3 |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
confluence-7.3.3 |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
confluence-7.3.3 |  at com.atlassian.plugin.osgi.bridge.external.HostComponentFactoryBean$DynamicServiceInvocationHandler.invoke(HostComponentFactoryBean.java:131)
confluence-7.3.3 |  at com.sun.proxy.$Proxy571.setValue(Unknown Source)
confluence-7.3.3 |  at com.atlassian.sal.confluence.pluginsettings.ConfluencePluginSettings.lambda$put$1(ConfluencePluginSettings.java:49)
confluence-7.3.3 |  at com.atlassian.sal.spring.component.SpringHostContextAccessor.lambda$doInTransaction$0(SpringHostContextAccessor.java:70)
confluence-7.3.3 |  at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
confluence-7.3.3 |  at com.atlassian.sal.spring.component.SpringHostContextAccessor.doInTransaction(SpringHostContextAccessor.java:68)
confluence-7.3.3 |  at com.atlassian.confluence.spring.transaction.interceptor.ConfluenceSpringHostContextAccessor.access$001(ConfluenceSpringHostContextAccessor.java:21)
confluence-7.3.3 |  at com.atlassian.confluence.spring.transaction.interceptor.ConfluenceSpringHostContextAccessor.lambda$doInTransaction$3(ConfluenceSpringHostContextAccessor.java:72)
confluence-7.3.3 |  at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContextInternal(VCacheRequestContextManager.java:84)
confluence-7.3.3 |  at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContext(VCacheRequestContextManager.java:68)
confluence-7.3.3 |  at com.atlassian.confluence.spring.transaction.interceptor.ConfluenceSpringHostContextAccessor.doInTransaction(ConfluenceSpringHostContextAccessor.java:72)
confluence-7.3.3 |  at jdk.internal.reflect.GeneratedMethodAccessor536.invoke(Unknown Source)
confluence-7.3.3 |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
confluence-7.3.3 |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
confluence-7.3.3 |  at com.atlassian.plugin.util.ContextClassLoaderSettingInvocationHandler.invoke(ContextClassLoaderSettingInvocationHandler.java:26)
confluence-7.3.3 |  at com.sun.proxy.$Proxy374.doInTransaction(Unknown Source)
confluence-7.3.3 |  at jdk.internal.reflect.GeneratedMethodAccessor536.invoke(Unknown Source)
confluence-7.3.3 |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
confluence-7.3.3 |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
confluence-7.3.3 |  at com.atlassian.plugin.osgi.bridge.external.HostComponentFactoryBean$DynamicServiceInvocationHandler.invoke(HostComponentFactoryBean.java:131)
confluence-7.3.3 |  at com.sun.proxy.$Proxy374.doInTransaction(Unknown Source)
confluence-7.3.3 |  at com.atlassian.sal.confluence.pluginsettings.ConfluencePluginSettings.put(ConfluencePluginSettings.java:47)
confluence-7.3.3 |  at com.atlassian.upm.core.impl.NamespacedPluginSettings.put(NamespacedPluginSettings.java:34)
confluence-7.3.3 |  at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.storeOngoingTask(AsynchronousTaskStatusStoreImpl.java:100)
confluence-7.3.3 |  at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.lambda$updateTaskStatus$1(AsynchronousTaskStatusStoreImpl.java:76)
confluence-7.3.3 |  at com.atlassian.upm.impl.Locks.runWithLock(Locks.java:113)
confluence-7.3.3 |  at com.atlassian.upm.impl.Locks.writeWithLock(Locks.java:70)
confluence-7.3.3 |  at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.updateTaskStatus(AsynchronousTaskStatusStoreImpl.java:66)
confluence-7.3.3 |  at com.atlassian.upm.core.async.AsynchronousTaskManager$2.updateStatus(AsynchronousTaskManager.java:118)
confluence-7.3.3 |  at com.atlassian.upm.core.async.AutoProgressIncrementer.updateProgress(AutoProgressIncrementer.java:68)
confluence-7.3.3 |  at com.atlassian.upm.core.async.AutoProgressIncrementer.access$500(AutoProgressIncrementer.java:14)
confluence-7.3.3 |  at com.atlassian.upm.core.async.AutoProgressIncrementer$UpdateTask.run(AutoProgressIncrementer.java:88)
confluence-7.3.3 |  at java.base/java.util.TimerThread.mainLoop(Timer.java:556)
confluence-7.3.3 |  at java.base/java.util.TimerThread.run(Timer.java:506)
confluence-7.3.3 | Caused by: com.atlassian.vcache.ExternalCacheException: Failed due to UNCLASSIFIED_FAILURE
confluence-7.3.3 |  at com.atlassian.vcache.internal.legacy.LegacyUtils.mapException(LegacyUtils.java:51)
confluence-7.3.3 |  at com.atlassian.vcache.internal.legacy.LegacyStableReadExternalCache.mapException(LegacyStableReadExternalCache.java:125)
confluence-7.3.3 |  at com.atlassian.vcache.internal.core.service.AbstractExternalCache.perform(AbstractExternalCache.java:99)
confluence-7.3.3 |  at com.atlassian.vcache.internal.core.service.AbstractExternalCache.perform(AbstractExternalCache.java:74)
confluence-7.3.3 |  at com.atlassian.vcache.internal.core.service.AbstractStableReadExternalCache.put(AbstractStableReadExternalCache.java:247)
confluence-7.3.3 |  at com.atlassian.vcache.internal.core.metrics.TimedExternalWriteOperationsUnbuffered.put(TimedExternalWriteOperationsUnbuffered.java:36)
confluence-7.3.3 |  ... 49 more
confluence-7.3.3 | Caused by: com.atlassian.cache.CacheException: java.lang.IllegalStateException: The com.atlassian.bandana.BandanaPersister Cache is not alive (STATUS_SHUTDOWN)
confluence-7.3.3 |  at com.atlassian.cache.ehcache.DelegatingCache.put(DelegatingCache.java:97)
confluence-7.3.3 |  at com.atlassian.confluence.cache.ehcache.DefaultConfluenceEhCache.put(DefaultConfluenceEhCache.java:149)
confluence-7.3.3 |  at com.atlassian.confluence.cache.ConfluenceMonitoringCache.put(ConfluenceMonitoringCache.java:88)
confluence-7.3.3 |  at com.atlassian.vcache.internal.legacy.LegacyUtils.directPut(LegacyUtils.java:38)
confluence-7.3.3 |  at com.atlassian.vcache.internal.legacy.LegacyStableReadExternalCache.internalPut(LegacyStableReadExternalCache.java:64)
confluence-7.3.3 |  at com.atlassian.vcache.internal.core.service.AbstractStableReadExternalCache.lambda$null$17(AbstractStableReadExternalCache.java:250)
confluence-7.3.3 |  at com.atlassian.vcache.internal.core.service.VCacheLock.withLock(VCacheLock.java:33)
confluence-7.3.3 |  at com.atlassian.vcache.internal.core.service.AbstractStableReadExternalCache.lambda$put$18(AbstractStableReadExternalCache.java:250)
confluence-7.3.3 |  at com.atlassian.vcache.internal.core.service.AbstractExternalCache.perform(AbstractExternalCache.java:89)
confluence-7.3.3 |  ... 52 more
confluence-7.3.3 | Caused by: java.lang.IllegalStateException: The com.atlassian.bandana.BandanaPersister Cache is not alive (STATUS_SHUTDOWN)
confluence-7.3.3 |  at net.sf.ehcache.Cache$CacheStatus.checkAlive(Cache.java:4086)
confluence-7.3.3 |  at net.sf.ehcache.Cache.checkStatus(Cache.java:2777)
confluence-7.3.3 |  at net.sf.ehcache.Cache.putInternal(Cache.java:1556)
confluence-7.3.3 |  at net.sf.ehcache.Cache.put(Cache.java:1532)
confluence-7.3.3 |  at net.sf.ehcache.Cache.put(Cache.java:1497)
confluence-7.3.3 |  at com.atlassian.cache.ehcache.DelegatingCache.put(DelegatingCache.java:93)
confluence-7.3.3 |  ... 60 more

I use the cache like this:

    private final CacheSettings cacheSettings = new CacheSettingsBuilder()
            .expireAfterWrite(60, TimeUnit.MINUTES)
            .maxEntries(100000)
            .flushable()
            .replicateAsynchronously()
            .statisticsEnabled()
            .build();

    public PageStatusTree(PageManager pageManager, CacheManager cacheManager) {
        this.pageManager = pageManager;
        pageStatusCache = cacheManager.getCache("gardenerPageStatusCache", null, cacheSettings);
    }

Any advice would be helpful.

I’ve received the recommendation not to create the cache in the constructor and to use LifecycleAware instead. However, LifecycleAware's onStart and onStop methods were never called on reloading the plugin (as expected). I’ve tried InitializingBean and DisposableBean instead, which results in the same problem.

@dennis.fischer we’ve reached out to some folks on the server platform team for them to investigate and hopefully chime in right here.

@nmansilla Thanks! Hope we get to a solution. If needed I might be able to create an example app which shows this behavior.

A few more notes. The error occurs on:
Confluence 7.3.3 with built-in hsqldb running with atlas-debug.
Confluence 7.3.3 (official image) with Postgres running from within a Docker container.
App uses Spring Java Config for dependency injection / OSGi.

Does it crash only on 7.3.3, or does it happen on other versions as well?

I have seen it on 7.3.1, and had the same problem on Jira as well (I do not remember the version, but I guess it was 8.5.x). We ran into the problem while running the DC performance tests, and it crashed our AWS stack. We might have found a solution, or the error disappeared on its own; I cannot remember, and from what I’ve seen lately the problem no longer occurs.

But I have no clue as to what the root cause is.

Confluence 7.2.0 (atlas-debug), same problem.

@dennis.fischer, if you could please provide a minimal app that demonstrates the problem, that would be super helpful.

Cheers,

Andrew

Alright, will do. Timezones though :wink: . Should be done in about 10 hours.

What I’ve seen is that <context:annotation-config/> seems to be part of the root cause, i.e., if <context:annotation-config/> is not in my Spring XML file, then instantiating the cache works flawlessly.

This is what I observe in the log (org.springframework.beans = TRACE):

TRACE [QuickReload - Plugin Installer] [beans.factory.support.DisposableBeanAdapter] invokeCustomDestroyMethod Invoking destroy method 'shutdown' on bean with name 'cacheManager'

If this calls shutdown on the shared cacheManager, then all of Jira/Confluence will no longer work.

I will check whether <context:annotation-config/> is even needed. I added it after annotations such as @PostConstruct and @PreDestroy no longer worked.

The problem occurs with both component-scan and annotation-config. The plugin does not do much; it just imports the CacheManager.

Thanks Dennis, that will be very helpful. I’ve logged this as CACHE-240.

Could this be a bug in Spring Java Config? I’ve replaced component-scan with bean definitions, as suggested in https://ecosystem.atlassian.net/browse/APOJC-23
With that change, the error no longer occurs. Though I cannot say anything more about it without the source code :wink:

OK, I’ve documented my complete findings on CACHE-240, but in a nutshell, you can fix this by either:

  • (preferred) importing the public CacheFactory API rather than the internal CacheManager interface, or, if that’s not possible for some odd reason,
  • setting destroyMethod = "" on the @Bean annotation for the imported CacheManager.

The underlying cause is that when you use @Bean (with or without our helper library), Spring will by default call close and shutdown on any bean that happens to declare those public, no-arg methods. CacheManager does declare shutdown, but this is only intended for use by the host product, not plugins (hence the “internal” status of that interface).
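
The default behaviour described above can be mimicked in plain Java. This is a minimal sketch, not Spring's actual code: FakeSharedCacheManager and DestroyMethodInferenceSketch are hypothetical names, and the lookup order (close, then shutdown) is an assumption about how the "(inferred)" default of @Bean behaves. It illustrates why an empty destroyMethod leaves the imported bean untouched:

```java
import java.lang.reflect.Method;
import java.util.Optional;

class DestroyMethodInferenceSketch {
    // Assumed marker for "guess the destroy method", modelled on the @Bean default.
    static final String INFERRED = "(inferred)";

    // Returns the method a container would call on teardown, if any.
    static Optional<Method> resolveDestroyMethod(Object bean, String destroyMethod) {
        if (destroyMethod.isEmpty()) {
            return Optional.empty(); // destroyMethod = "" disables the callback
        }
        String[] candidates = INFERRED.equals(destroyMethod)
                ? new String[] {"close", "shutdown"}
                : new String[] {destroyMethod};
        for (String name : candidates) {
            try {
                // getMethod only finds public methods; no parameter types means no-arg.
                return Optional.of(bean.getClass().getMethod(name));
            } catch (NoSuchMethodException ignored) {
                // try the next candidate name
            }
        }
        return Optional.empty();
    }

    // Simulates what happens to a bean when the app's context is closed.
    static void destroy(Object bean, String destroyMethod) {
        resolveDestroyMethod(bean, destroyMethod).ifPresent(method -> {
            try {
                method.invoke(bean);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        });
    }

    public static void main(String[] args) {
        FakeSharedCacheManager withDefault = new FakeSharedCacheManager();
        destroy(withDefault, INFERRED);
        System.out.println("default @Bean, still alive: " + withDefault.alive);

        FakeSharedCacheManager withEmpty = new FakeSharedCacheManager();
        destroy(withEmpty, "");
        System.out.println("destroyMethod = \"\", still alive: " + withEmpty.alive);
    }
}

// Hypothetical stand-in for the host product's shared CacheManager.
class FakeSharedCacheManager {
    boolean alive = true;

    // Public, no-arg method: exactly the shape that can be picked up by name.
    public void shutdown() {
        alive = false;
    }
}
```

In this sketch the first manager ends up shut down and the second stays alive, which mirrors the difference between a plain @Bean and @Bean(destroyMethod = "") for the imported CacheManager.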

I’m not sure if I should use CacheFactory. The CacheManager is advertised by official documentation as the preferred way.

https://developer.atlassian.com/server/confluence/how-do-i-cache-data-in-a-plugin/
“Always depend on CacheManager instead of its super-interface CacheFactory. CacheFactory exported by Confluence is an instance of TransactionalCacheFactory with often surprising performance characteristics.”

I do believe you in your findings. However, not using context:component-scan or context:annotation-config and relying on bean stopped the error from occurring. I would have expected no difference w.r.t the destroy behavior of Spring :man_shrugging:

In that case I will try to use the destroyMethod way if the recommendation above is still valid.
Thank you for your help.

The CacheManager is advertised by official documentation as the preferred way.

Thanks for pointing out that part of the Confluence documentation - I’ll talk to that team to find a way forward that’s not confusing for the ecosystem.

However, not using context:component-scan or context:annotation-config and relying on bean stopped the error from occurring.

Yes, because as I said above, it happens whenever you use @Bean (i.e. not XML) to declare your beans. By contrast, the XML way of declaring beans sets the destroyMethod to null by default (verified experimentally).

Ah! Got it. :wink: In general all of the Cache documentation refers to CacheManager or cacheManager variable - as far as I know.

CACHE-240 is closed. Whether or not the CacheManager problem is the fault of the Atlassian cache, the CacheManager is the recommended approach to accessing the cache and it has a problem that can bring down servers.

Maybe it’s hard or nearly impossible but it seems to me that the safe solution here is for Atlassian to implement destroy() and take that @Internal annotation off of CacheManager. It’s too late to say “it’s internal”. Who amongst us noticed the @Internal annotation before today? I don’t read the Atlassian source code unless I’m stuck on something. If my IDE was showing @Internal when I hovered over CacheManager in my code I never noticed it.

So: can we please refactor CacheManager so that it is safe for this shutdown issue? If that is impossible then can we have recommendations for all of the variations of ways to import CacheManager in a plugin? I have 2 branches for one of my plugins and the production one (old way) is using <component-import key="userManager" interface="com.atlassian.sal.api.user.UserManager"/> (no @Bean annotations)… so not a problem? or problem? I don’t know.
The develop branch has been updated to use newer Atlassian Spring Scanner approach. Problem? I don’t know… I’m busy and this thread is confusing. This feels very, very dangerous to me.

Please reopen CACHE-240 and make CacheManager safe for all the various ways it can be imported by plugins. It really is the safest alternative. There is no way all vendors are going to correctly address this problem.

can we have recommendations for all of the variations of ways to import CacheManager in a plugin?

Sorry if this thread has been confusing. Let me summarise the implications of CACHE-240 for importing the CacheManager service from OSGi:

  1. If you import it via a <component-import> element in atlassian-plugin.xml, you’re safe.
  2. If you import it via Spring Scanner’s @ComponentImport annotation, you’re safe.
  3. If you import it via Spring Java Config, then you need to use @Bean(destroyMethod="") instead of just @Bean in your configuration class, otherwise your app will shut down the imported CacheManager if your app is ever disabled or uninstalled.

Unfortunately there’s no way to make CacheManager unconditionally “safe” for apps to use, because it needs to have the “dangerous” shutdown method so that the product can legitimately shut it down. The CacheFactory API was explicitly designed to be the safe way for apps to create a cache. It’s regrettable that Confluence app developers have no choice but to use CacheManager (noting that they can use the fix described above). As far as I know, apps for other products can quite happily use CacheFactory. So to put this problem in perspective, CACHE-240 only affects apps that run in Confluence AND use Spring Java Config (and for that edge case, use the above workaround).
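
One way to picture why importing CacheFactory is the safe option: if the imported service reaches the app as a proxy for the requested interface, a method that the interface does not declare cannot be found on it by name. The sketch below models this with a JDK dynamic proxy. The SketchCacheFactory and SketchCacheManager interfaces and HostCacheManager are hypothetical simplifications, and the proxy mechanism is an assumption for illustration rather than a description of the actual OSGi bridge:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class ImportedInterfaceSketch {
    // Mimics a service import: the caller only receives a proxy for the requested interface.
    static <T> T importAs(Class<T> iface, Object target) {
        InvocationHandler handler = (self, method, args) -> method.invoke(target, args);
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }

    // True if a by-name lookup of a public, no-arg shutdown() would succeed on the bean.
    static boolean exposesShutdown(Object bean) {
        try {
            bean.getClass().getMethod("shutdown");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        HostCacheManager host = new HostCacheManager();
        System.out.println("imported as manager: "
                + exposesShutdown(importAs(SketchCacheManager.class, host)));
        System.out.println("imported as factory: "
                + exposesShutdown(importAs(SketchCacheFactory.class, host)));
    }
}

// Hypothetical, heavily simplified versions of the two interfaces discussed above.
interface SketchCacheFactory {
    String getCache(String name);
}

interface SketchCacheManager extends SketchCacheFactory {
    void shutdown();
}

// Hypothetical host-side implementation that backs both imports.
class HostCacheManager implements SketchCacheManager {
    boolean alive = true;

    public String getCache(String name) {
        return "cache:" + name;
    }

    public void shutdown() {
        alive = false;
    }
}
```

Under these assumptions, the manager import exposes shutdown and the factory import does not, even though both proxies delegate to the same host object.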

Changing the public API isn’t an option; that’s a recipe for trouble. The workaround described above will need to be used until Confluence provides a CacheFactory implementation that’s suitable for apps to use (and updates their doco accordingly). Let me repeat that this only relates to a tiny fraction of apps, noting that we only announced Spring Java Config late last year.
