What I suspect the problem to be is SchedulerFactoryBean's setOverwriteExistingJobs not offering enough protection.
One node will be initializing the scheduler and it will decide to replace the trigger (breakpoint org.quartz.impl.jdbcjobstore.SimpleTriggerPersistenceDelegate#deleteExtendedTriggerProperties )
Right after it executes this method, the trigger won't be in the database any longer so when another node in the cluster will try to read it (org.quartz.impl.jdbcjobstore.JobStoreSupport#retrieveTrigger) it will fail with the exception below. Because of this exception, the whole application will fail to start (not just the scheduler).
Caused by: org.quartz.JobPersistenceException: Couldn't retrieve trigger: No record found for selection of Trigger with key:
The logs can be found at https://github.com/apixandru/case-study/tree/master/spring-boot-quartz/logs (The exception can be found on the Server-1 node after the 4th restart)
For the whole project that demonstrates this issue go to https://github.com/apixandru/case-study/tree/master/spring-boot-quartz
The way that we configure the scheduler is here
@Bean
JobDetailFactoryBean jobFactoryBean() {
    JobDetailFactoryBean bean = new JobDetailFactoryBean();
    bean.setDurability(true);
    bean.setName("Sampler");
    bean.setJobClass(SampleJob.class);
    return bean;
}
@Bean
SimpleTriggerFactoryBean triggerFactoryBean(JobDetailFactoryBean jobFactoryBean) {
    SimpleTriggerFactoryBean bean = new SimpleTriggerFactoryBean();
    bean.setName("Sampler Trigger");
    bean.setRepeatInterval(20_000);
    bean.setJobDetail(jobFactoryBean.getObject());
    return bean;
}
@Bean
SchedulerFactoryBean schedulerFactoryBean(SimpleTriggerFactoryBean triggerFactoryBean, DataSource dataSource, Dependency dependency) {
    Properties props = new Properties();
    props.put("org.quartz.scheduler.instanceId", "AUTO");
    props.put("org.quartz.jobStore.isClustered", "true");
    SchedulerFactoryBean bean = new SchedulerFactoryBean();
    bean.setTriggers(triggerFactoryBean.getObject());
    bean.setSchedulerName("Demo Scheduler");
    bean.setSchedulerContextAsMap(Collections.singletonMap("dependency", dependency));
    bean.setOverwriteExistingJobs(true);
    bean.setDataSource(dataSource);
    bean.setQuartzProperties(props);
    return bean;
}
This happens a lot on our work servers but it's a lot harder to get locally (possibly due to the fact that the actual servers are dedicated and have a lot more power than my local machine?)
To get the bug on any machine, start one server in debug mode and put a breakpoint on SimpleTriggerPersistenceDelegate.deleteExtendedTriggerProperties and just after it executes, start the second server and you will get this exception
Anyway, I managed to get this error locally as well after about 40 redeploys to my local clustered weblogic server.
The problem is the fact that by default no transaction manager is used, so no locking is used.
To solve the issue, it is required to call the schedulerFactoryBean's setTransactionManager method.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With