I'm trying to understand how Spring Batch does transaction management. This is not a technical question but more of a conceptual one: what approach does Spring Batch use, and what are the consequences of that approach?
Let me try to clarify this question a bit. For instance, looking at the TaskletStep, I see that generally a step execution looks something like this:
This seems to make sense. But what about a failure between steps 2 and 3? That would mean the business transaction was committed but Spring Batch was unable to record that fact in its internal metadata. So a restart would reprocess the same items even though they have already been committed. Right?
I'm looking for an explanation of these details and the consequences of the design decisions made in Spring Batch. Is this documented somewhere? The Spring Batch reference guide has very few details on this. It simply explains things from the application developer's point of view.
Make sure that your Spring configuration class is annotated with @EnableTransactionManagement (Spring Boot does this for you automatically). Also make sure that the configuration defines a transaction manager, which you need to do anyway.
Step 1: Define a transaction manager in your Spring application context XML file. Step 2: Turn on support for transaction annotations by adding the entry below to your Spring application context XML file.
There are two fundamental types of steps in Spring Batch: a tasklet step and a chunk-based step. Each has its own transaction details. Let's look at each:
Tasklet Based Step
When a developer implements their own tasklet, the transactionality is pretty straightforward. Each call to the Tasklet#execute method is executed within a transaction. You are correct that there are updates before and after a step's logic is executed. They are not technically wrapped in a transaction, since rollback isn't something we'd want to support for the job repository updates.
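To make the "one transaction per execute call" idea concrete, here is a toy model in plain Java. It is not Spring Batch code: MiniTasklet, Status, deleteInBatches, and runStep are hypothetical names invented for this sketch, and the in-memory snapshot only stands in for a real transaction manager.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy model of a tasklet step; plain Java, not Spring Batch code. */
final class TaskletLoopSketch {

    /** Hypothetical stand-in for the "call me again" / "done" signal. */
    enum Status { CONTINUABLE, FINISHED }

    /** Hypothetical stand-in for a tasklet: one call does one unit of work. */
    interface MiniTasklet {
        Status execute(Deque<Integer> table);
    }

    /** Example tasklet: deletes up to batchSize rows per call. */
    static MiniTasklet deleteInBatches(int batchSize) {
        return table -> {
            for (int i = 0; i < batchSize && !table.isEmpty(); i++) {
                table.poll();
            }
            return table.isEmpty() ? Status.FINISHED : Status.CONTINUABLE;
        };
    }

    /** Calls the tasklet until it reports FINISHED; returns the number of committed calls. */
    static int runStep(MiniTasklet tasklet, Deque<Integer> table) {
        int commits = 0;
        Status status = Status.CONTINUABLE;
        while (status == Status.CONTINUABLE) {
            Deque<Integer> snapshot = new ArrayDeque<>(table); // "begin transaction"
            try {
                status = tasklet.execute(table);
                commits++;                                      // "commit"
            } catch (RuntimeException e) {
                table.clear();                                  // "rollback": undo this call only
                table.addAll(snapshot);
                break;                                          // the step fails here
            }
        }
        return commits;
    }

    public static void main(String[] args) {
        Deque<Integer> table = new ArrayDeque<>(List.of(1, 2, 3, 4, 5));
        int commits = runStep(deleteInBatches(2), table);
        System.out.println(commits + " commits, rows left: " + table.size());
    }
}
```

In this model, a failure inside one call undoes only that call's work; whatever earlier calls committed stays committed.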
Chunk Based Step
When a developer uses a chunk-based step, there is a bit more complexity involved due to the added abilities for skip/retry. However, at a high level, each chunk is processed in a transaction. You still have the same updates before and after a chunk-based step, and they are non-transactional for the same reasons previously mentioned.
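As a rough illustration of "each chunk is processed in a transaction", the following plain-Java sketch stages a chunk's output and makes it visible only if the whole chunk succeeds. It deliberately ignores skip and retry, and it is a toy model rather than Spring Batch code; the run method and the poisonItem parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of chunk-oriented processing; plain Java, not Spring Batch code. */
final class ChunkTransactionSketch {

    /**
     * Processes items in chunks of chunkSize and returns the committed output.
     * Processing poisonItem (may be null for "never fail") throws, which rolls
     * back the current chunk and stops the step.
     */
    static List<String> run(List<String> items, int chunkSize, String poisonItem) {
        List<String> committed = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            List<String> staged = new ArrayList<>(); // uncommitted writes of this chunk
            try {
                for (String item : items.subList(start, end)) {
                    if (item.equals(poisonItem)) {
                        throw new IllegalStateException("cannot process " + item);
                    }
                    staged.add(item.toUpperCase(Locale.ROOT));
                }
                committed.addAll(staged); // "commit": the chunk becomes visible as a unit
            } catch (IllegalStateException e) {
                break; // "rollback": staged output is discarded, earlier chunks stay committed
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        // Chunk [a, b] commits; chunk [c, d] fails on "d" and is rolled back as a whole.
        System.out.println(run(List.of("a", "b", "c", "d", "e"), 2, "d"));
    }
}
```

Note that "c" is lost along with "d" in this model: the unit of commit and rollback is the chunk, not the individual item.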
The "What if" scenario
In your question, you ask what would happen if the business logic completed but the updates to the job repository failed for some reason. Would the previously updated items be reprocessed on a restart? As with most things, that depends. If you are using stateful readers/writers like the FlatFileItemReader, then with each commit of the business transaction, the job repository is updated with the current state of what has been processed (within the same transaction). So in that case, a restart of the job would pick up where it left off (in this case at the end) and process no additional records.
If you are not using stateful readers/writers, or you have save state turned off, then it is a case of buyer beware and you may end up with the situation you describe. The default behavior in the framework is to save state so that restartability is preserved.
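The difference between the two cases can be sketched with a small plain-Java model. It is an illustration under simplifying assumptions, not Spring Batch code: Store stands in for the database holding both the business output and the saved reader position, and the execute method with its crashAfterChunks parameter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of restart behavior with and without saved reader state; not Spring Batch code. */
final class RestartSketch {

    /** Stand-in for the database: business output plus the saved reader position. */
    static final class Store {
        final List<String> written = new ArrayList<>();
        int savedPosition = 0;
    }

    /**
     * Runs one job execution over items. Each chunk's output and, if saveState is on,
     * the new reader position are committed together. A non-negative crashAfterChunks
     * makes the run stop after that many committed chunks, simulating a failure.
     */
    static void execute(Store store, List<String> items, int chunkSize,
                        boolean saveState, int crashAfterChunks) {
        int position = saveState ? store.savedPosition : 0; // where this execution starts
        int chunks = 0;
        while (position < items.size()) {
            if (chunks == crashAfterChunks) {
                return; // simulated crash between chunks
            }
            int end = Math.min(position + chunkSize, items.size());
            // One "commit": the output and the position become durable together.
            store.written.addAll(items.subList(position, end));
            if (saveState) {
                store.savedPosition = end;
            }
            position = end;
            chunks++;
        }
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e", "f");

        Store withState = new Store();
        execute(withState, items, 2, true, 2);   // fails after two chunks
        execute(withState, items, 2, true, -1);  // restart resumes at the saved position
        System.out.println("saveState on:  " + withState.written);

        Store withoutState = new Store();
        execute(withoutState, items, 2, false, 2);
        execute(withoutState, items, 2, false, -1); // restart begins again at item 0
        System.out.println("saveState off: " + withoutState.written);
    }
}
```

With state saved, the restart writes only the remaining items; without it, the first four items are written a second time. The scenario from the question, where every chunk commits and only the final status update fails, corresponds to a restart whose saved position already equals the number of items, so the loop does nothing.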