Spring Batch is, as the name suggests, a batch processing framework built on the Spring Framework. I usually use it to develop simple ETL (Extraction, Transformation, and Loading) programs.
In this post, I’ll show you how to write a simple ETL program. (This sample was tested on Spring Batch 3.0.10.)
Prerequisites
- Database (MySQL or Oracle)
- Spring Batch context database
- Spring Batch libraries (spring-batch-core, Spring Framework, spring-jdbc)
- Database JDBC driver
- DBCP (Optional)
Spring Batch context database
The Spring Batch context database must be created before a Spring Batch job can run. The table creation DDL is bundled in spring-batch-core-version.jar (the org.springframework.batch.core package contains DDL scripts for several databases, for example schema-mysql.sql).
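It contains the following tables.
- BATCH_JOB_INSTANCE
- BATCH_JOB_EXECUTION
- BATCH_JOB_EXECUTION_PARAMS
- BATCH_JOB_EXECUTION_CONTEXT
- BATCH_STEP_EXECUTION
- BATCH_STEP_EXECUTION_CONTEXT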
In this post, I use two tables: TB_SOURCE and TB_TARGET. The sample program reads from TB_SOURCE and writes all the data into TB_TARGET.
Writing the base Spring context
To run a Spring Batch program, the following beans need to be declared.
- Spring Batch Context DataSource
- DataSource for source and target
- TransactionManager for Spring Batch Context
- TransactionManager for source and target
- Job Repository bean
- Job Launcher (which is the starting point of a job)
The following is a snippet of context.xml.
Writing the Job flow
Now we are ready to write an ETL program. A job is also declared in the Spring context.xml. The following is the basic structure of a Job.
- A job can have several Steps (for simplicity, I use one Step for this sample)
- Each Step has a Reader, a Processor and a Writer
- The Reader reads data from a database, a file, or some other data store
- The Reader invokes a RowMapper to convert raw data into a Source VO
- The Processor reads a Source VO and converts it into a Target VO
- The Writer writes the Target VO into a database, a file, or some other data store
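The structure above can be illustrated with a small plain-Java sketch. It does not use the Spring Batch API: `SketchReader`, `SketchProcessor`, `SketchWriter`, and `StepSketch` are hypothetical names made up for this illustration, and the flow is simplified. The `chunkSize` argument loosely plays the role of the commit-interval of a chunk-oriented step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for the three roles of a Step (not the Spring Batch types).
interface SketchReader<T> { T read(); }                    // returns null when the input is exhausted
interface SketchProcessor<I, O> { O process(I item); }     // converts a source item into a target item
interface SketchWriter<T> { void write(List<T> items); }   // receives one chunk at a time

class StepSketch {
    // Reads and processes items one by one, then hands them to the writer in chunks.
    static <I, O> int run(SketchReader<I> reader, SketchProcessor<I, O> processor,
                          SketchWriter<O> writer, int chunkSize) {
        int written = 0;
        List<O> chunk = new ArrayList<>();
        for (I item = reader.read(); item != null; item = reader.read()) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) {
                writer.write(new ArrayList<>(chunk));
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) { // flush the last, partially filled chunk
            writer.write(new ArrayList<>(chunk));
            written += chunk.size();
        }
        return written;
    }

    public static void main(String[] args) {
        Iterator<String> source = Arrays.asList("a", "b", "c").iterator();
        List<List<String>> chunks = new ArrayList<>();

        SketchReader<String> reader = () -> source.hasNext() ? source.next() : null;
        SketchProcessor<String, String> processor = s -> s.toUpperCase();
        SketchWriter<String> writer = items -> chunks.add(items);

        int count = run(reader, processor, writer, 2);
        // prints: 3 items written in 2 chunks: [[A, B], [C]]
        System.out.println(count + " items written in " + chunks.size() + " chunks: " + chunks);
    }
}
```

In a real job, Spring Batch drives this loop itself, so you only supply the Reader, Processor, and Writer.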
The following is a sample ETL job definition.
As the sample above shows, if the Reader’s source or the Writer’s target is a database, SQL can be used directly. (This is one of the strengths of Spring Batch.)
A RowMapper is a component that converts raw source data into a Source VO. It must implement org.springframework.jdbc.core.RowMapper. The sample RowMapper is as follows.
The Processor is the core of a Spring Batch job. It can transform source data, validate it, or execute any additional logic. In this sample, it simply maps each source field to a differently named target field. The sample Processor is as follows.
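As a rough illustration of these two components (not the code from the original sample), here is a minimal sketch. The VO fields and the column names `ID` and `NAME` are assumptions. To keep the sketch compilable with the JDK alone, the classes do not declare the Spring interfaces. They only provide methods matching `RowMapper#mapRow` and `ItemProcessor#process`, so in a real project you would add the `implements` clauses noted in the comments.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical Source VO: one field per TB_SOURCE column (column names are assumed).
class SourceVO {
    private final long id;
    private final String name;

    public SourceVO(long id, String name) { this.id = id; this.name = name; }
    public long getId() { return id; }
    public String getName() { return name; }
}

// Hypothetical Target VO: same data under different property names.
class TargetVO {
    private final long targetId;
    private final String targetName;

    public TargetVO(long targetId, String targetName) {
        this.targetId = targetId;
        this.targetName = targetName;
    }
    public long getTargetId() { return targetId; }
    public String getTargetName() { return targetName; }
}

// In a real project: "implements org.springframework.jdbc.core.RowMapper<SourceVO>".
class SourceRowMapper {
    // Called once per row; converts the current row of the result set into a Source VO.
    public SourceVO mapRow(ResultSet rs, int rowNum) throws SQLException {
        return new SourceVO(rs.getLong("ID"), rs.getString("NAME"));
    }
}

// In a real project: "implements org.springframework.batch.item.ItemProcessor<SourceVO, TargetVO>".
class SourceToTargetProcessor {
    // Maps each source field to the differently named target field.
    public TargetVO process(SourceVO item) {
        return new TargetVO(item.getId(), item.getName());
    }
}
```

Keeping the mapping in the Processor, rather than in the reader SQL, leaves room to add validation or other logic later without touching the query.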
Writing Source VO
Writing Target VO
In this sample, the Target VO must have getter methods.
There are no more components to write except a program that invokes this sample Job.
You can download the full source from GitHub.
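One likely reason for the getter requirement is that the writer binds named SQL parameters by reading bean properties, as Spring's BeanPropertySqlParameterSource does. This is an assumption, since the writer configuration is not reproduced here. The sketch below imitates that lookup with plain reflection. `TargetVO`, its property names, `GetterLookup`, and the SQL in the comment are all hypothetical.

```java
import java.lang.reflect.Method;

// Hypothetical Target VO; the property names are assumptions for illustration.
class TargetVO {
    private final long targetId;
    private final String targetName;

    public TargetVO(long targetId, String targetName) {
        this.targetId = targetId;
        this.targetName = targetName;
    }
    public long getTargetId() { return targetId; }
    public String getTargetName() { return targetName; }
}

class GetterLookup {
    // Resolves a parameter name such as "targetName" to getTargetName() and calls it.
    static Object valueOf(Object bean, String property) throws ReflectiveOperationException {
        String getter = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        Method method = bean.getClass().getMethod(getter);
        return method.invoke(bean);
    }

    public static void main(String[] args) throws Exception {
        TargetVO vo = new TargetVO(1L, "alice");
        // Assumed writer SQL, for illustration only:
        // INSERT INTO TB_TARGET (TARGET_ID, TARGET_NAME) VALUES (:targetId, :targetName)
        System.out.println(valueOf(vo, "targetId") + ", " + valueOf(vo, "targetName"));
    }
}
```

Without a matching getter, the lookup in this sketch fails with NoSuchMethodException, which mirrors why a property-based writer cannot bind a parameter that the VO does not expose.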