23. Comparing databases

23.1. Compare two database schemas - WbSchemaDiff
23.2. Compare data across databases - WbDataDiff

There are two SQL Workbench/J specific commands that can compare either the structure of two databases or the data contained in them.

These commands (WbSchemaDiff and WbDataDiff) can be used like any other SQL command as long as they are run using SQL Workbench/J This includes the usage in scripts that are run in batch mode.

23.1. Compare two database schemas - WbSchemaDiff

WbSchemaDiff analyzes two schemas (or a list of tables) and outputs the differences between those schemas as an XML file. The XML file describes the changes that need to be applied to the target schema to have the same structure as the reference schema, e.g. modify column definitions, remove or add tables, remove or add indexes.

The output is intended to be transformed using XSLT (e.g. with the WbXSLT Command). Sample XSLT transformations are stored in the xslt subdirectory of the SQL Workbench/J installation directory. All scripts that are part of the download can also be found on the SQL Workbench/J homepage

[Note]

This feature should only be considered as a one-off solution to quickly compare to database schemas. Is not intended to replace a proper schema (script) management. You should consider tools like Liquibase or Flyway to manage a database schema. Those scripts should also be stored in a version control system (Subversion, GIT, ...)

The command supports the following parameters:

ParameterDescription
-referenceProfileThe name of the connection profile for the reference connection. If this is not specified, then the current connection is used.
-referenceGroupIf the name of your reference profile is not unique across all profiles, you will need to specify the group in which the profile is located with this parameter.
-referenceConnection

Allows to specify a full connection definition as a single parameter (and thus does not require a pre-defined connection profile).

The connection is specified with a comma separated list of key value pairs:

  • username - the username for the connection
  • password - the password for the connection
  • url - the JDBC URL
  • driver - the class name for the JDBC driver. If this is not specified, SQL Workbench/J will try to determine the driver from the JDBC URL
  • driverJar - the full path to the JDBC driver. This not required if a driver for the specified class is already configured

e.g.: "username=foo,password=bar,url=jdbc:postgresql://localhost/mydb"

For a sample connection string please see the documentation for WbCopy.

If this parameter is specified -referenceProfile will be ignored.

-targetProfile

The name of the connection profile for the target connection (the one that needs to be migrated). If this is not specified, then the current connection is used.

If you use the current connection for reference and target, then you should prefix the table names with schema/user or use the -referenceschema and -targetschema parameters.

-targetGroup If the name of your target profile is not unique across all profiles, you will need to specify the group in which the profile is located with this parameter.
-targetConnection

Allows to specify a full connection definition as a single parameter (and thus does not require a pre-defined connection profile).

The connection is specified with a comma separated list of key value pairs:

  • username - the username for the connection
  • password - the password for the connection
  • url - the JDBC URL
  • driver - the class name for the JDBC driver. If this is not specified, SQL Workbench/J will try to determine the driver from the JDBC URL
  • driverJar - the full path to the JDBC driver. This not required if a driver for the specified class is already configured

e.g.: "username=foo,password=bar,url=jdbc:postgresql://localhost/mydb"

For a sample connection string please see the documentation for WbConnect.

If this parameter is specified -targetProfile will be ignored.

-fileThe filename of the output file. If this is not supplied the output will be written to the message area
-referenceTablesA (comma separated) list of tables that are the reference tables, to be checked.
-targetTables

A (comma separated) list of tables in the target connection to be compared to the source tables. The tables are "matched" by their position in the list. The first table in the -referenceTables parameter is compared to the first table in the -targetTables parameter, and so on. Using this parameter you can compare tables that do not have the same name.

If you omit this parameter, then all tables from the target connection with the same names as those listed in -referenceTables are compared.

If you omit both parameters, then all tables that the user can access are retrieved from the source connection and compared to the tables with the same name in the target connection.

-referenceSchema Compare all tables from the specified schema (user)
-targetSchema A schema in the target connection to be compared to the tables from the reference schema.
-excludeTables A comma separated list of tables that should not be compared. If tables from several schemas are compared (using -referenceTables=schema_one.*, schema_two.*) then the listed tables must be qualified with a schema, e.g. -excludeTables=schema_one.foobar, schema_two.fubar
-encodingThe encoding to be used for the XML file. The default is UTF-8
-includePrimaryKeysSelect whether primary key constraint definitions should be compared as well. The default is true. Valid values are true or false.
-includeForeignKeysSelect whether foreign key constraint definitions should be compared as well. The default is true. Valid values are true or false.
-includeTableGrants Select whether table grants should be compared as well. The default is false.
-includeTriggers Select whether table triggers are compared as well. The default value is true.
-includeConstraints

Select whether table and column (check) constraints should be compared as well. SQL Workbench/J compares the constraint definition (SQL) as stored in the database.

The default is to compare table constraints (true) Valid values are true or false.

-useConstraintNames

When including check constraints this parameter controls whether constraints should be matched by name, or only by their expression. If comparing by names the diff output will contain elements for constraint modification otherwise only drop and add entries will be available.

The default is to compare by names(true) Valid values are true or false.

-includeViews

Select whether views should also be compared.

Note that this comparison is very unreliable, because this compares the source code, not the logical representation of the view definition.

The source code is compared the way it is returned by the DBMS is compared. This comparison is case-sensitiv, which means SELECT * FROM foo; will be reported as a difference to select * from foo; even if they are logically the same. Aslo different indention or empty lines will result in the views being reported as "different".

A comparison across different DBMS will not work.

The default is true Valid values are true or false.

-includeProcedures

Select whether stored procedures should also be compared. When comparing procedures the source as it is stored in the DBMS is compared. This comparison is case-sensitive. A comparison across different DBMS will also not work!

The default is false Valid values are true or false.

-includeIndex Select whether indexes should be compared as well. The default is to not compare index definitions. Valid values are true or false.
-includeSequences Select whether sequences should be compared as well. The default is to not compare sequences. Valid values are true, false.
-useJdbcTypes

Define whether to compare the DBMS specific data types, or the JDBC data type returned by the driver. When comparing tables from two different DBMS it is recommended to use -useJdbcType=true as this will make the comparison a bit more DBMS-independent. When comparing e.g. Oracle vs. PostgreSQL a column defined as VARCHAR2(100) in Oracle would be reported as being different to a VARCHAR(100) column in PostgreSQL which is not really true As both drivers report the column as java.sql.Types.VARCHAR, they would be considered as identical when using -useJdbcType=true.

Valid values are true or false.

-additionalTypes

Select additional object types that are not compared by default (using the -includeXXX parameters) such as Oracle TYPE definitions. Those objects are compared on source code level (like procedures) rather than on attribute level.

Valid values are object type names as shown in the "Type" drop down in the DbExplorer.

-xsltParameter A list of parameters (key/value pairs) that should be passed to the XSLT processor. When using e.g. the wbreport2liquibase.xslt stylesheet, the value of the author attribute can be set using -xsltParameter="authorName=42". This parameter can be provided multiple times for multiple parameters, e.g. when using wbreport2pg.xslt: -xsltParameter="makeLowerCase=42" -xsltParameter="useJdbcTypes=true"

WbSchemaDiff Examples

Compare all tables between two connections, and write the output to the file migrate_prod.xml and convert the XML to a series of SQL statements for PostgreSQL

WbSchemaDiff -referenceProfile="Staging"
             -targetProfile="Production"
             -file=migrate_prod.xml
             -styleSheet=wbdiff2pg.xslt
             -xsltOutput=migrate_prod.sql

Compare a list of matching tables between two databases and write the output to the file migrate_staging.xml ignoring all tables that start with TMP_ and exclude any index definition from the comparison. Convert the output to a SQL script for Oracle

WbSchemaDiff -referenceProfile="Development"
             -targetProfile="Staging"
             -file=migrate_stage.xml
             -excludeTables=TMP_*
             -includeIndex=false
             -styleSheet=wbdiff2oracle.xslt
             -xsltOutput=migrate_stage.sql

23.2. Compare data across databases - WbDataDiff

The WbDataDiff command can be used to generate SQL scripts that update a target database such that the data is identical to a reference database. This is similar to the WbSchemaDiff but compares the actual data in the tables rather than the table structure.

For each table the command will create up to three script files, depending on the needed statements to migrate the data. One file for UPDATE statements, one file for INSERT statements and one file for DELETE statements (if -includeDelete=true is specified)

[Note]

As this command needs to read every row from the reference and the target table, processing large tables can take quite some time, especially if DELETE statements should also be generated.

WbDataDiff requires that all involved tables have a primary key defined. If a table does not have a primary key, WbDataDiff will stop the processing.

To improve performance (a bit), the rows are retrieved in chunks from the target table by dynamically constructing a WHERE clause for the rows that were retrieved from the reference table. The chunk size can be controlled using the property workbench.sql.sync.chunksize The chunk size defaults to 25. This is a conservative setting to avoid problems with long SQL statements when processing tables that have a PK with multiple columns. If you know that your primary keys consist only of a single column and the values won't be too long, you can increase the chunk size, possibly increasing the performance when generating the SQL statements. As most DBMS have a limit on the length of a single SQL statement, be careful when setting the chunksize too high. The same chunk size is applied when generating DELETE statements by the WbCopy command, when syncDelete mode is enabled.

The command supports the following parameters:

ParameterDescription
-referenceProfile The name of the connection profile for the reference connection. If this is not specified, then the current connection is used.
-referenceGroup If the name of your reference profile is not unique across all profiles, you will need to specify the group in which the profile is located with this parameter. If the profile's name is unique you can omit this parameter
-referenceConnection

Allows to specify a full connection definition as a single parameter (and thus does not require a pre-defined connection profile).

The connection is specified with a comma separated list of key value pairs:

  • username - the username for the connection
  • password - the password for the connection
  • url - the JDBC URL
  • driver - the class name for the JDBC driver. If this is not specified, SQL Workbench/J will try to determine the driver from the JDBC URL
  • driverJar - the full path to the JDBC driver. This not required if a driver for the specified class is already configured

e.g.: "username=foo,password=bar,url=jdbc:postgresql://localhost/mydb"

For a sample connection string please see the documentation for WbCopy.

If this parameter is specified -referenceProfile will be ignored.

-targetProfile

The name of the connection profile for the target connection (the one that needs to be migrated). If this is not specified, then the current connection is used.

If you use the current connection for reference and target, then you should prefix the table names with schema/user or use the -referenceschema and -targetschema parameters.

-targetGroup If the name of your target profile is not unique across all profiles, you will need to specify the group in which the profile is located with this parameter.
-targetConnection

Allows to specify a full connection definition as a single parameter (and thus does not require a pre-defined connection profile).

The connection is specified with a comma separated list of key value pairs:

  • username - the username for the connection
  • password - the password for the connection
  • url - the JDBC URL
  • driver - the class name for the JDBC driver. If this is not specified, SQL Workbench/J will try to determine the driver from the JDBC URL
  • driverJar - the full path to the JDBC driver. This not required if a driver for the specified class is already configured

e.g.: "username=foo,password=bar,url=jdbc:postgresql://localhost/mydb"

For a sample connection string please see the documentation for WbConnect.

If this parameter is specified -targetProfile will be ignored.

-file The filename of the main script file. The command creates two scripts per table. One script named update_<tablename>.sql that contains all needed UPDATE or INSERT statements. The second script is named delete_<tablename>.sql and will contain all DELETE statements for the target table. The main script merely calls (using WbInclude) the generated scripts for each table. You can enable writing a single file that includes all statements for all tables by using the parameter -singleFile=true
-singleFile If this parameter's value is true, then only one single file containing all statements will be written.
-referenceTables A (comma separated) list of tables that are the reference tables, to be checked. You can specify the table with wildcards, e.g. -referenceTables=P% to compare all tables that start with the letter P.
-targetTables

A (comma separated) list of tables in the target connection to be compared to the source tables. The tables are "matched" by their position in the list. The first table in the -referenceTables parameter is compared to the first table in the -targetTables parameter, and so on. Using this parameter you can compare tables that do not have the same name.

If you omit this parameter, then all tables from the target connection with the same names as those listed in -referenceTables are compared.

If you omit both parameters, then all tables that the user can access are retrieved from the source connection and compared to the tables with the same name in the target connection.

-referenceSchema Compare all tables from the specified schema (user)
-targetSchema A schema in the target connection to be compared to the tables from the reference schema.
-excludeTables A comma separated list of tables that should not be compared. If tables from several schemas are compared (using -referenceTables=schema_one.*, schema_two.*) then the listed tables must be qualified with a schema, e.g. -excludeTables=schema_one.foobar, schema_two.fubar
-checkDependencies

Valid values are true, false.

Sorts the generated scripts in order to respect foreign key dependencies for deleting and inserting rows.

The default is true.

-includeDelete

Valid values are true, false.

Generates DELETE statements for rows that are present in the target table, but not in the reference table. Note that enabling this option will result in additional overhead reading the target table's data and will slow down the overal comparison speed.

The default is false.

-type

Valid values are sql, xml

Defines the type of the generated files.

-encoding

The encoding to be used for the SQL scripts. The default depends on your operating system. It will be displayed when you run WbDataDiff without any parameters. You can overwrite the platform default with the property workbench.encoding in the file workbench.settings

XML files are always stored in UTF-8

-sqlDateLiterals

Valid values: jdbc, ansi, dbms, default

Controls the format in which the values of DATE, TIME and TIMESTAMP columns are written into the generated SQL statements. For a detailed description of the possible values, please refer to the WbExport command.

-ignoreColumns

With this parameter you can define a list of column names that should not be considered when comparing data. You can e.g. exclude columns that store the last access time of a row, or the last update time if that should not be taken into account when checking for changes.

They will however be part of generated INSERT or UPDATE statements unless -exclueIgnored=true is also specified.

-excludeIgnored

Valid values: true, false

If this is set to true, the ignored columns will also be removed from any INSERT or UPDATE statement

The default is false.

-alternateKey

With this parameter alternate keys can be defined for the tables that are compared. The parameter can be repeated multiple times to set the keys for multiple tables in the following format: -alternateKey='table_1=column_1,column_2'

Note that each value has to be enclosed in either single or double quotes to mask the equals sign embedded in the parameter value.

Once an alternate (primary) key has been defined, the primary key columns defined on the tables are ignored. By default the real PK columns will however be included in INSERT statement that are generated. To avoid this, set the parameter -excludeRealPK to true.

-excludeRealPK

Valid values are true, false.

This parameter controls the usage of the real PK columns in case alternate PK columns are defined. If set to true the real PK columns are excluded from generated INSERT statements (assuming that a new PK value will be generated during inserting the rows).

Note that this parameter will enable/disable the use of the real PK columns for all tables for which alternate key columns were defined.

This parameter has no effect if no alternate keys were specified using the -alternateKey option.

-showProgress

Valid values: true, false, <numeric value>

Control the update frequence in the status bar (when running in GUI mode). The default is every 10th row is reported. To disable the display of the progress specify a value of 0 (zero) or the value false. true will set the progress interval to 1 (one).

WbDataDiff Examples

Compare all tables between two connections, and write the output to the file migrate_staging.sql, but do not generate DELETE statements.

WbDataDiff -referenceProfile="Production"
           -targetProfile="Staging"
           -file=migrate_staging.sql
           -includeDelete=false

Compare a list of matching tables between two databases and write the output to the file migrate_staging.sql including DELETE statements.

WbDataDiff -referenceProfile="Production"
           -targetProfile="Staging"
           -referenceTables=person,address,person_address
           -file=migrate_staging.sql
           -includeDelete=true

Compare three tables that are differently named in the target database and ignore all columns (regardless in which table they appear) that are named LAST_ACCESS or LAST_UPDATE

WbDataDiff -referenceProfile="Production"
           -targetProfile="Staging"
           -referenceTables=person,address,person_address
           -targetTables=t_person,t_address,t_person_address
           -ignoreColumns=last_access,last_update
           -file=migrate_staging.sql
           -includeDelete=true