Idempotent Database Journaling

Just something that I wrote for the team the other day…

An idempotent operation is an operation that can be repeated multiple times obtaining always the same result. The idea of idempotent journaling is that a sequence of scripts could be repeated multiple times always obtaining the same end state, no matter how many times the sequence of scripts is run (as long as the order is preserved).

This concept seems complex but in practice, the implementation is rather simple. All we need to do is to check if the object that we are trying to modify exists or not.

Native support

Databases like MySQL and PostgreSQL support this idea natively through creation statements that look like this:


This way, the table is only created once no matter how many times the script is run.

SQL Server and Oracle support CREATE OR ALTER syntax for database objects that are not tables 🤦🏽‍♂️.

So you would find journaled scripts following a pattern like this:

DROP TABLE table_name;
CREATE TABLE table_name (...);

This works well and it is very simple as it is supported for most DBMS.

Using the database dictionary

Also, I have seen people using both Oracle and SQL Server using the database metadata to determine if a table exists and then decide whether they can run an update script on it. In Oracle, this information is available in the USER_TABLES table that is part of the database dictionary. In SQL Server it is similar to querying sys.objects. However, SQL server provides a object_id function which is very nice:

IF (object_id('[schema].[object]',object_type) is not null)

with object types:

AF = Aggregate function (CLR)
C = CHECK constraint
D = DEFAULT (constraint or stand-alone)
F = FOREIGN KEY constraint
FN = SQL scalar function
FS = Assembly (CLR) scalar-function
FT = Assembly (CLR) table-valued function
IF = SQL inline table-valued function
IT = Internal table
P = SQL Stored Procedure
PC = Assembly (CLR) stored-procedure
PG = Plan guide
PK = PRIMARY KEY constraint
R = Rule (old-style, stand-alone)
RF = Replication-filter-procedure
S = System base table
SN = Synonym
SO = Sequence object
SQ = Service queue
TA = Assembly (CLR) DML trigger
TF = SQL table-valued-function
TR = SQL DML trigger
TT = Table type
U = Table (user-defined)
UQ = UNIQUE constraint
V = View
X = Extended stored procedure

SQL Server also provides col_length which can be used to verify if a column exists:

IF (col_length('[schema.table]','[column]') IS NOT NULL

So these two functions are very useful to test if a table or a column exists. These checkpoints enable us to run a script multiple times without collisions or side effects.

Dealing with constraints

However altering a table is not always easy, especially when there are constraints such as default values, foreign keys, or indices. In that case, we would need to know the name of the constraint that we want to modify and drop it before deleting the column. In SQL Server that looks like:

ALTER TABLE table_name
DROP CONSTRAINT constraint_name

Doing this as part of your script is straightforward since the constraint information is easily accessible in the database. For example:

However, sometimes we just want to eliminate of all the constraints when eliminating a column. In the following section, I am presenting an example of how to do that.

Example of idempotent journaling in SQL Server

In this toy example, I am representing the evolution of a database through 5 different scripts, possibly written by different developers. The final state is a schema with two tables: Parent and Child, that have a one-to-many relationship.

A big difference with respect to solutions such as Mayflower is that this set of scripts is idempotent and therefore the journal can be run multiple times.

-- script 1.sql: creates parent
IF object_id('tests.Parent','U') IS NULL
	CREATE TABLE tests.Parent (
		Description varchar(150) NOT NULL DEFAULT 'default description text',
		Field1 int NULL,
		Field2 int NULL,
		Field3 int NULL
-- script2.sql: adds removes description and adds summary
IF col_length('tests.Parent','Description') IS NOT NULL
	EXEC tests.drop_constraints @p_column = 'tests.Parent.Description'
	ALTER TABLE tests.Parent 
	DROP COLUMN Description

IF col_length('tests.Parent', 'Summary') IS NULL
	ALTER TABLE tests.Parent
	ADD Summary varchar(3000) NOT NULL DEFAULT 'default summary text'
-- script3.sql: creates child table with referential integrity
IF object_id('tests.Child','U') IS NULL
	CREATE TABLE tests.Child (
		Description varchar(300),
		ParentId int FOREIGN KEY REFERENCES tests.Parent(Id)

At this point, the result is:

then removing the foreign key like this:

--script4.sql: removes foreign key
IF col_length('tests.Child','ParentId') IS NOT NULL
	EXEC tests.drop_fk @p_column = 'tests.Child.ParentId'

Finally, adding the foreign key constraint again:

--script5.sql: adding the foreign key again
IF col_length('tests.Child','ParentId') IS NOT NULL 
	ALTER TABLE tests.Child
	ADD FOREIGN KEY (ParentId) REFERENCES tests.Parent(Id)

The result is then:

Other benefits of idempotency on database migrations

Accidentally deleted objects are fully recoverable in a consistent state given that the migration can be run multiple times.

Migrations do not need to be transactional. This means that if the migration fails halfway there is no need to undo it and start over. The migration can run again and every script will contribute to the achieve the final desired state.

Reading Resources

One does not simply update a database–migration-based-database-development-527d

Idempotent DDL scripts that always achieve the same result

Trouble-free database migration