Just something that I wrote for the team the other day…
An idempotent operation is an operation that can be repeated multiple times obtaining always the same result. The idea of idempotent journaling is that a sequence of scripts could be repeated multiple times always obtaining the same end state, no matter how many times the sequence of scripts is run (as long as the order is preserved).
This concept seems complex but in practice, the implementation is rather simple. All we need to do is to check if the object that we are trying to modify exists or not.
Native support
Databases like MySQL and PostgreSQL support this idea natively through creation statements that look like this:
CREATE TABLE IF NOT EXISTS table_name
This way, the table is only created once no matter how many times the script is run.
SQL Server and Oracle support CREATE OR ALTER
syntax for database objects that are not tables 🤦🏽♂️.
So you would find journaled scripts following a pattern like this:
DROP TABLE table_name;
CREATE TABLE table_name (...);
This works well and it is very simple as it is supported for most DBMS.
Using the database dictionary
Also, I have seen people using both Oracle and SQL Server using the database metadata to determine if a table exists and then decide whether they can run an update script on it. In Oracle, this information is available in the USER_TABLES
table that is part of the database dictionary. In SQL Server it is similar to querying sys.objects
. However, SQL server provides a object_id
function which is very nice:
IF (object_id('[schema].[object]',object_type) is not null)
BEGIN
...
END
with object types:
AF = Aggregate function (CLR)
C = CHECK constraint
D = DEFAULT (constraint or stand-alone)
F = FOREIGN KEY constraint
FN = SQL scalar function
FS = Assembly (CLR) scalar-function
FT = Assembly (CLR) table-valued function
IF = SQL inline table-valued function
IT = Internal table
P = SQL Stored Procedure
PC = Assembly (CLR) stored-procedure
PG = Plan guide
PK = PRIMARY KEY constraint
R = Rule (old-style, stand-alone)
RF = Replication-filter-procedure
S = System base table
SN = Synonym
SO = Sequence object
SQ = Service queue
TA = Assembly (CLR) DML trigger
TF = SQL table-valued-function
TR = SQL DML trigger
TT = Table type
U = Table (user-defined)
UQ = UNIQUE constraint
V = View
X = Extended stored procedure
SQL Server also provides col_length
which can be used to verify if a column exists:
IF (col_length('[schema.table]','[column]') IS NOT NULL
BEGIN
...
END
So these two functions are very useful to test if a table or a column exists. These checkpoints enable us to run a script multiple times without collisions or side effects.
Dealing with constraints
However altering a table is not always easy, especially when there are constraints such as default values, foreign keys, or indices. In that case, we would need to know the name of the constraint that we want to modify and drop it before deleting the column. In SQL Server that looks like:
ALTER TABLE table_name
DROP CONSTRAINT constraint_name
Doing this as part of your script is straightforward since the constraint information is easily accessible in the database. For example:
However, sometimes we just want to eliminate of all the constraints when eliminating a column. In the following section, I am presenting an example of how to do that.
Example of idempotent journaling in SQL Server
In this toy example, I am representing the evolution of a database through 5 different scripts, possibly written by different developers. The final state is a schema with two tables: Parent
and Child
, that have a one-to-many relationship.
A big difference with respect to solutions such as Mayflower is that this set of scripts is idempotent and therefore the journal can be run multiple times.
-- script 1.sql: creates parent
IF object_id('tests.Parent','U') IS NULL
BEGIN
CREATE TABLE tests.Parent (
Id int NOT NULL PRIMARY KEY,
Description varchar(150) NOT NULL DEFAULT 'default description text',
Field1 int NULL,
Field2 int NULL,
Field3 int NULL
)
END
-- script2.sql: adds removes description and adds summary
IF col_length('tests.Parent','Description') IS NOT NULL
BEGIN
EXEC tests.drop_constraints @p_column = 'tests.Parent.Description'
ALTER TABLE tests.Parent
DROP COLUMN Description
END
IF col_length('tests.Parent', 'Summary') IS NULL
BEGIN
ALTER TABLE tests.Parent
ADD Summary varchar(3000) NOT NULL DEFAULT 'default summary text'
END
-- script3.sql: creates child table with referential integrity
IF object_id('tests.Child','U') IS NULL
BEGIN
CREATE TABLE tests.Child (
Id int NOT NULL PRIMARY KEY,
Description varchar(300),
ParentId int FOREIGN KEY REFERENCES tests.Parent(Id)
)
END
At this point, the result is:
then removing the foreign key like this:
--script4.sql: removes foreign key
IF col_length('tests.Child','ParentId') IS NOT NULL
BEGIN
EXEC tests.drop_fk @p_column = 'tests.Child.ParentId'
END
Finally, adding the foreign key constraint again:
--script5.sql: adding the foreign key again
IF col_length('tests.Child','ParentId') IS NOT NULL
BEGIN
ALTER TABLE tests.Child
ADD FOREIGN KEY (ParentId) REFERENCES tests.Parent(Id)
END
The result is then:
Other benefits of idempotency on database migrations
Accidentally deleted objects are fully recoverable in a consistent state given that the migration can be run multiple times.
Migrations do not need to be transactional. This means that if the migration fails halfway there is no need to undo it and start over. The migration can run again and every script will contribute to the achieve the final desired state.
Reading Resources
One does not simply update a database
https://dev.to/pesse/one-does-not-simply-update-a-database–migration-based-database-development-527d
Idempotent DDL scripts that always achieve the same result https://www.sqlservercentral.com/steps/idempotent-ddl-scripts-that-always-achieve-the-same-result-making-changes-only-once-stairway-to-exploring-database-metadata-level-6
Trouble-free database migration
https://dzone.com/articles/trouble-free-database-migration-idempotence-and-co