Just something that I wrote for the team the other day…
An idempotent operation is an operation that can be repeated multiple times obtaining always the same result. The idea of idempotent journaling is that a sequence of scripts could be repeated multiple times always obtaining the same end state, no matter how many times the sequence of scripts is run (as long as the order is preserved).
This concept seems complex but in practice, the implementation is rather simple. All we need to do is to check if the object that we are trying to modify exists or not.
Databases like MySQL and PostgreSQL support this idea natively through creation statements that look like this:
CREATE TABLE IF NOT EXISTS table_name
This way, the table is only created once no matter how many times the script is run.
SQL Server and Oracle support
CREATE OR ALTER syntax for database objects that are not tables 🤦🏽♂️.
So you would find journaled scripts following a pattern like this:
DROP TABLE table_name; CREATE TABLE table_name (...);
This works well and it is very simple as it is supported for most DBMS.
Using the database dictionary
Also, I have seen people using both Oracle and SQL Server using the database metadata to determine if a table exists and then decide whether they can run an update script on it. In Oracle, this information is available in the
USER_TABLES table that is part of the database dictionary. In SQL Server it is similar to querying
sys.objects. However, SQL server provides a
object_id function which is very nice:
IF (object_id('[schema].[object]',object_type) is not null) BEGIN ... END
with object types:
AF = Aggregate function (CLR) C = CHECK constraint D = DEFAULT (constraint or stand-alone) F = FOREIGN KEY constraint FN = SQL scalar function FS = Assembly (CLR) scalar-function FT = Assembly (CLR) table-valued function IF = SQL inline table-valued function IT = Internal table P = SQL Stored Procedure PC = Assembly (CLR) stored-procedure PG = Plan guide PK = PRIMARY KEY constraint R = Rule (old-style, stand-alone) RF = Replication-filter-procedure S = System base table SN = Synonym SO = Sequence object SQ = Service queue TA = Assembly (CLR) DML trigger TF = SQL table-valued-function TR = SQL DML trigger TT = Table type U = Table (user-defined) UQ = UNIQUE constraint V = View X = Extended stored procedure
SQL Server also provides
col_length which can be used to verify if a column exists:
IF (col_length('[schema.table]','[column]') IS NOT NULL BEGIN ... END
So these two functions are very useful to test if a table or a column exists. These checkpoints enable us to run a script multiple times without collisions or side effects.
Dealing with constraints
However altering a table is not always easy, especially when there are constraints such as default values, foreign keys, or indices. In that case, we would need to know the name of the constraint that we want to modify and drop it before deleting the column. In SQL Server that looks like:
ALTER TABLE table_name DROP CONSTRAINT constraint_name
Doing this as part of your script is straightforward since the constraint information is easily accessible in the database. For example:
However, sometimes we just want to eliminate of all the constraints when eliminating a column. In the following section, I am presenting an example of how to do that.
Example of idempotent journaling in SQL Server
In this toy example, I am representing the evolution of a database through 5 different scripts, possibly written by different developers. The final state is a schema with two tables:
Child, that have a one-to-many relationship.
A big difference with respect to solutions such as Mayflower is that this set of scripts is idempotent and therefore the journal can be run multiple times.
-- script 1.sql: creates parent IF object_id('tests.Parent','U') IS NULL BEGIN CREATE TABLE tests.Parent ( Id int NOT NULL PRIMARY KEY, Description varchar(150) NOT NULL DEFAULT 'default description text', Field1 int NULL, Field2 int NULL, Field3 int NULL ) END
-- script2.sql: adds removes description and adds summary IF col_length('tests.Parent','Description') IS NOT NULL BEGIN EXEC tests.drop_constraints @p_column = 'tests.Parent.Description' ALTER TABLE tests.Parent DROP COLUMN Description END IF col_length('tests.Parent', 'Summary') IS NULL BEGIN ALTER TABLE tests.Parent ADD Summary varchar(3000) NOT NULL DEFAULT 'default summary text' END
-- script3.sql: creates child table with referential integrity IF object_id('tests.Child','U') IS NULL BEGIN CREATE TABLE tests.Child ( Id int NOT NULL PRIMARY KEY, Description varchar(300), ParentId int FOREIGN KEY REFERENCES tests.Parent(Id) ) END
At this point, the result is:
then removing the foreign key like this:
--script4.sql: removes foreign key IF col_length('tests.Child','ParentId') IS NOT NULL BEGIN EXEC tests.drop_fk @p_column = 'tests.Child.ParentId' END
Finally, adding the foreign key constraint again:
--script5.sql: adding the foreign key again IF col_length('tests.Child','ParentId') IS NOT NULL BEGIN ALTER TABLE tests.Child ADD FOREIGN KEY (ParentId) REFERENCES tests.Parent(Id) END
The result is then:
Other benefits of idempotency on database migrations
Accidentally deleted objects are fully recoverable in a consistent state given that the migration can be run multiple times.
Migrations do not need to be transactional. This means that if the migration fails halfway there is no need to undo it and start over. The migration can run again and every script will contribute to the achieve the final desired state.
One does not simply update a database
Idempotent DDL scripts that always achieve the same result https://www.sqlservercentral.com/steps/idempotent-ddl-scripts-that-always-achieve-the-same-result-making-changes-only-once-stairway-to-exploring-database-metadata-level-6
Trouble-free database migration