Thursday, March 28, 2019

Aphorism of the Day

"I used a standard technique" is just another way to say "I didn't think at all."

Sunday, March 10, 2019

Automation of Upsert Operations with QA Checks and Logging

Scripts that extend the data merging technique illustrated in the post Driving Data Table Merges from the Information Schema add the following capabilities:

  • The upsert operation can be applied to multiple tables simultaneously, and will execute SQL update and insert statements in top-down order to maintain referential integrity among the tables.
  • Prior to updating data in the base tables with data in the staging tables, the scripts check for:
    • Null values in the columns of each staging table that must be non-null in the corresponding base table.
    • Duplicate primary key values in the staging tables.
    • Invalid foreign keys in the staging tables; foreign key values are checked against both the appropriate base table and any other appropriate staging table.
  • No changes are made to the base tables if there are any violations of the non-null checks, primary key checks, or foreign key checks carried out on the staging tables.
  • If QA violations are found, the results of all of the QA checks are returned in a table that identifies, for each base table to be updated, each type of QA violation that was found, and the number of such violations (e.g., for foreign key violations) or the number of rows affected in the staging table.
  • The results of a successful upsert operation are returned in a table that identifies, for each base table to be updated, the number of rows updated and the number of rows inserted.
  • The changes to be made to each base table (updates and inserts) may optionally be displayed in a GUI interface so that the user can review them and approve or disapprove each set of changes.
  • All of the steps that are carried out by the scripts (QA checks, update operations, and insert operations) can be logged in a custom text file.  The log can optionally include each SQL statement used to update the database, and also include the data values that are used for each update and insert operation.
  • If an execsql console is active, the scripts will use the console's status bar and progress bar to indicate the activity underway.
These scripts do not require any customization for different data models.  They can be applied to any tables in any database in any DBMS that supports the standard SQL information_schema views.  Some operations carried out by these scripts are DBMS-specific, however (such as string aggregation), so the features listed above are implemented in DBMS-specific scripts.  These features have been implemented in scripts for:
  • Postgres
  • MariaDB/MySQL
  • MS-SQL Server.

These scripts are too lengthy to include in this blog post, but are available from the Bitbucket repository at https://bitbucket.org/rdnielsen/execsql_upsert/src/default/.  More complete documentation is available at ReadTheDocs.