These examples are written for Postgres, which represents its system catalog as a set of views in the information_schema schema, compatible with the SQL-92 ANSI standard. These examples also use Postgres' "string_agg()" aggregate function to convert columns of column names to string expressions. Equivalent functionality is available in some other DBMSs (e.g., "group_concat()" in MySQL/MariaDB, and "for xml_path" expressions in Microsoft SQL Server prior to 2017). These examples use the execsql script language to eliminate dependence on any DBMS-specific language extensions.
Some DBMSs support a form of the SQL-standard MERGE (or "upsert") statement, which allows both UPDATE and INSERT operations to be done in a single statement. The first example below illustrates the use of a MERGE statement, and the second example illustrates the use of separate UPDATE and INSERT statements. These examples assume that the new data are staged in a table with the same name as the base table, but in a different schema (e.g., a staging schema). Column names in the base table and staging table must be identical, and types compatible. Base tables may contain some columns that should not be updated using new data, such as autonumber columns and columns that are populated by triggers—these example scripts accept a list of column names that are to be excluded from the merge operation. Table and column names are not quoted in these examples, assuming that the database has been created using the DBMS' naming rules for unquoted identifiers.
Example 1. Generating a MERGE Statement for Any TablePostgres uses a non-standard form of the MERGE statement: the INSERT statement supports an ON CONFLICT clause that specifies the action to be taken when there are key conflicts between the base table and the incoming data.
The SQL for the merge statement is generated and executed by an execsql SCRIPT metacommand. The schema names, table name, and list of columns to exclude are specified as execsql substitution variables.
This script can be used by running the following execsql metacommands:
Example 2. Generating Separate UPDATE and INSERT Statements for Any Table
A merge statement is convenient for automatic integration of new data into an existing base table, but there are some conditions under which the use of separate UPDATE and INSERT statements may be necessary or desirable. In particular:
- The DBMS in use doesn't support any form of merge statement.
- You want to be able to review the data to be modified or inserted before changes are made.
- You want to log all of the data changes.
Example 3. Automatically Merging Data for Multiple Tables
Merge operations may need to be performed on multiple database tables for a single incoming data set. Carrying out this process can be simplified by creating a list of all of the tables to be updated, using the information schema to sort that list of tables into dependency order, and then using one of the merging scripts from Example 1 or Example 2 to modify data in the base tables. This example uses the recursive CTE from http://splinterofthesingularity.blogspot.com/2017/12/ordering-database-tables-by-foreign-key.html to order the tables, and then uses the script from Example 2 to perform the data updates. The table of table names that drives this process includes the following columns:
- table_name — The name of the table to be updated.
- exclude_cols — A comma-delimited list of quoted column names identifying the columns that are not to be updated.
- display_changes — A value of "Yes" or "No" to indicate whether data modifications for the corresponding table should be displayed to the user, allowing him or her to allow or disallow those changes.
- display_final — A value of "Yes" or "No" to indicate whether the the final data table, after the data merge, should be displayed to the user.
Thanks to Elizabeth Shea for simplifying the update statement generation in the UPSERT_ONE script.