Sunday, April 24, 2011

Converging and Cyclic Cascades in Database Systems

Enforcement of referential integrity is an essential feature of a database management system that will be used for non-trivial applications. However, among DBMSs that implement some referential integrity features there are important differences in the DBMS' abilities to propagate cascading updates and deletes through multiple pathways. Some commonly used DBMSs provide only partial, or no, support for cascading updates and deletes through multiple pathways. The lack of support for cascading updates and deletes means that the database cannot be used to accurately model some types of real-world relationships, or that artificial data structures must be used to work around the limitation.

There are two different basic data structures in which updates and deletes may be cascaded through multiple pathways. These two situations need to be distinguished because some DBMSs will cascade updates in one situation but not in the other. The two data structures are:

  • The primary key defined in one table (first level) is referenced in two child tables (second level), and both of the child tables serve as parents of a fourth table (third level) that has a foreign key into each of the third-level tables.
  • The primary key defined in one table is referenced by two different columns of a child table.

Both of these situations are illustrated in the following figure.


This is a simplified model of environmental sampling information. Samples may be collected as part of several different studies. Any study may be subdivided into different elements (representing, for example, different phases, teams, locations, or sample types). Some studies do not have distinct elements, so the study element table will not be populated. There are therefore two pathways from the study table to the sample table: one direct pathway and one indirect pathway via the study element table, where the latter pathway may or may not be complete. An update or deletion of a study identifier in the study table needs to be propagated through both pathways. The relationship between the study table and the sample table illustrates the first of the two types of structures where updates and deletions must be cascaded through multiple pathways.

The relationship between the length unit lookup table and the sample table illustrates the second of the two types of structures. Here the sample table has two distinct foreign keys into the length unit lookup table, each making up a distinct pathway between the two tables.

Several common and widely used DBMSs vary in their abilities to handle these types of data structures. The differences in support for cascading updates and deletions in data structures containing these features, including testing code and test results, are presented below for:

  • PostgreSQL 9.0.3,
  • MySQL 5.1.49,
  • SQL Server 2008 R2,
  • SQLite 3.7.2,
  • MS-Access 2007, and
  • LibreOffice Base 3.3.2.

Initializing the Databases

The DDL used to create the tables used for this test is:

create table d_study (study character varying(10) not null primary key);


create table d_studyelement (study character varying (10) not null,
  element character varying(10) not null,
  constraint pk_element primary key (study, element),
  constraint fk_elementstudy foreign key (study) references d_study (study)
    on update cascade on delete cascade);


create table l_lunit (length_unit character varying(10) not null primary key);


create table d_sample (study character varying (10) not null,
  sample character varying (10) not null,
  element character varying (10) null,
  surface_elevation double precision,
  elevation_units character varying (10),
  sample_depth double precision,
  depth_units character varying (10),
  constraint pk_sample primary key (study, sample),
  constraint fk_samplestudy foreign key (study) references d_study (study)
    on update cascade on delete cascade,
  constraint fk_sampleelem foreign key (study, element) references d_studyelement
    (study, element) on update cascade on delete cascade,
  constraint fk_sampleelev foreign key (elevation_units) references l_lunit (length_unit)
    on update cascade on delete cascade,
  constraint fk_sampledepth foreign key (depth_units) references l_lunit (length_unit)
    on update cascade on delete cascade);


This DDL can be executed directly by PostgreSQL, MySQL, SQLite, LibreOffice Base, and SQL Server. For MS-Access, the foreign key constraint specifications must be removed, and created using MS-Access' GUI instead. Also, for MS-Access, the character varying type must be replaced with a text type.

SQLite supports foreign keys (and cascading updates and deletions) only if PRAGMA foreign_keys = ON is specified. MySQL supports foreign keys only when the InnoDb engine is used.

The DML used to initialize these tables with testing data is:

insert into d_study (study) values ('StudyA');
insert into d_study (study) values ('StudyB');


insert into d_studyelement (study, element) values ('StudyA', 'Elem1');
insert into d_studyelement (study, element) values ('StudyA', 'Elem2');


insert into l_lunit (length_unit) values ('meters');
insert into l_lunit (length_unit) values ('cm');
insert into l_lunit (length_unit) values ('ft');


insert into d_sample (study, sample, element, surface_elevation,
  elevation_units, sample_depth, depth_units)
  values ('StudyA', 'Sample1', 'Elem1', 1234, 'meters', 2, 'cm');
insert into d_sample (study, sample, element, surface_elevation,
  elevation_units, sample_depth, depth_units)
  values ('StudyA', 'Sample2', 'Elem1', 1234, 'meters', 50, 'cm'); 
insert into d_sample (study, sample, element, surface_elevation,
  elevation_units, sample_depth, depth_units)
  values ('StudyA', 'Sample4', 'Elem1', 1234, 'meters', 1.2, 'meters');
insert into d_sample (study, sample, element, surface_elevation,
  elevation_units, sample_depth, depth_units)
  values ('StudyA', 'Sample5', 'Elem2', 602, 'ft', 2, 'meters');
insert into d_sample (study, sample, element, surface_elevation,
  elevation_units, sample_depth, depth_units)
  values ('StudyA', 'Sample6', 'Elem2', 602, 'ft', 4, 'meters');
insert into d_sample (study, sample, element, surface_elevation,
  elevation_units, sample_depth, depth_units)
  values ('StudyB', 'Sample1', null, 900, 'meters', 2, 'cm');


The same DML can be used for all databases, except for those cases (described below) where DBMS limitations do not allow data to be inserted.

Tests

The ability of each DBMS to cascade updates and deletes through multiple pathways can be illustrated using the following three test statements applied to databases initialized as described above.

1. Cascade an update through multiple paths

Changing the name of a study in the study table should result in that change being cascaded to the sample table through two pathways, both directly and via the study element table.

update d_study set study='StudyZ' where study='StudyA';


2. Cascade an update of multiple columns

Changing the code used for a length unit in the length unit table should result in that change being cascaded to both of the length unit columns in the sample table.

update l_lunit set length_unit='m' where length_unit='meters';


3. Cascade a deletion through multiple paths

Deleting a study from the d_study table should result in that change being cascaded to the sample table.

delete from d_study where study='StudyZ';


The statement above is appropriate when the first test of cascading updates has been completed successfully. When the first test is not passed, then the where clause for this test should reference 'StudyA' instead of 'StudyZ'.


DBMS Support for Converging Cascading Updates and Deletions

The following table summarizes the capabilities of the different DBMSs to cascade updates and deletions through multiple paths.

DBMSCreate
relationships
Add dataCascade update
though multiple paths
Cascade update
of multiple columns
Cascade
deletions
PostgreSQLYesYesYesYesYes
MySQL (InnoDb)YesYesNoYesYes
SQL ServerNoNoNoNoNo
SQLiteYesYesYesYesYes
MS-AccessYesPartial (see text)YesYesYes
LibreOffice BaseYesYesNoYesYes


PostgreSQL and SQLite are the only DBMSs that correctly handle both cascading updates and deletes for both types of multiple pathways.

MySQL and LibreOffice Base both fail to cascade updates through multiple pathways, but can cascade updates of multiple columns, and can cascade deletions through multiple pathways.

Microsoft Access can't add all of the example data. The sample for StudyB with a null element can't be added because MS-Access does not allow null columns in a foreign key relationship. This limitation prohibits the use of data structures in which one table is linked to several different parent tables by different columns, and at least two of those links contain data derived from a common parent. To ensure that every row in the table is linked to only one of the parents, all but one of the foreign keys in a row must be null, and this condition will prohibit data addition in MS-Access. In the case where the data does not contain nulls (as for 'StudyA' and 'Elem1' in the example), MS-Access successfully propagates updates and deletes through both types of data structures.

SQL Server is notable among the DBMSs tested here because it does not allow creation of the data structure to test cascading updates and deletions. Executing the DDL to create the sample table causes SQL Server to issue the error message:

Introducing FOREIGN KEY constraint 'fk_sampleelem' on table 'd_sample' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.

If the foreign key from the sample table to the study element table is not specified (removed from the DDL or moved to the end of the CREATE TABLE statement, then SQL Server issues the error message

Introducing FOREIGN KEY constraint 'fk_sampledepth' on table 'd_sample' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.

SQL Server does not allow either type of relationship to be created with foreign keys to automatically maintain referential integrity. This behavior is by design. Commonly recommended workarounds are to use surrogate keys and disable cascading updates, or to create custom triggers to propagate updates and deletions. One rationale for this limitation of SQL Server is that it protects the user from cascading operations through data structures containing cyclic relationships; presumably this protection is needed because such cascading operations would fail or introduce data integrity errors.

Data Structures with Cyclic Relationships

Are limitations on cascading updates and deletions through multiple pathways necessary to protect against infinite loops or data integrity errors? Such effects should not happen in the previous examples because there are no cycles in those data structures. However, perhaps the DBMSs that allow cascading through multiple paths are vulnerable to problems when updates and deletions are cascaded through data structures that actually do contain cyles. Tests with cyclic data were performed for all the DBMSs that allow the specification of cascading updates and deletions for such structures.

The data structure used for these tests is a single self-referential table, illustrated below.


The column names used in this example are representative of a data structure used for tree-structured data, but such self-referential structures are appropriate for a variety of types of data.

With this simple structure, two different types of data cycles were tested:

  1. A completely self-referential cycle, where a node is its own parent.
  2. An intertwined cross-reference, where two nodes each reference the other as their parent.

Both of these types of cycles may in some circumstances represent a logical error in the data. However, for those DBMSs that allow a self-referential structure to be created with automatic cascading of updates and deletions, these situations should nevertheless be handled sensibly.

Cascading updates and deletions were tested for all DBMSs except for SQL Server, which will not create this structure. Two deletion options were tested: "on delete cascade" and "on delete set null". All the DBMSs tested support the "on delete set null" option except for MS-Access.

Code to create the table and carry out cycle test #1 is below:

create table d_node (
  node_name character varying(10) not null,
  parent_node character varying(10),
  constraint pk_node primary key (node_name),
  constraint fk_node foreign key (parent_node)
    references d_node (node_name)
    on update cascade on delete cascade
  );


-- **** Test update
insert into d_node (node_name, parent_node) values ('NodeA', 'NodeA');
insert into d_node (node_name, parent_node) values ('NodeB', 'NodeA');
update d_node set node_name='NodeZ' where node_name='NodeA';
-- Check result
select * from d_node;


-- **** Test deletion with cascading deletions on deletes
delete from d_node;
insert into d_node (node_name, parent_node) values ('NodeA', 'NodeA');
insert into d_node (node_name, parent_node) values ('NodeB', 'NodeA');
delete from d_node where node_name='NodeA';
-- Check result
select * from d_node;


-- **** Test deletion with set null on deletes
delete from d_node;
alter table d_node drop constraint fk_node;
alter table d_node
add constraint fk_node foreign key (parent_node)
    references d_node (node_name)
    on update cascade on delete set null;
insert into d_node (node_name, parent_node) values ('NodeA', 'NodeA');
insert into d_node (node_name, parent_node) values ('NodeB', 'NodeA');
delete from d_node where node_name='NodeA';
-- Check result
select * from d_node;


The code to carry out cycle test #2 is similar, except that the following insert statements are used:

insert into d_node (node_name, parent_node) values ('NodeA', null);
insert into d_node (node_name, parent_node) values ('NodeB', 'NodeA');
update d_node set parent_node='NodeB' where node_name='NodeA';

The results of tests on the two types of cyclic data relationships are shown in the following table. A notation of "Yes" indicates that cascading updates and deletions were completed without error. The two entries for the deletion tests represent the results for "on delete cascade" and "on delete set null" options, respectively.

DBMSCycle Test 1
update
Cycle Test 1
delete
Cycle Test 2
update
Cycle Test 2
delete
PostgeSQLYesYes/YesYesYes/Yes
MySQL (InnoDb)NoYes/YesNoYes/Yes
SQLiteYesYes/YesYesYes/Yes
MS-AccessYesYes/NoYesYes/No
LibreOffice BaseYesYes/YesYesNo/Yes


Both PostgreSQL and SQLite complete all cascading updates and deletions without error.

The update operation on both tests fails in MySQL with an error message indicating that a foreign key constraint fails. However, deletions are cascaded correctly for both deletion options.

MS-Access correctly executes some update and deletion cascades for both tests. However, MS-Access does not support, and therefore cannot successfully complete, cascading deletions with a "set null" option.

LibreOffice Base correctly executes the update and delete operations of test 1, and the update operation of test 2. LibreOffice Base fails to complete the cascading deletion in test 2 with a stack overflow error. However, if the deletion cascade option is changed to "set null", LibreOffice Base completes the deletion without error. The results for cascading updates are interesting because although LibreOffice Base fails to cascade updates through multiple pathways, it successfully cascades updates through cyclic data structures.

Although usability assessments are not the focus of these tests, LibreOffice's user interface is notably ill suited to handling the cyclic data structure used here: the UI does not allow interactive creation of the self-join, and when data are edited interactively, the GUI display is not immediately updated to show the cascaded updates, with the consequence that the update appears not to have been performed correctly. There is no option to refresh the display, so the table must be closed and re-opened to see the effect of interactive changes that are cascaded through a self-join.

Conclusion

Although cascading updates and deletions are supported by many DBMSs, there are important variations in the robustness with which these features are implemented, particularly when there are multiple cascade pathways and cyclic structures. The differing capabilities may be important when selecting a DBMS with which to implement a particular data model. Most of the commonly used DBMSs tested failed to support cascading updates and deletions in one or more ways. However, there is currently at least one desktop DBMS (SQLite) and one client-server DBMS (PostgreSQL) that will successfully propagate cascading updates and deletions through multiple pathways and through cyclic data structures.