- Python code
- A CTE
- An execsql script.
All of these methods take, as input, a table of dependencies in which each row contains a pair of table names corresponding to a parent:child relationship. The Python code produces a Python list containing tables listed in dependency order, and the other two methods produce a table consisting of one column containing table names and another column containing integers that specify the dependency order. The first items in these lists are the tables that have no parents, and the last items in these lists are the tables with the longest chain of dependencies.
Getting the Dependencies
When using a DBMS that supports INFORMATION_SCHEMA tables, the following SQL can be used to obtain a table containing all direct parent:child pairs
create table dependencies as select tc.table_name as child, tu.table_name as parent from information_schema.table_constraints as tc inner join information_schema.constraint_table_usage as tu on tu.constraint_name = tc.constraint_name where tc.constraint_type = 'FOREIGN KEY' and tc.table_name <> tu.table_name;
Additional constraints may be used, or needed, in the WHERE clause to limit the set of tables returned--for example, to eliminate system tables, or to select only the tables in a particular schema. If there are cyclic dependency relationships among tables, this SQL will not complete, and so in those cases at least one of the tables in the cycle of mutual dependencies should be omitted using an appropriate specification in the WHERE clause.
The table produced by this SQL will include only those tables that are part of some foreign key relationship. Standalone tables that are neither a parent nor a child will not be included. Standalone tables and tables that are omitted because of cyclic dependency relationships can be added to the output of the dependency-ordering step.
Ordering Tables by Dependency
Converting the set of parent:child dependencies into a list of tables in dependency order requires an iterative or recursive traversal of the tree of relationships that is rooted at the tables that have no parents. The table-ordering routines that follow use different approaches:
- A loop over a list of unprocessed dependencies in Python, where that list is modified within the loop.
- A recursive traversal of the tree using a CTE, terminating when the farthest leaves of the tree have been reached.
- Looping using end recursion in the execsql script to traverse the tree in a manner similar to the CTE.
The algorithm used in the Python code is distinct, whereas the algorithms used with the recursive CTE and the execsql script are very similar. (The algorithm used by the Python code can also be implemented using an execsql script, but the code is considerably longer than the implementation shown below.)
Python
The input for the Python code to generate a dependency-ordered list of tables is a table of dependencies consisting of a list of two-element lists or tuples, each containing a child table name and a parent table name. This list might be generated by querying the database (e.g., using the SQL above) or from static configuration data. Given this table of dependencies, a Python list of all of the tables in dependency order can be generated with the following function
def dependency_order(dep_list): rem_tables = list(set([t[0] for t in dep_list] + [t[1] for t in dep_list])) rem_dep = copy.copy(dep_list) sortkey = 1 ret_list = [] while len(rem_dep) > 0: tbls = [tbl for tbl in rem_tables if tbl not in [dep[0] for dep in rem_dep] ] ret_list.extend([ (tb, sortkey) for tb in tbls ]) rem_tables = [ tbl for tbl in rem_tables if tbl not in tbls ] rem_dep = [ dep for dep in rem_dep if dep[1] not in tbls ] sortkey += 1 if len(rem_tables) > 0: ret_list.extend([(tb, sortkey) for tb in rem_tables]) ret_list.sort(cmp=lambda x,y: cmp(x[1], y[1])) return [ item[0] for item in ret_list ]
SQL CTE
For DBMSs that support them, a recursive CTE can be used to convert a table of parent:child dependencies into a list of tables in dependency order. The input for the following code should be a table of such dependencies; that table should be named "dependencies". The output of the recursive CTE contains all parent tables, and the remaining tables (that are not parents to any other table) are added in the SELECT statement that uses the CTE.
with recursive dep_depth as ( select dep.child, dep.parent, 1 as lvl from dependencies as dep union all select dep.child, dep.parent, dd.lvl + 1 as lvl from dep_depth as dd inner join dependencies as dep on dep.parent = dd.child ) select table_name, table_order from ( select dd.parent as table_name, max(lvl) as table_order from dep_depth as dd group by table_name union select dd.child as table_name, max(lvl) + 1 as level from dep_depth as dd left join dependencies as dp on dp.parent = dd.child where dp.parent is null group by dd.child ) as all_levels;
Execsql Script
For DBMSs that don't support recursive CTEs, and when use of a client language like Python is not desired, the metacommands provided by execsql allow recursive traversal of the tree of dependencies, as shown in the following code. Explanations of the metacommands used in this code can be found in the on-line documentation.
As with the CTE implementation, the input for the following code should be a table of parent:child dependencies that is named "dependencies".
The following code increments a counter to track and assign the successive levels of recursion during traversal from the root to the leaves of the dependency tree. Because automatically-generated sequences and variables are DBMS-specific extensions to SQL, for the sake of generality, this implementation uses execsql counter variables and substitution variables. Thus, but for minor differences in SQL syntax, the following code should run in any DBMS.
-- ==================================================================== -- Initialize the tables used to summarize dependency order. -- Table created: -- dep_level: A copy of "dependencies" with an additional column -- to store the level in the hierarchy. The dependency -- level is set by an execsql counter variable for -- generality. -- ==================================================================== -- !x! sub current_level !!$counter_530!! select child, parent, !!current_level!! as lvl into temporary table dep_level from dependencies; -- ==================================================================== -- Create a view to evaluate whether there are any remaining -- dependencies to evaluate. -- View created: -- unprocessed: The number of parent tables whose children are -- not already listed as parents in the 'dep_level' table. -- Tables used: -- dep_level -- dependencies -- ==================================================================== create temporary view unprocessed as select count(distinct dep.child) as unproc from dep_level as dl inner join dependencies as dep on dl.child = dep.parent where dl.lvl = (select max(lvl) from dep_level); -- ==================================================================== -- Define an execsql sub-script to increment the dependency level. -- ==================================================================== -- !x! begin script add_new_level -- !x! sub last_level !!current_level!! -- !x! sub current_level !!$counter_530!! insert into dep_level (child, parent, lvl) select distinct dep.child, dl.child as parent, !!current_level!! from dep_level as dl inner join dependencies as dep on dl.child = dep.parent where dl.lvl = !!last_level!!; -- !x! end script -- ==================================================================== -- Define and execute an execsql sub-script to increment the dependency -- level as many times as necessary. -- ==================================================================== -- !x! begin script add_levels -- !x! subdata remaining unprocessed -- !x! execute script add_new_level -- !x! subdata remaining unprocessed -- !x! if(is_gt(!!remaining!!, 0)) { execute script add_levels } -- !x! end script -- !x! execute script add_levels -- ==================================================================== -- Convert the dependency levels into a table order. -- ==================================================================== create temporary table dependency_order as select table_name, table_order from ( select dd.parent as table_name, max(lvl) as table_order from dep_level as dd group by table_name union select dd.child as table_name, max(lvl) + 1 as level from dep_level as dd left join dependencies as dp on dp.parent = dd.child where dp.parent is null group by dd.child ) as all_levels;
No comments:
Post a Comment