Friday, March 16, 2018

Tech Support Allegory

So I go to my doctor, and I say “Doctor, when I stand up, I get this pain in my chest.”  And the doctor says “Well, let’s see about that.  Does this happen when you stand up from a sitting position?”

“Yes,” I say.

So the doctor sits down in a chair and then stands up.  “Hm,” says the doctor, “I don’t feel anything.  There’s no problem as far as I can tell.”  And he walks out the door.

Sunday, February 4, 2018

Building Tkinter Interfaces with Reusable Components from the TkPane Library

Building graphical user interfaces (GUIs) with Python's Tkinter library usually requires a bit of fiddling not only to get the visual layout correct, but also to link the various Tkinter widgets so that they interact properly. Common interactions among widgets include enabling or disabling one based on the state of another, and accessing the data in one widget from another widget. The network of interactions between widgets in a user interface can require painstaking effort to create and modify, and limits the resuability of those user interface components.

Whereas the TkLayout package simplifies creation of the visual layout of a Tkinter GUI, the TkPane package simplifies the creation, use, and re-use of Tkinter widgets and groups of Tkinter widgets, particularly those that are used to collect or manage data, and that need to interact with other components of the GUI.

The TkPane package provides a TkPane class that can be subclassed to create custom 'pane' objects. Several general-purpose panes are provided in the tkpane.lib module. Pane objects are completely stand-alone, with no inherent dependencies on any other components of the GUI. Pane objects have a standard set of methods, however, by which other panes (or other application code) can enable or disable those panes, or clear their displays. Pane objects also have internal lists of callback functions that are automatically executed when their data are changed to become valid, when their data are changed to become invalid, or when the user leaves the pane (via a keyboard focus change or a mouse movement). Dependencies between panes are easily established with the requires() method. For example, the statement

run_button_pane.requires(input_file_pane)

ensures that the run_button_pane pane will be disabled when the input_file_pane pane contains invalid data, and will be enabled when the input_file_pane pane contains valid data. When one pane enables another, the first pane passes its own data to the second pane.

The type of inter-pane interaction that is enabled by the requires() method may be all that is needed in many cases. If more complex interactions are required, however, callback lists can be modified directly to enable other types of inter-pane activation and data sharing.

When a pane object is instantiated, the constructor method (__init()__) must be passed the Tkinter widget within which the frame will be embedded. Typically that parent widget is a frame. This required argument of pane constructors is identical to the argument that must be passed to the 'build' functions of TkLayout objects. Consequently, TkLayout and TkPane objects work together very well, and the combination of the two practically eliminates all of the fiddling with visual and functional aspects of a GUI when using Tkinter directly.

The following example shows how these two packages can be used together to easily build a Tkinter GUI interface by populating layout elements with panes. This example uses pane classes from tkpane.lib.

try:
    import Tkinter as tk
except:
    import tkinter as tk
import tkpane.lib
import tklayout
import time


# Add a method to the AppLayout class to get a pane: the first child of
# a frame's widgets.
def layout_pane(self, pane_name):
    return self.frame_widgets(pane_name)[0]

tklayout.AppLayout.pane = layout_pane


# Create simple sets of data to display in the TableDisplayPane.
ds1_headers = ["Title", "Author", "Published"]
ds1_data = [["Kon-Tiki: Across the Pacific in a Raft", "Thor Heyerdahl", 1950],
            ["Mawson's Will", "Lennard Bickel", 2000],
            ["Southern Cross to Pole Star - Tschiffely's Ride", "A. F. Tschiffely", 1933],
            ["The Hitchhiker's Guide to the Galaxy", "Douglas Adams", 1979]]

ds2_headers = ["row_id","row_number","long_text","some_date","some_number"]
ds2_data = [
           ["Row 4",4,"Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem.","1951-03-19",61.9917173461],
           ["Row 8",8,"Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus.","1977-07-21",34.5729855806],
           ["Row 12",12,"DJs flock by when MTV ax quiz prog.","1983-10-12",2.3773967111]
           ]

def build_message_pane(parent):
    tkpane.lib.MessagePane(parent, "Enter user credentials and an output directory.")

def build_table_pane(parent):
    tkpane.lib.TableDisplayPane(parent, "This is an example message to accompany the data table that is long enough that it should wrap when the window is resized to a relatively small size.", ds1_headers, ds1_data)

def build_button_pane(parent):
    def no_action():
        pass
    tkpane.lib.OkCancelPane(parent, no_action, no_action)

# Lay out the panes
lo = tklayout.AppLayout()
inp_panes = lo.row_elements(["user_pane", "output_pane"], row_weight=0)
app = lo.column_elements(["message_pane", inp_panes, "table_pane", 
                          "button_pane", "status_pane"], 
                          row_weights=[0,0,1,0,0])

root = tk.Tk()
root.title("Demo of the TkPane Package")

# Use an extra frame within the root element with padding to add extra space
# around the outermost app widgets.
appframe = tk.Frame(root, padx=11, pady=11)
appframe.pack(expand=True, fill=tk.BOTH)

lo.create_layout(appframe, app)

lo.build_elements({"message_pane": build_message_pane,
                   "user_pane": tkpane.lib.UserPane,
                   "output_pane": tkpane.lib.OutputDirPane,
                   "table_pane": build_table_pane,
                   "button_pane": build_button_pane,
                   "status_pane": tkpane.lib.StatusProgressPane
                   })

# Get the pane objects for customization.
user_pane = lo.pane("user_pane")
output_pane = lo.pane("output_pane")
button_pane = lo.pane("button_pane")
status_pane = lo.pane("status_pane")
table_pane = lo.pane("table_pane")

# Require a user name and output directory to be entered for the 'OK' button
# to be enabled.
button_pane.requires(user_pane)
button_pane.requires(output_pane)

# Start the demo application with the 'OK' button disabled.
tkpane.en_or_dis_able_all([button_pane])

# Make the user and output directory panes report their status.
user_pane.status_reporter = status_pane
output_pane.status_reporter = status_pane

# Make the buttons change the data and the status bar.
def ok_click(*args):
    status_pane.set_status("OK button clicked.")
    time.sleep(0.5)
    status_pane.set_status("Working...")
    for p in range(10, 110, 10):
        status_pane.set_value(p)
        root.update_idletasks()
        time.sleep(0.5)
    status_pane.set_status("Done.")
    table_pane.display_data(ds2_headers, ds2_data)
button_pane.set_ok_action(ok_click)

def cancel_click(*args):
    status_pane.clear([])
button_pane.set_cancel_action(cancel_click)

# Bind  and  to the buttons.
root.bind("", ok_click)
root.bind("", cancel_click)

# Run the application
root.mainloop()



This code produces a GUI with the initial appearance shown below.  The background of the output directory entry is automatically colored to indicate that it is required but does not contain valid data.



As this example illustrates, the re-usability of pane classes allows Tkinter GUIs to be created with almost no Tkinter code.

The TkPane package can be downloaded from the Python Package Index (PyPI), or installed with pip.

pip install tkpane


The documentation for TkPane is available on ReadTheDocs.

Sunday, January 28, 2018

Building Tkinter Interfaces from the Inside Out

Graphical user interfaces (GUIs) ordinarily consist of a number of different elements, such as those illustrated in Figure 1. Some GUI development toolkits, specifically including Python's Tkinter module, require these elements to be assembled from the outside in. That is, the outermost frame must be defined first, and that first frame then populated with any container elements (e.g., additional frames), and then those second-level frames populated next, and so forth.
Figure 1. Sketch of example GUI.

Such an interface may be easier to describe from the inside out, though, rather than from the outside in. For example, the GUI shown in Figure 1 could be described as:

group_1 ← column_of (topic_selection, date_selection)
group_2 ← row_of (group_1, format_options)
group_3 ← column_of (header, group_2, output_selection, buttons)

The group_3 element then describes the entire GUI, including the nested hierarchy of other elements.

The TkLayout Python package allows GUIs to be built by describing them in just this way. The key to this process is separation of the steps of describing the interface and creating the interface.

To describe the interface, the GUI implementer must first assign a name to each element of the GUI. For the example shown in Figure 1, reasonable names would be "topic_selection", "date_selection", etc. The interface is then described using methods of an AppLayout object as follows:

group_1 = column_elements("topic_selection", "date_selection")
group_2 = row_elements(group_1, "format_options")
group_3 = column_elements("header", group_2, "output_selection", "buttons")


The row_elements and column_elements methods synthesize and return new element names that identify the element groupings that are created—i.e., group_1, group_2, and group_3. After the structure of the GUI has been described, the frames that implement that structure are created with another method of the AppLayout object, as follows:

create_layout(root, group_3)

where the root argument is a Tkinter top level object or other container that will contain the GUI.

The create_layout method creates a series of nested frames, as shown in Figure 2 for the GUI illustrated in Figure 1.


Figure 2. Frame hierarchy for the GUI in Figure 1.

After this set of frames is created, the enclosing frame for each element can be obtained from the AppLayout object using that element's name. The GUI implementer can then populate each of these frames with other GUI widgets as appropriate. The AppLayout class will also take a dictionary of element names and element-construction functions, and run those functions to populate each element's enclosing frame.  The functions that create the elements for each frame require only the enclosing frame object, and do not need to know anything about how the elements are laid out in relation to one another.

Separation of structure description and implementation in this way allows the GUI construction process to be greatly simplified.  Exploration of alternate interface layouts is also simplified: each alternate layout can be described in just a few lines of code, without any changes required to code to populate or arrange the interface elements.

The TkLayout package can be downloaded from the Python Package Index (PyPI) or installed with the command

pip install tklayout

The documentation is available on ReadTheDocs.

Saturday, December 9, 2017

Ordering Database Tables by Foreign Key Dependencies

Some database operations that affect multiple tables must touch the affected tables in an order corresponding to their dependencies. For example, data must be loaded into parent tables before data can be loaded into child (dependent) tables. Some deletion and update operations may also need to be carried out in dependency order. A list of all tables in dependency order can be useful to automate such operations. Several methods are shown here for generating a list of tables in dependency order; these methods are:
  • Python code
  • A CTE
  • An execsql script.

All of these methods take, as input, a table of dependencies in which each row contains a pair of table names corresponding to a parent:child relationship. The Python code produces a Python list containing tables listed in dependency order, and the other two methods produce a table consisting of one column containing table names and another column containing integers that specify the dependency order. The first items in these lists are the tables that have no parents, and the last items in these lists are the tables with the longest chain of dependencies.

Getting the Dependencies


When using a DBMS that supports INFORMATION_SCHEMA tables, the following SQL can be used to obtain a table containing all direct parent:child pairs

create table dependencies as
select 
        tc.table_name as child,
        tu.table_name as parent
from 
        information_schema.table_constraints as tc
        inner join information_schema.constraint_table_usage as tu
             on tu.constraint_name = tc.constraint_name
where 
        tc.constraint_type = 'FOREIGN KEY'
        and tc.table_name <> tu.table_name;

Additional constraints may be used, or needed, in the WHERE clause to limit the set of tables returned--for example, to eliminate system tables, or to select only the tables in a particular schema. If there are cyclic dependency relationships among tables, this SQL will not complete, and so in those cases at least one of the tables in the cycle of mutual dependencies should be omitted using an appropriate specification in the WHERE clause.

The table produced by this SQL will include only those tables that are part of some foreign key relationship. Standalone tables that are neither a parent nor a child will not be included. Standalone tables and tables that are omitted because of cyclic dependency relationships can be added to the output of the dependency-ordering step.

Ordering Tables by Dependency


Converting the set of parent:child dependencies into a list of tables in dependency order requires an iterative or recursive traversal of the tree of relationships that is rooted at the tables that have no parents. The table-ordering routines that follow use different approaches:
  • A loop over a list of unprocessed dependencies in Python, where that list is modified within the loop.
  • A recursive traversal of the tree using a CTE, terminating when the farthest leaves of the tree have been reached.
  • Looping using end recursion in the execsql script to traverse the tree in a manner similar to the CTE.

The algorithm used in the Python code is distinct, whereas the algorithms used with the recursive CTE and the execsql script are very similar. (The algorithm used by the Python code can also be implemented using an execsql script, but the code is considerably longer than the implementation shown below.)

Python


The input for the Python code to generate a dependency-ordered list of tables is a table of dependencies consisting of a list of two-element lists or tuples, each containing a child table name and a parent table name. This list might be generated by querying the database (e.g., using the SQL above) or from static configuration data. Given this table of dependencies, a Python list of all of the tables in dependency order can be generated with the following function

def dependency_order(dep_list):
    rem_tables = list(set([t[0] for t in dep_list] + [t[1] for t in dep_list]))
    rem_dep = copy.copy(dep_list)
    sortkey = 1
    ret_list = []
    while len(rem_dep) > 0:
        tbls = [tbl for tbl in rem_tables if tbl not in [dep[0] for dep in rem_dep] ]
        ret_list.extend([ (tb, sortkey) for tb in tbls ])
        rem_tables = [ tbl for tbl in rem_tables if tbl not in tbls ]
        rem_dep = [ dep for dep in rem_dep if dep[1] not in tbls ]
        sortkey += 1
    if len(rem_tables) > 0:
        ret_list.extend([(tb, sortkey) for tb in rem_tables])
    ret_list.sort(cmp=lambda x,y: cmp(x[1], y[1]))
    return [ item[0] for item in ret_list ]

SQL CTE


For DBMSs that support them, a recursive CTE can be used to convert a table of parent:child dependencies into a list of tables in dependency order. The input for the following code should be a table of such dependencies; that table should be named "dependencies". The output of the recursive CTE contains all parent tables, and the remaining tables (that are not parents to any other table) are added in the SELECT statement that uses the CTE.

with recursive dep_depth as (
 select
  dep.child,
  dep.parent,
  1 as lvl
 from
  dependencies as dep
 union all
 select
  dep.child,
  dep.parent,
  dd.lvl + 1 as lvl
 from
  dep_depth as dd
  inner join dependencies as dep on dep.parent = dd.child
 )
select
 table_name,
 table_order
from (
 select
  dd.parent as table_name,
  max(lvl) as table_order
 from
  dep_depth as dd
 group by
  table_name
 union
 select
  dd.child as table_name,
  max(lvl) + 1 as level
 from
  dep_depth as dd
  left join dependencies as dp on dp.parent = dd.child
 where
  dp.parent is null
 group by
  dd.child
 ) as all_levels;

Execsql Script


For DBMSs that don't support recursive CTEs, and when use of a client language like Python is not desired, the metacommands provided by execsql allow recursive traversal of the tree of dependencies, as shown in the following code. Explanations of the metacommands used in this code can be found in the on-line documentation.

As with the CTE implementation, the input for the following code should be a table of parent:child dependencies that is named "dependencies".

The following code increments a counter to track and assign the successive levels of recursion during traversal from the root to the leaves of the dependency tree. Because automatically-generated sequences and variables are DBMS-specific extensions to SQL, for the sake of generality, this implementation uses execsql counter variables and substitution variables. Thus, but for minor differences in SQL syntax, the following code should run in any DBMS.

-- ====================================================================
--  Initialize the tables used to summarize dependency order.
--  Table created:
--    dep_level: A copy of "dependencies" with an additional column
--               to store the level in the hierarchy.  The dependency
--               level is set by an execsql counter variable for
--               generality.
-- ====================================================================
-- !x! sub current_level !!$counter_530!!
select
    child,
    parent,
    !!current_level!! as lvl
into
    temporary table dep_level
from
    dependencies;


-- ====================================================================
--  Create a view to evaluate whether there are any remaining
--  dependencies to evaluate.
--  View created:
--    unprocessed: The number of parent tables whose children are
--                 not already listed as parents in the 'dep_level' table.
--  Tables used:
--    dep_level
--    dependencies
-- ====================================================================
create temporary view unprocessed as
select count(distinct dep.child) as unproc
from
    dep_level as dl
    inner join dependencies as dep on dl.child = dep.parent
where
    dl.lvl = (select max(lvl) from dep_level);


-- ====================================================================
--  Define an execsql sub-script to increment the dependency level.
-- ====================================================================
-- !x! begin script add_new_level
-- !x! sub last_level !!current_level!!
-- !x! sub current_level !!$counter_530!!
insert into dep_level
    (child, parent, lvl)
select distinct
    dep.child,
    dl.child as parent,
    !!current_level!!
from
    dep_level as dl
    inner join dependencies as dep on dl.child = dep.parent
where
    dl.lvl = !!last_level!!;
-- !x! end script


-- ====================================================================
--  Define and execute an execsql sub-script to increment the dependency
--  level as many times as necessary.
-- ====================================================================
-- !x! begin script add_levels
-- !x! subdata remaining unprocessed
-- !x! execute script add_new_level
-- !x! subdata remaining unprocessed
-- !x! if(is_gt(!!remaining!!, 0)) { execute script add_levels }
-- !x! end script

-- !x! execute script add_levels


-- ====================================================================
--  Convert the dependency levels into a table order.
-- ====================================================================
create temporary table dependency_order as
select
 table_name,
 table_order
from (
 select
  dd.parent as table_name,
  max(lvl) as table_order
 from
  dep_level as dd
 group by
  table_name
 union
 select
  dd.child as table_name,
  max(lvl) + 1 as level
 from
  dep_level as dd
  left join dependencies as dp on dp.parent = dd.child
 where
  dp.parent is null
 group by
  dd.child
 ) as all_levels;

Saturday, November 19, 2016

A Year and Quarter Data Type for PostgreSQL

For some applications, time periods are needed that represent quarters of the year.  Although quarters could be represented by using an exact date data type and constraining the month and day values (i.e., to the first day of each quarter) or by using date ranges in PostgreSQL, these representations do not allow easy computation of the difference between two dates (in quarters) or the addition of specific number of quarters to a date to produce a new date.

The code below shows the implementation of custom composite data type in Postgres that represents quarters of a year. This data type is named "quarter". The implementation supports addition of a number of quarters to a "quarter" date to produce a new "quarter" date, and supports calculation of the difference between two "quarter" dates. Comparison operators are also defined for use in expressions using "quarter" dates and to allow indexing of data tables by "quarter" values.

The Postgres code to define this data type, and arithmetic and comparison operators, is shown below. Details on the creation of a custom data type and operators can be found in the Postgres documentation.

The definition of the "quarter" data type is simply:

create type quarter as (
 year integer,
 quarter integer
 );

To support addition and subtraction of an integer number of quarters to (or from) a "quarter" data type, a function must be defined to carry out this calculation. The arguments to the function are a "quarter" data type and the number of quarters; the second argument may be either positive or negative.

create or replace function qtr_add(
 qtr quarter,
 addqtr integer)
 returns quarter as
$BODY$
DECLARE
 qtr2 quarter;
BEGIN
 -- Adjust the year
 if addqtr >= 0 then
  qtr2.year := qtr.year + div((qtr.quarter + addqtr-1), 4);
 else
  qtr2.year := qtr.year - div(((4 - qtr.quarter) - addqtr), 4);
 end if;
 -- Adjust the quarter
 qtr2.quarter := mod(qtr.quarter + addqtr - 1, 4) +1;
 if qtr2.quarter < 1 then 
  qtr2.quarter := qtr2.quarter + 4;
 end if;
 --
 return qtr2;
END;
$BODY$
 language plpgsql
 immutable leakproof strict;


Subtracting two "quarter" dates can be carried out in a simple expression, but a function is defined to carry out this calculation because it is used for both the mathematical and comparison operators.

create or replace function qtr_diff(
 qtr1 quarter,
 qtr2 quarter)
 returns integer as
$BODY$
BEGIN
 return 4 * (qtr1.year - qtr2.year) + (qtr1.quarter - qtr2.quarter);
END;
$BODY$
 language plpgsql
 immutable leakproof strict;

To support comparison operators (<, <=, =, <>, >=, and >), a function is defined to compare two "quarter" dates. This function returns -1, 0, or 1 depending on whether its first argument is less than, equal to, or greater than its second argument.

create or replace function quarter_comp(
 qtr1 quarter,
 qtr2 quarter)
 returns integer as
$BODY$
DECLARE
 diff integer;
BEGIN
 diff := qtr_diff(qtr1, qtr2);
 if diff = 0 then
  return 0;
 elsif diff < 1 then
  return -1;
 end if;
 return 1;
END;
$BODY$
 language plpgsql
 immutable leakproof strict;


Each comparison operation is carried out by an individual function that uses the 'quarter_comp()' function.

-- --------------------------------------------------------------------------
-- qtr_lt()
-- --------------------------------------------------------------------------
create or replace function qtr_lt(
 qtr1 quarter,
 qtr2 quarter)
 returns boolean as
$BODY$
BEGIN
 return quarter_comp(qtr1, qtr2) < 0;
END;
$BODY$
 language plpgsql
 immutable leakproof strict;


-- --------------------------------------------------------------------------
-- qtr_lte()
-- --------------------------------------------------------------------------
create or replace function qtr_lte(
 qtr1 quarter,
 qtr2 quarter)
 returns boolean as
$BODY$
BEGIN
 return quarter_comp(qtr1, qtr2) <= 0;
END;
$BODY$
 language plpgsql
 immutable leakproof strict;


-- --------------------------------------------------------------------------
-- qtr_eq()
-- --------------------------------------------------------------------------
create or replace function qtr_eq(
 qtr1 quarter,
 qtr2 quarter)
 returns boolean as
$BODY$
BEGIN
 return quarter_comp(qtr1, qtr2) = 0;
END;
$BODY$
 language plpgsql
 immutable leakproof strict;


-- --------------------------------------------------------------------------
-- qtr_ne()
-- --------------------------------------------------------------------------
create or replace function qtr_ne(
 qtr1 quarter,
 qtr2 quarter)
 returns boolean as
$BODY$
BEGIN
 return quarter_comp(qtr1, qtr2) <> 0;
END;
$BODY$
 language plpgsql
 immutable leakproof strict;


-- --------------------------------------------------------------------------
-- qtr_gte()
-- --------------------------------------------------------------------------
create or replace function qtr_gte(
 qtr1 quarter,
 qtr2 quarter)
 returns boolean as
$BODY$
BEGIN
 return quarter_comp(qtr1, qtr2) >= 0;
END;
$BODY$
 language plpgsql
 immutable leakproof strict;



-- --------------------------------------------------------------------------
-- qtr_gt()
-- --------------------------------------------------------------------------
create or replace function qtr_gt(
 qtr1 quarter,
 qtr2 quarter)
 returns boolean as
$BODY$
BEGIN
 return quarter_comp(qtr1, qtr2) > 0;
END;
$BODY$
 language plpgsql
 immutable leakproof strict;


After the functions to carry out the arithmetic and comparison operations have been defined, the operators themselves can be defined.

-- --------------------------------------------------------------------------
-- +
-- --------------------------------------------------------------------------
create operator +(
 procedure = qtr_add,
 leftarg = quarter,
 rightarg = integer,
 commutator = +);

-- --------------------------------------------------------------------------
-- -
-- --------------------------------------------------------------------------
create operator -(
 procedure = qtr_diff,
 leftarg = quarter,
 rightarg = quarter);


-- --------------------------------------------------------------------------
-- <
-- --------------------------------------------------------------------------
create operator <(
 procedure = qtr_lt,
 leftarg = quarter,
 rightarg = quarter,
 commutator = >,
 negator = >=);

-- --------------------------------------------------------------------------
-- <=
-- --------------------------------------------------------------------------
create operator <=(
 procedure = qtr_lte,
 leftarg = quarter,
 rightarg = quarter,
 commutator = >=,
 negator = >);

-- --------------------------------------------------------------------------
-- =
-- --------------------------------------------------------------------------
create operator =(
 procedure = qtr_eq,
 leftarg = quarter,
 rightarg = quarter,
 commutator = =,
 negator = <>);

-- --------------------------------------------------------------------------
-- <>
-- --------------------------------------------------------------------------
create operator <>(
 procedure = qtr_ne,
 leftarg = quarter,
 rightarg = quarter,
 commutator = <>,
 negator = =);

-- --------------------------------------------------------------------------
-- >
-- --------------------------------------------------------------------------
create operator >(
 procedure = qtr_gt,
 leftarg = quarter,
 rightarg = quarter,
 commutator = <,
 negator = <=);

-- --------------------------------------------------------------------------
-- >=
-- --------------------------------------------------------------------------
create operator >=(
 procedure = qtr_gte,
 leftarg = quarter,
 rightarg = quarter,
 commutator = >=,
 negator = <);


To allow "quarter" dates to be used in indexes, an operator class must be defined.

create operator class quarter_ops default
for type quarter using btree as
   operator 1  <,
   operator 2  <=,
   operator 3  =,
   operator 4  >=,
   operator 5  >,
   function 1  quarter_comp(quarter, quarter);

The same approach could be used to define a custom "month" data type.  The code for a "month" data type would be almost identical, with changes needed only to the data type name used and to the functions 'qtr_add()' and 'qtr_diff()'.

Sunday, May 22, 2016

Using a Makefile with a Database

A makefile is a convenient and efficient way to run a multi-step process that creates or modifies files.  However, when one step of that process is to extract data from a server-based database, make has no inherent ability to determine when the data in the database has last been changed, and therefore, whether the data extraction step needs to be re-run.

The following Python program (db_updated.py) addresses this problem. It updates the modification date of a specific flag file (named db_updated.flag by default), creating the file if it does not already exist. This code is written to be used with a PostgreSQL database, but can be easily modified for other databases.

The program requires that the database contain at least one table with a column that contains a revision date (and time), either for that table or for the database as a whole. Command-line options can be used to specify the table(s) to be checked, and the name of the column to be checked. If neither is specified, a default table name of "auditlog" is used, and a default column name of "rev_time" is used.

Command-line options and arguments are:
db_updated.py [-t table] [-c column] [-u user] server database

Multiple -t options may be specified; the latest revision date of any of the tables will be used to set the modification date of the db_updated.flag file. If a user name is specified, the program will prompt for a password.


db_updated.py:
import argparse
import psycopg2
import getpass
import os
import os.path
import time

FLAG_FILE = 'db_updated.flag'

def get_user(server_name, database_name):
 user_name = raw_input("User name for %s database %s: " % (server_name, database_name))
 return user_name

def get_password(server_name, database_name, user_name):
 passwd = getpass.getpass("Password for user %s on %s database %s: " % (user_name, server_name, database_name))
 return passwd

def clparser():
 desc_msg = "Create or update the  database flag file if the given database has been recently modified."
 parser = argparse.ArgumentParser(description=desc_msg)
 parser.add_argument('-u', '--db_user', dest='db_user', action='store')
 parser.add_argument('-t', '--table', dest='table', action='append')
 parser.add_argument('-c', '--column', dest='column', action='store')
 parser.add_argument('server', action='store')
 parser.add_argument('database', action='store')
 return parser

def main():
 parser = clparser()
 arg = parser.parse_args()
 if not os.path.exists(FLAG_FILE):
  open(FLAG_FILE, 'a').close()
 if arg.db_user:
  db_user = arg.db_user
  pw = get_password(arg.server, arg.database, db_user)
  conn = psycopg2.connect(host=arg.server, database=arg.database, port=5432, user=unicode(db_user), password=unicode(pw))
 else:
  conn = psycopg2.connect(host=arg.server, database=arg.database, port=5432)
 curs = conn.cursor()
 column = arg.column or "rev_time"
 if arg.table:
  tables = 0
  sql = ''
  for t in arg.table:
   if tables > 0:
    sql = sql + " union "
   tables += 1
   sql = sql + "select max(%s) as latest from %s" % (column, t)
  if tables > 1:
   sql = "select max(latest) from (%s) as all_t" % sql
  sql = sql + ";"
 else:
  sql = "select max(%s) from auditlog;" % column
 curs.execute(sql)
 revtime = time.mktime(curs.fetchone()[0].timetuple())
 conn.close()
 os.utime(FLAG_FILE, (revtime, revtime))

main()

This can be used in a makefile like the example below. This example extracts data from a database and then runs an R script to further summarize the data.  The input and output file names for the R script are shown on the command line here, though encoding them in the script itself may be more convenient in many cases than parsing the command line in R.

.PHONY: all

all: db_updated.flag data_summary.csv
 db_updated.py -u user_name server_name db_name

data_summary.csv: data.csv summary_stats.R
 Rscript --vanilla summary_stats.R data.csv data_summary.csv

data.csv: db_updated.flag get_data.sql
 execsql.py -tp -u user_name get_data.sql server_name db_name

db_updated.flag:
 db_updated.py -u user_name server_name db_name

In this example, neither a table name nor a column name are included as arguments to db_updated.py. These may be needed in some circumstances, or the program modified so that the defaults are appropriate for the circumstances in which it is used.

Saturday, April 23, 2016

A Threaded Tkinter Toplevel Console Window

The code below implements a GUI window for output display, such as might be used as a console to display status messages or other information from a running program. It is implemented using a threaded Tkinter Toplevel widget, and is designed to be used in a non-GUI command-line program. The display that it produces looks like this:


The Python code for this GUI console is:
class ConsoleUIError(Exception):
 def __init__(self, msg):
  self.value = msg
 def __repr__(self):
  return ("ConsoleUIError(%r)" % self.value)

class ConsoleUI(object):
 class TkUI(object):
  def __init__(self, kill_event, stop_update_event, msg_queue, status_queue, title=None):
   self.kill_event = kill_event
   self.stop_update_event = stop_update_event
   self.msg_queue = msg_queue
   self.status_queue = status_queue
   import Tkinter as tk1
   import ttk as ttk1
   self.win = tk1.Toplevel()
   self.status_msg = tk1.StringVar()
   self.status_msg.set('')
   self.win.title(title if title else "execsql console")
   console_frame = ttk1.Frame(master=self.win, padding="2 2 2 2")
   console_frame.grid(column=0, row=0, sticky=tk1.NSEW)
   self.textarea = tk1.Text(console_frame, width=100, height=25, wrap='none')
   # Status bar
   statusframe = ttk1.Frame(master=self.win)
   statusbar = ttk1.Label(statusframe, text='', textvariable=self.status_msg, 
    relief=tk1.RIDGE, anchor=tk1.W)
   statusbar.pack(side=tk1.BOTTOM, fill=tk1.X)
   statusframe.grid(column=0, row=1, sticky=tk1.EW)
   # Scrollbars
   vscroll = tk1.Scrollbar(console_frame, orient="vertical", command=self.textarea.yview)
   hscroll = tk1.Scrollbar(console_frame, orient="horizontal", command=self.textarea.xview)
   self.textarea.configure(yscrollcommand=vscroll.set)
   self.textarea.configure(xscrollcommand=hscroll.set)
   self.textarea.grid(column=0, row=0, sticky=tk1.NSEW)
   vscroll.grid(column=1, row=0, sticky=tk1.NS)
   hscroll.grid(column=0, row=2, sticky=tk1.EW)
   # Allow resizing
   self.win.columnconfigure(0, weight=1)
   self.win.rowconfigure(0, weight=1)
   console_frame.columnconfigure(0, weight=1)
   console_frame.rowconfigure(0, weight=1)
   # Kill on window close
   self.win.protocol("WM_DELETE_WINDOW", self.kill)
   # Display and center the window
   self.win.update_idletasks()
   m = re.match("(\d+)x(\d+)([-+]\d+)([-+]\d+)", self.win.geometry())
   wwd = int(m.group(1))
   wht = int(m.group(2))
   swd = self.win.winfo_screenwidth()
   sht = self.win.winfo_screenheight()
   xpos = (swd/2) - (wwd/2)
   ypos = (sht/2) - (wht/2)
   self.win.geometry("%dx%d+%d+%d" % (wwd, wht, xpos, ypos))
   self.win.grab_set()
   self.win._root().withdraw()
   self.update_id = self.win.after(200, self.update)
   self.win.wait_window(self.win)
  def kill(self):
   if self.update_id:
    self.win.after_cancel(self.update_id)
    self.update_id = None
   self.win.destroy()
   self.win.update_idletasks()
  def update(self):
   self.update_id = None
   while not self.msg_queue.empty():
    msg = self.msg_queue.get(False)
    self.textarea.insert('end', msg)
    self.textarea.see('end')
    self.msg_queue.task_done()
   while not self.status_queue.empty():
    msg = self.status_queue.get(False)
    self.status_msg.set(msg)
   if self.kill_event.is_set():
    self.kill()
   else:
    if not self.stop_update_event.is_set():
     self.update_id = self.win.after(200, self.update)
 def __init__(self, title=None):
  self.title = title
  self.msg_queue = Queue.Queue()
  self.status_queue = Queue.Queue()
  self.kill_event = threading.Event()
  self.stop_update_event = threading.Event()
  self.consolethread = None
  self.update_id = None
  # Start the local event loop in a thread.
  def openconsole():
   self.active = True
   self.ui = self.TkUI(self.kill_event, self.stop_update_event, self.msg_queue, 
    self.status_queue, self.title)
   # Deallocate the Tk object here to avoid the "main thread is not in main loop" error.
   self.ui = None
  self.consolethread = threading.Thread(target=openconsole)
  self.consolethread.start()
 def write(self, msg):
  self.active = self.consolethread and self.consolethread.is_alive()
  if not self.active:
   raise ConsoleUIError(msg)
  self.msg_queue.put(msg)
 def write_status(self, msg):
  self.active = self.consolethread and self.consolethread.is_alive()
  if not self.active:
   raise ConsoleUIError(msg)
  self.status_queue.put(msg)
 def deactivate(self):
  self.kill_event.set()
  if self.consolethread and self.consolethread.is_alive():
   self.consolethread.join()
  self.active = False
 def wait_for_user_quit(self):
  self.stop_update_event.set()
  if self.consolethread and self.consolethread.is_alive():
   self.consolethread.join()
  self.active = False


Because it runs in its own thread, this console can be used to display information produced by the main program or even several other separate processes.

The 'write()' method of the ConsoleUI object will write the given text at the end of the console display.  A status message can also be written, separately from the stream of text that is written in the main part of the window.  The console window can be closed directly from the program with the 'deactivate()' method, or the program can pause until the user closes the window by using the 'wait_for_user_quit()' method.

Other Tkinter widgets should not be activated in other threads while this GUI console window is open.

Tuesday, April 12, 2016

Calculating the Median in SQL Using Window Functions

Numerous approaches to calculating the median in SQL have been presented in books and online. Here's another one that is short and simple, and makes use of window functions. This example is specific to PostgreSQL, but could be adapted to any other DBMS that supports window functions. To illustrate this method, I'll use a set of test data in a table named median_test:

create table median_test (id text, value double precision);

insert into median_test 
    (id, value)
select
    id,
    value
from
    (select 
        generate_series(1,100) as num, 
        chr(cast(random()*10 as integer)+65) as id, 
        random() as value
    ) as dd;

To support the median calculation, two columns are added using window functions, one of which is the row number of the ordered values (but doubled), and the other of which is the total number of rows. The window functions allow these to be calculated for each data frame, which is determined here by values of the id column. In this example the additional columns are added in a temporary view, but this could instead be a subquery, a common table expression, or even an actual table.

create or replace temporary view d as
select
    id,
    value,
    row_number() over(partition by id order by value) * 2 as rownum2,
    count(*) over (partition by id) as rows
from 
    median_test;

The median calculation is then carried out by averaging either the two central values when there is an even number of values, or the one central value when there is an odd number of values.

select 
    id,
    avg(value) as median
from
    d
where 
    rownum2 in (rows, rows+1, rows+2)
group by
    id;

When there is an odd number of rows, the single median value will have a value of rownum2 equal to rows+1. When there is an even number of rows, the two central rows will have values of rownum2 equal to either rows or rows+2.

Friday, December 4, 2015

Monday, November 23, 2015

execsql

execsql is a Python application that reads a text file of SQL commands and executes them against a database. The supported databases are:
  • PostgreSQL
  • MS-Access
  • MS-SQL Server
  • SQLite
  • MySQL or MariaDB
  • Firebird
  • ODBC DSN connections.
In addition to executing SQL statements, execsql also implements a number of metacommands that allow:
  • Import of data from text files and OpenDocument spreadsheets
  • Export of data to delimited text, HTML, JSON, LaTeX tables, and OpenDocument spreadsheets
  • Copying of data between databases, even of different DBMS types
  • Conditional execution of SQL code and metacommands based on data values or user input.

execsql allows variables to be defined and used to dynamically modify SQL code and metacommands. An INCLUDE metacommand can be used to modularize SQL scripts and facilitate code re-use. Variables can be used to parameterize included scripts. Conditional inclusion of scripts can be used to implement loops. Automatically incrementing counter variables can be used to control loops or generate unique values to be used in input or output.

execsql is fundamentally a command-line application that can be used to automate database operations, either by itself or as part of a toolchain that may include other steps to either pre-process data before loading, or post-process or analyze data that are extracted from a database. However, execsql also has the ability to present data to the user in dialog boxes, request several types of input from the user either on the terminal or in GUI dialogs, and display messages to the user either on the terminal or in a GUI console. These features allow flexible interactive applications to be created.

A guiding principle in the development of execsql is documentation of all data operations. The use of SQL scripts, rather than ad-hoc interactive operations in a GUI, is key to meeting that goal. In addition, execsql logs all usage information, including databases used, script files executed (including script file version information), variable assignments, user input, and errors. Chain-of-custody procedures are used in some disciplines to ensure traceability from collection through (typically) laboratory analysis. However, no formal chain of custody procedures ordinarily are applied to data during or after entry into a database. SQL script files and execsql log files can be used to produce a complete record of data operations so that traceability of data can be extended into the database environment.

execsql and its documentation are available from the Python Package Index (PyPI). Note that the search functionality in PyPI and in pip search is broken, and will return links to obsolete versions of execsql that are no longer available. The direct link will take you to the latest available version.