A makefile is a convenient and efficient way to run a multi-step process that creates or modifies files. However, when one step of that process is to extract data from a server-based database, make has no inherent ability to determine when the data in the database has last been changed, and therefore, whether the data extraction step needs to be re-run.
The following Python program (db_updated.py) addresses this problem. It updates the modification date of a specific flag file (named db_updated.flag by default), creating the file if it does not already exist. This code is written to be used with a PostgreSQL database, but can be easily modified for other databases.
The program requires that the database contain at least one table with a column that contains a revision date (and time), either for that table or for the database as a whole. Command-line options can be used to specify the table(s) to be checked, and the name of the column to be checked. If neither is specified, a default table name of "auditlog" is used, and a default column name of "rev_time" is used.
Command-line options and arguments are:
Multiple -t options may be specified; the latest revision date of any of the tables will be used to set the modification date of the db_updated.flag file. If a user name is specified, the program will prompt for a password.
This can be used in a makefile like the example below. This example extracts data from a database and then runs an R script to further summarize the data. The input and output file names for the R script are shown on the command line here, though encoding them in the script itself may be more convenient in many cases than parsing the command line in R.
In this example, neither a table name nor a column name are included as arguments to db_updated.py. These may be needed in some circumstances, or the program modified so that the defaults are appropriate for the circumstances in which it is used.