The following Python program (db_updated.py) addresses this problem. It updates the modification date of a specific flag file (named db_updated.flag by default), creating the file if it does not already exist. This code is written to be used with a PostgreSQL database, but can be easily modified for other databases.
The program requires that the database contain at least one table with a column that contains a revision date (and time), either for that table or for the database as a whole. Command-line options can be used to specify the table(s) to be checked, and the name of the column to be checked. If neither is specified, a default table name of "auditlog" is used, and a default column name of "rev_time" is used.
Command-line options and arguments are:
db_updated.py [-t table] [-c column] [-u user] server database
Multiple -t options may be specified; the latest revision date of any of the tables will be used to set the modification date of the db_updated.flag file. If a user name is specified, the program will prompt for a password.
db_updated.py:
import argparse import psycopg2 import getpass import os import os.path import time FLAG_FILE = 'db_updated.flag' def get_user(server_name, database_name): user_name = raw_input("User name for %s database %s: " % (server_name, database_name)) return user_name def get_password(server_name, database_name, user_name): passwd = getpass.getpass("Password for user %s on %s database %s: " % (user_name, server_name, database_name)) return passwd def clparser(): desc_msg = "Create or update the database flag file if the given database has been recently modified." parser = argparse.ArgumentParser(description=desc_msg) parser.add_argument('-u', '--db_user', dest='db_user', action='store') parser.add_argument('-t', '--table', dest='table', action='append') parser.add_argument('-c', '--column', dest='column', action='store') parser.add_argument('server', action='store') parser.add_argument('database', action='store') return parser def main(): parser = clparser() arg = parser.parse_args() if not os.path.exists(FLAG_FILE): open(FLAG_FILE, 'a').close() if arg.db_user: db_user = arg.db_user pw = get_password(arg.server, arg.database, db_user) conn = psycopg2.connect(host=arg.server, database=arg.database, port=5432, user=unicode(db_user), password=unicode(pw)) else: conn = psycopg2.connect(host=arg.server, database=arg.database, port=5432) curs = conn.cursor() column = arg.column or "rev_time" if arg.table: tables = 0 sql = '' for t in arg.table: if tables > 0: sql = sql + " union " tables += 1 sql = sql + "select max(%s) as latest from %s" % (column, t) if tables > 1: sql = "select max(latest) from (%s) as all_t" % sql sql = sql + ";" else: sql = "select max(%s) from auditlog;" % column curs.execute(sql) revtime = time.mktime(curs.fetchone()[0].timetuple()) conn.close() os.utime(FLAG_FILE, (revtime, revtime)) main()
This can be used in a makefile like the example below. This example extracts data from a database and then runs an R script to further summarize the data. The input and output file names for the R script are shown on the command line here, though encoding them in the script itself may be more convenient in many cases than parsing the command line in R.
.PHONY: all all: db_updated.flag data_summary.csv db_updated.py -u user_name server_name db_name data_summary.csv: data.csv summary_stats.R Rscript --vanilla summary_stats.R data.csv data_summary.csv data.csv: db_updated.flag get_data.sql execsql.py -tp -u user_name get_data.sql server_name db_name db_updated.flag: db_updated.py -u user_name server_name db_name
In this example, neither a table name nor a column name are included as arguments to db_updated.py. These may be needed in some circumstances, or the program modified so that the defaults are appropriate for the circumstances in which it is used.
No comments:
Post a Comment