tag:blogger.com,1999:blog-10153936081887772202024-03-13T09:30:42.247-07:00Splinter of the SingularityR.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.comBlogger38125tag:blogger.com,1999:blog-1015393608188777220.post-38587075493663804302023-09-24T13:29:00.001-07:002023-09-24T13:30:15.261-07:00Median and Quartiles by Group in SQLiteData distributions are often characterized by their range mean, median, and quartiles. For data sets in SQLite, the range (min and max) and mean are easily calculated with built-in aggregate functions. Calculation of quartiles (the first quartile, the median, and the third quartile) require custom SQL code, as there are no built-in aggregate functions for these values. This R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-25477796122451449132023-09-04T20:35:00.004-07:002023-09-15T23:02:47.325-07:00Why Don't People Ask More Questions?More than what? said a person who asks a lot of questions.
This is not a new question. There are numerous summaries of reasons why people don't ask questions, and suggestions for how one can become a better question-asker. A lot of existing discussion is focused on general conversations; the focus here is specifically on technical conversations in professional settings, where the goal is R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-89808926788136779962023-08-30T11:19:00.003-07:002023-08-30T11:21:32.800-07:00Calculating with Significant Digits in PostgreSQLData that are stored in databases may be measurement results that are inherently imprecise. The precision of a measurement is often represented by the number of significant digits associated with the measurement. Calculations performed with such measurements should preserve, or propagate, the significant digits so that the precision of the calculated result is also represented appropriately.
R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-54306810434797054062021-10-02T10:35:00.000-07:002021-10-02T10:35:22.221-07:00Are We in the Twilight Years of Programming Language Development?The following figure shows the number of new programming languages created per year since 1970. The data are from the Wikipedia page Timeline of Programming Languages.There's a consistent decline since the mid-'90s. Have most of the best programming languages already been developed, so we don't need any more?R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-44634097159868572602021-08-28T11:12:00.009-07:002021-08-29T05:01:44.933-07:00Data Transformation Operations and a Standard TerminologyData standardization, normalization, rescaling, and transformation: what they are and when to apply themThere are several different types of operations that are commonly applied to data before further analysis. These operations are applied to individual data values, but are typically based on the attributes of a group of data values. In a data matrix where columns are variables and R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-62609799429762419372021-04-18T19:20:00.001-07:002021-08-29T05:03:45.644-07:00Data Management Best PracticesThere are a lot of web pages that present some interpretation of data management best practices. Most of them are, in my opinion, uselessly general. Herewith, then, is my list of specific recommended best practices. 1. Know the use of the data a. For the overall project b. For each data summary request2. Use a single authoritative data store.3R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-74675785487916300712021-03-21T07:28:00.003-07:002021-03-21T07:36:21.736-07:00Crosstabbing Data in Postgres with an Indefinite Number of ColumnsPostgreSQL's tablefunc extension provides the crosstab function, which can be used to transform normalized data into a crosstabbed format. However, one of the drawbacks of the crosstab function is that it requires that the names of all of the column headers be listed in one of its arguments. In some cases (actually, pretty much every case when I have reason to use crosstab), the R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-86322640329017768842021-03-17T03:38:00.003-07:002021-08-29T05:05:01.956-07:00Checklist for SQL Script Code ReviewThe following list of items focuses on points of potential weakness and improvement to consider during a code review of a SQL script.Do the header notes identify the script's purpose, author, and revision history?Is the code formatted for readability?Do comments accurately describe the purpose of functional blocks within the script?Are nullable columns used in joins without accounting for the R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-39336025056019298492020-12-05T10:47:00.001-08:002021-02-08T20:47:35.192-08:00Sequentially Writing Data to Zip Files with Python
The write() and writestr() methods of the ZipFile class in Python's zipfile library allow the addition of an entire file or a single string as a member file within a zip file. However, if a large amount of data will be generated dynamically, these methods do not allow separate data items to be written sequentially to a member file within a zip file.The StreamableZipfile class in the R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-72689228045490713602020-09-12T05:11:00.000-07:002021-08-29T05:06:17.829-07:00The Three Rules of Budgeting1. If it's a task that you know how to do, you're going to under-budget it.
2. If it's a task that you don't know how to do, you're going to under-budget it by a lot.
3. The client always follows Rule 2.
R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-20207761727180227832020-06-15T18:23:00.000-07:002020-06-25T20:55:53.154-07:00Producer-Consumer and Other Algorithms for Import of a CSV File to a Database, in PythonThis post presents a comparison of the performance of several different algorithms for import of data from a CSV file into a database. Timings are presented for six different algorithms, run on both PostgreSQL (10.12) and MariaDB (10.3.23).
Import of data into a database is a process that is amenable to application of a producer-consumer algorithm because the steps of reading data from the CSV R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-56854746834580990062020-03-01T12:18:00.000-08:002020-03-01T12:18:10.420-08:00Automated Comparison of Data in Base and Staging TablesWhen new data have been received and prepared for merging into existing database tables, a review of the changes to be made is often warranted prior to merging the data. A set of execsql.py scripts to compare data in base and staging tables is available at https://osdn.net/projects/execsql-compare/. These scripts use the information_schema views of the database to identify primary keyR.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-35305996777189899272020-01-17T07:06:00.001-08:002020-01-17T07:06:52.045-08:00Code Repositories Migrated to OSDNBecause Bitbucket will be dropping support for Mercurial, and because I prefer Mercurial to Git, I am migrating all of my open-source repositories to OSDN. Documentation that is currently hosted on ReadTheDocs (RTD) will also be hosted on OSDN because RTD cannot clone documentation sources from OSDN.
Projects now hosted on OSDN are:
execsql
execsql-upsert
execsql-glossary
tklayout
tkpane
R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-25901284027275823692019-09-18T20:52:00.000-07:002019-09-22T09:55:36.040-07:00Python DB-API 'type_code' ValuesThe Python DB-API specifies seven cursor attributes, the second of which is a type_code that should describe the data type of the column. The DB-API specifies a set of type objects that should be used for the type_code value. However, different DBMSs report various types of values for the type codes. The type of information provided as the type_code by several different DBMSs, R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-39073875069311194252019-05-11T07:41:00.000-07:002019-06-04T08:40:18.208-07:00Simplified Creation of a Data Summary GlossaryData summaries or tables that are exported from a database may use column names that are abbreviated and possibly unclear to some users of the data. Data exports may also include coded values that are not fully defined within the data table itself. Definitions of those terms can be helpful to users of the data. One way of providing those definitions is to produce a custom glossary to R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-45399928975327547662019-03-28T06:52:00.001-07:002021-08-29T05:25:35.724-07:00Aphorism of the Day"I used a standard technique" is just another way to say "I didn't think at all."
R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-21361894883511295422019-03-10T10:30:00.000-07:002019-03-10T10:38:31.138-07:00Automation of Upsert Operations with QA Checks and LoggingScripts that extend the data merging technique illustrated in the post Driving Data Table Merges from the Information Schema add the following capabilities:
The upsert operation can be applied to multiple tables simultaneously, and will execute SQL update and insert statements in top-down order to maintain referential integrity among the tables.
Prior to updating data in the base tables with R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-43025281540869545002019-01-27T08:53:00.000-08:002019-01-27T09:22:08.860-08:00Retrieving Foreign Key Columns from PostgreSQL's Information SchemaThere seems to be no readily available resource on the internet that correctly describes how to use Postgres' information_schema to retrieve a table of corresponding column names from the two tables making up a foreign key relationship. Therefore, code to produce such a table is shown below.
In the output, the columns named with a "uq" prefix refer to the table with a unique (e.g., primary) keyR.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-83646590951013481032019-01-20T11:04:00.001-08:002019-07-18T11:49:27.898-07:00Driving Data Table Merges from the Information SchemaWhen loading data into multiple tables of a database, it may be necessary to execute UPDATE and INSERT statements for numerous tables. This task can be simplified if the data that are to be loaded are staged in tables that have the same structure as the base tables that are the targets of the data merge, and if the system catalog can be queried to provide information on table structures and keysR.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-80922501765974895192018-08-18T13:34:00.001-07:002018-08-18T13:34:50.856-07:00UnboundLocalError in Python 3 When Using an Exception AliasWhile modifying execsql to run under Python 3 as well as Python 2, I encountered a version compatibility issue that I did not find described in any readily available online documentation for 2-to-3 conversion. It is:
In Python 3, when an exception is aliased to the same name as an existing variable, that variable is destroyed, and subsequent references to the variable will result in an R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-72947017868137338312018-03-16T16:00:00.000-07:002021-08-29T05:27:14.403-07:00Tech Support AllegorySo I go to my doctor, and I say “Doctor, when I stand up, I get this pain in my chest.” And the doctor says “Well, let’s see about that. Does this happen when you stand up from a sitting position?”
“Yes,” I say.
So the doctor sits down in a chair and then stands up. “Hm,” says the doctor, “I don’t feel anything. There’s no problem as far as I can tell.” And he walks R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-735457371965388122018-02-04T10:35:00.000-08:002018-02-04T10:35:41.523-08:00Building Tkinter Interfaces with Reusable Components from the TkPane LibraryBuilding graphical user interfaces (GUIs) with Python's Tkinter library usually requires a bit of fiddling not only to get the visual layout correct, but also to link the various Tkinter widgets so that they interact properly. Common interactions among widgets include enabling or disabling one based on the state of another, and accessing the data in one widget from another widget. The network R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-26952556070500314472018-01-28T14:48:00.000-08:002018-01-29T08:22:21.460-08:00Building Tkinter Interfaces from the Inside OutGraphical user interfaces (GUIs) ordinarily consist of a number of different elements, such as those illustrated in Figure 1. Some GUI development toolkits, specifically including Python's Tkinter module, require these elements to be assembled from the outside in. That is, the outermost frame must be defined first, and that first frame then populated with any container elements (e.g., R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-77282239854931265902017-12-09T11:13:00.000-08:002018-01-29T08:25:17.509-08:00Ordering Database Tables by Foreign Key DependenciesSome database operations that affect multiple tables must touch the affected tables in an order corresponding to their dependencies. For example, data must be loaded into parent tables before data can be loaded into child (dependent) tables. Some deletion and update operations may also need to be carried out in dependency order. A list of all tables in dependency order can be useful to R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0tag:blogger.com,1999:blog-1015393608188777220.post-65497838822808385342016-11-19T11:31:00.000-08:002016-11-19T11:41:27.727-08:00A Year and Quarter Data Type for PostgreSQLFor some applications, time periods are needed that represent quarters of the year. Although quarters could be represented by using an exact date data type and constraining the month and day values (i.e., to the first day of each quarter) or by using date ranges in PostgreSQL, these representations do not allow easy computation of the difference between two dates (in quarters) or the R.D.Nielsenhttp://www.blogger.com/profile/14491881604655027255noreply@blogger.com0