Sunday, April 18, 2021

Data Management Best Practices

Many web pages present some interpretation of data management best practices. Most of them are, in my opinion, uselessly general. Herewith, then, is my list of specific recommended practices.

1. Know how the data will be used.
    a. For the overall project
    b. For each data summary request

2. Use a single authoritative data store.

3. Back up important data.

4. Verify or characterize data quality.

5. Control access to sensitive data.

6. Track data changes.

7. Document data management practices.
    a. Default practices
    b. Project- or data-set-specific

8. Preserve original data.

9. Script all data revisions and data summarizations.

10. Use version control for things that may change.
    a. Scripts
    b. Regular or periodic data exports

11. Record metadata.
    a. For both incoming and outgoing data
    b. Metadata includes:
        1. Provenance: Who created, provided, or produced the data.
        2. Content: What the data set contains.
        3. Purpose: What the data are intended to be used for.
        4. Method: How the data were generated or selected and summarized.
        5. History: The history of any revisions made to the data or the data summarization method.
    c. Forms of metadata include:
        1. Copies of emails or other documents that transmit data or request data.
        2. Header notes in scripts.
        3. Metadata pages and glossary pages in data summaries.
        4. Custom log files created by scripted data operations.

12. Date-tag directory and file names where the sequence of changes may affect the validity or interpretability of their contents.
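
As one possible illustration of items 11 and 12, here is a minimal Python sketch of a scripted data operation that date-tags its output file name and writes a custom log file holding the five kinds of metadata listed above. The function names, the JSON log format, the `.log.json` suffix, and the added checksum are all my assumptions for the sake of the example, not a prescribed layout; adapt them to whatever conventions a project already documents.

```python
import datetime as dt
import hashlib
import json
from pathlib import Path


def date_tagged_name(stem: str, suffix: str, when: dt.date) -> str:
    """Build a file name with an ISO date tag (hypothetical naming scheme).

    ISO dates (YYYY-MM-DD) sort chronologically when sorted as text.
    """
    return f"{stem}_{when.isoformat()}{suffix}"


def write_metadata_log(data_file: Path, provenance: str, content: str,
                       purpose: str, method: str, history=None) -> Path:
    """Write a JSON log beside data_file and return the log's path.

    The record layout is an assumption made for illustration only.
    """
    record = {
        "file": data_file.name,
        # A checksum lets a later reader confirm the file is unchanged.
        "sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "provenance": provenance,
        "content": content,
        "purpose": purpose,
        "method": method,
        "history": list(history or []),
        "logged_at": dt.datetime.now().isoformat(timespec="seconds"),
    }
    log_path = data_file.with_name(data_file.name + ".log.json")
    log_path.write_text(json.dumps(record, indent=2))
    return log_path


if __name__ == "__main__":
    import tempfile

    # Demonstrate in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        name = date_tagged_name("summary", ".csv", dt.date.today())
        export = Path(tmp) / name
        export.write_text("site,value\nA,1\n")
        log = write_metadata_log(
            export,
            provenance="Example provider (hypothetical)",
            content="One row per site with a single value",
            purpose="Illustration only",
            method="Written by this script",
        )
        print(log.read_text())
```

Keeping the log beside the export, under the same date-tagged name, means the metadata travels with the file when the directory is copied or archived.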