The database schema used to define your tables and the records they contain have a direct affect on the integrity of future results. In other words, errors and inconsistencies in the result sets derived from your database may have started before any entries into the system were ever made – at the design stage!
Data integrity begins with the manner in which it is defined and stored.
Discussion by Example:
A manufacturing facility utilizes a simple Work Order / Job Number driven system where employee labour tickets are used to report daily production results. The data from the production reports is then entered into an ERP system for processing.
Each report includes fields for Employee Name, Clock #, Shift, and Date, followed by columns for separate line entries as follows::
- Start and Finish Times
- Work Center / Machine
- Part Number
- Job Number
- Step Number
- Quantities Produced: Good, Scrap, Hold
- Downtime Codes
A separate row entry is used to record production as work is assigned to the employee over the course of their shift.
The labour tickets are entered into the ERP system where labour and machine hours are calculated. The raw and calculated fields are then saved and the applicable database tables are updated accordingly.
Many good books on database design and SQL queries strongly advise against using or storing calculated results in the record / row fields of a database table. There are at least three reasons for this:
A calculated field only stores the result of the calculation, NOT the formula that created it. Any errors or changes to the formula will require re-processing of all the records in the database to determine the “correct value”.
The “data type” used to store the calculated result may lose precision or level of accuracy when represented as a binary value and / or converted to a ‘character’ string. Use a type that best represents the original form of the raw data to avoid errors due to rounding or data precision constraints introduced by the data type.
For example, 10 minutes can be stored as an integer value without loss of precision or accuracy. However, when converted to hours, 10 / 60 becomes 0.16666… Now consideration must be given to the degree of precision required to store the number to minimize potential side effects on future results (0.17, 0.167, 0.1667). To maintain data integrity, storing the original form of the raw data is preferred and is considered a best practice.
Although some errors may appear to be incidental or negligible, side effects can be both compound and cumulative. To be 99% correct on 3 calculations is to be 97% correct on the final result – Compounding Errors > 99% x 99% x 99% = 97. The cumulative effect of rounding increases as the records in included in the data set increase.
From our example above, storing hours to 3 decimal places (0.167 hours = 10.02 minutes) introduces a 0.2% margin of error. While this may be negligible for individual calculations, the cumulative effect over 100,000 data records may be considered significant.
Factors critical to retaining the integrity of the unique data elements stored in your database tables include:
- the data type used to represent the stored value in the table (integer, float, date, time, boolean, text …),
- stored value – raw versus calculated,
- the resolution of the measurement value – precision
- measurement error – every measurement system is prone to error. Even time requires a correction on occasion from single second adjustments to leap years.
Going back to our example, the start and finish times are recorded using a 24-hour format where 0600 = 6:00 am, 1200 = noon, 1800 = 6:00 pm, and 0000 = midnight.
Since times can be recorded to the nearest minute, we can represent the number of minutes between start and finish times using an integer data type. Any span of time can be determined or derived with the same precision or accuracy as the original raw data.
Error Due to Rounding:
If an employee works from 06:00 to 08:30, breaks for 20 minutes, and resumes working from 08:50 to 11:30 we can accurately determine the number of minutes worked as follows:
- Working Time (Minutes) = (08:30 – 06:00) + (11:30 – 08:50) = 150 + 160 = 310 Minutes
The result of the calculation is a whole number and can be represented by an integer data type. To convert from minutes to hours introduces an opportunity for error due to rounding. Whenever possible, always perform conversions at the last possible step in the calculation process to minimize the effect of rounding.
The question is, “What degree of “precision” do we need to provide an “accurate” result?”Converting our time from minutes to hours will depend on the number of decimal places needed to express a meaningful result:
Decimals 310 / 60 = Result Reversed x 60 Variance (Minutes)
0 5 300 - 10.0
1 5.2 312 + 2.0
2 5.17 310.2 + 0.2
3 5.167 310.02 + 0.02
4 5.1667 310.002 + 0.002
The variance column demonstrates the effect of rounding on calculated results. The commercial database in our case is storing hours worked to three decimal places.
All working hours that span across a break period of 20 minutes are subject to an increase of 0.02 minutes based on the precision of the stored calculated value.
Consider that the same conversion process applies to lost time events including but not limited to: setup, material changes, and downtime faults (machine / equipment / tooling).