Top 10 Reasons to Impute
If a chemical in an environmental sample comes back as a non-detect, what is an environmental professional to do? A very common practice is to substitute ½ the detection limit for the concentration and move on with the statistical analysis. Most environmental practitioners know there are alternative ways to treat values below the detection limit, but those methods can seem complex, and practitioners who don’t understand how values are imputed tend not to trust the process.
Replacing ½ DL substitutions with imputed values is an important step if you plan to run statistical analyses on your data. Repeated values in a left-censored dataset are not ideal and can cause problems in many statistical analyses. Statvis has a solution for this issue, but in the meantime, here is why you should impute values below your analytical detection limits. This is especially important for sparse datasets such as those from per- and polyfluoroalkyl substances (PFAS) analysis. In PFAS datasets, many compounds are detected infrequently but are correlated with other members of their chemical family. That correlation makes imputation a predictable and robust way to replace values below the analytical detection limits.
Here are 10 clear reasons environmental professionals should impute values below analytical detection limits (DLs), rather than substituting arbitrary values like half the DL:
Why Imputation Is Superior to ½ DL Substitution:
1. Improved Data Accuracy
Imputation statistically estimates values below the DL using relationships within the data, providing realistic estimates. Substituting ½ DL assigns arbitrary values that do not reflect actual conditions.
2. Preservation of Data Relationships
Imputation preserves correlations, covariances, and multivariate relationships within datasets, ensuring subsequent analyses (e.g., PCA, receptor modeling, regression analysis) remain meaningful.
3. Reduced Analytical Bias
Arbitrary substitution introduces systematic biases (positive or negative), skewing statistical interpretations. Imputation methods reduce such biases by using the inherent structure and variability of the dataset.
4. Enhanced Statistical Power
Imputation avoids discarding or collapsing censored observations, so more of the dataset remains usable. Larger effective sample sizes increase statistical power and confidence in results.
5. Better Regulatory and Legal Defensibility
Regulatory agencies and courts increasingly favor scientifically defensible approaches. Methods such as multiple imputation by chained equations (MICE) and Kaplan-Meier estimation provide scientifically grounded alternatives that hold up better under litigation or regulatory scrutiny.
6. Improved Source Identification and Fingerprinting
Environmental fingerprinting relies on precise data patterns. Imputation maintains these patterns, facilitating clearer source apportionment and contaminant identification compared to arbitrary substitutions.
7. Higher-Quality Decision-Making
Better estimates of below-DL data lead to higher confidence in site assessments, contaminant risk evaluations, and remediation decisions, ultimately improving environmental management and liability allocation.
8. Better Handling of Multivariate Data
Modern imputation approaches explicitly handle multiple correlated contaminants simultaneously, producing internally consistent and logically coherent datasets, critical for multivariate statistical modeling.
9. Increased Predictive Reliability
Imputation methods, especially advanced ones such as MICE or k-nearest neighbors (KNN), use the available data to predict censored values robustly, improving subsequent predictive modeling compared with simple substitution.
10. Enhanced Credibility and Transparency
A documented imputation workflow makes its assumptions and techniques explicit, fostering transparency and trust among stakeholders, clients, regulators, and the public.
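To make the contrast concrete, here is a minimal Python sketch comparing ½ DL substitution with one simple imputation technique on simulated data. It is an illustration only, not the Statvis workflow. The technique used, regression on order statistics (ROS), is not discussed above; it is swapped in because it fits in a few lines for a single analyte with a single detection limit. The function name `ros_impute`, the simulated lognormal concentrations, the DL of 0.5, the Blom plotting positions, and the cap at the DL are all assumptions made for this example.

```python
import numpy as np
from statistics import NormalDist


def ros_impute(values, detected, dl):
    """Simplified ROS for ONE analyte with ONE detection limit (hypothetical helper).

    Assumes every detected value is >= dl, so the non-detects occupy the
    lowest ranks of the sorted dataset, and that at least two values are detected.
    """
    values = np.asarray(values, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    n = values.size
    k = int((~detected).sum())  # number of non-detects

    # Blom plotting positions for ranks 1..n, converted to normal quantiles.
    ranks = np.arange(1, n + 1)
    pp = (ranks - 0.375) / (n + 0.25)
    z = np.array([NormalDist().inv_cdf(p) for p in pp])

    # Fit log(concentration) against normal quantiles using detected values only.
    det_sorted = np.sort(values[detected])
    slope, intercept = np.polyfit(z[k:], np.log(det_sorted), 1)

    # Predict values for the k lowest ranks; cap at the DL as a safeguard (assumption).
    imputed = np.minimum(np.exp(intercept + slope * z[:k]), dl)

    out = values.copy()
    out[~detected] = imputed  # order among non-detects is arbitrary in this sketch
    return out


# Simulated "true" concentrations (made-up data, for illustration only).
rng = np.random.default_rng(0)
true = rng.lognormal(mean=0.0, sigma=1.0, size=200)
dl = 0.5
detected = true >= dl

reported = np.where(detected, true, dl)      # what the lab would report
half_dl = np.where(detected, true, dl / 2)   # common 1/2 DL substitution
ros = ros_impute(reported, detected, dl)     # simple ROS imputation

for name, x in [("true", true), ("1/2 DL", half_dl), ("ROS", ros)]:
    n_unique = np.unique(x[~detected]).size
    print(f"{name:>7}: mean(log)={np.log(x).mean():.3f}  "
          f"sd(log)={np.log(x).std(ddof=1):.3f}  "
          f"distinct values among non-detects={n_unique}")
```

In this sketch, every non-detect in the ½ DL column takes the same repeated value, while the ROS column spreads the non-detects across a range of values at or below the DL. The printed log-scale means and standard deviations let you compare each treatment against the simulated truth. A multivariate method such as MICE or KNN would additionally use correlated analytes, which this single-analyte example does not attempt.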
Summary of Benefits:
Scientific robustness
Reduced bias
Preservation of data structure
Enhanced statistical confidence
Improved defensibility in legal/regulatory contexts
Better decisions from more complete data
In short, imputing environmental data instead of substituting arbitrary values greatly enhances the scientific quality, credibility, and utility of the resulting data analyses.
At Statvis, we have developed an easy-to-use process that automatically imputes your data, either with any of several techniques or with our approach, which applies the best techniques in sequence to your dataset. All original data are stored for future review, and our tables show which technique was used to impute each value. The graphical presentation makes it easy to see how your data were treated.
Book a demo today for more information.