Top 10 Reasons to Impute

If a chemical in an environmental sample comes back as a non-detect, what is an environmental professional to do? A very common practice is to substitute ½ the detection limit for the concentration and move on with the statistical analysis. Most environmental practitioners know there are alternative ways to treat values below the detection limit, but those alternatives seem complex, and practitioners who don’t understand how the values are imputed don’t trust the process.

Replacing ½ DL substitutions with imputed values is an important step if you plan to do statistical analysis on your data. Repeated values in a left-censored dataset are not ideal and can cause problems in many statistical analyses. Statvis has a solution for this issue, but in the meantime, here is why you should impute values below your analytical detection limits. This is especially important for sparse datasets such as per- and polyfluoroalkyl substances (PFAS) analyses. In PFAS data, many compounds are detected infrequently, but their concentrations are correlated with those of related compounds in the same family. That correlation makes imputation a predictable and robust way to replace values that are below the analytical detection limits.
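To make the repeated-values problem concrete, here is a minimal Python sketch. The compound, detection limit, concentrations, and the helper name substitute_half_dl are all hypothetical and chosen only for illustration. It shows how ½ DL substitution stacks every non-detect on a single number.

```python
from collections import Counter

# Hypothetical results for one compound in ng/L; None marks a non-detect.
DETECTION_LIMIT = 2.0
results = [5.1, None, 3.4, None, None, 8.9, None, 2.6]

def substitute_half_dl(values, dl):
    """Replace each non-detect (None) with half the detection limit."""
    return [dl / 2 if v is None else v for v in values]

substituted = substitute_half_dl(results, DETECTION_LIMIT)

# Count how often each value occurs to expose the ties substitution creates.
most_common_value, n_ties = Counter(substituted).most_common(1)[0]

print(substituted)
print(f"{n_ties} of {len(substituted)} values are tied at {most_common_value}")
```

In this made-up dataset, half of the values end up identical. Ties like these are what cause trouble for rank-based tests, variance estimates, and correlation analyses.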

Here are 10 clear reasons environmental professionals should impute values below analytical detection limits (DLs), rather than substituting arbitrary values like half the DL:

Why Imputation Is Superior to ½ DL Substitution:

1. Improved Data Accuracy

  • Imputation statistically estimates values below the DL using relationships within the data, providing realistic estimates. Substituting ½ DL assigns arbitrary values that do not reflect actual conditions.

2. Preservation of Data Relationships

  • Imputation preserves correlations, covariances, and multivariate relationships within datasets, ensuring subsequent analyses (e.g., PCA, receptor modeling, regression analysis) remain meaningful.

3. Reduced Analytical Bias

  • Arbitrary substitution introduces systematic biases (positive or negative), skewing statistical interpretations. Imputation methods reduce such biases by using the inherent structure and variability of the dataset.

4. Enhanced Statistical Power

  • Imputation techniques prevent unnecessary data censoring, retaining more complete datasets. Larger effective sample sizes increase statistical power and confidence in results.

5. Better Regulatory and Legal Defensibility

  • Regulatory agencies and courts increasingly favor scientifically defensible approaches. Imputation and censored-data methods, such as multiple imputation by chained equations (MICE) and the Kaplan-Meier estimator, provide scientifically grounded alternatives that hold up better in litigation or under regulatory scrutiny.

6. Improved Source Identification and Fingerprinting

  • Environmental fingerprinting relies on precise data patterns. Imputation maintains these patterns, facilitating clearer source apportionment and contaminant identification compared to arbitrary substitutions.

7. Higher Quality Decision-Making

  • Better estimates of below-DL data lead to higher confidence in site assessments, contaminant risk evaluations, and remediation decisions, ultimately improving environmental management and liability allocation.

8. Better Handling of Multivariate Data

  • Modern imputation approaches explicitly handle multiple correlated contaminants simultaneously, producing internally consistent and logically coherent datasets, critical for multivariate statistical modeling.

9. Increased Predictive Reliability

  • Imputation methods, especially advanced methods like MICE or k-nearest neighbors (KNN), use the available data to predict censored values robustly, improving subsequent predictive modeling outcomes compared to simple substitution.

10. Enhanced Credibility and Transparency

  • A well-documented imputation workflow states its assumptions and methods explicitly, fostering transparency and trust among stakeholders, clients, regulators, and the public.
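As a rough illustration of several of the reasons above, the following Python sketch imputes non-detects for one compound from a correlated compound, using a single log-log regression step. It is not full MICE, which iterates regressions like this across many variables and draws multiple imputations, and it is not any particular product's method. The data, the detection limit, and the function names fit_log_line and impute are hypothetical. Fitting only on detected pairs also ignores the censoring itself, so treat this as a teaching sketch rather than a defensible workflow.

```python
import math

DL = 1.0  # hypothetical detection limit for the target compound (ng/L)

# Hypothetical (predictor, target) concentrations in ng/L for two correlated
# compounds. A target of None is a non-detect, known only to be below DL.
pairs = [(20.0, 4.2), (10.0, 1.9), (15.0, 3.1), (40.0, 7.8),
         (3.0, None), (1.5, None), (8.0, None)]

def fit_log_line(xs, ys):
    """Ordinary least squares of log(y) on log(x); returns (slope, intercept)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

def impute(pairs, dl):
    """Return (value, method) per sample; detected values pass through."""
    detected = [(x, y) for x, y in pairs if y is not None]
    slope, intercept = fit_log_line(*zip(*detected))
    out = []
    for x, y in pairs:
        if y is not None:
            out.append((y, "measured"))
        else:
            predicted = math.exp(intercept + slope * math.log(x))
            # A non-detect is known to lie below the DL, so cap the estimate.
            out.append((min(predicted, dl), "regression"))
    return out

imputed = impute(pairs, DL)
for (x, _), (value, method) in zip(pairs, imputed):
    print(f"predictor={x:5.1f}  target={value:.2f}  ({method})")
```

Unlike ½ DL substitution, each non-detect receives its own estimate that follows the relationship between the two compounds. The method label stored beside each value also records which results were measured and which were estimated.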

Summary of Benefits:

  • Scientific robustness

  • Reduced bias

  • Preservation of data structure

  • Enhanced statistical confidence

  • Improved defensibility in legal/regulatory contexts

  • Better decisions from more complete data

In short, imputing environmental data instead of substituting arbitrary values greatly enhances the scientific quality, credibility, and utility of the resulting data analyses.

At Statvis, we have developed an easy-to-use process that automatically imputes your data, using either multiple techniques or our approach that applies the best techniques in sequence to your dataset. All original data is stored for future review, and our tables show which technique was used to impute each value. The display is graphical, so it is easy to see how your data was treated.

Book a demo today for more information.
