Usage of Low-Quality Data: Evidence from Quality-Quantity Trade-Off

Published: September 25, 2025
Abstract

In recent years, low-cost sensors have opened new opportunities for environmental monitoring, offering cost-efficiency, high temporal resolution, massive data generation, and flexible deployment. However, limited data quality and technically complex preprocessing requirements persist as challenges. They hinder the formation of actionable insights and create barriers to widespread adoption, so that much of the collected data goes to waste. The quality-quantity trade-off may provide a novel pathway to harness the potential of low-quality data. This relationship has been studied extensively in contexts such as monitoring network optimization, ecosystem service interactions, and socio-economic systems, but remains unexplored for low-quality data. Our research aims to demonstrate the practical value of such data by elucidating this quality-quantity trade-off. Using water quality parameters in Lake Taihu as a case study, we conducted a numerical simulation-based investigation. First, we systematically introduced error perturbations (Strategy 1: 0 to 50% and Strategy 2: ±0 to 50% error ranges, each at 10% intervals) and varied the density (five gradient configurations) to generate numerous error-induced fields. The evaluation framework combined simulation fidelity metrics with entropy analysis to assess data utility under the different error propagation scenarios. Second, we employed Support Vector Regression (SVR) to model the nonlinear relationship between the system's Information Monitoring Efficiency (IE) and Absolute Error (AE), thereby dissecting the dynamic interplay between data quality and quantity. Finally, we constructed a quantitative framework to characterize the quality-quantity trade-off. The results showed the following. (1) Numerical simulation and entropy responded differently to the two strategies. System-level outputs were more stable than individual-level outputs: the maximum impact on individuals reached 18.4% in numerical simulation and 47.3% in information content. (2) From the fitted curves of the system's AE against IE, thresholds were identified at abrupt changes in the tangent slope of the curves. The threshold was 4.027 for Strategy 1 and 22.406 for Strategy 2. The relationship between data quality and quantity varied across scenarios, depending on the datasets and the fitted curves. (3) Using the fitted AE-IE curve and its ±5% percentiles, a methodology was established for formulating a quality-quantity trade-off relationship. When the curve and its percentiles crossed a specific y-axis line, the relationship could be established within the corresponding data interval. Conversely, when the curve and its percentiles lay on the same side of the line, a trade-off could only be discerned at individual data points.
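
As a rough illustration of the perturbation-and-entropy step described in the abstract, the following Python sketch applies two error strategies to a synthetic field and records the mean absolute error and the Shannon entropy of each perturbed sample. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the error distribution, the sign of the Strategy 1 error, the density levels, or the entropy estimator, so the uniform draws, the positive-only reading of Strategy 1, the five density fractions, the histogram-based entropy, and all function names here are hypothetical.

```python
import math
import random


def perturb(values, max_error, symmetric, rng):
    """Apply a relative error to each value.

    symmetric=False draws the error from [0, max_error]
    (an assumed reading of Strategy 1);
    symmetric=True draws it from [-max_error, max_error]
    (an assumed reading of Strategy 2).
    """
    low = -max_error if symmetric else 0.0
    return [v * (1.0 + rng.uniform(low, max_error)) for v in values]


def mean_absolute_error(reference, perturbed):
    """Mean absolute difference between paired reference and perturbed values."""
    return sum(abs(r - p) for r, p in zip(reference, perturbed)) / len(reference)


def shannon_entropy(values, bins=10):
    """Shannon entropy in bits of an equal-width histogram of `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant field carries no information
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value falls in the last bin.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


rng = random.Random(0)

# Synthetic stand-in for a water quality field (hypothetical values).
field = [rng.uniform(1.0, 5.0) for _ in range(500)]

error_levels = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)  # 0 to 50% in 10% steps
densities = (1.0, 0.8, 0.6, 0.4, 0.2)          # hypothetical density gradient

results = {}
for symmetric in (False, True):
    for level in error_levels:
        for density in densities:
            # Subsample to emulate a sparser data set.
            sample = rng.sample(field, int(len(field) * density))
            noisy = perturb(sample, level, symmetric, rng)
            results[(symmetric, level, density)] = (
                mean_absolute_error(sample, noisy),
                shannon_entropy(noisy),
            )
```

Each entry of `results` pairs an error measure with an information measure for one strategy, error level, and density. Pairs of this kind are what a regression model such as SVR could then be fitted to.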

Published in Abstract Book of ICEER2025 & ICCIVIL2025
Page(s) 10-10
Creative Commons

This is an Open Access abstract, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Numerical Simulation, Information Content, Quality-Quantity Trade-Off