Spectral methods for imputation of missing air quality data
S. Moshenberg, U. Lerner & B. Fishbain
Air quality is well recognized as a contributing factor to various physical phenomena and as a public health risk factor. Consequently, there is a need for an accurate way to measure the level of exposure to various pollutants. Longitudinal continuous monitoring, however, is often incomplete due to measurement errors, hardware problems or insufficient sampling frequency. In this paper we introduce the discrete sampling theorem for the task of imputing missing data in longitudinal air-quality time series. Within the context of the discrete sampling theorem, two spectral schemes for filling missing values are presented: one based on the Discrete Cosine Transform (DCT) and one based on Clustering Singular Value Decomposition (K-SVD).
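The intuition behind the DCT scheme can be illustrated with a minimal sketch. It assumes the complete series is band-limited in the DCT domain (only its lowest `keep` coefficients are non-zero), which is the setting in which a discrete sampling theorem guarantees recovery from a subset of samples. It fills the gaps with a Papoulis-Gerchberg style alternating projection. This is an illustrative stand-in and not necessarily the exact algorithm in the package. The function names `dct_matrix` and `dct_impute`, the `keep` parameter and the iteration count are hypothetical, and the K-SVD scheme is not sketched.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (list of rows): X = C x and x = C^T X."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct_impute(x, keep, n_iter=200):
    """Fill the None entries of x, ASSUMING the complete series is band-limited,
    i.e. only its `keep` lowest-frequency DCT coefficients are non-zero.

    Alternates between two projections (Papoulis-Gerchberg style):
      1. band-limit: keep only DCT coefficients 0..keep-1 (the rest are zeroed);
      2. data consistency: put the measured samples back in place.
    """
    n = len(x)
    keep = min(keep, n)
    observed = [v is not None for v in x]
    if not any(observed):
        raise ValueError("at least one observed sample is required")
    c = dct_matrix(n)
    mean_obs = sum(v for v in x if v is not None) / sum(observed)
    y = [v if v is not None else mean_obs for v in x]  # initial guess: observed mean
    for _ in range(n_iter):
        # Forward DCT restricted to the low band (equivalent to zeroing the rest).
        coeffs = [sum(c[k][i] * y[i] for i in range(n)) for k in range(keep)]
        # Inverse DCT from the retained coefficients only.
        recon = [sum(c[k][i] * coeffs[k] for k in range(keep)) for i in range(n)]
        # Keep measurements; take the band-limited estimate only where data is missing.
        y = [x[i] if observed[i] else recon[i] for i in range(n)]
    return y
```

The explicit O(n^2) matrix keeps the sketch dependency-free. For long series, `scipy.fft.dct` and `scipy.fft.idct` with `norm='ortho'` compute the same orthonormal transform in O(n log n). In practice `keep` would have to be chosen or tuned, since real pollutant series are only approximately band-limited.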
The suggested methods were evaluated in terms of accuracy and robustness. The spectral methods are comparable to the state of the art when data is missing at random, and they outperform it when data is missing in large chunks. Accuracy was evaluated using a very long, complete air-pollutant time series; previous studies used shorter, incomplete series, which can distort the results. Robustness was evaluated by examining each method's performance as the portion of missing data increases.
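The evaluation protocol can be sketched in a few lines. Start from a complete series, artificially remove samples either at random or in contiguous chunks, impute them, and score only the removed samples. Repeating this for a growing missing fraction measures robustness. The sketch assumes RMSE as the score, a chunk length chosen by the caller, and simple linear interpolation as a stand-in baseline. All function names are hypothetical, and the paper's exact metrics and baselines may differ.

```python
import math
import random


def random_mask(n, frac, rng):
    """True marks a sample removed at random (uniformly chosen positions)."""
    missing = set(rng.sample(range(n), round(frac * n)))
    return [i in missing for i in range(n)]


def chunk_mask(n, frac, chunk_len, rng):
    """True marks samples removed in contiguous chunks of length chunk_len.
    Chunks may overlap, so the removed fraction is at least (not exactly) frac."""
    mask = [False] * n
    target = round(frac * n)
    while sum(mask) < target:
        start = rng.randrange(0, n - chunk_len + 1)
        for i in range(start, start + chunk_len):
            mask[i] = True
    return mask


def rmse(truth, estimate, mask):
    """Root-mean-square error over the artificially removed samples only."""
    errs = [(truth[i] - estimate[i]) ** 2 for i in range(len(truth)) if mask[i]]
    return math.sqrt(sum(errs) / len(errs))


def linear_interp(x):
    """Stand-in baseline: linear interpolation between nearest observed neighbours."""
    obs = [i for i, v in enumerate(x) if v is not None]
    out = list(x)
    for i, v in enumerate(x):
        if v is None:
            left = max((j for j in obs if j < i), default=None)
            right = min((j for j in obs if j > i), default=None)
            if left is None:
                out[i] = x[right]          # leading gap: copy first observation
            elif right is None:
                out[i] = x[left]           # trailing gap: copy last observation
            else:
                w = (i - left) / (right - left)
                out[i] = (1 - w) * x[left] + w * x[right]
    return out


def evaluate(series, impute, fractions, make_mask, seed=0):
    """Remove growing fractions of a complete series, impute, and score each run."""
    rng = random.Random(seed)
    scores = {}
    for frac in fractions:
        mask = make_mask(len(series), frac, rng)
        gappy = [None if m else v for v, m in zip(series, mask)]
        scores[frac] = rmse(series, impute(gappy), mask)
    return scores
```

For example, `evaluate(series, linear_interp, [0.1, 0.3, 0.5], random_mask)` scores random gaps, and passing `lambda n, f, r: chunk_mask(n, f, 24, r)` as `make_mask` scores 24-sample chunks instead. Running the same loop for each imputer shows how quickly its error grows with the missing fraction under each gap pattern.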
Spectral methods are therefore a strong option for air-quality data imputation and should be considered especially when the missing-data pattern is unknown.
Please find the Matlab package under this link. To install it, unzip the file into your Matlab working directory. Once the file is unzipped, start with the AllInOne.m file. The code for the imputation methods is in the Methods directory of the unzipped package.
The example runs on a long SO2 (sulfur dioxide) sequence from which data is removed both at random and in chunks (see S. Moshenberg, U. Lerner and B. Fishbain, “Spectral Methods for Imputation of Missing Air Quality Data”, Environmental Systems Research, 4(26):1-13, 2015, for more details). The data file is a Matlab MAT-file (SO2Sequnce.mat) located in the main working directory.
Please cite the following paper in any future publication using this package: S. Moshenberg, U. Lerner and B. Fishbain, “Spectral Methods for Imputation of Missing Air Quality Data”, Environmental Systems Research, 4(26):1-13, 2015.