Dimensionality Reduction of Data with Neighbourhood Components Analysis
Hannah Kariuki, Samuel Mwalili, Anthony Waititu
Issue: Volume 8, Issue 3, June 2022
Pages: 72-81
Received: 9 April 2022
Accepted: 25 April 2022
Published: 10 May 2022
DOI: 10.11648/j.ijdsa.20220803.11
Abstract: In most research fields, the amount of data produced is growing very fast. Analysis of big data offers potentially unlimited opportunities for information discovery. However, because of high dimensionality and the presence of outliers, a suitable algorithm for dimensionality reduction (DR) is needed. By performing DR, we can learn low-dimensional embeddings that capture most of the variability in the data. This study proposes a new approach using Neighbourhood Components Analysis (NCA), a nearest-neighbour-based non-parametric method for learning low-dimensional linear embeddings of labelled data. The approach therefore uses class labels to guide the DR process: NCA learns a low-dimensional linear projection of the feature space that improves the performance of a nearest-neighbour classifier in the projected space. Because the method makes no parametric assumptions about the data, it can work well with complex or multi-modal data, as most real-world data are. We evaluated the effectiveness of the method by comparing the classification error and class separability of the embedded data with those obtained by Principal Component Analysis (PCA). The results show a substantial reduction in dimensionality, from 754 to 55 dimensions, and NCA achieved a lower classification error than PCA across a range of target dimensions. Analyses of real and simulated datasets showed that the proposed algorithm is generally insensitive to increases in the number of outliers and irrelevant features and consistently outperformed classical PCA.
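The NCA idea summarised above (learn a linear map A so that a stochastic nearest-neighbour rule in the projected space is as accurate as possible) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it follows the standard NCA objective of Goldberger et al. (each point picks a neighbour with softmax probability over negative squared projected distances, and f(A) is the expected number of correctly classified points), uses plain gradient ascent with a backtracking step, and compares against PCA by leave-one-out 1-NN error. The function names, the synthetic data (two label-carrying dimensions plus high-variance irrelevant ones) and all parameter values are hypothetical. In practice, scikit-learn ships a ready-made `NeighborhoodComponentsAnalysis` class.

```python
import numpy as np


def nca_objective_and_grad(A, X, y):
    """Return f(A), the expected number of points a stochastic 1-NN rule
    classifies correctly in the projection x -> A x, and its gradient df/dA."""
    Z = X @ A.T                                      # projected points, shape (n, d)
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T    # ||A x_i - A x_j||^2
    np.fill_diagonal(D, np.inf)                      # p_ii = 0: a point never picks itself
    logits = -D - np.max(-D, axis=1, keepdims=True)  # row shift for numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                # P[i, j] = p_ij, softmax over j
    same = y[:, None] == y[None, :]
    p = (P * same).sum(axis=1)                       # p_i: prob. point i is classified correctly
    f = p.sum()
    # df/dA = 2 A sum_ij W_ij (x_i - x_j)(x_i - x_j)^T, W_ij = p_i p_ij - [y_i == y_j] p_ij,
    # evaluated in closed form via the Laplacian of the symmetrised weights.
    W = p[:, None] * P - same * P
    Ws = W + W.T
    L = np.diag(Ws.sum(axis=1)) - Ws
    grad = 2.0 * A @ (X.T @ L @ X)
    return f, grad


def fit_nca(X, y, n_components, n_iter=200, step=0.01, seed=0):
    """Gradient ascent on f(A); the step is halved until f does not decrease."""
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((n_components, X.shape[1]))
    f, g = nca_objective_and_grad(A, X, y)
    history = [f]
    for _ in range(n_iter):
        while step > 1e-10:
            A_new = A + step * g
            f_new, g_new = nca_objective_and_grad(A_new, X, y)
            if f_new >= f:
                break
            step *= 0.5
        else:
            break                                    # no improving step found: stop
        A, f, g = A_new, f_new, g_new
        history.append(f)
        step *= 1.5                                  # let the step grow after a success
    return A, history


def pca_projection(X, n_components):
    """Leading principal axes (rows) of the centred data, via SVD."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:n_components]


def loo_1nn_error(Z, y):
    """Leave-one-out error of a 1-nearest-neighbour classifier on embedded data Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    np.fill_diagonal(D, np.inf)
    return float(np.mean(y[D.argmin(axis=1)] != y))


def make_data(n=200, n_noise=8, seed=1):
    """Hypothetical data: 2 informative dims (class means +/-1) plus noisy irrelevant dims."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    informative = (2.0 * y - 1.0)[:, None] + 0.5 * rng.standard_normal((n, 2))
    noise = 3.0 * rng.standard_normal((n, n_noise))  # high variance, unrelated to labels
    return np.hstack([informative, noise]), y


if __name__ == "__main__":
    X, y = make_data()
    A, history = fit_nca(X, y, n_components=2)
    print("NCA objective: %.1f -> %.1f" % (history[0], history[-1]))
    print("LOO 1-NN error, PCA 2-D:", loo_1nn_error(X @ pca_projection(X, 2).T, y))
    print("LOO 1-NN error, NCA 2-D:", loo_1nn_error(X @ A.T, y))
```

On data like this one would expect PCA, which keeps the highest-variance directions, to retain mostly the irrelevant noise features, whereas NCA is driven by the labels and should favour the two informative directions; the exact printed errors depend on the random seeds.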
Time Series Analysis in Forecasting Monthly Average Rainfall and Temperature (Case Study, Minot ND, USA)
Upul Rupassara, Dion Udokop, Favour Ozordi
Issue: Volume 8, Issue 3, June 2022
Pages: 82-93
Received: 12 May 2022
Accepted: 25 May 2022
Published: 31 May 2022
DOI: 10.11648/j.ijdsa.20220803.12
Abstract: This project analyzes the monthly average rainfall and temperature in Minot, ND, USA, from January 2005 to December 2021. Since both the rainfall and temperature time series exhibit seasonal components, Seasonal Autoregressive Integrated Moving Average (SARIMA) models were used to forecast average rainfall and temperature. The main objective was to identify suitable SARIMA models based on Akaike's Information Criterion (AIC). Graphical and diagnostic analysis techniques were then used to validate the models with the smallest AIC values. Among the competing tentative models, SARIMA (2, 0, 0) (2, 0, 1, 12) and SARIMA (1, 0, 1) (2, 0, 1, 12) were found to be the best time series forecasting models, capturing the existing patterns in the rainfall and temperature data, respectively. Moreover, these models satisfy the diagnostic assumptions on the residuals, such as randomness, independence, normality, and homoscedasticity. Therefore, the SARIMA (2, 0, 0) (2, 0, 1, 12) and SARIMA (1, 0, 1) (2, 0, 1, 12) models were used to forecast the mean monthly rainfall and temperature, respectively, from January 2022 to December 2023.
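In the notation SARIMA (p, d, q) (P, D, Q, s), the non-seasonal AR polynomial of order p is multiplied by a seasonal AR polynomial of order P in B^s (B is the backshift operator), and likewise for the MA side; with d = D = 0, as in both selected models, no differencing is applied. The pure-Python sketch below, an illustration rather than the authors' procedure, expands these products to show which lags a model actually uses, and includes the standard AIC formula, AIC = 2k - 2 ln L, with a helper that picks the candidate with the smallest value. The coefficient values, helper names and the candidate dictionary format are hypothetical; the sign convention (minus for AR, plus for MA) is one common choice. In practice the log-likelihoods would come from a fitting routine, for example statsmodels' `SARIMAX` with `order=(p, d, q)` and `seasonal_order=(P, D, Q, s)`, whose fitted results expose an `aic` attribute.

```python
def lag_poly(coefs, s=1, sign=-1.0):
    """Coefficients of 1 + sign*(c_1 B^s + c_2 B^(2s) + ...), indexed by power of B.

    sign=-1 gives an AR polynomial, 1 - phi_1 B - ...;
    sign=+1 gives an MA polynomial, 1 + theta_1 B + ... (one common convention).
    """
    poly = [0.0] * (len(coefs) * s + 1)
    poly[0] = 1.0
    for k, c in enumerate(coefs, start=1):
        poly[k * s] = sign * c
    return poly


def poly_mul(a, b):
    """Product of two polynomials in B, e.g. phi(B) * Phi(B^s)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def nonzero_lags(poly, tol=1e-12):
    """Lags (powers of B) that actually appear in the expanded polynomial."""
    return [k for k, c in enumerate(poly) if abs(c) > tol]


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: AIC = 2k - 2 ln L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_by_aic(candidates):
    """candidates: {label: (log_likelihood, n_params)} -> label with the smallest AIC."""
    return min(candidates, key=lambda label: aic(*candidates[label]))


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to expose the lag structure of
    # SARIMA(2, 0, 0)(2, 0, 1, 12): AR side phi(B) * Phi(B^12), MA side Theta(B^12).
    ar = poly_mul(lag_poly([0.5, 0.2]), lag_poly([0.3, 0.1], s=12))
    ma = poly_mul(lag_poly([], sign=1.0), lag_poly([0.4], s=12, sign=1.0))
    print("AR lags:", nonzero_lags(ar))  # AR lags: [0, 1, 2, 12, 13, 14, 24, 25, 26]
    print("MA lags:", nonzero_lags(ma))  # MA lags: [0, 12]
```

The cross terms are why a seasonal model with few parameters still reaches far back: for example, the lag-13 AR coefficient is the product phi_1 * Phi_1, so it is not a free parameter.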