-
Factors Influencing Secondary School Student’s Performance Through Variable Decision Tree Data Mining Technique
Issue:
Volume 6, Issue 5, October 2020
Pages:
120-129
Received:
17 January 2020
Accepted:
10 September 2020
Published:
25 September 2020
Abstract: Schools are considered the backbone of long-term economic progress, and no country can develop without raising its level of education. Although the educational level of the Portuguese population has improved markedly over the last decade, statistics still place Portugal at the tail end of Europe because of its high rates of student failure. The problem is most serious in the core subjects of Mathematics and Portuguese. Meanwhile, the field of data mining (DM), which aims to extract high-level knowledge from raw data, offers automated tools that can be a useful resource for the education domain. This paper seeks to improve the overall performance of secondary school students in Portugal by means of a two-variable decision tree, a data mining approach well suited to classification, prediction and the exploration of factors according to their significance. The results show that, provided the first and/or second period school grades are available, high predictive accuracy can be achieved. Although student success is strongly influenced by the father's occupation, the evaluation clearly shows that other factors (such as study time, the mother's occupation, the desire to pursue higher education, paid classes and the travel time between home and school) also have a great impact on student performance in secondary education in Portugal. As a direct result of this study, policies that focus on these factors could raise the level of secondary education across the country and bring it closer to the higher levels of education found elsewhere in Europe.
-
Modelling Extreme Temperature Using Extreme Value Theory: A Case Study Northern Kenya
Morris Mbithi Wambua,
Joseph Kyalo Mung’atu,
Jane Akinyi Aduda
Issue:
Volume 6, Issue 5, October 2020
Pages:
130-136
Received:
28 September 2020
Accepted:
9 October 2020
Published:
17 October 2020
Abstract: The impacts of extremely high temperatures on the health of plants, humans and animals have been studied in several parts of the world. However, extreme events are uncommon and have only attracted attention recently. In this study, extreme temperature behavior was modelled through the application of extreme value theory using maximum monthly temperatures over a 36-year period. Data on monthly maximum temperature from the Mandera, Wajir and Lodwar stations were modelled using generalized extreme value (GEV) and generalized Pareto distribution (GPD) models. The results revealed that the GEV model was better at modelling extreme temperature behavior because it had the lowest AIC and BIC values. Two comparative tests, namely Anderson-Darling and Kolmogorov-Smirnov, confirmed the GEV model to be adequate for the data. Diagnostic checks of the two models using the probability-probability (PP) plot, quantile-quantile (QQ) plot, return level plot and mean residual life plot revealed that the GEV model fitted the data well. Return levels estimated for return periods of 5, 10, 20, 50 and 100 years increased with the length of the return period.
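As an illustration of the return levels mentioned in this abstract, the sketch below evaluates the standard GEV quantile formula. The parameter values are hypothetical placeholders, not estimates from the study, and the return period is counted in blocks (how the authors convert blocks to years is not stated in the abstract).

```python
import math


def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` blocks under GEV(mu, sigma, xi)."""
    if sigma <= 0 or period <= 1:
        raise ValueError("sigma must be positive and period must exceed 1")
    # y is minus the log of the non-exceedance probability 1 - 1/period
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV family (shape parameter equal to zero)
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


# Hypothetical location, scale and shape values, chosen only for illustration
MU, SIGMA, XI = 38.0, 1.2, -0.15

levels = {t: gev_return_level(MU, SIGMA, XI, t) for t in (5, 10, 20, 50, 100)}
for period, level in levels.items():
    print(f"{period:>3}-block return level: {level:.2f}")
```

With a negative shape parameter, as assumed here, the return levels rise with the return period but stay below the finite upper bound mu - sigma / xi.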
-
Modeling Zero Inflation and Over-Dispersion in Domestic Package Insurance Claims Portfolio: A Case of Madison Insurance Company-Kenya
Polycarp Nyabuto,
Anthony Wanjoya,
Antony Ngunyi
Issue:
Volume 6, Issue 5, October 2020
Pages:
137-144
Received:
29 September 2020
Accepted:
14 October 2020
Published:
21 October 2020
Abstract: The standard Poisson distribution is widely used as a mechanism for regression modeling of count data outcomes. However, this modeling technique is suitable only for equi-dispersed count data, because it does not account for the over-dispersion and excess zeros found in many data sets, including insurance claims data. The objective of the study is to model domestic package insurance claims frequency using zero-inflated and hurdle models, since insurance portfolios are characterized by the non-occurrence of claims over a given time interval. This non-occurrence of claims usually leads to the zero-inflation and over-dispersion associated with insurance claims data. The study therefore evaluates the performance of the Poisson, Zero Inflated Poisson (ZIP) and Hurdle Poisson (HP) models to determine which best fits the domestic package insurance claims data. The selected model is then used to estimate, predict and determine the heterogeneity of occurrence of these claims. The Hosmer-Lemeshow test is used to assess the suitability of the fitted model for capturing the zero-inflation and over-dispersion in the data. Pearson and deviance residual statistics are used to detect outliers and examine the distribution of residuals. The study uses data on the number of claims under the domestic package insurance policy from Madison Insurance Ltd, Kenya, spanning 2014 to 2018 (261 weeks).
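The difference between the zero-inflated and hurdle formulations named in this abstract can be sketched with their textbook probability mass functions. The parameter values are hypothetical and are not taken from the study; the covariate-dependent regression structure the authors fit is omitted.

```python
import math


def poisson_pmf(k, lam):
    """Standard Poisson probability of observing k events."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi, else Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, lam, p_zero):
    """Hurdle Poisson: zero with probability p_zero, else a zero-truncated Poisson(lam)."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


# Hypothetical claim rate and zero-inflation probability, for illustration only
LAM, PI = 1.5, 0.4

support = range(60)  # truncation point; the tail mass beyond it is negligible here
zip_mean = sum(k * zip_pmf(k, LAM, PI) for k in support)
zip_var = sum(k * k * zip_pmf(k, LAM, PI) for k in support) - zip_mean ** 2
print(f"ZIP mean = {zip_mean:.3f}, variance = {zip_var:.3f}")
```

Under these assumed values the ZIP variance exceeds its mean, which is the over-dispersion that a plain Poisson model cannot reproduce.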
-
Ordinal Regression Modeling of Mother to Infant HIV Transmission in Nyeri County, Kenya
Agnes Njoki,
Anthony Wanjoya,
Antony Waititu
Issue:
Volume 6, Issue 5, October 2020
Pages:
145-152
Received:
30 September 2020
Accepted:
20 October 2020
Published:
23 October 2020
Abstract: The rates of HIV transmission from an HIV-positive mother to her child during pregnancy, delivery or breastfeeding remain of much concern. Various governments and non-governmental organizations have sought to develop policies aimed at minimizing these transmissions. For this to be achievable, sound statistical procedures are needed in the analysis of mother to infant HIV transmission data. The study applies ordinal regression to the modeling of such data, using Nyeri County, Kenya as a case. Logistic regression has been described as the best methodology for modeling binary response variables. However, it does not provide the best fit for an ordered categorical variable with more than two categories. This calls for extensions of logistic regression that can be used to model such variables; one such extension is the ordinal regression methodology. This study proposes the use of ordinal regression with probit and logit link functions to model the effects of infant feeding, ARV regimen, maternal cell count and maternal viral load on mother to infant HIV transmission in Nyeri County, Kenya, taking Karatina sub-county referral hospital as a case. Particular emphasis is placed on one aspect of ordinal link models that is useful for this application: the categories of the dependent variable can be interpreted as arising from a partition of the range of an underlying continuous random variable. The study uses secondary data collected from Karatina sub-county referral hospital. Inference on parameters and model diagnostics are also provided.
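The latent-variable interpretation described in this abstract can be sketched as a cumulative link model, in which cutpoints partition an underlying continuous scale into ordered categories. The cutpoints and linear predictor below are hypothetical values chosen for illustration, not estimates from the study.

```python
import math


def logistic_cdf(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_cdf(z):
    """Inverse of the probit link (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(eta, cutpoints, cdf):
    """Category probabilities when P(Y <= j) = cdf(cutpoint_j - eta).

    `cutpoints` must be increasing; they split the latent scale into
    len(cutpoints) + 1 ordered categories.
    """
    cumulative = [cdf(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical cutpoints and linear predictor (eta stands for x'beta for one subject)
CUTPOINTS, ETA = [-1.0, 0.5], 0.3

logit_probs = category_probs(ETA, CUTPOINTS, logistic_cdf)
probit_probs = category_probs(ETA, CUTPOINTS, probit_cdf)
print("logit :", [round(p, 3) for p in logit_probs])
print("probit:", [round(p, 3) for p in probit_probs])
```

Raising the linear predictor moves probability mass toward the higher categories under either link, which is what makes the coefficients interpretable on the latent scale.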
-
Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data
Kipkorir Collins,
Anthony Waititu,
Anthony Wanjoya
Issue:
Volume 6, Issue 5, October 2020
Pages:
153-162
Received:
2 October 2020
Accepted:
20 October 2020
Published:
26 October 2020
Abstract: In modelling count data, least squares regression models suffer from several methodological and statistical limitations when the dependent variable is a discrete, non-negative integer count. Unlike the classical regression model, count data models are non-linear, and many properties of the response variable relate to its discreteness, non-linearity and restriction to non-negative values. A good starting point for modelling count data is the Poisson regression model, since it is well suited to the natural properties of count data. However, its assumption of equi-dispersion renders it inappropriate for modelling over-dispersed data. The Negative Binomial regression model has been widely used and is considered the default regression model for over-dispersed count data. This model is a modification of the Poisson regression model and, though widely used, it might not be the best model for over-dispersion, and other models have been found to perform better. Over-dispersion in this study was defined relative to the Poisson model. This study models over-dispersed count data using a discrete Weibull (DW) regression model and an artificial neural network model with a median neuron in the hidden layer. After the two models were fitted to simulated and real data, the artificial neural network model outperformed the discrete Weibull regression model. Application to a data set from the German health survey gave an RMSE of 69.0668 for the DW regression model and 35.5652 for the artificial neural network.
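To make the notions of over-dispersion and RMSE in this abstract concrete, the sketch below uses the type I discrete Weibull distribution. The abstract does not state which parameterization the authors use, so this form and the parameter values are assumptions for illustration only.

```python
import math


def discrete_weibull_pmf(y, q, beta):
    """Type I discrete Weibull: P(Y = y) = q**(y**beta) - q**((y + 1)**beta), y = 0, 1, ..."""
    if not 0.0 < q < 1.0 or beta <= 0.0 or y < 0:
        raise ValueError("need 0 < q < 1, beta > 0 and y >= 0")
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def dispersion_index(q, beta, upper=2000):
    """Variance-to-mean ratio, summed numerically over 0..upper-1.

    A value above 1 means over-dispersion relative to the Poisson model.
    """
    mean = sum(y * discrete_weibull_pmf(y, q, beta) for y in range(upper))
    second = sum(y * y * discrete_weibull_pmf(y, q, beta) for y in range(upper))
    return (second - mean ** 2) / mean


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical parameter values, for illustration only
Q, BETA = 0.7, 0.8
print(f"dispersion index: {dispersion_index(Q, BETA):.3f}")
```

With beta equal to 1 this distribution reduces to the geometric distribution, whose variance-to-mean ratio is 1 / (1 - q) and therefore always exceeds 1.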