A BOOTSTRAP APPROACH FOR IMPROVING LOGISTIC REGRESSION PERFORMANCE IN IMBALANCED DATA SETS

MATTER: International Journal of Science and Technology

Field Value
 
Title A BOOTSTRAP APPROACH FOR IMPROVING LOGISTIC REGRESSION PERFORMANCE IN IMBALANCED DATA SETS
 
Creator Chang, Michael
Dalpatadu, Rohan J.
Phanord, Dieudonne
Singh, Ashok K.
 
Subject Binary Response
Prediction
SMOTE
Under-sampling
Over-sampling
Confusion Matrix
Accuracy
Precision
Recall
F1-measure
 
Description In an imbalanced dataset with a binary response, the proportions of successes and failures are far from equal. In many real-world situations, the majority of the observations are “normal” (i.e., successes), with a much smaller fraction of failures. For extremely imbalanced data sets, the overall probability of correct classification can be very high while the probability of correctly predicting the minority class is very low. Consider a fictitious dataset of 1,000,000 observations, of which 999,000 are successes and 1,000 are failures. A rule that classifies all observations as successes has very high prediction accuracy (99.9%), but its probability of correctly predicting a failure is 0. In many situations the cost of incorrectly predicting a failure is high, so it is important to improve the prediction accuracy for failures as well. The literature suggests that over-sampling the minority class with replacement does not necessarily predict the minority class with higher accuracy. In this article, we propose a simple over-sampling method that bootstraps a subset of the minority class, and we illustrate it with several examples. In each of these examples, an improvement in prediction accuracy is seen.
Article DOI: https://dx.doi.org/10.20319/mijst.2018.43.1124
This work is licensed under the Creative Commons Attribution-Non-commercial 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
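The arithmetic behind the fictitious example in the description can be checked with a short Python sketch. The first part uses only the counts given in the abstract. The second part is an illustrative assumption: the abstract does not say how the subset of the minority class is chosen or how many rows are drawn, so the hypothetical helper `bootstrap_oversample` simply resamples the whole minority sample with replacement and should not be read as the authors' procedure.

```python
import random

# Counts taken from the abstract's fictitious example.
n_success, n_failure = 999_000, 1_000

# Rule: classify every observation as a success.
correct = n_success  # every success is right, every failure is wrong
accuracy = correct / (n_success + n_failure)
failure_recall = 0 / n_failure  # no observation is ever predicted as a failure

print(f"accuracy = {accuracy:.3f}, failure recall = {failure_recall:.1f}")


def bootstrap_oversample(minority, n_draws, seed=0):
    """Hypothetical helper: keep the original minority rows and append
    n_draws rows drawn from them with replacement (a plain bootstrap).
    The subset selection described in the article is NOT modelled here."""
    rows = list(minority)
    rng = random.Random(seed)
    return rows + rng.choices(rows, k=n_draws)


# Toy minority class of 10 rows, augmented with 90 bootstrap draws.
failures = list(range(10))
augmented = bootstrap_oversample(failures, n_draws=90)
print(len(augmented))
```

Every appended row is a copy of an existing minority observation, so this kind of resampling changes the class proportions seen by the model without adding new information.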
 
Publisher Global Research & Development Services Publishing
 
Date 2018-11-15
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed Article
 
Format application/pdf
 
Identifier https://grdspublishing.org/index.php/matter/article/view/1645
10.20319/mijst.2018.43.1124
 
Source MATTER: International Journal of Science and Technology; Vol 4 No 3 (2018): Regular Issue; 11-24
2454-5880
 
Language eng
 
Relation https://grdspublishing.org/index.php/matter/article/view/1645/1995
 
Rights Copyright (c) 2018 Michael Chang, Rohan J. Dalpatadu, Dieudonne Phanord, Ashok K. Singh
 
