Identifying Rare Classes with Sparse Training Data
Author
Christopher Clifton
Tech report number
CERIAS TR 2007-97
Entry type
conference
Abstract
Building models and learning patterns from a collection of data are essential tasks for decision making and the dissemination of knowledge. One common way to extract knowledge is to build a classifier. However, when the training dataset is sparse, it is difficult to build an accurate classifier. This is especially true in the biological sciences, where data are hard to produce and error-prone. Through empirical results, this paper shows the challenges of building an accurate classifier from a sparse biological training dataset. Our findings indicate inadequacies in well-known classification techniques. Although certain clustering techniques, such as seeded k-Means, show some promise, there is still room for further improvement. In addition, we propose a novel idea that could be used to produce a more balanced classifier when training data samples are very limited.
Date
2007-09
Booktitle
Database and Expert Systems Applications
Key alpha
Clifton
Pages
251-260
Publisher
Springer Berlin / Heidelberg
Volume
4653
Publication Date
2007-09-01

