EXPLORING MULTINOMIAL NAÏVE BAYES FOR YORÙBÁ TEXT DOCUMENT CLASSIFICATION
Keywords:
Supervised learning, text classification, Yorùbá language, text mining, BoW representation
Abstract
The recent growth in the amount of Nigerian-language text available online motivates this paper, which considers the problem of classifying text documents written in the Yorùbá language into one of a few pre-designated classes. Text document classification (categorization) research is well established for English and many other languages, but not for Nigerian languages. This paper evaluates the performance of a multinomial Naïve Bayes (NB) model learned on a research dataset consisting of 100 text samples from each of five domains (business, sports, entertainment, technology and politics). The model was learned separately on unigram, bigram and trigram features obtained using the bag-of-words (BoW) representation. Results show that the model performs comparably on unigram and bigram features, and significantly better on either than on trigram features. The results generally indicate that the NB algorithm could be applied in practice to the classification of text documents written in the Yorùbá language.
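The approach described in the abstract can be illustrated with a minimal sketch of a multinomial Naïve Bayes classifier over bag-of-n-grams counts. This is not the authors' implementation: the class `MultinomialNB`, the helper `ngrams`, whitespace tokenisation, Laplace smoothing with `alpha=1.0`, and the tiny English placeholder corpus are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return the word n-grams of a lower-cased, whitespace-tokenised text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Illustrative multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, n=1, alpha=1.0):
        self.n = n          # n-gram order: 1 = unigram, 2 = bigram, 3 = trigram
        self.alpha = alpha  # smoothing constant (an assumption, not from the paper)

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)            # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            grams = ngrams(doc, self.n)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # n-grams never seen in training are ignored.
        grams = [g for g in ngrams(doc, self.n) if g in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_docs[label] / self.total_docs)
            for g in grams:
                score += math.log((counts[g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus (English placeholders, not the paper's dataset).
train_docs = [
    "market price rises", "bank lowers market rate",
    "team wins final match", "striker scores in match",
]
train_labels = ["business", "business", "sport", "sport"]

for n in (1, 2, 3):
    model = MultinomialNB(n=n).fit(train_docs, train_labels)
    print(n, model.predict("market rate rises"), model.predict("team wins match"))
```

In this toy setup the trigram model finds no known trigrams in either test text, so its prediction rests on the class priors alone. That is a property of this small example and says nothing about the paper's dataset.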
License
The contents of the articles are the sole opinion of the author(s) and not of NIJOTECH.
NIJOTECH allows open-access distribution of published articles in any medium, provided that articles are distributed in whole (not in part).
A copyright form and a statement of originality must be completed clearly and signed before an accepted article is published. The copyright form can be downloaded from http://nijotech.com/downloads/COPYRIGHT%20FORM.pdf and the statement of originality from http://nijotech.com/downloads/Statement%20of%20Originality.pdf
For articles that were developed from funded research, a clear acknowledgement of such support should be mentioned in the article with relevant references. Authors are expected to provide complete information on the sponsorship and intellectual property rights of the article together with all exceptions.
It is forbidden to publish the same research report in more than one journal.