Using a provided dataset of categorized URIs (each record is a URI and its category), classify the dataset using three methods. The code must be in Python.
Shuffle the dataset. For each category, split the records randomly into 20% testing and 80% training. Create 8 folds from the training data, where each fold must contain URIs from all the categories. Combine the testing portions of all categories into a single testing file.
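The split described above can be sketched with the standard library alone. This is a minimal sketch, not a required implementation: the function name `stratified_split_and_folds`, the `(uri, category)` tuple layout, the fixed seed, and the toy `records` are all assumptions made for illustration. Training records are dealt round-robin per category, so every fold receives every category only when each category has at least 8 training records.

```python
import random
from collections import defaultdict


def stratified_split_and_folds(records, test_fraction=0.2, n_folds=8, seed=0):
    """Split (uri, category) pairs per category into test data and training folds.

    Returns (test_set, folds), where folds is a list of n_folds lists.
    """
    rng = random.Random(seed)

    # Group the records by category so each category is split on its own.
    by_category = defaultdict(list)
    for uri, category in records:
        by_category[category].append((uri, category))

    test_set = []
    folds = [[] for _ in range(n_folds)]
    for category in sorted(by_category):
        items = by_category[category]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        # The first 20% of the shuffled category goes to the shared test set.
        test_set.extend(items[:n_test])
        # The remaining 80% is dealt round-robin across the folds.
        for i, item in enumerate(items[n_test:]):
            folds[i % n_folds].append(item)

    # Shuffle again so records are not ordered by category.
    rng.shuffle(test_set)
    for fold in folds:
        rng.shuffle(fold)
    return test_set, folds


# Hypothetical toy dataset: 40 "news", 20 "sports" and 10 "shop" URIs.
records = [
    (f"http://example.org/{cat}/{i}", cat)
    for cat, n in [("news", 40), ("sports", 20), ("shop", 10)]
    for i in range(n)
]
test_set, folds = stratified_split_and_folds(records)
print(len(test_set), [len(fold) for fold in folds])
```

For cross-validation, each fold in turn can serve as the validation data while the other 7 folds are used for training.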
Method 1: use a Naive Bayes binary classifier.
Method 2: use a Naive Bayes binary classifier on each category to determine whether the URI is positive for that category (using the binary approach to simulate the multiway one).
Method 3: use a Naive Bayes multiway classifier.
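The difference between the per-category binary approach and the multiway approach can be sketched as follows. This is a sketch under stated assumptions: the task does not say how URIs become features, so the `tokenize` helper (splitting on non-alphanumeric characters), the multinomial model with add-one smoothing, the class and function names, and the toy URIs are all illustrative choices. Fitting the same `NaiveBayes` class on two-valued labels gives a single binary classifier.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(uri):
    # Assumption: lowercase the URI and split on non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", uri.lower()) if t]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, uris, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for uri, label in zip(uris, labels):
            self.token_counts[label].update(tokenize(uri))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def log_scores(self, uri):
        # Log prior plus summed log likelihoods; unseen tokens are ignored.
        total = sum(self.class_counts.values())
        tokens = [t for t in tokenize(uri) if t in self.vocab]
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            scores[label] = math.log(n / total) + sum(
                math.log((counts[t] + 1) / denom) for t in tokens
            )
        return scores

    def predict(self, uri):
        scores = self.log_scores(uri)
        return max(scores, key=scores.get)


def fit_one_vs_rest(uris, labels):
    # One binary model per category: True = this category, False = any other.
    return {
        category: NaiveBayes().fit(uris, [lab == category for lab in labels])
        for category in sorted(set(labels))
    }


def predict_one_vs_rest(models, uri):
    # Pick the category whose binary model favours "positive" most strongly.
    def margin(category):
        scores = models[category].log_scores(uri)
        return scores[True] - scores[False]

    return max(models, key=margin)


# Hypothetical toy training data.
train_uris = [
    "http://news.example.org/world/politics",
    "http://news.example.org/world/economy",
    "http://sports.example.org/football/scores",
    "http://sports.example.org/tennis/scores",
    "http://shop.example.org/cart/checkout",
    "http://shop.example.org/cart/items",
]
train_labels = ["news", "news", "sports", "sports", "shop", "shop"]

multiway = NaiveBayes().fit(train_uris, train_labels)
binary_models = fit_one_vs_rest(train_uris, train_labels)

query = "http://example.org/football/results"
print(multiway.predict(query), predict_one_vs_rest(binary_models, query))
```

The one-vs-rest margin assumes every category has both positive and negative training examples; with a single category in the data, the `False` class would be missing.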
For each method, show the following using tables:
1-(category-precision-recall-fscore) for each fold, with the total score and weighted average. Note that the weighted average is needed because the dataset is not balanced.
2-the confusion matrix of each fold.
3-(fold number-total precision-total recall-total fscore-total weighted average).
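The per-fold scores and the confusion matrix could be computed along these lines. This is a minimal sketch: the function names, the row layout `(category, precision, recall, fscore, support)`, and the toy labels are assumptions, and the weighted average is taken to mean weighting each category's score by its number of true records in the fold.

```python
def confusion_matrix(y_true, y_pred, categories):
    # Rows are true categories, columns are predicted categories.
    index = {c: i for i, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_category_scores(matrix, categories):
    rows = []
    for i, category in enumerate(categories):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        fscore = 2 * precision * recall / denom if denom else 0.0
        rows.append((category, precision, recall, fscore, actual))
    return rows


def weighted_average(rows):
    # Weight each category's scores by its support to offset class imbalance.
    total = sum(row[4] for row in rows)
    return tuple(sum(row[k] * row[4] for row in rows) / total for k in (1, 2, 3))


def print_table(rows, weighted):
    print(f"{'category':<10}{'precision':>10}{'recall':>10}{'fscore':>10}")
    for category, precision, recall, fscore, _ in rows:
        print(f"{category:<10}{precision:>10.3f}{recall:>10.3f}{fscore:>10.3f}")
    print(f"{'weighted':<10}{weighted[0]:>10.3f}{weighted[1]:>10.3f}{weighted[2]:>10.3f}")


# Hypothetical predictions for one fold.
categories = ["news", "shop", "sports"]
y_true = ["news", "news", "news", "news", "sports", "sports", "shop"]
y_pred = ["news", "news", "news", "sports", "sports", "sports", "news"]

matrix = confusion_matrix(y_true, y_pred, categories)
rows = per_category_scores(matrix, categories)
weighted = weighted_average(rows)
print_table(rows, weighted)
```

Running the same functions once per fold and collecting the weighted tuples gives the rows of the per-fold summary table.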