Cyber security analytics - Model evaluation metrics (Python 3)

In this task (see the attached full task document), you are given a dataset “[login to view URL]”. Find the “best” classification model by comparing the evaluation metrics, especially the recall rates, produced by the knn, decision tree and random forest models.

You are given:

• Dataset: [login to view URL]

• thresholds = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]

• Parameter grid (param_grid):

For knn, n_neighbors = [1, 2, 3, 4, 5]

For decision tree, max_depth = [3, 4, 5, 6, 7]

For random forest, n_estimators = [5, 10, 20, 50]

• GridSearchCV(model_classifier(random_state=0), {param: param_grid}, cv=5, scoring='recall')

• Any other parameters are left to your own settings
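
The grid-search setting above could be sketched roughly as follows. This is only an illustration under stated assumptions: the real dataset is not available here, so `make_classification` produces a synthetic, balanced stand-in for X_train_undersample and y_train_undersample, and the `searches` dictionary is a hypothetical helper, not part of the task. Note that scikit-learn's `KNeighborsClassifier` takes no `random_state` argument, so it is only passed to the two tree-based models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic, balanced stand-in for the undersampled training set
# from practical10, which is not available in this sketch.
X_train_undersample, y_train_undersample = make_classification(
    n_samples=400, n_features=10, weights=[0.5, 0.5], random_state=0
)

# One (estimator, parameter grid) pair per model, using the grids listed above.
# KNeighborsClassifier has no random_state parameter, so none is passed to it.
searches = {
    "knn": (KNeighborsClassifier(), {"n_neighbors": [1, 2, 3, 4, 5]}),
    "decision tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 4, 5, 6, 7]},
    ),
    "random forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [5, 10, 20, 50]},
    ),
}

best_params = {}
for name, (estimator, param_grid) in searches.items():
    # 5-fold cross-validated grid search, selecting on recall
    grid = GridSearchCV(estimator, param_grid, cv=5, scoring="recall")
    grid.fit(X_train_undersample, y_train_undersample)
    best_params[name] = grid.best_params_
    print(f"{name}: best params = {grid.best_params_}, "
          f"mean CV recall = {grid.best_score_:.3f}")
```

With the real data, the two `make_classification` lines would be replaced by the X_train_undersample and y_train_undersample arrays from practical10.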

You are asked to:

• use the train and test sets split in practical10 (X_train, X_test, y_train, y_test, and X_train_undersample, X_test_undersample, y_train_undersample, y_test_undersample)

• use grid search with cross-validation (cv=5) to fit the knn, decision tree and random forest models, in turn, to the undersampled data

• find and print the best parameter for each model (knn, decision tree and random forest) on the X_train_undersample dataset

• for each model, build a classifier using the best parameter found, predict on both test sets (X_test_undersample and X_test), and plot a confusion matrix for each of the two predictions

• for each model, plot the recall metric at each threshold for the undersampled dataset

• for each model, plot the precision-recall curve for the undersampled dataset
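
The evaluation steps in the list above might look roughly like the sketch below, shown for a single model. Everything data-related is an assumption: the split is a synthetic stand-in for the practical10 undersampled sets, `n_estimators=20` is a placeholder for whatever the grid search actually returns, and only the undersampled test set is evaluated (the full X_test would be handled the same way). Recall per threshold is computed by comparing the predicted positive-class probability against each value in `thresholds`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to view plots interactively
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (ConfusionMatrixDisplay, confusion_matrix,
                             precision_recall_curve, recall_score)
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic, balanced stand-in for the undersampled split.
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.5, 0.5], random_state=0)
(X_train_undersample, X_test_undersample,
 y_train_undersample, y_test_undersample) = train_test_split(
    X, y, test_size=0.3, random_state=0)

# ASSUMPTION: n_estimators=20 is a placeholder for the grid-search result.
clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X_train_undersample, y_train_undersample)

# Confusion matrix for the default prediction (probability cut-off of 0.5)
cm = confusion_matrix(y_test_undersample, clf.predict(X_test_undersample))

# Predicted probability of the positive class (label 1)
proba = clf.predict_proba(X_test_undersample)[:, 1]

# Recall at each threshold: predict positive when the probability exceeds it
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
recalls = []
for t in thresholds:
    y_pred = (proba > t).astype(int)
    recalls.append(recall_score(y_test_undersample, y_pred))

# Precision-recall pairs over all probability cut-offs
precision, recall, _ = precision_recall_curve(y_test_undersample, proba)

fig, (ax_cm, ax_rec, ax_pr) = plt.subplots(1, 3, figsize=(15, 4))
ConfusionMatrixDisplay(confusion_matrix=cm).plot(ax=ax_cm)
ax_cm.set_title("Confusion matrix (undersampled test set)")

ax_rec.plot(thresholds, recalls, marker="o")
ax_rec.set_xlabel("Threshold")
ax_rec.set_ylabel("Recall")
ax_rec.set_title("Recall by threshold")

ax_pr.plot(recall, precision)
ax_pr.set_xlabel("Recall")
ax_pr.set_ylabel("Precision")
ax_pr.set_title("Precision-recall curve")
fig.tight_layout()
```

Because a higher threshold can only reduce the number of samples predicted positive, recall never increases from one threshold to the next, which is a useful sanity check on the plot. The same code applies to knn and the decision tree, since both expose `predict_proba` in scikit-learn.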

Skills: Machine Learning (ML), Python, Software Architecture, Statistical Analysis, Statistics

