Cancelled

R Code to Calculate Random Forest Out-of-Bag Estimate of Error

Preference will be given to freelancers who can complete the project within 24-48 hours and who have R and Random Forest experience. You will need to be very familiar with Random Forests and R, as I am not and cannot provide much assistance.

Essentially, I am looking for a small enhancement of the Random Forest process in the R GUI called Rattle. From what I can tell by looking at the R package called party, it includes a number of functions that might make this a matter of adding perhaps 5-15 lines of code to what I already have (although I could certainly be off on that estimate).

Using the R GUI called Rattle, I can easily select my dataset (see below), choose a single Y, set the random seed, and choose the ratio of training to testing data. Next, I run the RF (Random Forest) model, choosing only the number of trees (default is 500) and the number of predictors (default is the integer part of the square root of the m total predictors). From this, R (through Rattle's code) gives me the Out-of-Bag error and the traditional 2x2 classification table for both training and testing data. Not including the 5 seconds it takes R to run the code, I can set up this scenario from scratch in less than 1 minute. Due to Rattle's limitations, I can only run the model for a single Y at a time. This limitation, together with the inability to aggregate the Out-of-Bag results, is my problem.

The algorithm above is outlined very succinctly at [url removed, login to view]~dzeng/BIOS740/[url removed, login to view] on the first page under the title “The algorithm” and is covered in the listed points 1, 2, 3 and 1. Essentially, what I need done is the very next point they list that says:

2. Aggregate the OOB predictions. (On the average, each data point would be out-of-bag around 36% of the times, so aggregate these predictions.) Calculate the error rate, and call it the OOB estimate of error rate.
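To illustrate the aggregation step quoted above, here is a minimal, language-agnostic sketch written in Python (the deliverable itself is requested in R). It draws a bootstrap sample each round, treats the rows left out as out-of-bag, and tallies how often each row is predicted 0 or 1. The function name `oob_vote_counts` and the placeholder "model" (majority class of the bootstrap sample) are assumptions made only so the sketch runs without any packages; a real run would fit a tree or forest instead.

```python
import random

def oob_vote_counts(y, n_rounds, seed):
    """Tally OOB predictions of 0 and 1 for every observation.

    The 'model' is a placeholder (majority class of the bootstrap
    sample) standing in for a real tree; it is an assumption, not
    part of the Random Forest algorithm itself.
    """
    rng = random.Random(seed)
    n = len(y)
    # counts[i] = [times predicted 0, times predicted 1] while row i was OOB
    counts = [[0, 0] for _ in range(n)]
    for _ in range(n_rounds):
        # Bootstrap: draw n row indices with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        # Placeholder model: predict the majority class of the in-bag rows.
        ones = sum(y[i] for i in in_bag)
        prediction = 1 if 2 * ones > n else 0
        # Only rows left out of this bootstrap sample receive a vote.
        for i in range(n):
            if i not in in_bag_set:
                counts[i][prediction] += 1
    return counts

y = [0] * 150 + [1] * 50  # toy labels skewed towards 0
n_rounds = 400
counts = oob_vote_counts(y, n_rounds=n_rounds, seed=42)
total_votes = sum(c0 + c1 for c0, c1 in counts)
oob_fraction = total_votes / (len(y) * n_rounds)
print(f"average OOB fraction: {oob_fraction:.3f}")
```

The printed fraction should land close to (1 - 1/n)^n, which tends to 1/e (about 0.368) for large n; that is where the "around 36%" figure comes from.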

However, because my data is skewed towards Y values of 0, I am really after the PPV (Positive Predictive Value, i.e. the share of observations predicted as 1 for Yn that are actually 1) rather than the global OOB error of the models. I am therefore more interested in the raw prediction counts, so that I can calculate the error rates myself.
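As a rough illustration of how the PPV could be derived from raw OOB prediction counts, here is a small Python sketch. The function name `ppv_from_counts`, the toy data, and the rule "call an observation 1 when it has strictly more 1-votes than 0-votes" are all assumptions; a different threshold may suit skewed data better.

```python
def ppv_from_counts(actual, counts):
    """PPV = true positives / all predicted positives.

    counts[i] = (times predicted 0, times predicted 1) for observation i.
    An observation is called 1 when it has strictly more 1-votes than
    0-votes (an assumed tie rule). Returns None if nothing is predicted 1.
    """
    true_pos = 0
    false_pos = 0
    for truth, (zeros, ones) in zip(actual, counts):
        if ones > zeros:
            if truth == 1:
                true_pos += 1
            else:
                false_pos += 1
    predicted_pos = true_pos + false_pos
    return true_pos / predicted_pos if predicted_pos else None

# Toy example: six observations with their true Y and OOB vote counts.
actual = [1, 1, 0, 0, 0, 1]
counts = [(10, 90), (40, 60), (30, 70), (95, 5), (80, 20), (70, 30)]
ppv = ppv_from_counts(actual, counts)
print(ppv)
```

In this toy data three observations are predicted as 1 and two of them are truly 1, so the PPV is 2/3, regardless of how many true 0s were correctly predicted.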

I will supply a CSV data sample of ~4000 observations (~50/50 training/testing split) with multiple binary Y's and multiple binary X's and one continuous X (an integer ranging from 0 to ~30) for each observation. I can even supply the R code from Rattle for the procedure I am currently using.

I would like your R code to be able to accept the following inputs from me:

-observations in the format: Observation #, Y1…Yn, X1…Xm

-random seed value

-number of trees value (default is 500)

-number of predictors to be randomly sampled (default is the integer part of the square root of the m total predictors)

-number of rows at bottom of data list for holdout data (to be scored each round)

-number of rounds (which will be ~1,000 – 1,000,000)
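The inputs listed above could be captured roughly as follows. This is only a sketch in Python to make the defaults concrete: the names `RunConfig`, `default_mtry`, and `split_holdout` are hypothetical, and the real script is requested in R.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class RunConfig:
    seed: int                    # random seed value
    n_holdout: int               # rows at the bottom of the data kept as holdout
    n_rounds: int                # number of rounds (~1,000 to 1,000,000)
    n_trees: int = 500           # default number of trees
    mtry: Optional[int] = None   # None means "use the default below"

def default_mtry(m):
    """Integer part of the square root of the m total predictors (at least 1)."""
    return max(1, math.isqrt(m))

def split_holdout(rows, n_holdout):
    """Split rows into (training, holdout), taking the holdout from the bottom."""
    if not 0 <= n_holdout <= len(rows):
        raise ValueError("n_holdout must be between 0 and the number of rows")
    cut = len(rows) - n_holdout
    return rows[:cut], rows[cut:]

config = RunConfig(seed=42, n_holdout=3, n_rounds=1000)
mtry = config.mtry if config.mtry is not None else default_mtry(20)
train, holdout = split_holdout(list(range(10)), config.n_holdout)
print(mtry, len(train), holdout)
```

With 20 predictors the default works out to 4, and the last 3 of the 10 toy rows become the holdout set.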

I would like your R code to be able to supply the following outputs to me:

-CSV file with full original data plus the aggregated OOB prediction totals (for both training and testing data) for each observation for each Y (i.e. the number of times the OOB prediction was 0 for each observation for each Y and the number of times the OOB prediction was 1 for each observation for each Y)
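One possible shape for that output file is sketched below in Python: the original columns are kept and two count columns are appended per Y. The column names (`Y1_oob_pred0`, `Y1_oob_pred1`, ...), the function name `write_with_oob_counts`, and the toy numbers are assumptions for illustration only.

```python
import csv
import io

def write_with_oob_counts(header, rows, y_names, counts, out):
    """Write the original data plus per-Y prediction tallies as CSV.

    counts[y_name][i] = (times predicted 0, times predicted 1) for row i.
    """
    extra = []
    for name in y_names:
        extra += [f"{name}_oob_pred0", f"{name}_oob_pred1"]
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(header) + extra)
    for i, row in enumerate(rows):
        tallies = []
        for name in y_names:
            tallies.extend(counts[name][i])
        writer.writerow(list(row) + tallies)

# Toy data: two observations, two Y columns, one X column.
header = ["obs", "Y1", "Y2", "X1"]
rows = [[1, 0, 1, 7], [2, 1, 0, 12]]
counts = {
    "Y1": [(350, 18), (40, 330)],
    "Y2": [(100, 270), (361, 5)],
}
buffer = io.StringIO()
write_with_oob_counts(header, rows, ["Y1", "Y2"], counts, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch side-effect free; passing an open file handle instead would produce the actual CSV file.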

If you happen to be aware of an open source R GUI that will already do all of the above for me (and that I can understand and use), you can just help me install it and will not need to supply the R code. As long as it works for me, the project will be considered completed.

Skills: Algorithm, Machine Learning, Mathematics, R Programming Language, Statistics

About the employer:
(11 reviews) Charlottetown, Canada

Project ID: #10797267

3 freelancers are bidding on average $23 for this job

sabirshah4545

I have great experience in this field. Kindly contact me now and I'm sure we can make a reasonable deal. Thanks a lot.

$333 USD in 15 days
(11 reviews)
3.6
$10 USD in 0 days
(3 reviews)
2.0
tryanaditya

I am really interested in computational science. I have a master's degree in computer engineering. I finished my BS and MS in 4.5 years, which may take 6 years in my country. I have a strong mathematical background. I have ever g

$32 USD in 1 day
(1 review)
2.1
$13 USD in 1 day
(0 reviews)
0.0
$25 USD in 1 day
(0 reviews)
0.0