# Algorithm Optimization 3

Budget $250-750 USD

Objectives:

• We will provide one dataset with one target variable (“Score”), a timestamp, and 24 independent variables. The dataset contains ~55 thousand observations (however, your solution should scale to a much larger dataset).

• The goal is to write at most 6 sets of “greater than” and “less than” restrictions on the independent variables. Each set of restrictions will return a subsample of the dataset on which we evaluate an objective function.

• Specifically, the objective function is the sum of the target variable over the observations in the selected subsample. Each query (set of restrictions) has to return at least 10 valid responses.

• In addition, any observation that comes less than 60 seconds after a valid observation in this subsample will be removed. So each query has to return at least 10 responses that are at least 60 seconds apart from each other.

• In other words, your goal in this project is to corner up to 6 regions of the dataset using intervals on the independent variables, and to maximize the density of positive values of the target.
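The evaluation rule described above can be sketched as follows. This is only one reading of the brief, not the client's scoring code: it assumes strict inequalities, assumes an observation is dropped when it falls less than 60 seconds after the last *kept* observation, and assumes a query with fewer than 10 surviving rows is rejected outright. The function name `evaluate_query`, the row layout (a dict with `timestamp`, `Score`, and one key per variable), and the variable name `x` are all hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_query(rows, bounds, min_gap_s=60, min_rows=10):
    """Score one set of restrictions.

    rows:   list of dicts with 'timestamp' (datetime), 'Score' (float)
            and one key per independent variable (hypothetical layout).
    bounds: {variable: (low, high)}; None means no bound on that side.
    Returns the sum of Score over the kept rows, or None if the query
    keeps fewer than min_rows observations.
    """
    def matches(row):
        for var, (low, high) in bounds.items():
            value = row[var]
            if low is not None and not value > low:
                return False
            if high is not None and not value < high:
                return False
        return True

    # Order chronologically before applying the 60-second rule.
    selected = sorted((r for r in rows if matches(r)),
                      key=lambda r: r["timestamp"])

    kept, last_kept = [], None
    for row in selected:
        # Assumption: the gap is measured from the last kept observation.
        if last_kept is None or \
                (row["timestamp"] - last_kept).total_seconds() >= min_gap_s:
            kept.append(row)
            last_kept = row["timestamp"]

    if len(kept) < min_rows:
        return None
    return sum(r["Score"] for r in kept)


# Toy data: 25 observations, 30 seconds apart, each with Score 1.0.
start = datetime(2020, 1, 1)
demo_rows = [{"timestamp": start + timedelta(seconds=30 * i),
              "Score": 1.0, "x": i} for i in range(25)]
demo_total = evaluate_query(demo_rows, {"x": (-1, 25)})
```

On the toy data every second observation is dropped by the 60-second rule, so the query keeps 13 rows. Sorting dominates the cost, so the same function should remain usable on a much larger dataset.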

Logistics:

• You can find the dataset in the Excel file “[url removed, login to view]”.

• You will see some variables have version A or version B (for instance W2 R2). In such cases you can use either one or the other, not both.

• Your restrictions cannot be applied using a higher number of decimal places than occur in the observations. For instance, a restriction on W1R1 cannot be 0.015; it must be either 0.01 or 0.02.
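The decimal-place rule can be handled by snapping each candidate threshold to the precision found in the data. The sketch below is a minimal illustration, assuming the values are finite numbers whose printed form reflects their true precision; `decimal_places` and `snap_candidates` are hypothetical helper names.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR


def decimal_places(values):
    """Largest number of decimal places among the observed values."""
    places = 0
    for value in values:
        # normalize() strips trailing zeros, so 3.0 counts as 0 places.
        exponent = Decimal(str(value)).normalize().as_tuple().exponent
        if exponent < 0:
            places = max(places, -exponent)
    return places


def snap_candidates(threshold, places):
    """Nearest allowed thresholds below and above, at the given precision."""
    step = Decimal(10) ** -places          # places=2 gives a step of 0.01
    exact = Decimal(str(threshold))
    lower = exact.quantize(step, rounding=ROUND_FLOOR)
    upper = exact.quantize(step, rounding=ROUND_CEILING)
    return float(lower), float(upper)


observed = [0.01, 0.2, 3.0]                # toy values, not the real data
precision = decimal_places(observed)
allowed = snap_candidates(0.015, precision)
```

For the example in the brief, a raw threshold of 0.015 on a two-decimal variable yields the two allowed candidates 0.01 and 0.02; which one to use is a choice to make by re-evaluating the query with each.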

Tips:

• Regression analysis, Neural Networks, SVM, and K-clusters will not help you much. These methods classify observations by applying a weighted average of the independent variables. The classification rule has to be on the independent variables directly; it cannot be on a weighted average of them or on any other function of them.

• Make sure to order the timestamp chronologically.

• A start could be plotting the density of the independent variables for the subsample of positive target values and for the subsample of negative target values. Then you can identify regions with a high density of positive target observations.
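A rough way to start on the density comparison suggested above is to bin one variable and compare, per bin, how many observations have a positive target. This is a sketch under assumptions, not a prescribed method: the helper name `positive_share_by_bin`, the row layout (dicts with a `Score` key), and the fixed bin width are all hypothetical choices.

```python
import math
from collections import Counter


def positive_share_by_bin(rows, var, bin_width):
    """Return [(bin_lower_edge, share_of_positive_scores, count), ...].

    Bins with a high share and a large count are candidate regions for
    a "greater than" / "less than" restriction on this variable.
    """
    positives, totals = Counter(), Counter()
    for row in rows:
        bin_index = math.floor(row[var] / bin_width)
        totals[bin_index] += 1
        if row["Score"] > 0:
            positives[bin_index] += 1
    return [(index * bin_width, positives[index] / totals[index], totals[index])
            for index in sorted(totals)]


# Toy data: Score is positive only when x is 10 or more.
toy_rows = [{"x": i, "Score": 1.0 if i >= 10 else -1.0} for i in range(20)]
shares = positive_share_by_bin(toy_rows, "x", 10)
```

The share alone can mislead when a bin holds very few rows, and it ignores the 60-second rule, so any region found this way still needs to be checked against the full objective.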

Reward/Milestones:

• All accepted bids will be awarded on completion.

• We are going to judge the performance of each bid both inside the sample (milestone 1) and outside the sample (milestone 2). Good performance means a high aggregate sum of the target variable.

• After this stage we will ask you to provide details of how you would maintain the existing algorithms (milestone 3) over a much larger dataset, ~500 thousand observations.
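Because performance is judged both inside and outside the sample, it may help to hold back part of the provided data before searching for restrictions. The sketch below assumes the out-of-sample data comes later in time than the in-sample data, which the brief does not state; `chronological_split` and the `timestamp` key are hypothetical names.

```python
def chronological_split(rows, in_sample_fraction=0.7):
    """Split rows into an earlier (search) part and a later (check) part.

    Assumption: later observations stand in for the unseen
    out-of-sample data. Rows are sorted by 'timestamp' first.
    """
    ordered = sorted(rows, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * in_sample_fraction)
    return ordered[:cut], ordered[cut:]


# Toy data with integer timestamps in shuffled order.
toy = [{"timestamp": t, "Score": 0.0} for t in (5, 2, 9, 0, 7, 3, 8, 1, 6, 4)]
earlier, later = chronological_split(toy, in_sample_fraction=0.5)
```

Restrictions tuned on the earlier part can then be re-scored on the later part; a large drop in the sum of the target suggests the intervals were fitted too closely to the sample.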

## 12 freelancers are bidding on average $707 for this job

Hi, very interesting project. Could you send me the dataset? I'd code this in R if you don't mind. Best regards, marcin

Hi, I know the previous project was granted to someone else and since you said you need several ones to work in parallel, hence I would like to bid for this project. Please let me know specifically which task is expect…

Let an expert do it.. i have 8+ years of experience. Can we discuss the project. Please initiate a chat with me so that we can discuss the project at a broader level. Why you should hire me- 1. I have a very g…

HI Brother, I am Data Scientist working in Multinational Company. My work is to see the hidden pattern in the large and complex data sets and predictive analytics, Data mining, Machine Learning and also uses the stati…

I am a Subject Matter Expert in Mathematics, Statistics, Computer Science and Physics, and a SEO search engine optimization specialist. I worked as Matlab and Statistics Consultant for several years for many compani…

Please give me your best time for discussion. My Skype Id: Vijaywebsolutions. Thanks