# Write a super-efficient OLS regression function in Java

• Status: Closed
• Præmier: \$25
• Modtagne indlæg: 1
• Vinder: NikolayBokhan

## Konkurrence Instruktioner

Write an efficient Java function to perform Multiple Ordinary Least Squares regression. The function should be in the form:

double[] getLineParams(double[][] independentVars, double[] dependentVar);

The length of dependentVar and each of independetVars[] is the number of datapoints in the sample. The length of the return array is the number of independent variables (the length of independentVars), plus 1; it should describe the line that minimizes the sum of squared errors, with the nth element describing the linear contribution of the nth independent variable (which may be negative). The last element of the return array should contain the constant (offset) part of the line equation.

So, assuming there were three independent variables, the objective would be to minimize totalError in the following test code:

double totalError=0;
for (int i=0;i&lt;n;i++) {
double error=dependentVar[i] -
returnArray[0] * independentVars[0][i] +
returnArray[1] * independentVars[1][i] +
returnArray[2] * independentVars[2][i] +
returnArray[4];
totalError+=error*error;
}

Solutions should be implemented as a single class. Any version of Java may be used. Solutions should use only core Java, Math functions, etc. In particular no linear algebra libraries may be used. The function/class is not permitted to spawn any threads.

The winning entry will be the function that gives the correct results as efficiently as possible. The efficiency will be measures in terms of total execution time over a very large sample of pseudo-random datasets, which will vary wildly in width (the number of independent variables) and in length (the number of datapoints).

Winner

## Arbejdsgiverfeedback

“Fast high quality solution delivered. Expert knowledge. Professional service. Great communication. Adaptable. First rate!”

jwashtell, United Kingdom.

## Offentlig Præciserings Opslagstavle

• Konkurrenceafholder
• 9 år siden

To all participants: this is a contest for a *solution*, not for a proposal.

If it *was* a contest for a proposal, the winner would not be a proposal which says "I can do it".

Presently, @NikolayBokhan and @muneshparmar92 are the only people to link to an actual solution.

Thanks.

• 9 år siden
• Konkurrenceafholder
• 9 år siden

@Nikolay. I might be wrong, but it seems to me that one can reduce convergeIteration() from three to two nested loops, substantially reducing the complexity of this solution. (?)

• 9 år siden
1. ###### NikolayBokhan
• 9 år siden

• 9 år siden
2. Konkurrenceafholder
• 9 år siden

Excellent. That is much better :-)

• 9 år siden
• Konkurrenceafholder
• 9 år siden

Alternatively, you could submit a screen-grab of your code (I'm sure we're not using this site correctly :-/)

• 9 år siden
1. ###### NikolayBokhan
• 9 år siden

That is my first trial to work on this site either. :-)

• 9 år siden
• Konkurrenceafholder
• 9 år siden

@brokenaiub, would you like to submit a solution? Just put a link to your solution in the text accompanying your submission, or in the comments below.

• 9 år siden
• Konkurrenceafholder
• 9 år siden

Thanks Nikolay - that looks excellent. This is the first time I have used Freelancer, but I think you need to make this a proper submission in order for me to be able to award you the prize.

• 9 år siden
• ###### NikolayBokhan
• 9 år siden

The following link is the link to the class with algorithm. You can specify the expected accuracy with a parameter in the constructor.

• 9 år siden
• Konkurrenceafholder
• 9 år siden

Hi Nikalay. If you are going to adopt an iterative approach, then I would be much more interested in Least Absolute Deviations! Can you do that? At any rate, an iterative approach is OK: please use a parameter to specify the precision and I will conduct the evaluation based on a few different precisions, and I will also take any computational complexity benefits to your approach into account (i.e. its improved performance on the larger datasets) when deciding a winner. Ultimately I will have to apply some judgement, because these are unlike approaches and cannot be directly compared. Of course if you are the only person to submit, and you submit an iterative implementation, then you will win the contest regardless of performance! :-)

• 9 år siden
• ###### NikolayBokhan
• 9 år siden

Accurate answer will be received only if the regression will be solved like human does that. For AI and machine learning iterative way is usually used, just because for big arrays of data (20+ independent variables) it is much more efficient. if you need to solve the regression in human-like manner - sorry i will not help you (at least for this budget). Otherwise i need accuracy to be specified. Best regards, Nikolay.

• 9 år siden
• Konkurrenceafholder
• 9 år siden

Hi Nikolay. 1) It seems that this shouldn't have a significant bearing on the efficiency, as the correct answer can be found analytically. The precision should therefore be restricted only by the limitations of the types (i.e. double) and mathematical operations involved, within reason. 2) No, the data has not been normalized.

• 9 år siden
• ###### NikolayBokhan
• 9 år siden

Dear jwashtell, I have following questions: 1) We should minimize the error. What is allowable accuracy? (0,1 or maybe 0,00001 ?) It has a great influence on the speed of the algorithm. 2) Is the incomming data normalized to any range? Could it happen that for some column we have values between 0 and 0,00001 and for the other between -1000 and 1000? Best regards, Nikolay

• 9 år siden
• Konkurrenceafholder
• 9 år siden

In case it is not obvious from the fact that this is an open contest - your submissions should link to your actual code, not just a note saying that you have done it or can do it :-)

• 9 år siden
• Konkurrenceafholder
• 9 år siden

Thanks muneshparmar92. Please note: "Solutions should use only core Java, Math functions, etc. In particular no linear algebra libraries may be used." I've revised the description to read "Solutions should be implemented as a single class."

• 9 år siden
• ###### mrhassancse
• 9 år siden

i'm interested..............................

• 9 år siden
• ###### muneshparmar92
• 9 år siden

import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;
import org.apache.commons.math3.stat.regression.SimpleRegression;
final OLSMultipleLinearRegression regression2 = new OLSMultipleLinearRegression();
double[] y = {
4.0,
8,
13,
};
double[][] x2 =
{
{ 1.0, 1, 1 },
{ 1.0, 2, 4 },
{ 0.0, 3, 9 },
};
regression2.newSampleData(y, x2);
regression2.setNoIntercept(true);
regression2.newSampleData(y, x2);
double[] beta = regression2.estimateRegressionParameters();
for (double d : beta) {
System.out.println("D: " + d);
}

• 9 år siden
• ###### muneshparmar92
• 9 år siden

double totalError=0;
for (int i=0;i

• 9 år siden

## Sådan kommer du i gang med konkurrencer

• Opret din konkurrence Hurtigt og nemt

• Få tonsvis af indlæg Fra hele verden