Write a super-efficient OLS regression function in Java

  • Status: Closed
  • Præmier: $25
  • Modtagne indlæg: 1
  • Vinder: NikolayBokhan

Konkurrence Instruktioner

Write an efficient Java function to perform Multiple Ordinary Least Squares regression. The function should be in the form:

double[] getLineParams(double[][] independentVars, double[] dependentVar);

The length of dependentVar and each of independetVars[] is the number of datapoints in the sample. The length of the return array is the number of independent variables (the length of independentVars), plus 1; it should describe the line that minimizes the sum of squared errors, with the nth element describing the linear contribution of the nth independent variable (which may be negative). The last element of the return array should contain the constant (offset) part of the line equation.

So, assuming there were three independent variables, the objective would be to minimize totalError in the following test code:

double totalError=0;
for (int i=0;i<n;i++) {
double error=dependentVar[i] -
returnArray[0] * independentVars[0][i] +
returnArray[1] * independentVars[1][i] +
returnArray[2] * independentVars[2][i] +
returnArray[4];
totalError+=error*error;
}

Solutions should be implemented as a single class. Any version of Java may be used. Solutions should use only core Java, Math functions, etc. In particular no linear algebra libraries may be used. The function/class is not permitted to spawn any threads.

The winning entry will be the function that gives the correct results as efficiently as possible. The efficiency will be measures in terms of total execution time over a very large sample of pseudo-random datasets, which will vary wildly in width (the number of independent variables) and in length (the number of datapoints).

Anbefalede færdigheder

Arbejdsgiverfeedback

“Fast high quality solution delivered. Expert knowledge. Professional service. Great communication. Adaptable. First rate!”

Profilbillede jwashtell, United Kingdom.

Offentlig Præciserings Opslagstavle

  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    To all participants: this is a contest for a *solution*, not for a proposal.

    If it *was* a contest for a proposal, the winner would not be a proposal which says "I can do it".

    Presently, @NikolayBokhan and @muneshparmar92 are the only people to link to an actual solution.

    Thanks.

    • 9 år siden
  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    @Nikolay. I might be wrong, but it seems to me that one can reduce convergeIteration() from three to two nested loops, substantially reducing the complexity of this solution. (?)

    • 9 år siden
    1. NikolayBokhan
      NikolayBokhan
      • 9 år siden

      https://drive.google.com/file/d/0B91AEhaVA7YecGxOODFtWnRfTTg/edit?usp=sharing

      • 9 år siden
    2. jwashtell
      Konkurrenceafholder
      • 9 år siden

      Excellent. That is much better :-)

      • 9 år siden
  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    Alternatively, you could submit a screen-grab of your code (I'm sure we're not using this site correctly :-/)

    • 9 år siden
    1. NikolayBokhan
      NikolayBokhan
      • 9 år siden

      That is my first trial to work on this site either. :-)

      • 9 år siden
  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    @brokenaiub, would you like to submit a solution? Just put a link to your solution in the text accompanying your submission, or in the comments below.

    • 9 år siden
  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    Thanks Nikolay - that looks excellent. This is the first time I have used Freelancer, but I think you need to make this a proper submission in order for me to be able to award you the prize.

    • 9 år siden
  • NikolayBokhan
    NikolayBokhan
    • 9 år siden

    The following link is the link to the class with algorithm. You can specify the expected accuracy with a parameter in the constructor.

    https://drive.google.com/file/d/0B91AEhaVA7YeTl9sbmpCVTZpY2c/edit?usp=sharing

    • 9 år siden
  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    Hi Nikalay. If you are going to adopt an iterative approach, then I would be much more interested in Least Absolute Deviations! Can you do that? At any rate, an iterative approach is OK: please use a parameter to specify the precision and I will conduct the evaluation based on a few different precisions, and I will also take any computational complexity benefits to your approach into account (i.e. its improved performance on the larger datasets) when deciding a winner. Ultimately I will have to apply some judgement, because these are unlike approaches and cannot be directly compared. Of course if you are the only person to submit, and you submit an iterative implementation, then you will win the contest regardless of performance! :-)

    • 9 år siden
  • NikolayBokhan
    NikolayBokhan
    • 9 år siden

    Accurate answer will be received only if the regression will be solved like human does that. For AI and machine learning iterative way is usually used, just because for big arrays of data (20+ independent variables) it is much more efficient. if you need to solve the regression in human-like manner - sorry i will not help you (at least for this budget). Otherwise i need accuracy to be specified. Best regards, Nikolay.

    • 9 år siden
  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    Hi Nikolay. 1) It seems that this shouldn't have a significant bearing on the efficiency, as the correct answer can be found analytically. The precision should therefore be restricted only by the limitations of the types (i.e. double) and mathematical operations involved, within reason. 2) No, the data has not been normalized.

    • 9 år siden
  • NikolayBokhan
    NikolayBokhan
    • 9 år siden

    Dear jwashtell, I have following questions: 1) We should minimize the error. What is allowable accuracy? (0,1 or maybe 0,00001 ?) It has a great influence on the speed of the algorithm. 2) Is the incomming data normalized to any range? Could it happen that for some column we have values between 0 and 0,00001 and for the other between -1000 and 1000? Best regards, Nikolay

    • 9 år siden
  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    In case it is not obvious from the fact that this is an open contest - your submissions should link to your actual code, not just a note saying that you have done it or can do it :-)

    • 9 år siden
  • jwashtell
    Konkurrenceafholder
    • 9 år siden

    Thanks muneshparmar92. Please note: "Solutions should use only core Java, Math functions, etc. In particular no linear algebra libraries may be used." I've revised the description to read "Solutions should be implemented as a single class."

    • 9 år siden
  • mrhassancse
    mrhassancse
    • 9 år siden

    i'm interested..............................

    • 9 år siden
  • muneshparmar92
    muneshparmar92
    • 9 år siden

    import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;
    import org.apache.commons.math3.stat.regression.SimpleRegression;
    final OLSMultipleLinearRegression regression2 = new OLSMultipleLinearRegression();
    double[] y = {
    4.0,
    8,
    13,
    };
    double[][] x2 =
    {
    { 1.0, 1, 1 },
    { 1.0, 2, 4 },
    { 0.0, 3, 9 },
    };
    regression2.newSampleData(y, x2);
    regression2.setNoIntercept(true);
    regression2.newSampleData(y, x2);
    double[] beta = regression2.estimateRegressionParameters();
    for (double d : beta) {
    System.out.println("D: " + d);
    }

    • 9 år siden
  • muneshparmar92
    muneshparmar92
    • 9 år siden

    double totalError=0;
    for (int i=0;i

    • 9 år siden

Vis flere kommentarer

Sådan kommer du i gang med konkurrencer

  • Opret din konkurrence

    Opret din konkurrence Hurtigt og nemt

  • Få tonsvis af indlæg

    Få tonsvis af indlæg Fra hele verden

  • Tildel det bedste indlæg

    Tildel det bedste indlæg Download filerne - Nemt!

Opret en Konkurrence Nu eller slut dig til os i dag!