Mock Data Gen with Machine Learning Module - 01/12/2023 02:56 EST

  • Status: Closed
  • Prize: $500
  • Entries received: 1
  • Winner: td7x

Contest Instructions


This is a software engineering contest that leverages machine learning to solve developer experience inconveniences in creating mock data for testing and for demos within the JavaScript ecosystem.

Winning submissions will include a GitHub repo of the software, complete with documentation and CI/CD using GitHub Workflows.

Employer reserves all rights to the software created under this contest but will redistribute the software under an Open Source license. All dependencies must have permissive OSI approved licenses and the software must be runnable offline, without dependence on an external web service or datastore and without dependency on specialized hardware.


Simple faker or charade libraries can be used for mock data in software development, but using them can be labor-intensive because they require a developer to select the correct method and to identify the input parameters for each data field. Developers have enough cognitive overhead and need a fake data solution that can use existing data models/schemas with zero configuration to create the fake data.


The deliverable is a NodeJS module that produces semantically accurate fake data from an arbitrary data model or schema with zero configuration. We are primarily a TypeScript/NodeJS shop and describe the requirements from that perspective, but submissions that are Rust-based and compile to WASM are more than welcome. Runtime portability (in-browser, Bun, Cloudflare, etc.) is preferred, but NodeJS is required.

Data model handlers for GraphQL SDL and JSONSchema are required. Extra preference will be given to submissions with additional handlers for TypeScript type definitions and protobufs.
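
As a rough illustration of what a JSONSchema data model handler might extract, the sketch below lists the top-level fields of a schema. The JsonSchema and FieldInfo types and the listFields function are hypothetical names, the schema type covers only a small subset of JSON Schema, and nested objects are not traversed.

```typescript
// Minimal subset of JSON Schema, assumed for illustration only.
type JsonSchema = {
  readonly type?: string;
  readonly format?: string;
  readonly properties?: Readonly<Record<string, JsonSchema>>;
};

// Hypothetical summary of one field in the data model.
type FieldInfo = {
  readonly name: string;
  readonly type: string;
  readonly format?: string;
};

// Lists top-level properties only; a schema without properties yields an empty list.
const listFields = (schema: JsonSchema): readonly FieldInfo[] => {
  const props: Readonly<Record<string, JsonSchema>> = schema.properties ?? {};
  return Object.keys(props).map((name) => ({
    name,
    type: props[name].type ?? "unknown",
    format: props[name].format,
  }));
};

// Example schema with two fields.
const userSchema: JsonSchema = {
  type: "object",
  properties: {
    email: { type: "string", format: "email" },
    age: { type: "integer" },
  },
};

const userFields = listFields(userSchema);
```

The field names and formats collected this way are the inputs a generator-selection step would work from.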

Various fake data handlers should be supported. Required is a handler that accepts a single field name from the data model and returns semantically correct mock data consistent with the larger data model. Extra preference will be given to submissions with additional handlers that accept a GraphQL request shape (returning a GraphQL response shape) and a handler that does not accept an argument and returns an object for the data model (that could be stringified into JSON).
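
The two non-GraphQL handler shapes described above could look roughly like this. It is a minimal sketch: every type and function name is hypothetical, and the fixed-value generators stand in for whatever the module would resolve from the schema.

```typescript
// Hypothetical shapes for illustration; none of these names come from the brief.
type ValueGenerator = () => unknown;
type DataModel = Readonly<Record<string, ValueGenerator>>;

type FieldHandler = (fieldName: string) => unknown;
type ObjectHandler = () => Readonly<Record<string, unknown>>;

const hasField = (model: DataModel, fieldName: string): boolean =>
  Object.prototype.hasOwnProperty.call(model, fieldName);

// Single-field handler: returns undefined for an unknown field instead of throwing.
const makeFieldHandler = (model: DataModel): FieldHandler =>
  (fieldName) => (hasField(model, fieldName) ? model[fieldName]() : undefined);

// No-argument handler: builds one object covering every field in the model.
const makeObjectHandler = (model: DataModel): ObjectHandler =>
  () => Object.keys(model).reduce<Record<string, unknown>>(
    (acc, name) => ({ ...acc, [name]: model[name]() }),
    {},
  );

// Stand-in generators; a real module would derive these from the data model.
const userModel: DataModel = {
  firstName: () => "Ada",
  email: () => "ada@example.com",
};

const mockField = makeFieldHandler(userModel);
const mockObject = makeObjectHandler(userModel);
```

Here mockField("email") yields the stand-in email value, and JSON.stringify(mockObject()) produces a JSON document for the whole model.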

It is expected that this software will utilize existing generators such as FakerJs, ChanceJs, CasualJs and RandExpJs just as other higher level tools do:


Unlike these existing tools, this software will not hard-code its mappings and thereby limit itself to individual basic field types or require significant configuration for non-basic field types. How we overcome this limit is the crux of what makes this software different. Perhaps NLP string or vector comparisons can be used to select the correct generator function from the field name, with only unmatched requests falling back to an LLM. LangChain seems like an attractive pattern and technology for this.
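
One way the string-comparison idea could work is sketched below, using a bigram Dice coefficient to pick a generator from a field name. This is an assumption about a possible approach, not a prescribed design: the generator table, the fixed return values (stand-ins for FakerJs or ChanceJs calls), the function names, and the 0.5 threshold are all hypothetical.

```typescript
// All names are hypothetical; the generators are stand-ins for real faker calls.
type FieldGenerator = () => string;

const generators: Readonly<Record<string, FieldGenerator>> = {
  email: () => "ada@example.com",
  firstname: () => "Ada",
  phonenumber: () => "555-0100",
};

// Lowercase and drop separators so "first_name" and "firstName" compare equal.
const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");

// Adjacent character pairs, e.g. "mail" gives ["ma", "ai", "il"].
const bigrams = (s: string): readonly string[] =>
  s.split("").slice(0, -1).map((c, i) => c + s[i + 1]);

const unique = (xs: readonly string[]): readonly string[] =>
  xs.filter((x, i) => xs.indexOf(x) === i);

// Dice coefficient over the sets of distinct bigrams; 0 when neither string has any.
const dice = (a: string, b: string): number => {
  const x = unique(bigrams(a));
  const y = unique(bigrams(b));
  const shared = x.filter((g) => y.indexOf(g) !== -1).length;
  const total = x.length + y.length;
  return total === 0 ? 0 : (2 * shared) / total;
};

type Match = { readonly key: string; readonly score: number };

const bestMatch = (fieldName: string): Match =>
  Object.keys(generators)
    .map((key) => ({ key, score: dice(normalize(fieldName), key) }))
    .reduce((best, next) => (next.score > best.score ? next : best), { key: "", score: 0 });

// Assumed cut-off; it would need tuning against real schemas.
const THRESHOLD = 0.5;

// Returns undefined when nothing is similar enough; such misses would go to the LLM fallback.
const selectGenerator = (fieldName: string): FieldGenerator | undefined => {
  const match = bestMatch(fieldName);
  return match.score >= THRESHOLD ? generators[match.key] : undefined;
};
```

With this table, "user_email" and "firstName" resolve to the email and first-name generators, while a name such as "zzz" returns undefined and would be handed to the fallback.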

*Code Standards*

Code will be written in strict TypeScript with strong typing and be compatible with Bun, Deno, and NodeJS. Code will be "Clean" and robust. OOP patterns are to be avoided in favor of "strategic" use of functional programming. eslint-plugin-functional/recommended is great; additional FP libraries such as fp-ts or Ramda are not required. In general:
- Small composable functions.
- No nested code.
- Avoid if statements. Branches are only ok in the simplest and unavoidable use cases. Simple clean ternaries are fine.
- Along with avoiding branching, absolutely no try/catch.
- Never throw.
- No control loops.
- No unbounded iterators.
- Use maps rather than a switch or if/else.
- Functions should be small, pure, and composable.
- Separate configuration from code.
- Use arrow function syntax.
- Avoid async/await as one can accidentally block the event loop.
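
A small sketch of how several of these rules might combine: a lookup table in place of a switch, a result value in place of throw, and a single ternary as the only branch. The names and the choice of a Result type are illustrative assumptions, not requirements from the brief.

```typescript
// Hypothetical example; names are illustrative and not part of the brief.
type Result<T> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: string };

type ScalarKind = "string" | "integer" | "boolean";

// Configuration kept apart from logic: a lookup table instead of a switch.
const scalarDefaults: Readonly<Record<ScalarKind, () => unknown>> = {
  string: () => "",
  integer: () => 0,
  boolean: () => false,
};

// Own-property check so inherited keys such as "toString" are rejected.
const isScalarKind = (kind: string): kind is ScalarKind =>
  Object.prototype.hasOwnProperty.call(scalarDefaults, kind);

// Failure is returned as data, so callers never need try/catch.
const defaultFor = (kind: string): Result<unknown> =>
  isScalarKind(kind)
    ? { ok: true, value: scalarDefaults[kind]() }
    : { ok: false, error: `unsupported kind: ${kind}` };
```

Callers inspect the ok flag of the returned value, which keeps error handling explicit without exceptions.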


Fine-grained testing of LangChain does not seem completely straightforward, but its testability is currently being improved and the LangSmith debugger should probably be used. Code should be decoupled so that mocks can be avoided. Vitest or Jest should be used with fast-check as well as static assertions. Strict TDD is not required but preferred. Writing tests throughout development, and not at the end, is required. The important thing is that testable code is cleaner, simpler, and more robust. Tested code is easier to change.

The test suite should also prove the software works.

Public Clarification Board

  • farhankha4548
    • 2 months ago

    I have read your code and updated all the functions, but you have already awarded someone else.

  • tokibul2
    • 2 months ago

    Also, do you know they are scammers?

    I earned 1000 GBP and 200+ USD by providing my service on this platform. But when I requested a payment withdrawal they closed my account. Blocked me and I couldn't chat or create any ticket.

    So I created this account to help me get my account balance into my bank account.

    What do you think about this scammer giving me my earnings in my account?

    [ They will just block this account. Because this is their only way of earning by taking hard-working payment from poor freelancers. In my words, they are a Beggar. ]

    Check this screenshot for more :

  • farhankha4548
    • 2 months ago

    I am working in Rust to provide you a better solution, and I will also show you a demo video.

  • farhankha4548
    • 2 months ago

    Hello sir, would you like it in Node.js or Rust?
    Which do you prefer?
    I can also provide it in Rust if you want.

    1. dutco7
      • 2 months ago

      A Rust solution would be great. It just needs to be able to run in BunJS and CloudFlare. WASI direction could be good, wasm-pack could help. and are quite interesting.

  • dataexpert18
    • 2 months ago

    Can you explain on which data you want to apply machine learning and what outcome you expect from machine learning?

    1. dutco7
      • 2 months ago

      Hello Zafar, I'm not sure how to explain it better than in the description. The generative model may need to use the data model for fine-tuning, or perhaps zero-shot would work. The mock data gen function will accept a field name and return the semantically correct, generated data.

