
Closed
We are looking for an experienced Python developer to build a robust local pipeline that processes Binance Futures historical data into an ML-ready dataset. The goal is to ingest public data from Binance Vision (aggTrades, all klines, and bookDepth) and output clean, normalized, lookahead-bias-free features stored as Parquet files or in DuckDB.

Scope of Work & Deliverables

1. Ingestion & Database Setup (Core Foundation)
- Data Source: Programmatically download the historical daily/monthly ZIP files from the public [login to view URL] (specifically aggTrades, all klines [1m], and bookDepth for BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT, and BNBUSDT).
- Storage Architecture: Set up local storage using DuckDB or Parquet that can handle millions of rows without running out of memory.
- Alignment: Parse the different frequencies (tick-by-tick trades, order book snapshots, and 1m klines) and align them to a unified timestamp sequence.

2. Core Microstructure Feature Extraction
- Implement Python/Polars (or pandas) scripts to compute the features on the aligned data.

3. Advanced Optimization & ML Readiness
- Strict Lookahead Bias Prevention: Ensure all rolling features (e.g., rolling z-scores, Parkinson volatility) are calculated using only parameters available at t−1, to prevent data leakage.
- Normalization: Implement rolling z-scores or min-max normalization per symbol to keep features stationary.
- Labeling: Implement a basic Triple Barrier Method or directional label generator.
- Output: Save clean Parquet files per symbol, free of NaNs and infinite values, structured for immediate model training.
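The t−1 rule for rolling features, the Triple Barrier labels, and the NaN/inf-free output requirement can be illustrated with a minimal, dependency-free sketch. It assumes one symbol's 1m bars have already been aligned into plain Python lists; every function name and parameter here (`window`, `pt`, `sl`, `horizon`) is hypothetical, barriers are checked against close prices only (a simplification), and a real pipeline would express the same logic in Polars or pandas.

```python
import math
from typing import List, Optional, Sequence, Tuple


def rolling_zscore_lagged(x: Sequence[float], window: int) -> List[Optional[float]]:
    """Hypothetical helper: z-score of x[t] using a mean/std estimated only from
    x[t-window .. t-1], so the normalizing parameters never include bar t."""
    if window < 2:
        raise ValueError("window must be >= 2 for a sample standard deviation")
    out: List[Optional[float]] = []
    for t in range(len(x)):
        if t < window:
            out.append(None)  # not enough strictly-past history yet
            continue
        past = x[t - window:t]  # slice excludes x[t]
        mu = sum(past) / window
        sd = math.sqrt(sum((v - mu) ** 2 for v in past) / (window - 1))
        out.append((x[t] - mu) / sd if sd > 0 else None)  # flat window -> undefined
    return out


def parkinson_vol_lagged(
    high: Sequence[float], low: Sequence[float], window: int
) -> List[Optional[float]]:
    """Hypothetical helper: Parkinson high-low volatility per bar,
    sqrt(mean(ln(H/L)^2) / (4 ln 2)), over bars t-window .. t-1."""
    k = 1.0 / (4.0 * math.log(2.0))
    out: List[Optional[float]] = []
    for t in range(len(high)):
        if t < window:
            out.append(None)
            continue
        s = sum(math.log(high[i] / low[i]) ** 2 for i in range(t - window, t))
        out.append(math.sqrt(k * s / window))
    return out


def triple_barrier_labels(
    close: Sequence[float], pt: float, sl: float, horizon: int
) -> List[Optional[int]]:
    """Hypothetical helper: +1 if close reaches the profit-taking barrier first,
    -1 if it reaches the stop-loss barrier first, 0 if neither is touched within
    `horizon` bars (the vertical barrier). Labels look forward by design; they
    are targets, never features."""
    n = len(close)
    labels: List[Optional[int]] = []
    for t in range(n):
        if t + horizon >= n:
            labels.append(None)  # vertical barrier falls past the end of the data
            continue
        upper = close[t] * (1.0 + pt)
        lower = close[t] * (1.0 - sl)
        label = 0
        for i in range(t + 1, t + horizon + 1):
            if close[i] >= upper:
                label = 1
                break
            if close[i] <= lower:
                label = -1
                break
        labels.append(label)
    return labels


def clean_rows(*columns: Sequence[Optional[float]]) -> List[Tuple[float, ...]]:
    """Keep only rows where every column is present and finite (no None/NaN/inf)."""
    return [
        row
        for row in zip(*columns)
        if all(v is not None and math.isfinite(v) for v in row)
    ]
```

A per-symbol run would compute the feature columns and the label column from the aligned bars and pass them through `clean_rows` before writing Parquet. In Polars or pandas the same t−1 rule is usually a rolling statistic followed by `shift(1)`; the list-based form above just makes the window boundaries explicit.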
Project ID: 40488721
7 proposals
Remote project
Active 4 hours ago
7 freelancers are bidding on average ₹1,000 INR/hour for this job

Hello, I trust you're doing well. I have nearly a decade of hands-on experience with machine learning algorithms. My expertise lies in developing various artificial intelligence algorithms, including the one you require, using Matlab, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications on the subject. My portfolio, which showcases my past work, is available for your review. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss the details. Warm regards. Please check my portfolio: https://www.freelancer.com/u/sajjadtaghvaeifr
₹1,500 INR in 40 days
5.7

As an experienced data analyst and scientist with a deep understanding of Python, I am well equipped to deliver a robust and efficient pipeline for processing Binance data to your requirements. My 8+ years in the field have given me experience with complex datasets and the advanced analytics techniques this project needs. My expertise spans core data handling (setting up local storage and aligning data of different frequencies), microstructure feature extraction, and advanced optimization for ML readiness. I excel at preventing lookahead bias, normalizing features, labeling, and producing clean, reliable Parquet files ready for ML model training. Having rolled out solutions in finance and other industries that depend on data management and process optimization, I have honed my ability to deliver accurate results under tight deadlines. With me at the helm of your project, you can rest assured of quality, efficiency, and unlocking the full potential of your Binance data. Let's embark on this data-driven journey together.
₹1,000 INR in 40 days
3.8

Addressing the core challenge of efficiently processing and normalizing large datasets without introducing lookahead bias begins with a robust ingestion and database setup. Binance's historical data can be downloaded programmatically with standard Python libraries, with DuckDB providing high-performance storage and querying. Aligning the different data frequencies to a unified timestamp sequence is crucial for accurate feature extraction. I will implement the statistical features in Polars and ensure they are immediately ML-ready. The initial deliverable will be ready in 14 days. I'm ready to kick this off: what's the best way to get started?
₹800 INR in 40 days
0.0

I see you need a robust Python pipeline to process Binance Futures historical data into an ML-ready format. I'd build this using Python scripts to ingest and process data from Binance Vision, ensuring clean, normalized features stored in Parquet format. This will allow you to efficiently analyze and model data for better decision-making. I've worked with similar data pipelines for finance and trading industries, ensuring accurate results. Quick question: How soon do you need this pipeline up and running? Regards, Collen Jr Liebenberg
₹750 INR in 7 days
0.0

⚡️ONLY PAY IF YOU’RE IMPRESSED⚡️ I have extensive experience building data pipelines for financial and ML applications, including ingesting and processing time-series data such as Binance Futures historical data. I can deliver a robust, efficient pipeline that produces clean, normalized, lookahead-bias-free datasets ready for modeling.
Core Deliverables:➡️
- Programmatic data ingestion from Binance Vision
- Storage setup using DuckDB/Parquet for scalability
- Multi-frequency data alignment with unified timestamps
- Feature extraction with bias prevention
- Normalized, labeled outputs in Parquet format
Approach:➡️
- Use Python and Polars for efficient processing
- Strict validation to avoid data leakage
- Modular, reproducible code for transparency
Committed to delivering a high-quality product aligned with your goals. Looking forward to discussing this further.
Kind regards,
Aaron Roberts
₹950 INR in 30 days
0.0

Hisar, India
Member since Jun 3, 2026