
Open • Ends in 3 days
My analytics stack must consume a continuous stream of events from a proprietary source (it isn’t Kafka, Flume, or Kinesis) and process them instantly with Apache Spark Structured Streaming. The job is to wire that source into Spark 3.x, perform the required real-time transformations and aggregations, and deliver the cleaned data to two sinks: Parquet files on S3 for long-term storage and a live dashboard feed for immediate insight.

Deliverables
• Production-ready Scala or PySpark code that runs on Spark 3.x in either Standalone or YARN mode.
• Structured Streaming logic featuring checkpointing, watermarking, and exactly-once guarantees.
• Tuning guidelines for executor, memory, and parallelism settings to maintain sub-second end-to-end latency at 50 k events/sec.
• A runnable test suite plus a concise README so I can reproduce results on my cluster.

Acceptance Criteria
1. Pipeline ingests and writes a sustained 50 k events/sec with no data loss for 10 minutes.
2. Average latency from ingestion to sink remains below one second during that test.
3. Entire build, test, and deploy sequence executes with a single command (e.g., sbt run or spark-submit with a provided shell script).

Include any specialized connectors or libraries you rely on and package everything in a Git repo so each commit is clear and reviewable.
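Acceptance criteria 1 and 2 can be checked mechanically once a test run has logged per-event timestamps. At 50 k events/sec for 10 minutes, a passing run carries at least 30 million events. The following is a minimal sketch of such a check, not part of the brief: the names `AcceptanceReport` and `evaluate_run`, and the assumption that each delivered event is logged as an `(ingest_ts, sink_ts)` pair in seconds, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Thresholds copied from the acceptance criteria in the brief.
TARGET_RATE = 50_000       # events per second
TEST_DURATION_S = 600      # 10 minutes
MAX_AVG_LATENCY_S = 1.0    # average ingestion-to-sink latency, in seconds


@dataclass
class AcceptanceReport:
    events_sent: int       # events emitted by the load generator
    events_received: int   # events observed at the sink
    duration_s: float      # wall-clock length of the test run
    avg_latency_s: float   # mean of (sink_ts - ingest_ts)

    @property
    def sustained_rate(self) -> float:
        return self.events_received / self.duration_s

    @property
    def passed(self) -> bool:
        return (
            self.events_received == self.events_sent     # no data loss
            and self.duration_s >= TEST_DURATION_S       # ran the full 10 minutes
            and self.sustained_rate >= TARGET_RATE       # criterion 1
            and self.avg_latency_s < MAX_AVG_LATENCY_S   # criterion 2
        )


def evaluate_run(
    events_sent: int,
    duration_s: float,
    timestamps: Iterable[Tuple[float, float]],
) -> AcceptanceReport:
    """Summarise a run from (ingest_ts, sink_ts) pairs, one per delivered event.

    The pair format is an assumption made for this sketch.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    received = 0
    total_latency = 0.0
    for ingest_ts, sink_ts in timestamps:  # single pass, constant memory
        received += 1
        total_latency += sink_ts - ingest_ts
    avg = total_latency / received if received else float("inf")
    return AcceptanceReport(events_sent, received, duration_s, avg)
```

A test harness could feed `evaluate_run` from whatever log the sink side produces and fail the build when `passed` is false. The check uses the plain average because criterion 2 is stated as an average; percentile latencies would need a different summary.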
Project ID: 39740684
4 proposals
Open for bidding
Remote project
Active 1 day ago
4 freelancers are bidding on average ₹606 INR/hour for this job

I am a Data Engineer with proven experience in setting up end-to-end data projects from source to sink. I have worked on building scalable pipelines that cover the entire lifecycle: data ingestion, transformation, storage, and delivery for analytics.
₹575 INR in 40 days

Hello,

I am a Data Engineer with solid experience in AWS Glue, Azure Data Factory, Databricks, and PySpark. I specialize in building and optimizing ETL/ELT pipelines for both batch and real-time data processing. I have worked on projects involving AWS S3, Delta Lake, Oracle, and Snowflake integrations. My focus has always been on delivering scalable, automated, and high-performing pipelines that ensure accurate and timely data for analytics and reporting.

For your project, I can:
• Build and optimize real-time Spark pipelines tailored to your requirements.
• Handle large structured and semi-structured datasets efficiently.
• Ensure reliable, well-documented workflows for easy maintenance.

I am confident I can deliver high-quality results within your expected timelines. Let’s connect to discuss your project in detail.

Best regards,
Rashmi
₹575 INR in 40 days

Bengaluru, India
Member since Aug 28, 2025