
Completed
Posted
Paid on delivery
I’m running an existing large language model in production and need targeted performance gains, not a full retrain. Your task is to inspect the current setup, identify the bottlenecks, and implement practical optimizations so the model replies faster and runs leaner without harming output quality. We’ll decide together which metric (latency, memory footprint, throughput) will matter most once you’ve profiled the system, but the immediate goal is measurable, benchmarked improvement.

Expect to work with PyTorch, Hugging Face Transformers, and CUDA; if you prefer alternatives such as TensorRT, ONNX, or quantization/pruning frameworks, feel free to propose them.

Deliverables
• A concise optimisation plan after your initial audit
• Updated code, scripts, or checkpoints reflecting the changes
• A before-and-after benchmark report showing the gains

I’m paying hourly and will fund milestones as we hit each stage, starting with the audit. Please share a brief note on similar performance-tuning work you’ve completed and your availability to start.
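For reference, the before-and-after benchmark report could be built on a small timing harness along these lines. This is only a minimal sketch, not a prescribed method: the names `benchmark`, `compare`, `baseline_generate`, and `optimized_generate` are hypothetical, and the two `*_generate` functions are stand-ins (simulated with `time.sleep`) for whatever call actually runs the model.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time fn() repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        # For GPU work, CUDA kernels launch asynchronously, so fn should
        # synchronize (e.g. torch.cuda.synchronize()) before returning.
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = min(len(samples) - 1, int(round(0.95 * (len(samples) - 1))))
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
    }


def compare(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return the speedup factor (before / after) for each shared metric."""
    return {k: before[k] / after[k] for k in before if k in after and after[k] > 0}


if __name__ == "__main__":
    # Hypothetical stand-ins for the real inference calls.
    def baseline_generate() -> None:
        time.sleep(0.002)

    def optimized_generate() -> None:
        time.sleep(0.001)

    before = benchmark(baseline_generate, warmup=1, runs=5)
    after = benchmark(optimized_generate, warmup=1, runs=5)
    print("before:", before)
    print("after: ", after)
    print("speedup:", compare(before, after))
```

Reporting the median and a tail percentile alongside the mean helps keep a few slow outliers from hiding or exaggerating a gain; memory footprint and throughput would need their own measurements.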
Project ID: 40486304
2 proposals
Remote project
Active 6 days ago

Hi, I can do this for you. I am from IIT and have similar experience. Depending on the use case, I will try pruning the model.
₹600 INR in 7 days
2 freelancers are bidding on average ₹825 INR for this job

Having cut my teeth in the field of Machine Learning, with a special focus on performance optimization and AI system integration, I'm confident that your search for a freelancer ends here. With your existing language model in production, I understand that your priority is to find targeted improvements rather than start over. This aligns perfectly with my expertise, as I have successfully undertaken similar optimization projects in the past.

I am well acquainted with the tools and frameworks that you're currently using, such as PyTorch and Hugging Face Transformers, enabling me to seamlessly step into your project, undertake the necessary inspection, and identify bottlenecks. However, being open to new approaches, I'd also like to propose alternatives such as TensorRT, ONNX, or quantization/pruning frameworks, if they promise better results for your case.

Furthermore, let me assure you that I approach client projects not as mere exercises but as opportunities to deliver tangible value. Accordingly, the deliverables for this engagement (an optimization plan after the initial audit, updated code/scripts/checkpoints reflecting the implemented optimizations, and a benchmark report showcasing the improvements) are all well within my domain expertise. My hourly payment structure provides reassurance that we will hit pre-defined milestones and guarantees the delivery of incremental value throughout our collaboration. Reach out and let's start optimizing together!
₹1,050 INR in 7 days

Pune, India
Member since Feb 4, 2009