
Completed
My production PostgreSQL 13 cluster moves thousands of transactions every second and can’t afford more than a few minutes of downtime. Day-to-day operations are solid, yet I want iron-clad confidence that any disaster (hardware failure, accidental DROP, or site outage) can be reversed quickly. Speed of recovery is the single most important goal.

Current state
• Full and differential backups run with pgBackRest, alongside continuous archiving for point-in-time recovery.
• Backups land in a dedicated repo in DC-A, then the entire pgBackRest repository is replicated to DC-B with rsync.
• I have observed lag in rsync that occasionally leaves standby servers waiting for the latest WAL segments.

What I need
1. A reviewed and improved backup/recovery design that guarantees the shortest possible RTO/RPO for a high-volume workload.
2. A concrete, tested method for cross-DC WAL shipping that is demonstrably faster and more reliable than my current rsync pull, whether that is pgBackRest stanza-sync, native asynchronous replication tweaks, SSH streaming, or another approach you recommend.
3. Documented runbooks for common failure scenarios so my team can execute restores without you on call.
4. Proof of concept: a simulated “fat-finger” DROP on staging, followed by full recovery, timed and documented.

Acceptance criteria
• End-to-end restore of a 500 GB dataset completes within the recovery window we agree on.
• No WAL gap exceeds the archive_timeout you prescribe, verified under sustained write load.
• All procedures are repeatable via shell scripts or Ansible playbooks checked into Git.

Tools in play today: PostgreSQL 13.x, pgBackRest, rsync, Debian 11. I’m open to additional utilities (barman, wal-g, pg_receivewal, etc.) if they meet the above targets.

If this sounds like your sweet spot, let’s talk through your proposed architecture and testing plan so we can lock in a safer, faster path to recovery.
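
The WAL-gap acceptance criterion could be checked with something like the sketch below. It is only one possible reading of that criterion: it assumes a "gap" means the time between consecutive WAL segments arriving in the DC-B repository, and that arrival timestamps have already been collected somehow (for example from file modification times). The function name `find_wal_gaps` and the sample timestamps are hypothetical, not part of pgBackRest or PostgreSQL.

```python
from datetime import datetime, timedelta


def find_wal_gaps(archived_at, archive_timeout):
    """Return (previous, current, gap) for every pair of consecutive
    arrival timestamps that are further apart than archive_timeout."""
    ordered = sorted(archived_at)
    gaps = []
    # Compare each arrival time with the one immediately before it.
    for previous, current in zip(ordered, ordered[1:]):
        gap = current - previous
        if gap > archive_timeout:
            gaps.append((previous, current, gap))
    return gaps


if __name__ == "__main__":
    # Hypothetical arrival times, in seconds after an arbitrary start.
    start = datetime(2024, 1, 1, 12, 0, 0)
    arrivals = [start + timedelta(seconds=s) for s in (0, 20, 45, 130, 150)]

    # Assumed archive_timeout of 60 seconds, purely for illustration.
    for previous, current, gap in find_wal_gaps(arrivals, timedelta(seconds=60)):
        print(f"gap of {gap.total_seconds():.0f}s between {previous} and {current}")
```

Run against timestamps gathered during a sustained write-load test, an empty result would suggest the criterion holds for that run. Any reported pair marks a window worth investigating.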
Project ID: 40366742
9 proposals
Remote project
Active 9 days ago

Your rsync bottleneck between DC-A and DC-B is the weak link—standby servers waiting for WAL means slower recovery when every second counts. I'll implement parallel WAL streaming, pgBackRest tuning, synchronous replication logic, automated failover, and tested disaster recovery runbooks to cut your RTO from minutes to under 60 seconds. I've handled similar multi-region PostgreSQL setups. ₹2500, 5 days. Best regards, Val
₹8,000 INR in 4 days
0.0
9 freelancers are bidding on average ₹2,311 INR/hour for this job

With over 5 years of experience as a Full-Stack Developer and DevOps Engineer, I have developed a keen understanding of the balance between clean code and rock-solid infrastructure. My proficiency with PostgreSQL and system administration would be instrumental in delivering a reviewed and improved backup strategy for your high-volume workload. To meet your recovery window, I will draw on my knowledge of Linux administration and Kubernetes deployment to make all procedures repeatable via shell scripts or Ansible playbooks committed to Git. With this approach, your team can confidently execute restores without additional support. My command of PostgreSQL will also help deliver a faster and more reliable cross-DC WAL shipping mechanism. I will test each approach against sustained write load before recommending the most efficient one for you. Let’s collaborate to bring your PostgreSQL cluster's RTO/RPO down to an absolute minimum while equipping your team with runbooks for common failure scenarios.
₹800 INR in 40 days
1.9

Hi Hiring Manager, I’m Sean, a Database Reliability Engineer with 12 years’ experience specializing in PostgreSQL, backup & recovery, and automation. I delivered a sub-5-minute RTO for a 600 GB OLTP cluster by combining streaming WAL, parallel restore, and automation for repeatable recoveries. I will analyze your pgBackRest configuration and replace the rsync bottleneck with a low-latency cross-DC WAL shipping strategy (e.g., pg_receivewal streaming to a remote archive with redundant targets or stanza-sync with direct streaming), optimize archive_timeout and WAL segment sizing for sustained high write rates, and implement parallel, incremental restores to minimize RTO. I can do this project perfectly. I typically deliver this scope in 25 days, including tests, Ansible playbooks, and documented runbooks. All work includes automated tests, logging/monitoring guidance, OWASP basics where applicable, clean code and docs, and data-privacy/guardrail notes for any AI tooling used. I’ll provide a staged fat-finger recovery POC with timing and scripts. What maximum RTO and RPO targets (in minutes) do you require for the 500 GB restore under production-like sustained write load, and are there any blackout windows or network constraints between DC-A and DC-B I should plan around? Best regards, Sean
₹4,657 INR in 25 days
1.4

The main challenge isn't just setting up pg_basebackup or WAL archiving; it's designing a recovery strategy that guarantees RPO/RTO under your transaction load without impacting production performance. I've worked with high-throughput PostgreSQL clusters where even a few seconds of replication lag or backup I/O contention caused cascading issues.

My approach:
• Continuous WAL archiving to S3/GCS with compression
• Point-in-time recovery testing (not just backups, but actual restore drills)
• Streaming replication, plus logical replication where needed
• Monitoring for replication lag and WAL generation rate, with checkpoint tuning
• Automated failover with Patroni/repmgr if you need HA

I'll audit your current setup, identify gaps, implement the strategy, and document runbooks so your team can handle any disaster scenario confidently. Since I'm building my Freelancer profile, I'm offering this at a competitive rate for quality work and a solid review. Check my profile for backend/infrastructure projects; I've handled production database work across multiple stacks. Ready to make your cluster disaster-proof.
₹2,793.04 INR in 40 days
0.0

Hi, This is exactly the kind of high-stakes PostgreSQL environment I specialize in—where recovery speed is not just important, it’s critical. You already have a solid foundation with pgBackRest + WAL archiving, but the bottleneck you’re seeing with rsync lag is a known limitation in high-throughput systems. I can help you redesign this into a low-latency, near-zero data loss recovery architecture.
₹1,000 INR in 40 days
0.0

India
Payment method verified
Member since Apr 5, 2014