
Closed
Paid on delivery
I need a production-ready search stack that starts with an ETL flow pulling exclusively from our internal PostgreSQL databases. The pipeline must ingest and transform 38 000+ B2B category records and 5 000–10 000 company profiles, then run cleaning, vectorization, and enrichment steps so that every record is categorized and stored in a pgvector-enabled schema. Once the data is in place, a separate microservice should expose a REST API that supports hybrid search: dense vectors (OpenAI text-embedding-3-small) combined with BM25 and blended with RRF scoring. Results have to work equally well in Hungarian and English; huspacy, spaCy, and OpenAI are the preferred tools for language handling and any fallback generation. I expect the codebase in Python 3.10+, organised as two deployable units:
• ETL package that connects to the existing tables, performs the vector and category enrichment, and writes into PostgreSQL/pgvector with idempotent reruns.
• FastAPI microservice offering endpoints for single-query search and batch queries, with Dockerfiles and a short README explaining environment variables and health checks.
Acceptance will be based on end-to-end tests: I run the ETL, hit /search with a Hungarian and an English query, and receive ranked results that include both BM25 and vector hits blended by RRF. Detailed RFQ attached.
Project ID: 40192978
15 proposals
Remote project
Active 15 days ago
15 freelancers are bidding on average ₹28,285 INR for this job

I will design a production-ready search stack starting with an ETL flow that pulls from internal PostgreSQL databases, ingesting and transforming 38,000+ B2B category records and 5,000-10,000 company profiles, then running cleaning, vectorization, and enrichment steps and storing categorized records in a pgvector-enabled schema. I will create a separate microservice with a REST API supporting hybrid search using OpenAI text-embedding-3-small, BM25, and RRF scoring, working equally well in Hungarian and English, using huspacy, spaCy, and OpenAI for language handling. I will deliver a Python 3.10+ codebase as two deployable units, an ETL package and a FastAPI microservice, including Dockerfiles and a README, with end-to-end tests for acceptance, adapted to the proposed budget. Waiting for your response in chat! Best Regards.
₹31,250 INR in 3 days
4.8

Hello there, I reviewed your project Bilingual AI Search Pipeline and understood the requirements at a high level. I focus on delivering clear, stable, and maintainable solutions aligned with the actual scope. I can work with Python, Data Processing, and PostgreSQL, and I follow a clean development process with proper structure and error handling. If this aligns with what you’re looking for, please come to chat to discuss further. Best regards
₹12,500 INR in 7 days
4.4

Hi, I’m Karthik, a Senior Python & AI/Search Engineer with 10+ years of experience building production-grade ETL and search pipelines on PostgreSQL for B2B platforms. I clearly understand your requirement: a bilingual (HU/EN) hybrid search stack with clean ETL, pgvector, and a FastAPI microservice, ready for real production use, not a demo.
Proposed architecture
• Python 3.10+ codebase split into two deployable units
• ETL pipeline pulling from existing PostgreSQL tables
• Cleaning, enrichment, categorization, and vectorization using OpenAI text-embedding-3-small + spaCy / huSpaCy
• Idempotent ETL reruns writing into a pgvector-enabled schema
Search service
• FastAPI microservice
• Hybrid search: BM25 + dense vectors, blended via RRF
• Supports single and batch queries
• Consistent ranking for Hungarian & English queries
• Dockerized with health checks & env-based config
What you’ll receive
✔ End-to-end working ETL + search API
✔ Ranked results combining lexical & vector hits
✔ Dockerfiles + concise README
✔ Code ready for production deployment
I’m comfortable handling datasets at your stated scale and aligning strictly with your RFQ. Happy to review the detailed RFQ and start immediately.
Karthik
₹54,780 INR in 7 days
5.3
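Idempotent reruns, as requested in the RFQ and promised in the proposal above, usually come down to keying every write on a stable source ID and skipping work when the content has not changed. The sketch below shows that pattern under stated assumptions: it uses Python's built-in `sqlite3` as a runnable stand-in for PostgreSQL (both accept `INSERT ... ON CONFLICT (id) DO UPDATE`), stores the vector as JSON text where a real pgvector schema would use a `vector(1536)` column for text-embedding-3-small, and takes the embedding function as a parameter. The table `search_items`, its columns, and the helper names are hypothetical.

```python
import hashlib
import json
import sqlite3
from collections.abc import Callable, Iterable

# Hypothetical target table; in PostgreSQL/pgvector, `embedding` would be vector(1536).
SCHEMA = """
CREATE TABLE IF NOT EXISTS search_items (
    id           TEXT PRIMARY KEY,
    body         TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    embedding    TEXT NOT NULL  -- JSON-encoded list of floats in this stand-in
)
"""

UPSERT = """
INSERT INTO search_items (id, body, content_hash, embedding)
VALUES (?, ?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
    body = excluded.body,
    content_hash = excluded.content_hash,
    embedding = excluded.embedding
"""


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def content_hash(text: str) -> str:
    """Stable fingerprint of the text that gets embedded."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def upsert_records(
    conn: sqlite3.Connection,
    records: Iterable[tuple[str, str]],
    embed: Callable[[str], list[float]],
) -> int:
    """Write (record_id, text) pairs, re-embedding only new or changed records.

    Returns how many records were (re)embedded, so an identical rerun returns 0.
    """
    embedded = 0
    for record_id, text in records:
        new_hash = content_hash(text)
        row = conn.execute(
            "SELECT content_hash FROM search_items WHERE id = ?", (record_id,)
        ).fetchone()
        if row is not None and row[0] == new_hash:
            continue  # unchanged since the last run: skip the paid embedding call
        vector = embed(text)
        conn.execute(UPSERT, (record_id, text, new_hash, json.dumps(vector)))
        embedded += 1
    conn.commit()
    return embedded
```

With psycopg against PostgreSQL the placeholders would be `%s` instead of `?`; the hash pre-check is what saves embedding API calls, while the `ON CONFLICT` clause is what keeps reruns from creating duplicate rows.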

Hello, I will build a production-ready hybrid search stack on a Python 3.10+ architecture. The core deliverable will be two units. The first is an ETL package that connects to your PostgreSQL database, ingests the B2B category records and company profiles, performs cleaning, and uses the OpenAI text-embedding-3-small model for vectorization. This package will load the enriched data into your PostgreSQL/pgvector schema with idempotent reruns. The second unit will be a FastAPI microservice exposing a REST API. This API will support hybrid search, blending dense vector similarity with BM25 using RRF scoring, ensuring reliable results in both Hungarian and English and leveraging specialized NLP tools where necessary.
What are the exact table names within your internal PostgreSQL database that contain the B2B category records and company profiles?
What is the highest projected number of simultaneous search queries per second the FastAPI microservice needs to handle?
Are you providing the API key for the OpenAI text-embedding-3-small model?
Thanks, Bharat
₹35,000 INR in 12 days
4.5

Hi, I can deliver the two-unit, production-ready stack you described: Postgres → ETL enrichment → pgvector schema, plus a FastAPI microservice providing hybrid search (BM25 + embeddings) fused via RRF, working equally well in Hungarian and English.
ETL package
• Reads only from internal PostgreSQL tables
• Cleaning + language-aware normalization (huSpaCy/spaCy)
• Embedding generation with batching, retry/backoff, rate-limit safety
• Writes to pgvector-enabled schema with idempotent reruns (upserts + checkpoints)
Search microservice
• /search + batch endpoint
• BM25/lexical retrieval + vector retrieval
• Transparent RRF blending so results include both hit types
• Dockerfiles, env-based config, health checks, short README
Acceptance readiness
• End-to-end test harness: run ETL → query HU + EN → validate blended ranked output.
Quick scope locks:
• Preferred lexical layer: Postgres FTS only, or OK to use a BM25 extension?
• Which fields to embed (category name, description, tags, company bio, etc.)?
• Any latency target for /search (p95)?
₹20,000 INR in 7 days
4.2
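The batching and retry/backoff behaviour listed in this proposal might look like the following minimal sketch. It assumes a caller-supplied `embed_batch` function (hypothetical, for example a thin wrapper around the OpenAI embeddings endpoint) that raises a `TransientError` (also hypothetical) on rate limits or server errors; retries use exponential backoff with full jitter. `itertools.batched` would handle the chunking, but it only exists from Python 3.12, so a small helper keeps the code compatible with the RFQ's Python 3.10+ target.

```python
import random
import time
from collections.abc import Callable, Iterator, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 429 or 5xx."""


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[list[float]]:
    """Embed texts batch by batch, retrying transient failures with backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        for attempt in range(max_retries + 1):
            try:
                result = embed_batch(list(batch))
                break
            except TransientError:
                if attempt == max_retries:
                    raise  # out of retries: surface the error to the ETL run
                # Full jitter: wait a random 0..base_delay * 2**attempt seconds.
                sleep(random.uniform(0, base_delay * 2 ** attempt))
        if len(result) != len(batch):
            raise ValueError("embedding count does not match batch size")
        vectors.extend(result)
    return vectors
```

Injecting `sleep` lets tests run instantly; the batch size of 100 is an arbitrary default chosen for illustration, not a documented API limit.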

With my extensive technical background and steadfast commitment to delivering high-quality software solutions, I am certain that I am the perfect freelancer for your Bilingual AI Search Pipeline project. My 8+ years of hands-on experience developing web and application systems have equipped me with a broad skill set, including a deep understanding of PostgreSQL and Python, which are vital for this project. Additionally, my proficiency with huspacy, spaCy, and OpenAI matches your preferred tools for language handling and fallback generation. This will streamline the handling of both Hungarian and English records and help ensure that the resulting data is accurate and reliable. For your microservice requirements, I have strong knowledge of building REST APIs with the FastAPI framework in Python. My clients choose me because I deliver practical solutions within agreed deadlines without compromising quality. Throughout my career, I have built codebases like the one described in your requirements: concise, well-structured, documented, and easily deployable units. Rest assured that I will follow best practices and organize the final codebase diligently to meet your two-deployable-units requirement. With me on your team, you can expect more than just a freelancer; you'll have a partner dedicated to optimizing your search stack to its fullest potential. Let's make this happen!
₹25,000 INR in 7 days
3.2

The real challenge here is turning messy internal PostgreSQL business records into a bilingual hybrid search index that stays correct, rerunnable, and rank-stable in production. I’d deliver a two-unit architecture: an idempotent ETL that cleans, enriches, embeds, and writes into a pgvector-first schema, and a FastAPI microservice exposing hybrid BM25+dense retrieval with RRF fusion. Key focus will be correctness under reruns, deterministic enrichment, consistent scoring, and language-safe preprocessing (huSpaCy/spaCy). The system will be testable end-to-end with clear health checks, reproducible Docker builds, and audit-friendly logging around ranking inputs and outputs. If you want this unblocked fast, I can start directly from the schema + API contract and deliver the first working hybrid search slice immediately.
₹18,000 INR in 5 days
2.8
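To make "language-safe preprocessing" and the BM25 half of the hybrid search concrete, here is a minimal pure-Python Okapi BM25 sketch with the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant used by Lucene. It is an illustration only: in the actual stack, lexical ranking would presumably run inside PostgreSQL (full-text search or a BM25 extension, as one proposal asks), and tokens would come from huSpaCy/spaCy lemmatization rather than the regex used here. The tokenizer applies Unicode NFC normalization and `casefold()` without stripping accents, so Hungarian pairs such as "kör" (circle) and "kor" (age) stay distinct, and decomposed input such as "o" plus a combining double acute is folded into "ő". The class and function names are hypothetical.

```python
import math
import re
import unicodedata
from collections import Counter

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware: matches letters such as ő, ű, é


def tokenize(text: str) -> list[str]:
    """NFC-normalize and casefold, keeping accents (a stand-in for huSpaCy lemmas)."""
    return TOKEN_RE.findall(unicodedata.normalize("NFC", text).casefold())


class BM25:
    """Minimal in-memory Okapi BM25 index over {doc_id: text}."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.n_docs = len(docs)
        self.avgdl = sum(self.lengths.values()) / self.n_docs if self.n_docs else 0.0
        # Document frequency: iterating a Counter yields each distinct term once per doc.
        self.df = Counter(term for tf in self.tfs.values() for term in tf)

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
        return math.log(1.0 + (self.n_docs - n + 0.5) / (n + 0.5))

    def search(self, query: str, top_n: int = 10) -> list[tuple[str, float]]:
        terms = tokenize(query)
        scores: dict[str, float] = {}
        for doc_id, tf in self.tfs.items():
            # Document-length normalization relative to the average length.
            length_ratio = self.lengths[doc_id] / self.avgdl if self.avgdl else 0.0
            norm = self.k1 * (1.0 - self.b + self.b * length_ratio)
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f:
                    score += self.idf(term) * f * (self.k1 + 1.0) / (f + norm)
            if score > 0.0:
                scores[doc_id] = score
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:top_n]
```

The best-first list of IDs returned by `search` is the shape an RRF fusion step needs as its lexical input.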

Drawing on our in-depth knowledge of and experience with PostgreSQL and Python, we are confident we can deliver a state-of-the-art Bilingual AI Search Pipeline that meets your specific requirements. Your project's emphasis on data retrieval, cleaning steps, API development, and multi-language support aligns perfectly with our strengths. We have successfully developed similar complex ETL flows before, involving data transformation, cleaning, vectorization, and enrichment, and we are practiced at ensuring idempotent reruns to maintain system stability. We also follow structured practices such as Dockerizing applications into separately deployable packages with tightly written READMEs, so we can assure you a clear, approachable solution. We agree that unit testing is crucial, so we will deliver end-to-end functionality backed by tests that meet your expectations. Join forces with us to bring your vision to life; let's build the bilingual AI search pipeline together!
₹25,000 INR in 7 days
1.5

Hello Nallaiyan, I checked your project, and it looks interesting. This is something we already work on, so the requirements are clear from the start. We mainly work with Python, Data Processing, PostgreSQL, ETL, Microservices, and FastAPI. We focus on making things simple, reliable, and actually useful in real life, not overcomplicated. Let’s connect in chat and see if we’re a good fit for this. Best Regards, Ali nawaz
₹50,000 INR in 8 days
0.0

Hey, good morning! I’ve carefully checked your requirements and I’m really interested in this job. I’m a full-stack Node.js developer working on large-scale apps as a lead developer with U.S. and European teams. I’m offering the best quality and highest performance at the lowest price. I can complete your project on time, and you will experience great satisfaction working with me. I’m well versed in React/Redux, AngularJS, Node.js, Ruby on Rails, and HTML/CSS, as well as JavaScript and jQuery. I have rich experience in Microservices, Data Processing, ETL, Python, PostgreSQL, and FastAPI. For more information about me, please refer to my portfolio. I’m ready to discuss your project and start immediately. Looking forward to hearing back from you and discussing all the details. Thanks & Regards
₹27,750 INR in 4 days
0.0

I am a motivated MCA student with a strong interest in building real-world, production-ready systems using Python, PostgreSQL, and modern AI techniques. I have hands-on experience working with data pipelines, ETL workflows, and databases, which allows me to efficiently extract, clean, transform, and manage large datasets. My background in AI/ML and NLP helps me understand concepts like embeddings, vector search, and relevance ranking, and I enjoy applying these ideas to solve practical problems. I am comfortable working with tools like FastAPI, Docker, spaCy, and pgvector, and I focus on writing clean, scalable, and maintainable code. I also value system reliability, idempotent design, and clear documentation. Overall, I enjoy learning deeply, experimenting responsibly, and translating technical requirements into working solutions that add real value.
₹25,000 INR in 7 days
0.0

I am a keen and passionate technologist on a mission to empower businesses by unlocking insights from data using cutting-edge AI/ML methodologies. With a rich blend of engineering, research, and product experience, I thrive at the intersection of innovation and impact, transforming raw data into meaningful strategies that drive growth.
Core Expertise:
- Languages & Frameworks | Python (data science, data engineering, web development, Generative AI)
- Machine Learning | Supervised, Unsupervised & Semi-Supervised learning
- Deep Learning Architectures | CNN, RNN, LSTM, ANN
- Computer Vision | Object Detection, Image Segmentation, Preprocessing, Activity Recognition
- Natural Language Processing (NLP) | Tokenization, TF-IDF, Count Vectorizer, Word2Vec, Transformers
- Generative AI + Agentic AI | Large Language Models (Text & MultiModal), RAG (Vanilla & Advanced), A2A, MCP
- Time Series Forecasting | ARIMA, SARIMA, LNN
- Big Data & Engineering | PySpark, Hadoop
- Data/Text Preprocessing | Stemming, Lemmatization, Feature Engineering/Selection, Data Wrangling
- Visualization & BI Tools | Matplotlib, Seaborn, Plotly, MS Power BI
- Cloud Ecosystems | Microsoft Azure, Google Cloud Platform
- Databases | SQL, NoSQL
Let’s connect and collaborate to build intelligent solutions that make a difference!
₹25,000 INR in 7 days
0.0

Having had an extensive career as a Data Engineer over the past 3 years, I can confidently say that your Bilingual AI Search Pipeline project is right up my alley. With a focused expertise on building and managing scalable ETL pipelines in Python, your requirement of establishing an ETL package is something I can accomplish with ease using my proficiency in PostgreSQL and strong SQL background. Moreover, my proficiency in orchestrating data workflows with tools like Azure Data Factory aligns perfectly with the organisational structure you desire for the codebase - two deployable units. Furthermore, I bring to the table hands-on experience of handling large-scale semi-structured datasets using Big Data technologies including Apache Spark in cloud environments for efficient data processing, which adds great value to your project. Additionally, familiarity with AWS S3 gives me an edge in effectively handling storage and checkpoint management, ensuring cost-effectiveness and scalability of the data.
₹25,000 INR in 7 days
0.0

Thanjavur, India
Member since Jul 14, 2025