    1,043 pyspark jobs found

    ...engineering experiences with various aws services Experience building end-to-end data pipelines (schema discovery, ingestion, transformation, orchestration, monitoring) Experience working with relational databases like Oracle, MySQL, and SQL Server etc Experience with data ingestion from on-prem systems to cloud Experience with streaming platforms like Kafka or AWS Kinesis Strong skills in Python, PySpark, SQL, and Terraform...

    €982 Average bid
    170 bids

    ...the next round of hiring I want an accomplished Senior Data Engineer to sit in on our technical interviews for roughly two hours each day. The role is purely evaluative: you will craft probing questions, join live video calls, and quickly score each candidate’s depth of knowledge across Python, Scala and SQL. Our stack centres on Azure and Databricks, so practical insight into large-scale Spark/PySpark jobs, data-model design, ETL orchestration and cloud performance tuning is essential. Candidates frequently discuss streaming, optimisation strategies and modern AI/ML add-ons, so any hands-on exposure to libraries such as PyTorch, NumPy, SciPy or TensorFlow will help you challenge them at the right level, though it is not mandatory. Availability is limited to two focus...

    €231 Average bid
    17 bids

    ...narrative continuity before passing curated context into a citation aware LLM routing layer that prioritizes Gemini, OpenAI, then Anthropic, then Ollama local models, enforcing context bound generation and preventing hallucination outside retrieved evidence. Indexing is parallelized using ProcessPoolExecutor for efficient multi core utilization and automatically scales to distributed ingestion via PySpark when corpus size exceeds a configured threshold, enabling safe handling of 20k plus documents or 50GB class corpora, while the system is wrapped in a full MLOps backbone that integrates MLflow for experiment tracking of retrieval metrics, PPO reinforcement learning rewards, and parameter tuning, exposes Prometheus metrics for latency and retrieval monitoring compatible with Graf...
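The parallel-indexing design this listing describes (fan-out across cores with ProcessPoolExecutor, escalating to PySpark past a corpus-size threshold) can be sketched in miniature. The document structure and the per-document indexing step below are illustrative assumptions, not the project's actual code:

```python
from concurrent.futures import ProcessPoolExecutor

def index_document(doc: str) -> dict:
    # Hypothetical per-document indexing step: count distinct terms.
    tokens = doc.lower().split()
    return {"doc": doc, "unique_terms": len(set(tokens))}

def build_index(docs: list, max_workers: int = 4) -> list:
    # Fan documents out across worker processes; executor.map
    # preserves the input order in its results.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(index_document, docs))

if __name__ == "__main__":
    index = build_index(["alpha beta beta", "gamma delta"])
    print(index[0]["unique_terms"])  # 2
```

Past the configured threshold, the same map-shaped step would be handed to distributed PySpark ingestion rather than a local pool.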

    €217 Average bid
    14 bids

    Description: We’re looking for an experienced Data Engineer, preferably based in Dubai, to help build and manage data pipelines for a global platform. Most work is in Azure, using Azure Data Factory, ADLS, and Databricks. What you’ll do: Build and manage PySpark/Spark pipelines in Databricks Schedule and monitor pipelines in Azure Data Factory Optimize Databricks for better performance Keep code and documentation organized and clear Requirements: Experience with Azure cloud and Databricks Strong PySpark / Spark skills Experience building scalable, reliable data pipelines Details: Project-based, with potential to move to full-time Ideal for engineers who like building cloud-native pipelines

    €10 / hr Average bid
    20 bids

    ... • Read multiple flat-file formats (mainly CSV, with the occasional JSON). • Apply thorough data-cleansing rules—removing duplicates, enforcing data types, flagging out-of-range values, and normalising text fields. • Run validation checks so that only clean, schema-compliant rows proceed to the load step. I’m happy for you to choose the stack you are most efficient with—Python (pandas, PySpark), Talend, or another ETL tool—as long as the final solution is reproducible and can be triggered automatically (CLI, scheduled job, or cloud function). If you think aggregation or more advanced joins would improve the dataset, flag that as a future enhancement; for now, cleansing and validation are the must-haves. Deliverables 1. Well-docum...
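The cleansing rules this brief lists (removing duplicates, enforcing data types, flagging out-of-range values, normalising text) map naturally onto the pandas option it names. A minimal sketch; the column names and the 0–10,000 range are illustrative assumptions:

```python
import pandas as pd

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates()                                     # remove duplicate rows
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")   # enforce data type
    df["out_of_range"] = ~df["amount"].between(0, 10_000)         # flag out-of-range values
    df["name"] = df["name"].str.strip().str.lower()               # normalise text fields
    # Validation: only clean, schema-compliant rows proceed to the load step.
    return df[df["amount"].notna()]

raw = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "amount": ["100", "100", "oops", "20000"],
    "name": ["  Alice ", "  Alice ", "BOB", "Cara"],
})
clean = cleanse(raw)
print(len(clean))  # 2 — the duplicate and the non-numeric row are gone
```

The same rules translate directly to PySpark (dropDuplicates, cast, between) if volumes outgrow pandas.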

    €22 Average bid
    24 bids

    ...Azure Data Engineer to support and enhance our existing data platform on an ongoing basis. You should be strong in: Azure Data Factory (ADF) for building and maintaining ETL/ELT pipelines Azure Databricks and PySpark for large‑scale data processing Python for data engineering utilities, automation, and integration Delta Lakes/Lakehouse concepts, performance optimization, and troubleshooting Working with SQL‑based data sources, data warehousing, and BI integrations Responsibilities Design, build, and optimize data pipelines in Azure ADF and Databricks Develop and maintain PySpark and Python jobs for batch and near real‑time workloads Implement best practices for data quality, observability, and monitoring Collaborate with our internal team, follow existing standa...

    €9 / hr Average bid
    34 bids

    I am looking for an experienced data engineer with 4-5 years of hands-on PySpark and Python experience, including experience handling complex data pipelines.

    €4 / hr Average bid
    6 bids

    ...Databricks Data Analyst and Data Engineer certifications and want a structured, hands-on tutoring program that also deepens my Snowflake skills. The goal is to become confident building end-to-end data pipelines, running analytics, and understanding platform architecture well enough to pass the exams and perform the work in practice. Focus areas Databricks • Data processing & analytics with PySpark/SQL and Delta Lake • Machine learning workflows inside the Databricks environment • Workspace, cluster, job, and Lakehouse architecture Snowflake • Core data-warehousing concepts and best practices • Query tuning and overall performance optimisation • Security features: RBAC, masking, encryption, and access policies How we can wor...

    €165 Average bid
    48 bids

    ...Object Storage, Data Flow (Spark), and Data Catalog. * Solid understanding of Finance / Order-to-Cash (O2C) data entities and processes. * Knowledge of data modeling, lineage, and governance principles. * Familiarity with CI/CD and DevOps for automated deployments. Preferred Skills * OCI Data Integration certification. * Experience integrating Oracle Cloud ERP with OCI DI. * Knowledge of Python or PySpark for custom transformations. * Exposure to Data Science and ML pipelines leveraging OCI services. * Experience with monitoring tools like Grafana...

    €1883 Average bid
    5 bids

    ...guidance with embedding Genie via API into apps, Teams, or dashboards. • Train internal teams on Genie capabilities, administration, and operational readiness. Required Skills & Experience • Strong practical experience with Azure Databricks, Lakehouse architecture, Unity Catalog, SQL Warehouse. • Knowledge of Genie AI, foundational models, or Databricks conversational analytics. • Competency in PySpark, SQL, data modeling, and enterprise data engineering practices. • Familiarity with Azure ecosystem (Data Lake, Data Factory, DevOps). • Ability to translate business questions into NLQ-friendly dataset design. • Excellent communication and ability to work with cross functional data, BI, and business teams. Nice to Have • Experience with A...

    €2 / hr Average bid
    3 bids

    I have a Hadoop cluster holding several large data sets, and I need a seasoned PySpark developer who also writes rock-solid SQL. The immediate aim is to connect to the cluster (YARN/HDFS with Hive metastore), develop or refine PySpark jobs, optimise the accompanying SQL, and make sure everything runs smoothly end-to-end. You’ll receive access to a staging namespace plus a sample of the data. Once the logic checks out we’ll promote the code to the full environment. Deliverables • A clean, well-commented PySpark notebook or .py job that executes successfully on the cluster • The corresponding SQL script or view definitions ready for Hive or spark-sql • A concise README detailing execution steps, parameters, and expected outputs Accep...

    €68 Average bid
    11 bids

    I need a reusable ETL framework built inside Databricks notebooks, version-controlled in Bitbucket and promoted automatically through a Bitbucket Pipeli...attached to any cluster. Acceptance criteria • Parameter-driven notebooks organised by layer. • Reusable GraphQL connector packaged as a .whl. • Bitbucket Pipelines yaml that runs unit tests, uses the Databricks CLI to deploy notebooks, and executes an integration test on commit. • Clear README detailing how to add a new API endpoint and where to place cleaning logic. Leverage native tools—PySpark, SQL, Delta Lake, dbutils—while keeping external libraries to a minimum and fully documented. Please share a brief outline of your approach and any relevant Databricks + Bitbucket CI experience s...

    €298 Average bid
    115 bids

    I’m a beginner looking for a 1-on-1 Databricks instructor for a very hands-on, fast-paced 2-week program. Requirements: - Strong real-world Databricks experience - Hands-on Apache Spark (PySpark), SQL, Delta Lake - Real use case / mini project (end-to-end pipeline) - Live screen sharing, coding together - Beginner-friendly but practical (no theory-only) Goal: By the end of 2 weeks, I want to confidently build and understand a real Databricks data pipeline. Availability: 5–6 sessions per week, 1–1.5 hours per session Please share: - Your Databricks experience - How you would structure these 2 weeks - Your hourly rate Thanks!

    €18 / hr Average bid
    59 bids

    ...across multiple source systems. Build and optimize Foundry pipelines using Code Workbooks (PySpark, SQL, Scala) and Quiver. Support data integration, feature engineering, and pipeline debugging for production AI workloads. Implement security and permissions architecture aligned with enterprise governance. Help develop Foundry applications using Workshop, Contour, and Slate for analytics and decision-making. Guide on best practices for CI/CD, testing, and deployment within Foundry. Provide mentorship and troubleshooting support during live client engagements. Required Skills: Strong hands-on experience with Palantir Foundry (Ontology, Code Workbooks, Quiver, Workshop). Proficiency in Python, PySpark, and SQL. Experience with data modeling, transformation logic, and pipelin...

    €12 / hr Average bid
    21 bids

    Need someone with strong streaming experience to design, develop, and deploy a PySpark publishing and upserting job on EMR. Environment experience required: Spark, the MongoDB (DocumentDB) connector, AWS EMR, Step Functions, CloudWatch, Docker, Kafka cluster architecture, Airflow DAGs, GitLab, PyCharm, Cursor AI IDE, etc.

    €10 / hr Average bid
    40 bids

    ...patterns that Databricks loves to test. • Fresh practice questions (or a curated question bank) with detailed explanations so I understand not just the right answer but the thinking process. • At least one full-length mock exam under timed conditions followed by a debrief on weak areas and strategies to avoid common pitfalls. I work mainly in the Databricks notebook environment with Python, PySpark, and SQL, so please weave real-world examples into the prep. I’m flexible on session times and frequency; we can agree milestones and refine the plan as we go. If you’ve already helped others pass this exam—or you hold the certification yourself—tell me how you’d tackle my study roadmap and what materials you’d bring to the table. I...

    €40 Average bid
    2 bids

    ...actual medicines and would map once the inconsistencies are ironed out, so I want the process to be fully automated, driven by a robust auto-correct algorithm rather than manual review. Remaining 0.1% could be non medical entries, and need to be deleted. I am open to proven techniques—fuzzy matching, phonetic hashing, Levenshtein, word embeddings, or a hybrid—as long as they scale. Python, pandas, PySpark, or any other big-data friendly stack is fine, provided the final solution is reproducible and well documented. Deliverables • Clean, executable scripts (Jupyter notebook or .py) that ingest both files, normalise product names, detect duplicates, and output a one-to-one mapping table. • A brief README explaining dependencies, algorithm logic, and how ...
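One of the proven techniques this brief invites, fuzzy matching, can be prototyped with the standard library's difflib as a stand-in for a dedicated Levenshtein package. The product names and the 0.85 threshold below are illustrative assumptions:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means identical after normalisation.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def map_to_canonical(names, canonical, threshold=0.85):
    # For each raw product name, pick the best-scoring canonical medicine
    # name; entries below the threshold stay unmapped (candidates for the
    # non-medical 0.1% to be deleted).
    mapping = {}
    for raw in names:
        best = max(canonical, key=lambda c: similarity(raw, c))
        mapping[raw] = best if similarity(raw, best) >= threshold else None
    return mapping

m = map_to_canonical(["paracetmol 500mg", "xx-unknown-xx"],
                     ["paracetamol 500mg", "ibuprofen 200mg"])
print(m["paracetmol 500mg"])  # paracetamol 500mg
```

At the stated scale, all-pairs comparison needs a blocking step (first-letter or phonetic key) before scoring, which is where a pandas/PySpark pipeline earns its keep.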

    €530 Average bid
    39 bids

    ...Infrastructure Microsoft Azure (Functions, Logic Apps, Service Bus, Blob Storage, Data Factory, Azure DevOps) AWS Cloud Docker, Kubernetes RabbitMQ CRM, ERP & Enterprise Platforms Microsoft Dynamics CRM 365 Dynamics Business Central Sage CRM NopCommerce Sitefinity v12.2 Umbraco v8.0 DotNetNuke v4.0 Python, AI & Advanced Solutions Python, Django, Flask, Pyramid REST APIs, WebSockets PySpark AI Email & Chatbot Solutions Data Science & Analytics CMS, E-Commerce & Web Platforms WordPress, Joomla, Drupal Prestashop PHP-based systems BI, Finance & Business Support Power BI Advanced Excel Accounting, Finance & Bookkeeping Data Entry & Business Reporting MS Office Suite Tools & Delivery Methodology Git (Version Control) N...

    €4 / hr Average bid
    20 bids

    ...short interpretive notes that fold easily into manuscripts. What matters most is hands-on mastery of data extraction, table linking, and general database management within MIMIC. Solid grounding in observational study design, epidemiology, and EHR quirks is essential; a background in medicine or public health will make communication smoother. Working code in SQL plus either tidyverse/R or pandas/pySpark is expected. The immediate deliverable is a fully cleaned analytic dataset with the accompanying scripts and an outline of the statistical approach. After that, I plan to keep the collaboration open for additional projects and sensitivity analyses as new questions arise....

    €23 / hr Average bid
    67 bids

    I’m looking for a Data Engineer with strong AWS native services experience to help build and support an event-driven data platform. This project focuses on automated batch data pipelines, data governance, and making data available in a secure ...Data Engineer with strong AWS native services experience to help build and support an event-driven data platform. This project focuses on automated batch data pipelines, data governance, and making data available in a secure and scalable way. This is not ad-hoc ETL — it’s a platform-style setup. Tech stack involved: • AWS: S3, SQS, Lambda, MWAA (Airflow), EMR Serverless • Data Processing: PySpark, Apache Spark • Data Lake: Apache Iceberg, AWS Glue Catalog • Governance & Security: Lake Formatio...

    €15 / hr Average bid
    40 bids

    ...Remote Working Time: evening Budget: 22-24k monthly Duration: 2 hours per day Demo Required: Today Job Description We are seeking an experienced Senior Data Engineer with strong expertise in the Healthcare Payer domain to design, build, and maintain scalable data pipelines and reporting solutions. The ideal candidate will have hands-on experience across AWS and Microsoft Azure, strong Python/PySpark skills, and the ability to support integrated reporting and analytics using Power BI. Key Responsibilities Design, develop, and maintain end-to-end data pipelines for healthcare payer data Build and optimize ETL/ELT workflows using AWS Glue, Step Functions, and Python Work with Azure and AWS cloud services for data ingestion, processing, and storage Implement and manage Data ...

    €254 Average bid
    9 bids

    ...Remote Working Time: evening Budget: 22-24k monthly Duration: 2 hours per day Demo Required: Today Job Description We are seeking an experienced Senior Data Engineer with strong expertise in the Healthcare Payer domain to design, build, and maintain scalable data pipelines and reporting solutions. The ideal candidate will have hands-on experience across AWS and Microsoft Azure, strong Python/PySpark skills, and the ability to support integrated reporting and analytics using Power BI. Key Responsibilities Design, develop, and maintain end-to-end data pipelines for healthcare payer data Build and optimize ETL/ELT workflows using AWS Glue, Step Functions, and Python Work with Azure and AWS cloud services for data ingestion, processing, and storage Implement and manage Data ...

    €216 Average bid
    4 bids

    My current résumé sells me as a data engineer, yet my next move is a Data Analyst role. I need the Work Experience and Skills sections re-worked so recruiters immediately see me as a strong analytical hire. Here’s what you’ll be working with • Hands-on background in Hadoop administration, PySpark development, Databricks workflows and day-to-day data analysis. • A solid foundation in SQL and reporting tools, though these strengths are not highlighted well in the document. What I’m after • Rewrite both sections to spotlight analytical impact, business-friendly storytelling and in-demand keywords (think SQL, dashboards, data visualization, statistical insight, KPI tracking, etc.). • Re-order bullet points around results, not...

    €19 Average bid
    6 bids

    The core of my remote-sensing crop-yield project is in place, but the code will not run from start to finish. I need a fresh set of eyes to hunt down and eliminate the blockers so that the pipeline executes smoothly on Databricks and locally. Current state • Repository already contains: – Spark-based preprocessing notebooks (PySpark) – Trained ML model scripts and saved artefacts – A handful of Databricks experiment notebooks for exploration What I need most Debugging is the priority. I am not after a full rewrite—I want the existing pieces to work together. You are free to suggest refactors where they remove obvious bottlenecks, but the first milestone is simply getting the code to run cleanly. Focus areas • Spark preprocessi...

    €8 Average bid
    9 bids

    We are seeking a freelancer proxy for a Data Engineer role to support a remote healthcare data platform. The work will be 5 to 6 hours per day. You will be required to sit alongside the engineer during work hours, explain work...operational runbooks for knowledge sharing • Support and guide production-grade pipelines built on Dagster, DBT, Airflow, AWS Glue, and SSIS Required Skills & Tech Stack: • Python (Strong) • SQL (Advanced) • Dagster, DBT, Airflow, AWS Glue • AWS: Athena, Glue, SQS, SNS, IAM, CloudWatch • Databases: PostgreSQL, AWS RDS, Oracle, Microsoft SQL Server • Data Modeling & Query Optimization • Pandas, PySpark, PyCharm • Terraform, Docker, DataGrip, VS Code • Git/GitHub and CI/CD pipelines • Experience wi...

    €572 Average bid
    58 bids

    We are seeking a freelancer proxy for a Data Engineer role to support a remote healthcare data platform. The work will be 5 to 6 hours per day. You will be required to sit alongside the engineer during work hours, explain work...operational runbooks for knowledge sharing • Support and guide production-grade pipelines built on Dagster, DBT, Airflow, AWS Glue, and SSIS Required Skills & Tech Stack: • Python (Strong) • SQL (Advanced) • Dagster, DBT, Airflow, AWS Glue • AWS: Athena, Glue, SQS, SNS, IAM, CloudWatch • Databases: PostgreSQL, AWS RDS, Oracle, Microsoft SQL Server • Data Modeling & Query Optimization • Pandas, PySpark, PyCharm • Terraform, Docker, DataGrip, VS Code • Git/GitHub and CI/CD pipelines • Experience wi...

    €608 Average bid
    29 bids

    I have an existing SAS program that handles end-to-end data processing for a single SQL Database source. The code cleans raw tables, applies a series of transformations, then produces several aggregated outputs that feed downstream reports. I now need the entire workflow re-implemented in PySpark running on Azure Databricks so I can retire the SAS environment and take advantage of Databricks’ scalability. You will receive: • The original .sas files with inline comments that explain each step • A data-dictionary of the SQL tables involved • Sample input/output datasets to verify parity What I’m expecting from you: 1. A well-structured Databricks notebook (or .py files) that reproduces the SAS logic for data cleaning, transformation, and aggregat...

    €104 Average bid
    28 bids

    ...AWS and Databricks. This role is focused on hands-on execution, optimization, and support within a clearly defined scope. Key Responsibilities Enhance and maintain existing Databricks (PySpark) data pipelines Work with AWS services such as S3, Glue, Lambda, Redshift/Athena Optimize data workflows for performance and reliability Implement data transformations, validations, and incremental loads Troubleshoot and resolve pipeline and data issues Maintain documentation for assigned components Required Experience & Skills 6–8 years of experience in Data Engineering Strong hands-on expertise in Python & PySpark Proven experience with Databricks Good knowledge of AWS data services Strong SQL and data modeling skills Ability to work independently in a remote setu...

    €637 Average bid
    13 bids

    ...running the usual checks for duplicates, missing values, and outliers. Once it is clean, I expect you to apply the appropriate statistical and machine-learning techniques—time-series decomposition, clustering, cohort or basket analysis, whichever combination best surfaces trend signals. Python or R is fine (Pandas, NumPy, scikit-learn, tidyverse, etc.), and if you prefer a big-data stack such as PySpark, that works too; the volume will justify it. Please package the outcome as: • A concise written report (PDF or Markdown) that explains the key trends and how you arrived at them. • Visualisations (static or interactive) that make the findings easy to consume for non-technical stakeholders—Matplotlib, Seaborn, Plotly, or Tableau Public dashboards are all a...

    €17 / hr Average bid
    50 bids

    I need an experienced engineer who can sit with me in Pune MH India and provide hands-on, offline technical support for daily data engineering tasks. The focus is strictly on Python and PySpark: reviewing code, untangling bugs, optimising Spark jobs, and guiding me through best practices as we build and maintain data-processing pipelines. This is not a remote, on-call role; I’m looking for someone who can be physically present in Kharadi/Viman Nagar/Magarpatta Area—pair programming, white-boarding solutions, and helping me push features all the way to a clean commit. If you have solid production experience with Python, strong command of PySpark’s RDD/DataFrame APIs, and the confidence to troubleshoot performance issues on the spot, let’s talk about a regular...

    €258 Average bid
    10 bids

    ...Databricks. The idea is for the user to fill in the form, have the data stored directly in a Databricks table and, with one click, generate an executive-summary-style report focused on key performance indicators (KPIs). I’m looking for someone who is strong on both the front end (HTML, CSS, JavaScript) and the back-end integration in Databricks: notebooks, Delta Lake, Databricks SQL or PySpark. The flow should work like this: • The form is served as a web component embedded in the Databricks interface (or as a Job/Notebook with widgets). • On submission, the information is persisted to a Delta table. • A triggered process queries those records and produces the executive report with the most relevant KPIs...

    €8 / hr Average bid
    14 bids

    Bank Loan ETL & Visualization Project Report 1. Abstract This project builds a complete ETL (Extract, Transform, Load) pipeline for bank loan analytics using PySpark and Python. It cleans, validates, and integrates branch, customer, and loan datasets into a unified master table. The pipeline standardizes financial data, generates analytical insights, and prepares the output for reporting and automated financial analysis. 2. Technologies Used Python PySpark Pandas Matplotlib CSV Files Java JDK (required for Spark) 3. Dataset Description This project uses three CSV datasets: – Branch details (branch_id, branch_name, branch_state) – Customer demographic information – Loan records linked to customers and branches 4. ETL Workflow The
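The integration step this report describes, joining branch, customer, and loan data into a unified master table, looks like this in miniature. It is shown with pandas (also in the project's technology list); in PySpark the merges become DataFrame.join calls. Column names follow the dataset description above; row values are made up:

```python
import pandas as pd

# Toy versions of the three CSV datasets described in the report.
branches = pd.DataFrame({"branch_id": [1], "branch_name": ["Central"],
                         "branch_state": ["NY"]})
customers = pd.DataFrame({"customer_id": [10], "branch_id": [1],
                          "name": ["Alice"]})
loans = pd.DataFrame({"loan_id": [100], "customer_id": [10],
                      "amount": [5000.0]})

# Loan records link to customers, which in turn link to branches,
# so two left joins produce the unified master table.
master = (loans
          .merge(customers, on="customer_id", how="left")
          .merge(branches, on="branch_id", how="left"))
print(master.loc[0, "branch_name"])  # Central
```

Left joins keep every loan row even when a lookup fails, which makes missing-reference rows easy to surface during validation.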

    €218 Average bid
    19 bids

    I have an existing SAS program that handles end-to-end data processing for a single SQL Database source. The code cleans raw tables, applies a series of transformations, then produces several aggregated outputs that feed downstream reports. I now need the entire workflow re-implemented in PySpark running on Azure Databricks so I can retire the SAS environment and take advantage of Databricks’ scalability. You will receive: • The original .sas files with inline comments that explain each step • A data-dictionary of the SQL tables involved • Sample input/output datasets to verify parity What I’m expecting from you: 1. A well-structured Databricks notebook (or .py files) that reproduces the SAS logic for data cleaning, transformation, and aggregat...

    €413 Average bid
    41 bids

    I need an experienced engineer who can sit with me in Pune and provide hands-on, offline technical support for daily software-development tasks (for an entire month, at times convenient to us both). The focus is strictly on Python and PySpark: reviewing code, optimizing Spark jobs, and guiding me through best practices as we build and maintain data-processing pipelines. This is not a remote, on-call role; I’m looking for someone who can be physically present—pair programming, white-boarding solutions, and helping me push features all the way to a clean commit. If you have solid production experience with Python, strong command of PySpark’s RDD/DataFrame APIs, and the confidence to troubleshoot performance issues on the spot, let’s talk about a regular schedule ...

    €241 Average bid
    5 bids

    Hi, thanks for the opportunity. I can support your Databricks and AI Agents project with strong skills in PySpark, SQL, Delta Lake, and data automation. I will handle ETL pipelines, data processing, AI agent integration, and workflow optimization. My rate is 8 USD per hour, and I can work 40 hours per week (320 USD weekly). I can start immediately and will work closely with your team for smooth delivery.

    €3 / hr Average bid
    1 bid

    Need someone with strong streaming experience to help me write, design, develop, and deploy a PySpark broker publishing job on EMR, using PySpark, the MongoDB connector, and DocumentDB streaming (strong Kafka/Mongo skills), plus AWS Step Functions, EMR, Docker, Kafka architecture, CloudWatch, and Airflow DAGs.

    €9 / hr Average bid
    33 bids

    I need an experienced engineer who can sit with me in Pune and provide hands-on, offline technical support for daily software-development tasks. The focus is strictly on Python and PySpark: reviewing code, untangling bugs, optimising Spark jobs, and guiding me through best practices as we build and maintain data-processing pipelines. This is not a remote, on-call role; I’m looking for someone who can be physically present—pair programming, white-boarding solutions, and helping me push features all the way to a clean commit. If you have solid production experience with Python, strong command of PySpark’s RDD/DataFrame APIs, and the confidence to troubleshoot performance issues on the spot, let’s talk about a regular schedule that works for both of us.

    €197 Average bid
    4 bids

    ...audit submissions. 13. Able to communicate, plan and execute BI platform Audit with internal audit team. Competencies for the job 1) Proven experience with big data solution design and development in Databricks, notebooks & schema design, development, best practice and notebooks Azure Dev Ops / CI-CD Pipelines 2) Hands On in Python PySpark, Spark SQL, Delta Live + Kafka; Azure SQL DB, Azure Data Factory, Azure DataBricks, Azure Synapse, Azure Data Lake, Delta, Pyspark, Python, Logic Apps, Azure DevOps, CI/CD implementation, Power BI / QlikSense, Blob Storage, ADLS, Azure Key Vault, ETL, SSIS 3) Experience in Query Development, Performance Tuning and loading data to Databricks SQL DW 4) Experience in data ingestion into ADLS, Azure Blob Storage, Azure Logic Apps 5) Prac...

    €233 Average bid
    14 bids

    ...delivery 8. Ensure developments follow standard coding patterns, are fully documented for audit submissions. Competencies for the job 1) Proven experience with big data solution design and development in Databricks, notebooks & schema design, development, best practice and notebooks Azure Dev Ops / CI-CD Pipelines 2) Hands On in Python PySpark, Spark SQL, Delta Live + Kafka; Azure SQL DB, Azure Data Factory, Azure DataBricks, Azure Synapse, Azure Data Lake, Delta, Pyspark, Python, Logic Apps, Azure DevOps, CI/CD implementation, Power BI / QlikSense, Blob Storage, ADLS, Azure Key Vault, ETL, SSIS 3) Experience in Query Development, Performance Tuning and loading data to Databricks SQL DW 4) Experience in data ingestion into ADLS, Azure Blob Storage, Azure Logic Apps 5) ...

    €238 Average bid
    6 bids

    Description: Need an experienced Databricks engineer to guide me through adding logging tasks to 2 workflows in Azure Databricks. What needs to be done: Add log_success and log_failure notebook tasks to 2 existing Databricks workflows Config...CRITICAL REQUIREMENT: All work must be done via Zoom screen sharing on MY machine You will guide/instruct me while I make the changes or you can do it I need to learn the process, not just get it done Must Have: Strong Azure Databricks workflows/jobs experience Experience with pipeline logging/monitoring patterns Patient teaching approach Tech Stack: Azure Databricks Unity Catalog Python/PySpark Azure DevOps (YAML configs) Timeline: Start ASAP To Apply: Share your Databricks experience and availability for Zoom sessions (mention your ...

    €22 Average bid
    4 bids

    Project Title: Build End-to-End Data Cleaning, ETL Pipeline & SQL Analytics (PySpark) I need a skilled Data Engineer / Data Analyst to build a complete end-to-end data pipeline using the raw CSV files provided. The project involves data cleaning, transformation, building a star schema, implementing ETL logic in PySpark, writing analytical SQL queries, and performing data quality checks. The files included are: (dirty user data – nulls, duplicates, inconsistent casing) (messy categories and SKU formatting) (20k+ orders with mixed date formats, invalid numeric fields) (dirty SKUs, wrong quantities, duplicates) Scope of Work: Data Cleaning & Standardization Fix inconsistent casing, extra spaces, special characters Convert fields into

    €139 Average bid
    17 bids

    PySpark EDA & Datasets Conversion - Must make use of PySpark for the Exploratory Data Analysis. Do not have to train model using PySpark - Add comments / describe the EDA - Need to convert from PySpark to Pandas for the test train split. - And load into PySpark - Dataset Hyperparameters: * Forced images to be 48x48 * Using PyTorch not IntensiveLock I can send you the notebook so far. You just need to get it to the point where PySpark data frame converts to pandas and then the train test split before training. If you use online resources then just copy and paste the links in the relevant notebook cells W.r.t. the convolutional neural network it is basic. Pooling layer The convolutional layer Nothing hardset, flexible at the moment and nothing...
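The requested handoff, converting the PySpark DataFrame to pandas and then doing the train/test split before training, can be sketched as follows. A small in-memory frame stands in for the result of spark_df.toPandas(), and the split is a plain shuffled cut; in the real notebook sklearn's train_test_split would work equally well:

```python
import numpy as np
import pandas as pd

# Stand-in for the pandas frame that spark_df.toPandas() would return;
# the column names here are illustrative, not the notebook's.
pdf = pd.DataFrame({"pixels": range(10), "label": [0, 1] * 5})

def train_test_split(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    # Shuffle row positions, then carve off the last test_frac as the test set.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    cut = int(len(df) * (1 - test_frac))
    return df.iloc[idx[:cut]], df.iloc[idx[cut:]]

train, test = train_test_split(pdf)
print(len(train), len(test))  # 8 2
```

From here the pandas frames feed the PyTorch DataLoader; nothing Spark-specific survives past the toPandas() call.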

    €3 / hr Average bid
    1 bid

    ...make use of PySpark for the Exploratory Data Analysis. Do not have to train model using PySpark - Add comments / describe the EDA - Need to convert from PySpark to Pandas for the test train split. - And load into PySpark - Dataset Hyperparameters: * Forced images to be 48x48 * Using PyTorch not IntensiveLock I can send you the notebook so far. You just need to get it to the point where PySpark data frame converts to pandas and then the train test split before training. If you use online resources then just copy and paste the links in the relevant notebook cells W.r.t. the convolutional neural network it is basic. Pooling layer The convolutional layer Nothing hardset, flexible at the moment and nothing computationally complex Pyto...

    €23 Average bid
    12 bids

    I have eleven existing Databricks jobs that need to be packaged and shipped through the new Databricks Asset Bundles workflow. All code for the jobs is already written in PySpark; what’s missing is a clean, reusable deployment script that will:
    • Collect the Python scripts for each job into a single asset bundle
    • Resolve internal dependencies and set the correct task-level libraries
    • Push the bundle to my Databricks workspace (Repos or DBFS)
    • Programmatically create/update the eleven jobs with their respective schedules and cluster definitions
    The script must rely on PySpark for any data-processing logic that has to run during deployment, and should use the Databricks CLI or REST API (whichever you’re most comfortable with) to handle workspace inte...

    €8 Average bid
    2 bids
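One way to approach the "programmatically create/update" step is to assemble Jobs API 2.1 payloads and post them via the REST API; this hypothetical helper only builds the payloads, and every job name, script path, and cluster setting in it is an illustrative assumption:

```python
# Hypothetical helper assembling Databricks Jobs API 2.1 settings dicts for a
# set of PySpark scripts; names, paths, and cluster spec are placeholders.
import json

def build_job_payload(job_name, script_path, schedule_cron, cluster_spec):
    """Return a `jobs/create`-style settings dict for one scheduled script."""
    return {
        "name": job_name,
        "schedule": {
            "quartz_cron_expression": schedule_cron,
            "timezone_id": "UTC",
        },
        "tasks": [
            {
                "task_key": job_name,
                "spark_python_task": {"python_file": script_path},
                "new_cluster": cluster_spec,
            }
        ],
    }

cluster = {"spark_version": "13.3.x-scala2.12", "node_type_id": "i3.xlarge", "num_workers": 2}
payloads = [
    build_job_payload(f"job_{i:02d}", f"dbfs:/bundles/etl/job_{i:02d}.py", "0 0 3 * * ?", cluster)
    for i in range(1, 12)  # the eleven jobs
]
print(json.dumps(payloads[0], indent=2))
```

Each payload would then go to `jobs/create` (or `jobs/reset` for updates) through the Databricks CLI or an authenticated HTTP client; with Asset Bundles, much of this JSON can instead live declaratively in the bundle configuration.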

    I need a SQL stored procedure converted to PySpark code. The stored procedure currently interacts with a PostgreSQL database and primarily requires DataFrame operations in PySpark. Requirements: - Convert SQL stored procedure to equivalent PySpark DataFrame operations - Ensure the logic and functionality remain consistent with the original SQL - Optimize the PySpark code for performance Ideal Skills & Experience: - Proficiency in PySpark and DataFrame operations - Strong knowledge of PostgreSQL and SQL - Experience with data transformation and migration tasks - Ability to write clean, maintainable, and efficient code Please provide examples of similar work done and any relevant certifications.

    €98 Average bid
    12 bids

    ...of expert hands for about 2 ½–3 hours each day. We haven’t begun the migration yet, so you’ll step in right at the planning stage and guide it all the way through execution. The work centres on three pillars: • Data migration of relational databases into Snowflake • Building and hardening ETL pipelines in Python / PySpark • Creating and maintaining a clean CI/CD path for everything we deploy You’ll work with a stack that includes AWS, Snowflake, DBT, Python, PySpark and standard DevOps tooling for CI/CD. Along the way we’ll refine data models, set up automated tests, and make sure every job is production-ready before it moves through the pipeline. I’m based in Marathahalli, Bengaluru and strongly prefer so...

    €239 Average bid
    13 bids

    I’m ready to replace ...Integration support – provide clear docs, sample calls and, where necessary, helper SDK snippets so my team can wire the API into both the React and Flutter clients without blocking on you. 4. Evaluation – an offline notebook illustrating precision/recall or NDCG on a held-out set, and an online A/B framework outline so we can monitor lift after launch. Nice-to-haves include feature engineering in PySpark, use of TensorFlow Recommenders, and deployment via AWS SageMaker, but I’m open to your preferred stack as long as latency stays low and the pipeline is maintainable. If you have shipped a recommendation system for services before, especially across web and mobile, I’d love to see it. Let’s make our users feel like the p...

    €1133 Average bid
    47 bids
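For the offline evaluation notebook mentioned above, NDCG can be computed directly from the standard DCG formula; a small stdlib sketch with illustrative relevance labels:

```python
# NDCG@k for a ranked list of recommendations; relevance labels are
# illustrative. DCG discounts each gain by log2 of its 1-indexed rank.
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for rank i."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """Normalise DCG by the DCG of the ideal (descending) ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

print(ndcg_at_k([3, 2, 1], k=3))  # perfect ranking -> 1.0
print(ndcg_at_k([1, 2, 3], k=3))  # inverted ranking scores lower
```

The same functions slot straight into an A/B monitoring job, since online lift can be tracked as a delta in NDCG (or precision/recall) between variants.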

    ...Monitor, troubleshoot, and optimize cloud-based data workflows. Participate in code reviews and follow best practices for maintainable and scalable data solutions. Required Qualifications: Bachelor’s or Master’s degree in Computer Science, Engineering, or related field. 7+ years of hands-on experience in data engineering, with strong focus on AWS services. Proficiency in Python, SQL, and PySpark. Expertise in AWS data services: S3, Redshift, Glue, EMR, Athena, Lambda, Kinesis. Experience designing ETL pipelines, data lakes, and cloud-based data warehouses. Knowledge of CI/CD processes, version control (Git), and agile methodologies. Strong analytical, troubleshooting, and problem-solving skills. AWS certifications like AWS Certified Data Analytics – Sp...

    €7 / hr Average bid
    15 bids

    ...preferred). - Experience with containerization (Docker) and deployments. - Knowledge of observability and monitoring tools (Grafana, OpenTelemetry, Application Insights, AI Instrumentation). - Solid understanding of clean coding practices and modular design. - Strong problem-solving skills, communication, and ability to work in a collaborative environment. Preferred Skills: - Experience with PySpark for big data processing and analytics. - Exposure to Kubernetes (ArgoCD). - Experience with distributed task orchestration (Celery, Airflow) or messaging (Kafka, RabbitMQ). - Familiarity with advanced logging and monitoring best practices. - Familiarity with bash, PowerShell scripting, excel VBA - Strong SQL knowledge and practical experience with relational databases. What we offe...

    €14 / hr Average bid
    8 bids

    ...data, process it in real-time with machine learning algorithms, and store analysis results for visualization. Key components proposed include Apache Kafka for data streaming, Apache Spark (Streaming) for real-time processing, Apache Hive (or Hadoop HDFS) for data warehousing, and MongoDB for storing processed results. All code will likely be written in a high-level language (such as Python via PySpark) to integrate these components. Below, we break down the project requirements and plan into specific sections. Big Data Tools and Frameworks: The pipeline will leverage the following technologies: • Apache Kafka: Kafka is a distributed publish-subscribe messaging system ideal for ingesting and transporting real-time data streams. It is highly scalable, fault-tolerant, and low-l...

    €123 Average bid
    27 bids