How To Set Up Zeppelin For Analytics And Visualization
In this article, you learn how to create and configure a Zeppelin instance on EC2, how to store notebooks on S3, and how to set up SSH access.
...cleaning • remove duplicates • normalize titles • unify IDs 4. ETL pipeline • scripts to refresh the dataset monthly 5. Matching keys • imdb_id • tmdb_id Database Size Expectations • Movies: 600k+ • TV series: 200k+ Technical Requirements Preferred stack: • Python • Pandas • PostgreSQL or SQLite • API integration • ETL scripting Important Constraints Do NOT scrape IMDb pages. Use official datasets and APIs only. Goal of the Project The database will be used to: • compare against a personal movie rating history • calculate similarity between titles • generate predicted ratings • identify highly compatible unseen movies Accuracy of metadata is critical. Ideal Candidate Experie...
Custom ETL pipeline with PostgreSQL DB
...any schema changes made - Collaborate async with backend and product teams to align on timelines and minimize disruption You Bring: - Proven experience migrating large MySQL databases to PostgreSQL - Hands-on experience with Supabase (must-have) - Strong SQL skills across both MySQL and PostgreSQL dialects - Understanding of schema design, indexing, and relational data modeling - Experience with ETL pipelines and data transformation tooling - Ability to identify and resolve data quality issues during migration - Strong attention to detail and a zero-tolerance approach to data loss Bonus: - Experience with pgloader, AWS DMS, or similar migration tools - Familiarity with Row Level Security (RLS) in Supabase - Experience working within SaaS or product environments - Scripting expe...
...an accomplished Senior Data Engineer to sit in on our technical interviews for roughly two hours each day. The role is purely evaluative: you will craft probing questions, join live video calls, and quickly score each candidate’s depth of knowledge across Python, Scala and SQL. Our stack centres on Azure and Databricks, so practical insight into large-scale Spark/PySpark jobs, data-model design, ETL orchestration and cloud performance tuning is essential. Candidates frequently discuss streaming, optimisation strategies and modern AI/ML add-ons, so any hands-on exposure to libraries such as PyTorch, NumPy, SciPy or TensorFlow will help you challenge them at the right level, though it is not mandatory. Availability is limited to two focused hours per weekday; I will sha...
I need a compact proof-of-concept that automatically extracts Documents from an Azure Blob container, captures their accompanying metadata, and lands both in a SharePoint Online document library. Only files classed as “Documents” (e.g., PDF, DOCX, XLSX) are in scope—no images or videos. The PoC must demonstrate that file contents arrive intact, their metadata is mapped to the correct SharePoint columns, and the process can be triggered on demand or on a simple schedule. You are free to choose the most suitable approach—Azure Data Factory, Logic Apps, Power Automate, an Azure Function, or a lightweight script in C# or Python using the Azure SDK and Microsoft Graph/SharePoint REST API are all acceptable. The key is clarity and repeatability; hard-coded secrets or one...
...Responsibilities Automation Architecture: Design and deploy high-concurrency automation clusters using Node.js (TypeScript) and headless browser frameworks (Playwright/Puppeteer). Stateful Flow Simulation: Develop sophisticated scripts to simulate complex user journeys, including geo-localized session management and extraction of landed costs (Price + Tax + Freight). Neural Data Normalization: Architect ETL pipelines to transform unstructured web data into our Internal Universal Material Schema using advanced regex and LLM-assisted parsing. Perceptual Indexing: Implement pHash (Perceptual Hashing) logic to identify and de-duplicate identical physical products across the national market. Infrastructure Management: Manage auto-scaling containerized workloads on AWS (Fargate/EC...
...datasets. --- ## B. Predictive Models AI must provide: * Cash flow forecasting * Inventory restocking prediction * Revenue forecasting * Fraud risk detection * Expense anomaly detection * Tax anomaly detection * Slow-moving inventory alerts * Client financial risk scoring --- ## C. Architecture Requirements The developer must propose: * Separate AI microservice OR data warehouse model * ETL pipeline from client databases * Secure API-based data access * Scheduled model retraining * Explainable outputs * Dashboard integration into firm admin portal --- # 9. Security (Non-Negotiable) The system must follow zero-trust principles. ## Administrative Access * All admin access must be SSH-only via: * VPN or Zero-Trust Gateway * Bastion host * Key-based SSH authenticati...
...covers the entire pipeline: extracting data from the current MySQL instance, scrubbing and de-duplicating it, mapping old structures to the target schema, and applying significant transformations to fit new business rules. Solid SQL is obviously essential, and you should already feel at home with large tables, indexes, constraints, and performance tuning. Experience scripting in Python or using ETL utilities (e.g., Talend, Pentaho, custom shell/Python pipelines) will make life easier because several transformation rules are complex. I will provide full access to the existing database schema, sample data, and the target data model on day one. Throughout the month you will collaborate with our in-house QA team to validate row counts, referential integrity, and transformation accur...
The task centres on moving content that currently lives in Azure Blob Storage into two destinations—SharePoint Online and a third-party cloud repository that we will finalise together. The source consists of mixed-format files: Word documents, images, CSVs and a few other common types. I need a lightweight yet robust ETL solution that can: • automatically detect new or updated blobs, • perform any required transformations (basic metadata enrichment, naming conventions, optional compression where helpful), and • deliver the output to the target platforms with solid error handling and retry logic. An Azure-native approach—Azure Data Factory, Logic Apps, Functions, or a combination—will fit best, but I’m open to other suggestions so lo...
...Advanced filtering UI Data grid with pagination & lazy loading Infrastructure Kubernetes / Docker Load balancing CDN Caching (Redis) Object storage for raw data Required Features ================ Advanced Search Engine Full-text search Multi-filter query builder Auto-suggestions Fuzzy matching Aggregations (sum, count, trends) Data Handling ============ Bulk data ingestion pipelines ETL processing Schema optimization Index optimization Performance Requirements ====================== Query response in seconds Pagination with deep offset handling Parallel query execution Caching for repeated queries Security & Access =============== User authentication Role-based access Paid subscription model (optional phase 2) Dataset Details ============= 10+ TB...
I have several CSV and JSON files arriving on a regular schedule, and I need a reliable ETL pipeline that ingests these flat files, cleans the raw data, and validates every record before loading it into my environment. The core of the job is to: • Read multiple flat-file formats (mainly CSV, with the occasional JSON). • Apply thorough data-cleansing rules—removing duplicates, enforcing data types, flagging out-of-range values, and normalising text fields. • Run validation checks so that only clean, schema-compliant rows proceed to the load step. I’m happy for you to choose the stack you are most efficient with—Python (pandas, PySpark), Talend, or another ETL tool—as long as the final solution is reproducible and can be trigge...
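The cleansing-and-validation stage described above can be sketched in a few lines of standard-library Python. The column names, type rules, and value ranges below are illustrative assumptions, not part of the brief:

```python
import csv
import io

# Illustrative schema: column -> (type caster, range/validity check).
# These columns and bounds are assumptions for the sketch only.
SCHEMA = {
    "id": (int, lambda v: v > 0),
    "name": (str, lambda v: len(v) > 0),
    "amount": (float, lambda v: 0 <= v <= 1_000_000),
}

def clean_rows(reader):
    """Yield only de-duplicated, type-correct, in-range rows."""
    seen = set()
    for row in reader:
        try:
            typed = {col: cast(row[col].strip()) for col, (cast, _) in SCHEMA.items()}
        except (KeyError, ValueError):
            continue  # reject rows with missing columns or bad types
        if not all(check(typed[col]) for col, (_, check) in SCHEMA.items()):
            continue  # flag/reject out-of-range values
        typed["name"] = typed["name"].title()  # normalise text fields
        if typed["id"] in seen:
            continue  # drop duplicates on the key column
        seen.add(typed["id"])
        yield typed

raw = "id,name,amount\n1, alice ,10.5\n1,alice,10.5\n2,bob,-3\nx,carol,7\n"
rows = list(clean_rows(csv.DictReader(io.StringIO(raw))))
```

Only the first row survives here: the second is a duplicate, the third fails the range check, and the fourth fails the type cast. The same generator drops cleanly into a scheduled job or a PySpark/Talend equivalent.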
I have direct access to our e-commerce sale...connections or scheduled refresh against the database. I’ll provide the connection string and a sample export; you’ll propose the data model, set up the transformations, and surface the insights in clean, intuitive visuals. Exact metrics and calculated fields can be finalised together once the framework is in place. Deliverables • Secure connection to the sales database, including any necessary ETL or query optimisation • Fully-interactive dashboard with filter panels, drill-through views, and export options • Clear hand-off documentation covering data model, refresh schedules, and how to extend the report Please outline the tool you prefer, the timeline you need, and one example of a similar dashboar...
For my data‐management course homework, I must build a complete ETL workflow with any free or open-source tool such as Talend Open Studio, Apache NiFi, or Pentaho. The task starts with extracting a small, publicly available sample data set (CSV, JSON, or a simple relational dump), then cleaning and transforming it—deduplicating records, resolving missing or inconsistent values, and normalising key fields where needed. Once the data is tidy, it has to be loaded twice: first into a staging target (a plain relational table or file storage) and then into a basic star- or snowflake-style data-warehouse schema so I can run simple analytical queries afterward. I will need the full project files, transformation jobs, and a concise write-up that walks through each step, explains the...
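The two-step load described above (staging first, then a star schema) can be sketched with SQLite, which any of the named open-source tools can also target; the sales fact and its two dimensions are invented placeholders:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Step 1: land the cleaned extract in a flat staging table.
cur.execute("CREATE TABLE staging (sale_date TEXT, product TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [("2024-01-01", "widget", 10.0), ("2024-01-01", "gadget", 5.0),
     ("2024-01-02", "widget", 7.5)],
)

# Step 2: load a minimal star schema (two dimensions, one fact table).
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, sale_date TEXT UNIQUE)")
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT UNIQUE)")
cur.execute("""CREATE TABLE fact_sales (
    date_id INTEGER REFERENCES dim_date,
    product_id INTEGER REFERENCES dim_product,
    amount REAL)""")
cur.execute("INSERT INTO dim_date (sale_date) SELECT DISTINCT sale_date FROM staging")
cur.execute("INSERT INTO dim_product (product) SELECT DISTINCT product FROM staging")
cur.execute("""INSERT INTO fact_sales
    SELECT d.date_id, p.product_id, s.amount
    FROM staging s
    JOIN dim_date d ON d.sale_date = s.sale_date
    JOIN dim_product p ON p.product = s.product""")

# A simple analytical query over the finished schema.
totals = cur.execute("""SELECT p.product, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.product ORDER BY p.product""").fetchall()
```

The same surrogate-key pattern (dimensions populated from `SELECT DISTINCT`, fact rows joined back through the dimensions) carries over directly to Talend or Pentaho jobs.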
I want to spend one focused week mastering end-to-end data-engineering pipelines on Google Cloud. The heart of the bootcamp is hands-on practice: together we will build, deploy, monitor and troubleshoot real ETL flows until I can run them solo with confidence. Primary learning goal My priority is data ingestion, extraction, transformation and storage. We will start with the full ingestion toolkit—DataFlow, Pub/Sub and Cloud Storage—then chain that work into the wider GCP ecosystem: • Storage layers: BigQuery, Bigtable, Cloud Spanner, Cloud SQL, Datastore / Firestore • Transformation & orchestration: DataFusion, DataProc, Cloud Composer, Cloud Scheduler, Cloud Functions • Data quality & cataloging: DataPrep, Data Catalog • Visualisati...
looking for an experienced Azure Data Engineer to support and enhance our existing data platform on an ongoing basis. You should be strong in: Azure Data Factory (ADF) for building and maintaining ETL/ELT pipelines Azure Databricks and PySpark for large‑scale data processing Python for data engineering utilities, automation, and integration Delta Lake/Lakehouse concepts, performance optimization, and troubleshooting Working with SQL‑based data sources, data warehousing, and BI integrations Responsibilities Design, build, and optimize data pipelines in Azure ADF and Databricks Develop and maintain PySpark and Python jobs for batch and near real‑time workloads Implement best practices for data quality, observability, and monitoring Collaborate with our internal team, follo...
...with every URL and a clear field dictionary so you can move straight to extraction with Python, BeautifulSoup, Scrapy, or whichever tool chain you prefer. The finished dataset must come back as a single Excel/CSV file. Before you hand it over, give it a quick polish: apply basic, uniform formatting, drop any duplicates, and make sure each column lines up with the field names I supply. No heavy ETL work—just that first-pass cleanup so I can analyse the file immediately. Deliverables • CSV (or XLSX) containing 13 columns × ~285 rows, fully populated where data exists • Basic formatting and de-duplication applied • Short note flagging any URLs or fields that could not be captured A quick turnaround is ideal; the job should be straightforward...
We are seeking an experienced Data Engineering & Data Analytics Specialist to support the design, development, and optimization of modern data pipelines and analytics solutions. - Scope of Work Design, build, and maintain scalable ETL/ELT data pipelines Develop and optimize data models for analytics and reporting Work with cloud data platforms (AWS, Azure, or GCP) Implement data validation, quality checks, and monitoring Build dashboards and actionable insights using BI tools Collaborate on performance tuning and data architecture improvements - Required Skills Strong experience with SQL and Python Expertise in modern data stack tools (e.g., Snowflake, BigQuery, Redshift, Databricks) Hands-on experience with orchestration tools (Airflow or similar) Experience with db...
I need a skilled data analyst to help with Extracting, Transforming, and Loading (ETL) financial data into Excel. The focus will be on revenue and expenses. Key Requirements: - Analyze financial data, specifically revenue and expenses - Structure data currently coming from external sources - Proficiency in Excel and ETL processes Ideal Skills: - Experience with financial data analysis - Strong Excel skills - Familiarity with handling external data sources
...and sitting in our SQL-based data warehouse; the missing piece is a set of polished, interactive Power BI visuals that turn those rows into clear stories for management. Here is what I need: • Connect Power BI Desktop directly to the database (SQL credentials will be provided) and build a streamlined data model—relationships, calculated columns, and DAX measures where helpful, but no heavy ETL work. • Design visually consistent report pages that highlight our key metrics with slicers, drill-through, and tooltips so users can explore the data on their own. • Publish the finished .pbix to Power BI Service, set up a workspace, and configure refresh schedules. • Hand over the final .pbix, the published dashboard link, and a short walkthrough expl...
I’m preparing fo...after each assignment will keep me on track. Deliverables • A week-by-week study plan mapped to the exam objectives • Reusable Databricks notebooks and Snowflake scripts created during lessons • Practice questions or mini-projects for each topic and reviewed answers • Final mock assessment-style project that ties everything together Acceptance criteria I can independently build a simple ETL pipeline in Databricks, tune a Snowflake query to meet a stated SLA, explain the platform architectures, and score at least 80 % on the provided mock exam. When those boxes are ticked, I’ll know the tutoring engagement has succeeded. Please outline your proposed schedule, teaching approach, and any prior experience guiding students t...
I need a small, reliable system that polls the Upstox API once a minute for roughly 500 equities and captures only one metric—Market Cap. Each pull should be written straight i...• A single command spins up the database schema and scheduler • At least eight trading hours’ worth of data loads without a miss or duplicate row • Daily report matches manual spot checks of raw figures • Code passes a quick review for clarity, error handling, and PostgreSQL best practices If you have previous experience with market data feeds or the Upstox SDK that’s a bonus, but solid ETL skills and a knack for building resilient jobs are what really matter here. I would like to see the first working pull-and-store cycle within a few days so we can fine-tune ...
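The "no miss or duplicate row" requirement above usually comes down to a uniqueness constraint plus idempotent inserts, so a retried pull can never double-count. A minimal sketch, using SQLite as a stand-in for PostgreSQL and a simulated pull in place of the real Upstox call (the SDK's API is not assumed here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE market_cap (
    symbol TEXT NOT NULL,
    captured_at TEXT NOT NULL,      -- minute-resolution timestamp
    market_cap REAL NOT NULL,
    UNIQUE (symbol, captured_at))   -- one row per symbol per minute
""")

def store_pull(rows):
    """Insert one minute's pull; re-delivered rows are ignored, not duplicated."""
    # In PostgreSQL the equivalent is INSERT ... ON CONFLICT DO NOTHING.
    con.executemany("INSERT OR IGNORE INTO market_cap VALUES (?, ?, ?)", rows)
    con.commit()

# A hypothetical fetch wrapper would produce rows like these from Upstox;
# here the same pull arrives twice, as after a retry or scheduler overlap.
pull = [("INFY", "2024-01-02T09:15", 6.1e12), ("TCS", "2024-01-02T09:15", 1.3e13)]
store_pull(pull)
store_pull(pull)  # retry: the UNIQUE constraint absorbs the duplicates
count = con.execute("SELECT COUNT(*) FROM market_cap").fetchone()[0]
```

With this shape, the minute scheduler can be as dumb as cron: any overlap or retry is harmless by construction.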
...future team members can follow your logic • Basic deployment notes (gateway setup, refresh schedule, role-level security) Acceptance criteria: the report refreshes without errors directly from the SQL source, renders each visual in under three seconds on typical desktop hardware, and is free of hard-coded file paths or credentials. If you have experience optimising DAX, using Power Query for ETL, and designing executive-ready dashboards, I’d love to see examples of your previous work and get started right away....
...entire ETL process, integrates with Snowflake for storage and transformation, and connects to Dun & Bradstreet (D&B) for enrichment and deduplication. This agent should not only process data but also understand it: identify quality issues, propose corrections, enrich missing details, and guide users through validation when needed. • Build a CSV ingestion pipeline with schema validation and error reporting. • Implement automated data profiling to detect missing values, inconsistencies, and duplicates. • Integrate AI models to generate cleansing suggestions and standardize formats. • Connect to D&B APIs for company lookup, D‑U‑N‑S retrieval, and firmographic enrichment. • Implement fuzzy matching and deduplication logic combining AI scorin...
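The fuzzy-matching step named above can be illustrated with the standard library alone; `difflib` stands in for the AI scoring, and the suffix list and threshold are assumptions to tune against real D&B match results:

```python
from difflib import SequenceMatcher

def normalise(name: str) -> str:
    """Strip punctuation and common legal suffixes before comparing names."""
    cleaned = "".join(c.lower() for c in name if c.isalnum() or c.isspace())
    return " ".join(w for w in cleaned.split() if w not in {"inc", "llc", "ltd", "corp"})

def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two records as the same company when similarity clears the threshold."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold

dup = is_duplicate("Acme Corp.", "ACME Inc")        # same company, different suffix
distinct = is_duplicate("Acme Corp.", "Globex LLC")  # clearly different
```

In the full agent, a score in a grey zone around the threshold is exactly where the brief's "guide users through validation" step belongs, with confirmed matches resolved to a single D‑U‑N‑S number.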
...focuses on designing and implementing scalable, automated data pipelines and integration workflows within Oracle Cloud Infrastructure (OCI) to support Finance and Order-to-Cash (O2C) data flows. Key Responsibilities * Design, develop, and manage data integration pipelines using OCI Data Integration (OCI DI) for ingestion, transformation, and delivery of Finance/O2C data. * Build and orchestrate ETL/ELT workflows leveraging OCI services such as Object Storage, Data Flow, Data Catalog, Autonomous Data Warehouse (ADW), and Data Lakehouse. * Integrate data from Oracle Cloud ERP, legacy sources, and external applications into unified data models for analytics and reporting. * Implement data transformations, data quality checks, and reconciliation logic aligned with Finance business r...
...a clear, descriptive look at past inventory activity captured in our Oracle ERP system. The goal is a concise summary report that tells me, at a glance, how stock levels have moved, where turnover is strong or weak, and which items or locations stand out as anomalies. You will extract the relevant inventory tables (or work with a CSV extract I provide), shape the data with SQL or any preferred ETL approach, and present the findings in an executive-friendly format. Accuracy and interpretability are more important than advanced forecasting; this is strictly descriptive analysis. Core deliverables • Cleaned and consolidated dataset of historical inventory transactions • Executive summary report highlighting key metrics (stock movement, average holding days, turnover,...
...analysts, engineers, or BI specialists with strong communication and writing skills to contribute technical articles to our blog. Your mission? Share hands-on experience, real-world examples, and practical tips to help fellow data professionals work smarter. Who are we? ClicData is an all-in-one data management and business intelligence platform (SaaS), offering data connectivity, warehousing, ETL, data visualization, and automation. Our audience includes data professionals and data-savvy business leaders, primarily in mid-market companies across North America. Why write with us? - We're looking for long-term collaborators, not one-off gigs. That means predictable, recurring income for you. - You'll be credited as the author of each piece you write; your expertise will be sho...
...DataStage, QualityStage, or any complementary tools you recommend—so that golden records are created reliably and maintained automatically. What I expect from you • A short discovery session to understand the present architecture and pain points • A documented integration strategy that covers databases, cloud endpoints and REST/SOAP APIs • Detailed mapping specifications and any DataStage/ETL jobs or services needed to implement them • Configuration of matching, standardisation and survivorship rules inside MDM • Test plan plus before/after data-quality metrics to prove accuracy gains • Knowledge-transfer call so I can operate and extend the setup once you are done If you have demonstrable experience delivering clean, tr...
... PostgreSQL, MySQL, etc.) • Solid understanding of software engineering best practices and testing • Experience building and maintaining scalable full stack applications • Strong communication and team collaboration skills • Expertise with Python and various Python packages Preferred Skills • Advanced SQL with PostgreSQL: joins, subqueries, CTEs, and query optimization • Understanding of ETL and Data ingestion • Experience with core AWS services: EC2/ECS, ELB, S3, CloudFront, IAM • Experience in Scala • Familiarity with microservices architecture and distributed systems • Knowledge of Terraform, Docker, Kubernetes • Experience with CI/CD pipelines and end-to-end testing tools like Cypress • Knowledge/Experie...
I need an experienced Python engineer who works confidently with AWS Glue to build and manage a small suite of data-integration jobs for a Hyderabad-based project. The core of the work is to design and automate Glue ETL pipelines that pull data from our production databases, catalog it accurately, and transform it into analytics-ready tables. Here is what I expect from the engagement: • Develop, test, and deploy Glue ETL jobs in Python. • Populate and maintain the Glue Data Catalog so new tables are discoverable and properly version-tracked. • Implement efficient transformation logic that cleans, enriches, and partitions data for downstream reporting. • Optimise job performance and cost by selecting the right worker types, job parameters, and data...
...AI-powered natural language query interface on top. ## Scope of Work **Phase 1: Data Integration to BigQuery** Connect and automate data pipelines from: - Google Analytics 4 (GA4) - Google Ads - Google Search Console - Google Tag Manager - Google Merchant Center Requirements: - Set up automated, near real-time data transfers - Design efficient BigQuery schema with proper data modeling - Implement ETL/ELT processes with data quality checks - Create unified views combining data across platforms - Import historical data - Document all data flows and transformation logic **Phase 2: AI Integration Layer** Implement AI-powered interface for natural language querying of BigQuery data. Requirements: - Configure secure connection between AI platform and BigQuery - Set up authentica...
Please help me organize a large volume of numeric data I hold into MongoDB so it can be viewed easily in table form. Scope of work • After the source files are handed over, design the optimal MongoDB collection and field structure • Check data consistency and cleanse it (remove duplicates, unify formats, etc.) • Set up efficient indexes and implement basic aggregation pipelines • Build a table view that can be seen at a glance in MongoDB Compass or a web-based dashboard Completion criteria - All records import without errors and pass query tests - Collection structure and key example queries documented (PDF or Markdown) - Table-view screenshots and a usage guide delivered Required skills: MongoDB, data modeling, ETL scripting (JavaScript or Python, your choice) I expect a systematic approach so the data can be queried quickly and cleanly.
I’m overhauling a Power Apps Canvas app and Power Query and want the interface to look and feel far slicker. The sole focus is a dashboard that surfaces our sales data in a way managers can grasp at a glance. That means: • Designing a responsive Canvas screen with charts, KPI cards and slicers that load fast on desktop and mobile • Shaping and scheduling the underlying data with Power Query so the numbers update automatically—no manual refreshes or copy-pasting • Leaving me with a short hand-over video or doc so I can tweak visuals or extend the query later Everything sits in SharePoint lists and an Excel workbook today; use whatever combination of Dataverse, collections or direct connectors keeps things fast and maintainable. Native Power Apps contro...
...Pydantic v2 Modular architecture organized by services Database: local PostgreSQL Vector database: local Qdrant (or a previously approved alternative) Containerization: Docker mandatory Minimal docker-compose for deployment Installing dependencies manually on the server is prohibited 3.3 Mandatory Project Architecture Minimum structure: /backend /api /services /connectors /rag /etl /agents /frontend /infrastructure /docs /tests Structures outside this standard will not be accepted without prior approval. 3.4 Development Standards Git repository owned by the client Minimum branching: main / dev / feature/* Descriptive commits mandatory Structured logging (JSON) Exhaustive error handling Syste...
...unifying those streams into a clean, continuously updated dataset. On top of that data layer, the build must train, evaluate, and deploy the best-performing predictive models—whether regression, decision-tree, neural-network, or any other technique that proves superior—then surface the results through a lightweight web interface and an API our teams can call in real time. Key deliverables • Automated ETL jobs and data-quality checks for the three sources mentioned above • Modular training pipeline with experiment tracking, lift/ROC reporting, and feature-importance visuals • Scoring service exposed via REST (or GraphQL) endpoints plus an intuitive dashboard for non-technical users • Deployment scripts, environment setup notes, and a live han...
I have several disparate sources holding our inventory information and I want it all pulled together into one clean, well-structured Microsoft SQL Server database file. The job is straightforward in principle: extract every piece of inventory data you can reach, reconcile d...complete when I can: • restore or run the script on my Microsoft SQL Server instance without errors, • query a consolidated Items table and see one row per unique SKU, • join stock levels to their warehouse locations with no orphan records, and • export a sample CSV of current inventory that matches today’s live figures. T-SQL proficiency is essential; SSIS, Azure Data Studio, or any other ETL tooling you prefer is fine as long as the final deliverable is the ready-to-go SQL Se...
I am about to start a migration with Informatica PowerCenter and need direct support building the ETL processes. The dataset we will work with lives in Oracle and will be extracted from a SQL database; the goal is a clean, controlled, and auditable migration to the new schema. What I expect from you • Design and develop mappings, sessions, and workflows in PowerCenter covering the full extract, transform, and load flow. • Include error handling, quality validations, and detailed event logging. • Document each mapping in the repository and deliver a short deployment/rollback manual. The work will be accepted when: - All processes ...
...through Power BI, so once the model is validated I’ll ask you to craft intuitive dashboards that highlight drivers, confidence ranges and any red-flag anomalies the model detects. Solid statistical grounding is essential; I want clear explanations of feature importance, assumptions and limitations that business stakeholders can grasp quickly. Big-data exposure, cloud familiarity (Azure, AWS or GCP), ETL pipeline design and MLOps practices are all welcome extras—you’ll have room to propose improvements if they make the solution more robust or scalable. Deliverables I need from you: • A well-documented predictive model with reproducible code and clear version control • Cleaned and transformed datasets stored back into SQL (or a recommended alternativ...
...social protection, and legal frameworks. Below are the positions and their respective qualifications: 1. Senior Project Manager - PMP or PRINCE2 certification - 10 years of experience managing complex projects in IT/e-government 2. Principal IT Architect - Degree in Computer Engineering or equivalent, TOGAF certification - 10 years of experience as an enterprise architect with expertise in API, ETL, cloud technologies 3. Business Architects/Analysts - Degree or Master’s in Computer Security, Data Science, or Computer Science - Expertise in implementing Digital Public Infrastructure (DPI) approaches 4. Cybersecurity and Data Security Expert - Degree or Master’s in Cybersecurity or IT Governance - 10 years of experience with CSIRT/SOC, PKI, IAM 5. Data Management E...
...recorded purchase prices Notifications or alerts for discrepancies or missing data Secure storage of credentials (where needed) and compliance with data protection/privacy guidelines Easy scalability as the number of suppliers increases Required skills/experience: API or data source integration (REST/SOAP, CSV/JSON/XML imports; web scraping only when permissible) Experience with data pipelines, ETL processes, scheduling (Cron, CI/CD tools, Zapier/Integromat or similar) Knowledge of price and stock data reconciliation, logging, error handling Handling secure authentication, token management, secrets management Ability to implement discrepancy alerts (email/Slack/chat alerts) and provide a simple dashboard or reporting interface What we offer: Clear list of suppliers with access ...
...auditability, integrity, and security 3. HIGH-LEVEL CLOUD ARCHITECTURE Core components: Network Layer AWS VPC (multi-AZ) Private subnets per regulated institution Central supervisory subnet Data Layer S3 Data Lake (Raw / Processed / Curated) Redshift / Aurora (analytics storage) Object Lock for integrity Compute & Processing Lambda (validation, rules engine) EC2 (stress testing engines, Monte Carlo) Glue ETL (transform pipelines) Step Functions (workflow orchestration) Streaming & APIs API Gateway (data submissions) Kinesis (real-time data ingestion) AI / ML SageMaker (fraud detection, early warning models) Neptune (graph AML network analytics) Redshift ML (ratio prediction) Monitoring & Security IAM / RBAC KMS encryption CloudTrail / GuardDuty CloudWatch Visualizati...
I need a reusable ETL framework built inside Databricks notebooks, version-controlled in Bitbucket and promoted automatically through a Bitbucket Pipeline. All source data arrives via GraphQL APIs, so the job includes handling authentication, pagination, and schema inference before landing raw payloads in Delta tables. A dedicated cleaning stage must then standardise and validate the data before it moves on to the curated layer. The structure should be modular—ideally a bronze/silver/gold notebook hierarchy—so I can slot in new sources or extra transformations without touching the core logic. I also want a lightweight Python package (wheel) that wraps the GraphQL connector and can be attached to any cluster. Acceptance criteria • Parameter-driven notebooks organ...
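The pagination handling mentioned above typically means following a Relay-style cursor until the API reports no further pages. A minimal sketch with the transport stubbed out (the connection shape with `edges`/`pageInfo` is an assumption about the source API, and the query text is a placeholder):

```python
def fetch_all_pages(execute, query, page_size=100):
    """Follow cursor pagination until hasNextPage is false.

    `execute` is whatever callable sends the GraphQL request (the wheel's
    connector in production, a stub in tests); it must return one page of
    the connection: {"edges": [...], "pageInfo": {...}}.
    """
    records, cursor = [], None
    while True:
        page = execute(query, {"first": page_size, "after": cursor})
        records.extend(edge["node"] for edge in page["edges"])
        if not page["pageInfo"]["hasNextPage"]:
            return records
        cursor = page["pageInfo"]["endCursor"]

# Stubbed transport returning two pages, standing in for the real API.
PAGES = [
    {"edges": [{"node": {"id": 1}}, {"node": {"id": 2}}],
     "pageInfo": {"hasNextPage": True, "endCursor": "c2"}},
    {"edges": [{"node": {"id": 3}}],
     "pageInfo": {"hasNextPage": False, "endCursor": "c3"}},
]
calls = iter(PAGES)
rows = fetch_all_pages(lambda q, v: next(calls), "query { items { ... } }")
```

Keeping the loop independent of the transport like this is what lets the connector live in a reusable wheel while the notebooks stay parameter-driven.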
Job Title: NestJS Backend Developer – High-Performance Car Bulk Import (ETL)
The Challenge
We are looking for a senior-level NestJS developer to build a robust, production-ready car data import engine for our vehicle marketplace. This isn't a simple "upload and save" task; we require a sophisticated streaming pipeline capable of processing massive datasets (CSV, XML, JSON) with minimal memory footprint and high reliability.
Core Task
Build a POST /imports/cars endpoint that:
• Automatic Format Detection: handles multipart/form-data and identifies the file type (CSV, XML, or JSON) programmatically.
• Stream-Based Processing: processes data using Node.js Streams / AsyncIterables. The application should never load the full file into RAM.
• Data Pipeline: Implements a...
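The detect-then-stream idea is language-agnostic, so here is a minimal sketch of it. Note the posting itself calls for Node.js Streams in NestJS; this Python version only illustrates the two steps (sniff the format from the first bytes, then yield records lazily so the full file is never in memory), and the function names are made up for the example.

```python
import csv
import io


def detect_format(head: bytes) -> str:
    """Guess CSV/XML/JSON from the first non-whitespace bytes of the upload."""
    text = head.lstrip()
    if text.startswith(b"<"):
        return "xml"
    if text[:1] in (b"{", b"["):
        return "json"
    return "csv"  # fall back to CSV for plain delimited text


def stream_csv_records(fileobj):
    """Yield dict records one at a time; never materialises the whole file."""
    for row in csv.DictReader(io.TextIOWrapper(fileobj, encoding="utf-8")):
        yield row
```

In the Node.js implementation the same shape falls out naturally: peek at the first chunk of the request stream to pick a parser, then pipe the rest through it as an AsyncIterable.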
- Designed automated ETL routines to standardize disparate source formats, reducing manual reconciliation time by 40%.
- Performed root‑cause analysis and cohort studies to identify process bottlenecks and cost drivers.
- Built enterprise dashboards with row‑level security, incremental refresh, and performance tuning; improved executive visibility into KPIs.
- Implemented and supported ERP reporting modules, mapped master data across modules, and led data migration and validation during upgrades.
- Established validation rules, lineage documentation, and data quality KPIs to maintain trust in analytics.
- Replaced manual spreadsheets with parameterized Power BI reports and scheduled dataflows, saving recurring effort and reducing errors.
- Performance Metrics: Delivered dashboards that i...
...always one step away. I’m comfortable with tools such as Python, Pandas, LangChain, Node, SQL, Power BI, Tableau, or any similar stack you can justify.
Key deliverables
• Deployed WhatsApp agent(s) connected through the WhatsApp Business API (the WhatsApp channel is ready)
• Retrieval-augmented knowledge base so the bots surface the latest information without hallucinations (critical)
• Automated ETL jobs (n8n, Airflow, or your suggested alternative) feeding a structured data store
• Reusable analysis scripts/notebooks with documented logic
• Interactive dashboards and geo-visualisations accessible via web link
• Deployment guide and a brief hand-off walkthrough
Please respond ASAP with links to past conversational AI or data-analyti...
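The retrieval-augmented deliverable boils down to "rank knowledge-base documents by relevance to the user's question, answer only from the top hits." A production build would use embeddings (e.g. via LangChain, which the posting names), but the core ranking step can be sketched with plain term overlap; the document names and scoring scheme here are invented for illustration.

```python
from collections import Counter


def tokenize(text: str):
    return [t for t in text.lower().split() if t.isalnum()]


def retrieve(query: str, docs: dict, k: int = 2):
    """Rank documents by simple term overlap with the query.

    `docs` maps document names to their text; returns up to `k` names
    with a non-zero score, best match first.
    """
    q = Counter(tokenize(query))
    scores = {name: sum(q[t] * c for t, c in Counter(tokenize(text)).items())
              for name, text in docs.items()}
    return sorted((n for n, s in scores.items() if s > 0),
                  key=lambda n: -scores[n])[:k]
```

The retrieved passages would then be injected into the agent's prompt so the WhatsApp bot answers from the knowledge base rather than from the model's own guesses.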
...and loads it downstream for ETL processing. The transfer works, but network-level performance is far below what I need. Here is what I’m looking for:
• Diagnose the current ADO.NET-to-Oracle connection, identify any network, buffer, or packet-size bottlenecks, and benchmark the baseline throughput.
• Tune the SSIS data flow (buffers, rows per batch, commit size, async settings, etc.) and, if necessary, adjust the Oracle driver or provider configuration.
• Provide an updated package or detailed change list so I can reproduce the performance gains in other environments.
• Produce a concise report summarizing findings, before-and-after metrics, and next-step recommendations.
Source: another Oracle database accessed via ADO.NET.
Goal: reliable, hi...
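One piece of the buffer tuning above is simple arithmetic: SSIS caps each data-flow buffer at `DefaultBufferSize` bytes and `DefaultBufferMaxRows` rows, whichever limit is hit first, so whether raising one property helps depends on your row width. A quick worked calculation (the 500-byte row width is an example value, not from this package):

```python
def rows_per_buffer(default_buffer_size: int, max_rows: int, row_bytes: int) -> int:
    """Estimate rows per SSIS data-flow buffer: the byte cap or the row cap,
    whichever binds first."""
    return min(max_rows, default_buffer_size // row_bytes)


# With the SSIS defaults (10 MB buffers, 10,000 rows) and a 500-byte row,
# the row cap binds, so raising DefaultBufferMaxRows would give fuller buffers.
print(rows_per_buffer(10 * 1024 * 1024, 10_000, 500))   # row cap binds
print(rows_per_buffer(10 * 1024 * 1024, 100_000, 500))  # byte cap binds
```

The same check, repeated after measuring the real row width, tells you which of the two properties is worth raising before touching driver or network settings.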
Job Description
We are looking for an experienced Azure Data Engineer / Data Integration Specialist to design and implement a robust, scalable data pipeline that pulls data from the Blackbaud API and loads it into an Azure SQL Database using Azure Data Factory (ADF). The goal is to have a fully automated, secure, and monitored ETL pipeline that runs on a scheduled basis and supports future scaling.
Project Scope
1. Data Ingestion
• Connect to Blackbaud REST APIs (OAuth authentication)
• Handle pagination, rate limits, and API throttling
• Extract multiple endpoints (e.g., constituents, gifts, transactions, etc.)
2. Data Transformation
• Clean, normalize, and structure raw API JSON
• Handle nulls, schema drift, and data type conversions
• Add audit fields (load date, source system, batch id)
3. ...
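The transformation step above (schema-drift guards plus audit fields) is small enough to sketch. This is an illustrative shape only; the field names, the target schema, and the cast-based drift handling are assumptions, not Blackbaud's actual payload structure.

```python
import datetime


def normalize(record: dict, schema: dict) -> dict:
    """Coerce a raw JSON record to an expected schema.

    `schema` maps field name -> cast function; keys missing from the
    record (or null) become None, a simple guard against schema drift.
    """
    out = {}
    for key, cast in schema.items():
        value = record.get(key)
        out[key] = None if value is None else cast(value)
    return out


def add_audit_fields(record: dict, source: str, batch_id: str) -> dict:
    """Return a copy of a record with the standard audit columns added."""
    out = dict(record)
    out["load_date"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    out["source_system"] = source
    out["batch_id"] = batch_id
    return out
```

In ADF this logic would typically live in a Data Flow or a mapping expression, but keeping it as plain functions makes the rules unit-testable outside the pipeline.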
Top Ranked Requirements
1. Adobe/AEM Architect: Expert-level structuring of Adobe Analytics (segments, dimensions, templates). This is the "heavy lift" of the role.
2. The "Storyteller" (Strategy Bridge): 5+ years of experience turning data into strategic recommendations for Creative and Strategy teams.
3. Technical Data Handling: Hands-on proficiency with SQL/Python for ETL and data cleaning, and BI tools (Domo, Tableau, Looker) for automation.
4. Governance Lead: The ability to create "durable" frameworks (naming conventions, tagging plans, and intake processes) that ensure data stays clean year-over-year.
Core Responsibilities (The "What")
• Design & Implement: Build measurement frameworks and "always-on" tr...
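The governance requirement ("durable" naming conventions that keep data clean year over year) is usually enforced with an automated check rather than a document people forget. A tiny sketch, with an entirely hypothetical `channel_campaign_YYYYMM` convention standing in for whatever the team actually adopts:

```python
import re

# Hypothetical convention: channel_campaign_YYYYMM, lowercase with underscores.
SEGMENT_NAME = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{6}$")


def validate_names(names):
    """Split proposed tracking names into compliant and non-compliant lists."""
    ok, bad = [], []
    for name in names:
        (ok if SEGMENT_NAME.fullmatch(name) else bad).append(name)
    return ok, bad
```

Run at intake time (e.g. on every new segment or tagging request), a check like this is what makes the convention durable instead of aspirational.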