Description
Design, develop, and maintain data pipelines and ETL processes to ingest, transform, and load data from various sources into data warehouses or data lakes.
Develop and optimize batch and streaming data solutions using Apache Spark (including PySpark).
Implement data integration solutions using Apache Kafka for real-time data streaming and processing.
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver robust, scalable solutions.
Ensure data quality, integrity, and governance across all systems and processes.
Monitor, troubleshoot, and optimize data pipeline performance.
Implement best practices for data architecture, data security, and compliance within cloud environments (AWS, Azure, or GCP).
Write and maintain complex SQL queries for data analysis and reporting.
Automate data workflows, testing, and deployments using CI/CD pipelines.
Provide technical expertise and guidance to team members and stakeholders on data engineering best practices.
Experience Required
Skills Required
Python
ETL
Spark
AWS
Azure
GCP
SQL
Benefits on Offer