We are hiring a Data Engineer for our client. Candidates need 6 to 8 years of experience in data engineering. A LinkedIn profile is a must.

Must-have skills: PySpark, Python, AWS services, Teradata Vantage, CI/CD technologies, Terraform, SQL

Job Description:
- Design, develop, and maintain fault-tolerant, highly distributed, and robust ETL platforms for various business use cases.
- Analyze large sets of structured and semi-structured data for business analytics and ETL design.
- Translate business needs and vision into a roadmap, project deliverables, and organizational strategies.
- Design and implement ETL solutions leveraging cloud-native platforms.
- Collaborate with analytics and business teams to design data models that feed business intelligence tools, increasing data accessibility and encouraging data-driven solutions.

Skills and Experience Required:
- Good experience designing and developing data pipelines for data ingestion and transformation using Spark.
- Distributed computing experience using PySpark.
- Good understanding of the Spark framework and Spark architecture.
- Experience working in cloud-based big data infrastructure.
- Excellent at troubleshooting performance and data skew issues.
- Good understanding of Spark runtime metrics and the ability to tune applications based on those metrics.
- Deep knowledge of partitioning and bucketing concepts for data ingestion.
- Good understanding of AWS services such as Glue, Athena, S3, Lambda, and CloudFormation.
- Working knowledge of implementing data lake ETL using AWS Glue, Databricks, etc. is preferred.
- Experience with data modelling techniques for cloud data stores and on-prem databases such as Teradata and Teradata Vantage (TDV).
- Working experience in ETL development in Teradata Vantage and in data migration from on-prem systems to Teradata Vantage is preferred.
- Proficiency in SQL, relational and non-relational databases, query optimization, and data modelling.
- Experience with source code control systems such as GitLab.
- Experience with large-scale distributed relational and NoSQL database systems.