13 Mar
Lead/Sr. - Hadoop/Spark/BigData/AWS
Newark, Delaware; Columbus, Ohio; Dallas and Houston, Texas, USA

Vacancy expired!

Lead/Senior - Hadoop/Spark/BigData/AWS
Total positions: 5
Location: Newark, Delaware; Columbus, Ohio; Dallas and Houston, Texas
Deadline: the sooner the better

Work Description:
Experience level: 8+ years. Excellent communication skills are required.
The data pipeline is currently set up on on-premises Hadoop data lake clusters, where data processing is done with Spark, Scala, and HiveQL. All data pipelines run in batch mode. The plan is to migrate all on-premises servers to AWS. For some modules, the AWS migration is already in progress and enhancements are under development.

Skill-sets Required:
Hadoop - Basic understanding of the Hadoop distributed architecture and its ecosystem.
Spark - Clear understanding of Spark architecture, performance optimization techniques, DataFrame-based transformations and actions, monitoring Spark jobs, troubleshooting memory issues, etc.
AWS:
  EMR - Good understanding of working on EMR clusters for Spark development.
  S3 - Should understand S3 storage concepts, encryption, aws s3 commands, partitioning, storage classes, events, etc.
  Redshift - Good understanding of and working experience with Redshift and Redshift Spectrum; should be able to query the Redshift data warehouse (DWH).
  Athena - Working knowledge of Athena.
  EC2 - Working experience with EC2 instances.
  AWS DevOps - Experience applying AWS DevOps best practices.
Scala - Good hands-on experience in Scala (Java not much required).
Hive - Good hands-on experience writing Hive queries, including optimizations, joins, partitioning, and bucketing.
SQL - Strong SQL skills.
Shell script - Should be able to write and understand shell scripts.
Cloudera - Should understand the different Cloudera Hadoop components in Cloudera Manager and have good knowledge of CDH distribution setup and configuration.
Cassandra - Should understand Cassandra NoSQL architecture, database design and partitioning, query optimization, consistency levels, etc.
Kafka - Good understanding of data streaming with Apache Kafka, the producer-consumer model, offset management, etc.
Send profiles to:


