Lead Big Data Engineer
Cary, North Carolina 27511, USA

Vacancy expired!

Job Responsibilities
  1. Building and implementing data ingestion and curation processes using big data tools such as Spark (Scala/Python), Databricks, Delta Lake, Hive, Pig, HDFS, Oozie, Sqoop, Flume, ZooKeeper, Kerberos, Sentry, Impala, etc.
  2. Ingesting huge volumes of data from various platforms for analytics needs, and writing high-performance, reliable, and maintainable ETL code.
  3. Monitoring performance and advising on any necessary infrastructure changes.
  4. Defining data security principles and policies using Ranger and Kerberos.
  5. Assisting application developers and advising on efficient big data application development using cutting-edge technologies.

Knowledge, Skills and Abilities

Education
  • Bachelor's degree in Computer Science, Engineering, or a related discipline

Experience
  • 4+ years of solutions development experience
  • Proficiency and extensive experience with Spark, Scala, and Python, including performance tuning, is a MUST
  • Hive database management and performance tuning (partitioning/bucketing) is a MUST; see the sketch after this list
  • Strong SQL knowledge and data analysis skills for data anomaly detection and data quality assurance
  • Strong analytical skills for working with unstructured datasets
  • Experience building stream-processing systems using solutions such as Storm or Spark Streaming
  • Experience with model management methodologies
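
For context on the partitioning/bucketing requirement above, here is a minimal sketch of how a partitioned, bucketed table might be produced with Spark in Scala. The table and column names (staging_web_events, curated.web_events, event_date, user_id) are hypothetical illustrations, not part of this posting, and note that Spark's bucketBy produces Spark-format buckets rather than Hive-format ones.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioning-bucketing-sketch")
      .enableHiveSupport() // register the table in the Hive metastore
      .getOrCreate()

    // Hypothetical staging table of raw click events.
    val events = spark.table("staging_web_events")

    // Partition by a low-cardinality column (event_date) so date filters
    // prune whole directories; bucket by a high-cardinality join key
    // (user_id) so joins and aggregations on it can avoid a full shuffle.
    events.write
      .partitionBy("event_date")
      .bucketBy(64, "user_id")
      .sortBy("user_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("curated.web_events")

    spark.stop()
  }
}
```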

Knowledge and Skills Required:
  • Proficiency and extensive experience in HDFS, Hive, Spark, Scala, Python, Databricks/Delta Lake, Flume, Kafka, etc. (a streaming ingestion sketch follows this list)
  • Analytical skills to analyze situations and arrive at an optimal, efficient solution based on requirements
  • Performance tuning and problem-solving skills are a must
  • Hive database management and performance tuning (partitioning/bucketing) is a MUST
  • Hands-on development experience and high proficiency in Java or Python, Scala, and SQL
  • Experience designing multi-tenant, containerized Hadoop architectures for memory/CPU management and sharing across different LOBs

Preferred:
  • Knowledge of data science is a plus
  • Experience with Informatica PC/BDM 10 and implementing pushdown processing into the Hadoop platform is a huge plus
  • Proficiency in using Git, Bamboo, and other continuous integration and deployment tools
  • Exposure to data governance principles such as metadata and lineage (Collibra/Atlas)
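
Because the lists above call out Spark, Kafka, and Delta Lake together, here is a minimal Structured Streaming sketch of the kind of ingestion pipeline implied, assuming the spark-sql-kafka and delta-spark connector packages are on the classpath; the broker address, topic name, and paths are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KafkaToDeltaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-delta-sketch")
      .getOrCreate()

    // Subscribe to a Kafka topic as an unbounded stream.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
      .option("subscribe", "events")                     // placeholder topic
      .option("startingOffsets", "latest")
      .load()

    // Kafka delivers key/value as binary; cast to strings before parsing.
    val decoded = raw.select(
      col("key").cast("string").as("key"),
      col("value").cast("string").as("value"),
      col("timestamp")
    )

    // Append to a Delta table; the checkpoint directory is what gives the
    // sink its exactly-once semantics across restarts.
    val query = decoded.writeStream
      .format("delta")
      .option("checkpointLocation", "/tmp/checkpoints/events")
      .outputMode("append")
      .start("/tmp/delta/events")

    query.awaitTermination()
  }
}
```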
