At Facebook, we have many opportunities to work with data every day. In this role as a Data Engineer on the Facebook Data Center's Data Science team, your primary responsibility will be to partner with key stakeholders, data scientists and software engineers to support and enable the continued growth critical to Facebook's Data Center organization. You will be responsible for creating the technology and data architecture that moves and translates the data used to inform our most critical strategic and real-time decisions. You will also help translate business needs into requirements and identify efficiency opportunities. In addition to extracting and transforming data, you will be expected to use your expertise to build extensible data models and to provide meaningful recommendations and actionable strategies to partner data scientists for performance enhancements and the development of best practices, including the streamlining of data sources and related programmatic initiatives. The ideal candidate will have a passion for working in white space and creating impact from the ground up in a fast-paced environment. This position is part of the Infrastructure Data Center team.
- Partner with leadership, engineers, program managers and data scientists to understand data needs.
- Apply proven expertise to build high-performance, scalable data warehouses.
- Design, build and launch efficient and reliable data pipelines to move and transform data (both large and small amounts).
- Securely source external data from numerous partners.
- Intelligently design data models for optimal storage and retrieval.
- Deploy comprehensive data quality checks to ensure high-quality data.
- Optimize existing pipelines and maintain all domain-related data pipelines.
- Own the end-to-end data engineering component of the solution.
- Take on-call shifts as needed to support the team.
- Design and develop new systems in partnership with software engineers to enable quick and easy consumption of data.
- BS/MS in Computer Science or a related technical field.
- 5+ years of development experience in Python or another modern programming language.
- 5+ years of experience with SQL and relational databases.
- 5+ years of experience in custom ETL design, implementation and maintenance.
- 3+ years of experience with workflow management engines (e.g., Airflow, Luigi, Prefect, Dagster, digdag.io, Google Cloud Composer, AWS Step Functions, Azure Data Factory, UC4, Control-M).
- 3+ years of experience with data modeling.
- Experience working with cloud or on-prem Big Data/MPP analytics platforms (e.g., Netezza, Teradata, AWS Redshift, Google BigQuery, Azure Data Warehouse, or similar).
- 2+ years of experience working with enterprise DE tools and experience learning in-house DE tools.
- Experience with more than one coding language.
- Experience designing and implementing real-time pipelines.
- Experience with data quality and validation.
- Experience with SQL performance tuning and end-to-end process optimization.
- Experience with anomaly/outlier detection.
- Experience with notebook-based data science workflows.
- Experience with Airflow.
- Experience querying massive datasets using Spark, Presto, Hive, Impala, etc.