01 Sep
Language Data Project Manager
New York, NY, USA

Job Description

Language Data Project Manager
Location: New York, NY/Hybrid
Duration: 6 months

Description:

Project Overview: Data gathering efforts for speech and language technologies for under-resourced languages are picking up pace across Client, with dedicated collection efforts including Vaani, Morni and transcription of Client data for hundreds of languages thanks to the YT-NTL project. Another key source of data which has enabled user-facing launches in Translate and Speech in recent years is acquired data. This kind of data comes in various forms: transcribed or untranscribed audio/video in various formats, text corpora, and multimodal data. Typically, each acquired corpus needs some individual attention in order to bring it up to the standards required for model training. In the past few months, Speech Data Operations (SDO) has made great progress with delivering acquired data from vendors including SpeechOcean, Megdap and LDCIL, and open source repositories including Mozilla Common Voice, Babel and other sources.

We need continued support for the following tasks:
- Track incoming datasets from open source repos and vendor companies
- File and track SDO data requests and follow up on queries via internal bugs
- Analyze the format of raw datasets
- Split data into test and train portions
- Use/improve existing tools to convert raw data to standard formats
- Deliver data to researchers for use in model training

Overall Responsibilities:
The Language Data Project Manager will oversee and manage all work related to achieving high data quality for speech projects in target languages/locales, which includes:
- Managing the lifecycle of data collection process for automatic speech recognition (ASR)
- Conducting external research involving the sourcing of language corpora
- Documenting process and methodology for training
- Providing weekly status updates on metrics
- Data Entry tasks involving the managing and organizing of information to be entered into the database
- Able to work independently with confidence and little oversight
- Comfortable with new technology and quick iterations
- Able to collaborate with international teams
- Explore new methods of data gathering for a higher throughput
- Work with cross-functional teams to get buy-in and ensure progress
- Conducting Administrative duties such as scheduling meetings, and updating spreadsheets

Skill/Experience/Education:
Mandatory:
- BS/BA or equivalent work experience
- 5+ years project management experience in software development or online product development
- Experience working with file systems, data wrangling, and knowledge of audio/video encodings and conversion
- Previous experience in managing external resources and working in Linux
- Project management skills, including recording, tracking and reporting on progress, writing and maintaining project documents, communication skills
- Proficiency with key project management and tracking tools
- Ability to quickly grasp technical concepts and translate high-level business goals to technical or process requirements
- Excellent oral and written communicator
- Ability to document to the greatest detail and summarize with brevity and impact
- Strong leadership skills, with the ability to successfully drive cross-functional teams
- Basic proficiency with a coding language such as C, SQL, Bash, Python
Desired:
- Foreign language proficiency and international experience is a plus

