Member of Technical Staff, Large Generative Models

3 weeks ago

Company Name: Captions
Location: Union Square, New York City, United States (In-person at NYC HQ)
Job Type: Full-time
Salary Range: $160K – $250K yearly (Offers Equity)
Industry: AI Research / Video AI / Software (specifically video creation using generative models)

Job Overview

Captions is a pioneering company at the forefront of AI Research and Video AI, dedicated to transforming how video content is created and understood using groundbreaking generative models. We’re seeking an exceptional Member of Technical Staff, Large Generative Models (a research engineer role) to join our team in the heart of Union Square, New York City. This full-time, senior- to principal-level role offers a pivotal opportunity to drive the research, design, and implementation of cutting-edge, large-scale multimodal diffusion models, directly shaping the future of video creation software.

As a Member of Technical Staff, you’ll be instrumental in designing and implementing novel architectures for massive-scale video and multimodal diffusion models, developing new approaches to temporal modeling, and creating innovative loss functions. You’ll leverage your deep expertise in generative modeling, distributed training systems, and production-oriented MLOps to push the boundaries of what’s possible in video AI. If you have a strong track record of research contributions at top ML conferences, thrive on tackling complex technical and research challenges, and are passionate about bringing state-of-the-art AI to real-world applications, Captions invites you to contribute your expertise to our groundbreaking mission.

Duties and Responsibilities

  • Design and implement novel architectures for large-scale video and multimodal diffusion models.
  • Develop new approaches to multimodal fusion, temporal modeling, and video control.
  • Research temporal video editing and controllable generation techniques.
  • Research and validate scaling laws for generative models.
  • Create new loss functions and training objectives to enhance model performance.
  • Drive rapid experimentation to validate research hypotheses.
  • Validate research through product deployment and user feedback, ensuring real-world impact.
  • Train and optimize models at massive scale (tens to hundreds of billions of parameters).
  • Develop sophisticated distributed training setups using frameworks such as FSDP, DeepSpeed, and Megatron-LM.
  • Design and implement model surgery techniques for model refinement.
  • Create new approaches to memory optimization and training efficiency.
  • Research techniques for improving training stability in large models.
  • Conduct systematic empirical studies to rigorously evaluate model performance.
  • Advance the state-of-the-art in video model architecture design and optimization.
  • Create novel solutions for multimodal learning and cross-modal alignment.
  • Research and implement new optimization techniques for generative models.
  • Design and validate new evaluation metrics for video AI.
  • Systematically analyze and improve model behavior through rigorous testing and refinement.

Qualifications

  • Experience Level: Senior to Principal. The role calls for a Master’s or PhD holder with a strong research track record and hands-on experience with large-scale models; it is a highly experienced individual-contributor position specializing in ML research.
  • Education Requirement: Master’s or PhD in Computer Science, Machine Learning, or related field.
  • Required Skills:
    • Research Experience:
      • Track record of research contributions at top ML conferences (NeurIPS, ICML, ICLR).
      • Demonstrated experience implementing and improving upon state-of-the-art architectures.
      • Deep expertise in generative modeling approaches (diffusion, autoregressive, VAEs, etc.).
      • Strong background in optimization techniques and loss function design.
      • Experience with empirical scaling studies and systematic architecture research.
    • Technical Expertise:
      • Strong proficiency in modern deep learning tooling (PyTorch, CUDA, Triton, FSDP, etc.).
      • Experience training diffusion models with 10B+ parameters; experience with very large language models (200B+ parameters) is a plus.
      • Deep understanding of attention, transformers, and modern multimodal architectures.
      • Expertise in distributed training systems and model parallelism.
      • Proven ability to implement and improve complex model architectures.
      • Track record of systematic empirical research and rigorous evaluation.
    • Engineering Capabilities:
      • Ability to write clean, modular research code that scales.
      • Strong software engineering practices, including testing and code review.
      • Experience with rapid prototyping and experimental design.
      • Strong analytical skills for debugging model behavior and training dynamics.
      • Facility with profiling and optimization tools.
      • Track record of bringing research ideas to production.
      • Experience maintaining high code quality in a research environment.

Salary and Benefits

Captions offers an annual salary of $160K – $250K for this full-time Member of Technical Staff, Large Generative Models position, and the compensation package includes equity in the company. We believe in rewarding top talent and fostering a dynamic work environment. Beyond salary and equity, Captions provides a comprehensive benefits package designed to support your overall well-being and professional growth, which typically includes robust health, dental, and vision insurance, generous paid time off, and opportunities for continuous professional development at the cutting edge of AI research.

Working Conditions

This is a Full-time position based in-person at NYC HQ in Union Square, New York City, United States. You will work within a highly collaborative and innovative office environment, engaging directly with researchers, ML engineers, and software development teams. The role demands exceptional technical expertise in large-scale generative models, strong software engineering practices, and the ability to conduct rigorous empirical research. You will be expected to drive rapid experimentation and contribute to the deployment of cutting-edge AI models. Standard business hours are generally observed.

Why Work with Us

At Captions, you’re not just joining a company; you’re becoming part of a team that’s redefining the future of video content creation through AI Research and cutting-edge Video AI technology. We are a pioneering force, building sophisticated software that gives users unprecedented creative capabilities, specifically focused on video creation using generative models.

We offer a challenging yet incredibly rewarding environment where your expertise in Large Generative Models, distributed training, and model optimization will be highly valued. You will be empowered to design novel architectures, research temporal video editing, and directly contribute to state-of-the-art generative AI. If you are a results-driven researcher with a clear passion for pushing the boundaries of AI, and eager to make a tangible impact on a rapidly evolving software landscape, Captions offers an unparalleled opportunity for your next career chapter.
