Company Name: Captions
Location: Union Square, New York City, United States (In-person at NYC HQ)
Job Type: Full-time
Salary Range: $200K – $300K yearly (Offers Equity)
Industry: AI Research / Video AI / Software
Job Overview
Captions is an AI research and video AI company focused on transforming how video content is created and understood. We’re seeking an Engineering Manager, Machine Learning to join our team in Union Square, New York City. This full-time role drives the technical vision and deployment of large-scale, multimodal diffusion models in production, directly shaping the capabilities of our video AI software.
As an Engineering Manager, you will lead efforts in building, optimizing, and maintaining serving infrastructure for low-latency, high-throughput inference at scale. You’ll collaborate closely with researchers to adapt state-of-the-art generative models for real-world performance and reliability, while overseeing and contributing to core GPU-based ML pipelines. If you’re a seasoned ML professional with deep expertise in model deployment, optimization, and MLOps who thrives in an in-person environment, Captions invites you to bring your leadership and technical skill to the team.
Duties and Responsibilities
- Drive the technical vision for deploying large-scale multimodal diffusion models (tens to hundreds of billions of parameters) in production.
- Oversee and contribute directly to core ML pipelines, focusing on GPU-based inference and model optimization.
- Collaborate extensively with researchers to adapt state-of-the-art generative models for optimal real-world performance and reliability.
- Lead inference and deployment efforts by developing high-performance GPU-based inference pipelines.
- Build, optimize, and maintain robust serving infrastructure for low-latency, high-throughput inference at scale.
- Collaborate seamlessly with software engineering teams on containerization (Docker, Kubernetes), autoscaling, and uptime SLAs.
- Drive model optimization and fine-tuning, using compression techniques (quantization, pruning, distillation) to reduce latency and memory footprint.
- Implement continuous fine-tuning workflows to enhance model performance over time.
- Oversee production MLOps, performance, and scaling, including the design and maintenance of automated CI/CD pipelines for model deployment, versioning, and rollback.
- Implement robust monitoring (latency, throughput, concept drift) and alerting systems for distributed ML systems.
- Explore and integrate cutting-edge GPU acceleration frameworks (e.g., TensorRT, Triton, TorchServe).
- Bring proven experience deploying deep learning models on GPU-based infrastructure (NVIDIA GPUs, CUDA, TensorRT, etc.).
- Bring strong knowledge of containerization (Docker, Kubernetes) and microservice architectures for ML model serving.
- Work proficiently in Python and at least one deep learning framework (PyTorch, TensorFlow).
- Be familiar with compression techniques for large-scale models.
- Have experience profiling and optimizing model inference.
- Have hands-on experience with ML pipeline orchestration (Airflow, Kubeflow, Argo) and automated CI/CD for ML.
- Have a strong grasp of logging, monitoring, and alerting tools (Prometheus, Grafana, etc.) in distributed systems.
- Have domain exposure to diffusion models, multimodal video generation, or large-scale generative architectures.
- Experience with distributed training frameworks (FSDP, DeepSpeed, Megatron-LM) or HPC environments is a significant plus.
Qualifications
- Experience Level: Senior-level or lead-level engineering role.
- Education Requirement: Relevant experience or a Bachelor’s/Master’s degree in Computer Science, Machine Learning, or a related technical field is typically required.
- Required Skills:
  - Technical Leadership: Driving technical vision for deploying large-scale multimodal diffusion models (tens to hundreds of billions of parameters) in production; overseeing and contributing to core ML pipelines (GPU-based inference, model optimization); collaborating with researchers to adapt state-of-the-art generative models for real-world performance and reliability.
  - Inference & Deployment: Developing high-performance GPU-based inference pipelines; building, optimizing, and maintaining serving infrastructure for low-latency, high-throughput inference at scale; collaborating with software engineering teams on containerization, autoscaling, and uptime SLAs.
  - Model Optimization & Fine-Tuning: Leveraging compression techniques (quantization, pruning, distillation) to reduce latency and memory footprint; implementing continuous fine-tuning workflows.
  - Production MLOps, Performance, Scaling: Designing and maintaining automated CI/CD pipelines for model deployment, versioning, and rollback; implementing robust monitoring (latency, throughput, concept drift) and alerting; exploring cutting-edge GPU acceleration frameworks (e.g., TensorRT, Triton, TorchServe).
  - Technical Expertise: Proven experience deploying deep learning models on GPU-based infrastructure (NVIDIA GPUs, CUDA, TensorRT, etc.); strong knowledge of containerization (Docker, Kubernetes) and microservice architectures for ML model serving; proficiency with Python and at least one deep learning framework (PyTorch, TensorFlow).
  - Model Optimization: Familiarity with compression techniques for large-scale models; experience profiling and optimizing model inference.
  - Infrastructure: Hands-on experience with ML pipeline orchestration (Airflow, Kubeflow, Argo) and automated CI/CD for ML; strong grasp of logging, monitoring, and alerting tools (Prometheus, Grafana, etc.) in distributed systems.
  - Domain Experience: Exposure to diffusion models, multimodal video generation, or large-scale generative architectures; experience with distributed training frameworks (FSDP, DeepSpeed, Megatron-LM) or HPC environments is a significant plus.
Salary and Benefits
Captions offers an annual salary of $200K to $300K for this full-time Engineering Manager, Machine Learning position, along with equity in the company. We believe in rewarding top talent and fostering a dynamic work environment. Beyond salary and equity, the benefits package is designed to support your well-being and professional growth, and typically includes health, dental, and vision insurance, generous paid time off, and opportunities for continuous professional development in AI research.
Working Conditions
This is a full-time, in-person position at our NYC headquarters in Union Square, New York City, United States. You will work in a highly collaborative office environment, engaging directly with researchers, software engineering teams, and other ML professionals. The role demands strong technical leadership, a hands-on approach to complex ML systems, and the ability to manage significant projects in a fast-paced setting. You will be expected to drive technical vision and contribute to core ML pipelines. Standard business hours are generally observed.
Why Work with Us
At Captions, you’re joining a team that is redefining video content creation through AI research and video AI technology. We build sophisticated software that gives users new creative capabilities. As an Engineering Manager, Machine Learning, you will play a pivotal role in deploying and optimizing the large-scale generative models behind our products.
We offer a challenging and rewarding environment where your expertise in machine learning, production MLOps, and model optimization will be highly valued. You will be empowered to drive technical vision, build high-performance inference pipelines, and contribute directly to state-of-the-art generative AI. If you are a results-driven leader who is passionate about pushing the boundaries of AI and eager to make a tangible impact on a rapidly evolving software landscape, Captions offers a strong opportunity for your next career chapter.