
Machine Learning Engineer (5+ years of experience)

3 weeks ago

Company Name: Captions
Location: Union Square, New York City, United States (In-person at NYC HQ)
Job Type: Full-time
Salary Range: $175K – $275K yearly (Offers Equity)
Industry: AI / Video AI / Software (specifically developing AI-powered video creation tools using large-scale generative models)

Job Overview

Captions is a pioneering AI software company that uses large-scale generative models to transform how video content is created and understood. We are seeking an exceptional Machine Learning Engineer with 5+ years of experience to join our team in the heart of Union Square, New York City. This full-time, senior-level role offers a pivotal opportunity to deploy, optimize, and maintain the deep learning models that power our AI-powered video creation tools.

As a Machine Learning Engineer, you will be instrumental in developing high-performance GPU-based inference pipelines for large multimodal diffusion models, ensuring low-latency and high-throughput predictions at scale. You will leverage your strong knowledge of containerization, microservice architectures, and model optimization techniques to bring state-of-the-art generative AI into production for millions of creators worldwide. If you are a seasoned ML professional with a proven track record in deploying scalable AI systems, thrive in a fast-paced, research-driven environment, and are passionate about pushing the boundaries of video AI, Captions invites you to contribute your expertise to our groundbreaking mission.

Duties and Responsibilities

  • Develop high-performance GPU-based inference pipelines for large multimodal diffusion models.
  • Build, optimize, and maintain serving infrastructure to deliver low-latency, high-throughput inference at scale.
  • Collaborate closely with DevOps teams to containerize models, manage autoscaling, and ensure uptime SLAs.
  • Leverage model optimization & fine-tuning techniques like quantization, pruning, and distillation to reduce latency and memory footprint without compromising quality.
  • Implement continuous fine-tuning workflows to adapt models based on real-world data and feedback.
  • Design and maintain automated CI/CD pipelines for model deployment, versioning, and rollback.
  • Implement robust monitoring (latency, throughput, concept drift) and alerting for critical production systems.
  • Explore cutting-edge GPU acceleration frameworks (e.g., TensorRT, Triton, TorchServe) to continuously improve throughput and reduce costs.
  • Deploy deep learning models on GPU-based infrastructure (NVIDIA GPUs, CUDA, TensorRT, etc.).
  • Serve ML models using containerization (Docker, Kubernetes) and microservice architectures.
  • Write production code in Python and at least one deep learning framework (PyTorch, TensorFlow).
  • Apply compression techniques (quantization, pruning, distillation) to large-scale models.
  • Profile and optimize model inference (batching, concurrency, hardware utilization).
  • Work hands-on with ML pipeline orchestration (Airflow, Kubeflow, Argo) and automated CI/CD for ML.
  • Use logging, monitoring, and alerting tools (Prometheus, Grafana, etc.) across distributed systems.
  • Work with diffusion models, multimodal video generation, or large-scale generative architectures.
  • Where relevant, work with distributed training frameworks (FSDP, DeepSpeed, Megatron-LM) or HPC environments; experience here is a significant plus.

Qualifications

  • Experience Level: Senior-level (5+ years of experience).
  • Education Requirement: Relevant experience or a Bachelor’s/Master’s degree in Computer Science, Machine Learning, or a related technical field is typically required.
  • Required Skills:
    • Technical Expertise: Proven experience deploying deep learning models on GPU-based infrastructure (NVIDIA GPUs, CUDA, TensorRT, etc.); Strong knowledge of containerization (Docker, Kubernetes) and microservice architectures for ML model serving; Proficiency with Python and at least one deep learning framework (PyTorch, TensorFlow).
    • Model Optimization: Familiarity with compression techniques (quantization, pruning, distillation) for large-scale models; Experience profiling and optimizing model inference (batching, concurrency, hardware utilization).
    • Infrastructure: Hands-on experience with ML pipeline orchestration (Airflow, Kubeflow, Argo) and automated CI/CD for ML; Strong grasp of logging, monitoring, and alerting tools (Prometheus, Grafana, etc.) in distributed systems.
    • Domain Experience: Exposure to diffusion models, multimodal video generation, or large-scale generative architectures; experience with distributed training frameworks (FSDP, DeepSpeed, Megatron-LM) or HPC environments is a significant plus.
    • Inference & Deployment: Ability to develop high-performance GPU-based inference pipelines for large multimodal diffusion models; to build, optimize, and maintain serving infrastructure for low-latency, high-throughput inference at scale; and to collaborate with DevOps teams to containerize models, manage autoscaling, and ensure uptime SLAs.
    • Model Optimization & Fine-Tuning: Ability to apply techniques like quantization, pruning, and distillation, and to implement continuous fine-tuning workflows.
    • Production MLOps: Ability to design and maintain automated CI/CD pipelines for model deployment, versioning, and rollback, and to implement robust monitoring and alerting for critical production systems.
    • Performance & Scaling: Willingness to explore cutting-edge GPU acceleration frameworks (e.g., TensorRT, Triton, TorchServe).

Salary and Benefits

Captions offers an annual salary of $175K – $275K for this full-time Machine Learning Engineer position, along with equity in the company. We believe in rewarding top talent and fostering a dynamic work environment. Beyond salary and equity, Captions is committed to providing a comprehensive benefits package designed to support your overall well-being and professional growth, which typically includes health, dental, and vision insurance, generous paid time off, and opportunities for continuous professional development at the cutting edge of AI technology.

Working Conditions

This is a full-time, in-person position based at our NYC HQ in Union Square, New York City, United States. You will work within a highly collaborative and innovative office environment, engaging directly with researchers, ML engineers, and software development teams. The role demands exceptional technical expertise in deep learning model deployment, strong programming skills, and the ability to design and optimize critical ML infrastructure. You will be expected to push the boundaries of ML system performance and contribute to the deployment of cutting-edge AI models. Standard business hours are generally observed.

Why Work with Us

At Captions, you’re not just joining a company; you’re becoming part of a team that’s redefining the future of video content creation through AI. We are a pioneering force, building AI-powered video creation tools on large-scale generative models to give users new creative capabilities.

We offer a challenging yet incredibly rewarding environment where your expertise in Machine Learning, GPU inference, and MLOps will be highly valued. You will be empowered to develop high-performance inference pipelines, optimize large-scale models, and directly contribute to state-of-the-art generative AI. If you are a results-driven ML Engineer with a clear passion for pushing the boundaries of AI, and eager to make a tangible impact on a rapidly evolving software landscape, Captions offers an unparalleled opportunity for your next career chapter.
