Real Jobs. Real Change. See What's Next.
A living directory of real jobs that didn't exist 5 years ago. Curated for leaders, builders, and the curious.
LLM Ops Engineer
Manages the complete operational lifecycle of Large Language Models, from development and fine-tuning through deployment, monitoring, and continuous optimization in production environments.
Key Responsibilities:
- Manage the LLM lifecycle, including fine-tuning of pre-trained models, dataset curation, and training-infrastructure optimization
- Develop and manage APIs for model serving while scaling infrastructure to handle varying demand loads
- Monitor inference performance, including latency, throughput, and output quality, and optimize the cost of serving models
- Create and maintain golden datasets for benchmark testing and implement statistical validation methods
- Design user feedback collection systems and establish continuous improvement processes with A/B testing frameworks
- Implement content moderation, bias detection, and regulatory compliance systems for AI safety
- Maintain prompt versioning, template libraries, and playground environments for systematic prompt engineering
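To make the monitoring responsibility above concrete, here is a minimal sketch of instrumenting LLM inference calls with latency, token, and cost tracking. All names (`InferenceMetrics`, `monitored_call`, `fake_model`) and the flat per-token pricing are hypothetical illustrations, not any specific vendor's API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class InferenceMetrics:
    """Rolling metrics for one model endpoint (hypothetical schema)."""
    calls: int = 0
    total_tokens: int = 0
    latencies: list = field(default_factory=list)

    def record(self, latency_s: float, tokens: int) -> None:
        self.calls += 1
        self.total_tokens += tokens
        self.latencies.append(latency_s)

    def p95_latency(self) -> float:
        # Nearest-rank p95 over observed call latencies.
        ordered = sorted(self.latencies)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def cost_estimate(self, usd_per_1k_tokens: float) -> float:
        # Simplified flat pricing; real endpoints often price
        # input and output tokens separately.
        return self.total_tokens / 1000 * usd_per_1k_tokens

def monitored_call(metrics: InferenceMetrics, model_fn, prompt: str) -> str:
    """Wrap a model call, recording wall-clock latency and token usage."""
    start = time.perf_counter()
    completion, tokens_used = model_fn(prompt)
    metrics.record(time.perf_counter() - start, tokens_used)
    return completion

# Stub standing in for a real model endpoint: returns a completion
# and a crude whitespace-based token count.
def fake_model(prompt: str):
    return f"echo: {prompt}", len(prompt.split()) + 2

metrics = InferenceMetrics()
for p in ["hello world", "summarize this report", "draft a reply"]:
    monitored_call(metrics, fake_model, p)

print(metrics.calls, metrics.total_tokens,
      metrics.cost_estimate(usd_per_1k_tokens=0.002))
```

In practice these counters would feed a dashboard or alerting system rather than a `print`, and the wrapper would also capture error rates and per-model labels.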
Skills & Tools:
- LLM development, fine-tuning, and deployment experience
- Programming skills (Python, machine learning frameworks)
- MLOps pipeline technology (Kubeflow, Apache Airflow)
- Cloud AI platforms (Azure OpenAI, Amazon SageMaker, Vertex AI)
- Infrastructure scaling and optimization tools
- AI monitoring and dashboard creation platforms
- Machine learning operations and MLOps principles
- AI safety, bias detection, and compliance frameworks (ISO 27001, SOC 2)
- Problem-solving and analytical thinking abilities
Where This Role Has Appeared:
- Litera (Legal Technology, Remote, $100k-$132k, July 2025)
Variants & Related Titles:
- ML Operations Engineer
- AI Infrastructure Engineer
- LLM Platform Engineer
- AI Production Engineer
- Machine Learning Engineer - LLM Focus
Why This Role Is New:
LLM Ops Engineer emerged in 2023-2024 as organizations moved beyond AI pilots to production-scale LLM deployments requiring specialized operational expertise. The role addresses the unique challenges of managing large language models in production, including prompt management, inference optimization, safety monitoring, and cost control that traditional MLOps roles weren't designed to handle.
Trend Insight:
As LLMs become core business infrastructure rather than experimental tools, companies are creating dedicated operational roles to ensure these powerful AI systems run reliably, safely, and cost-effectively at enterprise scale.
Seen this role elsewhere? Submit an example or share your story.