TopGenAIJobs

A Gen AI & Agentic AI Jobs Platform to discover high-quality Gen AI and Agentic AI opportunities from top companies worldwide.

topgenaijobs.com

© 2026 TopGenAIJobs (Gen AI & Agentic AI Jobs Platform). All rights reserved.

    Staff Machine Learning Engineer - ML Training Infrastructure

    General Motors

    Frankfort / Sunnyvale
    7+ years
    Today
    $185,000–$335,300
    Full-time
    Remote

    Skills Required

    LLM
    Distributed Training
    FSDP
    Pipeline Parallelism
    Training Platforms
    Python
    PyTorch
    TensorFlow
    Distributed Systems
    GPU Computing
    Cloud Environments
    AWS
    GCP
    Azure
    Profiling

    Description

    General Motors is seeking a Staff Machine Learning Engineer to lead the design and development of scalable AI/ML training infrastructure. The role involves technical leadership, architecture definition, and collaboration with ML engineers and research scientists.

    Company: General Motors

    Role: Staff Machine Learning Engineer - ML Training Infrastructure

    Location: Frankfort | Remote

    Experience:

    • 7+ years of professional software engineering experience
    • 5+ years of specialized experience in AI/ML infrastructure
    • Experience leading technically ambiguous, cross-team infrastructure initiatives

    Key Skills:

    • Python
    • PyTorch
    • TensorFlow
    • Distributed systems
    • Distributed training
    • GPU computing
    • Cloud environments (AWS, GCP, Azure)
    • ML frameworks
    • Model training optimization
    • System observability
    • Debuggability
    • Operational excellence

    Qualification:

    • Bachelor's degree or higher in Computer Science or a related field, or equivalent practical experience

    Role Focus:

    • Define and drive architecture, design, and development of scalable ML frameworks and platform capabilities
    • Lead model training performance analysis and optimization across distributed training workflows
    • Improve scalability, efficiency, and cost across heterogeneous hardware environments
    • Enhance system observability, debuggability, operational excellence, and developer experience
    • Own large, ambiguous, cross-functional technical initiatives from strategy through execution
    • Define the technical roadmap, perform tradeoff analyses, and deliver solutions
    • Influence platform direction by identifying long-term infrastructure investments and setting engineering standards
    • Collaborate across organizational boundaries to align requirements and integrate new capabilities
    • Mentor engineers through design reviews, technical guidance, and hands-on partnership

    Additional responsibilities:

    • Travel to Sunnyvale, CA as needed
    • Operate in highly ambiguous and dynamic environments

    Nice to have:

    • Deep expertise in PyTorch 2.x and distributed training frameworks
    • Experience with training platforms supporting FSDP, pipeline parallelism, and scalable solutions for large foundation models
    • Experience profiling, analyzing, debugging, and optimizing training and data loading performance at scale
    • Strong record of technical leadership through architecture reviews, roadmap influence, and cross-team execution
    • Excellent communication skills for building consensus and providing constructive technical feedback
    • Self-motivated and execution-oriented with broad organizational impact

    Other:

    • Salary range of $185,000 to $335,300, with bonus potential based on company and individual performance
    • Relocation benefits may be available
    • Benefits include medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation and holidays, tuition assistance, employee assistance program, and GM vehicle discounts
    • Eligibility for the company vehicle evaluation program, subject to a satisfactory motor vehicle report review
    • The role is categorized as remote, with no expected onsite reporting unless directed
    • General Motors is committed to diversity, inclusion, equal employment opportunity, and accommodations for people with disabilities
    • Employment decisions are made without regard to any status protected under federal, state, and local laws

    More LLM jobs

    Python Developer – Gen AI / Agentic AI

    Digitrix Software

    Bengaluru

    Today

    Senior Compiler Engineer - AI

    Nvidia

    Redmond

    Today

    Senior Compiler Engineer - AI

    Nvidia

    Austin

    Today

    AI Prompt Writer

    Careerscape

    Los Angeles

    Today

    Staff Data Scientist | ML

    Machinify

    United States

    Today

    Forward Deployed Engineer, Agentic Platform (West Coast)

    Cohere

    San Francisco

    Today