
[Remote] Principal Observability Platform Engineer

Work from home · Full-time role · Hiring

Note: This is a remote role open to candidates in the USA.

Nscale is the GPU cloud engineered for AI, providing high-performance infrastructure for AI start-ups and large enterprises. As a Principal Observability Platform Engineer, you will own the technical direction of Nscale's observability platform, ensuring it scales with the business and simplifies operations.

Responsibilities

  • Own the technical strategy and architecture for observability across metrics, logs, traces, and alerting at scale
  • Drive platform decisions that have multi-year impact: tooling, data models, ingestion patterns, retention, cardinality management
  • Identify systemic gaps before they become incidents; design platforms that make failure visible and fast to diagnose
  • Partner with SRE, infrastructure, and AI/ML teams to embed observability natively into how Nscale builds and operates
  • Define standards and patterns that other engineers adopt, not by mandate, but because they're clearly better
  • Mentor and technically grow the observability team; raise the ceiling on what the team can build and own
  • Lead incident postmortems and use them to drive durable platform improvements
  • Evaluate and introduce tooling that meaningfully improves signal quality, operational efficiency, or scalability, and retire what doesn't

Skills

  • 8+ years in SRE, infrastructure engineering, platform engineering, or observability-focused roles
  • You've operated observability infrastructure at serious scale. You know what breaks at 10x and you design for it
  • You have a strong bias toward simplicity. You've seen over-engineered observability stacks collapse under their own weight and you build accordingly
  • Deep hands-on experience with a significant subset of: Prometheus, Thanos, VictoriaMetrics, Grafana, Loki, Tempo, OpenTelemetry, ClickHouse, Elastic
  • Strong engineering fundamentals: proficient in Python, Go, or similar, and comfortable owning complex systems end to end
  • Experience with Kubernetes at scale; familiarity with GPU infrastructure or HPC environments (Slurm) is a strong plus
  • You can architect systems, write the code, review others' work, and explain the tradeoffs clearly, all in the same week
  • Infrastructure-as-Code is default, not optional (Terraform, Ansible, or equivalent)
  • You influence without authority. Teams want your opinion because it makes their work better
  • Experience with high-volume streaming pipelines for observability data (Kafka, Vector, Fluent Bit, etc.)
  • Background in AI/ML infrastructure observability: GPU utilisation, training job visibility, inference latency
  • Prior experience defining observability strategy at an organisation level

Benefits

  • Bonus
  • Equity
  • Commission programs
  • Medical
  • Dental
  • Vision
  • Flexible paid time off
  • Parental leave
  • Retirement plan participation

Company Overview

  • Nscale builds AI data centers and provides GPU cloud infrastructure that companies use to train, run, and scale large AI models. It was founded in 2024 and is headquartered in London, England, GBR, with a workforce of 201-500 employees. Its website is https://www.nscale.com.