About the Role
We are looking for an engineering leader to run the Federal Operations team. You will stabilize and mature the platform, owning service reliability, release safety, and continuous compliance, so we can confidently expand customer coverage and delivery velocity. You'll manage and grow a high-performing team, build strong cross-functional muscle, and deliver predictable outcomes in a regulated environment.
What You'll Do
- Lead and grow Gov SRE to operate a secure, reliable, and scalable government environment (people leadership, hiring, coaching, performance, and culture).
- Enforce SLOs/SLIs, incident response, on-call, and change management processes aligned to internal risk thresholds.
- Drive platform stabilization: reduce toil, harden baselines, improve observability, and shrink MTTR through runbooks, automation, and quality guardrails.
- Own safe delivery to the Gov environment: plan and orchestrate releases, change reviews, and rollbacks; improve CI/CD and IaC workflows for repeatable, auditable change.
- Build and prioritize an execution roadmap for Gov v2.0 launch (scale product enablement in Gov, reduce operational drag, and improve deployment lead time without increasing risk).
- Improve cost, performance, and resiliency posture in GovCloud through architecture reviews, reliability testing, and capacity planning.
- Report clear metrics, risks, and progress to stakeholders; proactively escalate blockers and propose mitigations.
What You'll Bring (Must-Haves)
- 7+ years in SRE/Infrastructure/Platform Engineering operating customer-facing services at scale.
- Hands-on leadership in incident management, SLOs/SLIs, observability, and change/release management.
- 2+ years managing SRE/Infra/FedOps teams with on-call ownership.
- Practical experience with cloud infrastructure (AWS preferred), Kubernetes/containers, Terraform or similar IaC, and modern CI/CD.
- Strong cross-functional collaboration with Security/Compliance, Product & Eng, and GTM; excellent written/runbook documentation and stakeholder communication.
- Track record of automation that reduces toil and improves reliability, auditability, and developer productivity.
- Ability to set clear goals/metrics, manage a prioritized roadmap, and deliver outcomes through the team.
- Depth in incident tooling and telemetry (e.g., metrics, tracing, logging) and alert hygiene.
- Experience operating in AWS GovCloud.
- Experience hiring and scaling a small team in a fast-moving, high-accountability environment.
How You'll Partner
- Product onboarding to Gov, dependency readiness, release planning, and quality gates.
- Partner across teams in platform infrastructure: shared patterns/modules, CI/CD safety, IaC standards, and golden-path delivery.
- Customer-facing teams: deployment readiness for POVs and go-live, incident comms, and reliability posture for federal customers.
At Abnormal AI, certain roles are eligible for a bonus, restricted stock units (RSUs), and benefits. Individual compensation packages are based on factors unique to each candidate, including their skills, experience, qualifications and other job-related reasons.
Base salary range: $218,100 - $256,600 USD
Abnormal AI is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability, protected veteran status or other characteristics protected by law. For our EEO policy statement please click here. If you would like more information on your EEO rights under the law, please click here.