Systems Engineer/ Administrator II, Global Operations Support Engineering
Company: Amazon
Location: Herndon
Posted on: April 7, 2026
Job Description:
AWS Infrastructure Services owns the design, planning, delivery,
and operation of all AWS global infrastructure. In other words,
we're the people who keep the cloud running. We support all AWS
data centers and all of the servers, storage, networking, power,
and cooling equipment that ensure our customers have continual
access to the innovation they rely on. We work on the most
challenging problems, with thousands of variables impacting the
supply chain — and we're looking for talented people who want to
help. You'll join a diverse team of software, hardware, and network
engineers, supply chain specialists, security experts, operations
managers, and other vital roles. You'll collaborate with people
across AWS to help us deliver the highest standards for safety and
security while providing seemingly infinite capacity at the lowest
possible cost for our customers. And you'll experience an inclusive
culture that welcomes bold ideas and empowers you to own them to
completion.

The AWS Global Operations Support Engineering (GOSE)
team is seeking a Systems Engineer to lead the technical
implementation of business automation solutions and AI-driven
operational intelligence platforms. This role will serve as the
technical backbone for transforming critical infrastructure data
into automated, intelligent systems that enable the Data Center
Community (DCC) organization to prevent customer impact, reduce
operational burden, and continuously improve fleet-wide
reliability. As a Systems Engineer, you will design and build
production-grade automation infrastructure, lead the
productionalization of AI-driven operational tools, and establish
engineering best practices that scale across the global AWS data
center portfolio. You will work at the intersection of
infrastructure operations, automated solutions, and artificial
intelligence to create systems that fundamentally change how AWS
manages its global infrastructure.

Key job responsibilities
- Influence the team’s technical and business strategy by making
  insightful contributions to team priorities, and lead in identifying
  and solving architectural deficiencies that limit innovation
- Design and implement production infrastructure for AI-driven
  operational intelligence platforms and agentic systems, including
  event-driven architectures, Lambda functions, AgentCore
  deployments, API integrations, MCP server deployments, and AI
  orchestration systems that enable autonomous near-real-time actions
  across the global fleet
- Architect scalable automation solutions and agentic AI systems that
  integrate across multiple AWS services (Lambda, CloudWatch, Bedrock,
  AgentCore) and internal systems to eliminate manual processes and
  enable autonomous workflows
- Develop and maintain MCP (Model Context Protocol) servers and tools
  that expose data center operational data, runbooks, and automation
  capabilities to agentic systems
- Build and maintain AWS infrastructure for automation programs,
  including dedicated AWS accounts, IAM roles, security configurations,
  deployment pipelines, usage logging, and authentication systems
- Establish engineering standards, best practices, and operational
  excellence patterns for business automation and AI-driven systems,
  including CI/CD pipelines and infrastructure-as-code solutions using
  CDK/CloudFormation
- Drive proof-of-concept development for new automation ideas and own
  the productionalization of validated AI proof-of-concepts into
  production-ready systems with >95% uptime, implementing monitoring,
  alerting, and observability solutions for automation infrastructure
- Collaborate with Business Intelligence Engineers, TPMs, and Data
  Engineers to translate business requirements into technical solutions
  while leading design and code reviews

About the team
The Global Operations Support
Engineering (GOSE) team is focused on maximizing AWS data center
infrastructure availability and operational excellence. We achieve
this by optimizing labor utilization, diving deep into event and
incident analysis, developing data engineering and business
intelligence solutions, deploying business automation, and managing
global operational improvement initiatives. We transform critical
infrastructure data into actionable intelligence that enables the
Data Center Community (DCC) organization to prevent customer
impact, reduce operational burden, focus on highest-impact
activities, and continuously improve fleet-wide reliability and
productivity. Through our comprehensive monitoring, analysis,
reporting, and program/project management, we serve as the
analytical backbone that drives continuous improvement in
operational excellence across the global data center portfolio. The
team operates at the intersection of infrastructure operations,
data engineering, and artificial intelligence—building systems that
fundamentally change how AWS manages its global infrastructure at
scale.

Qualifications
- 5 years of systems engineering experience
- Bachelor's degree in Systems Engineering, Computer Science, or a
  related field, or relevant work experience
- Experience in site reliability engineering (SRE), systems
  engineering, systems administration, DevOps, security administration,
  or network administration
- Experience in any of the following: Python, Java, Perl, PHP, Ruby,
  Bash, Shell, or equivalent
- Knowledge of TCP/IP and networking protocols such as HTTP and DNS
- Experience designing and developing scripts to automate operational
  burdens and reviewing scripting changes to ensure they meet the
  standards for maintainability, scalability, and security
- Experience working in a 24/7 production environment
- Experience with service-oriented architecture and web services

Amazon is an equal opportunity
employer and does not discriminate on the basis of protected
veteran status, disability, or other legally protected status. Our
inclusive culture empowers Amazonians to deliver the best results
for our customers. If you have a disability and need a workplace
accommodation or adjustment during the application and hiring
process, including support for the interview or onboarding process,
please visit
https://amazon.jobs/content/en/how-we-hire/accommodations for more
information. If the country/region you’re applying in isn’t listed,
please contact your Recruiting Partner.

The base salary range for
this position is listed below. Your Amazon package will include
sign-on payments and restricted stock units (RSUs). Final
compensation will be determined based on factors including
experience, qualifications, and location. Amazon also offers
comprehensive benefits including health insurance (medical, dental,
vision, prescription, Basic Life & AD&D insurance and option
for Supplemental life plans, EAP, Mental Health Support, Medical
Advice Line, Flexible Spending Accounts, Adoption and Surrogacy
Reimbursement coverage), 401(k) matching, paid time off, and
parental leave. Learn more about our benefits at
https://amazon.jobs/en/benefits.

USA, VA, Herndon - 104,500.00 - 160,000.00 USD annually