Key Responsibilities
- Maintain the availability, reliability, and performance of containerized services that support mission systems
- Monitor infrastructure health using metrics, logs, and alerts; respond to and resolve incidents
- Perform root-cause analysis for infrastructure and service outages; implement corrective and preventative actions
- Improve system reliability through automation, standardization, and proactive engineering
- Support capacity planning, performance analysis, and scaling of infrastructure services
- Maintain and enhance monitoring, logging, and alerting solutions
- Participate in incident response, on-call rotations (as required), and post-incident reviews
- Collaborate with network, systems, platform, and application teams to resolve cross-stack issues
- Support infrastructure lifecycle activities including deployments, upgrades, patches, and configuration changes
- Apply security best practices and support compliance requirements in a regulated environment
- Develop and maintain runbooks, procedures, and operational documentation
- Contribute to CI/CD and Infrastructure as Code workflows that support IaaS offerings
- Participate in Agile ceremonies and operational planning activities
- Perform other duties as assigned
Requirements
The following minimum qualifications are required for the position:
- Bachelor's degree in Computer Science, Engineering, or a related technical field
- 5+ years of professional experience in systems engineering, SRE, DevOps, or infrastructure operations
- Strong experience administering Linux systems
- Experience supporting on-prem, cloud, or hybrid infrastructure environments
- Hands-on experience with monitoring, logging, and alerting systems
- Strong troubleshooting skills across compute, storage, networking, and OS layers
- Experience scripting or automating tasks using Bash, Python, or similar languages
- Familiarity with Infrastructure as Code (IaC) concepts and tooling, such as Helm, Ansible, and Terraform
- Strong verbal and written communication skills
- Detail-oriented, self-motivated, and able to own issues through resolution
- Ability to obtain and maintain a DoD Top Secret security clearance
- Ability to work on-site at the customer location
Candidates who have any of the following skills will be preferred:
- Advanced degree (e.g., Master's) in Computer Science, Engineering, or Mathematics
- AWS Certified Solutions Architect or similar certification
- Experience working with and implementing Kubernetes, Argo CD, and GitOps
- Familiarity with Capacity Planning, Disaster Recovery, and Anomaly Detection
- Experience monitoring performance with tools such as Grafana and Prometheus
- Professional experience with DevOps and CI/CD tooling, including Docker, Jenkins, and GitLab CI/CD
- Experience working in Agile software development environments and using task scheduling and tracking software (e.g., Jira)
- Active DoD Top Secret security clearance
*Resumes, cover letters, and applications generated by AI will not be considered for employment.
Benefits
SciTec offers a highly competitive salary and benefits package, including:
- 4% Safe Harbor 401(k) match
- 100% company-paid HSA Medical insurance, with a choice of 2 buy-up options
- 80% company-paid Dental insurance
- 100% company-paid Vision insurance
- 100% company-paid Life insurance
- 100% company-paid Long-term Disability insurance
- Short-term Disability insurance
- Annual Profit-Sharing Plan
- Discretionary Performance Bonus
- Paid Parental Leave
- Generous Paid Time Off, including Holiday, Vacation, and Sick Pay
- Flexible Work Hours
The pay range for this position is $111,000-$152,000. SciTec considers several factors when extending an offer of employment, including but not limited to the role and associated responsibilities, a candidate's work experience, education/training, and key skills. This is not a guarantee of compensation.
SciTec is proud to be an Equal Opportunity employer. Vet/Disabled.