About this Opportunity
The Site Reliability Engineer is responsible for providing technical support for our platform solutions and services. This role works closely with the relevant development, enterprise infrastructure, and internal business teams (or external customers) to resolve production support incidents, escalating them where necessary.
How You Contribute to Our Vision: Key Responsibilities
As a Site Reliability Engineer on our SiteOps team, you'll be part of a group that's intensely focused on our customers and the health of our solutions. Whether it's incident management, production support, advanced monitoring, or mentoring, SREs provide the foundation for issue triage and speedy resolution with a continuous improvement mindset.
Serve as a Tier 2 escalation point for issues that the Tier 1 team cannot resolve.
Research problem tickets involving data, setup, and code issues, and correct those issues promptly.
Assist in prioritization of enhancement and defect resolution.
Monitor automated system alerts, log files, and other monitoring tool outputs.
Design and develop emergency patches to address critical production issues.
Manage third-party components used in the different Digital Commerce Solutions.
Perform administrative functions for application software.
Assist in providing support by participating in a weekly on-call rotation.
Interact with internal business customers, operation personnel, and development groups in troubleshooting and correction of issues.
Manage the development, quality assurance, and production application environments, working closely with operations personnel to honor application Service Level Agreements (SLA).
Perform root cause analysis on issues that lead to the implementation of processes to prevent repetitive problems.
Conduct analysis on issues that lead to the implementation of solutions to prevent repetitive problems.
Work on projects to improve the Production Support model and processes.
Candidates should have experience with many of the following:
You have solved multiple problems by writing and documenting exceptional script solutions.
You have extensive experience automating solutions to identified issues/bugs/anomalies. You have a passion for replacing manual processes with efficient and concise automated solutions.
You have been responsible for running critical services that multiple customers depend upon. You understand the importance and impact that operational optimization can have on a product and the positive ripple effects that it can have across an entire organization.
You are empathetic: You take others' opinions into account and clearly communicate your thoughts to reach technical solutions quickly.
You consider it necessary to understand and appreciate your customers and enjoy seeing your work improve the work of others.
Mentorship and a servant-leader mentality.
Experience in automation, specifically related to deployment, recovery, or other manual processes.
Experience using telemetry to understand throughput, limitations, and constraints in a service.
Strong problem-solving skills and passion for solving hard problems as part of a team and by individual investigation.
Experience with REST APIs, JSON, and exposure to container-based technologies.
Experience supporting zero fault-tolerant, scalable, and high-volume systems applications in .NET.
Experience with SQL Server 2012, Transact-SQL, and stored procedures.
Great analytical skills and the ability to think on your feet and work under pressure.
Strong Windows/Unix platform skills and understanding of network, storage, tiered application environments, and security.
Knowledge of Splunk, Graylog, Dynatrace, Application Insights or equivalent monitoring tools.
Experience analyzing .NET thread/heap dumps.
Familiarity with AWS services such as S3, Lambda, SQS, SNS, EC2, and EKS.
Bachelor's degree in computer information systems preferred, but not required.
4+ years of DevOps engineering experience with a focus on problem resolution and platform optimization.
Background building and managing end-to-end services surfacing telemetry and stitching together long-running business processes
Experience with enabling and managing cloud services, usage, and optimizations
Experience with resilience modeling (FMEA, MTTR) and the ability to automate simulation of service outages for platforms
Ability to work with service teams and own Live Site Reviews and corrective action plans
Excellent knowledge of a scripting or programming language: Ruby, Python, and/or .NET Core.
Experience working on an Azure-based, cloud-native infrastructure and managed services, including App Services, SQL Azure, and containers
Experience with Docker in a production environment including container orchestration (e.g., Nomad, Mesos, Kubernetes, etc.)
Experience with infrastructure as code (Terraform or CloudFormation)
Experience with API technologies: Swagger, REST APIs, JSON, JWT, and OAuth.
Good knowledge of managing data disks and storage for Windows in Azure.
Good knowledge of Windows Cluster configuration and troubleshooting.
Essential Duties & Responsibilities Required:
* Solve multiple problems by writing and documenting exceptional code solutions (20%)
* Automating solutions to identified issues/bugs/anomalies (30%)
* Performing deployments to update the ION Platform (10%)
* Mentorship and mentoring junior team members (5%)
* Defining and creating the mechanism to report metrics (15%)
* Use telemetry to understand throughput, limitations and constraints (5%)
* Use your experience with REST APIs, JSON, and exposure to container-based technologies (15%)
Education & Certifications:
* Master's / Postgraduate Degree with Computer Information Systems Field of Study preferred.
* Bachelor's Degree with Computer Science field of study required.
* Other Education / Certifications: Kubernetes experience highly desired
Working Conditions:
* Occasional non-standard work hours or overtime as business requires.
* May be located at the Clearwater corporate office or at the reseller location.
* On-call availability required as necessary.
* Professional, office environment.
Required Knowledge, Skills & Abilities:
* Able to execute instructions and to request clarification when needed.
* Possesses strong data entry skills.
* Ability to input 60 words per minute.
* Able to perform basic mathematical calculations.
* Able to recognize and attend to important details with accuracy and efficiency.
* Able to communicate clearly and convey necessary information.
* Able to create and conduct formal presentations.
* Able to negotiate skillfully, promote/sell ideas persuasively, and close transactions with mutually beneficial results.
* Possesses strong leadership skills with a willingness to lead, create new ideas, and be assertive.
* Possesses strong organizational and time management skills, driving tasks to completion.
* Able to constructively work under stress and pressure when faced with high workloads and deadlines.
* Able to work independently with minimum supervision.
* Able to maintain confidentiality of sensitive information.
* Able to build solid, effective working relationships with others.
* Able to quickly learn new systems and technology.
* Able to use relevant computer system applications at an advanced level.
Cultural Competency Requirements:
Within Tech Data, diversity is one of our fundamental shared values. We are a multicultural environment, and we pride ourselves on being a welcoming place of work where we celebrate inclusion and champion people from a multitude of backgrounds.
Join our team to connect the world with the power of technology!
Tech Data is an equal opportunity organization. We recruit, employ, train, compensate, and promote without regard to race, religion, creed, color, national origin, age, gender, sexual orientation, gender identity, marital status, disability, veteran status, or any other basis protected by applicable federal, state or local law.
Equal Employment Opportunity: Applications will be considered without regard to race, sex, religion, color, national origin, age, gender identity and/or expression, disability, sexual orientation, veteran status or other factors not related to specific position qualifications.
If you are a qualified individual with a disability or a disabled veteran, you may request a reasonable accommodation if you are unable or limited in your ability to use or access our employment website as a result of your disability. To request reasonable accommodation, contact Tech Data Human Resources at +1 (800) 237 8931*.
*Due to the volume of applications we receive, applicant inquiries to this phone number which are unrelated to a request for accommodation due to a disability will not receive a response.