Site Reliability Engineer (SRE)

Kevin Edward

Warszawa

3 688 - 4 918 USD net/month (B2B)

Type of work: Full-time
Experience: Mid
Employment type: B2B
Operating mode: Hybrid

Tech stack (skill: required level)

SRE: master
DevOps: master
Application support: advanced
Production support: advanced
Linux / Unix: advanced
ITIL: regular
Kubernetes: regular

Job description

Online interview

Friendly offer

The ideal candidate will have strong experience with Docker, Kubernetes, and Unix/Linux systems, along with a deep understanding of incident management, production support, and application monitoring. You will collaborate closely with development, operations, and security teams to resolve production issues quickly and efficiently while continuously improving the systems' reliability.

Key Responsibilities:

Production Support & Incident Management:
Provide production support for mission-critical financial applications, ensuring high availability and performance.
Lead and coordinate incident management efforts, ensuring incidents are quickly diagnosed, mitigated, and resolved, with a focus on reducing downtime and service interruptions.
Troubleshoot production issues across applications, infrastructure, and networking, working closely with development and operations teams to implement long-term fixes.

System Monitoring & Performance Tuning:
Monitor and optimize the performance, availability, and reliability of systems using modern monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic).
Implement and manage alerting systems to detect and resolve potential issues before they impact users.
Tune the infrastructure and applications to improve performance and reduce system resource usage.

Infrastructure Automation & DevOps Practices:
Automate infrastructure deployment, scaling, and management processes using tools such as Docker, Kubernetes, and CI/CD pipelines to ensure continuous integration and delivery.
Write and maintain infrastructure-as-code (e.g., Terraform, Ansible) to enable efficient deployment and scaling of systems and applications.
Work with DevOps teams to implement best practices for containerization, orchestration, and automation.

Collaboration with Development Teams:
Work closely with development teams to ensure that production systems are scalable, reliable, and secure.
Participate in the design, implementation, and review of new features or systems with an emphasis on their operational readiness for production.
Provide feedback on system designs and improvements, helping to bridge the gap between development and operations.

Disaster Recovery & Business Continuity:
Collaborate with the team on disaster recovery planning and ensure systems have proper backup, failover, and recovery procedures in place.
Lead efforts in capacity planning and scaling systems to meet growing traffic and data requirements while ensuring minimal impact on performance.

Security & Compliance:
Ensure that all production systems are secure and comply with industry standards and regulations related to data security, privacy, and financial compliance.
Work with security teams to address vulnerabilities and implement security best practices in application and infrastructure management.

Continuous Improvement & Documentation:
Contribute to the continuous improvement of processes, systems, and tools for better performance and reliability.
Maintain detailed documentation of systems, incidents, operational procedures, and troubleshooting steps to improve knowledge sharing and support scalability.

Required Skills and Qualifications:

Proven experience in site reliability engineering or production support within fintech or a similarly high-demand industry.
Strong experience with Docker and Kubernetes for container orchestration, scaling, and management.
Unix/Linux experience (system administration, shell scripting, troubleshooting, performance tuning) is mandatory.
Hands-on experience with incident management and production support, including using incident response tools (e.g., PagerDuty, Opsgenie) and root cause analysis.
Solid knowledge of cloud platforms (AWS, GCP, Azure) and experience managing cloud-native applications and infrastructure.
Experience with application monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic) to ensure system reliability.
Experience with CI/CD pipelines and infrastructure automation tools (e.g., Terraform, Ansible, Jenkins, GitLab CI).
Excellent problem-solving and troubleshooting skills, with the ability to diagnose and resolve complex production issues.
Experience with disaster recovery, backup strategies, and high availability architectures for critical systems.
Strong communication skills with the ability to work collaboratively across teams, including developers, operations, and business stakeholders.
Knowledge of financial services regulations, compliance, and security best practices is a plus.

Preferred Skills:

Experience with monitoring and alerting solutions in high-volume environments (e.g., Prometheus, ELK Stack).
Familiarity with microservices architectures and an understanding of how to manage and scale large distributed systems.
Exposure to automated testing and performance benchmarking tools for infrastructure and applications.
Experience with logging and log management tools (e.g., ELK, Splunk).
Familiarity with networking concepts and troubleshooting in distributed systems.
