
    Site Reliability Engineer (SRE)

    Warszawa
    3,688 - 4,918 USD net/month (B2B)
    Type of work: Full-time
    Experience: Mid
    Employment type: B2B
    Operating mode: Hybrid

    Tech stack

      SRE: master
      DevOps: master
      Application support: advanced
      Production support: advanced
      Linux / Unix: advanced
      ITIL: regular
      Kubernetes: regular

    Job description

    Online interview
    Friendly offer

    The ideal candidate will have strong experience with Docker, Kubernetes, and Unix/Linux systems, along with a deep understanding of incident management, production support, and application monitoring. You will collaborate closely with development, operations, and security teams to resolve production issues quickly and efficiently while continuously improving the systems' reliability.

    Key Responsibilities:

    Production Support & Incident Management:
      • Provide production support for mission-critical financial applications, ensuring high availability and performance.
      • Lead and coordinate incident management efforts so that incidents are quickly diagnosed, mitigated, and resolved, with a focus on reducing downtime and service interruptions.
      • Troubleshoot production issues across applications, infrastructure, and networking, working closely with development and operations teams to implement long-term fixes.

    System Monitoring & Performance Tuning:
      • Monitor and optimize the performance, availability, and reliability of systems using modern monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic).
      • Implement and manage alerting systems to proactively detect and resolve potential issues before they impact users.
      • Tune infrastructure and applications to improve performance and reduce resource usage.

    Infrastructure Automation & DevOps Practices:
      • Automate infrastructure deployment, scaling, and management using tools such as Docker, Kubernetes, and CI/CD pipelines to support continuous integration and delivery.
      • Write and maintain infrastructure as code (e.g., Terraform, Ansible) to enable efficient deployment and scaling of systems and applications.
      • Work with DevOps teams to implement best practices for containerization, orchestration, and automation.

    Collaboration with Development Teams:
      • Work closely with development teams to ensure that production systems are scalable, reliable, and secure.
      • Participate in the design, implementation, and review of new features or systems, with an emphasis on their operational readiness for production.
      • Provide feedback on system designs and improvements, helping to bridge the gap between development and operations.

    Disaster Recovery & Business Continuity:
      • Collaborate with the team on disaster recovery planning and ensure systems have proper backup, failover, and recovery procedures in place.
      • Lead capacity planning and scale systems to meet growing traffic and data requirements with minimal impact on performance.

    Security & Compliance:
      • Ensure that all production systems are secure and comply with industry standards and regulations on data security, privacy, and financial compliance.
      • Work with security teams to address vulnerabilities and implement security best practices in application and infrastructure management.

    Continuous Improvement & Documentation:
      • Contribute to the continuous improvement of processes, systems, and tools for better performance and reliability.
      • Maintain detailed documentation of systems, incidents, operational procedures, and troubleshooting steps to improve knowledge sharing and support scalability.

    Required Skills and Qualifications:

    • Proven experience in site reliability engineering or production support within fintech or a similarly high-demand industry.
    • Strong experience with Docker and Kubernetes for container orchestration, scaling, and management.
    • Unix/Linux experience (system administration, shell scripting, troubleshooting, performance tuning) is mandatory.
    • Hands-on experience with incident management and production support, including using incident response tools (e.g., PagerDuty, Opsgenie) and root cause analysis.
    • Solid knowledge of cloud platforms (AWS, GCP, Azure) and experience managing cloud-native applications and infrastructure.
    • Experience with application monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic) to ensure system reliability.
    • Experience with CI/CD pipelines and infrastructure automation tools (e.g., Terraform, Ansible, Jenkins, GitLab CI).
    • Excellent problem-solving and troubleshooting skills, with the ability to diagnose and resolve complex production issues.
    • Experience with disaster recovery, backup strategies, and high availability architectures for critical systems.
    • Strong communication skills with the ability to work collaboratively across teams, including developers, operations, and business stakeholders.
    • Knowledge of financial services regulations, compliance, and security best practices is a plus.

    Preferred Skills:

    • Experience with monitoring and alerting solutions in high-volume environments (e.g., Prometheus, ELK Stack).
    • Familiarity with microservices architectures and an understanding of how to manage and scale large distributed systems.
    • Exposure to automated testing and performance benchmarking tools for infrastructure and applications.
    • Experience with logging and log management tools (e.g., ELK, Splunk).
    • Familiarity with networking concepts and troubleshooting in distributed systems.


    Apply for this job

    I am happy for Kevin Edward Consultancy Limited to save my contact details for future correspondence.

    Check similar offers

    Cloud Engineer
    SILENT EIGHT
    Undisclosed salary
    Warszawa, Fully remote
    Cloud, PostgreSQL, Contact with a client

    AWS DevOps Engineer
    Link Group
    4.94K - 6.39K USD
    Warszawa, Fully remote
    Cloud, AWS, CI/CD

    Platform Engineer (Airflow)
    C&F S.A.
    Undisclosed salary
    Warszawa, Fully remote
    Airflow, Amazon AWS, Python

    .NET Developer / DevOps Engineer
    Angry Nerds
    2.46K - 3.69K USD
    Warszawa, Fully remote
    Azure, .NET, Docker

    Cloud Software Engineer
    Rebels Software Sp. z o.o.
    4.13K - 5.16K USD
    Warszawa, Fully remote
    C#, .NET Core, Azure DevOps