We are seeking an experienced Principal DevOps Engineer to lead our DevOps initiatives, providing technical expertise and leadership in designing, implementing, and maintaining scalable and efficient systems. In this role, you will be at the forefront of managing and automating infrastructure, working across teams to ensure the smooth operation of cloud and on-premises environments. You will be responsible for driving key automation projects, ensuring a reliable and high-performing environment, and mentoring junior engineers to elevate the team's capabilities.
Key Responsibilities:
- Lead the design and implementation of scalable, highly available infrastructure solutions across hybrid cloud environments.
- Oversee the automation of operational processes, ensuring infrastructure provisioning and changes are repeatable and reliable.
- Collaborate with development, QA, and operations teams to drive continuous integration and continuous delivery (CI/CD) pipeline improvements.
- Build and maintain monitoring, alerting, and incident response strategies to ensure system reliability and performance.
- Evaluate, select, and implement tools for containerization, orchestration, and automation (Docker, Kubernetes, Ansible).
- Develop and enforce best practices and standards for system architecture, security, and configuration management.
- Own and manage cloud infrastructure, primarily AWS, ensuring cost optimization and security best practices.
- Conduct performance tuning, troubleshooting, and root cause analysis of complex issues in production environments.
- Mentor and guide junior engineers to adopt best practices and enhance their skill sets.
- Drive a culture of collaboration, innovation, and continuous improvement across DevOps teams.
Mandatory Skills & Qualifications:
- Linux: Expert-level proficiency in Linux (CentOS, Ubuntu, Red Hat, etc.) system administration and troubleshooting.
- Microsoft: Strong experience with Windows Server environments and associated infrastructure management.
- AWS: In-depth knowledge of AWS services (EC2, S3, VPC, RDS, Lambda, etc.) with hands-on experience in building and maintaining cloud-based environments.
- Containerization (Docker): Strong expertise in containerizing applications using Docker and managing containerized workloads.
- Orchestration (Kubernetes): Advanced knowledge and hands-on experience with Kubernetes for container orchestration, including cluster management, scaling, and deployments.
- Configuration Management (Ansible): Expertise in using Ansible for configuration management, infrastructure automation, and provisioning.
- Experience with continuous integration tools (Jenkins, GitLab CI, CircleCI, etc.).
- Strong scripting skills in languages such as Bash, Python, or similar.
- Proven ability to design and manage highly available, scalable, and secure infrastructure.
- Deep understanding of networking, security, and monitoring solutions in cloud and on-premises environments.
Desirable Skills:
- Experience with other cloud platforms (Azure, GCP).
- Knowledge of infrastructure as code tools like Terraform or CloudFormation.
- Experience with logging, monitoring, and alerting systems such as ELK Stack, Prometheus, Grafana, or similar.
- Ability to contribute to the development of DevOps strategy and roadmap for the organization.