Technology stack
- GCP (must have!), BigQuery, Cloud Storage, Apache Airflow, Cloud Composer, Vertex AI, Dataproc, Compute Engine
- CI/CD and Build tooling: Terraform, Terragrunt, Jenkins, Groovy, Crane, Kaniko
- Python, PySpark, Docker, Jupyter, Apache Airflow, Spark, Java (optional, but would be beneficial)
Key Responsibilities
- Establish and maintain MLOps best practices, including version control, CI/CD pipelines, and use of the Vertex AI Model Registry and endpoints.
- Implement MLOps tools to streamline model development, training, tuning, deployment, monitoring and explainability.
- Deploy and manage ML models on GCP's Vertex AI platform, ensuring efficient and scalable execution.
- Identify and address performance bottlenecks in ML models and pipelines.
- Troubleshoot and resolve ML issues to keep model performance and costs optimal.
- Work closely with Compliance Analytics data scientists to prepare and preprocess data for model training and evaluation.
- Assist in feature engineering and feature selection to improve model performance.
- Develop techniques to visualize and explain model behavior, ensuring model transparency and accountability in line with PRA SS1/23 guidelines.
- Collaborate with infrastructure and DevOps teams to establish efficient deployment and scaling strategies.
Pipeline Development:
- Build and maintain robust pipelines for model training, tuning and deployment, using Vertex AI components and GCP tooling such as Cloud Composer and BigQuery, with Python and Java.
- Implement automated monitoring and alerting to track model performance and identify potential issues.
- Develop and maintain data quality checks and validations, including reconciliations, in line with Data Quality and Retention Controls.
- Implement robust security measures to protect sensitive data and models.
Required Skills and Experience:
- Strong proficiency in MLOps principles and tools.
- Proficiency in data engineering and pipeline development.
- Experience with GCP, including BigQuery, Cloud Composer and Vertex AI.
- Strong problem-solving and analytical skills.
- Strong proficiency in Python.
- Experience with Java would be beneficial.
To learn more about Antal, please visit www.antal.pl