Designing and implementing data processing systems using Microsoft Fabric, Azure Data Analytics, Databricks, and other distributed frameworks (e.g., Hadoop, Spark, Snowflake, Airflow).
Writing efficient and scalable code to process, transform, and clean large volumes of structured and unstructured data.
Building data pipelines to ingest data from various sources (databases, APIs, or streaming platforms).
Integrating and transforming data to ensure its compatibility with the target data model or format.
Designing and implementing data models that support efficient data storage, retrieval, and analysis.
Utilizing frameworks like Spark to perform distributed computing tasks, such as parallel processing, distributed data processing, or machine learning algorithms (a brief illustrative sketch follows this list).
Establishing data governance practices to maintain data integrity, quality, and consistency.
Identifying and resolving issues related to data processing, storage, or infrastructure.
Collaborating with cross-functional teams including data scientists, analysts, and business stakeholders to understand their requirements and provide technical solutions.
Training and mentoring junior Data Engineers, providing guidance and knowledge transfer.
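As a rough illustration of the Spark-based processing and transformation work described above, here is a minimal PySpark sketch; the paths, the "orders" dataset, and the column names (order_id, order_ts, quantity, unit_price) are hypothetical examples, not part of the role description.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-cleaning").getOrCreate()

# Ingest raw data from a hypothetical Parquet landing zone.
raw = spark.read.parquet("/landing/orders/")

# Clean and transform: deduplicate, normalize types, derive columns.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("total_amount", F.col("quantity") * F.col("unit_price"))
       .filter(F.col("total_amount") > 0)
)

# Write to a curated layer, partitioned for efficient downstream queries.
cleaned.write.mode("overwrite").partitionBy("order_date").parquet("/curated/orders/")
```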
Requirements:
Commercial experience working as a Data Engineer (4+ years).
Proficiency in MS Fabric and Azure Data Factory.
Strong programming skills in Azure Databricks using Python, PySpark, or Scala, and Spark SQL/T-SQL for data transformations.
Extensive knowledge of MS Fabric components:
Lakehouse, OneLake, Data Pipelines, Real-Time Analytics, Data Warehouse, Power BI integration, Semantic Models, Spark Jobs, Notebooks, Dataflow Gen1 and Gen2, and KQL.
Experience integrating Fabric capabilities for seamless data flow, governance, and collaboration across teams.
Strong understanding of Delta Lake, Parquet, and distributed data systems (see the sketch after this list).
Strong experience implementing and managing a Lakehouse using Databricks and the Azure tech stack (ADLS Gen2, ADF, Azure SQL).
Proficiency in data integration techniques, ETL processes, and data pipeline architectures.
Solid understanding of data processing techniques such as batch processing, real-time streaming, and data integration.
Proficiency in working with relational and non-relational databases (e.g., MSSQL, MySQL, PostgreSQL, or Cassandra).
Proficiency in data modeling techniques and database optimization, including query optimization, indexing, and performance tuning.
Understanding of data security best practices and experience implementing data governance policies.
Strong problem-solving abilities to identify and resolve issues related to data processing, storage, or infrastructure.
Analytical mindset to analyze and interpret complex datasets for meaningful insights.
Excellent communication skills to effectively collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders.
Ability to convey technical concepts to non-technical stakeholders in a clear and concise manner.
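As a rough illustration of the Delta Lake work referenced above, here is a minimal PySpark sketch of an upsert (MERGE) into a Delta table. It assumes a Spark session with the delta-spark extensions available (e.g., on Databricks or Microsoft Fabric); the table path and column names are hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-upsert-demo").getOrCreate()

# Incremental batch of changed records (hypothetical source path).
updates = spark.read.parquet("/landing/customers_incremental/")

# Upsert into an existing Delta table using MERGE semantics.
target = DeltaTable.forPath(spark, "/lakehouse/tables/customers")
(
    target.alias("t")
          .merge(updates.alias("s"), "t.customer_id = s.customer_id")
          .whenMatchedUpdateAll()
          .whenNotMatchedInsertAll()
          .execute()
)
```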
Nice to have:
A bachelor's or master's degree in Computer Science, Information Systems, or a related field.
Excellent knowledge of source control/version control along with CI/CD.
Knowledge of data warehousing concepts and technologies like Redshift, Snowflake, or BigQuery.
Familiarity with data privacy regulations and compliance standards.
Experience in designing and creating integration and unit tests.