Introduction
Claire Joster is recruiting a Data Engineer – Databricks (M/F) for a retail company.
Role
Project Understanding and Communication
– Understand problems from the user's perspective and communicate with users to clarify the issue.
– Make sure you clearly understand the architecture provided by the Data Architect.
– Communicate with the Data Architect and your peers on the technical solution you’re developing, and with the Project Manager in charge of the project you’re working on.
Development
– Write and communicate on new or updated interface contracts.
– Strong understanding of data warehousing concepts, data lakes, ETL/ELT processes, and data modeling.
– Develop data pipelines based on the defined architecture.
– Ensure standard best practices are applied.
– Deploy requested infrastructure, particularly using Terraform.
– Perform peer reviews and ask your peers to review your code when merging a new version of the codebase.
Testing
– Define tests with your Project Manager, based on the functional and technical requirements of the pipeline you’re developing.
– Perform those tests and communicate regularly on the results.
– Regularly summarize the results of your tests in a dedicated document.
Deployments
– Present the development work to the Data Architect in charge of the architecture and to the Lead DataOps during our Deployment Reviews.
– Track and report any errors throughout the active monitoring period following a deployment.
– Ensure diligent application of the deployment process, logging, and monitoring strategy.
Requirements
– Proficiency with PySpark and Spark SQL for data processing.
– Experience with Databricks using Unity Catalog.
– Knowledge of Delta Live Tables (DLT) for automated ETL and workflow orchestration in Databricks.
– Familiarity with Azure Data Lake Storage.
– Experience with orchestration tools (e.g., Apache Airflow or similar) for building and scheduling ETL/ELT pipelines.
– Knowledge of data partitioning and data lifecycle management on cloud-based storage.
– Familiarity with implementing data security and data privacy practices in a cloud environment.
– Terraform: At least one year of experience with Terraform and knowledge of GitOps good practices.
Additional Knowledge and Experience that are a Plus:
– Databricks Asset Bundles
– Kubernetes
– Apache Kafka
– Data Engineer Databricks
– Vault