Strong in Unity Catalog, Delta Lake, DBConnect, and DB API 2.0.
Strong in Databricks workflow orchestration, security management, platform governance, and data security.
Must know the new features available in Databricks, their implications, and their possible use cases.
Must have applied sound architectural principles to design the solution best suited to each problem.
Must be well versed in the Databricks Lakehouse concept and its implementation in enterprise environments.
Must have a strong understanding of data warehousing and the governance and security standards around Databricks.
Must know about cluster optimization and integration with various cloud services.
Must have a good understanding of creating complex data pipelines.
Must be strong in SQL and Spark-SQL.
Must have strong performance optimization skills to improve efficiency and reduce cost.
Must have worked on designing both batch and streaming data pipelines.
Must have extensive knowledge of the Spark and Hive data processing frameworks.
Must have worked on at least one cloud (Azure, AWS, or GCP) and its most common services, such as ADLS/S3, ADF/Lambda, Cosmos DB/DynamoDB, ASB/SQS, and cloud databases.
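As an illustration of the batch and streaming pipeline experience called for above, here is a minimal PySpark sketch that loads raw files from cloud storage into a Delta table and incrementally ingests a streaming source; the bucket paths, table names, and checkpoint location are hypothetical placeholders.

```python
# Minimal PySpark sketch: a batch load plus a streaming ingest into Delta.
# Paths, table names, and the checkpoint location are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Batch: read raw files from cloud storage (ADLS/S3), stamp them, write to a Delta table.
orders_batch = (
    spark.read.format("json")
    .load("s3://example-bucket/raw/orders/")          # hypothetical path
    .withColumn("ingested_at", F.current_timestamp())
)
orders_batch.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")

# Streaming: incrementally ingest new files with Auto Loader via Structured Streaming.
orders_stream = (
    spark.readStream.format("cloudFiles")             # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .load("s3://example-bucket/raw/orders_stream/")   # hypothetical path
    .withColumn("ingested_at", F.current_timestamp())
)
(
    orders_stream.writeStream.format("delta")
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders")
    .trigger(availableNow=True)                       # incremental, batch-style trigger
    .toTable("bronze.orders_stream")
)
```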
Must be strong in writing unit test cases and integration tests.
Responsible for setting best practices around Databricks CI/CD.
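For the testing and CI/CD expectations above, a minimal sketch of a pytest-based unit test for a PySpark transformation is shown below; the transformation function and schema are hypothetical examples, and the local Spark session lets the test run in a CI runner without a Databricks cluster.

```python
# Minimal sketch of a unit test for a PySpark transformation, suitable for a CI pipeline.
# The transformation (add_ingestion_date) and the sample schema are hypothetical examples.
import datetime

import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_ingestion_date(df):
    """Hypothetical transformation under test: stamps each row with the current date."""
    return df.withColumn("ingestion_date", F.current_date())


@pytest.fixture(scope="session")
def spark():
    # Local Spark session so the test runs in CI without a Databricks cluster.
    return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()


def test_add_ingestion_date_adds_column(spark):
    source = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    result = add_ingestion_date(source)

    assert "ingestion_date" in result.columns
    assert result.count() == 2
    assert result.first()["ingestion_date"] == datetime.date.today()
```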
Must understand composable architecture to take full advantage of Databricks capabilities.
Good to have REST API knowledge.
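As a sketch of the REST API knowledge mentioned above, the following example lists clusters through the Databricks REST API 2.0 using a personal access token; the workspace URL is a placeholder and the token is assumed to be supplied via an environment variable.

```python
# Minimal sketch of calling the Databricks REST API with a personal access token.
# The workspace URL is a placeholder; the clusters list endpoint belongs to API 2.0.
import os

import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token via environment variable

response = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()

for cluster in response.json().get("clusters", []):
    print(cluster["cluster_id"], cluster["cluster_name"], cluster["state"])
```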
Good to have an understanding of cost distribution.
Good to have experience on a migration project to build a unified data platform.
Good to have knowledge of Databricks SQL endpoints (SQL warehouses) and the Photon engine.
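Tying together the DB API 2.0 and SQL endpoint (Photon) items, here is a minimal sketch that queries a Databricks SQL warehouse through the databricks-sql-connector package, which implements the Python DB API 2.0; the hostname, HTTP path, and table name are placeholders.

```python
# Minimal sketch of querying a Databricks SQL endpoint (Photon-enabled SQL warehouse)
# through the DB API 2.0 interface of the databricks-sql-connector package.
# Hostname, HTTP path, and table name are placeholders.
import os

from databricks import sql

with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/<warehouse-id>",            # placeholder
    access_token=os.environ["DATABRICKS_TOKEN"],
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT order_id, amount FROM bronze.orders LIMIT 10")
        for row in cursor.fetchall():
            print(row[0], row[1])
```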
Hands-on experience designing and building Databricks-based solutions on any cloud platform.
Must be very good at designing end-to-end solutions on cloud platforms.
Must have good knowledge of data engineering concepts and the related cloud services.
In-depth, hands-on implementation knowledge of Databricks: Delta Lake, managing Delta tables, Delta Live Tables, Databricks cluster configuration, and cluster policies.
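To illustrate the Delta Lake and Delta Live Tables expectations above, a minimal sketch of a DLT pipeline definition in Python follows; the source path and table names are placeholders, and the `spark` session is provided by the DLT runtime.

```python
# Minimal sketch of a Delta Live Tables (DLT) pipeline definition in Python.
# This code runs as DLT pipeline source; paths and table names are placeholders,
# and `spark` is provided by the DLT runtime.
import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Raw orders ingested from cloud storage (bronze).")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")          # Auto Loader ingest
        .option("cloudFiles.format", "json")
        .load("s3://example-bucket/raw/orders/")       # placeholder path
    )


@dlt.table(comment="Cleaned orders with basic quality filtering (silver).")
@dlt.expect_or_drop("valid_amount", "amount > 0")      # drop rows failing the expectation
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("processed_at", F.current_timestamp())
    )
```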
Experience handling structured and unstructured datasets.
Strong proficiency in Python, PySpark, Scala, or SQL.
Experience with cloud platforms like AWS and an understanding of cloud-based data storage and compute services.
Familiarity with big data technologies like Apache Spark, Hadoop, and data lake architectures.
Develop and maintain data pipelines, ETL workflows, and analytical processes on the Databricks platform.
Should have good experience in data engineering on Databricks, covering both batch and streaming processing.
Should have good experience creating workflows and scheduling pipelines.
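For the workflow and scheduling item above, here is a minimal sketch that creates a scheduled Databricks Workflow (job) with the databricks-sdk Python package; the notebook path, cluster id, and job name are placeholders, and exact parameter names may differ across SDK versions.

```python
# Minimal sketch of creating a scheduled Databricks Workflow (job) with the
# databricks-sdk package. Notebook path, cluster id, and job name are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # resolves host/token from the environment or a config profile

created = w.jobs.create(
    name="nightly-orders-etl",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="run_etl",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/team/etl/orders"),
            existing_cluster_id="<cluster-id>",  # placeholder
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # 02:00 every day
        timezone_id="UTC",
    ),
)
print(f"Created job {created.job_id}")
```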
Should have good exposure to making packages and libraries available in Databricks.
Familiarity with the default Databricks runtimes.
Databricks Certified Data Engineer Associate/Professional Certification (Desirable).
Should have experience working in an Agile methodology.
Strong verbal and written communication skills.
Strong analytical and problem-solving skills with a high attention to detail.