Use external Hive Metastore for Synapse Spark Pool

I wrote this article which is what helped me progress forward with this.

External Apache Hive metastore, Azure Databricks, Azure SQL

I am following these instructions.  Here is the kicker, which is likely something most people may already know, but in order to make this configuration the Hive Metastore must already exist on the Azure SQL data you configured as a Linked service.  If not, you will get this exception.

import java.sql.DriverManager

Failed to establish connection: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name ‘VERSION’.

When you execute the test connection code provided here.

When I worked the instructions for this scenario I decided to create a new Azure SQL database which was empty to use as my metastore.  That is when I experienced the issue mentioned above.  Since I created a Hive Metastore in Azure Databricks which I documented here, when I changed the connection string in the test code to that database, it worked.

image

Figure 1, testing Azure SQL Hive Metastore from an Azure Synapse Analytics Spark Pool

But when I tried to run spark.sql(“show databases”).show() I received this exception.

AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

The reason was I configured my Linked service to the database which didn’t have the existing Hive Metastore.  Once I fixed that one, it was all good.

image

Figure 2, testing Azure SQL Hive Metastore from an Azure Synapse Analytics Spark Pool