External Apache Hive metastore, Azure Databricks, Azure SQL

I was following this documentation and ran into a few issues I wanted to document. This is what I wanted to accomplish.


Figure 1, External Apache Hive metastore using Azure Databricks and Azure SQL


Figure 2, External Apache Hive metastore using Azure Databricks and Azure SQL

The cluster version I used was the most current one available at the time.


Figure 3, Azure Databricks Runtime: 9.1 LTS (Scala 2.12, Spark 3.1.2) cluster

Here is what my final Spark config ended up looking like.

Final Spark config:

datanucleus.schema.autoCreateTables true
spark.hadoop.javax.jdo.option.ConnectionUserName benperk@csharpguitar
datanucleus.fixedDatastore false
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://csharpguitar.database.windows.net:1433;database=brainjammer
spark.hadoop.javax.jdo.option.ConnectionPassword *************
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
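If you want to keep those key/value pairs in source control and paste them into the cluster's Spark config box, a small helper can render them into the newline-delimited "key value" format that box expects. This is a hypothetical sketch using the values from my config above (with the password as a placeholder), not an official API:

```python
def render_spark_conf(conf: dict) -> str:
    """Render key/value pairs into the 'key value' lines the
    cluster Spark config text box expects, one pair per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Values taken from the final Spark config above; the password is a placeholder.
hive_metastore_conf = {
    "datanucleus.schema.autoCreateTables": "true",
    "datanucleus.fixedDatastore": "false",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "benperk@csharpguitar",
    "spark.hadoop.javax.jdo.option.ConnectionURL": (
        "jdbc:sqlserver://csharpguitar.database.windows.net:1433;database=brainjammer"
    ),
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "<your-password>",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

print(render_spark_conf(hive_metastore_conf))
```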

Solution

Basically, datanucleus.autoCreateSchema is no longer the correct configuration name, so I changed datanucleus.autoCreateSchema true to datanucleus.schema.autoCreateTables true.

Exceptions

  • Caused by: MetaException(message:Version information not found in metastore.)
  • Caused by: javax.jdo.JDODataStoreException: Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"

All of these messages happened because I included these two entries in my Spark config:

  • spark.sql.hive.metastore.version 3.1, also tried spark.sql.hive.metastore.version 2.3.7
  • spark.sql.hive.metastore.jars builtin
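The error messages below spell out the constraint: "builtin" jars are only valid when the configured metastore version matches the Hive execution version bundled with the runtime (2.3.7 in my case). A tiny illustrative sketch of that rule, based purely on the error text and not on Spark's actual source:

```python
# Hive version bundled with the runtime, per the error messages below.
BUILTIN_HIVE_EXECUTION_VERSION = "2.3.7"


def builtin_jars_allowed(metastore_version: str) -> bool:
    """spark.sql.hive.metastore.jars builtin only works when the
    configured metastore version equals the bundled execution version."""
    return metastore_version == BUILTIN_HIVE_EXECUTION_VERSION


print(builtin_jars_allowed("2.3.7"))  # versions match, builtin is fine
print(builtin_jars_allowed("3.1"))    # mismatch, the combination I had configured
```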

Experienced Exceptions

  • AnalysisException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
  • Error in SQL statement: IllegalArgumentException: Builtin jars can only be used when hive execution version == hive metastore version. Execution: 2.3.7 != Metastore: 3.1. Specify a valid path to the correct hive jars using spark.sql.hive.metastore.jars or change spark.sql.hive.metastore.version to 2.3.7.
  • Builtin jars can only be used when hive execution version == hive metastore version. Execution: 2.3.7 != Metastore: 0.13.0. Specify a valid path to the correct hive jars using spark.sql.hive.metastore.jars or change spark.sql.hive.metastore.version to 2.3.7.

I did find some information on Stack Overflow about adding these two lines to the Spark config. It provided some good information, but it turns out the first property name has apparently changed:

  • datanucleus.autoCreateSchema true
  • datanucleus.fixedDatastore false
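The rename can be captured as a simple mapping: only the first property changed, while datanucleus.fixedDatastore kept its name. A hypothetical helper (names of the function and mapping are mine) that updates an older config like the Stack Overflow one:

```python
# Old DataNucleus property name -> the one that worked for me.
# datanucleus.fixedDatastore is unchanged, so it is not listed here.
RENAMED_PROPERTIES = {
    "datanucleus.autoCreateSchema": "datanucleus.schema.autoCreateTables",
}


def modernize(conf: dict) -> dict:
    """Return a copy of the config with outdated property names replaced."""
    return {RENAMED_PROPERTIES.get(key, key): value for key, value in conf.items()}


# The two lines suggested on Stack Overflow, before the rename is applied.
stack_overflow_conf = {
    "datanucleus.autoCreateSchema": "true",
    "datanucleus.fixedDatastore": "false",
}

print(modernize(stack_overflow_conf))
```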