I was following this documentation on configuring an external Apache Hive metastore with Azure Databricks and ran into some issues I wanted to document. Here is what I wanted to accomplish.
Figure 1, External Apache Hive metastore using Azure Databricks and Azure SQL
Figure 2, External Apache Hive metastore using Azure Databricks and Azure SQL
The cluster runtime version I used was the most current one at the time.
Figure 3, Azure Databricks Runtime: 9.1 LTS (Scala 2.12, Spark 3.1.2) cluster
Here is how my final Spark config ended up looking.
Final Spark config:
datanucleus.schema.autoCreateTables true
spark.hadoop.javax.jdo.option.ConnectionUserName benperk@csharpguitar
datanucleus.fixedDatastore false
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://csharpguitar.database.windows.net:1433;database=brainjammer
spark.hadoop.javax.jdo.option.ConnectionPassword *************
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
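Before pasting a config like the one above into the cluster UI, it can help to sanity-check it. This is just an illustrative Python sketch of my own (the key names are the ones from the config above, not an official schema); Databricks does not ship anything like this.

```python
# Illustrative sketch: parse Spark-config-style "key value" lines into a dict
# and confirm the JDBC keys an external Hive metastore needs are all present.
# REQUIRED_KEYS lists the key names used in the config above; this is not an
# official or exhaustive list.

REQUIRED_KEYS = [
    "spark.hadoop.javax.jdo.option.ConnectionURL",
    "spark.hadoop.javax.jdo.option.ConnectionUserName",
    "spark.hadoop.javax.jdo.option.ConnectionPassword",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName",
]

def parse_spark_conf(text: str) -> dict:
    """Split each non-empty line at the first space into (key, value)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf

def missing_keys(conf: dict) -> list:
    """Return the required JDBC keys that the parsed config lacks."""
    return [k for k in REQUIRED_KEYS if k not in conf]

sample = """\
datanucleus.schema.autoCreateTables true
spark.hadoop.javax.jdo.option.ConnectionUserName benperk@csharpguitar
datanucleus.fixedDatastore false
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://csharpguitar.database.windows.net:1433;database=brainjammer
spark.hadoop.javax.jdo.option.ConnectionPassword *************
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
"""

conf = parse_spark_conf(sample)
print(missing_keys(conf))  # → []
```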
Solution
Basically, the datanucleus.autoCreateSchema configuration name is no longer correct; the property was renamed. So I changed datanucleus.autoCreateSchema true to datanucleus.schema.autoCreateTables true.
Exceptions
- Caused by: MetaException(message:Version information not found in metastore.)
- Caused by: javax.jdo.JDODataStoreException: Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
All of these messages happened because I included these two entries in my Spark config:
- spark.sql.hive.metastore.version 3.1, also tried spark.sql.hive.metastore.version 2.3.7
- spark.sql.hive.metastore.jars builtin
Experienced Exceptions
- AnalysisException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
- Error in SQL statement: IllegalArgumentException: Builtin jars can only be used when hive execution version == hive metastore version. Execution: 2.3.7 != Metastore: 3.1. Specify a valid path to the correct hive jars using spark.sql.hive.metastore.jars or change spark.sql.hive.metastore.version to 2.3.7.
- Builtin jars can only be used when hive execution version == hive metastore version. Execution: 2.3.7 != Metastore: 0.13.0. Specify a valid path to the correct hive jars using spark.sql.hive.metastore.jars or change spark.sql.hive.metastore.version to 2.3.7.
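In my case, simply removing those two entries and relying on the cluster defaults resolved the mismatch. If you genuinely need a metastore version other than the builtin Hive 2.3.7, the usual alternative is to stop using builtin jars and have Spark resolve matching ones instead. A sketch of what that pair of entries could look like; the 3.1 version here is just an example, and for production Databricks recommends pointing spark.sql.hive.metastore.jars at pre-downloaded jars rather than maven:

```
spark.sql.hive.metastore.version 3.1
spark.sql.hive.metastore.jars maven
```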
I did find some information on Stack Overflow about adding these two lines to the Spark config, which was helpful; it turns out the property name has changed:
- datanucleus.autoCreateSchema true
- datanucleus.fixedDatastore false
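Since the fix boils down to a property rename, it can be expressed as a tiny migration helper. This is a hedged Python sketch of my own making (the function name is hypothetical, not anything Databricks or DataNucleus provides); the old and new key names are the ones discussed above.

```python
# Hypothetical helper: rename the deprecated DataNucleus property in a
# Spark-config dict, preserving its value. The old/new key names are the
# ones from the rename discussed above.
OLD_KEY = "datanucleus.autoCreateSchema"
NEW_KEY = "datanucleus.schema.autoCreateTables"

def migrate_datanucleus_conf(conf: dict) -> dict:
    """Return a copy of conf with the deprecated key renamed."""
    migrated = dict(conf)
    if OLD_KEY in migrated:
        migrated[NEW_KEY] = migrated.pop(OLD_KEY)
    return migrated

before = {"datanucleus.autoCreateSchema": "true", "datanucleus.fixedDatastore": "false"}
after = migrate_datanucleus_conf(before)
print(after)  # → {'datanucleus.fixedDatastore': 'false', 'datanucleus.schema.autoCreateTables': 'true'}
```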