HIVE-14609: HS2 cannot drop a function whose associated jar file has been removed#6447

Open
abstractdog wants to merge 2 commits into apache:master from abstractdog:HIVE-14609

Conversation

@abstractdog
Contributor

@abstractdog abstractdog commented Apr 23, 2026

What changes were proposed in this pull request?

Add an extra metastore call in DropFunctionAnalyzer if the function info retrieval wasn't successful.
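The fallback logic can be sketched as follows. This is a simplified, self-contained illustration of the control flow, not the actual Hive code: `getFunctionInfo`, `functionExistsInMetastore`, and `canDrop` here are stand-ins for the real registry and metastore calls in `DropFunctionAnalyzer`.

```java
// Hypothetical sketch of the DropFunctionAnalyzer fallback: when the session
// registry cannot load the function's JAR, getFunctionInfo() returns null, so
// the analyzer double-checks the metastore before rejecting the DROP.
import java.util.Set;

public class DropFunctionFallbackSketch {
    // Stand-in for the metastore: names of permanent functions it knows about.
    static final Set<String> METASTORE_FUNCTIONS = Set.of("testdb.myfunc");

    // Stand-in for FunctionRegistry.getFunctionInfo(): simulate the case where
    // the function's JAR resource cannot be downloaded (the
    // FileNotFoundException is logged upstream and null is returned).
    static Object getFunctionInfo(String name) {
        return null;
    }

    // Stand-in for the extra metastore existence check added by the patch.
    static boolean functionExistsInMetastore(String name) {
        return METASTORE_FUNCTIONS.contains(name);
    }

    // Returns true if the DROP FUNCTION statement may proceed.
    static boolean canDrop(String name, boolean isTemporary) {
        if (getFunctionInfo(name) != null) {
            return true; // normal path: the registry resolved the function
        }
        // Fallback: for permanent functions, trust the metastore record even
        // though the JAR is gone, so the orphaned definition can be removed.
        return !isTemporary && functionExistsInMetastore(name);
    }

    public static void main(String[] args) {
        System.out.println(canDrop("testdb.myfunc", false)); // true: orphaned permanent fn
        System.out.println(canDrop("testdb.nosuch", false)); // false: unknown function
    }
}
```

Temporary functions are deliberately excluded from the fallback, since they exist only in the session registry and have no metastore record to fall back on.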

Why are the changes needed?

Because without the patch it is impossible to DROP a function whose associated jar has been deleted, which is extremely inconvenient.

Does this PR introduce any user-facing change?

Under normal circumstances, no.

How was this patch tested?

A unit test was added (testDropSucceedsWhenJarUnavailableButFunctionInMetastore). More importantly, local HS2 testing confirmed the behavior pre/post patch:

# one shell
mvn clean install -Dtest=StartMiniHS2Cluster -DminiHS2.clusterType=llap -DminiHS2.run=true -DminiHS2.usePortsFromConf=true -T 1C -Denforcer.skip=true -pl itests/hive-unit -Pitests -nsu -DminiHS2.isMetastoreRemote=true

# another shell

# parse HDFS address from the logs like this: "2026-04-23T06:21:12,204  INFO [main] shims.HadoopShimsSecure: Namenode null (ns=null) address: localhost/127.0.0.1:63222"
HDFS=hdfs://$(grep "Namenode.*address:" itests/hive-unit/target/surefire-reports/*.txt 2>/dev/null | grep -oP 'localhost/\K[^:]+:\d+' | head -1) && echo "HDFS: $HDFS"
JAR="./ql/target/hive-exec-4.3.0-SNAPSHOT.jar"

hdfs dfs -put -f $JAR $HDFS/tmp/test-udf.jar

beeline -u 'jdbc:hive2://localhost:10000/' -e "CREATE DATABASE IF NOT EXISTS testdb; CREATE FUNCTION testdb.myfunc AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFToString' USING JAR '$HDFS/tmp/test-udf.jar';"

hdfs dfs -rm $HDFS/tmp/test-udf.jar

beeline -u 'jdbc:hive2://localhost:10000/' -e "DROP FUNCTION testdb.myfunc;"

beeline -u 'jdbc:hive2://localhost:10000/' -e "SHOW FUNCTIONS LIKE 'testdb.myfunc';"

It is important to note that DROP FUNCTION appears to succeed both with and without the patch, and an exception appears in the HS2 logs in both cases (which is expected), but the function is only actually dropped with the patch.

pre-patch:

beeline -u 'jdbc:hive2://localhost:10000/' -e "SHOW FUNCTIONS LIKE 'testdb.myfunc';"


+----------------+
|    tab_name    |
+----------------+
| testdb.myfunc  |
+----------------+

post-patch:

beeline -u 'jdbc:hive2://localhost:10000/' -e "SHOW FUNCTIONS LIKE 'testdb.myfunc';"


+-----------+
| tab_name  |
+-----------+
+-----------+

in the log:

2026-04-23T06:57:47,726  WARN [206cd7f2-349d-4837-b992-151a856290e0 HiveServer2-Handler-Pool: Thread-911] drop.DropFunctionAnalyzer: Function testdb.myfunc has unavailable resources; proceeding with drop using metastore metadata only.

the exception appears in the log in both cases:

2026-04-23T06:55:40,376 ERROR [e3a4a121-cf9f-42da-84d3-78e7a3bcc501 HiveServer2-Handler-Pool: Thread-937] exec.FunctionRegistry: Unable to load resources for testdb.myfunc:java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://127.0.0.1:65436/tmp/test-udf.jar
java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://127.0.0.1:65436/tmp/test-udf.jar
	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1693)
	at org.apache.hadoop.hive.ql.exec.FunctionUtils.addFunctionResources(FunctionUtils.java:86)
	at org.apache.hadoop.hive.ql.exec.Registry.registerToSessionRegistry(Registry.java:690)
	at org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:667)
	at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:358)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:825)
	at org.apache.hadoop.hive.ql.ddl.function.drop.DropFunctionAnalyzer.analyzeInternal(DropFunctionAnalyzer.java:51)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:499)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:451)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:415)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:205)
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:268)
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:286)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:558)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:543)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at jdk.proxy2/jdk.proxy2.$Proxy133.executeStatementAsync(Unknown Source)
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:311)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:650)
	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1670)
	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1650)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://127.0.0.1:65436/tmp/test-udf.jar
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1842)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1835)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1850)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:431)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:370)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2657)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2626)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2602)
	at org.apache.hadoop.hive.ql.util.ResourceDownloader.downloadResource(ResourceDownloader.java:105)
	at org.apache.hadoop.hive.ql.util.ResourceDownloader.resolveAndDownloadInternal(ResourceDownloader.java:90)
	at org.apache.hadoop.hive.ql.util.ResourceDownloader.resolveAndDownload(ResourceDownloader.java:75)
	at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1703)
	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1658)
	... 39 more


super(queryState, db);
}

protected FunctionInfo getFunctionInfo(String functionName) throws SemanticException {
Member


why do we need this 1-liner function call?

// getFunctionInfo returns null when the function's JAR resource cannot be loaded (e.g. the
// HDFS file was deleted). For permanent functions fall back to a direct metastore lookup so
// that an orphaned definition can still be removed without the JAR being present.
if (!isTemporary && functionExistsInMetastore(functionName)) {
Member

@deniskuzZ deniskuzZ Apr 24, 2026


could we refactor the upper function to return all fns registered in HMS? not sure, but i think function registry should handle that

3 participants