2.1 backport logrecovery #6395

Open
amcdonaldccri wants to merge 3 commits into apache:2.1 from amcdonaldccri:2.1_backport_logrecovery

@amcdonaldccri

Hi,
I saw issue #4887 and thought I could help.

With the help of AI tools, I got the backport to pass the unit tests.

The following integration tests hung or timed out:
org.apache.accumulo.test.fate.zookeeper.FateIT never returned, but passed when rerun
org.apache.accumulo.test.tracing.ScanTracingIT timed out twice
org.apache.accumulo.test.functional.MetadataMaxFilesIT timed out twice
org.apache.accumulo.test.functional.TimeoutIT timed out twice
org.apache.accumulo.test.functional.TServerShutdownOptimizationsIT timed out twice
org.apache.accumulo.test.functional.KerberosIT timed out twice
org.apache.accumulo.test.shell.ShellServerIT never returned, but passed twice when rerun

[ERROR] Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 166.9 s <<< FAILURE! -- in org.apache.accumulo.test.functional.KerberosIT
[ERROR] org.apache.accumulo.test.functional.KerberosIT.testGetDelegationTokenDenied -- Time elapsed: 14.48 s <<< ERROR!
java.lang.IllegalStateException: org.apache.hadoop.security.KerberosAuthException: failure to login: using ticket cache file: FILE:/tmp/krb5cc_911602271_DwR0dF javax.security.auth.login.LoginException: java.lang.IllegalArgumentException: Illegal principal name ajmcdonald@CCRI.COM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to ajmcdonald@CCRI.COM

I would like to deploy this to one of our dev environments and do some ingest testing, but I haven't gotten to it yet.

andrew mcdonald added 3 commits May 26, 2026 06:19
…ata apache#4873

This commit makes two major changes. First, it changes log recovery to use block caches. Second, it checks whether a tablet has any data in its walogs before acquiring the recovery lock. Together, these two changes greatly speed up loading tablets that have no data in their walogs. The check introduces an extra opening of the walogs to decide whether the recovery lock needs to be acquired; using the block caches for this extra opening should avoid any additional cost. The block caches also help when many tablets that reference the same walogs are assigned to one tablet server.

In some simple tests, an 8x speedup in tablet load times was observed.

Any time a tablet goes through an unclean shutdown, the walogs of the dead tserver are assigned to it even if it had no data in those walogs. These changes make loading tablets in that situation much faster.
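The check-before-lock idea in this commit can be sketched in Java as follows. This is a minimal, hypothetical model, not the Accumulo code from apache#4873: `WalogCheckSketch`, `needsRecovery`, and `loadTablet` are illustrative names, walog contents are simulated by an in-memory map from walog path to tablet IDs, and a `ConcurrentHashMap` stands in for the block cache so that each walog is read at most once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of "check walogs for tablet data before taking the recovery lock".
class WalogCheckSketch {
  // Simulated walog contents: walog path -> IDs of tablets that have entries in it.
  private final Map<String, Set<String>> walogContents;
  // Stand-in for the block cache: each walog is read ("opened") at most once.
  private final Map<String, Set<String>> cache = new ConcurrentHashMap<>();
  final AtomicInteger opens = new AtomicInteger();
  final ReentrantLock recoveryLock = new ReentrantLock();
  int recoveries = 0;

  WalogCheckSketch(Map<String, Set<String>> walogContents) {
    this.walogContents = new HashMap<>(walogContents);
  }

  private Set<String> tabletsIn(String walog) {
    return cache.computeIfAbsent(walog, w -> {
      opens.incrementAndGet(); // counts real reads; cache hits do not increment
      return walogContents.getOrDefault(w, Set.of());
    });
  }

  // Cheap pre-check: does any walog contain data for this tablet?
  boolean needsRecovery(String tablet, List<String> walogs) {
    for (String walog : walogs) {
      if (tabletsIn(walog).contains(tablet)) {
        return true;
      }
    }
    return false;
  }

  void loadTablet(String tablet, List<String> walogs) {
    if (!needsRecovery(tablet, walogs)) {
      return; // no data in the walogs: skip the lock and recovery entirely
    }
    recoveryLock.lock();
    try {
      recoveries++; // a real implementation would replay the tablet's mutations here
    } finally {
      recoveryLock.unlock();
    }
  }
}
```

With two walogs where only `wal1` holds data for `t1`, loading `t1`, `t2`, and `t3` performs a single recovery under the lock and reads each walog only once; `t2` and `t3` never touch the lock.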

{"fundingSource": "41201", "team": "FED.ICGSA.OPS.MOE", "fshGit": "dummy-lo", "fshDocker": "sha256:20cf0045"}
In apache#4873 a check was added to inspect walogs during tablet load to see
if they had any data for the tablet. This check happens before the
volume replacement that also runs during tablet load. Therefore, if
volume replacement is needed for the walogs, this check fails
because it cannot find the files, and the tablet fails to load.

To fix this problem, the new check was modified to switch volumes, if
needed, before it runs.
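The ordering fix in this commit can be illustrated with a small sketch that applies volume replacement to a walog path before checking for the file. The class and method names, the prefix-based replacement map, and the `Set` standing in for the filesystem are all assumptions for illustration; Accumulo's actual volume replacement logic is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: apply volume replacement to a walog path *before* checking it.
class VolumeSwitchSketch {
  // Rewrites the path if it starts with a replaced volume prefix; otherwise returns it unchanged.
  static String switchVolume(String path, Map<String, String> replacements) {
    for (Map.Entry<String, String> e : replacements.entrySet()) {
      String oldVolume = e.getKey();
      if (path.startsWith(oldVolume + "/")) {
        return e.getValue() + path.substring(oldVolume.length());
      }
    }
    return path;
  }

  // The walog check, run against the switched path so relocated files are still found.
  static boolean walogExists(String walog, Map<String, String> replacements,
      Set<String> filesystem) {
    return filesystem.contains(switchVolume(walog, replacements));
  }
}
```

Checking the original path against the new volume fails, which mirrors the bug described above; checking the switched path succeeds.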

… log recovery. (apache#4874)

The log recovery code would list the sorted walog files multiple times
during recovery. These changes modify the code to list the files only
once. The listing is also cached for a short period of time to improve
the case where multiple tablets reference the same walogs. Together
with apache#4873, this should result in much less traffic to the namenode
when an entire Accumulo cluster shuts down and needs to recover.
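The list-once-and-cache behavior from this commit might look roughly like the following. This is a hypothetical, hand-rolled TTL cache (`SortedListingCache`, with an injectable clock for testing), not the cache implementation used in apache#4874; the `lister` function stands in for a namenode directory listing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical TTL cache for sorted walog listings; not the Accumulo implementation.
class SortedListingCache {
  private static final class Entry {
    final List<String> files;
    final long loadedAtMillis;

    Entry(List<String> files, long loadedAtMillis) {
      this.files = files;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Function<String, List<String>> lister; // stands in for a namenode listing call
  private final LongSupplier clockMillis;              // injectable clock for testing
  private final long ttlMillis;
  private final Map<String, Entry> cache = new HashMap<>();

  SortedListingCache(Function<String, List<String>> lister, LongSupplier clockMillis,
      long ttlMillis) {
    this.lister = lister;
    this.clockMillis = clockMillis;
    this.ttlMillis = ttlMillis;
  }

  // Lists and sorts the directory once, then serves the cached result until it expires.
  synchronized List<String> list(String dir) {
    long now = clockMillis.getAsLong();
    Entry e = cache.get(dir);
    if (e == null || now - e.loadedAtMillis >= ttlMillis) {
      List<String> files = new ArrayList<>(lister.apply(dir));
      Collections.sort(files);
      e = new Entry(Collections.unmodifiableList(files), now);
      cache.put(dir, e);
    }
    return e.files;
  }
}
```

Repeated lookups within the TTL are served from the cache, so only the first lookup, and the first one after expiry, reaches the lister. A short TTL keeps the listing fresh while still absorbing the burst of lookups from many tablets that share the same walogs.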

amcdonaldccri force-pushed the 2.1_backport_logrecovery branch from 909d002 to 8333a39 on May 26, 2026 11:21