Delay auto-tuning v2 by stachuman · Pull Request #2337 · meshcore-dev/MeshCore

stachuman · 2026-04-19T14:43:45Z

Improves ACK delivery and the number of ACKs received, aiming to ensure (probabilistically!) that if a DM is delivered, at least one ACK makes it back to the sender.

This is based on extensive simulations (over 100k runs, using a dedicated simulator that can hopefully be used for other purposes too). All details and the road to this PR can be traced in discussion #2053.

The proposal is based on KPrivitt's theoretical work and my simulations (all source data used for the simulations is available on my GitHub page).

--

There are some differences compared to the previous PR:

- Renamed the variable to `auto.tune.delays` for consistency.
- `set auto.tune.delays on/off` toggles the feature.
- Parameter recalculation is wired into `onAdvertRecv`: there is no periodic recalculation; instead, it runs automatically when an advert is received.
- `auto.tune.delays` is off by default.
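The lifecycle in this list (off by default, recalculation triggered from the advert-receive path rather than a timer, and, per the later commit e44d871, restoring defaults when switched off) can be sketched as follows. This is a minimal Python sketch, not the PR's C++ code: `DelayPrefs`, `AutoTuner`, `on_advert_recv`, `set_auto_tune`, the `tune` callback and the default factor values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical default factors, for illustration only (not MeshCore's real values).
DEFAULT_TX_DELAY = 0.5
DEFAULT_DIRECT_TX_DELAY = 0.3


@dataclass
class DelayPrefs:
    tx_delay: float = DEFAULT_TX_DELAY
    direct_tx_delay: float = DEFAULT_DIRECT_TX_DELAY


class AutoTuner:
    """Hypothetical sketch of the auto.tune.delays lifecycle; not the PR's code."""

    def __init__(self, prefs: DelayPrefs, tune: Callable[[int], tuple[float, float]]) -> None:
        self.prefs = prefs
        self.tune = tune              # maps neighbor count -> (tx, direct) factors
        self.enabled = False          # auto.tune.delays is off by default
        self.neighbors: set[str] = set()

    def set_auto_tune(self, on: bool) -> None:
        self.enabled = on
        if on:
            self._recalc()
        else:
            # Switching off restores the defaults instead of keeping tuned values.
            self.prefs.tx_delay = DEFAULT_TX_DELAY
            self.prefs.direct_tx_delay = DEFAULT_DIRECT_TX_DELAY

    def on_advert_recv(self, node_id: str) -> None:
        # Event-driven: recalculate when an advert arrives instead of on a timer.
        self.neighbors.add(node_id)
        if self.enabled:
            self._recalc()

    def _recalc(self) -> None:
        self.prefs.tx_delay, self.prefs.direct_tx_delay = self.tune(len(self.neighbors))
```

Passing the neighbor-count-to-factor mapping in as a callback keeps the on/off and recalculation logic separate from the still-open question of what the table entries should be.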

Measured performance at defaults (same 4 topologies, 6 random seeds each):

| Density | Msg Delivery | Channel Delivery | ACK Delivery | Avg ACK Copies |
|---|---|---|---|---|
| sparse | 67.0% | 57.5% | 36.0% | 0.60 |
| medium | 62.7% | 78.8% | 33.7% | 0.70 |
| dense | 65.0% | 64.5% | 24.3% | 0.50 |
| very_dense | 64.3% | 43.0% | 13.0% | 0.20 |
| Mean | 64.8% | 60.9% | 26.8% | 0.50 |

Proposed changes: auto-tuned tx and direct tx delays (Δ vs. defaults, pp = percentage points):

| Density | Msg Delivery Δ | ACK Delivery Δ | ACK Copies Δ |
|---|---|---|---|
| sparse | −2.7pp | +11.0pp | +0.80 |
| medium | −6.3pp | +1.6pp | +1.00 |
| dense | −6.0pp | +13.7pp | +1.30 |
| very_dense | −9.3pp | +17.0pp | +1.40 |
| Mean | −6.1pp | +10.9pp | +1.13 |
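To read the two tables together, the absolute values under the proposal can be recovered by adding each Δ to the corresponding default. A small Python sketch, assuming the Δ values are measured against the defaults table (same topologies and seeds); the names `BASELINE`, `DELTA` and `proposed` are illustrative only:

```python
# Values copied from the two tables:
# (msg delivery %, ACK delivery %, avg ACK copies)
BASELINE = {
    "sparse": (67.0, 36.0, 0.60),
    "medium": (62.7, 33.7, 0.70),
    "dense": (65.0, 24.3, 0.50),
    "very_dense": (64.3, 13.0, 0.20),
}
DELTA = {
    "sparse": (-2.7, 11.0, 0.80),
    "medium": (-6.3, 1.6, 1.00),
    "dense": (-6.0, 13.7, 1.30),
    "very_dense": (-9.3, 17.0, 1.40),
}


def proposed(density: str) -> tuple[float, float, float]:
    """Absolute values with auto-tune = default + delta (assumes a shared baseline)."""
    (msg, ack, copies), (d_msg, d_ack, d_copies) = BASELINE[density], DELTA[density]
    return round(msg + d_msg, 1), round(ack + d_ack, 1), round(copies + d_copies, 2)


for density in BASELINE:
    print(density, proposed(density))
```

For example, very_dense goes from 13.0% to 30.0% ACK delivery and from 0.20 to 1.60 ACK copies, while message delivery drops from 64.3% to 55.0%.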


KPrivitt · 2026-04-19T23:02:46Z

In the prior PR the neighbor count was recalculated every 5 minutes. This is far too frequent and consumes compute resources that could be used elsewhere.

The SNR of a received Advert can vary by several dB from message to message (pings can change on every one sent), and this can affect the neighbor count for repeaters close to the 0 dB SNR threshold; but the number of surrounding repeaters actually changes very slowly. I believe the count should be done daily, twice a week or once a week.
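One way to keep the count stable despite per-advert SNR swings is to smooth SNR per neighbor and count only neighbors that were heard recently and sit above the 0 dB threshold. The sketch below is hypothetical: the exponential moving average, `ALPHA`, the one-week window and all names are assumptions for illustration, not what the PR implements.

```python
from dataclasses import dataclass

SNR_THRESHOLD_DB = 0.0        # threshold mentioned in the discussion
ALPHA = 0.25                  # hypothetical EWMA weight given to each new advert
WINDOW_S = 7 * 24 * 3600      # hypothetical: ignore neighbors not heard for a week


@dataclass
class Neighbor:
    snr_db: float             # smoothed SNR
    last_heard_s: float


class NeighborTable:
    def __init__(self) -> None:
        self._table: dict[str, Neighbor] = {}

    def on_advert(self, node_id: str, snr_db: float, now_s: float) -> None:
        entry = self._table.get(node_id)
        if entry is None:
            self._table[node_id] = Neighbor(snr_db, now_s)
        else:
            # Smooth SNR so one noisy advert does not flip the neighbor in or out.
            entry.snr_db = ALPHA * snr_db + (1 - ALPHA) * entry.snr_db
            entry.last_heard_s = now_s

    def count(self, now_s: float) -> int:
        return sum(
            1
            for n in self._table.values()
            if n.snr_db >= SNR_THRESHOLD_DB and now_s - n.last_heard_s <= WINDOW_S
        )
```

Calling `count` once a day (or once a week) rather than on every advert would match the slow rate at which the real set of surrounding repeaters changes.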

1nerdherder · 2026-04-20T01:29:27Z

I've been watching this work across the two pull request conversations. This was a heavy lift, deserving of strong consideration amongst the devs. At a minimum, the existing defaults are not optimal. The power of the autotune algorithm approach is that it makes all repeaters "good neighbors" who will adapt their settings in harmony as the mesh evolves.

stachuman · 2026-04-20T06:20:14Z

> In the prior PR the frequency of the neighbor count was every 5 min. This is far too frequent and is consuming compute resources that can be utilized elsewhere.
>
> While the SNR of a received Advert can vary several dB from message to message (pings can change on every one sent) and this can affect the neighbor count (for repeaters close to the 0dB SNR threshold), but the surrounding number of repeaters actually changes very slowly. I believe the count should be done daily, twice a week or once a week.

Correct. On one hand, recalculating every couple of minutes is not a big burden; still, for the sake of clean code I have moved the recalculation into the advert-receive code.

terminalvelocity23 · 2026-04-21T10:05:43Z

Hello, I've tested your PR in our mesh, which is very dense and there's a lot of in-band noise. The algorithm has set the delays so high that the repeater effectively stopped functioning.
Also, it didn't return the delays to their original values after disabling.

stachuman · 2026-04-21T13:10:06Z

> Hello, I've tested your PR in our mesh, which is very dense and there's a lot of in-band noise. The algorithm has set the delays so high that the repeater effectively stopped functioning. Also, it didn't return the delays to their original values after disabling.

--
Can you please elaborate on 'stopped functioning'? Was it one repeater running the changed firmware, or several, forming a wider network? Also, what exactly do you mean by 'effectively stopped functioning'? The delays are there to limit the number of collisions, but transmission still happens.

The last point is very valid; let me update that.

terminalvelocity23 · 2026-04-21T14:11:58Z

> Can you please elaborate on 'stopped functioning'? Was it one repeater with changed firmware or more? Creating wider network? Also - what do you mean - 'effectively stopped functioning'? There are delays - to limit number of collisions, but transmission is done.

It was a single repeater, used to test this feature. The delays were set so high that it effectively stopped relaying packets; everything except its admin interface was handled by the other repeaters around it. It stopped showing up in outbound and inbound paths.

stachuman · 2026-04-21T16:13:40Z

> > Can you please elaborate on 'stopped functioning'? Was it one repeater with changed firmware or more? Creating wider network? Also - what do you mean - 'effectively stopped functioning'? There are delays - to limit number of collisions, but transmission is done.
>
> It was one repeater to test this feature. The delays were set so high it effectively stopped relaying packets, everything but its admin interface was handled by other repeaters around. It stopped showing up in outbound and inbound paths.

In fact, what you observed is not a bad thing; it is the desired effect. The purpose of a mesh network is NOT to ensure that every repeater transmits, but to ensure the effectiveness of the network as a whole.
To be precise: in a dense network it is NOT recommended that 'all repeaters' transmit within the same time window, as this only increases the probability of collisions, i.e. failed transmissions.

Without going into too much detail: suppose all the other repeaters in the area carried the transmission while yours stayed silent. If all that other traffic then failed due to collisions, your repeater would still retransmit after its delay, giving the message another chance to be delivered. Conversely, if the other traffic got through, your repeater would not even be needed (in effect it reduces the density of the network, which is a desirable thing).

And this is the purpose of the PR: to increase the overall probability of delivery (ACK), NOT to increase the number of transmissions of any single repeater. The effectiveness here does not come from how quickly a retransmission is done, but from the probability of avoiding collisions with other repeaters.

I hope my explanation is clear.

stachuman · 2026-04-21T17:16:59Z

Here are simulated results for two rollout scenarios: (1) auto-delay is enabled on the busiest repeaters first, in an organized way; (2) auto-delay is enabled on randomly chosen repeaters.
(0% means all repeaters run the default firmware; 30% means 30% of repeaters run in auto-delay optimization mode.)
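The two rollout strategies amount to a node-selection step over the simulated topology of 143 repeaters. A hypothetical Python sketch: the function name and the `degree` map (node id to neighbor count) are assumptions, and rounding with Python's `round` is chosen only because it reproduces the N nodes column (14, 43, 72, 107); it is not taken from the simulator's code.

```python
import random


def nodes_to_upgrade(degree: dict[str, int], fraction: float,
                     strategy: str, seed: int = 0) -> set[str]:
    """Pick which repeaters run auto-delay.

    degree:   node id -> neighbor count in the simulated topology
    fraction: share of repeaters to upgrade (0.0 .. 1.0)
    """
    n = round(fraction * len(degree))
    if strategy == "degree":
        # Busiest (highest-degree) repeaters first.
        ranked = sorted(degree, key=degree.__getitem__, reverse=True)
        return set(ranked[:n])
    if strategy == "random":
        # Uncoordinated rollout: a seeded random subset of the same size.
        return set(random.Random(seed).sample(sorted(degree), n))
    raise ValueError(f"unknown strategy: {strategy}")
```

Both strategies upgrade the same number of nodes at each level, so differences between the two tables come only from which nodes are chosen.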

Degree Strategy (upgrade busiest nodes first)

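| % Optimized | N nodes | Delivery | std | ACK | Channel | F_del | P_del | F_ack | P_ack | col/lost | ack/del |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 0 | 61.7% | 7.1 | 22% | 62% | 58% | 48% | 15% | 44% | 32.8 | 0.4 |
| 10% | 14 | 62.0% | 3.3 | 23% | 73% | 59% | 43% | 17% | 38% | 42.3 | 0.5 |
| 30% | 43 | 57.7% | 3.4 | 25% | 79% | 44% | 36% | 18% | 36% | 22.8 | 0.7 |
| 50% | 72 | 55.0% | 9.9 | 30% | 75% | 34% | 32% | 29% | 34% | 32.1 | 0.8 |
| 75% | 107 | 52.3% | 3.9 | 30% | 84% | 25% | 24% | 29% | 31% | 31.8 | 1.1 |
| 100% | 143 | 50.7% | 6.5 | 27% | 72% | 21% | 22% | 28% | 26% | 43.3 | 1.3 |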

Random Strategy (uncoordinated rollout: random repeaters use auto-delay)

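| % Optimized | N nodes | Delivery | std | ACK | Channel | F_del | P_del | F_ack | P_ack | col/lost | ack/del |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 0 | 61.7% | 7.1 | 22% | 62% | 58% | 48% | 15% | 44% | 32.8 | 0.4 |
| 10% | 14 | 58.3% | 8.0 | 27% | 74% | 51% | 46% | 21% | 38% | 34.3 | 0.6 |
| 30% | 43 | 56.0% | 4.2 | 29% | 80% | 43% | 33% | 25% | 35% | 38.4 | 0.8 |
| 50% | 72 | 51.3% | 7.8 | 22% | 79% | 42% | 23% | 23% | 21% | 26.7 | 0.8 |
| 75% | 107 | 46.3% | 5.7 | 26% | 79% | 30% | 22% | 30% | 26% | 33.9 | 1.2 |
| 100% | 143 | 50.7% | 6.5 | 27% | 72% | 21% | 22% | 28% | 26% | 43.3 | 1.3 |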

Radio Efficiency (collision-free RX ratio)

| % Optimized | Degree radio_eff | Degree ackpath_eff | Random radio_eff | Random ackpath_eff |
|---|---|---|---|---|
| 0% | 62.1% | 59.2% | 62.1% | 59.2% |
| 10% | 65.3% | 63.6% | 65.1% | 62.9% |
| 30% | 71.4% | 69.9% | 69.5% | 68.1% |
| 50% | 74.1% | 72.2% | 73.6% | 71.9% |
| 75% | 77.1% | 77.1% | 77.2% | 76.8% |
| 100% | 76.8% | 76.5% | 76.8% | 76.5% |

1. Mixed firmware outperforms full optimization on ACK delivery

The best ACK rate (30%) occurs at 50-75% optimized under the degree strategy (scenario 1, upgrading the busiest repeaters first), not at 100% (27%).
My interpretation:

- Optimized nodes reduce collision pressure in dense clusters.
- Default nodes relay faster, creating alternative paths and timing diversity.
- The combination produces more successful ACK round-trips than either firmware alone.

Note: at around 75%, ack/del reaches about 1 (1.1), i.e. roughly one ACK delivered per delivered message, which is another reason to see it as a sweet-spot setting.

2. Channel (broadcast) delivery peaks at degree 75%

Channel delivery reaches 84% at degree 75%: a +22pp improvement over the 0% baseline (62%) and +12pp over full optimization (72%).

3. Degree strategy is consistently better than random on message delivery

| Metric (at 75%) | Degree | Random | Delta |
|---|---|---|---|
| Delivery | 52.3% | 46.3% | +6.0pp |
| ACK | 30% | 26% | +4pp |
| Channel | 84% | 79% | +5pp |
| Std deviation | 3.9% | 5.7% | more stable |

1nerdherder · 2026-04-21T19:02:22Z

Sorry for the late comment:
The earlier analysis showed the existing defaults to be flawed and actually making things worse, so should we not be selecting a new “default” rather than reusing the known defective one?

terminalvelocity23 · 2026-04-22T14:53:12Z

I'd say it's still too aggressive. I've switched auto-tuning on for two of my repeaters, pointed to the north and south of the high-rise I'm in, and dropped TX power on the companion so that nobody but them will hear it.
After a few hours the delays settled at 12.8 for flood and at 38 and 40 for direct. I mean, yeah, it helps with collision avoidance, but the maximum message length is 150 bytes, roughly halved if you use a non-English alphabet, minus whatever bytes your name takes up, so conveying any complex thought requires a few messages in succession. A delay of up to 40 seconds breaks that sequence.

stachuman · 2026-04-22T17:34:20Z

> I'd say it's still too agressive. I've switched auto-tuning on two my repeaters pointed no the north and south of the high-rise I'm in, and dropped tx power on the companion, so nobody but them will hear it. After a few hours the delays have settled at 12.8 for flood and 38 and 40 for direct. I mean yeah, it makes for collision avoidance, but considering the fact that the max message length is 150/2 minus whatever bytes your name requires if you use non-English alphabet, conveying any complex thought requires a few messages in succession. And the delay of up to 40 seconds breaks the sequence.

Well... everything here is based on probability, not on feelings. However, I admit that a scenario where a sequence of messages is sent was not tested.
Would you like to propose a scenario?

terminalvelocity23 · 2026-04-22T23:04:04Z

@stachuman Idk about a scenario, but capping the delays at maybe 20 s max isn't a bad idea.

KPrivitt · 2026-04-25T00:14:06Z

stachuman,
Thank you for your tremendous effort. Your results validate my original supposition that as density increases, the need for additional backoff increases, and that density (which is easy to measure using a neighbor count) is a valid metric for adjusting and setting the amount of backoff.

From a theoretical point of view this is easy to see: the probability of a collision drops as the number of slots increases; however, an increase in the number of neighbors counteracts that benefit. Essentially, if the number of neighbors doubles, the number of slots needs to double.
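The slot argument can be made concrete with an idealized model: `contenders` repeaters each pick one of `slots` equally likely retransmit slots, and a given repeater collides if any other repeater picks the same slot, so $P = 1 - (1 - 1/S)^{N-1} \approx 1 - e^{-(N-1)/S}$. This is a textbook simplification, not the simulator's model: it ignores capture effect, carrier sense and partial overlaps, and treating the random delay window as discrete airtime slots is an assumption.

```python
import math


def p_collision(contenders: int, slots: int) -> float:
    """Chance that a given repeater's slot is also chosen by at least one of the
    other (contenders - 1) repeaters, each picking uniformly among `slots`."""
    return 1 - (1 - 1 / slots) ** (contenders - 1)


def p_collision_approx(contenders: int, slots: int) -> float:
    """Large-slot approximation: depends only on the ratio (contenders - 1) / slots."""
    return 1 - math.exp(-(contenders - 1) / slots)


for n, s in [(4, 16), (8, 16), (8, 32)]:
    print(n, s, round(p_collision(n, s), 3), round(p_collision_approx(n, s), 3))
```

With 4 contenders and 16 slots the collision probability is about 0.18; doubling the contenders to 8 raises it to about 0.36, and doubling the slots to 32 brings it back to about 0.20. Because the approximation depends only on the ratio of contenders to slots, keeping that ratio fixed is exactly the "double the neighbors, double the slots" rule.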

This does lead to high delay values in a very dense mesh. But remember that in my comments on "success" I mentioned that as backoff increases it introduces delay, and once that delay becomes humanly noticeable (40 seconds definitely exceeds that threshold) it needs to be capped. A "balanced" or throttled setting is needed.

One thing is clear: automatic tuning does improve mesh performance, and the current defaults are bad settings in that they provide insufficient backoff. The mesh works better when collisions are reduced, and backoff reduces collisions.

The question and discussion at hand is what should the table entries be for each neighbor count?

The discussion also needs to be about what the "success criteria" should be: what metric should be used? Just one? A combination? How should balancing be done? This will likely be contentious and draw many different and sometimes opposing views. But the discussion is GOOD, and all views should be entertained; that usually generates a better result. I hope the dev experts will participate.

So, given the data we have: can we start with a conservative table to enable automatic tuning and finalize the table later? Let's take the low-hanging fruit that is right in front of us.

One note regarding comments that this is a centralized approach: it is not. It is decentralized, since each repeater can have its own setting. The feature can be disabled, and any repeater owner can set the values they choose. It does not force any particular setting; it allows choice and optimization.

The immediate value here is getting the defaults changed to better settings, so that the majority of repeaters no longer run with insufficient default values. We need to be good neighbors to each other: even one repeater setting better values will help its neighbors, but if the neighbors continue to stomp on you... well, it's best if we are all good neighbors. Plus, with the ability to adjust based on density, "the mesh" can adapt in the future.

Optimization of the optimizer can come later.

My opinion, for what it is worth.

Last item: do we know why rxdelay does not affect the simulation results? It should...

It can easily be added to the table as another column, but with what settings? Could it be added and left at all 0s (the current default) until we have a better table and understand what is going on and what the "optimum" settings are? rxdelay is a secondary benefit, but it is a tool in our chest, so why not use it?
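A conservative table along these lines could be generated rather than hand-written: factors scale in proportion to neighbor count (the "double the neighbors, double the slots" rule), are capped so delays stay humanly acceptable, and carry an rxdelay column left at 0. Every number and name here (`BASE_*`, `REF_NEIGHBORS`, `MAX_*`, `DelayRow`, `row_for`) is a hypothetical placeholder, not a value from the simulations or the PR. The caps apply to the delay factors; how a factor maps to seconds depends on the firmware's delay formula, which is not covered here.

```python
from dataclasses import dataclass

# Hypothetical placeholders only; NOT values from the simulations or the PR.
BASE_TX, BASE_DIRECT = 0.5, 0.3   # factors at or below the reference density
REF_NEIGHBORS = 4                 # neighbor count at which the base factors apply
MAX_TX, MAX_DIRECT = 2.0, 1.5     # caps on the factors to bound human-visible delay


@dataclass(frozen=True)
class DelayRow:
    tx_delay: float
    direct_tx_delay: float
    rx_delay: float = 0.0         # extra column, left at 0 (the current default)


def row_for(neighbors: int) -> DelayRow:
    # Proportional rule: doubling neighbors doubles the factor, until the cap.
    scale = max(1.0, neighbors / REF_NEIGHBORS)
    return DelayRow(
        tx_delay=min(MAX_TX, BASE_TX * scale),
        direct_tx_delay=min(MAX_DIRECT, BASE_DIRECT * scale),
    )


# Precomputed lookup table for 0..20 neighbors.
TABLE = {n: row_for(n) for n in range(21)}
```

With these placeholder numbers the tx cap engages at 16 neighbors; the open question in this thread is choosing the base factors, the reference density and the caps.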

liamcottle and others added 15 commits March 24, 2026 15:38

- update docs cname (2f6046d)
- Merge pull request meshcore-dev#2135 from liamcottle/docs/update (31007d9): Update Docs URL
- Update RAK 4631 entry in FAQ on new bootloader (7268d7d): Removed "see note" from RAK 4631 entry in FAQ.
- Fix TOC insertion by Markdown All in One VS Code extension (ea6ec53): Fixed an extra TOC jump link inserted by VSCode Markdown All in One VS Code extension.
- update docs logo (3096323)
- fixed typos and refined multibyte sections. (633db08)
- Merge pull request meshcore-dev#2172 from LitBomb/patch-24 (73fc967): add multibyte FAQ, reference awesome-meshcore community projects, minor changes
- update neighbor.remove docs (a9b55f5)
- update readme links (8ede764)
- update faq (9ec0822)
- Merge pull request meshcore-dev#2176 from jschrempp/patch-1 (bfd4800): Update RAK 4631 entry in FAQ on new bootloader - removed "see note"
- Merge branch 'dev' (dee3e26): # Conflicts: # docs/faq.md
- Improved proposal for autotune capability for tx and direct tx delays - improving ACK delivery and number of ACK received - theoretically ensuring that if DM is delivered - at least one ACK is received by sender. (b36cae2)
- Comment fix (8e47460)

stachuman mentioned this pull request on Apr 19, 2026: Autotune of delays based on number of neighbors #2125 (Closed)

- On auto-tune off - restore delays to defaults (e44d871)

8 participants
