Skip to content

Improve false positive rate on real-world packages#769

Merged
christophetd merged 4 commits into
v3from
christophe.tafanidereeper/v3-remove-further-false-positives
Jun 16, 2026
Merged

Improve false positive rate on real-world packages#769
christophetd merged 4 commits into
v3from
christophe.tafanidereeper/v3-remove-further-false-positives

Conversation

@christophetd

@christophetd christophetd commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

In this PR

Run Yara compilation tests in CI

We had a Makefile target but it wasn't running in CI, resulting in previously-silently failing Yara rule. Fixed in dc67dcb, which allowed to identify broken rules that we fixed in 092675f.

Rule improvements

Rule What changed Sample packages fixed
threat-runtime-obfuscation.yar $b64_1 requires an opening quote and a 150+ char run (long URLs / generated identifiers are ≤108); dropped the [A-Z]{10,} branch of $random_vars that matched ALLCAPS words (MERCHANTABILITY, INTERACTIVE) npm requests, mistralai, mypy-boto3-securityhub, pytensor
threat-filesystem-autostart.yar .profile now requires quoted-path context (no self.profile / .profile_manager); open(\bopen\s*\( so urlopen( no longer counts semgrep, dbt-snowflake, pytensor, testcontainers
lolbas-sysinfo.meta whoami / hostname only match inside an exec call (no dict keys, findall(), API paths) mistralai, botocore, censys
threat-network-exfiltration.yar $ip excludes loopback / RFC1918 ranges ($ip_internal carve-out); added source-only path_include; renamed rule to match file id mistralai (mcp.py), timing-asgi
threat-filesystem-read.yar added source-only path_include so METADATA / changelogs are not scanned botocore, python-dotenv
threat-network-outbound-shady-links.yar narrowed bare discord.com to discord.com/api/webhooks; added source-only path_include pytest

Evals

Iteration Commit Precision Recall F1 MCC
baseline (after #764) 6865b1e 88.8% 88.5% 88.7% 0.82
This PR (when Yara rules were silently broken) af1f25d 90.9 % (+2.4) 85.2 % (-3.5) 87.9% (-0.8) 0.81 (-0.01)
This PR in its final state 092675f 89.2 (+1.1 vs baseline) 88.3 % (+0.3 vs baseline) 88.8 % (+0.1 vs baseline) 0.82 (same as baseline)

I do think this makes the rules better and the hit on recall is worth the gain in accuracy in our context. In a follow-up PR I'll analyze the recall drop and see if we can solve it.

@christophetd christophetd marked this pull request as ready for review June 15, 2026 12:50
@christophetd christophetd requested a review from a team as a code owner June 15, 2026 12:50

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 497c7199d0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread guarddog/analyzer/sourcecode/threat-network-exfiltration.yar Outdated
@christophetd christophetd force-pushed the christophe.tafanidereeper/v3-remove-further-false-positives branch from 7ddd42b to fcc8a70 Compare June 15, 2026 14:39
YARA string identifiers are file-wide, so the previous
$ip and not $ip_internal condition suppressed the entire $ip branch
whenever any loopback/private URL appeared in a file, even when a real
public-IP C2 endpoint was also present. Bake the loopback/private range
exclusion (0/8, 10/8, 127/8, 169.254/16, 172.16-31, 192.168/16) into the
$ip regex so it applies per match. Add a regression test for a file that
mixes a loopback URL with a public-IP endpoint.
@christophetd christophetd force-pushed the christophe.tafanidereeper/v3-remove-further-false-positives branch from fcc8a70 to af1f25d Compare June 15, 2026 14:58
The YARA rule unit tests (tests/analyzer/sourcecode), including the per-rule
compilation test, were defined in the Makefile but never invoked by CI, so
rules that fail to compile (e.g. unsupported regex lookbehind) could merge
undetected. Add a test-yara-rules step to the checks workflow and include its
coverage in the combined report.
@christophetd christophetd force-pushed the christophe.tafanidereeper/v3-remove-further-false-positives branch from f5eb001 to dc67dcb Compare June 16, 2026 08:43
@DataDog DataDog deleted a comment from datadog-datadog-prod-us1 Bot Jun 16, 2026
Four rules (threat-process-download-exec, threat-runtime-obfuscation-base64exec,
threat-runtime-dynamic-loader, threat-runtime-obfuscation-steganography) used a
negative lookbehind (?<![.\w]) to match bare exec(/eval( builtins. YARA's regex
engine does not support lookbehind, so these rules failed to compile and never
fired (missing beautifulsup4, pylibutil, weecoder in the recall eval). Replace
with a consuming negative class [^.\w], which keeps the same intent (skip method
calls like model.eval()).

Add positive and benign regression tests for the four rules.
@christophetd christophetd force-pushed the christophe.tafanidereeper/v3-remove-further-false-positives branch from fce610a to 092675f Compare June 16, 2026 08:55
@christophetd christophetd enabled auto-merge (squash) June 16, 2026 11:04
// so dict keys (result['hostname']), XML lookups (findall("hostname"))
// and API paths (/workers/whoami) no longer match.
$whoami = /\b(popen|system|getoutput|getstatusoutput|check_output|check_call|Popen|run|call|spawn\w*|execl?p?e?|exec_command)\s*\(\s*\[?\s*['"]whoami\b/ nocase
$hostname = /\b(popen|system|getoutput|getstatusoutput|check_output|check_call|Popen|run|call|spawn\w*|execl?p?e?|exec_command)\s*\(\s*\[?\s*['"]hostname\b/ nocase

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea here was that hostname is a lolbas (like an actual OS command).
popen, check_call etc are python dependant, i can live with this, but goes against the spirit of the .meta rule

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Opened #773 (base v3) to address this: it adds language-agnostic invocation patterns (tool-specific flags like whoami /all / hostname -I, and absolute paths like /usr/bin/whoami) alongside the exec wrapper, matching the approach already used by lolbas-proc.meta and lolbas-net.meta. The exec-wrapper patterns stay so the bare no-arg invocation is still caught and the FP fixes from this PR are preserved.

@christophetd christophetd merged commit 9e8df80 into v3 Jun 16, 2026
5 checks passed
@christophetd christophetd deleted the christophe.tafanidereeper/v3-remove-further-false-positives branch June 16, 2026 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants