Derive program-statistics entity from model metadata (refs #326) #334

Draft
vahid-ahmadi wants to merge 1 commit into main from refactor-program-specs
@vahid-ahmadi

Refs #326. Draft — see scope notes and overlap with #327 below.

Summary

Replaces the hard-coded programs = {...} dict in economic_impact_analysis (US) with a structured ProgramSpec list and a resolve_program_specs(specs, model_version) helper that:

  • Validates every variable name against the model up front, before any expensive simulation work.
  • Derives `entity` from the variable's own metadata — the entity string is no longer duplicated next to the variable name, so it cannot silently drift when policyengine-us moves a variable between entities.
  • Collects every unknown variable into a single `ValueError` with difflib suggestions, instead of failing one-at-a-time deep inside an Aggregate.run() next(...) lookup.
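
A minimal sketch of the behaviour described in the list above, not the code in this PR. The field names on `ProgramSpec` and `ResolvedProgram`, the `is_tax` flag, the de-duplication by variable name, and the assumption that `model_version` exposes a `variables` iterable of objects with `name` and `entity` attributes are all hypothetical. The entity values in `demo_model` are placeholders and do not reflect policyengine-us.

```python
import difflib
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class ProgramSpec:
    variable: str  # model variable name; note there is no entity field
    is_tax: bool = False  # hypothetical extra attribute


@dataclass(frozen=True)
class ResolvedProgram:
    variable: str
    entity: str  # filled in from the model's own metadata
    is_tax: bool


def resolve_program_specs(specs, model_version):
    """Validate all specs up front and attach each variable's entity."""
    # Assumption: model_version.variables yields objects with .name and .entity.
    by_name = {v.name: v for v in model_version.variables}
    resolved, problems, seen = [], [], set()
    for spec in specs:
        if spec.variable in seen:  # assumed de-duplication rule
            continue
        seen.add(spec.variable)
        var = by_name.get(spec.variable)
        if var is None:
            hints = difflib.get_close_matches(spec.variable, list(by_name), n=3)
            hint_text = f" (did you mean: {', '.join(hints)}?)" if hints else ""
            problems.append(f"  - {spec.variable}{hint_text}")
            continue
        resolved.append(ResolvedProgram(spec.variable, var.entity, spec.is_tax))
    if problems:
        # One error listing every unknown name, raised before any simulation runs.
        raise ValueError("Unknown program variables:\n" + "\n".join(problems))
    return resolved


# Stand-in for a model version; entity values here are illustrative only.
demo_model = SimpleNamespace(
    variables=[
        SimpleNamespace(name="employee_payroll_tax", entity="person"),
        SimpleNamespace(name="medicare_cost", entity="person"),
        SimpleNamespace(name="household_state_income_tax", entity="household"),
    ]
)
```

Because the spec carries only the variable name, a variable that moves between entities upstream changes `ResolvedProgram.entity` automatically, with no second string to update.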

This is a concrete first step toward the durable program-statistics mapping tracked in #326. Deriving the list itself from model metadata (e.g. scanning for variables tagged as program aggregates) is left for a follow-up.

Files

  • src/policyengine/tax_benefit_models/us/programs.py (new) — ProgramSpec, ResolvedProgram, US_PROGRAM_SPECS, resolve_program_specs().
  • src/policyengine/tax_benefit_models/us/analysis.py — uses the new helper.
  • tests/test_program_specs.py (new) — 4 unit tests covering entity derivation, multi-error collection, fuzzy suggestions, and de-duplication.
  • changelog.d/program-specs-metadata.changed.md.

Overlap with #327

This PR overlaps with @anth-volk's #327 (the immediate fix for the StopIteration crash in #325). The variable name corrections from #327 (payroll_tax -> employee_payroll_tax, medicare -> medicare_cost, state_income_tax -> household_state_income_tax) are applied here too so the new validation passes against the current US model.
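
For context, a small standalone sketch of the failure mode and of how a difflib lookup would surface these three renames. The `lookup` helper is a hypothetical stand-in for a bare `next(...)` search and is not the actual `Aggregate.run()` code. The list of known names contains only the three corrected variables from this PR.

```python
import difflib

# Only the corrected names mentioned in this PR; a real model has many more.
known = [
    "employee_payroll_tax",
    "medicare_cost",
    "household_state_income_tax",
]


def lookup(name):
    # A bare next() with no default raises StopIteration when nothing matches,
    # which gives the caller no hint about which name was wrong.
    return next(v for v in known if v == name)


def suggest(name):
    # get_close_matches uses a default similarity cutoff of 0.6.
    return difflib.get_close_matches(name, known, n=1)


old_names = ["payroll_tax", "medicare", "state_income_tax"]
suggestions = {old: suggest(old) for old in old_names}
```

With only these three candidates, each old name is a substring of its replacement, so each clears the default cutoff and maps to the corrected variable.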

Whichever lands first, the other needs a small rebase. Marked draft for that reason.

Test plan

  • pytest tests/test_program_specs.py
  • pytest tests/test_aggregate.py tests/test_change_aggregate.py still passes.
  • Manually run examples/us_budgetary_impact.py and confirm economic_impact_analysis produces the program-by-program table without the original StopIteration.

🤖 Generated with Claude Code

Refs #326

Replace the hard-coded `programs = {...}` dict in
`economic_impact_analysis` with a structured `ProgramSpec` list and a
`resolve_program_specs(specs, model_version)` helper that:

- Validates every variable name against the model up front
- Derives `entity` from the variable's own metadata (no more
  duplicated entity strings that drift silently when a variable
  moves between entities upstream)
- Collects every unknown variable into a single `ValueError` with
  difflib suggestions, instead of failing one-at-a-time deep inside
  an `Aggregate.run()` lookup

This is a concrete first step toward the durable program-statistics
mapping tracked in #326. It removes the entity-drift class of bug;
deriving the program list itself from model metadata (e.g. by
scanning for variables tagged as program aggregates) is left for a
follow-up.

Note: this PR overlaps with #327 (Anthony's fix for the immediate
StopIteration crash in #325). Whichever lands first, the other
will need a small rebase. The variable name corrections from #327
(payroll_tax -> employee_payroll_tax, medicare -> medicare_cost,
state_income_tax -> household_state_income_tax) are also applied
here so the new validation passes against the current US model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>