
Budget scrape, parse, and upload#171

Open
TyHil wants to merge 19 commits into develop from budget-scrape

TyHil (Member) commented May 1, 2026

  • Scrape budgets from https://finance.utdallas.edu/for-others/public-reports/
    • Both the Annual Financial Statements and Annual Budget Reports
  • Parse with Gemini using a very complicated table schema and prompt
    • This costs about $0.50 for each year (because of the 200-page PDF), but it should only ever need to run once for each year
      • Unless UTD changes the PDF
    • Added stored copies of the budget PDFs in a new static-data folder, since UTD removed all budgets from before 2021 from the website while I was scraping them
      • Adding the useBackupBudgets flag pulls from this data to parse 2016-2020
      • Also moved the grade data into this folder
  • Upload without replacing existing data, so if only 2021-2026 are parsed, 2016-2020 won't be removed
  • Added this to the monthly GCP runner
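
One way to collect the report PDFs from the public-reports page is to take every link that ends in .pdf and resolve it against the page URL. This is a minimal sketch, assuming the links are present in the static HTML (not rendered by JavaScript); it uses a regexp for brevity, and `extractPDFLinks` is a hypothetical helper, not the scraper in this PR, which may use an HTML parser or a headless browser instead.

```go
package main

import (
	"net/url"
	"regexp"
	"strings"
)

// hrefPattern matches single- or double-quoted href attributes. A regexp is
// enough for a sketch; a real scraper would more likely use an HTML parser.
var hrefPattern = regexp.MustCompile(`(?i)href\s*=\s*["']([^"']+)["']`)

// extractPDFLinks (hypothetical) returns the absolute URL of every link on the
// page whose path ends in .pdf, resolved against pageURL and de-duplicated in
// the order the links first appear.
func extractPDFLinks(pageURL, html string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(html, -1) {
		ref, err := url.Parse(strings.TrimSpace(m[1]))
		if err != nil {
			continue // skip a malformed href instead of failing the whole page
		}
		abs := base.ResolveReference(ref) // handles relative and absolute hrefs
		if !strings.HasSuffix(strings.ToLower(abs.Path), ".pdf") {
			continue
		}
		if s := abs.String(); !seen[s] {
			seen[s] = true
			links = append(links, s)
		}
	}
	return links, nil
}
```

The resulting URLs could then be split into Annual Financial Statements and Annual Budget Reports by link text or file name, whichever the page makes reliable.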
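
Because everything downstream depends on the Gemini output, it helps to decode it strictly and sanity-check it before uploading. The sketch below assumes the prompt asks Gemini for JSON matching a schema (for example through its structured-output response schema); the `BudgetTable` and `BudgetRow` types, their fields, and the whole-dollar amounts are hypothetical simplifications of the much larger schema in this PR. Checking that the line items sum to the stated total is one cheap way to catch rows the model dropped or misread.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// BudgetRow is one hypothetical line item; amounts are assumed to be whole dollars.
type BudgetRow struct {
	Label  string `json:"label"`
	Amount int64  `json:"amount"`
}

// BudgetTable is a hypothetical, heavily simplified shape for one parsed table.
type BudgetTable struct {
	FiscalYear int         `json:"fiscal_year"`
	Title      string      `json:"title"`
	Rows       []BudgetRow `json:"rows"`
	Total      int64       `json:"total"`
}

// decodeBudgetTable strictly decodes one table of model output and checks
// that its line items add up to the total the PDF reports.
func decodeBudgetTable(raw []byte) (BudgetTable, error) {
	var table BudgetTable
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // reject keys the schema does not define
	if err := dec.Decode(&table); err != nil {
		return BudgetTable{}, fmt.Errorf("decode: %w", err)
	}
	var sum int64
	for _, row := range table.Rows {
		sum += row.Amount
	}
	if sum != table.Total {
		return BudgetTable{}, fmt.Errorf("%q (FY%d): rows sum to %d but total is %d",
			table.Title, table.FiscalYear, sum, table.Total)
	}
	return table, nil
}
```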
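
To keep the roughly $0.50 parse to once per year unless UTD changes the PDF, one option is to key parses on a content hash of the PDF. This is a minimal sketch, assuming some persistent store mapping fiscal year to the SHA-256 of the last PDF that was parsed; `pdfDigest`, `needsParse`, and that store are hypothetical and not part of this PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// pdfDigest returns the hex-encoded SHA-256 of the raw PDF bytes.
func pdfDigest(pdf []byte) string {
	sum := sha256.Sum256(pdf)
	return hex.EncodeToString(sum[:])
}

// needsParse reports whether a year's PDF should be sent to Gemini: only when
// that year was never parsed or the PDF's contents changed since the last parse.
// lastParsed is a hypothetical store of fiscal year -> digest of the parsed PDF.
func needsParse(year int, pdf []byte, lastParsed map[int]string) bool {
	prev, ok := lastParsed[year]
	return !ok || prev != pdfDigest(pdf)
}
```

With this check in place, the monthly run would re-scrape every year but only pay for a Gemini parse when a file actually changed.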
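
Uploading without replacing amounts to an upsert keyed by fiscal year: re-parsed years overwrite their stored copy and every other year is left alone. Here is an in-memory sketch of those semantics using a hypothetical `Budget` type; against a database it would correspond to per-year upserts rather than dropping and re-inserting the whole collection, assuming one document per fiscal year.

```go
package main

import "sort"

// Budget is a hypothetical stand-in for one parsed fiscal year.
type Budget struct {
	FiscalYear int
	Source     string
}

// mergeBudgets upserts freshly parsed years into the existing set: a year in
// parsed replaces the stored copy, and any stored year that was not re-parsed
// is kept rather than deleted. The result is sorted by fiscal year.
func mergeBudgets(existing, parsed []Budget) []Budget {
	byYear := make(map[int]Budget, len(existing)+len(parsed))
	for _, b := range existing {
		byYear[b.FiscalYear] = b
	}
	for _, b := range parsed {
		byYear[b.FiscalYear] = b // upsert: replace or insert
	}
	merged := make([]Budget, 0, len(byYear))
	for _, b := range byYear {
		merged = append(merged, b)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].FiscalYear < merged[j].FiscalYear })
	return merged
}
```

For example, merging a fresh 2025-2026 parse into a store holding 2016, 2020, and an old 2025 keeps 2016 and 2020, replaces 2025, and adds 2026.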

I've only tested with fiscal year 2025 so far, but Gemini has given me what looks like a perfect parse of it. Student Government is checking my work on that. I'll parse all the other years once I get approval on this PR and from them.

mikehquan19 (Contributor) left a comment


Alright, looking good! Just some moving functions around and naming stuff. Other than that, it should be ready to merge.

Comment thread parser/academicCalendarsParser.go
Comment thread parser/budgetsParser.go
Comment thread scrapers/academicCalendars.go
Comment thread parser/budgetsParser.go
Comment thread uploader/uploader.go
Comment thread utils/methods.go Outdated
TyHil requested a review from mikehquan19 on May 1, 2026, 22:12
