@datalab-to

Datalab

Developing state-of-the-art document intelligence models.

Pinned

  1. marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python 34.1k 2.4k

  2. surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python 19.6k 1.4k

  3. pdftext Public

    Extract structured text from PDFs quickly

    Python 683 68

  4. chandra Public

    OCR model that handles complex tables, forms, handwriting with full layout.

    Python 9.3k 976

Repositories

Showing 10 of 11 repositories
  • sdk Public
    Python 11 MIT 7 3 6 Updated Apr 20, 2026
  • marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python 34,107 GPL-3.0 2,358 337 66 Updated Apr 14, 2026
  • surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python 19,623 GPL-3.0 1,350 143 19 Updated Apr 10, 2026
  • chandra Public

    OCR model that handles complex tables, forms, handwriting with full layout.

    Python 9,279 Apache-2.0 976 30 7 Updated Apr 9, 2026
  • results Public
    HTML 2 0 0 0 Updated Apr 9, 2026
  • datalab-on-prem Public

    Scripts to run Datalab's self-service on-prem container

    Shell 7 1 0 0 Updated Feb 12, 2026
  • pykatex Public
    Python 2 0 0 0 Updated Feb 5, 2026
  • oss_container Public
    Python 1 1 0 0 Updated Oct 2, 2025
  • inference-mirror
    Python 4 1 0 1 Updated Aug 13, 2025
  • docext Public

    An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

    Python 11 Apache-2.0 7 0 0 Updated Jun 18, 2025
