pdf-reader-mcp

A feature-rich MCP server that lets LLM (large language model) clients read and analyze PDF files.

Features

Tool	Description
`get_pdf_info`	Read document metadata, page count, file size, and encryption status
`read_pdf_as_text`	Extract text content from selected pages
`read_pdf_as_images`	Render selected pages as base64-encoded images
`get_pdf_outline`	Read bookmarks and outline (table of contents) structure
`search_pdf_text`	Search text and return per-page results with surrounding context
`extract_pdf_tables`	Extract structured tables when they can be detected
`extract_pdf_images`	Extract images embedded in the PDF
`get_pdf_page_info`	Inspect a single page's dimensions, text, images, and links
`extract_pdf_links`	Extract external URLs and internal page jumps
`get_pdf_annotations`	Read comments, highlights, and other annotation data
`get_pdf_text_stats`	Compute text, line, and paragraph statistics and the likelihood that the PDF is a scan
`compare_pdf_pages`	Compare the text similarity of two pages

Why this project

Many LLM workflows need more than raw text extraction. They also need structured information such as outlines, tables, images, annotations, and links.

This server provides a unified MCP interface for:

text-heavy PDFs
scanned or layout-sensitive PDFs
table and image extraction
metadata and structure inspection
annotation and link analysis

Installation

Prerequisites

Python 3.10+
uv or another Python environment manager

Install uv (macOS or Linux):

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows PowerShell:

irm https://astral.sh/uv/install.ps1 | iex

Local setup

uv sync

Run the server

uv run pdf-reader-mcp
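
As a quick check that the server starts and exposes its tools, a client can connect and list them. The following is a minimal sketch, not part of this repository: it assumes the server speaks MCP over stdio (which the `command`/`args` client configuration suggests) and that the `mcp` Python SDK is installed in the environment running the script. The helper names `launch_command` and `list_tool_names` are hypothetical.

```python
import asyncio
import sys


def launch_command(checkout_dir: str) -> tuple[str, list[str]]:
    """Command and arguments that start the server from a local checkout."""
    return "uv", ["--directory", checkout_dir, "run", "pdf-reader-mcp"]


async def list_tool_names(checkout_dir: str) -> list[str]:
    """Start the server as a subprocess and return the names of its tools.

    Assumption: the server uses the stdio transport.
    """
    # Imported lazily so launch_command() works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    command, args = launch_command(checkout_dir)
    params = StdioServerParameters(command=command, args=args)
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]


if __name__ == "__main__":
    # Usage: python list_tools.py /absolute/path/to/pdf-reader-mcp
    checkout = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in asyncio.run(list_tool_names(checkout)):
        print(name)
```

If everything is wired up, the printed names should match the tools listed under Features.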

Configure in an MCP client

Example configuration for a local checkout:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/pdf-reader-mcp",
        "run",
        "pdf-reader-mcp"
      ]
    }
  }
}

Replace /absolute/path/to/pdf-reader-mcp with the absolute path to your local checkout of the repository.
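
To avoid typing the path by hand, the same configuration can be generated from the checkout directory. This is an illustrative sketch using only the standard library; the function name `build_client_config` is hypothetical, and where the resulting JSON must be saved depends on your MCP client.

```python
import json
from pathlib import Path


def build_client_config(checkout_dir: str, server_name: str = "pdf-reader") -> dict:
    """Return an MCP client config entry for a local checkout."""
    # Resolve to an absolute path, because the example config expects one.
    repo = Path(checkout_dir).expanduser().resolve()
    return {
        "mcpServers": {
            server_name: {
                "command": "uv",
                "args": ["--directory", str(repo), "run", "pdf-reader-mcp"],
            }
        }
    }


if __name__ == "__main__":
    # Run from inside the checkout to print a ready-to-paste config.
    print(json.dumps(build_client_config("."), indent=2))
```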

Response size and large-PDF notes

read_pdf_as_images returns base64-encoded images, so response size grows quickly with page count and resolution.
Image rendering is limited to 20 pages per call.
read_pdf_as_text defaults to at most 50 pages and 200000 characters; output beyond either limit is truncated and a warning is attached.
read_pdf_as_images defaults to an overall payload cap of about 20MB; when the cap is reached it stops early and attaches a warning.
For scanned PDFs, prefer small page ranges, a lower dpi, jpeg output, and a lower quality setting.
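
The limits above can be planned for on the client side. The sketch below splits a page range into batches that respect the 20-page rendering limit and roughly estimates how many page images fit under the payload cap, using the fact that base64 encodes every 3 bytes as 4 characters. Several things here are assumptions rather than documented behavior: that pages are numbered from 1, that the cap is exactly 20 MiB, and that it is measured on the base64 text. The function names are hypothetical.

```python
import math

MAX_PAGES_PER_CALL = 20                 # rendering limit stated above
PAYLOAD_CAP_BYTES = 20 * 1024 * 1024    # "about 20MB"; exact value is an assumption


def page_batches(first_page: int, last_page: int,
                 batch_size: int = MAX_PAGES_PER_CALL) -> list[tuple[int, int]]:
    """Split an inclusive page range into consecutive (start, end) batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (start, min(start + batch_size - 1, last_page))
        for start in range(first_page, last_page + 1, batch_size)
    ]


def base64_size(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def pages_under_cap(avg_image_bytes: int,
                    cap_bytes: int = PAYLOAD_CAP_BYTES) -> int:
    """Rough number of page images whose base64 form fits under the cap."""
    if avg_image_bytes < 1:
        raise ValueError("avg_image_bytes must be at least 1")
    return cap_bytes // base64_size(avg_image_bytes)


if __name__ == "__main__":
    print(page_batches(1, 45))          # three batches covering pages 1 to 45
    print(pages_under_cap(1_500_000))   # pages of ~1.5 MB each under the cap
```

With an average rendered page of about 1.5 MB, only around ten pages fit under a 20 MiB cap, which is why lowering dpi or switching to jpeg matters more than the 20-page limit for scanned documents.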

Development

Install dev dependencies:

uv sync --extra dev

Run tests:

uv run pytest

Tech stack

Python 3.10+
MCP Python SDK
PyMuPDF

License

MIT
