A feature-rich MCP server that lets LLM (large language model) clients read and analyze PDF files.

| Tool | Description |
|---|---|
| `get_pdf_info` | Read document metadata, page count, size, and encryption status |
| `read_pdf_as_text` | Extract text content from selected pages |
| `read_pdf_as_images` | Render selected pages as base64-encoded images |
| `get_pdf_outline` | Read bookmarks and outline structure |
| `search_pdf_text` | Search text and return per-page results with context |
| `extract_pdf_tables` | Extract structured tables when detectable |
| `extract_pdf_images` | Extract embedded images from the PDF |
| `get_pdf_page_info` | Inspect a single page's dimensions, text, images, and links |
| `extract_pdf_links` | Extract external URLs and internal page jumps |
| `get_pdf_annotations` | Read comments, highlights, and annotation data |
| `get_pdf_text_stats` | Compute text, line, paragraph, and scan-likelihood statistics |
| `compare_pdf_pages` | Compare text similarity between two pages |

Many LLM workflows need more than raw text extraction. They also need structured information such as outlines, tables, images, annotations, and links.

This server provides a unified MCP interface for:

- text-heavy PDFs
- scanned or layout-sensitive PDFs
- table and image extraction
- metadata and structure inspection
- annotation and link analysis

Requirements:

- Python 3.10+
- `uv` or another Python environment manager

Install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Windows PowerShell:

```powershell
irm https://astral.sh/uv/install.ps1 | iex
```

Install dependencies:

```bash
uv sync
```

Start the server:

```bash
uv run pdf-reader-mcp
```

Example configuration for a local checkout:

```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/pdf-reader-mcp",
        "run",
        "pdf-reader-mcp"
      ]
    }
  }
}
```

Replace `/absolute/path/to/pdf-reader-mcp` with your local repository path.
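
To check that the command in the configuration works before wiring it into a client, one option is to launch the server from a short script and list the tools it advertises. The sketch below assumes the MCP Python SDK (`mcp` package) is installed in the environment running the script. `CHECKOUT`, `server_command`, and `list_tool_names` are illustrative names and not part of this project. Listing the tools also returns each tool's input schema, which is the safest way to learn the argument names rather than guessing them.

```python
import asyncio

# Hypothetical checkout location; replace with your local repository path.
CHECKOUT = "/absolute/path/to/pdf-reader-mcp"


def server_command(checkout: str) -> tuple[str, list[str]]:
    """Return the same command and arguments used in the client configuration."""
    return "uv", ["--directory", checkout, "run", "pdf-reader-mcp"]


async def list_tool_names(checkout: str) -> list[str]:
    """Start the server over stdio and return the names of the tools it advertises."""
    # Imported here so server_command() stays usable without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    command, args = server_command(checkout)
    params = StdioServerParameters(command=command, args=args)
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]


if __name__ == "__main__":
    print(asyncio.run(list_tool_names(CHECKOUT)))
```

If the server starts correctly, the printed list should contain the tool names from the table above.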

- `read_pdf_as_images` returns base64 image payloads, so response size can grow very quickly.
- Image rendering is still limited to 20 pages per call.
- `read_pdf_as_text` now defaults to at most 50 pages and 200000 characters; output beyond those limits is truncated and a warning is attached.
- `read_pdf_as_images` now defaults to an overall payload cap of about 20MB; rendering stops early with a warning once the cap is reached.
- For scanned PDFs, prefer smaller page ranges, a lower `dpi`, the `jpeg` format, and a lower `quality`.
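
The limits above suggest splitting long documents into several calls on the client side. The following is a minimal sketch, not part of this server: `page_batches` and `base64_length` are hypothetical helpers, and the sketch assumes 1-based, inclusive page ranges, which the tool schemas should confirm. It also shows why image payloads grow quickly: base64 encodes every 3 raw bytes as 4 characters.

```python
def page_batches(total_pages: int, batch_size: int = 20) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) ranges.

    Each range covers at most batch_size pages. The default of 20 matches
    the per-call image rendering limit; pass 50 for the text page limit.
    """
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def base64_length(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


if __name__ == "__main__":
    print(page_batches(45))           # image batches for a 45-page document
    print(page_batches(120, 50))      # text batches for a 120-page document
    print(base64_length(15_000_000))  # encoded size of 15,000,000 raw bytes
```

Whether the roughly 20MB cap is measured on raw or encoded bytes is not stated here, so treat the size estimate as a rough guide when choosing `dpi` and `quality`.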

Install dev dependencies:

```bash
uv sync --extra dev
```

Run tests:

```bash
uv run pytest
```

Built with:

- Python 3.10+
- MCP Python SDK
- PyMuPDF

License: MIT