Go-Docs MCP
Install and Go — your AI reads any document
The Problem
Most document MCP servers need a Node.js or Python runtime. They typically handle a single format, offer no OCR, and provide limited table or image extraction at best.
The Solution
go install and you’re done. One binary, 12 tools. PDF, DOCX, Markdown, images, OCR — no runtime, no config.
Why go-docs-mcp?
The MCP ecosystem is drowning in Node and Python. A Go option stands out simply by being different. People who run lean infrastructure — self-hosters, DevOps, terminal users — actively prefer compiled binaries over interpreted runtimes. If you already have Go, this is the fastest path from zero to document-reading AI.
| | Node / TS MCPs | Python MCPs | go-docs-mcp |
|---|---|---|---|
| ⚡ | Requires Node.js | Requires Python + pip | Single binary, no runtime |
| 📄 | Single format only | Single format only | PDF + TXT + MD + DOCX + CSV + images |
| 👁 | No OCR | No OCR | OCR for scanned PDFs + images |
| 📊 | Limited tables | Basic tables | Tables + outline + images + caching |
| 🔒 | Security varies | Security varies | Read-only, directory-locked |
Features
Read Any Document
Extract full text from PDF, TXT, MD, CSV, DOCX, and images (PNG/JPG/TIFF) with page-level granularity. Your AI reads any document format from a single server.
Search
Full-text search within documents with contextual results. Find exactly what you need across hundreds of pages.
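As an illustration of what "contextual results" can mean, here is a minimal sketch of a case-insensitive search over per-page text that returns each hit with a window of surrounding characters. It is not taken from the go-docs-mcp source: `Match`, `searchPages`, and the `radius` parameter are hypothetical names, and the byte-offset handling assumes mostly ASCII text.

```go
package main

import (
	"fmt"
	"strings"
)

// Match is a hypothetical result type: one hit plus its surrounding text.
type Match struct {
	Page    int    // 1-based page number
	Context string // the hit with up to radius bytes on each side
}

// searchPages runs a case-insensitive substring search over per-page text.
// Assumption: lowercasing does not change byte offsets (true for ASCII),
// and context windows are cut at byte positions, not rune boundaries.
func searchPages(pages []string, query string, radius int) []Match {
	var out []Match
	if query == "" {
		return out
	}
	q := strings.ToLower(query)
	for i, page := range pages {
		lower := strings.ToLower(page)
		from := 0
		for {
			idx := strings.Index(lower[from:], q)
			if idx < 0 {
				break
			}
			start := from + idx
			end := start + len(q)
			lo := start - radius
			if lo < 0 {
				lo = 0
			}
			hi := end + radius
			if hi > len(page) {
				hi = len(page)
			}
			out = append(out, Match{Page: i + 1, Context: page[lo:hi]})
			from = end // continue after this hit
		}
	}
	return out
}

func main() {
	pages := []string{"alpha beta gamma", "no hit here", "Beta again"}
	for _, m := range searchPages(pages, "beta", 3) {
		fmt.Printf("page %d: %q\n", m.Page, m.Context)
	}
}
```

Returning the page number with each snippet lets a client follow up with a page-level read instead of loading the whole document.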
Image Extraction
Extract embedded images from document pages. Diagrams, charts, photos — pulled out and ready for analysis.
URL Fetch
Fetch and read documents from URLs. Your AI grabs a document from the web, caches it locally, and reads it like any local file.
Security
All processing happens locally. No documents leave your machine. No cloud APIs, no data exfiltration risk.
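One common way to keep a server locked to a single directory is to resolve every client-supplied name against the configured root and reject anything that escapes it. The sketch below shows that check under stated assumptions: `resolveInside` and the `/srv/docs` root are hypothetical, and the actual go-docs-mcp implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInside joins a client-supplied name onto root and rejects any
// result that escapes root, for example via "../".
// Note: this sketch does not resolve symlinks; a stricter check would
// also run filepath.EvalSymlinks on the result.
func resolveInside(root, name string) (string, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	full := filepath.Join(absRoot, name) // Join also cleans the path
	rel, err := filepath.Rel(absRoot, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("path escapes the document directory")
	}
	return full, nil
}

func main() {
	for _, name := range []string{"reports/q3.pdf", "../secret.pdf"} {
		p, err := resolveInside("/srv/docs", name)
		fmt.Printf("%-16s -> %q, err=%v\n", name, p, err)
	}
}
```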
Fast Caching
Parsed documents are cached for fast repeat access. The first read runs the extraction; subsequent reads of the same document are served from the cache.
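A cache like this is often keyed on the file path plus its modification time and size, so an edited file never returns stale text. The following is a minimal in-memory sketch of that idea, not the project's actual cache: `cacheKey`, `textCache`, and the `extract` callback are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheKey identifies one version of a file. If the file changes on disk,
// its modification time or size changes and the old entry is never hit.
type cacheKey struct {
	path         string
	modUnixNano  int64
	size         int64
}

// textCache is a hypothetical in-memory cache of extracted text.
type textCache struct {
	mu      sync.Mutex
	entries map[cacheKey]string
}

func newTextCache() *textCache {
	return &textCache{entries: make(map[cacheKey]string)}
}

// get returns the cached text for key and calls extract only on a miss.
// The lock is held during extraction, which keeps the sketch simple but
// serializes concurrent extractions.
func (c *textCache) get(key cacheKey, extract func() (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if text, ok := c.entries[key]; ok {
		return text, nil
	}
	text, err := extract()
	if err != nil {
		return "", err // failures are not cached
	}
	c.entries[key] = text
	return text, nil
}

func main() {
	cache := newTextCache()
	runs := 0
	extract := func() (string, error) { runs++; return "extracted text", nil }

	key := cacheKey{path: "report.pdf", modUnixNano: 1, size: 1024}
	cache.get(key, extract)
	cache.get(key, extract) // served from the cache
	fmt.Println("extractions run:", runs)
}
```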
OCR
Read scanned PDFs and image files (PNG, JPG, TIFF) via OCR. The server falls back to OCR automatically for image-based documents, and you can force OCR when needed.
Architecture
go-docs-mcp is a single Go binary that communicates via stdio using the Model Context Protocol. It delegates to poppler-utils, tesseract, and pandoc for format-specific extraction.
12 tools in 5 categories: Discovery (2), Reading (3), Search (1), Analysis (4), OCR (2).
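To make the delegation model concrete, here is a sketch of how a Go program can shell out to poppler's `pdftotext` for a page range. The flags used (`-layout`, `-f`, `-l`, and `-` for stdout) are standard `pdftotext` options; the function names and the choice of flags are assumptions for illustration, not the project's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strconv"
)

// pdftotextArgs builds arguments for pdftotext: -f and -l select the first
// and last page, -layout keeps the physical layout, and "-" as the output
// file writes the text to stdout. A value of 0 leaves that bound unset.
func pdftotextArgs(path string, first, last int) []string {
	args := []string{"-layout"}
	if first > 0 {
		args = append(args, "-f", strconv.Itoa(first))
	}
	if last > 0 {
		args = append(args, "-l", strconv.Itoa(last))
	}
	return append(args, path, "-")
}

// extractPDFText runs pdftotext and returns what it wrote to stdout.
func extractPDFText(path string, first, last int) (string, error) {
	if _, err := exec.LookPath("pdftotext"); err != nil {
		return "", fmt.Errorf("pdftotext not installed: %w", err)
	}
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("pdftotext", pdftotextArgs(path, first, last)...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("pdftotext failed: %w: %s", err, stderr.String())
	}
	return stdout.String(), nil
}

func main() {
	fmt.Println(pdftotextArgs("report.pdf", 2, 4))
	// report.pdf is a placeholder; the call reports an error if the file
	// or the pdftotext binary is missing.
	if _, err := extractPDFText("report.pdf", 2, 4); err != nil {
		fmt.Println("extraction skipped:", err)
	}
}
```

Passing arguments as a slice to `exec.Command` avoids shell interpolation, which matters when file names come from a client.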
12 MCP Tools
A comprehensive set of tools covering multi-format document reading, search, and extraction.
| Tool | Description |
|---|---|
| list_documents | List all documents in the configured directory with format detection |
| read_document | Read full text or specific pages from any supported document |
| search_document | Search within a document for text with contextual results |
| get_document_summary | Get a summary of the document structure and content |
| get_document_metadata | Extract title, author, dates, page count, and format info |
| get_document_outline | Extract the document outline: headings, TOC, structure |
| extract_tables | Extract table structures from documents |
| extract_images | Extract embedded images from document pages |
| read_url | Fetch a document from a URL, cache locally, and read it |
| ocr_document | Force OCR on scanned PDFs or image-based documents |
| read_image | OCR standalone images (PNG, JPG, TIFF) |
| list_formats | Show supported formats and installed dependencies |
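MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request sent over stdio. The sketch below builds such a request for `read_document`. The envelope follows the MCP specification, but the argument name `path` is a guess for illustration; the real parameter names are whatever the server advertises in its tool schemas.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// callParams is the params object of an MCP tools/call request.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments,omitempty"`
}

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// newToolCall serializes a tools/call request for the given tool.
func newToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: tool, Arguments: args},
	})
}

func main() {
	// "path" is a hypothetical argument name.
	req, err := newToolCall(1, "read_document", map[string]any{"path": "report.pdf"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))
}
```

In practice an MCP client such as an AI assistant builds these messages for you; the sketch only shows what travels over the wire.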
Quick Start
Install with one command, configure in 30 seconds.
Requirements
A Go toolchain for installation. Format-specific deps are optional — install only what you need.
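Because the format-specific dependencies are optional, it can help to check which ones are on your `PATH` before relying on a feature. Here is a small sketch using `exec.LookPath`. The binary names are the usual ones shipped by poppler-utils, tesseract, and pandoc; which binaries go-docs-mcp actually calls is an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
)

// optionalTools maps an external binary to the feature it would enable.
// Assumption: these are the binaries looked up; the real list may differ.
var optionalTools = map[string]string{
	"pdftotext": "PDF text extraction (poppler-utils)",
	"tesseract": "OCR for scans and images",
	"pandoc":    "DOCX conversion",
}

// detectTools reports, per binary, whether it can be found on PATH.
func detectTools() map[string]bool {
	found := make(map[string]bool, len(optionalTools))
	for name := range optionalTools {
		_, err := exec.LookPath(name)
		found[name] = err == nil
	}
	return found
}

func main() {
	for name, ok := range detectTools() {
		fmt.Printf("%-10s installed=%-5v %s\n", name, ok, optionalTools[name])
	}
}
```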
Built With
A single Go binary with optional dependencies per format.
- Go 1.25+ — Single binary, cross-platform compilation
- MCP SDK (Go) — Model Context Protocol via stdio
- poppler-utils — PDF text, image, and metadata extraction
- tesseract + pandoc — OCR for images/scans, DOCX conversion
Ready to give your AI access to any document?
go-docs-mcp is free, open source, and installs in one command.