Web Tool
A web content parsing tool that uses trafilatura to extract clean text from web pages.
Overview
The Web tool provides web content parsing and extraction capabilities within the OpenAPI MCP Server framework. It can fetch web pages and extract clean, readable content from HTML.
Endpoints
Method | Endpoint | Description |
---|---|---|
POST | /web/web_read | Extract and parse webpage content into clean markdown. Accepts URL and various formatting options (metadata, formatting, images, links, tables). Returns structured markdown content. |
GET | /web/web_raw | Fetch raw HTML content and headers from any URL. Returns unprocessed HTML content along with HTTP headers and status code for debugging or advanced processing. |
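
As a rough illustration of how a client might address the two endpoints, the sketch below builds (but does not send) one request for each using only the Python standard library. The base URL, the JSON field names, and the `url` query parameter are assumptions made for the example, not confirmed parts of the API; the interactive documentation at /docs is the place to check the real schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the server listens here; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_web_read_request(page_url, **options):
    """Build a POST request for /web/web_read.

    The JSON field names ("url" and any option keys) are hypothetical;
    check the schema shown at /docs for the real ones.
    """
    payload = {"url": page_url, **options}
    return Request(
        f"{BASE_URL}/web/web_read",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_web_raw_request(page_url):
    """Build a GET request for /web/web_raw.

    Passing the target page as a "url" query parameter is an assumption.
    """
    query = urlencode({"url": page_url})
    return Request(f"{BASE_URL}/web/web_raw?{query}", method="GET")
```

Either request object can then be passed to `urllib.request.urlopen` once the server is running.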
Key Features
- Clean Content Extraction: Uses trafilatura library for high-quality text extraction from web pages
- Markdown Output: Converts web content to clean, readable markdown format
- Flexible Formatting: Control inclusion of metadata, formatting, images, links, and tables
- Raw Content Access: Get unprocessed HTML and headers when needed
- Error Handling: Robust error handling for failed requests and invalid URLs
Usage
The Web tool is automatically loaded by the OpenAPI MCP Server framework. All endpoints are available under the /web prefix when the server is running. Visit /docs for interactive API documentation.
Technology
Built using trafilatura, a Python library specialised in extracting and processing web content, ensuring high-quality text extraction from web pages.
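
For readers unfamiliar with trafilatura, the following is a minimal sketch of how HTML might be turned into markdown with it. It is not the tool's actual implementation: the helper name `html_to_markdown` and its keyword names are hypothetical, chosen to mirror the formatting options listed above, and the markdown output format assumes a reasonably recent trafilatura release.

```python
def html_to_markdown(html, *, metadata=False, formatting=True,
                     images=False, links=True, tables=True):
    """Extract the main content of an HTML document as markdown.

    Returns None when trafilatura finds no extractable content.
    The keyword names on this helper are hypothetical; the arguments
    passed to trafilatura.extract are that library's own.
    """
    # Imported here so this module still loads where the third-party
    # package is missing (install with: pip install trafilatura).
    import trafilatura

    return trafilatura.extract(
        html,
        output_format="markdown",
        with_metadata=metadata,
        include_formatting=formatting,
        include_images=images,
        include_links=links,
        include_tables=tables,
    )
```

The HTML itself can come from any source, for example `trafilatura.fetch_url(url)`, which returns the downloaded page or None on failure.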