Web Tool

A web content parsing tool that fetches web pages and uses trafilatura to extract clean, readable text from them.

Overview

The Web tool provides web content parsing and extraction capabilities within the OpenAPI MCP Server framework. It can fetch web pages and extract clean, readable content from HTML.

Endpoints

Method	Endpoint	Description
`POST`	`/web/web_read`	Extract and parse webpage content into clean markdown. Accepts a URL and options that control whether metadata, formatting, images, links, and tables are included. Returns structured markdown content.
`GET`	`/web/web_raw`	Fetch raw HTML content and headers from any URL. Returns unprocessed HTML content along with HTTP headers and status code for debugging or advanced processing.
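
As a rough illustration of calling the two endpoints above, here is a minimal client sketch that uses only the Python standard library. The methods and paths come from the table. Everything else is an assumption for illustration: the base URL, the JSON request body for `/web/web_read`, the `url` field and query parameter names, and any option names passed as keyword arguments. Check `/docs` on a running server for the actual request schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the server is reachable here; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_web_read_request(url, base_url=BASE_URL, **options):
    """Build a POST request for /web/web_read.

    The JSON body shape and the "url" field name are hypothetical, as are
    any option names passed through **options.
    """
    payload = {"url": url, **options}
    return Request(
        f"{base_url}/web/web_read",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_web_raw_request(url, base_url=BASE_URL):
    """Build a GET request for /web/web_raw.

    Passing the target as a "url" query parameter is an assumption.
    """
    query = urlencode({"url": url})
    return Request(f"{base_url}/web/web_raw?{query}", method="GET")


def send(request, timeout=30):
    """Send a prepared request and decode the reply, assuming it is JSON."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send(build_web_read_request("https://example.com"))` would return the parsed response. Building the request separately from sending it makes it easy to inspect exactly what will go over the wire before any network call is made.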

Key Features

Clean Content Extraction: Uses trafilatura library for high-quality text extraction from web pages
Markdown Output: Converts web content to clean, readable markdown format
Flexible Formatting: Control inclusion of metadata, formatting, images, links, and tables
Raw Content Access: Get unprocessed HTML and headers when needed
Error Handling: Robust error handling for failed requests and invalid URLs

Usage

The Web tool is automatically loaded by the OpenAPI MCP Server framework. All endpoints are available under the `/web` prefix when the server is running. Visit `/docs` for interactive API documentation.

Technology

The tool is built on trafilatura, a Python library that specialises in extracting and processing web content. It handles the text extraction from fetched pages.
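
To make the role of trafilatura more concrete, the sketch below shows one way the formatting options could map onto keyword arguments of `trafilatura.extract`. It is not the tool's actual implementation. The helper names and default values are hypothetical, and `output_format="markdown"` assumes a reasonably recent trafilatura release.

```python
def build_extract_options(metadata=False, formatting=True, images=False,
                          links=True, tables=True):
    """Map formatting flags to trafilatura.extract keyword arguments.

    The flag names and defaults here are illustrative assumptions.
    """
    return {
        "output_format": "markdown",
        "with_metadata": metadata,
        "include_formatting": formatting,
        "include_images": images,
        "include_links": links,
        "include_tables": tables,
    }


def html_to_markdown(html, **flags):
    """Extract the main content of an HTML string as markdown.

    Returns None when trafilatura finds no extractable content.
    """
    import trafilatura  # third-party: pip install trafilatura

    return trafilatura.extract(html, **build_extract_options(**flags))
```

Keeping the option mapping in its own function means the translation from flags to library arguments can be checked without fetching or parsing any page.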
