openapi-mcp-server/docs/web.md
Tom Foster 719d4ecba4
Update endpoint paths and lint codebase
2025-08-01 21:37:26 +01:00


Web Tool

A web content parsing tool that fetches web pages and uses trafilatura to extract clean, readable text from them.

Overview

The Web tool provides web content parsing and extraction capabilities within the OpenAPI MCP Server framework. It can fetch web pages and extract clean, readable content from HTML.

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /web/web_read | Extract and parse webpage content into clean markdown. Accepts a URL and formatting options (metadata, formatting, images, links, tables). Returns structured markdown content. |
| GET | /web/web_raw | Fetch raw HTML content and headers from any URL. Returns the unprocessed HTML along with the HTTP headers and status code, for debugging or advanced processing. |
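The request shapes for the two endpoints can be sketched with Python's standard library. This is a minimal, hypothetical client: the base URL `http://localhost:8000`, the JSON field names (`url` plus boolean flags such as `include_links`) and the `url` query parameter for `/web/web_raw` are assumptions, so check the server's `/docs` page for the actual schema. The helpers only build `urllib.request.Request` objects and send nothing.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the server runs locally on port 8000; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_web_read_request(target_url: str, **options: bool) -> Request:
    """Build a POST /web/web_read request with a JSON body.

    The body field names ("url" plus boolean option flags) are hypothetical;
    the real schema is listed on the server's /docs page.
    """
    payload = {"url": target_url, **options}
    return Request(
        f"{BASE_URL}/web/web_read",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_web_raw_request(target_url: str) -> Request:
    """Build a GET /web/web_raw request with the target URL as a query parameter.

    The query parameter name "url" is an assumption.
    """
    query = urlencode({"url": target_url})  # percent-encodes ':', '/', '?', '='
    return Request(f"{BASE_URL}/web/web_raw?{query}", method="GET")
```

Passing either request to `urllib.request.urlopen` would send it to a running server.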

Key Features

  • Clean Content Extraction: Uses the trafilatura library for high-quality text extraction from web pages
  • Markdown Output: Converts web content to clean, readable markdown format
  • Flexible Formatting: Control inclusion of metadata, formatting, images, links, and tables
  • Raw Content Access: Get unprocessed HTML and headers when needed
  • Error Handling: Covers failed requests and invalid URLs
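One way the formatting options could be carried through to trafilatura is sketched below. `WebReadOptions` and its field names are hypothetical (the tool's real request model is not shown here), and its defaults are illustrative; the keyword arguments it produces (`output_format`, `with_metadata`, `include_formatting`, `include_images`, `include_links`, `include_tables`) are parameters of `trafilatura.extract()`. The check in `__post_init__` illustrates rejecting invalid URLs before any network request is made.

```python
from dataclasses import dataclass
from typing import Any
from urllib.parse import urlparse


@dataclass
class WebReadOptions:
    """Hypothetical request options for /web/web_read (names are illustrative)."""

    url: str
    include_metadata: bool = False
    include_formatting: bool = False
    include_images: bool = False
    include_links: bool = False
    include_tables: bool = True

    def __post_init__(self) -> None:
        # Reject anything that is not an absolute http(s) URL before fetching.
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"invalid URL: {self.url!r}")

    def to_trafilatura_kwargs(self) -> dict[str, Any]:
        """Map the options onto keyword arguments of trafilatura.extract()."""
        return {
            "output_format": "markdown",  # ask trafilatura for Markdown output
            "with_metadata": self.include_metadata,
            "include_formatting": self.include_formatting,
            "include_images": self.include_images,
            "include_links": self.include_links,
            "include_tables": self.include_tables,
        }
```

In a handler, the result could be used as `trafilatura.extract(html, **opts.to_trafilatura_kwargs())`, where `html` comes from `trafilatura.fetch_url(opts.url)`; `fetch_url` returns `None` when the download fails, so that case needs checking first. `output_format="markdown"` requires a trafilatura release that supports Markdown output.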

Usage

The Web tool is automatically loaded by the OpenAPI MCP Server framework. All endpoints are available under the /web prefix when the server is running. Visit /docs for interactive API documentation.
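Because every Web tool route sits under `/web`, a client can discover them from the server's OpenAPI schema. This sketch assumes the schema is served at `/openapi.json` (the FastAPI default that accompanies a `/docs` page) and that the server listens on `http://localhost:8000`; both are assumptions about the deployment. `web_endpoints` filters the schema's `paths` down to the `/web` prefix and skips non-operation keys such as `parameters`.

```python
import json
from urllib.request import urlopen

# Assumptions: base URL of the running server and the schema location.
BASE_URL = "http://localhost:8000"
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def web_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs for every operation under /web."""
    found = []
    for path, item in schema.get("paths", {}).items():
        if path == "/web" or path.startswith("/web/"):
            for key in item:
                # Path items may also hold "parameters", "summary", etc.
                if key.lower() in HTTP_METHODS:
                    found.append((key.upper(), path))
    return sorted(found)


def fetch_schema(base_url: str = BASE_URL) -> dict:
    """Download the OpenAPI schema from a running server."""
    with urlopen(f"{base_url}/openapi.json") as resp:
        return json.load(resp)
```

Against a running server, `web_endpoints(fetch_schema())` should list the two routes from the endpoints table.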

Technology

Built on trafilatura, a Python library specialised in extracting and processing web content, which provides the tool's text extraction from web pages.