MCP-PDF-Extractor-server

Name: MCP-PDF-Extractor-server
Author: RayenMalouche

mcpmarket.cn · ⭐ 0 · 7/10

A local service, built on Java and Tika, for extracting file content and metadata

A Java-based server leveraging Apache Tika to extract content and metadata from files (PDF, DOCX, TXT, etc.) in a local files-to-extract directory. Supports HTML (with CSS styling) and text extraction, file listing, and metadata retrieval via MCP-compliant tools and REST APIs. Built with Spring Boot, Jetty, and MCP SDK.
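The file-listing part of the description can be sketched with the Java standard library alone. This is a minimal illustration, not the project's actual code: the class name `FileLister` and the methods `listFiles` and `resolveInside` are hypothetical, and only the directory name `files-to-extract` comes from the description above. The Tika-based extraction itself is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists files in the extraction directory and
// guards against file names that point outside of it.
public class FileLister {

    /** Returns the sorted names of regular files directly inside dir (subdirectories are skipped). */
    public static List<String> listFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Resolves a client-supplied name inside dir; throws if the result would leave dir. */
    public static Path resolveInside(Path dir, String name) {
        Path base = dir.toAbsolutePath().normalize();
        Path target = base.resolve(name).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + name);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Directory name taken from the description; override via the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "files-to-extract");
        if (!Files.isDirectory(dir)) {
            System.out.println("Not a directory: " + dir);
            return;
        }
        listFiles(dir).forEach(System.out::println);
    }
}
```

A server that accepts file names from clients would plausibly pass each name through a check like `resolveInside` before opening the file, so that a request such as `../secret.txt` cannot read outside the extraction directory.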

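📊 Business analysis

Core function

Extracts text, HTML, and metadata from PDFs and other files

Business model

SaaS subscription, or licensing for private enterprise deployment

Unique value

Native integration with the Java ecosystem; HTML output that preserves CSS styling

Competitors

Apache Tika, pdfplumber, PyMuPDF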

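🎯 Applications

Use cases

Batch document digitization, unstructured data processing, RAG knowledge base construction

Applicable domains

Enterprise IT, legal tech, academic research

Target users

Backend developers, data engineers, AI application builders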

📦 Installation

🔗 Install/download link →

Tool information

Type: MCP Server
Platform: mcpmarket.cn
Stars: ⭐ 0
Value score: 7/10
Subcategory: Document parsing and extraction
Complexity: medium
Commercializable: ✅ Yes

AI tags

Document parsing, data extraction, RAG enhancement, unstructured data, local processing

Related tools

xiaohongshutools · SKILL

XiaoHongShu (Little Red Book) data collection and interaction toolkit. Use when working with XiaoHongShu (小红书) platform for: (1) Searching and scraping notes/posts, (2) Getting user profiles and details, (3) Extracting comments and likes, (4) Following users and liking posts, (5) Fetching home feed and trending content. Automatically handles all encryption parameters (cookies, headers) including a1, webId, x-s, x-s-common, x-t, sec_poison_id, websectiga, gid, x-b3-traceid, x-xray-traceid. Supports guest mode and authenticated sessions via web_session cookie.

9/10 · ⭐ 11

data-storytelling · SKILL

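Turns raw data into structured narratives, combining visualization suggestions, contextual interpretation, and persuasive framing; designed for executive reporting and business decision-making

8/10 · ⭐ 30,590

risk-metrics-calculation · SKILL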

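Automatically calculates portfolio VaR, CVaR, Sharpe ratio, Sortino ratio, and maximum drawdown; supports setting risk limits and building real-time risk monitoring systems

8/10 · ⭐ 30,590

senior-data-engineer · SKILL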

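An AI assistant for data engineers covering the modern data stack (ETL/ELT, Spark, Airflow, dbt, Kafka), with end-to-end support for pipeline design, data modeling, and quality governance

8/10 · ⭐ 2,218

excel-xlsx · MCP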

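AI-driven creation, inspection, and editing of Excel workbooks, with support for formula calculation, date types, format preservation, and template reuse; edits XLSX files losslessly

8/10 · ⭐ 107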