pdf-ocr-extraction

Name: pdf-ocr-extraction
Rating: 5 (1 review)
Author: bilicen700

CLI tool

by bilicen700

clawhub · ⭐ 1 · 5/10

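Extract text from images or scanned PDFs using Tesseract OCR

Built on the Tesseract OCR engine, this tool extracts text from image-based or scanned PDFs. It runs locally and offline, which keeps document data private.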
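
The listing does not show how the skill is implemented, so the following is only a minimal sketch of one common way to run this kind of local OCR pipeline. It assumes the Poppler `pdftoppm` and `tesseract` command-line programs are installed and on the PATH; the function names `pdftoppm_cmd`, `tesseract_cmd`, and `ocr_pdf` are hypothetical and are not part of pdf-ocr-extraction.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_cmd(pdf_path, out_prefix, dpi=300):
    """Build the Poppler command that rasterizes each PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def tesseract_cmd(image_path, lang="eng"):
    """Build the Tesseract command that prints recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_pdf(pdf_path, lang="eng", dpi=300):
    """Rasterize a scanned PDF locally, OCR every page, and return the text.

    Assumption: both external programs are installed; nothing is uploaded.
    """
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(pdftoppm_cmd(pdf_path, prefix, dpi), check=True)
        pages = []
        # pdftoppm zero-pads page numbers, so a lexical sort keeps page order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(
                tesseract_cmd(image, lang),
                check=True,
                capture_output=True,
                text=True,
            )
            pages.append(result.stdout)
        return "\n".join(pages)
```

With a hypothetical file `scan.pdf`, calling `ocr_pdf("scan.pdf", lang="chi_sim")` would return the recognized text as a single string, provided the Simplified Chinese language data for Tesseract is installed. A higher `dpi` usually helps with small print at the cost of slower processing.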

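📊 Business analysis

Business model

free

Unique value

Local, offline text extraction from scanned PDFs; nothing is uploaded to the cloud, which protects privacy

Competitors

1. Adobe Acrobat OCR (full-featured but expensive, enterprise-grade)
2. AWS Textract (cloud-based with high accuracy, but a paid API)
3. Baidu OCR API (stronger Chinese recognition, with a free tier)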

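🎯 Use cases

Target users

Legal and finance professionals (handling large volumes of scanned contracts); academic researchers (digitizing paper literature); enterprise data-entry staff (batch-processing scanned archives)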

📦 Installation

openclaw install bilicen700-pdf-ocr-extraction

Tool information

Type: CLI tool
Platform: clawhub
Stars: ⭐ 1
Value score: 5/10
Subcategory: Intelligent document processing and OCR
Commercializable: ❌ No

AI tags

PDF text extraction, OCR, scanned document processing, Tesseract, offline document parsing

Related tools

xiaohongshutools SKILL

XiaoHongShu (Little Red Book) data collection and interaction toolkit. Use when working with XiaoHongShu (小红书) platform for: (1) Searching and scraping notes/posts, (2) Getting user profiles and details, (3) Extracting comments and likes, (4) Following users and liking posts, (5) Fetching home feed and trending content. Automatically handles all encryption parameters (cookies, headers) including a1, webId, x-s, x-s-common, x-t, sec_poison_id, websectiga, gid, x-b3-traceid, x-xray-traceid. Supports guest mode and authenticated sessions via web_session cookie.

9/10 ⭐ 11

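data-storytelling SKILL

Turns raw data into a structured narrative that combines visualization suggestions, contextual interpretation, and persuasive framing; designed for executive briefings and business decision-making

8/10 ⭐ 30,590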

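risk-metrics-calculation SKILL

Automatically calculates portfolio VaR, CVaR, Sharpe ratio, Sortino ratio, and maximum drawdown; supports setting risk limits and building real-time risk monitoring systems

8/10 ⭐ 30,590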

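senior-data-engineer SKILL

An AI assistant for data engineers that covers the modern data stack (ETL/ELT, Spark, Airflow, dbt, Kafka, and more) and supports the full workflow of pipeline design, data modeling, and data quality governance

8/10 ⭐ 2,218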

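excel-xlsx MCP

AI-driven creation, inspection, and editing of Excel workbooks; supports formula calculation, date types, format preservation, and template reuse, and operates on XLSX files losslessly

8/10 ⭐ 107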