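# InsightFlow Phase 7 - Multimodal Support API Documentation
## Overview
The Phase 7 multimodal support module adds video and image processing to InsightFlow. It covers:
1. **Video processing**: audio extraction, key-frame extraction, and OCR
2. **Image processing**: recognition of whiteboards, PPT slides, handwritten notes, and similar content
3. **Multimodal entity linking**: cross-modal entity alignment and knowledge fusion
## New API Endpoints
### Video Processing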
#### Upload a Video
```
POST /api/v1/projects/{project_id}/upload-video
```
**Parameters:**
- `file` (required): the video file
- `extract_interval` (optional): key-frame extraction interval in seconds; defaults to 5
**Response:**
```json
{
  "video_id": "abc123",
  "project_id": "proj456",
  "filename": "meeting.mp4",
  "status": "completed",
  "audio_extracted": true,
  "frame_count": 24,
  "ocr_text_preview": "Meeting content preview...",
  "message": "Video processed successfully"
}
```
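As a rough illustration of calling this endpoint, the sketch below builds (but does not send) the upload request using only the standard library. It assumes that `file` is a multipart form field and that `extract_interval` is passed in the query string; neither detail is stated above. The helper name `build_upload_video_request` and the base URL in the usage note are hypothetical.

```python
import urllib.parse
import urllib.request
import uuid


def build_upload_video_request(base_url, project_id, filename, data, extract_interval=5):
    """Build (but do not send) a multipart POST for the upload-video endpoint."""
    boundary = uuid.uuid4().hex
    # Assumption: extract_interval travels in the query string.
    query = urllib.parse.urlencode({"extract_interval": extract_interval})
    url = f"{base_url.rstrip('/')}/api/v1/projects/{project_id}/upload-video?{query}"
    # Assumption: the video is sent as a multipart form field named "file".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it, for example against a local server such as `http://localhost:8000`; the JSON body of the reply can then be decoded with `json.load`.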
#### List Project Videos
```
GET /api/v1/projects/{project_id}/videos
```
**Response:**
```json
[
  {
    "id": "abc123",
    "filename": "meeting.mp4",
    "duration": 120.5,
    "fps": 30.0,
    "resolution": {"width": 1920, "height": 1080},
    "ocr_preview": "Meeting content...",
    "status": "completed",
    "created_at": "2024-01-15T10:30:00"
  }
]
```
#### Get Video Key Frames
```
GET /api/v1/videos/{video_id}/frames
```
**Response:**
```json
[
  {
    "id": "frame001",
    "frame_number": 1,
    "timestamp": 0.0,
    "image_url": "/tmp/frames/video123/frame_000001_0.00.jpg",
    "ocr_text": "First page content...",
    "entities": [{"name": "Project Alpha", "type": "PROJECT"}]
  }
]
```
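One way a client might use the frame list is to build a per-entity timeline of when each entity appears on screen. The sketch below works on the already-decoded JSON shown above; the function name `entity_timeline` and the second and third sample frames are hypothetical.

```python
from collections import defaultdict


def entity_timeline(frames):
    """Map (name, type) -> sorted timestamps of the frames that mention the entity."""
    timeline = defaultdict(list)
    for frame in frames:
        for entity in frame.get("entities", []):
            timeline[(entity["name"], entity["type"])].append(frame["timestamp"])
    return {key: sorted(times) for key, times in timeline.items()}


# The first frame mirrors the sample response; the other two are made up.
frames = [
    {"frame_number": 1, "timestamp": 0.0, "entities": [{"name": "Project Alpha", "type": "PROJECT"}]},
    {"frame_number": 2, "timestamp": 5.0, "entities": []},
    {"frame_number": 3, "timestamp": 10.0, "entities": [{"name": "Project Alpha", "type": "PROJECT"}]},
]
print(entity_timeline(frames))  # {('Project Alpha', 'PROJECT'): [0.0, 10.0]}
```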
### Image Processing
#### Upload an Image
```
POST /api/v1/projects/{project_id}/upload-image
```
**Parameters:**
- `file` (required): the image file
- `detect_type` (optional): whether to detect the image type automatically; defaults to true
**Response:**
```json
{
  "image_id": "img789",
  "project_id": "proj456",
  "filename": "whiteboard.jpg",
  "image_type": "whiteboard",
  "ocr_text_preview": "Whiteboard content...",
  "description": "This is a whiteboard image. Content summary: ...",
  "entity_count": 5,
  "status": "completed"
}
```
#### Batch Upload Images
```
POST /api/v1/projects/{project_id}/upload-images-batch
```
**Parameters:**
- `files` (required): multiple image files
**Response:**
```json
{
  "project_id": "proj456",
  "total_count": 3,
  "success_count": 3,
  "failed_count": 0,
  "results": [
    {
      "image_id": "img001",
      "status": "success",
      "image_type": "ppt",
      "entity_count": 4
    }
  ]
}
```
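Because a batch can partially fail, a caller will usually want to check the counters and pick out the items that need a retry. The following is a minimal sketch over the decoded response; it treats any `status` other than `"success"` as a failure, which is an assumption, since the failure value is not documented here. The helper name `failed_uploads` is hypothetical.

```python
def failed_uploads(batch):
    """Return the result items that did not succeed, after a sanity check on the counters."""
    if batch["success_count"] + batch["failed_count"] != batch["total_count"]:
        raise ValueError("counts in batch response do not add up")
    # Assumption: anything other than "success" counts as a failed item.
    return [item for item in batch.get("results", []) if item.get("status") != "success"]
```

An empty list means every file in the batch was processed.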
#### List Project Images
```
GET /api/v1/projects/{project_id}/images
```
### Multimodal Entity Linking
#### Cross-Modal Entity Alignment
```
POST /api/v1/projects/{project_id}/multimodal/align
```
**Parameters:**
- `threshold` (optional): similarity threshold; defaults to 0.85
**Response:**
```json
{
  "project_id": "proj456",
  "aligned_count": 5,
  "links": [
    {
      "link_id": "link001",
      "source_entity_id": "ent001",
      "target_entity_id": "ent002",
      "source_modality": "video",
      "target_modality": "document",
      "link_type": "same_as",
      "confidence": 0.95,
      "evidence": "Cross-modal alignment: exact"
    }
  ],
  "message": "Successfully aligned 5 cross-modal entity pairs"
}
```
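A client may want to apply a stricter cut-off than the server-side `threshold` and see which modality pairs the surviving links connect. The sketch below does this on the decoded response; `summarize_links` and its `min_confidence` parameter are hypothetical names, not part of the API.

```python
from collections import Counter


def summarize_links(response, min_confidence=0.85):
    """Keep links at or above min_confidence and count them per modality pair."""
    kept = [link for link in response["links"] if link["confidence"] >= min_confidence]
    pairs = Counter((link["source_modality"], link["target_modality"]) for link in kept)
    return kept, pairs
```

For the sample response above, calling `summarize_links(response, 0.9)` keeps `link001` and reports one `("video", "document")` pair.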
#### Get Multimodal Statistics
```
GET /api/v1/projects/{project_id}/multimodal/stats
```
**Response:**
```json
{
  "project_id": "proj456",
  "video_count": 3,
  "image_count": 10,
  "multimodal_entity_count": 25,
  "cross_modal_links": 8,
  "modality_distribution": {
    "audio": 15,
    "video": 8,
    "image": 12,
    "document": 20
  }
}
```
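The `modality_distribution` counts are easier to compare as proportions. The sketch below converts them to shares of the total, whatever unit the server counts in; the function name `modality_shares` is hypothetical.

```python
def modality_shares(stats):
    """Convert modality_distribution counts into fractions of their total."""
    distribution = stats["modality_distribution"]
    total = sum(distribution.values())
    if total == 0:
        # Avoid dividing by zero for a project with no multimodal data yet.
        return {modality: 0.0 for modality in distribution}
    return {modality: count / total for modality, count in distribution.items()}
```

With the sample response, the counts sum to 55, so `document` accounts for 20/55 of the total.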
#### Get Multimodal Mentions of an Entity
```
GET /api/v1/entities/{entity_id}/multimodal-mentions
```
**Response:**
```json
[
  {
    "id": "mention001",
    "entity_id": "ent001",
    "entity_name": "Project Alpha",
    "modality": "video",
    "source_id": "video123",
    "source_type": "video_frame",
    "text_snippet": "Project Alpha progress",
    "confidence": 1.0,
    "created_at": "2024-01-15T10:30:00"
  }
]
```
#### Suggest Multimodal Entity Merges
```
GET /api/v1/projects/{project_id}/multimodal/suggest-merges
```
**Response:**
```json
{
  "project_id": "proj456",
  "suggestion_count": 3,
  "suggestions": [
    {
      "entity1": {"id": "ent001", "name": "K8s", "type": "TECH"},
      "entity2": {"id": "ent002", "name": "Kubernetes", "type": "TECH"},
      "similarity": 0.95,
      "match_type": "alias_match",
      "suggested_action": "merge"
    }
  ]
}
```
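Suggestions are advisory, so a caller would typically filter them before acting. The sketch below selects only suggestions whose `suggested_action` is `"merge"` and whose `similarity` clears a caller-chosen bar; `select_merges` and the 0.9 default are hypothetical, and other values of `suggested_action` are not documented here.

```python
def select_merges(response, min_similarity=0.9):
    """Return (entity1_id, entity2_id, match_type) for suggestions worth acting on."""
    selected = []
    for suggestion in response["suggestions"]:
        if suggestion["suggested_action"] == "merge" and suggestion["similarity"] >= min_similarity:
            selected.append((
                suggestion["entity1"]["id"],
                suggestion["entity2"]["id"],
                suggestion["match_type"],
            ))
    return selected
```

For the sample response this yields the single pair `("ent001", "ent002", "alias_match")`.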
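## Database Schema
### videos Table
Stores video file information.
- `id`: video ID
- `project_id`: ID of the owning project
- `filename`: file name
- `duration`: video duration (seconds)
- `fps`: frame rate
- `resolution`: resolution (JSON)
- `audio_transcript_id`: ID of the associated audio transcript
- `full_ocr_text`: merged OCR text from all frames
- `extracted_entities`: extracted entities (JSON)
- `extracted_relations`: extracted relations (JSON)
- `status`: processing status
### video_frames Table
Stores video key-frame information.
- `id`: frame ID
- `video_id`: ID of the owning video
- `frame_number`: frame sequence number
- `timestamp`: timestamp (seconds)
- `image_url`: image URL or path
- `ocr_text`: OCR-recognized text
- `extracted_entities`: entities extracted from this frame
### images Table
Stores image file information.
- `id`: image ID
- `project_id`: ID of the owning project
- `filename`: file name
- `ocr_text`: OCR-recognized text
- `description`: image description
- `extracted_entities`: extracted entities
- `extracted_relations`: extracted relations
- `status`: processing status
### multimodal_mentions Table
Stores mentions of entities across modalities.
- `id`: mention ID
- `project_id`: ID of the owning project
- `entity_id`: entity ID
- `modality`: modality type (audio/video/image/document)
- `source_id`: source ID
- `source_type`: source type
- `text_snippet`: text snippet
- `confidence`: confidence score
### multimodal_entity_links Table
Stores cross-modal entity links.
- `id`: link ID
- `entity_id`: entity ID
- `linked_entity_id`: ID of the linked entity
- `link_type`: link type (same_as/related_to/part_of)
- `confidence`: confidence score
- `evidence`: evidence for the link
- `modalities`: list of modalities involved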
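To make the column lists more concrete, here is a small sketch that creates a `multimodal_mentions` table in an in-memory SQLite database and counts mentions per modality. The column names come from the list above; the SQL types, the constraints, the choice of SQLite, and the second sample row are all assumptions for illustration and may differ from the project's `schema.sql`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the documentation; types and constraints are assumed.
conn.execute("""
    CREATE TABLE multimodal_mentions (
        id TEXT PRIMARY KEY,
        project_id TEXT NOT NULL,
        entity_id TEXT NOT NULL,
        modality TEXT NOT NULL,
        source_id TEXT,
        source_type TEXT,
        text_snippet TEXT,
        confidence REAL
    )
""")
conn.executemany(
    "INSERT INTO multimodal_mentions VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("mention001", "proj456", "ent001", "video", "video123", "video_frame", "Project Alpha progress", 1.0),
        # Hypothetical second row, added only so the query has two groups.
        ("mention002", "proj456", "ent001", "image", "img789", "image", "Project Alpha", 0.9),
    ],
)


def mentions_per_modality(connection, project_id):
    """Count mentions of a project's entities, grouped by modality."""
    rows = connection.execute(
        "SELECT modality, COUNT(*) FROM multimodal_mentions "
        "WHERE project_id = ? GROUP BY modality ORDER BY modality",
        (project_id,),
    )
    return dict(rows.fetchall())


print(mentions_per_modality(conn, "proj456"))  # {'image': 1, 'video': 1}
```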
## Installing Dependencies
```bash
pip install ffmpeg-python pillow opencv-python pytesseract
```
Note: the OCR features require the Tesseract OCR engine to be installed:
- Ubuntu/Debian: `sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim`
- macOS: `brew install tesseract tesseract-lang`
- Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki
## Environment Variables
```bash
# Optional: custom temporary directory
export INSIGHTFLOW_TEMP_DIR=/path/to/temp
# Optional: path to the Tesseract executable (Windows)
export TESSERACT_CMD='C:\Program Files\Tesseract-OCR\tesseract.exe'
```
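The sketch below shows one way these two variables could be resolved at startup. The fallbacks (the system temporary directory, and a `tesseract` binary found on `PATH`) are assumptions rather than documented backend behavior, and `resolve_settings` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def resolve_settings(env=None):
    """Resolve the optional settings, falling back to assumed defaults."""
    env = os.environ if env is None else env
    # Assumed fallback: the platform's default temporary directory.
    temp_dir = env.get("INSIGHTFLOW_TEMP_DIR") or tempfile.gettempdir()
    # Assumed fallback: whatever `tesseract` executable is on PATH (None if absent).
    tesseract_cmd = env.get("TESSERACT_CMD") or shutil.which("tesseract")
    return {"temp_dir": temp_dir, "tesseract_cmd": tesseract_cmd}
```

With pytesseract, a resolved path is normally applied by assigning it to `pytesseract.pytesseract.tesseract_cmd` before running OCR.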