insightflow/backend/docs/multimodal_api.md
Last commit: OpenClaw Bot, 797ca58e8e, 2026-02-23 12:09:15 +08:00

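InsightFlow Phase 7 - Multimodal Support API Documentation

Overview

The Phase 7 multimodal support module adds video and image processing to InsightFlow. It supports:

  1. Video processing: extracts audio and keyframes, and performs OCR
  2. Image processing: recognizes content from whiteboards, PPT slides, handwritten notes, and similar sources
  3. Multimodal entity linking: cross-modal entity alignment and knowledge fusion

New API Endpoints

Video Processing

Upload a Video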

POST /api/v1/projects/{project_id}/upload-video

Parameters:

  • file (required): the video file
  • extract_interval (optional): keyframe extraction interval in seconds; defaults to 5
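
To make the extract_interval parameter concrete, the sketch below computes which timestamps would be sampled for a given video duration. It assumes sampling starts at 0 seconds and repeats every extract_interval seconds, which this document does not confirm. Both helper names are hypothetical, and the file-name pattern is only inferred from the image_url example in the keyframes response.

```python
def keyframe_timestamps(duration: float, extract_interval: float = 5.0) -> list[float]:
    """Return sampling timestamps (seconds) from 0 up to the video duration."""
    if extract_interval <= 0:
        raise ValueError("extract_interval must be positive")
    # Assumption: the first keyframe is taken at t = 0.
    count = int(duration // extract_interval) + 1
    return [i * extract_interval for i in range(count)]


def frame_filename(frame_number: int, timestamp: float) -> str:
    """Format a frame file name following the pattern seen in image_url (assumed)."""
    return f"frame_{frame_number:06d}_{timestamp:.2f}.jpg"
```

Under these assumptions, a 120.5-second video sampled every 5 seconds yields timestamps 0, 5, ..., 120.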

Response:

{
  "video_id": "abc123",
  "project_id": "proj456",
  "filename": "meeting.mp4",
  "status": "completed",
  "audio_extracted": true,
  "frame_count": 24,
  "ocr_text_preview": "Meeting content preview...",
  "message": "Video processed successfully"
}
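
As a rough illustration of calling this endpoint, the sketch below builds a multipart/form-data request with the Python standard library and does not send it. The base URL is a placeholder, the function name is hypothetical, and sending extract_interval as a form field (rather than a query parameter) is an assumption; this document does not specify the encoding or any authentication.

```python
import urllib.request
import uuid


def build_upload_video_request(base_url: str, project_id: str, filename: str,
                               data: bytes, extract_interval: int = 5) -> urllib.request.Request:
    """Build (without sending) a multipart POST for the upload-video endpoint."""
    boundary = uuid.uuid4().hex
    # Assumption: extract_interval travels as a form field next to the file.
    interval_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="extract_interval"\r\n\r\n'
        f"{extract_interval}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/projects/{project_id}/upload-video",
        data=interval_part + file_header + data + closing,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

The request could then be sent with urllib.request.urlopen and the body parsed with json.loads to obtain the fields shown above.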

List Project Videos

GET /api/v1/projects/{project_id}/videos

Response:

[
  {
    "id": "abc123",
    "filename": "meeting.mp4",
    "duration": 120.5,
    "fps": 30.0,
    "resolution": {"width": 1920, "height": 1080},
    "ocr_preview": "Meeting content...",
    "status": "completed",
    "created_at": "2024-01-15T10:30:00"
  }
]

Get Video Keyframes

GET /api/v1/videos/{video_id}/frames

Response:

[
  {
    "id": "frame001",
    "frame_number": 1,
    "timestamp": 0.0,
    "image_url": "/tmp/frames/video123/frame_000001_0.00.jpg",
    "ocr_text": "First page content...",
    "entities": [{"name": "Project Alpha", "type": "PROJECT"}]
  }
]
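
A client will often want the keyframes as a time-ordered list of what was on screen. The sketch below shows one way to reshape the response above; the function name is hypothetical, and it assumes every frame carries a timestamp while ocr_text and entities may be absent.

```python
def frames_to_timeline(frames: list[dict]) -> list[tuple[float, str, list[str]]]:
    """Return (timestamp, ocr_text, entity names) tuples sorted by timestamp."""
    timeline = []
    for frame in sorted(frames, key=lambda f: f["timestamp"]):
        # Missing ocr_text or entities are treated as empty (an assumption).
        names = [entity["name"] for entity in frame.get("entities", [])]
        timeline.append((frame["timestamp"], frame.get("ocr_text", ""), names))
    return timeline
```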

Image Processing

Upload an Image

POST /api/v1/projects/{project_id}/upload-image

Parameters:

  • file (required): the image file
  • detect_type (optional): whether to auto-detect the image type; defaults to true

Response:

{
  "image_id": "img789",
  "project_id": "proj456",
  "filename": "whiteboard.jpg",
  "image_type": "whiteboard",
  "ocr_text_preview": "Whiteboard content...",
  "description": "This is a whiteboard image. Content summary: ...",
  "entity_count": 5,
  "status": "completed"
}

Batch Upload Images

POST /api/v1/projects/{project_id}/upload-images-batch

Parameters:

  • files (required): multiple image files

Response:

{
  "project_id": "proj456",
  "total_count": 3,
  "success_count": 3,
  "failed_count": 0,
  "results": [
    {
      "image_id": "img001",
      "status": "success",
      "image_type": "ppt",
      "entity_count": 4
    }
  ]
}
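
Because a batch can partially fail, a caller may want to check the counters and pull out the entries that need a retry. The sketch below is one possible approach; the function name is hypothetical, and treating any status other than "success" as a failure is an assumption, since this document only shows a successful entry.

```python
def failed_batch_items(batch: dict) -> list[dict]:
    """Return the result entries whose status is not "success"."""
    # Sanity check: the counters should add up to total_count.
    if batch["success_count"] + batch["failed_count"] != batch["total_count"]:
        raise ValueError("batch counters are inconsistent")
    return [item for item in batch.get("results", []) if item.get("status") != "success"]
```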

List Project Images

GET /api/v1/projects/{project_id}/images

Multimodal Entity Linking

Cross-Modal Entity Alignment

POST /api/v1/projects/{project_id}/multimodal/align

Parameters:

  • threshold (optional): similarity threshold; defaults to 0.85

Response:

{
  "project_id": "proj456",
  "aligned_count": 5,
  "links": [
    {
      "link_id": "link001",
      "source_entity_id": "ent001",
      "target_entity_id": "ent002",
      "source_modality": "video",
      "target_modality": "document",
      "link_type": "same_as",
      "confidence": 0.95,
      "evidence": "Cross-modal alignment: exact"
    }
  ],
  "message": "Successfully aligned 5 cross-modal entity pairs"
}
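
This document does not describe how similarity is computed. Purely to illustrate how a threshold such as 0.85 gates alignment, the sketch below substitutes difflib.SequenceMatcher from the Python standard library for whatever measure the backend actually uses. The function name and the input shape (dicts with id and name) are hypothetical.

```python
from difflib import SequenceMatcher


def align_entities(source: list[dict], target: list[dict], threshold: float = 0.85) -> list[dict]:
    """Pair entities from two modalities whose name similarity meets the threshold."""
    links = []
    for src in source:
        for tgt in target:
            # Stand-in similarity: case-insensitive character-level ratio in [0, 1].
            score = SequenceMatcher(None, src["name"].lower(), tgt["name"].lower()).ratio()
            if score >= threshold:
                links.append({
                    "source_entity_id": src["id"],
                    "target_entity_id": tgt["id"],
                    "link_type": "same_as",
                    "confidence": round(score, 2),
                })
    return links
```

Raising the threshold trades recall for precision: fewer pairs are linked, but each link is more likely to be a true match.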

Get Multimodal Statistics

GET /api/v1/projects/{project_id}/multimodal/stats

Response:

{
  "project_id": "proj456",
  "video_count": 3,
  "image_count": 10,
  "multimodal_entity_count": 25,
  "cross_modal_links": 8,
  "modality_distribution": {
    "audio": 15,
    "video": 8,
    "image": 12,
    "document": 20
  }
}
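
The modality_distribution counts are easier to compare as proportions. The sketch below converts them into shares; the function name is hypothetical. Note that the example counts sum to 55 while multimodal_entity_count is 25, so the distribution presumably counts something other than distinct entities (possibly mentions); this document does not say which.

```python
def modality_shares(stats: dict) -> dict[str, float]:
    """Return each modality's fraction of the total in modality_distribution."""
    distribution = stats.get("modality_distribution", {})
    total = sum(distribution.values())
    if total == 0:
        # Avoid division by zero for projects with no multimodal data.
        return {name: 0.0 for name in distribution}
    return {name: count / total for name, count in distribution.items()}
```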

Get Multimodal Mentions of an Entity

GET /api/v1/entities/{entity_id}/multimodal-mentions

Response:

[
  {
    "id": "mention001",
    "entity_id": "ent001",
    "entity_name": "Project Alpha",
    "modality": "video",
    "source_id": "video123",
    "source_type": "video_frame",
    "text_snippet": "Project Alpha progress",
    "confidence": 1.0,
    "created_at": "2024-01-15T10:30:00"
  }
]

Suggest Multimodal Entity Merges

GET /api/v1/projects/{project_id}/multimodal/suggest-merges

Response:

{
  "project_id": "proj456",
  "suggestion_count": 3,
  "suggestions": [
    {
      "entity1": {"id": "ent001", "name": "K8s", "type": "TECH"},
      "entity2": {"id": "ent002", "name": "Kubernetes", "type": "TECH"},
      "similarity": 0.95,
      "match_type": "alias_match",
      "suggested_action": "merge"
    }
  ]
}
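
Merge suggestions are advisory, so a client would typically filter them before acting. The sketch below keeps only suggestions marked "merge" whose similarity clears a cutoff; the function name and the 0.9 default are hypothetical choices, not values defined by the API.

```python
def confident_merges(response: dict, min_similarity: float = 0.9) -> list[tuple[str, str]]:
    """Return (entity1 id, entity2 id) pairs worth merging automatically."""
    pairs = []
    for suggestion in response.get("suggestions", []):
        if (suggestion["suggested_action"] == "merge"
                and suggestion["similarity"] >= min_similarity):
            pairs.append((suggestion["entity1"]["id"], suggestion["entity2"]["id"]))
    return pairs
```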

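Database Schema

videos table

Stores video file information.

  • id: video ID
  • project_id: ID of the owning project
  • filename: file name
  • duration: video duration (seconds)
  • fps: frame rate
  • resolution: resolution (JSON)
  • audio_transcript_id: ID of the associated audio transcript
  • full_ocr_text: merged OCR text from all frames
  • extracted_entities: extracted entities (JSON)
  • extracted_relations: extracted relations (JSON)
  • status: processing status

video_frames table

Stores video keyframe information.

  • id: frame ID
  • video_id: ID of the owning video
  • frame_number: frame sequence number
  • timestamp: timestamp (seconds)
  • image_url: image URL or path
  • ocr_text: text recognized by OCR
  • extracted_entities: entities extracted from this frame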
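
To show how the two tables above relate, the sketch below creates them in an in-memory SQLite database and reads back the frames of one video. The column names come from the lists above; the column types, the key constraints, and the use of SQLite are all assumptions made for illustration and may differ from the project's actual schema.sql.

```python
import sqlite3

# Assumed types: TEXT for IDs and JSON payloads, REAL for seconds and frame rate.
SCHEMA = """
CREATE TABLE videos (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    filename TEXT NOT NULL,
    duration REAL,
    fps REAL,
    resolution TEXT,            -- JSON, e.g. {"width": 1920, "height": 1080}
    audio_transcript_id TEXT,
    full_ocr_text TEXT,
    extracted_entities TEXT,    -- JSON
    extracted_relations TEXT,   -- JSON
    status TEXT
);
CREATE TABLE video_frames (
    id TEXT PRIMARY KEY,
    video_id TEXT NOT NULL REFERENCES videos(id),
    frame_number INTEGER,
    timestamp REAL,
    image_url TEXT,
    ocr_text TEXT,
    extracted_entities TEXT     -- JSON
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO videos (id, project_id, filename, status) VALUES (?, ?, ?, ?)",
    ("abc123", "proj456", "meeting.mp4", "completed"),
)
conn.execute(
    "INSERT INTO video_frames (id, video_id, frame_number, timestamp) VALUES (?, ?, ?, ?)",
    ("frame001", "abc123", 1, 0.0),
)
# Fetch the frames of one video in playback order.
frames = conn.execute(
    "SELECT frame_number, timestamp FROM video_frames WHERE video_id = ? ORDER BY frame_number",
    ("abc123",),
).fetchall()
```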

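images table

Stores image file information.

  • id: image ID
  • project_id: ID of the owning project
  • filename: file name
  • ocr_text: text recognized by OCR
  • description: image description
  • extracted_entities: extracted entities
  • extracted_relations: extracted relations
  • status: processing status

multimodal_mentions table

Stores mentions of entities across modalities.

  • id: mention ID
  • project_id: ID of the owning project
  • entity_id: entity ID
  • modality: modality type (audio/video/image/document)
  • source_id: source ID
  • source_type: source type
  • text_snippet: text snippet
  • confidence: confidence score

Stores cross-modal entity links.

  • id: link ID
  • entity_id: entity ID
  • linked_entity_id: ID of the linked entity
  • link_type: link type (same_as/related_to/part_of)
  • confidence: confidence score
  • evidence: evidence for the link
  • modalities: list of modalities involved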

Installing Dependencies

pip install ffmpeg-python pillow opencv-python pytesseract

Note: OCR requires the Tesseract OCR engine to be installed separately.

Environment Variables

# Optional: custom temporary directory
export INSIGHTFLOW_TEMP_DIR=/path/to/temp

# Optional: Tesseract path (Windows)
export TESSERACT_CMD='C:\Program Files\Tesseract-OCR\tesseract.exe'
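
For reference, a backend might read these two variables roughly as follows. This is a minimal sketch: the function name and the fallback values are assumptions, not the project's actual configuration code.

```python
import os
import tempfile


def load_multimodal_config(env=None) -> dict:
    """Read the optional settings, falling back to defaults when unset."""
    env = os.environ if env is None else env
    return {
        # Assumption: fall back to the system temp directory when unset.
        "temp_dir": env.get("INSIGHTFLOW_TEMP_DIR") or tempfile.gettempdir(),
        # None means "let pytesseract find tesseract on PATH".
        "tesseract_cmd": env.get("TESSERACT_CMD") or None,
    }
```

When tesseract_cmd is set, it can be passed to pytesseract by assigning pytesseract.pytesseract.tesseract_cmd before the first OCR call.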