Phase 7 Task 7: Plugin and integration system

- Create the plugin_manager.py module
  - PluginManager: main plugin management class
  - ChromeExtensionHandler: Chrome extension handling
  - BotHandler: Feishu/DingTalk/Slack bot handling
  - WebhookIntegration: Zapier/Make webhook integration
  - WebDAVSync: WebDAV sync management
- Create the complete Chrome extension code
  - manifest.json, background.js, content.js, content.css
  - popup.html/js: popup UI
  - options.html/js: settings page
  - Supports web clipping, saving selected text, and project selection
- Update schema.sql with plugin-related database tables
  - plugins: plugin configuration table
  - bot_sessions: bot session table
  - webhook_endpoints: webhook endpoint table
  - webdav_syncs: WebDAV sync configuration table
  - plugin_activity_logs: plugin activity log table
- Update main.py with plugin-related API endpoints
  - GET/POST /api/v1/plugins - plugin management
  - POST /api/v1/plugins/chrome/clip - save a web page from the Chrome extension
  - POST /api/v1/bots/webhook/{platform} - receive bot messages
  - GET /api/v1/bots/sessions - list bot sessions
  - POST /api/v1/webhook-endpoints - create a webhook endpoint
  - POST /webhook/{type}/{token} - receive external webhooks
  - POST /api/v1/webdav-syncs - configure WebDAV sync
  - POST /api/v1/webdav-syncs/{id}/test - test the WebDAV connection
  - POST /api/v1/webdav-syncs/{id}/sync - trigger a WebDAV sync
- Update requirements.txt with plugin dependencies
  - beautifulsoup4: HTML parsing
  - webdavclient3: WebDAV client
- Update development progress in STATUS.md and README.md
2026-02-23 12:09:15 +08:00
parent 08535e54ba
commit 797ca58e8e
27 changed files with 7350 additions and 11 deletions
--- a/README.md
+++ b/README.md
@@ -191,12 +191,12 @@ MIT
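 | Task | Status | Completed |
 |------|------|----------|
 | 1. Intelligent workflow automation | ✅ Done | 2026-02-23 |
-| 2. Multimodal support | 🚧 In progress | - |
+| 2. Multimodal support | ✅ Done | 2026-02-23 |
+| 7. Plugins and integrations | ✅ Done | 2026-02-23 |
 | 3. Data security and compliance | 📋 Planned | - |
 | 4. Collaboration and sharing | 📋 Planned | - |
 | 5. Intelligent report generation | 📋 Planned | - |
 | 6. Advanced search and discovery | 📋 Planned | - |
-| 7. Plugins and integrations | 📋 Planned | - |
 | 8. Performance optimization and scaling | 📋 Planned | - |
 **Suggested development order**: 1 → 2 → 7 → 3 → 4 → 5 → 6 → 8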
--- a/STATUS.md
+++ b/STATUS.md
@@ -1,10 +1,10 @@
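 # InsightFlow Development Status
-**Last updated**: 2026-02-23 00:00
+**Last updated**: 2026-02-23 06:00
 ## Current phase
-Phase 7: Workflow automation - **In progress 🚧**
+Phase 7: Plugins and integrations - **Done ✅**
 ## Deployment status
@@ -36,7 +36,7 @@ Phase 7: Workflow automation - **In progress 🚧**
 - Export features
 - Open API platform
-### Phase 7 - Workflow automation (in progress 🚧)
+### Phase 7 - Task 1: Workflow automation (done ✅)
 - ✅ Create workflow_manager.py - workflow management module
  - WorkflowManager: main manager class
  - WorkflowTask: workflow task definition
@@ -59,9 +59,81 @@ Phase 7: Workflow automation - **In progress 🚧**
  - POST /api/v1/webhooks/{id}/test - test a webhook
 - ✅ Update requirements.txt - add the APScheduler dependency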
 ### Phase 7 - Task 2: Multimodal support (done ✅)
 - ✅ Create multimodal_processor.py - multimodal processing module
  - VideoProcessor: video processor (audio extraction + keyframes + OCR)
  - ImageProcessor: image processor (OCR + image description)
  - MultimodalEntityExtractor: multimodal entity extractor
  - Supports multiple OCR engines: PaddleOCR/EasyOCR/Tesseract
  - Supports ffmpeg video processing
 - ✅ Create multimodal_entity_linker.py - multimodal entity linking module
  - MultimodalEntityLinker: cross-modal entity linker
  - Supports embedding similarity computation
  - Multimodal entity profile generation
  - Cross-modal relation discovery
  - Multimodal timeline generation
 - ✅ Update schema.sql - add multimodal database tables
  - videos: videos table
  - video_frames: video keyframes table
  - images: images table
  - multimodal_mentions: multimodal entity mentions table
  - multimodal_entity_links: multimodal entity links table
 - ✅ Update main.py - add multimodal API endpoints
  - POST /api/v1/projects/{id}/upload-video - upload a video
  - POST /api/v1/projects/{id}/upload-image - upload an image
  - GET /api/v1/projects/{id}/videos - list videos
  - GET /api/v1/projects/{id}/images - list images
  - GET /api/v1/videos/{id} - video details
  - GET /api/v1/images/{id} - image details
  - POST /api/v1/projects/{id}/multimodal/link-entities - link entities across modalities
  - GET /api/v1/entities/{id}/multimodal-profile - multimodal entity profile
  - GET /api/v1/projects/{id}/multimodal-timeline - multimodal timeline
  - GET /api/v1/entities/{id}/cross-modal-relations - cross-modal relations
 - ✅ Update requirements.txt - add multimodal dependencies
  - opencv-python: video processing
  - pillow: image processing
  - paddleocr/paddlepaddle: OCR engine
  - ffmpeg-python: ffmpeg wrapper
  - sentence-transformers: cross-modal alignment
 ### Phase 7 - Task 7: Plugins and integrations (done ✅)
 - ✅ Create plugin_manager.py - plugin management module
  - PluginManager: main plugin management class
  - ChromeExtensionHandler: Chrome extension API handling
  - BotHandler: Feishu/DingTalk bot handling
  - WebhookIntegration: Zapier/Make webhook integration
  - WebDAVSync: WebDAV sync management
 - ✅ Create the Chrome extension code
  - manifest.json - extension configuration
  - background.js - background script; handles the context menu and messages
  - content.js - content script; page interaction and the floating button
  - content.css - content styles
  - popup.html/js - popup window
  - options.html/js - settings page
 - ✅ Update schema.sql - add plugin-related database tables
  - plugins: plugin configuration table
  - bot_sessions: bot session table
  - webhook_endpoints: webhook endpoint table
  - webdav_syncs: WebDAV sync configuration table
  - plugin_activity_logs: plugin activity log table
 - ✅ Update main.py - add plugin-related API endpoints
  - GET/POST /api/v1/plugins - plugin management
  - POST /api/v1/plugins/chrome/clip - save a web page from the Chrome extension
  - POST /api/v1/bots/webhook/{platform} - receive bot messages
  - GET /api/v1/bots/sessions - list bot sessions
  - POST /api/v1/webhook-endpoints - create a webhook endpoint
  - POST /webhook/{type}/{token} - receive external webhooks
  - POST /api/v1/webdav-syncs - configure WebDAV sync
  - POST /api/v1/webdav-syncs/{id}/test - test the WebDAV connection
  - POST /api/v1/webdav-syncs/{id}/sync - trigger a WebDAV sync
  - GET /api/v1/plugins/{id}/logs - plugin activity logs
 - ✅ Update requirements.txt - add plugin dependencies
  - beautifulsoup4: HTML parsing
  - webdavclient3: WebDAV client
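 ## To do
-None - Phase 7 Task 1 is done
+Phase 7 Task 3: Data security and compliance
 ## Technical debt
@@ -69,6 +141,7 @@ Phase 7: Workflow automation - **In progress 🚧**
 - Entity similarity matching is currently plain substring containment; it needs an embedding-based approach
 - The frontend needs state management (it currently uses global variables)
 - ~~Need to add API documentation (OpenAPI/Swagger)~~ ✅ Done
 - Multimodal LLM image description is not implemented yet (requires integrating a multimodal model API)
 ## Deployment info
@@ -78,6 +151,36 @@ Phase 7: Workflow automation - **In progress 🚧**
 ## Recent updates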
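 ### 2026-02-23 (midday)
 - Completed Phase 7 Task 7: Plugins and integrations
  - Created the plugin_manager.py module
    - PluginManager: main plugin management class
    - ChromeExtensionHandler: Chrome extension handling
    - BotHandler: Feishu/DingTalk/Slack bot handling
    - WebhookIntegration: Zapier/Make webhook integration
    - WebDAVSync: WebDAV sync management
  - Created the complete Chrome extension code
    - manifest.json, background.js, content.js
    - popup.html/js, options.html/js
    - Supports web clipping, saving selected text, and project selection
  - Updated schema.sql with plugin-related database tables
  - Updated main.py with plugin-related API endpoints
  - Updated requirements.txt with plugin dependencies
 ### 2026-02-23 (morning)
 - Completed Phase 7 Task 2: Multimodal support
  - Created the multimodal_processor.py module
    - VideoProcessor: video processing (audio extraction + keyframes + OCR)
    - ImageProcessor: image processing (OCR + image description)
    - MultimodalEntityExtractor: multimodal entity extraction
  - Created the multimodal_entity_linker.py module
    - MultimodalEntityLinker: cross-modal entity linking
    - Supports embedding similarity computation
    - Multimodal entity profiles and timelines
  - Updated schema.sql with multimodal database tables
  - Updated main.py with multimodal API endpoints
  - Updated requirements.txt with multimodal dependencies
 ### 2026-02-23
 - Completed Phase 7 Task 1: Workflow automation module
  - Created the workflow_manager.py module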
--- a/backend/pycache/db_manager.cpython-312.pyc
+++ b/backend/pycache/db_manager.cpython-312.pyc
--- a/backend/pycache/image_processor.cpython-312.pyc
+++ b/backend/pycache/image_processor.cpython-312.pyc
--- a/backend/pycache/main.cpython-312.pyc
+++ b/backend/pycache/main.cpython-312.pyc
--- a/backend/pycache/multimodal_entity_linker.cpython-312.pyc
+++ b/backend/pycache/multimodal_entity_linker.cpython-312.pyc
--- a/backend/pycache/multimodal_processor.cpython-312.pyc
+++ b/backend/pycache/multimodal_processor.cpython-312.pyc
--- a/backend/db_manager.py
+++ b/backend/db_manager.py
@@ -878,6 +878,310 @@ class DatabaseManager:
                filtered.append(entity)
        return filtered
    # ==================== Phase 7: Multimodal Support ====================
    def create_video(self, video_id: str, project_id: str, filename: str, 
                     duration: float = 0, fps: float = 0, resolution: Dict = None,
                     audio_transcript_id: str = None, full_ocr_text: str = "",
                     extracted_entities: List[Dict] = None, 
                     extracted_relations: List[Dict] = None) -> str:
        """Create a video record."""
        conn = self.get_conn()
        now = datetime.now().isoformat()
        conn.execute(
            """INSERT INTO videos 
               (id, project_id, filename, duration, fps, resolution,
                audio_transcript_id, full_ocr_text, extracted_entities, 
                extracted_relations, status, created_at, updated_at)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (video_id, project_id, filename, duration, fps,
             json.dumps(resolution) if resolution else None,
             audio_transcript_id, full_ocr_text,
             json.dumps(extracted_entities or []),
             json.dumps(extracted_relations or []),
             'completed', now, now)
        )
        conn.commit()
        conn.close()
        return video_id
    def get_video(self, video_id: str) -> Optional[Dict]:
        """Fetch a video record, decoding its JSON columns."""
        conn = self.get_conn()
        row = conn.execute(
            "SELECT * FROM videos WHERE id = ?", (video_id,)
        ).fetchone()
        conn.close()
        if row:
            data = dict(row)
            data['resolution'] = json.loads(data['resolution']) if data['resolution'] else None
            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
            data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else []
            return data
        return None
    def list_project_videos(self, project_id: str) -> List[Dict]:
        """List all videos in a project, newest first."""
        conn = self.get_conn()
        rows = conn.execute(
            "SELECT * FROM videos WHERE project_id = ? ORDER BY created_at DESC",
            (project_id,)
        ).fetchall()
        conn.close()
        videos = []
        for row in rows:
            data = dict(row)
            data['resolution'] = json.loads(data['resolution']) if data['resolution'] else None
            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
            data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else []
            videos.append(data)
        return videos
    def create_video_frame(self, frame_id: str, video_id: str, frame_number: int,
                          timestamp: float, image_url: str = None, 
                          ocr_text: str = None, extracted_entities: List[Dict] = None) -> str:
        """Create a video frame record."""
        conn = self.get_conn()
        now = datetime.now().isoformat()
        conn.execute(
            """INSERT INTO video_frames 
               (id, video_id, frame_number, timestamp, image_url, ocr_text, extracted_entities, created_at)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
            (frame_id, video_id, frame_number, timestamp, image_url, ocr_text,
             json.dumps(extracted_entities or []), now)
        )
        conn.commit()
        conn.close()
        return frame_id
    def get_video_frames(self, video_id: str) -> List[Dict]:
        """List all frames of a video, ordered by timestamp."""
        conn = self.get_conn()
        rows = conn.execute(
            """SELECT * FROM video_frames WHERE video_id = ? ORDER BY timestamp""",
            (video_id,)
        ).fetchall()
        conn.close()
        frames = []
        for row in rows:
            data = dict(row)
            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
            frames.append(data)
        return frames
    def create_image(self, image_id: str, project_id: str, filename: str,
                     ocr_text: str = "", description: str = "",
                     extracted_entities: List[Dict] = None,
                     extracted_relations: List[Dict] = None) -> str:
        """Create an image record."""
        conn = self.get_conn()
        now = datetime.now().isoformat()
        conn.execute(
            """INSERT INTO images 
               (id, project_id, filename, ocr_text, description,
                extracted_entities, extracted_relations, status, created_at, updated_at)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (image_id, project_id, filename, ocr_text, description,
             json.dumps(extracted_entities or []),
             json.dumps(extracted_relations or []),
             'completed', now, now)
        )
        conn.commit()
        conn.close()
        return image_id
    def get_image(self, image_id: str) -> Optional[Dict]:
        """Fetch an image record, decoding its JSON columns."""
        conn = self.get_conn()
        row = conn.execute(
            "SELECT * FROM images WHERE id = ?", (image_id,)
        ).fetchone()
        conn.close()
        if row:
            data = dict(row)
            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
            data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else []
            return data
        return None
    def list_project_images(self, project_id: str) -> List[Dict]:
        """List all images in a project, newest first."""
        conn = self.get_conn()
        rows = conn.execute(
            "SELECT * FROM images WHERE project_id = ? ORDER BY created_at DESC",
            (project_id,)
        ).fetchall()
        conn.close()
        images = []
        for row in rows:
            data = dict(row)
            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
            data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else []
            images.append(data)
        return images
    def create_multimodal_mention(self, mention_id: str, project_id: str, 
                                  entity_id: str, modality: str, source_id: str,
                                  source_type: str, text_snippet: str = "",
                                  confidence: float = 1.0) -> str:
        """Create (or replace) a multimodal entity mention record."""
        conn = self.get_conn()
        now = datetime.now().isoformat()
        conn.execute(
            """INSERT OR REPLACE INTO multimodal_mentions 
               (id, project_id, entity_id, modality, source_id, source_type, 
                text_snippet, confidence, created_at)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (mention_id, project_id, entity_id, modality, source_id, 
             source_type, text_snippet, confidence, now)
        )
        conn.commit()
        conn.close()
        return mention_id
    def get_entity_multimodal_mentions(self, entity_id: str) -> List[Dict]:
        """List an entity's mentions across modalities, newest first."""
        conn = self.get_conn()
        rows = conn.execute(
            """SELECT m.*, e.name as entity_name
               FROM multimodal_mentions m
               JOIN entities e ON m.entity_id = e.id
               WHERE m.entity_id = ? ORDER BY m.created_at DESC""",
            (entity_id,)
        ).fetchall()
        conn.close()
        return [dict(r) for r in rows]
    def get_project_multimodal_mentions(self, project_id: str, 
                                        modality: str = None) -> List[Dict]:
        """List a project's multimodal mentions, optionally filtered by modality."""
        conn = self.get_conn()
        if modality:
            rows = conn.execute(
                """SELECT m.*, e.name as entity_name
                   FROM multimodal_mentions m
                   JOIN entities e ON m.entity_id = e.id
                   WHERE m.project_id = ? AND m.modality = ?
                   ORDER BY m.created_at DESC""",
                (project_id, modality)
            ).fetchall()
        else:
            rows = conn.execute(
                """SELECT m.*, e.name as entity_name
                   FROM multimodal_mentions m
                   JOIN entities e ON m.entity_id = e.id
                   WHERE m.project_id = ? ORDER BY m.created_at DESC""",
                (project_id,)
            ).fetchall()
        conn.close()
        return [dict(r) for r in rows]
    def create_multimodal_entity_link(self, link_id: str, entity_id: str,
                                      linked_entity_id: str, link_type: str,
                                      confidence: float = 1.0, 
                                      evidence: str = "",
                                      modalities: List[str] = None) -> str:
        """Create (or replace) a multimodal entity link."""
        conn = self.get_conn()
        now = datetime.now().isoformat()
        conn.execute(
            """INSERT OR REPLACE INTO multimodal_entity_links 
               (id, entity_id, linked_entity_id, link_type, confidence, 
                evidence, modalities, created_at)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
            (link_id, entity_id, linked_entity_id, link_type, confidence,
             evidence, json.dumps(modalities or []), now)
        )
        conn.commit()
        conn.close()
        return link_id
    def get_entity_multimodal_links(self, entity_id: str) -> List[Dict]:
        """List multimodal links in which the entity appears on either side."""
        conn = self.get_conn()
        rows = conn.execute(
            """SELECT l.*, e1.name as entity_name, e2.name as linked_entity_name
               FROM multimodal_entity_links l
               JOIN entities e1 ON l.entity_id = e1.id
               JOIN entities e2 ON l.linked_entity_id = e2.id
               WHERE l.entity_id = ? OR l.linked_entity_id = ?""",
            (entity_id, entity_id)
        ).fetchall()
        conn.close()
        links = []
        for row in rows:
            data = dict(row)
            data['modalities'] = json.loads(data['modalities']) if data['modalities'] else []
            links.append(data)
        return links
    def get_project_multimodal_stats(self, project_id: str) -> Dict:
        """Collect multimodal statistics for a project."""
        conn = self.get_conn()
        stats = {
            'video_count': 0,
            'image_count': 0,
            'multimodal_entity_count': 0,
            'cross_modal_links': 0,
            'modality_distribution': {}
        }
        # Number of videos
        row = conn.execute(
            "SELECT COUNT(*) as count FROM videos WHERE project_id = ?",
            (project_id,)
        ).fetchone()
        stats['video_count'] = row['count']
        # Number of images
        row = conn.execute(
            "SELECT COUNT(*) as count FROM images WHERE project_id = ?",
            (project_id,)
        ).fetchone()
        stats['image_count'] = row['count']
        # Number of distinct entities with multimodal mentions
        row = conn.execute(
            """SELECT COUNT(DISTINCT entity_id) as count 
               FROM multimodal_mentions WHERE project_id = ?""",
            (project_id,)
        ).fetchone()
        stats['multimodal_entity_count'] = row['count']
        # Number of cross-modal links
        row = conn.execute(
            """SELECT COUNT(*) as count FROM multimodal_entity_links 
               WHERE entity_id IN (SELECT id FROM entities WHERE project_id = ?)""",
            (project_id,)
        ).fetchone()
        stats['cross_modal_links'] = row['count']
        # Mention count per modality
        for modality in ['audio', 'video', 'image', 'document']:
            row = conn.execute(
                """SELECT COUNT(*) as count FROM multimodal_mentions 
                   WHERE project_id = ? AND modality = ?""",
                (project_id, modality)
            ).fetchone()
            stats['modality_distribution'][modality] = row['count']
        conn.close()
        return stats
 # Singleton instance
 _db_manager = None
--- a/backend/docs/multimodal_api.md
+++ b/backend/docs/multimodal_api.md
@@ -0,0 +1,308 @@
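 # InsightFlow Phase 7 - Multimodal Support API Documentation
 ## Overview
 The Phase 7 multimodal support module lets InsightFlow process videos and images. It supports:
 1. **Video processing**: audio extraction, keyframe extraction, and OCR
 2. **Image processing**: recognition of whiteboards, slides, handwritten notes, and similar content
 3. **Multimodal entity linking**: cross-modal entity alignment and knowledge fusion
 ## New API endpoints
 ### Video processing
 #### Upload a video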
 ```
 POST /api/v1/projects/{project_id}/upload-video
 ```
 **Parameters:**
 - `file` (required): video file
 - `extract_interval` (optional): keyframe extraction interval in seconds; defaults to 5
 **Response:**
 ```json
 {
  "video_id": "abc123",
  "project_id": "proj456",
  "filename": "meeting.mp4",
  "status": "completed",
  "audio_extracted": true,
  "frame_count": 24,
  "ocr_text_preview": "会议内容预览...",
  "message": "Video processed successfully"
 }
 ```
 #### List project videos
 ```
 GET /api/v1/projects/{project_id}/videos
 ```
 **Response:**
 ```json
 [
  {
    "id": "abc123",
    "filename": "meeting.mp4",
    "duration": 120.5,
    "fps": 30.0,
    "resolution": {"width": 1920, "height": 1080},
    "ocr_preview": "会议内容...",
    "status": "completed",
    "created_at": "2024-01-15T10:30:00"
  }
 ]
 ```
 #### Get video keyframes
 ```
 GET /api/v1/videos/{video_id}/frames
 ```
 **Response:**
 ```json
 [
  {
    "id": "frame001",
    "frame_number": 1,
    "timestamp": 0.0,
    "image_url": "/tmp/frames/video123/frame_000001_0.00.jpg",
    "ocr_text": "第一页内容...",
    "entities": [{"name": "Project Alpha", "type": "PROJECT"}]
  }
 ]
 ```
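To make the `extract_interval` parameter and the frame fields above concrete, here is a minimal sketch of how keyframe timestamps and file names could be derived. The helper names `keyframe_timestamps` and `frame_filename` are hypothetical, and the file-name pattern is only inferred from the sample `image_url`; the actual video processor may sample frames differently.

```python
from typing import List


def keyframe_timestamps(duration: float, extract_interval: float = 5.0) -> List[float]:
    """Return sampling timestamps (seconds): 0, interval, 2*interval, ... below duration."""
    if extract_interval <= 0:
        raise ValueError("extract_interval must be positive")
    timestamps = []
    index = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while index * extract_interval < duration:
        timestamps.append(index * extract_interval)
        index += 1
    return timestamps


def frame_filename(frame_number: int, timestamp: float) -> str:
    """Build a name like the sample response's (pattern assumed, not confirmed)."""
    return f"frame_{frame_number:06d}_{timestamp:.2f}.jpg"
```

Under these assumptions, a 12-second video sampled every 5 seconds yields frames at 0, 5, and 10 seconds.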
 ### Image processing
 #### Upload an image
 ```
 POST /api/v1/projects/{project_id}/upload-image
 ```
 **Parameters:**
 - `file` (required): image file
 - `detect_type` (optional): whether to detect the image type automatically; defaults to true
 **Response:**
 ```json
 {
  "image_id": "img789",
  "project_id": "proj456",
  "filename": "whiteboard.jpg",
  "image_type": "whiteboard",
  "ocr_text_preview": "白板内容...",
  "description": "这是一张白板图片。内容摘要：...",
  "entity_count": 5,
  "status": "completed"
 }
 ```
 #### Upload images in batch
 ```
 POST /api/v1/projects/{project_id}/upload-images-batch
 ```
 **Parameters:**
 - `files` (required): multiple image files
 **Response:**
 ```json
 {
  "project_id": "proj456",
  "total_count": 3,
  "success_count": 3,
  "failed_count": 0,
  "results": [
    {
      "image_id": "img001",
      "status": "success",
      "image_type": "ppt",
      "entity_count": 4
    }
  ]
 }
 ```
 #### List project images
 ```
 GET /api/v1/projects/{project_id}/images
 ```
 ### Multimodal entity linking
 #### Align entities across modalities
 ```
 POST /api/v1/projects/{project_id}/multimodal/align
 ```
 **Parameters:**
 - `threshold` (optional): similarity threshold; defaults to 0.85
 **Response:**
 ```json
 {
  "project_id": "proj456",
  "aligned_count": 5,
  "links": [
    {
      "link_id": "link001",
      "source_entity_id": "ent001",
      "target_entity_id": "ent002",
      "source_modality": "video",
      "target_modality": "document",
      "link_type": "same_as",
      "confidence": 0.95,
      "evidence": "Cross-modal alignment: exact"
    }
  ],
  "message": "Successfully aligned 5 cross-modal entity pairs"
 }
 ```
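The `threshold` parameter can be illustrated with a small sketch. It assumes each entity already has an embedding vector and uses cosine similarity as the score; the function names and the best-match-per-source rule are assumptions made for illustration, not the documented behavior of the alignment endpoint.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_entities(
    source: Dict[str, Sequence[float]],
    target: Dict[str, Sequence[float]],
    threshold: float = 0.85,
) -> List[Tuple[str, str, float]]:
    """Link each source entity to its best-scoring target at or above threshold."""
    links = []
    for source_id, source_vec in source.items():
        best_id, best_score = None, threshold
        for target_id, target_vec in target.items():
            score = cosine_similarity(source_vec, target_vec)
            if score >= best_score:
                best_id, best_score = target_id, score
        if best_id is not None:
            links.append((source_id, best_id, round(best_score, 4)))
    return links
```

Raising the threshold trades recall for precision: fewer pairs are linked, but each link is more likely to refer to the same real-world entity.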
 #### Get multimodal statistics
 ```
 GET /api/v1/projects/{project_id}/multimodal/stats
 ```
 **Response:**
 ```json
 {
  "project_id": "proj456",
  "video_count": 3,
  "image_count": 10,
  "multimodal_entity_count": 25,
  "cross_modal_links": 8,
  "modality_distribution": {
    "audio": 15,
    "video": 8,
    "image": 12,
    "document": 20
  }
 }
 ```
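A `modality_distribution` object like the one above can be computed with one grouped query rather than one query per modality. The sketch below runs against a cut-down in-memory `multimodal_mentions` table containing only the columns it needs; that reduced schema and the function names are assumptions for illustration, not the definitions in schema.sql.

```python
import sqlite3
from typing import Dict

MODALITIES = ("audio", "video", "image", "document")


def modality_distribution(conn: sqlite3.Connection, project_id: str) -> Dict[str, int]:
    """Count mentions per modality, reporting 0 for modalities with no rows."""
    counts = {modality: 0 for modality in MODALITIES}
    rows = conn.execute(
        "SELECT modality, COUNT(*) FROM multimodal_mentions "
        "WHERE project_id = ? GROUP BY modality",
        (project_id,),
    ).fetchall()
    for modality, count in rows:
        if modality in counts:
            counts[modality] = count
    return counts


def demo() -> Dict[str, int]:
    """Build a tiny in-memory table (assumed schema) and compute the distribution."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE multimodal_mentions (id TEXT PRIMARY KEY, project_id TEXT, modality TEXT)"
    )
    conn.executemany(
        "INSERT INTO multimodal_mentions VALUES (?, ?, ?)",
        [
            ("m1", "proj456", "video"),
            ("m2", "proj456", "video"),
            ("m3", "proj456", "audio"),
            ("m4", "other", "image"),  # different project, must be excluded
        ],
    )
    try:
        return modality_distribution(conn, "proj456")
    finally:
        conn.close()
```

Pre-filling the dictionary with zeros keeps the response shape stable even when a modality has no mentions yet.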
 #### Get an entity's multimodal mentions
 ```
 GET /api/v1/entities/{entity_id}/multimodal-mentions
 ```
 **Response:**
 ```json
 [
  {
    "id": "mention001",
    "entity_id": "ent001",
    "entity_name": "Project Alpha",
    "modality": "video",
    "source_id": "video123",
    "source_type": "video_frame",
    "text_snippet": "Project Alpha 进度",
    "confidence": 1.0,
    "created_at": "2024-01-15T10:30:00"
  }
 ]
 ```
 #### Suggest multimodal entity merges
 ```
 GET /api/v1/projects/{project_id}/multimodal/suggest-merges
 ```
 **Response:**
 ```json
 {
  "project_id": "proj456",
  "suggestion_count": 3,
  "suggestions": [
    {
      "entity1": {"id": "ent001", "name": "K8s", "type": "TECH"},
      "entity2": {"id": "ent002", "name": "Kubernetes", "type": "TECH"},
      "similarity": 0.95,
      "match_type": "alias_match",
      "suggested_action": "merge"
    }
  ]
 }
 ```
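As a rough illustration of how a suggestion such as the `K8s` / `Kubernetes` pair could be produced, the sketch below combines a small alias table with a string-similarity fallback from the standard library. The alias table, the `fuzzy_match` label, the threshold, and the scores are all assumptions; the real endpoint may use embeddings or different rules and will not necessarily return the same similarity values.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List

# Hypothetical alias table; a real deployment would load this from data.
ALIASES = {"k8s": "kubernetes"}


def canonical(name: str) -> str:
    """Lowercase a name and map known aliases to their canonical form."""
    lowered = name.strip().lower()
    return ALIASES.get(lowered, lowered)


def suggest_merges(entities: List[Dict[str, str]], threshold: float = 0.85) -> List[Dict]:
    """Suggest merging same-type entity pairs whose names match by alias or closely."""
    suggestions = []
    for first, second in combinations(entities, 2):
        if first["type"] != second["type"]:
            continue
        if canonical(first["name"]) == canonical(second["name"]):
            similarity, match_type = 1.0, "alias_match"
        else:
            similarity = SequenceMatcher(
                None, first["name"].lower(), second["name"].lower()
            ).ratio()
            match_type = "fuzzy_match"
        if similarity >= threshold:
            suggestions.append({
                "entity1": first,
                "entity2": second,
                "similarity": round(similarity, 2),
                "match_type": match_type,
                "suggested_action": "merge",
            })
    return suggestions
```

Restricting comparisons to entities of the same `type` avoids suggesting merges between, for example, a person and a technology that happen to share a name.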
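 ## Database schema
 ### videos table
 Stores video file information
 - `id`: video ID
 - `project_id`: owning project ID
 - `filename`: file name
 - `duration`: video duration (seconds)
 - `fps`: frame rate
 - `resolution`: resolution (JSON)
 - `audio_transcript_id`: ID of the associated audio transcript
 - `full_ocr_text`: OCR text from all frames, merged
 - `extracted_entities`: extracted entities (JSON)
 - `extracted_relations`: extracted relations (JSON)
 - `status`: processing status
 ### video_frames table
 Stores video keyframe information
 - `id`: frame ID
 - `video_id`: owning video ID
 - `frame_number`: frame sequence number
 - `timestamp`: timestamp (seconds)
 - `image_url`: image URL or path
 - `ocr_text`: OCR text
 - `extracted_entities`: entities extracted from this frame
 ### images table
 Stores image file information
 - `id`: image ID
 - `project_id`: owning project ID
 - `filename`: file name
 - `ocr_text`: OCR text
 - `description`: image description
 - `extracted_entities`: extracted entities
 - `extracted_relations`: extracted relations
 - `status`: processing status
 ### multimodal_mentions table
 Stores entity mentions across modalities
 - `id`: mention ID
 - `project_id`: owning project ID
 - `entity_id`: entity ID
 - `modality`: modality type (audio/video/image/document)
 - `source_id`: source ID
 - `source_type`: source type
 - `text_snippet`: text snippet
 - `confidence`: confidence score
 ### multimodal_entity_links table
 Stores cross-modal entity links
 - `id`: link ID
 - `entity_id`: entity ID
 - `linked_entity_id`: linked entity ID
 - `link_type`: link type (same_as/related_to/part_of)
 - `confidence`: confidence score
 - `evidence`: evidence for the link
 - `modalities`: list of modalities involved
 ## Installing dependencies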
 ```bash
 pip install ffmpeg-python pillow opencv-python pytesseract
 ```
 Note: the OCR features require the Tesseract OCR engine to be installed:
 - Ubuntu/Debian: `sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim`
 - macOS: `brew install tesseract tesseract-lang`
 - Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki
 ## Environment variables
 ```bash
 # Optional: custom temporary directory
 export INSIGHTFLOW_TEMP_DIR=/path/to/temp
 # Optional: Tesseract path (Windows); quoted because the path contains a space
 export TESSERACT_CMD='C:\Program Files\Tesseract-OCR\tesseract.exe'
 ```
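One way an application could consume these two variables is sketched below. The helper names and the fallback temporary directory are assumptions for illustration; the only library detail relied on is that pytesseract reads the executable path from `pytesseract.pytesseract.tesseract_cmd`.

```python
import os
import tempfile
from typing import Mapping, Optional, Tuple


def resolve_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (temp_dir, tesseract_cmd), falling back when variables are unset."""
    # Fallback location is an assumption, not InsightFlow's documented default.
    temp_dir = env.get("INSIGHTFLOW_TEMP_DIR") or os.path.join(
        tempfile.gettempdir(), "insightflow"
    )
    tesseract_cmd = env.get("TESSERACT_CMD") or None
    return temp_dir, tesseract_cmd


def apply_tesseract_cmd(tesseract_cmd: Optional[str]) -> bool:
    """Point pytesseract at a custom binary; return False if nothing was applied."""
    if not tesseract_cmd:
        return False
    try:
        import pytesseract
    except ImportError:
        return False
    pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    return True
```

Passing the environment as a mapping keeps the lookup easy to test without modifying the real process environment.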
--- a/backend/image_processor.py
+++ b/backend/image_processor.py
@@ -0,0 +1,547 @@
 #!/usr/bin/env python3
 """
 InsightFlow Image Processor - Phase 7
 Image processing module: recognizes content from whiteboards, slides, handwritten notes, and similar images
 """
 import os
 import io
 import json
 import uuid
 import base64
 from typing import List, Dict, Optional, Tuple
 from dataclasses import dataclass
 from pathlib import Path
 # Try to import the optional image-processing libraries
 try:
    from PIL import Image, ImageEnhance, ImageFilter
    PIL_AVAILABLE = True
 except ImportError:
    PIL_AVAILABLE = False
 try:
    import cv2
    import numpy as np
    CV2_AVAILABLE = True
 except ImportError:
    CV2_AVAILABLE = False
 try:
    import pytesseract
    PYTESSERACT_AVAILABLE = True
 except ImportError:
    PYTESSERACT_AVAILABLE = False
@dataclass
 class ImageEntity:
    """An entity detected in an image."""
    name: str
    type: str
    confidence: float
    bbox: Optional[Tuple[int, int, int, int]] = None  # (x, y, width, height)
@dataclass
 class ImageRelation:
    """A relation detected in an image."""
    source: str
    target: str
    relation_type: str
    confidence: float
@dataclass
 class ImageProcessingResult:
    """Result of processing a single image."""
    image_id: str
    image_type: str  # whiteboard, ppt, handwritten, screenshot, other
    ocr_text: str
    description: str
    entities: List[ImageEntity]
    relations: List[ImageRelation]
    width: int
    height: int
    success: bool
    error_message: str = ""
@dataclass
 class BatchProcessingResult:
    """Result of processing a batch of images."""
    results: List[ImageProcessingResult]
    total_count: int
    success_count: int
    failed_count: int
 class ImageProcessor:
    """Image processor that handles several kinds of images."""
    # Image type definitions (labels are used in generated descriptions)
    IMAGE_TYPES = {
        'whiteboard': '白板',
        'ppt': 'PPT/演示文稿',
        'handwritten': '手写笔记',
        'screenshot': '屏幕截图',
        'document': '文档图片',
        'other': '其他'
    }
    def __init__(self, temp_dir: str = None):
        """
        Initialize the image processor.
        Args:
            temp_dir: directory for temporary files
        """
        self.temp_dir = temp_dir or os.path.join(os.getcwd(), 'temp', 'images')
        os.makedirs(self.temp_dir, exist_ok=True)
    def preprocess_image(self, image, image_type: str = None):
        """
        Preprocess an image to improve OCR quality.
        Args:
            image: PIL Image object
            image_type: image type (used for type-specific processing)
        Returns:
            The processed image
        """
        if not PIL_AVAILABLE:
            return image
        try:
            # Convert RGBA to RGB
            if image.mode == 'RGBA':
                image = image.convert('RGB')
            # Apply type-specific processing
            if image_type == 'whiteboard':
                # Whiteboard: boost contrast and remove the background
                image = self._enhance_whiteboard(image)
            elif image_type == 'handwritten':
                # Handwritten notes: denoise and boost contrast
                image = self._enhance_handwritten(image)
            elif image_type == 'screenshot':
                # Screenshot: light sharpening
                image = image.filter(ImageFilter.SHARPEN)
            # General step: downscale the image if it is too large
            max_size = 4096
            if max(image.size) > max_size:
                ratio = max_size / max(image.size)
                new_size = (int(image.size[0] * ratio), int(image.size[1] * ratio))
                image = image.resize(new_size, Image.Resampling.LANCZOS)
            return image
        except Exception as e:
            print(f"Image preprocessing error: {e}")
            return image
    def _enhance_whiteboard(self, image):
        """Enhance a whiteboard image."""
        # Convert to grayscale
        gray = image.convert('L')
        # Boost contrast
        enhancer = ImageEnhance.Contrast(gray)
        enhanced = enhancer.enhance(2.0)
        # Binarize with a fixed threshold
        threshold = 128
        binary = enhanced.point(lambda x: 0 if x < threshold else 255, '1')
        return binary.convert('L')
    def _enhance_handwritten(self, image):
        """Enhance an image of handwritten notes."""
        # Convert to grayscale
        gray = image.convert('L')
        # Light denoising
        blurred = gray.filter(ImageFilter.GaussianBlur(radius=1))
        # Boost contrast
        enhancer = ImageEnhance.Contrast(blurred)
        enhanced = enhancer.enhance(1.5)
        return enhanced
    def detect_image_type(self, image, ocr_text: str = "") -> str:
        """
        Detect the image type automatically.
        Args:
            image: PIL Image object
            ocr_text: text recognized by OCR
        Returns:
            Image type string
        """
        if not PIL_AVAILABLE:
            return 'other'
        try:
            # Decide the type from image features and OCR content
            width, height = image.size
            aspect_ratio = width / height
            # Check for slides (usually 16:9 or 4:3)
            if 1.3 <= aspect_ratio <= 1.8:
                # Look for typical slide markers (slide/page numbering keywords)
                if any(keyword in ocr_text.lower() for keyword in ['slide', 'page', '第', '页']):
                    return 'ppt'
            # Check for a whiteboard (lots of handwriting, possibly arrows and boxes)
            if CV2_AVAILABLE:
                img_array = np.array(image.convert('RGB'))
                gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)
                # Detect edges (whiteboards usually contain many lines)
                edges = cv2.Canny(gray, 50, 150)
                edge_ratio = np.sum(edges > 0) / edges.size
                # A high edge ratio suggests a whiteboard
                if edge_ratio > 0.05 and len(ocr_text) > 50:
                    return 'whiteboard'
            # Check for handwritten notes (dense text, possibly doodles)
            if len(ocr_text) > 100 and aspect_ratio < 1.5:
                # Heuristic only: irregular line heights are not actually measured here
                return 'handwritten'
            # Check for a screenshot (UI-related keywords)
            if any(keyword in ocr_text.lower() for keyword in ['button', 'menu', 'click', '登录', '确定', '取消']):
                return 'screenshot'
            # Fall back to the document type for long text
            if len(ocr_text) > 200:
                return 'document'
            return 'other'
        except Exception as e:
            print(f"Image type detection error: {e}")
            return 'other'
    def perform_ocr(self, image, lang: str = 'chi_sim+eng') -> Tuple[str, float]:
        """
        Run OCR on an image.
        Args:
            image: PIL Image object
            lang: OCR language(s)
        Returns:
            (recognized text, confidence)
        """
        if not PYTESSERACT_AVAILABLE:
            return "", 0.0
        try:
            # Preprocess the image
            processed_image = self.preprocess_image(image)
            # Run OCR
            text = pytesseract.image_to_string(processed_image, lang=lang)
            # Compute the mean word confidence using the same language models as the
            # text pass; Tesseract reports -1 for boxes that contain no text
            data = pytesseract.image_to_data(processed_image, lang=lang, output_type=pytesseract.Output.DICT)
            confidences = [float(c) for c in data['conf'] if float(c) > 0]
            avg_confidence = sum(confidences) / len(confidences) if confidences else 0
            return text.strip(), avg_confidence / 100.0
        except Exception as e:
            print(f"OCR error: {e}")
            return "", 0.0
    def extract_entities_from_text(self, text: str) -> List[ImageEntity]:
        """
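        Extract entities from OCR text.
        Args:
            text: text recognized by OCR
        Returns:
            List of entities
        """
        entities = []
        # Simple rule-based extraction (could be replaced by an LLM call)
        # Capture capitalized phrases, which are likely proper nouns
        import re
        # Project names (usually capitalized or quoted)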
        project_pattern = r'["\']([^"\']+)["\']|([A-Z][a-zA-Z0-9]*(?:\s+[A-Z][a-zA-Z0-9]*)+)'
        for match in re.finditer(project_pattern, text):
            name = match.group(1) or match.group(2)
            if name and len(name) > 2:
                entities.append(ImageEntity(
                    name=name.strip(),
                    type='PROJECT',
                    confidence=0.7
                ))
        # Chinese personal names followed by a title or honorific
        name_pattern = r'([\u4e00-\u9fa5]{2,4})(?:先生|女士|总|经理|工程师|老师)'
        for match in re.finditer(name_pattern, text):
            entities.append(ImageEntity(
                name=match.group(1),
                type='PERSON',
                confidence=0.8
            ))
        # Technical terms
        tech_keywords = ['K8s', 'Kubernetes', 'Docker', 'API', 'SDK', 'AI', 'ML', 
                        'Python', 'Java', 'React', 'Vue', 'Node.js', '数据库', '服务器']
        for keyword in tech_keywords:
            if keyword in text:
                entities.append(ImageEntity(
                    name=keyword,
                    type='TECH',
                    confidence=0.9
                ))
        # Deduplicate by (lowercased name, type)
        seen = set()
        unique_entities = []
        for e in entities:
            key = (e.name.lower(), e.type)
            if key not in seen:
                seen.add(key)
                unique_entities.append(e)
        return unique_entities
    def generate_description(self, image_type: str, ocr_text: str, 
                            entities: List[ImageEntity]) -> str:
        """
        Generate a description of the image.
        Args:
            image_type: image type
            ocr_text: OCR text
            entities: detected entities
        Returns:
            image description
        """
        type_name = self.IMAGE_TYPES.get(image_type, '图片')
        description_parts = [f"这是一张{type_name}图片。"]
        if ocr_text:
            # Use the first 200 characters as a summary
            text_preview = ocr_text[:200].replace('\n', ' ')
            if len(ocr_text) > 200:
                text_preview += "..."
            description_parts.append(f"内容摘要：{text_preview}")
        if entities:
            entity_names = [e.name for e in entities[:5]]  # show at most 5 entities
            description_parts.append(f"识别到的关键实体：{', '.join(entity_names)}")
        return " ".join(description_parts)
    def process_image(self, image_data: bytes, filename: str = None,
                     image_id: str = None, detect_type: bool = True) -> ImageProcessingResult:
        """
        Process a single image.
        Args:
            image_data: raw image bytes
            filename: file name
            image_id: image ID (optional)
            detect_type: whether to detect the image type automatically
        Returns:
            image processing result
        """
        image_id = image_id or str(uuid.uuid4())[:8]
        if not PIL_AVAILABLE:
            return ImageProcessingResult(
                image_id=image_id,
                image_type='other',
                ocr_text='',
                description='PIL not available',
                entities=[],
                relations=[],
                width=0,
                height=0,
                success=False,
                error_message='PIL library not available'
            )
        try:
            # Load the image
            image = Image.open(io.BytesIO(image_data))
            width, height = image.size
            # Run OCR
            ocr_text, ocr_confidence = self.perform_ocr(image)
            # Detect the image type
            image_type = 'other'
            if detect_type:
                image_type = self.detect_image_type(image, ocr_text)
            # Extract entities
            entities = self.extract_entities_from_text(ocr_text)
            # Generate the description
            description = self.generate_description(image_type, ocr_text, entities)
            # Extract relations (based on entity co-occurrence)
            relations = self._extract_relations(entities, ocr_text)
            # Save the image file (optional)
            if filename:
                # Keep only the base name so a caller-supplied path cannot escape temp_dir
                safe_name = os.path.basename(filename)
                save_path = os.path.join(self.temp_dir, f"{image_id}_{safe_name}")
                image.save(save_path)
            return ImageProcessingResult(
                image_id=image_id,
                image_type=image_type,
                ocr_text=ocr_text,
                description=description,
                entities=entities,
                relations=relations,
                width=width,
                height=height,
                success=True
            )
        except Exception as e:
            return ImageProcessingResult(
                image_id=image_id,
                image_type='other',
                ocr_text='',
                description='',
                entities=[],
                relations=[],
                width=0,
                height=0,
                success=False,
                error_message=str(e)
            )
    def _extract_relations(self, entities: List[ImageEntity], text: str) -> List[ImageRelation]:
        """
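        Extract entity relations from text.
        Args:
            entities: list of entities
            text: text content
        Returns:
            list of relations
        """
        relations = []
        if len(entities) < 2:
            return relations
        # Simple relation extraction: two entities in the same sentence are treated as related.
        # Split on every sentence terminator; the previous version normalized '！' and '？'
        # but then split on '.' only, so those sentences were never separated.
        import re
        sentences = re.split(r'[.!?。！？]', text)
        for sentence in sentences:
            sentence_entities = []
            for entity in entities:
                if entity.name in sentence:
                    sentence_entities.append(entity)
            # Link every pair of entities found in the same sentence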
            if len(sentence_entities) >= 2:
                for i in range(len(sentence_entities)):
                    for j in range(i + 1, len(sentence_entities)):
                        relations.append(ImageRelation(
                            source=sentence_entities[i].name,
                            target=sentence_entities[j].name,
                            relation_type='related',
                            confidence=0.5
                        ))
        return relations
    def process_batch(self, images_data: List[Tuple[bytes, str]], 
                     project_id: str = None) -> BatchProcessingResult:
        """
        Process a batch of images.
        Args:
            images_data: list of images, each item being (image_data, filename)
            project_id: project ID (currently unused)
        Returns:
            batch processing result
        """
        results = []
        success_count = 0
        failed_count = 0
        for image_data, filename in images_data:
            result = self.process_image(image_data, filename)
            results.append(result)
            if result.success:
                success_count += 1
            else:
                failed_count += 1
        return BatchProcessingResult(
            results=results,
            total_count=len(results),
            success_count=success_count,
            failed_count=failed_count
        )
    def image_to_base64(self, image_data: bytes) -> str:
        """
        Encode an image as base64.
        Args:
            image_data: raw image bytes
        Returns:
            base64-encoded string
        """
        return base64.b64encode(image_data).decode('utf-8')
    def get_image_thumbnail(self, image_data: bytes, size: Tuple[int, int] = (200, 200)) -> bytes:
        """
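        Generate a thumbnail of the image.
        Args:
            image_data: raw image bytes
            size: thumbnail size
        Returns:
            thumbnail bytes (JPEG)
        """
        if not PIL_AVAILABLE:
            return image_data
        try:
            image = Image.open(io.BytesIO(image_data))
            image.thumbnail(size, Image.Resampling.LANCZOS)
            # JPEG cannot store alpha or palette modes, so convert those to RGB first
            if image.mode not in ('RGB', 'L'):
                image = image.convert('RGB')
            buffer = io.BytesIO()
            image.save(buffer, format='JPEG')
            return buffer.getvalue()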
        except Exception as e:
            print(f"Thumbnail generation error: {e}")
            return image_data
 # Singleton instance
 _image_processor = None
 def get_image_processor(temp_dir: str = None) -> ImageProcessor:
    """Return the image processor singleton (temp_dir only takes effect on the first call)."""
    global _image_processor
    if _image_processor is None:
        _image_processor = ImageProcessor(temp_dir)
    return _image_processor
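Review note on `_extract_relations`: the sentence-level co-occurrence heuristic can be sketched in isolation as below. This is a minimal sketch and not part of `image_processor.py`; the helper name `cooccurrence_pairs` and the splitting regex are assumptions made for illustration. It treats an ASCII period as a sentence boundary only when whitespace or the end of the text follows, so names such as `Node.js` stay intact, and it reports each pair once.

```python
import re
from itertools import combinations
from typing import List, Tuple


def cooccurrence_pairs(names: List[str], text: str) -> List[Tuple[str, str]]:
    """Return each pair of names that shares at least one sentence (hypothetical helper)."""
    # Split on CJK/ASCII terminators; '.' counts only before whitespace or end of text.
    sentences = re.split(r'[。！？!?]|\.(?=\s|$)', text)
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for sentence in sentences:
        # Names are checked in list order, so pair orientation is deterministic.
        present = [name for name in names if name in sentence]
        for pair in combinations(present, 2):
            if pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs


if __name__ == "__main__":
    sample = "Docker runs on Kubernetes. Node.js is separate。Python 和 Docker 一起用！"
    print(cooccurrence_pairs(["Docker", "Kubernetes", "Node.js", "Python"], sample))
```

Deduplicating pairs keeps a relation list from growing with every repeated mention; whether repeated co-occurrence should instead raise the confidence is a design choice left open here.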
--- a/backend/main.py
+++ b/backend/main.py
--- a/backend/multimodal_entity_linker.py
+++ b/backend/multimodal_entity_linker.py
@@ -0,0 +1,514 @@
 #!/usr/bin/env python3
 """
 InsightFlow Multimodal Entity Linker - Phase 7
 Multimodal entity linking module: cross-modal entity alignment and knowledge fusion
 """
 import os
 import json
 import uuid
 from typing import List, Dict, Optional, Tuple, Set
 from dataclasses import dataclass
 from difflib import SequenceMatcher
 # Try to import numpy (for embedding support)
 try:
    import numpy as np
    NUMPY_AVAILABLE = True
 except ImportError:
    NUMPY_AVAILABLE = False
@dataclass
 class MultimodalEntity:
    """Multimodal entity"""
    id: str
    entity_id: str
    project_id: str
    name: str
    source_type: str  # audio, video, image, document
    source_id: str
    mention_context: str
    confidence: float
    modality_features: Dict = None  # modality-specific features
    def __post_init__(self):
        if self.modality_features is None:
            self.modality_features = {}
@dataclass
 class EntityLink:
    """Entity link"""
    id: str
    project_id: str
    source_entity_id: str
    target_entity_id: str
    link_type: str  # same_as, related_to, part_of
    source_modality: str
    target_modality: str
    confidence: float
    evidence: str
@dataclass
 class AlignmentResult:
    """Alignment result"""
    entity_id: str
    matched_entity_id: Optional[str]
    similarity: float
    match_type: str  # exact, alias_match, fuzzy
    confidence: float
@dataclass
 class FusionResult:
    """Knowledge fusion result"""
    canonical_entity_id: str
    merged_entity_ids: List[str]
    fused_properties: Dict
    source_modalities: List[str]
    confidence: float
 class MultimodalEntityLinker:
    """Multimodal entity linker: cross-modal entity alignment and knowledge fusion"""
    # Link types
    LINK_TYPES = {
        'same_as': '同一实体',
        'related_to': '相关实体',
        'part_of': '组成部分',
        'mentions': '提及关系'
    }
    # Modality types
    MODALITIES = ['audio', 'video', 'image', 'document']
    def __init__(self, similarity_threshold: float = 0.85):
        """
        Initialize the multimodal entity linker.
        Args:
            similarity_threshold: minimum similarity for two entities to count as a match
        """
        self.similarity_threshold = similarity_threshold
    def calculate_string_similarity(self, s1: str, s2: str) -> float:
        """
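        Compute the similarity of two strings.
        Args:
            s1: first string
            s2: second string
        Returns:
            similarity score (0-1)
        """
        if not s1 or not s2:
            return 0.0
        s1, s2 = s1.lower().strip(), s2.lower().strip()
        # Exact match
        if s1 == s2:
            return 1.0
        # One string contains the other
        if s1 in s2 or s2 in s1:
            return 0.9
        # Matching-block ratio from difflib (similar in spirit to edit distance, but not the same metric)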
        return SequenceMatcher(None, s1, s2).ratio()
    def calculate_entity_similarity(self, entity1: Dict, entity2: Dict) -> Tuple[float, str]:
        """
        Compute the combined similarity of two entities.
        Args:
            entity1: first entity
            entity2: second entity
        Returns:
            (similarity, match type)
        """
        # Name similarity
        name_sim = self.calculate_string_similarity(
            entity1.get('name', ''),
            entity2.get('name', '')
        )
        # Exact name match
        if name_sim == 1.0:
            return 1.0, 'exact'
        # Alias checks
        aliases1 = set(a.lower() for a in entity1.get('aliases', []))
        aliases2 = set(a.lower() for a in entity2.get('aliases', []))
        if aliases1 & aliases2:  # shared alias
            return 0.95, 'alias_match'
        if entity2.get('name', '').lower() in aliases1:
            return 0.95, 'alias_match'
        if entity1.get('name', '').lower() in aliases2:
            return 0.95, 'alias_match'
        # Definition similarity (0.0 when either definition is missing)
        def_sim = self.calculate_string_similarity(
            entity1.get('definition', ''),
            entity2.get('definition', '')
        )
        # Combined similarity. Note: without definitions this is at most 0.7,
        # which is below the default threshold of 0.85.
        combined_sim = name_sim * 0.7 + def_sim * 0.3
        if combined_sim >= self.similarity_threshold:
            return combined_sim, 'fuzzy'
        return combined_sim, 'none'
    def find_matching_entity(self, query_entity: Dict, 
                            candidate_entities: List[Dict],
                            exclude_ids: Set[str] = None) -> Optional[AlignmentResult]:
        """
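        Find the best-matching entity among the candidates.
        Args:
            query_entity: entity to look up
            candidate_entities: list of candidate entities
            exclude_ids: entity IDs to skip
        Returns:
            alignment result, or None if no candidate reaches the threshold
        """
        exclude_ids = exclude_ids or set()
        best_match = None
        best_match_type = 'none'
        best_similarity = 0.0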
        for candidate in candidate_entities:
            if candidate.get('id') in exclude_ids:
                continue
            similarity, match_type = self.calculate_entity_similarity(
                query_entity, candidate
            )
            if similarity > best_similarity and similarity >= self.similarity_threshold:
                best_similarity = similarity
                best_match = candidate
                best_match_type = match_type
        if best_match:
            return AlignmentResult(
                entity_id=query_entity.get('id'),
                matched_entity_id=best_match.get('id'),
                similarity=best_similarity,
                match_type=best_match_type,
                confidence=best_similarity
            )
        return None
    def align_cross_modal_entities(self, project_id: str,
                                    audio_entities: List[Dict],
                                    video_entities: List[Dict],
                                    image_entities: List[Dict],
                                    document_entities: List[Dict]) -> List[EntityLink]:
        """
        Align entities across modalities.
        Args:
            project_id: project ID
            audio_entities: entities from audio
            video_entities: entities from video
            image_entities: entities from images
            document_entities: entities from documents
        Returns:
            list of entity links
        """
        links = []
        # Group all entities by modality
        all_entities = {
            'audio': audio_entities,
            'video': video_entities,
            'image': image_entities,
            'document': document_entities
        }
        # Cross-modal alignment
        for mod1 in self.MODALITIES:
            for mod2 in self.MODALITIES:
                if mod1 >= mod2:  # string ordering visits each unordered pair of modalities once
                    continue
                entities1 = all_entities.get(mod1, [])
                entities2 = all_entities.get(mod2, [])
                for ent1 in entities1:
                    # Look for a match in the other modality
                    result = self.find_matching_entity(ent1, entities2)
                    if result and result.matched_entity_id:
                        link = EntityLink(
                            id=str(uuid.uuid4())[:8],
                            project_id=project_id,
                            source_entity_id=ent1.get('id'),
                            target_entity_id=result.matched_entity_id,
                            link_type='same_as' if result.similarity > 0.95 else 'related_to',
                            source_modality=mod1,
                            target_modality=mod2,
                            confidence=result.confidence,
                            evidence=f"Cross-modal alignment: {result.match_type}"
                        )
                        links.append(link)
        return links
    def fuse_entity_knowledge(self, entity_id: str,
                              linked_entities: List[Dict],
                              multimodal_mentions: List[Dict]) -> FusionResult:
        """
        Fuse knowledge about an entity across modalities.
        Args:
            entity_id: ID of the canonical entity
            linked_entities: list of linked entity records
            multimodal_mentions: list of multimodal mentions
        Returns:
            fusion result
        """
        # Collect all properties
        fused_properties = {
            'names': [],  # a list, so repeated names can be counted below
            'definitions': [],
            'aliases': set(),
            'types': set(),
            'modalities': set(),
            'contexts': []
        }
        merged_ids = []
        for entity in linked_entities:
            merged_ids.append(entity.get('id'))
            # Collect names (skip empty ones)
            if entity.get('name'):
                fused_properties['names'].append(entity.get('name'))
            # Collect definitions
            if entity.get('definition'):
                fused_properties['definitions'].append(entity.get('definition'))
            # Collect aliases
            fused_properties['aliases'].update(entity.get('aliases', []))
            # Collect types
            fused_properties['types'].add(entity.get('type', 'OTHER'))
        # Collect modalities and contexts
        for mention in multimodal_mentions:
            fused_properties['modalities'].add(mention.get('source_type', ''))
            if mention.get('mention_context'):
                fused_properties['contexts'].append(mention.get('mention_context'))
        # Pick the best definition (the longest one)
        best_definition = max(fused_properties['definitions'], key=len) \
                         if fused_properties['definitions'] else ""
        # Pick the best name (the most frequent one); counting a set would give every name a count of 1
        from collections import Counter
        name_counts = Counter(fused_properties['names'])
        best_name = name_counts.most_common(1)[0][0] if name_counts else ""
        # Build the fusion result
        return FusionResult(
            canonical_entity_id=entity_id,
            merged_entity_ids=merged_ids,
            fused_properties={
                'name': best_name,
                'definition': best_definition,
                'aliases': list(fused_properties['aliases']),
                'types': list(fused_properties['types']),
                'modalities': list(fused_properties['modalities']),
                'contexts': fused_properties['contexts'][:10]  # at most 10 contexts
            },
            source_modalities=list(fused_properties['modalities']),
            confidence=min(1.0, len(linked_entities) * 0.2 + 0.5)
        )
    def detect_entity_conflicts(self, entities: List[Dict]) -> List[Dict]:
        """
        Detect entity conflicts (same name, different meaning).
        Args:
            entities: list of entities
        Returns:
            list of conflicts
        """
        conflicts = []
        # Group by name
        name_groups = {}
        for entity in entities:
            name = entity.get('name', '').lower()
            if name:
                if name not in name_groups:
                    name_groups[name] = []
                name_groups[name].append(entity)
        # Find entities that share a name but differ in definition
        for name, group in name_groups.items():
            if len(group) > 1:
                # Check whether the definitions are similar
                definitions = [e.get('definition', '') for e in group if e.get('definition')]
                if len(definitions) > 1:
                    # Pairwise similarity between definitions
                    sim_matrix = []
                    for i, d1 in enumerate(definitions):
                        for j, d2 in enumerate(definitions):
                            if i < j:
                                sim = self.calculate_string_similarity(d1, d2)
                                sim_matrix.append(sim)
                    # If every pair of definitions is dissimilar, flag a possible conflict
                    if sim_matrix and all(s < 0.5 for s in sim_matrix):
                        conflicts.append({
                            'name': name,
                            'entities': group,
                            'type': 'homonym_conflict',
                            'suggestion': 'Consider disambiguating these entities'
                        })
        return conflicts
    def suggest_entity_merges(self, entities: List[Dict],
                              existing_links: List[EntityLink] = None) -> List[Dict]:
        """
        Suggest entity merges.
        Args:
            entities: list of entities
            existing_links: existing entity links
        Returns:
            list of merge suggestions
        """
        suggestions = []
        existing_pairs = set()
        # Record pairs that are already linked
        if existing_links:
            for link in existing_links:
                pair = tuple(sorted([link.source_entity_id, link.target_entity_id]))
                existing_pairs.add(pair)
        # Check every pair of entities
        for i, ent1 in enumerate(entities):
            for j, ent2 in enumerate(entities):
                if i >= j:
                    continue
                # Skip pairs that are already linked
                pair = tuple(sorted([ent1.get('id'), ent2.get('id')]))
                if pair in existing_pairs:
                    continue
                # Compute similarity
                similarity, match_type = self.calculate_entity_similarity(ent1, ent2)
                if similarity >= self.similarity_threshold:
                    suggestions.append({
                        'entity1': ent1,
                        'entity2': ent2,
                        'similarity': similarity,
                        'match_type': match_type,
                        'suggested_action': 'merge' if similarity > 0.95 else 'link'
                    })
        # Sort by similarity, highest first
        suggestions.sort(key=lambda x: x['similarity'], reverse=True)
        return suggestions
    def create_multimodal_entity_record(self, project_id: str,
                                        entity_id: str,
                                        source_type: str,
                                        source_id: str,
                                        mention_context: str = "",
                                        confidence: float = 1.0) -> MultimodalEntity:
        """
        Create a multimodal entity record.
        Args:
            project_id: project ID
            entity_id: entity ID
            source_type: source type
            source_id: source ID
            mention_context: context of the mention
            confidence: confidence
        Returns:
            multimodal entity record
        """
        return MultimodalEntity(
            id=str(uuid.uuid4())[:8],
            entity_id=entity_id,
            project_id=project_id,
            name="",  # filled in later
            source_type=source_type,
            source_id=source_id,
            mention_context=mention_context,
            confidence=confidence
        )
    def analyze_modality_distribution(self, multimodal_entities: List[MultimodalEntity]) -> Dict:
        """
        Analyze how entities are distributed across modalities.
        Args:
            multimodal_entities: list of multimodal entities
        Returns:
            modality distribution statistics
        """
        distribution = {mod: 0 for mod in self.MODALITIES}
        # Count records per modality
        for me in multimodal_entities:
            if me.source_type in distribution:
                distribution[me.source_type] += 1
        # Count entities that appear in more than one modality
        entity_modalities = {}
        for me in multimodal_entities:
            if me.entity_id not in entity_modalities:
                entity_modalities[me.entity_id] = set()
            entity_modalities[me.entity_id].add(me.source_type)
        cross_modal_count = sum(1 for mods in entity_modalities.values() if len(mods) > 1)
        return {
            'modality_distribution': distribution,
            'total_multimodal_records': len(multimodal_entities),
            'unique_entities': len(entity_modalities),
            'cross_modal_entities': cross_modal_count,
            'cross_modal_ratio': cross_modal_count / len(entity_modalities) if entity_modalities else 0
        }
 # Singleton instance
 _multimodal_entity_linker = None
 def get_multimodal_entity_linker(similarity_threshold: float = 0.85) -> MultimodalEntityLinker:
    """Return the multimodal entity linker singleton (similarity_threshold only takes effect on the first call)."""
    global _multimodal_entity_linker
    if _multimodal_entity_linker is None:
        _multimodal_entity_linker = MultimodalEntityLinker(similarity_threshold)
    return _multimodal_entity_linker
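Review note on the matching logic in `MultimodalEntityLinker`: the tiered name score (exact, containment, `difflib` ratio) and the thresholded best-match search can be exercised on their own. The following is a minimal sketch under stated assumptions, not the module's code: `name_similarity` mirrors `calculate_string_similarity`, while `best_candidate` is a hypothetical simplification of `find_matching_entity` that compares names only and ignores aliases and definitions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def name_similarity(s1: str, s2: str) -> float:
    """Tiered score: 1.0 for equal names, 0.9 for containment, else the difflib ratio."""
    if not s1 or not s2:
        return 0.0
    s1, s2 = s1.lower().strip(), s2.lower().strip()
    if s1 == s2:
        return 1.0
    if s1 in s2 or s2 in s1:
        return 0.9
    return SequenceMatcher(None, s1, s2).ratio()


def best_candidate(query: str, candidates: List[Dict],
                   threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (candidate id, score) for the highest score at or above the threshold."""
    best: Optional[Tuple[str, float]] = None
    for candidate in candidates:
        score = name_similarity(query, candidate.get('name', ''))
        if score >= threshold and (best is None or score > best[1]):
            best = (candidate['id'], score)
    return best


if __name__ == "__main__":
    pool = [
        {'id': 'a', 'name': 'Kubernetes cluster'},
        {'id': 'b', 'name': 'kubernetes'},
        {'id': 'c', 'name': 'Docker'},
    ]
    print(best_candidate('Kubernetes', pool))
    print(best_candidate('Redis', pool))
```

One consequence worth keeping in mind: the containment tier scores any substring at 0.9, so a short name such as `AI` clears a 0.85 threshold against any longer name that happens to contain those letters.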
--- a/backend/multimodal_processor.py
+++ b/backend/multimodal_processor.py
@@ -0,0 +1,434 @@
 #!/usr/bin/env python3
 """
 InsightFlow Multimodal Processor - Phase 7
 Video processing module: audio extraction, keyframe extraction, OCR
 """
 import os
 import json
 import uuid
 import tempfile
 import subprocess
 from typing import List, Dict, Optional, Tuple
 from dataclasses import dataclass
 from pathlib import Path
 # Try to import the OCR libraries
 try:
    import pytesseract
    from PIL import Image
    PYTESSERACT_AVAILABLE = True
 except ImportError:
    PYTESSERACT_AVAILABLE = False
 try:
    import cv2
    CV2_AVAILABLE = True
 except ImportError:
    CV2_AVAILABLE = False
 try:
    import ffmpeg
    FFMPEG_AVAILABLE = True
 except ImportError:
    FFMPEG_AVAILABLE = False
@dataclass
 class VideoFrame:
    """Video keyframe data class"""
    id: str
    video_id: str
    frame_number: int
    timestamp: float
    frame_path: str
    ocr_text: str = ""
    ocr_confidence: float = 0.0
    entities_detected: List[Dict] = None
    def __post_init__(self):
        if self.entities_detected is None:
            self.entities_detected = []
@dataclass
 class VideoInfo:
    """Video information data class"""
    id: str
    project_id: str
    filename: str
    file_path: str
    duration: float = 0.0
    width: int = 0
    height: int = 0
    fps: float = 0.0
    audio_extracted: bool = False
    audio_path: str = ""
    transcript_id: str = ""
    status: str = "pending"
    error_message: str = ""
    metadata: Dict = None
    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {}
@dataclass
 class VideoProcessingResult:
    """Video processing result"""
    video_id: str
    audio_path: str
    frames: List[VideoFrame]
    ocr_results: List[Dict]
    full_text: str  # combined text (audio transcript + OCR text)
    success: bool
    error_message: str = ""
 class MultimodalProcessor:
    """Multimodal processor - handles video files"""
    def __init__(self, temp_dir: str = None, frame_interval: int = 5):
        """
        Initialize the multimodal processor.
        Args:
            temp_dir: directory for temporary files
            frame_interval: keyframe extraction interval (seconds)
        """
        self.temp_dir = temp_dir or tempfile.gettempdir()
        self.frame_interval = frame_interval
        self.video_dir = os.path.join(self.temp_dir, "videos")
        self.frames_dir = os.path.join(self.temp_dir, "frames")
        self.audio_dir = os.path.join(self.temp_dir, "audio")
        # Create the working directories
        os.makedirs(self.video_dir, exist_ok=True)
        os.makedirs(self.frames_dir, exist_ok=True)
        os.makedirs(self.audio_dir, exist_ok=True)
    def extract_video_info(self, video_path: str) -> Dict:
        """
        Extract basic information about a video.
        Args:
            video_path: path to the video file
        Returns:
            dictionary of video information
        """
        try:
            if FFMPEG_AVAILABLE:
                probe = ffmpeg.probe(video_path)
                video_stream = next((s for s in probe['streams'] if s['codec_type'] == 'video'), None)
                audio_stream = next((s for s in probe['streams'] if s['codec_type'] == 'audio'), None)
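                if video_stream:
                    # Parse the "num/den" frame-rate string without eval(), which would
                    # execute arbitrary text and raise on a zero denominator
                    num, _, den = str(video_stream.get('r_frame_rate', '0/1')).partition('/')
                    den_value = float(den) if den else 1.0
                    fps = float(num) / den_value if den_value else 0.0
                    return {
                        'duration': float(probe['format'].get('duration', 0)),
                        'width': int(video_stream.get('width', 0)),
                        'height': int(video_stream.get('height', 0)),
                        'fps': fps,
                        'has_audio': audio_stream is not None,
                        'bitrate': int(probe['format'].get('bit_rate', 0))
                    }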
            else:
                # Fall back to the ffprobe command line
                cmd = [
                    'ffprobe', '-v', 'error', '-show_entries',
                    'format=duration,bit_rate', '-show_entries',
                    'stream=width,height,r_frame_rate', '-of', 'json',
                    video_path
                ]
                result = subprocess.run(cmd, capture_output=True, text=True)
                if result.returncode == 0:
                    data = json.loads(result.stdout)
                    return {
                        'duration': float(data['format'].get('duration', 0)),
                        'width': int(data['streams'][0].get('width', 0)) if data['streams'] else 0,
                        'height': int(data['streams'][0].get('height', 0)) if data['streams'] else 0,
                        'fps': 30.0,  # default value; r_frame_rate is not parsed in this branch
                        'has_audio': len(data['streams']) > 1,
                        'bitrate': int(data['format'].get('bit_rate', 0))
                    }
        except Exception as e:
            print(f"Error extracting video info: {e}")
        return {
            'duration': 0,
            'width': 0,
            'height': 0,
            'fps': 0,
            'has_audio': False,
            'bitrate': 0
        }
    def extract_audio(self, video_path: str, output_path: str = None) -> str:
        """
        Extract the audio track from a video.
        Args:
            video_path: path to the video file
            output_path: output audio path (optional)
        Returns:
            path to the extracted audio file
        """
        if output_path is None:
            video_name = Path(video_path).stem
            output_path = os.path.join(self.audio_dir, f"{video_name}.wav")
        try:
            if FFMPEG_AVAILABLE:
                (
                    ffmpeg
                    .input(video_path)
                    .output(output_path, ac=1, ar=16000, vn=None)
                    .overwrite_output()
                    .run(quiet=True)
                )
            else:
                # Fall back to the ffmpeg command line
                cmd = [
                    'ffmpeg', '-i', video_path,
                    '-vn', '-acodec', 'pcm_s16le',
                    '-ac', '1', '-ar', '16000',
                    '-y', output_path
                ]
                subprocess.run(cmd, check=True, capture_output=True)
            return output_path
        except Exception as e:
            print(f"Error extracting audio: {e}")
            raise
    def extract_keyframes(self, video_path: str, video_id: str, 
                         interval: int = None) -> List[str]:
        """
        Extract keyframes from a video.
        Args:
            video_path: path to the video file
            video_id: video ID
            interval: extraction interval in seconds; defaults to the value set at initialization
        Returns:
            list of extracted frame file paths
        """
        interval = interval or self.frame_interval
        frame_paths = []
        # Create the directory that holds this video's frames
        video_frames_dir = os.path.join(self.frames_dir, video_id)
        os.makedirs(video_frames_dir, exist_ok=True)
        try:
            if CV2_AVAILABLE:
                # Extract frames with OpenCV
                cap = cv2.VideoCapture(video_path)
                try:
                    fps = cap.get(cv2.CAP_PROP_FPS)
                    if fps <= 0:
                        raise ValueError(f"Could not read frame rate from {video_path}")
                    # At least 1, so the modulo below never divides by zero
                    frame_interval_frames = max(1, int(fps * interval))
                    frame_number = 0
                    while True:
                        ret, frame = cap.read()
                        if not ret:
                            break
                        if frame_number % frame_interval_frames == 0:
                            timestamp = frame_number / fps
                            frame_path = os.path.join(
                                video_frames_dir,
                                f"frame_{frame_number:06d}_{timestamp:.2f}.jpg"
                            )
                            cv2.imwrite(frame_path, frame)
                            frame_paths.append(frame_path)
                        frame_number += 1
                finally:
                    # Release the capture even if reading or writing a frame fails
                    cap.release()
            else:
                # Fall back to the ffmpeg command line to extract frames
                video_name = Path(video_path).stem
                output_pattern = os.path.join(video_frames_dir, "frame_%06d_%t.jpg")
                cmd = [
                    'ffmpeg', '-i', video_path,
                    '-vf', f'fps=1/{interval}',
                    '-frame_pts', '1',
                    '-y', output_pattern
                ]
                subprocess.run(cmd, check=True, capture_output=True)
                # Collect the generated frame files
                frame_paths = sorted([
                    os.path.join(video_frames_dir, f)
                    for f in os.listdir(video_frames_dir)
                    if f.startswith('frame_')
                ])
        except Exception as e:
            print(f"Error extracting keyframes: {e}")
        return frame_paths
    def perform_ocr(self, image_path: str) -> Tuple[str, float]:
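        """
        Run OCR on an image file.
        Args:
            image_path: path to the image file
        Returns:
            (recognized text, confidence in the range 0-1)
        """
        if not PYTESSERACT_AVAILABLE:
            return "", 0.0
        try:
            image = Image.open(image_path)
            # Preprocess: convert to grayscale
            if image.mode != 'L':
                image = image.convert('L')
            # Run OCR with pytesseract
            text = pytesseract.image_to_string(image, lang='chi_sim+eng')
            # Collect per-word confidences with the same languages as the text pass
            data = pytesseract.image_to_data(image, lang='chi_sim+eng', output_type=pytesseract.Output.DICT)
            # 'conf' may hold ints, floats, or numeric strings depending on the
            # pytesseract version; -1 marks boxes that contain no text
            confidences = [float(c) for c in data['conf'] if float(c) > 0]
            avg_confidence = sum(confidences) / len(confidences) if confidences else 0
            return text.strip(), avg_confidence / 100.0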
        except Exception as e:
            print(f"OCR error for {image_path}: {e}")
            return "", 0.0
    def process_video(self, video_data: bytes, filename: str, 
                     project_id: str, video_id: str = None) -> VideoProcessingResult:
        """
        处理视频文件：提取音频、关键帧、OCR
        Args:
            video_data: 视频文件二进制数据
            filename: 视频文件名
            project_id: 项目ID
            video_id: 视频ID（可选，自动生成）
        Returns:
            视频处理结果
        """
        video_id = video_id or str(uuid.uuid4())[:8]
        try:
            # Save the video file; basename() keeps a crafted filename from escaping video_dir
            video_path = os.path.join(self.video_dir, f"{video_id}_{os.path.basename(filename)}")
            with open(video_path, 'wb') as f:
                f.write(video_data)
            # Read the video metadata
            video_info = self.extract_video_info(video_path)
            # Extract the audio track
            audio_path = ""
            if video_info['has_audio']:
                audio_path = self.extract_audio(video_path)
            # Extract keyframes
            frame_paths = self.extract_keyframes(video_path, video_id)
            # Run OCR on each keyframe
            frames = []
            ocr_results = []
            all_ocr_text = []
            for i, frame_path in enumerate(frame_paths):
                # Parse the frame number and timestamp from the file name
                frame_name = os.path.basename(frame_path)
                parts = frame_name.replace('.jpg', '').split('_')
                frame_number = int(parts[1]) if len(parts) > 1 else i
                timestamp = float(parts[2]) if len(parts) > 2 else i * self.frame_interval
                # Run OCR
                ocr_text, confidence = self.perform_ocr(frame_path)
                frame = VideoFrame(
                    id=str(uuid.uuid4())[:8],
                    video_id=video_id,
                    frame_number=frame_number,
                    timestamp=timestamp,
                    frame_path=frame_path,
                    ocr_text=ocr_text,
                    ocr_confidence=confidence
                )
                frames.append(frame)
                if ocr_text:
                    ocr_results.append({
                        'frame_number': frame_number,
                        'timestamp': timestamp,
                        'text': ocr_text,
                        'confidence': confidence
                    })
                    all_ocr_text.append(ocr_text)
            # Merge all OCR text
            full_ocr_text = "\n\n".join(all_ocr_text)
            return VideoProcessingResult(
                video_id=video_id,
                audio_path=audio_path,
                frames=frames,
                ocr_results=ocr_results,
                full_text=full_ocr_text,
                success=True
            )
        except Exception as e:
            return VideoProcessingResult(
                video_id=video_id,
                audio_path="",
                frames=[],
                ocr_results=[],
                full_text="",
                success=False,
                error_message=str(e)
            )
    def cleanup(self, video_id: str = None):
        """
        Remove temporary files.
        Args:
            video_id: Video ID (optional; when given, only that video's files are removed)
        """
        import shutil
        if video_id:
            # Frames are named frame_* inside a per-video subdirectory, so remove that directory as a whole
            shutil.rmtree(os.path.join(self.frames_dir, video_id), ignore_errors=True)
            # Video and audio file names are expected to contain the video ID
            for dir_path in [self.video_dir, self.audio_dir]:
                if os.path.exists(dir_path):
                    for f in os.listdir(dir_path):
                        if video_id in f:
                            os.remove(os.path.join(dir_path, f))
        else:
            # Remove all temporary files
            for dir_path in [self.video_dir, self.frames_dir, self.audio_dir]:
                if os.path.exists(dir_path):
                    shutil.rmtree(dir_path)
                    os.makedirs(dir_path, exist_ok=True)
 # Singleton instance
 _multimodal_processor = None
 def get_multimodal_processor(temp_dir: str = None, frame_interval: int = 5) -> MultimodalProcessor:
    """Return the multimodal processor singleton."""
    global _multimodal_processor
    if _multimodal_processor is None:
        _multimodal_processor = MultimodalProcessor(temp_dir, frame_interval)
    return _multimodal_processor
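The confidence averaging done in `perform_ocr` can be sketched in isolation. The helper below is hypothetical (`mean_confidence` is not part of the module). It assumes the `conf` column returned by `pytesseract.image_to_data` may hold ints, floats, or numeric strings, with `-1` marking boxes that contain no text.

```python
from typing import Iterable, List, Union

Number = Union[int, float, str]


def mean_confidence(conf_values: Iterable[Number]) -> float:
    """Average the positive Tesseract confidences and scale them to 0.0-1.0."""
    scores: List[float] = []
    for raw in conf_values:
        try:
            value = float(raw)  # accepts 95, 95.5 and '95.5' alike
        except (TypeError, ValueError):
            continue  # skip anything that is not numeric
        if value > 0:  # -1 (no text) and 0 are ignored
            scores.append(value)
    if not scores:
        return 0.0
    return sum(scores) / len(scores) / 100.0
```

Parsing with `float()` rather than `int()` keeps the helper from raising on values such as `'95.5'`.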
--- a/backend/plugin_manager.py
+++ b/backend/plugin_manager.py
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -36,3 +36,17 @@ fastapi-offline-swagger==0.1.0
 # Phase 7: Workflow Automation
 apscheduler==3.10.4
 # Phase 7: Multimodal Support
 ffmpeg-python==0.2.0
 pillow==10.2.0
 opencv-python==4.9.0.80
 pytesseract==0.3.10
 # Phase 7 Task 7: Plugin & Integration
 webdav4==0.9.8
 urllib3==2.2.0
 # Phase 7: Plugin & Integration
 beautifulsoup4==4.12.3
 webdavclient3==3.14.6
--- a/backend/schema.sql
+++ b/backend/schema.sql
@@ -222,3 +222,320 @@ CREATE INDEX IF NOT EXISTS idx_workflow_logs_workflow ON workflow_logs(workflow_
 CREATE INDEX IF NOT EXISTS idx_workflow_logs_task ON workflow_logs(task_id);
 CREATE INDEX IF NOT EXISTS idx_workflow_logs_status ON workflow_logs(status);
 CREATE INDEX IF NOT EXISTS idx_workflow_logs_created ON workflow_logs(created_at);
 -- Phase 7: Multimodal support tables
 -- Videos
 CREATE TABLE IF NOT EXISTS videos (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    filename TEXT NOT NULL,
    duration REAL,  -- Video duration (seconds)
    fps REAL,  -- Frame rate
    resolution TEXT,  -- JSON: {"width": int, "height": int}
    audio_transcript_id TEXT,  -- ID of the linked audio transcript
    full_ocr_text TEXT,  -- OCR text from all frames, merged
    extracted_entities TEXT,  -- JSON: list of extracted entities
    extracted_relations TEXT,  -- JSON: list of extracted relations
    status TEXT DEFAULT 'processing',  -- processing, completed, failed
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id),
    FOREIGN KEY (audio_transcript_id) REFERENCES transcripts(id)
 );
 -- Video keyframes
 CREATE TABLE IF NOT EXISTS video_frames (
    id TEXT PRIMARY KEY,
    video_id TEXT NOT NULL,
    frame_number INTEGER,
    timestamp REAL,  -- Timestamp (seconds)
    image_data BLOB,  -- Frame image data (optional; may be stored in OSS instead)
    image_url TEXT,  -- Image URL (if stored in OSS)
    ocr_text TEXT,  -- Text recognized by OCR
    extracted_entities TEXT,  -- JSON: entities extracted from this frame
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (video_id) REFERENCES videos(id) ON DELETE CASCADE
 );
 -- Images
 CREATE TABLE IF NOT EXISTS images (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    filename TEXT NOT NULL,
    image_data BLOB,  -- Image data (optional)
    image_url TEXT,  -- Image URL
    ocr_text TEXT,  -- Text recognized by OCR
    description TEXT,  -- Image description (LLM-generated)
    extracted_entities TEXT,  -- JSON: list of extracted entities
    extracted_relations TEXT,  -- JSON: list of extracted relations
    status TEXT DEFAULT 'processing',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
 -- Multimodal entity mentions
 CREATE TABLE IF NOT EXISTS multimodal_mentions (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    entity_id TEXT NOT NULL,
    modality TEXT NOT NULL,  -- audio, video, image, document
    source_id TEXT NOT NULL,  -- transcript_id, video_id, image_id
    source_type TEXT NOT NULL,  -- Source type
    position TEXT,  -- JSON: position information
    text_snippet TEXT,  -- Text snippet containing the mention
    confidence REAL DEFAULT 1.0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id),
    FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE
 );
 -- Multimodal entity links
 CREATE TABLE IF NOT EXISTS multimodal_entity_links (
    id TEXT PRIMARY KEY,
    entity_id TEXT NOT NULL,
    linked_entity_id TEXT NOT NULL,  -- ID of the linked entity
    link_type TEXT NOT NULL,  -- same_as, related_to, part_of
    confidence REAL DEFAULT 1.0,
    evidence TEXT,  -- Evidence for the link
    modalities TEXT,  -- JSON: list of modalities involved
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE,
    FOREIGN KEY (linked_entity_id) REFERENCES entities(id) ON DELETE CASCADE
 );
 -- Multimodal indexes
 CREATE INDEX IF NOT EXISTS idx_videos_project ON videos(project_id);
 CREATE INDEX IF NOT EXISTS idx_videos_status ON videos(status);
 CREATE INDEX IF NOT EXISTS idx_video_frames_video ON video_frames(video_id);
 CREATE INDEX IF NOT EXISTS idx_images_project ON images(project_id);
 CREATE INDEX IF NOT EXISTS idx_images_status ON images(status);
 CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_project ON multimodal_mentions(project_id);
 CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_entity ON multimodal_mentions(entity_id);
 CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_modality ON multimodal_mentions(modality);
 CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_source ON multimodal_mentions(source_id);
 CREATE INDEX IF NOT EXISTS idx_multimodal_links_entity ON multimodal_entity_links(entity_id);
 CREATE INDEX IF NOT EXISTS idx_multimodal_links_linked ON multimodal_entity_links(linked_entity_id);
 -- Phase 7 Task 7: Plugin and integration tables
 -- Plugin configuration
 CREATE TABLE IF NOT EXISTS plugins (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    plugin_type TEXT NOT NULL,  -- chrome_extension, feishu_bot, dingtalk_bot, zapier, make, webdav, custom
    project_id TEXT,
    status TEXT DEFAULT 'active',  -- active, inactive, error, pending
    config TEXT,  -- JSON: plugin specific configuration
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_used_at TIMESTAMP,
    use_count INTEGER DEFAULT 0,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
 -- Detailed plugin configuration
 CREATE TABLE IF NOT EXISTS plugin_configs (
    id TEXT PRIMARY KEY,
    plugin_id TEXT NOT NULL,
    config_key TEXT NOT NULL,
    config_value TEXT,
    is_encrypted BOOLEAN DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE,
    UNIQUE(plugin_id, config_key)
 );
 -- Bot sessions
 CREATE TABLE IF NOT EXISTS bot_sessions (
    id TEXT PRIMARY KEY,
    bot_type TEXT NOT NULL,  -- feishu, dingtalk
    session_id TEXT NOT NULL,  -- Group ID or conversation ID
    session_name TEXT NOT NULL,
    project_id TEXT,
    webhook_url TEXT,
    secret TEXT,  -- Signing secret
    is_active BOOLEAN DEFAULT 1,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_message_at TIMESTAMP,
    message_count INTEGER DEFAULT 0,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
 -- Webhook endpoints (Zapier/Make integration)
 CREATE TABLE IF NOT EXISTS webhook_endpoints (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    endpoint_type TEXT NOT NULL,  -- zapier, make, custom
    endpoint_url TEXT NOT NULL,
    project_id TEXT,
    auth_type TEXT DEFAULT 'none',  -- none, api_key, oauth, custom
    auth_config TEXT,  -- JSON: authentication configuration
    trigger_events TEXT,  -- JSON array: events that trigger this webhook
    is_active BOOLEAN DEFAULT 1,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_triggered_at TIMESTAMP,
    trigger_count INTEGER DEFAULT 0,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
 -- WebDAV sync configuration
 CREATE TABLE IF NOT EXISTS webdav_syncs (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    project_id TEXT NOT NULL,
    server_url TEXT NOT NULL,
    username TEXT NOT NULL,
    password TEXT NOT NULL,  -- Should be stored encrypted
    remote_path TEXT DEFAULT '/insightflow',
    sync_mode TEXT DEFAULT 'bidirectional',  -- bidirectional, upload_only, download_only
    sync_interval INTEGER DEFAULT 3600,  -- Seconds
    last_sync_at TIMESTAMP,
    last_sync_status TEXT DEFAULT 'pending',  -- pending, success, failed
    last_sync_error TEXT,
    is_active BOOLEAN DEFAULT 1,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    sync_count INTEGER DEFAULT 0,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
 -- Chrome extension tokens
 CREATE TABLE IF NOT EXISTS chrome_extension_tokens (
    id TEXT PRIMARY KEY,
    token_hash TEXT NOT NULL UNIQUE,  -- SHA256 hash of the token
    user_id TEXT,
    project_id TEXT,
    name TEXT,
    permissions TEXT,  -- JSON array: read, write, delete
    expires_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_used_at TIMESTAMP,
    use_count INTEGER DEFAULT 0,
    is_revoked BOOLEAN DEFAULT 0,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
 -- Plugin indexes
 CREATE INDEX IF NOT EXISTS idx_plugins_project ON plugins(project_id);
 CREATE INDEX IF NOT EXISTS idx_plugins_type ON plugins(plugin_type);
 CREATE INDEX IF NOT EXISTS idx_plugins_status ON plugins(status);
 CREATE INDEX IF NOT EXISTS idx_plugin_configs_plugin ON plugin_configs(plugin_id);
 CREATE INDEX IF NOT EXISTS idx_bot_sessions_project ON bot_sessions(project_id);
 CREATE INDEX IF NOT EXISTS idx_bot_sessions_type ON bot_sessions(bot_type);
 CREATE INDEX IF NOT EXISTS idx_webhook_endpoints_project ON webhook_endpoints(project_id);
 CREATE INDEX IF NOT EXISTS idx_webhook_endpoints_type ON webhook_endpoints(endpoint_type);
 CREATE INDEX IF NOT EXISTS idx_webdav_syncs_project ON webdav_syncs(project_id);
 CREATE INDEX IF NOT EXISTS idx_chrome_tokens_project ON chrome_extension_tokens(project_id);
 CREATE INDEX IF NOT EXISTS idx_chrome_tokens_hash ON chrome_extension_tokens(token_hash);
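The `chrome_extension_tokens` table stores only a SHA-256 hash of each token, never the token itself. A minimal sketch of issuing and checking such a token follows. The names `issue_token` and `verify_token` are hypothetical, and the token format (a URL-safe random string) is an assumption, not something this diff specifies.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_token() -> Tuple[str, str]:
    """Return (plaintext token, hex digest to store in token_hash)."""
    token = secrets.token_urlsafe(32)  # shown to the user once, never stored
    token_hash = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, token_hash


def verify_token(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare it with the stored digest."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(digest, stored_hash)
```

Storing only the digest means a leaked database row cannot be replayed as a credential, which fits the `UNIQUE` constraint and index on `token_hash`.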
 -- Phase 7: Plugin and integration tables
 -- Plugins
 CREATE TABLE IF NOT EXISTS plugins (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    plugin_type TEXT NOT NULL,  -- chrome_extension, feishu_bot, dingtalk_bot, slack_bot, webhook, webdav, custom
    project_id TEXT,
    status TEXT DEFAULT 'active',  -- active, inactive, error, pending
    config TEXT,  -- JSON: plugin configuration
    api_key TEXT UNIQUE,  -- API key used for authentication
    api_secret TEXT,  -- Secret used for signature verification
    webhook_url TEXT,  -- Bot webhook URL
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_used_at TIMESTAMP,
    use_count INTEGER DEFAULT 0,
    success_count INTEGER DEFAULT 0,
    fail_count INTEGER DEFAULT 0,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
 -- Bot sessions
 CREATE TABLE IF NOT EXISTS bot_sessions (
    id TEXT PRIMARY KEY,
    plugin_id TEXT NOT NULL,
    platform TEXT NOT NULL,  -- feishu, dingtalk, slack, wechat
    session_id TEXT NOT NULL,  -- Platform-specific session ID
    user_id TEXT,
    user_name TEXT,
    project_id TEXT,  -- ID of the linked project
    context TEXT,  -- JSON: session context
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_message_at TIMESTAMP,
    message_count INTEGER DEFAULT 0,
    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE,
    FOREIGN KEY (project_id) REFERENCES projects(id),
    UNIQUE(plugin_id, session_id)
 );
 -- Webhook endpoints (for Zapier/Make integration)
 CREATE TABLE IF NOT EXISTS webhook_endpoints (
    id TEXT PRIMARY KEY,
    plugin_id TEXT NOT NULL,
    name TEXT NOT NULL,
    endpoint_path TEXT NOT NULL UNIQUE,  -- e.g. /webhook/zapier/abc123
    endpoint_type TEXT NOT NULL,  -- zapier, make, custom
    secret TEXT,  -- Used for signature verification
    allowed_events TEXT,  -- JSON: list of allowed events
    target_project_id TEXT,  -- Target project for imported data
    is_active BOOLEAN DEFAULT 1,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_triggered_at TIMESTAMP,
    trigger_count INTEGER DEFAULT 0,
    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE,
    FOREIGN KEY (target_project_id) REFERENCES projects(id)
 );
 -- WebDAV sync configuration
 CREATE TABLE IF NOT EXISTS webdav_syncs (
    id TEXT PRIMARY KEY,
    plugin_id TEXT NOT NULL,
    name TEXT NOT NULL,
    server_url TEXT NOT NULL,
    username TEXT NOT NULL,
    password TEXT NOT NULL,  -- Should be stored encrypted
    remote_path TEXT DEFAULT '/',
    local_path TEXT DEFAULT './sync',
    sync_direction TEXT DEFAULT 'bidirectional',  -- upload, download, bidirectional
    sync_mode TEXT DEFAULT 'manual',  -- manual, realtime, scheduled
    sync_schedule TEXT,  -- cron expression
    file_patterns TEXT,  -- JSON: list of file-matching patterns
    auto_analyze BOOLEAN DEFAULT 1,  -- Analyze automatically after sync
    last_sync_at TIMESTAMP,
    last_sync_status TEXT,
    is_active BOOLEAN DEFAULT 1,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    sync_count INTEGER DEFAULT 0,
    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE
 );
 -- Plugin activity log
 CREATE TABLE IF NOT EXISTS plugin_activity_logs (
    id TEXT PRIMARY KEY,
    plugin_id TEXT NOT NULL,
    activity_type TEXT NOT NULL,  -- message, webhook, sync, error
    source TEXT NOT NULL,  -- Source identifier
    details TEXT,  -- JSON: details
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE
 );
 -- Plugin indexes
 CREATE INDEX IF NOT EXISTS idx_plugins_project ON plugins(project_id);
 CREATE INDEX IF NOT EXISTS idx_plugins_type ON plugins(plugin_type);
 CREATE INDEX IF NOT EXISTS idx_plugins_api_key ON plugins(api_key);
 CREATE INDEX IF NOT EXISTS idx_bot_sessions_plugin ON bot_sessions(plugin_id);
 CREATE INDEX IF NOT EXISTS idx_bot_sessions_project ON bot_sessions(project_id);
 CREATE INDEX IF NOT EXISTS idx_webhook_endpoints_plugin ON webhook_endpoints(plugin_id);
 CREATE INDEX IF NOT EXISTS idx_webdav_syncs_plugin ON webdav_syncs(plugin_id);
 CREATE INDEX IF NOT EXISTS idx_plugin_logs_plugin ON plugin_activity_logs(plugin_id);
 CREATE INDEX IF NOT EXISTS idx_plugin_logs_type ON plugin_activity_logs(activity_type);
 CREATE INDEX IF NOT EXISTS idx_plugin_logs_created ON plugin_activity_logs(created_at);
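This hunk declares `plugins`, `bot_sessions`, `webhook_endpoints`, and `webdav_syncs` twice with different columns. Assuming the schema is applied to SQLite in file order (the database engine is not shown in this diff), `CREATE TABLE IF NOT EXISTS` keeps the first definition and silently skips the second. Columns that appear only in the later definition, such as `api_key`, would then be missing, and an index on them would fail. The sketch below reproduces this with trimmed column lists; the function names are hypothetical.

```python
import sqlite3

# Reduced versions of the two plugins definitions in this hunk
FIRST = (
    "CREATE TABLE IF NOT EXISTS plugins "
    "(id TEXT PRIMARY KEY, name TEXT NOT NULL, config TEXT)"
)
SECOND = (
    "CREATE TABLE IF NOT EXISTS plugins "
    "(id TEXT PRIMARY KEY, name TEXT NOT NULL, config TEXT, api_key TEXT UNIQUE)"
)


def plugin_columns() -> list:
    """Apply both definitions in order and report the resulting columns."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(FIRST)
        conn.execute(SECOND)  # no-op: the table already exists
        return [row[1] for row in conn.execute("PRAGMA table_info(plugins)")]
    finally:
        conn.close()


def index_on_missing_column_fails() -> bool:
    """Check whether indexing api_key fails after both definitions ran."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(FIRST)
        conn.execute(SECOND)
        try:
            conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_plugins_api_key ON plugins(api_key)"
            )
        except sqlite3.OperationalError:
            return True
        return False
    finally:
        conn.close()
```

If the later definitions are the intended ones, keeping a single definition per table, or adding the new columns through an explicit migration, would avoid the mismatch.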
--- a/backend/schema_multimodal.sql
+++ b/backend/schema_multimodal.sql
@@ -0,0 +1,104 @@
 -- Phase 7: Multimodal support tables
 -- Videos
 CREATE TABLE IF NOT EXISTS videos (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    filename TEXT NOT NULL,
    file_path TEXT,
    duration REAL,  -- Video duration (seconds)
    width INTEGER,  -- Video width
    height INTEGER,  -- Video height
    fps REAL,  -- Frame rate
    audio_extracted INTEGER DEFAULT 0,  -- Whether audio has been extracted
    audio_path TEXT,  -- Path to the extracted audio file
    transcript_id TEXT,  -- ID of the linked transcript record
    status TEXT DEFAULT 'pending',  -- pending, processing, completed, failed
    error_message TEXT,
    metadata TEXT,  -- JSON: additional metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id),
    FOREIGN KEY (transcript_id) REFERENCES transcripts(id)
 );
 -- Video keyframes
 CREATE TABLE IF NOT EXISTS video_frames (
    id TEXT PRIMARY KEY,
    video_id TEXT NOT NULL,
    frame_number INTEGER NOT NULL,
    timestamp REAL NOT NULL,  -- Frame timestamp (seconds)
    frame_path TEXT NOT NULL,  -- Path to the frame image
    ocr_text TEXT,  -- Text recognized by OCR
    ocr_confidence REAL,  -- OCR confidence
    entities_detected TEXT,  -- JSON: detected entities
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (video_id) REFERENCES videos(id) ON DELETE CASCADE
 );
 -- Images
 CREATE TABLE IF NOT EXISTS images (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    filename TEXT NOT NULL,
    file_path TEXT,
    image_type TEXT,  -- whiteboard, ppt, handwritten, screenshot, other
    width INTEGER,
    height INTEGER,
    ocr_text TEXT,  -- Text recognized by OCR
    description TEXT,  -- Image description (LLM-generated)
    entities_detected TEXT,  -- JSON: detected entities
    relations_detected TEXT,  -- JSON: detected relations
    transcript_id TEXT,  -- ID of the linked transcript record (optional)
    status TEXT DEFAULT 'pending',  -- pending, processing, completed, failed
    error_message TEXT,
    metadata TEXT,  -- JSON: additional metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id),
    FOREIGN KEY (transcript_id) REFERENCES transcripts(id)
 );
 -- Multimodal entity associations
 CREATE TABLE IF NOT EXISTS multimodal_entities (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    entity_id TEXT NOT NULL,  -- ID of the linked entity
    source_type TEXT NOT NULL,  -- audio, video, image, document
    source_id TEXT NOT NULL,  -- Source ID (transcript_id, video_id, image_id)
    mention_context TEXT,  -- Context of the mention
    confidence REAL DEFAULT 1.0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id),
    FOREIGN KEY (entity_id) REFERENCES entities(id),
    UNIQUE(entity_id, source_type, source_id)
 );
 -- Multimodal entity alignment (cross-modal entity links)
 CREATE TABLE IF NOT EXISTS multimodal_entity_links (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    source_entity_id TEXT NOT NULL,  -- Source entity ID
    target_entity_id TEXT NOT NULL,  -- Target entity ID
    link_type TEXT NOT NULL,  -- same_as, related_to, part_of
    source_modality TEXT NOT NULL,  -- audio, video, image, document
    target_modality TEXT NOT NULL,  -- audio, video, image, document
    confidence REAL DEFAULT 1.0,
    evidence TEXT,  -- Evidence for the link
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id),
    FOREIGN KEY (source_entity_id) REFERENCES entities(id),
    FOREIGN KEY (target_entity_id) REFERENCES entities(id)
 );
 -- Create indexes
 CREATE INDEX IF NOT EXISTS idx_videos_project ON videos(project_id);
 CREATE INDEX IF NOT EXISTS idx_videos_status ON videos(status);
 CREATE INDEX IF NOT EXISTS idx_video_frames_video ON video_frames(video_id);
 CREATE INDEX IF NOT EXISTS idx_video_frames_timestamp ON video_frames(timestamp);
 CREATE INDEX IF NOT EXISTS idx_images_project ON images(project_id);
 CREATE INDEX IF NOT EXISTS idx_images_type ON images(image_type);
 CREATE INDEX IF NOT EXISTS idx_images_status ON images(status);
 CREATE INDEX IF NOT EXISTS idx_multimodal_entities_project ON multimodal_entities(project_id);
 CREATE INDEX IF NOT EXISTS idx_multimodal_entities_entity ON multimodal_entities(entity_id);
 CREATE INDEX IF NOT EXISTS idx_multimodal_entity_links_project ON multimodal_entity_links(project_id);
--- a/backend/test_multimodal.py
+++ b/backend/test_multimodal.py
@@ -0,0 +1,157 @@
 #!/usr/bin/env python3
 """
 InsightFlow Multimodal Module Test Script
 Tests the multimodal support modules
 """
 import sys
 import os
 # Add the backend directory to the import path
 sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
 print("=" * 60)
 print("InsightFlow 多模态模块测试")
 print("=" * 60)
 # Test imports
 print("\n1. 测试模块导入...")
 try:
    from multimodal_processor import (
        get_multimodal_processor, MultimodalProcessor,
        VideoProcessingResult, VideoFrame
    )
    print("   ✓ multimodal_processor 导入成功")
 except ImportError as e:
    print(f"   ✗ multimodal_processor 导入失败: {e}")
 try:
    from image_processor import (
        get_image_processor, ImageProcessor,
        ImageProcessingResult, ImageEntity, ImageRelation
    )
    print("   ✓ image_processor 导入成功")
 except ImportError as e:
    print(f"   ✗ image_processor 导入失败: {e}")
 try:
    from multimodal_entity_linker import (
        get_multimodal_entity_linker, MultimodalEntityLinker,
        MultimodalEntity, EntityLink, AlignmentResult, FusionResult
    )
    print("   ✓ multimodal_entity_linker 导入成功")
 except ImportError as e:
    print(f"   ✗ multimodal_entity_linker 导入失败: {e}")
 # Test initialization
 print("\n2. 测试模块初始化...")
 try:
    processor = get_multimodal_processor()
    print(f"   ✓ MultimodalProcessor 初始化成功")
    print(f"     - 临时目录: {processor.temp_dir}")
    print(f"     - 帧提取间隔: {processor.frame_interval}秒")
 except Exception as e:
    print(f"   ✗ MultimodalProcessor 初始化失败: {e}")
 try:
    img_processor = get_image_processor()
    print(f"   ✓ ImageProcessor 初始化成功")
    print(f"     - 临时目录: {img_processor.temp_dir}")
 except Exception as e:
    print(f"   ✗ ImageProcessor 初始化失败: {e}")
 try:
    linker = get_multimodal_entity_linker()
    print(f"   ✓ MultimodalEntityLinker 初始化成功")
    print(f"     - 相似度阈值: {linker.similarity_threshold}")
 except Exception as e:
    print(f"   ✗ MultimodalEntityLinker 初始化失败: {e}")
 # Test entity linking
 print("\n3. 测试实体关联功能...")
 try:
    linker = get_multimodal_entity_linker()
    # Test string similarity
    sim = linker.calculate_string_similarity("Project Alpha", "Project Alpha")
    assert sim == 1.0, "完全匹配应该返回1.0"
    print(f"   ✓ 字符串相似度计算正常 (完全匹配: {sim})")
    sim = linker.calculate_string_similarity("K8s", "Kubernetes")
    print(f"   ✓ 字符串相似度计算正常 (不同字符串: {sim:.2f})")
    # Test entity similarity
    entity1 = {"name": "Project Alpha", "type": "PROJECT", "definition": "核心项目"}
    entity2 = {"name": "Project Alpha", "type": "PROJECT", "definition": "主要项目"}
    sim, match_type = linker.calculate_entity_similarity(entity1, entity2)
    print(f"   ✓ 实体相似度计算正常 (相似度: {sim:.2f}, 类型: {match_type})")
 except Exception as e:
    print(f"   ✗ 实体关联功能测试失败: {e}")
 # Test the image processor (no real image required)
 print("\n4. 测试图片处理器功能...")
 try:
    processor = get_image_processor()
    # Test image type detection (using mock data)
    print(f"   ✓ 支持的图片类型: {list(processor.IMAGE_TYPES.keys())}")
    print(f"   ✓ 图片类型描述: {processor.IMAGE_TYPES}")
 except Exception as e:
    print(f"   ✗ 图片处理器功能测试失败: {e}")
 # Test the video processor configuration
 print("\n5. 测试视频处理器配置...")
 try:
    processor = get_multimodal_processor()
    print(f"   ✓ 视频目录: {processor.video_dir}")
    print(f"   ✓ 帧目录: {processor.frames_dir}")
    print(f"   ✓ 音频目录: {processor.audio_dir}")
    # Check that the directories exist
    for dir_name, dir_path in [
        ("视频", processor.video_dir),
        ("帧", processor.frames_dir),
        ("音频", processor.audio_dir)
    ]:
        if os.path.exists(dir_path):
            print(f"   ✓ {dir_name}目录存在: {dir_path}")
        else:
            print(f"   ✗ {dir_name}目录不存在: {dir_path}")
 except Exception as e:
    print(f"   ✗ 视频处理器配置测试失败: {e}")
 # Test the database methods (if a database is available)
 print("\n6. 测试数据库多模态方法...")
 try:
    from db_manager import get_db_manager
    db = get_db_manager()
    # Check that the multimodal tables exist
    conn = db.get_conn()
    tables = ['videos', 'video_frames', 'images', 'multimodal_mentions', 'multimodal_entity_links']
    for table in tables:
        try:
            conn.execute(f"SELECT 1 FROM {table} LIMIT 1")
            print(f"   ✓ 表 '{table}' 存在")
        except Exception as e:
            print(f"   ✗ 表 '{table}' 不存在或无法访问: {e}")
    conn.close()
 except Exception as e:
    print(f"   ✗ 数据库多模态方法测试失败: {e}")
 print("\n" + "=" * 60)
 print("测试完成")
 print("=" * 60)
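Step 6 of the script probes each table with a `SELECT`. Assuming `db.get_conn()` returns a `sqlite3` connection (the `db_manager` module is not shown in this diff), the same check can be made against `sqlite_master` without interpolating table names into SQL. `missing_tables` is a hypothetical helper, not part of the project.

```python
import sqlite3
from typing import Iterable, List


def missing_tables(conn: sqlite3.Connection, expected: Iterable[str]) -> List[str]:
    """Return the expected table names that are absent from the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    present = {row[0] for row in rows}
    # Preserve the caller's order so the report is easy to read
    return [name for name in expected if name not in present]
```

An empty list means every expected table exists, so the script could print one summary line instead of one line per table.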
--- a/chrome-extension/background.js
+++ b/chrome-extension/background.js
@@ -0,0 +1,217 @@
 // InsightFlow Chrome Extension - Background Script
 // Handles background tasks, the context menu, and message passing
 // Default configuration
 const DEFAULT_CONFIG = {
  serverUrl: 'http://122.51.127.111:18000',
  apiKey: '',
  defaultProjectId: ''
 };
 // Initialization
 chrome.runtime.onInstalled.addListener(() => {
  // Create the context menu item
  chrome.contextMenus.create({
    id: 'clipSelection',
    title: '保存到 InsightFlow',
    contexts: ['selection', 'page']
  });
  // Initialize storage
  chrome.storage.sync.get(['insightflowConfig'], (result) => {
    if (!result.insightflowConfig) {
      chrome.storage.sync.set({ insightflowConfig: DEFAULT_CONFIG });
    }
  });
 });
 // Handle context menu clicks
 chrome.contextMenus.onClicked.addListener((info, tab) => {
  if (info.menuItemId === 'clipSelection') {
    clipPage(tab, info.selectionText);
  }
 });
 // Handle clicks on the extension icon
 chrome.action.onClicked.addListener((tab) => {
  clipPage(tab);
 });
 // Listen for messages from the content script
 chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'clipPage') {
    clipPage(sender.tab, request.selectionText);
    sendResponse({ success: true });
  } else if (request.action === 'getConfig') {
    chrome.storage.sync.get(['insightflowConfig'], (result) => {
      sendResponse(result.insightflowConfig || DEFAULT_CONFIG);
    });
    return true; // Keep the message channel open for the async response
  } else if (request.action === 'saveConfig') {
    chrome.storage.sync.set({ insightflowConfig: request.config }, () => {
      sendResponse({ success: true });
    });
    return true;
  } else if (request.action === 'fetchProjects') {
    fetchProjects().then(projects => {
      sendResponse({ success: true, projects });
    }).catch(error => {
      sendResponse({ success: false, error: error.message });
    });
    return true;
  }
 });
 // Clip the page
 async function clipPage(tab, selectionText = null) {
  try {
    // Load the configuration
    const config = await getConfig();
    if (!config.apiKey) {
      showNotification('请先配置 API Key', '点击扩展图标打开设置');
      chrome.runtime.openOptionsPage();
      return;
    }
    // Extract the page content
    const [{ result }] = await chrome.scripting.executeScript({
      target: { tabId: tab.id },
      func: extractPageContent,
      args: [selectionText]
    });
    // Send to InsightFlow
    const response = await sendToInsightFlow(config, result);
    if (response.success) {
      showNotification('保存成功', '内容已导入 InsightFlow');
    } else {
      showNotification('保存失败', response.error || '未知错误');
    }
  } catch (error) {
    console.error('Clip error:', error);
    showNotification('保存失败', error.message);
  }
 }
 // Extract the page content (runs inside the page)
 function extractPageContent(selectionText) {
  const data = {
    url: window.location.href,
    title: document.title,
    selection: selectionText,
    timestamp: new Date().toISOString()
  };
  if (selectionText) {
    // Save only the selected text
    data.content = selectionText;
    data.contentType = 'selection';
  } else {
    // Save the whole page
    // Find the main content
    const article = document.querySelector('article') || 
                   document.querySelector('main') || 
                   document.querySelector('.content') ||
                   document.querySelector('#content');
    if (article) {
      data.content = article.innerText;
      data.contentType = 'article';
    } else {
      // Use the body text, with scripts, styles, and page chrome removed
      const bodyClone = document.body.cloneNode(true);
      const scripts = bodyClone.querySelectorAll('script, style, nav, header, footer, aside');
      scripts.forEach(el => el.remove());
      data.content = bodyClone.innerText;
      data.contentType = 'page';
    }
    // Cap the content length
    if (data.content.length > 50000) {
      data.content = data.content.substring(0, 50000) + '...';
      data.truncated = true;
    }
  }
  // Collect metadata
  data.meta = {
    description: document.querySelector('meta[name="description"]')?.content || '',
    keywords: document.querySelector('meta[name="keywords"]')?.content || '',
    author: document.querySelector('meta[name="author"]')?.content || ''
  };
  return data;
 }
 // Send to InsightFlow
 async function sendToInsightFlow(config, data) {
  const url = `${config.serverUrl}/api/v1/plugins/chrome/clip`;
  const payload = {
    url: data.url,
    title: data.title,
    content: data.content,
    content_type: data.contentType,
    meta: data.meta,
    project_id: config.defaultProjectId || null
  };
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': config.apiKey
    },
    body: JSON.stringify(payload)
  });
  if (!response.ok) {
    const error = await response.text();
    throw new Error(error);
  }
  return await response.json();
 }
 // Load the configuration
 function getConfig() {
  return new Promise((resolve) => {
    chrome.storage.sync.get(['insightflowConfig'], (result) => {
      resolve(result.insightflowConfig || DEFAULT_CONFIG);
    });
  });
 }
 // Fetch the project list
 async function fetchProjects() {
  const config = await getConfig();
  if (!config.apiKey) {
    throw new Error('请先配置 API Key');
  }
  const response = await fetch(`${config.serverUrl}/api/v1/projects`, {
    headers: {
      'X-API-Key': config.apiKey
    }
  });
  if (!response.ok) {
    throw new Error('获取项目列表失败');
  }
  const data = await response.json();
  return data.projects || [];
 }
 // Show a notification
 function showNotification(title, message) {
  chrome.notifications.create({
    type: 'basic',
    iconUrl: 'icons/icon128.png',
    title,
    message
  });
 }
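To exercise the clip endpoint without the extension, the request that `sendToInsightFlow` builds can be reproduced from Python. The sketch below only constructs the request and does not send it. The payload fields, the path, and the `X-API-Key` header mirror the script above. `build_clip_request` is a hypothetical helper, and the 50,000-character cap is applied here for illustration (in the extension it is applied in `extractPageContent`).

```python
import json
import urllib.request
from typing import Optional

MAX_CONTENT_CHARS = 50000  # same cap as extractPageContent


def build_clip_request(
    server_url: str,
    api_key: str,
    page_url: str,
    title: str,
    content: str,
    content_type: str = "page",
    meta: Optional[dict] = None,
    project_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/plugins/chrome/clip."""
    if len(content) > MAX_CONTENT_CHARS:
        content = content[:MAX_CONTENT_CHARS] + "..."
    payload = {
        "url": page_url,
        "title": title,
        "content": content,
        "content_type": content_type,
        "meta": meta or {},
        "project_id": project_id,
    }
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/api/v1/plugins/chrome/clip",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Python slices by code point while JavaScript's `substring` counts UTF-16 code units, so the cut point can differ slightly for text outside the Basic Multilingual Plane.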
--- a/chrome-extension/content.css
+++ b/chrome-extension/content.css
@@ -0,0 +1,141 @@
 /* InsightFlow Chrome Extension - Content Styles */
 .insightflow-float-btn {
  position: absolute;
  width: 36px;
  height: 36px;
  background: #4f46e5;
  border-radius: 50%;
  display: none;
  align-items: center;
  justify-content: center;
  cursor: pointer;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.15);
  z-index: 999999;
  transition: transform 0.2s, box-shadow 0.2s;
 }
 .insightflow-float-btn:hover {
  transform: scale(1.1);
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);
 }
 .insightflow-float-btn svg {
  color: white;
 }
 .insightflow-popup {
  position: absolute;
  width: 300px;
  background: white;
  border-radius: 8px;
  box-shadow: 0 4px 20px rgba(0, 0, 0, 0.15);
  z-index: 999999;
  display: none;
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
  font-size: 14px;
 }
 .insightflow-popup-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 12px 16px;
  border-bottom: 1px solid #e5e7eb;
  font-weight: 600;
  color: #111827;
 }
 .insightflow-close-btn {
  background: none;
  border: none;
  font-size: 20px;
  color: #6b7280;
  cursor: pointer;
  padding: 0;
  width: 24px;
  height: 24px;
  display: flex;
  align-items: center;
  justify-content: center;
 }
 .insightflow-close-btn:hover {
  color: #111827;
 }
 .insightflow-popup-content {
  padding: 16px;
 }
 .insightflow-text-preview {
  background: #f3f4f6;
  padding: 12px;
  border-radius: 6px;
  font-size: 13px;
  color: #4b5563;
  line-height: 1.5;
  max-height: 120px;
  overflow-y: auto;
  margin-bottom: 12px;
 }
 .insightflow-actions {
  display: flex;
  gap: 8px;
 }
 .insightflow-btn {
  flex: 1;
  padding: 8px 12px;
  border: 1px solid #d1d5db;
  border-radius: 6px;
  background: white;
  color: #374151;
  font-size: 13px;
  cursor: pointer;
  transition: all 0.2s;
 }
 .insightflow-btn:hover {
  background: #f9fafb;
  border-color: #9ca3af;
 }
 .insightflow-btn-primary {
  background: #4f46e5;
  border-color: #4f46e5;
  color: white;
 }
 .insightflow-btn-primary:hover {
  background: #4338ca;
  border-color: #4338ca;
 }
 .insightflow-project-list {
  max-height: 200px;
  overflow-y: auto;
 }
 .insightflow-project-item {
  padding: 12px;
  border-radius: 6px;
  cursor: pointer;
  transition: background 0.2s;
 }
 .insightflow-project-item:hover {
  background: #f3f4f6;
 }
 .insightflow-project-name {
  font-weight: 500;
  color: #111827;
  margin-bottom: 4px;
 }
 .insightflow-project-desc {
  font-size: 12px;
  color: #6b7280;
 }
--- a/chrome-extension/content.js
+++ b/chrome-extension/content.js
@@ -0,0 +1,204 @@
 // InsightFlow Chrome Extension - Content Script
 // Injected into the page to handle in-page interaction
 (function() {
  'use strict';
  // Guard against duplicate injection
  if (window.insightflowInjected) return;
  window.insightflowInjected = true;
  // Lazily created UI elements: floating button and selection popup
  let floatingButton = null;
  let selectionPopup = null;
  // Watch for text selection
  document.addEventListener('mouseup', handleSelection);
  document.addEventListener('keyup', handleSelection);
  function handleSelection(e) {
    const selection = window.getSelection();
    const text = selection.toString().trim();
    if (text.length > 0) {
      showFloatingButton(selection);
    } else {
      hideFloatingButton();
      hideSelectionPopup();
    }
  }
  // Show the floating button
  function showFloatingButton(selection) {
    if (!floatingButton) {
      floatingButton = document.createElement('div');
      floatingButton.className = 'insightflow-float-btn';
      floatingButton.innerHTML = `
        <svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
          <path d="M12 5v14M5 12h14"/>
        </svg>
      `;
      floatingButton.title = 'Save to InsightFlow';
      document.body.appendChild(floatingButton);
      floatingButton.addEventListener('click', () => {
        const text = window.getSelection().toString().trim();
        if (text) {
          showSelectionPopup(text);
        }
      });
    }
    // Position the button
    const range = selection.getRangeAt(0);
    const rect = range.getBoundingClientRect();
    floatingButton.style.left = `${rect.right + window.scrollX - 40}px`;
    floatingButton.style.top = `${rect.top + window.scrollY - 45}px`;
    floatingButton.style.display = 'flex';
  }
  // Hide the floating button
  function hideFloatingButton() {
    if (floatingButton) {
      floatingButton.style.display = 'none';
    }
  }
  // Show the selection popup
  function showSelectionPopup(text) {
    hideFloatingButton();
    if (!selectionPopup) {
      selectionPopup = document.createElement('div');
      selectionPopup.className = 'insightflow-popup';
      document.body.appendChild(selectionPopup);
    }
    selectionPopup.innerHTML = `
      <div class="insightflow-popup-header">
        <span>Save to InsightFlow</span>
        <button class="insightflow-close-btn">&times;</button>
      </div>
      <div class="insightflow-popup-content">
        <div class="insightflow-text-preview">${escapeHtml(text.substring(0, 200))}${text.length > 200 ? '...' : ''}</div>
        <div class="insightflow-actions">
          <button class="insightflow-btn insightflow-btn-primary" id="if-save-quick">Quick save</button>
          <button class="insightflow-btn" id="if-save-select">Choose project...</button>
        </div>
      </div>
    `;
    selectionPopup.style.display = 'block';
    // Position the popup
    const selection = window.getSelection();
    const range = selection.getRangeAt(0);
    const rect = range.getBoundingClientRect();
    selectionPopup.style.left = `${Math.min(rect.left + window.scrollX, window.innerWidth - 320)}px`;
    selectionPopup.style.top = `${rect.bottom + window.scrollY + 10}px`;
    // Bind event handlers
    selectionPopup.querySelector('.insightflow-close-btn').addEventListener('click', hideSelectionPopup);
    selectionPopup.querySelector('#if-save-quick').addEventListener('click', () => saveQuick(text));
    selectionPopup.querySelector('#if-save-select').addEventListener('click', () => saveWithProject(text));
  }
  // Hide the selection popup
  function hideSelectionPopup() {
    if (selectionPopup) {
      selectionPopup.style.display = 'none';
    }
  }
  // Quick save
  async function saveQuick(text) {
    hideSelectionPopup();
    chrome.runtime.sendMessage({
      action: 'clipPage',
      selectionText: text
    });
  }
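  // Save after choosing a project
  async function saveWithProject(text) {
    // Fetch the project list
    chrome.runtime.sendMessage({ action: 'fetchProjects' }, (response) => {
      // The response is undefined when the background script did not answer
      const projects = (response && response.success && response.projects) || [];
      if (projects.length > 0) {
        showProjectSelector(text, projects);
      } else {
        saveQuick(text); // Fall back to quick save when no list is available
      }
    });
  }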
  // Show the project selector
  function showProjectSelector(text, projects) {
    selectionPopup.innerHTML = `
      <div class="insightflow-popup-header">
        <span>Choose a project</span>
        <button class="insightflow-close-btn">&times;</button>
      </div>
      <div class="insightflow-popup-content">
        <div class="insightflow-project-list">
          ${projects.map(p => `
            <div class="insightflow-project-item" data-id="${escapeHtml(String(p.id)).replace(/"/g, '&quot;')}">
              <div class="insightflow-project-name">${escapeHtml(p.name)}</div>
              <div class="insightflow-project-desc">${escapeHtml((p.description || '').substring(0, 50))}</div>
            </div>
          `).join('')}
        </div>
      </div>
    `;
    selectionPopup.querySelector('.insightflow-close-btn').addEventListener('click', hideSelectionPopup);
    // Bind project selection handlers
    selectionPopup.querySelectorAll('.insightflow-project-item').forEach(item => {
      item.addEventListener('click', () => {
        const projectId = item.dataset.id;
        saveToProject(text, projectId);
      });
    });
  }
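  // Save to the chosen project
  async function saveToProject(text, projectId) {
    hideSelectionPopup();
    chrome.runtime.sendMessage({
      action: 'getConfig'
    }, (config) => {
      if (!config) {
        saveQuick(text); // Config unavailable; fall back to quick save
        return;
      }
      // Set the chosen project as the default. Note that this is saved to
      // storage, so it stays in effect until the default is changed again.
      config.defaultProjectId = projectId;
      chrome.runtime.sendMessage({
        action: 'saveConfig',
        config: config
      }, () => {
        chrome.runtime.sendMessage({
          action: 'clipPage',
          selectionText: text
        });
      });
    });
  }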
  // HTML escaping
  function escapeHtml(text) {
    const div = document.createElement('div');
    div.textContent = text;
    return div.innerHTML;
  }
  // Close the popup when clicking elsewhere on the page
  document.addEventListener('click', (e) => {
    if (selectionPopup && !selectionPopup.contains(e.target) && 
        floatingButton && !floatingButton.contains(e.target)) {
      hideSelectionPopup();
      hideFloatingButton();
    }
  });
 })();
--- a/chrome-extension/manifest.json
+++ b/chrome-extension/manifest.json
@@ -0,0 +1,46 @@
 {
  "manifest_version": 3,
  "name": "InsightFlow Clipper",
  "version": "1.0.0",
  "description": "将网页内容一键导入 InsightFlow 知识库",
  "permissions": [
    "activeTab",
    "storage",
    "contextMenus",
    "scripting"
  ],
  "host_permissions": [
    "http://*/*",
    "https://*/*"
  ],
  "action": {
    "default_popup": "popup.html",
    "default_icon": {
      "16": "icons/icon16.png",
      "48": "icons/icon48.png",
      "128": "icons/icon128.png"
    }
  },
  "icons": {
    "16": "icons/icon16.png",
    "48": "icons/icon48.png",
    "128": "icons/icon128.png"
  },
  "background": {
    "service_worker": "background.js"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["content.js"],
      "css": ["content.css"]
    }
  ],
  "options_page": "options.html",
  "web_accessible_resources": [
    {
      "resources": ["icons/*.png"],
      "matches": ["<all_urls>"]
    }
  ]
 }
--- a/chrome-extension/options.html
+++ b/chrome-extension/options.html
@@ -0,0 +1,349 @@
 <!DOCTYPE html>
 <html lang="zh-CN">
 <head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>InsightFlow Clipper Settings</title>
  <style>
    * {
      margin: 0;
      padding: 0;
      box-sizing: border-box;
    }
    body {
      font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
      background: #f3f4f6;
      min-height: 100vh;
      padding: 40px 20px;
    }
    .container {
      max-width: 600px;
      margin: 0 auto;
    }
    .header {
      text-align: center;
      margin-bottom: 32px;
    }
    .header h1 {
      font-size: 28px;
      color: #111827;
      margin-bottom: 8px;
    }
    .header p {
      color: #6b7280;
    }
    .card {
      background: white;
      border-radius: 12px;
      padding: 24px;
      margin-bottom: 24px;
      box-shadow: 0 1px 3px rgba(0,0,0,0.1);
    }
    .card-title {
      font-size: 18px;
      font-weight: 600;
      color: #111827;
      margin-bottom: 20px;
      display: flex;
      align-items: center;
      gap: 8px;
    }
    .form-group {
      margin-bottom: 20px;
    }
    .form-label {
      display: block;
      font-size: 14px;
      font-weight: 500;
      color: #374151;
      margin-bottom: 6px;
    }
    .form-input {
      width: 100%;
      padding: 10px 12px;
      border: 1px solid #d1d5db;
      border-radius: 6px;
      font-size: 14px;
      transition: border-color 0.2s;
    }
    .form-input:focus {
      outline: none;
      border-color: #4f46e5;
    }
    .form-hint {
      font-size: 12px;
      color: #6b7280;
      margin-top: 4px;
    }
    .btn {
      padding: 10px 20px;
      border: none;
      border-radius: 6px;
      font-size: 14px;
      font-weight: 500;
      cursor: pointer;
      transition: all 0.2s;
    }
    .btn-primary {
      background: #4f46e5;
      color: white;
    }
    .btn-primary:hover {
      background: #4338ca;
    }
    .btn-secondary {
      background: white;
      color: #374151;
      border: 1px solid #d1d5db;
    }
    .btn-secondary:hover {
      background: #f9fafb;
    }
    .btn-success {
      background: #10b981;
      color: white;
    }
    .actions {
      display: flex;
      gap: 12px;
      justify-content: flex-end;
      margin-top: 24px;
    }
    .status-badge {
      display: inline-flex;
      align-items: center;
      gap: 6px;
      padding: 6px 12px;
      border-radius: 20px;
      font-size: 12px;
      font-weight: 500;
    }
    .status-badge.success {
      background: #d1fae5;
      color: #065f46;
    }
    .status-badge.error {
      background: #fee2e2;
      color: #991b1b;
    }
    .status-dot {
      width: 6px;
      height: 6px;
      border-radius: 50%;
      background: currentColor;
    }
    .info-box {
      background: #eff6ff;
      border-left: 4px solid #3b82f6;
      padding: 16px;
      border-radius: 0 6px 6px 0;
      margin-bottom: 20px;
    }
    .info-box h4 {
      font-size: 14px;
      color: #1e40af;
      margin-bottom: 8px;
    }
    .info-box p {
      font-size: 13px;
      color: #3b82f6;
      line-height: 1.5;
    }
    .info-box code {
      background: rgba(255,255,255,0.5);
      padding: 2px 6px;
      border-radius: 3px;
      font-family: monospace;
    }
    .shortcut-list {
      list-style: none;
    }
    .shortcut-list li {
      display: flex;
      justify-content: space-between;
      padding: 12px 0;
      border-bottom: 1px solid #e5e7eb;
    }
    .shortcut-list li:last-child {
      border-bottom: none;
    }
    .shortcut-key {
      background: #f3f4f6;
      padding: 4px 8px;
      border-radius: 4px;
      font-size: 12px;
      font-family: monospace;
      color: #374151;
    }
    .footer {
      text-align: center;
      padding: 24px;
      color: #6b7280;
      font-size: 13px;
    }
    .footer a {
      color: #4f46e5;
      text-decoration: none;
    }
    .footer a:hover {
      text-decoration: underline;
    }
    #testResult {
      margin-top: 12px;
      padding: 12px;
      border-radius: 6px;
      font-size: 13px;
      display: none;
    }
    #testResult.success {
      display: block;
      background: #d1fae5;
      color: #065f46;
    }
    #testResult.error {
      display: block;
      background: #fee2e2;
      color: #991b1b;
    }
  </style>
 </head>
 <body>
  <div class="container">
    <div class="header">
      <h1>⚙️ InsightFlow Clipper Settings</h1>
      <p>Configure the connection to your knowledge base</p>
    </div>
    <div class="card">
      <div class="card-title">
        <svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
          <path d="M13 10V3L4 14h7v7l9-11h-7z"/>
        </svg>
        Server connection
      </div>
      <div class="info-box">
        <h4>How to get an API Key</h4>
        <p>
          1. Log in to the InsightFlow console<br>
          2. Open the "Plugin Management" page<br>
          3. Create a Chrome plugin and copy its API Key
        </p>
      </div>
      <div class="form-group">
        <label class="form-label">Server URL</label>
        <input type="text" id="serverUrl" class="form-input" placeholder="http://122.51.127.111:18000">
        <p class="form-hint">URL of the InsightFlow server</p>
      </div>
      <div class="form-group">
        <label class="form-label">API Key</label>
        <input type="password" id="apiKey" class="form-input" placeholder="if_plugin_xxxxxxxx...">
        <p class="form-hint">Plugin API Key obtained from the InsightFlow console</p>
      </div>
      <div class="form-group">
        <button id="testBtn" class="btn btn-secondary">Test connection</button>
        <div id="testResult"></div>
      </div>
    </div>
    <div class="card">
      <div class="card-title">
        <svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
          <path d="M10.325 4.317c.426-1.756 2.924-1.756 3.35 0a1.724 1.724 0 002.573 1.066c1.543-.94 3.31.826 2.37 2.37a1.724 1.724 0 001.065 2.572c1.756.426 1.756 2.924 0 3.35a1.724 1.724 0 00-1.066 2.573c.94 1.543-.826 3.31-2.37 2.37a1.724 1.724 0 00-2.572 1.065c-.426 1.756-2.924 1.756-3.35 0a1.724 1.724 0 00-2.573-1.066c-1.543.94-3.31-.826-2.37-2.37a1.724 1.724 0 00-1.065-2.572c-1.756-.426-1.756-2.924 0-3.35a1.724 1.724 0 001.066-2.573c-.94-1.543.826-3.31 2.37-2.37.996.608 2.296.07 2.572-1.065z"/>
          <path d="M15 12a3 3 0 11-6 0 3 3 0 016 0z"/>
        </svg>
        Default settings
      </div>
      <div class="form-group">
        <label class="form-label">Default project</label>
        <select id="defaultProject" class="form-input">
          <option value="">No default project</option>
        </select>
        <p class="form-hint">Project that saved content is imported into by default</p>
      </div>
    </div>
    <div class="card">
      <div class="card-title">
        <svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
          <path d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z"/>
        </svg>
        Usage
      </div>
      <ul class="shortcut-list">
        <li>
          <span>Save the current page</span>
          <span class="shortcut-key">Click the extension icon</span>
        </li>
        <li>
          <span>Save selected text</span>
          <span class="shortcut-key">Right-click → Save to InsightFlow</span>
        </li>
        <li>
          <span>Quick-save the selection</span>
          <span class="shortcut-key">Select text, then click the floating button</span>
        </li>
        <li>
          <span>Save to a chosen project</span>
          <span class="shortcut-key">Select text, then click "Choose project"</span>
        </li>
      </ul>
    </div>
    <div class="actions">
      <button id="resetBtn" class="btn btn-secondary">Reset</button>
      <button id="saveBtn" class="btn btn-primary">Save settings</button>
    </div>
    <div class="footer">
      <p>InsightFlow Clipper v1.0.0</p>
      <p><a href="#" id="openConsole">Open InsightFlow console</a> | <a href="#" id="helpLink">Help</a></p>
    </div>
  </div>
  <script src="options.js"></script>
 </body>
 </html>
--- a/chrome-extension/options.js
+++ b/chrome-extension/options.js
@@ -0,0 +1,175 @@
 // InsightFlow Chrome Extension - Options Script
 document.addEventListener('DOMContentLoaded', () => {
  const serverUrlInput = document.getElementById('serverUrl');
  const apiKeyInput = document.getElementById('apiKey');
  const defaultProjectSelect = document.getElementById('defaultProject');
  const testBtn = document.getElementById('testBtn');
  const testResult = document.getElementById('testResult');
  const saveBtn = document.getElementById('saveBtn');
  const resetBtn = document.getElementById('resetBtn');
  const openConsole = document.getElementById('openConsole');
  const helpLink = document.getElementById('helpLink');
  // Load the configuration
  loadConfig();
  // Test the connection
  testBtn.addEventListener('click', async () => {
    testBtn.disabled = true;
    testBtn.textContent = 'Testing...';
    // Clearing the class hides the result box through the stylesheet. An inline
    // display:none would override the .success/.error rules and keep it hidden.
    testResult.className = '';
    const serverUrl = serverUrlInput.value.trim();
    const apiKey = apiKeyInput.value.trim();
    if (!serverUrl || !apiKey) {
      showTestResult('Please enter the server URL and API Key', 'error');
      testBtn.disabled = false;
      testBtn.textContent = 'Test connection';
      return;
    }
    try {
      const response = await fetch(`${serverUrl}/api/v1/projects`, {
        headers: { 'X-API-Key': apiKey }
      });
      if (response.ok) {
        const data = await response.json();
        showTestResult(`Connected. Found ${data.projects?.length || 0} project(s)`, 'success');
        // Refresh the project list, keeping the current selection
        updateProjectList(data.projects || [], defaultProjectSelect.value);
      } else if (response.status === 401) {
        showTestResult('Invalid API Key, please check it', 'error');
      } else {
        showTestResult(`Connection failed: HTTP ${response.status}`, 'error');
      }
    } catch (error) {
      showTestResult(`Connection error: ${error.message}`, 'error');
    }
    testBtn.disabled = false;
    testBtn.textContent = 'Test connection';
  });
  // Save settings
  saveBtn.addEventListener('click', async () => {
    const config = {
      serverUrl: serverUrlInput.value.trim(),
      apiKey: apiKeyInput.value.trim(),
      defaultProjectId: defaultProjectSelect.value
    };
    if (!config.serverUrl) {
      alert('Please enter the server URL');
      return;
    }
    await chrome.storage.sync.set({ insightflowConfig: config });
    // Show a short confirmation on the button
    saveBtn.textContent = 'Saved ✓';
    saveBtn.classList.add('btn-success');
    setTimeout(() => {
      saveBtn.textContent = 'Save settings';
      saveBtn.classList.remove('btn-success');
    }, 2000);
  });
  // Reset settings
  resetBtn.addEventListener('click', () => {
    if (confirm('Reset all settings?')) {
      const defaultConfig = {
        serverUrl: 'http://122.51.127.111:18000',
        apiKey: '',
        defaultProjectId: ''
      };
      chrome.storage.sync.set({ insightflowConfig: defaultConfig }, () => {
        loadConfig();
        showTestResult('Settings have been reset', 'success');
      });
    }
  });
  // Open the console
  openConsole.addEventListener('click', (e) => {
    e.preventDefault();
    const serverUrl = serverUrlInput.value.trim();
    if (serverUrl) {
      chrome.tabs.create({ url: serverUrl });
    }
  });
  // Help link
  helpLink.addEventListener('click', (e) => {
    e.preventDefault();
    const serverUrl = serverUrlInput.value.trim();
    if (serverUrl) {
      chrome.tabs.create({ url: `${serverUrl}/docs` });
    }
  });
  // Load the configuration
  async function loadConfig() {
    const result = await chrome.storage.sync.get(['insightflowConfig']);
    const config = result.insightflowConfig || {
      serverUrl: 'http://122.51.127.111:18000',
      apiKey: '',
      defaultProjectId: ''
    };
    serverUrlInput.value = config.serverUrl;
    apiKeyInput.value = config.apiKey;
    // Load the project list if an API Key is set
    if (config.apiKey) {
      loadProjects(config);
    }
  }
  // Load the project list
  async function loadProjects(config) {
    try {
      const response = await fetch(`${config.serverUrl}/api/v1/projects`, {
        headers: { 'X-API-Key': config.apiKey }
      });
      if (response.ok) {
        const data = await response.json();
        updateProjectList(data.projects || [], config.defaultProjectId);
      }
    } catch (error) {
      console.error('Failed to load projects:', error);
    }
  }
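  // Update the project list
  function updateProjectList(projects, selectedId = '') {
    let html = '<option value="">No default project</option>';
    projects.forEach(project => {
      const selected = project.id === selectedId ? 'selected' : '';
      // escapeHtml does not escape double quotes, so handle them for the attribute
      const value = escapeHtml(String(project.id)).replace(/"/g, '&quot;');
      html += `<option value="${value}" ${selected}>${escapeHtml(project.name)}</option>`;
    });
    defaultProjectSelect.innerHTML = html;
  }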
  // Show the test result
  function showTestResult(message, type) {
    testResult.textContent = message;
    testResult.className = type;
  }
  // HTML escaping
  function escapeHtml(text) {
    const div = document.createElement('div');
    div.textContent = text;
    return div.innerHTML;
  }
 });
--- a/chrome-extension/popup.html
+++ b/chrome-extension/popup.html
@@ -0,0 +1,258 @@
 <!DOCTYPE html>
 <html lang="zh-CN">
 <head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>InsightFlow Clipper</title>
  <style>
    * {
      margin: 0;
      padding: 0;
      box-sizing: border-box;
    }
    body {
      width: 360px;
      font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
      background: #f9fafb;
    }
    .header {
      background: linear-gradient(135deg, #4f46e5 0%, #7c3aed 100%);
      color: white;
      padding: 20px;
      text-align: center;
    }
    .header h1 {
      font-size: 18px;
      font-weight: 600;
      margin-bottom: 4px;
    }
    .header p {
      font-size: 12px;
      opacity: 0.9;
    }
    .content {
      padding: 16px;
    }
    .status-card {
      background: white;
      border-radius: 8px;
      padding: 16px;
      margin-bottom: 16px;
      box-shadow: 0 1px 3px rgba(0,0,0,0.1);
    }
    .status-header {
      display: flex;
      align-items: center;
      gap: 8px;
      margin-bottom: 12px;
    }
    .status-dot {
      width: 8px;
      height: 8px;
      border-radius: 50%;
      background: #10b981;
    }
    .status-dot.error {
      background: #ef4444;
    }
    .status-text {
      font-size: 14px;
      font-weight: 500;
      color: #111827;
    }
    .project-select {
      width: 100%;
      padding: 10px 12px;
      border: 1px solid #d1d5db;
      border-radius: 6px;
      font-size: 14px;
      background: white;
      cursor: pointer;
    }
    .project-select:focus {
      outline: none;
      border-color: #4f46e5;
    }
    .btn {
      width: 100%;
      padding: 12px;
      border: none;
      border-radius: 6px;
      font-size: 14px;
      font-weight: 500;
      cursor: pointer;
      transition: all 0.2s;
      display: flex;
      align-items: center;
      justify-content: center;
      gap: 8px;
    }
    .btn-primary {
      background: #4f46e5;
      color: white;
    }
    .btn-primary:hover {
      background: #4338ca;
    }
    .btn-secondary {
      background: white;
      color: #374151;
      border: 1px solid #d1d5db;
      margin-top: 8px;
    }
    .btn-secondary:hover {
      background: #f9fafb;
    }
    .stats {
      display: grid;
      grid-template-columns: repeat(3, 1fr);
      gap: 12px;
      margin-top: 16px;
    }
    .stat-item {
      text-align: center;
      padding: 12px;
      background: #f3f4f6;
      border-radius: 6px;
    }
    .stat-value {
      font-size: 20px;
      font-weight: 600;
      color: #4f46e5;
    }
    .stat-label {
      font-size: 11px;
      color: #6b7280;
      margin-top: 4px;
    }
    .footer {
      padding: 12px 16px;
      text-align: center;
      border-top: 1px solid #e5e7eb;
    }
    .footer a {
      color: #4f46e5;
      text-decoration: none;
      font-size: 12px;
    }
    .footer a:hover {
      text-decoration: underline;
    }
    .loading {
      display: inline-block;
      width: 16px;
      height: 16px;
      border: 2px solid #ffffff;
      border-top-color: transparent;
      border-radius: 50%;
      animation: spin 0.8s linear infinite;
    }
    @keyframes spin {
      to { transform: rotate(360deg); }
    }
    .message {
      padding: 12px;
      border-radius: 6px;
      font-size: 13px;
      margin-bottom: 12px;
      display: none;
    }
    .message.success {
      display: block;
      background: #d1fae5;
      color: #065f46;
    }
    .message.error {
      display: block;
      background: #fee2e2;
      color: #991b1b;
    }
  </style>
 </head>
 <body>
  <div class="header">
    <h1>🧠 InsightFlow</h1>
    <p>Save web pages to your knowledge base in one click</p>
  </div>
  <div class="content">
    <div id="message" class="message"></div>
    <div class="status-card">
      <div class="status-header">
        <div id="statusDot" class="status-dot"></div>
        <span id="statusText" class="status-text">Connecting...</span>
      </div>
      <select id="projectSelect" class="project-select">
        <option value="">Choose a project...</option>
      </select>
    </div>
    <button id="clipBtn" class="btn btn-primary">
      <svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
        <path d="M12 5v14M5 12h14"/>
      </svg>
      Save current page
    </button>
    <button id="settingsBtn" class="btn btn-secondary">
      <svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
        <path d="M10.325 4.317c.426-1.756 2.924-1.756 3.35 0a1.724 1.724 0 002.573 1.066c1.543-.94 3.31.826 2.37 2.37a1.724 1.724 0 001.065 2.572c1.756.426 1.756 2.924 0 3.35a1.724 1.724 0 00-1.066 2.573c.94 1.543-.826 3.31-2.37 2.37a1.724 1.724 0 00-2.572 1.065c-.426 1.756-2.924 1.756-3.35 0a1.724 1.724 0 00-2.573-1.066c-1.543.94-3.31-.826-2.37-2.37a1.724 1.724 0 00-1.065-2.572c-1.756-.426-1.756-2.924 0-3.35a1.724 1.724 0 001.066-2.573c-.94-1.543.826-3.31 2.37-2.37.996.608 2.296.07 2.572-1.065z"/>
        <path d="M15 12a3 3 0 11-6 0 3 3 0 016 0z"/>
      </svg>
      Settings
    </button>
    <div class="stats">
      <div class="stat-item">
        <div id="clipCount" class="stat-value">0</div>
        <div class="stat-label">Saved</div>
      </div>
      <div class="stat-item">
        <div id="projectCount" class="stat-value">0</div>
        <div class="stat-label">Projects</div>
      </div>
      <div class="stat-item">
        <div id="todayCount" class="stat-value">0</div>
        <div class="stat-label">Today</div>
      </div>
    </div>
  </div>
  <div class="footer">
    <a href="#" id="openDashboard">Open InsightFlow console →</a>
  </div>
  <script src="popup.js"></script>
 </body>
 </html>
--- a/chrome-extension/popup.js
+++ b/chrome-extension/popup.js
@@ -0,0 +1,195 @@
 // InsightFlow Chrome Extension - Popup Script
 document.addEventListener('DOMContentLoaded', async () => {
  const clipBtn = document.getElementById('clipBtn');
  const settingsBtn = document.getElementById('settingsBtn');
  const projectSelect = document.getElementById('projectSelect');
  const statusDot = document.getElementById('statusDot');
  const statusText = document.getElementById('statusText');
  const messageEl = document.getElementById('message');
  const openDashboard = document.getElementById('openDashboard');
  // Load the configuration and project list
  await loadConfig();
  // "Save current page" button
  clipBtn.addEventListener('click', async () => {
    const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
    // Update the button state
    clipBtn.disabled = true;
    clipBtn.innerHTML = '<span class="loading"></span> Saving...';
    // Persist the selected project as the default
    const projectId = projectSelect.value;
    if (projectId) {
      const config = await getConfig();
      config.defaultProjectId = projectId;
      await saveConfig(config);
    }
    // Send the clip request to the background script
    chrome.runtime.sendMessage({
      action: 'clipPage'
    }, (response) => {
      clipBtn.disabled = false;
      clipBtn.innerHTML = `
        <svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
          <path d="M12 5v14M5 12h14"/>
        </svg>
        Save current page
      `;
      if (response && response.success) {
        showMessage('Saved!', 'success');
        updateStats();
      } else {
        showMessage(response?.error || 'Save failed', 'error');
      }
    });
  });
  // Settings button
  settingsBtn.addEventListener('click', () => {
    chrome.runtime.openOptionsPage();
  });
  // Open the console
  openDashboard.addEventListener('click', async (e) => {
    e.preventDefault();
    const config = await getConfig();
    chrome.tabs.create({ url: config.serverUrl });
  });
 });
 // Load the configuration
 async function loadConfig() {
  const config = await getConfig();
  // Check the connection status
  checkConnection(config);
  // Load the project list
  loadProjects(config);
  // Update the statistics
  updateStats();
 }
 // Check the connection status
 async function checkConnection(config) {
  const statusDot = document.getElementById('statusDot');
  const statusText = document.getElementById('statusText');
  if (!config.apiKey) {
    statusDot.classList.add('error');
    statusText.textContent = 'API Key not configured';
    return;
  }
  try {
    const response = await fetch(`${config.serverUrl}/api/v1/projects`, {
      headers: { 'X-API-Key': config.apiKey }
    });
    if (response.ok) {
      statusText.textContent = 'Connected';
    } else {
      statusDot.classList.add('error');
      statusText.textContent = 'Connection failed';
    }
  } catch (error) {
    statusDot.classList.add('error');
    statusText.textContent = 'Connection error';
  }
 }
 // Load the project list
 async function loadProjects(config) {
  const projectSelect = document.getElementById('projectSelect');
  if (!config.apiKey) {
    projectSelect.innerHTML = '<option value="">Please configure an API Key first</option>';
    return;
  }
  try {
    const response = await fetch(`${config.serverUrl}/api/v1/projects`, {
      headers: { 'X-API-Key': config.apiKey }
    });
    if (response.ok) {
      const data = await response.json();
      const projects = data.projects || [];
      // Update the project count
      document.getElementById('projectCount').textContent = projects.length;
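      // Populate the dropdown
      let html = '<option value="">Choose a project...</option>';
      projects.forEach(project => {
        const selected = project.id === config.defaultProjectId ? 'selected' : '';
        // escapeHtml does not escape double quotes, so handle them for the attribute
        const value = escapeHtml(String(project.id)).replace(/"/g, '&quot;');
        html += `<option value="${value}" ${selected}>${escapeHtml(project.name)}</option>`;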
      });
      projectSelect.innerHTML = html;
    }
  } catch (error) {
    console.error('Failed to load projects:', error);
  }
 }
 // Update the statistics
 async function updateStats() {
  // Read the statistics from storage
  const result = await chrome.storage.local.get(['clipStats']);
  const stats = result.clipStats || { total: 0, today: 0, lastDate: null };
  // Reset today's count when the date has changed
  const today = new Date().toDateString();
  if (stats.lastDate !== today) {
    stats.today = 0;
    stats.lastDate = today;
    await chrome.storage.local.set({ clipStats: stats });
  }
  document.getElementById('clipCount').textContent = stats.total;
  document.getElementById('todayCount').textContent = stats.today;
 }
 // Show a message
 function showMessage(text, type) {
  const messageEl = document.getElementById('message');
  messageEl.textContent = text;
  messageEl.className = `message ${type}`;
  setTimeout(() => {
    messageEl.className = 'message';
  }, 3000);
 }
 // Load the configuration
 function getConfig() {
  return new Promise((resolve) => {
    chrome.storage.sync.get(['insightflowConfig'], (result) => {
      resolve(result.insightflowConfig || {
        serverUrl: 'http://122.51.127.111:18000',
        apiKey: '',
        defaultProjectId: ''
      });
    });
  });
 }
 // Save the configuration
 function saveConfig(config) {
  return new Promise((resolve) => {
    chrome.storage.sync.set({ insightflowConfig: config }, resolve);
  });
 }
 // Escape HTML
 function escapeHtml(text) {
  const div = document.createElement('div');
  div.textContent = text;
  return div.innerHTML;
 }
--- a/docs/PHASE7_TASK2_SUMMARY.md
+++ b/docs/PHASE7_TASK2_SUMMARY.md
@@ -0,0 +1,95 @@
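 # InsightFlow Phase 7 Task 2 Development Summary
 ## Completed Work
 ### 1. Multimodal Processing Module (multimodal_processor.py)
 #### VideoProcessor class
 - **Video file handling**: supports MP4, AVI, MOV, MKV, WebM, and FLV formats
 - **Audio extraction**: uses ffmpeg to extract the audio track (WAV format, 16 kHz sample rate)
 - **Keyframe extraction**: uses OpenCV to extract keyframes at a fixed time interval (every 5 seconds by default)
 - **OCR**: recognizes text in keyframes with PaddleOCR, EasyOCR, or Tesseract
 - **Data consolidation**: merges the OCR text from all frames and supports entity extraction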
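The keyframe interval and the audio settings listed above can be sketched as follows. This is a minimal illustration, not the code in `multimodal_processor.py`: the function names `keyframe_indices` and `ffmpeg_audio_args` are hypothetical, and mono output (`-ac 1`) is an assumption, since the summary only states the WAV format and the 16 kHz sample rate.

```python
from typing import List


def keyframe_indices(frame_count: int, fps: float, interval_s: float = 5.0) -> List[int]:
    """Return the frame indices to sample, one every `interval_s` seconds."""
    if frame_count <= 0 or fps <= 0 or interval_s <= 0:
        return []
    step = max(1, round(fps * interval_s))  # number of frames between two keyframes
    return list(range(0, frame_count, step))


def ffmpeg_audio_args(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command line that extracts a 16 kHz WAV audio track."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM samples in a WAV container
        "-ar", "16000",          # 16 kHz sample rate, as stated in the summary
        "-ac", "1",              # mono: an assumption, not stated in the summary
        wav_path,
    ]
```

The argument list could be passed to `subprocess.run(..., check=True)`, and each index could be used to seek in an OpenCV capture before reading a frame.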
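 #### ImageProcessor class
 - **Image handling**: supports JPG, PNG, GIF, BMP, and WebP formats
 - **OCR**: recognizes text in images (whiteboards, slides, handwritten notes)
 - **Image description**: an interface for a multimodal LLM is reserved (not yet integrated)
 - **Batch processing**: supports batch image import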
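 #### MultimodalEntityExtractor class
 - Extracts entities and relations from video and image processing results
 - Integrates with the existing LLM client
 ### 2. Multimodal Entity Linking Module (multimodal_entity_linker.py)
 #### MultimodalEntityLinker class
 - **Cross-modal entity alignment**: uses embedding similarity to find the same entity across different modalities
 - **Multimodal entity profile**: counts how often an entity is mentioned in each modality
 - **Cross-modal relation discovery**: finds entities that co-occur in the same video frame or image
 - **Multimodal timeline**: presents multimodal events in chronological order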
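Cross-modal alignment by embedding similarity can be illustrated with a small, dependency-free sketch. It assumes each entity already has an embedding vector (in the project these would presumably come from `sentence-transformers`). The names `cosine` and `align_entities` and the `0.85` threshold are hypothetical choices for illustration, not values taken from `multimodal_entity_linker.py`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align_entities(
    text_entities: Dict[str, Sequence[float]],
    visual_entities: Dict[str, Sequence[float]],
    threshold: float = 0.85,  # hypothetical cut-off
) -> List[Tuple[str, str, float]]:
    """Link each text entity to its most similar visual entity above the threshold."""
    links = []
    for t_name, t_vec in text_entities.items():
        # Best-scoring visual candidate for this text entity, if any exist
        best = max(
            ((cosine(t_vec, v_vec), v_name) for v_name, v_vec in visual_entities.items()),
            default=None,
        )
        if best is not None and best[0] >= threshold:
            links.append((t_name, best[1], best[0]))
    return links
```

A greedy best-match per entity is the simplest option; a real linker might also enforce one-to-one matches or compare entity types before linking.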
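 ### 3. Database Updates (schema.sql)
 New tables:
 - `videos`: video metadata (duration, frame rate, resolution, OCR text)
 - `video_frames`: video keyframes (frame data, timestamp, OCR text)
 - `images`: image metadata (OCR text, description, extracted entities)
 - `multimodal_mentions`: multimodal entity mentions
 - `multimodal_entity_links`: multimodal entity links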
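To show how `video_frames` relates to `videos`, the sketch below creates simplified versions of the two tables in an in-memory SQLite database and reads the OCR text of one video in timestamp order. The column names here are assumptions made for illustration; the actual definitions live in `schema.sql` and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified columns; see schema.sql for the real definitions
    CREATE TABLE videos (
        id TEXT PRIMARY KEY,
        project_id TEXT NOT NULL,
        duration REAL,
        fps REAL,
        ocr_text TEXT
    );
    CREATE TABLE video_frames (
        id TEXT PRIMARY KEY,
        video_id TEXT NOT NULL REFERENCES videos(id),
        timestamp REAL NOT NULL,
        ocr_text TEXT
    );
    """
)
conn.execute("INSERT INTO videos VALUES (?, ?, ?, ?, ?)", ("v1", "p1", 10.0, 30.0, None))
# Insert frames out of order to show that the query restores chronological order
conn.executemany(
    "INSERT INTO video_frames VALUES (?, ?, ?, ?)",
    [("f2", "v1", 5.0, "Q1 roadmap"), ("f1", "v1", 0.0, "Agenda")],
)

rows = conn.execute(
    "SELECT timestamp, ocr_text FROM video_frames WHERE video_id = ? ORDER BY timestamp",
    ("v1",),
).fetchall()
merged_text = "\n".join(text for _, text in rows)  # per-video merged OCR text
```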
 ### 4. API Endpoints (main.py)
 #### Video
 - `POST /api/v1/projects/{id}/upload-video` - upload a video
 - `GET /api/v1/projects/{id}/videos` - list videos
 - `GET /api/v1/videos/{id}` - video details
 #### Image
 - `POST /api/v1/projects/{id}/upload-image` - upload an image
 - `GET /api/v1/projects/{id}/images` - list images
 - `GET /api/v1/images/{id}` - image details
 #### Multimodal entity linking
 - `POST /api/v1/projects/{id}/multimodal/link-entities` - link entities across modalities
 - `GET /api/v1/entities/{id}/multimodal-profile` - multimodal profile of an entity
 - `GET /api/v1/projects/{id}/multimodal-timeline` - multimodal timeline
 - `GET /api/v1/entities/{id}/cross-modal-relations` - cross-modal relations
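As an example of calling one of these endpoints, the sketch below builds a request for the video list of a project using only the standard library. It assumes the endpoint accepts the same `X-API-Key` header that the Chrome extension sends to `/api/v1/projects`; the helper name `build_list_videos_request` and the local server URL in the usage note are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def build_list_videos_request(server_url: str, api_key: str, project_id: str) -> Request:
    """Build (but do not send) a GET request for a project's video list."""
    # Percent-encode the id so it cannot alter the path structure
    path = f"/api/v1/projects/{quote(project_id, safe='')}/videos"
    return Request(
        server_url.rstrip("/") + path,
        headers={"X-API-Key": api_key},  # assumed to match the extension's auth header
        method="GET",
    )
```

The request can then be sent with `urllib.request.urlopen(build_list_videos_request("http://localhost:8000", key, project_id))` and the JSON body parsed with `json.load`.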
 ### 5. Dependency Updates (requirements.txt)
 New dependencies:
 - `opencv-python==4.9.0.80` - video processing
 - `pillow==10.2.0` - image processing
 - `paddleocr==2.7.0.3` + `paddlepaddle==2.6.0` - OCR engine
 - `ffmpeg-python==0.2.0` - ffmpeg wrapper
 - `sentence-transformers==2.3.1` - cross-modal alignment
 ## System Requirements
 - **ffmpeg**: must be installed; used for video and audio processing
 - **Python 3.8+**: supported by all dependencies
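A startup check for these two requirements might look like the sketch below. The function name `check_requirements` is hypothetical, and nothing in the summary says that the service performs such a check itself.

```python
import shutil
import sys
from typing import List, Tuple


def check_requirements(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ is required, found {found}")
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling `check_requirements()` before accepting uploads would let the service fail early with a clear message instead of failing during the first video job.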
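 ## Remaining Work
 1. **Multimodal LLM integration**: the image description feature still needs to integrate Kimi or another multimodal model API
 2. **Frontend**: a video/image upload UI and multimodal display components still need to be built
 3. **Performance**: processing large video files may require an asynchronous task queue
 4. **OCR engine selection**: choose the OCR engine best suited to the deployment environment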
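One possible approach to item 4 is to pick the first OCR engine whose Python package is importable in the deployment environment. The sketch below is an assumption about how this could be done, not the project's implementation. The preference order is hypothetical, and `pytesseract` is assumed as the Python wrapper for Tesseract.

```python
from importlib.util import find_spec
from typing import Callable, Optional, Sequence

# Hypothetical preference order: import names of the three supported engines
OCR_PREFERENCE = ("paddleocr", "easyocr", "pytesseract")


def _is_importable(module_name: str) -> bool:
    """True if the top-level module can be found without importing it."""
    return find_spec(module_name) is not None


def pick_ocr_engine(
    preference: Sequence[str] = OCR_PREFERENCE,
    is_available: Callable[[str], bool] = _is_importable,
) -> Optional[str]:
    """Return the first available engine name, or None if none is installed."""
    for name in preference:
        if is_available(name):
            return name
    return None
```

Passing `is_available` as a parameter keeps the selection logic testable without installing any OCR package.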
 ## Deployment
 ```bash
 # Install system dependencies
 apt-get update
 apt-get install -y ffmpeg
 # Install Python dependencies
 pip install -r requirements.txt
 # Update the database
 sqlite3 insightflow.db < schema.sql
 # Start the service
 python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
 ```