Phase 7 Task 7: 插件与集成系统

- 创建 plugin_manager.py 模块 - PluginManager: 插件管理主类 - ChromeExtensionHandler: Chrome 插件处理 - BotHandler: 飞书/钉钉/Slack 机器人处理 - WebhookIntegration: Zapier/Make Webhook 集成 - WebDAVSync: WebDAV 同步管理 - 创建完整的 Chrome 扩展代码 - manifest.json, background.js, content.js, content.css - popup.html/js: 弹出窗口界面 - options.html/js: 设置页面 - 支持网页剪藏、选中文本保存、项目选择 - 更新 schema.sql 添加插件相关数据库表 - plugins: 插件配置表 - bot_sessions: 机器人会话表 - webhook_endpoints: Webhook 端点表 - webdav_syncs: WebDAV 同步配置表 - plugin_activity_logs: 插件活动日志表 - 更新 main.py 添加插件相关 API 端点 - GET/POST /api/v1/plugins - 插件管理 - POST /api/v1/plugins/chrome/clip - Chrome 插件保存网页 - POST /api/v1/bots/webhook/{platform} - 接收机器人消息 - GET /api/v1/bots/sessions - 机器人会话列表 - POST /api/v1/webhook-endpoints - 创建 Webhook 端点 - POST /webhook/{type}/{token} - 接收外部 Webhook - POST /api/v1/webdav-syncs - WebDAV 同步配置 - POST /api/v1/webdav-syncs/{id}/test - 测试 WebDAV 连接 - POST /api/v1/webdav-syncs/{id}/sync - 触发 WebDAV 同步 - 更新 requirements.txt 添加插件依赖 - beautifulsoup4: HTML 解析 - webdavclient3: WebDAV 客户端 - 更新 STATUS.md 和 README.md 开发进度
2026-02-23 12:09:15 +08:00
parent 08535e54ba
commit 797ca58e8e
27 changed files with 7350 additions and 11 deletions
--- a/README.md
+++ b/README.md
@@ -191,12 +191,12 @@ MIT
 | 任务 | 状态 | 完成时间 |
 |------|------|----------|
 | 1. 智能工作流自动化 | ✅ 已完成 | 2026-02-23 |
-| 2. 多模态支持 | 🚧 进行中 | - |
+| 2. 多模态支持 | ✅ 已完成 | 2026-02-23 |
+| 7. 插件与集成 | ✅ 已完成 | 2026-02-23 |
 | 3. 数据安全与合规 | 📋 待开发 | - |
 | 4. 协作与共享 | 📋 待开发 | - |
 | 5. 智能报告生成 | 📋 待开发 | - |
 | 6. 高级搜索与发现 | 📋 待开发 | - |
-| 7. 插件与集成 | 📋 待开发 | - |
 | 8. 性能优化与扩展 | 📋 待开发 | - |

 **建议开发顺序**: 1 → 2 → 7 → 3 → 4 → 5 → 6 → 8
--- a/STATUS.md
+++ b/STATUS.md
@@ -1,10 +1,10 @@
 # InsightFlow 开发状态

-**最后更新**: 2026-02-23 00:00
+**最后更新**: 2026-02-23 06:00

 ## 当前阶段

-Phase 7: 工作流自动化 - **进行中 🚧**
+Phase 7: 插件与集成 - **已完成 ✅**

 ## 部署状态

@@ -36,7 +36,7 @@ Phase 7: 工作流自动化 - **进行中 🚧**
 - 导出功能
 - API 开放平台

-### Phase 7 - 工作流自动化 (进行中 🚧)
+### Phase 7 - 任务 1: 工作流自动化 (已完成 ✅)
 - ✅ 创建 workflow_manager.py - 工作流管理模块
  - WorkflowManager: 主管理类
  - WorkflowTask: 工作流任务定义
@@ -59,9 +59,81 @@ Phase 7: 工作流自动化 - **进行中 🚧**
  - POST /api/v1/webhooks/{id}/test - 测试 Webhook
 - ✅ 更新 requirements.txt - 添加 APScheduler 依赖

+### Phase 7 - 任务 2: 多模态支持 (已完成 ✅)
+- ✅ 创建 multimodal_processor.py - 多模态处理模块
+  - VideoProcessor: 视频处理器（提取音频 + 关键帧 + OCR）
+  - ImageProcessor: 图片处理器（OCR + 图片描述）
+  - MultimodalEntityExtractor: 多模态实体提取器
+  - 支持 PaddleOCR/EasyOCR/Tesseract 多种 OCR 引擎
+  - 支持 ffmpeg 视频处理
+- ✅ 创建 multimodal_entity_linker.py - 多模态实体关联模块
+  - MultimodalEntityLinker: 跨模态实体关联器
+  - 支持 embedding 相似度计算
+  - 多模态实体画像生成
+  - 跨模态关系发现
+  - 多模态时间线生成
+- ✅ 更新 schema.sql - 添加多模态相关数据库表
+  - videos: 视频表
+  - video_frames: 视频关键帧表
+  - images: 图片表
+  - multimodal_mentions: 多模态实体提及表
+  - multimodal_entity_links: 多模态实体关联表
+- ✅ 更新 main.py - 添加多模态相关 API 端点
+  - POST /api/v1/projects/{id}/upload-video - 上传视频
+  - POST /api/v1/projects/{id}/upload-image - 上传图片
+  - GET /api/v1/projects/{id}/videos - 视频列表
+  - GET /api/v1/projects/{id}/images - 图片列表
+  - GET /api/v1/videos/{id} - 视频详情
+  - GET /api/v1/images/{id} - 图片详情
+  - POST /api/v1/projects/{id}/multimodal/link-entities - 跨模态实体关联
+  - GET /api/v1/entities/{id}/multimodal-profile - 实体多模态画像
+  - GET /api/v1/projects/{id}/multimodal-timeline - 多模态时间线
+  - GET /api/v1/entities/{id}/cross-modal-relations - 跨模态关系
+- ✅ 更新 requirements.txt - 添加多模态依赖
+  - opencv-python: 视频处理
+  - pillow: 图片处理
+  - paddleocr/paddlepaddle: OCR 引擎
+  - ffmpeg-python: ffmpeg 封装
+  - sentence-transformers: 跨模态对齐
+
+### Phase 7 - 任务 7: 插件与集成 (已完成 ✅)
+- ✅ 创建 plugin_manager.py - 插件管理模块
+  - PluginManager: 插件管理主类
+  - ChromeExtensionHandler: Chrome 插件 API 处理
+  - BotHandler: 飞书/钉钉机器人处理
+  - WebhookIntegration: Zapier/Make Webhook 集成
+  - WebDAVSync: WebDAV 同步管理
+- ✅ 创建 Chrome 扩展代码
+  - manifest.json - 扩展配置
+  - background.js - 后台脚本，处理右键菜单和消息
+  - content.js - 内容脚本，页面交互和浮动按钮
+  - content.css - 内容样式
+  - popup.html/js - 弹出窗口
+  - options.html/js - 设置页面
+- ✅ 更新 schema.sql - 添加插件相关数据库表
+  - plugins: 插件配置表
+  - bot_sessions: 机器人会话表
+  - webhook_endpoints: Webhook 端点表
+  - webdav_syncs: WebDAV 同步配置表
+  - plugin_activity_logs: 插件活动日志表
+- ✅ 更新 main.py - 添加插件相关 API 端点
+  - GET/POST /api/v1/plugins - 插件管理
+  - POST /api/v1/plugins/chrome/clip - Chrome 插件保存网页
+  - POST /api/v1/bots/webhook/{platform} - 接收机器人消息
+  - GET /api/v1/bots/sessions - 机器人会话列表
+  - POST /api/v1/webhook-endpoints - 创建 Webhook 端点
+  - POST /webhook/{type}/{token} - 接收外部 Webhook
+  - POST /api/v1/webdav-syncs - WebDAV 同步配置
+  - POST /api/v1/webdav-syncs/{id}/test - 测试 WebDAV 连接
+  - POST /api/v1/webdav-syncs/{id}/sync - 触发 WebDAV 同步
+  - GET /api/v1/plugins/{id}/logs - 插件活动日志
+- ✅ 更新 requirements.txt - 添加插件依赖
+  - beautifulsoup4: HTML 解析
+  - webdavclient3: WebDAV 客户端
+
 ## 待完成

-无 - Phase 7 任务 1 已完成
+Phase 7 任务 3: 数据安全与合规

 ## 技术债务

@@ -69,6 +141,7 @@ Phase 7: 工作流自动化 - **进行中 🚧**
 - 实体相似度匹配目前只是简单字符串包含，需要 embedding 方案
 - 前端需要状态管理（目前使用全局变量）
 - ~~需要添加 API 文档 (OpenAPI/Swagger)~~ ✅ 已完成
+- 多模态 LLM 图片描述功能待实现（需要集成多模态模型 API）

 ## 部署信息

@@ -78,6 +151,36 @@ Phase 7: 工作流自动化 - **进行中 🚧**

 ## 最近更新

+### 2026-02-23 (午间)
+- 完成 Phase 7 任务 7: 插件与集成
+  - 创建 plugin_manager.py 模块
+    - PluginManager: 插件管理主类
+    - ChromeExtensionHandler: Chrome 插件处理
+    - BotHandler: 飞书/钉钉/Slack 机器人处理
+    - WebhookIntegration: Zapier/Make Webhook 集成
+    - WebDAVSync: WebDAV 同步管理
+  - 创建完整的 Chrome 扩展代码
+    - manifest.json, background.js, content.js
+    - popup.html/js, options.html/js
+    - 支持网页剪藏、选中文本保存、项目选择
+  - 更新 schema.sql 添加插件相关数据库表
+  - 更新 main.py 添加插件相关 API 端点
+  - 更新 requirements.txt 添加插件依赖
+
+### 2026-02-23 (早间)
+- 完成 Phase 7 任务 2: 多模态支持
+  - 创建 multimodal_processor.py 模块
+    - VideoProcessor: 视频处理（音频提取 + 关键帧 + OCR）
+    - ImageProcessor: 图片处理（OCR + 图片描述）
+    - MultimodalEntityExtractor: 多模态实体提取
+  - 创建 multimodal_entity_linker.py 模块
+    - MultimodalEntityLinker: 跨模态实体关联
+    - 支持 embedding 相似度计算
+    - 多模态实体画像和时间线
+  - 更新 schema.sql 添加多模态相关数据库表
+  - 更新 main.py 添加多模态相关 API 端点
+  - 更新 requirements.txt 添加多模态依赖
+
 ### 2026-02-23
 - 完成 Phase 7 任务 1: 工作流自动化模块
  - 创建 workflow_manager.py 模块
--- a/backend/pycache/db_manager.cpython-312.pyc
+++ b/backend/pycache/db_manager.cpython-312.pyc
--- a/backend/pycache/image_processor.cpython-312.pyc
+++ b/backend/pycache/image_processor.cpython-312.pyc
--- a/backend/pycache/main.cpython-312.pyc
+++ b/backend/pycache/main.cpython-312.pyc
--- a/backend/pycache/multimodal_entity_linker.cpython-312.pyc
+++ b/backend/pycache/multimodal_entity_linker.cpython-312.pyc
--- a/backend/pycache/multimodal_processor.cpython-312.pyc
+++ b/backend/pycache/multimodal_processor.cpython-312.pyc
--- a/backend/db_manager.py
+++ b/backend/db_manager.py
@@ -878,6 +878,310 @@ class DatabaseManager:
                filtered.append(entity)
        return filtered

+    # ==================== Phase 7: Multimodal Support ====================
+    
+    def create_video(self, video_id: str, project_id: str, filename: str, 
+                     duration: float = 0, fps: float = 0, resolution: Dict = None,
+                     audio_transcript_id: str = None, full_ocr_text: str = "",
+                     extracted_entities: List[Dict] = None, 
+                     extracted_relations: List[Dict] = None) -> str:
+        """创建视频记录"""
+        conn = self.get_conn()
+        now = datetime.now().isoformat()
+        
+        conn.execute(
+            """INSERT INTO videos 
+               (id, project_id, filename, duration, fps, resolution,
+                audio_transcript_id, full_ocr_text, extracted_entities, 
+                extracted_relations, status, created_at, updated_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+            (video_id, project_id, filename, duration, fps,
+             json.dumps(resolution) if resolution else None,
+             audio_transcript_id, full_ocr_text,
+             json.dumps(extracted_entities or []),
+             json.dumps(extracted_relations or []),
+             'completed', now, now)
+        )
+        conn.commit()
+        conn.close()
+        return video_id
+    
+    def get_video(self, video_id: str) -> Optional[Dict]:
+        """获取视频信息"""
+        conn = self.get_conn()
+        row = conn.execute(
+            "SELECT * FROM videos WHERE id = ?", (video_id,)
+        ).fetchone()
+        conn.close()
+        
+        if row:
+            data = dict(row)
+            data['resolution'] = json.loads(data['resolution']) if data['resolution'] else None
+            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
+            data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else []
+            return data
+        return None
+    
+    def list_project_videos(self, project_id: str) -> List[Dict]:
+        """获取项目的所有视频"""
+        conn = self.get_conn()
+        rows = conn.execute(
+            "SELECT * FROM videos WHERE project_id = ? ORDER BY created_at DESC",
+            (project_id,)
+        ).fetchall()
+        conn.close()
+        
+        videos = []
+        for row in rows:
+            data = dict(row)
+            data['resolution'] = json.loads(data['resolution']) if data['resolution'] else None
+            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
+            data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else []
+            videos.append(data)
+        return videos
+    
+    def create_video_frame(self, frame_id: str, video_id: str, frame_number: int,
+                          timestamp: float, image_url: str = None, 
+                          ocr_text: str = None, extracted_entities: List[Dict] = None) -> str:
+        """创建视频帧记录"""
+        conn = self.get_conn()
+        now = datetime.now().isoformat()
+        
+        conn.execute(
+            """INSERT INTO video_frames 
+               (id, video_id, frame_number, timestamp, image_url, ocr_text, extracted_entities, created_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
+            (frame_id, video_id, frame_number, timestamp, image_url, ocr_text,
+             json.dumps(extracted_entities or []), now)
+        )
+        conn.commit()
+        conn.close()
+        return frame_id
+    
+    def get_video_frames(self, video_id: str) -> List[Dict]:
+        """获取视频的所有帧"""
+        conn = self.get_conn()
+        rows = conn.execute(
+            """SELECT * FROM video_frames WHERE video_id = ? ORDER BY timestamp""",
+            (video_id,)
+        ).fetchall()
+        conn.close()
+        
+        frames = []
+        for row in rows:
+            data = dict(row)
+            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
+            frames.append(data)
+        return frames
+    
+    def create_image(self, image_id: str, project_id: str, filename: str,
+                     ocr_text: str = "", description: str = "",
+                     extracted_entities: List[Dict] = None,
+                     extracted_relations: List[Dict] = None) -> str:
+        """创建图片记录"""
+        conn = self.get_conn()
+        now = datetime.now().isoformat()
+        
+        conn.execute(
+            """INSERT INTO images 
+               (id, project_id, filename, ocr_text, description,
+                extracted_entities, extracted_relations, status, created_at, updated_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+            (image_id, project_id, filename, ocr_text, description,
+             json.dumps(extracted_entities or []),
+             json.dumps(extracted_relations or []),
+             'completed', now, now)
+        )
+        conn.commit()
+        conn.close()
+        return image_id
+    
+    def get_image(self, image_id: str) -> Optional[Dict]:
+        """获取图片信息"""
+        conn = self.get_conn()
+        row = conn.execute(
+            "SELECT * FROM images WHERE id = ?", (image_id,)
+        ).fetchone()
+        conn.close()
+        
+        if row:
+            data = dict(row)
+            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
+            data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else []
+            return data
+        return None
+    
+    def list_project_images(self, project_id: str) -> List[Dict]:
+        """获取项目的所有图片"""
+        conn = self.get_conn()
+        rows = conn.execute(
+            "SELECT * FROM images WHERE project_id = ? ORDER BY created_at DESC",
+            (project_id,)
+        ).fetchall()
+        conn.close()
+        
+        images = []
+        for row in rows:
+            data = dict(row)
+            data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else []
+            data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else []
+            images.append(data)
+        return images
+    
+    def create_multimodal_mention(self, mention_id: str, project_id: str, 
+                                  entity_id: str, modality: str, source_id: str,
+                                  source_type: str, text_snippet: str = "",
+                                  confidence: float = 1.0) -> str:
+        """创建多模态实体提及记录"""
+        conn = self.get_conn()
+        now = datetime.now().isoformat()
+        
+        conn.execute(
+            """INSERT OR REPLACE INTO multimodal_mentions 
+               (id, project_id, entity_id, modality, source_id, source_type, 
+                text_snippet, confidence, created_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+            (mention_id, project_id, entity_id, modality, source_id, 
+             source_type, text_snippet, confidence, now)
+        )
+        conn.commit()
+        conn.close()
+        return mention_id
+    
+    def get_entity_multimodal_mentions(self, entity_id: str) -> List[Dict]:
+        """获取实体的多模态提及"""
+        conn = self.get_conn()
+        rows = conn.execute(
+            """SELECT m.*, e.name as entity_name
+               FROM multimodal_mentions m
+               JOIN entities e ON m.entity_id = e.id
+               WHERE m.entity_id = ? ORDER BY m.created_at DESC""",
+            (entity_id,)
+        ).fetchall()
+        conn.close()
+        return [dict(r) for r in rows]
+    
+    def get_project_multimodal_mentions(self, project_id: str, 
+                                        modality: str = None) -> List[Dict]:
+        """获取项目的多模态提及"""
+        conn = self.get_conn()
+        
+        if modality:
+            rows = conn.execute(
+                """SELECT m.*, e.name as entity_name
+                   FROM multimodal_mentions m
+                   JOIN entities e ON m.entity_id = e.id
+                   WHERE m.project_id = ? AND m.modality = ?
+                   ORDER BY m.created_at DESC""",
+                (project_id, modality)
+            ).fetchall()
+        else:
+            rows = conn.execute(
+                """SELECT m.*, e.name as entity_name
+                   FROM multimodal_mentions m
+                   JOIN entities e ON m.entity_id = e.id
+                   WHERE m.project_id = ? ORDER BY m.created_at DESC""",
+                (project_id,)
+            ).fetchall()
+        
+        conn.close()
+        return [dict(r) for r in rows]
+    
+    def create_multimodal_entity_link(self, link_id: str, entity_id: str,
+                                      linked_entity_id: str, link_type: str,
+                                      confidence: float = 1.0, 
+                                      evidence: str = "",
+                                      modalities: List[str] = None) -> str:
+        """创建多模态实体关联"""
+        conn = self.get_conn()
+        now = datetime.now().isoformat()
+        
+        conn.execute(
+            """INSERT OR REPLACE INTO multimodal_entity_links 
+               (id, entity_id, linked_entity_id, link_type, confidence, 
+                evidence, modalities, created_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
+            (link_id, entity_id, linked_entity_id, link_type, confidence,
+             evidence, json.dumps(modalities or []), now)
+        )
+        conn.commit()
+        conn.close()
+        return link_id
+    
+    def get_entity_multimodal_links(self, entity_id: str) -> List[Dict]:
+        """获取实体的多模态关联"""
+        conn = self.get_conn()
+        rows = conn.execute(
+            """SELECT l.*, e1.name as entity_name, e2.name as linked_entity_name
+               FROM multimodal_entity_links l
+               JOIN entities e1 ON l.entity_id = e1.id
+               JOIN entities e2 ON l.linked_entity_id = e2.id
+               WHERE l.entity_id = ? OR l.linked_entity_id = ?""",
+            (entity_id, entity_id)
+        ).fetchall()
+        conn.close()
+        
+        links = []
+        for row in rows:
+            data = dict(row)
+            data['modalities'] = json.loads(data['modalities']) if data['modalities'] else []
+            links.append(data)
+        return links
+    
+    def get_project_multimodal_stats(self, project_id: str) -> Dict:
+        """获取项目多模态统计信息"""
+        conn = self.get_conn()
+        
+        stats = {
+            'video_count': 0,
+            'image_count': 0,
+            'multimodal_entity_count': 0,
+            'cross_modal_links': 0,
+            'modality_distribution': {}
+        }
+        
+        # 视频数量
+        row = conn.execute(
+            "SELECT COUNT(*) as count FROM videos WHERE project_id = ?",
+            (project_id,)
+        ).fetchone()
+        stats['video_count'] = row['count']
+        
+        # 图片数量
+        row = conn.execute(
+            "SELECT COUNT(*) as count FROM images WHERE project_id = ?",
+            (project_id,)
+        ).fetchone()
+        stats['image_count'] = row['count']
+        
+        # 多模态实体数量
+        row = conn.execute(
+            """SELECT COUNT(DISTINCT entity_id) as count 
+               FROM multimodal_mentions WHERE project_id = ?""",
+            (project_id,)
+        ).fetchone()
+        stats['multimodal_entity_count'] = row['count']
+        
+        # 跨模态关联数量
+        row = conn.execute(
+            """SELECT COUNT(*) as count FROM multimodal_entity_links 
+               WHERE entity_id IN (SELECT id FROM entities WHERE project_id = ?)""",
+            (project_id,)
+        ).fetchone()
+        stats['cross_modal_links'] = row['count']
+        
+        # 模态分布
+        for modality in ['audio', 'video', 'image', 'document']:
+            row = conn.execute(
+                """SELECT COUNT(*) as count FROM multimodal_mentions 
+                   WHERE project_id = ? AND modality = ?""",
+                (project_id, modality)
+            ).fetchone()
+            stats['modality_distribution'][modality] = row['count']
+        
+        conn.close()
+        return stats
+

 # Singleton instance
 _db_manager = None
--- a/backend/docs/multimodal_api.md
+++ b/backend/docs/multimodal_api.md
@@ -0,0 +1,308 @@
+# InsightFlow Phase 7 - 多模态支持 API 文档
+
+## 概述
+
+Phase 7 多模态支持模块为 InsightFlow 添加了处理视频和图片的能力，支持：
+
+1. **视频处理**：提取音频、关键帧、OCR 识别
+2. **图片处理**：识别白板、PPT、手写笔记等内容
+3. **多模态实体关联**：跨模态实体对齐和知识融合
+
+## 新增 API 端点
+
+### 视频处理
+
+#### 上传视频
+```
+POST /api/v1/projects/{project_id}/upload-video
+```
+
+**参数：**
+- `file` (required): 视频文件
+- `extract_interval` (optional): 关键帧提取间隔（秒），默认 5 秒
+
+**响应：**
+```json
+{
+  "video_id": "abc123",
+  "project_id": "proj456",
+  "filename": "meeting.mp4",
+  "status": "completed",
+  "audio_extracted": true,
+  "frame_count": 24,
+  "ocr_text_preview": "会议内容预览...",
+  "message": "Video processed successfully"
+}
+```
+
+#### 获取项目视频列表
+```
+GET /api/v1/projects/{project_id}/videos
+```
+
+**响应：**
+```json
+[
+  {
+    "id": "abc123",
+    "filename": "meeting.mp4",
+    "duration": 120.5,
+    "fps": 30.0,
+    "resolution": {"width": 1920, "height": 1080},
+    "ocr_preview": "会议内容...",
+    "status": "completed",
+    "created_at": "2024-01-15T10:30:00"
+  }
+]
+```
+
+#### 获取视频关键帧
+```
+GET /api/v1/videos/{video_id}/frames
+```
+
+**响应：**
+```json
+[
+  {
+    "id": "frame001",
+    "frame_number": 1,
+    "timestamp": 0.0,
+    "image_url": "/tmp/frames/video123/frame_000001_0.00.jpg",
+    "ocr_text": "第一页内容...",
+    "entities": [{"name": "Project Alpha", "type": "PROJECT"}]
+  }
+]
+```
+
+### 图片处理
+
+#### 上传图片
+```
+POST /api/v1/projects/{project_id}/upload-image
+```
+
+**参数：**
+- `file` (required): 图片文件
+- `detect_type` (optional): 是否自动检测图片类型，默认 true
+
+**响应：**
+```json
+{
+  "image_id": "img789",
+  "project_id": "proj456",
+  "filename": "whiteboard.jpg",
+  "image_type": "whiteboard",
+  "ocr_text_preview": "白板内容...",
+  "description": "这是一张白板图片。内容摘要：...",
+  "entity_count": 5,
+  "status": "completed"
+}
+```
+
+#### 批量上传图片
+```
+POST /api/v1/projects/{project_id}/upload-images-batch
+```
+
+**参数：**
+- `files` (required): 多个图片文件
+
+**响应：**
+```json
+{
+  "project_id": "proj456",
+  "total_count": 3,
+  "success_count": 3,
+  "failed_count": 0,
+  "results": [
+    {
+      "image_id": "img001",
+      "status": "success",
+      "image_type": "ppt",
+      "entity_count": 4
+    }
+  ]
+}
+```
+
+#### 获取项目图片列表
+```
+GET /api/v1/projects/{project_id}/images
+```
+
+### 多模态实体关联
+
+#### 跨模态实体对齐
+```
+POST /api/v1/projects/{project_id}/multimodal/align
+```
+
+**参数：**
+- `threshold` (optional): 相似度阈值，默认 0.85
+
+**响应：**
+```json
+{
+  "project_id": "proj456",
+  "aligned_count": 5,
+  "links": [
+    {
+      "link_id": "link001",
+      "source_entity_id": "ent001",
+      "target_entity_id": "ent002",
+      "source_modality": "video",
+      "target_modality": "document",
+      "link_type": "same_as",
+      "confidence": 0.95,
+      "evidence": "Cross-modal alignment: exact"
+    }
+  ],
+  "message": "Successfully aligned 5 cross-modal entity pairs"
+}
+```
+
+#### 获取多模态统计信息
+```
+GET /api/v1/projects/{project_id}/multimodal/stats
+```
+
+**响应：**
+```json
+{
+  "project_id": "proj456",
+  "video_count": 3,
+  "image_count": 10,
+  "multimodal_entity_count": 25,
+  "cross_modal_links": 8,
+  "modality_distribution": {
+    "audio": 15,
+    "video": 8,
+    "image": 12,
+    "document": 20
+  }
+}
+```
+
+#### 获取实体多模态提及
+```
+GET /api/v1/entities/{entity_id}/multimodal-mentions
+```
+
+**响应：**
+```json
+[
+  {
+    "id": "mention001",
+    "entity_id": "ent001",
+    "entity_name": "Project Alpha",
+    "modality": "video",
+    "source_id": "video123",
+    "source_type": "video_frame",
+    "text_snippet": "Project Alpha 进度",
+    "confidence": 1.0,
+    "created_at": "2024-01-15T10:30:00"
+  }
+]
+```
+
+#### 建议多模态实体合并
+```
+GET /api/v1/projects/{project_id}/multimodal/suggest-merges
+```
+
+**响应：**
+```json
+{
+  "project_id": "proj456",
+  "suggestion_count": 3,
+  "suggestions": [
+    {
+      "entity1": {"id": "ent001", "name": "K8s", "type": "TECH"},
+      "entity2": {"id": "ent002", "name": "Kubernetes", "type": "TECH"},
+      "similarity": 0.95,
+      "match_type": "alias_match",
+      "suggested_action": "merge"
+    }
+  ]
+}
+```
+
+## 数据库表结构
+
+### videos 表
+存储视频文件信息
+- `id`: 视频ID
+- `project_id`: 所属项目ID
+- `filename`: 文件名
+- `duration`: 视频时长（秒）
+- `fps`: 帧率
+- `resolution`: 分辨率（JSON）
+- `audio_transcript_id`: 关联的音频转录ID
+- `full_ocr_text`: 所有帧OCR文本合并
+- `extracted_entities`: 提取的实体（JSON）
+- `extracted_relations`: 提取的关系（JSON）
+- `status`: 处理状态
+
+### video_frames 表
+存储视频关键帧信息
+- `id`: 帧ID
+- `video_id`: 所属视频ID
+- `frame_number`: 帧序号
+- `timestamp`: 时间戳（秒）
+- `image_url`: 图片URL或路径
+- `ocr_text`: OCR识别文本
+- `extracted_entities`: 该帧提取的实体
+
+### images 表
+存储图片文件信息
+- `id`: 图片ID
+- `project_id`: 所属项目ID
+- `filename`: 文件名
+- `ocr_text`: OCR识别文本
+- `description`: 图片描述
+- `extracted_entities`: 提取的实体
+- `extracted_relations`: 提取的关系
+- `status`: 处理状态
+
+### multimodal_mentions 表
+存储实体在多模态中的提及
+- `id`: 提及ID
+- `project_id`: 所属项目ID
+- `entity_id`: 实体ID
+- `modality`: 模态类型（audio/video/image/document）
+- `source_id`: 来源ID
+- `source_type`: 来源类型
+- `text_snippet`: 文本片段
+- `confidence`: 置信度
+
+### multimodal_entity_links 表
+存储跨模态实体关联
+- `id`: 关联ID
+- `entity_id`: 实体ID
+- `linked_entity_id`: 关联实体ID
+- `link_type`: 关联类型（same_as/related_to/part_of）
+- `confidence`: 置信度
+- `evidence`: 关联证据
+- `modalities`: 涉及的模态列表
+
+## 依赖安装
+
+```bash
+pip install ffmpeg-python pillow opencv-python pytesseract
+```
+
+注意：使用 OCR 功能需要安装 Tesseract OCR 引擎：
+- Ubuntu/Debian: `sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim`
+- macOS: `brew install tesseract tesseract-lang`
+- Windows: 下载安装包从 https://github.com/UB-Mannheim/tesseract/wiki
+
+## 环境变量
+
+```bash
+# 可选：自定义临时目录
+export INSIGHTFLOW_TEMP_DIR=/path/to/temp
+
+# 可选：Tesseract 路径（Windows）
+export TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
+```
--- a/backend/image_processor.py
+++ b/backend/image_processor.py
@@ -0,0 +1,547 @@
+#!/usr/bin/env python3
+"""
+InsightFlow Image Processor - Phase 7
+图片处理模块：识别白板、PPT、手写笔记等内容
+"""
+
+import os
+import io
+import json
+import uuid
+import base64
+from typing import List, Dict, Optional, Tuple
+from dataclasses import dataclass
+from pathlib import Path
+
+# 尝试导入图像处理库
+try:
+    from PIL import Image, ImageEnhance, ImageFilter
+    PIL_AVAILABLE = True
+except ImportError:
+    PIL_AVAILABLE = False
+
+try:
+    import cv2
+    import numpy as np
+    CV2_AVAILABLE = True
+except ImportError:
+    CV2_AVAILABLE = False
+
+try:
+    import pytesseract
+    PYTESSERACT_AVAILABLE = True
+except ImportError:
+    PYTESSERACT_AVAILABLE = False
+
+
+@dataclass
+class ImageEntity:
+    """图片中检测到的实体"""
+    name: str
+    type: str
+    confidence: float
+    bbox: Optional[Tuple[int, int, int, int]] = None  # (x, y, width, height)
+
+
+@dataclass
+class ImageRelation:
+    """图片中检测到的关系"""
+    source: str
+    target: str
+    relation_type: str
+    confidence: float
+
+
+@dataclass
+class ImageProcessingResult:
+    """图片处理结果"""
+    image_id: str
+    image_type: str  # whiteboard, ppt, handwritten, screenshot, other
+    ocr_text: str
+    description: str
+    entities: List[ImageEntity]
+    relations: List[ImageRelation]
+    width: int
+    height: int
+    success: bool
+    error_message: str = ""
+
+
+@dataclass
+class BatchProcessingResult:
+    """批量图片处理结果"""
+    results: List[ImageProcessingResult]
+    total_count: int
+    success_count: int
+    failed_count: int
+
+
+class ImageProcessor:
+    """图片处理器 - 处理各种类型图片"""
+    
+    # 图片类型定义
+    IMAGE_TYPES = {
+        'whiteboard': '白板',
+        'ppt': 'PPT/演示文稿',
+        'handwritten': '手写笔记',
+        'screenshot': '屏幕截图',
+        'document': '文档图片',
+        'other': '其他'
+    }
+    
+    def __init__(self, temp_dir: str = None):
+        """
+        初始化图片处理器
+        
+        Args:
+            temp_dir: 临时文件目录
+        """
+        self.temp_dir = temp_dir or os.path.join(os.getcwd(), 'temp', 'images')
+        os.makedirs(self.temp_dir, exist_ok=True)
+    
+    def preprocess_image(self, image, image_type: str = None):
+        """
+        预处理图片以提高OCR质量
+        
+        Args:
+            image: PIL Image 对象
+            image_type: 图片类型（用于针对性处理）
+            
+        Returns:
+            处理后的图片
+        """
+        if not PIL_AVAILABLE:
+            return image
+        
+        try:
+            # 转换为RGB（如果是RGBA）
+            if image.mode == 'RGBA':
+                image = image.convert('RGB')
+            
+            # 根据图片类型进行针对性处理
+            if image_type == 'whiteboard':
+                # 白板：增强对比度，去除背景
+                image = self._enhance_whiteboard(image)
+            elif image_type == 'handwritten':
+                # 手写笔记：降噪，增强对比度
+                image = self._enhance_handwritten(image)
+            elif image_type == 'screenshot':
+                # 截图：轻微锐化
+                image = image.filter(ImageFilter.SHARPEN)
+            
+            # 通用处理：调整大小（如果太大）
+            max_size = 4096
+            if max(image.size) > max_size:
+                ratio = max_size / max(image.size)
+                new_size = (int(image.size[0] * ratio), int(image.size[1] * ratio))
+                image = image.resize(new_size, Image.Resampling.LANCZOS)
+            
+            return image
+        except Exception as e:
+            print(f"Image preprocessing error: {e}")
+            return image
+    
+    def _enhance_whiteboard(self, image):
+        """增强白板图片"""
+        # 转换为灰度
+        gray = image.convert('L')
+        
+        # 增强对比度
+        enhancer = ImageEnhance.Contrast(gray)
+        enhanced = enhancer.enhance(2.0)
+        
+        # 二值化
+        threshold = 128
+        binary = enhanced.point(lambda x: 0 if x < threshold else 255, '1')
+        
+        return binary.convert('L')
+    
+    def _enhance_handwritten(self, image):
+        """增强手写笔记图片"""
+        # 转换为灰度
+        gray = image.convert('L')
+        
+        # 轻微降噪
+        blurred = gray.filter(ImageFilter.GaussianBlur(radius=1))
+        
+        # 增强对比度
+        enhancer = ImageEnhance.Contrast(blurred)
+        enhanced = enhancer.enhance(1.5)
+        
+        return enhanced
+    
+    def detect_image_type(self, image, ocr_text: str = "") -> str:
+        """
+        自动检测图片类型
+        
+        Args:
+            image: PIL Image 对象
+            ocr_text: OCR识别的文本
+            
+        Returns:
+            图片类型字符串
+        """
+        if not PIL_AVAILABLE:
+            return 'other'
+        
+        try:
+            # 基于图片特征和OCR内容判断类型
+            width, height = image.size
+            aspect_ratio = width / height
+            
+            # 检测是否为PPT（通常是16:9或4:3）
+            if 1.3 <= aspect_ratio <= 1.8:
+                # 检查是否有典型的PPT特征（标题、项目符号等）
+                if any(keyword in ocr_text.lower() for keyword in ['slide', 'page', '第', '页']):
+                    return 'ppt'
+            
+            # 检测是否为白板（大量手写文字，可能有箭头、框等）
+            if CV2_AVAILABLE:
+                img_array = np.array(image.convert('RGB'))
+                gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)
+                
+                # 检测边缘（白板通常有很多线条）
+                edges = cv2.Canny(gray, 50, 150)
+                edge_ratio = np.sum(edges > 0) / edges.size
+                
+                # 如果边缘比例高，可能是白板
+                if edge_ratio > 0.05 and len(ocr_text) > 50:
+                    return 'whiteboard'
+            
+            # 检测是否为手写笔记（文字密度高，可能有涂鸦）
+            if len(ocr_text) > 100 and aspect_ratio < 1.5:
+                # 检查手写特征（不规则的行高）
+                return 'handwritten'
+            
+            # 检测是否为截图（可能有UI元素）
+            if any(keyword in ocr_text.lower() for keyword in ['button', 'menu', 'click', '登录', '确定', '取消']):
+                return 'screenshot'
+            
+            # 默认文档类型
+            if len(ocr_text) > 200:
+                return 'document'
+            
+            return 'other'
+        except Exception as e:
+            print(f"Image type detection error: {e}")
+            return 'other'
+    
+    def perform_ocr(self, image, lang: str = 'chi_sim+eng') -> Tuple[str, float]:
+        """
+        对图片进行OCR识别
+        
+        Args:
+            image: PIL Image 对象
+            lang: OCR语言
+            
+        Returns:
+            (识别的文本, 置信度)
+        """
+        if not PYTESSERACT_AVAILABLE:
+            return "", 0.0
+        
+        try:
+            # 预处理图片
+            processed_image = self.preprocess_image(image)
+            
+            # 执行OCR
+            text = pytesseract.image_to_string(processed_image, lang=lang)
+            
+            # 获取置信度
+            data = pytesseract.image_to_data(processed_image, output_type=pytesseract.Output.DICT)
+            confidences = [int(c) for c in data['conf'] if int(c) > 0]
+            avg_confidence = sum(confidences) / len(confidences) if confidences else 0
+            
+            return text.strip(), avg_confidence / 100.0
+        except Exception as e:
+            print(f"OCR error: {e}")
+            return "", 0.0
+    
+    def extract_entities_from_text(self, text: str) -> List[ImageEntity]:
+        """
+        从OCR文本中提取实体
+        
+        Args:
+            text: OCR识别的文本
+            
+        Returns:
+            实体列表
+        """
+        entities = []
+        
+        # 简单的实体提取规则（可以替换为LLM调用）
+        # 提取大写字母开头的词组（可能是专有名词）
+        import re
+        
+        # 项目名称（通常是大写或带引号）
+        project_pattern = r'["\']([^"\']+)["\']|([A-Z][a-zA-Z0-9]*(?:\s+[A-Z][a-zA-Z0-9]*)+)'
+        for match in re.finditer(project_pattern, text):
+            name = match.group(1) or match.group(2)
+            if name and len(name) > 2:
+                entities.append(ImageEntity(
+                    name=name.strip(),
+                    type='PROJECT',
+                    confidence=0.7
+                ))
+        
+        # 人名（中文）
+        name_pattern = r'([\u4e00-\u9fa5]{2,4})(?:先生|女士|总|经理|工程师|老师)'
+        for match in re.finditer(name_pattern, text):
+            entities.append(ImageEntity(
+                name=match.group(1),
+                type='PERSON',
+                confidence=0.8
+            ))
+        
+        # 技术术语
+        tech_keywords = ['K8s', 'Kubernetes', 'Docker', 'API', 'SDK', 'AI', 'ML', 
+                        'Python', 'Java', 'React', 'Vue', 'Node.js', '数据库', '服务器']
+        for keyword in tech_keywords:
+            if keyword in text:
+                entities.append(ImageEntity(
+                    name=keyword,
+                    type='TECH',
+                    confidence=0.9
+                ))
+        
+        # 去重
+        seen = set()
+        unique_entities = []
+        for e in entities:
+            key = (e.name.lower(), e.type)
+            if key not in seen:
+                seen.add(key)
+                unique_entities.append(e)
+        
+        return unique_entities
+    
+    def generate_description(self, image_type: str, ocr_text: str, 
+                            entities: List[ImageEntity]) -> str:
+        """
+        生成图片描述
+        
+        Args:
+            image_type: 图片类型
+            ocr_text: OCR文本
+            entities: 检测到的实体
+            
+        Returns:
+            图片描述
+        """
+        type_name = self.IMAGE_TYPES.get(image_type, '图片')
+        
+        description_parts = [f"这是一张{type_name}图片。"]
+        
+        if ocr_text:
+            # 提取前200字符作为摘要
+            text_preview = ocr_text[:200].replace('\n', ' ')
+            if len(ocr_text) > 200:
+                text_preview += "..."
+            description_parts.append(f"内容摘要：{text_preview}")
+        
+        if entities:
+            entity_names = [e.name for e in entities[:5]]  # 最多显示5个实体
+            description_parts.append(f"识别到的关键实体：{', '.join(entity_names)}")
+        
+        return " ".join(description_parts)
+    
+    def process_image(self, image_data: bytes, filename: str = None,
+                     image_id: str = None, detect_type: bool = True) -> ImageProcessingResult:
+        """
+        处理单张图片
+        
+        Args:
+            image_data: 图片二进制数据
+            filename: 文件名
+            image_id: 图片ID（可选）
+            detect_type: 是否自动检测图片类型
+            
+        Returns:
+            图片处理结果
+        """
+        image_id = image_id or str(uuid.uuid4())[:8]
+        
+        if not PIL_AVAILABLE:
+            return ImageProcessingResult(
+                image_id=image_id,
+                image_type='other',
+                ocr_text='',
+                description='PIL not available',
+                entities=[],
+                relations=[],
+                width=0,
+                height=0,
+                success=False,
+                error_message='PIL library not available'
+            )
+        
+        try:
+            # 加载图片
+            image = Image.open(io.BytesIO(image_data))
+            width, height = image.size
+            
+            # 执行OCR
+            ocr_text, ocr_confidence = self.perform_ocr(image)
+            
+            # 检测图片类型
+            image_type = 'other'
+            if detect_type:
+                image_type = self.detect_image_type(image, ocr_text)
+            
+            # 提取实体
+            entities = self.extract_entities_from_text(ocr_text)
+            
+            # 生成描述
+            description = self.generate_description(image_type, ocr_text, entities)
+            
+            # 提取关系（基于实体共现）
+            relations = self._extract_relations(entities, ocr_text)
+            
+            # 保存图片文件（可选）
+            if filename:
+                save_path = os.path.join(self.temp_dir, f"{image_id}_{filename}")
+                image.save(save_path)
+            
+            return ImageProcessingResult(
+                image_id=image_id,
+                image_type=image_type,
+                ocr_text=ocr_text,
+                description=description,
+                entities=entities,
+                relations=relations,
+                width=width,
+                height=height,
+                success=True
+            )
+        
+        except Exception as e:
+            return ImageProcessingResult(
+                image_id=image_id,
+                image_type='other',
+                ocr_text='',
+                description='',
+                entities=[],
+                relations=[],
+                width=0,
+                height=0,
+                success=False,
+                error_message=str(e)
+            )
+    
+    def _extract_relations(self, entities: List[ImageEntity], text: str) -> List[ImageRelation]:
+        """
+        从文本中提取实体关系
+        
+        Args:
+            entities: 实体列表
+            text: 文本内容
+            
+        Returns:
+            关系列表
+        """
+        relations = []
+        
+        if len(entities) < 2:
+            return relations
+        
+        # 简单的关系提取：如果两个实体在同一句子中出现，则认为它们相关
+        sentences = text.replace('。', '.').replace('！', '!').replace('？', '?').split('.')
+        
+        for sentence in sentences:
+            sentence_entities = []
+            for entity in entities:
+                if entity.name in sentence:
+                    sentence_entities.append(entity)
+            
+            # 如果句子中有多个实体，建立关系
+            if len(sentence_entities) >= 2:
+                for i in range(len(sentence_entities)):
+                    for j in range(i + 1, len(sentence_entities)):
+                        relations.append(ImageRelation(
+                            source=sentence_entities[i].name,
+                            target=sentence_entities[j].name,
+                            relation_type='related',
+                            confidence=0.5
+                        ))
+        
+        return relations
+    
+    def process_batch(self, images_data: List[Tuple[bytes, str]], 
+                     project_id: str = None) -> BatchProcessingResult:
+        """
+        批量处理图片
+        
+        Args:
+            images_data: 图片数据列表，每项为 (image_data, filename)
+            project_id: 项目ID
+            
+        Returns:
+            批量处理结果
+        """
+        results = []
+        success_count = 0
+        failed_count = 0
+        
+        for image_data, filename in images_data:
+            result = self.process_image(image_data, filename)
+            results.append(result)
+            
+            if result.success:
+                success_count += 1
+            else:
+                failed_count += 1
+        
+        return BatchProcessingResult(
+            results=results,
+            total_count=len(results),
+            success_count=success_count,
+            failed_count=failed_count
+        )
+    
+    def image_to_base64(self, image_data: bytes) -> str:
+        """
+        将图片转换为base64编码
+        
+        Args:
+            image_data: 图片二进制数据
+            
+        Returns:
+            base64编码的字符串
+        """
+        return base64.b64encode(image_data).decode('utf-8')
+    
+    def get_image_thumbnail(self, image_data: bytes, size: Tuple[int, int] = (200, 200)) -> bytes:
+        """
+        生成图片缩略图
+        
+        Args:
+            image_data: 图片二进制数据
+            size: 缩略图尺寸
+            
+        Returns:
+            缩略图二进制数据
+        """
+        if not PIL_AVAILABLE:
+            return image_data
+        
+        try:
+            image = Image.open(io.BytesIO(image_data))
+            image.thumbnail(size, Image.Resampling.LANCZOS)
+            
+            buffer = io.BytesIO()
+            image.save(buffer, format='JPEG')
+            return buffer.getvalue()
+        except Exception as e:
+            print(f"Thumbnail generation error: {e}")
+            return image_data
+
+
+# Singleton instance
+_image_processor = None
+
+def get_image_processor(temp_dir: str = None) -> ImageProcessor:
+    """获取图片处理器单例"""
+    global _image_processor
+    if _image_processor is None:
+        _image_processor = ImageProcessor(temp_dir)
+    return _image_processor
--- a/backend/main.py
+++ b/backend/main.py
--- a/backend/multimodal_entity_linker.py
+++ b/backend/multimodal_entity_linker.py
@@ -0,0 +1,514 @@
+#!/usr/bin/env python3
+"""
+InsightFlow Multimodal Entity Linker - Phase 7
+多模态实体关联模块：跨模态实体对齐和知识融合
+"""
+
+import os
+import json
+import uuid
+from typing import List, Dict, Optional, Tuple, Set
+from dataclasses import dataclass
+from difflib import SequenceMatcher
+
+# 尝试导入embedding库
+try:
+    import numpy as np
+    NUMPY_AVAILABLE = True
+except ImportError:
+    NUMPY_AVAILABLE = False
+
+
+@dataclass
+class MultimodalEntity:
+    """多模态实体"""
+    id: str
+    entity_id: str
+    project_id: str
+    name: str
+    source_type: str  # audio, video, image, document
+    source_id: str
+    mention_context: str
+    confidence: float
+    modality_features: Dict = None  # 模态特定特征
+    
+    def __post_init__(self):
+        if self.modality_features is None:
+            self.modality_features = {}
+
+
+@dataclass
+class EntityLink:
+    """实体关联"""
+    id: str
+    project_id: str
+    source_entity_id: str
+    target_entity_id: str
+    link_type: str  # same_as, related_to, part_of
+    source_modality: str
+    target_modality: str
+    confidence: float
+    evidence: str
+
+
+@dataclass
+class AlignmentResult:
+    """对齐结果"""
+    entity_id: str
+    matched_entity_id: Optional[str]
+    similarity: float
+    match_type: str  # exact, fuzzy, embedding
+    confidence: float
+
+
+@dataclass
+class FusionResult:
+    """知识融合结果"""
+    canonical_entity_id: str
+    merged_entity_ids: List[str]
+    fused_properties: Dict
+    source_modalities: List[str]
+    confidence: float
+
+
+class MultimodalEntityLinker:
+    """多模态实体关联器 - 跨模态实体对齐和知识融合"""
+    
+    # 关联类型
+    LINK_TYPES = {
+        'same_as': '同一实体',
+        'related_to': '相关实体',
+        'part_of': '组成部分',
+        'mentions': '提及关系'
+    }
+    
+    # 模态类型
+    MODALITIES = ['audio', 'video', 'image', 'document']
+    
+    def __init__(self, similarity_threshold: float = 0.85):
+        """
+        初始化多模态实体关联器
+        
+        Args:
+            similarity_threshold: 相似度阈值
+        """
+        self.similarity_threshold = similarity_threshold
+    
+    def calculate_string_similarity(self, s1: str, s2: str) -> float:
+        """
+        计算字符串相似度
+        
+        Args:
+            s1: 字符串1
+            s2: 字符串2
+            
+        Returns:
+            相似度分数 (0-1)
+        """
+        if not s1 or not s2:
+            return 0.0
+        
+        s1, s2 = s1.lower().strip(), s2.lower().strip()
+        
+        # 完全匹配
+        if s1 == s2:
+            return 1.0
+        
+        # 包含关系
+        if s1 in s2 or s2 in s1:
+            return 0.9
+        
+        # 编辑距离相似度
+        return SequenceMatcher(None, s1, s2).ratio()
+    
+    def calculate_entity_similarity(self, entity1: Dict, entity2: Dict) -> Tuple[float, str]:
+        """
+        计算两个实体的综合相似度
+        
+        Args:
+            entity1: 实体1信息
+            entity2: 实体2信息
+            
+        Returns:
+            (相似度, 匹配类型)
+        """
+        # 名称相似度
+        name_sim = self.calculate_string_similarity(
+            entity1.get('name', ''),
+            entity2.get('name', '')
+        )
+        
+        # 如果名称完全匹配
+        if name_sim == 1.0:
+            return 1.0, 'exact'
+        
+        # 检查别名
+        aliases1 = set(a.lower() for a in entity1.get('aliases', []))
+        aliases2 = set(a.lower() for a in entity2.get('aliases', []))
+        
+        if aliases1 & aliases2:  # 有共同别名
+            return 0.95, 'alias_match'
+        
+        if entity2.get('name', '').lower() in aliases1:
+            return 0.95, 'alias_match'
+        if entity1.get('name', '').lower() in aliases2:
+            return 0.95, 'alias_match'
+        
+        # 定义相似度
+        def_sim = self.calculate_string_similarity(
+            entity1.get('definition', ''),
+            entity2.get('definition', '')
+        )
+        
+        # 综合相似度
+        combined_sim = name_sim * 0.7 + def_sim * 0.3
+        
+        if combined_sim >= self.similarity_threshold:
+            return combined_sim, 'fuzzy'
+        
+        return combined_sim, 'none'
+    
+    def find_matching_entity(self, query_entity: Dict, 
+                            candidate_entities: List[Dict],
+                            exclude_ids: Set[str] = None) -> Optional[AlignmentResult]:
+        """
+        在候选实体中查找匹配的实体
+        
+        Args:
+            query_entity: 查询实体
+            candidate_entities: 候选实体列表
+            exclude_ids: 排除的实体ID
+            
+        Returns:
+            对齐结果
+        """
+        exclude_ids = exclude_ids or set()
+        best_match = None
+        best_similarity = 0.0
+        
+        for candidate in candidate_entities:
+            if candidate.get('id') in exclude_ids:
+                continue
+            
+            similarity, match_type = self.calculate_entity_similarity(
+                query_entity, candidate
+            )
+            
+            if similarity > best_similarity and similarity >= self.similarity_threshold:
+                best_similarity = similarity
+                best_match = candidate
+                best_match_type = match_type
+        
+        if best_match:
+            return AlignmentResult(
+                entity_id=query_entity.get('id'),
+                matched_entity_id=best_match.get('id'),
+                similarity=best_similarity,
+                match_type=best_match_type,
+                confidence=best_similarity
+            )
+        
+        return None
+    
+    def align_cross_modal_entities(self, project_id: str,
+                                    audio_entities: List[Dict],
+                                    video_entities: List[Dict],
+                                    image_entities: List[Dict],
+                                    document_entities: List[Dict]) -> List[EntityLink]:
+        """
+        跨模态实体对齐
+        
+        Args:
+            project_id: 项目ID
+            audio_entities: 音频模态实体
+            video_entities: 视频模态实体
+            image_entities: 图片模态实体
+            document_entities: 文档模态实体
+            
+        Returns:
+            实体关联列表
+        """
+        links = []
+        
+        # 合并所有实体
+        all_entities = {
+            'audio': audio_entities,
+            'video': video_entities,
+            'image': image_entities,
+            'document': document_entities
+        }
+        
+        # 跨模态对齐
+        for mod1 in self.MODALITIES:
+            for mod2 in self.MODALITIES:
+                if mod1 >= mod2:  # 避免重复比较
+                    continue
+                
+                entities1 = all_entities.get(mod1, [])
+                entities2 = all_entities.get(mod2, [])
+                
+                for ent1 in entities1:
+                    # 在另一个模态中查找匹配
+                    result = self.find_matching_entity(ent1, entities2)
+                    
+                    if result and result.matched_entity_id:
+                        link = EntityLink(
+                            id=str(uuid.uuid4())[:8],
+                            project_id=project_id,
+                            source_entity_id=ent1.get('id'),
+                            target_entity_id=result.matched_entity_id,
+                            link_type='same_as' if result.similarity > 0.95 else 'related_to',
+                            source_modality=mod1,
+                            target_modality=mod2,
+                            confidence=result.confidence,
+                            evidence=f"Cross-modal alignment: {result.match_type}"
+                        )
+                        links.append(link)
+        
+        return links
+    
+    def fuse_entity_knowledge(self, entity_id: str,
+                              linked_entities: List[Dict],
+                              multimodal_mentions: List[Dict]) -> FusionResult:
+        """
+        融合多模态实体知识
+        
+        Args:
+            entity_id: 主实体ID
+            linked_entities: 关联的实体信息列表
+            multimodal_mentions: 多模态提及列表
+            
+        Returns:
+            融合结果
+        """
+        # 收集所有属性
+        fused_properties = {
+            'names': set(),
+            'definitions': [],
+            'aliases': set(),
+            'types': set(),
+            'modalities': set(),
+            'contexts': []
+        }
+        
+        merged_ids = []
+        
+        for entity in linked_entities:
+            merged_ids.append(entity.get('id'))
+            
+            # 收集名称
+            fused_properties['names'].add(entity.get('name', ''))
+            
+            # 收集定义
+            if entity.get('definition'):
+                fused_properties['definitions'].append(entity.get('definition'))
+            
+            # 收集别名
+            fused_properties['aliases'].update(entity.get('aliases', []))
+            
+            # 收集类型
+            fused_properties['types'].add(entity.get('type', 'OTHER'))
+        
+        # 收集模态和上下文
+        for mention in multimodal_mentions:
+            fused_properties['modalities'].add(mention.get('source_type', ''))
+            if mention.get('mention_context'):
+                fused_properties['contexts'].append(mention.get('mention_context'))
+        
+        # 选择最佳定义（最长的那个）
+        best_definition = max(fused_properties['definitions'], key=len) \
+                         if fused_properties['definitions'] else ""
+        
+        # 选择最佳名称（最常见的那个）
+        from collections import Counter
+        name_counts = Counter(fused_properties['names'])
+        best_name = name_counts.most_common(1)[0][0] if name_counts else ""
+        
+        # 构建融合结果
+        return FusionResult(
+            canonical_entity_id=entity_id,
+            merged_entity_ids=merged_ids,
+            fused_properties={
+                'name': best_name,
+                'definition': best_definition,
+                'aliases': list(fused_properties['aliases']),
+                'types': list(fused_properties['types']),
+                'modalities': list(fused_properties['modalities']),
+                'contexts': fused_properties['contexts'][:10]  # 最多10个上下文
+            },
+            source_modalities=list(fused_properties['modalities']),
+            confidence=min(1.0, len(linked_entities) * 0.2 + 0.5)
+        )
+    
+    def detect_entity_conflicts(self, entities: List[Dict]) -> List[Dict]:
+        """
+        检测实体冲突（同名但不同义）
+        
+        Args:
+            entities: 实体列表
+            
+        Returns:
+            冲突列表
+        """
+        conflicts = []
+        
+        # 按名称分组
+        name_groups = {}
+        for entity in entities:
+            name = entity.get('name', '').lower()
+            if name:
+                if name not in name_groups:
+                    name_groups[name] = []
+                name_groups[name].append(entity)
+        
+        # 检测同名但定义不同的实体
+        for name, group in name_groups.items():
+            if len(group) > 1:
+                # 检查定义是否相似
+                definitions = [e.get('definition', '') for e in group if e.get('definition')]
+                
+                if len(definitions) > 1:
+                    # 计算定义之间的相似度
+                    sim_matrix = []
+                    for i, d1 in enumerate(definitions):
+                        for j, d2 in enumerate(definitions):
+                            if i < j:
+                                sim = self.calculate_string_similarity(d1, d2)
+                                sim_matrix.append(sim)
+                    
+                    # 如果定义相似度都很低，可能是冲突
+                    if sim_matrix and all(s < 0.5 for s in sim_matrix):
+                        conflicts.append({
+                            'name': name,
+                            'entities': group,
+                            'type': 'homonym_conflict',
+                            'suggestion': 'Consider disambiguating these entities'
+                        })
+        
+        return conflicts
+    
+    def suggest_entity_merges(self, entities: List[Dict],
+                              existing_links: List[EntityLink] = None) -> List[Dict]:
+        """
+        建议实体合并
+        
+        Args:
+            entities: 实体列表
+            existing_links: 现有实体关联
+            
+        Returns:
+            合并建议列表
+        """
+        suggestions = []
+        existing_pairs = set()
+        
+        # 记录已有的关联
+        if existing_links:
+            for link in existing_links:
+                pair = tuple(sorted([link.source_entity_id, link.target_entity_id]))
+                existing_pairs.add(pair)
+        
+        # 检查所有实体对
+        for i, ent1 in enumerate(entities):
+            for j, ent2 in enumerate(entities):
+                if i >= j:
+                    continue
+                
+                # 检查是否已有关联
+                pair = tuple(sorted([ent1.get('id'), ent2.get('id')]))
+                if pair in existing_pairs:
+                    continue
+                
+                # 计算相似度
+                similarity, match_type = self.calculate_entity_similarity(ent1, ent2)
+                
+                if similarity >= self.similarity_threshold:
+                    suggestions.append({
+                        'entity1': ent1,
+                        'entity2': ent2,
+                        'similarity': similarity,
+                        'match_type': match_type,
+                        'suggested_action': 'merge' if similarity > 0.95 else 'link'
+                    })
+        
+        # 按相似度排序
+        suggestions.sort(key=lambda x: x['similarity'], reverse=True)
+        
+        return suggestions
+    
+    def create_multimodal_entity_record(self, project_id: str,
+                                        entity_id: str,
+                                        source_type: str,
+                                        source_id: str,
+                                        mention_context: str = "",
+                                        confidence: float = 1.0) -> MultimodalEntity:
+        """
+        创建多模态实体记录
+        
+        Args:
+            project_id: 项目ID
+            entity_id: 实体ID
+            source_type: 来源类型
+            source_id: 来源ID
+            mention_context: 提及上下文
+            confidence: 置信度
+            
+        Returns:
+            多模态实体记录
+        """
+        return MultimodalEntity(
+            id=str(uuid.uuid4())[:8],
+            entity_id=entity_id,
+            project_id=project_id,
+            name="",  # 将在后续填充
+            source_type=source_type,
+            source_id=source_id,
+            mention_context=mention_context,
+            confidence=confidence
+        )
+    
+    def analyze_modality_distribution(self, multimodal_entities: List[MultimodalEntity]) -> Dict:
+        """
+        分析模态分布
+        
+        Args:
+            multimodal_entities: 多模态实体列表
+            
+        Returns:
+            模态分布统计
+        """
+        distribution = {mod: 0 for mod in self.MODALITIES}
+        cross_modal_entities = set()
+        
+        # 统计每个模态的实体数
+        for me in multimodal_entities:
+            if me.source_type in distribution:
+                distribution[me.source_type] += 1
+        
+        # 统计跨模态实体
+        entity_modalities = {}
+        for me in multimodal_entities:
+            if me.entity_id not in entity_modalities:
+                entity_modalities[me.entity_id] = set()
+            entity_modalities[me.entity_id].add(me.source_type)
+        
+        cross_modal_count = sum(1 for mods in entity_modalities.values() if len(mods) > 1)
+        
+        return {
+            'modality_distribution': distribution,
+            'total_multimodal_records': len(multimodal_entities),
+            'unique_entities': len(entity_modalities),
+            'cross_modal_entities': cross_modal_count,
+            'cross_modal_ratio': cross_modal_count / len(entity_modalities) if entity_modalities else 0
+        }
+
+
+# Singleton instance
+_multimodal_entity_linker = None
+
+def get_multimodal_entity_linker(similarity_threshold: float = 0.85) -> MultimodalEntityLinker:
+    """获取多模态实体关联器单例"""
+    global _multimodal_entity_linker
+    if _multimodal_entity_linker is None:
+        _multimodal_entity_linker = MultimodalEntityLinker(similarity_threshold)
+    return _multimodal_entity_linker
--- a/backend/multimodal_processor.py
+++ b/backend/multimodal_processor.py
@@ -0,0 +1,434 @@
+#!/usr/bin/env python3
+"""
+InsightFlow Multimodal Processor - Phase 7
+视频处理模块：提取音频、关键帧、OCR识别
+"""
+
+import os
+import json
+import uuid
+import tempfile
+import subprocess
+from typing import List, Dict, Optional, Tuple
+from dataclasses import dataclass
+from pathlib import Path
+
+# 尝试导入OCR库
+try:
+    import pytesseract
+    from PIL import Image
+    PYTESSERACT_AVAILABLE = True
+except ImportError:
+    PYTESSERACT_AVAILABLE = False
+
+try:
+    import cv2
+    CV2_AVAILABLE = True
+except ImportError:
+    CV2_AVAILABLE = False
+
+try:
+    import ffmpeg
+    FFMPEG_AVAILABLE = True
+except ImportError:
+    FFMPEG_AVAILABLE = False
+
+
+@dataclass
+class VideoFrame:
+    """视频关键帧数据类"""
+    id: str
+    video_id: str
+    frame_number: int
+    timestamp: float
+    frame_path: str
+    ocr_text: str = ""
+    ocr_confidence: float = 0.0
+    entities_detected: List[Dict] = None
+    
+    def __post_init__(self):
+        if self.entities_detected is None:
+            self.entities_detected = []
+
+
+@dataclass
+class VideoInfo:
+    """视频信息数据类"""
+    id: str
+    project_id: str
+    filename: str
+    file_path: str
+    duration: float = 0.0
+    width: int = 0
+    height: int = 0
+    fps: float = 0.0
+    audio_extracted: bool = False
+    audio_path: str = ""
+    transcript_id: str = ""
+    status: str = "pending"
+    error_message: str = ""
+    metadata: Dict = None
+    
+    def __post_init__(self):
+        if self.metadata is None:
+            self.metadata = {}
+
+
+@dataclass
+class VideoProcessingResult:
+    """视频处理结果"""
+    video_id: str
+    audio_path: str
+    frames: List[VideoFrame]
+    ocr_results: List[Dict]
+    full_text: str  # 整合的文本（音频转录 + OCR文本）
+    success: bool
+    error_message: str = ""
+
+
+class MultimodalProcessor:
+    """多模态处理器 - 处理视频文件"""
+    
+    def __init__(self, temp_dir: str = None, frame_interval: int = 5):
+        """
+        初始化多模态处理器
+        
+        Args:
+            temp_dir: 临时文件目录
+            frame_interval: 关键帧提取间隔（秒）
+        """
+        self.temp_dir = temp_dir or tempfile.gettempdir()
+        self.frame_interval = frame_interval
+        self.video_dir = os.path.join(self.temp_dir, "videos")
+        self.frames_dir = os.path.join(self.temp_dir, "frames")
+        self.audio_dir = os.path.join(self.temp_dir, "audio")
+        
+        # 创建目录
+        os.makedirs(self.video_dir, exist_ok=True)
+        os.makedirs(self.frames_dir, exist_ok=True)
+        os.makedirs(self.audio_dir, exist_ok=True)
+    
+    def extract_video_info(self, video_path: str) -> Dict:
+        """
+        提取视频基本信息
+        
+        Args:
+            video_path: 视频文件路径
+            
+        Returns:
+            视频信息字典
+        """
+        try:
+            if FFMPEG_AVAILABLE:
+                probe = ffmpeg.probe(video_path)
+                video_stream = next((s for s in probe['streams'] if s['codec_type'] == 'video'), None)
+                audio_stream = next((s for s in probe['streams'] if s['codec_type'] == 'audio'), None)
+                
+                if video_stream:
+                    return {
+                        'duration': float(probe['format'].get('duration', 0)),
+                        'width': int(video_stream.get('width', 0)),
+                        'height': int(video_stream.get('height', 0)),
+                        'fps': eval(video_stream.get('r_frame_rate', '0/1')),
+                        'has_audio': audio_stream is not None,
+                        'bitrate': int(probe['format'].get('bit_rate', 0))
+                    }
+            else:
+                # 使用 ffprobe 命令行
+                cmd = [
+                    'ffprobe', '-v', 'error', '-show_entries',
+                    'format=duration,bit_rate', '-show_entries',
+                    'stream=width,height,r_frame_rate', '-of', 'json',
+                    video_path
+                ]
+                result = subprocess.run(cmd, capture_output=True, text=True)
+                if result.returncode == 0:
+                    data = json.loads(result.stdout)
+                    return {
+                        'duration': float(data['format'].get('duration', 0)),
+                        'width': int(data['streams'][0].get('width', 0)) if data['streams'] else 0,
+                        'height': int(data['streams'][0].get('height', 0)) if data['streams'] else 0,
+                        'fps': 30.0,  # 默认值
+                        'has_audio': len(data['streams']) > 1,
+                        'bitrate': int(data['format'].get('bit_rate', 0))
+                    }
+        except Exception as e:
+            print(f"Error extracting video info: {e}")
+        
+        return {
+            'duration': 0,
+            'width': 0,
+            'height': 0,
+            'fps': 0,
+            'has_audio': False,
+            'bitrate': 0
+        }
+    
+    def extract_audio(self, video_path: str, output_path: str = None) -> str:
+        """
+        从视频中提取音频
+        
+        Args:
+            video_path: 视频文件路径
+            output_path: 输出音频路径（可选）
+            
+        Returns:
+            提取的音频文件路径
+        """
+        if output_path is None:
+            video_name = Path(video_path).stem
+            output_path = os.path.join(self.audio_dir, f"{video_name}.wav")
+        
+        try:
+            if FFMPEG_AVAILABLE:
+                (
+                    ffmpeg
+                    .input(video_path)
+                    .output(output_path, ac=1, ar=16000, vn=None)
+                    .overwrite_output()
+                    .run(quiet=True)
+                )
+            else:
+                # 使用命令行 ffmpeg
+                cmd = [
+                    'ffmpeg', '-i', video_path,
+                    '-vn', '-acodec', 'pcm_s16le',
+                    '-ac', '1', '-ar', '16000',
+                    '-y', output_path
+                ]
+                subprocess.run(cmd, check=True, capture_output=True)
+            
+            return output_path
+        except Exception as e:
+            print(f"Error extracting audio: {e}")
+            raise
+    
+    def extract_keyframes(self, video_path: str, video_id: str, 
+                         interval: int = None) -> List[str]:
+        """
+        从视频中提取关键帧
+        
+        Args:
+            video_path: 视频文件路径
+            video_id: 视频ID
+            interval: 提取间隔（秒），默认使用初始化时的间隔
+            
+        Returns:
+            提取的帧文件路径列表
+        """
+        interval = interval or self.frame_interval
+        frame_paths = []
+        
+        # 创建帧存储目录
+        video_frames_dir = os.path.join(self.frames_dir, video_id)
+        os.makedirs(video_frames_dir, exist_ok=True)
+        
+        try:
+            if CV2_AVAILABLE:
+                # 使用 OpenCV 提取帧
+                cap = cv2.VideoCapture(video_path)
+                fps = cap.get(cv2.CAP_PROP_FPS)
+                total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+                
+                frame_interval_frames = int(fps * interval)
+                frame_number = 0
+                
+                while True:
+                    ret, frame = cap.read()
+                    if not ret:
+                        break
+                    
+                    if frame_number % frame_interval_frames == 0:
+                        timestamp = frame_number / fps
+                        frame_path = os.path.join(
+                            video_frames_dir, 
+                            f"frame_{frame_number:06d}_{timestamp:.2f}.jpg"
+                        )
+                        cv2.imwrite(frame_path, frame)
+                        frame_paths.append(frame_path)
+                    
+                    frame_number += 1
+                
+                cap.release()
+            else:
+                # 使用 ffmpeg 命令行提取帧
+                video_name = Path(video_path).stem
+                output_pattern = os.path.join(video_frames_dir, "frame_%06d_%t.jpg")
+                
+                cmd = [
+                    'ffmpeg', '-i', video_path,
+                    '-vf', f'fps=1/{interval}',
+                    '-frame_pts', '1',
+                    '-y', output_pattern
+                ]
+                subprocess.run(cmd, check=True, capture_output=True)
+                
+                # 获取生成的帧文件列表
+                frame_paths = sorted([
+                    os.path.join(video_frames_dir, f)
+                    for f in os.listdir(video_frames_dir)
+                    if f.startswith('frame_')
+                ])
+        except Exception as e:
+            print(f"Error extracting keyframes: {e}")
+        
+        return frame_paths
+    
+    def perform_ocr(self, image_path: str) -> Tuple[str, float]:
+        """
+        对图片进行OCR识别
+        
+        Args:
+            image_path: 图片文件路径
+            
+        Returns:
+            (识别的文本, 置信度)
+        """
+        if not PYTESSERACT_AVAILABLE:
+            return "", 0.0
+        
+        try:
+            image = Image.open(image_path)
+            
+            # 预处理：转换为灰度图
+            if image.mode != 'L':
+                image = image.convert('L')
+            
+            # 使用 pytesseract 进行 OCR
+            text = pytesseract.image_to_string(image, lang='chi_sim+eng')
+            
+            # 获取置信度数据
+            data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
+            confidences = [int(c) for c in data['conf'] if int(c) > 0]
+            avg_confidence = sum(confidences) / len(confidences) if confidences else 0
+            
+            return text.strip(), avg_confidence / 100.0
+        except Exception as e:
+            print(f"OCR error for {image_path}: {e}")
+            return "", 0.0
+    
+    def process_video(self, video_data: bytes, filename: str, 
+                     project_id: str, video_id: str = None) -> VideoProcessingResult:
+        """
+        处理视频文件：提取音频、关键帧、OCR
+        
+        Args:
+            video_data: 视频文件二进制数据
+            filename: 视频文件名
+            project_id: 项目ID
+            video_id: 视频ID（可选，自动生成）
+            
+        Returns:
+            视频处理结果
+        """
+        video_id = video_id or str(uuid.uuid4())[:8]
+        
+        try:
+            # 保存视频文件
+            video_path = os.path.join(self.video_dir, f"{video_id}_{filename}")
+            with open(video_path, 'wb') as f:
+                f.write(video_data)
+            
+            # 提取视频信息
+            video_info = self.extract_video_info(video_path)
+            
+            # 提取音频
+            audio_path = ""
+            if video_info['has_audio']:
+                audio_path = self.extract_audio(video_path)
+            
+            # 提取关键帧
+            frame_paths = self.extract_keyframes(video_path, video_id)
+            
+            # 对关键帧进行 OCR
+            frames = []
+            ocr_results = []
+            all_ocr_text = []
+            
+            for i, frame_path in enumerate(frame_paths):
+                # 解析帧信息
+                frame_name = os.path.basename(frame_path)
+                parts = frame_name.replace('.jpg', '').split('_')
+                frame_number = int(parts[1]) if len(parts) > 1 else i
+                timestamp = float(parts[2]) if len(parts) > 2 else i * self.frame_interval
+                
+                # OCR 识别
+                ocr_text, confidence = self.perform_ocr(frame_path)
+                
+                frame = VideoFrame(
+                    id=str(uuid.uuid4())[:8],
+                    video_id=video_id,
+                    frame_number=frame_number,
+                    timestamp=timestamp,
+                    frame_path=frame_path,
+                    ocr_text=ocr_text,
+                    ocr_confidence=confidence
+                )
+                frames.append(frame)
+                
+                if ocr_text:
+                    ocr_results.append({
+                        'frame_number': frame_number,
+                        'timestamp': timestamp,
+                        'text': ocr_text,
+                        'confidence': confidence
+                    })
+                    all_ocr_text.append(ocr_text)
+            
+            # 整合所有 OCR 文本
+            full_ocr_text = "\n\n".join(all_ocr_text)
+            
+            return VideoProcessingResult(
+                video_id=video_id,
+                audio_path=audio_path,
+                frames=frames,
+                ocr_results=ocr_results,
+                full_text=full_ocr_text,
+                success=True
+            )
+        
+        except Exception as e:
+            return VideoProcessingResult(
+                video_id=video_id,
+                audio_path="",
+                frames=[],
+                ocr_results=[],
+                full_text="",
+                success=False,
+                error_message=str(e)
+            )
+    
+    def cleanup(self, video_id: str = None):
+        """
+        清理临时文件
+        
+        Args:
+            video_id: 视频ID（可选，清理特定视频的文件）
+        """
+        import shutil
+        
+        if video_id:
+            # 清理特定视频的文件
+            for dir_path in [self.video_dir, self.frames_dir, self.audio_dir]:
+                target_dir = os.path.join(dir_path, video_id) if dir_path == self.frames_dir else dir_path
+                if os.path.exists(target_dir):
+                    for f in os.listdir(target_dir):
+                        if video_id in f:
+                            os.remove(os.path.join(target_dir, f))
+        else:
+            # 清理所有临时文件
+            for dir_path in [self.video_dir, self.frames_dir, self.audio_dir]:
+                if os.path.exists(dir_path):
+                    shutil.rmtree(dir_path)
+                    os.makedirs(dir_path, exist_ok=True)
+
+
+# Singleton instance
+_multimodal_processor = None
+
+def get_multimodal_processor(temp_dir: str = None, frame_interval: int = 5) -> MultimodalProcessor:
+    """获取多模态处理器单例"""
+    global _multimodal_processor
+    if _multimodal_processor is None:
+        _multimodal_processor = MultimodalProcessor(temp_dir, frame_interval)
+    return _multimodal_processor
--- a/backend/plugin_manager.py
+++ b/backend/plugin_manager.py
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -36,3 +36,17 @@ fastapi-offline-swagger==0.1.0

 # Phase 7: Workflow Automation
 apscheduler==3.10.4
+
+# Phase 7: Multimodal Support
+ffmpeg-python==0.2.0
+pillow==10.2.0
+opencv-python==4.9.0.80
+pytesseract==0.3.10
+
+# Phase 7 Task 7: Plugin & Integration
+webdav4==0.9.8
+urllib3==2.2.0
+
+# Phase 7: Plugin & Integration
+beautifulsoup4==4.12.3
+webdavclient3==3.14.6
--- a/backend/schema.sql
+++ b/backend/schema.sql
@@ -222,3 +222,320 @@ CREATE INDEX IF NOT EXISTS idx_workflow_logs_workflow ON workflow_logs(workflow_
 CREATE INDEX IF NOT EXISTS idx_workflow_logs_task ON workflow_logs(task_id);
 CREATE INDEX IF NOT EXISTS idx_workflow_logs_status ON workflow_logs(status);
 CREATE INDEX IF NOT EXISTS idx_workflow_logs_created ON workflow_logs(created_at);
+
+-- Phase 7: 多模态支持相关表
+
+-- 视频表
+CREATE TABLE IF NOT EXISTS videos (
+    id TEXT PRIMARY KEY,
+    project_id TEXT NOT NULL,
+    filename TEXT NOT NULL,
+    duration REAL,  -- 视频时长（秒）
+    fps REAL,  -- 帧率
+    resolution TEXT,  -- JSON: {"width": int, "height": int}
+    audio_transcript_id TEXT,  -- 关联的音频转录ID
+    full_ocr_text TEXT,  -- 所有帧OCR文本合并
+    extracted_entities TEXT,  -- JSON: 提取的实体列表
+    extracted_relations TEXT,  -- JSON: 提取的关系列表
+    status TEXT DEFAULT 'processing',  -- processing, completed, failed
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (project_id) REFERENCES projects(id),
+    FOREIGN KEY (audio_transcript_id) REFERENCES transcripts(id)
+);
+
+-- 视频关键帧表
+CREATE TABLE IF NOT EXISTS video_frames (
+    id TEXT PRIMARY KEY,
+    video_id TEXT NOT NULL,
+    frame_number INTEGER,
+    timestamp REAL,  -- 时间戳（秒）
+    image_data BLOB,  -- 帧图片数据（可选，可存储在OSS）
+    image_url TEXT,  -- 图片URL（如果存储在OSS）
+    ocr_text TEXT,  -- OCR识别文本
+    extracted_entities TEXT,  -- JSON: 该帧提取的实体
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (video_id) REFERENCES videos(id) ON DELETE CASCADE
+);
+
+-- 图片表
+CREATE TABLE IF NOT EXISTS images (
+    id TEXT PRIMARY KEY,
+    project_id TEXT NOT NULL,
+    filename TEXT NOT NULL,
+    image_data BLOB,  -- 图片数据（可选）
+    image_url TEXT,  -- 图片URL
+    ocr_text TEXT,  -- OCR识别文本
+    description TEXT,  -- 图片描述（LLM生成）
+    extracted_entities TEXT,  -- JSON: 提取的实体列表
+    extracted_relations TEXT,  -- JSON: 提取的关系列表
+    status TEXT DEFAULT 'processing',
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (project_id) REFERENCES projects(id)
+);
+
+-- 多模态实体提及表
+CREATE TABLE IF NOT EXISTS multimodal_mentions (
+    id TEXT PRIMARY KEY,
+    project_id TEXT NOT NULL,
+    entity_id TEXT NOT NULL,
+    modality TEXT NOT NULL,  -- audio, video, image, document
+    source_id TEXT NOT NULL,  -- transcript_id, video_id, image_id
+    source_type TEXT NOT NULL,  -- 来源类型
+    position TEXT,  -- JSON: 位置信息
+    text_snippet TEXT,  -- 提及的文本片段
+    confidence REAL DEFAULT 1.0,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (project_id) REFERENCES projects(id),
+    FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE
+);
+
+-- 多模态实体关联表
+CREATE TABLE IF NOT EXISTS multimodal_entity_links (
+    id TEXT PRIMARY KEY,
+    entity_id TEXT NOT NULL,
+    linked_entity_id TEXT NOT NULL,  -- 关联的实体ID
+    link_type TEXT NOT NULL,  -- same_as, related_to, part_of
+    confidence REAL DEFAULT 1.0,
+    evidence TEXT,  -- 关联证据
+    modalities TEXT,  -- JSON: 涉及的模态列表
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE,
+    FOREIGN KEY (linked_entity_id) REFERENCES entities(id) ON DELETE CASCADE
+);
+
+-- 多模态相关索引
+CREATE INDEX IF NOT EXISTS idx_videos_project ON videos(project_id);
+CREATE INDEX IF NOT EXISTS idx_videos_status ON videos(status);
+CREATE INDEX IF NOT EXISTS idx_video_frames_video ON video_frames(video_id);
+CREATE INDEX IF NOT EXISTS idx_images_project ON images(project_id);
+CREATE INDEX IF NOT EXISTS idx_images_status ON images(status);
+CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_project ON multimodal_mentions(project_id);
+CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_entity ON multimodal_mentions(entity_id);
+CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_modality ON multimodal_mentions(modality);
+CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_source ON multimodal_mentions(source_id);
+CREATE INDEX IF NOT EXISTS idx_multimodal_links_entity ON multimodal_entity_links(entity_id);
+CREATE INDEX IF NOT EXISTS idx_multimodal_links_linked ON multimodal_entity_links(linked_entity_id);
+
+-- Phase 7 Task 7: 插件与集成相关表
+
+-- 插件配置表
+CREATE TABLE IF NOT EXISTS plugins (
+    id TEXT PRIMARY KEY,
+    name TEXT NOT NULL,
+    plugin_type TEXT NOT NULL,  -- chrome_extension, feishu_bot, dingtalk_bot, zapier, make, webdav, custom
+    project_id TEXT,
+    status TEXT DEFAULT 'active',  -- active, inactive, error, pending
+    config TEXT,  -- JSON: plugin specific configuration
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    last_used_at TIMESTAMP,
+    use_count INTEGER DEFAULT 0,
+    FOREIGN KEY (project_id) REFERENCES projects(id)
+);
+
+-- 插件详细配置表
+CREATE TABLE IF NOT EXISTS plugin_configs (
+    id TEXT PRIMARY KEY,
+    plugin_id TEXT NOT NULL,
+    config_key TEXT NOT NULL,
+    config_value TEXT,
+    is_encrypted BOOLEAN DEFAULT 0,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE,
+    UNIQUE(plugin_id, config_key)
+);
+
+-- 机器人会话表
+CREATE TABLE IF NOT EXISTS bot_sessions (
+    id TEXT PRIMARY KEY,
+    bot_type TEXT NOT NULL,  -- feishu, dingtalk
+    session_id TEXT NOT NULL,  -- 群ID或会话ID
+    session_name TEXT NOT NULL,
+    project_id TEXT,
+    webhook_url TEXT,
+    secret TEXT,  -- 签名密钥
+    is_active BOOLEAN DEFAULT 1,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    last_message_at TIMESTAMP,
+    message_count INTEGER DEFAULT 0,
+    FOREIGN KEY (project_id) REFERENCES projects(id)
+);
+
+-- Webhook 端点表（Zapier/Make集成）
+CREATE TABLE IF NOT EXISTS webhook_endpoints (
+    id TEXT PRIMARY KEY,
+    name TEXT NOT NULL,
+    endpoint_type TEXT NOT NULL,  -- zapier, make, custom
+    endpoint_url TEXT NOT NULL,
+    project_id TEXT,
+    auth_type TEXT DEFAULT 'none',  -- none, api_key, oauth, custom
+    auth_config TEXT,  -- JSON: authentication configuration
+    trigger_events TEXT,  -- JSON array: events that trigger this webhook
+    is_active BOOLEAN DEFAULT 1,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    last_triggered_at TIMESTAMP,
+    trigger_count INTEGER DEFAULT 0,
+    FOREIGN KEY (project_id) REFERENCES projects(id)
+);
+
+-- WebDAV 同步配置表
+CREATE TABLE IF NOT EXISTS webdav_syncs (
+    id TEXT PRIMARY KEY,
+    name TEXT NOT NULL,
+    project_id TEXT NOT NULL,
+    server_url TEXT NOT NULL,
+    username TEXT NOT NULL,
+    password TEXT NOT NULL,  -- 建议加密存储
+    remote_path TEXT DEFAULT '/insightflow',
+    sync_mode TEXT DEFAULT 'bidirectional',  -- bidirectional, upload_only, download_only
+    sync_interval INTEGER DEFAULT 3600,  -- 秒
+    last_sync_at TIMESTAMP,
+    last_sync_status TEXT DEFAULT 'pending',  -- pending, success, failed
+    last_sync_error TEXT,
+    is_active BOOLEAN DEFAULT 1,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    sync_count INTEGER DEFAULT 0,
+    FOREIGN KEY (project_id) REFERENCES projects(id)
+);
+
+-- Chrome 扩展令牌表
+CREATE TABLE IF NOT EXISTS chrome_extension_tokens (
+    id TEXT PRIMARY KEY,
+    token_hash TEXT NOT NULL UNIQUE,  -- SHA256 hash of the token
+    user_id TEXT,
+    project_id TEXT,
+    name TEXT,
+    permissions TEXT,  -- JSON array: read, write, delete
+    expires_at TIMESTAMP,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    last_used_at TIMESTAMP,
+    use_count INTEGER DEFAULT 0,
+    is_revoked BOOLEAN DEFAULT 0,
+    FOREIGN KEY (project_id) REFERENCES projects(id)
+);
+
+-- 插件相关索引
+CREATE INDEX IF NOT EXISTS idx_plugins_project ON plugins(project_id);
+CREATE INDEX IF NOT EXISTS idx_plugins_type ON plugins(plugin_type);
+CREATE INDEX IF NOT EXISTS idx_plugins_status ON plugins(status);
+CREATE INDEX IF NOT EXISTS idx_plugin_configs_plugin ON plugin_configs(plugin_id);
+CREATE INDEX IF NOT EXISTS idx_bot_sessions_project ON bot_sessions(project_id);
+CREATE INDEX IF NOT EXISTS idx_bot_sessions_type ON bot_sessions(bot_type);
+CREATE INDEX IF NOT EXISTS idx_webhook_endpoints_project ON webhook_endpoints(project_id);
+CREATE INDEX IF NOT EXISTS idx_webhook_endpoints_type ON webhook_endpoints(endpoint_type);
+CREATE INDEX IF NOT EXISTS idx_webdav_syncs_project ON webdav_syncs(project_id);
+CREATE INDEX IF NOT EXISTS idx_chrome_tokens_project ON chrome_extension_tokens(project_id);
+CREATE INDEX IF NOT EXISTS idx_chrome_tokens_hash ON chrome_extension_tokens(token_hash);
+
+-- Phase 7: 插件与集成相关表
+
+-- 插件表
+CREATE TABLE IF NOT EXISTS plugins (
+    id TEXT PRIMARY KEY,
+    name TEXT NOT NULL,
+    plugin_type TEXT NOT NULL,  -- chrome_extension, feishu_bot, dingtalk_bot, slack_bot, webhook, webdav, custom
+    project_id TEXT,
+    status TEXT DEFAULT 'active',  -- active, inactive, error, pending
+    config TEXT,  -- JSON: 插件配置
+    api_key TEXT UNIQUE,  -- 用于认证的 API Key
+    api_secret TEXT,  -- 用于签名验证的 Secret
+    webhook_url TEXT,  -- 机器人 Webhook URL
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    last_used_at TIMESTAMP,
+    use_count INTEGER DEFAULT 0,
+    success_count INTEGER DEFAULT 0,
+    fail_count INTEGER DEFAULT 0,
+    FOREIGN KEY (project_id) REFERENCES projects(id)
+);
+
+-- 机器人会话表
+CREATE TABLE IF NOT EXISTS bot_sessions (
+    id TEXT PRIMARY KEY,
+    plugin_id TEXT NOT NULL,
+    platform TEXT NOT NULL,  -- feishu, dingtalk, slack, wechat
+    session_id TEXT NOT NULL,  -- 平台特定的会话ID
+    user_id TEXT,
+    user_name TEXT,
+    project_id TEXT,  -- 关联的项目ID
+    context TEXT,  -- JSON: 会话上下文
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    last_message_at TIMESTAMP,
+    message_count INTEGER DEFAULT 0,
+    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE,
+    FOREIGN KEY (project_id) REFERENCES projects(id),
+    UNIQUE(plugin_id, session_id)
+);
+
+-- Webhook 端点表（用于 Zapier/Make 集成）
+CREATE TABLE IF NOT EXISTS webhook_endpoints (
+    id TEXT PRIMARY KEY,
+    plugin_id TEXT NOT NULL,
+    name TEXT NOT NULL,
+    endpoint_path TEXT NOT NULL UNIQUE,  -- 如 /webhook/zapier/abc123
+    endpoint_type TEXT NOT NULL,  -- zapier, make, custom
+    secret TEXT,  -- 用于签名验证
+    allowed_events TEXT,  -- JSON: 允许的事件列表
+    target_project_id TEXT,  -- 数据导入的目标项目
+    is_active BOOLEAN DEFAULT 1,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    last_triggered_at TIMESTAMP,
+    trigger_count INTEGER DEFAULT 0,
+    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE,
+    FOREIGN KEY (target_project_id) REFERENCES projects(id)
+);
+
+-- WebDAV 同步配置表
+CREATE TABLE IF NOT EXISTS webdav_syncs (
+    id TEXT PRIMARY KEY,
+    plugin_id TEXT NOT NULL,
+    name TEXT NOT NULL,
+    server_url TEXT NOT NULL,
+    username TEXT NOT NULL,
+    password TEXT NOT NULL,  -- 建议加密存储
+    remote_path TEXT DEFAULT '/',
+    local_path TEXT DEFAULT './sync',
+    sync_direction TEXT DEFAULT 'bidirectional',  -- upload, download, bidirectional
+    sync_mode TEXT DEFAULT 'manual',  -- manual, realtime, scheduled
+    sync_schedule TEXT,  -- cron expression
+    file_patterns TEXT,  -- JSON: 文件匹配模式列表
+    auto_analyze BOOLEAN DEFAULT 1,  -- 同步后自动分析
+    last_sync_at TIMESTAMP,
+    last_sync_status TEXT,
+    is_active BOOLEAN DEFAULT 1,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    sync_count INTEGER DEFAULT 0,
+    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE
+);
+
+-- 插件活动日志表
+CREATE TABLE IF NOT EXISTS plugin_activity_logs (
+    id TEXT PRIMARY KEY,
+    plugin_id TEXT NOT NULL,
+    activity_type TEXT NOT NULL,  -- message, webhook, sync, error
+    source TEXT NOT NULL,  -- 来源标识
+    details TEXT,  -- JSON: 详细信息
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE
+);
+
+-- 插件相关索引
+CREATE INDEX IF NOT EXISTS idx_plugins_project ON plugins(project_id);
+CREATE INDEX IF NOT EXISTS idx_plugins_type ON plugins(plugin_type);
+CREATE INDEX IF NOT EXISTS idx_plugins_api_key ON plugins(api_key);
+CREATE INDEX IF NOT EXISTS idx_bot_sessions_plugin ON bot_sessions(plugin_id);
+CREATE INDEX IF NOT EXISTS idx_bot_sessions_project ON bot_sessions(project_id);
+CREATE INDEX IF NOT EXISTS idx_webhook_endpoints_plugin ON webhook_endpoints(plugin_id);
+CREATE INDEX IF NOT EXISTS idx_webdav_syncs_plugin ON webdav_syncs(plugin_id);
+CREATE INDEX IF NOT EXISTS idx_plugin_logs_plugin ON plugin_activity_logs(plugin_id);
+CREATE INDEX IF NOT EXISTS idx_plugin_logs_type ON plugin_activity_logs(activity_type);
+CREATE INDEX IF NOT EXISTS idx_plugin_logs_created ON plugin_activity_logs(created_at);
--- a/backend/schema_multimodal.sql
+++ b/backend/schema_multimodal.sql
@@ -0,0 +1,104 @@
+-- Phase 7: 多模态支持相关表
+
+-- 视频表
+CREATE TABLE IF NOT EXISTS videos (
+    id TEXT PRIMARY KEY,
+    project_id TEXT NOT NULL,
+    filename TEXT NOT NULL,
+    file_path TEXT,
+    duration REAL,  -- 视频时长（秒）
+    width INTEGER,  -- 视频宽度
+    height INTEGER,  -- 视频高度
+    fps REAL,  -- 帧率
+    audio_extracted INTEGER DEFAULT 0,  -- 是否已提取音频
+    audio_path TEXT,  -- 提取的音频文件路径
+    transcript_id TEXT,  -- 关联的转录记录ID
+    status TEXT DEFAULT 'pending',  -- pending, processing, completed, failed
+    error_message TEXT,
+    metadata TEXT,  -- JSON: 其他元数据
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (project_id) REFERENCES projects(id),
+    FOREIGN KEY (transcript_id) REFERENCES transcripts(id)
+);
+
+-- 视频关键帧表
+CREATE TABLE IF NOT EXISTS video_frames (
+    id TEXT PRIMARY KEY,
+    video_id TEXT NOT NULL,
+    frame_number INTEGER NOT NULL,
+    timestamp REAL NOT NULL,  -- 帧时间戳（秒）
+    frame_path TEXT NOT NULL,  -- 帧图片路径
+    ocr_text TEXT,  -- OCR识别的文字
+    ocr_confidence REAL,  -- OCR置信度
+    entities_detected TEXT,  -- JSON: 检测到的实体
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (video_id) REFERENCES videos(id) ON DELETE CASCADE
+);
+
+-- 图片表
+CREATE TABLE IF NOT EXISTS images (
+    id TEXT PRIMARY KEY,
+    project_id TEXT NOT NULL,
+    filename TEXT NOT NULL,
+    file_path TEXT,
+    image_type TEXT,  -- whiteboard, ppt, handwritten, screenshot, other
+    width INTEGER,
+    height INTEGER,
+    ocr_text TEXT,  -- OCR识别的文字
+    description TEXT,  -- 图片描述（LLM生成）
+    entities_detected TEXT,  -- JSON: 检测到的实体
+    relations_detected TEXT,  -- JSON: 检测到的关系
+    transcript_id TEXT,  -- 关联的转录记录ID（可选）
+    status TEXT DEFAULT 'pending',  -- pending, processing, completed, failed
+    error_message TEXT,
+    metadata TEXT,  -- JSON: 其他元数据
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (project_id) REFERENCES projects(id),
+    FOREIGN KEY (transcript_id) REFERENCES transcripts(id)
+);
+
+-- 多模态实体关联表
+CREATE TABLE IF NOT EXISTS multimodal_entities (
+    id TEXT PRIMARY KEY,
+    project_id TEXT NOT NULL,
+    entity_id TEXT NOT NULL,  -- 关联的实体ID
+    source_type TEXT NOT NULL,  -- audio, video, image, document
+    source_id TEXT NOT NULL,  -- 来源ID（transcript_id, video_id, image_id）
+    mention_context TEXT,  -- 提及上下文
+    confidence REAL DEFAULT 1.0,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (project_id) REFERENCES projects(id),
+    FOREIGN KEY (entity_id) REFERENCES entities(id),
+    UNIQUE(entity_id, source_type, source_id)
+);
+
+-- 多模态实体对齐表（跨模态实体关联）
+CREATE TABLE IF NOT EXISTS multimodal_entity_links (
+    id TEXT PRIMARY KEY,
+    project_id TEXT NOT NULL,
+    source_entity_id TEXT NOT NULL,  -- 源实体ID
+    target_entity_id TEXT NOT NULL,  -- 目标实体ID
+    link_type TEXT NOT NULL,  -- same_as, related_to, part_of
+    source_modality TEXT NOT NULL,  -- audio, video, image, document
+    target_modality TEXT NOT NULL,  -- audio, video, image, document
+    confidence REAL DEFAULT 1.0,
+    evidence TEXT,  -- 关联证据
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (project_id) REFERENCES projects(id),
+    FOREIGN KEY (source_entity_id) REFERENCES entities(id),
+    FOREIGN KEY (target_entity_id) REFERENCES entities(id)
+);
+
+-- 创建索引
+CREATE INDEX IF NOT EXISTS idx_videos_project ON videos(project_id);
+CREATE INDEX IF NOT EXISTS idx_videos_status ON videos(status);
+CREATE INDEX IF NOT EXISTS idx_video_frames_video ON video_frames(video_id);
+CREATE INDEX IF NOT EXISTS idx_video_frames_timestamp ON video_frames(timestamp);
+CREATE INDEX IF NOT EXISTS idx_images_project ON images(project_id);
+CREATE INDEX IF NOT EXISTS idx_images_type ON images(image_type);
+CREATE INDEX IF NOT EXISTS idx_images_status ON images(status);
+CREATE INDEX IF NOT EXISTS idx_multimodal_entities_project ON multimodal_entities(project_id);
+CREATE INDEX IF NOT EXISTS idx_multimodal_entities_entity ON multimodal_entities(entity_id);
+CREATE INDEX IF NOT EXISTS idx_multimodal_entity_links_project ON multimodal_entity_links(project_id);
--- a/backend/test_multimodal.py
+++ b/backend/test_multimodal.py
@@ -0,0 +1,157 @@
+#!/usr/bin/env python3
+"""
+InsightFlow Multimodal Module Test Script
+测试多模态支持模块
+"""
+
+import sys
+import os
+
+# 添加 backend 目录到路径
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+
+print("=" * 60)
+print("InsightFlow 多模态模块测试")
+print("=" * 60)
+
+# 测试导入
+print("\n1. 测试模块导入...")
+
+try:
+    from multimodal_processor import (
+        get_multimodal_processor, MultimodalProcessor,
+        VideoProcessingResult, VideoFrame
+    )
+    print("   ✓ multimodal_processor 导入成功")
+except ImportError as e:
+    print(f"   ✗ multimodal_processor 导入失败: {e}")
+
+try:
+    from image_processor import (
+        get_image_processor, ImageProcessor,
+        ImageProcessingResult, ImageEntity, ImageRelation
+    )
+    print("   ✓ image_processor 导入成功")
+except ImportError as e:
+    print(f"   ✗ image_processor 导入失败: {e}")
+
+try:
+    from multimodal_entity_linker import (
+        get_multimodal_entity_linker, MultimodalEntityLinker,
+        MultimodalEntity, EntityLink, AlignmentResult, FusionResult
+    )
+    print("   ✓ multimodal_entity_linker 导入成功")
+except ImportError as e:
+    print(f"   ✗ multimodal_entity_linker 导入失败: {e}")
+
+# 测试初始化
+print("\n2. 测试模块初始化...")
+
+try:
+    processor = get_multimodal_processor()
+    print(f"   ✓ MultimodalProcessor 初始化成功")
+    print(f"     - 临时目录: {processor.temp_dir}")
+    print(f"     - 帧提取间隔: {processor.frame_interval}秒")
+except Exception as e:
+    print(f"   ✗ MultimodalProcessor 初始化失败: {e}")
+
+try:
+    img_processor = get_image_processor()
+    print(f"   ✓ ImageProcessor 初始化成功")
+    print(f"     - 临时目录: {img_processor.temp_dir}")
+except Exception as e:
+    print(f"   ✗ ImageProcessor 初始化失败: {e}")
+
+try:
+    linker = get_multimodal_entity_linker()
+    print(f"   ✓ MultimodalEntityLinker 初始化成功")
+    print(f"     - 相似度阈值: {linker.similarity_threshold}")
+except Exception as e:
+    print(f"   ✗ MultimodalEntityLinker 初始化失败: {e}")
+
+# 测试实体关联功能
+print("\n3. 测试实体关联功能...")
+
+try:
+    linker = get_multimodal_entity_linker()
+    
+    # 测试字符串相似度
+    sim = linker.calculate_string_similarity("Project Alpha", "Project Alpha")
+    assert sim == 1.0, "完全匹配应该返回1.0"
+    print(f"   ✓ 字符串相似度计算正常 (完全匹配: {sim})")
+    
+    sim = linker.calculate_string_similarity("K8s", "Kubernetes")
+    print(f"   ✓ 字符串相似度计算正常 (不同字符串: {sim:.2f})")
+    
+    # 测试实体相似度
+    entity1 = {"name": "Project Alpha", "type": "PROJECT", "definition": "核心项目"}
+    entity2 = {"name": "Project Alpha", "type": "PROJECT", "definition": "主要项目"}
+    sim, match_type = linker.calculate_entity_similarity(entity1, entity2)
+    print(f"   ✓ 实体相似度计算正常 (相似度: {sim:.2f}, 类型: {match_type})")
+    
+except Exception as e:
+    print(f"   ✗ 实体关联功能测试失败: {e}")
+
+# 测试图片处理功能（不需要实际图片）
+print("\n4. 测试图片处理器功能...")
+
+try:
+    processor = get_image_processor()
+    
+    # 测试图片类型检测（使用模拟数据）
+    print(f"   ✓ 支持的图片类型: {list(processor.IMAGE_TYPES.keys())}")
+    print(f"   ✓ 图片类型描述: {processor.IMAGE_TYPES}")
+    
+except Exception as e:
+    print(f"   ✗ 图片处理器功能测试失败: {e}")
+
+# 测试视频处理配置
+print("\n5. 测试视频处理器配置...")
+
+try:
+    processor = get_multimodal_processor()
+    
+    print(f"   ✓ 视频目录: {processor.video_dir}")
+    print(f"   ✓ 帧目录: {processor.frames_dir}")
+    print(f"   ✓ 音频目录: {processor.audio_dir}")
+    
+    # 检查目录是否存在
+    for dir_name, dir_path in [
+        ("视频", processor.video_dir),
+        ("帧", processor.frames_dir),
+        ("音频", processor.audio_dir)
+    ]:
+        if os.path.exists(dir_path):
+            print(f"   ✓ {dir_name}目录存在: {dir_path}")
+        else:
+            print(f"   ✗ {dir_name}目录不存在: {dir_path}")
+    
+except Exception as e:
+    print(f"   ✗ 视频处理器配置测试失败: {e}")
+
+# 测试数据库方法（如果数据库可用）
+print("\n6. 测试数据库多模态方法...")
+
+try:
+    from db_manager import get_db_manager
+    db = get_db_manager()
+    
+    # 检查多模态表是否存在
+    conn = db.get_conn()
+    tables = ['videos', 'video_frames', 'images', 'multimodal_mentions', 'multimodal_entity_links']
+    
+    for table in tables:
+        try:
+            conn.execute(f"SELECT 1 FROM {table} LIMIT 1")
+            print(f"   ✓ 表 '{table}' 存在")
+        except Exception as e:
+            print(f"   ✗ 表 '{table}' 不存在或无法访问: {e}")
+    
+    conn.close()
+    
+except Exception as e:
+    print(f"   ✗ 数据库多模态方法测试失败: {e}")
+
+print("\n" + "=" * 60)
+print("测试完成")
+print("=" * 60)
--- a/chrome-extension/background.js
+++ b/chrome-extension/background.js
@@ -0,0 +1,217 @@
+// InsightFlow Chrome Extension - Background Script
+// 处理后台任务、右键菜单、消息传递
+
+// 默认配置
+const DEFAULT_CONFIG = {
+  serverUrl: 'http://122.51.127.111:18000',
+  apiKey: '',
+  defaultProjectId: ''
+};
+
+// 初始化
+chrome.runtime.onInstalled.addListener(() => {
+  // 创建右键菜单
+  chrome.contextMenus.create({
+    id: 'clipSelection',
+    title: '保存到 InsightFlow',
+    contexts: ['selection', 'page']
+  });
+  
+  // 初始化存储
+  chrome.storage.sync.get(['insightflowConfig'], (result) => {
+    if (!result.insightflowConfig) {
+      chrome.storage.sync.set({ insightflowConfig: DEFAULT_CONFIG });
+    }
+  });
+});
+
+// 处理右键菜单点击
+chrome.contextMenus.onClicked.addListener((info, tab) => {
+  if (info.menuItemId === 'clipSelection') {
+    clipPage(tab, info.selectionText);
+  }
+});
+
+// 处理扩展图标点击
+chrome.action.onClicked.addListener((tab) => {
+  clipPage(tab);
+});
+
+// 监听来自内容脚本的消息
+chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
+  if (request.action === 'clipPage') {
+    clipPage(sender.tab, request.selectionText);
+    sendResponse({ success: true });
+  } else if (request.action === 'getConfig') {
+    chrome.storage.sync.get(['insightflowConfig'], (result) => {
+      sendResponse(result.insightflowConfig || DEFAULT_CONFIG);
+    });
+    return true; // 保持消息通道开放
+  } else if (request.action === 'saveConfig') {
+    chrome.storage.sync.set({ insightflowConfig: request.config }, () => {
+      sendResponse({ success: true });
+    });
+    return true;
+  } else if (request.action === 'fetchProjects') {
+    fetchProjects().then(projects => {
+      sendResponse({ success: true, projects });
+    }).catch(error => {
+      sendResponse({ success: false, error: error.message });
+    });
+    return true;
+  }
+});
+
+// 剪藏页面
+async function clipPage(tab, selectionText = null) {
+  try {
+    // 获取配置
+    const config = await getConfig();
+    
+    if (!config.apiKey) {
+      showNotification('请先配置 API Key', '点击扩展图标打开设置');
+      chrome.runtime.openOptionsPage();
+      return;
+    }
+    
+    // 获取页面内容
+    const [{ result }] = await chrome.scripting.executeScript({
+      target: { tabId: tab.id },
+      func: extractPageContent,
+      args: [selectionText]
+    });
+    
+    // 发送到 InsightFlow
+    const response = await sendToInsightFlow(config, result);
+    
+    if (response.success) {
+      showNotification('保存成功', '内容已导入 InsightFlow');
+    } else {
+      showNotification('保存失败', response.error || '未知错误');
+    }
+  } catch (error) {
+    console.error('Clip error:', error);
+    showNotification('保存失败', error.message);
+  }
+}
+
+// 提取页面内容
+function extractPageContent(selectionText) {
+  const data = {
+    url: window.location.href,
+    title: document.title,
+    selection: selectionText,
+    timestamp: new Date().toISOString()
+  };
+  
+  if (selectionText) {
+    // 只保存选中的文本
+    data.content = selectionText;
+    data.contentType = 'selection';
+  } else {
+    // 保存整个页面
+    // 获取主要内容
+    const article = document.querySelector('article') || 
+                   document.querySelector('main') || 
+                   document.querySelector('.content') ||
+                   document.querySelector('#content');
+    
+    if (article) {
+      data.content = article.innerText;
+      data.contentType = 'article';
+    } else {
+      // 获取 body 文本，但移除脚本和样式
+      const bodyClone = document.body.cloneNode(true);
+      const scripts = bodyClone.querySelectorAll('script, style, nav, header, footer, aside');
+      scripts.forEach(el => el.remove());
+      data.content = bodyClone.innerText;
+      data.contentType = 'page';
+    }
+    
+    // 限制内容长度
+    if (data.content.length > 50000) {
+      data.content = data.content.substring(0, 50000) + '...';
+      data.truncated = true;
+    }
+  }
+  
+  // 获取元数据
+  data.meta = {
+    description: document.querySelector('meta[name="description"]')?.content || '',
+    keywords: document.querySelector('meta[name="keywords"]')?.content || '',
+    author: document.querySelector('meta[name="author"]')?.content || ''
+  };
+  
+  return data;
+}
+
+// 发送到 InsightFlow
+async function sendToInsightFlow(config, data) {
+  const url = `${config.serverUrl}/api/v1/plugins/chrome/clip`;
+  
+  const payload = {
+    url: data.url,
+    title: data.title,
+    content: data.content,
+    content_type: data.contentType,
+    meta: data.meta,
+    project_id: config.defaultProjectId || null
+  };
+  
+  const response = await fetch(url, {
+    method: 'POST',
+    headers: {
+      'Content-Type': 'application/json',
+      'X-API-Key': config.apiKey
+    },
+    body: JSON.stringify(payload)
+  });
+  
+  if (!response.ok) {
+    const error = await response.text();
+    throw new Error(error);
+  }
+  
+  return await response.json();
+}
+
+// 获取配置
+function getConfig() {
+  return new Promise((resolve) => {
+    chrome.storage.sync.get(['insightflowConfig'], (result) => {
+      resolve(result.insightflowConfig || DEFAULT_CONFIG);
+    });
+  });
+}
+
+// 获取项目列表
+async function fetchProjects() {
+  const config = await getConfig();
+  
+  if (!config.apiKey) {
+    throw new Error('请先配置 API Key');
+  }
+  
+  const response = await fetch(`${config.serverUrl}/api/v1/projects`, {
+    headers: {
+      'X-API-Key': config.apiKey
+    }
+  });
+  
+  if (!response.ok) {
+    throw new Error('获取项目列表失败');
+  }
+  
+  const data = await response.json();
+  return data.projects || [];
+}
+
+// 显示通知
+function showNotification(title, message) {
+  chrome.notifications.create({
+    type: 'basic',
+    iconUrl: 'icons/icon128.png',
+    title,
+    message
+  });
+}
--- a/chrome-extension/content.css
+++ b/chrome-extension/content.css
@@ -0,0 +1,141 @@
+/* InsightFlow Chrome Extension - Content Styles */
+
+.insightflow-float-btn {
+  position: absolute;
+  width: 36px;
+  height: 36px;
+  background: #4f46e5;
+  border-radius: 50%;
+  display: none;
+  align-items: center;
+  justify-content: center;
+  cursor: pointer;
+  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.15);
+  z-index: 999999;
+  transition: transform 0.2s, box-shadow 0.2s;
+}
+
+.insightflow-float-btn:hover {
+  transform: scale(1.1);
+  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);
+}
+
+.insightflow-float-btn svg {
+  color: white;
+}
+
+.insightflow-popup {
+  position: absolute;
+  width: 300px;
+  background: white;
+  border-radius: 8px;
+  box-shadow: 0 4px 20px rgba(0, 0, 0, 0.15);
+  z-index: 999999;
+  display: none;
+  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
+  font-size: 14px;
+}
+
+.insightflow-popup-header {
+  display: flex;
+  justify-content: space-between;
+  align-items: center;
+  padding: 12px 16px;
+  border-bottom: 1px solid #e5e7eb;
+  font-weight: 600;
+  color: #111827;
+}
+
+.insightflow-close-btn {
+  background: none;
+  border: none;
+  font-size: 20px;
+  color: #6b7280;
+  cursor: pointer;
+  padding: 0;
+  width: 24px;
+  height: 24px;
+  display: flex;
+  align-items: center;
+  justify-content: center;
+}
+
+.insightflow-close-btn:hover {
+  color: #111827;
+}
+
+.insightflow-popup-content {
+  padding: 16px;
+}
+
+.insightflow-text-preview {
+  background: #f3f4f6;
+  padding: 12px;
+  border-radius: 6px;
+  font-size: 13px;
+  color: #4b5563;
+  line-height: 1.5;
+  max-height: 120px;
+  overflow-y: auto;
+  margin-bottom: 12px;
+}
+
+.insightflow-actions {
+  display: flex;
+  gap: 8px;
+}
+
+.insightflow-btn {
+  flex: 1;
+  padding: 8px 12px;
+  border: 1px solid #d1d5db;
+  border-radius: 6px;
+  background: white;
+  color: #374151;
+  font-size: 13px;
+  cursor: pointer;
+  transition: all 0.2s;
+}
+
+.insightflow-btn:hover {
+  background: #f9fafb;
+  border-color: #9ca3af;
+}
+
+.insightflow-btn-primary {
+  background: #4f46e5;
+  border-color: #4f46e5;
+  color: white;
+}
+
+.insightflow-btn-primary:hover {
+  background: #4338ca;
+  border-color: #4338ca;
+}
+
+.insightflow-project-list {
+  max-height: 200px;
+  overflow-y: auto;
+}
+
+.insightflow-project-item {
+  padding: 12px;
+  border-radius: 6px;
+  cursor: pointer;
+  transition: background 0.2s;
+}
+
+.insightflow-project-item:hover {
+  background: #f3f4f6;
+}
+
+.insightflow-project-name {
+  font-weight: 500;
+  color: #111827;
+  margin-bottom: 4px;
+}
+
+.insightflow-project-desc {
+  font-size: 12px;
+  color: #6b7280;
+}
--- a/chrome-extension/content.js
+++ b/chrome-extension/content.js
@@ -0,0 +1,204 @@
+// InsightFlow Chrome Extension - Content Script
+// 在页面中注入，处理页面交互
+
+(function() {
+  'use strict';
+  
+  // 防止重复注入
+  if (window.insightflowInjected) return;
+  window.insightflowInjected = true;
+  
+  // 创建浮动按钮
+  let floatingButton = null;
+  let selectionPopup = null;
+  
+  // 监听选中文本
+  document.addEventListener('mouseup', handleSelection);
+  document.addEventListener('keyup', handleSelection);
+  
+  function handleSelection(e) {
+    const selection = window.getSelection();
+    const text = selection.toString().trim();
+    
+    if (text.length > 0) {
+      showFloatingButton(selection);
+    } else {
+      hideFloatingButton();
+      hideSelectionPopup();
+    }
+  }
+  
+  // 显示浮动按钮
+  function showFloatingButton(selection) {
+    if (!floatingButton) {
+      floatingButton = document.createElement('div');
+      floatingButton.className = 'insightflow-float-btn';
+      floatingButton.innerHTML = `
+        <svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
+          <path d="M12 5v14M5 12h14"/>
+        </svg>
+      `;
+      floatingButton.title = '保存到 InsightFlow';
+      document.body.appendChild(floatingButton);
+      
+      floatingButton.addEventListener('click', () => {
+        const text = window.getSelection().toString().trim();
+        if (text) {
+          showSelectionPopup(text);
+        }
+      });
+    }
+    
+    // 定位按钮
+    const range = selection.getRangeAt(0);
+    const rect = range.getBoundingClientRect();
+    
+    floatingButton.style.left = `${rect.right + window.scrollX - 40}px`;
+    floatingButton.style.top = `${rect.top + window.scrollY - 45}px`;
+    floatingButton.style.display = 'flex';
+  }
+  
+  // 隐藏浮动按钮
+  function hideFloatingButton() {
+    if (floatingButton) {
+      floatingButton.style.display = 'none';
+    }
+  }
+  
+  // 显示选择弹窗
+  function showSelectionPopup(text) {
+    hideFloatingButton();
+    
+    if (!selectionPopup) {
+      selectionPopup = document.createElement('div');
+      selectionPopup.className = 'insightflow-popup';
+      document.body.appendChild(selectionPopup);
+    }
+    
+    selectionPopup.innerHTML = `
+      <div class="insightflow-popup-header">
+        <span>保存到 InsightFlow</span>
+        <button class="insightflow-close-btn">&times;</button>
+      </div>
+      <div class="insightflow-popup-content">
+        <div class="insightflow-text-preview">${escapeHtml(text.substring(0, 200))}${text.length > 200 ? '...' : ''}</div>
+        <div class="insightflow-actions">
+          <button class="insightflow-btn insightflow-btn-primary" id="if-save-quick">快速保存</button>
+          <button class="insightflow-btn" id="if-save-select">选择项目...</button>
+        </div>
+      </div>
+    `;
+    
+    selectionPopup.style.display = 'block';
+    
+    // 定位弹窗
+    const selection = window.getSelection();
+    const range = selection.getRangeAt(0);
+    const rect = range.getBoundingClientRect();
+    
+    selectionPopup.style.left = `${Math.min(rect.left + window.scrollX, window.innerWidth - 320)}px`;
+    selectionPopup.style.top = `${rect.bottom + window.scrollY + 10}px`;
+    
+    // 绑定事件
+    selectionPopup.querySelector('.insightflow-close-btn').addEventListener('click', hideSelectionPopup);
+    selectionPopup.querySelector('#if-save-quick').addEventListener('click', () => saveQuick(text));
+    selectionPopup.querySelector('#if-save-select').addEventListener('click', () => saveWithProject(text));
+  }
+  
+  // 隐藏选择弹窗
+  function hideSelectionPopup() {
+    if (selectionPopup) {
+      selectionPopup.style.display = 'none';
+    }
+  }
+  
+  // 快速保存
+  async function saveQuick(text) {
+    hideSelectionPopup();
+    
+    chrome.runtime.sendMessage({
+      action: 'clipPage',
+      selectionText: text
+    });
+  }
+  
+  // 选择项目保存
+  async function saveWithProject(text) {
+    // 获取项目列表
+    chrome.runtime.sendMessage({ action: 'fetchProjects' }, (response) => {
+      if (response.success && response.projects.length > 0) {
+        showProjectSelector(text, response.projects);
+      } else {
+        saveQuick(text); // 失败时快速保存
+      }
+    });
+  }
+  
+  // 显示项目选择器
+  function showProjectSelector(text, projects) {
+    selectionPopup.innerHTML = `
+      <div class="insightflow-popup-header">
+        <span>选择项目</span>
+        <button class="insightflow-close-btn">&times;</button>
+      </div>
+      <div class="insightflow-popup-content">
+        <div class="insightflow-project-list">
+          ${projects.map(p => `
+            <div class="insightflow-project-item" data-id="${p.id}">
+              <div class="insightflow-project-name">${escapeHtml(p.name)}</div>
+              <div class="insightflow-project-desc">${escapeHtml(p.description || '').substring(0, 50)}</div>
+            </div>
+          `).join('')}
+        </div>
+      </div>
+    `;
+    
+    selectionPopup.querySelector('.insightflow-close-btn').addEventListener('click', hideSelectionPopup);
+    
+    // 绑定项目选择事件
+    selectionPopup.querySelectorAll('.insightflow-project-item').forEach(item => {
+      item.addEventListener('click', () => {
+        const projectId = item.dataset.id;
+        saveToProject(text, projectId);
+      });
+    });
+  }
+  
+  // 保存到指定项目
+  async function saveToProject(text, projectId) {
+    hideSelectionPopup();
+    
+    chrome.runtime.sendMessage({
+      action: 'getConfig'
+    }, (config) => {
+      // 临时设置默认项目
+      config.defaultProjectId = projectId;
+      chrome.runtime.sendMessage({
+        action: 'saveConfig',
+        config: config
+      }, () => {
+        chrome.runtime.sendMessage({
+          action: 'clipPage',
+          selectionText: text
+        });
+      });
+    });
+  }
+  
+  // HTML 转义
+  function escapeHtml(text) {
+    const div = document.createElement('div');
+    div.textContent = text;
+    return div.innerHTML;
+  }
+  
+  // 点击页面其他地方关闭弹窗
+  document.addEventListener('click', (e) => {
+    if (selectionPopup && !selectionPopup.contains(e.target) && 
+        floatingButton && !floatingButton.contains(e.target)) {
+      hideSelectionPopup();
+      hideFloatingButton();
+    }
+  });
+  
+})();
--- a/chrome-extension/manifest.json
+++ b/chrome-extension/manifest.json
@@ -0,0 +1,46 @@
+{
+  "manifest_version": 3,
+  "name": "InsightFlow Clipper",
+  "version": "1.0.0",
+  "description": "将网页内容一键导入 InsightFlow 知识库",
+  "permissions": [
+    "activeTab",
+    "storage",
+    "contextMenus",
+    "scripting"
+  ],
+  "host_permissions": [
+    "http://*/*",
+    "https://*/*"
+  ],
+  "action": {
+    "default_popup": "popup.html",
+    "default_icon": {
+      "16": "icons/icon16.png",
+      "48": "icons/icon48.png",
+      "128": "icons/icon128.png"
+    }
+  },
+  "icons": {
+    "16": "icons/icon16.png",
+    "48": "icons/icon48.png",
+    "128": "icons/icon128.png"
+  },
+  "background": {
+    "service_worker": "background.js"
+  },
+  "content_scripts": [
+    {
+      "matches": ["<all_urls>"],
+      "js": ["content.js"],
+      "css": ["content.css"]
+    }
+  ],
+  "options_page": "options.html",
+  "web_accessible_resources": [
+    {
+      "resources": ["icons/*.png"],
+      "matches": ["<all_urls>"]
+    }
+  ]
+}
--- a/chrome-extension/options.html
+++ b/chrome-extension/options.html
@@ -0,0 +1,349 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>InsightFlow Clipper 设置</title>
+  <style>
+    * {
+      margin: 0;
+      padding: 0;
+      box-sizing: border-box;
+    }
+    
+    body {
+      font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
+      background: #f3f4f6;
+      min-height: 100vh;
+      padding: 40px 20px;
+    }
+    
+    .container {
+      max-width: 600px;
+      margin: 0 auto;
+    }
+    
+    .header {
+      text-align: center;
+      margin-bottom: 32px;
+    }
+    
+    .header h1 {
+      font-size: 28px;
+      color: #111827;
+      margin-bottom: 8px;
+    }
+    
+    .header p {
+      color: #6b7280;
+    }
+    
+    .card {
+      background: white;
+      border-radius: 12px;
+      padding: 24px;
+      margin-bottom: 24px;
+      box-shadow: 0 1px 3px rgba(0,0,0,0.1);
+    }
+    
+    .card-title {
+      font-size: 18px;
+      font-weight: 600;
+      color: #111827;
+      margin-bottom: 20px;
+      display: flex;
+      align-items: center;
+      gap: 8px;
+    }
+    
+    .form-group {
+      margin-bottom: 20px;
+    }
+    
+    .form-label {
+      display: block;
+      font-size: 14px;
+      font-weight: 500;
+      color: #374151;
+      margin-bottom: 6px;
+    }
+    
+    .form-input {
+      width: 100%;
+      padding: 10px 12px;
+      border: 1px solid #d1d5db;
+      border-radius: 6px;
+      font-size: 14px;
+      transition: border-color 0.2s;
+    }
+    
+    .form-input:focus {
+      outline: none;
+      border-color: #4f46e5;
+    }
+    
+    .form-hint {
+      font-size: 12px;
+      color: #6b7280;
+      margin-top: 4px;
+    }
+    
+    .btn {
+      padding: 10px 20px;
+      border: none;
+      border-radius: 6px;
+      font-size: 14px;
+      font-weight: 500;
+      cursor: pointer;
+      transition: all 0.2s;
+    }
+    
+    .btn-primary {
+      background: #4f46e5;
+      color: white;
+    }
+    
+    .btn-primary:hover {
+      background: #4338ca;
+    }
+    
+    .btn-secondary {
+      background: white;
+      color: #374151;
+      border: 1px solid #d1d5db;
+    }
+    
+    .btn-secondary:hover {
+      background: #f9fafb;
+    }
+    
+    .btn-success {
+      background: #10b981;
+      color: white;
+    }
+    
+    .actions {
+      display: flex;
+      gap: 12px;
+      justify-content: flex-end;
+      margin-top: 24px;
+    }
+    
+    .status-badge {
+      display: inline-flex;
+      align-items: center;
+      gap: 6px;
+      padding: 6px 12px;
+      border-radius: 20px;
+      font-size: 12px;
+      font-weight: 500;
+    }
+    
+    .status-badge.success {
+      background: #d1fae5;
+      color: #065f46;
+    }
+    
+    .status-badge.error {
+      background: #fee2e2;
+      color: #991b1b;
+    }
+    
+    .status-dot {
+      width: 6px;
+      height: 6px;
+      border-radius: 50%;
+      background: currentColor;
+    }
+    
+    .info-box {
+      background: #eff6ff;
+      border-left: 4px solid #3b82f6;
+      padding: 16px;
+      border-radius: 0 6px 6px 0;
+      margin-bottom: 20px;
+    }
+    
+    .info-box h4 {
+      font-size: 14px;
+      color: #1e40af;
+      margin-bottom: 8px;
+    }
+    
+    .info-box p {
+      font-size: 13px;
+      color: #3b82f6;
+      line-height: 1.5;
+    }
+    
+    .info-box code {
+      background: rgba(255,255,255,0.5);
+      padding: 2px 6px;
+      border-radius: 3px;
+      font-family: monospace;
+    }
+    
+    .shortcut-list {
+      list-style: none;
+    }
+    
+    .shortcut-list li {
+      display: flex;
+      justify-content: space-between;
+      padding: 12px 0;
+      border-bottom: 1px solid #e5e7eb;
+    }
+    
+    .shortcut-list li:last-child {
+      border-bottom: none;
+    }
+    
+    .shortcut-key {
+      background: #f3f4f6;
+      padding: 4px 8px;
+      border-radius: 4px;
+      font-size: 12px;
+      font-family: monospace;
+      color: #374151;
+    }
+    
+    .footer {
+      text-align: center;
+      padding: 24px;
+      color: #6b7280;
+      font-size: 13px;
+    }
+    
+    .footer a {
+      color: #4f46e5;
+      text-decoration: none;
+    }
+    
+    .footer a:hover {
+      text-decoration: underline;
+    }
+    
+    #testResult {
+      margin-top: 12px;
+      padding: 12px;
+      border-radius: 6px;
+      font-size: 13px;
+      display: none;
+    }
+    
+    #testResult.success {
+      display: block;
+      background: #d1fae5;
+      color: #065f46;
+    }
+    
+    #testResult.error {
+      display: block;
+      background: #fee2e2;
+      color: #991b1b;
+    }
+  </style>
+</head>
+<body>
+  <div class="container">
+    <div class="header">
+      <h1>⚙️ InsightFlow Clipper 设置</h1>
+      <p>配置您的知识库连接</p>
+    </div>
+    
+    <div class="card">
+      <div class="card-title">
+        <svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
+          <path d="M13 10V3L4 14h7v7l9-11h-7z"/>
+        </svg>
+        服务器连接
+      </div>
+      
+      <div class="info-box">
+        <h4>如何获取 API Key</h4>
+        <p>
+          1. 登录 InsightFlow 控制台<br>
+          2. 进入「插件管理」页面<br>
+          3. 创建 Chrome 插件并复制 API Key
+        </p>
+      </div>
+      
+      <div class="form-group">
+        <label class="form-label">服务器地址</label>
+        <input type="text" id="serverUrl" class="form-input" placeholder="http://122.51.127.111:18000">
+        <p class="form-hint">InsightFlow 服务器的 URL 地址</p>
+      </div>
+      
+      <div class="form-group">
+        <label class="form-label">API Key</label>
+        <input type="password" id="apiKey" class="form-input" placeholder="if_plugin_xxxxxxxx...">
+        <p class="form-hint">从 InsightFlow 控制台获取的插件 API Key</p>
+      </div>
+      
+      <div class="form-group">
+        <button id="testBtn" class="btn btn-secondary">测试连接</button>
+        <div id="testResult"></div>
+      </div>
+    </div>
+    
+    <div class="card">
+      <div class="card-title">
+        <svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
+          <path d="M10.325 4.317c.426-1.756 2.924-1.756 3.35 0a1.724 1.724 0 002.573 1.066c1.543-.94 3.31.826 2.37 2.37a1.724 1.724 0 001.065 2.572c1.756.426 1.756 2.924 0 3.35a1.724 1.724 0 00-1.066 2.573c.94 1.543-.826 3.31-2.37 2.37a1.724 1.724 0 00-2.572 1.065c-.426 1.756-2.924 1.756-3.35 0a1.724 1.724 0 00-2.573-1.066c-1.543.94-3.31-.826-2.37-2.37a1.724 1.724 0 00-1.065-2.572c-1.756-.426-1.756-2.924 0-3.35a1.724 1.724 0 001.066-2.573c-.94-1.543.826-3.31 2.37-2.37.996.608 2.296.07 2.572-1.065z"/>
+          <path d="M15 12a3 3 0 11-6 0 3 3 0 016 0z"/>
+        </svg>
+        默认设置
+      </div>
+      
+      <div class="form-group">
+        <label class="form-label">默认项目</label>
+        <select id="defaultProject" class="form-input">
+          <option value="">不设置默认项目</option>
+        </select>
+        <p class="form-hint">保存内容时默认导入的项目</p>
+      </div>
+    </div>
+    
+    <div class="card">
+      <div class="card-title">
+        <svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
+          <path d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z"/>
+        </svg>
+        使用说明
+      </div>
+      
+      <ul class="shortcut-list">
+        <li>
+          <span>保存当前页面</span>
+          <span class="shortcut-key">点击扩展图标</span>
+        </li>
+        <li>
+          <span>保存选中文本</span>
+          <span class="shortcut-key">右键 → 保存到 InsightFlow</span>
+        </li>
+        <li>
+          <span>快速保存选中内容</span>
+          <span class="shortcut-key">选中文本后点击浮动按钮</span>
+        </li>
+        <li>
+          <span>选择项目保存</span>
+          <span class="shortcut-key">选中文本后点击"选择项目"</span>
+        </li>
+      </ul>
+    </div>
+    
+    <div class="actions">
+      <button id="resetBtn" class="btn btn-secondary">重置</button>
+      <button id="saveBtn" class="btn btn-primary">保存设置</button>
+    </div>
+    
+    
+    <div class="footer">
+      <p>InsightFlow Clipper v1.0.0</p>
+      <p><a href="#" id="openConsole">打开 InsightFlow 控制台</a> | <a href="#" id="helpLink">帮助文档</a></p>
+    </div>
+  </div>
+  
+  <script src="options.js"></script>
+</body>
+</html>
--- a/chrome-extension/options.js
+++ b/chrome-extension/options.js
@@ -0,0 +1,175 @@
+// InsightFlow Chrome Extension - Options Script
+
+document.addEventListener('DOMContentLoaded', () => {
+  const serverUrlInput = document.getElementById('serverUrl');
+  const apiKeyInput = document.getElementById('apiKey');
+  const defaultProjectSelect = document.getElementById('defaultProject');
+  const testBtn = document.getElementById('testBtn');
+  const testResult = document.getElementById('testResult');
+  const saveBtn = document.getElementById('saveBtn');
+  const resetBtn = document.getElementById('resetBtn');
+  const openConsole = document.getElementById('openConsole');
+  const helpLink = document.getElementById('helpLink');
+  
+  // 加载配置
+  loadConfig();
+  
+  // 测试连接
+  testBtn.addEventListener('click', async () => {
+    testBtn.disabled = true;
+    testBtn.textContent = '测试中...';
+    testResult.className = '';
+    testResult.style.display = 'none';
+    
+    const serverUrl = serverUrlInput.value.trim();
+    const apiKey = apiKeyInput.value.trim();
+    
+    if (!serverUrl || !apiKey) {
+      showTestResult('请填写服务器地址和 API Key', 'error');
+      testBtn.disabled = false;
+      testBtn.textContent = '测试连接';
+      return;
+    }
+    
+    try {
+      const response = await fetch(`${serverUrl}/api/v1/projects`, {
+        headers: { 'X-API-Key': apiKey }
+      });
+      
+      if (response.ok) {
+        const data = await response.json();
+        showTestResult(`连接成功！找到 ${data.projects?.length || 0} 个项目`, 'success');
+        
+        // 更新项目列表
+        updateProjectList(data.projects || []);
+      } else if (response.status === 401) {
+        showTestResult('API Key 无效，请检查', 'error');
+      } else {
+        showTestResult(`连接失败: HTTP ${response.status}`, 'error');
+      }
+    } catch (error) {
+      showTestResult(`连接错误: ${error.message}`, 'error');
+    }
+    
+    testBtn.disabled = false;
+    testBtn.textContent = '测试连接';
+  });
+  
+  // 保存设置
+  saveBtn.addEventListener('click', async () => {
+    const config = {
+      serverUrl: serverUrlInput.value.trim(),
+      apiKey: apiKeyInput.value.trim(),
+      defaultProjectId: defaultProjectSelect.value
+    };
+    
+    if (!config.serverUrl) {
+      alert('请填写服务器地址');
+      return;
+    }
+    
+    await chrome.storage.sync.set({ insightflowConfig: config });
+    
+    // 显示保存成功
+    saveBtn.textContent = '已保存 ✓';
+    saveBtn.classList.add('btn-success');
+    
+    setTimeout(() => {
+      saveBtn.textContent = '保存设置';
+      saveBtn.classList.remove('btn-success');
+    }, 2000);
+  });
+  
+  // 重置设置
+  resetBtn.addEventListener('click', () => {
+    if (confirm('确定要重置所有设置吗？')) {
+      const defaultConfig = {
+        serverUrl: 'http://122.51.127.111:18000',
+        apiKey: '',
+        defaultProjectId: ''
+      };
+      
+      chrome.storage.sync.set({ insightflowConfig: defaultConfig }, () => {
+        loadConfig();
+        showTestResult('设置已重置', 'success');
+      });
+    }
+  });
+  
+  // 打开控制台
+  openConsole.addEventListener('click', (e) => {
+    e.preventDefault();
+    const serverUrl = serverUrlInput.value.trim();
+    if (serverUrl) {
+      chrome.tabs.create({ url: serverUrl });
+    }
+  });
+  
+  // 帮助链接
+  helpLink.addEventListener('click', (e) => {
+    e.preventDefault();
+    const serverUrl = serverUrlInput.value.trim();
+    if (serverUrl) {
+      chrome.tabs.create({ url: `${serverUrl}/docs` });
+    }
+  });
+  
+  // 加载配置
+  async function loadConfig() {
+    const result = await chrome.storage.sync.get(['insightflowConfig']);
+    const config = result.insightflowConfig || {
+      serverUrl: 'http://122.51.127.111:18000',
+      apiKey: '',
+      defaultProjectId: ''
+    };
+    
+    serverUrlInput.value = config.serverUrl;
+    apiKeyInput.value = config.apiKey;
+    
+    // 如果有 API Key，加载项目列表
+    if (config.apiKey) {
+      loadProjects(config);
+    }
+  }
+  
+  // 加载项目列表
+  async function loadProjects(config) {
+    try {
+      const response = await fetch(`${config.serverUrl}/api/v1/projects`, {
+        headers: { 'X-API-Key': config.apiKey }
+      });
+      
+      if (response.ok) {
+        const data = await response.json();
+        updateProjectList(data.projects || [], config.defaultProjectId);
+      }
+    } catch (error) {
+      console.error('Failed to load projects:', error);
+    }
+  }
+  
+  // 更新项目列表
+  function updateProjectList(projects, selectedId = '') {
+    let html = '<option value="">不设置默认项目</option>';
+    
+    projects.forEach(project => {
+      const selected = project.id === selectedId ? 'selected' : '';
+      html += `<option value="${project.id}" ${selected}>${escapeHtml(project.name)}</option>`;
+    });
+    
+    defaultProjectSelect.innerHTML = html;
+  }
+  
+  // 显示测试结果
+  function showTestResult(message, type) {
+    testResult.textContent = message;
+    testResult.className = type;
+  }
+  
+  // HTML 转义
+  function escapeHtml(text) {
+    const div = document.createElement('div');
+    div.textContent = text;
+    return div.innerHTML;
+  }
+});
--- a/chrome-extension/popup.html
+++ b/chrome-extension/popup.html
@@ -0,0 +1,258 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>InsightFlow Clipper</title>
+  <style>
+    * {
+      margin: 0;
+      padding: 0;
+      box-sizing: border-box;
+    }
+    
+    body {
+      width: 360px;
+      font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
+      background: #f9fafb;
+    }
+    
+    .header {
+      background: linear-gradient(135deg, #4f46e5 0%, #7c3aed 100%);
+      color: white;
+      padding: 20px;
+      text-align: center;
+    }
+    
+    .header h1 {
+      font-size: 18px;
+      font-weight: 600;
+      margin-bottom: 4px;
+    }
+    
+    .header p {
+      font-size: 12px;
+      opacity: 0.9;
+    }
+    
+    .content {
+      padding: 16px;
+    }
+    
+    .status-card {
+      background: white;
+      border-radius: 8px;
+      padding: 16px;
+      margin-bottom: 16px;
+      box-shadow: 0 1px 3px rgba(0,0,0,0.1);
+    }
+    
+    .status-header {
+      display: flex;
+      align-items: center;
+      gap: 8px;
+      margin-bottom: 12px;
+    }
+    
+    .status-dot {
+      width: 8px;
+      height: 8px;
+      border-radius: 50%;
+      background: #10b981;
+    }
+    
+    .status-dot.error {
+      background: #ef4444;
+    }
+    
+    .status-text {
+      font-size: 14px;
+      font-weight: 500;
+      color: #111827;
+    }
+    
+    .project-select {
+      width: 100%;
+      padding: 10px 12px;
+      border: 1px solid #d1d5db;
+      border-radius: 6px;
+      font-size: 14px;
+      background: white;
+      cursor: pointer;
+    }
+    
+    .project-select:focus {
+      outline: none;
+      border-color: #4f46e5;
+    }
+    
+    .btn {
+      width: 100%;
+      padding: 12px;
+      border: none;
+      border-radius: 6px;
+      font-size: 14px;
+      font-weight: 500;
+      cursor: pointer;
+      transition: all 0.2s;
+      display: flex;
+      align-items: center;
+      justify-content: center;
+      gap: 8px;
+    }
+    
+    .btn-primary {
+      background: #4f46e5;
+      color: white;
+    }
+    
+    .btn-primary:hover {
+      background: #4338ca;
+    }
+    
+    .btn-secondary {
+      background: white;
+      color: #374151;
+      border: 1px solid #d1d5db;
+      margin-top: 8px;
+    }
+    
+    .btn-secondary:hover {
+      background: #f9fafb;
+    }
+    
+    .stats {
+      display: grid;
+      grid-template-columns: repeat(3, 1fr);
+      gap: 12px;
+      margin-top: 16px;
+    }
+    
+    .stat-item {
+      text-align: center;
+      padding: 12px;
+      background: #f3f4f6;
+      border-radius: 6px;
+    }
+    
+    .stat-value {
+      font-size: 20px;
+      font-weight: 600;
+      color: #4f46e5;
+    }
+    
+    .stat-label {
+      font-size: 11px;
+      color: #6b7280;
+      margin-top: 4px;
+    }
+    
+    .footer {
+      padding: 12px 16px;
+      text-align: center;
+      border-top: 1px solid #e5e7eb;
+    }
+    
+    .footer a {
+      color: #4f46e5;
+      text-decoration: none;
+      font-size: 12px;
+    }
+    
+    .footer a:hover {
+      text-decoration: underline;
+    }
+    
+    .loading {
+      display: inline-block;
+      width: 16px;
+      height: 16px;
+      border: 2px solid #ffffff;
+      border-top-color: transparent;
+      border-radius: 50%;
+      animation: spin 0.8s linear infinite;
+    }
+    
+    @keyframes spin {
+      to { transform: rotate(360deg); }
+    }
+    
+    .message {
+      padding: 12px;
+      border-radius: 6px;
+      font-size: 13px;
+      margin-bottom: 12px;
+      display: none;
+    }
+    
+    .message.success {
+      display: block;
+      background: #d1fae5;
+      color: #065f46;
+    }
+    
+    .message.error {
+      display: block;
+      background: #fee2e2;
+      color: #991b1b;
+    }
+  </style>
+</head>
+<body>
+  <div class="header">
+    <h1>🧠 InsightFlow</h1>
+    <p>一键保存网页到知识库</p>
+  </div>
+  
+  <div class="content">
+    <div id="message" class="message"></div>
+    
+    <div class="status-card">
+      <div class="status-header">
+        <div id="statusDot" class="status-dot"></div>
+        <span id="statusText" class="status-text">连接中...</span>
+      </div>
+      
+      <select id="projectSelect" class="project-select">
+        <option value="">选择保存项目...</option>
+      </select>
+    </div>
+    
+    <button id="clipBtn" class="btn btn-primary">
+      <svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
+        <path d="M12 5v14M5 12h14"/>
+      </svg>
+      保存当前页面
+    </button>
+    
+    <button id="settingsBtn" class="btn btn-secondary">
+      <svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
+        <path d="M10.325 4.317c.426-1.756 2.924-1.756 3.35 0a1.724 1.724 0 002.573 1.066c1.543-.94 3.31.826 2.37 2.37a1.724 1.724 0 001.065 2.572c1.756.426 1.756 2.924 0 3.35a1.724 1.724 0 00-1.066 2.573c.94 1.543-.826 3.31-2.37 2.37a1.724 1.724 0 00-2.572 1.065c-.426 1.756-2.924 1.756-3.35 0a1.724 1.724 0 00-2.573-1.066c-1.543.94-3.31-.826-2.37-2.37a1.724 1.724 0 00-1.065-2.572c-1.756-.426-1.756-2.924 0-3.35a1.724 1.724 0 001.066-2.573c-.94-1.543.826-3.31 2.37-2.37.996.608 2.296.07 2.572-1.065z"/>
+        <path d="M15 12a3 3 0 11-6 0 3 3 0 016 0z"/>
+      </svg>
+      设置
+    </button>
+    
+    <div class="stats">
+      <div class="stat-item">
+        <div id="clipCount" class="stat-value">0</div>
+        <div class="stat-label">已保存</div>
+      </div>
+      <div class="stat-item">
+        <div id="projectCount" class="stat-value">0</div>
+        <div class="stat-label">项目数</div>
+      </div>
+      <div class="stat-item">
+        <div id="todayCount" class="stat-value">0</div>
+        <div class="stat-label">今日</div>
+      </div>
+    </div>
+  </div>
+  
+  <div class="footer">
+    <a href="#" id="openDashboard">打开 InsightFlow 控制台 →</a>
+  </div>
+  
+  <script src="popup.js"></script>
+</body>
+</html>
--- a/chrome-extension/popup.js
+++ b/chrome-extension/popup.js
@@ -0,0 +1,195 @@
+// InsightFlow Chrome Extension - Popup Script
+
+document.addEventListener('DOMContentLoaded', async () => {
+  const clipBtn = document.getElementById('clipBtn');
+  const settingsBtn = document.getElementById('settingsBtn');
+  const projectSelect = document.getElementById('projectSelect');
+  const statusDot = document.getElementById('statusDot');
+  const statusText = document.getElementById('statusText');
+  const messageEl = document.getElementById('message');
+  const openDashboard = document.getElementById('openDashboard');
+  
+  // 加载配置和项目列表
+  await loadConfig();
+  
+  // 保存当前页面按钮
+  clipBtn.addEventListener('click', async () => {
+    const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
+    
+    // 更新按钮状态
+    clipBtn.disabled = true;
+    clipBtn.innerHTML = '<span class="loading"></span> 保存中...';
+    
+    // 保存选中的项目
+    const projectId = projectSelect.value;
+    if (projectId) {
+      const config = await getConfig();
+      config.defaultProjectId = projectId;
+      await saveConfig(config);
+    }
+    
+    // 发送剪藏请求
+    chrome.runtime.sendMessage({
+      action: 'clipPage'
+    }, (response) => {
+      clipBtn.disabled = false;
+      clipBtn.innerHTML = `
+        <svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
+          <path d="M12 5v14M5 12h14"/>
+        </svg>
+        保存当前页面
+      `;
+      
+      if (response && response.success) {
+        showMessage('保存成功！', 'success');
+        updateStats();
+      } else {
+        showMessage(response?.error || '保存失败', 'error');
+      }
+    });
+  });
+  
+  // 设置按钮
+  settingsBtn.addEventListener('click', () => {
+    chrome.runtime.openOptionsPage();
+  });
+  
+  // 打开控制台
+  openDashboard.addEventListener('click', async (e) => {
+    e.preventDefault();
+    const config = await getConfig();
+    chrome.tabs.create({ url: config.serverUrl });
+  });
+});
+
+// 加载配置
+async function loadConfig() {
+  const config = await getConfig();
+  
+  // 检查连接状态
+  checkConnection(config);
+  
+  // 加载项目列表
+  loadProjects(config);
+  
+  // 更新统计
+  updateStats();
+}
+
+// 检查连接状态
+async function checkConnection(config) {
+  const statusDot = document.getElementById('statusDot');
+  const statusText = document.getElementById('statusText');
+  
+  if (!config.apiKey) {
+    statusDot.classList.add('error');
+    statusText.textContent = '未配置 API Key';
+    return;
+  }
+  
+  try {
+    const response = await fetch(`${config.serverUrl}/api/v1/projects`, {
+      headers: { 'X-API-Key': config.apiKey }
+    });
+    
+    if (response.ok) {
+      statusText.textContent = '已连接';
+    } else {
+      statusDot.classList.add('error');
+      statusText.textContent = '连接失败';
+    }
+  } catch (error) {
+    statusDot.classList.add('error');
+    statusText.textContent = '连接错误';
+  }
+}
+
+// 加载项目列表
+async function loadProjects(config) {
+  const projectSelect = document.getElementById('projectSelect');
+  
+  if (!config.apiKey) {
+    projectSelect.innerHTML = '<option>请先配置 API Key</option>';
+    return;
+  }
+  
+  try {
+    const response = await fetch(`${config.serverUrl}/api/v1/projects`, {
+      headers: { 'X-API-Key': config.apiKey }
+    });
+    
+    if (response.ok) {
+      const data = await response.json();
+      const projects = data.projects || [];
+      
+      // 更新项目数统计
+      document.getElementById('projectCount').textContent = projects.length;
+      
+      // 填充下拉框
+      let html = '<option value="">选择保存项目...</option>';
+      projects.forEach(project => {
+        const selected = project.id === config.defaultProjectId ? 'selected' : '';
+        html += `<option value="${project.id}" ${selected}>${escapeHtml(project.name)}</option>`;
+      });
+      projectSelect.innerHTML = html;
+    }
+  } catch (error) {
+    console.error('Failed to load projects:', error);
+  }
+}
+
+// 更新统计
+async function updateStats() {
+  // 从存储中获取统计数据
+  const result = await chrome.storage.local.get(['clipStats']);
+  const stats = result.clipStats || { total: 0, today: 0, lastDate: null };
+  
+  // 检查是否需要重置今日计数
+  const today = new Date().toDateString();
+  if (stats.lastDate !== today) {
+    stats.today = 0;
+    stats.lastDate = today;
+    await chrome.storage.local.set({ clipStats: stats });
+  }
+  
+  document.getElementById('clipCount').textContent = stats.total;
+  document.getElementById('todayCount').textContent = stats.today;
+}
+
+// 显示消息
+function showMessage(text, type) {
+  const messageEl = document.getElementById('message');
+  messageEl.textContent = text;
+  messageEl.className = `message ${type}`;
+  
+  setTimeout(() => {
+    messageEl.className = 'message';
+  }, 3000);
+}
+
+// 获取配置
+function getConfig() {
+  return new Promise((resolve) => {
+    chrome.storage.sync.get(['insightflowConfig'], (result) => {
+      resolve(result.insightflowConfig || {
+        serverUrl: 'http://122.51.127.111:18000',
+        apiKey: '',
+        defaultProjectId: ''
+      });
+    });
+  });
+}
+
+// 保存配置
+function saveConfig(config) {
+  return new Promise((resolve) => {
+    chrome.storage.sync.set({ insightflowConfig: config }, resolve);
+  });
+}
+
+// HTML 转义
+function escapeHtml(text) {
+  const div = document.createElement('div');
+  div.textContent = text;
+  return div.innerHTML;
+}
--- a/docs/PHASE7_TASK2_SUMMARY.md
+++ b/docs/PHASE7_TASK2_SUMMARY.md
@@ -0,0 +1,95 @@
+# InsightFlow Phase 7 任务 2 开发总结
+
+## 完成内容
+
+### 1. 多模态处理模块 (multimodal_processor.py)
+
+#### VideoProcessor 类
+- **视频文件处理**: 支持 MP4, AVI, MOV, MKV, WebM, FLV 格式
+- **音频提取**: 使用 ffmpeg 提取音频轨道（WAV 格式，16kHz 采样率）
+- **关键帧提取**: 使用 OpenCV 按时间间隔提取关键帧（默认每5秒）
+- **OCR识别**: 支持 PaddleOCR/EasyOCR/Tesseract 识别关键帧文字
+- **数据整合**: 合并所有帧的 OCR 文本，支持实体提取
+
+#### ImageProcessor 类
+- **图片处理**: 支持 JPG, PNG, GIF, BMP, WebP 格式
+- **OCR识别**: 识别图片中的文字内容（白板、PPT、手写笔记）
+- **图片描述**: 预留多模态 LLM 接口（待集成）
+- **批量处理**: 支持批量图片导入
+
+#### MultimodalEntityExtractor 类
+- 从视频和图片处理结果中提取实体和关系
+- 与现有 LLM 客户端集成
+
+### 2. 多模态实体关联模块 (multimodal_entity_linker.py)
+
+#### MultimodalEntityLinker 类
+- **跨模态实体对齐**: 使用 embedding 相似度计算发现不同模态中的同一实体
+- **多模态实体画像**: 统计实体在各模态中的提及次数
+- **跨模态关系发现**: 查找在同一视频帧/图片中共同出现的实体
+- **多模态时间线**: 按时间顺序展示多模态事件
+
+### 3. 数据库更新 (schema.sql)
+
+新增表：
+- `videos`: 视频信息表（时长、帧率、分辨率、OCR文本）
+- `video_frames`: 视频关键帧表（帧数据、时间戳、OCR文本）
+- `images`: 图片信息表（OCR文本、描述、提取的实体）
+- `multimodal_mentions`: 多模态实体提及表
+- `multimodal_entity_links`: 多模态实体关联表
+
+### 4. API 端点 (main.py)
+
+#### 视频相关
+- `POST /api/v1/projects/{id}/upload-video` - 上传视频
+- `GET /api/v1/projects/{id}/videos` - 视频列表
+- `GET /api/v1/videos/{id}` - 视频详情
+
+#### 图片相关
+- `POST /api/v1/projects/{id}/upload-image` - 上传图片
+- `GET /api/v1/projects/{id}/images` - 图片列表
+- `GET /api/v1/images/{id}` - 图片详情
+
+#### 多模态实体关联
+- `POST /api/v1/projects/{id}/multimodal/link-entities` - 跨模态实体关联
+- `GET /api/v1/entities/{id}/multimodal-profile` - 实体多模态画像
+- `GET /api/v1/projects/{id}/multimodal-timeline` - 多模态时间线
+- `GET /api/v1/entities/{id}/cross-modal-relations` - 跨模态关系
+
+### 5. 依赖更新 (requirements.txt)
+
+新增依赖：
+- `opencv-python==4.9.0.80` - 视频处理
+- `pillow==10.2.0` - 图片处理
+- `paddleocr==2.7.0.3` + `paddlepaddle==2.6.0` - OCR 引擎
+- `ffmpeg-python==0.2.0` - ffmpeg 封装
+- `sentence-transformers==2.3.1` - 跨模态对齐
+
+## 系统要求
+
+- **ffmpeg**: 必须安装，用于视频和音频处理
+- **Python 3.8+**: 支持所有依赖库
+
+## 待完善项
+
+1. **多模态 LLM 集成**: 图片描述功能需要集成 Kimi 或其他多模态模型 API
+2. **前端界面**: 需要开发视频/图片上传界面和多模态展示组件
+3. **性能优化**: 大视频文件处理可能需要异步任务队列
+4. **OCR 引擎选择**: 根据部署环境选择最适合的 OCR 引擎
+
+## 部署说明
+
+```bash
+# 安装系统依赖
+apt-get update
+apt-get install -y ffmpeg
+
+# 安装 Python 依赖
+pip install -r requirements.txt
+
+# 更新数据库
+sqlite3 insightflow.db < schema.sql
+
+# 启动服务
+python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
+```