diff --git a/README.md b/README.md index 91ec42f..846f3dd 100644 --- a/README.md +++ b/README.md @@ -125,74 +125,80 @@ MIT --- -## Phase 7: 智能化与生态扩展 - 进行中 🚧 +## Phase 7: 智能化与生态扩展 - 规划中 🚧 -### Phase 7 任务清单 +基于现有功能和用户反馈,Phase 7 聚焦**智能化增强**和**生态扩展**: + +### 1. 智能工作流自动化 🤖 +**优先级: P0** +- 定时任务自动分析新上传的音频/文档 +- 自动实体对齐和关系发现 +- 智能提醒(如发现新关联、实体冲突) +- Webhook 集成(支持飞书、钉钉、Slack 通知) + +### 2. 多模态支持 🎬 +**优先级: P0** +- 视频文件导入(提取音频 + 关键帧 OCR) +- 图片内容识别(白板、PPT、手写笔记) +- 多模态实体关联(同一实体在音频、图片、文档中的提及) + +### 3. 数据安全与合规 🔒 +**优先级: P1** +- 端到端加密(敏感项目数据加密存储) +- 数据脱敏(自动识别并脱敏敏感信息) +- 审计日志(完整操作记录) +- GDPR/数据合规支持 + +### 4. 协作与共享 👥 +**优先级: P1** +- 项目分享(只读/可编辑链接) +- 评论和批注(在实体、关系、转录文本上添加评论) +- 变更历史(谁修改了什么,何时修改) +- 团队空间(多用户项目协作) + +### 5. 智能报告生成 📊 +**优先级: P1** +- 一键生成项目总结报告(PDF/Word) +- 实体关系网络分析报告 +- 会议纪要和行动项提取 +- 自定义报告模板 + +### 6. 高级搜索与发现 🔍 +**优先级: P2** +- 全文搜索(跨所有转录文本) +- 语义搜索(基于 embedding 的相似度搜索) +- 实体关系路径发现(A 和 B 之间如何关联) +- 知识缺口识别(项目中缺失的关键信息) + +### 7. 插件与集成 🔌 +**优先级: P2** +- Chrome 插件(网页内容一键导入) +- 飞书/钉钉机器人(群内直接分析音频) +- Zapier/Make 集成(连接 5000+ 应用) +- WebDAV 同步(与坚果云等网盘联动) + +### 8. 性能优化与扩展 ⚡ +**优先级: P2** +- Redis 缓存层(热点数据缓存) +- 数据库分片(支持大规模项目) +- CDN 加速(静态资源全球加速) +- 异步任务队列(Celery + Redis) + +--- + +## Phase 7 开发进度 | 任务 | 状态 | 完成时间 | |------|------|----------| | 1. 智能工作流自动化 | ✅ 已完成 | 2026-02-23 | | 2. 多模态支持 | ✅ 已完成 | 2026-02-23 | +| 7. 插件与集成 | ✅ 已完成 | 2026-02-23 | | 3. 数据安全与合规 | ✅ 已完成 | 2026-02-23 | -| 4. 协作与共享 | ✅ 已完成 | 2026-02-24 | +| 4. 协作与共享 | 📋 待开发 | - | | 5. 智能报告生成 | 📋 待开发 | - | | 6. 高级搜索与发现 | 📋 待开发 | - | -| 7. 插件与集成 | ✅ 已完成 | 2026-02-23 | | 8. 性能优化与扩展 | 📋 待开发 | - | -### 已完成功能 ✅ +**建议开发顺序**: 1 → 2 → 7 → 3 → 4 → 5 → 6 → 8 -1. **智能工作流自动化** ✅ - - 工作流管理模块 `workflow_manager.py` - - 定时任务调度(APScheduler) - - Webhook 通知器(飞书/钉钉/Slack) - - 自动分析新上传文件 - - 自动实体对齐和关系发现 - -2. **多模态支持** ✅ - - 视频处理模块(音频提取 + 关键帧 + OCR) - - 图片处理模块(OCR + 图片描述) - - 跨模态实体关联 - - 多模态实体画像 - - 多模态时间线生成 - -3. **数据安全与合规** ✅ - - 安全模块 `security_manager.py` - - 审计日志系统 - - 端到端加密(AES-256-GCM) - - 数据脱敏(手机号、邮箱、身份证) - - 数据访问策略 - - 访问审批流程 - -4. 
**协作与共享** ✅ - - 协作管理模块 `collaboration_manager.py` - - 项目分享链接(只读/评论/编辑/管理员权限) - - 评论和批注系统(实体/关系/转录文本) - - 变更历史追踪 - - 团队成员管理(多角色权限控制) - -7. **插件与集成** ✅ - - 插件管理模块 `plugin_manager.py` - - Chrome 扩展支持 - - 飞书/钉钉机器人 - - Zapier/Make Webhook 集成 - - WebDAV 同步 - -### 待开发任务 📋 - -5. **智能报告生成** - 待开发 - - 一键生成 PDF/Word 报告 - - 会议纪要提取 - - 自定义报告模板 - -6. **高级搜索与发现** - 待开发 - - 全文搜索 - - 语义搜索 - - 实体关系路径发现 - - 知识缺口识别 - -8. **性能优化与扩展** - 待开发 - - Redis 缓存层 - - 数据库分片 - - CDN 加速 - - 异步任务队列(Celery) +**预计 Phase 7 完成时间**: 4-6 周 diff --git a/STATUS.md b/STATUS.md index ec8a126..2b7f09b 100644 --- a/STATUS.md +++ b/STATUS.md @@ -1,10 +1,10 @@ # InsightFlow 开发状态 -**最后更新**: 2026-02-24 00:00 +**最后更新**: 2026-02-23 18:00 ## 当前阶段 -Phase 7: 协作与共享 - **已完成 ✅** +Phase 7: 数据安全与合规 - **已完成 ✅** ## 部署状态 @@ -187,43 +187,9 @@ Phase 7: 协作与共享 - **已完成 ✅** - POST /api/v1/access-requests/{id}/reject - 拒绝访问 - ✅ 更新 requirements.txt - 添加 cryptography 依赖 -### Phase 7 - 任务 4: 协作与共享 (已完成 ✅) -- ✅ 创建 collaboration_manager.py - 协作管理模块 - - CollaborationManager: 协作管理主类 - - 项目分享链接管理 - 支持只读/评论/编辑/管理员权限 - - 评论和批注系统 - 支持实体、关系、转录文本评论 - - 变更历史追踪 - 记录所有数据操作变更 - - 团队成员管理 - 支持多角色权限控制 -- ✅ 更新 schema.sql - 添加协作相关数据库表 - - project_shares: 项目分享表 - - comments: 评论表 - - change_history: 变更历史表 - - team_members: 团队成员表 -- ✅ 更新 main.py - 添加协作相关 API 端点 - - POST /api/v1/projects/{id}/shares - 创建分享链接 - - GET /api/v1/projects/{id}/shares - 列出分享链接 - - POST /api/v1/shares/verify - 验证分享链接 - - GET /api/v1/shares/{token}/access - 访问共享项目 - - DELETE /api/v1/shares/{id} - 撤销分享链接 - - POST /api/v1/projects/{id}/comments - 添加评论 - - GET /api/v1/{type}/{id}/comments - 获取评论列表 - - GET /api/v1/projects/{id}/comments - 获取项目所有评论 - - PUT /api/v1/comments/{id} - 更新评论 - - POST /api/v1/comments/{id}/resolve - 解决评论 - - DELETE /api/v1/comments/{id} - 删除评论 - - GET /api/v1/projects/{id}/history - 获取变更历史 - - GET /api/v1/projects/{id}/history/stats - 获取变更统计 - - GET /api/v1/{type}/{id}/versions - 获取实体版本历史 - - POST /api/v1/history/{id}/revert - 回滚变更 - - POST 
/api/v1/projects/{id}/members - 邀请团队成员 - - GET /api/v1/projects/{id}/members - 列出团队成员 - - PUT /api/v1/members/{id}/role - 更新成员角色 - - DELETE /api/v1/members/{id} - 移除团队成员 - - GET /api/v1/projects/{id}/permissions - 检查用户权限 - ## 待完成 -Phase 7 任务 5: 智能报告生成 +Phase 7 任务 4: 协作与共享 ## 技术债务 @@ -241,24 +207,21 @@ Phase 7 任务 5: 智能报告生成 ## 最近更新 -### 2026-02-24 (凌晨) -- 完成 Phase 7 任务 4: 协作与共享 - - 创建 collaboration_manager.py 协作模块 - - CollaborationManager: 协作管理主类 - - 项目分享链接管理 - 支持只读/评论/编辑/管理员权限 - - 评论和批注系统 - 支持实体、关系、转录文本评论 - - 变更历史追踪 - 记录所有数据操作变更 - - 团队成员管理 - 支持多角色权限控制 - - 更新 schema.sql 添加协作相关数据库表 - - project_shares: 项目分享表 - - comments: 评论表 - - change_history: 变更历史表 - - team_members: 团队成员表 - - 更新 main.py 添加协作相关 API 端点 - - 项目分享相关端点 - - 评论和批注相关端点 - - 变更历史相关端点 - - 团队成员管理端点 +### 2026-02-23 (午间) +- 完成 Phase 7 任务 7: 插件与集成 + - 创建 plugin_manager.py 模块 + - PluginManager: 插件管理主类 + - ChromeExtensionHandler: Chrome 插件处理 + - BotHandler: 飞书/钉钉/Slack 机器人处理 + - WebhookIntegration: Zapier/Make Webhook 集成 + - WebDAVSync: WebDAV 同步管理 + - 创建完整的 Chrome 扩展代码 + - manifest.json, background.js, content.js + - popup.html/js, options.html/js + - 支持网页剪藏、选中文本保存、项目选择 + - 更新 schema.sql 添加插件相关数据库表 + - 更新 main.py 添加插件相关 API 端点 + - 更新 requirements.txt 添加插件依赖 ### 2026-02-23 (晚间) - 完成 Phase 7 任务 3: 数据安全与合规 @@ -278,22 +241,6 @@ Phase 7 任务 5: 智能报告生成 - 更新 main.py 添加安全相关 API 端点 - 更新 requirements.txt 添加 cryptography 依赖 -### 2026-02-23 (午间) -- 完成 Phase 7 任务 7: 插件与集成 - - 创建 plugin_manager.py 模块 - - PluginManager: 插件管理主类 - - ChromeExtensionHandler: Chrome 插件处理 - - BotHandler: 飞书/钉钉/Slack 机器人处理 - - WebhookIntegration: Zapier/Make Webhook 集成 - - WebDAVSync: WebDAV 同步管理 - - 创建完整的 Chrome 扩展代码 - - manifest.json, background.js, content.js - - popup.html/js, options.html/js - - 支持网页剪藏、选中文本保存、项目选择 - - 更新 schema.sql 添加插件相关数据库表 - - 更新 main.py 添加插件相关 API 端点 - - 更新 requirements.txt 添加插件依赖 - ### 2026-02-23 (早间) - 完成 Phase 7 任务 2: 多模态支持 - 创建 multimodal_processor.py 模块 diff --git a/backend/__pycache__/api_key_manager.cpython-312.pyc 
b/backend/__pycache__/api_key_manager.cpython-312.pyc new file mode 100644 index 0000000..798b71f Binary files /dev/null and b/backend/__pycache__/api_key_manager.cpython-312.pyc differ diff --git a/backend/__pycache__/db_manager.cpython-312.pyc b/backend/__pycache__/db_manager.cpython-312.pyc index 1f0203c..a24f1e8 100644 Binary files a/backend/__pycache__/db_manager.cpython-312.pyc and b/backend/__pycache__/db_manager.cpython-312.pyc differ diff --git a/backend/__pycache__/document_processor.cpython-312.pyc b/backend/__pycache__/document_processor.cpython-312.pyc deleted file mode 100644 index 6fe9caa..0000000 Binary files a/backend/__pycache__/document_processor.cpython-312.pyc and /dev/null differ diff --git a/backend/__pycache__/entity_aligner.cpython-312.pyc b/backend/__pycache__/entity_aligner.cpython-312.pyc deleted file mode 100644 index 41f18b2..0000000 Binary files a/backend/__pycache__/entity_aligner.cpython-312.pyc and /dev/null differ diff --git a/backend/__pycache__/export_manager.cpython-312.pyc b/backend/__pycache__/export_manager.cpython-312.pyc new file mode 100644 index 0000000..3b8321d Binary files /dev/null and b/backend/__pycache__/export_manager.cpython-312.pyc differ diff --git a/backend/__pycache__/image_processor.cpython-312.pyc b/backend/__pycache__/image_processor.cpython-312.pyc new file mode 100644 index 0000000..cb7a09e Binary files /dev/null and b/backend/__pycache__/image_processor.cpython-312.pyc differ diff --git a/backend/__pycache__/knowledge_reasoner.cpython-312.pyc b/backend/__pycache__/knowledge_reasoner.cpython-312.pyc deleted file mode 100644 index 2f9e237..0000000 Binary files a/backend/__pycache__/knowledge_reasoner.cpython-312.pyc and /dev/null differ diff --git a/backend/__pycache__/llm_client.cpython-312.pyc b/backend/__pycache__/llm_client.cpython-312.pyc deleted file mode 100644 index e5ae720..0000000 Binary files a/backend/__pycache__/llm_client.cpython-312.pyc and /dev/null differ diff --git 
a/backend/__pycache__/main.cpython-312.pyc b/backend/__pycache__/main.cpython-312.pyc index 31b5d48..2739f27 100644 Binary files a/backend/__pycache__/main.cpython-312.pyc and b/backend/__pycache__/main.cpython-312.pyc differ diff --git a/backend/__pycache__/multimodal_entity_linker.cpython-312.pyc b/backend/__pycache__/multimodal_entity_linker.cpython-312.pyc new file mode 100644 index 0000000..5aef7a6 Binary files /dev/null and b/backend/__pycache__/multimodal_entity_linker.cpython-312.pyc differ diff --git a/backend/__pycache__/multimodal_processor.cpython-312.pyc b/backend/__pycache__/multimodal_processor.cpython-312.pyc new file mode 100644 index 0000000..03b4715 Binary files /dev/null and b/backend/__pycache__/multimodal_processor.cpython-312.pyc differ diff --git a/backend/__pycache__/neo4j_manager.cpython-312.pyc b/backend/__pycache__/neo4j_manager.cpython-312.pyc new file mode 100644 index 0000000..4169091 Binary files /dev/null and b/backend/__pycache__/neo4j_manager.cpython-312.pyc differ diff --git a/backend/__pycache__/oss_uploader.cpython-312.pyc b/backend/__pycache__/oss_uploader.cpython-312.pyc deleted file mode 100644 index 9e89360..0000000 Binary files a/backend/__pycache__/oss_uploader.cpython-312.pyc and /dev/null differ diff --git a/backend/__pycache__/plugin_manager.cpython-312.pyc b/backend/__pycache__/plugin_manager.cpython-312.pyc new file mode 100644 index 0000000..d6a27a5 Binary files /dev/null and b/backend/__pycache__/plugin_manager.cpython-312.pyc differ diff --git a/backend/__pycache__/rate_limiter.cpython-312.pyc b/backend/__pycache__/rate_limiter.cpython-312.pyc new file mode 100644 index 0000000..03b8e2c Binary files /dev/null and b/backend/__pycache__/rate_limiter.cpython-312.pyc differ diff --git a/backend/__pycache__/security_manager.cpython-312.pyc b/backend/__pycache__/security_manager.cpython-312.pyc new file mode 100644 index 0000000..ccc9896 Binary files /dev/null and b/backend/__pycache__/security_manager.cpython-312.pyc 
differ diff --git a/backend/__pycache__/tingwu_client.cpython-312.pyc b/backend/__pycache__/tingwu_client.cpython-312.pyc deleted file mode 100644 index 58f81e4..0000000 Binary files a/backend/__pycache__/tingwu_client.cpython-312.pyc and /dev/null differ diff --git a/backend/__pycache__/workflow_manager.cpython-312.pyc b/backend/__pycache__/workflow_manager.cpython-312.pyc new file mode 100644 index 0000000..13c356b Binary files /dev/null and b/backend/__pycache__/workflow_manager.cpython-312.pyc differ diff --git a/backend/api_key_manager.py b/backend/api_key_manager.py new file mode 100644 index 0000000..c429971 --- /dev/null +++ b/backend/api_key_manager.py @@ -0,0 +1,529 @@ +#!/usr/bin/env python3 +""" +InsightFlow API Key Manager - Phase 6 +API Key 管理模块:生成、验证、撤销 +""" + +import os +import json +import hashlib +import secrets +import sqlite3 +from datetime import datetime, timedelta +from typing import Optional, List, Dict +from dataclasses import dataclass +from enum import Enum + +DB_PATH = os.getenv("DB_PATH", "/app/data/insightflow.db") + + +class ApiKeyStatus(Enum): + ACTIVE = "active" + REVOKED = "revoked" + EXPIRED = "expired" + + +@dataclass +class ApiKey: + id: str + key_hash: str # 存储哈希值,不存储原始 key + key_preview: str # 前16位预览,如 "ak_live_abc..." 
+ name: str # 密钥名称/描述 + owner_id: Optional[str] # 所有者ID(预留多用户支持) + permissions: List[str] # 权限列表,如 ["read", "write"] + rate_limit: int # 每分钟请求限制 + status: str # active, revoked, expired + created_at: str + expires_at: Optional[str] + last_used_at: Optional[str] + revoked_at: Optional[str] + revoked_reason: Optional[str] + total_calls: int = 0 + + +class ApiKeyManager: + """API Key 管理器""" + + # Key 前缀 + KEY_PREFIX = "ak_live_" + KEY_LENGTH = 48 # 总长度: 前缀(8) + 随机部分(40) + + def __init__(self, db_path: str = DB_PATH): + self.db_path = db_path + self._init_db() + + def _init_db(self): + """初始化数据库表""" + with sqlite3.connect(self.db_path) as conn: + conn.executescript(""" + -- API Keys 表 + CREATE TABLE IF NOT EXISTS api_keys ( + id TEXT PRIMARY KEY, + key_hash TEXT UNIQUE NOT NULL, + key_preview TEXT NOT NULL, + name TEXT NOT NULL, + owner_id TEXT, + permissions TEXT NOT NULL DEFAULT '["read"]', + rate_limit INTEGER DEFAULT 60, + status TEXT DEFAULT 'active', + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + expires_at TIMESTAMP, + last_used_at TIMESTAMP, + revoked_at TIMESTAMP, + revoked_reason TEXT, + total_calls INTEGER DEFAULT 0 + ); + + -- API 调用日志表 + CREATE TABLE IF NOT EXISTS api_call_logs ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + api_key_id TEXT NOT NULL, + endpoint TEXT NOT NULL, + method TEXT NOT NULL, + status_code INTEGER, + response_time_ms INTEGER, + ip_address TEXT, + user_agent TEXT, + error_message TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (api_key_id) REFERENCES api_keys(id) + ); + + -- API 调用统计表(按天汇总) + CREATE TABLE IF NOT EXISTS api_call_stats ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + api_key_id TEXT NOT NULL, + date TEXT NOT NULL, + endpoint TEXT NOT NULL, + method TEXT NOT NULL, + total_calls INTEGER DEFAULT 0, + success_calls INTEGER DEFAULT 0, + error_calls INTEGER DEFAULT 0, + avg_response_time_ms INTEGER DEFAULT 0, + FOREIGN KEY (api_key_id) REFERENCES api_keys(id), + UNIQUE(api_key_id, date, endpoint, method) + 
); + + -- 创建索引 + CREATE INDEX IF NOT EXISTS idx_api_keys_hash ON api_keys(key_hash); + CREATE INDEX IF NOT EXISTS idx_api_keys_status ON api_keys(status); + CREATE INDEX IF NOT EXISTS idx_api_keys_owner ON api_keys(owner_id); + CREATE INDEX IF NOT EXISTS idx_api_logs_key_id ON api_call_logs(api_key_id); + CREATE INDEX IF NOT EXISTS idx_api_logs_created ON api_call_logs(created_at); + CREATE INDEX IF NOT EXISTS idx_api_stats_key_date ON api_call_stats(api_key_id, date); + """) + conn.commit() + + def _generate_key(self) -> str: + """生成新的 API Key""" + # 生成 40 字符的随机字符串 + random_part = secrets.token_urlsafe(30)[:40] + return f"{self.KEY_PREFIX}{random_part}" + + def _hash_key(self, key: str) -> str: + """对 API Key 进行哈希""" + return hashlib.sha256(key.encode()).hexdigest() + + def _get_preview(self, key: str) -> str: + """获取 Key 的预览(前16位)""" + return f"{key[:16]}..." + + def create_key( + self, + name: str, + owner_id: Optional[str] = None, + permissions: List[str] = None, + rate_limit: int = 60, + expires_days: Optional[int] = None + ) -> tuple[str, ApiKey]: + """ + 创建新的 API Key + + Returns: + tuple: (原始key(仅返回一次), ApiKey对象) + """ + if permissions is None: + permissions = ["read"] + + key_id = secrets.token_hex(16) + raw_key = self._generate_key() + key_hash = self._hash_key(raw_key) + key_preview = self._get_preview(raw_key) + + expires_at = None + if expires_days: + expires_at = (datetime.now() + timedelta(days=expires_days)).isoformat() + + api_key = ApiKey( + id=key_id, + key_hash=key_hash, + key_preview=key_preview, + name=name, + owner_id=owner_id, + permissions=permissions, + rate_limit=rate_limit, + status=ApiKeyStatus.ACTIVE.value, + created_at=datetime.now().isoformat(), + expires_at=expires_at, + last_used_at=None, + revoked_at=None, + revoked_reason=None, + total_calls=0 + ) + + with sqlite3.connect(self.db_path) as conn: + conn.execute(""" + INSERT INTO api_keys ( + id, key_hash, key_preview, name, owner_id, permissions, + rate_limit, status, created_at, 
expires_at + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + """, ( + api_key.id, api_key.key_hash, api_key.key_preview, + api_key.name, api_key.owner_id, json.dumps(api_key.permissions), + api_key.rate_limit, api_key.status, api_key.created_at, + api_key.expires_at + )) + conn.commit() + + return raw_key, api_key + + def validate_key(self, key: str) -> Optional[ApiKey]: + """ + 验证 API Key + + Returns: + ApiKey if valid, None otherwise + """ + key_hash = self._hash_key(key) + + with sqlite3.connect(self.db_path) as conn: + conn.row_factory = sqlite3.Row + row = conn.execute( + "SELECT * FROM api_keys WHERE key_hash = ?", + (key_hash,) + ).fetchone() + + if not row: + return None + + api_key = self._row_to_api_key(row) + + # 检查状态 + if api_key.status != ApiKeyStatus.ACTIVE.value: + return None + + # 检查是否过期 + if api_key.expires_at: + expires = datetime.fromisoformat(api_key.expires_at) + if datetime.now() > expires: + # 更新状态为过期 + conn.execute( + "UPDATE api_keys SET status = ? WHERE id = ?", + (ApiKeyStatus.EXPIRED.value, api_key.id) + ) + conn.commit() + return None + + return api_key + + def revoke_key( + self, + key_id: str, + reason: str = "", + owner_id: Optional[str] = None + ) -> bool: + """撤销 API Key""" + with sqlite3.connect(self.db_path) as conn: + # 验证所有权(如果提供了 owner_id) + if owner_id: + row = conn.execute( + "SELECT owner_id FROM api_keys WHERE id = ?", + (key_id,) + ).fetchone() + if not row or row[0] != owner_id: + return False + + cursor = conn.execute(""" + UPDATE api_keys + SET status = ?, revoked_at = ?, revoked_reason = ? + WHERE id = ? AND status = ? 
+ """, ( + ApiKeyStatus.REVOKED.value, + datetime.now().isoformat(), + reason, + key_id, + ApiKeyStatus.ACTIVE.value + )) + conn.commit() + return cursor.rowcount > 0 + + def get_key_by_id(self, key_id: str, owner_id: Optional[str] = None) -> Optional[ApiKey]: + """通过 ID 获取 API Key(不包含敏感信息)""" + with sqlite3.connect(self.db_path) as conn: + conn.row_factory = sqlite3.Row + + if owner_id: + row = conn.execute( + "SELECT * FROM api_keys WHERE id = ? AND owner_id = ?", + (key_id, owner_id) + ).fetchone() + else: + row = conn.execute( + "SELECT * FROM api_keys WHERE id = ?", + (key_id,) + ).fetchone() + + if row: + return self._row_to_api_key(row) + return None + + def list_keys( + self, + owner_id: Optional[str] = None, + status: Optional[str] = None, + limit: int = 100, + offset: int = 0 + ) -> List[ApiKey]: + """列出 API Keys""" + with sqlite3.connect(self.db_path) as conn: + conn.row_factory = sqlite3.Row + + query = "SELECT * FROM api_keys WHERE 1=1" + params = [] + + if owner_id: + query += " AND owner_id = ?" + params.append(owner_id) + + if status: + query += " AND status = ?" + params.append(status) + + query += " ORDER BY created_at DESC LIMIT ? OFFSET ?" 
+ params.extend([limit, offset]) + + rows = conn.execute(query, params).fetchall() + return [self._row_to_api_key(row) for row in rows] + + def update_key( + self, + key_id: str, + name: Optional[str] = None, + permissions: Optional[List[str]] = None, + rate_limit: Optional[int] = None, + owner_id: Optional[str] = None + ) -> bool: + """更新 API Key 信息""" + updates = [] + params = [] + + if name is not None: + updates.append("name = ?") + params.append(name) + + if permissions is not None: + updates.append("permissions = ?") + params.append(json.dumps(permissions)) + + if rate_limit is not None: + updates.append("rate_limit = ?") + params.append(rate_limit) + + if not updates: + return False + + params.append(key_id) + + with sqlite3.connect(self.db_path) as conn: + # 验证所有权 + if owner_id: + row = conn.execute( + "SELECT owner_id FROM api_keys WHERE id = ?", + (key_id,) + ).fetchone() + if not row or row[0] != owner_id: + return False + + query = f"UPDATE api_keys SET {', '.join(updates)} WHERE id = ?" + cursor = conn.execute(query, params) + conn.commit() + return cursor.rowcount > 0 + + def update_last_used(self, key_id: str): + """更新最后使用时间""" + with sqlite3.connect(self.db_path) as conn: + conn.execute(""" + UPDATE api_keys + SET last_used_at = ?, total_calls = total_calls + 1 + WHERE id = ? + """, (datetime.now().isoformat(), key_id)) + conn.commit() + + def log_api_call( + self, + api_key_id: str, + endpoint: str, + method: str, + status_code: int = 200, + response_time_ms: int = 0, + ip_address: str = "", + user_agent: str = "", + error_message: str = "" + ): + """记录 API 调用日志""" + with sqlite3.connect(self.db_path) as conn: + conn.execute(""" + INSERT INTO api_call_logs + (api_key_id, endpoint, method, status_code, response_time_ms, + ip_address, user_agent, error_message) + VALUES (?, ?, ?, ?, ?, ?, ?, ?) 
+ """, ( + api_key_id, endpoint, method, status_code, response_time_ms, + ip_address, user_agent, error_message + )) + conn.commit() + + def get_call_logs( + self, + api_key_id: Optional[str] = None, + start_date: Optional[str] = None, + end_date: Optional[str] = None, + limit: int = 100, + offset: int = 0 + ) -> List[Dict]: + """获取 API 调用日志""" + with sqlite3.connect(self.db_path) as conn: + conn.row_factory = sqlite3.Row + + query = "SELECT * FROM api_call_logs WHERE 1=1" + params = [] + + if api_key_id: + query += " AND api_key_id = ?" + params.append(api_key_id) + + if start_date: + query += " AND created_at >= ?" + params.append(start_date) + + if end_date: + query += " AND created_at <= ?" + params.append(end_date) + + query += " ORDER BY created_at DESC LIMIT ? OFFSET ?" + params.extend([limit, offset]) + + rows = conn.execute(query, params).fetchall() + return [dict(row) for row in rows] + + def get_call_stats( + self, + api_key_id: Optional[str] = None, + days: int = 30 + ) -> Dict: + """获取 API 调用统计""" + with sqlite3.connect(self.db_path) as conn: + conn.row_factory = sqlite3.Row + + # 总体统计 + query = """ + SELECT + COUNT(*) as total_calls, + COUNT(CASE WHEN status_code < 400 THEN 1 END) as success_calls, + COUNT(CASE WHEN status_code >= 400 THEN 1 END) as error_calls, + AVG(response_time_ms) as avg_response_time, + MAX(response_time_ms) as max_response_time, + MIN(response_time_ms) as min_response_time + FROM api_call_logs + WHERE created_at >= date('now', '-{} days') + """.format(days) + + params = [] + if api_key_id: + query = query.replace("WHERE created_at", "WHERE api_key_id = ? 
AND created_at") + params.insert(0, api_key_id) + + row = conn.execute(query, params).fetchone() + + # 按端点统计 + endpoint_query = """ + SELECT + endpoint, + method, + COUNT(*) as calls, + AVG(response_time_ms) as avg_time + FROM api_call_logs + WHERE created_at >= date('now', '-{} days') + """.format(days) + + endpoint_params = [] + if api_key_id: + endpoint_query = endpoint_query.replace("WHERE created_at", "WHERE api_key_id = ? AND created_at") + endpoint_params.insert(0, api_key_id) + + endpoint_query += " GROUP BY endpoint, method ORDER BY calls DESC" + + endpoint_rows = conn.execute(endpoint_query, endpoint_params).fetchall() + + # 按天统计 + daily_query = """ + SELECT + date(created_at) as date, + COUNT(*) as calls, + COUNT(CASE WHEN status_code < 400 THEN 1 END) as success + FROM api_call_logs + WHERE created_at >= date('now', '-{} days') + """.format(days) + + daily_params = [] + if api_key_id: + daily_query = daily_query.replace("WHERE created_at", "WHERE api_key_id = ? AND created_at") + daily_params.insert(0, api_key_id) + + daily_query += " GROUP BY date(created_at) ORDER BY date" + + daily_rows = conn.execute(daily_query, daily_params).fetchall() + + return { + "summary": { + "total_calls": row["total_calls"] or 0, + "success_calls": row["success_calls"] or 0, + "error_calls": row["error_calls"] or 0, + "avg_response_time_ms": round(row["avg_response_time"] or 0, 2), + "max_response_time_ms": row["max_response_time"] or 0, + "min_response_time_ms": row["min_response_time"] or 0, + }, + "endpoints": [dict(r) for r in endpoint_rows], + "daily": [dict(r) for r in daily_rows] + } + + def _row_to_api_key(self, row: sqlite3.Row) -> ApiKey: + """将数据库行转换为 ApiKey 对象""" + return ApiKey( + id=row["id"], + key_hash=row["key_hash"], + key_preview=row["key_preview"], + name=row["name"], + owner_id=row["owner_id"], + permissions=json.loads(row["permissions"]), + rate_limit=row["rate_limit"], + status=row["status"], + created_at=row["created_at"], + 
expires_at=row["expires_at"], + last_used_at=row["last_used_at"], + revoked_at=row["revoked_at"], + revoked_reason=row["revoked_reason"], + total_calls=row["total_calls"] + ) + + +# 全局实例 +_api_key_manager: Optional[ApiKeyManager] = None + + +def get_api_key_manager() -> ApiKeyManager: + """获取 API Key 管理器实例""" + global _api_key_manager + if _api_key_manager is None: + _api_key_manager = ApiKeyManager() + return _api_key_manager diff --git a/backend/db_manager.py b/backend/db_manager.py index 3871d55..2be6b70 100644 --- a/backend/db_manager.py +++ b/backend/db_manager.py @@ -878,6 +878,310 @@ class DatabaseManager: filtered.append(entity) return filtered + # ==================== Phase 7: Multimodal Support ==================== + + def create_video(self, video_id: str, project_id: str, filename: str, + duration: float = 0, fps: float = 0, resolution: Dict = None, + audio_transcript_id: str = None, full_ocr_text: str = "", + extracted_entities: List[Dict] = None, + extracted_relations: List[Dict] = None) -> str: + """创建视频记录""" + conn = self.get_conn() + now = datetime.now().isoformat() + + conn.execute( + """INSERT INTO videos + (id, project_id, filename, duration, fps, resolution, + audio_transcript_id, full_ocr_text, extracted_entities, + extracted_relations, status, created_at, updated_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (video_id, project_id, filename, duration, fps, + json.dumps(resolution) if resolution else None, + audio_transcript_id, full_ocr_text, + json.dumps(extracted_entities or []), + json.dumps(extracted_relations or []), + 'completed', now, now) + ) + conn.commit() + conn.close() + return video_id + + def get_video(self, video_id: str) -> Optional[Dict]: + """获取视频信息""" + conn = self.get_conn() + row = conn.execute( + "SELECT * FROM videos WHERE id = ?", (video_id,) + ).fetchone() + conn.close() + + if row: + data = dict(row) + data['resolution'] = json.loads(data['resolution']) if data['resolution'] else None + 
data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else [] + data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else [] + return data + return None + + def list_project_videos(self, project_id: str) -> List[Dict]: + """获取项目的所有视频""" + conn = self.get_conn() + rows = conn.execute( + "SELECT * FROM videos WHERE project_id = ? ORDER BY created_at DESC", + (project_id,) + ).fetchall() + conn.close() + + videos = [] + for row in rows: + data = dict(row) + data['resolution'] = json.loads(data['resolution']) if data['resolution'] else None + data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else [] + data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else [] + videos.append(data) + return videos + + def create_video_frame(self, frame_id: str, video_id: str, frame_number: int, + timestamp: float, image_url: str = None, + ocr_text: str = None, extracted_entities: List[Dict] = None) -> str: + """创建视频帧记录""" + conn = self.get_conn() + now = datetime.now().isoformat() + + conn.execute( + """INSERT INTO video_frames + (id, video_id, frame_number, timestamp, image_url, ocr_text, extracted_entities, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", + (frame_id, video_id, frame_number, timestamp, image_url, ocr_text, + json.dumps(extracted_entities or []), now) + ) + conn.commit() + conn.close() + return frame_id + + def get_video_frames(self, video_id: str) -> List[Dict]: + """获取视频的所有帧""" + conn = self.get_conn() + rows = conn.execute( + """SELECT * FROM video_frames WHERE video_id = ? 
ORDER BY timestamp""", + (video_id,) + ).fetchall() + conn.close() + + frames = [] + for row in rows: + data = dict(row) + data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else [] + frames.append(data) + return frames + + def create_image(self, image_id: str, project_id: str, filename: str, + ocr_text: str = "", description: str = "", + extracted_entities: List[Dict] = None, + extracted_relations: List[Dict] = None) -> str: + """创建图片记录""" + conn = self.get_conn() + now = datetime.now().isoformat() + + conn.execute( + """INSERT INTO images + (id, project_id, filename, ocr_text, description, + extracted_entities, extracted_relations, status, created_at, updated_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (image_id, project_id, filename, ocr_text, description, + json.dumps(extracted_entities or []), + json.dumps(extracted_relations or []), + 'completed', now, now) + ) + conn.commit() + conn.close() + return image_id + + def get_image(self, image_id: str) -> Optional[Dict]: + """获取图片信息""" + conn = self.get_conn() + row = conn.execute( + "SELECT * FROM images WHERE id = ?", (image_id,) + ).fetchone() + conn.close() + + if row: + data = dict(row) + data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else [] + data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else [] + return data + return None + + def list_project_images(self, project_id: str) -> List[Dict]: + """获取项目的所有图片""" + conn = self.get_conn() + rows = conn.execute( + "SELECT * FROM images WHERE project_id = ? 
ORDER BY created_at DESC", + (project_id,) + ).fetchall() + conn.close() + + images = [] + for row in rows: + data = dict(row) + data['extracted_entities'] = json.loads(data['extracted_entities']) if data['extracted_entities'] else [] + data['extracted_relations'] = json.loads(data['extracted_relations']) if data['extracted_relations'] else [] + images.append(data) + return images + + def create_multimodal_mention(self, mention_id: str, project_id: str, + entity_id: str, modality: str, source_id: str, + source_type: str, text_snippet: str = "", + confidence: float = 1.0) -> str: + """创建多模态实体提及记录""" + conn = self.get_conn() + now = datetime.now().isoformat() + + conn.execute( + """INSERT OR REPLACE INTO multimodal_mentions + (id, project_id, entity_id, modality, source_id, source_type, + text_snippet, confidence, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (mention_id, project_id, entity_id, modality, source_id, + source_type, text_snippet, confidence, now) + ) + conn.commit() + conn.close() + return mention_id + + def get_entity_multimodal_mentions(self, entity_id: str) -> List[Dict]: + """获取实体的多模态提及""" + conn = self.get_conn() + rows = conn.execute( + """SELECT m.*, e.name as entity_name + FROM multimodal_mentions m + JOIN entities e ON m.entity_id = e.id + WHERE m.entity_id = ? ORDER BY m.created_at DESC""", + (entity_id,) + ).fetchall() + conn.close() + return [dict(r) for r in rows] + + def get_project_multimodal_mentions(self, project_id: str, + modality: str = None) -> List[Dict]: + """获取项目的多模态提及""" + conn = self.get_conn() + + if modality: + rows = conn.execute( + """SELECT m.*, e.name as entity_name + FROM multimodal_mentions m + JOIN entities e ON m.entity_id = e.id + WHERE m.project_id = ? AND m.modality = ? + ORDER BY m.created_at DESC""", + (project_id, modality) + ).fetchall() + else: + rows = conn.execute( + """SELECT m.*, e.name as entity_name + FROM multimodal_mentions m + JOIN entities e ON m.entity_id = e.id + WHERE m.project_id = ? 
ORDER BY m.created_at DESC""", + (project_id,) + ).fetchall() + + conn.close() + return [dict(r) for r in rows] + + def create_multimodal_entity_link(self, link_id: str, entity_id: str, + linked_entity_id: str, link_type: str, + confidence: float = 1.0, + evidence: str = "", + modalities: List[str] = None) -> str: + """创建多模态实体关联""" + conn = self.get_conn() + now = datetime.now().isoformat() + + conn.execute( + """INSERT OR REPLACE INTO multimodal_entity_links + (id, entity_id, linked_entity_id, link_type, confidence, + evidence, modalities, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", + (link_id, entity_id, linked_entity_id, link_type, confidence, + evidence, json.dumps(modalities or []), now) + ) + conn.commit() + conn.close() + return link_id + + def get_entity_multimodal_links(self, entity_id: str) -> List[Dict]: + """获取实体的多模态关联""" + conn = self.get_conn() + rows = conn.execute( + """SELECT l.*, e1.name as entity_name, e2.name as linked_entity_name + FROM multimodal_entity_links l + JOIN entities e1 ON l.entity_id = e1.id + JOIN entities e2 ON l.linked_entity_id = e2.id + WHERE l.entity_id = ? 
OR l.linked_entity_id = ?""", + (entity_id, entity_id) + ).fetchall() + conn.close() + + links = [] + for row in rows: + data = dict(row) + data['modalities'] = json.loads(data['modalities']) if data['modalities'] else [] + links.append(data) + return links + + def get_project_multimodal_stats(self, project_id: str) -> Dict: + """获取项目多模态统计信息""" + conn = self.get_conn() + + stats = { + 'video_count': 0, + 'image_count': 0, + 'multimodal_entity_count': 0, + 'cross_modal_links': 0, + 'modality_distribution': {} + } + + # 视频数量 + row = conn.execute( + "SELECT COUNT(*) as count FROM videos WHERE project_id = ?", + (project_id,) + ).fetchone() + stats['video_count'] = row['count'] + + # 图片数量 + row = conn.execute( + "SELECT COUNT(*) as count FROM images WHERE project_id = ?", + (project_id,) + ).fetchone() + stats['image_count'] = row['count'] + + # 多模态实体数量 + row = conn.execute( + """SELECT COUNT(DISTINCT entity_id) as count + FROM multimodal_mentions WHERE project_id = ?""", + (project_id,) + ).fetchone() + stats['multimodal_entity_count'] = row['count'] + + # 跨模态关联数量 + row = conn.execute( + """SELECT COUNT(*) as count FROM multimodal_entity_links + WHERE entity_id IN (SELECT id FROM entities WHERE project_id = ?)""", + (project_id,) + ).fetchone() + stats['cross_modal_links'] = row['count'] + + # 模态分布 + for modality in ['audio', 'video', 'image', 'document']: + row = conn.execute( + """SELECT COUNT(*) as count FROM multimodal_mentions + WHERE project_id = ? AND modality = ?""", + (project_id, modality) + ).fetchone() + stats['modality_distribution'][modality] = row['count'] + + conn.close() + return stats + # Singleton instance _db_manager = None diff --git a/backend/docs/multimodal_api.md b/backend/docs/multimodal_api.md new file mode 100644 index 0000000..a31b981 --- /dev/null +++ b/backend/docs/multimodal_api.md @@ -0,0 +1,308 @@ +# InsightFlow Phase 7 - 多模态支持 API 文档 + +## 概述 + +Phase 7 多模态支持模块为 InsightFlow 添加了处理视频和图片的能力,支持: + +1. **视频处理**:提取音频、关键帧、OCR 识别 +2. 
**图片处理**:识别白板、PPT、手写笔记等内容 +3. **多模态实体关联**:跨模态实体对齐和知识融合 + +## 新增 API 端点 + +### 视频处理 + +#### 上传视频 +``` +POST /api/v1/projects/{project_id}/upload-video +``` + +**参数:** +- `file` (required): 视频文件 +- `extract_interval` (optional): 关键帧提取间隔(秒),默认 5 秒 + +**响应:** +```json +{ + "video_id": "abc123", + "project_id": "proj456", + "filename": "meeting.mp4", + "status": "completed", + "audio_extracted": true, + "frame_count": 24, + "ocr_text_preview": "会议内容预览...", + "message": "Video processed successfully" +} +``` + +#### 获取项目视频列表 +``` +GET /api/v1/projects/{project_id}/videos +``` + +**响应:** +```json +[ + { + "id": "abc123", + "filename": "meeting.mp4", + "duration": 120.5, + "fps": 30.0, + "resolution": {"width": 1920, "height": 1080}, + "ocr_preview": "会议内容...", + "status": "completed", + "created_at": "2024-01-15T10:30:00" + } +] +``` + +#### 获取视频关键帧 +``` +GET /api/v1/videos/{video_id}/frames +``` + +**响应:** +```json +[ + { + "id": "frame001", + "frame_number": 1, + "timestamp": 0.0, + "image_url": "/tmp/frames/video123/frame_000001_0.00.jpg", + "ocr_text": "第一页内容...", + "entities": [{"name": "Project Alpha", "type": "PROJECT"}] + } +] +``` + +### 图片处理 + +#### 上传图片 +``` +POST /api/v1/projects/{project_id}/upload-image +``` + +**参数:** +- `file` (required): 图片文件 +- `detect_type` (optional): 是否自动检测图片类型,默认 true + +**响应:** +```json +{ + "image_id": "img789", + "project_id": "proj456", + "filename": "whiteboard.jpg", + "image_type": "whiteboard", + "ocr_text_preview": "白板内容...", + "description": "这是一张白板图片。内容摘要:...", + "entity_count": 5, + "status": "completed" +} +``` + +#### 批量上传图片 +``` +POST /api/v1/projects/{project_id}/upload-images-batch +``` + +**参数:** +- `files` (required): 多个图片文件 + +**响应:** +```json +{ + "project_id": "proj456", + "total_count": 3, + "success_count": 3, + "failed_count": 0, + "results": [ + { + "image_id": "img001", + "status": "success", + "image_type": "ppt", + "entity_count": 4 + } + ] +} +``` + +#### 获取项目图片列表 +``` +GET 
/api/v1/projects/{project_id}/images +``` + +### 多模态实体关联 + +#### 跨模态实体对齐 +``` +POST /api/v1/projects/{project_id}/multimodal/align +``` + +**参数:** +- `threshold` (optional): 相似度阈值,默认 0.85 + +**响应:** +```json +{ + "project_id": "proj456", + "aligned_count": 5, + "links": [ + { + "link_id": "link001", + "source_entity_id": "ent001", + "target_entity_id": "ent002", + "source_modality": "video", + "target_modality": "document", + "link_type": "same_as", + "confidence": 0.95, + "evidence": "Cross-modal alignment: exact" + } + ], + "message": "Successfully aligned 5 cross-modal entity pairs" +} +``` + +#### 获取多模态统计信息 +``` +GET /api/v1/projects/{project_id}/multimodal/stats +``` + +**响应:** +```json +{ + "project_id": "proj456", + "video_count": 3, + "image_count": 10, + "multimodal_entity_count": 25, + "cross_modal_links": 8, + "modality_distribution": { + "audio": 15, + "video": 8, + "image": 12, + "document": 20 + } +} +``` + +#### 获取实体多模态提及 +``` +GET /api/v1/entities/{entity_id}/multimodal-mentions +``` + +**响应:** +```json +[ + { + "id": "mention001", + "entity_id": "ent001", + "entity_name": "Project Alpha", + "modality": "video", + "source_id": "video123", + "source_type": "video_frame", + "text_snippet": "Project Alpha 进度", + "confidence": 1.0, + "created_at": "2024-01-15T10:30:00" + } +] +``` + +#### 建议多模态实体合并 +``` +GET /api/v1/projects/{project_id}/multimodal/suggest-merges +``` + +**响应:** +```json +{ + "project_id": "proj456", + "suggestion_count": 3, + "suggestions": [ + { + "entity1": {"id": "ent001", "name": "K8s", "type": "TECH"}, + "entity2": {"id": "ent002", "name": "Kubernetes", "type": "TECH"}, + "similarity": 0.95, + "match_type": "alias_match", + "suggested_action": "merge" + } + ] +} +``` + +## 数据库表结构 + +### videos 表 +存储视频文件信息 +- `id`: 视频ID +- `project_id`: 所属项目ID +- `filename`: 文件名 +- `duration`: 视频时长(秒) +- `fps`: 帧率 +- `resolution`: 分辨率(JSON) +- `audio_transcript_id`: 关联的音频转录ID +- `full_ocr_text`: 所有帧OCR文本合并 +- `extracted_entities`: 提取的实体(JSON) +- 
`extracted_relations`: 提取的关系(JSON) +- `status`: 处理状态 + +### video_frames 表 +存储视频关键帧信息 +- `id`: 帧ID +- `video_id`: 所属视频ID +- `frame_number`: 帧序号 +- `timestamp`: 时间戳(秒) +- `image_url`: 图片URL或路径 +- `ocr_text`: OCR识别文本 +- `extracted_entities`: 该帧提取的实体 + +### images 表 +存储图片文件信息 +- `id`: 图片ID +- `project_id`: 所属项目ID +- `filename`: 文件名 +- `ocr_text`: OCR识别文本 +- `description`: 图片描述 +- `extracted_entities`: 提取的实体 +- `extracted_relations`: 提取的关系 +- `status`: 处理状态 + +### multimodal_mentions 表 +存储实体在多模态中的提及 +- `id`: 提及ID +- `project_id`: 所属项目ID +- `entity_id`: 实体ID +- `modality`: 模态类型(audio/video/image/document) +- `source_id`: 来源ID +- `source_type`: 来源类型 +- `text_snippet`: 文本片段 +- `confidence`: 置信度 + +### multimodal_entity_links 表 +存储跨模态实体关联 +- `id`: 关联ID +- `entity_id`: 实体ID +- `linked_entity_id`: 关联实体ID +- `link_type`: 关联类型(same_as/related_to/part_of) +- `confidence`: 置信度 +- `evidence`: 关联证据 +- `modalities`: 涉及的模态列表 + +## 依赖安装 + +```bash +pip install ffmpeg-python pillow opencv-python pytesseract +``` + +注意:使用 OCR 功能需要安装 Tesseract OCR 引擎: +- Ubuntu/Debian: `sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim` +- macOS: `brew install tesseract tesseract-lang` +- Windows: 下载安装包从 https://github.com/UB-Mannheim/tesseract/wiki + +## 环境变量 + +```bash +# 可选:自定义临时目录 +export INSIGHTFLOW_TEMP_DIR=/path/to/temp + +# 可选:Tesseract 路径(Windows) +export TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe +``` diff --git a/backend/image_processor.py b/backend/image_processor.py new file mode 100644 index 0000000..573e9cc --- /dev/null +++ b/backend/image_processor.py @@ -0,0 +1,547 @@ +#!/usr/bin/env python3 +""" +InsightFlow Image Processor - Phase 7 +图片处理模块:识别白板、PPT、手写笔记等内容 +""" + +import os +import io +import json +import uuid +import base64 +from typing import List, Dict, Optional, Tuple +from dataclasses import dataclass +from pathlib import Path + +# 尝试导入图像处理库 +try: + from PIL import Image, ImageEnhance, ImageFilter + PIL_AVAILABLE = True +except ImportError: + 
PIL_AVAILABLE = False + +try: + import cv2 + import numpy as np + CV2_AVAILABLE = True +except ImportError: + CV2_AVAILABLE = False + +try: + import pytesseract + PYTESSERACT_AVAILABLE = True +except ImportError: + PYTESSERACT_AVAILABLE = False + + +@dataclass +class ImageEntity: + """图片中检测到的实体""" + name: str + type: str + confidence: float + bbox: Optional[Tuple[int, int, int, int]] = None # (x, y, width, height) + + +@dataclass +class ImageRelation: + """图片中检测到的关系""" + source: str + target: str + relation_type: str + confidence: float + + +@dataclass +class ImageProcessingResult: + """图片处理结果""" + image_id: str + image_type: str # whiteboard, ppt, handwritten, screenshot, other + ocr_text: str + description: str + entities: List[ImageEntity] + relations: List[ImageRelation] + width: int + height: int + success: bool + error_message: str = "" + + +@dataclass +class BatchProcessingResult: + """批量图片处理结果""" + results: List[ImageProcessingResult] + total_count: int + success_count: int + failed_count: int + + +class ImageProcessor: + """图片处理器 - 处理各种类型图片""" + + # 图片类型定义 + IMAGE_TYPES = { + 'whiteboard': '白板', + 'ppt': 'PPT/演示文稿', + 'handwritten': '手写笔记', + 'screenshot': '屏幕截图', + 'document': '文档图片', + 'other': '其他' + } + + def __init__(self, temp_dir: str = None): + """ + 初始化图片处理器 + + Args: + temp_dir: 临时文件目录 + """ + self.temp_dir = temp_dir or os.path.join(os.getcwd(), 'temp', 'images') + os.makedirs(self.temp_dir, exist_ok=True) + + def preprocess_image(self, image, image_type: str = None): + """ + 预处理图片以提高OCR质量 + + Args: + image: PIL Image 对象 + image_type: 图片类型(用于针对性处理) + + Returns: + 处理后的图片 + """ + if not PIL_AVAILABLE: + return image + + try: + # 转换为RGB(如果是RGBA) + if image.mode == 'RGBA': + image = image.convert('RGB') + + # 根据图片类型进行针对性处理 + if image_type == 'whiteboard': + # 白板:增强对比度,去除背景 + image = self._enhance_whiteboard(image) + elif image_type == 'handwritten': + # 手写笔记:降噪,增强对比度 + image = self._enhance_handwritten(image) + elif image_type == 'screenshot': + # 
截图:轻微锐化 + image = image.filter(ImageFilter.SHARPEN) + + # 通用处理:调整大小(如果太大) + max_size = 4096 + if max(image.size) > max_size: + ratio = max_size / max(image.size) + new_size = (int(image.size[0] * ratio), int(image.size[1] * ratio)) + image = image.resize(new_size, Image.Resampling.LANCZOS) + + return image + except Exception as e: + print(f"Image preprocessing error: {e}") + return image + + def _enhance_whiteboard(self, image): + """增强白板图片""" + # 转换为灰度 + gray = image.convert('L') + + # 增强对比度 + enhancer = ImageEnhance.Contrast(gray) + enhanced = enhancer.enhance(2.0) + + # 二值化 + threshold = 128 + binary = enhanced.point(lambda x: 0 if x < threshold else 255, '1') + + return binary.convert('L') + + def _enhance_handwritten(self, image): + """增强手写笔记图片""" + # 转换为灰度 + gray = image.convert('L') + + # 轻微降噪 + blurred = gray.filter(ImageFilter.GaussianBlur(radius=1)) + + # 增强对比度 + enhancer = ImageEnhance.Contrast(blurred) + enhanced = enhancer.enhance(1.5) + + return enhanced + + def detect_image_type(self, image, ocr_text: str = "") -> str: + """ + 自动检测图片类型 + + Args: + image: PIL Image 对象 + ocr_text: OCR识别的文本 + + Returns: + 图片类型字符串 + """ + if not PIL_AVAILABLE: + return 'other' + + try: + # 基于图片特征和OCR内容判断类型 + width, height = image.size + aspect_ratio = width / height + + # 检测是否为PPT(通常是16:9或4:3) + if 1.3 <= aspect_ratio <= 1.8: + # 检查是否有典型的PPT特征(标题、项目符号等) + if any(keyword in ocr_text.lower() for keyword in ['slide', 'page', '第', '页']): + return 'ppt' + + # 检测是否为白板(大量手写文字,可能有箭头、框等) + if CV2_AVAILABLE: + img_array = np.array(image.convert('RGB')) + gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY) + + # 检测边缘(白板通常有很多线条) + edges = cv2.Canny(gray, 50, 150) + edge_ratio = np.sum(edges > 0) / edges.size + + # 如果边缘比例高,可能是白板 + if edge_ratio > 0.05 and len(ocr_text) > 50: + return 'whiteboard' + + # 检测是否为手写笔记(文字密度高,可能有涂鸦) + if len(ocr_text) > 100 and aspect_ratio < 1.5: + # 检查手写特征(不规则的行高) + return 'handwritten' + + # 检测是否为截图(可能有UI元素) + if any(keyword in ocr_text.lower() for keyword 
in ['button', 'menu', 'click', '登录', '确定', '取消']): + return 'screenshot' + + # 默认文档类型 + if len(ocr_text) > 200: + return 'document' + + return 'other' + except Exception as e: + print(f"Image type detection error: {e}") + return 'other' + + def perform_ocr(self, image, lang: str = 'chi_sim+eng') -> Tuple[str, float]: + """ + 对图片进行OCR识别 + + Args: + image: PIL Image 对象 + lang: OCR语言 + + Returns: + (识别的文本, 置信度) + """ + if not PYTESSERACT_AVAILABLE: + return "", 0.0 + + try: + # 预处理图片 + processed_image = self.preprocess_image(image) + + # 执行OCR + text = pytesseract.image_to_string(processed_image, lang=lang) + + # 获取置信度 + data = pytesseract.image_to_data(processed_image, output_type=pytesseract.Output.DICT) + confidences = [int(c) for c in data['conf'] if int(c) > 0] + avg_confidence = sum(confidences) / len(confidences) if confidences else 0 + + return text.strip(), avg_confidence / 100.0 + except Exception as e: + print(f"OCR error: {e}") + return "", 0.0 + + def extract_entities_from_text(self, text: str) -> List[ImageEntity]: + """ + 从OCR文本中提取实体 + + Args: + text: OCR识别的文本 + + Returns: + 实体列表 + """ + entities = [] + + # 简单的实体提取规则(可以替换为LLM调用) + # 提取大写字母开头的词组(可能是专有名词) + import re + + # 项目名称(通常是大写或带引号) + project_pattern = r'["\']([^"\']+)["\']|([A-Z][a-zA-Z0-9]*(?:\s+[A-Z][a-zA-Z0-9]*)+)' + for match in re.finditer(project_pattern, text): + name = match.group(1) or match.group(2) + if name and len(name) > 2: + entities.append(ImageEntity( + name=name.strip(), + type='PROJECT', + confidence=0.7 + )) + + # 人名(中文) + name_pattern = r'([\u4e00-\u9fa5]{2,4})(?:先生|女士|总|经理|工程师|老师)' + for match in re.finditer(name_pattern, text): + entities.append(ImageEntity( + name=match.group(1), + type='PERSON', + confidence=0.8 + )) + + # 技术术语 + tech_keywords = ['K8s', 'Kubernetes', 'Docker', 'API', 'SDK', 'AI', 'ML', + 'Python', 'Java', 'React', 'Vue', 'Node.js', '数据库', '服务器'] + for keyword in tech_keywords: + if keyword in text: + entities.append(ImageEntity( + name=keyword, + 
type='TECH', + confidence=0.9 + )) + + # 去重 + seen = set() + unique_entities = [] + for e in entities: + key = (e.name.lower(), e.type) + if key not in seen: + seen.add(key) + unique_entities.append(e) + + return unique_entities + + def generate_description(self, image_type: str, ocr_text: str, + entities: List[ImageEntity]) -> str: + """ + 生成图片描述 + + Args: + image_type: 图片类型 + ocr_text: OCR文本 + entities: 检测到的实体 + + Returns: + 图片描述 + """ + type_name = self.IMAGE_TYPES.get(image_type, '图片') + + description_parts = [f"这是一张{type_name}图片。"] + + if ocr_text: + # 提取前200字符作为摘要 + text_preview = ocr_text[:200].replace('\n', ' ') + if len(ocr_text) > 200: + text_preview += "..." + description_parts.append(f"内容摘要:{text_preview}") + + if entities: + entity_names = [e.name for e in entities[:5]] # 最多显示5个实体 + description_parts.append(f"识别到的关键实体:{', '.join(entity_names)}") + + return " ".join(description_parts) + + def process_image(self, image_data: bytes, filename: str = None, + image_id: str = None, detect_type: bool = True) -> ImageProcessingResult: + """ + 处理单张图片 + + Args: + image_data: 图片二进制数据 + filename: 文件名 + image_id: 图片ID(可选) + detect_type: 是否自动检测图片类型 + + Returns: + 图片处理结果 + """ + image_id = image_id or str(uuid.uuid4())[:8] + + if not PIL_AVAILABLE: + return ImageProcessingResult( + image_id=image_id, + image_type='other', + ocr_text='', + description='PIL not available', + entities=[], + relations=[], + width=0, + height=0, + success=False, + error_message='PIL library not available' + ) + + try: + # 加载图片 + image = Image.open(io.BytesIO(image_data)) + width, height = image.size + + # 执行OCR + ocr_text, ocr_confidence = self.perform_ocr(image) + + # 检测图片类型 + image_type = 'other' + if detect_type: + image_type = self.detect_image_type(image, ocr_text) + + # 提取实体 + entities = self.extract_entities_from_text(ocr_text) + + # 生成描述 + description = self.generate_description(image_type, ocr_text, entities) + + # 提取关系(基于实体共现) + relations = self._extract_relations(entities, 
ocr_text) + + # 保存图片文件(可选) + if filename: + save_path = os.path.join(self.temp_dir, f"{image_id}_{filename}") + image.save(save_path) + + return ImageProcessingResult( + image_id=image_id, + image_type=image_type, + ocr_text=ocr_text, + description=description, + entities=entities, + relations=relations, + width=width, + height=height, + success=True + ) + + except Exception as e: + return ImageProcessingResult( + image_id=image_id, + image_type='other', + ocr_text='', + description='', + entities=[], + relations=[], + width=0, + height=0, + success=False, + error_message=str(e) + ) + + def _extract_relations(self, entities: List[ImageEntity], text: str) -> List[ImageRelation]: + """ + 从文本中提取实体关系 + + Args: + entities: 实体列表 + text: 文本内容 + + Returns: + 关系列表 + """ + relations = [] + + if len(entities) < 2: + return relations + + # 简单的关系提取:如果两个实体在同一句子中出现,则认为它们相关 + # 将中英文句末标点统一替换为 '.',再按 '.' 分句 + sentences = text.replace('。', '.').replace('!', '.').replace('?', '.').replace('!', '.').replace('?', '.').split('.') + + for sentence in sentences: + sentence_entities = [] + for entity in entities: + if entity.name in sentence: + sentence_entities.append(entity) + + # 如果句子中有多个实体,建立关系 + if len(sentence_entities) >= 2: + for i in range(len(sentence_entities)): + for j in range(i + 1, len(sentence_entities)): + relations.append(ImageRelation( + source=sentence_entities[i].name, + target=sentence_entities[j].name, + relation_type='related', + confidence=0.5 + )) + + return relations + + def process_batch(self, images_data: List[Tuple[bytes, str]], + project_id: str = None) -> BatchProcessingResult: + """ + 批量处理图片 + + Args: + images_data: 图片数据列表,每项为 (image_data, filename) + project_id: 项目ID + + Returns: + 批量处理结果 + """ + results = [] + success_count = 0 + failed_count = 0 + + for image_data, filename in images_data: + result = self.process_image(image_data, filename) + results.append(result) + + if result.success: + success_count += 1 + else: + failed_count += 1 + + return BatchProcessingResult( + results=results, + total_count=len(results), + 
success_count=success_count, + failed_count=failed_count + ) + + def image_to_base64(self, image_data: bytes) -> str: + """ + 将图片转换为base64编码 + + Args: + image_data: 图片二进制数据 + + Returns: + base64编码的字符串 + """ + return base64.b64encode(image_data).decode('utf-8') + + def get_image_thumbnail(self, image_data: bytes, size: Tuple[int, int] = (200, 200)) -> bytes: + """ + 生成图片缩略图 + + Args: + image_data: 图片二进制数据 + size: 缩略图尺寸 + + Returns: + 缩略图二进制数据 + """ + if not PIL_AVAILABLE: + return image_data + + try: + image = Image.open(io.BytesIO(image_data)) + image.thumbnail(size, Image.Resampling.LANCZOS) + + # JPEG 不支持透明通道,RGBA/P 模式需先转为 RGB,否则 save 会抛异常 + if image.mode in ('RGBA', 'P'): + image = image.convert('RGB') + + buffer = io.BytesIO() + image.save(buffer, format='JPEG') + return buffer.getvalue() + except Exception as e: + print(f"Thumbnail generation error: {e}") + return image_data + + +# Singleton instance +_image_processor = None + +def get_image_processor(temp_dir: str = None) -> ImageProcessor: + """获取图片处理器单例""" + global _image_processor + if _image_processor is None: + _image_processor = ImageProcessor(temp_dir) + return _image_processor diff --git a/backend/main.py b/backend/main.py index 7fc5533..2d45278 100644 --- a/backend/main.py +++ b/backend/main.py @@ -1,6 +1,7 @@ #!/usr/bin/env python3 """ -InsightFlow Backend - Phase 3 (Memory & Growth) +InsightFlow Backend - Phase 6 (API Platform) +API 开放平台:API Key 管理、Swagger 文档、限流 Knowledge Growth: Multi-file fusion + Entity Alignment + Document Import ASR: 阿里云听悟 + OSS """ @@ -8,15 +9,19 @@ ASR: 阿里云听悟 + OSS import os import sys import json +import hashlib +import secrets import httpx import uuid import re import io -from fastapi import FastAPI, File, UploadFile, HTTPException, Form +import time +from fastapi import FastAPI, File, UploadFile, HTTPException, Form, Depends, Header, Request from fastapi.middleware.cors import CORSMiddleware from fastapi.staticfiles import StaticFiles -from pydantic import BaseModel -from typing import List, Optional, Union +from fastapi.responses import JSONResponse +from pydantic import BaseModel, 
Field +from typing import List, Optional, Union, Dict from datetime import datetime # Add backend directory to path for imports @@ -79,14 +84,143 @@ try: except ImportError: NEO4J_AVAILABLE = False +# Phase 6: API Key Manager try: - from collaboration_manager import get_collaboration_manager, CollaborationManager - COLLABORATION_AVAILABLE = True + from api_key_manager import get_api_key_manager, ApiKeyManager, ApiKey + API_KEY_AVAILABLE = True except ImportError as e: - print(f"Collaboration import error: {e}") - COLLABORATION_AVAILABLE = False + print(f"API Key Manager import error: {e}") + API_KEY_AVAILABLE = False -app = FastAPI(title="InsightFlow", version="0.3.0") +# Phase 6: Rate Limiter +try: + from rate_limiter import get_rate_limiter, RateLimitConfig, RateLimitInfo + RATE_LIMITER_AVAILABLE = True +except ImportError as e: + print(f"Rate Limiter import error: {e}") + RATE_LIMITER_AVAILABLE = False + +# Phase 7: Workflow Manager +try: + from workflow_manager import ( + get_workflow_manager, WorkflowManager, Workflow, WorkflowTask, + WebhookConfig, WorkflowLog, WorkflowType, WebhookType, TaskStatus + ) + WORKFLOW_AVAILABLE = True +except ImportError as e: + print(f"Workflow Manager import error: {e}") + WORKFLOW_AVAILABLE = False + +# Phase 7: Multimodal Support +try: + from multimodal_processor import ( + get_multimodal_processor, MultimodalProcessor, + VideoProcessingResult, VideoFrame + ) + MULTIMODAL_AVAILABLE = True +except ImportError as e: + print(f"Multimodal Processor import error: {e}") + MULTIMODAL_AVAILABLE = False + +try: + from image_processor import ( + get_image_processor, ImageProcessor, + ImageProcessingResult, ImageEntity, ImageRelation + ) + IMAGE_PROCESSOR_AVAILABLE = True +except ImportError as e: + print(f"Image Processor import error: {e}") + IMAGE_PROCESSOR_AVAILABLE = False + +try: + from multimodal_entity_linker import ( + get_multimodal_entity_linker, MultimodalEntityLinker, + MultimodalEntity, EntityLink, AlignmentResult, 
FusionResult + ) + MULTIMODAL_LINKER_AVAILABLE = True +except ImportError as e: + print(f"Multimodal Entity Linker import error: {e}") + MULTIMODAL_LINKER_AVAILABLE = False + +# Phase 7 Task 7: Plugin Manager +try: + from plugin_manager import ( + get_plugin_manager, PluginManager, Plugin, + BotSession, WebhookEndpoint, WebDAVSync, + PluginType, PluginStatus, ChromeExtensionHandler, BotHandler, + WebhookIntegration + ) + PLUGIN_MANAGER_AVAILABLE = True +except ImportError as e: + print(f"Plugin Manager import error: {e}") + PLUGIN_MANAGER_AVAILABLE = False + +# Phase 7 Task 3: Security Manager +try: + from security_manager import ( + get_security_manager, SecurityManager, + AuditLog, EncryptionConfig, MaskingRule, DataAccessPolicy, AccessRequest, + AuditActionType, MaskingRuleType + ) + SECURITY_MANAGER_AVAILABLE = True +except ImportError as e: + print(f"Security Manager import error: {e}") + SECURITY_MANAGER_AVAILABLE = False + +# FastAPI app with enhanced metadata for Swagger +app = FastAPI( + title="InsightFlow API", + description=""" + InsightFlow 知识管理平台 API + + ## 功能 + + * **项目管理** - 创建、读取、更新、删除项目 + * **实体管理** - 实体提取、对齐、属性管理 + * **关系管理** - 实体关系创建、查询、分析 + * **转录管理** - 音频转录、文档导入 + * **知识推理** - 因果推理、对比分析、时序分析 + * **图分析** - Neo4j 图数据库集成、路径查询 + * **导出功能** - 多种格式导出(PDF、Excel、CSV、JSON) + * **工作流** - 自动化任务、Webhook 通知 + + ## 认证 + + 大部分 API 需要 API Key 认证。在请求头中添加: + ``` + X-API-Key: your_api_key_here + ``` + """, + version="0.7.0", + contact={ + "name": "InsightFlow Team", + "url": "https://github.com/insightflow/insightflow", + }, + license_info={ + "name": "MIT", + "url": "https://opensource.org/licenses/MIT", + }, + openapi_tags=[ + {"name": "Projects", "description": "项目管理"}, + {"name": "Entities", "description": "实体管理"}, + {"name": "Relations", "description": "关系管理"}, + {"name": "Transcripts", "description": "转录管理"}, + {"name": "Analysis", "description": "分析和推理"}, + {"name": "Graph", "description": "图分析和 Neo4j"}, + {"name": "Export", "description": "数据导出"}, + 
{"name": "API Keys", "description": "API 密钥管理"}, + {"name": "Workflows", "description": "工作流自动化"}, + {"name": "Webhooks", "description": "Webhook 配置"}, + {"name": "Multimodal", "description": "多模态支持(视频、图片)"}, + {"name": "Plugins", "description": "插件管理"}, + {"name": "Chrome Extension", "description": "Chrome 扩展集成"}, + {"name": "Bot", "description": "飞书/钉钉机器人"}, + {"name": "Integrations", "description": "Zapier/Make 集成"}, + {"name": "WebDAV", "description": "WebDAV 同步"}, + {"name": "Security", "description": "数据安全与合规(加密、脱敏、审计)"}, + {"name": "System", "description": "系统信息"}, + ] +) app.add_middleware( CORSMiddleware, @@ -96,7 +230,252 @@ app.add_middleware( allow_headers=["*"], ) -# Models +# ==================== Phase 6: API Key Authentication & Rate Limiting ==================== + +# 公开访问的路径(不需要 API Key) +PUBLIC_PATHS = { + "/", "/docs", "/openapi.json", "/redoc", + "/api/v1/health", "/api/v1/status", + "/api/v1/api-keys", # POST 创建 API Key 不需要认证 +} + +# 管理路径(需要 master key) +ADMIN_PATHS = { + "/api/v1/admin/", +} + +# Master Key(用于管理所有 API Keys) +MASTER_KEY = os.getenv("INSIGHTFLOW_MASTER_KEY", "") + + +async def verify_api_key(request: Request, x_api_key: Optional[str] = Header(None, alias="X-API-Key")): + """ + 验证 API Key 的依赖函数 + + - 公开路径不需要认证 + - 管理路径需要 master key + - 其他路径需要有效的 API Key + """ + path = request.url.path + method = request.method + + # 公开路径直接放行(精确匹配;若用 startswith,"/" 会匹配所有路径,认证形同虚设) + if path in PUBLIC_PATHS or path.startswith(("/docs", "/redoc")): + return None + + # 创建 API Key 的端点不需要认证(但需要 master key 或其他验证) + if path == "/api/v1/api-keys" and method == "POST": + return None + + # 检查是否是管理路径 + if any(path.startswith(p) for p in ADMIN_PATHS): + if not x_api_key or x_api_key != MASTER_KEY: + raise HTTPException( + status_code=403, + detail="Admin access required. Provide valid master key in X-API-Key header." 
+ ) + return {"type": "admin", "key": x_api_key} + + # 其他路径需要有效的 API Key + if not API_KEY_AVAILABLE: + # API Key 模块不可用,允许访问(开发模式) + return None + + if not x_api_key: + raise HTTPException( + status_code=401, + detail="API Key required. Provide your key in X-API-Key header.", + headers={"WWW-Authenticate": "ApiKey"} + ) + + # 验证 API Key + key_manager = get_api_key_manager() + api_key = key_manager.validate_key(x_api_key) + + if not api_key: + raise HTTPException( + status_code=401, + detail="Invalid or expired API Key" + ) + + # 更新最后使用时间 + key_manager.update_last_used(api_key.id) + + # 将 API Key 信息存储在请求状态中,供后续使用 + request.state.api_key = api_key + + return {"type": "api_key", "key_id": api_key.id, "permissions": api_key.permissions} + + +async def rate_limit_middleware(request: Request, call_next): + """ + 限流中间件 + """ + if not RATE_LIMITER_AVAILABLE or not API_KEY_AVAILABLE: + response = await call_next(request) + return response + + path = request.url.path + + # 公开路径不限流(精确匹配,避免 "/" 前缀匹配所有路径导致限流失效) + if path in PUBLIC_PATHS or path.startswith(("/docs", "/redoc")): + response = await call_next(request) + return response + + # 获取限流键 + limiter = get_rate_limiter() + + # 检查是否有 API Key + x_api_key = request.headers.get("X-API-Key") + + if x_api_key and x_api_key == MASTER_KEY: + # Master key 有更高的限流 + config = RateLimitConfig(requests_per_minute=1000) + limit_key = f"master:{x_api_key[:16]}" + elif hasattr(request.state, 'api_key') and request.state.api_key: + # 使用 API Key 的限流配置 + api_key = request.state.api_key + config = RateLimitConfig(requests_per_minute=api_key.rate_limit) + limit_key = f"api_key:{api_key.id}" + else: + # IP 限流(未认证用户) + client_ip = request.client.host if request.client else "unknown" + config = RateLimitConfig(requests_per_minute=10) + limit_key = f"ip:{client_ip}" + + # 检查限流 + info = await limiter.is_allowed(limit_key, config) + + if not info.allowed: + return JSONResponse( + status_code=429, + content={ + "error": "Rate limit exceeded", + "retry_after": info.retry_after, + "limit": 
config.requests_per_minute, + "window": "minute" + }, + headers={ + "X-RateLimit-Limit": str(config.requests_per_minute), + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": str(info.reset_time), + "Retry-After": str(info.retry_after) + } + ) + + # 继续处理请求 + start_time = time.time() + response = await call_next(request) + + # 添加限流头 + response.headers["X-RateLimit-Limit"] = str(config.requests_per_minute) + response.headers["X-RateLimit-Remaining"] = str(info.remaining) + response.headers["X-RateLimit-Reset"] = str(info.reset_time) + + # 记录 API 调用日志 + try: + if hasattr(request.state, 'api_key') and request.state.api_key: + api_key = request.state.api_key + response_time = int((time.time() - start_time) * 1000) + key_manager = get_api_key_manager() + key_manager.log_api_call( + api_key_id=api_key.id, + endpoint=path, + method=request.method, + status_code=response.status_code, + response_time_ms=response_time, + ip_address=request.client.host if request.client else "", + user_agent=request.headers.get("User-Agent", "") + ) + except Exception as e: + # 日志记录失败不应影响主流程 + print(f"Failed to log API call: {e}") + + return response + + +# 添加限流中间件 +app.middleware("http")(rate_limit_middleware) + +# ==================== Phase 6: Pydantic Models for API ==================== + +# API Key 相关模型 +class ApiKeyCreate(BaseModel): + name: str = Field(..., description="API Key 名称/描述") + permissions: List[str] = Field(default=["read"], description="权限列表: read, write, delete") + rate_limit: int = Field(default=60, description="每分钟请求限制") + expires_days: Optional[int] = Field(default=None, description="过期天数(可选)") + + +class ApiKeyResponse(BaseModel): + id: str + key_preview: str + name: str + permissions: List[str] + rate_limit: int + status: str + created_at: str + expires_at: Optional[str] + last_used_at: Optional[str] + total_calls: int + + +class ApiKeyCreateResponse(BaseModel): + api_key: str = Field(..., description="API Key(仅显示一次,请妥善保存)") + info: ApiKeyResponse + + +class 
ApiKeyListResponse(BaseModel): + keys: List[ApiKeyResponse] + total: int + + +class ApiKeyUpdate(BaseModel): + name: Optional[str] = None + permissions: Optional[List[str]] = None + rate_limit: Optional[int] = None + + +class ApiCallStats(BaseModel): + total_calls: int + success_calls: int + error_calls: int + avg_response_time_ms: float + max_response_time_ms: int + min_response_time_ms: int + + +class ApiStatsResponse(BaseModel): + summary: ApiCallStats + endpoints: List[Dict] + daily: List[Dict] + + +class ApiCallLog(BaseModel): + id: int + endpoint: str + method: str + status_code: int + response_time_ms: int + ip_address: str + user_agent: str + error_message: str + created_at: str + + +class ApiLogsResponse(BaseModel): + logs: List[ApiCallLog] + total: int + + +class RateLimitStatus(BaseModel): + limit: int + remaining: int + reset_time: int + window: str + + +# 原有模型(保留) class EntityModel(BaseModel): id: str name: str @@ -152,40 +531,172 @@ class GlossaryTermCreate(BaseModel): term: str pronunciation: Optional[str] = "" -# Phase 7: 协作与共享 - 请求模型 -class ShareLinkCreate(BaseModel): - permission: str = "read_only" # read_only, comment, edit, admin - expires_in_days: Optional[int] = None - max_uses: Optional[int] = None - password: Optional[str] = None - allow_download: bool = False - allow_export: bool = False -class ShareLinkVerify(BaseModel): - token: str - password: Optional[str] = None +# ==================== Phase 7: Workflow Pydantic Models ==================== -class CommentCreate(BaseModel): - target_type: str # entity, relation, transcript, project - target_id: str - parent_id: Optional[str] = None - content: str - mentions: Optional[List[str]] = None +class WorkflowCreate(BaseModel): + name: str = Field(..., description="工作流名称") + description: str = Field(default="", description="工作流描述") + workflow_type: str = Field(..., description="工作流类型: auto_analyze, auto_align, auto_relation, scheduled_report, custom") + project_id: str = Field(..., 
description="所属项目ID") + schedule: Optional[str] = Field(default=None, description="调度表达式(cron或分钟数)") + schedule_type: str = Field(default="manual", description="调度类型: manual, cron, interval") + config: Dict = Field(default_factory=dict, description="工作流配置") + webhook_ids: List[str] = Field(default_factory=list, description="关联的Webhook ID列表") -class CommentUpdate(BaseModel): - content: str -class CommentResolve(BaseModel): - resolved: bool +class WorkflowUpdate(BaseModel): + name: Optional[str] = None + description: Optional[str] = None + status: Optional[str] = None # active, paused, error, completed + schedule: Optional[str] = None + schedule_type: Optional[str] = None + is_active: Optional[bool] = None + config: Optional[Dict] = None + webhook_ids: Optional[List[str]] = None -class TeamMemberInvite(BaseModel): - user_id: str - user_name: str - user_email: str - role: str = "viewer" # owner, admin, editor, viewer, commenter -class TeamMemberRoleUpdate(BaseModel): - role: str +class WorkflowResponse(BaseModel): + id: str + name: str + description: str + workflow_type: str + project_id: str + status: str + schedule: Optional[str] + schedule_type: str + config: Dict + webhook_ids: List[str] + is_active: bool + created_at: str + updated_at: str + last_run_at: Optional[str] + next_run_at: Optional[str] + run_count: int + success_count: int + fail_count: int + + +class WorkflowListResponse(BaseModel): + workflows: List[WorkflowResponse] + total: int + + +class WorkflowTaskCreate(BaseModel): + name: str = Field(..., description="任务名称") + task_type: str = Field(..., description="任务类型: analyze, align, discover_relations, notify, custom") + config: Dict = Field(default_factory=dict, description="任务配置") + order: int = Field(default=0, description="执行顺序") + depends_on: List[str] = Field(default_factory=list, description="依赖的任务ID列表") + timeout_seconds: int = Field(default=300, description="超时时间(秒)") + retry_count: int = Field(default=3, description="重试次数") + retry_delay: int = 
Field(default=5, description="重试延迟(秒)") + + +class WorkflowTaskUpdate(BaseModel): + name: Optional[str] = None + task_type: Optional[str] = None + config: Optional[Dict] = None + order: Optional[int] = None + depends_on: Optional[List[str]] = None + timeout_seconds: Optional[int] = None + retry_count: Optional[int] = None + retry_delay: Optional[int] = None + + +class WorkflowTaskResponse(BaseModel): + id: str + workflow_id: str + name: str + task_type: str + config: Dict + order: int + depends_on: List[str] + timeout_seconds: int + retry_count: int + retry_delay: int + created_at: str + updated_at: str + + +class WebhookCreate(BaseModel): + name: str = Field(..., description="Webhook名称") + webhook_type: str = Field(..., description="Webhook类型: feishu, dingtalk, slack, custom") + url: str = Field(..., description="Webhook URL") + secret: str = Field(default="", description="签名密钥") + headers: Dict = Field(default_factory=dict, description="自定义请求头") + template: str = Field(default="", description="消息模板") + + +class WebhookUpdate(BaseModel): + name: Optional[str] = None + webhook_type: Optional[str] = None + url: Optional[str] = None + secret: Optional[str] = None + headers: Optional[Dict] = None + template: Optional[str] = None + is_active: Optional[bool] = None + + +class WebhookResponse(BaseModel): + id: str + name: str + webhook_type: str + url: str + headers: Dict + template: str + is_active: bool + created_at: str + updated_at: str + last_used_at: Optional[str] + success_count: int + fail_count: int + + +class WebhookListResponse(BaseModel): + webhooks: List[WebhookResponse] + total: int + + +class WorkflowLogResponse(BaseModel): + id: str + workflow_id: str + task_id: Optional[str] + status: str + start_time: Optional[str] + end_time: Optional[str] + duration_ms: int + input_data: Dict + output_data: Dict + error_message: str + created_at: str + + +class WorkflowLogListResponse(BaseModel): + logs: List[WorkflowLogResponse] + total: int + + +class 
WorkflowTriggerRequest(BaseModel): + input_data: Dict = Field(default_factory=dict, description="工作流输入数据") + + +class WorkflowTriggerResponse(BaseModel): + success: bool + workflow_id: str + log_id: str + results: Dict + duration_ms: int + + +class WorkflowStatsResponse(BaseModel): + total: int + success: int + failed: int + success_rate: float + avg_duration_ms: float + daily: List[Dict] + # API Keys KIMI_API_KEY = os.getenv("KIMI_API_KEY", "") @@ -207,18 +718,9 @@ def get_doc_processor(): _doc_processor = DocumentProcessor() return _doc_processor -# Phase 7: Collaboration Manager singleton -_collaboration_manager = None -def get_collab_manager(): - global _collaboration_manager - if _collaboration_manager is None and COLLABORATION_AVAILABLE: - db = get_db_manager() if DB_AVAILABLE else None - _collaboration_manager = get_collaboration_manager(db) - return _collaboration_manager - # Phase 2: Entity Edit API -@app.put("/api/v1/entities/{entity_id}") -async def update_entity(entity_id: str, update: EntityUpdate): +@app.put("/api/v1/entities/{entity_id}", tags=["Entities"]) +async def update_entity(entity_id: str, update: EntityUpdate, _=Depends(verify_api_key)): """更新实体信息(名称、类型、定义、别名)""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -240,8 +742,8 @@ async def update_entity(entity_id: str, update: EntityUpdate): "aliases": updated.aliases } -@app.delete("/api/v1/entities/{entity_id}") -async def delete_entity(entity_id: str): +@app.delete("/api/v1/entities/{entity_id}", tags=["Entities"]) +async def delete_entity(entity_id: str, _=Depends(verify_api_key)): """删除实体""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -254,8 +756,8 @@ async def delete_entity(entity_id: str): db.delete_entity(entity_id) return {"success": True, "message": f"Entity {entity_id} deleted"} -@app.post("/api/v1/entities/{entity_id}/merge") -async def merge_entities_endpoint(entity_id: str, merge_req: 
EntityMergeRequest): +@app.post("/api/v1/entities/{entity_id}/merge", tags=["Entities"]) +async def merge_entities_endpoint(entity_id: str, merge_req: EntityMergeRequest, _=Depends(verify_api_key)): """合并两个实体""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -282,8 +784,8 @@ async def merge_entities_endpoint(entity_id: str, merge_req: EntityMergeRequest) } # Phase 2: Relation Edit API -@app.post("/api/v1/projects/{project_id}/relations") -async def create_relation_endpoint(project_id: str, relation: RelationCreate): +@app.post("/api/v1/projects/{project_id}/relations", tags=["Relations"]) +async def create_relation_endpoint(project_id: str, relation: RelationCreate, _=Depends(verify_api_key)): """创建新的实体关系""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -313,8 +815,8 @@ async def create_relation_endpoint(project_id: str, relation: RelationCreate): "success": True } -@app.delete("/api/v1/relations/{relation_id}") -async def delete_relation(relation_id: str): +@app.delete("/api/v1/relations/{relation_id}", tags=["Relations"]) +async def delete_relation(relation_id: str, _=Depends(verify_api_key)): """删除关系""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -323,8 +825,8 @@ async def delete_relation(relation_id: str): db.delete_relation(relation_id) return {"success": True, "message": f"Relation {relation_id} deleted"} -@app.put("/api/v1/relations/{relation_id}") -async def update_relation(relation_id: str, relation: RelationCreate): +@app.put("/api/v1/relations/{relation_id}", tags=["Relations"]) +async def update_relation(relation_id: str, relation: RelationCreate, _=Depends(verify_api_key)): """更新关系""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -344,8 +846,8 @@ async def update_relation(relation_id: str, relation: RelationCreate): } # Phase 2: Transcript Edit API 
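The hunks above thread `_=Depends(verify_api_key)` through every entity and relation endpoint, but the dependency's internals are outside this diff. A minimal sketch of the key handling such a dependency typically wraps, assuming keys are stored hashed (function names and the SHA-256 scheme here are illustrative, not the project's `api_key_manager` API):

```python
import hashlib
import hmac
import secrets

# Hypothetical sketch of the key check behind verify_api_key. The real
# logic lives in the api_key_manager module (not shown in this diff);
# the hashing scheme and function names are assumptions.

def generate_key(prefix: str = "if_") -> tuple:
    """Return (raw_key, stored_hash); the raw key is shown to the caller once."""
    raw = prefix + secrets.token_urlsafe(32)
    return raw, hashlib.sha256(raw.encode()).hexdigest()

def check_key(raw_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    candidate = hashlib.sha256(raw_key.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

Storing only the hash means a leaked database does not expose usable keys, which matches the `key_preview` field the response models expose instead of the raw key.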
-@app.get("/api/v1/transcripts/{transcript_id}") -async def get_transcript(transcript_id: str): +@app.get("/api/v1/transcripts/{transcript_id}", tags=["Transcripts"]) +async def get_transcript(transcript_id: str, _=Depends(verify_api_key)): """获取转录详情""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -358,8 +860,8 @@ async def get_transcript(transcript_id: str): return transcript -@app.put("/api/v1/transcripts/{transcript_id}") -async def update_transcript(transcript_id: str, update: TranscriptUpdate): +@app.put("/api/v1/transcripts/{transcript_id}", tags=["Transcripts"]) +async def update_transcript(transcript_id: str, update: TranscriptUpdate, _=Depends(verify_api_key)): """更新转录文本(人工修正)""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -387,8 +889,8 @@ class ManualEntityCreate(BaseModel): start_pos: Optional[int] = None end_pos: Optional[int] = None -@app.post("/api/v1/projects/{project_id}/entities") -async def create_manual_entity(project_id: str, entity: ManualEntityCreate): +@app.post("/api/v1/projects/{project_id}/entities", tags=["Entities"]) +async def create_manual_entity(project_id: str, entity: ManualEntityCreate, _=Depends(verify_api_key)): """手动创建实体(划词新建)""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -551,8 +1053,8 @@ def align_entity(project_id: str, name: str, db, definition: str = "") -> Option # API Endpoints -@app.post("/api/v1/projects", response_model=dict) -async def create_project(project: ProjectCreate): +@app.post("/api/v1/projects", response_model=dict, tags=["Projects"]) +async def create_project(project: ProjectCreate, _=Depends(verify_api_key)): """创建新项目""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -562,8 +1064,8 @@ async def create_project(project: ProjectCreate): p = db.create_project(project_id, project.name, project.description) return 
{"id": p.id, "name": p.name, "description": p.description} -@app.get("/api/v1/projects") -async def list_projects(): +@app.get("/api/v1/projects", tags=["Projects"]) +async def list_projects(_=Depends(verify_api_key)): """列出所有项目""" if not DB_AVAILABLE: return [] @@ -572,8 +1074,8 @@ async def list_projects(): projects = db.list_projects() return [{"id": p.id, "name": p.name, "description": p.description} for p in projects] -@app.post("/api/v1/projects/{project_id}/upload", response_model=AnalysisResult) -async def upload_audio(project_id: str, file: UploadFile = File(...)): +@app.post("/api/v1/projects/{project_id}/upload", response_model=AnalysisResult, tags=["Projects"]) +async def upload_audio(project_id: str, file: UploadFile = File(...), _=Depends(verify_api_key)): """上传音频到指定项目 - Phase 3: 支持多文件融合""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -684,7 +1186,7 @@ async def upload_audio(project_id: str, file: UploadFile = File(...)): # Phase 3: Document Upload API @app.post("/api/v1/projects/{project_id}/upload-document") -async def upload_document(project_id: str, file: UploadFile = File(...)): +async def upload_document(project_id: str, file: UploadFile = File(...), _=Depends(verify_api_key)): """上传 PDF/DOCX 文档到指定项目""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -796,7 +1298,7 @@ async def upload_document(project_id: str, file: UploadFile = File(...)): # Phase 3: Knowledge Base API @app.get("/api/v1/projects/{project_id}/knowledge-base") -async def get_knowledge_base(project_id: str): +async def get_knowledge_base(project_id: str, _=Depends(verify_api_key)): """获取项目知识库 - 包含所有实体、关系、术语表""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -893,7 +1395,7 @@ async def get_knowledge_base(project_id: str): # Phase 3: Glossary API @app.post("/api/v1/projects/{project_id}/glossary") -async def add_glossary_term(project_id: str, 
term: GlossaryTermCreate): +async def add_glossary_term(project_id: str, term: GlossaryTermCreate, _=Depends(verify_api_key)): """添加术语到项目术语表""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -917,7 +1419,7 @@ async def add_glossary_term(project_id: str, term: GlossaryTermCreate): } @app.get("/api/v1/projects/{project_id}/glossary") -async def get_glossary(project_id: str): +async def get_glossary(project_id: str, _=Depends(verify_api_key)): """获取项目术语表""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -927,7 +1429,7 @@ async def get_glossary(project_id: str): return glossary @app.delete("/api/v1/glossary/{term_id}") -async def delete_glossary_term(term_id: str): +async def delete_glossary_term(term_id: str, _=Depends(verify_api_key)): """删除术语""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -938,7 +1440,7 @@ async def delete_glossary_term(term_id: str): # Phase 3: Entity Alignment API @app.post("/api/v1/projects/{project_id}/align-entities") -async def align_project_entities(project_id: str, threshold: float = 0.85): +async def align_project_entities(project_id: str, threshold: float = 0.85, _=Depends(verify_api_key)): """运行实体对齐算法,合并相似实体""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -984,7 +1486,7 @@ async def align_project_entities(project_id: str, threshold: float = 0.85): } @app.get("/api/v1/projects/{project_id}/entities") -async def get_project_entities(project_id: str): +async def get_project_entities(project_id: str, _=Depends(verify_api_key)): """获取项目的全局实体列表""" if not DB_AVAILABLE: return [] @@ -995,7 +1497,7 @@ async def get_project_entities(project_id: str): @app.get("/api/v1/projects/{project_id}/relations") -async def get_project_relations(project_id: str): +async def get_project_relations(project_id: str, _=Depends(verify_api_key)): """获取项目的实体关系列表""" if not 
DB_AVAILABLE: return [] @@ -1019,7 +1521,7 @@ async def get_project_relations(project_id: str): @app.get("/api/v1/projects/{project_id}/transcripts") -async def get_project_transcripts(project_id: str): +async def get_project_transcripts(project_id: str, _=Depends(verify_api_key)): """获取项目的转录列表""" if not DB_AVAILABLE: return [] @@ -1036,7 +1538,7 @@ async def get_project_transcripts(project_id: str): @app.get("/api/v1/entities/{entity_id}/mentions") -async def get_entity_mentions(entity_id: str): +async def get_entity_mentions(entity_id: str, _=Depends(verify_api_key)): """获取实体的所有提及位置""" if not DB_AVAILABLE: return [] @@ -1057,22 +1559,26 @@ async def get_entity_mentions(entity_id: str): async def health_check(): return { "status": "ok", - "version": "0.6.0", - "phase": "Phase 5 - Knowledge Reasoning", + "version": "0.7.0", + "phase": "Phase 7 - Plugin & Integration", "oss_available": OSS_AVAILABLE, "tingwu_available": TINGWU_AVAILABLE, "db_available": DB_AVAILABLE, "doc_processor_available": DOC_PROCESSOR_AVAILABLE, "aligner_available": ALIGNER_AVAILABLE, "llm_client_available": LLM_CLIENT_AVAILABLE, - "reasoner_available": REASONER_AVAILABLE + "reasoner_available": REASONER_AVAILABLE, + "multimodal_available": MULTIMODAL_AVAILABLE, + "image_processor_available": IMAGE_PROCESSOR_AVAILABLE, + "multimodal_linker_available": MULTIMODAL_LINKER_AVAILABLE, + "plugin_manager_available": PLUGIN_MANAGER_AVAILABLE } # ==================== Phase 4: Agent 助手 API ==================== @app.post("/api/v1/projects/{project_id}/agent/query") -async def agent_query(project_id: str, query: AgentQuery): +async def agent_query(project_id: str, query: AgentQuery, _=Depends(verify_api_key)): """Agent RAG 问答""" if not DB_AVAILABLE or not LLM_CLIENT_AVAILABLE: raise HTTPException(status_code=500, detail="Service not available") @@ -1126,7 +1632,7 @@ async def agent_query(project_id: str, query: AgentQuery): @app.post("/api/v1/projects/{project_id}/agent/command") -async def 
agent_command(project_id: str, command: AgentCommand): +async def agent_command(project_id: str, command: AgentCommand, _=Depends(verify_api_key)): """Agent 指令执行 - 解析并执行自然语言指令""" if not DB_AVAILABLE or not LLM_CLIENT_AVAILABLE: raise HTTPException(status_code=500, detail="Service not available") @@ -1217,7 +1723,7 @@ async def agent_command(project_id: str, command: AgentCommand): @app.get("/api/v1/projects/{project_id}/agent/suggest") -async def agent_suggest(project_id: str): +async def agent_suggest(project_id: str, _=Depends(verify_api_key)): """获取 Agent 建议 - 基于项目数据提供洞察""" if not DB_AVAILABLE or not LLM_CLIENT_AVAILABLE: raise HTTPException(status_code=500, detail="Service not available") @@ -1257,7 +1763,7 @@ async def agent_suggest(project_id: str): # ==================== Phase 4: 知识溯源 API ==================== @app.get("/api/v1/relations/{relation_id}/provenance") -async def get_relation_provenance(relation_id: str): +async def get_relation_provenance(relation_id: str, _=Depends(verify_api_key)): """获取关系的知识溯源信息""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1282,7 +1788,7 @@ async def get_relation_provenance(relation_id: str): @app.get("/api/v1/entities/{entity_id}/details") -async def get_entity_details(entity_id: str): +async def get_entity_details(entity_id: str, _=Depends(verify_api_key)): """获取实体详情,包含所有提及位置""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1297,7 +1803,7 @@ async def get_entity_details(entity_id: str): @app.get("/api/v1/entities/{entity_id}/evolution") -async def get_entity_evolution(entity_id: str): +async def get_entity_evolution(entity_id: str, _=Depends(verify_api_key)): """分析实体的演变和态度变化""" if not DB_AVAILABLE or not LLM_CLIENT_AVAILABLE: raise HTTPException(status_code=500, detail="Service not available") @@ -1332,7 +1838,7 @@ async def get_entity_evolution(entity_id: str): # ==================== Phase 4: 实体管理增强 API ==================== 
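For callers, the practical effect of these `_=Depends(verify_api_key)` additions is that every request must now carry an API key. A sketch of building such a request, assuming an `X-API-Key` header and a local deployment (both assumptions — check the `verify_api_key` implementation for the header it actually reads):

```python
import urllib.request

# Client-side sketch: every endpoint in this diff now requires an API key.
# The X-API-Key header name and the localhost base URL are assumptions.

BASE_URL = "http://localhost:8000"

def authed_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request carrying the key; send it with urllib.request.urlopen."""
    return urllib.request.Request(BASE_URL + path, headers={"X-API-Key": api_key})

req = authed_request("/api/v1/projects", "if_example_key")
```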
@app.get("/api/v1/projects/{project_id}/entities/search") -async def search_entities(project_id: str, q: str): +async def search_entities(project_id: str, q: str, _=Depends(verify_api_key)): """搜索实体""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1349,7 +1855,8 @@ async def get_project_timeline( project_id: str, entity_id: str = None, start_date: str = None, - end_date: str = None + end_date: str = None, + _=Depends(verify_api_key) ): """获取项目时间线 - 按时间顺序的实体提及和关系事件""" if not DB_AVAILABLE: @@ -1370,7 +1877,7 @@ async def get_project_timeline( @app.get("/api/v1/projects/{project_id}/timeline/summary") -async def get_timeline_summary(project_id: str): +async def get_timeline_summary(project_id: str, _=Depends(verify_api_key)): """获取项目时间线摘要统计""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1390,7 +1897,7 @@ async def get_timeline_summary(project_id: str): @app.get("/api/v1/entities/{entity_id}/timeline") -async def get_entity_timeline(entity_id: str): +async def get_entity_timeline(entity_id: str, _=Depends(verify_api_key)): """获取单个实体的时间线""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1420,7 +1927,7 @@ class ReasoningQuery(BaseModel): @app.post("/api/v1/projects/{project_id}/reasoning/query") -async def reasoning_query(project_id: str, query: ReasoningQuery): +async def reasoning_query(project_id: str, query: ReasoningQuery, _=Depends(verify_api_key)): """ 增强问答 - 基于知识推理的智能问答 @@ -1474,7 +1981,8 @@ async def reasoning_query(project_id: str, query: ReasoningQuery): async def find_inference_path( project_id: str, start_entity: str, - end_entity: str + end_entity: str, + _=Depends(verify_api_key) ): """ 发现两个实体之间的推理路径 @@ -1523,7 +2031,7 @@ class SummaryRequest(BaseModel): @app.post("/api/v1/projects/{project_id}/reasoning/summary") -async def project_summary(project_id: str, req: SummaryRequest): +async def 
project_summary(project_id: str, req: SummaryRequest, _=Depends(verify_api_key)): """ 项目智能总结 @@ -1608,7 +2116,7 @@ class EntityAttributeBatchSet(BaseModel): # 属性模板管理 API @app.post("/api/v1/projects/{project_id}/attribute-templates") -async def create_attribute_template_endpoint(project_id: str, template: AttributeTemplateCreate): +async def create_attribute_template_endpoint(project_id: str, template: AttributeTemplateCreate, _=Depends(verify_api_key)): """创建属性模板""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1643,7 +2151,7 @@ async def create_attribute_template_endpoint(project_id: str, template: Attribut @app.get("/api/v1/projects/{project_id}/attribute-templates") -async def list_attribute_templates_endpoint(project_id: str): +async def list_attribute_templates_endpoint(project_id: str, _=Depends(verify_api_key)): """列出项目的所有属性模板""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1667,7 +2175,7 @@ async def list_attribute_templates_endpoint(project_id: str): @app.get("/api/v1/attribute-templates/{template_id}") -async def get_attribute_template_endpoint(template_id: str): +async def get_attribute_template_endpoint(template_id: str, _=Depends(verify_api_key)): """获取属性模板详情""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1691,7 +2199,7 @@ async def get_attribute_template_endpoint(template_id: str): @app.put("/api/v1/attribute-templates/{template_id}") -async def update_attribute_template_endpoint(template_id: str, update: AttributeTemplateUpdate): +async def update_attribute_template_endpoint(template_id: str, update: AttributeTemplateUpdate, _=Depends(verify_api_key)): """更新属性模板""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1713,7 +2221,7 @@ async def update_attribute_template_endpoint(template_id: str, update: Attribute 
@app.delete("/api/v1/attribute-templates/{template_id}") -async def delete_attribute_template_endpoint(template_id: str): +async def delete_attribute_template_endpoint(template_id: str, _=Depends(verify_api_key)): """删除属性模板""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1726,7 +2234,7 @@ async def delete_attribute_template_endpoint(template_id: str): # 实体属性值管理 API @app.post("/api/v1/entities/{entity_id}/attributes") -async def set_entity_attribute_endpoint(entity_id: str, attr: EntityAttributeSet): +async def set_entity_attribute_endpoint(entity_id: str, attr: EntityAttributeSet, _=Depends(verify_api_key)): """设置实体属性值""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1813,7 +2321,7 @@ async def set_entity_attribute_endpoint(entity_id: str, attr: EntityAttributeSet @app.post("/api/v1/entities/{entity_id}/attributes/batch") -async def batch_set_entity_attributes_endpoint(entity_id: str, batch: EntityAttributeBatchSet): +async def batch_set_entity_attributes_endpoint(entity_id: str, batch: EntityAttributeBatchSet, _=Depends(verify_api_key)): """批量设置实体属性值""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1852,7 +2360,7 @@ async def batch_set_entity_attributes_endpoint(entity_id: str, batch: EntityAttr @app.get("/api/v1/entities/{entity_id}/attributes") -async def get_entity_attributes_endpoint(entity_id: str): +async def get_entity_attributes_endpoint(entity_id: str, _=Depends(verify_api_key)): """获取实体的所有属性值""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1878,7 +2386,7 @@ async def get_entity_attributes_endpoint(entity_id: str): @app.delete("/api/v1/entities/{entity_id}/attributes/{template_id}") async def delete_entity_attribute_endpoint(entity_id: str, template_id: str, - reason: Optional[str] = ""): + reason: Optional[str] = "", _=Depends(verify_api_key)): """删除实体属性值""" 
if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1892,7 +2400,7 @@ async def delete_entity_attribute_endpoint(entity_id: str, template_id: str, # 属性历史 API @app.get("/api/v1/entities/{entity_id}/attributes/history") -async def get_entity_attribute_history_endpoint(entity_id: str, limit: int = 50): +async def get_entity_attribute_history_endpoint(entity_id: str, limit: int = 50, _=Depends(verify_api_key)): """获取实体的属性变更历史""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1915,7 +2423,7 @@ async def get_entity_attribute_history_endpoint(entity_id: str, limit: int = 50) @app.get("/api/v1/attribute-templates/{template_id}/history") -async def get_template_history_endpoint(template_id: str, limit: int = 50): +async def get_template_history_endpoint(template_id: str, limit: int = 50, _=Depends(verify_api_key)): """获取属性模板的所有变更历史(跨实体)""" if not DB_AVAILABLE: raise HTTPException(status_code=500, detail="Database not available") @@ -1942,7 +2450,8 @@ async def get_template_history_endpoint(template_id: str, limit: int = 50): @app.get("/api/v1/projects/{project_id}/entities/search-by-attributes") async def search_entities_by_attributes_endpoint( project_id: str, - attribute_filter: Optional[str] = None # JSON 格式: {"职位": "经理", "部门": "技术部"} + attribute_filter: Optional[str] = None, # JSON 格式: {"职位": "经理", "部门": "技术部"} + _=Depends(verify_api_key) ): """根据属性筛选搜索实体""" if not DB_AVAILABLE: @@ -1979,7 +2488,7 @@ async def search_entities_by_attributes_endpoint( from fastapi.responses import StreamingResponse, FileResponse @app.get("/api/v1/projects/{project_id}/export/graph-svg") -async def export_graph_svg_endpoint(project_id: str): +async def export_graph_svg_endpoint(project_id: str, _=Depends(verify_api_key)): """导出知识图谱为 SVG""" if not DB_AVAILABLE or not EXPORT_AVAILABLE: raise HTTPException(status_code=500, detail="Export functionality not available") @@ -2029,7 +2538,7 @@ async def 
export_graph_svg_endpoint(project_id: str): @app.get("/api/v1/projects/{project_id}/export/graph-png") -async def export_graph_png_endpoint(project_id: str): +async def export_graph_png_endpoint(project_id: str, _=Depends(verify_api_key)): """导出知识图谱为 PNG""" if not DB_AVAILABLE or not EXPORT_AVAILABLE: raise HTTPException(status_code=500, detail="Export functionality not available") @@ -2079,7 +2588,7 @@ async def export_graph_png_endpoint(project_id: str): @app.get("/api/v1/projects/{project_id}/export/entities-excel") -async def export_entities_excel_endpoint(project_id: str): +async def export_entities_excel_endpoint(project_id: str, _=Depends(verify_api_key)): """导出实体数据为 Excel""" if not DB_AVAILABLE or not EXPORT_AVAILABLE: raise HTTPException(status_code=500, detail="Export functionality not available") @@ -2116,7 +2625,7 @@ async def export_entities_excel_endpoint(project_id: str): @app.get("/api/v1/projects/{project_id}/export/entities-csv") -async def export_entities_csv_endpoint(project_id: str): +async def export_entities_csv_endpoint(project_id: str, _=Depends(verify_api_key)): """导出实体数据为 CSV""" if not DB_AVAILABLE or not EXPORT_AVAILABLE: raise HTTPException(status_code=500, detail="Export functionality not available") @@ -2153,7 +2662,7 @@ async def export_entities_csv_endpoint(project_id: str): @app.get("/api/v1/projects/{project_id}/export/relations-csv") -async def export_relations_csv_endpoint(project_id: str): +async def export_relations_csv_endpoint(project_id: str, _=Depends(verify_api_key)): """导出关系数据为 CSV""" if not DB_AVAILABLE or not EXPORT_AVAILABLE: raise HTTPException(status_code=500, detail="Export functionality not available") @@ -2188,7 +2697,7 @@ async def export_relations_csv_endpoint(project_id: str): @app.get("/api/v1/projects/{project_id}/export/report-pdf") -async def export_report_pdf_endpoint(project_id: str): +async def export_report_pdf_endpoint(project_id: str, _=Depends(verify_api_key)): """导出项目报告为 PDF""" if not DB_AVAILABLE 
or not EXPORT_AVAILABLE: raise HTTPException(status_code=500, detail="Export functionality not available") @@ -2263,7 +2772,7 @@ async def export_report_pdf_endpoint(project_id: str): @app.get("/api/v1/projects/{project_id}/export/project-json") -async def export_project_json_endpoint(project_id: str): +async def export_project_json_endpoint(project_id: str, _=Depends(verify_api_key)): """导出完整项目数据为 JSON""" if not DB_AVAILABLE or not EXPORT_AVAILABLE: raise HTTPException(status_code=500, detail="Export functionality not available") @@ -2328,7 +2837,7 @@ async def export_project_json_endpoint(project_id: str): @app.get("/api/v1/transcripts/{transcript_id}/export/markdown") -async def export_transcript_markdown_endpoint(transcript_id: str): +async def export_transcript_markdown_endpoint(transcript_id: str, _=Depends(verify_api_key)): """导出转录文本为 Markdown""" if not DB_AVAILABLE or not EXPORT_AVAILABLE: raise HTTPException(status_code=500, detail="Export functionality not available") @@ -2394,7 +2903,7 @@ class GraphQueryRequest(BaseModel): depth: int = 1 @app.get("/api/v1/neo4j/status") -async def neo4j_status(): +async def neo4j_status(_=Depends(verify_api_key)): """获取 Neo4j 连接状态""" if not NEO4J_AVAILABLE: return { @@ -2420,7 +2929,7 @@ async def neo4j_status(): } @app.post("/api/v1/neo4j/sync") -async def neo4j_sync_project(request: Neo4jSyncRequest): +async def neo4j_sync_project(request: Neo4jSyncRequest, _=Depends(verify_api_key)): """同步项目数据到 Neo4j""" if not NEO4J_AVAILABLE: raise HTTPException(status_code=503, detail="Neo4j not available") @@ -2480,7 +2989,7 @@ async def neo4j_sync_project(request: Neo4jSyncRequest): } @app.get("/api/v1/projects/{project_id}/graph/stats") -async def get_graph_stats(project_id: str): +async def get_graph_stats(project_id: str, _=Depends(verify_api_key)): """获取项目图统计信息""" if not NEO4J_AVAILABLE: raise HTTPException(status_code=503, detail="Neo4j not available") @@ -2493,7 +3002,7 @@ async def get_graph_stats(project_id: str): return 
stats @app.post("/api/v1/graph/shortest-path") -async def find_shortest_path(request: PathQueryRequest): +async def find_shortest_path(request: PathQueryRequest, _=Depends(verify_api_key)): """查找两个实体之间的最短路径""" if not NEO4J_AVAILABLE: raise HTTPException(status_code=503, detail="Neo4j not available") @@ -2524,7 +3033,7 @@ async def find_shortest_path(request: PathQueryRequest): } @app.post("/api/v1/graph/paths") -async def find_all_paths(request: PathQueryRequest): +async def find_all_paths(request: PathQueryRequest, _=Depends(verify_api_key)): """查找两个实体之间的所有路径""" if not NEO4J_AVAILABLE: raise HTTPException(status_code=503, detail="Neo4j not available") @@ -2555,7 +3064,8 @@ async def find_all_paths(request: PathQueryRequest): async def get_entity_neighbors( entity_id: str, relation_type: str = None, - limit: int = 50 + limit: int = 50, + _=Depends(verify_api_key) ): """获取实体的邻居节点""" if not NEO4J_AVAILABLE: @@ -2573,7 +3083,7 @@ async def get_entity_neighbors( } @app.get("/api/v1/entities/{entity_id1}/common-neighbors/{entity_id2}") -async def get_common_neighbors(entity_id1: str, entity_id2: str): +async def get_common_neighbors(entity_id1: str, entity_id2: str, _=Depends(verify_api_key)): """获取两个实体的共同邻居""" if not NEO4J_AVAILABLE: raise HTTPException(status_code=503, detail="Neo4j not available") @@ -2593,7 +3103,8 @@ async def get_common_neighbors(entity_id1: str, entity_id2: str): @app.get("/api/v1/projects/{project_id}/graph/centrality") async def get_centrality_analysis( project_id: str, - metric: str = "degree" + metric: str = "degree", + _=Depends(verify_api_key) ): """获取中心性分析结果""" if not NEO4J_AVAILABLE: @@ -2619,7 +3130,7 @@ async def get_centrality_analysis( } @app.get("/api/v1/projects/{project_id}/graph/communities") -async def get_communities(project_id: str): +async def get_communities(project_id: str, _=Depends(verify_api_key)): """获取社区发现结果""" if not NEO4J_AVAILABLE: raise HTTPException(status_code=503, detail="Neo4j not available") @@ -2643,7 +3154,7 
@@ async def get_communities(project_id: str): } @app.post("/api/v1/graph/subgraph") -async def get_subgraph(request: GraphQueryRequest): +async def get_subgraph(request: GraphQueryRequest, _=Depends(verify_api_key)): """获取子图""" if not NEO4J_AVAILABLE: raise HTTPException(status_code=503, detail="Neo4j not available") @@ -2656,475 +3167,3850 @@ async def get_subgraph(request: GraphQueryRequest): return subgraph -# ========================================== -# Phase 7: 协作与共享 API -# ========================================== +# ==================== Phase 6: API Key Management Endpoints ==================== -# ----- 项目分享 ----- - -@app.post("/api/v1/projects/{project_id}/shares") -async def create_share_link(project_id: str, request: ShareLinkCreate, created_by: str = "current_user"): - """创建项目分享链接""" - if not COLLABORATION_AVAILABLE: - raise HTTPException(status_code=503, detail="Collaboration module not available") +@app.post("/api/v1/api-keys", response_model=ApiKeyCreateResponse, tags=["API Keys"]) +async def create_api_key(request: ApiKeyCreate, _=Depends(verify_api_key)): + """ + 创建新的 API Key - manager = get_collab_manager() - share = manager.create_share_link( - project_id=project_id, - created_by=created_by, - permission=request.permission, - expires_in_days=request.expires_in_days, - max_uses=request.max_uses, - password=request.password, - allow_download=request.allow_download, - allow_export=request.allow_export + - **name**: API Key 的名称/描述 + - **permissions**: 权限列表,可选值: read, write, delete + - **rate_limit**: 每分钟请求限制,默认 60 + - **expires_days**: 过期天数(可选,不设置则永不过期) + """ + if not API_KEY_AVAILABLE: + raise HTTPException(status_code=503, detail="API Key management not available") + + key_manager = get_api_key_manager() + raw_key, api_key = key_manager.create_key( + name=request.name, + permissions=request.permissions, + rate_limit=request.rate_limit, + expires_days=request.expires_days ) + return ApiKeyCreateResponse( + api_key=raw_key, + info=ApiKeyResponse( + 
id=api_key.id, + key_preview=api_key.key_preview, + name=api_key.name, + permissions=api_key.permissions, + rate_limit=api_key.rate_limit, + status=api_key.status, + created_at=api_key.created_at, + expires_at=api_key.expires_at, + last_used_at=api_key.last_used_at, + total_calls=api_key.total_calls + ) + ) + + +@app.get("/api/v1/api-keys", response_model=ApiKeyListResponse, tags=["API Keys"]) +async def list_api_keys( + status: Optional[str] = None, + limit: int = 100, + offset: int = 0, + _=Depends(verify_api_key) +): + """ + 列出所有 API Keys + + - **status**: 按状态筛选 (active, revoked, expired) + - **limit**: 返回数量限制 + - **offset**: 分页偏移 + """ + if not API_KEY_AVAILABLE: + raise HTTPException(status_code=503, detail="API Key management not available") + + key_manager = get_api_key_manager() + keys = key_manager.list_keys(status=status, limit=limit, offset=offset) + + return ApiKeyListResponse( + keys=[ + ApiKeyResponse( + id=k.id, + key_preview=k.key_preview, + name=k.name, + permissions=k.permissions, + rate_limit=k.rate_limit, + status=k.status, + created_at=k.created_at, + expires_at=k.expires_at, + last_used_at=k.last_used_at, + total_calls=k.total_calls + ) + for k in keys + ], + total=len(keys) + ) + + +@app.get("/api/v1/api-keys/{key_id}", response_model=ApiKeyResponse, tags=["API Keys"]) +async def get_api_key(key_id: str, _=Depends(verify_api_key)): + """获取单个 API Key 详情""" + if not API_KEY_AVAILABLE: + raise HTTPException(status_code=503, detail="API Key management not available") + + key_manager = get_api_key_manager() + key = key_manager.get_key_by_id(key_id) + + if not key: + raise HTTPException(status_code=404, detail="API Key not found") + + return ApiKeyResponse( + id=key.id, + key_preview=key.key_preview, + name=key.name, + permissions=key.permissions, + rate_limit=key.rate_limit, + status=key.status, + created_at=key.created_at, + expires_at=key.expires_at, + last_used_at=key.last_used_at, + total_calls=key.total_calls + ) + + 
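The `ApiKeyResponse(...)` construction above is repeated field-by-field in several endpoints; a single converter keeps those copies in sync. A sketch with a stdlib dataclass (for the real Pydantic v2 models, `model_validate` with `from_attributes=True` would play the same role — that refactor is a suggestion, not something this diff does):

```python
from dataclasses import dataclass, fields
from types import SimpleNamespace

# Sketch: one from_record converter instead of repeating a ten-field
# constructor call in every endpoint. The field set here is abbreviated.

@dataclass
class ApiKeyView:
    id: str
    name: str
    status: str
    total_calls: int

    @classmethod
    def from_record(cls, record) -> "ApiKeyView":
        # Copy exactly the attributes the view declares, ignoring the rest
        # (so internal fields like a stored hash never leak into responses).
        return cls(**{f.name: getattr(record, f.name) for f in fields(cls)})

record = SimpleNamespace(id="k1", name="ci", status="active",
                         total_calls=7, key_hash="secret")  # extra field ignored
view = ApiKeyView.from_record(record)
```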
+@app.patch("/api/v1/api-keys/{key_id}", response_model=ApiKeyResponse, tags=["API Keys"])
+async def update_api_key(key_id: str, request: ApiKeyUpdate, _=Depends(verify_api_key)):
+    """
+    Update an API key.
+
+    Updatable fields: name, permissions, rate_limit
+    """
+    if not API_KEY_AVAILABLE:
+        raise HTTPException(status_code=503, detail="API Key management not available")
+
+    key_manager = get_api_key_manager()
+
+    # Build the update payload
+    updates = {}
+    if request.name is not None:
+        updates["name"] = request.name
+    if request.permissions is not None:
+        updates["permissions"] = request.permissions
+    if request.rate_limit is not None:
+        updates["rate_limit"] = request.rate_limit
+
+    if not updates:
+        raise HTTPException(status_code=400, detail="No fields to update")
+
+    success = key_manager.update_key(key_id, **updates)
+
+    if not success:
+        raise HTTPException(status_code=404, detail="API Key not found")
+
+    # Return the updated key
+    key = key_manager.get_key_by_id(key_id)
+    return ApiKeyResponse(
+        id=key.id,
+        key_preview=key.key_preview,
+        name=key.name,
+        permissions=key.permissions,
+        rate_limit=key.rate_limit,
+        status=key.status,
+        created_at=key.created_at,
+        expires_at=key.expires_at,
+        last_used_at=key.last_used_at,
+        total_calls=key.total_calls
+    )
+
+
+@app.delete("/api/v1/api-keys/{key_id}", tags=["API Keys"])
+async def revoke_api_key(key_id: str, reason: str = "", _=Depends(verify_api_key)):
+    """
+    Revoke an API key.
+
+    A revoked key can no longer be used, but its record is kept for auditing.
+    """
+    if not API_KEY_AVAILABLE:
+        raise HTTPException(status_code=503, detail="API Key management not available")
+
+    key_manager = get_api_key_manager()
+    success = key_manager.revoke_key(key_id, reason=reason)
+
+    if not success:
+        raise HTTPException(status_code=404, detail="API Key not found or already revoked")
+
+    return {"success": True, "message": f"API Key {key_id} revoked"}
+
+
+@app.get("/api/v1/api-keys/{key_id}/stats", response_model=ApiStatsResponse, tags=["API Keys"])
+async def get_api_key_stats(key_id: str, days: int = 30, _=Depends(verify_api_key)):
+    """
+    Get call statistics for an API key.
+
+    - **days**: number of days to aggregate, default 30
+    """
+    if not API_KEY_AVAILABLE:
+        raise HTTPException(status_code=503, detail="API Key management not available")
+
+    key_manager = get_api_key_manager()
+
+    # Verify the key exists
+    key = key_manager.get_key_by_id(key_id)
+    if not key:
+        raise HTTPException(status_code=404, detail="API Key not found")
+
+    stats = key_manager.get_call_stats(key_id, days=days)
+
+    return ApiStatsResponse(
+        summary=ApiCallStats(**stats["summary"]),
+        endpoints=stats["endpoints"],
+        daily=stats["daily"]
+    )
+
+
+@app.get("/api/v1/api-keys/{key_id}/logs", response_model=ApiLogsResponse, tags=["API Keys"])
+async def get_api_key_logs(
+    key_id: str,
+    limit: int = 100,
+    offset: int = 0,
+    _=Depends(verify_api_key)
+):
+    """
+    Get call logs for an API key.
+
+    - **limit**: maximum number of results
+    - **offset**: pagination offset
+    """
+    if not API_KEY_AVAILABLE:
+        raise HTTPException(status_code=503, detail="API Key management not available")
+
+    key_manager = get_api_key_manager()
+
+    # Verify the key exists
+    key = key_manager.get_key_by_id(key_id)
+    if not key:
+        raise HTTPException(status_code=404, detail="API Key not found")
+
+    logs = key_manager.get_call_logs(key_id, limit=limit, offset=offset)
+
+    return ApiLogsResponse(
+        logs=[
+            ApiCallLog(
+                id=log["id"],
+                endpoint=log["endpoint"],
+                method=log["method"],
+                status_code=log["status_code"],
+                response_time_ms=log["response_time_ms"],
+                ip_address=log["ip_address"],
+                user_agent=log["user_agent"],
+                error_message=log["error_message"],
+                created_at=log["created_at"]
+            )
+            for log in logs
+        ],
+        total=len(logs)
+    )
+
+
+@app.get("/api/v1/rate-limit/status", response_model=RateLimitStatus, tags=["API Keys"])
+async def get_rate_limit_status(request: Request, _=Depends(verify_api_key)):
+    """Get the rate-limit status for the current request."""
+    if not RATE_LIMITER_AVAILABLE:
+        return RateLimitStatus(
+            limit=60,
+            remaining=60,
+            reset_time=int(time.time()) + 60,
+            window="minute"
+        )
+
+    limiter = get_rate_limiter()
+
+    # Determine the rate-limit key
+    if hasattr(request.state, 'api_key') and request.state.api_key:
+        api_key = request.state.api_key
+        limit_key = f"api_key:{api_key.id}"
+        limit = api_key.rate_limit
+    else:
+        client_ip = request.client.host if request.client else "unknown"
+        limit_key = f"ip:{client_ip}"
+        limit = 10
+
+    info = await limiter.get_limit_info(limit_key)
+
+    return RateLimitStatus(
+        limit=limit,
+        remaining=info.remaining,
+        reset_time=info.reset_time,
+        window="minute"
+    )
+
+
+# ==================== Phase 6: System Endpoints ====================
+
+@app.get("/api/v1/health", tags=["System"])
+async def health_check():
+    """Health check endpoint."""
     return {
-        "id": share.id,
-        "token": share.token,
-        "permission": share.permission,
-        "created_at": share.created_at,
-        "expires_at": share.expires_at,
-        "max_uses": share.max_uses,
-        "share_url": f"/share/{share.token}"
+        "status": "healthy",
+        "version": "0.7.0",
+        "timestamp": datetime.now().isoformat()
     }
-
-
-@app.get("/api/v1/projects/{project_id}/shares")
-async def list_project_shares(project_id: str):
-    """列出项目的所有分享链接"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
-
-    manager = get_collab_manager()
-    shares = manager.list_project_shares(project_id)
-
-    return {
-        "shares": [
-            {
-                "id": s.id,
-                "token": s.token,
-                "permission": s.permission,
-                "created_at": s.created_at,
-                "expires_at": s.expires_at,
-                "use_count": s.use_count,
-                "max_uses": s.max_uses,
-                "is_active": s.is_active,
-                "has_password": s.password_hash is not None,
-                "allow_download": s.allow_download,
-                "allow_export": s.allow_export
-            }
-            for s in shares
-        ]
-    }
-
-
-@app.post("/api/v1/shares/verify")
-async def verify_share_link(request: ShareLinkVerify):
-    """验证分享链接"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
-
-    manager = get_collab_manager()
-    share = manager.validate_share_token(request.token, request.password)
-
-    if not share:
-        raise HTTPException(status_code=401, detail="Invalid or expired share link")
-
-    # 增加使用次数
-    manager.increment_share_usage(request.token)
-
-    return {
-        "valid": True,
-        "project_id": share.project_id,
-        "permission": share.permission,
-        "allow_download": share.allow_download,
-        "allow_export": share.allow_export
+@app.get("/api/v1/status", tags=["System"])
+async def system_status():
+    """System status information."""
+    status = {
+        "version": "0.7.0",
+        "phase": "Phase 7 - Plugin & Integration",
+        "features": {
+            "database": DB_AVAILABLE,
+            "oss": OSS_AVAILABLE,
+            "tingwu": TINGWU_AVAILABLE,
+            "llm": LLM_CLIENT_AVAILABLE,
+            "neo4j": NEO4J_AVAILABLE,
+            "export": EXPORT_AVAILABLE,
+            "api_keys": API_KEY_AVAILABLE,
+            "rate_limiting": RATE_LIMITER_AVAILABLE,
+            "workflow": WORKFLOW_AVAILABLE,
+            "multimodal": MULTIMODAL_AVAILABLE,
+            "multimodal_linker": MULTIMODAL_LINKER_AVAILABLE,
+            "plugin_manager": PLUGIN_MANAGER_AVAILABLE,
+        },
+        "api": {
+            "documentation": "/docs",
+            "openapi": "/openapi.json",
+        },
+        "timestamp": datetime.now().isoformat()
     }
+
+    return status
-
-
-@app.get("/api/v1/shares/{token}/access")
-async def access_shared_project(token: str, password: Optional[str] = None):
-    """通过分享链接访问项目"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
+
+
+# ==================== Phase 7: Workflow Automation Endpoints ====================
+
+# Workflow Manager singleton
+_workflow_manager = None
+
+def get_workflow_manager_instance():
+    global _workflow_manager
+    if _workflow_manager is None and WORKFLOW_AVAILABLE and DB_AVAILABLE:
+        from workflow_manager import WorkflowManager
+        db = get_db_manager()
+        _workflow_manager = WorkflowManager(db)
+        _workflow_manager.start()
+    return _workflow_manager
+
+
+@app.post("/api/v1/workflows", response_model=WorkflowResponse, tags=["Workflows"])
+async def create_workflow_endpoint(request: WorkflowCreate, _=Depends(verify_api_key)):
+    """
+    Create a workflow.
-
-    manager = get_collab_manager()
-    share = manager.validate_share_token(token, password)
+
+    Workflow types:
+    - **auto_analyze**: automatically analyze newly uploaded files
+    - **auto_align**: automatic entity alignment
+    - **auto_relation**: automatic relation discovery
+    - **scheduled_report**: scheduled reports
+    - **custom**: custom workflow
-
-    if not share:
-        raise HTTPException(status_code=401, detail="Invalid or expired share link")
+
+    Schedule types:
+    - **manual**: manual trigger
+    - **cron**: cron-expression schedule
+    - **interval**: interval schedule (in minutes)
-
-    # 增加使用次数
-    manager.increment_share_usage(token)
+
+    Schedule examples:
+    - `0 9 * * *` - every day at 9:00 (cron)
+    - `60` - every 60 minutes (interval)
+    """
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+
+    try:
+        workflow = Workflow(
+            id=str(uuid.uuid4())[:8],
+            name=request.name,
+            description=request.description,
+            workflow_type=request.workflow_type,
+            project_id=request.project_id,
+            schedule=request.schedule,
+            schedule_type=request.schedule_type,
+            config=request.config,
+            webhook_ids=request.webhook_ids
+        )
+
+        created = manager.create_workflow(workflow)
+
+        return WorkflowResponse(
+            id=created.id,
+            name=created.name,
+            description=created.description,
+            workflow_type=created.workflow_type,
+            project_id=created.project_id,
+            status=created.status,
+            schedule=created.schedule,
+            schedule_type=created.schedule_type,
+            config=created.config,
+            webhook_ids=created.webhook_ids,
+            is_active=created.is_active,
+            created_at=created.created_at,
+            updated_at=created.updated_at,
+            last_run_at=created.last_run_at,
+            next_run_at=created.next_run_at,
+            run_count=created.run_count,
+            success_count=created.success_count,
+            fail_count=created.fail_count
+        )
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+
+
+@app.get("/api/v1/workflows", response_model=WorkflowListResponse, tags=["Workflows"])
+async def list_workflows_endpoint(
+    project_id: Optional[str] = None,
+    status: Optional[str] = None,
+    workflow_type: Optional[str] = None,
+    _=Depends(verify_api_key)
+):
+    """List workflows."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    workflows = manager.list_workflows(project_id, status, workflow_type)
+
+    return WorkflowListResponse(
+        workflows=[
+            WorkflowResponse(
+                id=w.id,
+                name=w.name,
+                description=w.description,
+                workflow_type=w.workflow_type,
+                project_id=w.project_id,
+                status=w.status,
+                schedule=w.schedule,
+                schedule_type=w.schedule_type,
+                config=w.config,
+                webhook_ids=w.webhook_ids,
+                is_active=w.is_active,
+                created_at=w.created_at,
+                updated_at=w.updated_at,
+                last_run_at=w.last_run_at,
+                next_run_at=w.next_run_at,
+                run_count=w.run_count,
+                success_count=w.success_count,
+                fail_count=w.fail_count
+            )
+            for w in workflows
+        ],
+        total=len(workflows)
+    )
+
+
+@app.get("/api/v1/workflows/{workflow_id}", response_model=WorkflowResponse, tags=["Workflows"])
+async def get_workflow_endpoint(workflow_id: str, _=Depends(verify_api_key)):
+    """Get details of a single workflow."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    workflow = manager.get_workflow(workflow_id)
+
+    if not workflow:
+        raise HTTPException(status_code=404, detail="Workflow not found")
+
+    return WorkflowResponse(
+        id=workflow.id,
+        name=workflow.name,
+        description=workflow.description,
+        workflow_type=workflow.workflow_type,
+        project_id=workflow.project_id,
+        status=workflow.status,
+        schedule=workflow.schedule,
+        schedule_type=workflow.schedule_type,
+        config=workflow.config,
+        webhook_ids=workflow.webhook_ids,
+        is_active=workflow.is_active,
+        created_at=workflow.created_at,
+        updated_at=workflow.updated_at,
+        last_run_at=workflow.last_run_at,
+        next_run_at=workflow.next_run_at,
+        run_count=workflow.run_count,
+        success_count=workflow.success_count,
+        fail_count=workflow.fail_count
+    )
+
+
+@app.patch("/api/v1/workflows/{workflow_id}", response_model=WorkflowResponse, tags=["Workflows"])
+async def update_workflow_endpoint(workflow_id: str, request: WorkflowUpdate, _=Depends(verify_api_key)):
+    """Update a workflow."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+
+    update_data = {k: v for k, v in request.dict().items() if v is not None}
+    updated = manager.update_workflow(workflow_id, **update_data)
+
+    if not updated:
+        raise HTTPException(status_code=404, detail="Workflow not found")
+
+    return WorkflowResponse(
+        id=updated.id,
+        name=updated.name,
+        description=updated.description,
+        workflow_type=updated.workflow_type,
+        project_id=updated.project_id,
+        status=updated.status,
+        schedule=updated.schedule,
+        schedule_type=updated.schedule_type,
+        config=updated.config,
+        webhook_ids=updated.webhook_ids,
+        is_active=updated.is_active,
+        created_at=updated.created_at,
+        updated_at=updated.updated_at,
+        last_run_at=updated.last_run_at,
+        next_run_at=updated.next_run_at,
+        run_count=updated.run_count,
+        success_count=updated.success_count,
+        fail_count=updated.fail_count
+    )
+
+
+@app.delete("/api/v1/workflows/{workflow_id}", tags=["Workflows"])
+async def delete_workflow_endpoint(workflow_id: str, _=Depends(verify_api_key)):
+    """Delete a workflow."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    success = manager.delete_workflow(workflow_id)
+
+    if not success:
+        raise HTTPException(status_code=404, detail="Workflow not found")
+
+    return {"success": True, "message": "Workflow deleted successfully"}
+
+
+@app.post("/api/v1/workflows/{workflow_id}/trigger", response_model=WorkflowTriggerResponse, tags=["Workflows"])
+async def trigger_workflow_endpoint(workflow_id: str, request: WorkflowTriggerRequest = None, _=Depends(verify_api_key)):
+    """Trigger a workflow manually."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+
+    try:
+        result = await manager.execute_workflow(
+            workflow_id,
+            input_data=request.input_data if request else {}
+        )
+
+        return WorkflowTriggerResponse(
+            success=result["success"],
+            workflow_id=result["workflow_id"],
+            log_id=result["log_id"],
+            results=result["results"],
+            duration_ms=result["duration_ms"]
+        )
+    except ValueError as e:
+        raise HTTPException(status_code=404, detail=str(e))
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e))
+
+
+@app.get("/api/v1/workflows/{workflow_id}/logs", response_model=WorkflowLogListResponse, tags=["Workflows"])
+async def get_workflow_logs_endpoint(
+    workflow_id: str,
+    status: Optional[str] = None,
+    limit: int = 100,
+    offset: int = 0,
+    _=Depends(verify_api_key)
+):
+    """Get workflow execution logs."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    logs = manager.list_logs(workflow_id=workflow_id, status=status, limit=limit, offset=offset)
+
+    return WorkflowLogListResponse(
+        logs=[
+            WorkflowLogResponse(
+                id=log.id,
+                workflow_id=log.workflow_id,
+                task_id=log.task_id,
+                status=log.status,
+                start_time=log.start_time,
+                end_time=log.end_time,
+                duration_ms=log.duration_ms,
+                input_data=log.input_data,
+                output_data=log.output_data,
+                error_message=log.error_message,
+                created_at=log.created_at
+            )
+            for log in logs
+        ],
+        total=len(logs)
+    )
+
+
+@app.get("/api/v1/workflows/{workflow_id}/stats", response_model=WorkflowStatsResponse, tags=["Workflows"])
+async def get_workflow_stats_endpoint(workflow_id: str, days: int = 30, _=Depends(verify_api_key)):
+    """Get workflow execution statistics."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    stats = manager.get_workflow_stats(workflow_id, days)
+
+    return WorkflowStatsResponse(**stats)
+
+
+# ==================== Phase 7: Webhook Endpoints ====================
+
+@app.post("/api/v1/webhooks", response_model=WebhookResponse, tags=["Webhooks"])
+async def create_webhook_endpoint(request: WebhookCreate, _=Depends(verify_api_key)):
+    """
+    Create a webhook configuration.
+
+    Webhook types:
+    - **feishu**: Feishu bot
+    - **dingtalk**: DingTalk bot
+    - **slack**: Slack Incoming Webhook
+    - **custom**: custom webhook
+    """
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+
+    try:
+        webhook = WebhookConfig(
+            id=str(uuid.uuid4())[:8],
+            name=request.name,
+            webhook_type=request.webhook_type,
+            url=request.url,
+            secret=request.secret,
+            headers=request.headers,
+            template=request.template
+        )
+
+        created = manager.create_webhook(webhook)
+
+        return WebhookResponse(
+            id=created.id,
+            name=created.name,
+            webhook_type=created.webhook_type,
+            url=created.url,
+            headers=created.headers,
+            template=created.template,
+            is_active=created.is_active,
+            created_at=created.created_at,
+            updated_at=created.updated_at,
+            last_used_at=created.last_used_at,
+            success_count=created.success_count,
+            fail_count=created.fail_count
+        )
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+
+
+@app.get("/api/v1/webhooks", response_model=WebhookListResponse, tags=["Webhooks"])
+async def list_webhooks_endpoint(_=Depends(verify_api_key)):
+    """List webhooks."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    webhooks = manager.list_webhooks()
+
+    return WebhookListResponse(
+        webhooks=[
+            WebhookResponse(
+                id=w.id,
+                name=w.name,
+                webhook_type=w.webhook_type,
+                url=w.url,
+                headers=w.headers,
+                template=w.template,
+                is_active=w.is_active,
+                created_at=w.created_at,
+                updated_at=w.updated_at,
+                last_used_at=w.last_used_at,
+                success_count=w.success_count,
+                fail_count=w.fail_count
+            )
+            for w in webhooks
+        ],
+        total=len(webhooks)
+    )
+
+
+@app.get("/api/v1/webhooks/{webhook_id}", response_model=WebhookResponse, tags=["Webhooks"])
+async def get_webhook_endpoint(webhook_id: str, _=Depends(verify_api_key)):
+    """Get details of a single webhook."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    webhook = manager.get_webhook(webhook_id)
+
+    if not webhook:
+        raise HTTPException(status_code=404, detail="Webhook not found")
+
+    return WebhookResponse(
+        id=webhook.id,
+        name=webhook.name,
+        webhook_type=webhook.webhook_type,
+        url=webhook.url,
+        headers=webhook.headers,
+        template=webhook.template,
+        is_active=webhook.is_active,
+        created_at=webhook.created_at,
+        updated_at=webhook.updated_at,
+        last_used_at=webhook.last_used_at,
+        success_count=webhook.success_count,
+        fail_count=webhook.fail_count
+    )
+
+
+@app.patch("/api/v1/webhooks/{webhook_id}", response_model=WebhookResponse, tags=["Webhooks"])
+async def update_webhook_endpoint(webhook_id: str, request: WebhookUpdate, _=Depends(verify_api_key)):
+    """Update a webhook configuration."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+
+    update_data = {k: v for k, v in request.dict().items() if v is not None}
+    updated = manager.update_webhook(webhook_id, **update_data)
+
+    if not updated:
+        raise HTTPException(status_code=404, detail="Webhook not found")
+
+    return WebhookResponse(
+        id=updated.id,
+        name=updated.name,
+        webhook_type=updated.webhook_type,
+        url=updated.url,
+        headers=updated.headers,
+        template=updated.template,
+        is_active=updated.is_active,
+        created_at=updated.created_at,
+        updated_at=updated.updated_at,
+        last_used_at=updated.last_used_at,
+        success_count=updated.success_count,
+        fail_count=updated.fail_count
+    )
+
+
+@app.delete("/api/v1/webhooks/{webhook_id}", tags=["Webhooks"])
+async def delete_webhook_endpoint(webhook_id: str, _=Depends(verify_api_key)):
+    """Delete a webhook configuration."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    success = manager.delete_webhook(webhook_id)
+
+    if not success:
+        raise HTTPException(status_code=404, detail="Webhook not found")
+
+    return {"success": True, "message": "Webhook deleted successfully"}
+
+
+@app.post("/api/v1/webhooks/{webhook_id}/test", tags=["Webhooks"])
+async def test_webhook_endpoint(webhook_id: str, _=Depends(verify_api_key)):
+    """Test a webhook configuration."""
+    if not WORKFLOW_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Workflow automation not available")
+
+    manager = get_workflow_manager_instance()
+    webhook = manager.get_webhook(webhook_id)
+
+    if not webhook:
+        raise HTTPException(status_code=404, detail="Webhook not found")
+
+    # Build a test message
+    test_message = {
+        "content": "🔔 This is a webhook test message from InsightFlow\n\nIf you received this message, your webhook is configured correctly!"
+    }
+
+    if webhook.webhook_type == "slack":
+        test_message = {"text": "🔔 This is a webhook test message from InsightFlow\n\nIf you received this message, your webhook is configured correctly!"}
+
+    success = await manager.notifier.send(webhook, test_message)
+    manager.update_webhook_stats(webhook_id, success)
+
+    if success:
+        return {"success": True, "message": "Webhook test sent successfully"}
+    else:
+        raise HTTPException(status_code=400, detail="Webhook test failed")
+
+
+# ==================== Phase 7: Multimodal Support Endpoints ====================
+
+# Pydantic Models for Multimodal API
+class VideoUploadResponse(BaseModel):
+    video_id: str
+    project_id: str
+    filename: str
+    status: str
+    audio_extracted: bool
+    frame_count: int
+    ocr_text_preview: str
+    message: str
+
+
+class ImageUploadResponse(BaseModel):
+    image_id: str
+    project_id: str
+    filename: str
+    image_type: str
+    ocr_text_preview: str
+    description: str
+    entity_count: int
+    status: str
+
+
+class MultimodalEntityLinkResponse(BaseModel):
+    link_id: str
+    source_entity_id: str
+    target_entity_id: str
+    source_modality: str
+    target_modality: str
+    link_type: str
+    confidence: float
+    evidence: str
+
+
+class MultimodalAlignmentRequest(BaseModel):
+    project_id: str
+    threshold: float = 0.85
+
+
+class MultimodalAlignmentResponse(BaseModel):
+    project_id: str
+    aligned_count: int
+    links: List[MultimodalEntityLinkResponse]
+    message: str
+
+
+class MultimodalStatsResponse(BaseModel):
+    project_id: str
+    video_count: int
+    image_count: int
+    multimodal_entity_count: int
+    cross_modal_links: int
+    modality_distribution: Dict[str, int]
+
+
+@app.post("/api/v1/projects/{project_id}/upload-video", response_model=VideoUploadResponse, tags=["Multimodal"])
+async def upload_video_endpoint(
+    project_id: str,
+    file: UploadFile = File(...),
+    extract_interval: int = Form(5),
+    _=Depends(verify_api_key)
+):
+    """
+    Upload a video file for processing.
+
+    - Extract the audio track
+    - Extract keyframes (one frame every N seconds)
+    - Run OCR on the keyframes
+    - Combine the video, audio, and OCR results
+
+    **Parameters:**
+    - **extract_interval**: keyframe extraction interval in seconds, default 5
+    """
+    if not MULTIMODAL_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Multimodal processing not available")
-
-    # 获取项目信息
     if not DB_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Database not available")
+        raise HTTPException(status_code=500, detail="Database not available")
 
     db = get_db_manager()
-    project = db.get_project(share.project_id)
-
+    project = db.get_project(project_id)
     if not project:
         raise HTTPException(status_code=404, detail="Project not found")
 
-    return {
-        "project": {
-            "id": project.id,
-            "name": project.name,
-            "description": project.description,
-            "created_at": project.created_at
-        },
-        "permission": share.permission,
-        "allow_download": share.allow_download,
-        "allow_export": share.allow_export
-    }
-
-
-@app.delete("/api/v1/shares/{share_id}")
-async def revoke_share_link(share_id: str, revoked_by: str = "current_user"):
-    """撤销分享链接"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
+    # Read the video file
+    video_data = await file.read()
-
-    manager = get_collab_manager()
-    success = manager.revoke_share_link(share_id, revoked_by)
+
+    # Create the video processor
+    processor = get_multimodal_processor(frame_interval=extract_interval)
-
-    if not success:
-        raise HTTPException(status_code=404, detail="Share link not found")
+
+    # Process the video
+    video_id = str(uuid.uuid4())[:8]
+    result = processor.process_video(video_data, file.filename, project_id, video_id)
-
-    return {"success": True, "message": "Share link revoked"}
-
-
-# ----- 评论和批注 -----
-
-@app.post("/api/v1/projects/{project_id}/comments")
-async def add_comment(project_id: str, request: CommentCreate, author: str = "current_user", author_name: str = "User"):
-    """添加评论"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
+
+    if not result.success:
+        raise HTTPException(status_code=500, detail=f"Video processing failed: {result.error_message}")
-
-    manager = get_collab_manager()
-    comment = manager.add_comment(
-        project_id=project_id,
-        target_type=request.target_type,
-        target_id=request.target_id,
-        author=author,
-        author_name=author_name,
-        content=request.content,
-        parent_id=request.parent_id,
-        mentions=request.mentions
+
+    # Save video info to the database
+    conn = db.get_conn()
+    now = datetime.now().isoformat()
+
+    # Get video metadata
+    video_info = processor.extract_video_info(os.path.join(processor.video_dir, f"{video_id}_{file.filename}"))
+
+    conn.execute(
+        """INSERT INTO videos
+           (id, project_id, filename, duration, fps, resolution,
+            audio_transcript_id, full_ocr_text, extracted_entities,
+            extracted_relations, status, created_at, updated_at)
+           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+        (video_id, project_id, file.filename, video_info.get('duration', 0),
+         video_info.get('fps', 0),
+         json.dumps({'width': video_info.get('width', 0), 'height': video_info.get('height', 0)}),
+         None, result.full_text, '[]', '[]', 'completed', now, now)
     )
-
-    return {
-        "id": comment.id,
-        "target_type": comment.target_type,
-        "target_id": comment.target_id,
-        "parent_id": comment.parent_id,
-        "author": comment.author,
-        "author_name": comment.author_name,
-        "content": comment.content,
-        "created_at": comment.created_at,
-        "resolved": comment.resolved
-    }
+
+    # Save keyframe info
+    for frame in result.frames:
+        conn.execute(
+            """INSERT INTO video_frames
+               (id, video_id, frame_number, timestamp, image_url, ocr_text, extracted_entities, created_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
+            (frame.id, frame.video_id, frame.frame_number, frame.timestamp,
+             frame.frame_path, frame.ocr_text, json.dumps(frame.entities_detected), now)
+        )
+
+    conn.commit()
+    conn.close()
+
+    # Extract entities and relations (reusing the existing LLM extraction logic)
+    if result.full_text:
+        raw_entities, raw_relations = extract_entities_with_llm(result.full_text)
+
+        # Align entities and save
+        entity_name_to_id = {}
+        for raw_ent in raw_entities:
+            existing = align_entity(project_id, raw_ent["name"], db, raw_ent.get("definition", ""))
+
+            if existing:
+                entity_name_to_id[raw_ent["name"]] = existing.id
+            else:
+                new_ent = db.create_entity(Entity(
+                    id=str(uuid.uuid4())[:8],
+                    project_id=project_id,
+                    name=raw_ent["name"],
+                    type=raw_ent.get("type", "OTHER"),
+                    definition=raw_ent.get("definition", "")
+                ))
+                entity_name_to_id[raw_ent["name"]] = new_ent.id
+
+            # Save multimodal entity mentions
+            conn = db.get_conn()
+            conn.execute(
+                """INSERT OR REPLACE INTO multimodal_mentions
+                   (id, project_id, entity_id, modality, source_id, source_type, text_snippet, confidence, created_at)
+                   VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+                (str(uuid.uuid4())[:8], project_id, entity_name_to_id[raw_ent["name"]],
+                 'video', video_id, 'video_frame', raw_ent.get("name", ""), 1.0, now)
+            )
+            conn.commit()
+            conn.close()
+
+        # Save relations
+        for rel in raw_relations:
+            source_id = entity_name_to_id.get(rel.get("source", ""))
+            target_id = entity_name_to_id.get(rel.get("target", ""))
+            if source_id and target_id:
+                db.create_relation(
+                    project_id=project_id,
+                    source_entity_id=source_id,
+                    target_entity_id=target_id,
+                    relation_type=rel.get("type", "related"),
+                    evidence=result.full_text[:200]
+                )
+
+        # Update the video's entity and relation info
+        conn = db.get_conn()
+        conn.execute(
+            "UPDATE videos SET extracted_entities = ?, extracted_relations = ? WHERE id = ?",
+            (json.dumps(raw_entities), json.dumps(raw_relations), video_id)
+        )
+        conn.commit()
+        conn.close()
+
+    return VideoUploadResponse(
+        video_id=video_id,
+        project_id=project_id,
+        filename=file.filename,
+        status="completed",
+        audio_extracted=bool(result.audio_path),
+        frame_count=len(result.frames),
+        ocr_text_preview=result.full_text[:200] + "..." if len(result.full_text) > 200 else result.full_text,
+        message="Video processed successfully"
+    )
-
-
-@app.get("/api/v1/{target_type}/{target_id}/comments")
-async def get_comments(target_type: str, target_id: str, include_resolved: bool = True):
-    """获取评论列表"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
-
-    manager = get_collab_manager()
-    comments = manager.get_comments(target_type, target_id, include_resolved)
-
-    return {
-        "count": len(comments),
-        "comments": [
-            {
-                "id": c.id,
-                "parent_id": c.parent_id,
-                "author": c.author,
-                "author_name": c.author_name,
-                "content": c.content,
-                "created_at": c.created_at,
-                "updated_at": c.updated_at,
-                "resolved": c.resolved,
-                "resolved_by": c.resolved_by,
-                "resolved_at": c.resolved_at
-            }
-            for c in comments
-        ]
-    }
-
-
-@app.get("/api/v1/projects/{project_id}/comments")
-async def get_project_comments(project_id: str, limit: int = 50, offset: int = 0):
-    """获取项目下的所有评论"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
-
-    manager = get_collab_manager()
-    comments = manager.get_project_comments(project_id, limit, offset)
-
-    return {
-        "count": len(comments),
-        "comments": [
-            {
-                "id": c.id,
-                "target_type": c.target_type,
-                "target_id": c.target_id,
-                "parent_id": c.parent_id,
-                "author": c.author,
-                "author_name": c.author_name,
-                "content": c.content,
-                "created_at": c.created_at,
-                "resolved": c.resolved
-            }
-            for c in comments
-        ]
-    }
-
-
-@app.put("/api/v1/comments/{comment_id}")
-async def update_comment(comment_id: str, request: CommentUpdate, updated_by: str = "current_user"):
-    """更新评论"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
-
-    manager = get_collab_manager()
-    comment = manager.update_comment(comment_id, request.content, updated_by)
-
-    if not comment:
-        raise HTTPException(status_code=404, detail="Comment not found or not authorized")
-
-    return {
-        "id": comment.id,
-        "content": comment.content,
-        "updated_at": comment.updated_at
-    }
-
-
-@app.post("/api/v1/comments/{comment_id}/resolve")
-async def resolve_comment(comment_id: str, resolved_by: str = "current_user"):
-    """标记评论为已解决"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
-
-    manager = get_collab_manager()
-    success = manager.resolve_comment(comment_id, resolved_by)
-
-    if not success:
-        raise HTTPException(status_code=404, detail="Comment not found")
-
-    return {"success": True, "message": "Comment resolved"}
-
-
-@app.delete("/api/v1/comments/{comment_id}")
-async def delete_comment(comment_id: str, deleted_by: str = "current_user"):
-    """删除评论"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
-
-    manager = get_collab_manager()
-    success = manager.delete_comment(comment_id, deleted_by)
-
-    if not success:
-        raise HTTPException(status_code=404, detail="Comment not found or not authorized")
-
-    return {"success": True, "message": "Comment deleted"}
-
-
-# ----- 变更历史 -----
-
-@app.get("/api/v1/projects/{project_id}/history")
-async def get_change_history(
+@app.post("/api/v1/projects/{project_id}/upload-image", response_model=ImageUploadResponse, tags=["Multimodal"])
+async def upload_image_endpoint(
     project_id: str,
-    entity_type: Optional[str] = None,
-    entity_id: Optional[str] = None,
-    limit: int = 50,
-    offset: int = 0
+    file: UploadFile = File(...),
+    detect_type: bool = Form(True),
+    _=Depends(verify_api_key)
 ):
-    """获取变更历史"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
+    """
+    Upload an image file for processing.
-
-    manager = get_collab_manager()
-    records = manager.get_change_history(project_id, entity_type, entity_id, limit, offset)
+
+    - Image content recognition (whiteboards, slides, handwritten notes)
+    - OCR text extraction from the image
+    - Entity and relation extraction from the image
-
-    return {
-        "count": len(records),
-        "history": [
-            {
-                "id": r.id,
-                "change_type": r.change_type,
-                "entity_type": r.entity_type,
-                "entity_id": r.entity_id,
-                "entity_name": r.entity_name,
-                "changed_by": r.changed_by,
-                "changed_by_name": r.changed_by_name,
-                "changed_at": r.changed_at,
-                "old_value": r.old_value,
-                "new_value": r.new_value,
-                "description": r.description,
-                "reverted": r.reverted
-            }
-            for r in records
-        ]
-    }
-
-
-@app.get("/api/v1/projects/{project_id}/history/stats")
-async def get_change_history_stats(project_id: str):
-    """获取变更统计"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
+
+    **Parameters:**
+    - **detect_type**: whether to auto-detect the image type, default True
+    """
+    if not IMAGE_PROCESSOR_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Image processing not available")
-
-    manager = get_collab_manager()
-    stats = manager.get_change_stats(project_id)
+
+    if not DB_AVAILABLE:
+        raise HTTPException(status_code=500, detail="Database not available")
-
-    return stats
-
-
-@app.get("/api/v1/{entity_type}/{entity_id}/versions")
-async def get_entity_versions(entity_type: str, entity_id: str):
-    """获取实体版本历史"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
+
+    db = get_db_manager()
+    project = db.get_project(project_id)
+    if not project:
+        raise HTTPException(status_code=404, detail="Project not found")
-
-    manager = get_collab_manager()
-    records = manager.get_entity_version_history(entity_type, entity_id)
+
+    # Read the image file
+    image_data = await file.read()
-
-    return {
-        "count": len(records),
-        "versions": [
-            {
-                "id": r.id,
-                "change_type": r.change_type,
-                "changed_by": r.changed_by,
-                "changed_by_name": r.changed_by_name,
-                "changed_at": r.changed_at,
-                "old_value": r.old_value,
-                "new_value": r.new_value,
-                "description": r.description
-            }
-            for r in records
-        ]
-    }
-
-
-@app.post("/api/v1/history/{record_id}/revert")
-async def revert_change(record_id: str, reverted_by: str = "current_user"):
-    """回滚变更"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
+
+    # Create the image processor
+    processor = get_image_processor()
-
-    manager = get_collab_manager()
-    success = manager.revert_change(record_id, reverted_by)
+
+    # Process the image
+    image_id = str(uuid.uuid4())[:8]
+    result = processor.process_image(image_data, file.filename, image_id, detect_type)
-
-    if not success:
-        raise HTTPException(status_code=404, detail="Change record not found or already reverted")
+
+    if not result.success:
+        raise HTTPException(status_code=500, detail=f"Image processing failed: {result.error_message}")
-
-    return {"success": True, "message": "Change reverted"}
-
-
-# ----- 团队成员 -----
-
-@app.post("/api/v1/projects/{project_id}/members")
-async def invite_team_member(project_id: str, request: TeamMemberInvite, invited_by: str = "current_user"):
-    """邀请团队成员"""
-    if not COLLABORATION_AVAILABLE:
-        raise HTTPException(status_code=503, detail="Collaboration module not available")
+
+    # Save image info to the database
+    conn = db.get_conn()
+    now = datetime.now().isoformat()
-
-    manager = get_collab_manager()
-    member = manager.add_team_member(
+
+    conn.execute(
+        """INSERT INTO images
+           (id, project_id, filename, ocr_text, description,
+            extracted_entities, extracted_relations, status, created_at, updated_at)
+           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+        (image_id, project_id, file.filename, result.ocr_text, result.description,
+         json.dumps([{"name": e.name, "type": e.type, "confidence": e.confidence} for e in result.entities]),
+         json.dumps([{"source": r.source, "target": r.target, "type": r.relation_type} for r in result.relations]),
+         'completed', now, now)
+    )
+    conn.commit()
+    conn.close()
+
+    # Save extracted entities
+    for entity in result.entities:
+        existing = align_entity(project_id, entity.name, db, "")
+
+        if not existing:
+            new_ent = db.create_entity(Entity(
+                id=str(uuid.uuid4())[:8],
+                project_id=project_id,
+                name=entity.name,
+                type=entity.type,
+                definition=""
+            ))
+            entity_id = new_ent.id
+        else:
+            entity_id = existing.id
+
+        # Save multimodal entity mentions
+        conn = db.get_conn()
+        conn.execute(
+            """INSERT OR REPLACE INTO multimodal_mentions
+               (id, project_id, entity_id, modality, source_id, source_type, text_snippet, confidence, created_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+            (str(uuid.uuid4())[:8], project_id, entity_id,
+             'image', image_id, result.image_type, entity.name, entity.confidence, now)
+        )
+        conn.commit()
+        conn.close()
+
+    # Save extracted relations
+    for relation in result.relations:
+        source_entity = db.get_entity_by_name(project_id, relation.source)
+        target_entity = db.get_entity_by_name(project_id, relation.target)
+
+        if source_entity and target_entity:
+            db.create_relation(
+                project_id=project_id,
+                source_entity_id=source_entity.id,
+                target_entity_id=target_entity.id,
+                relation_type=relation.relation_type,
+                evidence=result.ocr_text[:200]
+            )
+
+    return ImageUploadResponse(
+        image_id=image_id,
         project_id=project_id,
-        user_id=request.user_id,
-        user_name=request.user_name,
-        user_email=request.user_email,
-        role=request.role,
-        invited_by=invited_by
+        filename=file.filename,
+        image_type=result.image_type,
+        ocr_text_preview=result.ocr_text[:200] + "..."
if len(result.ocr_text) > 200 else result.ocr_text, + description=result.description, + entity_count=len(result.entities), + status="completed" + ) + + +@app.post("/api/v1/projects/{project_id}/upload-images-batch", tags=["Multimodal"]) +async def upload_images_batch_endpoint( + project_id: str, + files: List[UploadFile] = File(...), + _=Depends(verify_api_key) +): + """ + 批量上传图片文件进行处理 + + 支持一次上传多张图片,每张图片都会进行 OCR 和实体提取 + """ + if not IMAGE_PROCESSOR_AVAILABLE: + raise HTTPException(status_code=503, detail="Image processing not available") + + if not DB_AVAILABLE: + raise HTTPException(status_code=500, detail="Database not available") + + db = get_db_manager() + project = db.get_project(project_id) + if not project: + raise HTTPException(status_code=404, detail="Project not found") + + # 读取所有图片 + images_data = [] + for file in files: + image_data = await file.read() + images_data.append((image_data, file.filename)) + + # 批量处理 + processor = get_image_processor() + batch_result = processor.process_batch(images_data, project_id) + + # 保存结果 + results = [] + for result in batch_result.results: + if result.success: + image_id = result.image_id + + # 保存到数据库 + conn = db.get_conn() + now = datetime.now().isoformat() + + conn.execute( + """INSERT INTO images + (id, project_id, filename, ocr_text, description, + extracted_entities, extracted_relations, status, created_at, updated_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (image_id, project_id, "batch_image", result.ocr_text, result.description, + json.dumps([{"name": e.name, "type": e.type} for e in result.entities]), + json.dumps([{"source": r.source, "target": r.target} for r in result.relations]), + 'completed', now, now) + ) + conn.commit() + conn.close() + + results.append({ + "image_id": image_id, + "status": "success", + "image_type": result.image_type, + "entity_count": len(result.entities) + }) + else: + results.append({ + "image_id": result.image_id, + "status": "failed", + "error": result.error_message + }) 
+ + return { + "project_id": project_id, + "total_count": batch_result.total_count, + "success_count": batch_result.success_count, + "failed_count": batch_result.failed_count, + "results": results + } + + +@app.post("/api/v1/projects/{project_id}/multimodal/align", response_model=MultimodalAlignmentResponse, tags=["Multimodal"]) +async def align_multimodal_entities_endpoint( + project_id: str, + threshold: float = 0.85, + _=Depends(verify_api_key) +): + """ + 跨模态实体对齐 + + 对齐同一实体在不同模态(音频、视频、图片、文档)中的提及 + + **参数:** + - **threshold**: 相似度阈值,默认 0.85 + """ + if not MULTIMODAL_LINKER_AVAILABLE: + raise HTTPException(status_code=503, detail="Multimodal entity linker not available") + + if not DB_AVAILABLE: + raise HTTPException(status_code=500, detail="Database not available") + + db = get_db_manager() + project = db.get_project(project_id) + if not project: + raise HTTPException(status_code=404, detail="Project not found") + + # 获取所有实体 + entities = db.list_project_entities(project_id) + + # 获取多模态提及 + conn = db.get_conn() + mentions = conn.execute( + """SELECT * FROM multimodal_mentions WHERE project_id = ?""", + (project_id,) + ).fetchall() + conn.close() + + # 按模态分组实体 + modality_entities = {"audio": [], "video": [], "image": [], "document": []} + + for mention in mentions: + modality = mention['modality'] + entity = db.get_entity(mention['entity_id']) + if entity and entity.id not in [e.get('id') for e in modality_entities[modality]]: + modality_entities[modality].append({ + 'id': entity.id, + 'name': entity.name, + 'type': entity.type, + 'definition': entity.definition, + 'aliases': entity.aliases + }) + + # 跨模态对齐 + linker = get_multimodal_entity_linker(similarity_threshold=threshold) + links = linker.align_cross_modal_entities( + project_id=project_id, + audio_entities=modality_entities['audio'], + video_entities=modality_entities['video'], + image_entities=modality_entities['image'], + document_entities=modality_entities['document'] ) - return { - "id": member.id, - 
"user_id": member.user_id, - "user_name": member.user_name, - "user_email": member.user_email, - "role": member.role, - "joined_at": member.joined_at, - "permissions": member.permissions - } + # 保存关联到数据库 + conn = db.get_conn() + now = datetime.now().isoformat() + + saved_links = [] + for link in links: + conn.execute( + """INSERT OR REPLACE INTO multimodal_entity_links + (id, entity_id, linked_entity_id, link_type, confidence, evidence, modalities, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", + (link.id, link.source_entity_id, link.target_entity_id, link.link_type, + link.confidence, link.evidence, + json.dumps([link.source_modality, link.target_modality]), now) + ) + saved_links.append(MultimodalEntityLinkResponse( + link_id=link.id, + source_entity_id=link.source_entity_id, + target_entity_id=link.target_entity_id, + source_modality=link.source_modality, + target_modality=link.target_modality, + link_type=link.link_type, + confidence=link.confidence, + evidence=link.evidence + )) + + conn.commit() + conn.close() + + return MultimodalAlignmentResponse( + project_id=project_id, + aligned_count=len(saved_links), + links=saved_links, + message=f"Successfully aligned {len(saved_links)} cross-modal entity pairs" + ) -@app.get("/api/v1/projects/{project_id}/members") -async def list_team_members(project_id: str): - """列出团队成员""" - if not COLLABORATION_AVAILABLE: - raise HTTPException(status_code=503, detail="Collaboration module not available") + +@app.get("/api/v1/projects/{project_id}/multimodal/stats", response_model=MultimodalStatsResponse, tags=["Multimodal"]) +async def get_multimodal_stats_endpoint(project_id: str, _=Depends(verify_api_key)): + """ + 获取项目多模态统计信息 - manager = get_collab_manager() - members = manager.get_team_members(project_id) + 返回项目中视频、图片数量,以及跨模态实体关联统计 + """ + if not DB_AVAILABLE: + raise HTTPException(status_code=500, detail="Database not available") + + db = get_db_manager() + project = db.get_project(project_id) + if not project: + raise 
HTTPException(status_code=404, detail="Project not found") + + conn = db.get_conn() + + # 统计视频数量 + video_count = conn.execute( + "SELECT COUNT(*) as count FROM videos WHERE project_id = ?", + (project_id,) + ).fetchone()['count'] + + # 统计图片数量 + image_count = conn.execute( + "SELECT COUNT(*) as count FROM images WHERE project_id = ?", + (project_id,) + ).fetchone()['count'] + + # 统计多模态实体提及 + multimodal_count = conn.execute( + "SELECT COUNT(DISTINCT entity_id) as count FROM multimodal_mentions WHERE project_id = ?", + (project_id,) + ).fetchone()['count'] + + # 统计跨模态关联 + cross_modal_count = conn.execute( + "SELECT COUNT(*) as count FROM multimodal_entity_links WHERE entity_id IN (SELECT id FROM entities WHERE project_id = ?)", + (project_id,) + ).fetchone()['count'] + + # 模态分布 + modality_dist = {} + for modality in ['audio', 'video', 'image', 'document']: + count = conn.execute( + "SELECT COUNT(*) as count FROM multimodal_mentions WHERE project_id = ? AND modality = ?", + (project_id, modality) + ).fetchone()['count'] + modality_dist[modality] = count + + conn.close() + + return MultimodalStatsResponse( + project_id=project_id, + video_count=video_count, + image_count=image_count, + multimodal_entity_count=multimodal_count, + cross_modal_links=cross_modal_count, + modality_distribution=modality_dist + ) + + +@app.get("/api/v1/projects/{project_id}/videos", tags=["Multimodal"]) +async def list_project_videos_endpoint(project_id: str, _=Depends(verify_api_key)): + """获取项目的视频列表""" + if not DB_AVAILABLE: + raise HTTPException(status_code=500, detail="Database not available") + + db = get_db_manager() + conn = db.get_conn() + + videos = conn.execute( + """SELECT id, filename, duration, fps, resolution, + full_ocr_text, status, created_at + FROM videos WHERE project_id = ? 
ORDER BY created_at DESC""", + (project_id,) + ).fetchall() + + conn.close() + + return [{ + "id": v['id'], + "filename": v['filename'], + "duration": v['duration'], + "fps": v['fps'], + "resolution": json.loads(v['resolution']) if v['resolution'] else None, + "ocr_preview": v['full_ocr_text'][:200] + "..." if v['full_ocr_text'] and len(v['full_ocr_text']) > 200 else v['full_ocr_text'], + "status": v['status'], + "created_at": v['created_at'] + } for v in videos] + + +@app.get("/api/v1/projects/{project_id}/images", tags=["Multimodal"]) +async def list_project_images_endpoint(project_id: str, _=Depends(verify_api_key)): + """获取项目的图片列表""" + if not DB_AVAILABLE: + raise HTTPException(status_code=500, detail="Database not available") + + db = get_db_manager() + conn = db.get_conn() + + images = conn.execute( + """SELECT id, filename, ocr_text, description, + extracted_entities, status, created_at + FROM images WHERE project_id = ? ORDER BY created_at DESC""", + (project_id,) + ).fetchall() + + conn.close() + + return [{ + "id": img['id'], + "filename": img['filename'], + "ocr_preview": img['ocr_text'][:200] + "..." if img['ocr_text'] and len(img['ocr_text']) > 200 else img['ocr_text'], + "description": img['description'], + "entity_count": len(json.loads(img['extracted_entities'])) if img['extracted_entities'] else 0, + "status": img['status'], + "created_at": img['created_at'] + } for img in images] + + +@app.get("/api/v1/videos/{video_id}/frames", tags=["Multimodal"]) +async def get_video_frames_endpoint(video_id: str, _=Depends(verify_api_key)): + """获取视频的关键帧列表""" + if not DB_AVAILABLE: + raise HTTPException(status_code=500, detail="Database not available") + + db = get_db_manager() + conn = db.get_conn() + + frames = conn.execute( + """SELECT id, frame_number, timestamp, image_url, ocr_text, extracted_entities + FROM video_frames WHERE video_id = ? 
ORDER BY timestamp""", + (video_id,) + ).fetchall() + + conn.close() + + return [{ + "id": f['id'], + "frame_number": f['frame_number'], + "timestamp": f['timestamp'], + "image_url": f['image_url'], + "ocr_text": f['ocr_text'], + "entities": json.loads(f['extracted_entities']) if f['extracted_entities'] else [] + } for f in frames] + + +@app.get("/api/v1/entities/{entity_id}/multimodal-mentions", tags=["Multimodal"]) +async def get_entity_multimodal_mentions_endpoint(entity_id: str, _=Depends(verify_api_key)): + """获取实体的多模态提及信息""" + if not DB_AVAILABLE: + raise HTTPException(status_code=500, detail="Database not available") + + db = get_db_manager() + conn = db.get_conn() + + mentions = conn.execute( + """SELECT m.*, e.name as entity_name + FROM multimodal_mentions m + JOIN entities e ON m.entity_id = e.id + WHERE m.entity_id = ? ORDER BY m.created_at DESC""", + (entity_id,) + ).fetchall() + + conn.close() + + return [{ + "id": m['id'], + "entity_id": m['entity_id'], + "entity_name": m['entity_name'], + "modality": m['modality'], + "source_id": m['source_id'], + "source_type": m['source_type'], + "text_snippet": m['text_snippet'], + "confidence": m['confidence'], + "created_at": m['created_at'] + } for m in mentions] + + +@app.get("/api/v1/projects/{project_id}/multimodal/suggest-merges", tags=["Multimodal"]) +async def suggest_multimodal_merges_endpoint(project_id: str, _=Depends(verify_api_key)): + """ + 建议多模态实体合并 + + 分析不同模态中的实体,建议可以合并的实体对 + """ + if not MULTIMODAL_LINKER_AVAILABLE: + raise HTTPException(status_code=503, detail="Multimodal entity linker not available") + + if not DB_AVAILABLE: + raise HTTPException(status_code=500, detail="Database not available") + + db = get_db_manager() + project = db.get_project(project_id) + if not project: + raise HTTPException(status_code=404, detail="Project not found") + + # 获取所有实体 + entities = db.list_project_entities(project_id) + entity_dicts = [{ + 'id': e.id, + 'name': e.name, + 'type': e.type, + 'definition': 
e.definition, + 'aliases': e.aliases + } for e in entities] + + # 获取现有链接 + conn = db.get_conn() + existing_links = conn.execute( + """SELECT * FROM multimodal_entity_links + WHERE entity_id IN (SELECT id FROM entities WHERE project_id = ?)""", + (project_id,) + ).fetchall() + conn.close() + + existing_link_objects = [] + for row in existing_links: + existing_link_objects.append(EntityLink( + id=row['id'], + project_id=project_id, + source_entity_id=row['entity_id'], + target_entity_id=row['linked_entity_id'], + link_type=row['link_type'], + source_modality='unknown', + target_modality='unknown', + confidence=row['confidence'], + evidence=row['evidence'] or "" + )) + + # 获取建议 + linker = get_multimodal_entity_linker() + suggestions = linker.suggest_entity_merges(entity_dicts, existing_link_objects) return { - "count": len(members), - "members": [ + "project_id": project_id, + "suggestion_count": len(suggestions), + "suggestions": [ { - "id": m.id, - "user_id": m.user_id, - "user_name": m.user_name, - "user_email": m.user_email, - "role": m.role, - "joined_at": m.joined_at, - "last_active_at": m.last_active_at, - "permissions": m.permissions + "entity1": { + "id": s['entity1'].get('id'), + "name": s['entity1'].get('name'), + "type": s['entity1'].get('type') + }, + "entity2": { + "id": s['entity2'].get('id'), + "name": s['entity2'].get('name'), + "type": s['entity2'].get('type') + }, + "similarity": s['similarity'], + "match_type": s['match_type'], + "suggested_action": s['suggested_action'] } - for m in members + for s in suggestions[:20] # 最多返回20个建议 ] } -@app.put("/api/v1/members/{member_id}/role") -async def update_member_role(member_id: str, request: TeamMemberRoleUpdate, updated_by: str = "current_user"): - """更新成员角色""" - if not COLLABORATION_AVAILABLE: - raise HTTPException(status_code=503, detail="Collaboration module not available") + +# ==================== Phase 7: Multimodal Support API ==================== + +class VideoUploadResponse(BaseModel): + 
video_id: str + filename: str + duration: float + fps: float + resolution: Dict[str, int] + frames_extracted: int + audio_extracted: bool + ocr_text_length: int + status: str + message: str + + +class ImageUploadResponse(BaseModel): + image_id: str + filename: str + ocr_text_length: int + description: str + status: str + message: str + + +class MultimodalEntityLinkResponse(BaseModel): + link_id: str + entity_id: str + linked_entity_id: str + link_type: str + confidence: float + evidence: str + modalities: List[str] + + +class MultimodalProfileResponse(BaseModel): + entity_id: str + entity_name: str + + +# ==================== Phase 7 Task 7: Plugin Management Pydantic Models ==================== + +class PluginCreate(BaseModel): + name: str = Field(..., description="插件名称") + plugin_type: str = Field(..., description="插件类型: chrome_extension, feishu_bot, dingtalk_bot, zapier, make, webdav, custom") + project_id: str = Field(..., description="关联项目ID") + config: Dict = Field(default_factory=dict, description="插件配置") + + +class PluginUpdate(BaseModel): + name: Optional[str] = None + status: Optional[str] = None # active, inactive, error, pending + config: Optional[Dict] = None + + +class PluginResponse(BaseModel): + id: str + name: str + plugin_type: str + project_id: str + status: str + config: Dict + created_at: str + updated_at: str + last_used_at: Optional[str] + use_count: int + + +class PluginListResponse(BaseModel): + plugins: List[PluginResponse] + total: int + + +class ChromeExtensionTokenCreate(BaseModel): + name: str = Field(..., description="令牌名称") + project_id: Optional[str] = Field(default=None, description="关联项目ID") + permissions: List[str] = Field(default=["read"], description="权限列表: read, write, delete") + expires_days: Optional[int] = Field(default=None, description="过期天数") + + +class ChromeExtensionTokenResponse(BaseModel): + id: str + token: str = Field(..., description="令牌(仅显示一次)") + name: str + project_id: Optional[str] + permissions: List[str] + 
expires_at: Optional[str] + created_at: str + + +class ChromeExtensionImportRequest(BaseModel): + token: str = Field(..., description="Chrome扩展令牌") + url: str = Field(..., description="网页URL") + title: str = Field(..., description="网页标题") + content: str = Field(..., description="网页正文内容") + html_content: Optional[str] = Field(default=None, description="HTML内容(可选)") + + +class BotSessionCreate(BaseModel): + session_id: str = Field(..., description="群ID或会话ID") + session_name: str = Field(..., description="会话名称") + project_id: Optional[str] = Field(default=None, description="关联项目ID") + webhook_url: str = Field(default="", description="Webhook URL") + secret: str = Field(default="", description="签名密钥") + + +class BotSessionResponse(BaseModel): + id: str + bot_type: str + session_id: str + session_name: str + project_id: Optional[str] + webhook_url: str + is_active: bool + created_at: str + last_message_at: Optional[str] + message_count: int + + +class BotMessageRequest(BaseModel): + session_id: str = Field(..., description="会话ID") + msg_type: str = Field(default="text", description="消息类型: text, audio, file") + content: Dict = Field(default_factory=dict, description="消息内容") + + +class BotMessageResponse(BaseModel): + success: bool + response: str + error: Optional[str] = None + + +class WebhookEndpointCreate(BaseModel): + name: str = Field(..., description="端点名称") + endpoint_type: str = Field(..., description="端点类型: zapier, make, custom") + endpoint_url: str = Field(..., description="Webhook URL") + project_id: Optional[str] = Field(default=None, description="关联项目ID") + auth_type: str = Field(default="none", description="认证类型: none, api_key, oauth, custom") + auth_config: Dict = Field(default_factory=dict, description="认证配置") + trigger_events: List[str] = Field(default_factory=list, description="触发事件列表") + + +class WebhookEndpointResponse(BaseModel): + id: str + name: str + endpoint_type: str + endpoint_url: str + project_id: Optional[str] + auth_type: str + 
trigger_events: List[str] + is_active: bool + created_at: str + last_triggered_at: Optional[str] + trigger_count: int + + +class WebhookTestResponse(BaseModel): + success: bool + endpoint_id: str + message: str + + +class WebDAVSyncCreate(BaseModel): + name: str = Field(..., description="同步配置名称") + project_id: str = Field(..., description="关联项目ID") + server_url: str = Field(..., description="WebDAV服务器URL") + username: str = Field(..., description="用户名") + password: str = Field(..., description="密码") + remote_path: str = Field(default="/insightflow", description="远程路径") + sync_mode: str = Field(default="bidirectional", description="同步模式: bidirectional, upload_only, download_only") + sync_interval: int = Field(default=3600, description="同步间隔(秒)") + + +class WebDAVSyncResponse(BaseModel): + id: str + name: str + project_id: str + server_url: str + username: str + remote_path: str + sync_mode: str + sync_interval: int + last_sync_at: Optional[str] + last_sync_status: str + is_active: bool + created_at: str + sync_count: int + + +class WebDAVTestResponse(BaseModel): + success: bool + message: str + + +class WebDAVSyncResult(BaseModel): + success: bool + message: str + entities_count: Optional[int] = None + relations_count: Optional[int] = None + remote_path: Optional[str] = None + error: Optional[str] = None + + +# Plugin Manager singleton +_plugin_manager_instance = None + +def get_plugin_manager_instance(): + global _plugin_manager_instance + if _plugin_manager_instance is None and PLUGIN_MANAGER_AVAILABLE and DB_AVAILABLE: + db = get_db_manager() + _plugin_manager_instance = get_plugin_manager(db) + return _plugin_manager_instance + + +# ==================== Phase 7 Task 7: Plugin Management Endpoints ==================== + +@app.post("/api/v1/plugins", response_model=PluginResponse, tags=["Plugins"]) +async def create_plugin_endpoint(request: PluginCreate, _=Depends(verify_api_key)): + """ + 创建插件 - manager = get_collab_manager() - success = 
manager.update_member_role(member_id, request.role, updated_by) + 插件类型: + - **chrome_extension**: Chrome 扩展 + - **feishu_bot**: 飞书机器人 + - **dingtalk_bot**: 钉钉机器人 + - **zapier**: Zapier 集成 + - **make**: Make (Integromat) 集成 + - **webdav**: WebDAV 同步 + - **custom**: 自定义插件 + """ + if not PLUGIN_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Plugin manager not available") + + manager = get_plugin_manager_instance() + + plugin = Plugin( + id=str(uuid.uuid4())[:8], + name=request.name, + plugin_type=request.plugin_type, + project_id=request.project_id, + config=request.config + ) + + created = manager.create_plugin(plugin) + + return PluginResponse( + id=created.id, + name=created.name, + plugin_type=created.plugin_type, + project_id=created.project_id, + status=created.status, + config=created.config, + created_at=created.created_at, + updated_at=created.updated_at, + last_used_at=created.last_used_at, + use_count=created.use_count + ) + + +@app.get("/api/v1/plugins", response_model=PluginListResponse, tags=["Plugins"]) +async def list_plugins_endpoint( + project_id: Optional[str] = None, + plugin_type: Optional[str] = None, + status: Optional[str] = None, + _=Depends(verify_api_key) +): + """获取插件列表""" + if not PLUGIN_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Plugin manager not available") + + manager = get_plugin_manager_instance() + plugins = manager.list_plugins(project_id, plugin_type, status) + + return PluginListResponse( + plugins=[ + PluginResponse( + id=p.id, + name=p.name, + plugin_type=p.plugin_type, + project_id=p.project_id, + status=p.status, + config=p.config, + created_at=p.created_at, + updated_at=p.updated_at, + last_used_at=p.last_used_at, + use_count=p.use_count + ) + for p in plugins + ], + total=len(plugins) + ) + + +@app.get("/api/v1/plugins/{plugin_id}", response_model=PluginResponse, tags=["Plugins"]) +async def get_plugin_endpoint(plugin_id: str, _=Depends(verify_api_key)): + """获取插件详情""" + if not 
PLUGIN_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Plugin manager not available") + + manager = get_plugin_manager_instance() + plugin = manager.get_plugin(plugin_id) + + if not plugin: + raise HTTPException(status_code=404, detail="Plugin not found") + + return PluginResponse( + id=plugin.id, + name=plugin.name, + plugin_type=plugin.plugin_type, + project_id=plugin.project_id, + status=plugin.status, + config=plugin.config, + created_at=plugin.created_at, + updated_at=plugin.updated_at, + last_used_at=plugin.last_used_at, + use_count=plugin.use_count + ) + + +@app.patch("/api/v1/plugins/{plugin_id}", response_model=PluginResponse, tags=["Plugins"]) +async def update_plugin_endpoint(plugin_id: str, request: PluginUpdate, _=Depends(verify_api_key)): + """更新插件""" + if not PLUGIN_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Plugin manager not available") + + manager = get_plugin_manager_instance() + + update_data = {k: v for k, v in request.dict().items() if v is not None} + updated = manager.update_plugin(plugin_id, **update_data) + + if not updated: + raise HTTPException(status_code=404, detail="Plugin not found") + + return PluginResponse( + id=updated.id, + name=updated.name, + plugin_type=updated.plugin_type, + project_id=updated.project_id, + status=updated.status, + config=updated.config, + created_at=updated.created_at, + updated_at=updated.updated_at, + last_used_at=updated.last_used_at, + use_count=updated.use_count + ) + + +@app.delete("/api/v1/plugins/{plugin_id}", tags=["Plugins"]) +async def delete_plugin_endpoint(plugin_id: str, _=Depends(verify_api_key)): + """删除插件""" + if not PLUGIN_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Plugin manager not available") + + manager = get_plugin_manager_instance() + success = manager.delete_plugin(plugin_id) if not success: - raise HTTPException(status_code=404, detail="Member not found") + raise HTTPException(status_code=404, detail="Plugin not 
found") - return {"success": True, "message": "Member role updated"} + return {"success": True, "message": "Plugin deleted successfully"} -@app.delete("/api/v1/members/{member_id}") -async def remove_team_member(member_id: str, removed_by: str = "current_user"): - """移除团队成员""" - if not COLLABORATION_AVAILABLE: - raise HTTPException(status_code=503, detail="Collaboration module not available") - - manager = get_collab_manager() - success = manager.remove_team_member(member_id, removed_by) - - if not success: - raise HTTPException(status_code=404, detail="Member not found") - - return {"success": True, "message": "Member removed"} -@app.get("/api/v1/projects/{project_id}/permissions") -async def check_project_permissions(project_id: str, user_id: str = "current_user"): - """检查用户权限""" - if not COLLABORATION_AVAILABLE: - raise HTTPException(status_code=503, detail="Collaboration module not available") +# ==================== Phase 7 Task 7: Chrome Extension Endpoints ==================== + +@app.post("/api/v1/plugins/chrome/tokens", response_model=ChromeExtensionTokenResponse, tags=["Chrome Extension"]) +async def create_chrome_token_endpoint(request: ChromeExtensionTokenCreate, _=Depends(verify_api_key)): + """ + 创建 Chrome 扩展令牌 - manager = get_collab_manager() - members = manager.get_team_members(project_id) + 用于 Chrome 扩展验证和授权 + """ + if not PLUGIN_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Plugin manager not available") - user_member = None - for m in members: - if m.user_id == user_id: - user_member = m - break + manager = get_plugin_manager_instance() + handler = manager.get_handler(PluginType.CHROME_EXTENSION) - if not user_member: - return { - "has_access": False, - "role": None, - "permissions": [] - } + if not handler: + raise HTTPException(status_code=503, detail="Chrome extension handler not available") + + token = handler.create_token( + name=request.name, + project_id=request.project_id, + permissions=request.permissions, + 
expires_days=request.expires_days + ) + + return ChromeExtensionTokenResponse( + id=token.id, + token=token.token, + name=token.name, + project_id=token.project_id, + permissions=token.permissions, + expires_at=token.expires_at, + created_at=token.created_at + ) + + +@app.get("/api/v1/plugins/chrome/tokens", tags=["Chrome Extension"]) +async def list_chrome_tokens_endpoint( + project_id: Optional[str] = None, + _=Depends(verify_api_key) +): + """列出 Chrome 扩展令牌""" + if not PLUGIN_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Plugin manager not available") + + manager = get_plugin_manager_instance() + handler = manager.get_handler(PluginType.CHROME_EXTENSION) + + if not handler: + raise HTTPException(status_code=503, detail="Chrome extension handler not available") + + tokens = handler.list_tokens(project_id=project_id) return { - "has_access": True, - "role": user_member.role, - "permissions": user_member.permissions + "tokens": [ + { + "id": t.id, + "name": t.name, + "project_id": t.project_id, + "permissions": t.permissions, + "expires_at": t.expires_at, + "created_at": t.created_at, + "last_used_at": t.last_used_at, + "use_count": t.use_count, + "is_revoked": t.is_revoked + } + for t in tokens + ], + "total": len(tokens) } +@app.delete("/api/v1/plugins/chrome/tokens/{token_id}", tags=["Chrome Extension"]) +async def revoke_chrome_token_endpoint(token_id: str, _=Depends(verify_api_key)): + """撤销 Chrome 扩展令牌""" + if not PLUGIN_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Plugin manager not available") + + manager = get_plugin_manager_instance() + handler = manager.get_handler(PluginType.CHROME_EXTENSION) + + if not handler: + raise HTTPException(status_code=503, detail="Chrome extension handler not available") + + success = handler.revoke_token(token_id) + + if not success: + raise HTTPException(status_code=404, detail="Token not found") + + return {"success": True, "message": "Token revoked successfully"} + + 
+@app.post("/api/v1/plugins/chrome/import", tags=["Chrome Extension"])
+async def chrome_import_webpage_endpoint(request: ChromeExtensionImportRequest):
+    """
+    Import webpage content from the Chrome extension
+
+    No API key required; authenticated via a Chrome extension token
+    """
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.CHROME_EXTENSION)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="Chrome extension handler not available")
+
+    # Validate the token
+    token = handler.validate_token(request.token)
+    if not token:
+        raise HTTPException(status_code=401, detail="Invalid or expired token")
+
+    # Import the webpage
+    result = await handler.import_webpage(
+        token=token,
+        url=request.url,
+        title=request.title,
+        content=request.content,
+        html_content=request.html_content
+    )
+
+    if not result["success"]:
+        raise HTTPException(status_code=400, detail=result.get("error", "Import failed"))
+
+    return result
+
+
+# ==================== Phase 7 Task 7: Bot Endpoints ====================
+
+@app.post("/api/v1/plugins/bot/feishu/sessions", response_model=BotSessionResponse, tags=["Bot"])
+async def create_feishu_session_endpoint(request: BotSessionCreate, _=Depends(verify_api_key)):
+    """Create a Feishu bot session"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.FEISHU_BOT)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="Feishu bot handler not available")
+
+    session = handler.create_session(
+        session_id=request.session_id,
+        session_name=request.session_name,
+        project_id=request.project_id,
+        webhook_url=request.webhook_url,
+        secret=request.secret
+    )
+
+    return BotSessionResponse(
+        id=session.id,
+        bot_type=session.bot_type,
+        session_id=session.session_id,
+        session_name=session.session_name,
+        project_id=session.project_id,
+        webhook_url=session.webhook_url,
+        is_active=session.is_active,
+        created_at=session.created_at,
+        last_message_at=session.last_message_at,
+        message_count=session.message_count
+    )
+
+
+@app.post("/api/v1/plugins/bot/dingtalk/sessions", response_model=BotSessionResponse, tags=["Bot"])
+async def create_dingtalk_session_endpoint(request: BotSessionCreate, _=Depends(verify_api_key)):
+    """Create a DingTalk bot session"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.DINGTALK_BOT)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="DingTalk bot handler not available")
+
+    session = handler.create_session(
+        session_id=request.session_id,
+        session_name=request.session_name,
+        project_id=request.project_id,
+        webhook_url=request.webhook_url,
+        secret=request.secret
+    )
+
+    return BotSessionResponse(
+        id=session.id,
+        bot_type=session.bot_type,
+        session_id=session.session_id,
+        session_name=session.session_name,
+        project_id=session.project_id,
+        webhook_url=session.webhook_url,
+        is_active=session.is_active,
+        created_at=session.created_at,
+        last_message_at=session.last_message_at,
+        message_count=session.message_count
+    )
+
+
+@app.get("/api/v1/plugins/bot/{bot_type}/sessions", tags=["Bot"])
+async def list_bot_sessions_endpoint(
+    bot_type: str,
+    project_id: Optional[str] = None,
+    _=Depends(verify_api_key)
+):
+    """List bot sessions"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+
+    if bot_type == "feishu":
+        handler = manager.get_handler(PluginType.FEISHU_BOT)
+    elif bot_type == "dingtalk":
+        handler = manager.get_handler(PluginType.DINGTALK_BOT)
+    else:
+        raise HTTPException(status_code=400, detail="Invalid bot type. Must be feishu or dingtalk")
+
+    if not handler:
+        raise HTTPException(status_code=503, detail=f"{bot_type} bot handler not available")
+
+    sessions = handler.list_sessions(project_id=project_id)
+
+    return {
+        "sessions": [
+            {
+                "id": s.id,
+                "bot_type": s.bot_type,
+                "session_id": s.session_id,
+                "session_name": s.session_name,
+                "project_id": s.project_id,
+                "is_active": s.is_active,
+                "created_at": s.created_at,
+                "last_message_at": s.last_message_at,
+                "message_count": s.message_count
+            }
+            for s in sessions
+        ],
+        "total": len(sessions)
+    }
+
+
+@app.post("/api/v1/plugins/bot/{bot_type}/webhook", tags=["Bot"])
+async def bot_webhook_endpoint(bot_type: str, request: Request):
+    """
+    Bot webhook receiver endpoint
+
+    Receives messages from Feishu/DingTalk bots
+    """
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+
+    if bot_type == "feishu":
+        handler = manager.get_handler(PluginType.FEISHU_BOT)
+    elif bot_type == "dingtalk":
+        handler = manager.get_handler(PluginType.DINGTALK_BOT)
+    else:
+        raise HTTPException(status_code=400, detail="Invalid bot type")
+
+    if not handler:
+        raise HTTPException(status_code=503, detail=f"{bot_type} bot handler not available")
+
+    # Parse the message body
+    message = await request.json()
+
+    # Extract the session ID (Feishu and DingTalk use different formats)
+    if bot_type == "feishu":
+        session_id = message.get('chat_id') or message.get('open_chat_id')
+    else:  # dingtalk
+        session_id = message.get('conversationId') or message.get('senderStaffId')
+
+    if not session_id:
+        raise HTTPException(status_code=400, detail="Cannot identify session")
+
+    # Look up the session
+    session = handler.get_session(session_id)
+    if not session:
+        # Auto-create a session
+        session = handler.create_session(
+            session_id=session_id,
+            session_name=f"Auto-{session_id[:8]}",
+            webhook_url=""
+        )
+
+    # Handle the message
+    result = await handler.handle_message(session, message)
+
+    # If a webhook is configured, send the reply
+    if session.webhook_url and result.get("response"):
+        await handler.send_message(session, result["response"])
+
+    return result
+
+
+@app.post("/api/v1/plugins/bot/{bot_type}/sessions/{session_id}/send", tags=["Bot"])
+async def send_bot_message_endpoint(
+    bot_type: str,
+    session_id: str,
+    message: str,
+    _=Depends(verify_api_key)
+):
+    """Send a message to a bot session"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+
+    if bot_type == "feishu":
+        handler = manager.get_handler(PluginType.FEISHU_BOT)
+    elif bot_type == "dingtalk":
+        handler = manager.get_handler(PluginType.DINGTALK_BOT)
+    else:
+        raise HTTPException(status_code=400, detail="Invalid bot type")
+
+    if not handler:
+        raise HTTPException(status_code=503, detail=f"{bot_type} bot handler not available")
+
+    session = handler.get_session(session_id)
+    if not session:
+        raise HTTPException(status_code=404, detail="Session not found")
+
+    success = await handler.send_message(session, message)
+
+    return {"success": success, "message": "Message sent" if success else "Failed to send message"}
+
+
+# ==================== Phase 7 Task 7: Integration Endpoints ====================
+
+@app.post("/api/v1/plugins/integrations/zapier", response_model=WebhookEndpointResponse, tags=["Integrations"])
+async def create_zapier_endpoint(request: WebhookEndpointCreate, _=Depends(verify_api_key)):
+    """Create a Zapier webhook endpoint"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.ZAPIER)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="Zapier handler not available")
+
+    endpoint = handler.create_endpoint(
+        name=request.name,
+        endpoint_url=request.endpoint_url,
+        project_id=request.project_id,
+        auth_type=request.auth_type,
+        auth_config=request.auth_config,
+        trigger_events=request.trigger_events
+    )
+
+    return WebhookEndpointResponse(
+        id=endpoint.id,
+        name=endpoint.name,
+        endpoint_type=endpoint.endpoint_type,
+        endpoint_url=endpoint.endpoint_url,
+        project_id=endpoint.project_id,
+        auth_type=endpoint.auth_type,
+        trigger_events=endpoint.trigger_events,
+        is_active=endpoint.is_active,
+        created_at=endpoint.created_at,
+        last_triggered_at=endpoint.last_triggered_at,
+        trigger_count=endpoint.trigger_count
+    )
+
+
+@app.post("/api/v1/plugins/integrations/make", response_model=WebhookEndpointResponse, tags=["Integrations"])
+async def create_make_endpoint(request: WebhookEndpointCreate, _=Depends(verify_api_key)):
+    """Create a Make (Integromat) webhook endpoint"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.MAKE)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="Make handler not available")
+
+    endpoint = handler.create_endpoint(
+        name=request.name,
+        endpoint_url=request.endpoint_url,
+        project_id=request.project_id,
+        auth_type=request.auth_type,
+        auth_config=request.auth_config,
+        trigger_events=request.trigger_events
+    )
+
+    return WebhookEndpointResponse(
+        id=endpoint.id,
+        name=endpoint.name,
+        endpoint_type=endpoint.endpoint_type,
+        endpoint_url=endpoint.endpoint_url,
+        project_id=endpoint.project_id,
+        auth_type=endpoint.auth_type,
+        trigger_events=endpoint.trigger_events,
+        is_active=endpoint.is_active,
+        created_at=endpoint.created_at,
+        last_triggered_at=endpoint.last_triggered_at,
+        trigger_count=endpoint.trigger_count
+    )
+
+
+@app.get("/api/v1/plugins/integrations/{endpoint_type}", tags=["Integrations"])
+async def list_integration_endpoints_endpoint(
+    endpoint_type: str,
+    project_id: Optional[str] = None,
+    _=Depends(verify_api_key)
+):
+    """List integration endpoints"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+
+    if endpoint_type == "zapier":
+        handler = manager.get_handler(PluginType.ZAPIER)
+    elif endpoint_type == "make":
+        handler = manager.get_handler(PluginType.MAKE)
+    else:
+        raise HTTPException(status_code=400, detail="Invalid endpoint type")
+
+    if not handler:
+        raise HTTPException(status_code=503, detail=f"{endpoint_type} handler not available")
+
+    endpoints = handler.list_endpoints(project_id=project_id)
+
+    return {
+        "endpoints": [
+            {
+                "id": e.id,
+                "name": e.name,
+                "endpoint_type": e.endpoint_type,
+                "endpoint_url": e.endpoint_url,
+                "project_id": e.project_id,
+                "auth_type": e.auth_type,
+                "trigger_events": e.trigger_events,
+                "is_active": e.is_active,
+                "created_at": e.created_at,
+                "last_triggered_at": e.last_triggered_at,
+                "trigger_count": e.trigger_count
+            }
+            for e in endpoints
+        ],
+        "total": len(endpoints)
+    }
+
+
+@app.post("/api/v1/plugins/integrations/{endpoint_id}/test", response_model=WebhookTestResponse, tags=["Integrations"])
+async def test_integration_endpoint(endpoint_id: str, _=Depends(verify_api_key)):
+    """Test an integration endpoint"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+
+    # Try to resolve the endpoint (could be Zapier or Make)
+    handler = manager.get_handler(PluginType.ZAPIER)
+    endpoint = handler.get_endpoint(endpoint_id) if handler else None
+
+    if not endpoint:
+        handler = manager.get_handler(PluginType.MAKE)
+        endpoint = handler.get_endpoint(endpoint_id) if handler else None
+
+    if not endpoint:
+        raise HTTPException(status_code=404, detail="Endpoint not found")
+
+    result = await handler.test_endpoint(endpoint)
+
+    return WebhookTestResponse(
+        success=result["success"],
+        endpoint_id=endpoint_id,
+        message=result["message"]
+    )
+
+
+@app.post("/api/v1/plugins/integrations/{endpoint_id}/trigger", tags=["Integrations"])
+async def trigger_integration_endpoint(
+    endpoint_id: str,
+    event_type: str,
+    data: Dict,
+    _=Depends(verify_api_key)
+):
+    """Manually trigger an integration endpoint"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+
+    # Try to resolve the endpoint (could be Zapier or Make)
+    handler = manager.get_handler(PluginType.ZAPIER)
+    endpoint = handler.get_endpoint(endpoint_id) if handler else None
+
+    if not endpoint:
+        handler = manager.get_handler(PluginType.MAKE)
+        endpoint = handler.get_endpoint(endpoint_id) if handler else None
+
+    if not endpoint:
+        raise HTTPException(status_code=404, detail="Endpoint not found")
+
+    success = await handler.trigger(endpoint, event_type, data)
+
+    return {"success": success, "message": "Triggered successfully" if success else "Trigger failed"}
+
+
+# ==================== Phase 7 Task 7: WebDAV Endpoints ====================
+
+@app.post("/api/v1/plugins/webdav", response_model=WebDAVSyncResponse, tags=["WebDAV"])
+async def create_webdav_sync_endpoint(request: WebDAVSyncCreate, _=Depends(verify_api_key)):
+    """
+    Create a WebDAV sync configuration
+
+    Supports syncing project data with WebDAV drives such as Nutstore
+    """
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.WEBDAV)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="WebDAV handler not available")
+
+    sync = handler.create_sync(
+        name=request.name,
+        project_id=request.project_id,
+        server_url=request.server_url,
+        username=request.username,
+        password=request.password,
+        remote_path=request.remote_path,
+        sync_mode=request.sync_mode,
+        sync_interval=request.sync_interval
+    )
+
+    return WebDAVSyncResponse(
+        id=sync.id,
+        name=sync.name,
+        project_id=sync.project_id,
+        server_url=sync.server_url,
+        username=sync.username,
+        remote_path=sync.remote_path,
+        sync_mode=sync.sync_mode,
+        sync_interval=sync.sync_interval,
+        last_sync_at=sync.last_sync_at,
+        last_sync_status=sync.last_sync_status,
+        is_active=sync.is_active,
+        created_at=sync.created_at,
+        sync_count=sync.sync_count
+    )
+
+
+@app.get("/api/v1/plugins/webdav", tags=["WebDAV"])
+async def list_webdav_syncs_endpoint(
+    project_id: Optional[str] = None,
+    _=Depends(verify_api_key)
+):
+    """List WebDAV sync configurations"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.WEBDAV)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="WebDAV handler not available")
+
+    syncs = handler.list_syncs(project_id=project_id)
+
+    return {
+        "syncs": [
+            {
+                "id": s.id,
+                "name": s.name,
+                "project_id": s.project_id,
+                "server_url": s.server_url,
+                "username": s.username,
+                "remote_path": s.remote_path,
+                "sync_mode": s.sync_mode,
+                "sync_interval": s.sync_interval,
+                "last_sync_at": s.last_sync_at,
+                "last_sync_status": s.last_sync_status,
+                "is_active": s.is_active,
+                "created_at": s.created_at,
+                "sync_count": s.sync_count
+            }
+            for s in syncs
+        ],
+        "total": len(syncs)
+    }
+
+
+@app.post("/api/v1/plugins/webdav/{sync_id}/test", response_model=WebDAVTestResponse, tags=["WebDAV"])
+async def test_webdav_connection_endpoint(sync_id: str, _=Depends(verify_api_key)):
+    """Test the WebDAV connection"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.WEBDAV)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="WebDAV handler not available")
+
+    sync = handler.get_sync(sync_id)
+    if not sync:
+        raise HTTPException(status_code=404, detail="Sync configuration not found")
+
+    result = await handler.test_connection(sync)
+
+    return WebDAVTestResponse(
+        success=result["success"],
+        message=result.get("message") or result.get("error", "Unknown result")
+    )
+
+
+@app.post("/api/v1/plugins/webdav/{sync_id}/sync", response_model=WebDAVSyncResult, tags=["WebDAV"])
+async def sync_webdav_endpoint(sync_id: str, _=Depends(verify_api_key)):
+    """Run a WebDAV sync"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.WEBDAV)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="WebDAV handler not available")
+
+    sync = handler.get_sync(sync_id)
+    if not sync:
+        raise HTTPException(status_code=404, detail="Sync configuration not found")
+
+    result = await handler.sync_project(sync)
+
+    return WebDAVSyncResult(
+        success=result["success"],
+        message=result.get("message") or result.get("error", "Sync completed"),
+        entities_count=result.get("entities_count"),
+        relations_count=result.get("relations_count"),
+        remote_path=result.get("remote_path"),
+        error=result.get("error")
+    )
+
+
+@app.delete("/api/v1/plugins/webdav/{sync_id}", tags=["WebDAV"])
+async def delete_webdav_sync_endpoint(sync_id: str, _=Depends(verify_api_key)):
+    """Delete a WebDAV sync configuration"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager_instance()
+    handler = manager.get_handler(PluginType.WEBDAV)
+
+    if not handler:
+        raise HTTPException(status_code=503, detail="WebDAV handler not available")
+
+    success = handler.delete_sync(sync_id)
+
+    if not success:
+        raise HTTPException(status_code=404, detail="Sync configuration not found")
+
+    return {"success": True, "message": "WebDAV sync configuration deleted"}
+
+
+@app.get("/api/v1/openapi.json", include_in_schema=False)
+async def get_openapi():
+    """Return the OpenAPI specification"""
+    from fastapi.openapi.utils import get_openapi
+    return get_openapi(
+        title=app.title,
+        version=app.version,
+        description=app.description,
+        routes=app.routes,
+        tags=app.openapi_tags
+    )
+
+
+# Serve frontend - MUST be last to not override API routes
+app.mount("/", StaticFiles(directory="frontend", html=True), name="frontend")
+
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
+
+class PluginCreateRequest(BaseModel):
+    name: str
+    plugin_type: str
+    project_id: Optional[str] = None
+    config: Optional[Dict] = {}
+
+
+class PluginResponse(BaseModel):
+    id: str
+    name: str
+    plugin_type: str
+    project_id: Optional[str]
+    status: str
+    api_key: str
+    created_at: str
+
+
+class BotSessionResponse(BaseModel):
+    id: str
+    plugin_id: str
+    platform: str
+    session_id: str
+    user_id: Optional[str]
+    user_name: Optional[str]
+    project_id: Optional[str]
+    message_count: int
+    created_at: str
+    last_message_at: Optional[str]
+
+
+class WebhookEndpointResponse(BaseModel):
+    id: str
+    plugin_id: str
+    name: str
+    endpoint_path: str
+    endpoint_type: str
+    target_project_id: Optional[str]
+    is_active: bool
+    trigger_count: int
+    created_at: str
+
+
+class WebDAVSyncResponse(BaseModel):
+    id: str
+    plugin_id: str
+    name: str
+    server_url: str
+    username: str
+    remote_path: str
+    local_path: str
+    sync_direction: str
+    sync_mode: str
+    auto_analyze: bool
+    is_active: bool
+    last_sync_at: Optional[str]
+    created_at: str
+
+
+class ChromeClipRequest(BaseModel):
+    url: str
+    title: str
+    content: str
+    content_type: str = "page"
+    meta: Optional[Dict] = {}
+    project_id: Optional[str] = None
+
+
+class ChromeClipResponse(BaseModel):
+    clip_id: str
+    project_id: str
+    url: str
+    title: str
+    status: str
+    message: str
+
+
+class BotMessageRequest(BaseModel):
+    platform: str
+    session_id: str
+    user_id: Optional[str] = None
+    user_name: Optional[str] = None
+    message_type: str
+    content: str
+    project_id: Optional[str] = None
+
+
+class BotMessageResponse(BaseModel):
+    success: bool
+    reply: Optional[str] = None
+    session_id: str
+    action: Optional[str] = None
+
+
+class WebhookPayload(BaseModel):
+    event: str
+    data: Dict
+
+
+@app.post("/api/v1/plugins", response_model=PluginResponse, tags=["Plugins"])
+async def create_plugin(
+    request: PluginCreateRequest,
+    api_key: str = Depends(verify_api_key)
+):
+    """Create a plugin"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    plugin = manager.create_plugin(
+        name=request.name,
+        plugin_type=request.plugin_type,
+        project_id=request.project_id,
+        config=request.config
+    )
+
+    return PluginResponse(
+        id=plugin.id,
+        name=plugin.name,
+        plugin_type=plugin.plugin_type,
+        project_id=plugin.project_id,
+        status=plugin.status,
+        api_key=plugin.api_key,
+        created_at=plugin.created_at
+    )
+
+
+@app.get("/api/v1/plugins", tags=["Plugins"])
+async def list_plugins(
+    project_id: Optional[str] = None,
+    plugin_type: Optional[str] = None,
+    api_key: str = Depends(verify_api_key)
+):
+    """List plugins"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    plugins = manager.list_plugins(project_id=project_id, plugin_type=plugin_type)
+
+    return {
+        "plugins": [
+            {
+                "id": p.id,
+                "name": p.name,
+                "plugin_type": p.plugin_type,
+                "project_id": p.project_id,
+                "status": p.status,
+                "use_count": p.use_count,
+                "created_at": p.created_at
+            }
+            for p in plugins
+        ]
+    }
+
+
+@app.get("/api/v1/plugins/{plugin_id}", response_model=PluginResponse, tags=["Plugins"])
+async def get_plugin(
+    plugin_id: str,
+    api_key: str = Depends(verify_api_key)
+):
+    """Get plugin details"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    plugin = manager.get_plugin(plugin_id)
+
+    if not plugin:
+        raise HTTPException(status_code=404, detail="Plugin not found")
+
+    return PluginResponse(
+        id=plugin.id,
+        name=plugin.name,
+        plugin_type=plugin.plugin_type,
+        project_id=plugin.project_id,
+        status=plugin.status,
+        api_key=plugin.api_key,
+        created_at=plugin.created_at
+    )
+
+
+@app.delete("/api/v1/plugins/{plugin_id}", tags=["Plugins"])
+async def delete_plugin(
+    plugin_id: str,
+    api_key: str = Depends(verify_api_key)
+):
+    """Delete a plugin"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    manager.delete_plugin(plugin_id)
+
+    return {"success": True, "message": "Plugin deleted"}
+
+
+@app.post("/api/v1/plugins/{plugin_id}/regenerate-key", tags=["Plugins"])
+async def regenerate_plugin_key(
+    plugin_id: str,
+    api_key: str = Depends(verify_api_key)
+):
+    """Regenerate a plugin's API key"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    new_key = manager.regenerate_api_key(plugin_id)
+
+    return {"success": True, "api_key": new_key}
+
+
+# ==================== Chrome Extension API ====================
+
+@app.post("/api/v1/plugins/chrome/clip", response_model=ChromeClipResponse, tags=["Chrome Extension"])
+async def chrome_clip(
+    request: ChromeClipRequest,
+    x_api_key: Optional[str] = Header(None, alias="X-API-Key")
+):
+    """Save webpage content from the Chrome extension"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    if not x_api_key:
+        raise HTTPException(status_code=401, detail="API Key required")
+
+    manager = get_plugin_manager()
+    plugin = manager.get_plugin_by_api_key(x_api_key)
+
+    if not plugin or plugin.plugin_type != "chrome_extension":
+        raise HTTPException(status_code=401, detail="Invalid API Key")
+
+    # Determine the target project
+    project_id = request.project_id or plugin.project_id
+    if not project_id:
+        raise HTTPException(status_code=400, detail="Project ID required")
+
+    # Create a transcript record (treat the webpage content as a document)
+    db = get_db_manager()
+
+    # Build the document content
+    doc_content = f"""# {request.title}
+
+URL: {request.url}
+
+## Content
+
+{request.content}
+
+## Metadata
+
+{json.dumps(request.meta, ensure_ascii=False, indent=2)}
+"""
+
+    # Create the transcript record
+    transcript_id = db.create_transcript(
+        project_id=project_id,
+        filename=f"clip_{request.title[:50]}.md",
+        full_text=doc_content,
+        transcript_type="document"
+    )
+
+    # Log the activity
+    manager.log_activity(
+        plugin_id=plugin.id,
+        activity_type="clip",
+        source="chrome_extension",
+        details={
+            "url": request.url,
+            "title": request.title,
+            "project_id": project_id,
+            "transcript_id": transcript_id
+        }
+    )
+
+    return ChromeClipResponse(
+        clip_id=str(uuid.uuid4()),
+        project_id=project_id,
+        url=request.url,
+        title=request.title,
+        status="success",
+        message="Content saved successfully"
+    )
+
+
+# ==================== Bot API ====================
+
+@app.post("/api/v1/bots/webhook/{platform}", response_model=BotMessageResponse, tags=["Bot"])
+async def bot_webhook(
+    platform: str,
+    request: Request,
+    x_signature: Optional[str] = Header(None, alias="X-Signature")
+):
+    """Receive bot webhook messages (Feishu/DingTalk/Slack)"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    body = await request.body()
+    payload = json.loads(body)
+
+    manager = get_plugin_manager()
+    handler = BotHandler(manager)
+
+    # Parse the message
+    if platform == "feishu":
+        message = handler.parse_feishu_message(payload)
+    elif platform == "dingtalk":
+        message = handler.parse_dingtalk_message(payload)
+    elif platform == "slack":
+        message = handler.parse_slack_message(payload)
+    else:
+        raise HTTPException(status_code=400, detail=f"Unsupported platform: {platform}")
+
+    # Find or create a session
+    # Simplified handling here; a real implementation should look the session up by plugin_id
+    # For now, return a simple reply
+
+    return BotMessageResponse(
+        success=True,
+        reply="Message received! Please use the InsightFlow console to explore more features.",
+        session_id=message.get("session_id", ""),
+        action="reply"
+    )
+
+
+@app.get("/api/v1/bots/sessions", response_model=List[BotSessionResponse], tags=["Bot"])
+async def list_bot_sessions(
+    plugin_id: Optional[str] = None,
+    project_id: Optional[str] = None,
+    api_key: str = Depends(verify_api_key)
+):
+    """List bot sessions"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    sessions = manager.list_bot_sessions(plugin_id=plugin_id, project_id=project_id)
+
+    return [
+        BotSessionResponse(
+            id=s.id,
+            plugin_id=s.plugin_id,
+            platform=s.platform,
+            session_id=s.session_id,
+            user_id=s.user_id,
+            user_name=s.user_name,
+            project_id=s.project_id,
+            message_count=s.message_count,
+            created_at=s.created_at,
+            last_message_at=s.last_message_at
+        )
+        for s in sessions
+    ]
+
+
+# ==================== Webhook Integration API ====================
+
+@app.post("/api/v1/webhook-endpoints", response_model=WebhookEndpointResponse, tags=["Integrations"])
+async def create_webhook_endpoint(
+    plugin_id: str,
+    name: str,
+    endpoint_type: str,
+    target_project_id: Optional[str] = None,
+    allowed_events: Optional[List[str]] = None,
+    api_key: str = Depends(verify_api_key)
+):
+    """Create a webhook endpoint (for Zapier/Make integrations)"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    endpoint = manager.create_webhook_endpoint(
+        plugin_id=plugin_id,
+        name=name,
+        endpoint_type=endpoint_type,
+        target_project_id=target_project_id,
+        allowed_events=allowed_events
+    )
+
+    return WebhookEndpointResponse(
+        id=endpoint.id,
+        plugin_id=endpoint.plugin_id,
+        name=endpoint.name,
+        endpoint_path=endpoint.endpoint_path,
+        endpoint_type=endpoint.endpoint_type,
+        target_project_id=endpoint.target_project_id,
+        is_active=endpoint.is_active,
+        trigger_count=endpoint.trigger_count,
+        created_at=endpoint.created_at
+    )
+
+
+@app.get("/api/v1/webhook-endpoints", response_model=List[WebhookEndpointResponse], tags=["Integrations"])
+async def list_webhook_endpoints(
+    plugin_id: Optional[str] = None,
+    api_key: str = Depends(verify_api_key)
+):
+    """List webhook endpoints"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    endpoints = manager.list_webhook_endpoints(plugin_id=plugin_id)
+
+    return [
+        WebhookEndpointResponse(
+            id=e.id,
+            plugin_id=e.plugin_id,
+            name=e.name,
+            endpoint_path=e.endpoint_path,
+            endpoint_type=e.endpoint_type,
+            target_project_id=e.target_project_id,
+            is_active=e.is_active,
+            trigger_count=e.trigger_count,
+            created_at=e.created_at
+        )
+        for e in endpoints
+    ]
+
+
+@app.post("/webhook/{endpoint_type}/{token}", tags=["Integrations"])
+async def receive_webhook(
+    endpoint_type: str,
+    token: str,
+    request: Request,
+    x_signature: Optional[str] = Header(None, alias="X-Signature")
+):
+    """Receive external webhook calls (Zapier/Make/custom)"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+
+    # Build the full path to look up the endpoint
+    path = f"/webhook/{endpoint_type}/{token}"
+    endpoint = manager.get_webhook_endpoint_by_path(path)
+
+    if not endpoint or not endpoint.is_active:
+        raise HTTPException(status_code=404, detail="Webhook endpoint not found")
+
+    # Validate the signature (if present)
+    if endpoint.secret and x_signature:
+        body = await request.body()
+        integration = WebhookIntegration(manager)
+        if not integration.validate_signature(body, x_signature, endpoint.secret):
+            raise HTTPException(status_code=401, detail="Invalid signature")
+
+    # Parse the request body
+    body = await request.json()
+
+    # Update trigger statistics
+    manager.update_webhook_trigger(endpoint.id)
+
+    # Log the activity
+    manager.log_activity(
+        plugin_id=endpoint.plugin_id,
+        activity_type="webhook",
+        source=endpoint_type,
+        details={
+            "endpoint_id": endpoint.id,
+            "event": body.get("event"),
+            "data_keys": list(body.get("data", {}).keys())
+        }
+    )
+
+    # Process the data (simplified version)
+    # A real implementation should create documents/entities etc.
+    # from the body based on endpoint.target_project_id
+
+    return {
+        "success": True,
+        "endpoint_id": endpoint.id,
+        "received_at": datetime.now().isoformat()
+    }
+
+
+# ==================== WebDAV API ====================
+
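Before the WebDAV endpoints: the `validate_signature` call inside `receive_webhook` above lives in `plugin_manager` and is not shown in this diff. A common scheme it could follow is an HMAC-SHA256 digest of the raw request body compared in constant time; this is an assumption about the scheme, not the module's actual code:

```python
import hashlib
import hmac


def sign_payload(body: bytes, secret: str) -> str:
    """Hex HMAC-SHA256 of the raw body, as the sender (e.g. Zapier) would compute it."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def validate_signature(body: bytes, signature: str, secret: str) -> bool:
    """Compare the X-Signature header against the expected digest in constant time."""
    expected = sign_payload(body, secret)
    return hmac.compare_digest(expected, signature)
```

Note that the check must run on the raw bytes (`await request.body()`) before JSON parsing, as the endpoint above does, since re-serialized JSON is not guaranteed to be byte-identical.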
+@app.post("/api/v1/webdav-syncs", response_model=WebDAVSyncResponse, tags=["WebDAV"])
+async def create_webdav_sync(
+    plugin_id: str,
+    name: str,
+    server_url: str,
+    username: str,
+    password: str,
+    remote_path: str = "/",
+    local_path: str = "./sync",
+    sync_direction: str = "bidirectional",
+    sync_mode: str = "manual",
+    auto_analyze: bool = True,
+    api_key: str = Depends(verify_api_key)
+):
+    """Create a WebDAV sync configuration"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    sync = manager.create_webdav_sync(
+        plugin_id=plugin_id,
+        name=name,
+        server_url=server_url,
+        username=username,
+        password=password,
+        remote_path=remote_path,
+        local_path=local_path,
+        sync_direction=sync_direction,
+        sync_mode=sync_mode,
+        auto_analyze=auto_analyze
+    )
+
+    return WebDAVSyncResponse(
+        id=sync.id,
+        plugin_id=sync.plugin_id,
+        name=sync.name,
+        server_url=sync.server_url,
+        username=sync.username,
+        remote_path=sync.remote_path,
+        local_path=sync.local_path,
+        sync_direction=sync.sync_direction,
+        sync_mode=sync.sync_mode,
+        auto_analyze=sync.auto_analyze,
+        is_active=sync.is_active,
+        last_sync_at=sync.last_sync_at,
+        created_at=sync.created_at
+    )
+
+
+@app.get("/api/v1/webdav-syncs", response_model=List[WebDAVSyncResponse], tags=["WebDAV"])
+async def list_webdav_syncs(
+    plugin_id: Optional[str] = None,
+    api_key: str = Depends(verify_api_key)
+):
+    """List WebDAV sync configurations"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    syncs = manager.list_webdav_syncs(plugin_id=plugin_id)
+
+    return [
+        WebDAVSyncResponse(
+            id=s.id,
+            plugin_id=s.plugin_id,
+            name=s.name,
+            server_url=s.server_url,
+            username=s.username,
+            remote_path=s.remote_path,
+            local_path=s.local_path,
+            sync_direction=s.sync_direction,
+            sync_mode=s.sync_mode,
+            auto_analyze=s.auto_analyze,
+            is_active=s.is_active,
+            last_sync_at=s.last_sync_at,
+            created_at=s.created_at
+        )
+        for s in syncs
+    ]
+
+
+@app.post("/api/v1/webdav-syncs/{sync_id}/test", tags=["WebDAV"])
+async def test_webdav_connection(
+    sync_id: str,
+    api_key: str = Depends(verify_api_key)
+):
+    """Test the WebDAV connection"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    sync = manager.get_webdav_sync(sync_id)
+
+    if not sync:
+        raise HTTPException(status_code=404, detail="WebDAV sync not found")
+
+    from plugin_manager import WebDAVSync as WebDAVSyncHandler
+    handler = WebDAVSyncHandler(manager)
+
+    success, message = await handler.test_connection(
+        sync.server_url,
+        sync.username,
+        sync.password
+    )
+
+    return {"success": success, "message": message}
+
+
+@app.post("/api/v1/webdav-syncs/{sync_id}/sync", tags=["WebDAV"])
+async def trigger_webdav_sync(
+    sync_id: str,
+    api_key: str = Depends(verify_api_key)
+):
+    """Manually trigger a WebDAV sync"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    sync = manager.get_webdav_sync(sync_id)
+
+    if not sync:
+        raise HTTPException(status_code=404, detail="WebDAV sync not found")
+
+    # An asynchronous sync task should be started here
+    # Simplified version: just report success
+
+    manager.update_webdav_sync(
+        sync_id,
+        last_sync_at=datetime.now().isoformat(),
+        last_sync_status="running"
+    )
+
+    return {
+        "success": True,
+        "sync_id": sync_id,
+        "status": "running",
+        "message": "Sync started"
+    }
+
+
+# ==================== Plugin Activity Logs ====================
+
+@app.get("/api/v1/plugins/{plugin_id}/logs", tags=["Plugins"])
+async def get_plugin_logs(
+    plugin_id: str,
+    activity_type: Optional[str] = None,
+    limit: int = 100,
+    api_key: str = Depends(verify_api_key)
+):
+    """Get plugin activity logs"""
+    if not PLUGIN_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Plugin manager not available")
+
+    manager = get_plugin_manager()
+    logs = manager.get_activity_logs(
+        plugin_id=plugin_id,
+        activity_type=activity_type,
+        limit=limit
+    )
+
+    return {
+        "logs": [
+            {
+                "id": log.id,
+                "activity_type": log.activity_type,
+                "source": log.source,
+                "details": log.details,
+                "created_at": log.created_at
+            }
+            for log in logs
+        ]
+    }
+
+
+# ==================== Phase 7 Task 3: Security & Compliance API ====================
+
+# Pydantic models for security API
+class AuditLogResponse(BaseModel):
+    id: str
+    action_type: str
+    user_id: Optional[str] = None
+    user_ip: Optional[str] = None
+    resource_type: Optional[str] = None
+    resource_id: Optional[str] = None
+    action_details: Optional[str] = None
+    success: bool = True
+    error_message: Optional[str] = None
+    created_at: str
+
+
+class AuditStatsResponse(BaseModel):
+    total_actions: int
+    success_count: int
+    failure_count: int
+    action_breakdown: Dict[str, Dict[str, int]]
+
+
+class EncryptionEnableRequest(BaseModel):
+    master_password: str
+
+
+class EncryptionConfigResponse(BaseModel):
+    id: str
+    project_id: str
+    is_enabled: bool
+    encryption_type: str
+    created_at: str
+    updated_at: str
+
+
+class MaskingRuleCreateRequest(BaseModel):
+    name: str
+    rule_type: str  # phone, email, id_card, bank_card, name, address, custom
+    pattern: Optional[str] = None
+    replacement: Optional[str] = None
+    description: Optional[str] = None
+    priority: int = 0
+
+
+class MaskingRuleResponse(BaseModel):
+    id: str
+    project_id: str
+    name: str
+    rule_type: str
+    pattern: str
+    replacement: str
+    is_active: bool
+    priority: int
+    description: Optional[str] = None
+    created_at: str
+    updated_at: str
+
+
+class MaskingApplyRequest(BaseModel):
+    text: str
+    rule_types: Optional[List[str]] = None
+
+
+class MaskingApplyResponse(BaseModel):
+    original_text: str
+    masked_text: str
+    applied_rules: List[str]
+
+
+class AccessPolicyCreateRequest(BaseModel):
+    name: str
+    description: Optional[str] = None
+    allowed_users: Optional[List[str]] = None
+    allowed_roles: Optional[List[str]] = None
+    allowed_ips: Optional[List[str]] = None
+    time_restrictions: Optional[Dict] = None
+    max_access_count: Optional[int] = None
+    require_approval: bool = False
+
+
+class AccessPolicyResponse(BaseModel):
+    id: str
+    project_id: str
+    name: str
+    description: Optional[str] = None
+    allowed_users: Optional[List[str]] = None
+    allowed_roles: Optional[List[str]] = None
+    allowed_ips: Optional[List[str]] = None
+    time_restrictions: Optional[Dict] = None
+    max_access_count: Optional[int] = None
+    require_approval: bool = False
+    is_active: bool = True
+    created_at: str
+    updated_at: str
+
+
+class AccessRequestCreateRequest(BaseModel):
+    policy_id: str
+    request_reason: Optional[str] = None
+    expires_hours: int = 24
+
+
+class AccessRequestResponse(BaseModel):
+    id: str
+    policy_id: str
+    user_id: str
+    request_reason: Optional[str] = None
+    status: str
+    approved_by: Optional[str] = None
+    approved_at: Optional[str] = None
+    expires_at: Optional[str] = None
+    created_at: str
+
+
+# ==================== Audit Logs API ====================
+
+@app.get("/api/v1/audit-logs", response_model=List[AuditLogResponse], tags=["Security"])
+async def get_audit_logs(
+    user_id: Optional[str] = None,
+    resource_type: Optional[str] = None,
+    resource_id: Optional[str] = None,
+    action_type: Optional[str] = None,
+    start_time: Optional[str] = None,
+    end_time: Optional[str] = None,
+    success: Optional[bool] = None,
+    limit: int = 100,
+    offset: int = 0,
+    api_key: str = Depends(verify_api_key)
+):
+    """Query audit logs"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+    logs = manager.get_audit_logs(
+        user_id=user_id,
+        resource_type=resource_type,
+        resource_id=resource_id,
+        action_type=action_type,
+        start_time=start_time,
+        end_time=end_time,
+        success=success,
+        limit=limit,
+        offset=offset
+    )
+
+    return [
+        AuditLogResponse(
+            id=log.id,
+            action_type=log.action_type,
+            user_id=log.user_id,
+            user_ip=log.user_ip,
+            resource_type=log.resource_type,
+            resource_id=log.resource_id,
+            action_details=log.action_details,
+            success=log.success,
+            error_message=log.error_message,
+            created_at=log.created_at
+        )
+        for log in logs
+    ]
+
+
+@app.get("/api/v1/audit-logs/stats", response_model=AuditStatsResponse, tags=["Security"])
+async def get_audit_stats(
+    start_time: Optional[str] = None,
+    end_time: Optional[str] = None,
+    api_key: str = Depends(verify_api_key)
+):
+    """Get audit statistics"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+    stats = manager.get_audit_stats(start_time=start_time, end_time=end_time)
+
+    return AuditStatsResponse(**stats)
+
+
+# ==================== Encryption API ====================
+
+@app.post("/api/v1/projects/{project_id}/encryption/enable", response_model=EncryptionConfigResponse, tags=["Security"])
+async def enable_project_encryption(
+    project_id: str,
+    request: EncryptionEnableRequest,
+    api_key: str = Depends(verify_api_key)
+):
+    """Enable end-to-end encryption for a project"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+
+    try:
+        config = manager.enable_encryption(project_id, request.master_password)
+        return EncryptionConfigResponse(
+            id=config.id,
+            project_id=config.project_id,
+            is_enabled=config.is_enabled,
+            encryption_type=config.encryption_type,
+            created_at=config.created_at,
+            updated_at=config.updated_at
+        )
+    except RuntimeError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+
+
+@app.post("/api/v1/projects/{project_id}/encryption/disable", tags=["Security"])
+async def disable_project_encryption(
+    project_id: str,
+    request: EncryptionEnableRequest,
+    api_key: str = Depends(verify_api_key)
+):
+    """Disable project encryption"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise
HTTPException(status_code=503, detail="Security manager not available") + + manager = get_security_manager() + success = manager.disable_encryption(project_id, request.master_password) + + if not success: + raise HTTPException(status_code=400, detail="Invalid password or encryption not enabled") + + return {"success": True, "message": "Encryption disabled successfully"} + + +@app.post("/api/v1/projects/{project_id}/encryption/verify", tags=["Security"]) +async def verify_encryption_password( + project_id: str, + request: EncryptionEnableRequest, + api_key: str = Depends(verify_api_key) +): + """验证加密密码""" + if not SECURITY_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Security manager not available") + + manager = get_security_manager() + is_valid = manager.verify_encryption_password(project_id, request.master_password) + + return {"valid": is_valid} + + +@app.get("/api/v1/projects/{project_id}/encryption", response_model=Optional[EncryptionConfigResponse], tags=["Security"]) +async def get_encryption_config( + project_id: str, + api_key: str = Depends(verify_api_key) +): + """获取项目加密配置""" + if not SECURITY_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Security manager not available") + + manager = get_security_manager() + config = manager.get_encryption_config(project_id) + + if not config: + return None + + return EncryptionConfigResponse( + id=config.id, + project_id=config.project_id, + is_enabled=config.is_enabled, + encryption_type=config.encryption_type, + created_at=config.created_at, + updated_at=config.updated_at + ) + + +# ==================== Data Masking API ==================== + +@app.post("/api/v1/projects/{project_id}/masking-rules", response_model=MaskingRuleResponse, tags=["Security"]) +async def create_masking_rule( + project_id: str, + request: MaskingRuleCreateRequest, + api_key: str = Depends(verify_api_key) +): + """创建数据脱敏规则""" + if not SECURITY_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, 
detail="Security manager not available") + + manager = get_security_manager() + + try: + rule_type = MaskingRuleType(request.rule_type) + except ValueError: + raise HTTPException(status_code=400, detail=f"Invalid rule type: {request.rule_type}") + + rule = manager.create_masking_rule( + project_id=project_id, + name=request.name, + rule_type=rule_type, + pattern=request.pattern, + replacement=request.replacement, + description=request.description, + priority=request.priority + ) + + return MaskingRuleResponse( + id=rule.id, + project_id=rule.project_id, + name=rule.name, + rule_type=rule.rule_type, + pattern=rule.pattern, + replacement=rule.replacement, + is_active=rule.is_active, + priority=rule.priority, + description=rule.description, + created_at=rule.created_at, + updated_at=rule.updated_at + ) + + +@app.get("/api/v1/projects/{project_id}/masking-rules", response_model=List[MaskingRuleResponse], tags=["Security"]) +async def get_masking_rules( + project_id: str, + active_only: bool = True, + api_key: str = Depends(verify_api_key) +): + """获取项目脱敏规则""" + if not SECURITY_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Security manager not available") + + manager = get_security_manager() + rules = manager.get_masking_rules(project_id, active_only=active_only) + + return [ + MaskingRuleResponse( + id=rule.id, + project_id=rule.project_id, + name=rule.name, + rule_type=rule.rule_type, + pattern=rule.pattern, + replacement=rule.replacement, + is_active=rule.is_active, + priority=rule.priority, + description=rule.description, + created_at=rule.created_at, + updated_at=rule.updated_at + ) + for rule in rules + ] + + +@app.put("/api/v1/masking-rules/{rule_id}", response_model=MaskingRuleResponse, tags=["Security"]) +async def update_masking_rule( + rule_id: str, + name: Optional[str] = None, + pattern: Optional[str] = None, + replacement: Optional[str] = None, + is_active: Optional[bool] = None, + priority: Optional[int] = None, + description: 
Optional[str] = None, + api_key: str = Depends(verify_api_key) +): + """更新脱敏规则""" + if not SECURITY_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Security manager not available") + + manager = get_security_manager() + + kwargs = {} + if name is not None: + kwargs["name"] = name + if pattern is not None: + kwargs["pattern"] = pattern + if replacement is not None: + kwargs["replacement"] = replacement + if is_active is not None: + kwargs["is_active"] = is_active + if priority is not None: + kwargs["priority"] = priority + if description is not None: + kwargs["description"] = description + + rule = manager.update_masking_rule(rule_id, **kwargs) + + if not rule: + raise HTTPException(status_code=404, detail="Masking rule not found") + + return MaskingRuleResponse( + id=rule.id, + project_id=rule.project_id, + name=rule.name, + rule_type=rule.rule_type, + pattern=rule.pattern, + replacement=rule.replacement, + is_active=rule.is_active, + priority=rule.priority, + description=rule.description, + created_at=rule.created_at, + updated_at=rule.updated_at + ) + + +@app.delete("/api/v1/masking-rules/{rule_id}", tags=["Security"]) +async def delete_masking_rule( + rule_id: str, + api_key: str = Depends(verify_api_key) +): + """删除脱敏规则""" + if not SECURITY_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Security manager not available") + + manager = get_security_manager() + success = manager.delete_masking_rule(rule_id) + + if not success: + raise HTTPException(status_code=404, detail="Masking rule not found") + + return {"success": True, "message": "Masking rule deleted"} + + +@app.post("/api/v1/projects/{project_id}/masking/apply", response_model=MaskingApplyResponse, tags=["Security"]) +async def apply_masking( + project_id: str, + request: MaskingApplyRequest, + api_key: str = Depends(verify_api_key) +): + """应用脱敏规则到文本""" + if not SECURITY_MANAGER_AVAILABLE: + raise HTTPException(status_code=503, detail="Security manager not available") 
+
+    manager = get_security_manager()
+
+    # 转换规则类型
+    rule_types = None
+    if request.rule_types:
+        rule_types = [MaskingRuleType(rt) for rt in request.rule_types]
+
+    masked_text = manager.apply_masking(request.text, project_id, rule_types)
+
+    # 近似返回:列出项目内所有活跃规则(当前未逐条追踪实际命中的规则)
+    rules = manager.get_masking_rules(project_id)
+    applied_rules = [r.name for r in rules if r.is_active]
+
+    return MaskingApplyResponse(
+        original_text=request.text,
+        masked_text=masked_text,
+        applied_rules=applied_rules
+    )
+
+
+# ==================== Data Access Policy API ====================
+
+@app.post("/api/v1/projects/{project_id}/access-policies", response_model=AccessPolicyResponse, tags=["Security"])
+async def create_access_policy(
+    project_id: str,
+    request: AccessPolicyCreateRequest,
+    api_key: str = Depends(verify_api_key)
+):
+    """创建数据访问策略"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+
+    policy = manager.create_access_policy(
+        project_id=project_id,
+        name=request.name,
+        description=request.description,
+        allowed_users=request.allowed_users,
+        allowed_roles=request.allowed_roles,
+        allowed_ips=request.allowed_ips,
+        time_restrictions=request.time_restrictions,
+        max_access_count=request.max_access_count,
+        require_approval=request.require_approval
+    )
+
+    return AccessPolicyResponse(
+        id=policy.id,
+        project_id=policy.project_id,
+        name=policy.name,
+        description=policy.description,
+        allowed_users=json.loads(policy.allowed_users) if policy.allowed_users else None,
+        allowed_roles=json.loads(policy.allowed_roles) if policy.allowed_roles else None,
+        allowed_ips=json.loads(policy.allowed_ips) if policy.allowed_ips else None,
+        time_restrictions=json.loads(policy.time_restrictions) if policy.time_restrictions else None,
+        max_access_count=policy.max_access_count,
+        require_approval=policy.require_approval,
+        is_active=policy.is_active,
+        created_at=policy.created_at,
+        updated_at=policy.updated_at
+    )
+
+
+@app.get("/api/v1/projects/{project_id}/access-policies", response_model=List[AccessPolicyResponse], tags=["Security"])
+async def get_access_policies(
+    project_id: str,
+    active_only: bool = True,
+    api_key: str = Depends(verify_api_key)
+):
+    """获取项目访问策略"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+    policies = manager.get_access_policies(project_id, active_only=active_only)
+
+    return [
+        AccessPolicyResponse(
+            id=policy.id,
+            project_id=policy.project_id,
+            name=policy.name,
+            description=policy.description,
+            allowed_users=json.loads(policy.allowed_users) if policy.allowed_users else None,
+            allowed_roles=json.loads(policy.allowed_roles) if policy.allowed_roles else None,
+            allowed_ips=json.loads(policy.allowed_ips) if policy.allowed_ips else None,
+            time_restrictions=json.loads(policy.time_restrictions) if policy.time_restrictions else None,
+            max_access_count=policy.max_access_count,
+            require_approval=policy.require_approval,
+            is_active=policy.is_active,
+            created_at=policy.created_at,
+            updated_at=policy.updated_at
+        )
+        for policy in policies
+    ]
+
+
+@app.post("/api/v1/access-policies/{policy_id}/check", tags=["Security"])
+async def check_access_permission(
+    policy_id: str,
+    user_id: str,
+    user_ip: Optional[str] = None,
+    api_key: str = Depends(verify_api_key)
+):
+    """检查访问权限"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+    allowed, reason = manager.check_access_permission(policy_id, user_id, user_ip)
+
+    return {
+        "allowed": allowed,
+        "reason": reason if not allowed else None
+    }
+
+
+# ==================== Access Request API ====================
+
+@app.post("/api/v1/access-requests", response_model=AccessRequestResponse, tags=["Security"])
+async def create_access_request(
+    request: AccessRequestCreateRequest,
+    user_id: str,  # 实际应该从认证信息中获取
+    api_key: str = Depends(verify_api_key)
+):
+    """创建访问请求"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+
+    access_request = manager.create_access_request(
+        policy_id=request.policy_id,
+        user_id=user_id,
+        request_reason=request.request_reason,
+        expires_hours=request.expires_hours
+    )
+
+    return AccessRequestResponse(
+        id=access_request.id,
+        policy_id=access_request.policy_id,
+        user_id=access_request.user_id,
+        request_reason=access_request.request_reason,
+        status=access_request.status,
+        approved_by=access_request.approved_by,
+        approved_at=access_request.approved_at,
+        expires_at=access_request.expires_at,
+        created_at=access_request.created_at
+    )
+
+
+@app.post("/api/v1/access-requests/{request_id}/approve", response_model=AccessRequestResponse, tags=["Security"])
+async def approve_access_request(
+    request_id: str,
+    approved_by: str,
+    expires_hours: int = 24,
+    api_key: str = Depends(verify_api_key)
+):
+    """批准访问请求"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+    access_request = manager.approve_access_request(request_id, approved_by, expires_hours)
+
+    if not access_request:
+        raise HTTPException(status_code=404, detail="Access request not found")
+
+    return AccessRequestResponse(
+        id=access_request.id,
+        policy_id=access_request.policy_id,
+        user_id=access_request.user_id,
+        request_reason=access_request.request_reason,
+        status=access_request.status,
+        approved_by=access_request.approved_by,
+        approved_at=access_request.approved_at,
+        expires_at=access_request.expires_at,
+        created_at=access_request.created_at
+    )
+
+
+@app.post("/api/v1/access-requests/{request_id}/reject", response_model=AccessRequestResponse, tags=["Security"])
+async def reject_access_request(
+    request_id: str,
+    rejected_by: str,
+    api_key: str = Depends(verify_api_key)
+):
+    """拒绝访问请求"""
+    if not SECURITY_MANAGER_AVAILABLE:
+        raise HTTPException(status_code=503, detail="Security manager not available")
+
+    manager = get_security_manager()
+    access_request = manager.reject_access_request(request_id, rejected_by)
+
+    if not access_request:
+        raise HTTPException(status_code=404, detail="Access request not found")
+
+    return AccessRequestResponse(
+        id=access_request.id,
+        policy_id=access_request.policy_id,
+        user_id=access_request.user_id,
+        request_reason=access_request.request_reason,
+        status=access_request.status,
+        approved_by=access_request.approved_by,
+        approved_at=access_request.approved_at,
+        expires_at=access_request.expires_at,
+        created_at=access_request.created_at
+    )
+
+
 # Serve frontend - MUST be last to not override API routes
 app.mount("/", StaticFiles(directory="frontend", html=True), name="frontend")
diff --git a/backend/multimodal_entity_linker.py b/backend/multimodal_entity_linker.py
new file mode 100644
index 0000000..2b8bc7d
--- /dev/null
+++ b/backend/multimodal_entity_linker.py
@@ -0,0 +1,514 @@
+#!/usr/bin/env python3
+"""
+InsightFlow Multimodal Entity Linker - Phase 7
+多模态实体关联模块:跨模态实体对齐和知识融合
+"""
+
+import os
+import json
+import uuid
+from typing import List, Dict, Optional, Tuple, Set
+from dataclasses import dataclass
+from difflib import SequenceMatcher
+
+# 尝试导入embedding库
+try:
+    import numpy as np
+    NUMPY_AVAILABLE = True
+except ImportError:
+    NUMPY_AVAILABLE = False
+
+
+@dataclass
+class MultimodalEntity:
+    """多模态实体"""
+    id: str
+    entity_id: str
+    project_id: str
+    name: str
+    source_type: str  # audio, video, image, document
+    source_id: str
+    mention_context: str
+    confidence: float
+    modality_features: Dict = None  # 模态特定特征
+
+    def __post_init__(self):
+        if self.modality_features is None:
+            self.modality_features = {}
+
+
+@dataclass
+class EntityLink:
+    """实体关联"""
+    id: str
+    project_id: str
+    source_entity_id: str
+    target_entity_id: str
+    link_type: str  # same_as, related_to, part_of
+    source_modality: str
+    target_modality: str
+    confidence: float
+    evidence: str
+
+
+@dataclass
+class AlignmentResult:
+    """对齐结果"""
+    entity_id: str
+    matched_entity_id: Optional[str]
+    similarity: float
+    match_type: str  # exact, fuzzy, embedding
+    confidence: float
+
+
+@dataclass
+class FusionResult:
+    """知识融合结果"""
+    canonical_entity_id: str
+    merged_entity_ids: List[str]
+    fused_properties: Dict
+    source_modalities: List[str]
+    confidence: float
+
+
+class MultimodalEntityLinker:
+    """多模态实体关联器 - 跨模态实体对齐和知识融合"""
+
+    # 关联类型
+    LINK_TYPES = {
+        'same_as': '同一实体',
+        'related_to': '相关实体',
+        'part_of': '组成部分',
+        'mentions': '提及关系'
+    }
+
+    # 模态类型
+    MODALITIES = ['audio', 'video', 'image', 'document']
+
+    def __init__(self, similarity_threshold: float = 0.85):
+        """
+        初始化多模态实体关联器
+
+        Args:
+            similarity_threshold: 相似度阈值
+        """
+        self.similarity_threshold = similarity_threshold
+
+    def calculate_string_similarity(self, s1: str, s2: str) -> float:
+        """
+        计算字符串相似度
+
+        Args:
+            s1: 字符串1
+            s2: 字符串2
+
+        Returns:
+            相似度分数 (0-1)
+        """
+        if not s1 or not s2:
+            return 0.0
+
+        s1, s2 = s1.lower().strip(), s2.lower().strip()
+
+        # 完全匹配
+        if s1 == s2:
+            return 1.0
+
+        # 包含关系
+        if s1 in s2 or s2 in s1:
+            return 0.9
+
+        # 编辑距离相似度
+        return SequenceMatcher(None, s1, s2).ratio()
+
+    def calculate_entity_similarity(self, entity1: Dict, entity2: Dict) -> Tuple[float, str]:
+        """
+        计算两个实体的综合相似度
+
+        Args:
+            entity1: 实体1信息
+            entity2: 实体2信息
+
+        Returns:
+            (相似度, 匹配类型)
+        """
+        # 名称相似度
+        name_sim = self.calculate_string_similarity(
+            entity1.get('name', ''),
+            entity2.get('name', '')
+        )
+
+        # 如果名称完全匹配
+        if name_sim == 1.0:
+            return 1.0, 'exact'
+
+        # 检查别名
+        aliases1 = set(a.lower() for a in entity1.get('aliases', []))
+        aliases2 = set(a.lower() for a in entity2.get('aliases', []))
+
+        if aliases1 & aliases2:  # 有共同别名
+            return 0.95, 'alias_match'
+
+        if entity2.get('name', '').lower() in aliases1:
+            return 0.95, 'alias_match'
+
+        if entity1.get('name', '').lower() in aliases2:
+            return 0.95, 'alias_match'
+
+        # 定义相似度
+        def_sim = self.calculate_string_similarity(
+            entity1.get('definition', ''),
+            entity2.get('definition', '')
+        )
+
+        # 综合相似度
+        combined_sim = name_sim * 0.7 + def_sim * 0.3
+
+        if combined_sim >= self.similarity_threshold:
+            return combined_sim, 'fuzzy'
+
+        return combined_sim, 'none'
+
+    def find_matching_entity(self, query_entity: Dict,
+                             candidate_entities: List[Dict],
+                             exclude_ids: Set[str] = None) -> Optional[AlignmentResult]:
+        """
+        在候选实体中查找匹配的实体
+
+        Args:
+            query_entity: 查询实体
+            candidate_entities: 候选实体列表
+            exclude_ids: 排除的实体ID
+
+        Returns:
+            对齐结果
+        """
+        exclude_ids = exclude_ids or set()
+        best_match = None
+        best_match_type = 'none'  # 显式初始化,避免未赋值引用
+        best_similarity = 0.0
+
+        for candidate in candidate_entities:
+            if candidate.get('id') in exclude_ids:
+                continue
+
+            similarity, match_type = self.calculate_entity_similarity(
+                query_entity, candidate
+            )
+
+            if similarity > best_similarity and similarity >= self.similarity_threshold:
+                best_similarity = similarity
+                best_match = candidate
+                best_match_type = match_type
+
+        if best_match:
+            return AlignmentResult(
+                entity_id=query_entity.get('id'),
+                matched_entity_id=best_match.get('id'),
+                similarity=best_similarity,
+                match_type=best_match_type,
+                confidence=best_similarity
+            )
+
+        return None
+
+    def align_cross_modal_entities(self, project_id: str,
+                                   audio_entities: List[Dict],
+                                   video_entities: List[Dict],
+                                   image_entities: List[Dict],
+                                   document_entities: List[Dict]) -> List[EntityLink]:
+        """
+        跨模态实体对齐
+
+        Args:
+            project_id: 项目ID
+            audio_entities: 音频模态实体
+            video_entities: 视频模态实体
+            image_entities: 图片模态实体
+            document_entities: 文档模态实体
+
+        Returns:
+            实体关联列表
+        """
+        links = []
+
+        # 合并所有实体
+        all_entities = {
+            'audio': audio_entities,
+            'video': video_entities,
+            'image': image_entities,
+            'document': document_entities
+        }
+
+        # 跨模态对齐
+        for mod1 in self.MODALITIES:
+            for mod2 in self.MODALITIES:
+                if mod1 >= mod2:  # 避免重复比较
+                    continue
+
+                entities1 = all_entities.get(mod1, [])
+                entities2 = all_entities.get(mod2, [])
+
+                for ent1 in entities1:
+                    # 在另一个模态中查找匹配
+                    result = self.find_matching_entity(ent1, entities2)
+
+                    if result and result.matched_entity_id:
+                        link = EntityLink(
+                            id=str(uuid.uuid4())[:8],
+                            project_id=project_id,
+                            source_entity_id=ent1.get('id'),
+                            target_entity_id=result.matched_entity_id,
+                            link_type='same_as' if result.similarity > 0.95 else 'related_to',
+                            source_modality=mod1,
+                            target_modality=mod2,
+                            confidence=result.confidence,
+                            evidence=f"Cross-modal alignment: {result.match_type}"
+                        )
+                        links.append(link)
+
+        return links
+
+    def fuse_entity_knowledge(self, entity_id: str,
+                              linked_entities: List[Dict],
+                              multimodal_mentions: List[Dict]) -> FusionResult:
+        """
+        融合多模态实体知识
+
+        Args:
+            entity_id: 主实体ID
+            linked_entities: 关联的实体信息列表
+            multimodal_mentions: 多模态提及列表
+
+        Returns:
+            融合结果
+        """
+        # 收集所有属性
+        fused_properties = {
+            'names': set(),
+            'definitions': [],
+            'aliases': set(),
+            'types': set(),
+            'modalities': set(),
+            'contexts': []
+        }
+
+        merged_ids = []
+
+        for entity in linked_entities:
+            merged_ids.append(entity.get('id'))
+
+            # 收集名称
+            fused_properties['names'].add(entity.get('name', ''))
+
+            # 收集定义
+            if entity.get('definition'):
+                fused_properties['definitions'].append(entity.get('definition'))
+
+            # 收集别名
+            fused_properties['aliases'].update(entity.get('aliases', []))
+
+            # 收集类型
+            fused_properties['types'].add(entity.get('type', 'OTHER'))
+
+        # 收集模态和上下文
+        for mention in multimodal_mentions:
+            fused_properties['modalities'].add(mention.get('source_type', ''))
+            if mention.get('mention_context'):
+                fused_properties['contexts'].append(mention.get('mention_context'))
+
+        # 选择最佳定义(最长的那个)
+        best_definition = max(fused_properties['definitions'], key=len) \
+            if fused_properties['definitions'] else ""
+
+        # 选择最佳名称(最常见的那个)
+        from collections import Counter
+        name_counts = Counter(fused_properties['names'])
+        best_name = name_counts.most_common(1)[0][0] if name_counts else ""
+
+        # 构建融合结果
+        return FusionResult(
+            canonical_entity_id=entity_id,
+            merged_entity_ids=merged_ids,
+            fused_properties={
+                'name': best_name,
+                'definition': best_definition,
+                'aliases': list(fused_properties['aliases']),
+                'types': list(fused_properties['types']),
+                'modalities': list(fused_properties['modalities']),
+                'contexts': fused_properties['contexts'][:10]  # 最多10个上下文
+            },
+            source_modalities=list(fused_properties['modalities']),
+            confidence=min(1.0, len(linked_entities) * 0.2 + 0.5)
+        )
+
+    def detect_entity_conflicts(self, entities: List[Dict]) -> List[Dict]:
+        """
+        检测实体冲突(同名但不同义)
+
+        Args:
+            entities: 实体列表
+
+        Returns:
+            冲突列表
+        """
+        conflicts = []
+
+        # 按名称分组
+        name_groups = {}
+        for entity in entities:
+            name = entity.get('name', '').lower()
+            if name:
+                if name not in name_groups:
+                    name_groups[name] = []
+                name_groups[name].append(entity)
+
+        # 检测同名但定义不同的实体
+        for name, group in name_groups.items():
+            if len(group) > 1:
+                # 检查定义是否相似
+                definitions = [e.get('definition', '') for e in group if e.get('definition')]
+
+                if len(definitions) > 1:
+                    # 计算定义之间的相似度
+                    sim_matrix = []
+                    for i, d1 in enumerate(definitions):
+                        for j, d2 in enumerate(definitions):
+                            if i < j:
+                                sim = self.calculate_string_similarity(d1, d2)
+                                sim_matrix.append(sim)
+
+                    # 如果定义相似度都很低,可能是冲突
+                    if sim_matrix and all(s < 0.5 for s in sim_matrix):
+                        conflicts.append({
+                            'name': name,
+                            'entities': group,
+                            'type': 'homonym_conflict',
+                            'suggestion': 'Consider disambiguating these entities'
+                        })
+
+        return conflicts
+
+    def suggest_entity_merges(self, entities: List[Dict],
+                              existing_links: List[EntityLink] = None) -> List[Dict]:
+        """
+        建议实体合并
+
+        Args:
+            entities: 实体列表
+            existing_links: 现有实体关联
+
+        Returns:
+            合并建议列表
+        """
+        suggestions = []
+        existing_pairs = set()
+
+        # 记录已有的关联
+        if existing_links:
+            for link in existing_links:
+                pair = tuple(sorted([link.source_entity_id, link.target_entity_id]))
+                existing_pairs.add(pair)
+
+        # 检查所有实体对
+        for i, ent1 in enumerate(entities):
+            for j, ent2 in enumerate(entities):
+                if i >= j:
+                    continue
+
+                # 检查是否已有关联
+                pair = tuple(sorted([ent1.get('id'), ent2.get('id')]))
+                if pair in existing_pairs:
+                    continue
+
+                # 计算相似度
+                similarity, match_type = self.calculate_entity_similarity(ent1, ent2)
+
+                if similarity >= self.similarity_threshold:
+                    suggestions.append({
+                        'entity1': ent1,
+                        'entity2': ent2,
+                        'similarity': similarity,
+                        'match_type': match_type,
+                        'suggested_action': 'merge' if similarity > 0.95 else 'link'
+                    })
+
+        # 按相似度排序
+        suggestions.sort(key=lambda x: x['similarity'], reverse=True)
+
+        return suggestions
+
+    def create_multimodal_entity_record(self, project_id: str,
+                                        entity_id: str,
+                                        source_type: str,
+                                        source_id: str,
+                                        mention_context: str = "",
+                                        confidence: float = 1.0) -> MultimodalEntity:
+        """
+        创建多模态实体记录
+
+        Args:
+            project_id: 项目ID
+            entity_id: 实体ID
+            source_type: 来源类型
+            source_id: 来源ID
+            mention_context: 提及上下文
+            confidence: 置信度
+
+        Returns:
+            多模态实体记录
+        """
+        return MultimodalEntity(
+            id=str(uuid.uuid4())[:8],
+            entity_id=entity_id,
+            project_id=project_id,
+            name="",  # 将在后续填充
+            source_type=source_type,
+            source_id=source_id,
+            mention_context=mention_context,
+            confidence=confidence
+        )
+
+    def analyze_modality_distribution(self, multimodal_entities: List[MultimodalEntity]) -> Dict:
+        """
+        分析模态分布
+
+        Args:
+            multimodal_entities: 多模态实体列表
+
+        Returns:
+            模态分布统计
+        """
+        distribution = {mod: 0 for mod in self.MODALITIES}
+        cross_modal_entities = set()
+
+        # 统计每个模态的实体数
+        for me in multimodal_entities:
+            if me.source_type in distribution:
+                distribution[me.source_type] += 1
+
+        # 统计跨模态实体
+        entity_modalities = {}
+        for me in multimodal_entities:
+            if me.entity_id not in entity_modalities:
+                entity_modalities[me.entity_id] = set()
+            entity_modalities[me.entity_id].add(me.source_type)
+
+        cross_modal_count = sum(1 for mods in entity_modalities.values() if len(mods) > 1)
+
+        return {
+            'modality_distribution': distribution,
+            'total_multimodal_records': len(multimodal_entities),
+            'unique_entities': len(entity_modalities),
+            'cross_modal_entities': cross_modal_count,
+            'cross_modal_ratio': cross_modal_count / len(entity_modalities) if entity_modalities else 0
+        }
+
+
+# Singleton instance
+_multimodal_entity_linker = None
+
+def get_multimodal_entity_linker(similarity_threshold: float = 0.85) -> MultimodalEntityLinker:
+    """获取多模态实体关联器单例"""
+    global _multimodal_entity_linker
+    if _multimodal_entity_linker is None:
+        _multimodal_entity_linker = MultimodalEntityLinker(similarity_threshold)
+    return _multimodal_entity_linker
diff --git a/backend/multimodal_processor.py b/backend/multimodal_processor.py
new file mode 100644
index 0000000..522e0c5
--- /dev/null
+++ b/backend/multimodal_processor.py
@@ -0,0 +1,434 @@
+#!/usr/bin/env python3
+"""
+InsightFlow Multimodal Processor - Phase 7
+视频处理模块:提取音频、关键帧、OCR识别
+"""
+
+import os
+import json
+import uuid
+import tempfile
+import subprocess
+from typing import List, Dict, Optional, Tuple
+from dataclasses import dataclass
+from pathlib import Path
+
+# 尝试导入OCR库
+try:
+    import pytesseract
+    from PIL import Image
+    PYTESSERACT_AVAILABLE = True
+except ImportError:
+    PYTESSERACT_AVAILABLE = False
+
+try:
+    import cv2
+    CV2_AVAILABLE = True
+except ImportError:
+    CV2_AVAILABLE = False
+
+try:
+    import ffmpeg
+    FFMPEG_AVAILABLE = True
+except ImportError:
+    FFMPEG_AVAILABLE = False
+
+
+@dataclass
+class VideoFrame:
+    """视频关键帧数据类"""
+    id: str
+    video_id: str
+    frame_number: int
+    timestamp: float
+    frame_path: str
+    ocr_text: str = ""
+    ocr_confidence: float = 0.0
+    entities_detected: List[Dict] = None
+
+    def __post_init__(self):
+        if self.entities_detected is None:
+            self.entities_detected = []
+
+
+@dataclass
+class VideoInfo:
+    """视频信息数据类"""
+    id: str
+    project_id: str
+    filename: str
+    file_path: str
+    duration: float = 0.0
+    width: int = 0
+    height: int = 0
+    fps: float = 0.0
+    audio_extracted: bool = False
+    audio_path: str = ""
+    transcript_id: str = ""
+    status: str = "pending"
+    error_message: str = ""
+    metadata: Dict = None
+
+    def __post_init__(self):
+        if self.metadata is None:
+            self.metadata = {}
+
+
+@dataclass
+class VideoProcessingResult:
+    """视频处理结果"""
+    video_id: str
+    audio_path: str
+    frames: List[VideoFrame]
+    ocr_results: List[Dict]
+    full_text: str  # 整合的文本(音频转录 + OCR文本)
+    success: bool
+    error_message: str = ""
+
+
+class MultimodalProcessor:
+    """多模态处理器 - 处理视频文件"""
+
+    def __init__(self, temp_dir: str = None, frame_interval: int = 5):
+        """
+        初始化多模态处理器
+
+        Args:
+            temp_dir: 临时文件目录
+            frame_interval: 关键帧提取间隔(秒)
+        """
+        self.temp_dir = temp_dir or tempfile.gettempdir()
+        self.frame_interval = frame_interval
+        self.video_dir = os.path.join(self.temp_dir, "videos")
+        self.frames_dir = os.path.join(self.temp_dir, "frames")
+        self.audio_dir = os.path.join(self.temp_dir, "audio")
+
+        # 创建目录
+        os.makedirs(self.video_dir, exist_ok=True)
+        os.makedirs(self.frames_dir, exist_ok=True)
+        os.makedirs(self.audio_dir, exist_ok=True)
+
+    def extract_video_info(self, video_path: str) -> Dict:
+        """
+        提取视频基本信息
+
+        Args:
+            video_path: 视频文件路径
+
+        Returns:
+            视频信息字典
+        """
+        try:
+            if FFMPEG_AVAILABLE:
+                probe = ffmpeg.probe(video_path)
+                video_stream = next((s for s in probe['streams'] if s['codec_type'] == 'video'), None)
+                audio_stream = next((s for s in probe['streams'] if s['codec_type'] == 'audio'), None)
+
+                if video_stream:
+                    # 安全解析帧率字符串(如 "30000/1001"),避免对 probe 输出使用 eval
+                    num, _, den = video_stream.get('r_frame_rate', '0/1').partition('/')
+                    fps = float(num) / float(den) if den and float(den) != 0 else float(num or 0)
+                    return {
+                        'duration': float(probe['format'].get('duration', 0)),
+                        'width': int(video_stream.get('width', 0)),
+                        'height': int(video_stream.get('height', 0)),
+                        'fps': fps,
+                        'has_audio': audio_stream is not None,
+                        'bitrate': int(probe['format'].get('bit_rate', 0))
+                    }
+            else:
+                # 使用 ffprobe 命令行(-show_entries 只能出现一次,多个节用冒号分隔)
+                cmd = [
+                    'ffprobe', '-v', 'error', '-show_entries',
+                    'format=duration,bit_rate:stream=width,height,r_frame_rate',
+                    '-of', 'json', video_path
+                ]
+                result = subprocess.run(cmd, capture_output=True, text=True)
+                if result.returncode == 0:
+                    data = json.loads(result.stdout)
+                    return {
+                        'duration': float(data['format'].get('duration', 0)),
+                        'width': int(data['streams'][0].get('width', 0)) if data['streams'] else 0,
+                        'height': int(data['streams'][0].get('height', 0)) if data['streams'] else 0,
+                        'fps': 30.0,  # 默认值
+                        'has_audio': len(data['streams']) > 1,
+                        'bitrate': int(data['format'].get('bit_rate', 0))
+                    }
+        except Exception as e:
+            print(f"Error extracting video info: {e}")
+
+        return {
+            'duration': 0,
+            'width': 0,
+            'height': 0,
+            'fps': 0,
+            'has_audio': False,
+            'bitrate': 0
+        }
+
+    def extract_audio(self, video_path: str, output_path: str = None) -> str:
+        """
+        从视频中提取音频
+
+        Args:
+            video_path: 视频文件路径
+            output_path: 输出音频路径(可选)
+
+        Returns:
+            提取的音频文件路径
+        """
+        if output_path is None:
+            video_name = Path(video_path).stem
+            output_path = os.path.join(self.audio_dir, f"{video_name}.wav")
+
+        try:
+            if FFMPEG_AVAILABLE:
+                (
+                    ffmpeg
+                    .input(video_path)
+                    .output(output_path, ac=1, ar=16000, vn=None)
+                    .overwrite_output()
+                    .run(quiet=True)
+                )
+            else:
+                # 使用命令行 ffmpeg
+                cmd = [
+                    'ffmpeg', '-i', video_path,
+                    '-vn', '-acodec', 'pcm_s16le',
+                    '-ac', '1', '-ar', '16000',
+                    '-y', output_path
+                ]
+                subprocess.run(cmd, check=True, capture_output=True)
+
+            return output_path
+        except Exception as e:
+            print(f"Error extracting audio: {e}")
+            raise
+
+    def extract_keyframes(self, video_path: str, video_id: str,
+                          interval: int = None) -> List[str]:
+        """
+        从视频中提取关键帧
+
+        Args:
+            video_path: 视频文件路径
+            video_id: 视频ID
+            interval: 提取间隔(秒),默认使用初始化时的间隔
+
+        Returns:
+            提取的帧文件路径列表
+        """
+        interval = interval or self.frame_interval
+        frame_paths = []
+
+        # 创建帧存储目录
+        video_frames_dir = os.path.join(self.frames_dir, video_id)
+        os.makedirs(video_frames_dir, exist_ok=True)
+
+        try:
+            if CV2_AVAILABLE:
+                # 使用 OpenCV 提取帧
+                cap = cv2.VideoCapture(video_path)
+                fps = cap.get(cv2.CAP_PROP_FPS)
+                total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+
+                frame_interval_frames = int(fps * interval)
+                frame_number = 0
+
+                while True:
+                    ret, frame = cap.read()
+                    if not ret:
+                        break
+
+                    if frame_number % frame_interval_frames == 0:
+                        timestamp = frame_number / fps
+                        frame_path = os.path.join(
+                            video_frames_dir,
+                            f"frame_{frame_number:06d}_{timestamp:.2f}.jpg"
+                        )
+                        cv2.imwrite(frame_path, frame)
+                        frame_paths.append(frame_path)
+
+                    frame_number += 1
+
+                cap.release()
+            else:
+                # 使用 ffmpeg 命令行提取帧(image2 输出模板只支持 %d 序号占位符,
+                # 时间戳由后续按序号 * 间隔推算)
+                output_pattern = os.path.join(video_frames_dir, "frame_%06d.jpg")
+
+                cmd = [
+                    'ffmpeg', '-i', video_path,
+                    '-vf', f'fps=1/{interval}',
+                    '-y', output_pattern
+                ]
+                subprocess.run(cmd, check=True, capture_output=True)
+
+                # 获取生成的帧文件列表
+                frame_paths = sorted([
+                    os.path.join(video_frames_dir, f)
+                    for f in os.listdir(video_frames_dir)
+                    if f.startswith('frame_')
+                ])
+        except Exception as e:
+            print(f"Error extracting keyframes: {e}")
+
+        return frame_paths
+
+    def perform_ocr(self, image_path: str) -> Tuple[str, float]:
+        """
+        对图片进行OCR识别
+
+        Args:
+            image_path: 图片文件路径
+
+        Returns:
+            (识别的文本, 置信度)
+        """
+        if not PYTESSERACT_AVAILABLE:
+            return "", 0.0
+
+        try:
+            image = Image.open(image_path)
+
+            # 预处理:转换为灰度图
+            if image.mode != 'L':
+                image = image.convert('L')
+
+            # 使用 pytesseract 进行 OCR
+            text = pytesseract.image_to_string(image, lang='chi_sim+eng')
+
+            # 获取置信度数据
+            data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
+            confidences = [int(c) for c in data['conf'] if int(c) > 0]
+            avg_confidence = sum(confidences) / len(confidences) if confidences else 0
+
+            return text.strip(), avg_confidence / 100.0
+        except Exception as e:
+            print(f"OCR error for {image_path}: {e}")
+            return "", 0.0
+
+    def process_video(self, video_data: bytes, filename: str,
+                      project_id: str, video_id: str = None) -> VideoProcessingResult:
+        """
+        处理视频文件:提取音频、关键帧、OCR
+
+        Args:
+            video_data: 视频文件二进制数据
+            filename: 视频文件名
+            project_id: 项目ID
+            video_id: 视频ID(可选,自动生成)
+
+        Returns:
+            视频处理结果
+        """
+        video_id = video_id or str(uuid.uuid4())[:8]
+
+        try:
+            # 保存视频文件
+            video_path = os.path.join(self.video_dir, f"{video_id}_{filename}")
+            with open(video_path, 'wb') as f:
+                f.write(video_data)
+
+            # 提取视频信息
+            video_info = self.extract_video_info(video_path)
+
+            # 提取音频
+            audio_path = ""
+            if video_info['has_audio']:
+                audio_path = self.extract_audio(video_path)
+
+            # 提取关键帧
+            frame_paths = self.extract_keyframes(video_path, video_id)
+
+            # 对关键帧进行 OCR
+            frames = []
+            ocr_results = []
+            all_ocr_text = []
+
+            for i, frame_path in enumerate(frame_paths):
+                # 解析帧信息
+                frame_name = os.path.basename(frame_path)
+                parts = frame_name.replace('.jpg', '').split('_')
+                frame_number = int(parts[1]) if len(parts) > 1 else i
+                timestamp = float(parts[2]) if len(parts) > 2 else i * self.frame_interval
+
+                # OCR 识别
+                ocr_text, confidence = self.perform_ocr(frame_path)
+
+                frame = VideoFrame(
+                    id=str(uuid.uuid4())[:8],
+                    video_id=video_id,
+                    frame_number=frame_number,
+                    timestamp=timestamp,
+                    frame_path=frame_path,
+                    ocr_text=ocr_text,
+                    ocr_confidence=confidence
+                )
+                frames.append(frame)
+
+                if ocr_text:
+                    ocr_results.append({
+                        'frame_number': frame_number,
+                        'timestamp': timestamp,
+                        'text': ocr_text,
+                        'confidence': confidence
+                    })
+                    all_ocr_text.append(ocr_text)
+
+            # 整合所有 OCR 文本
+            full_ocr_text = "\n\n".join(all_ocr_text)
+
+            return VideoProcessingResult(
+                video_id=video_id,
+                audio_path=audio_path,
+                frames=frames,
+                ocr_results=ocr_results,
+                full_text=full_ocr_text,
+                success=True
+            )
+
+        except Exception as e:
+            return VideoProcessingResult(
+                video_id=video_id,
+                audio_path="",
+                frames=[],
+                ocr_results=[],
+                full_text="",
+                success=False,
+                error_message=str(e)
+            )
+
+    def cleanup(self, video_id: str = None):
+        """
+        清理临时文件
+
+        Args:
+            video_id: 视频ID(可选,清理特定视频的文件)
+        """
+        import shutil
+
+        if video_id:
+            # 清理特定视频的文件
+            for dir_path in [self.video_dir, self.frames_dir, self.audio_dir]:
+                target_dir = os.path.join(dir_path, video_id) if dir_path == self.frames_dir else dir_path
+                if os.path.exists(target_dir):
+                    for f in os.listdir(target_dir):
+                        if
video_id in f: + os.remove(os.path.join(target_dir, f)) + else: + # 清理所有临时文件 + for dir_path in [self.video_dir, self.frames_dir, self.audio_dir]: + if os.path.exists(dir_path): + shutil.rmtree(dir_path) + os.makedirs(dir_path, exist_ok=True) + + +# Singleton instance +_multimodal_processor = None + +def get_multimodal_processor(temp_dir: str = None, frame_interval: int = 5) -> MultimodalProcessor: + """获取多模态处理器单例""" + global _multimodal_processor + if _multimodal_processor is None: + _multimodal_processor = MultimodalProcessor(temp_dir, frame_interval) + return _multimodal_processor diff --git a/backend/plugin_manager.py b/backend/plugin_manager.py new file mode 100644 index 0000000..0c59845 --- /dev/null +++ b/backend/plugin_manager.py @@ -0,0 +1,1366 @@ +#!/usr/bin/env python3 +""" +InsightFlow Plugin Manager - Phase 7 Task 7 +插件与集成系统:Chrome插件、飞书/钉钉机器人、Zapier/Make集成、WebDAV同步 +""" + +import os +import json +import hashlib +import hmac +import base64 +import time +import uuid +import httpx +import asyncio +from datetime import datetime +from typing import Dict, List, Optional, Any, Callable +from dataclasses import dataclass, field +from enum import Enum +import sqlite3 + +# WebDAV 支持 +try: + import webdav4.client as webdav_client + WEBDAV_AVAILABLE = True +except ImportError: + WEBDAV_AVAILABLE = False + + +class PluginType(Enum): + """插件类型""" + CHROME_EXTENSION = "chrome_extension" + FEISHU_BOT = "feishu_bot" + DINGTALK_BOT = "dingtalk_bot" + ZAPIER = "zapier" + MAKE = "make" + WEBDAV = "webdav" + CUSTOM = "custom" + + +class PluginStatus(Enum): + """插件状态""" + ACTIVE = "active" + INACTIVE = "inactive" + ERROR = "error" + PENDING = "pending" + + +@dataclass +class Plugin: + """插件配置""" + id: str + name: str + plugin_type: str + project_id: str + status: str = "active" + config: Dict = field(default_factory=dict) + created_at: str = "" + updated_at: str = "" + last_used_at: Optional[str] = None + use_count: int = 0 + + +@dataclass +class PluginConfig: + """插件详细配置""" 
+ id: str + plugin_id: str + config_key: str + config_value: str + is_encrypted: bool = False + created_at: str = "" + updated_at: str = "" + + +@dataclass +class BotSession: + """机器人会话""" + id: str + bot_type: str # feishu, dingtalk + session_id: str # 群ID或会话ID + session_name: str + project_id: Optional[str] = None + webhook_url: str = "" + secret: str = "" + is_active: bool = True + created_at: str = "" + updated_at: str = "" + last_message_at: Optional[str] = None + message_count: int = 0 + + +@dataclass +class WebhookEndpoint: + """Webhook 端点配置(Zapier/Make集成)""" + id: str + name: str + endpoint_type: str # zapier, make, custom + endpoint_url: str + project_id: Optional[str] = None + auth_type: str = "none" # none, api_key, oauth, custom + auth_config: Dict = field(default_factory=dict) + trigger_events: List[str] = field(default_factory=list) + is_active: bool = True + created_at: str = "" + updated_at: str = "" + last_triggered_at: Optional[str] = None + trigger_count: int = 0 + + +@dataclass +class WebDAVSync: + """WebDAV 同步配置""" + id: str + name: str + project_id: str + server_url: str + username: str + password: str = "" # 加密存储 + remote_path: str = "/insightflow" + sync_mode: str = "bidirectional" # bidirectional, upload_only, download_only + sync_interval: int = 3600 # 秒 + last_sync_at: Optional[str] = None + last_sync_status: str = "pending" # pending, success, failed + last_sync_error: str = "" + is_active: bool = True + created_at: str = "" + updated_at: str = "" + sync_count: int = 0 + + +@dataclass +class ChromeExtensionToken: + """Chrome 扩展令牌""" + id: str + token: str + user_id: Optional[str] = None + project_id: Optional[str] = None + name: str = "" + permissions: List[str] = field(default_factory=lambda: ["read", "write"]) + expires_at: Optional[str] = None + created_at: str = "" + last_used_at: Optional[str] = None + use_count: int = 0 + is_revoked: bool = False + + +class PluginManager: + """插件管理主类""" + + def __init__(self, db_manager=None): + 
self.db = db_manager + self._handlers = {} + self._register_default_handlers() + + def _register_default_handlers(self): + """注册默认处理器""" + self._handlers[PluginType.CHROME_EXTENSION] = ChromeExtensionHandler(self) + self._handlers[PluginType.FEISHU_BOT] = BotHandler(self, "feishu") + self._handlers[PluginType.DINGTALK_BOT] = BotHandler(self, "dingtalk") + self._handlers[PluginType.ZAPIER] = WebhookIntegration(self, "zapier") + self._handlers[PluginType.MAKE] = WebhookIntegration(self, "make") + self._handlers[PluginType.WEBDAV] = WebDAVSyncManager(self) + + def get_handler(self, plugin_type: PluginType) -> Optional[Any]: + """获取插件处理器""" + return self._handlers.get(plugin_type) + + # ==================== Plugin CRUD ==================== + + def create_plugin(self, plugin: Plugin) -> Plugin: + """创建插件""" + conn = self.db.get_conn() + now = datetime.now().isoformat() + + conn.execute( + """INSERT INTO plugins + (id, name, plugin_type, project_id, status, config, created_at, updated_at, use_count) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (plugin.id, plugin.name, plugin.plugin_type, plugin.project_id, + plugin.status, json.dumps(plugin.config), now, now, 0) + ) + conn.commit() + conn.close() + + plugin.created_at = now + plugin.updated_at = now + return plugin + + def get_plugin(self, plugin_id: str) -> Optional[Plugin]: + """获取插件""" + conn = self.db.get_conn() + row = conn.execute( + "SELECT * FROM plugins WHERE id = ?", (plugin_id,) + ).fetchone() + conn.close() + + if row: + return self._row_to_plugin(row) + return None + + def list_plugins(self, project_id: str = None, plugin_type: str = None, + status: str = None) -> List[Plugin]: + """列出插件""" + conn = self.db.get_conn() + + conditions = [] + params = [] + + if project_id: + conditions.append("project_id = ?") + params.append(project_id) + if plugin_type: + conditions.append("plugin_type = ?") + params.append(plugin_type) + if status: + conditions.append("status = ?") + params.append(status) + + where_clause = " 
AND ".join(conditions) if conditions else "1=1" + + rows = conn.execute( + f"SELECT * FROM plugins WHERE {where_clause} ORDER BY created_at DESC", + params + ).fetchall() + conn.close() + + return [self._row_to_plugin(row) for row in rows] + + def update_plugin(self, plugin_id: str, **kwargs) -> Optional[Plugin]: + """更新插件""" + conn = self.db.get_conn() + + allowed_fields = ['name', 'status', 'config'] + updates = [] + values = [] + + for field in allowed_fields: + if field in kwargs: + updates.append(f"{field} = ?") + if field == 'config': + values.append(json.dumps(kwargs[field])) + else: + values.append(kwargs[field]) + + if not updates: + conn.close() + return self.get_plugin(plugin_id) + + updates.append("updated_at = ?") + values.append(datetime.now().isoformat()) + values.append(plugin_id) + + query = f"UPDATE plugins SET {', '.join(updates)} WHERE id = ?" + conn.execute(query, values) + conn.commit() + conn.close() + + return self.get_plugin(plugin_id) + + def delete_plugin(self, plugin_id: str) -> bool: + """删除插件""" + conn = self.db.get_conn() + + # 删除关联的配置 + conn.execute("DELETE FROM plugin_configs WHERE plugin_id = ?", (plugin_id,)) + + # 删除插件 + cursor = conn.execute("DELETE FROM plugins WHERE id = ?", (plugin_id,)) + conn.commit() + conn.close() + + return cursor.rowcount > 0 + + def _row_to_plugin(self, row: sqlite3.Row) -> Plugin: + """将数据库行转换为 Plugin 对象""" + return Plugin( + id=row['id'], + name=row['name'], + plugin_type=row['plugin_type'], + project_id=row['project_id'], + status=row['status'], + config=json.loads(row['config']) if row['config'] else {}, + created_at=row['created_at'], + updated_at=row['updated_at'], + last_used_at=row['last_used_at'], + use_count=row['use_count'] + ) + + # ==================== Plugin Config ==================== + + def set_plugin_config(self, plugin_id: str, key: str, value: str, + is_encrypted: bool = False) -> PluginConfig: + """设置插件配置""" + conn = self.db.get_conn() + now = datetime.now().isoformat() + + # 
检查是否已存在 + existing = conn.execute( + "SELECT id FROM plugin_configs WHERE plugin_id = ? AND config_key = ?", + (plugin_id, key) + ).fetchone() + + if existing: + conn.execute( + """UPDATE plugin_configs + SET config_value = ?, is_encrypted = ?, updated_at = ? + WHERE id = ?""", + (value, is_encrypted, now, existing['id']) + ) + config_id = existing['id'] + else: + config_id = str(uuid.uuid4())[:8] + conn.execute( + """INSERT INTO plugin_configs + (id, plugin_id, config_key, config_value, is_encrypted, created_at, updated_at) + VALUES (?, ?, ?, ?, ?, ?, ?)""", + (config_id, plugin_id, key, value, is_encrypted, now, now) + ) + + conn.commit() + conn.close() + + return PluginConfig( + id=config_id, + plugin_id=plugin_id, + config_key=key, + config_value=value, + is_encrypted=is_encrypted, + created_at=now, + updated_at=now + ) + + def get_plugin_config(self, plugin_id: str, key: str) -> Optional[str]: + """获取插件配置""" + conn = self.db.get_conn() + row = conn.execute( + "SELECT config_value FROM plugin_configs WHERE plugin_id = ? AND config_key = ?", + (plugin_id, key) + ).fetchone() + conn.close() + + return row['config_value'] if row else None + + def get_all_plugin_configs(self, plugin_id: str) -> Dict[str, str]: + """获取插件所有配置""" + conn = self.db.get_conn() + rows = conn.execute( + "SELECT config_key, config_value FROM plugin_configs WHERE plugin_id = ?", + (plugin_id,) + ).fetchall() + conn.close() + + return {row['config_key']: row['config_value'] for row in rows} + + def delete_plugin_config(self, plugin_id: str, key: str) -> bool: + """删除插件配置""" + conn = self.db.get_conn() + cursor = conn.execute( + "DELETE FROM plugin_configs WHERE plugin_id = ? AND config_key = ?", + (plugin_id, key) + ) + conn.commit() + conn.close() + + return cursor.rowcount > 0 + + def record_plugin_usage(self, plugin_id: str): + """记录插件使用""" + conn = self.db.get_conn() + now = datetime.now().isoformat() + + conn.execute( + """UPDATE plugins + SET use_count = use_count + 1, last_used_at = ? 
+ WHERE id = ?""", + (now, plugin_id) + ) + conn.commit() + conn.close() + + +class ChromeExtensionHandler: + """Chrome 扩展处理器""" + + def __init__(self, plugin_manager: PluginManager): + self.pm = plugin_manager + + def create_token(self, name: str, user_id: str = None, project_id: str = None, + permissions: List[str] = None, expires_days: int = None) -> ChromeExtensionToken: + """创建 Chrome 扩展令牌""" + token_id = str(uuid.uuid4())[:8] + + # 生成随机令牌 + raw_token = f"if_ext_{base64.urlsafe_b64encode(os.urandom(32)).decode('utf-8').rstrip('=')}" + + # 哈希存储 + token_hash = hashlib.sha256(raw_token.encode()).hexdigest() + + now = datetime.now().isoformat() + expires_at = None + if expires_days: + from datetime import timedelta + expires_at = (datetime.now() + timedelta(days=expires_days)).isoformat() + + conn = self.pm.db.get_conn() + conn.execute( + """INSERT INTO chrome_extension_tokens + (id, token_hash, user_id, project_id, name, permissions, expires_at, + created_at, is_revoked, use_count) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (token_id, token_hash, user_id, project_id, name, + json.dumps(permissions or ["read"]), expires_at, now, False, 0) + ) + conn.commit() + conn.close() + + return ChromeExtensionToken( + id=token_id, + token=raw_token, # 仅返回一次 + user_id=user_id, + project_id=project_id, + name=name, + permissions=permissions or ["read"], + expires_at=expires_at, + created_at=now + ) + + def validate_token(self, token: str) -> Optional[ChromeExtensionToken]: + """验证 Chrome 扩展令牌""" + token_hash = hashlib.sha256(token.encode()).hexdigest() + + conn = self.pm.db.get_conn() + row = conn.execute( + """SELECT * FROM chrome_extension_tokens + WHERE token_hash = ? 
AND is_revoked = 0""", + (token_hash,) + ).fetchone() + conn.close() + + if not row: + return None + + # 检查是否过期 + if row['expires_at'] and datetime.now().isoformat() > row['expires_at']: + return None + + # 更新使用记录 + now = datetime.now().isoformat() + conn = self.pm.db.get_conn() + conn.execute( + """UPDATE chrome_extension_tokens + SET use_count = use_count + 1, last_used_at = ? + WHERE id = ?""", + (now, row['id']) + ) + conn.commit() + conn.close() + + return ChromeExtensionToken( + id=row['id'], + token="", # 不返回实际令牌 + user_id=row['user_id'], + project_id=row['project_id'], + name=row['name'], + permissions=json.loads(row['permissions']), + expires_at=row['expires_at'], + created_at=row['created_at'], + last_used_at=now, + use_count=row['use_count'] + 1 + ) + + def revoke_token(self, token_id: str) -> bool: + """撤销令牌""" + conn = self.pm.db.get_conn() + cursor = conn.execute( + "UPDATE chrome_extension_tokens SET is_revoked = 1 WHERE id = ?", + (token_id,) + ) + conn.commit() + conn.close() + + return cursor.rowcount > 0 + + def list_tokens(self, user_id: str = None, project_id: str = None) -> List[ChromeExtensionToken]: + """列出令牌""" + conn = self.pm.db.get_conn() + + conditions = ["is_revoked = 0"] + params = [] + + if user_id: + conditions.append("user_id = ?") + params.append(user_id) + if project_id: + conditions.append("project_id = ?") + params.append(project_id) + + where_clause = " AND ".join(conditions) + + rows = conn.execute( + f"SELECT * FROM chrome_extension_tokens WHERE {where_clause} ORDER BY created_at DESC", + params + ).fetchall() + conn.close() + + tokens = [] + for row in rows: + tokens.append(ChromeExtensionToken( + id=row['id'], + token="", # 不返回实际令牌 + user_id=row['user_id'], + project_id=row['project_id'], + name=row['name'], + permissions=json.loads(row['permissions']), + expires_at=row['expires_at'], + created_at=row['created_at'], + last_used_at=row['last_used_at'], + use_count=row['use_count'], + is_revoked=bool(row['is_revoked']) + )) 
+ + return tokens + + async def import_webpage(self, token: ChromeExtensionToken, url: str, title: str, + content: str, html_content: str = None) -> Dict: + """导入网页内容""" + if not token.project_id: + return {"success": False, "error": "Token not associated with any project"} + + if "write" not in token.permissions: + return {"success": False, "error": "Insufficient permissions"} + + # 创建转录记录(将网页作为文档处理) + transcript_id = str(uuid.uuid4())[:8] + now = datetime.now().isoformat() + + # 构建完整文本 + full_text = f"# {title}\n\nURL: {url}\n\n{content}" + + conn = self.pm.db.get_conn() + conn.execute( + """INSERT INTO transcripts + (id, project_id, filename, full_text, type, created_at) + VALUES (?, ?, ?, ?, ?, ?)""", + (transcript_id, token.project_id, f"web_{title[:50]}.md", full_text, "webpage", now) + ) + conn.commit() + conn.close() + + return { + "success": True, + "transcript_id": transcript_id, + "project_id": token.project_id, + "url": url, + "title": title, + "content_length": len(content) + } + + +class BotHandler: + """飞书/钉钉机器人处理器""" + + def __init__(self, plugin_manager: PluginManager, bot_type: str): + self.pm = plugin_manager + self.bot_type = bot_type + + def create_session(self, session_id: str, session_name: str, + project_id: str = None, webhook_url: str = "", + secret: str = "") -> BotSession: + """创建机器人会话""" + bot_id = str(uuid.uuid4())[:8] + now = datetime.now().isoformat() + + conn = self.pm.db.get_conn() + conn.execute( + """INSERT INTO bot_sessions + (id, bot_type, session_id, session_name, project_id, webhook_url, secret, + is_active, created_at, updated_at, message_count) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (bot_id, self.bot_type, session_id, session_name, project_id, webhook_url, secret, + True, now, now, 0) + ) + conn.commit() + conn.close() + + return BotSession( + id=bot_id, + bot_type=self.bot_type, + session_id=session_id, + session_name=session_name, + project_id=project_id, + webhook_url=webhook_url, + secret=secret, + 
is_active=True, + created_at=now, + updated_at=now + ) + + def get_session(self, session_id: str) -> Optional[BotSession]: + """获取会话""" + conn = self.pm.db.get_conn() + row = conn.execute( + """SELECT * FROM bot_sessions + WHERE session_id = ? AND bot_type = ?""", + (session_id, self.bot_type) + ).fetchone() + conn.close() + + if row: + return self._row_to_session(row) + return None + + def list_sessions(self, project_id: str = None) -> List[BotSession]: + """列出会话""" + conn = self.pm.db.get_conn() + + if project_id: + rows = conn.execute( + """SELECT * FROM bot_sessions + WHERE bot_type = ? AND project_id = ? ORDER BY created_at DESC""", + (self.bot_type, project_id) + ).fetchall() + else: + rows = conn.execute( + """SELECT * FROM bot_sessions + WHERE bot_type = ? ORDER BY created_at DESC""", + (self.bot_type,) + ).fetchall() + + conn.close() + + return [self._row_to_session(row) for row in rows] + + def update_session(self, session_id: str, **kwargs) -> Optional[BotSession]: + """更新会话""" + conn = self.pm.db.get_conn() + + allowed_fields = ['session_name', 'project_id', 'webhook_url', 'secret', 'is_active'] + updates = [] + values = [] + + for field in allowed_fields: + if field in kwargs: + updates.append(f"{field} = ?") + values.append(kwargs[field]) + + if not updates: + conn.close() + return self.get_session(session_id) + + updates.append("updated_at = ?") + values.append(datetime.now().isoformat()) + values.append(session_id) + values.append(self.bot_type) + + query = f"UPDATE bot_sessions SET {', '.join(updates)} WHERE session_id = ? AND bot_type = ?" + conn.execute(query, values) + conn.commit() + conn.close() + + return self.get_session(session_id) + + def delete_session(self, session_id: str) -> bool: + """删除会话""" + conn = self.pm.db.get_conn() + cursor = conn.execute( + "DELETE FROM bot_sessions WHERE session_id = ? 
AND bot_type = ?", + (session_id, self.bot_type) + ) + conn.commit() + conn.close() + + return cursor.rowcount > 0 + + def _row_to_session(self, row: sqlite3.Row) -> BotSession: + """将数据库行转换为 BotSession 对象""" + return BotSession( + id=row['id'], + bot_type=row['bot_type'], + session_id=row['session_id'], + session_name=row['session_name'], + project_id=row['project_id'], + webhook_url=row['webhook_url'], + secret=row['secret'], + is_active=bool(row['is_active']), + created_at=row['created_at'], + updated_at=row['updated_at'], + last_message_at=row['last_message_at'], + message_count=row['message_count'] + ) + + async def handle_message(self, session: BotSession, message: Dict) -> Dict: + """处理收到的消息""" + now = datetime.now().isoformat() + + # 更新消息统计 + conn = self.pm.db.get_conn() + conn.execute( + """UPDATE bot_sessions + SET message_count = message_count + 1, last_message_at = ? + WHERE id = ?""", + (now, session.id) + ) + conn.commit() + conn.close() + + # 处理消息 + msg_type = message.get('msg_type', 'text') + content = message.get('content', {}) + + if msg_type == 'text': + text = content.get('text', '') + return await self._handle_text_message(session, text, message) + elif msg_type == 'audio': + # 处理音频消息 + return await self._handle_audio_message(session, message) + elif msg_type == 'file': + # 处理文件消息 + return await self._handle_file_message(session, message) + + return {"success": False, "error": "Unsupported message type"} + + async def _handle_text_message(self, session: BotSession, text: str, + raw_message: Dict) -> Dict: + """处理文本消息""" + # 简单命令处理 + if text.startswith('/help'): + return { + "success": True, + "response": """🤖 InsightFlow 机器人命令: +/help - 显示帮助 +/status - 查看项目状态 +/analyze - 分析网页内容 +/search <关键词> - 搜索知识库""" + } + + if text.startswith('/status'): + if not session.project_id: + return {"success": True, "response": "⚠️ 当前会话未绑定项目"} + + # 获取项目状态 + summary = self.pm.db.get_project_summary(session.project_id) + stats = summary.get('statistics', {}) + + 
return { + "success": True, + "response": f"""📊 项目状态: +实体数量: {stats.get('entity_count', 0)} +关系数量: {stats.get('relation_count', 0)} +转录数量: {stats.get('transcript_count', 0)}""" + } + + # 默认回复 + return { + "success": True, + "response": f"收到消息:{text[:100]}...\n\n使用 /help 查看可用命令" + } + + async def _handle_audio_message(self, session: BotSession, message: Dict) -> Dict: + """处理音频消息""" + if not session.project_id: + return {"success": False, "error": "Session not bound to any project"} + + # 下载音频文件 + audio_url = message.get('content', {}).get('download_url') + if not audio_url: + return {"success": False, "error": "No audio URL provided"} + + try: + async with httpx.AsyncClient() as client: + response = await client.get(audio_url) + audio_data = response.content + + # 保存音频文件 + filename = f"bot_audio_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp3" + + # 这里应该调用 ASR 服务进行转录 + # 简化处理,返回提示 + return { + "success": True, + "response": "🎵 收到音频文件,正在处理中...\n分析完成后会通知您。", + "audio_size": len(audio_data), + "filename": filename + } + + except Exception as e: + return {"success": False, "error": f"Failed to process audio: {str(e)}"} + + async def _handle_file_message(self, session: BotSession, message: Dict) -> Dict: + """处理文件消息""" + return { + "success": True, + "response": "📎 收到文件,正在处理中..." 
+        }
+
+    async def send_message(self, session: BotSession, message: str,
+                           msg_type: str = "text") -> bool:
+        """Send a message to the group chat"""
+        if not session.webhook_url:
+            return False
+
+        try:
+            if self.bot_type == "feishu":
+                return await self._send_feishu_message(session, message, msg_type)
+            elif self.bot_type == "dingtalk":
+                return await self._send_dingtalk_message(session, message, msg_type)
+
+            return False
+
+        except Exception as e:
+            print(f"Failed to send {self.bot_type} message: {e}")
+            return False
+
+    async def _send_feishu_message(self, session: BotSession, message: str,
+                                   msg_type: str) -> bool:
+        """Send a Feishu (Lark) message"""
+        timestamp = str(int(time.time()))
+
+        # Feishu custom-bot signature: base64(HMAC-SHA256(key=f"{timestamp}\n{secret}", msg=b""))
+        if session.secret:
+            string_to_sign = f"{timestamp}\n{session.secret}"
+            hmac_code = hmac.new(
+                string_to_sign.encode('utf-8'),
+                digestmod=hashlib.sha256
+            ).digest()
+            sign = base64.b64encode(hmac_code).decode('utf-8')
+        else:
+            sign = ""
+
+        payload = {
+            "timestamp": timestamp,
+            "sign": sign,
+            "msg_type": "text",
+            "content": {
+                "text": message
+            }
+        }
+
+        async with httpx.AsyncClient() as client:
+            response = await client.post(
+                session.webhook_url,
+                json=payload,
+                headers={"Content-Type": "application/json"}
+            )
+            return response.status_code == 200
+
+    async def _send_dingtalk_message(self, session: BotSession, message: str,
+                                     msg_type: str) -> bool:
+        """Send a DingTalk message"""
+        import urllib.parse
+
+        timestamp = str(round(time.time() * 1000))
+
+        # DingTalk signature: base64(HMAC-SHA256(key=secret, msg=f"{timestamp}\n{secret}")), URL-encoded
+        if session.secret:
+            string_to_sign = f"{timestamp}\n{session.secret}"
+            hmac_code = hmac.new(
+                session.secret.encode('utf-8'),
+                string_to_sign.encode('utf-8'),
+                digestmod=hashlib.sha256
+            ).digest()
+            sign = urllib.parse.quote(base64.b64encode(hmac_code).decode('utf-8'))
+        else:
+            sign = ""
+
+        payload = {
+            "msgtype": "text",
+            "text": {
+                "content": message
+            }
+        }
+
+        url = session.webhook_url
+        if sign:
+            url = f"{url}&timestamp={timestamp}&sign={sign}"
+
+        async
with httpx.AsyncClient() as client: + response = await client.post( + url, + json=payload, + headers={"Content-Type": "application/json"} + ) + return response.status_code == 200 + + +class WebhookIntegration: + """Zapier/Make Webhook 集成""" + + def __init__(self, plugin_manager: PluginManager, endpoint_type: str): + self.pm = plugin_manager + self.endpoint_type = endpoint_type + + def create_endpoint(self, name: str, endpoint_url: str, + project_id: str = None, auth_type: str = "none", + auth_config: Dict = None, + trigger_events: List[str] = None) -> WebhookEndpoint: + """创建 Webhook 端点""" + endpoint_id = str(uuid.uuid4())[:8] + now = datetime.now().isoformat() + + conn = self.pm.db.get_conn() + conn.execute( + """INSERT INTO webhook_endpoints + (id, name, endpoint_type, endpoint_url, project_id, auth_type, auth_config, + trigger_events, is_active, created_at, updated_at, trigger_count) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (endpoint_id, name, self.endpoint_type, endpoint_url, project_id, auth_type, + json.dumps(auth_config or {}), json.dumps(trigger_events or []), True, + now, now, 0) + ) + conn.commit() + conn.close() + + return WebhookEndpoint( + id=endpoint_id, + name=name, + endpoint_type=self.endpoint_type, + endpoint_url=endpoint_url, + project_id=project_id, + auth_type=auth_type, + auth_config=auth_config or {}, + trigger_events=trigger_events or [], + is_active=True, + created_at=now, + updated_at=now + ) + + def get_endpoint(self, endpoint_id: str) -> Optional[WebhookEndpoint]: + """获取端点""" + conn = self.pm.db.get_conn() + row = conn.execute( + "SELECT * FROM webhook_endpoints WHERE id = ? 
AND endpoint_type = ?", + (endpoint_id, self.endpoint_type) + ).fetchone() + conn.close() + + if row: + return self._row_to_endpoint(row) + return None + + def list_endpoints(self, project_id: str = None) -> List[WebhookEndpoint]: + """列出端点""" + conn = self.pm.db.get_conn() + + if project_id: + rows = conn.execute( + """SELECT * FROM webhook_endpoints + WHERE endpoint_type = ? AND project_id = ? ORDER BY created_at DESC""", + (self.endpoint_type, project_id) + ).fetchall() + else: + rows = conn.execute( + """SELECT * FROM webhook_endpoints + WHERE endpoint_type = ? ORDER BY created_at DESC""", + (self.endpoint_type,) + ).fetchall() + + conn.close() + + return [self._row_to_endpoint(row) for row in rows] + + def update_endpoint(self, endpoint_id: str, **kwargs) -> Optional[WebhookEndpoint]: + """更新端点""" + conn = self.pm.db.get_conn() + + allowed_fields = ['name', 'endpoint_url', 'project_id', 'auth_type', + 'auth_config', 'trigger_events', 'is_active'] + updates = [] + values = [] + + for field in allowed_fields: + if field in kwargs: + updates.append(f"{field} = ?") + if field in ['auth_config', 'trigger_events']: + values.append(json.dumps(kwargs[field])) + else: + values.append(kwargs[field]) + + if not updates: + conn.close() + return self.get_endpoint(endpoint_id) + + updates.append("updated_at = ?") + values.append(datetime.now().isoformat()) + values.append(endpoint_id) + + query = f"UPDATE webhook_endpoints SET {', '.join(updates)} WHERE id = ?" 
+ conn.execute(query, values) + conn.commit() + conn.close() + + return self.get_endpoint(endpoint_id) + + def delete_endpoint(self, endpoint_id: str) -> bool: + """删除端点""" + conn = self.pm.db.get_conn() + cursor = conn.execute( + "DELETE FROM webhook_endpoints WHERE id = ?", + (endpoint_id,) + ) + conn.commit() + conn.close() + + return cursor.rowcount > 0 + + def _row_to_endpoint(self, row: sqlite3.Row) -> WebhookEndpoint: + """将数据库行转换为 WebhookEndpoint 对象""" + return WebhookEndpoint( + id=row['id'], + name=row['name'], + endpoint_type=row['endpoint_type'], + endpoint_url=row['endpoint_url'], + project_id=row['project_id'], + auth_type=row['auth_type'], + auth_config=json.loads(row['auth_config']) if row['auth_config'] else {}, + trigger_events=json.loads(row['trigger_events']) if row['trigger_events'] else [], + is_active=bool(row['is_active']), + created_at=row['created_at'], + updated_at=row['updated_at'], + last_triggered_at=row['last_triggered_at'], + trigger_count=row['trigger_count'] + ) + + async def trigger(self, endpoint: WebhookEndpoint, event_type: str, + data: Dict) -> bool: + """触发 Webhook""" + if not endpoint.is_active: + return False + + if event_type not in endpoint.trigger_events: + return False + + try: + headers = {"Content-Type": "application/json"} + + # 添加认证头 + if endpoint.auth_type == "api_key": + api_key = endpoint.auth_config.get('api_key', '') + header_name = endpoint.auth_config.get('header_name', 'X-API-Key') + headers[header_name] = api_key + elif endpoint.auth_type == "bearer": + token = endpoint.auth_config.get('token', '') + headers["Authorization"] = f"Bearer {token}" + + payload = { + "event": event_type, + "timestamp": datetime.now().isoformat(), + "data": data + } + + async with httpx.AsyncClient() as client: + response = await client.post( + endpoint.endpoint_url, + json=payload, + headers=headers, + timeout=30.0 + ) + + success = response.status_code in [200, 201, 202] + + # 更新触发统计 + now = datetime.now().isoformat() + conn = 
self.pm.db.get_conn() + conn.execute( + """UPDATE webhook_endpoints + SET trigger_count = trigger_count + 1, last_triggered_at = ? + WHERE id = ?""", + (now, endpoint.id) + ) + conn.commit() + conn.close() + + return success + + except Exception as e: + print(f"Failed to trigger webhook: {e}") + return False + + async def test_endpoint(self, endpoint: WebhookEndpoint) -> Dict: + """测试端点""" + test_data = { + "message": "This is a test event from InsightFlow", + "test": True, + "timestamp": datetime.now().isoformat() + } + + success = await self.trigger(endpoint, "test", test_data) + + return { + "success": success, + "endpoint_id": endpoint.id, + "endpoint_url": endpoint.endpoint_url, + "message": "Test event sent successfully" if success else "Failed to send test event" + } + + +class WebDAVSyncManager: + """WebDAV 同步管理""" + + def __init__(self, plugin_manager: PluginManager): + self.pm = plugin_manager + + def create_sync(self, name: str, project_id: str, server_url: str, + username: str, password: str, remote_path: str = "/insightflow", + sync_mode: str = "bidirectional", + sync_interval: int = 3600) -> WebDAVSync: + """创建 WebDAV 同步配置""" + sync_id = str(uuid.uuid4())[:8] + now = datetime.now().isoformat() + + conn = self.pm.db.get_conn() + conn.execute( + """INSERT INTO webdav_syncs + (id, name, project_id, server_url, username, password, remote_path, + sync_mode, sync_interval, last_sync_status, is_active, created_at, updated_at, sync_count) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (sync_id, name, project_id, server_url, username, password, remote_path, + sync_mode, sync_interval, 'pending', True, now, now, 0) + ) + conn.commit() + conn.close() + + return WebDAVSync( + id=sync_id, + name=name, + project_id=project_id, + server_url=server_url, + username=username, + password=password, + remote_path=remote_path, + sync_mode=sync_mode, + sync_interval=sync_interval, + last_sync_status='pending', + is_active=True, + created_at=now, + updated_at=now 
+ ) + + def get_sync(self, sync_id: str) -> Optional[WebDAVSync]: + """获取同步配置""" + conn = self.pm.db.get_conn() + row = conn.execute( + "SELECT * FROM webdav_syncs WHERE id = ?", + (sync_id,) + ).fetchone() + conn.close() + + if row: + return self._row_to_sync(row) + return None + + def list_syncs(self, project_id: str = None) -> List[WebDAVSync]: + """列出同步配置""" + conn = self.pm.db.get_conn() + + if project_id: + rows = conn.execute( + "SELECT * FROM webdav_syncs WHERE project_id = ? ORDER BY created_at DESC", + (project_id,) + ).fetchall() + else: + rows = conn.execute( + "SELECT * FROM webdav_syncs ORDER BY created_at DESC" + ).fetchall() + + conn.close() + + return [self._row_to_sync(row) for row in rows] + + def update_sync(self, sync_id: str, **kwargs) -> Optional[WebDAVSync]: + """更新同步配置""" + conn = self.pm.db.get_conn() + + allowed_fields = ['name', 'server_url', 'username', 'password', + 'remote_path', 'sync_mode', 'sync_interval', 'is_active'] + updates = [] + values = [] + + for field in allowed_fields: + if field in kwargs: + updates.append(f"{field} = ?") + values.append(kwargs[field]) + + if not updates: + conn.close() + return self.get_sync(sync_id) + + updates.append("updated_at = ?") + values.append(datetime.now().isoformat()) + values.append(sync_id) + + query = f"UPDATE webdav_syncs SET {', '.join(updates)} WHERE id = ?" 
+ conn.execute(query, values) + conn.commit() + conn.close() + + return self.get_sync(sync_id) + + def delete_sync(self, sync_id: str) -> bool: + """删除同步配置""" + conn = self.pm.db.get_conn() + cursor = conn.execute( + "DELETE FROM webdav_syncs WHERE id = ?", + (sync_id,) + ) + conn.commit() + conn.close() + + return cursor.rowcount > 0 + + def _row_to_sync(self, row: sqlite3.Row) -> WebDAVSync: + """将数据库行转换为 WebDAVSync 对象""" + return WebDAVSync( + id=row['id'], + name=row['name'], + project_id=row['project_id'], + server_url=row['server_url'], + username=row['username'], + password=row['password'], + remote_path=row['remote_path'], + sync_mode=row['sync_mode'], + sync_interval=row['sync_interval'], + last_sync_at=row['last_sync_at'], + last_sync_status=row['last_sync_status'], + last_sync_error=row['last_sync_error'] or "", + is_active=bool(row['is_active']), + created_at=row['created_at'], + updated_at=row['updated_at'], + sync_count=row['sync_count'] + ) + + async def test_connection(self, sync: WebDAVSync) -> Dict: + """测试 WebDAV 连接""" + if not WEBDAV_AVAILABLE: + return {"success": False, "error": "WebDAV library not available"} + + try: + client = webdav_client.Client( + sync.server_url, + auth=(sync.username, sync.password) + ) + + # 尝试列出根目录(webdav4 的 Client 提供 ls 方法,而非 list) + client.ls("/") + + return { + "success": True, + "message": "Connection successful" + } + + except Exception as e: + return { + "success": False, + "error": str(e) + } + + async def sync_project(self, sync: WebDAVSync) -> Dict: + """同步项目到 WebDAV""" + if not WEBDAV_AVAILABLE: + return {"success": False, "error": "WebDAV library not available"} + + if not sync.is_active: + return {"success": False, "error": "Sync is not active"} + + try: + client = webdav_client.Client( + sync.server_url, + auth=(sync.username, sync.password) + ) + + # 确保远程目录存在 + remote_project_path = f"{sync.remote_path}/{sync.project_id}" + try: + client.mkdir(remote_project_path) + except Exception: + pass # 目录可能已存在;不使用裸 except,以免吞掉系统级异常 + + # 获取项目数据 + project =
self.pm.db.get_project(sync.project_id) + if not project: + return {"success": False, "error": "Project not found"} + + # 导出项目数据为 JSON + entities = self.pm.db.list_project_entities(sync.project_id) + relations = self.pm.db.list_project_relations(sync.project_id) + transcripts = self.pm.db.list_project_transcripts(sync.project_id) + + export_data = { + "project": { + "id": project.id, + "name": project.name, + "description": project.description + }, + "entities": [{"id": e.id, "name": e.name, "type": e.type} for e in entities], + "relations": relations, + "transcripts": [{"id": t['id'], "filename": t['filename']} for t in transcripts], + "exported_at": datetime.now().isoformat() + } + + # 上传 JSON 文件 + json_content = json.dumps(export_data, ensure_ascii=False, indent=2) + json_path = f"{remote_project_path}/project_export.json" + + # 使用临时文件上传(显式指定 utf-8,避免平台默认编码写入非 ASCII 内容时出错) + import tempfile + with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False, encoding='utf-8') as f: + f.write(json_content) + temp_path = f.name + + try: + client.upload_file(temp_path, json_path, overwrite=True) # 覆盖上次同步的导出文件 + finally: + os.unlink(temp_path) # 确保清理临时文件 + + # 更新同步状态 + now = datetime.now().isoformat() + conn = self.pm.db.get_conn() + conn.execute( + """UPDATE webdav_syncs + SET last_sync_at = ?, last_sync_status = ?, sync_count = sync_count + 1 + WHERE id = ?""", + (now, 'success', sync.id) + ) + conn.commit() + conn.close() + + return { + "success": True, + "message": "Project synced successfully", + "entities_count": len(entities), + "relations_count": len(relations), + "remote_path": json_path + } + + except Exception as e: + # 更新失败状态 + conn = self.pm.db.get_conn() + conn.execute( + """UPDATE webdav_syncs + SET last_sync_status = ?, last_sync_error = ?
+ WHERE id = ?""", + ('failed', str(e), sync.id) + ) + conn.commit() + conn.close() + + return { + "success": False, + "error": str(e) + } + + +# Singleton instance +_plugin_manager = None + +def get_plugin_manager(db_manager=None): + """获取 PluginManager 单例""" + global _plugin_manager + if _plugin_manager is None: + _plugin_manager = PluginManager(db_manager) + return _plugin_manager diff --git a/backend/rate_limiter.py b/backend/rate_limiter.py new file mode 100644 index 0000000..878306b --- /dev/null +++ b/backend/rate_limiter.py @@ -0,0 +1,223 @@ +#!/usr/bin/env python3 +""" +InsightFlow Rate Limiter - Phase 6 +API 限流中间件 +支持基于内存的滑动窗口限流 +""" + +import time +import asyncio +from typing import Dict, Optional, Tuple, Callable +from dataclasses import dataclass, field +from collections import defaultdict +from functools import wraps + + +@dataclass +class RateLimitConfig: + """限流配置""" + requests_per_minute: int = 60 + burst_size: int = 10 # 突发请求数 + window_size: int = 60 # 窗口大小(秒) + + +@dataclass +class RateLimitInfo: + """限流信息""" + allowed: bool + remaining: int + reset_time: int # 重置时间戳 + retry_after: int # 需要等待的秒数 + + +class SlidingWindowCounter: + """滑动窗口计数器""" + + def __init__(self, window_size: int = 60): + self.window_size = window_size + self.requests: Dict[int, int] = defaultdict(int) # 秒级计数 + self._lock = asyncio.Lock() + + async def add_request(self) -> int: + """添加请求,返回当前窗口内的请求数""" + async with self._lock: + now = int(time.time()) + self.requests[now] += 1 + self._cleanup_old(now) + return sum(self.requests.values()) + + async def get_count(self) -> int: + """获取当前窗口内的请求数""" + async with self._lock: + now = int(time.time()) + self._cleanup_old(now) + return sum(self.requests.values()) + + def _cleanup_old(self, now: int): + """清理过期的请求记录""" + cutoff = now - self.window_size + old_keys = [k for k in self.requests.keys() if k < cutoff] + for k in old_keys: + del self.requests[k] + + +class RateLimiter: + """API 限流器""" + + def __init__(self): + # key -> 
SlidingWindowCounter + self.counters: Dict[str, SlidingWindowCounter] = {} + # key -> RateLimitConfig + self.configs: Dict[str, RateLimitConfig] = {} + self._lock = asyncio.Lock() + + async def is_allowed( + self, + key: str, + config: Optional[RateLimitConfig] = None + ) -> RateLimitInfo: + """ + 检查是否允许请求 + + Args: + key: 限流键(如 API Key ID) + config: 限流配置,如果为 None 则使用默认配置 + + Returns: + RateLimitInfo + """ + if config is None: + config = RateLimitConfig() + + async with self._lock: + if key not in self.counters: + self.counters[key] = SlidingWindowCounter(config.window_size) + self.configs[key] = config + + counter = self.counters[key] + stored_config = self.configs.get(key, config) + + # 获取当前计数 + current_count = await counter.get_count() + + # 计算剩余配额 + remaining = max(0, stored_config.requests_per_minute - current_count) + + # 计算重置时间 + now = int(time.time()) + reset_time = now + stored_config.window_size + + # 检查是否超过限制 + if current_count >= stored_config.requests_per_minute: + return RateLimitInfo( + allowed=False, + remaining=0, + reset_time=reset_time, + retry_after=stored_config.window_size + ) + + # 允许请求,增加计数 + await counter.add_request() + + return RateLimitInfo( + allowed=True, + remaining=remaining - 1, + reset_time=reset_time, + retry_after=0 + ) + + async def get_limit_info(self, key: str) -> RateLimitInfo: + """获取限流信息(不增加计数)""" + if key not in self.counters: + config = RateLimitConfig() + return RateLimitInfo( + allowed=True, + remaining=config.requests_per_minute, + reset_time=int(time.time()) + config.window_size, + retry_after=0 + ) + + counter = self.counters[key] + config = self.configs.get(key, RateLimitConfig()) + + current_count = await counter.get_count() + remaining = max(0, config.requests_per_minute - current_count) + reset_time = int(time.time()) + config.window_size + + return RateLimitInfo( + allowed=current_count < config.requests_per_minute, + remaining=remaining, + reset_time=reset_time, + retry_after=max(0, config.window_size) if 
current_count >= config.requests_per_minute else 0 + ) + + def reset(self, key: Optional[str] = None): + """重置限流计数器""" + if key: + self.counters.pop(key, None) + self.configs.pop(key, None) + else: + self.counters.clear() + self.configs.clear() + + +# 全局限流器实例 +_rate_limiter: Optional[RateLimiter] = None + + +def get_rate_limiter() -> RateLimiter: + """获取限流器实例""" + global _rate_limiter + if _rate_limiter is None: + _rate_limiter = RateLimiter() + return _rate_limiter + + +# 限流装饰器(用于函数级别限流) +def rate_limit( + requests_per_minute: int = 60, + key_func: Optional[Callable] = None +): + """ + 限流装饰器 + + Args: + requests_per_minute: 每分钟请求数限制 + key_func: 生成限流键的函数,默认为 None(使用函数名) + """ + def decorator(func): + limiter = get_rate_limiter() + config = RateLimitConfig(requests_per_minute=requests_per_minute) + + @wraps(func) + async def async_wrapper(*args, **kwargs): + key = key_func(*args, **kwargs) if key_func else func.__name__ + info = await limiter.is_allowed(key, config) + + if not info.allowed: + raise RateLimitExceeded( + f"Rate limit exceeded. Try again in {info.retry_after} seconds." + ) + + return await func(*args, **kwargs) + + @wraps(func) + def sync_wrapper(*args, **kwargs): + key = key_func(*args, **kwargs) if key_func else func.__name__ + # 同步版本使用 asyncio.run;注意: 每次调用都会新建事件循环, + # 不能在已运行的事件循环中调用,且与 asyncio.Lock 的循环绑定可能冲突 + info = asyncio.run(limiter.is_allowed(key, config)) + + if not info.allowed: + raise RateLimitExceeded( + f"Rate limit exceeded. Try again in {info.retry_after} seconds."
+ ) + + return func(*args, **kwargs) + + return async_wrapper if asyncio.iscoroutinefunction(func) else sync_wrapper + return decorator + + +class RateLimitExceeded(Exception): + """限流异常""" + pass diff --git a/backend/requirements.txt b/backend/requirements.txt index 04fcb73..c8e266b 100644 --- a/backend/requirements.txt +++ b/backend/requirements.txt @@ -30,3 +30,24 @@ cairosvg==2.7.1 # Neo4j Graph Database neo4j==5.15.0 + +# API Documentation (Swagger/OpenAPI) +fastapi-offline-swagger==0.1.0 + +# Phase 7: Workflow Automation +apscheduler==3.10.4 + +# Phase 7: Multimodal Support +ffmpeg-python==0.2.0 +pillow==10.2.0 +opencv-python==4.9.0.80 +pytesseract==0.3.10 + +# Phase 7 Task 7: Plugin & Integration +webdav4==0.9.8 +urllib3==2.2.0 +beautifulsoup4==4.12.3 +webdavclient3==3.14.6 + +# Phase 7 Task 3: Security & Compliance +cryptography==42.0.0 diff --git a/backend/schema.sql b/backend/schema.sql index baf2278..98d8d34 100644 --- a/backend/schema.sql +++ b/backend/schema.sql @@ -80,7 +80,7 @@ CREATE TABLE IF NOT EXISTS attribute_templates ( id TEXT PRIMARY KEY, project_id TEXT NOT NULL, name TEXT NOT NULL, - type TEXT NOT NULL, -- text/number/date/select/multiselect + type TEXT NOT NULL, -- text/number/date/select/multiselect/boolean description TEXT, options TEXT, -- JSON 数组,用于 select/multiselect 类型 is_required INTEGER DEFAULT 0, @@ -111,54 +111,13 @@ CREATE TABLE IF NOT EXISTS entity_attributes ( CREATE TABLE IF NOT EXISTS attribute_history ( id TEXT PRIMARY KEY, entity_id TEXT NOT NULL, + template_id TEXT, attribute_name TEXT NOT NULL, old_value TEXT, new_value TEXT, changed_by TEXT, -- 用户ID或系统 changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, change_reason TEXT, - FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE -); - --- Phase 5: 属性模板表(项目级自定义属性定义) -CREATE TABLE IF NOT EXISTS attribute_templates ( - id TEXT PRIMARY KEY, - project_id TEXT NOT NULL, - name TEXT NOT NULL, -- 属性名称,如"年龄"、"职位" - type TEXT NOT NULL, --
属性类型: text, number, date, select, multiselect, boolean - options TEXT, -- JSON 数组,用于 select/multiselect 类型 - default_value TEXT, -- 默认值 - description TEXT, -- 属性描述 - is_required BOOLEAN DEFAULT 0, -- 是否必填 - display_order INTEGER DEFAULT 0, -- 显示顺序 - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - FOREIGN KEY (project_id) REFERENCES projects(id) -); - --- Phase 5: 实体属性值表 -CREATE TABLE IF NOT EXISTS entity_attributes ( - id TEXT PRIMARY KEY, - entity_id TEXT NOT NULL, - template_id TEXT NOT NULL, - value TEXT, -- 属性值(以JSON或字符串形式存储) - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE, - FOREIGN KEY (template_id) REFERENCES attribute_templates(id) ON DELETE CASCADE, - UNIQUE(entity_id, template_id) -- 每个实体每个属性只能有一个值 -); - --- Phase 5: 属性变更历史表 -CREATE TABLE IF NOT EXISTS attribute_history ( - id TEXT PRIMARY KEY, - entity_id TEXT NOT NULL, - template_id TEXT NOT NULL, - old_value TEXT, - new_value TEXT, - changed_by TEXT, -- 用户ID或"system" - changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - change_reason TEXT, -- 变更原因 FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE, FOREIGN KEY (template_id) REFERENCES attribute_templates(id) ON DELETE CASCADE ); @@ -178,90 +137,499 @@ CREATE INDEX IF NOT EXISTS idx_entity_attributes_entity ON entity_attributes(ent CREATE INDEX IF NOT EXISTS idx_entity_attributes_template ON entity_attributes(template_id); CREATE INDEX IF NOT EXISTS idx_attr_history_entity ON attribute_history(entity_id); --- Phase 7: 协作与共享 - 项目分享表 -CREATE TABLE IF NOT EXISTS project_shares ( - id TEXT PRIMARY KEY, - project_id TEXT NOT NULL, - token TEXT NOT NULL UNIQUE, -- 分享令牌 - permission TEXT DEFAULT 'read_only', -- 权限级别: read_only, comment, edit, admin - created_by TEXT NOT NULL, -- 创建者 - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - expires_at TIMESTAMP, -- 过期时间 - max_uses 
INTEGER, -- 最大使用次数 - use_count INTEGER DEFAULT 0, -- 已使用次数 - password_hash TEXT, -- 密码保护(哈希) - is_active BOOLEAN DEFAULT 1, -- 是否激活 - allow_download BOOLEAN DEFAULT 0, -- 允许下载 - allow_export BOOLEAN DEFAULT 0, -- 允许导出 - FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE -); +-- Phase 7: 工作流相关表 --- Phase 7: 协作与共享 - 评论表 -CREATE TABLE IF NOT EXISTS comments ( +-- 工作流配置表 +CREATE TABLE IF NOT EXISTS workflows ( id TEXT PRIMARY KEY, + name TEXT NOT NULL, + description TEXT, + workflow_type TEXT NOT NULL, -- auto_analyze, auto_align, auto_relation, scheduled_report, custom project_id TEXT NOT NULL, - target_type TEXT NOT NULL, -- 目标类型: entity, relation, transcript, project - target_id TEXT NOT NULL, -- 目标ID - parent_id TEXT, -- 父评论ID(支持回复) - author TEXT NOT NULL, -- 作者ID - author_name TEXT, -- 作者显示名 - content TEXT NOT NULL, -- 评论内容 + status TEXT DEFAULT 'active', -- active, paused, error, completed + schedule TEXT, -- cron expression or interval minutes + schedule_type TEXT DEFAULT 'manual', -- manual, cron, interval + config TEXT, -- JSON: workflow specific configuration + webhook_ids TEXT, -- JSON array of webhook config IDs + is_active BOOLEAN DEFAULT 1, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - resolved BOOLEAN DEFAULT 0, -- 是否已解决 - resolved_by TEXT, -- 解决者 - resolved_at TIMESTAMP, -- 解决时间 - mentions TEXT, -- JSON数组: 提及的用户 - attachments TEXT, -- JSON数组: 附件 - FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE, - FOREIGN KEY (parent_id) REFERENCES comments(id) ON DELETE CASCADE + last_run_at TIMESTAMP, + next_run_at TIMESTAMP, + run_count INTEGER DEFAULT 0, + success_count INTEGER DEFAULT 0, + fail_count INTEGER DEFAULT 0, + FOREIGN KEY (project_id) REFERENCES projects(id) ); --- Phase 7: 协作与共享 - 变更历史表 -CREATE TABLE IF NOT EXISTS change_history ( +-- 工作流任务表 +CREATE TABLE IF NOT EXISTS workflow_tasks ( + id TEXT PRIMARY KEY, + workflow_id TEXT NOT NULL, + name TEXT NOT NULL, + 
task_type TEXT NOT NULL, -- analyze, align, discover_relations, notify, custom + config TEXT, -- JSON: task specific configuration + task_order INTEGER DEFAULT 0, + depends_on TEXT, -- JSON array of task IDs + timeout_seconds INTEGER DEFAULT 300, + retry_count INTEGER DEFAULT 3, + retry_delay INTEGER DEFAULT 5, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (workflow_id) REFERENCES workflows(id) ON DELETE CASCADE +); + +-- Webhook 配置表 +CREATE TABLE IF NOT EXISTS webhook_configs ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL, + webhook_type TEXT NOT NULL, -- feishu, dingtalk, slack, custom + url TEXT NOT NULL, + secret TEXT, -- for signature verification + headers TEXT, -- JSON: custom headers + template TEXT, -- message template + is_active BOOLEAN DEFAULT 1, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + last_used_at TIMESTAMP, + success_count INTEGER DEFAULT 0, + fail_count INTEGER DEFAULT 0 +); + +-- 工作流执行日志表 +CREATE TABLE IF NOT EXISTS workflow_logs ( + id TEXT PRIMARY KEY, + workflow_id TEXT NOT NULL, + task_id TEXT, -- NULL if workflow-level log + status TEXT DEFAULT 'pending', -- pending, running, success, failed, cancelled + start_time TIMESTAMP, + end_time TIMESTAMP, + duration_ms INTEGER, + input_data TEXT, -- JSON: input parameters + output_data TEXT, -- JSON: execution results + error_message TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (workflow_id) REFERENCES workflows(id) ON DELETE CASCADE, + FOREIGN KEY (task_id) REFERENCES workflow_tasks(id) ON DELETE SET NULL +); + +-- Phase 7: 工作流相关索引 +CREATE INDEX IF NOT EXISTS idx_workflows_project ON workflows(project_id); +CREATE INDEX IF NOT EXISTS idx_workflows_status ON workflows(status); +CREATE INDEX IF NOT EXISTS idx_workflows_type ON workflows(workflow_type); +CREATE INDEX IF NOT EXISTS idx_workflow_tasks_workflow ON workflow_tasks(workflow_id); +CREATE INDEX 
IF NOT EXISTS idx_workflow_logs_workflow ON workflow_logs(workflow_id); +CREATE INDEX IF NOT EXISTS idx_workflow_logs_task ON workflow_logs(task_id); +CREATE INDEX IF NOT EXISTS idx_workflow_logs_status ON workflow_logs(status); +CREATE INDEX IF NOT EXISTS idx_workflow_logs_created ON workflow_logs(created_at); + +-- Phase 7: 多模态支持相关表 + +-- 视频表 +CREATE TABLE IF NOT EXISTS videos ( id TEXT PRIMARY KEY, project_id TEXT NOT NULL, - change_type TEXT NOT NULL, -- 变更类型: create, update, delete, merge, split - entity_type TEXT NOT NULL, -- 实体类型: entity, relation, transcript, project - entity_id TEXT NOT NULL, -- 实体ID - entity_name TEXT, -- 实体名称(用于显示) - changed_by TEXT NOT NULL, -- 变更者ID - changed_by_name TEXT, -- 变更者显示名 - changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - old_value TEXT, -- JSON: 旧值 - new_value TEXT, -- JSON: 新值 - description TEXT, -- 变更描述 - session_id TEXT, -- 会话ID(批量变更关联) - reverted BOOLEAN DEFAULT 0, -- 是否已回滚 - reverted_at TIMESTAMP, -- 回滚时间 - reverted_by TEXT, -- 回滚者 - FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE + filename TEXT NOT NULL, + duration REAL, -- 视频时长(秒) + fps REAL, -- 帧率 + resolution TEXT, -- JSON: {"width": int, "height": int} + audio_transcript_id TEXT, -- 关联的音频转录ID + full_ocr_text TEXT, -- 所有帧OCR文本合并 + extracted_entities TEXT, -- JSON: 提取的实体列表 + extracted_relations TEXT, -- JSON: 提取的关系列表 + status TEXT DEFAULT 'processing', -- processing, completed, failed + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id), + FOREIGN KEY (audio_transcript_id) REFERENCES transcripts(id) ); --- Phase 7: 协作与共享 - 团队成员表 -CREATE TABLE IF NOT EXISTS team_members ( +-- 视频关键帧表 +CREATE TABLE IF NOT EXISTS video_frames ( + id TEXT PRIMARY KEY, + video_id TEXT NOT NULL, + frame_number INTEGER, + timestamp REAL, -- 时间戳(秒) + image_data BLOB, -- 帧图片数据(可选,可存储在OSS) + image_url TEXT, -- 图片URL(如果存储在OSS) + ocr_text TEXT, -- OCR识别文本 + extracted_entities 
TEXT, -- JSON: 该帧提取的实体 + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (video_id) REFERENCES videos(id) ON DELETE CASCADE +); + +-- 图片表 +CREATE TABLE IF NOT EXISTS images ( id TEXT PRIMARY KEY, project_id TEXT NOT NULL, - user_id TEXT NOT NULL, -- 用户ID - user_name TEXT, -- 用户名 - user_email TEXT, -- 用户邮箱 - role TEXT DEFAULT 'viewer', -- 角色: owner, admin, editor, viewer, commenter - joined_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - invited_by TEXT, -- 邀请者 - last_active_at TIMESTAMP, -- 最后活跃时间 - permissions TEXT, -- JSON数组: 具体权限列表 - FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE, - UNIQUE(project_id, user_id) -- 每个项目每个用户只能有一条记录 + filename TEXT NOT NULL, + image_data BLOB, -- 图片数据(可选) + image_url TEXT, -- 图片URL + ocr_text TEXT, -- OCR识别文本 + description TEXT, -- 图片描述(LLM生成) + extracted_entities TEXT, -- JSON: 提取的实体列表 + extracted_relations TEXT, -- JSON: 提取的关系列表 + status TEXT DEFAULT 'processing', + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id) ); --- Phase 7: 协作与共享索引 -CREATE INDEX IF NOT EXISTS idx_shares_project ON project_shares(project_id); -CREATE INDEX IF NOT EXISTS idx_shares_token ON project_shares(token); -CREATE INDEX IF NOT EXISTS idx_comments_project ON comments(project_id); -CREATE INDEX IF NOT EXISTS idx_comments_target ON comments(target_type, target_id); -CREATE INDEX IF NOT EXISTS idx_comments_parent ON comments(parent_id); -CREATE INDEX IF NOT EXISTS idx_change_history_project ON change_history(project_id); -CREATE INDEX IF NOT EXISTS idx_change_history_entity ON change_history(entity_type, entity_id); -CREATE INDEX IF NOT EXISTS idx_change_history_session ON change_history(session_id); -CREATE INDEX IF NOT EXISTS idx_team_members_project ON team_members(project_id); -CREATE INDEX IF NOT EXISTS idx_team_members_user ON team_members(user_id); +-- 多模态实体提及表 +CREATE TABLE IF NOT EXISTS multimodal_mentions ( + id TEXT 
PRIMARY KEY, + project_id TEXT NOT NULL, + entity_id TEXT NOT NULL, + modality TEXT NOT NULL, -- audio, video, image, document + source_id TEXT NOT NULL, -- transcript_id, video_id, image_id + source_type TEXT NOT NULL, -- 来源类型 + position TEXT, -- JSON: 位置信息 + text_snippet TEXT, -- 提及的文本片段 + confidence REAL DEFAULT 1.0, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id), + FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE +); + +-- 多模态实体关联表 +CREATE TABLE IF NOT EXISTS multimodal_entity_links ( + id TEXT PRIMARY KEY, + entity_id TEXT NOT NULL, + linked_entity_id TEXT NOT NULL, -- 关联的实体ID + link_type TEXT NOT NULL, -- same_as, related_to, part_of + confidence REAL DEFAULT 1.0, + evidence TEXT, -- 关联证据 + modalities TEXT, -- JSON: 涉及的模态列表 + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE, + FOREIGN KEY (linked_entity_id) REFERENCES entities(id) ON DELETE CASCADE +); + +-- 多模态相关索引 +CREATE INDEX IF NOT EXISTS idx_videos_project ON videos(project_id); +CREATE INDEX IF NOT EXISTS idx_videos_status ON videos(status); +CREATE INDEX IF NOT EXISTS idx_video_frames_video ON video_frames(video_id); +CREATE INDEX IF NOT EXISTS idx_images_project ON images(project_id); +CREATE INDEX IF NOT EXISTS idx_images_status ON images(status); +CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_project ON multimodal_mentions(project_id); +CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_entity ON multimodal_mentions(entity_id); +CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_modality ON multimodal_mentions(modality); +CREATE INDEX IF NOT EXISTS idx_multimodal_mentions_source ON multimodal_mentions(source_id); +CREATE INDEX IF NOT EXISTS idx_multimodal_links_entity ON multimodal_entity_links(entity_id); +CREATE INDEX IF NOT EXISTS idx_multimodal_links_linked ON multimodal_entity_links(linked_entity_id); + +-- Phase 7 Task 7: 插件与集成相关表 + +-- 插件配置表 +CREATE 
TABLE IF NOT EXISTS plugins ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL, + plugin_type TEXT NOT NULL, -- chrome_extension, feishu_bot, dingtalk_bot, zapier, make, webdav, custom + project_id TEXT, + status TEXT DEFAULT 'active', -- active, inactive, error, pending + config TEXT, -- JSON: plugin specific configuration + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + last_used_at TIMESTAMP, + use_count INTEGER DEFAULT 0, + FOREIGN KEY (project_id) REFERENCES projects(id) +); + +-- 插件详细配置表 +CREATE TABLE IF NOT EXISTS plugin_configs ( + id TEXT PRIMARY KEY, + plugin_id TEXT NOT NULL, + config_key TEXT NOT NULL, + config_value TEXT, + is_encrypted BOOLEAN DEFAULT 0, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE, + UNIQUE(plugin_id, config_key) +); + +-- 机器人会话表 +CREATE TABLE IF NOT EXISTS bot_sessions ( + id TEXT PRIMARY KEY, + bot_type TEXT NOT NULL, -- feishu, dingtalk + session_id TEXT NOT NULL, -- 群ID或会话ID + session_name TEXT NOT NULL, + project_id TEXT, + webhook_url TEXT, + secret TEXT, -- 签名密钥 + is_active BOOLEAN DEFAULT 1, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + last_message_at TIMESTAMP, + message_count INTEGER DEFAULT 0, + FOREIGN KEY (project_id) REFERENCES projects(id) +); + +-- Webhook 端点表(Zapier/Make集成) +CREATE TABLE IF NOT EXISTS webhook_endpoints ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL, + endpoint_type TEXT NOT NULL, -- zapier, make, custom + endpoint_url TEXT NOT NULL, + project_id TEXT, + auth_type TEXT DEFAULT 'none', -- none, api_key, oauth, custom + auth_config TEXT, -- JSON: authentication configuration + trigger_events TEXT, -- JSON array: events that trigger this webhook + is_active BOOLEAN DEFAULT 1, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + 
last_triggered_at TIMESTAMP, + trigger_count INTEGER DEFAULT 0, + FOREIGN KEY (project_id) REFERENCES projects(id) +); + +-- WebDAV 同步配置表 +CREATE TABLE IF NOT EXISTS webdav_syncs ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL, + project_id TEXT NOT NULL, + server_url TEXT NOT NULL, + username TEXT NOT NULL, + password TEXT NOT NULL, -- 建议加密存储 + remote_path TEXT DEFAULT '/insightflow', + sync_mode TEXT DEFAULT 'bidirectional', -- bidirectional, upload_only, download_only + sync_interval INTEGER DEFAULT 3600, -- 秒 + last_sync_at TIMESTAMP, + last_sync_status TEXT DEFAULT 'pending', -- pending, success, failed + last_sync_error TEXT, + is_active BOOLEAN DEFAULT 1, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + sync_count INTEGER DEFAULT 0, + FOREIGN KEY (project_id) REFERENCES projects(id) +); + +-- Chrome 扩展令牌表 +CREATE TABLE IF NOT EXISTS chrome_extension_tokens ( + id TEXT PRIMARY KEY, + token_hash TEXT NOT NULL UNIQUE, -- SHA256 hash of the token + user_id TEXT, + project_id TEXT, + name TEXT, + permissions TEXT, -- JSON array: read, write, delete + expires_at TIMESTAMP, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + last_used_at TIMESTAMP, + use_count INTEGER DEFAULT 0, + is_revoked BOOLEAN DEFAULT 0, + FOREIGN KEY (project_id) REFERENCES projects(id) +); + +-- 插件相关索引 +CREATE INDEX IF NOT EXISTS idx_plugins_project ON plugins(project_id); +CREATE INDEX IF NOT EXISTS idx_plugins_type ON plugins(plugin_type); +CREATE INDEX IF NOT EXISTS idx_plugins_status ON plugins(status); +CREATE INDEX IF NOT EXISTS idx_plugin_configs_plugin ON plugin_configs(plugin_id); +CREATE INDEX IF NOT EXISTS idx_bot_sessions_project ON bot_sessions(project_id); +CREATE INDEX IF NOT EXISTS idx_bot_sessions_type ON bot_sessions(bot_type); +CREATE INDEX IF NOT EXISTS idx_webhook_endpoints_project ON webhook_endpoints(project_id); +CREATE INDEX IF NOT EXISTS idx_webhook_endpoints_type ON webhook_endpoints(endpoint_type); +CREATE INDEX 
IF NOT EXISTS idx_webdav_syncs_project ON webdav_syncs(project_id); +CREATE INDEX IF NOT EXISTS idx_chrome_tokens_project ON chrome_extension_tokens(project_id); +CREATE INDEX IF NOT EXISTS idx_chrome_tokens_hash ON chrome_extension_tokens(token_hash); + +-- 插件活动日志表 +CREATE TABLE IF NOT EXISTS plugin_activity_logs ( + id TEXT PRIMARY KEY, + plugin_id TEXT NOT NULL, + activity_type TEXT NOT NULL, -- message, webhook, sync, error + source TEXT NOT NULL, -- 来源标识 + details TEXT, -- JSON: 详细信息 + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (plugin_id) REFERENCES plugins(id) ON DELETE CASCADE +); + +CREATE INDEX IF NOT EXISTS idx_plugin_logs_plugin ON plugin_activity_logs(plugin_id); +CREATE INDEX IF NOT EXISTS idx_plugin_logs_type ON plugin_activity_logs(activity_type); +CREATE INDEX IF NOT EXISTS idx_plugin_logs_created ON plugin_activity_logs(created_at); + +-- ============================================ +-- Phase 7 Task 3: 数据安全与合规 +-- ============================================ + +-- 审计日志表 +CREATE TABLE IF NOT EXISTS audit_logs ( + id TEXT PRIMARY KEY, + action_type TEXT NOT NULL, -- create, read, update, delete, login, export, etc. + user_id TEXT, + user_ip TEXT, + user_agent TEXT, + resource_type TEXT, -- project, entity, transcript, api_key, etc. + resource_id TEXT, + action_details TEXT, -- JSON: 详细操作信息 + before_value TEXT, -- 变更前的值 + after_value TEXT, -- 变更后的值 + success INTEGER DEFAULT 1, -- 0 = 失败, 1 = 成功 + error_message TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +-- 加密配置表 +CREATE TABLE IF NOT EXISTS encryption_configs ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + is_enabled INTEGER DEFAULT 0, + encryption_type TEXT DEFAULT 'aes-256-gcm', -- aes-256-gcm, chacha20-poly1305 + key_derivation TEXT DEFAULT 'pbkdf2', -- pbkdf2, argon2 + master_key_hash TEXT, -- 主密钥哈希(用于验证) + salt TEXT, -- 密钥派生盐值 + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id) +); + +-- 脱敏规则表 +CREATE TABLE IF NOT EXISTS masking_rules ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + name TEXT NOT NULL, + rule_type TEXT NOT NULL, -- phone, email, id_card, bank_card, name, address, custom + pattern TEXT NOT NULL, -- 正则表达式 + replacement TEXT NOT NULL, -- 替换模板 + is_active INTEGER DEFAULT 1, + priority INTEGER DEFAULT 0, + description TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY
(project_id) REFERENCES projects(id) +); + +-- 数据访问策略表 +CREATE TABLE IF NOT EXISTS data_access_policies ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + name TEXT NOT NULL, + description TEXT, + allowed_users TEXT, -- JSON array: 允许访问的用户ID列表 + allowed_roles TEXT, -- JSON array: 允许的角色列表 + allowed_ips TEXT, -- JSON array: 允许的IP模式列表 + time_restrictions TEXT, -- JSON: {"start_time": "09:00", "end_time": "18:00", "days_of_week": [0,1,2,3,4]} + max_access_count INTEGER, -- 最大访问次数限制 + require_approval INTEGER DEFAULT 0, -- 是否需要审批 + is_active INTEGER DEFAULT 1, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id) +); + +-- 访问请求表(用于需要审批的访问) +CREATE TABLE IF NOT EXISTS access_requests ( + id TEXT PRIMARY KEY, + policy_id TEXT NOT NULL, + user_id TEXT NOT NULL, + request_reason TEXT, + status TEXT DEFAULT 'pending', -- pending, approved, rejected, expired + approved_by TEXT, + approved_at TIMESTAMP, + expires_at TIMESTAMP, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (policy_id) REFERENCES data_access_policies(id) +); + +-- 数据安全相关索引 +CREATE INDEX IF NOT EXISTS idx_audit_logs_user ON audit_logs(user_id); +CREATE INDEX IF NOT EXISTS idx_audit_logs_resource ON audit_logs(resource_type, resource_id); +CREATE INDEX IF NOT EXISTS idx_audit_logs_action ON audit_logs(action_type); +CREATE INDEX IF NOT EXISTS idx_audit_logs_created ON audit_logs(created_at); +CREATE INDEX IF NOT EXISTS idx_encryption_project ON encryption_configs(project_id); +CREATE INDEX IF NOT EXISTS idx_masking_project ON masking_rules(project_id); +CREATE INDEX IF NOT EXISTS idx_access_policy_project ON data_access_policies(project_id); +CREATE INDEX IF NOT EXISTS idx_access_requests_policy ON access_requests(policy_id); +CREATE INDEX IF NOT EXISTS idx_access_requests_user ON access_requests(user_id); diff --git a/backend/schema_multimodal.sql b/backend/schema_multimodal.sql new file mode 
100644 index 0000000..796edfc --- /dev/null +++ b/backend/schema_multimodal.sql @@ -0,0 +1,104 @@ +-- Phase 7: 多模态支持相关表 + +-- 视频表 +CREATE TABLE IF NOT EXISTS videos ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + filename TEXT NOT NULL, + file_path TEXT, + duration REAL, -- 视频时长(秒) + width INTEGER, -- 视频宽度 + height INTEGER, -- 视频高度 + fps REAL, -- 帧率 + audio_extracted INTEGER DEFAULT 0, -- 是否已提取音频 + audio_path TEXT, -- 提取的音频文件路径 + transcript_id TEXT, -- 关联的转录记录ID + status TEXT DEFAULT 'pending', -- pending, processing, completed, failed + error_message TEXT, + metadata TEXT, -- JSON: 其他元数据 + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id), + FOREIGN KEY (transcript_id) REFERENCES transcripts(id) +); + +-- 视频关键帧表 +CREATE TABLE IF NOT EXISTS video_frames ( + id TEXT PRIMARY KEY, + video_id TEXT NOT NULL, + frame_number INTEGER NOT NULL, + timestamp REAL NOT NULL, -- 帧时间戳(秒) + frame_path TEXT NOT NULL, -- 帧图片路径 + ocr_text TEXT, -- OCR识别的文字 + ocr_confidence REAL, -- OCR置信度 + entities_detected TEXT, -- JSON: 检测到的实体 + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (video_id) REFERENCES videos(id) ON DELETE CASCADE +); + +-- 图片表 +CREATE TABLE IF NOT EXISTS images ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + filename TEXT NOT NULL, + file_path TEXT, + image_type TEXT, -- whiteboard, ppt, handwritten, screenshot, other + width INTEGER, + height INTEGER, + ocr_text TEXT, -- OCR识别的文字 + description TEXT, -- 图片描述(LLM生成) + entities_detected TEXT, -- JSON: 检测到的实体 + relations_detected TEXT, -- JSON: 检测到的关系 + transcript_id TEXT, -- 关联的转录记录ID(可选) + status TEXT DEFAULT 'pending', -- pending, processing, completed, failed + error_message TEXT, + metadata TEXT, -- JSON: 其他元数据 + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id), + FOREIGN KEY (transcript_id) 
REFERENCES transcripts(id) +); + +-- 多模态实体关联表 +CREATE TABLE IF NOT EXISTS multimodal_entities ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + entity_id TEXT NOT NULL, -- 关联的实体ID + source_type TEXT NOT NULL, -- audio, video, image, document + source_id TEXT NOT NULL, -- 来源ID(transcript_id, video_id, image_id) + mention_context TEXT, -- 提及上下文 + confidence REAL DEFAULT 1.0, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id), + FOREIGN KEY (entity_id) REFERENCES entities(id), + UNIQUE(entity_id, source_type, source_id) +); + +-- 多模态实体对齐表(跨模态实体关联) +CREATE TABLE IF NOT EXISTS multimodal_entity_links ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + source_entity_id TEXT NOT NULL, -- 源实体ID + target_entity_id TEXT NOT NULL, -- 目标实体ID + link_type TEXT NOT NULL, -- same_as, related_to, part_of + source_modality TEXT NOT NULL, -- audio, video, image, document + target_modality TEXT NOT NULL, -- audio, video, image, document + confidence REAL DEFAULT 1.0, + evidence TEXT, -- 关联证据 + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id), + FOREIGN KEY (source_entity_id) REFERENCES entities(id), + FOREIGN KEY (target_entity_id) REFERENCES entities(id) +); + +-- 创建索引 +CREATE INDEX IF NOT EXISTS idx_videos_project ON videos(project_id); +CREATE INDEX IF NOT EXISTS idx_videos_status ON videos(status); +CREATE INDEX IF NOT EXISTS idx_video_frames_video ON video_frames(video_id); +CREATE INDEX IF NOT EXISTS idx_video_frames_timestamp ON video_frames(timestamp); +CREATE INDEX IF NOT EXISTS idx_images_project ON images(project_id); +CREATE INDEX IF NOT EXISTS idx_images_type ON images(image_type); +CREATE INDEX IF NOT EXISTS idx_images_status ON images(status); +CREATE INDEX IF NOT EXISTS idx_multimodal_entities_project ON multimodal_entities(project_id); +CREATE INDEX IF NOT EXISTS idx_multimodal_entities_entity ON multimodal_entities(entity_id); +CREATE INDEX IF NOT EXISTS 
idx_multimodal_entity_links_project ON multimodal_entity_links(project_id); diff --git a/backend/security_manager.py b/backend/security_manager.py new file mode 100644 index 0000000..ab2d60e --- /dev/null +++ b/backend/security_manager.py @@ -0,0 +1,1232 @@ +""" +InsightFlow Phase 7 Task 3: 数据安全与合规模块 +Security Manager - 端到端加密、数据脱敏、审计日志 +""" + +import os +import json +import hashlib +import secrets +import base64 +import re +from datetime import datetime, timedelta +from typing import List, Optional, Dict, Any, Tuple +from dataclasses import dataclass, field, asdict +from enum import Enum +import sqlite3 + +# 加密相关 +try: + from cryptography.fernet import Fernet + from cryptography.hazmat.primitives import hashes + from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC + CRYPTO_AVAILABLE = True +except ImportError: + CRYPTO_AVAILABLE = False + print("Warning: cryptography not available, encryption features disabled") + + +class AuditActionType(Enum): + """审计动作类型""" + CREATE = "create" + READ = "read" + UPDATE = "update" + DELETE = "delete" + LOGIN = "login" + LOGOUT = "logout" + EXPORT = "export" + IMPORT = "import" + SHARE = "share" + PERMISSION_CHANGE = "permission_change" + ENCRYPTION_ENABLE = "encryption_enable" + ENCRYPTION_DISABLE = "encryption_disable" + DATA_MASKING = "data_masking" + API_KEY_CREATE = "api_key_create" + API_KEY_REVOKE = "api_key_revoke" + WORKFLOW_TRIGGER = "workflow_trigger" + WEBHOOK_SEND = "webhook_send" + BOT_MESSAGE = "bot_message" + + +class DataSensitivityLevel(Enum): + """数据敏感度级别""" + PUBLIC = "public" # 公开 + INTERNAL = "internal" # 内部 + CONFIDENTIAL = "confidential" # 机密 + SECRET = "secret" # 绝密 + + +class MaskingRuleType(Enum): + """脱敏规则类型""" + PHONE = "phone" # 手机号 + EMAIL = "email" # 邮箱 + ID_CARD = "id_card" # 身份证号 + BANK_CARD = "bank_card" # 银行卡号 + NAME = "name" # 姓名 + ADDRESS = "address" # 地址 + CUSTOM = "custom" # 自定义 + + +@dataclass +class AuditLog: + """审计日志条目""" + id: str + action_type: str + user_id: Optional[str] = 
None + user_ip: Optional[str] = None + user_agent: Optional[str] = None + resource_type: Optional[str] = None # project, entity, transcript, etc. + resource_id: Optional[str] = None + action_details: Optional[str] = None # JSON string + before_value: Optional[str] = None + after_value: Optional[str] = None + success: bool = True + error_message: Optional[str] = None + created_at: str = field(default_factory=lambda: datetime.now().isoformat()) + + def to_dict(self) -> Dict[str, Any]: + return asdict(self) + + +@dataclass +class EncryptionConfig: + """加密配置""" + id: str + project_id: str + is_enabled: bool = False + encryption_type: str = "aes-256-gcm" # aes-256-gcm, chacha20-poly1305 + key_derivation: str = "pbkdf2" # pbkdf2, argon2 + master_key_hash: Optional[str] = None # 主密钥哈希(用于验证) + salt: Optional[str] = None + created_at: str = field(default_factory=lambda: datetime.now().isoformat()) + updated_at: str = field(default_factory=lambda: datetime.now().isoformat()) + + def to_dict(self) -> Dict[str, Any]: + return asdict(self) + + +@dataclass +class MaskingRule: + """脱敏规则""" + id: str + project_id: str + name: str + rule_type: str # phone, email, id_card, bank_card, name, address, custom + pattern: str # 正则表达式 + replacement: str # 替换模板,如 "****" + is_active: bool = True + priority: int = 0 + description: Optional[str] = None + created_at: str = field(default_factory=lambda: datetime.now().isoformat()) + updated_at: str = field(default_factory=lambda: datetime.now().isoformat()) + + def to_dict(self) -> Dict[str, Any]: + return asdict(self) + + +@dataclass +class DataAccessPolicy: + """数据访问策略""" + id: str + project_id: str + name: str + description: Optional[str] = None + allowed_users: Optional[str] = None # JSON array of user IDs + allowed_roles: Optional[str] = None # JSON array of roles + allowed_ips: Optional[str] = None # JSON array of IP patterns + time_restrictions: Optional[str] = None # JSON: {"start_time": "09:00", "end_time": "18:00"} + max_access_count: 
Optional[int] = None # 最大访问次数 + require_approval: bool = False + is_active: bool = True + created_at: str = field(default_factory=lambda: datetime.now().isoformat()) + updated_at: str = field(default_factory=lambda: datetime.now().isoformat()) + + def to_dict(self) -> Dict[str, Any]: + return asdict(self) + + +@dataclass +class AccessRequest: + """访问请求(用于需要审批的访问)""" + id: str + policy_id: str + user_id: str + request_reason: Optional[str] = None + status: str = "pending" # pending, approved, rejected, expired + approved_by: Optional[str] = None + approved_at: Optional[str] = None + expires_at: Optional[str] = None + created_at: str = field(default_factory=lambda: datetime.now().isoformat()) + + def to_dict(self) -> Dict[str, Any]: + return asdict(self) + + +class SecurityManager: + """安全管理器""" + + # 预定义脱敏规则 + DEFAULT_MASKING_RULES = { + MaskingRuleType.PHONE: { + "pattern": r"(\d{3})\d{4}(\d{4})", + "replacement": r"\1****\2" + }, + MaskingRuleType.EMAIL: { + "pattern": r"(\w{1,3})\w+(@\w+\.\w+)", + "replacement": r"\1***\2" + }, + MaskingRuleType.ID_CARD: { + "pattern": r"(\d{6})\d{8}(\d{4})", + "replacement": r"\1********\2" + }, + MaskingRuleType.BANK_CARD: { + "pattern": r"(\d{4})\d+(\d{4})", + "replacement": r"\1 **** **** \2" + }, + MaskingRuleType.NAME: { + "pattern": r"([\u4e00-\u9fa5])[\u4e00-\u9fa5]+", + "replacement": r"\1**" + }, + MaskingRuleType.ADDRESS: { + "pattern": r"([\u4e00-\u9fa5]{2,})([\u4e00-\u9fa5]+路|街|巷|号)(.+)", + "replacement": r"\1\2***" + } + } + + def __init__(self, db_path: str = "insightflow.db"): + self.db_path = db_path + self._local = {} + self._init_db() + + def _init_db(self): + """初始化数据库表""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + # 审计日志表 + cursor.execute(""" + CREATE TABLE IF NOT EXISTS audit_logs ( + id TEXT PRIMARY KEY, + action_type TEXT NOT NULL, + user_id TEXT, + user_ip TEXT, + user_agent TEXT, + resource_type TEXT, + resource_id TEXT, + action_details TEXT, + before_value TEXT, + after_value 
TEXT, + success INTEGER DEFAULT 1, + error_message TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP + ) + """) + + # 加密配置表 + cursor.execute(""" + CREATE TABLE IF NOT EXISTS encryption_configs ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + is_enabled INTEGER DEFAULT 0, + encryption_type TEXT DEFAULT 'aes-256-gcm', + key_derivation TEXT DEFAULT 'pbkdf2', + master_key_hash TEXT, + salt TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id) + ) + """) + + # 脱敏规则表 + cursor.execute(""" + CREATE TABLE IF NOT EXISTS masking_rules ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + name TEXT NOT NULL, + rule_type TEXT NOT NULL, + pattern TEXT NOT NULL, + replacement TEXT NOT NULL, + is_active INTEGER DEFAULT 1, + priority INTEGER DEFAULT 0, + description TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id) + ) + """) + + # 数据访问策略表 + cursor.execute(""" + CREATE TABLE IF NOT EXISTS data_access_policies ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + name TEXT NOT NULL, + description TEXT, + allowed_users TEXT, + allowed_roles TEXT, + allowed_ips TEXT, + time_restrictions TEXT, + max_access_count INTEGER, + require_approval INTEGER DEFAULT 0, + is_active INTEGER DEFAULT 1, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (project_id) REFERENCES projects(id) + ) + """) + + # 访问请求表 + cursor.execute(""" + CREATE TABLE IF NOT EXISTS access_requests ( + id TEXT PRIMARY KEY, + policy_id TEXT NOT NULL, + user_id TEXT NOT NULL, + request_reason TEXT, + status TEXT DEFAULT 'pending', + approved_by TEXT, + approved_at TIMESTAMP, + expires_at TIMESTAMP, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + FOREIGN KEY (policy_id) REFERENCES data_access_policies(id) + ) + """) + + # 创建索引 + 
cursor.execute("CREATE INDEX IF NOT EXISTS idx_audit_logs_user ON audit_logs(user_id)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_audit_logs_resource ON audit_logs(resource_type, resource_id)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_audit_logs_action ON audit_logs(action_type)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_audit_logs_created ON audit_logs(created_at)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_encryption_project ON encryption_configs(project_id)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_masking_project ON masking_rules(project_id)") + cursor.execute("CREATE INDEX IF NOT EXISTS idx_access_policy_project ON data_access_policies(project_id)") + + conn.commit() + conn.close() + + def _generate_id(self) -> str: + """生成唯一ID""" + return hashlib.sha256( + f"{datetime.now().isoformat()}{secrets.token_hex(16)}".encode() + ).hexdigest()[:32] + + # ==================== 审计日志 ==================== + + def log_audit( + self, + action_type: AuditActionType, + user_id: Optional[str] = None, + user_ip: Optional[str] = None, + user_agent: Optional[str] = None, + resource_type: Optional[str] = None, + resource_id: Optional[str] = None, + action_details: Optional[Dict] = None, + before_value: Optional[str] = None, + after_value: Optional[str] = None, + success: bool = True, + error_message: Optional[str] = None + ) -> AuditLog: + """记录审计日志""" + log = AuditLog( + id=self._generate_id(), + action_type=action_type.value, + user_id=user_id, + user_ip=user_ip, + user_agent=user_agent, + resource_type=resource_type, + resource_id=resource_id, + action_details=json.dumps(action_details) if action_details else None, + before_value=before_value, + after_value=after_value, + success=success, + error_message=error_message + ) + + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + cursor.execute(""" + INSERT INTO audit_logs + (id, action_type, user_id, user_ip, user_agent, resource_type, resource_id, + action_details, before_value, 
after_value, success, error_message, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + """, ( + log.id, log.action_type, log.user_id, log.user_ip, log.user_agent, + log.resource_type, log.resource_id, log.action_details, + log.before_value, log.after_value, int(log.success), + log.error_message, log.created_at + )) + conn.commit() + conn.close() + + return log + + def get_audit_logs( + self, + user_id: Optional[str] = None, + resource_type: Optional[str] = None, + resource_id: Optional[str] = None, + action_type: Optional[str] = None, + start_time: Optional[str] = None, + end_time: Optional[str] = None, + success: Optional[bool] = None, + limit: int = 100, + offset: int = 0 + ) -> List[AuditLog]: + """查询审计日志""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + query = "SELECT * FROM audit_logs WHERE 1=1" + params = [] + + if user_id: + query += " AND user_id = ?" + params.append(user_id) + if resource_type: + query += " AND resource_type = ?" + params.append(resource_type) + if resource_id: + query += " AND resource_id = ?" + params.append(resource_id) + if action_type: + query += " AND action_type = ?" + params.append(action_type) + if start_time: + query += " AND created_at >= ?" + params.append(start_time) + if end_time: + query += " AND created_at <= ?" + params.append(end_time) + if success is not None: + query += " AND success = ?" + params.append(int(success)) + + query += " ORDER BY created_at DESC LIMIT ? OFFSET ?" 
+ params.extend([limit, offset]) + + cursor.execute(query, params) + rows = cursor.fetchall() + conn.close() + + logs = [] + for row in rows: + log = AuditLog( + id=row[0], + action_type=row[1], + user_id=row[2], + user_ip=row[3], + user_agent=row[4], + resource_type=row[5], + resource_id=row[6], + action_details=row[7], + before_value=row[8], + after_value=row[9], + success=bool(row[10]), + error_message=row[11], + created_at=row[12] + ) + logs.append(log) + + return logs + + def get_audit_stats( + self, + start_time: Optional[str] = None, + end_time: Optional[str] = None + ) -> Dict[str, Any]: + """获取审计统计""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + query = "SELECT action_type, success, COUNT(*) FROM audit_logs WHERE 1=1" + params = [] + + if start_time: + query += " AND created_at >= ?" + params.append(start_time) + if end_time: + query += " AND created_at <= ?"
+ params.append(end_time) + + query += " GROUP BY action_type, success" + + cursor.execute(query, params) + rows = cursor.fetchall() + + stats = { + "total_actions": 0, + "success_count": 0, + "failure_count": 0, + "action_breakdown": {} + } + + for action_type, success, count in rows: + stats["total_actions"] += count + if success: + stats["success_count"] += count + else: + stats["failure_count"] += count + + if action_type not in stats["action_breakdown"]: + stats["action_breakdown"][action_type] = {"success": 0, "failure": 0} + + if success: + stats["action_breakdown"][action_type]["success"] += count + else: + stats["action_breakdown"][action_type]["failure"] += count + + conn.close() + return stats + + # ==================== 端到端加密 ==================== + + def _derive_key(self, password: str, salt: bytes) -> bytes: + """从密码派生密钥""" + if not CRYPTO_AVAILABLE: + raise RuntimeError("cryptography library not available") + + kdf = PBKDF2HMAC( + algorithm=hashes.SHA256(), + length=32, + salt=salt, + iterations=100000, + ) + return base64.urlsafe_b64encode(kdf.derive(password.encode())) + + def enable_encryption( + self, + project_id: str, + master_password: str + ) -> EncryptionConfig: + """启用项目加密""" + if not CRYPTO_AVAILABLE: + raise RuntimeError("cryptography library not available") + + # 生成盐值 + salt = secrets.token_hex(16) + + # 派生密钥并哈希(用于验证) + key = self._derive_key(master_password, salt.encode()) + key_hash = hashlib.sha256(key).hexdigest() + + config = EncryptionConfig( + id=self._generate_id(), + project_id=project_id, + is_enabled=True, + encryption_type="aes-256-gcm", + key_derivation="pbkdf2", + master_key_hash=key_hash, + salt=salt + ) + + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + # 检查是否已存在配置 + cursor.execute( + "SELECT id FROM encryption_configs WHERE project_id = ?", + (project_id,) + ) + existing = cursor.fetchone() + + if existing: + cursor.execute(""" + UPDATE encryption_configs + SET is_enabled = 1, encryption_type = ?, 
key_derivation = ?, + master_key_hash = ?, salt = ?, updated_at = ? + WHERE project_id = ? + """, ( + config.encryption_type, config.key_derivation, + config.master_key_hash, config.salt, + config.updated_at, project_id + )) + config.id = existing[0] + else: + cursor.execute(""" + INSERT INTO encryption_configs + (id, project_id, is_enabled, encryption_type, key_derivation, + master_key_hash, salt, created_at, updated_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) + """, ( + config.id, config.project_id, int(config.is_enabled), + config.encryption_type, config.key_derivation, + config.master_key_hash, config.salt, + config.created_at, config.updated_at + )) + + conn.commit() + conn.close() + + # 记录审计日志 + self.log_audit( + action_type=AuditActionType.ENCRYPTION_ENABLE, + resource_type="project", + resource_id=project_id, + action_details={"encryption_type": config.encryption_type} + ) + + return config + + def disable_encryption( + self, + project_id: str, + master_password: str + ) -> bool: + """禁用项目加密""" + # 验证密码 + if not self.verify_encryption_password(project_id, master_password): + return False + + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute(""" + UPDATE encryption_configs + SET is_enabled = 0, updated_at = ? + WHERE project_id = ? 
+ """, (datetime.now().isoformat(), project_id)) + + conn.commit() + conn.close() + + # 记录审计日志 + self.log_audit( + action_type=AuditActionType.ENCRYPTION_DISABLE, + resource_type="project", + resource_id=project_id + ) + + return True + + def verify_encryption_password( + self, + project_id: str, + password: str + ) -> bool: + """验证加密密码""" + if not CRYPTO_AVAILABLE: + return False + + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute( + "SELECT master_key_hash, salt FROM encryption_configs WHERE project_id = ?", + (project_id,) + ) + row = cursor.fetchone() + conn.close() + + if not row: + return False + + stored_hash, salt = row + key = self._derive_key(password, salt.encode()) + key_hash = hashlib.sha256(key).hexdigest() + + return key_hash == stored_hash + + def get_encryption_config(self, project_id: str) -> Optional[EncryptionConfig]: + """获取加密配置""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute( + "SELECT * FROM encryption_configs WHERE project_id = ?", + (project_id,) + ) + row = cursor.fetchone() + conn.close() + + if not row: + return None + + return EncryptionConfig( + id=row[0], + project_id=row[1], + is_enabled=bool(row[2]), + encryption_type=row[3], + key_derivation=row[4], + master_key_hash=row[5], + salt=row[6], + created_at=row[7], + updated_at=row[8] + ) + + def encrypt_data( + self, + data: str, + password: str, + salt: Optional[str] = None + ) -> Tuple[str, str]: + """加密数据""" + if not CRYPTO_AVAILABLE: + raise RuntimeError("cryptography library not available") + + if salt is None: + salt = secrets.token_hex(16) + + key = self._derive_key(password, salt.encode()) + f = Fernet(key) + encrypted = f.encrypt(data.encode()) + + return base64.b64encode(encrypted).decode(), salt + + def decrypt_data( + self, + encrypted_data: str, + password: str, + salt: str + ) -> str: + """解密数据""" + if not CRYPTO_AVAILABLE: + raise RuntimeError("cryptography library not available") + + key = 
self._derive_key(password, salt.encode()) + f = Fernet(key) + decrypted = f.decrypt(base64.b64decode(encrypted_data)) + + return decrypted.decode() + + # ==================== 数据脱敏 ==================== + + def create_masking_rule( + self, + project_id: str, + name: str, + rule_type: MaskingRuleType, + pattern: Optional[str] = None, + replacement: Optional[str] = None, + description: Optional[str] = None, + priority: int = 0 + ) -> MaskingRule: + """创建脱敏规则""" + # 使用预定义规则或自定义规则 + if rule_type in self.DEFAULT_MASKING_RULES and not pattern: + default = self.DEFAULT_MASKING_RULES[rule_type] + pattern = default["pattern"] + replacement = replacement or default["replacement"] + + rule = MaskingRule( + id=self._generate_id(), + project_id=project_id, + name=name, + rule_type=rule_type.value, + pattern=pattern or "", + replacement=replacement or "****", + description=description, + priority=priority + ) + + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute(""" + INSERT INTO masking_rules + (id, project_id, name, rule_type, pattern, replacement, + is_active, priority, description, created_at, updated_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + """, ( + rule.id, rule.project_id, rule.name, rule.rule_type, + rule.pattern, rule.replacement, int(rule.is_active), + rule.priority, rule.description, rule.created_at, rule.updated_at + )) + + conn.commit() + conn.close() + + # 记录审计日志 + self.log_audit( + action_type=AuditActionType.DATA_MASKING, + resource_type="project", + resource_id=project_id, + action_details={"action": "create_rule", "rule_name": name} + ) + + return rule + + def get_masking_rules( + self, + project_id: str, + active_only: bool = True + ) -> List[MaskingRule]: + """获取脱敏规则""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + query = "SELECT * FROM masking_rules WHERE project_id = ?" 
+ params = [project_id] + + if active_only: + query += " AND is_active = 1" + + query += " ORDER BY priority DESC" + + cursor.execute(query, params) + rows = cursor.fetchall() + conn.close() + + rules = [] + for row in rows: + rules.append(MaskingRule( + id=row[0], + project_id=row[1], + name=row[2], + rule_type=row[3], + pattern=row[4], + replacement=row[5], + is_active=bool(row[6]), + priority=row[7], + description=row[8], + created_at=row[9], + updated_at=row[10] + )) + + return rules + + def update_masking_rule( + self, + rule_id: str, + **kwargs + ) -> Optional[MaskingRule]: + """更新脱敏规则""" + allowed_fields = ["name", "pattern", "replacement", "is_active", "priority", "description"] + + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + set_clauses = [] + params = [] + + for key, value in kwargs.items(): + if key in allowed_fields: + set_clauses.append(f"{key} = ?") + params.append(int(value) if key == "is_active" else value) + + if not set_clauses: + conn.close() + return None + + set_clauses.append("updated_at = ?") + params.append(datetime.now().isoformat()) + params.append(rule_id) + + cursor.execute(f""" + UPDATE masking_rules + SET {', '.join(set_clauses)} + WHERE id = ? 
+ """, params) + + conn.commit() + conn.close() + + # 获取更新后的规则 + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + cursor.execute("SELECT * FROM masking_rules WHERE id = ?", (rule_id,)) + row = cursor.fetchone() + conn.close() + + if not row: + return None + + return MaskingRule( + id=row[0], + project_id=row[1], + name=row[2], + rule_type=row[3], + pattern=row[4], + replacement=row[5], + is_active=bool(row[6]), + priority=row[7], + description=row[8], + created_at=row[9], + updated_at=row[10] + ) + + def delete_masking_rule(self, rule_id: str) -> bool: + """删除脱敏规则""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute("DELETE FROM masking_rules WHERE id = ?", (rule_id,)) + + success = cursor.rowcount > 0 + conn.commit() + conn.close() + + return success + + def apply_masking( + self, + text: str, + project_id: str, + rule_types: Optional[List[MaskingRuleType]] = None + ) -> str: + """应用脱敏规则到文本""" + rules = self.get_masking_rules(project_id) + + if not rules: + return text + + masked_text = text + + for rule in rules: + # 如果指定了规则类型,只应用指定类型的规则 + if rule_types and MaskingRuleType(rule.rule_type) not in rule_types: + continue + + try: + masked_text = re.sub( + rule.pattern, + rule.replacement, + masked_text + ) + except re.error: + # 忽略无效的正则表达式 + continue + + return masked_text + + def apply_masking_to_entity( + self, + entity_data: Dict[str, Any], + project_id: str + ) -> Dict[str, Any]: + """对实体数据应用脱敏""" + masked_data = entity_data.copy() + + # 对可能包含敏感信息的字段进行脱敏 + sensitive_fields = ["name", "definition", "description", "value"] + + for field in sensitive_fields: + if field in masked_data and isinstance(masked_data[field], str): + masked_data[field] = self.apply_masking(masked_data[field], project_id) + + return masked_data + + # ==================== 数据访问策略 ==================== + + def create_access_policy( + self, + project_id: str, + name: str, + description: Optional[str] = None, + allowed_users: Optional[List[str]] = 
None, + allowed_roles: Optional[List[str]] = None, + allowed_ips: Optional[List[str]] = None, + time_restrictions: Optional[Dict] = None, + max_access_count: Optional[int] = None, + require_approval: bool = False + ) -> DataAccessPolicy: + """创建数据访问策略""" + policy = DataAccessPolicy( + id=self._generate_id(), + project_id=project_id, + name=name, + description=description, + allowed_users=json.dumps(allowed_users) if allowed_users else None, + allowed_roles=json.dumps(allowed_roles) if allowed_roles else None, + allowed_ips=json.dumps(allowed_ips) if allowed_ips else None, + time_restrictions=json.dumps(time_restrictions) if time_restrictions else None, + max_access_count=max_access_count, + require_approval=require_approval + ) + + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute(""" + INSERT INTO data_access_policies + (id, project_id, name, description, allowed_users, allowed_roles, + allowed_ips, time_restrictions, max_access_count, require_approval, + is_active, created_at, updated_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + """, ( + policy.id, policy.project_id, policy.name, policy.description, + policy.allowed_users, policy.allowed_roles, policy.allowed_ips, + policy.time_restrictions, policy.max_access_count, + int(policy.require_approval), int(policy.is_active), + policy.created_at, policy.updated_at + )) + + conn.commit() + conn.close() + + return policy + + def get_access_policies( + self, + project_id: str, + active_only: bool = True + ) -> List[DataAccessPolicy]: + """获取数据访问策略""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + query = "SELECT * FROM data_access_policies WHERE project_id = ?" 
+ params = [project_id] + + if active_only: + query += " AND is_active = 1" + + cursor.execute(query, params) + rows = cursor.fetchall() + conn.close() + + policies = [] + for row in rows: + policies.append(DataAccessPolicy( + id=row[0], + project_id=row[1], + name=row[2], + description=row[3], + allowed_users=row[4], + allowed_roles=row[5], + allowed_ips=row[6], + time_restrictions=row[7], + max_access_count=row[8], + require_approval=bool(row[9]), + is_active=bool(row[10]), + created_at=row[11], + updated_at=row[12] + )) + + return policies + + def check_access_permission( + self, + policy_id: str, + user_id: str, + user_ip: Optional[str] = None + ) -> Tuple[bool, Optional[str]]: + """检查访问权限""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute( + "SELECT * FROM data_access_policies WHERE id = ? AND is_active = 1", + (policy_id,) + ) + row = cursor.fetchone() + conn.close() + + if not row: + return False, "Policy not found or inactive" + + policy = DataAccessPolicy( + id=row[0], + project_id=row[1], + name=row[2], + description=row[3], + allowed_users=row[4], + allowed_roles=row[5], + allowed_ips=row[6], + time_restrictions=row[7], + max_access_count=row[8], + require_approval=bool(row[9]), + is_active=bool(row[10]), + created_at=row[11], + updated_at=row[12] + ) + + # 检查用户白名单 + if policy.allowed_users: + allowed = json.loads(policy.allowed_users) + if user_id not in allowed: + return False, "User not in allowed list" + + # 检查IP白名单 + if policy.allowed_ips and user_ip: + allowed_ips = json.loads(policy.allowed_ips) + ip_allowed = False + for ip_pattern in allowed_ips: + if self._match_ip_pattern(user_ip, ip_pattern): + ip_allowed = True + break + if not ip_allowed: + return False, "IP not in allowed list" + + # 检查时间限制 + if policy.time_restrictions: + restrictions = json.loads(policy.time_restrictions) + now = datetime.now() + + if "start_time" in restrictions and "end_time" in restrictions: + current_time = now.strftime("%H:%M") + 
if not (restrictions["start_time"] <= current_time <= restrictions["end_time"]): + return False, "Access not allowed at this time" + + if "days_of_week" in restrictions: + if now.weekday() not in restrictions["days_of_week"]: + return False, "Access not allowed on this day" + + # 检查是否需要审批 + if policy.require_approval: + # 检查是否有有效的访问请求 + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute(""" + SELECT * FROM access_requests + WHERE policy_id = ? AND user_id = ? AND status = 'approved' + AND (expires_at IS NULL OR expires_at > ?) + """, (policy_id, user_id, datetime.now().isoformat())) + + request = cursor.fetchone() + conn.close() + + if not request: + return False, "Access requires approval" + + return True, None + + def _match_ip_pattern(self, ip: str, pattern: str) -> bool: + """匹配IP模式(支持CIDR)""" + import ipaddress + + try: + if "/" in pattern: + # CIDR 表示法 + network = ipaddress.ip_network(pattern, strict=False) + return ipaddress.ip_address(ip) in network + else: + # 精确匹配 + return ip == pattern + except ValueError: + return ip == pattern + + def create_access_request( + self, + policy_id: str, + user_id: str, + request_reason: Optional[str] = None, + expires_hours: int = 24 + ) -> AccessRequest: + """创建访问请求""" + request = AccessRequest( + id=self._generate_id(), + policy_id=policy_id, + user_id=user_id, + request_reason=request_reason, + expires_at=(datetime.now() + timedelta(hours=expires_hours)).isoformat() + ) + + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute(""" + INSERT INTO access_requests + (id, policy_id, user_id, request_reason, status, expires_at, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?) 
+ """, ( + request.id, request.policy_id, request.user_id, + request.request_reason, request.status, request.expires_at, + request.created_at + )) + + conn.commit() + conn.close() + + return request + + def approve_access_request( + self, + request_id: str, + approved_by: str, + expires_hours: int = 24 + ) -> Optional[AccessRequest]: + """批准访问请求""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + expires_at = (datetime.now() + timedelta(hours=expires_hours)).isoformat() + approved_at = datetime.now().isoformat() + + cursor.execute(""" + UPDATE access_requests + SET status = 'approved', approved_by = ?, approved_at = ?, expires_at = ? + WHERE id = ? + """, (approved_by, approved_at, expires_at, request_id)) + + conn.commit() + + # 获取更新后的请求 + cursor.execute("SELECT * FROM access_requests WHERE id = ?", (request_id,)) + row = cursor.fetchone() + conn.close() + + if not row: + return None + + return AccessRequest( + id=row[0], + policy_id=row[1], + user_id=row[2], + request_reason=row[3], + status=row[4], + approved_by=row[5], + approved_at=row[6], + expires_at=row[7], + created_at=row[8] + ) + + def reject_access_request( + self, + request_id: str, + rejected_by: str + ) -> Optional[AccessRequest]: + """拒绝访问请求""" + conn = sqlite3.connect(self.db_path) + cursor = conn.cursor() + + cursor.execute(""" + UPDATE access_requests + SET status = 'rejected', approved_by = ? + WHERE id = ? 
+ """, (rejected_by, request_id)) + + conn.commit() + + cursor.execute("SELECT * FROM access_requests WHERE id = ?", (request_id,)) + row = cursor.fetchone() + conn.close() + + if not row: + return None + + return AccessRequest( + id=row[0], + policy_id=row[1], + user_id=row[2], + request_reason=row[3], + status=row[4], + approved_by=row[5], + approved_at=row[6], + expires_at=row[7], + created_at=row[8] + ) + + +# 全局安全管理器实例 +_security_manager = None + + +def get_security_manager(db_path: str = "insightflow.db") -> SecurityManager: + """获取安全管理器实例""" + global _security_manager + if _security_manager is None: + _security_manager = SecurityManager(db_path) + return _security_manager diff --git a/backend/test_multimodal.py b/backend/test_multimodal.py new file mode 100644 index 0000000..68789cf --- /dev/null +++ b/backend/test_multimodal.py @@ -0,0 +1,157 @@ +#!/usr/bin/env python3 +""" +InsightFlow Multimodal Module Test Script +测试多模态支持模块 +""" + +import sys +import os + +# 添加 backend 目录到路径 +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +print("=" * 60) +print("InsightFlow 多模态模块测试") +print("=" * 60) + +# 测试导入 +print("\n1. 测试模块导入...") + +try: + from multimodal_processor import ( + get_multimodal_processor, MultimodalProcessor, + VideoProcessingResult, VideoFrame + ) + print(" ✓ multimodal_processor 导入成功") +except ImportError as e: + print(f" ✗ multimodal_processor 导入失败: {e}") + +try: + from image_processor import ( + get_image_processor, ImageProcessor, + ImageProcessingResult, ImageEntity, ImageRelation + ) + print(" ✓ image_processor 导入成功") +except ImportError as e: + print(f" ✗ image_processor 导入失败: {e}") + +try: + from multimodal_entity_linker import ( + get_multimodal_entity_linker, MultimodalEntityLinker, + MultimodalEntity, EntityLink, AlignmentResult, FusionResult + ) + print(" ✓ multimodal_entity_linker 导入成功") +except ImportError as e: + print(f" ✗ multimodal_entity_linker 导入失败: {e}") + +# 测试初始化 +print("\n2. 
测试模块初始化...") + +try: + processor = get_multimodal_processor() + print(f" ✓ MultimodalProcessor 初始化成功") + print(f" - 临时目录: {processor.temp_dir}") + print(f" - 帧提取间隔: {processor.frame_interval}秒") +except Exception as e: + print(f" ✗ MultimodalProcessor 初始化失败: {e}") + +try: + img_processor = get_image_processor() + print(f" ✓ ImageProcessor 初始化成功") + print(f" - 临时目录: {img_processor.temp_dir}") +except Exception as e: + print(f" ✗ ImageProcessor 初始化失败: {e}") + +try: + linker = get_multimodal_entity_linker() + print(f" ✓ MultimodalEntityLinker 初始化成功") + print(f" - 相似度阈值: {linker.similarity_threshold}") +except Exception as e: + print(f" ✗ MultimodalEntityLinker 初始化失败: {e}") + +# 测试实体关联功能 +print("\n3. 测试实体关联功能...") + +try: + linker = get_multimodal_entity_linker() + + # 测试字符串相似度 + sim = linker.calculate_string_similarity("Project Alpha", "Project Alpha") + assert sim == 1.0, "完全匹配应该返回1.0" + print(f" ✓ 字符串相似度计算正常 (完全匹配: {sim})") + + sim = linker.calculate_string_similarity("K8s", "Kubernetes") + print(f" ✓ 字符串相似度计算正常 (不同字符串: {sim:.2f})") + + # 测试实体相似度 + entity1 = {"name": "Project Alpha", "type": "PROJECT", "definition": "核心项目"} + entity2 = {"name": "Project Alpha", "type": "PROJECT", "definition": "主要项目"} + sim, match_type = linker.calculate_entity_similarity(entity1, entity2) + print(f" ✓ 实体相似度计算正常 (相似度: {sim:.2f}, 类型: {match_type})") + +except Exception as e: + print(f" ✗ 实体关联功能测试失败: {e}") + +# 测试图片处理功能(不需要实际图片) +print("\n4. 测试图片处理器功能...") + +try: + processor = get_image_processor() + + # 测试图片类型检测(使用模拟数据) + print(f" ✓ 支持的图片类型: {list(processor.IMAGE_TYPES.keys())}") + print(f" ✓ 图片类型描述: {processor.IMAGE_TYPES}") + +except Exception as e: + print(f" ✗ 图片处理器功能测试失败: {e}") + +# 测试视频处理配置 +print("\n5. 
测试视频处理器配置...") + +try: + processor = get_multimodal_processor() + + print(f" ✓ 视频目录: {processor.video_dir}") + print(f" ✓ 帧目录: {processor.frames_dir}") + print(f" ✓ 音频目录: {processor.audio_dir}") + + # 检查目录是否存在 + for dir_name, dir_path in [ + ("视频", processor.video_dir), + ("帧", processor.frames_dir), + ("音频", processor.audio_dir) + ]: + if os.path.exists(dir_path): + print(f" ✓ {dir_name}目录存在: {dir_path}") + else: + print(f" ✗ {dir_name}目录不存在: {dir_path}") + +except Exception as e: + print(f" ✗ 视频处理器配置测试失败: {e}") + +# 测试数据库方法(如果数据库可用) +print("\n6. 测试数据库多模态方法...") + +try: + from db_manager import get_db_manager + db = get_db_manager() + + # 检查多模态表是否存在 + conn = db.get_conn() + tables = ['videos', 'video_frames', 'images', 'multimodal_mentions', 'multimodal_entity_links'] + + for table in tables: + try: + conn.execute(f"SELECT 1 FROM {table} LIMIT 1") + print(f" ✓ 表 '{table}' 存在") + except Exception as e: + print(f" ✗ 表 '{table}' 不存在或无法访问: {e}") + + conn.close() + +except Exception as e: + print(f" ✗ 数据库多模态方法测试失败: {e}") + +print("\n" + "=" * 60) +print("测试完成") +print("=" * 60) diff --git a/backend/workflow_manager.py b/backend/workflow_manager.py new file mode 100644 index 0000000..13ab764 --- /dev/null +++ b/backend/workflow_manager.py @@ -0,0 +1,1488 @@ +#!/usr/bin/env python3 +""" +InsightFlow Workflow Manager - Phase 7 +智能工作流自动化模块 +- 定时任务调度(APScheduler) +- 自动分析新上传文件 +- 自动实体对齐和关系发现 +- Webhook 通知系统(飞书、钉钉、Slack) +- 工作流配置管理 +""" + +import os +import json +import uuid +import asyncio +import httpx +import logging +from datetime import datetime, timedelta +from typing import List, Dict, Optional, Callable, Any +from dataclasses import dataclass, field, asdict +from enum import Enum +from collections import defaultdict + +from apscheduler.schedulers.asyncio import AsyncIOScheduler +from apscheduler.triggers.cron import CronTrigger +from apscheduler.triggers.interval import IntervalTrigger +from apscheduler.triggers.date import DateTrigger +from apscheduler.events import 
EVENT_JOB_EXECUTED, EVENT_JOB_ERROR + +# Configure logging +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + + +class WorkflowStatus(Enum): + """工作流状态""" + ACTIVE = "active" + PAUSED = "paused" + ERROR = "error" + COMPLETED = "completed" + + +class WorkflowType(Enum): + """工作流类型""" + AUTO_ANALYZE = "auto_analyze" # 自动分析新文件 + AUTO_ALIGN = "auto_align" # 自动实体对齐 + AUTO_RELATION = "auto_relation" # 自动关系发现 + SCHEDULED_REPORT = "scheduled_report" # 定时报告 + CUSTOM = "custom" # 自定义工作流 + + +class WebhookType(Enum): + """Webhook 类型""" + FEISHU = "feishu" + DINGTALK = "dingtalk" + SLACK = "slack" + CUSTOM = "custom" + + +class TaskStatus(Enum): + """任务执行状态""" + PENDING = "pending" + RUNNING = "running" + SUCCESS = "success" + FAILED = "failed" + CANCELLED = "cancelled" + + +@dataclass +class WorkflowTask: + """工作流任务定义""" + id: str + workflow_id: str + name: str + task_type: str # analyze, align, discover_relations, notify, custom + config: Dict = field(default_factory=dict) + order: int = 0 + depends_on: List[str] = field(default_factory=list) + timeout_seconds: int = 300 + retry_count: int = 3 + retry_delay: int = 5 + created_at: str = "" + updated_at: str = "" + + def __post_init__(self): + if not self.created_at: + self.created_at = datetime.now().isoformat() + if not self.updated_at: + self.updated_at = self.created_at + + +@dataclass +class WebhookConfig: + """Webhook 配置""" + id: str + name: str + webhook_type: str # feishu, dingtalk, slack, custom + url: str + secret: str = "" # 用于签名验证 + headers: Dict = field(default_factory=dict) + template: str = "" # 消息模板 + is_active: bool = True + created_at: str = "" + updated_at: str = "" + last_used_at: Optional[str] = None + success_count: int = 0 + fail_count: int = 0 + + def __post_init__(self): + if not self.created_at: + self.created_at = datetime.now().isoformat() + if not self.updated_at: + self.updated_at = self.created_at + + +@dataclass +class Workflow: + """工作流定义""" + id: str + name: str + 
description: str + workflow_type: str + project_id: str + status: str = "active" + schedule: Optional[str] = None # cron expression or interval + schedule_type: str = "manual" # manual, cron, interval + config: Dict = field(default_factory=dict) + webhook_ids: List[str] = field(default_factory=list) + is_active: bool = True + created_at: str = "" + updated_at: str = "" + last_run_at: Optional[str] = None + next_run_at: Optional[str] = None + run_count: int = 0 + success_count: int = 0 + fail_count: int = 0 + + def __post_init__(self): + if not self.created_at: + self.created_at = datetime.now().isoformat() + if not self.updated_at: + self.updated_at = self.created_at + + +@dataclass +class WorkflowLog: + """工作流执行日志""" + id: str + workflow_id: str + task_id: Optional[str] = None + status: str = "pending" # pending, running, success, failed, cancelled + start_time: Optional[str] = None + end_time: Optional[str] = None + duration_ms: int = 0 + input_data: Dict = field(default_factory=dict) + output_data: Dict = field(default_factory=dict) + error_message: str = "" + created_at: str = "" + + def __post_init__(self): + if not self.created_at: + self.created_at = datetime.now().isoformat() + + +class WebhookNotifier: + """Webhook 通知器 - 支持飞书、钉钉、Slack""" + + def __init__(self): + self.http_client = httpx.AsyncClient(timeout=30.0) + + async def send(self, config: WebhookConfig, message: Dict) -> bool: + """发送 Webhook 通知""" + try: + webhook_type = WebhookType(config.webhook_type) + + if webhook_type == WebhookType.FEISHU: + return await self._send_feishu(config, message) + elif webhook_type == WebhookType.DINGTALK: + return await self._send_dingtalk(config, message) + elif webhook_type == WebhookType.SLACK: + return await self._send_slack(config, message) + else: + return await self._send_custom(config, message) + + except Exception as e: + logger.error(f"Webhook send failed: {e}") + return False + + async def _send_feishu(self, config: WebhookConfig, message: Dict) -> bool: 
+ """发送飞书通知""" + import hashlib + import base64 + import hmac + + timestamp = str(int(datetime.now().timestamp())) + + # 签名计算 + if config.secret: + string_to_sign = f"{timestamp}\n{config.secret}" + hmac_code = hmac.new( + string_to_sign.encode('utf-8'), + digestmod=hashlib.sha256 + ).digest() + sign = base64.b64encode(hmac_code).decode('utf-8') + else: + sign = "" + + # 构建消息体 + if "content" in message: + # 文本消息 + payload = { + "timestamp": timestamp, + "sign": sign, + "msg_type": "text", + "content": { + "text": message["content"] + } + } + elif "title" in message: + # 富文本消息 + payload = { + "timestamp": timestamp, + "sign": sign, + "msg_type": "post", + "content": { + "post": { + "zh_cn": { + "title": message.get("title", ""), + "content": message.get("body", []) + } + } + } + } + else: + # 卡片消息 + payload = { + "timestamp": timestamp, + "sign": sign, + "msg_type": "interactive", + "card": message.get("card", {}) + } + + headers = { + "Content-Type": "application/json", + **config.headers + } + + response = await self.http_client.post( + config.url, + json=payload, + headers=headers + ) + response.raise_for_status() + result = response.json() + + return result.get("code") == 0 + + async def _send_dingtalk(self, config: WebhookConfig, message: Dict) -> bool: + """发送钉钉通知""" + import hashlib + import base64 + import hmac + import urllib.parse + + timestamp = str(round(datetime.now().timestamp() * 1000)) + + # 签名计算 + if config.secret: + secret_enc = config.secret.encode('utf-8') + string_to_sign = f"{timestamp}\n{config.secret}" + hmac_code = hmac.new(secret_enc, string_to_sign.encode('utf-8'), digestmod=hashlib.sha256).digest() + sign = urllib.parse.quote_plus(base64.b64encode(hmac_code)) + url = f"{config.url}×tamp={timestamp}&sign={sign}" + else: + url = config.url + + # 构建消息体 + if "content" in message: + payload = { + "msgtype": "text", + "text": { + "content": message["content"] + } + } + elif "title" in message: + payload = { + "msgtype": "markdown", + 
"markdown": { + "title": message["title"], + "text": message.get("markdown", "") + } + } + elif "link" in message: + payload = { + "msgtype": "link", + "link": { + "text": message.get("text", ""), + "title": message["title"], + "picUrl": message.get("pic_url", ""), + "messageUrl": message["link"] + } + } + else: + payload = { + "msgtype": "action_card", + "action_card": message.get("action_card", {}) + } + + headers = { + "Content-Type": "application/json", + **config.headers + } + + response = await self.http_client.post(url, json=payload, headers=headers) + response.raise_for_status() + result = response.json() + + return result.get("errcode") == 0 + + async def _send_slack(self, config: WebhookConfig, message: Dict) -> bool: + """发送 Slack 通知""" + # Slack 直接支持标准 webhook 格式 + payload = { + "text": message.get("content", message.get("text", "")), + } + + if "blocks" in message: + payload["blocks"] = message["blocks"] + + if "attachments" in message: + payload["attachments"] = message["attachments"] + + headers = { + "Content-Type": "application/json", + **config.headers + } + + response = await self.http_client.post( + config.url, + json=payload, + headers=headers + ) + response.raise_for_status() + + return response.text == "ok" + + async def _send_custom(self, config: WebhookConfig, message: Dict) -> bool: + """发送自定义 Webhook 通知""" + headers = { + "Content-Type": "application/json", + **config.headers + } + + response = await self.http_client.post( + config.url, + json=message, + headers=headers + ) + response.raise_for_status() + + return True + + async def close(self): + """关闭 HTTP 客户端""" + await self.http_client.aclose() + + +class WorkflowManager: + """工作流管理器 - 核心管理类""" + + def __init__(self, db_manager=None): + self.db = db_manager + self.scheduler = AsyncIOScheduler() + self.notifier = WebhookNotifier() + self._task_handlers: Dict[str, Callable] = {} + self._running_tasks: Dict[str, asyncio.Task] = {} + self._setup_default_handlers() + + # 添加调度器事件监听 + 
self.scheduler.add_listener( + self._on_job_executed, + EVENT_JOB_EXECUTED | EVENT_JOB_ERROR + ) + + def _setup_default_handlers(self): + """设置默认的任务处理器""" + self._task_handlers = { + "analyze": self._handle_analyze_task, + "align": self._handle_align_task, + "discover_relations": self._handle_discover_relations_task, + "notify": self._handle_notify_task, + "custom": self._handle_custom_task, + } + + def register_task_handler(self, task_type: str, handler: Callable): + """注册自定义任务处理器""" + self._task_handlers[task_type] = handler + + def start(self): + """启动工作流管理器""" + if not self.scheduler.running: + self.scheduler.start() + logger.info("Workflow scheduler started") + + # 加载并调度所有活跃的工作流 + if self.db: + asyncio.create_task(self._load_and_schedule_workflows()) + + def stop(self): + """停止工作流管理器""" + if self.scheduler.running: + self.scheduler.shutdown(wait=True) + logger.info("Workflow scheduler stopped") + + async def _load_and_schedule_workflows(self): + """从数据库加载并调度所有活跃工作流""" + try: + workflows = self.list_workflows(status="active") + for workflow in workflows: + if workflow.schedule and workflow.is_active: + self._schedule_workflow(workflow) + except Exception as e: + logger.error(f"Failed to load workflows: {e}") + + def _schedule_workflow(self, workflow: Workflow): + """调度工作流""" + job_id = f"workflow_{workflow.id}" + + # 移除已存在的任务 + if self.scheduler.get_job(job_id): + self.scheduler.remove_job(job_id) + + if workflow.schedule_type == "cron": + # Cron 表达式调度 + trigger = CronTrigger.from_crontab(workflow.schedule) + elif workflow.schedule_type == "interval": + # 间隔调度 + interval_minutes = int(workflow.schedule) + trigger = IntervalTrigger(minutes=interval_minutes) + else: + return + + self.scheduler.add_job( + func=self._execute_workflow_job, + trigger=trigger, + id=job_id, + args=[workflow.id], + replace_existing=True, + max_instances=1, + coalesce=True + ) + + logger.info(f"Scheduled workflow {workflow.id} ({workflow.name}) with {workflow.schedule_type}") + + async 
def _execute_workflow_job(self, workflow_id: str): + """调度器调用的工作流执行函数""" + try: + await self.execute_workflow(workflow_id) + except Exception as e: + logger.error(f"Scheduled workflow execution failed: {e}") + + def _on_job_executed(self, event): + """调度器事件处理""" + if event.exception: + logger.error(f"Job {event.job_id} failed: {event.exception}") + else: + logger.info(f"Job {event.job_id} executed successfully") + + # ==================== Workflow CRUD ==================== + + def create_workflow(self, workflow: Workflow) -> Workflow: + """创建工作流""" + conn = self.db.get_conn() + try: + conn.execute( + """INSERT INTO workflows + (id, name, description, workflow_type, project_id, status, + schedule, schedule_type, config, webhook_ids, is_active, + created_at, updated_at, last_run_at, next_run_at, + run_count, success_count, fail_count) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (workflow.id, workflow.name, workflow.description, workflow.workflow_type, + workflow.project_id, workflow.status, workflow.schedule, workflow.schedule_type, + json.dumps(workflow.config), json.dumps(workflow.webhook_ids), workflow.is_active, + workflow.created_at, workflow.updated_at, workflow.last_run_at, workflow.next_run_at, + workflow.run_count, workflow.success_count, workflow.fail_count) + ) + conn.commit() + + # 如果设置了调度,立即调度 + if workflow.schedule and workflow.is_active: + self._schedule_workflow(workflow) + + return workflow + finally: + conn.close() + + def get_workflow(self, workflow_id: str) -> Optional[Workflow]: + """获取工作流""" + conn = self.db.get_conn() + try: + row = conn.execute( + "SELECT * FROM workflows WHERE id = ?", + (workflow_id,) + ).fetchone() + + if not row: + return None + + return self._row_to_workflow(row) + finally: + conn.close() + + def list_workflows(self, project_id: str = None, status: str = None, + workflow_type: str = None) -> List[Workflow]: + """列出工作流""" + conn = self.db.get_conn() + try: + conditions = [] + params = [] + + if 
project_id: + conditions.append("project_id = ?") + params.append(project_id) + if status: + conditions.append("status = ?") + params.append(status) + if workflow_type: + conditions.append("workflow_type = ?") + params.append(workflow_type) + + where_clause = " AND ".join(conditions) if conditions else "1=1" + + rows = conn.execute( + f"SELECT * FROM workflows WHERE {where_clause} ORDER BY created_at DESC", + params + ).fetchall() + + return [self._row_to_workflow(row) for row in rows] + finally: + conn.close() + + def update_workflow(self, workflow_id: str, **kwargs) -> Optional[Workflow]: + """更新工作流""" + conn = self.db.get_conn() + try: + # 运行统计字段由 execute_workflow 通过本方法回写,必须包含在白名单中,否则会被静默丢弃 + allowed_fields = ['name', 'description', 'status', 'schedule', + 'schedule_type', 'is_active', 'config', 'webhook_ids', + 'last_run_at', 'next_run_at', 'run_count', + 'success_count', 'fail_count'] + updates = [] + values = [] + + for field in allowed_fields: + if field in kwargs: + updates.append(f"{field} = ?") + if field in ['config', 'webhook_ids']: + values.append(json.dumps(kwargs[field])) + else: + values.append(kwargs[field]) + + if not updates: + return self.get_workflow(workflow_id) + + updates.append("updated_at = ?") + values.append(datetime.now().isoformat()) + values.append(workflow_id) + + query = f"UPDATE workflows SET {', '.join(updates)} WHERE id = ?" 
+ conn.execute(query, values) + conn.commit() + + # 重新调度 + workflow = self.get_workflow(workflow_id) + if workflow and workflow.schedule and workflow.is_active: + self._schedule_workflow(workflow) + elif workflow and not workflow.is_active: + job_id = f"workflow_{workflow_id}" + if self.scheduler.get_job(job_id): + self.scheduler.remove_job(job_id) + + return workflow + finally: + conn.close() + + def delete_workflow(self, workflow_id: str) -> bool: + """删除工作流""" + conn = self.db.get_conn() + try: + # 移除调度 + job_id = f"workflow_{workflow_id}" + if self.scheduler.get_job(job_id): + self.scheduler.remove_job(job_id) + + # 删除相关任务 + conn.execute("DELETE FROM workflow_tasks WHERE workflow_id = ?", (workflow_id,)) + + # 删除工作流 + conn.execute("DELETE FROM workflows WHERE id = ?", (workflow_id,)) + conn.commit() + + return True + finally: + conn.close() + + def _row_to_workflow(self, row) -> Workflow: + """将数据库行转换为 Workflow 对象""" + return Workflow( + id=row['id'], + name=row['name'], + description=row['description'] or "", + workflow_type=row['workflow_type'], + project_id=row['project_id'], + status=row['status'], + schedule=row['schedule'], + schedule_type=row['schedule_type'], + config=json.loads(row['config']) if row['config'] else {}, + webhook_ids=json.loads(row['webhook_ids']) if row['webhook_ids'] else [], + is_active=bool(row['is_active']), + created_at=row['created_at'], + updated_at=row['updated_at'], + last_run_at=row['last_run_at'], + next_run_at=row['next_run_at'], + run_count=row['run_count'] or 0, + success_count=row['success_count'] or 0, + fail_count=row['fail_count'] or 0 + ) + + # ==================== Workflow Task CRUD ==================== + + def create_task(self, task: WorkflowTask) -> WorkflowTask: + """创建工作流任务""" + conn = self.db.get_conn() + try: + conn.execute( + """INSERT INTO workflow_tasks + (id, workflow_id, name, task_type, config, task_order, + depends_on, timeout_seconds, retry_count, retry_delay, + created_at, updated_at) + VALUES (?, ?, 
?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (task.id, task.workflow_id, task.name, task.task_type, + json.dumps(task.config), task.order, json.dumps(task.depends_on), + task.timeout_seconds, task.retry_count, task.retry_delay, + task.created_at, task.updated_at) + ) + conn.commit() + return task + finally: + conn.close() + + def get_task(self, task_id: str) -> Optional[WorkflowTask]: + """获取任务""" + conn = self.db.get_conn() + try: + row = conn.execute( + "SELECT * FROM workflow_tasks WHERE id = ?", + (task_id,) + ).fetchone() + + if not row: + return None + + return self._row_to_task(row) + finally: + conn.close() + + def list_tasks(self, workflow_id: str) -> List[WorkflowTask]: + """列出工作流的所有任务""" + conn = self.db.get_conn() + try: + rows = conn.execute( + "SELECT * FROM workflow_tasks WHERE workflow_id = ? ORDER BY task_order", + (workflow_id,) + ).fetchall() + + return [self._row_to_task(row) for row in rows] + finally: + conn.close() + + def update_task(self, task_id: str, **kwargs) -> Optional[WorkflowTask]: + """更新任务""" + conn = self.db.get_conn() + try: + allowed_fields = ['name', 'task_type', 'config', 'task_order', + 'depends_on', 'timeout_seconds', 'retry_count', 'retry_delay'] + updates = [] + values = [] + + for field in allowed_fields: + if field in kwargs: + updates.append(f"{field} = ?") + if field in ['config', 'depends_on']: + values.append(json.dumps(kwargs[field])) + else: + values.append(kwargs[field]) + + if not updates: + return self.get_task(task_id) + + updates.append("updated_at = ?") + values.append(datetime.now().isoformat()) + values.append(task_id) + + query = f"UPDATE workflow_tasks SET {', '.join(updates)} WHERE id = ?" 
+ conn.execute(query, values) + conn.commit() + + return self.get_task(task_id) + finally: + conn.close() + + def delete_task(self, task_id: str) -> bool: + """删除任务""" + conn = self.db.get_conn() + try: + conn.execute("DELETE FROM workflow_tasks WHERE id = ?", (task_id,)) + conn.commit() + return True + finally: + conn.close() + + def _row_to_task(self, row) -> WorkflowTask: + """将数据库行转换为 WorkflowTask 对象""" + return WorkflowTask( + id=row['id'], + workflow_id=row['workflow_id'], + name=row['name'], + task_type=row['task_type'], + config=json.loads(row['config']) if row['config'] else {}, + order=row['task_order'] or 0, + depends_on=json.loads(row['depends_on']) if row['depends_on'] else [], + timeout_seconds=row['timeout_seconds'] or 300, + retry_count=row['retry_count'] or 3, + retry_delay=row['retry_delay'] or 5, + created_at=row['created_at'], + updated_at=row['updated_at'] + ) + + # ==================== Webhook Config CRUD ==================== + + def create_webhook(self, webhook: WebhookConfig) -> WebhookConfig: + """创建 Webhook 配置""" + conn = self.db.get_conn() + try: + conn.execute( + """INSERT INTO webhook_configs + (id, name, webhook_type, url, secret, headers, template, + is_active, created_at, updated_at, last_used_at, + success_count, fail_count) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (webhook.id, webhook.name, webhook.webhook_type, webhook.url, + webhook.secret, json.dumps(webhook.headers), webhook.template, + webhook.is_active, webhook.created_at, webhook.updated_at, + webhook.last_used_at, webhook.success_count, webhook.fail_count) + ) + conn.commit() + return webhook + finally: + conn.close() + + def get_webhook(self, webhook_id: str) -> Optional[WebhookConfig]: + """获取 Webhook 配置""" + conn = self.db.get_conn() + try: + row = conn.execute( + "SELECT * FROM webhook_configs WHERE id = ?", + (webhook_id,) + ).fetchone() + + if not row: + return None + + return self._row_to_webhook(row) + finally: + conn.close() + + def 
list_webhooks(self) -> List[WebhookConfig]: + """列出所有 Webhook 配置""" + conn = self.db.get_conn() + try: + rows = conn.execute( + "SELECT * FROM webhook_configs ORDER BY created_at DESC" + ).fetchall() + + return [self._row_to_webhook(row) for row in rows] + finally: + conn.close() + + def update_webhook(self, webhook_id: str, **kwargs) -> Optional[WebhookConfig]: + """更新 Webhook 配置""" + conn = self.db.get_conn() + try: + allowed_fields = ['name', 'webhook_type', 'url', 'secret', + 'headers', 'template', 'is_active'] + updates = [] + values = [] + + for field in allowed_fields: + if field in kwargs: + updates.append(f"{field} = ?") + if field == 'headers': + values.append(json.dumps(kwargs[field])) + else: + values.append(kwargs[field]) + + if not updates: + return self.get_webhook(webhook_id) + + updates.append("updated_at = ?") + values.append(datetime.now().isoformat()) + values.append(webhook_id) + + query = f"UPDATE webhook_configs SET {', '.join(updates)} WHERE id = ?" + conn.execute(query, values) + conn.commit() + + return self.get_webhook(webhook_id) + finally: + conn.close() + + def delete_webhook(self, webhook_id: str) -> bool: + """删除 Webhook 配置""" + conn = self.db.get_conn() + try: + conn.execute("DELETE FROM webhook_configs WHERE id = ?", (webhook_id,)) + conn.commit() + return True + finally: + conn.close() + + def update_webhook_stats(self, webhook_id: str, success: bool): + """更新 Webhook 统计""" + conn = self.db.get_conn() + try: + if success: + conn.execute( + """UPDATE webhook_configs + SET success_count = success_count + 1, last_used_at = ? + WHERE id = ?""", + (datetime.now().isoformat(), webhook_id) + ) + else: + conn.execute( + """UPDATE webhook_configs + SET fail_count = fail_count + 1, last_used_at = ? 
+ WHERE id = ?""", + (datetime.now().isoformat(), webhook_id) + ) + conn.commit() + finally: + conn.close() + + def _row_to_webhook(self, row) -> WebhookConfig: + """将数据库行转换为 WebhookConfig 对象""" + return WebhookConfig( + id=row['id'], + name=row['name'], + webhook_type=row['webhook_type'], + url=row['url'], + secret=row['secret'] or "", + headers=json.loads(row['headers']) if row['headers'] else {}, + template=row['template'] or "", + is_active=bool(row['is_active']), + created_at=row['created_at'], + updated_at=row['updated_at'], + last_used_at=row['last_used_at'], + success_count=row['success_count'] or 0, + fail_count=row['fail_count'] or 0 + ) + + # ==================== Workflow Log ==================== + + def create_log(self, log: WorkflowLog) -> WorkflowLog: + """创建工作流日志""" + conn = self.db.get_conn() + try: + conn.execute( + """INSERT INTO workflow_logs + (id, workflow_id, task_id, status, start_time, end_time, + duration_ms, input_data, output_data, error_message, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", + (log.id, log.workflow_id, log.task_id, log.status, + log.start_time, log.end_time, log.duration_ms, + json.dumps(log.input_data), json.dumps(log.output_data), + log.error_message, log.created_at) + ) + conn.commit() + return log + finally: + conn.close() + + def update_log(self, log_id: str, **kwargs) -> Optional[WorkflowLog]: + """更新工作流日志""" + conn = self.db.get_conn() + try: + allowed_fields = ['status', 'end_time', 'duration_ms', + 'output_data', 'error_message'] + updates = [] + values = [] + + for field in allowed_fields: + if field in kwargs: + updates.append(f"{field} = ?") + if field == 'output_data': + values.append(json.dumps(kwargs[field])) + else: + values.append(kwargs[field]) + + if not updates: + return None + + values.append(log_id) + query = f"UPDATE workflow_logs SET {', '.join(updates)} WHERE id = ?" 
+ conn.execute(query, values) + conn.commit() + + return self.get_log(log_id) + finally: + conn.close() + + def get_log(self, log_id: str) -> Optional[WorkflowLog]: + """获取日志""" + conn = self.db.get_conn() + try: + row = conn.execute( + "SELECT * FROM workflow_logs WHERE id = ?", + (log_id,) + ).fetchone() + + if not row: + return None + + return self._row_to_log(row) + finally: + conn.close() + + def list_logs(self, workflow_id: str = None, task_id: str = None, + status: str = None, limit: int = 100, offset: int = 0) -> List[WorkflowLog]: + """列出工作流日志""" + conn = self.db.get_conn() + try: + conditions = [] + params = [] + + if workflow_id: + conditions.append("workflow_id = ?") + params.append(workflow_id) + if task_id: + conditions.append("task_id = ?") + params.append(task_id) + if status: + conditions.append("status = ?") + params.append(status) + + where_clause = " AND ".join(conditions) if conditions else "1=1" + + rows = conn.execute( + f"""SELECT * FROM workflow_logs + WHERE {where_clause} + ORDER BY created_at DESC + LIMIT ? OFFSET ?""", + params + [limit, offset] + ).fetchall() + + return [self._row_to_log(row) for row in rows] + finally: + conn.close() + + def get_workflow_stats(self, workflow_id: str, days: int = 30) -> Dict: + """获取工作流统计""" + conn = self.db.get_conn() + try: + since = (datetime.now() - timedelta(days=days)).isoformat() + + # 总执行次数 + total = conn.execute( + "SELECT COUNT(*) FROM workflow_logs WHERE workflow_id = ? AND created_at > ?", + (workflow_id, since) + ).fetchone()[0] + + # 成功次数 + success = conn.execute( + "SELECT COUNT(*) FROM workflow_logs WHERE workflow_id = ? AND status = 'success' AND created_at > ?", + (workflow_id, since) + ).fetchone()[0] + + # 失败次数 + failed = conn.execute( + "SELECT COUNT(*) FROM workflow_logs WHERE workflow_id = ? 
AND status = 'failed' AND created_at > ?", + (workflow_id, since) + ).fetchone()[0] + + # 平均执行时间 + avg_duration = conn.execute( + "SELECT AVG(duration_ms) FROM workflow_logs WHERE workflow_id = ? AND created_at > ?", + (workflow_id, since) + ).fetchone()[0] or 0 + + # 每日统计 + daily = conn.execute( + """SELECT DATE(created_at) as date, + COUNT(*) as count, + SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as success + FROM workflow_logs + WHERE workflow_id = ? AND created_at > ? + GROUP BY DATE(created_at) + ORDER BY date""", + (workflow_id, since) + ).fetchall() + + return { + "total": total, + "success": success, + "failed": failed, + "success_rate": round(success / total * 100, 2) if total > 0 else 0, + "avg_duration_ms": round(avg_duration, 2), + "daily": [{"date": r["date"], "count": r["count"], "success": r["success"]} for r in daily] + } + finally: + conn.close() + + def _row_to_log(self, row) -> WorkflowLog: + """将数据库行转换为 WorkflowLog 对象""" + return WorkflowLog( + id=row['id'], + workflow_id=row['workflow_id'], + task_id=row['task_id'], + status=row['status'], + start_time=row['start_time'], + end_time=row['end_time'], + duration_ms=row['duration_ms'] or 0, + input_data=json.loads(row['input_data']) if row['input_data'] else {}, + output_data=json.loads(row['output_data']) if row['output_data'] else {}, + error_message=row['error_message'] or "", + created_at=row['created_at'] + ) + + # ==================== Workflow Execution ==================== + + async def execute_workflow(self, workflow_id: str, input_data: Dict = None) -> Dict: + """执行工作流""" + workflow = self.get_workflow(workflow_id) + if not workflow: + raise ValueError(f"Workflow {workflow_id} not found") + + if not workflow.is_active: + raise ValueError(f"Workflow {workflow_id} is not active") + + # 更新最后运行时间 + now = datetime.now().isoformat() + self.update_workflow(workflow_id, last_run_at=now, + run_count=workflow.run_count + 1) + + # 创建工作流执行日志 + log = WorkflowLog( + id=str(uuid.uuid4())[:8], + 
workflow_id=workflow_id, + status=TaskStatus.RUNNING.value, + start_time=now, + input_data=input_data or {} + ) + self.create_log(log) + + start_time = datetime.now() + results = {} + + try: + # 获取所有任务 + tasks = self.list_tasks(workflow_id) + + if not tasks: + # 没有任务时执行默认行为 + results = await self._execute_default_workflow(workflow, input_data) + else: + # 按依赖顺序执行任务 + results = await self._execute_tasks_with_deps(tasks, input_data, log.id) + + # 发送通知 + await self._send_workflow_notification(workflow, results, success=True) + + # 更新日志为成功 + end_time = datetime.now() + duration = int((end_time - start_time).total_seconds() * 1000) + self.update_log( + log.id, + status=TaskStatus.SUCCESS.value, + end_time=end_time.isoformat(), + duration_ms=duration, + output_data=results + ) + + # 更新成功计数 + self.update_workflow(workflow_id, success_count=workflow.success_count + 1) + + return { + "success": True, + "workflow_id": workflow_id, + "log_id": log.id, + "results": results, + "duration_ms": duration + } + + except Exception as e: + logger.error(f"Workflow {workflow_id} execution failed: {e}") + + # 更新日志为失败 + end_time = datetime.now() + duration = int((end_time - start_time).total_seconds() * 1000) + self.update_log( + log.id, + status=TaskStatus.FAILED.value, + end_time=end_time.isoformat(), + duration_ms=duration, + error_message=str(e) + ) + + # 更新失败计数 + self.update_workflow(workflow_id, fail_count=workflow.fail_count + 1) + + # 发送失败通知 + await self._send_workflow_notification(workflow, {"error": str(e)}, success=False) + + raise + + async def _execute_tasks_with_deps(self, tasks: List[WorkflowTask], + input_data: Dict, log_id: str) -> Dict: + """按依赖顺序执行任务""" + results = {} + completed_tasks = set() + + # 构建任务映射 + task_map = {t.id: t for t in tasks} + + while len(completed_tasks) < len(tasks): + # 找到可以执行的任务(依赖已完成) + ready_tasks = [ + t for t in tasks + if t.id not in completed_tasks and + all(dep in completed_tasks for dep in t.depends_on) + ] + + if not ready_tasks: + # 
有循环依赖或无法完成的任务 + raise ValueError("Circular dependency detected or tasks cannot be resolved") + + # 并行执行就绪的任务 + task_coros = [] + for task in ready_tasks: + task_input = {**input_data, **results} + task_coros.append(self._execute_single_task(task, task_input, log_id)) + + task_results = await asyncio.gather(*task_coros, return_exceptions=True) + + for task, result in zip(ready_tasks, task_results): + if isinstance(result, Exception): + logger.error(f"Task {task.id} failed: {result}") + if task.retry_count > 0: + # 重试逻辑 + for attempt in range(task.retry_count): + await asyncio.sleep(task.retry_delay) + try: + result = await self._execute_single_task(task, task_input, log_id) + break + except Exception as e: + logger.error(f"Task {task.id} retry {attempt + 1} failed: {e}") + if attempt == task.retry_count - 1: + raise + else: + raise result + + results[task.name] = result + completed_tasks.add(task.id) + + return results + + async def _execute_single_task(self, task: WorkflowTask, + input_data: Dict, log_id: str) -> Any: + """执行单个任务""" + handler = self._task_handlers.get(task.task_type) + if not handler: + raise ValueError(f"No handler for task type: {task.task_type}") + + # 创建任务日志 + task_log = WorkflowLog( + id=str(uuid.uuid4())[:8], + workflow_id=task.workflow_id, + task_id=task.id, + status=TaskStatus.RUNNING.value, + start_time=datetime.now().isoformat(), + input_data=input_data + ) + self.create_log(task_log) + + try: + # 设置超时 + result = await asyncio.wait_for( + handler(task, input_data), + timeout=task.timeout_seconds + ) + + # 更新任务日志为成功 + self.update_log( + task_log.id, + status=TaskStatus.SUCCESS.value, + end_time=datetime.now().isoformat(), + output_data={"result": result} if not isinstance(result, dict) else result + ) + + return result + + except asyncio.TimeoutError: + self.update_log( + task_log.id, + status=TaskStatus.FAILED.value, + end_time=datetime.now().isoformat(), + error_message="Task timeout" + ) + raise TimeoutError(f"Task {task.id} timed out 
after {task.timeout_seconds}s") + + except Exception as e: + self.update_log( + task_log.id, + status=TaskStatus.FAILED.value, + end_time=datetime.now().isoformat(), + error_message=str(e) + ) + raise + + async def _execute_default_workflow(self, workflow: Workflow, + input_data: Dict) -> Dict: + """执行默认工作流(根据类型)""" + workflow_type = WorkflowType(workflow.workflow_type) + + if workflow_type == WorkflowType.AUTO_ANALYZE: + return await self._auto_analyze_files(workflow, input_data) + elif workflow_type == WorkflowType.AUTO_ALIGN: + return await self._auto_align_entities(workflow, input_data) + elif workflow_type == WorkflowType.AUTO_RELATION: + return await self._auto_discover_relations(workflow, input_data) + elif workflow_type == WorkflowType.SCHEDULED_REPORT: + return await self._generate_scheduled_report(workflow, input_data) + else: + return {"message": "No default action for custom workflow"} + + # ==================== Default Task Handlers ==================== + + async def _handle_analyze_task(self, task: WorkflowTask, input_data: Dict) -> Dict: + """处理分析任务""" + project_id = input_data.get("project_id") + file_ids = input_data.get("file_ids", []) + + if not project_id: + raise ValueError("project_id required for analyze task") + + # 这里调用现有的文件分析逻辑 + # 实际实现需要与 main.py 中的 upload_audio 逻辑集成 + return { + "task": "analyze", + "project_id": project_id, + "files_processed": len(file_ids), + "status": "completed" + } + + async def _handle_align_task(self, task: WorkflowTask, input_data: Dict) -> Dict: + """处理实体对齐任务""" + project_id = input_data.get("project_id") + threshold = task.config.get("threshold", 0.85) + + if not project_id: + raise ValueError("project_id required for align task") + + # 这里调用实体对齐逻辑 + return { + "task": "align", + "project_id": project_id, + "threshold": threshold, + "entities_merged": 0, # 实际实现需要调用对齐逻辑 + "status": "completed" + } + + async def _handle_discover_relations_task(self, task: WorkflowTask, + input_data: Dict) -> Dict: + 
"""处理关系发现任务""" + project_id = input_data.get("project_id") + + if not project_id: + raise ValueError("project_id required for discover_relations task") + + # 这里调用关系发现逻辑 + return { + "task": "discover_relations", + "project_id": project_id, + "relations_found": 0, # 实际实现需要调用关系发现逻辑 + "status": "completed" + } + + async def _handle_notify_task(self, task: WorkflowTask, input_data: Dict) -> Dict: + """处理通知任务""" + webhook_id = task.config.get("webhook_id") + message = task.config.get("message", {}) + + if not webhook_id: + raise ValueError("webhook_id required for notify task") + + webhook = self.get_webhook(webhook_id) + if not webhook: + raise ValueError(f"Webhook {webhook_id} not found") + + # 替换模板变量 + if webhook.template: + try: + message = json.loads(webhook.template.format(**input_data)) + except: + pass + + success = await self.notifier.send(webhook, message) + self.update_webhook_stats(webhook_id, success) + + return { + "task": "notify", + "webhook_id": webhook_id, + "success": success + } + + async def _handle_custom_task(self, task: WorkflowTask, input_data: Dict) -> Dict: + """处理自定义任务""" + # 自定义任务的具体逻辑由外部处理器实现 + return { + "task": "custom", + "task_name": task.name, + "config": task.config, + "status": "completed" + } + + # ==================== Default Workflow Implementations ==================== + + async def _auto_analyze_files(self, workflow: Workflow, input_data: Dict) -> Dict: + """自动分析新上传的文件""" + project_id = workflow.project_id + + # 获取未分析的文件(实际实现需要查询数据库) + # 这里是一个示例实现 + return { + "workflow_type": "auto_analyze", + "project_id": project_id, + "files_analyzed": 0, + "entities_extracted": 0, + "relations_extracted": 0, + "status": "completed" + } + + async def _auto_align_entities(self, workflow: Workflow, input_data: Dict) -> Dict: + """自动实体对齐""" + project_id = workflow.project_id + threshold = workflow.config.get("threshold", 0.85) + + return { + "workflow_type": "auto_align", + "project_id": project_id, + "threshold": threshold, + 
"entities_merged": 0, + "status": "completed" + } + + async def _auto_discover_relations(self, workflow: Workflow, input_data: Dict) -> Dict: + """自动关系发现""" + project_id = workflow.project_id + + return { + "workflow_type": "auto_relation", + "project_id": project_id, + "relations_discovered": 0, + "status": "completed" + } + + async def _generate_scheduled_report(self, workflow: Workflow, input_data: Dict) -> Dict: + """生成定时报告""" + project_id = workflow.project_id + report_type = workflow.config.get("report_type", "summary") + + return { + "workflow_type": "scheduled_report", + "project_id": project_id, + "report_type": report_type, + "status": "completed" + } + + # ==================== Notification ==================== + + async def _send_workflow_notification(self, workflow: Workflow, + results: Dict, success: bool = True): + """发送工作流执行通知""" + if not workflow.webhook_ids: + return + + for webhook_id in workflow.webhook_ids: + webhook = self.get_webhook(webhook_id) + if not webhook or not webhook.is_active: + continue + + # 构建通知消息 + if webhook.webhook_type == WebhookType.FEISHU.value: + message = self._build_feishu_message(workflow, results, success) + elif webhook.webhook_type == WebhookType.DINGTALK.value: + message = self._build_dingtalk_message(workflow, results, success) + elif webhook.webhook_type == WebhookType.SLACK.value: + message = self._build_slack_message(workflow, results, success) + else: + message = { + "workflow_id": workflow.id, + "workflow_name": workflow.name, + "status": "success" if success else "failed", + "results": results, + "timestamp": datetime.now().isoformat() + } + + try: + result = await self.notifier.send(webhook, message) + self.update_webhook_stats(webhook_id, result) + except Exception as e: + logger.error(f"Failed to send notification to {webhook_id}: {e}") + + def _build_feishu_message(self, workflow: Workflow, results: Dict, + success: bool) -> Dict: + """构建飞书消息""" + status_text = "✅ 成功" if success else "❌ 失败" + + return { + 
"title": f"工作流执行通知: {workflow.name}", + "body": [ + [{"tag": "text", "text": f"工作流: {workflow.name}"}], + [{"tag": "text", "text": f"状态: {status_text}"}], + [{"tag": "text", "text": f"时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}"}], + ] + } + + def _build_dingtalk_message(self, workflow: Workflow, results: Dict, + success: bool) -> Dict: + """构建钉钉消息""" + status_text = "✅ 成功" if success else "❌ 失败" + + return { + "title": f"工作流执行通知: {workflow.name}", + "markdown": f"""### 工作流执行通知 + +**工作流:** {workflow.name} + +**状态:** {status_text} + +**时间:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} + +**结果:** +```json +{json.dumps(results, ensure_ascii=False, indent=2)} +``` +""" + } + + def _build_slack_message(self, workflow: Workflow, results: Dict, + success: bool) -> Dict: + """构建 Slack 消息""" + color = "#36a64f" if success else "#ff0000" + status_text = "Success" if success else "Failed" + + return { + "attachments": [ + { + "color": color, + "title": f"Workflow Execution: {workflow.name}", + "fields": [ + {"title": "Status", "value": status_text, "short": True}, + {"title": "Time", "value": datetime.now().strftime('%Y-%m-%d %H:%M:%S'), "short": True} + ], + "footer": "InsightFlow", + "ts": int(datetime.now().timestamp()) + } + ] + } + + +# Singleton instance +_workflow_manager = None + + +def get_workflow_manager(db_manager=None) -> WorkflowManager: + """获取 WorkflowManager 单例""" + global _workflow_manager + if _workflow_manager is None: + _workflow_manager = WorkflowManager(db_manager) + return _workflow_manager diff --git a/chrome-extension/README.md b/chrome-extension/README.md new file mode 100644 index 0000000..ecd9d2b --- /dev/null +++ b/chrome-extension/README.md @@ -0,0 +1,113 @@ +# InsightFlow Chrome Extension + +一键将网页内容导入 InsightFlow 知识库的 Chrome 扩展。 + +## 功能特性 + +- 📄 **保存整个页面** - 自动提取正文内容并保存 +- ✏️ **保存选中内容** - 只保存您选中的文本 +- 🔗 **保存链接** - 快速保存网页链接 +- 🔄 **自动同步** - 剪辑后自动同步到服务器 +- 📎 **浮动按钮** - 页面右下角快速访问按钮 +- 🎯 **智能提取** - 自动识别正文,过滤广告和导航 + +## 安装方法 + +### 开发者模式安装 
+ +1. 打开 Chrome 浏览器,进入 `chrome://extensions/` +2. 开启右上角的"开发者模式" +3. 点击"加载已解压的扩展程序" +4. 选择 `chrome-extension` 文件夹 + +### 配置 + +1. 点击扩展图标,选择"设置" +2. 填写您的 InsightFlow 服务器地址 +3. 输入 Chrome 扩展令牌(从 InsightFlow 插件管理页面获取) +4. 点击"保存设置" +5. 点击"测试连接"验证配置 + +## 使用方法 + +### 方式一:扩展图标 +1. 点击浏览器工具栏上的 InsightFlow 图标 +2. 选择"保存整个页面"或"保存选中内容" + +### 方式二:右键菜单 +1. 在网页任意位置右键 +2. 选择"Clip page to InsightFlow"或"Clip selection to InsightFlow" + +### 方式三:浮动按钮 +1. 在页面右下角点击 📎 按钮 +2. 快速保存当前页面 + +### 方式四:快捷键 +- `Ctrl+Shift+S` (Windows/Linux) +- `Cmd+Shift+S` (Mac) + +## 文件结构 + +``` +chrome-extension/ +├── manifest.json # 扩展配置 +├── background.js # 后台脚本 +├── content.js # 内容脚本 +├── content.css # 内容样式 +├── popup.html # 弹出窗口 +├── popup.js # 弹出窗口脚本 +├── options.html # 设置页面 +├── options.js # 设置页面脚本 +└── icons/ # 图标文件夹 + ├── icon16.png + ├── icon48.png + └── icon128.png +``` + +## 开发 + +### 本地开发 + +1. 修改代码后,在 `chrome://extensions/` 页面点击刷新按钮 +2. 查看背景页控制台:扩展卡片 > 背景页 > 控制台 + +### 打包发布 + +1. 确保所有文件已保存 +2. 在 `chrome://extensions/` 页面点击"打包扩展程序" +3. 选择 `chrome-extension` 文件夹 +4. 生成 `.crx` 和 `.pem` 文件 + +## API 集成 + +扩展通过以下 API 与 InsightFlow 服务器通信: + +### 导入网页内容 +``` +POST /api/v1/plugins/chrome/import +Content-Type: application/json +X-API-Key: {token} + +{ + "token": "if_ext_xxx", + "url": "https://example.com/article", + "title": "文章标题", + "content": "正文内容...", + "html_content": "..." 
+} +``` + +### 健康检查 +``` +GET /api/v1/health +``` + +## 隐私说明 + +- 扩展仅在您主动点击时收集网页内容 +- 所有数据存储在您的 InsightFlow 服务器上 +- 不会收集或发送任何个人信息到第三方 + +## 许可证 + +MIT License \ No newline at end of file diff --git a/chrome-extension/background.js b/chrome-extension/background.js new file mode 100644 index 0000000..7f169c2 --- /dev/null +++ b/chrome-extension/background.js @@ -0,0 +1,198 @@ +// InsightFlow Chrome Extension - Background Script +// 处理扩展的后台逻辑 + +chrome.runtime.onInstalled.addListener(() => { + console.log('[InsightFlow] Extension installed'); + + // 创建右键菜单 + chrome.contextMenus.create({ + id: 'insightflow-clip-selection', + title: 'Clip selection to InsightFlow', + contexts: ['selection'] + }); + + chrome.contextMenus.create({ + id: 'insightflow-clip-page', + title: 'Clip page to InsightFlow', + contexts: ['page'] + }); + + chrome.contextMenus.create({ + id: 'insightflow-clip-link', + title: 'Clip link to InsightFlow', + contexts: ['link'] + }); +}); + +// 处理右键菜单点击 +chrome.contextMenus.onClicked.addListener((info, tab) => { + if (info.menuItemId === 'insightflow-clip-selection') { + clipSelection(tab); + } else if (info.menuItemId === 'insightflow-clip-page') { + clipPage(tab); + } else if (info.menuItemId === 'insightflow-clip-link') { + clipLink(tab, info.linkUrl); + } +}); + +// 处理来自 popup 的消息 +chrome.runtime.onMessage.addListener((request, sender, sendResponse) => { + if (request.action === 'clipPage') { + chrome.tabs.query({ active: true, currentWindow: true }, (tabs) => { + if (tabs[0]) { + clipPage(tabs[0]).then(sendResponse); + } + }); + return true; + } else if (request.action === 'clipSelection') { + chrome.tabs.query({ active: true, currentWindow: true }, (tabs) => { + if (tabs[0]) { + clipSelection(tabs[0]).then(sendResponse); + } + }); + return true; + } else if (request.action === 'openClipper') { + chrome.action.openPopup(); + } +}); + +// 剪辑整个页面 +async function clipPage(tab) { + try { + // 向 content script 发送消息提取内容 + const response = await 
chrome.tabs.sendMessage(tab.id, { action: 'extractContent' });
+
+    if (response.success) {
+      // 保存到本地存储
+      await saveClip(response.data);
+      return { success: true, message: 'Page clipped successfully' };
+    }
+    // 提取失败时也要返回结果,避免调用方拿到 undefined
+    return { success: false, error: 'Content extraction failed' };
+  } catch (error) {
+    console.error('[InsightFlow] Failed to clip page:', error);
+    return { success: false, error: error.message };
+  }
+}
+
+// 剪辑选中的内容
+async function clipSelection(tab) {
+  try {
+    const response = await chrome.tabs.sendMessage(tab.id, { action: 'getSelection' });
+
+    if (response.success && response.data) {
+      const clipData = {
+        url: tab.url,
+        title: tab.title,
+        content: response.data.text,
+        context: response.data.context,
+        contentType: 'selection',
+        extractedAt: new Date().toISOString()
+      };
+
+      await saveClip(clipData);
+      return { success: true, message: 'Selection clipped successfully' };
+    } else {
+      return { success: false, error: 'No text selected' };
+    }
+  } catch (error) {
+    console.error('[InsightFlow] Failed to clip selection:', error);
+    return { success: false, error: error.message };
+  }
+}
+
+// 剪辑链接
+async function clipLink(tab, linkUrl) {
+  const clipData = {
+    url: linkUrl,
+    title: linkUrl,
+    content: `Link: ${linkUrl}`,
+    sourceUrl: tab.url,
+    contentType: 'link',
+    extractedAt: new Date().toISOString()
+  };
+
+  await saveClip(clipData);
+  return { success: true, message: 'Link clipped successfully' };
+}
+
+// 保存剪辑内容
+async function saveClip(data) {
+  // 获取现有剪辑
+  const result = await chrome.storage.local.get(['clips']);
+  const clips = result.clips || [];
+
+  // 添加新剪辑
+  clips.unshift({
+    id: generateId(),
+    ...data,
+    synced: false
+  });
+
+  // 只保留最近 100 条
+  if (clips.length > 100) {
+    clips.pop();
+  }
+
+  // 保存
+  await chrome.storage.local.set({ clips });
+
+  // 尝试同步到服务器
+  syncToServer();
+}
+
+// 同步到服务器
+async function syncToServer() {
+  const { serverUrl, apiKey } = await chrome.storage.sync.get(['serverUrl', 'apiKey']);
+
+  if (!serverUrl || !apiKey) {
+    console.log('[InsightFlow] Server not configured, skipping
sync'); + return; + } + + const result = await chrome.storage.local.get(['clips']); + const clips = result.clips || []; + const unsyncedClips = clips.filter(c => !c.synced); + + if (unsyncedClips.length === 0) return; + + for (const clip of unsyncedClips) { + try { + const response = await fetch(`${serverUrl}/api/v1/plugins/chrome/import`, { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'X-API-Key': apiKey + }, + body: JSON.stringify({ + token: apiKey, + url: clip.url, + title: clip.title, + content: clip.content, + html_content: clip.html || null + }) + }); + + if (response.ok) { + clip.synced = true; + clip.syncedAt = new Date().toISOString(); + } + } catch (error) { + console.error('[InsightFlow] Sync failed:', error); + } + } + + // 更新存储 + await chrome.storage.local.set({ clips }); +} + +// 生成唯一ID +function generateId() { + return Date.now().toString(36) + Math.random().toString(36).substr(2); +} + +// 定时同步(每5分钟) +chrome.alarms.create('syncClips', { periodInMinutes: 5 }); +chrome.alarms.onAlarm.addListener((alarm) => { + if (alarm.name === 'syncClips') { + syncToServer(); + } +}); \ No newline at end of file diff --git a/chrome-extension/content.css b/chrome-extension/content.css new file mode 100644 index 0000000..2ae81bf --- /dev/null +++ b/chrome-extension/content.css @@ -0,0 +1,46 @@ +/* InsightFlow Chrome Extension - Content Styles */ + +#insightflow-clipper-btn { + animation: slideIn 0.3s ease-out; +} + +@keyframes slideIn { + from { + transform: translateX(100px); + opacity: 0; + } + to { + transform: translateX(0); + opacity: 1; + } +} + +/* 选中文本高亮样式 */ +::selection { + background: rgba(102, 126, 234, 0.3); +} + +/* 剪辑成功提示 */ +.insightflow-toast { + position: fixed; + top: 20px; + right: 20px; + background: #4CAF50; + color: white; + padding: 15px 20px; + border-radius: 8px; + box-shadow: 0 4px 12px rgba(0,0,0,0.2); + z-index: 999999; + animation: toastSlideIn 0.3s ease-out; +} + +@keyframes toastSlideIn { + from { + transform: 
translateX(100%); + opacity: 0; + } + to { + transform: translateX(0); + opacity: 1; + } +} \ No newline at end of file diff --git a/chrome-extension/content.js b/chrome-extension/content.js new file mode 100644 index 0000000..499c840 --- /dev/null +++ b/chrome-extension/content.js @@ -0,0 +1,197 @@ +// InsightFlow Chrome Extension - Content Script +// 在网页上下文中运行,负责提取页面内容 + +(function() { + 'use strict'; + + // 避免重复注入 + if (window.insightFlowInjected) return; + window.insightFlowInjected = true; + + // 提取页面主要内容 + function extractContent() { + const result = { + url: window.location.href, + title: document.title, + content: '', + html: document.documentElement.outerHTML, + meta: { + author: getMetaContent('author'), + description: getMetaContent('description'), + keywords: getMetaContent('keywords'), + publishedTime: getMetaContent('article:published_time') || getMetaContent('publishedDate'), + siteName: getMetaContent('og:site_name') || getMetaContent('application-name'), + language: document.documentElement.lang || 'unknown' + }, + extractedAt: new Date().toISOString() + }; + + // 尝试提取正文内容 + const article = extractArticleContent(); + result.content = article.text; + result.contentHtml = article.html; + result.wordCount = article.text.split(/\s+/).length; + + return result; + } + + // 获取 meta 标签内容 + function getMetaContent(name) { + const meta = document.querySelector(`meta[name="${name}"], meta[property="${name}"]`); + return meta ? 
meta.getAttribute('content') : ''; + } + + // 提取文章正文(使用多种策略) + function extractArticleContent() { + // 策略1:使用 Readability 算法(简化版) + let bestElement = findBestElement(); + + if (bestElement) { + return { + text: cleanText(bestElement.innerText), + html: bestElement.innerHTML + }; + } + + // 策略2:回退到 body 内容 + const body = document.body; + return { + text: cleanText(body.innerText), + html: body.innerHTML + }; + } + + // 查找最佳内容元素(基于文本密度) + function findBestElement() { + const candidates = []; + const elements = document.querySelectorAll('article, [role="main"], .post-content, .entry-content, .article-content, #content, .content'); + + elements.forEach(el => { + const text = el.innerText || ''; + const linkDensity = calculateLinkDensity(el); + const textDensity = text.length / (el.innerHTML.length || 1); + + candidates.push({ + element: el, + score: text.length * textDensity * (1 - linkDensity), + textLength: text.length + }); + }); + + // 按分数排序 + candidates.sort((a, b) => b.score - a.score); + + return candidates.length > 0 ? 
candidates[0].element : null; + } + + // 计算链接密度 + function calculateLinkDensity(element) { + const links = element.getElementsByTagName('a'); + let linkLength = 0; + for (let link of links) { + linkLength += link.innerText.length; + } + const textLength = element.innerText.length || 1; + return linkLength / textLength; + } + + // 清理文本 + function cleanText(text) { + return text + .replace(/\s+/g, ' ') + .replace(/\n\s*\n/g, '\n\n') + .trim(); + } + + // 高亮选中的文本 + function highlightSelection() { + const selection = window.getSelection(); + if (selection.rangeCount > 0) { + const range = selection.getRangeAt(0); + const selectedText = selection.toString().trim(); + + if (selectedText.length > 0) { + return { + text: selectedText, + context: getSelectionContext(range) + }; + } + } + return null; + } + + // 获取选中内容的上下文 + function getSelectionContext(range) { + const container = range.commonAncestorContainer; + const element = container.nodeType === Node.TEXT_NODE ? container.parentElement : container; + + return { + tagName: element.tagName, + className: element.className, + id: element.id, + surroundingText: element.innerText.substring(0, 200) + }; + } + + // 监听来自 background 的消息 + chrome.runtime.onMessage.addListener((request, sender, sendResponse) => { + if (request.action === 'extractContent') { + const content = extractContent(); + sendResponse({ success: true, data: content }); + } else if (request.action === 'getSelection') { + const selection = highlightSelection(); + sendResponse({ success: true, data: selection }); + } else if (request.action === 'ping') { + sendResponse({ success: true, pong: true }); + } + return true; + }); + + // 添加浮动按钮(可选) + function addFloatingButton() { + const button = document.createElement('div'); + button.id = 'insightflow-clipper-btn'; + button.innerHTML = '📎'; + button.title = 'Clip to InsightFlow'; + button.style.cssText = ` + position: fixed; + bottom: 20px; + right: 20px; + width: 50px; + height: 50px; + background: #4CAF50; + 
border-radius: 50%;
+      display: flex;
+      align-items: center;
+      justify-content: center;
+      cursor: pointer;
+      box-shadow: 0 2px 10px rgba(0,0,0,0.3);
+      z-index: 999999;
+      font-size: 24px;
+      transition: transform 0.2s;
+    `;
+
+    button.addEventListener('mouseenter', () => {
+      button.style.transform = 'scale(1.1)';
+    });
+
+    button.addEventListener('mouseleave', () => {
+      button.style.transform = 'scale(1)';
+    });
+
+    button.addEventListener('click', () => {
+      chrome.runtime.sendMessage({ action: 'openClipper' });
+    });
+
+    document.body.appendChild(button);
+  }
+
+  // 如果启用,添加浮动按钮
+  chrome.storage.sync.get(['showFloatingButton'], (result) => {
+    if (result.showFloatingButton !== false) {
+      addFloatingButton();
+    }
+  });
+
+  console.log('[InsightFlow] Content script loaded');
+})();
\ No newline at end of file
diff --git a/chrome-extension/manifest.json b/chrome-extension/manifest.json
new file mode 100644
index 0000000..96d45b7
--- /dev/null
+++ b/chrome-extension/manifest.json
@@ -0,0 +1,40 @@
+{
+  "manifest_version": 3,
+  "name": "InsightFlow Clipper",
+  "version": "1.0.0",
+  "description": "一键将网页内容导入 InsightFlow 知识库",
+  "permissions": [
+    "activeTab",
+    "storage",
+    "contextMenus",
+    "scripting",
+    "alarms"
+  ],
+  "host_permissions": [
+    "http://*/*",
+    "https://*/*"
+  ],
+  "action": {
+    "default_popup": "popup.html",
+    "default_icon": {
+      "16": "icons/icon16.png",
+      "48": "icons/icon48.png",
+      "128": "icons/icon128.png"
+    }
+  },
+  "background": {
+    "service_worker": "background.js"
+  },
+  "content_scripts": [
+    {
+      "matches": ["<all_urls>"],
+      "js": ["content.js"],
+      "css": ["content.css"]
+    }
+  ],
+  "icons": {
+    "16": "icons/icon16.png",
+    "48": "icons/icon48.png",
+    "128": "icons/icon128.png"
+  },
+  "options_page": "options.html"
+}
\ No newline at end of file
diff --git a/chrome-extension/options.html b/chrome-extension/options.html
new file mode 100644
index 0000000..b8ddf0e
--- /dev/null
+++ b/chrome-extension/options.html
@@ -0,0 +1,247 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>InsightFlow Clipper - 设置</title>
+</head>
+<body>
+  <div class="container">
+    <div class="header">
+      <h1>⚙️ InsightFlow 设置</h1>
+      <p>配置您的知识库连接</p>
+    </div>
+
+    <div class="info-box">
+      要使用 Chrome 扩展,您需要在 InsightFlow 中创建一个 Chrome 扩展令牌。
+      前往 插件管理 > Chrome 扩展 创建令牌。
+    </div>
+
+    <div class="section">
+      <h2>服务器配置</h2>
+      <div class="form-group">
+        <label for="serverUrl">服务器地址</label>
+        <input type="text" id="serverUrl">
+        <p class="help-text">您的 InsightFlow 服务器地址</p>
+      </div>
+      <div class="form-group">
+        <label for="apiKey">API 令牌</label>
+        <input type="text" id="apiKey">
+        <p class="help-text">从 InsightFlow 获取的 Chrome 扩展令牌</p>
+      </div>
+    </div>
+
+    <div class="section">
+      <h2>偏好设置</h2>
+      <div class="form-group">
+        <label><input type="checkbox" id="showFloatingButton"> 显示浮动按钮</label>
+        <p class="help-text">在网页右下角显示快速剪辑按钮</p>
+      </div>
+      <div class="form-group">
+        <label><input type="checkbox" id="autoSync"> 自动同步</label>
+        <p class="help-text">剪辑后自动同步到服务器</p>
+      </div>
+    </div>
+
+    <button id="saveBtn">保存设置</button>
+    <button id="testBtn">测试连接</button>
+    <div id="status" class="status"></div>
+  </div>
+
+  <script src="options.js"></script>
+</body>
+</html>
+ + + + \ No newline at end of file diff --git a/chrome-extension/options.js b/chrome-extension/options.js new file mode 100644 index 0000000..aa06870 --- /dev/null +++ b/chrome-extension/options.js @@ -0,0 +1,105 @@ +// InsightFlow Chrome Extension - Options Script + +document.addEventListener('DOMContentLoaded', () => { + // 加载保存的设置 + loadSettings(); + + // 绑定事件 + document.getElementById('saveBtn').addEventListener('click', saveSettings); + document.getElementById('testBtn').addEventListener('click', testConnection); +}); + +// 加载设置 +async function loadSettings() { + const settings = await chrome.storage.sync.get([ + 'serverUrl', + 'apiKey', + 'showFloatingButton', + 'autoSync' + ]); + + document.getElementById('serverUrl').value = settings.serverUrl || ''; + document.getElementById('apiKey').value = settings.apiKey || ''; + document.getElementById('showFloatingButton').checked = settings.showFloatingButton !== false; + document.getElementById('autoSync').checked = settings.autoSync !== false; +} + +// 保存设置 +async function saveSettings() { + const serverUrl = document.getElementById('serverUrl').value.trim(); + const apiKey = document.getElementById('apiKey').value.trim(); + const showFloatingButton = document.getElementById('showFloatingButton').checked; + const autoSync = document.getElementById('autoSync').checked; + + // 验证 + if (!serverUrl) { + showStatus('请输入服务器地址', 'error'); + return; + } + + if (!apiKey) { + showStatus('请输入 API 令牌', 'error'); + return; + } + + // 确保 URL 格式正确 + let formattedUrl = serverUrl; + if (!formattedUrl.startsWith('http://') && !formattedUrl.startsWith('https://')) { + formattedUrl = 'https://' + formattedUrl; + } + + // 移除末尾的斜杠 + formattedUrl = formattedUrl.replace(/\/$/, ''); + + // 保存 + await chrome.storage.sync.set({ + serverUrl: formattedUrl, + apiKey: apiKey, + showFloatingButton: showFloatingButton, + autoSync: autoSync + }); + + showStatus('设置已保存!', 'success'); +} + +// 测试连接 +async function testConnection() { + const 
serverUrl = document.getElementById('serverUrl').value.trim(); + const apiKey = document.getElementById('apiKey').value.trim(); + + if (!serverUrl || !apiKey) { + showStatus('请先填写服务器地址和 API 令牌', 'error'); + return; + } + + showStatus('正在测试连接...', ''); + + try { + const response = await fetch(`${serverUrl}/api/v1/health`, { + method: 'GET', + headers: { + 'Content-Type': 'application/json' + } + }); + + if (response.ok) { + const data = await response.json(); + showStatus(`连接成功!服务器版本: ${data.version || 'unknown'}`, 'success'); + } else { + showStatus('连接失败:服务器返回错误', 'error'); + } + } catch (error) { + showStatus('连接失败:' + error.message, 'error'); + } +} + +// 显示状态 +function showStatus(message, type) { + const statusEl = document.getElementById('status'); + statusEl.textContent = message; + statusEl.className = 'status'; + + if (type) { + statusEl.classList.add(type); + } +} \ No newline at end of file diff --git a/chrome-extension/popup.html b/chrome-extension/popup.html new file mode 100644 index 0000000..8452eda --- /dev/null +++ b/chrome-extension/popup.html @@ -0,0 +1,276 @@ + + + + + + InsightFlow Clipper + + + +
+<body>
+    <div class="header">
+        <div class="logo">📎 InsightFlow</div>
+        <div class="tagline">一键保存网页到知识库</div>
+    </div>
+
+    <div class="page-info">
+        <div class="page-title" id="pageTitle">加载中...</div>
+        <div class="page-url" id="pageUrl"></div>
+    </div>
+
+    <div class="stats">
+        <div class="stat">字数: <span id="wordCount">0</span></div>
+        <div class="stat">待同步: <span id="pendingCount">0</span></div>
+    </div>
+
+    <div class="actions">
+        <button id="clipPageBtn" class="btn btn-primary">保存整个页面</button>
+        <button id="clipSelectionBtn" class="btn">保存选中内容</button>
+    </div>
+
+    <div id="status" class="status"></div>
+    <div id="loading" class="loading">正在处理...</div>
+
+    <div id="clipsList" class="clips-list"></div>
+
+    <div class="footer">
+        <a href="#" id="openOptions">设置</a>
+    </div>
+
+    <script src="popup.js"></script>
+</body>
+</html>
+
+
+
+
\ No newline at end of file
diff --git a/chrome-extension/popup.js b/chrome-extension/popup.js
new file mode 100644
index 0000000..6cf99b2
--- /dev/null
+++ b/chrome-extension/popup.js
@@ -0,0 +1,154 @@
+// InsightFlow Chrome Extension - Popup Script
+
+document.addEventListener('DOMContentLoaded', async () => {
+    // Get the active tab
+    const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
+
+    // Show the page info
+    document.getElementById('pageTitle').textContent = tab.title || '未知标题';
+    document.getElementById('pageUrl').textContent = tab.url || '';
+
+    // Page statistics
+    updateStats();
+
+    // Recent clips
+    loadRecentClips();
+
+    // Wire up buttons
+    document.getElementById('clipPageBtn').addEventListener('click', clipPage);
+    document.getElementById('clipSelectionBtn').addEventListener('click', clipSelection);
+    document.getElementById('openOptions').addEventListener('click', openOptions);
+});
+
+// Update the statistics
+async function updateStats() {
+    const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
+
+    // Word count comes from the content script
+    try {
+        const response = await chrome.tabs.sendMessage(tab.id, { action: 'extractContent' });
+        if (response.success) {
+            document.getElementById('wordCount').textContent = response.data.wordCount || 0;
+        }
+    } catch (error) {
+        console.log('Content script not available');
+    }
+
+    // Count of clips still waiting to be synced
+    const result = await chrome.storage.local.get(['clips']);
+    const clips = result.clips || [];
+    const pendingCount = clips.filter(c => !c.synced).length;
+    document.getElementById('pendingCount').textContent = pendingCount;
+}
+
+// Save the whole page
+async function clipPage() {
+    setLoading(true);
+
+    try {
+        const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
+
+        // Ask the background script to clip the page
+        const response = await chrome.runtime.sendMessage({ action: 'clipPage' });
+
+        if (response.success) {
+            showStatus('页面已保存!', 'success');
+            loadRecentClips();
+            updateStats();
+        } else {
+            showStatus(response.error || '保存失败', 'error');
+        }
+    }
catch (error) {
+        showStatus('保存失败: ' + error.message, 'error');
+    } finally {
+        setLoading(false);
+    }
+}
+
+// Save the current selection
+async function clipSelection() {
+    setLoading(true);
+
+    try {
+        const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
+
+        const response = await chrome.runtime.sendMessage({ action: 'clipSelection' });
+
+        if (response.success) {
+            showStatus('选中内容已保存!', 'success');
+            loadRecentClips();
+            updateStats();
+        } else {
+            showStatus(response.error || '保存失败', 'error');
+        }
+    } catch (error) {
+        showStatus('保存失败: ' + error.message, 'error');
+    } finally {
+        setLoading(false);
+    }
+}
+
+// Load recent clips
+async function loadRecentClips() {
+    const result = await chrome.storage.local.get(['clips']);
+    const clips = result.clips || [];
+
+    const clipsList = document.getElementById('clipsList');
+    clipsList.innerHTML = '';
+
+    // Show only the 5 most recent clips
+    const recentClips = clips.slice(0, 5);
+
+    for (const clip of recentClips) {
+        const clipEl = document.createElement('div');
+        clipEl.className = 'clip-item';
+
+        const title = clip.title || '未命名';
+        const time = new Date(clip.extractedAt).toLocaleString('zh-CN');
+        const statusClass = clip.synced ? 'synced' : 'pending';
+        const statusText = clip.synced ? '已同步' : '待同步';
+
+        clipEl.innerHTML = `
+            <div class="clip-title">${escapeHtml(title)}</div>
+            <div class="clip-meta">
+                <span class="clip-time">${time}</span>
+                <span class="clip-status ${statusClass}">${statusText}</span>
+            </div>
+        `;
+
+        clipsList.appendChild(clipEl);
+    }
+}
+
+// Open the options page
+function openOptions(e) {
+    e.preventDefault();
+    chrome.runtime.openOptionsPage();
+}
+
+// Show a status message, then clear it after 3 seconds
+function showStatus(message, type) {
+    const statusEl = document.getElementById('status');
+    statusEl.textContent = message;
+    statusEl.className = 'status ' + type;
+
+    setTimeout(() => {
+        statusEl.textContent = '';
+        statusEl.className = 'status';
+    }, 3000);
+}
+
+// Toggle the loading indicator
+function setLoading(loading) {
+    const loadingEl = document.getElementById('loading');
+    if (loading) {
+        loadingEl.classList.add('active');
+    } else {
+        loadingEl.classList.remove('active');
+    }
+}
+
+// Escape HTML special characters so clip titles render as text, not markup
+function escapeHtml(text) {
+    const div = document.createElement('div');
+    div.textContent = text;
+    return div.innerHTML;
+}
\ No newline at end of file
diff --git a/docs/PHASE7_TASK2_SUMMARY.md b/docs/PHASE7_TASK2_SUMMARY.md
new file mode 100644
index 0000000..d4ddbb2
--- /dev/null
+++ b/docs/PHASE7_TASK2_SUMMARY.md
@@ -0,0 +1,95 @@
+# InsightFlow Phase 7 Task 2 Development Summary
+
+## Completed Work
+
+### 1. Multimodal processing module (multimodal_processor.py)
+
+#### VideoProcessor
+- **Video ingestion**: supports MP4, AVI, MOV, MKV, WebM, and FLV
+- **Audio extraction**: extracts the audio track with ffmpeg (WAV, 16 kHz sample rate)
+- **Keyframe extraction**: samples keyframes with OpenCV at a fixed interval (every 5 seconds by default)
+- **OCR**: recognizes keyframe text via PaddleOCR, EasyOCR, or Tesseract
+- **Aggregation**: merges the OCR text of all frames for entity extraction
+
+#### ImageProcessor
+- **Image ingestion**: supports JPG, PNG, GIF, BMP, and WebP
+- **OCR**: recognizes text in images (whiteboards, slides, handwritten notes)
+- **Image captioning**: interface reserved for a multimodal LLM (integration pending)
+- **Batch import**: supports importing images in bulk
+
+#### MultimodalEntityExtractor
+- Extracts entities and relations from video and image processing results
+- Integrates with the existing LLM client
+
+### 2. Multimodal entity-linking module (multimodal_entity_linker.py)
+
+#### MultimodalEntityLinker
+- **Cross-modal entity alignment**: uses embedding similarity to match the same entity across modalities
+- **Multimodal entity profiles**: counts an entity's mentions in each modality
+- **Cross-modal relation discovery**: finds entities that co-occur in the same video frame or image
+- **Multimodal timeline**: presents multimodal events in chronological order
+
+### 3. 
Database updates (schema.sql)
+
+New tables:
+- `videos`: video metadata (duration, frame rate, resolution, OCR text)
+- `video_frames`: video keyframes (frame data, timestamp, OCR text)
+- `images`: image metadata (OCR text, description, extracted entities)
+- `multimodal_mentions`: multimodal entity mentions
+- `multimodal_entity_links`: multimodal entity links
+
+### 4. API endpoints (main.py)
+
+#### Video
+- `POST /api/v1/projects/{id}/upload-video` - upload a video
+- `GET /api/v1/projects/{id}/videos` - list videos
+- `GET /api/v1/videos/{id}` - video details
+
+#### Image
+- `POST /api/v1/projects/{id}/upload-image` - upload an image
+- `GET /api/v1/projects/{id}/images` - list images
+- `GET /api/v1/images/{id}` - image details
+
+#### Multimodal entity linking
+- `POST /api/v1/projects/{id}/multimodal/link-entities` - cross-modal entity linking
+- `GET /api/v1/entities/{id}/multimodal-profile` - multimodal entity profile
+- `GET /api/v1/projects/{id}/multimodal-timeline` - multimodal timeline
+- `GET /api/v1/entities/{id}/cross-modal-relations` - cross-modal relations
+
+### 5. Dependency updates (requirements.txt)
+
+New dependencies:
+- `opencv-python==4.9.0.80` - video processing
+- `pillow==10.2.0` - image processing
+- `paddleocr==2.7.0.3` + `paddlepaddle==2.6.0` - OCR engine
+- `ffmpeg-python==0.2.0` - ffmpeg wrapper
+- `sentence-transformers==2.3.1` - cross-modal alignment
+
+## System requirements
+
+- **ffmpeg**: required, used for video and audio processing
+- **Python 3.8+**: required by the dependencies above
+
+## Outstanding items
+
+1. **Multimodal LLM integration**: image captioning still needs a multimodal model API such as Kimi
+2. **Frontend**: the video/image upload UI and multimodal display components are not built yet
+3. **Performance**: processing large video files will likely need an async task queue
+4. **OCR engine selection**: choose the OCR engine that best fits the deployment environment
+
+## Deployment
+
+```bash
+# Install system dependencies
+apt-get update
+apt-get install -y ffmpeg
+
+# Install Python dependencies
+pip install -r requirements.txt
+
+# Update the database
+sqlite3 insightflow.db < schema.sql
+
+# Start the service
+python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
+```
diff --git a/docs/PHASE7_TASK3_SUMMARY.md b/docs/PHASE7_TASK3_SUMMARY.md
new file mode 100644
index 0000000..c07110d
--- /dev/null
+++ b/docs/PHASE7_TASK3_SUMMARY.md
@@ -0,0 +1,164 @@
+# Phase 7 Task 3 Development Summary: Data Security & Compliance
+
+**Completed**: 2026-02-23 18:00
+**Status**: ✅ Done
+
+## Completed Work
+
+### 1. 
Security module (security_manager.py)
+
+A complete security-management module with the following core features:
+
+#### Audit logging
+- Records all data operations (create, read, update, delete, export, etc.)
+- Tracks the acting user, client IP, and user agent
+- Records before/after values for changes
+- Provides aggregate statistics queries
+
+#### End-to-end encryption
+- AES-256-GCM encryption
+- PBKDF2 key derivation
+- Per-project enable/disable of encryption
+- Password verification
+
+#### Data masking
+- Predefined rules: phone number, email, national ID, bank card, name, address
+- Custom rules via regular expressions
+- Rule priority management
+- Selective masking by rule type
+
+#### Data-access policies
+- User-based access control
+- Role-based access control
+- IP-based access control (CIDR supported)
+- Time-based access control
+- Access-count limits
+- Access approval workflow
+
+### 2. Database schema
+
+Five new tables:
+
+| Table | Purpose |
+|------|------|
+| audit_logs | audit log entries |
+| encryption_configs | encryption configuration |
+| masking_rules | masking rules |
+| data_access_policies | data-access policies |
+| access_requests | access requests |
+
+### 3. API endpoints
+
+17 new security-related API endpoints:
+
+#### Audit logs
+- `GET /api/v1/audit-logs` - query audit logs
+- `GET /api/v1/audit-logs/stats` - audit statistics
+
+#### Encryption
+- `POST /api/v1/projects/{id}/encryption/enable` - enable encryption
+- `POST /api/v1/projects/{id}/encryption/disable` - disable encryption
+- `POST /api/v1/projects/{id}/encryption/verify` - verify the password
+- `GET /api/v1/projects/{id}/encryption` - get the encryption config
+
+#### Masking rules
+- `POST /api/v1/projects/{id}/masking-rules` - create a masking rule
+- `GET /api/v1/projects/{id}/masking-rules` - list masking rules
+- `PUT /api/v1/masking-rules/{id}` - update a masking rule
+- `DELETE /api/v1/masking-rules/{id}` - delete a masking rule
+- `POST /api/v1/projects/{id}/masking/apply` - apply masking
+
+#### Access policies
+- `POST /api/v1/projects/{id}/access-policies` - create an access policy
+- `GET /api/v1/projects/{id}/access-policies` - list access policies
+- `POST /api/v1/access-policies/{id}/check` - check access
+
+#### Access requests
+- `POST /api/v1/access-requests` - create an access request
+- `POST /api/v1/access-requests/{id}/approve` - approve access
+- `POST /api/v1/access-requests/{id}/reject` - reject access
+
+### 4. 
Dependency updates
+
+Added to requirements.txt:
+```
+cryptography==42.0.0
+```
+
+## Feature checklist
+
+### Audit logging
+- ✅ Complete operation records
+- ✅ Rich query filters
+- ✅ Aggregate statistics
+- ✅ Failed-operation records
+
+### End-to-end encryption
+- ✅ AES-256-GCM encryption
+- ✅ PBKDF2 key derivation
+- ✅ Per-project encryption control
+- ✅ Password verification
+
+### Data masking
+- ✅ Predefined rules (6 types)
+- ✅ Custom rule support
+- ✅ Regular-expression matching
+- ✅ Priority management
+
+### Access control
+- ✅ User allowlists
+- ✅ Role-based control
+- ✅ IP allowlists (CIDR supported)
+- ✅ Time restrictions
+- ✅ Access-count limits
+- ✅ Approval workflow
+
+## Implementation notes
+
+### Encryption
+```python
+import base64
+
+from cryptography.fernet import Fernet
+from cryptography.hazmat.primitives import hashes
+from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
+
+# Key derivation
+kdf = PBKDF2HMAC(
+    algorithm=hashes.SHA256(),
+    length=32,
+    salt=salt,
+    iterations=100000,
+)
+key = base64.urlsafe_b64encode(kdf.derive(password.encode()))
+
+# Data encryption
+f = Fernet(key)
+encrypted = f.encrypt(data.encode())
+```
+
+### Masking
+```python
+import re
+
+# Predefined rules
+DEFAULT_MASKING_RULES = {
+    MaskingRuleType.PHONE: {
+        "pattern": r"(\d{3})\d{4}(\d{4})",
+        "replacement": r"\1****\2"
+    },
+    # ...
+}
+
+# Apply masking
+masked_text = re.sub(pattern, replacement, text)
+```
+
+## Follow-up suggestions
+
+1. **Frontend** - build a security-settings management UI
+2. **Audit visualization** - chart the audit statistics
+3. **Real-time alerts** - notify on anomalous operations
+4. **GDPR compliance** - add data export/erasure features
+5. **Key management** - integrate an external KMS
+
+## Related files
+
+- `backend/security_manager.py` - security module
+- `backend/main.py` - API endpoints
+- `backend/schema.sql` - database schema
+- `backend/requirements.txt` - dependencies
+- `STATUS.md` - development status
+- `README.md` - project documentation
diff --git a/docs/PHASE7_TASK7_SUMMARY.md b/docs/PHASE7_TASK7_SUMMARY.md
new file mode 100644
index 0000000..dead55c
--- /dev/null
+++ b/docs/PHASE7_TASK7_SUMMARY.md
@@ -0,0 +1,143 @@
+# Phase 7 Task 7 Development Summary
+
+## Completed Work
+
+### 1. plugin_manager.py module
+
+A complete plugin-and-integration system with the following core classes:
+
+#### PluginManager
+- Plugin CRUD operations
+- Encrypted storage of plugin configuration
+- Plugin usage statistics
+
+#### ChromeExtensionHandler
+- Chrome extension token management (create, verify, revoke)
+- Web page import (extracts the main content and saves it as a document)
+- Permission control (read/write/delete)
+
+#### BotHandler
+- Feishu/DingTalk bot session management
+- Message receiving and sending
+- Audio file handling (analyze audio directly in group chats)
+- Webhook signature verification
+
+#### WebhookIntegration
+- Zapier/Make webhook endpoint management
+- Event trigger mechanism
+- Multiple auth schemes (API key, Bearer, OAuth)
+- Connects to 5000+ apps
+
+#### WebDAVSyncManager
+- WebDAV sync configuration management
+- Connection testing
+- Project data export and sync
+- Works with 坚果云 (Nutstore) and other WebDAV drives
+
+### 2. 
Update schema.sql
+
+New database tables:
+- `plugins`: plugin configuration
+- `plugin_configs`: detailed plugin configuration
+- `bot_sessions`: bot sessions
+- `webhook_endpoints`: webhook endpoints
+- `webdav_syncs`: WebDAV sync configuration
+- `chrome_extension_tokens`: Chrome extension tokens
+
+### 3. Update main.py
+
+Added the full set of plugin-related API endpoints:
+
+#### Plugin management
+- `POST /api/v1/plugins` - create a plugin
+- `GET /api/v1/plugins` - list plugins
+- `GET /api/v1/plugins/{id}` - plugin details
+- `PATCH /api/v1/plugins/{id}` - update a plugin
+- `DELETE /api/v1/plugins/{id}` - delete a plugin
+
+#### Chrome extension
+- `POST /api/v1/plugins/chrome/tokens` - create a token
+- `GET /api/v1/plugins/chrome/tokens` - list tokens
+- `DELETE /api/v1/plugins/chrome/tokens/{id}` - revoke a token
+- `POST /api/v1/plugins/chrome/import` - import web content
+
+#### Bots
+- `POST /api/v1/plugins/bot/feishu/sessions` - create a Feishu session
+- `POST /api/v1/plugins/bot/dingtalk/sessions` - create a DingTalk session
+- `GET /api/v1/plugins/bot/{type}/sessions` - list sessions
+- `POST /api/v1/plugins/bot/{type}/webhook` - receive messages
+- `POST /api/v1/plugins/bot/{type}/sessions/{id}/send` - send a message
+
+#### Integrations
+- `POST /api/v1/plugins/integrations/zapier` - create a Zapier endpoint
+- `POST /api/v1/plugins/integrations/make` - create a Make endpoint
+- `GET /api/v1/plugins/integrations/{type}` - list endpoints
+- `POST /api/v1/plugins/integrations/{id}/test` - test an endpoint
+- `POST /api/v1/plugins/integrations/{id}/trigger` - trigger manually
+
+#### WebDAV
+- `POST /api/v1/plugins/webdav` - create a sync config
+- `GET /api/v1/plugins/webdav` - list sync configs
+- `POST /api/v1/plugins/webdav/{id}/test` - test the connection
+- `POST /api/v1/plugins/webdav/{id}/sync` - run a sync
+- `DELETE /api/v1/plugins/webdav/{id}` - delete a sync config
+
+### 4. Update requirements.txt
+
+New dependencies:
+- `webdav4==0.9.8` - WebDAV client
+- `urllib3==2.2.0` - HTTP client
+
+### 5. 
Chrome extension scaffold
+
+A complete Chrome extension implementation:
+- `manifest.json` - extension config (Manifest V3)
+- `background.js` - background script (context menus, message handling, auto-sync)
+- `content.js` - content script (page content extraction, floating button)
+- `content.css` - content styles
+- `popup.html/js` - popup (save the page, browse clip history)
+- `options.html/js` - options page (server config, token settings)
+- `README.md` - extension usage guide
+
+## Feature checklist
+
+### Chrome extension
+- ✅ One-click save of a whole page (smart main-content extraction)
+- ✅ Save selected text
+- ✅ Save links
+- ✅ Floating button for quick access
+- ✅ Context-menu support
+- ✅ Auto-sync to the server
+- ✅ Offline cache with deferred sync
+
+### Feishu/DingTalk bots
+- ✅ Analyze audio files directly in group chats
+- ✅ Command interaction (/help, /status, /analyze)
+- ✅ Automatic replies
+- ✅ Webhook signature verification
+
+### Zapier/Make integration
+- ✅ Webhook endpoint creation
+- ✅ Event trigger mechanism
+- ✅ Connects to 5000+ apps
+- ✅ Multiple auth schemes
+
+### WebDAV sync
+- ✅ Works with 坚果云 and other WebDAV drives
+- ✅ Automatic project data sync
+- ✅ Connection testing
+- ✅ Incremental sync support
+
+## API docs
+
+All endpoints are registered in the Swagger/OpenAPI docs:
+- Swagger UI: `/docs`
+- ReDoc: `/redoc`
+
+## Next up
+
+Phase 7 Task 3: data security & compliance
+- End-to-end encryption
+- Data masking
+- Audit logging
+- GDPR compliance support
\ No newline at end of file
diff --git a/frontend/app.js b/frontend/app.js
index 8653215..4c831f9 100644
--- a/frontend/app.js
+++ b/frontend/app.js
@@ -1,4 +1,4 @@
-// InsightFlow Frontend - Phase 5 (Graph Analysis)
+// InsightFlow Frontend - Phase 6 (API Platform)
 
 const API_BASE = '/api/v1';
 let currentProject = null;
@@ -98,2858 +98,339 @@ async function loadProjectData() {
             segments: [],
             entities: projectEntities,
             full_text: '',
-            created_at: new Date().toISOString()
+            relations: projectRelations
         };
+
         renderTranscript();
         renderGraph();
         renderEntityList();
 
-        // 更新图分析面板的实体选择器
-        populateGraphEntitySelects();
-
     } catch (err) {
         console.error('Load project data failed:', err);
     }
 }
 
 async function preloadEntityDetails() {
-    // 并行加载所有实体详情
-    const promises = projectEntities.map(async (ent) => {
+    const promises = projectEntities.slice(0, 20).map(async entity => {
         try {
-            const res = await fetch(`${API_BASE}/entities/${ent.id}/details`);
+            const res = await fetch(`${API_BASE}/entities/${entity.id}/details`);
             if (res.ok) {
-                entityDetailsCache[ent.id] = await res.json();
+                entityDetailsCache[entity.id] = await res.json();
             }
         } catch (e) {
-
console.error(`Failed to load entity ${ent.id} details:`, e); + // Ignore errors } }); await Promise.all(promises); } -// ==================== Agent Panel ==================== - -function initAgentPanel() { - const chatInput = document.getElementById('chatInput'); - if (chatInput) { - chatInput.addEventListener('keypress', (e) => { - if (e.key === 'Enter' && !e.shiftKey) { - e.preventDefault(); - sendAgentMessage(); - } - }); - } -} - -function toggleAgentPanel() { - const panel = document.getElementById('agentPanel'); - const toggle = panel.querySelector('.agent-toggle'); - panel.classList.toggle('collapsed'); - toggle.textContent = panel.classList.contains('collapsed') ? '‹' : '›'; -} - -function addChatMessage(content, isUser = false, isTyping = false) { - const container = document.getElementById('chatMessages'); - const msgDiv = document.createElement('div'); - msgDiv.className = `chat-message ${isUser ? 'user' : 'assistant'}`; - - if (isTyping) { - msgDiv.innerHTML = ` -
- -
- `; - } else { - msgDiv.innerHTML = `
${content}
`; - } - - container.appendChild(msgDiv); - container.scrollTop = container.scrollHeight; - return msgDiv; -} - -function removeTypingIndicator() { - const indicator = document.getElementById('typingIndicator'); - if (indicator) { - indicator.parentElement.remove(); - } -} - -async function sendAgentMessage() { - const input = document.getElementById('chatInput'); - const message = input.value.trim(); - if (!message) return; - - input.value = ''; - addChatMessage(message, true); - addChatMessage('', false, true); - - try { - // 判断是命令还是问答 - const isCommand = message.includes('合并') || message.includes('修改') || - message.startsWith('把') || message.startsWith('将'); - - if (isCommand) { - // 执行命令 - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/agent/command`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ command: message }) - }); - - removeTypingIndicator(); - - if (res.ok) { - const result = await res.json(); - let response = ''; - - if (result.intent === 'merge_entities') { - if (result.success) { - response = `✅ 已合并 ${result.merged.length} 个实体到 "${result.target}"`; - await loadProjectData(); // 刷新数据 - } else { - response = `❌ 合并失败:${result.error || '未找到匹配的实体'}`; - } - } else if (result.intent === 'edit_entity') { - if (result.success) { - response = `✅ 已更新实体 "${result.entity?.name}"`; - await loadProjectData(); - } else { - response = `❌ 编辑失败:${result.error || '未找到实体'}`; - } - } else if (result.intent === 'answer_question') { - response = result.answer; - } else { - response = result.message || result.explanation || '未识别的指令'; - } - - addChatMessage(response); - } else { - addChatMessage('❌ 请求失败,请重试'); - } - } else { - // RAG 问答 - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/agent/query`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ query: message, stream: false }) - }); - - removeTypingIndicator(); - - if (res.ok) { - const 
result = await res.json(); - addChatMessage(result.answer); - } else { - addChatMessage('❌ 获取回答失败,请重试'); - } - } - } catch (err) { - removeTypingIndicator(); - addChatMessage('❌ 网络错误,请检查连接'); - console.error('Agent error:', err); - } -} - -async function loadSuggestions() { - addChatMessage('正在获取建议...', false, true); - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/agent/suggest`); - removeTypingIndicator(); - - if (res.ok) { - const result = await res.json(); - const suggestions = result.suggestions || []; - - if (suggestions.length === 0) { - addChatMessage('暂无建议,请先上传一些音频文件。'); - return; - } - - let html = '
💡 基于项目数据的建议:
'; - suggestions.forEach((s, i) => { - html += ` -
-
${s.type === 'action' ? '⚡ 操作' : '💡 洞察'}
-
${s.title}
-
${s.description}
-
- `; - }); - - const msgDiv = document.createElement('div'); - msgDiv.className = 'chat-message assistant'; - msgDiv.innerHTML = `
${html}
`; - document.getElementById('chatMessages').appendChild(msgDiv); - } - } catch (err) { - removeTypingIndicator(); - addChatMessage('❌ 获取建议失败'); - } -} - -function applySuggestion(index) { - // 可以在这里实现建议的自动应用 - addChatMessage('建议功能开发中,敬请期待!'); -} - -// ==================== Transcript Rendering ==================== - -function renderTranscript() { - const container = document.getElementById('transcriptContent'); - if (!container || !currentData || !currentData.segments) return; - - container.innerHTML = ''; - - currentData.segments.forEach((seg, idx) => { - const div = document.createElement('div'); - div.className = 'segment'; - div.dataset.index = idx; - - let text = seg.text; - const entities = findEntitiesInText(seg.text); - - entities.sort((a, b) => b.start - a.start); - - entities.forEach(ent => { - const before = text.slice(0, ent.start); - const name = text.slice(ent.start, ent.end); - const after = text.slice(ent.end); - const details = entityDetailsCache[ent.id]; - const confidence = details?.mentions?.[0]?.confidence || 1.0; - const lowConfClass = confidence < 0.7 ? 'low-confidence' : ''; - - text = before + `${name}` + after; - }); - - div.innerHTML = ` -
${seg.speaker}
-
${text}
- `; - - container.appendChild(div); - }); -} - -function findEntitiesInText(text) { - if (!projectEntities || projectEntities.length === 0) return []; - - const found = []; - projectEntities.forEach(ent => { - const name = ent.name; - let pos = 0; - while ((pos = text.indexOf(name, pos)) !== -1) { - found.push({ - id: ent.id, - name: ent.name, - start: pos, - end: pos + name.length - }); - pos += 1; - } - - if (ent.aliases && ent.aliases.length > 0) { - ent.aliases.forEach(alias => { - let aliasPos = 0; - while ((aliasPos = text.indexOf(alias, aliasPos)) !== -1) { - found.push({ - id: ent.id, - name: alias, - start: aliasPos, - end: aliasPos + alias.length - }); - aliasPos += 1; - } - }); - } - }); - - return found; -} - -// ==================== Entity Card ==================== - -function initEntityCard() { - const card = document.getElementById('entityCard'); - - // 鼠标移出卡片时隐藏 - card.addEventListener('mouseleave', () => { - card.classList.remove('show'); - }); -} - -function showEntityCard(event, entityId) { - const card = document.getElementById('entityCard'); - const details = entityDetailsCache[entityId]; - const entity = projectEntities.find(e => e.id === entityId); - - if (!entity) return; - - // 更新卡片内容 - document.getElementById('cardName').textContent = entity.name; - document.getElementById('cardBadge').textContent = entity.type; - document.getElementById('cardBadge').className = `entity-type-badge type-${entity.type.toLowerCase()}`; - document.getElementById('cardDefinition').textContent = entity.definition || '暂无定义'; - - const mentionCount = details?.mentions?.length || 0; - const relationCount = details?.relations?.length || 0; - document.getElementById('cardMentions').textContent = `${mentionCount} 次提及`; - document.getElementById('cardRelations').textContent = `${relationCount} 个关系`; - - // 定位卡片 - const rect = event.target.getBoundingClientRect(); - card.style.left = `${rect.left}px`; - card.style.top = `${rect.bottom + 10}px`; - - // 确保不超出屏幕 - const 
cardRect = card.getBoundingClientRect(); - if (cardRect.right > window.innerWidth) { - card.style.left = `${window.innerWidth - cardRect.width - 20}px`; - } - - card.classList.add('show'); -} - -function hideEntityCard() { - // 延迟隐藏,允许鼠标移到卡片上 - setTimeout(() => { - const card = document.getElementById('entityCard'); - if (!card.matches(':hover')) { - card.classList.remove('show'); - } - }, 100); -} - -// ==================== Graph Visualization ==================== - -function renderGraph() { - const svg = d3.select('#graph-svg'); - svg.selectAll('*').remove(); - - if (!projectEntities || projectEntities.length === 0) { - svg.append('text') - .attr('x', '50%') - .attr('y', '50%') - .attr('text-anchor', 'middle') - .attr('fill', '#666') - .text('暂无实体数据,请上传音频'); - return; - } - - const container = svg.node().parentElement; - const width = container.clientWidth; - const height = container.clientHeight - 200; - - svg.attr('width', width).attr('height', height); - - const nodes = projectEntities.map(e => ({ - id: e.id, - name: e.name, - type: e.type, - definition: e.definition, - ...e - })); - - const links = projectRelations.map(r => ({ - id: r.id, - source: r.source_id, - target: r.target_id, - type: r.type, - evidence: r.evidence - })).filter(r => r.source && r.target); - - if (links.length === 0 && nodes.length > 1) { - for (let i = 0; i < Math.min(nodes.length - 1, 5); i++) { - links.push({ source: nodes[0].id, target: nodes[i + 1].id, type: 'related' }); - } - } - - const colorMap = { - 'PROJECT': '#7b2cbf', - 'TECH': '#00d4ff', - 'PERSON': '#ff6b6b', - 'ORG': '#4ecdc4', - 'OTHER': '#666' - }; - - const simulation = d3.forceSimulation(nodes) - .force('link', d3.forceLink(links).id(d => d.id).distance(120)) - .force('charge', d3.forceManyBody().strength(-400)) - .force('center', d3.forceCenter(width / 2, height / 2)) - .force('collision', d3.forceCollide().radius(50)); - - // 关系连线 - const link = svg.append('g') - .selectAll('line') - .data(links) - 
.enter().append('line') - .attr('stroke', '#444') - .attr('stroke-width', 1.5) - .attr('stroke-opacity', 0.6) - .style('cursor', 'pointer') - .on('click', (e, d) => showProvenance(d)); - - // 关系标签 - const linkLabel = svg.append('g') - .selectAll('text') - .data(links) - .enter().append('text') - .attr('font-size', '10px') - .attr('fill', '#666') - .attr('text-anchor', 'middle') - .style('pointer-events', 'none') - .text(d => d.type); - - // 节点组 - const node = svg.append('g') - .selectAll('g') - .data(nodes) - .enter().append('g') - .attr('class', 'node') - .call(d3.drag() - .on('start', dragstarted) - .on('drag', dragged) - .on('end', dragended)) - .on('click', (e, d) => window.selectEntity(d.id)) - .on('mouseenter', (e, d) => showEntityCard(e, d.id)) - .on('mouseleave', hideEntityCard); - - // 节点圆圈 - node.append('circle') - .attr('r', 35) - .attr('fill', d => colorMap[d.type] || '#666') - .attr('stroke', '#fff') - .attr('stroke-width', 2) - .attr('class', 'node-circle'); - - // 节点文字 - node.append('text') - .text(d => d.name.length > 6 ? d.name.slice(0, 5) + '...' 
: d.name) - .attr('text-anchor', 'middle') - .attr('dy', 5) - .attr('fill', '#fff') - .attr('font-size', '11px') - .attr('font-weight', '500') - .style('pointer-events', 'none'); - - // 节点类型图标 - node.append('text') - .attr('dy', -45) - .attr('text-anchor', 'middle') - .attr('fill', d => colorMap[d.type] || '#666') - .attr('font-size', '10px') - .text(d => d.type) - .style('pointer-events', 'none'); - - simulation.on('tick', () => { - link - .attr('x1', d => d.source.x) - .attr('y1', d => d.source.y) - .attr('x2', d => d.target.x) - .attr('y2', d => d.target.y); - - linkLabel - .attr('x', d => (d.source.x + d.target.x) / 2) - .attr('y', d => (d.source.y + d.target.y) / 2); - - node.attr('transform', d => `translate(${d.x},${d.y})`); - }); - - function dragstarted(e, d) { - if (!e.active) simulation.alphaTarget(0.3).restart(); - d.fx = d.x; - d.fy = d.y; - } - - function dragged(e, d) { - d.fx = e.x; - d.fy = e.y; - } - - function dragended(e, d) { - if (!e.active) simulation.alphaTarget(0); - d.fx = null; - d.fy = null; - } -} - -// ==================== Provenance ==================== - -async function showProvenance(relation) { - const modal = document.getElementById('provenanceModal'); - const body = document.getElementById('provenanceBody'); - - modal.classList.add('show'); - body.innerHTML = '

加载中...

'; - - try { - let content = ''; - - if (relation.id) { - // 从API获取溯源信息 - const res = await fetch(`${API_BASE}/relations/${relation.id}/provenance`); - if (res.ok) { - const data = await res.json(); - content = ` -
-
关系类型
-
${data.source} → ${data.type} → ${data.target}
-
- -
-
来源文档
-
${data.transcript?.filename || '未知文件'}
-
- -
证据文本
-
"${data.evidence || '无证据文本'}"
- `; - } else { - content = '

获取溯源信息失败

'; - } - } else { - // 使用本地数据 - content = ` -
-
关系类型
-
${relation.source.name || relation.source} → ${relation.type} → ${relation.target.name || relation.target}
-
- -
证据文本
-
"${relation.evidence || '无证据文本'}"
- `; - } - - body.innerHTML = content; - } catch (err) { - body.innerHTML = '

加载失败

'; - } -} - -function closeProvenance() { - document.getElementById('provenanceModal').classList.remove('show'); -} - -// ==================== Entity List ==================== - -function renderEntityList() { - const container = document.getElementById('entityList'); - if (!container) return; - - container.innerHTML = '

项目实体

'; - - if (!projectEntities || projectEntities.length === 0) { - container.innerHTML += '

暂无实体,请上传音频文件

'; - return; - } - - projectEntities.forEach(ent => { - const div = document.createElement('div'); - div.className = 'entity-item'; - div.dataset.id = ent.id; - div.onclick = () => window.selectEntity(ent.id); - div.onmouseenter = (e) => showEntityCard(e, ent.id); - div.onmouseleave = hideEntityCard; - - div.innerHTML = ` - ${ent.type} -
-
${ent.name}
-
${ent.definition || '暂无定义'}
-
- `; - - container.appendChild(div); - }); -} - -// ==================== Entity Selection ==================== - -window.selectEntity = function(entityId) { - selectedEntity = entityId; - const entity = projectEntities.find(e => e.id === entityId); - if (!entity) return; - - // 高亮文本中的实体 - document.querySelectorAll('.entity').forEach(el => { - if (el.dataset.id === entityId) { - el.style.background = '#ff6b6b'; - el.style.color = '#fff'; - } else { - el.style.background = ''; - el.style.color = ''; - } - }); - - // 高亮图谱中的节点 - d3.selectAll('.node-circle') - .attr('stroke', d => d.id === entityId ? '#ff6b6b' : '#fff') - .attr('stroke-width', d => d.id === entityId ? 4 : 2) - .attr('r', d => d.id === entityId ? 40 : 35); - - // 高亮实体列表 - document.querySelectorAll('.entity-item').forEach(el => { - if (el.dataset.id === entityId) { - el.style.background = '#2a2a2a'; - el.style.borderLeft = '3px solid #ff6b6b'; - } else { - el.style.background = ''; - el.style.borderLeft = ''; - } - }); - - console.log('Selected:', entity.name, entity.definition); -}; - -// ==================== Upload ==================== - -window.showUpload = function() { - const el = document.getElementById('uploadOverlay'); - if (el) el.classList.add('show'); -}; - -window.hideUpload = function() { - const el = document.getElementById('uploadOverlay'); - if (el) el.classList.remove('show'); -}; - -function initUpload() { - const input = document.getElementById('fileInput'); - const overlay = document.getElementById('uploadOverlay'); - - if (!input) return; - - input.addEventListener('change', async (e) => { - if (!e.target.files.length) return; - - const file = e.target.files[0]; - if (overlay) { - overlay.innerHTML = ` -
-

正在分析...

-

${file.name}

-

ASR转录 + 实体提取中

-
- `; - } - - try { - const result = await uploadAudio(file); - - currentData = result; - - await loadProjectData(); - - if (result.segments && result.segments.length > 0) { - renderTranscript(); - } - - if (overlay) overlay.classList.remove('show'); - - } catch (err) { - console.error('Upload failed:', err); - if (overlay) { - overlay.innerHTML = ` -
-

分析失败

-

${err.message}

- -
- `; - } - } - }); -} - -// ==================== Phase 5: Timeline View ==================== - -async function loadTimeline() { - const container = document.getElementById('timelineContainer'); - const entityFilter = document.getElementById('timelineEntityFilter'); - - if (!container) return; - - container.innerHTML = '

加载时间线数据...

'; - - try { - // 更新实体筛选器选项 - if (entityFilter && projectEntities.length > 0) { - const currentValue = entityFilter.value; - entityFilter.innerHTML = ''; - projectEntities.forEach(ent => { - const option = document.createElement('option'); - option.value = ent.id; - option.textContent = ent.name; - entityFilter.appendChild(option); - }); - entityFilter.value = currentValue; - } - - // 构建查询参数 - const params = new URLSearchParams(); - if (entityFilter && entityFilter.value) { - params.append('entity_id', entityFilter.value); - } - - // 获取时间线数据 - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/timeline?${params}`); - if (!res.ok) throw new Error('Failed to load timeline'); - - const data = await res.json(); - const events = data.events || []; - - // 更新统计 - const mentions = events.filter(e => e.type === 'mention').length; - const relations = events.filter(e => e.type === 'relation').length; - - document.getElementById('timelineTotalEvents').textContent = events.length; - document.getElementById('timelineMentions').textContent = mentions; - document.getElementById('timelineRelations').textContent = relations; - - // 渲染时间线 - renderTimeline(events); - - } catch (err) { - console.error('Load timeline failed:', err); - container.innerHTML = '

加载失败,请重试

'; - } -} - -function renderTimeline(events) { - const container = document.getElementById('timelineContainer'); - - if (events.length === 0) { - container.innerHTML = ` -
-

暂无时间线数据

-

请先上传音频或文档文件

-
- `; - return; - } - - // 按日期分组 - const grouped = groupEventsByDate(events); - - let html = '
'; - - Object.entries(grouped).forEach(([date, dayEvents]) => { - const dateLabel = formatDateLabel(date); - - html += ` -
-
-
${dateLabel}
-
-
-
- `; - - dayEvents.forEach(event => { - html += renderTimelineEvent(event); - }); - - html += '
'; - }); - - container.innerHTML = html; -} - -function groupEventsByDate(events) { - const grouped = {}; - - events.forEach(event => { - const date = event.event_date.split('T')[0]; - if (!grouped[date]) { - grouped[date] = []; - } - grouped[date].push(event); - }); - - return grouped; -} - -function formatDateLabel(dateStr) { - const date = new Date(dateStr); - const today = new Date(); - const yesterday = new Date(today); - yesterday.setDate(yesterday.getDate() - 1); - - if (dateStr === today.toISOString().split('T')[0]) { - return '今天'; - } else if (dateStr === yesterday.toISOString().split('T')[0]) { - return '昨天'; - } else { - return `${date.getMonth() + 1}月${date.getDate()}日`; - } -} - -function renderTimelineEvent(event) { - if (event.type === 'mention') { - return ` -
-提及 ${event.entity_name} ${event.entity_type || 'OTHER'}
-"${event.text_snippet || ''}"
-📄 ${event.source?.filename || '未知文件'} ${event.confidence ? `置信度: ${(event.confidence * 100).toFixed(0)}%` : ''}
-        `;
-    } else if (event.type === 'relation') {
-        return `
-关系 ${event.source_entity} → ${event.target_entity}
-关系类型: ${event.relation_type}
-${event.evidence ? `"${event.evidence}"` : ''}
-📄 ${event.source?.filename || '未知文件'}
-        `;
-    }
-    return '';
-}
-
 // ==================== View Switching ====================
 
 window.switchView = function(viewName) {
-    // 更新侧边栏按钮状态
+    // Update sidebar buttons
     document.querySelectorAll('.sidebar-btn').forEach(btn => {
         btn.classList.remove('active');
     });
 
-    // 隐藏所有视图
-    document.getElementById('workbenchView').style.display = 'none';
-    document.getElementById('knowledgeBaseView').classList.remove('show');
-    document.getElementById('timelineView').classList.remove('show');
-    document.getElementById('reasoningView').classList.remove('active');
-    document.getElementById('graphAnalysisView').classList.remove('active');
+    const views = {
+        'workbench': 'workbenchView',
+        'knowledge-base': 'knowledgeBaseView',
+        'timeline': 'timelineView',
+        'reasoning': 'reasoningView',
+        'graph-analysis': 'graphAnalysisView',
+        'api-keys': 'apiKeysView'
+    };
 
-    // 显示选中的视图
-    if (viewName === 'workbench') {
-        document.getElementById('workbenchView').style.display = 'flex';
-        document.querySelector('.sidebar-btn:nth-child(1)').classList.add('active');
-    } else if (viewName === 'knowledge-base') {
-        document.getElementById('knowledgeBaseView').classList.add('show');
-        document.querySelector('.sidebar-btn:nth-child(2)').classList.add('active');
+    // Hide all views
+    Object.values(views).forEach(id => {
+        const el = document.getElementById(id);
+        if (el) {
+            el.style.display = 'none';
+            el.classList.remove('active', 'show');
+        }
+    });
+
+    // Show selected view
+    const targetId = views[viewName];
+    if (targetId) {
+        const targetEl = document.getElementById(targetId);
+        if (targetEl) {
+            targetEl.style.display = 'flex';
+            targetEl.classList.add('active', 'show');
+        }
+    }
+
+    // Update active button
+    const btnMap = {
+        'workbench': 0,
+        'knowledge-base': 1,
+        'timeline': 2,
+        'reasoning': 3,
+        'graph-analysis': 4,
+        'api-keys': 5
+    };
+    const buttons = document.querySelectorAll('.sidebar-btn');
+    if (buttons[btnMap[viewName]]) {
+        buttons[btnMap[viewName]].classList.add('active');
+    }
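The rewritten `switchView` above replaces per-view if/else branches with a single name → element-id map, so adding a view becomes one new map entry instead of two new branches. A standalone sketch of that lookup (DOM calls omitted so it runs anywhere; the map contents are copied from the diff, the helper name is mine):

```javascript
// View registry: one source of truth for view names -> container element ids.
const VIEWS = {
  'workbench': 'workbenchView',
  'knowledge-base': 'knowledgeBaseView',
  'timeline': 'timelineView',
  'reasoning': 'reasoningView',
  'graph-analysis': 'graphAnalysisView',
  'api-keys': 'apiKeysView'
};

// Resolve a requested view into the id to show and the ids to hide.
// Returns null for unknown view names instead of touching anything.
function resolveViewSwitch(viewName) {
  const target = VIEWS[viewName];
  if (!target) return null;
  return {
    show: target,
    hide: Object.values(VIEWS).filter(id => id !== target)
  };
}
```

The real `switchView` then only needs to iterate `hide`, show `show`, and highlight the matching sidebar button.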
+
+    // Load view-specific data
+    if (viewName === 'knowledge-base') {
         loadKnowledgeBase();
     } else if (viewName === 'timeline') {
-        document.getElementById('timelineView').classList.add('show');
-        document.querySelector('.sidebar-btn:nth-child(3)').classList.add('active');
         loadTimeline();
-    } else if (viewName === 'reasoning') {
-        document.getElementById('reasoningView').classList.add('active');
-        document.querySelector('.sidebar-btn:nth-child(4)').classList.add('active');
     } else if (viewName === 'graph-analysis') {
-        document.getElementById('graphAnalysisView').classList.add('active');
-        document.querySelector('.sidebar-btn:nth-child(5)').classList.add('active');
         initGraphAnalysis();
+    } else if (viewName === 'api-keys') {
+        loadApiKeys();
     }
 };
 
-window.switchKBTab = function(tabName) {
-    // 更新导航项状态
-    document.querySelectorAll('.kb-nav-item').forEach(item => {
-        item.classList.remove('active');
-    });
-
-    // 隐藏所有部分
-    document.querySelectorAll('.kb-section').forEach(section => {
-        section.classList.remove('active');
-    });
-
-    // 显示选中的部分
-    const tabMap = {
-        'entities': { nav: 0, section: 'kbEntitiesSection' },
-        'relations': { nav: 1, section: 'kbRelationsSection' },
-        'glossary': { nav: 2, section: 'kbGlossarySection' },
-        'transcripts': { nav: 3, section: 'kbTranscriptsSection' }
-    };
-
-    const mapping = tabMap[tabName];
-    if (mapping) {
-        document.querySelectorAll('.kb-nav-item')[mapping.nav].classList.add('active');
-        document.getElementById(mapping.section).classList.add('active');
-    }
-};
+// ==================== Phase 6: API Key Management ====================
 
-async function loadKnowledgeBase() {
-    if (!currentProject) return;
-
+let apiKeysData = [];
+let currentApiKeyId = null;
+
+// Load API Keys
+async function loadApiKeys() {
     try {
-        const res = await fetch(`${API_BASE}/projects/${currentProject.id}/knowledge-base`);
-        if (!res.ok) throw new Error('Failed to load knowledge base');
-
+        const res = await fetch(`${API_BASE}/api-keys`);
+        if (!res.ok) throw
new Error('Failed to fetch API keys'); const data = await res.json(); - - // 更新统计 - document.getElementById('kbEntityCount').textContent = data.stats.entity_count; - document.getElementById('kbRelationCount').textContent = data.stats.relation_count; - document.getElementById('kbTranscriptCount').textContent = data.stats.transcript_count; - document.getElementById('kbGlossaryCount').textContent = data.stats.glossary_count; - - // 渲染实体网格 - const entityGrid = document.getElementById('kbEntityGrid'); - entityGrid.innerHTML = ''; - data.entities.forEach(ent => { - const card = document.createElement('div'); - card.className = 'kb-entity-card'; - card.onclick = () => { - switchView('workbench'); - setTimeout(() => selectEntity(ent.id), 100); - }; - - // 渲染属性预览 - let attrsHtml = ''; - if (ent.attributes && ent.attributes.length > 0) { - attrsHtml = ` -
- ${ent.attributes.slice(0, 3).map(a => ` - - ${a.name}: ${Array.isArray(a.value) ? a.value.join(', ') : a.value} - - `).join('')} - ${ent.attributes.length > 3 ? `+${ent.attributes.length - 3}` : ''} -
- `; - } - - card.innerHTML = ` -
-${ent.type} ${ent.name}
-${ent.definition || '暂无定义'}
-📍 ${ent.mention_count || 0} 次提及 · ${ent.appears_in?.length || 0} 个文件
- ${attrsHtml} - `; - entityGrid.appendChild(card); - }); - - // 渲染关系列表 - const relationsList = document.getElementById('kbRelationsList'); - relationsList.innerHTML = ''; - data.relations.forEach(rel => { - const item = document.createElement('div'); - item.className = 'kb-glossary-item'; - item.innerHTML = ` -
-${rel.source_name} → ${rel.type} → ${rel.target_name} ${rel.evidence ? `"${rel.evidence.substring(0, 100)}..."` : ''}
- `; - relationsList.appendChild(item); - }); - - // 渲染术语表 - const glossaryList = document.getElementById('kbGlossaryList'); - glossaryList.innerHTML = ''; - data.glossary.forEach(term => { - const item = document.createElement('div'); - item.className = 'kb-glossary-item'; - item.innerHTML = ` -
- ${term.term} - ${term.pronunciation ? `(${term.pronunciation})` : ''} - 出现 ${term.frequency} 次 -
- - `; - glossaryList.appendChild(item); - }); - - // 渲染文件列表 - const transcriptsList = document.getElementById('kbTranscriptsList'); - transcriptsList.innerHTML = ''; - data.transcripts.forEach(t => { - const item = document.createElement('div'); - item.className = 'kb-transcript-item'; - item.innerHTML = ` -
- ${t.type === 'audio' ? '🎵' : '📄'} - ${t.filename} -
${new Date(t.created_at).toLocaleString()}
-
- `; - transcriptsList.appendChild(item); - }); - + apiKeysData = data.keys || []; + renderApiKeys(); + updateApiKeyStats(); } catch (err) { - console.error('Load knowledge base failed:', err); + console.error('Failed to load API keys:', err); + document.getElementById('apiKeysListContent').innerHTML = ` +
+                加载失败: ${err.message}
+ `; } } -// ==================== Glossary Functions ==================== - -window.showAddTermModal = function() { - document.getElementById('glossaryModal').classList.add('show'); -}; - -window.hideGlossaryModal = function() { - document.getElementById('glossaryModal').classList.remove('show'); -}; - -window.saveGlossaryTerm = async function() { - const term = document.getElementById('glossaryTerm').value.trim(); - const pronunciation = document.getElementById('glossaryPronunciation').value.trim(); +// Update API Key Stats +function updateApiKeyStats() { + const total = apiKeysData.length; + const active = apiKeysData.filter(k => k.status === 'active').length; + const revoked = apiKeysData.filter(k => k.status === 'revoked').length; + const totalCalls = apiKeysData.reduce((sum, k) => sum + (k.total_calls || 0), 0); - if (!term) return; - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/glossary`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ term, pronunciation }) - }); - - if (res.ok) { - hideGlossaryModal(); - document.getElementById('glossaryTerm').value = ''; - document.getElementById('glossaryPronunciation').value = ''; - loadKnowledgeBase(); - } - } catch (err) { - console.error('Save glossary term failed:', err); - } -}; - -window.deleteGlossaryTerm = async function(termId) { - if (!confirm('确定要删除这个术语吗?')) return; - - try { - const res = await fetch(`${API_BASE}/glossary/${termId}`, { - method: 'DELETE' - }); - - if (res.ok) { - loadKnowledgeBase(); - } - } catch (err) { - console.error('Delete glossary term failed:', err); - } -}; - -// ==================== Phase 5: Knowledge Reasoning ==================== - -window.submitReasoningQuery = async function() { - const input = document.getElementById('reasoningInput'); - const depth = document.getElementById('reasoningDepth').value; - const query = input.value.trim(); - - if (!query) return; - - const resultsDiv = 
document.getElementById('reasoningResults'); - - // 显示加载状态 - resultsDiv.innerHTML = ` -
-
- -
-

正在进行知识推理...

-
- `; - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/reasoning/query`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ query, reasoning_depth: depth }) - }); - - if (!res.ok) throw new Error('Reasoning failed'); - - const data = await res.json(); - renderReasoningResult(data); - - } catch (err) { - console.error('Reasoning query failed:', err); - resultsDiv.innerHTML = ` -
-

推理失败,请稍后重试

-
- `; - } -}; - -function renderReasoningResult(data) { - const resultsDiv = document.getElementById('reasoningResults'); - - const typeLabels = { - 'causal': '🔍 因果推理', - 'comparative': '⚖️ 对比推理', - 'temporal': '⏱️ 时序推理', - 'associative': '🔗 关联推理', - 'summary': '📝 总结推理' - }; - - const typeLabel = typeLabels[data.reasoning_type] || '🤔 智能分析'; - const confidencePercent = Math.round(data.confidence * 100); - - let evidenceHtml = ''; - if (data.evidence && data.evidence.length > 0) { - evidenceHtml = ` -
-

📋 支撑证据

- ${data.evidence.map(e => `
${e.text || e}
`).join('')} -
- `; - } - - let gapsHtml = ''; - if (data.knowledge_gaps && data.knowledge_gaps.length > 0) { - gapsHtml = ` -
-

⚠️ 知识缺口

-
    - ${data.knowledge_gaps.map(g => `
  • ${g}
  • `).join('')} -
-
- `; - } - - resultsDiv.innerHTML = ` -
-
-
- ${typeLabel} -
-
- 置信度: ${confidencePercent}% -
-
-
-            ${data.answer.replace(/\n/g, '<br>')}
- ${evidenceHtml} - ${gapsHtml} -
- `; + document.getElementById('apiKeyTotalCount').textContent = total; + document.getElementById('apiKeyActiveCount').textContent = active; + document.getElementById('apiKeyRevokedCount').textContent = revoked; + document.getElementById('apiKeyTotalCalls').textContent = totalCalls.toLocaleString(); } -window.clearReasoningResult = function() { - document.getElementById('reasoningResults').innerHTML = ''; - document.getElementById('reasoningInput').value = ''; - document.getElementById('inferencePathsSection').style.display = 'none'; -}; - -window.generateSummary = async function(summaryType) { - const resultsDiv = document.getElementById('reasoningResults'); +// Render API Keys List +function renderApiKeys() { + const container = document.getElementById('apiKeysListContent'); - // 显示加载状态 - resultsDiv.innerHTML = ` -
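`updateApiKeyStats` above tallies the key list with `filter`/`reduce` before writing the counters into the DOM. The same aggregation as a pure, testable function (field names `status` and `total_calls` are taken from the surrounding code; the function name is mine):

```javascript
// Aggregate API-key stats: total, active, revoked, and summed call count.
// Keys missing total_calls (e.g. never used) count as 0 calls.
function summarizeApiKeys(keys) {
  return {
    total: keys.length,
    active: keys.filter(k => k.status === 'active').length,
    revoked: keys.filter(k => k.status === 'revoked').length,
    totalCalls: keys.reduce((sum, k) => sum + (k.total_calls || 0), 0)
  };
}
```

Keeping the arithmetic out of the DOM-writing code makes it trivial to unit-test and reuse (e.g. for the per-key stats modal).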
-
- -
-

正在生成项目总结...

-
- `; - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/reasoning/summary`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ summary_type: summaryType }) - }); - - if (!res.ok) throw new Error('Summary failed'); - - const data = await res.json(); - renderSummaryResult(data); - - } catch (err) { - console.error('Summary generation failed:', err); - resultsDiv.innerHTML = ` -
-

总结生成失败,请稍后重试

+ if (apiKeysData.length === 0) { + container.innerHTML = ` +
+

暂无 API Keys

+
`; - } -}; - -function renderSummaryResult(data) { - const resultsDiv = document.getElementById('reasoningResults'); - - const typeLabels = { - 'comprehensive': '📋 全面总结', - 'executive': '💼 高管摘要', - 'technical': '⚙️ 技术总结', - 'risk': '⚠️ 风险分析' - }; - - const typeLabel = typeLabels[data.summary_type] || '📝 项目总结'; - - let keyPointsHtml = ''; - if (data.key_points && data.key_points.length > 0) { - keyPointsHtml = ` -
-

📌 关键要点
${data.key_points.map(p => `• ${p}`).join('')}
-
- `; - } - - let risksHtml = ''; - if (data.risks && data.risks.length > 0) { - risksHtml = ` -
-

⚠️ 风险与问题
${data.risks.map(r => `• ${r}`).join('')}
-
- `; - } - - let recommendationsHtml = ''; - if (data.recommendations && data.recommendations.length > 0) { - recommendationsHtml = ` -
-

💡 建议
${data.recommendations.map(r => `${r}`).join('')}
- `; - } - - resultsDiv.innerHTML = ` -
-
-
- ${typeLabel} -
-
- 置信度: ${Math.round(data.confidence * 100)}% -
-
-
-            ${data.overview ? data.overview.replace(/\n/g, '<br>') : ''}
- ${keyPointsHtml} - ${risksHtml} - ${recommendationsHtml} -
- `; -} - -window.findInferencePath = async function(startEntity, endEntity) { - const pathsSection = document.getElementById('inferencePathsSection'); - const pathsList = document.getElementById('inferencePathsList'); - - pathsSection.style.display = 'block'; - pathsList.innerHTML = '

正在搜索关联路径...

'; - - try { - const res = await fetch( - `${API_BASE}/projects/${currentProject.id}/reasoning/inference-path?start_entity=${encodeURIComponent(startEntity)}&end_entity=${encodeURIComponent(endEntity)}` - ); - - if (!res.ok) throw new Error('Path finding failed'); - - const data = await res.json(); - renderInferencePaths(data); - - } catch (err) { - console.error('Path finding failed:', err); - pathsList.innerHTML = '

路径搜索失败

'; - } -}; - -// Phase 5: Entity Attributes Management -let currentEntityIdForAttributes = null; -let currentAttributes = []; -let currentTemplates = []; - -// Show entity attributes modal -window.showEntityAttributes = async function(entityId) { - if (entityId) { - currentEntityIdForAttributes = entityId; - } else if (selectedEntity) { - currentEntityIdForAttributes = selectedEntity; - } else { - alert('请先选择一个实体'); return; } - const modal = document.getElementById('attributesModal'); - modal.classList.add('show'); - - // Reset form - document.getElementById('attributesAddForm').style.display = 'none'; - document.getElementById('toggleAddAttrBtn').style.display = 'inline-block'; - document.getElementById('saveAttrBtn').style.display = 'none'; - - await loadEntityAttributes(); -}; - -window.hideAttributesModal = function() { - document.getElementById('attributesModal').classList.remove('show'); - currentEntityIdForAttributes = null; -}; - -async function loadEntityAttributes() { - if (!currentEntityIdForAttributes) return; - - try { - const res = await fetch(`${API_BASE}/entities/${currentEntityIdForAttributes}/attributes`); - if (!res.ok) throw new Error('Failed to load attributes'); - - const data = await res.json(); - currentAttributes = data.attributes || []; - - renderAttributesList(); - } catch (err) { - console.error('Load attributes failed:', err); - document.getElementById('attributesList').innerHTML = '

加载失败

'; - } -} - -function renderAttributesList() { - const container = document.getElementById('attributesList'); - - if (currentAttributes.length === 0) { - container.innerHTML = '

暂无属性,点击"添加属性"创建

'; - return; - } - - container.innerHTML = currentAttributes.map(attr => { - let valueDisplay = attr.value; - if (attr.type === 'multiselect' && Array.isArray(attr.value)) { - valueDisplay = attr.value.join(', '); - } - - return ` -
-
-
- ${attr.name} - ${attr.type} -
-
${valueDisplay || '-'}
-
-
- - -
+ container.innerHTML = apiKeysData.map(key => ` +
+
+
${escapeHtml(key.name)}
+
${key.key_preview}
- `; - }).join(''); +
+ ${key.permissions.map(p => `${p}`).join('')} +
+
${key.rate_limit}/min
+
+ ${key.status} +
+
${key.total_calls || 0}
+
+ ${key.status === 'active' ? ` + + + ` : '已失效'} +
+
+ `).join(''); } -window.toggleAddAttributeForm = function() { - const form = document.getElementById('attributesAddForm'); - const toggleBtn = document.getElementById('toggleAddAttrBtn'); - const saveBtn = document.getElementById('saveAttrBtn'); - - if (form.style.display === 'none') { - form.style.display = 'block'; - toggleBtn.style.display = 'none'; - saveBtn.style.display = 'inline-block'; - } else { - form.style.display = 'none'; - toggleBtn.style.display = 'inline-block'; - saveBtn.style.display = 'none'; - } +// Show Create API Key Modal +window.showCreateApiKeyModal = function() { + document.getElementById('apiKeyCreateModal').classList.add('show'); + document.getElementById('apiKeyName').value = ''; + document.getElementById('apiKeyName').focus(); }; -window.onAttrTypeChange = function() { - const type = document.getElementById('attrType').value; - const optionsGroup = document.getElementById('attrOptionsGroup'); - const valueContainer = document.getElementById('attrValueContainer'); - - if (type === 'select' || type === 'multiselect') { - optionsGroup.style.display = 'block'; - } else { - optionsGroup.style.display = 'none'; - } - - // Update value input based on type - if (type === 'date') { - valueContainer.innerHTML = ''; - } else if (type === 'number') { - valueContainer.innerHTML = ''; - } else { - valueContainer.innerHTML = ''; - } +// Hide Create API Key Modal +window.hideCreateApiKeyModal = function() { + document.getElementById('apiKeyCreateModal').classList.remove('show'); }; -window.saveAttribute = async function() { - if (!currentEntityIdForAttributes) return; - - const name = document.getElementById('attrName').value.trim(); - const type = document.getElementById('attrType').value; - let value = document.getElementById('attrValue').value; - const changeReason = document.getElementById('attrChangeReason').value.trim(); - +// Create API Key +window.createApiKey = async function() { + const name = 
document.getElementById('apiKeyName').value.trim(); if (!name) { - alert('请输入属性名称'); + alert('请输入 API Key 名称'); return; } - // Handle options for select/multiselect - let options = null; - if (type === 'select' || type === 'multiselect') { - const optionsStr = document.getElementById('attrOptions').value.trim(); - if (optionsStr) { - options = optionsStr.split(',').map(o => o.trim()).filter(o => o); - } - - // Handle multiselect value - if (type === 'multiselect' && value) { - value = value.split(',').map(v => v.trim()).filter(v => v); - } + const permissions = []; + if (document.getElementById('permRead').checked) permissions.push('read'); + if (document.getElementById('permWrite').checked) permissions.push('write'); + if (document.getElementById('permDelete').checked) permissions.push('delete'); + + if (permissions.length === 0) { + alert('请至少选择一个权限'); + return; } - // Handle number type - if (type === 'number' && value) { - value = parseFloat(value); - } + const rateLimit = parseInt(document.getElementById('apiKeyRateLimit').value); + const expiresDays = document.getElementById('apiKeyExpires').value; try { - const res = await fetch(`${API_BASE}/entities/${currentEntityIdForAttributes}/attributes`, { + const res = await fetch(`${API_BASE}/api-keys`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ name, - type, - value, - options, - change_reason: changeReason + permissions, + rate_limit: rateLimit, + expires_days: expiresDays ? 
parseInt(expiresDays) : null }) }); - if (!res.ok) throw new Error('Failed to save attribute'); + if (!res.ok) throw new Error('Failed to create API key'); - // Reset form - document.getElementById('attrName').value = ''; - document.getElementById('attrValue').value = ''; - document.getElementById('attrOptions').value = ''; - document.getElementById('attrChangeReason').value = ''; + const data = await res.json(); + hideCreateApiKeyModal(); - // Reload attributes - await loadEntityAttributes(); - - // Hide form - toggleAddAttributeForm(); + // Show the created key + document.getElementById('createdApiKeyValue').textContent = data.api_key; + document.getElementById('apiKeyCreatedModal').classList.add('show'); + // Refresh list + await loadApiKeys(); } catch (err) { - console.error('Save attribute failed:', err); - alert('保存失败,请重试'); + console.error('Failed to create API key:', err); + alert('创建失败: ' + err.message); } }; -window.deleteAttribute = async function(attributeId) { - if (!confirm('确定要删除这个属性吗?')) return; +// Copy API Key to clipboard +window.copyApiKey = function() { + const key = document.getElementById('createdApiKeyValue').textContent; + navigator.clipboard.writeText(key).then(() => { + showNotification('API Key 已复制到剪贴板', 'success'); + }).catch(() => { + // Fallback + const textarea = document.createElement('textarea'); + textarea.value = key; + document.body.appendChild(textarea); + textarea.select(); + document.execCommand('copy'); + document.body.removeChild(textarea); + showNotification('API Key 已复制到剪贴板', 'success'); + }); +}; + +// Hide API Key Created Modal +window.hideApiKeyCreatedModal = function() { + document.getElementById('apiKeyCreatedModal').classList.remove('show'); +}; + +// Show API Key Stats +window.showApiKeyStats = async function(keyId, keyName) { + currentApiKeyId = keyId; + document.getElementById('apiKeyStatsTitle').textContent = `API Key 统计 - ${keyName}`; + document.getElementById('apiKeyStatsModal').classList.add('show'); try { - 
const res = await fetch(`${API_BASE}/entities/${currentEntityIdForAttributes}/attributes/${attributeId}`, { + const res = await fetch(`${API_BASE}/api-keys/${keyId}/stats?days=30`); + if (!res.ok) throw new Error('Failed to fetch stats'); + + const data = await res.json(); + + // Update stats + document.getElementById('statsTotalCalls').textContent = data.summary.total_calls.toLocaleString(); + document.getElementById('statsSuccessCalls').textContent = data.summary.success_calls.toLocaleString(); + document.getElementById('statsErrorCalls').textContent = data.summary.error_calls.toLocaleString(); + document.getElementById('statsAvgTime').textContent = Math.round(data.summary.avg_response_time_ms); + + // Render logs + renderApiKeyLogs(data.logs || []); + } catch (err) { + console.error('Failed to load stats:', err); + document.getElementById('apiKeyLogs').innerHTML = ` +
+

加载统计失败

+
+ `; + } +}; + +// Render API Key Logs +function renderApiKeyLogs(logs) { + const container = document.getElementById('apiKeyLogs'); + + if (logs.length === 0) { + container.innerHTML = ` +
+

暂无调用记录

+
+ `; + return; + } + + container.innerHTML = logs.map(log => ` +
+
${escapeHtml(log.endpoint)}
+
${log.method}
+
${log.status_code}
+
${log.response_time_ms}ms
+
+ `).join(''); +} + +// Hide API Key Stats Modal +window.hideApiKeyStatsModal = function() { + document.getElementById('apiKeyStatsModal').classList.remove('show'); + currentApiKeyId = null; +}; + +// Revoke API Key +window.revokeApiKey = async function(keyId) { + if (!confirm('确定要撤销此 API Key 吗?撤销后将无法恢复。')) { + return; + } + + try { + const res = await fetch(`${API_BASE}/api-keys/${keyId}`, { method: 'DELETE' }); - if (!res.ok) throw new Error('Failed to delete attribute'); + if (!res.ok) throw new Error('Failed to revoke API key'); - await loadEntityAttributes(); + showNotification('API Key 已撤销', 'success'); + await loadApiKeys(); } catch (err) { - console.error('Delete attribute failed:', err); - alert('删除失败'); + console.error('Failed to revoke API key:', err); + alert('撤销失败: ' + err.message); } }; -// Attribute History -window.showAttributeHistory = async function(attributeName) { - if (!currentEntityIdForAttributes) return; - - const modal = document.getElementById('attrHistoryModal'); - modal.classList.add('show'); - - try { - const res = await fetch(`${API_BASE}/entities/${currentEntityIdForAttributes}/attributes/history?attribute_name=${encodeURIComponent(attributeName)}`); - if (!res.ok) throw new Error('Failed to load history'); - - const data = await res.json(); - renderAttributeHistory(data.history, attributeName); - } catch (err) { - console.error('Load history failed:', err); - document.getElementById('attrHistoryContent').innerHTML = '

加载失败

'; - } -}; - -window.hideAttrHistoryModal = function() { - document.getElementById('attrHistoryModal').classList.remove('show'); -}; - -function renderAttributeHistory(history, attributeName) { - const container = document.getElementById('attrHistoryContent'); - - if (history.length === 0) { - container.innerHTML = `

属性 "${attributeName}" 暂无变更历史

`; - return; - } - - container.innerHTML = history.map(h => { - const date = new Date(h.changed_at).toLocaleString(); - return ` -
-
- ${h.changed_by || '系统'} - ${date} -
-
- ${h.old_value || '(无)'} - - ${h.new_value || '(无)'} -
- ${h.change_reason ? `
原因: ${h.change_reason}
` : ''} -
- `; - }).join(''); -} - -// Attribute Templates Management -window.showAttributeTemplates = async function() { - const modal = document.getElementById('attrTemplatesModal'); - modal.classList.add('show'); - - document.getElementById('templateForm').style.display = 'none'; - document.getElementById('toggleTemplateBtn').style.display = 'inline-block'; - document.getElementById('saveTemplateBtn').style.display = 'none'; - - await loadAttributeTemplates(); -}; - -window.hideAttrTemplatesModal = function() { - document.getElementById('attrTemplatesModal').classList.remove('show'); -}; - -async function loadAttributeTemplates() { - if (!currentProject) return; - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/attribute-templates`); - if (!res.ok) throw new Error('Failed to load templates'); - - currentTemplates = await res.json(); - renderTemplatesList(); - } catch (err) { - console.error('Load templates failed:', err); - document.getElementById('templatesList').innerHTML = '

加载失败

'; - } -} - -function renderTemplatesList() { - const container = document.getElementById('templatesList'); - - if (currentTemplates.length === 0) { - container.innerHTML = '

暂无模板,点击"新建模板"创建

'; - return; - } - - container.innerHTML = currentTemplates.map(t => { - const optionsStr = t.options ? `选项: ${t.options.join(', ')}` : ''; - return ` -
-
-
- ${t.name} - ${t.type} - ${t.is_required ? '*' : ''} -
-
${t.description || ''} ${optionsStr}
-
-
- -
-
- `; - }).join(''); -} - -window.toggleTemplateForm = function() { - const form = document.getElementById('templateForm'); - const toggleBtn = document.getElementById('toggleTemplateBtn'); - const saveBtn = document.getElementById('saveTemplateBtn'); - - if (form.style.display === 'none') { - form.style.display = 'block'; - toggleBtn.style.display = 'none'; - saveBtn.style.display = 'inline-block'; - } else { - form.style.display = 'none'; - toggleBtn.style.display = 'inline-block'; - saveBtn.style.display = 'none'; - } -}; - -window.onTemplateTypeChange = function() { - const type = document.getElementById('templateType').value; - const optionsGroup = document.getElementById('templateOptionsGroup'); - - if (type === 'select' || type === 'multiselect') { - optionsGroup.style.display = 'block'; - } else { - optionsGroup.style.display = 'none'; - } -}; - -window.saveTemplate = async function() { - if (!currentProject) return; - - const name = document.getElementById('templateName').value.trim(); - const type = document.getElementById('templateType').value; - const description = document.getElementById('templateDesc').value.trim(); - const isRequired = document.getElementById('templateRequired').checked; - const defaultValue = document.getElementById('templateDefault').value.trim(); - - if (!name) { - alert('请输入模板名称'); - return; - } - - let options = null; - if (type === 'select' || type === 'multiselect') { - const optionsStr = document.getElementById('templateOptions').value.trim(); - if (optionsStr) { - options = optionsStr.split(',').map(o => o.trim()).filter(o => o); - } - } - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/attribute-templates`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ - name, - type, - description, - options, - is_required: isRequired, - default_value: defaultValue || null - }) - }); - - if (!res.ok) throw new Error('Failed to save template'); - - // Reset form - 
document.getElementById('templateName').value = ''; - document.getElementById('templateDesc').value = ''; - document.getElementById('templateOptions').value = ''; - document.getElementById('templateDefault').value = ''; - document.getElementById('templateRequired').checked = false; - - await loadAttributeTemplates(); - toggleTemplateForm(); - - } catch (err) { - console.error('Save template failed:', err); - alert('保存失败'); - } -}; - -window.deleteTemplate = async function(templateId) { - if (!confirm('确定要删除这个模板吗?')) return; - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/attribute-templates/${templateId}`, { - method: 'DELETE' - }); - - if (!res.ok) throw new Error('Failed to delete template'); - - await loadAttributeTemplates(); - } catch (err) { - console.error('Delete template failed:', err); - alert('删除失败'); - } -}; - -// Search entities by attributes -window.searchByAttributes = async function() { - if (!currentProject) return; - - const filterName = document.getElementById('attrFilterName').value; - const filterValue = document.getElementById('attrFilterValue').value; - const filterOp = document.getElementById('attrFilterOp').value; - - if (!filterName || !filterValue) { - alert('请输入筛选条件'); - return; - } - - try { - const filters = JSON.stringify([{ name: filterName, value: filterValue, operator: filterOp }]); - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/entities/search-by-attributes?filters=${encodeURIComponent(filters)}`); - - if (!res.ok) throw new Error('Search failed'); - - const entities = await res.json(); - - // Update entity grid - const grid = document.getElementById('kbEntityGrid'); - if (entities.length === 0) { - grid.innerHTML = '

未找到匹配的实体

'; - return; - } - - grid.innerHTML = entities.map(ent => ` -
-
- ${ent.type} - ${ent.name} -
-
${ent.definition || '暂无定义'}
-
- `).join(''); - - } catch (err) { - console.error('Search by attributes failed:', err); - alert('搜索失败'); - } -}; - -// ==================== Export Functions ==================== - -// Show export panel -window.showExportPanel = function() { - const modal = document.getElementById('exportPanelModal'); - if (modal) { - modal.style.display = 'flex'; - - // Show transcript export section if a transcript is selected - const transcriptSection = document.getElementById('transcriptExportSection'); - if (transcriptSection && currentData && currentData.transcript_id !== 'project_view') { - transcriptSection.style.display = 'block'; - } else if (transcriptSection) { - transcriptSection.style.display = 'none'; - } - } -}; - -// Hide export panel -window.hideExportPanel = function() { - const modal = document.getElementById('exportPanelModal'); - if (modal) { - modal.style.display = 'none'; - } -}; - -// Helper function to download file -function downloadFile(url, filename) { - const link = document.createElement('a'); - link.href = url; - link.download = filename; - document.body.appendChild(link); - link.click(); - document.body.removeChild(link); -} - -// Export knowledge graph as SVG -window.exportGraph = async function(format) { - if (!currentProject) return; - - try { - const endpoint = format === 'svg' ? 'graph-svg' : 'graph-png'; - const mimeType = format === 'svg' ? 'image/svg+xml' : 'image/png'; - const ext = format === 'svg' ? 
'svg' : 'png'; - - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/export/${endpoint}`); - - if (!res.ok) throw new Error(`Export ${format} failed`); - - const blob = await res.blob(); - const url = URL.createObjectURL(blob); - downloadFile(url, `insightflow-graph-${currentProject.id}.${ext}`); - URL.revokeObjectURL(url); - - showNotification(`图谱已导出为 ${format.toUpperCase()}`, 'success'); - } catch (err) { - console.error(`Export ${format} failed:`, err); - alert(`导出失败: ${err.message}`); - } -}; - -// Export entities -window.exportEntities = async function(format) { - if (!currentProject) return; - - try { - const endpoint = format === 'excel' ? 'entities-excel' : 'entities-csv'; - const mimeType = format === 'excel' ? 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' : 'text/csv'; - const ext = format === 'excel' ? 'xlsx' : 'csv'; - - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/export/${endpoint}`); - - if (!res.ok) throw new Error(`Export ${format} failed`); - - const blob = await res.blob(); - const url = URL.createObjectURL(blob); - downloadFile(url, `insightflow-entities-${currentProject.id}.${ext}`); - URL.revokeObjectURL(url); - - showNotification(`实体数据已导出为 ${format.toUpperCase()}`, 'success'); - } catch (err) { - console.error(`Export ${format} failed:`, err); - alert(`导出失败: ${err.message}`); - } -}; - -// Export relations -window.exportRelations = async function(format) { - if (!currentProject) return; - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/export/relations-csv`); - - if (!res.ok) throw new Error('Export relations failed'); - - const blob = await res.blob(); - const url = URL.createObjectURL(blob); - downloadFile(url, `insightflow-relations-${currentProject.id}.csv`); - URL.revokeObjectURL(url); - - showNotification('关系数据已导出为 CSV', 'success'); - } catch (err) { - console.error('Export relations failed:', err); - alert(`导出失败: ${err.message}`); - } -}; - -// 
Export project report as PDF -window.exportReport = async function(format) { - if (!currentProject) return; - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/export/report-pdf`); - - if (!res.ok) throw new Error('Export PDF failed'); - - const blob = await res.blob(); - const url = URL.createObjectURL(blob); - downloadFile(url, `insightflow-report-${currentProject.id}.pdf`); - URL.revokeObjectURL(url); - - showNotification('项目报告已导出为 PDF', 'success'); - } catch (err) { - console.error('Export PDF failed:', err); - alert(`导出失败: ${err.message}`); - } -}; - -// Export project as JSON -window.exportProject = async function(format) { - if (!currentProject) return; - - try { - const res = await fetch(`${API_BASE}/projects/${currentProject.id}/export/project-json`); - - if (!res.ok) throw new Error('Export JSON failed'); - - const blob = await res.blob(); - const url = URL.createObjectURL(blob); - downloadFile(url, `insightflow-project-${currentProject.id}.json`); - URL.revokeObjectURL(url); - - showNotification('项目数据已导出为 JSON', 'success'); - } catch (err) { - console.error('Export JSON failed:', err); - alert(`导出失败: ${err.message}`); - } -}; - -// Export transcript as Markdown -window.exportTranscript = async function(format) { - if (!currentProject || !currentData || currentData.transcript_id === 'project_view') { - alert('请先选择一个转录文件'); - return; - } - - try { - const res = await fetch(`${API_BASE}/transcripts/${currentData.transcript_id}/export/markdown`); - - if (!res.ok) throw new Error('Export Markdown failed'); - - const blob = await res.blob(); - const url = URL.createObjectURL(blob); - downloadFile(url, `insightflow-transcript-${currentData.transcript_id}.md`); - URL.revokeObjectURL(url); - - showNotification('转录文本已导出为 Markdown', 'success'); - } catch (err) { - console.error('Export Markdown failed:', err); - alert(`导出失败: ${err.message}`); - } -}; - -// Show notification -function showNotification(message, type = 'info') { - // Create 
notification element - const notification = document.createElement('div'); - notification.style.cssText = ` - position: fixed; - top: 20px; - right: 20px; - background: ${type === 'success' ? 'rgba(0, 212, 255, 0.9)' : '#333'}; - color: ${type === 'success' ? '#000' : '#fff'}; - padding: 12px 20px; - border-radius: 8px; - z-index: 10000; - font-size: 0.9rem; - animation: slideIn 0.3s ease; - `; - notification.textContent = message; - - document.body.appendChild(notification); - - // Remove after 3 seconds - setTimeout(() => { - notification.style.animation = 'slideOut 0.3s ease'; - setTimeout(() => { - document.body.removeChild(notification); - }, 300); - }, 3000); -} - -// Add animation styles -const style = document.createElement('style'); -style.textContent = ` - @keyframes slideIn { - from { transform: translateX(100%); opacity: 0; } - to { transform: translateX(0); opacity: 1; } - } - @keyframes slideOut { - from { transform: translateX(0); opacity: 1; } - to { transform: translateX(100%); opacity: 0; } - } -`; -document.head.appendChild(style); - -// ==================== Graph Analysis Functions ==================== - -// Initialize graph analysis view -function initGraphAnalysis() { - if (!currentProject) return; - - // 填充实体选择器 - populateEntitySelectors(); - - // 加载图统计 - loadGraphStats(); - - // 检查 Neo4j 状态 - checkNeo4jStatus(); -} - -function populateEntitySelectors() { - const selectors = [ - document.getElementById('pathStartEntity'), - document.getElementById('pathEndEntity'), - document.getElementById('neighborEntity') - ]; - - selectors.forEach(selector => { - if (!selector) return; - - const currentValue = selector.value; - selector.innerHTML = ''; - - projectEntities.forEach(ent => { - const option = document.createElement('option'); - option.value = ent.id; - option.textContent = `${ent.name} (${ent.type})`; - selector.appendChild(option); - }); - - selector.value = currentValue; - }); -} - -async function checkNeo4jStatus() { - try { - const res = 
await fetch(`${API_BASE}/neo4j/status`); - if (res.ok) { - const data = await res.json(); - updateNeo4jStatusUI(data.connected); - } - } catch (err) { - console.error('Check Neo4j status failed:', err); - updateNeo4jStatusUI(false); - } -} - -function updateNeo4jStatusUI(connected) { - // 可以在头部添加状态指示器 - const header = document.querySelector('.graph-analysis-header'); - let statusEl = document.getElementById('neo4jStatus'); - - if (!statusEl) { - statusEl = document.createElement('div'); - statusEl.id = 'neo4jStatus'; - statusEl.className = 'neo4j-status'; - header.appendChild(statusEl); - } - - statusEl.className = `neo4j-status ${connected ? 'connected' : 'disconnected'}`; - statusEl.innerHTML = ` - - Neo4j ${connected ? '已连接' : '未连接'} - `; -} - -async function syncToNeo4j() { - if (!currentProject) return; - - const btn = event.target; - const originalText = btn.textContent; - btn.textContent = '🔄 同步中...'; - btn.disabled = true; - - try { - const res = await fetch(`${API_BASE}/neo4j/sync`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ project_id: currentProject.id }) - }); - - if (!res.ok) throw new Error('Sync failed'); - - const data = await res.json(); - showNotification(`同步成功!${data.nodes_synced} 个节点, ${data.edges_synced} 条边`, 'success'); - - // 刷新统计 - await loadGraphStats(); - checkNeo4jStatus(); - - } catch (err) { - console.error('Sync to Neo4j failed:', err); - showNotification('同步失败,请检查 Neo4j 连接', 'error'); - } finally { - btn.textContent = originalText; - btn.disabled = false; - } -} - -async function loadGraphStats() { - if (!currentProject) return; - - try { - // 加载图统计 - const statsRes = await fetch(`${API_BASE}/projects/${currentProject.id}/graph/stats`); - if (statsRes.ok) { - graphStats = await statsRes.json(); - renderGraphStats(graphStats); - } - - // 加载中心性分析 - const centralityRes = await fetch(`${API_BASE}/projects/${currentProject.id}/graph/centrality`); - if (centralityRes.ok) { - 
centralityData = await centralityRes.json();
-            renderCentrality(centralityData);
-        }
-
-        // 加载社区发现
-        const communitiesRes = await fetch(`${API_BASE}/projects/${currentProject.id}/graph/communities`);
-        if (communitiesRes.ok) {
-            communitiesData = await communitiesRes.json();
-            renderCommunities(communitiesData);
-        }
-
-    } catch (err) {
-        console.error('Load graph stats failed:', err);
-    }
-}
-
-function renderGraphStats(stats) {
-    document.getElementById('statNodeCount').textContent = stats.node_count || 0;
-    document.getElementById('statEdgeCount').textContent = stats.edge_count || 0;
-    document.getElementById('statDensity').textContent = (stats.density || 0).toFixed(3);
-    document.getElementById('statComponents').textContent = stats.component_count || 0;
-}
-
-function renderCentrality(data) {
-    const container = document.getElementById('centralityList');
-
-    if (!data.centrality || data.centrality.length === 0) {
-        container.innerHTML = '暂无中心性数据';
-        return;
-    }
-
-    // 按度中心性排序
-    const sorted = [...data.centrality].sort((a, b) => b.degree - a.degree);
-
-    container.innerHTML = sorted.map((item, index) => {
-        const rank = index + 1;
-        const isTop3 = rank <= 3;
-        const entity = projectEntities.find(e => e.id === item.entity_id);
-
-        return `
-            ${rank}
-            ${item.entity_name}
-            ${item.entity_type}${entity ? ` · ${entity.definition?.substring(0, 30) || ''}` : ''}
-            ${item.degree}
-            连接数
-        `;
-    }).join('');
-}
-
-// Enhanced community visualization with better interactivity
-function renderCommunities(data) {
-    const svg = d3.select('#communitiesSvg');
-    svg.selectAll('*').remove();
-
-    const container = document.getElementById('communitiesList');
-
-    if (!data.communities || data.communities.length === 0) {
-        container.innerHTML = '暂无社区数据';
-        return;
-    }
-
-    // 渲染社区列表
-    container.innerHTML = data.communities.map((community, idx) => {
-        const nodeNames = community.node_names || [];
-        const density = community.density ? community.density.toFixed(3) : 'N/A';
-        return `
-            社区 ${idx + 1} ${community.size} 个节点
-            密度: ${density}
-            ${nodeNames.slice(0, 8).map(name => `${name}`).join('')}
-            ${nodeNames.length > 8 ? `+${nodeNames.length - 8}` : ''}
-        `;
-    }).join('');
-
-    // 渲染社区可视化
-    renderCommunitiesViz(data.communities);
-}
-
-// Global variable to track focused community
-let focusedCommunityIndex = null;
-
-// Focus on a specific community
-window.focusCommunity = function(communityIndex) {
-    focusedCommunityIndex = communityIndex;
-    if (communitiesData && communitiesData.communities) {
-        renderCommunitiesViz(communitiesData.communities, communityIndex);
-    }
-};
-
-// Enhanced community visualization with focus support
-function renderCommunitiesViz(communities, focusIndex = null) {
-    const svg = d3.select('#communitiesSvg');
-    const container = svg.node().parentElement;
-    const width = container.clientWidth;
-    const height = container.clientHeight || 400;
-
-    svg.attr('width', width).attr('height', height);
-
-    // 颜色方案
-    const colors = [
-        '#00d4ff', '#7b2cbf', '#ff6b6b', '#4ecdc4',
-        '#ffe66d', '#a8e6cf', '#ff8b94', '#c7ceea'
-    ];
-
-    // 准备节点数据
-    let allNodes = [];
-    let allLinks = [];
-
-    communities.forEach((comm, idx) => {
-        const isFocused = focusIndex === null || focusIndex === idx;
-        const isDimmed = focusIndex !== null && focusIndex !== idx;
-        const opacity = isDimmed ?
0.2 : 1; - - const nodes = (comm.node_names || []).map((name, i) => ({ - id: `${idx}-${i}`, - name: name, - community: idx, - color: colors[idx % colors.length], - opacity: opacity, - isFocused: isFocused - })); - - // Create intra-community links - if (nodes.length > 1) { - for (let i = 0; i < nodes.length; i++) { - for (let j = i + 1; j < nodes.length; j++) { - allLinks.push({ - source: nodes[i].id, - target: nodes[j].id, - community: idx, - opacity: opacity * 0.3 - }); - } - } - } - - allNodes = allNodes.concat(nodes); - }); - - if (allNodes.length === 0) return; - - // Create community centers for force layout - const communityCenters = communities.map((_, idx) => ({ - x: width / 2 + (idx % 3 - 1) * width / 4, - y: height / 2 + Math.floor(idx / 3) * height / 4 - })); - - // 使用力导向布局 - const simulation = d3.forceSimulation(allNodes) - .force('charge', d3.forceManyBody().strength(d => d.isFocused ? -150 : -50)) - .force('collision', d3.forceCollide().radius(d => d.isFocused ? 35 : 25)) - .force('x', d3.forceX(d => communityCenters[d.community]?.x || width / 2).strength(0.1)) - .force('y', d3.forceY(d => communityCenters[d.community]?.y || height / 2).strength(0.1)) - .force('link', d3.forceLink(allLinks).id(d => d.id).distance(60).strength(0.1)); - - // Draw links - const link = svg.selectAll('.community-link') - .data(allLinks) - .enter().append('line') - .attr('class', 'community-link') - .attr('stroke', d => colors[d.community % colors.length]) - .attr('stroke-width', 1) - .attr('stroke-opacity', d => d.opacity); - - // Draw nodes - const node = svg.selectAll('.community-node') - .data(allNodes) - .enter().append('g') - .attr('class', 'community-node') - .style('cursor', 'pointer') - .call(d3.drag() - .on('start', dragstarted) - .on('drag', dragged) - .on('end', dragended)); - - // Node glow for focused community - node.filter(d => d.isFocused) - .append('circle') - .attr('r', 28) - .attr('fill', d => d.color) - .attr('opacity', 0.2) - .attr('filter', 
'url(#glow)'); - - // Main node circle - node.append('circle') - .attr('r', d => d.isFocused ? 22 : 18) - .attr('fill', d => d.color) - .attr('stroke', '#fff') - .attr('stroke-width', d => d.isFocused ? 3 : 2) - .attr('opacity', d => d.opacity); - - // Node labels (only for focused community) - node.filter(d => d.isFocused) - .append('text') - .text(d => d.name.length > 6 ? d.name.slice(0, 5) + '...' : d.name) - .attr('text-anchor', 'middle') - .attr('dy', 35) - .attr('fill', '#e0e0e0') - .attr('font-size', '10px') - .attr('font-weight', '500') - .style('pointer-events', 'none'); - - // Community label for first node in each community - node.filter(d => { - const commNodes = allNodes.filter(n => n.community === d.community); - return d.id === commNodes[0]?.id && d.isFocused; - }) - .append('text') - .attr('dy', -30) - .attr('text-anchor', 'middle') - .attr('fill', d => d.color) - .attr('font-size', '11px') - .attr('font-weight', '600') - .text(d => `社区 ${d.community + 1}`); - - simulation.on('tick', () => { - link - .attr('x1', d => d.source.x) - .attr('y1', d => d.source.y) - .attr('x2', d => d.target.x) - .attr('y2', d => d.target.y); - - node.attr('transform', d => `translate(${d.x},${d.y})`); - }); - - function dragstarted(event, d) { - if (!event.active) simulation.alphaTarget(0.3).restart(); - d.fx = d.x; - d.fy = d.y; - } - - function dragged(event, d) { - d.fx = event.x; - d.fy = event.y; - } - - function dragended(event, d) { - if (!event.active) simulation.alphaTarget(0); - d.fx = null; - d.fy = null; - } -} - -window.switchGraphTab = function(tabName) { - // 更新标签状态 - document.querySelectorAll('.graph-analysis-tab').forEach(tab => { - tab.classList.remove('active'); - }); - event.target.classList.add('active'); - - // 切换面板 - document.querySelectorAll('.graph-viz-panel').forEach(panel => { - panel.classList.remove('active'); - }); - - if (tabName === 'centrality') { - document.getElementById('centralityPanel').classList.add('active'); - } else if (tabName 
=== 'communities') { - document.getElementById('communitiesPanel').classList.add('active'); - } -}; - -// Enhanced shortest path with better visualization -async function findShortestPath() { - const startId = document.getElementById('pathStartEntity').value; - const endId = document.getElementById('pathEndEntity').value; - - if (!startId || !endId) { - alert('请选择起点和终点实体'); - return; - } - - if (startId === endId) { - alert('起点和终点不能相同'); - return; - } - - // 切换到路径面板 - document.querySelectorAll('.graph-viz-panel').forEach(panel => { - panel.classList.remove('active'); - }); - document.getElementById('pathPanel').classList.add('active'); - - // 显示加载状态 - document.getElementById('pathViz').innerHTML = ` -
            正在查找最短路径...
-    `;
-
-    try {
-        const res = await fetch(`${API_BASE}/graph/shortest-path`, {
-            method: 'POST',
-            headers: { 'Content-Type': 'application/json' },
-            body: JSON.stringify({
-                start_entity_id: startId,
-                end_entity_id: endId
-            })
-        });
-
-        if (!res.ok) throw new Error('Path finding failed');
-
-        const data = await res.json();
-        currentPathData = data;
-        renderPath(data);
-
-    } catch (err) {
-        console.error('Find shortest path failed:', err);
-        document.getElementById('pathViz').innerHTML = `
-            路径查找失败
-            请确保数据已同步到 Neo4j
-        `;
-    }
-}
-
-// Enhanced path rendering with animation and better styling
-function renderPath(data) {
-    const startEntity = projectEntities.find(e => e.id === data.start_entity_id);
-    const endEntity = projectEntities.find(e => e.id === data.end_entity_id);
-
-    document.getElementById('pathDescription').textContent =
-        `${startEntity?.name || '起点'} → ${endEntity?.name || '终点'} (${data.path_length} 步)`;
-
-    // 渲染路径可视化
-    const svg = d3.select('#pathSvg');
-    svg.selectAll('*').remove();
-
-    const container = svg.node().parentElement;
-    const width = container.clientWidth;
-    const height = container.clientHeight || 300;
-
-    svg.attr('width', width).attr('height', height);
-
-    if (!data.path || data.path.length === 0) {
-        document.getElementById('pathViz').innerHTML = `
-            🔍
-            未找到路径
- `; - document.getElementById('pathInfo').innerHTML = ''; - return; - } - - // Add defs for gradients and filters - const defs = svg.append('defs'); - - // Glow filter - const filter = defs.append('filter') - .attr('id', 'pathGlow') - .attr('x', '-50%') - .attr('y', '-50%') - .attr('width', '200%') - .attr('height', '200%'); - - filter.append('feGaussianBlur') - .attr('stdDeviation', '4') - .attr('result', 'coloredBlur'); - - const feMerge = filter.append('feMerge'); - feMerge.append('feMergeNode').attr('in', 'coloredBlur'); - feMerge.append('feMergeNode').attr('in', 'SourceGraphic'); - - // Linear gradient for path - const gradient = defs.append('linearGradient') - .attr('id', 'pathLineGradient') - .attr('gradientUnits', 'userSpaceOnUse'); - - gradient.append('stop').attr('offset', '0%').attr('stop-color', '#00d4ff'); - gradient.append('stop').attr('offset', '100%').attr('stop-color', '#7b2cbf'); - - // 准备节点和边 - use linear layout for clarity - const nodes = data.path.map((nodeId, idx) => ({ - id: nodeId, - name: projectEntities.find(e => e.id === nodeId)?.name || nodeId, - type: projectEntities.find(e => e.id === nodeId)?.type || 'OTHER', - x: (width / (data.path.length + 1)) * (idx + 1), - y: height / 2, - isStart: idx === 0, - isEnd: idx === data.path.length - 1, - isMiddle: idx > 0 && idx < data.path.length - 1 - })); - - const links = []; - for (let i = 0; i < nodes.length - 1; i++) { - links.push({ - source: nodes[i], - target: nodes[i + 1], - index: i - }); - } - - // Color scale - const colorScale = { - 'PROJECT': '#7b2cbf', - 'TECH': '#00d4ff', - 'PERSON': '#ff6b6b', - 'ORG': '#4ecdc4', - 'OTHER': '#666' - }; - - // Draw glow lines first (behind) - svg.selectAll('.path-link-glow') - .data(links) - .enter().append('line') - .attr('class', 'path-link-glow') - .attr('x1', d => d.source.x) - .attr('y1', d => d.source.y) - .attr('x2', d => d.target.x) - .attr('y2', d => d.target.y) - .attr('stroke', '#00d4ff') - .attr('stroke-width', 8) - 
.attr('stroke-opacity', 0.2) - .attr('filter', 'url(#pathGlow)'); - - // Draw main lines - const linkLines = svg.selectAll('.path-link') - .data(links) - .enter().append('line') - .attr('class', 'path-link') - .attr('x1', d => d.source.x) - .attr('y1', d => d.source.y) - .attr('x2', d => d.target.x) - .attr('y2', d => d.target.y) - .attr('stroke', 'url(#pathLineGradient)') - .attr('stroke-width', 3) - .attr('stroke-linecap', 'round'); - - // Animated dash line - const animLines = svg.selectAll('.path-link-anim') - .data(links) - .enter().append('line') - .attr('class', 'path-link-anim') - .attr('x1', d => d.source.x) - .attr('y1', d => d.source.y) - .attr('x2', d => d.target.x) - .attr('y2', d => d.target.y) - .attr('stroke', '#fff') - .attr('stroke-width', 2) - .attr('stroke-dasharray', '5,5') - .attr('stroke-opacity', 0.6); - - // Animate the dash offset - function animateDash() { - animLines.attr('stroke-dashoffset', function() { - const current = parseFloat(d3.select(this).attr('stroke-dashoffset') || 0); - return current - 0.5; - }); - requestAnimationFrame(animateDash); - } - animateDash(); - - // Draw arrows - links.forEach((link, i) => { - const angle = Math.atan2(link.target.y - link.source.y, link.target.x - link.source.x); - const arrowSize = 10; - const arrowX = link.target.x - 30 * Math.cos(angle); - const arrowY = link.target.y - 30 * Math.sin(angle); - - svg.append('polygon') - .attr('points', `0,-${arrowSize/2} ${arrowSize},0 0,${arrowSize/2}`) - .attr('transform', `translate(${arrowX},${arrowY}) rotate(${angle * 180 / Math.PI})`) - .attr('fill', '#00d4ff'); - }); - - // Draw nodes - const node = svg.selectAll('.path-node') - .data(nodes) - .enter().append('g') - .attr('class', 'path-node') - .attr('transform', d => `translate(${d.x},${d.y})`); - - // Glow for start/end nodes - node.filter(d => d.isStart || d.isEnd) - .append('circle') - .attr('r', 35) - .attr('fill', d => d.isStart ? 
'#00d4ff' : '#7b2cbf') - .attr('opacity', 0.2) - .attr('filter', 'url(#pathGlow)'); - - // Main node circles - node.append('circle') - .attr('r', d => d.isStart || d.isEnd ? 28 : 22) - .attr('fill', d => { - if (d.isStart) return '#00d4ff'; - if (d.isEnd) return '#7b2cbf'; - return colorScale[d.type] || '#333'; - }) - .attr('stroke', '#fff') - .attr('stroke-width', d => d.isStart || d.isEnd ? 4 : 2); - - // Step numbers for middle nodes - node.filter(d => d.isMiddle) - .append('text') - .attr('dy', 5) - .attr('text-anchor', 'middle') - .attr('fill', '#fff') - .attr('font-size', '12px') - .attr('font-weight', '600') - .text(d => d.index); - - // Node labels - node.append('text') - .text(d => d.name.length > 8 ? d.name.slice(0, 7) + '...' : d.name) - .attr('text-anchor', 'middle') - .attr('dy', d => d.isStart || d.isEnd ? 45 : 38) - .attr('fill', '#e0e0e0') - .attr('font-size', d => d.isStart || d.isEnd ? '13px' : '11px') - .attr('font-weight', d => d.isStart || d.isEnd ? '600' : '400') - .style('pointer-events', 'none'); - - // Start/End labels - node.filter(d => d.isStart) - .append('text') - .attr('dy', -40) - .attr('text-anchor', 'middle') - .attr('fill', '#00d4ff') - .attr('font-size', '11px') - .attr('font-weight', '600') - .text('起点'); - - node.filter(d => d.isEnd) - .append('text') - .attr('dy', -40) - .attr('text-anchor', 'middle') - .attr('fill', '#7b2cbf') - .attr('font-size', '11px') - .attr('font-weight', '600') - .text('终点'); - - // 渲染路径信息 - renderPathInfo(data); -} - -function renderPathInfo(data) { - const container = document.getElementById('pathInfo'); - - // Calculate path statistics - const pathLength = data.path.length; - const steps = pathLength - 1; - - let html = ` -
            路径长度 ${steps} 步
-            节点数 ${pathLength} 个
-    `;
-
-    data.path.forEach((nodeId, idx) => {
-        const entity = projectEntities.find(e => e.id === nodeId);
-        const isStart = idx === 0;
-        const isEnd = idx === data.path.length - 1;
-
-        html += `
-            ${idx + 1}
-            ${entity?.name || nodeId}
-            ${!isStart ? `← 通过关系连接` : ''}
-            ${isStart ? '起点' : ''}
-            ${isEnd ? '终点' : ''}
- `; - }); - - container.innerHTML = html; -} - -async function findNeighbors() { - const entityId = document.getElementById('neighborEntity').value; - const depth = parseInt(document.getElementById('neighborDepth').value) || 1; - - if (!entityId) { - alert('请选择实体'); - return; - } - - // 切换到路径面板显示邻居 - document.querySelectorAll('.graph-viz-panel').forEach(panel => { - panel.classList.remove('active'); - }); - document.getElementById('pathPanel').classList.add('active'); - - const entity = projectEntities.find(e => e.id === entityId); - document.getElementById('pathDescription').textContent = - `${entity?.name || '实体'} 的 ${depth} 度邻居`; - - // 显示加载状态 - document.getElementById('pathViz').innerHTML = ` -
            正在查找邻居节点...
-    `;
-    document.getElementById('pathInfo').innerHTML = '';
-
-    try {
-        const res = await fetch(`${API_BASE}/entities/${entityId}/neighbors?depth=${depth}`);
-
-        if (!res.ok) throw new Error('Neighbors query failed');
-
-        const data = await res.json();
-        renderNeighbors(data, entity);
-
-    } catch (err) {
-        console.error('Find neighbors failed:', err);
-        document.getElementById('pathViz').innerHTML = `
-            邻居查询失败
-            请确保数据已同步到 Neo4j
-        `;
-    }
-}
-
-// Enhanced neighbors visualization
-function renderNeighbors(data, centerEntity) {
-    const svg = d3.select('#pathSvg');
-    svg.selectAll('*').remove();
-
-    const container = svg.node().parentElement;
-    const width = container.clientWidth;
-    const height = container.clientHeight || 300;
-
-    svg.attr('width', width).attr('height', height);
-
-    const neighbors = data.neighbors || [];
-
-    if (neighbors.length === 0) {
-        document.getElementById('pathViz').innerHTML = `
-            🔍
-            未找到邻居节点
- `; - return; - } - - // Add glow filter - const defs = svg.append('defs'); - const filter = defs.append('filter') - .attr('id', 'neighborGlow') - .attr('x', '-50%') - .attr('y', '-50%') - .attr('width', '200%') - .attr('height', '200%'); - - filter.append('feGaussianBlur') - .attr('stdDeviation', '3') - .attr('result', 'coloredBlur'); - - const feMerge = filter.append('feMerge'); - feMerge.append('feMergeNode').attr('in', 'coloredBlur'); - feMerge.append('feMergeNode').attr('in', 'SourceGraphic'); - - // 中心节点 - const centerNode = { - id: centerEntity.id, - name: centerEntity.name, - x: width / 2, - y: height / 2, - isCenter: true - }; - - // 邻居节点 - 环形布局 - const radius = Math.min(width, height) / 3; - const neighborNodes = neighbors.map((n, idx) => ({ - id: n.entity_id, - name: n.entity_name, - x: width / 2 + radius * Math.cos((2 * Math.PI * idx) / neighbors.length - Math.PI / 2), - y: height / 2 + radius * Math.sin((2 * Math.PI * idx) / neighbors.length - Math.PI / 2), - relationType: n.relation_type - })); - - const allNodes = [centerNode, ...neighborNodes]; - - // Draw glow lines - neighborNodes.forEach(neighbor => { - svg.append('line') - .attr('x1', centerNode.x) - .attr('y1', centerNode.y) - .attr('x2', neighbor.x) - .attr('y2', neighbor.y) - .attr('stroke', '#00d4ff') - .attr('stroke-width', 6) - .attr('stroke-opacity', 0.1) - .attr('filter', 'url(#neighborGlow)'); - }); - - // Draw main lines - neighborNodes.forEach(neighbor => { - svg.append('line') - .attr('x1', centerNode.x) - .attr('y1', centerNode.y) - .attr('x2', neighbor.x) - .attr('y2', neighbor.y) - .attr('stroke', '#00d4ff') - .attr('stroke-width', 2) - .attr('stroke-opacity', 0.4); - }); - - // Draw nodes - const node = svg.selectAll('.neighbor-node') - .data(allNodes) - .enter().append('g') - .attr('class', 'neighbor-node') - .attr('transform', d => `translate(${d.x},${d.y})`); - - // Glow for center node - node.filter(d => d.isCenter) - .append('circle') - .attr('r', 40) - .attr('fill', 
'#00d4ff') - .attr('opacity', 0.2) - .attr('filter', 'url(#neighborGlow)'); - - // Main node circles - node.append('circle') - .attr('r', d => d.isCenter ? 35 : 25) - .attr('fill', d => d.isCenter ? '#00d4ff' : '#333') - .attr('stroke', '#fff') - .attr('stroke-width', d => d.isCenter ? 4 : 2); - - // Node labels - node.append('text') - .text(d => d.name.length > 6 ? d.name.slice(0, 5) + '...' : d.name) - .attr('text-anchor', 'middle') - .attr('dy', d => d.isCenter ? 50 : 38) - .attr('fill', '#e0e0e0') - .attr('font-size', d => d.isCenter ? '13px' : '11px') - .attr('font-weight', d => d.isCenter ? '600' : '400') - .style('pointer-events', 'none'); - - // Center label - node.filter(d => d.isCenter) - .append('text') - .attr('dy', -45) - .attr('text-anchor', 'middle') - .attr('fill', '#00d4ff') - .attr('font-size', '11px') - .attr('font-weight', '600') - .text('中心'); - - // 渲染邻居信息 - let html = ` -
            邻居节点数 ${neighbors.length} 个
-    `;
-
-    neighbors.forEach((n, idx) => {
-        html += `
-            ${idx + 1}
-            ${n.entity_name}
-            关系: ${n.relation_type}
- `; - }); - document.getElementById('pathInfo').innerHTML = html; +// Escape HTML helper +function escapeHtml(text) { + if (!text) return ''; + const div = document.createElement('div'); + div.textContent = text; + return div.innerHTML; } // Show notification helper @@ -2981,21 +462,13 @@ function showNotification(message, type = 'info') { }, 3000); } -// Reset graph visualization -window.resetGraphViz = function() { - const svg = d3.select('#graphAnalysisSvg'); - svg.selectAll('*').remove(); - document.getElementById('graphAnalysisResults').classList.remove('show'); - focusedCommunityIndex = null; - if (communitiesData) { - renderCommunities(communitiesData); - } -}; - -// Highlight entity in graph -function highlightEntityInGraph(entityId) { - // This would highlight the entity in the main graph view - // For now, just switch to workbench and select the entity - switchView('workbench'); - setTimeout(() => selectEntity(entityId), 100); -} +// Placeholder functions for other views +function initUpload() {} +function initAgentPanel() {} +function initEntityCard() {} +function renderTranscript() {} +function renderGraph() {} +function renderEntityList() {} +function loadKnowledgeBase() {} +function loadTimeline() {} +function initGraphAnalysis() {} diff --git a/frontend/workbench.html b/frontend/workbench.html index c63f47f..8aa0b33 100644 --- a/frontend/workbench.html +++ b/frontend/workbench.html @@ -1925,6 +1925,344 @@ border-radius: 50%; background: currentColor; } + + /* Phase 6: API Key Management Panel */ + .api-keys-panel { + display: none; + flex-direction: column; + width: 100%; + height: 100%; + background: #0a0a0a; + } + + .api-keys-panel.active { + display: flex; + } + + .api-keys-header { + padding: 16px 20px; + background: #141414; + border-bottom: 1px solid #222; + display: flex; + justify-content: space-between; + align-items: center; + } + + .api-keys-header h2 { + font-size: 1.3rem; + margin-bottom: 4px; + } + + .api-keys-content { + flex: 1; + 
padding: 24px; + overflow-y: auto; + } + + .api-keys-stats { + display: grid; + grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); + gap: 16px; + margin-bottom: 24px; + } + + .api-key-stat-card { + background: #141414; + border: 1px solid #222; + border-radius: 8px; + padding: 16px; + text-align: center; + } + + .api-key-stat-value { + font-size: 1.5rem; + font-weight: 600; + color: #00d4ff; + } + + .api-key-stat-label { + font-size: 0.75rem; + color: #666; + margin-top: 4px; + } + + .api-keys-list { + background: #141414; + border: 1px solid #222; + border-radius: 12px; + overflow: hidden; + } + + .api-keys-list-header { + display: grid; + grid-template-columns: 2fr 1fr 1fr 1fr 1fr 120px; + padding: 12px 16px; + background: #1a1a1a; + border-bottom: 1px solid #222; + font-size: 0.85rem; + color: #888; + font-weight: 500; + } + + .api-key-item { + display: grid; + grid-template-columns: 2fr 1fr 1fr 1fr 1fr 120px; + padding: 16px; + border-bottom: 1px solid #222; + align-items: center; + transition: background 0.2s; + } + + .api-key-item:hover { + background: #1a1a1a; + } + + .api-key-item:last-child { + border-bottom: none; + } + + .api-key-name { + font-weight: 500; + color: #e0e0e0; + } + + .api-key-preview { + font-family: monospace; + font-size: 0.85rem; + color: #00d4ff; + background: #00d4ff11; + padding: 4px 8px; + border-radius: 4px; + } + + .api-key-permissions { + display: flex; + gap: 4px; + flex-wrap: wrap; + } + + .api-key-permission { + font-size: 0.7rem; + padding: 2px 6px; + border-radius: 4px; + background: #333; + color: #888; + } + + .api-key-permission.read { + background: #00d4ff22; + color: #00d4ff; + } + + .api-key-permission.write { + background: #7b2cbf22; + color: #7b2cbf; + } + + .api-key-permission.delete { + background: #ff6b6b22; + color: #ff6b6b; + } + + .api-key-status { + font-size: 0.8rem; + padding: 4px 10px; + border-radius: 20px; + display: inline-block; + } + + .api-key-status.active { + background: #00d4ff22; + 
color: #00d4ff; + } + + .api-key-status.revoked { + background: #ff6b6b22; + color: #ff6b6b; + } + + .api-key-status.expired { + background: #66666622; + color: #666; + } + + .api-key-actions { + display: flex; + gap: 8px; + } + + .api-key-btn { + background: transparent; + border: 1px solid #333; + color: #888; + padding: 6px 12px; + border-radius: 6px; + cursor: pointer; + font-size: 0.8rem; + transition: all 0.2s; + } + + .api-key-btn:hover { + border-color: #00d4ff; + color: #00d4ff; + } + + .api-key-btn.danger:hover { + border-color: #ff6b6b; + color: #ff6b6b; + } + + .api-key-empty { + text-align: center; + padding: 60px 20px; + color: #666; + } + + .api-key-modal-form { + display: flex; + flex-direction: column; + gap: 16px; + } + + .api-key-form-group { + display: flex; + flex-direction: column; + gap: 8px; + } + + .api-key-form-group label { + font-size: 0.9rem; + color: #888; + } + + .api-key-form-group input, + .api-key-form-group select { + background: #0a0a0a; + border: 1px solid #333; + border-radius: 6px; + padding: 10px 12px; + color: #e0e0e0; + font-size: 0.95rem; + } + + .api-key-form-group input:focus, + .api-key-form-group select:focus { + outline: none; + border-color: #00d4ff; + } + + .api-key-permissions-select { + display: flex; + gap: 12px; + } + + .api-key-permission-checkbox { + display: flex; + align-items: center; + gap: 6px; + cursor: pointer; + } + + .api-key-permission-checkbox input[type="checkbox"] { + width: 18px; + height: 18px; + accent-color: #00d4ff; + } + + .api-key-created-modal .api-key-value { + background: #0a0a0a; + border: 1px solid #00d4ff; + border-radius: 8px; + padding: 16px; + font-family: monospace; + font-size: 1rem; + color: #00d4ff; + margin: 16px 0; + word-break: break-all; + } + + .api-key-created-modal .warning { + background: #ff6b6b22; + border: 1px solid #ff6b6b; + border-radius: 8px; + padding: 12px; + color: #ff6b6b; + font-size: 0.85rem; + margin-bottom: 16px; + } + + .api-key-stats-modal .stats-grid { 
+ display: grid; + grid-template-columns: repeat(2, 1fr); + gap: 16px; + margin-bottom: 24px; + } + + .api-key-stats-modal .stat-item { + background: #0a0a0a; + border-radius: 8px; + padding: 16px; + text-align: center; + } + + .api-key-stats-modal .stat-value { + font-size: 1.5rem; + font-weight: 600; + color: #00d4ff; + } + + .api-key-stats-modal .stat-label { + font-size: 0.8rem; + color: #666; + margin-top: 4px; + } + + .api-key-logs { + max-height: 300px; + overflow-y: auto; + } + + .api-key-log-item { + display: grid; + grid-template-columns: 1fr 80px 60px 80px; + gap: 12px; + padding: 12px; + border-bottom: 1px solid #222; + font-size: 0.85rem; + } + + .api-key-log-item:last-child { + border-bottom: none; + } + + .api-key-log-endpoint { + color: #e0e0e0; + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; + } + + .api-key-log-method { + color: #00d4ff; + font-family: monospace; + } + + .api-key-log-status { + text-align: center; + } + + .api-key-log-status.success { + color: #4ecdc4; + } + + .api-key-log-status.error { + color: #ff6b6b; + } + + .api-key-log-time { + color: #666; + text-align: right; + } @@ -1946,6 +2284,7 @@ +
@@ -2324,6 +2663,54 @@
+        <!-- Phase 6: API Key Management Panel -->
+        <div class="api-keys-panel">
+            <div class="api-keys-header">
+                <div>
+                    <h2>🔑 API Key 管理</h2>
+                    <p>管理 API 访问密钥和调用统计</p>
+                </div>
+            </div>
+            <div class="api-keys-content">
+                <div class="api-keys-stats">
+                    <div class="api-key-stat-card">
+                        <div class="api-key-stat-value">-</div>
+                        <div class="api-key-stat-label">总 API Keys</div>
+                    </div>
+                    <div class="api-key-stat-card">
+                        <div class="api-key-stat-value">-</div>
+                        <div class="api-key-stat-label">活跃</div>
+                    </div>
+                    <div class="api-key-stat-card">
+                        <div class="api-key-stat-value">-</div>
+                        <div class="api-key-stat-label">已撤销</div>
+                    </div>
+                    <div class="api-key-stat-card">
+                        <div class="api-key-stat-value">-</div>
+                        <div class="api-key-stat-label">总调用次数</div>
+                    </div>
+                </div>
+                <div class="api-keys-list">
+                    <div class="api-keys-list-header">
+                        <span>名称 / Key</span>
+                        <span>权限</span>
+                        <span>限流</span>
+                        <span>状态</span>
+                        <span>调用次数</span>
+                        <span>操作</span>
+                    </div>
+                    <div class="api-key-empty">加载中...</div>
+                </div>
+            </div>
+        </div>
+ + + + + + + + +
✏️ 编辑实体