Add Phase 3 feature documentation

Add deployment script for Phase 3
Phase 3: Memory & Growth - Multi-file fusion, Entity alignment with embedding, Document import, Knowledge base panel
2026-02-18 12:13:51 +08:00 · 2026-02-18 12:13:22 +08:00 · 2026-02-18 12:12:39 +08:00 · 2026-02-18 06:03:51 +08:00
13 changed files with 3217 additions and 201 deletions
--- a/34
+++ b/34
@@ -1,29 +1,33 @@
 # InsightFlow - Audio to Knowledge Graph Platform
 # Phase 3: Memory & Growth
 FROM python:3.11-slim
 WORKDIR /app
-# Install uv
+# Install system dependencies
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
 # Install system deps
 RUN apt-get update && apt-get install -y \
-    ffmpeg \
+    gcc \
-    git \
+    libpq-dev \
    && rm -rf /var/lib/apt/lists/*
-# Copy project files
+# Copy backend requirements
-COPY backend/pyproject.toml backend/uv.lock ./
+COPY backend/requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
-# Install dependencies using uv sync
+# Copy application code
 RUN uv sync --frozen --no-install-project
 # Copy code
 COPY backend/ ./backend/
 COPY frontend/ ./frontend/
-# Install project
+# Create data directory
-RUN uv sync --frozen
+RUN mkdir -p /app/data
 # Set environment variables
 ENV PYTHONPATH=/app
 ENV DB_PATH=/app/data/insightflow.db
 # Expose port
 EXPOSE 8000
-CMD ["uv", "run", "python", "backend/main.py"]
+# Run the application
 CMD ["python", "-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
--- a/README.md
+++ b/README.md
@@ -1,27 +1,88 @@
-# InsightFlow
+# InsightFlow - Audio to Knowledge Graph Platform
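-A domain knowledge building platform for audio and documents
+## Phase 3: Memory & Growth - Completed ✅
-## Product Positioning
+### New Features
 Turns meeting recordings and documents into a structured knowledge graph, and keeps the knowledge growing through a human-in-the-loop workflow.
-## Core Features
+#### 1. Multi-file Graph Fusion ✅
- 🎙️ ASR speech recognition + hot-word injection
+- Upload multiple audio files to the same project
- 🧠 LLM entity extraction and explanation
+- Entities are aligned automatically and the graphs are merged
- 🔗 Linked dual views (document view + graph view)
+- Entity mentions are tracked across files
- 📈 Knowledge growth (multi-file entity alignment)
+- A file selector switches between transcripts
-## Tech Stack
+#### 2. Entity Alignment Algorithm Improvements ✅
- Frontend: Next.js + Tailwind
+- New `entity_aligner.py` module
- Backend: Node.js / Python
+- Semantic similarity matching using Kimi API embeddings
- Database: MySQL + Neo4j
+- Cosine similarity computation
- ASR: Whisper
+- Automatic alias suggestions
- LLM: OpenAI / Kimi
+- Batch entity alignment API
-## Development Phases
+#### 3. PDF/DOCX Document Import ✅
- [ ] Phase 1: Skeleton and single-file analysis (MVP)
+- New `document_processor.py` module
- [ ] Phase 2: Interaction and correction workbench
+- Supports PDF, DOCX, TXT, and MD formats
- [ ] Phase 3: Memory and growth
+- Extracted document text feeds into entity extraction
 - Document type tagging (audio/document)
-## Documentation
+#### 4. Project Knowledge Base Panel ✅
- [PRD v2.0](docs/PRD-v2.0.md)
+- Brand-new knowledge base view
 - Statistics panel: entity, relation, file, and term counts
 - Entity grid (with mention counts)
 - Relation list
 - Glossary management (add/delete)
 - File list
 ### Tech Stack
 - Backend: FastAPI + SQLite
 - Frontend: vanilla HTML/JS + D3.js
 - ASR: Alibaba Cloud Tingwu
 - LLM: Kimi API
 - Document processing: PyPDF2, python-docx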
 ### Deployment
 ```bash
 # Build the Docker image
 docker build -t insightflow:phase3 .
 # Run the container
 docker run -d \
  -p 18000:8000 \
  -v /opt/data:/app/data \
  -e KIMI_API_KEY=your_key \
  -e ALIYUN_ACCESS_KEY_ID=your_key \
  -e ALIYUN_ACCESS_KEY_SECRET=your_secret \
  insightflow:phase3
 ```
 ### API Documentation
 #### New APIs
 **Document upload**
 ```
 POST /api/v1/projects/{project_id}/upload-document
 Content-Type: multipart/form-data
 file: <file>
 ```
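A minimal sketch of how a client might build this upload request, using only the Python standard library. The host and port (taken from the `docker run` example), the project ID, and the helper name `build_upload_request` are assumptions for illustration; only the path and the `file` form field come from the README.

```python
import uuid
import urllib.request


def build_upload_request(base_url: str, project_id: str,
                         filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for upload-document."""
    boundary = uuid.uuid4().hex
    # A single form part named "file", as listed in the README
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{base_url}/api/v1/projects/{project_id}/upload-document"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Hypothetical values: local container on port 18000, made-up project ID
req = build_upload_request("http://localhost:18000", "demo-project", "notes.txt", b"hello")
```

Sending is left to the caller, for example with `urllib.request.urlopen(req)`; the response shape is not documented in the README, so it is not assumed here.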
 **Knowledge base query**
 ```
 GET /api/v1/projects/{project_id}/knowledge-base
 ```
 **Glossary management**
 ```
 POST /api/v1/projects/{project_id}/glossary
 GET /api/v1/projects/{project_id}/glossary
 DELETE /api/v1/glossary/{term_id}
 ```
 **Entity alignment**
 ```
 POST /api/v1/projects/{project_id}/align-entities?threshold=0.85
 ```
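To make the `threshold` parameter concrete, here is a small sketch of the decision it presumably controls: two entities are merge candidates when the cosine similarity of their embeddings reaches the threshold. The function names and the toy vectors are illustrative assumptions, not the project's code.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_merge(a: List[float], b: List[float], threshold: float = 0.85) -> bool:
    """Treat two entities as the same when their similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for real embeddings
near = should_merge([1.0, 0.0, 0.0], [0.9, 0.1, 0.0])   # nearly parallel
far = should_merge([1.0, 0.0], [0.0, 1.0])              # orthogonal
```

Raising the threshold makes merging more conservative; lowering it merges more aggressively at the risk of conflating distinct entities.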
 ### Database Schema Updates
 - `transcripts` table: new `type` column (audio/document)
 - `entities` table: new `embedding` column
 - New indexes to speed up queries
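The schema changes above could be applied roughly as follows. This is a sketch against an in-memory SQLite database: the stand-in table definitions, the column types, the JSON encoding of the embedding, and the index name are all assumptions, since `schema.sql` is not shown here.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal stand-ins for the existing tables (assumed column sets)
conn.execute("CREATE TABLE transcripts (id TEXT PRIMARY KEY, project_id TEXT, filename TEXT, full_text TEXT)")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, project_id TEXT, name TEXT)")

# Phase 3 style additions (types and index name are assumptions)
conn.execute("ALTER TABLE transcripts ADD COLUMN type TEXT DEFAULT 'audio'")
conn.execute("ALTER TABLE entities ADD COLUMN embedding TEXT")
conn.execute("CREATE INDEX IF NOT EXISTS idx_transcripts_project ON transcripts (project_id)")

# Rows inserted without a type fall back to 'audio'
conn.execute(
    "INSERT INTO transcripts (id, project_id, filename, full_text) VALUES (?, ?, ?, ?)",
    ("t1", "p1", "meeting.mp3", "..."),
)
# One way to store a vector in a TEXT column: serialize it as JSON
conn.execute(
    "INSERT INTO entities (id, project_id, name, embedding) VALUES (?, ?, ?, ?)",
    ("e1", "p1", "Kubernetes", json.dumps([0.1, 0.2])),
)
conn.commit()

row_type = conn.execute("SELECT type FROM transcripts WHERE id = 't1'").fetchone()[0]
stored = json.loads(conn.execute("SELECT embedding FROM entities WHERE id = 'e1'").fetchone()[0])
```

The default of `'audio'` mirrors the `transcript_type: str = "audio"` default in `save_transcript`, so existing rows keep their meaning after the migration.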
--- a/STATUS.md
+++ b/STATUS.md
@@ -4,11 +4,13 @@
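 ## Current Phase
-Phase 1: Skeleton and Single-file Analysis (MVP) - **Completed ✅**
+Phase 3: Memory and Growth - **Completed ✅**
 ## Completed
-### Backend (backend/)
+### Phase 1: Skeleton and Single-file Analysis (MVP) ✅
 #### Backend (backend/)
 - ✅ FastAPI project scaffolding
 - ✅ SQLite database design (schema.sql)
 - ✅ Database manager module (db_manager.py)
@@ -26,7 +28,7 @@ Phase 1: Skeleton and Single-file Analysis (MVP) - **Completed ✅**
 - ✅ Writes to the entity_mentions table
 - ✅ Writes to the entity_relations table
-### Frontend (frontend/)
+#### Frontend (frontend/)
 - ✅ Project management page (index.html)
 - ✅ Knowledge workbench page (workbench.html)
 - ✅ D3.js knowledge graph visualization
@@ -35,35 +37,97 @@ Phase 1: Skeleton and Single-file Analysis (MVP) - **Completed ✅**
 - ✅ Entity highlighting in transcript text
 - ✅ Graph-text linking (clicking an entity highlights it in both views)
-### Infrastructure
+### Phase 2: Interaction and Correction Workbench ✅
 - ✅ Dockerfile
 - ✅ docker-compose.yml
 - ✅ Git repository initialized
-## Phase 2 Plan (Interaction and Correction Workbench) - **Starting soon**
+#### New Backend APIs
 - ✅ Entity edit API (PUT /api/v1/entities/{id})
 - ✅ Entity delete API (DELETE /api/v1/entities/{id})
 - ✅ Entity merge API (POST /api/v1/entities/{id}/merge)
 - ✅ Manual entity creation API (POST /api/v1/projects/{id}/entities)
 - ✅ Relation creation API (POST /api/v1/projects/{id}/relations)
 - ✅ Relation delete API (DELETE /api/v1/relations/{id})
 - ✅ Transcript edit API (PUT /api/v1/transcripts/{id})
- Entity definition editing
+#### Frontend Interaction Features
- Entity merging
+- ✅ Entity editor modal (name, type, definition, aliases)
- Relation editing (add/delete)
+- ✅ Context menu (edit entity, merge entities, mark as entity)
- Saving of manual corrections
+- ✅ Entity merging
- Text editor enhancements (editing transcript text)
+- ✅ Relation management (add, delete)
 - ✅ Transcript editing mode
 - ✅ Create an entity from selected text
 - ✅ Two-way linking between text and graph
-## Phase 3 Plan (Memory and Growth)
+#### Database Updates
 - ✅ update_entity() - update entity fields
 - ✅ delete_entity() - delete an entity and its associated data
 - ✅ delete_relation() - delete a relation
 - ✅ update_relation() - update a relation
 - ✅ update_transcript() - update transcript text
- Multi-file graph fusion
+### Phase 3: Memory and Growth ✅
- Entity alignment algorithm improvements
+
- PDF/DOCX document import
+#### Multi-file Graph Fusion
- Project knowledge base panel
+- ✅ Upload multiple audio files to the same project
 - ✅ Entities are aligned automatically and the graphs are merged
 - ✅ Entity mentions are tracked across files
 - ✅ A file selector switches between transcripts
 - ✅ Transcript list API returns the file type
 #### Entity Alignment Algorithm Improvements
 - ✅ New `entity_aligner.py` module
 - ✅ Semantic similarity matching using Kimi API embeddings
 - ✅ Cosine similarity computation
 - ✅ Automatic alias suggestions
 - ✅ Batch entity alignment API
 - ✅ Entity alignment fallback (string matching)
 #### PDF/DOCX Document Import
 - ✅ New `document_processor.py` module
 - ✅ Supports PDF, DOCX, TXT, and MD formats
 - ✅ Extracted document text feeds into entity extraction
 - ✅ Document upload API (/api/v1/projects/{id}/upload-document)
 - ✅ Document type tagging (audio/document)
 #### Project Knowledge Base Panel
 - ✅ Brand-new knowledge base view
 - ✅ Sidebar navigation (workbench/knowledge base)
 - ✅ Statistics panel: entity, relation, file, and term counts
 - ✅ Entity grid (with mention counts)
 - ✅ Relation list
 - ✅ Glossary management (add/delete)
 - ✅ File list (audio and documents distinguished)
 #### Glossary
 - ✅ Glossary database table (glossary)
 - ✅ Add-term API
 - ✅ List-terms API
 - ✅ Delete-term API
 - ✅ Frontend glossary management UI
 #### Database Updates
 - ✅ transcripts table: new `type` column
 - ✅ entities table: new `embedding` column
 - ✅ New glossary table
 - ✅ New indexes to speed up queries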
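 ## Technical Debt
 - The Tingwu SDK fallback to mock needs better error handling
 - Entity similarity matching is currently simple substring containment and needs an embedding-based approach
 - The frontend needs state management (it currently uses global variables)
 - API documentation (OpenAPI/Swagger) needs to be added
 - The embedding cache needs to be persisted
 - The entity alignment algorithm needs more tests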
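One way the embedding-cache item could be addressed is a small SQLite-backed store. This is only a sketch: the class name `EmbeddingStore`, the table layout, and the choice of a SHA-256 key are assumptions. A stable digest is used because Python's built-in `hash()` for strings is randomized per process by default, so it cannot serve as a key that survives restarts.

```python
import hashlib
import json
import sqlite3
from typing import List, Optional


class EmbeddingStore:
    """Hypothetical persistent cache: text -> embedding vector, stored as JSON."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS embedding_cache "
            "(key TEXT PRIMARY KEY, vector TEXT NOT NULL)"
        )

    @staticmethod
    def _key(text: str) -> str:
        # Stable across processes, unlike the built-in hash()
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> Optional[List[float]]:
        row = self.conn.execute(
            "SELECT vector FROM embedding_cache WHERE key = ?", (self._key(text),)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, text: str, vector: List[float]) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO embedding_cache (key, vector) VALUES (?, ?)",
            (self._key(text), json.dumps(vector)),
        )
        self.conn.commit()


store = EmbeddingStore()  # pass a file path instead of ":memory:" to persist
store.put("Kubernetes", [0.1, 0.2, 0.3])
```

A caller would check `store.get(text)` before requesting an embedding and call `store.put(...)` after a successful response.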
 ## Deployment
 - Server: 122.51.127.111
 - Project path: /opt/projects/insightflow
 - Port: 18000
 - Docker image: insightflow:phase3
 ## Next Steps (Phase 4)
 - Knowledge reasoning and Q&A
 - Extended entity attributes
 - Timeline view
 - Export (PDF/image)
--- a/backend/db_manager.py
+++ b/backend/db_manager.py
@@ -1,7 +1,8 @@
 #!/usr/bin/env python3
 """
-InsightFlow Database Manager
+InsightFlow Database Manager - Phase 3
 Persists projects, entities, and relations
 Supports document types and multi-file fusion
 """
 import os
@@ -166,6 +167,18 @@ class DatabaseManager:
            (target_id, source_id)
        )
        # Re-point relations where the merged-away entity is the source
        conn.execute(
            "UPDATE entity_relations SET source_entity_id = ? WHERE source_entity_id = ?",
            (target_id, source_id)
        )
        # Re-point relations where the merged-away entity is the target
        conn.execute(
            "UPDATE entity_relations SET target_entity_id = ? WHERE target_entity_id = ?",
            (target_id, source_id)
        )
        # Delete the source entity
        conn.execute("DELETE FROM entities WHERE id = ?", (source_id,))
@@ -222,13 +235,13 @@ class DatabaseManager:
        return [EntityMention(**dict(r)) for r in rows]
    # Transcript operations
-    def save_transcript(self, transcript_id: str, project_id: str, filename: str, full_text: str):
+    def save_transcript(self, transcript_id: str, project_id: str, filename: str, full_text: str, transcript_type: str = "audio"):
        """Save a transcript record."""
        conn = self.get_conn()
        now = datetime.now().isoformat()
        conn.execute(
-            "INSERT INTO transcripts (id, project_id, filename, full_text, created_at) VALUES (?, ?, ?, ?, ?)",
+            "INSERT INTO transcripts (id, project_id, filename, full_text, type, created_at) VALUES (?, ?, ?, ?, ?, ?)",
-            (transcript_id, project_id, filename, full_text, now)
+            (transcript_id, project_id, filename, full_text, transcript_type, now)
        )
        conn.commit()
        conn.close()
@@ -291,6 +304,156 @@ class DatabaseManager:
        conn.close()
        return [dict(r) for r in rows]
    def update_entity(self, entity_id: str, **kwargs) -> Entity:
        """Update entity fields."""
        conn = self.get_conn()
        # Build the list of columns to update
        allowed_fields = ['name', 'type', 'definition', 'canonical_name']
        updates = []
        values = []
        for field in allowed_fields:
            if field in kwargs:
                updates.append(f"{field} = ?")
                values.append(kwargs[field])
        # Handle aliases (stored as a JSON string)
        if 'aliases' in kwargs:
            updates.append("aliases = ?")
            values.append(json.dumps(kwargs['aliases']))
        if not updates:
            conn.close()
            return self.get_entity(entity_id)
        updates.append("updated_at = ?")
        values.append(datetime.now().isoformat())
        values.append(entity_id)
        query = f"UPDATE entities SET {', '.join(updates)} WHERE id = ?"
        conn.execute(query, values)
        conn.commit()
        conn.close()
        return self.get_entity(entity_id)
    def delete_entity(self, entity_id: str):
        """Delete an entity and its associated data."""
        conn = self.get_conn()
        # Delete mention records
        conn.execute("DELETE FROM entity_mentions WHERE entity_id = ?", (entity_id,))
        # Delete relations in which the entity is either endpoint
        conn.execute("DELETE FROM entity_relations WHERE source_entity_id = ? OR target_entity_id = ?",
                     (entity_id, entity_id))
        # Delete the entity itself
        conn.execute("DELETE FROM entities WHERE id = ?", (entity_id,))
        conn.commit()
        conn.close()
    def delete_relation(self, relation_id: str):
        """Delete a relation."""
        conn = self.get_conn()
        conn.execute("DELETE FROM entity_relations WHERE id = ?", (relation_id,))
        conn.commit()
        conn.close()
    def update_relation(self, relation_id: str, **kwargs) -> dict:
        """Update a relation; returns the updated row, or None if it does not exist."""
        conn = self.get_conn()
        allowed_fields = ['relation_type', 'evidence']
        updates = []
        values = []
        for field in allowed_fields:
            if field in kwargs:
                updates.append(f"{field} = ?")
                values.append(kwargs[field])
        if updates:
            query = f"UPDATE entity_relations SET {', '.join(updates)} WHERE id = ?"
            values.append(relation_id)
            conn.execute(query, values)
            conn.commit()
        row = conn.execute("SELECT * FROM entity_relations WHERE id = ?", (relation_id,)).fetchone()
        conn.close()
        return dict(row) if row else None
    def update_transcript(self, transcript_id: str, full_text: str) -> dict:
        """Update transcript text; returns the updated row, or None if it does not exist."""
        conn = self.get_conn()
        now = datetime.now().isoformat()
        conn.execute(
            "UPDATE transcripts SET full_text = ?, updated_at = ? WHERE id = ?",
            (full_text, now, transcript_id)
        )
        conn.commit()
        row = conn.execute("SELECT * FROM transcripts WHERE id = ?", (transcript_id,)).fetchone()
        conn.close()
        return dict(row) if row else None
    # Phase 3: Glossary operations
    def add_glossary_term(self, project_id: str, term: str, pronunciation: str = "") -> str:
        """Add a term to the glossary."""
        conn = self.get_conn()
        # Check whether the term already exists
        existing = conn.execute(
            "SELECT * FROM glossary WHERE project_id = ? AND term = ?",
            (project_id, term)
        ).fetchone()
        if existing:
            # Term exists: increment its frequency
            conn.execute(
                "UPDATE glossary SET frequency = frequency + 1 WHERE id = ?",
                (existing['id'],)
            )
            conn.commit()
            conn.close()
            return existing['id']
        term_id = str(uuid.uuid4())[:8]
        conn.execute(
            "INSERT INTO glossary (id, project_id, term, pronunciation, frequency) VALUES (?, ?, ?, ?, ?)",
            (term_id, project_id, term, pronunciation, 1)
        )
        conn.commit()
        conn.close()
        return term_id
    def list_glossary(self, project_id: str) -> List[dict]:
        """List a project's glossary, most frequent terms first."""
        conn = self.get_conn()
        rows = conn.execute(
            "SELECT * FROM glossary WHERE project_id = ? ORDER BY frequency DESC",
            (project_id,)
        ).fetchall()
        conn.close()
        return [dict(r) for r in rows]
    def delete_glossary_term(self, term_id: str):
        """Delete a glossary term."""
        conn = self.get_conn()
        conn.execute("DELETE FROM glossary WHERE id = ?", (term_id,))
        conn.commit()
        conn.close()
    # Phase 3: Get all entities for embedding
    def get_all_entities_for_embedding(self, project_id: str) -> List[Entity]:
        """Return all entities in the project for embedding computation."""
        return self.list_project_entities(project_id)
 # Singleton instance
 _db_manager = None
--- a/backend/document_processor.py
+++ b/backend/document_processor.py
@@ -0,0 +1,180 @@
 #!/usr/bin/env python3
 """
 Document Processor - Phase 3
 Imports PDF and DOCX documents
 """
 import os
 import io
 from typing import Dict, Optional
 class DocumentProcessor:
    """Document processor: extracts text from PDF/DOCX files."""
    def __init__(self):
        self.supported_formats = {
            '.pdf': self._extract_pdf,
            '.docx': self._extract_docx,
            '.doc': self._extract_docx,  # NOTE: python-docx parses only .docx; legacy binary .doc files will fail in _extract_docx
            '.txt': self._extract_txt,
            '.md': self._extract_txt,
        }
    def process(self, content: bytes, filename: str) -> Dict[str, str]:
        """
        Process a document and extract its text.
        Args:
            content: Raw file bytes
            filename: File name
        Returns:
            {"text": extracted text, "format": file extension, "filename": original file name}
        """
        ext = os.path.splitext(filename.lower())[1]
        if ext not in self.supported_formats:
            raise ValueError(f"Unsupported file format: {ext}. Supported: {list(self.supported_formats.keys())}")
        extractor = self.supported_formats[ext]
        text = extractor(content)
        # Clean up the text
        text = self._clean_text(text)
        return {
            "text": text,
            "format": ext,
            "filename": filename
        }
    def _extract_pdf(self, content: bytes) -> str:
        """Extract text from a PDF."""
        try:
            import PyPDF2
            pdf_file = io.BytesIO(content)
            reader = PyPDF2.PdfReader(pdf_file)
            text_parts = []
            for page in reader.pages:
                page_text = page.extract_text()
                if page_text:
                    text_parts.append(page_text)
            return "\n\n".join(text_parts)
        except ImportError:
            # Fallback: try pdfplumber
            try:
                import pdfplumber
                text_parts = []
                with pdfplumber.open(io.BytesIO(content)) as pdf:
                    for page in pdf.pages:
                        page_text = page.extract_text()
                        if page_text:
                            text_parts.append(page_text)
                return "\n\n".join(text_parts)
            except ImportError:
                raise ImportError("PDF processing requires PyPDF2 or pdfplumber. Install with: pip install PyPDF2")
        except Exception as e:
            raise ValueError(f"PDF extraction failed: {str(e)}")
    def _extract_docx(self, content: bytes) -> str:
        """Extract text from a DOCX file."""
        try:
            import docx
            doc_file = io.BytesIO(content)
            doc = docx.Document(doc_file)
            text_parts = []
            for para in doc.paragraphs:
                if para.text.strip():
                    text_parts.append(para.text)
            # Extract text from tables
            for table in doc.tables:
                for row in table.rows:
                    row_text = []
                    for cell in row.cells:
                        if cell.text.strip():
                            row_text.append(cell.text.strip())
                    if row_text:
                        text_parts.append(" | ".join(row_text))
            return "\n\n".join(text_parts)
        except ImportError:
            raise ImportError("DOCX processing requires python-docx. Install with: pip install python-docx")
        except Exception as e:
            raise ValueError(f"DOCX extraction failed: {str(e)}")
    def _extract_txt(self, content: bytes) -> str:
        """Extract plain text."""
        # Try several encodings in order
        encodings = ['utf-8', 'gbk', 'gb2312', 'latin-1']
        for encoding in encodings:
            try:
                return content.decode(encoding)
            except UnicodeDecodeError:
                continue
        # Defensive fallback only: latin-1 maps every byte, so the loop above always returns
        return content.decode('latin-1', errors='ignore')
    def _clean_text(self, text: str) -> str:
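        """Clean up extracted text while keeping paragraph breaks."""
        if not text:
            return ""
        # Remove control characters, keeping newlines, carriage returns, and tabs
        text = ''.join(char for char in text if ord(char) >= 32 or char in '\n\r\t')
        cleaned_lines = []
        for line in text.split('\n'):
            # Collapse runs of whitespace within the line only; collapsing the
            # whole text would also erase the paragraph breaks
            line = ' '.join(line.split())
            # Drop blank lines; paragraph separators are re-added below
            if line:
                cleaned_lines.append(line)
        # Join lines with a blank line so the paragraph structure survives
        return '\n\n'.join(cleaned_lines)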
    def is_supported(self, filename: str) -> bool:
        """Check whether the file format is supported."""
        ext = os.path.splitext(filename.lower())[1]
        return ext in self.supported_formats
 # Simple text extractor (no external dependencies)
 class SimpleTextExtractor:
    """Simple text extractor, intended for testing."""
    def extract(self, content: bytes, filename: str) -> str:
        """Try to decode the content as text."""
        encodings = ['utf-8', 'gbk', 'latin-1']
        for encoding in encodings:
            try:
                return content.decode(encoding)
            except UnicodeDecodeError:
                continue
        return content.decode('latin-1', errors='ignore')
 if __name__ == "__main__":
    # Smoke test
    processor = DocumentProcessor()
    # Plain-text extraction
    test_text = "Hello World\n\nThis is a test document.\n\nMultiple paragraphs."
    result = processor.process(test_text.encode('utf-8'), "test.txt")
    print(f"Text extraction test: {len(result['text'])} chars")
    print(result['text'][:100])
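The encoding fallback used by `_extract_txt` can be illustrated in isolation. The sketch below is an assumption-labelled variant, not the module's code: the helper name `decode_with_fallback` is hypothetical, and it additionally reports which encoding succeeded, which the original does not do.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    content: bytes,
    encodings: Sequence[str] = ("utf-8", "gbk", "latin-1"),
) -> Tuple[str, str]:
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for encoding in encodings:
        try:
            return content.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reached if the caller passes a list without a catch-all encoding
    return content.decode("latin-1", errors="ignore"), "latin-1"


# GBK-encoded Chinese text is not valid UTF-8, so the second candidate is used
text, used = decode_with_fallback("你好".encode("gbk"))
```

Order matters: strict encodings such as UTF-8 go first, and `latin-1` goes last because it accepts any byte sequence and would otherwise mask every other candidate.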
--- a/backend/entity_aligner.py
+++ b/backend/entity_aligner.py
@@ -0,0 +1,372 @@
 #!/usr/bin/env python3
 """
 Entity Aligner - Phase 3
 Aligns entities using embeddings
 """
 import os
 import json
 import httpx
 import numpy as np
 from typing import List, Optional, Dict
 from dataclasses import dataclass
 # API Keys
 KIMI_API_KEY = os.getenv("KIMI_API_KEY", "")
 KIMI_BASE_URL = os.getenv("KIMI_BASE_URL", "https://api.kimi.com/coding")
@dataclass
 class EntityEmbedding:
    entity_id: str
    name: str
    definition: str
    embedding: List[float]
 class EntityAligner:
    """Entity aligner: matches entities by embedding similarity."""
    def __init__(self, similarity_threshold: float = 0.85):
        self.similarity_threshold = similarity_threshold
        self.embedding_cache: Dict[str, List[float]] = {}
    def get_embedding(self, text: str) -> Optional[List[float]]:
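        """
        Fetch the embedding for a text via the Kimi API.
        Args:
            text: Input text
        Returns:
            The embedding vector, or None if unavailable
        """
        if not KIMI_API_KEY:
            return None
        # Check the cache; key by the text itself so the key type matches Dict[str, ...]
        cache_key = text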
        if cache_key in self.embedding_cache:
            return self.embedding_cache[cache_key]
        try:
            response = httpx.post(
                f"{KIMI_BASE_URL}/v1/embeddings",
                headers={"Authorization": f"Bearer {KIMI_API_KEY}", "Content-Type": "application/json"},
                json={
                    "model": "k2p5",
                    "input": text[:500]  # truncate to limit request size
                },
                timeout=30.0
            )
            response.raise_for_status()
            result = response.json()
            embedding = result["data"][0]["embedding"]
            self.embedding_cache[cache_key] = embedding
            return embedding
        except Exception as e:
            print(f"Embedding API failed: {e}")
            return None
    def compute_similarity(self, embedding1: List[float], embedding2: List[float]) -> float:
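        """
        Compute the cosine similarity of two embeddings.
        Args:
            embedding1: First vector
            embedding2: Second vector
        Returns:
            Cosine similarity in [-1, 1]; 0.0 if either vector has zero norm
        """
        vec1 = np.array(embedding1)
        vec2 = np.array(embedding2)
        # Cosine similarity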
        dot_product = np.dot(vec1, vec2)
        norm1 = np.linalg.norm(vec1)
        norm2 = np.linalg.norm(vec2)
        if norm1 == 0 or norm2 == 0:
            return 0.0
        return float(dot_product / (norm1 * norm2))
    def get_entity_text(self, name: str, definition: str = "") -> str:
        """
        Build the text that is embedded for an entity.
        Args:
            name: Entity name
            definition: Entity definition
        Returns:
            The combined text
        """
        if definition:
            return f"{name}: {definition}"
        return name
    def find_similar_entity(
        self, 
        project_id: str, 
        name: str, 
        definition: str = "",
        exclude_id: Optional[str] = None,
        threshold: Optional[float] = None
    ) -> Optional[object]:
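        """
        Find a similar entity.
        Args:
            project_id: Project ID
            name: Entity name
            definition: Entity definition
            exclude_id: Entity ID to exclude
            threshold: Similarity threshold
        Returns:
            The most similar entity above the threshold, or None
        """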
        if threshold is None:
            threshold = self.similarity_threshold
        try:
            from db_manager import get_db_manager
            db = get_db_manager()
        except ImportError:
            return None
        # Fetch all entities in the project
        entities = db.get_all_entities_for_embedding(project_id)
        if not entities:
            return None
        # Embed the query entity
        query_text = self.get_entity_text(name, definition)
        query_embedding = self.get_embedding(query_text)
        if query_embedding is None:
            # Embedding API unavailable: fall back to simple string matching
            return self._fallback_similarity_match(entities, name, exclude_id)
        best_match = None
        best_score = threshold
        for entity in entities:
            if exclude_id and entity.id == exclude_id:
                continue
            # Embed the candidate entity
            entity_text = self.get_entity_text(entity.name, entity.definition)
            entity_embedding = self.get_embedding(entity_text)
            if entity_embedding is None:
                continue
            # Compute similarity
            similarity = self.compute_similarity(query_embedding, entity_embedding)
            if similarity > best_score:
                best_score = similarity
                best_match = entity
        return best_match
    def _fallback_similarity_match(
        self, 
        entities: List[object], 
        name: str, 
        exclude_id: Optional[str] = None
    ) -> Optional[object]:
        """
        Fall back to simple string matching (no embeddings).
        Args:
            entities: List of entities
            name: Name to look up
            exclude_id: Entity ID to exclude
        Returns:
            The best-matching entity, or None
        """
        name_lower = name.lower()
        # 1. Exact match on name or alias
        for entity in entities:
            if exclude_id and entity.id == exclude_id:
                continue
            if entity.name.lower() == name_lower:
                return entity
            if entity.aliases and name_lower in [a.lower() for a in entity.aliases]:
                return entity
        # 2. Substring match
        for entity in entities:
            if exclude_id and entity.id == exclude_id:
                continue
            if name_lower in entity.name.lower() or entity.name.lower() in name_lower:
                return entity
        return None
    def batch_align_entities(
        self, 
        project_id: str, 
        new_entities: List[Dict],
        threshold: Optional[float] = None
    ) -> List[Dict]:
        """
        Align a batch of entities.
        Args:
            project_id: Project ID
            new_entities: New entities [{"name": "...", "definition": "..."}]
            threshold: Similarity threshold
        Returns:
            Alignment results [{"new_entity": {...}, "matched_entity": {...}, "similarity": 0.9, "should_merge": True}]
        """
        if threshold is None:
            threshold = self.similarity_threshold
        results = []
        for new_ent in new_entities:
            matched = self.find_similar_entity(
                project_id,
                new_ent["name"],
                new_ent.get("definition", ""),
                threshold=threshold
            )
            result = {
                "new_entity": new_ent,
                "matched_entity": None,
                "similarity": 0.0,
                "should_merge": False
            }
            if matched:
                # Compute the similarity score
                query_text = self.get_entity_text(new_ent["name"], new_ent.get("definition", ""))
                matched_text = self.get_entity_text(matched.name, matched.definition)
                query_emb = self.get_embedding(query_text)
                matched_emb = self.get_embedding(matched_text)
                if query_emb and matched_emb:
                    similarity = self.compute_similarity(query_emb, matched_emb)
                    result["matched_entity"] = {
                        "id": matched.id,
                        "name": matched.name,
                        "type": matched.type,
                        "definition": matched.definition
                    }
                    result["similarity"] = similarity
                    result["should_merge"] = similarity >= threshold
            results.append(result)
        return results
    def suggest_entity_aliases(self, entity_name: str, entity_definition: str = "") -> List[str]:
        """
        Ask the LLM to suggest aliases for an entity.
        Args:
            entity_name: Entity name
            entity_definition: Entity definition
        Returns:
            List of suggested aliases
        """
        if not KIMI_API_KEY:
            return []
        prompt = f"""为以下实体生成可能的别名或简称：
 实体名称：{entity_name}
 定义：{entity_definition}
 请返回 JSON 格式的别名列表：
 {{"aliases": ["别名1", "别名2", "别名3"]}}
 只返回 JSON，不要其他内容。"""
        try:
            response = httpx.post(
                f"{KIMI_BASE_URL}/v1/chat/completions",
                headers={"Authorization": f"Bearer {KIMI_API_KEY}", "Content-Type": "application/json"},
                json={
                    "model": "k2p5",
                    "messages": [{"role": "user", "content": prompt}],
                    "temperature": 0.3
                },
                timeout=30.0
            )
            response.raise_for_status()
            result = response.json()
            content = result["choices"][0]["message"]["content"]
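            import re
            # Match a single-brace JSON object; the previous pattern r'\{{.*?\}}'
            # required literal double braces and never matched normal JSON
            json_match = re.search(r'\{.*?\}', content, re.DOTALL)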
            if json_match:
                data = json.loads(json_match.group())
                return data.get("aliases", [])
        except Exception as e:
            print(f"Alias suggestion failed: {e}")
        return []
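 # Simple string similarity (no embeddings)
 def simple_similarity(str1: str, str2: str) -> float:
    """
    Compute a simple similarity score for two strings.
    Args:
        str1: First string
        str2: Second string
    Returns:
        Similarity score (0-1)
    """
    if str1 == str2:
        return 1.0
    if not str1 or not str2:
        return 0.0
    # Compare case-insensitively
    s1 = str1.lower()
    s2 = str2.lower()
    # One string contains the other
    if s1 in s2 or s2 in s1:
        return 0.8
    # Otherwise use difflib's matching-subsequence ratio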
    from difflib import SequenceMatcher
    return SequenceMatcher(None, s1, s2).ratio()
 if __name__ == "__main__":
    # Smoke test
    aligner = EntityAligner()
    # Embedding request
    test_text = "Kubernetes 容器编排平台"
    embedding = aligner.get_embedding(test_text)
    if embedding:
        print(f"Embedding dimension: {len(embedding)}")
        print(f"First 5 values: {embedding[:5]}")
    else:
        print("Embedding API not available")
    # Similarity computation
    emb1 = [1.0, 0.0, 0.0]
    emb2 = [0.9, 0.1, 0.0]
    sim = aligner.compute_similarity(emb1, emb2)
    print(f"Similarity: {sim:.4f}")
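Parsing the alias reply in `suggest_entity_aliases` depends on pulling a JSON object out of free-form LLM output. A defensive version of that step might look like the sketch below; the helper name `extract_aliases` is hypothetical, and it assumes the reply contains at most one JSON object with an `aliases` list, as the prompt requests.

```python
import json
import re
from typing import List


def extract_aliases(content: str) -> List[str]:
    """Extract the "aliases" list from an LLM reply that may wrap JSON in prose."""
    # Greedy match from the first "{" to the last "}" in the reply
    match = re.search(r"\{.*\}", content, re.DOTALL)
    if not match:
        return []
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:
        return []
    aliases = data.get("aliases") if isinstance(data, dict) else None
    if not isinstance(aliases, list):
        return []
    # Keep only string entries
    return [alias for alias in aliases if isinstance(alias, str)]


reply = 'Sure! {"aliases": ["K8s", "kube"]} Hope that helps.'
aliases = extract_aliases(reply)
```

Returning an empty list on any parse problem matches the existing behaviour of `suggest_entity_aliases`, which also falls back to `[]` on failure.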
--- a/backend/main.py
+++ b/backend/main.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 """
-InsightFlow Backend - Phase 3 (Production Ready)
+InsightFlow Backend - Phase 3 (Memory & Growth)
-Knowledge Growth: Multi-file fusion + Entity Alignment
+Knowledge Growth: Multi-file fusion + Entity Alignment + Document Import
 ASR: Alibaba Cloud Tingwu + OSS
 """
@@ -9,6 +9,7 @@ import os
 import json
 import httpx
 import uuid
 import re
 from fastapi import FastAPI, File, UploadFile, HTTPException, Form
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.staticfiles import StaticFiles
@@ -35,6 +36,18 @@ try:
 except ImportError:
    DB_AVAILABLE = False
 try:
    from document_processor import DocumentProcessor
    DOC_PROCESSOR_AVAILABLE = True
 except ImportError:
    DOC_PROCESSOR_AVAILABLE = False
 try:
    from entity_aligner import EntityAligner
    ALIGNER_AVAILABLE = True
 except ImportError:
    ALIGNER_AVAILABLE = False
 app = FastAPI(title="InsightFlow", version="0.3.0")
 app.add_middleware(
@@ -71,9 +84,270 @@ class ProjectCreate(BaseModel):
    name: str
    description: str = ""
 class EntityUpdate(BaseModel):
    name: Optional[str] = None
    type: Optional[str] = None
    definition: Optional[str] = None
    aliases: Optional[List[str]] = None
 class RelationCreate(BaseModel):
    source_entity_id: str
    target_entity_id: str
    relation_type: str
    evidence: Optional[str] = ""
 class TranscriptUpdate(BaseModel):
    full_text: str
 class EntityMergeRequest(BaseModel):
    source_entity_id: str
    target_entity_id: str
 class GlossaryTermCreate(BaseModel):
    term: str
    pronunciation: Optional[str] = ""
 # API Keys
 KIMI_API_KEY = os.getenv("KIMI_API_KEY", "")
-KIMI_BASE_URL = "https://api.kimi.com/coding"
+KIMI_BASE_URL = os.getenv("KIMI_BASE_URL", "https://api.kimi.com/coding")
 # Phase 3: Entity Aligner singleton
 _aligner = None
 def get_aligner():
    global _aligner
    if _aligner is None and ALIGNER_AVAILABLE:
        _aligner = EntityAligner()
    return _aligner
 # Phase 3: Document Processor singleton
 _doc_processor = None
 def get_doc_processor():
    global _doc_processor
    if _doc_processor is None and DOC_PROCESSOR_AVAILABLE:
        _doc_processor = DocumentProcessor()
    return _doc_processor
 # Phase 2: Entity Edit API
@app.put("/api/v1/entities/{entity_id}")
 async def update_entity(entity_id: str, update: EntityUpdate):
    """Update an entity (name, type, definition, aliases)."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    entity = db.get_entity(entity_id)
    if not entity:
        raise HTTPException(status_code=404, detail="Entity not found")
    # Apply only the fields that were provided
    update_data = {k: v for k, v in update.dict().items() if v is not None}
    updated = db.update_entity(entity_id, **update_data)
    return {
        "id": updated.id,
        "name": updated.name,
        "type": updated.type,
        "definition": updated.definition,
        "aliases": updated.aliases
    }
@app.delete("/api/v1/entities/{entity_id}")
 async def delete_entity(entity_id: str):
    """Delete an entity."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    entity = db.get_entity(entity_id)
    if not entity:
        raise HTTPException(status_code=404, detail="Entity not found")
    db.delete_entity(entity_id)
    return {"success": True, "message": f"Entity {entity_id} deleted"}
@app.post("/api/v1/entities/{entity_id}/merge")
 async def merge_entities_endpoint(entity_id: str, merge_req: EntityMergeRequest):
    """Merge two entities."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    # Verify that both entities exist
    source = db.get_entity(merge_req.source_entity_id)
    target = db.get_entity(merge_req.target_entity_id)
    if not source or not target:
        raise HTTPException(status_code=404, detail="Entity not found")
    result = db.merge_entities(merge_req.target_entity_id, merge_req.source_entity_id)
    return {
        "success": True,
        "merged_entity": {
            "id": result.id,
            "name": result.name,
            "type": result.type,
            "definition": result.definition,
            "aliases": result.aliases
        }
    }
 # Phase 2: Relation Edit API
@app.post("/api/v1/projects/{project_id}/relations")
 async def create_relation_endpoint(project_id: str, relation: RelationCreate):
    """Create a new relation between two entities."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    # Verify that both entities exist
    source = db.get_entity(relation.source_entity_id)
    target = db.get_entity(relation.target_entity_id)
    if not source or not target:
        raise HTTPException(status_code=404, detail="Source or target entity not found")
    relation_id = db.create_relation(
        project_id=project_id,
        source_entity_id=relation.source_entity_id,
        target_entity_id=relation.target_entity_id,
        relation_type=relation.relation_type,
        evidence=relation.evidence
    )
    return {
        "id": relation_id,
        "source_id": relation.source_entity_id,
        "target_id": relation.target_entity_id,
        "type": relation.relation_type,
        "success": True
    }
@app.delete("/api/v1/relations/{relation_id}")
 async def delete_relation(relation_id: str):
    """Delete a relation."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    db.delete_relation(relation_id)
    return {"success": True, "message": f"Relation {relation_id} deleted"}
@app.put("/api/v1/relations/{relation_id}")
 async def update_relation(relation_id: str, relation: RelationCreate):
    """Update a relation."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    updated = db.update_relation(
        relation_id=relation_id,
        relation_type=relation.relation_type,
        evidence=relation.evidence
    )
    return {
        "id": relation_id,
        "type": updated["relation_type"],
        "evidence": updated["evidence"],
        "success": True
    }
 # Phase 2: Transcript Edit API
@app.get("/api/v1/transcripts/{transcript_id}")
 async def get_transcript(transcript_id: str):
    """Get transcript details."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    transcript = db.get_transcript(transcript_id)
    if not transcript:
        raise HTTPException(status_code=404, detail="Transcript not found")
    return transcript
@app.put("/api/v1/transcripts/{transcript_id}")
 async def update_transcript(transcript_id: str, update: TranscriptUpdate):
    """Update the transcript text (manual correction)."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    transcript = db.get_transcript(transcript_id)
    if not transcript:
        raise HTTPException(status_code=404, detail="Transcript not found")
    updated = db.update_transcript(transcript_id, update.full_text)
    return {
        "id": transcript_id,
        "full_text": updated["full_text"],
        "updated_at": updated["updated_at"],
        "success": True
    }
 # Phase 2: Manual Entity Creation
 class ManualEntityCreate(BaseModel):
    name: str
    type: str = "OTHER"
    definition: str = ""
    transcript_id: Optional[str] = None
    start_pos: Optional[int] = None
    end_pos: Optional[int] = None
@app.post("/api/v1/projects/{project_id}/entities")
 async def create_manual_entity(project_id: str, entity: ManualEntityCreate):
    """Create an entity manually (from a text selection)."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    # Check whether the entity already exists
    existing = db.get_entity_by_name(project_id, entity.name)
    if existing:
        return {
            "id": existing.id,
            "name": existing.name,
            "type": existing.type,
            "existed": True
        }
    entity_id = str(uuid.uuid4())[:8]
    new_entity = db.create_entity(Entity(
        id=entity_id,
        project_id=project_id,
        name=entity.name,
        type=entity.type,
        definition=entity.definition
    ))
    # If mention position info was provided, save the mention
    if entity.transcript_id and entity.start_pos is not None and entity.end_pos is not None:
        transcript = db.get_transcript(entity.transcript_id)
        if transcript:
            text = transcript["full_text"]
            mention = EntityMention(
                id=str(uuid.uuid4())[:8],
                entity_id=entity_id,
                transcript_id=entity.transcript_id,
                start_pos=entity.start_pos,
                end_pos=entity.end_pos,
                text_snippet=text[max(0, entity.start_pos-20):min(len(text), entity.end_pos+20)],
                confidence=1.0
            )
            db.add_mention(mention)
    return {
        "id": new_entity.id,
        "name": new_entity.name,
        "type": new_entity.type,
        "definition": new_entity.definition,
        "success": True
    }
 def transcribe_audio(audio_data: bytes, filename: str) -> dict:
    """Transcribe audio: upload to OSS, then transcribe with Tingwu."""
@@ -165,12 +439,21 @@ def extract_entities_with_llm(text: str) -> tuple[List[dict], List[dict]]:
    return [], []
-def align_entity(project_id: str, name: str, db) -> Optional[Entity]:
+def align_entity(project_id: str, name: str, db, definition: str = "") -> Optional[Entity]:
-    """Entity alignment"""
+    """Entity alignment - Phase 3: align using embeddings"""
    # 1. Try an exact name match first
    existing = db.get_entity_by_name(project_id, name)
    if existing:
        return existing
    # 2. Align using embeddings (if available)
    aligner = get_aligner()
    if aligner:
        similar = aligner.find_similar_entity(project_id, name, definition)
        if similar:
            return similar
    # 3. Fall back to simple similarity matching
    similar = db.find_similar_entities(project_id, name)
    if similar:
        return similar[0]
@@ -202,7 +485,7 @@ async def list_projects():
@app.post("/api/v1/projects/{project_id}/upload", response_model=AnalysisResult)
 async def upload_audio(project_id: str, file: UploadFile = File(...)):
-    """Upload audio to the given project"""
+    """Upload audio to the given project - Phase 3: supports multi-file fusion"""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
@@ -230,12 +513,12 @@ async def upload_audio(project_id: str, file: UploadFile = File(...)):
        full_text=tw_result["full_text"]
    )
-    # Align entities and save
+    # Align entities and save - Phase 3: uses enhanced alignment
    aligned_entities = []
    entity_name_to_id = {}  # used for relation mapping
    for raw_ent in raw_entities:
-        existing = align_entity(project_id, raw_ent["name"], db)
+        existing = align_entity(project_id, raw_ent["name"], db, raw_ent.get("definition", ""))
        if existing:
            ent_model = EntityModel(
@@ -310,6 +593,302 @@ async def upload_audio(project_id: str, file: UploadFile = File(...)):
        created_at=datetime.now().isoformat()
    )
 # Phase 3: Document Upload API
@app.post("/api/v1/projects/{project_id}/upload-document")
 async def upload_document(project_id: str, file: UploadFile = File(...)):
    """Upload a PDF/DOCX document to the given project."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    if not DOC_PROCESSOR_AVAILABLE:
        raise HTTPException(status_code=500, detail="Document processor not available")
    db = get_db_manager()
    project = db.get_project(project_id)
    if not project:
        raise HTTPException(status_code=404, detail="Project not found")
    content = await file.read()
    # Process the document
    processor = get_doc_processor()
    try:
        result = processor.process(content, file.filename)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Document processing failed: {str(e)}")
    # Save the document as a transcript record
    transcript_id = str(uuid.uuid4())[:8]
    db.save_transcript(
        transcript_id=transcript_id,
        project_id=project_id,
        filename=file.filename,
        full_text=result["text"],
        transcript_type="document"
    )
    # Extract entities and relations
    raw_entities, raw_relations = extract_entities_with_llm(result["text"])
    # Align entities and save
    aligned_entities = []
    entity_name_to_id = {}
    for raw_ent in raw_entities:
        existing = align_entity(project_id, raw_ent["name"], db, raw_ent.get("definition", ""))
        if existing:
            entity_name_to_id[raw_ent["name"]] = existing.id
            aligned_entities.append(EntityModel(
                id=existing.id,
                name=existing.name,
                type=existing.type,
                definition=existing.definition,
                aliases=existing.aliases
            ))
        else:
            new_ent = db.create_entity(Entity(
                id=str(uuid.uuid4())[:8],
                project_id=project_id,
                name=raw_ent["name"],
                type=raw_ent.get("type", "OTHER"),
                definition=raw_ent.get("definition", "")
            ))
            entity_name_to_id[raw_ent["name"]] = new_ent.id
            aligned_entities.append(EntityModel(
                id=new_ent.id,
                name=new_ent.name,
                type=new_ent.type,
                definition=new_ent.definition
            ))
        # Save entity mention positions
        full_text = result["text"]
        name = raw_ent["name"]
        start_pos = 0
        while True:
            pos = full_text.find(name, start_pos)
            if pos == -1:
                break
            mention = EntityMention(
                id=str(uuid.uuid4())[:8],
                entity_id=entity_name_to_id[name],
                transcript_id=transcript_id,
                start_pos=pos,
                end_pos=pos + len(name),
                text_snippet=full_text[max(0, pos-20):min(len(full_text), pos+len(name)+20)],
                confidence=1.0
            )
            db.add_mention(mention)
            start_pos = pos + 1
    # Save relations
    for rel in raw_relations:
        source_id = entity_name_to_id.get(rel.get("source", ""))
        target_id = entity_name_to_id.get(rel.get("target", ""))
        if source_id and target_id:
            db.create_relation(
                project_id=project_id,
                source_entity_id=source_id,
                target_entity_id=target_id,
                relation_type=rel.get("type", "related"),
                evidence=result["text"][:200],
                transcript_id=transcript_id
            )
    return {
        "transcript_id": transcript_id,
        "project_id": project_id,
        "filename": file.filename,
        "text_length": len(result["text"]),
        "entities": [e.dict() for e in aligned_entities],
        "created_at": datetime.now().isoformat()
    }
 # Phase 3: Knowledge Base API
@app.get("/api/v1/projects/{project_id}/knowledge-base")
 async def get_knowledge_base(project_id: str):
    """Get the project knowledge base: all entities, relations, and glossary terms."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    project = db.get_project(project_id)
    if not project:
        raise HTTPException(status_code=404, detail="Project not found")
    # Fetch all entities
    entities = db.list_project_entities(project_id)
    # Fetch all relations
    relations = db.list_project_relations(project_id)
    # Fetch all transcripts
    transcripts = db.list_project_transcripts(project_id)
    # Fetch the glossary
    glossary = db.list_glossary(project_id)
    # Build per-entity statistics
    entity_stats = {}
    for ent in entities:
        mentions = db.get_entity_mentions(ent.id)
        entity_stats[ent.id] = {
            "mention_count": len(mentions),
            "transcript_ids": list(set([m.transcript_id for m in mentions]))
        }
    # Build an entity id -> name map
    entity_map = {e.id: e.name for e in entities}
    return {
        "project": {
            "id": project.id,
            "name": project.name,
            "description": project.description
        },
        "stats": {
            "entity_count": len(entities),
            "relation_count": len(relations),
            "transcript_count": len(transcripts),
            "glossary_count": len(glossary)
        },
        "entities": [
            {
                "id": e.id,
                "name": e.name,
                "type": e.type,
                "definition": e.definition,
                "aliases": e.aliases,
                "mention_count": entity_stats.get(e.id, {}).get("mention_count", 0),
                "appears_in": entity_stats.get(e.id, {}).get("transcript_ids", [])
            }
            for e in entities
        ],
        "relations": [
            {
                "id": r["id"],
                "source_id": r["source_entity_id"],
                "source_name": entity_map.get(r["source_entity_id"], "Unknown"),
                "target_id": r["target_entity_id"],
                "target_name": entity_map.get(r["target_entity_id"], "Unknown"),
                "type": r["relation_type"],
                "evidence": r["evidence"]
            }
            for r in relations
        ],
        "glossary": [
            {
                "id": g["id"],
                "term": g["term"],
                "pronunciation": g["pronunciation"],
                "frequency": g["frequency"]
            }
            for g in glossary
        ],
        "transcripts": [
            {
                "id": t["id"],
                "filename": t["filename"],
                "type": t.get("type", "audio"),
                "created_at": t["created_at"]
            }
            for t in transcripts
        ]
    }
 # Phase 3: Glossary API
@app.post("/api/v1/projects/{project_id}/glossary")
 async def add_glossary_term(project_id: str, term: GlossaryTermCreate):
    """Add a term to the project glossary."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    project = db.get_project(project_id)
    if not project:
        raise HTTPException(status_code=404, detail="Project not found")
    term_id = db.add_glossary_term(
        project_id=project_id,
        term=term.term,
        pronunciation=term.pronunciation
    )
    return {
        "id": term_id,
        "term": term.term,
        "pronunciation": term.pronunciation,
        "success": True
    }
@app.get("/api/v1/projects/{project_id}/glossary")
 async def get_glossary(project_id: str):
    """Get the project glossary."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    glossary = db.list_glossary(project_id)
    return glossary
@app.delete("/api/v1/glossary/{term_id}")
 async def delete_glossary_term(term_id: str):
    """Delete a glossary term."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    db.delete_glossary_term(term_id)
    return {"success": True}
 # Phase 3: Entity Alignment API
@app.post("/api/v1/projects/{project_id}/align-entities")
 async def align_project_entities(project_id: str, threshold: float = 0.85):
    """Run entity alignment and merge similar entities."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    aligner = get_aligner()
    if not aligner:
        raise HTTPException(status_code=500, detail="Entity aligner not available")
    db = get_db_manager()
    entities = db.list_project_entities(project_id)
    merged_count = 0
    merged_pairs = []
    # Align using embeddings
    for entity in entities:
        # Skip entities that have already been merged away
        existing = db.get_entity(entity.id)
        if not existing:
            continue
        similar = aligner.find_similar_entity(
            project_id, 
            entity.name, 
            entity.definition,
            exclude_id=entity.id,
            threshold=threshold
        )
        if similar:
            # Merge this entity into the similar one
            db.merge_entities(similar.id, entity.id)
            merged_count += 1
            merged_pairs.append({
                "source": entity.name,
                "target": similar.name
            })
    return {
        "success": True,
        "merged_count": merged_count,
        "merged_pairs": merged_pairs
    }
@app.get("/api/v1/projects/{project_id}/entities")
 async def get_project_entities(project_id: str):
    """Get the project's global entity list."""
@@ -318,7 +897,7 @@ async def get_project_entities(project_id: str):
    db = get_db_manager()
    entities = db.list_project_entities(project_id)
-    return [{"id": e.id, "name": e.name, "type": e.type, "definition": e.definition} for e in entities]
+    return [{"id": e.id, "name": e.name, "type": e.type, "definition": e.definition, "aliases": e.aliases} for e in entities]
@app.get("/api/v1/projects/{project_id}/relations")
@@ -356,6 +935,7 @@ async def get_project_transcripts(project_id: str):
    return [{
        "id": t["id"],
        "filename": t["filename"],
        "type": t.get("type", "audio"),
        "created_at": t["created_at"],
        "preview": t["full_text"][:100] + "..." if len(t["full_text"]) > 100 else t["full_text"]
    } for t in transcripts]
@@ -378,25 +958,18 @@ async def get_entity_mentions(entity_id: str):
        "confidence": m.confidence
    } for m in mentions]
@app.post("/api/v1/entities/{entity_id}/merge")
 async def merge_entities(entity_id: str, target_entity_id: str):
    """Merge two entities."""
    if not DB_AVAILABLE:
        raise HTTPException(status_code=500, detail="Database not available")
    db = get_db_manager()
    result = db.merge_entities(target_entity_id, entity_id)
    return {"success": True, "merged_entity": {"id": result.id, "name": result.name}}
 # Health check
@app.get("/health")
 async def health_check():
    return {
        "status": "ok",
        "version": "0.3.0",
        "phase": "Phase 3 - Memory & Growth",
        "oss_available": OSS_AVAILABLE,
        "tingwu_available": TINGWU_AVAILABLE,
-        "db_available": DB_AVAILABLE
+        "db_available": DB_AVAILABLE,
        "doc_processor_available": DOC_PROCESSOR_AVAILABLE,
        "aligner_available": ALIGNER_AVAILABLE
    }
 # Serve frontend
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -0,0 +1,24 @@
 # InsightFlow Backend Dependencies
 # Web Framework
 fastapi==0.109.0
 uvicorn[standard]==0.27.0
 python-multipart==0.0.6
 # HTTP Client
 httpx==0.26.0
 # Document Processing
 PyPDF2==3.0.1
 python-docx==1.1.0
 # Data Processing
 numpy==1.26.3
 # Aliyun SDK
 aliyun-python-sdk-core==2.14.0
 aliyun-python-sdk-oss==2.18.5
 oss2==2.18.5
 # Utilities
 python-dotenv==1.0.0
--- a/backend/schema.sql
+++ b/backend/schema.sql
@@ -16,7 +16,9 @@ CREATE TABLE IF NOT EXISTS transcripts (
    project_id TEXT NOT NULL,
    filename TEXT,
    full_text TEXT,
    type TEXT DEFAULT 'audio',  -- 'audio' or 'document'
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
@@ -29,6 +31,7 @@ CREATE TABLE IF NOT EXISTS entities (
    type TEXT,
    definition TEXT,
    aliases TEXT,  -- JSON array: ["alias1", "alias2"]
    embedding TEXT,  -- JSON array: embedding of the entity name + definition
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id)
@@ -71,3 +74,12 @@ CREATE TABLE IF NOT EXISTS glossary (
    frequency INTEGER DEFAULT 1,
    FOREIGN KEY (project_id) REFERENCES projects(id)
 );
 -- Create indexes to speed up queries
 CREATE INDEX IF NOT EXISTS idx_entities_project ON entities(project_id);
 CREATE INDEX IF NOT EXISTS idx_entities_name ON entities(name);
 CREATE INDEX IF NOT EXISTS idx_transcripts_project ON transcripts(project_id);
 CREATE INDEX IF NOT EXISTS idx_mentions_entity ON entity_mentions(entity_id);
 CREATE INDEX IF NOT EXISTS idx_mentions_transcript ON entity_mentions(transcript_id);
 CREATE INDEX IF NOT EXISTS idx_relations_project ON entity_relations(project_id);
 CREATE INDEX IF NOT EXISTS idx_glossary_project ON glossary(project_id);
--- a/deploy.sh
+++ b/deploy.sh
@@ -0,0 +1,80 @@
 #!/bin/bash
 # InsightFlow Phase 3 deployment script
 set -e
 echo "🚀 InsightFlow Phase 3 deployment script"
 echo "================================"
 # Check prerequisites
 if ! command -v docker &> /dev/null; then
    echo "❌ Docker is not installed; please install Docker first"
    exit 1
 fi
 if ! command -v git &> /dev/null; then
    echo "❌ Git is not installed; please install Git first"
    exit 1
 fi
 # Configuration
 IMAGE_NAME="insightflow"
 IMAGE_TAG="phase3"
 CONTAINER_NAME="insightflow-app"
 PORT="18000"
 DATA_DIR="/opt/data/insightflow"
 # Check environment variables
 if [ -z "$KIMI_API_KEY" ]; then
    echo "⚠️  Warning: KIMI_API_KEY is not set"
 fi
 if [ -z "$ALIYUN_ACCESS_KEY_ID" ]; then
    echo "⚠️  Warning: ALIYUN_ACCESS_KEY_ID is not set"
 fi
 if [ -z "$ALIYUN_ACCESS_KEY_SECRET" ]; then
    echo "⚠️  Warning: ALIYUN_ACCESS_KEY_SECRET is not set"
 fi
 echo ""
 echo "📦 Building Docker image..."
 docker build -t ${IMAGE_NAME}:${IMAGE_TAG} .
 echo ""
 echo "🛑 Stopping old container..."
 docker stop ${CONTAINER_NAME} 2>/dev/null || true
 docker rm ${CONTAINER_NAME} 2>/dev/null || true
 echo ""
 echo "📁 Creating data directory..."
 mkdir -p ${DATA_DIR}
 echo ""
 echo "🚀 Starting new container..."
 docker run -d \
  --name ${CONTAINER_NAME} \
  -p ${PORT}:8000 \
  -v ${DATA_DIR}:/app/data \
  -e KIMI_API_KEY="${KIMI_API_KEY}" \
  -e KIMI_BASE_URL="${KIMI_BASE_URL:-https://api.kimi.com/coding}" \
  -e ALIYUN_ACCESS_KEY_ID="${ALIYUN_ACCESS_KEY_ID}" \
  -e ALIYUN_ACCESS_KEY_SECRET="${ALIYUN_ACCESS_KEY_SECRET}" \
  -e DB_PATH="/app/data/insightflow.db" \
  --restart unless-stopped \
  ${IMAGE_NAME}:${IMAGE_TAG}
 echo ""
 echo "⏳ Waiting for the service to start..."
 sleep 3
 echo ""
 echo "✅ Deployment complete!"
 echo ""
 echo "📊 Service status:"
 docker ps --filter "name=${CONTAINER_NAME}" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
 echo ""
 echo "🔗 URL: http://localhost:${PORT}"
 echo "📋 View logs: docker logs -f ${CONTAINER_NAME}"
 echo ""
--- a/docs/PHASE3_FEATURES.md
+++ b/docs/PHASE3_FEATURES.md
@@ -0,0 +1,198 @@
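 # InsightFlow Phase 3 Features
 ## Overview
 Phase 3 gives InsightFlow its "Memory & Growth" capability: multi-file knowledge fusion, document import, and project-level knowledge base management.
 ## Feature List
 ### 1. Multi-file Graph Fusion ✅
 #### Description
 - Upload multiple audio files to the same project
 - Entities in a new file are automatically aligned with existing entities
 - Knowledge graphs are merged while keeping entities consistent
 - Entity mentions are tracked across files
 #### Usage
 1. Click "+ Upload File" in the workbench
 2. Select an audio file (MP3/WAV/M4A)
 3. The system transcribes the audio and extracts entities automatically
 4. New entities are automatically aligned with existing ones
 5. Use "📁 Select File" to switch between transcripts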
 #### API
 ```
 POST /api/v1/projects/{project_id}/upload
 Content-Type: multipart/form-data
 file: <audio file>
 ```
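
The cross-file mention tracking listed above can be sketched as a plain substring scan over each transcript. This is a minimal illustration, not the backend's exact code: `find_mentions` and its `context` parameter are hypothetical names, and the 20-character context window is an assumption borrowed from the upload handlers.

```python
from typing import Dict, List


def find_mentions(full_text: str, name: str, context: int = 20) -> List[Dict]:
    """Return every occurrence of `name` in `full_text` with a context snippet."""
    mentions: List[Dict] = []
    if not name:
        return mentions  # an empty name would otherwise match at every position
    start = 0
    while True:
        pos = full_text.find(name, start)
        if pos == -1:
            break
        end = pos + len(name)
        mentions.append({
            "start_pos": pos,
            "end_pos": end,
            "text_snippet": full_text[max(0, pos - context):min(len(full_text), end + context)],
        })
        start = pos + 1  # advance by one so overlapping matches are also found
    return mentions
```

Running this once per transcript and storing the transcript ID with each result is one way to find out which files an entity appears in.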
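 ### 2. Improved Entity Alignment ✅
 #### Description
 - Computes semantic similarity with the Kimi API embedding service
 - Matches entities by cosine similarity
 - Adjustable threshold (default 0.85)
 - Automatic alias suggestions
 - Falls back to string matching on failure
 #### Implementation Module
 - `backend/entity_aligner.py`
 #### Core Algorithm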
 ```python
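 import numpy as np
 # Cosine similarity between two embedding vectors
 def compute_similarity(embedding1, embedding2):
    vec1 = np.array(embedding1)
    vec2 = np.array(embedding2)
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    # Guard against zero-norm vectors to avoid division by zero
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return float(dot_product / (norm1 * norm2))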
 ```
 #### API
 ```
 POST /api/v1/projects/{project_id}/align-entities?threshold=0.85
 ```
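
The threshold matching and string fallback described in this section can be sketched as follows. This is an illustration under stated assumptions, not the contents of `entity_aligner.py`: `cosine` and `best_match` are hypothetical names, candidates are passed in as a plain dict, and the fallback uses `difflib.SequenceMatcher`, which is only one possible reading of "string matching". Plain Python is used instead of NumPy so the sketch has no dependencies.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def best_match(
    name: str,
    query_embedding: Optional[List[float]],
    candidates: Dict[str, Optional[List[float]]],
    threshold: float = 0.85,
) -> Optional[str]:
    """Return the best-scoring candidate name at or above `threshold`, else None."""
    best_name, best_score = None, threshold
    for cand_name, cand_embedding in candidates.items():
        if query_embedding is not None and cand_embedding is not None:
            score = cosine(query_embedding, cand_embedding)
        else:
            # Fallback (assumed): character-level ratio when an embedding is missing
            score = SequenceMatcher(None, name.lower(), cand_name.lower()).ratio()
        if score >= best_score:
            best_name, best_score = cand_name, score
    return best_name
```

Raising `threshold` makes merges more conservative; lowering it merges more aggressively and increases the risk of joining unrelated entities.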
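 ### 3. PDF/DOCX Document Import ✅
 #### Description
 - Supports PDF, DOCX, DOC, TXT, and MD formats
 - Extracts document text automatically
 - Extracted text feeds entity extraction and relation building
 - Transcripts are tagged by type so documents can be told apart from audio
 #### Usage
 1. Click "+ Upload File" in the workbench
 2. Switch to the "📄 Document" tab
 3. Select a document file
 4. The system parses the document and extracts knowledge automatically
 #### Implementation Module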
 - `backend/document_processor.py`
 #### API
 ```
 POST /api/v1/projects/{project_id}/upload-document
 Content-Type: multipart/form-data
 file: <document file>
 ```
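
As a rough sketch of how uploads might be routed by file extension, the snippet below handles only the plain-text formats and rejects everything else explicitly. It is not the code in `document_processor.py`: `extract_plain_text` is a hypothetical helper, and the `format` key in its result is an assumption (the only key the upload handler reads is `text`).

```python
from pathlib import Path

# Extensions listed in the feature description above
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md"}
PLAIN_TEXT_EXTENSIONS = {".txt", ".md"}


def extract_plain_text(content: bytes, filename: str) -> dict:
    """Decode TXT/MD uploads; reject unsupported types; defer binary formats."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if ext not in PLAIN_TEXT_EXTENSIONS:
        # PDF/DOCX/DOC need a real parser (e.g. PyPDF2, python-docx); out of scope here
        raise NotImplementedError(f"{ext} requires a dedicated parser")
    # errors="replace" keeps the upload usable even with stray non-UTF-8 bytes
    return {"text": content.decode("utf-8", errors="replace"), "format": ext.lstrip(".")}
```

Raising on unknown extensions lets the endpoint turn the failure into a 400 response instead of storing an empty transcript.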
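 ### 4. Project Knowledge Base Panel ✅
 #### Description
 - Project-wide knowledge base view
 - Statistics panel (entity, relation, file, and term counts)
 - Entity grid (with mention statistics)
 - Relation list
 - Glossary management
 - File list (audio vs. document)
 #### Usage
 1. Click the "📚" icon on the left side of the workbench
 2. Review the project statistics overview
 3. Switch sidebar tabs to browse different content
 4. Click an entity to jump back to the workbench and view its details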
 #### API
 ```
 GET /api/v1/projects/{project_id}/knowledge-base
 ```
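
The per-entity mention statistics shown in the entity grid (`mention_count` and `appears_in`) can be computed in one pass over the mentions. The sketch below assumes mentions arrive as `(entity_id, transcript_id)` pairs; `build_entity_stats` is a hypothetical helper, not a function from the backend.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_entity_stats(mentions: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Aggregate (entity_id, transcript_id) pairs into per-entity statistics."""
    counts: Dict[str, int] = defaultdict(int)
    files: Dict[str, set] = defaultdict(set)
    for entity_id, transcript_id in mentions:
        counts[entity_id] += 1
        files[entity_id].add(transcript_id)
    return {
        entity_id: {
            "mention_count": counts[entity_id],
            "appears_in": sorted(files[entity_id]),  # sorted for a stable response
        }
        for entity_id in counts
    }
```

Aggregating from one list of mentions avoids a separate lookup per entity, which may matter once a project holds many entities.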
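 ### 5. Glossary Management ✅
 #### Description
 - Project-level glossary
 - Add terms with pronunciation hints
 - Frequency statistics
 - Used for ASR hot-word optimization
 #### Usage
 1. Switch to "📖 Glossary" in the knowledge base panel
 2. Click "+ Add Term"
 3. Enter the term and a pronunciation hint
 4. Delete terms that are no longer needed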
 #### API
 ```
 POST /api/v1/projects/{project_id}/glossary
 GET /api/v1/projects/{project_id}/glossary
 DELETE /api/v1/glossary/{term_id}
 ```
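
One way the glossary could feed ASR hot-word optimization is to rank terms by frequency and pass the top few to the recognizer. The sketch below is an assumption about that step, not documented behaviour: `select_hotwords` and its `limit` parameter are hypothetical, and how the resulting list is handed to the ASR service is not covered here.

```python
from typing import Dict, List


def select_hotwords(glossary: List[Dict], limit: int = 50) -> List[str]:
    """Return up to `limit` unique terms, most frequent first."""
    # Sort by descending frequency, then alphabetically for a deterministic order
    ranked = sorted(glossary, key=lambda g: (-g.get("frequency", 1), g["term"]))
    seen, hotwords = set(), []
    for row in ranked:
        term = row["term"].strip()
        if term and term not in seen:
            seen.add(term)
            hotwords.append(term)
    return hotwords[:limit]
```

The input rows use the `term` and `frequency` fields returned by the glossary endpoints.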
 ## Database Schema Updates
 ### transcripts table
 ```sql
 ALTER TABLE transcripts ADD COLUMN type TEXT DEFAULT 'audio';
 -- 'audio' or 'document'
 ```
 ### entities table
 ```sql
 ALTER TABLE entities ADD COLUMN embedding TEXT;
 -- JSON array holding the embedding vector
 ```
 ### glossary table (new)
 ```sql
 CREATE TABLE glossary (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    term TEXT NOT NULL,
    pronunciation TEXT,
    frequency INTEGER DEFAULT 1
 );
 ```
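
Because `embedding` is a TEXT column holding a JSON array, reading and writing it is a `json.dumps` / `json.loads` round trip. The sketch below assumes a SQLite database accessed through the standard `sqlite3` module; `save_embedding`, `load_embedding`, and `demo` are hypothetical helpers, and the table is reduced to three columns for illustration.

```python
import json
import sqlite3
from typing import List, Optional


def save_embedding(conn: sqlite3.Connection, entity_id: str, vector: List[float]) -> None:
    """Serialize the vector as a JSON array into the TEXT `embedding` column."""
    conn.execute("UPDATE entities SET embedding = ? WHERE id = ?", (json.dumps(vector), entity_id))


def load_embedding(conn: sqlite3.Connection, entity_id: str) -> Optional[List[float]]:
    """Return the stored vector, or None if the entity or its embedding is missing."""
    row = conn.execute("SELECT embedding FROM entities WHERE id = ?", (entity_id,)).fetchone()
    if row is None or row[0] is None:
        return None
    return json.loads(row[0])


def demo() -> Optional[List[float]]:
    conn = sqlite3.connect(":memory:")
    # Reduced table for illustration; the real schema has more columns
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT, embedding TEXT)")
    conn.execute("INSERT INTO entities (id, name) VALUES (?, ?)", ("e1", "InsightFlow"))
    save_embedding(conn, "e1", [0.25, -0.5, 1.0])
    return load_embedding(conn, "e1")
```

Storing vectors as JSON keeps the schema simple, but every similarity comparison then has to parse the text first.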
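 ## Frontend Updates
 ### New Components
 1. **Sidebar navigation** - switch between the workbench and knowledge base views
 2. **File selector** - switch between transcript files
 3. **Upload tabs** - separate audio and document uploads
 4. **Knowledge base panel** - stat cards, entity grid, relation list, glossary
 ### Updated Files
 - `frontend/workbench.html` - adds the knowledge base UI
 - `frontend/app.js` - adds knowledge base logic and multi-file support
 ## Deployment
 ### Environment Variables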
 ```bash
 KIMI_API_KEY=your_kimi_api_key
 KIMI_BASE_URL=https://api.kimi.com/coding
 ALIYUN_ACCESS_KEY_ID=your_aliyun_key
 ALIYUN_ACCESS_KEY_SECRET=your_aliyun_secret
 ```
 ### Deploy Commands
 ```bash
 # Use the deploy script
 ./deploy.sh
 # Or deploy manually
 docker build -t insightflow:phase3 .
 docker run -d \
  -p 18000:8000 \
  -v /opt/data:/app/data \
  -e KIMI_API_KEY=$KIMI_API_KEY \
  insightflow:phase3
 ```
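 ## Test Checklist
 - [ ] Upload multiple audio files to the same project
 - [ ] Verify that entities are aligned correctly
 - [ ] Upload a PDF document
 - [ ] Upload a DOCX document
 - [ ] Switch between transcript files
 - [ ] Check the knowledge base panel statistics
 - [ ] Add a term to the glossary
 - [ ] Delete a term
 - [ ] Merge entities
 - [ ] Create/delete relations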
--- a/frontend/app.js
+++ b/frontend/app.js
@@ -1,4 +1,5 @@
-// InsightFlow Frontend - Production Version
+// InsightFlow Frontend - Phase 3 (Memory & Growth)
 // Knowledge Growth: Multi-file fusion + Entity Alignment + Document Import
 const API_BASE = '/api/v1';
 let currentProject = null;
@@ -6,6 +7,12 @@ let currentData = null;
 let selectedEntity = null;
 let projectRelations = [];
 let projectEntities = [];
 let currentTranscript = null;
 let projectTranscripts = [];
 let editMode = false;
 let contextMenuTarget = null;
 let currentUploadTab = 'audio';
 let knowledgeBaseData = null;
 // Init
 document.addEventListener('DOMContentLoaded', () => {
@@ -37,6 +44,8 @@ async function initWorkbench() {
        if (nameEl) nameEl.textContent = currentProject.name;
        initUpload();
        initContextMenu();
        initTextSelection();
        await loadProjectData();
    } catch (err) {
@@ -65,12 +74,131 @@ async function uploadAudio(file) {
    return await res.json();
 }
 // Phase 3: Document Upload API
 async function uploadDocument(file) {
    const formData = new FormData();
    formData.append('file', file);
    const res = await fetch(`${API_BASE}/projects/${currentProject.id}/upload-document`, {
        method: 'POST',
        body: formData
    });
    if (!res.ok) {
        const error = await res.json();
        throw new Error(error.detail || 'Document upload failed');
    }
    return await res.json();
 }
 // Phase 3: Knowledge Base API
 async function fetchKnowledgeBase() {
    const res = await fetch(`${API_BASE}/projects/${currentProject.id}/knowledge-base`);
    if (!res.ok) throw new Error('Failed to fetch knowledge base');
    return await res.json();
 }
 // Phase 3: Glossary API
 async function addGlossaryTerm(term, pronunciation = '') {
    const res = await fetch(`${API_BASE}/projects/${currentProject.id}/glossary`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ term, pronunciation })
    });
    if (!res.ok) throw new Error('Failed to add glossary term');
    return await res.json();
 }
 async function deleteGlossaryTerm(termId) {
    const res = await fetch(`${API_BASE}/glossary/${termId}`, {
        method: 'DELETE'
    });
    if (!res.ok) throw new Error('Failed to delete glossary term');
    return await res.json();
 }
 // Phase 2: Entity Edit API
 async function updateEntity(entityId, data) {
    const res = await fetch(`${API_BASE}/entities/${entityId}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(data)
    });
    if (!res.ok) throw new Error('Failed to update entity');
    return await res.json();
 }
 async function deleteEntityApi(entityId) {
    const res = await fetch(`${API_BASE}/entities/${entityId}`, {
        method: 'DELETE'
    });
    if (!res.ok) throw new Error('Failed to delete entity');
    return await res.json();
 }
 async function mergeEntitiesApi(sourceId, targetId) {
    const res = await fetch(`${API_BASE}/entities/${sourceId}/merge`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ source_entity_id: sourceId, target_entity_id: targetId })
    });
    if (!res.ok) throw new Error('Failed to merge entities');
    return await res.json();
 }
 async function createEntityApi(data) {
    const res = await fetch(`${API_BASE}/projects/${currentProject.id}/entities`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(data)
    });
    if (!res.ok) throw new Error('Failed to create entity');
    return await res.json();
 }
 // Phase 2: Relation API
 async function createRelationApi(data) {
    const res = await fetch(`${API_BASE}/projects/${currentProject.id}/relations`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(data)
    });
    if (!res.ok) throw new Error('Failed to create relation');
    return await res.json();
 }
 async function deleteRelationApi(relationId) {
    const res = await fetch(`${API_BASE}/relations/${relationId}`, {
        method: 'DELETE'
    });
    if (!res.ok) throw new Error('Failed to delete relation');
    return await res.json();
 }
 // Phase 2: Transcript API
 async function getTranscript(transcriptId) {
    const res = await fetch(`${API_BASE}/transcripts/${transcriptId}`);
    if (!res.ok) throw new Error('Failed to get transcript');
    return await res.json();
 }
 async function updateTranscript(transcriptId, fullText) {
    const res = await fetch(`${API_BASE}/transcripts/${transcriptId}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ full_text: fullText })
    });
    if (!res.ok) throw new Error('Failed to update transcript');
    return await res.json();
 }
 async function loadProjectData() {
    try {
-        // Load entities and relations in parallel
+        // Load entities, relations, and the transcript list in parallel
-        const [entitiesRes, relationsRes] = await Promise.all([
+        const [entitiesRes, relationsRes, transcriptsRes] = await Promise.all([
            fetch(`${API_BASE}/projects/${currentProject.id}/entities`),
-            fetch(`${API_BASE}/projects/${currentProject.id}/relations`)
+            fetch(`${API_BASE}/projects/${currentProject.id}/relations`),
            fetch(`${API_BASE}/projects/${currentProject.id}/transcripts`)
        ]);
        if (entitiesRes.ok) {
@@ -79,39 +207,228 @@ async function loadProjectData() {
        if (relationsRes.ok) {
            projectRelations = await relationsRes.json();
        }
        if (transcriptsRes.ok) {
            projectTranscripts = await transcriptsRes.json();
        }
        // Load the latest transcript
        if (projectTranscripts.length > 0) {
            currentTranscript = await getTranscript(projectTranscripts[0].id);
            currentData = {
-            transcript_id: 'project_view',
+                transcript_id: currentTranscript.id,
                project_id: currentProject.id,
-            segments: [],
+                segments: [{ speaker: 'Full text', text: currentTranscript.full_text }],
                entities: projectEntities,
-            full_text: '',
+                full_text: currentTranscript.full_text,
-            created_at: new Date().toISOString()
+                created_at: currentTranscript.created_at
            };
            renderTranscript();
        }
        renderGraph();
        renderEntityList();
        renderTranscriptDropdown();
    } catch (err) {
        console.error('Load project data failed:', err);
    }
 }
 // Phase 3: View Switching
 window.switchView = function(viewName) {
    // Update sidebar buttons
    document.querySelectorAll('.sidebar-btn').forEach(btn => {
        btn.classList.remove('active');
    });
    event.target.classList.add('active');
    if (viewName === 'workbench') {
        document.getElementById('workbenchView').style.display = 'flex';
        document.getElementById('knowledgeBaseView').classList.remove('show');
    } else if (viewName === 'knowledge-base') {
        document.getElementById('workbenchView').style.display = 'none';
        document.getElementById('knowledgeBaseView').classList.add('show');
        loadKnowledgeBase();
    }
 };
 // Phase 3: Load Knowledge Base
 async function loadKnowledgeBase() {
    try {
        knowledgeBaseData = await fetchKnowledgeBase();
        renderKnowledgeBase();
    } catch (err) {
        console.error('Load knowledge base failed:', err);
    }
 }
 // Phase 3: Render Knowledge Base
 function renderKnowledgeBase() {
    if (!knowledgeBaseData) return;
    // Update stats
    document.getElementById('kbEntityCount').textContent = knowledgeBaseData.stats.entity_count;
    document.getElementById('kbRelationCount').textContent = knowledgeBaseData.stats.relation_count;
    document.getElementById('kbTranscriptCount').textContent = knowledgeBaseData.stats.transcript_count;
    document.getElementById('kbGlossaryCount').textContent = knowledgeBaseData.stats.glossary_count;
    // Render entities
    const entityGrid = document.getElementById('kbEntityGrid');
    entityGrid.innerHTML = knowledgeBaseData.entities.map(e => `
        <div class="kb-entity-card" onclick="selectEntity('${e.id}'); switchView('workbench');">
            <span class="entity-type-badge type-${e.type}">${e.type}</span>
            <div class="kb-entity-name">${e.name}</div>
            <div class="kb-entity-def">${e.definition || '暂无定义'}</div>
            <div class="kb-entity-meta">提及 ${e.mention_count} 次 | 出现在 ${e.appears_in.length} 个文件中</div>
        </div>
    `).join('');
    // Render relations
    const relationsList = document.getElementById('kbRelationsList');
    relationsList.innerHTML = knowledgeBaseData.relations.map(r => `
        <div class="kb-glossary-item">
            <div>
                <strong>${r.source_name}</strong> 
                <span style="color:#666;">→ ${r.type} →</span> 
                <strong>${r.target_name}</strong>
                <div style="font-size:0.8rem;color:#666;margin-top:4px;">${r.evidence || '无证据'}</div>
            </div>
        </div>
    `).join('');
    // Render glossary
    const glossaryList = document.getElementById('kbGlossaryList');
    glossaryList.innerHTML = knowledgeBaseData.glossary.map(g => `
        <div class="kb-glossary-item">
            <div>
                <strong>${g.term}</strong>
                ${g.pronunciation ? `<span style="color:#666;font-size:0.85rem;"> (${g.pronunciation})</span>` : ''}
                <span style="color:#00d4ff;font-size:0.8rem;margin-left:8px;">出现 ${g.frequency} 次</span>
            </div>
            <button class="btn-icon" onclick="deleteGlossaryTerm('${g.id}').then(loadKnowledgeBase)">删除</button>
        </div>
    `).join('');
    // Render transcripts
    const transcriptsList = document.getElementById('kbTranscriptsList');
    transcriptsList.innerHTML = knowledgeBaseData.transcripts.map(t => `
        <div class="kb-transcript-item">
            <div>
                <span class="file-type-icon type-${t.type}">${t.type === 'audio' ? '🎵' : '📄'}</span>
                <span style="margin-left:8px;">${t.filename}</span>
            </div>
            <span style="color:#666;font-size:0.8rem;">${new Date(t.created_at).toLocaleDateString()}</span>
        </div>
    `).join('');
 }
 // Phase 3: KB Tab Switching
 window.switchKBTab = function(tabName) {
    document.querySelectorAll('.kb-nav-item').forEach(item => {
        item.classList.remove('active');
    });
    event.target.classList.add('active');
    document.querySelectorAll('.kb-section').forEach(section => {
        section.classList.remove('active');
    });
    document.getElementById(`kb${tabName.charAt(0).toUpperCase() + tabName.slice(1)}Section`).classList.add('active');
 };
 // Phase 3: Transcript Dropdown
 window.toggleTranscriptDropdown = function() {
    const dropdown = document.getElementById('transcriptDropdown');
    dropdown.classList.toggle('show');
 };
 function renderTranscriptDropdown() {
    const dropdown = document.getElementById('transcriptDropdown');
    if (!dropdown || projectTranscripts.length === 0) return;
    dropdown.innerHTML = projectTranscripts.map(t => `
        <div class="transcript-option ${currentTranscript && currentTranscript.id === t.id ? 'active' : ''}" 
             onclick="switchTranscript('${t.id}')">
            <span class="file-type-icon type-${t.type || 'audio'}">${(t.type || 'audio') === 'audio' ? '🎵' : '📄'}</span>
            <span style="margin-left:4px;">${t.filename}</span>
        </div>
    `).join('');
 }
 window.switchTranscript = async function(transcriptId) {
    try {
        currentTranscript = await getTranscript(transcriptId);
        currentData = {
            transcript_id: currentTranscript.id,
            project_id: currentProject.id,
            segments: [{ speaker: '全文', text: currentTranscript.full_text }],
            entities: projectEntities,
            full_text: currentTranscript.full_text,
            created_at: currentTranscript.created_at
        };
        renderTranscript();
        renderTranscriptDropdown();
        document.getElementById('transcriptDropdown').classList.remove('show');
    } catch (err) {
        console.error('Switch transcript failed:', err);
        alert('切换文件失败');
    }
 };
 // Phase 2: Transcript Edit Mode
 window.toggleEditMode = function() {
    editMode = !editMode;
    const editBtn = document.getElementById('editBtn');
    const saveBtn = document.getElementById('saveBtn');
    const content = document.getElementById('transcriptContent');
    if (editMode) {
        editBtn.style.display = 'none';
        saveBtn.style.display = 'inline-block';
        content.contentEditable = 'true';
        content.style.background = '#0f0f0f';
        content.style.border = '1px solid #00d4ff';
        content.focus();
    } else {
        editBtn.style.display = 'inline-block';
        saveBtn.style.display = 'none';
        content.contentEditable = 'false';
        content.style.background = '';
        content.style.border = '';
    }
 };
 window.saveTranscript = async function() {
    if (!currentTranscript) return;
    const content = document.getElementById('transcriptContent');
    const fullText = content.innerText;
    try {
        await updateTranscript(currentTranscript.id, fullText);
        currentTranscript.full_text = fullText;
        toggleEditMode();
        alert('转录文本已保存');
    } catch (err) {
        console.error('Save failed:', err);
        alert('保存失败: ' + err.message);
    }
 };
 // Render transcript with entity highlighting
 function renderTranscript() {
    const container = document.getElementById('transcriptContent');
-    if (!container || !currentData || !currentData.segments) return;
+    if (!container || !currentData) return;
    container.innerHTML = '';
-    currentData.segments.forEach((seg, idx) => {
+    if (editMode) {
-        const div = document.createElement('div');
+        container.innerText = currentData.full_text || '';
-        div.className = 'segment';
+        return;
-        div.dataset.index = idx;
+    }
    // Highlight entities
-        let text = seg.text;
+    let text = currentData.full_text || '';
-        const entities = findEntitiesInText(seg.text);
+    const entities = findEntitiesInText(text);
    // Replace from the last match backwards so earlier offsets stay valid
    entities.sort((a, b) => b.start - a.start);
@@ -123,13 +440,14 @@ function renderTranscript() {
        text = before + `<span class="entity" data-id="${ent.id}" onclick="window.selectEntity('${ent.id}')">${name}</span>` + after;
    });
    const div = document.createElement('div');
    div.className = 'segment';
    div.innerHTML = `
-            <div class="speaker">${seg.speaker}</div>
+        <div class="speaker">${currentTranscript.filename || '转录文本'}</div>
        <div class="segment-text">${text}</div>
    `;
    container.appendChild(div);
-        });
 }
 // Find entity positions in the text
@@ -181,7 +499,7 @@ function renderGraph() {
            .attr('y', '50%')
            .attr('text-anchor', 'middle')
            .attr('fill', '#666')
-            .text('暂无实体数据，请上传音频');
+            .text('暂无实体数据，请上传音频或文档');
        return;
    }
@@ -201,6 +519,7 @@ function renderGraph() {
    // Use the relations stored in the database
    const links = projectRelations.map(r => ({
        id: r.id,
        source: r.source_id,
        target: r.target_id,
        type: r.type
@@ -256,7 +575,11 @@ function renderGraph() {
            .on('start', dragstarted)
            .on('drag', dragged)
            .on('end', dragended))
-        .on('click', (e, d) => window.selectEntity(d.id));
+        .on('click', (e, d) => window.selectEntity(d.id))
        .on('contextmenu', (e, d) => {
            e.preventDefault();
            showContextMenu(e, d.id);
        });
    // Node circles
    node.append('circle')
@@ -323,7 +646,7 @@ function renderEntityList() {
    container.innerHTML = '<h3 style="margin-bottom:12px;color:#888;font-size:0.9rem;">项目实体</h3>';
    if (!projectEntities || projectEntities.length === 0) {
-        container.innerHTML += '<p style="color:#666;font-size:0.85rem;">暂无实体，请上传音频文件</p>';
+        container.innerHTML += '<p style="color:#666;font-size:0.85rem;">暂无实体，请上传音频或文档文件</p>';
        return;
    }
@@ -332,9 +655,13 @@ function renderEntityList() {
        div.className = 'entity-item';
        div.dataset.id = ent.id;
        div.onclick = () => window.selectEntity(ent.id);
        div.oncontextmenu = (e) => {
            e.preventDefault();
            showContextMenu(e, ent.id);
        };
        div.innerHTML = `
-            <span class="entity-type-badge type-${ent.type.toLowerCase()}">${ent.type}</span>
+            <span class="entity-type-badge type-${ent.type}">${ent.type}</span>
            <div>
                <div style="font-weight:500;">${ent.name}</div>
                <div style="font-size:0.8rem;color:#666;">${ent.definition || '暂无定义'}</div>
@@ -354,11 +681,9 @@ window.selectEntity = function(entityId) {
    // Highlight the entity in the transcript text
    document.querySelectorAll('.entity').forEach(el => {
        if (el.dataset.id === entityId) {
-            el.style.background = '#ff6b6b';
+            el.classList.add('selected');
            el.style.color = '#fff';
        } else {
-            el.style.background = '';
+            el.classList.remove('selected');
            el.style.color = '';
        }
    });
@@ -371,17 +696,308 @@ window.selectEntity = function(entityId) {
    // Highlight the entity in the entity list
    document.querySelectorAll('.entity-item').forEach(el => {
        if (el.dataset.id === entityId) {
-            el.style.background = '#2a2a2a';
+            el.classList.add('selected');
            el.style.borderLeft = '3px solid #ff6b6b';
        } else {
-            el.style.background = '';
+            el.classList.remove('selected');
            el.style.borderLeft = '';
        }
    });
    console.log('Selected:', entity.name, entity.definition);
 };
 // Phase 2: Context Menu
 function initContextMenu() {
    document.addEventListener('click', () => {
        hideContextMenu();
    });
 }
 function showContextMenu(e, entityId) {
    contextMenuTarget = entityId;
    const menu = document.getElementById('contextMenu');
    menu.style.left = e.pageX + 'px';
    menu.style.top = e.pageY + 'px';
    menu.classList.add('show');
 }
 function hideContextMenu() {
    const menu = document.getElementById('contextMenu');
    menu.classList.remove('show');
    contextMenuTarget = null;
 }
 // Phase 2: Entity Editor Modal
 window.editEntity = function() {
    // Capture the target first: hideContextMenu() resets contextMenuTarget to null
    const entityId = contextMenuTarget || selectedEntity;
    hideContextMenu();
    if (!entityId) return;
    const entity = projectEntities.find(e => e.id === entityId);
    if (!entity) return;
    document.getElementById('entityName').value = entity.name;
    document.getElementById('entityType').value = entity.type;
    document.getElementById('entityDefinition').value = entity.definition || '';
    document.getElementById('entityAliases').value = (entity.aliases || []).join(', ');
    // Show the relation editor
    document.getElementById('relationEditor').style.display = 'block';
    renderRelationList(entityId);
    document.getElementById('entityModal').dataset.entityId = entityId;
    document.getElementById('entityModal').classList.add('show');
 };
 function renderRelationList(entityId) {
    const container = document.getElementById('relationList');
    const entityRelations = projectRelations.filter(r => 
        r.source_id === entityId || r.target_id === entityId
    );
    if (entityRelations.length === 0) {
        container.innerHTML = '<p style="color:#666;font-size:0.8rem;">暂无关系</p>';
        return;
    }
    container.innerHTML = entityRelations.map(r => {
        const isSource = r.source_id === entityId;
        const otherId = isSource ? r.target_id : r.source_id;
        const other = projectEntities.find(e => e.id === otherId);
        const otherName = other ? other.name : 'Unknown';
        const arrow = isSource ? '→' : '←';
        return `
            <div class="relation-item">
                <span>${arrow} ${otherName} (${r.type})</span>
                <button onclick="deleteRelation('${r.id}')">删除</button>
            </div>
        `;
    }).join('');
 }
 window.hideEntityModal = function() {
    document.getElementById('entityModal').classList.remove('show');
 };
 window.saveEntity = async function() {
    const entityId = document.getElementById('entityModal').dataset.entityId;
    if (!entityId) return;
    const data = {
        name: document.getElementById('entityName').value,
        type: document.getElementById('entityType').value,
        definition: document.getElementById('entityDefinition').value,
        aliases: document.getElementById('entityAliases').value.split(',').map(s => s.trim()).filter(s => s)
    };
    try {
        await updateEntity(entityId, data);
        await loadProjectData();
        hideEntityModal();
    } catch (err) {
        console.error('Save failed:', err);
        alert('保存失败: ' + err.message);
    }
 };
 window.deleteEntity = async function() {
    const entityId = document.getElementById('entityModal').dataset.entityId;
    if (!entityId) return;
    if (!confirm('确定要删除这个实体吗？相关的提及和关系也会被删除。')) return;
    try {
        await deleteEntityApi(entityId);
        await loadProjectData();
        hideEntityModal();
    } catch (err) {
        console.error('Delete failed:', err);
        alert('删除失败: ' + err.message);
    }
 };
 // Phase 2: Merge Modal
 window.showMergeModal = function() {
    // Capture the source first: hideContextMenu() resets contextMenuTarget to null
    const sourceId = contextMenuTarget || selectedEntity;
    hideContextMenu();
    if (!sourceId) return;
    const source = projectEntities.find(e => e.id === sourceId);
    if (!source) return;
    document.getElementById('mergeSource').value = source.name;
    document.getElementById('mergeModal').dataset.sourceId = sourceId;
    // Populate the target entity options (excluding the source itself)
    const select = document.getElementById('mergeTarget');
    select.innerHTML = projectEntities
        .filter(e => e.id !== sourceId)
        .map(e => `<option value="${e.id}">${e.name} (${e.type})</option>`)
        .join('');
    document.getElementById('mergeModal').classList.add('show');
 };
 window.hideMergeModal = function() {
    document.getElementById('mergeModal').classList.remove('show');
 };
 window.confirmMerge = async function() {
    const sourceId = document.getElementById('mergeModal').dataset.sourceId;
    const targetId = document.getElementById('mergeTarget').value;
    if (!sourceId || !targetId) return;
    try {
        await mergeEntitiesApi(sourceId, targetId);
        await loadProjectData();
        hideMergeModal();
    } catch (err) {
        console.error('Merge failed:', err);
        alert('合并失败: ' + err.message);
    }
 };
 // Phase 2: Relation Modal
 window.showAddRelation = function() {
    const entityId = document.getElementById('entityModal').dataset.entityId;
    if (!entityId) return;
    const entity = projectEntities.find(e => e.id === entityId);
    document.getElementById('relationModal').dataset.sourceId = entityId;
    // Populate the target options
    const select = document.getElementById('relationTarget');
    select.innerHTML = projectEntities
        .filter(e => e.id !== entityId)
        .map(e => `<option value="${e.id}">${e.name}</option>`)
        .join('');
    document.getElementById('relationModal').classList.add('show');
 };
 window.hideRelationModal = function() {
    document.getElementById('relationModal').classList.remove('show');
 };
 window.saveRelation = async function() {
    const sourceId = document.getElementById('relationModal').dataset.sourceId;
    const targetId = document.getElementById('relationTarget').value;
    const type = document.getElementById('relationType').value;
    const evidence = document.getElementById('relationEvidence').value;
    if (!sourceId || !targetId) return;
    try {
        await createRelationApi({
            source_entity_id: sourceId,
            target_entity_id: targetId,
            relation_type: type,
            evidence: evidence
        });
        await loadProjectData();
        renderRelationList(sourceId);
        hideRelationModal();
    } catch (err) {
        console.error('Create relation failed:', err);
        alert('创建关系失败: ' + err.message);
    }
 };
 window.deleteRelation = async function(relationId) {
    if (!confirm('确定要删除这个关系吗？')) return;
    try {
        await deleteRelationApi(relationId);
        await loadProjectData();
        const entityId = document.getElementById('entityModal').dataset.entityId;
        if (entityId) renderRelationList(entityId);
    } catch (err) {
        console.error('Delete relation failed:', err);
        alert('删除关系失败: ' + err.message);
    }
 };
 // Phase 2: Text Selection - Create Entity
 function initTextSelection() {
    document.addEventListener('selectionchange', () => {
        const selection = window.getSelection();
        const text = selection.toString().trim();
        if (text.length > 0 && text.length < 50) {
            showSelectionToolbar();
        } else {
            hideSelectionToolbar();
        }
    });
 }
 function showSelectionToolbar() {
    document.getElementById('selectionToolbar').classList.add('show');
 }
 window.hideSelectionToolbar = function() {
    document.getElementById('selectionToolbar').classList.remove('show');
    window.getSelection().removeAllRanges();
 };
 window.createEntityFromSelection = async function() {
    const selection = window.getSelection();
    const text = selection.toString().trim();
    if (!text) return;
    // Locate the selected text in the full transcript (indexOf returns the first occurrence only)
    const container = document.getElementById('transcriptContent');
    const fullText = currentTranscript ? currentTranscript.full_text : '';
    const startPos = fullText.indexOf(text);
    try {
        const result = await createEntityApi({
            name: text,
            type: 'OTHER',
            definition: '',
            transcript_id: currentTranscript ? currentTranscript.id : null,
            start_pos: startPos >= 0 ? startPos : null,
            end_pos: startPos >= 0 ? startPos + text.length : null
        });
        hideSelectionToolbar();
        await loadProjectData();
        if (!result.existed) {
            alert(`已创建实体: ${text}`);
        } else {
            alert(`实体 "${text}" 已存在`);
        }
    } catch (err) {
        console.error('Create entity failed:', err);
        alert('创建实体失败: ' + err.message);
    }
 };
 // Phase 3: Upload Tab Switching
 window.switchUploadTab = function(tab) {
    currentUploadTab = tab;
    document.querySelectorAll('.upload-tab').forEach(t => t.classList.remove('active'));
    event.target.classList.add('active');
    const hint = document.getElementById('uploadHint');
    if (tab === 'audio') {
        hint.textContent = '支持 MP3, WAV, M4A (最大 500MB)';
    } else {
        hint.textContent = '支持 PDF, DOCX, DOC, TXT, MD';
    }
 };
 window.triggerFileSelect = function() {
    if (currentUploadTab === 'audio') {
        document.getElementById('fileInput').click();
    } else {
        document.getElementById('docInput').click();
    }
 };
 // Show/hide upload
 window.showUpload = function() {
    const el = document.getElementById('uploadOverlay');
@@ -393,46 +1009,105 @@ window.hideUpload = function() {
    if (el) el.classList.remove('show');
 };
 // Phase 3: Glossary Modal
 window.showAddTermModal = function() {
    document.getElementById('glossaryModal').classList.add('show');
 };
 window.hideGlossaryModal = function() {
    document.getElementById('glossaryModal').classList.remove('show');
    document.getElementById('glossaryTerm').value = '';
    document.getElementById('glossaryPronunciation').value = '';
 };
 window.saveGlossaryTerm = async function() {
    const term = document.getElementById('glossaryTerm').value.trim();
    const pronunciation = document.getElementById('glossaryPronunciation').value.trim();
    if (!term) {
        alert('请输入术语');
        return;
    }
    try {
        await addGlossaryTerm(term, pronunciation);
        hideGlossaryModal();
        loadKnowledgeBase();
    } catch (err) {
        console.error('Add term failed:', err);
        alert('添加术语失败: ' + err.message);
    }
 };
 // Upload handling
 function initUpload() {
-    const input = document.getElementById('fileInput');
+    // Audio upload
    const audioInput = document.getElementById('fileInput');
    if (audioInput) {
        audioInput.addEventListener('change', async (e) => {
            if (!e.target.files.length) return;
            await handleFileUpload(e.target.files[0], 'audio');
        });
    }
    // Document upload
    const docInput = document.getElementById('docInput');
    if (docInput) {
        docInput.addEventListener('change', async (e) => {
            if (!e.target.files.length) return;
            await handleFileUpload(e.target.files[0], 'document');
        });
    }
 }
 async function handleFileUpload(file, type) {
    const overlay = document.getElementById('uploadOverlay');
-    if (!input) return;
-    input.addEventListener('change', async (e) => {
-        if (!e.target.files.length) return;
-        const file = e.target.files[0];
        if (overlay) {
    overlay.innerHTML = `
        <div style="text-align:center;">
            <h2>正在分析...</h2>
            <p style="color:#666;margin-top:10px;">${file.name}</p>
-                    <p style="color:#888;margin-top:20px;font-size:0.9rem;">ASR转录 + 实体提取中</p>
+            <p style="color:#888;margin-top:20px;font-size:0.9rem;">${type === 'audio' ? 'ASR转录 + 实体提取中' : '文档解析 + 实体提取中'}</p>
        </div>
    `;
        }
    try {
-            const result = await uploadAudio(file);
+        let result;
        if (type === 'audio') {
            result = await uploadAudio(file);
        } else {
            result = await uploadDocument(file);
        }
        // Update the current data
        currentData = result;
-            // Reload project data (including new entities and relations)
+        // Reload project data
        await loadProjectData();
-            // Render the transcript text
+        // Reset the upload UI
-            if (result.segments && result.segments.length > 0) {
+        overlay.innerHTML = `
-                renderTranscript();
+            <div class="upload-box">
-            }
+                <h2 style="margin-bottom:10px;">上传文件</h2>
                <div class="upload-tabs">
                    <div class="upload-tab active" onclick="switchUploadTab('audio')">🎵 音频</div>
                    <div class="upload-tab" onclick="switchUploadTab('document')">📄 文档</div>
                </div>
                <p style="color:#666;" id="uploadHint">支持 MP3, WAV, M4A (最大 500MB)</p>
                <input type="file" id="fileInput" accept="audio/*" hidden>
                <input type="file" id="docInput" accept=".pdf,.docx,.doc,.txt,.md" hidden>
                <button class="btn" onclick="triggerFileSelect()">选择文件</button>
                <br><br>
                <button class="btn btn-secondary" onclick="hideUpload()">取消</button>
            </div>
        `;
-            if (overlay) overlay.classList.remove('show');
+        // Rebind event listeners (the file inputs were recreated by the innerHTML reset above)
        initUpload();
        overlay.classList.remove('show');
    } catch (err) {
        console.error('Upload failed:', err);
            if (overlay) {
        overlay.innerHTML = `
            <div style="text-align:center;">
                <h2 style="color:#ff6b6b;">分析失败</h2>
@@ -441,6 +1116,13 @@ function initUpload() {
            </div>
        `;
    }
        }
-    });
 }
 // Close dropdown when clicking outside
 document.addEventListener('click', (e) => {
    const dropdown = document.getElementById('transcriptDropdown');
    const selector = document.querySelector('.transcript-selector');
    if (dropdown && selector && !selector.contains(e.target)) {
        dropdown.classList.remove('show');
    }
 });
--- a/frontend/workbench.html
+++ b/frontend/workbench.html
@@ -3,7 +3,7 @@
 <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>InsightFlow - 知识工作台</title>
+    <title>InsightFlow - 知识工作台 (Phase 3)</title>
    <script src="https://d3js.org/d3.v7.min.js"></script>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
@@ -46,10 +46,44 @@
            color: #888;
            font-size: 0.9rem;
        }
        .header-actions {
            display: flex;
            gap: 10px;
        }
        .main {
            display: flex;
            height: calc(100vh - 50px);
        }
        .sidebar {
            width: 60px;
            background: #111;
            border-right: 1px solid #222;
            display: flex;
            flex-direction: column;
            align-items: center;
            padding: 10px 0;
        }
        .sidebar-btn {
            width: 44px;
            height: 44px;
            background: transparent;
            border: none;
            color: #666;
            font-size: 1.2rem;
            cursor: pointer;
            border-radius: 8px;
            margin-bottom: 8px;
            transition: all 0.2s;
        }
        .sidebar-btn:hover, .sidebar-btn.active {
            background: #1a1a1a;
            color: #00d4ff;
        }
        .content-area {
            flex: 1;
            display: flex;
            overflow: hidden;
        }
        .editor-panel {
            width: 50%;
            border-right: 1px solid #222;
@@ -66,6 +100,23 @@
            justify-content: space-between;
            align-items: center;
        }
        .panel-actions {
            display: flex;
            gap: 8px;
        }
        .btn-icon {
            background: transparent;
            border: 1px solid #333;
            color: #888;
            padding: 4px 10px;
            border-radius: 4px;
            cursor: pointer;
            font-size: 0.8rem;
        }
        .btn-icon:hover {
            border-color: #00d4ff;
            color: #00d4ff;
        }
        .transcript-content {
            flex: 1;
            padding: 20px;
@@ -92,6 +143,13 @@
        }
        .segment-text {
            color: #e0e0e0;
            outline: none;
        }
        .segment-text[contenteditable="true"] {
            background: #1a1a1a;
            padding: 8px;
            border-radius: 4px;
            border: 1px solid #00d4ff;
        }
        .entity {
            background: rgba(123, 44, 191, 0.3);
@@ -99,10 +157,16 @@
            padding: 0 4px;
            border-radius: 3px;
            cursor: pointer;
            position: relative;
        }
        .entity:hover {
            background: rgba(123, 44, 191, 0.5);
        }
        .entity.selected {
            background: #ff6b6b;
            border-color: #ff6b6b;
            color: #fff;
        }
        .graph-panel {
            width: 50%;
            display: flex;
@@ -127,10 +191,15 @@
            border-radius: 6px;
            margin-bottom: 8px;
            cursor: pointer;
            transition: all 0.2s;
        }
        .entity-item:hover {
            background: #222;
        }
        .entity-item.selected {
            background: #2a2a2a;
            border-left: 3px solid #ff6b6b;
        }
        .entity-type-badge {
            padding: 2px 8px;
            border-radius: 4px;
@@ -139,11 +208,11 @@
            margin-right: 12px;
            text-transform: uppercase;
        }
-        .type-project { background: #7b2cbf; }
+        .type-PROJECT { background: #7b2cbf; }
-        .type-tech { background: #00d4ff; color: #000; }
+        .type-TECH { background: #00d4ff; color: #000; }
-        .type-person { background: #ff6b6b; }
+        .type-PERSON { background: #ff6b6b; }
-        .type-org { background: #4ecdc4; color: #000; }
+        .type-ORG { background: #4ecdc4; color: #000; }
-        .type-other { background: #666; }
+        .type-OTHER { background: #666; }
        .upload-overlay {
            position: fixed;
            top: 0;
@@ -164,10 +233,29 @@
            border-radius: 16px;
            padding: 60px;
            text-align: center;
            max-width: 500px;
        }
        .upload-box:hover {
            border-color: #00d4ff;
        }
        .upload-tabs {
            display: flex;
            gap: 10px;
            margin-bottom: 20px;
            justify-content: center;
        }
        .upload-tab {
            padding: 8px 16px;
            background: #1a1a1a;
            border: 1px solid #333;
            border-radius: 6px;
            cursor: pointer;
            color: #888;
        }
        .upload-tab.active {
            border-color: #00d4ff;
            color: #00d4ff;
        }
        .btn {
            background: linear-gradient(90deg, #00d4ff, #7b2cbf);
            color: white;
@@ -186,10 +274,316 @@
            font-size: 0.85rem;
            margin-top: 0;
        }
        .btn-danger {
            background: #ff6b6b;
        }
        .btn-secondary {
            background: #333;
        }
        .empty-state {
            text-align: center;
            padding: 60px 20px;
        }
        /* Phase 2: Entity Editor Modal */
        .modal-overlay {
            position: fixed;
            top: 0;
            left: 0;
            right: 0;
            bottom: 0;
            background: rgba(0,0,0,0.8);
            display: none;
            align-items: center;
            justify-content: center;
            z-index: 3000;
        }
        .modal-overlay.show {
            display: flex;
        }
        .modal {
            background: #1a1a1a;
            border-radius: 12px;
            padding: 24px;
            width: 90%;
            max-width: 500px;
            max-height: 80vh;
            overflow-y: auto;
        }
        .modal-header {
            font-size: 1.2rem;
            margin-bottom: 20px;
            color: #fff;
        }
        .form-group {
            margin-bottom: 16px;
        }
        .form-group label {
            display: block;
            margin-bottom: 6px;
            color: #888;
            font-size: 0.85rem;
        }
        .form-group input,
        .form-group select,
        .form-group textarea {
            width: 100%;
            padding: 10px 12px;
            background: #0a0a0a;
            border: 1px solid #333;
            border-radius: 6px;
            color: #e0e0e0;
            font-size: 0.95rem;
        }
        .form-group input:focus,
        .form-group select:focus,
        .form-group textarea:focus {
            outline: none;
            border-color: #00d4ff;
        }
        .form-group textarea {
            min-height: 80px;
            resize: vertical;
        }
        .modal-actions {
            display: flex;
            gap: 10px;
            justify-content: flex-end;
            margin-top: 20px;
        }
        /* Phase 2: Context Menu */
        .context-menu {
            position: absolute;
            background: #1a1a1a;
            border: 1px solid #333;
            border-radius: 6px;
            padding: 6px 0;
            z-index: 4000;
            display: none;
            min-width: 160px;
        }
        .context-menu.show {
            display: block;
        }
        .context-menu-item {
            padding: 8px 16px;
            cursor: pointer;
            font-size: 0.9rem;
            color: #e0e0e0;
        }
        .context-menu-item:hover {
            background: #2a2a2a;
        }
        .context-menu-divider {
            height: 1px;
            background: #333;
            margin: 6px 0;
        }
        /* Phase 2: Relation Editor */
        .relation-editor {
            margin-top: 16px;
            padding-top: 16px;
            border-top: 1px solid #333;
        }
        .relation-item {
            display: flex;
            align-items: center;
            gap: 8px;
            padding: 8px;
            background: #0a0a0a;
            border-radius: 4px;
            margin-bottom: 8px;
            font-size: 0.85rem;
        }
        .relation-item button {
            background: transparent;
            border: none;
            color: #ff6b6b;
            cursor: pointer;
            font-size: 0.8rem;
        }
        /* Phase 2: Selection toolbar */
        .selection-toolbar {
            position: fixed;
            bottom: 20px;
            left: 50%;
            transform: translateX(-50%);
            background: #1a1a1a;
            border: 1px solid #333;
            border-radius: 8px;
            padding: 10px 20px;
            display: none;
            gap: 10px;
            z-index: 3500;
            box-shadow: 0 4px 20px rgba(0,0,0,0.5);
        }
        .selection-toolbar.show {
            display: flex;
        }
        /* Graph node styles */
        .node-circle {
            cursor: pointer;
        }
        .node-label {
            pointer-events: none;
        }
        /* Phase 3: Knowledge Base Panel */
        .kb-panel {
            width: 100%;
            height: 100%;
            display: none;
            flex-direction: column;
            background: #0a0a0a;
        }
        .kb-panel.show {
            display: flex;
        }
        .kb-header {
            padding: 16px 20px;
            background: #141414;
            border-bottom: 1px solid #222;
            display: flex;
            justify-content: space-between;
            align-items: center;
        }
        .kb-stats {
            display: flex;
            gap: 24px;
        }
        .kb-stat {
            text-align: center;
        }
        .kb-stat-value {
            font-size: 1.5rem;
            font-weight: 600;
            color: #00d4ff;
        }
        .kb-stat-label {
            font-size: 0.75rem;
            color: #666;
        }
        .kb-content {
            flex: 1;
            display: flex;
            overflow: hidden;
        }
        .kb-sidebar {
            width: 200px;
            background: #111;
            border-right: 1px solid #222;
            padding: 16px 0;
        }
        .kb-nav-item {
            padding: 12px 20px;
            cursor: pointer;
            color: #888;
            border-left: 3px solid transparent;
        }
        .kb-nav-item:hover {
            background: #1a1a1a;
            color: #e0e0e0;
        }
        .kb-nav-item.active {
            background: #1a1a1a;
            color: #00d4ff;
            border-left-color: #00d4ff;
        }
        .kb-main {
            flex: 1;
            padding: 20px;
            overflow-y: auto;
        }
        .kb-section {
            display: none;
        }
        .kb-section.active {
            display: block;
        }
        .kb-entity-grid {
            display: grid;
            grid-template-columns: repeat(auto-fill, minmax(280px, 1fr));
            gap: 16px;
        }
        .kb-entity-card {
            background: #141414;
            border: 1px solid #222;
            border-radius: 8px;
            padding: 16px;
            cursor: pointer;
            transition: all 0.2s;
        }
        .kb-entity-card:hover {
            border-color: #00d4ff;
        }
        .kb-entity-name {
            font-weight: 600;
            margin-bottom: 4px;
        }
        .kb-entity-def {
            font-size: 0.85rem;
            color: #888;
            margin-bottom: 8px;
        }
        .kb-entity-meta {
            font-size: 0.75rem;
            color: #666;
        }
        .kb-glossary-item {
            display: flex;
            justify-content: space-between;
            align-items: center;
            padding: 12px 16px;
            background: #141414;
            border-radius: 6px;
            margin-bottom: 8px;
        }
        .kb-transcript-item {
            padding: 12px 16px;
            background: #141414;
            border-radius: 6px;
            margin-bottom: 8px;
            display: flex;
            justify-content: space-between;
            align-items: center;
        }
        .file-type-icon {
            padding: 4px 8px;
            border-radius: 4px;
            font-size: 0.7rem;
            font-weight: 600;
        }
        .type-audio { background: #7b2cbf; }
        .type-document { background: #00d4ff; color: #000; }
        /* Transcript selector */
        .transcript-selector {
            position: relative;
        }
        .transcript-dropdown {
            position: absolute;
            top: 100%;
            right: 0;
            background: #1a1a1a;
            border: 1px solid #333;
            border-radius: 8px;
            min-width: 200px;
            max-height: 300px;
            overflow-y: auto;
            display: none;
            z-index: 100;
        }
        .transcript-dropdown.show {
            display: block;
        }
        .transcript-option {
            padding: 10px 16px;
            cursor: pointer;
            border-bottom: 1px solid #222;
        }
        .transcript-option:hover {
            background: #2a2a2a;
        }
        .transcript-option.active {
            background: #00d4ff22;
        }
    </style>
 </head>
 <body>
@@ -198,19 +592,40 @@
            <a href="/" class="back-link">← Back to project list</a>
            <span class="project-name" id="projectName">Loading...</span>
        </div>
-        <button class="btn btn-small" onclick="showUpload()">+ Upload audio</button>
+        <div class="header-actions">
            <button class="btn btn-small" onclick="showUpload()">+ Upload file</button>
        </div>
    </div>
    <div class="main">
        <!-- Sidebar -->
        <div class="sidebar">
            <button class="sidebar-btn active" onclick="switchView('workbench')" title="Workbench">📝</button>
            <button class="sidebar-btn" onclick="switchView('knowledge-base')" title="Knowledge base">📚</button>
        </div>
        <!-- Content Area -->
        <div class="content-area">
            <!-- Workbench View -->
            <div id="workbenchView" class="workbench-view" style="display: flex; width: 100%;">
                <div class="editor-panel">
                    <div class="panel-header">
                        <div style="display: flex; align-items: center; gap: 12px;">
                            <span>📄 Transcript</span>
-                <span style="font-size:0.8rem;color:#666;">Click an entity to highlight it</span>
+                            <div class="transcript-selector">
                                <button class="btn-icon" onclick="toggleTranscriptDropdown()">📁 Select file</button>
                                <div class="transcript-dropdown" id="transcriptDropdown"></div>
                            </div>
                        </div>
                        <div class="panel-actions">
                            <button class="btn-icon" onclick="toggleEditMode()" id="editBtn">✏️ Edit</button>
                            <button class="btn-icon" onclick="saveTranscript()" id="saveBtn" style="display:none;">💾 Save</button>
                        </div>
                    </div>
                    <div class="transcript-content" id="transcriptContent">
                        <div class="empty-state">
                            <p style="color:#666;">No transcript yet</p>
-                    <button class="btn" onclick="showUpload()">Upload audio</button>
+                            <button class="btn" onclick="showUpload()">Upload audio or document</button>
                        </div>
                    </div>
                </div>
@@ -218,7 +633,7 @@
                <div class="graph-panel">
                    <div class="panel-header">
                        <span>🔗 Knowledge graph</span>
-                <span style="font-size:0.8rem;color:#666;">Drag nodes to view relations</span>
+                        <span style="font-size:0.8rem;color:#666;">Right-click a node to edit | Drag to create a relation</span>
                    </div>
                    <svg id="graph-svg"></svg>
                    <div class="entity-list" id="entityList">
@@ -228,15 +643,203 @@
                </div>
            </div>
            <!-- Knowledge Base View -->
            <div id="knowledgeBaseView" class="kb-panel">
                <div class="kb-header">
                    <h2>📚 Project knowledge base</h2>
                    <div class="kb-stats">
                        <div class="kb-stat">
                            <div class="kb-stat-value" id="kbEntityCount">0</div>
                            <div class="kb-stat-label">Entities</div>
                        </div>
                        <div class="kb-stat">
                            <div class="kb-stat-value" id="kbRelationCount">0</div>
                            <div class="kb-stat-label">Relations</div>
                        </div>
                        <div class="kb-stat">
                            <div class="kb-stat-value" id="kbTranscriptCount">0</div>
                            <div class="kb-stat-label">Files</div>
                        </div>
                        <div class="kb-stat">
                            <div class="kb-stat-value" id="kbGlossaryCount">0</div>
                            <div class="kb-stat-label">Terms</div>
                        </div>
                    </div>
                </div>
                <div class="kb-content">
                    <div class="kb-sidebar">
                        <div class="kb-nav-item active" onclick="switchKBTab('entities')">🏷️ Entities</div>
                        <div class="kb-nav-item" onclick="switchKBTab('relations')">🔗 Relations</div>
                        <div class="kb-nav-item" onclick="switchKBTab('glossary')">📖 Glossary</div>
                        <div class="kb-nav-item" onclick="switchKBTab('transcripts')">📁 Files</div>
                    </div>
                    <div class="kb-main">
                        <!-- Entities Section -->
                        <div class="kb-section active" id="kbEntitiesSection">
                            <h3 style="margin-bottom:16px;">All entities</h3>
                            <div class="kb-entity-grid" id="kbEntityGrid"></div>
                        </div>
                        <!-- Relations Section -->
                        <div class="kb-section" id="kbRelationsSection">
                            <h3 style="margin-bottom:16px;">All relations</h3>
                            <div id="kbRelationsList"></div>
                        </div>
                        <!-- Glossary Section -->
                        <div class="kb-section" id="kbGlossarySection">
                            <div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:16px;">
                                <h3>Glossary</h3>
                                <button class="btn btn-small" onclick="showAddTermModal()">+ Add term</button>
                            </div>
                            <div id="kbGlossaryList"></div>
                        </div>
                        <!-- Transcripts Section -->
                        <div class="kb-section" id="kbTranscriptsSection">
                            <h3 style="margin-bottom:16px;">All files</h3>
                            <div id="kbTranscriptsList"></div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <!-- Upload Modal -->
    <div class="upload-overlay" id="uploadOverlay">
        <div class="upload-box">
-            <h2 style="margin-bottom:10px;">Upload audio for analysis</h2>
+            <h2 style="margin-bottom:10px;">Upload file</h2>
-            <p style="color:#666;">Supports MP3, WAV, M4A (max 500MB)</p>
+            <div class="upload-tabs">
-            <input type="file" id="fileInput" accept="audio/*" hidden>
+                <div class="upload-tab active" onclick="switchUploadTab('audio')">🎵 Audio</div>
-            <button class="btn" onclick="document.getElementById('fileInput').click()">Select file</button>
+                <div class="upload-tab" onclick="switchUploadTab('document')">📄 Document</div>
-            <br><br>
-            <button class="btn" style="background:#333;" onclick="hideUpload()">Cancel</button>
            </div>
            <p style="color:#666;" id="uploadHint">Supports MP3, WAV, M4A (max 500MB)</p>
            <input type="file" id="fileInput" accept="audio/*" hidden>
            <input type="file" id="docInput" accept=".pdf,.docx,.doc,.txt,.md" hidden>
            <button class="btn" onclick="triggerFileSelect()">Select file</button>
            <br><br>
            <button class="btn btn-secondary" onclick="hideUpload()">Cancel</button>
        </div>
    </div>
    <!-- Entity Editor Modal -->
    <div class="modal-overlay" id="entityModal">
        <div class="modal">
            <h3 class="modal-header">Edit entity</h3>
            <div class="form-group">
                <label>Entity name</label>
                <input type="text" id="entityName" placeholder="Entity name">
            </div>
            <div class="form-group">
                <label>Entity type</label>
                <select id="entityType">
                    <option value="PROJECT">Project (PROJECT)</option>
                    <option value="TECH">Technology (TECH)</option>
                    <option value="PERSON">Person (PERSON)</option>
                    <option value="ORG">Organization (ORG)</option>
                    <option value="OTHER">Other (OTHER)</option>
                </select>
            </div>
            <div class="form-group">
                <label>Definition</label>
                <textarea id="entityDefinition" placeholder="Describe this entity in one sentence..."></textarea>
            </div>
            <div class="form-group">
                <label>Aliases (comma-separated)</label>
                <input type="text" id="entityAliases" placeholder="Alias 1, Alias 2, Alias 3">
            </div>
            <div class="relation-editor" id="relationEditor" style="display:none;">
                <h4 style="color:#888;font-size:0.9rem;margin-bottom:12px;">Entity relations</h4>
                <div id="relationList"></div>
                <button class="btn-icon" onclick="showAddRelation()" style="margin-top:8px;">+ Add relation</button>
            </div>
            <div class="modal-actions">
                <button class="btn btn-danger" onclick="deleteEntity()" id="deleteEntityBtn">Delete</button>
                <button class="btn btn-secondary" onclick="hideEntityModal()">Cancel</button>
                <button class="btn" onclick="saveEntity()">Save</button>
            </div>
        </div>
    </div>
    <!-- Add Relation Modal -->
    <div class="modal-overlay" id="relationModal">
        <div class="modal">
            <h3 class="modal-header">Add relation</h3>
            <div class="form-group">
                <label>Target entity</label>
                <select id="relationTarget"></select>
            </div>
            <div class="form-group">
                <label>Relation type</label>
                <select id="relationType">
                    <option value="belongs_to">Belongs to (belongs_to)</option>
                    <option value="works_with">Works with (works_with)</option>
                    <option value="depends_on">Depends on (depends_on)</option>
                    <option value="mentions">Mentions (mentions)</option>
                    <option value="related">Related (related)</option>
                </select>
            </div>
            <div class="form-group">
                <label>Relation evidence / notes</label>
                <textarea id="relationEvidence" placeholder="Describe the basis for this relation..."></textarea>
            </div>
            <div class="modal-actions">
                <button class="btn btn-secondary" onclick="hideRelationModal()">Cancel</button>
                <button class="btn" onclick="saveRelation()">Add</button>
            </div>
        </div>
    </div>
    <!-- Merge Entities Modal -->
    <div class="modal-overlay" id="mergeModal">
        <div class="modal">
            <h3 class="modal-header">Merge entities</h3>
            <p style="color:#888;margin-bottom:16px;font-size:0.9rem;">Merge the selected entity into the target entity</p>
            <div class="form-group">
                <label>Source entity</label>
                <input type="text" id="mergeSource" disabled>
            </div>
            <div class="form-group">
                <label>Target entity (kept)</label>
                <select id="mergeTarget"></select>
            </div>
            <div class="modal-actions">
                <button class="btn btn-secondary" onclick="hideMergeModal()">Cancel</button>
                <button class="btn" onclick="confirmMerge()">Merge</button>
            </div>
        </div>
    </div>
    <!-- Add Glossary Term Modal -->
    <div class="modal-overlay" id="glossaryModal">
        <div class="modal">
            <h3 class="modal-header">Add term</h3>
            <div class="form-group">
                <label>Term</label>
                <input type="text" id="glossaryTerm" placeholder="Term name">
            </div>
            <div class="form-group">
                <label>Pronunciation hint (optional)</label>
                <input type="text" id="glossaryPronunciation" placeholder="e.g. K8s is pronounced Kubernetes">
            </div>
            <div class="modal-actions">
                <button class="btn btn-secondary" onclick="hideGlossaryModal()">Cancel</button>
                <button class="btn" onclick="saveGlossaryTerm()">Add</button>
            </div>
        </div>
    </div>
    <!-- Context Menu -->
    <div class="context-menu" id="contextMenu">
        <div class="context-menu-item" onclick="editEntity()">✏️ Edit entity</div>
        <div class="context-menu-item" onclick="showMergeModal()">🔄 Merge entity</div>
        <div class="context-menu-divider"></div>
        <div class="context-menu-item" onclick="createEntityFromSelection()">➕ Mark as entity</div>
    </div>
    <!-- Selection Toolbar -->
    <div class="selection-toolbar" id="selectionToolbar">
        <span style="color:#888;font-size:0.85rem;">Selected text:</span>
        <button class="btn-icon" onclick="createEntityFromSelection()">Mark as entity</button>
        <button class="btn-icon" onclick="hideSelectionToolbar()">Cancel</button>
    </div>
    <script src="app.js"></script>
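The markup in this diff wires each `.kb-nav-item` to `switchKBTab(...)`, and the stylesheet shows a `.kb-section` only when it carries the `active` class. The handler itself lives in `app.js`, which is not part of this excerpt, so the following is only a minimal sketch of how such a handler might work, assuming section ids follow the `kb<Tab>Section` pattern seen in the HTML. The helper `kbSectionId` and the optional `root` parameter are hypothetical names introduced for illustration.

```javascript
// Hypothetical sketch, not the app.js shipped in this commit.

// Map a tab key from the onclick handlers to a section id,
// e.g. 'entities' -> 'kbEntitiesSection' (assumed naming pattern).
function kbSectionId(tab) {
  return 'kb' + tab.charAt(0).toUpperCase() + tab.slice(1) + 'Section';
}

// Toggle the `active` class so that only the chosen nav item and section
// match `.kb-nav-item.active` and `.kb-section.active`.
// `root` is any object exposing querySelectorAll; it defaults to `document`.
function switchKBTab(tab, root = document) {
  const targetId = kbSectionId(tab);

  root.querySelectorAll('.kb-section').forEach((section) => {
    section.classList.toggle('active', section.id === targetId);
  });

  root.querySelectorAll('.kb-nav-item').forEach((item) => {
    // Match the nav item by the tab key embedded in its inline onclick.
    const handler = item.getAttribute('onclick') || '';
    item.classList.toggle('active', handler.includes("'" + tab + "'"));
  });

  return targetId;
}
```

Passing `root` explicitly keeps the function testable outside a browser; in the page, a call such as `switchKBTab('glossary')` would fall back to `document`.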
Author	SHA1	Message	Date
OpenClaw Bot	7e192a9f0a	Add Phase 3 feature documentation	2026-02-18 12:13:51 +08:00
OpenClaw Bot	5005a2df52	Add deployment script for Phase 3	2026-02-18 12:13:22 +08:00
OpenClaw Bot	da8a4db985	Phase 3: Memory & Growth - Multi-file fusion, Entity alignment with embedding, Document import, Knowledge base panel	2026-02-18 12:12:39 +08:00
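OpenClaw Bot	643fe46780	feat: Phase 2 interaction and correction workbench complete - Add entity editing API (name, type, definition, aliases) - Add entity deletion and merging - Add relation management (create, delete) - Add transcript text editing - Add entity creation from selected text - Add entity editor modal to the frontend - Add context menu and toolbar to the frontend - Improve two-way linking between text and graph	2026-02-18 06:03:51 +08:00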