26 changes: 22 additions & 4 deletions docs/en/Components/Config.md
@@ -102,6 +102,24 @@ tools:
url: https://mcp.api-inference.modelscope.net/xxx/sse
exclude:
- map_geo
# Local codebase / document search (sirchmunk), exposed as the `localsearch` tool
localsearch:
mcp: false
paths:
- ./src
- ./docs
work_path: ./.sirchmunk
mode: FAST
# Optional: llm_api_key, llm_base_url, llm_model_name (else inherited from `llm`)
# When true, a shallow sirchmunk DirectoryScanner run at tool connect injects file titles/previews
# into the `localsearch` tool description (default: false)
# description_catalog: false
# description_catalog_max_files: 120
# description_catalog_max_depth: 5
# description_catalog_max_chars: 10000
# description_catalog_max_preview_chars: 400
# description_catalog_cache_ttl_seconds: 300
# description_catalog_exclude: [] # extra globs / dir names merged with sirchmunk defaults
```
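The commented `description_catalog_*` options describe a size-bounded shallow scan whose output is injected into the tool description. As a rough illustration of how such a catalog could be assembled (this is not sirchmunk's actual DirectoryScanner API; `build_catalog` and its defaults are hypothetical, mirroring the option names above):

```python
import os

def build_catalog(paths, max_files=120, max_depth=5,
                  max_chars=10000, max_preview_chars=400):
    """Shallow-scan `paths` and return a text catalog of file previews."""
    entries = []
    for root_path in paths:
        root_path = os.path.abspath(root_path)
        base_depth = root_path.rstrip(os.sep).count(os.sep)
        for dirpath, dirnames, filenames in os.walk(root_path):
            # Enforce the depth budget by pruning subdirectories in place.
            if dirpath.count(os.sep) - base_depth >= max_depth:
                dirnames[:] = []
            for name in sorted(filenames):
                if len(entries) >= max_files:
                    break
                full = os.path.join(dirpath, name)
                try:
                    with open(full, 'r', encoding='utf-8',
                              errors='ignore') as f:
                        preview = f.read(max_preview_chars)
                except OSError:
                    continue
                entries.append(
                    f'## {os.path.relpath(full, root_path)}\n{preview}')
    catalog = '\n\n'.join(entries)
    # Overall size budget so the tool description stays bounded.
    return catalog[:max_chars]
```

A `description_catalog_exclude` list would prune `dirnames` and skip matching filenames during the walk; the cache TTL option suggests the result is memoized per connect.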

For the complete list of supported tools and custom tools, please refer to [here](./Tools.md)
@@ -167,19 +185,19 @@ In addition to yaml configuration, MS-Agent also supports several additional com

> Any configuration in agent.yaml can be passed in with new values via command line, and also supports reading from environment variables with the same name (case insensitive), for example `--llm.modelscope_api_key xxx-xxx`.

- knowledge_search_paths: Knowledge search paths, comma-separated multiple paths. When provided, automatically enables SirchmunkSearch for knowledge retrieval, with LLM configuration automatically inherited from the `llm` module.
- knowledge_search_paths: Comma-separated local search paths. Merges into `tools.localsearch.paths` and registers the **`localsearch`** tool (sirchmunk) for on-demand use by the model; it is not injected automatically each turn. LLM settings are inherited from the `llm` module unless you set the `tools.localsearch.llm_*` fields.
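The merge step can be pictured as follows (a sketch only; `merge_search_paths` is a hypothetical helper, and ms-agent's actual config handling may differ):

```python
def merge_search_paths(cli_value, existing_paths):
    """Merge comma-separated CLI paths into tools.localsearch.paths,
    dropping duplicates while preserving order."""
    cli_paths = [p.strip() for p in cli_value.split(',') if p.strip()]
    merged = list(existing_paths)
    for p in cli_paths:
        if p not in merged:
            merged.append(p)
    return merged
```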

### Quick Start for Knowledge Search

Use the `--knowledge_search_paths` parameter to quickly enable knowledge search based on local documents:
Use `--knowledge_search_paths` or define `tools.localsearch` in yaml so the model can call `localsearch` when needed:

```bash
# Using default agent.yaml configuration, automatically reuses LLM settings
ms-agent run --query "How to implement user authentication?" --knowledge_search_paths "./src,./docs"
ms-agent run --query "How to implement user authentication?" --knowledge_search_paths "/path/to/docs"

# Specify configuration file
ms-agent run --config /path/to/agent.yaml --query "your question" --knowledge_search_paths "/path/to/docs"
```

LLM-related parameters (api_key, base_url, model) are automatically inherited from the `llm` module in the configuration file, no need to configure them repeatedly.
If you need to use independent LLM configuration in the `knowledge_search` module, you can explicitly configure `knowledge_search.llm_api_key` and other parameters in the yaml.
For a dedicated sirchmunk LLM, set `tools.localsearch.llm_api_key`, `llm_base_url`, and `llm_model_name` in yaml. Legacy top-level `knowledge_search` with the same keys is still read for backward compatibility.
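The per-field fallback described here can be sketched like so (`resolve_llm_settings` is a hypothetical helper shown only to make the inheritance rule concrete; the real resolution happens inside ms-agent):

```python
def resolve_llm_settings(localsearch_cfg, llm_cfg):
    """Per-field fallback: tools.localsearch.llm_* wins when set,
    otherwise the value is inherited from the top-level `llm` module."""
    return {
        'api_key': localsearch_cfg.get('llm_api_key') or llm_cfg.get('api_key'),
        'base_url': localsearch_cfg.get('llm_base_url') or llm_cfg.get('base_url'),
        'model': localsearch_cfg.get('llm_model_name') or llm_cfg.get('model'),
    }
```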
24 changes: 21 additions & 3 deletions docs/zh/Components/config.md
@@ -102,6 +102,24 @@ tools:
url: https://mcp.api-inference.modelscope.net/xxx/sse
exclude:
- map_geo
# Local codebase / document search (sirchmunk), exposed as the model-callable `localsearch` tool
localsearch:
mcp: false
paths:
- ./src
- ./docs
work_path: ./.sirchmunk
mode: FAST
# Optional: llm_api_key, llm_base_url, llm_model_name (inherited from `llm` if unset)
# When true, a shallow sirchmunk DirectoryScanner scan at tool connect writes file titles/previews
# into the `localsearch` tool description so the model knows roughly what the local knowledge base contains (default: false)
# description_catalog: false
# description_catalog_max_files: 120
# description_catalog_max_depth: 5
# description_catalog_max_chars: 10000
# description_catalog_max_preview_chars: 400
# description_catalog_cache_ttl_seconds: 300
# description_catalog_exclude: [] # extra globs / dir names, merged with sirchmunk's default excludes
```

For the complete list of supported tools and custom tools, please refer to [here](./tools)
@@ -165,13 +183,13 @@ handler: custom_handler
}
}
```
- knowledge_search_paths: Knowledge search paths, comma-separated. When provided, SirchmunkSearch is enabled automatically for knowledge retrieval, and the LLM configuration is reused from the `llm` module
- knowledge_search_paths: Comma-separated knowledge search paths. Merged into `tools.localsearch.paths`, registering the **`localsearch`** tool (sirchmunk) for the model to call on demand; if `tools.localsearch.llm_*` is not configured, the LLM settings are reused from the `llm` module

> Any configuration in agent.yaml can be overridden with a new value from the command line, and can also be read from an environment variable with the same name (case insensitive), for example `--llm.modelscope_api_key xxx-xxx`.

### Quick Start for Knowledge Search

Use the `--knowledge_search_paths` parameter to quickly enable knowledge search over local documents:
Use `--knowledge_search_paths` or configure `tools.localsearch` in yaml to enable local knowledge search (the model calls `localsearch` on demand):

```bash
# Using the default agent.yaml configuration; LLM settings are reused automatically
@@ -182,4 +200,4 @@ ms-agent run --config /path/to/agent.yaml --query "your question" --knowledge_sea
```

LLM-related parameters (api_key, base_url, model) are inherited automatically from the `llm` module in the configuration file; there is no need to configure them again.
To use an independent LLM configuration in the `knowledge_search` module, explicitly set `knowledge_search.llm_api_key` and related parameters in the yaml.
If sirchmunk needs a dedicated LLM, set `llm_api_key`, `llm_base_url`, and `llm_model_name` under `tools.localsearch` in the yaml.
86 changes: 0 additions & 86 deletions examples/knowledge_search/agent.yaml.example

This file was deleted.

37 changes: 1 addition & 36 deletions ms_agent/agent/agent.yaml
@@ -13,42 +13,7 @@ generation_config:

prompt:
system: |
You are an assistant that helps me complete tasks. You need to follow these instructions:

1. Analyze whether my requirements need tool-calling. If no tools are needed, you can think directly and provide an answer.

2. I will give you many tools, some of which are similar. Please carefully analyze which tool you currently need to invoke.
* If tools need to be invoked, you must call at least one tool in each round until the requirement is completed.
* If you get any useful links or images from the tool calling, output them with your answer as well.
* Check carefully the tool result, what it contains, whether it has information you need.

3. You DO NOT have built-in geocode/coordinates/links. Do not output any fake geocode/coordinates/links. Always query geocode/coordinates/links from tools first!

4. If you need to complete coding tasks, you need to carefully analyze the original requirements, provide detailed requirement analysis, and then complete the code writing.

5. This conversation is NOT for demonstration or testing purposes. Answer it as accurately as you can.

6. Do not call tools carelessly. Show your thoughts **as detailed as possible**.

7. Respond in the same language the user uses. If the user switches, switch accordingly.

For requests that require performing a specific task or retrieving information, using the following format:
```
The user needs to ...
I have analyzed this request in detail and broken it down into the following steps:
...
```
If you have tools which may help you to solve problems, follow this format to answer:
```
The user needs to ...
I have analyzed this request in detail and broken it down into the following steps:
...
First, I should use the [Tool Name] because [explain relevance]. The required input parameters are: ...
...
I have carefully reviewed the tool's output. The result does/does not fully meet my expectations. Next, I need to ...
```

**Important: Always respond in the same language the user is using.**
You are a helpful assistant.

max_chat_round: 9999

106 changes: 68 additions & 38 deletions ms_agent/agent/llm_agent.py
@@ -13,7 +13,6 @@
import json
from ms_agent.agent.runtime import Runtime
from ms_agent.callbacks import Callback, callbacks_mapping
from ms_agent.knowledge_search import SirchmunkSearch
from ms_agent.llm.llm import LLM
from ms_agent.llm.utils import Message, ToolResult
from ms_agent.memory import Memory, get_memory_meta_safe, memory_mapping
@@ -107,7 +106,6 @@ def __init__(
self.tool_manager: Optional[ToolManager] = None
self.memory_tools: List[Memory] = []
self.rag: Optional[RAG] = None
self.knowledge_search: Optional[SirchmunkSearch] = None
self.llm: Optional[LLM] = None
self.runtime: Optional[Runtime] = None
self.max_chat_round: int = 0
@@ -528,6 +526,7 @@ async def parallel_tool_call(self,
tool_call_id=tool_call_query['id'],
name=tool_call_query['tool_name'],
resources=tool_call_result_format.resources,
tool_detail=tool_call_result_format.tool_detail,
)

if _new_message.tool_call_id is None:
Expand All @@ -538,6 +537,63 @@ async def parallel_tool_call(self,
self.log_output(_new_message.content)
return messages

async def parallel_tool_call_streaming(
self, messages: List[Message]) -> AsyncGenerator:
"""Streaming variant of parallel_tool_call.

Yields messages list snapshots during tool execution:
- While tools are running: yields messages with the latest incremental
``tool_detail`` on a temporary placeholder Message (content='') so the
caller can stream logs to the frontend.
- After all tools finish: yields the final messages list (with proper
tool result Messages appended), same as parallel_tool_call.
"""
tool_calls = messages[-1].tool_calls

# Map call_id -> tool_call_query for final message construction.
call_id_to_query = {tc['id']: tc for tc in tool_calls}

# Accumulate final results keyed by call_id.
final_results: dict = {}

async for call_id, item, is_final in self.tool_manager.parallel_call_tool_streaming(
tool_calls):
if is_final:
# Final result for this call_id (any type; not inferred from content).
final_results[call_id] = item
else:
# Intermediate log line: one incremental chunk in tool_detail.
log_message = Message(
role='tool',
content='',
tool_call_id=call_id,
name=call_id_to_query.get(call_id,
{}).get('tool_name', ''),
tool_detail=item,
)
yield messages + [log_message]

# All tools done — build final tool messages and yield.
for tool_call_query in tool_calls:
cid = tool_call_query['id']
raw_result = final_results.get(
cid, f'Tool call missing result for id {cid}')
tool_call_result_format = ToolResult.from_raw(raw_result)
_new_message = Message(
role='tool',
content=tool_call_result_format.text,
tool_call_id=cid,
name=tool_call_query['tool_name'],
resources=tool_call_result_format.resources,
)
if _new_message.tool_call_id is None:
_new_message.tool_call_id = str(uuid.uuid4())[:8]
tool_call_query['id'] = _new_message.tool_call_id
messages.append(_new_message)
self.log_output(_new_message.content)

yield messages
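The protocol described in the docstring (intermediate snapshots carry an incremental `tool_detail` on a placeholder message with empty content; the final snapshot carries the finished tool results) can be consumed roughly as below. `fake_tool_stream` is a toy stand-in that uses plain dicts instead of `Message` objects:

```python
import asyncio

async def fake_tool_stream():
    """Toy stand-in for parallel_tool_call_streaming: yields snapshots with
    incremental tool_detail, then a final snapshot with real content."""
    yield [{'role': 'tool', 'content': '', 'tool_detail': 'step 1/2'}]
    yield [{'role': 'tool', 'content': '', 'tool_detail': 'step 2/2'}]
    yield [{'role': 'tool', 'content': 'final result', 'tool_detail': None}]

async def consume():
    logs, final = [], None
    async for snapshot in fake_tool_stream():
        last = snapshot[-1]
        # Mirror the progress check in `step`: empty content plus a
        # tool_detail marks an intermediate log, not a final message.
        if (last['role'] == 'tool' and last['content'] == ''
                and last['tool_detail'] is not None):
            logs.append(last['tool_detail'])
        else:
            final = last['content']
    return logs, final

logs, final = asyncio.run(consume())
```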

async def prepare_tools(self):
"""Initialize and connect the tool manager."""
self.tool_manager = ToolManager(
@@ -636,11 +692,7 @@ async def create_messages(
return messages

async def do_rag(self, messages: List[Message]):
"""Process RAG or knowledge search to enrich the user query with context.

This method handles both traditional RAG and sirchmunk-based knowledge search.
For knowledge search, it also populates searching_detail and search_result
fields in the message for frontend display and next-turn LLM context.
"""Process RAG to enrich the user query with context.

Args:
messages (List[Message]): The message list to process.
Expand All @@ -654,23 +706,6 @@ async def do_rag(self, messages: List[Message]):
# Handle traditional RAG
if self.rag is not None:
user_message.content = await self.rag.query(query)
# Handle sirchmunk knowledge search
if self.knowledge_search is not None:
# Perform search and get results
search_result = await self.knowledge_search.query(query)
search_details = self.knowledge_search.get_search_details()

# Store search details in the message for frontend display
user_message.searching_detail = search_details
user_message.search_result = search_result

# Build enriched context from search results
if search_result:
# Append search context to user query
context = search_result
user_message.content = (
f'Relevant context retrieved from codebase search:\n\n{context}\n\n'
f'User question: {query}')

async def do_skill(self,
messages: List[Message]) -> Optional[List[Message]]:
@@ -757,18 +792,6 @@
f'which supports: {list(rag_mapping.keys())}')
self.rag: RAG = rag_mapping(rag.name)(self.config)

async def prepare_knowledge_search(self):
"""Load and initialize the knowledge search component from the config."""
if self.knowledge_search is not None:
# Already initialized (e.g. by caller before run_loop), skip to avoid
# overwriting a configured instance (e.g. one with streaming callbacks set).
return
if hasattr(self.config, 'knowledge_search'):
ks_config = self.config.knowledge_search
if ks_config is not None:
self.knowledge_search: SirchmunkSearch = SirchmunkSearch(
self.config)

async def condense_memory(self, messages: List[Message]) -> List[Message]:
"""
Update memory using the current conversation history.
@@ -931,7 +954,15 @@ async def step(
self.save_history(messages)

if _response_message.tool_calls:
messages = await self.parallel_tool_call(messages)
# Use the streaming variant so intermediate tool logs are yielded
# back to the caller while the tools are still running.
async for messages in self.parallel_tool_call_streaming(messages):
_lm = messages[-1]
_progress = (
_lm.role == 'tool' and _lm.content == ''
and _lm.tool_detail is not None)
if _progress:
yield messages

await self.after_tool_call(messages)

@@ -1111,7 +1142,6 @@ async def run_loop(self, messages: Union[List[Message], str],
await self.prepare_tools()
await self.load_memory()
await self.prepare_rag()
await self.prepare_knowledge_search()
self.runtime.tag = self.tag

if messages is None: