Agent Safety: Guardrails

Input Guardrail

ADK uses a callback mechanism: hooks inserted into the agent's execution cycle.

`before_model_callback`

before_model_callback runs just before the agent sends the assembled request (chat history, instructions, user information, and so on) to the LLM.

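Typical uses:

Input Validation/Filtering. Sanity checks on the input, for example whether it is actually a question.
Guardrails. Content moderation, such as blocking disallowed keywords.
Dynamic Prompt Modification. Add timely information to the request's context.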

First, define a function that takes callback_context: CallbackContext and llm_request: LlmRequest as parameters.

The former provides information about the agent itself and the session state.
The latter contains the complete message about to be sent to the LLM (contents, config).

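Inside the function:

We can inspect llm_request.contents to see what ADK is about to send.
We can modify llm_request.
If the function returns an LlmResponse object, ADK skips the LLM call for the current turn and returns that LlmResponse directly.
If it returns None, ADK sends the request to the LLM as usual.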
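The return-value contract above can be illustrated with a small, self-contained sketch. It does not use ADK: `FakeRequest`, `FakeResponse`, and `run_model_step` are hypothetical stand-ins invented here to mimic the described behavior, not ADK's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for LlmRequest: just a list of message strings."""
    contents: List[str] = field(default_factory=list)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for LlmResponse."""
    text: str


Hook = Callable[[FakeRequest], Optional[FakeResponse]]
Model = Callable[[FakeRequest], FakeResponse]


def run_model_step(request: FakeRequest, hook: Hook, call_model: Model) -> FakeResponse:
    """Run the hook first; call the model only if the hook returns None."""
    override = hook(request)
    if override is not None:
        return override          # the model call is skipped
    return call_model(request)   # the (possibly modified) request goes to the model


def add_note_hook(request: FakeRequest) -> Optional[FakeResponse]:
    """Modifies the request in place, then lets the call proceed."""
    request.contents.append("note: reply briefly")
    return None


def block_all_hook(request: FakeRequest) -> Optional[FakeResponse]:
    """Short-circuits the model call with a canned response."""
    return FakeResponse(text="blocked")


def echo_model(request: FakeRequest) -> FakeResponse:
    """Toy model that reports how many messages it received."""
    return FakeResponse(text=f"model saw {len(request.contents)} message(s)")
```

With `add_note_hook` the toy model sees the extra message; with `block_all_hook` it is never called and the canned response is returned instead.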

from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_request import LlmRequest
from google.adk.models.llm_response import LlmResponse
from google.genai import types # For creating response content
from typing import Optional

# Callback signature: takes a CallbackContext and an LlmRequest
def block_keyword_guardrail(
    callback_context: CallbackContext,
    llm_request: LlmRequest,
) -> Optional[LlmResponse]:
    """
    Inspects the latest user message for 'BLOCK'. If found, blocks the LLM call
    and returns a predefined LlmResponse. Otherwise, returns None to proceed.
    """

    # Get the agent name
    agent_name = callback_context.agent_name
    print(f"--- Callback: block_keyword_guardrail running for agent: {agent_name} ---")

    # Extract the latest user message from the conversation history
    last_user_message_text = ""
    if llm_request.contents:
        # Find the most recent message with role 'user'
        for content in reversed(llm_request.contents):
            if content.role == 'user' and content.parts:
                # Assuming text is in the first part for simplicity
                if content.parts[0].text:
                    last_user_message_text = content.parts[0].text
                    break # Found the last user message text

    print(f"--- Callback: Inspecting last user message: '{last_user_message_text[:100]}...' ---") # Log first 100 chars

    # --- Content moderation: block a specific keyword ---
    keyword_to_block = "BLOCK"
    if keyword_to_block in last_user_message_text.upper(): # Case-insensitive check
        print(f"--- Callback: Found '{keyword_to_block}'. Blocking LLM call! ---")
        # Optionally, set a flag in state to record the block event
        callback_context.state["guardrail_block_keyword_triggered"] = True
        print(f"--- Callback: Set state 'guardrail_block_keyword_triggered': True ---")

        # Construct and return an LlmResponse to stop the flow and send this back instead
        return LlmResponse(
            content=types.Content(
                role="model", # Mimic a response from the agent's perspective
                parts=[types.Part(text=f"I cannot process this request because it contains the blocked keyword '{keyword_to_block}'.")],
            )
            # Note: You could also set an error_message field here if needed
        )
    else: # Otherwise keyword not found, allow the request to proceed to the LLM
        print(f"--- Callback: Keyword not found. Allowing LLM call for {agent_name}. ---")
        return None # Returning None signals ADK to continue normally
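The callback above only looks at the first part of the last user message. A slightly more thorough variant might join the text of every part. The sketch below uses duck-typed stand-ins (`SimpleNamespace`) rather than real `types.Content` objects, and `last_user_text` is a hypothetical helper name, not part of ADK.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def last_user_text(contents: Optional[Iterable]) -> str:
    """Return the joined text of all parts in the most recent 'user' message that has text."""
    for content in reversed(list(contents or [])):
        if getattr(content, "role", None) != "user":
            continue
        parts = getattr(content, "parts", None) or []
        # Keep only parts that carry non-empty text.
        texts = [p.text for p in parts if getattr(p, "text", None)]
        if texts:
            return " ".join(texts)
    return ""


# Stand-in objects that only mimic the .role / .parts / .text attributes.
history = [
    SimpleNamespace(role="user", parts=[SimpleNamespace(text="hello")]),
    SimpleNamespace(role="model", parts=[SimpleNamespace(text="hi there")]),
    SimpleNamespace(role="user", parts=[
        SimpleNamespace(text=None),                 # e.g. a non-text part
        SimpleNamespace(text="please BLOCK this"),
    ]),
]
```

Here the first part of the last user message has no text, so a first-part-only check would miss the keyword, while `last_user_text(history)` still finds it.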

Next, pass before_model_callback=... when defining the Agent.

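# Assumes root_agent_model, get_weather_stateful, greeting_agent, and farewell_agent are defined earlier.
from google.adk.agents import Agent

root_agent_model_guardrail = Agent(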
    name="weather_agent_v5_model_guardrail", # New version name for clarity
    model=root_agent_model,
    description="Main agent: Handles weather, delegates greetings/farewells, includes input keyword guardrail.",
    instruction="You are the main Weather Agent. Provide weather using 'get_weather_stateful'. "
                "Delegate simple greetings to 'greeting_agent' and farewells to 'farewell_agent'. "
                "Handle only weather requests, greetings, and farewells.",
    tools=[get_weather_stateful],
    sub_agents=[greeting_agent, farewell_agent], # Reference the redefined sub-agents
    output_key="last_weather_report", # Keep output_key from Step 4
    before_model_callback=block_keyword_guardrail # <<< Assign the guardrail callback
)

Tool Argument Guardrail

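Similar to the Input Guardrail, this checks whether the arguments passed to a tool are acceptable (for example, if the tool creates a file, check that the file name and path are valid).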
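As a rough illustration of the file-path case, the check might look like the sketch below. It is plain Python with no ADK dependency; the `path` argument name, the `ALLOWED_ROOT` sandbox directory, and the `check_file_args` helper are all assumptions made up for this example.

```python
from pathlib import Path
from typing import Any, Dict, Optional

ALLOWED_ROOT = Path("/tmp/agent_workspace")  # hypothetical sandbox directory


def check_file_args(args: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return an error dict if args['path'] is unsafe, otherwise None."""
    raw = args.get("path")
    if not isinstance(raw, str) or not raw.strip():
        return {"status": "error", "error_message": "Missing or empty 'path'."}
    if "\x00" in raw:
        return {"status": "error", "error_message": "Path contains a null byte."}

    # Resolve relative to the sandbox; resolve() collapses '..' segments,
    # and an absolute input replaces the sandbox prefix entirely.
    root = ALLOWED_ROOT.resolve()
    resolved = (ALLOWED_ROOT / raw).resolve()
    if root not in resolved.parents:
        return {"status": "error", "error_message": f"Path escapes the sandbox: {raw!r}"}
    return None
```

A tool guardrail could call such a helper and return its result directly: an error dict blocks the tool, and `None` lets it run.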

graph TB
A[LLM requests Tool Use and decides arguments] --> B[Tool Argument Guardrail]
B --> C[Tool function executes]

Argument validation
Resource Protection. Prevent tools from being called with inputs that might be costly, access restricted data, or cause unwanted side effects (e.g., blocking API calls for certain parameters)
Dynamic Argument Modification
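Dynamic argument modification can be as simple as cleaning up an argument before the tool sees it. The sketch below mirrors the (tool, args, tool_context) parameter order but uses no ADK types, and it assumes the callback receives the same `args` dict that is later passed to the tool. `normalize_city_arg` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def normalize_city_arg(tool: Any, args: Dict[str, Any], tool_context: Any) -> Optional[Dict]:
    """Trim and title-case args['city'] in place, then allow the tool to run."""
    city = args.get("city")
    if isinstance(city, str):
        # Collapse repeated whitespace and normalize capitalization.
        args["city"] = " ".join(city.split()).title()
    return None  # None means: do not block, continue with the (modified) args
```

Because the function mutates `args` and returns `None`, the tool would still execute, just with the cleaned-up value.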

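First, define a function with the following parameters:

tool: BaseTool. The tool about to be called.
args: Dict[str, Any]. The arguments for this tool call.
tool_context: ToolContext. The session state, agent info, and so on at the time the tool runs.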

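As with the Input Guardrail, if the callback returns a Dict, tool execution is skipped and that Dict is used as the tool's result (so its format should match what the tool would normally return); if it returns None, the tool runs as usual.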
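Why the returned dict should match the tool's own format can be seen in a toy example. Everything here is a stand-in written for illustration: `fake_get_weather`, `call_tool`, and `summarize` are hypothetical, and the `report` key on success is an assumed result shape.

```python
from typing import Any, Callable, Dict, Optional

ToolFn = Callable[..., Dict[str, Any]]
Guard = Callable[[str, Dict[str, Any]], Optional[Dict[str, Any]]]


def fake_get_weather(city: str) -> Dict[str, Any]:
    """Toy tool: returns the same dict shape on success and on error."""
    if city.lower() == "atlantis":
        return {"status": "error", "error_message": f"No data for '{city}'."}
    return {"status": "success", "report": f"Sunny in {city}."}


def block_paris(tool_name: str, args: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Guardrail that reuses the tool's own error shape."""
    if str(args.get("city", "")).lower() == "paris":
        return {"status": "error", "error_message": "Paris is blocked by policy."}
    return None


def call_tool(tool: ToolFn, args: Dict[str, Any], guard: Guard) -> Dict[str, Any]:
    """Run the guard first; execute the tool only if the guard returns None."""
    blocked = guard(tool.__name__, args)
    return blocked if blocked is not None else tool(**args)


def summarize(result: Dict[str, Any]) -> str:
    """Downstream code that relies on the shared 'status' key."""
    if result["status"] == "success":
        return result["report"]
    return f"Error: {result['error_message']}"
```

Since the guardrail's dict has the same keys as a genuine tool error, `summarize` handles a blocked call and a failed call identically, with no special case for the guardrail.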

from google.adk.tools.base_tool import BaseTool
from google.adk.tools.tool_context import ToolContext
from typing import Optional, Dict, Any # For type hints

def block_paris_tool_guardrail(
    tool: BaseTool,
    args: Dict[str, Any],
    tool_context: ToolContext,
) -> Optional[Dict]:
    """
    Checks if 'get_weather_stateful' is called for 'Paris'.
    If so, blocks the tool execution and returns a specific error dictionary.
    Otherwise, allows the tool call to proceed by returning None.
    """
    tool_name = tool.name
    agent_name = tool_context.agent_name # Agent attempting the tool call

    # --- Guardrail Logic ---
    target_tool_name = "get_weather_stateful" # Match the function name used by FunctionTool
    blocked_city = "paris"

    # Check if it's the correct tool and the city argument matches the blocked city
    if tool_name == target_tool_name:
        city_argument = args.get("city", "") # Safely get the 'city' argument
        if city_argument and city_argument.lower() == blocked_city:
            print(f"--- Callback: Detected blocked city '{city_argument}'. Blocking tool execution! ---")
            # Optionally update state
            tool_context.state["guardrail_tool_block_triggered"] = True
            print(f"--- Callback: Set state 'guardrail_tool_block_triggered': True ---")

            # Return a dictionary matching the tool's expected output format for errors
            # This dictionary becomes the tool's result, skipping the actual tool run.
            return {
                "status": "error",
                "error_message": f"Policy restriction: Weather checks for '{city_argument.capitalize()}' are currently disabled by a tool guardrail."
            }
        else:
            print(f"--- Callback: City '{city_argument}' is allowed for tool '{tool_name}'. ---")
    else:
        print(f"--- Callback: Tool '{tool_name}' is not the target tool. Allowing. ---")


    # If the checks above didn't return a dictionary, allow the tool to execute
    print(f"--- Callback: Allowing tool '{tool_name}' to proceed. ---")
    return None # Returning None allows the actual tool function to run

Then pass before_tool_callback=... when defining the Agent.

root_agent_tool_guardrail = Agent(
    name="weather_agent_v6_tool_guardrail", # New version name
    model=root_agent_model,
    description="Main agent: Handles weather, delegates, includes input AND tool guardrails.",
    instruction="You are the main Weather Agent. Provide weather using 'get_weather_stateful'. "
                "Delegate greetings to 'greeting_agent' and farewells to 'farewell_agent'. "
                "Handle only weather, greetings, and farewells.",
    tools=[get_weather_stateful],
    sub_agents=[greeting_agent, farewell_agent],
    output_key="last_weather_report",
    before_model_callback=block_keyword_guardrail, # Keep model guardrail
    before_tool_callback=block_paris_tool_guardrail # <<< Add tool guardrail
)