Hermes Agent · Python · NousResearch/hermes-agent

The Self-Improving Skill System: How Hermes Builds and Refines Its Own Procedures

How Hermes discovers, loads, preprocesses, invokes, and tracks its own skills across the full lifecycle

7 stops · ~20 min · Verified 2026-04-30
What you will learn
  • How a skill is represented on disk: the YAML frontmatter schema inside `SKILL.md` and what fields drive behavior at runtime
  • How `scan_skill_commands()` walks the filesystem to build the `/command` map that every user invocation resolves against
  • How `iter_skill_index_files()` crawls local and external skill directories with a sorted, deduplication-safe walk
  • How a user message with a `/command` token is resolved to a skill, loaded, and assembled into a formatted prompt payload
  • What preprocessing steps run before skill content enters the model context: template variable substitution and optional inline shell expansion
  • How `bump_use()` and the `.usage.json` sidecar record usage events and drive the Curator's lifecycle transitions from active to stale to archived
  • How `skills_list()` exposes skills back to the user and agent via progressive disclosure: names and descriptions only, with `skill_view()` for full content
Prerequisites
  • Familiarity with Python type hints and `pathlib`
  • Basic understanding of YAML front matter (as used in Jekyll or similar)
  • No prior Hermes knowledge required
1 / 7

What a Skill Is

agent/skill_utils.py:34

The YAML frontmatter schema and the lazy YAML loader that parses it

A Hermes skill is a `SKILL.md` file in its own subdirectory under `~/.hermes/skills/`. The YAML frontmatter encodes the skill's identity and runtime contract: `name` sets the human-readable label, `description` populates the `/skills` list summary, `platform` restricts OS compatibility, and `metadata.hermes.config` declares variables that must be resolved before the skill runs.
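As a concrete illustration, a minimal `SKILL.md` for a hypothetical `gif-search` skill might look like the sketch below. The field names come from the schema above; the values are invented, and the exact shape of the `platform` and `metadata.hermes.config` entries may differ in real skills:

```markdown
---
name: gif-search
description: Find and share GIFs from a search phrase
platform: [macos, linux]
metadata:
  hermes:
    config:
      - GIPHY_API_KEY
---

# GIF Search

When the user asks for a GIF, query the provider with the configured
API key and return the top match as a link.
```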

`parse_frontmatter()` reads that contract with a lazy-imported YAML parser (`CSafeLoader` when available, `SafeLoader` as fallback), avoiding a full PyYAML import at startup. If the YAML is malformed, a line-by-line `key: value` splitter rescues the record rather than dropping it silently. The `(frontmatter_dict, body)` tuple it returns is consumed by every downstream function, from platform matching to config injection.

Key takeaway

Every skill's behavior is encoded in its `SKILL.md` frontmatter. `parse_frontmatter()` is the single shared reader, with a rescue path that keeps malformed skills alive rather than dropping them.

def yaml_load(content: str):
    """Parse YAML with lazy import and CSafeLoader preference."""
    global _yaml_load_fn
    if _yaml_load_fn is None:
        import yaml

        loader = getattr(yaml, "CSafeLoader", None) or yaml.SafeLoader

        def _load(value: str):
            return yaml.load(value, Loader=loader)

        _yaml_load_fn = _load
    return _yaml_load_fn(content)


# ── Frontmatter parsing ──────────────────────────────────────────────────


def parse_frontmatter(content: str) -> Tuple[Dict[str, Any], str]:
    """Parse YAML frontmatter from a markdown string.

    Uses yaml with CSafeLoader for full YAML support (nested metadata, lists)
    with a fallback to simple key:value splitting for robustness.

    Returns:
        (frontmatter_dict, remaining_body)
    """
    frontmatter: Dict[str, Any] = {}
    body = content

    if not content.startswith("---"):
        return frontmatter, body

    end_match = re.search(r"\n---\s*\n", content[3:])
    if not end_match:
        return frontmatter, body

    yaml_content = content[3 : end_match.start() + 3]
    body = content[end_match.end() + 3 :]

    try:
        parsed = yaml_load(yaml_content)
        if isinstance(parsed, dict):
            frontmatter = parsed
    except Exception:
        # Fallback: simple key:value parsing for malformed YAML
        for line in yaml_content.strip().split("\n"):
            if ":" not in line:
                continue
            key, value = line.split(":", 1)
            frontmatter[key.strip()] = value.strip()

    return frontmatter, body


# ── Platform matching ─────────────────────────────────────────────────────

2 / 7

Skill Creation from Files

agent/skill_commands.py:215

How scan_skill_commands() turns SKILL.md files on disk into the slash-command registry

Skills are not registered by code; they are discovered by dropping a `SKILL.md` file into the skills directory. `scan_skill_commands()` turns those files into a live command registry at startup and again on `/reload-skills`. It scans `~/.hermes/skills/` first, then any `skills.external_dirs` from `config.yaml`, with `seen_names` ensuring local skills win on name collisions.

For each file found, the function applies three filters before adding the skill:

  1. Platform compatibility (macOS, Linux, or Windows)
  2. The user's disabled list from config
  3. Deduplication against already-seen names

If the frontmatter has no description, the scanner falls back to the first non-heading prose line in the body. The skill name is then slugified (spaces and underscores become hyphens, invalid characters stripped) to produce a `/command` key safe for both Telegram bot names and the CLI.
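The slug rules can be sketched in isolation. `_INVALID` and `_MULTI` below are stand-ins for the real `_SKILL_INVALID_CHARS` and `_SKILL_MULTI_HYPHEN` patterns, whose exact definitions are not shown in this excerpt:

```python
import re

# Stand-in patterns: strip anything that isn't a lowercase slug character,
# then collapse runs of hyphens left behind by stripped characters.
_INVALID = re.compile(r"[^a-z0-9-]")
_MULTI = re.compile(r"-{2,}")

def slugify(name: str) -> str:
    """Mirror the normalization steps described above."""
    cmd = name.lower().replace(" ", "-").replace("_", "-")
    cmd = _INVALID.sub("", cmd)
    return _MULTI.sub("-", cmd).strip("-")

print(slugify("GIF Search+Pro"))  # -> gif-searchpro
print(slugify("claude_code"))     # -> claude-code
print(slugify("My  Skill"))       # -> my-skill
```

A name that strips down to nothing (e.g. `"+++"`) yields an empty slug, which the scanner skips rather than registering a bare `/`.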

Key takeaway

Skill registration is a filesystem event. Dropping a `SKILL.md` and running `/reload-skills` is enough; `scan_skill_commands()` handles discovery, filtering, and slugification.

def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    """Scan ~/.hermes/skills/ and return a mapping of /command -> skill info.

    Returns:
        Dict mapping "/skill-name" to {name, description, skill_md_path, skill_dir}.
    """
    global _skill_commands
    _skill_commands = {}
    try:
        from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform, _get_disabled_skill_names
        from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
        disabled = _get_disabled_skill_names()
        seen_names: set = set()

        # Scan local dir first, then external dirs
        dirs_to_scan = []
        if SKILLS_DIR.exists():
            dirs_to_scan.append(SKILLS_DIR)
        dirs_to_scan.extend(get_external_skills_dirs())

        for scan_dir in dirs_to_scan:
            for skill_md in iter_skill_index_files(scan_dir, "SKILL.md"):
                if any(part in ('.git', '.github', '.hub', '.archive') for part in skill_md.parts):
                    continue
                try:
                    content = skill_md.read_text(encoding='utf-8')
                    frontmatter, body = _parse_frontmatter(content)
                    # Skip skills incompatible with the current OS platform
                    if not skill_matches_platform(frontmatter):
                        continue
                    name = frontmatter.get('name', skill_md.parent.name)
                    if name in seen_names:
                        continue
                    # Respect user's disabled skills config
                    if name in disabled:
                        continue
                    description = frontmatter.get('description', '')
                    if not description:
                        for line in body.strip().split('\n'):
                            line = line.strip()
                            if line and not line.startswith('#'):
                                description = line[:80]
                                break
                    seen_names.add(name)
                    # Normalize to hyphen-separated slug, stripping
                    # non-alnum chars (e.g. +, /) to avoid invalid
                    # Telegram command names downstream.
                    cmd_name = name.lower().replace(' ', '-').replace('_', '-')
                    cmd_name = _SKILL_INVALID_CHARS.sub('', cmd_name)
                    cmd_name = _SKILL_MULTI_HYPHEN.sub('-', cmd_name).strip('-')
                    if not cmd_name:
                        continue
                    _skill_commands[f"/{cmd_name}"] = {
                        "name": name,
                        "description": description or f"Invoke the {name} skill",
                        "skill_md_path": str(skill_md),
                        "skill_dir": str(skill_md.parent),
                    }
                except Exception:
                    continue
    except Exception:
        pass
    return _skill_commands


3 / 7

Skill Indexing

agent/skill_utils.py:440

How iter_skill_index_files() walks skill directories to build the index

`iter_skill_index_files()` is the shared tree walker for the entire skill system; both `scan_skill_commands()` and `_find_all_skills()` delegate to it. It uses `os.walk` with `followlinks=True` to traverse symlinked skill directories, and mutates `dirs[:]` in place to prune excluded paths before recursion. The excluded set (`.git`, `.github`, `.hub`, `.archive`) covers version-control metadata, hub bundles, and the archived-skill holding area.

The two-phase approach (collect, then sort and yield) guarantees deterministic file order regardless of the OS's directory-entry ordering. Without a stable sort, the winner of a name collision across directories would depend on filesystem ordering, making results non-reproducible.

`skills/index-cache/` holds pre-built JSON snapshots from hub providers. Those feed the hub installer; `iter_skill_index_files()` only sees `SKILL.md` files already extracted onto disk.

Key takeaway

Sort order is a correctness guarantee, not a convenience. `iter_skill_index_files()` sorts before yielding, so the deduplication logic in callers always resolves collisions the same way.

def iter_skill_index_files(skills_dir: Path, filename: str):
    """Walk skills_dir yielding sorted paths matching *filename*.

    Excludes ``.git``, ``.github``, ``.hub``, ``.archive`` directories.
    """
    matches = []
    for root, dirs, files in os.walk(skills_dir, followlinks=True):
        dirs[:] = [d for d in dirs if d not in EXCLUDED_SKILL_DIRS]
        if filename in files:
            matches.append(Path(root) / filename)
    for path in sorted(matches, key=lambda p: str(p.relative_to(skills_dir))):
        yield path
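A toy reproduction of the walk shows both behaviors at once: the pruned `.git` directory never yields a match, and sorting by relative path fixes the order regardless of how `os.walk` happened to visit the entries (the directory names here are invented):

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Deliberately create entries in non-alphabetical order, plus an
    # excluded directory that should be pruned before recursion.
    for sub in ("zeta-skill", "alpha-skill", ".git"):
        (root / sub).mkdir()
        (root / sub / "SKILL.md").write_text("---\nname: x\n---\n")

    matches = []
    for r, dirs, files in os.walk(root, followlinks=True):
        dirs[:] = [d for d in dirs if d not in {".git", ".github", ".hub", ".archive"}]
        if "SKILL.md" in files:
            matches.append(Path(r) / "SKILL.md")
    ordered = sorted(matches, key=lambda p: str(p.relative_to(root)))
    print([p.parent.name for p in ordered])  # -> ['alpha-skill', 'zeta-skill']
```

Whichever skill sorts first wins any later name collision, and that winner is the same on every machine.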


# ── Namespace helpers for plugin-provided skills ───────────────────────────

_NAMESPACE_RE = re.compile(r"^[a-zA-Z0-9_-]+$")

4 / 7

Skill Retrieval at Runtime

agent/skill_commands.py:352

How a user-typed /command is resolved to a skill and assembled into a prompt payload

Retrieval is exact slash-command matching with a single normalization step. When a user types `/gif-search` or `/gif_search`, `resolve_skill_command_key()` collapses underscores to hyphens and does a direct dictionary lookup. The Telegram constraint explains the design: Telegram bot commands cannot contain hyphens, so the same skill arrives as `/gif_search` from Telegram but is stored under `/gif-search` internally. Normalization bridges both without a separate registry.
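The lookup can be sketched against a stub registry (the `/gif-search` entry is invented; the real map comes from `get_skill_commands()`):

```python
from typing import Dict, Optional

# Stub registry standing in for the scan_skill_commands() output.
_registry: Dict[str, dict] = {"/gif-search": {"name": "GIF Search"}}

def resolve(command: str) -> Optional[str]:
    """Collapse underscores to hyphens, then do an exact dict lookup."""
    if not command:
        return None
    key = f"/{command.replace('_', '-')}"
    return key if key in _registry else None

print(resolve("gif_search"))  # -> /gif-search  (Telegram form)
print(resolve("gif-search"))  # -> /gif-search  (CLI form)
print(resolve("unknown"))     # -> None
```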

`build_skill_invocation_message()` is the runtime entry point once a key is resolved. It loads `SKILL.md` via `_load_skill_payload()`, then calls `bump_use()` in a `try`/`except`: usage tracking is best-effort and must never block an invocation. The activation note injected at the top of the assembled message tells the model the user invoked this skill intentionally, and the `user_instruction` field carries any text typed after the command as an inline override the model can weigh against the skill's default behavior.

Key takeaway

Skill retrieval is a dictionary lookup, not a vector search. Underscore-to-hyphen normalization is the sole disambiguation step, chosen to reconcile Telegram's naming constraint with the internal slug format.

def resolve_skill_command_key(command: str) -> Optional[str]:
    """Resolve a user-typed /command to its canonical skill_cmds key.

    Skills are always stored with hyphens — ``scan_skill_commands`` normalizes
    spaces and underscores to hyphens when building the key. Hyphens and
    underscores are treated interchangeably in user input: this matches
    ``_check_unavailable_skill`` and accommodates Telegram bot-command names
    (which disallow hyphens, so ``/claude-code`` is registered as
    ``/claude_code`` and comes back in the underscored form).

    Returns the matching ``/slug`` key from ``get_skill_commands()`` or
    ``None`` if no match.
    """
    if not command:
        return None
    cmd_key = f"/{command.replace('_', '-')}"
    return cmd_key if cmd_key in get_skill_commands() else None


def build_skill_invocation_message(
    cmd_key: str,
    user_instruction: str = "",
    task_id: str | None = None,
    runtime_note: str = "",
) -> Optional[str]:
    """Build the user message content for a skill slash command invocation.

    Args:
        cmd_key: The command key including leading slash (e.g., "/gif-search").
        user_instruction: Optional text the user typed after the command.

    Returns:
        The formatted message string, or None if the skill wasn't found.
    """
    commands = get_skill_commands()
    skill_info = commands.get(cmd_key)
    if not skill_info:
        return None

    loaded = _load_skill_payload(skill_info["skill_dir"], task_id=task_id)
    if not loaded:
        return f"[Failed to load skill: {skill_info['name']}]"

    loaded_skill, skill_dir, skill_name = loaded

    # Track active usage for Curator lifecycle management (#17782)
    try:
        from tools.skill_usage import bump_use
        bump_use(skill_name)
    except Exception:
        pass  # Non-critical — skill invocation proceeds regardless

    activation_note = (
        f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want '
        "you to follow its instructions. The full skill content is loaded below.]"
    )
    return _build_skill_message(
        loaded_skill,
        skill_dir,
        activation_note,
        user_instruction=user_instruction,
        runtime_note=runtime_note,
        session_id=task_id,
    )
5 / 7

Skill Preprocessing

agent/skill_preprocessing.py:115

Template variable substitution and optional inline shell expansion before a skill enters the prompt

`preprocess_skill_content()` is the final transformation before skill markdown reaches the model. It runs two independent passes, each gated by a config flag.

The first pass (template variable substitution, on by default) replaces `${HERMES_SKILL_DIR}` and `${HERMES_SESSION_ID}` tokens with concrete runtime values. A skill bundling scripts can reference `${HERMES_SKILL_DIR}/scripts/run.sh` so the agent invokes it by absolute path without a `skill_view()` round-trip. Unresolved tokens are left in place, making missing values visible rather than silent.

The second pass (inline shell expansion, off by default) replaces backtick snippets with their stdout. Output is capped at `_INLINE_SHELL_MAX_OUTPUT` (4000 characters), the command runs under a configurable timeout (default 10 seconds), and failure produces an `[inline-shell error: ...]` marker rather than aborting the load. Because this pass is opt-in, the common case pays no subprocess cost. The function accepts a pre-loaded `skills_cfg` dict so callers that have already read config can avoid a redundant file read.

Key takeaway

Preprocessing is two config-gated passes, both designed so failures surface visibly. Template substitution gives skills portable path references; inline shell expansion lets them embed live runtime data.

def preprocess_skill_content(
    content: str,
    skill_dir: Path | None,
    session_id: str | None = None,
    skills_cfg: dict | None = None,
) -> str:
    """Apply configured SKILL.md template and inline-shell preprocessing."""
    if not content:
        return content

    cfg = skills_cfg if isinstance(skills_cfg, dict) else load_skills_config()
    if cfg.get("template_vars", True):
        content = substitute_template_vars(content, skill_dir, session_id)
    if cfg.get("inline_shell", False):
        timeout = int(cfg.get("inline_shell_timeout", 10) or 10)
        content = expand_inline_shell(content, skill_dir, timeout)
    return content
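The first pass can be sketched in a few lines, assuming only the two documented tokens; the real implementation in `agent/skill_preprocessing.py` may differ in details such as the token pattern:

```python
import re
from pathlib import Path
from typing import Optional

def substitute_sketch(content: str, skill_dir: Optional[Path], session_id: Optional[str]) -> str:
    """Replace known ${HERMES_*} tokens; leave unknown ones untouched."""
    values = {}
    if skill_dir is not None:
        values["HERMES_SKILL_DIR"] = str(skill_dir)
    if session_id:
        values["HERMES_SESSION_ID"] = session_id

    def repl(m: re.Match) -> str:
        # Unresolved tokens fall back to the original text, so a missing
        # value stays visible in the prompt instead of vanishing.
        return values.get(m.group(1), m.group(0))

    return re.sub(r"\$\{(HERMES_[A-Z_]+)\}", repl, content)

md = "Run ${HERMES_SKILL_DIR}/scripts/run.sh for session ${HERMES_SESSION_ID}"
print(substitute_sketch(md, Path("/home/u/.hermes/skills/demo"), None))
# -> Run /home/u/.hermes/skills/demo/scripts/run.sh for session ${HERMES_SESSION_ID}
```

Note how the unresolved `${HERMES_SESSION_ID}` survives verbatim, flagging the missing value to whoever reads the assembled prompt.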
6 / 7

Skill Refinement and Lifecycle Tracking

tools/skill_usage.py:214

How the usage sidecar records invocations and drives Curator lifecycle transitions

Hermes does not rewrite skill content on invocation; that is the Curator's job on a separate cadence. At invocation time, `bump_use()` records the event: it increments `use_count` and writes `last_used_at` to a JSON sidecar at `~/.hermes/skills/.usage.json`. The sidecar is deliberately separate from `SKILL.md`, avoiding merge conflicts for hub-fetched skills and allowing safe reads without touching skill content.

`_mutate()` is the shared write primitive and enforces one invariant: bundled and hub-installed skills are never recorded. The sidecar is exclusively for agent-created skills, because the Curator only drives lifecycle transitions on skills the agent itself generated.
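A toy round-trip illustrates the record shape `bump_use()` maintains. Field names mirror `_empty_record()` below, but the path and the reduced record are illustrative only; the real helper also enforces the agent-created check and atomic writes:

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Throwaway sidecar location for the demo (the real one lives at
# ~/.hermes/skills/.usage.json).
sidecar = Path(tempfile.mkdtemp()) / ".usage.json"

def bump_use_sketch(skill_name: str) -> None:
    """Increment use_count and stamp last_used_at, creating the record if absent."""
    data = json.loads(sidecar.read_text()) if sidecar.exists() else {}
    rec = data.setdefault(
        skill_name, {"use_count": 0, "last_used_at": None, "state": "active"}
    )
    rec["use_count"] += 1
    rec["last_used_at"] = datetime.now(timezone.utc).isoformat()
    sidecar.write_text(json.dumps(data, indent=2, sort_keys=True))

bump_use_sketch("my-skill")
bump_use_sketch("my-skill")
print(json.loads(sidecar.read_text())["my-skill"]["use_count"])  # -> 2
```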

The lifecycle state machine has three states:

  • `active` — the default state
  • `stale` — unused beyond a configurable threshold
  • `archived` — moved to the `.archive/` subdirectory

`reload_skills()` performs a before/after diff against the in-memory `_skill_commands` map and returns `{added, removed, unchanged, total}`, letting the model learn what changed mid-session without a prompt-cache reset.

Key takeaway

Skill refinement is lifecycle management, not automatic rewriting. `bump_use()` feeds the Curator timestamps; the Curator decides when to transition skills from active to stale to archived, and it only ever touches agent-created skills.

def _empty_record() -> Dict[str, Any]:
    return {
        "use_count": 0,
        "view_count": 0,
        "last_used_at": None,
        "last_viewed_at": None,
        "patch_count": 0,
        "last_patched_at": None,
        "created_at": _now_iso(),
        "state": STATE_ACTIVE,
        "pinned": False,
        "archived_at": None,
    }


def load_usage() -> Dict[str, Dict[str, Any]]:
    """Read the entire .usage.json map. Returns empty dict on missing/corrupt."""
    path = _usage_file()
    if not path.exists():
        return {}
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as e:
        logger.debug("Failed to read %s: %s", path, e)
        return {}
    if not isinstance(data, dict):
        return {}
    # Defensive: coerce any non-dict values to a fresh empty record
    clean: Dict[str, Dict[str, Any]] = {}
    for k, v in data.items():
        if isinstance(v, dict):
            clean[str(k)] = v
    return clean


def save_usage(data: Dict[str, Dict[str, Any]]) -> None:
    """Write the usage map atomically. Best-effort — errors are logged, not raised."""
    path = _usage_file()
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(
            dir=str(path.parent), prefix=".usage_", suffix=".tmp"
        )
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)
        except BaseException:
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise
    except Exception as e:
        logger.debug("Failed to write %s: %s", path, e, exc_info=True)


def get_record(skill_name: str) -> Dict[str, Any]:
    """Return the record for *skill_name*, creating a fresh one if missing."""
    data = load_usage()
    rec = data.get(skill_name)
    if not isinstance(rec, dict):
        return _empty_record()
    # Backfill any missing keys so callers don't need to handle old files
    base = _empty_record()
    for k, v in base.items():
        rec.setdefault(k, v)
    return rec


def _mutate(skill_name: str, mutator) -> None:
    """Load, apply *mutator(record)* in place, save. Best-effort.

    Bundled and hub-installed skills are NEVER recorded in the sidecar.
    This keeps .usage.json focused on agent-created skills (the only ones
    the curator considers) and prevents stale counters from hanging around
    for upstream-managed skills.
    """
    if not skill_name:
        return
    try:
        if not is_agent_created(skill_name):
            return
        data = load_usage()
        rec = data.get(skill_name)
        if not isinstance(rec, dict):
            rec = _empty_record()
        mutator(rec)
        data[skill_name] = rec
        save_usage(data)
    except Exception as e:
        logger.debug("skill_usage._mutate(%s) failed: %s", skill_name, e, exc_info=True)


# ---------------------------------------------------------------------------
# Public counter-bump helpers
# ---------------------------------------------------------------------------

def bump_view(skill_name: str) -> None:
    """Bump view_count and last_viewed_at. Called from skill_view()."""
    def _apply(rec: Dict[str, Any]) -> None:
        rec["view_count"] = int(rec.get("view_count") or 0) + 1
        rec["last_viewed_at"] = _now_iso()
    _mutate(skill_name, _apply)


def bump_use(skill_name: str) -> None:
    """Bump use_count and last_used_at. Called when a skill is actively used
    (e.g. loaded into the prompt path or referenced from an assistant turn)."""
    def _apply(rec: Dict[str, Any]) -> None:
        rec["use_count"] = int(rec.get("use_count") or 0) + 1
        rec["last_used_at"] = _now_iso()
    _mutate(skill_name, _apply)


7 / 7

Skill Discoverability

tools/skills_tool.py:674

How skills_list() exposes skills to the user and agent with progressive disclosure

`skills_list()` is tier one of a two-tier discoverability system. It returns only `{name, description, category}` per skill and withholds full content, tags, and linked files. The `hint` field in the response points the agent toward `skill_view(name)` for the full payload. The split is a token-efficiency decision: a user with 30 skills would burn thousands of tokens loading full content on every listing.
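As an invented illustration of the tier-1 payload (the field names come from `skills_list()` below; the two skill entries are made up):

```python
import json

# Hypothetical response: minimal per-skill metadata plus the hint that
# steers the model to skill_view() for full content.
response = {
    "success": True,
    "skills": [
        {"name": "gif-search", "description": "Find and share GIFs", "category": "creative"},
        {"name": "mlops-deploy", "description": "Deploy a model endpoint", "category": "mlops"},
    ],
    "categories": ["creative", "mlops"],
    "count": 2,
    "hint": "Use skill_view(name) to see full content, tags, and linked files",
}
print(json.dumps(response, indent=2))
```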

`_find_all_skills()` delegates to `iter_skill_index_files()` for the walk (Stop 3) and applies the same platform, disabled, and deduplication filters as `scan_skill_commands()` (Stop 2). It derives `category` from the directory path relative to the skills root. The repository ships with 20+ category subdirectories (`apple`, `creative`, `data-science`, `mlops`, `red-teaming`, and others) that map directly to this field.

On first call with no skills present, `skills_list()` creates the skills directory and returns an empty response rather than an error, making the tool safe to call unconditionally on a fresh install.

Key takeaway

Progressive disclosure is an explicit token-efficiency decision, not an oversight. `skills_list()` returns only name, description, and category, with a hint that teaches the model to call `skill_view()` when it needs full content.

def skills_list(category: str = None, task_id: str = None) -> str:
    """
    List all available skills (progressive disclosure tier 1 - minimal metadata).

    Returns only name + description to minimize token usage. Use skill_view() to
    load full content, tags, related files, etc.

    Args:
        category: Optional category filter (e.g., "mlops")
        task_id: Optional task identifier used to probe the active backend

    Returns:
        JSON string with minimal skill info: name, description, category
    """
    try:
        if not SKILLS_DIR.exists():
            SKILLS_DIR.mkdir(parents=True, exist_ok=True)
            return json.dumps(
                {
                    "success": True,
                    "skills": [],
                    "categories": [],
                    "message": f"No skills found. Skills directory created at {display_hermes_home()}/skills/",
                },
                ensure_ascii=False,
            )

        # Find all skills
        all_skills = _find_all_skills()

        if not all_skills:
            return json.dumps(
                {
                    "success": True,
                    "skills": [],
                    "categories": [],
                    "message": "No skills found in skills/ directory.",
                },
                ensure_ascii=False,
            )

        # Filter by category if specified
        if category:
            all_skills = [s for s in all_skills if s.get("category") == category]

        # Sort by category then name
        all_skills = _sort_skills(all_skills)

        # Extract unique categories
        categories = sorted(
            set(s.get("category") for s in all_skills if s.get("category"))
        )

        return json.dumps(
            {
                "success": True,
                "skills": all_skills,
                "categories": categories,
                "count": len(all_skills),
                "hint": "Use skill_view(name) to see full content, tags, and linked files",
            },
            ensure_ascii=False,
        )

    except Exception as e:
        return tool_error(str(e), success=False)

