🦀 Agento

Import Knowledge Into Your Agent

Mar 4, 2026 · 9 min read

Table of Contents

  • Knowledge Base vs. Chat vs. Memory
  • Installing the CLI
  • Supported File Formats
  • Loading Files (Dry Run First)
  • Targeting: Swarm vs. Agent
  • Deduplication
  • Listing What is Loaded
  • Deleting Entries
  • Updating Knowledge
  • Chunk Size
  • CSV and JSON FAQ Format
  • Extract Mode for Chat Logs
  • Using the API Directly
  • Tips

Your agent already learns from conversations through its automatic memory system. But sometimes you want to give it reference material up front: product documentation, FAQs, internal policies, or any structured knowledge it should be able to look up.

That is what the knowledge base (KB) is for. You load files into it, and your agent can search them semantically whenever a question comes up. Unlike chat or memory, the knowledge base is something you prepare ahead of time and update on your own schedule.

Knowledge Base vs. Chat vs. Memory

These three systems serve different purposes. Understanding when to use each one will help you get better results from your agent.

|                 | Chat                               | Memory                             | Knowledge Base                               |
|-----------------|------------------------------------|------------------------------------|----------------------------------------------|
| What it is      | Real-time conversation             | Facts extracted from conversations | Reference material you load in advance       |
| Who writes it   | You and your agent, back and forth | Your agent, automatically          | You, using the CLI or API                    |
| When it is used | During the current session         | Recalled before every response     | Searched when the agent needs reference info |
| Best for        | Questions, tasks, discussion       | Preferences, decisions, context    | Docs, FAQs, policies, product info           |
| Persistence     | Session-scoped                     | Permanent (until deleted)          | Permanent (until you delete or replace it)   |

Use chat when you want to talk to your agent or give it a task right now.

Use memory when your agent should learn something from an ongoing conversation and recall it later. This happens automatically.

Use the knowledge base when you have existing documents your agent should be able to reference. Think employee handbooks, API docs, troubleshooting guides, or product catalogs.

A practical example

Say you run a customer support agent. Here is how the three systems work together:

  1. You load your FAQ and product docs into the KB before the agent starts handling tickets
  2. A customer asks a question in chat and your agent searches the KB to find the answer
  3. During the conversation, the agent learns that this customer is on the Enterprise plan. That gets saved to memory automatically
  4. Next time the customer writes in, the agent recalls (from memory) that they are on Enterprise, and searches the KB for Enterprise-specific docs

Installing the CLI

The fastest way to load knowledge is with the agento-kb command-line tool.

npm install -g @agentoai/kb

You also need an API key with knowledge:write and knowledge:read scopes. Create one from the API Keys page in your dashboard.

Set it as an environment variable:

export AGENTO_API_KEY=ak_live_your_key_here

That is the only key you need for standard ingestion. Embeddings (the vector representations used for semantic search) are generated server-side by Agento. If you plan to use Extract Mode to summarize chat logs, you will also need an OpenAI API key (see below).

Supported File Formats

The CLI works with local files only. It cannot fetch content from URLs, web pages, or cloud storage. Download your documents first, then point the CLI at them.

The following formats are supported:

| Format     | Extensions      | How it splits                                               |
|------------|-----------------|-------------------------------------------------------------|
| Markdown   | .md, .markdown  | Splits on headings (##), keeps heading as context           |
| Plain text | .txt            | Splits on paragraphs, then sentences                        |
| CSV / TSV  | .csv, .tsv      | Detects Q/A columns or treats each row as a fact            |
| JSON FAQ   | .json           | Expects an array of { "q": "...", "a": "..." } objects      |
| Chat logs  | .log, .txt      | Auto-detected by timestamp patterns; requires --extract     |

Point the CLI at a single file or an entire directory. It will recursively find all supported files and skip anything with an unrecognized extension.
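To preview which files a directory load would pick up, you can mirror the extension filter with find. This is a sketch based on the table above; the CLI's own matching may differ in edge cases:

```shell
# Recursively list files whose extensions the CLI recognizes.
find ./docs -type f \( -name '*.md' -o -name '*.markdown' -o -name '*.txt' \
  -o -name '*.csv' -o -name '*.tsv' -o -name '*.json' -o -name '*.log' \) | sort
```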

Not supported: PDF, DOCX, XLSX, images, audio, or video files. If you have content in these formats, convert them to one of the supported formats first. For example, copy text from a PDF into a .txt or .md file, or export a spreadsheet as .csv.

Loading Files (Dry Run First)

By default, the CLI runs in dry-run mode. It parses your files and shows what it would do, without actually sending anything.

agento-kb load --swarm your-swarm-id --file ./docs/

Output:

File breakdown:
  docs/faq.md: 12 chunks
  docs/product-guide.md: 34 chunks
  docs/troubleshooting.csv: 8 chunks

Total: 54 chunks from 3 files
Target: swarm your-swarm-id

Dry run — no data was sent.
Run with --apply to ingest.

Review the chunk counts. If something looks off (a tiny file producing hundreds of chunks, or a large file producing zero), check the file format.

When you are ready, add --apply:

agento-kb load --swarm your-swarm-id --file ./docs/ --apply

Ingesting batch 1/1...
Done! Stored: 54, Skipped (duplicates): 0, Total: 54

Targeting: Swarm vs. Agent

You can load knowledge into either a swarm or a solo agent.

Swarm (recommended for teams of agents):

agento-kb load --swarm abc123 --file ./docs/

All agents in the swarm can search this knowledge. If you later add a new agent to the swarm, it gets access automatically.

Solo agent (for a single agent not in any swarm):

agento-kb load --agent def456 --file ./docs/

If the agent belongs to a swarm, the CLI automatically stores the knowledge at the swarm level. This way all swarm members benefit.

Deduplication

The system automatically deduplicates content. If you load the same file twice, identical or near-identical chunks are skipped. "Near-identical" means cosine similarity above 0.95, so minor formatting changes are caught too.

This means you can safely re-run the same load command after updating a few files. Only the genuinely new or changed content will be stored.

Listing What is Loaded

See what is in the knowledge base:

agento-kb list --swarm abc123

a1b2c3d4  docs/faq.md                     2026-03-04  Q: How do I reset my password? A: Go to Settings...
b2c3d4e5  docs/faq.md                     2026-03-04  Q: What payment methods do you accept? A: We ac...
c3d4e5f6  docs/product-guide.md           2026-03-04  Getting Started. To create your first project, ...

Filter by source file:

agento-kb list --swarm abc123 --source "docs/faq.md"

Deleting Entries

Remove all entries from a specific source file:

agento-kb delete --swarm abc123 --source "docs/faq.md" --confirm

Remove a single entry by ID (use the full UUID; the list output only shows the first 8 characters):

agento-kb delete --swarm abc123 --id a1b2c3d4-full-uuid-here --confirm

Remove everything:

agento-kb delete --swarm abc123 --all --confirm

Without --confirm, deletes by --source or --id run as preview-only. For --all, the CLI refuses to run unless --confirm is provided. This is a safety measure so you do not accidentally wipe your knowledge base.

Updating Knowledge

There is no dedicated "update" command. The recommended workflow is:

  1. Edit your source files locally
  2. Delete the old source: agento-kb delete --swarm abc123 --source "docs/faq.md" --confirm
  3. Re-load the updated file: agento-kb load --swarm abc123 --file ./docs/faq.md --apply

Because of deduplication, you can also just re-load everything without deleting first. Unchanged chunks will be skipped. But deleting first is cleaner, since it removes chunks from sections you may have deleted from the source file.

Chunk Size

By default, text is split into chunks of up to 2000 characters. You can adjust this:

agento-kb load --swarm abc123 --file ./docs/ --chunk-max-chars 1000 --apply

Smaller chunks give more precise search results but may lose context. Larger chunks preserve context but may include irrelevant text in results. The default of 2000 works well for most use cases.
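For a rough feel of how chunk counts scale before a dry run, you can ceiling-divide file sizes by the limit. This is only an upper-bound estimate: real splitting happens on headings and paragraphs, so actual counts will differ:

```shell
# Estimate chunks per file against the default 2000-character limit.
for f in docs/*.md; do
  bytes=$(wc -c < "$f")
  echo "$f: ~$(( (bytes + 1999) / 2000 )) chunks"
done
```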

CSV and JSON FAQ Format

For structured Q&A data, use CSV or JSON.

CSV format (auto-detected column names):

question,answer
How do I reset my password?,Go to Settings > Security > Reset Password.
What payment methods do you accept?,"We accept Visa, Mastercard, and bank transfers."

The CLI looks for columns named question/q and answer/a. If it does not find them, it combines all columns into a single fact per row.

JSON format:

[
  { "q": "How do I reset my password?", "a": "Go to Settings > Security > Reset Password." },
  { "q": "What payment methods?", "a": "Visa, Mastercard, and bank transfers." }
]

Both formats produce chunks like Q: How do I reset my password?\nA: Go to Settings > Security > Reset Password.
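If you maintain the CSV but want the JSON shape, a jq one-liner can convert between them. This sketch assumes jq is installed, a header row, and no quoted commas inside fields; use a real CSV parser for anything messier:

```shell
# Drop the header row, then map each line to a { q, a } object.
tail -n +2 faq.csv | jq -R -s \
  'split("\n") | map(select(length > 0) | split(",") | {q: .[0], a: .[1]})'
```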

Extract Mode for Chat Logs

Structured documents like markdown and CSV work well with standard ingestion. But raw chat logs, Slack exports, WhatsApp dumps, and other conversational data are full of noise: greetings, acknowledgments, filler messages, and back-and-forth that dilutes search quality.

Extract mode solves this by running an LLM summarization step before ingestion. Instead of embedding raw conversation text, it extracts the actual facts, decisions, and knowledge, then stores those clean summaries.

How it works

The pipeline with --extract enabled:

  1. Parse chat logs into conversation threads (splitting on large line gaps and capping thread size)
  2. Summarize each thread with an LLM to extract key facts and decisions
  3. Strip PII (names, emails, phone numbers, addresses) from the extracted text
  4. Chunk and embed the clean output, same as standard ingestion

Why you need your own OpenAI key

Extract mode calls the OpenAI API to summarize each conversation thread. This runs on your machine using your API key, not through Agento's ingestion API. This keeps your costs proportional to how much data you process and keeps Agento's KB ingest endpoint focused on cleaned extracted facts.

Set your key:

export LLM_API_KEY=sk-your-openai-key-here

Or pass it inline:

agento-kb load --swarm abc123 --file ./chatlogs/ --extract --llm-key sk-... --apply

Example

Say you have a WhatsApp export:

[10:03] Alice: hey did we decide on the auth approach?
[10:04] Bob: yeah JWT with refresh tokens
[10:04] Bob: stored in httpOnly cookies
[10:05] Alice: ok cool, and the expiry?
[10:05] Bob: 15min access, 7day refresh
[10:06] Alice: 👍

Without --extract, this gets embedded as-is. The vector is diluted by "hey", "ok cool", and the thumbs-up emoji. Search quality suffers.

With --extract, the LLM produces clean facts:

- Authentication uses JWT with refresh tokens
- Tokens stored in httpOnly cookies
- Access token expiry: 15 minutes
- Refresh token expiry: 7 days

Each fact becomes a chunk that gets embedded. Clean, searchable, no noise.

PII removal

By default, extract mode strips personally identifiable information from the summarized output. This includes names, email addresses, phone numbers, physical addresses, and other identifying details. The LLM is instructed to anonymize during summarization, and a regex post-pass catches anything it misses.
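The CLI's actual patterns are internal, but a post-pass of the same flavour can be sketched with sed. This hypothetical version catches only email addresses and loosely phone-shaped digit runs; real PII detection covers far more categories:

```shell
# Replace email addresses and phone-like digit runs with placeholders.
printf 'Contact alice@example.com or +1 650 555 0100 for access.\n' | sed -E \
  -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[email]/g' \
  -e 's/\+?[0-9][0-9 ()-]{7,}[0-9]/[phone]/g'
# prints: Contact [email] or [phone] for access.
```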

To disable PII stripping (for example, if your chat logs are internal and names are relevant):

agento-kb load --swarm abc123 --file ./chatlogs/ --extract --llm-key sk-... --no-strip-pii --apply

Saving extracted facts to disk

By default, extracted facts are sent straight to the knowledge base. If you want to review or archive what was extracted, use --output to save the results as markdown files:

agento-kb load --swarm abc123 --file ./chatlogs/ --extract --llm-key sk-... --output ./extracted/ --apply

This creates one .extracted.md file per source file in the output directory. Each file looks like:

# Extracted facts from chatlogs/team-standup.log

> 14 facts extracted on 2026-03-04

- Authentication uses JWT with refresh tokens
- Tokens stored in httpOnly cookies
- Access token expiry: 15 minutes
- Refresh token expiry: 7 days
- ...

You can use --output without --apply to extract and save locally without ingesting, which is useful for reviewing what the LLM produced before committing it to the knowledge base.

Extract mode flags

| Flag                | Description                                                   |
|---------------------|---------------------------------------------------------------|
| --extract           | Enable LLM-powered extraction and summarization               |
| --llm-key <key>     | Your OpenAI API key (or set LLM_API_KEY env var)              |
| --llm-model <model> | Model to use (default: gpt-4.1-mini)                          |
| --no-strip-pii      | Disable automatic PII removal from extracted text             |
| --output <dir>      | Save extracted facts as .extracted.md files to this directory |

Supported chat log formats

The chat log parser auto-detects common formats:

  • WhatsApp exports: [DD/MM/YY, HH:MM] Name: message
  • Slack/Discord-style text exports: [YYYY-MM-DD HH:MM] name: message
  • Generic timestamped: [timestamp] name: message or name (timestamp): message
  • Plain transcripts: Speaker: message lines

Messages are grouped into threads based on detected message boundaries. The parser starts a new thread on large line gaps or after about 50 messages to keep extraction batches manageable.
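Gap detection depends on the log format, but the message cap alone is easy to picture. This sketch splits a log into files of at most 50 lines, roughly the per-thread batch described above (the real parser also splits on time gaps, which this omits):

```shell
seq 200 > chat.log   # stand-in for a real chat export
# Start a new output file every 50 lines.
awk 'NR % 50 == 1 { f = sprintf("thread-%03d.txt", ++n) } { print > f }' chat.log
# creates thread-001.txt through thread-004.txt, 50 lines each
```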

When using --extract with a directory, the CLI scans all text files recursively, not just the standard supported extensions. This means chat exports with unusual extensions (.html, .dat, etc.) are included as long as they contain text. Without --extract, only files with supported extensions (.md, .txt, .csv, .tsv, .json, .log) are processed.

Using the API Directly

If you prefer to integrate knowledge loading into your own tools, use the REST API instead of the CLI.

Ingest chunks:

curl -X POST https://api.agento.host/v1/knowledge/ingest \
  -H "X-Api-Key: ak_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "swarmId": "your-swarm-id",
    "chunks": [
      { "text": "Your knowledge text here.", "source": "my-app" }
    ]
  }'
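To script ingestion, you can build the payload from a plain text file rather than writing the JSON by hand. A sketch assuming jq is installed, with one chunk per non-empty line:

```shell
printf 'First fact.\n\nSecond fact.\n' > notes.txt   # stand-in content
# Turn each non-empty line into a chunk, tagged with a source name.
jq -Rn --arg swarm "your-swarm-id" --arg src "notes.txt" \
  '{swarmId: $swarm,
    chunks: [inputs | select(length > 0) | {text: ., source: $src}]}' \
  < notes.txt
```

Pipe the result into the ingest curl command above with -d @- to send it.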

List entries:

curl "https://api.agento.host/v1/knowledge?swarmId=your-swarm-id" \
  -H "X-Api-Key: ak_live_..."

Delete by source:

curl -X DELETE "https://api.agento.host/v1/knowledge?swarmId=your-swarm-id&source=my-app&confirm=true" \
  -H "X-Api-Key: ak_live_..."

See the full API Reference for request/response details.

Tips

Start small. Load one or two files, then test by asking your agent questions that the docs should answer. This helps you verify the system is working before loading everything.

Use meaningful source names. The --source flag (or the auto-detected filename) is how you identify and manage entries later. Avoid generic names like "data.txt".

Keep chunks focused. If you have a very long document covering many topics, consider splitting it into separate files by topic. The markdown parser already splits on headings, but well-organized source files produce better results.

Re-load after major updates. If you restructured a document significantly, delete and re-load it rather than relying on deduplication. This ensures removed sections are cleaned up.

Combine with memory. The knowledge base holds your static reference material. Memory captures what your agent learns from conversations. Together, they give your agent both prepared knowledge and learned context.
