Knowledge
What is knowledge in CrewAI and how to use it.
What is Knowledge?
Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources during their tasks. Think of it as giving your agents a reference library they can consult while working.
Key benefits of using Knowledge:
- Enhance agents with domain-specific information
- Support decisions with real-world data
- Maintain context across conversations
- Ground responses in factual information
Supported Knowledge Sources
CrewAI supports various types of knowledge sources out of the box:
Text Sources
- Raw strings
- Text files (.txt)
- PDF documents
Structured Data
- CSV files
- Excel spreadsheets
- JSON documents
Supported Knowledge Parameters
Parameter | Type | Required | Description |
---|---|---|---|
sources | List[BaseKnowledgeSource] | Yes | List of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content. |
collection_name | str | No | Name of the collection where the knowledge will be stored. Used to identify different sets of knowledge. Defaults to “knowledge” if not provided. |
storage | Optional[KnowledgeStorage] | No | Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage will be created. |
Unlike retrieval from a vector database using a tool, agents preloaded with knowledge will not need a retrieval persona or task. Simply add the relevant knowledge sources your agent or crew needs to function.
Knowledge sources can be added at the agent or crew level. Crew-level knowledge sources are used by all agents in the crew. Agent-level knowledge sources are used only by the agent they are attached to.
Quickstart Example
For file-based knowledge sources, place your files in a knowledge directory at the root of your project, and use paths relative to that knowledge directory when creating the source.
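As a rough illustration of this convention, the sketch below resolves a source path relative to a knowledge directory at the project root, and rejects absolute paths or paths that escape that directory. The helper resolve_knowledge_path is hypothetical and not part of CrewAI, which does its own path handling internally. Treating the current working directory as the project root is also an assumption of this sketch.

```python
from pathlib import Path

# Hypothetical helper (not a CrewAI API): resolve a file-based source path
# relative to the "knowledge" directory at the project root.
KNOWLEDGE_DIR_NAME = "knowledge"


def resolve_knowledge_path(relative_path: str, project_root: Path) -> Path:
    """Return project_root/knowledge/<relative_path>, rejecting absolute or escaping paths."""
    rel = Path(relative_path)
    if rel.is_absolute():
        raise ValueError(f"Use a path relative to {KNOWLEDGE_DIR_NAME}/: {relative_path}")
    knowledge_dir = (project_root / KNOWLEDGE_DIR_NAME).resolve()
    candidate = (knowledge_dir / rel).resolve()
    # Guard against inputs like "../secrets.txt" that would leave the knowledge directory.
    if knowledge_dir not in candidate.parents:
        raise ValueError(f"Path escapes {KNOWLEDGE_DIR_NAME}/: {relative_path}")
    return candidate


if __name__ == "__main__":
    root = Path.cwd()  # assumption: the script runs from the project root
    print(resolve_knowledge_path("document.txt", root))
    print(resolve_knowledge_path("reports/q1.pdf", root))
```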
Here’s an example using string-based knowledge:
Here’s another example with the CrewDoclingSource. The CrewDoclingSource is quite versatile and can handle multiple file formats, including MD, PDF, DOCX, HTML, and more.
You need to install docling for the following example to work: uv add docling
Knowledge Configuration
You can configure how knowledge is retrieved for the crew or agent:
- results_limit: the number of relevant documents to return. Default is 3.
- score_threshold: the minimum score for a document to be considered relevant. Default is 0.35.
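Conceptually, the two settings act as a cutoff and a cap on the similarity search. The pure-Python sketch below models that behavior over (document, score) pairs, assuming that a higher score means a more relevant document. The function select_relevant is hypothetical, and this is not CrewAI's internal search code. In recent CrewAI versions these values are passed through a KnowledgeConfig object (from crewai.knowledge.knowledge_config) given to the agent as knowledge_config; check your installed version for the exact parameter.

```python
from typing import List, Tuple

# Defaults quoted in the text above.
DEFAULT_RESULTS_LIMIT = 3
DEFAULT_SCORE_THRESHOLD = 0.35


def select_relevant(
    scored_docs: List[Tuple[str, float]],
    results_limit: int = DEFAULT_RESULTS_LIMIT,
    score_threshold: float = DEFAULT_SCORE_THRESHOLD,
) -> List[str]:
    """Keep documents scoring at least score_threshold, best first, at most results_limit."""
    passing = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in passing[:results_limit]]


if __name__ == "__main__":
    candidates = [
        ("pricing policy", 0.82),
        ("refund rules", 0.61),
        ("office hours", 0.30),  # below the 0.35 threshold, so it is dropped
        ("shipping times", 0.47),
        ("holiday schedule", 0.40),
    ]
    print(select_relevant(candidates))
    # ['pricing policy', 'refund rules', 'shipping times']
```

With the defaults, four documents pass the threshold, but only the three best are returned. Raising score_threshold to 0.5 would leave just the first two.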
More Examples
Here are examples of how to use different types of knowledge sources:
Note: Please ensure that you create the ./knowledge folder. All source files (e.g., .txt, .pdf, .xlsx, .json) should be placed in this folder for centralized management.
Text File Knowledge Source
PDF Knowledge Source
CSV Knowledge Source
Excel Knowledge Source
JSON Knowledge Source
Knowledge Configuration
Chunking Configuration
Knowledge sources automatically split their content into chunks before it is embedded and stored. You can configure chunking behavior on each knowledge source.
The chunking configuration helps with:
- Breaking down large documents into manageable pieces
- Maintaining context through chunk overlap
- Optimizing retrieval accuracy
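To make chunk size and overlap concrete, here is a character-based sliding-window chunker. It is a simplified sketch of the general technique, not CrewAI's code. In recent CrewAI versions, knowledge sources accept chunk_size and chunk_overlap arguments, for example StringKnowledgeSource(content=..., chunk_size=4000, chunk_overlap=200). This sketch assumes those values are measured in characters and reuses them as its defaults.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each window starts this many characters after the last
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


if __name__ == "__main__":
    sample = "abcdefghij"  # 10 characters
    print(chunk_text(sample, chunk_size=4, chunk_overlap=1))
    # ['abcd', 'defg', 'ghij', 'j']
```

The overlap repeats the last character of each chunk at the start of the next one, so text near a boundary appears in both. The trailing "j" chunk shows a trade-off of this simple scheme: the final window can consist entirely of overlap.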
Embeddings Configuration
You can also configure the embedder for the knowledge store.
This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.
The embedder parameter supports various embedding model providers that include:
- openai: OpenAI’s embedding models
- google: Google’s text embedding models
- azure: Azure OpenAI embeddings
- ollama: Local embeddings with Ollama
- vertexai: Google Cloud VertexAI embeddings
- cohere: Cohere’s embedding models
- voyageai: VoyageAI’s embedding models
- bedrock: AWS Bedrock embeddings
- huggingface: Hugging Face models
- watson: IBM Watson embeddings
Here’s an example of how to configure the embedder for the knowledge store using Google’s text-embedding-004 model:
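The sketch below builds the embedder setting as the provider/config dictionary that CrewAI's embedder parameter accepted at the time of writing. Check the exact keys against your CrewAI version, including the "models/" prefix on the Google model name. Two further details are assumptions of this example: the make_embedder_config helper, and reading the key from a GEMINI_API_KEY environment variable. The resulting dictionary would be passed as embedder=... when creating the Crew, alongside knowledge_sources.

```python
import os
from typing import Any, Dict, Optional

# Provider names listed in the text above.
SUPPORTED_PROVIDERS = {
    "openai", "google", "azure", "ollama", "vertexai",
    "cohere", "voyageai", "bedrock", "huggingface", "watson",
}


def make_embedder_config(provider: str, model: str, api_key: Optional[str] = None) -> Dict[str, Any]:
    """Build a {"provider": ..., "config": {...}} dict for a knowledge embedder."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported embedder provider: {provider}")
    config: Dict[str, Any] = {"model": model}
    if api_key:  # local providers such as ollama need no key
        config["api_key"] = api_key
    return {"provider": provider, "config": config}


# Google's text-embedding-004; the key is read from the environment (assumption).
google_embedder = make_embedder_config(
    "google", "models/text-embedding-004", os.environ.get("GEMINI_API_KEY")
)

# Usage with CrewAI (not executed here):
# crew = Crew(agents=[...], tasks=[...], knowledge_sources=[...], embedder=google_embedder)
```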
Query Rewriting
CrewAI implements an intelligent query rewriting mechanism to optimize knowledge retrieval. When an agent needs to search through knowledge sources, the raw task prompt is automatically transformed into a more effective search query.
How Query Rewriting Works
- When an agent executes a task with knowledge sources available, the _get_knowledge_search_query method is triggered
- The agent’s LLM is used to transform the original task prompt into an optimized search query
- This optimized query is then used to retrieve relevant information from knowledge sources
Benefits of Query Rewriting
Improved Retrieval Accuracy
By focusing on key concepts and removing irrelevant content, query rewriting helps retrieve more relevant information.
Context Awareness
The rewritten queries are designed to be more specific and context-aware for vector database retrieval.
Implementation Details
Query rewriting happens transparently using a system prompt that instructs the LLM to:
- Focus on key words of the intended task
- Make the query more specific and context-aware
- Remove irrelevant content like output format instructions
- Generate only the rewritten query without preamble or postamble
This mechanism is fully automatic and requires no configuration from users. The agent’s LLM is used to perform the query rewriting, so using a more capable LLM can improve the quality of rewritten queries.
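The rewrite-then-search flow described above can be sketched in plain Python, with the LLM and the knowledge store abstracted as callables. The system prompt paraphrases the instructions listed under Implementation Details. The names rewrite_query and search_knowledge, the prompt wording, the fallback behavior, and the stubs are all hypothetical stand-ins, not CrewAI's _get_knowledge_search_query implementation.

```python
from typing import Callable, List

# Paraphrase of the instructions described in the text; CrewAI's actual prompt differs.
REWRITE_SYSTEM_PROMPT = (
    "Rewrite the user's task as a search query for a vector database. "
    "Focus on the key words of the intended task, make the query specific and "
    "context-aware, remove irrelevant content such as output format instructions, "
    "and reply with only the rewritten query, with no preamble or postamble."
)

LLMCall = Callable[[str, str], str]  # (system_prompt, user_prompt) -> completion


def rewrite_query(task_prompt: str, llm: LLMCall) -> str:
    """Ask the agent's LLM to turn the raw task prompt into a search query."""
    rewritten = llm(REWRITE_SYSTEM_PROMPT, task_prompt).strip()
    # Choice of this sketch: fall back to the raw prompt if the LLM returns nothing.
    return rewritten or task_prompt


def search_knowledge(task_prompt: str, llm: LLMCall, search: Callable[[str], List[str]]) -> List[str]:
    """Rewrite the prompt, then query the knowledge store with the result."""
    return search(rewrite_query(task_prompt, llm))


if __name__ == "__main__":
    # Stub LLM and store so the flow runs without any API calls.
    def fake_llm(system: str, user: str) -> str:
        return "  John movies watched last week  "

    def fake_search(query: str) -> List[str]:
        return [f"results for: {query}"]

    print(search_knowledge("What did John watch? Format as JSON.", fake_llm, fake_search))
    # ['results for: John movies watched last week']
```

Because the rewrite uses whichever LLM the agent is configured with, swapping fake_llm for a stronger model is the lever the text mentions for better queries.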
Example
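Here is an illustrative before/after pair. The rewritten text is only what such a rewrite might plausibly produce; the actual output depends on the LLM.

```python
# Original task prompt given to the agent.
task_prompt = (
    "Answer the following questions about the user's favorite movies: "
    "What movie did John watch last week? Format your answer in JSON."
)

# A plausible rewritten search query (actual LLM output will vary).
rewritten_query = "What movies did John watch last week?"

# The formatting instruction is gone; the core information need remains.
print("JSON" in task_prompt, "JSON" in rewritten_query)  # True False
```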
The rewritten query is more focused on the core information need and removes irrelevant instructions about output formatting.
Clearing Knowledge
If you need to clear the knowledge stored in CrewAI, you can use the crewai reset-memories command with the --knowledge option.
This is useful when you’ve updated your knowledge sources and want to ensure that the agents are using the most recent information.
Agent-Specific Knowledge
While knowledge can be provided at the crew level using crew.knowledge_sources, individual agents can also have their own knowledge sources using the knowledge_sources parameter:
Benefits of agent-specific knowledge:
- Give agents specialized information for their roles
- Maintain separation of concerns between agents
- Combine with crew-level knowledge for layered information access
Custom Knowledge Sources
CrewAI allows you to create custom knowledge sources for any type of data by extending the BaseKnowledgeSource class. Let’s create a practical example that fetches and processes space news articles.
Space News Knowledge Source Example
Key Components Explained
- Custom Knowledge Source (SpaceNewsKnowledgeSource):
  - Extends BaseKnowledgeSource for integration with CrewAI
  - Configurable API endpoint and article limit
  - Implements three key methods:
    - load_content(): Fetches articles from the API
    - _format_articles(): Structures the articles into readable text
    - add(): Processes and stores the content
- Agent Configuration:
  - Specialized role as a Space News Analyst
  - Uses the knowledge source to access space news
- Task Setup:
  - Takes a user question as input through {user_question}
  - Designed to provide detailed answers based on the knowledge source
- Crew Orchestration:
  - Manages the workflow between agent and task
  - Handles input/output through the kickoff method
This example demonstrates how to:
- Create a custom knowledge source that fetches real-time data
- Process and format external data for AI consumption
- Use the knowledge source to answer specific user questions
- Integrate everything seamlessly with CrewAI’s agent system
About the Spaceflight News API
The example uses the Spaceflight News API, which:
- Provides free access to space-related news articles
- Requires no authentication
- Returns structured data about space news
- Supports pagination and filtering
You can customize the API query by modifying the endpoint URL:
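The following small helper builds those URLs with urllib.parse, so values such as search terms are escaped correctly. The helper build_articles_url is hypothetical. The limit, offset, and search parameter names are assumed from the Spaceflight News API v4; check its documentation for the full set of filters.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

BASE_URL = "https://api.spaceflightnewsapi.net/v4/articles"


def build_articles_url(limit: int = 10, search: Optional[str] = None, offset: int = 0) -> str:
    """Return the articles endpoint with query parameters appended."""
    params: Dict[str, Union[int, str]] = {"limit": limit}
    if offset:
        params["offset"] = offset  # pagination: skip this many articles
    if search:
        params["search"] = search  # text filter on articles
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_articles_url(limit=20))
    # https://api.spaceflightnewsapi.net/v4/articles?limit=20
    print(build_articles_url(search="NASA Artemis", offset=10))
    # https://api.spaceflightnewsapi.net/v4/articles?limit=10&offset=10&search=NASA+Artemis
```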