What this means
A knowledge base is a living corpus, not a static archive. In a typical Australian enterprise, HR policies are reviewed annually and updated more frequently; compliance requirements change with regulation; product documentation changes with product releases; and operational procedures evolve with process improvement. Every change that is not reflected in the knowledge base creates a potential gap between what the AI system knows and what is currently true.
The operational challenge is that these changes occur in source systems — SharePoint sites, Confluence spaces, shared drives, document management systems — not in the vector store directly. Keeping the knowledge base current requires a synchronisation process that detects changes in source documents, removes or updates the corresponding embeddings and maintains the metadata that determines retrieval scope.
Why it matters for business
A stale knowledge base fails silently: users trust the AI system's answers, and the system responds with the same confidence whether or not its knowledge is current. A staff member who receives an incorrect answer about their leave entitlements, a customer who is given outdated product terms, or a compliance officer who reviews a superseded regulatory summary will not recognise the failure as a data freshness problem. It appears as an operational error.
Only ~25% of AI initiatives have delivered the expected ROI according to IBM's CEO survey research, with data quality and integration cited among the leading causes of underperformance. Knowledge base drift — the gap between source document currency and vector store currency — is a direct contributor to this pattern in retrieval-based systems.
How it works technically
Keeping a knowledge base current requires four technical processes:
Change detection: Monitoring source systems for new, modified or deleted documents. Microsoft Graph delta queries, Confluence webhooks and file system watchers each provide a mechanism to detect changed content and trigger re-ingestion, which avoids running full corpus re-ingestion on a fixed schedule.
Differential ingestion: Processing only changed or new documents rather than re-ingesting the entire corpus on each cycle. This requires document-level identifiers in the vector store so that existing embeddings for a changed document can be deleted and replaced with fresh ones.
Deletion propagation: When a document is deleted or superseded in the source system, its embeddings must be removed from the vector store. Without explicit deletion, superseded content remains retrievable indefinitely. A status metadata field alone is not sufficient unless the retrieval pipeline actively filters on it.
Embedding model versioning: If the embedding model is updated or replaced, all existing embeddings must be regenerated using the new model. Mixing embeddings from different model versions in the same index produces incoherent similarity scores. Model version changes require a full re-index.
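The interplay of these processes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `InMemoryVectorStore` is a toy stand-in for a real vector store, `fake_embed` is a placeholder rather than a real embedding model, and the metadata field names (`doc_id`, `content_hash`, `model_version`) are hypothetical. Change detection is approximated here by comparing content hashes; in production it would be driven by the source system's own change feed.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Toy stand-in for a vector store: chunk_id -> (vector, metadata)."""
    records: dict = field(default_factory=dict)

    def delete_document(self, doc_id):
        # Remove every chunk that belongs to the given source document.
        self.records = {cid: rec for cid, rec in self.records.items()
                        if rec[1]["doc_id"] != doc_id}

    def upsert(self, chunk_id, vector, metadata):
        self.records[chunk_id] = (vector, metadata)

    def indexed_documents(self):
        # doc_id -> (content hash, model version) recorded at ingestion time.
        return {meta["doc_id"]: (meta["content_hash"], meta["model_version"])
                for _, meta in self.records.values()}


def fake_embed(text):
    """Placeholder embedding (NOT a real model): four floats from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


def chunk(text, size=40):
    """Naive fixed-width chunking, for illustration only."""
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]


def sync(store, source_docs, model_version):
    """Bring the store in line with source_docs (doc_id -> current text)."""
    indexed = store.indexed_documents()
    summary = {"added": [], "updated": [], "deleted": [], "unchanged": []}

    # Deletion propagation: drop documents that no longer exist in the source.
    for doc_id in sorted(indexed.keys() - source_docs.keys()):
        store.delete_document(doc_id)
        summary["deleted"].append(doc_id)

    for doc_id, text in source_docs.items():
        content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Differential ingestion: skip documents whose content and
        # embedding model version both match what is already indexed.
        if indexed.get(doc_id) == (content_hash, model_version):
            summary["unchanged"].append(doc_id)
            continue
        outcome = "updated" if doc_id in indexed else "added"
        store.delete_document(doc_id)  # remove old chunks before re-embedding
        for i, piece in enumerate(chunk(text)):
            store.upsert(f"{doc_id}#{i}", fake_embed(piece), {
                "doc_id": doc_id,
                "content_hash": content_hash,
                "model_version": model_version,
            })
        summary[outcome].append(doc_id)
    return summary
```

Because the model version is stored alongside each chunk, calling `sync` with a new `model_version` value re-embeds every document, which is the full re-index that a model change requires. The document-level `doc_id` is what makes the targeted delete-and-replace possible.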
Practical implementation considerations
The three operational pillars of knowledge base maintenance are process, ownership and monitoring.
Process: Define a re-indexing schedule for each document corpus based on its change rate. High-change corpora (HR policies, compliance documentation, product specs) require event-triggered or weekly re-indexing. Low-change corpora (historical contracts, technical reference guides) can be refreshed monthly. Superseded documents must either be removed from the index or excluded at query time by an explicit filter on status metadata; marking them is not enough on its own.
Ownership: Content owners (domain experts in HR, legal, operations and product) must understand that the AI knowledge base is downstream of their source documents. Source changes that are not promptly reflected in the knowledge base degrade AI performance, so a lightweight notification or handoff process is needed: when a content owner updates or retires a document, the AI ingestion pipeline is triggered. Edison AI's implementation engagements include a RACI matrix for knowledge base maintenance as a standard deliverable, because ambiguous ownership is the leading cause of post-launch degradation.
Monitoring: Retrieval quality metrics — precision at K, answer faithfulness — should be monitored on a scheduled basis and after any significant corpus change. A drop in these metrics often signals that source document changes have not been propagated to the knowledge base. User feedback mechanisms (thumbs up/down, explicit corrections) provide an additional signal layer.
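As a rough illustration of the monitoring step, the sketch below computes precision at K over a small labelled evaluation set and compares the mean against a threshold. The function names, the shape of the evaluation data and the threshold value are all assumptions for this example. Answer faithfulness is not covered, because it usually needs a separate evaluation method.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved document ids that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalises queries that return too few results.
    return hits / k


def mean_precision_at_k(evaluation_set, k):
    """Average precision at k over (retrieved_ids, relevant_ids) pairs."""
    if not evaluation_set:
        raise ValueError("evaluation_set must not be empty")
    scores = [precision_at_k(retrieved, relevant, k)
              for retrieved, relevant in evaluation_set]
    return sum(scores) / len(scores)


def check_retrieval_health(evaluation_set, k, threshold):
    """Return (score, healthy); healthy is False when the score is below threshold."""
    score = mean_precision_at_k(evaluation_set, k)
    return score, score >= threshold


# Hypothetical evaluation set: each entry pairs the ids the retriever
# returned for a test query with the ids a reviewer judged relevant.
evaluation_set = [
    (["leave-policy-2024", "leave-policy-2022"], {"leave-policy-2024"}),
    (["product-terms-v3", "product-faq"], {"product-terms-v3", "product-faq"}),
]
score, healthy = check_retrieval_health(evaluation_set, k=2, threshold=0.8)
```

Running the same evaluation set on a schedule and after each significant corpus change gives a trend line. The first query above shows how a superseded document that is still being retrieved lowers the score.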
Common mistakes
- Treating the knowledge base as a one-time build. The most common mistake is investing in a well-prepared initial knowledge base and then handing it to operations without a maintenance process. Within six months, quality degrades visibly.
- No deletion propagation. Superseded documents that remain in the index continue to be retrieved. This is particularly damaging when the superseded content directly contradicts the current policy.
- Ownership gap between IT and content teams. IT manages the ingestion pipeline but does not know when content changes. Content teams update documents but do not trigger re-ingestion. The gap produces stale indexes by default.
- Not monitoring retrieval quality post-launch. Without ongoing measurement, quality degradation is only discovered when users report incorrect answers — by which time trust has already been eroded.
- Ignoring embedding model versioning. Upgrading the embedding model without re-indexing leaves old and new embeddings in the same index, where their similarity scores are not comparable.
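Two of these mistakes, missing deletion propagation and mixed embedding model versions, can be detected with a periodic audit of the index metadata. The sketch below is a minimal example of such an audit. It assumes each chunk carries hypothetical `doc_id`, `model_version` and `status` metadata fields, which a real vector store may name or expose differently.

```python
from collections import Counter


def audit_index(chunk_metadata, expected_model_version):
    """Summarise index hygiene problems from a list of chunk metadata dicts."""
    versions = Counter(meta["model_version"] for meta in chunk_metadata)
    # Chunks embedded with any model version other than the expected one.
    stale_version_chunks = sum(count for version, count in versions.items()
                               if version != expected_model_version)
    # Documents marked superseded that are still present in the index.
    superseded_docs = sorted({meta["doc_id"] for meta in chunk_metadata
                              if meta.get("status") == "superseded"})
    return {
        "model_versions": dict(versions),
        "stale_version_chunks": stale_version_chunks,
        "superseded_docs": superseded_docs,
    }


# Hypothetical metadata as it might be exported from an index.
report = audit_index(
    [
        {"doc_id": "leave-policy", "model_version": "embed-v2", "status": "current"},
        {"doc_id": "leave-policy", "model_version": "embed-v2", "status": "current"},
        {"doc_id": "old-terms", "model_version": "embed-v1", "status": "superseded"},
    ],
    expected_model_version="embed-v2",
)
```

A non-zero `stale_version_chunks` count or a non-empty `superseded_docs` list suggests that a re-index or a deletion pass is overdue.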
What leaders should do next
Before launching any RAG deployment, define the re-indexing schedule and ownership model for each document corpus in scope. Establish a retrieval quality monitoring process with defined thresholds and responsible owners. Build deletion propagation into the ingestion pipeline from the start. Assign content stewards in each business domain and brief them on their role in keeping the AI knowledge base current. Review knowledge base health monthly as a standing operational metric.
Edison AI builds bespoke AI systems — including retrieval over your own documents — for Australian businesses.