Jun 20, 2026 · 4 min read

Building an Autonomous AI Assistant: Selective, Knowledge-Grounded, and Human-Like

#AI #RAG #Convex #Architecture

Building an AI assistant that actually helps a community without turning into a noisy, hallucinating bot is a significant engineering challenge. The goal is to build an assistant that is Selective, Knowledge-Grounded, and Human-Like.

Here is a breakdown of the architecture, tech stack, and design principles behind an autonomous AI assistant pipeline.

[Figure: AI Assistant Architecture Infographic]

1. The Architecture Flow

The system runs as two asynchronous loops: an hourly ingestion job that keeps the knowledge store current, and an assistant engine that evaluates community interactions roughly every 10 minutes and answers only when it can help.

Knowledge Ingestion (Hourly Cron)

The assistant first needs a reliable brain. Instead of feeding it raw URLs on the fly, it processes published content via an hourly cron job.

Scan & Delta Detection: It scans content sources and uses content hashing to detect what is new, changed, or removed. This prevents the system from doing unnecessary reprocessing.
Embedding: Content is chunked and embedded using Gemini Embedding 001 (producing 3072-dimensional vectors).
Storage: The vectors and metadata (sourceType, slug, title, contentHash, learnedAt) are stored in a Knowledge Store (Vector Index) powered by @convex-dev/rag.
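The scan and delta detection step can be sketched as a pure function that compares freshly hashed content against the hashes already stored. This is a minimal sketch under assumptions: `SourceDoc`, `StoredEntry`, `hashContent`, and `computeDelta` are hypothetical names, and the hash is a dependency-free stand-in (a djb2 variant); a production job would more likely use a cryptographic hash such as SHA-256.

```typescript
// Hypothetical shapes; `slug` and `contentHash` follow the metadata fields listed above.
interface SourceDoc {
  slug: string;
  content: string;
}

interface StoredEntry {
  slug: string;
  contentHash: string;
}

interface Delta {
  added: string[];     // in the source, not yet in the store
  changed: string[];   // in both, but the hash differs
  removed: string[];   // in the store, no longer in the source
  unchanged: string[]; // in both with the same hash: skip reprocessing
}

// Stand-in hash (djb2 variant) so the sketch has no dependencies.
// Assumption: a real pipeline would use a cryptographic hash instead.
function hashContent(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

function computeDelta(sources: SourceDoc[], stored: StoredEntry[]): Delta {
  const storedHashes = new Map<string, string>();
  for (const entry of stored) {
    storedHashes.set(entry.slug, entry.contentHash);
  }

  const delta: Delta = { added: [], changed: [], removed: [], unchanged: [] };
  const seen = new Set<string>();

  for (const doc of sources) {
    seen.add(doc.slug);
    const previous = storedHashes.get(doc.slug);
    if (previous === undefined) {
      delta.added.push(doc.slug);
    } else if (previous !== hashContent(doc.content)) {
      delta.changed.push(doc.slug);
    } else {
      delta.unchanged.push(doc.slug);
    }
  }

  // Anything stored but no longer published must be removed from the index.
  for (const entry of stored) {
    if (!seen.has(entry.slug)) {
      delta.removed.push(entry.slug);
    }
  }
  return delta;
}
```

Only the `added` and `changed` slugs would need chunking and embedding, while `removed` slugs would be deleted from the vector index.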

The Assistant Engine (Runs Every ~10 Mins)

This is the core decision-making loop that dictates when and how the AI replies.

Select Candidates: Scans for new user posts and replies that haven't been processed yet.
Triage: Evaluates if the post is actually a question and if it's relevant. If not, it safely skips it.
Retrieve (RAG): Performs a semantic search against the Vector Index to pull the top-K (~8) most relevant context chunks.
Confidence Gate: This is the most crucial step to prevent hallucinations. An LLM acts as a judge to determine if the retrieved context actually contains the answer. If the confidence score is < 0.75, the assistant remains silent.
Generate Answer: If confident, the system generates a response using DeepSeek v4-flash. The prompt mandates the inclusion of citations/references from the retrieved sources.
Human-like Delay: To avoid feeling like a frantic robot, the system schedules a random delay (3-20 minutes) before posting.
Deliver Reply: Just before posting, it re-checks that the user's post still exists, then inserts the comment. All decisions (answered, skipped, scheduled, confidence scores, citations) are logged to an audit table, which supports both later review and idempotency: a post that already has a logged decision is not processed again.
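The triage, confidence gate, and delay steps above can be sketched as one pure decision function. This is a minimal sketch under assumptions: `Triage`, `Decision`, `decide`, and `pickDelayMs` are hypothetical names, and the triage result and confidence score are passed in as plain values, whereas the real pipeline would obtain them from LLM calls with retrieval in between. The 0.75 threshold and the 3 to 20 minute window come from the steps described above.

```typescript
const CONFIDENCE_THRESHOLD = 0.75;
const MIN_DELAY_MINUTES = 3;
const MAX_DELAY_MINUTES = 20;

// Hypothetical triage result; the article describes these two checks.
interface Triage {
  isQuestion: boolean;
  isRelevant: boolean;
}

type Decision =
  | { kind: "skipped"; reason: "not_a_question" | "not_relevant" | "low_confidence" }
  | { kind: "scheduled"; delayMs: number; confidence: number };

// Uniform random delay between the minimum and maximum, in milliseconds.
// `random` is injectable so the function can be tested deterministically.
function pickDelayMs(random: () => number = Math.random): number {
  const minutes =
    MIN_DELAY_MINUTES + random() * (MAX_DELAY_MINUTES - MIN_DELAY_MINUTES);
  return Math.round(minutes * 60 * 1000);
}

function decide(
  triage: Triage,
  confidence: number,
  random: () => number = Math.random
): Decision {
  if (!triage.isQuestion) {
    return { kind: "skipped", reason: "not_a_question" };
  }
  if (!triage.isRelevant) {
    return { kind: "skipped", reason: "not_relevant" };
  }
  // Confidence gate: stay silent unless the judge score reaches the threshold.
  if (confidence < CONFIDENCE_THRESHOLD) {
    return { kind: "skipped", reason: "low_confidence" };
  }
  return { kind: "scheduled", delayMs: pickDelayMs(random), confidence };
}
```

Keeping the decision logic free of I/O makes every branch easy to unit test, and the returned `Decision` value is a natural candidate for what gets written to the audit table.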

2. Tech Stack

To keep the system fast and manageable, the stack relies heavily on modern serverless and API-driven tools:

Backend & Orchestration: Convex handles the Database, Serverless Actions, Cron Jobs, Scheduler, and File Storage.
RAG & Vector Store: @convex-dev/rag handles the chunking, embedding storage, and vector search natively.
Embeddings: Gemini Embedding 001 (3072 dimensions) is used for its high-quality multilingual capabilities and generous free tier.
LLM (Reasoning & Generation): DeepSeek v4-flash (via an OpenAI-compatible API) handles the heavy lifting of triage, confidence scoring, and answer generation.
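For intuition about what a top-K semantic search does, the sketch below ranks stored chunk vectors by cosine similarity to a query vector using brute force. This is an illustration under assumptions, not a description of how @convex-dev/rag is implemented: `Chunk`, `cosineSimilarity`, and `topK` are hypothetical names, and the default of 8 mirrors the top-K value mentioned in the architecture flow.

```typescript
// Hypothetical in-memory chunk; a real store would hold 3072-dimensional vectors.
interface Chunk {
  id: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query, best match first.
function topK(
  query: number[],
  chunks: Chunk[],
  k: number = 8
): { id: string; score: number }[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A dedicated vector index avoids scanning every chunk like this, but the ranking idea is the same: the chunks nearest the question in embedding space become the context handed to the confidence gate.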

3. Common Problems & Solutions

Building AI for production means expecting it to misbehave. Here is how common pitfalls are addressed:

Hallucination / Wrong Answers: Solved via the strict Confidence Gate (score ≥ 0.75) and mandatory references. If it doesn't know, it doesn't speak.
Feels Like a Bot: Addressed by injecting a human-like delay, using a natural tone, varying the opening sentences, and keeping answers short.
Repeated Reprocessing: Avoided by implementing content hashing and delta ingestion (skipping unchanged content).
Outdated Information: The ingestion cron handles removals explicitly: unpublishing or deleting content at the primary source removes it from the vector store on the next ingestion run.
Endless Conversations: Capped at 2 follow-ups per thread to prevent infinite loops.
Moderation & Safety: The AI operates like a normal user. Normal moderation rules apply, and human moderators can easily review the decision logs and remove replies if necessary.
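Two of the safeguards above, the follow-up cap and the audit-based idempotency, can be sketched as a single eligibility check that runs before any LLM call. This is a minimal sketch under assumptions: `AuditRow`, `Eligibility`, and `checkEligibility` are hypothetical names, the audit table is modeled as an in-memory array, and the sketch assumes the first answer in a thread is the initial reply while every later answer counts as a follow-up.

```typescript
// Hypothetical audit-row shape; the decision values follow those named in the article.
interface AuditRow {
  postId: string;
  threadId: string;
  decision: "answered" | "skipped" | "scheduled";
}

const MAX_FOLLOW_UPS_PER_THREAD = 2;

type Eligibility =
  | { eligible: true }
  | { eligible: false; reason: "already_processed" | "follow_up_cap_reached" };

function checkEligibility(
  postId: string,
  threadId: string,
  audit: AuditRow[]
): Eligibility {
  // Idempotency: any logged decision for this post means it was already handled.
  if (audit.some((row) => row.postId === postId)) {
    return { eligible: false, reason: "already_processed" };
  }

  // Assumption: the first answer in a thread is the initial reply,
  // so follow-ups are all answers after the first one.
  const answersInThread = audit.filter(
    (row) => row.threadId === threadId && row.decision === "answered"
  ).length;
  const followUpsSoFar = Math.max(0, answersInThread - 1);
  if (followUpsSoFar >= MAX_FOLLOW_UPS_PER_THREAD) {
    return { eligible: false, reason: "follow_up_cap_reached" };
  }

  return { eligible: true };
}
```

Running this check first means a retried or overlapping engine run cannot double-post, and a thread that has used up its follow-ups costs no further model calls.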

4. Key Design Principles

Ultimately, the architecture adheres to four core principles:

Selective: Answer only when it knows the answer and when it is relevant. Selective silence is always better than noise.
Knowledge-Grounded: All answers must come from curated, embedded content, not from the LLM's pre-trained assumptions.
Human-Like: Act like a helpful community member, not a soulless encyclopedia.
Safe & Auditable: Maintain transparent AI identity, clear decision logs, and full moderation readiness.

By combining strict confidence gates, delta-based ingestion, and human-like delays, you can build an AI assistant that genuinely elevates a community rather than spamming it.