Getting a language model to produce useful output is only part of shipping an AI product. The other part is controlling how the model behaves and what it actually knows. Fine-tuning, alignment, and retrieval are the techniques that bridge the gap between a general-purpose model and one that performs reliably in a specific context. These terms come up in technical discussions, vendor evaluations, and product specs regularly. A vendor offering a "custom AI" solution is likely describing fine-tuning. A product brief mentioning "grounded responses" describes a retrieval strategy. An engineer explaining why the model gives outdated answers is describing a knowledge cutoff. Each of these concepts shapes what an AI feature can and cannot do for users.

Fine-tuning, RLHF, RAG, vector databases, grounding, knowledge cutoffs, and model alignment are the vocabulary that equips designers and product managers to engage with these decisions rather than defer to engineering. Knowing these terms means knowing what questions to ask, what tradeoffs to surface, and what to look for when an AI feature behaves unexpectedly.

Fine-tuning in LLMs

Fine-tuning is the process of taking a pretrained language model and continuing to train it on a smaller, task-specific dataset to improve its performance for a particular use case. A foundation model trained on broad internet text knows a lot about language in general. Fine-tuning narrows that knowledge toward a specific domain, tone, or task, such as writing in a brand voice, answering medical questions accurately, or classifying customer support tickets.

The result is a model that behaves differently from the base model it started as. The underlying architecture stays the same, but the weights shift based on the additional training data, changing what the model produces. Teams that fine-tune a model own that specialized version and can deploy it independently.

For product managers, fine-tuning matters because it is one of the primary ways AI vendors customize model behavior for enterprise clients. When a vendor says their product is "trained on your data," fine-tuning is often what they mean. Understanding what that involves, what data is required, how long it takes, and what it costs helps teams evaluate vendor claims and set realistic expectations.[1]

Pro Tip! Fine-tuning changes model behavior but not its architecture. The same base model can be fine-tuned into many different specialized versions.

RLHF (reinforcement learning from human feedback)

RLHF (reinforcement learning from human feedback)

Reinforcement learning from human feedback, or RLHF, is a training technique used to align a language model's outputs with human preferences. After pretraining, the model generates multiple responses to the same prompt. Human reviewers rank those responses based on criteria like accuracy, helpfulness, and safety. The model then learns from those rankings, adjusting its behavior to produce outputs that more closely match what people prefer.

RLHF is a significant reason why modern conversational AI feels coherent rather than technically correct but socially awkward. Without it, a model might produce grammatically sound text that is still unhelpful, inappropriate, or off-tone. RLHF pushes the model toward outputs that better align with how people actually want to interact with it.

The tone of an AI assistant, its tendency to add caveats, and its handling of ambiguous requests all emerge partly from RLHF. When a team reports that a model behaves inconsistently between versions, or that a fine-tuned model feels noticeably less polished than the base model, the RLHF layer is often the difference worth examining.[2]

RAG (retrieval-augmented generation)

Retrieval-augmented generation, or RAG, is a technique that combines a language model with a retrieval system to give the model access to relevant external information at the moment it generates a response. Instead of relying on what the model learned during training, RAG retrieves relevant documents or data from an external source and passes them as context. The model then uses that context to generate a more accurate, current, or domain-specific response.

The problem RAG solves is fundamental to LLM deployment. A language model's knowledge is fixed at training. It does not know what happened last week, cannot access internal documentation, and cannot look up current pricing. RAG addresses these limitations by connecting the model to a live or curated knowledge source at inference time.

A support chatbot built with RAG can answer questions about current return policies without retraining every time the policy changes. In contexts where accuracy matters to users, like pricing, eligibility, or legal information, RAG enables responses grounded in verified sources rather than model-generated estimates, directly reducing the risk of hallucination.[3]

Vector database in AI

A vector database is a specialized storage system for data represented as numerical vectors. In AI applications, text and other content are converted into vectors by an embedding model. Each vector captures the semantic meaning of the content as a series of numbers. A vector database stores those vectors and enables fast searches based on similarity, retrieving items semantically related to a query rather than just textually identical. Vector databases are the infrastructure that makes RAG work at scale. When a query arrives, the system converts it to a vector, searches for similar vectors, and passes the closest matches to the language model as context. Without this, semantic search at the speed modern applications need would not be practical.

Vector databases come up in discussions about how an AI feature accesses company knowledge. Tools like Pinecone, Weaviate, and pgvector are common implementations. Knowing what a vector database is helps teams ask better questions when a vendor describes their retrieval architecture or when engineering explains why a knowledge base feature needs a different storage type.[4]

Grounding in AI models

Grounding in AI models

Grounding is the practice of anchoring a language model's responses to verifiable external sources rather than relying on what the model learned during training. A grounded response draws on provided documents, databases, or live data. An ungrounded response is generated from the model's internalized patterns, with no external reference to verify accuracy against.

The distinction matters because language models can produce confident, fluent text about topics where their training data was incomplete or outdated. Grounding introduces a check on that tendency by making the model draw from a defined source and, in many implementations, cite it. AI products for high-stakes domains, such as legal or financial information, typically treat grounding as a core design requirement.

A grounded response can point users to a source. An ungrounded response asks users to trust the model. When teams decide whether to ground an AI feature, they are making a tradeoff between response breadth and trustworthiness, a conversation that benefits from shared vocabulary across product, design, and engineering.[5]

Knowledge cutoff

Knowledge cutoff

A knowledge cutoff is the date after which a language model has no information from training. Because LLMs are trained on static datasets compiled up to a specific point, the model cannot know about events or developments that occurred after that date. Asking a model about something post-cutoff will produce either a refusal, a guess, or outdated information presented with false confidence. Knowledge cutoffs have direct product implications. An AI assistant built on a model with a two-year-old cutoff may give incorrect answers about current software versions, regulations, or product details without signaling any uncertainty. This is not a bug in the usual sense. It is a structural constraint of how the model was built.

Product teams address knowledge cutoffs through RAG, which supplements the model with current information at inference time, or through fine-tuning to update domain knowledge. Some products surface the cutoff date in the UI to set user expectations. For product managers writing specs for AI features, defining how the product handles questions about recent events is a necessary requirement, not an edge case.

Pro Tip! When a user asks an AI feature about recent events and gets confident but wrong answers, a knowledge cutoff is often the root cause.

Model alignment

Model alignment

Model alignment is the process of ensuring a language model's behavior reflects the values, goals, and constraints its developers and operators intend. An aligned model responds helpfully, refuses harmful requests, avoids misleading outputs, and behaves consistently with the product's design intent. Misalignment occurs when behavior diverges, whether by being too restrictive, too permissive, or unpredictable.

Alignment is distinct from capability. A highly capable model can still be misaligned, generating content that is harmful, off-brand, or legally risky in certain contexts. An over-aligned model creates a different problem, refusing reasonable requests out of excessive caution and frustrating users. Alignment surfaces when users report that an AI feature won't do what they ask, or when a model produces outputs the team did not anticipate. These situations, whether too restrictive or too permissive, are alignment problems. Treating alignment as a named concept helps teams diagnose and discuss it in vocabulary that engineering, legal, and policy stakeholders share.