Partial matching in full-text search enables finding documents that contain variations or subsets of a search term rather than requiring exact matches. This is achieved through techniques like tokenization, n-gram indexing, and wildcard queries. Unlike exact keyword searches, partial matching accounts for prefixes, suffixes, or substrings within terms, making it useful for autocomplete, typo tolerance, or flexible querying. For example, searching for “app” could return results containing “apple” and “application” with a prefix-based method, and also “snappier” with a substring-based method.
One common approach involves n-gram tokenization, where terms are split into smaller overlapping character sequences. For instance, splitting “apple” into trigrams (3-character sequences) generates the tokens “app,” “ppl,” and “ple.” When a user searches for “app,” the engine looks up the trigram “app” among the indexed tokens and returns every term that produced it. This method is efficient for substring matching but increases index size, because each term is stored as multiple n-grams. Edge n-grams, which keep only prefixes (e.g., “a,” “ap,” “app” for “apple”), are often used for autocomplete features. Search engines like Elasticsearch leverage this by configuring analyzers to generate edge n-grams during indexing, enabling fast prefix-based queries.
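The difference between full n-grams and edge n-grams can be sketched in a few lines of plain Python. This is a minimal illustration rather than how any particular engine implements it: the function names (ngrams, edge_ngrams, build_index) and the sample terms are hypothetical, and the in-memory dictionary stands in for a real inverted index.

```python
from collections import defaultdict


def ngrams(term, n=3):
    """Return every overlapping n-character substring of term."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def edge_ngrams(term, min_len=1, max_len=None):
    """Return the prefixes of term, from min_len up to max_len characters."""
    max_len = len(term) if max_len is None else min(max_len, len(term))
    return [term[:k] for k in range(min_len, max_len + 1)]


def build_index(terms, tokenizer):
    """Map each token produced by tokenizer to the set of terms that yield it."""
    index = defaultdict(set)
    for term in terms:
        for token in tokenizer(term):
            index[token].add(term)
    return index


# Hypothetical sample vocabulary.
terms = ["apple", "application", "snappier", "maple"]

trigram_index = build_index(terms, ngrams)      # substring matching
prefix_index = build_index(terms, edge_ngrams)  # prefix matching only

print(ngrams("apple"))               # ['app', 'ppl', 'ple']
print(sorted(trigram_index["app"]))  # ['apple', 'application', 'snappier']
print(sorted(prefix_index["app"]))   # ['apple', 'application']
```

A three-character query maps to a single trigram lookup here. A longer query would typically be split into its own trigrams, with the candidate sets intersected and then verified against the original terms.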
Another method uses wildcard operators, such as * (matches any sequence of characters) or ? (matches a single character). For example, a query like “app*” matches terms starting with “app,” while “*pple” matches terms ending with “pple.” However, leading wildcards (e.g., *pple) can be slow unless supported by reverse indexes, because without a known prefix the engine cannot narrow the search and has to examine every term. Some engines also support fuzzy matching, which tolerates typos by measuring edit distance (e.g., “apl” matches “apple” with two insertions). While not strictly partial matching, fuzzy techniques complement it by handling approximate terms. Databases like PostgreSQL use LIKE clauses or built-in full-text search (e.g., the tsvector type) for basic partial matching, though performance varies. Developers must balance precision, index size, and query speed: n-grams trade storage for flexibility, wildcards trade query speed for it, and fuzzy methods prioritize tolerance over exactness.