What is a wildcard search in full-text search?

A wildcard search in full-text search is a technique that allows users to substitute one or more characters in a search query with special symbols, enabling flexible matching of terms with unknown or variable parts. The most common wildcards are the asterisk (*), which typically represents zero or more characters, and the question mark (?), which usually matches exactly one character. For example, searching for "comp*ter" could return "computer", "compacter", or "completer", while "b?g" might match "bag", "big", or "bug". This approach is useful when the exact spelling or form of a term is uncertain, or when targeting variations of a word within a large dataset.
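The matching rules above can be illustrated with a minimal Python sketch that translates a wildcard pattern into a regular expression. The function names (wildcard_to_regex, wildcard_match) and the sample term list are made up for illustration; real search engines match against an index rather than a plain list.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regex.

    Assumed semantics: '*' = zero or more characters, '?' = exactly one.
    Every other character is escaped and matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def wildcard_match(pattern: str, terms: list[str]) -> list[str]:
    """Return the terms that match the whole pattern, in input order."""
    rx = wildcard_to_regex(pattern)
    return [t for t in terms if rx.fullmatch(t)]

# Hypothetical vocabulary, used only for this example.
terms = ["computer", "compacter", "completer", "compute",
         "bag", "big", "bug", "bring"]

print(wildcard_match("comp*ter", terms))  # ['computer', 'compacter', 'completer']
print(wildcard_match("b?g", terms))       # ['bag', 'big', 'bug']
```

Note that "compute" and "bring" are rejected: the first does not end in "ter", and the second has two characters where "?" allows only one.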

Under the hood, wildcard searches rely on pattern matching against the indexed terms. For instance, a search engine might use an inverted index (a data structure mapping terms to the documents and positions where they occur) to find matches efficiently: the wildcard pattern is first expanded to the matching terms, and those terms are then looked up in the index. However, wildcards can impact performance. A trailing wildcard like "run*" can typically leverage a sorted term dictionary by scanning only the terms that start with "run", but a leading wildcard like "*ing" gives the engine no prefix to seek to, so it has to examine every term to find those ending with "ing", which is slower. Some systems speed up middle or leading wildcards by indexing n-grams (short fixed-length character fragments of each term) and speed up prefix matching with edge n-grams (fragments anchored at the start of each term), but these require additional configuration and storage.
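To make the cost difference concrete, here is a toy sketch of an inverted index with a sorted term list. It is an illustration under simplifying assumptions (whitespace tokenization, in-memory dictionaries, trigrams as the n-gram size), not how any particular engine is implemented; all names and sample documents are hypothetical.

```python
from bisect import bisect_left

# Hypothetical sample documents.
docs = {
    1: "run to the running track",
    2: "the runner keeps singing",
    3: "a singer found the missing ring",
}

# Inverted index: term -> set of document ids containing it.
index: dict[str, set[int]] = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

sorted_terms = sorted(index)

def prefix_terms(prefix: str) -> list[str]:
    """Trailing wildcard (prefix*): binary-search to the first candidate,
    then walk forward only while terms still start with the prefix."""
    out = []
    i = bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out

def suffix_terms_scan(suffix: str) -> list[str]:
    """Leading wildcard (*suffix): the sort order does not help,
    so every term in the dictionary is examined."""
    return [t for t in sorted_terms if t.endswith(suffix)]

# Trigram index: 3-character fragment -> terms containing it.
trigrams: dict[str, set[str]] = {}
for term in sorted_terms:
    for i in range(len(term) - 2):
        trigrams.setdefault(term[i:i + 3], set()).add(term)

def suffix_terms_ngram(suffix: str) -> list[str]:
    """Leading wildcard via trigrams: fetch candidates that contain the
    last three characters, then verify each one really ends with the suffix."""
    if len(suffix) < 3:
        return suffix_terms_scan(suffix)  # too short for a trigram lookup
    candidates = trigrams.get(suffix[-3:], set())
    return sorted(t for t in candidates if t.endswith(suffix))

def docs_for(terms: list[str]) -> list[int]:
    """Union the posting sets of the expanded terms."""
    result: set[int] = set()
    for t in terms:
        result |= index[t]
    return sorted(result)

print(prefix_terms("run"))             # ['run', 'runner', 'running']
print(docs_for(prefix_terms("run")))   # [1, 2]
print(suffix_terms_scan("ing"))        # ['missing', 'ring', 'running', 'singing']
print(suffix_terms_ngram("ing"))       # same terms; 'singer' is filtered out
```

The trigram lookup trades storage for speed: it keeps an extra map from fragments to terms, and it still needs the verification step because "singer" contains "ing" without ending in it.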

Wildcard searches are practical for scenarios like autocomplete features (e.g., "progr*" suggesting "programming"), handling spelling variants ("col*r" matching both "color" and "colour", whereas "col?r" matches only "color" because ? stands for exactly one character), or querying unpredictable data formats (product codes like "ABC-123"). However, developers should use them judiciously. Overusing wildcards, especially leading ones, can slow down queries. Alternatives like prefix queries (for trailing patterns) or fuzzy search (for typos) might be more efficient. Additionally, syntax varies across systems: Elasticsearch wildcard queries use * and ?, while SQL LIKE patterns use % and _. Understanding these nuances ensures effective implementation without sacrificing performance.
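The syntax difference between systems can be bridged with a small translation step. The sketch below converts a pattern written with * and ? into a SQL LIKE pattern and runs it against an in-memory SQLite table using Python's standard sqlite3 module. The table name, column name, sample codes, and the choice of "!" as the escape character are all assumptions made for this example.

```python
import sqlite3

def to_like(pattern: str, escape: str = "!") -> str:
    """Convert a '*'/'?' wildcard pattern to a SQL LIKE pattern.

    '*' becomes '%', '?' becomes '_'. Literal '%', '_' and the escape
    character itself are prefixed with the escape character so they
    are not treated as wildcards by LIKE.
    """
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in ("%", "_", escape):
            out.append(escape + ch)
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical product table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (code TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("ABC-123",), ("ABC-999",), ("ABD-123",), ("ABC_123",)],
)

def search(pattern: str) -> list[str]:
    """Run a wildcard search as a parameterized LIKE query."""
    rows = conn.execute(
        "SELECT code FROM products WHERE code LIKE ? ESCAPE '!' ORDER BY code",
        (to_like(pattern),),
    )
    return [row[0] for row in rows]

print(to_like("ABC-*"))     # ABC-%
print(search("ABC-*"))      # ['ABC-123', 'ABC-999']
print(search("AB?-123"))    # ['ABC-123', 'ABD-123']
print(search("ABC_*"))      # ['ABC_123']
```

Escaping matters here: without it, the literal underscore in "ABC_*" would act as a single-character wildcard and also match "ABC-123" and "ABC-999".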
