In clinical software, a slow autocomplete is not a minor inconvenience. It is a patient safety risk. When a pharmacist types the first three letters of a drug name and waits — even for two seconds — the cognitive thread breaks. They may accept the first suggestion rather than the correct one. They may retype. In high-volume dispensing environments, that friction compounds across hundreds of interactions per shift.
This article describes the architecture behind the drug interaction and vocabulary module in Promed HIS, Yoctobe’s Hospital Information System. The vocabulary module is the most comprehensive of its kind in a commercially deployed HIS — covering drug nomenclature, clinical findings, laboratory observations, diagnostic codes, and multilingual synonyms across more than 14 million medical concept strings. What follows focuses specifically on how we engineered the terminology search layer to serve prefix completions across that entire corpus in under one millisecond for the majority of queries.
The Problem Space: Medical Terminology Is Not Ordinary Data
Most autocomplete tutorials assume a few hundred thousand records at most — product names, city names, usernames. Medical terminology is a different order of magnitude entirely.
A comprehensive clinical terminology corpus consolidates multiple international source vocabularies into a unified concept table. A single production deployment contains strings drawn from:
- RxNorm — drug names, ingredient concepts, clinical drug forms, brand names
- SNOMED CT — clinical findings, procedures, body structures, organisms
- LOINC — laboratory and clinical observations
- ICD-10-CM / ICD-11 — diagnostic codes and their preferred terms
- MeSH, NCI Thesaurus, DrugBank — and dozens more
At full scale, the concept string table exceeds 14 million rows. Each row represents a string — a name, synonym, abbreviation, or translation — associated with a concept identifier, a source vocabulary, and a language code. A single drug concept like paracetamol may appear under dozens of rows: its INN name, brand names, RxNorm ingredient form, SNOMED preferred term, French translation, Spanish translation, and so on.
For a drug interaction and clinical vocabulary module to be genuinely useful at the point of care, it must search this corpus in real time, as the clinician types, with no perceptible latency.
Why the Naive Approach Fails
The instinctive first implementation of terminology autocomplete is a SQL LIKE query:
SELECT str FROM terminology
WHERE str LIKE 'para%'
AND suppress = 'N'
AND lang = 'ENG'
LIMIT 25
On a table of 14 million rows, this query cannot use an index at all: MySQL’s FULLTEXT index serves relevance-scored natural language search and does not answer LIKE predicates, so InnoDB falls back to a full table scan. In our benchmarks, cold queries consistently exceeded 30 seconds. Even with a warm InnoDB buffer pool, execution times remained in the 4–8 second range — completely unsuitable for real-time autocomplete.
The fundamental issue is architectural. B-tree indexes do support prefix scans efficiently, but the source terminology table has no B-tree index on the string column in a form that benefits prefix lookups. Adding one directly to the source table is inadvisable — the column is VARCHAR(3000), far beyond InnoDB’s index key length limits, and structural changes to imported terminology tables break reproducible vocabulary update pipelines.
The Three-Layer Architecture
The solution Yoctobe implemented for Promed HIS separates concerns across three distinct layers, each optimised for a specific access pattern.
Layer 1: The Shadow Table
Rather than modifying the source terminology table, we introduce a purpose-built shadow table: autocomplete_index. It is populated from the source data and maintained incrementally, but it exists independently and can be rebuilt without touching the source terminology.
CREATE TABLE autocomplete_index (
str_lower VARCHAR(500) NOT NULL,
str VARCHAR(3000) NOT NULL,
sab VARCHAR(40) NOT NULL,
lat CHAR(3) NOT NULL,
PRIMARY KEY (str_lower, sab, lat),
INDEX idx_ac_lat_str (lat, str_lower),
INDEX idx_ac_sab_str (sab, str_lower),
INDEX idx_ac_all (lat, sab, str_lower)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Several design decisions are embedded here. str_lower stores the lowercased, truncated (500 character) form of the original string. This normalisation means prefix queries are case-insensitive by construction, without runtime LOWER() calls that would prevent index utilisation. The composite primary key on (str_lower, sab, lat) eliminates duplicate entries across vocabulary sources automatically via INSERT IGNORE. The three composite indexes cover the three most common query shapes: language-only, vocabulary-filtered, and both combined.
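The normalisation and deduplication rules above can be modelled in a few lines of plain Python. This is an illustrative sketch of the population step, not Promed’s actual code; the helper names are hypothetical, but the 500-character truncation and the (str_lower, sab, lat) key mirror the schema just shown.

```python
def normalise(s: str, max_len: int = 500) -> str:
    """Lowercase and truncate, mirroring LOWER(SUBSTRING(str, 1, 500))."""
    return s.lower()[:max_len]

def build_index(rows):
    """Deduplicate on (str_lower, sab, lat), the same effect INSERT IGNORE
    has against the composite primary key."""
    index = {}
    for s, sab, lat in rows:
        key = (normalise(s), sab, lat)
        index.setdefault(key, s)  # first writer wins; later duplicates are ignored
    return index

rows = [
    ("Paracetamol", "RXNORM", "ENG"),
    ("PARACETAMOL", "SNOMEDCT", "ENG"),  # same string, different vocabulary: kept
    ("paracetamol", "RXNORM", "ENG"),    # duplicate composite key: ignored
]
index = build_index(rows)
print(len(index))  # → 2
```

Because the key is already lowercased, a later prefix query never needs a runtime LOWER() call, which is exactly what keeps the B-tree index usable.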
A prefix scan against this table is now a B-tree range operation. For the query str_lower LIKE 'para%', MySQL performs a single B-tree seek to the first entry matching para, then reads sequentially until the prefix no longer matches. On 14 million normalised strings, this executes in under 10 milliseconds — a 400- to 800-fold improvement over even the warm-cache naive query.
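The seek-then-scan access pattern can be illustrated with Python’s bisect module over a sorted list, which behaves like a walk along B-tree leaf pages. This is a toy model of the access pattern, not the InnoDB internals.

```python
import bisect

def prefix_scan(sorted_terms, prefix, limit=25):
    """Seek to the first entry >= prefix, then read sequentially until
    the prefix no longer matches -- the same shape as a B-tree range scan."""
    start = bisect.bisect_left(sorted_terms, prefix)
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # left the prefix range: stop reading
        out.append(term)
        if len(out) == limit:
            break
    return out

terms = sorted(["paracetamol", "paracetamol 500 mg", "paroxetine",
                "penicillin", "ibuprofen"])
print(prefix_scan(terms, "para"))  # → ['paracetamol', 'paracetamol 500 mg']
```

The cost is one O(log n) seek plus one read per returned row, which is why the result set size, not the table size, dominates latency.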
Layer 2: Sharded Redis Sorted Sets
For the most common query pattern — English-language, no vocabulary filter, query length two or more characters — we can do better still.
Redis sorted sets support a lexicographic range command, ZRANGEBYLEX, that retrieves all members between two string bounds. A single sorted set containing all 4 million distinct English term strings can answer a bounded prefix query in microseconds: ZRANGEBYLEX with a LIMIT costs O(log N + M) in the number of members returned. However, there is a critical subtlety: Redis is single-threaded. Any operation that walks a large portion of such a set (an unbounded range, a deep LIMIT offset, or a bulk rebuild or deletion of the key) blocks every other Redis command while it runs, and a single 4-million-member key concentrates all of that risk in one place.
The solution is sharding by the first two characters of the term. Instead of one monolithic key, we maintain 676 keys of the form ac:ENG:{xy} where xy is the lowercase two-character prefix — aa, ab, …, zz. Each shard contains approximately 6,000 members on average. A ZRANGEBYLEX on 6,000 members is O(log 6000 + limit) — effectively constant time, completing in under one millisecond including network round-trip.
# Route to the two-character shard for this query, then take an
# inclusive lexicographic range from the prefix up to prefix + '\xff'.
# Assumes query is already validated to be >= 2 characters.
shard_key = f"ac:ENG:{query.lower()[:2]}"
suggestions = await redis.zrangebylex(
    shard_key,
    f"[{query.lower()}",        # '[' marks an inclusive bound
    f"[{query.lower()}\xff",
    start=0, num=limit
)
Single-character queries — which would require scanning all shards beginning with that character — are deliberately routed to the shadow table instead. In clinical practice, single-character autocomplete is not useful; a clinician typing "p" for paracetamol is not yet at a point where suggestions have value.
Layer 3: The Query Router
The application layer routes each autocomplete request through a decision tree:
- Redis fast path — if the query is English, unfiltered, two or more characters, and the Redis status key indicates shards are fully populated: serve from Redis. Latency: under 1 millisecond.
- Shadow table medium path — if Redis is unavailable, still warming up, or the query has vocabulary or language filters: serve from autocomplete_index. Latency: 5–15 milliseconds.
- Result cache — all responses, regardless of path, are cached in Redis with a configurable TTL under a SHA-256 derived key. Repeated queries for common prefixes — "met", "amox", "ibu" — are served from cache at sub-millisecond latency from the second request onward.
The clinical implication is significant. Even during the minutes immediately after application startup, before Redis shards are fully populated, autocomplete continues to function correctly via the shadow table. There is no degraded mode that returns empty results or errors — only a graceful latency difference that is imperceptible to the end user.
Population Strategy: Keyset Pagination at 14 Million Rows
Building the shadow table and Redis shards from a 14 million row source requires careful engineering to avoid locking the database or timing out application connections.
The instinctive approach — batch INSERT with LIMIT and OFFSET — fails at scale. OFFSET 7500000 instructs MySQL to scan and discard 7.5 million rows before returning the batch. Population time grows quadratically; the process consistently fails before completion on resource-constrained servers.
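The asymmetry is easy to quantify by counting scanned rows. This is a back-of-envelope model (it ignores the per-batch O(log n) seek and assumes an evenly divisible row count), but it shows why total OFFSET cost grows quadratically while keyset pagination stays linear.

```python
def offset_rows_scanned(total_rows: int, batch: int) -> int:
    """Rows MySQL must walk for LIMIT/OFFSET pagination:
    each batch re-scans every row before its offset, then reads the batch."""
    n_batches = (total_rows + batch - 1) // batch
    return sum(i * batch + batch for i in range(n_batches))

def keyset_rows_scanned(total_rows: int, batch: int) -> int:
    """Keyset pagination reads each row exactly once."""
    return total_rows

total, batch = 14_900_000, 50_000
ratio = offset_rows_scanned(total, batch) / keyset_rows_scanned(total, batch)
print(f"{ratio:.0f}x")  # → 149x: OFFSET walks ~150 times more rows in total
```

At 14.9 million rows, the OFFSET strategy touches over two billion row positions across the run, which is why it consistently failed before completion on constrained servers.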
The correct approach is cursor-based (keyset) pagination using the primary key:
INSERT IGNORE INTO autocomplete_index (str_lower, str, sab, lat)
SELECT LOWER(SUBSTRING(str, 1, 500)), str, sab, lat
FROM terminology
WHERE suppress = 'N'
AND id > :last_id
ORDER BY id
LIMIT :batch
Each batch seeks directly to the next unprocessed row via the primary key B-tree — O(log n) regardless of position. A 14.9 million row table with 50,000-row batches requires approximately 298 batches, each completing in 1–3 seconds, for a total population time of 5–15 minutes running cleanly in the background while the application serves traffic.
The cursor value — the maximum id processed — is persisted in an index_meta table within the same database transaction as each batch insert. If the process is interrupted by a crash, restart, or out-of-memory event, the next invocation reads the last committed cursor and resumes exactly from that point. No rows are skipped. No rows are duplicated. The INSERT IGNORE combined with the primary key constraint makes every batch idempotent.
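The crash-safety guarantee can be modelled end to end in a few lines. This in-memory simulation stands in for MySQL: the dictionary plays the role of the shadow table, setdefault plays INSERT IGNORE, and updating the cursor in the same step as the batch stands in for the same-transaction commit.

```python
# Source rows: (id, string), 100 rows in ascending id order.
source = [(i, f"term-{i}") for i in range(1, 101)]

def populate(source, state, batch=10, crash_after=None):
    """Process batches of rows with id > cursor; optionally 'crash'
    after N batches to simulate an interruption mid-run."""
    batches_done = 0
    while True:
        rows = [r for r in source if r[0] > state["cursor"]][:batch]
        if not rows:
            return
        for _rid, s in rows:
            state["index"].setdefault(s.lower(), s)  # idempotent, like INSERT IGNORE
        state["cursor"] = rows[-1][0]  # committed together with the batch
        batches_done += 1
        if crash_after and batches_done == crash_after:
            raise RuntimeError("simulated crash")

state = {"cursor": 0, "index": {}}
try:
    populate(source, state, crash_after=3)  # dies after 30 rows
except RuntimeError:
    pass
populate(source, state)  # resumes from cursor 30, not from zero
print(state["cursor"], len(state["index"]))  # → 100 100
```

Rerunning populate after completion is also a no-op: the first batch query finds no rows above the cursor and returns immediately.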
Incremental Sync: Keeping Terminology Current
Medical vocabularies are updated continuously — new drug approvals, revised preferred terms, additional language translations, updated interaction data. In a deployed HIS, new terminology rows may be appended to the source table at any time without warning.
The sync mechanism is designed around a simple invariant: the cursor represents the highest source table id that has been indexed. On each manual sync trigger, the system compares the current source MAX(id) against the stored cursor. If they are equal, nothing has changed and the operation returns immediately. If the source table has grown, only the delta — rows with id greater than the cursor — is processed.
source_max = await get_source_max_id()
ac_cursor = await get_ac_cursor()
if source_max <= ac_cursor:
return {"status": "up_to_date"}
# process only rows with id > ac_cursor
This makes sync operations safe to trigger frequently. An operator who runs the sync endpoint after every vocabulary patch imports only the new rows, regardless of how many times the endpoint has been called previously. The operation is idempotent, resumable, and proportional in cost to the actual change volume rather than the total table size.
If a sync is interrupted mid-run, the next trigger resumes from the last committed cursor batch — not from zero. The combination of keyset pagination, transactional cursor commits, and INSERT IGNORE idempotency means interruption is always safe.
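The delta check itself is small enough to sketch synchronously. The function and callback names here are illustrative stand-ins for the async helpers shown above; the invariant is the one the article describes: compare source MAX(id) with the cursor and process only the gap.

```python
def sync(source_max_id: int, cursor: int, process_delta) -> dict:
    """Compare source MAX(id) with the stored cursor; process only rows
    with id in (cursor, source_max_id]. Calling twice is a cheap no-op."""
    if source_max_id <= cursor:
        return {"status": "up_to_date", "processed": 0}
    processed = process_delta(cursor, source_max_id)
    return {"status": "synced", "processed": processed}

# Toy delta processor: just report how many ids fall in the gap.
count_gap = lambda lo, hi: hi - lo

result1 = sync(14_950_000, 14_900_000, count_gap)  # vocabulary patch added 50k rows
result2 = sync(14_950_000, 14_950_000, count_gap)  # nothing new since last sync
print(result1, result2)
```

The cost of each call is proportional to the delta, so an operator can safely wire the sync endpoint into every vocabulary import job.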
Memory Efficiency on Constrained Infrastructure
Healthcare institutions in emerging markets — the primary client base for Promed HIS — frequently operate with constrained server infrastructure. The architecture is explicitly designed to perform on a server with 8GB RAM and 2 CPU cores hosting MySQL, Redis, and the application simultaneously.
The memory budget breaks down as follows:
- MySQL InnoDB buffer pool: 4 GB — sufficient to cache the autocomplete_index working set and warm source terminology indexes
- Redis AC shards: approximately 640 MB — 4 million terms at roughly 160 bytes per member including skiplist and key overhead, across 676 shard keys
- Application processes: 256 MB
- OS and kernel: 512 MB
- Free headroom: approximately 2.5 GB
Redis is configured with maxmemory 1800mb and volatile-lru eviction policy. The AC shard keys carry no TTL and are therefore never evicted under memory pressure. Search result cache keys carry TTLs and are the first to be evicted if memory tightens. This ensures that the core autocomplete index — the most expensive asset to rebuild — is always resident in memory.
Persistence is configured as RDB snapshot only, with AOF disabled. On VPS infrastructure with shared storage, AOF write latency introduces unpredictable Redis blocking. Snapshots at 15-minute intervals provide sufficient durability given that the Redis content is fully reconstructable from MySQL in under 20 minutes.
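The memory and persistence policy above corresponds to a redis.conf fragment along these lines. The values shown are the ones stated in this article; treat the fragment as representative rather than a drop-in configuration for any deployment.

```
# Cap Redis below the ~1.8 GB budget; evict only keys that carry a TTL,
# so the TTL-less autocomplete shard keys are never evicted.
maxmemory 1800mb
maxmemory-policy volatile-lru

# RDB snapshots only (every 15 minutes if anything changed);
# AOF disabled to avoid fsync stalls on shared VPS storage.
save 900 1
appendonly no
```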
Clinical Relevance: Drug Interactions at the Point of Prescribing
The architecture described above is not incidental infrastructure — it directly enables clinical safety features in Promed HIS that would otherwise be impractical.
The drug interaction module works as follows. As a clinician types a drug name during order entry, autocomplete suggestions are drawn from normalised ingredient concepts — the most clinically relevant vocabulary layer for prescribing decisions. When a concept is selected, its canonical identifier is resolved, and the interaction database is queried in real time against the patient’s current medication list.
This workflow is only viable if autocomplete is genuinely instantaneous. A 500-millisecond response is acceptable in a product search context. In prescribing, it breaks the clinical thought process. The clinician must hold in working memory the drug they intend to prescribe, the patient’s current medications, the clinical indication, and the appropriate dose. Introducing latency at the drug selection step adds cognitive load at exactly the moment when precision matters most.
Sub-millisecond autocomplete is not a performance optimisation. It is a prerequisite for the interaction checking feature to function as designed.
Operational Endpoints
Promed HIS exposes three administrative endpoints for terminology index management:
GET /api/v1/complete/status returns the current state of both the shadow table and Redis shards, including row counts, cursor positions, and sync status. This allows operators to verify that terminology is current after a vocabulary update.
POST /api/v1/complete/status/populate triggers an incremental sync — processing only rows added since the last cursor. This is the routine maintenance operation, safe to call immediately after any terminology update.
POST /api/v1/complete/status/resync triggers a full rebuild — truncating the shadow table and Redis shards and repopulating from scratch. This is reserved for complete vocabulary version upgrades where the entire terminology corpus is replaced rather than extended.
During any rebuild operation, autocomplete degrades gracefully through the layer hierarchy: Redis shards are unavailable, so queries fall through to the shadow table, which continues to serve results from the previous terminology version until the rebuild completes.
Conclusion
The design pattern described here — shadow table with B-tree prefix index, sharded Redis sorted sets, cursor-based population, and a three-layer query router — is not tied to any specific terminology standard or vocabulary source. It is a general solution to the problem of real-time prefix search across any large, periodically updated reference dataset in a healthcare context. It is also the foundation that makes the vocabulary module of Promed HIS the most extensive and performant of its kind on the market: 14 million concepts, sub-millisecond retrieval, and a sync architecture that keeps the index current without ever taking the system offline.
What makes it particularly suited to clinical informatics is the combination of properties it delivers simultaneously: sub-millisecond latency for the common case, graceful degradation when the fast layer is unavailable, crash-safe incremental sync for ongoing maintenance, and memory efficiency that makes it viable on infrastructure representative of the environments where clinical software is actually deployed.
In medical terminology search, the cost of getting this wrong is not a poor user experience. It is a clinician choosing the wrong drug because the right one was too slow to appear.
Promed HIS is developed by Yoctobe Ltd, a UK-based healthcare software company specialising in Laboratory Information Systems and Hospital Information Systems for clinical environments across Africa and the Middle East. For technical enquiries, visit yoctobe.com.

