Implementing AI-Powered Named Entity Recognition (NER) in Mobile Applications
NER (Named Entity Recognition) extracts structured data from unstructured text. A user enters "deliver tomorrow at 6 PM to 5 Maple Street, apartment 12", and NER extracts the date, time, street, building number, and apartment number as separate fields. Without NER, the alternatives are manual entry in five fields or brittle regex that breaks on the first nonstandard format.
Where NER is used in mobile apps
Smart forms and autofill. A user messages a courier, and the app parses the delivery address and time without separate form fields.
Search with filters. "iPhone 15 Pro 256GB black" → {brand: Apple, model: iPhone 15 Pro, storage: 256GB, color: black}. Structured queries beat full-text search.
Chatbots and voice assistants. Extract parameters from free speech or text to fill dialog slots.
Receipt and document processing. OCR text from receipt → {store, total, date, items}.
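The search-filter case can be illustrated without a trained model. The sketch below uses a plain dictionary lookup (a gazetteer) plus one regex rather than learned NER, so it only shows the shape of the output. The catalog entries and the `parse_query` name are hypothetical.

```python
import re

# Hypothetical catalog vocabulary; a real app would load this from its product database.
MODELS = {"iphone 15 pro": "Apple", "galaxy s24": "Samsung"}
COLORS = {"black", "white", "blue"}
STORAGE_PATTERN = re.compile(r"\b(\d+)\s?(GB|TB)\b", re.IGNORECASE)


def parse_query(query: str) -> dict:
    """Turn a free-text product query into a structured filter."""
    lowered = query.lower()
    result = {}
    # Try longer model names first so a longer name wins over a shorter prefix.
    for model in sorted(MODELS, key=len, reverse=True):
        if model in lowered:
            result["brand"] = MODELS[model]
            result["model"] = model
            break
    storage = STORAGE_PATTERN.search(query)
    if storage:
        result["storage"] = f"{storage.group(1)}{storage.group(2).upper()}"
    for word in lowered.split():
        if word in COLORS:
            result["color"] = word
            break
    return result
```

A lookup like this covers only values already in the catalog. A trained NER model is what handles spellings and phrasings the dictionary has never seen.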
Technical approaches
spaCy + custom NER model
spaCy is a widely used production library for NER. The base English model en_core_web_lg recognizes persons, organizations, locations, and dates. For domain-specific entities (clothing sizes, product SKUs, medical terms), fine-tune the model on your own labeled examples.
import spacy
from spacy.training import Example

# Load base English model
nlp = spacy.load("en_core_web_lg")

# Add custom entity types
ner = nlp.get_pipe("ner")
ner.add_label("PRODUCT_SIZE")   # "42", "XL", "M/L"
ner.add_label("PRODUCT_COLOR")  # "black", "navy blue"
ner.add_label("ARTICLE")        # "SKU 12345", "PN-ABC"

# Training example: entities are (start_char, end_char, label), end exclusive
TRAIN_DATA = [
    ("I want to find sneakers size 42 in blue color SKU 98765",
     {"entities": [(29, 31, "PRODUCT_SIZE"), (35, 39, "PRODUCT_COLOR"), (46, 55, "ARTICLE")]}),
]

# Train on custom data, updating only the NER component
optimizer = nlp.resume_training()
with nlp.select_pipes(enable="ner"):
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        nlp.update([example], sgd=optimizer)
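spaCy's entity annotations are (start, end, label) character offsets with an exclusive end, and counting them by hand is error-prone. One way to avoid that is to derive the offsets from the substrings themselves. The `annotate` helper below is a hypothetical convenience function, not part of spaCy.

```python
def annotate(text: str, spans: list[tuple[str, str]]) -> tuple[str, dict]:
    """Build a (text, {"entities": [...]}) pair from (substring, label) pairs."""
    entities = []
    search_from = 0
    for substring, label in spans:
        start = text.find(substring, search_from)
        if start == -1:
            raise ValueError(f"{substring!r} not found after position {search_from}")
        end = start + len(substring)
        entities.append((start, end, label))
        # Spans must be listed in reading order and must not overlap.
        search_from = end
    return text, {"entities": entities}


text, annotations = annotate(
    "I want to find sneakers size 42 in blue color SKU 98765",
    [("42", "PRODUCT_SIZE"), ("blue", "PRODUCT_COLOR"), ("SKU 98765", "ARTICLE")],
)
print(annotations["entities"])
# [(29, 31, 'PRODUCT_SIZE'), (35, 39, 'PRODUCT_COLOR'), (46, 55, 'ARTICLE')]
```

A cheap sanity check before training is to confirm that `text[start:end]` equals the intended substring for every annotation.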
Transformer NER via Hugging Face
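For high accuracy on complex domains: a BERT model such as bert-base-cased, fine-tuned with a token-classification (NER) head. Slower than spaCy (50–200 ms vs 5–20 ms; actual figures depend on hardware and input length), but it handles complex contexts and long entities better.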
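from dataclasses import dataclass
from transformers import pipeline

@dataclass
class Entity:
    text: str
    label: str
    confidence: float
    start: int
    end: int

# The model must already be fine-tuned for NER: a bare bert-base-cased has an untrained
# classification head. dslim/bert-base-NER is one public checkpoint (CoNLL-2003 labels:
# PER, ORG, LOC, MISC); substitute your own fine-tuned model.
ner_pipeline = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple"  # combines B- and I- tokens into one entity
)

def extract_entities(text: str) -> list[Entity]:
    raw_entities = ner_pipeline(text)
    return [
        Entity(
            text=e["word"],
            label=e["entity_group"],
            confidence=float(e["score"]),
            start=e["start"],
            end=e["end"]
        )
        for e in raw_entities
        if e["score"] > 0.7  # drop low-confidence predictions
    ]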
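To make the B-/I- aggregation concrete, here is a minimal sketch of how per-token BIO tags can be merged into entities. It is a simplified illustration of the idea, not the Hugging Face implementation, and `group_bio` is a hypothetical name.

```python
def group_bio(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Merge per-token BIO tags into (entity_text, label) pairs."""
    entities = []
    current_words, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            current_words.append(token)  # continue the open entity
            continue
        if current_words:
            # Any other tag closes the open entity.
            entities.append((" ".join(current_words), current_label))
            current_words, current_label = [], None
        if prefix in ("B", "I"):  # a stray I- tag also opens a new entity
            current_words, current_label = [token], label
    if current_words:
        entities.append((" ".join(current_words), current_label))
    return entities
```

For example, tokens `["Maple", "Street"]` tagged `["B-LOC", "I-LOC"]` collapse into the single entity `("Maple Street", "LOC")`.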
Regex + NER: hybrid for structured domains
For typed entities with predictable formats, regex is more reliable than ML. Phone numbers, emails, SKUs, dates in specific formats: use regex. Organizations, locations, free descriptions: use NER. The hybrid works better than either approach alone.
import re

class EntityExtractor:
    PHONE_PATTERN = re.compile(r'(?:\+1)?[\s\-]?\(?\d{3}\)?[\s\-]?\d{3}[\s\-]?\d{4}')
    EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
    DATE_PATTERN = re.compile(r'\b(\d{1,2})[./](\d{1,2})(?:[./](\d{2,4}))?\b')

    def extract_all(self, text: str) -> dict:
        # Regex for structured formats
        phones = [p.strip() for p in self.PHONE_PATTERN.findall(text)]
        emails = self.EMAIL_PATTERN.findall(text)
        # DATE_PATTERN has capture groups, so take the whole match instead of findall() tuples
        dates = [m.group(0) for m in self.DATE_PATTERN.finditer(text)]
        # NER for free-form entities; label names depend on the model
        # (CoNLL-style models use PER/LOC, spaCy's English models use PERSON/GPE/LOC)
        ner_entities = extract_entities(text)
        locations = [e.text for e in ner_entities if e.label in ("LOC", "GPE")]
        persons = [e.text for e in ner_entities if e.label in ("PER", "PERSON")]
        return {
            "phones": phones,
            "emails": emails,
            "dates": dates,
            "locations": locations,
            "persons": persons
        }
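When both extractors report character positions, the two result sets can overlap: an NER model may tag part of a phone number as a number or date. A minimal sketch of one resolution policy follows, assuming regex matches take priority for typed formats. The `Span` type and `merge_spans` function are hypothetical names.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int      # exclusive
    label: str
    source: str   # "regex" or "ner"


def merge_spans(regex_spans: list[Span], ner_spans: list[Span]) -> list[Span]:
    """Keep every regex span; keep an NER span only if it overlaps no regex span."""
    def overlaps(a: Span, b: Span) -> bool:
        return a.start < b.end and b.start < a.end

    kept = list(regex_spans)
    for span in ner_spans:
        if not any(overlaps(span, r) for r in regex_spans):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)
```

The priority order is a design choice. Letting regex win fits the reasoning above that predictable formats are matched more reliably by patterns than by a model.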
Mobile integration
iOS: NER for smart form filling
// iOS: NER via server API with form autofill
// nerApi (the network client) and parseAddressComponents are assumed to be defined elsewhere in the app
class AddressFormViewModel: ObservableObject {
    @Published var street = ""
    @Published var building = ""
    @Published var apartment = ""
    @Published var deliveryTime = ""

    func parseFromText(_ userText: String) {
        Task {
            do {
                let entities = try await nerApi.extract(text: userText)
                // Published properties must be updated on the main thread
                await MainActor.run {
                    if let address = entities.first(where: { $0.label == "ADDRESS" }) {
                        self.parseAddressComponents(address.text)
                    }
                    if let time = entities.first(where: { $0.label == "TIME" }) {
                        self.deliveryTime = time.text
                    }
                }
            } catch {
                // Extraction failed: leave the fields empty for manual entry
            }
        }
    }
}
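The Swift snippet calls `parseAddressComponents` without showing it. One option is to split the address on the server so the client only assigns fields. The Python sketch below illustrates that logic under a strong assumption: addresses look like "<building> <street>, apartment <number>". The function name and the pattern are hypothetical, and real addresses need a more robust parser.

```python
import re

# Assumed format: "<building> <street>[, apartment|apt|unit <number>]".
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<building>\d+[A-Za-z]?)\s+(?P<street>[^,]+?)\s*"
    r"(?:,\s*(?:apartment|apt\.?|unit)\s*(?P<apartment>\w+))?\s*$",
    re.IGNORECASE,
)


def parse_address_components(address: str) -> dict:
    """Split an address into form fields; unmatched input goes to 'street' unchanged."""
    match = ADDRESS_PATTERN.match(address)
    if match is None:
        return {"street": address.strip(), "building": "", "apartment": ""}
    return {
        "street": match.group("street"),
        "building": match.group("building"),
        "apartment": match.group("apartment") or "",
    }
```

Falling back to the raw string keeps the form usable when the pattern does not match: the user sees their text in the street field and can correct it by hand.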
On-device NER via NaturalLanguage or TFLite
For simple domain entities (SKUs, sizes, colors in a specific catalog), deploy a compact TFLite NER model (< 20 MB) directly on device. This eliminates network latency and works offline.
Apple's NaturalLanguage framework with NLTagger handles basic entity types out of the box (personal names, organizations, places) with no external dependencies:
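import NaturalLanguage

let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = userInput
// Report only real entities; other tokens are tagged too and would otherwise be printed
let entityTags: [NLTag] = [.personalName, .placeName, .organizationName]
tagger.enumerateTags(in: userInput.startIndex..<userInput.endIndex,
                     unit: .word,
                     scheme: .nameType,
                     options: [.omitWhitespace, .omitPunctuation, .joinNames]) { tag, range in
    if let tag = tag, entityTags.contains(tag) {
        print("Entity: \(userInput[range]), type: \(tag.rawValue)")
    }
    return true
}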
For non-English text, use NLTagger only as a pre-filter, or only in apps whose input is in Latin script.
Process
Define target entity types for your domain.
Collect and label training data in BIO format.
Choose between spaCy, transformer model, and regex hybrid.
Integrate NER API into mobile client: smart forms, search, dialog.
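The labeling step can be sketched as a conversion from character-span annotations to BIO tags. This is a minimal illustration that assumes whitespace tokenization and spans aligned to token boundaries. A real pipeline would use the model's own tokenizer, and `to_bio` is a hypothetical helper.

```python
import re


def to_bio(text: str, entities: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Tag whitespace-separated tokens with BIO labels from (start, end, label) spans."""
    labeled = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in entities:
            if match.start() >= start and match.end() <= end:
                # The first token of a span gets B-, the following ones get I-.
                tag = ("B-" if match.start() == start else "I-") + label
                break
        labeled.append((match.group(), tag))
    return labeled
```

With this scheme, "SKU 98765" annotated as one ARTICLE span becomes `B-ARTICLE` for "SKU" and `I-ARTICLE` for "98765", while all tokens outside any span are tagged `O`.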
Timeline estimates
Basic NER with an off-the-shelf model + API: 3–5 days. Fine-tuning on custom domain entities: 1–2 weeks. Full mobile UI integration with smart form filling: 2–3 weeks.







