Detect Language
Identify the language of a text sample using statistical n-gram matching against profiles for about 80 of the most-spoken languages.
About Detect Language
Language detection works by counting character n-grams in your text and comparing the distribution to known-language profiles. This tool uses franc-min, which covers the world's most-spoken languages and runs entirely in your browser. Longer samples yield more accurate results — a single short word may be ambiguous.
When to use it
- Routing user-submitted content to the right localization team
- Tagging documents by language for archival
- Pre-flighting input before sending to a language-specific NLP tool
- Inspecting an unfamiliar text to find out which language it is written in
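As a rough illustration of the routing use case, the sketch below maps a detected ISO 639-3 code to a team queue. The queue names, the `route` function, and the 0.8 confidence threshold are all hypothetical, chosen only for this example; they are not part of this tool or of franc-min.

```python
from __future__ import annotations

# Hypothetical queue names, keyed by ISO 639-3 code.
TEAM_QUEUES = {
    "fra": "l10n-french",
    "deu": "l10n-german",
    "spa": "l10n-spanish",
}
FALLBACK_QUEUE = "manual-triage"  # hypothetical catch-all for unclear cases


def route(detected_code: str, confidence: float, min_confidence: float = 0.8) -> str:
    """Pick a queue for a detected language, falling back when unsure."""
    # 'und' means the detector could not decide; low confidence is treated the same way.
    if detected_code == "und" or confidence < min_confidence:
        return FALLBACK_QUEUE
    # Languages without a dedicated team also go to the fallback queue.
    return TEAM_QUEUES.get(detected_code, FALLBACK_QUEUE)


print(route("fra", 0.95))  # l10n-french
print(route("fra", 0.40))  # manual-triage
```

Sending uncertain results to a person instead of guessing keeps short or mixed-language submissions from landing with the wrong team.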
How it works
The text is analyzed by franc-min, which computes trigram (3-character) frequencies and compares them against profiles for ~80 of the most-spoken languages. The top candidates are returned with confidence scores; if confidence is too low (very short input or mixed languages), 'undetermined' is shown.
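The idea can be sketched in a few lines of Python. This is a simplified illustration of trigram profile matching, not franc-min's actual code: the two tiny training sentences, the penalty of 300 for a missing trigram, and the 10-character minimum are assumptions made for the example, and real profiles are built from far larger corpora.

```python
from __future__ import annotations

from collections import Counter

MAX_PENALTY = 300      # assumed cost for a trigram that a profile does not contain
UNDETERMINED = "und"   # code returned when the input is too short to judge


def trigram_profile(text: str, top: int = 300) -> list[str]:
    """Return the text's character trigrams, most frequent first."""
    # Lowercase, collapse whitespace, and pad so word boundaries form trigrams too.
    cleaned = " " + " ".join(text.lower().split()) + " "
    counts = Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top]]


def distance(sample: list[str], reference: list[str]) -> int:
    """Sum of rank differences between sample and reference trigrams."""
    ref_rank = {gram: rank for rank, gram in enumerate(reference)}
    total = 0
    for rank, gram in enumerate(sample):
        if gram in ref_rank:
            total += abs(rank - ref_rank[gram])
        else:
            total += MAX_PENALTY
    return total


def rank_languages(text: str, profiles: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Return (code, distance) pairs, best match (smallest distance) first."""
    sample = trigram_profile(text)
    scores = [(code, distance(sample, ref)) for code, ref in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1])


def detect(text: str, profiles: dict[str, list[str]], min_length: int = 10) -> str:
    """Return the best-matching code, or 'und' for very short input."""
    if len(text.strip()) < min_length:
        return UNDETERMINED
    return rank_languages(text, profiles)[0][0]


# Toy training data, for illustration only.
TRAINING = {
    "eng": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "fra": "le chat est sur la table et les enfants sont dans le jardin avec nous",
}
PROFILES = {code: trigram_profile(text) for code, text in TRAINING.items()}

print(detect("the cat sat on the mat", PROFILES))      # eng
print(detect("le chat est dans le jardin", PROFILES))  # fra
print(detect("hi", PROFILES))                          # und
```

A smaller distance means a closer match, so a confidence score could be derived by comparing the best distance with the runner-up; that step is left out here.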
Examples
Bonjour, comment allez-vous?
French (fra) — high confidence
Frequently asked questions
- How long does the input need to be?
- About 50 characters or more works reliably. Very short inputs, such as a single word, often produce 'undetermined' or a wrong guess, because a handful of trigrams is too little evidence to tell similar languages apart.
- Are all languages supported?
- No. franc-min covers ~80 of the most-spoken languages. The larger 'franc' and 'franc-all' builds cover more (over 400 in 'franc-all') but are much bigger downloads; franc-min is enough for most real-world content.
- Why ISO 639-3 codes?
- ISO 639-3 (three-letter codes) covers more languages than the older two-letter ISO 639-1. This tool shows both the code and the English language name.
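To make the difference between the two code sets concrete, here is a small sketch that turns an ISO 639-3 code into a display label. The `LANGUAGES` table is a hand-picked subset and `describe` is a hypothetical helper written for this example; a complete table would come from the ISO 639-3 registry.

```python
from __future__ import annotations

# ISO 639-3 code -> (English name, ISO 639-1 code or None). Illustrative subset only.
LANGUAGES: dict[str, tuple[str, str | None]] = {
    "eng": ("English", "en"),
    "fra": ("French", "fr"),
    "deu": ("German", "de"),
    "spa": ("Spanish", "es"),
    "ceb": ("Cebuano", None),  # has a three-letter code but no two-letter one
}


def describe(code: str) -> str:
    """Format a three-letter code as 'Name (code[, ISO 639-1: xx])'."""
    if code == "und":
        return "Undetermined (und)"
    name, two_letter = LANGUAGES.get(code, ("Unknown language", None))
    suffix = f", ISO 639-1: {two_letter}" if two_letter else ""
    return f"{name} ({code}{suffix})"


print(describe("fra"))  # French (fra, ISO 639-1: fr)
print(describe("ceb"))  # Cebuano (ceb)
```

Cebuano shows why three-letter codes are the safer key: it cannot be represented in ISO 639-1 at all.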