Remove Punctuation
Strip all punctuation marks from your text — periods, commas, dashes, brackets, and more. Unicode-aware.
About Remove Punctuation
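Punctuation removal uses the Unicode category \p{P} to strip every character classified as punctuation: ASCII punctuation (.,!?;:'"-_) plus typographic and non-Latin punctuation such as curly quotes, guillemets, and CJK brackets. Letters, digits, whitespace, and symbols are left alone. Note that some ASCII characters often thought of as punctuation, such as $ + < = > ^ ` | ~, are classified as symbols and are therefore preserved.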
When to use it
- Normalizing text before comparison or hashing
- Cleaning up scraped content for tokenization
- Producing a 'bare' version of a phrase for matching
- Stripping decoration from text headed into a key or filename
How it works
The regex /\p{P}/gu matches any Unicode code point in the Punctuation category. All matches are removed. Adjacent whitespace is left as-is — collapse it separately if needed.
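A minimal JavaScript sketch of the step described above. The function names removePunctuation and collapseWhitespace are illustrative assumptions, not the tool's actual source; only the regex /\p{P}/gu comes from the description. The second function shows one possible way to collapse whitespace as a separate step.

```javascript
// Remove every code point in the Unicode Punctuation category (\p{P}).
// The "u" flag enables Unicode property escapes; "g" replaces all matches.
function removePunctuation(text) {
  return text.replace(/\p{P}/gu, "");
}

// Optional, separate step (an assumption, not part of the removal itself):
// collapse runs of whitespace into one space and trim the ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

console.log(removePunctuation("Hello, world! How's it going?"));
// Hello world Hows it going

console.log(JSON.stringify(removePunctuation("a - b")));
// "a  b"  (the two spaces around the removed dash remain)

console.log(JSON.stringify(collapseWhitespace(removePunctuation("a - b"))));
// "a b"
```

Keeping whitespace collapsing in its own function mirrors the behavior described above: removal alone never touches whitespace, so callers decide whether to normalize it.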
Examples
Input: Hello, world! How's it going?
Output: Hello world Hows it going
Frequently asked questions
- Are quote marks removed?
- Yes. Straight and curly quotes, both single and double, as well as angled quotes (guillemets) are all in \p{P} and get stripped.
- What about symbols like + or =?
- Those are in the Symbol category (\p{S}), not Punctuation. They're preserved. Use the remove-special-chars tool to strip symbols too.
- Are dashes and hyphens removed?
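- Yes. Hyphen-minus, en dash, em dash, and figure dash are all in \p{P} and get stripped. The mathematical minus sign (U+2212) is the exception: it belongs to the Symbol category (\p{S}), so it is preserved.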
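The answers above can be checked with a short JavaScript sketch. The helper name strip is a hypothetical stand-in for the tool's behavior, built only on the /\p{P}/gu regex described earlier. Non-ASCII characters are written as escapes so the sample survives copy and paste.

```javascript
// Hypothetical helper mirroring the documented regex.
const strip = (text) => text.replace(/\p{P}/gu, "");

// Curly quotes (U+201C, U+201D), guillemets (U+00AB, U+00BB),
// and straight quotes are all punctuation, so they are removed.
console.log(strip("\u201Ccurly\u201D \u00ABangled\u00BB 'straight'"));
// curly angled straight

// + and = are in the Symbol category, so the text is unchanged.
console.log(strip("1 + 1 = 2"));
// 1 + 1 = 2

// Hyphen-minus, en dash (U+2013), and em dash (U+2014) are removed;
// the spaces that surrounded the em dash stay in place.
console.log(JSON.stringify(strip("well-known, 2019\u20132024 \u2014 done")));
// "wellknown 20192024  done"
```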