Extract Hashtags
Pull #hashtags out of social-media posts, captions, and other text. Unicode-aware.
About Extract Hashtags
The hashtag extractor scans text for #-prefixed tokens that look like hashtags. The matcher accepts Unicode letters, digits, and underscores after the #, so it works for English, Latin-extended, Cyrillic, and CJK hashtags, as well as tags that sit directly next to an emoji (the emoji itself is not captured). Use 'Unique only' to deduplicate when scraping multiple posts.
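To illustrate what deduplicating across several posts involves, here is a minimal TypeScript sketch. The helper name uniqueHashtags is hypothetical, and it assumes exact, case-sensitive comparison that keeps each tag's first occurrence. The tool's own 'Unique only' rule (for example, whether #Coffee and #coffee count as the same tag) is not documented here.

```typescript
// Same pattern as the documented regex /#[\p{L}\p{N}_]+/gu
// (backslashes are doubled inside the string).
const HASHTAG_RE = new RegExp("#[\\p{L}\\p{N}_]+", "gu");

// Hypothetical helper: collects hashtags from several posts and keeps only
// the first occurrence of each. ASSUMPTION: comparison is exact and
// case-sensitive, so "#coffee" and "#Coffee" are treated as different tags.
function uniqueHashtags(posts: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const post of posts) {
    // match() with a global regex returns every matched substring, or null.
    for (const tag of post.match(HASHTAG_RE) ?? []) {
      if (!seen.has(tag)) {
        seen.add(tag);
        result.push(tag); // preserves first-seen order
      }
    }
  }
  return result;
}

const posts = [
  "Morning brew #coffee #wfh",
  "Second cup #coffee #focus",
  "Done for today #wfh #Coffee",
];
console.log(uniqueHashtags(posts));
// → ['#coffee', '#wfh', '#focus', '#Coffee']
```

A Set gives constant-time membership checks, while the separate array keeps the output in the order tags first appear. Lower-casing each tag before the check would merge case variants if that behavior is wanted.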
When to use it
- Collecting hashtags from a batch of social-media captions
- Auditing how many hashtags a post uses
- Building a hashtag library from inspiration posts
- Counting unique tags across a content campaign
How it works
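The regex /#[\p{L}\p{N}_]+/gu matches a # followed by one or more Unicode letters (\p{L}), numbers (\p{N}, which includes the digits 0-9), or underscores. Any other character, such as whitespace, punctuation, a hyphen, or an emoji, ends the match. The u flag enables the \p{...} property escapes, and the g flag returns every match rather than only the first. Because nothing before the # is checked, a # inside a word or URL also produces a match: page#intro yields #intro.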
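The matching step described above can be sketched in a few lines of TypeScript. This is an illustration built from the documented regex, not the tool's actual source; the function name extractHashtags is hypothetical.

```typescript
// The documented pattern /#[\p{L}\p{N}_]+/gu, built with the RegExp
// constructor (backslashes doubled inside the string).
// "u" enables \p{...} property escapes; "g" finds every match.
const HASHTAG_RE = new RegExp("#[\\p{L}\\p{N}_]+", "gu");

// Hypothetical helper: returns every hashtag in document order.
function extractHashtags(text: string): string[] {
  // With a global regex, match() returns all matched substrings
  // (or null when there are none), so fall back to an empty array.
  return text.match(HASHTAG_RE) ?? [];
}

console.log(extractHashtags("Loving this! #coffee #morning #wfh #日本"));
// → ['#coffee', '#morning', '#wfh', '#日本']

// Emoji, hyphens, and punctuation all end a match.
console.log(extractHashtags("#love❤️ and #well-being, #snake_case!"));
// → ['#love', '#well', '#snake_case']
```

The second call shows the termination rule in practice: the heart emoji, the hyphen in well-being, and the trailing exclamation mark each stop the match, while the underscore in snake_case is kept.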
Examples
Input: Loving this! #coffee #morning #wfh #日本
Output: #coffee #morning #wfh #日本
Frequently asked questions
- Are emoji included in the tag?
- Generally no. Emoji are not in the \p{L} or \p{N} categories, so the match stops at the first emoji: a tag like #love❤️ yields only #love.
- Does it handle non-Latin scripts?
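- Yes. The Unicode property \p{L} covers letters in every script, so Japanese, Arabic, Cyrillic, and other tags are matched. One caveat: combining marks (\p{M}) are not in the character class, so tags in scripts that write some vowel signs as combining marks, such as Devanagari (Hindi) or Thai, can be cut off at the first such mark.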
- What about Twitter/X hashtag rules specifically?
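- X allows letters, digits, and underscores, and the extractor uses a similar character set, so results will usually agree. X applies extra rules, though: for example, a tag made only of digits (such as #2024) is not linked on X, while this extractor still returns it. Treat the output as a close approximation of what X parses, not an exact mirror.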
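X does not link a hashtag made only of digits. If you want output closer to X's behavior, one option is a post-filter that keeps only tags containing at least one letter. The sketch below is a hypothetical approximation of that single rule (extractXStyleHashtags is an illustrative name), not a reimplementation of X's full parser.

```typescript
// Same pattern as the documented regex /#[\p{L}\p{N}_]+/gu.
const HASHTAG_RE = new RegExp("#[\\p{L}\\p{N}_]+", "gu");
// Non-global, so test() does not carry lastIndex state between calls.
const HAS_LETTER_RE = new RegExp("\\p{L}", "u");

// Hypothetical helper approximating ONE X rule: drop tags whose body
// (the part after "#") contains no Unicode letter, e.g. #2024.
function extractXStyleHashtags(text: string): string[] {
  const tags = text.match(HASHTAG_RE) ?? [];
  return tags.filter((tag) => HAS_LETTER_RE.test(tag.slice(1)));
}

console.log(extractXStyleHashtags("Happy #2024 from #team_2024"));
// → ['#team_2024']
```

Filtering after matching keeps the documented regex unchanged, so the plain extractor and the X-style variant share one definition of what a candidate tag is.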