By Ran Bar-Zik | 9/27/2018 | General | Beginners

ES2018 - Unicode with Regex

OK, so this is something pretty cool that we can do with regular expressions (aka RegEx), and it kind of blew my mind. Is it worth learning, you ask? Not only is it worth learning, it’s even fun! So shall we get started? Yes we shall.

 

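Are you familiar with the u flag in RegEx? Long story short, it provides support for Unicode: the pattern works on whole Unicode code points instead of 16-bit code units, so characters such as emoji are handled as single characters. For example, this amazing little piece of code: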

let a = /😀$/u.test('😀'); // true
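To see what the flag actually changes, here is a small sketch that runs the same pattern with and without u. The variable names are mine, chosen only for illustration.

```javascript
// Without the u flag, "." matches a single UTF-16 code unit.
// The emoji is stored as two code units (a surrogate pair), so ^.$ fails.
const withoutFlag = /^.$/.test('😀');

// With the u flag, "." matches a whole code point,
// so the emoji counts as one character.
const withFlag = /^.$/u.test('😀');

console.log(withoutFlag, withFlag); // false true
```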

Funny? Sure. Amusing? Why not? Useful? Maybe for grandmas. But what do you think about this code that checks whether a text string is written in Hebrew?

let a = /^\p{Script=Hebrew}+$/u.test('עברית'); // true

Did I get your attention? This is new in ES2018: Unicode property escapes. And there are a lot of properties! The format is really easy:

\p{UnicodePropertyName=UnicodePropertyValue}
\p{LoneUnicodePropertyNameOrValue}

There’s a \p and then braces with the Unicode property inside of them. It can be the name of the property and a value joined by =, or just a lone name or value.

let a = /\p{General_Category=Lowercase_Letter}$/u.test('a'); // true
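Property names and values also have short aliases defined by Unicode, so the same check can be spelled several ways. The sketch below assumes nothing beyond the ES2018 syntax; the array and variable names are illustrative.

```javascript
// Four spellings of the same question: "is this character a lowercase letter?"
const patterns = [
  /^\p{General_Category=Lowercase_Letter}$/u, // full name and full value
  /^\p{gc=Ll}$/u,                             // short aliases for both
  /^\p{Lowercase_Letter}$/u,                  // lone value (allowed for General_Category)
  /^\p{Ll}$/u,                                // lone short value
];
const results = patterns.map((re) => re.test('a'));
console.log(results.every(Boolean)); // true

// Script values always need the property name (Script, or its alias sc).
const isHebrewLetter = /^\p{sc=Hebr}$/u.test('ע');
console.log(isHebrewLetter); // true
```

The long forms are easier to read in shared code; the short ones are handy inside a larger pattern.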

Unicode is a world of its own, so I’ll try to stay focused on the task at hand. This site has a full list of Unicode properties.

 

In general, there are a few categories of properties. The first is Script, which we’ve already seen. Unicode Scripts:

let a = /^\p{Script=Greek}+$/u.test('μετά') // true

Here we’re pretty much talking about writing systems. With a Unicode property we don’t need any mumbo jumbo of hand-written character ranges to detect a script. The RegEx verifies that every character belongs to the given writing system. That is to say, if the string contains a space or Latin letters, it won’t be identified as Hebrew, because those characters are not part of the Hebrew script.

let a = /^\p{Script=Hebrew}+$/u.test('שתי מילים') // false
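If multi-word text should pass, one option is to put the property escape inside a character class next to \s. This is only a sketch, and it assumes that "Hebrew text" means Hebrew-script characters plus whitespace; the name hebrewWords is made up for the example.

```javascript
// Assumption: valid input is Hebrew-script characters and whitespace only.
const hebrewWords = /^[\p{Script=Hebrew}\s]+$/u;

console.log(hebrewWords.test('שתי מילים')); // true
console.log(hebrewWords.test('two words')); // false
```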

If we want to be more specific in our range, we can move on to our second category, General_Category. Here we can find all kinds of interesting things like hyphens and dashes for example.

let a = /^\p{General_Category=Dash_Punctuation}+$/u.test('-־') // true

Or another example is currency symbols:

let a = /^\p{General_Category=Currency_Symbol}+$/u.test('$') // true

Let me remind you that each \p{...} matches just one character, so you can put it wherever you want in a pattern. For instance, if I want to check for a currency symbol followed by a number, e.g. $400 or ₪400, I can do something like this:

let a = /^\p{General_Category=Currency_Symbol}[0-9]+$/u.test('₪400') // true

In that example the Unicode character is just a small part of the full regular expression.
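Adding capture groups to the same pattern lets us pull the symbol and the number apart. The helper below is hypothetical (parsePrice is not part of any library) and assumes the symbol comes first and the amount is a whole number.

```javascript
// Hypothetical helper: split a price such as "₪400" into symbol and amount.
function parsePrice(text) {
  const match = /^(\p{General_Category=Currency_Symbol})([0-9]+)$/u.exec(text);
  if (match === null) return null; // no leading currency symbol, or no digits
  return { symbol: match[1], amount: Number(match[2]) };
}

console.log(parsePrice('₪400')); // { symbol: '₪', amount: 400 }
console.log(parsePrice('400'));  // null
```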

 

We can use a capital P for negation, for instance to match any character that is not a currency symbol:

let a = /^\P{General_Category=Currency_Symbol}$/u.test('₪') // false

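The result is false because ₪ is a currency symbol. Any single character that is not one, such as a letter or a digit, would give true.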
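The two forms are handy side by side when cleaning up text. In this sketch, Sc is the short alias for Currency_Symbol, and the sample string and variable names are invented for the example.

```javascript
const text = '₪400 or $100';

// \p{Sc} with the g flag removes every currency symbol...
const withoutSymbols = text.replace(/\p{Sc}/gu, '');

// ...while \P{Sc} removes everything that is NOT a currency symbol.
const onlySymbols = text.replace(/\P{Sc}/gu, '');

console.log(withoutSymbols); // "400 or 100"
console.log(onlySymbols);    // "₪$"
```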

 

This addition to JavaScript ES2018 greatly enriches the use of Unicode in RegEx and gives us quite a bit more power for a broader and more precise usage of RegEx.

 

Previous article: ES2018 RegEx lookAhead and lookBehind

Next article: Finally method in Promise

 

About the author: Ran Bar-Zik is an experienced web developer whose personal blog, Internet Israel, features articles and guides on Node.js, MongoDB, Git, SASS, jQuery, HTML 5, MySQL, and more. Translation of the original article by Aaron Raizen
