Saturday, April 23, 2005 6:45 PM Olaf Conijn

Write better regular expressions using Unicode character classes

I use regular expressions pretty often, mostly to validate user input.

But let’s say you wrote a regular expression to filter content from HTML documents.

In a scenario like this it is tempting to reach for shorthand character classes like \w for word characters or \d for digits.

Nothing wrong with character classes that help readability, is there?

Well, after reading this blogpost from blogs.msdn I could have kicked myself, because there is. In regex engines that stick to ASCII, \w is just shorthand for [a-zA-Z0-9_]. A word like ‘façade’ (though it is in the English dictionary) does not consist solely of \w (word) characters there, because ‘ç’ falls outside that set.

Fixing this is easy: use Unicode character classes such as \p{L} (any letter) in your regular expressions. A list of character classes can be found here.
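
To make the difference concrete, here is a minimal sketch in Java, chosen because its \w is ASCII-only by default and it supports \p{L}. The class and method names (WordCheck, isAsciiWord, isUnicodeWord) are made up for illustration and do not come from the linked post.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class WordCheck {
    // Java's default \w is [a-zA-Z_0-9], so accented letters are rejected.
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // \p{L} is the Unicode "letter" category; it accepts letters from any
    // script, but not digits or underscores.
    static final Pattern UNICODE_WORD = Pattern.compile("\\p{L}+");

    // True when the whole string consists of ASCII word characters.
    static boolean isAsciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    // True when the whole string consists of Unicode letters.
    static boolean isUnicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }
}
```

With this sketch, WordCheck.isAsciiWord("façade") returns false while WordCheck.isUnicodeWord("façade") returns true. Note that \p{L} is stricter than \w in one respect: it leaves out digits and the underscore, so add \p{Nd} or _ to the class if you need them.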

# re: Write better regular expressions using Unicode character classes

Monday, May 02, 2005 10:40 AM by Olaf Conijn

Funny to see... I recently came across the same issue. But this only works on the server side. JavaScript doesn't support Unicode character classes. Use \u escapes instead!
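
A rough sketch of that \u workaround, written in Java for illustration: spell out the code point ranges you want to accept. The ranges below are my own assumption and cover Latin letters only (U+00C0 to U+024F, skipping the × and ÷ signs); the class name EscapedRangeCheck is hypothetical. The same character class text can be used inside a JavaScript regex literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class EscapedRangeCheck {
    // ASCII letters plus accented Latin letters, written as \u escapes.
    // U+00D7 (multiplication sign) and U+00F7 (division sign) are left out
    // on purpose because they are not letters.
    static final Pattern LATIN_WORD = Pattern.compile(
        "[A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u024F]+");

    // True when the whole string consists of letters from the ranges above.
    static boolean isLatinWord(String s) {
        return LATIN_WORD.matcher(s).matches();
    }
}
```

EscapedRangeCheck.isLatinWord("façade") returns true, but text in other scripts such as Cyrillic is rejected. That is the trade-off of this approach: every range has to be listed by hand.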

# re: Write better regular expressions using Unicode character classes

Monday, May 02, 2005 4:04 PM by Olaf Conijn

Or 'AJAX it' ;)


I've been all about AJAX after seeing the following movie ;)

http://ajax.schwarz-interactive.de/download/ajax.wmv

I've been experimenting with Schwarz's AJAX implementation together with ASP.NET 2.0, and it works like a charm!