Wednesday, 4 March 2009

Using Regular Expressions to identify HTML tags

We often need to find specific HTML tags within a fragment so that they can be processed in some way. A good example is an HTML “white list”: a set of tags that you want to pre-process so that they are allowed through anti-cross-site-scripting encoding.

The following regular expression finds every occurrence of the specified tags (highlighted in red) in the input, whether they are start tags, end tags or self-closing tags, and regardless of how many attributes they carry.


This also returns correct results when looking for <I> tags, in that it won’t incorrectly treat <IMG> as a positive match.
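As a rough illustration of the idea, here is a minimal Python sketch. It is not necessarily the expression from the original post, just one way to get the behaviour described. The function names build_tag_pattern and find_tags are made up for this example, and it assumes that attribute values never contain a literal ">" character.

```python
import re

def build_tag_pattern(tag_names):
    """Compile a pattern matching start, end and self-closing forms of the given tags.

    Hypothetical helper for illustration only.
    """
    # Escape each name and join them into a non-capturing alternation, e.g. (?:b|i)
    alternation = "|".join(re.escape(name) for name in tag_names)
    # "</?"         optional slash, so end tags match too
    # "(?=[\s/>])"  the tag name must be followed by whitespace, "/" or ">",
    #               which stops "i" from matching the start of "img"
    # "[^>]*>"      any attributes (or a trailing "/"), then the closing bracket
    return re.compile(r"</?(?:" + alternation + r")(?=[\s/>])[^>]*>", re.IGNORECASE)

def find_tags(html, tag_names):
    """Return every matching tag in the input as a list of strings."""
    return build_tag_pattern(tag_names).findall(html)

sample = '<I>x</I> <IMG src="a.png"> <b class="k">y</b> <br />'
matches = find_tags(sample, ["b", "i"])
```

With the sample above, the <I>, </I>, <b class="k"> and </b> tags are returned, while <IMG> and <br /> are skipped because the character after the tag name is not whitespace, "/" or ">".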