10+ regular expressions for efficient web development

by Jean. 27 Comments

Regular expressions are a compact tool for validating, searching, and matching text patterns. In this article, I have compiled more than 10 regular expressions, usable in most languages, that should come in handy in web development.

Validate a URL

Is a particular URL valid? The following regexp will let you know.

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \?=.-]*)*\/?$/

Source: http://snipplr.com/view/19502/validate-a-url/
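Here is a minimal sketch of how the pattern could be tried out in Python. The helper name looks_like_url and the sample URLs are my own assumptions, not part of the original snippet. The samples are kept short on purpose, since a pattern with nested quantifiers like this one may backtrack heavily on long strings that do not match.

```python
import re

# The pattern from the article, without the surrounding / delimiters.
URL_RE = re.compile(
    r"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \?=.-]*)*\/?$"
)

def looks_like_url(text):
    """Hypothetical helper: True if text matches the article's URL pattern."""
    return URL_RE.match(text) is not None

samples = [
    "http://example.com/path",   # accepted
    "example.org",               # accepted, the scheme is optional
    "http://example.com:8080/",  # rejected, port numbers are not covered
    "http://192.168.0.1/",       # rejected, the last label must be letters
    "ftp://example.com",         # rejected, only http and https are covered
]
for sample in samples:
    print(sample, looks_like_url(sample))
```

So the pattern is best read as a quick plausibility check for simple http and https links, not as a full URL validator.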

Validate US phone number

This regexp will check that a string is formatted like a US phone number, with an optional country code and an optional area code in parentheses.

/^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$/

Source: http://snippets.dzone.com/posts/show/597
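A small Python sketch of the pattern in use follows. The helper name is_us_phone and the sample numbers are assumptions made for illustration. Note that, as written, the pattern only accepts an area code when it is wrapped in parentheses.

```python
import re

# The pattern from the article, without the surrounding / delimiters.
PHONE_RE = re.compile(
    r"^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$"
)

def is_us_phone(text):
    """Hypothetical helper: True if text matches the article's phone pattern."""
    return PHONE_RE.match(text) is not None

print(is_us_phone("(555) 555-1234"))     # True
print(is_us_phone("+1 (555) 555-1234"))  # True
print(is_us_phone("555-1234"))           # True
print(is_us_phone("555-555-1234"))       # False: area code without parentheses
```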

Test if a password is strong

Weak passwords are one of the quickest ways to get hacked. The following regexp will make sure that:

  • Passwords will contain at least (1) upper case letter
  • Passwords will contain at least (1) lower case letter
  • Passwords will contain at least (1) number or special character
  • Passwords will be at least (8) characters long
  • Password maximum length is not arbitrarily limited

(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$

Source: http://imar.spaanjaars.com/QuickDocId.aspx?quickdoc=297
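The sketch below applies the pattern in Python and, as an optional extra, checks each rule on its own so a form could tell the user which rule failed. The names is_strong, RULES and failed_rules are hypothetical, and the per-rule patterns are my own simplification of the list above, not part of the original regexp.

```python
import re

# The pattern from the article, unchanged.
STRONG_RE = re.compile(
    r"(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$"
)

def is_strong(password):
    """Hypothetical helper: True if the password matches the article's pattern."""
    return STRONG_RE.match(password) is not None

# Assumption: one simple pattern per rule, so failures can be reported by name.
RULES = {
    "at least 8 characters": r"^.{8,}$",
    "an upper case letter": r"[A-Z]",
    "a lower case letter": r"[a-z]",
    "a number or special character": r"[\d\W]",
}

def failed_rules(password):
    """Return the names of the rules the password does not satisfy."""
    return [name for name, pattern in RULES.items()
            if not re.search(pattern, password)]

print(is_strong("Password1"))    # True
print(is_strong("password"))     # False
print(failed_rules("password"))  # ['an upper case letter', 'a number or special character']
```

Keep in mind that passing these rules only says the password has the required character classes and length, nothing more.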

Get code within <?php and ?>

If for some reason you need to grab all the code contained within the <?php and ?> tags, this regexp will do the job:

<\?[php]*([^\?>]*)\?>

Source: http://snipplr.com/view/12845/get-all-the-php-code-between/
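To see what the pattern captures, here is a short Python sketch. The function name php_blocks and the sample page are made up for the example. One thing worth knowing: [php]* is a character class, and the capture group stops at any ? or > character, so code containing -> is not picked up. The second pattern, LAZY_RE, is my own assumed alternative and not from the source.

```python
import re

# The pattern from the article, unchanged.
ARTICLE_RE = re.compile(r"<\?[php]*([^\?>]*)\?>")

# Assumed alternative (not from the article): literal "php", then a lazy
# match up to the first "?>", with DOTALL so code may span several lines.
LAZY_RE = re.compile(r"<\?php(.*?)\?>", re.DOTALL)

def php_blocks(source, pattern=ARTICLE_RE):
    """Hypothetical helper: return the code found between PHP tags."""
    return [m.group(1).strip() for m in pattern.finditer(source)]

page = "<p><?php echo 'hi'; ?></p><?php $user->greet(); ?>"

print(php_blocks(page))           # ["echo 'hi';"]
print(php_blocks(page, LAZY_RE))  # ["echo 'hi';", '$user->greet();']
```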

Match tel: urls

In a recent post, I showed you how you can use the iPhone's special link prefixes to automatically call someone.
This regular expression will match those tel: URLs.

^tel:((?:\+[\d().-]*\d[\d().-]*|[0-9A-F*#().-]*[0-9A-F*#][0-9A-F*#().-]*(?:;[a-z\d-]+(?:=(?:[a-z\d\[\]\/:&+$_!~*'().-]|%[\dA-F]{2})+)?)*;phone-context=(?:\+[\d().-]*\d[\d().-]*|(?:[a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?:[a-z]|[a-z][a-z0-9-]*[a-z0-9])))(?:;[a-z\d-]+(?:=(?:[a-z\d\[\]\/:&+$_!~*'().-]|%[\dA-F]{2})+)?)*(?:,(?:\+[\d().-]*\d[\d().-]*|[0-9A-F*#().-]*[0-9A-F*#][0-9A-F*#().-]*(?:;[a-z\d-]+(?:=(?:[a-z\d\[\]\/:&+$_!~*'().-]|%[\dA-F]{2})+)?)*;phone-context=\+[\d().-]*\d[\d().-]*)(?:;[a-z\d-]+(?:=(?:[a-z\d\[\]\/:&+$_!~*'().-]|%[\dA-F]{2})+)?)*)*)$

Source: http://tools.ietf.org/html/rfc3966#section-3
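A pattern this long is easier to follow when split into named pieces. The Python sketch below assembles the regexp from parts that should concatenate to the same pattern as above. The constant names (GLOBAL_NUMBER, LOCAL_NUMBER, PARAMS, DOMAIN and so on) are my own labels, and the sample URLs are assumptions for illustration.

```python
import re

# Named pieces of the article's pattern (the names are hypothetical labels).
GLOBAL_NUMBER = r"\+[\d().-]*\d[\d().-]*"
LOCAL_NUMBER = r"[0-9A-F*#().-]*[0-9A-F*#][0-9A-F*#().-]*"
PARAMS = r"(?:;[a-z\d-]+(?:=(?:[a-z\d\[\]\/:&+$_!~*'().-]|%[\dA-F]{2})+)?)*"
DOMAIN = (r"(?:[a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*"
          r"(?:[a-z]|[a-z][a-z0-9-]*[a-z0-9])")

# First number: global, or local with a mandatory phone-context.
FIRST = (f"(?:{GLOBAL_NUMBER}|{LOCAL_NUMBER}{PARAMS}"
         f";phone-context=(?:{GLOBAL_NUMBER}|{DOMAIN}))")
# Optional further numbers, separated by commas.
EXTRA = (f"(?:,(?:{GLOBAL_NUMBER}|{LOCAL_NUMBER}{PARAMS}"
         f";phone-context={GLOBAL_NUMBER}){PARAMS})*")

TEL_RE = re.compile(f"^tel:({FIRST}{PARAMS}{EXTRA})$")

def is_tel_url(text):
    """Hypothetical helper: True if text matches the tel: pattern."""
    return TEL_RE.match(text) is not None

print(is_tel_url("tel:+1-201-555-0123"))                # True
print(is_tel_url("tel:7042;phone-context=example.com")) # True
print(is_tel_url("tel:7042"))                           # False: local number without context
```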

Validate US zip code

When building a registration form, it is common to ask for the user’s zip code. As forms are often boring, there’s a strong chance that the user will try to enter false data. This regular expression will make sure they entered a correctly formatted American zip code.

^[0-9]{5}(-[0-9]{4})?$

Source: http://reusablecode.blogspot.com/2008/08/isvalidzipcode.html
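As a quick illustration in Python (the helper name is_zip_code and the sample values are assumptions), the pattern accepts five digits, optionally followed by a dash and four more digits. It checks the format only, not whether the zip code is actually assigned.

```python
import re

# The pattern from the article, unchanged.
ZIP_RE = re.compile(r"^[0-9]{5}(-[0-9]{4})?$")

def is_zip_code(text):
    """Hypothetical helper: True if text is formatted like a US zip code."""
    return ZIP_RE.match(text) is not None

print(is_zip_code("90210"))       # True
print(is_zip_code("90210-1234"))  # True
print(is_zip_code("9021"))        # False
print(is_zip_code("90210-12"))    # False
```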

Validate Canadian postal code

This regexp is very similar to the previous one, but it will match Canadian postal codes instead.

^[ABCEGHJ-NPRSTVXY]{1}[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$

Source: http://reusablecode.blogspot.com/2008/08/isvalidpostalcode.html
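Here is a short Python sketch of the pattern in action. The helper name is_postal_code and the sample codes are my own. Since the pattern only lists upper case letters, the helper upper-cases its input first, which is an assumption about how you would want to treat user input.

```python
import re

# The pattern from the article, unchanged.
POSTAL_RE = re.compile(
    r"^[ABCEGHJ-NPRSTVXY]{1}[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[ ]?"
    r"[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$"
)

def is_postal_code(text):
    """Hypothetical helper: normalise case, then test the article's pattern."""
    return POSTAL_RE.match(text.strip().upper()) is not None

print(is_postal_code("K1A 0B1"))  # True
print(is_postal_code("k1a0b1"))   # True, thanks to the upper-casing step
print(is_postal_code("D1A 0B1"))  # False: D is not an accepted first letter
```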

Grab unclosed img tags

As you probably know, the XHTML standard requires all tags to be properly closed. This regular expression will search for unclosed img tags. It could easily be modified to grab any other unclosed HTML tags.

<img([^>]+)(\s*[^\/])>

Source: http://snipplr.com/view/6632/grab-any-unclosed-xhtml-img-tags/
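A minimal Python sketch, with a made-up HTML fragment, shows which tags the pattern picks up. The variable names are assumptions for the example.

```python
import re

# The pattern from the article, unchanged.
UNCLOSED_IMG_RE = re.compile(r"<img([^>]+)(\s*[^\/])>")

html = '<p><img src="a.png"> and <img src="b.png" /></p>'

# Collect every img tag that does not end in "/>".
unclosed = [m.group(0) for m in UNCLOSED_IMG_RE.finditer(html)]
print(unclosed)  # ['<img src="a.png">']
```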

Find all CSS attributes

This regexp will find CSS attributes, such as background:red; or padding-left:25px;.

\s(?[a-zA-Z-]+)\s[:]{1}\s*(?[a-zA-Z0-9\s.#]+)[;]{1}

Source: http://snipplr.com/view/17903/find-css-attributes/
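The pattern as printed above does not compile in Python's re module, because (?[ is not valid group syntax. The sketch below is a guess at the intent, not the source's pattern: it assumes the two groups were meant to be named captures, and it makes the whitespace around the colon optional so that the examples from the text match. The group names name and value are my own.

```python
import re

# Assumed reconstruction with named groups; not the pattern from the source.
CSS_RE = re.compile(r"(?P<name>[a-zA-Z-]+)\s*:\s*(?P<value>[a-zA-Z0-9\s.#]+);")

css = "body { background:red; padding-left:25px; }"

pairs = [(m.group("name"), m.group("value")) for m in CSS_RE.finditer(css)]
print(pairs)  # [('background', 'red'), ('padding-left', '25px')]
```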

Validate an IBAN

I have recently worked on a banking application and this one was definitely a life-saver. It will verify that the given IBAN is valid.

[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{4}[0-9]{7}([a-zA-Z0-9]?){0,16}

Source: http://snipplr.com/view/15322/iban-regex-all-ibans/
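The regexp only looks at the shape of the string. As a sketch, the Python code below pairs it with the standard IBAN mod-97 check digit test, which is an addition of mine and not part of the source pattern. The function names are hypothetical, and the sample value is the widely published example IBAN.

```python
import re

# The pattern from the article, unchanged. It is not anchored, so
# fullmatch is used to require that the whole string fits.
IBAN_RE = re.compile(
    r"[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{4}[0-9]{7}([a-zA-Z0-9]?){0,16}"
)

def has_iban_format(text):
    """Hypothetical helper: True if the whole string has the IBAN shape."""
    return IBAN_RE.fullmatch(text) is not None

def passes_mod97(iban):
    """Mod-97 check: move the first four characters to the end, replace
    letters with numbers (A=10 ... Z=35), and test the remainder."""
    rearranged = (iban[4:] + iban[:4]).upper()
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

def is_iban(text):
    compact = text.replace(" ", "")
    return has_iban_format(compact) and passes_mod97(compact)

print(is_iban("GB82 WEST 1234 5698 7654 32"))  # True
print(is_iban("GB82 WEST 1234 5698 7654 31"))  # False: right shape, wrong check digits
```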

Validate a BIC code

Here is another one that is very useful for any banking application or website. This regexp will validate the format of a BIC code.

([a-zA-Z]{4}[a-zA-Z]{2}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?)

Source: http://snipplr.com/view/15320/bic-bank-identifier-code-regex/
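Since the pattern has no ^ or $ anchors, how you apply it matters. The Python sketch below (the helper name has_bic_format and the sample codes are assumptions) uses fullmatch so that the whole string must fit the pattern.

```python
import re

# The pattern from the article, unchanged.
BIC_RE = re.compile(r"([a-zA-Z]{4}[a-zA-Z]{2}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?)")

def has_bic_format(text):
    """Hypothetical helper: True if the whole string is shaped like a BIC."""
    return BIC_RE.fullmatch(text) is not None

print(has_bic_format("DEUTDEFF"))     # True, 8 characters
print(has_bic_format("DEUTDEFF500"))  # True, 11 characters with branch code
print(has_bic_format("DEUTDEFF5"))    # False, 9 characters
# Without anchoring, search() still finds the 8-character prefix:
print(BIC_RE.search("DEUTDEFF5") is not None)  # True
```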

If you’re interested in regular expressions, make sure you have read our “15 PHP regular expression for developers” post.

Comments (27)

  1. Ethan Gardner said:

    One I use a lot is ="[^"]*["] to get attribute values if I’m trying to clean up HTML. For example, you could use style="[^"]*["] to detect any element with an inline style and replace both the inline style attribute and the attribute value with the find/replace function in your IDE.

  2. Michael said:

    The URL expression incorrectly rejects a number of valid URLs. For example, it rejects a URL where you use the IP address instead of a hostname. It rejects a URL that specifies a port number. It rejects any URL that is not http or https.

    There is an RFC that defines what makes a valid URL. I recommend using that as a specification to guide the development of your regular expression.

    Based on that, I have to wonder about the accuracy of the other regular expressions in the list. I would rate this list as Not Recommended.

  3. Jonathan Allen said:

    Ugh. Those don’t even begin to consider validity, they just determine if the pattern is remotely plausible. If your intent is to give meaningful feedback to the user, you need to consider the checksum.

  4. Jonathan Allen said:

    @Jenna Molby

    It is impossible to validate an email address using regular expressions alone. Fortunately, you should be able to find a real email validation function on most platforms. Unfortunately, many of them are also broken, just less so.

  5. jasha said:

    What do you guys mean by saying it is trouble to validate emails with regex? I guess all the websites do that. Can you guys be more specific?

  6. Jason S said:

    Regexes are great, and I use them a lot, but sometimes a special-purpose parser is best. For example, in Python there’s the urlparse module for URLs, which I think is more convenient than a regex (it also lets you access each part of the url as a named attribute, e.g. parsed_url.scheme, parsed_url.path).

    Long ago, I came across what was claimed to be an RFC-compliant regex for e-mail addresses. It was hundreds and hundreds of characters long. Can’t find it anymore.

  7. Hayden said:

    Wow! I’m not particular with any web development techniques yet. Thanks for posting it here. I now have another idea on how=)

    • EskiMag said:

      That “Validate a URL” regexp was falling into an infinite loop and causing 100% CPU load. I think a better version would be: /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\?&=.-]*)\/?$/

  8. Hendrik said:

    Maybe I am wrong, but doesn’t the URL-validating code fail to validate UTF-8 URLs (especially with umlauts and the like)? So in my opinion “\w” would be a better choice than “\da-z”.

  9. Danny van Kooten said:

    Nice list, thanks!

    @Rory, try this regex to see if an email address has the right format (name@domain.extension).

    $regex = '/([a-z0-9_.-]+)@([a-z0-9.-]+){2,255}.([a-z]+){2,10}/i';

  10. Mickaël Wolff said:

    Regexps are not suited for URL or e-mail validation. In PHP, there are better ways to check for a correct URL or e-mail: http://php.net/manual/en/function.filter-var.php

  11. Constance said:

    It looks like the CSS attributes one was mis-copied or mis-typed. It contains the character sequence (?[ which isn't in any regex library syntax I've ever seen. "(?" indicates extension syntax, but "[" is no extension character (nor does it look like it's being used as one, since a character class comes next).

    Even the parts that are technically valid are really strange: what's the point of adding {1} to anything? Or making a character class with just [;] in it?

  12. Dave Keech said:

    @jasha The email RFC defines emails in such a way that they are more complex than regexes can possibly be. This is a reasonably close regex: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html and even that one warns that it’s not perfect.

    Even if you can validate the *syntax* of an email address with a regex, you still don’t know if the domain has been registered, has name servers, has valid MX records, has a mail server running and whether it accepts mail for that address. All of these things can be easily checked with a few lines of any programming language.

    As for “URLs”… URLs can have arbitrary schemes (not just http and https which has already been mentioned) but also TLDs are not restricted to only 6 characters. Take a look at this canonical list of TLDs: http://www.iana.org/domains/root/db and remember that new ones are being bought all the time.

    The “Strong Password” regex does not determine whether a password is strong but only whether it has a number or symbol, an upper case letter, a lower case letter and at least 8 characters. It can’t tell *which* of these rules the user has broken, so the only feedback you can give the user is the full list of requirements. It can’t tell if the user has used their username as their password or the site name or their real name. It thinks “Password1” is a good password. It thinks “Ab1…..” is a good password. Worse, it thinks “correct horse battery staple” is a bad password. It’s also pretty hard to understand and doesn’t work in all regex libraries.
