ITEC 325
2014 spring
ibarland


lect21-regexps
regular expressions

regular expression intro

regexps.pdf

Representing regular expressions: They're not the same type as strings [2]. Various languages have slightly different ways of representing regexps: Some …

PHP regular expressions

PHP regular-expression matching: In PHP, a regular expression is written as a string whose contents are delimited by a special character (usually /). Some very quick examples: lect21-regexps-examples.php

      echo preg_match( '/abcd/', 'abcd' );
      echo preg_match( '/abcd/', 'azcd' );      // false

      echo preg_match( '/a..d/', 'azcd' );      // . matches any single character (except newline)

      echo preg_match( '/ab*cd/', 'abbbbcd' );  // b*  matches 0-or-more-b's.
      echo preg_match( '/ab*cd/', 'abcd' );  
      echo preg_match( '/ab*cd/', 'acd'  ); 

      echo preg_match( '/ab+cd/', 'abbbbcd' );  // b+  matches 1-or-more-b's.
      echo preg_match( '/ab+cd/', 'abcd' );  
      echo preg_match( '/ab+cd/', 'acd'  );     // false

      echo preg_match( '/ab.*cd/', 'abcd' );
      echo preg_match( '/ab.*cd/', 'abXd' );               // false (no "cd").  .* and .+ are common patterns.
      echo preg_match( '/ab.*cd/', 'abBlahBlahBlahcd' );
      echo preg_match( '/ab.+cd/', 'abcd' );               // false
      echo preg_match( '/ab.+cd/', 'abXcd' );
      echo preg_match( '/ab.+cd/', 'abBlahBlahBlahcd' );   // true: .+ matches "BlahBlahBlah"

      
      echo preg_match( '/ab?cd/', 'acd' );  // b?  matches 0-or-1 b
      echo preg_match( '/ab?cd/', 'abcd' );  
      echo preg_match( '/ab?cd/', 'abbbbcd'  );     // false

      echo preg_match( '/(ab)*cd/', 'ababababcd' );  // parens do grouping
      echo preg_match( '/(ab)*cd/', 'abbbbcd' );  // true(!): (ab)* matches zero times, and the string contains "cd"


      // WARNING: preg_match looks to see if the string *contains* a match!
      echo preg_match( '/row/', 'How now, brown cow?' );  // true
      echo preg_match( '/.ow/', 'How now, brown cow?' );  // true

      // Use "^" to specify the start-of-string, and "$" to specify end-of-string.
      echo preg_match( '/^.ow$/', 'How now, brown cow?' );  // false
      echo preg_match( '/^.ow$/', 'Zow' );
      echo preg_match( '/^.ow/', 'Zowee' );
      echo preg_match( '/w.e$/', 'Wowee Zowee' );

      
      echo preg_match( '/[WZ]ow/', 'Wowee Zowee' );  // square-brackets match any one character from the set
      echo preg_match( '/[WZ]ow/', 'Yow' );          // false
      echo preg_match( '/[W-Z]ow/', 'Yowee' );       // square-brackets can contain a *range*
      echo preg_match( '/ab[0-9]+de/', 'ab789de' );


      echo preg_match( '/[0-9]*/', '00047' );        // Beware: a bare * expression (w/o ^,$) matches ANY string, since it can match zero characters!
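
To see why the last example deserves a warning, it can help to ask preg_match what text it matched, via its optional third argument. The following is only a sketch; the helper name matches_whole is made up for illustration (it is not part of PHP or of the course examples), and it assumes the pattern body contains no unescaped / characters.

```php
<?php
// Hypothetical helper: does the regexp match ALL of $subject, not just a piece of it?
// $pattern_body is the regexp without its delimiters, e.g. '[0-9]*'.
function matches_whole( $pattern_body, $subject ) {
    // ^ and $ anchor the match; the (?: ) group keeps any top-level | inside the anchors.
    return preg_match( '/^(?:' . $pattern_body . ')$/', $subject ) === 1;
}

// Unanchored, [0-9]* succeeds even when there are no digits: it matches the empty string.
preg_match( '/[0-9]*/', 'no digits here', $m );
var_dump( $m[0] );                                         // string(0) ""

preg_match( '/[0-9]*/', '00047', $m );
var_dump( $m[0] );                                         // string(5) "00047"

var_dump( matches_whole( '[0-9]*',  '00047' ) );           // bool(true)
var_dump( matches_whole( '[0-9]*',  'no digits here' ) );  // bool(false)
var_dump( matches_whole( '(ab)*cd', 'abbbbcd' ) );         // bool(false)
```

Comparing the result with === 1 also sidesteps the fact that preg_match returns the integers 1 or 0 (or false on an error) rather than a boolean.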
    
See the manual.
Note: the “preg” prefix stands for “Perl-compatible regular expressions”. Earlier PHP code used the POSIX regexp functions (ereg and friends), but those were deprecated as of PHP 5.3.

Your task: What is a regular expression to match...

False positive, and false negative:

                         test result
                         negative          positive
is test      false       false negative    false positive
accurate?    true        true negative     true positive
In our setting, the “test” is preg_match, and “accurate” means whether the value returned is what we want it to be.
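
As a sketch of how those four outcomes arise, suppose (purely for illustration) that the goal is to accept strings made up entirely of digits. The function name classify and the sample patterns below are assumptions, not taken from the course materials.

```php
<?php
// Hypothetical helper: compare what preg_match said against what we wanted it to say.
function classify( $pattern, $subject, $should_match ) {
    $did_match = ( preg_match( $pattern, $subject ) === 1 );
    if ( $did_match ) {
        return $should_match ? 'true positive' : 'false positive';
    }
    return $should_match ? 'false negative' : 'true negative';
}

// Goal (assumed): accept exactly the strings consisting only of digits.
echo classify( '/^[0-9]+$/', '00047', true  ), "\n";   // true positive
echo classify( '/^[0-9]+$/', 'abc',   false ), "\n";   // true negative
echo classify( '/[0-9]+/',   '12ab',  false ), "\n";   // false positive: forgot ^ and $
echo classify( '/^[1-9]+$/', '100',   true  ), "\n";   // false negative: forgot 0
```

A regexp that is too permissive produces false positives; one that is too strict produces false negatives.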

Atomic regexps

Compound regexps

There are also ways of building bigger regexps out of smaller ones:

NOTE: Unlike JavaScript, PHP does not allow a trailing “g” modifier!
(There are other trailing modifiers however — e.g. i for case-insensitive, and more.)
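
A brief sketch of both points: the i modifier goes after the closing delimiter, and where JavaScript would use g to find every match, PHP has a separate function, preg_match_all. The sample strings are reused from the examples earlier on this page.

```php
<?php
// Case-insensitive matching: add i after the closing delimiter.
var_dump( preg_match( '/zow/',  'Wowee Zowee' ) );   // int(0)
var_dump( preg_match( '/zow/i', 'Wowee Zowee' ) );   // int(1)

// preg_match stops at the first match.  preg_match_all keeps going,
// returns how many matches it found, and stores their text in $matches[0].
$count = preg_match_all( '/.ow/', 'How now, brown cow?', $matches );
var_dump( $count );                                  // int(4)
echo implode( ',', $matches[0] ), "\n";              // How,now,row,cow

// By contrast, '/.ow/g' is not a valid PHP pattern: preg_match would emit an
// "Unknown modifier" warning and return false.
```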

Three helpful functions:

regexps vs. unicode
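
One issue that comes up here, sketched under the assumption that the subject string is UTF-8 encoded and that PHP's PCRE library was built with UTF-8 support (the usual case): by default preg_match works byte by byte, while the u modifier makes it work code point by code point.

```php
<?php
$e_acute = "\xC3\xA9";   // the UTF-8 encoding of "é": one character, but two bytes

var_dump( preg_match( '/^.$/',  $e_acute ) );   // int(0): without u, . matches a single byte
var_dump( preg_match( '/^..$/', $e_acute ) );   // int(1): the two bytes, matched separately
var_dump( preg_match( '/^.$/u', $e_acute ) );   // int(1): with u, . matches one whole code point
```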


[2] C, ML, and Haskell let you introduce new type-synonyms, but not Java -- you have to introduce a new class that wraps strings.

[1] Similarly, I'd love to have a language that lets me rename types [2], so that I could have string (for raw data), html-data (safe to concatenate to HTML), and sql-data (safe to splice into a SQL query as data). htmlspecialchars and mysql_real_escape_string can be thought of as constructors for these new types, but the type system doesn't help protect me from giving a raw (unsanitized) string when it was expecting some already-sanitized data.
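
The wrapper-class workaround mentioned in these footnotes might look roughly like the following in PHP. This is only a sketch: the class HtmlData and the function paragraph are hypothetical names, and PHP checks the parameter type when the call runs, not ahead of time as a static type system would.

```php
<?php
// Hypothetical wrapper class: any HtmlData value is known to be already escaped.
class HtmlData {
    private $escaped;

    // The constructor plays the role that htmlspecialchars plays in the footnote.
    public function __construct( $raw ) {
        $this->escaped = htmlspecialchars( $raw, ENT_QUOTES, 'UTF-8' );
    }

    public function __toString() {
        return $this->escaped;
    }
}

// Hypothetical function that insists on already-sanitized data.
function paragraph( HtmlData $body ) {
    return '<p>' . $body . '</p>';
}

echo paragraph( new HtmlData( 'if (a < b) { ... }' ) ), "\n";   // <p>if (a &lt; b) { ... }</p>

// paragraph( 'a raw <b>string</b>' );   // rejected when the call runs: not an HtmlData
```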



©2014, Ian Barland, Radford University
Last modified 2014.Mar.21 (Fri)