home—lects—exams—hws
D2L—breeze (snow day)
Representing regular expressions: They're not the same type as strings². Various languages have slightly different ways of representing regexps: Some …
PHP regular-expression matching:
In php, regular expressions are strings delimited by a special character
(usually
    echo preg_match( '/abcd/', 'abcd' );
    echo preg_match( '/abcd/', 'azcd' );      // false
    echo preg_match( '/a..d/', 'azcd' );      // . matches any single character (besides newline)
    echo preg_match( '/ab*cd/', 'abbbbcd' );  // b* matches 0-or-more b's.
    echo preg_match( '/ab*cd/', 'abcd' );
    echo preg_match( '/ab*cd/', 'acd' );
    echo preg_match( '/ab+cd/', 'abbbbcd' );  // b+ matches 1-or-more b's.
    echo preg_match( '/ab+cd/', 'abcd' );
    echo preg_match( '/ab+cd/', 'acd' );      // false
    echo preg_match( '/ab.*cd/', 'abcd' );
    echo preg_match( '/ab.*cd/', 'abXd' );    // false: 'abXd' contains no "cd". (.* and .+ are common patterns.)
    echo preg_match( '/ab.*cd/', 'abBlahBlahBlahcd' );
    echo preg_match( '/ab.+cd/', 'abcd' );    // false
    echo preg_match( '/ab.+cd/', 'abXcd' );
    echo preg_match( '/ab.+cd/', 'abBlahBlahBlahcd' );
    echo preg_match( '/ab?cd/', 'acd' );      // b? matches 0-or-1 b
    echo preg_match( '/ab?cd/', 'abcd' );
    echo preg_match( '/ab?cd/', 'abbbbcd' );  // false
    echo preg_match( '/(ab)*cd/', 'ababababcd' );  // parens do grouping
    echo preg_match( '/(ab)*cd/', 'abbbbcd' );     // true! (ab)* matches zero times, and the string contains "cd"

    // WARNING: preg_match looks to see if the string *contains* a match!
    echo preg_match( '/row/', 'How now, brown cow?' );  // true
    echo preg_match( '/.ow/', 'How now, brown cow?' );  // true

    // Use "^" to specify the start-of-string, and "$" to specify end-of-string.
    echo preg_match( '/^.ow$/', 'How now, brown cow?' );  // false
    echo preg_match( '/^.ow$/', 'Zow' );
    echo preg_match( '/^.ow/', 'Zowee' );
    echo preg_match( '/w.e$/', 'Wowee Zowee' );

    echo preg_match( '/[WZ]ow/', 'Wowee Zowee' );  // square-brackets match any one character from the set
    echo preg_match( '/[WZ]ow/', 'Yow' );          // false
    echo preg_match( '/[W-Z]ow/', 'Yowee' );       // square-brackets can contain a *range*
    echo preg_match( '/ab[0-9]+de/', 'ab789de' );
    echo preg_match( '/[0-9]*/', '00047' );        // Beware: matching just a * expression (w/o ^,$)!
Your task: What is a regular expression to match...
False positive, and false negative:
| | | test result | |
|---|---|---|---|
| | | negative | positive |
| is test accurate? | false | false negative | false positive |
| | true | true negative | true positive |
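One way to hunt for false positives and false negatives is to run a candidate pattern against strings whose correct answer is already known. The sketch below is only an illustration: the helper name `classify_results`, the two candidate patterns, and the test strings are all made up for this example, assuming the goal is to match exactly "Wow" or "Zow".

```php
<?php
// Hypothetical helper (not from the lecture): compare a pattern's verdicts
// against the expected answers and tally the four possible outcomes.
function classify_results( $pattern, array $expected ) {
    $tally = array( 'true positive'  => 0, 'true negative'  => 0,
                    'false positive' => 0, 'false negative' => 0 );
    foreach ($expected as $subject => $shouldMatch) {
        $didMatch = (preg_match( $pattern, $subject ) === 1);
        $accurate = ($didMatch === $shouldMatch) ? 'true' : 'false';
        $verdict  = $didMatch ? 'positive' : 'negative';
        $tally["$accurate $verdict"]++;
    }
    return $tally;
}

// Made-up test cases: subject => whether it *should* match.
$expected = array( 'Wow' => true, 'Zow' => true,
                   'Yow' => false, 'Wowee' => false, 'zow' => false );

$loose  = classify_results( '/[WZ]ow/',   $expected );  // unanchored: 'Wowee' contains a match
$strict = classify_results( '/^[WZ]ow$/', $expected );  // anchored at both ends

print_r( $loose );
print_r( $strict );
```

The unanchored pattern reports one false positive ('Wowee'), while the anchored one classifies all five strings correctly.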
Warning: Beware matching a top-level * expression: the empty string matches it, and any string contains the empty string! Thus preg_match_all( "/(xyz)*/", "uh-oh" ) === 6 !!, since "uh-oh" has zero "xyz"'s at the start, followed by 'u', followed by zero more "xyz"'s, followed by 'h', followed by ….
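The warning can be checked directly. This is a minimal sketch; the variable names are only for illustration, and the `+` variant is one possible remedy (anchoring with `^` and `$` is another).

```php
<?php
// A top-level * can match the empty string at every position,
// so a 5-character subject yields 6 (empty) matches.
$emptyMatches = preg_match_all( '/(xyz)*/', 'uh-oh' );

// Requiring at least one repetition removes the empty matches.
$realMatches = preg_match_all( '/(xyz)+/', 'uh-oh' );

// preg_match only asks whether the subject *contains* a match,
// so the * version "succeeds" even though "xyz" never occurs.
$starFound = preg_match( '/(xyz)*/', 'uh-oh' );
$plusFound = preg_match( '/(xyz)+/', 'uh-oh' );

echo "$emptyMatches $realMatches $starFound $plusFound\n";  // prints "6 0 1 0"
```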
bug?: preg_match_all( '/\p{N}/', "⁴٤𝟜4" ) is returning 1 for me (not 4), in PHP 5.4.3.
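A possible explanation, offered as an assumption rather than a confirmed diagnosis: without the `u` modifier, PCRE treats the subject as a sequence of single bytes, so the multi-byte UTF-8 digits are never seen as whole characters. The sketch below assumes the source file and the string are UTF-8 encoded.

```php
<?php
$digits = "⁴٤𝟜4";   // four numeric characters, assumed to be UTF-8 encoded

// With the u modifier, pattern and subject are treated as UTF-8,
// so \p{N} is tested against whole characters rather than bytes.
$count = preg_match_all( '/\p{N}/u', $digits );

echo $count, "\n";   // prints 4
```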
Checking for space (one of the most common situations) is the one
that is hardest:
We have
Also, we are probably least able to ignore this problem, because in web forms people might paste in text from web pages or Word documents, and those expert typesetting programs do tend to use the various special spaces.
Solution: Either
$str2 = preg_replace( '/\p{Z}+/', ' ', $str ); … if( preg_match( '/\s+/', $str2 ) ) …
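The replace-then-test idea can be wrapped in a small function. This is a sketch under stated assumptions, not the lecture's own code: the name `normalize_spaces` is hypothetical, the input is assumed to be UTF-8, and the `u` modifier is added so that multi-byte spaces are handled as whole characters.

```php
<?php
// Hypothetical helper: collapse every run of Unicode separators (\p{Z})
// and ordinary whitespace (\s) into a single ASCII space, then trim.
function normalize_spaces( $str ) {
    // The u modifier makes PCRE read $str as UTF-8, so a multi-byte
    // space is replaced as a whole character rather than byte by byte.
    $collapsed = preg_replace( '/[\p{Z}\s]+/u', ' ', $str );
    // preg_replace returns null on error (e.g. invalid UTF-8 input);
    // fall back to the original string in that case.
    return trim( $collapsed === null ? $str : $collapsed );
}

// "a", no-break space (U+00A0), em space (U+2003), "b", tab, "c", space
$messy = "a\xC2\xA0\xE2\x80\x83b\tc ";
echo normalize_spaces( $messy ), "\n";   // prints "a b c"
```

After normalizing, a plain `' '` (or `\s`) test on the result is enough, since every special space has already been turned into an ordinary one.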
NOTE:
In php, no trailing “
(There are other trailing modifiers however —
e.g. i for case-insensitive,
and more.)
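As a small illustration of a trailing modifier (the variable names are only for this example), the `i` goes after the closing delimiter:

```php
<?php
// Modifiers are written after the closing delimiter of the pattern.
$sensitive   = preg_match( '/abcd/',  'ABCD' );   // 0: case matters by default
$insensitive = preg_match( '/abcd/i', 'ABCD' );   // 1: i ignores case

echo "$sensitive $insensitive\n";   // prints "0 1"
```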
Warning:
2 C, ML, and Haskell let you introduce new type synonyms, but Java does not: you have to introduce a new class that wraps strings. ↩
1 Similarly, I'd love to have a language that lets me rename types², so that I could have string (for raw data), html-data (safe to concatenate to HTML), and sql-data (safe to splice into a SQL query as data).
3 Okay, that's actually vague: is “de Soto” or “da Vinci” all last name? Names are notoriously hard to characterize, especially across multiple cultures. My advice is to liberally accept what characters people say their name is; trimming and collapsing whitespace is about all I'd do. ↩
©2015, Ian Barland, Radford University. Last modified 2015.Oct.09 (Fri).
Please mail any suggestions (incl. typos, broken links) to ibarlandradford.edu