home—lects—hws
D2L—breeze (snow day)
Unwieldy video from distance lecture, 2017-feb-21 (1h35m)
but you can watch just [0:24:40,1:33:00) (1h08m) about regular-expressions.
(You can skip over: opening pause [0:00:00,0:00:30),
some ERD review [0:00:30,0:04:30),
git-review [0:04:30,0:22:00),
and a few comments about projects-with-external-clients [0:22:00,0:24:40),
as well as the final few "shutting down" minutes at [1:33:00,1:35:00).)
Representing regular expressions: They're not the same type as strings. Various languages have slightly different ways of representing regexps: Some …
PHP regular-expression matching:
In PHP, regular expressions are strings delimited by a special character
(usually "/").

    echo preg_match( '/abcd/', 'abcd' );
    echo preg_match( '/abcd/', 'azcd' );              // false
    echo preg_match( '/a..d/', 'azcd' );              // . matches any single character (besides newline)
    echo preg_match( '/ab*cd/', 'abbbbcd' );          // b* matches 0-or-more-b's.
    echo preg_match( '/ab*cd/', 'abcd' );
    echo preg_match( '/ab*cd/', 'acd' );
    echo preg_match( '/ab+cd/', 'abbbbcd' );          // b+ matches 1-or-more-b's.
    echo preg_match( '/ab+cd/', 'abcd' );
    echo preg_match( '/ab+cd/', 'acd' );              // false
    echo preg_match( '/ab.*cd/', 'abcd' );
    echo preg_match( '/ab.*cd/', 'abXd' );            // false (no "c"); .* and .+ are common patterns.
    echo preg_match( '/ab.*cd/', 'abBlahBlahBlahcd' );
    echo preg_match( '/ab.+cd/', 'abcd' );            // false
    echo preg_match( '/ab.+cd/', 'abXcd' );
    echo preg_match( '/ab?cd/', 'acd' );              // b? matches 0-or-1 b
    echo preg_match( '/ab?cd/', 'abcd' );
    echo preg_match( '/ab?cd/', 'abbbbcd' );          // false
    echo preg_match( '/(ab)*cd/', 'ababababcd' );     // parens do grouping
    echo preg_match( '/(ab)*cd/', 'abbbbcd' );        // true! zero "ab"s followed by the trailing "cd"

    // WARNING: preg_match looks to see if the string *contains* a match!
    echo preg_match( '/row/', 'How now, brown cow?' );   // true
    echo preg_match( '/.ow/', 'How now, brown cow?' );   // true

    // Use "^" to specify the start-of-string, and "$" to specify end-of-string.
    echo preg_match( '/^.ow$/', 'How now, brown cow?' ); // false
    echo preg_match( '/^.ow$/', 'Zow' );
    echo preg_match( '/^.ow/', 'Zowee' );
    echo preg_match( '/w.e$/', 'Wowee Zowee' );

    echo preg_match( '/[WZ]ow/', 'Wowee Zowee' );     // square-brackets match any one character from the set
    echo preg_match( '/[WZ]ow/', 'Yow' );             // false
    echo preg_match( '/[W-Z]ow/', 'Yowee' );          // square-brackets can contain a *range*
    echo preg_match( '/ab[0-9]+de/', 'ab789de' );
    echo preg_match( '/[0-9]*/', '00047' );           // Beware: matching just a * expression (w/o ^,$)!
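To see what "contains a match" means in practice, it can help to ask preg_match which text it actually matched. The sketch below uses preg_match's optional third argument, which receives the matched text; the variable names are made up for illustration.

```php
<?php
// The optional third argument is filled in with the matched text:
// index 0 holds the whole match.
$found = preg_match( '/.ow/', 'How now, brown cow?', $matches );
echo $found, ' ', $matches[0], "\n";         // the first match is "How"

// An unanchored pattern can match in the middle (or at the end) of the string.
$foundCd = preg_match( '/(ab)*cd/', 'abbbbcd', $cdMatches );
echo $foundCd, ' ', $cdMatches[0], "\n";     // matches just the trailing "cd"

// Anchoring with ^ and $ requires the whole string to match.
$anchored = preg_match( '/^(ab)*cd$/', 'abbbbcd' );
echo $anchored, "\n";                        // 0
```

Printing the matched text is a quick way to notice when a pattern succeeds for a reason you did not intend.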
Your task: What is a regular expression to match...
False positive, and false negative:
| is test accurate? | test result: negative | test result: positive |
|---|---|---|
| false | false negative | false positive |
| true | true negative | true positive |
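One way to hunt for false positives and false negatives is to run a candidate regexp against strings that should match and strings that should not. This is only a sketch: the helper name classify_regexp, the sample task ("a non-negative integer"), and the test strings are all made up for illustration.

```php
<?php
// Hypothetical helper (not from the lecture): sort test strings into
// false positives and false negatives for a candidate pattern.
function classify_regexp( $pattern, $shouldMatch, $shouldNotMatch ) {
    $falseNegatives = array();
    foreach ($shouldMatch as $s) {
        // The pattern should have matched, but did not.
        if (preg_match( $pattern, $s ) !== 1) { $falseNegatives[] = $s; }
    }
    $falsePositives = array();
    foreach ($shouldNotMatch as $s) {
        // The pattern should not have matched, but did.
        if (preg_match( $pattern, $s ) === 1) { $falsePositives[] = $s; }
    }
    return array( 'falsePositives' => $falsePositives,
                  'falseNegatives' => $falseNegatives );
}

// Assumed example task: accept only strings that are a non-negative integer.
$result = classify_regexp( '/[0-9]+/',
                           array( '7', '00047' ),        // should match
                           array( 'abc', 'ab789de' ) );  // should not match
print_r( $result );
```

Here the unanchored pattern reports 'ab789de' as a false positive, because the string merely contains some digits; anchoring the pattern as '/^[0-9]+$/' removes it.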
Warning: Beware matching a top-level * expression: the empty-string matches it, and any string contains the empty-string! Thus preg_match_all( "/(xyz)*/", "uh-oh" ) === 6 !!, since "uh-oh" has zero "xyz"'s at the start, followed by 'u', followed by zero more "xyz"'s, followed by 'h', followed by ….
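The empty-match count above can be checked directly, alongside two possible ways of avoiding it. This is a sketch; the variable names are illustrative, and which fix is appropriate depends on what you are really trying to match.

```php
<?php
// Every one of the 6 positions in "uh-oh" yields an empty match for (xyz)*.
$emptyMatches = preg_match_all( '/(xyz)*/', 'uh-oh' );

// Requiring at least one repetition rules out the empty match.
$realMatches = preg_match_all( '/(xyz)+/', 'uh-oh' );

// Anchoring asks instead: is the *whole* string zero-or-more "xyz"s?
$wholeString = preg_match( '/^(xyz)*$/', 'uh-oh' );

echo "$emptyMatches $realMatches $wholeString\n";   // 6 0 0
```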
bug?: preg_match_all( '/\p{N}/', "⁴٤𝟜4" ) is returning 1 for me (not 4), in PHP 5.4.3.
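A likely explanation (not confirmed in these notes) is the missing trailing u modifier: without it, PCRE treats the subject as individual bytes rather than as UTF-8 characters, so only the ASCII "4" is recognized as a number. The sketch below assumes the source file and the string are UTF-8 encoded; the variable names are made up.

```php
<?php
$digits = "⁴٤𝟜4";   // four different Unicode "4" characters, UTF-8 encoded

// No modifier: the subject is examined one byte at a time.
$byteCount = preg_match_all( '/\p{N}/', $digits );

// The u modifier makes both pattern and subject be treated as UTF-8.
$charCount = preg_match_all( '/\p{N}/u', $digits );

echo "without u: $byteCount, with u: $charCount\n";
```

With the u modifier the count is 4, one per character.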
Checking for space (one of the most common situations) is the one
that is hardest:
We have
Also, we are probably least able to ignore this problem, because in web forms people might paste in text from web pages or Word documents, and those expert-typesetting programs do tend to use the various special-spaces.
Solution: Either

    $str2 = preg_replace( '/\p{Z}+/', ' ', $str );
    …
    if( preg_match( '/\s+/', $str2 ) ) …
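Here is a small worked version of the normalize-then-match idea, as a sketch under two assumptions that are not stated in the notes: the input is valid UTF-8, and the pattern carries a trailing u modifier so that multi-byte spaces are treated as single characters. The sample string is made up.

```php
<?php
// "\xC2\xA0" is the UTF-8 encoding of a no-break space (U+00A0);
// "\xE2\x80\x83" is the UTF-8 encoding of an em space (U+2003).
$str = "snow\xC2\xA0day\xE2\x80\x83notes";

// Replace each run of Unicode separator characters with one plain space.
// (Assumption: the u modifier is added here; with it, preg_replace
// returns NULL if $str is not valid UTF-8.)
$str2 = preg_replace( '/\p{Z}+/u', ' ', $str );

// After normalizing, an ordinary \s check finds the spaces.
$hasSpace = preg_match( '/\s+/', $str2 );

echo $str2, "\n";      // snow day notes
echo $hasSpace, "\n";  // 1
```

Normalizing once, up front, means every later pattern can use the plain \s without worrying about special-spaces.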
NOTE:
In php, no trailing “
(There are other trailing modifiers however —
e.g. i for case-insensitive,
and more.)
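As a quick illustration of a trailing modifier, the i modifier goes after the closing delimiter. The strings here are made up for the example.

```php
<?php
// Without a modifier, matching is case-sensitive.
$caseSensitive = preg_match( '/zow/', 'Wowee ZOWEE' );

// The trailing i makes the match case-insensitive.
$caseInsensitive = preg_match( '/zow/i', 'Wowee ZOWEE' );

echo "$caseSensitive $caseInsensitive\n";   // 0 1
```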
Warning:
©2017, Ian Barland, Radford University Last modified 2017.Oct.09 (Mon)
Please mail any suggestions (incl. typos, broken links) to ibarlandradford.edu