When creating a page which includes something typed in by the user (from a previous text-input), we must be careful. What characters could the user type that our browser would not interpret as data, but instead as part of the html page's structure? (Here's our sample form.)
We have three levels of html to think about:
Here is an improved form which sanitizes its input before printing. Much better! However, there is still a minor issue: consider a multi-line message: How does it render? We realize that for the browser to render a newline, its html-source must contain “<br/>”, and so our code-to-print-that-html must look for any newlines in the user's input, and replace that character with the br tag. Here's a fully-working, sanitizing form.
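The two steps just described (escape the special characters, then turn newlines into br tags) can be sketched as below. This is only a minimal sketch: the helper name sanitizeForHtml and the sample message are made up for illustration, and the actual form linked above may differ.

```php
<?php
// Hypothetical helper: turn raw user text into HTML that displays that same text.
function sanitizeForHtml(string $raw): string {
    // First make <, >, & and both kinds of quotes render as data instead of markup.
    $escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
    // Then insert a <br /> tag before each newline so line breaks show up in the browser.
    return nl2br($escaped);
}

// Stand-in for a value looked up in $_POST (the sample text is an assumption).
$msg = "I <3 html\n<b>really</b>";
echo sanitizeForHtml($msg), "\n";
```

The order matters: calling nl2br first and htmlspecialchars second would escape the br tags themselves, and the user would see them as literal text.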
video: sanitizing html 2: nl2br (6m59s)
Clearly, we want to call htmlspecialchars on anything we print that came from the user. What if we print (say) “echo $_POST["some-radiobutton-name"];”? Do we need to sanitize that? At first, it seems like we don't, since the only possible values for that name would come from our own form (not the end-user). Sadly, we do still need to do this, because it's possible for an attacker to make a fake form which submits to the same action URL, and contains inputs with the same name attribute, but entirely forged values.
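Here is a small sketch of that point. The field name size, the list of allowed values, and the helper describeSize are all hypothetical; the array $forged stands in for what $_POST might hold after a fake form submits to our action URL.

```php
<?php
// Hypothetical list of the radio-button values our own form offers.
const ALLOWED_SIZES = ['small', 'medium', 'large'];

// Hypothetical helper: describe a submitted 'size' field safely.
function describeSize(array $post): string {
    $size = $post['size'] ?? '';
    if (!is_string($size)) {
        $size = '';   // a forged request could even send an array here
    }
    // A forged form can send any string at all, so compare against the known values...
    $label = in_array($size, ALLOWED_SIZES, true) ? 'Size: ' : 'Unknown size: ';
    // ...and escape whatever we echo back, either way.
    return $label . htmlspecialchars($size, ENT_QUOTES, 'UTF-8');
}

// Stand-in for $_POST as a forged form might fill it.
$forged = ['size' => '<script>alert(1)</script>'];
echo describeSize($forged), "\n";
echo describeSize(['size' => 'medium']), "\n";
```

Checking against the allowed values is an extra step beyond what the text requires; the escaping is the part that keeps forged markup from becoming page structure.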
to prof: If you want to demo an XSS attack (or at least, submitting a form where printing un-sanitized content runs some javascript), you have to:
- [on chrome:] quit all open Chrome windows, and re-start chrome with “--disable-web-security” or “--disable-XSS-auditor” on the command-line;
- [on rucs.radford.edu] call stripslashes on what you look up in $_POST, since rucs.radford.edu still inserts magic-quotes (deprecated since ).
Btw, another problem can be taking a string that is already
renders-to-data-in-html,
and thinking that it's raw-data,
and inadvertently re-encoding it, oops!
Think of these two things as being different types (even though
our impoverished language merely uses “string” to represent both):
The type “data-encoded-as-html”, and “raw-data”.
Then, a function like htmlspecialchars can be viewed
as a type-conversion,
and (in an ideal world) we'd have tools that understand those types
and would give warnings if you failed-to-convert or double-converted.
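To make the two “types” concrete, here is a short sketch of what double-converting looks like. The variable names and the sample string are just for illustration.

```php
<?php
$raw  = 'Tom & Jerry <3';                              // type: raw-data
$html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');   // type: data-encoded-as-html
$oops = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');  // double-converted by mistake

echo $html, "\n";   // what we should send to the browser
echo $oops, "\n";   // the browser would display the entity codes themselves

// htmlspecialchars_decode is the conversion in the other direction.
$back = htmlspecialchars_decode($html, ENT_QUOTES);
```

Nothing in PHP warns about the second call, which is why keeping track of which strings are already encoded is up to us.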
(used w/ implicit permission)
Some functions to consider:
Quick q: suppose a user types “hi <3” in a textarea whose name is msg. What is
We have seen arrays, and mentioned that if they have all-numeric indices (keys) then we can process them with a for loop or a while loop, using the same syntax that Java and Javascript happen to use.
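As a quick reminder of that first case, here is a sketch of a for loop over an array whose keys are 0, 1, 2, and so on. The array contents and the variable $seen are just sample names.

```php
<?php
$words = array('hallo', 'guten Tag', 'auf wiedersehen');

// Collect the elements in index order, to show the loop visits each one.
$seen = array();
for ($i = 0; $i < count($words); ++$i) {
    echo $i, ': ', $words[$i], "\n";
    $seen[] = $words[$i];
}
```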
Then we saw that if an array has keys which aren't all numeric, we can use a foreach loop to process them:
$myData = array( 'hi' => 'hallo',
                 'good day' => 'guten Tag',
                 'see you later' => 'auf wiedersehen' );

// Values only. (Variable names are case-sensitive: $myData, not $mydata.)
foreach ($myData as $german) {
    echo $german, "\n";
}

// Keys and values together.
foreach ($myData as $english => $german) {
    echo "You say '$english', I say '$german'.\n";
}
Look at the documentation for scandir.
Since it returns an array of filenames,
it's a natural match to use with other functions that want an array of strings:
For example,
echo htmlLines( scandir( '/ibarland/Tmp' ) );
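The function htmlLines isn't shown on this page, so the version below is only a guess at what it might do (escape each string, then join them with br tags). The directory listed is also an assumption: the script's own directory, so that the sketch runs anywhere.

```php
<?php
// Hypothetical stand-in for htmlLines: escape each string, then join with <br/> tags.
function htmlLines(array $lines): string {
    $escaped = array();
    foreach ($lines as $line) {
        $escaped[] = htmlspecialchars($line, ENT_QUOTES, 'UTF-8');
    }
    return implode("<br/>\n", $escaped);
}

// scandir returns false on failure, so check before using the result.
$dir = __DIR__;   // assumption: list the directory this script lives in
$names = scandir($dir);
if ($names === false) {
    echo "Could not read $dir\n";
} else {
    echo htmlLines($names), "\n";
}
```

Note that the array scandir returns includes the entries “.” and “..”, and that filenames are data too, so they get escaped like anything else.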
Suppose we wanted an English list of hyperlinks, separated by commas, with the word "and" before the last item. This decomposes into two orthogonal parts:
To create the array, hopefully you also used your function hyperlink, written from hw02. (If you wrote the same long HTML a tag over and over, that's a sign that a function would be better.) So you might have a loop:
$URLsAsText = array( "http://d20srd.org", "http://www.radford.edu", "http://google.com" );

$URLsAsHTML = array();
foreach ($URLsAsText as $url) {
    $URLsAsHTML[] = hyperlink($url);
}

echo "It should appeal to users of ", commaSeparatedList( $URLsAsHTML ), ".";
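Neither hyperlink nor commaSeparatedList is defined on this page (they come from homework), so here is one possible sketch of each. The signatures, the optional link-text parameter, and the choice to put a comma before “and” are all assumptions.

```php
<?php
// Hypothetical version of hyperlink: the link text defaults to the URL itself.
function hyperlink(string $url, ?string $text = null): string {
    $safeUrl  = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    $safeText = htmlspecialchars($text ?? $url, ENT_QUOTES, 'UTF-8');
    return "<a href=\"$safeUrl\">$safeText</a>";
}

// Hypothetical version of commaSeparatedList: "a", "a and b", "a, b, and c".
function commaSeparatedList(array $items): string {
    $items = array_values($items);   // ignore whatever keys came in
    $n = count($items);
    if ($n === 0) { return ''; }
    if ($n === 1) { return $items[0]; }
    if ($n === 2) { return $items[0] . ' and ' . $items[1]; }
    $last = array_pop($items);
    return implode(', ', $items) . ', and ' . $last;
}

echo commaSeparatedList(array('tea', 'coffee', 'cocoa')), "\n";
echo hyperlink('http://google.com'), "\n";
```

The list function knows nothing about HTML and the link function knows nothing about lists, which is the sense in which the two parts are orthogonal.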
Any other repeated stuff?
Hmm, the “http:” prefix
was kinda annoying, but writing a loop for that seems like definite overkill.
(Design Question:
Should hyperlink be prepending an “http:”?
How does this limit what it can do?
Does it violate the principle of least surprise?)
It's kinda annoying to keep writing loops that make a new array of updated values. Most of the loop is very rote — the only part that differs is the particular rule to transform the individual element to the new element. (In the example above, the answer is “the function hyperlink”.)
There is a handy function, array_map: You pass it the rule (function) on how to transform each individual datum, and you pass it an array of data, and it gives you back the entire transformed array. So our loop above gets turned into:
$URLsAsText = array( "http://d20srd.org", "http://www.radford.edu", "http://google.com" );
$URLsAsHTML = array_map( "hyperlink", $URLsAsText );
echo "It should appeal to users of ", commaSeparatedList( $URLsAsHTML ), ".";

Or, without the intermediate variable $URLsAsHTML:

$URLsAsText = array( "http://d20srd.org", "http://www.radford.edu", "http://google.com" );
echo "It should appeal to users of ",
     commaSeparatedList( array_map( "hyperlink", $URLsAsText ) ),
     ".";
Finally, note that we can also handle the “prepend “http://” to each item” issue. We could make a separate function and pass that to array_map, or we could use an anonymous function:
$URLsAsText = array_map( function ($domain) { return "http://" . $domain; },
                         array( "d20srd.org", "www.radford.edu", "google.com" ) );

The earlier example could also be written as a single expression, with the array literal inline:

echo "It should appeal to users of ",
     commaSeparatedList( array_map( "hyperlink",
                                    array( "http://d20srd.org", "http://www.radford.edu", "http://google.com" ) ) ),
     ".";

For comparison, a function like array_map could be written along these lines (note the argument order here is array first, then function):

function arraymap( $arr, $func ) {
    $result = array();
    foreach ($arr as $k => $v) {
        // Call the function we were handed, and store the answer in our array $result.
        $result[$k] = $func($v);
    }
    return $result;
}
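Two details of the built-in array_map are worth a quick sketch: when given a single array it keeps that array's keys (just as the hand-written version above does), and an anonymous function can capture an outside variable with “use”. The variable names and sample data below are made up for illustration.

```php
<?php
$germanFor = array('hi' => 'hallo', 'good day' => 'guten Tag');

// With a single input array, array_map keeps the original keys.
$shouted = array_map('strtoupper', $germanFor);

// An anonymous function can capture outside variables with "use".
$prefix = 'http://';
$urls = array_map(
    function ($domain) use ($prefix) { return $prefix . $domain; },
    array('d20srd.org', 'google.com')
);

print_r($shouted);
print_r($urls);
```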
This page licensed CC-BY 4.0 Ian Barland. Please mail any suggestions (incl. typos, broken links) to ibarlandradford.edu