ITEC 325
2021spring
flo

sanitizing form input
sanitizing html

Sanitizing input

video: sanitizing html 1: htmlspecialchars (25m18s)

When creating a page which includes something typed in by the user (from a previous text-input), we must be careful. What characters could the user type that our browser would not interpret as data, but instead as part of the html page's structure? (Here's our sample form.)

We have three levels of html to think about:

Here is an improved form which sanitizes its input before printing. Much better! However, there is still a minor issue: consider a multi-line message: How does it render? We realize that for the browser to render a newline, its html-source must contain “<br/>”, and so our code-to-print-that-html must look for any newlines in the user's input, and replace that character with the br tag. Here's a fully-working, sanitizing form.
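The escape-then-add-breaks order described above can be sketched as follows. The helper name toHtml and the sample message are assumptions made for illustration; they are not taken from the course's form.

```php
<?php
// Hypothetical helper (not from the course code): turn raw user text
// into html that displays that same text, including its line breaks.
function toHtml( $rawText ) {
  // First escape the characters html treats as structure: & < > " '
  $escaped = htmlspecialchars( $rawText, ENT_QUOTES, 'UTF-8' );
  // Then insert a <br /> tag before each newline.  Doing this second
  // means the br tags we add are not themselves escaped.
  return nl2br( $escaped );
  }

$msg = "I <3 html\n<b>bold?</b>";   // stands in for $_POST['msg']
echo toHtml( $msg ), "\n";
```

Calling nl2br first and htmlspecialchars second would escape the br tags too, so the reader would see a literal “&lt;br /&gt;” on the page.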

video: sanitizing html 2: nl2br (6m59s)

Clearly, we want to call htmlspecialchars on anything we print that came from the user. What if we print (say) “echo $_POST["some-radiobutton-name"];” — do we need to sanitize that? At first, it seems like we don't, since the only possible values for that name would come from our own form (not the end-user). …Sadly, we do still need to do this, because it's possible for an attacker to make a fake-form which submits to the same action URL, and contains inputs with the same name attribute, but entirely forged values.
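To make the forged-form scenario concrete, here is a small sketch. The input name size and the array $forgedPost are made up; the array just simulates what $_POST could contain after an attacker's fake form submits to our action URL.

```php
<?php
// Our real form only offers "small" or "large" for this radio button,
// but a fake form can send any string at all under the same name.
$forgedPost = array( 'size' => '<script>alert("gotcha")</script>' );

// Unsafe:  echo $forgedPost['size'];   would put a live script tag in our page.
// Safe: escape it exactly like any other user-supplied text.
$safe = htmlspecialchars( $forgedPost['size'], ENT_QUOTES, 'UTF-8' );
echo "You chose: ", $safe, "\n";
```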

to prof: If you want to demo an XSS attack (or at least, submitting a form where printing un-sanitized content runs some javascript), you have to:
  1. [on chrome:] quit all open Chrome windows, and re-start chrome with “--disable-web-security” or “--disable-XSS-auditor” on the command-line;
  2. [on rucs.radford.edu] call stripslashes on what you lookup in $_POST, since rucs.radford.edu still inserts magic-quotes (deprecated since ).

Btw, another problem is taking a string that is already encoded as html (so it already renders as data), mistaking it for raw data, and inadvertently re-encoding it. Oops! Think of these as two different types (even though our impoverished language merely uses “string” to represent both): the type “data-encoded-as-html”, and the type “raw-data”. Then a function like htmlspecialchars can be viewed as a type-conversion, and (in an ideal world) we'd have tools that understand those types and give warnings if you failed to convert or double-converted.
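A quick sketch of what double-converting looks like. The sample string and variable names are made up for illustration.

```php
<?php
$raw   = "Tom & Jerry <3";
$once  = htmlspecialchars( $raw );    // raw-data  ->  data-encoded-as-html
$twice = htmlspecialchars( $once );   // oops: encoded something already encoded

echo $once,  "\n";   // a browser displays this as the original text
echo $twice, "\n";   // a browser displays this with the entities showing
```

Here $once is “Tom &amp;amp; Jerry &amp;lt;3” and $twice is “Tom &amp;amp;amp; Jerry &amp;amp;lt;3”, when viewed as html source. Both are plain strings as far as php is concerned, which is exactly why the mistake is easy to make.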


Optional but good to know: video: magic-quotes; array_map (24m01s)

Some functions to consider:

Quick q: suppose a user types:

  hi
<3
in a textarea whose name is msg. What is


We have seen arrays, and mentioned that if they have all-numeric indices (keys) then we can process them with a for loop or a while loop, using the same syntax that Java and Javascript happen to use.
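As a reminder, a for loop over an array with keys 0, 1, 2 might look like this (the array contents and variable names are made up for this sketch):

```php
<?php
$greetings = array( 'hallo', 'guten Tag', 'auf Wiedersehen' );  // keys 0, 1, 2

$lines = array();
for ($i = 0;  $i < count($greetings);  ++$i) {
  $lines[] = "$i: $greetings[$i]";   // look up each element by its numeric index
  }
echo implode( "\n", $lines ), "\n";
```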

Then we saw that if an array has keys which aren't all numeric, we can use a foreach loop to process them:

  $myData = array( 'hi' => 'hallo', 'good day' => 'guten Tag', 'see you later' => 'auf Wiedersehen' );

  // Values only.  (Variable names are case-sensitive: $myData is not $mydata.)
  foreach ($myData as $german) {
    echo $german, "\n";
    }

  // Keys and values together.
  foreach ($myData as $english => $german) {
    echo "You say '$english', I say '$german'.\n";
    }
The foreach loop is of course one way to process each element of the $_POST array (if you don't want separate, input-specific code for each form input).
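For instance, a generic loop over the submitted inputs might look like the sketch below. So that it runs outside a web server, it uses a made-up array $fakePost in place of $_POST; the input names and values are assumptions.

```php
<?php
// Stand-in for $_POST (names and values invented for illustration).
$fakePost = array( 'name' => 'Ada <admin>', 'msg' => 'x & y' );

$rows = array();
foreach ($fakePost as $inputName => $value) {
  // Escape both the key and the value: a forged form controls both.
  $rows[] = htmlspecialchars($inputName) . " = " . htmlspecialchars($value);
  }
echo implode( "<br />\n", $rows ), "\n";
```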


directory-processing

Look at the documentation for scandir.
Since it returns an array of filenames, it's a natural match to use with other functions that want an array of strings: For example, echo htmlLines( scandir( '/ibarland/Tmp' ) );
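Here is a small runnable sketch of scandir. It does not use htmlLines (whose definition isn't shown here); instead it just loops over the result. The scratch directory and file names are made up so that the output is predictable.

```php
<?php
// Make a scratch directory containing two empty files.
$dir = sys_get_temp_dir() . '/scandir-demo-' . uniqid();
mkdir( $dir );
touch( "$dir/notes.txt" );
touch( "$dir/index.html" );

$names = scandir( $dir );    // sorted; includes the entries "." and ".."
// Drop "." and "..", then renumber the keys from 0.
$files = array_values( array_diff( $names, array( '.', '..' ) ) );

foreach ($files as $f) {
  echo htmlspecialchars( $f ), "<br />\n";   // filenames are data too, so escape them
  }

// Clean up the scratch files and directory.
foreach ($files as $f) { unlink( "$dir/$f" ); }
rmdir( $dir );
```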


array_map

Suppose we wanted an English list of hyperlinks, separated by commas, with the word "and" before the last item. This decomposes into two orthogonal parts:

And you just wrote a function for making an English list, and you called that. The array you pass in must be an array of URLs.

To create the array, hopefully you also used your function hyperlink, written from hw02. (If you wrote the same long HTML a tag over and over, that's a sign that a function would be better.) So you might have a loop:

  $URLsAsText = array( "http://d20srd.org", "http://www.radford.edu", "http://google.com" );

  $URLsAsHTML = array();
  foreach ($URLsAsText AS $url) {
     $URLsAsHTML[] = hyperlink($url);
     }

  echo "It should appeal to users of ", commaSeparatedList( $URLsAsHTML ), ".";
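The loop above relies on hyperlink and commaSeparatedList, which aren't defined on this page. To try it out on its own, here is a sketch with hypothetical stand-ins for both; your hw02 versions may well differ (for example, in how the link text is chosen).

```php
<?php
// Hypothetical stand-in for the hw02 function: link text is just the URL.
function hyperlink( $url ) {
  $safe = htmlspecialchars( $url, ENT_QUOTES, 'UTF-8' );
  return "<a href='$safe'>$safe</a>";
  }

// Hypothetical stand-in: join items as "a", "a and b", or "a, b, and c".
// Assumes $items has keys 0, 1, 2, ...
function commaSeparatedList( $items ) {
  $n = count( $items );
  if ($n <= 1) { return implode( '', $items ); }
  if ($n == 2) { return $items[0] . " and " . $items[1]; }
  $allButLast = array_slice( $items, 0, $n - 1 );
  return implode( ", ", $allButLast ) . ", and " . $items[$n - 1];
  }

$URLsAsText = array( "http://d20srd.org", "http://www.radford.edu", "http://google.com" );

$URLsAsHTML = array();
foreach ($URLsAsText as $url) {
  $URLsAsHTML[] = hyperlink($url);
  }

echo "It should appeal to users of ", commaSeparatedList( $URLsAsHTML ), ".\n";
```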
  

Any other repeated stuff? Hmm, the “http:” prefix was kinda annoying, but writing a loop for that seems definite overkill.
(Design Question: Should hyperlink be prepending a http:? How does this limit what it can do? Does it violate the principle of least surprise?)

It's kinda annoying to keep writing loops that make a new array of updated values. Most of the loop is very rote — the only part that differs is the particular rule to transform the individual element to the new element. (In the example above, the answer is “the function hyperlink”.)

There is a handy function, array_map: You pass it the rule (function) on how to transform each individual datum, and you pass it an array of data, and it gives you back the entire transformed array. So our loop above turns into:

  $URLsAsText = array( "http://d20srd.org", "http://www.radford.edu", "http://google.com" );
  $URLsAsHTML = array_map( "hyperlink", $URLsAsText );
  echo "It should appeal to users of ", commaSeparatedList( $URLsAsHTML ), ".";
or we could even get rid of the intermediate variable, if we don't want to save the result (see footnote 1):
  $URLsAsText = array( "http://d20srd.org", "http://www.radford.edu", "http://google.com" );
  echo "It should appeal to users of ", 
       commaSeparatedList( array_map( "hyperlink", $URLsAsText ) ),
       ".";

Finally, note that we can also handle the “prepend "http://" to each item” issue. We could make a separate function and pass that to array_map, or we could use an anonymous function (see footnote 2):

  $URLsAsText = array_map( function ($domain) { return "http://" . $domain; },
                           array( "d20srd.org", "www.radford.edu", "google.com" ) );
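For comparison, here is the same transformation written three ways: with a named function, with an anonymous function, and with the shorter arrow-function syntax (which needs PHP 7.4 or later). The name addHttp is made up for this sketch.

```php
<?php
$domains = array( "d20srd.org", "www.radford.edu", "google.com" );

// 1. A separate, named function, passed by name.
function addHttp( $domain ) { return "http://" . $domain; }
$viaName  = array_map( "addHttp", $domains );

// 2. An anonymous function, written right where it is needed.
$viaAnon  = array_map( function ($domain) { return "http://" . $domain; }, $domains );

// 3. An arrow function: a one-expression anonymous function (PHP 7.4+).
$viaArrow = array_map( fn($domain) => "http://" . $domain, $domains );

print_r( $viaArrow );
```

All three calls produce the same array; which to use is mostly a matter of whether the rule is worth naming and re-using.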
  

video from distance lecture (breeze), 2017-feb-14 (1h23m), REVIEWING this info

1 Heck, if you don't even want to name the original array, you could inline that. This is arguably in-lining too much, but that can be an issue of taste (and requires taking care with indentation):
  echo "It should appeal to users of ", 
       commaSeparatedList( array_map( "hyperlink",
                                      array( "http://d20srd.org",
                                             "http://www.radford.edu",
                                             "http://google.com" ) ) ),
       ".";
     
2 This is a function that we declare in the middle of the line; note it doesn't even have a name. We don't call this function ourselves: we are giving the function to array_map, and they'll use the function as they see fit — in this case, calling the function on every element of the array we're also handing them. In truth, the implementation of array_map is potentially very simple:
function arraymap( $arr, $func ) {
  $result = array();
  foreach ($arr AS $k => $v) {
    $result[$k] = $func($v);    // call the function we were handed, and store the answer in our array $result.
    }
  return $result;
  }
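A quick check of that sketch against the built-in. The definition is repeated so this block runs on its own; the sample array is made up. Note the argument order: our arraymap takes the array first, while the built-in array_map takes the function first.

```php
<?php
// The footnote's sketch, repeated here so this block is self-contained.
function arraymap( $arr, $func ) {
  $result = array();
  foreach ($arr as $k => $v) {
    $result[$k] = $func($v);
    }
  return $result;
  }

$myData = array( 'hi' => 'hallo', 'good day' => 'guten Tag' );

$ours    = arraymap( $myData, 'strtoupper' );    // array first, then function
$builtIn = array_map( 'strtoupper', $myData );   // function first, then array

print_r( $ours );
var_dump( $ours === $builtIn );   // same keys, same values, same order
```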

This page licensed CC-BY 4.0 Ian Barland
Please mail any suggestions
(incl. typos, broken links)
to ibarlandradford.edu
Rendered by Racket.