cookies
chapter 9
From PHP Visual Quickstart Guide by Larry Ullman
Originally based on notes by Jack Davis (jcdavis@radford.edu)
HTTP is stateless.
Each HTTP request is independent of all others.
The server doesn't view its interactions as a bunch of phone calls;
rather, each HTTP request is a postcard.
There is no inherent way of knowing, when reading a postcard,
what previous postcards it may be referring to.
One solution is that the person sending the postcard (the client)
includes a “re-cap”,
reminding the recipient (the server) who they are, and what they've already talked
about or agreed upon.
That's a cookie!
Of course, the recipient needs to be wary of whatever the
sender claims was previously agreed upon.
We'll (partly) address that issue with sessions,
next lecture.
In particular, we'll use sessions that are based on cookies.
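To make the postcard analogy concrete: the browser's "re-cap" travels as a single Cookie: request header, a list of name=value pairs separated by semicolons. The sketch below is a simplified illustration of how a server might unpack that header. PHP already does this for you when it fills $_COOKIE, and parseCookieHeader is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): split a raw "Cookie:" header value
// such as "name=ibarland; topic=cookies" into an associative array.
function parseCookieHeader(string $header): array
{
    $cookies = [];
    foreach (explode(';', $header) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue;  // skip empty or malformed pieces
        }
        // Split on the first "=" only, so values may themselves contain "=".
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = urldecode(trim($value));
    }
    return $cookies;
}

$recap = parseCookieHeader('name=ibarland; topic=cookies');
// $recap is now ['name' => 'ibarland', 'topic' => 'cookies']
```

Nothing in the header proves that the server ever handed out these values, which is why the recipient has to stay wary.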
Summary: Cookies.
- Concepts:
- A way of hacking state, between two different http requests/visits.
- The server wants to remember something for a future visit, so it
asks the client to remember it for them, and to provide that info
on all future requests.[1]
-
This is a cooperation between client and server.
(In particular, client can choose to ignore requests to cooperate.)
- Technicalities:
- call setcookie; see docs for its args/info
(including the sentinel expiration-date of 0)
- ...
- Pragmatics/Gotchas:
As larger, more complex web sites are built, the
limitation of HTTP as a stateless protocol becomes a
problem. Web developers have no built-in (HTML) method
of remembering data from one page of an application
to the next. This is a serious shortcoming, since e-commerce
systems, user registration and login systems, and other
online services rely on this functionality. Fortunately,
maintaining state from one page to another is fairly
simple using PHP.
- Cookies
-
(Note: we will not use cookies directly in this class;
we'll use sessions. But sessions are built on top of cookies, so
knowing how cookies work is a requirement for understanding sessions.)
- Cookies are a method for the server to store information about the user --
on the user's machine -- so that the server can remember
the user over the course of the visit or through several
web visits.
-
A cookie is just a key=>value entry
(just like a php array entry, or javascript object property,
or a java.util.Map entry, or json line).
-
It lives on the client machine,
and is provided by the browser when a request is made.
A PHP script can access
cookies by looking in the super-global $_COOKIE.
-
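Example:
If a PHP script tells a browser to
setcookie('hamburger-price', '2.70', time() + 14*24*3600) today,
and then the browser visits that same site next Tuesday,
the PHP can evaluate $_COOKIE['hamburger-price']
and get back the string '2.70'.
(Without the third argument, the expiration time, the cookie would be forgotten
as soon as the browser was closed.)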
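A minimal sketch of the hamburger example, assuming a two-week lifetime. The names rememberPrice and recallPrice are hypothetical helpers made up for illustration. Note that cookie values always come back as strings.

```php
<?php
// Hypothetical helpers illustrating the hamburger-price example.

// Ask the browser to remember the price for two weeks (an assumed lifetime).
// Must run before any page output; returns false if it is already too late.
function rememberPrice(string $price): bool
{
    if (headers_sent()) {
        return false;
    }
    return setcookie('hamburger-price', $price, time() + 14 * 24 * 3600);
}

// Look the price up among the cookies the browser sent back (normally $_COOKIE).
// Returns null if the browser did not supply the cookie.
function recallPrice(array $cookies): ?string
{
    return $cookies['hamburger-price'] ?? null;
}

// Typical use inside a page:
//   rememberPrice('2.70');             // on today's visit
//   $price = recallPrice($_COOKIE);    // on a later visit: '2.70', or null
```

Passing the cookie array in as a parameter, rather than reading $_COOKIE inside the function, is just a choice that makes the lookup easy to try out with made-up data.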
-
In addition to the name/value pair, a cookie also has:
an expiration date, a domain, and a directory path.
Whenever a browser requests a page, it attaches all cookie key/value pairs
whose domain and path match, and sends those to the server.
So if you visit a page that performs a
setcookie('hamburger-price', '2.70', 0, '/~ibarland', 'php.radford.edu'),
and later you visit
https://php.radford.edu/~ibarland/someDir/somefile.php,
your browser will attach the hamburger-price cookie.
But if you visit
https://php.radford.edu/~jcdavis/someDir/somefile.php,
the browser won't attach the cookie.
Similarly if you visit
https://rucs.radford.edu/~ibarland/someDir/somefile.php,
the browser won't attach the cookie either.
(Note that setting a cookie for '.radford.edu' (note the initial .)
makes it apply to all machines within the radford.edu domain.
Browsers following the current cookie specification, RFC 6265, ignore the
leading dot and treat 'radford.edu' the same way.)
Don't make two different cookies with the same name and path but different domains
(one a superset of the other).
Different browsers may choose differently which one gets sent.
(AFAICT: the more specific path wins; but for the same path with
two applicable domains, the cookie that was set first wins.)
It's not exactly advisable to make two different cookies with
the same name but different paths either, though that may not be enforceable,
e.g. /~ibarland and /~jcdavis may each contain
different scripts that happen to use the cookie “monster”.
Note also that if I set a cookie's path to be /,
then this is potentially a security flaw:
if somebody visits my script and I set a cookie secret-code-word
with server&path being radford.edu & /,
and then that person visits (say) radford.edu/~jcdavis,
his script will be sent the cookie that I had saved.
(He's a sly one, that jcdavis!)
Upshot: don't set the cookie's server&path to include URLs that others control.
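The domain and path matching described above can be sketched as a small function. This is a simplified approximation of what browsers do (it ignores the secure flag, expiration, and other details), and cookieApplies is a hypothetical name, not something PHP provides.

```php
<?php
// Hypothetical helper: would a browser attach a cookie with this domain and
// path to a request for $url? A simplified approximation of browser behavior.
function cookieApplies(string $cookieDomain, string $cookiePath, string $url): bool
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        $path = '/';
    }
    $domain = strtolower(ltrim($cookieDomain, '.'));  // leading dot is ignored

    // Domain match: the same host, or a host ending in "." + cookie domain.
    $suffix = '.' . $domain;
    $domainOk = $host === $domain
        || substr($host, -strlen($suffix)) === $suffix;

    // Path match: identical, or the cookie path is a prefix that ends
    // exactly at a "/" boundary of the requested path.
    $len = strlen($cookiePath);
    $pathOk = $path === $cookiePath
        || (strncmp($path, $cookiePath, $len) === 0
            && (substr($cookiePath, -1) === '/' || substr($path, $len, 1) === '/'));

    return $domainOk && $pathOk;
}

$base = '/someDir/somefile.php';
$a = cookieApplies('php.radford.edu', '/~ibarland', "https://php.radford.edu/~ibarland$base");  // true
$b = cookieApplies('php.radford.edu', '/~ibarland', "https://php.radford.edu/~jcdavis$base");   // false
$c = cookieApplies('php.radford.edu', '/~ibarland', "https://rucs.radford.edu/~ibarland$base"); // false
$d = cookieApplies('.radford.edu', '/', "https://php.radford.edu/~jcdavis$base");               // true
```

The last call is the risky case from the text: a cookie scoped to the whole domain and the path / is attached to requests for anybody's scripts on any radford.edu host.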
-
You can look at the cookies on your machine itself:
e.g.
Chrome > Preferences… > Show advanced settings… Privacy >
Content Settings… > All cookies and site data…,
and then use the search-box for (say) radford or amazon
- The full parameter-list lets you set all the info seen above:
setcookie(name, value, expiration, path, domain, secure, httponly);
- name — (required) cookie name
- value — (optional; defaults to the empty string) a string; browsers typically limit a whole cookie to about 4KB
- expiration — (optional) used to
set the expiration-date for the cookie, as a unix timestamp[2].
Often, one takes the current-time plus some amount:
setcookie('my-c-name','some-value',time() + 3600);
Setting the expiration-time as 0 is a special sentinel,
which means to expire when the user closes their browser.
- path and domain (optional) parameters are used to limit a cookie
to a specific folder in a web site (the path) or to
a specific domain, so this might be used to limit a
cookie to a subdomain, such as learn.radford.edu.
Using the path option, you could limit a cookie to exist only
while a user is in, say, the /user/jellybeans folder of
the domain:
setcookie('name','value',time() + 3600, '/user/jellybeans');
Now, that cookie won't be shared even w/ most other scripts on the same host
(provided the browser elects to use cookies as-intended).
- secure (optional) dictates that a cookie should
only be sent over a secure HTTPS connection. A value
of true indicates that a secure connection must be used,
whereas false (the default) indicates that a secure connection isn't
required.
setcookie('name', 'value', time() + 3600, '', '', true);
You should use this option unless
you have a specific reason not to.
- httponly (optional) — can be used to restrict access to
the cookie (for example, preventing a cookie from being
read using JavaScript). All current major browsers support it,
though some very old ones did not.
You should use this option unless
you have a specific reason not to.
This last flag reminds us: if the browser has a bug where
it might give out cookies to other sites (or an attacker
can gain other access to the folder where cookies are stored),
then there is a privacy vulnerability.
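Since PHP 7.3, setcookie also accepts an options array as its third argument instead of the long positional list, which makes the cautious settings easier to read. The sketch below is one possible way to bundle them. The helper name cautiousCookieOptions is hypothetical, and turning secure and httponly on unconditionally is an assumption that suits HTTPS-only sites.

```php
<?php
// Hypothetical helper: build the options array accepted by
// setcookie($name, $value, $options), available since PHP 7.3.
// A lifetime of 0 keeps the sentinel meaning "until the browser closes".
function cautiousCookieOptions(int $lifetimeSeconds, string $path, string $domain = ''): array
{
    return [
        'expires'  => $lifetimeSeconds === 0 ? 0 : time() + $lifetimeSeconds,
        'path'     => $path,
        'domain'   => $domain,   // '' means "the host that set the cookie"
        'secure'   => true,      // only send over HTTPS
        'httponly' => true,      // hide from JavaScript
    ];
}

// Typical use, before any output is sent:
//   setcookie('name', 'value', cautiousCookieOptions(3600, '/user/jellybeans'));
```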
It is essential to remember:
Cookies are created by the server, but stored on the client machine.
This explains why you have to call setcookie (rather than
just assign $_COOKIE['hamburger-price'] = ...):
You actually want to send cookies to the browser to remember (which is what
setcookie(…) does, and what assigning
to an array can't do).
It also explains why:
You must call setcookie before sending any other HTML —
the set-cookie info is sent to the browser as part of the http header, which
must be sent before any of the HTML data.
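To see why the timing matters, it helps to look at roughly what setcookie asks PHP to send: one Set-Cookie line among the response headers. The function below only approximates that line for illustration. sketchSetCookieHeader is a hypothetical name, and the exact text PHP emits differs in details (for example, it also adds a Max-Age attribute).

```php
<?php
// Hypothetical helper: approximate the header line that a setcookie() call
// queues up. Real PHP output may differ in details.
function sketchSetCookieHeader(string $name, string $value, int $expires = 0): string
{
    $line = 'Set-Cookie: ' . $name . '=' . rawurlencode($value);
    if ($expires !== 0) {
        // On the wire the expiration is a formatted GMT date, not a timestamp.
        $line .= '; expires=' . gmdate('D, d M Y H:i:s', $expires) . ' GMT';
    }
    return $line;
}

echo sketchSetCookieHeader('hamburger-price', '2.70', 86400), "\n";
// Set-Cookie: hamburger-price=2.70; expires=Fri, 02 Jan 1970 00:00:00 GMT
```

Because this line belongs to the headers, it has to go out before the first byte of the page body; in a real script, headers_sent() reports whether that moment has already passed.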
- Cookies have gotten a bad rap because some users
believe cookies magically allow anybody to learn any information.
While there are valid concerns,
a cookie can only be used to store information that you give it.
Limiting third-party cookies,
using secure (https-only) cookies,
and
disallowing javascript access (httponly) to cookies
are all wise ways of limiting access.
(And, they should be the defaults unless the programmer
requests otherwise, but that's not the case.)
- Script Example
cookies-customize.php
- Deleting a Cookie -
Delete an existing cookie by sending a blank cookie with the same name (and the same path and domain it was set with) and an expiration time in the past:
setcookie('name','',time()-600);
Better yet,
to try to delete cookies even if the client's clock is wildly off
by hours or months or years:
set the timeout to 1 (one second after the epoch-start).
You might think of setting it to 0, but remember that that value is
used as a sentinel, meaning end-of-browser-session expiration.
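Putting the deletion advice together, here is a minimal sketch. forgetCookie is a hypothetical helper name; it assumes you pass the same path and domain that were used when the cookie was set, since otherwise the browser treats it as a different cookie.

```php
<?php
// Hypothetical helper: ask the browser to forget a cookie.
// The path and domain must match those used when the cookie was set.
function forgetCookie(string $name, string $path = '', string $domain = ''): bool
{
    // Also drop it from this request's copy, so later code doesn't see it.
    unset($_COOKIE[$name]);

    if (headers_sent()) {
        return false;  // too late: output has already started
    }
    // Timestamp 1 is one second after the epoch: far in the past even if the
    // client's clock is badly off, yet not the sentinel 0 (which would mean
    // "keep until the browser closes").
    return setcookie($name, '', 1, $path, $domain);
}

// Typical use, before any output is sent:
//   forgetCookie('hamburger-price');
```

The unset is needed because setcookie only affects what the browser stores; the $_COOKIE array for the current request still holds the old value until you remove it yourself.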
Third party cookies
Example:
Remember, (hosted) images are often stored on a different server than
the page's text/html data.
Cookies can be set on any http request, including retrieving images!
-
When you request cia.gov, the response includes the
html “<img src='http://lotofbanners.com/qwerty.jpg' … />”.
-
Sure enough, your browser makes a request to lotofbanners.com, which responds
with the requested jpeg, and also has your browser set the
cookie victimID = 47 for the domain lotofbanners.com.
-
Later, you request mediawiki.org, whose response also includes
the html for the same banner.
-
Sure enough, your browser makes a request to lotofbanners.com — but this time
your browser is passing along the cookie victimID = 47 —
and now that site knows that whoever victimID=47 is, that person has seen their ad twice.
(In this example, we're presuming that lotofbanners.com is storing the exact ads seen
by each victimID.)
This doesn't seem too bad — as written, lotofbanners.com doesn't actually
know who you are,
just that the same person viewing the current banner has previously seen certain other banners.
But this can be leveraged:
If they name their banners “qwerty-for-cia.jpeg” and
“qwerty-for-mediawiki.jpeg”
and so on, then
they can now know,
out of all the sites they give banners for,
which of those sites you've visited (and when).
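The bookkeeping on the banner host's side can be tiny. The sketch below is purely illustrative: recordBannerView, the in-memory $log array, and the way new IDs are handed out are all assumptions, standing in for whatever database such a company would really use.

```php
<?php
// Hypothetical sketch of the banner server's bookkeeping.
// $log maps each victimID to the list of banner files that visitor requested.
// $cookies stands in for $_COOKIE; $nextId is the ID to assign to a newcomer.
function recordBannerView(array &$log, array $cookies, string $bannerFile, int $nextId): int
{
    // A returning browser identifies itself; a new one gets a fresh ID
    // (which the real script would then hand back via setcookie).
    $id = isset($cookies['victimID']) ? (int) $cookies['victimID'] : $nextId;
    $log[$id][] = $bannerFile;
    return $id;
}

$log = [];
// First request: no cookie yet, arriving from the page on cia.gov.
$id = recordBannerView($log, [], 'qwerty-for-cia.jpeg', 47);
// Later request from mediawiki.org: the browser passes victimID=47 along.
recordBannerView($log, ['victimID' => (string) $id], 'qwerty-for-mediawiki.jpeg', 48);
// $log[47] is now ['qwerty-for-cia.jpeg', 'qwerty-for-mediawiki.jpeg']
```

The per-site file names are what turn a bare view count into a browsing history: each entry in the log reveals which site embedded the banner.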
Note that separately, just knowing a large chunk of browser history can be surprisingly specific,
when you include
specific-amazon-products-looked-at,
which takeout-restaurant-phone-numbers you're looking up,
what political-candidate-webpages you're viewing, and
what medical-info-pages you look at
—
from this
it is a not-unreasonable-step that one could conceivably
narrow down, with decent confidence,
somebody's neighborhood, diseases, how they vote, and what their favorite pizza topping is.
BUT, it would require a single company to be hosting banners/ads for lots and lots of different
companies,
so perhaps this isn't too big a worry?
Well, one last thought:
huge numbers of websites outsource to google-analytics, to get info about usage.
These google-third-party cookies can be combined with the exact google searches you make
and your gmail contents,
which can give that company a vast trove of highly specific information.
It's a good thing they use their power for good only!
(… until the NSA obtains a court order,
or just plain steals the data from wiretaps placed on intercontinental data trunks,
or a hacker-or-disgruntled-employee gets access to their database, …).
[1] Kinda like emails that start with
repeating/quoting the entire previous thread.
[2] Although PHP's setcookie is given a timestamp,
the representation actually in the HTTP packet is
a formatted date string.
So there isn't any Y2K38 problem in HTTP.