New York Times Claims HTML5 is a “Pandora’s Box” of Privacy Risks

Alarmist rhetoric from news organizations about the web is nothing new, but today’s front-page headline on the New York Times still caught my eye: “Web Code Offers New Ways to See What Users do Online.” It’s about HTML5 privacy risks, and it’s a load of crap.

In the author’s rush to scare her readers, she picked the wrong target. The article claims HTML5 will make it easier for advertisers to gather information about you. From context it’s clear that the cause of concern is the web storage specification (which isn’t actually part of HTML5, but we’ll let that slide). The only real-world example of a privacy risk presented is “evercookie,” a persistent tracking cookie that is very difficult to delete. Evercookie is a legitimate security concern, and if the article had focused on that, it would have been fine, but by putting HTML5 in the crosshairs it completely misses the point.

Here’s what the article has to say about how HTML5 will allow advertisers to track you:

The technology uses a process in which large amounts of data can be collected and stored on the user’s hard drive while online. Because of that process, advertisers and others could, experts say, see weeks or even months of personal data. That could include a user’s location, time zone, photographs, text from blogs, shopping cart contents, e-mails and a history of the Web pages visited.

Assuming the article is referring to the web storage specification, this is all true — but you can also do all of this today, using existing tools. This is not some insidious new security threat that HTML5 introduces.
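For a concrete comparison, here is a rough sketch of the same tracking identifier stored two ways. The first is a plain cookie, which browsers have supported for years. The second is Web Storage. The key `visitor_id` and its value are made up for illustration.

```html
<script>
  // Long-standing approach: an HTTP cookie that persists for a year
  document.cookie = "visitor_id=12345; max-age=31536000; path=/";

  // Web Storage: the same identifier in one more "bucket"
  localStorage.setItem("visitor_id", "12345");

  // On a later visit, a script can read either one back
  const allCookies = document.cookie;               // string that includes "visitor_id=12345"
  const fromStorage = localStorage.getItem("visitor_id"); // "12345"
</script>
```

A single cookie is small (browsers typically cap each one at around 4 KB), but a tracker only needs to store an ID there. The rest of the profile can live on the tracker's own servers.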

Håkon Wium Lie, the CTO of Opera, attempts to tone things down by pointing out that HTML5 “gives trackers one more bucket to put tracking information into.” One more, as in: the trackers already have many buckets. However, his quote is immediately followed by one from Pam Dixon of the World Privacy Forum, an organization that, as far as I can tell, consists only of Pam Dixon. She is quoted as saying “HTML 5 opens Pandora’s box of tracking in the Internet,” which is a wonderful scare quote, and I’m surprised they didn’t use it in the headline.

Even worse is the quote they got from Ian Jacobs at the W3C, who says “This is not a secret cabal for global adoption of these core standards.” Ian, I can see where you were going with that quote, but do me a favor. When trying to tone down an attack piece, don’t ever use the words “secret cabal,” even to say that you’re not part of one.

The crown jewel of the article is the section discussing Samy Kamkar, who is breathlessly credited with “creating a virus called the ‘Samy Worm,’ which took down MySpace.com in 2005.” It goes on to explain that he recently created the evercookie, an extremely persistent cookie that is intended to be difficult to delete. On the evercookie site, Samy explains:

evercookie is a javascript API available that produces extremely persistent cookies in a browser. Its goal is to identify a client even after they’ve removed standard cookies, Flash cookies, and others. evercookie accomplishes this by storing the cookie data in several types of storage mechanisms that are available on the local browser. Additionally, if evercookie has found the user has removed any of the types of cookies in question, it recreates them using each mechanism available.

Now, to be fair, the evercookie does take advantage of HTML5 techniques like Web Storage, but that is just one of the many storage vectors it uses. It also relies on an Internet Explorer storage feature, PNG files, and JavaScript, but you don’t see the article fretting over any of those technologies.

Despite all evidence to the contrary, the article attempts to paint Samy as some sort of white hat hacker who’s just looking out for the common man by highlighting security risks. Let’s ignore the fact that his previous claim to fame was writing a virus that crashed a major website. He goes out of his way to point out that he could have sold the evercookie code to advertisers, but didn’t. Instead, he published the technique for free on his website! His argument is that this will provoke the browser vendors to create better privacy tools to combat his cookie.

Well, I’m sorry, but I don’t think Samy deserves a pat on the back for this. This is like saying “Hey, I designed this really neat gun that never runs out of bullets, but don’t worry! I didn’t sell it to our enemies, I just put the plans on my website for anyone to read. Oh, you guys should probably start designing some body armor.”

If you can get past the rhetoric in the article, it’s clear that the risk to your privacy is real, but it’s not from HTML5. It’s from guys like Samy Kamkar.

Note: This was originally posted on my work blog, and I’m re-posting it here for archival purposes.

I’ve Got My Head in the Cloud

It used to be that pretty much my entire life was on my computer. If my house/apartment/dorm burned down, I could lose everything: bookmarks, web development files, documents, graphics, software, photos, music. I invested heavily in storage, starting with endless stacks of floppies, then Zip disks, burned CDs, and finally burned DVDs and external hard drives. Even so, data loss was absurdly common. I can remember several times losing entire hard drives’ worth of data when a computer crashed (or when I formatted the wrong drive while reinstalling Windows).

So it’s funny to realize that I don’t think about that at all anymore. Losing a computer would be an annoyance (an admittedly expensive one), but I wouldn’t suffer any real data loss. My bookmarks are synced online. My photos are on Flickr. My websites automatically send a database dump to Gmail every week. My email and documents are in Google. My feed reader is online. My web development files are all stored in a Dropbox account or under version control on a service like GitHub. In fact, just about the only files that I don’t already store online are my MP3s, but even they are spread across various computers, iPods, and iPhones, so losing a single computer wouldn’t cause any serious loss. If I lost my home computer, I would lose about three months of photos, but only because I’m lazy and keep forgetting to upload the latest ones to Flickr.

10 years

The whole thing reminds me of this illustration comparing a 2000 iMac to a 2010 iPhone. In ten years, we’ve gone from my entire livelihood being physically attached to a single computer to nearly everything being stored online, with any given computer holding just local copies of those files.

What Makes HTML5 so Great?

HTML5 Design Principles

When the W3C started working on HTML again in 2007, they posted a set of guiding principles for the new version, emphasizing compatibility, utility and interoperability. I’d like to highlight four of these principles that I think are especially important.

  1. Support existing content
  2. Degrade gracefully
  3. Pave the cowpaths
  4. Priority of Constituencies

In the process, I’ll explain why HTML5 is not just the latest version of HTML, but represents a fundamental shift in the philosophy behind it.

1. Support Existing Content

“It should be possible to process existing HTML documents as HTML5 and get results that are compatible with the existing expectations of users and authors, based on the behavior of existing browsers.”
W3C HTML Design Principles

Another way to put this is backward compatibility, and it almost didn’t happen. Without getting into too much detail, in the late 1990s the W3C was concerned that HTML was too forgiving of markup errors, and began shifting its focus from HTML to the more draconian XML.

However, the browser vendors and web development community didn’t like the new direction, and formed a new group (the WHATWG) to evolve HTML. In 2006, the W3C admitted that it had been wrong and agreed to work with the new group on HTML5.

As a result, one of the core principles of HTML5 distills nearly a decade of debate and politics into a simple idea: a new standard shouldn’t break existing websites.

2. Degrade Gracefully

“HTML5 should be designed so that Web content can degrade gracefully in older or less capable user agents, even when making use of new elements, attributes, and APIs.”
W3C HTML Design Principles

HTML5 adds several new form input types, including ones for phone numbers, URLs, and search fields. Modern browsers that support the new types deliver an enhanced experience, while older browsers treat them as plain text inputs.
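Here is a minimal markup sketch of the idea. The field names are arbitrary. Each field uses one of the new types, and a browser that doesn’t recognize a type renders an ordinary text field instead.

```html
<form action="/search" method="get">
  <!-- New HTML5 input types; an unrecognized type falls back to type="text" -->
  <label>Phone <input type="tel" name="phone"></label>
  <label>Website <input type="url" name="website"></label>
  <label>Search <input type="search" name="q"></label>
  <button type="submit">Go</button>
</form>
```

No feature detection or script is needed for the fallback. The older browser simply ignores a type value it doesn’t understand.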

This sort of progressive enhancement is not possible for every new feature, but codifying this ideal demonstrates the commitment to backward compatibility.

3. Pave the Cowpaths

“When a practice is already widespread among authors, consider adopting it rather than forbidding it or inventing something new.”
W3C HTML Design Principles

HTML5 allows you to use either HTML style (<br>) or XHTML style (<br />) markup. Both approaches will validate, and which you use is a matter of preference.

It would have been easy enough to say “We’re back to HTML now, so XHTML syntax isn’t valid anymore,” but allowing both styles encourages more people to follow the spec.

4. Priority of Constituencies

“In case of conflict, consider users over authors over implementors over specifiers over theoretical purity. In other words costs or difficulties to the user should be given more weight than costs to authors.”
W3C HTML Design Principles

Finally, we have the priority list. It sounds like common sense, but because browser vendors are part of the groups that draft the specs, they had a larger voice than web developers did. This guideline reminds the people writing the spec that they ultimately answer to users, not vendors. After all, if nobody follows the spec, it doesn’t matter whether any browsers support it.

* Illustration by Dale Stephanos

Note: This was originally posted on my work blog, and I’m re-posting it here for archival purposes.

optimizeLegibility does not work with @font-face

Recently, Twitter was buzzing with news of a CSS technique, text-rendering: optimizeLegibility, that enables better kerning and font ligatures. Firefox enables it by default for text above 20px, so you may have already seen it in action. I’d noticed the effect on my Talk Like Warren Ellis site (warning: possibly not-safe-for-work language). I happily added it to my stylesheets, and was pleased to see the effect start working in Safari and Chrome as well. However, when I created the new Metal Toad site, it wasn’t working.

After running some tests, I found that optimizeLegibility and @font-face don’t work together. No matter how I loaded the font using @font-face, even when linking directly to the .otf file, optimizeLegibility had no effect. But the instant I switched to a locally installed copy of the same font, it worked just fine. This is very disappointing, as @font-face has always been presented as working the same as native fonts, but in this one instance, they don’t behave the same at all.
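A minimal page for reproducing the comparison above might look like the following sketch. The family names “Example Web” and “Example Local” and the fonts/example.otf path are placeholders, not the fonts used on the Metal Toad site. The comments describe the results reported above rather than guaranteed behavior in every browser.

```html
<!DOCTYPE html>
<html>
<head>
  <style>
    /* Web font loaded via @font-face; family name and file path are placeholders */
    @font-face {
      font-family: "Example Web";
      src: url("fonts/example.otf") format("opentype");
    }

    /* Case 1: optimizeLegibility on text set in the @font-face font
       (the case that showed no effect in the tests described above) */
    .webfont {
      font-family: "Example Web", serif;
      text-rendering: optimizeLegibility;
    }

    /* Case 2: the same property on a locally installed copy of the font;
       "Example Local" stands in for the font's installed name */
    .localfont {
      font-family: "Example Local", serif;
      text-rendering: optimizeLegibility;
    }
  </style>
</head>
<body>
  <!-- Pairs like "AV" and "fi" make kerning and ligatures easy to spot -->
  <p class="webfont">AVA Wafer fi fl</p>
  <p class="localfont">AVA Wafer fi fl</p>
</body>
</html>
```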

Note: This was originally posted on my work blog, and I’m re-posting it here for archival purposes.