New York Times Claims HTML5 is a “Pandora’s Box” of Privacy Risks

Alarmist rhetoric from news organizations about the web is nothing new, but today’s front-page headline on the New York Times still caught my eye: “Web Code Offers New Ways to See What Users do Online.” It’s about HTML5 privacy risks, and it’s a load of crap.

In the author’s rush to scare her readers, she picked the wrong target. The article claims HTML5 will make it easier for advertisers to gather information about you. From context it’s clear that the cause of concern is the web storage specification (which isn’t actually part of HTML5, but we’ll let that slide). The only real-world example of a privacy risk presented is “evercookie,” a persistent tracking cookie that is very difficult to delete. Evercookie is a legitimate security concern, and if the article had focused on that it would have been fine, but by putting HTML5 in the crosshairs it completely misses the point.

Here’s what the article has to say about how HTML5 will allow advertisers to track you:

The technology uses a process in which large amounts of data can be collected and stored on the user’s hard drive while online. Because of that process, advertisers and others could, experts say, see weeks or even months of personal data. That could include a user’s location, time zone, photographs, text from blogs, shopping cart contents, e-mails and a history of the Web pages visited.

Assuming the article is referring to the web storage specification, this is all true — but you can also do all of this today, using existing tools. This is not some insidious new security threat that HTML5 introduces.

Hakon Wium Lie, the CTO of Opera, attempts to tone things down by pointing out that HTML5 “gives trackers one more bucket to put tracking information into.” One more — as in, the trackers already have many buckets. However, his quote is immediately followed by Pam Dixon of the World Privacy Forum — an organization that, as far as I can tell, consists only of Pam Dixon. She is quoted as saying “HTML 5 opens Pandora’s box of tracking in the Internet,” which is a wonderful scare quote, and I’m surprised they didn’t use it in the headline.

Even worse is the quote they got from Ian Jacobs at the W3C, who says “This is not a secret cabal for global adoption of these core standards.” Ian, I can see where you were going with that quote, but do me a favor. When trying to tone down an attack piece, don’t ever use the words “secret cabal,” even to say that you’re not part of one.

The crown jewel of the article is the section discussing Samy Kamkar, who is breathlessly introduced as being known for “creating a virus called the ‘Samy Worm,’ which took down MySpace.com in 2005.” It goes on to explain that he recently created the evercookie, an extremely persistent cookie that is intended to be difficult to delete. On the evercookie site, Samy explains:

evercookie is a javascript API available that produces extremely persistent cookies in a browser. Its goal is to identify a client even after they’ve removed standard cookies, Flash cookies, and others. evercookie accomplishes this by storing the cookie data in several types of storage mechanisms that are available on the local browser. Additionally, if evercookie has found the user has removed any of the types of cookies in question, it recreates them using each mechanism available.

Now, to be fair, the evercookie does take advantage of HTML5 techniques like Web Storage, but this is just one of many storage vectors it relies on. It also uses an Internet Explorer storage feature, PNG files, and JavaScript, but you don’t see the article fretting over any of those technologies.
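The respawning behavior Samy describes is easy to model, which helps show why deleting one kind of cookie is not enough. The sketch below is not evercookie's code. It is a simplified TypeScript simulation in which plain in-memory objects stand in for the real storage mechanisms, and every name in it (Store, makeStores, writeEverywhere, readAndRespawn) is hypothetical.

```typescript
// Simplified simulation of "respawning" an identifier across several stores.
// Plain in-memory objects stand in for real browser storage; this illustrates
// the idea only and is not evercookie's actual implementation.

interface Store {
  label: string;
  value: string | null;
}

// Hypothetical stand-ins for the kinds of storage the quote mentions.
function makeStores(): Store[] {
  return [
    { label: "http-cookie", value: null },
    { label: "flash-cookie", value: null },
    { label: "web-storage", value: null },
  ];
}

// Write the identifier into every store.
function writeEverywhere(stores: Store[], id: string): void {
  for (const store of stores) {
    store.value = id;
  }
}

// Look for a surviving copy. If one exists, copy it back into every store,
// including the ones that were cleared. Returns the recovered identifier,
// or null if all of the stores were cleared.
function readAndRespawn(stores: Store[]): string | null {
  let survivor: string | null = null;
  for (const store of stores) {
    if (store.value !== null) {
      survivor = store.value;
      break;
    }
  }
  if (survivor !== null) {
    writeEverywhere(stores, survivor);
  }
  return survivor;
}
```

In this model, clearing two of the three stores changes nothing: the next call to readAndRespawn finds the remaining copy and restores the others. Only clearing every store at once gets rid of the identifier, and Web Storage is just one entry in that list.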

Despite all evidence to the contrary, the article attempts to paint Samy as some sort of white hat hacker who’s just looking out for the common man by highlighting security risks. Let’s ignore the fact that his previous claim to fame was writing a virus that crashed a major website. He goes out of his way to point out that he could have sold the evercookie code to advertisers, but didn’t. Instead, he published the technique for free on his website! His argument is that this will provoke the browser vendors to create better privacy tools to combat his cookie.

Well, I’m sorry, but I don’t think Samy deserves a pat on the back for this. This is like saying “Hey, I designed this really neat gun that never runs out of bullets, but don’t worry! I didn’t sell it to our enemies, I just put the plans on my website for anyone to read. Oh, you guys should probably start designing some body armor.”

If you can get past the rhetoric in the article, it’s clear that the risk to your privacy is real, but it’s not from HTML5. It’s from guys like Samy Kamkar.

Note: This was originally posted on my work blog, and I’m re-posting it here for archival purposes.

What Makes HTML5 so Great?

HTML5 Design Principles

When the W3C started working on HTML again in 2007, they posted a set of guiding principles for the new version, emphasizing compatibility, utility and interoperability. I’d like to highlight four of these principles that I think are especially important.

  1. Support existing content
  2. Degrade gracefully
  3. Pave the cowpaths
  4. Priority of Constituencies

In the process, I’ll explain why HTML5 is not just the latest version, but represents a fundamental shift in the philosophy behind HTML.

1. Support Existing Content

“It should be possible to process existing HTML documents as HTML5 and get results that are compatible with the existing expectations of users and authors, based on the behavior of existing browsers.”
W3C HTML Design Principles

Another way to put this is backwards-compatibility, and it almost didn’t happen this way. Without getting into too much detail, in the late 90s, the W3C was concerned that HTML was too forgiving of markup errors, and began shifting focus from HTML to the more draconian XML.

However, the browser vendors and web development community didn’t like the new direction, and formed a new group to evolve HTML. In 2006 the W3C admitted that they were wrong, and that they would work with the new group on HTML5.

As a result, one of the core principles of HTML5 distills nearly a decade of debate and politics into the simple idea that a new standard shouldn’t break existing websites.

2. Degrade Gracefully

“HTML5 should be designed so that Web content can degrade gracefully in older or less capable user agents, even when making use of new elements, attributes, and APIs.”
W3C HTML Design Principles

HTML5 includes several new types of form inputs, including phone numbers, URLs, and search. Modern browsers that support the new types deliver an enhanced experience, while older browsers treat them as plain text inputs.

This sort of progressive enhancement is not possible for every new feature, but codifying this ideal demonstrates the commitment to backwards compatibility.
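To make the fallback concrete, here is a small TypeScript sketch that models how a browser settles on the effective type of an input element. It is a simulation, not browser code: the function name effectiveInputType and the two type lists are assumptions made for illustration, and the legacy list is deliberately abbreviated.

```typescript
// Models how a browser picks the effective type of an <input> element:
// a type it does not recognize is treated as a plain text field.

// The new input types mentioned in this post: phone numbers, URLs, search.
const HTML5_TYPES: string[] = ["tel", "url", "search"];

// Types an older browser would already know (abbreviated, for illustration).
const LEGACY_TYPES: string[] = ["text", "password", "checkbox", "radio", "submit"];

// Returns the requested type if the browser supports it, otherwise "text".
// The type attribute is matched case-insensitively.
function effectiveInputType(requested: string, supported: string[]): string {
  const normalized = requested.toLowerCase();
  return supported.indexOf(normalized) !== -1 ? normalized : "text";
}

const modernBrowser: string[] = LEGACY_TYPES.concat(HTML5_TYPES);
const olderBrowser: string[] = LEGACY_TYPES;

const onModern = effectiveInputType("tel", modernBrowser); // "tel"
const onOlder = effectiveInputType("tel", olderBrowser);   // "text"
```

On a real page the same question is usually answered by feature detection: create an input, set its type to the new value, and read the type back. A browser that does not know the type will report "text", which is exactly the fallback modeled here.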

3. Pave the Cowpaths

“When a practice is already widespread among authors, consider adopting it rather than forbidding it or inventing something new.”
W3C HTML Design Principles

HTML5 allows you to use either HTML style (<br>) or XHTML style (<br />) markup. Both approaches will validate, and which you use is a matter of preference.

It would have been easy enough to say “We’re back to HTML now, so XHTML syntax isn’t valid anymore,” but allowing both styles encourages more people to follow the spec.

4. Priority of Constituencies

“In case of conflict, consider users over authors over implementors over specifiers over theoretical purity. In other words costs or difficulties to the user should be given more weight than costs to authors.”
W3C HTML Design Principles

Finally, we have the priority list. Sounds like common sense, but since the vendors are part of the groups that draft the specs, they had a larger voice than we did. This guideline reminds the authors that they ultimately answer to the users, not the vendors. After all, if nobody follows the spec, it doesn’t matter if any browsers support it.

* Illustration by Dale Stephanos

Note: This was originally posted on my work blog, and I’m re-posting it here for archival purposes.

Big news for web fonts and video today

WebM Video

The codec wars around the HTML5 video element might be settled sooner than you think.

Basically, Google just open-sourced VP8, a video codec. VP8 is being combined with the Vorbis audio codec to create a new video format called WebM.

This wouldn’t be news at all except that a ton of groups have already pledged to support it, including Firefox, Chrome, Opera, and YouTube(!). YouTube has committed to encoding EVERY video on their service to WebM, including the back catalog.

Given that kind of support, I would be shocked if it didn’t get back-ported into Safari, and IE9 has since announced support as well. Whatever happens, this is worth keeping an eye on.
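Until every browser agrees, pages will likely keep listing more than one format and letting the browser take the first one it can play. The TypeScript sketch below models that selection. The file names are made up, and the canPlay predicates are stand-ins for asking a real video element whether it can play a given type. The MIME strings are the commonly used ones for WebM (VP8 plus Vorbis) and H.264 in MP4.

```typescript
// Models how a page can offer several encodings of one video and use the
// first one the browser reports it can play.

interface VideoSource {
  src: string;  // hypothetical file name
  type: string; // MIME type with a codecs parameter
}

// Returns the first source the predicate accepts, or null if none match.
function pickSource(
  candidates: VideoSource[],
  canPlay: (type: string) => boolean
): VideoSource | null {
  for (const candidate of candidates) {
    if (canPlay(candidate.type)) {
      return candidate;
    }
  }
  return null;
}

const videoSources: VideoSource[] = [
  { src: "movie.webm", type: 'video/webm; codecs="vp8, vorbis"' },
  { src: "movie.mp4", type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
];

// Stand-ins for two browsers with different codec support.
function webmCapable(type: string): boolean {
  return type.indexOf("video/webm") === 0;
}

function h264Only(type: string): boolean {
  return type.indexOf("video/mp4") === 0;
}

const forWebmBrowser = pickSource(videoSources, webmCapable); // movie.webm
const forH264Browser = pickSource(videoSources, h264Only);    // movie.mp4
```

Listing WebM first means a browser that supports it never needs the second file, while a browser without support falls through to the format it does understand.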

Typekit and Google

Google released a bunch of open source fonts (including the Droid fonts and Inconsolata, the finest monospace font I’ve ever used). They also released the Google Font API, which is really just Google doing all the @font-face generation and declarations, as well as encoding the fonts for all browsers.
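In practice, using the Font API comes down to linking one stylesheet whose URL names the fonts you want. The TypeScript helper below builds such a URL. The function name googleFontsUrl is hypothetical, and the URL format (families separated by a pipe, spaces written as plus signs) is the one I understand the API to use, so check it against Google's documentation before relying on it.

```typescript
// Builds a stylesheet URL for the Google Font API from a list of family names.
// Assumed format: families separated by "|" and spaces replaced with "+".

function googleFontsUrl(families: string[]): string {
  const encoded: string[] = [];
  for (const family of families) {
    encoded.push(family.trim().replace(/\s+/g, "+"));
  }
  return "https://fonts.googleapis.com/css?family=" + encoded.join("|");
}

const fontUrl = googleFontsUrl(["Droid Sans", "Inconsolata"]);
// "https://fonts.googleapis.com/css?family=Droid+Sans|Inconsolata"
```

The stylesheet behind that URL contains the @font-face rules, so the page only has to link it and then name the font in its own CSS.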

Then Typekit announced that they were open-sourcing their JavaScript font-loading API, which fires events at various points in the font-loading process, so you can make a more consistent cross-browser experience. That library is now an open-source collaboration with Google, the WebFont Loader, and can be used through Google’s AJAX Libraries API.

Pretty cool that Typekit would open their doors like this, and it speaks to their (and Google’s) commitment to making fonts easy to use for everyone, not just paying members.
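For a sense of what those font-loading events look like, here is a TypeScript sketch of a configuration object for the WebFont Loader. The callback names (loading, fontactive, fontinactive, active, inactive) are the ones I know the loader to use. The FontLoaderConfig interface and the buildLoaderConfig helper are my own scaffolding, declared locally so the sketch compiles without the library present.

```typescript
// Sketch of a WebFont Loader configuration that records each loading event.
// The interface covers only the options used here and is declared locally;
// it is not the library's own type definition.

interface FontLoaderConfig {
  google: { families: string[] };
  loading: () => void;
  fontactive: (family: string, variation: string) => void;
  fontinactive: (family: string, variation: string) => void;
  active: () => void;
  inactive: () => void;
}

// Builds a config whose callbacks append a line to the given log array.
function buildLoaderConfig(families: string[], log: string[]): FontLoaderConfig {
  return {
    google: { families: families },
    loading: function () { log.push("loading"); },
    fontactive: function (family) { log.push("ready: " + family); },
    fontinactive: function (family) { log.push("failed: " + family); },
    active: function () { log.push("active"); },
    inactive: function () { log.push("inactive"); },
  };
}

const fontEvents: string[] = [];
const loaderConfig = buildLoaderConfig(["Inconsolata"], fontEvents);

// On a page that has loaded the WebFont Loader script, this object would be
// handed to the library: WebFont.load(loaderConfig);
```

The point of the events is control over the awkward moment before a font arrives. A page can hide or restyle text while fonts are loading and switch to its fallback styling if a font fails, instead of leaving each browser to its default behavior.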

Note: This was originally posted on my work blog, and I’m re-posting it here for archival purposes.

XHTML 2 is Dead

Wow, I didn’t see this coming. Zeldman reports that the W3C is not going to renew the XHTML 2 working group’s charter this year. That effectively kills XHTML 2 in favor of devoting the resources to the HTML 5 working group.

This makes sense in that HTML 5 is already gaining traction, and we’ve seen lots of talk in the web community about switching back to HTML, but I never expected to see development halted like this.

“W3C hopes to accelerate the progress of HTML 5 and clarify W3C’s position regarding the future of HTML.”

Nice to know that the W3C is clearly picking a side and throwing their support behind one product, though.

Note: This was originally posted on my work blog, and I’m re-posting it here for archival purposes.