[personal profile] blufive

[As some of you may have noticed, I'm a bit of a web browser weenie. Some of my bubbling-under posts are about web browser statistics. So, as a prelude, here's an updated version of something I wrote for a different audience some time ago]

Many common web statistics analysis packages are lousy at identifying browsers beyond a few market leaders. In the case of openly-published large-scale web browser statistics, browser detection/identification is almost uniformly appalling. To be fair, it is particularly difficult to do this job well. The only remotely accurate way to identify a browser from the server side is to analyse the HTTP "User-Agent" header that the browser sends when it requests a file. Unfortunately, this isn't as simple as it may sound. Here are some of the problems.

Manufacturer Controlled Spoofing

In the early days of the web, when support for things as basic as images varied wildly from one browser to another, it became common for web servers to send different content to different browsers. As new browsers came on the market, they would sometimes send user-agent strings resembling those of a similarly capable competitor – usually a version of Netscape, the market leader at the time – so that web servers would allow the browser to access more sophisticated content.

This practice was particularly widespread at the height of the "browser wars". As a result, to this day, most browsers have a user-agent string that begins "Mozilla/x.x" – "Mozilla" being the internal product codename of Netscape's Navigator web browser. To add to the confusion, the name "Mozilla" has now been passed on to the open-source Mozilla project, which, in turn, uses a "Mozilla/5.0" user-agent, leading many browser detection routines to categorise it as the non-existent "Netscape 5".

Nowadays, of course, Netscape is no longer the market leader, so browsers imitate Internet Explorer, which in turn still mimics Netscape 4.0. Confused yet? You're not the only one.
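To see why the "Mozilla/x.x" prefix is useless on its own, here's a small Python sketch. The sample strings are my own illustrative picks, typical of the period rather than taken from any particular log, and `naive_browser` is a hypothetical stand-in for the sort of crude detection described above, not code from a real stats package.

```python
import re

# Illustrative user-agent strings (assumed samples, typical of 2004).
SAMPLES = {
    "Netscape 4.x": "Mozilla/4.79 [en] (Windows NT 5.0; U)",
    "Internet Explorer 6": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla 1.7": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616",
}


def naive_browser(ua):
    """Hypothetical crude detector: trusts only the leading Mozilla/x token."""
    match = re.match(r"Mozilla/(\d+)", ua)
    if match:
        return "Netscape " + match.group(1)
    return "Unknown"


for real_name, ua in SAMPLES.items():
    print(real_name, "->", naive_browser(ua))
```

All three come back as some flavour of Netscape, including the non-existent "Netscape 5" for the Mozilla build.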

User Controlled Spoofing

In the last few years, a combination of the overwhelming market dominance of Internet Explorer, poor support for standards in many older browsers, and security concerns for e-commerce transactions has led many websites to allow only one or two browsers to access their sites.

Many users are unhappy about being prevented from accessing sites of interest on the basis of the browser they are using, so some minority browsers have started to offer the user control over the user agent string that is sent. In most cases, when spoofing, these browsers provide enough information to allow correct identification by those in the know, but many crude browser detection methods will jump to the "wrong" conclusion, and let the user in.

The problem for anyone wanting to analyse browser statistics is that many stats packages are just as vulnerable to this deception as web servers.

Variable User Agent Strings

In addition to deliberate spoofing, most browsers have many different user agent strings, indicating differences in language support, release/service pack level, operating system, ISP customisation, and so on. To get some idea of what's going on, have a look at these May 2004 stats for a server at a US university [warning: 1.6MB HTML file]. The bulk of that file is a list of all the different user agent strings collected by the server over the month: over 17,000 of them. A vast number of those (I'd guess at about 7-8,000) appear to be different versions of Internet Explorer 6, and other browsers display similar variety, not to mention all the search spiders, download managers, etcetera.
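A stats package has to collapse all those variants into something countable. Here's a minimal Python sketch of the idea, assuming a handful of made-up log entries and a hypothetical `family` helper. It deliberately ignores spoofing, so treat it as an illustration of the grouping step only.

```python
import re
from collections import Counter

# Assumed sample log entries: three IE 6 variants plus one Mozilla build.
LOGGED = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Q312461)",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113",
]


def family(ua):
    """Reduce a user-agent string to a coarse family label."""
    msie = re.search(r"MSIE (\d+)", ua)
    if msie:
        return "MSIE " + msie.group(1)
    if "Gecko/" in ua:
        return "Gecko"
    return "Other"


counts = Counter(family(ua) for ua in LOGGED)
print(counts)
```

Four distinct strings become two families, with the three IE 6 variants counted together.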

Opera

Recent releases of Opera have built-in user-controlled spoofing, and are set to masquerade as Internet Explorer 5 or 6 by default (depending on the version of Opera). The mimicry is deliberately imperfect, so that clued-in people can still identify Opera, but many statistics packages are fooled by the disguise. Evidence suggests that most Opera users never bother to change this default, so many sources probably under-report usage of this browser. Many sources that do detect Opera do not distinguish individual versions.
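Being "clued-in" here mostly comes down to the order of the checks: look for the Opera token before believing the "MSIE" one. A rough Python sketch, assuming a sample string in the style Opera 7 sends when imitating IE6 (the `identify` function and the sample strings are mine, for illustration only):

```python
import re


def identify(ua):
    """Check for the Opera token first, then fall back to MSIE."""
    opera = re.search(r"Opera[ /](\d+\.\d+)", ua)
    if opera:
        return "Opera " + opera.group(1)
    msie = re.search(r"MSIE (\d+\.\d+)", ua)
    if msie:
        return "Internet Explorer " + msie.group(1)
    return "Unknown"


# Assumed samples: Opera in its IE6 disguise, and the genuine article.
SPOOFING_OPERA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.23  [en]"
REAL_IE = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(identify(SPOOFING_OPERA))
print(identify(REAL_IE))
```

Swap the two checks around and the first string is counted as Internet Explorer 6, which is exactly the under-reporting problem. Capturing the version number also lets you tell individual Opera releases apart.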

Safari

Apple's Safari web browser has a manufacturer-controlled spoof, which is designed to be mistaken for a Mozilla/Gecko-based browser. Like Opera, it includes enough information to allow easy identification by those in the know. Many public web stats sources are falling for the spoof, and including it with other Mozilla/Gecko based browsers, though the situation is improving.
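The Safari spoof hinges on the phrase "like Gecko" in its user-agent string. Here's a minimal Python sketch of a naive check that falls for it and a slightly more careful one that doesn't. Both sample strings and both function names are assumptions for illustration.

```python
# Assumed samples in the style of Safari 1.2 and a Mozilla 1.7 build.
SAFARI = ("Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) "
          "AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8")
MOZILLA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616"


def naive_is_gecko(ua):
    """Crude check: any mention of 'Gecko' counts."""
    return "Gecko" in ua


def engine(ua):
    """Look for AppleWebKit first; real Gecko builds carry 'Gecko/<date>'."""
    if "AppleWebKit/" in ua:
        return "Safari/WebKit"
    if "Gecko/" in ua:
        return "Mozilla/Gecko"
    return "Other"


print(naive_is_gecko(SAFARI), engine(SAFARI))
print(naive_is_gecko(MOZILLA), engine(MOZILLA))
```

The naive check says yes to both; the second version separates them.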

Mozilla/Gecko

Few sources correctly differentiate all Mozilla/Gecko based browsers. Again, this is a difficult task, as mozilla.org releases dozens of builds every day, all with different user-agent strings, as part of their test process. There are also several commercial and non-commercial entities (such as Netscape, IBM, Sun Microsystems, Debian and others) releasing browsers based on Mozilla code. It's also possible for users to alter the Mozilla user-agent string.
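Telling the Gecko family apart mostly means reading what comes after the "Gecko/<date>" token. A rough Python sketch, assuming the common "rv:x.y) Gecko/YYYYMMDD Product/version" layout. The regular expression, the `gecko_product` helper and the sample strings are mine, and labelling a string with no trailing product as plain "Mozilla" is an assumption, not a rule.

```python
import re

# rv: number, build date, and an optional trailing "Product/version" token.
GECKO_RE = re.compile(r"rv:([\d.]+)[^)]*\) Gecko/(\d{8})(?: (\w+)/([\w.]+))?")


def gecko_product(ua):
    """Return (product, version, build_date), or None if not Gecko-based."""
    match = GECKO_RE.search(ua)
    if not match:
        return None
    rv, build_date, name, version = match.groups()
    if name is None:
        # No trailing product token: assume the plain Mozilla suite.
        return ("Mozilla", rv, build_date)
    return (name, version, build_date)


# Assumed samples in the style of the period.
SUITE = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616"
FIREFOX = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040206 Firefox/0.8"
NETSCAPE = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)"

for ua in (SUITE, FIREFOX, NETSCAPE):
    print(gecko_product(ua))
```

The build date is kept because the daily test builds differ only in that field, so a stats package can choose whether to lump them together. Since users can alter the string, none of this is guaranteed to match in the wild.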

AOL/Windows and Other Internet Explorer-Based Browsers

Many stats packages do not accurately distinguish web client software that embeds parts of Internet Explorer from Internet Explorer itself. The most common example is the browser software that AOL provides to its customers, but others take the same approach, including MSN Explorer, older versions of the CompuServe client, and NeoPlanet. Most of the time, these browsers will behave much like Internet Explorer, but there are exceptions.
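These clients usually add their own token alongside "MSIE" inside the parentheses. Here's a minimal Python sketch that looks for two of them. I've only included the AOL and MSN tokens; the list is an assumption and far from exhaustive, and `ie_based_client` is a hypothetical helper.

```python
import re

# Tokens that mark an IE-embedding client (assumed, not exhaustive).
WRAPPER_RE = re.compile(r"\b(AOL|MSN) (\d+(?:\.\d+)?)")


def ie_based_client(ua):
    """Name the embedding client if one is advertised, else plain IE."""
    if "MSIE" not in ua:
        return None
    wrapper = WRAPPER_RE.search(ua)
    if wrapper:
        return wrapper.group(1) + " " + wrapper.group(2)
    return "Internet Explorer"


# Assumed samples in the style of the period.
AOL_UA = "Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)"
MSN_UA = "Mozilla/4.0 (compatible; MSIE 6.0; MSN 2.5; Windows 98)"
PLAIN_IE = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

for ua in (AOL_UA, MSN_UA, PLAIN_IE):
    print(ie_based_client(ua))
```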

Platform

Few stats packages make a detailed distinction between user platforms, so we can't distinguish Internet Explorer (Mac) from Internet Explorer (Windows), or similar. This can be important, as (for example) Internet Explorer for Mac OS is not simply a port of the Windows code – it has a completely different rendering engine, which is much better at handling some CSS and standards-compliant code than its Windows counterpart.
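Splitting by platform is usually a matter of reading another token from the same string. A rough Python sketch, assuming a few common platform markers (the marker list, the `platform_of` helper and the sample strings are illustrative assumptions):

```python
def platform_of(ua):
    """Coarse platform label from common user-agent markers."""
    # Check Mac markers first so nothing Mac-related is misread as Windows.
    if "Mac_PowerPC" in ua or "Macintosh" in ua or "Mac OS X" in ua:
        return "Mac"
    if "Win" in ua:
        return "Windows"
    if "Linux" in ua or "X11" in ua:
        return "Unix/Linux"
    return "Unknown"


# Assumed samples: IE 5 for Mac and IE 6 for Windows.
IE_MAC = "Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)"
IE_WIN = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print("IE on", platform_of(IE_MAC))
print("IE on", platform_of(IE_WIN))
```

Reporting the browser and platform as a pair keeps the two Internet Explorers, with their different rendering engines, in separate rows.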

Date: 2004-06-19 10:22 (UTC)
From: [identity profile] eggwhite.livejournal.com
A slight update for you: Opera has been identifying as Opera as standard since version 7ish... It's now on 7.51. Although you still have the opportunity to identify as about 5 other browsers, it always still includes "opera" in the user agent string (in versions from 5 onwards, at least) so that clever browser detection can pick it up as Opera instead of what it's IDing as... You've always had to hack .ini files to completely hide its Opera-ness.

Alas it's become a little buggy in this release, so even though they've fixed a lot of the user-interface problems I'm still finding myself switching to Firefox a little more often...

Date: 2004-06-20 00:50 (UTC)
From: [identity profile] blufive.livejournal.com
Hmm. Could have sworn I wrote that, but upon re-reading, I find that while I referred to it in a subtle and oblique manner, I didn't actually state it directly. Arse. Thanks for the copy-editing, I'll go back and fix it.

I kinda had an allergic reaction to the whole Opera 7.x UI design, so I've not really used it much. Mostly 'cause I'm a fan of the lizard, but you knew that...

Date: 2004-06-20 00:53 (UTC)
From: [identity profile] blufive.livejournal.com
[wakes up]

Oh, hang on. Do you mean they identify as Opera, with no disguise, by default nowadays? I'm sure the 7.0x series were set to IE6 by default, but I've no idea about later versions...

Date: 2004-06-20 11:56 (UTC)
From: [identity profile] eggwhite.livejournal.com
Well, I could be wrong as I've not done a fresh install, but I know that around the 7.xx mark somewhere they said they were changing it, and I've not had to tweak it any time lately to identify as itself.

Date: 2004-06-20 15:55 (UTC)
From: [identity profile] blufive.livejournal.com
I know that Opera 6 identified as IE5, but Opera 7 changed the available spoof to IE6.

I also know that Opera 7 stores its profile/preferences information outside the Opera directory, so it's probably remembering the preference as you upgrade from one 7.x version to the next.

So I think the only way I'm going to find out for certain is if I uninstall O7 here, blow away my profile (no loss, I hardly ever use it) and reinstall. I'll file that under "things to do" as I'd been meaning to upgrade to something newer than 7.01 anyhow.

Date: 2004-06-21 02:03 (UTC)
From: [identity profile] eggwhite.livejournal.com
Hmmm... you could be right - I've just visited the forums and it looks like the "should we change it" discussion is still going on. Last time I looked (probably nearly a year ago) a decision had been made - looks like they changed their minds...
