Archive for April, 2006

The BBC’s heavenly cloud of digital content

The Economist has a short piece about the BBC’s future on the Internet, following the Beeb’s announcement of plans to unlock its archives online, opening up a treasure trove of programmes going back to the first half of the last century.

Its most ambitious idea is to go ‘on-demand’, making the million programmes it has produced since 1937 available to viewers online, mostly for free. Soon it plans to introduce a new service, BBC iPlayer, to allow people to catch up on programmes they missed on its main channels.

Potential downsides: some private-sector players warn that this digital flood of Biblical proportions could harm their own businesses:

Opening up the BBC’s archive, mainly for free, could amplify the corporation’s market-distorting effect. Lots of popular past programmes will suddenly be available alongside its current shows. People have a limited time to goggle, and if they spend it watching old BBC favourites such as ‘Smiley’s People’, they will skip something else, which might include pay-TV or DVDs or TV financed by advertising.

Google Sitemaps

Continuing our run of Google-related posts: if either of the following statements is true, you may be interested in taking advantage of Google Sitemaps:

  • You want Google to crawl more of your pages.
  • You want to be able to tell Google when content on your site changes.

From the Sitemaps FAQ:

    Google Sitemaps is an experiment in web crawling. Using Sitemaps to inform and direct our crawlers, we hope to expand our coverage of the web and speed up the discovery and addition of pages to our index. By placing a Sitemap-formatted file on your web server, you enable our crawlers to find out what pages are present and which have recently changed, and to crawl your site accordingly.
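A Sitemap is essentially an XML file listing your URLs, each optionally tagged with a last-modified date so crawlers can see what has changed. The minimal Python sketch below generates one. It assumes the later sitemaps.org 0.9 schema (namespace `http://www.sitemaps.org/schemas/sitemap/0.9`), which succeeded the format Google originally published, so check the current protocol documentation before relying on it. The `build_sitemap` helper and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Assumption: the sitemaps.org 0.9 namespace, not Google's original 2006 schema URL.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize this namespace as the default (unprefixed) one.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(pages):
    """Return Sitemap XML (as a string) for (url, last_modified_date) pairs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # lastmod is how you tell the crawler a page has changed (YYYY-MM-DD)
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages on a hypothetical site
    pages = [
        ("http://www.example.com/", date(2006, 4, 20)),
        ("http://www.example.com/about.html", date(2006, 3, 1)),
    ]
    print('<?xml version="1.0" encoding="UTF-8"?>')
    print(build_sitemap(pages))
```

You would then upload the resulting file to your web server and tell Google where to find it.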

    Interpreting Your Webalizer Statistics

    Many of you will have hosting packages that come with the Webalizer statistics package as a standard feature. Those of you who track your site’s performance carefully will already know this stuff, and can skip this post.

    But for those who may need to get reacquainted with their site stats, I thought a brief refresher on the terminology used in your Webalizer reports might be in order.

    This extract comes courtesy of the Webalizer Quick Help page.

    Hits represent the total number of requests made to the server during the given time period (month, day, hour, etc.).

    Files represent the total number of hits (requests) that actually resulted in something being sent back to the user. Not all hits send data; examples include 404 Not Found requests and requests for pages that are already in the browser’s cache.

    Tip: By looking at the difference between hits and files, you can get a rough indication of repeat visitors, as the greater the difference between the two, the more people are requesting pages they already have cached (have viewed already).
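To make the hits/files distinction concrete, here is a minimal Python sketch (not Webalizer’s own code) that counts both from already-parsed log records. It assumes each record is a hypothetical `(status, bytes_sent)` pair and approximates “something was sent back” as a 2xx status with a non-zero byte count; Webalizer’s exact rules may differ. A 304 Not Modified reply, typical of a page the browser already has cached, counts as a hit but not a file.

```python
from typing import Iterable, Tuple

# Hypothetical record shape: (HTTP status code, bytes sent in the response body)
Record = Tuple[int, int]


def hits_and_files(records: Iterable[Record]) -> Tuple[int, int]:
    """Count hits (all requests) and files (requests that sent data back).

    Assumption: a request "sent something back" if it got a 2xx status
    and a non-zero body size. Webalizer's own rules may differ.
    """
    hits = files = 0
    for status, size in records:
        hits += 1
        if 200 <= status < 300 and size > 0:
            files += 1
    return hits, files


if __name__ == "__main__":
    sample = [
        (200, 5120),  # page sent in full
        (304, 0),     # Not Modified: the browser already had it cached
        (404, 0),     # Not Found
        (200, 830),   # image sent in full
    ]
    hits, files = hits_and_files(sample)
    print(hits, files, hits - files)  # 4 2 2
```

The gap between the two numbers is the rough repeat-visitor signal the tip describes, though 404s widen it too.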

    Sites is the number of unique IP addresses/hostnames that made requests to the server. Care should be taken when using this metric for anything other than that. Many users can appear to come from a single site, and they can also appear to come from many IP addresses, so it should be used simply as a rough gauge of the number of visitors to your server.
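A small Python sketch of why “sites” is only a rough gauge: counting distinct addresses can both under- and over-count people. The `count_sites` helper and the addresses (taken from the reserved documentation ranges) are hypothetical.

```python
from typing import Iterable


def count_sites(hosts: Iterable[str]) -> int:
    """Count 'sites': the distinct remote IP addresses/hostnames in a log."""
    return len(set(hosts))


if __name__ == "__main__":
    # Hypothetical requests: three people behind one office proxy share
    # 192.0.2.10, while one dial-up user reconnects and gets a new address.
    requests = [
        "192.0.2.10", "192.0.2.10", "192.0.2.10",  # three people, one site
        "198.51.100.7", "198.51.100.42",           # one person, two sites
    ]
    print(count_sites(requests))  # 3 sites, although 4 people visited
```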

    Visits occur when some remote site makes a request for a page on your server for the first time. As long as the same site keeps making requests within a given timeout period, they will all be considered part of the same Visit. If the site makes a request to your server, and the length of time since the last request is greater than the specified timeout period (default is 30 minutes), a new Visit is started and counted, and the sequence repeats. Since only pages will trigger a visit, remote sites that link to graphics and other non-page URLs will not be counted in the visit totals, reducing the number of false visits.

    Pages are those URLs that would be considered the actual page being requested, and not all of the individual items that make it up (such as graphics and audio clips). Some people call this metric page views or page impressions; by default it covers any URL with an extension of .htm, .html or .cgi.
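The page and visit rules above can be sketched in Python as follows. This is not Webalizer’s code. It assumes log records are already parsed into time-sorted `(site, timestamp, url)` tuples, treats a URL as a page purely by its extension, and reads the description as saying that any request keeps an open visit alive while only a page can start a new one. The helper names, hosts and URLs are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Page extensions from the Quick Help text above.
PAGE_EXTENSIONS = (".htm", ".html", ".cgi")
# Default visit timeout quoted above.
VISIT_TIMEOUT = timedelta(minutes=30)


def is_page(url: str) -> bool:
    """True if the URL path (ignoring any query string) has a page extension."""
    path = url.split("?", 1)[0].lower()
    return path.endswith(PAGE_EXTENSIONS)


def count_visits(requests: Iterable[Tuple[str, datetime, str]]) -> int:
    """Count visits from time-sorted (site, timestamp, url) records."""
    last_seen: Dict[str, datetime] = {}
    visits = 0
    for site, when, url in requests:
        previous = last_seen.get(site)
        if previous is not None and when - previous <= VISIT_TIMEOUT:
            last_seen[site] = when  # still inside the current visit
        elif is_page(url):
            visits += 1             # first request or timeout expired: new visit
            last_seen[site] = when
        # a non-page request with no open visit is ignored
    return visits


if __name__ == "__main__":
    t0 = datetime(2006, 4, 20, 9, 0)
    log = [
        ("192.0.2.10", t0, "/index.html"),                          # visit 1 starts
        ("192.0.2.10", t0 + timedelta(minutes=1), "/logo.gif"),     # same visit
        ("198.51.100.7", t0 + timedelta(minutes=5), "/photo.jpg"),  # hotlinked image: no visit
        ("192.0.2.10", t0 + timedelta(minutes=20), "/about.html"),  # same visit
        ("192.0.2.10", t0 + timedelta(hours=2), "/index.html"),     # >30 min gap: visit 2
    ]
    pages = sum(is_page(url) for _, _, url in log)
    print(pages, count_visits(log))  # 3 2
```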

    A KByte (KB) is 1024 bytes (1 Kilobyte). It is used to show the amount of data that was transferred between the server and the remote machine, based on the data found in the server log.

    Common Definitions

    A Site is a remote machine that makes requests to your server, and is based on the remote machine’s IP address/hostname.

    URL - Uniform Resource Locator. All requests made to a web server need to request something. A URL is that something, and represents an object somewhere on your server that is accessible to the remote user, or results in an error (i.e. 404 - Not Found). URLs can be of any type (HTML, Audio, Graphics, etc.).

    Referrers are those URLs that lead a user to your site or cause the browser to request something from your server. The vast majority of requests are made from your own URLs, since most HTML pages contain links to other objects such as graphics files. If one of your HTML pages contains links to 10 graphic images, then each request for the HTML page will produce 10 more hits with the referrer specified as the URL of your own HTML page.
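As a rough illustration of why your own URLs dominate the referrer list, this Python sketch splits referrers into internal and external ones by comparing hostnames. The `split_referrers` helper, the example.com site, and treating the `-` placeholder (the usual way access logs record a missing referrer) as “no referrer” are assumptions for illustration.

```python
from typing import Iterable, Tuple
from urllib.parse import urlsplit


def split_referrers(referrers: Iterable[str], own_host: str) -> Tuple[int, int]:
    """Return (internal, external) counts of referrer URLs.

    Referrers without a hostname (such as the "-" placeholder) are skipped.
    """
    internal = external = 0
    for ref in referrers:
        host = urlsplit(ref).hostname  # lower-cased by urllib, or None
        if host is None:
            continue
        if host == own_host.lower():
            internal += 1
        else:
            external += 1
    return internal, external


if __name__ == "__main__":
    # Hypothetical: a visitor arrives from a search engine, then the page's
    # 10 images are each requested with the page itself as the referrer.
    page = "http://www.example.com/index.html"
    log = ["http://search.example.org/?q=widgets"] + [page] * 10 + ["-"]
    print(split_referrers(log, "www.example.com"))  # (10, 1)
```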

    Search Strings are obtained from examining the referrer string and looking for known patterns from various search engines. The search engines and the patterns to look for can be specified by the user within a configuration file. The default will catch most of the major ones.
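A minimal Python sketch of the idea: match the referrer’s hostname against a table of known search engines and pull the search terms out of that engine’s query parameter. The table below is a hypothetical, deliberately tiny stand-in for the user-configurable list described above; the parameter names are illustrative assumptions.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical search-engine table: hostname suffix -> query parameter
# holding the search terms.
SEARCH_ENGINES: Dict[str, str] = {
    "google.com": "q",
    "yahoo.com": "p",
}


def search_string(referrer: str) -> Optional[str]:
    """Return the search terms in a referrer URL, or None if none are found."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    for suffix, param in SEARCH_ENGINES.items():
        if host == suffix or host.endswith("." + suffix):
            values = parse_qs(parts.query).get(param)
            return values[0] if values else None
    return None


if __name__ == "__main__":
    print(search_string("http://www.google.com/search?q=webalizer+visits&hl=en"))
    # webalizer visits
    print(search_string("http://www.example.com/links.html"))  # None
```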

    Note: Only available if that information is contained in the server logs.

    User Agents are a fancy name for browsers. Netscape, Opera, Konqueror, etc. are all User Agents, and each reports itself in a unique way to your server. Keep in mind, however, that many browsers allow the user to change its reported name, so you might see some obvious fake names in the listing.

    Note: Only available if that information is contained in the server logs.

    Entry/Exit pages are those pages that were the first requested in a visit (Entry), and the last requested (Exit). These pages are calculated using the Visits logic above. When a visit is first triggered, the requested page is counted as an Entry page, and whatever URL was requested last is counted as an Exit page.
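Here is one way to sketch the entry/exit tally in Python, again not Webalizer’s own code. It assumes the input is already filtered to time-sorted page requests as `(site, timestamp, url)` tuples, and splits visits on the 30-minute default timeout; the `entry_exit_pages` helper and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

VISIT_TIMEOUT = timedelta(minutes=30)


def entry_exit_pages(page_requests: Iterable[Tuple[str, datetime, str]]):
    """Tally entry and exit pages from time-sorted (site, timestamp, url) page requests."""
    visits: List[List[str]] = []     # URLs of each visit, in request order
    open_visit: Dict[str, int] = {}  # site -> index of its current visit
    last_seen: Dict[str, datetime] = {}
    for site, when, url in page_requests:
        previous = last_seen.get(site)
        if previous is None or when - previous > VISIT_TIMEOUT:
            open_visit[site] = len(visits)  # start a new visit
            visits.append([])
        visits[open_visit[site]].append(url)
        last_seen[site] = when
    entries = Counter(v[0] for v in visits)  # first page of each visit
    exits = Counter(v[-1] for v in visits)   # last page of each visit
    return entries, exits


if __name__ == "__main__":
    t0 = datetime(2006, 4, 20, 9, 0)
    log = [
        ("192.0.2.10", t0, "/index.html"),
        ("198.51.100.7", t0 + timedelta(minutes=2), "/blog.html"),
        ("192.0.2.10", t0 + timedelta(minutes=5), "/about.html"),
        ("192.0.2.10", t0 + timedelta(hours=3), "/blog.html"),  # new visit
    ]
    entries, exits = entry_exit_pages(log)
    print(entries["/blog.html"], exits["/about.html"])  # 2 1
```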

    Countries are determined based on the top-level domain of the requesting site. This is somewhat questionable, however, as domains are no longer as strictly enforced as they were in the past. A .COM domain may reside in the US or somewhere else. An .IL domain may actually be in Israel, but it may also be located in the US or elsewhere. The most common domains seen are .COM (US Commercial), .NET (Network), .ORG (Non-profit Organization) and .EDU (Educational). A large percentage may also be shown as Unresolved/Unknown, because many dialup and other customer access points do not resolve to a name and are left as an IP address.
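The top-level-domain lookup, and why bare IP addresses end up as Unresolved/Unknown, can be sketched in a few lines of Python. The `country_for` helper and its tiny TLD table are hypothetical; a real tool ships a complete list.

```python
import ipaddress
from typing import Dict

# Hypothetical, deliberately tiny TLD table.
TLD_NAMES: Dict[str, str] = {
    "com": "US Commercial",
    "net": "Network",
    "org": "Non-profit Organization",
    "edu": "Educational",
    "il": "Israel",
    "uk": "United Kingdom",
}


def country_for(site: str) -> str:
    """Guess a 'country' from a hostname's top-level domain."""
    try:
        ipaddress.ip_address(site)
        return "Unresolved/Unknown"  # bare IP address: no name to inspect
    except ValueError:
        pass  # not an IP address, so treat it as a hostname
    tld = site.rstrip(".").rsplit(".", 1)[-1].lower()
    return TLD_NAMES.get(tld, "Unresolved/Unknown")


if __name__ == "__main__":
    for site in ("www.example.co.uk", "host.example.com", "192.0.2.10"):
        print(site, "->", country_for(site))
```

As the text warns, a .com answer says nothing reliable about where the machine actually is.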

    Response Codes are defined as part of the HTTP/1.1 protocol (RFC 2068; see section 10). These codes are generated by the web server and indicate the completion status of each request made to it.
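In HTTP/1.1 the first digit of a response code gives its class (1xx Informational, 2xx Success, 3xx Redirection, 4xx Client Error, 5xx Server Error), so a long list of codes can be summarized by class. A minimal Python sketch; the `summarize` helper and the sample codes are hypothetical.

```python
from collections import Counter
from typing import Iterable

# The five status-code classes defined by HTTP/1.1.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def summarize(status_codes: Iterable[int]) -> Counter:
    """Count responses per class, using the first digit of each code."""
    return Counter(CLASSES.get(code // 100, "Unknown") for code in status_codes)


if __name__ == "__main__":
    print(summarize([200, 200, 304, 404, 500, 200]))
```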

    The Google Bigdaddy Blues

    The Google Bigdaddy Blues (In G minor)

    Well, woke up this mornin’
    Page Rank was gone
    Yeah, woke up this mornin’
    My pretty lil’ Page Rank was gone
    Well if you see my baby Page Rank
    O’ please, won’t you drive it on home.

    Trad. arr: Blind Lemon Brin.

    (With humble apologies to any hardcore blues guys and gals reading this. :) )

    Right. I’ve had countless discussions, with colleagues and strangers in pubs alike, about Google’s apparent determination to become a ubiquitous presence in our online lives. They (Google) sure have the avid attention of just about every web entrepreneur and code hack I know.

    Google enjoys this privileged position largely thanks to:

  • its role as gatekeeper for a huge share of the immense volume of search activity carried out daily on the web;
  • the general dependence on Google of a great mass of site owners (yes, like you and me) to attract qualified leads, enquiries and traffic to our e-commerce and advertising-revenue-dependent web ventures;
  • naturally, the AdWords/AdSense advertising cash cow; and
  • the real elephant in the room: the immense value of the data Google can mine and monetize from its collection of years and years of search records, together with any associated data it has since been able to capture, match and index. (Has any single advertiser ever held so much information/power, or been so well placed to capitalize on it into the future?)

    All of these points are worth exploring, but it was the second that became most topical recently. If you didn’t already know, search engine optimization is a big industry and a deadly serious business to a great many web site owners and entrepreneurs, as you’ll see.

    We pick up the story on January 4, 2006, here on Matt Cutts’ blog. ‘Bigdaddy’ is revealed to be a new Google datacentre geared for a major update of the key Google search infrastructure and algorithm. In fact, Bigdaddy was/is intended to become the default source of Google web results. Witness Matt’s responses to these questions:

    Q: Do you expect this to become the default source of web results? How long will it take?
    A: Yes, I do expect Bigdaddy to become the default source of web results. The length of the transition will depend on lots of different issues. Right now I’m guessing 1-2 months, but if I find out more specifics I’ll let you know.

    Q: What’s new and different in Bigdaddy?
    A: It has some new infrastructure, not just better algorithms or different data. Most of the changes are under the hood, enough so that an average user might not even notice any difference in this iteration.

    OK. So Google proceeds to roll out Bigdaddy. Then the fun begins. It seems that either some of those under-the-hood improvements don’t play nice with quite a few of the search optimization techniques and principles applied assiduously over recent years to an ungodly number of web sites, or there’s a major shift underway in how pages will be ranked in the near future. Actually, there most likely is, but that’s another story for another day.

    Fast forward to this thread over at webmasterworld.com. That gut-wrenching wailing and teeth-gnashing you can hear is coming from a great many unhappy web site owners who have maybe placed too many eggs in the Google basket. Take these comments from that thread; in some, the desperation and angst are quite palpable:

    I’ve been frantically going over all the things i can do to explain to my boss what has been going on with google. Unfortunately we have not only lost several thousand pages from googles listing but also substantial numbers from MSN and Yahoo.

    Two weeks back we had 18000 total pages listed over google, yahoo, msn and alltheweb. Last week we dropped to just over 6000. Checking today we are at 5700.

    and

    Phah! 500,000 down to 44,300 right here! All turned supplemental after two years of good rankings. No I ain’t spam, no I ain’t scraper, no I ain’t MFA [Ed: means Made For Adsense] and no I ain’t an espotting affiliate …. even those are still ranking better than me! ….

    … unless Google has raised their unique content filter to “must have at least 90% unique content” I don’t have an explanation.

    Ouch. The take-home lesson is once again to build for your audience, not for the search engines. Which is not to say don’t optimize your sites for good rankings, but make sure you are catering properly for your human visitors first, not the bots and spiders.

    I know, that is much the harder type of optimization to do, but the alternative, it seems, is to risk being left high and dry when Google decide, as they will and probably must, to change direction every once in a while.

    OK Blind Boy, bring it on home. One last time, with feeling …

    Well if you see my baby Page Rank
    O’ please, won’t you drive it on home.

    Alfred sez: “Automate your business”

    What Alfred North Whitehead really said was:

    Civilization advances by extending the number of important operations which we can perform without thinking about them.

    Nuff said. We agree! Thanks Alfred. :)

    Quick Links: WebmasterWorld.com | Ajax

    For geeks and entrepreneurial types: like many of you, I’m guilty of taking for granted what a treasure trove WebmasterWorld.com really is. For those of you not familiar with the site, here’s their mission statement:

    This site is a service to the web site administrator community. We are here for WebmasterWorld members to discuss the process of doing business on the internet.

    For a taste, I’ve dropped in their RSS feed link at the end of this post.

    Modern Design inspiration in a blog

    Modern Design Blog: http://modern.weblogswork.com/

    One of the great things about being a designer in the age of the World Wide Web is that you need never be stuck for inspiration.

    And often the best sources of inspiration can be found in work outside your own design field. Modern Design Blog looks like it might be a future favourite around here.

    Care to share your favourite places or things that help you bust through bouts of ‘designer’s-block’?

    Blog helps GPS maker land big-bang publicity

    How’s this for PR that money just can’t buy?

    On its blog, GPS equipment maker Garmin has this testimonial from a happy customer singing the praises of one of its GPS units.

    The customer is obviously happy, the product is clearly great, and there’s a great story to tell.
