
Search engines are the 21st-century infrastructure!

Search is infrastructure

Should we start treating investment in building and maintaining search engines like investment in other infrastructure systems? I see two similarities. First, search is the next most important thing in our digital lifestyle, after the hardware and software that connect our computers together. Second, building and maintaining a search engine requires a huge investment.

The next most important thing

If you stop for a second to think about how many times a day you use a search engine to accomplish a task, you’ll notice that life would be much harder without it. If you need to find a place, a person, or a job, online search is your starting point. The same is true when looking for information about a disease, a company, or a product. Modern search engines also help you find directions, contact info, stock quotes, and much more. I can’t think of a day without using a search engine (Google or others). When I think about infrastructure on a large scale, I think about roads, train tracks, ports, and utilities. Metaphorically, search engines take us from one place to another, and if done right they can save us a ton of time and energy. If done poorly, they are a big waste!

Do you, like me, think that search engines have an impact on the world economy?

The mighty task

The web is big and expanding. In February 2007, the Netcraft Web Server Survey found 108,810,358 distinct websites (not pages). In March 2009, Netcraft found 224,749,695. New blogs pop up every day, and some of them post several times a day. With the recent introduction of microblogging services like Twitter and other personal lifestreaming tools, content is growing even more rapidly. The information is also dynamic: websites go down and pages are constantly modified. Blogs allow people to leave comments over time. Content is much more than text and includes video, audio, and images.
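To put the Netcraft numbers in perspective, here is a small calculation of the overall growth factor and the equivalent compound monthly growth rate. Treating the span from February 2007 to March 2009 as 25 months and assuming steady compound growth are simplifications for illustration, not something Netcraft reports.

```python
# Site counts quoted above from the Netcraft Web Server Survey.
SITES_FEB_2007 = 108_810_358
SITES_MAR_2009 = 224_749_695
MONTHS = 25  # assumption: February 2007 to March 2009 counted as 25 months


def growth(start: int, end: int, months: int) -> tuple[float, float]:
    """Return (overall growth factor, compound monthly growth rate)."""
    factor = end / start
    # Monthly rate r such that (1 + r) ** months == factor.
    monthly = factor ** (1 / months) - 1
    return factor, monthly


factor, monthly = growth(SITES_FEB_2007, SITES_MAR_2009, MONTHS)
print(f"growth factor: {factor:.2f}x")
print(f"compound monthly growth: {monthly:.2%}")
```

In other words, the number of sites roughly doubled in about two years, which works out to a little under 3% growth per month.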

Search consists of many steps, and it usually starts with crawling, that is, getting the data. This is a mighty task that requires building an army of web crawlers to spider the web. It requires a crawling plan, driven by sophisticated algorithms, both to find new content and to keep already-stored pages up to date. It also requires huge amounts of storage and heavy computational resources.
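To make "a crawling plan that finds new content and keeps stored pages up to date" more concrete, here is a toy crawl frontier. It is a minimal sketch, not how any real search engine schedules its crawl. The revisit rule is a simple assumed heuristic: halve the revisit interval when a page changed since the last fetch, double it when it did not. The in-memory `WEB` dictionary and `fetch` function are hypothetical stand-ins for real HTTP requests.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Entry:
    due: float                                  # time the URL should be (re)fetched
    url: str = field(compare=False)
    interval: float = field(compare=False, default=1.0)


class Frontier:
    """Toy frontier: discovers new URLs and revisits known ones.

    Assumed heuristic: pages that changed get revisited sooner (interval
    halved), unchanged pages later (interval doubled), within bounds.
    """

    def __init__(self, min_interval=1.0, max_interval=64.0):
        self.heap = []           # priority queue of Entry, earliest due first
        self.seen = set()        # every URL ever scheduled
        self.last_content = {}   # url -> content from the previous fetch
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, url, now):
        # Newly discovered URLs are scheduled immediately, and only once.
        if url not in self.seen:
            self.seen.add(url)
            heapq.heappush(self.heap, Entry(now, url, self.min_interval))

    def crawl_step(self, now, fetch):
        """Fetch the most overdue URL and return it, or None if nothing is due."""
        if not self.heap or self.heap[0].due > now:
            return None
        entry = heapq.heappop(self.heap)
        content, links = fetch(entry.url)
        changed = self.last_content.get(entry.url) != content
        self.last_content[entry.url] = content
        if changed:
            entry.interval = max(self.min_interval, entry.interval / 2)
        else:
            entry.interval = min(self.max_interval, entry.interval * 2)
        heapq.heappush(self.heap, Entry(now + entry.interval, entry.url, entry.interval))
        for link in links:          # discovery of new content
            self.add(link, now)
        return entry.url


# Hypothetical three-page "web": url -> (content, outgoing links).
WEB = {"a": ("home", ["b", "c"]), "b": ("about", []), "c": ("blog v1", ["a"])}


def fetch(url):
    return WEB[url]


frontier = Frontier()
frontier.add("a", now=0.0)
visited = []
for t in range(5):
    if t == 3:
        WEB["c"] = ("blog v2", ["a"])  # page c changes between visits
    while (url := frontier.crawl_step(float(t), fetch)) is not None:
        visited.append((t, url))
print(visited)
```

All three pages are discovered at t=0. Nothing is due at t=2 because unchanged pages back off. After page c changes, only c is revisited at t=4. A real crawler would also deal with politeness limits, robots.txt, and billions of URLs spread across many machines.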

The other tasks include indexing, linguistic processing, and ranking (for relevance and popularity). If you are interested in learning how Google scales this process by breaking tasks down even further, read the following blog post about Google Architecture.
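For readers who want a feel for indexing and ranking, here is a toy version. It tokenizes text, builds an inverted index (term to documents), and ranks documents with a basic TF-IDF score. This is a minimal textbook sketch under those assumptions. It is not Google's ranking, which also weighs popularity signals such as links. The sample documents are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Minimal "linguistic processing": lowercase and keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index: term -> {doc_id: term frequency in that doc}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query: str, index, n_docs: int) -> list[tuple[str, float]]:
    """Rank documents by summed TF-IDF over the query terms."""
    scores: dict[str, float] = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical mini-corpus.
docs = {
    "d1": "Search engines crawl the web",
    "d2": "Power stations and power lines",
    "d3": "Crawl, index and rank: how search works",
}
index = build_index(docs)
results = search("crawl power", index, len(docs))
print(results)
```

The query term "power" appears only in d2, and it appears twice there, so d2 ranks first. "crawl" appears in two of the three documents, so it contributes less to the scores of d1 and d3.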

The two are hard to compare directly, but building and maintaining a large-scale search engine seems to be as hard as building a new power station, and it probably costs as much too.

Do you, like me, think that search engines have an impact on our energy resources and our environment?

Questions and concerns

The purpose of this section is to get you thinking about my analogy and what it might mean.

The Monopoly question – do we need more than one?

In some respects, the search engine industry fits both parts of the definition of a natural monopoly:

  1. “…it is the assertion about an industry, that multiple firms providing a good or service is less efficient (more costly to a nation or economy) than would be the case if a single firm provided a good or service.”
  2. “It is said that this is the result of high fixed costs of entering an industry which causes long run average costs to decline as output expands”
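Here is a simple way to see both definitions at work. Use a stylized textbook cost model where total cost is a large fixed cost F plus a constant cost c per unit of output, so average cost is F/q + c. Average cost falls as output grows (the second definition). Splitting the same output between two firms pays the fixed cost twice (the first definition). The numbers below are hypothetical and only illustrate the shape of the argument. They are not estimates of real search engine costs.

```python
def total_cost(q: float, fixed: float, marginal: float) -> float:
    """Stylized cost model: a fixed cost plus a constant cost per unit."""
    return fixed + marginal * q


def average_cost(q: float, fixed: float, marginal: float) -> float:
    # F/q + c: shrinks toward c as output q grows (second definition).
    return total_cost(q, fixed, marginal) / q


def market_cost(total_output: float, n_firms: int, fixed: float, marginal: float) -> float:
    # Each firm pays the full fixed cost; output is split evenly (first definition).
    per_firm = total_output / n_firms
    return n_firms * total_cost(per_firm, fixed, marginal)


# Hypothetical figures: a big fixed cost (think crawl + data centers)
# and a small per-query cost.
FIXED, MARGINAL, QUERIES = 1_000_000.0, 2.0, 1_000_000.0

print("average cost at 1,000 queries:    ", average_cost(1_000.0, FIXED, MARGINAL))
print("average cost at 1,000,000 queries:", average_cost(QUERIES, FIXED, MARGINAL))
print("one firm serving everything: ", market_cost(QUERIES, 1, FIXED, MARGINAL))
print("two firms splitting the market:", market_cost(QUERIES, 2, FIXED, MARGINAL))
```

The gap between the last two lines is exactly one extra fixed cost. That is the "duplicated effort" concern raised later in this post.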

Google could be described as a natural monopoly: it now has more than 70% market share.

The first definition raises the question: why do we need more than one? The second could explain why only one may survive.

You may have noticed that my language here leaves plenty of room for alternatives; that is on purpose. I know software and technology too well to be surprised. IBM was once almost invincible, and Sun was not far behind. Even Microsoft does not look as intimidating as it used to. And if you believe that real-time search is the future, then you already know that there may be no need to run such huge crawling tasks in order to find great content.

I personally don’t have many concerns about Google as a monopoly right now. As a consumer I don’t feel any pricing power :) but maybe the companies that pay for ads do.

I do have concerns about the cost of maintaining a search engine, or of duplicating that effort on a large scale.

High energy cost

Here is an excerpt from Data Center Energy Forecast – Executive Summary – July 29, 2008.

“As of 2006, the electricity use attributable to the nation’s servers and data centers is estimated at about 61 billion kilowatt-hours (kWh), or 1.5 percent of total U.S. electricity consumption. Between 2000 and 2006 electricity use more than doubled, amounting to about $4.5 billion in electricity costs. This amount was more than the electricity consumed by color televisions in the U.S. It was equivalent to the electricity consumed by 5.8 million average U.S. households (which represent 5% of the U.S. housing stock). And it was similar to the amount of electricity used by the entire U.S. transportation manufacturing industry (including the manufacture of automobiles, aircraft, trucks, and ships)”
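A few figures can be derived from the rounded numbers in that excerpt: the average electricity price they imply, the total U.S. consumption they imply, and the annual use per "equivalent household". These are back-of-the-envelope derivations from the quoted figures only. They are not numbers stated in the report, and rounding in the source carries through.

```python
# Rounded 2006 figures from the executive summary quoted above.
DATA_CENTER_KWH = 61e9       # servers and data centers, kWh per year
COST_USD = 4.5e9             # electricity cost, dollars
SHARE_OF_US_TOTAL = 0.015    # 1.5 percent of U.S. electricity consumption
HOUSEHOLDS = 5.8e6           # equivalent number of average U.S. households

implied_price = COST_USD / DATA_CENTER_KWH              # dollars per kWh
implied_us_total = DATA_CENTER_KWH / SHARE_OF_US_TOTAL  # kWh per year
kwh_per_household = DATA_CENTER_KWH / HOUSEHOLDS        # kWh per year

print(f"implied average price: ${implied_price:.3f}/kWh")
print(f"implied U.S. total: {implied_us_total / 1e9:,.0f} billion kWh")
print(f"implied use per household: {kwh_per_household:,.0f} kWh/year")
```

The quoted figures work out to roughly 7 cents per kWh and a little over 10,000 kWh per household per year, so the numbers in the excerpt are at least consistent with each other.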

Google is making an effort to reduce its data centers’ energy bills. My concern is that having multiple search engine companies around seems as wasteful as running multiple power lines to every home. I also think that the energy consumption should be distributed across the globe, since search engines serve the entire world and not only one country.

Yet, what will happen if Google goes belly up?

I know that this seems radical and almost unimaginable at this point, but what if one day advertisers find a place other than SERPs (search engine results pages) to buy ad space? Our lives are so dependent on Internet search technology that if no one could pay the cost of maintaining a search engine, it would be a big regression with a direct impact on the world economy.

Should we do something?

Regulations

One way to deal with a natural monopoly is to turn it into some sort of government-granted monopoly. In this case, it would not be a single government but some sort of world organization that could enforce regulations and requirements such as:

  • More energy efficient data centers
  • Improved crawling techniques (Cuil claimed to have one)
  • Crawling that covers more ground, such as the deep web
  • Accounting governance and building cash reserves.

I know that this is radical. Please remember that the purpose of this article is not to advocate a return to a controlled market, but to make us aware of the cost, power, and dependencies associated with search engines.

How to break up Google the right way?

I read somewhere that Google could be broken up by the functionality it provides, such as search, email, and maps. Another way to break up Google is to take away the crawl and leave the rest, something like the Yahoo BOSS model. The crawl would then be done by a single nonprofit organization funded by multiple governments (i.e., with tax money), in the same way that we pay for our education system (I know... it is not that great). Again, just think about it differently for a moment :)

The New Deal

I know that this is the most radical idea in this post, but if search is such an important part of our infrastructure, should our president, Barack Obama, include it in his 21st-century New Deal? At the very least, maintaining search engines could be listed as another infrastructure system, maybe the one that functions relatively best at this point.

Summary

The points I would like you to take from this post are:

  • A search engine is more than software
  • Building and maintaining a new search engine on a large scale has an impact on society
  • Search is a global problem
  • We are heavily dependent on this technology
  • Google is a monopoly – for good and bad.
  • Maybe it is time to rethink the old way of crawling the web
    • How much data is collected but never used (SERP #200)?
    • Can people replace crawling (Social search engines/Twitter)?

Picture credit to my favorite artist Ron Shoshani
