- 2010 FIFA World Cup Final
- Movie near Lexington, MA
- Red Sox
- Life expectancy – the world is far from flat
- Time in India (or anywhere else)
- Currency conversions
- Flight progress
- Math and Physics: 2^64, pi, e, sqrt(100), The speed of light, Avogadro Constant
- Translate cool to Spanish and Chinese
- Free images for your blog (don’t forget to add picture credit) – you can also use this link instead
- Mark R. Levin on twitter and the results
- How to (almost anything) – How to eat a jellyfish
- X vs. Y
Google’s search engine is 21st-century infrastructure.
Search is infrastructure
When we think about infrastructure on a large scale, we think about roads, train tracks, ports, and utilities – all things that are essential to the smooth running of our economy. Online search has become so essential to our lives today that I think we should add it to the traditional list of world infrastructure.
Building and maintaining a search engine is so expensive and labor-intensive that it requires the same kind of planning and upkeep that, say, the Golden Gate Bridge does.
I see two similarities between traditional infrastructure and search engines. The first is that a search engine is a mission critical system. The second is that the cost of building and maintaining a good search engine is enormous, just as it is for ports, railroad tracks, and the electrical grid.
Mission critical system
Can you imagine a week without Google? Think for a moment about how many times a day you use a search engine for a task. Life would be much harder without it. We use search engines to find a place, a person, or a job, and to look up information about a disease, a company, or a product. Modern search engines also help us find directions, contact info, stock quotes, and innumerable other things. I can’t remember a day when I didn’t use a search engine (mostly Google, but others too). Metaphorically, search engines take us from one place to another (like planes, trains, and boats), and if they are well designed and maintained, they can save us an enormous amount of time and energy. If they are not, they can be a big waste of time!
The mighty task
The web is big and expanding. In February 2007, the Netcraft Web Server Survey found 108,810,358 distinct websites (not pages). By March 2009, only two years later, the number had more than doubled, to 224,749,695. The number of web pages would be a more accurate measure than the number of websites, but I think the figures above tell us enough about the size of the web.
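This is a quick check of the Netcraft figures quoted above. It simply divides and subtracts the two counts. The 25-month span (February 2007 to March 2009) used for the annualized rate is my own reading of the dates, so treat that number as approximate.

```python
# Netcraft Web Server Survey counts quoted above (distinct websites, not pages).
sites_feb_2007 = 108_810_358
sites_mar_2009 = 224_749_695
months_between = 25  # assumption: February 2007 to March 2009

growth_factor = sites_mar_2009 / sites_feb_2007
sites_added = sites_mar_2009 - sites_feb_2007
# Compound annual growth rate implied by the two snapshots.
annual_growth = growth_factor ** (12 / months_between) - 1

print(f"growth factor: {growth_factor:.2f}x")  # 2.07x, i.e. "more than doubled"
print(f"sites added: {sites_added:,}")         # 115,939,337
print(f"implied annual growth: {annual_growth:.0%}")
```

Roughly 116 million new sites in about two years works out to growth of a little over 40% per year.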
New blogs pop up every day, and some of them post several times a day. With the recent introduction of microblogging services like Twitter and other personal lifestreaming tools, content is growing even more rapidly. The information is also dynamic: websites go down and pages are constantly modified. Blogs let people leave comments over time. And content is much more than text; it includes video, audio, and images.
Search involves many steps. It usually starts with crawling, that is, getting the data. This is a mighty task that requires an army of web crawlers to spider the web. It also requires a crawl plan, driven by sophisticated algorithms, that both looks for new content and keeps already-stored pages up to date. And it needs an immense amount of storage space and heavy computational resources.
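This toy Python scheduler makes the crawling step more concrete. It is a minimal sketch under stated assumptions, not how Google or any real engine schedules its crawl. The in-memory `FAKE_WEB`, the `CrawlScheduler` class and the fixed `revisit_interval` policy are all hypothetical. The sketch shows the two jobs described above: discovering new URLs, and re-queuing stored pages so they stay up to date.

```python
import heapq

# Hypothetical in-memory "web" (URL -> outgoing links) so the sketch runs offline.
# A real crawler would fetch each page over HTTP and parse its links from the HTML.
FAKE_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": [],
    "b.example/": ["a.example/", "c.example/"],
    "c.example/": [],
}


class CrawlScheduler:
    """Toy crawl frontier that discovers new pages and re-visits stored ones.

    Assumed policy: newly discovered URLs are due immediately; a page that has
    already been crawled is due again `revisit_interval` ticks later.
    """

    def __init__(self, seeds, revisit_interval=10):
        if revisit_interval <= 0:
            raise ValueError("revisit_interval must be positive")
        self.revisit_interval = revisit_interval
        self.frontier = []    # min-heap of (due_time, url)
        self.queued = set()   # URLs currently waiting in the frontier
        self.store = {}       # url -> (time last crawled, outgoing links)
        for url in seeds:
            self._schedule(url, due=0)

    def _schedule(self, url, due):
        if url not in self.queued:
            heapq.heappush(self.frontier, (due, url))
            self.queued.add(url)

    def step(self, now):
        """Crawl every URL due at or before `now`; return them in crawl order."""
        crawled = []
        while self.frontier and self.frontier[0][0] <= now:
            _, url = heapq.heappop(self.frontier)
            self.queued.discard(url)
            links = FAKE_WEB.get(url, [])  # stand-in for downloading the page
            self.store[url] = (now, links)
            crawled.append(url)
            for link in links:
                if link not in self.store:
                    self._schedule(link, due=now)  # new content: crawl right away
            self._schedule(url, due=now + self.revisit_interval)  # keep it fresh
        return crawled


scheduler = CrawlScheduler(["a.example/"], revisit_interval=10)
print(scheduler.step(0))   # ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
print(scheduler.step(5))   # [] because nothing is due yet
print(scheduler.step(10))  # the stored pages come due again and are re-crawled
```

A production crawler would also honor robots.txt, spread requests politely across hosts, and choose revisit intervals per page based on how often each page changes. None of that is modeled here.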
The other tasks include indexing, linguistic processing, and ranking (for relevance and popularity). (If you are interested in learning how Google scales this process by breaking tasks down even further, read the following blog post about Google Architecture.)
A complete comparison is impossible, but building and maintaining a large-scale search engine seems to be as hard as building a new power station, and it probably costs as much too.
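Indexing and ranking can be illustrated the same way. The sketch below builds a tiny inverted index (term → documents containing it) and ranks results with plain TF-IDF. The tokenizer is a crude stand-in for linguistic processing. TF-IDF is just one textbook relevance signal, chosen here for illustration, and says nothing about how Google actually ranks: popularity signals such as links are not modeled. The document texts are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Crude 'linguistic processing': lowercase, keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index mapping each term to {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    """Return doc ids ordered by a simple TF-IDF score for the query terms."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        idf = math.log(num_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]


docs = {
    "p1": "Red Sox win at Fenway",
    "p2": "Currency conversions and exchange rates",
    "p3": "Red Sox tickets and Red Sox schedule",
}
index = build_index(docs)
print(search(index, len(docs), "Red Sox"))  # ['p3', 'p1']
```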
Living with Monopoly
The purpose of this section is to get you thinking about my analogy and what it might mean.
The Monopoly question – do we need more than one search engine?
In some ways, the search engine industry might fit the definition of what’s known as a “natural monopoly” (Wikipedia):
- “…it is the assertion about an industry, that multiple firms providing a good or service is less efficient (more costly to a nation or economy) than would be the case if a single firm provided a good or service.”
- “It is said that this is the result of high fixed costs of entering an industry which causes long run average costs to decline as output expands”
Google could be defined as a natural monopoly. It now has more than 70% of the market.
The first definition raises the question: why do we need more than one search engine provider? The second could explain why only one provider may survive.
Why don’t we need more than one?
I’m personally not concerned about Google’s monopoly power to set rates. As a consumer, I don’t feel any pricing pressure :) but maybe the companies that pay for ads do.
I do have two concerns. The first is the cost, to the country and to the world, of maintaining a search engine or of duplicating that effort on a large scale.
The second is that because search is such an important, globally critical system, more stakeholders around the world should be paying attention.
High energy cost
Here is an excerpt from Data Center Energy Forecast – Executive Summary – July 29, 2008.
“As of 2006, the electricity use attributable to the nation’s servers and data centers is estimated at about 61 billion kilowatt-hours (kWh), or 1.5 percent of total U.S. electricity consumption. Between 2000 and 2006 electricity use more than doubled, amounting to about $4.5 billion in electricity costs. This amount was more than the electricity consumed by color televisions in the U.S. It was equivalent to the electricity consumed by 5.8 million average U.S. households (which represent 5% of the U.S. housing stock). And it was similar to the amount of electricity used by the entire U.S. transportation manufacturing industry (including the manufacture of automobiles, aircraft, trucks, and ships)”
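The rounded figures in the excerpt can be cross-checked with a few divisions. This is only arithmetic on the quoted numbers, and every input is rounded, so the implied values are rough.

```python
# Rounded figures from the excerpt above (U.S. servers and data centers, 2006).
datacenter_kwh = 61e9      # about 61 billion kWh
share_of_us_use = 0.015    # about 1.5 percent of total U.S. consumption
cost_usd = 4.5e9           # about $4.5 billion in electricity costs
households = 5.8e6         # about 5.8 million average U.S. households

implied_us_total_kwh = datacenter_kwh / share_of_us_use
implied_price_per_kwh = cost_usd / datacenter_kwh
kwh_per_household = datacenter_kwh / households

print(f"implied total U.S. use: {implied_us_total_kwh / 1e12:.1f} trillion kWh")
print(f"implied average price: ${implied_price_per_kwh:.3f} per kWh")
print(f"implied use per household: {kwh_per_household:,.0f} kWh per year")
```

The figures hang together: total U.S. consumption of about 4.1 trillion kWh, an average price of roughly 7.4 cents per kWh, and about 10,500 kWh per household per year.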
Google is making an effort to reduce its data centers’ energy bills. My concern is that having multiple Google-size search engine companies around seems as wasteful as running multiple power lines to every home. I also think the energy consumption should be distributed across the globe, since search engines serve the entire world and not only one country.
What will happen if Google goes belly up?
I know this seems radical and almost unimaginable at this point, but what if one day advertisers find somewhere other than SERPs (search engine results pages) to buy ad space? Our lives are so dependent on Internet search technology that if no one could pay the cost of maintaining a search engine, the world economy would feel the impact directly.
Maybe we need a different solution?
- Search is a very large task
- Search is costly
- Search has become essential to the modern economy
- Google is effective, but it is a monopoly
Google is now so mission critical that we need to watch it closely, or maybe even break it up.
One way to deal with a mission-critical natural monopoly is to turn it into some sort of government-granted monopoly. In this case, the grantor would not be a government but some sort of world organization that could enforce regulations and requirements such as:
- More energy efficient data centers
- Better storage solutions
- Crawl to cover more ground – deep web
- Accounting governance and building cash reserves.
I know this might sound like a radical idea. Please remember that the purpose of this article is not to advocate a return to a controlled market but to make us aware of the cost, power, and dependencies associated with search engines.
Explore alternative search technologies (similar to exploring alternative energy sources)
In addition to possible regulations, there are other ways to address the functions that a natural monopoly like Google currently serves:
- Split the search task into parts such as crawling, storage, and indexing, and distribute them across multiple vendors.
- Create better crawling algorithms – Cuil claimed to have found a more efficient and scalable way to crawl the web (this is not about Cuil; it is about the idea).
- Real-time search (conversational search) – If you believe that real-time search is the future, then maybe there is no need to deploy such huge crawling tasks in order to find great content. Let the crowd do the job.
- P2P – distribute the crawl, indexing, ranking, and storage across many search users. This approach mitigates the single-point-of-failure risk and leverages existing unused computational resources.
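The split-and-distribute ideas in the list above can be illustrated with consistent hashing, a common technique for dividing work among many machines. The post does not specify this technique; it is chosen for this sketch only. Each peer (or vendor) owns slices of a hash ring, and every host name is routed to exactly one of them. The peer names and the `HashRing` class are hypothetical.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring using the full SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent-hash ring assigning crawl/index work (by host) to peers.

    Each peer gets `replicas` virtual points so the load spreads out; when a
    peer leaves, only the hosts it owned move to other peers.
    """

    def __init__(self, peers, replicas=50):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> peer
        for peer in peers:
            self.add(peer)

    def add(self, peer):
        for i in range(self.replicas):
            point = _hash(f"{peer}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = peer

    def remove(self, peer):
        for i in range(self.replicas):
            point = _hash(f"{peer}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def peer_for(self, host):
        """Return the peer responsible for crawling and indexing `host`."""
        if not self._points:
            raise LookupError("no peers on the ring")
        # First ring point clockwise from the host's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(host)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["peer-a", "peer-b", "peer-c"])
for host in ["redsox.example", "weather.example", "news.example"]:
    print(host, "->", ring.peer_for(host))
```

Each peer owns many small slices of the ring, so removing a peer reassigns only the hosts it was responsible for. Everyone else keeps their share, which matters in a P2P network where peers join and leave all the time.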
The new president of the United States, Barack Obama, is leading his 21st Century New Deal with the hope that big investments in the country’s infrastructure will spur economic growth and prosperity. Online search has become a mission critical task in our lives. It has an impact on the world economy and on energy consumption, and I think it should not be overlooked. To the traditional infrastructure list of transportation, telecommunications, and energy, we should add the 21st-century infrastructure: online search engines.
In the same way that nations monitor the condition of their infrastructure, they should be looking at search engine implementations and technologies.
A few points I’d like you to take from this post:
- A search engine is more than software
- Building and maintaining a new search engine on a large scale has an impact on society
- Search is a global objective
- We are heavily dependent on this technology
- Google is a monopoly – for better or worse.
Do you share my opinion that search engines have an impact on the world economy?
Do you agree with me that Google is a mission critical system today?
Should we be worried if someone duplicates the task of keeping a large portion of the web crawled, stored, and indexed?
**This post was originally published as a guest post on AltSearchEngine.com. It is no longer available there, so I decided to republish it here.
Picture credit to my favorite artist Ron Shoshani
Here are my ramp-up tasks:
- Read through the Getting Started section
- Ramped up on Python – very cool and easy to use
- I learned JSON using simplejson – it works nicely with Python
- I’m now adopting Django for Python
- And I’m getting up to speed with a new data storage concept
All are great technologies.
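For anyone following the same ramp-up, here is a minimal JSON round trip. simplejson's `dumps`/`loads` match the `json` module that has shipped in the Python standard library since 2.6, so the sketch falls back to the standard library when simplejson is not installed. The record itself is made up.

```python
try:
    import simplejson as json  # third-party package, if installed
except ImportError:
    import json  # standard-library module with the same dumps/loads API

record = {"query": "Red Sox", "results": 3, "sources": ["news", "scores"]}

text = json.dumps(record, sort_keys=True)  # Python dict -> JSON string
decoded = json.loads(text)                 # JSON string -> Python dict

print(text)  # {"query": "Red Sox", "results": 3, "sources": ["news", "scores"]}
print(decoded == record)  # True
```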
Google App Engine and misc
If you have additional useful links relevant to the technologies listed above please let me know.
*I plan to update the list of useful sources from time to time as I find more content.