Archive

Posts Tagged ‘search’

Google’s search engine is the 21st century infrastructure.

June 11, 2010 5 comments


Search is infrastructure

When we think about infrastructure on a large scale we think about roads, train tracks, ports, and utilities – all things that are essential to the smooth running of our economy. Online search has become so essential to our lives today that I think we should add it to the traditional world infrastructure list.

Building and maintaining a search engine is so expensive and labor intensive that it requires the same kind of planning and upkeep that, say, the Golden Gate Bridge does.

I see two similarities between traditional infrastructure and search engines. The first is that a search engine is a mission-critical system. The second is that the cost of building and maintaining a good search engine is enormous – just as it is for ports, railroad tracks, and the electrical grid.

Mission critical system

Can you imagine a week without Google? Think for a moment how many times a day you use a search engine for a task. Life would be much harder without it. We use a search engine to find a place, a person or a job. The same goes for looking up information about a disease, a company or a product. Modern search engines also help us find directions, contact info, stock quotes and innumerable other things. I can’t think of a day without using a search engine (mostly Google, but others too). Metaphorically, search engines take us from one place to another (like planes, trains and boats), and if well designed and maintained they can save us an enormous amount of time and energy. But if that is not the case, they can be a big waste of time!

The mighty task

The web is big and expanding. In February of 2007, the Netcraft Web Server Survey found 108,810,358 distinct websites (not pages). In March of 2009 (only two years later) the number had more than doubled, to 224,749,695. The number of web pages would be a more accurate measure of size than the number of websites, but I think the numbers above say enough about the scale of the web.

New blogs are popping up every day, and some blogs post multiple times a day. With the recent introduction of microblogging services like Twitter and other personal life-streaming tools, content is growing even more rapidly. The information is also dynamic: websites go down and pages are constantly modified. Blogs accumulate comments over time. And content is much more than text – it includes video, audio, and images.

A search engine performs many steps. It usually starts with crawling – getting the data. This is a mighty task that requires building an army of web crawlers to spider the web. It requires a crawling plan using sophisticated algorithms to look for new content and to keep stored content up to date. It necessitates an immense amount of storage space and heavy computation resources.
The other tasks include indexing, lingual processing and ranking (for relevance and popularity). (If you are interested in learning how Google scales this process by breaking tasks down even further, read the following blog post about Google Architecture.)
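As a rough illustration of the crawl → index → search pipeline described above, here is a toy sketch in Python. Everything here is hypothetical: the “web” is an in-memory dictionary, and a real engine would add HTTP fetching, politeness delays, robots.txt handling, deduplication and distributed storage.

```python
from collections import defaultdict, deque

# A toy "web": URL -> (page text, outgoing links). In reality this
# data would come from HTTP fetches by an army of crawlers.
TOY_WEB = {
    "a.com": ("search engines crawl the web", ["b.com", "c.com"]),
    "b.com": ("crawlers fetch pages and follow links", ["a.com"]),
    "c.com": ("indexing makes search fast", []),
}

def crawl(seed, web):
    """Breadth-first crawl from a seed URL, returning URL -> text."""
    frontier, seen, pages = deque([seed]), {seed}, {}
    while frontier:
        url = frontier.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages

def build_index(pages):
    """Inverted index: term -> set of URLs containing that term."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.split():
            index[term].add(url)
    return index

def search(index, term):
    """Look up a term; ranking is reduced to alphabetical order here."""
    return sorted(index.get(term, set()))

pages = crawl("a.com", TOY_WEB)
index = build_index(pages)
```

The point of the sketch is only to show why the task scales so badly: each of the three steps (crawl, index, query) has to be repeated continuously over hundreds of millions of sites, which is where the storage and computation costs come from.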

It is impossible to compare entirely, but it seems like building and maintaining a large-scale search engine is as hard as building a new power station and probably costs as much too.

Living with Monopoly

The purpose of this section is to get you thinking about my analogy and what it might mean.

The Monopoly question – do we need more than one search engine?

In some ways, a search engine industry might fit the definition of what’s known as a “Natural monopoly” (wikipedia):

  1. “…it is the assertion about an industry, that multiple firms providing a good or service is less efficient (more costly to a nation or economy) than would be the case if a single firm provided a good or service.”
  2. “It is said that this is the result of high fixed costs of entering an industry which causes long run average costs to decline as output expands”

Google could be defined as a natural monopoly. It now has more than a 70% market share.
The first definition raises the question: why do we need more than one search engine provider? The second could explain why only one provider may survive.

Why don’t we need more than one?

I’m personally not concerned about Google’s monopoly power to set rates. As a consumer I don’t feel any pricing pressure :) but maybe the companies that pay for ads do.

I do have a couple of concerns. The first is the cost to the country and the world of maintaining a search engine, or of duplicating the effort on a large scale.
The second is that because it is such an important and world-critical system, more stakeholders around the globe should be paying attention.

High Energy cost

Here is an excerpt from Data Center Energy Forecast – Executive Summary – July 29, 2008.

“As of 2006, the electricity use attributable to the nation’s servers and data centers is estimated at about 61 billion kilowatt-hours (kWh), or 1.5 percent of total U.S. electricity consumption. Between 2000 and 2006 electricity use more than doubled, amounting to about $4.5 billion in electricity costs. This amount was more than the electricity consumed by color televisions in the U.S. It was equivalent to the electricity consumed by 5.8 million average U.S. households (which represent 5% of the U.S. housing stock). And it was similar to the amount of electricity used by the entire U.S. transportation manufacturing industry (including the manufacture of automobiles, aircraft, trucks, and ships)”

Google is making an effort to reduce its data centers’ energy bills. My concern is that having multiple Google-sized search engine companies around seems as wasteful as running multiple power lines to every home. I also think that the energy consumption should be distributed across the globe, since a search engine serves the entire world and not only one country.

What will happen if Google goes belly up?

I know this seems radical and almost unimaginable at this point, but what if one day advertisers find somewhere other than SERPs to buy ad space? Our lives are so dependent on Internet search technology that if no one can pay the cost of maintaining one, it would have a direct impact on the world economy.

Maybe we need a different solution?

To reiterate:
  • Search is a very large task
  • Search is costly
  • Search has become essential to the modern economy
  • Google is effective, but it is a monopoly

Yet today search is so mission critical that we need to watch it closely or maybe even break it up.

Regulations

One way to deal with a mission-critical natural monopoly is to turn it into some sort of government-granted monopoly. In this case it is not the government but some sort of world organization that can enforce regulations and demands like:

  • More energy efficient data centers
  • Better storage solutions
  • Crawling to cover more ground – the deep web
  • Accounting governance and building cash reserves.

I know that this might sound like a radical idea. Please remember, the purpose of this article is not to support a return to a controlled market but to make us aware of the cost, power and dependencies associated with search engines.

Explore alternative search technologies (similar to exploring alternative energy sources)

In addition to possible regulations, there are other ways to address the functions that a natural monopoly like Google currently serves:

  • Split up search tasks such as crawling, storage and indexing and distribute them across multiple vendors.
  • Create better crawling algorithms – Cuil claimed to have found a more efficient and scalable way to crawl the web (this is not about Cuil; it is about the idea).
  • Real-time search (conversational search) – if you believe that real-time search is the future, then you already know that maybe there is no need to deploy such a huge crawling task in order to find great content. Let the crowd do the job.
  • P2P – distribute the crawl, indexing, ranking and storage across many search users. This technology mitigates the single-point-of-failure risk and leverages existing unused computational resources.
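To make the first idea above concrete, here is one hedged sketch of how crawling work could be partitioned across vendors: hash each URL to exactly one operator, so no page is fetched twice. The vendor names are made up for illustration; a real system would likely use consistent hashing so that reassignments stay small when a vendor joins or leaves.

```python
import hashlib

def assign_crawler(url, crawlers):
    """Deterministically assign a URL to one crawl operator by hashing it,
    so every page is fetched by exactly one vendor and work never overlaps."""
    digest = hashlib.sha256(url.encode()).digest()
    return crawlers[int.from_bytes(digest[:8], "big") % len(crawlers)]

# Hypothetical vendor names:
crawlers = ["vendor-a", "vendor-b", "vendor-c"]
owner = assign_crawler("http://example.com/page", crawlers)
```

Because the assignment is a pure function of the URL, any party can verify which vendor is responsible for a page without consulting a central coordinator.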

Summary

The new president of the United States, Barack Obama, is leading his 21st Century New Deal with the hope that big investment in the country’s infrastructure will spur economic growth and prosperity. Online search has become a mission-critical task in our lives. It has an impact on the world economy and on energy consumption. I think that it should not be overlooked. To the traditional infrastructure list of transportation, telecommunication and energy we should add the 21st century infrastructure – the online search engine.
In the same way that nations monitor the condition of their infrastructure, they should be looking at search engine implementations and technologies.

A few points I would like you to take from this post are:

  • A search engine is more than software
  • The tasks of building and maintaining a new search engine on a large scale have an impact on society
  • Search is a global objective
  • We are heavily dependent on this technology
  • Google is a monopoly – for better or worse

Do you share my opinion that search engines have an impact on the world economy?
Do you agree with me that Google is a mission critical system today?
Should we be worried if someone duplicates the task of keeping a large portion of the web crawled, stored and indexed?

**This blog post was previously published on AltSearchEngine.com (as a guest post). It is no longer available there, so I decided to publish it here again.

Picture credit to my favorite artist Ron Shoshani


Real-time search – the missing piece

November 11, 2009 Leave a comment

Shifting the problem from finding content to finding people for search, discovery and filtering is not enough.

The evolution of finding new and engaging content:

Step 1: We started by searching for engaging content using a search engine like Google, or a blog search engine/directory such as Technorati. These search engines operate web crawlers that scan the web for new information, then index (categorize) and rank web pages using different algorithms. As time went by we started adding blog feeds (using the RSS and ATOM protocols) to our feed reader of choice, like Google Reader.
Results: with some effort we managed to find great bloggers to follow, but new content was slow to arrive, it was slow to discover, and even after a while we ended up with not enough variety. No wonder it was a dead end!
Step 2: step #1, plus finding the people behind the content and following their feeds on social media tools (Twitter, FriendFeed, Facebook, etc.).
Results: initially we got faster and richer content, but it got messy very quickly (especially with auto follow-back), it was also overwhelming at times, and lots of people share the same content (whether it is lame or great). Add to the feed-stream cacophony the fact that people use these channels for chatting with their peers, sharing thoughts and feelings, and promoting their business/products/services, and we end up with yet another dead end!
Step 3: step #2, plus lists. Now we can group people into categorized Twitter lists and follow their tweets.
Results: now the content is a little less messy because we have more control over the data filtering. The process of building your own list is very slow and tedious at the moment, but you can use others’ lists via Listorious or TweepML. On the flip side, it requires coming up with a new process for scanning the lists’ timelines (how frequently? whom to give more attention? adding/removing tweeps), and you can easily end up with too many lists. The worst part is that the people on a list do not always share content about the subject that matches the list’s category. Bottom line, it is somewhat better than step #2, but not by much – another dead end?

Content by people

In step #1 we let the crawler find and categorize the content, and it was up to us to search for it. In steps #2 and #3 we shifted to people search and then let those people drive content to us. This time the crowd took care of the categorization tasks: finding and matching people to domains of knowledge. People categorized themselves and others, built many great lists, followed other lists (an indication of popularity) and shared them for us to grab.

The shift

In the process from #1 to #2 we turned a content discovery problem into a people discovery problem. Thanks to this shift we gained big time in scale, arming the entire web community to search for new content. We accelerated discovery and knowledge gain. We also gained speed over RSS and the web crawler. Among the changes going from step #1 to step #3, the focus shifted from filtering content to filtering people (lists).

Small pause to recap: we have content categorized thanks to search engines and tags, and we have people grouped by categories thanks to the people themselves, but we still have a lot of noise.

The missing step

In my opinion, we are missing a step. I think we ought to get back to computerized categorization. We need a crawler to categorize and rank the data in the context of the list.
I would like to be able to filter a list’s timeline view by: links only, discussion threads only, and, even more important, by content that matches the list’s definition in the first place.
If I follow a list that discusses mobile phone technology, I want to see only mobile phone technology related content.
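A minimal sketch of the kind of filtering I have in mind, assuming a hand-made keyword set per list topic. The keyword list and sample tweets below are invented for illustration; a real implementation would use a trained classifier rather than a static word list.

```python
# Invented keyword set for a hypothetical "mobile" list; a real system
# would derive these terms (or a classifier) from the list's definition.
TOPIC_KEYWORDS = {
    "mobile": {"phone", "android", "iphone", "mobile", "smartphone"},
}

def matches_topic(tweet, topic, keywords=TOPIC_KEYWORDS):
    """True if the tweet shares at least one word with the topic's keywords."""
    words = {w.strip(".,!?").lower() for w in tweet.split()}
    return bool(words & keywords[topic])

def filter_timeline(tweets, topic, links_only=False):
    """Keep only on-topic tweets; optionally keep only tweets with links."""
    kept = [t for t in tweets if matches_topic(t, topic)]
    if links_only:
        kept = [t for t in kept if "http" in t]
    return kept

tweets = [
    "New Android phone released http://example.com",
    "What I had for lunch",
    "Fresh iPhone rumors",
]
on_topic = filter_timeline(tweets, "mobile")
```

Even this crude word-overlap test would drop the lunch tweet from a mobile technology list, which is exactly the noise reduction the missing step is about.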

Picture credit orangeacid

Four ways to deliver value in your short tweets

October 7, 2009 1 comment

Here are four simple ways to create tweets charged with valuable information.

Valuable tweets

  1. Share news
    1. New tool, add-on, website – like the mini launch of a new online real-time community, Cliqset, or the re-launch of Pijoo as a purely content-driven service.
    2. New book, movie, album – like the new Dan Brown book, The Lost Symbol (from the bestselling author of The Da Vinci Code).
    3. Stats – up, down, on top – for blogs, books, movies, albums. Maybe how much a movie made in its first week.
    4. New trending topics – this is another way to break the news. Just look at the TwitScoop tag cloud or Twitter Search Trending Topics.
    5. Winning a prize – like announcing Hilary Mantel as the 2009 Man Booker Prize winner
  2. Connect the dots
    1. For a new book, movie, or album – add a link to previous work by the same creator. Example: similar to what I just did with the Dan Brown example above, where I made the connection to his previous work (it is not that obvious in many other cases).
    2. For a person, add links to multiple places where she has a presence on the web. If we take Hilary Mantel as an example, point to her Facebook page.
    3. For a book, movie, or album, add a link to a coming event – something that takes a little longer than Book Signing, The Dan Brown Way.
    4. For news – add other items that can help readers understand the context better. This is especially useful for sports events. Having the context makes it a lot more interesting.
  3. Connect people/Introduce
    1. For a new band, author, producer, or actor, provide their Twitter username (include the @). Example: the book The Lost Symbol has a Twitter account, @lostsymbolbook (administered by Brown’s US publisher, Doubleday).
    2. Point to a hot discussion. Example: if you like book talks, check #litchat.
    3. An active #hashtag – not just the most active (and sometimes abused) ones from the Trending Topics. Find others from one of your Twitter timelines or Twubs.
    4. An engaging blog – blogs with lots of comment activity – use BackType. Example: I found this blog post, Kiss “Sonic Boom” Review, with 45 comments (the last one I saw was from October 6, 2009 at 10:25 pm). I searched BackType for CD review.
    5. A popular item: a bestseller, popular on Glue, Amazon, B&N. Example: this is fairly trivial. Here is the Amazon Bestsellers in Books page (hint: check how many days a book has been in the top 100 – look for the more recent additions).
  4. Compress (encode/decode) greater knowledge into short messages
    1. The best example that I could find is @cookbook – tweeting tiny recipes condensed by @Maureen. The owner of this Twitter account built a @cookbook glossary that helps convert the encoded recipes into real ones.
    2. The second best example is StockTwits – here too, people found a way to communicate more than 140 characters would normally allow.
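The encode/decode idea behind #4 can be sketched as a tiny glossary lookup. The abbreviations below are invented for illustration – they are not @cookbook’s actual glossary:

```python
# Invented glossary in the spirit of @cookbook's condensed recipes;
# these abbreviations are illustrative, not the account's real ones.
GLOSSARY = {
    "T": "tablespoon",
    "t": "teaspoon",
    "c": "cup",
    "min": "minutes",
}

def decode(tweet, glossary=GLOSSARY):
    """Expand known abbreviations so a condensed tweet reads as prose."""
    return " ".join(glossary.get(token, token) for token in tweet.split())
```

A shared glossary is what lets both sides compress: the writer packs more meaning into 140 characters, and the reader expands it back without loss.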

Why?

  • Because delivering value can really help you to get more followers on twitter
  • Because if you use Affiliate Marketing links you can truly assist in the buying decision.
  • Because it is a little more interesting than seeing the same 5 or 10 top bloggers being retweeted over and over again.

The secret for building valuable tweets

Closely examining my examples above reveals three key value drivers:

  1. Search – finding the data. Access to great and trusted content sources is value.
  2. Tying two or more data points together into a single piece of information (a tweet). Association is value.
  3. Timing. Relevancy is value.

Additional ideas for building valuable tweets: attention, and help.

What other ways do you see for charging tweets with value?

If you liked this post please consider buying my eBook on Scribd: Timing the tweet

Do you think that you can live without Google?

March 25, 2009 1 comment

Here is my latest guest post on the AltSearchEngines blog.

Google’s search engine is the 21st century infrastructure.

A quick summary:

  • Search is a very large task
  • Search is costly
  • Search has become essential to the modern economy
  • Google is effective but it is a monopoly

It is similar to infrastructure on a large scale like roads, train tracks, ports, and utilities – all things that are essential to the smooth running of our economy.

Today it is so mission critical that we need to watch it closely or maybe even break it up.


Search Engine is the 21st century infrastructure!

March 17, 2009 Leave a comment

Search is infrastructure

Should we start looking at investments in building and maintaining search engines the way we look at investments in other infrastructure systems? I see two similarities. The first is that search is the next important thing in our digital lifestyle today, after the hardware and software that connect our computers together. The second is the huge cost required to build and maintain one.

The next most important thing

If you stop for a second to think about how many times a day you employ a search engine to accomplish a task, you’ll notice that life could be way harder without it. If you need to find a place, a person or a job, online search is your starting point. It is the same when looking for information about a disease, a company or a product. Modern search engines also help find directions, contact info, stock quotes and much more. I can’t think of a day without using a search engine (Google or others). When I think about infrastructure on a large scale I think about roads, train tracks, ports, and utilities. Metaphorically, search engines take us from one place to another, and if done right they can save us a ton of time and energy. If done poorly, they are a big waste!

Do you think, like me, that search engines have an impact on the world economy?

The mighty task

The web is big and expanding. In February of 2007, the Netcraft Web Server Survey found 108,810,358 distinct websites (not pages). In March of 2009 Netcraft found 224,749,695. New blogs are popping up every day, and some blogs post multiple times a day. With the recent introduction of microblogging services like Twitter and other personal life-streaming tools, content is growing even more rapidly. The information is also dynamic: websites go down and pages are constantly modified. Blogs accumulate comments over time. Content is way more than text and includes video, audio, and images.

Search consists of many steps, and it usually starts with crawling – getting the data. This is a mighty task that requires building an army of web crawlers to spider the web. It requires a crawling plan using sophisticated algorithms to look for new content and to keep stored content up to date. It requires a huge amount of storage space and heavy computation resources.

The other tasks include indexing, lingual processing and ranking (for relevance and popularity). If you are interested in learning how Google scales this process by breaking tasks down even further, read the following blog post about Google Architecture.

It is impossible to compare entirely, but it seems like building and maintaining a large-scale search engine is as hard as building a new power station, and probably costs as much too.

Do you think, like me, that search engines have an impact on our energy resources and our environment?

Questions and concerns

The purpose of this section is to get you thinking about my analogy and what it might mean.

The Monopoly question – do we need more than one?

In some respects the search engine industry fits the dual definitions of a natural monopoly:

  1. “…it is the assertion about an industry, that multiple firms providing a good or service is less efficient (more costly to a nation or economy) than would be the case if a single firm provided a good or service.”
  2. “It is said that this is the result of high fixed costs of entering an industry which causes long run average costs to decline as output expands”

Google could be described as a natural monopoly. It now has more than a 70% market share.

The first definition raises the question: why do we need more than one? The second could explain why only one may survive.

If you noticed, my language here leaves plenty of room for alternative options – that is on purpose. I know software and technology too well to be surprised. IBM was almost invincible at the time, and Sun was not far behind. Even Microsoft does not look as intimidating as it used to be. And if you believe that real-time search is the future, then you already know that maybe there is no need to deploy such a huge crawling task in order to find great content.

I personally don’t have many concerns about Google as a monopoly right now. As a consumer I don’t feel any pricing pressure :) but maybe the companies that pay for ads do.

I do have concerns about the cost of maintaining a search engine, or of duplicating the effort on a large scale.

High Energy cost

Here is an excerpt from Data Center Energy Forecast – Executive Summary – July 29, 2008.

“As of 2006, the electricity use attributable to the nation’s servers and data centers is estimated at about 61 billion kilowatt-hours (kWh), or 1.5 percent of total U.S. electricity consumption. Between 2000 and 2006 electricity use more than doubled, amounting to about $4.5 billion in electricity costs. This amount was more than the electricity consumed by color televisions in the U.S. It was equivalent to the electricity consumed by 5.8 million average U.S. households (which represent 5% of the U.S. housing stock). And it was similar to the amount of electricity used by the entire U.S. transportation manufacturing industry (including the manufacture of automobiles, aircraft, trucks, and ships)”

Google is making an effort to reduce its data centers’ energy bills. My concern is that having multiple search engine companies around seems as wasteful as running multiple power lines to every home. I also think that the energy consumption should be distributed across the globe, since a search engine serves the entire world and not only one country.

Yet, what will happen if Google goes belly up?

I know that this seems radical and almost unimaginable at this point, but what if one day advertisers find somewhere other than SERPs to buy ad space? Our lives are so dependent on Internet search technology that if no one can pay the cost of maintaining one, it could be a big regression with a direct impact on the world economy.

Should we do something?

Regulations

One way to deal with a natural monopoly is to turn it into some sort of government-granted monopoly. In this case it would not be a government but some sort of world organization that could enforce regulations and demands like:

  • More energy efficient data centers
  • Improving crawl techniques (Cuil claimed it had one)
  • Crawling to cover more ground – the deep web
  • Accounting governance and building cash reserves.

I know that this is radical – please remember, the purpose of this article is not to support going back to a controlled market but to make us aware of the cost, power and dependencies associated with search engines.

How to break up Google the right way?

I read somewhere that maybe Google should be broken up by the functionality it provides, like search, email, maps, etc. Another way to break up Google is to take away the crawl and leave the rest – something like the Yahoo BOSS model. The crawl could be done by a single non-profit organization funded by multiple governments (i.e., tax money), in the same way we pay for our education system (I know… it is not that great). Again, just think about it differently for a moment :)
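The “take away the crawl and leave the rest” idea can be sketched as follows: one shared page store, with competing engines differing only in their ranking functions. All names and data here are illustrative, not how Yahoo BOSS actually works.

```python
# Hypothetical shared crawl, maintained once by a single organization.
SHARED_CRAWL = {
    "a.com": "cheap flights and hotel deals",
    "b.com": "flights to europe booked online",
    "c.com": "gardening tips for spring",
}

def search(query, rank):
    """Find pages matching any query term, then order them with the
    caller-supplied ranking function. The crawl data is shared; only
    the ranking differs between competing engines."""
    terms = set(query.split())
    hits = [url for url, text in SHARED_CRAWL.items()
            if terms & set(text.split())]
    return sorted(hits, key=lambda url: rank(url, query), reverse=True)

# One "competitor's" ranking strategy: score by query-term overlap.
def rank_by_overlap(url, query):
    return len(set(query.split()) & set(SHARED_CRAWL[url].split()))

results = search("cheap flights", rank_by_overlap)
```

The design point: the expensive, duplicated part (crawling and storage) is done once, while the part where competition actually adds value (ranking) stays open to many players.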

The New Deal

I know that this is the most radical idea in this post, but if the search engine is such an important part of our infrastructure, should our president, Barack Obama, include it in his 21st Century New Deal? At the least, list maintaining a search engine as another infrastructure system – maybe the one that functions relatively the best at this point.

Summary

The points I would like you to take from this post are:

  • A search engine is more than software
  • The tasks of building and maintaining a new search engine on a large scale have an impact on society
  • Search is a global problem
  • We are heavily dependent on this technology
  • Google is a monopoly – for good and bad
  • Maybe it is time to rethink the old way of crawling the web
    • How much data is collected but never used (SERP #200)?
    • Can people replace crawling (social search engines/Twitter)?

Picture credit to my favorite artist Ron Shoshani


Eight good reasons for using headup (Firefox add-on)

January 25, 2009 6 comments

Headup – the semantic web Firefox addon

I recently started using Headup. I’ve been looking for this kind of add-on for some time now. When bits of information are missing from people’s profile pages, product specs, media, and other online content, it is crucial to combine multiple data sources to piece together a complete picture. Headup does this!

Using its smart semantic mapping of entities and relationships Headup gathers and links information from multiple online sources. To complete the picture it then personalizes the results using your presence on multiple web services like Gmail, Twitter, Facebook, Digg, etc.
Headup is not only innovative in its semantic approach to linking data, it also integrates nicely with your Firefox browser and offers a few ways to access the data it discovers. One example is Google searches: after installing Headup you can expect to see your search term annotated as “Headup:[search term]” with a thin orange underline at the top of Google’s results. When your mouse hovers over the term, a clickable circular plus-sign loader lets you open Headup’s overlay interface.


The starting point – googling eagle eye.


The complete picture – headup-ing eagle eye

I recommend you visit Headup‘s website to learn how to use it but as a whole it’s pretty intuitive and I prefer dedicating this post to the reasons you should get it:

My eight reasons for using the Headup Semantic Web Firefox add-on :

  1. Because hyperlinks simply aren’t enough – Relying merely on arbitrarily selected outbound links that send you to find info related to the page you are browsing is limiting. There are more relationships among the different entities on the page that could be leveraged to retrieve associated information. Headup already mapped out these semantic links and makes them available for you in a neat and accessible interface. The experience doesn’t end with search results.
  2. Because you can save valuable search time – both the user interface and the way information is presented require fewer clicks to complete an in-depth search through multiple search sources.
  3. Because the information comes to you – Search can be an exhausting task. In many cases it involves either recursive drilling down through multiple levels, or traversing a search vertical up and down for additional information. Google itself is aware of this potentially laborious process and is making an effort to bring associated information to the first SERP: recently when I googled the term “movie” I got three results that were movies playing in theaters in my area. Headup provides multiple data types by default: using Headup on “Pink Floyd” will get you a summary of the term and the band’s albums, let you see related photos, listen to the band’s songs while reading their lyrics, and find related news, blogs, web activities, and much more.
  4. Because it brings down the chances you’ll miss key information – “Headuping” people is a terrific way to learn more about them. I “Headup-ed” my friend Bill Cammack on Facebook and immediately discovered that he’s a video editor with an Emmy award to his name. In this case the extra information regarding the Emmy award was brought in from Bill’s LinkedIn profile.
  5. Because you can learn and find information you didn’t expect – If the example from my previous item wasn’t proof enough, here’s another example: I ran Headup on “Kill Bill” (what can I say? – I’m a Tarantino fan) and discovered this blog post published today (1-2-2009): “More Kill Bill on the way” – Tell me this isn’t cool!!
  6. Because it’s personalized – When configuring Headup after download, or later via the “Settings” option, you can choose to connect Headup to the online services you subscribe to. Headup connects to a wide variety of web services like Gmail, Delicious, Twitter, Facebook, FriendFeed, Digg, Last.fm, etc. The information Headup retrieves from these services allows it to personalize the info it discovers for you: if you Headup a firm, you’ll see which of your friends work there. If you Headup a band, you’ll see who in your network likes them. This is another example of how Headup is not just a search tool but a browsing experience.
  7. Because you don’t lose your starting point – Headup is designed as an overlay window that keeps your starting web page, and anything else you have open on your desktop, visible beneath the interface’s Silverlight frame. Inside Headup you can drill down endlessly, but when you’re done you are back where you started.
  8. Because your information is safe – from Headup’s Privacy Policy – “In plain English”:
“We here at Headup treasure our privacy and that’s exactly why we made every effort to create a browser add-on that would live up to user privacy standards we would be comfortable with. We’d be embarrassed to let you download an add-on we wouldn’t download ourselves.”
 
**You don’t need to sign-up for using Headup and your information is stored on your machine only**

 **Bonus: one additional reason – because on some pages it ROCKS! Try it on Last.fm and you’ll see why it ROCKS… literally! By the way, the Headup user interface lets you watch videos and listen to music like a regular media player.

My questions for the Headup team

I plan on occasionally checking Headup’s blog for updates. At this point Headup supports Firefox on Windows and on Macs but I know that they plan to support more browsers in the future. I think that at this point the key thing to focus on is that the Headup concept works.

I do have a few questions for the Headup team:

  1. Do you plan on adding vertical derived classifications? I can see some use cases for health (and maybe even for software development). Just as Headup was able to map out “Actors”, “Films by the same director”, “Web Activities”, “Related News”, “Trailers”, etc. for a “Film” type entity, I can see it applied in a similar fashion to a “Health” type entity – retrieving things like “Cases”, “Treatments”, “Clinics”, “Pharmaceuticals”, “News Groups”, etc.
  2. Do you see enterprise usage for Headup? I still need to give it more thought, but having Headup in my email could be cool. Another possible implementation is supporting corporate CMS tools.

Epilogue – Is Headup’s “Top Down” approach the face of the future Semantic Web?

The Semantic Web promises to make information understandable by machines. If you follow Alex Iskold’s excellent series on the Semantic Web on ReadWriteWeb, you are aware of the multiple approaches to making this happen. The top-down method implemented by Headup helps bring the future to us a little sooner. I think Headup is giving us a taste of what future browsers will look like in an age when they, and other tools, will be able to understand more than just hyperlinks. When using Headup it feels like I’m doing more than “browsing” or “searching” – I feel like I’m experiencing a new web!

One last thing: using Headup on some objects didn’t yield complete results. Don’t judge them too harshly for it; instead, please focus on the concept. My experience with Headup so far is that in most cases the relevancy of the information provided was more than reasonable. For a small company just out of alpha, what has been accomplished in the short time the company has existed is impressive, and it promises that improvements will come fast.

I’m using Headup, and I gave you the eight reasons I have for doing so. If you are using it too, I’d be happy to hear why…

Web presence – piecing together an Identity

July 31, 2008 3 comments

People leave out information all the time: no blog About page, no employer name, no picture, no blogger name, a Twitter account without a web page link. Sometimes a simple link connection is not enough to piece it together. Your network can help people find connections – or confuse them, if your connections are spread across more than one social network and account. In some cases this is done intentionally and no harm done, but when it happens by mistake it can lead to lost traffic and opportunities. From what I see, and from my own experience, if the information is not right there, only a few will bother looking for it. Isn’t bridging these information gaps and bringing these connections forward the role of the new social graph search engines?

This post will cover:

Finding out how objects are connected across multiple web applications. Overcoming cases where the information falls between the web’s cracks or is deliberately missing. Looking beyond the trivial context of web links (URLs), friends, fans or followers.

  • Understanding the problems – some examples
  • Looking at what tools can help piece the missing information together
  • Bringing it forward – making it easily available when needed

Understanding the problems

Example 1 – me and my blog:

I did not add my blog URL to my LinkedIn profile. I did not add my employer’s name to my blog’s About page. I did this intentionally; I like to keep them separate for now. Omitting these two pieces of information seems to have worked so far: this missing information is not bridged by any social graph search engine that I’ve seen. It is ironic, but a simple search of my name on Google will reveal the connection (warning: there are a couple more Keren Dagans out there – neither has anything to do with software or technology). The connection in this case between me, my blog and my employer is my identity (similar profile info, such as name, picture and location).

Example 2 – disconnected social networks:

I keep my Facebook and LinkedIn networks separate; I only have a couple of overlapping relatives. I use Facebook for personal connections and LinkedIn for professional ones. I sometimes use Facebook to post my new blog posts. It seems that social search engines can link my friends across the networks, yet, again, there is no association between me and my blog. Some information inside one’s activity could help make the connection.

Example 3 – multiple presence using disconnected accounts:

Case 1: My “personal” Twitter account is @kerendg. The other day I submitted a query to Twitter Search, looking for references to my other Twitter account, @BlogMon. I found out that Stowe Boyd (@stoweboyd) had asked “who is running BlogMon?”. The link to my blog is on the BlogMon web page, and can easily be found using Twhirl or Twitter Search. I don’t know if he ever got an answer to his question. Soon after, I started following @stoweboyd from my @kerendg account (I don’t follow from @BlogMon). He has not contacted me or followed me on Twitter to this day – maybe because it is hard to make the connection, or maybe it is just not that important. Your blog is an important piece of the new identity (FYI – WordPress uses your blog URL for your OpenID).

Case 2: Some entrepreneurs run both the start-up’s blog and their own blog. Only in a few cases is there a link from the corporate web site to the entrepreneur’s (see Mashery -> Blogroll for the Praxis blog, run by their CEO, Oren Michels – and even that is not easy to connect). The same story plays out with Twitter accounts (one for the business and one for personal updates). In LinkedIn an organization is a connection; the same should be true across networks, in addition to your role.

Example 4  – blog action:

Case 1 – comments: I don’t get too many comments on my blog; I can only wish for more. Yet I did get some comments from people with a vast web presence. Is this some kind of connection? Did I Digg/save one of their blog posts? Did I mention their companies? If you give me your attention, I see it as another type of connection.

I also leave comments occasionally, mostly on the same 4 or 5 blogs. This information can help in understanding my preferences. Similar interests are another piece of the puzzle. Past activity on my blog too. A frequent reader is yet another type of connection with the blogger.

Case 2 – traffic source and blog reaction: I look at the WordPress Blog Stats page. The Referrers section shows traffic coming from two types of sources:

  • Unidentified – there is no way to track it back to the person who looked at my web site.
    • Traffic from search engines
    • Traffic from comments that I left (with no reply)
    • Traffic from “similar post” links
    • Traffic from incoming link to my blog from other blogs
    • WordPress tags
  • Identified – can be tracked back, with some luck
    • Traffic coming from some sort of a network like Twitter, Jaiku, Digg, reddit, StumbleUpon, Twine, Pijoo.

Since there is nothing to be done about the first type of traffic (non-invasively, at least), there is nothing to add here. But for the second type: let the search begin… trying to find the source. Who dugg my blog? Who saved it on reddit? Is there a reply to my comment (as in TechCrunch comment threads)? Who mentioned it on Jaiku or Twitter? The information is scattered all over the net.
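The triage described above can be sketched as a small routine. This is only a toy illustration using the networks named in this post – nothing WordPress actually exposes, and the function name and domain list are my own invention:

```python
from urllib.parse import urlparse

# Networks where a referrer can, with some luck, be tracked back to a person.
IDENTIFIED_NETWORKS = {
    "twitter.com", "jaiku.com", "digg.com", "reddit.com",
    "stumbleupon.com", "twine.com", "pijoo.com",
}

def classify_referrer(referrer: str) -> str:
    """Label a referrer URL as 'identified' or 'unidentified'."""
    host = urlparse(referrer).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Search engines, comment links, "similar post" links, incoming blog
    # links and WordPress tags all fall into the unidentified bucket.
    return "identified" if host in IDENTIFIED_NETWORKS else "unidentified"

print(classify_referrer("http://twitter.com/kerendg"))    # identified
print(classify_referrer("http://www.google.com/search"))  # unidentified
```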

Most of my Twitter followers came through my blog. I added as friends people who acted upon my posts in other networks. I care about the original reaction. Even when it is actually possible to track a reaction back to its source, it involves a lot of legwork. A blog reaction is another type of connection. As I wrote before in “What is a blog reaction these days?”, I don’t mean “blog reaction” in the narrow sense of someone writing a counter blog post (a pingback is a trivial link).

Some of the tools that are available today for piecing it together:

  • Google search (Web) – searching for the person’s name and organization – this is often enough to discover presence across multiple networks.
  • LinkedIn – searching for the person’s name and organization – looking at both profiles and cross-referencing the other details to make sure this is the same person.
  • Google Alerts – add your link, blog name, your name, and Twitter @account – looking for references.
  • Technorati – looking for blog info and fans  
  • Twitter – this is a process
    • If the information is coming from the WordPress Referrers list, you can follow it back (unless it is coming from your own account and not worth following)
    • If not – you can search for your link as-is, but it is better to also try its shortened forms from TinyLink, http://is.gd/, http://snurl.com or other URL-shortening services, and run them through Twitter Search. Tip: select “in all languages” – I found that if this setting is off, a link search may return nothing even when the update is in English (the default).
    • Use Twitter Search to search your name, and @account too.
  • Delver – network graph – this service can save some of the legwork of checking multiple networks.
  • Jaiku – same as Twitter. What is nice about Jaiku is that Google Alerts picks up on the conversations.
  • Flickr – some people, like me, don’t have an account but do have tagged pictures submitted by friends.
  • WordPress Referrer – this is the starting point
  • Digg, reddit, del.icio.us and the like – go to your profile page and see who dugg/saved/rated your post
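Taken together, the Twitter steps above boil down to running a small set of queries per post. A minimal sketch – the shortened URLs have to be created up front with the services mentioned, and every input here (URLs, names, accounts) is a hypothetical example:

```python
def build_search_queries(post_url, shortened_urls, name, accounts):
    """Collect the strings worth running through Twitter Search for one
    blog post: the link as-is, its shortened forms, your name, and
    your @accounts."""
    queries = [post_url]                # the link as-is
    queries.extend(shortened_urls)      # each shortened form
    queries.append(name)                # your name
    queries.extend("@" + account for account in accounts)
    return queries

# Hypothetical example inputs:
print(build_search_queries(
    "http://webnomena.com/some-post",
    ["http://is.gd/abc"],
    "Keren Dagan",
    ["kerendg", "BlogMon"],
))
```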

Do you know about more tools?

Now, wouldn’t it be nice if there were one tool that did all that and brought this information forward when it is most relevant?

Bringing it forward

If this information could be gathered by a single tool, my ideal solution would be something like SnapShots: when I click on any account or link, anywhere on the web, present me with the graph. Show me this entity’s web presence. Show me how we can connect: this person’s blog, their other accounts. The information could be context-sensitive – e.g. if I’m on Twitter, show me all Twitter accounts for the same entity, and show me first whether we are connected through Twitter.

Alternatively, send me alerts about subtle semantic links to me and my blog. Something like:

  • This individual
    • from this location
    • working at
    • in this role
    • owns this blog
    • x degrees from you on network Y
  • Acted:
    • visited your blog before
    • looked at this post on your blog
    • viewed your profile on LinkedIn
    • follows your other Twitter account
    • responded to your comment
    • dugg your blog post.
  • Options
    • Do you want to make a “trivial” link and connect?
    • Go to their blog
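As a data structure, one such alert might look like the sketch below. The class and field names are my own invention, purely illustrative of the fields listed above – not any real service’s schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SemanticLinkAlert:
    """One alert about a subtle semantic link to me or my blog."""
    person: str
    location: Optional[str] = None
    employer: Optional[str] = None
    role: Optional[str] = None
    blog_url: Optional[str] = None
    degrees_away: Optional[int] = None  # x degrees from you on network Y
    network: Optional[str] = None
    actions: List[str] = field(default_factory=list)
    options: List[str] = field(default_factory=list)

# Hypothetical example alert:
alert = SemanticLinkAlert(
    person="Jane Doe",
    blog_url="http://example.com/blog",
    degrees_away=2,
    network="LinkedIn",
    actions=["responded to your comment", "dugg your blog post"],
    options=["Do you want to connect?", "Go to their blog"],
)
print(alert.person, alert.degrees_away)  # Jane Doe 2
```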

Summary

In this post I tried to explain that if we want to build a complete social graph it is not enough to look only at the “trivial” links – that is just the beginning, and it will not present a full picture of one’s web presence and identity. To construct a useful graph we need to look at other types of links. These links are scattered across multiple social networks and services, and they are part of the enhanced meaning of one’s profile attributes, including activities and relationships.

I see three steps in moving in this direction. The first is piecing together someone’s identity by drawing on information from multiple sources. The second is using this information to find new ways that people are connected, i.e. building the complete, rich social graph. The third is presenting it when relevant.

I did not cover the use cases for having such information at hand. Off the top of my head I can think of a few:

  • It could be handy for QA-ing your web presence – especially if the entity is a business
  • It could be handy for web-sites to understand their crowd
  • It could be handy for business development

Six month check-up – large knowledge gain and some changes!

June 30, 2008 1 comment

After 6 months of blogging I decided that I want to continue doing it, so I took one small step forward.

I purchased a new domain, www.webnomena.com, which now redirects to my blog on the WordPress hosting service. I changed the blog’s name from UsingIT to Webnomena (Web Phenomena) because I think it is more consistent with my writing (you can guess that Webnomenon.com was taken). I will continue sharing my observations about what I think is happening in the web world, with a focus on search engines, monitoring tools and the blogosphere.

At this time in my life I don’t yet find the time to “decorate” my blog or take care of the hosting myself. I would love to, and the limitations of the WP hosting service are a real drag (no iframes, no Flash), yet I prefer to spend my time exploring the web and blogging.

A few things that I learned in the past 6 months:

  • About blogging:
    • Blogging is learning – since I started I learned a ton about companies, products and technologies.
    • Blogging helps me at my work – when looking for a new “skin” for the web app I now expect nothing less than a Web 2.0 look and feel (lots of JavaScript). Netvibes is an inspiration.
    • Blogging is fun – through this blog I had a chance to talk with people all over the world.
    • I can’t blog from thin air – I need to play, explore, interact and develop to come up with new posts.
    • Treat comments like blog posts! – The power of a good comment is amazing. A good comment stands out. I spend the same amount of time writing a single comment as I do writing a blog post.
    • I’m not an artist – I don’t write like some of the great bloggers/writers out there. I just hope that you find the ideas and information on this blog useful.
  • About searching.
    • We have plenty of new tools to find information – things that took me a long time to find had, I discovered more than once, already been saved by lots of other people on del.icio.us. Some of the other tools you can find in this post I wrote recently: Is there a way around Google?
    • The business world is not really here yet – I think there is a need for monitoring tools and analytics that show where the right places to deliver your messages are. I think there is a place for a tool that will help PR and Marketing – a tool that constantly gathers information that can be turned into action (actionable). Internet marketing is not just web analytics. I have not seen a company/tool that mashes it all together yet, and I’m not aware of an enterprise that uses a tool like this.
  • About the web
    • Empowerment is one of the keys to viral adoption – see blogging; wikis; comments, ratings and annotation; Twitter’s following model vs. friend requests; Seesmic, Qik and video streaming from the phone to the web; APIs/web services and mashups. Give people the power and they’ll use it!
    • What are the very next things? – let’s see if 6 months is enough to ask these questions.
      • Is it location-aware social networks and search technologies? – the kick-off is July 11th?
      • Is it video, in addition to photos, transmitted from the cell phone to the web? – I was actually never excited about taking pictures with my mobile phone (the rest of the world was). I’m very excited about shooting videos and transmitting them in real time to the web. It is not cheap yet and works well only over Wi-Fi, not over the cell network, but just as with blogging, it empowers every one of us to become a field journalist. I think we will see more Qik-like web sites proliferating or integrating with existing social networks.
      • Will Adobe AIR take off? – this will help web developers play on the desktop. I’m curious to know whether companies with existing products are considering this technology for a rewrite.

 

Now it is your turn…

Somewhere over the tag cloud…

 

Some blogs relay more than their content.

Some blogs emit all sorts of personality characteristics.

Sometimes you can get a hint of it from the tag cloud, yet in most cases it is not there to be found – it lies beyond the tag cloud. Only after reading several posts on a blog, and the comments, do you know what kind of experience to expect. It is as if you can predict how the “engagement” will feel and what kind of “taste” it will leave you with afterward.

My first example is TechCrunch. This is an excellent blog about start-up companies and technology.

Techcrunch - top tags

 

As you can see from the tag cloud, the focus is on product and company profiles. What you don’t see/feel is the tone, the ambition, the culture of hard work (the sweat), how competitive this blog is, the “ready to fight” stance, and more.

Yet, after reading this blog for some time, I know what to expect when I go there.

One interesting fact – and maybe the key here – is that you get a lot of this feeling not just from the blog posts but from the comment section. It is the way the communication between the readers and the writer plays out that reveals it. If you want a quick demonstration, go read Surviving the Net by Steve Gillmor – read the comment section too to see what I’m talking about.

Maybe if there were a tag cloud for the comments section we could see/feel the blog in its entirety.

My second example is Chris Brogan – another excellent blog and blogger.

ChrisBrogan Top Tags

From the tag cloud you can see the focus on social media promotion for business. You can get a hint about the personality from the howto tag: Chris is willing to share a lot of his knowledge and educate the rest of us about the trade. You can also see that he cares about writing by looking at the writing and article tags. Yet these are just hints, and I looked for them knowing what to expect after reading this blog for a while now.

In this blog, too, the comment section reveals the crowd that is lured to the blog and the style of communication. I always leave this blog with a “good taste” in my mouth.

I could provide more examples, but I think you get the point. Some blogs take energy away from you while providing important information in return. Some blogs charge you with both energy and valuable information, and some blogs are great to “cuddle” with.

I don’t know if there is a way to mark this (a personality tag) or search for it. If there is, I can only suggest not to skip the comment section in this kind of analysis. Maybe these guys from Sweden (Jon Kågström and Mattias Östmar from PRfekt), with their attempt to analyze bloggers’ personalities, will find the answer.

For now we can only share this with others using social network tools like Twitter, FriendFeed, Plurk and the rest.

Please share your examples in the comment section.

"The Blog Search" – single exit strategy? Thinking outside the search (edit) box.

I see new start-up companies still working on new search engines and search technologies. I can’t see the reason for, or the need for, yet another one when it comes to searching web sites. I do see a business need for new ways of searching blogs, under some conditions.

Google is more than good enough for me. Most of the time I manage to find what I need with a Google search.

There are many basic things that Google does great besides text search, like finding addresses, directions, maps and telephone numbers. They have built highly scalable spiders, data centers, AdSense technology and many other essential capabilities.

Do you really think that everyone can afford to build it?

The only exit strategy that makes sense for these companies is one contingent upon a buyout by one of the search giants like Yahoo, Microsoft, Ask.com or Google.

If I’m right, I would suggest these start-ups not focus on the amount of data indexed by their new search engine, because no one will ever use them under their original brand name. They will not be able to wave great statistics around to convince someone to buy them, because their chances of reaching a wide audience are slim.

On the other hand, they should focus on building technology that can be patent-protected; alternatively, they should find a niche or geography where Google is very weak.

I can only suggest they walk directly to one of the large search engine companies, knock on the door, offer the new technology for sale – and then spend their effort on another technology front.

                                     ExitStrategy

One place where it is still worth making an effort in search technology is the blogosphere. Google is not so great over there, as I wrote here. Technorati is a fantastic blog search engine, but it is mainly focused on English-speaking blog readers and bloggers.

One blog search engine with a promising future is Twingly. This is an example of a blog search engine that is strongly invested in other languages and has deep roots in European countries. See posts on TechCrunch here and here.

I recently signed up for the private beta and I like what I see so far. I can’t speak much about the search quality yet, but I can tell you about a few things I liked. The first is the spam-free search approach: a work in progress in which information is found starting from a few reliable initial sources, then spreading out through links from those known bloggers, building a “white” list of links to index. The second is the strong support for languages other than English. Not all bloggers write in English – there are more than 140,000 registered Swedish blogs and tons of French bloggers, to name a few – so there is a need. Even in the US, the big melting pot, there is a great market for multi-lingual blog search engines (in Spanish, for example). What I really liked in this beta is the TechPlan section, where people can offer suggestions for product improvements and features and let others vote on them. This is a great way to collect feedback, and I can’t see why not to keep it even after the beta is done. There are more capabilities, such as voting on blog posts and prefixing a search phrase with operators like link:, site:, blog:, lang:, tag:, and tspan: to qualify target searches.
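Those operators read naturally as simple prefix filters. Here is a minimal sketch of how such a query could be split into filters plus free text – my own guess at the behavior, not Twingly’s actual parser:

```python
# Operators a Twingly-style query might support (my assumption).
KNOWN_PREFIXES = {"link", "site", "blog", "lang", "tag", "tspan"}

def parse_query(query: str):
    """Split a query into {prefix: value} filters and leftover free text."""
    filters, free_text = {}, []
    for token in query.split():
        # partition splits only at the first ":", so values like
        # "link:http://x" keep everything after the prefix intact.
        prefix, sep, value = token.partition(":")
        if sep and prefix in KNOWN_PREFIXES and value:
            filters[prefix] = value
        else:
            free_text.append(token)
    return filters, " ".join(free_text)

print(parse_query("lang:sv link:example.com kittens"))
# ({'lang': 'sv', 'link': 'example.com'}, 'kittens')
```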

So, where should Twingly invest their effort?

I have a few ideas:

Be Cultural

Don’t try to be American. Europe has a classic culture that could be embedded in the search results. Alternatively, Twingly could create country-specific pages the same way Technorati created “what’s percolating in blogs now” – e.g. what’s percolating in [country name here, e.g. Norway] blogs now?

Be Social

Find a way to bring bloggers from around the world together. Match bloggers from different places by area of interest and create a place for them to interact. Create events centered in Europe for bloggers to meet. Use a Twitter-like follow model to let bloggers find one another.

Be digital – build the best blogosphere system-of-record

As I wrote here, strive to monitor and collect any available data about blog posts, blogs and bloggers: measures, profile information, areas of interest, methods, preferred media, activities, patterns. Make sure to organize the data in a way that is useful for both the general public and businesses.

Whether Twingly grows on their own or not, they still have a high chance of getting on Google’s radar :)
