I see more and more people raising the question of whether we need blog search engines, especially when Google is doing such a great job of finding good content from blogs as well as from web sites. It seems that Google itself is not investing much in its blog search either. So, in this post I will explain what I think the duties of a blog search engine should be and why I still see a need for one.
Blog search engines (should) serve multiple purposes
- Finding great bloggers, blogs and blog posts
- Recognizing great bloggers, blogs and blog posts – rank.
- Categorizing blogs and bloggers in multiple ways, not limited to content type. Categorize blogs by their objectives: personal blogging is not the same as a corporate blog or professional bloggers, subject experts, politics, go green, artists or others. It is not just about what the blogger writes about but also about what the blogger is trying to achieve.
- Monitoring blog and blogger progress – is this blog alive? a shooting star?
- Web-now – see Twitter Search Trending Topics, Twingly’s Hot right now or Technorati’s what’s percolating in blogs now
- Alerts – a list of new blogs in a given category that are doing well
- Community building – increasing cooperation among bloggers (e.g. you should read this blog)
What do we need to know?
- The top bloggers in a category
- The top blogs in a category
- The top blog posts in a category
Who needs it?
- The readers – to know what to read, what is going on in real-time
- The blogger
  - To present a case to a sponsor
  - To know whom to look up to
  - To see and share the blogger’s progress
- The business
  - To know where to buy ad real estate or whom to sponsor
  - PR – where to spend my effort effectively
The challenges of blog search engines today.
When ranking by counting reactions, the service needs to separate human actions from automated (bot) ones in order to be accurate. So far this is not working well, and it adds another question mark around the validity of blog search engines.
Here are some examples of both:
- Human reactions
  - A blog post reacting to another
  - An update on Twitter or Jaiku
  - Digging on Digg
  - Submitting to a social bookmarking site
  - Posting on a social network
  - Bloggers’ community
- Bot reactions
The number of sites where people can post reactions to blogs is growing faster than the crawling capabilities, and some of these sites do not offer access to crawlers.
The service should also remove the “me” links from the count, i.e. links from all the social objects under the same owner.
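A minimal sketch of that counting rule, under stated assumptions: the `Reaction` record, its fields and the `is_bot` flag are hypothetical names invented for this illustration, and the hard part (deciding whether a reaction came from a bot) is assumed to happen somewhere upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One reaction to a blog (hypothetical record, for illustration only)."""
    source_url: str    # where the reaction was posted
    source_owner: str  # who owns that page or profile
    is_bot: bool       # assumed to be set by some upstream bot detector


def count_reactions(reactions, blog_owner):
    """Count reactions that are human-made and not posted by the blog's own owner."""
    return sum(
        1
        for r in reactions
        if not r.is_bot and r.source_owner != blog_owner
    )


reactions = [
    Reaction("https://example.com/post", "alice", False),
    Reaction("https://twitter.com/kerendg/status/1", "kerendg", False),  # a "me" link
    Reaction("https://example.org/spam", "spambot", True),               # a bot reaction
    Reaction("https://example.net/review", "bob", False),
]
print(count_reactions(reactions, blog_owner="kerendg"))  # 2
```

Only the two reactions from other humans are counted; the owner's own link and the bot link are dropped.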
A couple of thoughts
Maybe someone could think about another way to rank blogs and bloggers. Measuring traffic (as Alexa does) is probably more accurate. Traffic is relative to the category: I assume that a blog about technology will get more traffic than a blog about biology. So the rank should be within a category and not across all blogs (or at least not only across all blogs).
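Ranking within a category could look roughly like the sketch below. The blog names and traffic numbers are made up, and `rank_within_category` is a hypothetical helper, not something an existing service offers.

```python
from collections import defaultdict


def rank_within_category(blogs):
    """blogs: list of (name, category, daily_visits).

    Returns {name: rank}, where rank 1 is the most visited blog
    in that blog's own category.
    """
    by_category = defaultdict(list)
    for name, category, visits in blogs:
        by_category[category].append((visits, name))

    ranks = {}
    for entries in by_category.values():
        # Highest traffic first, so it gets rank 1 inside its category.
        for position, (_, name) in enumerate(sorted(entries, reverse=True), start=1):
            ranks[name] = position
    return ranks


blogs = [
    ("tech_a", "technology", 50000),
    ("tech_b", "technology", 20000),
    ("bio_a", "biology", 3000),
    ("bio_b", "biology", 1000),
]
print(rank_within_category(blogs))
```

In this made-up data the leading biology blog gets rank 1 in its category even though it has far less traffic than either technology blog.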
In my opinion there is a need for a blog/blogger search engine, but the emphasis of the search capability should be less on finding content (leave that to Google) and more on discovering leading blogs and bloggers.
It does not need to be a free service, at least not for businesses. A premium or sponsored account model could work as well.
What is BlogMon?
Here you can find results from a small application I wrote for monitoring Technorati rank changes over time. I find these blogs and bloggers in two ways:
- The first is the same way you add links to your favorites when you like the content. I browse, I like it, I add it to the list of blogs to scan. If you follow me on Twitter @kerendg and your blog is listed in your Twitter About section I will be tempted to monitor it.
- The second is programmatically, by “crawling” the data in a way that helps me find more great resources.
I see it as having “dynamic Favorites” – a list of blogs that are worth getting back to. This is a great way to find who is consistently getting the crowd’s attention. I know that it seems simple, but I have my share of complexity dealing with multiple constraints and “interesting” data.
The patterns reported using Twitter (on a daily basis)
I post daily results to my Twitter @Blogmon account
- Rank Change: more than 25% gain in rank since baseline and at least 7 scans (snapshots)
- New High: more than 35% gain in rank since the first Rank Change report. The application reports the first time, and then, if the blog reaches a new high, the tool will report only every 30 days since the last report.
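The two rules above can be sketched as follows. This is an illustration and not the actual BlogMon code: it assumes that a "gain in rank" means the Technorati rank number got smaller (a lower number is a better rank), that the percentage is measured against the reference rank, and all function and parameter names are made up.

```python
from datetime import date, timedelta


def gain_percent(reference_rank, current_rank):
    """Percentage improvement; positive when the rank number got smaller (better)."""
    return (reference_rank - current_rank) / reference_rank * 100


def is_rank_change(baseline_rank, current_rank, scans):
    """Rank Change: more than 25% gain since baseline and at least 7 scans."""
    return scans >= 7 and gain_percent(baseline_rank, current_rank) > 25


def is_new_high(first_report_rank, current_rank, today,
                last_high_rank=None, last_high_date=None):
    """New High: more than 35% gain since the first Rank Change report.

    The first New High is always reported. After that, a report needs a
    better rank than the last reported high and at least 30 days of quiet.
    """
    if gain_percent(first_report_rank, current_rank) <= 35:
        return False
    if last_high_date is None:
        return True
    return (current_rank < last_high_rank
            and today - last_high_date >= timedelta(days=30))


print(is_rank_change(10000, 7000, scans=7))        # 30% gain after 7 scans
print(is_new_high(7000, 4000, date(2008, 9, 1)))   # first New High
```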
The information reported to the BlogMon Results web site
The A-list consists of bloggers with a Technorati rank under 1000 (in other words, the subset of the top 1000 bloggers that I monitor).
I publish monthly results to BlogMon Results for the A-list, and for the rest here. The rest are bloggers that moved up in rank by more than 50% since the baseline (the first time I started monitoring them).
Speed: calculated as the percentage gain in rank divided by the number of days it took to achieve it.
Perfect records – these are the rare blogs that are moving up consistently. The conditions are at least 9 scans (snapshots) and more than 20% gain in rank.
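Here is a rough sketch of both calculations, with a few assumptions spelled out: a gain means the rank number went down, "moving up consistently" is read as "never worse than the previous scan", and the function names are invented for the example.

```python
def speed(baseline_rank, current_rank, days):
    """Percentage gain in rank divided by the number of days it took."""
    gain = (baseline_rank - current_rank) / baseline_rank * 100
    return gain / days


def is_perfect_record(ranks):
    """ranks: one Technorati rank per scan, oldest first.

    Assumed reading of "moving up consistently": no scan is worse
    (a larger rank number) than the scan before it.
    """
    if len(ranks) < 9:
        return False
    never_dropped = all(later <= earlier
                        for earlier, later in zip(ranks, ranks[1:]))
    gain = (ranks[0] - ranks[-1]) / ranks[0] * 100
    return never_dropped and gain > 20


print(speed(10000, 5000, days=25))  # 50% gain over 25 days -> 2.0
print(is_perfect_record([1000, 980, 950, 900, 880, 850, 820, 800, 780]))
```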
You’ve got nothing to lose – the only way is up^
Joining in – if you’d like me to monitor your blog, please submit your name and blog URL in the small form on the BlogMon home page. EMAIL IS OPTIONAL!!! – I don’t need it for monitoring your blog. I do need your blog URL. Please add it in the Comments section.
Your blog needs to be claimed on Technorati.
I only report if your blog is going up. So, if you are not doing so great right now, your blog will not appear in the statistics, and there is nothing to lose by joining in. If the blog has not gone up for more than 30 days, I will slow down the scans for this blog from every other day to every week. And if your blog is making good progress, you can show it off (or charge your sponsors more).
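The scan slow-down rule is simple enough to write down as a tiny sketch. It assumes the tool remembers the date a blog last improved; `scan_interval` and its parameters are hypothetical names.

```python
from datetime import date, timedelta


def scan_interval(last_improvement, today):
    """Every other day normally; weekly once a blog has not gone up for over 30 days."""
    stalled = today - last_improvement > timedelta(days=30)
    return timedelta(days=7) if stalled else timedelta(days=2)


print(scan_interval(date(2008, 8, 1), date(2008, 8, 20)).days)  # 2
print(scan_interval(date(2008, 7, 1), date(2008, 8, 20)).days)  # 7
```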
I’m using Weebly to build the Blogmon Results web site. It is a great service that saves me a ton of time. I use Google Docs to generate the spreadsheets. I can publish them online and then embed the table as an iFrame inside the web pages – another great time saver.
- I monitor 794 blogs. I scan 252 blogs weekly. The rest every other day.
- Most of the blogs rank between 1 and 100,000
- In August I had 70 blogs that were making good progress (more than 50% gain over the baseline rank).
- In August I had 22 blogs on the A-list
Any ideas how to make it more useful and interesting?
I still have a lot to learn about semantic search engines and the semantic web, but I have a few early observations about their direction.
I believe that semantic search engines will do a lot of work in the back-end to help people with simple queries find deep meanings.
I don’t see my dad going online and typing something like “where did the US president’s kids go to school?”. Not because of the remote subject but because of the complexity. Most people type one or two words at most into the search box. Maybe in the future he will ask such a question using his own voice (see Nuance).
So, my assumption is that semantic search engines have different roles:
- In the front-end
  - Research – a tool to find patterns, trends and sets. This tool should provide multiple visual ways to present the results (map, heat map, timeline, bubbles, tag cloud, 3D). Use case: I want to expose these and show them to the world in a clear visual way (example). I would like to present them on my blog or web site (iFrame/Widget it). This is a great tool for researchers, news reporters, and bloggers. TechCrunch, ReadWriteWeb, and others do present such data on their blogs periodically – example. Again, this is not for the average user.
  - Time saving: Operator is a Firefox extension that adds the ability to interact with semantic data on web pages, including microformats, RDFa and eRDF. You can use it to extract and export contacts from a web page to your contact list, and event information to your calendar. It is still hard to find sites that support microformats today (examples that do: Technorati – search page results, Google Maps, LinkedIn – contact), but maybe Dapper will change this. You can now take an HTML page and automate adding microformat classes using this tool – let the tool find recurring elements in the original page. Dapper does the “semantic work” for you. For more about this task see this smartly titled blog post: Does Tim Berners-Lee’s Blog support Microformats? By the way, Operator is a great way to check which web sites support microformats (use the Operator sidebar: from the Firefox menu bar choose View->Sidebar->Operator).
- As a back-end operation
  - Organize – I want my data to be linked using semantic techniques – see Twine.
  - Recommendation (discovery and sharing) – based on mining my data (and others’), what can you tell me that I don’t know that I don’t know?
  - Time saving: people have trouble with tags. In my opinion, the main reason is that it is hard to come up with forward-thinking, useful tags that will help to find data later, match other tags and help search engines drive more traffic to the article. I can spend a good few minutes thinking about whether I should tag something as social network, social networks, or social-network. So, automating this process will help in many ways: more tagging, quicker tagging, and more common tagging, which means more links and associations. Twine today suggests tags that I can pick from, yet I think that this is just the beginning.
  - Money – in a way Google does this in AdSense – matching content and target readers with ads. If you manage to do a better job in this area there is great potential. There are more places to put ads: people’s profile pages (not just on Facebook – I have a ton of profile pages across many social services, sigh), maybe in comments.
  - Passive search – Alerts – at this point Google allows you to set alerts for keywords and Delicious for tags. This is another way (as Chris Brogan says) to listen to the web. Maybe there is a way to improve on exact keyword matching with associated content, the same way semantic recommendation engines work. This could also be useful for organizing the alert results – I get alerts about some keywords in one long list. I’m not sure how many people are using alerts, but I think that this is a great tool. This is, by the way, another marketing channel – if I’m telling you what I’m looking for, why don’t you “help me find it”? Alerts can become more sophisticated by looking for patterns – example: more than 10 mentions of a word/phrase/product/company/my competitor in a day/week/month. I know that you can check that today on Google Trends, but only for high volume terms, and it is not set up as an alert – if you don’t look for it, it will not come to you.
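The pattern-style alert from the last bullet (more than 10 mentions of a term in a day) could be sketched like this. It assumes the mentions were already collected as (day, term) pairs by some crawler or feed; `mention_alerts` is a hypothetical helper, not a Google Alerts or Delicious feature.

```python
from collections import Counter
from datetime import date


def mention_alerts(mentions, threshold=10):
    """mentions: iterable of (day, term) pairs, one pair per mention found.

    Returns a sorted list of (day, term, count) for every term that was
    mentioned more than `threshold` times on a single day.
    """
    counts = Counter(mentions)
    return sorted(
        (day, term, n)
        for (day, term), n in counts.items()
        if n > threshold
    )


mentions = (
    [(date(2008, 9, 1), "twine")] * 11     # crosses the threshold
    + [(date(2008, 9, 1), "twhirl")] * 10  # exactly 10, so no alert
    + [(date(2008, 9, 2), "twine")] * 3
)
for day, term, n in mention_alerts(mentions):
    print(f"{day}: '{term}' was mentioned {n} times")
```

The same grouping works per week or per month if the day in each pair is replaced with a week or month key.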
Finally, Google today does a lot of “microformat work” and some semantic discovery too, all behind the scenes. You don’t ask for it explicitly, but Google still delivers it. When you search for ReadWriteWeb, for example, you get the Contact, About, and Products information along with the site link. If you search for “movie near Lexington, MA” you’ll get what you need. This, too, supports my assumption that you are not going to see sophisticated queries submitted by most people, but the semantic web will be able to come up with better and more relevant answers to simple queries with complex meanings.
I wrote a guest post on Pravda on Media, Technology, and Rebel Filmmaking.
For more information about Kfir Pravda please read here.
The reason that I’m using Twhirl and not the Twitter web page is similar to the reason why people install desktop applications for instant messaging: we need it right here and now, working asynchronously. I don’t want to keep looking for my Twitter home page or keep pressing F5.
What is great about Twitter (it almost feels corny to write another post about it, don’t you think?) is that conversations spark quickly and spread wide and far. See these two examples from Twitscoop: #thewaywewere and surname.
Twhirl is an Adobe AIR application: a lightweight desktop client that uses the Twitter API for getting and sending Twitter status updates, replies and direct messages. I know that there are many other similar applications out there, but so far Twhirl seems to do a good job for me. Twhirl was acquired a few months ago by Seesmic, another interesting lifestreaming service. A great move by Loic Le Meur, Seesmic’s CEO.
Here are three suggestions for getting additional information on to my desktop via Twhirl:
- Twitter Search Trending Topics – this will help keep me in the loop. Lately, I go to this page almost every day, sometimes even more than once, for my web-now treat. If you want to stay current on the day’s agenda, all you have to do is take a quick peek at this list. In some cases you’ll need to drill down to the conversation itself for a better understanding of the context, but it takes reading only a few updates to get it. One option is to mash up Twitter + Twitscoop. This combination brings to your desktop not just the Trending Topics but also the volume and duration of the conversation around them. Just think about how a small change like this can get more people riding new waves of conversation, and more quickly.
- The number of followers – it sounds trivial, but it is something that I check occasionally and I still have to go to my Twitter (kerendg) home page to get this information. Twhirl’s Friends/Followers page is a good place to put it.
- Please make the @ and envelope buttons change color when I have a reply or direct message (red or green would work fine). I can mark them as read (for you to turn it back off).
To sum up this short post: Twhirl brings immediacy to microblogging. I rarely install new software on my laptop these days (my OS loves me for that). Twhirl is an exception for two reasons:
- The name AIR (a fantastic choice of word by Adobe) implies something light and transparent (vs. a Windows application/thick client).
- I need it as close to real-time as possible.
So, the bottom line is that I count on it to deliver more information supporting my lifestreaming and web-right-now experience using Twitter.
It sounds trivial, I know. Yet the behavior that I “see” from people on my blog seems to be enough justification for this post. If you want to understand a blog’s elements, I found this useful post: Understanding and Reading a Blog (for Newcomers). I want to focus on the sequence and not on the blog’s parts and layout.
There are multiple ways to land on a blog post: one is following a link from another blog or from a comment left by the blogger, another is following a link from a SERP (Search Engine Results Page). From what I see, the people who land on my blog from a search engine tend to see less of my blog. I don’t know if this is because my blog is that bad, because they did not find what they were looking for, because of lack of time, or because they do not know the difference between a blog and a web site. Because people who arrive at my blog from links spread across the web, by me or others, tend to check more posts on my blog, I think that there is some truth to the latter reason. This is encouraging and kind of supports my assumption that many people don’t understand the blog concept and how to make the best of the blog reading experience.
So how do you read a blog?
The most important thing to understand about a blog is that it is not just the post that you landed on. If you are here to visit, there are a few more things that you should consider checking before you leave.
- Click on the title or the Home tab – this will lead you to the Blog home page where the rest of the posts are.
- Then, if you are not in a rush, check a few more blog posts – scroll down. Maybe the one you initially landed on is not the best one.
- Check the Recent Posts and Top Posts lists, usually located on the sidebar – these will show you what is fresh and what others liked.
- Check the Recent Comment section – see who’s on it – read some comments. You can also follow the links by clicking on the commenter name.
- You can also explore the Blogroll – the selected links out of the blog – these are the blogger’s Favorites.
- Speaking of the blogger (in some cases bloggers), most blogs have an About page with information about the person who writes them. Take a peek, you may find something in common.
- Finally, if you really want to know more about the blog, you can search for it on Technorati or Twingly. There you can find other blog posts that link to this one. You can also find its rank in the blogosphere.
If you like the blog that you just read, you can subscribe to its feed. You’ll be getting new content in your favorite feed reader (like Google Reader or Netvibes) as soon as it comes out. Just click on the feed (RSS) icon and copy the link. Some blogs offer email subscriptions too – I’m subscribed to a few really good ones like ReadWriteWeb. This way I make sure that I don’t miss anything if I did not open my RSS reader (I do check my email periodically).
If the About page contains contact info, there is a high chance that the author will be happy to connect with his readers, so don’t be shy and follow him on Twitter or become a friend on another social network.
The most important thing that I hope you take from this blog post is that a blog is not a single post. It is a collection of information from and about the blogger, and if you can slow down the browsing rush a little, there is a chance to leave a blog with a little more than just information.
So what is your blog reading ritual?
For the past few weeks I have been working on a project trying to push our system’s throughput (at work) to a new level of scalability and performance. The motivation is a new vertical with a potentially enormous number of “transactions” per day. We already completed a similar project with the same aim more than a year ago, and it allows us to deal with tons of transactions entering another new market. Last time it was done in a rush under a very tight schedule, but we made it. This time I have some time to think things through and explore what’s out there. I’m looking at multiple solutions to scale database operations. We already abstracted servers and other resources to allow both vertical and horizontal scalability, yet we are still “counting” on enterprise database solutions like Oracle and SQL Server to help scale storage operations. At the levels that we are about to deal with, it will cost our customers a lot to install such storage environments. We would like to squeeze as much as possible out of an existing one so we can lower TCO.
So, to the point of this post – I’m looking at multiple ways to gather existing knowledge from the web about Scalability, Performance, Optimization, Utilization, Storage and more in one place. I would like to create a knowledge base with the best resources available on this matter. Since I like to blog and explore new search techniques, I decided to look at Twine. I found it a few days ago while looking at my WordPress Blog Stats page. I got some traffic from this site, so I went to check it out. I’m still learning about this tool and I hope that I can describe it accurately. Remember, my first objective is to be able to aggregate as much knowledge as possible about scalability and performance.
What Twine lets me do:
- You can simply use it as a way to save bookmarks – you can tag an article, and the system will also offer some tags for you, allowing you to remove undesired ones.
- You can join an existing Twine – I joined the Web Industry Trends where I can see articles saved by the members of this Twine. I can see comments left by others and add my own comments. I can see how many people viewed it. The system supports email and feed options per Twine.
- You can start your own Twine – here you have full control of the content of the Twine. You are the Twine webmaster. So, I started the Scalability and Performance Twine. You can make it public and allow new members to join, or invite others yourself. I made it so you can join by request. The system allows you to add all sorts of items to this mini knowledge base, like bookmarks, documents, notes, images and video. The engine behind this application uses all sorts of recommendation algorithms, suggesting related Twines and tags. I’m not sure whether it suggests them or you need to add them yourself for these additional options: Places, Organizations. The key thing is that you can organize information around a subject matter in one place with the help of others and a sophisticated recommendation/search engine. It reminds me of a wiki, but without the need to learn wiki syntax and with a powerful recommendation engine to help with the task. It is also great that someone owns the Twine and is responsible for making sure that only the best information is kept and only the members who truly contribute to the Twine remain associated with it.
The UI is easy to use and very intuitive. I see some places for improvement around the UI real estate utilization: there is too much scrolling on some pages, like the profile page (make the profile picture smaller). Some pages load a little slowly, like the My Twines overview page, but this is easy to fix and, as you can see in the upper left corner, the site is still in beta. I actually got in after I asked for and got a private invite.
Now, I hope that Twine will open up and launch its service publicly so more people can join in. When that happens, or if you got a private invite, and if you care about the subject of Scalability and Performance, know about it and know where to find good sources of information, please join the Twine service and my Scalability and Performance Twine. The more I use it the more I like it.
Update: you can find here a very helpful screencast given by Nova Spivack, Twine’s founder. The system is packed with functionality, so this is a great way to get up to speed with it. Also, the Twine Bookmarklet is a great time saver when adding bookmarks (aren’t you getting tired of adding a title, description and tags for each bookmark you save?). This for me is enough reason to use the system (and I’m a fan of Delicious, the social bookmarking site).