I’m learning to crawl…
I’m learning to fly but I ain’t got wings
Coming down is the hardest thing – Tom Petty
This chart shows how long the tail is as a blog climbs up in authority, until it reaches more than 1,000 reactions. Then the sky is the limit!
I was looking for a way to get more blogs for the blog monitoring project and was thinking about different ways to go about it.
So I decided to write a blog crawler, with the following strategy for discovering new blogs:
Starting from one of the great blogs that I already discovered manually, I look up that blog’s inbound links (using the Technorati Cosmos API call) and keep only the blogs in that list whose authority is equal to or higher than the starting blog’s. My working assumption is that blogs with higher authority will lead me to higher-quality blogs. The crawler then continues along this path recursively, hopefully as high as possible. It stops when the current blog (the parent node) has no more inbound links (blog reactions) with equal or higher authority.
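The strategy above can be sketched roughly as follows. This is a minimal illustration, not my actual crawler: `Blog`, `crawl_up`, `get_inbound` and `max_calls` are hypothetical names, and `get_inbound` is a stand-in for whatever lookup returns a blog's inbound links with their authority (it does not reproduce the Technorati API). The toy link table at the end is made-up data, used only to show the call.

```python
from typing import Callable, Dict, Iterable, NamedTuple, Optional


class Blog(NamedTuple):
    url: str
    authority: int


def crawl_up(start: Blog,
             get_inbound: Callable[[Blog], Iterable[Blog]],
             max_calls: Optional[int] = None) -> Dict[str, Blog]:
    """Climb from `start`, following only inbound links whose authority is
    equal to or higher than the blog they link to (the parent node)."""
    found: Dict[str, Blog] = {}
    visited = {start.url}
    stack = [start]
    calls = 0
    while stack:
        # Optional budget, e.g. a daily API quota (an assumption here).
        if max_calls is not None and calls >= max_calls:
            break
        parent = stack.pop()
        calls += 1
        for blog in get_inbound(parent):
            if blog.url in visited:
                continue
            # Keep only blogs at least as authoritative as the parent.
            if blog.authority >= parent.authority:
                visited.add(blog.url)
                found[blog.url] = blog
                stack.append(blog)
    # A branch ends on its own when no inbound link qualifies.
    return found


# Toy data, purely illustrative.
toy_links = {
    "a": [Blog("b", 20), Blog("c", 5)],
    "b": [Blog("d", 300), Blog("a", 16)],
    "d": [],
}
found = crawl_up(Blog("a", 16), lambda b: toy_links.get(b.url, []))
print(sorted(found))
```

I used an explicit stack instead of real recursion so that a long climb cannot hit Python's recursion limit, and a blog rejected under one parent can still qualify later under a lower-authority parent.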
I believe that starting from each of the blogs in my origin list will produce a list of high-potential blogs, and that repeating the crawl every now and then will help me discover great new bloggers.
Once I solve the blog categorization problem (Technorati doesn’t provide this information through its API), I could also improve the crawl to find only blogs from the same domain of interest. That way I can check a blog’s visibility (and maybe its reach or influence) within a category.
I did a test run today starting from one of the blogs from tier3:
The starting blog: authority = 16, rank = 517455
The crawler ended up finding 128 new blogs.
The top blog in the list is: Daily Kos: State of the Nation, authority = 10854, rank = 12
I think that this is not a bad catch.
I wanted to continue, but I maxed out my Technorati API calls for the day (500).
The theory behind this strategy is that it is easier to get on the radar of low-authority (up-and-coming) blogs and then keep moving your message up. If these bloggers already have some visibility among higher-quality blogs, they may help expand your reach. You can see how many comments I got from tier3 compared with tier2 and tier1.
A few more thoughts:
- I could add a check for freshness using the lastupdate date
- It is also possible to go top-down: discover the outbound links starting from a top blog (without using Technorati)
- The same approach could be applied to Twitter: who follows you, from what field, and what their credentials are in the blogosphere
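For the freshness idea, a check along these lines might be enough. It is only a sketch: `is_fresh` is a hypothetical helper, the 30-day threshold is an arbitrary assumption, and it presumes the lastupdate value has already been parsed into a `datetime`.

```python
from datetime import datetime, timedelta


def is_fresh(last_update: datetime, now: datetime,
             max_age_days: int = 30) -> bool:
    """Return True if the blog was updated within the last `max_age_days`."""
    return now - last_update <= timedelta(days=max_age_days)
```

The crawler could then skip any inbound blog for which `is_fresh` returns False before comparing authority.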
I have more thoughts and observations after running my first test, and I skipped some of the implementation details, but I hope you can see the picture.
I’m learning to fly around the clouds
But what goes up must come down – Tom Petty
Love to hear your thoughts.