Archive for February, 2008

The path to the top

February 29, 2008

I used the crawl method described here. This time the up crawl started from the Inside analytics blog. I was curious to see what the shortest path to the top would be.

The idea is to check at each step which blogs this blog has a chance of being visible to. By visibility I mean which blogs with higher rank and authority have reacted in the past to one of the posts published by the examined blog.

I believe that, armed with this information, a blogger can find the right bloggers to interact with so that they can propel the message as far as possible.

I started with a blog from the third tier (rank above 100,000). As you can see, this blog can reach blogs from the second tier after only two steps up, and the first tier on the 8th step up.

The shortest path to the higher ranked blog is presented here:

0,Inside analytics,415686,20,2007-06-27
  2,The Copywriter Underground,47546,132,2008-02-14
   3,2k Bloggers: The Face of the Blogosphere,42394,146,2008-02-24
    4,Internet Lifesytle,40190,153,2008-02-27
     5,And the Legend Lives,30891,190,2008-02-25
       7,Lifecruiser Travel Blog,18159,282,2008-02-27
        8,Life in the Fast Lane,9061,473,2008-02-28
         9,Dumb Little Man,818,1842,2008-02-28
           10,Lifehacker,7,16216,2008-02-28
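
A minimal Python sketch of reading a path like the one above. It assumes each line is step, blog name, Technorati rank, authority, last-update date, and it uses the tier cut-offs mentioned in these posts (tier1 is rank below 10,000, tier3 is rank above 100,000). The names `Hop`, `parse_line` and `tier` are made up for illustration and are not part of the original crawler.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    step: int
    name: str
    rank: int
    authority: int
    last_update: str


def parse_line(line: str) -> Hop:
    # Take the step off the front and the last three fields off the back,
    # so a comma inside a blog name would not break the parse.
    step, rest = line.strip().split(",", 1)
    name, rank, authority, last_update = rest.rsplit(",", 3)
    return Hop(int(step), name.strip(), int(rank), int(authority), last_update.strip())


def tier(rank: int) -> int:
    # Cut-offs as used in these posts: tier1 below 10,000, tier3 above 100,000.
    if rank < 10_000:
        return 1
    if rank <= 100_000:
        return 2
    return 3


SHORTEST_PATH = """\
0,Inside analytics,415686,20,2007-06-27
2,The Copywriter Underground,47546,132,2008-02-14
3,2k Bloggers: The Face of the Blogosphere,42394,146,2008-02-24
4,Internet Lifesytle,40190,153,2008-02-27
5,And the Legend Lives,30891,190,2008-02-25
7,Lifecruiser Travel Blog,18159,282,2008-02-27
8,Life in the Fast Lane,9061,473,2008-02-28
9,Dumb Little Man,818,1842,2008-02-28
10,Lifehacker,7,16216,2008-02-28
"""

hops = [parse_line(line) for line in SHORTEST_PATH.splitlines()]
first_tier2 = next(h.step for h in hops if tier(h.rank) == 2)
first_tier1 = next(h.step for h in hops if tier(h.rank) == 1)
print(first_tier2, first_tier1)  # 2 8
```

The two numbers match the observation above: the second tier is reached at step 2 and the first tier at step 8.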

From the same blog, my crawler found a few more paths to top Technorati blogs. Here is another example:

0,Inside analytics,415686,20,2007-06-27
  2,The Copywriter Underground,47546,132,2008-02-14
   3,2k Bloggers: The Face of the Blogosphere,42394,146,2008-02-24
    4,Internet Lifesytle,40190,153,2008-02-27
      5,And the Legend Lives,30891,190,2008-02-25
       6,Fil-Am Gallery,27355,208,2008-02-27
        7,My so-called-life…,25254,220,2008-02-25
         8,Bisdak Footprints,21958,243,2008-02-25
           10,A Blog For All,13913,346,2008-02-28
            11,DBKP:Worldwide Leader in Weird,11511,394,2008-02-28
             12,Blogger News Network,3149,903,2008-02-28
              13,Machinist –,2476,1050,2008-02-28
                 15,Geekologie – Gadgets,1146,1579,2008-02-28
                  16,MAKE: Blog,742,1934,2008-02-28

This one is not the shortest path, but it also depends on where you want to go. In both cases we ended up at blogs about gadgets and technology, but what if your aim is blogs about marketing, video streaming or fashion?

For a full crawl log see this file: Inside-Analytics – Crawl – 2-28-08 log

By the way, it took a while to complete this crawl:)

I’m learning to crawl…

February 27, 2008

I’m learning to fly but I ain’t got wings
Coming down is the hardest thing
– Tom Petty

Blog Discovery

This chart shows how long the tail is when climbing up in authority until a blog reaches more than 1,000 reactions. Then the sky is the limit!

I was looking for a way to get more blogs for the blog monitoring project and was thinking about different ways to go about it.

So I decided to write a blogger crawler, with the following strategy for discovering new blogs:

Starting from one of the great blogs that I have already discovered (manually), I look for this blog’s inbound links (using the Technorati Cosmos API call) and then keep only the blogs from this list with equal or higher authority than the starting blog. My working assumption is that blogs with higher authority will lead me to higher quality blogs. The crawler then continues along this path recursively, hopefully as high as possible. The stop condition is that there are no more inbound links (blog reactions) with equal or higher authority than the parent node (the current blog).
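
The strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the crawler itself: the Technorati Cosmos call is not modeled, so `inbound` is a hypothetical stand-in for "give me the blogs that link to this one", and `Blog`, `crawl_up` and the toy link table are invented for the example. A `seen` set is added because blogs of equal authority could otherwise link to each other forever.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional, Set


class Blog(NamedTuple):
    url: str
    authority: int


def crawl_up(
    start: Blog,
    inbound: Callable[[Blog], Iterable[Blog]],
    seen: Optional[Set[str]] = None,
) -> List[Blog]:
    """Follow inbound links upward, keeping only equal-or-higher authority."""
    if seen is None:
        seen = {start.url}
    discovered: List[Blog] = []
    for blog in inbound(start):
        # Stop condition: nothing left with authority >= the current blog.
        if blog.url in seen or blog.authority < start.authority:
            continue
        seen.add(blog.url)
        discovered.append(blog)
        discovered.extend(crawl_up(blog, inbound, seen))
    return discovered


# Toy data standing in for the API: who links to whom (made up).
LINKS = {
    "a": [Blog("b", 20), Blog("c", 5)],
    "b": [Blog("d", 300), Blog("a", 16)],
    "d": [],
}


def fake_inbound(blog: Blog) -> List[Blog]:
    return LINKS.get(blog.url, [])


result = crawl_up(Blog("a", 16), fake_inbound)
print([b.url for b in result])  # ['b', 'd']
```

In the toy run, blog "c" is dropped because its authority is lower than the starting blog's, and the link from "b" back to "a" is ignored because "a" was already visited.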

I believe that starting from each one of the blogs in my origin list will lead to a list of high-potential blogs, and repeating it every now and then will help me discover great new bloggers.

Once I solve the blog categorization problem (Technorati doesn’t provide this information through its API), I could also improve the crawl to find only blogs from the same domain of interest. In this way I can check a blog’s visibility (and maybe reach/influence) within a category.

I did a test run today starting from one of the blogs from tier3:

The starting blog: authority =  16, rank  = 517455

The crawler ended up finding 128 new blogs.

The top blog in the list is: Daily Kos: State of the Nation, authority =  10854, rank  = 12

I think that this is not a bad catch.

I wanted to continue but I maxed out my Technorati API calls for that day (500) :)

The theory behind this strategy is that it is easier to get on the radar of low-authority (upcoming) blogs and then continue moving your message up. If these bloggers already have some visibility to higher quality blogs, they may help to expand your reach. You can see how many comments I got from tier3 compared with tier2 and tier1.

A few more thoughts:

  • I could add a check for freshness using the lastupdate date
  • It is also possible to go top down – the outbound links should be discovered from a top blog (not using Technorati)
  • The same approach could be applied to Twitter: who follows you, from what field, and what their credentials are in the blogosphere
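
The freshness check in the first bullet could look roughly like this in Python. It is a sketch only: it assumes the last-update date arrives as a YYYY-MM-DD string, and the 30-day cut-off and the name `is_fresh` are arbitrary choices made for the example.

```python
from datetime import date, timedelta


def is_fresh(last_update: str, today: date, max_age_days: int = 30) -> bool:
    # A blog counts as fresh if it was updated within the last max_age_days.
    updated = date.fromisoformat(last_update)
    return today - updated <= timedelta(days=max_age_days)


crawl_day = date(2008, 2, 28)
print(is_fresh("2008-02-14", crawl_day))  # True
print(is_fresh("2007-06-27", crawl_day))  # False
```

A crawler could apply this filter next to the authority check, so that blogs that stopped posting are not followed any further.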

I have more thoughts and observations after running my first test, and I skipped some of the implementation details, but I hope that you can see the picture.

I’m learning to fly around the clouds
But what goes up must come down
– Tom Petty

Love to hear your thoughts.

Blog rank monitoring – update

February 24, 2008 4 comments

I got great feedback about this project and I’m doing a lot of thinking about how to proceed with it.

I got very encouraging comments from two members of the Technorati team: David Sifry, the founder and CEO, and Ian Kallen, the leader of the Core Services engineering group.

I’m now working on getting more steps automated so I can monitor more blogs. It takes time.

I plan to focus mainly on monitoring activity within categories and subcategories. I would like to point out great bloggers in the context of their domain of interest and their relationship with other great bloggers from the same field.

Still thinking about the way to report it. Till I start monitoring more blogs I’m not sure how interesting this report is anyway.

I’m also thinking about tiers and how to name them, maybe: unknown, upcoming, established and leaders.

For now I can at least show some great and promising bloggers from the list that I started with, after monitoring for only a couple of weeks:

Improved rank in both weeks and more than 10% in the last week:

Adventures in social software/media, sustainability, and life

Doug Haslam

Improved rank in both weeks



Niche Marketing – Andy Beard

The Business and Politics of New Media (And the Podcast)

Dossy’s Blog


A design and usability blog: Signal vs. Noise (by 37signals)

Micah Baldwin on Succeeding Through Failing: Startups and Entrepreneurship

Social Honeycomb

Social Media Explorer

Occam’s Razor by Avinash Kaushik

More than 20% improvement in the last week only


Mahalo Daily
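
The three groups above can be written down as a small rule. The Python sketch below is one possible reading, not the actual monitoring code: it assumes three weekly rank snapshots per blog, treats a lower rank number as an improvement, and measures last week's gain relative to the previous week's rank. The function name and the sample numbers are made up.

```python
from typing import Sequence


def classify(ranks: Sequence[int]) -> str:
    """ranks = (start, after week 1, after week 2); lower is better."""
    start, week1, week2 = ranks
    improved_first_week = week1 < start
    last_week_gain = (week1 - week2) / week1  # fraction of rank gained
    if improved_first_week and last_week_gain > 0.10:
        return "improved both weeks, more than 10% in the last week"
    if improved_first_week and last_week_gain > 0:
        return "improved both weeks"
    if not improved_first_week and last_week_gain > 0.20:
        return "more than 20% in the last week only"
    return "other"


# Made-up rank snapshots, for illustration only.
print(classify((1000, 900, 700)))
print(classify((1000, 900, 880)))
print(classify((1000, 1100, 800)))
```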


So, at this point I think that I’ll stop reporting about blogs’ progress and focus on the next step whatever it is.

Btw, I’m aware of the possibility that I’m helping some of these rank changes just by writing this post, but over time I believe that the impact should be negligible.

I would love to hear your thoughts.

Marble Run and Software Automation

February 23, 2008

As someone who works building software, I find it hard to explain my job to my five-year-old son. Maybe if I worked for Electronic Arts things would have been a little simpler, but since I don’t develop computer games I need to find another way for him to understand automation through software.

Lately this way presented itself in the form of a new toy we bought him for Valentine’s Day: the MarbleRun with the automatic elevator. What is cool about this version is that you can really close a loop.

He called me today to show a new structure that he came up with. His mommy and I stood very proud watching his latest creation and how marbles go from one side through multiple passages and tubes all the way down to the elevator and then up to start all over again.

Then a thought crossed my mind. This is exactly what I’m doing at work: building something that eliminates human intervention, closing loops.

I shared this thought with my son. I don’t know how much of it he understood, though. Anyway, no matter what I have said so far, after every visit to my office he still thinks that all I do is play on the computer :)

I had one more observation after watching a few cycles of marbles looping through his system. Once in a while a marble fell off, a few marbles clogged a passage, or the elevator got stuck. In every case he had to get it working again. Well, I guess that we still need humans after all.

Maybe, just as there is no perfect circle in nature, there is no perfect automation loop either.

Honestly, this is a great game and not just for the little ones…

Categories: Observations, Software

Search and monitoring

February 23, 2008

Search extracts data from data; monitoring calls for action from data.

Search is transient and leaves no trace. Monitoring is a constant execution of the same search (or multiple searches) and each instance changes the data.

When someone searches Google for a term, Google runs a query and returns a list of items that match the submitted term as closely as possible. If you run the same search again you may get the same or maybe slightly different results; it depends on how frequently Google refreshes the data. Each search is independent of the others. Unless Google keeps a record of our searches for statistics, there is no real trace of a search.

In a way, search is like roulette: it has no memory.

Monitoring is more like blackjack: it has memory. In my world we refer to memory as state and to monitoring as a state-full search. Maybe this is why I think that monitoring is way more interesting than a single search (btw, I don’t play either game, but at least the latter gives you some chance).

Every executed search changes the state and the starting point for the next search. So what is this state?

The state is the subject’s status, and it can be one of three possible options:

  • No state – we know nothing about it
  • Intermediate state – we know something about this item but it is not enough to report about it
  • Completion – there is something interesting to report about this subject

The subject toggles between these three options.

The subject moves from No State to Intermediate state when we find new data that is worth keeping.

The subject moves from No State to Completion if we find in one iteration all the desired data.

The subject moves from Intermediate state to Completion if we find the rest of the desired data.

The subject moves from Intermediate state to No State if the stored data ages (and is now obsolete) or we don’t care anymore.

The subject moves to No State as soon as it reaches Completion.
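
The transitions listed above fit in a small lookup table. Here is a hedged Python sketch: the three states come from the list above, while the event names (`PARTIAL_DATA`, `ALL_DATA`, `EXPIRED`, `REPORTED`) and the `step` function are labels invented for the example.

```python
from enum import Enum


class State(Enum):
    NO_STATE = "no state"
    INTERMEDIATE = "intermediate state"
    COMPLETION = "completion"


class Event(Enum):
    PARTIAL_DATA = "found new data worth keeping"
    ALL_DATA = "found all (or the rest of) the desired data"
    EXPIRED = "stored data aged, or we no longer care"
    REPORTED = "completion was reached and reported"


TRANSITIONS = {
    (State.NO_STATE, Event.PARTIAL_DATA): State.INTERMEDIATE,
    (State.NO_STATE, Event.ALL_DATA): State.COMPLETION,
    (State.INTERMEDIATE, Event.ALL_DATA): State.COMPLETION,
    (State.INTERMEDIATE, Event.EXPIRED): State.NO_STATE,
    (State.COMPLETION, Event.REPORTED): State.NO_STATE,
}


def step(state: State, event: Event) -> State:
    # Any pair not listed above leaves the subject where it is.
    return TRANSITIONS.get((state, event), state)


state = State.NO_STATE
for event in (Event.PARTIAL_DATA, Event.ALL_DATA, Event.REPORTED):
    state = step(state, event)
    print(event.name, "->", state.name)
```

Keeping the rules in one table makes it easier to answer the debugging question of why a subject ended up in a given state.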

Actually, in some cases there is no need to store the complete state.

When running a search query, one expects all the results to appear at the same time. Monitoring works like precipitation in chemistry: results accumulate like little water drops over time. Just as precipitation is used in chemistry to separate a solid from a solution, we use monitoring to extract meaningful data out of the “cloud”.

Monitoring is only one application of state-full search. Actually, it is the execution of this kind of search in the present. State-full search exists in each of the three tenses, and it has different forms and applications in each.

You can use state-full search in the past: for example, stock traders who rely on technical analysis use it for backtesting, to show that their early indicators of a trend shift are correct.

Another use of state-full search in the present is personalization. Your purchase history could predict your future spending, and Amazon offers products this way as soon as you enter their website.

Using state-full search in the future, for simulation, is another application: playing out “what if” scenarios.

The three main challenges in state-full search are deciding what data to keep, how to organize the data, and debugging. Save too little and it is hard to find anything meaningful. Save too much and you defeat the purpose of filtering information. Organizing the data the “wrong” way will kill performance. Both problems are hard to fix in the “middle” of a search. The debugging challenge refers to the ability to explain why these are the right results or why there is no result. It is not always intuitive why a certain state changed, and as in most software programs you can be surprised by a new path of execution. Each of these challenges affects the key to a successful state-full search implementation: flexibility.

In my opinion the most significant difference between search and state-full search is the results.

Data + search = data

Data + state-full search  = events, leads, alerts, actionable items, new data
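
To make the difference concrete, here is a minimal Python sketch of a state-full search. It is an illustration only: `search` stands for any function that returns the current rank of each blog, and `RankMonitor`, the blog names and the numbers are made up. A plain search would just return the snapshot; the monitor remembers the previous one and turns the difference into events.

```python
from typing import Callable, Dict, List


class RankMonitor:
    def __init__(self, search: Callable[[], Dict[str, int]]) -> None:
        self.search = search
        self.last: Dict[str, int] = {}  # the state kept between runs

    def run(self) -> List[str]:
        current = self.search()
        events = []
        for blog, rank in current.items():
            previous = self.last.get(blog)
            if previous is not None and rank != previous:
                # A lower rank number means the blog moved up.
                direction = "improved" if rank < previous else "dropped"
                events.append(f"{blog} {direction}: {previous} -> {rank}")
        self.last = current  # this run becomes the next starting point
        return events


# Two made-up snapshots standing in for two executions of the same search.
snapshots = iter([
    {"Blog A": 500, "Blog B": 900},
    {"Blog A": 450, "Blog B": 900},
])
monitor = RankMonitor(lambda: next(snapshots))
first = monitor.run()
second = monitor.run()
print(first)   # []
print(second)  # ['Blog A improved: 500 -> 450']
```

The first run has nothing to compare against, so it only stores state; the second run produces an event because the stored state gives it a starting point.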

Love to hear your thoughts!

T2T – Technorati to Twitter – blog monitoring

February 19, 2008 2 comments

Following Chris Brogan’s advice in his Commuter Feed-Twitter as a Platform post:

“Twitter as a platform is one of my favorite things to consider. Why? Because what Twitter has created has spawned so many interesting, sometimes useful, and always creative extensions.”

I will be sharing highlights from the blog monitoring project results and news using Twitter. If you are interested in hearing about changes to blogs’ Technorati rank please follow the @BlogMon user.

I’m now working to automate more and more steps in the blog monitoring project, which will help me produce results quicker and scale the process.

Most of the tweets to @BlogMon are done manually today, but hopefully not for long.

My personal Twitter username is @kerendg. I usually don’t tweet what I’m currently doing, but I do note:

  • What I am thinking now
  • What I am reading, seeing, hearing now
  • New UsingIT blog posts and ideas

So, if you want to DM me please use @kerendg  and not @BlogMon. If you follow @BlogMon I will follow you using @kerendg.

I will be happy if you share with me great bloggers you know using @kerendg. If you do so, please add the category (or domain of interest) these bloggers fall under, if possible.

In the future as this project grows beyond using Technorati I plan on keeping this channel for sharing information related to finding great blogs and bloggers.


Blog rank monitoring – 1st week results – tier1

February 17, 2008

This is the first week monitoring blogs’ progress using Technorati.

For a full explanation of this mini-project, see: Blog Rank Monitoring – prototype

Blogs in tier1 are at the top of the pack (rank lower than 10,000).

Here are the movers in tier1 for the week of 2/12/08-2/16/08 (this first one is a short monitoring week).

Name | Category | Sub-category | Authority change
Design*Sponge | News | Design | 28 69
PaidContent | Business | Content | 16 32
A design and usability blog: Signal vs. Noise (by 37signals) | Corporate | Software | 10 14
Niche Marketing – Andy Beard | Business | Web-analytics | 12 20
The Jason Calacanis Weblog | Corporate | Search engine | -27 -40
Occam’s Razor by Avinash Kaushik | Business | Web-analytics | 135 38
Global Neighbourhoods | Business | Social media | -129 -31

The rest of the blogs in tier1 had no change.

As you can see, the categories and sub-categories need some work. Please feel free to correct me in the comments.

The biggest mover up in this week was the Occam’s Razor by Avinash Kaushik blog.

The biggest mover down in this week was Global Neighbourhoods.

Very small moves up or down (smaller than in tier2 and tier3) are common in this tier, but next week we can see whether the trend continues for some of this week’s gainers.

One more view of this week’s results: Tier1-HeatMap

I will be happy to hear if you have suggestions or comments.

