In this post I will discuss the three elements of the mashup sign-on process: Security (SSO), Access Control and Single Identity. A lot is written and done about each of them individually, but I think it is not always clear which solution maps to which problem.
If you are familiar with this subject you can skip to the next paragraph (or this post entirely). For those who are not familiar with mashups and how to get them working for you, please read this short preface. There are many online services today: social networks like Facebook, bookmarking services like ma.gnolia, news sites like Digg, media sharing sites like Flickr and YouTube, and more. In the screen capture of the form below you can find 43 such services. These services provide online APIs (a way for the outside world to request data from the service and execute its functionality remotely). This allows the development of new services on top of them, called mashups. The new service interacts with the underlying services and adds value through the unique mix it creates. The first form below is taken from FriendFeed, a mashup application that helps you keep track of your friends' activities across many web services. In this form you are asked to select the services that you permit the current application to pull information from or push information to (in the case of FriendFeed, pulling only). In order for the application to access your data, the system needs to know who you are, i.e. your user name (login). In some cases it will ask you for your password too.
This form raises three hot issues in the growing environment of open APIs and mashups. If you want to see how rapidly this world is growing, look at this excellent source of information: ProgrammableWeb.
- Security: not having login and password information stored in multiple places. Single sign-on (SSO).
- Access Control: having control over what the service can do with my data. Defining a security policy.
- Single Identity: not having to re-enter my profile and friends' information all over again. Data portability.
Every service offers a sign-up process where you type in your login and password. Companies like Google, Microsoft and Yahoo that offer multiple online applications provide a kind of single sign-on mechanism: once you are signed in to one service you can safely go to the next one without logging in again. The available solution for web sites that do not belong to the same company is OpenID, and here is an example of how to use it from WordPress. It is, in a way, the solution for single sign-on on the web today. Not all services support it yet, but adoption seems promising. If you want to see a decent number of options for authenticating across services, just click on the "Sign in using" drop-down list on ma.gnolia's login page.
When I allow a service to access my data from another service, I have no way of telling the source what I allow it to provide. I can't tell the service whether I allow it only to read my data or also to update information (e.g. updating my Twitter status). Today this is mostly determined by the APIs. Where there is a way to configure it (to some extent in Facebook), it is not consistent across the web. I know that there is an effort by multiple leading software companies to deal with this. For more information, read the page about the new OAuth protocol.
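Consistency across services is exactly what OAuth aims to provide. To give a feel for the mechanics, here is a minimal sketch in Python (with hypothetical URLs and parameter values) of how an OAuth 1.0-style consumer signs a request with HMAC-SHA1, so the provider can verify who is asking without ever seeing your password. This illustrates the idea only; it is not a full implementation of the spec (it ignores duplicate parameter names, for example).

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Compute an OAuth 1.0-style HMAC-SHA1 signature for a request."""
    enc = lambda s: urllib.parse.quote(str(s), safe="")
    # Normalize parameters: sorted, percent-encoded key=value pairs
    normalized = "&".join(f"{enc(k)}={enc(v)}" for k, v in sorted(params.items()))
    # Signature base string: METHOD & encoded-URL & encoded-params
    base = "&".join([method.upper(), enc(url), enc(normalized)])
    # The signing key combines the consumer secret and (optional) token secret
    key = f"{enc(consumer_secret)}&{enc(token_secret)}".encode()
    digest = hmac.new(key, base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

The point for access control is that the user hands the mashup a token (with its own secret) instead of a password, and the provider can scope or revoke that token without the password ever changing hands.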
The term profile today refers to far more than your name, address and email. I think Facebook took it the farthest, including your media preferences, activities and your choice of applications. Most important, it includes your contacts, i.e. your network. It is at the basis of most social network services that your experience and satisfaction with the site are in direct relationship with the size of your network. Yet no one wants to re-type their personal information and rebuild their network. Some claim, and I agree, that this data should not belong to anyone but you. The Data Portability initiative is trying to eliminate the need to recreate your online identity and profile over and over again by defining new open standards that will allow services to port it at your request. This is a great step and I can only hope to see it implemented across the web soon.
If you are new to the subject but not new to using mashup applications, I hope you'll find this post helpful; maybe now you can start using the OpenID option instead of your login. If you are about to start a new service or mashup, I hope this will help you think about how to make it easy for us to interact with it.
Do you see more ways for improving this process?
In his post "Could Someone Explain Technorati", Chris Brogan wonders about the consistency, accuracy and reliability of the Technorati service. I can't explain the behavior of the system over there, but I can share some of my experience dealing with various challenges in using online APIs (web services) and data. The objective here is to help other mashupers better prepare for future integration efforts across multiple web services. Since it appears that the mashuper community is growing faster than the web service providers, I'm sure that more fellow API consumers can share some stories of their own. I will be happy to hear them.
I see three participants' perspectives in this "love triangle": the web site visitor, the mashuper (the API consumer) and the service provider.
My visitor experience:
Chris Brogan talks about his experience from the user perspective in his post. I have nothing to add here, except that as a service provider, satisfying my loyal community would be my top concern. Maybe the way to deal with the case from Chris's post is by monitoring for exceptions (a drastic rise or fall in rank/authority).
My mashup experience:
As I mentioned in some of my earlier posts (here, here and here), I'm working on a small project for finding productive bloggers by monitoring for consistent improvements in their Technorati rank. I now monitor the rank of over 800 bloggers on a frequent basis, and I post some of the results to a designated Twitter account: blogmon.
The first set of challenges is dealing with volatile data:
- Sometimes I see no authority in the results (inboundblogs).
- Sometimes there is no valid last-update date in the results: <lastupdate>1970-01-01 00:00:00 GMT</lastupdate>
- Most of the time there is no author (the user did not add it).
- Sometimes there are no tags (the user did not add them).
- Sometimes, as Chris mentioned, the rank is off for a short period of time.
For example see Seth Godin’s Blog rank history:
| last update | rank | authority |
|-------------|------|-----------|
| 2/12/2008   | 19   | 8599      |
| 2/25/2008   | 18   | 8697      |
| 3/17/2008   | 19   | 8658      |
| 3/22/2008   | 16   | 8827      |
| 4/10/2008   | 15   | 8946      |
| 4/19/2008   | 16   | 8882      |
| 4/23/2008   | 17   | 8819      |
| 5/12/2008   | 17   | 8828      |
| 5/14/2008   | 16   | 8863      |
| 5/20/2008   | 15   | 8890      |
These are the details that a consumer of volatile online data must plan for and find ways to compensate for:
- Check the validity of the date.
- Don't just count on the last result, i.e. search for the last valid result and monitor over time.
- Be prepared to post partial results (e.g. no top tags or author).
- Most important: guard your data, i.e. protect what you take from the service and store in your records.
The next set of challenges has to do with the web service's behavior:
- I got the following error once or twice: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
- Some API requests come back with:
<META HTTP-EQUIV="REFRESH" CONTENT="2; URL=http://api.technorati.com/bloginfo?url=****&key=****&version=0.9&start=1&limit=30&claim=0&highlight=0">
(I intentionally masked the URL, title, image and my developer key with ****.)
This result can crash your system if not handled.
- Finally, and I get this one a lot :)
<?xml … “http://api.technorati.com/dtd/tapi-002.xml”>
<error>You have used up your daily allotment of Technorati API queries.</error>
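Both failure modes above (an HTML redirect page instead of XML, and an in-band <error> element) can be caught before they crash anything. A minimal sketch in Python, assuming the response body is already in hand as a string; the function name is my own:

```python
import xml.etree.ElementTree as ET

def parse_response(body):
    """Return (root, error): root is the parsed XML tree on success,
    error is a short description when the body is unusable."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # e.g. the <META REFRESH> HTML page the API sometimes sends back
        return None, "not-xml"
    err = root.find(".//error")
    if err is not None:
        # e.g. "You have used up your daily allotment of ... queries."
        return None, err.text
    return root, None
```

Callers then branch on the error string instead of letting an XML parser exception propagate up through the whole scan.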
- I can’t picture my dev world without exception handling; in this specific case it is the ultimate protection against unexpected web service behavior. So guard every call, XML-result loading and data parsing step by wrapping them in a try/catch block.
- Logging: log expected and unexpected behavior for later analysis and recovery.
- Build the system so exceptions are caught and logged, but execution can move on to the next task.
- This is something I learned from a smart Army officer: "If there is a doubt, there is no doubt", basically saying that it is better not to report at all than to report inaccurate data.
- Find ways to minimize API calls, e.g. I ask for tags only when I find a blog worth reporting on.
- A thought: I’m not an expert in XML and DTD, but could it be that using a DTD slows down the web service? If you know more about it, please share with me/us. Is it really necessary on read-only calls?
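The catch-log-continue advice above fits in a few lines of Python. This is a sketch only; `monitor_all` and the injected `fetch` callable are hypothetical names:

```python
import logging

def monitor_all(blog_urls, fetch):
    """Process every blog even when individual calls fail:
    catch the exception, log it, and move on to the next task."""
    results = {}
    for url in blog_urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:
            # "If there is a doubt there is no doubt": log it, report nothing
            logging.warning("skipping %s: %s", url, exc)
    return results
```

One forcibly closed connection then costs you a single blog's data point for the day, not the whole run.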
About the service:
I can’t talk much about what a web service provider feels or experiences (I’m sure that Ian Kallen from Technorati has a lot to share on the subject), but I want to say a few things:
- Please don’t get this post wrong: I’m a fan of Technorati. I use it, deeply appreciate their service, and am thankful for having the option of using the APIs. As I said earlier, the intention is to share my experience and allow you to better prepare for such an effort.
- I guess it is hard to estimate the load on the system with such growth in the number of mashupers out there, so my heart is with them.
- There are two more threats that a web service provider needs to protect itself from, and I’m sure those consume some energy: protecting the hard-gathered data and its environments from abuse and from malicious attacks.
One last comment: ironically, I have had no problems with Twitter so far :) but I’m aware of the pain that some Twitter API users suffer occasionally.
As I continue playing with the small application I’m writing for monitoring positive shifts in bloggers’ Technorati rank, I have realized that I’m actually finding bloggers writing about almost everything. The only common thing I could find so far is that they are just consistently great.
The tool scans and builds historical data for over 700 blogs so far. I built this growing list of blog URLs from my favorites (i.e. humanly picked in multiple social ways) and with the crawling algorithm I previously explained in this post.
I won’t get into the operational details (and there are plenty of them), but I manage to get a lot done without exceeding Technorati’s limit of 500 API calls per day.
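One simple way to stay under a daily cap like that is to budget calls explicitly and spend the optional ones (such as tag lookups) only when the cheap check says a blog is worth reporting. A sketch, assuming a 500-call limit; `ApiBudget` is a hypothetical helper, not part of any Technorati library:

```python
class ApiBudget:
    """Track a daily API-call allowance."""

    def __init__(self, limit=500):
        self.limit = limit
        self.used = 0

    def take(self, n=1):
        """Reserve n calls; return False when the budget would be exceeded."""
        if self.used + n > self.limit:
            return False
        self.used += n
        return True
```

With this in place the tag request becomes `if budget.take(): tags = fetch_tags(url)`, and the scan simply skips the extras once the allowance runs out instead of hitting the daily-allotment error.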
For now I output the results to the BlogMon Twitter user, so please, you are invited to become a follower.
Example of outputs:
Short-term pattern: http://wpthemesplugin.com, rank gain: 18.10 %, since: 4/12/2008, Top Tags: “wordpress”, “themes”
Long-term pattern: http://mediaphyter.wordpress.com, rank gain: 76.10 %, since: 2/1/2008, Top Tags: “Social Media”, “Security”
As you can see, I log the URL, the rank gain, the date since when, and the top two tags to give you an idea of what each blog is all about. I find that in most cases this is good enough. Do you?
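For reference, output lines like the two examples above can be produced by a small formatter. This Python sketch reproduces the format shown; the function name and argument shapes are my own assumptions:

```python
def format_finding(url, gain_pct, since, tags, kind="Short-term"):
    """Build one status line: URL, rank gain, start date, top two tags."""
    top_tags = ", ".join(f'"{t}"' for t in tags[:2])
    return (f"{kind} pattern: {url}, rank gain: {gain_pct:.2f} %, "
            f"since: {since}, Top Tags: {top_tags}")
```

Keeping the line compact matters here because the result has to fit in a Twitter status update.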
Why am I doing this?
- First, it keeps me engaged with mashup opportunities, and there are lots of those available today.
- Second, I enjoy doing it.
- Finally, you may find it useful in some way – you can leave a comment on these blogs and maybe get some traffic to your website/blog. I will be happy to hear if you did.
I may be tempted to mash up more web data sources/services in the future, or to explore discrepancies between Alexa data and Technorati rank.
I’m also using a great early-stage service developed by Microsoft called Popfly to build and deploy a small (too small and simple at this time) application on my Facebook profile called BlogTwitt. BlogTwitt shows the recently posted updates of the BlogMon Twitter user I use for outputting the daily findings from the application I’m working on.
At this point I could not share this application. I don’t know why, so I left a message on the Popfly Facebook wall; as of this time I have received no answer. I do appreciate what they are trying to do, saving me the time of learning and working with the Facebook API.
I think I will soon write a post about Popfly and the challenges of writing a good mashup. I encourage people who are just starting their mashup thought process to look at this tool, and also at Yahoo Pipes (fantastic interface), to play, understand, get ideas and brainstorm with the numerous web services (APIs) available out there. It is like working in a software solution architect group at a company that offers multiple products, finding new ways to increase the value of the existing modules by symbiotically integrating them into new offerings.
Finally, I don’t think this is Software plus Services the way Microsoft tries to sell it; I see it as Service plus Service (the service is built of software, duh). Maybe Service x N.
As always, I would love to hear your thoughts so please use the comment section.
Update: I forgot to mention that what I like about using Twitter rather than my blog to post results is that it does not add to the blog’s reaction count. So it stays under Technorati’s radar and does not impact the rank (avoiding the observer effect). That may change one day, when they realize that a tweet containing a blog’s URL is actually a blog reaction.