
Posts Tagged ‘Twine’

Semantic search engine – not your "average Joe" search task

August 23, 2008

I still have a lot to learn about semantic search engines and the semantic web, but I have a few early observations about their direction.

I believe that semantic search engines will do much of their work in the back end, helping people find deep meanings behind simple queries.

I don’t see my dad going online and typing something like “where did US presidents’ kids go to school?”. Not because the subject is remote, but because of the complexity: most people type one or two words at most into the search box. Maybe in the future he will ask such a question in his own voice (see Nuance).

So, my assumption is that semantic search engines have different roles:

  • In the front-end
    • Research – A tool to find patterns, trends and sets. This tool should provide multiple visual ways to present the results (map, heat map, timeline, bubbles, tag cloud, 3D). Use case: I want to expose these patterns and show them to the world in a clear visual way (example). I’d like to present them on my blog or website (iFrame/Widget it). This is a great tool for researchers, news reporters, and bloggers. TechCrunch, ReadWriteWeb, and others do present such data on their blogs periodically – example. Again, this is not for the average user.
    • Time saving: Operator is a Firefox extension that adds the ability to interact with semantic data on web pages, including Microformats, RDFa and eRDF. You can use it to extract contacts from a web page and export them to your contact list, and to export event information to your calendar. It is still hard to find sites that support microformats today (examples that do: Technorati – search page results, Google Maps, LinkedIn – contacts), but maybe Dapper will change this. With Dapper you can now take an HTML page and automate adding microformat classes: let the tool find recurring elements in the original page. Dapper does the “semantic work” for you. For more about this task, see this smartly titled blog post: Does Tim Berners-Lee’s Blog support Microformats? By the way, Operator is a great way to check which websites support microformats (use the Operator sidebar: from the Firefox menu bar choose View->Sidebar->Operator).
  • As a back-end operation
    • Organize – I want my data to be linked using semantic techniques – see Twine.
    • Recommendation (discovery and sharing) – based on mining my data (and others’), what can you tell me that I don’t know that I don’t know?
    • Time saving: people have trouble with tags. In my opinion, the main reason is that it is hard to come up with forward-thinking, useful tags that will help find data later, match other tags, and help search engines drive more traffic to the article. I can spend a good few minutes deciding whether to tag something as social network, social networks, or social-network. So, automating this process will help in many ways: more tagging, quicker tagging, and more common tagging, which means more links and associations. Twine today suggests tags that I can pick from, yet I think that this is just the beginning.
    • Money – in a way, Google does this in AdSense, matching content and target readers with ads. If you manage to do a better job in this area, there is great potential. There are more places to add ads: people’s profile pages (not just on Facebook – I have a ton of profile pages across many social services, sigh), and maybe comments.
    • Passive search – Alerts – at this point Google allows you to set alerts for keywords, and Delicious for tags. This is another way (as Chris Brogan says) to listen to the web. Maybe there is a way to improve on exact keyword matching with associated content, the same way semantic recommendation engines work. This could also be useful for organizing the alert results: I get alerts about some keywords in one long list. I’m not sure how many people use alerts, but I think this is a great tool. This is, by the way, another marketing channel – if I’m telling you what I’m looking for, why don’t you “help me find it”? Alerts can become more sophisticated by looking for patterns – for example: more than 10 mentions of a word/phrase/product/company/my competitor in a day/week/month. I know that you can check that today on Google Trends, but only for high-volume terms, and it is not set up as an alert – if you don’t look for it, it will not come to you.
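The pattern alert described above (more than 10 mentions within a day, week or month) can be sketched as a sliding-window counter. This is a hypothetical illustration, not how Google Alerts or Google Trends work; the class name `MentionAlert`, the plain substring match and the assumption that items arrive in time order are all mine.

```python
from collections import deque
from datetime import datetime, timedelta


class MentionAlert:
    """Hypothetical alert: fires when a term appears more than `threshold`
    times within a sliding time window (e.g. a day, week or month)."""

    def __init__(self, term, threshold=10, window=timedelta(days=1)):
        self.term = term.lower()
        self.threshold = threshold
        self.window = window
        self._hits = deque()  # timestamps of matching items, oldest first

    def observe(self, text, when):
        """Feed one item (post, tweet, comment); return True if the alert fires.

        Assumes items are fed in chronological order.
        """
        if self.term not in text.lower():
            return False
        self._hits.append(when)
        # Forget mentions that have slid out of the window.
        while when - self._hits[0] > self.window:
            self._hits.popleft()
        return len(self._hits) > self.threshold


# Usage: 12 mentions one minute apart; the 11th crosses the threshold of 10.
alert = MentionAlert("my competitor", threshold=10)
start = datetime(2008, 8, 23)
fired = [alert.observe("Big news about My Competitor", start + timedelta(minutes=i))
         for i in range(12)]
```

A deque keeps only the timestamps still inside the window, so memory stays proportional to the number of recent mentions rather than the whole history.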

Finally, Google today does a lot of “Microformat work” and some semantic discovery too, all behind the scenes. You don’t ask for it explicitly, but Google still delivers it. When you search for ReadWriteWeb, for example, you get the Contact, About, and Products information along with the site link. If you search for “movie near Lexington, MA”, you get what you need. This too supports my assumption that most people will not submit sophisticated queries, but the semantic web will be able to come up with better and more relevant answers to simple queries with complex meanings.
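To give a feel for what “Microformat work” involves, here is a minimal sketch of reading hCard contact data (the `vcard`, `fn`, `org`, `email` and `tel` class names) out of a page, the kind of data Operator exports as a contact. It uses only Python’s standard `html.parser`, assumes well-formed nesting, and covers a small subset of the hCard format. It is not Operator’s or Google’s code; `HCardParser` is a hypothetical name.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collect a few hCard properties from class="vcard" elements (simplified)."""

    PROPS = ("fn", "org", "email", "tel")

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard found
        self._stack = []  # open elements inside the current vcard: property name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if not self._stack:
            if "vcard" in classes:
                self.cards.append({})
                self._stack.append(None)  # the vcard root itself
            return
        self._stack.append(next((p for p in self.PROPS if p in classes), None))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing hCard property, if any.
        prop = next((p for p in reversed(self._stack) if p), None)
        if prop and data.strip():
            card = self.cards[-1]
            card[prop] = (card.get(prop, "") + " " + data.strip()).strip()


sample = """<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span><br>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
</div>"""
parser = HCardParser()
parser.feed(sample)
parser.close()
```

A real extractor would also read attribute values (for example `href="mailto:…"`) and handle the many other hCard properties; this sketch only shows the class-name convention that makes the data machine-readable.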

Building a knowledge base using Twine

August 4, 2008

 

For the past few weeks I’ve been working on a project at work that aims to push our system’s throughput to a new level of scalability and performance. The motivation is a new vertical with a potentially enormous number of “transactions” per day. We completed a similar project with the same aim more than a year ago, and it allowed us to handle tons of transactions when entering another new market. Last time it was done in a rush under a very tight schedule, but we made it. This time I have some time to think things through and explore what’s out there. I’m looking at multiple solutions to scale database operations. We have already abstracted servers and other resources to allow both vertical and horizontal scalability, yet we are still “counting” on enterprise database solutions from vendors like Oracle and SQL Server to help scale storage operations. At the levels we are about to deal with, installing such storage environments will cost our customers a lot. We’d like to squeeze as much as possible out of an existing one so we can lower TCO.

So, to the point of this post – I’m looking at multiple ways to gather existing knowledge from the web about Scalability, Performance, Optimization, Utilization, Storage and more in one place. I’d like to create a knowledge base with the best resources available on this subject. Since I like to blog and explore new search techniques, I decided to look at Twine. I found it a few days ago while looking at my WordPress Blog Stats page: I got some traffic from this site, so I went to check it out. I’m still learning about this tool, and I hope that I can describe it accurately. Remember, my first objective is to aggregate as much knowledge as possible about scalability and performance.


What Twine lets me do:

  • You can simply use it to save bookmarks – you tag an article, the system also suggests some tags for you, and you can remove the undesired ones.
  • You can join an existing Twine – I joined the Web Industry Trends Twine, where I can see articles saved by its members. I can see comments left by others and add my own. I can see how many people viewed each article. The system supports email and feed options per Twine.
  • You can start your own Twine – here you have full control of its content. You are the Twine webmaster. So, I started the Scalability and Performance Twine. You can make it public and allow new members to join, or invite others yourself. I made it so you can join by request. The system allows you to add all sorts of items to this mini knowledge base, like bookmarks, documents, notes, images and videos. The engine behind this application uses all sorts of recommendation algorithms to suggest related Twines and tags. I’m not sure whether it suggests these additional options or you need to add them yourself: Places, Organizations. The key thing is that you can organize information around a subject matter in one place with the help of others and a sophisticated recommendation/search engine. It reminds me of a wiki, but without having to learn a specific wiki syntax, and with a powerful recommendation engine to help with the task. It is also great that someone owns the Twine and is responsible for making sure that only the best information is kept and only the members who truly contribute remain associated with it.
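Twine doesn’t say how its tag suggestions work. As a hypothetical sketch of one small piece of the problem (the “social network” vs. “social networks” vs. “social-network” dilemma from the previous post), the snippet below folds spelling variants into one canonical key and reuses an existing tag when a proposed one is only a variant of it. The function names are illustrative, and the plural rule is deliberately naive and English-only.

```python
import re


def normalize_tag(tag):
    """Fold spelling variants ("Social-Networks", "social_network") into one key.

    Naive, English-only: lowercases, treats hyphens/underscores as spaces and
    strips a plural "s" from words longer than four letters (but not "ss").
    """
    words = [w for w in re.split(r"[\s\-_]+", tag.strip().lower()) if w]
    singular = [w[:-1] if len(w) > 4 and w.endswith("s") and not w.endswith("ss") else w
                for w in words]
    return " ".join(singular)


def merge_tags(existing, proposed):
    """Reuse an existing tag whenever a proposed tag is only a variant of it."""
    canonical = {normalize_tag(t): t for t in existing}
    return [canonical.get(normalize_tag(t), t) for t in proposed]


merged = merge_tags(["social network", "scalability"],
                    ["social-networks", "Scalability", "microformats"])
```

Reusing the existing spelling instead of inventing a new one is what produces the “more common tagging = more links” effect: variants collapse onto the same tag.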


The UI is easy to use and very intuitive. I see some room for improvement in UI real estate utilization: there is too much scrolling on some pages, like the profile page (make the profile picture smaller). Some pages load a little slowly, like the My Twines overview page, but this is easy to fix, and as you can see in the upper left corner, the site is still in beta. I actually got in after asking for and receiving a private invite.

Now, I hope that Twine will open up and launch its service publicly so more people can join in. When that happens, or if you get a private invite, and if you care about Scalability and Performance, know the subject, and know where to find good sources of information, please join the Twine service and my Scalability and Performance Twine. The more I use it, the more I like it.

Update: here you can find a very helpful screencast given by Nova Spivack, Twine’s founder. The system is packed with functionality, so this is a great way to get up to speed with it. Also, the Twine Bookmarklet is a great time saver for adding bookmarks (aren’t you getting tired of adding a title, description and tags for each bookmark you save?). For me this alone is reason enough to use the system (and I’m a fan of Delicious, the social bookmarking site).
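Part of what a bookmarklet saves you from typing can be approximated by reading the page’s `<title>` and `<meta name="description">`. This is a minimal, hypothetical sketch using Python’s standard `html.parser`; I don’t know how the Twine Bookmarklet actually gathers its fields, and `bookmark_from_html` is an illustrative name.

```python
from html.parser import HTMLParser


class BookmarkFields(HTMLParser):
    """Pull the fields a bookmarklet might pre-fill: page title and meta description."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.description = (a.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def bookmark_from_html(url, html):
    """Build a pre-filled bookmark record; tags are left for the user (or a suggester)."""
    p = BookmarkFields()
    p.feed(html)
    p.close()
    return {"url": url, "title": p.title.strip(), "description": p.description, "tags": []}


page = ('<html><head><title> Scaling Databases </title>'
        '<meta name="description" content="Notes on sharding"></head><body></body></html>')
bookmark = bookmark_from_html("http://example.com/scaling", page)
```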

Web presence – piecing together an Identity

July 31, 2008

People leave out information all the time: no blog About page, no employer name, no picture, no blogger name, a Twitter account without a web page link. Sometimes a simple link is not enough to piece it together. Your network can also help people find connections, or confuse them if your connections are spread across more than one social network and account. In some cases information is omitted intentionally and no harm is done, but in other cases, when it is done by mistake, it can lead to lost traffic and opportunities. From what I see, and from my experience, if the information is not just there, few will bother looking for it. Isn’t bringing these connections forward and bridging information gaps the role of the new Social Graph Search Engines?

This post will cover:

Finding out how objects are connected across multiple web applications. Overcoming cases when the information falls between the web cracks or is deliberately missing. Looking beyond the trivial context of web links (URL), friends, fans or followers.

  • Understanding the problems – some examples
  • Looking at what tools can help piece the missing information together
  • Bringing it forward – making it easily available when needed

Understanding the problems

Example 1 – me and my blog:

I did not add my blog URL to my LinkedIn profile. I did not add my employer’s name to my blog About page. I did it intentionally; I like to keep them separate for now. Omitting these two pieces of information seems to work so far: this missing information is not bridged by any social graph search engine that I’ve seen. It is ironic, but a simple search of my name on Google will reveal the connection (warning: there are a couple more Keren Dagans out there, neither of whom has anything to do with software or technology). The connection in this case between me, my blog and my employer is my identity (similar profile info such as name, picture and location).
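Since the connection here is identity (similar name, picture and location), a crude cross-network matcher might score profile pairs like the sketch below. The `Profile` fields, the picture hash and the 0.5/0.2/0.3 weights are all hypothetical; the point is that a name match alone (think of the other Keren Dagans) should not be enough.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Profile:
    network: str
    name: str
    location: str = ""
    picture_hash: str = ""  # hypothetical: e.g. a hash of the avatar image


def same_person_score(a: Profile, b: Profile) -> float:
    """Score 0..1 that two profiles belong to one person (hypothetical weights)."""
    name_sim = SequenceMatcher(None, a.name.strip().lower(), b.name.strip().lower()).ratio()
    same_place = bool(a.location) and a.location.strip().lower() == b.location.strip().lower()
    same_picture = bool(a.picture_hash) and a.picture_hash == b.picture_hash
    # Booleans count as 0 or 1 in the weighted sum.
    return 0.5 * name_sim + 0.2 * same_place + 0.3 * same_picture


blog = Profile("blog", "Jane Doe", "Springfield", "a1b2")
linkedin = Profile("linkedin", "jane doe", "springfield", "a1b2")
namesake = Profile("facebook", "Jane Doe", "Shelbyville", "ffff")
```

With these weights the matching pair scores 1.0 while the namesake scores only 0.5, so a threshold somewhere in between separates them; real weights would have to be tuned on real data.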

Example 2 – disconnected social networks:

I keep my Facebook and LinkedIn networks separate; only a couple of relatives overlap. I use Facebook for personal connections and LinkedIn for professional ones. I sometimes use Facebook to post my new blog posts. It seems like social search engines can link my friends across the networks, yet again, there is no association between me and my blog. Information within one’s activity can help make the connection.

Example 3 – multiple presence using disconnected accounts:

Case 1: My “personal” Twitter account is @kerendg. The other day I submitted a query to Twitter Search, searching for references to my other Twitter account, @BlogMon. I found out that Stowe Boyd (stoweboyd) was asking “who is running BlogMon?”. The link to my blog is on the BlogMon web page and can easily be found using Twhirl or Twitter Search. I don’t know whether he ever got an answer to this question. Soon after, I started following @stoweboyd from my @kerendg account (I don’t follow from @Blogmon). He has not contacted me or followed me on Twitter to this day – maybe because it is hard to make the connection, or maybe it is just not that important. Your blog is an important piece of the new identity (FYI – WordPress lets you use your blog URL as your OpenID).

Case 2: Some entrepreneurs run both the start-up blog and their own blog. In only a few cases is there a link from the corporate website to the entrepreneur’s blog (see Mashery -> Blogroll for the Praxis blog run by their CEO, Oren Michels – even this is not easy to connect). It is the same story with Twitter accounts (one for the business and one for personal updates). On LinkedIn, the organization is a connection. This is true across networks too, in addition to your role.

Example 4  – blog action:

Case 1 – comments: I don’t get too many comments on my blog; I can only wish for more. Yet I did get some comments from people with a vast web presence. Is this some kind of connection? Did I digg or save one of their blog posts? Did I mention their companies? If you give me your attention, I see it as another type of connection.

I also leave comments occasionally, mostly on the same 4 or 5 blogs. This information can help others understand my preferences. Similar interests are another piece of the puzzle, and so is past activity on my blog. Being a frequent reader is yet another type of connection with the blogger.

Case 2 – traffic source and blog reaction: I look at the WordPress Blog Stats page. The Referrers section shows traffic coming from two types of sources:

  • Unidentified – there is no way to track it back to the person who looked at my website.
    • Traffic from search engines
    • Traffic from comments that I left (and there is no reply)
    • Traffic from “similar post” links
    • Traffic from incoming links to my blog from other blogs
    • WordPress tags
  • Identified – tracked back, with some luck
    • Traffic coming from some sort of a network like Twitter, Jaiku, Digg, reddit, StumbleUpon, Twine, Pijoo.
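Sorting referrers into these two buckets can be automated with a simple host lookup. The host list below is a hypothetical, partial mapping (Pijoo and others are left out), and `classify_referrer` is an illustrative name; real referrer data will need more cases.

```python
from urllib.parse import urlparse

# Hypothetical, partial mapping of referrer hosts to networks where the
# referring person can (with some luck) be traced back.
TRACEABLE_HOSTS = {
    "twitter.com": "Twitter",
    "jaiku.com": "Jaiku",
    "digg.com": "Digg",
    "reddit.com": "reddit",
    "stumbleupon.com": "StumbleUpon",
    "twine.com": "Twine",
}


def classify_referrer(url):
    """Return ("identified", network) or ("unidentified", None) for a referrer URL."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    for domain, network in TRACEABLE_HOSTS.items():
        # Match the domain itself or any subdomain of it (e.g. m.twitter.com).
        if host == domain or host.endswith("." + domain):
            return ("identified", network)
    return ("unidentified", None)
```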

Since nothing can be done about the first type of traffic (non-invasively), there is nothing to add here. For the second type, let the search begin… trying to find the source. Who dugg my blog? Who saved it on reddit? Is there a reply to my comment (as in TechCrunch comment threads)? Who mentioned it on Jaiku or Twitter? The information is scattered all over the net.

Most of my Twitter followers came through my blog. I added as friends people who acted on my posts in other networks. I care about where a reaction originates. When it is actually possible to track it back to the source, it involves a lot of legwork. Blog reaction is another type of connection. As I wrote before in “What is a blog reaction these days?”, I don’t use “Blog Reaction” in the narrow sense of someone writing a counter blog post (a pingback is a trivial link).

Some of the tools that are available today for piecing it together:

  • Google search (Web) – searching for the person’s name and organization association; this is enough to discover presence across multiple networks.
  • LinkedIn – searching for the person’s name and organization, then cross-referencing both profiles with the other details to make sure that this is the same person.
  • Google Alerts  – add your link, blog name, your name, Twitter @account and link – looking for references.
  • Technorati – looking for blog info and fans  
  • Twitter – this is a process
    • If the information is coming from WordPress Referrer then you can follow it (unless it is coming from your account and not worth following)
    • If not – you can search for your link as is, but it is better to also try shortening it with TinyLink, http://is.gd/, http://snurl.com or other URL shortening services and searching for the shortened version. Use Twitter Search for that. Tip: select “All Languages” – I found out that if this setting is off, a link search returns nothing even when the update is in English (the default).
    • Use Twitter Search to search your name, and @account too.
  • Delver – network graph – this service can save some of the leg work checking multiple networks.
  • Jaiku – same as Twitter. What is nice about Jaiku is that Google Alerts picks up on conversations.
  • Flickr – some people like me don’t have account but do have tagged pictures submitted by friends.
  • WordPress Referrer – this is the starting point
  • Digg, reddit, del.icio.us and the like – go to your profile page and see who dugg/saved/rated your post
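Link searches only match the exact text people pasted, so it helps to search for a few spellings of the same URL as well as its shortened forms. A hypothetical helper that lists the common plain variants (with or without `http://`, `www.` and a trailing slash) might look like this. It drops query strings and cannot produce shortened links, which still have to come from the shortening services themselves.

```python
from urllib.parse import urlparse


def link_search_variants(url):
    """Hypothetical helper: plain spellings of one link to paste into a search box.

    Covers with/without "http://", "www." and a trailing slash; query strings
    and fragments are dropped, and shortened links are not generated.
    """
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = (parsed.hostname or "").lower()
    bare = host[4:] if host.startswith("www.") else host
    path = parsed.path.rstrip("/")
    variants = set()
    for h in (bare, "www." + bare):
        for scheme in ("", "http://"):
            for tail in ("", "/"):
                variants.add(scheme + h + path + tail)
    return sorted(variants)


variants = link_search_variants("http://example.wordpress.com/2008/07/31/web-presence/")
```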

Do you know about more tools?

Now, wouldn’t it be nice if there were one tool that does all that and brings this information forward when it is most relevant?

Bringing it forward

If this information could be gathered through a single tool, then my ideal solution is something like SnapShots. When I click on any account or link anywhere on the web, present me with the graph. Show me this entity’s web presence. Show me how we can connect: this person’s blog, other accounts. The information could be context sensitive – e.g. if I’m on Twitter, show me all Twitter accounts for the same entity, and show me first whether we are connected through Twitter.

Alternatively, send me alerts about subtle semantic links to me and my blog, something like:

  • This individual
    • from this location
    • working at
    • in this role
    • owns this blog
    • x degree from you on Y network
  • Acted:
    • has been to your blog before
    • looked at this post on your blog
    • your profile on LinkedIn
    • your other Twitter account
    • responded to your comment
    • dugg your blog
  • Options
    • Do you want to make a “trivial” link and connect?
    • Go to his blog
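The “x degree from you on Y network” line implies a shortest-path search over each network’s friend or follower graph. Here is a minimal breadth-first search sketch, assuming each network is available as a hypothetical adjacency dict of account handles (the handles below are made up):

```python
from collections import deque


def degree_of_separation(graph, start, target):
    """Breadth-first search over a friend/follower graph.

    graph maps each account to the accounts it is connected to.
    Returns the number of hops from start to target, or None if unreachable.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical per-network graphs; each alert line would query one of them.
networks = {
    "Twitter": {"@me": {"@alice"}, "@alice": {"@bob"}, "@bob": set()},
    "LinkedIn": {"me": {"carol"}, "carol": set()},
}
twitter_degree = degree_of_separation(networks["Twitter"], "@me", "@bob")
```

BFS explores the graph level by level, so the first time it reaches the target is guaranteed to be along a shortest path.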

Summary

In this post I tried to explain that if we want to build a complete social graph, it is not enough to look only at the “trivial” links. They are just the beginning and will not present a full picture of one’s web presence and identity. To construct a useful graph, we need to look at other types of links. These links are scattered across multiple social networks and services. They are part of the enhanced meaning of one’s profile attributes, including activities and relationships.

I see three steps in this direction. The first is piecing together someone’s identity from information drawn from multiple sources. The second is using this information to find new ways in which people are connected, i.e. building the complete and rich social graph. The third is presenting it when relevant.
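The first step, piecing together an identity from multiple sources, can be modeled as grouping accounts that evidence says belong to the same entity (for example, two Twitter accounts that both link to the same blog). Here is a minimal union-find sketch; the account ids and the links between them are hypothetical inputs that some matching step would have to supply.

```python
def merge_identities(accounts, links):
    """Group accounts into identities with union-find.

    accounts: iterable of account ids.
    links: pairs of account ids judged to belong to the same entity;
    every id in links must also appear in accounts.
    """
    parent = {a: a for a in accounts}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for a in parent:
        groups.setdefault(find(a), set()).add(a)
    return sorted(groups.values(), key=lambda g: sorted(g))


accounts = ["twitter:@personal", "twitter:@project",
            "blog:example.wordpress.com", "linkedin:jane-doe"]
links = [("twitter:@project", "blog:example.wordpress.com"),
         ("twitter:@personal", "blog:example.wordpress.com")]
identities = merge_identities(accounts, links)
```

Union-find makes links transitive for free: neither Twitter account links to the other, but both end up in the same identity through the shared blog.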

I did not cover the use cases for having such information at hand. Off the top of my head I can think of a few:

  • It could be handy for QA of web presence – especially if the entity is a business
  • It could be handy for web-sites to understand their crowd
  • It could be handy for business development