I still have a lot to learn about semantic search engines and the semantic web, but I have a few early observations about their direction.
I believe that semantic search engines will do a lot of work in the back end so that people can find deep meanings with simple queries.
I don’t see my dad going online and typing something like “where did the US president’s kids go to school?”. Not because the subject is remote but because of the complexity. Most people type one or two words at most into the search box. Maybe in the future he will ask such a question using his own voice (see Nuance).
So, my assumption is that semantic search engines have different roles:
- In the front-end
- Research – A tool to find patterns, trends and sets. This tool should provide multiple visual ways to present the results (map, heat map, timeline, bubbles, tag cloud, 3D). Use case: I want to expose these and show them to the world in a clear visual way (example). I would like to present them on my blog or web site (as an iFrame or widget). This is a great tool for researchers, news reporters, and bloggers. TechCrunch, ReadWriteWeb, and others present such data on their blogs periodically – example. Again, this is not for the average user.
- Time saving: Operator is a Firefox extension that adds the ability to interact with semantic data on web pages, including microformats, RDFa and eRDF. You can use it to extract and export contacts from a web page to your contact list, and event information to your calendar. It is still hard to find sites that support microformats today (examples that do: Technorati – search results page, Google Maps, LinkedIn – contacts) but maybe Dapper will change this. You can now take an HTML page and automate adding microformat classes using this tool: you let it find the recurring elements in the original page, and Dapper does the “semantic work” for you. For more about this task see this smartly titled blog post: Does Tim Berners-Lee’s Blog support Microformats? By the way, Operator is a great way to check which web sites support microformats (use the Operator sidebar: from the Firefox menu bar choose View->Sidebar->Operator).
- As a back-end operation
- Organize – I want my data to be linked using semantic techniques – see Twine.
- Recommendation (discovery and sharing) – based on mining my data (and others’), what can you tell me that I don’t know that I don’t know?
- Time saving: people have trouble with tags. In my opinion, the main reason is that it is hard to come up with useful, forward-thinking tags that will help to find the data later, match other people’s tags, and help search engines drive more traffic to the article. I can spend a few good minutes thinking whether I should tag it as: social network, social networks, or social-network. So, automating this process will help in many ways: more tagging, quicker tagging, and more common tagging, which means more links and associations. Twine today suggests tags that I can pick from, yet I think that this is just the beginning.
- Money – in a way Google does this in AdSense, matching content and target readers with ads. If you manage to do a better job in this area there is great potential. There are more places to put ads: people’s profile pages (not just in Facebook – I have a ton of profile pages across many social services, sigh), maybe in comments.
- Passive search – Alerts – at this point Google allows you to set alerts for keywords and Delicious for tags. This is another way (as Chris Brogan says) to listen to the web. Maybe there is a way to improve on exact keyword matching with associated content, the same way semantic recommendation engines work. This could also be useful for organizing the alert results: today I get alerts about several keywords in one long list. I’m not sure how many people are using alerts but I think that this is a great tool. It is, by the way, another marketing channel: if I’m telling you what I’m looking for, why don’t you “help me find it”? Alerts can become more sophisticated by looking for patterns, for example: more than 10 mentions of a word/phrase/product/company/my competitor in a day/week/month. I know that you can check that today on Google Trends, but only for high-volume terms, and it is not available as an alert: if you don’t look for it, it will not come to you.
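To make the pattern-alert idea from the last bullet concrete, here is a minimal sketch in Python. It only assumes that the mentions already arrive as (date, text) pairs; the function names, the sample data and the dates are made up for illustration and are not taken from any real alert service.

```python
from collections import Counter
from datetime import date


def mentions_per_day(mentions, term):
    """Count, per day, how many mention texts contain the term (case-insensitive)."""
    needle = term.lower()
    counts = Counter()
    for day, text in mentions:
        if needle in text.lower():
            counts[day] += 1
    return counts


def days_over_threshold(mentions, term, threshold=10):
    """Return the days on which the term was mentioned more than `threshold` times."""
    counts = mentions_per_day(mentions, term)
    return sorted(day for day, n in counts.items() if n > threshold)


# Hypothetical sample data: 11 mentions on one day, 3 on the next.
sample = (
    [(date(2008, 3, 1), "Twine opens its private beta")] * 11
    + [(date(2008, 3, 2), "A first look at Twine")] * 3
)

alert_days = days_over_threshold(sample, "twine")
print(alert_days)
```

With the default threshold only the first day triggers an alert. A real service would also need weekly and monthly windows and smarter matching than a plain substring test.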
Finally, Google today does a lot of “microformat work” and some semantic discovery too, all behind the scenes. You don’t ask for it explicitly but Google will still deliver it. When you search for ReadWriteWeb, for example, you’ll get along with the site link also the Contact, About, and Products information. If you search for “movie near Lexington, MA” you’ll get what you need. This, too, supports my assumption that you are not going to see sophisticated queries submitted by most people, but the semantic web will be able to come up with better and more relevant answers to simple queries with complex meanings.
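For readers who wonder what this “microformat work” looks like in practice, here is a rough sketch in Python of pulling contacts out of hCard markup, similar in spirit to what Operator does in the browser. It is not how Operator or Google actually implement it. The class names (vcard, fn, org, tel, email) come from the hCard microformat; the sample HTML and the HCardParser class are invented for this example, and the sketch assumes well-nested markup with no void elements such as br inside a card.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect a few hCard fields from elements nested inside class="vcard"."""

    FIELDS = {"fn", "org", "tel", "email"}

    def __init__(self):
        super().__init__()
        self.cards = []      # finished contacts, one dict per vcard
        self._card = None    # the contact currently being read, if any
        self._field = None   # the field whose text we expect next
        self._depth = 0      # nesting depth inside the current vcard

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._card is None:
            if "vcard" in classes:
                self._card = {}
                self._depth = 1
            return
        self._depth += 1
        for name in classes:
            if name in self.FIELDS:
                self._field = name

    def handle_data(self, data):
        text = data.strip()
        if self._card is not None and self._field and text:
            # Keep the first value seen for each field.
            self._card.setdefault(self._field, text)
            self._field = None

    def handle_endtag(self, tag):
        if self._card is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.cards.append(self._card)
            self._card = None


# Made-up page fragment for illustration only.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Example</span>
  <span class="org">Example Corp</span>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
</div>
<p class="fn">not part of a card</p>
"""

parser = HCardParser()
parser.feed(SAMPLE)
contacts = parser.cards
print(contacts)
```

The paragraph outside the vcard element is ignored, which is the point of the format: the class names tell the tool which text is a name, an organization or an email address without any guessing.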
For the past few weeks I’ve been working on a project that tries to push the throughput of our system (at work) to a new level of scalability and performance. The motivation is a new vertical with a potentially enormous number of “transactions” per day. We completed a similar project with the same aim more than a year ago, and it allows us to deal with tons of transactions as we enter another new market. Last time it was done in a rush under a very tight schedule, but we made it. This time I have some time to think it through and explore what’s out there. I’m looking at multiple solutions for scaling database operations. We already abstracted servers and other resources to allow both vertical and horizontal scalability, yet we are still “counting” on enterprise database solutions like Oracle and SQL Server to help scale storage operations. At the levels that we are about to deal with, it will cost our customers a lot to install such storage environments. We would like to squeeze as much as possible out of an existing one so we can lower TCO.
So, to the point of this post: I’m looking at multiple ways to gather existing knowledge from the web about Scalability, Performance, Optimization, Utilization, Storage and more in one place. I would like to create a knowledge base with the best resources available on this matter. Since I like to blog and explore new search techniques I decided to look at Twine. I found it a few days ago while looking at my WordPress Blog Stats page. I got some traffic from this site so I went to check it out. I’m still learning about this tool and I hope that I can describe it accurately. Remember, my first objective is to be able to aggregate as much knowledge as possible about scalability and performance.
What Twine lets me do:
- You can simply use it as a way to save bookmarks – you can tag an article, and the system will also offer some tags of its own and let you remove the undesired ones.
- You can join an existing Twine – I joined the Web Industry Trends Twine, where I can see articles saved by its members. I can see comments left by others and add my own comments. I can see how many people viewed each item. The system supports email and feed options per Twine.
- You can start your own Twine - Here you have full control of the content of the Twine. You are the Twine webmaster. So, I started the Scalability and Performance Twine. You can make it public and allow new members to join, or invite others yourself. I made it so you can join by request. The system allows you to add all sorts of items to this mini knowledge base, like bookmarks, documents, notes, images and video. The engine behind this application uses all sorts of recommendation algorithms to suggest related Twines and tags. I’m not sure whether it suggests them or you need to add them yourself for these additional options: Places, Organizations. The key thing is that you can organize information around a subject matter in one place with the help of others and a sophisticated recommendation/search engine. It reminds me of a wiki, but without having to learn wiki syntax and with a powerful recommendation engine to help with the task. It is also great that someone owns the Twine and is responsible for making sure that only the best information is kept and only the members that truly contribute to the Twine remain associated with it.
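I don’t know how Twine’s tag suggestions work internally, but one small piece of any automatic tagging is treating spelling variants such as social network, social networks and social-network as the same tag. Here is a minimal sketch of that idea in Python. The function names are hypothetical, and the plural handling is deliberately naive (it just drops a trailing “s” from the last word), so treat it as an illustration rather than a real algorithm.

```python
import re


def normalize_tag(tag):
    """Map spelling variants of a tag to one canonical, hyphenated, lowercase form."""
    words = [w for w in re.split(r"[\s_-]+", tag.strip().lower()) if w]
    if words:
        last = words[-1]
        # Naive singularization: "networks" -> "network", but leave "class" alone.
        if len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
            words[-1] = last[:-1]
    return "-".join(words)


def group_tags(tags):
    """Group the original tags under their canonical form, keeping input order."""
    groups = {}
    for tag in tags:
        groups.setdefault(normalize_tag(tag), []).append(tag)
    return groups


groups = group_tags(["social network", "social networks", "social-network", "Scalability"])
print(groups)
```

All three spellings end up under social-network, so an article tagged with any of them links to the others.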
The UI is easy to use and very intuitive. I see some room for improvement in how the UI uses screen real estate: there is too much scrolling on some pages, like the profile page (make the profile picture smaller). Some pages load a little slowly, like the My Twines overview page, but this is easy to fix and, as you can see in the upper left corner, the site is still in beta. I actually got in after asking for and receiving a private invite.
Now, I hope that Twine will open up and launch its service publicly so more people can join in. When that happens, or if you already have a private invite, and if you care about the subject of Scalability and Performance, know about it and know where to find good sources of information, please join the Twine service and my Scalability and Performance Twine. The more I use it the more I like it.
Update: you can find here a very helpful screencast given by Nova Spivack, Twine’s founder. The system is packed with functionality so this is a great way to get up to speed with it. Also, the Twine Bookmarklet is a great time saver when adding bookmarks (aren’t you getting tired of adding a title, description and tags for each bookmark you save?). This for me is enough reason to use the system (and I’m a fan of Delicious, the social bookmarking site).