Monday, September 01, 2008

Enabling Semantic Infrastructure for Collaborative Systems

As more companies embrace the techniques of enterprise social software (ESS), I have started wondering what I am going to do with all the extra data coming my way. ESS systems promote collaboration, which often leads to Even More Data coming into my "information space." Currently it's not so bad: I have a middling amount of email, about fifty thousand bookmarks, and about a hundred RSS feeds. How hard can that be to keep up with? But think about how much that space would expand if I added even my most immediate group of co-workers. In effect, I would have all of their bookmarks, RSS feeds, and so on as well. That is only a good thing if I have some help dealing with it all.

The only way out of the coming information glut will be to have the machines help dig us out. But how? Enterprise search is heavy, expensive, local text search. It can be helpful, but it does nothing to handle the meaning of the data, i.e., its semantics. Semantic data management needs to be baked into a content system, so that generating metadata simply becomes part of the working environment. This can take very simple forms, such as tags/folksonomies and standard representations of working groups using vocabularies such as Friend of a Friend (FOAF) and Description of a Project (DOAP). When semantic data management is part of the infrastructure in an ESS-type environment, it promises to let data be organized and queried in interesting and emergent ways. In a corporate environment, the system can easily link up associated projects, content, or people.
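To make the FOAF/DOAP idea a little more concrete, here is a minimal Python sketch, not tied to any product, that records people, a project, and a tag as RDF triples using real FOAF and DOAP terms (`foaf:Person`, `foaf:name`, `doap:Project`, `doap:name`, `doap:developer`), serializes them as N-Triples, and then derives which co-workers share a project. The `example.com` URIs, the sample people and project, and the `tag` predicate are hypothetical placeholders chosen for illustration.

```python
from itertools import combinations

# Real vocabulary namespaces.
FOAF = "http://xmlns.com/foaf/0.1/"
DOAP = "http://usefulinc.com/ns/doap#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical placeholders, for illustration only.
EX = "http://example.com/"
TAG = "http://example.com/vocab#tag"  # hypothetical folksonomy-style tag predicate


def triples():
    """Return a small, hypothetical graph as (subject, predicate, object) tuples."""
    alice, bob, carol = EX + "people/alice", EX + "people/bob", EX + "people/carol"
    project = EX + "projects/search"
    return [
        (alice, RDF_TYPE, FOAF + "Person"),
        (alice, FOAF + "name", "Alice"),
        (bob, RDF_TYPE, FOAF + "Person"),
        (bob, FOAF + "name", "Bob"),
        (carol, RDF_TYPE, FOAF + "Person"),
        (carol, FOAF + "name", "Carol"),
        (project, RDF_TYPE, DOAP + "Project"),
        (project, DOAP + "name", "Intranet Search"),
        (project, DOAP + "developer", alice),
        (project, DOAP + "developer", bob),
        (project, TAG, "search"),
    ]


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(graph):
    """Serialize the graph as N-Triples, one statement per line.

    Simplification: any object starting with "http://" is treated as a URI,
    everything else as a plain string literal.
    """
    lines = []
    for s, p, o in graph:
        obj = "<" + o + ">" if o.startswith("http://") else '"' + escape_literal(o) + '"'
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)


def coworkers(graph):
    """Return pairs of people listed as doap:developer on the same project."""
    developers = {}
    for s, p, o in graph:
        if p == DOAP + "developer":
            developers.setdefault(s, []).append(o)
    pairs = set()
    for people in developers.values():
        for a, b in combinations(sorted(people), 2):
            pairs.add((a, b))
    return pairs
```

Calling `print(to_ntriples(triples()))` prints one statement per line, and `coworkers(triples())` links Alice and Bob through their shared project while leaving Carol out. The point is that once people and projects are described in shared vocabularies, "who works with whom" becomes a query over the data rather than something each application has to hard-code.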

This sounds fabulous in theory, but adding such capability to your environment is actually harder than putting in a big-time enterprise text search server. Semantic web technology is some of the newest 10-year-old technology in our bag of tricks, and it can capture, analyze, and derive value from the relationships inside the data. But to get this kind of semantic connectivity, you need a purpose-built data store, a server to house it, and an analytical/query system to get at it. Even worse, semantic metadata gets big very quickly, which can lead to storage and query-response issues. So, not helpful, right? The software integration problem alone is daunting enough, and adding semantic infrastructure to my overall ESS platform means adopting yet another system to manage.

Naturally, I want to imagine that I have access to a semantic data server that looks as if it lives inside my machine room but feels like software-as-a-service (SaaS). I have worked quite a bit with virtualization from external providers: Amazon's EC2, CohesiveFT, and others. These are amazing systems that let me sign up, submit some commands, and watch some magic happen. What's great is that I don't have to set up the systems; I don't even need to understand how they work. I can just use them as if I *had* spent six months adding a new wing to my server room. With a virtual semantic server, I can integrate the promise of the "linked web of data" into my ESS platform to manage and leverage the meaning of all that data.

Now, it's important to observe that virtualization works in the SaaS model because of the generic nature of the task. The vendors of such services are able to optimize for scale, performance, and reliability without needing to know precisely how the systems are going to be used. Semantic data as a service falls into this same category in that it's generic and needs to be actively optimized for scale, performance, and reliability.

Talis, a UK-based company, is aiming to be the Amazon EC2 of the semantic web, and I think they have a good shot at it. The same principles apply: Talis is concerned with making a semantic store fast, reliable, and scalable, so you don't have to be. Your data is stored and processed somewhere else, but it's always your data. Via a straightforward HTTP-based interface, you add metadata and query against it.
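As a rough illustration of what "add metadata and query against it over HTTP" can look like, here is a minimal Python sketch using only the standard library. The query side follows the W3C SPARQL Protocol convention of passing the query in a `query` parameter on a GET request. Everything store-specific is an assumption: the `store.example.com` URLs, the `/meta` and `/sparql` paths, and the choice of Turtle as the upload format are hypothetical placeholders, not Talis's documented API, and a real store may expect RDF/XML or different endpoints. The functions only build requests; they do not send them.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoints; consult the provider's documentation for the real ones.
STORE = "http://store.example.com/mystore"
META_URL = STORE + "/meta"      # hypothetical: accepts RDF uploads via POST
SPARQL_URL = STORE + "/sparql"  # hypothetical: SPARQL Protocol query endpoint


def add_metadata_request(turtle):
    """Build (but do not send) a POST that uploads RDF serialized as Turtle.

    The text/turtle content type is an assumption about what the store accepts.
    """
    return urllib.request.Request(
        META_URL,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def query_request(sparql):
    """Build (but do not send) a SPARQL Protocol GET with the query URL-encoded."""
    url = SPARQL_URL + "?" + urllib.parse.urlencode({"query": sparql})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+xml"},
        method="GET",
    )
```

Sending either request is then a matter of `urllib.request.urlopen(req)` against a real endpoint. The appeal of the hosted model is visible in how little is here: no triple store, no indexing, no query engine on my side, just HTTP.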

Mind you, this is all very new. Talis themselves have not yet defined their precise business model, but they are working on it and, for the time being, are making developer access free for the asking. Clearly, there are many real-world issues to resolve, such as SLAs, privacy, and billing models, but the key notion here is that semantic data processing is quite generic, and we should not be building our own semantic servers to manage this data. In consumer-land, "Web 2.0" is creeping toward the linked web of data, and more people are (finally) starting to understand what TimBL (Tim Berners-Lee) was talking about with this 'semantic web' stuff. Now, as ESS systems proliferate, is the time for those of us who glue these systems together to take advantage of what semantic web technology can do for us, and to skip the server set-up part by using a service such as the one Talis provides.

See the Talis.com website and their developer wiki (n2.talis.com) for some overview articles and a taste of how to interact with a Talis data store. A future post will include some of my experiments with the system.
