Folksonomies vs. Ontologies

According to Wikipedia, a folksonomy is the practice of collaborative categorization using freely chosen keywords. Folksonomies offer an alternative to classification using ontologies that is more flexible and scalable, and is therefore well suited for collaborative categorization by non-professionals.

Although ontologies are widely used to classify the information we possess about the world, they are not flawless. Some people think that in the context of the Web the very purpose of ontologies (classifying objects into classes) is flawed. Clay Shirky has a recent interview on ITConversations entitled “Ontology is Overrated” on the topic. In my view, everyone attempting to build a general ontology runs into a major problem: some concepts can be classified in different ways depending on the expected use. Is Australia a continent, a country or a rugby team? No matter which of the classes is chosen, some users will find the classification arbitrary. This is one reason why directory services (not only tools like LDAP, but also web directories like ODP and even file systems) offer the possibility of adding aliases (symbolic links) between related categories that are part of different subtrees. While this can solve the classification problem in many cases, the number of links that one has to add can become prohibitive.
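The aliasing problem can be made concrete with a small sketch. Everything here is hypothetical (the toy `directory`, the `aliases` table and the helper names are assumptions made for illustration, not how LDAP, ODP or any file system actually stores its data): each entry has exactly one canonical place in the tree, and every other place it could belong needs an explicit link.

```python
# Hypothetical toy directory: every entry has exactly one canonical parent path.
directory = {
    "Australia": ("Geography", "Continents"),
}

# Aliases (symbolic links) make the same entry reachable from other subtrees.
aliases = {
    ("Geography", "Countries", "Australia"): "Australia",
    ("Sports", "Rugby", "Teams", "Australia"): "Australia",
}


def resolve(path):
    """Return the entry a path refers to, following an alias if there is one."""
    *parent, name = path
    if directory.get(name) == tuple(parent):
        return name  # canonical location
    return aliases.get(tuple(path))  # None when nobody added a link


def links_needed(categories_per_entry, entries):
    """Aliases to maintain if every entry belongs to several categories."""
    return (categories_per_entry - 1) * entries
```

With this layout, `resolve(("Geography", "Islands", "Australia"))` finds nothing until somebody adds yet another alias, and under the simplifying assumption that every entry belongs to the same number of categories, the number of links grows with both the number of entries and the number of categories per entry.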

What folksonomies do is give up hierarchical relations completely. Instead, metadata is attached to each object in the form of tags (labels). A major advantage is that an object can have an arbitrarily large number of tags. Related tags can be determined automatically and used to increase the accuracy of searches. Moreover, these tags are added by the users themselves and not by experts. So it is the users who create metadata for their own individual use, and that metadata is also shared throughout a community. Two such communities are Delicious and Flickr. Delicious allows users to store and share bookmarks, while Flickr does the same thing for photos. Because users can add any tags they like, this is a very flexible way for them to organize their information. But things don’t stop here, because the real power of folksonomies comes in when the supporting community grows large enough. Not only is the categorization usually comparable to that made by experts, but the resulting system is much more flexible and the advanced search capabilities are almost limitless. Although folksonomies are just beginning to become popular at this time, the idea behind them is very simple and interesting. They surely deserve further investigation.
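To show how little machinery this needs, here is a minimal sketch of a flat tag store. The `TagStore` class and the sample photos are made up for illustration, and "related" is assumed to mean plain co-occurrence counting, which is only one of several ways a real service might compute it; this is not how Delicious or Flickr are actually implemented.

```python
from collections import Counter, defaultdict


class TagStore:
    """Flat tagging: no hierarchy, any number of tags per object."""

    def __init__(self):
        self.tags_by_object = defaultdict(set)
        self.objects_by_tag = defaultdict(set)

    def tag(self, obj, *tags):
        """Attach freely chosen tags to an object (case-insensitive)."""
        for t in tags:
            t = t.lower()
            self.tags_by_object[obj].add(t)
            self.objects_by_tag[t].add(obj)

    def search(self, *tags):
        """Objects that carry all of the given tags."""
        sets = [self.objects_by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def related(self, tag):
        """Tags that co-occur with `tag`, most frequent first."""
        tag = tag.lower()
        counts = Counter()
        for obj in self.objects_by_tag.get(tag, ()):
            counts.update(self.tags_by_object[obj] - {tag})
        return [t for t, _ in counts.most_common()]


# Hypothetical sample data: the same word is tagged in several senses.
store = TagStore()
store.tag("photo1", "Australia", "rugby")
store.tag("photo2", "Australia", "continent")
store.tag("photo3", "Australia", "rugby", "team")
```

Here `store.search("australia", "rugby")` narrows the results to the rugby photos, and `store.related("australia")` ranks "rugby" first because it co-occurs most often. Nobody had to decide up front whether Australia is a continent, a country or a team.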

For a more in-depth introduction to folksonomies, see the very interesting article entitled “Folksonomies – Cooperative Classification and Communication Through Shared Metadata”.

7 Responses to Folksonomies vs. Ontologies

  1. Henry Story says:

    (You really need a preview button!)
    Well, folksonomies and RDF are not at all at loggerheads. In fact, RDF can be an excellent way to exchange folksonomies. If I wanted to tag my latest blog entry I could do so easily using RDF (in N3 notation):

    @prefix : ≤≥ .

    ≤≥ :category [ :term “w3c” ], [ :term “html” ], [ :term “standards”] .

  2. hritcu says:

    Tagging and RDF surely do not exclude each other, and your example is very nice at illustrating this.

    So you like RDF, perfect. It can be used together with tags, even better, that means it is a versatile enough language. Well, triples existed a long time before RDF so it’s not really a novel idea by the W3C. And it’s not that you have to use RDF in order to get tagging. And finally, it’s not that the database vendors have to do a lot of work to store triples. Maybe they can optimize things a little when they are not storing arbitrary relations.

    Anyway, the discussion here was Folksonomies vs. Ontologies, or better put social tagging and searching vs. … vs. what? ODP? Yahoo Directory? Cyc? OWL? File systems?

    ODP: “The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors”. I am a small editor for ODP and I know the whole principle behind it is flawed. You simply cannot classify web sites meaningfully into a general ontology, because there are way too many categories a site has to belong to (and by way too many, I mean hundreds if not thousands). The process is usually quite arbitrary, and in the end nobody browses the categories anyway, even when using ODP directly: ODP has search. And Delicious is better anyway.

    Yahoo Directory: Yahoo hired the best people they could find to build its directory service. The result: Who uses Yahoo Directory when there is Yahoo Search? Now when I go to yahoo directory I only get a search field. They entirely disabled browsing the directory (should they still call it a directory in this case?). And when searching for more obscure words you get: “No Directory Search results were found. Showing Web Search results for the term …”.

    Cyc: “The world’s largest and most complete general knowledge base and commonsense reasoning engine”. They have one of the largest ontologies in the world, and it was recently open sourced. Did anyone care? Where was IBM trolling Cycorp to open source Cyc, the way they are doing it with Sun and Java? Probably Cycorp finally realized it’s not worth anything, so they dumped it on the community under the Apache license. They worked 20 years on it. It was supposed to make true A.I. happen (this was what people were thinking in the 80s). Now it’s the last individual of an already extinct species. The A.I. bubble burst, so why do you think the Semantic Web bubble won’t? Everything that is built only of hot air will eventually burst.

    OWL: It is just an XML format to store ontologies. It is subject to all the problems of ontologies I mentioned above. Do you know of places where it’s meaningfully used? Other than FOAF please.

    File Systems: Since I’m running out of time, I’ll end here. File systems have always been the best example of a very domain-specific ontology that the user has to build. The response: desktop search is a technology that is going to skyrocket. Why? Simply because people don’t really want the trouble of maintaining even a very small ontology. Because usually the files end up in a terrible mess on the desktop and then in the garbage can. When I moved to a Mac, at first I thought that programs like iTunes, which want to organize my music files themselves using metadata from the web, were way too intrusive. I was still used to maintaining the ontology myself. But now I am thinking, why bother? Why not let iTunes organize (i.e. index) my music? Then I can use search to find whatever I am looking for.

  3. Henry Story says:

    You could have made the same claim with respect to the web until 1993. There were other hypertext systems and they failed. The fact that previous technologies failed is not a proof that they are going to fail again (see: Semantic web: a note on the history of technology adoption).
    Regarding Cyc: they were trying to do something a lot more complicated than what we are trying to do with the Semantic Web. They were trying to encode common sense knowledge. We are just trying to make data available. Think of the Semantic Web as simply Databases+URLs. A good example is my recent SPARQLing Roller, where I show how one can make the Roller database queryable using the terms found in the well-known Atom XML document format. This is simple to do, and could prove very useful for finding entries that may otherwise be difficult to find. It also works very nicely with tagging.
    Again, we are not trying to create a perfect ontology. The Semantic Web should be grown, not designed. Start simple and solve simple problems one at a time. If you have valuable data, people will search it. SPARQL gives us a nice standard query language. RDF makes it easy to mix and match different vocabularies. Just use those vocabularies that are already well established; it will reduce your work. The problem with the Open Directory Project, from what you are saying, is that you were trying to create a perfect initial ontology. We just say: let it grow.

    So what is the main advantage of RDF over other languages? Simple. It uses URIs to identify concepts, and so makes it easy to GET their meaning. It is a very simple improvement. But that’s exactly the improvement that made the web so successful. The web is after all just a document format + URIs to name the documents, and HTTP to GET them.

  4. hritcu says:

    We hardly have an argument here. You are still trying to convince me that RDF is great, but I already agree with you on this. So let’s make this clear: RDF is great for storing and querying metadata. I was not saying that the web won’t become more “semantic” (it surely will), or that it’s impossible that one day we will have our metadata stored in a meaningful way. However, from this to the W3C vision of the Semantic Web there is a very big gap.

    What I am trying to explain to you is that the only kind of meaningful layer that can be built on top of XML and RDF can only be based on tagging and searching. Ontologies in the Cyc/W3C OWL sense are just not going to happen again (they just died), and this is for exactly the same reason you mention in your note on the history of technology adoption: there is no “wide enough spread of foundational technologies on which the successful one can grow”. The simple reason is that for Ontologies to become mainstream we would first need at least true A.I., and as you can probably see, true A.I. is not here yet, and won’t be for a very long time. So unless you can make a compelling argument against this we already agree on everything else.

  5. Henry Story says:

    You wrote

    The simple reason is that for Ontologies to become mainstream we would first need at least true A.I., and as you can probably see, true A.I. is not here yet, and won’t be for a very long time. So unless you can make a compelling argument against this we already agree on everything else.

    This is where I disagree. We don’t need true AI. People just need to see the value in opening up their databases. The value is obvious. It is just an extension of Metcalfe’s law. You are blinded because you look at the fulfilled promise of the Semantic Web, which is a long way off. But there are a lot of small steps that lead to it. Again, have a look at my SPARQLing Roller example to see what the first step is.

    Small, simple ontologies are the way forward. The network effect will take care of the rest.

  6. […] So what do I think would be a better solution? Deprecate HTML and XHTML Transitional entirely (that should include completely removing the validators, they are poor anyway) in favor of XHTML. Whoever still wanted to publish “tag soup” online would not adhere to any standard, and whoever wanted to render “tag soup” in a browser would not adhere to any standard (this is the current situation anyway). However, this could be an incentive for all (web developers, web publishing software developers and web browser developers) to move away from “tag soup” and towards something better (yes, I do think that XHTML is better than HTML). Maybe I am wrong, but I don’t think that you can build a meaningful web with “tag soup” as a foundation. Notice that I did not use the word semantic in the last phrase, here is why, and here. […]
