Saturday, September 12, 2015

What can we do for the OBKMS to happen?

This is my first blog post here, so it is worth defining what the Open Biodiversity Knowledge Management System (OBKMS) actually is, or at least how I view it. The OBKMS was defined as part of the EU-funded project pro-iBiosphere. In a nutshell, its implementation would allow biodiversity and biodiversity-related data to flow freely from acquisition to analysis and storage to publishing and back.

One of its early critics is Prof. Rod Page, whose blog I look up to for inspiration and as a model for writing mine. I mean "critic" in a positive way: only through critique can a concept be improved, and that is the very essence of the scientific method. In personal correspondence with me, Prof. Page organized his concerns into the following major categories:

  1. The system itself needs to be well-defined: what it is, what it is trying to achieve, etc.
  2. Linked data and semantic web are still in a very early phase of their development and represent hopes rather than address real challenges.
  3. Network effects need to be leveraged. In his language, a "network effect" is an effect "where both users and providers get tangible benefits."

In my current understanding of the OBKMS, I completely agree with Prof. Page on (1) and (3) and disagree on (2). I do understand the scientific community's dislike of linked data (scientists are used to storing data in a very different way than as triples), but I nevertheless believe in linked data as it stands today. I also believe that in the future there will be easy ways to transform tabular data into a triple store and back. For an example, see the article by Allocca and Gougousis (2015), whose academic editor was the humble writer of this blog, which gives an idea of how to reverse RML.
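To make the table-to-triples idea concrete, here is a minimal sketch in Python. It is not RML and not the method of Allocca and Gougousis; it only shows the general shape of such a round trip. The namespace `http://example.org/`, the function names, and the sample records are all hypothetical and chosen for illustration.

```python
import csv
import io

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"
RECORD = BASE + "record/"
FIELD = BASE + "field/"


def table_to_triples(rows, key):
    """Turn each non-empty cell into a (subject, predicate, object) triple.

    The subject is built from the key column, the predicate from the
    column name, and the object is the cell value as a plain string.
    """
    triples = []
    for row in rows:
        subject = RECORD + row[key]
        for column, value in row.items():
            if column != key and value != "":
                triples.append((subject, FIELD + column, value))
    return triples


def triples_to_table(triples, key):
    """Group triples by subject to rebuild one dict per original row."""
    records = {}
    for subject, predicate, obj in triples:
        record_id = subject[len(RECORD):]
        column = predicate[len(FIELD):]
        records.setdefault(record_id, {key: record_id})[column] = obj
    return list(records.values())


# Made-up sample data, not taken from any real dataset.
CSV_TEXT = "id,genus,country\n1,Apis,Denmark\n2,Bombus,Bulgaria\n"

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
triples = table_to_triples(rows, key="id")
round_trip = triples_to_table(triples, key="id")

for triple in triples:
    print(triple)
```

In this sketch the round trip is lossless only when every row has at least one non-empty cell besides the key, because empty cells produce no triple. A real mapping would also have to deal with data types, shared vocabularies, and identifiers that are stable across datasets.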

That is all for this first post. Next week I will be at the BIG4 kick-off meeting in Copenhagen, where I will present some interesting workflows that I've been helping develop for the Biodiversity Data Journal.

1 comment:

  1. Hi Viktor,

    Nice blog, looking forward to reading more. Regarding (2), it's not so much that linked data and the semantic web are still in the early stages (they've actually been around for a while), it's just that nobody seems to have clearly articulated what problems they solve, what the drivers are for adopting those technologies, and how they are going to be implemented in practice. Instead, it's become a slogan that people invoke like magic pixie dust.

    By way of contrast, it's interesting to look at how structured data in web sites has taken off (which could be viewed as semantic web-lite, at a stretch).

    1. There's a clear driver, in that search in a mobile world needs to be about things, not web pages about things. So Google, Facebook, etc. want to be able to provide answers to searches that are entities. If I'm looking for a place to eat, I don't want a web page about places to eat, I want a list of restaurants, directions, and reviews.

    2. Google uses two approaches to building its knowledge graph of entities (things): sophisticated text mining/machine learning, and structured markup.

    3. Given the importance of Google search to web masters, there is a very clear driver for web sites to provide structured markup for Google, so that they will be discovered and correctly interpreted. Markup = discoverability, no markup = drop in ranking.

    4. Google provides simple markup designed to capture core entities and their relationships, and encourages use of consistent identifiers.

    I've yet to see anyone in biodiversity informatics articulate a comparable vision for our field, and until we do I think we are essentially whistling in the wind.