Tuesday, August 2, 2016

Taxon vs Taxon Concept

In the August issue of the OBKMS blog, I would like to write about the "Taxon vs Taxon Concept" debate. For me, July was the month of this heated debate. I started the month not really sure what a taxon concept is, and thinking I knew what a taxon is. I am ending it with a changed perception of what a taxon is and a much better idea of what a taxon concept is.

But let’s rewind and set the stage. Enter the taxon:
“A taxonomic unit, whether named or not: i.e. a population, or group of populations of organisms which are usually inferred to be phylogenetically related and which have characters in common which differentiate (q.v.) the unit (e.g. a geographic population, a genus, a family, an order) from other such units. A taxon encompasses all included taxa of lower rank (q.v.) and individual organisms. [...]” (Wikipedia citing the Code).
So, the taxon is the natural group taxonomists are studying and we ought to model it in OBKMS!

Well, not so fast!

OBKMS is a scientific database that aims to integrate taxonomic information. Being a scientific database, it integrates information about taxa found in scientific theories about taxa. Enter the taxon concept:
“A taxonomic concept is the underlying meaning, or referential extension, of a scientific name as stated by a particular author in a particular publication. It represents the author’s full-blown view of how the name is supposed to reach out to objects in nature.”
Okay, so the taxon concept is a circumscription of a taxon by a taxonomist in a written record. OBKMS will initially process scientific articles and extract their information, so one could argue that the extracted information will be encapsulated in informational items, and those items are taxon concepts. So it seems even more natural to model taxon concepts.

But then do we model both taxa and taxon concepts? After all, taxon concepts “are about” taxa. Well, not necessarily. Essentially, our database needs to do two things:

  1. It needs to capture and extract the information contained in various biodiversity texts and records.
  2. It needs to link this information “to the real world.” That is, when we want to know something about a taxon, an entity in the real world, we want to be able to ask the database what information it stores about this entity.

It seems natural to think of the database taxon, x, as an entity standing for the real-world taxon and linked to the information concepts. However, what is the real nature of x, i.e. what is the entity x that stands for the real-world object taxon? In semiotics, the branch of philosophy dealing with symbolism, the act of linking units of thought to real things is modeled by a triple, or triangle: [symbol (signifier), reference (unit of thought, or meaning), referent (real-world object)]. See http://biorxiv.org/content/early/2015/07/10/022145 for details. So, the taxon is the referent, i.e. the real-world object we are modeling. The taxon concept is the reference, i.e. the unit of thought that we ought to link to the real-world object taxon. What is the symbol then? Enter the scientific name:
“The full scientific name, with authorship and date information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the identificationQualifier term. 
Examples: ‘Coleoptera’ (order), ‘Vespertilionidae’ (family), ‘Manis’ (genus), ‘Ctenomys sociabilis’ (genus + specificEpithet), ‘Ambystoma tigrinum diaboli’ (genus + specificEpithet + infraspecificEpithet), ‘Roptrocerus typographi (Györfi, 1952)’ (genus + specificEpithet + scientificNameAuthorship), ‘Quercus agrifolia var. oxyadenia (Torr.) J.T. Howell’ (genus + specificEpithet + taxonRank + infraspecificEpithet + scientificNameAuthorship).”
So the scientific name is the symbol that stands for the taxon in nature. In a sense, the taxon is forever inaccessible to the world of informatics, as we cannot speak directly of it. Only the act of naming creates a connection between the world of thought and the real world. So we do not need to model taxa! We need to model scientific names!
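
To make the triangle a bit more concrete, here is a minimal Python sketch of how its corners might look as data. The class and field names (ScientificName, TaxonConcept, source, description) are my own assumptions for illustration and are not taken from OBKMS or any standard. Note that there is no class for the taxon itself: the referent lives in nature, and the sketch only holds symbols and references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScientificName:
    """The symbol: a name string that stands for a taxon in nature."""
    name: str


@dataclass(frozen=True)
class TaxonConcept:
    """The reference: one author's view of what a name covers."""
    name: ScientificName  # the symbol the author used
    source: str           # hypothetical free-text pointer to the publication
    description: str      # hypothetical summary of the circumscription


# The referent (the taxon in nature) is deliberately not modeled.
manis = ScientificName("Manis")
concept = TaxonConcept(
    name=manis,
    source="(some publication)",        # placeholder, not a real citation
    description="(circumscription text)",  # placeholder
)
print(concept.name.name)
```

The only way from a concept "towards" the taxon in this sketch is through the name, which is the point of the argument above.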

Yes, but that is not the full story. Remember the semiotic triangle: it links concepts to taxa, and the names symbolize concepts and stand for taxa. So, when we want to know something about a taxon, we use its scientific name instead and request the information about it (the concept). Clearly, there is a one-to-one mapping between scientific names and taxonomic concepts!


Not quite, because taxonomic concepts evolve. Enter the African elephant, Loxodonta africana. Until 2000, L. africana symbolized a taxonomic concept that included both the bush elephant and the forest elephant (they were considered subspecies of the same species). However, Grubb et al (2000) hypothesized that the two subspecies are in fact two species, and a Science article in 2001 (DOI: 10.1126/science.1059936) reported genetic evidence that they are about as different as the Asian elephant and the woolly mammoth. So, to many, after 2001, L. africana stands only for the bush elephant and symbolizes a different, revised taxonomic concept. The forest elephant is now “stood for” by the symbol L. cyclotis.
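
The elephant story can be sketched in a few lines of Python to show why a name alone is not a unique key. Everything here is an assumption made for illustration: the register helper, the dictionary layout, and the "pre-2000 usage" source label (a placeholder, not a real citation). The circumscriptions are reduced to the two common names used above.

```python
from collections import defaultdict

# name string -> list of (source, members) pairs; hypothetical layout
concepts_by_name = defaultdict(list)


def register(name, source, members):
    """Record that `source` uses `name` for the given set of members."""
    concepts_by_name[name].append((source, frozenset(members)))


register("Loxodonta africana", "pre-2000 usage",
         {"bush elephant", "forest elephant"})
register("Loxodonta africana", "Grubb et al (2000)", {"bush elephant"})
register("Loxodonta cyclotis", "Grubb et al (2000)", {"forest elephant"})

# One name, two different concepts: the mapping is one-to-many.
for source, members in concepts_by_name["Loxodonta africana"]:
    print(source, sorted(members))
```

Asking this structure about "Loxodonta africana" returns two answers that disagree on whether the forest elephant is included, so the name by itself cannot tell us which meaning is intended.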

But what other symbol do we use to unambiguously link referents (taxa) to their references, or meanings (taxonomic concepts)? Thanks to Nico Franz we have this symbol, and it is called the taxonomic concept label. Essentially, it appends a reference to the record containing the taxon concept to the end of the scientific name string. In the elephant example above, we might say L. africana sensu Grubb et al (2000) to unambiguously speak of the taxon referenced by the taxon concept of Grubb et al (2000). The whole string “L. africana sensu Grubb et al (2000)” is the taxonomic concept label string of the taxon concept of Grubb et al (2000).
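
Building and taking apart such a label is simple string handling. The following Python sketch assumes the plain "name sensu source" form used in this post; the function names make_label and split_label are hypothetical, and real-world labels may need more careful parsing than a single split on " sensu ".

```python
SENSU = " sensu "


def make_label(scientific_name, source):
    """Append the source reference to the scientific name string."""
    return f"{scientific_name}{SENSU}{source}"


def split_label(label):
    """Split a label into (scientific name, source) at the first ' sensu '."""
    name, separator, source = label.partition(SENSU)
    if not separator:
        raise ValueError(f"not a taxonomic concept label: {label!r}")
    return name, source


label = make_label("L. africana", "Grubb et al (2000)")
print(label)               # L. africana sensu Grubb et al (2000)
print(split_label(label))  # ('L. africana', 'Grubb et al (2000)')
```

Unlike the bare name, the label carries its source along with it, so it can serve as a key that points to exactly one taxon concept.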

That's it for now—this discussion was difficult but I think needed!

Live long and prosper!