Monday, January 25, 2016

PlutoF - ARPHA Workflow Established

Estonia, the homeland of Skype, is also the home of PlutoF, a global biodiversity data management system. The purpose of PlutoF is to “Create, manage, share, analyse and publish biology-related databases and projects.” It can administer research projects as well as citizen science projects, and it handles all kinds of taxon occurrence data, such as specimens, observations, sequences, living cultures, bibliographic references, or material samples. It also has several workbenches, called “labs,” which facilitate work with files or collections, publishing, taxonomy, data analysis, and molecular data. It is a web-based multi-user system that is free of charge. The PlutoF services are used by several large academic and citizen science projects in Estonia and around the world, such as UNITE, the unified system for DNA-based fungal species linked to classification, and DINA, an open-source web-based information management system for natural history data.

The PlutoF system can help individual researchers, citizen scientists, and natural history museums manage all their data. To learn more about the system and initiate collaboration, Pensoft invited the PlutoF team of scientists and engineers to visit Bulgaria in the fall of 2015.


The PlutoF technical team in Pensoft’s office in Sofia. From left to right: Dr. Kessy Abarenkov, bioinformatician, database designer; Alan Zirk, team leader; Prof. Urmas Kõljalg, a world-renowned mycologist and bioinformatician, visionary; Timo Piirman, back-end developer, API design; Raivo Pöhönen, front-end developer, user experience design.

Several ideas grew out of this meeting. One of them, for which I was the lead on the Pensoft side, has already been deployed in Pensoft’s production environment: users of ARPHA can now import specimen or occurrence records from the PlutoF database directly into an ARPHA manuscript via their Specimen ID. To accomplish this, Timo designed a specialized API for exporting occurrence data from PlutoF, and I, with the help of other programmers, wrote the software that ingests this data and transforms it into a record in the manuscript. The import is a two-step process because universal identifiers are not widely deployed in the taxonomic community.

Currently, the most widely used system for naming and identifying specimens in biodiversity science is the so-called Darwin Core Triplet, which consists of the institution code, collection code, and catalog number. The catalog number, i.e. the textual reference on the physical label of the specimen, is unique within a collection but need not be unique across collections and institutions. The catalog number corresponds to the Specimen ID in the PlutoF system.
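
To make the uniqueness point concrete, here is a minimal Python sketch of a triplet. The class name `DarwinCoreTriplet`, the colon-joined string form, and the sample codes are all hypothetical illustrations and are not taken from PlutoF or ARPHA.

```python
from typing import NamedTuple


class DarwinCoreTriplet(NamedTuple):
    """Hypothetical container for the three parts of a Darwin Core Triplet."""
    institution_code: str
    collection_code: str
    catalog_number: str

    def __str__(self) -> str:
        # Joining the parts with colons is an assumed convention for display.
        return ":".join(self)


# Made-up sample data: two specimens that share a catalog number
# but belong to different institutions and collections.
a = DarwinCoreTriplet("INST-A", "Fungi", "12345")
b = DarwinCoreTriplet("INST-B", "Plants", "12345")

same_catalog_number = a.catalog_number == b.catalog_number  # the numbers collide
same_triplet = a == b                                       # the full triplets differ
```

The catalog number alone cannot tell the two specimens apart, while the full triplet can.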

We designed the import from PlutoF via Specimen IDs as follows: the PlutoF user locates the Specimen ID of the record they want to import and enters it into a dialog in the ARPHA authoring tool. This ID resolves to one or more unique IDs in the PlutoF database, and the corresponding records are then imported into the manuscript. The user then reviews the imported data and removes any irrelevant records. Thankfully, in most cases the Specimen ID is itself unique, so this last step is not needed.
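
The resolve-then-import logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FAKE_DB` is an in-memory stand-in for the PlutoF database, and `OccurrenceRecord`, `resolve_specimen_id`, `fetch_records`, and `import_by_specimen_id` are hypothetical names. None of this is the actual PlutoF API or ARPHA code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class OccurrenceRecord:
    internal_id: int   # unique ID inside the database (hypothetical field)
    specimen_id: str   # catalog number from the label; may repeat
    taxon: str


# Made-up stand-in for the remote database.
FAKE_DB = [
    OccurrenceRecord(1, "SPEC-001", "Taxon A"),
    OccurrenceRecord(2, "SPEC-002", "Taxon B"),
    OccurrenceRecord(3, "SPEC-002", "Taxon C"),  # same Specimen ID as record 2
]


def resolve_specimen_id(specimen_id: str) -> List[int]:
    """Step 1: resolve a Specimen ID to one or more unique internal IDs."""
    return [r.internal_id for r in FAKE_DB if r.specimen_id == specimen_id]


def fetch_records(internal_ids: Iterable[int]) -> List[OccurrenceRecord]:
    """Step 2: fetch the records that belong to the resolved IDs."""
    wanted = set(internal_ids)
    return [r for r in FAKE_DB if r.internal_id in wanted]


def import_by_specimen_id(
    specimen_id: str, discard: Iterable[int] = ()
) -> List[OccurrenceRecord]:
    """Run both steps, then drop the records the user marked as irrelevant."""
    dropped = set(discard)
    candidates = fetch_records(resolve_specimen_id(specimen_id))
    return [r for r in candidates if r.internal_id not in dropped]


unique_case = import_by_specimen_id("SPEC-001")                  # nothing to remove
ambiguous_case = import_by_specimen_id("SPEC-002", discard=[3])  # user removes one
```

In the common case the Specimen ID resolves to a single record and `discard` stays empty; only an ambiguous ID requires the manual clean-up.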

In short, thanks to the efforts of the Pensoft team and our partners worldwide, ARPHA can now import specimen and occurrence records directly from the following repositories: GBIF, BOLD Systems, iDigBio, and PlutoF.

Automated import of specimen records into the ARPHA writing tool.

There are numerous further workflows on which I look forward to collaborating with PlutoF. One very important example is the import of specimen records from a Species Hypothesis (SH) ID. “Species Hypothesis” is the term UNITE uses for a DNA-based fungal species, equivalent to the Operational Taxonomic Unit (OTU) used by analogous platforms such as BOLD Systems. To streamline the publication of these SHs as new species, we plan to develop a workflow that takes all specimen records linked to a particular SH and imports them into a treatment in a manuscript authored in ARPHA.

I expect to establish this and other workflows between ARPHA and PlutoF, such as data paper generation, in the near future, and I will post regular updates on them. This concludes today’s discussion.
