Wednesday, October 25, 2006

Automatic updates on publications, sequences, taxa ...

In an ideal world, I would like to focus on research rather than on searching for existing information and keeping this database up to date.

Bibliographies:

Antweb.org's "In the News" section is based on an RSS feed from uBio.

The Semant library at Connotea, created by Rod Page, is also initially based on uBio's RSS feed, but allows more refined searches. Detailed explanations are at Rod's semant blog. It would be interesting if more colleagues tried it out and added more tags, and eventually the papers they use in their research. We could then integrate FORMIS with the reference bibliography of antbase.org's online catalogue, and through it gain access to the original literature online. It would also be helpful if colleagues who self-archive papers that are otherwise not accessible online added their links to the semant library. This would make finding a publication much easier and give your papers a better chance of being read and cited.

New Taxa

Norm Johnson recently installed a very simple new taxon notification alert at the Hymenoptera Name Server / antbase.org. You can sign up at the bottom of the taxa pages you get when searching for a particular taxon, for example at antbase.org.

Wednesday, October 11, 2006

Thoughts on the future of antbase

Whilst doing some research for an "open access" newspaper article, I found this interesting piece on Web 2.0 by Tim O'Reilly.

One interesting aspect is clearly how to harvest the wisdom of all the users of antbase so that antbase can grow and improve: what is needed for users to contribute?

Tuesday, September 05, 2006

Digital Library Issues

Improved Access to Systematics Publications: Releasing the Power of Legacy Data

Systematics publications are unique among scientific publications. Besides their quasi-legal status as a conditional part of the description of a new taxon recommended by the International Code of Zoological Nomenclature (ICZN, for animals, with similar codes for plants, bacteria, viruses and fungi), they are highly structured and standardized. All the descriptive content is linked to a particular taxon, and they are very rich in descriptive data, the original (and subsequent) description of the taxon. These descriptions are not just taxon hypotheses but include various amounts of morphological and, more recently, molecular characters, materials examined, notes on behavior and distribution (an interpretation of materials examined), nomenclatorial sections, phylogenies, bibliographic references and visual art. Finally, most of the content is factual knowledge, or a description of a piece of nature. Recent publications additionally share some of the structural elements of standard scientific publications (e.g. abstract, introduction, acknowledgments, etc.).

Over the centuries (since 1758, or the 10th edition of Linnaeus’ Systema Naturae), the basic structure of descriptions has not changed substantially. In their most basic layout, these publications include a title (title and author) and a list of treatments. Each treatment includes a nomenclatorial section containing minimally the name of the taxon, a brief description and a mention of its distribution.

Each element in the descriptive part of a systematic publication can thus be related to a particular taxon, at a particular position in a particular publication.

“Red head” of taxon X on page Y in publication Z is enough to locate it in the entire body of our legacy data. To make this machine readable, the entire relationship can be standardized using ontologies (or controlled vocabularies), with DOIs and LSIDs identifying each element. Rod Page shows how to retrieve it. The page number is given by the original designation of the position within the hard copy, which is no longer available from electronic publications. The taxon LSID can be automatically retrieved from the Hymenoptera Name Server. A controlled vocabulary or ontology is being developed by a group within the International Society of Hymenopterists, with antbase and HNS participation.
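To make the "character of taxon X on page Y in publication Z" idea concrete, here is a minimal sketch in Python. It is not the Hymenoptera Name Server's data model or anybody's published schema: the class name, the field names and the locator string format are assumptions made up for illustration, and the identifiers in the usage note are placeholders, not real LSIDs or DOIs. The only thing taken from the LSID specification is the general shape urn:lsid:authority:namespace:object, with an optional revision.

```python
from dataclasses import dataclass


def is_lsid(identifier: str) -> bool:
    """Check only the general LSID shape: urn:lsid:authority:namespace:object[:revision]."""
    parts = identifier.split(":")
    return (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:5])  # authority, namespace and object must be non-empty
    )


@dataclass(frozen=True)
class CharacterStatement:
    """One descriptive statement tied to a taxon, a page and a publication.

    All field names are hypothetical; they simply mirror the
    "red head of taxon X on page Y in publication Z" relationship.
    """
    taxon_lsid: str       # identifier of taxon X
    publication_id: str   # identifier of publication Z, e.g. a DOI
    page: str             # page Y as designated in the hard copy
    character: str        # e.g. "head colour", ideally a controlled-vocabulary term
    state: str            # e.g. "red"

    def __post_init__(self) -> None:
        if not is_lsid(self.taxon_lsid):
            raise ValueError(f"not an LSID: {self.taxon_lsid!r}")

    def locator(self) -> str:
        """Return a made-up, human-readable key that pins the statement down."""
        return f"{self.publication_id}|page={self.page}|taxon={self.taxon_lsid}"
```

For example, CharacterStatement("urn:lsid:example.org:taxon:X", "doi:10.0000/example-Z", "Y", "head colour", "red").locator() yields one key combining publication, page and taxon, which is all a machine needs to find the statement again.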

This nested structure makes systematics publications prime candidates for automated data extraction, which we are currently trying to develop using, among other taxa, ant literature as a pilot group (see TWiki on ants). TaxonX is the XML schema we developed and are now applying to an increasing body of publications to test its strength and value.

In fact, the enormous amount of data collated over the last 250 years is the prime reason to make an effort to find ways to extract this information. For ants alone, systematics publications include over 90,000 pages, and most likely several tens of millions of pages for all the world’s currently described species. If we are lucky, some of it will be made accessible through the Biodiversity Heritage Library, the Biologia Centrali Americana, Animal Base and other digitization efforts.

At antbase we are currently looking into marking up all the ca. 120 publications covering the Malagasy ant fauna, the ant publications from within the American Museum of Natural History Novitates, and incoming new publications.

Since descriptions are factual knowledge, they cannot be copyrighted and can thus be made accessible over the Web. Scientific practice demands an acknowledgement of the authorship, which at the same time is the proof of quality (i.e. it has been published).
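As a rough illustration of why a nested markup lends itself to automated extraction, the following Python sketch pulls treatments out of a small XML fragment. The element and attribute names (publication, treatment, nomenclature, description, distribution) are invented for this example and are not the TaxonX vocabulary; the sample content is placeholder text, not data from any real publication.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one publication containing one treatment.
# Element and attribute names are assumptions, not TaxonX.
SAMPLE = """
<publication id="publication-Z">
  <treatment page="Y">
    <nomenclature>Taxon X</nomenclature>
    <description>Head red.</description>
    <distribution>Locality A</distribution>
  </treatment>
</publication>
"""


def extract_treatments(xml_text):
    """Return one flat record per treatment, keeping its taxon, page and publication."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for treatment in root.iter("treatment"):
        records.append({
            "publication": root.get("id"),
            "page": treatment.get("page"),
            # findtext returns None for a missing element, hence the fallback
            "taxon": (treatment.findtext("nomenclature") or "").strip(),
            "description": (treatment.findtext("description") or "").strip(),
            "distribution": (treatment.findtext("distribution") or "").strip(),
        })
    return records


if __name__ == "__main__":
    for record in extract_treatments(SAMPLE):
        print(record)
```

Because every piece of descriptive text sits inside a treatment, and every treatment sits inside a publication, the extraction is a simple walk down the tree; the hard part in practice is getting legacy publications marked up in the first place.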

Tuesday, August 29, 2006

Some comments on the ant phylogenetics symposium held at Washington D.C. Part II.

Ant phylogenetics: New molecular trees to address old problems in ant biology. XV Congress IUSSI. August 1, 2006. Washington D.C.

Part I dealt only with the first four talks of the symposium; those happened to be the talks addressing phylogenetic problems at the highest hierarchical levels within Formicidae. The rest concentrated mostly on single genera or less inclusive clades with an improvement on detail. As such, they also highlighted the reasons behind phylogenetic reconstruction: improve taxonomy (Schöning et al.; LaPolla and Schultz; Wild); test predictions about the evolution of behavior (Savolainen and Vepsäläinen; Peeters); and introduce a historical component to ecological questions (Crozier et al.; Schultz and Brady; Solomon and Mueller). Rather than reviewing each of the remaining talks here I want to comment on two of them.

Alex Wild (Tucson, Arizona) presented part of his work on Linepithema. The revision of this genus is important not only because the taxonomy of the included species was outdated, but also because the Argentine ant, Linepithema humile (Mayr), is a predominant invasive ant worldwide. What is worth noting is Wild’s integrative approach to species delimitation. He draws data from morphology and from mitochondrial and nuclear molecular markers, and combines the information looking for agreement. The project is a painful reminder of the current state of ant species-level taxonomy: while female workers are the most commonly collected caste and therefore the one used in classification studies, adult males seem to display the greatest morphological diversity useful at the species level. Incidentally, Wild also showed that purely molecular approaches may be insufficient for the task, as with the use of a short COI barcode, where calibrating an adequate threshold to reflect species circumscribed by an integrative approach proved daunting. Wild states in print that he explicitly follows E. Mayr’s biological species concept (Phil Ward’s fault, I suppose); however, it seems to me that by searching for congruence among the different lines of evidence he is really reconstructing species as historical units of the sort advocated by the phylogenetic species concept, a result that is conveniently more in line with phylogenetic reconstruction in general.

Riitta Savolainen's (Helsinki) talk presented some preliminary results of a phylogenetic study of the host-parasite association between Formica and Polyergus. Previous phylogenetic work using both morphology and molecular data supports a close or sister relationship between these two genera. My understanding is that the possibility exists that Polyergus may be a derived subgroup within the large genus Formica. It was puzzling to see, therefore, that a phylogeny for each genus was reconstructed independently, even though the molecular markers used were the same. Even if both genera are monophyletic, the best reconstruction within each genus will be achieved by pooling all the species into a single matrix and rooting the result at the branch between them, or by using the more distant outgroups originally included.

In comparison with the similar symposium held at the previous IUSSI meeting in Sapporo (2002), the number, scope and quality of the talks presented at Washington D.C. were far superior and reflect the long overdue incorporation of cladistic methods at all levels of ant taxonomy. We have to congratulate Sean Brady and Riitta Savolainen for organizing such a stimulating symposium.

Monday, August 21, 2006

Some comments on the ant phylogenetics symposium held at Washington D.C. Part I.

Ant phylogenetics: New molecular trees to address old problems in ant biology. XV Congress IUSSI. August 1, 2006. Washington D.C.

The general mood of the symposium was good. In part this had to do with the fact that the venue was adequate for the event: a large and well-lit conference room at its full capacity. Phil Ward (Davis, California) opened the symposium talking about their Ant-AToL project, clearly feeling grand about the full attendance and with the advantage of presenting before Moreau et al.’s talk, yet with his characteristic lack of luster. Their result so far is the same as the one recently published by Moreau et al. in Science (and it had better be, since they have practically the same taxon sampling and genes, except using EF1-alpha instead of CO-I), but, unlike the Science report, they voiced the concern that the ant tree may be rooting in the wrong place; an analysis including just the ant sequences results in a topology that cannot be rooted to achieve the same results as with outgroups included. Basically, there cannot be a Poneroid clade independent of Leptanillinae (see Moreau et al. 2006). They proceeded to constrain the root at different places within the ingroup, comparing the parsimony/likelihood/probability scores (Lundberg-rooting style), to argue that in any case placing the root inside the large Formicoid cluster produces highly suboptimal results. I fail to see the point of this exercise, since the globally optimal solution for these data (including the ingroup and outgroup sequences simultaneously) already showed that the root doesn’t end up inside that clade. In the end, recovering a highly supported Formicoid clade seems to be what most excites the AToL group.


Corrie Moreau (Harvard) followed Ward's talk. She gave a good and fluid talk and stood pleased even though it was a repetition of results from the previous talk (yet already published by her group), but including the part on dating and correlation with the diversification of Angiosperms. She also cast doubts on the root position, mentioning the fact that both Myrmecia and Amblyopone have been hypothesized as the most primitive ants even though you cannot have both, since they occur well nested inside unrelated groups. Both she and Ward said that it will probably be necessary to sequence the other nominal Leptanillinae taxa to break the assumed long-branch attraction artifact. However, the question remains whether the problem is confined to Leptanillinae alone or whether the whole so-called Poneroid clade is acting as an “attractor” of the divergent outgroup sequences, so that the ant tree is getting rooted upside-down. After all, this “clade” contains several long branches (e.g., what is the lone Paraponera clavata doing in the middle of it?). A more promising solution would be to balance the outgroup portion of the analysis, but we will have to wait for the HymAToL team for this.

Chris Schmidt (Tucson, Arizona) talked next. He is working on a phylogeny of the Ponerinae sensu Bolton and not wasting any time, since he downloaded many of Moreau et al.'s sequences to include as many outgroups as ingroup terminals. This talk was much more sophisticated than the rest in the symposium in terms of phylogenetic methods (probably under the influence of D. Maddison), and dealt with the use of mixed models instead of the traditional way of applying the same model across a given gene or a priori codon partition. The results on the tree topology are very different, with the exception of Platythyrea, which always comes out as a long branch sister to Ponerini. The use of mixed models seems to be a good trend among researchers fond of model-based approaches, and I wonder how far they will go before realizing they have come full circle back to the realm of parsimony. I only regret that C. Schmidt omitted a discussion of the taxonomic and nomenclatural problems of this group as advertised in his abstract, since sorting out taxa like Pachycondyla is going to be the real challenge for this subfamily.

After this came the second turn for the Ant-AToL team, with a talk by Sean Brady (Smithsonian), co-organizer of the symposium. In this talk the AToL team addressed the issue of dating the molecular ant phylogeny. The results are, again, basically the same as the ones published and presented by Moreau et al. However, Brady also showed that his team is exploring the sensitivity of their estimates to different parameters like models, taxon sampling and tree topology. This is important given the apparent problems with the ant tree reconstruction itself discussed earlier.

I will end this already long first part by noting that all of the above speakers in some way or another acknowledged the need to incorporate morphology into the picture: in part to ameliorate the poor resolution and indecisiveness that the current molecular data show about the relationships among the ant subfamilies, but also to be able to incorporate the fossil information more precisely and achieve a more complete picture of the ants' phylogenetic history. In all, this positive attitude towards morphology is the best thing to come out of a molecular phylogenetics symposium.

To Part II

Wednesday, June 21, 2006

Rod Page reminded me on his SemAnt blog that my antbase blog is still unpopulated. There is no excuse for this, except that I am not sure how to structure this blog. Shall it be the history, and thus a documentation, of the development of antbase, which began as the Social Insects World Wide Web? Shall it be about the social interactions in which we were involved building up this Web site? Shall it be about the ideas and the technical implementation? Shall it be about visions and strategic planning?

I don't know - I can't make up my mind - there are a lot of thoughts, experiences and dreams in dire need to be sorted out.

So, I will use this blog to help me sort out the various issues and write down ideas and bits of history as they come along. Hopefully, at some point, it will morph into a nicely structured representation of a rather complex world.

For those interested in the goals of antbase, there is a brief description at antbase. Write-up and outreach are at this site. One of the most important developments, the building of a semantic-type digital library, is here.

More general issues are placed either in biosyscontext, for systematics-specific issues, or in biodivcontext, for issues relating to biodiversity science, conservation and sociology.