Awakening taxonomist’s third eye: exploring the utility of computer vision and deep learning in insect systematics
Stockholm University, Faculty of Science, Department of Zoology, Systematic Zoology. Savantic AB, Sweden.
ORCID iD: 0000-0003-1093-2752
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Automating taxonomic identification has been a dream for many insect systematists, as this could free up time for more challenging scientific endeavours, such as circumscribing and describing new species or inferring phylogeny. The last decades have seen several attempts to develop systems for automated identification, but it has required significant skills and effort to develop them, the systems have only been able to handle specific tasks, and they have rarely been able to compete in performance with human taxonomists. With recent advances in computer vision and machine learning, this situation is now rapidly changing. Here, we illustrate several practical perspectives on the current state-of-the-art in automated identification using a case study, the discrimination of ten species of the flower chafer beetle genus Oxythyrea, and an imagined conversation between an insect taxonomist and an expert in artificial intelligence. Modern automated identification is based on training convolutional neural networks (CNNs) using extremely large training sets, so-called deep learning. In previous work, we have shown that the size of the training sets required for insect identification can be reduced significantly by using CNNs pretrained on general image classification tasks. Here we demonstrate that high-resolution images are not needed; adequate training sets for insect identification can be collected cheaply and rapidly using smartphone cameras with an attachable zoom lens. Furthermore, we demonstrate that easy-to-use off-the-shelf solutions are often sufficient even for challenging identification tasks that are impossible for humans, such as separating the sexes from the dorsal habitus in these beetles. Finally, we demonstrate recent techniques that can highlight the morphological features used by the trained systems in discriminating species. This can be helpful in developing a better understanding of how the species differ. We hope that these results will encourage insect systematists to explore some of the many exciting opportunities that modern AI tools offer.
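
The abstract does not name the specific feature-highlighting technique used. As one illustration of the kind of visualization it refers to, the sketch below applies Grad-CAM to a torchvision ResNet-50; the ImageNet weights stand in for a classifier fine-tuned on the Oxythyrea images, and the image file name is hypothetical. It is a minimal sketch under those assumptions, not the authors' exact pipeline.

```python
# Minimal Grad-CAM sketch: highlight image regions driving a CNN's prediction.
# Assumes a ResNet-50; ImageNet weights stand in for a fine-tuned beetle model.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights="IMAGENET1K_V2")
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out

def bwd_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0]

# Hook the last convolutional block; its spatial feature maps drive the heatmap.
layer = model.layer4[-1]
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("oxythyrea_specimen.jpg")).unsqueeze(0)  # hypothetical image
scores = model(img)
target_class = scores.argmax(dim=1).item()
scores[0, target_class].backward()

# Grad-CAM: weight each feature map by its average gradient, keep positive evidence.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=img.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
# cam[0, 0] can be overlaid on the photograph to show which body regions
# the network relied on when assigning the species label.
```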

National Category
Zoology
Research subject
Zoology; Systematic Zoology
Identifiers
URN: urn:nbn:se:su:diva-189456
OAI: oai:DiVA.org:su-189456
DiVA, id: diva2:1521222
Funder
EU, Horizon 2020, 642241
Available from: 2021-01-22 Created: 2021-01-22 Last updated: 2022-02-25 Bibliographically approved
In thesis
1. Automated image-based taxon identification using deep learning and citizen-science contributions
2021 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The sixth mass extinction is well under way, with biodiversity disappearing at unprecedented rates in terms of species richness and biomass. At the same time, given the current pace, we would need the next two centuries to complete the inventory of life on Earth, and this is only one of the necessary steps toward monitoring and conservation of species. Clearly, there is an urgent need to accelerate the inventory and the taxonomic research required to identify and describe the remaining species, a critical bottleneck. Arguably, leveraging recent technological innovations is our best chance to speed up taxonomic research. Given that taxonomy has been and still is notably visual, and the recent breakthroughs in computer vision and machine learning, it seems that the time is ripe to explore to what extent we can accelerate morphology-based taxonomy using these advances in artificial intelligence. Unfortunately, these so-called deep learning systems often require substantial computational resources, large volumes of labeled training data, and sophisticated technical support, which are rarely available to taxonomists. This thesis is devoted to addressing these challenges. In paper I and paper II, we focus on developing an easy-to-use (’off-the-shelf’) solution to automated image-based taxon identification, which is at the same time reliable, inexpensive, and generally applicable. This enables taxonomists to build their own automated identification systems without prohibitive investments in imaging and computation. Our proposed solution utilizes a technique called feature transfer, in which a pretrained convolutional neural network (CNN) is used to obtain image representations (”deep features”) for a taxonomic task of interest. Then, these features are used to train a simpler system, such as a linear support vector machine classifier. In paper I we optimized parameters for feature transfer on a range of challenging taxonomic tasks, from the identification of insects to higher groups (even when they are likely to belong to subgroups that have not been seen previously) to the identification of visually similar species that are difficult to separate for human experts. In paper II, we applied the optimal approach from paper I to a new set of tasks, including a task unsolvable by humans: separating specimens by sex from images of body parts that were not previously known to show any sexual dimorphism. Papers I and II demonstrate that off-the-shelf solutions often provide impressive identification performance while at the same time requiring minimal technical skills. In paper III, we show that phylogenetic information describing evolutionary relationships among organisms can be used to improve the performance of AI systems for taxon identification. Systems trained with phylogenetic information do as well as or better than standard systems in terms of common identification performance metrics. At the same time, the errors they make are less wrong in a biological sense, and thus more acceptable to humans. Finally, in paper IV we describe our experience from running a large-scale citizen science project organized in summer 2018, the Swedish Ladybird Project, to collect images for training automated identification systems for ladybird beetles. The project engaged more than 15,000 school children, who contributed over 5,000 images and over 15,000 hours of effort. The project demonstrates the potential of targeted citizen science efforts in collecting the required image sets for training automated taxonomic identification systems for new groups of organisms, while providing many positive educational and societal side effects.
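
To make the feature-transfer workflow described above concrete, the sketch below extracts ”deep features” with a pretrained CNN and trains a linear support vector machine on them. The folder layout, the choice of ResNet-50 as backbone, and the SVM settings are illustrative assumptions, not the exact configuration used in papers I and II.

```python
# Minimal feature-transfer sketch: pretrained CNN as a fixed feature extractor,
# linear SVM as the downstream classifier. Paths and settings are illustrative.
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained CNN used only to produce image representations ("deep features").
backbone = models.resnet50(weights="IMAGENET1K_V2")
backbone.fc = torch.nn.Identity()   # drop the ImageNet classification head
backbone.eval()

# Hypothetical folder with one subdirectory per taxon (class).
dataset = datasets.ImageFolder("oxythyrea_images/", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=False)

features, labels = [], []
with torch.no_grad():
    for batch, targets in loader:
        features.append(backbone(batch).numpy())   # one 2048-d vector per image
        labels.append(targets.numpy())
features = np.concatenate(features)
labels = np.concatenate(labels)

# Train the simple classifier on the frozen features and report cross-validated accuracy.
clf = LinearSVC(C=1.0, max_iter=10000)
print(cross_val_score(clf, features, labels, cv=5).mean())
```

Because the CNN is never fine-tuned, this approach needs no GPU training and only modest image sets, which is what makes it an ”off-the-shelf” option for taxonomists.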

Place, publisher, year, edition, pages
Stockholm: Department of Zoology, Stockholm University, 2021. p. 66
National Category
Zoology
Research subject
Systematic Zoology
Identifiers
urn:nbn:se:su:diva-189460 (URN)
978-91-7911-416-9 (ISBN)
978-91-7911-417-6 (ISBN)
Public defence
2021-03-10, Vivi Täckholmsalen (Q-salen), NPQ-huset, Svante Arrhenius väg 20, Stockholm, 14:00 (English)
Opponent
Supervisors
Funder
EU, Horizon 2020, 642241
Available from: 2021-02-15 Created: 2021-01-25 Last updated: 2022-02-25 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Valan, Miroslav
