Important: We are still in alpha development of this service and hope to conduct a private beta in fall 2011.
If you are a collection manager and are interested in using this service, please contact us.

How it works...

Methods currently used to digitize herbarium specimens:

  • Data Entry - This is the most manual approach: curators, student workers, and volunteers sit in the herbarium and enter information from the physical specimen directly into the collection management database.
  • OCR (Optical Character Recognition) - A software approach that attempts to analyze the specimen image and recognize the label information automatically. HERBIS is a proof of concept of this approach.
  • Online Data Entry - Specimens are first photographed, and data entry is then done online from the images. Herbaria@Home is a current project that uses this approach.

What we do differently: From the start, we have focused on allowing many people to work in parallel and on letting volunteers with different levels of expertise contribute. To do this, we needed a system that works in small units of work. Our overall strategy is therefore a three-stage process that breaks a specimen sheet image down into its individual elements, which are then reassembled.

Sample Specimen Sheet

Example: Let's say the sample specimen sheet shown here is a specimen we would like to have a digital record for. Specifically, we want to know WHAT it is, WHEN it was collected, and WHERE it was found. This gives researchers a historical timestamp that helps them understand more about our world.

Stage 1 - Identifying Labels

Our first stage is a simple but vital part of the process. Special volunteers are presented with a specimen image from a queue of millions of specimen images and are asked to locate and draw a box around each label of interest: accession stamps, handwritten labels, and determinations. The marked images are then sent to our chopper, which creates an individual image of each marked label. We call these images Labels, and they are used in Stage 2.

After the images are chopped, and before they reach Stage 2, they are handed over to Evernote, which inspects each label closely and tries to identify the location of every word on it, both handwritten and printed. For each word it finds, Evernote also returns all the possible words it could be. When this information comes back to us, we analyze the results and make educated predictions based on known vocabularies. These predictions are offered to the user in Stage 2 as suggestions to speed up the process.
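The vocabulary-based prediction step can be sketched as follows. This is a minimal illustration, not the HelpingScience implementation: it assumes each recognized word arrives as a list of (reading, confidence) pairs, which is a guess at the shape of the alternatives returned by the recognition step, and `suggest_word` and `KNOWN_VOCABULARY` are hypothetical names. Fuzzy matching uses Python's standard `difflib`.

```python
import difflib

# Hypothetical vocabulary; a real system would load much larger lists
# (taxon names, place names, collector names, and so on).
KNOWN_VOCABULARY = {"quercus", "alba", "rubra", "texas", "travis", "county"}


def suggest_word(candidates, vocabulary=KNOWN_VOCABULARY, cutoff=0.8):
    """Suggest one reading for a word on a label.

    `candidates` is a list of (reading, confidence) pairs (an assumed
    format). Returns a lowercase vocabulary term, or None if no
    candidate is plausible.
    """
    # 1. Prefer the most confident reading that is an exact vocabulary term.
    exact = [(conf, text.lower()) for text, conf in candidates
             if text.lower() in vocabulary]
    if exact:
        return max(exact)[1]

    # 2. Otherwise, walk readings from most to least confident and snap
    #    the first one that has a close vocabulary match to that term.
    for text, _conf in sorted(candidates, key=lambda c: c[1], reverse=True):
        close = difflib.get_close_matches(text.lower(), vocabulary,
                                          n=1, cutoff=cutoff)
        if close:
            return close[0]
    return None
```

For instance, given the readings ("Quercns", 0.9) and ("Quercus", 0.6), the sketch suggests "quercus": the exact vocabulary hit wins even though its confidence is lower.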

Stage 2 - Identifying Fields

Now that we have a Label, we know it contains words that are relevant to our WHAT, WHEN, and WHERE. The HelpingScience processing engine has also analyzed the words and can present the label to a user with suggestions on how to proceed. This stage is the most critical; it requires botanical knowledge and a general understanding of herbarium labels to identify the fields we are trying to capture. Again, a volunteer from our team receives a random label from the queue and is presented with tools to mark the words of interest. A complete list of the fields the system currently captures is shown to the right. Once the fields on a label have been identified, the label is also sent to our chopper. In this step, the chopper creates an individual image for each field and associates it with any related words found by Evernote.
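One plausible way to associate recognized words with marked fields is a simple geometric test: a word belongs to a field if the center of the word's bounding box falls inside the field's box. The sketch below assumes both kinds of boxes are given in the same label-image pixel coordinates; `Box` and `associate_words` are hypothetical names, and the actual chopper may work differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in label-image pixel coordinates (assumed)."""
    left: int
    top: int
    right: int
    bottom: int

    def center(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def contains_point(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def associate_words(fields: dict[str, Box],
                    words: list[tuple[str, Box]]) -> dict[str, list[str]]:
    """Attach each recognized word to the field whose box contains its center.

    Words keep their input order; words outside every field are dropped.
    """
    result: dict[str, list[str]] = {name: [] for name in fields}
    for text, word_box in words:
        cx, cy = word_box.center()
        for name, field_box in fields.items():
            if field_box.contains_point(cx, cy):
                result[name].append(text)
                break  # assumes marked fields do not overlap
    return result
```

For example, with a "locality" box spanning the top of the label and a "date" box below it, the words "Travis" and "County" land in locality and "1931" in date, while a stray mark outside both boxes is ignored.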

Historical Recognition and Lexical Grouping is an internal process that takes place between Stage 2 and Stage 3. It is a self-learning system that uses OCR and historical patterns to form groups of words that should all mean the same thing. Once the system learns how to recognize a Field, it recognizes any identical Field in the future. This gives the system a significant boost and shortens the turnaround time for processing a specimen image.
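The grouping idea can be illustrated with a deliberately simple stand-in. The system described above learns from OCR and historical patterns; this sketch swaps in plain text normalization (lowercase, strip punctuation, collapse whitespace) as the grouping rule, and remembers a validated value for each group so that an identical Field seen later is recognized immediately. `normalize` and `LexicalMemory` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import Optional


def normalize(text: str) -> str:
    """Hypothetical grouping key: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


class LexicalMemory:
    """Toy stand-in for the learning step, keyed by normalized text."""

    def __init__(self) -> None:
        self.groups: dict[str, set[str]] = defaultdict(set)  # key -> raw spellings seen
        self.known: dict[str, str] = {}                      # key -> validated value

    def observe(self, raw: str) -> Optional[str]:
        """Record a raw reading; return a previously validated value, if any."""
        key = normalize(raw)
        self.groups[key].add(raw)
        return self.known.get(key)

    def learn(self, raw: str, validated: str) -> None:
        """Remember the validated value for every reading in raw's group."""
        self.known[normalize(raw)] = validated
```

With this rule, "Travis Co." and "TRAVIS  co" share the key "travis co", so once one of them has been validated as "Travis County" the other is filled in without another round of data entry. A real system would need fuzzier grouping than exact normalized matches.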

Stage 3 - Entering Field Values

This is where anyone and everyone can help. Now that we have millions of individual fields of known types, we can engage volunteers to enter their values. Not only will you be helping the botanical world: to encourage you to do a little extra, we have a rewards system that is part of our ongoing effort to support charitable organizations and student research and to provide software and equipment to herbaria in need. As you play our games and use our apps, you earn HS Tokens, which you can redeem in our store to choose how to help the community.

Stage 4 - Assembly of Data

Once all the Fields have been validated, we make the data available to the client, who can download it and use it within their own collection management software.
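A minimal sketch of the assembly step, assuming validated field values arrive as flat records tagged with a specimen ID and that the client downloads a CSV file. The column names, the record format, and the rule of holding back any specimen with an unvalidated field are all illustrative assumptions; a real export would follow whatever schema the client's collection management software imports.

```python
import csv
import io

# Hypothetical column set for the downloadable file.
COLUMNS = ["specimen_id", "taxon", "collection_date", "locality"]


def assemble_records(fields):
    """Group field values into one row per specimen.

    `fields` is an iterable of dicts with keys specimen_id, field, value,
    and validated. Any specimen with an unvalidated field is held back.
    """
    rows, pending = {}, set()
    for f in fields:
        sid = f["specimen_id"]
        if not f["validated"]:
            pending.add(sid)
            continue
        rows.setdefault(sid, {"specimen_id": sid})[f["field"]] = f["value"]
    return [row for sid, row in rows.items() if sid not in pending]


def to_csv(rows):
    """Serialize rows to CSV text; missing fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Holding back incomplete specimens means the client never imports a half-finished record; the trade-off is that a single slow field delays the whole specimen.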