We are pleased to present benchmark results for the
datasets of the ICDAR 2003/2005 Competitions, produced with SceneReader 2.0.
These results are supplied as annotated images showing
the text matches found and the processing time.
The colour of the annotation reflects the confidence of the match:
- green: high confidence
- yellow: medium confidence
- red: low confidence
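As a rough illustration, this colour banding amounts to simple thresholding on a confidence score. The function below is a minimal sketch; the 0.8 and 0.5 cut-offs are hypothetical placeholders, not SceneReader's actual internal values:

```python
def annotation_colour(confidence):
    """Map a match confidence in [0, 1] to an annotation colour.

    The 0.8 and 0.5 thresholds are illustrative only; SceneReader's
    actual banding values are not published here.
    """
    if confidence >= 0.8:
        return "green"   # high confidence
    if confidence >= 0.5:
        return "yellow"  # medium confidence
    return "red"         # low confidence
```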
All the benchmark analyses were performed
using the same general-purpose font knowledge base;
no image-specific training was required or used.
Please refer to the Technical notes prior to exploring the benchmark results data.
Details of benchmark tests
We include results for both the "TrialTest" and "TrialTrain" datasets.
Since SceneReader does not require pre-training, both datasets are
treated purely as test data.
Test results are supplied for two dictionaries.
The first dictionary includes just the ICDAR
"words.xml" listings for each dataset (with duplicates omitted,
and a handful of corrections).
The second dictionary includes both the ICDAR words and
all the words in a general-purpose 60,000-word English dictionary.
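The two dictionaries described above can be assembled along these lines. This is a sketch only: extraction of words from the ICDAR "words.xml" files is not shown, and the manual corrections mentioned above are assumed to have been applied to the input list already.

```python
def build_dictionaries(icdar_words, general_words):
    """Build the two test dictionaries.

    icdar_words:   words taken from the ICDAR "words.xml" listings
                   (duplicates are removed here)
    general_words: a general-purpose English word list
                   (e.g. roughly 60,000 entries)

    Returns (icdar_only, combined) as sorted, de-duplicated lists.
    """
    icdar_only = sorted(set(icdar_words))
    combined = sorted(set(icdar_words) | set(general_words))
    return icdar_only, combined
```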
All tests were performed on a single machine with a 3.08 GHz Pentium 4 CPU,
512 KB cache, and 1 GB RAM, running Slackware 12.2.
Please contact us if you wish to
see results in the ICDAR competition's XML output format, rather
than as annotated images. Additionally, we welcome the opportunity to
benchmark SceneReader against other text-related image collections.