The Provincial Library "Dr. Friedrich Teßmann", together with 16 partner institutions from all over Europe, is participating in the EU project "Europeana Newspapers – A Gateway To Newspapers Online", which aims to optimise and simplify research in digitised historical newspaper pages.
The daily press is an important source of information, not only as coverage of local news, politics and world affairs but also as a testimony of past days. It is therefore a valuable research base, for example for studies in history, the social sciences or linguistics. To simplify access to historical newspapers and to protect the paper originals from deterioration, numerous libraries and other institutions have begun to scan large newspaper holdings and to make the digitised versions accessible to their users. Between 2006 and 2011 the Teßmann Library also digitised about 1.5 million newspaper pages, which are accessible through its portal "Teßmann digital". Over 40 newspapers and periodicals from the area of historical Tyrol are presented to users in this portal and can be selected easily from the list of titles or via the calendar navigation. Since the digital copies are currently available only as image files, it is not possible to search for keywords within the newspaper texts, and users have to proceed in the traditional way by flipping through every single page.
This, however, is going to change thanks to the participation of the Teßmann Library in the EU project "Europeana Newspapers", which aims to enable targeted research in digitised newspaper holdings. For this purpose the digitised pages are processed with software for automatic text conversion (OCR, Optical Character Recognition) and article segmentation (OLR, Optical Layout Recognition). This allows full-text search in digitised newspapers as well as the detection of a specific keyword in a certain position within a text, for example in a headline or a lead.
A particular challenge for automatic character recognition is that the majority of historical newspapers were printed in Gothic type, which the OCR software cannot always recognise properly. Therefore, the University of Innsbruck and the German company CCS Content Conversion Specialists GmbH are working to improve the automatic character recognition of historical publications in the course of this EU project. Their findings will be documented in a best-practice paper, which may serve as a basis for other institutions and projects dealing with the automatic indexing of digitised texts.
At the end of the EU project the participating libraries will provide over 18 million digitised newspaper pages, technically refined for targeted research, through the European online culture portal Europeana (www.europeana.eu). The Teßmann Library will then upload those refined newspaper pages to its portal "Digital Newspaper Archive", allowing its users to search specifically within the stock of digitised newspapers.
Institutions participating in the project:
Berlin State Library
National Library of Estonia
University of Helsinki, National Library of Finland
National Library of France
CCS Content Conversion Specialists GmbH
National Library of Latvia
University of Belgrade
Dr. Friedrich Teßmann Library
University of Salford
National Library of the Netherlands
Austrian National Library
Hamburg State and University Library
National Library of Poland
National Library of Turkey
University of Innsbruck
The British Library
The European Library
Further information on the project is available on the project website: www.europeana-newspapers.eu
For questions on the project please contact Karin Pircher (Karin.Pircher@tessmann.it).