The importance of digitization lies in comprehensively collating past data, making that data available on various devices, and making the content searchable, extractable and shareable across various media.
By Apurva Ashar
The push to go paperless is urging organizations, big and small, to take up the cause of digitization. Its importance lies in comprehensively collating past data, making that data available on various devices, and making the content searchable, extractable and shareable across various media. With rapid access to documents through cloud-based solutions, the once tedious task of building content has also gained mobility.
The first thing that comes to mind when we talk about digitization is “scanning”. Though scanning is the first step of digitization in most cases, it is not the ideal end product. A scanned document is merely a “digital replica” of the printed version; it lacks many of the functionalities of a true digital document, viz. reusability, searchability, the possibility of extracting partial text from it and, most importantly, the ability to use the e-text for text-to-speech purposes.
While paperless offices are one aspect of digitizing content, another fascinating aspect is the conversion of existing literary content, namely books, into dynamic digital content. The idea here is to treat literature as reusable data, and in India, with so many regional languages, each with its own rich culture and literature, we are sitting on a wealth of reusable literary resources. The publishing industry has been using desktop publishing for over 25 years now. This translates into an unspoken reality of today: every work that has either been published or is ready to be published must have a computerized source. These sources hold immense potential for creating accessible eBooks that follow standard guidelines on formats and fonts, and for converting them into mainstream books in electronic format that can be used for archiving ancient literature, preserving searchable data, and creating eBooks, print books and books for persons with print disabilities.
Unfortunately, for Indian languages, owing to haphazard design, the use of non-standard fonts and reliance on scanned documents, we have not been able to use these electronic source files to address the dire need for books in accessible formats. The lack of proper frameworks, tools and methodology for digitizing Indian regional languages makes it difficult to make information in these languages available on search engines like Google and Yahoo.
Innovation in technology has made it possible to create truly extractable, searchable and reusable data in Indian languages just as efficiently as in English. Conversion to digital format does not merely mean scanning a document and tagging it so that it can be searched later. Scanning is only the very first step towards digitization. The scanned images then need to be passed through OCR (optical character recognition), which gives us the content in Unicode format. To ensure higher accuracy, the text is then run through an intelligent search-and-replace step based on a vocabulary and a dictionary. The typing discrepancies found at this stage are fixed and the text is checked against the original document we received in the first place. After the intelligent search-and-replace and the typing fixes, the digital text moves on to the next stage, proofreading, where the errors found are corrected.
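To make the dictionary step concrete, here is a minimal Python sketch of how OCR output might be checked against a vocabulary. It is an illustration under stated assumptions, not the workflow of any particular tool: the word list, the table of known fixes and the function name `correct_ocr_text` are all hypothetical, and English words are used only to keep the sample readable. The same logic applies to Unicode text in any Indian language.

```python
import difflib
import unicodedata

# Hypothetical sample data, for illustration only.
VOCABULARY = {"digital", "document", "language", "library"}
# Known OCR confusions mapped to their corrections ("rn" misread for "m").
KNOWN_FIXES = {"docurnent": "document"}


def correct_ocr_text(text, vocabulary=VOCABULARY, known_fixes=KNOWN_FIXES, cutoff=0.8):
    """Return (corrected_text, flagged_words) for whitespace-separated OCR output.

    Words found in the vocabulary are kept, known misreadings are replaced,
    near misses are replaced by the closest vocabulary word, and anything
    else is flagged for the human proofreading stage.
    """
    # Normalize so that visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    corrected, flagged = [], []
    for word in text.split():
        key = word.lower()  # no effect on scripts without letter case
        if key in vocabulary:
            corrected.append(word)
        elif key in known_fixes:
            corrected.append(known_fixes[key])
        else:
            matches = difflib.get_close_matches(key, sorted(vocabulary), n=1, cutoff=cutoff)
            if matches:
                corrected.append(matches[0])
            else:
                corrected.append(word)
                flagged.append(word)
    return " ".join(corrected), flagged


if __name__ == "__main__":
    fixed, to_review = correct_ocr_text("digital docurnent langnage xyzzy")
    print(fixed)
    print(to_review)
```

The sketch splits on whitespace and ignores punctuation, so a real pipeline would need proper tokenization and a far larger dictionary. The flagged words are the ones a proofreader would compare against the original scan.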
Hence, after a rigorous filtering process that converts the content into a viable and dynamic digital format, we get a master book in electronic format. These electronic books are already being used for mainstream printing and for projects such as eBasta.
Thanks to constant innovation in technology, the process needs to be carried out only once: the product is a master electronic document of the entire content, and designing it, too, is a one-time effort. If proper guidelines are followed, with proper tools, a structured working methodology and directed efforts, publishers, authors, archivists and even government departments, businesses and offices operating in Indian languages like Hindi, Gujarati, Marathi, Odia, Tamil, etc. will need to design their books just once. Moreover, transliteration from one language to another is now just a click away, which makes all the extra paperwork redundant across sectors.
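One reason script-to-script transliteration can be this simple once the text is in Unicode is that the Unicode blocks for the major Indian scripts follow a parallel layout inherited from the ISCII standard. The Python sketch below shows the idea for Devanagari to Gujarati by shifting code points from one block to the other. It is a rough illustration only: the function name `transliterate` is hypothetical, and a production tool would also need rules for characters that have no counterpart in the target script and for script-specific spelling conventions.

```python
import unicodedata

# Start code points of two Unicode blocks; each block spans 0x80 code points.
DEVANAGARI_START = 0x0900
GUJARATI_START = 0x0A80
BLOCK_SIZE = 0x80


def transliterate(text, src_start=DEVANAGARI_START, dst_start=GUJARATI_START):
    """Shift characters from the source script block to the target block.

    Characters outside the source block are left unchanged, as are
    characters whose shifted code point is unassigned in the target block.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if src_start <= cp < src_start + BLOCK_SIZE:
            candidate = chr(cp - src_start + dst_start)
            # unicodedata.name returns the default (None) for unassigned code points.
            if unicodedata.name(candidate, None) is not None:
                out.append(candidate)
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # "भारत" written with escapes: BHA, vowel sign AA, RA, TA.
    print(transliterate("\u092D\u093E\u0930\u0924"))
```

Keeping unmapped characters unchanged means, for example, that the danda punctuation mark (U+0964), which is shared across scripts, passes through untouched.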
The beauty of this methodology is that such a master document can then be used as a pool of searchable data and as the source for print books, eBooks, eBooks for people with print disabilities and Braille books. The latest developments and the digital technology revolution have the potential to provide a solution for producing books in all these formats; all we would need then is the master book in its electronic format.
The author is executive director, ePUB-Hub, an Ahmedabad-based digital publishing company.