Screenshot of a responsive encoding converter designed by Jnanaranjan Sahu.
In 2012, the Odisha-based non-profit Srujanika created two text encoding converters, each of which converted a different legacy non-Unicode script encoding system to Unicode. I tested them by converting a few documents and found many errors in the output. It seemed to me that converting and then proofreading the content would take longer than simply typing the text afresh.
Unicode, as explained on the Unicode Consortium's website, is a computing industry standard that provides a unique number for every character, irrespective of the platform, program or script. Before Unicode there were several other standards and modified standards, which broadly fell under two categories: the American Standard Code for Information Interchange (ASCII) and the Indian Script Code for Information Interchange (ISCII). Text encoding converters are used to convert text from a source encoding system to a desired one. The proprietary and legacy encoding systems were so popular among desktop publishing (DTP) operators that most Indian-language media houses stuck with them even after Unicode was introduced. The victims of this were the editors, journalists, writers and many native-language users who had little or no knowledge of how to type in their own language.
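To make the idea of a converter concrete, here is a minimal sketch in Python of a table-driven one. It is only an illustration, not the code of the Srujanika or Wikimedia converters: the mapping table `LEGACY_TO_UNICODE` and the function `convert` are hypothetical names, and the legacy byte values on the left are invented rather than taken from any real font. Only the Unicode code points on the right are real Odia characters.

```python
# Hypothetical mapping from legacy font keystrokes to Unicode text.
# The keys are invented for illustration; they are NOT a real font table.
LEGACY_TO_UNICODE = {
    "K": "\u0b15",               # ODIA LETTER KA
    "A": "\u0b05",               # ODIA LETTER A
    "i": "\u0b3f",               # ODIA VOWEL SIGN I
    "Kx": "\u0b15\u0b4d\u0b37",  # one legacy glyph -> KA + VIRAMA + SSA
}


def convert(text, table=LEGACY_TO_UNICODE):
    """Replace legacy sequences with Unicode, trying longer keys first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No mapping known: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert("KA"))
```

Longer keys are tried first so that a multi-character legacy sequence is not broken up by a shorter match. A real converter would likely also need to reorder some vowel signs, which legacy fonts tend to store in visual order while Unicode stores them in logical order; this sketch does not attempt that.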
The converters I described above solved this problem only partially: they could convert just two encoding systems, with about 80% linguistic accuracy. To help enhance and scale up these existing converters, three Wikimedian developers came forward. We worked together for many hours over a few months to make the converters better. When I asked my writer and journalist friends to test them, the result thrilled me: they all started writing in Odia on Facebook the next day. More blogs were written in Odia, and the language featured in more social media interactions. The popular newspaper Sarbasadharana.com and the online portal Odisha.com started using the converters too. It became much easier for Wikimedians to use existing resources from portals, newspapers and magazines to enrich Wikipedia. Soft copies of public domain books, along with books that had been relicensed under CC licenses, could now easily be used on Wikisource.
Though it is difficult to measure the exact growth of Odia-language content on the internet, there has been a significant change in the past six months. Almost all the federal entities that were stuck with two non-Unicode encoding systems have finally moved to Unicode, and the official portal odia.odisha.gov.in has made the adoption of Unicode part of its core policy. As a gesture of support for this work, the federal department has listed Odia Wikipedia at the top of its resources page.
Recently, Jnanaranjan Sahu, one of the core contributors to the project, combined all the converters into a standalone on-wiki converter that is available on both Wikipedia and Wikisource. Many members of the larger Odia-language community have helped find errors, which have since been fixed. Sahu has also recently made available a free, responsive online converter that works not just on a computer but on any smartphone. The converter has indeed helped promote the use of Odia on the internet. The dream of an Odia version of Google is coming closer to reality.
Further reading:
How the Odia Wikimedia community is enriching Wikipedia with character encoding technology. Wikimedia blog