French and Chinese seed dictionaries added to Newsmap

newsmap is a dictionary-based, semi-supervised model for geographical document classification. The core of the package is not the machine learning algorithm but the multilingual seed dictionaries created by me and other contributors in English, German, French, Spanish, Japanese, Russian and Chinese. We recently added French and Chinese (traditional and simplified) dictionaries and submitted the package to CRAN.
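To illustrate why the seed dictionaries matter more than the algorithm, here is a minimal Python sketch of the general idea behind dictionary-based semi-supervised classification. This is not newsmap's R implementation or API. It is a simplified stand-in that assumes a two-step scheme: documents containing seed words are labelled with the matching country, and a Naive Bayes-style word model trained on those labels then classifies documents that contain no seed words at all. The seed dictionary, country codes and function names (`seed_label`, `train`, `predict`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical seed dictionary: country code -> seed keywords.
# Illustrative only; these are NOT the actual newsmap dictionaries.
SEED = {
    "fr": ["france", "french", "paris"],
    "cn": ["china", "chinese", "beijing"],
    "jp": ["japan", "japanese", "tokyo"],
}


def tokenize(text):
    """Naive whitespace tokenizer (a real system needs proper tokenization)."""
    return text.lower().split()


def seed_label(tokens, seed):
    """Step 1: label a document with every country whose seed words it contains."""
    return {c for c, words in seed.items() if any(w in tokens for w in words)}


def train(docs, seed, smooth=1.0):
    """Step 2: estimate smoothed log P(word | country) from seed-labelled documents."""
    counts = defaultdict(Counter)
    vocab = set()
    for doc in docs:
        tokens = tokenize(doc)
        vocab.update(tokens)
        for c in seed_label(tokens, seed):
            counts[c].update(tokens)
    model = {}
    for c in seed:
        total = sum(counts[c].values()) + smooth * len(vocab)
        model[c] = {w: math.log((counts[c][w] + smooth) / total) for w in vocab}
        model[c]["<unk>"] = math.log(smooth / total)  # words unseen in training
    return model


def predict(doc, model):
    """Step 3: assign the country with the highest summed log-likelihood."""
    tokens = tokenize(doc)
    scores = {
        c: sum(lp.get(t, lp["<unk>"]) for t in tokens) for c, lp in model.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [
        "paris hosts the summit on the euro",
        "beijing announces new yuan policy",
        "tokyo stocks rise as yen weakens",
    ]
    model = train(docs, SEED)
    # Neither test sentence contains a seed word; the labels come from
    # words learned via the seed-labelled documents.
    print(predict("yuan fell sharply", model))   # cn
    print(predict("yen weakens again", model))   # jp
```

The point of the sketch is that the classifier only learns words such as "yuan" or "yen" because the seed dictionary first linked "beijing" and "tokyo" to their countries. Adding a dictionary in a new language is what lets this kind of model work on text in that language.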

Native speakers of these languages account for only 30% of the world's population, which is much smaller than I thought. Creating Arabic, Hindi and Portuguese dictionaries would increase the population coverage by 12%, but there is still a long way to go!
