Presentation on multilingual text analysis methods at Waseda University

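At a seminar of the Graduate School of Political Science at Waseda University, I gave a presentation titled “A Data-Driven Approach to Bilingual Analysis: Representations of US Foreign Policy in Japanese and English Newspapers over 30 Years”. The presentation was about how to apply the same quantitative text analysis methods to documents in different languages (English and Japanese) in research projects on American politics and foreign policy. Some of the methods presented at the seminar will be explained in more detail at a workshop on quantitative analysis of Japanese texts, to be held on 22 May from 15:00.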

Redefining word boundaries by collocation analysis

Quanteda’s tokenizer can segment Japanese and Chinese texts thanks to stringi, but its results are not always good, because the underlying library, ICU, recognizes only a limited number of words. For example, the Japanese text “ニューヨークのケネディ国際空港” translates as “Kennedy International Airport (ケネディ国際空港) in (の) New York (ニューヨーク)”. Quanteda’s tokenizer (the tokens function) segments this into […]
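As a rough, language-neutral illustration of the collocation idea, the Python sketch below scores adjacent token pairs with pointwise mutual information (PMI) and rejoins runs of strongly associated tokens into single words. It is not quanteda’s implementation: PMI is a simple stand-in and is not the statistic reported by quanteda’s `textstat_collocations()`. The toy corpus, its segmentation into short units (not quanteda’s actual output), the helper names `find_collocations` and `compound`, and the thresholds `min_count` and `min_pmi` are all hypothetical.

```python
import math
from collections import Counter


def find_collocations(docs, min_count=2, min_pmi=1.0):
    """Return adjacent token pairs whose frequency and PMI clear both thresholds.

    Hypothetical helper: PMI stands in for a proper collocation statistic.
    """
    unigrams = Counter()
    bigrams = Counter()
    for toks in docs:
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    selected = set()
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # too rare to trust
        # Pointwise mutual information: log P(a, b) / (P(a) * P(b))
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        if math.log(p_ab / (p_a * p_b)) >= min_pmi:
            selected.add((a, b))
    return selected


def compound(toks, pairs, concatenator=""):
    """Join each maximal run of tokens in which every adjacent pair is in `pairs`."""
    out, run = [], []
    for tok in toks:
        if run and (run[-1], tok) in pairs:
            run.append(tok)  # extend the current multi-token word
        else:
            if run:
                out.append(concatenator.join(run))
            run = [tok]
    if run:
        out.append(concatenator.join(run))
    return out


# Hypothetical pre-segmented documents (short ICU-style units).
docs = [
    ["ニューヨーク", "の", "ケネディ", "国際", "空港"],
    ["成田", "国際", "空港", "に", "到着"],
    ["羽田", "国際", "空港", "の", "ターミナル"],
    ["ケネディ", "国際", "空港", "から", "出発"],
    ["国際", "会議", "の", "日程"],
]

pairs = find_collocations(docs)
for toks in docs:
    print(compound(toks, pairs))
```

With these toy counts, both (ケネディ, 国際) and (国際, 空港) clear the thresholds, so the first document becomes ["ニューヨーク", "の", "ケネディ国際空港"], while 国際 in 国際会議 stays separate. The empty concatenator suits Japanese, which is written without spaces between words.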

Newsmap paper in Digital Journalism

My paper on geographical news classification has finally been published in Digital Journalism, a sister journal of Journalism Studies. In this paper, I not only evaluate Newsmap’s classification accuracy but also compare it with other tools such as Open Calais and Geoparser.io. This paper presents the results of an evaluation of three different types of geographical news […]

New paper on Russia’s international propaganda during the Ukraine crisis

My paper on Russia’s international propaganda during the Ukraine crisis, “The spread of the Kremlin’s narratives by a western news agency during the Ukraine crisis”, has been published in the Journal of International Communication. This is very timely, because people are talking about the spread of “fake news”! The description of the Ukraine crisis as an ‘information […]

Handling multi-word features in R

Multi-word verbs (e.g. “set out”, “agree on” and “take off”) and names (e.g. “United Kingdom” and “New York”) are important features of texts, but they are often difficult to keep intact in bag-of-words text analysis, because tokenizers usually split strings on spaces. You can preprocess texts to concatenate multi-word features with underscores like […]
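A minimal sketch of the underscore-concatenation idea, written in Python rather than R: it walks a token list and replaces each listed phrase with a single joined token, trying longer phrases first. The function name `compound_phrases`, the whitespace tokenization and the phrase list are illustrative assumptions. In R, quanteda’s `tokens_compound()` addresses this use case, but this sketch does not reproduce that function’s behaviour in detail.

```python
def compound_phrases(tokens, phrases, concatenator="_"):
    """Replace each occurrence of a multi-word phrase with one joined token.

    Hypothetical helper. Longer phrases are tried first, so "New York City"
    wins over "New York" when both are listed. Matching is case-sensitive.
    """
    # Split phrases into token sequences, dropping empty strings (they would never advance).
    patterns = sorted((p.split() for p in phrases if p.split()), key=len, reverse=True)
    out = []
    i = 0
    while i < len(tokens):
        for pat in patterns:
            n = len(pat)
            if tokens[i:i + n] == pat:
                out.append(concatenator.join(pat))  # e.g. ["New", "York"] -> "New_York"
                i += n
                break
        else:
            # No phrase starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


text = "The plane will take off from New York for the United Kingdom"
phrases = ["take off", "New York", "United Kingdom"]
result = compound_phrases(text.split(), phrases)
print(result)
```

Because matching happens on tokens rather than raw strings, “New York” is joined only when both words appear as consecutive tokens, and the joined forms such as New_York then survive as single features in a bag-of-words model.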

