Text analysis dictionary on immigration policy


Dictionary-based text analysis has a number of good properties, but creating a new dictionary is difficult, so text analysts often rely on existing ones such as the General Inquirer dictionaries, which were originally created decades ago, or their derivatives. However, I believe that it is time to create new dictionaries from scratch using the tools and techniques now available to us.

My first original dictionary is the UK Immigration Dictionary. It is meant to measure attitudes toward immigration to the UK. The dictionary contains counterintuitive positive entries such as ‘racist’, but when applied to the 2010 UK party manifestos it produces the following scores.

BNP          -0.660772785
Coalition     0.403547905
Conservative  0.002508397
Greens       -0.898075732
Labour        0.081029432
LibDem        0.050535076
PC           -0.015306746
SNP          -0.551027977
UKIP         -0.335952325

I am not yet sure how accurate this is, but the result looks interesting: the small parties, which tend to be against immigration, all receive negative scores.

It is very easy to use the dictionary in R with quanteda:

library(quanteda)

# Load the dictionary: tab-separated, word in column 1, score in column 2
df.temp <- read.csv(file="news.dictionary.tfidf.500.csv",  header=FALSE, sep='\t')
df.dict <- data.frame(word=as.character(df.temp$V1), score=as.numeric(df.temp$V2))

uk2010immigCorpus <- corpus(uk2010immig,
                            notes="Immigration-related sections of 2010 UK party manifestos")
# Build the document-feature matrix and weight it by tf-idf
mx <- tfidf(dfm(uk2010immigCorpus))
mx2 <- as.data.frame.matrix(t(subset(t(mx), colnames(mx) %in% df.dict$word))) #Remove columns not in the dictionary

# Make a list in the same order as the columns
v.dict <- list()
for(word in colnames(mx2)){
  v.dict[[word]] <- df.dict$score[df.dict$word==word]
  #v.dict[[word]] <- ifelse(df.dict$score[df.dict$word==word] > 0, 1, -1)
}

# Score each document as the weighted sum of its dictionary word scores
print(as.matrix(mx2) %*% as.matrix(unlist(v.dict)))
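
To make clear what the final matrix product computes, here is a minimal sketch in base R that needs no quanteda at all. The matrix values, document names, and dictionary entries are all invented for illustration and are not taken from the real dictionary or the manifestos. It also uses match() in place of the loop, which is only one possible way to line the scores up with the columns.

```r
# Toy weighted document-term matrix (rows = documents, columns = words).
# All numbers are hypothetical, chosen only to keep the arithmetic simple.
mx <- matrix(c(0.5, 0.0, 1.0,
               0.0, 2.0, 0.5),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("docA", "docB"),
                             c("border", "welcome", "the")))

# Toy dictionary with hypothetical words and scores
df.dict <- data.frame(word  = c("welcome", "border", "asylum"),
                      score = c(1.0, -0.5, 0.2),
                      stringsAsFactors = FALSE)

# Keep only the columns whose word appears in the dictionary
mx2 <- mx[, colnames(mx) %in% df.dict$word, drop = FALSE]

# Look up each remaining column's score, in column order
v.dict <- df.dict$score[match(colnames(mx2), df.dict$word)]

# Document score = sum over words of (weight in document * dictionary score)
scores <- drop(mx2 %*% v.dict)
print(scores)
```

With these toy numbers, docA scores 0.5 × (-0.5) = -0.25 and docB scores 2.0 × 1.0 = 2. The word ‘the’ is dropped because it is not in the dictionary, and ‘asylum’ contributes nothing because it does not occur in either document.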
