Immigration dictionary

This is probably the final version of my immigration dictionary. This text analysis dictionary was created from a British newspaper corpus using a technique called Latent Semantic Scaling, which is based on Latent Semantic Analysis.
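To give a rough idea of how this kind of scaling can work, here is a minimal sketch in base R. It is not the procedure used to build the dictionary: the toy matrix, the seed words `good` and `bad`, and the choice of two dimensions are all assumptions made for illustration. The general idea is to take word vectors from a truncated SVD of a sentence-by-word matrix and to score each word by its cosine similarity to positive and negative seed words.

```r
# Toy sentence-by-word count matrix (invented data, not the newspaper corpus)
X <- matrix(c(1, 0, 1, 0, 1,
              1, 0, 1, 0, 1,
              0, 1, 0, 1, 1,
              0, 1, 0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4),
                            c("good", "bad", "skills", "influx", "policy")))

# Truncated SVD: keep k right singular vectors as word vectors
k <- 2  # assumption; real applications use far more dimensions
V <- svd(X, nu = 0, nv = k)$v
rownames(V) <- colnames(X)

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Hypothetical seed words with their polarity
seeds <- c(good = 1, bad = -1)

# Polarity of a word = mean of seed polarity times cosine similarity to the seed
polarity <- sapply(rownames(V), function(w) {
  mean(sapply(names(seeds), function(seed) seeds[[seed]] * cosine(V[w, ], V[seed, ])))
})

print(round(polarity, 3))
```

In this toy example `skills` only occurs with `good` and `influx` only with `bad`, so they end up on opposite sides of zero, while `policy`, which occurs everywhere, stays neutral.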

The results of automated content analysis with this dictionary correspond strongly to manual coding by Amazon Mechanical Turk workers, as you can see in the chart (whiskers represent 95% confidence intervals). Please note, however, that the documents coded by the dictionary are only sentences about immigration in the party manifestos, selected by keywords (‘immigra*’, ‘migra*’, ‘refugee*’, ‘asylum*’, ‘foreign*’).
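The keyword selection can be sketched in base R as follows. This is only an illustration under some assumptions: the sentence splitter is deliberately naive, the wildcards are treated as matching from the start of a word, and the function name `select_sentences` and the sample text are made up.

```r
# Keyword patterns from the post, written as glob-style wildcards
keywords <- c("immigra*", "migra*", "refugee*", "asylum*", "foreign*")

# Turn the globs into one regular expression anchored at the start of a word
# (assumption: the trailing * means "any ending")
pattern <- paste0("\\b(", paste(sub("\\*$", "", keywords), collapse = "|"), ")")

select_sentences <- function(text) {
  # Naive splitter (assumption): break at whitespace that follows ., ! or ?
  sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  # Keep only sentences containing at least one keyword
  sentences[grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)]
}

# Made-up example text
manifesto <- "We will cap immigration. We will cut taxes. Asylum claims will be processed faster."
selected <- select_sentences(manifesto)
print(selected)
```

Only the selected sentences would then be passed on to the dictionary, which matters because many entries are meaningful only in the context of immigration.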

UK 2010 manifestos on immigration

The dictionary is made up of 750 entry words. The following are the 30 most positive and the 30 most negative words in the dictionary. Many of them are intuitively positive or negative, but some are not. For example, ‘globalisation’ is positive only in the context of immigration. This is why the texts are restricted to sentences on this subject. We can also spot words like ‘species’ and ‘wildebeest’, which are there because the newspaper corpus contains stories about animal migration, but they do little harm.

# Positive words

1   skills            100
2   globalisation     88.24
3   chauffeured       86.93
4   airport           86.68
5   ranging           82.41
6   clearance         79.48
7   status            78.4
8   agency            74.98
9   issues            72.15
10  breed             69.45
11  claimed           68.84
12  vehemently        68.6
13  skill             67.3
14  test              65.91
15  attract           64.39
16  permanent         63.68
17  legal             59.23
18  melting-pot       57.34
19  species           57.27
20  wildebeest        56.96
21  overstaying       56.07
22  documents         55.9
23  routes            55.75
24  work              55.63
25  shambles          55.28
26  breeding          53.65
27  bringing          53.24
28  employ            52.76
29  passport          52.24
30  official          51.88


# Negative words

1   xenophobia        -141.27
2   control           -130.09
3   racist            -125.2
4   stemming          -122.5
5   tide              -122.46
6   working-class     -115.53
7   negative          -113.76
8   failure           -110.32
9   problems          -106.95
10  influx            -100.81
11  branded           -99.42
12  caused            -96.82
13  exploit           -94.11
14  first-generation  -90.78
15  warned            -89.93
16  families          -88.51
17  soaring           -86.53
18  ignored           -86.45
19  housed            -85.33
20  magnet            -84.47
21  borders           -83.18
22  newly-arrived     -83.12
23  accused           -82.89
24  evicted           -82.02
25  trickle           -81.42
26  rates             -79.42
27  fuelled           -78.34
28  flooded           -76.69
29  non-white         -76.48
30  lorries           -76.38

Text analysis dictionary on immigration policy

Dictionary-based text analysis has a number of good properties, but it is always difficult to make a new dictionary, so text analysts often use existing ones, such as the General Inquirer dictionaries, which were originally created decades ago, or their derivatives. However, I believe that it is time to create new dictionaries from scratch using the many tools and techniques now available to us.

My first original dictionary is the UK Immigration Dictionary. It is meant to measure attitudes toward immigration to the UK. The word list contains counterintuitive positive entries such as ‘racist’, but applying it to the 2010 UK party manifestos gives the following result.

BNP          -0.660772785
Coalition     0.403547905
Conservative  0.002508397
Greens       -0.898075732
Labour        0.081029432
LibDem        0.050535076
PC           -0.015306746
SNP          -0.551027977
UKIP         -0.335952325

I am not yet sure how accurate this is, but it looks interesting since small parties, which tend to be against immigration, are all negative.

It is very easy to use the dictionary in R with quanteda:

library(quanteda)

# Read the tab-separated dictionary file: column 1 = word, column 2 = score
options(stringsAsFactors=FALSE)
df.temp <- read.csv(file="news.dictionary.tfidf.500.csv",  header=FALSE, sep='\t')
df.dict <- data.frame(word=as.character(df.temp$V1), score=as.numeric(df.temp$V2))

# uk2010immig is a named character vector of manifesto texts, one element per party
uk2010immigCorpus <- corpus(uk2010immig,
                            docvars=data.frame(party=names(uk2010immig)),
                            notes="Immigration-related sections of 2010 UK party manifestos",
                            enc="UTF-8")

# Document-feature matrix weighted by tf-idf
mx <- tfidf(dfm(uk2010immigCorpus))
mx2 <- as.data.frame.matrix(t(subset(t(mx), colnames(mx) %in% df.dict$word))) # Remove columns not in the dictionary

# Make a list of scores in the same order as the columns
v.dict <- list()
for(word in colnames(mx2)){
  v.dict[[word]] <- df.dict$score[df.dict$word==word]
  # Alternative: use only the sign of each score (+1 or -1)
  #v.dict[[word]] <- ifelse(df.dict$score[df.dict$word==word] > 0, 1, -1)
}

# Score of each party = weighted sum of the dictionary scores of its words
print(as.matrix(mx2) %*% as.matrix(unlist(v.dict)))
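
The last step is just a matrix product, which is easier to see on a toy example that needs only base R. In this sketch the document weights, the party labels and the function name `score_documents` are invented for illustration. The three dictionary scores are taken from the word lists in the other post.

```r
# Toy document-by-word weights (made-up numbers standing in for tf-idf values)
mx <- matrix(c(2, 0, 1,
               0, 3, 1),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("PartyA", "PartyB"),
                             c("skills", "influx", "policy")))

# Small dictionary; "policy" is deliberately not an entry
dict <- c(skills = 100, influx = -100.81, tide = -122.46)

score_documents <- function(mx, dict, sign_only = FALSE) {
  # Keep only the words that occur both in the matrix and in the dictionary
  shared <- intersect(colnames(mx), names(dict))
  w <- dict[shared]
  # Optionally use +1/-1 instead of the full scores
  if (sign_only) w <- sign(w)
  # Weighted sum of dictionary scores for each document
  drop(mx[, shared, drop = FALSE] %*% w)
}

scores <- score_documents(mx, dict)
print(scores)
print(score_documents(mx, dict, sign_only = TRUE))
```

Looking up the scores by name keeps them aligned with the matrix columns without an explicit loop, and words missing from the dictionary simply drop out of the sum.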