Factor analysis in R and Python


Python has a number of statistical modules that allow us to perform analyses without R, but it is always a good idea to compare the outputs of different implementations. I performed factor analysis using Python’s Scikit-learn module for my dictionary creation system, but the outputs were completely different from those of R’s factanal function, just as in someone’s post on Stack Overflow. After long hours, I finally found the cause: I hadn’t normalized the data for Scikit-learn. Factanal normalizes automatically (it fits the model to the correlation matrix), but Scikit-learn doesn’t. The right way to perform factor analysis should therefore be this:

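from sklearn import decomposition, preprocessing

data_normal = preprocessing.scale(data)  # Normalization: zero mean, unit variance per column
fa = decomposition.FactorAnalysis(n_components=1)
fa.fit(data_normal)  # Fit the model; components_ is only available after fitting
print(fa.components_)  # Factor loadings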

Done this way, the factor loadings estimated by Scikit-learn come very close to R’s estimates:

# Python (Scikit-learn)
1: 0.24705429
2: 0.56100678
3: 0.48559474
4: 0.54208185
5: 0.50989289
6: 0.33028625
7: 0.38651951

# R (factanal)
1: 0.285719656390773
2: 0.633553717909623
3: 0.493731965398187
4: 0.527418210503982
5: 0.487150249901473
6: 0.312724093202758
7: 0.378827084637606
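
To see the effect of normalization in isolation, here is a minimal sketch on synthetic data. Everything in it is made up for illustration (the seed, the sample size, the "true" loadings and the measurement scales); it is not the data from my dictionary system. It fits Scikit-learn's FactorAnalysis once on raw data and once on standardized data. Only the standardized fit should give loadings on the correlation scale that factanal reports. The sign of a factor is arbitrary, so the sketch compares absolute values.

```python
import numpy as np
from sklearn import decomposition, preprocessing

rng = np.random.default_rng(0)
n = 5000

# Hypothetical one-factor model: each standardized variable is
# loading * factor + unique noise, so that its variance is 1.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
factor = rng.normal(size=n)
noise = rng.normal(size=(n, true_loadings.size))
z = factor[:, None] * true_loadings + noise * np.sqrt(1 - true_loadings**2)

# Put the variables on very different (hypothetical) measurement scales.
scales = np.array([1.0, 10.0, 100.0, 0.1, 5.0])
data = z * scales


def abs_loadings(x):
    """Fit a one-factor model and return the absolute loadings."""
    fa = decomposition.FactorAnalysis(n_components=1)
    fa.fit(x)
    return np.abs(fa.components_[0])


raw = abs_loadings(data)                       # loadings in the original units
std = abs_loadings(preprocessing.scale(data))  # loadings on the correlation scale

print("raw:         ", np.round(raw, 3))
print("standardized:", np.round(std, 3))
print("true:        ", true_loadings)
```

The raw loadings are expressed in each variable's own units, so they are dominated by the variables with the largest scale and cannot be compared with factanal's output directly.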

One thought on “Factor analysis in R and Python”

  1. Stephen

    I know your post on Factor analysis in R and Python is a few years old, but I am trying to find some reliable code for factor rotation – Varimax and Parsimax in particular. There are various routines that can be found online, but they seem to produce some strange output. For example, some values appear to be outside the -1.00 to +1.00 range. Do you have any thoughts on or experience with this?

