Factor analysis in R and Python


Python has a number of statistical modules that allow us to perform analyses without R, but it is always a good idea to compare the outputs of different implementations. I performed factor analysis using Python’s Scikit-learn module for my dictionary creation system, but the outputs were completely different from those of R’s factanal function, just as in someone’s post on Stack Overflow. After long hours, I finally found that this was because I hadn’t normalized the data (standardized each variable to zero mean and unit variance) before passing it to Scikit-learn. factanal does this normalization automatically, but Scikit-learn doesn’t. The right way to perform the factor analysis is this:

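from sklearn import decomposition, preprocessing

# Normalization: standardize each column to zero mean and unit variance
data_normal = preprocessing.scale(data)
fa = decomposition.FactorAnalysis(n_components=1)
fa.fit(data_normal)  # Fit the model on the normalized data
print(fa.components_)  # Factor loadings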
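To see the effect of the normalization step in isolation, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration and is not taken from the dictionary creation system: the single latent factor, the `true_loadings` values, the column `scales`, and the helper name `first_factor_loadings`. It fits the same one-factor model to the raw and to the standardized data.

```python
import numpy as np
from sklearn import decomposition, preprocessing

# Synthetic data (assumed for illustration): one latent factor drives
# seven observed variables that are recorded on very different scales.
rng = np.random.default_rng(0)
n_samples = 2000
factor = rng.normal(size=(n_samples, 1))
true_loadings = np.array([[0.3, 0.6, 0.5, 0.5, 0.5, 0.3, 0.4]])
noise = rng.normal(size=(n_samples, 7)) * np.sqrt(1 - true_loadings**2)
scales = np.array([1, 10, 100, 0.1, 5, 50, 1000])
data = (factor @ true_loadings + noise) * scales


def first_factor_loadings(x):
    """Fit a one-factor model and return its loadings as a 1-D array."""
    fa = decomposition.FactorAnalysis(n_components=1)
    fa.fit(x)
    loadings = fa.components_[0]
    # The sign of a factor is arbitrary, so flip it to make the sum positive.
    return loadings if loadings.sum() >= 0 else -loadings


raw_loadings = first_factor_loadings(data)
scaled_loadings = first_factor_loadings(preprocessing.scale(data))

print("raw data:         ", np.round(raw_loadings, 3))
print("standardized data:", np.round(scaled_loadings, 3))
print("simulated truth:  ", true_loadings[0])
```

The fit on raw data works in the original units of each column, so its numbers cannot be lined up with loadings estimated from standardized variables. Only the second fit is on the same footing as factanal.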

With the data normalized first, the factor loadings estimated by Scikit-learn come close to R’s estimates, differing by less than 0.08 on every variable:

# Python (Scikit-learn)
1: 0.24705429
2: 0.56100678
3: 0.48559474
4: 0.54208185
5: 0.50989289
6: 0.33028625
7: 0.38651951

# R (factanal)
1: 0.285719656390773
2: 0.633553717909623
3: 0.493731965398187
4: 0.527418210503982
5: 0.487150249901473
6: 0.312724093202758
7: 0.378827084637606
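
The agreement between the two lists can also be checked numerically rather than by eye. The sketch below copies the fourteen loadings from above into NumPy arrays; the variable names are my own choice for illustration, and the maximum absolute difference and the correlation are just one possible pair of summaries.

```python
import numpy as np

# Loadings copied from the two lists above
sklearn_loadings = np.array([
    0.24705429, 0.56100678, 0.48559474, 0.54208185,
    0.50989289, 0.33028625, 0.38651951,
])
r_loadings = np.array([
    0.285719656390773, 0.633553717909623, 0.493731965398187,
    0.527418210503982, 0.487150249901473, 0.312724093202758,
    0.378827084637606,
])

abs_diff = np.abs(sklearn_loadings - r_loadings)
max_diff = abs_diff.max()
worst_variable = int(abs_diff.argmax()) + 1  # 1-based, as in the lists
correlation = np.corrcoef(sklearn_loadings, r_loadings)[0, 1]

print("largest absolute difference:", max_diff)
print("variable with the largest gap:", worst_variable)
print("correlation between the two sets:", correlation)
```

A summary like this makes it easier to tell a small numerical discrepancy between implementations from the complete mismatch caused by skipping the normalization.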
