Python has a number of statistical modules that allow us to perform analysis without R, but it is always a good idea to compare the outputs of different implementations. I performed factor analysis with Python's Scikit-learn module for my dictionary creation system, but the outputs were completely different from those of R's factanal function, just like in someone's post on Stack Overflow. After long hours, I finally found that it was because I hadn't normalized the data for Scikit-learn. factanal fits the model to the correlation matrix, so it normalizes the variables automatically, but Scikit-learn's FactorAnalysis only centers them. The right way to perform factor analysis in Scikit-learn should therefore be this:

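```python
from sklearn import decomposition, preprocessing

# data: an (n_samples, n_variables) array of observations
data_normal = preprocessing.scale(data)  # Normalization: zero mean, unit variance per column
fa = decomposition.FactorAnalysis(n_components=1)
fa.fit(data_normal)
print(fa.components_)  # Factor loadings
```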
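To see why the scaling matters, here is a minimal sketch on synthetic data. The data, the column units, and the helper `loadings` are hypothetical, made up only for illustration. It fits a one-factor model to the same observations expressed in two different sets of units. Without normalization the loadings change with the units; after `preprocessing.scale` they do not, which matches the behaviour of a correlation-based method like factanal. `svd_method="lapack"` just selects the exact SVD instead of the randomized default.

```python
import numpy as np
from sklearn import decomposition, preprocessing

# Hypothetical synthetic data: 500 samples, 4 observed variables driven by
# one latent factor plus independent noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
true_loadings = np.array([[0.8, 0.6, 0.7, 0.5]])
data = factor @ true_loadings + rng.normal(scale=0.5, size=(500, 4))

# The same observations in different units (e.g. one column in millimetres
# instead of metres). The unit factors are arbitrary.
units = np.array([1.0, 1000.0, 1.0, 10.0])
data_rescaled = data * units


def loadings(x):
    """Fit a one-factor model and return the absolute factor loadings."""
    fa = decomposition.FactorAnalysis(n_components=1, svd_method="lapack")
    fa.fit(x)
    # The sign of a factor is arbitrary, so compare absolute values.
    return np.abs(fa.components_.ravel())


# Raw data: loadings depend on the units of each column.
raw_a = loadings(data)
raw_b = loadings(data_rescaled)

# Standardized data: both versions become the same matrix, so the loadings agree.
std_a = loadings(preprocessing.scale(data))
std_b = loadings(preprocessing.scale(data_rescaled))

print("raw:         ", raw_a, raw_b)
print("standardized:", std_a, std_b)
```

The raw loadings for the second column grow roughly with its unit factor, while the standardized ones stay put, so only the standardized version is comparable to factanal's output.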

If I do it this way, the factor loadings estimated by Scikit-learn become very close to R's estimates:

```text
Variable  Python (Scikit-learn)  R (factanal)
1         0.24705429             0.285719656390773
2         0.56100678             0.633553717909623
3         0.48559474             0.493731965398187
4         0.54208185             0.527418210503982
5         0.50989289             0.487150249901473
6         0.33028625             0.312724093202758
7         0.38651951             0.378827084637606
```
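To put a number on "very close", the two sets of loadings can be compared directly. This sketch simply copies the seven values from each implementation above into NumPy arrays and reports the largest absolute difference and the correlation between them; it does not rerun either analysis.

```python
import numpy as np

# Loadings copied from the comparison above (variables 1 to 7).
sklearn_loadings = np.array([0.24705429, 0.56100678, 0.48559474, 0.54208185,
                             0.50989289, 0.33028625, 0.38651951])
r_loadings = np.array([0.285719656390773, 0.633553717909623, 0.493731965398187,
                       0.527418210503982, 0.487150249901473, 0.312724093202758,
                       0.378827084637606])

# Element-wise absolute gap between the two implementations.
diff = np.abs(sklearn_loadings - r_loadings)
worst_variable = int(diff.argmax()) + 1  # 1-based, as in the table
corr = np.corrcoef(sklearn_loadings, r_loadings)[0, 1]

print("max |difference|:", diff.max())
print("largest gap at variable", worst_variable)
print("correlation:", corr)
```

On these values the largest absolute gap is about 0.073, on variable 2, and every other variable differs by less than 0.04.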