Where am I going wrong? I am trying to perform PCA both through prcomp and by hand, and I get different results. Can you please help me?
DOING IT BY MYSELF:
>database <- read.csv("E:/R/database.csv", sep=";", dec=",") #it's a 105 rows x 8 columns, each column is a variable
>matrix.cor<-cor(database)
>standardize<-function(x) {(x-mean(x))/sd(x)}
>values.standard<-apply(database, MARGIN=2, FUN=standardize)
>my.eigen<-eigen(matrix.cor)
>loadings<-my.eigen$vectors
>scores<-values.standard %*% loadings
>head (scores, n=10) # I'm posting here only the first row of scores for the first 6 PCs
[,1]       [,2]       [,3]        [,4]       [,5]        [,6]        
2.3342586  2.3426398 -0.9169527  0.80711713  1.1409138 -0.25832090    
>sd <-sqrt (my.eigen$values)
>sd
[1] 1.5586078 1.1577093 1.1168477 0.9562853 0.8793033 0.8094500 0.6574788
0.4560247
DOING IT WITH PRCOMP:
>database.pca<-prcomp(database, retx=TRUE, center= TRUE, scale=TRUE)
>sd1<-database.pca$sdev 
>loadings1<-database.pca$rotation
>rownames(loadings1)<-colnames(database)
>scores1<-database.pca$x
>head (scores1, n=10)
PC1        PC2        PC3         PC4        PC5         PC6       
-2.3342586  2.3426398  0.9169527  0.80711713  1.1409138  0.25832090
range (scores-scores1) is not zero! Please help me!!! Gloria
The prcomp function returns an object of class prcomp, which has several methods available. The print method returns the standard deviation of each PC and the rotation (or loadings), which are the coefficients of the linear combinations of the continuous variables.
The function princomp() uses the spectral decomposition approach, that is, an eigendecomposition of the covariance or correlation matrix. The functions prcomp() and PCA() [FactoMineR] use the singular value decomposition (SVD) of the centered (and optionally scaled) data matrix. According to the R help, SVD has slightly better numerical accuracy, so prcomp() is generally preferred over princomp().
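The relationship between the two routes can be sketched on simulated data. This is only an illustration under stated assumptions: the matrix X is made up, and the names sdev_eigen, sdev_svd, sdev_gap and prcomp_gap are hypothetical. It compares the standard deviations obtained from eigen() on the correlation matrix with those obtained from svd() on the standardized data.

```r
# Made-up data standing in for a real dataset (50 observations, 4 variables).
set.seed(42)
X <- matrix(rnorm(50 * 4), nrow = 50)
Z <- scale(X)            # center each column and divide by its sd (n - 1 divisor)
n <- nrow(Z)

# Spectral route: eigenvalues of the correlation matrix are the PC variances.
ev <- eigen(cor(X))
sdev_eigen <- sqrt(ev$values)

# SVD route: singular values of the standardized data, rescaled by sqrt(n - 1).
s <- svd(Z)
sdev_svd <- s$d / sqrt(n - 1)

# Both gaps should be at floating-point noise level.
sdev_gap   <- max(abs(sdev_eigen - sdev_svd))
prcomp_gap <- max(abs(prcomp(X, center = TRUE, scale. = TRUE)$sdev - sdev_svd))
```

On well-conditioned data like this the two routes should agree to within rounding error; the accuracy advantage of SVD mentioned in the R help matters mainly for ill-conditioned data.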
Low interpretability of principal components. Principal components are linear combinations of the features from the original data, but they are not as easy to interpret. For example, it is difficult to tell which are the most important features in the dataset after computing principal components.
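It looks like your principal component scores have come out more or less exactly the same, just with different signs. The sign of a principal component is basically arbitrary: if v is an eigenvector of the correlation matrix, then so is -v, so eigen and prcomp are each free to return either one.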
If you test your manually calculated scores with something like range(abs(scores) - abs(scores1)) instead, you should get something pretty close to 0 (maybe not exactly 0, due to possible floating-point precision effects).
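A minimal sketch of that check, using simulated data because the original database.csv is not available. The matrix X and the names scores_manual, scores_prcomp, flip and scores_aligned are assumptions made for illustration. It also shows one possible way to flip the manual columns so that the signs match prcomp's.

```r
# Made-up stand-in for the 105 x 8 dataset in the question.
set.seed(1)
X <- matrix(rnorm(105 * 8), nrow = 105, ncol = 8)

# Manual PCA: eigenvectors of the correlation matrix applied to standardized data.
Z <- scale(X)
e <- eigen(cor(X))
scores_manual <- Z %*% e$vectors

# PCA through prcomp.
p <- prcomp(X, center = TRUE, scale. = TRUE)
scores_prcomp <- p$x

# Compare absolute values, which ignores the arbitrary sign of each component.
max_abs_diff <- max(abs(abs(scores_manual) - abs(scores_prcomp)))

# Align signs: the dot product of matching loading columns is about +1 or -1.
flip <- sign(colSums(e$vectors * p$rotation))
scores_aligned <- sweep(scores_manual, 2, flip, `*`)
max_aligned_diff <- max(abs(scores_aligned - scores_prcomp))

# The standard deviations need no sign correction at all.
sdev_diff <- max(abs(sqrt(e$values) - p$sdev))
```

If all three differences come out at floating-point noise level, the manual calculation and prcomp agree, and the remaining discrepancy in range(scores - scores1) is purely the sign convention.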