Statistics/Multivariate Data Analysis/Principal Component Analysis

In a Principal Component Analysis (PCA), a new coordinate system is created from the data. The origin of this new coordinate system is the grand mean, i.e. the mean of each variable becomes 0 in the transformed coordinate system. The first principal component (first axis) runs along the direction in which the data are most spread out, i.e. the direction of greatest variance. Imagine your data consist of three variables, x1, x2 and x3. The data form a cloud in three-dimensional space, shaped like a bun, for example. Then the first axis (p1) can be visualised as a knitting needle poked through the longest dimension of the bun. The second axis (p2) is orthogonal to the first and extends through the next-longest dimension of the bun. As mentioned, the origin, i.e. the point where the axes intersect, is the mean. Therefore the axes meet in the center of gravity of the bun (assuming uniform density). The third axis (p3), again, is orthogonal to both previous axes. In our 3D example you can easily work out that only one direction is left for it.
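
The construction of the axes can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data set is synthetic (a random, elongated three-variable cloud), and the names X, Xc and axes are made up for this example. The sketch uses the eigendecomposition of the covariance matrix, which is one common way to obtain the principal axes, not the only one.

```python
import numpy as np

# Assumption: synthetic data, 200 observations of three correlated
# variables x1, x2, x3 (an elongated "bun" in 3D space).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.5, 0.0],
                   [0.5, 0.3, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing

# The grand mean becomes the origin of the new coordinate system.
grand_mean = X.mean(axis=0)
Xc = X - grand_mean

# Eigenvectors of the covariance matrix give the directions of the axes,
# eigenvalues give the variance of the data along each axis.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order

# Sort so that the first axis carries the largest variance.
order = np.argsort(eigvals)[::-1]
variances = eigvals[order]
axes = eigvecs[:, order]                 # columns are p1, p2, p3
p1, p2, p3 = axes.T

# The axes are mutually orthogonal unit vectors.
print(np.allclose(axes.T @ axes, np.eye(3)))

# Once p1 and p2 are fixed, p3 is determined up to its sign:
# it must lie along the cross product of p1 and p2.
print(np.isclose(abs(np.dot(np.cross(p1, p2), p3)), 1.0))
```

The last check mirrors the remark about the third axis: in three dimensions, only one line is orthogonal to both p1 and p2, although the axis may point either way along it.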

Every observation, i.e. every combination of old variable values (e.g. x11, x21, x31), has new coordinates in the transformed PCA system (p11, p21, p31). However, the vectors of new coordinates (p1., p2. and p3., each taken across all observations) will now be mutually orthogonal and uncorrelated.
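
As a rough check of this claim, the following Python sketch transforms a synthetic data set into PCA coordinates and inspects the result. The data and the names pca_scores, scores and components are assumptions made for this example. The sketch uses a singular value decomposition of the centred data, which gives the same axes as the covariance-based approach, up to sign.

```python
import numpy as np

def pca_scores(X):
    """Return (scores, components, grand_mean) for a data matrix X.

    Rows of X are observations, columns are variables.
    Rows of `components` are the principal axes p1, p2, ...
    """
    grand_mean = X.mean(axis=0)
    Xc = X - grand_mean                       # move origin to the grand mean
    # SVD of the centred data: the rows of Vt are the principal axes,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                        # coordinates in the new system
    return scores, Vt, grand_mean

# Assumption: synthetic, correlated three-variable data.
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.2, 1.0, 0.0],
                   [0.4, 0.6, 0.3]])
X = rng.normal(size=(200, 3)) @ mixing

scores, components, grand_mean = pca_scores(X)

# First observation: old values (x11, x21, x31) and new values (p11, p21, p31).
print(X[0], scores[0])

# Covariance matrix of the new coordinates: off-diagonal entries are
# zero up to rounding error, so p1., p2. and p3. are uncorrelated.
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))
print(np.allclose(off_diagonal, 0.0, atol=1e-8))

# The transformation only rotates and shifts the data, so it can be undone.
X_back = scores @ components + grand_mean
print(np.allclose(X_back, X))
```

Because the new coordinates are centred on zero, "uncorrelated" and "orthogonal" coincide here: the dot product of any two different score columns is zero as well.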

See also