Why, and How, Should Geologists Use Compositional Data Analysis/Factor Analysis


Factor analysis is a statistical data reduction technique used to explain variability among observed random variables in terms of fewer unobserved random variables called factors. It is useful for reducing the number of variables by combining two or more of them into a single factor, thus “simplifying” the original dataset.

Factor analysis (FA) is especially useful in geochemistry when one has a known target or some other way to understand the meaning of the obtained associations. Failing this, the geologist is usually forced to “plot and see”, and then to select the factor that appears most useful for the studied area.

I processed the initial dataset and the three transformed versions using SYSTAT SSPS 10.0 for Windows, but you can use any other statistical program capable of factor analysis.
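For readers without access to a statistics package, the quantity behind a scree plot can be sketched in a few lines. A scree plot shows the eigenvalues of the correlation (or covariance) matrix in descending order. The sketch below assumes the correlation matrix is used and that components with an eigenvalue above 1 are retained (a common rule of thumb); neither choice is confirmed as the setting used in SYSTAT. The function names and the small sample table are hypothetical. In practice one would normally call a library routine such as `numpy.linalg.eigh` instead of the hand-written Jacobi iteration.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`.

    Assumes every column varies; a constant column would divide by zero.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    p = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    for _ in range(sweeps):
        # Stop once the off-diagonal part is negligible.
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                d = a[j][j] - a[i][i]
                # Rotation angle that zeroes a[i][j], kept within +/- 45 degrees.
                theta = math.pi / 4 if d == 0 else 0.5 * math.atan(2 * a[i][j] / d)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # A <- A J (columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # A <- J^T A (rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)

def scree_eigenvalues(rows):
    """Eigenvalues to plot against component number in a scree plot."""
    return symmetric_eigenvalues(correlation_matrix(rows))

# Hypothetical table: six samples (rows) by three elements (columns).
samples = [
    [12.0, 30.0, 5.0],
    [15.0, 36.0, 4.0],
    [9.0, 25.0, 6.5],
    [20.0, 44.0, 3.0],
    [11.0, 27.0, 7.0],
    [17.0, 40.0, 2.5],
]
eigenvalues = scree_eigenvalues(samples)
retained = [ev for ev in eigenvalues if ev > 1.0]  # rule-of-thumb cut-off
print(eigenvalues, len(retained))
```

Because a correlation matrix has ones on its diagonal, the eigenvalues always add up to the number of variables, which is a quick sanity check on any scree plot.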

Factor Analysis for the Initial Dataset

Figure 36 shows the scree plot for the initial dataset, while table 24 shows the principal components defined by the software.

Figure 36. Scree plot for the initial dataset.


Table 24. Principal component analysis (PCA) for the initial dataset.


Equations 23 – 25 show the three FA components for the initial dataset.

Equation 23. FA1 for the initial dataset.

Equation 24. FA2 for the initial dataset.

Equation 25. FA3 for the initial dataset.


Figures 37 – 39 show the effectiveness of these factors as a targeting tool for our ore body.
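The figures make this comparison visually, on a map. As a purely hypothetical complement, one could also put numbers on how well a factor outlines a known target: flag the samples with the highest factor scores and count how many of them fall inside the ore body. The function name, the 20 % cut-off and the sample values below are assumptions made for illustration, not part of the original workflow.

```python
def targeting_summary(scores, in_ore, top_fraction=0.2):
    """Flag the highest-scoring samples and compare them with a known target.

    scores       -- factor score of each sample
    in_ore       -- True where the sample lies inside the known ore body
    top_fraction -- share of samples to flag as anomalous (an assumption)

    Returns (precision, recall): the share of flagged samples that are in the
    ore body, and the share of ore-body samples that were flagged.
    """
    n = len(scores)
    k = max(1, round(n * top_fraction))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = ranked[:k]
    hits = sum(1 for i in flagged if in_ore[i])
    total_ore = sum(1 for flag in in_ore if flag)
    precision = hits / k
    recall = hits / total_ore if total_ore else 0.0
    return precision, recall

# Hypothetical scores for ten samples, three of which lie in the ore body.
scores = [5.0, 1.0, 4.0, 0.5, 0.2, 0.1, 0.3, 0.4, 0.6, 0.7]
in_ore = [True, False, True, True, False, False, False, False, False, False]
precision, recall = targeting_summary(scores, in_ore)
print(precision, recall)
```

A factor that maps the ore body well should score high on both numbers; a factor that lights up large barren areas will have low precision even if its recall is high.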



Conclusions and Recommendations on the Use of FA for the Initial Dataset

As long as we have a known target against which to test the obtained factors, this method offers better results than the RCC. It also allows for the combined study of all the elements together.

FA1 and FA2 do contain the embedded correlations I introduced in the initial dataset, which explains their effectiveness, especially that of FA1, in mapping the location of the ore body.

The next question is: will the transformed data be any more effective in helping us locate our target?

Factor Analysis for the CLR Transformed Dataset

Figure 40 shows the scree plot for the CLR transformed dataset, while table 25 shows the principal components defined by SYSTAT.
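As a reminder of what goes into this analysis, the centred log-ratio (CLR) of a composition is the logarithm of each part minus the mean of the logarithms of all parts (equivalently, the log of each part divided by the geometric mean). A minimal sketch follows; the function name `clr` and the sample values are illustrative, and all parts are assumed to be strictly positive, so zeros or values below detection would have to be dealt with beforehand.

```python
import math

def clr(composition):
    """Centred log-ratio transform of one sample (all parts must be > 0)."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]

# Hypothetical three-part composition, in any consistent unit.
sample = [1.0, 2.0, 7.0]
print(clr(sample))
```

The CLR values of a sample always sum to zero, and rescaling the sample (for example from percent to ppm) leaves them unchanged.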

Figure 40. Scree plot for the CLR transformed dataset.


Table 25. Principal component analysis for the CLR transformed dataset.


Equations 26 – 28 show the three FA components for the CLR transformed dataset.

Equation 26. FA4 for the CLR transformed dataset.

Equation 27. FA5 for the CLR transformed dataset.

Equation 28. FA6 for the CLR transformed dataset.

Figures 41 – 43 show the effectiveness of these FA as a targeting tool for our ore body.


Factor Analysis for the ALR Transformed Dataset

Figure 44 shows the scree plot for the ALR transformed dataset, while table 26 shows the principal components defined by SYSTAT.
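For reference, the additive log-ratio (ALR) divides every part by one chosen part and takes the logarithm, so a composition of D parts becomes D − 1 values. The sketch below assumes the last part is the denominator by default; which element served as denominator for this dataset is not stated here, so the default and the sample values are illustrative only.

```python
import math

def alr(composition, denominator_index=-1):
    """Additive log-ratio transform of one sample (all parts must be > 0).

    The part at `denominator_index` is used as the divisor and is dropped
    from the output, so the result has one value fewer than the input.
    """
    idx = denominator_index % len(composition)  # allow negative indices
    divisor = composition[idx]
    return [math.log(x / divisor)
            for i, x in enumerate(composition) if i != idx]

# Hypothetical three-part composition; the last part is the divisor.
sample = [1.0, 2.0, 4.0]
print(alr(sample))
```

Because the result depends on which part is chosen as divisor, two ALR analyses of the same data with different denominators are not directly comparable.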

Figure 44. Scree plot for the ALR transformed dataset.


Table 26. Principal component analysis for the ALR transformed dataset.


Although table 26 shows two components, I will analyze only the second, which is a coefficient as shown in equation 29.

Equation 29. FA7 for the ALR transformed dataset.

This factor contains the embedded relationship from the initial dataset, but because of the presence of other elements, its usefulness as a targeting tool is more limited, as shown in Figure 45.

Figure 45. FA7 covers mostly the southeastern part of the ore body.

Factor Analysis for the ILR Transformed Dataset

Figure 46 shows the scree plot for the ILR transformed dataset, while table 27 shows the principal components defined by SYSTAT.
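The isometric log-ratio (ILR) expresses a D-part composition as D − 1 coordinates in an orthonormal basis, and several such bases exist. The sketch below uses so-called pivot coordinates, in which each coordinate compares one part with the geometric mean of the parts that follow it. This is only one possible choice and is not necessarily the basis used for the dataset analysed here; the function name and the sample values are illustrative.

```python
import math

def ilr_pivot(composition):
    """Isometric log-ratio transform in pivot coordinates (all parts > 0).

    Coordinate i compares part i with the geometric mean of the parts that
    come after it, scaled so that the coordinates form an orthonormal basis.
    """
    logs = [math.log(x) for x in composition]
    d = len(logs)
    coords = []
    for i in range(d - 1):
        remaining = d - i - 1                      # parts after position i
        mean_rest = sum(logs[i + 1:]) / remaining  # log geometric mean of the rest
        scale = math.sqrt(remaining / (remaining + 1))
        coords.append(scale * (logs[i] - mean_rest))
    return coords

# Hypothetical three-part composition.
sample = [1.0, 2.0, 7.0]
print(ilr_pivot(sample))
```

Reordering the parts changes the individual coordinates but not the distances between samples, which is why the order of the elements matters when interpreting a single ILR-based factor.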

Figure 46. Scree plot for the ILR transformed dataset.


Table 27. Principal component analysis for the ILR transformed dataset.

The large number of components produced by the PCA is an indication that we will not get good results this time. Equations 30 through 33 show the obtained factors.

Equation 30. FA8 for the ILR transformed dataset.

Equation 31. FA9 for the ILR transformed dataset.

Equation 32. FA10 for the ILR transformed dataset.

Equation 33. FA11 for the ILR transformed dataset.

Figures 47 through 50 show the spatial distribution of these factors with respect to the location of our ore body.



Conclusions and Recommendations on the Use of FA for the Transformed Datasets

As I mentioned earlier, for FA to be most useful, one needs to have a known target to calibrate it. The factor analysis applied to the CLR transformed data gave us three factors, but only one (FA5) was useful for targeting the ore body.

The factor analysis of the ALR transformed data (FA7) was good in general, but the best factors were obtained from the ILR transformed data, especially FA9, which gave not only the exact location of the ore body but also its internal structure. Another efficient factor was FA11, but it definitely required calibration based on a known target.

So, to answer the question posed earlier: yes, the factor analysis of the ILR transformed data is more effective than the factor analysis of the raw data as a tool for locating the ore deposit.