Empirical Orthogonal Functions (EOF) homework.

Background reading: Hsieh book chapter 2


1. First, a question about the sense of EOFs:

You have some data(x,t) with space-time structure: 144 space bins (in this case, just longitude), by 240 time bins (months).
You want to decompose it into a set of orthogonal terms that add together to give the total.
Since they are orthogonal, each term represents some variance: cross terms disappear when you average the square of the sum.
If you keep enough terms you will get back all the variance (and more importantly, you can reconstruct the data in all its detail).
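
As a worked illustration of why the cross terms drop out, take just two terms a and b and let an overbar denote the average over the statistical dimension. The symbols a and b are placeholders for illustration, not notation from the assignment:

```latex
\overline{(a+b)^2} = \overline{a^2} + 2\,\overline{ab} + \overline{b^2}
                   = \overline{a^2} + \overline{b^2}
\quad \text{when } \overline{ab} = 0 .
```

With more than two mutually orthogonal terms, every pairwise cross term vanishes in the same way, so the variances simply add.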

In the case of EOF (also known as Principal Components (PC)) analysis, you express your data as:
EOFeqn.jpg
  1. How many values (numbers) are in your input data array? The input array is our space-time field, which is 144 (2.5° grid spacing in longitude) by 240 (20 years of monthly data), giving 144 x 240 = 34,560 elements.
  2. How many values (numbers) are needed to build each term on the left? Each term on the left is the product of an EOF (space) and a PC (time) component. The EOF part has 144 elements (2.5° grid spacing in longitude) and the PC part has 240 elements (20 years of monthly data), so each EOF*PC term needs 144 + 240 = 384 values.
  3. If 5 EOFs capture most of the data's variance, how much smaller (in the above sense) is the EOFxPC representation compared to the full data set? If only 5 EOFs are needed to capture most of the data's variance, the representation needs 5*384 = 1,920 elements. That is 1920/34560, or about 5.6% of the size of the original data set: 18 times smaller.
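
The counting in the three answers above can be checked with a few lines of arithmetic. This is only a sketch of the bookkeeping; the variable names are made up for illustration.

```python
n_space = 144   # longitude bins (2.5 degree spacing)
n_time = 240    # monthly time bins (20 years)
n_modes = 5     # number of retained EOF/PC pairs

full_size = n_space * n_time        # values in the raw data array
per_mode = n_space + n_time         # one spatial pattern plus one time series
reduced_size = n_modes * per_mode   # values in the truncated representation

fraction = reduced_size / full_size
ratio = full_size / reduced_size

print(full_size, per_mode, reduced_size)
print(f"{100 * fraction:.1f}% of the original size, {ratio:.0f} times smaller")
```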


2. Read in your field1 (let's call it x again). Use the same data from HW3 data source here.


Perform and display an EOF analysis of your first field

The figure below shows the total SST perturbations, their EOF reconstruction, and the difference between the two. The EOF reconstruction is extremely close to the total field (note the colorbar limits for the rightmost panel). This reconstruction uses all EOF modes, however, so it is expected to match the original almost exactly.

Leber_HW5_2-1.jpg
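
A minimal sketch of the full-mode reconstruction check, assuming NumPy and using an SVD in place of princomp(). The field here is synthetic (a made-up stand-in with the same 240 x 144 shape), not the HW3 SST data, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_space = 240, 144

# Synthetic stand-in for the SST field: two patterns plus noise.
lon = np.linspace(0.0, 2.0 * np.pi, n_space, endpoint=False)
t = np.arange(n_time)
x = (3.0 * np.outer(np.sin(2.0 * np.pi * t / 12.0), np.cos(lon))
     + 1.0 * np.outer(np.cos(2.0 * np.pi * t / 40.0), np.sin(2.0 * lon))
     + 0.3 * rng.standard_normal((n_time, n_space)))

# Remove the time mean at each longitude (time is the statistical dimension).
anom = x - x.mean(axis=0)

# SVD of the anomalies: rows of vt are spatial patterns (EOFs),
# columns of u * s are the matching time series (PCs).
u, s, vt = np.linalg.svd(anom, full_matrices=False)
eofs = vt         # shape (144, 144): mode index, longitude
pcs = u * s       # shape (240, 144): time, mode index

# Summing over all modes should return the anomalies to round-off.
recon = pcs @ eofs
max_err = np.abs(recon - anom).max()
print("largest reconstruction error:", max_err)
```

With every mode retained, the difference is at round-off level, which is why the difference panel needs its own colorbar limits.
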
More interesting, perhaps, is how much of the variance can be captured by just a small number of EOF modes. Below, we see that the first EOF mode alone captures a large share of the variance, and the first 4 modes together explain more than 90% of it:

Leber_HW5_2-2.jpg
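
One way to compute the variance explained per mode, sketched with NumPy under the assumption that the squared singular values of the anomaly matrix are used as the mode variances. The data are synthetic and the names are hypothetical, so the number of modes printed here says nothing about the real SST field.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_space = 240, 144

# Synthetic stand-in field (not the HW3 data).
lon = np.linspace(0.0, 2.0 * np.pi, n_space, endpoint=False)
t = np.arange(n_time)
x = (3.0 * np.outer(np.sin(2.0 * np.pi * t / 12.0), np.cos(lon))
     + 1.5 * np.outer(np.cos(2.0 * np.pi * t / 40.0), np.sin(2.0 * lon))
     + 0.5 * rng.standard_normal((n_time, n_space)))

anom = x - x.mean(axis=0)   # remove the time mean at each longitude

# Singular values only; each squared value is proportional to a mode's variance.
s = np.linalg.svd(anom, compute_uv=False)
var_frac = s**2 / np.sum(s**2)   # fraction of variance in each mode
cum_frac = np.cumsum(var_frac)   # cumulative fraction, modes 1..k

# Smallest number of modes whose cumulative fraction reaches 90%.
n90 = int(np.searchsorted(cum_frac, 0.90) + 1)
print("variance fraction of first 4 modes:", var_frac[:4])
print("modes needed for 90% of the variance:", n90)
```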


Extra credit/ teach us something new:


4. Try doing the computation with x and t transposed. Now the "coefficients" or "eigenvectors" are in time (240) and the "scores" are in space (144).
  1. There is a part of the total spacetime variance that EOFs can't reach if you remove the TIME mean, but then use SPACE as the statistical dimension over which you sum to compute covariances. (Or, for that matter, if you remove the SPACE mean to define anomalies but then perform a TIME covariance analysis.) What is that unreachable part of the spacetime variance? (Just look at the difference between the input data and the reconstruction and you will see what I am getting at.)

Leber_HW5_4-1.jpg

Transposing SST before performing the EOF analysis produces large differences between the original field and the EOF reconstruction, in sharp contrast to the first figure above, where the reconstruction was nearly exact. When the time mean is removed but space is then used as the statistical dimension, the analysis subtracts the spatial mean at each time before computing covariances, so the EOFs cannot capture the spatially averaged change with time. That domain-mean time series is the unreachable part of the variance.
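
The effect can be reproduced on synthetic data. The sketch below assumes that the PCA routine centers each column before the decomposition (as princomp() does) and that the reconstruction is scores times coefficients without adding that mean back. The helper `pca_centered` and the test field are hypothetical, written only for this illustration.

```python
import numpy as np

def pca_centered(a):
    """PCA via SVD after removing column means (similar in spirit to princomp)."""
    col_mean = a.mean(axis=0)
    u, s, vt = np.linalg.svd(a - col_mean, full_matrices=False)
    coeff = vt.T    # eigenvectors, one per column
    score = u * s   # projections of the centered data onto the eigenvectors
    return coeff, score, col_mean

rng = np.random.default_rng(2)
n_time, n_space = 240, 144
lon = np.linspace(0.0, 2.0 * np.pi, n_space, endpoint=False)
t = np.arange(n_time)

# Synthetic field with a domain-wide oscillation added on purpose.
x = (np.outer(np.sin(2.0 * np.pi * t / 12.0), np.cos(lon))
     + 2.0 * np.sin(2.0 * np.pi * t / 60.0)[:, None]
     + 0.2 * rng.standard_normal((n_time, n_space)))

anom = x - x.mean(axis=0)   # TIME mean removed at each longitude
xt = anom.T                 # transposed: rows are space, columns are time

# Space is now the statistical dimension, so the column means are the
# spatial mean at each time.
coeff, score, space_mean = pca_centered(xt)
recon = score @ coeff.T     # reconstruction without the removed mean
residual = xt - recon       # what the EOFs could not reach

# Each row of the residual equals the spatial-mean time series.
mismatch = np.abs(residual - space_mean).max()
print("residual minus spatial-mean time series:", mismatch)
print("amplitude of the unreachable part:", np.abs(space_mean).max())
```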


5. Do a "Combined EOF" analysis of a vector that combines the two fields (each field must be standardized, since the units are different).
  1. Make a (240x288) array where the 288 values at each time are the 144 standardized field1 values followed by the 144 standardized field2 values.
  2. Run princomp() in the usual way.
  3. Unpack the results at plotting time: the first 144 values are your field1, the others your field2. Rescale with physical units for a better plot.
  4. CEOFs maximize the variance of the combined data, so they indicate related variations between the 2 fields.
Examples: [[file/view/CEOF.slp.uwnd.ps|CEOF.slp.uwnd.ps]] [[file/view/CEOF.sst.slp.ps|CEOF.sst.slp.ps]] [[file/view/CEOF.sst.precip.ps|CEOF.sst.precip.ps]] from code [[file/view/HW5_CEOF_BEM.pro|HW5_CEOF_BEM.pro]]
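
The four steps above can be sketched in NumPy as follows. This is not the linked HW5_CEOF_BEM.pro code. It uses an SVD in place of princomp(), two synthetic fields with made-up amplitudes, and one possible reading of "standardized" (a z-score of each grid point's time series), all of which are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_time, n_space = 240, 144
lon = np.linspace(0.0, 2.0 * np.pi, n_space, endpoint=False)
t = np.arange(n_time)

# Two synthetic fields in very different units that share one time signal.
common = np.sin(2.0 * np.pi * t / 48.0)
field1 = 2.0 * np.outer(common, np.cos(lon)) + 0.5 * rng.standard_normal((n_time, n_space))
field2 = 300.0 * np.outer(common, np.sin(lon)) + 100.0 * rng.standard_normal((n_time, n_space))

def standardize(f):
    """Z-score each grid point's time series; also return the scale for later."""
    std = f.std(axis=0)
    return (f - f.mean(axis=0)) / std, std

z1, std1 = standardize(field1)
z2, std2 = standardize(field2)

# Step 1: one (240 x 288) array, field1 columns first, then field2 columns.
combined = np.hstack([z1, z2])

# Step 2: decomposition of the combined array (SVD standing in for princomp()).
u, s, vt = np.linalg.svd(combined, full_matrices=False)
ceof1 = vt[0]                 # leading combined pattern, length 288
pc1 = u[:, 0] * s[0]          # its time series, length 240

# Step 3: unpack and rescale each half to physical units for plotting.
pattern1 = ceof1[:n_space] * std1
pattern2 = ceof1[n_space:] * std2

print(combined.shape, pattern1.shape, pattern2.shape)
print("variance fraction of CEOF 1:", s[0]**2 / np.sum(s**2))
```

Because both halves share the single time series `pc1`, the two unpacked patterns show how the fields vary together in that mode.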

Leber_HW5_5-1.jpg