Empirical Orthogonal Functions (EOF) homework.

Background reading: Hsieh book chapter 2


1. First, a question about the sense of EOF's:

You have some data(x,t) with space-time structure: 144 space bins (in this case, just longitude), by 240 time bins (months).
You want to decompose it into a set of orthogonal terms that add together to give the total.
Since they are orthogonal, each term represents some variance: cross terms disappear when you average the square of the sum.
If you keep enough terms you will get back all the variance (and more importantly, you can reconstruct the data in all its detail).

In the case of EOF (also known as Principal Components (PC)) analysis, you express your data as:
[[image:space/math/3a322530c7d160da73e6a98417605451.gif align="absmiddle" caption=" {EOF}_1(bold x) {PC}_1(t) + {EOF}_2(bold x) {PC}_2(t) + {EOF}_3(bold x) {PC}_3(t) + ... = mydata(x,t)"]]
  1. How many values (numbers) are in your input data array? 144 x 240 (lon vs t)
  2. How many values (numbers) are needed to build each term on the left? 144 for EOFs, 240 for PCs
  3. If 5 EOFs capture most of the data's variance, how much smaller (in the above sense) is the EOFxPC representation compared to the full data set?
144 x 240 - (144 + 240) x 5


2. Read in your field1 (let's call it x again). Use the same data from HW3 data source here.


Perform and display an EOF analysis of your first field.
  • IDL: here is an example code [[file/view/HW5_EOF_BEM.pro|HW5_EOF_BEM.pro]]
  • IDL results are at the bottom of the page

  • Matlab: x should be (x,t) or (240,144), but in the file it is (144 x 240), so you need to transpose it.
  • Then call [COEFF, SCORE, latent, tsquare] = princomp(x);
    • COEFF is a 144 x 144 array of coefficients ("projection" or "loadings" or "weights" in the x domain)
    • SCORE is a 240 x 144 array of the PC(240) associated with each EOF(144).

<span class="co1">%%% Matlab: here are the key lines. Reconstruction is the test of whether you have things ordered right.
%%% If you can reconstruct the data, you can zero out some of the modes (in a copy of either SCORE or COEFF)
%%% and reconstruct just one mode, or the sum of the first N modes, or whatever.</span>
 
x = <span class="kw2">double</span><span class="br0">(</span>ncload<span class="br0">(</span><span class="co2">'/Users/bem/Downloads/hw3_data.nc'</span>,<span class="co2">'olr'</span><span class="br0">))</span>;
NLONS = <span class="nu0">144</span>;
 
<span class="co1">% EOF analysis only deals with variations (because it works on the covariance matrix).
% Temporal covariance that is. So remove the time mean of each longitude.</span>
 
xp = x.*<span class="nu0">0</span>; <span class="co1">% make array of correct size for x perturbations (pert. from time mean)</span>
<span class="kw1">for</span> ix = <span class="nu0">1</span>:NLONS
    xp<span class="br0">(</span>ix,:<span class="br0">)</span> = x<span class="br0">(</span>ix,:<span class="br0">)</span> - <span class="kw2">mean</span><span class="br0">(</span>x<span class="br0">(</span>ix,:<span class="br0">))</span>;
<span class="kw1">end</span>
 
xp = transpose<span class="br0">(</span>xp<span class="br0">)</span>;
<span class="br0">[</span>COEFF,SCORE,latent,tsquare<span class="br0">]</span> = princomp<span class="br0">(</span> xp <span class="br0">)</span>;
reconXp = SCORE*transpose<span class="br0">(</span>COEFF<span class="br0">)</span>;
<span class="kw2">diff</span> = xp-reconXp;
 
<span class="kw2">figure</span><span class="br0">(</span><span class="nu0">1</span><span class="br0">)</span>
<span class="kw2">contourf</span><span class="br0">(</span>xp<span class="br0">)</span>
<span class="kw2">figure</span><span class="br0">(</span><span class="nu0">2</span><span class="br0">)</span>
<span class="kw2">contourf</span><span class="br0">(</span>reconXp<span class="br0">)</span>
<span class="kw2">figure</span><span class="br0">(</span><span class="nu0">3</span><span class="br0">)</span>
<span class="kw2">contourf</span><span class="br0">(</span><span class="kw2">diff</span><span class="br0">)</span>
 
explained = latent/<span class="kw2">sum</span><span class="br0">(</span>latent<span class="br0">(</span>:<span class="br0">))</span> .*<span class="nu0">100</span>;
explained<span class="br0">(</span><span class="nu0">1</span>:<span class="nu0">10</span><span class="br0">)</span>
 
<span class="co1">% truncate to 1st EOF only to show pure 1st mode variability</span>
SCOREtrunc = SCORE;
SCOREtrunc<span class="br0">(</span>:,<span class="nu0">2</span>:<span class="kw1">end</span><span class="br0">)</span> = <span class="nu0">0</span>; <span class="co1">% zero out modes 2:144</span>
reconXtrunc = SCOREtrunc*transpose<span class="br0">(</span>COEFF<span class="br0">)</span>;
<span class="kw2">figure</span><span class="br0">(</span><span class="nu0">4</span><span class="br0">)</span>
<span class="kw2">contourf</span><span class="br0">(</span>reconXtrunc<span class="br0">)</span> <span class="co1">%%% result below as test: looks good</span>
 
 
OLR.EOF1PC1.Matlab.BEM.png
OLR.EOF1PC1.Matlab.BEM.png




Extra credit/ teach us something new:

  1. Do an EOF analysis of your second field. Display and interpret. Compare and contrast with your first field.
  2. Remove the mean, or don't, or remove a different mean. How are the results affected? Explain the sense of the results.
  3. Standardize the time series at each longitude: this gives eigenvectors and eigenvalues of the correlation matrix rather than the covariance matrix (recall HW3 where you plotted slices of these).
  4. Try doing the computation with x and t transposed. Now the "coefficients" or "eigenvectors" are in time (240) and the "scores" are in space (144).
    1. There is a part of the total spacetime variance that EOF's can't reach if you remove the TIME mean, but then use SPACE as the statistical dimension over which you sum to compute covariances. (Or, for that matter, if you remove the SPACE mean to define anomalies but then perform a TIME covariance analysis). What is that unreachable part of the spacetime variance? (Just look at the difference between the input data and the reconstruction and you will see what I am getting at.)
  5. Do a "Combined EOF" analysis of a vector that combines the two fields (each field must be standardized, since the units are different).
    1. you just make a (240x288) array where the 288 values at each time are the 144 field1 (standardized) and then the 144 field2.
    2. Run princomp() in the usual way
    3. Unpack the results at plotting time: the first 144 values are your field1, the others your field2. Rescale with physical units for a better plot.
    4. CEOFs here maximized the variance of the combined data, so they indicate related variations between the 2 fields.
    • examples: [[file/view/CEOF.sst.slp.ps|CEOF.sst.slp.ps]] [[file/view/CEOF.sst.precip.ps|CEOF.sst.precip.ps]] from code [[file/view/HW5_CEOF_BEM.pro|HW5_CEOF_BEM.pro]]
  6. Read rotatefactors() documentation (Matlab) and learn and teach us all about "rotated EOFs".
    1. Background: you first truncate to the first few EOF/PC pairs, then relax one or more of the orthogonality conditions.
      1. (background of background: EOFs are orthogonal in space, AND PCs are orthogonal in time. Heavy constraint! Either one would allow us to still speak of variance as being cleanly partitioned among modes. Rotated modes may or might or could (?) thus be more physical, since a purely mathematical constraint has been relaxed).
The double orthogonality condition:
[[image:space/math/6ae5df9b48e5c6b2bf39479f51e7b41f.gif align="absmiddle" caption=" int EOF_i(vec x) EOF_j(vec x) d vec x = 0andint PC_i(t) PC_j(t) dt = 0"]]


IDL results for OLR field

[[file/view/HW5_EOF_BEM.ps|HW5_EOF_BEM.ps]]
[[file/view/HW5_EOF_BEM.pro|IDL code HW5_EOF_BEM.pro]]
HW5_EOF_OLRmodes_BEM.png
HW5_EOF_OLRmodes_BEM.png

HW5_EOF_OLRreconstructions_BEM.png
HW5_EOF_OLRreconstructions_BEM.png



Showing that an EOF is an eigenvector of the covariance matrix

(so that the results here are related back to those of Question 6.1 of HW3).

Recall that an eigenvector of a matrix is a vector that, when multiplied by the matrix, returns the same vector (times a constant, the eigenvalue):
[[image:space/math/13db31ddd1f603b4c1c102d17c1165b7.gif align="absmiddle" caption=" matrix M vec x = lambda vec x"]]

EOF1_is_eigenvector_check_olrBEM.png
EOF1_is_eigenvector_check_olrBEM.png


EOF2_is_eigenvector_check_olrBEM.png
EOF2_is_eigenvector_check_olrBEM.png