Proof. We prove the result for vectors; the proof for matrices is identical. By the definition of the mean for random vectors, the $i$th entry of $\mathbb{E}(A\tilde{x} + b)$ equals

\begin{align}
\mathbb{E}(A\tilde{x} + b)[i] &= \mathbb{E}\left( (A\tilde{x} + b)[i] \right) \tag{5}\\
&= \mathbb{E}\left( \sum_{j=1}^{d} A[i,j]\,\tilde{x}[j] + b[i] \right) \tag{6}\\
&= \sum_{j=1}^{d} A[i,j]\,\mathbb{E}(\tilde{x}[j]) + b[i] \quad \text{(by linearity of expectation for scalars)} \tag{7}\\
&= (A\,\mathbb{E}(\tilde{x}) + b)[i]. \tag{8}
\end{align}

We usually estimate the mean of a random vector by computing the sample mean, which equals the vector of sample means of the entries.

Definition 1.4 (Sample mean of multivariate data). Let $X := \{x_1, x_2, \ldots, x_n\}$ denote a set of $d$-dimensional vectors of real-valued data. The sample mean is the entry-wise average

\[
\mu_X := \frac{1}{n} \sum_{i=1}^{n} x_i. \tag{9}
\]

When manipulating a random vector within a probabilistic model, it is often useful to know the variance of linear combinations of its entries, i.e. the variance of the random variable $\langle v, \tilde{x} \rangle$ for some deterministic vector $v \in \mathbb{R}^d$. By linearity of expectation, this is given by

\begin{align}
\operatorname{Var}\left( v^T \tilde{x} \right) &= \mathbb{E}\left( \left( v^T \tilde{x} - \mathbb{E}(v^T \tilde{x}) \right)^2 \right) \tag{10}\\
&= \mathbb{E}\left( \left( v^T c(\tilde{x}) \right)^2 \right) \tag{11}\\
&= v^T\, \mathbb{E}\left( c(\tilde{x})\, c(\tilde{x})^T \right) v, \tag{12}
\end{align}

where $c(\tilde{x}) := \tilde{x} - \mathbb{E}(\tilde{x})$ is the centered random vector. For an example where $d = 2$ and the mean of $\tilde{x}$ is zero we have

\begin{align}
\mathbb{E}\left( c(\tilde{x})\, c(\tilde{x})^T \right) &= \mathbb{E}\left( \tilde{x} \tilde{x}^T \right) \tag{13}\\
&= \mathbb{E}\left( \begin{bmatrix} \tilde{x}[1] \\ \tilde{x}[2] \end{bmatrix} \begin{bmatrix} \tilde{x}[1] & \tilde{x}[2] \end{bmatrix} \right) \tag{14}\\
&= \mathbb{E}\left( \begin{bmatrix} \tilde{x}[1]^2 & \tilde{x}[1]\tilde{x}[2] \\ \tilde{x}[1]\tilde{x}[2] & \tilde{x}[2]^2 \end{bmatrix} \right) \tag{15}\\
&= \begin{bmatrix} \mathbb{E}(\tilde{x}[1]^2) & \mathbb{E}(\tilde{x}[1]\tilde{x}[2]) \\ \mathbb{E}(\tilde{x}[1]\tilde{x}[2]) & \mathbb{E}(\tilde{x}[2]^2) \end{bmatrix} \tag{16}\\
&= \begin{bmatrix} \operatorname{Var}(\tilde{x}[1]) & \operatorname{Cov}(\tilde{x}[1], \tilde{x}[2]) \\ \operatorname{Cov}(\tilde{x}[1], \tilde{x}[2]) & \operatorname{Var}(\tilde{x}[2]) \end{bmatrix}. \tag{17}
\end{align}
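The identity in Eq. (12) is easy to check numerically. Below is a minimal sketch in Python using NumPy; the distribution of $\tilde{x}$ and the choice of $v$ are illustrative assumptions, not part of the notes.

```python
import numpy as np

# Monte Carlo check of Eq. (12): Var(v^T x) = v^T E(c(x) c(x)^T) v.
# The distribution of x and the vector v below are made-up examples.
rng = np.random.default_rng(0)
n, d = 100_000, 2

# Draw n samples of a 2-dimensional random vector with correlated entries.
z = rng.standard_normal((n, d))
A = np.array([[1.0, 0.0],
              [0.8, 0.6]])
x = z @ A.T                      # rows are i.i.d. draws of x

v = np.array([2.0, -1.0])

# Left-hand side: sample variance of the scalar v^T x.
lhs = np.var(x @ v)

# Right-hand side: v^T E(c(x) c(x)^T) v, with the expectation estimated
# by averaging the centered outer products over the sample.
c = x - x.mean(axis=0)
E_ccT = (c.T @ c) / n
rhs = v @ E_ccT @ v

print(lhs, rhs)                  # agree up to floating-point error
```

The two quantities coincide on any fixed sample, which is exactly the algebraic manipulation carried out in Eqs. (10)-(12).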
This motivates defining the covariance matrix of a random vector as follows.

Definition 1.5 (Covariance matrix). The covariance matrix of a $d$-dimensional random vector $\tilde{x}$ is the $d \times d$ matrix

\begin{align}
\Sigma_{\tilde{x}} &:= \mathbb{E}\left( c(\tilde{x})\, c(\tilde{x})^T \right) \tag{18}\\
&= \begin{bmatrix}
\operatorname{Var}(\tilde{x}[1]) & \operatorname{Cov}(\tilde{x}[1], \tilde{x}[2]) & \cdots & \operatorname{Cov}(\tilde{x}[1], \tilde{x}[d]) \\
\operatorname{Cov}(\tilde{x}[1], \tilde{x}[2]) & \operatorname{Var}(\tilde{x}[2]) & \cdots & \operatorname{Cov}(\tilde{x}[2], \tilde{x}[d]) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(\tilde{x}[1], \tilde{x}[d]) & \operatorname{Cov}(\tilde{x}[2], \tilde{x}[d]) & \cdots & \operatorname{Var}(\tilde{x}[d])
\end{bmatrix}, \tag{19}
\end{align}

where $c(\tilde{x}) := \tilde{x} - \mathbb{E}(\tilde{x})$.

The covariance matrix encodes the variance of any linear combination of the entries of a random vector.

Lemma 1.6. For any random vector $\tilde{x}$ with covariance matrix $\Sigma_{\tilde{x}}$, and any vector $v$,

\[
\operatorname{Var}\left( v^T \tilde{x} \right) = v^T \Sigma_{\tilde{x}} v. \tag{20}
\]

Proof. This follows immediately from Eq. (12).

Example 1.7 (Cheese sandwich). A deli in New York is worried about fluctuations in the cost of its signature cheese sandwich. The ingredients of the sandwich are bread, a local cheese, and an imported cheese. The deli models the price in cents per gram of each ingredient as an entry of a three-dimensional random vector $\tilde{x}$, where $\tilde{x}[1]$, $\tilde{x}[2]$, and $\tilde{x}[3]$ represent the prices of the bread, the local cheese, and the imported cheese, respectively. From past data, the deli determines that the covariance matrix of $\tilde{x}$ is

\[
\Sigma_{\tilde{x}} = \begin{bmatrix} 1 & 0.8 & 0 \\ 0.8 & 1 & 0 \\ 0 & 0 & 1.2 \end{bmatrix}. \tag{21}
\]

The deli considers two recipes: one that uses 100 g of bread, 50 g of local cheese, and 50 g of imported cheese, and another that uses 100 g of bread, 100 g of local cheese, and no imported cheese. By Lemma 1.6, the standard deviation of the price of the first recipe equals

\begin{align}
\sigma_{100\tilde{x}[1] + 50\tilde{x}[2] + 50\tilde{x}[3]} &= \sqrt{ \begin{bmatrix} 100 & 50 & 50 \end{bmatrix} \Sigma_{\tilde{x}} \begin{bmatrix} 100 \\ 50 \\ 50 \end{bmatrix} } \tag{22}\\
&\approx 153 \text{ cents}. \tag{23}
\end{align}

The standard deviation of the price of the second recipe equals

\begin{align}
\sigma_{100\tilde{x}[1] + 100\tilde{x}[2]} &= \sqrt{ \begin{bmatrix} 100 & 100 & 0 \end{bmatrix} \Sigma_{\tilde{x}} \begin{bmatrix} 100 \\ 100 \\ 0 \end{bmatrix} } \tag{24}\\
&\approx 190 \text{ cents}. \tag{25}
\end{align}

Even though the price of the imported cheese is more volatile than that of the local cheese, adding it to the recipe lowers the variance of the cost, because its price is uncorrelated with those of the other ingredients.
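The two standard deviations in Example 1.7 can be verified directly from Lemma 1.6. A minimal NumPy sketch (the variable names are ours):

```python
import numpy as np

# Covariance matrix of ingredient prices (cents per gram), Eq. (21).
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.2]])

# Grams of bread, local cheese, and imported cheese in each recipe.
recipe_1 = np.array([100.0, 50.0, 50.0])
recipe_2 = np.array([100.0, 100.0, 0.0])

# Lemma 1.6: Var(v^T x) = v^T Sigma v, so the standard deviation of
# each recipe's cost is sqrt(v^T Sigma v).
for v in (recipe_1, recipe_2):
    print(np.sqrt(v @ Sigma @ v))   # ~153.3 and ~189.7 cents
```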
A natural way to estimate the covariance matrix from data is to compute the sample covariance matrix.

Definition 1.8 (Sample covariance matrix). Let $X := \{x_1, x_2, \ldots, x_n\}$ denote a set of $d$-dimensional vectors of real-valued data. The sample covariance matrix equals

\begin{align}
\Sigma_X &:= \frac{1}{n} \sum_{i=1}^{n} c(x_i)\, c(x_i)^T \tag{26}\\
&= \begin{bmatrix}
\sigma^2_{X[1]} & \sigma_{X[1],X[2]} & \cdots & \sigma_{X[1],X[d]} \\
\sigma_{X[1],X[2]} & \sigma^2_{X[2]} & \cdots & \sigma_{X[2],X[d]} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{X[1],X[d]} & \sigma_{X[2],X[d]} & \cdots & \sigma^2_{X[d]}
\end{bmatrix}, \tag{27}
\end{align}

where $c(x_i) := x_i - \mu_X$ for $1 \le i \le n$, $X[j] := \{x_1[j], \ldots, x_n[j]\}$ for $1 \le j \le d$, $\sigma^2_{X[j]}$ is the sample variance of $X[j]$, and $\sigma_{X[j],X[k]}$ is the sample covariance of the entries of $X[j]$ and $X[k]$.

Example 1.9 (Canadian cities). We consider a dataset containing the locations (latitude and longitude) of major cities in Canada, so $d = 2$ in this case. Figure 1 shows a scatterplot of the data. The sample covariance matrix is

\[
\Sigma_X = \begin{bmatrix} 524.9 & -59.8 \\ -59.8 & 53.7 \end{bmatrix}. \tag{28}
\]

[Figure 1: Canadian cities. Scatterplot of the latitude and longitude of the main 248 cities in Canada.]

The longitudes have much higher variance than the latitudes. Latitude and longitude are negatively correlated because people at higher longitudes (in the east) tend to live at lower latitudes (in the south). The data are available at https://simplemaps.com/data/ca-cities.
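Definition 1.8 translates into a few lines of code. The sketch below uses a handful of made-up (latitude, longitude) pairs rather than the actual dataset, and checks the result against NumPy's np.cov, whose bias=True option selects the same $1/n$ normalization as Eq. (26).

```python
import numpy as np

# Sample covariance matrix per Eq. (26): average of centered outer
# products with a 1/n factor. The toy data below are made up.
X = np.array([[45.4,  -75.7],   # hypothetical (latitude, longitude) rows
              [43.7,  -79.4],
              [49.3, -123.1],
              [53.5, -113.5],
              [46.8,  -71.2]])
n = X.shape[0]

c = X - X.mean(axis=0)          # center each d-dimensional sample
Sigma_X = (c.T @ c) / n         # sum of outer products, divided by n

# np.cov expects variables in rows (hence X.T); bias=True uses the 1/n
# normalization of Definition 1.8 instead of the default 1/(n-1).
assert np.allclose(Sigma_X, np.cov(X.T, bias=True))
print(Sigma_X)
```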
