Problem 1
(Coding Householder Reflections)
We start by coding up Householder reflectors. Write a function reflectMult(A,w) that takes in an
m × n matrix A and maps its columns via the Householder transformation that reflects across the plane
perpendicular to w (do not assume its a unit vector). This function should just change A in place and not
return anything.
Validate your code as follows:
for i in range(50):
A=np.random.rand(100,50)
B=np.copy(A)
reflectMult(B,np.eye(100)[:,i]-B[:,i]/np.linalg.norm(B[:,i]))
print(np.allclose(B[:,i],np.eye(100)[:,i]*np.linalg.norm(A[:,i])))
print(np.allclose(np.linalg.norm(B),np.linalg.norm(A)))
This should print a long string of trues.
Problem 2
(Coding the Householder QR Algorithm)
Create a function myQR(A) which takes in an m × n matrix A and returns an m × n matrix q and n × n
matrix R which consist of the following:
(1) q[:,i] contains the direction for the Householder reflection at stage i of the Householder QR
Algorithm, as used in the function reflectMult above. (Remark: Be careful! You need q[k,i]=0
when k < i because the corresponding Householder transformation should be the identity on for
indices k < i. Also be careful about the sign of the reflected vector.)
(2) The matrix R is Rˆ in the condensed QR decomposition A = QˆRˆ.
You can validate your code as follows. First active the code:
def assembleQ(q):
m,n=q.shape
Q=np.eye(m)
for i in range(n):
reflectMult(Q,q[:,n-1-i])
return Q
This will assemble the orthogonal matrix Q in the full A = QR decomposition. Next, we run the check:
A=np.random.rand(100,50)
q,R=myQR(A)
Q=assembleQ(q)
np.allclose(Q[:,:50]@R,A)
Problem 3
1
(Coding Least Squares)
The final coding part of this Lab consists of creating a least squares solver myLS(A,b) which takes in an
m × n matrix A for m ≥ n and m-dimensional vector b and returns an n-dimensional vector x which is a
solution to the least squares problem Rxˆ = QˆT
b where A = QˆRˆ is the condensed QR decomposition of A.
This code should make use of the efficient solver upperbSub from previous labs. You can validate your code
with calls to:
A=np.random.rand(100,50)
b=np.random.rand(100)
x=myLS(A,b)
np.allclose(x,np.linalg.lstsq(A,b,rcond=None)[0])
Problem 4
(Application: WHO Life Expectancy Data)
We’ll now use our QR decomposition code to create a linear model for average life expectancy by country
based on certain predictors. For this part you will need the files life train.csv and life test.csv from
the “Assignments” section on canvas (where you found this Lab). You can open these in any spreadsheet
app (e.g. Excel or Numbers) to get a sense of the data. We will use life train.csv to train our model,
and life test.csv to test its accuracy. These two files are a random splitting of a cleaned up version
of the WHO data which can be found on Kaggle: https://www.kaggle.com/datasets/kumarajarshi/
life-expectancy-who.
To import this data to your Python session in Colab you will need to upload the two files life train.csv
and life test.csv to your google drive https://drive.google.com/drive/my-drive. Its probably best
to create a new folder, say Regression, and place the files there. To access these files once you are in Colab
first type:
from google.colab import drive
drive.mount(“/content/gdrive/”)
which will involve an activation process (Google will ask if its OK to link your drive to the Python session).
After that type:
cd gdrive/MyDrive/Regression
You may need to replace this with the name you chose for your data folder. To make sure everything is there
type:
ls
and you should see your stuff. If you are having trouble with this part of the Lab please ask questions on
Piazza and Discord to see if a classmate can help.
Once the files are in place and we are in the right directory, we need to import the data to our Python
session and write it to Numpy arrays. To do this run (beware, the underscores may not copy and paste
correcty):
import pandas as pd
from matplotlib import pyplot as plt
df train=pd.read csv(“life train.csv”)