Q1. (40 points) Suppose you are given 7 data points as follows: A = (1.0, 1.0); B = (1.5, 2.0);
C = (3.0, 4.0); D = (5.0, 7.0); E = (3.5, 5.0); F = (4.5, 5.0); and G = (3.5, 4.5).
Manually perform 2 iterations of the K-Means clustering algorithm (slide 22 on clustering) on
this data. You need to show all the steps. Use Euclidean distance (L2 distance) as the distance metric. Assume the number of clusters is k = 2 and the initial two cluster centers
C1 and C2 are B and C, respectively.
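To check a hand computation for Q1, here is a minimal Python sketch. It assumes the standard Lloyd-style K-Means: assign each point to the nearest center by Euclidean distance, then move each center to the mean of its assigned points. It may differ in detail from the version on slide 22. The function name `kmeans`, the `POINTS` dictionary, the lower-index tie-breaking rule, and the rule that an empty cluster keeps its old center are all illustrative assumptions, not part of the assignment.

```python
import math

# Data points from Q1 (illustrative container name)
POINTS = {
    "A": (1.0, 1.0),
    "B": (1.5, 2.0),
    "C": (3.0, 4.0),
    "D": (5.0, 7.0),
    "E": (3.5, 5.0),
    "F": (4.5, 5.0),
    "G": (3.5, 4.5),
}


def kmeans(points, centers, iterations):
    """Run a fixed number of Lloyd-style K-Means iterations.

    Returns one (assignments, centers_after_update) pair per iteration.
    """
    history = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose center is
        # nearest in Euclidean (L2) distance; ties go to the lower index.
        assignments = {}
        for name, p in points.items():
            dists = [math.dist(p, c) for c in centers]
            assignments[name] = dists.index(min(dists))
        # Update step: each center moves to the mean of its member points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [points[n] for n, a in assignments.items() if a == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(old)
        centers = new_centers
        history.append((assignments, centers))
    return history


if __name__ == "__main__":
    # Initial centers: C1 = B, C2 = C
    initial = [POINTS["B"], POINTS["C"]]
    for i, (assign, centers) in enumerate(kmeans(POINTS, initial, 2), start=1):
        clusters = [
            sorted(n for n, a in assign.items() if a == k)
            for k in range(len(centers))
        ]
        print(f"Iteration {i}: clusters={clusters}, centers={centers}")
```

The script only prints the memberships and updated centers after each iteration. The written answer should still show the individual point-to-center distances for each step.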
Q2. (30 points) Please read the following two papers and write a brief summary of the
main points in at most FOUR pages.
Matthew Zook, Solon Barocas, danah boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, Rachelle Hollander, Barbara König, Jacob Metcalf, Arvind
Narayanan, Alondra Nelson, Frank Pasquale: Ten simple rules for responsible big data
research. PLoS Computational Biology 13(3) (2017)
https://www.microsoft.com/en-us/research/wp-content/uploads/2017/10/journal.pcbi_.1005399.pdf
Chelsea Barabas, Madars Virza, Karthik Dinakar, Joichi Ito, Jonathan Zittrain: Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment. Proceedings of Machine Learning Research (PMLR), 81:62-76, 2018
http://proceedings.mlr.press/v81/barabas18a/barabas18a.pdf
Q3. (30 points) Please watch the excellent talk given by Kate Crawford at the NIPS 2017
conference on the topic of “Bias in Data Analysis” and write a brief summary of its main
points in at most FOUR pages.