Instructions

Answer all the questions. None of these questions will be graded.

Practice Problems

Question 1

Read the Harvard Business Review article, Hiring Algorithms are Not Neutral, by Gideon Mann and Cathy O’Neil.

  1. What is the primary ethical issue discussed in the article?

If screening decisions are based solely on an AI algorithm, then the decisions about which candidates should be selected for an interview or hired may be subject to algorithmic bias. What is the proper degree of human involvement versus AI in screening job applications? If the decision is left solely to an AI algorithm, then biases reflected in the training data will be reflected in the candidates selected for an interview or hired. The article states: “If the algorithm learns what a ‘good’ hire looks like based on that kind of biased data, it will make biased hiring decisions. The result is that automatic resume screening software often evaluates job applicants based on subjective criteria, such as one’s name. By latching on to the wrong features, this approach discounts the candidate’s true potential.”

  2. What does the article suggest that HR departments should do to avoid algorithmic bias?

The authors suggest that we should accept that algorithms are imperfect. In addition, they suggest assigning a team to audit key algorithms “… so that they do not perpetuate inequities in businesses and society”, and using multiple algorithms to help limit bias.
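The article does not prescribe how such an audit would be carried out; as a minimal sketch, an audit team might compare the algorithm's selection rates across applicant groups. The column names, the toy data, and the four-fifths threshold below are illustrative assumptions, not part of the article.

```python
# Sketch of one possible audit check: compare selection rates by group.
import pandas as pd

def selection_rates(decisions: pd.DataFrame, group_col: str, selected_col: str) -> pd.Series:
    """Share of applicants in each group that the algorithm selected."""
    return decisions.groupby(group_col)[selected_col].mean()

def disparate_impact_ratio(rates: pd.Series) -> float:
    """Lowest group selection rate divided by the highest (1.0 = parity)."""
    return rates.min() / rates.max()

# Hypothetical screening results: 1 = invited to interview, 0 = rejected.
results = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,   1,   1,   0,   1,   0,   0,   0],
})

rates = selection_rates(results, "group", "selected")
print(rates)
print(f"Disparate impact ratio: {disparate_impact_ratio(rates):.2f}")
# A ratio well below 0.8 (the conventional four-fifths rule) would be
# flagged for human review rather than treated as proof of bias.
```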

Question 2

What type of review would a university ethics board apply to a study of publicly available social media posts of university students’ political beliefs? Discuss any ethical considerations.

A university ethics board would likely treat a study of publicly available social media posts as exempt from full review, since the usual criteria for human subjects’ protection assume that once data is publicly available it poses no new risks. However, there is ample evidence that publicly available data can still pose significant risks to users, especially if it is identifiable.

Question 3

(modified from Exercise 6.2 in Modern Data Science with R) Kathy, a data science student, found two publicly available data sets posted on a website. Both data sets were obtained by scraping two social media sites. Kathy joined the data sets by screen name and was able to create a new data set that included screen name, email address, geographic location, IP (Internet protocol) address, demographic profile, and preferences for relationships. If the email address were removed, would it still be problematic to post this data set on a social media site?

There may be unintended consequences, in terms of user reidentification, that arise from posting data sets. To help minimize possible harm, data scientists should remove not just the screen name but any variable that could be used to reidentify the users.
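As a minimal sketch of what that could look like in practice, the snippet below joins two hypothetical scraped tables on screen name and then drops the direct and quasi-identifiers before sharing. The column names and the choice of which columns count as identifiers are assumptions for illustration, not a fixed rule.

```python
# Sketch: join on screen name, then drop identifiers and quasi-identifiers.
import pandas as pd

# Hypothetical scraped data sets joined on screen name, as in the exercise.
site_a = pd.DataFrame({
    "screen_name": ["kat42", "rob99"],
    "email": ["kat@example.com", "rob@example.com"],
    "ip_address": ["203.0.113.5", "198.51.100.7"],
})
site_b = pd.DataFrame({
    "screen_name": ["kat42", "rob99"],
    "location": ["Toronto", "Boston"],
    "relationship_pref": ["long-term", "casual"],
})

joined = site_a.merge(site_b, on="screen_name")

# Dropping only the email address still leaves the screen name, IP address,
# and location, any of which may allow reidentification.
identifiers = ["screen_name", "email", "ip_address", "location"]
deidentified = joined.drop(columns=identifiers)

print(deidentified)  # only the non-identifying preference column remains
```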

Question 4

Read the section on Responsibilities to Research Subjects in the American Statistical Association’s Ethical Guidelines for Statistical Practice. Do these guidelines suggest that the data scientists at Cambridge Analytica were unethical when they linked the Facebook profiles to other databases to target Facebook users (see the NYT story or search for other recent articles)? Explain. (NB: A question of this nature would not appear on the final exam.)

It’s not clear that the psychologist Aleksandr Kogan broke Facebook’s (FB) policy (see the CNN article). If he didn’t break FB’s policy, then linking the data with other available data and using it to send ads to Facebook users doesn’t seem to violate any laws. On the other hand, if he did break FB’s policy, then the data was not approved for secondary and indirect use, as called for in item 5 of the Responsibilities to Research Subjects, which states that the ethical statistician:

“Anticipates and solicits approval for secondary and indirect uses of the data, including linkage to other data sets, when obtaining approvals from research subjects, and obtains approvals appropriate to allow for peer review and independent replication of analyses.”