Ethics and Social Issues in Data Science
- I am basing this on Sara Baase's A Gift of Fire
- Ethics is a branch of Philosophy
- It involves the concepts of right and wrong
- And how to conduct yourself in a moral way.
- Baase: "What it means to do the right thing"
- This is tough; there are many different ethical systems.
- A few of the systems Baase discusses:
- Utilitarianism
- An action can be judged by how much it increases the utility of the people involved.
- Do what is best for society.
- Judge actions by the results they produce.
- Associated with John Stuart Mill.
- Deontology
- There is a set of fundamental moral laws.
- You must follow these laws, even if it leads to harm.
- Judge actions by how they follow the rules.
- Associated with Immanuel Kant.
- The golden rule (found in the Bible and in Confucius)
- Treat others as we wish them to treat us.
- There are many others.
- Since ethical systems are somewhat abstract, professional codes of ethics have been developed.
- These tend to be specialized for a discipline.
- Discuss responsibilities when interacting with clients, customers, peers, employers, colleagues, employees and society in general.
- They are designed to help guide you in decision making as it relates to work.
- A look at the ACM Code of Ethics and Professional Conduct
- The ACM is a well-established computing society.
- Three parts:
- General Principles
- Professional Responsibilities
- Leadership Principles
- Under each part is a set of statements
- Each is followed by a brief discussion.
- General principles
- Contribute to society and human well-being
- Avoid harm
- Be Honest
- Be fair and do not discriminate
- Respect intellectual property
- Respect privacy
- Honor confidentiality
- Professional responsibilities
- Strive for high quality in process and results.
- Maintain high standards of professional competence, conduct and ethical practices.
- Know and respect rules (laws) pertaining to professional work.
- Accept and provide appropriate professional review.
- Give comprehensive and thorough evaluations of computer systems and their impacts, including analysis of possible risks.
- Perform work only in areas of competence.
- Foster awareness and understanding of computing, technologies and their consequences.
- Access computing and communication resources only when authorized or when compelled by the public good.
- Design and implement systems that are robustly and usably secure.
- There is no professional data science society right now.
- The field is too young.
- There are several drafts of codes of ethics out there.
- And several groups working on them.
- A data science code of ethics will probably be close to the ACM code.
- But it will probably include
- Stronger language about privacy of data.
- Stronger language related to biases built into data.
- Language about transparency and reproducibility of research and methods.
- Language about accuracy.
- Language about making decisions based upon models.
- Some Illustrative Cases.
- The Filter Bubble
- This was "discovered" by Eli Pariser.
- See his TED Talk, or read his book The Filter Bubble.
- Story 1
- Two of his friends received radically different results on searches.
- One was conservative, the other liberal.
- When searching for "BP":
- One received information about stock, investment, ...
- The other received information about the recent oil spill.
- Story 2
- He noticed on Facebook that his feed had dropped his conservative friends.
- He was only seeing information from his liberal friends.
- The filter bubble is the personal ecosystem of information that's been created by the algorithms that customize searches for individual users.
- A step back
- You know that most search engines maintain a history of the searches you have performed and the sites you visited.
- They use this information to decide "what you want to see" (a toy sketch of this kind of personalization appears at the end of this case).
- He claims that "there is no standard Google any longer."
- You will receive the results "you are interested in."
- And this is not just Google.
- Facebook, Amazon, Netflix, ...
- News services (Google News, but also paid news sites)
- The internet is showing what we want to see, not what we need to see.
- He claims that we are moving from an era where "human editors" controlled the news we watched (most of you are too young for this)
- To an era where algorithms (machine learning, big data) control what we see.
- As a result, we are only receiving the news "we want", and not the news "we need."
- And we are building digital tribes.
- What to do
- There are ways to turn filters off.
- There are ways to defeat the filters
- Seek out and click on news stories that don't match your views.
- Seek out sites that don't filter (e.g., allsides.com).
- There is apparently some correction going on.
- In some spheres, companies are attempting to reduce filtering in news feeds.
- But not in marketing.
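- As promised above, here is a toy sketch of history-based personalization. Everything in it (the scoring rule, the topics, the data) is invented for illustration; real systems use far more signals and proprietary ranking models.

```python
from collections import Counter

# Candidate results for the query "BP", each tagged with a topic.
RESULTS = [
    ("BP investor relations and stock price", "finance"),
    ("BP quarterly earnings report", "finance"),
    ("Deepwater Horizon oil spill coverage", "environment"),
    ("Gulf coast cleanup after the BP spill", "environment"),
]

def personalized_ranking(results, click_history):
    """Rank results by a base score plus a bonus for topics the
    user has clicked before. The bonus is what builds the bubble."""
    topic_clicks = Counter(click_history)
    return sorted(results,
                  key=lambda item: 1 + topic_clicks[item[1]],
                  reverse=True)

# Two users issue the same query but have different histories.
investor = ["finance", "finance", "finance"]
activist = ["environment", "environment"]

print(personalized_ranking(RESULTS, investor)[0][0])  # a finance story
print(personalized_ranking(RESULTS, activist)[0][0])  # a spill story
```

- The same query yields a different "top" result for each user, which is exactly Pariser's BP story.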
- COMPAS
- Correctional Offender Management Profiling for Alternative Sanctions (COMPAS)
- Software designed to predict the likelihood that a criminal will commit another crime.
- Based on an algorithm that uses 137 features for each person
- Race is not a feature.
- But apparently there are other features that can be correlated with race.
- In multiple studies this software has been found to mispredict recidivism rates.
- Black defendants were overpredicted to recommit a crime.
- And therefore were more likely to be denied parole or sentenced more harshly.
- White defendants were underpredicted to recommit a crime.
- A ProPublica study of 10,000 criminal defendants in Broward County, FL (look here for details).
- Black defendants who did not reoffend had been predicted to recidivate at a rate of 45%.
- The corresponding rate for white defendants was 23% (a sketch of this false positive rate computation appears after this case).
- The manufacturer, as well as at least one study ("The Age of Secrecy and Unfairness in Recidivism Prediction"), disagrees with these findings.
- The study finds that they are incorrectly using age as a predictor.
- For the study data, African-Americans were more likely to commit a crime at a younger age.
- And they believe that this causes the problem in COMPAS.
- What is wrong?
- Errors in data collection and entry cause people to be misclassified.
- The algorithms used are proprietary and therefore hidden.
- For the most part, we have no idea how this software is making its predictions.
- There is no way for a person to face their accuser.
- We may not even know how the algorithms are making these predictions.
- Many machine learning techniques do not provide any justification for their decisions.
- In fact, in some ways we don't know how they make the decisions.
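- As promised above, here is a minimal sketch of the false positive rate computation behind figures like ProPublica's 45% vs. 23%. The labels and outcomes below are invented for illustration; they are not the Broward County data.

```python
def false_positive_rate(labeled_high_risk, reoffended):
    """Among people who did NOT reoffend, the fraction the model
    still labeled high risk (i.e., predicted to recidivate)."""
    fp = sum(1 for label, outcome in zip(labeled_high_risk, reoffended)
             if label and not outcome)
    negatives = sum(1 for outcome in reoffended if not outcome)
    return fp / negatives if negatives else 0.0

# Hypothetical model labels (1 = high risk) and actual outcomes
# (1 = reoffended) for ten defendants in one group.
labels   = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]

print(f"false positive rate = {false_positive_rate(labels, outcomes):.0%}")
```

- Computing this rate separately for Black and white defendants, as ProPublica did, is what exposes the disparity.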
- College Ratings
- This case is discussed in Weapons of Math Destruction by Cathy O'Neil.
- US News and World Report College Rankings.
- Uses a model to predict the "quality" of a college or university.
- Includes 15 measurements of quality.
- Currently listed here
- But has changed over time.
- Currently 20% is based upon Undergraduate Academic Reputation.
- This is based upon a survey of deans, provosts and presidents at peer institutions.
- Other factors include:
- Alumni giving (5%)
- Financial resources (10%)
- Class size, faculty salary, faculty with terminal degrees, ... (20%)
- This seems OK, but:
- The 20% reputation vote is equivalent to a "popularity contest"
- There have been multiple studies that point out that most of those surveyed don't know enough about the peer institutions to correctly answer the survey.
- There have been "campaigns" to raise the ratings in the past. (Advertisement to Deans, Provosts and Presidents of other schools)
- Cheating is a problem
- At least in the past, a major portion of the data was self-reported.
- And schools misreported the data.
- Or tried to fake the data.
- At one point SAT scores were part of the ranking.
- And schools paid students to retake the SAT after they were admitted to drive up this score.
- In the past the acceptance rate was a factor.
- And schools would inflate applications to decrease acceptance rate.
- Some schools would even reject the most qualified applicants.
- The reasoning went that these students used the school as a "back up" school.
- They weren't going to attend anyway.
- So rejecting them would decrease the acceptance rate at no cost to the school.
- And there is really no direct measure of student success
- So spending money to improve facilities would help in the ratings.
- But does this help with student success?
- It helps with the 10% for financial resources.
- O'Neil points out that this actually led to huge tuition increases at many schools.
- Note that even now, cost is not a factor in the rating.
- But Faculty salary (7%)
- Class Size (8%)
- And other measures relating to the student-to-faculty ratio (5%)
- All contribute.
- The good:
- The model is transparent; a toy version of the weighted score is sketched below.
- But the data is a real problem.
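- Because the model is transparent, a toy version is easy to write down. The weights below echo the percentages mentioned above; the "other" bucket and all of the institution scores are invented for illustration and are not the actual US News formula.

```python
# Weights echoing the percentages cited in these notes; "other" is
# a placeholder for the remaining published factors.
WEIGHTS = {
    "reputation": 0.20,          # peer survey of deans/provosts/presidents
    "faculty_resources": 0.20,   # class size, salary, terminal degrees, ...
    "financial_resources": 0.10,
    "alumni_giving": 0.05,
    "other": 0.45,
}

def ranking_score(measures):
    """Weighted sum of quality measures, each on a 0-100 scale."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[k] * measures[k] for k in WEIGHTS)

# A hypothetical school. Spending on facilities raises the
# financial_resources measure, and hence the score, whether or
# not it improves student success. That is O'Neil's point.
school = {"reputation": 60, "faculty_resources": 70,
          "financial_resources": 50, "alumni_giving": 40,
          "other": 65}
print(f"{ranking_score(school):.2f}")   # 62.25

school["financial_resources"] = 90      # more spending
print(f"{ranking_score(school):.2f}")   # 66.25
```

- Note that cost and direct measures of student success appear nowhere in the weights, so the model rewards spending rather than outcomes.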