Global Ethics Forum: The Ethics of Big Data with danah boyd

– Welcome to Ethics Matter. I’m Stephanie Sy and our
guest today is danah boyd, a principal researcher at
Microsoft Research and the founder and president
of Data & Society, a research institute
focused on understanding the role of data-driven
technology in our society, technologies used by
companies like Facebook and Google, and soon
perhaps your employer, your government, perhaps
your police department. She has degrees in
computer science as well as a Ph.D. in information, and has
written two books on the intersection of
society, culture, and internet technologies. Danah is here to help us
understand the limitations and the risks of big
data, algorithms, and machine learning,
and hopefully, danah, we will be able to define
some of these terms as we get into the conversation. Welcome to the Carnegie Council. – Thanks for having me. – Are we in the midst of
a technological upheaval that is drastically
changing society? Is that why you started
your research center? – For me, it’s not so much
that we are in the midst of something that is changing, it’s more that there’s
a moment where we’re suddenly paying attention
to a set of issues that have actually been going
on for a very long time. When you step back, you
actually can see patterns over a longer period, but
we’re in a moment where everybody is paying attention. It’s a phenomenon, people
want to understand it. And there’s nothing like
that moment of phenomenon for people to get
obsessed with hype, imagine all of the
great things it will do, and also simultaneously
be terrified. So a lot of what I’m interested
in with Data & Society is to make certain
we can ground it, and sort of step back and
say: Okay, what is real? What are the ramifications
in light of a whole set of other social dynamics? and really try to make
certain that we’re more informed in our approach to
a lot of these technologies. – A phrase that I hear
thrown around a lot in the last few years is big data, and I’m sure that is
something your research center looks into. How big has big data gotten, and how did we get here? – I joke that big data
often has nothing to do with bigness and rarely
has anything to do with data, but it’s in many
ways the mythology that if we just collect more
information about more people that we can
do magical things, that we can solve
problems that have been historically intractable. And I think that’s actually
where we get ourselves into trouble. There are a lot
of techniques and technologies that are
actually doing data analytics across
large swaths of data, and some of the most
sophisticated have nothing to do with people:
astronomy data, for example, pretty amazing; what we’re seeing
in terms of genetic analysis, unbelievable. But a lot of what we talk
about when we talk about big data is the idea that
companies like Facebook have tremendous
information about you and your practices and
what you’re doing, and so they’re trying
to understand patterns. So a lot of what it
becomes synonymous with is the idea of prediction,
the idea that we could just take this data and
predict something about you. The question is: Should we
be doing those predictions? Who is manipulating that data, and what are the
ramifications there? The other thing about big
data is that it has become collectively synonymous
with artificial intelligence, which
is our other term. – We are going to get into
artificial intelligence, but can you give us a
broad definition of what you mean by big data? You brought up some of the
ways data is collected, but when we talk about big data, what are we actually
referring to? – From my perspective, big
data is actually about a phenomenon, it’s not
actually about something that is a collection of
large swaths of data. It’s about a set of
business practices, a set of technologies, and
a set of beliefs of what we can do with a huge
amount of information about people and
their practices. – Part of that is
algorithms and how big data is used by algorithms
to make decisions. – That’s been a lot of the
interesting transition, right, which is, one, the
phenomenon has been what do we do with all
the analytics or the information we have? How do we analyze it? Often we get into
conversations then about machine learning. Machine learning is usually
the next translation. So at that moment we can
take all of this and not just do run-of-the-mill
business analytics or statistical processing, but say, How do we actually analyze
this data for prediction? A lot of machine learning
algorithms are to cluster or to predict, to make
specific decision-making processes available. That actually is one of
the reasons why you have to connect it to
artificial intelligence because big data became almost
synonymous with Big Brother, with big surveillance, and so that became a term
that has been deprecated by a lot of different
communities and been replaced with
artificial intelligence, where we actually
meant many of the same things, large amounts
of data analytics, the ability to do
sophisticated machine learning, and more and more
advances in machine learning. – The way I think most of
us typically confront and use these technologies is
every time we go on Google or Facebook. What are other ways and
other examples of how big data and machine learning
and algorithms are impacting our lives today? – Within a core
technology universe, think of any time you are
given a recommendation: What movies should you watch? What things you
should purchase next? What news articles
should you read next? Those are all part
of this ecosystem. But, of course, it goes
beyond the world of information technologies. It’s also starting to
shape things like medicine. How do we start to
understand your cancer? We also see this in
environments like criminal justice, which is where
it can actually be a much more problematic environment. – Let’s stop there,
with criminal justice. That is an area in which
algorithms are being applied that some say
is ethically concerning. Let’s parse that out
a little bit more. What types of risks are
there to using machine learning in criminal justice? – The biggest challenge
with criminal justice is that the public does not
actually agree on the role of criminal justice. Is criminal justice
intended to punish somebody? Is it intended to prevent
them from committing future crimes? Is it intended to rehabilitate? What is the actual role of that? That’s part one, because
actually it starts to shape the kinds of
data that we collect. We collect different data
depending on what we view the role of criminal
justice to be. Next, we have a whole set
of biases in how we’ve actually deployed criminal
justice practices, how we’ve engaged with policing. Our policing structures
have long been biased along axes like race, gender, income level, communities. – In theory, wouldn’t
making machines more part of the process make
it more neutral, make it less biased? – It depends on what
data it’s using. The challenge here is what
most of those systems are designed to do is to say,
Let me learn based on what I’ve known in the
past to make decisions about what should
happen in the future. – Give me an example of that. – For example, when
you’re trying to do a predictive-policing algorithm, you’re trying to say,
Where have there been criminal activities in the past? Let me send, then, law
enforcement to those sites where there is a higher
likelihood of criminal behavior. Where has there been
activity in the past? It’s the places where
police have chosen to spend time; it’s the people
that they’ve chosen to arrest. – And that might be based
on their personal biases. – Or a whole set
of other values. For example, if you
look at drug arrests, we actually know that the
drug data in the United States is that whites are far
more likely to consume and to sell drugs. Yet, when we look at the
arrest records of the United States, we
overwhelmingly arrest blacks for both consumption
and sales of drugs. As a result, when you’re
trying to predict who is most likely to engage in
criminal activity around drugs, your algorithms are
going to say, Oh, well, actually it seems to be
mostly something that occurs with black and
African American individuals. That’s not true. That’s based on flawed data. That’s the problem
in criminal justice. Could we design a system
to be more responsible? Certainly. But it all depends on the data. The problem with
machine-learning algorithms or big data or
artificial intelligence is that when the data is
flawed we are going to not only pipe that flawed
bias through the system, but we’re going to amplify it. The result is that we
increase the likelihood that we are reproducing more
and more biases in our data. – How do companies like
Facebook and Google use machine learning and
algorithms, for example, to in their case optimize
their bottom line? How do they account for
values such as democracy and privacy and free speech? – Take something
like a search engine. That’s probably the
easiest example to make sense of. When you put in a
search term like cats, what you might want to
get out of it is the Broadway show. What I might want to get
out of it is pictures of soft, fuzzy things. Part of it is the system
is trying to figure out, it’s trying to make a
good prediction of what, based on knowing about you, you actually meant by
that very vague term. The result is that the
data is used to start personalizing the
search queries. The result is that
you search for cats, you get the Broadway
show because we all know you love Broadway; I, who
have clearly watched way too many cat videos, I’m
getting lots of fuzzy animals, and that feels
all nice and fine. But what happens when
I decide to figure out about, say, a
political candidate? I want to search for the
current mayoral candidates in my home city. What is the information
that I should receive? I have a clear history
of watching a particular segment of news. Let’s say I regularly
watch Fox News. Should I receive the
Fox News link to the information about that
candidate as the first thing? Or should I receive,
for example, a New York Times response to it? The challenge with those
is those are two different social views on a
political candidate. What Google is trying to
do for its bottom line is to try to give you the
information it believes you want the most. That’s because it makes
certain that you come back and are a return customer. It fulfills your goals,
you are more likely to spend time in its services
and therefore click on its advertisements, et
cetera, et cetera. – This goes into that whole idea of confirmation bias,
that what people want in general is for their
views to be confirmed. – And what they want is
to feel like they have control over the information
that they’re receiving. So the result is that
combination of their perception that they
have control with their perception that they’re
getting what they want is what makes them commit to
that particular technology. This is the funny thing. People actually want to be
given the information that confirms their worldview. They want the things that
actually make them feel comfortable. It’s hard work to deal
with things that are contradictory; it’s hard work
to tease out information. People generally want things
that are much more simple. What’s challenging for me
is that as a society we’re so obsessed with us
individually, our choices, our opportunities, what
will let us feel good, that we’re not able to
think holistically about what is healthy for society. That is a challenge
at every level. We live in an
individualistic society, and even though we can use
technology to connect with people, we use it to
magnify our relationships with people that we like,
that we’re interested in, who share our values. – There’s the
magnification part, and I also want to talk
about the manipulation part. In this past election,
American intelligence believes that there was
intervention by a foreign power, specifically by Russia. There is a sense that
there was a manipulation of social media and
other search platforms. The stakes are high in
all the ways you describe them, but even to
the point that on a geopolitical scale that’s
how high the stakes are. Was that a wake-up call? – I think it’s become a
wake-up call for many. I think it’s got a long history. Let’s just take the core
data architecture that is built into our
Constitution: the census. The census is what allows
us every 10 years to count the population in the
United States and then to make decisions how we
reapportion the population and how we distribute
a lot of resources. Since 1790 when we started
actually doing this, people have manipulated
that data source. They’ve manipulated it
for all sorts of political gains, they’ve manipulated
the outputs of it for all sorts of gerrymandering,
they’ve tried to mess with it. Voting records? No different. We have a long history of
trying to mess with voter registration. That manipulation is not
just by external actors, there’s also manipulation
within our own state. Nowhere is that clearer
than the history of Jim Crow and what we’ve done
around a huge amount of racism in the United States. Here we are walking into a
2016 election with a long history of every data
type being messed with for economic gain, for
political ideology, for fun and games, for
foreign adversarial attacks. Of course people tried to
mess with this election, they always have. The question is, what was
different about this one, and how did it play out? – Okay. What was different? – For me, what I saw again
was that we started to see technologies be part
of the equation, and they started being
part of the equation on multiple fronts. On one hand, there was the
moment of using technology to manipulate the media,
and that perhaps is the one that is often
most challenging. – How was the media manipulated? – Any journalist knows
that you get phone calls trying to get you to sell
their product effectively or to tell their story
or their opinion from the White House or whatever
variation of it. Journalists have long
dealt with very powerful actors trying to
manipulate them directly. What they are less
familiar with is a world of things that look
organic designed to manipulate them. Let’s talk about some
concrete examples. When you have
decentralized populations who are running campaigns
to get content onto Twitter to make it look natural, to produce sock
puppets, basically fake accounts on Twitter, to
then write out to you as a journalist and be
like, “Hey, you know, what’s going on with
this Pizzagate thing?” And all of a sudden, you
as a journalist are like, What is happening? Somebody in the public
has given me a tip. I need to pay attention. Except it wasn’t just somebody in the public;
it’s somebody who is intending to get a message
to you very much designed to send you down a
particular track. That’s when we started to see
massive coordinated efforts. These efforts had been
happening for social media marketing for the
better part of a decade, but we started to see it
really turn political. The interesting thing is
the political coordination of it, at least that I got
to witness, was, first, not foreign actors, it was
people who were messing with systems. I watched this pattern
with young people for the better part of 10 years. – So it was trolls and people
who were just having fun? – It started out that way. Years ago there was
nothing funnier than to get Oprah Winfrey to say
inappropriate things on TV. It was great. I watched teenagers build
these skills in order to get Oprah to say
something ludicrous. And they learned how to do this. That’s a skill that is
interesting when you start to think of how it
can be mobilized. Then we had a situation
about four or five years ago where we have a lot
of very misogynistic practices happening
through technology. New techniques,
things like doxing, the idea of finding
somebody’s full information so that you can
actually cause them harm. An example of causing them
harm would be something like swatting, which is the
idea that I would call up 911 and say that there’s
a bomb in your house. The result is that the
police would send out a SWAT team (swatting) to
your house, cordon it off, looking for the bomb. But it was a hoax, it
was not actually real. These were things that
were done to start attacking a group of
women in a whole set of phenomena known as Gamergate. These were moments when
these same networks started to take a more problematic turn. They started actually
doing things that did a lot more harm to people. These are the cornerstones
of a lot of groups who began then trying to mess with
journalists for the election. In the beginning, it
was pure spectacle. It was hysterical to watch
during the Republican primaries this weird candidate, who for all intents and
purposes was a reality TV show star, be such a fun
game to mess with because you get the journalist
to obsess over him. – It feels scary to
hear you talk about this because it feels like
we have surrendered our control entirely to these
anonymous people that have figured out how to utilize
these technologies to manipulate societies,
governments, democracy, voters, journalists, every
aspect of society that we could talk about that is
dependent now on social media and online technologies. – But that’s been true
of every technology throughout history. – Has it?
– Yes. That was the story of film. Look at the story of film
and propaganda and the anxieties in the 1930s
that we had because we thought it was a fascist media. We’ve had these turns and
we’ve had these moments where we had wake-up calls. What we’re in the middle
of right now is a serious wake-up call. And the question is what
we’re going to build in response to it. Also, are we going to be
able to address some of the root problems that
are actually made visible during these moments, root
problems of serious racism? That is not new in this country, but for so many people
the Obama years meant, Oh, we’re past that. It’s like, no. We’re not even close
to being past that. Or these moments where we
actually have to deal with destabilized identities. We have a large number
of people in this country, especially young
people, who don’t feel secure in who they are
or where they’re going. They are so ripe
for radicalization, and that is extremely scary. We, again, have seen
this throughout history. How do we get ahead
of that and say: Whoa. It’s not just about who
is engaged currently in horrible, nefarious,
racist acts, but also who has the
potential to be where we have a moment we can
actually turn them? I think that’s where we
should be starting to be responsible about our actions. When we think about
the morality of these technologies, it’s not
just about thinking about the technologies, but
the people as they’re interfacing with them. – I agree that we can
point to different technologies throughout time, even dating back to
the printing press, as being sort of periods of, I think you’ve called it
moral panic in your writings. But that brings me to
artificial intelligence and the new dimension and
the new risks and worries that we’re hearing
about with AI. First of all, give me your
sixth-grader definition of AI, and then let’s talk
about how that maybe changes the game a little bit. – I think that what AI
currently means is not the technical definition. It’s actually about a set
of business and social processes where we’re
going to take large quantities of information,
and we’re going to use it to train a decision-making
algorithm to then produce results that we then go
and use in different ways. – Okay. And eventually that
machine will be trained to in some ways think on its own, make decisions based on
huge amounts of data, machine learning. AI is sort of that next level. – It’s not think on their
own in the way that we as humans think about thinking. It’s about going beyond
procedural decision making. Basically, it’s training an
algorithm to design better algorithms for the
broader system. But the values are still there
the whole way through. The idea that the machines
will suddenly wake up and start thinking, that
is not the case. It’s more that they will
no longer just do exactly what they’re told, they’ll
be designed to iterate themselves. – But doesn’t that
definition surrender part of a human being’s ability
to control that machine? – Part of why we have
always designed machines is to scale up our capacities. I can count pretty high,
but a machine is going to be able to count a lot
higher and a lot faster. I can certainly
divide perfectly fine, but a machine is going to be
able to divide a lot faster. Those are those
moments of scale. What you want is for
technologies to be designed in ways that
actually allow us to do things for which we simply
don’t have the capacity. Think about something
in a medical realm, detection of cancer. We have to use tools
to detect cancer. We have used a ton of
tools throughout the history of medicine. We have the ability
to use more and more sophisticated tools
based on data, artificial intelligence systems, to be able to detect
cancer faster, and over the next 10
years we’re going to see phenomenal
advancements with this. Those are the moments
where I get very excited because that is leveling
up a capacity that we don’t have. It’s also about pattern
matching in other contexts. I’ll give you one. I’m on the board of
Crisis Text Line, which is this amazing
service where we counsel young people and adults,
but primarily young people through text messaging
with trained counselors. We use a lot of
technologies to augment the capabilities of
those counselors. A counselor may have
had a hundred sessions, they have experienced
those sessions, and they use their past
knowledge to decide then how to interact with
whoever is coming in their text message stream. But what does it mean to
use technology for that counselor to learn from
the best practices of thousands of counselors
and for the technology to sit in a relationship to
her and say: Guess what? I’ve seen this pattern before. Maybe you want to ask if this
is a disordered eating issue. And that actually is
that augmentation. – That’s terrific, and
obviously there are a lot of positive uses of AI. Let’s talk about in the
context of our previous conversations, again, that
idea that every time there is a new technology,
society must reckon with how it expresses its values, and whether you feel like
artificial intelligence presents yet another
challenge to what we’ve already been talking about
here in the deployment of algorithms and machine learning. – I think it presents a
challenge for the same reasons that I’m
talking about it in these positive dimensions. Just because it can scale
positive doesn’t mean it will only scale positive. It will also scale negative. How do we grapple with that? Again, I like history because
it’s a grounding force. Watching the history of
scientists around nuclear energy is a really
good reminder of this. They saw the potential
for a much more environmentally friendly
ability to achieve energy at a significant level. Of course, we also know
where that technology got deployed in much
more horrifying ways. – Are you optimistic that
there can be a regime that can grapple with these
issues and hold the different players to
account in ways that we saw with nuclear technology? Are you optimistic about that? – Yes and no, and I say
that because I think we’ve done a decent enough
job on nuclear. We’re still contending
with it massively. We’ve done a lousy
job on climate. We have the data on that,
and we can’t get our political processes
together to actually do anything about it. So I can see both possibilities. I think there are a lot of
really good interventions or efforts being made. I think there are a lot
of attempts to build out tools to understand what’s
going on with the system. The question for me is,
it’s not actually about the technology or
about the systems; it’s about us as agented
actors in the society, and what will it take for
us to mobilize to do the hard political work? It’s not clear to me. I can see us getting there, but I would have thought
we would be a lot further on climate today than we are. That’s the challenge. – Danah boyd, thank you so much. Fascinating insights. – Thank you. (upbeat eletronic music) – [Announcer] For more
on this program and other Carnegie Ethics
Studio productions, visit There you can find video,
highlights, transcripts, audio recordings, and
other multi-media resources on Global Ethics. This program was
made possible by the Carnegie Ethics Studio
and viewers like you.

Danny Hutson
