Drinking the Big Data Kool-Aid


Electrical conduits are installed overhead in a server room in New York. (AP Photo/Mark Lennihan)


One of the terms that has gotten a lot of play in the media's coverage of the NSA surveillance program is "big data." It's a relatively new term for data sets so large that they become hard to process and analyze. The data it covers is the digital trail we leave behind: emails, cellphone calls, credit card purchases, Google searches, tweets, Facebook status updates and more. The list goes on and on.

In Big Data: A Revolution That Will Transform How We Live, Work, and Think, published earlier this year, authors Viktor Mayer-Schönberger and Kenneth Cukier try to explain just how much data there is in big data. They write that "in 2013 the amount of stored information in the world is estimated to be around 1,200 exabytes, of which less than 2 percent is non-digital."

What exactly is an exabyte, you might ask? They continue:

There is no good way to think about what this size of data means. If it were all printed in books, they would cover the entire surface of the United States some 52 layers thick. If it were placed on CD-ROMs and stacked up, they would stretch to the moon in five separate piles. In the third century B.C., as Ptolemy II of Egypt strove to store a copy of every written work, the great Library of Alexandria represented the sum of all knowledge in the world. The digital deluge now sweeping the globe is the equivalent of giving every person living on Earth today 320 times as much information as is estimated to have been stored in the Library of Alexandria.

Hundreds, probably thousands, of projects using that data to improve the way the world works are already underway. A recent New York Times article profiled Mayor Bloomberg's geek squad and the ways they are using data to solve problems around New York City.

In an article in The New Yorker entitled "Steamrolled by Big Data," Gary Marcus notes that the enthusiasm surrounding big data in tech circles is "kind of a new religion."

You're probably most familiar with the benefits of big data through the recommendation engines on shopping or movie sites, which suggest what you might like based on what people like you have liked. Lawrence Lessig told Bill this week that he doesn't mind seeing ads curated for him: "The purpose of that profiling is to narrow the information … pushed into my sphere to that information which I want."

The New Republic's Leon Wieseltier agrees with Lessig, with one fairly big caveat. He writes, "[T]he study of the consumer is one of capitalism's oldest techniques. But it is not fine that the consumer is mistaken for the entirety of the person."

The biggest complaint about big data is that while it's good at revealing correlation, it's not so good at establishing causation. That concerns many experts, who worry about how the government is vetting the data it has collected and whether it is using that data to predict future criminal behavior, in a sort of Minority Report nightmare scenario.

Regardless of how the government is making use of big data, this week's revelations have already started a debate about personal digital data, privacy and policy that should have happened years ago. That, at least, is good news. As Chris Hughes writes in the New Republic:

Technology may continue to grow and become more complex, but that need not preclude debate — and potentially legislation — about how it can and should be used.

The security and privacy crises that have unfolded over the past week are the perfect moment for us to ask ourselves what public policy we should adopt not only to limit the government’s ability to mine data, but the ability of technological systems to store and process this data in the first place.

  • Strawman411

    “Find out just what any people will quietly submit to and you have the exact measure of the injustice and wrong which will be imposed on them.” ~ Frederick Douglass

  • Kenneth Rubenstein

    As Bill implies, the NSA thing is just the icing on a cake we've been eating more of each year for at least a couple of decades now. It's been at least a year since some digital God informed us that we no longer had any privacy, so get over it. That's a very big discussion. How much of the luxury that people of my generation have gotten along with so well are we willing to surrender? Not much, I'd bet. Baaaaaaa.

  • unkerjay

    We don’t seem to have a problem generating “big data” – Google, Microsoft, Bing, Yahoo, Facebook, Twitter, Tumblr, Flickr, Photobucket, Instagram, smart phones, tablets, the internet writ large.

    It's not big data that's the problem. Usage doesn't seem to have gone down despite what we now know about how much data is generated, how pervasive and invasive the NSA's reach can be, and what Snowden's recent revelations show the NSA is capable of.

    Now that we know what our government is up to and capable of, just exactly how much has usage dipped? .00000000000001 percent maybe?

    It's not the data that's the problem. It's how it's used and by whom, with or without permission, and for what purpose(s).

    We need to take control of how and when it's collected, by whom and for what purposes, and we need opt-in rather than opt-out collection policies. We need to apply the same sensibility to data that we apply to sex: a simple understanding, with the force of law and enforcement behind it, that no means no.

    Short of that, as stated below, we’ll get what we’re willing to tolerate. And, so far, we seem to be willing to tolerate quite a lot despite what we know, despite what is undeniable and irrefutable.

    Either it's sufficiently intolerable to a majority of us that we're willing to stand up and fight against it until it is adequately resolved, or we're entirely too busy, distracted, otherwise coddled and modified with the pointless assertion that "we have nothing to hide," which, last I checked, has NOTHING to do with the Fourth Amendment, which simply states that we ALL have the right to be free from unreasonable search and seizure, PERIOD, regardless of whether or not we have "nothing to hide."


    Never mind that which we willingly surrender without a second thought.

  • tangorepublic

    The social/consumer aspects of big data are only part of the story. The bulk of big data is the kind in databases – scientific, records of all kinds across many industries. I fear this new interest in big data will become associated with “big brother” in the minds of most people. Please try harder to be clear about what the data is so people don’t think it’s all related to their personal data.

  • unkerjay

    mollified not “modified”.

  • georgiaraysman@gmail.com

    Read this blog to find out what the real danger is in the use of metadata. Eye-opening.


  • am_an_american

    Our privacy has not been private for many years. Many covers have been used to hide the fact, and the media has been one of the promoters of those covers. Our news has never been less diluted since the '50s with comparison to our Cold War enemies. Fear is again rearing its head to "hide" the truth. We are to be known for the monster computer designed to know all about us and control our accounts, lives and businesses.

    Bill, keep on updating us with that elusive hope of truth.

  • Sweet Sixteen~But Wicked Smart

    What about McDonald's hamburgers? They put little computer chips into our intestines so that they can follow us from one MacD's to the next MacD's. They justify this with a cry about commercial taste tracking.

  • Bonjoartist

    It's right to have a debate about these topics; many things have been left undecided to the detriment of our freedoms. Much was in place before Obama, I think, but it's unlike any president to curtail his own avenues of power.
    I think leakers may have some good motives, and I think NSA surveillance has some good motives, but there has to be discussion and compromise for the formation of guidelines for gathering data if the people are to have rights and freedoms preserved.

  • Sweet Sixteen~But Wicked Smart

    Too much data. Too many secrets. Just relax and let the secrets out. Who cares? Prosecute the sex criminals like Julian Assange. Send him to Sweden to face the music. Come on, Julian … why don't you admit your philosophy of "Love 'em and Leave 'em."