Thursday, July 18, 2013

Big Data means...


Last week, the NYT ran a piece about technology that tracks users as they walk around a store (http://www.nytimes.com/2013/07/15/business/attention-shopper-stores-are-tracking-your-cell.html?emc=eta1). The idea is not new, of course. I believe casinos were among the early adopters of this approach. They'd give you a kind of loyalty card to carry around and present for discounts, free drinks, etc., but it was actually an RFID card that let them track you and learn about traffic patterns. The technology has since progressed so that a store can now do the same thing based on your cellphone alone. That doesn't give them the demographic data that a loyalty card does, but they couple the cellphone location data with face-recognition software that reliably determines gender, age, race, etc., and presto!

What I want to talk about is this: what makes Big Data "big"?

Gartner defines Big Data as data that is "big" along three V's: Volume, Variety, and Velocity. Boyd and Crawford (2012) raise the notion (attributed to others) that Big Data means having enough data, and the right kind of data, to obviate the need for a priori modeling; one may simply let the data talk. This is an interesting, if debatable, perspective.

I'm not satisfied with these definitions. But first, let's take a step back: why bother with definitions at all? Because Big Data seems to mean many things, and definitions can help to delineate the different challenges and opportunities it poses for research and practice. For example, Gartner's three V's have been used to frame the technical computer-science challenges that Big Data entails, and the Boyd and Crawford definition highlights some technical implications for data mining. My dissatisfaction stems partly from the fact that both definitions emphasize the new technical challenges and solutions that Big Data entails, whereas I'm more interested in characterizing the new opportunities it brings.

With this in mind, I characterize Big Data as data that is passive both in its collection and in its referent. When I say it is passive in its referent, I mean that Big Data doesn't capture only distinct episodes (events) that happen to objects, but also the objects' background state. And when I say that Big Data is passive in its collection, I mean that the data is recorded not only in response to an active trigger, but regularly, i.e., due to the mere passage of a predefined amount of time. Often, the data capture is done by sensors. I think the more important of the two elements is the first: that the referent is an object's state.

In the context of an online consumer, Big Data represents not only a purchase, but each user click, and all the moments of inactivity in between. In a physical-store consumer setting, it represents the full history of the person's movement through the store over time; this is the technology described in the NYT article. In a logistics context, it represents not only a transfer of cargo from one transport mode to another, but the cargo's location, temperature, etc. at all times. In a medical setting, it represents not only episodic measurements, but a continuous reading of the patient's state on certain variables. (If the variable is the patient's entire DNA, practical limitations make a continuously updated reading unlikely; it would more likely be a single or occasional snapshot.)
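To make the distinction between event data and state data concrete, here is a minimal, hypothetical Python sketch set in the physical-store example. The shopper's path, the 30-second sampling interval, and the names `event_log`, `state_log`, and `position_at` are all illustrative assumptions, not anything taken from the NYT article or from a real tracking system. The event log records only a distinguished episode (a purchase), while the state log samples the shopper's location on a fixed clock, whether or not anything "happens."

```python
from dataclasses import dataclass


@dataclass
class Reading:
    t: int       # seconds since the shopper entered the store
    kind: str    # "event" (triggered) or "state" (periodic sample)
    value: str   # what was recorded: an item bought, or a location


def event_log(purchases):
    """Active collection: one record per distinguished episode (here, a purchase)."""
    return [Reading(t, "event", item) for t, item in purchases]


def state_log(position_at, duration, interval):
    """Passive collection: record the shopper's location every `interval`
    seconds, triggered only by the passage of time."""
    return [Reading(t, "state", position_at(t))
            for t in range(0, duration + 1, interval)]


def position_at(t):
    # Hypothetical path through the store: entrance, then produce, then checkout.
    if t < 60:
        return "entrance"
    if t < 240:
        return "produce"
    return "checkout"


purchases = [(250, "apples")]
events = event_log(purchases)
states = state_log(position_at, duration=300, interval=30)
print(len(events), len(states))  # prints: 1 11
```

The single event says only that apples were bought; the state samples also reveal that the shopper lingered in produce for about three minutes before checking out, which is exactly the kind of background information that event-only data leaves out.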

When considering the opportunity presented by Big Data, then, the question is: what additional opportunity is afforded when moving from data whose referents are predefined events to data whose referents describe a state?

I think about this question in light of older Information Systems literature on how to get the greatest benefit from data. A number of research areas within the field of information systems have established that the greatest gains are unleashed when newly available information facilitates entirely new processes, rather than "merely" improved decisions or better execution of existing processes. For example, this has been found in the context of inter-organizational information sharing, as when a manufacturer considers using retail-level POS data not only to make better production decisions, but possibly to bypass the distribution network altogether and make direct deliveries (Clemons and Row 1993). A similar idea is the use of supply chain information to allow make-to-order to partly replace make-to-inventory.

The early literature on the opportunity presented by Big Data (leaving the technical aspects completely aside) appears to be oriented more towards informing decisions than towards redesigning processes (e.g. McAfee and Brynjolfsson 2012). This perspective views Big Data as part of managing and competing by analytics (Davenport 2006).

My inclination is that, because it represents states rather than events, Big Data offers something different: something beyond improved decision-making, but also distinct from redesigned business processes.

I just haven't yet figured out what that is.
