Last week, the NYT ran a piece about technology that tracks users as they walk around a store (http://www.nytimes.com/2013/07/15/business/attention-shopper-stores-are-tracking-your-cell.html?emc=eta1). The idea is not new, of course. I think that casinos were some of the early adopters of this approach. They'd give you a kind of loyalty card that you'd walk around with to present for discounts, free drinks, etc., but it was actually an RFID card that allowed them to track you and learn about traffic patterns. The technology has progressed so that now a store can do that just based on your cellphone. That doesn't give them demographic data like the loyalty card does, but they couple the cellphone location data with some face-recognition stuff that reliably determines gender, age, race, etc., and presto!
What I want to talk about is, what makes big data "big"?
Gartner has defined Big Data as “big” in the three V's of Volume, Variety,
and Velocity. Boyd
and Crawford (2012) raise the notion (attributed to others) that Big Data means having the amount and kind of data that
obviates the need for a priori modeling; one may simply let the data talk. This
is an interesting, if debatable, perspective.
I'm not satisfied with these definitions. But let's take a step back: why bother with definitions at all? Because Big Data seems to mean many things, and definitions can help to separate cogently the different challenges and opportunities for research and practice. For example, Gartner's 3 V's have been used to frame the technical computer-science challenges that Big Data entails, and the Boyd and Crawford definition highlights some technical implications for data mining. My dissatisfaction stems partly from the fact that both definitions foreground the new technical challenges and solutions that Big Data entails, whereas I'm more interested in characterizing the new opportunities that it brings.
With this in mind, I characterize Big Data as data that is passive both in its referent and in its collection. When I say it is passive in its referent, I mean that Big Data captures not only distinguished episodes that happen to objects, but also their background state. When I say it is passive in its collection, I mean that the data is recorded not only in response to an active trigger, but regularly, i.e., upon the mere passage of a pre-defined amount of time; often the capture is done by sensors. Of the two, I think the more important element is the first: that the referent is an object's state.
In the context of an online consumer, Big Data represents
not only a purchase, but each user click, and all the moments of inactivity in
between. In a physical-store consumer setting, it represents the full history
of the person’s movement through the store over time; this is the technology
described in the NYT article. In a logistics context, it represents not only a
transition of cargo from one modality to another, but its location,
temperature, etc. at all times. In a medical setting, it represents not only episodic
measures, but a continuous reading of the patient's state on certain variables
(if the variable in question is the patient's full DNA sequence, then practical
limitations make it a single or occasional snapshot rather than a continuously
updated reading).
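To make the event-versus-state distinction concrete, here is a minimal sketch in Python. All names and data here are hypothetical illustrations of my own, not any real tracking system's API: an event log holds one record per distinguished episode (a purchase), while a state log samples the shopper's background state at a fixed interval, whether or not anything "happened".

```python
from dataclasses import dataclass

# Event-style data: one record per distinguished episode (a purchase).
@dataclass
class PurchaseEvent:
    timestamp: int   # seconds since the store opened
    item: str

# State-style ("Big Data") record: the shopper's background state,
# captured on a regular schedule rather than on a trigger.
@dataclass
class StateSample:
    timestamp: int
    aisle: str       # where the shopper is standing
    moving: bool     # inferred by comparing successive positions

def sample_state(positions, interval=5):
    """Turn a raw position trace into regular state samples.

    positions: hypothetical dict mapping timestamp -> aisle name.
    """
    samples = []
    last_aisle = None
    for t in sorted(positions):
        if t % interval == 0:          # passive collection: time, not events
            aisle = positions[t]
            samples.append(StateSample(t, aisle, moving=(aisle != last_aisle)))
            last_aisle = aisle
    return samples

# One episode in two minutes of shopping...
events = [PurchaseEvent(120, "coffee")]
# ...versus a state record for every interval, purchase or not.
trace = {0: "entrance", 5: "aisle 3", 10: "aisle 3", 15: "aisle 7"}
states = sample_state(trace)
```

The point of the sketch is only the shape of the data: the event log is empty until something happens, while the state log grows with the clock and records inactivity (the shopper lingering in aisle 3) just as faithfully as action.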
When considering the
opportunity presented by Big Data, the question is, what additional opportunity is
afforded when moving from data whose referents are pre-defined events to data
whose referents describe a state?
I think about this question in light of older Information Systems literature on how to extract the greatest benefit from data. A number of research areas within the field of information systems have established that the greatest gains are unleashed when newly available information facilitates entirely new processes, rather than merely improving decisions or the execution of existing processes. For example, this has
been found in the context of inter-organizational information sharing, as when
a manufacturer may consider using retail-level POS data not only to make better
production decisions, but to possibly bypass the distribution network
altogether and make direct deliveries (Clemons and Row 1993). A similar idea is
the use of supply chain information to allow make-to-order to partly replace
make-to-inventory.
The early literature on the opportunity presented by Big Data (leaving the technical aspects entirely aside) appears to be oriented more towards informing decisions than re-designing processes (e.g. McAfee and Brynjolfsson 2012). This perspective views Big Data as part of managing and competing by analytics (Davenport 2006).
My inclination is to think that because it represents states and not events, Big Data offers something different: something beyond improved decision-making, yet also distinct from re-designed business processes.
I just haven't yet figured out what that is.