Monday, July 29, 2013
User-Generated Content... Travel Blogs
Today's musings are about user-generated content, which arises in two NYT articles in the last few days:
http://travel.nytimes.com/2013/07/28/travel/travel-blogging-today-its-complicated.html?hpw
http://www.nytimes.com/2013/07/29/technology/pc-industry-fights-to-adapt-as-tablets-muscle-in.html?hpw
The first article is about travel bloggers who get their trips paid for by local tourism boards or other interested parties. In other words, the idea of the eccentric lone traveler with a backpack and an uplink is naive, with travel blogs increasingly resembling advertisements. As reported, "In March, the Federal Trade Commission had seen enough digital content that blurred the line between editorial and advertising that it issued a clarification document stating that disclosures of free trips need to be clear, concise and toward the top of posts". It's reminiscent of the issue of keyword ads versus organic search results.
The question facing readers is how to discern reliable content from content whose objectivity may be compromised by the writer's sources of funding. This is also an interesting research question, within the broader category of how people judge information quality. The article suggests a variety of clues that readers may use, such as the out-links from the blog. One imagines that readers may look for cues such as the number of followers that a blogger has, leading to a rich-get-richer effect of popularity. I personally am researching rich-get-richer effects, which I think are a big story in our Web-based information world.
The other article is about the seemingly unrelated topic of Wintel-and-PCs versus Android-iPad-and-tablets. It quotes Daniel Huttenlocher of Cornell University's new New York City technology campus, who associates PCs with the function of generating content, and tablets with the function of consuming content. He notes that "There are way more consumers than producers, period, even in a world with lots of user-generated content," much to the chagrin of Wintel.
The IS literature, including the literature on end-user computing, has instruments that distinguish and separately measure information quality from functional effectiveness, but I think the distinction between generating and consuming content may be useful in models of adoption, usage, satisfaction, etc.
So, putting the two together: User-generated content is a big story for Web 2.0 and on. For IS research, the associated business models raise questions about how users figure out what information is objective. And there may be some importance in refining our models to distinguish explicitly between usage for content-generation and usage for content-consumption.
Thursday, July 25, 2013
e-Reader Wars and Teaching Network Effects
Today's post is about David Carr's NYT piece titled Why Barnes & Noble is Good for Amazon (http://www.nytimes.com/2013/07/15/business/media/why-barnes-noble-is-good-for-amazon.html?pagewanted=all). I am looking at it just from the perspective of teaching about network effects.
In my MBA classes I still use the Harvard case on AOL's Instant Messenger as the basis for discussing network effects. That case describes a bare-bones setting -- 3 big companies fighting about a chat standard -- that routinely stupefies even the brightest students and highlights the key concepts of "cooperate to compete" (a phrase I took from Kenneth Preiss) in a network-externality setting.
In an effort to use a more up-to-date case, I sometimes consider adopting the e-reader wars as a basis for teaching this topic. But to be honest, I find these wars too complicated to neatly teach (or understand) the core concepts. In these e-reader wars, each firm is operating within an ecosystem of device manufacturers, publishers, and (e-)book retailers. For example, suppose a publisher decides to make its books available on Nooks. Then, while it continues to compete directly against other publishers, it will also want to cooperate with those other publishers in trying to establish the Nook as a dominant standard. And so on. I call this "three-dimensional cooperate to compete under externalities" -- a mouthful that I can barely grasp. How -- what -- are we supposed to teach our Executive MBAs about this?
The e-wallet wars only up the ante. Imagine teaching a case with the following actors in direct competition:
Wal-Mart, 7-Eleven, Google, Visa, MasterCard, AT&T, Deutsche Telekom, Starbucks, PayPal, and more. This is the landscape in the e-wallet standards wars. How can anyone possibly wrap their mind around the core issues of such a case? Well, as I said, in my short courses I am satisfied if we can unravel the nature of the strategic decisions in the AOL IM chat case. I believe that that case captures many key concepts about cooperate to compete, which are then applicable to today's more complex ecosystem wars.
One other question about the article: assuming that Amazon also believes that Barnes & Noble is good for Amazon, does anyone expect a situation in which Amazon consciously eases the pressure on B&N to help it survive? Presumably, Amazon would like to see a weakened but extant B&N. In particular, Amazon would love to see B&N figure out a way to make money off the experience it sells, without selling too many actual books. Can we imagine any scenario in which that occurs? Any scenario in which Amazon passively or even actively "helps" B&N to move there? At the same time, Amazon may be moving away from reliance on e-books and orienting itself more towards software and apps. So, assuming that both B&N and Amazon don't want to keep competing directly on the price of an e-book, then either B&N moves towards turf that exploits its physicality, or Amazon moves towards turf that exploits its virtuality.
Or not.
Tuesday, July 23, 2013
Automatic Diagnosis Machine -- FDA debate
Today's post is about an article (http://www.nytimes.com/2013/07/21/business/dissent-over-a-device-to-help-find-melanoma.html?hpw) describing a new medical device to detect melanomas, and the factors that affect the FDA's decision whether to approve it and individual doctors' decisions whether to adopt it.
One thing that struck me about the quotations from various FDA officials and doctors is that academics -- especially in fields such as information systems and industrial engineering -- may have a lot to offer, and that as a community, we may want to think about how to make our knowledge more visible and available to policy makers.
One example that struck me was a thread of quotes about the machine's rate of false positives. A member of the FDA panel expressed concern that the false-positive rate was too high. But anyone with an understanding of the technology will realize that this rate is trivial to alter, and that the key metric is not the rate of false positives or of false negatives in isolation -- since either of these can be trivially set to zero, at the expense of the other -- but some combined measure of them both (e.g. ROC, average precision, etc.). It is hard to imagine -- but seems to be the case -- that the FDA panel did not know this. It also appears that the FDA was not provided with information that properly compares the machine against a human on such a combined measure. This is scary to me.
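To make the point about thresholds concrete, here is a minimal sketch in Python. The scores and labels are made up for illustration and have nothing to do with the actual device; the function names are my own. It shows that either error rate can be driven to zero just by moving the decision threshold, while a combined measure (here, the area under the ROC curve) does not depend on the threshold at all.

```python
def rates(scores, labels, threshold):
    """Return (false-positive rate, false-negative rate) when every
    case with score >= threshold is flagged as suspicious."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp / labels.count(0), fn / labels.count(1)


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    positive case outscores a random negative case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores; label 1 = melanoma, 0 = benign.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

flag_everything = rates(scores, labels, threshold=0.0)  # no false negatives
flag_nothing = rates(scores, labels, threshold=1.1)     # no false positives
middle = rates(scores, labels, threshold=0.5)           # a compromise
area = auc(scores, labels)                              # threshold-free

print(flag_everything, flag_nothing, middle, area)
```

A complaint that "the false-positive rate is too high" is therefore a complaint about where the threshold was set, not about the quality of the underlying predictor; comparing the machine with a human requires something like the last number.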
A slightly subtler thread that runs through the article is about how the device is likely to be used. The argument is raised that doctors may get lazy and rely on the machine, in which case one has merely replaced the person with a machine. But let's consider this argument in more detail. First, even if this is true, the machine may be better than the human, though I am still surprised that the FDA is asked to approve or reject a device without a clear answer as to which (person or machine) works better when acting alone. But there appears to be lurking a second, stronger version of this argument, according to which the result of human+machine is worse than human alone (or machine alone). That is to say, the increased laziness of the human more than offsets the benefit of his/her having a better predictor. Can this be? Is there research that shows a phenomenon such as this?
Sunday, July 21, 2013
Web-based information and beliefs
Today's post is about how Web-based information influences beliefs.
http://www.nytimes.com/2013/07/21/us/some-mormons-search-the-web-and-find-doubt.html?hp
The NYT article is about Mormon believers who encounter challenging -- heretical, to them -- information on the Web. We all encounter -- seek -- information on the Web. And there is always the general question of how people update their beliefs in response to information (e.g. under- (or over-) weighting prior beliefs, as studied by Kahneman and Tversky). But there are also more specific questions. First, is there anything peculiar about the extent to which Web-based information influences our beliefs, as compared with other information sources? Conceptualizing this, this question really becomes, what are the characteristics of information that make it more influential on our beliefs? Prior research gives some ideas. For example, communications research has studied Source Credibility as one such factor. In Information Systems, theories such as Elaboration Likelihood Model have been used to characterize the process whereby information may influence beliefs.
But what is special about the case being described in this article is that it represents the reader's first encounter with views that are alternatives to beliefs that that person had previously taken as axioms. In this case, the information that is encountered does not represent one more piece of information in a perennial stream. It doesn't just carry its informational message. It also carries an implied meta-informational message, namely, that there exist multiple competing points of view on the given issue. It is interesting to consider how this meta-message is processed, and the characteristics of meta-messages such as this that might make them more influential. In other words, are the characteristics that are important for a message to convince us to adopt a particular side or position (e.g. in ads), also important for a meta-message to convince us that an issue *has* debatable sides?
Leaving aside this academic question, the article hits home to many of us who have developed ways of 'coping' with the tensions between information and religious tradition. Here's a personal anecdote. My wife was brought up without any of the belief systems we associate with religion. Soon after we'd moved to Israel and she'd begun to study about Judaism for her later conversion, we visited the Bible Lands museum (http://blmj.org/en/), a wonderful private museum that is scientific in its "methodology" but whose content is the study of the lands (e.g. Mesopotamia, etc.) and cultures from Biblical times. Anyhow, my wife came across a timeline display. It was a long horizontal affair with color-coded events (ancient city so-and-so) depicted on glass. Somewhere about halfway through the timeline, was the event -- one among many -- "creation of the universe". As someone brought up in a Jewish-observant but modern family, this barely registered. But my wife had never had to deal with this kind of dissonance, and I will never forget the unfolding look on her face. Having just begun to open her mind to studying about the Bible, and without much practice in -- how to call it? -- constructive ambiguity, she was seriously shaken up; nothing made sense anymore. Until it did again.
Information is not only the stuff of economic decisions. Religion is not science, but it is surely a real phenomenon. And how people with religious beliefs process information -- especially dissonant information, and meta-information -- is amenable to scientific study. More-information, as provided by the Internet, is not going to lead to the demise of religious belief. But it may very well influence it to be more questioning, as this NYT article describes.
Thursday, July 18, 2013
Big Data means...
Last week, the NYT ran a piece about technology that tracks users as they walk around a store (http://www.nytimes.com/2013/07/15/business/attention-shopper-stores-are-tracking-your-cell.html?emc=eta1). The idea is not new, of course. I think that casinos were some of the early adopters of this approach. They'd give you a kind of loyalty card that you'd walk around with to present for discounts, free drinks, etc., but it was actually an RFID card that allowed them to track you and learn about traffic patterns. The technology has progressed so that now a store can do that just based on your cellphone. That doesn't give them demographic data like the loyalty card does, but they couple the cellphone location data with some face-recognition stuff that reliably determines gender, age, race, etc., and presto!
What I want to talk about is, what makes big data "big"?
Gartner has defined Big Data as "big" in the three V's of Volume, Variety, and Velocity. Boyd and Crawford (2012) raise the notion (attributed to others) that Big Data means having the amount and kind of data that obviates the need for a priori modeling; one may simply let the data talk. This is an interesting, if debatable, perspective.
I'm not satisfied with these definitions. Well, let's take a step back. Why bother with definitions? Because Big Data seems to mean many things, and definitions can help to cogently define and separate the different challenges and opportunities for research and practice. For example, Gartner's 3 V's have been used to frame the technical computer-science challenges that Big Data entails. The Boyd and Crawford definition highlights some technical implications for data mining. My dissatisfaction is partly because these definitions mainly highlight the new technical challenges and solutions that Big Data entails, whereas I'm more interested in characterizing the new opportunities that it brings.
With this in mind, I characterize Big Data as data that is passive in its collection and in its referent. When I say it is passive in its referent, I mean that Big Data doesn't only capture distinguished episodes that happen to objects, but also their background state. And when I say that Big Data is passive in its collection, I mean that the data is recorded not only in response to an active trigger, but regularly, i.e. due to the mere passage of a pre-defined amount of time. Often, the data capture is done by sensors. I think that the more important element is the first, that the referent is an object's state.
In the context of an online consumer, Big Data represents not only a purchase, but each user click, and all the moments of inactivity in between. In a physical-store consumer setting, it represents the full history of the person's movement through the store over time; this is the technology described in the NYT article. In a logistics context, it represents not only a transition of cargo from one modality to another, but its location, temperature, etc. at all times. In a medical setting, it represents not only episodic measures, but a continuous reading of the patient's state on certain variables (if it is his/her whole state of DNA, then due to practical limitations it is unlikely to be a continuously updated reading, but a single or occasional snapshot).
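For readers who think better in data structures, here is a small sketch in Python of the distinction I am drawing. Everything in it is hypothetical: the function name, the timestamps, and the choice of "seconds idle" as the state variable are illustrations of the idea, not a description of any real tracking system. The event-driven record holds only the moments when something happened; the passive record is written at every tick of a clock, whether or not anything happened.

```python
def passive_samples(event_times, start, end, interval):
    """Record the background state at fixed intervals.

    At each tick, the 'state' here is the number of seconds since the most
    recent event (None if no event has happened yet). A row is written at
    every tick, even when nothing happened.
    """
    samples = []
    for t in range(start, end + 1, interval):
        past = [e for e in event_times if e <= t]
        idle = t - max(past) if past else None
        samples.append((t, idle))
    return samples


# Event-driven data: one record per purchase (seconds into the session).
purchases = [12, 47]

# Passively collected data: one record every 10 seconds, regardless.
samples = passive_samples(purchases, start=0, end=60, interval=10)

print(len(purchases), "event records")
print(len(samples), "state records:", samples)
```

The two purchase records say nothing about the long idle stretch between them; the state records capture it directly, which is the "moments of inactivity in between" mentioned above.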
When considering the opportunity presented by Big Data, the question is, what additional opportunity is afforded when moving from data whose referents are pre-defined events to data whose referents describe a state?
I think about this question in light of older Information Systems literature on how to really get the best benefit from data. A number of research areas within the field of information systems have established that the greatest gains are unleashed when newly available information facilitates entirely new processes, rather than "merely" improving decisions or the execution of existing processes. For example, this has been found in the context of inter-organizational information sharing, as when a manufacturer may consider using retail-level POS data not only to make better production decisions, but to possibly bypass the distribution network altogether and make direct deliveries (Clemons and Row 1993). A similar idea is the use of supply chain information to allow make-to-order to partly replace make-to-inventory.
The early literature on the opportunity presented by Big Data (I leave completely aside the technical aspects) appears to be more oriented towards informing decisions than re-designing processes (e.g. McAfee and Brynjolfsson 2012). This perspective views big data as part of managing and competing by analytics (Davenport 2006). My inclination is to consider that because it represents states and not events, Big Data offers something different, beyond improved decision-making, but different from altered business processes.
I just haven't yet figured out what that is.
Wednesday, July 17, 2013
Clinical Medical Trials and Design Science
A recent opinion piece on a seemingly unrelated topic has great methodological importance for research in the "design science" tradition. Design science is essentially engineering research, in which the researcher builds a system -- e.g. a recommender system, a data mining system, etc. -- and tests whether it works better than existing systems. It also includes research such as interface design, in which the researcher isolates and tests the efficacy of a single design element, as opposed to a whole system. We'll get back to both kinds of design science below, and in a later post, I will discuss the important differences between these two versions of design science. But first, the NYT article.
The New York Times article is titled "Do Clinical Trials Work?" (http://www.nytimes.com/2013/07/14/opinion/sunday/do-clinical-trials-work.html?pagewanted=all)
It discusses clinical trials of medicines. The purpose of a clinical trial (a so-called Phase 3 trial) is to test the efficacy of a proposed drug as part of the process of gaining approval from the FDA.
The chief concern that is expressed in this NYT opinion is that even after hearing the results of a study -- or indeed, of the totality of studies -- one still doesn't know which drug works best. The article implies that the reason for this is that each clinical trial tests the efficacy of a single drug, often against a placebo. This is not quite right. The reason one doesn't know which drug works best is that each drug is tested in a separate study, and it's impossible to compare the magnitude of effect in one study against the magnitude of effect in another study, due to the myriad confounding factors (e.g. different population, etc.). Distinct from this, the reason one doesn't know which COMBINATION of drugs works best is that each drug is tested against a placebo. I will elaborate, but just to summarize so far:
PROBLEM 1: Don't know which drug works better; REASON: Each drug tested in different setting.
PROBLEM 2: Don't know which combination of drugs works best; REASON: Each drug tested in isolation, against a placebo.
Finally, the article raises a third problem, which is that one doesn't know the circumstances under which one drug may work better than another; this is attributed to the fact that drug efficacy depends crucially on the presence of ("is moderated by", in academic parlance) genetic factors that most large clinical trials don't include.
In the information systems setting the question is, what do we learn from a study that pits system-with-feature-X against a "placebo" system-without-feature-X? Well, we may learn that feature X is helpful. But we don't know if this feature is better than feature Y, which was studied separately (PROBLEM 1). And more to the point, similar work is being done by dozens or hundreds of other researchers, each studying one or two system features, and demonstrating that they work better than ...nothing, i.e. the placebo. This leaves us with the question of which combination of features works best (PROBLEM 2).
This issue was recently raised in Norbert Fuhr's acceptance speech for his Salton award (http://www.is.inf.uni-due.de/bib/pdf/ir/Fuhr_12.pdf), a kind of lifetime achievement award for work done in the field of information retrieval, aka search engines. As he noted, a study by Armstrong et al. (2009) reported that there has not been any upward trend in the overall performance of (laboratory-based) information retrieval systems over the past decade or so, in spite of an endless stream of papers reporting system features that improve performance. Now, the information retrieval field does not suffer badly from PROBLEM 1. The reason is that they often use standardized data sets, meaning that even when two researchers each study a different feature, they study them on the same set of documents and queries. This would be akin to two separate medical clinical trials, each testing a different drug for a particular condition, ON THE SAME SET OF PATIENTS. Obviously, this is not practicable in the medical setting, where hundreds of studies are being carried out, each in a different hospital etc.
But like many fields in engineering, including much work in IS's design science, information retrieval does suffer from PROBLEM 2. What happens is that each year, new design features are suggested, but always in comparison with the same -- call it "placebo" -- baseline, not with respect to a system that includes all previously known good features. The result is that we are left with a sort of inventory of design features, each of which is demonstrably better than nothing, but with no guidance about which combination of features works best. Armstrong et al. further imply that this essentially means that the studies were "cheating", for if feature Z only works better than a placebo with "no features", but not better than a decent system that includes previously-known-to-work features, then Z cannot be said to "work" in any meaningful sense. At least, that's their view. And, it leaves us with the problem of not knowing which combination of features works best. To remedy this, they suggest that each researcher should compare his/her newly proposed design against the best-performing system that is known to date. In other words, if I propose a new design feature Z, I should test a system that has all the features that lead to the very best performance overall but that does not include Z, against a system that has all those features and ALSO feature Z. Then, if Z adds marginal benefit, we will have learned something.
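The two baselines can be contrasted with a toy sketch in Python. The numbers are invented purely for illustration, and the "coverage" model (each feature fixes some set of test queries, and a system's score is the number of distinct queries fixed) is my own simplifying assumption, not something taken from Armstrong et al.

```python
def score(features, coverage):
    """Number of distinct test queries handled well by a system
    that includes the given set of features."""
    solved = set()
    for f in features:
        solved |= coverage[f]
    return len(solved)


# Hypothetical: which test queries each design feature fixes.
coverage = {
    "X": {1, 2, 3},
    "Y": {4, 5},
    "Z": {2, 3, 4},  # the newly proposed feature overlaps with X and Y
}

# Placebo comparison: Z against a system with no features at all.
gain_vs_placebo = score({"Z"}, coverage) - score(set(), coverage)

# Previous-best comparison: Z added to a system that already has X and Y.
gain_vs_best = score({"X", "Y", "Z"}, coverage) - score({"X", "Y"}, coverage)

print(gain_vs_placebo, gain_vs_best)
```

In this made-up case Z looks like a clear win against the placebo and contributes nothing on top of the previous best, which is exactly the situation that leads Armstrong et al. to call the placebo comparison "cheating".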
I recently wrote a commentary in ACM SIGIR Forum, which diametrically opposes the suggestion that proper design science should test proposed new features as additions to previous-best-performing systems. I argue that the correct remedy to this situation is not to require comparisons against previous-best-performers, but to engage in more conceptual research. Conceptual research is about using theory to guide the invention and definition of variables, and how to measure them, and their relationships with other similarly conceived variables -- NOT ULTIMATE OUTCOMES -- and testing those definitions and relationships in empirical work. This is the a-b-c of scientific work in all fields, except in engineering fields where there is a tendency to test any proposal on ultimate performance measures.
Take an example from maritime engineering. Suppose a researcher proposes a new design element for a ship, e.g. a new material that results in a stronger hull. In the engineering-oriented approach of Armstrong et al. -- which is also implied in the NYT article -- the researcher must create (simulate) a full, best-performing total ship that includes all good design elements that yield the world's single top-performing ship. Then, based on that previous top-performer, he/she should see if the stronger hull adds any improvement to the ultimate outcome e.g. top speed or time between refueling or whatever is the ultimate outcome measure.
By contrast, in conceptual research, a researcher would propose specific, local variables that the increased hull strength is expected to affect. Indeed, the whole notion of "hull strength" would have to first be conceived as a meaningful variable to think about; it would have to be defined, on its own terms and in terms of its expected relationships with other variables. The researcher would propose the other variables that it directly affects, and would not (only) predict how that might affect ultimate outcomes. In academic parlance, a variable's direct relationship with other variables that are not ultimate outcomes, is called the "mechanism" through which the variable affects the ultimate outcome. For example, the researcher might propose that the stronger hull will reduce the ship's wake. A proper test of that hypothesis is a (simulation of) whether such a stronger hull indeed reduces the ship's wake. The importance of such research for shipbuilding is the hope that, under some conditions, the reduced wake might improve an ultimate performance measure; but that would be outside the scope of the described research.
In this conceptual world, it is not only unnecessary, but counter-productive, to test instead whether the stronger hull led to improvement in some ultimate performance measure such as top speed. Unnecessary, because we are trying to learn how things work. And counter-productive, because it might very well be that the so-called "previous best" ship would be better if we had REMOVED one of its supposedly great features, and INSTEAD used the stronger hull. It is the nature of scientific work to study direct connections between local variables, and in this effort, it is perfectly correct to use a placebo as the baseline. This is not "cheating" because the aim is not to show that my system is the winner, or to say that ships with stronger hulls will be better on some ultimate performance criterion. Rather, the aim (in this example) is to see whether a stronger hull actually reduces the ship's wake. Other researchers will do something similar, studying different sets of local variables, such as how wake interacts with wind, or what have you. Armed with these separate understandings of how things work, we may be able to predict which combinations of elements work well, under which circumstances. It's not trivial, but we're in a much better position than if we had conducted experiments that only measure ultimate performance. Each piece of conceptual research contributes insights into how things work. Then we might be able to theorize and hypothesize which combinations make sense together. This is called science, and not all engineering fields are steeped in the tradition of conceptual research.
To summarize, in conceptual research, we learn how things work, and this will ultimately guide us as to which combinations we can expect to work. By contrast, in the horse-racing approach that dominates some engineering fields, we are indeed left with an inventory of features, but no guidance about how they work, and so no guidance about which combinations may be expected to work best.
In the medical field, actually, I think there is a strong tradition of conceptual research to complement the measurement of ultimate outcomes. Clinical trials are like those engineering studies that try to measure an ultimate performance measure. But in medicine, those same clinical trials often also measure the many layers of causal mechanisms (what led to longer life? reduced tumor size during x months; what led to that? increased susceptibility of cancer cells to destruction by X; what led to that? increased Y, supplied by the drug being tested). Thus, in the medical field as in engineering fields, the ultimate performance measure of a single study has limited meaning. X worked better than nothing, but is it better than alternative Y that was studied elsewhere? Hard to know. Is it best to use X in combination with A or in combination with B or neither? Will a regimen of X added to A offer any benefit compared with A alone? We don't know, based solely on the clinical trial's ultimate performance measures. But the answer is not to require clinical tests to compare the addition of X to the so-called previous best performer. Rather, the answer is to focus -- as the medical field does -- also on the mechanisms, the less-than-ultimate performance measures, which explain how things are working. This yields guidance about what combinations of drugs might work best. I am no expert, but I believe that medical research, including clinical trials, does not limit itself to ultimate performance measures. Therefore, I think the situation in the medical world is not as bleak as the article portrays it. I think they accumulate knowledge of mechanisms, and this serves as the basis for contemplating what combinations might work well. In engineering design science, I am less sanguine that researchers appreciate the benefit of conceptual research.
To summarize, in design science as in all science, the most important research is conceptual research that studies mechanisms, i.e. directly related variables. This is the best way to make sustained progress at the level of whole systems, because it guides us about which combinations of features make sense together.
The New York Times article is titled "Do Clinical Trials Work?" (http://www.nytimes.com/2013/07/14/opinion/sunday/do-clinical-trials-work.html?pagewanted=all)
It discusses clinical trials of medicines. The purpose of a clinical trial (a so-called Phase 3 trial) is to test the efficacy of a proposed drug as part of the process of gaining approval from the FDA.
The chief concern expressed in this NYT opinion piece is that even after hearing the results of a study -- or indeed, of the totality of studies -- one still doesn't know which drug works best. The article implies that the reason for this is that each clinical trial tests the efficacy of a single drug, often against a placebo. This is not quite right. The reason one doesn't know which drug works best is that each drug is tested in a different setting, and it's impossible to compare the magnitude of effect in one study against the magnitude of effect in another study, due to the myriad confounding factors (e.g. different populations). Distinct from this, the reason one doesn't know which COMBINATION of drugs works best is that each drug is tested in isolation, against a placebo. I will elaborate, but to summarize so far:
PROBLEM 1: Don't know which drug works better; REASON: Each drug tested in different setting.
PROBLEM 2: Don't know which combination of drugs works best; REASON: Each drug tested in isolation, against a placebo.
Finally, the article raises a third problem, which is that one doesn't know the circumstances under which one drug may work better than another; this is attributed to the fact that drug efficacy depends crucially on the presence of ("is moderated by", in academic parlance) genetic factors that most large clinical trials don't include.
In the information systems setting, the question is: what do we learn from a study that pits system-with-feature-X against a "placebo" system-without-feature-X? Well, we may learn that feature X is helpful. But we don't know if this feature is better than feature Y, which was studied separately (PROBLEM 1). And more to the point, similar work is being done by dozens or hundreds of other researchers, each studying one or two system features, and demonstrating that they work better than ... nothing, i.e. the placebo. This leaves us with the question of which combination of features works best (PROBLEM 2).
This issue was recently raised in Norbert Fuhr's acceptance speech for his Salton award (http://www.is.inf.uni-due.de/bib/pdf/ir/Fuhr_12.pdf), a kind of lifetime achievement award for work done in the field of information retrieval, aka search engines. As he noted, a study by Armstrong et al. (2009) reported that there has not been any upward trend in the overall performance of (laboratory-based) information retrieval systems over the past decade or so, in spite of an endless stream of papers reporting system features that improve performance. Now, the information retrieval field does not suffer badly from PROBLEM 1. The reason is that they often use standardized data sets, meaning that even when two researchers each study a different feature, they study them on the same set of documents and queries. This would be akin to two separate medical clinical trials, each testing a different drug for a particular condition, ON THE SAME SET OF PATIENTS. Obviously, this is not practicable in the medical setting, where hundreds of studies are being carried out, each in a different hospital etc.
But like many fields in engineering, including much work in IS's design science, information retrieval does suffer from PROBLEM 2. What happens is that each year, new design features are suggested, but always in comparison with the same -- call it "placebo" -- baseline, not with respect to a system that includes all previously known good features. The result is that we are left with a sort of inventory of design features, each of which is demonstrably better than nothing, but with no guidance about which combination of features works best. Armstrong et al. further imply that this essentially means that the studies were "cheating", for if feature Z only works better than a placebo with "no features", but not better than a decent system that includes previously-known-to-work features, then Z cannot be said to "work" in any meaningful sense. At least, that's their view. And it leaves us with the problem of not knowing which combination of features works best. To remedy this, they suggest that each researcher should compare his/her newly proposed design against the best-performing system that is known to date. In other words, if I propose a new design feature Z, I should test a system that has all the features that lead to the very best performance overall but that does not include Z, against a system that has all those features and ALSO feature Z. Then, if Z adds marginal benefit, we will have learned something.
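To make the difference between the two baselines concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the numbers, the feature names X, Y and Z, and the simple overlap model in `score` are mine, not results or methods from Armstrong et al. It only shows how a feature can look useful against the "placebo" baseline and yet add nothing on top of a system that already has the other features.

```python
from itertools import combinations

# Hypothetical effectiveness numbers, invented purely for illustration.
BASELINE = 0.20
SOLO_GAIN = {"X": 0.05, "Y": 0.04, "Z": 0.03}
# Assumed interaction: X and Z exploit the same signal, so their gains overlap.
OVERLAP = {frozenset({"X", "Z"}): 0.03}


def score(features):
    """Toy performance of a system built from the given set of features."""
    total = BASELINE + sum(SOLO_GAIN[f] for f in features)
    for pair in combinations(sorted(features), 2):
        total -= OVERLAP.get(frozenset(pair), 0.0)
    return round(total, 4)


def marginal_gain(feature, base_features):
    """Improvement from adding `feature` to a system with `base_features`."""
    base = set(base_features)
    return round(score(base | {feature}) - score(base), 4)


# Z compared with the "placebo" system that has no features at all.
gain_vs_placebo = marginal_gain("Z", [])
# Z compared with a system that already includes the known-good X and Y.
gain_vs_best = marginal_gain("Z", ["X", "Y"])

print("Z vs placebo baseline:", gain_vs_placebo)
print("Z vs system with X and Y:", gain_vs_best)
```

In this made-up setting, Z helps against the placebo but adds nothing once X is present. Note that neither number tells us WHY; that is the gap that the argument below tries to address.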
I recently wrote a commentary in ACM SIGIR Forum, which diametrically opposes the suggestion that proper design science should test proposed new features as additions to previous-best-performing systems. I argue that the correct remedy to this situation is not to require comparisons against previous-best-performers, but to engage in more conceptual research. Conceptual research is about using theory to guide the invention and definition of variables, and how to measure them, and their relationships with other similarly conceived variables -- NOT ULTIMATE OUTCOMES -- and testing those definitions and relationships in empirical work. This is the a-b-c of scientific work in all fields, except in engineering fields where there is a tendency to test any proposal on ultimate performance measures.
Take an example from maritime engineering. Suppose a researcher proposes a new design element for a ship, e.g. a new material that results in a stronger hull. In the engineering-oriented approach of Armstrong et al. -- which is also implied in the NYT article -- the researcher must create (simulate) a complete ship that includes all the good design elements of the world's single top-performing ship. Then, based on that previous top-performer, he/she should see if the stronger hull adds any improvement to the ultimate outcome, e.g. top speed, time between refueling, or whatever the ultimate outcome measure is.
Sunday, July 14, 2013
Leveling the Playing Field of Financial Information Access -- a Good Thing?
http://www.nytimes.com/2013/07/13/business/the-ethics-of-a-split-second-advantage-for-traders.html?hp
New York's Attorney General forced Thomson Reuters to stop releasing special data two seconds early to those willing to pay a steep fee. The idea is to level the playing field by not allowing a situation in which rich folks and institutions get to trade on the data by taking positions within the two seconds before the data is released to regular folks.
In the mid-1990s, I played a minor role in the EDGAR Online project, in which the documents filed by all public companies with the SEC -- documents in the so-called EDGAR database -- were made accessible via the Internet for the first time. Previously, they had been accessible only via an (expensive) data feed from a particular vendor, Mead Data Central, that had the sole contract with the SEC for disseminating this data. The idea was to level the playing field.
But, it was decided to release the data to the masses one day later, while still allowing anyone with deep pockets to purchase the no-delay data feed.
But-but, once the data had become Internet-friendly for the sake of the Internet-based dissemination, a whole industry was spawned by companies like EDGAR-Online, offering the whole gamut of access, from full and immediate with value-added etc., to raw and day-delayed. EDGAR-Online pays a large fee for the data, but recoups it by (adding value and) re-selling it.
And so, back to the current case. The New York Times article raises one possible argument against -- or at least a reservation about -- leveling the playing field as New York's Attorney General has required. They note the possibility that if no one is allowed to get the data early, then no one will pay for it, and if no one will pay for it, then it won't get created. (The counter-counter argument was that, er, yes it will). That argument comes directly from the academic accounting literature. But to my mind, and based on my experience with EDGAR, there are equally important arguments against this well-meaning effort to level the playing field. One argument is that in today's very sophisticated information markets, regular folks are not handicapped in any meaningful way, because data re-sellers will step in and sell it at affordable prices to the masses. This is what happened with EDGAR.
It may be preferable to allow a free market to establish prices for different cuts and speeds of access to the information. This is what happens anyhow, because while the Attorney General thinks he has now leveled the playing field, that's only for raw access. It's still only the "rich" who can get instantaneous trend analysis (to know if a number signifies an increase or a decrease in whatever metric), cross-referencing of prominent names, or whatever other analysis will now be the basis for trading on the information. In other words, there's something arbitrary about leveling the playing field; one has merely leveled the raw input. Well, isn't it better to at least do that? I'm not sure it has any benefit, which leads me to my next point, a thought exercise.
Imagine that the playing field were to be totally level in the sense that everyone has access not only to the same raw data, but to the same analyses, and analyses of analyses, etc. As is also well known from the academic literature in financial accounting, this would result in much more agreement between people, which, in turn, means fewer opportunities to trade, to the benefit of no one. This is just a fancy way of saying that informational differences are what make for horse trading. The ideal situation, then, is not one in which all information is totally equal and disseminated to all. Rather, the ideal situation is that different entities decide which information they choose to purchase. The ideal situation is one in which this Thomson Reuters data set, and all the other data sets, are all being bought and sold and re-sold in markets, with different investors deciding what information to pay for. That's what gives you a market, both for information and for financial instruments. That's much better than a situation in which all those sources of information are given away freely and identically to all, where there's no information market, and in which there's less basis for trading in the financial market.
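The thought exercise can be played out in a few lines of Python. This is only a toy sketch under assumptions I am inventing for the purpose: a single asset quoted at a fixed price, traders whose value estimates differ only because they bought different information, and a trade counted whenever one trader who thinks the asset is underpriced can be matched with one who thinks it is overpriced. The function name `count_trades` and all the numbers are hypothetical.

```python
import random


def count_trades(n_traders, shared_information, seed=0):
    """Count buyer/seller matches in a toy one-asset market."""
    rng = random.Random(seed)
    price = 100.0
    if shared_information:
        # Same data and same analyses for everyone: identical estimates.
        estimates = [price] * n_traders
    else:
        # Each trader bought different information, so estimates diverge.
        estimates = [price + rng.uniform(-5, 5) for _ in range(n_traders)]
    buyers = sum(1 for e in estimates if e > price)   # think it is underpriced
    sellers = sum(1 for e in estimates if e < price)  # think it is overpriced
    # A trade needs one willing buyer and one willing seller.
    return min(buyers, sellers)


print("Trades with a totally level field:", count_trades(10, True))
print("Trades with differing information:", count_trades(10, False))
```

With identical estimates there is no one to take the other side, so nothing trades; once estimates differ, matches appear. Real markets are of course far richer than this, but the sketch captures the sense in which informational differences are what make for horse trading.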
So, even New York's Attorney General will not really be trying to level the playing field. Advised by finance professors, he will be balancing the wish for a level playing field, with a wish for horse trading. Does it make sense to find this balance by standardizing access to some basic data feed? I don't think so. IF IT WERE THE CASE that the only way to get the data, or to get it non-delayed, is to fork over a million bucks, then clearly, regular people would be at a systematic disadvantage. In such a case, it would not be that different people decide to get different information, only that rich people or institutions always decide to get it, and everyone else always "decides" not to. But that is often not the case, at least not in the black and white way it may sometimes be portrayed. Rather, the case is probably that different quality-levels of data are available for sale for different prices, for various data sets. In this case, it may be better -- i.e. Pareto better, better for everyone, not only the rich -- to allow information markets to flourish for them all.
Intro: Information Systems in the New York Times; A Teaching and Research Resource
Hello
I'm a lecturer on Information Systems, and I often read the online New York Times. I find that many NYT articles relate to information systems in interesting ways. The format of this blog is to use NYT stories as a basis for discussions that are of general interest, but especially to the research, practice, and teaching of information systems.
Hope you find something worthwhile here.
David Bodoff