Sunday, November 1, 2009

Social Data Revolution

Social Data Revolution, Part 1 — Time and Money: What Instantaneous and Free Communication is Doing For Consumers

Way back in time, communication seemed simple: people were home in the evening, and you could just swing by for a chat. But then the printing press was invented, greatly increasing the scope and reducing the cost of communication. Print, often complemented by services such as mail delivery, enabled firms to reach a huge number of people inexpensively.

Sears, for example, sent its catalog to millions of US households twice a year from 1896 until 1993. It was a slow world—products and prices remained valid until the next issue came out. Relevant dates, such as the delivery date, were hard to predict and rarely communicated to customers. But the customers did not expect much transparency from the firms, either—they were happy as long as the toaster they ordered eventually arrived.

Shifting the focus from transaction to relationship

In this era of limited communication, the firm only knew about the final orders, not the process of decision making. The focus was on transactions, not on relationships.

And now? The Internet allows us to reach anyone, anywhere, instantaneously. The reach of communication has increased from the people in the sender’s town to the entire world. People are social—they want to listen, comment, and be heard. But now that everyone can have a voice, who actually gets heard?

In the old world, senders bore the main cost of communication. Buying stamps and mailing out physical letters limited the number of messages generated. But in electronic communication, the marginal cost of another message is essentially zero. The bottleneck has moved from the sender to the receivers: they are becoming inundated with more requests for attention than they can deal with. The problem is that we are hard-wired to attend to new stimuli. We need to make these new technologies work for people and not against them.

The new currency: May I have your attention, please?

With all these demands on our time, how should we allocate our attention? Randomly? Perhaps—a former colleague’s strategy was to sporadically delete the messages in his inbox as his way of coping with information overload. Needless to say, though, his typical excuse (“I guess your email must have been in the batch I deleted”) was not particularly popular.

Right now, for most of us, that long-awaited love letter arrives the same way as yet another credit card solicitation. Can we do better than allocating our attention randomly? The answer comes in two parts: data, and more data.

Meta-data matter

Meta-data, data about the message, can help guide our decisions: how important is it to senders that their message gets read, and what is the message’s expected value for the reader?

Well, the simplest way to get this data is just to ask! Mr. Sender, tell us on a scale from 0 to 10: how important is it to you that the reader actually reads your message? And how much do you think the reader will get out of it?  These two numbers can help us prioritize our attention.

But taking these values by themselves won’t do the trick. Just as in the physical world, slimy marketers will try to game the system by creating the impression that their message is of utmost importance to us. They’ll try to whet our appetites and get us to open that spam.

To solve this problem, we’ll need to introduce a direct feedback mechanism by getting some data from the message’s recipient. Obviously, this wouldn’t work for physical mail—our junk mail just finds its way to the shredder. This non-response is a very weak learning signal since the sender has no way of gauging the recipient’s response to the message. It could be that the recipient was an early adopter of the sender’s product and is very happy with it. Or, he could be getting very annoyed with all these messages, to the degree that he is actually starting to hate the company!

In the world of cheap, bi-directional communication, we can do better. The receiver can directly indicate the actual value the message has for him—if he actually does enjoy receiving lots of updates, for example, he can express positive feedback.  By indicating the actual relevance for him, the receiver can increase or decrease the relevance of future messages from the same sender. That is, he directly benefits from his actions in the immediate future.

Senders, on the other hand, can benefit as well.  There is a new term in the cost function of mass communication—the cost of sending unwanted messages, as expressed by the rising voice of the consumer. Being aware of their recipients’ feedback helps them maintain their pristine reputation—senders will not benefit by becoming attention offenders.

Cheap communication allows us to calibrate senders’ predictions with the actual value perceived by the recipient. As we build up a history of direct feedback, our relevance functions will improve and allow us to prioritize our attention effectively. With free bi-directional communication, the era of the con-artist is coming to an end: only companies that respect their customers will be able to get through to them. Since everybody has an incentive to make relevance predictions as accurate as possible, we can use the power of the community to build a good system.

To sum up, two data sources allow us to harness the power of the community: relevance predictions from the senders, and relevance assessments of the recipients.
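
As a rough sketch of how these two data sources could be combined, the Python below keeps a per-sender history of predicted versus actual relevance on the 0 to 10 scale and discounts each new prediction by that sender’s average past overstatement. The class name, the bias-correction rule, and the treatment of unknown senders are all assumptions made for illustration, not a description of any existing system.

```python
class RelevanceCalibrator:
    """Hypothetical sketch: correct sender-declared relevance with recipient feedback."""

    def __init__(self):
        # sender -> list of (predicted, actual) scores, both on the 0-10 scale
        self.history = {}

    def record_feedback(self, sender, predicted, actual):
        """Store what the sender claimed and what the recipient actually felt."""
        self.history.setdefault(sender, []).append((predicted, actual))

    def calibrated_score(self, sender, predicted):
        """Subtract the sender's average past overstatement from a new prediction."""
        pairs = self.history.get(sender, [])
        if not pairs:
            # Assumption: with no history, take the sender's word for it.
            return float(predicted)
        bias = sum(p - a for p, a in pairs) / len(pairs)
        # Keep the result on the same 0-10 scale.
        return max(0.0, min(10.0, predicted - bias))


cal = RelevanceCalibrator()
cal.record_feedback("marketer", 10, 2)  # claimed 10, recipient rated 2
cal.record_feedback("marketer", 9, 1)
cal.record_feedback("friend", 6, 7)     # slightly undersold

inbox = [("marketer", 10), ("friend", 6)]
ranked = sorted(inbox, key=lambda msg: cal.calibrated_score(*msg), reverse=True)
print(ranked)  # [('friend', 6), ('marketer', 10)]
```

A sender who habitually oversells ends up ranked below one whose predictions match what recipients report, which is the incentive described above.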

The communication revolution is a meta-data revolution

With communication being free and instantaneous, attention is increasingly scarce. Economics is the science of scarcity, which is why we need to develop an economic model of communication.

Before, scarcity was on the side of the senders (time, money). It was impossible for firms to communicate effectively with large numbers of people at once, and communication and coordination between customers was even more difficult. There was no way for an individual to reach a broad audience beyond a very limited radius.

The communication revolution changed this. At first glance, the change seemed great for companies: it is now almost free to bury customers in ad campaigns! However, now that the scarcity has shifted to the recipients (time, attention), communication needs to go beyond transactions and move to relationships. In fact, the value of relationships is greater than the value of transactions. Truly customer-centric companies like Zappos understand the value of long-term relationships and bidirectional communication.

Unfortunately, these companies are the exception. Many more are moving in the wrong direction by cutting costs in customer service. In general, communication between individuals and firms has not become any easier, even though it is now easier than ever for individuals to communicate with each other. When will the communication revolution allow us to easily reach all the companies we want to talk to?

Social Data Revolution, Part 2 — Why We Need a Sound Data Strategy

A previous post discussed how free communication has changed the world, including the expectations and work of individuals, business, and society. This post discusses how two data revolutions (the first about passively collected clicks, the second about actively contributed data) and the ensuing change in consumer expectations make an astute, coherent data strategy critical.

The first data revolution came from the dream of collecting data from consumer decision-making. With the advent of the web, firms pondered whether it might be worth saving the vast amounts of data that customers were generating through their clicks and searches. For consumers, there was no hiding. After all, there is no online equivalent of discreetly checking out a magazine while a bookstore employee is looking the other way. has pretty much saved all user data from its beginning.

Back then, customers had no choice but to share their intentions with firms. If a technology enthusiast wanted to find out if a website sold a particular surveillance device, there was no shortcut but to type some keywords into a search box and thereby give the company a valuable intention stream. Companies, therefore, had all the power. Many tried, often too hard, to push products and advertisements. The consumer had no voice.

During the first data revolution, successful companies gained power by aggregating and analyzing the customer data they collected. However, most companies did not know what to do with their data and ended up burying it in tombs.

The second data revolution brought about a new dimension to data creation: users started to actively contribute explicit data such as information about themselves, their friends, or about the items they purchased. These data went far beyond the clicks and search data that characterized the first decade of the web.

An early example of user-generated content was’s reviews system. The firm realized that users often trusted recommendations by other users more than promotional material found elsewhere on the web. By enabling users to actively contribute such explicit data, succeeded in leveraging knowledge dormant in its large customer base to help customers with their purchasing decisions.

Later, Wikipedia increased transparency even more by allowing online collaboration. By allowing users to interact and build on each other’s contributions, the site relinquished control over its space. The benefit of allowing such user interaction is obvious today: why spend time on hold with a customer service representative if we can just Google the cryptic error codes to see if someone else has already solved the same problem? People learned that, by sheer force of numbers, an online user community was likely to be more helpful than a representative employed by the company.

Today, the online world has shifted to a model of collaboration and explicit data creation. Successful firms develop systematic ways to encourage and reward users who contribute honest data. A good system does not try to trick customers into revealing demographics or contact information that is useful for the company. Rather, it rewards users with information that is useful to them.

Netflix, for example, allows users to contribute ratings for movies that they have seen. Users have an incentive to contribute accurate data because this will give them better recommendations for new movies. The 1999 “Web 2.0 company” MoodLogic (acquired by All Media Guide, in turn acquired by Macrovision) enabled users to create metadata about their favorite music. Why on earth would they do that? Because they got back playlists, which made it easier for them to discover new music they enjoyed. Such successful companies realized the key feature of a good incentive system: people need to see that they profit from the outcome in some way before they are willing to put in the effort to contribute truthfully.

In the last few years, users have gone a lot further than contributing metadata to movies and music: in fact, they have taken center stage. The center of the universe has shifted from e-business to me-business. Customers are also starting to discover each other, and to interact with each other. Knowing that they are not alone has shifted the balance of power from companies back to consumers. And they have begun to demand transparency. Customers are beginning to have a voice. They are realizing that the data they voluntarily contribute can help them and others with making decisions, providing true value. In turn, they want to be treated fairly as individuals by the companies they pay attention and money to.

What are the consequences of this change in consumer expectations?

Successful interactions have become genuine communication with near-instantaneous feedback. For example, PayScale allows users to retrieve real-time salary reports based on their job title, location, education, and experience, but only after they have contributed their own data. As the expectations of users change, firms must spend more time developing incentive systems that will entice users to participate.

Indeed, the online world is beginning to be ruled by the expectations of the users. No longer is it sufficient for a search engine to cough up some hotels across the world when a weary traveler is looking for a good deal in Bangkok! As these consumer expectations shift, companies that want to stay relevant have no choice but to accept the ideas of the consumer revolution as swiftly as possible. For users, switching costs are low: firms can no longer think of “customer relationship management” as providing stickiness for the customer (just like fly paper provides stickiness to the fly). Industries such as real estate and automobiles, whose business models are built on information asymmetries, will quickly lose their revenues to those who increase transparency using data contributed by consumers.

This leaves us several deep questions to ponder, including what the implications are for customer expectations, and what companies can do to address these expectations. This is the social data revolution (SDR).

Yesterday alone, Facebook users issued 21 million friend requests. 17 million requests were accepted. So many new connections, and yet they’re all treated the same—what an oversimplification!

All Facebook links are created equal. But links can differ in strength—for example, a close friend versus a casual acquaintance. Links can be in different categories, like your boss versus a random hookup. And links can be asymmetric—Amy may think that Bob is a good friend, yet Bob may not trust Amy at all! The world is not a binary place.
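
To make the three properties concrete, here is a minimal Python sketch of a link that carries a strength, a category, and a direction. The `Link` type, the 0 to 1 strength scale, and the `asymmetry` helper are hypothetical names chosen for this illustration; nothing here reflects how Facebook actually stores relationships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical directed link: how `source` sees `target`."""
    source: str
    target: str
    strength: float  # assumed scale: 0.0 (stranger) to 1.0 (closest friend)
    category: str    # e.g. "friend", "boss", "acquaintance"


# Amy and Bob each describe the relationship from their own side.
links = [
    Link("Amy", "Bob", 0.75, "friend"),
    Link("Bob", "Amy", 0.25, "acquaintance"),
]


def asymmetry(links, a, b):
    """Return strength(a -> b) minus strength(b -> a); a missing link counts as 0."""
    by_pair = {(link.source, link.target): link.strength for link in links}
    return by_pair.get((a, b), 0.0) - by_pair.get((b, a), 0.0)


print(asymmetry(links, "Amy", "Bob"))  # 0.5: Amy values the link more than Bob does
```

Storing one record per direction is what lets the two sides disagree; a single mutually confirmed edge cannot express that.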

Discovering Discovery: Don’t ask, Do tell

How can we use data to investigate these different properties of links? Today’s social networks do a lousy job of leveraging our existing data. Why do you need to manually confirm my friend request if we’re already calling, IM-ing, and emailing each other all the time? These data sources should be able to make a good guess about the strength and type of our relationship. Why not use existing data sources to propose better default responses?

If we give our networks a richer structure for our links and relationships, we will also be able to discover interesting facts about ourselves. Why is this important? By investigating implicit relations, we can gain insight into our relationships and how they work. For example, I might be surprised to find out that whenever I email my friend John, he always writes me back promptly whereas I always take 10 times longer to respond to him! Armed with this knowledge, I would ask my system to tell me to get my act together and crank out that response if I’m getting too delinquent.
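
The reply-time comparison in that example could be computed from nothing more than message timestamps. The sketch below is one assumed way to do it in Python: the thread format (a chronological list of sender and timestamp pairs) and the function name are invented for illustration, and a real mail client would need to handle multi-person threads and time zones.

```python
from datetime import datetime, timedelta
from statistics import median


def median_reply_delay(messages, replier):
    """Median time `replier` takes to answer the other person in a two-person thread.

    `messages` is a chronological list of (sender, sent_at) pairs.
    Returns a timedelta, or None if `replier` never replied.
    """
    delays = []
    pending = None  # timestamp of the oldest unanswered message from the other person
    for sender, sent_at in messages:
        if sender != replier:
            if pending is None:
                pending = sent_at
        elif pending is not None:
            delays.append((sent_at - pending).total_seconds())
            pending = None
    return timedelta(seconds=median(delays)) if delays else None


start = datetime(2009, 11, 1, 9, 0)
thread = [
    ("me", start),
    ("john", start + timedelta(hours=1)),   # John answers within an hour
    ("me", start + timedelta(hours=11)),    # I take ten hours
    ("john", start + timedelta(hours=12)),
    ("me", start + timedelta(hours=22)),
]

mine = median_reply_delay(thread, "me")
johns = median_reply_delay(thread, "john")
print(mine / johns)  # 10.0: I am ten times slower than John
```

A system that tracks this ratio could nudge me once my pending reply exceeds my friend’s typical turnaround.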

Facebook 1.0 has helped us create an intimate network of our 17,000 friends. Will Facebook 2.0 help us manage them?

Mind the Explicit, Mine the Implicit

What else can data tell us about the quality of our relationships? One way to use data is to figure out differential interest in budding relationships. It’s easy to do this by looking at communication patterns in email, for example: does one person spend hours crafting that perfect email, only to get a reply that took only a few minutes to write? Or has he suddenly acquired a brand new set of favorite books, movies, and music that just happens to match those of his new love interest? People leave rich traces on the web, and we can discover much more about them than the data they explicitly give.

This is only possible if we can look at the user’s history. After all, we can only make inferences about our behavior if we have a past to compare it against. But this introduces new questions: how much would you pay to know how long Monty spent writing you that email? How much would you pay to keep your data private?

Trust Networks

Social networks are also great for learning about trust. Let’s say that I’m thinking of entering into a business deal with you, but I don’t know you too well. Should I trust you?

There’s an easy way to use the power of networks to answer this question. Let’s just look at all of your other connections: do they trust you? We can give people reputation scores by allowing users to rate their interactions with friends. To make the system even more powerful, we could allow users to link their reputations. To illustrate: let’s say I trust my friend Mike so much that I am willing to attach a trust coefficient of 0.9. This implies that if Mike’s rating goes up by 1, I should get a rating boost of 0.9. Conversely, if someone has a bad experience with Mike and downgrades his rating by 1, my rating will also go down by 0.9. Through the power of the community, reputation ratings would spread quickly. (What trust coefficient would you attach to the author of this post?)
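
The Mike example can be written down directly. The Python below is a minimal sketch under stated assumptions: the `ReputationNetwork` class and its method names are hypothetical, ratings start wherever you set them, and each person is updated at most once per rating change so that circular vouching cannot loop forever. That last rule is a simplification added here, not something the example above specifies.

```python
class ReputationNetwork:
    """Hypothetical sketch of linked reputations with trust coefficients."""

    def __init__(self):
        self.ratings = {}  # person -> reputation score
        self.vouches = {}  # person -> {voucher: trust coefficient}

    def add_person(self, name, rating=0.0):
        self.ratings[name] = rating
        self.vouches.setdefault(name, {})

    def link(self, voucher, vouchee, coefficient):
        """`voucher` ties their own reputation to `vouchee` with the given coefficient."""
        self.vouches[vouchee][voucher] = coefficient

    def adjust(self, person, delta, _seen=None):
        """Change a rating and pass a scaled share on to everyone who vouched."""
        seen = set() if _seen is None else _seen
        if person in seen:
            return  # assumption: update each person once to break cycles
        seen.add(person)
        self.ratings[person] += delta
        for voucher, coefficient in self.vouches[person].items():
            self.adjust(voucher, delta * coefficient, seen)


net = ReputationNetwork()
net.add_person("me")
net.add_person("Mike")
net.link("me", "Mike", 0.9)  # I trust Mike with coefficient 0.9

net.adjust("Mike", 1.0)      # someone rates Mike up by 1
print(net.ratings)           # {'me': 0.9, 'Mike': 1.0}
```

Because the share is multiplied by the coefficient at every hop, a change fades as it travels further from the person who was actually rated.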

Reward Content Generation

One of the best ways to engage users is to get them to understand how every bit of data they contribute will end up benefiting them. In the example of trust networks, people can improve their own reputations by linking themselves with others. In my previous post on communication, I talked about a system where providing feedback on an email’s relevance would directly benefit you in the future. Online social networks need to reward people for providing explicit data, too.

The Facebook Feed was a brilliant idea for surfacing relevant content created by friends. Ideally, the Feed would create a positive feedback loop: good content provided by friends would get high ratings, which would motivate them to post even more good content. However, an early system of allowing users to rate the submissions of their friends was poorly designed: only 21% of users used the feature. On a rainy day, April 15, 2008, Facebook turned off the feedback system. What a step backward! I wish Facebook had instead created a better machine learning system to reward its users for generating and surfacing good content.

Social networks based on mutually confirmed binary relations were Day One in the evolution of social networks. Richer semantics and more expressive structures, including trust coefficients, are the beginning of Day Two. What will the second week bring?

Posted via web from sdn's posterous
