Facebook and Tumblr Journalism: Why They Should Release More Data

A little over a year ago, the social blogging platform Tumblr dipped its toe into journalism with a new site called Storyboard. Staffed by three professional journalists, it aimed to highlight the neat things people were doing with Tumblr, and produced some interesting work. But on Tuesday, Tumblr axed the department. Analysts quickly concluded that the company, which had yet to turn a profit, had simply lost its appetite for such experiments.

Other social networks’ editorial side projects haven’t fared much better. Last year, Facebook hired journalist Dan Fletcher as the "managing editor" of a project called FacebookStories that was supposed to illuminate how important Facebook had become to people’s lives, mostly through feel-good vignettes and blurbs contributed by names you might recognize. Last month, he announced he was leaving of his own accord, because the site didn't actually need reporters (although a Facebook spokesman says FacebookStories will continue). "There is no more engaging content Facebook could produce than you talking to your family and friends," Fletcher told a college audience, explaining his departure.

The problem with Storyboard and FacebookStories isn’t that Tumblr or Facebook wanted to generate editorial content, or even that they only wanted to do so to draw attention to their own users. It’s hard to sift through social media sometimes, and platforms should highlight the best content they host. Rather, the problem is that both companies misunderstood their most valuable journalistic product: not puffy human interest stories, but the aggregate data they gather about how people behave online.

Most of the time, we hear about data as the stuff social media sites sell to advertisers. But it can be tremendously revealing about social dynamics. The best example of this is the dating site OKCupid, which for two years maintained a blog using statistics gleaned from its user base. It was fascinating stuff: Women of all races strongly prefer white men, people lie all the time about their appearance, and straight people have gay sex too. OKCupid didn't just write it for kicks, though. It also played a key role in bringing people to the site.

"I wrote it with the dinner party in mind," says co-founder and blog author Christian Rudder. "Like people talking about shit they heard on NPR. 'This site OKCupid has this blog.' It made it easier for people to talk about online dating in a third person way." (The blog went dark after OKCupid was purchased by Match.com, which publishes occasional surveys; Rudder has also been busy working on a book-length version.)

The value of such data is also clear with a site like Twitter, which doesn’t analyze itself much but makes a good slice of its data available to outside researchers, who’ve used it for everything from mapping the spread of swine flu to predicting the stock market. The data becomes more valuable every year, as more people create accounts and make sample sizes more representative of the overall population.

Facebook, on the other hand, is stingier with its data. Although it does a ton of internal analysis, the company keeps its most interesting insights private, aside from occasional papers. Journalists and academics would have a field day measuring how information spreads, to what extent people self-segregate into identity-based communities, the prevalence of racist speech—name your question, and Facebook can probably answer it. But right now, its application programming interface only covers public profiles, which isn't much to go on for academics, since people tend to be more private on Facebook than on Twitter.

"It is a lot harder to get data from Facebook," says Chris Vargo, a PhD student at the University of North Carolina-Chapel Hill who uses social media for political science research. "And that's why there are so many fewer studies on it."

There are two main reasons for this. First, Facebook is tiptoeing around privacy and worries that crunching user data, even anonymously, could make customers aware of just how much the company knows about them. (Facebook isn’t unique in this regard; Uber, an app for hailing taxis, publishes some analysis but worries that customer identities could be derived from raw geolocation data if it were to give that data out. That’s a real concern, but it could be ameliorated by entrusting the data only to credible institutions with research codes of ethics.)

Second, Facebook already sells that data to advertisers; it might be less valuable if the company gave it away for free to journalists and academics. Facebook used to feature a tool called Lexicon, which was like Google Trends, but killed it in 2010 in favor of "focusing development on our analytics tools for Page owners, advertisers and Platform developers."

Facebook does have some social consciousness. It already tries to encourage organ donation and voting by creating badges for people to broadcast their virtuous choices, for example, and CEO Mark Zuckerberg is now using his massive platform to push for immigration reform. Freeing its data, however, has perhaps the greatest potential for good. The same goes for Tumblr, LinkedIn, and any number of other social networking and online gaming sites. They don’t need to employ “journalists” to tell small stories about their users. They should at least help real journalists and other researchers tell big stories about the rest of the world.