Chapter 8

A Journalist’s Take on Open Data

Journalists are bad at math. No, really. We’re really bad at math. The joke goes that we all went into media because we’re unable to figure out the proper tip on a restaurant check.

Nonetheless, data is not foreign to reporters. We regularly comb financial reports to pump out quarterly earnings or interpret annual municipal budgets. At times, governments, nonprofits, and researchers are kind enough to do the heavy-lifting for us, providing executive summaries, bullet points, and numbers, broken out into figures that can easily be turned around on deadline. In larger newsrooms, there are teams that specialize in computer-assisted reporting (CAR). They may work with graphics and application teams to crunch numbers and visually display them, utilizing interactive graphics, maps, and charts.

I was lucky enough to have worked at places like the New York Times and the Wall Street Journal, which had graphics editors and special projects managers who coordinated with reporters and editors on long-term projects. These were often the stories that would run on page one and were given additional resources from photo and graphics departments.

Such stories are the news analysis items, features that would be referred to in-house as a “tick tock.” While the story carries the byline of one or two reporters, there is often a large team of contributors, some of whom appear with production credits on interactive graphics that are produced and paired with the piece.

News analysis items allow media organizations to break away from the day-to-day rush of breaking news and concentrate on a story to extrapolate the information and get to the underlying reasons for a policy being enacted. These types of stories examine relationships over a period of time to unearth new information or contextualize data that, at first glance, seems too abstruse for the general public.

While recounting personal and dramatic narratives is always a focus, obtaining documents and records is just as important. Big data projects are nothing new to newsrooms. Some of the more renowned ones would end up winning awards and prompting government action or public outcry. The Sun Sentinel won a Pulitzer Prize for public service this year for its series on speeding cops, which utilized a database of transponders to determine that many cops were averaging speeds around or in excess of 120 mph, posing a significant safety risk (The Pulitzer Prizes, 2013; Sun Sentinel, 2012).

Investigative and special projects teams at news outlets can analyze data that is extrapolated from computers or with pencil and paper. While many news outlets are coping with diminishing staffs, reporters and editors have adapted, utilizing workshops and conferences to learn data and digital skills to aid in their reporting. At WBEZ, the Chicago-based public radio station where I work as a data reporter and web producer, we’ve invited some of these individuals to train and educate our news staff on working with new tools, analytics, and large data.

News organizations (if they have the money) will sometimes employ researchers and database analysts who assist reporters in making sense of government reports, budgets, and almost everything that is obtained painfully via Freedom of Information Act (FOIA) requests. More often than not, reporters will wait tirelessly for a FOIA request, only to get a PDF or paper file, instead of an electronic file, with blacked out portions of police reports, instead of vivid accounts—and sometimes an outright denial, which requires an exhausting appeal.

That’s just for the hard-to-get information, though. If a reporter needs a quick figure or the name of a city vendor, it requires calling a public affairs officer, who has to track down an employee to look up said information. It’s absurdly cumbersome and makes it harder to have fleshed out details on deadline, much to the chagrin of stressed journalists.

Fortunately, the access to some of that information is now changing. I experienced firsthand the effects of this shift on journalism when I moved back to Chicago just as the city’s open data movement was taking off. As a bit of background, I’m a Chicago native. Born on the West Side, I resided in the largely quiet Northwest Side neighborhood of Jefferson Park, attending Catholic school on the city’s North Side in the Lake View neighborhood.

I attended Columbia College in 2002, majoring in journalism with a concentration on news reporting and writing for newspapers. I was an intern web producer at Chicago’s CBS affiliate in 2004, a reporting intern at the Chicago RedEye in 2005, and a multimedia intern at the Chicago Tribune before graduating in 2006. After graduating, I had an internship as a web producer at the New York Times, and in 2007, became a senior web editor for the New York Daily News.

Eventually, I found my way to the Wall Street Journal (WSJ), where my responsibilities ranged from producing web content and managing the homepage to growing the paper’s online news presence over the weekends. My responsibilities eventually evolved to include the production of mobile applications.

In the summer of 2010, I was leading a lot of the production for the WSJ’s iPad and mobile apps and helped test and launch the paper’s Asia and Europe iPad editions. Working at the WSJ was a lot of fun. The paper evolved at a fast pace, shifting from one that focused on long-form, “day after” news to one that had to keep pace with financial news competitors that put a premium on the speed of proprietary information.

Chicago is a hard city to leave, though, and I felt the need to return to my hometown in 2011, finding a place at WBEZ, the public radio station I spent much of my college nights listening to while banging out stories for my reporting classes.

During my first year back, I had to reacclimatize to Chicago’s news scene. I was lucky that I was coming home under a new mayor, which meant all of Chicago’s reporters were starting fresh. Mayor Rahm Emanuel wasted no time in shaking up the city’s departments with a flurry of new appointments, which included the creation of the city’s first Chief Data Officer and Chief Information Officer, a role filled by Brett Goldstein.

Goldstein and his newly formed Department of Innovation and Technology were quick to tap into the active group of civic hackers and developers, some of whom authored a smattering of blogs that relied on manual data collection at times. One worth noting was run by a civically active biker named Steve Vance. Vance blogs about transportation and biking issues, but has also created and hosts a number of GIS files, which include bike crash incidents and bike routes.

Goldstein’s employees would actively ask civic developers what they needed. Those developers would start projects, but then say they needed a particular map file or dataset. Goldstein’s department would then set into motion the release and automatic updates of those files via the city’s data portal site.

I turned my attention to the issue of crime in the city at the same time all this information was released. There was a noticeable uptick in violent crime that year, and news organizations began to capitalize on mapping applications to aggregate crime data. RedEye reporter Tracy Swartz made a notable effort by manually compiling data from the Cook County Medical Examiner on homicides in the city.

The data was obtained the old-fashioned way: retrieving reports, which she then compiled into tables to list victim names, age, race, and gender, as well as the occurrence and times of crimes. That dataset allowed the Tribune (RedEye’s parent company) and others to visualize where Chicago’s murders were happening, parse it by date, and note whether it was by gunshot or other infliction.

That same year, Goldstein’s department began to compile and release datasets, which included crime stats and the city’s GIS map files. I was relatively new to mapping at the time. While at the Wall Street Journal, my interactions with mapping involved updating paths of hurricane maps or the locations of restaurants the paper had reviewed. That summer, though, there were highly publicized robberies and violent assaults in Chicago’s gay entertainment district, called Boystown. I utilized the crime data, with the help of my intern Meg Power, to map out violent crime in the neighborhood.

While there was an increase in robberies, overall crime in the neighborhood was roughly the same or decreasing year after year. You wouldn’t be able to tell that from the city’s news coverage. A viral video of a fight and subsequent stabbing that injured one person on the Fourth of July weekend had news vans parked in front of gay bars for a week.

There was a certain level of hyperbole that would trump crime data at times as a spate of “flash mobs” garnered attention from even national news outlets. The Wall Street Journal ran the headline “Chicago Police Brace for ‘Flash Mob’ Attacks.” I covered a contentious town hall meeting on policing, where one resident, clad in designer clothes, described the flower-festooned streets of Chicago’s Lake View neighborhood as a “war zone.” In the summer of 2011, there were moments where cellphone robberies on the CTA transit were being shared on social media en masse by news outlets.

Both officials and residents would cite data, but interpret it differently. With regard to the city’s homicide numbers, does one count the death of an individual killed by a CPD officer? With regard to crime on the CTA, do you count crimes at bus stops? Or do you include all crime or only violent crime, as the CTA and police have? (See Ramos, 2012; 2013.)
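The counting questions above can be made concrete with a small sketch. The records below are invented for illustration and do not come from any real CTA or police dataset; the field names (`location`, `violent`) and the function `count_incidents` are hypothetical. The point is only that the same rows produce different totals depending on which counting rule is stated.

```python
from collections import namedtuple

# Hypothetical incident records: invented values, not real crime data.
Incident = namedtuple("Incident", ["location", "violent"])

INCIDENTS = [
    Incident("train", True),
    Incident("train", False),
    Incident("bus", False),
    Incident("bus_stop", True),
    Incident("bus_stop", False),
]


def count_incidents(incidents, include_bus_stops, violent_only):
    """Count incidents under one explicitly stated counting rule."""
    total = 0
    for incident in incidents:
        # Rule 1: decide whether bus stops count as "on transit".
        if incident.location == "bus_stop" and not include_bus_stops:
            continue
        # Rule 2: decide whether nonviolent incidents count.
        if violent_only and not incident.violent:
            continue
        total += 1
    return total


# Two defensible rules applied to the same five rows.
broad = count_incidents(INCIDENTS, include_bus_stops=True, violent_only=False)
narrow = count_incidents(INCIDENTS, include_bus_stops=False, violent_only=True)
print(broad, narrow)
```

Under the broad rule all five rows count; under the narrow rule only one does. A story that cites either figure without naming the rule invites exactly the kind of dispute described here.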

There were times when crime data was the story ahead of the people affected. I’ve heard police officials cite homicides using terms like “down this quarter over last year,” as if the city were reporting quarterly earnings to investors. I would see similar reports from news outlets, which would use murder tallies as the main emphasis in lieu of reporting on the social issues that caused the crime. WGN TV recently reported on how the media covered the city’s homicides. One of the interviewees coined the term “scoreboard reporting,” alluding to regular roundups done by outlets for weekend violence: eight shot last night, eight shot in Roseland, eight shot in Englewood (Hall, 2013).

In 2011, it was surreal to see seasoned newspapers and TV stations respond to relatively minor crimes—long a part of Chicago’s fabric—as if a mugging was something new. Some were making Twitter updates in high frequency in order to be the first, capitalizing on the referrals to their websites—and therefore, increased pageviews.

I’ve personally been a vocal critic of this practice, as it ignores the very real possibility of stoking hysteria and does little to inform the public about the underlying social problems contributing to the crime. I’m not at all saying that crime reporting shouldn’t be done, but I take issue with a series of retweets about a person getting their iPhone or iPad stolen before the facts of the case are known.

The result may cause officials or police to rush an effort. It may cause hyperbole, often racially charged, to boil in the ether of social media, when the facts are not yet known. Such was the case when a woman made up a story about being robbed of $100,000 worth of jewelry on Michigan Avenue (Sudo, 2013). One former Cook County Prosecutor recently resigned, alleging she was demoted when she dropped charges in a flash mob case that received a negative media backlash (Dudek, 2013).

That winter, I found another use for mapping as the City Council began to redistrict the boundaries of Chicago’s fifty wards. I needed to show the public how the current ward map looked and then do a comparison with the proposed changes. I was easily able to pull the map file from the city’s data portal site and post to WBEZ’s website by way of Google Fusion Tables, which allowed me to label each of the locations with the ease of editing a spreadsheet.

Getting the map files of the proposals being considered was a bit harder. Those files were not available or posted publicly by the City Council members. What the public did have access to was a massive ordinance proposal that ran dozens of pages. Those ordinances were automatically generated by cartographic software and weren’t really meant to be read by, or make sense to, the general public.

Here is an excerpt of the ordinance text defining part of the boundary of the 13th ward:

...Central Park Avenue to West 65th Street; thence east on West 65th Street to the Grand Trunk Railway; thence south along the Grand Trunk Railway to a Nonvisible Linear Legal/Statistical Boundary (TLID:112050833) located between West 74th Street and West 75th Street; thence west on the Nonvisible Linear Legal/Statistical Boundary (TLID:112050833) located between West 74th Street and West 75th Street to South Pulaski Road... (Office of the Chicago City Clerk, 2012)

Some of the descriptions ran as long as five pages for a single ward.

With the help of our political reporter Sam Hudzik, I was able to obtain many of the GIS files necessary to create a series of maps that outlined the proposed changes. We were given the files by the caucuses of aldermen and outside groups that put forth proposals, but we were unable to get the revised and eventually approved version from the Council’s Rules Committee. The Rules Committee was the body that was involved in the real negotiations about how the city wards were being redrawn. They were unable to provide updated map files because they were constantly changing them all the way up to the final hours of approval.

They even approved the map, knowing that the ordinance, produced largely by software, was littered with errors (so many that an amendment was passed months later to correct the remap). That amendment was fifty-eight pages long.

Oddly enough, a spokeswoman for the city clerk’s office told me there was no mandate for the aldermen to make electronic maps available. This is because the laws governing the redistricting weren’t updated to account for the use of electronic mapping (even though the aldermen were using GIS mapping to redraw the wards). The fact that I still needed to work with outside sources to obtain files, or even file as many FOIAs as I do, explains why some reporters have been skeptical of open data.

In Chicago, reporters have had and currently do have to box with city departments to obtain proprietary data. I had to go a few rounds with the police department when I tried to obtain electronic map files of its gang territory map. The Chicago Police Department originally denied my FOIA request and, after getting it appealed, still gave me the files in flat PDF format. I redrew the entire map manually from the PDFs.

I’ve talked with colleagues at WBEZ and other news outlets that downright do not trust the data available through the portal site. Others who do trust it, say they’ve found it useful, but that a lot of the information they would need is not there.

Do I need to figure out the race or age of a homicide victim? I would have to call the Cook County Medical Examiner’s office. Where do I find out the details of an armed battery? Chicago’s data portal will tell me when and what block the crime occurred on, but I would still need to call the police to find out who the victim was, if that information was even public.

Many will try to tell a story or narrative, but the information they need might be classified due to privacy concerns, or it might not yet have been cleaned up in a machine-readable format from legacy databases. Typos, formatting errors, and other problems within datasets can make reporters question the accuracy or reliability of using data portals.

For me, that’s where it becomes interesting.

What I and other reporters who work with data have realized is that government data is fluid, and that information accounting for city functions with a hundred percent accuracy simply does not exist. This also means that when a PR person gives us figures for a story, they are quite literally using the same data. The reporters, however, can put the onus of that accuracy on the agency they’re quoting, thus leaving the integrity of the reporter and the news outlet intact.

There are times a FOIA request can be denied because the data request is deemed burdensome. And for some, the reporting stops there, and a sentence is tossed in a story saying the data was unavailable. But it might be a little more nuanced than that. If you’re a data nerd, you understand things like database migration, categorizations, and legacy platforms. If you can actually get in touch with a human being who can discuss the nuances or differences between a city department’s old and new systems, then a reporter can narrow a request, obtain the raw data, clean it up, and present the differences to the public. An intern and I encountered this with the city’s towing data, having to refine messy cells with misspellings and typos into a set of two hundred or so categories, down from over a thousand.
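To give a sense of what that kind of cleanup involves, here is a minimal sketch of collapsing misspelled cells into a short list of categories. It is not the method we used on the towing data. The category names are hypothetical, and fuzzy matching with Python's standard `difflib` module is just one possible approach, with a similarity cutoff (0.8 here) that is an arbitrary assumption and would need tuning and manual spot checks on real data.

```python
import difflib

# Hypothetical canonical categories: the real towing categories are not listed here.
CANONICAL = ["abandoned vehicle", "expired plates", "blocking hydrant"]


def normalize(raw, canonical=CANONICAL, cutoff=0.8):
    """Map a messy cell to its closest canonical category, or None if nothing is close."""
    # Lowercase and collapse stray whitespace before comparing.
    cleaned = " ".join(raw.lower().split())
    matches = difflib.get_close_matches(cleaned, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Invented examples of the kind of typos that inflate a category count.
messy_cells = ["Abandonded  Vehicle", "EXPIRED PLATE", "blocking hydrnt", "unknown"]
cleaned_cells = [normalize(cell) for cell in messy_cells]
print(cleaned_cells)
```

Cells that come back as `None` are the ones worth reading by hand or raising with the department, since a silent guess there would end up in the published numbers.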

When I do my own data analysis, it can be fear-inducing, especially when the data is used to explain complex stories. While the data can be as accurate or as whole as the city can make it, the interpretation of that data relies entirely on me. It’s a pretty hard limb for any reporter to climb. There’s no news media method of peer review, unlike when a reporter quotes figures or stats from an academic study. While there are forums like Investigative Reporters and Editors (IRE), there’s no apparatus in place for journalists to share data, for fear of being scooped by a competitor, compromising those coveted pageviews.

I found that I had to start writing my stories with lengthy methodologies or even explain that it wasn’t possible to get an up-to-date record, but that what was available still conveyed an overall trend. This was the case when I mapped out abandoned properties and vacant lots to measure the city’s urban blight as juxtaposed to public school closings.

Once, I had to report on the effectiveness of Chicago’s recent pot-ticketing program, which was touted as a way for police to focus on violent crime and not lock up nonviolent drug offenders. The ordinance said that pot possession under fifteen grams could be a ticket, but the arrest data the city tracked could only identify amounts above or below thirty grams.

WBEZ isn’t a stranger to using data. The station has teamed up with outside groups to gather data for use in its reports. Catalyst, which covers education in Chicago, often teams up with WBEZ to analyze Chicago Public Schools data.

WBEZ is a public radio station, which is a lot different than working at a corporate newspaper. Public radio stations regularly partner with other nonprofits and community groups.

In that spirit, it’s through a partnership with the Smart Chicago Collaborative and the Chicago Community Trust that WBEZ began to formalize its data journalism effort. The Smart Chicago Collaborative is a civic organization, which believes it can use technology to enhance the lives of those in Chicago.

They approached the station about doing more data-based journalism. In turn, they would provide a grant that would assist us financially with resources for the project.

Daniel X. O’Neil is the group’s executive director. He’s been instrumental in connecting the station to resources we didn’t even know we needed. On the technology end, we’ve been slowly building up infrastructure that would give the station’s journalists a toolset to aid their reporting. O’Neil also helped connect WBEZ with the data community, which included developers and data scientists. They would lend their expertise, coming into the station and giving training sessions about how to handle and interpret data.

As I started to cover the use of data in the city, I found it to be the most unusual thing I’ve ever come across. City officials usually work with nonprofit groups for city improvement projects. Aldermen, police, and transportation officials regularly meet for community feedback on city projects, but the way that Goldstein’s department interacted with the open data community in Chicago was downright surreal for a reporter.

They were attending hack nights and responding to emails or Tweets faster than most city employees I’ve ever seen. They even take my phone calls directly when I find an error in a dataset. It was downright unsettling to be talking with city officials, saying it would make reporters’ lives easier if we had a particular dataset regularly, then having them respond with, “Let’s see what we can do to get you that.” My imagination internally cuts away to Star Wars’ Admiral Ackbar screaming, “It’s a trap!”

While I’m sure that Mayor Emanuel is not going to launch a counter-offensive against the rebel forces of the data community, I’m left wondering what the limitations of data are when information is not entirely available in a machine-readable way.

There is a litany of privacy issues when it comes to health departments releasing datasets on patient information, as well as when police departments release information on crime victims. Also, as one of Goldstein’s employees put it, cities don’t really deal with a lot of personal data, only incidents and locational data.

This means that while it’s helpful to get CTA ridership breakdowns and totals for trains and buses, I’m not expecting a situation where the CTA’s security footage is regularly opened up to the public.

A listing of city vendors, contracts, and employee receipt reimbursements is a vastly helpful resource, but a considerable amount of reporting is required to contextualize it. I am regularly pairing datasets from the portal site with datasets obtained via FOIA.
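Pairing two such datasets usually comes down to joining them on a shared identifier and paying attention to what fails to match. The sketch below assumes both sources carry a common `vendor_id` field, which is an assumption for illustration. All rows, field names, and the function `pair_by_vendor` are invented and do not reflect the actual structure of the city's portal or of any FOIA response.

```python
# Hypothetical rows: invented keys and values, not taken from any real dataset.
portal_contracts = [
    {"vendor_id": "V001", "vendor_name": "Vendor A", "contract_amount": 120000},
    {"vendor_id": "V002", "vendor_name": "Vendor B", "contract_amount": 45000},
]
foia_payments = [
    {"vendor_id": "V001", "paid_amount": 150000},
    {"vendor_id": "V003", "paid_amount": 9000},
]


def pair_by_vendor(contracts, payments):
    """Join contract rows to payment rows on vendor_id, keeping mismatches visible."""
    payments_by_id = {row["vendor_id"]: row for row in payments}
    contract_ids = {row["vendor_id"] for row in contracts}

    paired = []
    for contract in contracts:
        payment = payments_by_id.get(contract["vendor_id"])
        # A contract with no payment record keeps a None rather than being dropped.
        paid = payment["paid_amount"] if payment else None
        paired.append({**contract, "paid_amount": paid})

    # Payments to vendors that never appear in the contract list.
    unmatched = [row for row in payments if row["vendor_id"] not in contract_ids]
    return paired, unmatched


paired, unmatched_payments = pair_by_vendor(portal_contracts, foia_payments)
print(paired)
print(unmatched_payments)
```

The unmatched rows on either side are often where the reporting starts: they may be typos in an ID, or they may be a question for the agency.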

Part of the problem is that this is still relatively new territory. Considering that Chicago’s program is a fairly recent development, we’re in a lot better shape than other municipalities I’ve seen. I tried finding datasets similar to Chicago’s on other cities’ data portals and was unable to find matching records.

Also, reporters must get more involved with the city in releasing sets. Often, the city won’t know that an obscure set is useful or has intrinsic news value unless it’s brought to their attention.

I’m also worried that some news outlets, which may be pressed with fewer resources and greater demands to churn out content, may not spend time to contextualize the data. There can be the temptation to take a cursory top ten list of a particular dataset because it’s easy to write a story on deadline that involves plucking a number from the data portal site and fitting a few quotes around it.

That said, there still is a great amount of work being done by some tireless reporters in this city and beyond through the use of data. We’re in the middle of an information renaissance, and while privacy is a very real fear, giving the Fourth Estate the ability to match a government’s ability to process and analyze data may even the odds.

About the Author

Elliott is a data reporter and Web producer for WBEZ, a Chicago-based public radio station. Elliott’s reporting ranges from enterprise feature stories to data visualizations. He previously worked as a web editor at the Wall Street Journal, where he specialized in managing breaking news online and the production of iPad, Android, and mobile applications as the paper’s mobile platform editor. Prior to that, he was a senior Web editor at the New York Daily News and interned as a Web producer at the New York Times. Elliott graduated from Columbia College Chicago in 2006 with a B.A. in journalism, having interned at the Chicago Tribune, Chicago RedEye, and WBBM CBS2 News. Elliott is a Chicago native, hailing from the city’s Jefferson Park neighborhood.


Elliott Ramos
Data Journalist
Formerly at WBEZ
Reporter first and foremost, using technology to enable me to do digital story-telling. Native Chicagoan.