A selection of data projects
Here is an introduction to some of the data journalism I have produced in recent years. Some of it was done during my secondment to the Financial Times' data team, and some during my time as a journalist with CNN's data team in Atlanta. Finally, some advanced research was conducted while I was a statistics student at the French Grande Ecole ENSAE.
R Data Visualisation - 2020
Recently, I have been practicing my data visualisation skills with R using tidyverse and access to online data such as US flight, diamond and US car consumption data. Here are some of the charts I was able to create:
This box plot shows the median, interquartile range and outlying values of fuel efficiency for various classes of car in the US. You can see that SUVs have lower fuel efficiency, meaning they travel fewer miles per gallon, than subcompacts or two-seaters.
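For readers who want to see what a box plot summarises, here is a minimal sketch in Python (the original charts were made in R with tidyverse). It computes the median, quartiles and interquartile range, and flags outliers with the common 1.5 × IQR rule. The function name `box_plot_summary` and the mileage figures are invented for illustration and are not taken from the dataset behind the chart.

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box plot draws: quartiles, IQR and outliers."""
    # "inclusive" treats the data as the full population and interpolates
    # between observed values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Common convention: points further than 1.5 * IQR from the box
    # are drawn individually as outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical highway mileage (miles per gallon) for one class of car.
# These numbers are made up for the example.
suv_mpg = [14, 15, 17, 17, 18, 19, 20, 26]
summary = box_plot_summary(suv_mpg)
print(summary)
```

Running the same function on each class of car and comparing the medians gives the kind of comparison the chart makes visually.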
This bar chart shows the proportion of diamonds at each clarity level for each type of cut. Diamonds with relatively high clarity (very few inclusions or blemishes), shown in yellow and light green, make up a higher proportion of ideal and premium cuts. These two cuts represent the best shape a diamond can have and the best ability to interact with light. In contrast, a fair-cut diamond is more likely to have low clarity, that is, many inclusions and blemishes. This relationship between clarity and cut is probably not the result of any natural process but of market forces, which lead diamond manufacturers to focus their cutting efforts on high-clarity diamonds rather than low-clarity ones.
This scatter plot, which also includes a smoothed trend line, shows the relationship between a car's engine size (displacement, in litres) and its fuel efficiency (in miles per gallon), and gives a visual sense of where each class of car stands. For small engines, there seems to be a negative correlation between engine size and efficiency: the bigger the engine, the less efficient the car. For cars with engines larger than five litres, however, the relationship seems to reverse. That second observation should be taken with a grain of salt: there are only a limited number of data points for large engines, and those points are far apart, so there might not be such a strong link between big engines and fuel efficiency.
Big Tech workers support left-leaning candidates - Financial Times - February 2020
In this FT story, I worked with the data team on a project that involved pulling a dataset from the US Federal Election Commission (FEC) website, then analysing it using R. The final story shows that people working for tech companies in the US support far-left-leaning candidates more than they support conservative Democrats. My contribution was to update an R script to include more recent FEC data and export charts. I then reworked the charts in Adobe Illustrator to fit FT standards and wrote some of the analysis for the story. Below are a few charts from that story:
Seconded to the Financial Times’ data team in 2020, I produced “Datawatch,” a daily chart on the paper’s front page revealing an interesting fact about world economics or politics. My contribution involved finding new and relevant data, formatting it into an FT chart (usually a line chart, bar chart or dot plot) and writing the text accompanying the chart (headline, subhead and body text). Below are a few examples:
Working for a year as a journalist with CNN’s data journalism team in Atlanta, I was lucky to delve into a range of projects using R, SQL, Excel and a large array of investigative skills (FOIA requests, surveys, research and interviews). I was able to produce collaborative breaking news and investigative stories about the US health industry and politics.
The Opioid Crisis - CNN - March 2019
For this project, we pulled data from the US Centers for Medicare & Medicaid Services, specifically data about opioid prescriptions and payments received by doctors from pharmaceutical companies. We showed that doctors receiving payments from Big Pharma were more likely to prescribe opioids, contributing to the opioid crisis plaguing the country. My contribution was a regression analysis in R, looking at the correlation between opioid prescriptions and pharmaceutical payments across the database. I found consistent evidence of a correlation between the two. More specifically, each additional $1 paid to a doctor was associated with about 2.1 more pills prescribed to that doctor's patients. The final piece did not include the R analysis, but a SQL analysis of the 100 doctors who prescribed the most opioids and their relationships with Big Pharma.
Trump's plan to kill Energy Star could benefit his properties - CNN - April 2017
For this project, my colleague and I found several databases of Energy Star commercial property rankings, published on city websites as part of a programme that large cities mandate for buildings to track their CO2 emissions. We then cross-referenced these databases with President Trump's properties in those cities to get an idea of how Trump's real estate ranked in terms of carbon emissions. This led to a breaking investigative story, picked up by the Los Angeles Times and The Independent, about how the president had a conflict of interest in cutting the programme, which consistently graded his properties poorly.
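The cross-referencing step described above can be sketched in Python as a join on a normalised address. This is only an illustration of the general technique, not the method or tools used for the story. The field names (`address`, `score`, `name`) and all records are hypothetical.

```python
def normalise(address):
    """Lower-case an address and strip punctuation and extra spaces."""
    cleaned = address.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def cross_reference(ratings, properties):
    """Return the properties that also appear in the ratings list."""
    # Index the ratings by normalised address for fast lookup.
    by_address = {normalise(r["address"]): r for r in ratings}
    matches = []
    for prop in properties:
        rating = by_address.get(normalise(prop["address"]))
        if rating is not None:
            matches.append({"name": prop["name"], "score": rating["score"]})
    return matches


# Hypothetical records, invented for the example.
city_ratings = [
    {"address": "1 Example Plaza, New York", "score": 44},
    {"address": "200 Sample St., Chicago", "score": 81},
]
owner_properties = [
    {"name": "Tower A", "address": "1 example plaza new york"},
    {"name": "Tower B", "address": "9 Nowhere Ave, Boston"},
]

matched = cross_reference(city_ratings, owner_properties)
print(matched)
```

In practice, addresses rarely match this cleanly, so matches found this way would still need to be checked by hand.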
Is the Phillips Curve still relevant? - ENSAE - 2013
In this project, I conducted a time series analysis of general inflation and unemployment in France between 1998 and 2012 to determine whether William Phillips' concept of a negative correlation between unemployment and inflation rates is still a pertinent framework for understanding unemployment today. The analysis was done using data from INSEE, the French national statistics institute, and the statistical software SAS.
Media and feeling of safety in France - ENSAE - 2012 (in French)
This dissertation aimed to determine whether watching TV or consuming news in various forms in France affected people's feeling of safety at home and outside their homes. The essay relies on descriptive statistics drawn from the 2008 European Social Survey data set, with a focus on the French population.
We focused on several survey questions and used the answers to build indicators of whether an individual felt safe or not. We also used data on media consumption to classify each individual as a heavy or light consumer of news.
Using SAS, we found that there didn't seem to be a very strong connection between media consumption and feeling of safety, although people who consumed an overwhelming amount of news tended to feel less safe.
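The indicator approach described above can be sketched in Python as follows; the original work was done in SAS. The variable names, the 1 to 4 answer scale, the two-hour threshold and the respondents are all assumptions made up for this example. They are not the survey's actual coding or results.

```python
from collections import Counter


def feels_safe(answer):
    """Assumed scale: 1 = very safe ... 4 = very unsafe."""
    return answer <= 2


def is_heavy_consumer(news_hours, threshold=2.0):
    """Assumed cut-off: two or more hours of news per day."""
    return news_hours >= threshold


def share_feeling_safe(respondents):
    """Share of respondents who feel safe, by news consumption group."""
    totals = Counter()
    safe = Counter()
    for person in respondents:
        group = "heavy" if is_heavy_consumer(person["news_hours"]) else "light"
        totals[group] += 1
        if feels_safe(person["safety_after_dark"]):
            safe[group] += 1
    return {group: safe[group] / totals[group] for group in totals}


# Hypothetical respondents, invented for the example.
sample = [
    {"news_hours": 0.5, "safety_after_dark": 1},
    {"news_hours": 1.0, "safety_after_dark": 2},
    {"news_hours": 1.5, "safety_after_dark": 3},
    {"news_hours": 1.0, "safety_after_dark": 2},
    {"news_hours": 3.0, "safety_after_dark": 2},
    {"news_hours": 4.0, "safety_after_dark": 4},
]

shares = share_feeling_safe(sample)
print(shares)
```

Comparing the shares between the two groups is a descriptive step only. It does not show whether news consumption changes how safe people feel.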