For the final project, I created a Tableau Story, viewable on Tableau Public, that examines the popularity of Fake News using a “Drill Down” approach. In other words, we begin by assessing the popularity of the websites and then iteratively narrow down to the popularity of the individual stories they posted.
There are three Story Points:
- Website Domain Rank Popularity
- Website Domain Rank and Published Content Volume
- Story Titles by Facebook Likes
We begin by examining the websites and how much traffic they receive, based on the Domain Rank field. Next, we look at how much content each site published, based on its number of story records. Lastly, we look at the story titles that received the most Facebook Likes, sorted in descending order, and hypothesize a common genre or theme.
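The last story point amounts to a group-and-sort. Below is a minimal Python sketch that assumes hypothetical `title` and `likes` fields and uses made-up sample rows. Tableau sums a measure per dimension value by default, so a title that appears on more than one row has its likes added together before the titles are sorted in descending order.

```python
from collections import defaultdict

# Hypothetical rows: the field names "title" and "likes" are assumptions,
# and the values are made up for illustration.
sample_rows = [
    {"title": "Story A", "likes": 120},
    {"title": "Story B", "likes": 4000},
    {"title": "Story B", "likes": 500},
    {"title": "Story C", "likes": 0},
    {"title": "Story D", "likes": 980},
]

def top_titles_by_likes(rows, n=10):
    """Sum likes per title (like Tableau's default SUM), then return the n highest."""
    likes_per_title = defaultdict(int)
    for row in rows:
        likes_per_title[row["title"]] += row["likes"]
    # Sort (title, total_likes) pairs by total likes, largest first.
    ranked = sorted(likes_per_title.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

top = top_titles_by_likes(sample_rows, n=3)
for title, likes in top:
    print(f"{likes:>6}  {title}")
```

With this sample, "Story B" ranks first because its two rows are combined.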
Our next data analysis tool is Tableau. I used Tableau Public to create some initial charts exploring the Fake News dataset from Kaggle.
Stories per Country
As mentioned in the previous post, the data subset is heavily weighted toward the US, based on the entries in the “Country” field. The pie chart below depicts each country by the sum of its associated “Stories,” i.e., rows. Note the disproportionate share for the US, with 5,087 stories. DE (Germany) is second with 172, and GB (Great Britain) is third with 143. Again, it is not entirely clear whether the Country field refers to the publishing website’s country of origin or how else it was assigned.
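The numbers behind the pie chart are a row count per country. The following minimal sketch assumes that each row is one story with a hypothetical `country` field and uses made-up sample values. Python's `collections.Counter` produces the same kind of tally, along with each country's share of the total:

```python
from collections import Counter

# Hypothetical sample rows; only the (assumed) "country" field matters here.
sample_rows = [
    {"country": "US"}, {"country": "US"}, {"country": "US"},
    {"country": "DE"}, {"country": "GB"}, {"country": "US"},
]

def stories_per_country(rows):
    """Count rows (stories) per country value, largest count first."""
    return Counter(row["country"] for row in rows).most_common()

counts = stories_per_country(sample_rows)
total = sum(count for _, count in counts)
for country, count in counts:
    # Share of all rows, which is what each pie slice represents.
    print(f"{country}: {count} ({count / total:.1%})")
```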
Popularity by Website
The following chart examines the popularity of the websites in the data subset, based on the “Domain Rank” field. The chart’s rows are the “Site URL” values and its columns are “Domain Rank”. The top site, Amren, has a Domain Rank of about 8.7 million. My research on domain ranking suggested a much more limited scale (e.g., 1-100), so the nature of these scores is unclear. However, within the context of the dataset, the chart shows 1) which websites are included and 2) their Domain Ranks relative to each other. Note that the charts had to be included here as screen captures, and this one extended across several screens.
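Because the scale of these scores is unclear, aggregation is one possibility worth checking. By default, Tableau aggregates a measure placed on Rows or Columns with SUM. If each story row repeats its site's Domain Rank (an assumption, not something confirmed here), then a bar per site would show the rank multiplied by the site's number of stories. For instance, if Amren's roughly 8.7 million were such a sum over 93 stories, the per-story value would be about 93,500. The minimal Python sketch below uses hypothetical site names, a hypothetical `domain_rank` field and made-up values to show how SUM and AVG can order the same sites differently.

```python
from collections import defaultdict

# Hypothetical rows: each story repeats its site's Domain Rank (an assumption).
sample_rows = [
    {"site_url": "site-a.example", "domain_rank": 2000},
    {"site_url": "site-a.example", "domain_rank": 2000},
    {"site_url": "site-a.example", "domain_rank": 2000},
    {"site_url": "site-b.example", "domain_rank": 5000},
]

def domain_rank_by_site(rows):
    """Return {site: (sum_of_rank, average_rank)} for each site."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        totals[row["site_url"]] += row["domain_rank"]
        counts[row["site_url"]] += 1
    return {site: (totals[site], totals[site] / counts[site]) for site in totals}

for site, (rank_sum, rank_avg) in domain_rank_by_site(sample_rows).items():
    # SUM favors site-a (more rows); AVG favors site-b (higher per-row value).
    print(f"{site}: SUM={rank_sum}, AVG={rank_avg:.0f}")
```

If the values do turn out to be repeated per row, switching the measure's aggregation to Average in Tableau would display one value per site.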
Popular Websites and Stories Count
Building on the bar chart above, I added the number of records to Rows, i.e., the Sum of “Stories”. The scatterplot below shows not only the most popular websites by Domain Rank but also how many stories each contributed to the dataset. Amren, the most popular site, contributed nearly the most stories, at 93. Liberal America contributed the most, at 95, but is much less popular, with a Domain Rank of around 2.8 million.
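The scatterplot pairs two per-site values: the site's Domain Rank and its count of story rows. The minimal Python sketch below assumes hypothetical `site_url` and `domain_rank` fields, with one Domain Rank value per site, and uses made-up sample rows. It builds one point per site and shows that the site with the most stories need not be the one with the highest Domain Rank value, as with Amren and Liberal America.

```python
from collections import Counter

# Hypothetical rows; site names, field names and ranks are illustrative only.
sample_rows = [
    {"site_url": "alpha.example", "domain_rank": 900},
    {"site_url": "alpha.example", "domain_rank": 900},
    {"site_url": "beta.example", "domain_rank": 300},
    {"site_url": "beta.example", "domain_rank": 300},
    {"site_url": "beta.example", "domain_rank": 300},
]

def site_points(rows):
    """Map each site to (domain_rank, story_count): one scatterplot point per site."""
    story_counts = Counter(row["site_url"] for row in rows)  # one row = one story
    ranks = {row["site_url"]: row["domain_rank"] for row in rows}  # assumes one rank per site
    return {site: (ranks[site], story_counts[site]) for site in story_counts}

points = site_points(sample_rows)
most_stories = max(points, key=lambda site: points[site][1])
highest_rank = max(points, key=lambda site: points[site][0])
print(points)
print("Most stories:", most_stories, "| Highest Domain Rank:", highest_rank)
```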