Our next data analysis tool is Tableau. I used Tableau Public to create some initial charts exploring the Fake News dataset from Kaggle.
Stories per Country
As mentioned in the previous post, the subset dataset is heavily weighted toward the US, judging by the entries in the “Country” field. The pie chart below shows each country by the sum of its associated “Stories”, i.e. rows. Note how disproportionately the US dominates, with 5,087 stories. DE (Germany) is second with 172, and GB (Great Britain) is third with 143. Again, it is not entirely clear whether the Country field refers to the publishing website’s origin, or how it was assigned.
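For anyone who wants to check the pie chart's numbers outside Tableau, here is a minimal pandas sketch of the same aggregation. The column name `country` and the handful of sample rows are my own placeholders for illustration, not values from the Kaggle file.

```python
import pandas as pd

# Hypothetical sample rows (NOT the real data); each row stands for one story.
# The column name "country" is an assumption about the dataset's header.
stories = pd.DataFrame({
    "country": ["US", "US", "US", "DE", "GB", "US"],
})

# One row per story, so counting rows per country gives the "Stories" total.
stories_per_country = stories["country"].value_counts()

# Each country's share of all stories, which is what a pie slice represents.
share = stories_per_country / stories_per_country.sum()

print(stories_per_country)
print(share.round(2))
```

On the real file, the same two lines of aggregation should reproduce the per-country totals shown in the chart, provided the column is named as assumed.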
Popularity by Website
The following chart examines the popularity of the websites represented in the data subset, based on the “Domain Rank” field. The chart’s rows are the “Site URL” values and its columns are “Domain Rank”. The top site, Amren, has a domain rank of about 8.7 million. My research on domain ranking suggested a more limited scale (e.g. 1-100), so the nature of these scores is unclear. However, within the context of the dataset, the chart shows 1) the websites included and 2) their Domain Rank relative to each other. Note that the charts had to be screen-captured for inclusion here, and this one extended down multiple screens.
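A rough pandas equivalent of this bar chart is sketched below. The column names `site_url` and `domain_rank`, the site names, and the values are all hypothetical. Tableau typically aggregates a measure with SUM when it is dropped onto a shelf, so one possibility (an assumption on my part, not something the chart confirms) is that each bar is the rank summed over all of a site's rows. The sketch computes both the per-site value and the summed value so the two can be compared.

```python
import pandas as pd

# Hypothetical sample rows (NOT the real data); column names are assumed.
stories = pd.DataFrame({
    "site_url": ["site-a.example", "site-a.example",
                 "site-b.example", "site-c.example"],
    "domain_rank": [900, 900, 300, 1500],
})

# For each site, keep the single rank value and also the sum over its rows.
rank_by_site = (
    stories.groupby("site_url")["domain_rank"]
    .agg(per_site="first", summed="sum")
    .sort_values("summed", ascending=False)
)

print(rank_by_site)
```

If the summed column matches the chart and the per-site column does not, the very large scores would simply reflect how many stories each site contributed.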
Popular Websites and Stories Count
Building on the bar chart above, I added the number of records, i.e. the sum of “Stories”, to Rows. The scatterplot below shows not only the most popular websites by Domain Rank, but also how many stories each contributed to the dataset. Amren, the most popular site, contributed nearly the most stories, at 93. Liberal America contributed the most, at 95, but is much less popular, with a Domain Rank of around 2.8 million.
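The table behind a scatterplot like this one can be sketched in pandas as follows: one row per site, with a story count and a Domain Rank. As before, the column names and sample values are hypothetical, and I follow this post's reading that a higher Domain Rank means a more popular site.

```python
import pandas as pd

# Hypothetical sample rows (NOT the real data); column names are assumed.
stories = pd.DataFrame({
    "site_url": ["site-a.example", "site-a.example",
                 "site-b.example", "site-b.example", "site-b.example",
                 "site-c.example"],
    "domain_rank": [900, 900, 300, 300, 300, 1500],
})

# One row per site: how many stories it contributed, and its rank value.
per_site = stories.groupby("site_url").agg(
    stories=("domain_rank", "size"),
    domain_rank=("domain_rank", "first"),
)

# The two sites the paragraph above singles out: most stories, highest rank.
most_stories = per_site["stories"].idxmax()
highest_rank = per_site["domain_rank"].idxmax()

print(per_site)
print(most_stories, highest_rank)
```

Plotting `stories` against `domain_rank` from this table would give the same kind of scatterplot, with one point per website.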