Juliet Kelson

Real vs Fake News

Mini-Project 2: Adventure 1

Prompt

You’ll pretend that you work for Buzzfeed and have been asked to build an algorithm that automatically flags incoming news stories as “fake” or “real”. You’re provided with the data. This dataset includes the following information on 182 articles from 2016: title, text (content), authors, source, url address, and type (whether the article is real or fake).



Part 1: Process the data

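library(dplyr)    # %>%, mutate(), select(), rowwise()
library(stringr)  # str_count()
library(caret)    # train(), trainControl() in Part 2
library(ggplot2)  # plots in Part 3

buzzfeed <- read.csv("https://www.macalester.edu/~ajohns24/data/buzzfeed.csv")

# Store the free-text columns as character vectors rather than factors
buzzfeed <- buzzfeed %>% 
  mutate(title = as.character(title),
         text = as.character(text),
         url = as.character(url))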

New predictors

Our new predictors will be:

  • Word count
  • Word count in title
  • Upper-case word count
  • Upper-case word count in title
  • Upper-case word ratio in title
  • Exclamation point ratio (! as a share of . ? !)
  • Exclamation point ratio (! as a share of . ? !) in title
  • Sentence Count
  • Syllables
  • % Unique words
  • Flesch Reading Ease
  • Author Count
  • Primary sentiment
  • Secondary sentiment
  • Strength of primary sentiment
  • Strength of secondary sentiment

Making total_words

buzzfeed <- buzzfeed %>% 
  mutate(total_words = str_count(text, " ") + 1)

Making total_words_title

buzzfeed <- buzzfeed %>% 
  mutate(total_words_title = str_count(title, " ") + 1)
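
Counting spaces and adding one assumes that words are separated by exactly one space. As a hedged alternative (not what this report used), counting whitespace-delimited tokens is robust to double spaces, line breaks, and leading or trailing blanks. The name count_words_sketch is hypothetical, and the sketch uses base R so it runs without stringr:

```r
# Hypothetical helper: count runs of non-whitespace characters
count_words_sketch <- function(x) {
  lengths(regmatches(x, gregexpr("\\S+", x, perl = TRUE)))
}

real_title <- "Jeb Bush to lecture at Harvard this fall"
count_words_sketch(real_title)          # 8, same as the space-count approach
count_words_sketch("  two   words \n")  # 2, where counting spaces would overcount
```

For clean, single-spaced text the two approaches agree, so the reported counts for the sample titles are unaffected.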

Making total_upper_case_words

buzzfeed <- buzzfeed %>% 
  mutate(total_upper_case_words = str_count(text, "\\b[A-Z]{2,}\\b"))

Making total_upper_case_words_title and upper_case_word_ratio_title

buzzfeed <- buzzfeed %>% 
  mutate(total_upper_case_words_title = str_count(title, "\\b[A-Z]{2,}\\b"),
         upper_case_word_ratio_title = total_upper_case_words_title/total_words_title)
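
To see what the pattern \\b[A-Z]{2,}\\b picks up, here is a small base R sketch applied to the fake sample title. The helper name count_upper_sketch is hypothetical; the regular expression is the one used above. Because it requires at least two capital letters, single-letter words such as "A" or "I" are not counted, while acronyms such as "CNN" are:

```r
# Hypothetical helper: count whole tokens made of two or more capital letters
count_upper_sketch <- function(x) {
  lengths(regmatches(x, gregexpr("\\b[A-Z]{2,}\\b", x, perl = TRUE)))
}

fake_title <- "Hillary's DEAD!?!? Brand New Theory Has Serious PROOF"
n_upper <- count_upper_sketch(fake_title)                     # DEAD and PROOF
n_words <- lengths(strsplit(fake_title, " ", fixed = TRUE))   # 8 words
n_upper / n_words                                             # 0.25
```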

Making exclamation_ratio

buzzfeed <- buzzfeed %>% 
  mutate(exclamation_ratio = exclamation_ratio(text))

Making exclamation_ratio_title

buzzfeed <- buzzfeed %>% 
  mutate(exclamation_ratio_title = exclamation_ratio(title))
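
The helper exclamation_ratio() is not shown in this report. Judging from the values reported for the two sample titles later in this section (0 and 50), it appears to return the percentage of sentence-ending marks (. ? !) that are exclamation points, and 0 when there are none. A minimal base R sketch under that assumption, with the hypothetical name exclamation_ratio_sketch:

```r
# Hypothetical reconstruction: percent of . ? ! marks that are "!"
exclamation_ratio_sketch <- function(x) {
  n_excl <- lengths(regmatches(x, gregexpr("!", x, fixed = TRUE)))
  n_end  <- lengths(regmatches(x, gregexpr("[.?!]", x)))
  # Assumption: return 0 rather than NaN when no end punctuation is present
  ifelse(n_end == 0, 0, 100 * n_excl / n_end)
}

exclamation_ratio_sketch("Hillary's DEAD!?!? Brand New Theory Has Serious PROOF")  # 50
exclamation_ratio_sketch("Jeb Bush to lecture at Harvard this fall")               # 0
```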

Making total_sentences

buzzfeed <- buzzfeed %>% 
  mutate(total_sentences = nsentence(text))

Making total_syllables

buzzfeed <- buzzfeed %>% 
  mutate(total_syllables = nsyllable(text))

Making unique_word_percent

buzzfeed <- buzzfeed %>% 
  mutate(unique_word_percent = pct_unique_words(text, total_words))
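
The helper pct_unique_words() is also not shown. One plausible reading is "distinct words divided by total words, as a percentage". The sketch below implements that reading in base R; the name pct_unique_words_sketch, the lower-casing, and the stripping of punctuation are all assumptions, so it may not reproduce the report's exact values:

```r
# Hypothetical reconstruction: distinct words as a percentage of total_words
pct_unique_words_sketch <- function(text, total_words) {
  # Assumption: compare words case-insensitively and ignore punctuation
  tokens <- strsplit(tolower(text), "[^a-z']+")
  n_unique <- vapply(tokens, function(tok) length(unique(tok[nzchar(tok)])), integer(1))
  100 * n_unique / total_words
}

pct_unique_words_sketch("The cat and the dog.", 5)  # 4 distinct words out of 5 = 80
```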

Making readability_score

buzzfeed <- buzzfeed %>% 
  mutate(readability_score = flesch_reading_ease(total_words, total_sentences, total_syllables))
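
The helper flesch_reading_ease() is not shown, but the standard Flesch Reading Ease formula, 206.835 - 1.015 (words / sentences) - 84.6 (syllables / words), reproduces the scores reported for the two sample articles. A short sketch (the name flesch_sketch is hypothetical):

```r
# Standard Flesch Reading Ease formula; higher scores mean easier text
flesch_sketch <- function(words, sentences, syllables) {
  206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
}

round(flesch_sketch(163, 7, 308), 2)   # 23.34, the real sample article
round(flesch_sketch(211, 16, 361), 2)  # 48.71, the fake sample article
```

Longer sentences and more syllables per word both lower the score, which is why the real article, with about 23 words per sentence, scores lower than the fake one, with about 13.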

Making author_count

buzzfeed <- buzzfeed %>%
  mutate(authors = gsub("View All Posts,", "", authors)) %>%
  mutate(authors = gsub("View All Posts", "", authors)) %>%
  mutate(authors = gsub("Abc News,", "", authors)) %>%
  mutate(authors = gsub("Abc News", "", authors)) %>%
  mutate(authors = gsub("Cnn National Politics Reporter,", "", authors)) %>%
  mutate(authors = gsub("Cnn National Politics Reporter", "", authors)) %>%
  mutate(authors = gsub("Cnn Pentagon Correspondent,", "", authors)) %>%
  mutate(authors = gsub("Cnn Pentagon Correspondent", "", authors)) %>%
  mutate(authors = gsub("Latest Posts,", "", authors)) %>%
  mutate(authors = gsub("Latest Posts", "", authors)) %>%
  mutate(authors = gsub("Cnn White House Producer,", "", authors)) %>%
  mutate(authors = gsub("Cnn White House Producer", "", authors)) %>%
  mutate(authors = gsub("Cnn Senior Congressional Producer,", "", authors)) %>%
  mutate(authors = gsub("Cnn Senior Congressional Producer", "", authors)) %>%
  mutate(author_count = str_count(authors, ",") + 1) %>% 
  mutate(author_count = if_else(author_count == 0, 1, author_count)) %>% 
  mutate(authors = if_else(authors == "", "None", authors)) %>% 
  mutate(author_count = if_else(authors == "None", 0, author_count))
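
One thing to watch in the cleaning step above: removing a boilerplate entry can leave a trailing comma behind (the appendix shows levels such as "Barbara Starr,"), and counting commas plus one then reports two authors for a single name. As a hedged alternative, not the method used for the results in this report, the sketch below counts non-empty names instead. The name count_authors_sketch is hypothetical:

```r
# Hypothetical alternative: count non-empty, comma-separated names
count_authors_sketch <- function(authors) {
  parts <- strsplit(authors, ",", fixed = TRUE)
  vapply(parts, function(p) sum(nzchar(trimws(p))), integer(1))
}

count_authors_sketch(c("Barbara Starr,", "Dan Merica,Eugene Scott", ""))  # 1 2 0
```

An empty author string yields 0 directly, so no separate "None" step is needed for the count.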

Making primary_sentiment, secondary_sentiment, primary_sentiment_value, and secondary_sentiment_value

buzzfeed <- buzzfeed %>% 
  rowwise() %>%   # apply the sentiment helpers one article at a time
  mutate(primary_sentiment = max_sentiment_type(text, 1),
         primary_sentiment_value = max_sentiment_value(text, 1),
         secondary_sentiment = max_sentiment_type(text, 2),
         secondary_sentiment_value = max_sentiment_value(text, 2)) %>% 
  ungroup()       # drop the row-wise grouping before plotting and modelling
buzzfeed[c(8,153),] %>% 
  select(-text) %>% 
  kable() %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "200px")
Predictor                     Row 8 (real)                              Row 153 (fake)
title                         Jeb Bush to lecture at Harvard this fall  Hillary’s DEAD!?!? Brand New Theory Has Serious PROOF
url                           http://cnn.it/2d7qa20                     http://www.thepoliticalinsider.com/hillary-health-dead-brand-new-theory/?source=RWN
authors                       Ashley Killough                           Featured Commentator
source                        http://cnn.it                             http://www.thepoliticalinsider.com
type                          real                                      fake
total_words                   163                                       211
total_words_title             8                                         8
total_upper_case_words        1                                         2
total_upper_case_words_title  0                                         2
upper_case_word_ratio_title   0.00                                      0.25
exclamation_ratio             0.00                                      47.06
exclamation_ratio_title       0                                         50
total_sentences               7                                         16
total_syllables               308                                       361
unique_word_percent           64                                        76
readability_score             23.34                                     48.71
author_count                  1                                         1
primary_sentiment             trust                                     negative
primary_sentiment_value       34                                        16
secondary_sentiment           positive                                  positive
secondary_sentiment_value     32                                        16




We demonstrate our new predictors on two articles: “Jeb Bush to lecture at Harvard this fall”, the real sample article, and “Hillary’s DEAD!?!? Brand New Theory Has Serious PROOF”, the fake sample article. We started with predictors based on word and punctuation counts in the text and the title. The first of these are the word count and the upper-case word count in the text and the title. The word counts do not differ much between these two articles: both titles have exactly 8 words, and the bodies differ by only 48 words (163 vs. 211).

The number of upper-case words in the text is also similar: one in the real article and two in the fake one. The difference is more pronounced in the titles. There are no upper-case words in the real title, whereas two of the eight words in the fake title are upper-case. We also capture this as a ratio: for the fake title, two out of eight words is 0.25 (25%), as seen in the predictor upper_case_word_ratio_title.

The next two predictors measure the share of sentence-ending punctuation marks (. ? !) that are exclamation points, in the text and in the title. In the text, the real article has no exclamation points, yet about 47% of the sentence-ending punctuation in the fake article consists of exclamation points. Likewise, the real title has no exclamation points, but they make up 50% of the punctuation in the fake title.

Just like word count, we also looked at sentence and syllable counts. On their own, these two measurements were not very informative for this pair of articles, but they were useful for computing other new predictors. It was slightly surprising, however, that the fake article had more sentences and syllables than the real one (16 vs. 7 sentences and 361 vs. 308 syllables).

The next two predictors analyze the words used in the text. The first is the percentage of unique words, which was higher for the fake article (76%) than for the real one (64%). The second is the Flesch Reading Ease score, where a higher score means the text is easier to read; texts with high scores can be read by a younger audience. The real article has a readability score of 23.34, which corresponds to a college-graduate reading level. The fake article has a score of 48.71, which corresponds to a college reading level. While this may not seem like a huge difference, a lower score indicates a more complex article structure, and fake articles often lack that complexity.

The next predictor is the author count, which is 1 for both articles, so it does not help distinguish them in this comparison.

Finally, the last four predictors come from sentiment analysis. We looked at the primary and secondary sentiments in the text and the strength of each. The primary sentiment is the strongest sentiment in the text and the secondary sentiment is the second strongest, where strength is calculated as the total number of sentences containing that sentiment. The real article’s sentiments are trust and positive, with scores in the 30s (34 and 32), while the fake article’s are negative and positive, with low scores of 16 each. Primary and secondary sentiments that directly conflict, as they do here, are a warning sign when judging the validity of an article.

Like any analytical method, text analysis has its drawbacks. One drawback shows up in the sentiment analysis predictors: we will not always be able to analyze the sentiment of a text accurately. Algorithms can’t always detect sarcasm or understand references and allusions, and they won’t always classify even basic sentiments such as positive or negative correctly. Additionally, writing algorithms to detect factors like writing style, words with multiple meanings, correct grammar and punctuation, and factual correctness is often difficult, inaccurate, or impossible. That said, text analysis is still a beneficial and important research tool.







Part 2: Analyze

LASSO with source

set.seed(253)
lambda_grid <- 10^seq(-3, 1, length = 100)

lasso_model <- train(
    type ~ .,
    data = buzzfeed %>% select(-text, -title, -url),
    method = "glmnet",
    tuneGrid = data.frame(alpha = 1, lambda = lambda_grid),
    trControl = trainControl(method = "cv", number = 10, selectionFunction = "best"),
    metric = "Accuracy",
    na.action = na.omit
)

LASSO without source

set.seed(253)
lambda_grid <- 10^seq(-3, 1, length = 100)

lasso_model_no_source <- train(
    type ~ .,
    data = buzzfeed %>% select(-text, -title, -url, -source),
    method = "glmnet",
    tuneGrid = data.frame(alpha = 1, lambda = lambda_grid),
    trControl = trainControl(method = "cv", number = 10, selectionFunction = "best"),
    metric = "Accuracy",
    na.action = na.omit
)

To build our model, we first looked at the relationships between the predictors and article type. We then tried a variety of techniques to get a sense of which might best predict whether an article is real or fake. We tried KNN and GAM, but GAM did not do well with so much text and KNN did not do well with so many predictors, so we moved on to parametric algorithms.

We chose LASSO for binary classification because simple logistic regression uses all of the predictors with no penalty and thus made less accurate predictions. We did not try backward stepwise or best subset selection because they are more computationally expensive. LASSO had very high accuracy and high specificity, so we moved forward with it.

From here we made two models: one that includes source and one that does not. We did this because source strongly affects accuracy. If we were to classify an article with no source, the accuracy would be lower and the model would be different. When training our models, we excluded text, title, and url from the data because they are unique to each article and therefore do not make useful predictors.

For our models, we used a broad range of lambda (\(\lambda\)) values, from 0.001 to 10, to get a better understanding of what size of penalty yields the highest accuracy. We chose the selection function “best” with the metric Accuracy because, for a topic like fake news with potentially impactful consequences, it is important to be as accurate as possible.
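
To make the shape of the penalty grid concrete, the short sketch below rebuilds it in base R and checks its endpoints and spacing. The grid definition is the one used in the model code above; the other variable names are illustrative only:

```r
# Same grid as in the training code: 100 values, evenly spaced on the log10 scale
lambda_grid <- 10^seq(-3, 1, length.out = 100)

grid_range <- range(lambda_grid)               # 0.001 to 10
log_steps  <- diff(log10(lambda_grid))         # constant step of 4/99 in log10 units
step_ratio <- lambda_grid[2] / lambda_grid[1]  # each value is about 1.1 times the previous one
```

Because the spacing is logarithmic, most candidate values are small penalties. The selected \(\lambda\) reported in the appendix (about 0.038) falls inside the 0 to 0.3 range shown in the plots.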

LASSO \(\lambda\) with source

plot(lasso_model, xlim=c(0, .3))

LASSO \(\lambda\) without source

plot(lasso_model_no_source, xlim=c(0, .3))







Part 3: Summarize

After analyzing our data and looking at the results, we conclude that the best predictors for our model included source, the exclamation ratios in the text and title, authors, and the surprise sentiment. The worst predictors were, overall, the four sentiment predictors: primary and secondary sentiment and their strengths. The predictors readability_score, unique_word_percent, and author_count did not lean in either direction: their values for real and fake articles overlapped partly, neither separating the two types nor coinciding completely. We did, however, build two models, one with source as a predictor and one without. Both are very accurate, but when source is removed, sentiment becomes a better predictor.

buzzfeed %>% 
  ggplot(aes(x = source, fill= type))+
  geom_bar()+ 
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 90))+
  scale_y_continuous(breaks = scales::pretty_breaks(10))+
  labs(x="Source", y="Count",  fill= "Type", title="Count of Real vs Fake articles by source")

We can see from the plot above that there is very little overlap of real and fake news within each source. This makes source an extremely good predictor.

We can also see that the exclamation ratios are useful predictors because they have little overlap across real and fake articles.

# Density of the exclamation ratio in the article body, by article type
eratio <- buzzfeed %>% 
  ggplot(aes(x=exclamation_ratio, fill=type))+
  geom_density(alpha=0.5)+
  labs(x="Exclamation Ratio in Body", fill="Type", y = "Density")+
  theme_minimal()

# Density of the exclamation ratio in the title, by article type
eratio_title <- buzzfeed %>% 
  ggplot(aes(x=exclamation_ratio_title, fill=type))+
  geom_density(alpha=0.5)+
  labs(x="Exclamation Ratio in Title", fill="Type", y = "Density")+
  theme_minimal()

# Stack the two density plots
gridExtra::grid.arrange(eratio, eratio_title)

When looking at sentiment analysis, we see that the strength and type of the primary sentiment are not very useful except in the case of surprise. Surprise is only encountered in real articles, making it a good predictor of article type. As mentioned before, the picture changes when source is removed as a predictor: sentiment then becomes an important predictor.

buzzfeed %>% 
  ggplot(aes(x=primary_sentiment_value, fill = type))+
  geom_density(alpha = 0.5)+
  facet_wrap(~primary_sentiment)+
  xlim(0,60)+
  theme_minimal()+
  labs(x = "Primary Sentiment Value", y = "Density", fill = "Type")+
  theme(legend.direction = 'horizontal', legend.position=c(.8,.2))

buzzfeed %>% 
  ggplot(aes(x=type, fill = type))+
  geom_bar()+
  facet_grid(~primary_sentiment)+
  theme_minimal()+
  labs(x = "Type", y = "Count", title = "Primary Sentiment Type")+
  theme(legend.position = "none")+
  scale_y_continuous(breaks = scales::pretty_breaks())

Overall, source and author are two of the most indicative predictors. Among the predictors we created, the exclamation ratios in the text and the title play an important role in categorizing an article. When source is removed, sentiment also becomes an important predictor. The LASSO model built from these predictors was highly accurate with or without source, and its specificity was extremely high: 100% with source as a predictor and approximately 97% without it. High specificity matters most for this task, because it is better to reliably catch the articles that are fake than to accidentally categorize a fake article as real news. The sensitivity is also high (77% and 90%), which means our models also classify real articles accurately. Overall, as seen from our predictors and the sensitivity and specificity values, fake articles are easier to categorize than real ones, yet both are readily detectable.
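
For readers who want to see how the two rates are computed, here is a minimal base R sketch. It follows the convention used in this summary, where "real" is the positive class, so sensitivity is the share of real articles classified as real and specificity is the share of fake articles classified as fake. The label vectors are hypothetical and are not the report's predictions:

```r
# Hypothetical labels for illustration only
actual <- factor(c("real", "real", "real", "real", "fake", "fake", "fake", "fake", "fake"),
                 levels = c("fake", "real"))
predicted <- factor(c("real", "real", "real", "fake", "fake", "fake", "fake", "fake", "real"),
                    levels = c("fake", "real"))

# Rows are predictions, columns are true labels
confusion <- table(predicted = predicted, actual = actual)

sensitivity <- confusion["real", "real"] / sum(confusion[, "real"])  # 3 of 4 real articles
specificity <- confusion["fake", "fake"] / sum(confusion[, "fake"])  # 4 of 5 fake articles
```

With caret, confusionMatrix(predicted, actual, positive = "real") reports the same two quantities. Without the positive argument, caret treats the first factor level (here "fake") as the positive class, which swaps the two labels.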







Part 4: Contributions

Both members evenly contributed to this project. We came up with the new predictors together and each implemented half of them. The rest was done together.




Appendix




LASSO model summary data with source

lasso_model$results %>% filter(lambda == lasso_model$bestTune$lambda)
##   alpha     lambda  Accuracy     Kappa AccuracySD  KappaSD
## 1     1 0.03764936 0.8511111 0.7022222 0.06525249 0.130505
coef(lasso_model$finalModel, lasso_model$bestTune$lambda)
## 142 x 1 sparse Matrix of class "dgCMatrix"
##                                                                                                                                                   1
## (Intercept)                                                                                                                            -0.871369531
## authorsBarbara Starr,                                                                                                                   .          
## authorsBetsy Klein                                                                                                                      .          
## authorsBlair Patterson                                                                                                                  .          
## authorsBob Amoroso                                                                                                                      .          
## authorsBrian Stelter                                                                                                                    .          
## authorsBrianna Ehley,Jack Shafer                                                                                                        .          
## authorsCampus Reform,                                                                                                                   .          
## authorsCassy Fiano                                                                                                                      .          
## authorsColin Taylor,Grant Stern,Brett Bose,Natalie Dickinson                                                                            .          
## authorsCrispin White                                                                                                                    .          
## authorsDale Summitt                                                                                                                     .          
## authorsDan Merica                                                                                                                       .          
## authorsDan Merica,Eugene Scott                                                                                                          .          
## authorsDavid Wright                                                                                                                     .          
## authorsDeirdre Walsh,                                                                                                                   .          
## authorsDominique Debucquoy-dodley                                                                                                       .          
## authorsEdward-isaac Dovere,Eli Stokols,Politico Staff,Jack Shafer                                                                       .          
## authorsEli Watkins                                                                                                                      .          
## authorsElvin Bartley                                                                                                                    .          
## authorsEric Bradner                                                                                                                     .          
## authorsFeatured Commentator                                                                                                             .          
## authorsFed Up                                                                                                                           .          
## authorsGrant Stern,Brett Bose,Natalie Dickinson                                                                                         .          
## authorsHadas Gold                                                                                                                       .          
## authorsJack Shafer                                                                                                                      .          
## authorsJack Shafer,Burgess Everett                                                                                                      .          
## authorsJack Shafer,Daniel Strauss                                                                                                       .          
## authorsJack Shafer,Erick Trickey,Zachary Karabell                                                                                       .          
## authorsJack Shafer,Jeff Greenfield                                                                                                      .          
## authorsJack Shafer,Julia Ioffe                                                                                                          .          
## authorsJack Shafer,Kyle Cheney,Daniel Strauss,Daniel Lippman,Eli Stokols,Glenn Thrush,Brent Griffiths                                   .          
## authorsJack Shafer,Louis Nelson                                                                                                         .          
## authorsJack Shafer,Louis Nelson,Matthew Nussbaum,Shane Goldmacher                                                                       .          
## authorsJack Shafer,Michael Hirsh,Mikhail Zygar,Bruce Blair,Peter Edelman,Adam Walinsky                                                  .          
## authorsJack Shafer,Nolan D                                                                                                              .          
## authorsJack Shafer,Politico Staff,Kyle Cheney                                                                                           .          
## authorsJack Shafer,Steven Shepard,Glenn Thrush,Nolan D,Shane Goldmacher                                                                 .          
## authorsJack Shafer,Yousef Saba                                                                                                          .          
## authorsJameson Parker                                                                                                                   .          
## authorsJeff Dunetz                                                                                                                      0.845123477
## authorsJeremy Diamond                                                                                                                   .          
## authorsJim Bowman                                                                                                                       .          
## authorsJohn Couwels                                                                                                                     .          
## authorsJohn Hawkins                                                                                                                     0.845124782
## authorsJohn Parkinson,More John,More Alexander                                                                                          .          
## authorsJohn Prager                                                                                                                      .          
## authorsJosh Gerstein                                                                                                                    .          
## authorsJoyce Tseng,Eli Watkins                                                                                                          .          
## authorsKevin Jackson                                                                                                                    .          
## authorsKevin Liptak,                                                                                                                    .          
## authorsLaura Koran                                                                                                                      .          
## authorsLeonora Cravotta                                                                                                                 1.840217792
## authorsLisa Smith                                                                                                                       .          
## authorsMadeline Conway,Burgess Everett,Katie Glueck,Jack Shafer                                                                         .          
## authorsManu Raju,Senior Political Reporter                                                                                              .          
## authorsMartin Lioll,John Falkenberg,Ben Marquis,Kimberly J Smith,Martin Walsh,V Saxena,Benjamin Arie                                    .          
## authorsMatt Barber                                                                                                                      .          
## authorsMichael Hayne                                                                                                                    .          
## authorsMj Lee,                                                                                                                          .          
## authorsMockarena Cotr                                                                                                                   .          
## authorsMore Arlette                                                                                                                     .          
## authorsMore Candace,Adam Kelsey,More Adam                                                                                               .          
## authorsMore Josh,Josh Margolin                                                                                                          .          
## authorsMore Meghan,                                                                                                                     .          
## authorsMore Michael,                                                                                                                    .          
## authorsMore Stephanie,Emily Shapiro,Jj Gallagher,Stephanie Wash,Michael Edison Hayden,Mike Levine,More Rhonda,More Emily,More Michael,  .          
## authorsMore Veronica,Ryan Struyk,More Ryan,Meghan Keneally,More Shushannah,More Meghan,Veronica Stracqualursi                           .          
## authorsNick Gass,Jack Shafer                                                                                                            .          
## authorsNick Gass,Madeline Conway,Jack Shafer                                                                                            .          
## authorsNone                                                                                                                            -0.159411607
## authorsOliver Willis                                                                                                                    1.459714849
## authorsOnan Coca,                                                                                                                       0.845170645
## authorsPhilip Hodges,                                                                                                                   0.400810961
## authorsRich Witmer,Doug Giles                                                                                                           .          
## authorsRika Christensen                                                                                                                 0.663167594
## authorsRyan Browne                                                                                                                      .          
## authorsRyan Denson                                                                                                                      .          
## authorsScott Osborn,Mr Wendal                                                                                                           .          
## authorsScott Osborn,Terresa Monroe-hamilton,Mr Wendal,Max Jackson                                                                       .          
## authorsSierra Marlee                                                                                                                    .          
## authorsStephen D Foster Jr                                                                                                              .          
## authorsSteven Shepard                                                                                                                   .          
## authorsTal Kopan                                                                                                                        .          
## authorsTerresa Monroe-hamilton                                                                                                          .          
## authorsTiffiny Ruegner                                                                                                                  .          
## authorsTom Lobianco                                                                                                                     .          
## authorsTony Elliott                                                                                                                     .          
## authorsWendy Gittleson                                                                                                                  .          
## sourcehttp://100percentfedup.com                                                                                                        .          
## sourcehttp://abcn.ws                                                                                                                    2.575498041
## sourcehttp://addictinginfo.org                                                                                                          0.182094876
## sourcehttp://allenwestrepublic.com                                                                                                      .          
## sourcehttp://author.addictinginfo.org                                                                                                   .          
## sourcehttp://author.groopspeak.com                                                                                                      .          
## sourcehttp://clashdaily.com                                                                                                             .          
## sourcehttp://cnn.it                                                                                                                     3.106674079
## sourcehttp://conservativebyte.com                                                                                                       .          
## sourcehttp://conservativetribune.com                                                                                                    .          
## sourcehttp://eaglerising.com                                                                                                            .          
## sourcehttp://freedomdaily.com                                                                                                          -0.140963717
## sourcehttp://occupydemocrats.com                                                                                                        .          
## sourcehttp://politi.co                                                                                                                  3.333982825
## sourcehttp://rightwingnews.com                                                                                                          .          
## sourcehttp://theblacksphere.net                                                                                                         .          
## sourcehttp://usherald.com                                                                                                               .          
## sourcehttp://winningdemocrats.com                                                                                                       .          
## sourcehttp://www.addictinginfo.org                                                                                                      .          
## sourcehttp://www.chicksontheright.com                                                                                                   .          
## sourcehttp://www.ifyouonlynews.com                                                                                                      1.459770163
## sourcehttp://www.opposingviews.com                                                                                                      1.004636906
## sourcehttp://www.proudcons.com                                                                                                          .          
## sourcehttp://www.thepoliticalinsider.com                                                                                                .          
## sourcehttp://www.yesimright.com                                                                                                         .          
## sourcehttps://goo.gl                                                                                                                    .          
## sourcehttps://ihavethetruth.com                                                                                                         .          
## sourcehttps://www.washingtonpost.com                                                                                                    1.004636352
## total_words                                                                                                                             .          
## total_words_title                                                                                                                       .          
## total_upper_case_words                                                                                                                  .          
## total_upper_case_words_title                                                                                                            .          
## upper_case_word_ratio_title                                                                                                             .          
## exclamation_ratio                                                                                                                       .          
## exclamation_ratio_title                                                                                                                -0.007232691
## total_sentences                                                                                                                         .          
## total_syllables                                                                                                                         .          
## unique_word_percent                                                                                                                     .          
## readability_score                                                                                                                       .          
## author_count                                                                                                                            .          
## primary_sentimentnegative                                                                                                               .          
## primary_sentimentpositive                                                                                                               .          
## primary_sentimentsurprise                                                                                                               0.373877799
## primary_sentimenttrust                                                                                                                  .          
## primary_sentiment_value                                                                                                                 .          
## secondary_sentimentanticipation                                                                                                         .          
## secondary_sentimentfear                                                                                                                 .          
## secondary_sentimentnegative                                                                                                            -0.185171842
## secondary_sentimentpositive                                                                                                             .          
## secondary_sentimentsadness                                                                                                              .          
## secondary_sentimentsurprise                                                                                                             .          
## secondary_sentimenttrust                                                                                                                .          
## secondary_sentiment_value                                                                                                               .
# In-sample check: predict on the model's own training data (rows with missing values dropped)
predict_data <- na.omit(lasso_model$trainingData)
classifications <- predict(lasso_model, newdata = predict_data, type = "raw")
head(classifications, 3)
## [1] real real fake
## Levels: fake real
confusionMatrix(data = classifications, 
  reference = predict_data$.outcome, 
  positive = "real")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction fake real
##       fake   91   21
##       real    0   70
##                                          
##                Accuracy : 0.8846         
##                  95% CI : (0.829, 0.9271)
##     No Information Rate : 0.5            
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.7692         
##                                          
##  Mcnemar's Test P-Value : 1.275e-05      
##                                          
##             Sensitivity : 0.7692         
##             Specificity : 1.0000         
##          Pos Pred Value : 1.0000         
##          Neg Pred Value : 0.8125         
##              Prevalence : 0.5000         
##          Detection Rate : 0.3846         
##    Detection Prevalence : 0.3846         
##       Balanced Accuracy : 0.8846         
##                                          
##        'Positive' Class : real           
## 
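The summary statistics in the confusion matrix above can be recomputed by hand from its four cell counts. The following is a minimal sketch in base R: the matrix `cm` is typed in from the printed table, and the variable names are illustrative rather than part of the original analysis.

```r
# Cell counts copied from the printed table (rows = prediction, columns = reference).
# matrix() fills column by column: first the "fake" reference column, then "real".
cm <- matrix(c(91, 0, 21, 70), nrow = 2,
             dimnames = list(Prediction = c("fake", "real"),
                             Reference  = c("fake", "real")))

# "real" is the positive class, matching positive = "real" in confusionMatrix()
accuracy    <- sum(diag(cm)) / sum(cm)                 # correct predictions / all articles
sensitivity <- cm["real", "real"] / sum(cm[, "real"])  # real articles flagged as real
specificity <- cm["fake", "fake"] / sum(cm[, "fake"])  # fake articles flagged as fake
ppv         <- cm["real", "real"] / sum(cm["real", ])  # predicted real that are real
npv         <- cm["fake", "fake"] / sum(cm["fake", ])  # predicted fake that are fake

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv), 4)
```

Because the predictions are made on the training data itself, these values describe in-sample fit rather than performance on new articles.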




LASSO model summary data without source

lasso_model_no_source$results %>% filter(lambda == lasso_model_no_source$bestTune$lambda)
##   alpha     lambda  Accuracy     Kappa AccuracySD   KappaSD
## 1     1 0.02595024 0.7816667 0.5633333  0.1259273 0.2518546
coef(lasso_model_no_source$finalModel, lasso_model_no_source$bestTune$lambda)
## 114 x 1 sparse Matrix of class "dgCMatrix"
##                                                                                                                                                   1
## (Intercept)                                                                                                                             1.231189859
## authorsBarbara Starr,                                                                                                                   .          
## authorsBetsy Klein                                                                                                                      .          
## authorsBlair Patterson                                                                                                                 -1.988346154
## authorsBob Amoroso                                                                                                                     -1.111741527
## authorsBrian Stelter                                                                                                                    .          
## authorsBrianna Ehley,Jack Shafer                                                                                                        .          
## authorsCampus Reform,                                                                                                                  -0.710551010
## authorsCassy Fiano                                                                                                                      .          
## authorsColin Taylor,Grant Stern,Brett Bose,Natalie Dickinson                                                                           -2.458082928
## authorsCrispin White                                                                                                                   -0.299475994
## authorsDale Summitt                                                                                                                    -0.587310407
## authorsDan Merica                                                                                                                       .          
## authorsDan Merica,Eugene Scott                                                                                                          .          
## authorsDavid Wright                                                                                                                     .          
## authorsDeirdre Walsh,                                                                                                                   .          
## authorsDominique Debucquoy-dodley                                                                                                       .          
## authorsEdward-isaac Dovere,Eli Stokols,Politico Staff,Jack Shafer                                                                       .          
## authorsEli Watkins                                                                                                                      .          
## authorsElvin Bartley                                                                                                                   -1.541188370
## authorsEric Bradner                                                                                                                     .          
## authorsFeatured Commentator                                                                                                            -0.216856971
## authorsFed Up                                                                                                                          -0.008116754
## authorsGrant Stern,Brett Bose,Natalie Dickinson                                                                                        -1.768624114
## authorsHadas Gold                                                                                                                       .          
## authorsJack Shafer                                                                                                                      .          
## authorsJack Shafer,Burgess Everett                                                                                                      .          
## authorsJack Shafer,Daniel Strauss                                                                                                       .          
## authorsJack Shafer,Erick Trickey,Zachary Karabell                                                                                       .          
## authorsJack Shafer,Jeff Greenfield                                                                                                      .          
## authorsJack Shafer,Julia Ioffe                                                                                                          .          
## authorsJack Shafer,Kyle Cheney,Daniel Strauss,Daniel Lippman,Eli Stokols,Glenn Thrush,Brent Griffiths                                   .          
## authorsJack Shafer,Louis Nelson                                                                                                         0.499954664
## authorsJack Shafer,Louis Nelson,Matthew Nussbaum,Shane Goldmacher                                                                       .          
## authorsJack Shafer,Michael Hirsh,Mikhail Zygar,Bruce Blair,Peter Edelman,Adam Walinsky                                                  .          
## authorsJack Shafer,Nolan D                                                                                                              0.204279096
## authorsJack Shafer,Politico Staff,Kyle Cheney                                                                                           .          
## authorsJack Shafer,Steven Shepard,Glenn Thrush,Nolan D,Shane Goldmacher                                                                 .          
## authorsJack Shafer,Yousef Saba                                                                                                          .          
## authorsJameson Parker                                                                                                                   0.001121241
## authorsJeff Dunetz                                                                                                                      .          
## authorsJeremy Diamond                                                                                                                   .          
## authorsJim Bowman                                                                                                                      -0.909931964
## authorsJohn Couwels                                                                                                                     .          
## authorsJohn Hawkins                                                                                                                     .          
## authorsJohn Parkinson,More John,More Alexander                                                                                          .          
## authorsJohn Prager                                                                                                                     -1.641877810
## authorsJosh Gerstein                                                                                                                    .          
## authorsJoyce Tseng,Eli Watkins                                                                                                          .          
## authorsKevin Jackson                                                                                                                   -2.570547861
## authorsKevin Liptak,                                                                                                                    0.286594460
## authorsLaura Koran                                                                                                                      .          
## authorsLeonora Cravotta                                                                                                                 1.011647340
## authorsLisa Smith                                                                                                                      -1.427876297
## authorsMadeline Conway,Burgess Everett,Katie Glueck,Jack Shafer                                                                         .          
## authorsManu Raju,Senior Political Reporter                                                                                              .          
## authorsMartin Lioll,John Falkenberg,Ben Marquis,Kimberly J Smith,Martin Walsh,V Saxena,Benjamin Arie                                   -3.055155755
## authorsMatt Barber                                                                                                                     -1.610110987
## authorsMichael Hayne                                                                                                                    0.759907467
## authorsMj Lee,                                                                                                                          .          
## authorsMockarena Cotr                                                                                                                  -1.411034345
## authorsMore Arlette                                                                                                                     .          
## authorsMore Candace,Adam Kelsey,More Adam                                                                                               .          
## authorsMore Josh,Josh Margolin                                                                                                          .          
## authorsMore Meghan,                                                                                                                     .          
## authorsMore Michael,                                                                                                                    .          
## authorsMore Stephanie,Emily Shapiro,Jj Gallagher,Stephanie Wash,Michael Edison Hayden,Mike Levine,More Rhonda,More Emily,More Michael,  .          
## authorsMore Veronica,Ryan Struyk,More Ryan,Meghan Keneally,More Shushannah,More Meghan,Veronica Stracqualursi                           .          
## authorsNick Gass,Jack Shafer                                                                                                            .          
## authorsNick Gass,Madeline Conway,Jack Shafer                                                                                            .          
## authorsNone                                                                                                                            -1.798335840
## authorsOliver Willis                                                                                                                    0.092385070
## authorsOnan Coca,                                                                                                                       .          
## authorsPhilip Hodges,                                                                                                                   .          
## authorsRich Witmer,Doug Giles                                                                                                          -0.764601982
## authorsRika Christensen                                                                                                                 .          
## authorsRyan Browne                                                                                                                      0.514057002
## authorsRyan Denson                                                                                                                     -1.657598389
## authorsScott Osborn,Mr Wendal                                                                                                          -1.843553260
## authorsScott Osborn,Terresa Monroe-hamilton,Mr Wendal,Max Jackson                                                                      -2.753257834
## authorsSierra Marlee                                                                                                                    .          
## authorsStephen D Foster Jr                                                                                                             -1.658690427
## authorsSteven Shepard                                                                                                                   .          
## authorsTal Kopan                                                                                                                        .          
## authorsTerresa Monroe-hamilton                                                                                                         -0.203826578
## authorsTiffiny Ruegner                                                                                                                 -1.266450258
## authorsTom Lobianco                                                                                                                     .          
## authorsTony Elliott                                                                                                                    -1.866264885
## authorsWendy Gittleson                                                                                                                 -2.137856441
## total_words                                                                                                                             .          
## total_words_title                                                                                                                      -0.102911390
## total_upper_case_words                                                                                                                 -0.001136262
## total_upper_case_words_title                                                                                                            .          
## upper_case_word_ratio_title                                                                                                            -4.434406101
## exclamation_ratio                                                                                                                       .          
## exclamation_ratio_title                                                                                                                -0.014854345
## total_sentences                                                                                                                         .          
## total_syllables                                                                                                                         .          
## unique_word_percent                                                                                                                     .          
## readability_score                                                                                                                       .          
## author_count                                                                                                                            0.099142260
## primary_sentimentnegative                                                                                                               .          
## primary_sentimentpositive                                                                                                               .          
## primary_sentimentsurprise                                                                                                               1.429639087
## primary_sentimenttrust                                                                                                                  .          
## primary_sentiment_value                                                                                                                 .          
## secondary_sentimentanticipation                                                                                                         .          
## secondary_sentimentfear                                                                                                                 .          
## secondary_sentimentnegative                                                                                                            -0.406057828
## secondary_sentimentpositive                                                                                                             .          
## secondary_sentimentsadness                                                                                                              .          
## secondary_sentimentsurprise                                                                                                             .          
## secondary_sentimenttrust                                                                                                                .          
## secondary_sentiment_value                                                                                                               0.058763627
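Most rows in the coefficient table above are printed as `.`, meaning LASSO shrank them to exactly zero. A short helper can list only the predictors that were kept. This is a sketch: `nonzero_coefs` and the toy matrix `toy` are hypothetical names introduced for illustration, with a few values copied from the table.

```r
# Keep only the rows of a one-column coefficient matrix that were not shrunk to zero.
nonzero_coefs <- function(coefs) {
  m <- as.matrix(coefs)                          # dense copy of the coefficient column
  kept <- m[m[, 1] != 0, , drop = FALSE]         # drop the rows printed as "."
  kept[order(-abs(kept[, 1])), , drop = FALSE]   # largest magnitude first
}

# Toy stand-in for the coef() output, using a handful of rows from the table
toy <- matrix(c(1.231189859, 0, -0.102911390, -4.434406101, 0, 1.429639087),
              ncol = 1,
              dimnames = list(c("(Intercept)", "total_words", "total_words_title",
                                "upper_case_word_ratio_title", "exclamation_ratio",
                                "primary_sentimentsurprise"), "1"))

selected <- nonzero_coefs(toy)
print(selected)
```

On the fitted model, the same helper could presumably be applied to `coef(lasso_model_no_source$finalModel, lasso_model_no_source$bestTune$lambda)`, assuming the Matrix package is loaded (as it is once the model has been fit) so that `as.matrix()` can convert the sparse result.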
predict_data_no_source <- na.omit(lasso_model_no_source$trainingData)
classifications <- predict(lasso_model_no_source, newdata = predict_data_no_source, type = "raw")
head(classifications, 3)
## [1] real real real
## Levels: fake real
confusionMatrix(data = classifications, 
  reference = predict_data_no_source$.outcome, 
  positive = "real")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction fake real
##       fake   86    9
##       real    5   82
##                                           
##                Accuracy : 0.9231          
##                  95% CI : (0.8743, 0.9573)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8462          
##                                           
##  Mcnemar's Test P-Value : 0.4227          
##                                           
##             Sensitivity : 0.9011          
##             Specificity : 0.9451          
##          Pos Pred Value : 0.9425          
##          Neg Pred Value : 0.9053          
##              Prevalence : 0.5000          
##          Detection Rate : 0.4505          
##    Detection Prevalence : 0.4780          
##       Balanced Accuracy : 0.9231          
##                                           
##        'Positive' Class : real            
## 

This project is maintained by julietkelson