I've been analyzing a Twitter dataset and I'm stuck at the part where I need to convert the data to tidy text format and remove stop words. I looked at a few websites, but the code from them throws errors. Here's my code so far. Please help!
I don't know how to attach a CSV file, so I'm attaching an image of my dataset. It is a very common dataset that is easy to find on Kaggle and similar sites.
library(readr)
library(stringr)
library(caret)
library(car)
library(tidytext)
library(tidyr)
set.seed(123)
twitter_train<-read.csv("/Users/School/Documents/R/Final Training Data Set-twitter.csv")
text<-twitter_train$tweet
text <- tolower(text)
# Remove mentions, URLs, numbers, hashtags, non-ASCII characters (including emojis), and punctuation
text <- gsub("@\\w+", "", text)
# Match only the URL itself (\\S+); ".+" would also delete everything after the URL
text <- gsub("https?://\\S+", "", text)
text <- gsub("\\d+\\w*\\d*", "", text)
text <- gsub("#\\w+", "", text)
text <- gsub("[^\x01-\x7F]", "", text)
text <- gsub("[[:punct:]]", " ", text)
# Remove spaces and newlines
text <- gsub("\n", " ", text)
text <- gsub("^\\s+", "", text)
text <- gsub("\\s+$", "", text)
# Collapse runs of spaces or tabs; "|" inside [] is a literal pipe, not "or"
text <- gsub("[ \t]+", " ", text)
# Create new column to store cleaned tweets
twitter_train["fix_text"] <- text
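
For reference, this is roughly the step I'm trying to get to. It is only a minimal sketch, assuming the dplyr package is installed alongside tidytext; the `tweets` data frame and its `id` column are made-up stand-ins for `twitter_train`, and only `fix_text` matches my real column name. It uses `unnest_tokens()` to get one word per row and `anti_join()` against tidytext's built-in `stop_words` table to drop stop words.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for twitter_train after the cleaning above
tweets <- data.frame(
  id = 1:2,
  fix_text = c("this is a great day for science", "i love the new phone"),
  stringsAsFactors = FALSE
)

tidy_tweets <- tweets %>%
  # One row per word; the fix_text column is replaced by a `word` column
  unnest_tokens(word, fix_text) %>%
  # Drop every row whose word appears in tidytext's stop_words data set
  anti_join(stop_words, by = "word")

print(tidy_tweets)
```

On the real data the same pipeline would presumably start from `twitter_train` instead of `tweets`, keeping whatever ID or label columns are needed for later modelling.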
