By: Richard W. Sharp
Release the tweets! We’ve made the Trump Watch’s database of tweets available on our new downloads page. It includes all tweets from @realDonaldTrump going back to November 10, 2016. They were collected with Twitter’s public API using the query from:realDonaldTrump. Both the raw tweets and the labels we have added are available. Each raw tweet file has a name in the format tweets_by_realDonaldTrump_yyyymmdd.json. The date in the file name is the date the tweets were collected, not the date they were posted; each tweet records its own creation time in the “created_at” field. A complete description of the information contained in a tweet is maintained by Twitter.
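Because the collection date lives in the file name and the posting time lives inside each tweet, it can help to pull both out explicitly. The following is a minimal sketch, not our collection code: the helper names collection_date and created_at_time are made up for illustration, and the timestamp layout (for example "Sat Dec 17 13:30:12 +0000 2016") is an assumption based on the format Twitter's API has historically used for “created_at”.

```python
import re
from datetime import date, datetime

# File naming scheme described in the post: tweets_by_realDonaldTrump_yyyymmdd.json
FILENAME_RE = re.compile(r"^tweets_by_realDonaldTrump_(\d{4})(\d{2})(\d{2})\.json$")


def collection_date(filename: str) -> date:
    """Return the date a raw file was collected, taken from its name."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def created_at_time(tweet: dict) -> datetime:
    """Parse a tweet's "created_at" field into a timezone-aware datetime.

    Assumption: the field looks like "Sat Dec 17 13:30:12 +0000 2016".
    """
    return datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
```

Comparing the two values for a given tweet shows how long after posting a particular snapshot was taken.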
Since a search for tweets with the public API returns results from roughly the past week, we end up collecting the same tweet on several consecutive days. Each raw file contains at most one copy of each tweet (if we collected tweets more than once in a day, it’s the most recent version). However, the same tweet will typically appear in several of the files. Why keep duplicates? Because they’re not duplicates. Some features of a tweet change over time, such as the retweet count, which can give us insight into short-lived trends. Sadly, we did not capture the recent “unpresidented” tweet, because it appeared and was corrected within 27 minutes, faster than our collection updates. Still, it is a good example of why it’s useful to archive the statements of public figures.
don't worry, we heard your cry for help. pic.twitter.com/qaIhor2aYP
— the first joel, the angels did say (@JoelNihlean) December 17, 2016
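The daily snapshots described above can be lined up to see how a tweet's retweet count changed from one collection to the next. Here is a rough sketch under stated assumptions: the function name retweet_history is hypothetical, each day's tweets are assumed to be already loaded into a list of dictionaries (how each raw file is laid out on disk is not covered here), and the fields "id_str" and "retweet_count" are the ones Twitter's tweet objects normally carry.

```python
from collections import defaultdict


def retweet_history(snapshots: dict) -> dict:
    """Build a per-tweet timeline of retweet counts.

    snapshots maps a collection date string (yyyymmdd, as in the file
    names) to the list of tweet dicts collected on that day.
    Returns {tweet id: [(collection date, retweet count), ...]} with each
    timeline ordered by collection date.
    """
    history = defaultdict(list)
    # yyyymmdd strings sort chronologically, so a plain sort is enough.
    for day in sorted(snapshots):
        for tweet in snapshots[day]:
            history[tweet["id_str"]].append((day, tweet["retweet_count"]))
    return dict(history)
```

A tweet that shows up in five daily files ends up with five points on its timeline, which is exactly the information that would be lost by keeping only one copy.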
For the Trump Watch, we classify each tweet for sentiment and whether or not it’s an insult. The file trump_dump.csv contains the unique id and text of each tweet, as well as the tags we use for classification and any notes. Please note that although this is a .csv file, it uses the | character rather than a comma as the delimiter between fields. This simplifies parsing, since commas are so common in the tweet text field.
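Most CSV readers need to be told about the | delimiter. A minimal sketch with Python's standard csv module follows. The function names are made up for illustration, and the sketch makes no assumption about column order or whether the file has a header row: it simply returns each line as a list of fields. It does assume UTF-8 encoding and the csv module's default quote handling, either of which may need adjusting for the real file.

```python
import csv


def read_pipe_delimited(source) -> list:
    """Read |-delimited rows from an open text file (or any iterable of lines)."""
    reader = csv.reader(source, delimiter="|")
    return [row for row in reader]


def load_trump_dump(path: str = "trump_dump.csv") -> list:
    """Open the file and return its rows as lists of strings.

    newline="" is what the csv module recommends when opening files;
    the UTF-8 encoding is an assumption.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return read_pipe_delimited(handle)
```

Commas inside the tweet text stay inside a single field, because only | separates fields.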
Here is how we categorized tweetID 810121703288410112:
China steals United States Navy research drone in international waters - rips it out of water and takes it to China in unprecedented act.
— Donald J. Trump (@realDonaldTrump) December 17, 2016
Tag | Definition | Example |
---|---|---|
State | All references to a country or similar entity (e.g., the United Nations, ISIS), as represented by the official apparatus of government (e.g., until 20 Jan 2017, “USA” implies the Obama administration). Uses ISO-standard 3-letter country codes. | #StaCHN |
State Sentiment | The sentiment (in the eye of the tweeter) implied by each state reference. This can be positive, negative, or neutral. | #SsnCHNNeg |
State Insult | Whether the reference to each state is an insult or a compliment (in the eye of the target state). | #SinCHNIns |
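The tags in the table follow a regular pattern: a three-letter prefix (Sta, Ssn, or Sin), a three-letter code, and an optional value such as Neg or Ins. A sketch of how they might be pulled apart is below. The function name parse_tags is hypothetical, the pattern is inferred only from the three examples in the table, and entities that lack an ISO country code may not fit it. Value suffixes other than Neg and Ins are not listed here, so the code passes the suffix through as an uninterpreted string.

```python
import re

# Inferred from the examples #StaCHN, #SsnCHNNeg and #SinCHNIns:
#   prefix (Sta | Ssn | Sin) + 3 uppercase letters + optional value suffix.
TAG_RE = re.compile(r"#(Sta|Ssn|Sin)([A-Z]{3})([A-Za-z]*)")


def parse_tags(text: str) -> list:
    """Return (kind, code, value) tuples for every tag found in text.

    value is an empty string when the tag has no suffix, as in #StaCHN.
    """
    return TAG_RE.findall(text)
```

Applied to the three tags for the tweet above, this yields one tuple per tag, each carrying the code CHN.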
We will continue to regularly update and add to the collection.