Overview of Tweet Analysis

A random sample of tweets from the @Auckland Uni Twitter account was analyzed to explore various characteristics of tweets and their potential influence on engagement.

Data Variables in the Dataset

tweet_id: Unique identifier for each tweet.
year_tweeted: Year in which the tweet was published (2020 or 2021).
number_hashtags: The count of hashtags used in the tweet.
first_words: Starting words of the tweet.
tweet_length: The length of the tweet in characters.

True/False Statements with Variables

number_hashtags: iNZight Lite identifies this as a numeric variable. TRUE.
first_words: iNZight Lite identifies this as a categorical variable. FALSE.
tweet_length: iNZight Lite identifies this as a categorical variable. FALSE.

Dataset Shape

The dataset is rectangular. TRUE.

Engagement Analysis

Percentage of Tweets with Engagement: 29% of tweets received at least one retweet. FALSE.
Most tweets contained no links. TRUE.
Among tweets with links, 70% received at least one retweet. TRUE.
Whether a tweet contains a link significantly predicts engagement (retweeting). TRUE.

Summary Statistics Questions

Proportion of Tweets Using Hashtags: Calculate using iNZight Lite.
Tweets Posted on Sunday: Calculate using iNZight Lite.
Hashtag Use on Monday: Identify the proportion using iNZight Lite.
Hashtag-Free Tweets on Thursday: Determine proportion using iNZight Lite.
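
The four questions above are all proportions, either over the whole dataset or within one day of the week. As a rough sketch of the arithmetic iNZight Lite performs, the helper below computes such a proportion in plain Python. The column name day_tweeted is an assumption (the variable list above does not name the day-of-week column), the rows are made up purely for illustration and are not taken from the dataset, and reading "Hashtag Use on Monday" as "the share of Monday tweets with at least one hashtag" is one possible interpretation.

```python
def proportion(rows, condition, within=None):
    """Return the share of rows satisfying `condition`,
    restricted to rows satisfying `within` when it is given."""
    subset = [row for row in rows if within is None or within(row)]
    if not subset:
        return None  # no rows to take a proportion of
    return sum(1 for row in subset if condition(row)) / len(subset)


def has_hashtag(row):
    # A tweet "uses hashtags" if its hashtag count is above zero.
    return row["number_hashtags"] > 0


def on_day(day):
    # Build a filter for one day of the week (hypothetical column name).
    return lambda row: row["day_tweeted"] == day


# Made-up rows for illustration only; NOT values from the real dataset.
toy_rows = [
    {"day_tweeted": "Monday", "number_hashtags": 2},
    {"day_tweeted": "Monday", "number_hashtags": 0},
    {"day_tweeted": "Thursday", "number_hashtags": 0},
    {"day_tweeted": "Sunday", "number_hashtags": 1},
]

overall_hashtag = proportion(toy_rows, has_hashtag)
sunday_share = proportion(toy_rows, on_day("Sunday"))
monday_hashtag = proportion(toy_rows, has_hashtag, within=on_day("Monday"))
thursday_no_hashtag = proportion(
    toy_rows, lambda row: not has_hashtag(row), within=on_day("Thursday")
)
```

The `within` argument is what separates a conditional proportion (hashtag use among Monday tweets) from an overall one (tweets posted on Sunday out of all tweets).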

Classification of Bots in Tweets

Confusion Matrix Findings

A model that classifies whether a tweet was written by a bot was analyzed.

Predicted / Actual Classification:
Actual Twitter bot: 3 predicted as bots, 4 predicted as not bots (total 7).
Actual not bot: 2 predicted as bots, 9 predicted as not bots (total 11).

Percentage Calculations

Overall Accuracy of the Model: Calculate based on correct predictions over total tweets.
Percentage of Actual Bots: Identify the total percentage of tweets written by bots.
Predicted Bots Accuracy: Percentage of tweets predicted as bots that were actually bots.
Non-bot Predictions: Percentage of tweets not predicted as bots that were actually bots.
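
As a worked sketch, the four percentages can be computed from the confusion matrix counts given above. The count of non-bots predicted as not bots is derived here as 11 - 2 = 9, and all variable names are illustrative.

```python
# Counts from the confusion matrix findings.
bot_pred_bot = 3                         # actual bots predicted as bots
bot_pred_not = 4                         # actual bots predicted as not bots
notbot_pred_bot = 2                      # actual non-bots predicted as bots
notbot_pred_not = 11 - notbot_pred_bot   # derived: 9 non-bots predicted as not bots

total = bot_pred_bot + bot_pred_not + notbot_pred_bot + notbot_pred_not


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Correct predictions (both diagonal cells) over all tweets.
accuracy = pct(bot_pred_bot + notbot_pred_not, total)

# Tweets actually written by bots over all tweets.
actual_bots = pct(bot_pred_bot + bot_pred_not, total)

# Among tweets predicted as bots, the share that really were bots.
predicted_bots_correct = pct(bot_pred_bot, bot_pred_bot + notbot_pred_bot)

# Among tweets NOT predicted as bots, the share that really were bots.
missed_bots = pct(bot_pred_not, bot_pred_not + notbot_pred_not)

print(total, accuracy, actual_bots, predicted_bots_correct, missed_bots)
# prints: 18 66.7 38.9 60.0 30.8
```

Note that the last two percentages use column totals (5 predicted bots, 13 predicted not bots) as denominators, not the row totals of 7 and 11.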

Visual Representation of Model Results

A visual shows the results of the tweet classification model for 34 tweets. Color coding indicates each tweet's actual authorship and the model's prediction. The complete confusion matrix is to be filled in from the visual.
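
Filling in a confusion matrix amounts to counting each (actual, predicted) pair. The visual is not reproduced here, so the sketch below shows only the counting step. The function name and the label strings "bot" and "not bot" are assumptions, and the short lists in the usage lines are placeholders, not the 34 tweets from the visual.

```python
from collections import Counter

LABELS = ("bot", "not bot")


def confusion_matrix(actual, predicted):
    """Count tweets for every (actual, predicted) label pair.

    Returns a nested dict indexed as matrix[actual][predicted].
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    return {a: {p: counts[(a, p)] for p in LABELS} for a in LABELS}


# Placeholder labels for illustration only; read the real ones off the visual.
actual = ["bot", "bot", "not bot"]
predicted = ["bot", "not bot", "not bot"]
matrix = confusion_matrix(actual, predicted)
```

Once the real labels are read off the visual, the four cells should sum to 34.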