A histogram is a graph of the frequency distribution for numerical data.
Constructing a Frequency Distribution
Determine the range of data: Find the smallest and largest observations to understand the data's spread.
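Select the number of classes: Decide how many bars the histogram should have. Too few classes hide the shape of the distribution, and too many leave most classes nearly empty. The number of classes is usually between 5 and 15, but it depends on the data.
Compute class intervals: Determine the width of the classes. A common approach is to divide the range by the number of classes and round up to a convenient value.
Determine boundaries (limits): Define the lower and upper boundary of each class so that the classes do not overlap and every observation falls into exactly one class. For whole-number data, boundaries such as 15.5 and 25.5 ensure that no observation sits on a boundary.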
Count observations and assign them to classes: Go through the data set and assign each observation to its appropriate class.
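The five steps above can be sketched in Stata as follows. This is only an illustration under stated assumptions: the eight data values, the choice of 3 classes, and the scalar names (data_min, data_range, n_classes, class_width, first_bound) are all made up for the example.

```stata
* Hypothetical data: eight made-up values of variable1
clear
input variable1
16
22
24
27
30
35
37
44
end

* Step 1: range = largest minus smallest observation
summarize variable1
scalar data_min   = r(min)
scalar data_range = r(max) - r(min)                  // 44 - 16 = 28

* Steps 2 and 3: pick 3 classes (an arbitrary choice), round the width up
scalar n_classes   = 3
scalar class_width = ceil(data_range / n_classes)    // ceil(9.33) = 10

* Step 4: start half a unit below the minimum so no value sits on a boundary
scalar first_bound = data_min - 0.5                  // 15.5

* Step 5: assign each observation to a class
* class 1 is [15.5, 25.5), class 2 is [25.5, 35.5), class 3 is [35.5, 45.5)
generate bin = floor((variable1 - scalar(first_bound)) / scalar(class_width)) + 1
tabulate bin
```

With these made-up values the classes hold 3, 3, and 2 observations.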
Types of Histograms
Frequency Histogram: Shows the actual frequency of observations in each class.
Relative Frequency Histogram (Proportion): Shows the proportion of observations in each class (frequency out of the total).
Percent Histogram: Shows the percentage of observations in each class (proportion multiplied by 100).
Calculated as: \(\text{Percent} = \text{Proportion} \times 100\)
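To make the link between frequency, proportion, and percent concrete, here is a small Stata sketch. The data values and the variable names (bin, frequency, n_total, proportion, percent) are assumptions made for the example, and the class boundaries 15.5, 25.5, 35.5, 45.5 are borrowed from the examples elsewhere in these notes.

```stata
* Hypothetical data: eight made-up values of variable1
clear
input variable1
16
22
24
27
30
35
37
44
end

* Class index for boundaries 15.5, 25.5, 35.5, 45.5 (width 10)
generate bin = floor((variable1 - 15.5) / 10) + 1

* Collapse to one row per class; frequency holds the class count
contract bin, freq(frequency)

* Proportion = frequency / total; percent = proportion * 100
egen n_total = total(frequency)
generate proportion = frequency / n_total
generate percent = proportion * 100
list bin frequency proportion percent
```

For these values the frequencies 3, 3, 2 become proportions 0.375, 0.375, 0.25 and percents 37.5, 37.5, 25.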
Graphing the Histogram
The horizontal axis represents the values or classes.
The vertical axis represents the frequency, relative frequency, or percentage.
Bars should touch each other to indicate that there are no gaps between the classes.
Stata - Do-file Editor
The Do-file Editor is a text editor within Stata that allows you to write and execute a program or a series of commands.
Commands in the Do-file Editor are typically shown in blue.
You can execute the entire do-file or select specific portions of the code to execute.
Useful Commands:
clear: Removes the data currently in memory. (clear all also removes other stored objects such as matrices and programs.)
log using: Starts a log file to record your Stata session.
cd: Changes the current working directory.
input: Enters data into Stata.
list: Lists the data in Stata.
generate: Generates a new variable.
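A short do-file that strings these commands together might look like the sketch below. The log file name, the folder path, the variable names (id, score, score_sq), and the three data rows are all hypothetical.

```stata
clear                                    // remove any data currently in memory
capture log close                        // close a log left open by an earlier run
* cd "C:/stats/week2"                    // hypothetical folder; edit and uncomment
log using "session1.log", replace text   // hypothetical log file name

* Type the data directly into Stata: one observation per line
input id score
1 16
2 22
3 24
end

list                                     // show the data just entered
generate score_sq = score^2              // new variable computed from an existing one
list

log close
```

Selecting only a few of these lines in the editor and executing them runs just that portion.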
Conditional Statements
if: A qualifier placed after a command so that the command applies only to observations that meet a specific condition.
replace: Changes the contents of an existing variable. Combined with if, it changes only the observations that meet the condition.
Example:
generate bin1 = 0: Creates a new variable bin1 and initializes it with zeros.
replace bin1 = 1 if variable1 >= 15.5 & variable1 < 25.5: Replaces the value in bin1 with 1 if variable1 is greater than or equal to 15.5 and less than 25.5.
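The same pattern can be repeated for each class. The sketch below extends the example to three indicator variables; the data values and the names bin2 and bin3 are assumptions made for illustration.

```stata
* Hypothetical data: eight made-up values of variable1
clear
input variable1
16
22
24
27
30
35
37
44
end

* One 0/1 indicator per class
generate bin1 = 0
replace  bin1 = 1 if variable1 >= 15.5 & variable1 < 25.5
generate bin2 = 0
replace  bin2 = 1 if variable1 >= 25.5 & variable1 < 35.5
generate bin3 = 0
replace  bin3 = 1 if variable1 >= 35.5 & variable1 < 45.5

* Sanity check: every observation belongs to exactly one class
assert bin1 + bin2 + bin3 == 1

* The number of 1s in an indicator is that class's frequency
count if bin1 == 1
count if bin2 == 1
count if bin3 == 1
```

Using >= for the lower boundary and < for the upper boundary keeps the classes from overlapping.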
Frequency Tables
Frequency tables display the number of observations in each category or bin.
Commands:
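tabulate: Produces a one-way frequency table showing the count, percent, and cumulative percent for each value. (tabstat is a different command: it reports summary statistics such as means, not frequencies.)
table: Creates a table of frequencies or summary statistics. (The command did not run during the session and still needs to be fixed.)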
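As a minimal sketch, a frequency table for a binned variable can be produced with tabulate, Stata's command for one-way frequency tables. The data values and the class variable bin are assumptions for the example. The one-variable form of table is included as well, but only as a guess at what was intended, since the notes do not say which form failed.

```stata
* Hypothetical data: eight made-up values of variable1
clear
input variable1
16
22
24
27
30
35
37
44
end

* Class index for boundaries 15.5, 25.5, 35.5, 45.5 (width 10)
generate bin = floor((variable1 - 15.5) / 10) + 1

* One-way frequency table: Freq., Percent, and Cum. columns
tabulate bin

* One-variable form of table: frequencies by class
table bin

* tabulate leaves the number of observations and rows in r()
quietly tabulate bin
display "observations: " r(N) "   classes: " r(r)
```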
Recode Command
The recode command is used to change the values of a variable based on specified conditions.
Example:
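recode variable1 (15.5/25.5 = 1) (25.5/35.5 = 2) (35.5/45.5 = 3), generate(bin): Recodes values of variable1 from 15.5 to 25.5 as 1, from 25.5 to 35.5 as 2, and from 35.5 to 45.5 as 3, and stores the result in a new variable called bin, leaving variable1 unchanged. Each rule is enclosed in parentheses. With whole-number data no value falls on a shared boundary such as 25.5, so the overlap between adjacent rules does not matter.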
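Here is a runnable sketch of the recode example, with each rule written in parentheses as in Stata's documented syntax. The eight data values are made up, and they are whole numbers so that none sits on a class boundary.

```stata
* Hypothetical data: eight made-up values of variable1
clear
input variable1
16
22
24
27
30
35
37
44
end

* Map each range of variable1 to a class number, stored in the new variable bin
recode variable1 (15.5/25.5 = 1) (25.5/35.5 = 2) (35.5/45.5 = 3), generate(bin)

* Compare original and recoded values, then tabulate the classes
list variable1 bin
tabulate bin
```

Because generate(bin) is specified, variable1 keeps its original values.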
Graph Command
The graph command is used to generate graphs in Stata.
graph bar: Generates a bar graph.
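For comparison with a histogram, the sketch below draws a bar graph of class counts. The data values and the class variable bin are assumptions for the example.

```stata
* Hypothetical data: eight made-up values of variable1
clear
input variable1
16
22
24
27
30
35
37
44
end

* Class index for boundaries 15.5, 25.5, 35.5, 45.5 (width 10)
generate bin = floor((variable1 - 15.5) / 10) + 1

* One bar per class; bar height = number of nonmissing values of variable1
graph bar (count) variable1, over(bin)
```

Unlike a histogram, graph bar treats bin as a set of categories and leaves gaps between the bars.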
Histogram Command
The histogram command is used to generate histograms in Stata.
histogram variable1, frequency start(15.5) width(10): Generates a frequency histogram for variable1, starting at 15.5 with a width of 10.
Options:
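frequency: Specifies a frequency histogram (bar heights are counts). If no scale option is given, Stata draws the bars as densities.
fraction: Specifies a relative frequency histogram (bar heights are proportions).
percent: Specifies a percent histogram.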
start(): Specifies the starting value for the histogram.
width(): Specifies the width of each class.
bin(): Specifies the number of bins. It is an alternative to width(), so specify one or the other, not both.
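The options can be tried out on a small data set. In this sketch the eight values of variable1 are made up; the start and width values follow the example above.

```stata
* Hypothetical data: eight made-up values of variable1
clear
input variable1
16
22
24
27
30
35
37
44
end

* Let Stata choose the boundaries for 3 bins (starts at the minimum by default)
histogram variable1, frequency bin(3)

* Percent and relative frequency versions with explicit boundaries
histogram variable1, percent start(15.5) width(10)
histogram variable1, fraction start(15.5) width(10)

* Frequency histogram with classes [15.5, 25.5), [25.5, 35.5), [35.5, 45.5)
histogram variable1, frequency start(15.5) width(10)

* histogram leaves the settings it used in r()
display "bins: " r(bin) "  width: " r(width) "  start: " r(start)
```

Each histogram command replaces the graph drawn by the previous one, so run them one at a time to compare.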
Skewness and Kurtosis
Skewness measures the asymmetry of a distribution.
A symmetric distribution has a skewness of 0.
A distribution skewed to the right (a longer tail on the right) has a positive skewness.
A distribution skewed to the left (a longer tail on the left) has a negative skewness.
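In Stata, skewness is reported by summarize with the detail option. The sketch below checks the sign of the skewness for three made-up variables; the variable names and values are assumptions chosen so that the shape of each distribution is obvious.

```stata
* Hypothetical data: one long right tail (the value 20)
clear
input right_skewed
1
2
2
3
3
3
4
20
end

* Mirror image of the same values: long left tail
generate left_skewed = 21 - right_skewed

* Evenly spaced values 1, 2, ..., 8: symmetric about 4.5
generate symmetric = _n

* summarize, detail prints skewness and also stores it in r(skewness)
summarize right_skewed, detail
display "right-skewed: " r(skewness)

summarize left_skewed, detail
display "left-skewed:  " r(skewness)

summarize symmetric, detail
display "symmetric:    " r(skewness)
```

The same output also lists kurtosis, stored in r(kurtosis).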