What graphs can be used to describe one variable?
1.- Pie graph
2.- Bar graph
3.- Cumulative frequency graph
4.- Histogram
5.- Box-plot
What command is used to download data files from the internet?
webuse
Downloads a Stata example dataset from the internet and loads it into memory.
e.g., webuse lbw
lbw is a well-known example dataset on low birth weight (labeled “Hosmer & Lemeshow data”) used in logistic regression examples.
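A minimal sketch of loading and inspecting lbw, assuming an internet connection (webuse fetches the file online):

```stata
* Download the Hosmer & Lemeshow low-birth-weight data and load it
* into memory; clear discards any data already in memory
webuse lbw, clear

* List the variables with their storage types and labels
describe

* One-way frequency table of race, the variable used in the pie-chart cards
tabulate race
```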
What does the following command do?
graph pie, over(race)
Draws a pie chart in which each slice is proportional to the number of observations in one race category.
over(variable) defines the categories: one slice per category of the variable.

What does the plabel() option do?
graph pie, over(race) plabel(_all percent) legend(on)
plabel(_all percent) prints the percentage on every slice (_all = all slices), showing the share of each race category; legend(on) adds a legend identifying the slices.


What does the pie() option do in graph pie?
It sets the colour of each slice.
e.g., pie(1, color(gs15)) makes the first slice light gray (gs15).
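A sketch combining the pie options from the cards above on lbw, assuming race has three categories coded 1 to 3 (so pie(1) to pie(3) cover every slice). The gray shades are arbitrary picks; on Stata's gray scale gs0 is black and gs16 is white. Run it from a do-file, since /// continuation does not work in the Command window.

```stata
webuse lbw, clear

* One slice per race category, percentage printed on every slice,
* legend switched on. pie(#, color()) sets the fill colour of slice #;
* the shades below are illustrative choices only.
graph pie, over(race) plabel(_all percent) legend(on) ///
    pie(1, color(gs15)) pie(2, color(gs10)) pie(3, color(gs5))
```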

When is a bar graph used?
- To represent a qualitative ordinal variable
- Each bar represents a category
- Bar height is proportional to the frequency

What does the command graph bar, over(age_cat) do?
Makes a bar chart of the distribution of age_cat.
For each category of age_cat (x-axis), the bar height (y-axis) shows the relative frequency (percentage) of observations in that category.
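A minimal sketch on lbw. The course's age_cat variable is not available here, so this builds a hypothetical age_cat from age with recode; the cut-points are assumptions, not the course's.

```stata
webuse lbw, clear

* Hypothetical age groups for illustration; the course's own age_cat
* may use different cut-points
recode age (min/19 = 1 "<20") (20/24 = 2 "20-24") (25/29 = 3 "25-29") ///
    (30/max = 4 "30+"), generate(age_cat)

* With no y-variable, graph bar plots the percentage of observations
* in each over() category
graph bar, over(age_cat)
```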

hist age_cat, discrete freq gap(20) xlabel(, valuelabel) addlabel ///
fcolor(navy) lcolor(none) xtitle("Age of the mother")
Makes a histogram of age_cat with one separate bar per category: counts on the y-axis, gaps between bars, category names on the x-axis, and the count printed on each bar.
What each part does:
hist age_cat
Draws a histogram of age_cat.
, discrete
Treats age_cat as discrete categories (one bar per integer/category), not continuous bins.
freq
Y-axis shows frequency (counts), not density/percent.
gap(20)
Adds space between bars (bigger number = bigger gaps). Makes it look more like separated category bars.
xlabel(, valuelabel)
Uses the value labels of age_cat on the x-axis (e.g., “18–24”, “25–34”) instead of showing just codes (1,2,3…).
addlabel
Prints the count on top of each bar.
fcolor(navy)
Sets the fill color of the bars to navy.
lcolor(none)
Removes the outline around bars (no border line).
xtitle("Age of the mother")
Sets the x-axis title.
///
Line continuation in Stata: lets you split a long command across lines.
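The same histogram options applied to lbw's race variable (coded 1 to 3) as a stand-in for age_cat, which lbw does not contain. Listing the codes explicitly in xlabel(1/3, valuelabel) guarantees one labeled tick per category; the option names are written out in full (frequency, addlabels).

```stata
webuse lbw, clear

* One bar per race code, counts on the y-axis, gaps between bars,
* value labels on the x-axis and the count printed above each bar
histogram race, discrete frequency gap(20) xlabel(1/3, valuelabel) ///
    addlabels fcolor(navy) lcolor(none) xtitle("Race of the mother")
```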

cumul age, gen(c_age) equal
twoway scatter c_age age, connect(l) sort
1) cumul age, gen(c_age) equal
cumul computes the empirical cumulative distribution function (CDF) of age.
gen(c_age) saves the cumulative values into a new variable called c_age.
After this, each observation gets a number between 0 and 1:
c_age = proportion of observations with age ≤ that observation’s age (a cumulative proportion).
equal gives observations with equal (tied) values of age the same cumulative value. Without it, tied observations receive different, increasing values depending on their sort order.
2) twoway scatter c_age age, connect(l) sort
Plots c_age (y-axis) against age (x-axis).
connect(l) draws straight lines between the points, so the plot looks like a CDF curve rather than separate dots.
sort connects the points in increasing order of age; without it, the line follows the order of the data in memory and can zigzag.
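A sketch of the cumulative frequency graph on lbw's age variable, with the options written out in full (generate() instead of gen()). The sort option makes the connecting line follow increasing age rather than the order of the data in memory.

```stata
webuse lbw, clear

* Empirical CDF of maternal age; equal gives tied ages the same value
cumul age, generate(c_age) equal

* Line through the (age, c_age) points in increasing order of age
twoway scatter c_age age, connect(l) sort
```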

graph box partial1, ylab(0(1)10)
Draws a box-and-whisker plot for the variable partial1 (one box showing its distribution).
ylab(0(1)10) (y-axis labels)
This controls the tick marks on the y-axis:
0(1)10 means: label 0, 1, 2, …, 10 (step = 1)
So the y-axis is labeled at every integer from 0 to 10, a natural scale for a score that ranges from 0 to 10.
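partial1 is a course variable that lbw does not contain, so this sketch uses mother's age instead; the ylabel() rule 10(5)45 is an assumption chosen to cover lbw's age range. summarize, detail lists the median and quartiles that a box plot summarizes.

```stata
webuse lbw, clear

* Box plot of maternal age with y-axis labels every 5 years
graph box age, ylabel(10(5)45)

* The box spans the 25th to 75th percentiles and the inner line marks
* the median; summarize, detail reports these percentiles
summarize age, detail
display "Q1 = " r(p25) "  median = " r(p50) "  Q3 = " r(p75)
```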

What graphs can be used to describe the relationship between two variables?
1.- Two or more pie graphs
2.- Two or more bar graphs
3.- Two or more box-plots
4.- Scatter plot

When to use two or more pie graphs?
Two qualitative nominal variables
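A sketch of "two or more pie graphs" on lbw, treating race and smoke as the two nominal variables: by(smoke) draws one pie of race per smoking group.

```stata
webuse lbw, clear

* One pie of race for each smoking group, drawn side by side,
* with the percentage printed on every slice
graph pie, over(race) by(smoke) plabel(_all percent)
```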

When to use two or more bar graphs?
One qualitative nominal, one qualitative ordinal
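A sketch for one nominal and one ordinal variable on lbw, with smoke as the nominal variable and a hypothetical ordinal age_cat built from age (the cut-points are assumptions, not the course's).

```stata
webuse lbw, clear

* Hypothetical ordinal age groups for illustration
recode age (min/19 = 1 "<20") (20/24 = 2 "20-24") (25/29 = 3 "25-29") ///
    (30/max = 4 "30+"), generate(age_cat)

* One panel of age-group bars for each smoking group
graph bar, over(age_cat) by(smoke)
```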

When to use two or more box-plots?
One qualitative and one quantitative
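A sketch for one qualitative and one quantitative variable on lbw: birth weight (bwt, grams) by smoking status, with tabstat listing the quartiles and median of each group.

```stata
webuse lbw, clear

* Birth weight (quantitative) across smoking groups (qualitative):
* one box per category of smoke
graph box bwt, over(smoke)

* Quartiles and median of bwt within each smoking group
tabstat bwt, by(smoke) statistics(p25 p50 p75)
```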

When to use a scatter plot?
Two quantitative variables

tw sc SBP age
Twoway scatter plot (tw sc is short for twoway scatter):
Y-axis: SBP (systolic blood pressure)
X-axis: age
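SBP and the course's other variables are not in lbw, so this sketch uses two quantitative lbw variables instead: birth weight (bwt, grams) and mother's weight at the last menstrual period (lwt, pounds). correlate adds a numeric summary of the linear association.

```stata
webuse lbw, clear

* Birth weight (y-axis) against mother's weight (x-axis)
twoway scatter bwt lwt

* Pearson correlation between the two variables
correlate bwt lwt
```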

sc SBP weight0 if smk==1 || lfit SBP weight0
sc SBP weight0
Scatter plot with:
Y = SBP (systolic blood pressure)
X = weight0 (baseline weight)
if smk==1
Only plot the observations where smk equals 1 (e.g., smokers).
||
Combines multiple “twoway” plots in the same graph (adds another layer).
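lfit SBP weight0
Adds a linear fit line (OLS regression line) of SBP on weight0.
Note: the if qualifier applies only to the scatter layer, so this line is fitted on all observations. To fit smokers only, write lfit SBP weight0 if smk==1.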
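The same layering on lbw, with smoke standing in for smk, bwt for SBP and lwt for weight0 (assumed analogues, not the course data). The if qualifier is repeated on the lfit layer so that the line is fitted on smokers only, and regress reports the intercept and slope of that line.

```stata
webuse lbw, clear

* Each twoway layer takes its own if, so it is repeated on lfit
twoway scatter bwt lwt if smoke == 1 || lfit bwt lwt if smoke == 1

* lfit draws the OLS line; regress reports its intercept and slope
regress bwt lwt if smoke == 1
```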

sc SBP weight0 if smk==1 || qfit SBP weight0 ///
, xlabel(40(5)120) ylab(80(5)155, angle(0))
sc SBP weight0 if smk==1
sc = scatter plot
Y-axis: SBP (systolic blood pressure)
X-axis: weight0
if smk==1 = only plot observations where smk equals 1 (smokers)
||
Adds another plot layer on the same graph.
qfit SBP weight0
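Adds a quadratic fit curve (a 2nd-degree polynomial regression of SBP on weight0).
It fits: $\mathrm{SBP} = a + b \cdot \mathrm{weight0} + c \cdot \mathrm{weight0}^2$
Useful if the relationship looks curved rather than straight. qfit has no if qualifier here, so the curve is fitted on all observations, not just smokers.
xlabel(40(5)120) labels the x-axis from 40 to 120 in steps of 5; ylab(80(5)155, angle(0)) labels the y-axis from 80 to 155 in steps of 5 and prints those labels horizontally (angle 0).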
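A quadratic-fit sketch on lbw using bwt for SBP, lwt for weight0 and smoke for smk (assumed stand-ins). regress with c.lwt##c.lwt includes lwt and its square, so it should reproduce the coefficients a, b and c of the qfit curve when both use the same observations.

```stata
webuse lbw, clear

* Scatter of smokers with a quadratic fit on the same subsample,
* y-axis labels printed horizontally
twoway scatter bwt lwt if smoke == 1 || qfit bwt lwt if smoke == 1, ///
    ylabel(, angle(0))

* Regression of bwt on lwt and lwt squared (c.lwt##c.lwt adds both terms)
regress bwt c.lwt##c.lwt if smoke == 1
```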

Kaplan–Meier graph
Plots the estimated cumulative survival probability against time.

stset followup, fail(death)
This tells Stata: “I’m doing survival/time-to-event analysis.”
followup = the time variable (how long each person was followed, e.g., years/months/days).
fail(death) defines the event indicator:
death==1 → the event happened (failure)
death==0 → censored (no event during follow-up)
After stset, Stata creates its internal survival variables (_t0, _t, _d, _st), and you can then use sts, stcox, etc.
sts graph, xlab(0(1)6)
This draws the Kaplan–Meier survival curve based on the stset data.
sts graph = plot the estimated survival function S(t).
xlab(0(1)6) puts x-axis tick labels at 0, 1, 2, 3, 4, 5, 6 (in the same time units as followup).
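followup and death come from the course data, so this sketch uses cancer.dta, which ships with Stata (sysuse needs no internet). It assumes studytime is the follow-up time in months and died is the 0/1 event indicator, as in that dataset's labels; by(drug) draws one curve per treatment group.

```stata
* Drug-trial survival data shipped with Stata
sysuse cancer, clear

* Declare survival data: studytime = time, died = event indicator
stset studytime, failure(died)

* Kaplan-Meier survival curve, one curve per drug group
sts graph, by(drug)

* Table of the Kaplan-Meier estimates of S(t) behind the curve
sts list
```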
