UCD GEARY INSTITUTE FOR PUBLIC POLICY

DISCUSSION PAPER SERIES

Basic Stata graphics for economics students

Kevin J. Denny
School of Economics & Geary Institute for Public Policy
University College Dublin

Geary WP2018/13
July 11, 2018

UCD Geary Institute Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of
such a paper should account for its provisional character. A revised version may be available directly from the author.

Any opinions expressed here are those of the author(s) and not those of UCD Geary Institute. Research published in this series
may include views on policy, but the institute itself takes no institutional policy positions.


This paper provides an introduction to the main types of graph in Stata that economics
students might need. It covers univariate discrete and continuous variables, bivariate
distributions, some simple time plots and methods of visualising the output from estimating
models. It shows a small number of the many options available and includes references to
further resources.

Keywords: Stata, graphics, data visualization

JEL codes: A2, C87, Y10

Author's note: This guide was written with my students in mind. I am releasing it into the wild as others may find it helpful but
mainly as a commitment device to prevent me extending it further. Comments to [Link]@[Link]

1. Introduction

It has often been said that a picture paints a thousand words. This is certainly true when it
comes to data analysis. There are two good reasons to acquire some skills in graphing your
data: (1) Graphical methods are a powerful way for a researcher to explore data and (2)
graphs can be a very useful way of illustrating your data and results whether it is in a
presentation, a project or a thesis.

To motivate the first reason above, consider the following set of graphs:

These four graphs are collectively known as Anscombe’s Quartet. You may be surprised to
learn that the x variables in these four graphs have the same mean and the same variance.
This is also true of y and, moreover, the covariance between x and y is the same & hence the
regression line is the same. Clearly they are very different relationships. Without graphing
the data, you would probably never know.

In this document I show how to use Stata to generate some of the key graphs that economics
students should know about and should consider using in their projects, presentations and
theses. There are several good online treatments of Stata graphics (listed at the end). Stata’s
YouTube channel has videos on graphics which are excellent. The book by Michael Mitchell is
a fantastic resource which you could also draw on. Andrew Jones’ guide, though designed for
health econometrics, is of general interest if you are using Stata. Jesse Shapiro has a great set
of slides on preparing a good applied microeconomics presentation; they are relevant for any
applied economics talk & probably for other social sciences too. Here I am going to outline the main
methods that I think economics students should know at a minimum. Along the way I show a
few of the many options available to whet your appetite. The definitive source of information
is the Stata Graphics Manual which is a mere 739 pages long. A classic text on data
visualization and graphics is Tufte (2001). For a shorter guide targeted at economists see the
paper by Schwabish (2014).

All of the datasets I use here are either available online & can be accessed in Stata using the
webuse command or they are provided with Stata and can be accessed using sysuse. To
switch from one dataset to another you need to use clear first. Stata commands will be in
bold. A basic knowledge of Stata is required. There are two ways to create graphs in Stata.
You can either (a) use a written command which can be done interactively in the command
line or written in a do-file or (b) you can use the dialogue boxes/pull-down menus at the top.
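
As a minimal sketch of switching between datasets (the two datasets chosen here are just illustrative; both appear later in this guide):

```stata
* load a dataset shipped with Stata; clear drops whatever is in memory first
sysuse auto, clear
describe

* load a dataset from Stata's website (needs an internet connection)
webuse klein, clear
describe
```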

A nice feature is that if you use the dialogue box to create a graph Stata will show you the
equivalent syntax in the output window so you can learn how to generate the graph. You
could copy the syntax into a do-file so you can repeat the exercise. So I tend to use the pull-
down menus to experiment until I get the graph looking like I want. Then I copy the syntax
that generates it from the output window into my do-file so I can replicate it later.

When Stata produces a graph for you on the screen click on “File” at the top left: you can
either save it or you can open the editor to make further changes. Stata’s native format for
graphs is .gph. If you want to include it in a Word document or a PowerPoint file, for example,
you need to save it in a format like Portable Network Graphics (.png), a TIFF file (.tif), a
Windows metafile (.wmf) or a PostScript file (.ps). You may need to experiment with saving to
different formats to get something that works with your document. As .wmf files only work
with Windows, if you are a Mac user or you are collaborating with one it might be best to
avoid them: .emf is better. If in doubt I recommend saving the graph as .png. PostScript files
can end up taking a lot of space if there are a large number of data points in your graph.
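
A sketch of doing the same from the command line or a do-file, assuming the auto data and hypothetical file names (price_hist.gph and price_hist.png) written to the current working directory:

```stata
sysuse auto, clear
histogram price

* save in Stata's native format so the graph can be reopened and edited later
graph save "price_hist.gph", replace

* export the graph currently on screen; the file suffix determines the format
graph export "price_hist.png", replace
```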

A feature I will not discuss here is that you can create two graphs separately and then
combine them into one graph. Koffman (2015) has a few slides on this; see also help graph
combine.
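
For the curious, a minimal sketch of the idea, assuming the auto data; the graph names g1 and g2 are arbitrary labels chosen for illustration:

```stata
sysuse auto, clear

* name() keeps each graph in memory under a label instead of overwriting it
histogram price, name(g1, replace)
scatter price mpg, name(g2, replace)

* place the two stored graphs side by side in one figure
graph combine g1 g2
```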

The Stata graphics editor has numerous options & you can customize the graph in many
ways. It is beyond the scope of this document to describe how. Here I am mostly going
to use the graph commands that come with Stata. However there are some good user-written
commands for Stata graphics that are freely available online. You can find and download
them within Stata using the findit command. Here I will draw on just four of these:
binscatter, coefplot, fabplot and vioplot. To download the first of these, say, just type
findit binscatter in the command line, hit return and follow the steps. Alternatively, type
ssc install binscatter.

As these user-written commands are occasionally revised, it is worth using the adoupdate
command periodically to ensure you have the latest version.
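
Put together, a sketch of the installation steps, assuming all four packages are currently hosted on SSC and you have an internet connection:

```stata
* install the four user-written commands used in this guide
ssc install binscatter, replace
ssc install coefplot, replace
ssc install fabplot, replace
ssc install vioplot, replace

* list installed user-written packages for which updates are available
adoupdate
* and actually install those updates
adoupdate, update
```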

2. Distributions

When you are analysing data it is essential that you carefully explore the data before you get
stuck into modelling it with econometric methods. You really need to get to know your
data. There are a few reasons for this. One is that exploring the data will sometimes show up
anomalies: for example there might be implausible values, such as missing values coded as -1 or 99. The
main reason is to get a sense of what the basic patterns are. This is particularly the case for
variables that you create from the raw data. It is very easy to make a mistake (even
experienced users do) so if you generate a new variable you want to check that it looks
sensible.

2.1 Univariate
We will first consider looking at the distribution of a single variable. You should certainly
have a good look at your key variables before you do any modelling.

2.1.1 Discrete
For a discrete variable you should use a histogram:

webuse fullauto

ta rep78 generates a table of this discrete variable. This is fine as far as it goes and you may
want to include a table like this in your document particularly if this is your dependent
variable. Note that to keep the table nicely aligned as it is in Stata you need to use the Courier
font. However it may be hard to get a sense of the distribution simply by looking at the table.
If you are preparing a presentation, for example, you want the audience to easily grasp what
the data looks like. Let’s graph it next.

histogram rep78, discrete fraction creates a histogram where the heights of the bars give
the fraction in each category. If you want to show percentages instead simply replace
fraction with percent. If you want absolute frequencies use frequency.
[Figure: histogram of rep78. Y axis: Fraction; x axis: Repair Record 1978]

You can generate a graph with separate histograms for different groups beside each other:
histogram rep78, discrete fraction by(, title(Figure 1) note(Data: auto)) by(foreign)

[Figure 1: histograms of rep78 in two panels, Domestic and Foreign. Y axis: Fraction; x axis: Repair Record 1978; note: Data: auto]

See how I have added a title and a note at the bottom?

Another way of illustrating shares across categories is the pie chart. Pie charts are not to
everyone’s taste and some are less than helpful so use them wisely. Try graph pie,
over(rep78) as an alternative to a histogram. If you would like to see what % each slice
contains: graph pie, over(rep78) plabel(_all percent) .

A bar chart is a useful way of comparing some characteristic of a variable (like the mean)
across different categories of a variable. For example

graph bar (mean) price, over(rep78)

graph bar (mean) price, over(rep78) over(foreign) shows the mean across categories of
two variables.

An alternative way of doing this where the two graphs are separate is:

graph bar (mean) price, over(rep78) by(foreign)

Note that bar charts aren’t just for means. You can use them to compare other statistics like
medians, standard deviations, maxima etc. When you use the pull-down menu there is a
“Statistic” option beside the variable. If you want the bars to be horizontal, replace bar with
hbar. The mean is the default. To generate a bar chart of standard deviations across the
categories of rep78 for example try:

graph bar (sd) price, over(rep78)
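
As a sketch of these variations, assuming the auto data; the axis title is my own illustrative choice:

```stata
sysuse auto, clear

* horizontal bars showing the median rather than the mean
graph hbar (median) price, over(rep78) ytitle("Median price (US dollars)")

* two statistics of the same variable side by side within each category
graph bar (mean) price (median) price, over(foreign)
```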

Stata has another type of graph which can be used to illustrate differences in means or
medians of a continuous variable across categories of one or more variables.

sysuse nlsw88, clear

grmeanby race married collgrad, summarize(wage) ytitle($) ytitle(, size(medlarge))
title(Mean wage differences) subtitle(hourly wage)

This example shows in one graph the differences in average earnings between the categories
of three variables. A glance at this graph suggests that education seems to be more important
than marital status. Note that I have added a title to the Y axis (the “$”) and changed the size
of the title. The command line I used continues onto a second line. If you were using this in a
do-file Stata has to know to read it as one line. Entering it like this in your do-file will work:

grmeanby race married collgrad, summarize(wage) ytitle($) ytitle(, size(medlarge)) /*
*/ title(Mean wage differences) subtitle(hourly wage)

The /* */ comments out the end of the line so Stata just reads it as one line. Adding median
after the “,” (the comma) in the command will show the medians instead e.g.:

grmeanby race married collgrad, summarize(wage) median
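
Another common way to continue a command over several lines in a do-file is the /// continuation marker. A sketch of the first command written that way (I have quoted the titles and dropped the y-axis title options for brevity):

```stata
sysuse nlsw88, clear

* /// tells Stata that the command continues on the next line of the do-file
grmeanby race married collgrad, summarize(wage) ///
    title("Mean wage differences") subtitle("hourly wage")
```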

2.1.2 Continuous

There are a few ways to show the distribution of a continuous variable. You can use a
histogram as shown in section 2.1.1 (without the discrete option). My preferred method is to
plot a kernel density estimate.

kdensity price

The smoothness of the density is controlled by a bandwidth parameter. Stata calculates a
default parameter & reports it. You can see it is 605.6 in the above example. You can override
this if necessary using the bwidth() option, which can be abbreviated to bw(). For example by
reducing the bandwidth to 400 you make it less smooth. Be careful not to over-smooth i.e. setting
the bandwidth so high that you remove key features of the data. There is a handy download
akdensity (Van Kerm, 2012) which allows the bandwidth to adapt optimally to how much data there
is at a particular part of the distribution. The normal option (abbreviated to norm below)
superimposes a normal density, which can be useful if you have reasons to believe that the
variable should be normally distributed:

kdensity price, bw(400) norm xtitle(US $) title(Distribution of car prices)

Note how I have added a title and a label for the x axis. Sometimes you may wish to
superimpose two densities on top of each other. For example if you are looking at the
distribution of earnings, it might be useful to compare the earnings of men and women or
married and unmarried people.

Use the nlsw88 data

sysuse nlsw88, clear

kdensity wage if married==0 , addplot(kdensity wage if married==1)

[Figure: Kernel density estimate. Y axis: Density; x axis: hourly wage; legend: Kernel density estimate, kdensity wage; kernel = epanechnikov, bandwidth = 1.0162]

The density shown in blue is for the unmarried (married==0) and red is for the married.
The legend below the graph is not very helpful unfortunately and you will need to edit it in
the graphics editor so you can end up with something like:

[Figure: Kernel density estimate after editing the legend. Y axis: Density; x axis: hourly wage; legend: Unmarried, Married; bandwidth = 1.0162]
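
The legend can also be set in the command itself rather than in the graph editor. A sketch, assuming the same nlsw88 data; the label and title text are my own choices:

```stata
sysuse nlsw88, clear

* order() relabels the legend keys: plot 1 is the main density, plot 2 the added one
kdensity wage if married == 0, ///
    addplot(kdensity wage if married == 1) ///
    legend(order(1 "Unmarried" 2 "Married")) ///
    title("Hourly wage by marital status")
```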

An alternative way of comparing the distribution of a continuous variable across categories
of some other discrete variable is a boxplot (aka “box and whisker” plot). The middle line in
the box shows the median, the bottom and top of the box show the 25th & 75th percentiles
(the lower and upper quartiles), respectively. So the height of the box is the IQR, the
inter-quartile range.

The “whiskers” from the box extend vertically to the upper and lower adjacent values. Their
definition is somewhat tricky: think of a value U = the 75th percentile + (3/2)*IQR and L = the
25th percentile - (3/2)*IQR. The upper adjacent value is the largest value of x which is ≤ U. The
lower adjacent value is the smallest value of x which is ≥ L (see footnote 2). Essentially, the
whiskers pick up the extent to which the distribution is spread out outside of the IQR. Points
outside this range are shown as dots. In the example below we show the distribution over
categories of two variables, college graduate status and marital status:

graph box wage, over(collgrad) over(married)

[Figure: box plots of hourly wage by college graduate status (not college grad, college grad) within marital status (single, married). Y axis: hourly wage]

The points at the top of each plot show that this variable is right (positively) skewed. These
points can sometimes distort the diagram so if you wish to omit them adding noout (short for
nooutsides) at the end of the line will do. The command below plots the boxes horizontally and
removes the outside values:

graph hbox wage, over(married) noout
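
To see what the adjacent values are for a particular variable, they can be computed by hand. A sketch following the definition above, using the whole nlsw88 sample rather than the subgroups in the graphs; the scalar names are hypothetical:

```stata
sysuse nlsw88, clear

* quartiles of wage, then the inter-quartile range and the two fences U and L
quietly summarize wage, detail
scalar iqr      = r(p75) - r(p25)
scalar fence_hi = r(p75) + 1.5*scalar(iqr)
scalar fence_lo = r(p25) - 1.5*scalar(iqr)

* adjacent values: the most extreme observations still inside the fences
quietly summarize wage if wage >= scalar(fence_lo) & wage <= scalar(fence_hi)
display "lower adjacent value = " r(min)
display "upper adjacent value = " r(max)
```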

Footnote 2: Note that this is Stata’s implementation of box plots which goes back to the influential work of John Tukey (1977).
Other approaches are possible. For example some have the whiskers extend to the 10th & 90th percentiles instead.

Violin plots, introduced by Hintze & Nelson (1998), are a useful way of combining box plots
and densities. First download the vioplot package (ssc install vioplot). Then, using the
auto dataset:

vioplot mpg , over(rep78) title("Violin plot of mileage") subtitle("by repair record")

The white dot is a marker for the median, the thick line shows the interquartile range with
whiskers extending to the upper & lower adjacent values (as defined above). This is overlaid
with a density of the data. Violin plots contain a lot of information though you may need to
fiddle with the options to get it looking right.

2.2 Bivariate

To examine a bivariate distribution, start with a scatterplot:

twoway (scatter price mpg , msymbol(Oh))

I used the msymbol option above to change the markers to hollow circles. Scatterplots are not always
very illuminating and you may want to adjust them. It is simple to fit and plot a linear
regression line through these data:
twoway (scatter price mpg) (lfit price mpg)

If you replace lfit with lfitci it shows the confidence interval around the line. Using qfit
instead fits a quadratic curve (and qfitci adds its confidence interval).
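
A sketch combining a fitted line, its confidence interval and the raw points, assuming the auto data; the axis title is an illustrative choice:

```stata
sysuse auto, clear

* draw the confidence band first so that the points sit on top of it
twoway (lfitci price mpg) (scatter price mpg, msymbol(Oh)), ///
    ytitle("Price") legend(off)
```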

If you want to show a scatterplot for two different subsets of the data try:

scatter mpg weight if foreign || scatter mpg weight if !foreign , yline(20)

The blue dots refer to the first named subset (foreign cars). I have added a line
corresponding to y=20 with the yline(20) option. You can have more than one line: using say
xline(3000 4000) would create vertical lines corresponding to those values of x. You can
use this if there is a particular x or y value that is important (e.g. a particular year). This
syntax is another way of creating the same basic diagram:

twoway (scatter mpg weight if foreign) (scatter mpg weight if !foreign)

Note the variable foreign is either 0 or 1 so !foreign means “not foreign” i.e. foreign==0.
Sometimes a scatter plot has so many datapoints that you end up with a graph that’s not very
illuminating. Let’s switch to the nlsw88 data set to see this:

sysuse nlsw88, clear

scatter wage tenure

(Note that this is the same as twoway (scatter wage tenure).)

[Figure: scatterplot of wage against tenure. Y axis: hourly wage; x axis: job tenure (years)]

Not terribly clear is it? There is a handy Stata download called binscatter. It groups the x-
axis variable into equal-sized bins, computes the mean of the x-axis and y-axis variables
within each bin, then creates a scatterplot of these data points.

binscatter wage tenure

[Figure: binscatter of wage against tenure. Y axis: wage; x axis: tenure]

What this brings out is that there is a positive relationship between wages and job tenure.
binscatter has many nice features that you can explore.

It is possible to generate scatterplots of 3 variables i.e. 3-dimensional graphs using a
download, graph3d. These are trickier to get in a form that is helpful.

An alternative way of illustrating the relationship between two variables is to fit a curve
to the data. Stata has several ways of doing this. A popular method is called lowess (for
locally weighted scatterplot smoothing).

lowess wage tenure, by(married) lineopts(lwidth(medthick))

Here I have also made the line thicker than the default. An alternative curve-fitting technique
is local polynomial smoothing:

lpoly wage tenure, ci lineopts(lcolor(yellow) lwidth(thick)) title(Local polynomial smoothing) note(nlsw88 data)

I have made the line even thicker & changed the colour to yellow. The confidence interval is
so tight for most of the range of tenure you can’t see it here except in the right tail where it
spreads out as there is so little data. Which technique you use is partly a matter of taste and
what works best with your data.

In the lowess graph above I used by(married) to generate separate graphs for
two subsets according to that variable. If I used twoway scatter wage tenure, by(married
union) I would have four separate graphs, one for each of the combinations: single & non-union,
single & union, etc.

A download fabplot, created by Nick Cox, shows graphs in which each panel contains all the
points but the particular subsets are highlighted. For example:

fabplot scatter wage tenure , by(married union)

So in the top left graph, the blue dots show you where the single non-union members are in
the whole distribution. This may facilitate uncovering where in the joint distribution a
particular subset is concentrated.

To generate a matrix of scatterplots for several variables:

webuse auto, clear

graph matrix price mpg weight length, half

[Figure: lower-triangle scatterplot matrix of Price, Mileage (mpg), Weight (lbs.) and Length (in.)]

Omitting the “half” option means that the upper triangle (symmetric to the lower one) is also
shown.

2.3 Time plots
If your data are time series it is best to use tsline, the dedicated line plot command for time series.

webuse klein, clear

tsset yr

This tells Stata that yr is the time variable.

tsline consump invest govt

This will graph the three variables over time:

[Figure: time-series line plot, 1920 to 1940. X axis: year; legend: consumption, investment, government spending]

Use twoway (tsline consump, recast(scatter)) if you do not wish the points to be
connected.

To plot the correlogram i.e. the autocorrelations between y(t) and y(t-1), y(t) and y(t-2) etc.:
ac consump, lags(8)

Sometimes you have two variables and you want to illustrate the range between them over
time. For example they could be the upper and lower bounds for a given outcome, like a daily
price high and low.

sysuse tsline2
twoway rarea ucalories lcalories day

For a slightly different look replace rarea with rcap or rbar.

twoway area calories day shades the area under the curve

Some time series have high variance and it may be preferable to create a smoother
version of the data that removes much of the short-run volatility. The simplest way of doing
this is a moving average. To graph a moving average in Stata, create it using tssmooth ma
first.

tssmooth ma caloriessm = calories , window(3 1 2)

This generates a 6-period moving average (the periods here are days) with three lags, the
current value and two leads. You can apply weights to the different values if you wish. To
show the original and the moving average series:

tsline cal*
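
The steps above, gathered into one sketch. I use the hypothetical name cal_ma for the smoothed series and declare the time variable explicitly with tsset in case the data are not already set:

```stata
sysuse tsline2, clear
tsset day

* uniformly weighted moving average: 3 lags, the current value and 2 leads
tssmooth ma cal_ma = calories, window(3 1 2)

* plot the original and the smoothed series together
tsline calories cal_ma, legend(order(1 "Calories" 2 "Moving average"))
```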

The widely used Hodrick-Prescott filter for macroeconomic data is available with the tsfilter
hp command.

Sometimes you might wish to plot two or more variables which have different dimensions
for example GNP and the unemployment rate. In that case you can use a separate y axis for
each of the two. Using the klein dataset:

twoway (scatter consump yr, c(l) yaxis(1)) (scatter taxnetx yr, c(l) yaxis(2))
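
A variant of the same idea as a sketch, using line plots instead of connected scatters and giving each axis its own title; the titles are simply the variable names:

```stata
webuse klein, clear

* each series gets its own y axis: axis 1 on the left, axis 2 on the right
twoway (line consump yr, yaxis(1)) (line taxnetx yr, yaxis(2)), ///
    ytitle("consump", axis(1)) ytitle("taxnetx", axis(2))
```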

To get a taste for some of the many options you can use in crafting an image, consider this:

twoway (scatter consump yr, c(l) msymbol(Dh) mcolor(blue) msize(large) clwidth(thick) clcolor(maroon))

2.4 Panel data

If you are using panel/longitudinal data it is possible to create line plots of a series for the
individuals:
sysuse xtline1
xtset person day
xtline calories, overlay

Dropping the overlay option generates a separate graph for each individual. If “N” is large
i.e. there are many individuals, then this form of graph may not be very useful.
3. Graphs after estimation
After you have estimated your models there are several reasons to use graphs. One is that,
post-estimation, it may be wise to examine various characteristics of the residuals. A second
is that sometimes a plot of regression coefficients or marginal effects is an easier way of
showing the results.

sysuse nlsw88, clear

reg wage age married collgrad union [Link]
predict reshat , residuals
kdensity reshat , norm

This plots the residuals from the model and superimposes a normal distribution. You can see
the residuals don’t look normal. It is well known that the distribution of earnings is close to
being log-normal i.e. the log is normally distributed. If you change the dependent variable to
the log of wages & re-estimate the model you will find the residuals are remarkably close to a
normal distribution (2nd graph below)

[Figure: Kernel density estimate of the residuals with a normal density superimposed. Y axis: Density; x axis: Residuals; kernel = epanechnikov, bandwidth = 0.6171]

[Figure: Kernel density estimate, log wage model, with a normal density superimposed. Y axis: Density; x axis: Residuals; kernel = epanechnikov, bandwidth = 0.0932]
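
A sketch of the log-wage version. The variable names lnwage and reshat_ln are hypothetical, and the regressor list here is limited to the four covariates that are legible in the command above, so the result may differ slightly from the graph described:

```stata
sysuse nlsw88, clear

* re-estimate the model with the log of wages as the dependent variable
generate lnwage = ln(wage)
regress lnwage age married collgrad union

* residuals and their density with a normal density superimposed
predict reshat_ln, residuals
kdensity reshat_ln, normal title("log wage model")
```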

If your data are time series then you should be interested in whether the residuals are
autocorrelated. A graphical way of checking this is to examine the correlogram which plots the
degree of autocorrelation for different lags i.e. the autocorrelation between the residuals in
period t and t-1, t and t-2 etc.

webuse klein, clear

tsset yr
reg consump totinc
predict reshat , residuals
ac reshat, lags(9) recast(connect)

So the autocorrelation between residuals in periods t and t-1 is about .65. The autocorrelations
fade away as the lag increases, as we would expect, so there is little correlation between
residuals in period t and t-5, say. The corrgram command displays a table of the
autocorrelations and has a crude graph of them.

To plot regression coefficients there is a user-written command coefplot written by Ben Jann
(2013). By default it shows 95% confidence intervals but you can change that.

sysuse auto
regress price mpg headroom trunk length turn
coefplot, drop(_cons) xline(0)

[Figure: coefficient plot with confidence intervals for Mileage (mpg), Headroom (in.), Trunk space (cu. ft.), Length (in.) and Turn Circle (ft.); x axis from -1500 to 500 with a vertical line at 0]

In the example above the coefficient on each variable is the marginal effect of that variable.
That is because the price variable is assumed to be a linear function of the x variables. If any
of the x variables enter non-linearly then the marginal effect of the variable will be different
at different values. Say for example if price depends on mpg (miles per gallon) and its square:

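\[ \text{price} = \beta_0 + \beta_1\,\text{MPG} + \beta_2\,\text{MPG}^2 + \dots \]

\[ \frac{\partial\,\text{price}}{\partial\,\text{MPG}} = \beta_1 + 2\beta_2\,\text{MPG} \]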

Stata’s margins command can be used to evaluate the marginal effect at different
values of mpg. First run the model, say:

regress price c.mpg##c.mpg headroom trunk length turn

margins, dydx(mpg) will give you the average marginal effect with a standard error but
doesn’t tell you how it varies with mpg. Note that if you had created the square of mpg as a
separate variable (say “mpgsq”) & then included it in the model just like another variable then
margins would not provide the correct marginal effect of mpg as Stata, in using margins,
would not know what mpgsq is. That’s why it is necessary to use the c.mpg##c.mpg
formulation. Note that c.mpg#c.mpg will generate a quadratic function but without the
linear component i.e. the square only.

margins, dydx(mpg) at(mpg=(12(4)41)) evaluates the marginal effect at different values
of mpg starting at 12 and increasing in increments of 4 up to 40, the last value that does not
exceed 41. The results are presented in a table but it is better at this stage to use the
marginsplot command to get a nice graph of the marginal effect of mpg as it varies with mpg.
Since the model is quadratic in mpg, the marginal effect is linear in mpg (see the 2nd
equation above).
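
Putting the steps together as a sketch, assuming the auto data and that the quadratic is entered with factor-variable notation as c.mpg##c.mpg:

```stata
sysuse auto, clear

* quadratic in mpg, written so that margins knows the square depends on mpg
regress price c.mpg##c.mpg headroom trunk length turn

* marginal effect of mpg evaluated at mpg = 12, 16, ..., 40
margins, dydx(mpg) at(mpg=(12(4)40))

* graph the marginal effects with their confidence intervals
marginsplot, yline(0)
```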

With coefplot and margins there are numerous options to customize the output. I have just
shown the basics. margins can also be used with interactions between variables. For
example

webuse nhanes2
reg bpsystol agegrp##sex
margins agegrp#sex

<output omitted>
This regresses a continuous variable on a categorical age variable interacted with a dummy
variable and then calculates the predicted mean of the outcome for each combination of age
group and sex. Typing marginsplot produces a graph of these predicted values for each age
category by sex with 95% confidence intervals. Older people have higher (systolic) blood
pressure and the age gradient is steeper for females.

4. Schemes

While you can tweak the look of graphs in many ways, one approach is to use different styles
of graphs (called schemes) that Stata has created. Taking the scatterplot we had in section 2.2
above, you could try:

scatter mpg weight if foreign || scatter mpg weight if !foreign , scheme(vg_teal)

Other schemes include s2color, s1mono, s2mono. Using scheme(economist) replicates the
look of the graphs in The Economist magazine. To see the different schemes available, when
you open the dialogue box for graphs, open the tab for “overall”. The “scheme” option is at
the top left. Alternatively, entering graph query, schemes in the Stata command line will
provide a full list.
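
A scheme can also be set for the rest of the session rather than graph by graph. A sketch, assuming the auto data:

```stata
sysuse auto, clear

* every graph drawn from now on in this session uses s1mono
set scheme s1mono
scatter mpg weight

* switch to the s2color scheme for subsequent graphs
set scheme s2color
```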
5. The definitive pie chart:

ooOoo

6. References & resources

Hintze, Jerry & Ray Nelson (1998). Violin plots: a box plot-density trace synergism. The American Statistician 52(2): 181-184.

Jann, Ben (2013). coefplot: Stata module to plot regression coefficients and other results. [Link]

Jones, Andrew (2017). Data visualization and health econometrics. [Link]

Koffman, Dawn (2015). Introduction to Stata 14 graphics. [Link]

Mitchell, Michael N. (2012). A visual guide to Stata graphics, 3rd ed. Stata Press.

Schwabish, Jonathan (2014). An economist’s guide to visualizing data. Journal of Economic Perspectives 28(1): 209-234. [Link]

Shapiro, Jesse (n.d.). How to give an applied micro talk. [Link]

Tufte, Edward (2001). The visual display of quantitative information, 2nd ed. Graphics Press.

Tukey, John (1977). Exploratory data analysis. Addison-Wesley.

Van Kerm, Philippe (2012). Kernel-smoothed cumulative distribution function estimation with akdensity. Stata Journal 12: 543-548.

Introduction to Stata Graphics: [Link]

Introduction to graphs in Stata: [Link]

Stata graphics tutorial: [Link]

Common questions

Stata offers extensive customization options, such as changing axis labels, adding titles, and adjusting line sizes and colors. These features enhance data presentation by ensuring that graphs are not only informative but also visually appealing and tailored to the specific audience. This is particularly useful in educational and presentation settings, where clear and engaging visualizations can significantly aid in communication and understanding .

The 'binscatter' method is significant because it addresses the issue of overplotting in scatter plots where there is a high density of data points, which can obscure trends. By binning the data into intervals and plotting the averages, 'binscatter' provides a clearer visualization of the relationship between variables, making it easier to discern patterns that dense scatter plots might hide .

For univariate discrete variables, the document suggests using histograms. Graphical methods like histograms help illustrate the distribution of discrete variables, making it easier for an audience to understand the data at a glance compared to numerical tables. This can be particularly useful in presentations to quickly convey information .

The document describes using histograms and kernel density plots for displaying the distribution of continuous variables. Bandwidth is crucial in kernel density plots as it controls the plot's smoothness. A smaller bandwidth may reveal more details of the data distribution, while a larger bandwidth may smooth over important features, potentially masking valuable insights. The importance of setting an appropriate bandwidth lies in accurately reflecting the true distribution of the data .

Anscombe's Quartet consists of four datasets that have nearly identical simple descriptive statistics such as mean and variance, yet they have different distributions and relationships between the variables. Without graphing these datasets, a researcher might overlook these differences and make incorrect assumptions. This illustrates the critical role of data visualization in uncovering underlying patterns that statistics alone cannot reveal .

Saving graph files in different formats is important because different formats are suited for various applications and can affect the quality and compatibility of the visuals. For instance, .png files are generally compatible across many platforms and retain quality, whereas postscript files may be more appropriate for printed materials but can become large with high data point counts. Compatibility with collaboration tools and the intended use (e.g., digital or print media) should guide the choice of format .

User-written commands in Stata can be downloaded using the 'findit' command followed by the command name, or the 'ssc install' command. Once installed, these commands extend Stata's functionality, offering tailored analyses and visualizations not available in standard packages. Such functionalities are important because they allow researchers to apply specialized techniques and adapt their analyses to specific research needs, enhancing the overall capability and flexibility of Stata .

Box plots can hide features of a distribution, such as bimodality, and can over-emphasize outliers. Violin plots address these limitations by overlaying a density plot onto the box plot. This combines the detail of a density estimate with the summary statistics of a box plot, giving a more complete view of the distribution.
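
The two plot types can be compared on the same variable, as in this sketch. It assumes the `nlsw88` dataset shipped with Stata and the user-written `vioplot` command from SSC, which must be installed first.

```stata
* Assumption: nlsw88 data; vioplot is user-written (ssc install vioplot)
sysuse nlsw88, clear

* Box plot of wages for each category of race
graph box wage, over(race)

* Violin plot of the same data: box plot markers plus a density outline
vioplot wage, over(race)
```

Each command draws a new graph, so the second replaces the first in the Graph window unless the first is saved or given a name.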

Smoothing techniques such as lowess add a fitted curve to a scatter plot that captures the trend in noisy data. Unlike a linear regression line, lowess does not assume a specific functional form, so it can reveal nonlinear relationships and local patterns that a straight line would miss.
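
A minimal sketch of the comparison overlays a linear fit and a lowess curve on the same scatter plot. The `nlsw88` dataset and the variables `wage` and `tenure` are assumptions made for illustration, not the paper's data.

```stata
* Assumption: nlsw88 shipped with Stata; variables chosen for illustration
sysuse nlsw88, clear

* Scatter plot with a straight-line fit and a lowess smooth on top
twoway (scatter wage tenure, msize(tiny) mcolor(gs10)) ///
       (lfit wage tenure)                              ///
       (lowess wage tenure),                           ///
       legend(order(2 "Linear fit" 3 "Lowess"))        ///
       xtitle("Job tenure (years)") ytitle("Hourly wage")
```

Where the lowess curve departs from the straight line, the data suggest that a linear specification may be too restrictive.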

Kernel density plots can compare subgroups by plotting the density for each group on the same graph, distinguished by different colors. For example, the document shows how to compare earnings distributions by plotting densities for married versus unmarried individuals on the same chart. This allows visual comparison of the shapes and positions of the distributions.
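
A comparison of this kind might look like the sketch below. It assumes the `nlsw88` dataset shipped with Stata, with hourly `wage` standing in for earnings and the indicator `married` defining the groups; the paper's own data may differ.

```stata
* Assumption: nlsw88 shipped with Stata stands in for the paper's data
sysuse nlsw88, clear

* One density per subgroup, overlaid on the same axes
twoway (kdensity wage if married == 1) ///
       (kdensity wage if married == 0), ///
       legend(order(1 "Married" 2 "Not married")) ///
       xtitle("Hourly wage") ytitle("Density")
```

Stata assigns a different color to each overlaid plot by default, so the two groups are distinguishable without further options.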
