Using ggplot2

American Household Income: the Mean is much higher than the Median

Made with ggplot2

I have some standard ggplot2 code that I use to make line graphs, scatter plots, and histograms.

For lines or scatters:

library(ggplot2)
p <- ggplot(QS, aes(x=Year, y=Rank, colour=Uni, group=Uni))  # colour lines by Uni; group rows with the same Uni into one line

Then: 

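p +  # p must already exist; the layers below are added to it
  theme_bw() +  # apply the complete theme first so the theme() tweaks below are not overwritten
  geom_line(size=1.2) +  # on ggplot2 >= 3.4.0, use linewidth=1.2 instead
  geom_point(data=QS[QS[,2]=="2013",]) +  # mark only the 2013 points (column 2 holds the year)
  geom_text(data=QS[QS[,2]=="2013" & QS[,1]!="Princeton",], aes(label=paste(paste(Rank,".",sep=""),Uni)), hjust=-0.2) +  # label each line at 2013, skipping Princeton
  ylim(17,0.5) +  # larger value first reverses the axis, so rank 1 is at the top
  scale_x_continuous(limits=c(2004,2014), breaks=seq(2004,2014,1)) +
  ggtitle("QS University Rankings 2008-2013") +
  theme(legend.position="none") +
  theme(plot.title=element_text(size=rel(1.5))) +
  theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank()) +
  geom_text(aes(label=country), size=6, vjust=-1) +  # needs a country column in the data
  annotate("text", x=2011, y=16.5, label="Abbas Keshvani")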
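To try the line template without the original data, here is a minimal sketch on a made-up table. The university names, years, and ranks are hypothetical and only stand in for the QS data; the column names Uni, Year, and Rank match the ones used above, and `latest` is a helper name I introduce for the 2013 rows.

```r
library(ggplot2)

# Hypothetical toy data standing in for the QS table (not the real rankings)
QS <- data.frame(
  Uni  = rep(c("Alpha", "Beta", "Gamma"), each = 3),
  Year = rep(c(2011, 2012, 2013), times = 3),
  Rank = c(1, 2, 1,  2, 1, 3,  3, 3, 2)
)

# Rows for the final year, used for the end-of-line points and labels
latest <- QS[QS$Year == 2013, ]

p <- ggplot(QS, aes(x = Year, y = Rank, colour = Uni, group = Uni)) +
  theme_bw() +                          # complete theme first, tweaks afterwards
  geom_line() +
  geom_point(data = latest) +
  geom_text(data = latest, aes(label = paste0(Rank, ". ", Uni)), hjust = -0.2) +
  ylim(3.5, 0.5) +                      # larger value first reverses the axis
  scale_x_continuous(limits = c(2011, 2014), breaks = 2011:2014) +
  theme(legend.position = "none")

print(p)
```

The x axis runs one year past the data so that the labels placed to the right of the 2013 points have room.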

For a bar chart:

ggplot(Dist, aes(x=B, y=C, fill=A)) +  # stacked bars; column A defines the stacks
  geom_bar(stat="identity", width=0.9)
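The same idea on a small made-up table, as a sketch. The values in Dist are hypothetical; only the column roles (A for the stacks, B for the x axis, C for the heights) follow the template. I use geom_col() here, which is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical data: B = x-axis category, C = segment height, A = stacking variable
Dist <- data.frame(
  A = rep(c("low", "high"), times = 3),
  B = rep(c("2011", "2012", "2013"), each = 2),
  C = c(3, 5, 4, 4, 2, 6)
)

bars <- ggplot(Dist, aes(x = B, y = C, fill = A)) +
  geom_col(width = 0.9)  # one bar per B, split into coloured segments by A

# Total height of each stacked bar, useful for checking the plot by eye
totals <- tapply(Dist$C, Dist$B, sum)

print(bars)
```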

Abbas Keshvani


Parametric vs non-Parametric Linear Models (LM)

[Figure 1 panels: (a) Histogram: LM estimates of Intercepts; (b) Histogram: LM estimates of Gradient; (c) QQ Plot: LM estimates of Intercepts; (d) QQ Plot: LM estimates of Gradient]

Figure 1: Gradient appears to follow a normal distribution more closely than Intercept.

When do we use a parametric model, and when do we use a non-parametric one? In the above example, “Intercept” is one random variable, and “Gradient” is another. I will show you why “Intercept” is better modeled by a non-parametric model, and “Gradient” is better modeled by a parametric one.

In Figure 1, the histograms and QQ plots of “Intercept” and “Gradient” show that the latter appears to follow a normal distribution whereas the former does not. As such, a parametric (normal) distribution would not be appropriate for modelling “Intercept”. A non-parametric approach, which does not assume a particular distributional shape, is the better choice for it.

However, a parametric (normal) distribution might be appropriate for modelling “Gradient”, which appears to follow a normal distribution, according to both its histogram and QQ plot.
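The comparison can be sketched in base R. This is only an illustration under assumptions: the two vectors below are simulated stand-ins, not the estimates behind Figure 1, and the Shapiro-Wilk test and kernel density estimate are my choices for a normality check and a non-parametric model, not necessarily what was used for the figure.

```r
set.seed(1)

# Hypothetical stand-ins for the LM estimates (NOT the data behind Figure 1)
gradient  <- rnorm(500, mean = 2, sd = 0.5)  # roughly bell-shaped
intercept <- rexp(500, rate = 1)             # clearly skewed

# Visual checks like those in Figure 1: histogram and normal QQ plot
hist(intercept)
qqnorm(intercept)
qqline(intercept)

# Formal check: Shapiro-Wilk test (a small p-value suggests non-normality)
p_gradient  <- shapiro.test(gradient)$p.value
p_intercept <- shapiro.test(intercept)$p.value

# Parametric model: a normal distribution summarised by two numbers
normal_fit <- c(mean = mean(gradient), sd = sd(gradient))

# Non-parametric model: a kernel density estimate with no assumed shape
kde <- density(intercept)
plot(kde)
```

The parametric fit reduces the whole distribution to a mean and a standard deviation, which is only sensible when the normal shape is plausible. The kernel density estimate makes no such assumption, at the cost of not having a compact formula.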

Abbas Keshvani