I have a standard ggplot2 template that I use to make line graphs, scatter plots, and histograms.
For line graphs or scatter plots:
library(ggplot2)
# colour lines by the variable Uni; group rows with the same Uni into one line
p <- ggplot(x, aes(x = Year, y = Rank, colour = Uni, group = Uni))
p + # you get an error without this step
  geom_line() + # assumed here for a line graph; use geom_point() for a scatter plot
  ggtitle("QS University Rankings 2008-2013") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
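The same template carries over to histograms. Here is a minimal sketch, assuming a made-up data frame `scores` with a single numeric column `Score` (both names are hypothetical and the values are simulated, not taken from the rankings data):

```r
library(ggplot2)

# Hypothetical data frame standing in for a real data set
set.seed(42)
scores <- data.frame(Score = rnorm(200, mean = 70, sd = 10))

# A histogram only needs an x aesthetic; the bin count is an arbitrary choice
h <- ggplot(scores, aes(x = Score)) +
  geom_histogram(bins = 20) +
  ggtitle("Distribution of scores") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
print(h)
```

Only the geom changes between plot types; the title and theme lines stay the same.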
Figure 1: “Gradient” appears to follow a normal distribution more closely than “Intercept”.
When do we use a parametric model, and when do we use a non-parametric one? In the above example, “Intercept” is one random variable, and “Gradient” is another. I will show you why “Intercept” is better modeled by a non-parametric model, and “Gradient” is better modeled by a parametric one.
In Figure 1, the histograms and QQ plots of “Intercept” and “Gradient” show that the latter appears to follow a normal distribution whereas the former does not. As such, a parametric (normal) distribution would not be appropriate for modelling “Intercept”, and a non-parametric model is the better choice for estimating its distribution.
By contrast, a parametric (normal) distribution is a reasonable model for “Gradient”, since both its histogram and its QQ plot are consistent with normality.
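To make the comparison concrete, here is a rough sketch in base R. The original “Intercept” and “Gradient” values are not available, so the two samples below are simulated stand-ins (an assumption): a bimodal sample plays the role of “Intercept” and a normal sample plays the role of “Gradient”. The parametric estimate fits a normal distribution by its mean and standard deviation; the non-parametric estimate is a kernel density estimate, which is one common choice among several.

```r
# Hypothetical stand-in data, simulated for illustration only
set.seed(1)
intercept <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 6, sd = 1)) # bimodal, not normal
gradient  <- rnorm(200, mean = 2, sd = 0.5)                                # roughly normal

# Parametric estimate: assume normality and estimate the two parameters
grad_mean <- mean(gradient)
grad_sd   <- sd(gradient)

# Non-parametric estimate: kernel density estimate, no distributional form assumed
int_kde <- density(intercept)

# Histograms with the fitted densities overlaid, plus normal QQ plots
par(mfrow = c(2, 2))
hist(intercept, freq = FALSE, main = "Intercept")
lines(int_kde)
qqnorm(intercept, main = "Intercept QQ plot"); qqline(intercept)
hist(gradient, freq = FALSE, main = "Gradient")
curve(dnorm(x, mean = grad_mean, sd = grad_sd), add = TRUE)
qqnorm(gradient, main = "Gradient QQ plot"); qqline(gradient)
```

On data like these, the QQ plot of the bimodal sample bends away from the reference line, while the normal sample tracks it closely, which is the pattern described for Figure 1.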