Grant Data: 500 households that receive a social grant in a certain region. The data come from the Social and Economic Survey. Download the dummy data: DATA
grant <- read.csv("grant.csv")
#PFOODEXP: Food Expenditure as a Percentage of Total Expenditure (%)
#HH_Income: Household Income ($)
#HH_Food: Household Food Expenditure ($)
#HH_Loc: Household Location (0: Rural, 1: Urban)
#Educ_H: Household Head Education (years)
#HH_Size: Household Size (number of members)
#Gender_H: Household Head Gender (0: Female, 1: Male)
#Age_H: Household Head Age
#Remove 1st column
grant[1] <- NULL
#change variable type to factor
grant$Gender_H <- as.factor(grant$Gender_H)
grant$HH_Loc <- as.factor(grant$HH_Loc)
head(grant)
## PFOODEXP HH_Food HH_Loc Gender_H Age_H Educ_H HH_Size HH_Income
## 1 73.16933 675.1766 1 0 32 22 4 922.7590
## 2 69.77417 596.2286 1 1 27 22 13 854.5119
## 3 63.31992 585.4946 0 1 42 15 8 924.6610
## 4 71.77146 559.9470 0 1 26 22 5 780.1806
## 5 52.36861 554.8954 1 0 32 9 3 1059.5954
## 6 79.74131 551.4373 0 1 21 15 5 691.5328
summary(grant)
## PFOODEXP HH_Food HH_Loc Gender_H Age_H
## Min. :29.12 Min. :165.9 0:380 0:167 Min. :15.00
## 1st Qu.:56.43 1st Qu.:189.0 1:120 1:333 1st Qu.:25.00
## Median :65.35 Median :218.3 Median :34.00
## Mean :64.74 Mean :251.1 Mean :34.16
## 3rd Qu.:75.59 3rd Qu.:286.6 3rd Qu.:42.00
## Max. :86.98 Max. :675.2 Max. :80.00
## Educ_H HH_Size HH_Income
## Min. : 3.00 Min. : 1.000 Min. : 198.6
## 1st Qu.: 6.00 1st Qu.: 4.000 1st Qu.: 280.7
## Median :15.00 Median : 5.000 Median : 351.0
## Mean :12.57 Mean : 5.392 Mean : 406.6
## 3rd Qu.:16.00 3rd Qu.: 6.000 3rd Qu.: 489.0
## Max. :23.00 Max. :20.000 Max. :1059.6
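As a quick sanity check on the variable definitions, the rows printed by head(grant) suggest that PFOODEXP is 100 * HH_Food / HH_Income. The sketch below recomputes it for the first three rows, with the values copied by hand from the printed output; the commented line at the end assumes grant has been loaded as above.

```r
# First three rows, copied from the head(grant) output
rows <- data.frame(
  PFOODEXP  = c(73.16933, 69.77417, 63.31992),
  HH_Food   = c(675.1766, 596.2286, 585.4946),
  HH_Income = c(922.7590, 854.5119, 924.6610)
)

# Food expenditure as a percentage of household income
recomputed <- 100 * rows$HH_Food / rows$HH_Income

# Show the stored and the recomputed values side by side
cbind(rows, recomputed = round(recomputed, 5))

# On the full data set (assumes grant is loaded as above):
# all.equal(grant$PFOODEXP, 100 * grant$HH_Food / grant$HH_Income)
```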
#Scatter Plot with Linear Line
library(ggplot2)
library(ggthemes)
ggplot(data=grant, aes(HH_Income,PFOODEXP, colour = Gender_H, size = HH_Size)) +
geom_point(alpha=0.8) + geom_smooth(method = "lm", se=FALSE) +
ylab("% Food Expenditure")+ xlab("Household Income per Month ($)") +
guides(color = guide_legend(override.aes = list(size=5, linetype = c(0,0)), title = "HH Gender"),
size = guide_legend(override.aes = list(linetype = c(0,0)), title = "HH Size")) +
scale_size_continuous(range = c(1, 8), breaks = c(1, 2, 4, 8))+
scale_color_manual(labels = c("female","male"), values = c("hotpink","deepskyblue"))+
labs( col = "Gender") +
ggtitle("Scatterplot of Percentage of Food Expenditure vs Household Income per Month ($)") +
scale_x_log10()+
theme_bw()+
theme(plot.title = element_text(size=10, face= "bold"))
#Scatter Plot with LOESS
library(ggplot2)
library(ggthemes)
ggplot(data=grant, aes(HH_Income,PFOODEXP, shape = Gender_H, colour = Gender_H, size = HH_Size)) +
geom_point(alpha=0.8) + geom_smooth(method = "loess") +
ylab("% Food Expenditure")+ xlab("Household Income per month ($)") +
guides(colour = "none",
size = "none",
shape = guide_legend(override.aes =
list(size=5, linetype = c(0,0),
colour = c("azure4","gold")), title = "HH Gender")) +
scale_size_continuous(range = c(1, 8), breaks = c(1, 2, 4, 8))+
scale_shape_manual(labels = c("female","male"), values = c("f","m"))+
scale_color_manual(labels = c("female","male"), values = c("azure4","gold"))+
ggtitle("Scatterplot of Percentage of Food Expenditure vs Household Income per Month ($)") +
scale_x_log10()+
theme_bw()+
theme(plot.title = element_text(size=10, face= "bold"))
Scatter Plot with LOESS: One advantage of the LOESS method is that it does not require specifying a single function to fit to all of the data in the sample. One disadvantage is that it does not produce a regression function that can be easily represented by a mathematical formula.
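The trade-off can be seen in a minimal sketch on simulated data (not the grant survey; the variable names income and share and all numbers are made up for illustration). The linear model is fully described by two coefficients, while the LOESS fit has no such formula and is used through predict().

```r
# Simulated data, for illustration only: a curved relationship plus noise
set.seed(1)
income <- runif(200, min = 200, max = 1000)
share  <- 90 - 12 * log(income / 200) + rnorm(200, sd = 3)
sim    <- data.frame(income, share)

# Linear model: one global function, summarised by intercept and slope
lin_fit <- lm(share ~ income, data = sim)
coef(lin_fit)

# LOESS: many local fits, no single formula to write down
loess_fit <- loess(share ~ income, data = sim)

# Both models can still predict at new income values
new_income <- data.frame(income = c(300, 600, 900))
predict(lin_fit, new_income)
predict(loess_fit, new_income)
```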
#Hexbin Plot
library(hexbin)
library(RColorBrewer)
library(dplyr) # provides %>% and pull(), used below
# Create data
y<-grant%>% pull("HH_Food")
x<-grant%>% pull("HH_Income")
# Make the plot
bin<-hexbin(x, y)
rf=colorRampPalette(rev(brewer.pal(10,'Spectral')))
hexbinplot(y~x, data=bin, main="Income vs Food Expenditure",
colramp=rf, trans=log, inv=exp,mincnt=1, maxcnt=70,
ylab="food expenditure ($)",
xlab="income ($)", cex.label=0.7)
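Hexbin Plot: Scatterplots can become very hard to interpret when displaying large datasets, because points inevitably overplot. A hexbin plot groups the points into hexagonal bins and colors each bin by the number of points it contains, which makes dense regions easier to read. Code source: www.everydayanalytics.ca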
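To see what the binning does, here is a small sketch on simulated data (not the grant survey; the sample size and xbins value are arbitrary choices). Every point falls into exactly one hexagon, so the bin counts add up to the number of points, and the plot only has to draw one colored cell per non-empty bin.

```r
library(hexbin)

# Simulated data, for illustration only: 5000 correlated points
set.seed(1)
x <- rnorm(5000)
y <- x + rnorm(5000)

# Group the points into hexagonal bins
bin <- hexbin(x, y, xbins = 20)

# Each point is counted once, so the counts sum to the sample size
sum(bin@count)

# Number of non-empty hexagons and the count in the densest one
length(bin@count)
max(bin@count)
```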
#Density Plot
qplot(HH_Income,data = grant, geom="density", fill = HH_Loc ,alpha=I(.5),
ylab="Density",
xlab= "Household Income($)",
main = "Distribution of Household (HH) Income per Month by Household Location") +
scale_fill_manual(labels = c("rural","urban"), values = c("tomato","mediumspringgreen"))+
labs( fill = "HH Location") + geom_density(alpha= 0.2,aes(HH_Income), colour = "grey85")+
theme_minimal() + theme(plot.title = element_text(size=10, face= "bold"))
Income distributions are often right-skewed, which reflects income inequality. Hypothesized reasons include differences in talents, skills, and opportunities. Not surprisingly, the household income distribution in urban areas is more skewed than in rural areas.
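The summary output already hints at the right skew: mean household income (406.6) is above the median (351.0). One way to put a number on it is the moment-based sample skewness, sketched below. The helper sample_skewness is a hypothetical name introduced here, and the simulated incomes only illustrate the call; the commented last line assumes grant is loaded as above.

```r
# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 1.5
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Simulated incomes, for illustration only: group 1 has a longer right tail
set.seed(1)
sim <- data.frame(
  loc    = factor(rep(c(0, 1), each = 500)),
  income = c(rlnorm(500, meanlog = 5.8, sdlog = 0.2),
             rlnorm(500, meanlog = 6.0, sdlog = 0.6))
)

# Skewness by group; positive values indicate a right-skewed distribution
tapply(sim$income, sim$loc, sample_skewness)

# With the survey data (assumes grant is loaded as above):
# tapply(grant$HH_Income, grant$HH_Loc, sample_skewness)
```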