
Modifying Some Plots in R

Grant Data: 500 households that receive a social grant in a certain region. The data come from a Social and Economic Survey. Download the dummy data: DATA
grant <- read.csv("grant.csv")

#PFOODEXP: Proportion of Food Expenditure to Total Expenditure
#HH_Income: Household Income ($)
#HH_Food: Household Food Expenditure ($)
#HH_Loc: Household Location (0: Rural, 1: Urban)
#Educ_H: Household Head Education (years)
#HH_Size: Household Size (number of family members)
#Gender_H: Household Head Gender (0: Female, 1: Male )
#Age_H: Household Head Age

#Remove 1st column
grant[1] <- NULL  
#change variable type to factor
grant$Gender_H <- as.factor(grant$Gender_H) 
grant$HH_Loc <- as.factor(grant$HH_Loc)

head(grant)
##   PFOODEXP  HH_Food HH_Loc Gender_H Age_H Educ_H HH_Size HH_Income
## 1 73.16933 675.1766      1        0    32     22       4  922.7590
## 2 69.77417 596.2286      1        1    27     22      13  854.5119
## 3 63.31992 585.4946      0        1    42     15       8  924.6610
## 4 71.77146 559.9470      0        1    26     22       5  780.1806
## 5 52.36861 554.8954      1        0    32      9       3 1059.5954
## 6 79.74131 551.4373      0        1    21     15       5  691.5328
summary(grant)
##     PFOODEXP        HH_Food      HH_Loc  Gender_H     Age_H      
##  Min.   :29.12   Min.   :165.9   0:380   0:167    Min.   :15.00  
##  1st Qu.:56.43   1st Qu.:189.0   1:120   1:333    1st Qu.:25.00  
##  Median :65.35   Median :218.3                    Median :34.00  
##  Mean   :64.74   Mean   :251.1                    Mean   :34.16  
##  3rd Qu.:75.59   3rd Qu.:286.6                    3rd Qu.:42.00  
##  Max.   :86.98   Max.   :675.2                    Max.   :80.00  
##      Educ_H         HH_Size         HH_Income     
##  Min.   : 3.00   Min.   : 1.000   Min.   : 198.6  
##  1st Qu.: 6.00   1st Qu.: 4.000   1st Qu.: 280.7  
##  Median :15.00   Median : 5.000   Median : 351.0  
##  Mean   :12.57   Mean   : 5.392   Mean   : 406.6  
##  3rd Qu.:16.00   3rd Qu.: 6.000   3rd Qu.: 489.0  
##  Max.   :23.00   Max.   :20.000   Max.   :1059.6
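
As a quick sanity check on the variable definitions, the rows printed by head(grant) suggest that PFOODEXP equals 100 * HH_Food / HH_Income, so in this dummy data income appears to stand in for total expenditure. The sketch below retypes three of those rows by hand so that it runs without grant.csv; the data frame name `check` is only illustrative.

```r
# First three rows copied by hand from the head(grant) output above
check <- data.frame(
  PFOODEXP  = c(73.16933, 69.77417, 63.31992),
  HH_Food   = c(675.1766, 596.2286, 585.4946),
  HH_Income = c(922.7590, 854.5119, 924.6610)
)

# Food share recomputed from food expenditure and income, in percent
check$share <- 100 * check$HH_Food / check$HH_Income

# Difference between the recomputed share and the PFOODEXP column
print(round(check$share - check$PFOODEXP, 3))
```
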
#Scatter Plot with Linear Line

library(ggplot2)
library(ggthemes)

ggplot(data=grant, aes(HH_Income,PFOODEXP, colour = Gender_H, size = HH_Size)) + 
  geom_point(alpha=0.8) + geom_smooth(method = "lm", se=FALSE) +
  ylab("% Food Expenditure")+ xlab("Household Income per Month ($)") +
  # linetype = 0 hides the regression line inside the legend keys
  guides(colour = guide_legend(override.aes = list(size=5, linetype = 0), title = "HH Gender"), 
         size = guide_legend(override.aes = list(linetype = 0), title = "HH Size")) +
  scale_size_continuous(range = c(1, 8), breaks = c(1, 2, 4, 8))+
  scale_color_manual(labels = c("female","male"), values = c("hotpink","deepskyblue"))+
  ggtitle("Scatterplot of Percentage of Food Expenditure vs Household Income per Month ($)") +
  scale_x_log10()+
  theme_bw()+
  theme(plot.title = element_text(size=10, face= "bold"))
#Scatter Plot with LOESS

library(ggplot2)
library(ggthemes)

ggplot(data=grant, aes(HH_Income,PFOODEXP, shape = Gender_H, colour = Gender_H, size = HH_Size)) + 
  geom_point(alpha=0.8) + geom_smooth(method = "loess") +
  ylab("% Food Expenditure")+ xlab("Household Income per month ($)") +
  guides(colour = "none", 
         size = "none",
         shape = guide_legend(override.aes = 
                 list(size=5, linetype = c(0,0), 
                      colour = c("azure4","gold")), title = "HH Gender")) +
  scale_size_continuous(range = c(1, 8), breaks = c(1, 2, 4, 8))+
  scale_shape_manual(labels = c("female","male"), values = c("f","m"))+
  scale_color_manual(labels = c("female","male"), values = c("azure4","gold"))+
  ggtitle("Scatterplot of Percentage of Food Expenditure vs Household Income per Month ($)") +
  scale_x_log10()+
  theme_bw()+
  theme(plot.title = element_text(size=10, face= "bold"))
Scatter Plot with LOESS: One advantage of the LOESS method is that it does not require specifying a single global function to fit to all of the data in the sample. One disadvantage is that it does not produce a regression function that is easily represented by a mathematical formula.
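
The trade-off can be sketched with base R's lm() and loess(). The data below are simulated (they are not the grant survey), and the names `sim`, `fit_lm`, and `fit_lo` are assumptions made for illustration. The linear fit is fully described by two coefficients, while the LOESS fit can only be queried for predictions.

```r
set.seed(1)                            # simulated data, not the grant survey
income <- runif(300, 200, 1000)
share  <- 90 - 12 * log(income / 200) + rnorm(300, sd = 5)
sim    <- data.frame(income, share)

# Linear model: the whole fit is summarised by an intercept and a slope
fit_lm <- lm(share ~ income, data = sim)
print(coef(fit_lm))

# LOESS: fitted locally, so there is no single formula to report
fit_lo <- loess(share ~ income, data = sim, span = 0.75)

# Predictions are still available at incomes inside the observed range
new_inc <- data.frame(income = c(300, 600, 900))
pred_lm <- predict(fit_lm, newdata = new_inc)
pred_lo <- predict(fit_lo, newdata = new_inc)
print(cbind(new_inc, lm = pred_lm, loess = pred_lo))
```

The `span` argument controls how local the fit is; 0.75 is the default of loess() and is written out here only to make the choice visible.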
#Hexbin Plot

library(hexbin)
library(RColorBrewer)

# Palette function: reversed 10-colour Spectral ramp from RColorBrewer
  rf <- colorRampPalette(rev(brewer.pal(10,'Spectral')))
  
# Make the plot: pass the data frame and name the columns in the formula,
# so the pipe (%>%) and pull() from dplyr are not needed
  hexbinplot(HH_Food ~ HH_Income, data=grant, main="Income vs Food Expenditure",
             colramp=rf, trans=log, inv=exp, mincnt=1, maxcnt=70,
             ylab="food expenditure ($)",
             xlab="income ($)", cex.label=0.7)
Hexbin Plot: Scatterplots become hard to interpret with large datasets because points inevitably overplot. A hexbin plot groups the points into hexagonal bins and colours each bin by the number of points it contains, so dense regions remain readable. Code source: www.everydayanalytics.ca
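
The counting idea behind binned plots can be sketched in base R. Note the substitution: this sketch uses square bins built with cut() and table() as a stand-in for hexagons, so that no extra package is needed; hexbin() does the analogous counting on a hexagonal grid. The data are simulated and the 20 x 20 grid is an arbitrary choice.

```r
set.seed(42)                               # simulated data, not the grant survey
income <- rlnorm(5000, meanlog = 6, sdlog = 0.4)
food   <- 0.6 * income + rnorm(5000, sd = 30)

# Cut each axis into 20 intervals and count the points in every cell
x_bin  <- cut(income, breaks = 20)
y_bin  <- cut(food,   breaks = 20)
counts <- table(x_bin, y_bin)

# Every point lands in exactly one cell
print(sum(counts))

# Size of the fullest cell: these points would sit on top of each other
# in an ordinary scatterplot
print(max(counts))
```
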
#Density Plot

ggplot(data = grant, aes(HH_Income, fill = HH_Loc)) +
      geom_density(alpha = 0.5) +
      geom_density(alpha = 0.2, colour = "grey85") +
      ylab("Density") +
      xlab("Household Income ($)") +
      ggtitle("Distribution of Household (HH) Income per Month by Household Location") +
      scale_fill_manual(labels = c("rural","urban"), values = c("tomato","mediumspringgreen"))+
      labs( fill = "HH Location") +
      theme_minimal() + theme(plot.title = element_text(size=10, face= "bold"))
Income distributions are often right-skewed, which reflects income inequality. Hypothesized reasons include differences in talents, skills, and opportunities. Not surprisingly, the household income distribution in urban areas is more skewed than in rural areas.
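
Right skew can also be put into a number. The sketch below computes a moment-based sample skewness, where positive values indicate a longer right tail. The incomes are simulated from log-normal distributions; the helper `skew` and the chosen meanlog/sdlog values are assumptions for illustration, not estimates from the grant data.

```r
# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 1.5
skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(7)                                # simulated data, not the grant survey
rural <- rlnorm(2000, meanlog = 5.8, sdlog = 0.25)
urban <- rlnorm(2000, meanlog = 6.0, sdlog = 0.50)

# Skewness of each simulated group
print(c(rural = skew(rural), urban = skew(urban)))

# Check whether the mean sits above the median, as a right tail would imply
print(c(rural = mean(rural) > median(rural),
        urban = mean(urban) > median(urban)))
```

With the real data, the same helper could be applied per location with tapply(grant$HH_Income, grant$HH_Loc, skew).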
