
Big Data in Credit Scoring

The Global Findex (Global Financial Inclusion) report, released by the World Bank in 2018, shows that only 48.9 percent of adults in Indonesia own a bank account. Millions of unbanked Indonesian adults work in the private sector and are paid in cash. What is the main reason these adults do not have a bank account? The reason is distance, and, surprisingly, 69 percent of this population segment own a mobile phone. Banking institutions have made some efforts to reach the unbanked population, but these efforts are not enough, and a wide gap remains.

This may explain why so many technology-driven multi-finance companies have emerged in recent years: they fill this gap. They understand the characteristics of the unbanked population, and by using technology they can reach more of it. Reaching this population, however, is not without risks.

Multi-finance companies compete with each other to capture this market, offering many products to attract clients. Some of them focus on lending primarily to people with little or no credit history, where asymmetric information, also known as "information failure," is bound to arise. In borrowing and lending, asymmetric information occurs when the borrower knows more about his or her financial state than the lender does.

Have you ever wondered how a bank or financing company decides to approve or reject a client's credit application? Most financing companies use the services of credit rating agencies (CRAs) to measure the creditworthiness of their clients. The client's credit score is measured each time he or she applies for credit, which also helps reduce the asymmetric information.

The process of generating a credit score is called credit scoring. It is widely applied in many industries, especially banking. Generally, it has two main parts: building a statistical model, and applying that model to assign a score to a credit application or an existing credit account. The statistical model used for credit scoring is called a scorecard model, and most of the time it is based on logistic regression.
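
A minimal sketch of those two parts in plain Python: fitting a logistic regression by gradient descent (building the model), then computing a probability of default for a new application (applying it). The two features (a hypothetical count of late payments and a hypothetical debt-to-income ratio) and the ten labelled rows are made up for illustration. Real scorecards are built on far larger datasets, usually with a statistics library, and often add steps such as variable binning that this sketch skips.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_pd(w, x):
    """Part 2: apply the model to one application -> probability of default."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return sigmoid(z)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Part 1: estimate weights by batch gradient descent on the mean log-loss.

    Returns [intercept, w_1, ..., w_k].
    """
    k = len(X[0])
    n = len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_pd(w, xi) - yi  # d(log-loss)/d(linear score)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Hypothetical training data: [late_payments, debt_to_income]; 1 = defaulted
X = [[0, 0.2], [1, 0.3], [0, 0.5], [2, 0.6], [3, 0.8],
     [1, 0.9], [4, 0.7], [0, 0.1], [2, 0.4], [3, 0.5]]
y = [0, 0, 0, 1, 1, 0, 1, 0, 1, 0]

weights = fit_logistic(X, y)
new_application = [2, 0.5]
print(f"PD for new application: {predict_pd(weights, new_application):.3f}")
```

The fitted weights are the whole model: scoring a new application is just the weighted sum passed through the logistic function, which is part of why this kind of model is easy to deploy and audit.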

Why logistic regression? It focuses on finding relationships between variables and the significance of those relationships. It is usually more stable and easier to interpret than advanced or black-box models. Interpretability matters because finance companies need a clear explanation of why a client is accepted or rejected. On the other hand, a less advanced model such as logistic regression often sacrifices some predictive power in exchange for interpretability.
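
The interpretability comes from the form of the model: the log-odds of default is linear in the predictors, so each coefficient has a direct reading. Here p is the probability of default and x_1, ..., x_k are the predictors (notation introduced for this illustration):

```latex
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
\frac{\operatorname{odds}(x_j + 1)}{\operatorname{odds}(x_j)} = e^{\beta_j}
```

For example, a hypothetical coefficient of 0.4 means that each one-unit increase in that predictor multiplies the odds of default by e^0.4 ≈ 1.49, holding the other predictors fixed.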

The output of the statistical model is usually the probability that a client will default, that is, fail to repay the credit in the future. On this scale, a higher value means the client is more likely to default. Most CRAs, however, convert this probability of default into a range of values that express creditworthiness (to make it easier for the public to interpret), so that on the published scale a higher score means a better client.
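
One common way to perform this conversion is points-to-double-odds (PDO) scaling, in which the score is linear in the log of the good:bad odds. The sketch below assumes illustrative calibration values (600 points at odds of 50:1, and 20 points to double the odds); the function name `pd_to_score` and these parameter values are hypothetical, not any particular CRA's method.

```python
import math

def pd_to_score(pd, target_score=600.0, target_odds=50.0, pdo=20.0):
    """Map a probability of default (PD) to a points-based score.

    PDO scaling: the score rises by `pdo` points every time the good:bad
    odds double. The default parameter values are illustrative assumptions.
    """
    if not 0.0 < pd < 1.0:
        raise ValueError("pd must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    good_bad_odds = (1.0 - pd) / pd  # higher odds = safer client
    return offset + factor * math.log(good_bad_odds)

print(round(pd_to_score(1 / 51)))   # odds 50:1  -> 600
print(round(pd_to_score(1 / 101)))  # odds 100:1 -> 620
```

With these assumed settings, doubling the odds from 50:1 to 100:1 adds exactly 20 points, so the published score rises as the risk of default falls, which is the reversal described above.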

Utilizing Big Data for Credit Scoring

An ideal scorecard model should capture all of a client's behaviours. CRAs usually have access to clients' credit histories, but sometimes this is not enough, so some technology-driven CRAs have started to use big data. It has been estimated that 2.5 quintillion bytes of data are generated each day. One frequently quoted way to visualize this much data: it would fill 10 million Blu-ray discs, which, stacked, would equal the height of four Eiffel Towers placed on top of each other. These astonishing amounts of data are often referred to as big data.

In the age of big data analytics, a financial institution's ability to use all of its data, whether structured or semi-structured, is crucial. Using data to drive decisions across the entire institution can make it more efficient and increase revenue. As stated above, 69 percent of the unbanked population own a mobile phone. Everything they do on their phones is recorded somewhere, and these records are valuable data that can be turned into predictors for the scorecard model.
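
A minimal sketch of turning raw phone activity into candidate predictors, one row per credit application. The `PhoneEvent` record, its `category` labels, and the feature names are hypothetical assumptions; real mobile data sources and their fields will differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PhoneEvent:
    """Hypothetical record of one phone activity event."""
    client_id: str
    timestamp: datetime
    category: str  # hypothetical labels, e.g. "betting_site", "shopping_site"

def count_events_before(events, client_id, applied_at, category, window_days=30):
    """Count a client's events of one category in the window before applying."""
    start = applied_at - timedelta(days=window_days)
    return sum(
        1
        for e in events
        if e.client_id == client_id
        and e.category == category
        and start <= e.timestamp < applied_at
    )

def build_features(events, client_id, applied_at, applied_over_wifi, storage_gb):
    """Assemble one row of candidate predictors for the scorecard."""
    return {
        "betting_visits_30d": count_events_before(
            events, client_id, applied_at, "betting_site"
        ),
        "applied_over_wifi": int(applied_over_wifi),
        "phone_storage_gb": storage_gb,
    }

events = [
    PhoneEvent("c1", datetime(2019, 1, 10), "betting_site"),
    PhoneEvent("c1", datetime(2019, 1, 25), "betting_site"),
    PhoneEvent("c1", datetime(2018, 11, 1), "betting_site"),  # outside the window
    PhoneEvent("c1", datetime(2019, 1, 20), "shopping_site"),
    PhoneEvent("c2", datetime(2019, 1, 15), "betting_site"),
]
row = build_features(events, "c1", datetime(2019, 2, 1),
                     applied_over_wifi=True, storage_gb=32)
print(row)
# {'betting_visits_30d': 2, 'applied_over_wifi': 1, 'phone_storage_gb': 32}
```

Counting only events before the application date keeps the features limited to information that was actually available when the credit decision was made.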

We can spot patterns in, or even form bold hypotheses from, the big data that comes from mobile phones. For example, fraudsters may tend to use a WiFi connection when they apply for credit through a mobile application, or clients who later default may tend to visit betting websites excessively before applying for credit. A client's phone brand, combined with other data, can also (loosely) approximate his or her economic condition: a CRA may hypothesize that clients who use unpopular phone brands usually come from the lower-income population and are more likely to have difficulty repaying credit. The total storage capacity of a client's phone, on the other hand, can indicate whether the client owns a high-end phone, which again describes his or her economic condition. Many other hypotheses can be derived from big data, and in the end the scorecard model will show whether each hypothesis is right (statistically significant) or not.
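
As an illustration of that final check: for a single binary predictor, such as a hypothetical flag for "visited betting websites before applying", the logistic-regression coefficient equals the log odds ratio of the 2x2 table of flag versus default, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d). The sketch below uses that closed form; the function name and the counts are made up for illustration. With several predictors, one would fit the full model with a statistics package instead.

```python
import math

def wald_test_binary_feature(records):
    """records: iterable of (flag, defaulted) pairs, each 0 or 1.

    Returns (coefficient, standard error, z statistic, two-sided p-value)
    for a logistic regression of default on the single binary flag.
    """
    # 2x2 counts: a = flag & default, b = flag & no default,
    #             c = no flag & default, d = no flag & no default
    a = b = c = d = 0
    for flag, defaulted in records:
        if flag and defaulted:
            a += 1
        elif flag:
            b += 1
        elif defaulted:
            c += 1
        else:
            d += 1
    if min(a, b, c, d) == 0:
        raise ValueError("every cell of the 2x2 table needs an observation")
    beta = math.log((a * d) / (b * c))  # log odds ratio = logistic coefficient
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p_value

# Made-up sample: 100 flagged clients (30 defaulted), 400 unflagged (40 defaulted)
sample = [(1, 1)] * 30 + [(1, 0)] * 70 + [(0, 1)] * 40 + [(0, 0)] * 360
beta, se, z, p = wald_test_binary_feature(sample)
print(f"coef={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2g}")
```

A small p-value (conventionally below 0.05) would count as support for the hypothesis in this sample; a p-value near 1 means the flag adds no detectable information about default.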
