Big Data in Credit Scoring

The Global Findex (Financial Inclusion Index) report released in 2018 shows that only 48.9 percent of adults in Indonesia own a bank account. Millions of unbanked Indonesian adults work in the private sector and are paid in cash. What is the main reason these adults do not have a bank account? The reason is distance, and, surprisingly, 69 percent of this population segment own a mobile phone. Banking institutions have made some efforts to reach this unbanked population, but they are not enough: there is still a wide gap.

This may explain why so many technology-driven multi-finance companies have emerged in recent years: they fill the gap. They know the characteristics of the unbanked population, and by using technology they can reach more of it. But reaching this population is not without risk.

Multi-finance companies compete with each other to capture the market, offering many products to attract clients. Some of them focus on lending primarily to people with little or no credit history. Asymmetric information, also known as "information failure," is bound to arise. In borrowing and lending, asymmetric information occurs when the borrower has more information about his or her financial state than the lender does.

Have you ever wondered how a bank or financing company decides to approve or reject a client's credit application? Most financing companies use the services of credit rating agencies (CRAs) to measure the creditworthiness of their clients. The client's credit score is measured each time the client applies for credit, which also helps reduce the information asymmetry.

The process of generating the credit score is called credit scoring. It is widely applied in many industries, especially banking. Generally, it has two main parts: building a statistical model, and applying that model to assign a score to a credit application or an existing credit account. The statistical model for credit scoring is called a Scorecard model, and most of the time it is based on Logistic Regression.
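
To make the two parts concrete, here is a minimal sketch in plain Python: it first builds a Logistic Regression model with batch gradient descent, then applies it to score applications. The training rows, the two predictors (late payments and debt-to-income ratio), the learning rate and the number of epochs are all made-up assumptions for illustration; a real scorecard would be fitted with a statistics package on far more data.

```python
import math

def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=5000):
    """Part 1: build the model by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    intercept = 0.0
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad_b = 0.0
        grad_w = [0.0] * n_features
        for x, y in zip(rows, labels):
            p = sigmoid(intercept + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the log-odds
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        intercept -= learning_rate * grad_b / len(rows)
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
    return intercept, weights

def predict_default_probability(intercept, weights, x):
    """Part 2: apply the fitted model to one application."""
    return sigmoid(intercept + sum(w * v for w, v in zip(weights, x)))

# Hypothetical history: [late payments last year, debt-to-income ratio]
rows = [[0, 0.10], [0, 0.20], [1, 0.30], [0, 0.40], [2, 0.50],
        [3, 0.60], [1, 0.20], [4, 0.70], [2, 0.50], [1, 0.50]]
# 1 = the client defaulted, 0 = the client repaid
labels = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1]

intercept, weights = fit_logistic(rows, labels)
low_risk = predict_default_probability(intercept, weights, [0, 0.10])
high_risk = predict_default_probability(intercept, weights, [4, 0.70])
print(f"low-risk applicant:  {low_risk:.3f}")
print(f"high-risk applicant: {high_risk:.3f}")
```

The fitted intercept and weights are the whole model, so scoring a new application is a single weighted sum followed by the sigmoid.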

Why Logistic Regression? The choice is more about finding relationships between variables, and the significance of those relationships, than about raw predictive power. Most of the time Logistic Regression is more stable and easier to interpret than advanced or black-box models. Interpretability matters because finance companies should be able to give a clear explanation of why a client is accepted or rejected. On the other hand, a less advanced model like Logistic Regression often sacrifices some predictive power for the sake of interpretability.
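
The interpretability argument can be illustrated with a short sketch. In Logistic Regression each predictor adds its coefficient times its value to the log-odds of default, so a decision can be broken down predictor by predictor. The intercept, coefficients, predictor names and applicant below are hypothetical values chosen for illustration, not taken from any real scorecard.

```python
import math

# Hypothetical fitted values: change in log-odds of default per unit of predictor.
INTERCEPT = -3.0
COEFFICIENTS = {
    "late_payments_last_year": 0.8,
    "debt_to_income_ratio": 2.0,
    "years_at_current_job": -0.2,
}

def explain(applicant):
    """Return the default probability and each predictor's log-odds contribution."""
    contributions = {name: coef * applicant[name]
                     for name, coef in COEFFICIENTS.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions

# Hypothetical applicant
applicant = {
    "late_payments_last_year": 2,
    "debt_to_income_ratio": 0.5,
    "years_at_current_job": 3,
}

probability, contributions = explain(applicant)
# List the predictors from most risk-increasing to most risk-reducing.
for name, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:+.2f} log-odds")
print(f"probability of default: {probability:.3f}")
```

A positive contribution pushes the applicant towards default and a negative one away from it, which is the kind of explanation a black-box model does not give directly.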

The score from the statistical model usually represents the probability that the client will default, that is, be unable to repay the credit in the future. This means that the higher the score, the more likely the client is to default. Most CRAs, however, convert this default probability into ranges of values that express creditworthiness, to make it easier for the public to interpret; on that scale, the higher the score, the better the client.
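
One common way to do this conversion is to scale the log of the good-to-bad odds linearly. The sketch below assumes a base score of 600 at odds of 50 good clients to 1 bad client, and 20 points for every doubling of the odds; these three settings are illustrative assumptions, and each CRA chooses its own scale.

```python
import math

# Assumed scaling settings (illustrative only).
BASE_SCORE = 600            # score assigned at the base odds
BASE_ODDS = 50              # good:bad odds that map to the base score
POINTS_TO_DOUBLE_ODDS = 20  # points added each time the odds double

FACTOR = POINTS_TO_DOUBLE_ODDS / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

def probability_to_score(p_default):
    """Convert a default probability into a score that rises with creditworthiness."""
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must be strictly between 0 and 1")
    odds_good = (1.0 - p_default) / p_default
    return OFFSET + FACTOR * math.log(odds_good)

for p in (0.50, 0.10, 0.02):
    print(f"default probability {p:.2f} -> score {probability_to_score(p):.0f}")
```

With these assumed settings a client with a 50 percent default probability (odds of 1 to 1) scores roughly 487, and the score climbs as the default probability falls.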

Utilizing Big Data for Credit Scoring

The ideal Scorecard model should capture all of the clients' behaviours. CRAs usually have access to a client's credit history, but sometimes that is not enough. Nowadays, some technology-driven CRAs have started to use big data. It has been estimated that 2.5 quintillion bytes of data are generated each day. One way to visualize this is to imagine that this amount of data would fill 10 million Blu-ray discs, which, stacked, would equal the height of four Eiffel Towers placed on top of each other. Such astonishing amounts of data are often referred to as big data.

The ability of a financial institution to use all of its data, whether structured or semi-structured, is crucial in the age of big data analytics. Using data to make decisions across the entire financial institution can make that institution more efficient and drive an increase in revenue. As stated above, 69 percent of the unbanked population own a mobile phone. Everything they do on their mobile phones is captured somewhere. This is valuable data that can be turned into predictors for the Scorecard model.

We can find patterns, or even form wild hypotheses, in the big data that comes from mobile phones. For example, fraudsters may tend to use a WiFi connection when they apply for credit through a mobile application, or clients who default may tend to visit betting websites excessively before applying for credit. The brand of a client's mobile phone, combined with other data, can also (loosely) approximate their economic condition. A CRA may hypothesize that clients who use an unpopular phone usually come from a lower-income population and are likely to have difficulty repaying credit. The total main storage of a client's phone, on the other hand, can indicate whether the client possesses a high-end phone, which again says something about their economic condition. Many other hypotheses can be derived from big data, and in the end the Scorecard model will show whether each hypothesis is supported (statistically significant) or not.
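
As a rough illustration of what "statistically significant" means here, the sketch below screens one such hypothesis with a two-proportion z-test: do clients who applied over WiFi default at a different rate from those who did not? The counts are invented for the example. This is also a simpler stand-in for the test the Scorecard model itself would apply, which looks at the significance of the predictor's coefficient alongside the other predictors.

```python
import math

def two_proportion_z_test(defaults_a, total_a, defaults_b, total_b):
    """Two-sided z-test for a difference between two default rates."""
    rate_a = defaults_a / total_a
    rate_b = defaults_b / total_b
    pooled = (defaults_a + defaults_b) / (total_a + total_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: defaults among clients who applied over WiFi
# versus clients who applied over mobile data.
z, p_value = two_proportion_z_test(defaults_a=120, total_a=1000,
                                   defaults_b=80, total_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("The difference in default rates is significant at the 5% level.")
else:
    print("No significant difference in default rates at the 5% level.")
```

A small p-value only says that the two groups differ by more than chance would explain; whether the predictor belongs in the scorecard still depends on how it behaves together with the other predictors.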

Statistik-Saurus
