How can SMEs leverage public data to make business decisions?

Thu, Jun 2, 2022 5-minute read

Hi there!

Recently, a friend of mine complained to me that his business had reached a saturation point in Singapore and that he simply had no room left to expand. I decided to do a bit of research to find out whether his business had really exhausted all of its growth opportunities.

Firstly, let’s try to understand the nature of my friend’s business. He shared a few important facts about his venture:

  1. Due to technical requirements, his business can only be conducted in shopping malls
  2. The target audience is middle-aged adults, young adults, and secondary school kids
  3. The business has a seasonality factor (which isn’t important for the current exercise)
  4. Lastly, moderate foot traffic (footfall) is required to have a decent return

To understand whether my friend’s business has room for growth, I decided to establish how many shopping malls Singapore has. I picked a well-known website as my source - Wikipedia. It has quite a comprehensive list of Singapore’s shopping malls. I was shocked - Singapore has more than 100 malls!

The next step was quite straightforward - I asked my friend to share the locations of his points of sale (about 25 locations). Then I simply plotted both sets of data on Google Maps (blue labels - shopping malls, red labels - my friend’s sales points):

business-locations

It’s easy to see that my friend’s business has big potential for growth. There is an important caveat, though. Even though Singapore has a lot of shopping malls, many of them are old or outdated, and they aren’t popular among young adults or school kids. Thus, I needed a way to filter for suitable malls. In addition, my friend has already covered certain areas of Singapore pretty well, e.g. there is not much room for improvement within the city center.

To narrow down the search criteria, I decided to do two things:

  1. Filter shopping malls based on Google reviews. My hypothesis is that a mall with a rating higher than 4 can be considered a “good mall”
  2. Create a population density map of Singapore to see whether sales points are under- or over-represented in certain regions
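For anyone who wants to try the first step in code, here is a minimal sketch of the rating filter. The mall names and ratings are made-up placeholders, and `good_malls` is a hypothetical helper name; only the "higher than 4" threshold comes from the hypothesis above.

```python
# Hypothetical sample: names and ratings are placeholders, not real data.
malls = [
    {"name": "Mall A", "rating": 4.4},
    {"name": "Mall B", "rating": 3.8},
    {"name": "Mall C", "rating": 4.0},
    {"name": "Mall D", "rating": 4.2},
]


def good_malls(malls, threshold=4.0):
    """Return malls rated strictly above the threshold, best first."""
    selected = [m for m in malls if m["rating"] > threshold]
    return sorted(selected, key=lambda m: m["rating"], reverse=True)


shortlist = good_malls(malls)
print([m["name"] for m in shortlist])
```

Note that a mall rated exactly 4.0 is left out here, because the hypothesis says "higher than 4".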

It was quite straightforward to get the malls’ ratings through a simple Google search. The density map, on the other hand, gave me a couple of difficulties.

Before starting work on the density map, I did a bit of research to understand how people visualize density levels. Pretty quickly, I realized that a choropleth map is a great fit for visualizing Singapore’s population density.

In just half an hour, I was able to find a magnificent guide for beginners on how to build a choropleth map from scratch. The article walks you through using a few popular Python libraries such as pandas, numpy, and matplotlib to create a choropleth map of cities in Indonesia per province.

Once I realized that it’s quite straightforward to produce a choropleth map, I decided to park this activity and look for data to visualize. In short, I needed information on Singapore’s regions and the population density per region. Luckily, the amazing website singstat.gov.sg, maintained by the Singapore Department of Statistics, helped me out one more time (I first used it for my earlier research on rental prices in Singapore). Firstly, the website has a link to Singapore’s master plan - the country development strategy for the next 40 years. The master plan is built on the country’s territorial division, which looks as follows:

singapore-regions

The territorial division part is exactly what I needed for a visualization.

The second source of information is the Singapore Census of Population 2020. The report has comprehensive analytics on Singapore’s population dynamics, including the population of each geographical district and its age distribution. That’s exactly what I needed! At this point I had everything required to produce a density map.
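To make the idea of a density map concrete, here is a rough sketch of the arithmetic behind it: density is population divided by area, and each region is then assigned a shade class for the choropleth. The figures are invented for illustration (not census values), and equal-interval classes are just one possible choice, not necessarily what any particular plotting library does by default.

```python
def density_per_km2(population, area_km2):
    """Residents per square kilometre; a zero-area entry gives 0."""
    return population / area_km2 if area_km2 > 0 else 0.0


def shade_class(value, max_value, n_classes=5):
    """Map a value to an equal-interval class 0..n_classes-1 (higher = denser)."""
    if max_value <= 0:
        return 0
    return min(int(value / max_value * n_classes), n_classes - 1)


# Hypothetical figures, for illustration only.
areas = {
    "Area A": {"population": 250_000, "area_km2": 10.0},
    "Area B": {"population": 90_000, "area_km2": 12.0},
    "Area C": {"population": 0, "area_km2": 20.0},
}

densities = {
    name: density_per_km2(a["population"], a["area_km2"])
    for name, a in areas.items()
}
peak = max(densities.values())
classes = {name: shade_class(d, peak) for name, d in densities.items()}
print(densities)
print(classes)
```

Uninhabited regions such as "Area C" fall into class 0, which is why they show up as blank patches on the map.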

Of course, I wasn’t able to join the data sets immediately. There is always something wrong: extra characters, non-ASCII characters, inconsistent naming, etc. But after a little bit of struggle, I managed to produce a pretty decent density map:
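A common way to deal with this kind of mess is to normalize the region names into a canonical key before joining. The sketch below is an assumption about how such cleaning might look, using only the Python standard library; `normalize_name` is a hypothetical helper, and the sample keys and values are placeholders.

```python
import re
import unicodedata


def normalize_name(raw):
    """Build a join key: ASCII only, upper case, single spaces, no punctuation."""
    text = unicodedata.normalize("NFKD", raw)            # split accents, fold odd spaces
    text = text.encode("ascii", "ignore").decode("ascii")  # drop remaining non-ASCII
    text = re.sub(r"[^A-Za-z0-9 ]+", " ", text)           # punctuation becomes a space
    return " ".join(text.upper().split())                 # collapse whitespace


# Hypothetical inputs: one table keyed by clean names, one with messy names.
boundaries = {"JURONG WEST": "polygon-1", "WOODLANDS": "polygon-2"}
population = {"  Jurong\u00a0West* ": 1, "woodlands": 2, "Unknown Place": 3}

joined, unmatched = {}, []
for raw_name, value in population.items():
    key = normalize_name(raw_name)
    if key in boundaries:
        joined[key] = (boundaries[key], value)
    else:
        unmatched.append(raw_name)

print(joined)
print(unmatched)
```

Keeping the list of unmatched names around makes it easy to spot the remaining inconsistencies and fix them by hand.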

singapore-density-map

If you aren’t familiar with Singapore’s geography, you may be wondering why certain areas are completely uninhabited. The bottom left part of the city-state is occupied by the major industrial area together with Singapore’s port, which was the main engine of Singapore’s economy in the last century. The top right area is dedicated to a couple of nature reserves. The two patches in the city center are a military base and a nature reserve. Lastly, the huge empty area on the right is Changi Airport.

Once the density map was ready, I overlaid the same territorial divisions on the previous map with shopping malls and sales points. Now it’s possible to compare both maps!

singapore-location-regions

At this stage, the research was almost complete. The last step was to combine the available data points! Once I had combined them, a few observations emerged:

  1. There are a few important, densely populated areas, such as Jurong West and Woodlands, where my friend lacks sales points.
  2. There are quite a few areas, such as Rochor, that are over-saturated with sales points.
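One way to turn this kind of eyeballing into numbers is to compute sales points per 100,000 residents for each area and flag the extremes. This is only a sketch: the area names and figures are placeholders, and the `low` and `high` thresholds are arbitrary assumptions that would need tuning against real data.

```python
def points_per_100k(population, sales_points):
    """Sales points per 100,000 residents; empty areas give 0."""
    return sales_points / population * 100_000 if population else 0.0


def classify(area, low=1.0, high=10.0):
    """Label an area by coverage. The thresholds are illustrative assumptions."""
    ratio = points_per_100k(area["population"], area["sales_points"])
    if ratio < low:
        return "under-served"
    if ratio > high:
        return "over-saturated"
    return "balanced"


# Hypothetical figures, for illustration only.
areas = [
    {"name": "Area A", "population": 250_000, "sales_points": 0},
    {"name": "Area B", "population": 13_000, "sales_points": 4},
    {"name": "Area C", "population": 120_000, "sales_points": 2},
]

labels = {a["name"]: classify(a) for a in areas}
print(labels)
```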

With the above observations in hand, I compiled a list of shopping malls for each area, sorted by Google reviews. Using sales data, my friend and I found a positive correlation between a mall’s review rating and its sales. Based on that information, I prepared a list of shopping malls that might have sales potential.
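For readers curious how such a correlation could be checked, here is a small sketch that computes the Pearson correlation coefficient by hand. The ratings and sales figures are invented placeholders, not the real sales data, and Pearson is just one reasonable choice of measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values: rating of each mall and sales at that mall's sales point.
ratings = [3.6, 4.0, 4.3, 4.5]
sales = [10, 14, 15, 21]

r = pearson(ratings, sales)
print(round(r, 2))
```

A value close to 1 suggests that higher-rated malls tend to go with higher sales, although with a handful of locations it is a hint rather than proof.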

Is it possible to go further and improve the research? Definitely! There are quite a few improvements that could be applied:

  1. The Singapore Census of Population has information about age distribution, so it would be possible to apply age filtering to get cleaner results.
  2. It would be possible to represent all data points on the same map, but I was too lazy to do such processing.
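As a rough idea of what the age filtering could look like, the sketch below sums only the age bands that fall inside a target range. The bands, counts, and the 10 to 54 range are all assumptions made for illustration; the real census bands and a sensible definition of the target audience would have to be checked first.

```python
def target_population(age_bands, min_age=10, max_age=54):
    """Sum residents in bands lying fully inside [min_age, max_age]."""
    return sum(
        count
        for (low, high), count in age_bands.items()
        if low >= min_age and high <= max_age
    )


# Hypothetical age bands for one area: (youngest, oldest) -> residents.
area_age_bands = {
    (0, 9): 20_000,
    (10, 19): 25_000,
    (20, 54): 90_000,
    (55, 120): 40_000,
}

print(target_population(area_age_bands))
```

Feeding this filtered figure into the density map instead of the total population would highlight the areas where the target audience actually lives.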

Stay tuned!