Multi-temporal analysis of past and future land cover change in the highly urbanized state of Selangor, Malaysia

This study analysed the multi-temporal trend in land cover, and modelled a future scenario of land cover for the year 2030 in the highly urbanized state of Selangor, Malaysia. The study used a Decision Forest-Markov chain model in the Land Change Modeller (LCM) tool of TerrSet software. Land cover maps of 1999, 2006 and 2017 were classified into 5 classes, namely water, natural vegetation, agriculture, built-up land and cleared land. A simulated land cover map of 2017 was validated against the actual land cover map of 2017. The Area Under the Curve (AUC) value of 0.84 for the Total Operating Characteristic (TOC) and the higher percentage of components of agreement (Hits + Correct rejections) compared to components of disagreement (Misses + False alarms + Wrong hits) indicated successful validation of the model. The results showed that between 1999 and 2017 built-up land cover increased by 608.8 km2 (7.5%) and agricultural land by 285.5 km2 (3.5%), whereas natural vegetation decreased by 831.8 km2 (10.2%). The simulated land cover map of 2030 showed a continuation of this trend: built-up area is estimated to increase by 723 km2 (8.9%) and agricultural land by 57.2 km2 (0.7%), leading to a decrease of natural vegetation by 663.9 km2 (8.1%) for the period 2017 to 2030. The spatial trend of land cover change shows built-up areas mostly located in central Selangor, where the highly urbanized and populated cities of Kuala Lumpur and Putrajaya and the Klang Valley are located. The future land cover modelling indicates that built-up expansion mostly takes place at the edges of existing urban boundaries. The results of this study can be used by policy makers, urban planners and other stakeholders for future decision making and city planning.


Introduction
Land use and land cover (LULC) change is the human modification of the earth's terrestrial surface. Over the past 50 years, the driving force behind LULC change has been the increase in agricultural land, and since 1992 rapid urbanization has added to the ever-growing change in land use and land cover (IPBES 2019); worldwide, half of the habitable land is used for agriculture (Ritchie and Roser 2019). The changes from one LULC to another can affect the biogeochemical cycles and biogeophysical processes occurring between the surface and the atmosphere (Zhou et al. 2020). LULC change can impact biogeochemical cycles like the carbon cycle by altering carbon sinks and carbon dioxide (CO2) emissions (Li et al. 2020a, b). LULC change also affects biogeophysical processes like surface albedo, roughness, and evapotranspiration (Lejeune et al. 2017; Winckler et al. 2017; Hirsch et al. 2018), and hence alters the energy budget, water budget, and atmospheric variables like temperature and rainfall.
Many studies have shown that urbanization can increase surface temperature, which leads to the Urban Heat Island (UHI) effect (Chao et al. 2020; Hafoud et al. 2020; Qiu et al. 2020; Son et al. 2020; Sultana and Satyanarayana 2020). This can in turn alter rainfall over urban areas (Pielke et al. 2002; Liang and Ding 2017; Schmid and Niyogi 2017; Liu and Niyogi 2019; Singh et al. 2020; Yu et al. 2020). Urbanization can also affect the hydrological cycle, where impermeable urban surfaces increase runoff and lead to an increased risk of flooding (Li et al. 2018; Ohana-Levi et al. 2018; Hu et al. 2020).
Besides affecting biogeochemical and biogeophysical processes, LULC change also has a great impact on biodiversity and ecosystem services. The conversion of natural vegetation to agricultural land and urban areas can lead to the degradation of ecosystem services and loss of biodiversity, particularly in biodiversity-rich hotspots like South East Asia. Forests and protected areas play an essential part in providing a healthy environment that benefits both people and nature, by protecting biodiversity, culture, and livelihoods of indigenous people and local communities within these areas (Shadeed and Almasri 2010; UNEP-WCMC and IUCN 2016), and providing important ecosystem services such as carbon sequestration, landscape value, and regulation of major element cycles (Ronchi and Salata 2017). Deforestation can amplify and increase the severity of floods, hence forests and protected areas are important in flood mitigation (Bhattacharjee and Behera 2018; Hasyim et al. 2020; Tembata et al. 2020).
Furthermore, LULC change modelling can be used to estimate and predict future LULC change scenarios and their effects on the environment, by analysing past and current LULC change trends and the variables that bring about these changes. Moreover, these LULC change models can be integrated with other models, like climate models (Fattah et al. 2021), hydrological models (Shirmohammadi et al. 2020; Tankpa et al. 2020; Galleguillos et al. 2021), and ecosystem models (Krause et al. 2019; Li et al. 2020a, b; Rocca and Milanesi 2020), to better understand the impact of LULC change. Modelling future urban land use change, in particular, has been of interest to the scientific community in recent times, due to the growing global population and urbanization. In the past few years many studies have used various LULC models to model future urban expansion and to study its effects on the environment. For example, several studies (Rimal et al. 2020; Grigorescu et al. 2021; Okwuashi and Ndehedehe 2021) modelled future urban land use change, and all of them estimated that urban land will increase and other land uses will decrease.
In the 25 years  the largest conversion of forest area to other land uses occurred in the tropics, with South East Asia having the highest rate of deforestation since 1990 (Masum et al. 2017). In Malaysia, agriculture, especially palm oil plantations, was the main contributor to the economy until 1987. Manufacturing then took over as the main contributor as the government shifted its development policy to the manufacturing sector, and as a result, by the year 2000 other sectors like infrastructure and commercial development started growing, giving rise to urbanization (Abdullah and Hezri 2008). However, palm oil is still a major part of the Malaysian economy and continues to expand. Between 1990 and 2017 agricultural land increased by 55.7%, with 98.2% of this area being plantations (Yan et al. 2020). As of 2017, palm oil plantations occupy 17.62% of the landmass in Malaysia, which has led to a 20% loss in forest land (Ezechi and Muda 2019).
Urbanization, on the other hand, has seen a rapid increase, and Malaysia is one of the most urbanized countries in East Asia (Plecher 2020). In 1991, 50.4% of the population lived in urban areas; this number reached 65% in 2010 and 75% as of 2020, and it is projected to reach 85% by 2040 (Samat et al. 2020). This has resulted in the expansion of urban areas at the expense of other land covers: urban areas increased from 1793.2 ha in 1992 to 3235.4 ha in 2002 and reached 3987.8 ha in 2010, whereas agricultural land decreased from 6171.3 ha (53.8%) in 1992 to 3883 ha (35%) in 2010 (Mohammed et al. 2016). At the state level, Selangor and Penang have historically been the most urbanized states in Malaysia, and the rate of urbanization has continued to rise over the years. The federal territories of Kuala Lumpur and Putrajaya, located within Selangor, have been 100% urbanized since 2010 (Hasan and Nair 2014). Overall, the level of urbanization in Malaysia as of 2020 stands at 77.16% (O'Neill 2021).
In Malaysia, there have been few studies that have used LULC models, GIS, and remote sensing to study LULC change, and these studies vary in spatial scope. For example, Gambo et al. (2018) and Rafaai et al. (2020) used LULC change modelling to study the changes within and around protected areas, whereas Verburg et al. (2002), Memarian et al. (2012), Ibrahim and Ludin (2016), Kamarudin et al. (2018) and Majid et al. (2018) studied LULC change at the basin level, and others like Boori et al. (2015), Almdhun et al. (2018) and Samat et al. (2020) studied land use change in cities and towns. In Selangor, several studies have carried out LULC change modelling. For example, Boori et al. (2015) and Nourqolipour et al. (2015a, b, 2016) analysed LULC change for certain parts of Selangor. Therefore, there are still gaps in knowledge of the trends and changes in LULC in Malaysia, and only a handful of studies have modelled future LULC changes. LULC change models are valuable tools for researchers and professionals to explore the dynamics and drivers that bring about change in LULC (Agarwal et al. 2002). LULC change models are capable of capturing (reproducing) these complex dynamics of LULC change and can be used to extrapolate future land use scenarios (Soesbergen 2016), which can help to inform policies affecting such change. A broad array of models and modelling methods is available to researchers, and each type has certain advantages and disadvantages depending on the objective of the research. There are statistical and empirical models like logistic regression and Markov chain, dynamic models like Cellular Automata (CA), and integrated models (Al-sharif and Pradhan 2014). The Markov chain and CA are the most commonly used methods in LULC change modelling, and many studies use an integrated CA-Markov method (Hamad et al. 2018; Karimi et al. 2018; Huang et al. 2020; Khawaldah et al. 2020; Mansour et al. 2020).
In recent years several machine learning ( The RF algorithm has several advantages over other machine learning methods: it is faster and easier to understand and interpret, the algorithm is completed at a fixed number of operations, it can process large volumes of data, a small quantity of parameters is needed to be adjusted during modelling, and has higher accuracy compared to other machine learning algorithms (Kamusoko and Gamba 2015;Legdou et al. 2020;Mao et al. 2020).
LULC change is increasing and surpassing climate change as having the most significant impact on environmental change dynamics. Selangor has been experiencing rapid LULC change and urbanization, with the federal territories of Kuala Lumpur and Putrajaya being 100% urbanized. This has therefore had a significant impact on the environmental dynamics in Selangor. The LULC change and urbanization have increased the risk of flooding and air pollution, increased urban temperatures, and resulted in the degradation of the natural ecosystem within Selangor. With Kuala Lumpur and Putrajaya being 100% urbanised, the continuous expansion of these urban centres could affect the surrounding areas and result in urban sprawl.
Therefore, for local decision-makers and urban planners to mitigate the future impacts of the urban expansions and LULC changes, and to implement better land use policies, it is important to know what the future scenarios of LULC change are, determine the spatial changes, and quantify these changes. A few studies have focused on LULC change modelling in Selangor; however, there has not been any study that has analysed the trend and changes of LULC for Kuala Lumpur, Putrajaya, and Selangor as a whole, and what the future scenario could be. Therefore, the objective of this study is to determine what future land-use change scenarios in Selangor are and how the expansion of the urban centres could affect the surrounding areas.
The Decision Forest (DF) algorithm, which is rarely used in LULC change modelling, has been shown to have high accuracy in past studies and has several advantages over other models. An integrated Decision Forest algorithm and Markov Chain model in the LCM tool of the TerrSet2020 software developed by Clark Labs (Clark Labs 2021) was used in this study. The DF algorithm is a modification of the original RF algorithm by Leo Breiman and Adele Cutler (Breiman 2001; Cutler et al. 2012).

Study area
The state of Selangor, one of the 13 states of Malaysia, has an area of 8200 km2 and lies near the equator on the west coast of Peninsular Malaysia, at latitudes 2°35′-3°60′ N and longitudes 100°45′-102°00′ E (Fig. 1). It is bordered by Perak to the north, Pahang to the east, Negeri Sembilan to the south, and the Strait of Malacca to the west. The state capital is Shah Alam, while Klang serves as the royal capital. Selangor is the most populated state in Malaysia, with a population of about 6.5 million (Department of Statistics Malaysia 2021), and it is highly urbanized.

Data
The study used Landsat 5 TM (1999 and 2006) and Landsat 8 OLI (2017) satellite images obtained from the United States Geological Survey (USGS) website (http://earthexplorer.usgs.gov/). The SRTM digital elevation model (DEM) was also obtained from the USGS website. Ancillary data such as road networks and rivers, which were used for land use change modelling, were obtained from the OpenStreetMap website (https://www.openstreetmap.org). The slope map was created from the DEM using the slope tool in ArcGIS 10.2.2 software developed by ESRI (2014).

Land cover classification and accuracy assessment
Fig. 1 The study area (the state of Selangor and its districts) located in Western Peninsular Malaysia
For the land cover classification, ERDAS IMAGINE 2020 software developed by Hexagon Geospatial (2020) was used. First, image mosaicking was carried out to merge the images that together give the full extent of the study area. This was followed by a haze reduction process to remove or reduce any haze in all the images. For the 2017 image, the panchromatic band of Landsat 8 was used to pan-sharpen the image and improve its spatial resolution from 30 m to 15 m for better interpretation and classification of land cover. The maximum likelihood algorithm, a commonly used supervised classification method, was applied to classify the Landsat images of 1999, 2006, and 2017. The following 5 land cover classes were generated: water, natural vegetation, agriculture, built-up land, and cleared land.
For accuracy assessment, the accuracy assessment tool in ERDAS software was used. The tool allows the comparison of random sample points in the classified map with reference pixels with known class labels. For each map, 150 sample points were generated using stratified random sampling, and an error matrix was constructed for each land cover map. The reference data were obtained using Google Earth and its historical imagery, as onsite ground truth data could not be obtained. This was followed by the calculation of the producer accuracy, user accuracy, and overall accuracy.
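The three accuracy measures can be derived from an error matrix as in the following minimal sketch. It is not the ERDAS implementation; the function name `accuracy_metrics`, the row/column layout (rows are mapped classes, columns are reference classes), and the counts are assumptions made purely for illustration.

```python
def accuracy_metrics(matrix, classes):
    """Return overall, producer and user accuracy from a square error matrix.

    Assumed layout: matrix[i][j] counts sample points mapped as class i
    (rows) whose reference label is class j (columns).
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    producer, user = {}, {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])                       # points mapped as this class
        col_total = sum(matrix[r][i] for r in range(n))  # reference points of this class
        user[name] = matrix[i][i] / row_total if row_total else None
        producer[name] = matrix[i][i] / col_total if col_total else None
    return overall, producer, user


# Hypothetical counts for 100 sample points (not the study's data).
classes = ["water", "natural vegetation", "built-up land"]
matrix = [
    [1, 1, 0],
    [1, 40, 4],
    [0, 3, 50],
]
overall, producer, user = accuracy_metrics(matrix, classes)
print(f"overall accuracy: {overall:.2%}")
for name in classes:
    print(f"{name}: producer {producer[name]:.2%}, user {user[name]:.2%}")
```

In this invented example the water class has only two mapped and two reference points, so a single misclassification is enough to halve both its producer and user accuracy.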

Land cover change modelling
To simulate a future land cover map, a Decision Forest-Markov chain model is used. The Land Change Modeller (LCM) in TerrSet2020 software developed by Clark Labs (Clark Labs 2021) is used for the modelling. The LCM uses historical land cover data, transition potential maps, and Markov matrices to simulate future land cover change. The LCM consists of 3 main steps: change analysis, transition potential modelling, and change prediction.

Change analysis
The change analysis step calculates the nature and extent of land cover change between 2 land cover maps, one from time 1 and one from time 2. The changes that are identified are transitions from one land cover state to another. The change analysis evaluates gains and losses, detects net gains, and creates change maps.
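The idea behind this step can be illustrated with a cross-tabulation of two class grids. The sketch below is not the LCM implementation; the function name `change_analysis`, the single-letter class codes, and the tiny 3 x 3 grids are hypothetical, and areas are expressed in cell counts.

```python
from collections import Counter


def change_analysis(map_t1, map_t2):
    """Cross-tabulate two equally sized class grids and summarise change per class."""
    transitions = Counter()
    for row1, row2 in zip(map_t1, map_t2):
        for a, b in zip(row1, row2):
            transitions[(a, b)] += 1          # (class at time 1, class at time 2)
    classes = sorted({c for pair in transitions for c in pair})
    summary = {}
    for c in classes:
        gain = sum(n for (a, b), n in transitions.items() if b == c and a != c)
        loss = sum(n for (a, b), n in transitions.items() if a == c and b != c)
        summary[c] = {"gain": gain, "loss": loss, "net": gain - loss}
    return transitions, summary


# Hypothetical grids: V = natural vegetation, A = agriculture,
# B = built-up land, W = water.
t1 = [["V", "V", "A"],
      ["V", "B", "A"],
      ["W", "V", "V"]]
t2 = [["V", "B", "A"],
      ["B", "B", "B"],
      ["W", "A", "V"]]
transitions, summary = change_analysis(t1, t2)
for name, stats in summary.items():
    print(name, stats)
```

Multiplying the cell counts by the cell area (for example 30 m x 30 m for Landsat) would convert the gains, losses and net changes into areas.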

Transition potential modelling
In this step, the potential of land to transition is identified, and transition potential maps for each transition are created. The transition potential maps that have the same underlying driver variables are grouped within an empirically evaluated transition sub-model. A transition sub-model can consist of a single land cover transition or a group of transitions that are thought to have the same underlying driver variables. These driver variables are used to model the historical change process.
The driver variables used in this study are: distance to rivers, distance to roads, distance to urban area, DEM, and slope (Figs. 2, 3). The driver variables were selected based on the literature review (Camara et al. 2020;Rafaai et al. 2020) and the author's knowledge of the study area. The transition potential maps are created using the Decision Forest algorithm, which is an implementation of the Random Forest method.
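As an illustration of how a distance-based driver variable such as distance to roads can be represented, the sketch below computes a brute-force Euclidean distance raster from a binary feature mask. This is an assumption about the general form of such a layer, not the procedure used in TerrSet or ArcGIS; the function name `distance_raster`, the 30 m cell size, and the toy mask are hypothetical.

```python
import math


def distance_raster(feature_mask, cell_size=1.0):
    """Euclidean distance from each cell centre to the nearest feature cell.

    feature_mask is a grid of 0/1 values where 1 marks a feature
    (for example a road cell). Brute force, suitable for small grids only.
    """
    targets = [(r, c)
               for r, row in enumerate(feature_mask)
               for c, v in enumerate(row) if v]
    if not targets:
        raise ValueError("feature_mask contains no feature cells")
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in targets) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(feature_mask)]


# Hypothetical mask with a road running along the middle row.
roads = [[0, 0, 0],
         [1, 1, 1],
         [0, 0, 0]]
dist = distance_raster(roads, cell_size=30.0)
for row in dist:
    print(row)
```

For rasters of realistic size, a dedicated distance transform would be needed, since the brute-force loop scales with the number of cells times the number of feature cells.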

Change prediction
In the final step, the historical rates of change calculated in the change analysis step and the transition potential maps are used to predict a future scenario for a specified future date. The Markov Chain determines the amount of change using the earlier and later land cover maps along with the date specified. The procedure determines how much land would be expected to transition from the later date to the prediction date, based on a projection of the transition potentials into the future, and creates a file of transition probabilities. The file of transition probabilities is a matrix that records the probability that each land cover category will change to every other category.
In this study, the 1999 and 2006 land cover maps are first used for the calibration and validation stages of the model, and a land cover map of 2017 is simulated and validated against a reference land cover map of 2017. In the next step, a future land cover map of 2030 is simulated using the 2006 and 2017 maps. The gains, losses and net gains for the periods 1999-2006 and 2006-2017 from the change analysis step are shown in Fig. 4.
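A transition probability matrix of this kind can be sketched as follows. This is a simplified illustration and not the LCM procedure: it assumes that the projection interval equals the interval between the two input maps, and any adjustment for a different projection length (for example a 2006-2017 calibration period versus a 2017-2030 projection) is not reproduced. The function names and toy grids are hypothetical.

```python
def transition_probabilities(map_t1, map_t2, classes):
    """Estimate Markov transition probabilities from two class grids."""
    counts = {a: {b: 0 for b in classes} for a in classes}
    for row1, row2 in zip(map_t1, map_t2):
        for a, b in zip(row1, row2):
            counts[a][b] += 1
    probs = {}
    for a in classes:
        total = sum(counts[a].values())
        # Assumption: a class absent at the earlier date is treated as persistent.
        probs[a] = {b: (counts[a][b] / total if total else float(a == b))
                    for b in classes}
    return counts, probs


def project_areas(areas, probs):
    """Apply one Markov step to per-class areas (same units in and out)."""
    return {b: sum(areas[a] * probs[a][b] for a in areas) for b in areas}


# Hypothetical grids: V = natural vegetation, B = built-up land.
classes = ["V", "B"]
earlier = [["V", "V", "B"],
           ["V", "V", "B"]]
later = [["V", "B", "B"],
         ["V", "V", "B"]]
counts, probs = transition_probabilities(earlier, later, classes)
areas_later = {"V": 3, "B": 3}            # cell counts in the later map
projected = project_areas(areas_later, probs)
print(probs)
print(projected)
```

The Markov step fixes only the quantity of each class at the prediction date; where the change is placed depends on the transition potential maps.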

Validation
For the validation of the model, the area under the curve (AUC) of the Total Operating Characteristic (TOC) and the three-map comparison cross-tabulation method, which measures components of agreement and disagreement, are used. The TOC method indicates how well the model predicts change, while the three-map cross-tabulation matrix provides detailed information on the accuracy of predicted change and persistence of each land cover class. The three-map cross-tabulation method uses a reference map of time 1, a reference map of time 2, and a simulated map of time 2 to create the cross-tabulation matrix. In this study reference map 2006 (t1), reference map 2017 (t2), and simulated map 2017 (t2) are used. There are 2 components of agreement, called Hits and Correct rejections, and 3 components of disagreement, called Misses, False alarms, and Wrong hits. These metrics of agreement and disagreement are recommended by  and Pontius and Millones (2011) as an alternative to Kappa statistics, since Kappa indices attempt to compare accuracy to a baseline of randomness, but randomness is not a reasonable alternative for map construction; hence Kappa statistics can give an illusion of high accuracy.
Fig. 2 Raster maps of distance to roads, distance to rivers, and distance to urban area, used as driver variables for the transition potential modelling step
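The five components can be computed cell by cell, as in the sketch below. The definitions used are the commonly cited ones and are an assumption here, since the text does not spell them out: a hit is observed change simulated as change to the correct class, a miss is observed change simulated as persistence, a wrong hit is observed change simulated as change to the wrong class, a false alarm is observed persistence simulated as change, and a correct rejection is observed persistence simulated as persistence. The function name and toy grids are hypothetical.

```python
def three_map_comparison(ref_t1, ref_t2, sim_t2):
    """Return the five components as percentages of all cells."""
    counts = {"hits": 0, "misses": 0, "false_alarms": 0,
              "wrong_hits": 0, "correct_rejections": 0}
    total = 0
    for r1, r2, s2 in zip(ref_t1, ref_t2, sim_t2):
        for a, b, s in zip(r1, r2, s2):
            total += 1
            if a == b:                       # observed persistence
                key = "correct_rejections" if s == a else "false_alarms"
            elif s == a:                     # observed change, simulated persistence
                key = "misses"
            elif s == b:                     # observed change, correct class simulated
                key = "hits"
            else:                            # observed change, wrong class simulated
                key = "wrong_hits"
            counts[key] += 1
    return {k: 100.0 * v / total for k, v in counts.items()}


# Hypothetical 10-cell example: V, A, B, W are land cover class codes.
ref_t1 = [["V", "V", "V", "V", "V"], ["B", "B", "A", "A", "W"]]
ref_t2 = [["V", "B", "B", "A", "V"], ["B", "B", "A", "B", "W"]]
sim_t2 = [["V", "B", "V", "B", "B"], ["B", "B", "A", "A", "W"]]
result = three_map_comparison(ref_t1, ref_t2, sim_t2)
agreement = result["hits"] + result["correct_rejections"]
disagreement = result["misses"] + result["false_alarms"] + result["wrong_hits"]
print(result, agreement, disagreement)
```

Because every cell falls into exactly one component, total agreement and total disagreement always sum to 100%.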

Land cover classification and accuracy assessment
The spatio-temporal land cover classification maps of 1999, 2006, and 2017 are shown in Fig. 5. In the period 1999 to 2017 built-up land increased by 608.8 km2 (7.5%), agricultural land increased by 285.5 km2 (3.5%), and water bodies increased by 21.1 km2 (0.3%), whereas natural vegetation decreased by 831.8 km2 (10.2%) and cleared land decreased by 83.7 km2 (1%) (Table 1). The majority of the urban expansion took place in central Selangor, where the federal territories of Kuala Lumpur and Putrajaya are located. The agricultural expansion from 1999 to 2006 took place mostly in the north of Selangor.
The overall accuracies of the land cover maps of 1999, 2006, and 2017 are 84%, 92.74%, and 88.67% respectively, and the producer accuracy and user accuracy are shown in Table 2.
For the land cover maps of 1999 and 2017, the water class has a low producer accuracy and user accuracy of 50%. In the accuracy assessment of both land cover maps, there were only 2 random samples for water, and in both cases the classifier assigned one sample point to another class, hence the 50% producer and user accuracy, as shown in the error matrix in Table 3.

Land cover change modelling and validation
The simulated land cover map of 2030 is shown in Fig. 6. The future simulation of land cover change shows an increase of 723 km2 (8.9%) in built-up land and an increase of 57.2 km2 (0.7%) in agricultural land from 2017 to 2030; on the other hand, there is a decrease  (Table 4).

Model validation
The area under the curve (AUC) of the Total Operating Characteristic (TOC) is shown in Fig. 7. The validation had an AUC of 0.84.
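As a rough illustration of what an AUC value expresses, the sketch below computes it as the probability that a randomly chosen cell that changed received a higher transition potential than a randomly chosen cell that persisted, counting ties as one half. This assumes the TOC AUC is normalised so that it coincides with this ranking-based AUC; it is not the TerrSet routine, and the scores and change flags are invented.

```python
def auc_from_ranking(scores, changed):
    """Ranking-based AUC for change (True) versus persistence (False) cells."""
    pos = [s for s, c in zip(scores, changed) if c]
    neg = [s for s, c in zip(scores, changed) if not c]
    if not pos or not neg:
        raise ValueError("need at least one changed and one persistent cell")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5               # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical transition potentials and observed change flags for six cells.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
changed = [True, True, False, True, False, False]
auc = auc_from_ranking(scores, changed)
print(round(auc, 3))
```

A value of 0.5 corresponds to a ranking no better than random and 1.0 to a ranking that places every changed cell above every persistent cell.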
The hits, misses, false alarms, wrong hits, and correct rejections of the cross-tabulation matrix are shown in Fig. 8. The total components of agreement were 71.1% which is the sum of Hits (3.1%) and Correct rejections (68%), and the total components of disagreement were 28.9% which is the sum of Misses (14.9%), False alarm (9.4%) and Wrong hits (4.7%).

Discussion
This study developed a DF-MC model to analyse the trend in land cover and to estimate future change in Selangor. The trend in land cover change from 1999 to 2017 shows an increase of 608.8 km2 (7.5%) in built-up area. The majority of the built-up areas are located in central Selangor, where the highly populated federal territories of Kuala Lumpur and Putrajaya are located. This increase in built-up areas in Selangor could be mostly attributed to the expansion of Kuala Lumpur, Putrajaya, and the surrounding Klang Valley, as previous studies have shown that these cities have expanded and resulted in urban sprawl (Rosni et al. 2016; Almdhun et al. 2018). The AUC value of 0.84 and the higher percentage of total components of agreement (71.1%) compared to total components of disagreement (28.9%) show that the model is adequately calibrated and validated, and that it is suitable for simulating future land use maps. This is comparable with the study by Samardžić-Petrović et al. (2015), which had a similar AUC value and accuracy, and indicates that the DF-MC is a suitable model for land cover change modelling.
The projected 2030 land cover map shows this urban expansion continuing, and the model estimated that urban land cover will increase by 723 km2 (8.9%) in the period 2017 to 2030 (Fig. 9). The model shows the urban expansion taking place mostly at the edges of existing urban boundaries and along road networks, an indication that distance to roads and distance to existing urban areas are the most impactful variables for urban expansion.
The increase of transportation networks has played a major role in the expansion of urban areas in Malaysia. For example, in George Town, Northern Malaysia, urban land is estimated to expand from 925.77 to 1253.95 km2 by 2030, with the North-South highway and the second bridge between George Town and Penang island playing a major role in this expansion (Samat et al. 2020). Similarly, a study in the northern part of Selangor shows an estimated urban expansion of about 33% from 2015 to 2033, with distance to roads and distance to built-up areas being the major variables (Camara et al. 2020).
Similar to built-up areas, agricultural land has seen an increase in area of 285.5 km2 (3.5%) from 1999 to 2017, and the model estimated that agricultural land will increase by 57.2 km2 (0.7%) from 2017 to 2030. The increase of built-up area and agricultural land has resulted in a decrease in natural vegetation of 831.8 km2 (10.2%) from 1999 to 2017, as natural vegetation is converted into built-up and agricultural land cover. The hotspots of this agricultural land use change are the peat swamps in the north east of Selangor and the coastal regions, where distance to transportation infrastructure plays a major role: as distance to transport increases, agricultural land decreases (Olaniyi et al. 2015). The model estimates that there will be a loss of 663.9 km2 (8.1%) in natural vegetation from 2017 to 2030. Water and cleared land both had very minimal changes, which shows that the majority of the land for urban development and agriculture comes from natural vegetation as the population and the economy of Selangor continue to grow.
The rapid increase in urbanization and uncontrolled urban expansion can have a negative impact on the environment. Selangor and in particular the cities of Kuala Lumpur and Putrajaya have experienced regular flooding events due to urban development (Bhuiyan et al. 2018 (Ravindran and Rajendra 2020). As the demand for urban land and agricultural land increases, more of the protected areas and urban green spaces will be used to meet these demands. The model shows the urban and agriculture expansion by 2030 will result in more loss of natural vegetation and urban green spaces, and the expansion of the major urban centres will affect their surrounding areas. Therefore, the results of this study can help local policymakers and urban planners to visualise and compare the impacts of future land cover change, which can help to formulate and implement better land use policies and structural plans, which can limit or prevent urban sprawl and uncontrolled urban expansion into surrounding areas. The future land cover change projection can also help local government to mitigate the effects of flooding, and develop a more resilient structural plan.  The results of this study show that, at the current rate of development and urbanization, large parts of natural vegetation will be lost by 2030. Therefore, the findings of this study can help Selangor state government, policymakers, urban planners, and other stakeholders to better understand and manage future development in Selangor and to achieve the objectives of the Selangor State Structural Plan 2035, where one of the objectives is to maintain and preserve 32% of forest areas. Furthermore, the results of this study can be incorporated in other studies, like climate and hydrological studies, where effects of land cover change and increasing urbanization on temperature and flood risk can be studied.
There are some limitations in using the DF-MC model. The Markov chain is essentially a projection model and is not policy sensitive, which makes it difficult to include policy-relevant variables such as socioeconomic and population variables in the modelling (Iacono et al. 2015). Overcoming this limitation would require integrating the MC model with other models that can incorporate socioeconomic variables (Hamad et al. 2018). Moreover, the Decision Forest is a complex model that requires high computational power and is more time-consuming. The complexity of the model can be reduced by reducing the number of decision trees and lowering the number of variables used, and it is therefore advisable to carry out variable selection during modelling (Samardžić-Petrović et al. 2015). Overall, the model is capable of estimating future land cover change and can be used in future studies.