
Examining global warming factors using self-organizing map and Granger causality network: a case from South Korea



Understanding and patterning the possible causal variables of global warming contributes to the development of effective prevention and mitigation strategies for climate change. Therefore, we aimed to pattern and visualize the possible causal variables of global warming and measure the causality between them.


We patterned and visualized the time series (103 years, from 1918 to 2020) of global surface temperature (GTemp) data together with the gross domestic product (GDP) per capita, human population (Pop), and carbon dioxide (CO2) emissions of South Korea using a self-organizing map (SOM), and examined the local features that may cause global warming using the Granger causality (GC) test. The time-series data were trained and mapped in 4 × 4 SOM grids, and causality networks between variables were examined using multivariate Granger test statistics.


SOM patterned 103 years of data, and a dominant cluster contained continuous time-series data from 2007 to 2020. Similarly, the CO2 emissions of South Korea were obtained as a predictable unidirectional causal variable for GTemp from GC analysis. Based on data from the past 34 years, significant causality (p-value = 0.01) was observed with the integrated effect of Pop, GDP, and CO2 on GTemp.


This study patterned the time-series data using SOM and examined the causal relationship between variables using the GC test. The study framework could serve as a reference for future scholars, ecologists, and those working toward the United Nations Sustainable Development Goals.


Global warming is an important environmental issue, causing regional and seasonal temperature extremes, heavy rainfall, decreasing snow cover and sea ice, altering plant and animal biodiversity, and threatening global health (An et al. 2018; Gérard et al. 2020; Sutherland et al. 2022; Abbass et al. 2022; Raihan and Tuspekova 2022a). To fight against the threat of climate change, many countries have signed and adopted the Kyoto Protocol (UNFCCC 1997) and Paris Agreement (UNFCCC 2015) treaties. Goal 13 (Climate action) of the Sustainable Development Goals (SDG) was created to achieve a better and more sustainable future for humans and the Earth, and developed countries have agreed to invest US$ 100 billion per year to support developing countries in adopting climate change prevention strategies and investing in low-carbon development (UNDP 2021).

The global mean surface temperature has risen by 1.0 °C since pre-industrial levels, owing primarily to anthropogenic activities, and anthropogenic global warming continues at a rate of 0.2 °C per decade (IPCC 2022). Global warming is caused by greenhouse gases (GHGs) that absorb infrared radiation emitted from the surface of the Earth and prevent it from passing into space; some of this energy is then reradiated back to the Earth (Mann 2022). Major GHGs include water vapor, carbon dioxide (CO2) (the most significant gas), methane (CH4), nitrous oxide (N2O), and fluorinated gases (halocarbons) (Mann 2022). Increasing human population and economic growth have impacted GHG emissions (Oreggioni et al. 2021), and global temperature anomalies have influenced human population distribution (Lehner and Stocker 2015; Jaber 2020). Similarly, gross domestic product (GDP) is an indicator of economic strength, and its effects have been broadly studied and associated with multiple sectors, such as climate change (Dobes et al. 2014), household consumption (Bekhet and Othman 2011), CO2 emissions (Caporale et al. 2021), renewable energy consumption (Bhuiyan et al. 2022), and sustainable development indicators (Adrangi and Kerr 2022). Climate change has an impact on a country’s GDP; when global surface temperature increases by 4 °C, GDP can lose up to 26.6% per year, though this varies by country (Kompas et al. 2018). Moreover, increasing the surface temperature increases the CO2 emissions (Anokye et al. 2021). Thus, data visualization and multivariate causality studies that take into account such potential factors would highlight predictive indicators and support actions toward SDG 13.

Interactive visualization techniques can help with clustering and parameter setting decisions (Chang and Ding 2004; Ray et al. 2021). Visualizing and summarizing patterns in large datasets is normally done through dimension reduction techniques, such as principal component analysis (PCA) (Abdi and Williams 2010), non-metric multidimensional scaling (NMDS) (Rabinowitz 1975), and self-organizing maps (SOM) (Kohonen 2001). In this study, we used SOM, a robust tool in environmental and atmospheric science (Bindoff et al. 2013; Sévellec and Drijfhout 2018), to visualize and pattern the possible causal features of global warming.

Most studies have concentrated on the causes of climate change at the global, regional, and local levels, with little attention paid to the effects of regional attribution at the global level (Bindoff et al. 2013; Sévellec and Drijfhout 2018). Local trends may contradict or enhance global warming signals or may not persist (Hansen et al. 2016). Sequential time-series data recorded at a certain interval help in understanding the underlying causes of systemic patterns and trends over time (Wishart and Wold 1966; Raihan and Tuspekova 2022b). The Granger causality (GC) test was developed in 1969 to determine whether one time series can predict another (Granger 1969). In recent years, GC tests have been extensively used to analyze the nexus between GHGs, climate change, energy consumption, economic growth, and global warming (Kodra et al. 2011; Zhang et al. 2011; Wang 2018; Zolghadr-Asli et al. 2021; Raihan et al. 2022a, 2022b; Raihan and Tuspekova 2022a). However, the causal variables of global warming have not yet been examined with the GC test using local-level data.

We hypothesized that GC analysis may serve as a tool for attribution to the multivariate case of global surface temperature anomalies (GTemp) with GDP, human population (Pop), and carbon dioxide (CO2) emissions of South Korea and examined the significant predictive variables using time-series annual data from 1918 to 2020. This study provides a better understanding of the causal parameters of global warming at the local level, which are essential for mitigating climate change and examining ecological processes.

Materials and methods


Time-series data on production-based CO2 emissions (in millions of tons), GDP (USD), and Pop from 1905 to 2020 were obtained from Our World in Data (Ritchie et al. 2022). GTemps from 1892 to 2021 were obtained from the National Center for Environmental Information (NCEI) (NOAA 2022). To ensure homogeneity and minimize missing values, we analyzed data from 1918 to 2020 (a total of 103 years). The missing data in the examined period were interpolated using a simple gradient rule (Eq. 1):

$$Y_{i}^{\prime} = \frac{Y_{i2} - Y_{i1}}{X_{2} - X_{1}} \times \left(X^{\prime} - X_{1}\right) + Y_{i1},$$

where \(Y_{i}^{\prime}\) is the interpolated data of variable \(i\) in year \(X^{\prime}\), and \(Y_{i1}\) and \(Y_{i2}\) are the nearest lower and upper boundaries of the available data of variable \(i\) with corresponding years \(X_{1}\) and \(X_{2}\), respectively.
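The gradient rule in Eq. 1 is ordinary linear interpolation between the two nearest known years. A minimal Python sketch follows; the function name `interpolate_gap` and the numeric values are hypothetical and are not taken from the study's data.

```python
def interpolate_gap(x1, y1, x2, y2, x_missing):
    """Linear interpolation (Eq. 1) between the nearest known points."""
    if x2 == x1:
        raise ValueError("x1 and x2 must be different years")
    slope = (y2 - y1) / (x2 - x1)
    return slope * (x_missing - x1) + y1


# Hypothetical values (not the study's data): a variable known in 1943 and 1946.
known_1943, known_1946 = 100.0, 130.0
filled = {
    year: interpolate_gap(1943, known_1943, 1946, known_1946, year)
    for year in (1944, 1945)
}
print(filled)
```

Each missing year is filled from the same pair of boundary years, so consecutive gaps lie on one straight line.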

Data visualization

We visualized the time-series growth trends of each observed variable (103 years \(\times\) 4 features) and presented a correlation plot including Pearson correlation coefficients between variables. Furthermore, we patterned the time-series multivariate data using an unsupervised machine learning technique, a self-organizing map (SOM), that is widely used for data visualization and pattern identification (Cottrell et al. 2018; Zhang et al. 2011). An SOM is a type of artificial neural network that was first introduced by Professor Teuvo Kohonen in the 1980s and is sometimes called a Kohonen map (Kohonen 1982; Kohonen and Honkela 2007). The SOM consists of input data (\({x}_{1}, {x}_{2}, {x}_{3}, \cdots , {x}_{n}\)) and output data (\({O}_{1}, {O}_{2}, {O}_{3}, \cdots , {O}_{M}\)), which is composed of \(M\) neurons in the map grids, and a weight vector \({w}_{i}(t)\) describing each output neuron at iteration \(t\). An SOM is based on the “winner-takes-all” (WTA) principle (Kohonen 2001). The winner neuron, whose weight vector has the shortest distance to a given input, is called the best matching unit (BMU), and the SOM fits the map by iteratively updating the weight vectors of the BMU and its neighbors, as shown in Eq. 2:

$$w_{i}\left(t+1\right) = w_{i}\left(t\right) + \alpha\left(t\right) h_{iv}\left(t\right) \left[x\left(t\right) - w_{i}\left(t\right)\right],$$

where \(\alpha \left(t\right)\) is the learning rate at time t, \({h}_{iv}\left(t\right)\) is the neighborhood smoothing function of neuron i centered on the winner neuron v at time t, and x(t) is the input vector drawn from the input dataset at time t. For dimensional reduction and data visualization, an SOM with a very small Kohonen network of 4 \(\times\) 4 nodes connected to the input layer was investigated using the ‘kohonen’ package in RStudio (R Core Team 2022; Wehrens and Kruisselbrink 2022).
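To make the BMU search and the weight update concrete, the following is a minimal Python sketch of a single training step, written in the standard Kohonen form in which each weight vector moves toward the input x(t). It is not the ‘kohonen’ package implementation: the Gaussian neighborhood, the 2 × 2 toy map, and the names `best_matching_unit` and `som_update` are assumptions made for illustration.

```python
import math


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def som_update(weights, grid, x, alpha, radius):
    """One update step: w_i <- w_i + alpha * h_iv * (x - w_i)."""
    v = best_matching_unit(weights, x)
    new_weights = []
    for w, pos in zip(weights, grid):
        d = math.dist(pos, grid[v])  # distance on the map, not in data space
        # Gaussian neighborhood centered on the BMU (an assumption of this sketch)
        h = math.exp(-(d * d) / (2.0 * radius * radius))
        new_weights.append([wi + alpha * h * (xi - wi) for wi, xi in zip(w, x)])
    return v, new_weights


# Toy 2 x 2 map with two features per unit (hypothetical values).
grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [0.9, 0.8]
v, updated = som_update(weights, grid, x, alpha=0.5, radius=1.0)
print(v, updated[v])
```

The BMU receives the full step because its neighborhood value is 1, while units farther away on the map move by a smaller fraction of the same difference.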

Granger causality (GC) test

GC analysis is a robust method for studying the links between time-series data (Granger 1969; Raihan and Tuspekova 2022c). Causality is defined for two scalar-valued, stationary, and ergodic time series {\({X}_{t}\)} and {\({Y}_{t}\)}. Let \(F\left({X}_{t}|{I}_{t-1}\right)\) be the conditional probability distribution of \({X}_{t}\) given the bivariate information set \({I}_{t-1}\), consisting of an \(Lx\)-length lagged vector of \({X}_{t}\), \({X}_{t-Lx}^{Lx}\equiv \left({X}_{t-Lx},{X}_{t-Lx+1},\ldots,{X}_{t-1}\right)\), and an \(Ly\)-length lagged vector of \({Y}_{t}\), \({Y}_{t-Ly}^{Ly}\equiv \left({Y}_{t-Ly},{Y}_{t-Ly+1},\ldots,{Y}_{t-1}\right)\). The test is mathematically expressed using a linear autoregressive model (Raihan et al. 2022c) (Eq. 3):

$$X\left(t\right)=\sum_{j=1}^{m}{A}_{11j}X\left(t-j\right)+\sum_{j=1}^{m}{A}_{12j}Y\left(t-j\right)+\varepsilon \left(t\right), \quad \forall t=1, 2, \ldots,$$
$$Y\left(t\right)=\sum_{j=1}^{m}{A}_{21j}X\left(t-j\right)+\sum_{j=1}^{m}{A}_{22j}Y\left(t-j\right)+n\left(t\right), \quad \forall t=1, 2, \ldots,$$

where \(m\) is the number of lags, matrix \(A\) contains the coefficients of the model, and \(\varepsilon\) and \(n\) are the uncorrelated white noise series for each time series; that is, \(E\left[{\varepsilon }_{t}{\varepsilon }_{s}\right]=0=E\left[{n}_{t}{n}_{s}\right]\) for \(s\ne t\), and \(E\left[{\varepsilon }_{t}{n}_{s}\right]=0\) for all \(t,s\). If the variance of \(\varepsilon\) (or \(n\)) is reduced by the inclusion of the \(Y\) (or \(X\)) terms in Eq. 3, then it is said that \(Y\) (or \(X\)) Granger (G) causes \(X\) (or \(Y\)). We performed the causality test using the ‘bruceR’ package in RStudio (Bao 2021).
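The variance-reduction idea can be illustrated with a small bivariate, one-lag example. The Python sketch below compares the residual sum of squares of a model that uses only the lagged X with one that also includes the lagged Y, and forms an F statistic from the difference. This is an illustrative simplification, not the VAR-based Wald test computed by ‘bruceR’; the series are synthetic and zero-mean, and the function names are hypothetical.

```python
import random


def rss_restricted(x):
    """RSS of X(t) = a*X(t-1) + e(t), fitted by least squares."""
    pairs = list(zip(x[1:], x[:-1]))
    a = sum(xt * xl for xt, xl in pairs) / sum(xl * xl for _, xl in pairs)
    return sum((xt - a * xl) ** 2 for xt, xl in pairs)


def rss_unrestricted(x, y):
    """RSS of X(t) = a*X(t-1) + b*Y(t-1) + e(t), via the 2 x 2 normal equations."""
    rows = list(zip(x[1:], x[:-1], y[:-1]))
    sxx = sum(xl * xl for _, xl, _ in rows)
    syy = sum(yl * yl for _, _, yl in rows)
    sxy = sum(xl * yl for _, xl, yl in rows)
    bx = sum(xt * xl for xt, xl, _ in rows)
    by = sum(xt * yl for xt, _, yl in rows)
    det = sxx * syy - sxy * sxy
    a = (bx * syy - by * sxy) / det
    b = (by * sxx - bx * sxy) / det
    return sum((xt - a * xl - b * yl) ** 2 for xt, xl, yl in rows)


def granger_f(x, y):
    """F statistic for 'Y Granger-causes X' with one lag (one restriction)."""
    n_obs = len(x) - 1
    restricted, unrestricted = rss_restricted(x), rss_unrestricted(x, y)
    return (restricted - unrestricted) / (unrestricted / (n_obs - 2))


# Synthetic toy series (not the study's data): y drives x with a one-step delay.
rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(200)]
x = [0.0] + [0.8 * y[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]

f_y_to_x = granger_f(x, y)  # large: lagged y explains x
f_x_to_y = granger_f(y, x)  # small: lagged x does not explain y
print(f_y_to_x, f_x_to_y)
```

Because the restricted model is nested in the unrestricted one, adding the lagged Y term can never increase the residual sum of squares; the test asks whether the reduction is larger than chance would produce.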

The Augmented Dickey–Fuller test (Dickey and Fuller 1979) was used to check the stationarity of the data, and nonstationary data were differenced to make them stationary. The rule of thumb for the lags for annual data is one (Ivanov and Kilian 2005), and we further confirmed the lag with a partial autocorrelation function (Geurts et al. 1977) using the highest significant spike value. The causal network diagram was plotted using directed graph theory (Newman 2010). The detailed study flow and steps are illustrated in Fig. 1.
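The differencing and lag-confirmation steps can be sketched in Python as follows. The sketch takes first differences and computes partial autocorrelations with the Durbin–Levinson recursion, comparing them with the usual approximate 95% bound of 1.96/√n. The Augmented Dickey–Fuller test itself is omitted, and the input series is a hypothetical trend with an alternating component, not the study's data.

```python
import math


def difference(series):
    """First differences, used to remove a trend from a nonstationary series."""
    return [b - a for a, b in zip(series, series[1:])]


def autocorr(series, max_lag):
    """Sample autocorrelations r[0..max_lag]."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson recursion)."""
    r = autocorr(series, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk, phi = r[1], [r[1]]
        else:
            num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi
    return out


# Hypothetical series: linear trend plus an alternating component.
raw = [t + (-1) ** t for t in range(20)]
stationary = difference(raw)
partials = pacf(stationary, max_lag=3)
bound = 1.96 / math.sqrt(len(stationary))  # approximate 95% significance bound
significant_lags = [k + 1 for k, p in enumerate(partials) if abs(p) > bound]
print(partials, bound, significant_lags)
```

Lags whose partial autocorrelation falls outside the bound correspond to the significant spikes used for lag selection.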

Fig. 1 Study flow

Data analysis

We checked for missing covariate data within the study interval; the GDP data for 1944 and 1945 were missing and were interpolated using Eq. 1. The annual continuous time-series data on the CO2 emissions (178.02 \(\pm 22.44\) million tons), Pop (3.09E7 \(\pm\) 1.42E6 people), and GDP (4.65E11 \(\pm\) 6.06E10 US$) of South Korea, together with GTemp (− 0.5 \(\pm\) 0.04 \(^\circ{\rm C}\)), from 1918 to 2020 are shown in Fig. 2. The trends show that the CO2 emissions, Pop, and GDP of South Korea increased continuously, whereas GTemp fluctuated over time but increased overall. The associations between the examined variables were statistically significant and positive, as illustrated in Fig. 3 (correlation plot). CO2 emissions and GDP were highly correlated (0.994), followed by CO2 emissions and Pop (0.881), Pop and GDP (0.841), and Pop and GTemp (0.776).
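The coefficients in the correlation plot are Pearson correlation coefficients. A minimal Python sketch of the calculation is given below; the function name `pearson` and the input values are hypothetical and serve only to show the formula.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical values (not the study's data).
rising = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]
falling = [4.0, 3.0, 2.0, 1.0]
print(pearson(rising, doubled), pearson(rising, falling))
```

Two series that both trend upward over time, as the four variables here do, will tend to show high positive coefficients regardless of any causal link, which is why the causality test is applied to stationary data.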

Fig. 2 Time-series data on (a) CO2 emissions (in millions of tons), (b) population (Pop), and (c) GDP (USD) of South Korea, and (d) global surface temperature anomalies (GTemp) from 1918 to 2020

Fig. 3 Correlation plot with coefficients between CO2 emissions, population (Pop), and GDP of South Korea and global surface temperature anomalies (GTemp)


Patterning using self-organizing map (SOM)

The input data were scaled with the mean values and trained using the SOM algorithm. The SOM was trained with the default learning rate (starting from 0.05 and stopping at 0.01) and a neighbor radius of two-thirds of all distances of the map units, with a small map size (4 \(\times\) 4). The code plot (Fig. 4a), hierarchical cluster dendrogram of SOM outputs (Fig. 4b), and correlation component planes (heatmaps; Fig. 5) are visualized. The code plot shows that each grid of the SOM had at least one observation. Observations with similar features were mapped onto the same SOM grid (best matching unit). The feature vectors of Grid V13 have equally weighted CO2, GDP, Pop, and GTemp data, but GTemp is dominant in the V3, V4, V5, V7, V8, V12, V14, V15, and V16 grids. Hierarchical clustering of the SOM output shows that the grids can be further grouped into two clusters (V1, V11, and V13 form one cluster, and the rest form the other). The height axis in the cluster dendrogram displays the distance between clusters, and the horizontal bars indicate the points at which two clusters are merged.
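The preprocessing and the learning-rate schedule described above can be sketched in a few lines of Python. The sketch assumes that scaling means centering each variable on its mean and dividing by its sample standard deviation, and that the learning rate declines linearly from 0.05 to 0.01 over the training steps; both are assumptions of this illustration, and the function names are hypothetical.

```python
import statistics


def standardize(column):
    """Center on the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in column]


def learning_rate(step, total_steps, start=0.05, end=0.01):
    """Linear decline from `start` at step 0 to `end` at the final step."""
    if total_steps < 2:
        return start
    return start + (end - start) * step / (total_steps - 1)


# Hypothetical column of values (not the study's data).
scaled = standardize([1.0, 2.0, 3.0])
schedule = [learning_rate(s, 100) for s in (0, 50, 99)]
print(scaled, schedule)
```

Scaling matters because the four variables differ by many orders of magnitude, and an unscaled Euclidean distance would be dominated by GDP.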

Fig. 4 Patterning and visualization of input data in (a) the self-organizing map (SOM) code plot in 4 \(\times\) 4 grids and (b) the cluster dendrogram of SOM output (\({V}_{i}\) is the SOM output cluster of the ith grid)

Fig. 5 Self-organizing component maps

The heatmap of each component (component plane, Fig. 5, corresponding to the grid labels in Fig. 4a) shows the information in each grid for an individual variable. The color scale illustrates the contribution of each variable to each grid; for example, the 14 years of data (Table 1) in grid V13 have the highest scaled values of all variables, whereas GTemp has low scaled values in V3 (6), V5 (8), and V9 (4) (the number of observation years is in parentheses).

Table 1 Clustered data frequency (in parentheses) and years in SOM grids

Causality test

We used all 103 years of data (from 1918 to 2020) for lag selection using the partial autocorrelation function (PAF) and conducted a vector autoregression (VAR)-based GC test. Significant PAF values were identified for CO2 emissions at lags 1 (0.264), 12 (− 0.253), and 19 (− 0.225); for Pop at lags 1 (− 0.829), 2 (− 0.699), 3 (− 0.511), and 6 (0.285); for GDP at lags 1 (− 0.858), 2 (− 0.734), 3 (− 0.589), and 4 (− 0.360); and for GTemp at lags 1 (0.627), 2 (0.241), and 3 (0.289) (Fig. 6). The highest significant PAF value of each variable was at the first lag; therefore, we conducted the multivariate GC test using VAR(1).

Fig. 6 PAF with different lags of the observed stationary data (blue dashed line: significant interval)

From the SOM analysis, the data were scattered (see SOM Section), and the grids did not capture continuous time-series data, except for the V10, V11, and V13 grids (Table 1). In the cluster dendrogram (Fig. 4b), the V1 grid data were in the same cluster as V11 and V13. Therefore, we further tested causality using the continuous data from this cluster, covering the most recent 34 years (1987–2020). The GC network diagrams, with the individual variables and the combination of all variables except the linked variable (labeled All), are plotted in Fig. 7. Each edge value represents the Wald \({\chi }^{2}\) test result, and the corresponding \(p\)-value is in parentheses.
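The relation between a Wald statistic on an edge and the p-value in parentheses can be sketched as below. The sketch assumes that, in a four-variable VAR(1), an edge from a single variable tests one coefficient (1 degree of freedom) and an edge from the All node tests three coefficients (3 degrees of freedom); this is an assumption of the illustration, not a statement about how ‘bruceR’ reports the test. Closed-form tail probabilities of the chi-square distribution are used so that only the Python standard library is needed.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square statistic for df = 1 or df = 3."""
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    tail = math.erfc(math.sqrt(stat / 2.0))
    if df == 1:
        return tail
    if df == 3:
        return tail + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0)
    raise ValueError("only df = 1 and df = 3 are handled in this sketch")


# Textbook 5% critical values, used here only to check the formulas.
p_single = chi2_sf(3.841458820694124, df=1)
p_combined = chi2_sf(7.814727903251179, df=3)
print(round(p_single, 4), round(p_combined, 4))
```

A larger statistic on an edge gives a smaller p-value, so the strongest causal links in the network are those with the largest Wald values for their degrees of freedom.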

Fig. 7 Causality network after the Granger test for (a) 103 years and (b) 34 years (1987 to 2020) of data. Edge labels give the Wald \({\chi }^{2}\) statistic with the \(p\)-value in parentheses, and edge color corresponds to the \(p\)-value (the colorbar indicates the range of \(p\)-values of the \({\chi }^{2}\) test). All denotes the combination of all variables except the connected variable. Figures are licensed under CC BY 4.0

Based on the Granger causality test of the 103 years of data (Fig. 7a), CO2 emissions are not significantly caused by Pop, GDP, GTemp, or the combination of Pop, GDP, and GTemp. Pop is not caused by CO2 emissions, but is caused by GDP (99.9% confidence level), GTemp (95% confidence level), and the combination of CO2 emissions, GDP, and GTemp (99.9% confidence level). GDP is caused by Pop (99.9% confidence level) and the combination of CO2 emissions, Pop, and GTemp (99.9% confidence level), and GTemp is only caused by CO2 emissions (90% confidence level). Thus, GDP and Pop have a bidirectional relationship, whereas the relationships from GTemp to Pop and from CO2 emissions to GTemp are unidirectional. The test with the most recent 34 years of data (Fig. 7b) showed similar trends: GDP and Pop are bidirectional (99.9% confidence level), and CO2 emissions have a unidirectional causal relationship with GTemp (99.5% confidence level). However, GTemp is not caused by Pop, whereas the combined effect of CO2, Pop, and GDP on GTemp is significant (95% confidence level).
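The 103-year network can be represented as a small directed graph and split into bidirectional and unidirectional links. In the Python sketch below, each edge stores the significance level implied by the confidence level stated in the text (for example, 99.9% becomes 0.001); these are upper bounds, not the exact p-values shown in Fig. 7, and the function name `classify` is hypothetical.

```python
# Directed edges (cause, effect) from the 103-year analysis; values are the
# significance levels implied by the stated confidence levels, not exact p-values.
edges_103_years = {
    ("GDP", "Pop"): 0.001,
    ("GTemp", "Pop"): 0.05,
    ("Pop", "GDP"): 0.001,
    ("CO2", "GTemp"): 0.10,
}


def classify(edges):
    """Split directed edges into bidirectional pairs and unidirectional links."""
    bidirectional = sorted(
        {tuple(sorted(edge)) for edge in edges if (edge[1], edge[0]) in edges}
    )
    unidirectional = sorted(
        edge for edge in edges if (edge[1], edge[0]) not in edges
    )
    return bidirectional, unidirectional


bidirectional, unidirectional = classify(edges_103_years)
print("bidirectional:", bidirectional)
print("unidirectional:", unidirectional)
```

Storing the direction explicitly keeps the distinction between "Pop is caused by GTemp" and "GTemp is caused by Pop", which a correlation matrix cannot express.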


Reducing greenhouse gas emissions is a goal of climate change mitigation to control global warming, and the impact of climate change has been studied in multiple dimensions (Turner et al. 2009; Israel et al. 2020; Mengist et al. 2020). Econometric modeling is an approach that has been used to analyze the causality, trends, and effects of factors concerning climate change and global warming (Castle and Hendry 2022; Heckman and Pinto 2022). Machine learning techniques to pattern environmental data and their visualization also help in decision-making for sustainable ecological ecosystems (Haq et al. 2022).

In this study, we patterned over a century of data on the continuously increasing CO2 emissions, Pop, and GDP of South Korea, together with GTemp (Fig. 2), using an SOM. The SOM is a widely used neural network technique for clustering and visualizing data (Cottrell et al. 2018; Zhang et al. 2011). We applied it as a very small map, 4 \(\times\) 4 (16 grids) in size, in which each grid gathered at least 1 year of data (Fig. 4, Table 1) and GTemp data were dominant in most grids (Fig. 4a). The component maps (heatmaps) illustrate the intensity of each variable in the grids and can visualize the correlation of the scaled variable data (Fig. 5), which are essential features of SOM (Qian et al. 2019). The output of SOM was further visualized in a dendrogram using the hierarchical clustering technique, which made it easier to select recent time-series data by merging some grids (V1, V11, and V13) (Fig. 4b).

Next, we conducted a GC test and developed a causality network model with Wald \({\chi }^{2}\) and p-values. No correlation implies no causality (Cunningham 2021); correlation does not imply causation, but causation always implies correlation (Buchanan 2012). In this study, we observed highly significant correlations between the observed variables (Figs. 2, 3, and 5), but only certain variables had significant unidirectional or bidirectional relations according to the Granger causality test (Fig. 7).

Climate change is caused by economic activities (Dobes et al. 2014). CO2 causes global temperature change (Sun and Wang 1996; Mozahid and Akter 2022). From the century of data from South Korea and the Granger causality test, we found that CO2 emissions were the only unidirectional causal variable for global surface temperature anomalies (p-value < 0.1). Similarly, the causal analysis of the continuous data from 1987 to 2020 (34 years) selected using the SOM showed that CO2 and the combined effect of CO2, GDP, and population were significant factors impacting global warming (p-value < 0.01).

The current study showed similar results regarding bidirectional causality between population and economic growth, as reported by Tsen and Furuoka (2005) and Vo and Vo (2021). Many studies have reported a significant causal relationship between economic growth and CO2 emissions (Omri 2013; Radmehr et al. 2021; Raihan and Tuspekova 2022b; Raihan et al. 2022c), which we could not find in this study. The causality test results examining data from recent years showed that CO2 emissions and the integrated effect of CO2 emissions, GDP, and population significantly caused the GTemp. Although GTemp and CO2 emissions are both proxies for climate change (Demirhan 2020), unidirectional causality was identified from CO2 emissions to GTemp.

This study presents informative relations between observed variables; however, there are some limitations. Global temperature anomalies have a significant relationship with sea surface temperature, but not conversely (Elsner 2007), and anthropogenic activities cause global temperature to increase (Attanasio 2012). Global warming is caused by GHGs, including methane (CH4), nitrous oxide (N2O), and fluorinated gases (halocarbons) (Mann 2022); however, we only examined the relationship between CO2, Pop, and GDP of South Korea and GTemp, owing to data consistency and availability. A detailed study is required that considers the pre-identified and correlated global temperature variables of individual countries and groups of countries. The SOM is a stochastic process (Cottrell et al. 2016), and we used the default settings. Sensitivity analysis and clustering with optimal parameters (Ahmed et al. 2020) have not been conducted, which could be a topic for future research. Granger causality does not necessarily constitute a true causal effect (Bao 2021); comparisons with dynamic causal modeling, information theory, and regression modeling (Pereda et al. 2005; Gourévitch et al. 2006; Friston et al. 2013) for time-series analysis could be applied to analyze the causes of global warming with different possible covariates at the local and global scales.


In this study, we introduced different trends and patterns of possible causal variables of global warming by using a self-organizing map. A VAR-based GC test was conducted, and we presented causality networks based on all 103 years of data and on the recent (34 years) continuous time-series data suggested by the SOM. We identified CO2 emissions as a significant unidirectional causal variable for the global warming indicator GTemp, whereas GDP and population had a bidirectional causal relationship. The results and visualization technique using SOM and GC network plots could serve as a reference for future scholars and SDG designers.

Time-series data patterning using SOM illustrated the trends and grouped the periodic features, which made it easier to segment the data for further analysis. Ecological process modelers and data scientists can refer to the current study. Climate change poses a significant global threat. In South Korea, local carbon emissions were identified as a stronger causal factor for global temperature anomalies than population and economic growth. Carbon emission reduction is a major concern, and governments should develop strategies for the optimal use of renewable energy and the creation of a green environment. This type of research is essential for decision-making and developing strategies to mitigate global warming in local geographic regions.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.



Abbreviations

GTemp: Global surface temperature
GDP: Gross domestic product
Pop: Population growth rate
CO2: Carbon dioxide
UN: United Nations
SDG: Sustainable development goal
GHG: Greenhouse gas
PCA: Principal component analysis
GDA: Generalized discriminant analysis
LDA: Linear discriminant analysis
NMDS: Non-metric multidimensional scaling
SOM: Self-organizing map
GC: Granger causality
NCEI: National Center for Environmental Information
BMU: Best matching unit
VAR: Vector autoregression
∈: ‘Is an element of’




This research was supported by a grant from the National Institute of Ecology (NIE-C-2022-90) funded by the Ministry of Environment (MOE), and the Korea Environmental Industry & Technology Institute (2022003570001) in the Republic of Korea.

Author information



All authors contributed equally to this work. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Do-Hun Lee or Gab-Sue Jang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Dhakal, T., Kim, TS., Lee, DH. et al. Examining global warming factors using self-organizing map and Granger causality network: a case from South Korea. Ecol Process 12, 9 (2023).
