Modeling fires based on the results of correlation analysis

To monitor and study in more detail the causes and probability of the occurrence and spread of fires in the combat zone in eastern Ukraine, this study performs mathematical modeling of the factors influencing fire occurrence using linear regression. The initial assessment of a priori information presented in discrete form is a time-consuming process, and a large dataset spanning a time interval calls for ready-made methods and solutions. Statistical analysis techniques and historical analogies make it possible to evaluate the initial data visually and graphically. This evaluation serves as the foundation for classifying factors, which enables their division into samples for subsequent analysis and modeling. Correlation analysis is appropriate here because it can establish and illustrate the connections between fires and hostilities across different time intervals. To examine the connection between fires and the factors contributing to their occurrence, linear regression was applied, a method widely used in ecological monitoring of the Earth.


Introduction
Even before the start of the military conflict (before 2014), the environmental condition of Donbass, the region with the heaviest technogenic load, caused serious concern. Given the ongoing hostilities, the environmental situation in eastern Ukraine has the potential to become catastrophic. Consequently, there is a need to build an integrated monitoring system that covers the full set of factors influencing the environmental situation, assesses them, and forecasts the dynamics of their changes, and on this basis to build systems of decision-making rules.
Military maneuvers and exercises, the construction of fortifications, explosions and ignition of ammunition, and active hostilities all increase the probability of fire. Among anthropogenic and natural processes, fire is considered one of the main causes of the sharp depletion of the world's forest ecosystems (Yongqi Pang et al., 2022; Venkatesh K. et al., 2020; Sachdeva S. et al., 2018), and forest fire prevention and the development of reliable forecasting models are key topics of many studies in ecology and forestry (Yongqi Pang et al., 2022; Venkatesh Prasad A. M. et al., 2006; Artés T. et al., 2017; Avilaflores D. Y. et al., 2010; Ko B. C. et al., 2007; Liao B. Q. et al., 2008; Bhusal S. et al., 2008). Thus, fire occurrence is an important parameter of the monitoring system that affects the ecological situation in the region.
*E-mail: a.topchiy@khai.edu
The outbreak of the war virtually paralyzed many aspects of environmental protection in eastern Ukraine. The initial destruction of the environmental protection system in the war zone had several consequences: some of the lost archival information has not been restored, environmental monitoring is not carried out on part of the territory, and there are problems with logistical support and a shortage of specialists. Only recently has attention been directed towards the environmental consequences of hostilities. Given the lack of a priori information for solving monitoring problems, it is relevant to use remote sensing of the Earth (RSE) data together with methods of mathematical statistics and analysis to build decision-making rules and models based on them. In this study, mathematical analysis relies on a combination of freely available data from open sources, media sources, daily reports from the combat zone, and archived records of battles in eastern Ukraine. Compiling a comprehensive set of parameters from these sources is a meticulous and time-consuming process. At present, part of the territory lacks environmental monitoring, and there is no reliable information regarding the extent of damage. A secrecy regime further aggravates the situation, impeding or even prohibiting the work of state environmental inspections in the Donetsk and Lugansk regions.

Review of existing methods for analyzing the occurrence of a fire
Forest fire forecasting models are commonly divided into empirical models based on statistical methods and models based on machine learning. Statistically based methods include statistical analysis and correlation methods. Statistical analysis consists of collecting meteorological data related to historical fires and analyzing these data (weather conditions, landscape features of the study area) together with the frequency of historical fires. At the next stage, the connection between fires and various meteorological factors is determined, which gives a quantitative assessment of the factors with the maximum and minimum influence on the occurrence of fires in the study area (Xufeng Lin et al., 2023).
Barmpoutis et al. (Xufeng Lin et al., 2023; Barmpoutis P. et al., 2020) created a fire early warning model using optical remote sensing technologies. Sakr et al. (Xufeng Lin et al., 2023; Sakr G.E. et al., 2011) used two weather parameters to effectively predict the occurrence of forest fires in developing countries. Pradeep et al. (Xufeng Lin et al., 2023; Pradeep G. S. et al., 2021) used GIS tools to classify forest fire hazard zones, and Gülçin et al. (Xufeng Lin et al., 2023; Gülçin D. et al., 2020) built a forest fire risk map using GIS. Maffei et al. (Xufeng Lin et al., 2023; Maffei, C. et al., 2021) used a combination of multispectral and thermal remote sensing data to predict the characteristics of forest fires. Balbi et al. (Balbi et al., 2014) detail an unsteady physical model of fire spread that describes the initiation and development of eruptive fires using an induced wind sub-model.
Linear models have been developed as the simplest mathematical approach to machine learning. Cunningham et al. (Xufeng Lin et al., 2023; Cunningham A. A. et al., 1973) applied Poisson regression models to forest fires and used them to predict future fire hazards. Shi et al. (Xufeng Lin et al., 2023; Shi S. et al., 2018) developed a forest fire risk assessment model using logistic regression. The results showed that the established logistic regression model can predict the occurrence of forest fires well. Kalantar et al. (Xufeng Lin et al., 2023; Kalantar B. et al., 2020) predicted wildfire susceptibility based on machine learning and a resampling algorithm. Qiu et al. (Xufeng Lin et al., 2023; Qiu J. et al., 2021) used Landsat time series data as the basis for machine learning in the quantitative analysis of forest fires.
Based on the study of previously applied methods and algorithms, it can be concluded that, because the data are classified at the initial stage for further recognition and prediction by regression methods, deep learning works better than traditional learning methods on large initial datasets. Thus, forecasting and building regression models is a relevant method for solving environmental monitoring problems. The purpose of this study is to obtain, from experimental data, mathematical models that describe the behavior of fires depending on the hostilities.

Analysis of the object of research and initial data
The object of the study is the territory of the Donetsk and Lugansk regions, located in the east of Ukraine. The Donetsk region lies in the steppe zone of the southeast of Ukraine and borders the Sea of Azov. The terrain is predominantly plains indented with gullies and ravines. The highest place in the region is an unnamed height of 336 m, located near railway stops. The climate is temperate continental, with little snow in winter and hot summers. Average temperatures range from -5 to -8°C in January and from 21 to 23°C in July. Precipitation is about 500 mm per year. In spring there are dry winds (more often in May), and dust storms and hail may occur.
The Lugansk region is located in the extreme east of the country, mainly in the basin of the middle reaches of the Seversky Donets. The surface is an undulating plain, which rises from the valley of the Seversky Donets to the north and to the south, where the Donetsk Ridge is located. The climate is temperate continental. The average temperature of the warmest month (July) is +21°C, and that of the coldest month (January) is -7°C. Winter is relatively cold, with sharp east and southeast winds and frosts. Summer is sultry, and its second half is noticeably dry. Autumn is sunny, warm, and dry. Annual precipitation is 400-500 mm.
The survey data consist of three parts: fire data, meteorological data, and terrain data. Fire point data were obtained from the NASA Global Fire Atlas, which includes the characteristics of individual fires (https://firms.modaps.eosdis.nasa.gov/). The Global Fire Atlas is a global dataset that tracks the daily dynamics of individual fires. Information about fires was added to the Global Fire Atlas based on the Moderate Resolution Imaging Spectroradiometer (MODIS) collection (Yongqi Pang et al., 2022).
In this study, data on fire points in the Donetsk and Lugansk regions from 2012 to 2018 were used, organized into 2-year increments. Meteorological data, comprising temperature, atmospheric pressure, wind speed, and precipitation, were obtained from open weather archives and subsequently sorted, systematized, and classified.
Data on the presence of hostilities were obtained from open sources. Like the other statistical data of the study, they were sorted, systematized, classified, and tabulated so that methods of mathematical analysis could be applied and mathematical models built on them.
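The tabulation step described above can be illustrated with a minimal sketch. It assumes that fire and hostilities records are keyed by calendar date; the sample records, the function name `tabulate_by_date`, and the choice to sum values per day and to fill missing days with zero are all hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records: (date, fire intensity) and (date, number of hostilities).
fire_records = [
    (date(2014, 7, 1), 3.0),
    (date(2014, 7, 1), 2.0),
    (date(2014, 7, 2), 5.0),
]
hostility_records = [
    (date(2014, 7, 1), 4),
    (date(2014, 7, 3), 1),
]


def tabulate_by_date(fires, hostilities):
    """Aggregate both sources per day and return rows sorted by date."""
    fire_total = defaultdict(float)
    fight_total = defaultdict(int)
    for day, intensity in fires:
        fire_total[day] += intensity
    for day, count in hostilities:
        fight_total[day] += count
    days = sorted(set(fire_total) | set(fight_total))
    # Assumption: a day missing from one source gets a zero in that column.
    return [(day, fire_total.get(day, 0.0), fight_total.get(day, 0)) for day in days]


table = tabulate_by_date(fire_records, hostility_records)
for row in table:
    print(row)
```

Each row of `table` then holds one date with its total fire intensity and number of hostilities, which is the shape needed for correlation and regression analysis.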
For the primary analysis of the relief of the Donetsk and Lugansk regions, SRTM data (https://www.usgs.gov/) were obtained to build a digital elevation model (DEM) of the study area (Fig. 1).

Statistical analysis of fires
Statistical graphs are currently among the most widely used ways of presenting data analysis. Displaying the data visually conveys more information in a clear form and supports an initial assessment and, on its basis, classification of the dataset. Many programs and tools are used for statistical analysis; one of them is the Python language, which is free, allows advanced data processing, and is compatible with complex models. Python is one of the most widely used programming languages and has broad functionality for statistical data analysis and visualization (Logroño Naranjo et al., 2022), which is why the Python development environment was chosen for fire analysis. For data visualization, the matplotlib library is used, which supports a wide range of chart types.
The initial set of fire data for the regions is visualized in Fig. 2, 3. Visual assessment of the historical fire data (Fig. 2, 3) shows that the largest number of fires in the Donetsk and Lugansk regions occurred in 2014, which is consistent with the escalation of the conflict in the east of the country. The graphical presentation of the data (Fig. 4, 5) confirms this primary analysis of the a priori information.
To determine the day and year with the largest number of fires in the regions, it is necessary to find the value of the feature that has the highest frequency in the statistical distribution series, that is, the mode. The mode is determined differently depending on whether the variable feature is presented as an interval series or a discrete series. For a discrete series, it is the value with the highest frequency (Table 1). This approach makes it possible to form samples for further mathematical modeling based on the linear regression method.
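Finding the most frequent value of a discrete series can be sketched in a few lines of Python. The list `fire_years` and the function name `mode_of_discrete_series` are hypothetical and serve only to illustrate the calculation; the study's actual values are those reported in Table 1.

```python
from collections import Counter

# Hypothetical discrete series: the year of each recorded fire point.
fire_years = [2012, 2014, 2014, 2016, 2014, 2018, 2016, 2014]


def mode_of_discrete_series(values):
    """Return (value, frequency) for the most frequent value in the series."""
    counts = Counter(values)
    value, frequency = counts.most_common(1)[0]
    return value, frequency


year, freq = mode_of_discrete_series(fire_years)
print(year, freq)
```

The same function applies unchanged to a series of dates when the day with the most fires is needed.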

Mathematical modeling of factors influencing the occurrence of fires based on linear regression
A central goal of mathematical statistics is to study the connection between variables. Linear regression is widely used in environmental monitoring tasks, so it was chosen as the method for mathematical modeling of fires as a function of active hostilities in the east of Ukraine; the resulting models were built in the Python development environment. The regression equation has the form

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n \qquad (1)$$

where Y is the dependent variable; $\beta_0, \beta_1, \dots, \beta_n$ are the regression coefficients; and $X_1, \dots, X_n$ are the independent variables.
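For the special case of a single independent variable, the least-squares estimates of the regression coefficients have a closed form. The derivation below is a standard textbook result rather than something stated in the study; the sample notation $(x_i, y_i)$, the sample means $\bar{x}$ and $\bar{y}$, and the number of observations $N$ are introduced here for illustration only.

```latex
% Sum of squared residuals for the model Y = \beta_0 + \beta_1 X
S(\beta_0, \beta_1) = \sum_{i=1}^{N} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% Setting \partial S / \partial \beta_0 = 0 and \partial S / \partial \beta_1 = 0 gives
\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
                     {\sum_{i=1}^{N} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

The slope estimate is the sample covariance of the two variables divided by the sample variance of the independent variable, which is why a correlation analysis is a natural first step before fitting the line.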
For the analysis of fires, data for the Donetsk and Lugansk regions for 2012, 2014, 2016, and 2018 were taken. By historical analogy, it is known with certainty that no fighting took place in these territories in 2012, so it makes no sense to analyze fires caused by hostilities for that year; the 2012 data were therefore excluded from the analysis. Before the linear regression algorithm is applied, a correlation analysis between the fire and hostilities parameters is required (Fig. 6, 7). The samples were divided into two sets of random variables (RV): training (train) and test (test). The test sample comprises 40% of the data and the training sample 60% (Fig. 8, 9). With a single independent variable, the regression equation takes the form

$$Y = mX + c \qquad (2)$$

where m is the slope and c is the intercept. The coefficients m and c were obtained (Table 2, 3), and on their basis the array Y was calculated for the test sample (Y_test) and the training sample (Y_train).
The values on the fires and fights axes are not integers because the built-in software packages converted them into data arrays. In the initial statistical data, however, the fire values denote the intensity of the fire, and the fights values contain the number of hostilities in the territory on a given date. The intensity of fires should be classified further, since it also depends on a number of other factors: the presence of an industrial zone within a given radius, the type of object or territory, the type of infrastructure, etc. Such a classification would allow only the fire intensities that correspond to combat operations to be included in the datasets. The inaccuracy of the initial data is confirmed by the simulation results: the maximum number of shots does not always correspond to the maximum intensity of fires.
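The workflow described above (correlation analysis, a 60/40 split, estimation of m and c, and calculation of Y for both samples) can be sketched as follows. This is an illustration in plain Python, not the study's code: the study does not name the packages it used, and the `fights` and `fires` values, the function names, and the fixed random seed are all assumptions.

```python
import random

# Hypothetical daily observations: number of hostilities and fire intensity.
fights = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
fires = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 18.8]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def train_test_split(pairs, test_fraction=0.4, seed=0):
    """Shuffle (x, y) pairs and split them into training and test samples."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(xs, ys):
    """Least-squares slope m and intercept c of the line y = m * x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


r = pearson_r(fights, fires)
train, test = train_test_split(list(zip(fights, fires)))
m, c = fit_line([x for x, _ in train], [y for _, y in train])

# Predicted fire intensity for the training and test samples.
y_train = [m * x + c for x, _ in train]
y_test = [m * x + c for x, _ in test]
print(f"r = {r:.3f}, m = {m:.3f}, c = {c:.3f}")
```

The coefficients are estimated on the training sample only, so the predictions in `y_test` show how the line behaves on data it has not seen.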
To assess how well the obtained predictions meet the objectives of the study (0 is "good", 1 is "bad"), quantitative estimates of the models on the test and training samples were obtained (Table 4, 5). The values are given to three digits because the differences between the simulation results are so small that, with coarser rounding, they become insignificant or equal to 0. This is because the dataset available for the study is incomplete. Comparing the results for the Donetsk region shows that the fire modeling performed better on the test dataset. For 2014 and 2018 the difference is slight, whereas for 2016 the quantitative analysis shows that the model performed about 2 times better on the test dataset.
For the Lugansk region, there is only a slight difference between the two samples. This is because the initial dataset for the Lugansk region is smaller than that for the Donetsk region.
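A quantitative estimate of this kind, where values closer to 0 indicate a better fit, can be computed as in the sketch below. The study does not name the metric it used, so mean absolute error and mean squared error are shown purely as assumed examples, and the `y_true` and `y_pred` values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted fire intensities for one sample.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.5, 2.5, 4.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")
```

Computing the same metric separately on the training and test samples gives a pair of numbers that can be compared in the way Tables 4 and 5 are discussed above.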

Conclusions
In this study, on the basis of experimental data, mathematical models of fires were obtained and a correlation analysis was conducted to investigate the connection between fires and hostilities in the eastern region of Ukraine.
The obtained mathematical models confirm the need for a complete set of all influencing factors in order to form more accurate samples. The results of the regression analysis confirm that modeling data as a function of the intensity or severity of influencing factors is one of the most difficult tasks of statistical analysis in environmental monitoring problems. Forming a set of primary data is a laborious process.
The regression analysis has revealed the necessity for employing supplementary analytical methods and exploring alternative solution approaches to obtain more effective mathematical models.
For problems with a large amount of data, regression analysis cannot be performed manually and requires statistical packages. The advantage of linear methods is their simplicity, which explains how frequently they are applied to environmental problems.

Fig. 2. Fires in the Donetsk region from 2012 to 2018, in two-year increments

Fig. 4. Analysis of the number of fires from 2012 to 2018 in the territory of the Donetsk region

Fig. 8. Dividing the sample into training and test sets for the Donetsk region

Table 2. Coefficients m and c for the Donetsk region

Table 3. Coefficients m and c for the Lugansk region

Table 4. Evaluation of the mathematical model for the Donetsk region

Table 5. Evaluation of the mathematical model for the Lugansk region