Correlation of Satellite-based LAI and actual crop yield



INTRODUCTION
Crop yields have a high impact on the economic sustainability of any country. Accurate prediction of crop yield at the field scale is important for precision agriculture, because it helps to understand how crop production responds to agronomic management practices and environmental stress. Many studies around the world have used satellite-retrieved leaf area index (LAI) in prediction models to estimate yield (Aboelghar et al., 2011; Aboelghar et al., 2010; Liu et al., 2020), where the methodology is based on regressing measured yield on satellite-derived spectral information or on the leaf area index. However, to our knowledge, no work has used satellite-based LAI in prediction models to estimate yield in Ukraine. Prior to building any prediction model, it is necessary to examine the relationship between the explanatory and target variables. This is why the correlation between the actual yield of fields in Ukraine and the satellite-based LAI has to be quantified.
The retrieval of crop biophysical variables from remote sensing falls into two categories: empirical and physical modeling approaches (Mourad et al., 2020). The simplest method of estimating LAI from remote sensing is to establish an empirical relationship between remotely sensed vegetation indices (VIs) and measured LAI, referred to as the LAI-VI approach (Baret et al., 1991; Broge et al., 2001). Vegetation indices are computed from the reflectance in two or more spectral bands and reflect biophysical characteristics of the plant canopy such as greenness, biomass, and LAI (Baghzouz et al., 2010; Huete et al., 1996). VIs that have shown a good correlation with LAI are the normalized difference vegetation index (NDVI) (Deering, 1978), the soil adjusted vegetation index (SAVI) (Huete, 1988) and the enhanced vegetation index (EVI) (Huete, 1997).
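As an illustration of how such indices are built from band reflectances, the following is a minimal Python sketch of the three VIs named above, using their commonly published formulas. The function names, the default coefficients (soil factor 0.5 for SAVI; gain 2.5 and coefficients 6, 7.5 and 1 for EVI) and the sample reflectances are assumptions made for this example and are not taken from the present study, which used LAI from the SNAP biophysical processor rather than a LAI-VI relationship.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil adjusted vegetation index; soil_factor is the usual L = 0.5 (assumed default)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced vegetation index with the commonly used coefficients (assumed defaults)."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy_l)


# Hypothetical surface reflectances (0-1 scale) for a dense green canopy.
nir, red, blue = 0.45, 0.05, 0.03
print(round(ndvi(nir, red), 3))       # 0.8
print(round(savi(nir, red), 3))       # 0.6
print(round(evi(nir, red, blue), 3))  # 0.656
```

In the LAI-VI approach, values like these would then be regressed against field-measured LAI to obtain an empirical transfer function.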
The physical modeling approach involves the use of radiative transfer models (RTMs) to simulate the canopy spectral reflectance and the inversion of RTMs to obtain the required parameters (Campos-Taberner et al., 2016; Féret et al., 2017). Because the inversion of an RTM can be computationally intensive, precomputed look-up tables (LUTs) are often employed for operational use, as in the main algorithm of the MODIS LAI product (Myneni et al., 2002). Another modeling technique is the retrieval of LAI and other biophysical parameters with neural networks, such as the algorithm implemented in the biophysical processor tool of the Sentinel Application Platform (SNAP) (Weiss et al., 2016) developed by the European Space Agency (ESA). The latter was used as the source of LAI data in this research.
The present work aims to perform a correlation analysis between Sentinel-2 LAI and actual crop yield data at the field level in the regions of Ukraine.

DATA AND METHODS
The actual yield data for this study were available for 2364 agricultural fields located in 3 regions of Ukraine (Fig. 1): Vinnytsia (about 80% of fields), Khmelnytskyi and Cherkasy. For each field in every case, the spatial average LAI was calculated for all available cloudless satellite images within the following periods:
Spring crops 2016: 20.06.2016-18.10.2016 (21 images with a mean interval of 6 days);
Winter crops 2016-2017: 06.05.2017-25.07.2017 (20 images with a mean interval of 4 days);
Spring crops 2017: 25.06.2017-19.08.2017 (18 images with a mean interval of 3 days);
Winter crops 2017-2018: 29.04.2018-12.07.2018 (30 images with a mean interval of 3 days);
Spring crops 2018: 10.06.2018-21.09.2018 (44 images with a mean interval of 2 days).
A key question is which LAI value should be taken as the basis for comparison with actual yield: the mean, the maximum, or the value at a certain time step of the vegetation period. This issue has been widely discussed (Kayad et al., 2022; He et al., 2021). In (Kayad et al., 2022), the LAI value was selected according to the development stage that is most strongly correlated with the final yield. Since development stage data were unavailable in our research, we used the approach proposed in (He et al., 2021): the average LAI values of the above-mentioned time series were selected for the correlation analysis.
The first preprocessing step for the LAI dataset was temporal interpolation/extrapolation for each period to create time series with a 1-day interval. To implement this step, 4 different approaches were used to fill the time gaps: 1) simple (linear) interpolation; 2) polynomial (spline) interpolation; 3) ARIMA (AutoRegressive Integrated Moving Average); and 4) LOCF (Last Observation Carried Forward). Fig. 2 shows an example of LAI calculated with all of the above-mentioned techniques alongside the raw satellite LAI for one field. The second preprocessing step, prior to the correlation analysis, was to apply local polynomial regression fitting (the loess function) (Cleveland et al., 2017) to the resulting 1-day interval LAI time series. The loess function was applied with different degrees of smoothing: 0.1, 0.3, 0.5 and 0.75, shown in Figs. 3, 4, 5 and 6, respectively. The figures show that increasing the degree of smoothing decreases the amplitude of the LAI dynamics.
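The two preprocessing steps can be sketched as follows. This is a minimal pure-Python illustration, not the authors' R code: it covers only the linear and LOCF gap-filling variants (spline and ARIMA are omitted), and `loess_smooth` is a simplified local linear smoother with tricube weights that stands in for R's loess function, whose defaults differ. All function names and the sample observations are hypothetical.

```python
import math
from datetime import date, timedelta


def daily_dates(start, end):
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def fill_linear(obs, days):
    """Linear interpolation between observations; values outside the range are held constant."""
    known = sorted(obs.items())
    filled = []
    for day in days:
        if day <= known[0][0]:
            filled.append(known[0][1])
        elif day >= known[-1][0]:
            filled.append(known[-1][1])
        else:
            # Find the pair of observations that brackets this day.
            for (d0, v0), (d1, v1) in zip(known, known[1:]):
                if d0 <= day <= d1:
                    w = (day - d0).days / (d1 - d0).days
                    filled.append(v0 + w * (v1 - v0))
                    break
    return filled


def fill_locf(obs, days):
    """Last observation carried forward; days before the first observation reuse it."""
    last = obs[min(obs)]
    filled = []
    for day in days:
        if day in obs:
            last = obs[day]
        filled.append(last)
    return filled


def loess_smooth(values, span):
    """Simplified local linear regression over the nearest span * n points (tricube weights)."""
    n = len(values)
    if span <= 0:
        return list(values)  # a smoothing degree of 0 means no smoothing
    k = min(n, max(3, math.ceil(span * n)))
    smoothed = []
    for i in range(n):
        lo = max(0, min(i - k // 2, n - k))
        window = range(lo, lo + k)
        h = max(abs(j - i) for j in window) + 1  # bandwidth keeps every weight positive
        w = [(1 - (abs(j - i) / h) ** 3) ** 3 for j in window]
        sw = sum(w)
        mx = sum(wj * j for wj, j in zip(w, window)) / sw
        my = sum(wj * values[j] for wj, j in zip(w, window)) / sw
        sxx = sum(wj * (j - mx) ** 2 for wj, j in zip(w, window))
        sxy = sum(wj * (j - mx) * (values[j] - my) for wj, j in zip(w, window))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (i - mx))
    return smoothed


# Hypothetical cloud-free LAI observations for one field (illustrative values only).
obs = {date(2018, 6, 10): 1.0, date(2018, 6, 14): 3.0, date(2018, 6, 16): 2.0}
days = daily_dates(date(2018, 6, 10), date(2018, 6, 16))
linear = fill_linear(obs, days)
locf = fill_locf(obs, days)
smoothed = loess_smooth(linear, span=0.75)
print(linear)  # [1.0, 1.5, 2.0, 2.5, 3.0, 2.5, 2.0]
print(locf)    # [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 2.0]
```

A larger `span` widens the local window, which flattens peaks and troughs, consistent with the amplitude decrease described above.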
For the correlation analysis, the average of the LAI time series was computed for each field.
The "LAI vs. YIELD" correlation analysis was conducted for each of the 5 datasets, for all combinations of the 4 LAI time-series gap-filling methods and 5 degrees of smoothing (including a degree of 0, i.e. no smoothing).
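The correlation step can be sketched as below: a Pearson coefficient between per-field mean LAI and yield, with a two-sided p-value from the t statistic with n - 2 degrees of freedom. This is an illustrative pure-Python version (the study itself used R); the function names, the numerical integration used for the t distribution, and the sample numbers are assumptions made for this example and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution (log-gamma avoids overflow for large df)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + t * t / df))


def two_sided_p(r, n, steps=20000):
    """p-value for H0: no correlation, using t = r * sqrt((n - 2) / (1 - r^2))."""
    df = n - 2
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the integral of the density over [0, t]; steps must be even.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-t < T < t)
    return max(0.0, 1.0 - central)


# Hypothetical per-field values: yield and mean LAI for two preprocessing variants,
# keyed by (gap-filling method, degree of smoothing).
yields = [4.1, 5.0, 6.2, 5.5, 7.1, 6.4]
mean_lai = {
    ("linear", 0.0): [1.8, 2.4, 2.9, 2.1, 3.3, 3.0],
    ("locf", 0.1): [1.7, 2.5, 2.8, 2.2, 3.2, 3.1],
}
for (method, span), lai in mean_lai.items():
    r = pearson_r(lai, yields)
    print(method, span, round(r, 2), two_sided_p(r, len(yields)) < 0.05)
```

Looping over every (method, degree) pair in this way yields one coefficient per table cell, which can then be averaged over vegetation periods.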
All technical work was performed using the R programming language.

RESULTS
The tables below present, for each crop, the average correlation coefficients between LAI and the actual yield grouped by gap-filling method and degree of smoothing, along with the significance level of the correlation coefficients (p-value).
Table 1 shows the correlation coefficients for maize, calculated and averaged for 3 vegetation periods (2016-2018). The highest correlation coefficient is 0.4 (p < 0.05), which is considered low according to Table 2. Smoothing the LAI time series of maize fields increases the correlation coefficients, but only marginally. The best gap-filling method on average is ARIMA.

Table 2 (Hinkle et al., 2003):
Size of Correlation: Interpretation
0.9 to 1 (-0.9 to -1): Very high positive (negative) correlation
0.7 to 0.9 (-0.7 to -0.9): High positive (negative) correlation
0.5 to 0.7 (-0.5 to -0.

In Table 3, the correlation coefficients for soy are calculated and averaged for 3 vegetation periods (2016-2018). The highest correlation coefficient is 0.52 (p < 0.05), which is considered moderate according to Table 2. In this case, smoothing the LAI time series does not improve the correlation coefficients. The best gap-filling methods on average are LOCF and ARIMA.

The average correlation coefficients for sunflower, calculated for 3 vegetation periods (2016-2018), are presented in Table 4. The highest correlation coefficient is 0.39 (p < 0.05), which is considered low (see Table 2). Smoothing the LAI time series does not improve the correlation coefficients. The best gap-filling methods on average are linear and spline interpolation.

The average correlation coefficients for winter barley, calculated for 2 vegetation periods (2016-2017 and 2017-2018), are presented in Table 5. For this crop, the correlation coefficient is the highest among all crops (0.86), but the p-value is greater than 0.05. Thus, there is insufficient evidence to conclude that the high correlation coefficient of 0.86 is significant; one of the reasons is the small number of fields. Smoothing the LAI time series slightly improves the correlation coefficients. The best gap-filling method on average is LOCF.

In Table 6, the correlation coefficients for winter rapeseed are calculated and averaged for 2 vegetation periods (2016-2017 and 2017-2018). The highest correlation coefficient is 0.54 (p < 0.05), which is considered moderate according to Table 2. In this case, smoothing the LAI time series does not improve the correlation coefficients. The best gap-filling methods on average are LOCF and ARIMA.

The average correlation coefficients for winter wheat for 2 vegetation periods (2016-2017 and 2017-2018) are presented in Table 7. The highest correlation coefficient is 0.5 (p < 0.05), which is considered low/moderate (see Table 2). Smoothing the LAI time series slightly increases the correlation coefficients. The best gap-filling methods on average are linear interpolation and ARIMA.

The average correlation coefficients over all crops are similar and close to 0.5 (p < 0.05) regardless of the gap-filling technique and degree of smoothing (Table 8). Although the correlation coefficient varies little among the gap-filling techniques and smoothing degrees, LOCF and ARIMA give the highest values at a smoothing degree of 0.1.
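The winter barley result illustrates how the sample size controls significance. As a worked example, the standard t-test for a Pearson coefficient is evaluated below for r = 0.86 with two hypothetical field counts; the text does not state the number of winter barley fields, so n = 5 and n = 6 are assumptions chosen only for illustration.

```latex
\begin{align*}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \mathrm{df} = n-2 \\
n = 5:\quad t &= \frac{0.86\sqrt{3}}{\sqrt{1-0.86^{2}}} \approx 2.92 \;<\; t_{0.975,\,3} \approx 3.18
  \;\Rightarrow\; p > 0.05 \\
n = 6:\quad t &= \frac{0.86\sqrt{4}}{\sqrt{1-0.86^{2}}} \approx 3.37 \;>\; t_{0.975,\,4} \approx 2.78
  \;\Rightarrow\; p < 0.05
\end{align*}
```

With only a handful of fields, even a coefficient as high as 0.86 can fail the 0.05 threshold, whereas much lower coefficients are significant when hundreds of fields are available.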
To visualize the relationships between the mean of the LAI time series and the actual crop yields for all cases (Figs. 8-12), the LAI time series preprocessed with ARIMA gap-filling and a smoothing degree of 0.1 were chosen.

CONCLUSION
Based on the obtained results, the average correlation coefficient between Sentinel-2 LAI and the actual crop yield is about 0.5 (p < 0.05), which is considered low/moderate (with a correlation coefficient of 0.4 for maize, 0.52 for soy, 0.39 for sunflower, 0.86 for winter barley, 0.54 for winter rapeseed and 0.5 for winter wheat). The hypothesis test of the significance of the correlation coefficient shows a significance level of p < 0.05 for all crops except winter barley (there is insufficient evidence to conclude that the high correlation coefficient of 0.86 for this crop is significant). A dependence of the correlation coefficients on the number of fields was observed.
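To put these coefficients in terms of a linear model: for a simple linear regression with one predictor, the coefficient of determination equals the squared Pearson coefficient. The short calculation below applies this standard identity to the significant coefficients reported above; it is an illustration, not an additional result of the study.

```latex
\begin{align*}
R^{2} &= r^{2} \\
r = 0.39 \;(\text{sunflower}):&\quad R^{2} \approx 0.15 \\
r = 0.40 \;(\text{maize}):&\quad R^{2} = 0.16 \\
r = 0.50 \;(\text{winter wheat, all crops}):&\quad R^{2} = 0.25 \\
r = 0.52 \;(\text{soy}):&\quad R^{2} \approx 0.27 \\
r = 0.54 \;(\text{winter rapeseed}):&\quad R^{2} \approx 0.29
\end{align*}
```

Under this identity, mean LAI alone would account for roughly 15-30% of the field-to-field variance in yield.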
The two steps of LAI preprocessing lead to slightly higher correlation coefficients. In particular, smoothing the LAI time series with a smoothing degree of 0.1 slightly improves the correlation coefficients on average. The best gap-filling methods on average are LOCF and ARIMA.
We also attempted to use the maximum LAI value (instead of the mean), as well as values at certain intervals of the growing season. However, these alternatives did not improve the correlation coefficients, so the results of these attempts were omitted.
In our view, the low correlation between satellite-derived LAI and actual yield is due to the uncertainties of retrieving LAI from spectral measurements in the visible and infrared channels of low-orbit satellites (Yan et al., 2019). In our opinion, it also depends on the specific vegetation conditions of different crop types, their biophysical development and their production of green biomass.
Thus, our results show that the relationship between LAI and final actual yield for different crop types is not strong; quantifying this relationship was the main goal of this article. A linear crop yield prediction model based only on LAI derived from Sentinel-2 will therefore not be efficient. Based on this conclusion, more sophisticated methods (machine learning, deep learning, etc.) are needed to predict crop yield. Today there are many low-orbit satellites (Sentinel-2, Landsat, Planet, SPOT) that can provide LAI on a regular basis, which opens another way of using LAI for crop yield prediction: assimilation into biophysical models, whose outputs can be used directly or as predictors (Dente et al., 2008; Fang et al., 2011; Ma et al., 2013; Curnel et al., 2011; Tewes et al., 2020; Peng et al., 2021). Satellite-based LAI assimilation into biophysical models is a subject of further research and can be implemented as an additional module in the crop growth monitoring system in Ukraine (CGMS-Ukraine) (Kryvobok et al., 2018; Kryvoshein et al., 2020). We hope that this improvement will increase the accuracy of crop yield prediction.

Fig. 1. Location of the agricultural fields for which actual yield data were available

Table 1. Average correlation coefficients for maize grouped by gap-filling methods and smoothing degrees

Table 2. Correlation coefficient interpretation

Table 3. Average correlation coefficients for soy grouped by gap-filling methods and smoothing degrees

Table 4. Average correlation coefficients for sunflower grouped by gap-filling methods and smoothing degrees

Table 5. Average correlation coefficients for winter barley grouped by gap-filling methods and smoothing degrees

Table 6. Average correlation coefficients for winter rapeseed grouped by gap-filling methods and smoothing degrees

Table 7. Average correlation coefficients for winter wheat grouped by gap-filling methods and smoothing degrees

Relationship between the obtained correlation coefficients and the number of crop fields

Table 8. Average correlation coefficients for all crops grouped by gap-filling methods and smoothing degrees