
Detection storage time of mild bruise’s yellow peaches using the combined hyperspectral imaging and machine learning method

Abstract

To trace when bruising occurs and reduce the number of bruised fruits at the source, the storage time of yellow peaches after bruising needs to be identified. To distinguish the storage times of mildly bruised yellow peaches more effectively than current detection methods, a method combining hyperspectral imaging and machine learning was proposed. First, the spectra of the bruise region of each sample were extracted as spectral features. The hyperspectral images were then processed by principal component analysis (PCA), eight single-wavelength images were selected according to the weight coefficient curve of the PC1 images, and the gray values of the selected images were calculated as image features. Finally, to find the optimal discriminative model, random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost) models were built on spectral features, image features, and spectral features combined with image features, respectively. The results show that XGBoost was the best model for each of the three feature sets, with overall accuracies of 77.50%, 87.50% and 90.00%, respectively. To simplify the model, the competitive adaptive reweighted sampling (CARS) algorithm was used to screen the wavelengths of the normalized spectral data; the screened wavelengths were fused with the image features again, and the resulting XGBoost model reached an overall accuracy of 95.00%. In summary, the combined hyperspectral imaging and machine learning method can effectively distinguish the storage times (2 h, 8 h, 24 h and 48 h) of mildly bruised yellow peaches, and it provides a theoretical basis for the use of hyperspectral imaging in fruit bruise detection.

Introduction

With rising living standards and demand, China's fruit industry has become a trillion-dollar industry, and customers pay more and more attention to the quality and safety of fruits. Yellow peaches contain a large amount of vitamin C and many trace elements required by the human body, and they are also reported to help prevent diseases such as anemia and to support beauty care (Huang et al. 2021; Liu et al. 2020). In recent years, yellow peaches have become increasingly popular in the market. However, because their texture is soft, they do not store well (Yu et al. 2020). To prepare yellow peaches for sale, a series of treatments such as selection, grading, and packaging are carried out, and bruising, one of the main types of physical damage causing degradation and post-harvest losses of horticultural products, inevitably occurs during these processes (Opara and Pathare 2014). Bruising accelerates ripening and spoilage, and the bruised area is more likely to harbor molds and bacteria (Luo et al. 2019). Yellow peaches are often processed into canned goods. If bruised yellow peaches have been stored only briefly, the quality of the canned goods may not be affected; if they have been stored too long, the quality will be affected. A method that can distinguish the storage times of bruised yellow peaches is therefore vital to the commercial value of yellow peaches and similar fruits. From a practical point of view, knowing the storage time makes it possible to trace back to the point in the process where bruising occurred and to make targeted improvements there, reducing bruising at the source. It is therefore important for businesses to distinguish the storage time of bruised yellow peaches.

In the past 20 years, machine vision (Fan et al. 2020; Bennedsen et al. 2005), near-infrared spectroscopy (Guo et al. 2021; Zhang et al. 2013), thermal imaging (Kim et al. 2014; Baranowski et al. 2012), structured light (Li et al. 2018; Lu and Lu 2017), and hyperspectral imaging (Munera et al. 2021; Tan et al. 2018) have been widely used to detect fruit bruises. Among these techniques, hyperspectral imaging combines imaging and spectroscopy: it captures the two-dimensional spatial information and the one-dimensional spectral information of the target as continuous, narrow-band image data with high spectral resolution, which has made it a popular technology for fruit bruise detection. For example, Luo et al. (2019) used hyperspectral imaging to obtain images of sound samples and bruised samples of green-skinned, intermediate-colored-skinned (green–red), and red-skinned (dark-red) apples. The images in the visible near infrared (Vis–NIR), visible (Vis) and near infrared (NIR) spectral regions were subjected to principal component analysis (PCA), and three key wavelength images were then obtained for subsequent analysis based on the weighting coefficients of the PC images. Finally, an improved watershed algorithm was used for bruise detection, and the overall accuracy was 99.5%. Zhang et al. (2022) proposed a method for predicting mechanical parameters after impact damage to apples based on 900–1700 nm hyperspectral imaging. First, the spectra were extracted from the selected regions. Then, principal component analysis (PCA) and the successive projections algorithm (SPA) were used to select the feature wavelengths. Finally, support vector machine (SVM) models based on the full waveband and on the selected wavelengths were used for nondestructive detection of the mechanical parameters. Yuan et al. (2022) obtained the reflection (R), absorption (A) and Kubelka–Munk (K-M) spectra of Lingwu jujube after bruising by hyperspectral imaging, and demonstrated the feasibility of detecting the early damage degree of Lingwu jujube from absorption spectra using different preprocessing methods and modeling approaches.

Some previous studies have been done to detect the storage time of fruits after bruising by hyperspectral imaging. For example, Chen et al. (2017) collected 480 hyperspectral images of intact samples and samples from 1 to 7 days after damage, and 60 kulle pears were used as the study object. The spectral data were processed by wavelet transform (WT), and 19 feature wavelengths were extracted from the full-spectrum information using second-order derivatives, and a support vector machine (SVM) model was established based on the full spectrum and the extracted feature wavelengths, respectively. And the recognition rate of both prediction sets reached 93.75% in the discriminant analysis models based on the full spectrum and the feature wavelengths. Yuan et al. (2021) used the hyperspectral technique to collect five time-point average spectra of Lingwu long jujube. Based on raw data and pre-processed spectra, a partial least squares discriminant analysis (PLS-DA) classification model was established; then, various variable selection methods were used to select the feature variables; and finally, a PLS-DA model based on the feature variables was developed; the results showed that the hyperspectral imaging technology combining with PLS-DA could distinguish bruise’s Lingwu long jujube at different storage times. Tang et al. (2020) exposed the bruised area of the apple at room temperature for different periods, and spectral data of these samples were collected based on hyperspectral technology; then piecewise nonlinear curve fitting (PWCF) was used to extract the feature descriptors; finally, the feature descriptors were fed into an error correction output coding-based support vector machine (ECOC-SVM). The experimental results showed that the method effectively classified the storage time of bruised apples. 
These studies detected fruit defects and the storage time of bruised fruit using hyperspectral techniques and achieved satisfactory results. However, modeling spectral data alone cannot be guaranteed to distinguish the storage time after bruising for all fruits. The “union of imagery and spectrum” offered by hyperspectral imaging should therefore be fully exploited by also using image features to classify the storage time of fruits after bruising.

Therefore, to distinguish the storage times of mildly bruised yellow peaches more effectively, this study combines the spectral and image features extracted from hyperspectral images of mildly bruised yellow peaches at 2 h, 8 h, 24 h and 48 h with machine learning methods for modeling. It provides a theoretical basis for hyperspectral imaging technology in fruit bruise detection.

Materials and methods

Sample preparation and processing

The yellow peaches used in the experiment were bought in Linyi, Shandong Province. Before the experiment, the samples were screened to remove defective ones. To prevent sample size and temperature from interfering with the experimental results, 80 yellow peaches with a diameter of about 80 mm were used, and the selected peaches were kept in a dark environment at 24 °C for 12 h. For the bruise treatment, an iron ball with a diameter of 30 mm and a weight of 100 g was fixed on a falling ball impact test machine, 0.4 m above the surface of the yellow peach. At this height, the gravitational potential energy of the iron ball was about 0.4 J. It was assumed that all the kinetic energy converted from the gravitational potential energy of the iron ball was absorbed by the sample. The free-falling iron ball striking the surface of the yellow peach thus caused mild damage to it (Li et al. 2021a). The bruised yellow peaches were stored in a room at 24 °C and 65% relative humidity. The hyperspectral images of the samples were then collected at 2 h, 8 h, 24 h and 48 h.
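As a quick check of the stated impact energy, and assuming a gravitational acceleration of g ≈ 9.8 m/s² (a value not given in the text), the potential energy of the 100 g ball at a height of 0.4 m works out as:

$$E_{p} = mgh = 0.1\,\mathrm{kg} \times 9.8\,\mathrm{m/s^{2}} \times 0.4\,\mathrm{m} \approx 0.39\,\mathrm{J}$$

which rounds to the value of about 0.4 J quoted for the bruise treatment.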

Acquisition and correction of hyperspectral images

The experimental data were collected with the Gaia hyperspectral sorter; a schematic of its structure and of the collected 3D data cubes is shown in Fig. 1. The system hardware consists of an imaging spectrometer, four halogen lamps (20 W), displacement stages, and stepper motors. Spectral View software was used to acquire the data.

Fig. 1 Schematic diagram of the hyperspectral imaging system

Before image acquisition, the hyperspectral system is turned on and preheated for 0.5 h to prevent extraneous factors from interfering with the experimental results. The acquisition parameters are: camera exposure time, 6 ms; spectral resolution, 3.5 nm; spectral range, 397.5–1014 nm; displacement platform speed, 2.5 m/s. The operation steps are: 1. The sample is placed on the moving platform; 2. The moving platform is started, and once its motion has finished, the save button is clicked; 3. A 3D data cube containing image and spectral information is recorded by the computer software.

The acquired data are affected by the dark current of the camera. To reduce this influence, the original images need to be corrected with black and white reference images (Shao et al. 2019). The calculation is shown in formula (1):

$$R = \frac{{R_{0} - B}}{W - B}$$
(1)

where R is the corrected image, R0 is the original image, B is the blackboard calibration image, and W is the whiteboard calibration image.
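A minimal sketch of formula (1) applied to a data cube, assuming the raw image and the two calibration images are available as NumPy arrays of the same shape. The function name, the toy values and the zero-division guard are illustrative assumptions and are not taken from the acquisition software.

```python
import numpy as np

def calibrate_reflectance(raw, dark, white, eps=1e-12):
    """Apply R = (R0 - B) / (W - B) element-wise to a hyperspectral cube."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = white - dark
    # Assumption: mark pixels where white and dark references coincide as NaN
    # instead of dividing by zero.
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom

# Toy 2 x 2 x 3 cube (rows, columns, bands); the numbers are illustrative only.
raw = np.full((2, 2, 3), 600.0)
dark = np.full((2, 2, 3), 100.0)
white = np.full((2, 2, 3), 1100.0)
reflectance = calibrate_reflectance(raw, dark, white)
```

With these toy values every element of `reflectance` is (600 − 100) / (1100 − 100) = 0.5.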

Data processing

Principal component analysis (PCA)

Principal component analysis (PCA) is an orthogonal linear transformation technique. It transforms the original data into a set of linearly independent components, one per dimension, to extract the main linear components of the data. PCA has been widely used for defect detection of food and agricultural products based on hyperspectral image data, with satisfactory results (Cheng et al. 2004; Qin et al. 2008; Huang et al. 2015). PCA transforms hyperspectral images into principal component (PC) image sequences by computing linear projections of the spectral data (Kara and Dirgenali 2007). The weight coefficients are the eigenvectors of the covariance matrix of the spectral data, and each PC image is formed as the linear sum of the original single-wavelength images, each multiplied by its corresponding weight coefficient. In this study, the acquired hyperspectral images are analyzed by PCA to highlight the bruise region of the yellow peach.
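The projection described above can be sketched as follows, assuming the calibrated image is a NumPy array of shape (rows, columns, bands). The function name and the random toy cube are hypothetical; this is an illustration of the general technique, not the processing chain used in the study.

```python
import numpy as np

def pca_images(cube, n_components=5):
    """Return PC images, per-band weight coefficients and contribution rates."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)   # one spectrum per pixel
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)             # (bands, bands) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    weights = eigvecs[:, order]                      # weight coefficient per band
    explained = eigvals[order] / eigvals.sum()       # contribution rate of each PC
    pc = centered @ weights                          # weighted sum of band images
    return pc.reshape(rows, cols, -1), weights, explained

# Toy cube with random values, for shape checking only.
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 6))
pc_images, weights, explained = pca_images(cube, n_components=3)
```

Plotting one column of `weights` against wavelength gives a weight coefficient curve of the kind used later to pick single-wavelength images.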

Selection of the characteristic wavelength

Fusing spectral features with image features increases the amount of data and adds redundant information. Competitive adaptive reweighted sampling (CARS) is therefore used to screen wavelengths and reduce the dimensionality of the data.

CARS is a simple and effective wavelength selection method based on the survival of the fittest and on regression coefficients. The absolute value of the regression coefficient is used as an index of the importance of each wavelength. An exponential decay function is introduced to control the retention rate of wavelengths. The subset with the minimum root mean square error of cross-validation is then selected, and the variables it contains are taken as the best wavelength combination (Li et al. 2009).
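The retention schedule and the coefficient-based ranking can be sketched as below. This is a partial illustration only: it assumes the commonly used form of the decay function, r_i = a·exp(−k·i), with a and k chosen so that all wavelengths are kept in the first run and two in the last, and it leaves out the regression model, the adaptive reweighted sampling and the cross-validation steps of the full algorithm. All function names are hypothetical.

```python
import math

def cars_retention_ratios(n_wavelengths, n_runs):
    """Ratio of wavelengths retained at sampling runs 1..n_runs (assumed form)."""
    a = (n_wavelengths / 2) ** (1 / (n_runs - 1))
    k = math.log(n_wavelengths / 2) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]

def select_by_coefficient(coefficients, ratio):
    """Keep the given ratio of wavelengths with the largest |coefficient|."""
    keep = max(1, round(ratio * len(coefficients)))
    ranked = sorted(range(len(coefficients)),
                    key=lambda j: abs(coefficients[j]), reverse=True)
    return sorted(ranked[:keep])

# 176 bands and 100 sampling runs, matching the settings reported in this study.
ratios = cars_retention_ratios(n_wavelengths=176, n_runs=100)
# Toy regression coefficients, illustrative only.
kept = select_by_coefficient([0.1, -0.9, 0.05, 0.4], ratio=0.5)
```

Here `kept` holds the indices of the two toy wavelengths with the largest absolute coefficients.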

Model building and evaluation

Random forest (RF) is a classifier containing multiple decision trees, and its output category is the mode of the categories output by the individual trees. A particular advantage of this model is that the set of candidate variables is randomized at each split, which makes the method particularly suitable for multi-feature prediction (Breiman 2001; Harel et al. 2020).
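The two ideas in this description, random candidate variables at each split and a majority vote over trees, can be illustrated with the standard library alone. The square-root rule for the subset size is a common default and an assumption here, as are the function names and the toy votes.

```python
import math
import random
from collections import Counter

def random_feature_subset(n_features, rng):
    """Pick the candidate features examined at one split (sqrt rule, an assumption)."""
    size = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), size))

def majority_vote(tree_predictions):
    """Return the mode of the class labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidates = random_feature_subset(176, rng)  # 176 bands, as in this study
# Toy votes from five hypothetical trees.
forest_label = majority_vote(["24 h", "48 h", "24 h", "8 h", "24 h"])
```

Three of the five toy trees vote for "24 h", so that label becomes the forest output.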

Support vector machine (SVM) is a method based on statistical learning theory. SVM can effectively overcome the drawbacks of neural network methods, namely difficult convergence, unstable solutions, and poor generalization (i.e., prediction ability). SVM follows the principle of structural risk minimization, balancing the training error against model complexity to obtain better classification accuracy (Vapnik 1999; Tian et al. 2020).

Extreme gradient boosting (XGBoost) can be seen as an improved version of the gradient boosting decision tree (GBDT) algorithm. XGBoost restricts the base learner to a CART regression tree that outputs a score instead of a category, which makes it easy to integrate the outputs of all base CART regression trees by simple summation. XGBoost introduces parallelization, so it is faster, and it also uses the second-order derivative of the loss function, which generally gives better results. The principle of operation of XGBoost can be divided into the following parts:

1) The objective function of XGBoost: To prevent overfitting, a regularization term is added to the objective function of XGBoost, and the loss function is chosen so that a second-order Taylor expansion can be performed (Chen and Guestrin 2016). The objective function is shown in Eq. (2).

$$\mathop {\min }\limits_{{\omega_{t} }} \sum\limits_{t = 1}^{T} {\left( {G_{t} \omega_{t} + \frac{1}{2}(H_{t} + \lambda )\omega_{t}^{2} } \right) + \gamma T}$$
(2)

2) Determine the output of the leaf nodes of the base CART tree: If the loss function is twice differentiable and Ht + λ > 0, the objective in Eq. (2) is minimized at ωt = −Gt/(Ht + λ), and the minimum value of the objective function is given by Eq. (3).

$$y = \sum\limits_{t = 1}^{T} {\left( { - \frac{{G_{t}^{2} }}{{2(H_{t} + \lambda )}}} \right) + \gamma T}$$
(3)

where T is the total number of leaf nodes of the base CART regression tree; ωt (t = 1, 2, …, T) is the output value (score) of the t-th leaf node; Gt and Ht are the sums of the first- and second-order derivatives of the loss function over the samples assigned to the t-th leaf node; γ and λ are the coefficients of the regularization terms; and y is the minimum value of the objective function.

3) Determine the structure of the base CART tree: whether a leaf node should be split is determined by a recursive algorithm. For a leaf node under consideration, the objective function value is first calculated before the split. For each candidate feature and split value, the node is divided into two new leaf nodes and the objective function value after the split is calculated. The difference between the objective values before and after the split is then computed, and the feature and split value that give the largest reduction are used to split the leaf node.

4) Parallel computation of XGBoost: Although the base CART trees in XGBoost are built serially, one after another, the construction of each individual base CART tree can be computed in parallel.

5) Overfitting treatment of XGBoost: In addition to adding regularization terms to the objective function to alleviate overfitting, other methods such as limiting the maximum depth of the tree, shrinkage, and feature subsampling can also be used.
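Steps 1) to 3) can be made concrete with a small sketch that evaluates Eqs. (2) and (3) for given leaf statistics. It assumes each leaf is summarized by a pair (G, H) of summed first- and second-order derivatives; the function names and the numbers in the example are illustrative and are not the XGBoost library API.

```python
def leaf_weight(G, H, lam):
    """Leaf output that minimizes Eq. (2) for one leaf: -G / (H + lambda)."""
    return -G / (H + lam)

def structure_score(leaves, lam, gamma):
    """Minimum objective of Eq. (3) for a tree whose leaves are (G, H) pairs."""
    return sum(-g * g / (2 * (h + lam)) for g, h in leaves) + gamma * len(leaves)

def split_gain(left, right, lam, gamma):
    """Reduction of the objective when one leaf is split into left and right."""
    parent = (left[0] + right[0], left[1] + right[1])
    before = structure_score([parent], lam, gamma)
    after = structure_score([left, right], lam, gamma)
    return before - after

# Toy leaf statistics (G, H), illustrative only.
left, right = (-4.0, 3.0), (6.0, 3.0)
w_left = leaf_weight(*left, lam=1.0)
gain = split_gain(left, right, lam=1.0, gamma=0.5)
```

A candidate split is worth keeping only when `gain` is positive, which is how the γ term discourages adding leaves that barely reduce the loss.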

After the model is established, the accuracy rate of each subcategory and the overall accuracy rate of the model discrimination are used as its evaluation index. The higher the accuracy rate is, the stronger the discriminative performance of model is.
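The two evaluation indices can be computed directly from true and predicted labels, as in the sketch below. The function name and the toy labels are illustrative; they are not the study's predictions.

```python
def accuracy_report(y_true, y_pred, classes):
    """Return per-subcategory accuracy and overall accuracy."""
    per_class = {}
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        per_class[c] = correct / len(idx) if idx else float("nan")
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return per_class, overall

classes = ["2 h", "8 h", "24 h", "48 h"]
# Toy labels: two samples per storage time, with 24 h and 48 h confused once each.
y_true = ["2 h", "2 h", "8 h", "8 h", "24 h", "24 h", "48 h", "48 h"]
y_pred = ["2 h", "2 h", "8 h", "8 h", "24 h", "48 h", "48 h", "24 h"]
per_class, overall = accuracy_report(y_true, y_pred, classes)
```

For these toy labels the 2 h and 8 h subcategories score 1.0, the 24 h and 48 h subcategories score 0.5, and the overall accuracy is 0.75.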

Results and discussion

Extraction and analysis of features

Extraction and analysis of spectral features

Figure 2 shows the average spectra of healthy samples and of samples at different times after bruising, together with the corresponding images of bruised yellow peaches at the four time points. The spectra of the bruise region were extracted with ENVI 4.5 software, and the region of interest (ROI) size was 160 pixels. Figure 2 shows that the reflectance of the sound samples is the highest; the spectral reflectance of the four types of bruised samples differs, but their general trends are similar. The longer bruised yellow peaches are stored, the lower the reflectance of their average spectra. This may be due to the different degrees of browning in the bruised area, which change its light absorption. Browning of the bruised area increases with storage time: it changes most noticeably at the beginning and is close to its maximum at 24 h, so it changes little between 24 h and 48 h. Two distinct absorption valleys exist in all the spectra. The valley in the S1 region (680–720 nm) is caused by chlorophyll in the yellow peaches, and the valley in the S2 region (980–1020 nm) is caused by the O–H bond vibrations of water and sugar components (Tang et al. 2020).

Fig. 2 Average spectra of healthy samples and samples with different bruising times and images of the corresponding four different times of bruised yellow peaches

Extraction and analysis of image features

The collected spectral range is 397.5–1014 nm, and the number of bands is 176. Each band corresponds to a single-wavelength image, so extracting features from every image is impractical; the data dimension needs to be reduced and the most relevant images selected to best characterize the bruises. PCA can be used to reduce hyperspectral dimensions, enhance the target region's information, and eliminate noise (Tian et al. 2021). Therefore, in this study, PCA was used to reduce the dimensionality of the hyperspectral images of all samples. Figure 3 shows the first five principal component images of one sample after dimensionality reduction; their total contribution is 99.92%, and the individual contribution rates are 99.79%, 0.06%, 0.05%, 0.01% and 0.01%, respectively. Figure 3 shows that the bruise region is not apparent in the PC2 image. The PC3 and PC4 images show the bruise region well, but both are relatively sensitive to non-uniform illumination, and the intensity distribution over the whole fruit surface is not uniform. The PC5 image is mostly noise, and the bruised area is difficult to identify in it. Compared with these images, the PC1 image retains the real information of the yellow peaches well, and the bruised area is prominent. Therefore, the PC1 image is used to characterize the bruise of the yellow peach.

Fig. 3 First 5 principal component images

The PC1 images of yellow peach samples at different times (2 h, 8 h, 24 h and 48 h) were obtained by PCA, and their weight coefficient curves were plotted. As shown in Fig. 4, the weight curves of the four types of samples have very similar shapes, and the characteristic peaks and valleys of the curves contribute prominently to the corresponding PC1 images. Therefore, eight single-wavelength images corresponding to characteristic peaks or valleys (672.3 nm, 696.9 nm, 746.4 nm, 771.3 nm, 937.1 nm, 842.9 nm, 860.9 nm and 984.6 nm) were used to extract image features.

Fig. 4 PC1 image weight coefficient curve and corresponding single wavelength image of four types of samples

A binary mask template was made from the PC1 image of each yellow peach sample and applied to the corresponding single-wavelength images to remove the background. The number of pixels occupied by the sample was counted, the gray values of these pixels were summed, and the average gray value was used as the image feature. The extraction flowchart is shown in Fig. 5.
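A minimal sketch of this masking and averaging step, assuming the mask is obtained by thresholding the PC1 image. The text does not say how the mask is derived, so the threshold, the function name and the toy arrays are assumptions.

```python
import numpy as np

def mean_gray_value(band_image, pc1_image, threshold):
    """Average gray value of a single-wavelength image inside the fruit mask."""
    mask = pc1_image > threshold          # binary mask: True on the fruit
    n_pixels = int(mask.sum())            # number of pixels occupied by the sample
    if n_pixels == 0:
        raise ValueError("mask is empty; check the threshold")
    return float(band_image[mask].sum() / n_pixels), n_pixels

# Toy 2 x 2 images, illustrative values only.
pc1 = np.array([[0.1, 0.9], [0.8, 0.2]])
band = np.array([[10.0, 100.0], [200.0, 20.0]])
feature, n_pixels = mean_gray_value(band, pc1, threshold=0.5)
```

Repeating the call for each of the eight selected wavelengths would give one eight-element image feature vector per sample.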

Fig. 5 Flowchart of average gray value extraction

Model building and analysis

In this study, spectral features, image features, and spectral features combined with image features were each used for modeling. The Kennard–Stone (KS) algorithm was applied to divide each of the three feature sets into a modeling set and a prediction set at a ratio of 3:1, i.e., the modeling set had 240 samples and the prediction set had 80 samples. To find the best discriminative model, RF, SVM, and XGBoost models were established, and the performance of each model was evaluated by its accuracy rate and its number of false positives.
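The Kennard–Stone split can be sketched as follows, assuming Euclidean distance between feature vectors and at least two samples to select. This is a plain illustration of the general algorithm with hypothetical names and toy data, not the implementation used in the study.

```python
import math

def kennard_stone(samples, n_select):
    """Return (selected, remaining) sample indices; assumes n_select >= 2."""
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select and remaining:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining

# Four toy one-dimensional samples split 3:1 into modeling and prediction sets.
samples = [(0.0,), (1.0,), (2.0,), (10.0,)]
modeling_idx, prediction_idx = kennard_stone(samples, n_select=3)
```

For the toy data the two extreme samples are chosen first, then the sample at 2.0, leaving the sample at 1.0 for the prediction set.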

Modeling based on spectral features

After the original spectral data of the bruised area were divided by KS, three discrimination models were established, and their results are shown in Table 1.

Table 1 Modeling results based on spectral features

Table 1 shows that the overall accuracy of none of the three models based on spectral features exceeds 80.00%, and the accuracy varies widely between subcategories. The subcategory with the highest accuracy is the samples stored for 2 h after bruising, and the lowest is the samples stored for 24 h. This can be explained by the spectral reflectance in Fig. 2: the samples stored for 2 h have the highest reflectance among the bruised samples and differ greatly from the others, so their discrimination accuracy is the highest. The discrimination accuracy of the RF and XGBoost models is 100.00% on this subcategory, while the spectral reflectance of the other three subcategories is not well separated, so the three models are not satisfactory in discriminating them. Figure 6 shows that the models confuse the 8 h, 24 h and 48 h categories with one another. Table 1 also shows that XGBoost gives the best discrimination of the three models based on spectral features, with an overall accuracy of 77.50%, while the overall accuracies of the RF and SVM models are 73.75% and 66.25%, respectively. Yuan et al. (2021) established a PLS-DA model based on the spectra of Lingwu long jujube at five time points (2 h, 4 h, 8 h, 12 h, and 24 h) after bruising, and the prediction set accuracy of the model was 90.00% with the original spectra. Although that study discriminated the storage time of bruised Lingwu long jujube satisfactorily, it is not certain that the storage time of all fruits after bruising can be well distinguished from spectral data alone, so in this study the spectral features are combined with image features to achieve better discrimination.

Fig. 6 Visual confusion matrix of three different modeling methods based on spectral features, where R, S and X stand for “RF,” “SVM” and “XGBoost,” respectively

Figure 6 visualizes the results of the RF, SVM, and XGBoost models built on spectral features as confusion matrices, drawn as heat maps with values from 0 to 20. The figure shows that subcategory 48 h is indeed easily misclassified as subcategory 24 h: when identifying the 48 h samples, the RF, SVM and XGBoost models misclassify 7, 6 and 5 samples as 24 h, respectively. For the XGBoost model, which discriminates best, one 8 h sample is misclassified as 24 h, and of the 24 h samples, five are misclassified as 8 h and five as 48 h.

Modeling based on image features

The image feature data of all yellow peach samples processed by KS was used to establish three discriminative models, respectively. The discriminative results are shown in Table 2.

Table 2 Modeling results based on image features

Table 2 shows that the overall accuracies of the RF, SVM and XGBoost models based on image features are 86.25%, 81.25% and 87.50%, respectively, which are higher than those based on spectral features. This shows that it is feasible to distinguish peaches stored for different times after bruising according to the gray value of PC1 images, and satisfactory results are achieved. Among the four storage times, all three models discriminate the 8 h samples best, each reaching an accuracy of 100%.

Figure 7 visualizes the results of the RF, SVM and XGBoost models based on image features as confusion matrices, drawn as heat maps with values from 0 to 20. The figure shows that mutual misclassification is most serious between the storage times of 24 h and 48 h. When identifying the 24 h samples, the RF, SVM and XGBoost models misclassify 3, 5 and 2 samples as 48 h, respectively; when identifying the 48 h samples, they misclassify 3, 3 and 2 samples as 24 h, respectively. One possible reason is that the sample was positioned differently when the images at the two time points were acquired, so the light intensity on the sample surface differed somewhat. Another is that the gray values of the bruised area are relatively close: browning of the bruised area deepens with time after bruising and may already be close to its maximum at 24 h, so it changes little between 24 h and 48 h. Combining image features with spectral features may therefore reduce this misclassification.

Fig. 7 Visual confusion matrix of three different modeling methods based on image features, where R, S and X stand for “RF,” “SVM” and “XGBoost,” respectively

Modeling based on spectral features combining with image features

Due to the large differences between spectral features and image features, they were normalized, and then the discriminative models were established based on the processed data, and the results of the models are shown in Table 3.
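One way to carry out this normalization and fusion is sketched below. The text does not state which normalization was used, so column-wise min-max scaling is an assumption, as are the function names and the toy numbers.

```python
def min_max_scale(column):
    """Scale one feature column to [0, 1]; constant columns map to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

def fuse_features(spectral, image):
    """Concatenate per-sample spectral and image features, then scale columns."""
    fused = [list(s) + list(g) for s, g in zip(spectral, image)]
    columns = [min_max_scale(col) for col in zip(*fused)]
    return [list(row) for row in zip(*columns)]

# Two toy samples: two reflectance values and one mean gray value each.
spectral = [[0.2, 0.4], [0.6, 0.8]]
image = [[50.0], [150.0]]
fused = fuse_features(spectral, image)
```

After scaling, reflectance values and gray values share the same range, so neither feature type dominates simply because of its units.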

Table 3 Modeling results based on spectral features combined with image features

Table 3 shows that the overall accuracies of the RF, SVM and XGBoost models based on feature fusion are 88.75%, 63.75% and 90.00%, respectively. Compared with the models based on a single feature type, the discrimination of the RF and XGBoost models is improved, while that of the SVM model is worse. The reason is that feature fusion not only provides useful information for modeling but also adds some useless information, which for the SVM model does more harm than good. However, comparing Tables 1, 2 and 3 shows that the accuracy of each SVM subcategory has changed, and the SVM model based on feature fusion is better than the SVM model based on spectral features at discriminating the 24 h storage time. This indicates that feature fusion does contribute to discriminating the storage time of yellow peaches after bruising, and the overall accuracies of the RF and XGBoost models are improved. Therefore, choosing a suitable model for the fused features can still improve discrimination.

Figure 8 visualizes the results of the RF, SVM and XGBoost models based on feature fusion as confusion matrices, drawn as heat maps with values from 0 to 20. Among the three models, the XGBoost model has the greatest improvement in discrimination: its overall accuracy is 12.50 and 2.50 percentage points higher than that of the XGBoost models based on spectral features and image features, respectively. As can be seen in Fig. 8, for the XGBoost model, one 8 h sample is misclassified as 24 h, three 24 h samples are misclassified as 8 h, and four 48 h samples are misclassified as 8 h.

Fig. 8 Visual confusion matrix of three different modeling methods based on feature fusion, where R, S and X stand for “RF,” “SVM” and “XGBoost,” respectively

Screening of characteristic wavelengths

Compared with the previous models based on single features, the accuracy of the model based on feature fusion is not significantly improved, while feature fusion increases the data volume and adds redundant information. Therefore, to simplify the model and improve its accuracy, the CARS algorithm was used in MATLAB to screen the wavelengths of the normalized spectral feature data. The CARS wavelength screening process is shown in Fig. 9.

Fig. 9 The CARS wavelength screening process: (a) trend chart of the number of wavelengths; (b) trend chart of the RMSECV; (c) trend chart of the regression coefficients of the wavelength variables

The number of sampling runs was set to 100. Figure 9a shows how the number of selected variables changes with the number of sampling runs: it gradually decreases as the number of sampling runs increases. Figure 9b shows the trend of the RMSECV with the number of sampling runs: the RMSECV first decreases slowly, indicating that irrelevant variables are being removed, and then rises, indicating that some highly relevant variables have been removed (Li et al. 2021b). Figure 9c shows that the RMSECV is smallest at sampling run 29, at which point 50 characteristic wavelengths are retained, accounting for 28.41% of the full band (176 wavelengths). The wavelengths selected by CARS are listed in Table 4.

Table 4 CARS screening results

Analysis of model building at characteristic wavelengths

Among the models based on feature fusion, the XGBoost model is the best, so the screened spectral features were fused with the image features and used to establish an XGBoost model. The results are shown in Table 5.

Table 5 Modeling results after screening based on wavelength

As can be seen from Table 5, the overall accuracy of the XGBoost model based on band screening is 95.00%, which is 17.50, 7.50 and 5.00 percentage points higher than the overall accuracy of the XGBoost models based on spectral features, on image features, and on fused features before band screening, respectively. Yuan et al. (2021) established a PLS-DA model based on the spectra of Lingwu jujube at five time points after bruising, and the prediction-set accuracy of that model under the original spectra was 90.00%. In this study, the accuracy of the XGBoost model based on the screened spectral features combined with image features for classifying the storage time of yellow peaches after bruising is 5.00 percentage points higher. Therefore, the spectral features combined with the image features can distinguish the storage time of yellow peaches after bruising well. Figure 10 visualizes the results of the XGBoost model built on the screened spectral features combined with image features as a confusion matrix. The colors in the confusion matrix form a heat map with values from 0 to 20. The figure shows that the XGBoost model achieves 100.00% accuracy when identifying the 2 h and 8 h samples, that 2 samples of the 24 h subcategory are misclassified as 8 h, and that 2 samples of the 48 h subcategory are misclassified as 8 h.

Fig. 10
Visual confusion matrix of XGBoost based on screened spectral features combined with image features, where X represents “XGBoost”
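The per-class counts reported for Fig. 10 can be tied to the 95.00% overall accuracy with a short calculation. The Python sketch below rebuilds the confusion matrix from the text. The 20 samples per class are an assumption, chosen because the heat map ranges from 0 to 20 and because four misclassifications at 95.00% accuracy imply 80 prediction samples. The names `confusion`, `overall_accuracy` and `recall` are illustrative.

```python
LABELS = ["2 h", "8 h", "24 h", "48 h"]

# Rows are true classes, columns are predicted classes.
# Assumption: 20 prediction samples per class.
confusion = [
    [20, 0, 0, 0],   # 2 h: all correct
    [0, 20, 0, 0],   # 8 h: all correct
    [0, 2, 18, 0],   # 24 h: 2 samples predicted as 8 h
    [0, 2, 0, 18],   # 48 h: 2 samples predicted as 8 h
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(LABELS)))
overall_accuracy = correct / total

# Recall per class: correct predictions divided by the true class size.
recall = {
    LABELS[i]: confusion[i][i] / sum(confusion[i]) for i in range(len(LABELS))
}

print(f"overall accuracy: {overall_accuracy:.2%}")
for label, value in recall.items():
    print(f"{label}: {value:.2%}")
```

Under this assumption, 76 of 80 samples are classified correctly, and the 24 h and 48 h classes each reach 90.00% recall.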

Conclusion

In this study, a hyperspectral imaging system was used to acquire images of yellow peaches at 2 h, 8 h, 24 h and 48 h after bruising. The feasibility of distinguishing yellow peaches at different storage times after mild bruising was verified by combining the hyperspectral advantage of the "union of imagery and spectrum" with machine learning algorithms. First, the spectrum of the sample bruise region was extracted as the spectral feature. The hyperspectral image was then processed by PCA, eight single-wavelength images were selected according to the weight coefficient curve of the PC1 image, and the gray values of the selected images were calculated as the image features. Second, RF, SVM and XGBoost models were established based on spectral features, image features, and spectral features combined with image features. The XGBoost models were optimal for all three feature sets, with overall accuracies of 77.50%, 87.50% and 90.00%, respectively. Because feature fusion increases the amount of data and adds redundant information, the CARS algorithm was used to screen the wavelengths of the normalized spectral data; the screened wavelengths were then fused with the image feature data again, and an XGBoost model with an overall accuracy of 95.00% was built. In summary, this study shows that the combined hyperspectral imaging and machine learning method can be used to distinguish yellow peaches stored for different times after mild bruising. However, the sample size is insufficient, which poses risks for future external validation. Therefore, further studies should include more time gradients and yellow peaches with different degrees of bruising for modeling and analysis.

Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  • Baranowski P, Mazurek W, Wozniak J, et al. Detection of early bruises in apples using hyperspectral data and thermal imaging. J Food Eng. 2012;110(3):345–55.

  • Bennedsen BS, Peterson DL, Tabb A. Identifying defects in images of rotating apples. Comput Electron Agric. 2005;48(2):92–102.

  • Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  • Chen XX, Guo CT, Zhang C, et al. Visual detection study on early bruises of Korla pear based on hyperspectral imaging technology. Spectrosc Spectr Anal. 2017;37(1):150–5.

  • Cheng X, Chen YR, Tao Y, et al. A novel integrated PCA and FLD method on hyperspectral image feature extraction for cucumber chilling damage inspection. Trans ASAE. 2004;47(4):1313.

  • Fan S, Li J, Zhang Y, et al. On line detection of defective apples using computer vision system combined with deep learning methods. J Food Eng. 2020;286:110102.

  • Guo W, Gao M, Cheng J, et al. Effect of mechanical bruises on optical properties of mature peaches in the near-infrared wavelength range. Biosys Eng. 2021;211:114–24.

  • Harel B, Parmet Y, Edan Y. Maturity classification of sweet peppers using image datasets acquired in different times. Comput Ind. 2020;121:103274.

  • Huang W, Li J, Wang Q, et al. Development of a multispectral imaging system for online detection of bruises on apples. J Food Eng. 2015;146:62–71.

  • Huang YN, Zhang W, Zhang Q, et al. Effects of pre-harvest bagging and non-bagging treatment on postharvest storage quality of yellow-flesh peach. J Chin Inst Food Sci Technol. 2021;21(06):231–42.

  • Kara S, Dirgenali F. A system to diagnose atherosclerosis via wavelet transforms, principal component analysis and artificial neural networks. Expert Syst Appl. 2007;32(2):632–40.

  • Kim G, Kim GH, Park J, et al. Application of infrared lock-in thermography for the quantitative evaluation of bruises on pears. Infrared Phys Technol. 2014;63:133–9.

  • Li H, Liang Y, Xu Q, et al. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal Chim Acta. 2009;648(1):77–84.

  • Li R, Lu Y, Lu R. Structured illumination reflectance imaging for enhanced detection of subsurface tissue bruising in apples. Trans ASABE. 2018;61(3):809–19.

  • Li X, Jiang H, Jiang X, et al. Identification of geographical origin of Chinese chestnuts using hyperspectral imaging with 1D-CNN algorithm. Agriculture. 2021b;11(12):1274.

  • Li X, Liu Y, Jiang X, et al. Supervised classification of slightly bruised peaches with respect to the time after bruising by using hyperspectral imaging technology. Infrared Phys Technol. 2021a;113:103557.

  • Liu CJ, Wang HO, Niu LY, et al. Effect of sucrose control on microstructure and quality of explosion-puffed yellow peach chips. Food Science. 2020;41(11):113–20.

  • Lu Y, Lu R. Histogram-based automatic thresholding for bruise detection of apples by structured-illumination reflectance imaging. Biosys Eng. 2017;160:30–41.

  • Luo W, Zhang H, Liu X. Hyperspectral/multispectral reflectance imaging combining with watershed segmentation algorithm for detection of early bruises on apples with different peel colors. Food Anal Methods. 2019;12(5):1218–28.

  • Munera S, Rodríguez-Ortega A, Aleixos N, et al. Detection of invisible damages in ‘Rojo Brillante’ persimmon fruit at different stages using hyperspectral imaging and chemometrics. Foods. 2021;10(9):2170.

  • Opara UL, Pathare PB. Bruise damage measurement and analysis of fresh horticultural produce—a review. Postharvest Biol Technol. 2014;91:9–24.

  • Shao YY, Wang YX, Xuan GT, et al. Rapid detection of soluble solids content in strawberry coated with chitosan based on hyperspectral imaging. Trans Chin Soc Agric Eng. 2019;35(18):245–54.

  • Tan W, Sun L, Yang F, et al. The feasibility of early detection and grading of apple bruises using hyperspectral imaging. J Chemom. 2018;32(10):e3067.

  • Tang Y, Gao S, Zhuang J, et al. Apple bruise grading using piecewise nonlinear curve fitting for hyperspectral imaging data. IEEE Access. 2020;8:147494–506.

  • Tian X, Wang Q, Huang W, et al. Online detection of apples with moldy core using the Vis/NIR full-transmittance spectra. Postharvest Biol Technol. 2020;168:111269.

  • Tian X, Zhang C, Li J, et al. Detection of early decay on citrus using LW-NIR hyperspectral reflectance imaging coupled with two-band ratio and improved watershed segmentation algorithm. Food Chem. 2021;360:130077.

  • Vapnik V. The nature of statistical learning theory. New York: Springer Science & Business Media; 1999.

  • Yu XY, Lv J, Bi JF, et al. Mechanism for texture softening of canned yellow peaches based on modification of pectin characteristics. Food Science. 2020;41(19):45–52.

  • Yuan R, Liu G, He J, et al. Classification of Lingwu long jujube internal bruise over time based on visible near-infrared hyperspectral imaging combined with partial least squares-discriminant analysis. Comput Electron Agric. 2021;182:106043.

  • Yuan R, Guo M, Li C, et al. Detection of early bruises in jujubes based on reflectance, absorbance and Kubelka-Munk spectral data. Postharvest Biol Technol. 2022;185:111810.

  • Zhang S, Zhang H, Zhao Y, et al. A simple identification model for subtle bruises on the fresh jujube based on NIR spectroscopy. Math Comput Model. 2013;58(3–4):545–50.

  • Zhang P, Shen B, Ji H, et al. Nondestructive prediction of mechanical parameters to apple using hyperspectral imaging by support vector machine. Food Anal Methods. 2022;15:1397.

  • Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 785–794.

  • Qin J, Burks TF, Kim MS, et al. Detecting citrus canker by hyperspectral reflectance imaging and PCA-based image classification method. In: Defense and security 2008: special sessions on food safety, visual analytics, resource restricted embedded and sensor networks, and 3D imaging and display, vol. 6983. International Society for Optics and Photonics; 2008. p. 698305.

Acknowledgements

Not applicable.

Funding

The study was financially supported by the National Natural Science Foundation of China (No. 12103019) and the National Science and Technology Award Backup Project Cultivation Plan (No. 20192AEI91007).

Author information

Contributions

All authors contributed equally to this research work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ai-guo Ou-yang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, B., Yin, H., Liu, Yd. et al. Detection storage time of mild bruise’s yellow peaches using the combined hyperspectral imaging and machine learning method. J Anal Sci Technol 13, 24 (2022). https://doi.org/10.1186/s40543-022-00334-5

  • DOI: https://doi.org/10.1186/s40543-022-00334-5

Keywords