Feasibility of UV–Vis spectroscopy combined with pattern recognition techniques to authenticate the medicinal plant material from different geographical areas

The correct identification and authentication of medicinal plant material is a crucial task that ensures quality and prevents adulteration. The use of UV–Vis spectroscopy with principal component analysis (PCA) and discriminant analysis (DA) was proposed for the identification/authentication of plant material from different genera and different geographical areas of provenance. Hydroalcoholic extracts of samples from twelve genera collected from seven countries (Romania, North Macedonia, Germany, Italy, Serbia, Russia and Kazakhstan) were used. The UV–Vis spectra of the extracts were acquired in the 200–800 nm spectral range, and signal smoothing was used for pre-processing the spectral data. Hierarchical clustering analysis (HCA) with the 1-Pearson r distance measure was used to classify the samples based on the original spectra and on the different-order derivative spectra, respectively. Data from the original spectra and from the different-order derivative spectra were evaluated by the PCA method. Using PCA with the varimax rotation approach, the spectral ranges with a significant contribution to sample classification were revealed for the first time. When the PCA method coupled with DA was applied to the data obtained from the original spectra and the fourth-order derivative spectra, the samples were correctly classified into the respective groups with 98.04% accuracy. The proposed method can be a useful tool for rapid authentication of plant material derived from different countries.


Introduction
Medicinal plants are playing an increasingly crucial role in the discovery of new drugs, with their rich chemical constitution offering valuable therapeutic compounds. Due to the continuous development of the medicinal plant industry, the Committee on Herbal Medicinal Products (HMPC) was established at the European Medicines Agency (EMA). The main responsibilities of the HMPC are to provide monographs and list entries on herbal substances and preparations (Anouar et al. 2012; Guemari et al. 2022; Knöss and Chinou 2012; Mukadam et al. 2021). Given the growing popularity of herbal medicine and the ongoing need to update monographs with the latest information, it is necessary to identify and quantify the phytochemical constituents of medicinal plants. According to the World Health Organization (WHO) and the Food and Drug Administration (FDA), chromatographic methods are considered more suitable for these kinds of evaluation (Pérez-Ràfols et al. 2023). Also, most of the classification and identification/authentication studies of plant materials are carried out based on chromatographic analysis (Fibigr et al. 2018; Wang et al. 2020; Alvarez-Rivera et al. 2019; Simion et al. 2019). Despite the high level of accuracy and specificity of chromatographic methods, the analysis of medicinal plant constituents using these methods is often expensive and time consuming. So, for rapid identification/authentication of complex samples such as medicinal plant extracts, fingerprinting techniques can be used as fast qualitative analyses. Spectral methods including ultraviolet-visible (UV-Vis), infrared (IR) and Raman spectroscopy have been used for this purpose (Heredia-Guerrero et al. 2014; Dankowska and Kowalewski 2019; Fu et al. 2023). Among the fingerprinting techniques, UV-Vis spectroscopy represents a simpler and less expensive alternative for the first screening step in the identification and authentication of plant material. The UV-Vis spectral profile of hydroalcoholic extracts can provide qualitative and quantitative information on the presence of different types of polyphenol constituents (Dhivya and Kalaichelvi 2017; Kalaichelvi and Dhivya 2017), which are the most common phytochemicals present in plants. However, UV-Vis spectroscopy is primarily employed for identifying broad classes of compounds rather than pinpointing specific compounds. The spectra obtained for polyphenols are mainly attributed to electronic transitions between π-type molecular orbitals (Anouar et al. 2012). Moreover, when the whole UV-Vis spectrum is used, each analyzed sample is described by a vector of absorbance values that can be considered a fingerprint of the sample. Since fingerprinting techniques provide a non-selective signal, the use of appropriate chemometric techniques is necessary for the interpretation of the results. Multivariate analysis methods such as cluster analysis (CA), principal component analysis (PCA) or the combination of PCA with discriminant analysis (DA) have been widely used in quality assessment, classification and identification of different medicinal plants based on spectral data (Wei et al. 2015; Cobzac et al. 2019; Rafi et al. 2018). The use of chemometric methods on derivative spectral data has also been recommended as a suitable choice to resolve overlapped absorption spectra and classify complex samples (Simion et al. 2019).
The purpose of this research was to develop a rapid and inexpensive analytical method able to accurately identify/authenticate medicinal plant materials from different growing areas based on UV-Vis spectroscopy and pattern recognition methods. PCA with varimax rotation of the principal components is proposed for the first time to evaluate both the original and the derivative spectral profiles and to reveal the spectral regions that contribute significantly to sample classification/authentication.

Raw materials of samples
The first group of samples includes 23 certified plant materials (Table 1, marked samples) from twelve different medicinal plant genera, collected from Romania and the Republic of North Macedonia and used as reference samples for plant material authentication. The samples of Romanian provenance were purchased from a specialized store as materials certified by the producers (Dacia Plant, Fares and Plafar National Company) as being in accordance with the regulations of the Romanian Pharmacopoeia. These producers have a long-standing tradition in preparing natural products, as well as soils that facilitate the green cultivation of medicinal and aromatic herbs (Guideline on Declaration of Herbal Substances and Herbal Preparations in Herbal Medicinal Products/Traditional Herbal Medicinal Products 2009). The plant material originating from North Macedonia was collected from three locations in the Osogovo mountains basin, situated in the south-eastern part of the country. The plant material was identified with a determination key using the data from Matevski (Matevski 2010), and a specimen is kept in the herbarium at the Department of Plant Production, Faculty of Agriculture, Goce Delchev University in Shtip, Republic of North Macedonia. The second group, comprising 29 samples collected from specialized stores in Germany, Italy, Serbia and Russia and from markets in Kazakhstan (Almaty) (unmarked samples in Table 1), was used as test samples to enlarge the study and verify the feasibility of UV-Vis spectroscopy combined with the proposed pattern recognition techniques for authenticating plant material of different geographical provenance. A sample of each plant material used in this study is kept in the Chemistry Department at the Faculty of Chemistry and Chemical Engineering, Babes-Bolyai University, Cluj-Napoca, Romania.

Extraction procedure
To perform the experiment, the vegetal material (10 g) was crushed to powder using a Retsch MM400 ball mill (Retsch, Haan, Germany). Two grams (accurately weighed) of each sample were subjected to maceration with 20 mL of an extraction mixture consisting of ethanol-water in a ratio of 70:30 (v/v) for 10 days at room temperature. The resulting extracts were separated by decantation, and the remaining residue was washed two times with 2 mL of extraction mixture and centrifuged. After the extraction steps, the combined extracts were diluted with the extraction solvent to a final volume of 25 mL. This procedure was conducted on two parallel samples for each plant material to ensure reproducibility and minimize experimental errors. Before the analysis, the samples were centrifuged at 4000 rpm for 15 min and diluted at a ratio of 1:100 with the ethanol-water mixture (70:30 v/v).

UV-Vis spectral measurement and pre-processing
Absorption spectra were recorded in the range 200-800 nm using a Jasco V-550 double-beam spectrophotometer (Jasco Corporation, Japan). The slit was fixed at 0.5 nm. Quartz cells with a 10 mm path length were used to obtain the spectra of all the solutions. Other characteristics of the system are the scanning speed (400 nm min⁻¹), wavelength precision (± 0.3 nm), photometric accuracy (± 0.004) and wavelength reproducibility (± 0.1 nm). The same mixture of solvents (ethanol-water in a ratio of 70:30 v/v) was used as reference for spectra acquisition. For each sample, the spectra were collected at room temperature in duplicate and the results were averaged. All UV-Vis spectra were pre-processed using the smoothing procedure to remove noise with Spectra Manager software version 1.54.03 (Jasco Corporation, Japan). Absorbance data from the 200-800 nm spectral range and from the different-order derivative spectra, consisting of 601 variables in each case, were used for PCA and HCA analysis. The HCA, PCA and DA analyses were performed using the software package Statistica 12 (StatSoft Inc. 1984-2014, USA).

Multivariate analysis of spectral data
In this study, Savitzky-Golay smoothing (23-point quadratic polynomial) was applied as pretreatment to the original spectral data. The first-, second-, third- and fourth-order derivative algorithms were also applied to the pre-processed data using the Spectra Manager software. Digitized data from the whole UV-Vis spectra acquired for the plant material samples (52 samples, Table 1) were organized into a data matrix consisting of 52 rows (the number of samples) and 601 columns (recorded absorbance at different wavelengths in the range 200-800 nm).
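The pre-processing described above can be sketched as follows. This is only an illustration under stated assumptions, not the Spectra Manager implementation: the 1 nm sampling step (which yields 601 points), the polynomial degree used for the derivative filters, the function name `preprocess` and the synthetic Gaussian bands are all hypothetical. Only the 23-point quadratic smoothing comes from the text.

```python
import numpy as np
from scipy.signal import savgol_filter

# Assumption: 1 nm sampling, which gives the 601 variables mentioned in the text.
wavelengths = np.arange(200, 801)

def preprocess(spectra, max_order=4, window=23):
    """Return a dict: 0 -> smoothed spectra, k -> k-th order derivative spectra."""
    spectra = np.asarray(spectra, dtype=float)
    # 23-point quadratic Savitzky-Golay smoothing, as stated in the text.
    smoothed = savgol_filter(spectra, window_length=window, polyorder=2, axis=1)
    result = {0: smoothed}
    for order in range(1, max_order + 1):
        # Assumption: polynomial degree order + 1, because the fitted degree
        # must be at least the derivative order for a non-zero derivative.
        result[order] = savgol_filter(smoothed, window_length=window,
                                      polyorder=order + 1, deriv=order,
                                      delta=1.0, axis=1)
    return result

# Synthetic stand-in for real data: one Gaussian band per "sample".
centres = np.array([280.0, 330.0, 420.0])
toy_spectra = np.exp(-0.5 * ((wavelengths - centres[:, None]) / 20.0) ** 2)
processed = preprocess(toy_spectra)
```

With real data, `toy_spectra` would be replaced by the 52 × 601 absorbance matrix, and each entry of `processed` would be one of the five data sets (original plus four derivative orders) submitted to HCA and PCA.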
Hierarchical clustering analysis (HCA) with Ward's method as the amalgamation rule and the 1-Pearson r distance measure (the best approach selected in our previous studies for plant extract classification) (Cobzac et al. 2019) was used as an unsupervised pattern recognition technique to group the analyzed samples based on the original spectral data and on the corresponding first-, second-, third- and fourth-order derivative spectral data.
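A minimal sketch of this clustering step with SciPy is given below, assuming that SciPy's `correlation` metric (which equals 1 minus the Pearson r between two rows) corresponds to the 1-Pearson r distance in Statistica. Ward's method is formally defined for Euclidean distances; here SciPy simply applies its update rule to the distances supplied, so the dendrogram heights need not match those produced by Statistica. The function name `hca_clusters` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def hca_clusters(spectra, n_clusters):
    """Cluster rows (samples) with Ward linkage on 1 - Pearson r distances."""
    distances = pdist(spectra, metric="correlation")  # 1 - Pearson r per pair
    tree = linkage(distances, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, tree

# Synthetic stand-in: two groups of three samples with different band positions.
rng = np.random.default_rng(0)
wavelengths = np.arange(200, 801)
centres = np.array([280.0] * 3 + [350.0] * 3)
toy = np.exp(-0.5 * ((wavelengths - centres[:, None]) / 20.0) ** 2)
toy += rng.normal(scale=0.01, size=toy.shape)
labels, tree = hca_clusters(toy, n_clusters=2)
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a plot comparable to Fig. 2.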
To overcome the difficulty of interpreting the PCA results, the varimax rotation approach (Forina et al. 1998) was applied to the principal components (PCs) obtained by principal component analysis (PCA). The varimax procedure was performed on the subspace of PCs with eigenvalue > 1, that is, the PCs necessary to reconstruct the data to the determined accuracy and not just those correlated to the outcome. Formally, varimax searches for an orthogonal rotation of the original factors such that the variance of the squared loadings is maximized. Varimax rotation simplifies factor analysis by grouping variables into distinct subgroups. This facilitates the interpretation of results because each variable is associated with a limited number of factors, allowing an easier identification of the spectral regions crucial for classification and authentication.
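The rotation step can be illustrated with a short NumPy sketch. It assumes PCA on the correlation matrix (so that the eigenvalue > 1 rule applies) and uses a commonly used SVD-based varimax iteration without row normalization; the settings actually used in Statistica are not stated in the text, so the function names, the tolerance and the synthetic two-block data are all hypothetical.

```python
import numpy as np

def pca_loadings(X):
    """PCA on the correlation matrix; keep components with eigenvalue > 1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)
    keep = eigvals > 1.0
    return vt[keep].T * np.sqrt(eigvals[keep]), eigvals[keep]

def varimax(loadings, max_iter=1000, tol=1e-8):
    """Orthogonal rotation that seeks a high variance of squared loadings."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        grad = loadings.T @ (L**3 - L * (L**2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt  # nearest orthogonal matrix to the gradient
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return loadings @ R, R

# Synthetic stand-in: one latent factor in the UV block, one in the Vis block.
rng = np.random.default_rng(0)
wavelengths = np.arange(200, 801)
uv = (wavelengths < 400).astype(float)
vis = 1.0 - uv
f1, f2 = rng.normal(size=(2, 52))
X = np.outer(f1, uv) + np.outer(f2, vis) + rng.normal(scale=0.05, size=(52, 601))

loadings, eigvals = pca_loadings(X)
rotated, R = varimax(loadings)
# Percent of total variance carried by each rotated factor.
explained = 100.0 * (rotated**2).sum(axis=0) / X.shape[1]
```

Plotting each column of `rotated` against `wavelengths` gives a loadings profile of the kind discussed for Fig. 3, and the wavelengths where the absolute loading exceeds 0.70 mark the spectral region associated with that factor.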
The correct classification of the samples was evaluated by applying the combined PCA-LDA analysis with leave-one-out cross-validation. This validation scheme has the advantage of working even when the number of samples is small, and it can correct the over-optimistic results provided by other methods. Its main idea is that one sample is taken out of the data set as the validation set, while the remaining samples are used as the training set. This process is repeated until every sample has been validated once (Geroldinger et al. 2023).
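The validation loop described above can be sketched in plain NumPy as follows. This is a simplified stand-in, not the Statistica procedure: the discriminant step is written as a linear classifier with a pooled covariance matrix and equal prior probabilities, PCA is refitted inside every fold (a design choice that keeps the held-out sample out of the model), and all function names and the toy data are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column means and the first principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def lda_fit(scores, labels):
    """Class means plus inverse pooled within-class covariance."""
    classes, idx = np.unique(labels, return_inverse=True)
    means = np.array([scores[idx == c].mean(axis=0) for c in range(len(classes))])
    centered = scores - means[idx]
    pooled = centered.T @ centered / (len(scores) - len(classes))
    return classes, means, np.linalg.pinv(pooled)

def lda_predict(model, scores):
    """Assign each row to the class with the smallest Mahalanobis distance."""
    classes, means, precision = model
    diff = scores[:, None, :] - means[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, precision, diff)
    return classes[np.argmin(d2, axis=1)]

def loo_accuracy(X, labels, n_components):
    """Leave one sample out, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i
        mean, comps = pca_fit(X[train], n_components)
        model = lda_fit((X[train] - mean) @ comps, labels[train])
        pred = lda_predict(model, (X[i:i + 1] - mean) @ comps)
        hits += int(pred[0] == labels[i])
    return hits / len(X)

# Synthetic stand-in: three "genera", six samples each, one band per genus.
rng = np.random.default_rng(0)
wavelengths = np.arange(200, 801)
labels = np.repeat(np.arange(3), 6)
centres = np.array([280.0, 330.0, 420.0])[labels]
X = np.exp(-0.5 * ((wavelengths - centres[:, None]) / 20.0) ** 2)
X += rng.normal(scale=0.01, size=X.shape)
accuracy = loo_accuracy(X, labels, n_components=3)
```

Because each prediction is made for a sample that took no part in fitting the model, the resulting accuracy is less prone to the optimism of resubstitution estimates.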

UV-Vis spectra analysis
In this study, 52 samples from twelve medicinal plant genera (Table 1) were used. According to the UV-Vis spectral profiles (Fig. 1) of the extracts from the certified samples used as reference (Table 1, 23 marked samples), a variation in peak positions and intensities from 220 to 750 nm was observed. In line with general information from the literature (Mabasa et al. 2021), the UV-Vis profiles of the selected medicinal plants indicate a complex composition of the plant materials. Three absorption bands in the UV region (220-250 nm, 250-300 nm and 300-380 nm) and three absorption bands with lower absorbance values in the visible region (400-500 nm, 550-600 nm and 650-700 nm) were observed. According to the general composition of medicinal plant extracts reported in the literature, these UV regions of the spectra can be associated with the characteristic absorption regions of different phenolic compounds with aromatic conjugated systems, such as flavonoids and their derivatives (from 230 to 290 nm (band I) and from 300 to 350 nm (band II)) (Kalaichelvi and Dhivya 2017; Saxena and Saxena 2012) and flavones and flavonols (from 310 to 370 nm, generally at higher wavelengths for flavonols than for flavones) (Dankowska and Kowalewski 2019). Anthocyanins could be present in certain samples, since they also show two basic regions of absorbance, the first one in the 260-280 nm wavelength region (UV region) (Saha et al. 2020). According to the visible region of the spectral profiles (Fig. 1b), some of the analyzed extracts indicate the presence of carotenoids (peaks occurring at 400-450 nm; Patle et al. 2020), anthocyanins (the second region, including peaks at 490-550 nm; Saha et al. 2020), tannins (peaks occurring at 400-550 nm; Saxena and Saxena 2012), terpenoids (peaks occurring at 400-550 nm) and chlorophyll (peaks occurring at 600-700 nm; Saxena and Saxena 2012; Syed Ali Fathima and Johnson 2018). Additionally, in Fig. 1, it is difficult to observe whether the UV-Vis spectra of the samples differ from each other and whether samples from different plant genera exhibit distinct patterns. Also, the variation in the UV-Vis absorbance regions reflects the differential abundance of specific phytochemicals present in each plant species.
The HCA classification of the samples
When data from the original spectra were analyzed using HCA, the samples were grouped into four main clusters, as depicted in Fig. 2a. Stinging nettle (Ur), Valerian (Va), Dandelion (Ta), Horsetail (Eq), Agrimony (Ag), St. John's wort (Hy), Liquorice (Lq), Blackberry (Ru) and Elderberry (Sa) samples collected from different countries were classified in the same clusters as the corresponding certified samples collected from Romania or North Macedonia.
The Juniper (Ju) and Peppermint (Me) samples were divided into distinct subclusters depending on the country of provenance. Peppermint (Me) samples exhibited a more complex pattern, forming two subclusters with a degree of dissimilarity of 4.4%. The samples from Italy (Me7) and Serbia (Me3) were classified in the same subcluster as the one from North Macedonia (Me2).
On the other hand, the second subcluster, composed of samples from Russia (Me8) and Kazakhstan (Me9), formed a separate group with a degree of dissimilarity of 0.1% from the subcluster including the certified sample from Romania.
The individual analysis of the UV and Vis regions, respectively, revealed that within the 400-800 nm range (Fig. 2b) the clustering patterns were less distinct, leading to mixed results. Although the use of data from the 200-400 nm range considerably improved the clustering results (Fig. 2c), it can be seen that the entire UV-Vis spectrum provides the most valuable information for the classification of medicinal plant samples.
The use of information from the whole spectrum is also supported by the constituents of the plant material, since phytocompounds absorb over a large domain (e.g., 200-600 nm for the classes of polyphenols and 350-700 nm for carotenoid and chlorophyll pigments) (Saha et al. 2020; Zhang et al. 2017; Souza et al. 2021; Domenici et al. 2014). Furthermore, different parts of the plant, such as roots, leaves, grass, flowers or fruits, have different distributions of the classes of compounds (roots have no chlorophyll, some flowers and fruits have anthocyanins, etc.), with a significant contribution to the spectral profiles.
Derivative spectra analysis
Derivative spectra can be used in spectroscopy to enhance the resolution of spectral features, reduce the influence of baseline shifts and noise, and improve the performance of pattern recognition methods (Simion et al. 2019). This is because higher-order derivatives tend to amplify the curvature of the original spectra, making more subtle changes in absorption more apparent. In the context of authentication/identification of medicinal plant samples, the use of different-order derivatives can be particularly beneficial for several reasons. The most important are the increased resolution for differentiating closely spaced absorption bands, the reduction of baseline noise and interference, the suppression of broad-band absorption, the enhanced robustness to variations in sample preparation and, consequently, the more effective discrimination between genuine and non-authentic samples. So, to reveal which derivative order provides the most valuable information, PCA combined with the varimax rotation algorithm was used to evaluate data from the first-, second-, third- and fourth-order derivatives of the UV-Vis spectra. The varimax rotation algorithm aims to maximize the variance of the squared loadings in each principal component (PC), thereby enhancing the interpretability of the resulting factor loadings. It also redistributes the variance among the PCs, ensuring that the most important factors are represented by the first few components. So, each new factor has only a few variables with a significant contribution (large loadings). For the PCA-Varimax rotation analysis, the information contained in each spectrum (absorbance values from 200-800 nm, a data matrix of 52 objects × 601 variables) was considered.

Fig. 1 Representative UV-Vis spectra obtained for the extracts of certified medicinal plant material selected in this study (marked samples in Table 1): a 200-800 nm spectral range; b 380-800 nm spectral range

Based on the PCA-Varimax results for the original spectra, the 400-730 nm spectral region associated with the first factor (Factor 1, significant loadings > 0.70) accounts for 55.11% of the data variability, while the 200-400 nm spectral region associated with the second factor (Factor 2, significant loadings > 0.70) accounts for 26.74% of the spectral data variation. Together with the 8.17% of variance accounted for by the third factor, associated with the 740-800 nm spectral region (Factor 3, significant loadings > 0.70), more than 90% (90.02%) of the spectral information provided by the entire spectral profile is expressed by the first three factors. So, the UV-Vis spectra can offer valuable information for the identification/authentication/classification of medicinal plants based on the spectral analysis of hydroalcoholic extracts.
According to the PCA-Varimax loadings profiles obtained from the original spectra (Fig. 3a), the first three factors are associated with large spectral domains with high loadings values. In all cases, only insignificant variation is observed between the loadings values of neighbouring regions inside the associated spectral domain. Using data from the derivative spectra, the loadings profiles (Fig. 3b-e) revealed significant differences between close regions of the spectra, highlighted by significant variation in the loadings values. As the derivative order increases, the hidden information in the UV-Vis spectra is revealed over increasingly narrow spectral domains, based on the large variations in the loadings values. This means that, using data from higher-order derivative spectra, the hidden information from closer regions of the spectra can be revealed in order to detect more accurately the differences between samples from the UV-Vis spectral profile.
In order to verify the grouping/classification of the samples using the data obtained from the derivative spectra, the supervised PCA-DA method was employed. Considering the above, the scores of the PCs with eigenvalues > 1 obtained for the datasets of the original spectra (first 10 PCs) and of the first-order (first 18 PCs), second-order (first 19 PCs), third-order (first 21 PCs) and fourth-order (first 22 PCs) derivative spectra, respectively, were used as initial variables for the DA analysis. As can be seen from Table 2, when data from the original spectra and from the first-order derivative spectra were used in the PCA-DA analysis, 94.34% of the analyzed samples were correctly classified. Based on data from the original spectra, the groups with a lower percent of correct classification were Horsetail (Eq, 80% of samples correctly classified), St. John's wort (Hy, 75% of samples correctly classified) and Valerian (Va, 83.33% of samples correctly classified). Horsetail (Eq, 60% of samples correctly classified) and Valerian (Va, 83.33% of samples correctly classified) samples showed a low percent of correct classification even when data from the first-order derivative spectra were employed. An improved classification was observed for data from the third-order derivative spectra, with 98.11% of the samples correctly classified. In this case, only the group of Horsetail samples showed a lower classification rate (Eq, 80% of samples correctly classified). Based on data from the fourth-order derivative spectra, 100% correct classification was obtained for all the analyzed groups of samples.
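For orientation, the per-group and overall percentages of the kind reported in Table 2 follow directly from the true and predicted group labels. The sketch below shows that arithmetic; the function name `percent_correct` and the toy labels are hypothetical and are not the study data.

```python
import numpy as np

def percent_correct(true_labels, predicted_labels):
    """Percent of correctly classified samples per group and overall."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    hits = true_labels == predicted_labels
    per_group = {str(g): 100.0 * hits[true_labels == g].mean()
                 for g in np.unique(true_labels)}
    return per_group, 100.0 * hits.mean()

# Made-up labels for illustration: one of five "Eq" samples is misassigned.
true = ["Eq"] * 5 + ["Hy"] * 4
pred = ["Eq"] * 4 + ["Hy"] + ["Hy"] * 4
per_group, overall = percent_correct(true, pred)
```

In this toy case four of the five "Eq" samples are recovered, so the group value is 80% while the overall value is eight out of nine.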
Based on these results, it was demonstrated that data from the fourth-order derivative spectra can provide additional information, related to the entire UV-Vis spectral region, that could be important for sample classification/authentication.

Conclusion
In this study, the feasibility of UV-Vis spectroscopy combined with chemometric methods was demonstrated for the classification/authentication of a significant number of plant material samples collected from different geographical areas. It was shown that the loadings profiles from the PCA-Varimax analysis are useful tools for revealing hidden information from close regions of the UV-Vis spectra. It was demonstrated for the first time that data from derivative spectra of increasing order reveal the differences between increasingly narrow regions of the UV-Vis spectra. Moreover, data from the fourth-order derivative of the UV-Vis spectra provide the most valuable information for the correct classification/authentication of complex herbal products. Consequently, the developed method can be employed as a rapid, effective and reliable approach for the authentication of medicinal plant material and as a useful tool in the context of fraud detection.

Fig. 2 Classification of the medicinal plant samples (52 samples including 23 reference samples (red color) and 29 test samples) according to hierarchical clustering analysis (HCA) with Ward's method as the amalgamation rule and 1-Pearson r clustering distance measurement using data from original spectra; a spectral range 200-800 nm; b spectral range 400-800 nm; c spectral range 200-400 nm

Table 2 Percent of samples correctly classified based on PCA-DA analysis