t-Distributed Stochastic Neighbor Embedding (t-SNE) [91], curvilinear components analysis (CCA) [92], Maximum Variance Unfolding (MVU) [93], Schroedinger Eigenmaps (SE) [94] and Spatial Spectral Schroedinger Eigenmaps (SSSE) [11]. However, nearly all of these nonlinear DR methods have been applied only to small images or tiles. Two of the greatest barriers to effective use of nonlinear DR methods in HSI processing are their computational complexity and memory requirements. Fong [95] shows that LLE, LTSA and LLTSA are incapable of handling HSI larger than 70 × 70 pixels; for SPE, KPCA and CFA, the limit drops to 50 × 50 pixels. Although [95] is now over ten years old, its conclusions have not changed dramatically in the last decade. When attempting to run various nonlinear DR algorithms on the 512 × 217 pixel Salinas image [96] on a modern desktop computer (AMD FX-6300 six-core processor, 24 GB memory), we ran out of memory for LLE, ISOMAP, LTSA and t-SNE. LE does run successfully under the constraint that only a small number (20) of neighbors can be used to construct the graph; however, the accuracy of a subsequent random forest classifier is worse than that achieved with PCA dimension reduction. The SSSE and KPCA DR algorithms also run successfully on the same desktop computer, and subsequent random forest classification is superior to that obtained with PCA; however, the computation time is enormous: 1,716 seconds for SSSE and 432,173 seconds (more than 5 days) for KPCA.
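The memory wall is easy to see from first principles: any method that materializes a dense N × N pairwise-distance (or kernel) matrix cannot fit the Salinas scene in 24 GB. A quick back-of-the-envelope check (the figures follow from the image size alone):

```python
# Memory cost of one dense pairwise-distance matrix, as required by
# methods such as ISOMAP, LLE and t-SNE in their naive forms.
n_pixels = 512 * 217          # Salinas scene: 111,104 spectra
bytes_per_entry = 8           # float64
matrix_bytes = n_pixels ** 2 * bytes_per_entry
matrix_gib = matrix_bytes / 2 ** 30

print(f"{n_pixels} pixels -> {matrix_gib:.0f} GiB for one N x N matrix")
# roughly 92 GiB, far beyond the 24 GB machine used in the experiments
```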

Isometric Mapping (ISOMAP) [2] is a data-driven nonlinear dimensionality reduction method, or manifold learning method, that describes the nonlinear variations of the data. It uses a k-Nearest Neighbor (kNN) graph to estimate the manifold (the nonlinear geometric structure of the data) and Classical Multidimensional Scaling (CMDS) [3] to construct manifold coordinates sorted by the variance of the data on the manifold. It thereby overcomes the major limitation of linear dimensionality reduction methods. However, it requires an eigendecomposition of the proximity matrix of all data points, which is often infeasible for large remotely sensed hyperspectral images. ENH-ISOMAP [4], designed for hyperspectral imagery, addresses this problem by adopting the landmark ISOMAP algorithm [5], which uses a subset of data points to estimate the whole manifold, along with other optimizations such as backbone reconstruction and efficient data structures and algorithm implementations for nearest neighbor search and shortest path search. This makes ENH-ISOMAP a practical manifold learning algorithm for typical hyperspectral imagery, with better performance than MNF, especially for nonlinear signals.
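As a minimal illustration of the three ISOMAP stages (kNN graph, geodesic distances, CMDS), the scikit-learn implementation can be run on a small synthetic manifold; this is a toy sketch, not the ENH-ISOMAP pipeline itself:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Classic manifold-learning demo: unroll a 3-D "swiss roll" to 2-D.
# Internally: kNN graph -> geodesic (shortest-path) distances -> CMDS.
X, _ = make_swiss_roll(n_samples=500, random_state=0)
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (500, 2)
```

The full proximity-matrix eigendecomposition hidden in `fit_transform` is exactly the step that landmark variants replace with a small-subset computation.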

Use of bootstrap sampling prior to classification can improve classification accuracy [45,46]. The RF algorithm builds many classification trees, each trained on a bootstrap sample of the training data [29]. Because the RF algorithm performs this bootstrapping internally, we did not apply bootstrapping separately to the training polygons. In addition to the factors mentioned, a visual comparison of the results showed that most misclassifications occurred in shadowed areas and where the two artificial black lines existed. Handling shadowed and noisy areas (or other abnormalities in an image) prior to classification might have improved the classification results. Moreover, the pixel-level classification results could be enhanced by post-processing techniques; however, to avoid losing tree classes, no post-processing strategy was applied in this study.
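The internal bootstrapping described above is the default behaviour of common RF implementations; a minimal sketch with scikit-learn (synthetic stand-in data, not the study's training polygons):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for pixel spectra and class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# bootstrap=True (the default): each tree is grown on a bootstrap sample
# drawn internally, so no separate resampling of the training data
# is required before fitting.
clf = RandomForestClassifier(n_estimators=50, bootstrap=True, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```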

radiometric resolution of the image increases, expressed as the number of bits per pixel, lossy approaches obtain better results than lossless techniques in terms of the quality of the reconstructed images. In the literature, several lossy approaches have been proposed for the compression of HS images (Abousleman et al., 1995; Conoscenti, Coppola, & Magli, 2016; Fowler et al., 2007; Karami, Heylen, & Scheunders, 2015; Kulkarni et al., 2006). Many of these techniques are based on decorrelation transforms, in order to exploit both spatial and spectral correlations, followed by a quantization stage and an entropy coder. In particular, these approaches combine a 1-D spectral decorrelator, such as the principal component analysis (PCA) transform, the Discrete Wavelet Transform (DWT), or the Discrete Cosine Transform (DCT), with a spatial decorrelator (Abrardo, Barni, & Magli, 2010; Christophe, Mailhes, & Duhamel, 2008; Kaarna et al., 2000). It is not difficult to see that the spectral decorrelation phase plays a critical role in effective HS compression. Wavelet-based techniques include the 3D extensions of JPEG2000, SPIHT, and SPECK (Kim, Xiong, & Pearlman, 2000; Penna et al., 2006a; Tang et al., 2005). These approaches can be seen as direct 3D extensions of approaches designed for 2D imagery, where a 1D
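A sketch of the spectral-decorrelation stage (the 1-D transform applied along the band axis before spatial coding and quantization), using PCA on a synthetic correlated cube; the cube and its statistics are illustrative assumptions, not real HS data:

```python
import numpy as np

# Synthetic (H, W, B) cube with strong inter-band correlation:
# every band is a scaled copy of one base image plus small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(32, 32, 1))
cube = base * np.linspace(1.0, 2.0, 16) + 0.01 * rng.normal(size=(32, 32, 16))

# 1-D PCA along the spectral axis: each pixel's B-band spectrum is
# projected onto the leading principal components before the spatial
# decorrelator, quantizer and entropy coder are applied.
pixels = cube.reshape(-1, 16).copy()
pixels -= pixels.mean(axis=0)
cov = pixels.T @ pixels / len(pixels)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
retained = eigvals[-1] / eigvals.sum()          # variance in the top PC
print(f"top PC keeps {retained:.1%} of spectral variance")
```

With such strongly correlated bands almost all spectral variance collapses into the first component, which is why the spectral transform is so effective for compression.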

In addition, it is almost impossible to obtain labeled training samples of every class present in a hyperspectral imagery scene. In other words, we may have no training samples corresponding to a given query pixel at all, in which case any classification of that query pixel is necessarily wrong. Hence, before classifying a given query pixel, we must first decide whether it is a valid sample from one of the classes in the HSI data set. The ability to detect and then reject invalid test pixels is important for the hyperspectral classification task. Conventional classifiers such as nearest neighbor (NN) and nearest subspace (NS) usually use the representation error for validation. However, with an over-complete dictionary, the smallest representation error of an invalid test pixel is not necessarily large; an invalid pixel can have a representation error as small as that of a valid pixel, leading to inaccurate validation. Since the coefficients are computed globally in the sparse representation scheme, the distribution of the coefficients contains important information about the validity of the query pixel. Specifically, a valid pixel has a sparse coding vector whose nonzero entries concentrate on one class, while an invalid pixel has sparse coefficients spread widely across the entire training set. In dense CR, however, since samples from different classes share similarities and all training samples participate in the representation, the coefficients of both valid and invalid samples spread widely across the entire training set, weakening the validation ability of the coefficients. Therefore, we note that the sparsity constraint contributes to validation as well. What calls for special attention is that the coefficient computation must be interpreted as ℓ0-norm minimization within a certain
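One common way to turn the coefficient-distribution argument into a decision rule is the sparsity concentration index (SCI) from the sparse-representation classification literature; the sketch below uses synthetic coefficient vectors, not real HSI data:

```python
import numpy as np

def sparsity_concentration_index(coef, class_of_atom, n_classes):
    """SCI: 1 when all coefficient mass sits on one class,
    0 when it is spread evenly across all classes."""
    l1_total = np.abs(coef).sum()
    per_class = np.array([np.abs(coef[class_of_atom == c]).sum()
                          for c in range(n_classes)])
    return (n_classes * per_class.max() / l1_total - 1) / (n_classes - 1)

# Hypothetical dictionary of 30 atoms spread over 3 classes.
classes = np.repeat(np.arange(3), 10)

valid = np.zeros(30); valid[2:5] = [0.9, 0.8, 0.7]   # mass on class 0 only
invalid = np.full(30, 0.1)                           # mass spread everywhere

print(sparsity_concentration_index(valid, classes, 3))    # -> 1.0
print(sparsity_concentration_index(invalid, classes, 3))  # -> 0.0
```

Thresholding the SCI rejects query pixels whose coefficients spread across the whole dictionary, exactly the invalid-pixel signature described above.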

The basic LBP operator is a gray-scale invariant pattern measure characterising the texture in images Maenpaa et al. [2000]. In this method, texture is defined using local patterns at the pixel level. Each pixel is labelled with the code of the texture primitive that best matches its local neighbourhood. First, the centre pixel value is taken as a threshold and n neighbours of the pixel are selected. Then, each neighbouring pixel is thresholded against the centre value, and the resulting binary values are weighted by position (as powers of two) and summed to generate the basic LBP code. In circular LBP (which is used in our algorithm), symmetric neighbours on a circle of a particular radius are used. Normally 8, 12 or 16 neighbours are used. With 8 neighbours, 59 parameters represent the local variations in a region, whereas 12 and 16 neighbours result in 133 and 243 parameters, respectively. By using this highly unifying LBP approach, texture in biopsy samples distinguishes benign tissue samples from malignant samples.
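A minimal NumPy sketch of the basic (non-circular) 3 × 3 LBP code described above; the circular, uniform variant used in the algorithm additionally interpolates neighbours on a circle and bins the codes into 59 uniform patterns for 8 neighbours:

```python
import numpy as np

def basic_lbp(image):
    """Basic 3x3 LBP: threshold the 8 neighbours against the centre pixel
    and weight the resulting bits by powers of two."""
    # offsets listed clockwise from the top-left neighbour
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = image.shape
    center = image[1:-1, 1:-1]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neighbour >= center).astype(np.uint8) << bit
    return code

img = np.array([[5, 5, 5],
                [5, 4, 5],
                [5, 5, 5]], dtype=np.uint8)
print(basic_lbp(img))  # [[255]] -- every neighbour >= centre, all bits set
```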

[4-17]). Several studies for higher spectral resolution (e.g., 60 channels in [18,19]) used synthetic data which often favor a particular classifier (such as maximum likelihood), by virtue of (Gaussian) data construction. Others offered some principled dimensionality reduction and showed high accuracies with the reduced number of bands for a moderate number of classes (e.g., [20-22]). Some research targeted selected narrow spectral windows of hyperspectral data to classify one specific important spectral feature [23]. A small number of ANN works classified hyperspectral data directly, without prior dimensionality reduction [24-26]. Experience suggests that the difference in quality between the performance of classical methods and ANN classifiers increases in favor of the ANNs with an increasing number of channels. However, this has not yet been quantified for large-scale classification of many cover types with subtle differences in complex, noisy hyperspectral patterns. Assessment of ANN performance versus conventional methods for realistic, advanced remote sensing situations requires comparisons using the full spectral resolution of real hyperspectral data with many cover classes, because conventional techniques are most likely to reach their limitations in such circumstances. Systematic evaluation is needed to ensure powerful, reliable, automated applications of ANNs or any other classifiers. The present paper is a step toward filling this gap.

and Principal Component Analysis (PCA). We use PCA in two different ways. First, PCA is applied to the hyperspectral bands only, and the first few PCs are added as additional features. Second, PCA is applied to the whole feature vector from hyperspectral and LiDAR data, as in Luo et al. (2016). The former technique for using PCA provides higher classification accuracies. We also measure the classification accuracies of our feature combination and the feature combination proposed by Luo et al. (2016) when applying PCA to the whole feature vector. Our feature combination achieves higher classification accuracies than the existing one with the same number of PCs, using the decision tree. • Our method for classifying land cover classes does not depend on any prior knowledge such as road width or tree height. It can be used on other datasets without the adjustments required by some existing methods (Man et al., 2015).

The study shows that the variable importance in projection method [77] can be used to identify the wavebands that are the most important predictor variables in the hyperspectral classification of grassland age-classes. The accuracy of a partial least squares classification based on a subset of 177 wavebands, identified with the help of the variable importance in projection approach as those most important for discriminating between successional stages, was 85% (8% higher than for a classification based on the full set of 269 bands). Among the 177 hyperspectral wavebands that gave the most efficient discrimination between grassland age-classes, 50 wavebands were located in the visible region (414–716 nm), 79 in the red-edge to near-infrared regions (722–1394 nm), and 48 in the shortwave infrared region (1448–2417 nm) of the electromagnetic spectrum. The fact that the best wavebands for discriminating between grassland age-classes fell within the operating ranges of both the HySpex VNIR-1600 spectrometer (414 to 991 nm) and the HySpex SWIR-320m-e spectrometer (966 to 2501 nm) suggests that data from specific wavebands covering the full 400–2500 nm spectral range are likely to provide the best classification of grassland successional stages. Our results also show that the partial least squares-based classification procedure is a suitable method for the classification of grassland successional stages, allowing a large number of hyperspectral wavebands to be compressed into a few latent variables while decreasing the risk of model overfitting. In our study, the first four latent variables explained approximately 97% of the variation in the spectral data.

Gidudu et al. [2] performed multiple classification tasks using Support Vector Machines. The approaches used were the One-Against-One (1A1) and One-Against-All (1AA) techniques for classifying multiple land covers present in remotely sensed data. The authors conclude that the 1AA approach to multiclass classification exhibits a higher propensity for mixed pixels than the 1A1 approach. The two approaches were compared using four different SVM kernels (linear, quadratic, polynomial and RBF) and Kappa coefficients. Classification accuracy decreased for the linear and RBF classifiers, stayed the same for the polynomial classifier, and increased for the quadratic classifier. It can therefore be concluded that whereas one can be confident of high classification results with the 1A1 approach, the 1AA approach yields approximately as good classification accuracies. The choice of which approach to adopt therefore becomes a matter of preference.
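The structural difference between the two schemes is the number of binary SVMs trained: k(k−1)/2 for 1A1 versus k for 1AA. A sketch with scikit-learn on synthetic multi-class data (not the authors' imagery):

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Hypothetical stand-in for multi-class land-cover spectra: 4 classes.
X, y = make_blobs(n_samples=300, centers=4, random_state=0)

# One-Against-One trains k(k-1)/2 = 6 binary SVMs;
# One-Against-All trains k = 4.
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)
ova = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

print(len(ovo.estimators_), len(ova.estimators_))  # 6 4
print(ovo.score(X, y), ova.score(X, y))
```

On well-separated data both schemes reach similar accuracies, mirroring the authors' conclusion that the choice is largely a matter of preference.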

Experimental comparisons of the methods: In addition to the theoretical specificities of the approaches, the experimental results remain a major selection criterion. A meta-analysis was conducted to identify the most significant algorithm performances from the numerical comparisons inventoried in the publications. The results depend on both the quality measure used and the main information guiding the dimensionality reduction (feature space vs. label space, or co-label and feature space). Three methods based on feature space reduction, MVMD, SSMDDM and MDDM, dominate for the two uncorrelated retained measures (Hamming Loss and a measure selected from a large set of correlated ones including Micro F1, Macro F1 and AUC). For the latter, the results also highlight SLEEC and REML, which are recent approaches designed especially for extreme multi-label learning. A dual examination of domination relationships completes the analysis by pointing out the methods dominated for the two measures. However, from a methodological point of view, the generalization of these conclusions should be considered cautiously. As numerous pairwise comparisons are absent from the published experiments, the meta-analysis was computed on an incomplete graph. Moreover, the heterogeneity of the datasets used in the different studies and the varying number of times each algorithm was evaluated add biases to the comparisons. Despite these limitations, we believe that this first meta-analysis can help identify recurrent properties in the most efficient approaches and also flaws in the experimental protocols (e.g. the lack of some pairwise comparisons). More

Since each term in an HTML tag of a web page can be taken as a feature, web page classification suffers from high dimensionality. To reduce this dimensionality, Selma Ayse Özel (2009) proposed an optimal feature selection technique based on a genetic algorithm (GA). The performance of this method was compared with the J48 (decision tree), Naïve Bayes Multinomial (Bayes), and IBk (kNN) classifiers, achieving 96% accuracy with the GA as feature selector. In this method, the number of features considered is large (up to 50,000): the system takes both terms and HTML tags of a web page as features and assigns a different weight to each feature, with the weights determined by the GA. After feature extraction, document vectors for the web pages are created by counting the occurrences of each feature in the associated HTML tag of each web page. The GA feature selector consists of coding, generation of an initial population, evaluation of a population, reproduction, crossover, mutation, and determination of the new generation; the reproduction, crossover, and mutation steps are repeated for the specified number of generations until an optimal feature vector is found [9].

in optimizing the features and sustainable during the process of decision making of tumour cells [7]. The proposed system uses PCA for feature selection and an Artificial Neural Network (ANN) for classification to improve detection accuracy. The Scree test and cumulative variance are the rules utilized in the PCA. After feature selection, the reduced data are passed to a back-propagation ANN to distinguish benign from cancerous data [8]. Another proposed system uses a genetic algorithm for feature selection, which helps identify the most significant parameters for cancer detection. Artificial neural networks (ANN), particle swarm optimization and genetic algorithms are utilized to determine the detection accuracy of the classifier models on the WDBC and WPBC datasets. Particle swarm optimization outperforms the other classifiers on the WDBC dataset, while the artificial neural network provides good detection accuracy on both the WDBC and WPBC datasets. Hence, feature selection increases detection accuracy before the data are passed to the classifier model [9]. Hybrid systems have been constructed using independent component analysis (ICA) with the discrete wavelet transform for dimensionality reduction on the WDBC dataset. A probabilistic neural network (PNN) is used to discriminate between benign and malignant cells. The system provides a detection efficiency of 96.31% and a sensitivity of 98.88%. The computational overhead is reduced because the dataset features are reduced before being passed to the PNN classifier [10]. Independent component analysis (ICA) has been further explored for its adaptability as the decision system for the WDBC dataset. The classifiers used to verify the classification results are k-nearest neighbor, ANN, RBFNN and SVM. The metrics evaluated are ROC, specificity, sensitivity, detection efficiency and F-measure. The

With rapid advances in science and technology, the marginal cost of data collection is decreasing, and ever more big data of different types are available for scientific analysis. In this context of data explosion, however, high data dimensionality arises, posing considerable challenges to classification. Traditional classification algorithms rely on the distance or density of data items, but in the high-dimensional case these methods are no longer effective because of the sparsity of the space. Moreover, directly classifying high-dimensional data with these methods incurs heavy time costs and computational complexity. This limits the widespread application of traditional classification algorithms.

We consider two common word space models that have been used with dimensionality reduction. The first is the Vector Space Model (VSM) (Salton et al., 1975): words are represented as vectors where each dimension corresponds to a document in the corpus and the dimension's value is the number of times the word occurred in that document. We label the second model the Word Co-occurrence (WC) model: each dimension corresponds to a unique word, with the dimension's value indicating the number of times that dimension's word co-occurred with the represented word.
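The two constructions can be sketched directly on a toy corpus (the documents below are illustrative; here co-occurrence is taken to mean co-occurrence within the same document, one common choice):

```python
import numpy as np

docs = [["apple", "banana", "apple"],
        ["banana", "cherry"],
        ["apple", "cherry", "cherry"]]
vocab = sorted({w for d in docs for w in d})     # apple, banana, cherry
idx = {w: i for i, w in enumerate(vocab)}

# Vector Space Model: one row per word, one column per document.
vsm = np.zeros((len(vocab), len(docs)), dtype=int)
for j, doc in enumerate(docs):
    for w in doc:
        vsm[idx[w], j] += 1

# Word Co-occurrence model: one row and column per word; entry (a, b)
# counts how often token a appears alongside a token of word b.
wc = np.zeros((len(vocab), len(vocab)), dtype=int)
for doc in docs:
    for a in doc:
        for b in doc:
            if a != b:
                wc[idx[a], idx[b]] += 1

print(vsm)  # "apple" row: [2, 0, 1]
print(wc)
```

Either matrix can then be handed to a dimensionality reduction method (SVD, random projection, etc.) to obtain compact word vectors.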

In this paper we discuss the specific visualization task of projecting the data to points on a two-dimensional display. Note that this task is different from manifold learning when the inherent dimensionality of the manifold is higher than two and the manifold cannot be represented perfectly in two dimensions. As the representation is necessarily imperfect, defining and using a measure of goodness of the representation is crucial. However, in spite of the large amount of research into methods for extracting manifolds, there has been very little discussion of what a good two-dimensional representation should be like and how its goodness should be measured. A recent survey of 69 papers on dimensionality reduction from the years 2000–2006 (Venna, 2007) found that 28 (≈ 40%) of the papers presented only visualizations of toy or real data sets as a proof of quality. Most of the more quantitative approaches were based on one of two strategies. The first is to measure the preservation of all pairwise distances or of the order of all pairwise distances. Examples of this approach include multidimensional scaling (MDS)-type cost functions such as Sammon's cost and Stress, methods that relate distances in the input space to those in the output space, and various correlation measures that assess the preservation of all pairwise distances. The other common quality assessment strategy is to classify the data in the low-dimensional space and report the classification performance.
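One concrete, quantitative goodness measure in this spirit is scikit-learn's trustworthiness score, which checks whether the nearest neighbours in the 2-D display were also neighbours in the original space; the sketch applies it to a PCA projection of synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Synthetic 10-D data projected to a 2-D display.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X_2d = PCA(n_components=2).fit_transform(X)

# trustworthiness = 1.0 means the display introduced no spurious
# neighbours; lower values flag points pulled together artificially.
t = trustworthiness(X, X_2d, n_neighbors=5)
print(f"trustworthiness = {t:.2f}")
```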

Three non-eigen-based hyperspectral ID estimators have recently been proposed. The first, introduced in [34] as part of a Negative ABundance-Oriented (NABO) unmixing algorithm, borrows its main idea from the HySIME algorithm. Basically, it decomposes the residual error from the unconstrained unmixing into two components, one due to noise and one due to the ID. The algorithm starts from an underestimate of the ID and then iteratively increments the ID value until the unmixing error can be explained solely by the noise term. The second non-eigen-based method, called Hyperspectral Image Dimension Estimation through Nearest Neighbor distance ratios (HIDENN) [26], is based on local geometrical properties of the data manifold. The technique aims to compute the correlation dimension of the dataset, which is itself closely related to the concept of fractal dimension. The basic idea is to count (in the neighborhood of one data point) the total number of pairs of points g(ε) whose mutual distance is less than ε. Then it can be shown that if n → ∞ and ε → 0, the so-called correlation
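The correlation-sum idea can be sketched numerically: count the fraction of point pairs closer than ε and estimate the dimension as the slope of log g(ε) versus log ε (a rough sketch on synthetic data lying on a 2-D plane embedded in 5-D; HIDENN itself refines this via nearest-neighbor distance ratios):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Points on a 2-D plane embedded in 5-D: intrinsic dimension is 2.
rng = np.random.default_rng(0)
plane = rng.uniform(size=(2000, 2))
X = plane @ rng.normal(size=(2, 5))

d = pdist(X)                                   # all pairwise distances
eps = np.logspace(np.log10(0.02), np.log10(0.2), 10)
g = np.array([(d < e).mean() for e in eps])    # correlation sum g(eps)

# In the scaling regime, g(eps) ~ eps^D, so the log-log slope estimates D.
slope = np.polyfit(np.log(eps), np.log(g), 1)[0]
print(f"estimated correlation dimension ~ {slope:.2f}")
```

The estimate lands near 2, the intrinsic dimension, even though the ambient space is 5-dimensional.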

In this paper, we investigate and compare the features in the structural profiles of core promoter regions in several typical eukaryotes. Instead of using a sliding window of specified width to filter noise in the structural profile of each individual promoter, we align promoters at the TSS for each eukaryote type and obtain an averaged promoter representative for that type. Then we apply a nonlinear dimensionality reduction algorithm, Isomap, to the averaged promoter model, which is described by a set of physicochemical parameters, to extract a comprehensive structural profile. The structural profile derived by our method is very different from those of previous studies. Firstly, avoiding the sliding window approach preserves the local details of each single promoter, while averaging between individual promoters weakens locally inconsistent structural traits and strengthens the consistent

Abstract: Dimensionality reduction is of high importance in hyperspectral data processing, as it can effectively reduce data redundancy and computation time while improving classification accuracy. Band selection and feature extraction are two widely used dimensionality reduction techniques. By integrating the advantages of band selection and feature extraction, we propose a new method for reducing the dimension of hyperspectral image data. First, a new and fast band selection algorithm is proposed for hyperspectral images based on an improved determinantal point process (DPP). To reduce the amount of computation, the Dual-DPP is used for fast sampling of representative pixels, followed by kNN-based local processing to explore more spatial information. These representative pixels are used to construct multiple adjacency matrices that describe the correlation between bands based on mutual information. To further improve classification accuracy, two-dimensional singular spectrum analysis (2D-SSA) is used for feature extraction from the selected bands. Experiments show that the proposed method can select a low-redundancy and representative band subset, reducing both data dimension and computation time. Furthermore, the proposed dimensionality reduction algorithm outperforms a number of state-of-the-art methods in terms of classification accuracy.
