Astronomy & Astrophysics manuscript no. main
©ESO 2022
December 9, 2022

On the application of dimensionality reduction and clustering algorithms for the classification of kinematic morphologies of galaxies
M. S. Rosito (1), L. A. Bignone (1), P. B. Tissera (2, 3), S. E. Pedrosa (1)

(1) Instituto de Astronomía y Física del Espacio, CONICET-UBA, Casilla de Correos 67, Suc. 28, 1428, Buenos Aires, Argentina.
(2) Institute of Astronomy, Pontificia Universidad Católica de Chile, Avenida Vicuña Mackena 4690, Santiago, Chile.
(3) Centro de Astro-Ingeniería, Pontificia Universidad Católica de Chile, Avenida Vicuña Mackena 4690, Santiago, Chile.
arXiv:2212.03999v1 [[Link]] 7 Dec 2022


ABSTRACT

Context. The morphological classification of galaxies is considered a relevant issue and can be approached from different points of
view. The increasing growth in the size and accuracy of astronomical data sets brings with it the need for the use of automatic methods
to perform these classifications.
Aims. The aim of this work is to propose and evaluate a method for automatic unsupervised classification of kinematic morphologies
of galaxies that yields a meaningful clustering and captures the variations of the fundamental properties of galaxies.
Methods. We obtain kinematic maps that mimic integral field spectroscopy (IFS) images for a sample of 2064 galaxies from the largest
simulation of the EAGLE project. These maps are the input of a dimensionality reduction algorithm followed by a clustering
algorithm. We analyse the variation of physical and observational parameters among the clusters obtained from the application of this
procedure to different inputs. The inputs studied in this paper are (a) line-of-sight velocity maps for the whole sample of galaxies
observed at fixed inclinations, (b) line-of-sight velocity, dispersion, and flux maps together for the whole sample of galaxies observed
at fixed inclinations, (c) line-of-sight velocity, dispersion, and flux maps together for two separate subsamples of edge-on galaxies with
similar amount of rotation, and (d) line-of-sight velocity, dispersion, and flux maps together for galaxies from different observation
angles mixed.
Results. The application of the method to solely line-of-sight velocity maps achieves a clear division between slow rotators (SRs) and
fast rotators (FRs) and can differentiate rotation orientation. By adding the dispersion and flux information at the input, low rotation
edge-on galaxies are separated according to their shapes and, at lower inclinations, the clustering using the three types of maps
maintains the overall information obtained using only the line-of-sight velocity maps. This method still produces meaningful groups
when applied to SRs and FRs separately, but, in the first case, the division into clusters is less clear than when the input includes a
variety of morphologies. When applying the method to a mixture of galaxies observed from different inclinations, we obtain results
that are similar to those in our previous experiments with the advantage that in this case the input is more realistic. In addition, our
method proves robust, consistently classifying the same galaxies viewed from different inclinations.
Key words. methods: statistical – galaxies: general – galaxies: kinematics and dynamics

1. Introduction

Since galaxy morphology is the result of complex processes involved in the assembly history of galaxies, a morphological description is crucial to trace these processes and to achieve a better understanding of galaxy evolution (e.g. see Conselice 2014, for a recent review). Historically, galaxies have been classified visually through their apparent morphology (Hubble 1926). The need for the human eye is an obvious difficulty (Raddick et al. 2007) and the inaccuracies of visual classification are partially solved by the use of different quantitative measurements. However, while visual classification is still required (Chadha 2007), the vast photometric data sets that are being gathered require the development of reliable, efficient, and fast classification methods.

The structure of galaxies has been quantified through parameters related to their light profiles (de Vaucouleurs 1948; Sérsic 1968). In particular, the Sérsic profile, which describes the shape of the surface density profiles, varies according to galaxy morphology and can be used to establish a separation between classical bulges and pseudo-bulges by using the Sérsic index (e.g. Combes 2009; Fisher & Drory 2008; Tonini et al. 2016). Galaxy classification can also be based on a combination of parameters such as colours and Sérsic index (Vika et al. 2015).

The disc (bulge)-to-total light ratio defines the position of a galaxy in the Hubble sequence (Hubble 1926) and is widely used in observational works (e.g. Kormendy & Bender 2012; Yoon & Im 2020). In numerical simulations, the availability of physical information allows the definition of the disc-to-total mass ratio (D/T), which is used to distinguish disc-dominated and spheroid-dominated galaxies (e.g. Pedrosa & Tissera 2015; Tissera et al. 2016a,b; Rosito et al. 2018, 2019a,b). A negative correlation can be found between D/T and the aforementioned Sérsic index, as shown in Rosito et al. (2018).

Another important parameter used for galaxy classification is the mean rotation-to-dispersion velocity ratio (V/σ; e.g. Chisari et al. 2015; Dubois et al. 2016), which is tightly correlated with the disc fraction (Rosito et al. 2021). Furthermore, the three-dimensional axes of the ellipsoid enclosing a particular mass can be easily obtained from the simulations (Tissera & Dominguez-Tenreiro 1998), thus providing a direct measurement of the shape of a galaxy. The axis ratios are used to describe the prolateness/oblateness (Artale et al. 2019; Cataldi et al. 2021; van de Ven & van der Wel 2021) and can also be correlated with rotation (Rosito et al. 2019b).

In contrast, non-parametric methods measure the distribution of light in galaxies without assumptions related to the stellar mass or light distributions. Galaxy structure can be quantitatively described by these methods through the CAS system, which combines concentration (C), asymmetry (A), and clumpiness (S) of the stellar light distribution (Conselice 2003), and through a number of similar parameters, such as the Gini index, the M20, and the internal colour dispersion statistic (Abraham et al. 2003; Lotz et al. 2004; Papovich et al. 2003). These parameters can be used to define classification criteria (e.g. Deng 2013). Besides, non-parametric statistics correlate with other morphology indicators: disc-to-total ratio (Scannapieco et al. 2008), the κ0 parameter (Correa et al. 2017), and the ratio of rotation and dispersion velocities (Bignone et al. 2020).

Quantitative and accurate classification of galaxies according to kinematics has been possible with the advent of integral-field spectroscopy (IFS) techniques; the SAURON project (Bacon et al. 2001) was disruptive and was followed by galaxy surveys like CALIFA (Sánchez et al. 2012), MaNGA (Bundy et al. 2015) and SAMI (Bryant et al. 2015). The distinction between slow rotators (SRs) and fast rotators (FRs) was first introduced by Emsellem et al. (2007) for early-type galaxies and has been widely studied from an observational point of view (e.g. Veale et al. 2017; Brough et al. 2017; Greene et al. 2018). Emsellem et al. (2007) and subsequent works (Emsellem et al. 2011; Cappellari 2016; van de Sande et al. 2021) propose classifications in SRs and FRs based on observational projected parameters. Other authors consider definitions based on physical quantities, for instance by fixing a lower bound for the bulge-to-total mass ratio (e.g. Rosito et al. 2019b). The study of kinematics sheds light on fundamental concerns about galaxies. Using the EAGLE cosmological simulations (Crain et al. 2015; Schaye et al. 2015) and the HYDRANGEA zoom-in runs (Bahé et al. 2017), Lagos et al. (2018) connect kinematic properties of galaxies with their formation paths. In particular, they find a correlation between the amount of rotation of galaxies and their merger histories, with SRs being more likely to have experienced dry major mergers, whereas wet mergers are more frequent in FRs. Other studies of numerical simulations find that mergers are able to transform kinematic properties, which can provide clues to understand galaxy evolution (Jesseit et al. 2009; Bois et al. 2011; Naab et al. 2014; Penoyre et al. 2017; Schulze et al. 2018).

Due to the increasing depth and resolution of new surveys (e.g. Laureijs et al. 2011; LSST Science Collaboration et al. 2009; Ivezić et al. 2019) and the continuous increase in the numerical resolution of cosmological simulations, the use of automatic methods is becoming mandatory (e.g. Ball & Brunner 2010; Kremer et al. 2017; Howard 2017). The availability of more computational resources leads to the generation of large data sets from simulations and makes it possible to analyse data accurately and obtain robust conclusions regarding the underlying physics.

Machine learning (ML) is rapidly gaining ground in different fields of astronomy (see Baron 2019, for a recent review). The idea of using artificial neural networks for the morphological classification of galaxy images has been in place since the end of the last century (Storrie-Lombardi et al. 1992; Lahav et al. 1995). In the past few years, the application of several supervised ML techniques to galaxy classification has been widely studied and tested, obtaining increasing accuracy and performance. For instance, Marin et al. (2013) achieve accuracies of 79 per cent and 91 per cent for naive Bayes and random forest classifiers, respectively. Selim et al. (2016) present a method for supervised classification based on non-negative matrix factorization, obtaining an accuracy of 93 per cent. Convolutional neural networks, on the other hand, are commonly used in computer vision and are thus considered ideal tools to analyse galaxy images. As an example, Khalifa et al. (2017) achieve an accuracy of 97 per cent for galaxy image classification. This can be improved through the addition of data augmentation techniques to overcome overfitting, as shown by Mittal et al. (2019). Tohill et al. (2021) use convolutional neural networks to predict non-parametric quantities from galaxy images. Their method outperforms other non-parametric measurement algorithms in the sense that it is more than 1000 times faster, as well as providing lower bias estimates with lower scatter. de Diego et al. (2020) show that deep neural networks are more accurate than other classification methods related to photometry and shape parameters (Sérsic index and concentration index) when applied to modern deep surveys (Bongiovanni et al. 2019). All these methods are based on supervised algorithms, which rely heavily on the availability of training sets.

Unsupervised ML is particularly relevant to extract new information from a data set since unsupervised algorithms are trained with unlabelled data. Since galaxies can be classified according to a variety of criteria, unsupervised classification can be considered ideal to automatically obtain meaningful groups. A completely unsupervised technique presented by Hocking et al. (2018) is able to successfully separate images of early- and late-type galaxies, besides being very computationally efficient. Dimensionality reduction algorithms based on principal component analysis (PCA) are used to study images from galaxy surveys with a number of applications such as outlier detection and prediction of missing data (Uzeirbegovic et al. 2020). Portillo et al. (2020) use variational autoencoders trained with a sample of high dimensional spectra and obtain an interpretable latent space able to separate galaxies with notably different properties related to star formation and active galactic nuclei (AGN). In this work, they outperform PCA with the same number of components. Unsupervised clustering aims to group objects with similar properties and may follow dimensionality reduction algorithms. Classical algorithms like k-means have been used to classify galaxy spectra, being able to separate galaxies according to colours (Sánchez Almeida et al. 2010), but still have difficulties in identifying stars with different metallicities (Sánchez Almeida & Allende Prieto 2013). In a recent work, Cheng et al. (2021) employ a combined technique which consists of a variational autoencoder followed by hierarchical clustering. They obtain a meaningful classification for the input images formed by 27 groups with well defined properties related to structure and shape, also finding a correlation with physical properties.

An intermediate form between supervised and unsupervised learning is self-supervised learning. Commonly applied to natural language processing and robotics, self-supervised learning is also used in astronomy. A novel self-supervised contrastive learning method presented by Sarmiento et al. (2021) takes kinematic and stellar population maps obtained from MaNGA (Bundy et al. 2015) and is able to produce two groups: one formed by old, metal-rich massive galaxies compatible with early-type galaxies and the other populated by low-mass, star-forming galaxies which can be associated with late-type galaxies.

The goal of our work is the automatic unsupervised classification of kinematic maps. We propose a method based on the application of the Uniform Manifold Approximation and Projection (UMAP) algorithm (McInnes et al. 2018), which is a non-linear dimensionality reduction technique, followed by the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN; Campello et al. 2013) algorithm, with the aim of clustering a set of mock kinematic maps that mimic images obtained through IFS. Hereafter, we will refer to our method as UmScanGalactiK.

For this purpose, we implement and test our method on a galaxy catalogue constructed by Tissera et al. (2019) from the largest-volume simulation of the EAGLE project (Schaye et al. 2015; Crain et al. 2015), for which visual classification is available. The kinematic maps are generated with the R package SimSpin (Harborne et al. 2020). Our method is sensitive to both physical and observational projected parameters computed directly from the simulation. Hence, UmScanGalactiK is a simple and fast method to cluster galaxies, which yields a classification that takes into account the main properties of galaxies.

This paper is organized as follows. In Section 2 we describe the EAGLE project (Schaye et al. 2015; Crain et al. 2015), the simulation used for this study, and our galaxy sample. The methodology we follow is presented in Section 3. In Sections 4 and 5 we show and discuss our main results obtained through our method applied to different sets of kinematic maps. In Section 6, we study the clustering of SRs and FRs separately. We discuss the robustness of this method and its applicability to more realistic cases in Section 7 and we finally conclude in Section 8.

2. Simulated galaxies

In this study, we use a sample of galaxies selected from the largest-volume simulation of the EAGLE project (Crain et al. 2015; Schaye et al. 2015), which is a suite of hydrodynamic cosmological simulations that aim to follow the formation of structures and are consistent with a Λ-CDM universe [1].

[1] We use the EAGLE public database (McAlpine et al. 2016).

The cosmological parameters are consistent with the Planck cosmology (Planck Collaboration et al. 2014a,b): $\Omega_m = 0.307$, $\Omega_\Lambda = 0.693$, $\Omega_b = 0.04825$, $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$, with $h = 0.6777$. For this work we use the simulation called L100N1504, which corresponds to a box of 100 Mpc on a side and uses $1504^3$ dark matter and initial baryonic particles. The mass resolution is $9.70 \times 10^6\,\mathrm{M}_\odot$ and $1.81 \times 10^5\,\mathrm{M}_\odot$ for the dark matter and initial gas particles, respectively. The gravitational softening (0.7 pkpc, proper kiloparsecs) is kept constant in proper units below z = 2.8; at higher z the softening is kept constant in comoving units at 2.66 ckpc.

The EAGLE simulations were performed using a modified version of the GADGET-3 code described in Springel et al. (2005). This version includes radiative cooling (Wiersma et al. 2009), reionization (Haardt & Madau 2001), stochastic star formation (Schaye & Dalla Vecchia 2008), stellar feedback (Dalla Vecchia & Schaye 2012), and AGN feedback (Rosas-Guevara et al. 2015). A Chabrier (2003) initial mass function (IMF) is used. The dark matter haloes are identified with a Friends-of-Friends (FoF) algorithm (Djorgovski & Davis 1987) and the gravitationally bound subhaloes are then selected by using the SUBFIND algorithm (Springel et al. 2001; Dolag et al. 2009).

In this work, we analyse a subsample of galaxies comprising 7482 central galaxies at z = 0 selected by Tissera et al. (2019). Central galaxies are defined as the most massive systems within the virial halo.

To achieve a more accurate unsupervised classification, it is relevant to include in our sample galaxies with diverse morphologies and kinematic properties and thus assess the ability of our method to capture these features. Since the fraction of SRs, regardless of the criteria adopted to classify them, significantly increases with increasing stellar mass (Lagos et al. 2018), we choose galaxies with stellar masses above $10^{10}\,\mathrm{M}_\odot$ to ensure the inclusion of a non-negligible number of this type of galaxy. The stellar masses are measured within 1.5 optical radii, where the optical radius is defined as the radius that encloses ∼80 per cent of the stellar mass of the galaxy (Tissera 2000). Our final sample thus consists of 2064 massive central galaxies. As a morphological indicator, we compute the disc-to-total stellar mass ratio, D/T. This is done by means of the dynamical decomposition based on the binding energy and angular momentum content of the stellar particles as described by Tissera et al. (2012).

3. Methodology

As mentioned in the Introduction, we propose a method that automatically classifies mock galaxy kinematic maps by the combination of the UMAP dimensionality reduction algorithm (McInnes et al. 2018) and the clustering algorithm HDBSCAN (Campello et al. 2013). The groups obtained as a result of the application of these algorithms can be used to classify galaxies according to morphology and kinematics. Python implementations of UMAP [2] and HDBSCAN [3] are publicly available online.

The data feeding the aforementioned algorithms are a set of mock kinematic maps of the sample of galaxies described in Section 2. These maps are obtained from IFS data cubes built with the R package SimSpin [4] presented by Harborne et al. (2020).

Below, we briefly describe each algorithm as well as the generation of the kinematic maps and discuss how we use them throughout this work.

[2] [Link]
[3] [Link]
[4] [Link]

3.1. Kinematic maps

Having selected the set of galaxies to be analysed in this work, we use the SimSpin R package (Harborne et al. 2020) to build their IFS kinematic data cubes. This package is provided with information on each galaxy. All galaxies in our sample were previously rotated so that the total angular momentum is parallel to the z-axis. In this work, we use only stellar particles, from which we extract their position, velocity, mass, age, metallicity, and initial mass. The age, metallicity, and initial mass are used to model the spectral energy distributions for each stellar particle using the stellar population synthesis models of Bruzual & Charlot (2003).

For this study, we compute the line-of-sight velocity, velocity dispersion, and flux maps at different inclinations generated by SimSpin from the kinematic data cubes by collapsing the cubes along the velocity axis (z-axis). Pixels of the flux maps are computed by summing the contributions of each flux plane along the velocity axis, whereas line-of-sight velocity and dispersion maps represent the flux-weighted average of the velocity and the dispersion along the velocity axis of the data cubes (see Harborne et al. 2020, for details). The images mimic the observational parameters of SAMI (Scott et al. 2018). In particular, the spatial pixel size is 0.5 arcsec. We need our galaxies to be reasonably well resolved. Hence, we choose a projected distance to each galaxy of z = 0.05, thus obtaining a physical size or aperture of 15 kpc on a side.

We also use SimSpin to compute physical and projected observational parameters. The former include the 3D axis ratios b/a and c/a within the half-mass radius, with a ≥ b ≥ c, and the spin parameter defined by Bullock et al. (2001), $\lambda_B$, obtained from its radial profile by computing the mass-weighted average within the half-mass radius. Regarding the observational parameters, we calculate the projected spin parameter ($\lambda_R$; Emsellem et al. 2007) and the projected ellipticity, $\varepsilon = 1 - b/a$, where a and b are the projected major and minor semi-axes for a given inclination, respectively. Both parameters are measured within the projected effective radius. It is important to clarify that we are not able to compute an accurate value of $\lambda_R$ for galaxies with projected effective radii larger than the aperture, so we exclude those galaxies from the analysis of the projected properties. The number of removed galaxies varies with the inclination, but in all cases it is between 40 and 50.

Throughout this paper, we study the variations of these parameters, as well as D/T (Sec. 2) and the triaxiality parameter, $T = (1 - b^2/a^2)/(1 - c^2/a^2)$, among the galaxy groups classified with UmScanGalactiK, with the goal of understanding the meaning of the clustering.

In Fig. 1 we show three examples of kinematic maps: a galaxy with little rotation (upper panels) and two rotating galaxies with different rotation directions. The rotation orientation can be quantified by the sign of the average rotation velocity computed for each half of the line-of-sight velocity map, defined as positive or negative according to whether the velocity is incoming to or outgoing from the plane defined by the image.

3.2. Dimensionality reduction algorithm

High dimensional data may be difficult to handle and computationally expensive to process. Dimensionality reduction techniques allow the transformation of high dimensional data into lower dimensional data, minimizing the loss of information and thus facilitating the visualisation and classification of the data sets.

UMAP is a dimensionality reduction algorithm first presented by McInnes et al. (2018). It serves as a manifold learning technique and, due to its good computational performance and scalability with dimensions and number of samples, it has a wide variety of applications in astronomy (e.g. Reis et al. 2021; Kim et al. 2021), as well as in other fields like bioinformatics and materials science. It is based on strong mathematical foundations that allow it to construct a representation in a low dimensional space, which is topologically equivalent to the representation of the high dimensional data (see McInnes et al. 2018, for details).

In this work, we construct 30 px × 30 px kinematic images. Each map is reshaped to obtain a 900-dimensional vector. In Sec. 4, we directly study the line-of-sight velocity maps and therefore the inputs of the algorithm are 900-dimensional data points. In Sec. 5, we include the velocity dispersion and the flux information by concatenating the velocity dispersion and flux maps to the line-of-sight velocity maps, increasing the input dimension to 2700. Because of the differences in magnitude among the different data, we apply a uniform linear normalization to each type of map by dividing each value by the maximum absolute value of the components across the whole data set. Therefore, the components of the velocity maps are between −1 and 1, whereas, for the dispersion and flux maps, the values lie in the interval [0, 1]. In both cases, the high dimensional data are projected onto a bidimensional plane, thus obtaining an easy visualisation of the data and their subsequent clustering (see Appendix B for a discussion about the number of components of the output space).

In Table 1 we show the main UMAP (hyper)parameters we establish for each experiment in this work. In particular, n_neighbors defines the number of points in the local neighbourhood that the algorithm considers when it attempts to learn the manifold structure. Low values of n_neighbors imply that the algorithm focuses on very local structure, whereas large values yield a more global representation of the data. Another (hyper)parameter is the minimum distance at which the projected points are allowed to be (min_dist). When this (hyper)parameter is low, the embeddings are clumpier with small connected components, while higher values should be used to preserve the broad structure. The dimension of the reduced space in which the data will be embedded is set with the (hyper)parameter n_components, and the metric used to compute distances in the input space is defined by the metric (hyper)parameter.

3.3. Clustering algorithm

Clustering is the most commonly employed unsupervised ML technique and aims to group objects that are in some sense similar (high intra-cluster similarity) while objects belonging to different groups are not similar (low inter-cluster similarity).

We choose the HDBSCAN (Campello et al. 2013) algorithm as our second step. This method outperforms other density-based clustering algorithms and is often used in astronomy (e.g. Katz et al. 2016; Kimm et al. 2018), as well as in a variety of disciplines like malware analysis, bioinformatics, and molecular dynamics. It is an ideal tool for unsupervised learning to define groups in data sets based on finding dense regions, with the advantage that the number of clusters does not need to be specified. The set of significant clusters is obtained from the optimization of the stability of the clusters given a previously set minimum cluster size, so that components with a number of elements below this threshold are considered spurious. The elements that cannot be assigned to any group are called outliers.

By using this algorithm, we find clear groups in the bidimensional projection depicting galaxies that share similar characteristics. We study the properties of each group to evaluate the usefulness of our method.

The HDBSCAN (hyper)parameters used in each experiment in this work are summarised in Table 1. The (hyper)parameter min_cluster_size defines the smallest number of elements in a group. To define how conservative the clustering might be, we can modify min_samples and alpha. Technically, min_samples is the number of samples in a neighbourhood for a point to be considered a core point. Large values of this (hyper)parameter result in a more conservative clustering in which more points will be considered noise in comparison with the clustering using low min_samples. Regarding alpha, it is a scale parameter related to the hierarchical algorithm that also affects how conservative the clustering is. The default value of this parameter is 1 and higher values yield more conservative results. The minimum distance between clusters is given by cluster_selection_epsilon, in the sense that clusters at distances below this threshold are merged. For more details about the algorithm see Campello et al. (2013).

We set all the (hyper)parameters prioritising a clear visualization of the trends among the clusters while not being too restrictive, especially regarding the cluster sizes. Since UmScanGalactiK is an unsupervised method, we need human interven-

[Figure 1: grid of maps, one row per galaxy; colour bars of the line-of-sight velocity and dispersion panels are in km s−1.]

Fig. 1. Examples of line-of-sight velocity (left panels), dispersion (middle panels), and flux (right panels) maps for a galaxy with low rotation
(upper panels), a “clockwise” rotating galaxy (middle panels), and a “counterclockwise” rotating galaxy (bottom panels). The maps are obtained
from the kinematic data cubes generated by SimSpin by collapsing each data cube along the velocity axis, as described in Harborne et al. (2020).
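The collapse along the velocity axis mentioned in the caption can be sketched in a few lines of NumPy. This is only an illustration, not SimSpin's implementation: the function name `collapse_cube`, the argument `v_centres` (velocity-channel centres) and the choice to compute the dispersion as the flux-weighted second moment about the mean velocity are all assumptions made for the example.

```python
import numpy as np

def collapse_cube(cube, v_centres):
    """Collapse a (n_v, ny, nx) flux cube along its velocity axis.

    Returns the flux map, the flux-weighted mean line-of-sight velocity
    map and the flux-weighted velocity dispersion map (hypothetical helper).
    """
    cube = np.asarray(cube, dtype=float)
    v = np.asarray(v_centres, dtype=float)[:, None, None]
    flux = cube.sum(axis=0)                      # sum of all flux planes
    safe = np.where(flux > 0, flux, 1.0)         # avoid division by zero
    vel = (cube * v).sum(axis=0) / safe          # flux-weighted mean velocity
    disp = np.sqrt((cube * (v - vel) ** 2).sum(axis=0) / safe)
    empty = flux <= 0                            # pixels without any flux
    vel[empty] = 0.0
    disp[empty] = 0.0
    return flux, vel, disp
```

Pixels with no flux are set to zero in the velocity and dispersion maps here; a masked array or NaN would be an equally reasonable choice.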

Table 1. UMAP and HDBSCAN (hyper)parameters set for each experiment. n_neighbors: size of local neighborhood used for manifold approximation, min_dist: minimum distance between embedded points, n_components: dimension of the space to embed into, metric: metric to use to compute distances in the high dimensional space, min_cluster_size: minimum size of clusters, min_samples: number of samples in a neighbourhood for a point to be considered a core point, cluster_selection_epsilon: distance threshold below which clusters will be merged, alpha: distance scaling parameter.

Experiment       UMAP                                              HDBSCAN
                 n_neighbors  min_dist  n_components  metric       min_cluster_size  min_samples  cluster_selection_epsilon  alpha
1 (Sec. 4)       70           0.        2             'euclidean'  70                1            0.                         1.
2 (Sec. 5)       70           0.        2             'euclidean'  70                1            0.                         1.
3 (Sec. 6, SRs)  73           0.1       2             'euclidean'  60                1            0.                         1.
4 (Sec. 6, FRs)  72           0.9       2             'euclidean'  70                1            0.                         1.
5 (Sec. 7)       45           0.        2             'euclidean'  150               1            0.                         1.
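As a rough illustration of how the settings in Table 1 could be passed to the public Python packages, the sketch below normalises each map type, builds the 2700-dimensional input and runs UMAP followed by HDBSCAN with the values of experiment 2. It assumes the third-party `umap-learn` and `hdbscan` packages are installed; the helper names (`normalise`, `build_input`, `embed_and_cluster`) and the optional `random_state` argument are hypothetical and not taken from the paper.

```python
import numpy as np

# Hyperparameters transcribed from Table 1 (experiment 2, Sec. 5).
UMAP_PARAMS = dict(n_neighbors=70, min_dist=0.0, n_components=2,
                   metric="euclidean")
HDBSCAN_PARAMS = dict(min_cluster_size=70, min_samples=1,
                      cluster_selection_epsilon=0.0, alpha=1.0)

def normalise(maps):
    """Divide one type of map by its maximum absolute value over the data set."""
    maps = np.asarray(maps, dtype=float)
    peak = np.abs(maps).max()
    return maps / peak if peak > 0 else maps

def build_input(velocity, dispersion, flux):
    """Flatten (n_gal, 30, 30) stacks and concatenate them into (n_gal, 2700)."""
    parts = [normalise(m).reshape(len(m), -1)
             for m in (velocity, dispersion, flux)]
    return np.concatenate(parts, axis=1)

def embed_and_cluster(features, random_state=None):
    """Run UMAP, then HDBSCAN on the 2D embedding (needs umap-learn, hdbscan)."""
    import umap      # third-party package: umap-learn
    import hdbscan   # third-party package: hdbscan
    reducer = umap.UMAP(random_state=random_state, **UMAP_PARAMS)
    embedding = reducer.fit_transform(features)
    labels = hdbscan.HDBSCAN(**HDBSCAN_PARAMS).fit_predict(embedding)
    return embedding, labels  # label -1 marks points left as outliers
```

With n_neighbors and min_cluster_size both set to 70, the sketch is only meaningful for samples much larger than 70 objects, such as the 2064 galaxies used here.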

tion in the analysis of the results that arise for different inputs, 4. Clustering of line-of-sight velocity maps
rather than relying on metrics like accuracy or sensitivity. The al-
gorithms are sensitive to their (hyper)parameters and the number 4.1. Clustering of edge-on galaxies
of combination of (hyper)parameters is potentially huge. Hence,
we can set their values for different inputs based on our astro- As a first approach, we use only the line-of-sight velocity maps
physical knowledge keeping the ‘human-in-the-loop’. An im- of our galaxy sample as input of the algorithms described in
portant part of the challenges of this work is to find a combi- Sec. 3 and study the variation of either physical or observational
nation of (hyper)parameters that allows us to find sensible and galaxy properties among the clusters obtained by UmScanGalac-
interesting results. Exploring heuristic methods to improve the tiK . In this Subsection, we observe galaxies edge-on, that is, at
(hyper)parameters selection is a research question itself and is an inclination of 90 degrees. Velocity maps are sensitive to the
beyond the scope of this paper. inclination at which a galaxy is observed because IFUs can only
measure line-of-sight velocity. Therefore, rotation is best noticed
Article number, page 5 of 20
A&A proofs: manuscript no. main

at high inclinations. In particular, at 90 degrees the line-of-sight is perpendicular to the total
angular momentum.

In Fig. 2 we show a summary of the application of UmScanGalactiK to the line-of-sight velocity maps
of edge-on galaxies. Hereafter, the colorbar limits are determined by the 10th and 90th percentiles
of the property according to which the symbols are coloured. We quantify the distributions of the
parameters involved in this analysis by approximating a probability density function (PDF) from the
normalised histograms. The clustering method yields ∼ 3 per-cent of outliers (grey dots). It is
important to remark that these outliers are located near the clustered points in the bidimensional
embedding and that the variations of the physical and projected properties across the projection
follow a continuous trend that includes the outliers.

We find three clusters, which we dubbed C0, C1, and C2. The central one, C2 (red), is populated by
less disc-dominated galaxies with lower λB and more prolate (higher values of T) than the galaxies
in C0 (violet) and C1 (green). These properties are consistent with galaxies dominated by velocity
dispersion and, in fact, the D/T ratios of this population are low, with most of the members having
D/T < 0.2. Regarding the triaxiality parameter, a bimodal distribution can be seen with peaks at
T = 0.15 and T = 0.8 approximately. Galaxies in C2 with T ≤ 0.4 (around the first peak) also have
higher disc fractions compared to the remaining galaxies, with a median D/T of 0.21 and first and
third quartiles of 0.16 and 0.26, respectively. In contrast, the first, second (median), and third
quartiles of D/T of C2 galaxies with T > 0.4 are 0.11, 0.13, and 0.16, respectively. It can also be
seen from the figure that this “low rotation cluster” has the lowest values of ε compared to the
other two clusters, although a non-negligible number of galaxies in C2 have intermediate
ellipticities overlapping with the distributions of ε of C0 and C1. The situation is similar to that
of T: the highest ε galaxies in C2 have the highest D/T. For instance, the median D/T for galaxies
in C2 with ε < 0.5 is 0.13 while the median D/T for the remaining galaxies is 0.22. Low projected
ellipticities mean that the projected axes are more similar and, thus, the galaxies are rounder.
Galaxies in C2 have the lowest λB, as well as the lowest λR. This is what we expect since there is a
significant correlation between both spin parameters, regardless of the inclination. This is
discussed in Appendix A where we show that the correlations between the two spins are present down
to 20 degrees of inclination. For lower angles the Spearman coefficients are below 0.5.

It is clearly seen from Fig. 2 that C0 and C1 are populated by oblate discy galaxies with the
highest values of ε and both spin parameters. The disc fractions measured from the simulation of
galaxies in C0 and C1 are systematically higher than those from C2. On the other hand, no
significant differences can be found between the D/T of C0 and C1, as could be verified by a
Brunner-Munzel test (Brunner & Munzel 2000)⁵. By comparing the PDFs in Fig. 2 for C0 and C1, we find
very similar distributions of the parameters. This can also be quantified by the Brunner-Munzel
test. The difference we observe is that galaxies in C0 and in C1 rotate in opposite directions
regarding the line-of-sight (see Sec. 3). In order to explore the effect of the rotation direction
in our classification, we repeat the procedure but this time flipping the kinematic images of
“counterclockwise” rotating galaxies, following the sign convention mentioned in Sec. 3.1. Thus, all
input galaxies have the same direction. We show this in Fig. 3. As can be seen, UmScanGalactiK
yields two clusters, being C1 (violet) associated with the highest disc-fractions. C0 and C1 from
Fig. 2 are unified by UmScanGalactiK in a single cluster. We conclude that the method is able to
distinguish orientation without loss of information relevant to galaxy morphology.

⁵ The Brunner-Munzel test is a statistical test used to assess stochastic equality of two samples
without assuming that the shapes of the underlying distributions are the same.

Taking into account that we use kinematic information as input for our method, it is interesting to
assess the applicability of this method to the classification of SRs and FRs. This separation is
widely studied in the literature (e.g. Emsellem et al. 2007, 2011; Veale et al. 2017; Brough et al.
2017; Greene et al. 2018; Lagos et al. 2018; Rosito et al. 2019a). Parametric classifications
considering λR are proposed first by Emsellem et al. (2007, 2011) and in subsequent works (e.g.
Cappellari 2016; Graham et al. 2018; van de Sande et al. 2021). Hereafter, we consider the
parametric classification for SRs of van de Sande et al. (2021) shown in Eq. 1

    λR < 0.12 + 0.25ε, for ε ≤ 0.5    (1)

where both parameters are measured within the half light radius, following also Lagos et al. (2022).

In Fig. 4 (left panel) we show the λR-ε plane for all clustered galaxies following the colour code
of Fig. 2. It is clear that the SRs region according to van de Sande et al. (2021) is dominated by
galaxies from C2, as expected. In fact, 71 per-cent of the SRs belong to C2, whereas 17 per-cent and
12 per-cent belong to C0 and C1, respectively. Our approach may outperform the parametric
classification since a non-negligible number of galaxies with low D/T belonging to C2 (23 per-cent)
would be classified as FRs with the parametric definition of van de Sande et al. (2021). In the
middle panel of Fig. 4 it can be seen that these galaxies are those with the highest D/T within C2.
Similarly, galaxies in the SR region belonging to C0 or C1 have the lowest D/T among these clusters
(right panel). We remark that physical parameters such as D/T do not depend on the inclination and
describe, thus, the fundamental properties of the galaxies.

Automatic ML methods for galaxy classification yielding groups that distinguish galaxy rotation have
been previously employed. From an input consisting of stellar population and kinematic maps for
galaxies from the MaNGA survey (Bundy et al. 2015), Sarmiento et al. (2021) obtain three galaxy
clusters, one of which lies in the SR region of the λR-ε plane (their figure 3, first row), albeit
clusters significantly overlap. These galaxies are also notably older than those in the other two
clusters in agreement with galaxies from the eagle simulation (Rosito et al. 2019a).

We can conclude that, when using only the information about line-of-sight velocity, we can obtain a
clear separation between SRs and FRs. However, this requires that the rotation is clearly depicted
in the kinematic maps as we explain in the next Subsection.

4.2. Effects of the inclination

Besides studying the clustering of edge-on galaxies, we analyse the applicability of UmScanGalactiK
to galaxies observed at different inclinations. The rotation is poorly seen at low inclinations,
hence, we assess how the method is affected by decreasing the angle of observation. In Table 2, we
summarise the properties of the clustering at inclinations down to 20 degrees.

In Fig. 5 we show the application of UmScanGalactiK to galaxies at inclinations of 60, 45, 30, and
20 degrees and the distributions of D/T on the bidimensional projections (top and middle panels
respectively). In the first three cases, we find three

Fig. 2. Results of UmScanGalactiK applied to the line-of-sight velocity maps of our galaxy sample at inclination 90 degrees. Top row panels:
HDBSCAN clusters in the UMAP bidimensional projection including the outliers of the method as grey dots (left panel), distributions of the
projected parameters ε (middle panel) and λR (right panel) on the projection. Second row panels: size of the clusters (left panel), PDFs of ε
(middle panel) and λR (right panel) for each cluster. Third row panels: distribution of three dimensional parameters D/T (left panel), T (middle
panel), and λB (right panel) on the projection. Bottom row panels: PDFs of D/T (left panel), T (middle panel), and λB (right panel) for each
cluster. Colorbar limits are fixed to 10th and 90th percentile of the variables.
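
The two plotting conventions used in Fig. 2 (colorbar limits fixed to the 10th and 90th percentiles, and PDFs approximated from normalised histograms) can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function names, the default number of bins, and the choice of ignoring non-finite values are ours, and the text does not specify any smoothing applied to the histograms.

```python
import numpy as np


def colorbar_limits(values, lower=10.0, upper=90.0):
    """Return (vmin, vmax) as the lower/upper percentiles of a property.

    NaN entries (e.g. galaxies without a measured value) are ignored;
    this handling is an assumption of the sketch.
    """
    values = np.asarray(values, dtype=float)
    vmin, vmax = np.nanpercentile(values, [lower, upper])
    return float(vmin), float(vmax)


def histogram_pdf(values, bins=20):
    """Approximate a PDF by a histogram normalised to unit area.

    Returns the density in each bin and the bin edges.
    """
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]
    density, edges = np.histogram(values, bins=bins, density=True)
    return density, edges
```

The returned limits could be passed, for example, as `vmin` and `vmax` to a scatter plot of the embedding, so that a few extreme values do not saturate the colour scale.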

clusters among which the central one, C2 (red), seems to have notably less disc fraction. On the
projection at 30 degrees, the division of C1 (green) and C2 (red) is less clear, albeit the
clustering can identify the region with the lowest rotation. UmScanGalactiK cannot distinguish
galaxies with the lowest D/T in a particular group when galaxies are observed at angles below 20
degrees.

In Fig. 5 (bottom panels), we also show the λR-ε plane. It is clear that the SRs region is dominated
by galaxies from C2 in the first three cases, as expected. For inclinations of 90, 60, and 45
degrees, C2 includes above 70 per-cent of the total SRs (according to Eq. 1), whereas at 30 degrees
this fraction decreases to 58 per-cent. At 20 degrees, most galaxies are considered SRs according to
the same criterion. The fraction of galaxies classified as SRs increases with decreasing
inclination, as seen in Fig. 5 and Table 2. The fact that the rotation becomes less noticeable at
lower inclinations can also be seen from their lower values of ε.

When applied to the sample considered in this work, UmScanGalactiK is robust enough to obtain a good
classification of galaxies according to their level of rotation for inclinations greater than or
equal to 45 degrees. To be conservative regarding the application of this method, we consider an
inclination of 30 degrees a borderline case, albeit the quantities on the projection at 30 and 20
degrees still present clear trends for the sample studied in this paper. This is encouraging taking
into account that

our method relies on kinematic maps, which are sensitive to the inclination of the sources.

Regarding the outliers, we make the same observation as in Sec. 4.1: there is a continuous variation
of the relevant properties across the bidimensional embeddings that involves both clustered and
non-clustered galaxies. We can conclude this from the middle panels of Fig. 5 and the strong
correlation between D/T and the other physical and projected properties. This pattern is also seen
in the following sections. We emphasise that galaxies close to each other in the projection have
similar properties. UmScanGalactiK leverages this advantage of the UMAP algorithm to obtain a more
meaningful clustering.

Fig. 3. Top panel: Clusters obtained by UmScanGalactiK when applied to the line-of-sight velocity
maps in which every galaxy has the same rotation direction. We include the outliers of the method as
grey dots. Bottom panel: distribution of D/T on the projection.

5. Joint clustering of all kinematic maps types

5.1. Clustering of edge-on galaxies

Our results indicate that clustering of only line-of-sight velocity maps is sufficient to describe
the differences between SRs and FRs. In this Section, we analyze the possibility of obtaining a
meaningful clustering that captures more details about other features of galaxies, such as shape, by
adding information also from the velocity dispersion and flux maps. Hence, we apply UmScanGalactiK
as in the previous section, but first we concatenate the three types of kinematic maps
(line-of-sight, dispersion, and flux maps) for galaxies observed at 90 degrees as input to perform
the UMAP bidimensional projections.

In Fig. 6 we summarise the results of the application of UmScanGalactiK to the set of kinematic maps
mentioned above. As can be seen from Fig. 6 the additional information brought in by the velocity
dispersion and flux allows UmScanGalactiK to cluster galaxies in 5 different groups. C3 (orange) and
C4 (red) preferentially contain low rotation galaxies, including 80 per-cent of SRs defined by
Eq. 1. The distributions of the parameters depicted in Fig. 6 for these clusters with low-rotation
galaxies show clear differences in comparison with the other clusters, having notably lower D/T,
lower spin parameters, lower ε, and higher T.

By focusing on the comparison of the distributions of parameters from C3 and C4, in Fig. 6 it can be
seen clearly that the triaxiality parameter distributions in C3 and C4 differ. By applying the
Brunner-Munzel test, we conclude that the values of T in C3 are significantly higher than in C4,
which means that galaxies are more prolate in the former. Through the same statistical test, it is
possible to determine that D/T in C4 are systematically higher, but the difference between the
medians of the two distributions is small (0.02). In contrast, no significant differences among
these two groups are seen regarding λB. This situation is similar when analysing the projected
parameters. There are no significant differences in the values of λR, which is what we expect due to
the strong correlation of this parameter with λB. On the other hand, the values of ε in C3 are lower
and this is consistent with the fact that galaxies in C3 are less dominated by rotation and are
seen, thus, rounder. Therefore, for edge-on galaxies, including the information of dispersion and
flux leads to a refinement of low rotation groups based primarily on galaxy shape. The addition of
extra kinematic information results in an unsupervised classification that takes into account shape
as well as rotation, leading to a more accurate and meaningful classification of SRs. This may be
useful to assist qualitative predictions of galaxy features when applied to samples with missing
information on some parameters. Many efforts about the recovery of 3D shapes have been made
recently, in particular those based on IFU, due to the importance of intrinsic shapes in the study
of galaxies (Weijmans et al. 2014; Foster et al. 2017; Li et al. 2018; Ene et al. 2018). Bassett &
Foster (2019) mention some of the difficulties these techniques face, such as the use of statistical
methods to obtain distributions of three dimensional shapes, the generalisation of theoretical
models to different data sets, or the need for more tests of IFS shape measurement methods applied
to large samples. They suggest the use of ML methods. Our approach, albeit qualitative, may be a
direct way to predict shape parameters using the estimation of the distributions of those parameters
for galaxies in the same group and further exploration of the variation of galaxy shapes among
clusters may be useful to tackle this task.

The clusters C0 (violet), C1 (light blue), and C2 (green) also present differences. From inspection
of Fig. 6, it is clear that these clusters are populated by galaxies dominated by rotation,
especially C0 and C1, and can be associated, thus, to FRs. The differences between the distributions
of parameters of C0 and C1 are not clear from the figure. By applying the Brunner-Munzel test, we
find that the D/T in C0 are lower than those in C1 and the opposite behaviour is seen for the T
parameter. The second trend is significant⁶ but weak, having a p-value of 0.03. No significant
differences between the values of λB and λR are found for galax-

⁶ We consider a level of significance of 0.05, which is commonly used in hypothesis testing.


Fig. 4. Left panel: λR − ε plane. Symbols are coloured by cluster, following the colour code of Fig. 2. Middle panel: λR − ε plane for galaxies
of C2 coloured by D/T. Right panel: λR − ε plane for galaxies of C0 and C1 coloured by D/T. Colorbar limits are fixed to the 10th and 90th
percentiles of the variables. In all panels we include the criterion in Eq. 1, depicted by the black solid lines.
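
The parametric criterion of Eq. 1, drawn as the black solid lines in Fig. 4, can be written as a short function. This is a minimal sketch: the function name is hypothetical, and treating every galaxy with ε > 0.5 as a non-SR is an assumption of the sketch, since Eq. 1 is only stated for ε ≤ 0.5.

```python
import numpy as np


def is_slow_rotator(lambda_r, ellipticity):
    """Apply Eq. 1: lambda_R < 0.12 + 0.25 * eps, for eps <= 0.5.

    Both inputs are expected to be measured within the half light radius.
    Galaxies with eps > 0.5 fall outside the range where Eq. 1 is stated
    and are flagged False here (assumption of this sketch). Entries with
    NaN (lambda_R not available) also evaluate to False.
    """
    lambda_r = np.asarray(lambda_r, dtype=float)
    ellipticity = np.asarray(ellipticity, dtype=float)
    return (ellipticity <= 0.5) & (lambda_r < 0.12 + 0.25 * ellipticity)
```

For instance, a galaxy with λR = 0.1 and ε = 0.2 lies below the threshold 0.12 + 0.25 × 0.2 = 0.17 and is flagged as an SR, whereas one with λR = 0.3 at the same ellipticity is not.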

Fig. 5. Top panels: HDBSCAN clusters in the UMAP bidimensional projection of the line-of-sight velocity maps of galaxies observed at 60, 45,
30, and 20 degrees including the outliers of the method as grey dots. Middle panels: distributions of D/T on each projection. Colorbar limits are
fixed to 10th and 90th percentile of the variables. Bottom panels: λR − ε plane. Symbols are colored depicting clusters following the same colour
code of the top panels. We include the criterion in Eq. 1 depicted by the black solid lines.

Table 2. Summary of the HDBSCAN clustering at different inclinations using the line-of-sight velocity maps. SRs are defined by Eq. 1 among
those for which λR can be calculated.

Numbers                      90 degrees   60 degrees   45 degrees   30 degrees   20 degrees
Clusters                     3            3            3            3            2
Clustered galaxies           2001         2061         2061         1941         1929
Outliers                     63           3            3            123          135
Clustered galaxies with λR   1955         2016         2018         1891         1891
Clustered SRs with λR        441          613          673          747          1013
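
As a worked check of the trend in Table 2, the fraction of clustered galaxies classified as SRs by Eq. 1 can be computed directly from the tabulated counts. The dictionary layout and function name below are illustrative assumptions; the numbers are those of Table 2.

```python
# Counts transcribed from Table 2, keyed by inclination in degrees.
TABLE2 = {
    90: {"clustered": 2001, "outliers": 63, "with_lambda_r": 1955, "srs": 441},
    60: {"clustered": 2061, "outliers": 3, "with_lambda_r": 2016, "srs": 613},
    45: {"clustered": 2061, "outliers": 3, "with_lambda_r": 2018, "srs": 673},
    30: {"clustered": 1941, "outliers": 123, "with_lambda_r": 1891, "srs": 747},
    20: {"clustered": 1929, "outliers": 135, "with_lambda_r": 1891, "srs": 1013},
}


def sr_fraction(inclination):
    """Fraction of clustered galaxies with a measured lambda_R that are SRs."""
    row = TABLE2[inclination]
    return row["srs"] / row["with_lambda_r"]


if __name__ == "__main__":
    for inclination in (90, 60, 45, 30, 20):
        print(inclination, f"{100.0 * sr_fraction(inclination):.0f} per-cent")
```

The resulting fractions are roughly 23, 30, 33, 40, and 54 per-cent at 90, 60, 45, 30, and 20 degrees, respectively, which illustrates the increase of the SR fraction with decreasing inclination. In every column, clustered galaxies plus outliers add up to the 2064 galaxies of the sample.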

ies in C0 and C1 and galaxies in C0 tend to have lower ε than those in C1. The main difference
between C0 and C1 is the rotation orientation, in analogy to the situation presented in Sec. 4 for
high rotation galaxies. On the other hand, the distributions of the parameters for C2 shown in
Fig. 6 are consistent with galaxies with intermediate rotation between SRs and the FRs. However, the
number of galaxies populating C2 is much smaller than those in C0 and C1.

In Fig. 7 we show the UMAP projections considered in this Section (Fig. 6) coloured by the clusters
obtained using only the line-of-sight velocity maps studied in Sec. 4. It can be appreciated that
galaxies in C1 and C2 in this Section belong to C0

Fig. 6. Results of UmScanGalactiK applied to all kinematic map types of our galaxy sample at inclination 90 degrees. Top row panels:
HDBSCAN clusters in the UMAP bidimensional projection including the outliers of the method as grey dots (left panel), distributions of the
projected parameters ε (middle panel) and λR (right panel) on the projection. Second row panels: size of the clusters (left panel), PDFs of
ε (middle panel) and λR (right panel) for each cluster. Third row panels: distribution of three dimensional parameters D/T (left panel), T
(middle panel), and λB (right panel) on the projection. Bottom row panels: PDFs of D/T (left panel), T (middle panel), and λB (right
panel) for each cluster. Colorbar limits are fixed to 10th and 90th percentile of the variables.
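
The input behind Fig. 6 is the concatenation of the three kinematic map types of each galaxy, as stated in Sec. 5.1. One possible way to assemble such an input matrix is sketched below. The function name, the array shapes, the ordering of the maps, and the absence of any rescaling between map types are assumptions of this sketch; the text does not specify them.

```python
import numpy as np


def build_input_matrix(velocity, dispersion, flux):
    """Concatenate three stacks of maps into one row vector per galaxy.

    Each argument is assumed to have shape (n_galaxies, ny, nx). The
    result has shape (n_galaxies, n_v + n_d + n_f), where each term is
    the number of pixels of the corresponding map. Row i contains, in
    this order, the flattened velocity, dispersion, and flux maps of
    galaxy i.
    """
    stacks = [np.asarray(m, dtype=float) for m in (velocity, dispersion, flux)]
    n_galaxies = stacks[0].shape[0]
    if any(stack.shape[0] != n_galaxies for stack in stacks):
        raise ValueError("all map stacks must contain the same galaxies")
    flattened = [stack.reshape(n_galaxies, -1) for stack in stacks]
    return np.concatenate(flattened, axis=1)
```

Each row of the returned matrix is one high dimensional point, so the matrix can be handed to a dimensionality reduction step in the same way as the velocity maps alone.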

of the clustering obtained in Sec. 4.1, and have, thus, the same rotation orientation.

From Fig. 7, it can be noticed that the “extra” clusters obtained with the information from all the
kinematic maps types are approximately subsets of the groups obtained using only the line-of-sight
velocity maps. Therefore, these findings show that the refined clustering maintains the information
provided by the line-of-sight velocity maps.

5.2. Effects of the inclination

In this Subsection, we study the robustness of UmScanGalactiK applying it to the kinematic maps with
different inclinations. Based on the analysis performed in Sec. 4.2, we compare the result of the
method when applied to the set of all kinematic maps types at inclinations 90, 60, 45, and 30
degrees. Table 3 summarises the main properties of each clustering.

In the top and middle panels of Fig. 8 we show the clusters formed in the bidimensional projections
of the set of all kinematic maps by the application of UmScanGalactiK and the same projection
coloured by the clustering analysed in Sec. 4.2, respectively. For inclinations of 60 and 45 degrees
(bottom panels), there are well defined groups that present low rotation galaxies. Around 70
per-cent of the SRs classified according to Eq. 1 are included in these clusters at inclinations of
60 and 45 degrees, which is the same fraction observed in Sec. 4.2. In both cases,

Table 3. Summary of the HDBSCAN clustering at different inclinations using the three types of kinematic maps. SRs are defined by Eq. 1 among
those for which λR can be calculated.

                             90 degrees   60 degrees   45 degrees   30 degrees
Clusters                     5            3            3            3
Clustered galaxies           2039         2062         2053         2064
Outliers                     25           2            11           0
Clustered galaxies with λR   1993         2017         2010         2014
Clustered SRs with λR        461          615          670          838

the distributions of D/T of the groups with low-rotation galaxies show differences with the other
clusters, having notably lower values. Differences can be appreciated at 30 degrees, at which
UmScanGalactiK using only the line-of-sight velocity can better identify low rotation galaxies, but,
as we mentioned above, this is considered a borderline case and is not reliable to make robust
conclusions.

Again, at inclinations greater than or equal to 45 degrees, the addition of other kinematic map
types maintains the information obtained by the clustering using solely velocity maps with the
advantage that SRs may be more accurately described at high inclinations. Therefore, we suggest
including all the information to achieve better classifications.

Fig. 7. Top panel: UMAP projection obtained from all kinematic maps coloured according to the
clustering obtained for the line-of-sight velocity maps discussed in Sec. 4.1 following the colour
code of Fig. 2. Bottom panel: clustering in Fig. 6 is repeated for comparison.

6. Analysis of slow and fast rotators

In this Section, we assess the possibility of achieving more meaningful clustering for galaxies with
lower (greater) amounts of rotation when considering them separately. We select as input of our
method the three types of kinematic map of galaxies with λR,edge−on ≤ 0.2 and λR,edge−on > 0.2.
Lagos et al. (2022) study a sample of the same simulation used in our work at z = 0 and mention that
most galaxies below a threshold of 0.2 in λR,edge−on have lower values of the spin parameter
observed at any other inclination (van de Sande et al. 2021). This threshold is also considered as a
suitable separation in SRs and FRs in previous works (Lagos et al. 2018; Rosito et al. 2019a).

In the first case, we find 609 galaxies in our sample with λR,edge−on ≤ 0.2. This constraint in the
edge-on projected spin parameter includes 96 per-cent SRs according to Eq. 1. We obtain three
clusters and 35 outliers. In Fig. 9 we show an analysis of the clustering of the galaxies below this
threshold. It can be seen that the separation in clusters in the UMAP projection is not as clear as
those in the previous sections and that there is a continuous variation of the parameters among the
three groups. Since rotation in all these galaxies is low, in Fig. 10 we look in detail at the 3D
shapes in the different groups to analyse a meaningful clustering.

A more discy group, C0 (violet), can be seen from Fig. 9 which also has notably lower values of T
and higher λB, b/a, and b/a − c/a (Fig. 10) with respect to the other two groups. This is consistent
with the right panels of Fig. 10 in the sense that in the plane T-b/a the region with low triaxiality
parameters and high b/a ratios is dominated by galaxies from C0. From that figure it can also be
seen that galaxies from C0 with the highest T and the lowest b/a are the ones with the lowest D/T
within that cluster and hence we can associate this region with low amount of rotation. Moreover,
there are differences between the projected parameters in C0 and C1 (green) as can be appreciated
from the PDFs in Fig. 9, being these parameters lower in C1. However, the comparisons between the
distributions of λR and ε from C0 and those of C2 (red) are unclear. It can be seen by means of the
Brunner-Munzel test that ε from C0 have a weak trend (p-value ∼ 0.03) of being lower than those of
C2 and that there are no systematic trends regarding λR. On the other hand, although C1 and C2 are
both associated with galaxies in which the rotational component is almost negligible, the
distributions of the parameters present systematic differences as seen from the PDFs in Fig. 9 and
Fig. 10 and quantified through the Brunner-Munzel test. C1 has the lowest D/T, ε, λR, b/a,
b/a − c/a, and the highest T. The spin parameter λB can be considered lower in C1 than in C2, but
this trend is weak (p-value 0.02).

Lagos et al. (2022) mention the importance of the visual classification of the kinematic maps to
mitigate the inaccuracies of the parametric criteria based on the λR − ε plane (van de Sande et al.
2021). Lagos et al. (2022) visually classify their galaxies in the following groups: “flat slow
rotators (FSRs)”, “round slow rotators (RSRs)”, “prolate”, “2σ”, “rotators”, and “unclear”. We

Fig. 8. Top panels: HDBSCAN clusters in UMAP bidimensional projection of the set of all kinematic maps of galaxies observed at 60, 45, and
30 degrees including the outliers of the method as grey dots. Middle panels: UMAP projection obtained from all kinematic maps types coloured
according to the clustering obtained for the line-of-sight velocity maps discussed in Section 4 following the colour code of Fig. 5. Bottom panels:
distributions of D/T on each projection. Colorbar limits are fixed to 10th and 90th percentile of the variables.

show that we can perform automatically a classification similar to the visual classification in
Lagos et al. (2022).

The low triaxiality parameter for galaxies in C0 indicates that they are oblate and the fact that
the major and middle axes have similar values but their difference with the minor axis is high
indicates that these galaxies are flatter. The values of T are high and b/a − c/a are low for C1 and
C2 but the b/a ratios are significantly lower in C1, which means that C2 has rounder galaxies (axes
of similar length) while galaxies in C1 are more prolate. Hence, we can associate C0, C1, and C2
with the FSR, prolate, and RSR groups in Lagos et al. (2022) respectively. The fraction of galaxies
in each group differs (see percentages in Fig. 9), however, from those reported by Lagos et al.
(2022), who find a majority of FSR (48 per-cent) galaxies followed by RSRs (38 per-cent) and prolate
(10 per-cent) galaxies.

The second group of galaxies, λR,edge−on > 0.2, consists of 1409 galaxies and UmScanGalactiK yields
a clustering depicting very clear groups with no outliers. In Fig. 11 we show the analysis of the
two clusters obtained by our method. Very similar distributions of the parameters can be appreciated
from the figure. Due to the high p-values of the two-sided Brunner-Munzel tests (above 0.5) in all
cases, we conclude that there are no significant trends for the parameters in one cluster to be
greater than those in the other. Furthermore, the sizes of both groups are similar. Galaxies in C0
and C1 rotate in opposite directions and this is what UmScanGalactiK can capture.

When using solely SRs as input, the method yields similar conclusions to those in Sec. 5, in which
galaxies with little amount of rotation can be separated according to their shapes. Despite having
identified three groups instead of the two in Sec. 5.1 (C3 and C4), our clustering is weaker when
the input is not rich enough in variety of morphologies and kinematics. If only FRs are clustered,
our method still identifies two clear groups differentiated by the orientation of galaxy rotation,
as happens when the whole sample is considered (Secs. 4.1 and 5.1).

7. Mixing galaxies observed at different inclinations

Throughout this work, we have clustered galaxies at a fixed inclination. Clustering kinematic maps
from galaxies observed from different angles is more realistic than the procedures described in
Sec. 4 and Sec. 5.

In this Section, we assess the applicability and the robustness of UmScanGalactiK when mixing
galaxies at different inclinations. We use all kinematic map types, as in Sec. 5. The input of the
method is the set of the kinematic maps constructed for each galaxy from different inclinations. Due
to the aforementioned reasons related to the reliability of the method as a function of

Fig. 9. Results of UmScanGalactiK applied to all the kinematic maps types of galaxies with λR,edge−on ≤ 0.2 observed at 90 degrees. Top row
panels: HDBSCAN clusters in the UMAP bidimensional projection including the outliers of the method as grey dots (left panel), distributions of
the projected parameters ε (middle panel) and λR (right panel) on the projection. Second row panels: size of the clusters (left panel), PDFs of ε
(middle panel) and λR (right panel) for each cluster. Third row panels: distributions of D/T (left panel), T (middle panel), and λB (right panel) on
the projection. Bottom row panels: PDFs of D/T (left panel), T (middle panel), and λB (right panel) for each cluster. Colorbar limits are fixed to
10th and 90th percentile of the variables.

inclination, we consider angles of 90, 60 and 45 degrees. Hence, each galaxy is considered three
times.

7.1. Clustering of galaxies from different inclinations

We obtain a “supersample” that includes three different inclinations (90, 60, and 45 degrees) for
each galaxy and consists, thus, of 2064 × 3 = 6192 elements. Therefore, we need to project 6192 high
dimensional points depicting the kinematic maps of each galaxy. After the application of the method,
we obtain six clusters with 229 outliers, among which there are 153 distinct galaxies. The number of
galaxies appearing at least once in the set of clusters is 2042.

In Fig. 12 we show the clustering obtained with the input studied in this Section and a summary of
the properties of its galaxies. We include physical and observational properties, being the latter
sensitive to the angle of observation. From the figure, a division between galaxies with low level
of rotation (C4 and C5) and faster rotators (C0, C1, C2, and C3) can be appreciated, as occurs when
the inclination is fixed.

Low rotation clusters present differences in the distributions of the parameters. By means of the
Brunner-Munzel test, we find that galaxies in C4 have significantly lower D/T, lower λB, lower λR,
lower ε, and higher T than those from C5, which is consistent with having a smaller disc component.
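
A comparison of this kind between two clusters can be sketched with the Brunner-Munzel implementation in SciPy (`scipy.stats.brunnermunzel`). The wrapper name and the synthetic D/T values in the usage part are assumptions for illustration only; they are not the distributions of the paper. The significance level of 0.05 is the one adopted in this work.

```python
import numpy as np
from scipy.stats import brunnermunzel


def compare_clusters(values_a, values_b, significance=0.05):
    """Two-sided Brunner-Munzel test between one property in two clusters.

    Returns the test statistic, the p-value, and whether the difference
    is significant at the chosen level.
    """
    result = brunnermunzel(values_a, values_b, alternative="two-sided")
    p_value = float(result.pvalue)
    return float(result.statistic), p_value, p_value < significance


if __name__ == "__main__":
    # Synthetic stand-ins for the D/T values of two clusters (illustrative).
    rng = np.random.default_rng(0)
    dt_cluster_a = rng.normal(0.13, 0.05, size=200)
    dt_cluster_b = rng.normal(0.22, 0.05, size=200)
    statistic, p_value, significant = compare_clusters(dt_cluster_a, dt_cluster_b)
    print(f"p-value = {p_value:.3g}, significant = {significant}")
```

The two-sided test only indicates whether the two samples differ stochastically; the direction of the trend can then be read, for example, from the medians of the two samples.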

Fig. 10. Analysis of the 3D axis of galaxies with λR,edge−on ≤ 0.2. Top panels: distributions of b/a (left panel) and b/a − c/a (middle panel) in the
UMAP projection of the kinematic maps depicted in Fig. 9, T vs. b/a coloured according to the clusters following the colour code of Fig 9 (right
panel). Bottom panels: PDFs of b/a (left panel) and b/a − c/a (middle panel), and T vs. b/a for C0 coloured according to D/T (right panel).
Colorbar limits are fixed to 10th and 90th percentile of the variables.

The differences between the parameter distributions of C1 and C3 are less noticeable by inspection of Fig. 12. Statistically, the Brunner-Munzel test shows that disc fractions in C1 are greater than those in C3, but the trend is weak, since the p-value is 0.01. Similarly to Sec. 5, the smallest clusters C0 and C2 present parameters with intermediate values between SRs and FRs. The groups C0+C1 and C2+C3 are populated by galaxies rotating in opposite directions.
We show that when considering galaxies observed at different high inclinations, UmScanGalactiK is still useful and the clustering meaningful, yielding similar conclusions to those of the edge-on galaxies study.

Fig. 11. Results of UmScanGalactiK applied to all the kinematic map types of galaxies with λR,edge−on > 0.2 observed at 90 degrees. Top row panels: HDBSCAN clusters in the UMAP bidimensional projection (left panel), distributions of the projected parameters ε (middle panel) and λR (right panel) on the projection. Second row panels: size of the clusters (left panel), PDFs of ε (middle panel) and λR (right panel) for each cluster. Third row panels: distributions of D/T (left panel), T (middle panel), and λB (right panel) on the projection. Bottom row panels: PDFs of D/T (left panel), T (middle panel), and λB (right panel) for each cluster. Colorbar limits are fixed to the 10th and 90th percentiles of the variables.

Fig. 12. Results of UmScanGalactiK applied to all the kinematic map types of our galaxy sample mixing three different inclinations, including the outliers of the method as grey dots. Top row panels: HDBSCAN clusters in the UMAP bidimensional projection (left panel), distributions of the projected parameters ε (middle panel) and λR (right panel) on the projection. Second row panels: size of the clusters (left panel), PDFs of ε (middle panel) and λR (right panel) for each cluster. Third row panels: distributions of D/T (left panel), T (middle panel), and λB (right panel) on the projection. Bottom row panels: PDFs of D/T (left panel), T (middle panel), and λB (right panel) for each cluster. Colorbar limits are fixed to the 10th and 90th percentiles of the variables.

7.2. On the robustness of our method

In this Subsection, we check that, in most cases, the different instances of the same galaxy (that is, the kinematic maps for each galaxy observed at 90, 60, and 45 degrees) are clustered in the same group after the application of UmScanGalactiK. We detail in Table 4 the number of elements within each group with the number of repetitions for each distinct galaxy.

Table 4. Number of galaxies and repetitions in each group obtained when applying UmScanGalactiK to the mixture of the kinematic maps for the same galaxies observed at three different inclinations. We include the percentage of well classified galaxies.

                                      C0                 C1                 C2                 C3                 C4                 C5
Cluster size                          308                1890               303                2047               528                887
N° of distinct galaxies               154                672                174                749                224                369
N° of galaxies not repeated           65                 29                 102                39                 53                 73
N° of galaxies repeated twice         24                 68                 15                 122                38                 74
N° of galaxies repeated three times   65 (42 per-cent)   575 (86 per-cent)  57 (33 per-cent)   588 (79 per-cent)  133 (59 per-cent)  222 (60 per-cent)

As stated in Table 4, in C1 and C3 the vast majority of the galaxies appear three times in the same cluster, that is, they receive the same classification regardless of the inclination. The situation is similar in the low rotation clusters, C4 and C5: although the fraction of robustly classified galaxies is lower, it is still high. In C0 and C2, these fractions of galaxies repeated three times are the lowest. That means that galaxies in those groups are more susceptible to inclination effects.
The result mentioned above yields a lower bound for robustly classified galaxies. We can affirm that 1640 out of 2064 galaxies (80 per-cent) are robustly classified by UmScanGalactiK. This fraction may be higher if we take into account that not all 2064 galaxies have been clustered and that we are not considering galaxies repeated twice in a particular cluster. This value is conservative but it is good enough to conclude that UmScanGalactiK is robust even if the inclinations in the range between 45 deg and 90 deg are mixed.
The sizes of C0 and C2 are notably smaller than those of the other clusters, as can be seen from Table 4 and from the bar plot in Fig. 12. In both clusters, the number of elements represents about 5 per-cent of the total input size. These clusters would be joined to C1 and C3, respectively, if we applied a more restrictive condition regarding the minimum cluster size. Moreover, galaxies in those FR clusters share similar properties in terms of shape and kinematic parameters, as mentioned above. If that were the case, we would obtain two larger clusters of sizes 2198 (C0 + C1) and 2350 (C2 + C3), in which 94 per-cent and 89 per-cent of the galaxies, respectively, would be correctly classified regardless of their inclination. In addition, taking into account that, as the inclination decreases, rotation is less noticeable and our method is then not able to separate SRs according to shape in different groups, as seen in Sec. 5.2, joining C4 and C5 in a single cluster (C4 + C5) would also be reasonable when analysing the robustness of UmScanGalactiK. We would obtain 78 per-cent of properly classified SR galaxies in C4 + C5. Following this idea, by adding the number of robustly classified galaxies in C0 + C1, C2 + C3, and C4 + C5, the aforementioned bound increases to 90 per-cent.

8. Conclusions

In this work, we define clear steps to perform an unsupervised kinematic morphology classification of a sample of galaxies from the EAGLE simulation (Schaye et al. 2015; Crain et al. 2015). Since the inputs of UmScanGalactiK are kinematic maps that mimic IFS observations, it may be applied either to observational data sets or to other simulations. These maps are computed at different inclinations. We project the maps (considered as high dimensional points) to a bidimensional space using the UMAP algorithm (McInnes et al. 2018) followed by the application of the HDBSCAN clustering (Campello et al. 2013). We identify the advantages and limitations of this methodology and conclude that it may be useful for defining unsupervised classifications of galaxies that capture their main morphological properties based on physical parameters.
The main results of this study are:

– Line-of-sight velocity maps as input are sufficient to perform a good classification into SRs and FRs for edge-on galaxies. Most galaxies in the cluster with the lowest values of D/T are classified as SRs according to Eq. 1 (van de Sande et al. 2021), as shown in Fig. 4. However, there is a fraction of these galaxies that would be considered FRs using this parametric criterion despite having low disc fractions. Therefore, UmScanGalactiK may be considered more accurate than classifications based on the observational projected parameters (λR and ε), since it provides a clearer separation regarding the rotational component. Furthermore, galaxies in clusters associated with high rotation are separated by the direction of rotation. We show that this is an advantage of UmScanGalactiK in the sense that it allows the extraction of this information without introducing a false dimension that may lead to underperformance of the classification. Albeit not relevant to explain the kinematic morphologies of galaxies, this information could be useful to study the relative rotation between galaxies.
– By adding the velocity dispersion and flux maps to the analysis, edge-on galaxies with low rotation are divided into two groups (C3 and C4) which differ mainly in the triaxiality parameter, yielding a refinement of the low rotation cluster found in Sec. 4 (Fig. 6). Using hypothesis testing, we also find systematic differences in the values of D/T and ε between C3 and C4, and no significant difference regarding the spin parameters. Among the clusters with more rotating galaxies (C0, C1, and C2), C0 and C1 differ mainly in the rotation orientation. The distributions of the parameters for galaxies in C2 present graphically noticeable differences with respect to the others, but its size is too small to draw a robust conclusion.
– Figs. 7 and 8 show that at inclinations greater than or equal to 45 degrees, the clusters obtained in Sec. 5 can be considered subsets of the groups obtained in the analysis performed in Sec. 4. Therefore, the addition of the dispersion and flux maps preserves the information provided by the velocity maps. Since the use of all kinematic map types yields a more detailed description of low rotation edge-on galaxies, we suggest including them when performing morphological classification using UmScanGalactiK.
– Because the input data considered for our analysis consist of IFU kinematic maps, which are sensitive to the angle of observation, UmScanGalactiK is useful when galaxies are observed at inclinations at which rotation can be clearly distinguished. When the inclination is 30 degrees, the fraction of SRs (according to the parameter definition) identified in the low rotation cluster decreases significantly with respect to higher inclinations and the division in clusters is less clear. We adopt a conservative criterion preferring angles greater than or equal to 45 degrees.
– When applying UmScanGalactiK exclusively to slow rotating galaxies, we can identify groups with different shapes. However, the separation between groups in the UMAP projection is not as clear as when galaxies with a variety of morphologies and kinematics are clustered together. On the other hand, the method divides a sample of FRs according to the rotation orientation and the parameter distributions present no significant differences between the two groups.
– When the input consists of galaxies observed at different inclinations, UmScanGalactiK still clearly differentiates SRs and FRs. The former are divided into two groups, as happens when only edge-on galaxies are considered (Sec. 5.1). Galaxies with a large amount of rotation are separated according to the direction of rotation. Furthermore, we find clusters with intermediate parameters between SRs and FRs, albeit these clusters are very small in comparison to the others. An input that mixes galaxies observed at different inclinations is more realistic and the fact that our method still obtains a meaningful clustering is encouraging. Furthermore, UmScanGalactiK is robust with respect to the inclinations of the input galaxies. It can be seen that in at least ∼ 80 per-cent of the cases, the different instances of each galaxy are classified in the same group. If we apply a more restrictive clustering algorithm, this fraction can increase to ∼ 90 per-cent. When allowing finer clustering, our method produces some mixing within groups, but is still able to clearly separate SRs and FRs.

The methodology presented in this paper has proven to be accurate for a sample of mock galaxies and has succeeded in identifying their intrinsic properties and their projected parameters. The need for observing galaxies at high inclinations can be mitigated by predicting how the kinematic maps of galaxies observed at an arbitrary inclination would be seen edge-on. Dynamical models (Cappellari 2008) have been used for similar tasks in the last decade.
The next stage is to assess its applicability to observational data sets and to new generation simulations. Testing UmScanGalactiK on mock inputs that are more fairly comparable to observations would be an enriching and necessary step before applying it to real observations. We can obtain more diverse input samples by choosing different distances and inclination angles to build the SimSpin data cubes besides changing the user-specified point spread function (Harborne et al. 2020). There are other considerations to keep in mind when dealing with IFS, such as the need for a minimum signal-to-noise ratio to measure line-of-sight velocity. Voronoi tessellation binning (Cappellari & Copin 2003) can be used for unbiased recovery of this kinematic information and it may be interesting to explore methods like the procedure proposed by Walo-Martín et al. (2020).
ML algorithms are becoming indispensable in astronomy and the use of unsupervised techniques has the potential to contribute to scrutinizing large data sets to extract information with physical meaning.

Acknowledgements. PBT acknowledges partial funding by Fondecyt 1200703/2020 (ANID), ANID Basal Project FB210003, and Nucleo Milenio ANID ERIS. SP acknowledges support through PIP CONICET 11220170100638CO.

References

Abraham, R. G., van den Bergh, S., & Nair, P. 2003, ApJ, 588, 218
Artale, M. C., Pedrosa, S. E., Tissera, P. B., Cataldi, P., & Di Cintio, A. 2019, A&A, 622, A197
Bacon, R., Copin, Y., Monnet, G., et al. 2001, MNRAS, 326, 23
Bahé, Y. M., Barnes, D. J., Dalla Vecchia, C., et al. 2017, MNRAS, 470, 4186
Ball, N. M. & Brunner, R. J. 2010, International Journal of Modern Physics D, 19, 1049
Baron, D. 2019, arXiv e-prints, arXiv:1904.07248
Bassett, R. & Foster, C. 2019, MNRAS, 487, 2354
Bignone, L. A., Pedrosa, S. E., Trayford, J. W., Tissera, P. B., & Pellizza, L. J. 2020, MNRAS, 491, 3624
Binney, J. 1978, MNRAS, 183, 501
Bois, M., Emsellem, E., Bournaud, F., et al. 2011, MNRAS, 416, 1654
Bongiovanni, Á., Ramón-Pérez, M., Pérez García, A. M., et al. 2019, A&A, 631, A9
Brough, S., van de Sande, J., Owers, M. S., et al. 2017, ApJ, 844, 59
Brunner, E. & Munzel, U. 2000, Biometrical Journal, 42, 17
Bruzual, G. & Charlot, S. 2003, MNRAS, 344, 1000
Bryant, J. J., Owers, M. S., Robotham, A. S. G., et al. 2015, MNRAS, 447, 2857
Bullock, J. S., Dekel, A., Kolatt, T. S., et al. 2001, ApJ, 555, 240
Bundy, K., Bershady, M. A., Law, D. R., et al. 2015, ApJ, 798, 7
Campello, R. J. G. B., Moulavi, D., & Sander, J. 2013, in Advances in Knowledge Discovery and Data Mining, ed. J. Pei, V. S. Tseng, L. Cao, H. Motoda, & G. Xu (Berlin, Heidelberg: Springer Berlin Heidelberg), 160–172
Cappellari, M. 2008, MNRAS, 390, 71
Cappellari, M. 2016, ARA&A, 54, 597
Cappellari, M. & Copin, Y. 2003, MNRAS, 342, 345
Cataldi, P., Pedrosa, S. E., Tissera, P. B., & Artale, M. C. 2021, MNRAS, 501, 5679
Chabrier, G. 2003, PASP, 115, 763
Chadha, K. S. 2007, Astronomy Now, 21, 28
Cheng, T.-Y., Huertas-Company, M., Conselice, C. J., et al. 2021, MNRAS, 503, 4446
Chisari, N., Codis, S., Laigle, C., et al. 2015, MNRAS, 454, 2736
Combes, F. 2009, in Astronomical Society of the Pacific Conference Series, Vol. 419, Galaxy Evolution: Emerging Insights and Future Challenges, ed. S. Jogee, I. Marinova, L. Hao, & G. A. Blanc, 31
Conselice, C. J. 2003, ApJS, 147, 1
Conselice, C. J. 2014, ARA&A, 52, 291
Correa, C. A., Schaye, J., Clauwens, B., et al. 2017, MNRAS, 472, L45
Crain, R. A., Schaye, J., Bower, R. G., et al. 2015, MNRAS, 450, 1937
Dalla Vecchia, C. & Schaye, J. 2012, MNRAS, 426, 140
Davies, R. L., Efstathiou, G., Fall, S. M., Illingworth, G., & Schechter, P. L. 1983, ApJ, 266, 41
de Diego, J. A., Nadolny, J., Bongiovanni, Á., et al. 2020, A&A, 638, A134
de Vaucouleurs, G. 1948, Annales d’Astrophysique, 11, 247
Deng, X.-F. 2013, Research in Astronomy and Astrophysics, 13, 651
Djorgovski, S. & Davis, M. 1987, ApJ, 313, 59
Dolag, K., Borgani, S., Murante, G., & Springel, V. 2009, MNRAS, 399, 497
Dubois, Y., Peirani, S., Pichon, C., et al. 2016, MNRAS, 463, 3948
Emsellem, E., Cappellari, M., Krajnović, D., et al. 2011, MNRAS, 414, 888
Emsellem, E., Cappellari, M., Krajnović, D., et al. 2007, MNRAS, 379, 401
Ene, I., Ma, C.-P., Veale, M., et al. 2018, MNRAS, 479, 2810
Fisher, D. B. & Drory, N. 2008, AJ, 136, 773
Foster, C., van de Sande, J., D’Eugenio, F., et al. 2017, MNRAS, 472, 966
Graham, M. T., Cappellari, M., Li, H., et al. 2018, MNRAS, 477, 4711


Greene, J. E., Leauthaud, A., Emsellem, E., et al. 2018, ApJ, 852, 36
Haardt, F. & Madau, P. 2001, in Clusters of Galaxies and the High Redshift Universe Observed in X-rays, ed. D. M. Neumann & J. T. V. Tran, 64
Harborne, K. E., Power, C., & Robotham, A. S. G. 2020, PASA, 37, e016
Harborne, K. E., Power, C., Robotham, A. S. G., Cortese, L., & Taranu, D. S. 2019, MNRAS, 483, 249
Hocking, A., Geach, J. E., Sun, Y., & Davey, N. 2018, MNRAS, 473, 1108
Howard, E. M. 2017, in Astronomical Society of the Pacific Conference Series, Vol. 512, Astronomical Data Analysis Software and Systems XXV, ed. N. P. F. Lorente, K. Shortridge, & R. Wayth, 245
Hubble, E. P. 1926, ApJ, 64, 321
Illingworth, G. 1977, ApJ, 218, L43
Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111
Jesseit, R., Cappellari, M., Naab, T., Emsellem, E., & Burkert, A. 2009, MNRAS, 397, 1202
Katz, H., Lelli, F., McGaugh, S. S., et al. 2016, MNRAS, 466, 1648
Khalifa, N. E. M., Taha, M. H. N., Hassanien, A. E., & Selim, I. M. 2017, arXiv e-prints, arXiv:1709.02245
Kim, Y., Telea, A. C., Trager, S. C., & Roerdink, J. B. T. M. 2021, arXiv e-prints, arXiv:2110.00317
Kimm, T., Haehnelt, M., Blaizot, J., et al. 2018, MNRAS, 475, 4617
Kormendy, J. & Bender, R. 2012, ApJS, 198, 2
Kremer, J., Stensbo-Smidt, K., Gieseke, F., Steenstrup Pedersen, K., & Igel, C. 2017, arXiv e-prints, arXiv:1704.04650
Lagos, C. d. P., Emsellem, E., van de Sande, J., et al. 2022, MNRAS, 509, 4372
Lagos, C. d. P., Schaye, J., Bahé, Y., et al. 2018, MNRAS, 476, 4327
Lahav, O., Naim, A., Buta, R. J., et al. 1995, Science, 267, 859
Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, arXiv e-prints, arXiv:1110.3193
Li, H., Mao, S., Cappellari, M., et al. 2018, MNRAS, 476, 1765
Lotz, J. M., Primack, J., & Madau, P. 2004, AJ, 128, 163
LSST Science Collaboration, Abell, P. A., Allison, J., et al. 2009, arXiv e-prints, arXiv:0912.0201
Marin, M., Sucar, L., Gonzalez, J., & Diaz, R. 2013, 438
McAlpine, S., Helly, J. C., Schaller, M., et al. 2016, Astronomy and Computing, 15, 72
McInnes, L., Healy, J., & Melville, J. 2018, arXiv e-prints, arXiv:1802.03426
Mittal, A., Soorya, A., Nagrath, P., & Hemanth, D. J. 2019, Earth Science Informatics, 13, 601
Naab, T., Oser, L., Emsellem, E., et al. 2014, MNRAS, 444, 3357
Papovich, C., Giavalisco, M., Dickinson, M., Conselice, C. J., & Ferguson, H. C. 2003, ApJ, 598, 827
Pedrosa, S. E. & Tissera, P. B. 2015, A&A, 584, A43
Peebles, P. J. E. 1969, ApJ, 155, 393
Penoyre, Z., Moster, B. P., Sijacki, D., & Genel, S. 2017, MNRAS, 468, 3883
Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2014a, A&A, 571, A1
Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2014b, A&A, 571, A16
Portillo, S. K. N., Parejko, J. K., Vergara, J. R., & Connolly, A. J. 2020, AJ, 160, 45
Raddick, J., Lintott, C. J., Schawinski, K., et al. 2007, in American Astronomical Society Meeting Abstracts, Vol. 211, American Astronomical Society Meeting Abstracts, 94.03
Reis, I., Rotman, M., Poznanski, D., Prochaska, J. X., & Wolf, L. 2021, Astronomy and Computing, 34, 100437
Rosas-Guevara, Y. M., Bower, R. G., Schaye, J., et al. 2015, MNRAS, 454, 1038
Rosito, M. S., Pedrosa, S. E., Tissera, P. B., et al. 2018, A&A, 614, A85
Rosito, M. S., Pedrosa, S. E., Tissera, P. B., et al. 2021, A&A, 652, A44
Rosito, M. S., Tissera, P. B., Pedrosa, S. E., & Lagos, C. D. P. 2019a, A&A, 629, L3
Rosito, M. S., Tissera, P. B., Pedrosa, S. E., & Rosas-Guevara, Y. 2019b, A&A, 629, A37
Sánchez, S. F., Kennicutt, R. C., Gil de Paz, A., et al. 2012, A&A, 538, A8
Sánchez Almeida, J., Aguerri, J. A. L., Muñoz-Tuñón, C., & de Vicente, A. 2010, ApJ, 714, 487
Sánchez Almeida, J. & Allende Prieto, C. 2013, ApJ, 763, 50
Sarmiento, R., Huertas-Company, M., Knapen, J. H., et al. 2021, ApJ, 921, 177
Scannapieco, C., Tissera, P. B., White, S. D. M., & Springel, V. 2008, MNRAS, 389, 1137
Schaye, J., Crain, R. A., Bower, R. G., et al. 2015, MNRAS, 446, 521
Schaye, J. & Dalla Vecchia, C. 2008, MNRAS, 383, 1210
Schulze, F., Remus, R.-S., Dolag, K., et al. 2018, MNRAS, 480, 4636
Scott, N., van de Sande, J., Croom, S. M., et al. 2018, MNRAS, 481, 2299
Selim, I., Arabi, E., & [Link], B. 2016, International Journal of Computer Applications, 137, 4
Sérsic, J. L. 1968, Atlas de galaxias australes
Springel, V., Di Matteo, T., & Hernquist, L. 2005, MNRAS, 361, 776
Springel, V., Yoshida, N., & White, S. D. M. 2001, New A, 6, 79
Storrie-Lombardi, M. C., Lahav, O., Sodre, L., J., & Storrie-Lombardi, L. 1992, in American Astronomical Society Meeting Abstracts, Vol. 181, American Astronomical Society Meeting Abstracts, 65.08
Tissera, P. B. 2000, ApJ, 534, 636
Tissera, P. B. & Dominguez-Tenreiro, R. 1998, MNRAS, 297, 177
Tissera, P. B., Machado, R. E. G., Sanchez-Blazquez, P., et al. 2016a, A&A, 592, A93
Tissera, P. B., Pedrosa, S. E., Sillero, E., & Vilchez, J. M. 2016b, MNRAS, 456, 2982
Tissera, P. B., Rosas-Guevara, Y., Bower, R. G., et al. 2019, MNRAS, 482, 2208
Tissera, P. B., White, S. D. M., & Scannapieco, C. 2012, MNRAS, 420, 255
Tohill, C., Ferreira, L., Conselice, C. J., Bamford, S. P., & Ferrari, F. 2021, ApJ, 916, 4
Tonini, C., Mutch, S. J., Croton, D. J., & Wyithe, J. S. B. 2016, MNRAS, 459, 4109
Uzeirbegovic, E., Geach, J. E., & Kaviraj, S. 2020, MNRAS, 498, 4021
van de Sande, J., Vaughan, S. P., Cortese, L., et al. 2021, MNRAS, 505, 3078
van de Ven, G. & van der Wel, A. 2021, ApJ, 914, 45
Veale, M., Ma, C.-P., Thomas, J., et al. 2017, MNRAS, 464, 356
Vika, M., Vulcani, B., Bamford, S. P., Häußler, B., & Rojas, A. L. 2015, A&A, 577, A97
Walo-Martín, D., Falcón-Barroso, J., Dalla Vecchia, C., Pérez, I., & Negri, A. 2020, MNRAS, 494, 5652
Weijmans, A.-M., de Zeeuw, P. T., Emsellem, E., et al. 2014, MNRAS, 444, 3340
Wiersma, R. P. C., Schaye, J., & Smith, B. D. 2009, MNRAS, 393, 99
Yoon, Y. & Im, M. 2020, ApJ, 893, 117


Appendix A: Correlation between λB and λR


In this work, we have analysed the distributions of two spin pa-
rameters: λB (Bullock et al. 2001) and λR (Emsellem et al. 2007)
for galaxies observed at different inclinations. We remark that
the former is obtained from the particle 3D distributions and
therefore it does not depend on the inclination. This parameter
(λB ) plays a role in the description of the variation of the spe-
cific angular momentum as a function of the halo mass. Bullock
et al. (2001) propose it as an alternative definition of the spin pa-
rameter generalising that used before in the literature (e.g. Pee-
bles 1969). On the other hand, the spin parameter defined by
Emsellem et al. (2007), λR , despite also quantifying angular mo-
mentum, is computed from bidimensional kinematic information
and observational descriptions of rotation and shape based on it
are more accurate than those obtained from anisotropy diagrams
(Illingworth 1977; Binney 1978; Davies et al. 1983). Harborne
et al. (2019) find an approximate functional relation between
these two parameters from a set of N-body realizations of
galaxies.
We study the monotonic (increasing) relationship between
these parameters by means of the Spearman correlation coeffi-
cient. In Table A.1 we show these coefficients computed at dif-
ferent inclinations. Although in all cases the low p-values (∼ 0)
indicate a significant relationship, it can be appreciated that the
coefficients decrease as the inclination decreases. This is related to
the fact that the rotation becomes less noticeable at low inclinations.
Figure A.1 shows λB as a function of ε instead of the λR − ε
plane widely used to separate SRs and FRs. We measure ε pro-
jected at different inclinations. It can be appreciated that at low
inclinations galaxies appear rounder than they would at high
inclinations while λB remains the same. However, the correla-
tion between both spin parameters may justify the use of λB in-
stead of λR to analyse the relation between rotation and shape.
Because of the differences in the Spearman coefficients, the anal-
ysis would be more reliable at higher inclinations.
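
The statement that galaxies appear rounder at low inclinations can be illustrated with the textbook projection of an oblate axisymmetric spheroid. This is only a sketch under that simplifying assumption: the simulated galaxies are in general triaxial, and ε in this work is measured from the mock maps, not from this formula. The intrinsic axis ratio q and the function name are hypothetical choices made for illustration.

```python
import math


def projected_ellipticity(q_intrinsic: float, inclination_deg: float) -> float:
    """Apparent ellipticity of an oblate spheroid.

    Assumes an axisymmetric oblate body with intrinsic short-to-long axis
    ratio q_intrinsic, seen at the given inclination (90 deg = edge-on,
    0 deg = face-on). The apparent axis ratio is
    sqrt(cos^2 i + q^2 sin^2 i), and epsilon = 1 - apparent axis ratio.
    """
    i = math.radians(inclination_deg)
    q_apparent = math.sqrt(math.cos(i) ** 2 + (q_intrinsic * math.sin(i)) ** 2)
    return 1.0 - q_apparent


if __name__ == "__main__":
    q = 0.4  # hypothetical intrinsic axis ratio, not taken from the paper
    for angle in (90, 60, 45, 30, 20, 10, 0):
        eps = projected_ellipticity(q, angle)
        print(f"{angle:2d} deg  epsilon = {eps:.3f}")
```

In this toy model ε falls monotonically from 1 − q edge-on to zero face-on, while a 3D quantity such as λB is unaffected by the viewing angle.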
Fig. A.1. λB as a function of ε where ellipticities are computed at different inclinations. The decrease in the values of ε with decreasing inclination indicates that galaxies are seen rounder when observed at low inclinations.

Table A.1. Spearman coefficients between λB and λR computed at different inclinations.

Inclination   Spearman coefficient
90 deg        0.919
60 deg        0.907
45 deg        0.886
30 deg        0.833
20 deg        0.738
10 deg        0.490
0 deg         0.112
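
For readers unfamiliar with the statistic in Table A.1, the following minimal sketch computes a Spearman coefficient as the Pearson correlation of the ranks, with tied values sharing their average rank. It is not the code used in this work, and the two short lists of spin values are made-up numbers chosen only to show a perfectly monotonic case and a partially shuffled one.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical values for five galaxies, for illustration only.
    lambda_b = [0.02, 0.05, 0.11, 0.20, 0.31]
    lambda_r_high_incl = [0.05, 0.12, 0.30, 0.48, 0.66]  # same ordering as lambda_b
    lambda_r_low_incl = [0.03, 0.02, 0.06, 0.05, 0.09]   # ordering partly scrambled
    print(spearman(lambda_b, lambda_r_high_incl))
    print(spearman(lambda_b, lambda_r_low_incl))
```

Because only the ordering of the values matters, the coefficient drops when projection effects scramble the ranking of λR relative to λB, which is the behaviour Table A.1 shows towards low inclinations.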


Appendix B: Dimensionality of the output space

In this Appendix, we explore the intrinsic dimensionality of the problem and discuss how the method would be impacted if more than two dimensions are considered in the clustering algorithm.

B.1. Analysis of the number of dimensions

Throughout this work, we focus on bidimensional projections on which we apply the HDBSCAN clustering algorithm. It is reasonable to be concerned about the loss of information when reducing very high dimensional data to only two components. However, we show that UmScanGalactiK is able to produce galaxy groups that are meaningful without using labels. Furthermore, classical works about galaxy kinematics (e.g. Emsellem et al. 2007, 2011; Cappellari 2016; van de Sande et al. 2021) divide SRs and FRs using only two dimensions (ε and λR) and those classifications are proven to be insightful for subsequent studies.
Because UMAP lacks the interpretability of the dimensions of the embeddings, as well as a quantification of the explained variance (McInnes et al. 2018), we turn to PCA to obtain an approximation of the information retained by the dimensionality reduction process. PCA has been previously used to study galaxy morphology. For instance, Uzeirbegovic et al. (2020) applied PCA to galaxy images and found that 85 per-cent of the variance could be explained using just two components.
In Table B.1 we show the percentage of variance that a PCA algorithm would explain for the line-of-sight velocity maps (Sec. 4) and the set of all the types of kinematic maps (Sec. 5). This linear algorithm leads to non-negligible information loss with the explained variance ranging from 47.6 per-cent to 68.6 per-cent. However, a non-linear method such as UMAP outperforms other algorithms in the preservation of global structure and may lead to better performance compared to PCA, for example when using classifiers such as k-nearest neighbours (McInnes et al. 2018).

Table B.1. Analysis of the explained variance (% var) for two and ten principal components (PCs) and number of PCs needed to explain 75 per-cent of the variance for the PCA algorithm applied to the sample of (a) the line-of-sight velocity maps (Experiment 1) and (b) the three types of kinematic maps (Experiment 2) for different inclinations.

                              Experiment 1               Experiment 2
                              90°      60°      45°      90°      60°      45°
% var, 2 PCs                  47.6%    55.3%    50.8%    60.1%    60.5%    59.0%
% var, 10 PCs                 50.5%    57.6%    53.0%    67.1%    68.6%    67.9%
N° PCs to explain 75% var     141      121      146      98       81       89

B.2. Clustering on a 10-dimensional UMAP embedding

To explore the effects of using a larger number of dimensions for the clustering part of this work, we conduct experiments using 10-dimensional embeddings. In Fig. B.1, we show the bidimensional projections computed in Sec. 4.1 coloured by the HDBSCAN clustering of the 10-dimensional UMAP embeddings. We use a small (hyper)parameter grid that includes the ones utilized for our analysis (Table 1, Experiment 1). We modify n_neighbors, min_dist, and set n_components = 10. In all cases, we find a large number of outliers, especially those in which we observe more than three clusters. In general, we find that using ten components instead of two does not improve the quality of the clustering significantly or make it more meaningful or robust. As mentioned in Sec. 3, the selection of (hyper)parameters is not straightforward and a larger number of dimensions appears to make the clustering more sensitive to small changes of (hyper)parameter values.
Although it would be worthwhile to explore the potential improvements we would obtain by clustering higher dimensional embeddings, the method we present here leads to meaningful classifications by using bidimensional projections. This exploration will be carried out in future work and may be helpful in data leveraging and generalization of UmScanGalactiK.

Fig. B.1. Bidimensional UMAP projection coloured according to HDBSCAN clustering applied to a 10-dimensional UMAP embedding for the set of line-of-sight velocity maps of edge-on galaxies (Sec. 4.1), including the outliers as grey dots. We also include the values of n_neighbors and min_dist as well as the number of outliers.
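
The experiment described in Sec. B.2 could be sketched as follows with the public umap-learn and hdbscan Python packages. This is an illustrative outline, not the UmScanGalactiK implementation: the function names, the min_cluster_size default, and the random_state are assumptions, and the actual (hyper)parameter values are those of Table 1, which is not reproduced here. The third-party imports are placed inside the function so that the small label-summary helper can be used on its own.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarise_labels(labels: Iterable[int]) -> Tuple[int, int, Dict[int, int]]:
    """Return (number of clusters, number of outliers, size of each cluster).

    HDBSCAN assigns the label -1 to points it treats as outliers (noise).
    """
    counts = Counter(int(label) for label in labels)
    n_outliers = counts.pop(-1, 0)
    return len(counts), n_outliers, dict(counts)


def embed_and_cluster(maps, n_neighbors, min_dist, n_components=10,
                      min_cluster_size=50, random_state=0):
    """Embed flattened kinematic maps with UMAP and cluster with HDBSCAN.

    `maps` is expected to be an array of shape (n_galaxies, n_pixels).
    min_cluster_size and random_state are placeholder values, not the
    settings used in the paper.
    """
    # Imported lazily: both are optional third-party packages.
    import umap      # provided by the umap-learn package
    import hdbscan

    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist,
                        n_components=n_components, random_state=random_state)
    embedding = reducer.fit_transform(maps)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(embedding)
    return embedding, labels
```

Looping embed_and_cluster over a small grid of n_neighbors and min_dist values and passing each set of labels to summarise_labels gives the number of clusters and outliers per configuration, which is the kind of information annotated in Fig. B.1.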