localbiplot package

local biplot

class localbiplot.GMDOutput[source]

Bases: object

class localbiplot.LocalBiplot(X, labels=None, perplexity=None, red='tsne', sca='minmax', random_seed=123)[source]

Bases: object

Object for data analysis using linear and non-linear biplots obtained by singular value decomposition (SVD) and a generalized SVD.

This class implements a set of functions for data analysis, including scaling, dimensionality reduction, kernel calculation, and biplot computation and display.

X

Input matrix of shape N x P.

Type:

pd.DataFrame

labels

Labels for the samples (default is None).

Type:

array-like, optional

perplexity

Perplexity for t-SNE (default is calculated as the square root of N).

Type:

int or None, optional

red

Dimensionality reduction method (‘tsne’ by default).

Type:

{‘tsne’, ‘pca’, ‘umap’}, default is ‘tsne’

sca

Data scaling method (‘minmax’ by default).

Type:

{‘minmax’}, default is ‘minmax’

random_seed

Seed for result reproducibility.

Type:

int, default is 123

data_scaler(X, feature_range=(0, 1))[source]

Scale the data using MinMaxScaler if ‘sca’ is set to ‘minmax’.

reduce_dimensions(X)[source]

Reduce the dimensionality of the data using t-SNE, PCA, or UMAP.

krbf(X)

Calculate the Radial Basis Function (RBF) kernel matrix for the input data.

center_kernel(K)

Center a given kernel matrix using the Kernel Centering method.

laplacian_score(X, K, tol=1e-10)

Calculate the Laplacian score for a given dataset and kernel matrix.

lnkbp_()

Process and analyze the data through steps such as scaling, dimensionality reduction, kernel calculations, and Laplacian Score computation.

localbp_(X_)

Perform a local biplot operation on the scaled data (currently commented out).

GMD(X, H, Q, K)

Generalized Matrix Decomposition method (power method) for a given dataset and kernel matrices.

biplot_gmd_body(fit, index=None, names=None, sample_col='grey50', sample_pch=19, arrow_col='orange', arrow_cex=1)

Generate a GMD-biplot based on generalized matrix decomposition results.

plot_lnkbp_(hue, c, figsize=(25, 10))

Plot various visualizations, including scatter plots, kernel matrices, and feature relevance.

affine_transformM(parameters, array_A)[source]

Apply an affine transformation to the input array using the given parameters.

registration_errorM(parameters, array_A, array_B)[source]

Compute the registration error between two sets of 2D points after applying an affine transformation.
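The kernel-related methods listed above (krbf, center_kernel, laplacian_score) are documented only by one-line summaries. The following is a minimal NumPy sketch of the standard techniques those names refer to, not the package's own code. The function names, the `sigma` bandwidth parameter and its default, and the choice to return infinity for constant features are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    sigma is a hypothetical bandwidth parameter; the package may pick it differently.
    """
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X**2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by round-off.
    D2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-D2 / (2.0 * sigma**2))

def center_kernel(K):
    """Double-center K: Kc = H K H with H = I - (1/N) 1 1^T."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

def laplacian_score(X, K, tol=1e-10):
    """Laplacian score of each column of X, using K as the affinity matrix.

    Lower scores mean the feature varies smoothly over the neighborhood graph.
    """
    X = np.asarray(X, dtype=float)
    d = K.sum(axis=1)               # degree of each sample
    L = np.diag(d) - K              # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ d) / d.sum()   # remove the degree-weighted mean
        denom = f @ (d * f)         # f^T D f
        # Assumption: a (near-)constant feature gets an infinite score.
        scores[r] = (f @ L @ f) / denom if denom > tol else np.inf
    return scores

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4))
K_demo = rbf_kernel(X_demo)
Kc_demo = center_kernel(K_demo)
scores_demo = laplacian_score(X_demo, K_demo)
```

Centering makes every row and column of the kernel matrix sum to zero, which corresponds to centering the data in the implicit feature space.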

LocalBiplot_()[source]

Process and analyze the data using a series of steps, including scaling, dimensionality reduction, kernel calculations, and Laplacian score computation.

Returns:

  • LocalBiplot instance: The modified instance with processed and analyzed data.

affine_transformM(parameters, array_A)[source]

Apply an affine transformation to the input array using the given parameters.

Parameters:

  • parameters (array-like): Affine transformation parameters.
    • parameters[0]: Scaling factor

    • parameters[1]: Rotation angle (in radians)

    • parameters[2:]: Translation along x and y axes

  • array_A (array-like): Input array to be transformed.

Returns:

  • array-like: Transformed array after applying the affine transformation.
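The four-parameter layout described above (scale, rotation angle, two translations) can be sketched as follows. This is an illustrative NumPy version, not the package's implementation: the function name, the counter-clockwise rotation convention, and the order of operations (scale and rotate first, then translate) are assumptions.

```python
import numpy as np

def affine_transform(parameters, points):
    """Scale, rotate, then translate an (N, 2) array of points."""
    scale, angle, tx, ty = parameters
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s],
                  [s,  c]])          # counter-clockwise rotation (assumed convention)
    points = np.asarray(points, dtype=float)
    # Points are stored as rows, so right-multiplying by R^T rotates each one.
    return scale * points @ R.T + np.array([tx, ty])

unit_vectors = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
# Scale by 2, rotate by 90 degrees, shift by (1, -1).
moved = affine_transform([2.0, np.pi / 2, 1.0, -1.0], unit_vectors)
# (1, 0) -> (0, 2) -> (1, 1) and (0, 1) -> (-2, 0) -> (-1, -1)
```

With a single scaling factor and one angle, this is strictly a similarity transform, a restricted affine map that preserves angles.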

clustering(Z, eps_=None, per_=5)[source]

Perform clustering on the given 2D data using the DBSCAN algorithm.

Parameters:

  • Z (array-like): N x 2 list | np.ndarray representing the data points.

  • eps_ (float, optional): The maximum distance between two samples for one to be considered as in the neighborhood of the other. Defaults to None.

  • per_ (float, optional): The percentile value used to set the eps parameter if it is not provided. Defaults to 5.

Returns:

  • list | np.ndarray : An array of cluster labels assigned by the DBSCAN algorithm.

Notes:

If eps_ is not provided, it is calculated as a percentile of the pairwise Euclidean distances between points in the input data Z.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together data points that are close to each other and marks outliers as noise.
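The percentile rule in the notes can be sketched with SciPy and scikit-learn as below. The function name and the `min_samples` value (scikit-learn's default of 5) are assumptions; the documentation does not say which value the package uses.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN

def cluster_2d(Z, eps=None, per=5, min_samples=5):
    """Label the rows of Z with DBSCAN; -1 marks noise points."""
    Z = np.asarray(Z, dtype=float)
    if eps is None:
        # eps = per-th percentile of all pairwise Euclidean distances.
        eps = np.percentile(pdist(Z, metric="euclidean"), per)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(Z)

rng = np.random.default_rng(123)
Z_demo = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                    rng.normal(5.0, 0.1, size=(50, 2))])
labels_auto = cluster_2d(Z_demo)            # eps derived from the 5th percentile
labels_fixed = cluster_2d(Z_demo, eps=0.5)  # eps given explicitly
```

A small percentile gives a tight neighborhood and may leave sparse points labelled as noise; passing eps directly skips the percentile step.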

compute_variance_ratio(Sc)[source]

Compute eigenvalues, total variance, and explained variance ratio by principal component.

Parameters:

  • Sc: Array of singular values from SVD.

Returns:

  • explained_variance_ratio: Array of explained variance ratios.
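As a worked example of the relationship between singular values and explained variance: the eigenvalues of the covariance matrix are proportional to the squared singular values, so each ratio is a squared singular value divided by the sum of all of them. The sketch below uses a hypothetical function name and omits the 1/(N - 1) factor, which cancels in the ratio.

```python
import numpy as np

def explained_variance_ratio(singular_values):
    """Share of total variance carried by each component."""
    s = np.asarray(singular_values, dtype=float)
    eigenvalues = s**2                  # proportional to covariance eigenvalues
    return eigenvalues / eigenvalues.sum()

ratios = explained_variance_ratio([3.0, 2.0, 1.0])
# Squares are 9, 4, 1 (total 14), so the ratios are 9/14, 4/14 and 1/14.
```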

data_scaler(X, feature_range=(0, 1))[source]

Scale the input data using MinMaxScaler if ‘sca’ is set to ‘minmax’.

Parameters:

  • X (array-like): Input matrix of shape N x P.

  • feature_range (tuple, optional): Tuple specifying the minimum and maximum values of the feature range. Defaults to (0, 1).

Returns:

  • An N x P scaled data matrix.
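For reference, min-max scaling maps each column linearly so that its smallest value lands on the lower bound of feature_range and its largest on the upper bound. The NumPy sketch below illustrates that formula; it is not the package's code, which the description says relies on MinMaxScaler. The function name and the handling of constant columns are assumptions.

```python
import numpy as np

def minmax_scale(X, feature_range=(0, 1)):
    """Rescale each column of X linearly into feature_range."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    # Use a span of 1 for constant columns to avoid dividing by zero;
    # such columns end up at the lower bound.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span * (hi - lo) + lo

X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 40.0]])
X_scaled = minmax_scale(X_demo)
# Column 0 becomes 0, 0.5, 1; column 1 becomes 0, 1/3, 1.
```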

get_localbp_(tar_, Ck, databp)[source]

optimize_affine_transform(Zc, B, Sc, ind_)[source]

Optimize the parameters for the affine transformation.

Parameters:

  • Zc (array-like): Cluster data points (N x 2 array).

  • B (array-like): Matrix of vectors (2 x P) representing the original basis.

  • Sc (array-like): Singular values of the original basis.

  • ind_ (array-like): Boolean array indicating the indices of the cluster.

Returns:

  • Tuple: A tuple containing the optimized parameters and the transformed cluster points.

Notes:

This function performs optimization to find the best affine transformation parameters using the Nelder-Mead method. It then applies the optimized transformation to the cluster points.
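The optimization described in the notes can be illustrated with scipy.optimize.minimize. This sketch fits a scale, rotation and translation that map one point set onto another by minimizing the Frobenius norm of the residual. The helper names, the synthetic data, the identity starting guess and the tolerance options are assumptions; how the package builds its source and target sets from Zc, B, Sc and ind_ is not documented here.

```python
import numpy as np
from scipy.optimize import minimize

def transform(params, A):
    """Scale, rotate and translate the rows of A (an N x 2 array)."""
    scale, angle, tx, ty = params
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return scale * A @ R.T + np.array([tx, ty])

def registration_error(params, A, B):
    """Frobenius norm of the gap between transformed A and target B."""
    return np.linalg.norm(transform(params, A) - B)

# Synthetic example: B is a known transformation of A.
rng = np.random.default_rng(123)
A_pts = rng.normal(size=(30, 2))
B_pts = transform([1.5, 0.4, 0.3, -0.2], A_pts)

x0 = np.array([1.0, 0.0, 0.0, 0.0])   # start from the identity transform
result = minimize(registration_error, x0, args=(A_pts, B_pts),
                  method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000})
A_fitted = transform(result.x, A_pts)  # source points after the fitted transform
```

Nelder-Mead needs no gradients, which suits this objective because the norm is not differentiable at a perfect fit.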

pca_by_SVD(X)[source]

Perform PCA through singular value decomposition (SVD).

Parameters:

  • X: list | np.ndarray Input data N x P.

Returns:

  • U, S, VT, S_, A, B

Details:

Singular Value Decomposition

$\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^\top = \mathbf{U}\mathbf{S}^{0.5}\mathbf{S}^{0.5}\mathbf{V}^\top = \mathbf{A}\mathbf{B}^\top$

$\mathbf{X}\in \mathbb{R}^{N \times P}$

$\mathbf{U}\in \mathbb{R}^{N \times M}$

$\mathbf{V}\in \mathbb{R}^{P \times M}$

$\mathbf{S}\in \mathbb{R}^{M \times M}$

$\mathbf{A} = \mathbf{U}\mathbf{S}^{0.5} \in \mathbb{R}^{N \times M}$

$\mathbf{B} = \mathbf{V}\mathbf{S}^{0.5} \in \mathbb{R}^{P \times M}$

$M = \min(N,P)$
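The factorization above can be checked numerically with numpy.linalg.svd. The sketch below is illustrative only: the random data and the column-centering step (the usual preprocessing for PCA) are assumptions, and the method itself may scale or center differently.

```python
import numpy as np

rng = np.random.default_rng(123)
X_demo = rng.normal(size=(6, 4))           # N = 6 samples, P = 4 variables
X_demo = X_demo - X_demo.mean(axis=0)      # column-center (assumed preprocessing)

# Thin SVD: U is N x M, s holds the M singular values, VT is M x P.
U, s, VT = np.linalg.svd(X_demo, full_matrices=False)

S_half = np.diag(np.sqrt(s))               # S^{0.5}
A = U @ S_half                             # sample coordinates, N x M
B = VT.T @ S_half                          # variable coordinates, P x M

reconstruction = A @ B.T                   # equals X up to round-off
```

Splitting the singular values evenly between A and B means neither the sample points nor the variable vectors absorb all of the scale, and keeping only the first two columns of each gives the 2D biplot coordinates.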

plot_transformed_clusters(ax, ZcA, VA, cmap, arrow_size=0.05)[source]

Plot the non-linear local-Biplot SVD.

Parameters:

  • ax (matplotlib.axes._subplots.AxesSubplot): Axes on which to plot.

  • ZcA (numpy.ndarray): Transformed points of the cluster.

  • VA (numpy.ndarray): Transformed vector arrows of the cluster.

  • cmap: Color map for the scatter plot.

  • arrow_size (float, optional): Size of the plotted arrows. Defaults to 0.05.

Returns:

None

reduce_dimensions(X)[source]

Reduce the dimensionality of the input data using t-SNE, PCA, or UMAP.

Parameters:

  • X (array-like): Input matrix of shape N x P to be reduced.

Returns:

  • array-like: The dimensionality-reduced data, of shape N x 2.
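A dispatch over the three reducers might look like the sketch below, written against scikit-learn and, for UMAP, the optional umap-learn package. The function name, the rounding of the default perplexity to an integer, and the way the seed is passed to each estimator are assumptions, not the package's documented behavior.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def reduce_to_2d(X, method="tsne", perplexity=None, random_seed=123):
    """Project the rows of X onto two dimensions."""
    X = np.asarray(X, dtype=float)
    if method == "pca":
        return PCA(n_components=2, random_state=random_seed).fit_transform(X)
    if method == "tsne":
        if perplexity is None:
            # Default described for the class: square root of the sample count.
            perplexity = int(np.sqrt(X.shape[0]))
        tsne = TSNE(n_components=2, perplexity=perplexity,
                    random_state=random_seed)
        return tsne.fit_transform(X)
    if method == "umap":
        import umap  # optional dependency (umap-learn), imported only when needed
        return umap.UMAP(n_components=2,
                         random_state=random_seed).fit_transform(X)
    raise ValueError(f"unknown reduction method: {method!r}")

rng = np.random.default_rng(123)
X_demo = rng.normal(size=(40, 5))
Z_pca = reduce_to_2d(X_demo, method="pca")
Z_tsne = reduce_to_2d(X_demo, method="tsne")
```

t-SNE requires the perplexity to be smaller than the number of samples, which the square-root default satisfies for any dataset with more than one sample.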

registration_errorM(parameters, array_A, array_B)[source]

Compute the registration error between two sets of 2D points after applying an affine transformation.

Parameters:

  • parameters (array-like): Affine transformation parameters.

  • array_A (array-like): Source set of 2D points (N x 2 array).

  • array_B (array-like): Target set of 2D points (N x 2 array).

Returns:

  • float: Registration error, calculated as the Frobenius norm of the difference between the transformed source points and the target points.