localbiplot package
local biplot
- class localbiplot.LocalBiplot(X, labels=None, perplexity=None, red='tsne', sca='minmax', random_seed=123)
Bases: object
Object for data analysis using linear and non-linear biplots obtained by singular value decomposition (SVD) and a generalized SVD.
This class implements a set of functions for data analysis, including scaling, dimensionality reduction, kernel calculation, and biplot computation and display.
- X
Input matrix of shape N x P.
- Type:
pd.DataFrame
- labels
Labels for the samples (default is None).
- Type:
array-like, optional
- perplexity
Perplexity for t-SNE (default is calculated as the square root of N).
- Type:
int or None, optional
- red
Dimensionality reduction method (‘tsne’ by default).
- Type:
{‘tsne’, ‘pca’, ‘umap’}, default is ‘tsne’
- sca
Data scaling method (‘minmax’ by default).
- Type:
{‘minmax’}, default is ‘minmax’
- random_seed
Seed for result reproducibility.
- Type:
int, default is 123
- data_scaler(X, feature_range=(0, 1))
Scale the data using MinMaxScaler if ‘sca’ is set to ‘minmax’.
- krbf(X)
Calculate the Radial Basis Function (RBF) kernel matrix for the input data.
- center_kernel(K)
Center a given kernel matrix using the Kernel Centering method.
- laplacian_score(X, K, tol=1e-10)
Calculate the Laplacian score for a given dataset and kernel matrix.
- lnkbp_()
Process and analyze the data through steps such as scaling, dimensionality reduction, kernel calculations, and Laplacian Score computation.
- localbp_(X_)
Perform a local biplot operation on the scaled data (currently commented out).
- GMD(X, H, Q, K)
Generalized Matrix Decomposition method (power method) for a given dataset and kernel matrices.
- biplot_gmd_body(fit, index=None, names=None, sample_col='grey50', sample_pch=19, arrow_col='orange', arrow_cex=1)
Generate a GMD-biplot based on generalized matrix decomposition results.
- plot_lnkbp_(hue, c, figsize=(25, 10))
Plot various visualizations, including scatter plots, kernel matrices, and feature relevance.
- affine_transformM(parameters, array_A)
Apply an affine transformation to the input array using the given parameters.
- registration_errorM(parameters, array_A, array_B)
Compute the registration error between two sets of 2D points after applying an affine transformation.
…
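The two kernel steps named in the list above (krbf and center_kernel) can be sketched with NumPy as follows. This is a minimal illustration, not the package's implementation: the function names rbf_kernel_sketch and center_kernel_sketch are hypothetical, and the bandwidth rule (median pairwise distance) is an assumption, since the documentation does not say how krbf chooses it.

```python
import numpy as np

def rbf_kernel_sketch(X, sigma=None):
    """RBF kernel K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X**2, axis=1)
    # Squared Euclidean distances via the expansion ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    if sigma is None:
        # Assumption: median of the pairwise distances as bandwidth
        upper = np.triu_indices_from(sq_dists, k=1)
        sigma = np.median(np.sqrt(sq_dists[upper]))
    return np.exp(-sq_dists / (2.0 * sigma**2))

def center_kernel_sketch(K):
    """Double-center a kernel matrix: Kc = H K H with H = I - (1/N) 11^T."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H
```

After centering, every row and column of the kernel matrix sums to zero, which corresponds to centering the data in the implicit feature space.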
- LocalBiplot_()
Process and analyze the data using a series of steps, including scaling, dimensionality reduction, kernel calculations, and Laplacian score computation.
Returns:
LocalBiplot instance: The modified instance with processed and analyzed data.
- affine_transformM(parameters, array_A)
Apply an affine transformation to the input array using the given parameters.
Parameters:
parameters (array-like): Affine transformation parameters.
parameters[0]: Scaling factor
parameters[1]: Rotation angle (in radians)
parameters[2:]: Translation along x and y axes
array_A (array-like): Input array to be transformed.
Returns:
array-like: Transformed array after applying the affine transformation.
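A transformation with this parameter layout can be sketched as below. The name affine_transform_sketch is hypothetical, and the order of operations (rotate, then scale, then translate) is an assumption; the documentation lists the parameters but not how they are composed.

```python
import numpy as np

def affine_transform_sketch(parameters, array_A):
    """Apply scale * R(theta) to each 2D point, then add the translation."""
    scale, theta = parameters[0], parameters[1]
    translation = np.asarray(parameters[2:4], dtype=float)
    # Standard 2D counter-clockwise rotation matrix
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Points are rows, so multiply by R transposed
    return scale * np.asarray(array_A, dtype=float) @ R.T + translation

points = np.array([[1.0, 0.0], [0.0, 1.0]])
# Scale by 2, rotate by 90 degrees, shift by (1, -1)
moved = affine_transform_sketch([2.0, np.pi / 2, 1.0, -1.0], points)
```

With these inputs, the point (1, 0) is rotated to (0, 1), scaled to (0, 2), and shifted to (1, 1).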
- clustering(Z, eps_=None, per_=5)
Perform clustering on the given 2D data using the DBSCAN algorithm.
Parameters:
Z (array-like): N x 2 list | np.ndarray representing the data points.
eps_ (float, optional): The maximum distance between two samples for one to be considered as in the neighborhood of the other. Defaults to None.
per_ (float, optional): The percentile value used to set eps_ when it is not provided. Defaults to 5.
Returns:
list | np.ndarray: An array of cluster labels assigned by the DBSCAN algorithm.
Notes:
If eps_ is not provided, it is calculated as a percentile of the pairwise Euclidean distances between points in the input data Z.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together data points that are close to each other and marks outliers as noise.
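The percentile rule described in the notes can be sketched as follows. The helper name eps_from_percentile is hypothetical, and counting each pair once while excluding zero self-distances is an assumption about how the pairwise distances are collected.

```python
import numpy as np

def eps_from_percentile(Z, per_=5):
    """Return the per_-th percentile of the pairwise Euclidean distances in Z."""
    Z = np.asarray(Z, dtype=float)
    diffs = Z[:, None, :] - Z[None, :, :]
    dists = np.sqrt(np.sum(diffs**2, axis=-1))
    # Keep each unordered pair once and drop the zero diagonal
    upper = np.triu_indices(len(Z), k=1)
    return np.percentile(dists[upper], per_)

Z = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
eps_value = eps_from_percentile(Z, per_=50)
```

The resulting value would then be passed as the eps argument of a DBSCAN implementation such as sklearn.cluster.DBSCAN. A small percentile keeps neighborhoods tight, so only dense groups of points form clusters.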
- compute_variance_ratio(Sc)
Compute eigenvalues, total variance, and explained variance ratio by principal component.
Parameters:
Sc: Array of singular values from SVD.
Returns:
explained_variance_ratio: Array of explained variance ratios.
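One common way to obtain these ratios from singular values is sketched below. The name variance_ratio_sketch is hypothetical, and taking the eigenvalues as the squared singular values is an assumption; a constant factor such as 1 / (N - 1) would cancel in the ratio anyway.

```python
import numpy as np

def variance_ratio_sketch(Sc):
    """Explained variance ratio per component from singular values."""
    Sc = np.asarray(Sc, dtype=float)
    eigenvalues = Sc**2                  # proportional to component variances
    total_variance = eigenvalues.sum()
    return eigenvalues / total_variance

ratios = variance_ratio_sketch([3.0, 2.0, 1.0])
```

For singular values 3, 2, 1 the squared values are 9, 4, 1, so the first component accounts for 9/14 of the total variance.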
- data_scaler(X, feature_range=(0, 1))
Scale the input data using MinMaxScaler if ‘sca’ is set to ‘minmax’.
Parameters:
X (array-like): Input data matrix of shape N x P.
feature_range (tuple, optional): Tuple specifying the minimum and maximum values of the feature range. Defaults to (0, 1).
Returns:
An N x P scaled data matrix.
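The min-max rule itself is simple enough to sketch in plain NumPy. The function minmax_scale_sketch is a hypothetical stand-in that mirrors the per-column formula used by scikit-learn's MinMaxScaler; it is not the method's actual code.

```python
import numpy as np

def minmax_scale_sketch(X, feature_range=(0, 1)):
    """Rescale each column of X linearly to [lo, hi] = feature_range."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to lo, no division by zero
    return (X - col_min) / col_range * (hi - lo) + lo

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled = minmax_scale_sketch(X)
```

Each column is scaled independently, so features with very different units end up on the same range before dimensionality reduction.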
- optimize_affine_transform(Zc, B, Sc, ind_)
Optimize the parameters for the affine transformation.
Parameters:
Zc (array-like): Cluster data points (N x 2 array).
B (array-like): Matrix of vectors (2 x P) representing the original basis.
Sc (array-like): Singular values of the original basis.
ind_ (array-like): Boolean array indicating the indices of the cluster.
Returns:
Tuple: A tuple containing the optimized parameters and the transformed cluster points.
Notes:
This function performs optimization to find the best affine transformation parameters using the Nelder-Mead method. It then applies the optimized transformation to the cluster points.
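The optimization pattern described in the notes can be sketched with scipy.optimize.minimize. Everything here is illustrative: the helper names are hypothetical, the transformation is assumed to be rotate, scale, then translate, and the sketch fits one 2D point set to another directly. How the method derives its source and target points from Zc, B, Sc, and ind_ is not documented and is not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

def affine_sketch(params, pts):
    """Assumed layout: params = [scale, angle, tx, ty]."""
    params = np.asarray(params, dtype=float)
    s, theta = params[0], params[1]
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * pts @ R.T + params[2:4]

def registration_error_sketch(params, src, dst):
    """Frobenius norm between transformed source points and target points."""
    return np.linalg.norm(affine_sketch(params, src) - dst)

def fit_affine_sketch(src, dst):
    """Search for the parameters with Nelder-Mead, starting from the identity."""
    x0 = np.array([1.0, 0.0, 0.0, 0.0])
    result = minimize(registration_error_sketch, x0,
                      args=(src, dst), method="Nelder-Mead")
    return result.x, affine_sketch(result.x, src)
```

Nelder-Mead is derivative-free, which suits an objective like the Frobenius norm that is not differentiable at a perfect fit.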
- pca_by_SVD(X)
Perform singular value decomposition (SVD).
Parameters:
X: list | np.ndarray Input data N x P.
Returns:
U, S, VT, S_, A, B
Details:
Singular Value Decomposition
(Note: use .. math:: instead of $$.)
$\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^\top = \mathbf{U}\mathbf{S}^{0.5}\mathbf{S}^{0.5}\mathbf{V}^\top = \mathbf{A}\mathbf{B}^\top$
$\mathbf{X} \in \mathbb{R}^{N \times P}$
$\mathbf{U} \in \mathbb{R}^{N \times M}$
$\mathbf{V} \in \mathbb{R}^{P \times M}$
$\mathbf{S} \in \mathbb{R}^{M \times M}$
$\mathbf{A} = \mathbf{U}\mathbf{S}^{0.5} \in \mathbb{R}^{N \times M}$
$\mathbf{B} = \mathbf{V}\mathbf{S}^{0.5} \in \mathbb{R}^{P \times M}$
$M = \min(N, P)$
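The factorization above can be checked numerically with NumPy. The function svd_factors_sketch is a hypothetical illustration; it omits the S_ output, whose meaning is not documented, and it applies no centering because the documentation does not say whether the method centers X first.

```python
import numpy as np

def svd_factors_sketch(X):
    """Thin SVD of X plus the symmetric split A = U S^0.5, B = V S^0.5."""
    X = np.asarray(X, dtype=float)
    # full_matrices=False gives the thin SVD with M = min(N, P) components
    U, S, VT = np.linalg.svd(X, full_matrices=False)
    sqrt_S = np.sqrt(S)
    A = U * sqrt_S        # same as U @ diag(S^0.5), shape N x M
    B = VT.T * sqrt_S     # same as V @ diag(S^0.5), shape P x M
    return U, S, VT, A, B

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
U, S, VT, A, B = svd_factors_sketch(X)
X_rebuilt = A @ B.T       # reconstructs X up to round-off
```

Splitting the singular values evenly between A and B puts sample coordinates (rows of A) and variable coordinates (rows of B) on comparable scales.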
- plot_transformed_clusters(ax, ZcA, VA, cmap, arrow_size=0.05)
Plot the non-linear local-Biplot SVD.
Parameters:
ax (matplotlib.axes._subplots.AxesSubplot): Axes on which to plot.
ZcA (numpy.ndarray): Transformed points of the cluster.
VA (numpy.ndarray): Transformed vector arrows of the cluster.
cmap: Color map for the scatter plot.
arrow_size (float, optional): Size of the plotted arrows. Defaults to 0.05.
Returns:
None
- reduce_dimensions(X)
Reduce the dimensionality of the input data using t-SNE, PCA, or UMAP.
Parameters:
X (array-like): Input matrix of shape N x P to be reduced.
Returns:
array-like: N x 2 matrix of dimensionality-reduced data.
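Of the three methods, PCA is the only one that is easy to show without extra libraries, so the sketch below covers that option alone. The function pca_2d_sketch is hypothetical and is not the package's code path. The t-SNE and UMAP options depend on external implementations and are not shown.

```python
import numpy as np

def pca_2d_sketch(X):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    # Rows of VT are the principal directions, ordered by singular value
    _, _, VT = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ VT[:2].T                         # N x 2 embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Z = pca_2d_sketch(X)
```

The output has one row per sample and two columns, matching the N x 2 shape that the downstream clustering step expects.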
- registration_errorM(parameters, array_A, array_B)
Compute the registration error between two sets of 2D points after applying an affine transformation.
Parameters:
parameters (array-like): Affine transformation parameters.
array_A (array-like): Source set of 2D points (N x 2 array).
array_B (array-like): Target set of 2D points (N x 2 array).
Returns:
float: Registration error, calculated as the Frobenius norm of the difference between the transformed source points and the target points.
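Written out, the error described above takes the following form, where $\mathbf{a}_i$ and $\mathbf{b}_i$ denote the i-th rows of array_A and array_B and $\mathbf{p}$ is the parameter vector. The explicit form of the transformation (rotate, then scale, then translate) is an assumption based on the parameter list of affine_transformM, not a confirmed detail of the implementation.
$E(\mathbf{p}) = \lVert T_{\mathbf{p}}(\text{array\_A}) - \text{array\_B} \rVert_F = \sqrt{\sum_{i=1}^{N} \lVert T_{\mathbf{p}}(\mathbf{a}_i) - \mathbf{b}_i \rVert_2^2}$
$T_{\mathbf{p}}(\mathbf{a}_i) = s\,\mathbf{R}(\theta)\,\mathbf{a}_i + \mathbf{t}$
$\mathbf{p} = (s, \theta, t_x, t_y), \quad \mathbf{t} = (t_x, t_y)^\top$
The error is zero only when every transformed source point coincides with its target point.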