nptsne API Reference
Module summary
API Reference
The main API classes are:
- t-SNE classes
TextureTsne: linear tSNE simple API
TextureTsneExtended: linear tSNE advanced API wrapper with additional functionality
- HSNE classes
Full details are in the reference below.
Code examples
The examples in this documentation rely on the doctest script run_doctest.py to prepare the sample data they use. Refer to the doctest code in the repository or to the demo list for more information.
nptsne: t-SNE and HSNE data embedding
HSne | Initialize an HSne wrapper with logging state.
HSneScale | Create a wrapper for the HSNE data scale.
TextureTsne | Create a wrapper class for the linear tSNE implementation.
TextureTsneExtended | Create an extended functionality wrapper for the linear tSNE implementation.
KnnAlgorithm | Enumeration used to select the knn algorithm used.
A numpy compatible python extension for GPGPU linear complexity t-SNE and HSNE
This package contains classes that wrap linear complexity t-SNE and classes to support HSNE.
Available subpackages
- hsne_analysis
Provides classes for selection-driven navigation of the HSNE model and mapping back to the original data. The classes are intended to support visual analytics.
Notes
ndarray types are the preferred parameter types for input, and where possible internal data in the wrapped t-SNE [1] and HSNE [2] is returned without a copy in an ndarray.
References
[1] Pezzotti, N. et al., GPGPU Linear Complexity t-SNE Optimization
[2] Pezzotti, N. et al., Hierarchical Stochastic Neighbor Embedding
class nptsne.HSne(self: nptsne.libs._nptsne.HSne, verbose: bool = False) → None
Bases: pybind11_builtins.pybind11_object
Initialize an HSne wrapper with logging state.
- Parameters
- verbose: bool
Enable verbose logging to standard output. Default is False.
Notes
HSne is a simple wrapper API for the Hierarchical SNE implementation.
Hierarchical SNE is a GPU compute shader implementation of Hierarchical Stochastic Neighbor Embedding described in [1].
The wrapper can be used to create a new HSNE analysis or to load an existing one. The HSNE analysis is then held in the HSne instance and can be accessed through the class API.
References
Examples
Create an HSNE wrapper
>>> import nptsne
>>> hsne = nptsne.HSne(True)
- Attributes
num_data_points
int: The number of data points in the HSne.
num_dimensions
int: The number of dimensions associated with the original data.
num_scales
int: The number of scales in the HSne.
create_hsne(*args, **kwargs)
Overloaded function.
create_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], num_scales: int) -> bool
create_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], num_scales: int, point_ids: numpy.ndarray[numpy.uint64]) -> bool
Create the HSNE analysis data hierarchy from the input data with the number of scales required, optionally with user-assigned point ids.
- Parameters
- X: ndarray
The input data used to create the hierarchy. Shape is (num. data points, num. dimensions).
- num_scales: int
How many scales to create in the HSNE analysis
- point_ids: ndarray, optional
Array of ids associated with the data points
Examples
>>> import nptsne
>>> hsne = nptsne.HSne(True)
>>> hsne.create_hsne(sample_hsne_data, 3)
True
>>> hsne.num_data_points
10000
>>> hsne.num_dimensions
16
>>> hsne.num_scales
3
get_scale(self: nptsne.libs._nptsne.HSne, scale_number: int) → HSneScale
Get the scale information at the index. 0 is the HSNE data scale.
- Parameters
- scale_number: int
Index of the scale to retrieve
- Returns
HSneScale
A wrapper for the scale at the given index
Examples
The number of landmarks in scale 0 is the number of data points.
>>> scale = sample_hsne.get_scale(0)
>>> scale.num_points
10000
load_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], file_path: str) → bool
Load the HSNE analysis data hierarchy from a pre-existing HSNE file.
- Parameters
- X: ndarray
The data used to create the saved file. Shape is (num. data points, num. dimensions).
- file_path: str
Path to the saved HSNE file
Examples
Load an HSNE from a file and check that it contains the expected data
>>> import nptsne
>>> import doctest
>>> loaded_hsne = nptsne.HSne(True)
>>> loaded_hsne.load_hsne(sample_hsne_data, sample_hsne_file)
True
>>> loaded_hsne.num_data_points
10000
>>> loaded_hsne.num_dimensions
16
>>> loaded_hsne.num_scales
3
static read_num_scales(file_path: str) → int
Read the number of scales defined in stored HSNE data without fully loading the file.
- Parameters
- file_path: str
The path to a saved HSNE file
- Returns
- int
The number of scales in the saved hierarchy
Examples
Read the number of scales from a saved file
>>> import nptsne
>>> nptsne.HSne.read_num_scales(sample_hsne_file)
3
save(self: nptsne.libs._nptsne.HSne, file_path: str) → None
Save the HSNE as a binary structure to a file.
- Parameters
- file_path: str
The file to save to. If it already exists it is overwritten.
Examples
Save the HSNE to a file and check that the number of scales was saved correctly.
>>> import nptsne
>>> from pathlib import Path
>>> from tempfile import gettempdir
>>> savepath = Path(gettempdir(), "save_test.hsne")
>>> sample_hsne.save(str(savepath))
>>> nptsne.HSne.read_num_scales(str(savepath))
3
property num_data_points
int: The number of data points in the HSne.
Examples
>>> sample_hsne.num_data_points
10000
property num_dimensions
int: The number of dimensions associated with the original data.
Examples
>>> sample_hsne.num_dimensions
16
property num_scales
int: The number of scales in the HSne.
Examples
>>> sample_hsne.num_scales
3
class nptsne.HSneScale(self: nptsne.libs._nptsne.HSneScale, hsne: nptsne.libs._nptsne.HSne, scale_number: int) → None
Bases: pybind11_builtins.pybind11_object
Create a wrapper for the HSNE data scale. The function HSne.get_scale() is a more direct way to obtain a scale than calling the constructor of this class.
- Parameters
- hsne: HSne
The hierarchical SNE being explored
- scale_number: int
The scale from the hsne to wrap
Examples
Using the initializer to create an HSneScale wrapper. Scale 0 contains the data points. (Prefer the HSne.get_scale function.)
>>> import nptsne
>>> scale = nptsne.HSneScale(sample_hsne, 0)
>>> scale.num_points
10000
- Attributes
num_points
int: The number of landmark points in this scale
transition_matrix
The transition (probability) matrix in this scale.
landmark_orig_indexes
Original data indexes for each landmark in this scale.
get_landmark_weight(self: nptsne.libs._nptsne.HSneScale) → numpy.ndarray[numpy.float32]
The weights per landmark in the scale.
- Returns
ndarray
Weights array in landmark index order
Examples
The size of landmark weights should match the number of points
>>> num_points = sample_scale2.num_points
>>> weights = sample_scale2.get_landmark_weight()
>>> weights.shape[0] == num_points
True
All weights at scale 0 should be 1.0
>>> weights = sample_scale0.get_landmark_weight()
>>> test = weights[0] == 1.0
>>> test.all()
True
property area_of_influence
The area of influence matrix in this scale.
- Returns
- list(list(tuple)):
The area of influence matrix in this scale
Notes
The return is in list-of-lists (LIL) format. The list returned has one entry for each landmark point i at scale s-1, \(\mathcal{L}_{i}^{s-1}\). Each entry is a list of tuples, where each tuple contains an index j for a landmark at scale s, \(\mathcal{L}_{j}^{s}\), and a value \(\mathit{I}^{s}(i,j)\) representing the probability that the landmark point i at scale s-1 is influenced by landmark j at scale s.
The resulting matrix is sparse.
Examples
The size of landmark area of influence should match the number of points in the more detailed (s-1) scale.
>>> len(sample_scale2.area_of_influence) == sample_scale1.num_points
True
Loop over all the landmarks, i, at scale 1. Sum the influences from each landmark j at scale 2 on the individual landmarks i in scale 1. For each landmark i at scale 1 the total influence from the j landmarks should be approximately 1.0. In this random data test the difference is assumed to be < \(1.5\mathrm{e}{-2}\).
>>> aoi_2on1 = sample_scale2.area_of_influence
>>> scale1_sum = {}
>>> all_tots_are_1 = True
>>> for i in aoi_2on1:
...     sum_inf = 0.0
...     for j_tup in i:
...         sum_inf += j_tup[1]
...     if abs(1 - sum_inf) > 0.015:
...         print(f"{1- sum_inf}")
...         all_tots_are_1 = False
>>> all_tots_are_1 == True
True
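The list-of-lists layout described in the Notes is indexed by the finer landmark i. A drill-down usually asks the opposite question: which landmarks at scale s-1 does a given landmark j at scale s influence? The following is a minimal sketch of that inversion on hand-made data. It does not call nptsne; the helper name invert_area_of_influence and the toy values are assumptions made for illustration.

```python
from collections import defaultdict


def invert_area_of_influence(aoi):
    """Map each coarse landmark j to a list of (i, influence) pairs.

    `aoi` follows the LIL layout described above: one entry per finer
    landmark i, each entry a list of (j, influence) tuples.
    """
    influenced_by = defaultdict(list)
    for i, row in enumerate(aoi):
        for j, influence in row:
            influenced_by[j].append((i, influence))
    return dict(influenced_by)


# Toy data (hypothetical): 3 landmarks at scale s-1, 2 landmarks at scale s.
toy_aoi = [
    [(0, 0.75), (1, 0.25)],
    [(0, 1.0)],
    [(1, 1.0)],
]

inverted = invert_area_of_influence(toy_aoi)
print(inverted[0])  # finer landmarks influenced by coarse landmark 0
print(inverted[1])  # finer landmarks influenced by coarse landmark 1
```

With real data the same function could be applied to the value of area_of_influence, under the assumption that its tuples are (index, value) pairs as the Notes state.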
property landmark_orig_indexes
Original data indexes for each landmark in this scale.
- Returns
ndarray
An ndarray of the original data indexes.
Examples
At scale 0 the landmarks are all the data points.
>>> sample_scale0.landmark_orig_indexes.shape
(10000,)
>>> sample_scale0.landmark_orig_indexes[0]
0
>>> sample_scale0.landmark_orig_indexes[9999]
9999
property num_points
int: The number of landmark points in this scale
Examples
>>> sample_scale0.num_points
10000
property transition_matrix
The transition (probability) matrix in this scale.
- Returns
- list(list(tuple)):
The transition (probability) matrix in this scale in list-of-lists form
Notes
The list returned has one entry for each landmark point, and each entry is a list. The inner list contains tuples where the first item is an integer landmark index in the scale and the second item is the transition matrix value for the two points.
The resulting matrix is sparse in list-of-lists (LIL) form, one list per row containing a list of (column number, value) tuples.
Examples
The size of the transition matrix should match the number of points
>>> sample_scale0.num_points == len(sample_scale0.transition_matrix)
True
>>> sample_scale1.num_points == len(sample_scale1.transition_matrix)
True
>>> sample_scale2.num_points == len(sample_scale2.transition_matrix)
True
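To make the LIL form concrete, the sketch below expands a small hand-made matrix into dense rows. It is only an illustration of the layout described in the Notes: the helper lil_to_dense and the toy values are assumptions, and the toy rows were chosen to sum to 1.0 so that the row sums are easy to check by eye.

```python
def lil_to_dense(lil_rows, num_columns):
    """Expand a list of [(column, value), ...] rows into dense row lists."""
    dense = []
    for row in lil_rows:
        dense_row = [0.0] * num_columns
        for column, value in row:
            dense_row[column] = value
        dense.append(dense_row)
    return dense


# Toy 3x3 matrix (hypothetical values) in LIL form: one list per row,
# each holding (column number, value) tuples for the non-zero entries.
toy_transition = [
    [(1, 0.5), (2, 0.5)],
    [(0, 1.0)],
    [(0, 0.25), (1, 0.75)],
]

dense = lil_to_dense(toy_transition, num_columns=3)
row_sums = [sum(row) for row in dense]

for row in dense:
    print(row)
print(row_sums)
```

A dense expansion is only practical for small scales; for scale 0 of the sample data it would need 10000 x 10000 entries, which is why the sparse form is returned.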
class nptsne.KnnAlgorithm(self: nptsne.libs._nptsne.KnnAlgorithm, value: int) → None
Bases: pybind11_builtins.pybind11_object
Enumeration used to select the knn algorithm used. Three possibilities are supported:
KnnAlgorithm.Flann: Knn using FLANN - Fast Library for Approximate Nearest Neighbors
KnnAlgorithm.HNSW: Knn using Hnswlib - fast approximate nearest neighbor search
KnnAlgorithm.Annoy: Knn using Annoy - Spotify Approximate Nearest Neighbors Oh Yeah
Members:
Flann
HNSW
Annoy
get_supported_metrics(self: int) → Dict[str, object]
Get a dict containing KnnDistanceMetric values supported by the KnnAlgorithm.
- Parameters
- knn_lib: KnnAlgorithm
The algorithm being queried.
- Returns
dict
A dict mapping metric names to the KnnDistanceMetric values supported by the algorithm
Examples
Each algorithm has different support. See the tests below.
>>> import nptsne
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.Flann)
>>> for i in support.items():
...     print(i[0])
Euclidean
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.Annoy)
>>> for i in support.items():
...     print(i[0])
Cosine
Dot
Euclidean
Manhattan
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.HNSW)
>>> for i in support.items():
...     print(i[0])
Euclidean
Inner Product
>>> support["Euclidean"] is nptsne.KnnDistanceMetric.Euclidean
True
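The support information printed above can be summarized as a small lookup, which is handy for checking an algorithm and metric combination before constructing a tSNE wrapper. The sketch below is a static mirror of the doctest output, not a call into nptsne: the table contents are copied from the example (assuming the HNSW output lists the two names "Euclidean" and "Inner Product"), and the helper names are hypothetical. The authoritative source remains KnnAlgorithm.get_supported_metrics.

```python
# Metric names per algorithm, copied from the doctest output above.
# Assumption: the HNSW output is the two keys "Euclidean" and "Inner Product".
SUPPORTED_METRICS = {
    "Flann": ["Euclidean"],
    "Annoy": ["Cosine", "Dot", "Euclidean", "Manhattan"],
    "HNSW": ["Euclidean", "Inner Product"],
}


def is_supported(algorithm_name, metric_name):
    """Return True if the metric is listed for the algorithm."""
    return metric_name in SUPPORTED_METRICS.get(algorithm_name, [])


def algorithms_for(metric_name):
    """Return the sorted algorithm names that list the given metric."""
    return sorted(
        name
        for name, metrics in SUPPORTED_METRICS.items()
        if metric_name in metrics
    )


print(is_supported("Flann", "Cosine"))
print(algorithms_for("Euclidean"))
print(algorithms_for("Cosine"))
```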
Annoy = <KnnAlgorithm.Annoy: 1>
Flann = <KnnAlgorithm.Flann: -1>
HNSW = <KnnAlgorithm.HNSW: 0>
property name
property value
class nptsne.KnnDistanceMetric(self: nptsne.libs._nptsne.KnnDistanceMetric, value: int) → None
Bases: pybind11_builtins.pybind11_object
Enumeration used to select the knn distance metric used. Five possibilities are supported:
KnnDistanceMetric.Euclidean: Euclidean metric for all algorithms
KnnDistanceMetric.InnerProduct: Inner Product metric for HNSW
KnnDistanceMetric.Cosine: Cosine metric for Annoy
KnnDistanceMetric.Manhattan: Manhattan metric for Annoy
KnnDistanceMetric.Hamming: Hamming metric for Annoy, not supported
KnnDistanceMetric.Dot: Dot metric for Annoy
Members:
Euclidean
Cosine
InnerProduct
Manhattan
Hamming
Dot
Cosine = <KnnDistanceMetric.Cosine: 1>
Dot = <KnnDistanceMetric.Dot: 5>
Euclidean = <KnnDistanceMetric.Euclidean: 0>
Hamming = <KnnDistanceMetric.Hamming: 4>
InnerProduct = <KnnDistanceMetric.InnerProduct: 2>
Manhattan = <KnnDistanceMetric.Manhattan: 3>
property name
property value
class nptsne.TextureTsne(self: nptsne.libs._nptsne.TextureTsne, verbose: bool = False, iterations: int = 1000, num_target_dimensions: int = 2, perplexity: int = 30, exaggeration_iter: int = 250, knn_algorithm: nptsne.libs._nptsne.KnnAlgorithm = KnnAlgorithm.Flann, knn_metric: nptsne.libs._nptsne.KnnDistanceMetric = KnnDistanceMetric.Euclidean) → None
Bases: pybind11_builtins.pybind11_object
Create a wrapper class for the linear tSNE implementation.
- Parameters
- verbose: bool
Enable verbose logging to standard output. Default is False.
- iterations: int
The number of iterations to perform. This must be at least 1000. Default is 1000.
- num_target_dimensions: int
The number of dimensions for the output embedding. Default is 2.
- perplexity: int
The tSNE parameter that defines the neighborhood size. Usually between 10 and 30. Default is 30.
- exaggeration_iter: int
The iteration when force exaggeration starts to decay. Default is 250.
- knn_algorithm: KnnAlgorithm
The knn algorithm used for the nearest neighbor calculation. The default is Flann; for less than 50 dimensions HNSW may be faster.
- knn_metric: KnnDistanceMetric
The knn distance metric used for the nearest neighbor calculation. The default is KnnDistanceMetric.Euclidean, the only supported metric for Flann.
Notes
TextureTsne is a GPU compute shader implementation of the gradient descent linear tSNE. If the system does not support OpenGL 4.3 and above, the implementation falls back to a texture rendering approach as described in [1].
References
[1] Pezzotti, N., Thijssen, J., Mordvintsev, A., Höllt, T., Van Lew, B., Lelieveldt, B.P.F., Eisemann, E., Vilanova, A. GPGPU Linear Complexity t-SNE Optimization. IEEE Transactions on Visualization and Computer Graphics 26, 1172–1181.
Examples
Create a TextureTsne wrapper
>>> import nptsne
>>> tsne = nptsne.TextureTsne(verbose=True, knn_algorithm=nptsne.KnnAlgorithm.Annoy)
>>> tsne.verbose
True
>>> tsne.iterations
1000
>>> tsne.num_target_dimensions
2
>>> tsne.perplexity
30
>>> tsne.exaggeration_iter
250
>>> tsne.knn_algorithm == nptsne.KnnAlgorithm.Annoy
True
fit_transform(self: nptsne.libs._nptsne.TextureTsne, X: numpy.ndarray[numpy.float32]) → numpy.ndarray[numpy.float32]
Fit X into an embedded space and return that transformed output.
- Parameters
- X: ndarray
The input data with shape (num. data points, num. dimensions)
- Returns
ndarray
A numpy array containing a flattened (1D) embedding
Examples
A 2D embedding is returned in the form of a numpy array [x0, y0, x1, y1, …].
>>> import nptsne
>>> tsne = nptsne.TextureTsne()
>>> embedding = tsne.fit_transform(sample_tsne_data)
>>> embedding.shape
(4000,)
>>> import numpy
>>> embedding.dtype == numpy.float32
True
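Because the embedding comes back flattened as [x0, y0, x1, y1, …], it usually has to be regrouped into one coordinate tuple per point before plotting. The following sketch shows the regrouping on a short plain list so that it runs without nptsne. The helper name unflatten_embedding is hypothetical, and the assumption that embeddings with more than two target dimensions are interleaved in the same way is not confirmed by this reference.

```python
def unflatten_embedding(flat, num_target_dimensions=2):
    """Group a flat [x0, y0, x1, y1, ...] sequence into per-point tuples."""
    if len(flat) % num_target_dimensions != 0:
        raise ValueError("flat length is not a multiple of the target dimensions")
    return [
        tuple(flat[start:start + num_target_dimensions])
        for start in range(0, len(flat), num_target_dimensions)
    ]


# Toy flat embedding for three points (hypothetical values).
flat = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
points = unflatten_embedding(flat)

# Separate coordinate lists, e.g. for a scatter plot.
xs = [p[0] for p in points]
ys = [p[1] for p in points]
print(points)
```

For an ndarray result the equivalent regrouping is embedding.reshape(-1, 2); the shape (4000,) in the example above then corresponds to 2000 points.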
property exaggeration_iter
int: The iteration where attractive force exaggeration starts to decay, set at initialization.
Notes
The gradient of the cost function used to iteratively optimize the embedding points \(y_i\) is a sum of an attractive and a repulsive force: \(\frac{\delta C}{\delta y_i} = 4(\phi * F_i^{attr} - F_i^{rep})\). The iterations up to exaggeration_iter increase the \(F_i^{attr}\) term by the factor \(\phi\), which then decays to 1.
Examples
>>> sample_texture_tsne.exaggeration_iter
250
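As a worked instance of the gradient expression in the Notes, the sketch below evaluates 4(φ F_attr − F_rep) for one point, once with an exaggeration factor and once with φ = 1. The force vectors and the value φ = 4.0 are made-up numbers chosen for easy arithmetic; this reference does not state the exaggeration factor the library uses.

```python
def exaggerated_gradient(f_attr, f_rep, phi):
    """Return 4 * (phi * F_attr - F_rep), component-wise, for one point."""
    return [4.0 * (phi * a - r) for a, r in zip(f_attr, f_rep)]


# Hypothetical 2D attractive and repulsive forces acting on one point.
f_attr = [0.5, -0.25]
f_rep = [0.25, 0.25]

# During early iterations the attractive term is scaled by phi (assumed 4.0).
early = exaggerated_gradient(f_attr, f_rep, phi=4.0)
# Once phi has decayed to 1 the plain gradient 4 * (F_attr - F_rep) remains.
late = exaggerated_gradient(f_attr, f_rep, phi=1.0)

print(early)
print(late)
```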
property iterations
int: The number of iterations, set at initialization.
Examples
>>> sample_texture_tsne.iterations
1000
property knn_algorithm
int: The KnnAlgorithm value, set at initialization.
Examples
>>> import nptsne
>>> sample_texture_tsne.knn_algorithm == nptsne.KnnAlgorithm.Flann
True
property knn_distance_metric
int: The KnnDistanceMetric value, set at initialization.
Examples
>>> import nptsne
>>> sample_texture_tsne.knn_distance_metric == nptsne.KnnDistanceMetric.Euclidean
True
property num_target_dimensions
int: The number of target dimensions, set at initialization.
Examples
>>> sample_texture_tsne.num_target_dimensions
2
property perplexity
int: The tSNE perplexity, set at initialization.
Examples
>>> sample_texture_tsne.perplexity
30
property verbose
bool: True if verbose logging is enabled. Set at initialization.
Examples
>>> sample_texture_tsne.verbose
False
class nptsne.TextureTsneExtended(self: nptsne.libs._nptsne.TextureTsneExtended, verbose: bool = False, num_target_dimensions: int = 2, perplexity: int = 30, knn_algorithm: nptsne.libs._nptsne.KnnAlgorithm = KnnAlgorithm.Flann, knn_metric: nptsne.libs._nptsne.KnnDistanceMetric = KnnDistanceMetric.Euclidean) → None
Bases: pybind11_builtins.pybind11_object
Create an extended functionality wrapper for the linear tSNE implementation.
- Parameters
- verbose: bool
Enable verbose logging to standard output. Default is False.
- num_target_dimensions: int
The number of dimensions for the output embedding. Default is 2.
- perplexity: int
The tSNE parameter that defines the neighborhood size. Usually between 10 and 30. Default is 30.
- knn_algorithm: KnnAlgorithm
The knn algorithm used for the nearest neighbor calculation. The default is Flann; for less than 50 dimensions HNSW may be faster.
- knn_metric: KnnDistanceMetric
The knn distance metric used for the nearest neighbor calculation. The default is KnnDistanceMetric.Euclidean, the only supported metric for Flann.
Notes
TextureTsneExtended offers additional control over the exaggeration decay compared to TextureTsne. Additionally it supports inputting an initial embedding. Linear tSNE is described in [1].
References
[1] Pezzotti, N., Thijssen, J., Mordvintsev, A., Höllt, T., Van Lew, B., Lelieveldt, B.P.F., Eisemann, E., Vilanova, A. GPGPU Linear Complexity t-SNE Optimization. IEEE Transactions on Visualization and Computer Graphics 26, 1172–1181.
Examples
Create a TextureTsneExtended wrapper
>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended(verbose=True, num_target_dimensions=2, perplexity=35, knn_algorithm=nptsne.KnnAlgorithm.Annoy)
>>> tsne.verbose
True
>>> tsne.num_target_dimensions
2
>>> tsne.perplexity
35
>>> tsne.knn_algorithm == nptsne.KnnAlgorithm.Annoy
True
- Attributes
decay_started_at
int: The iteration number when exaggeration decay started.
iteration_count
int: The number of completed iterations of tSNE gradient descent.
close(self: nptsne.libs._nptsne.TextureTsneExtended) → None
Release GPU resources for the transform.
init_transform(self: nptsne.libs._nptsne.TextureTsneExtended, X: numpy.ndarray[numpy.float32], initial_embedding: numpy.ndarray[numpy.float32] = array([], dtype=float32)) → bool
Initialize the transform with the given data and an optional initial embedding.
- Parameters
- X: ndarray
The input data with shape (num. data points, num. dimensions)
- initial_embedding: ndarray
An optional initial embedding. Shape should be (num. data points, num. output dimensions)
- Returns
- bool
True if successful, False otherwise
Examples
Create a TextureTsneExtended wrapper and initialize the data. This step performs the knn.
>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
reinitialize_transform(self: nptsne.libs._nptsne.TextureTsneExtended, initial_embedding: numpy.ndarray[numpy.float32] = array([], dtype=float32)) → None
Reinitialize the transform with an optional initial embedding. Knn is not recomputed. If no initial_embedding is supplied the embedding is re-randomized.
- Parameters
- initial_embedding: ndarray
An optional initial embedding. Shape should be (num. data points, num. output dimensions)
Examples
Create a TextureTsneExtended wrapper, initialize the data, run for 100 iterations and then reinitialize. The iteration count is reset to 0.
>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> embedding = tsne.run_transform(iterations=100)
>>> tsne.iteration_count
100
>>> tsne.reinitialize_transform()
>>> tsne.iteration_count
0
run_transform(self: nptsne.libs._nptsne.TextureTsneExtended, verbose: bool = False, iterations: int = 1000) → numpy.ndarray[numpy.float32]
Run the transform gradient descent for a number of iterations with the current settings for exaggeration.
- Parameters
- verbose: bool
Enable verbose logging to standard output.
- iterations: int
The number of iterations to run.
- Returns
ndarray
A numpy array containing a flattened (1D) embedding. Coordinates are arranged: x0, y0, x1, y1, …
Examples
Create a TextureTsneExtended wrapper, initialize the data and run for 250 iterations.
>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> embedding = tsne.run_transform(iterations=250)
>>> embedding.shape
(4000,)
>>> tsne.iteration_count
250
start_exaggeration_decay(self: nptsne.libs._nptsne.TextureTsneExtended) → None
Enable exaggeration decay. Effective on the next call to run_transform. From this point exaggeration decays over the following 150 iterations; the decay length is a fixed parameter. This call is only effective once.
- Raises
- RuntimeError
If the decay is already active. This can be ignored.
Examples
Starting exaggeration decay is recorded in the decay_started_at property.
>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> tsne.decay_started_at
-1
>>> embedding = tsne.run_transform(iterations=100)
>>> tsne.start_exaggeration_decay()
>>> tsne.decay_started_at
100
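The relationship between decay_started_at and the 150-iteration decay window can be pictured as a schedule for the exaggeration factor. The sketch below is an illustration only: the linear shape of the decay and the initial factor of 4.0 are assumptions, since this reference states only that the decay lasts 150 iterations and that -1 means decay has not started. The function name is hypothetical.

```python
def exaggeration_factor(iteration, decay_started_at, initial=4.0, decay_length=150):
    """Exaggeration factor at a given iteration.

    decay_started_at == -1 means decay has not started, mirroring the
    decay_started_at property. Linear decay and initial=4.0 are assumptions
    made for this sketch.
    """
    if decay_started_at < 0 or iteration <= decay_started_at:
        return initial
    elapsed = iteration - decay_started_at
    if elapsed >= decay_length:
        return 1.0
    return initial - (initial - 1.0) * elapsed / decay_length


# Decay not started: the factor stays at its initial value.
print(exaggeration_factor(50, decay_started_at=-1))
# Decay started at iteration 100, as in the example above.
for iteration in (100, 175, 250, 400):
    print(iteration, exaggeration_factor(iteration, decay_started_at=100))
```

Under these assumptions the factor reaches 1.0 at iteration 250, that is 150 iterations after the call to start_exaggeration_decay, and stays there.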
property decay_started_at
int: The iteration number when exaggeration decay started. Is -1 if exaggeration decay has not started.
Examples
Starting exaggeration decay is recorded in the decay_started_at property.
>>> sample_texture_tsne_extended.decay_started_at
-1
property iteration_count
int: The number of completed iterations of tSNE gradient descent.
Examples
>>> sample_texture_tsne_extended.iteration_count
0
property knn_algorithm
int: The KnnAlgorithm value, set at initialization.
Examples
>>> import nptsne
>>> sample_texture_tsne_extended.knn_algorithm == nptsne.KnnAlgorithm.Flann
True
property knn_distance_metric
int: The KnnDistanceMetric value, set at initialization.
Examples
>>> import nptsne
>>> sample_texture_tsne_extended.knn_distance_metric == nptsne.KnnDistanceMetric.Euclidean
True
property num_target_dimensions
int: The number of target dimensions, set at initialization.
Examples
>>> sample_texture_tsne_extended.num_target_dimensions
2
property perplexity
int: The tSNE perplexity, set at initialization.
Examples
>>> sample_texture_tsne_extended.perplexity
30
property verbose
bool: True if verbose logging is enabled. Set at initialization.
Examples
>>> sample_texture_tsne_extended.verbose
False
nptsne.hsne_analysis: HSNE visual analysis support submodule
Analysis | Create a new analysis as a child of an (optional) parent analysis.
AnalysisContainer | A dict of dicts to store analyses.
AnalysisModel | Create an analysis model tree with a top level Analysis containing all landmarks at the highest scale. The AnalysisModel initially contains only the top analysis, i.e. the HSNE scale with the least number of points.
EmbedderType | Enumeration used to select the embedder used.
SparseTsne | SparseTsne is a wrapper for an approximating tSNE CPU implementation as described in [1].
class nptsne.hsne_analysis.Analysis(self: nptsne.libs._nptsne._hsne_analysis.Analysis, hnse: nptsne.libs._nptsne.HSne, embedder_type: nptsne.libs._nptsne._hsne_analysis.EmbedderType, parent: nptsne.libs._nptsne._hsne_analysis.Analysis = None, parent_selection: List[int] = []) → None
Bases: pybind11_builtins.pybind11_object
Create a new analysis as a child of an (optional) parent analysis.
- Parameters
- hsne: HSne
The hierarchical SNE being explored
- embedder_type: EmbedderType
The tSNE embedder to use, CPU or GPU based
- parent: Analysis, optional
The parent Analysis (where the selection was performed), if any
- parent_selection: list, optional
List of selection indexes in the parent analysis.
Notes
Together with AnalysisModel this class provides support for visual analytics of an HSNE. The Analysis class holds the chosen landmarks at a particular scale and also permits referencing back to the original data. Additionally a t-SNE embedder is included (a choice is provided between GPU and CPU implementations) which can be used to create an embedding of the selected landmarks.
Examples
The Analysis constructor is meant for use by nptsne.hsne_analysis.AnalysisModel. The example here illustrates how a top level analysis would be created from a sample hsne.
>>> import nptsne
>>> top_analysis = nptsne.hsne_analysis.Analysis(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> top_analysis.scale_id
2
>>> sample_hsne.get_scale(top_analysis.scale_id).num_points == top_analysis.number_of_points
True
- Attributes
number_of_points
int : number of landmarks in this Analysis
parent_id
int : Unique id of the parent analysis
transition_matrix
list(dict) : The transition (probability) matrix in this Analysis
landmark_weights
ndarray : the weights for the landmarks in this Analysis
landmark_indexes
ndarray : the indexes for the landmarks in this Analysis
landmark_orig_indexes
ndarray : the original data indexes for the landmarks in this Analysis
embedding
ndarray : the tSNE embedding generated for this Analysis
do_iteration(self: nptsne.libs._nptsne._hsne_analysis.Analysis) → None
Perform one iteration of the chosen embedder.
get_area_of_influence(self: nptsne.libs._nptsne._hsne_analysis.Analysis, select_list: List[int], threshold: float = 0.3) → numpy.ndarray[numpy.float32]
Get the area of influence of the selection in the original data. For more information on the threshold refer to the HSNE paper, section 4.2 "Filtering and drilling down".
A faster but less accurate approach to obtaining the area of influence is get_fast_area_of_influence.
- Parameters
- select_list: list
A list of selection indexes for landmarks in this analysis
- threshold: float, optional
The minimum value required for the underlying data point to be considered in the landmark's region of influence. Default is 0.3. The parameter must be in the range 0 to 1.0; values outside this range are ignored.
- Returns
ndarray
The mask of the original points represented by the selected landmarks. If the point is in the AOI the value is 1.
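The role of the threshold parameter can be illustrated with a plain thresholding step: a data point belongs to the area of influence when its influence value reaches the threshold. The sketch below works on made-up influence values and does not reproduce how nptsne computes them. The helper name is hypothetical, the use of >= for "minimum value required" is an assumption, and the sketch raises on an out-of-range threshold, which is a choice of this sketch rather than the library's documented behaviour.

```python
def area_of_influence_mask(influence, threshold=0.3):
    """Return a mask with 1.0 where the influence reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        # Sketch-only behaviour: reject thresholds outside 0 to 1.0.
        raise ValueError("threshold must be in the range 0 to 1.0")
    return [1.0 if value >= threshold else 0.0 for value in influence]


# Hypothetical influence of the selected landmarks on five data points.
toy_influence = [0.9, 0.3, 0.1, 0.0, 0.6]

default_mask = area_of_influence_mask(toy_influence)
strict_mask = area_of_influence_mask(toy_influence, threshold=0.5)
print(default_mask)
print(strict_mask)
```

Raising the threshold shrinks the selected area, which is the filtering effect the parameter is meant to provide.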
get_fast_area_of_influence(self: nptsne.libs._nptsne._hsne_analysis.Analysis, select_list: List[int]) → numpy.ndarray[numpy.float32]
Fast method to get the area of influence of the selection in the original data, based on a non-overlapping \({1}\rightarrow{n}\) mapping of scale landmarks to original data points.
This mapping is derived by working bottom up from the data points and finding the landmarks at each scale with the maximum influence. The mapping is calculated once on the first call to this function so subsequent calls are fast.
Due to thresholding it is possible that a data point may have no representative landmark at a specific scale.
- Parameters
- select_list: list
A list of selection indexes for landmarks in this analysis
- Returns
ndarray
The mask of the original points represented by the selected landmarks. If the point is in the AOI the value is 1.
Examples
Demonstrate the non-overlap of the area of influence for each landmark.
>>> import math
>>> import numpy as np
>>> all_top_landmarks = list(range(0, sample_analysis.number_of_points))
>>> all_influenced = sample_analysis.get_fast_area_of_influence(all_top_landmarks)
>>> all_influenced.shape[0] == 10000
True
Accumulate the individual landmark AOIs and check the total
>>> infl_accum = np.zeros((10000,), dtype=np.float32)
>>> total = 0
>>> for i in all_top_landmarks:
...     influenced = sample_analysis.get_fast_area_of_influence([i])
...     total = total + influenced.sum()
...     infl_accum = np.add(infl_accum, influenced)
>>> total == 10000
True
Verify that all AOIs are non-overlapping: each data point occurs once and only once.
>>> np.all(infl_accum == 1)
True
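The non-overlapping 1→n mapping described above can be sketched for a single step between two scales: each finer point is assigned to the one coarser landmark with the maximum influence, so every point has at most one representative. The code below uses hand-made LIL data in the layout documented for HSneScale.area_of_influence. It is a simplified illustration with hypothetical helper names, not the library's implementation, which works bottom up across all scales.

```python
def best_landmark_per_point(aoi):
    """For each finer point pick the coarser landmark with maximum influence.

    Returns None for a point whose row is empty, i.e. a point with no
    representative landmark.
    """
    mapping = []
    for row in aoi:
        if row:
            best_j, _ = max(row, key=lambda pair: pair[1])
            mapping.append(best_j)
        else:
            mapping.append(None)
    return mapping


def selection_mask(mapping, selected):
    """Return 1.0 for every point mapped to one of the selected landmarks."""
    chosen = set(selected)
    return [1.0 if j in chosen else 0.0 for j in mapping]


# Toy data (hypothetical): four finer points, two coarser landmarks.
toy_aoi = [
    [(0, 0.75), (1, 0.25)],
    [(1, 0.6), (0, 0.4)],
    [(1, 1.0)],
    [],  # no landmark above threshold for this point
]

mapping = best_landmark_per_point(toy_aoi)
mask_0 = selection_mask(mapping, [0])
mask_1 = selection_mask(mapping, [1])
# Summing the per-landmark masks never exceeds 1: the areas do not overlap.
overlap = [a + b for a, b in zip(mask_0, mask_1)]
print(mapping, overlap)
```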
property embedding
ndarray : the tSNE embedding generated for this Analysis
Examples
An embedding is a 2D float array with one entry per point.
>>> import numpy as np
>>> sample_analysis.embedding.shape == (sample_analysis.number_of_points, 2)
True
>>> sample_analysis.embedding.dtype == np.float32
True
property id
int: Internally generated unique id for the analysis.
Examples
>>> sample_analysis.id
0
property landmark_indexes
ndarray : the indexes for the landmarks in this Analysis
Examples
In a complete top level analysis all points are present, in this case all the points at scale 2.
>>> import numpy as np
>>> np.array_equal(
...     np.arange(sample_scale2.num_points, dtype=np.uint32),
...     sample_analysis.landmark_indexes)
True
property
landmark_orig_indexes
¶ ndarray
: the original data indexes for the landmarks in this AnalysisExamples
The indexes are in the range of the original point indexes.
>>> import numpy as np >>> np.logical_and( ... sample_analysis.landmark_orig_indexes >= 0, ... sample_analysis.landmark_orig_indexes < 10000).any() True
property landmark_weights
ndarray : the weights for the landmarks in this Analysis
Examples
There will be a weight for every point.
>>> weights = sample_analysis.landmark_weights
>>> weights.shape == (sample_analysis.number_of_points,)
True
property number_of_points
int : number of landmarks in this Analysis
Examples
The sample analysis contains all the top scale points
>>> sample_analysis.number_of_points == sample_scale2.num_points
True
property parent_id
int : Unique id of the parent analysis
property scale_id
int: The number of the HSNE scale at which this analysis was created.
Examples
>>> sample_analysis.scale_id
2
property transition_matrix
list(dict) : The transition (probability) matrix in this Analysis
class nptsne.hsne_analysis.AnalysisContainer(top_analysis: nptsne.libs._nptsne._hsne_analysis.Analysis)
Bases: object
A dict of dicts to store analyses
- Parameters
- top_analysis: Analysis
The Analysis at the highest scale level containing all landmarks
Notes
The outer dict represents the scales and the inner dicts at scale level are indexed by the unique self-generated Analysis ids.
add_analysis(analysis: nptsne.libs._nptsne._hsne_analysis.Analysis) → None
Add a new analysis to the container
- Parameters
- analysis: Analysis
The new analysis
get_analysis(analysis_id: int) → nptsne.libs._nptsne._hsne_analysis.Analysis
Get the analysis corresponding to the id
- Parameters
- analysis_id: int
The id of the analysis to retrieve
- Returns
- Analysis
The analysis corresponding to the id
- Raises
- ValueError
If the id does not correspond to an analysis
remove_analysis(analysis_id: int) → List[int]
Removes the analysis and, recursively, its child analyses
- Returns
- list[int]
A list of analysis ids removed including this one
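The dict-of-dicts layout from the Notes, together with the recursive removal of child analyses, can be sketched in a few lines of plain Python. The classes below are hypothetical stand-ins written for this illustration: ToyAnalysis exposes only the id, scale_id and parent_id fields that the container needs, and ToyAnalysisContainer is not the nptsne implementation.

```python
class ToyAnalysis:
    """Hypothetical stand-in exposing only the fields the container needs."""

    def __init__(self, analysis_id, scale_id, parent_id=None):
        self.id = analysis_id
        self.scale_id = scale_id
        self.parent_id = parent_id


class ToyAnalysisContainer:
    """Outer dict keyed by scale id, inner dicts keyed by analysis id."""

    def __init__(self, top_analysis):
        self._by_scale = {top_analysis.scale_id: {top_analysis.id: top_analysis}}

    def add_analysis(self, analysis):
        self._by_scale.setdefault(analysis.scale_id, {})[analysis.id] = analysis

    def get_analysis(self, analysis_id):
        for analyses in self._by_scale.values():
            if analysis_id in analyses:
                return analyses[analysis_id]
        raise ValueError(f"No analysis with id {analysis_id}")

    def remove_analysis(self, analysis_id):
        target = self.get_analysis(analysis_id)
        del self._by_scale[target.scale_id][analysis_id]
        removed = [analysis_id]
        # Collect the direct children first, then remove each one recursively.
        children = [
            analysis.id
            for analyses in self._by_scale.values()
            for analysis in analyses.values()
            if analysis.parent_id == analysis_id
        ]
        for child_id in children:
            removed.extend(self.remove_analysis(child_id))
        return removed


top = ToyAnalysis(0, scale_id=2)
child = ToyAnalysis(1, scale_id=1, parent_id=0)
grandchild = ToyAnalysis(2, scale_id=0, parent_id=1)
sibling = ToyAnalysis(3, scale_id=1, parent_id=0)

container = ToyAnalysisContainer(top)
for analysis in (child, grandchild, sibling):
    container.add_analysis(analysis)

removed = container.remove_analysis(1)
print(removed)
```

Removing the child also removes the grandchild, while the sibling and the top analysis stay in the container.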
class nptsne.hsne_analysis.AnalysisModel(hsne: nptsne.libs._nptsne.HSne, embedder_type: nptsne.libs._nptsne._hsne_analysis.EmbedderType)
Bases: object
Create an analysis model tree with a top level Analysis containing all landmarks at the highest scale. The AnalysisModel initially contains only the top analysis, i.e. the HSNE scale with the least number of points. As selections are made starting from the top analysis, new sub-analyses are added to the tree in AnalysisModel. The helper class AnalysisContainer is responsible for maintaining this tree of analyses.
- Parameters
- hsne: HSne
The python HSne wrapper class
- embedder_type: hsne_analysis.EmbedderType
The embedder to be used when creating a new analysis, CPU or GPU
See also
hsne_analysis.Analysis
hsne_analysis.EmbedderType.CPU
Notes
The hsne_analysis.AnalysisModel contains the user driven selections when exploring an HSNE hierarchy. The AnalysisModel is created with a top level default hsne_analysis.Analysis containing all top level landmarks.
Examples
Initialize a model using loaded HSne data.
>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> model.top_scale_id
2
- Attributes
top_analysis
hsne_analysis.Analysis: The top level analysis
analysis_container
The container for all analyses.
- bottom_scale_id
- top_scale_id
add_new_analysis(parent: nptsne.libs._nptsne._hsne_analysis.Analysis, parent_selection: numpy.ndarray) → nptsne.libs._nptsne._hsne_analysis.Analysis
Add a new analysis based on a selection in a parent analysis
- Parameters
- parent: Analysis
The parent analysis
- parent_selection: ndarray<np.uint32>
The selection indices in the parent analysis
Examples
Make a child analysis by selecting half of the points in the top analysis. The analysis is created at the next scale down, is a child of the top level and contains an embedding of the right shape.
>>> import nptsne
>>> import numpy as np
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> sel = np.arange(int(model.top_analysis.number_of_points / 2))
>>> analysis = model.add_new_analysis(model.top_analysis, sel)
>>> analysis.scale_id
1
>>> analysis.parent_id == model.top_analysis.id
True
>>> analysis.embedding.shape == (analysis.number_of_points, 2)
True
get_analysis(id: int) → nptsne.libs._nptsne._hsne_analysis.Analysis
Get the Analysis for the given id
- Parameters
- id: int
An Analysis id
- Returns
- The Analysis corresponding to the id
- Raises
- ValueError
If the id does not correspond to an analysis
Examples
>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> id = model.top_analysis.id
>>> str(model.top_analysis) == str(model.get_analysis(id))
True
remove_analysis(id: int) → List[int]
Remove the analysis and all its children
- Returns
- list[int]
A list of the deleted ids
Examples
>>> import nptsne
>>> import numpy as np
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> sel = np.arange(int(model.top_analysis.number_of_points / 2))
>>> analysis = model.add_new_analysis(model.top_analysis, sel)
>>> id = analysis.id
>>> a_list = model.remove_analysis(analysis.id)
>>> a_list == [id]
True
property analysis_container
The container for all analyses.
This is an internal property exposed for debug purposes only.
property top_analysis
hsne_analysis.Analysis: The top level analysis
- Raises
- ValueError
If there is no top analysis
Examples
Retrieve the top level analysis containing all points at the top level.
>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> analysis = model.top_analysis
>>> analysis.scale_id
2
class nptsne.hsne_analysis.EmbedderType(self: nptsne.libs._nptsne._hsne_analysis.EmbedderType, value: int) → None
Bases: pybind11_builtins.pybind11_object
Enumeration used to select the embedder used. Two possibilities are supported:
EmbedderType.CPU: CPU tSNE
EmbedderType.GPU: GPU tSNE
Members:
CPU
GPU
CPU = <EmbedderType.CPU: 0>
GPU = <EmbedderType.GPU: 1>
property name
property value
class nptsne.hsne_analysis.SparseTsne
Bases: pybind11_builtins.pybind11_object
SparseTsne is a wrapper for an approximating tSNE CPU implementation as described in [1].
It forms an alternative to TextureTsne when GPU acceleration for creating the embedding is not available. It is intended for internal use in the Analysis class.
References
[1] Pezzotti, N., Lelieveldt, B.P.F., Maaten, L. van der, Höllt, T., Eisemann, E., Vilanova, A., 2017. Approximated and User Steerable tSNE for Progressive Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 23, 1739–1752.
- Attributes
embedding
ndarray
Embedding plot - shape embed dimensions x num points
do_iteration(self: nptsne.libs._nptsne._hsne_analysis.SparseTsne) → None
Perform a single tSNE iteration on the sparse data. Once complete, the embedding coordinates can be read via the embedding property.
property embedding
Embedding plot - shape embed dimensions x num points