nptsne API Reference¶

Module summary¶

API Reference¶

The main API classes are:

t-SNE classes

TextureTsne : linear tSNE simple API
TextureTsneExtended : linear tSNE advanced API wrapper with additional functionality

HSNE classes

HSne : Hierarchical-SNE model builder
HSneScale : Wrapper for a scale in the HSNE model

Full details are in the reference below.

Code examples¶

The examples in this documentation rely on the doctest runner run_doctest.py to prepare the sample data. Refer to the doctest code in the repository or to the demo list for more information.

`nptsne`: t-SNE and HSNE data embedding¶

`nptsne.HSne`	Initialize an HSne wrapper with logging state.
`nptsne.HSneScale`	Create a wrapper for the HSNE data scale.
`nptsne.TextureTsne`	Create a wrapper class for the linear tSNE implementation.
`nptsne.TextureTsneExtended`	Create an extended functionality wrapper for the linear tSNE implementation.
`nptsne.KnnAlgorithm`	Enumeration used to select the knn algorithm used. Three possibilities are supported.

A numpy compatible python extension for GPGPU linear complexity t-SNE and HSNE

This package contains classes that wrap linear complexity t-SNE and classes to support HSNE.

Available subpackages¶

hsne_analysis: Provides classes for selection driven navigation of the HSNE model and mapping back to the original data. The classes are intended to support visual analytics

Notes¶

ndarray is the preferred parameter type for input. Where possible, internal data in the wrapped t-SNE [1] and HSNE [2] implementations is returned in an ndarray without a copy.

References¶

1: Pezzotti, N. et al., GPGPU Linear Complexity t-SNE Optimization
2: Pezzotti, N. et al., Hierarchical Stochastic Neighbor Embedding

class nptsne.HSne(self: nptsne.libs._nptsne.HSne, verbose: bool = False) → None¶

Bases: pybind11_builtins.pybind11_object

Initialize an HSne wrapper with logging state.

Parameters

verbose: bool: Enable verbose logging to standard output, default is False

Notes

HSne is a simple wrapper API for the Hierarchical SNE implementation.

Hierarchical SNE is a GPU compute shader implementation of Hierarchical Stochastic Neighbor Embedding described in [1].

The wrapper can be used to create a new or load an existing hSNE analysis. The hSNE analysis is then held in the HSne instance and can be accessed through the class api.

References

1: Hierarchical Stochastic Neighbor Embedding

Examples

Create an HSNE wrapper

>>> import nptsne
>>> hsne = nptsne.HSne(True)

Attributes

num_data_points: int: The number of data points in the HSne.
num_dimensions: int: The number of dimensions associated with the original data.
num_scales: int: The number of scales in the HSne.

create_hsne(*args, **kwargs)¶

Overloaded function.

create_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], num_scales: int) -> bool
create_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], num_scales: int, point_ids: numpy.ndarray[numpy.uint64]) -> bool

Create the hSNE analysis data hierarchy from the input data with the number of scales required, optionally with user assigned point ids.

Parameters

X: ndarray: The input data used to build the hierarchy. Shape is (num. data points, num. dimensions)
num_scales: int: How many scales to create in the hsne analysis
point_ids: ndarray, optional: Array of ids associated with the data points

Examples

>>> import nptsne
>>> hsne = nptsne.HSne(True)
>>> hsne.create_hsne(sample_hsne_data, 3)
True
>>> hsne.num_data_points
10000
>>> hsne.num_dimensions
16
>>> hsne.num_scales
3

get_scale(self: nptsne.libs._nptsne.HSne, scale_number: int) → HSneScale ¶

Get the scale information at the index. 0 is the HSNE data scale.

Parameters

scale_number: int: Index of the scale to retrieve

Returns

HSneScale: A wrapper for the scale at the requested index

Examples

The number of landmarks in scale 0 is the number of data points.

>>> scale = sample_hsne.get_scale(0)
>>> scale.num_points
10000

load_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], file_path: str) → bool¶

Load the HSNE analysis data hierarchy from a pre-existing HSNE file.

Parameters

X: ndarray: The data used to create the saved file. Shape is (num. data points, num. dimensions)
file_path: str: Path to saved HSNE file

Examples

Load hsne from a file, and check that it contains the expected data

>>> import nptsne
>>> import doctest
>>> loaded_hsne = nptsne.HSne(True)
>>> loaded_hsne.load_hsne(sample_hsne_data, sample_hsne_file)  
True
>>> loaded_hsne.num_data_points
10000
>>> loaded_hsne.num_dimensions
16
>>> loaded_hsne.num_scales
3

static read_num_scales(file_path: str) → int¶

Read the number of scales defined in stored hSNE data without fully loading the file.

Parameters

file_path: str: The path to a saved hSNE

Returns

int: The number of scales in the saved hierarchy

Examples

Read the number of scales from a saved file

>>> import nptsne
>>> nptsne.HSne.read_num_scales(sample_hsne_file)
3

save(self: nptsne.libs._nptsne.HSne, file_path: str) → None¶

Save the HSNE as a binary structure to a file

Parameters

file_path: str: The file to save to. If it already exists it is overwritten.

Examples

Save the hsne to a file and check the number of scales was saved correctly.

>>> import nptsne
>>> from pathlib import Path
>>> from tempfile import gettempdir
>>> savepath = Path(gettempdir(), "save_test.hsne")
>>> sample_hsne.save(str(savepath))
>>> nptsne.HSne.read_num_scales(str(savepath))
3

property num_data_points¶

int: The number of data points in the HSne.

Examples

>>> sample_hsne.num_data_points
10000

property num_dimensions¶

int: The number of dimensions associated with the original data.

Examples

>>> sample_hsne.num_dimensions
16

property num_scales¶

int: The number of scales in the HSne.

Examples

>>> sample_hsne.num_scales
3

class nptsne.HSneScale(self: nptsne.libs._nptsne.HSneScale, hsne: nptsne.libs._nptsne.HSne, scale_number: int) → None¶

Bases: pybind11_builtins.pybind11_object

Create a wrapper for the HSNE data scale. The function HSne.get_scale() works more directly than calling the constructor on this class.

Parameters

hsne: HSne: The hierarchical SNE being explored
scale_number: int: The scale from the hsne to wrap

Examples

Using the initializer to create an HSneScale wrapper. Scale 0 contains the datapoints. (Prefer the HSne.get_scale function)

>>> import nptsne
>>> scale = nptsne.HSneScale(sample_hsne, 0)
>>> scale.num_points
10000

Attributes

num_points: int: The number of landmark points in this scale
transition_matrix: The transition (probability) matrix in this scale.
landmark_orig_indexes: Original data indexes for each landmark in this scale.

get_landmark_weight(self: nptsne.libs._nptsne.HSneScale) → numpy.ndarray[numpy.float32]¶

The weights per landmark in the scale.

Returns

ndarray: Weights array in landmark index order

Examples

The size of landmark weights should match the number of points

>>> num_points = sample_scale2.num_points
>>> weights = sample_scale2.get_landmark_weight()
>>> weights.shape[0] == num_points
True

All weights at scale 0 should be 1.0

>>> weights = sample_scale0.get_landmark_weight()
>>> test = weights[0] == 1.0
>>> test.all()
True

property area_of_influence¶

The area of influence matrix in this scale.

Returns

list(list(tuple)): The area of influence matrix in this scale

Notes

The return is in list-of-lists (LIL) format. The list returned has one entry for each landmark point i at scale s-1, \(\mathcal{L}_{i}^{s-1}\). Each entry is a list of tuples, where each tuple contains an index j for a landmark at scale s, \(\mathcal{L}_{j}^{s}\), and a value \(\mathit{I}^{s}(i,j)\) representing the probability that the landmark point i at scale s-1 is influenced by landmark j at scale s.

The resulting matrix is sparse.

Examples

The size of landmark area of influence should match the number of points in the more detailed (s-1) scale.

>>> len(sample_scale2.area_of_influence) == sample_scale1.num_points
True

Loop over all the landmarks, i, at scale 1. Sum the influences from each landmark j at scale 2 on the individual landmarks i in scale 1. For each landmark i at scale 1 the total influence from the j landmarks should be approximately 1.0. In this random data test the difference is assumed to be < \(1.5\mathrm{e}{-2}\).

>>> aoi_2on1 = sample_scale2.area_of_influence
>>> scale1_sum = {}
>>> all_tots_are_1 = True
>>> for i in aoi_2on1:
...     sum_inf = 0.0
...     for j_tup in i:
...         sum_inf += j_tup[1]
...     if abs(1 - sum_inf) > 0.015:
...         print(f"{1- sum_inf}")
...         all_tots_are_1 = False
>>> all_tots_are_1 == True
True
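The list-of-lists layout described in the Notes can be explored without the library. The following is a minimal sketch, assuming a small hand-made matrix in the same layout. The values and the helper name `strongest_influences` are illustrative and are not part of nptsne. For each landmark i at scale s-1 it picks the landmark j at scale s with the largest influence.

```python
def strongest_influences(aoi):
    """For each row of a LIL area-of-influence matrix, return the index j
    with the largest influence value, or None if the row is empty."""
    result = []
    for row in aoi:
        if not row:
            # No landmark at the coarser scale influences this point.
            result.append(None)
            continue
        best_j, _ = max(row, key=lambda tup: tup[1])
        result.append(best_j)
    return result


# Hypothetical matrix: 4 landmarks at scale s-1, 2 landmarks at scale s.
toy_aoi = [
    [(0, 0.9), (1, 0.1)],
    [(0, 0.2), (1, 0.8)],
    [(1, 1.0)],
    [],
]
best = strongest_influences(toy_aoi)
print(best)
```

With a real scale, the same helper could be applied to the list returned by the area_of_influence property.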

property landmark_orig_indexes¶

Original data indexes for each landmark in this scale.

Returns

ndarray: An ndarray of the original data indexes.

Examples

At scale 0 the landmarks are all the data points.

>>> sample_scale0.landmark_orig_indexes.shape
(10000,)
>>> sample_scale0.landmark_orig_indexes[0]
0
>>> sample_scale0.landmark_orig_indexes[9999]
9999

property num_points¶

int: The number of landmark points in this scale

Examples

>>> sample_scale0.num_points
10000

property transition_matrix¶

The transition (probability) matrix in this scale.

Returns

list(list(tuple)): The transition (probability) matrix in this scale in list-of-lists form

Notes

The list returned has one entry for each landmark point, and each entry is a list. The inner list contains tuples where the first item is an integer landmark index in the scale and the second item is the transition matrix value for the two points.

The resulting matrix is sparse in list-of-lists (LIL) form, one list per row containing a list of (column number, value) tuples.

Examples

The size of the transition matrix should match the number of points

>>> sample_scale0.num_points == len(sample_scale0.transition_matrix)
True
>>> sample_scale1.num_points == len(sample_scale1.transition_matrix)
True
>>> sample_scale2.num_points == len(sample_scale2.transition_matrix)
True
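To make the LIL form of the transition matrix concrete, here is a minimal sketch that expands a small hand-made LIL matrix into dense rows. The matrix values and the helper name `lil_to_dense` are assumptions for illustration only. For a real scale the matrix may be large, so a dense expansion is only sensible for small selections.

```python
def lil_to_dense(lil, num_cols):
    """Expand a list-of-lists sparse matrix of (column, value) tuples
    into a list of dense rows."""
    dense = []
    for row in lil:
        dense_row = [0.0] * num_cols
        for col, value in row:
            dense_row[col] = value
        dense.append(dense_row)
    return dense


# Hypothetical 3 x 3 transition matrix in LIL form.
toy_transitions = [
    [(1, 0.5), (2, 0.5)],
    [(0, 1.0)],
    [(0, 0.25), (1, 0.75)],
]
dense = lil_to_dense(toy_transitions, num_cols=3)
row_sums = [sum(row) for row in dense]
print(dense)
print(row_sums)
```

In this toy matrix every row sums to 1.0, which is what one would expect of rows holding transition probabilities.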

class nptsne.KnnAlgorithm(self: nptsne.libs._nptsne.KnnAlgorithm, value: int) → None¶

Bases: pybind11_builtins.pybind11_object

Enumeration used to select the knn algorithm used. Three possibilities are supported:

KnnAlgorithm.Flann: Knn using FLANN - Fast Library for Approximate Nearest Neighbors
KnnAlgorithm.HNSW: Knn using Hnswlib - fast approximate nearest neighbor search
KnnAlgorithm.Annoy: Knn using Annoy - Spotify Approximate Nearest Neighbors Oh Yeah

Members:

Flann

HNSW

Annoy

get_supported_metrics(self: int) → Dict[str, object]¶

Get a dict containing KnnDistanceMetric values supported by the KnnAlgorithm.

Parameters

knn_lib: KnnAlgorithm: The algorithm being queried.

Returns

dict: A dict mapping each supported metric name to its KnnDistanceMetric value

Examples

Each algorithm has different support. See the tests below.

>>> import nptsne
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.Flann)
>>> for i in support.items():
...     print(i[0])
Euclidean
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.Annoy)
>>> for i in support.items():
...     print(i[0])
Cosine
Dot
Euclidean
Manhattan
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.HNSW)
>>> for i in support.items():
...     print(i[0])
Euclidean
Inner Product
>>> support["Euclidean"] is nptsne.KnnDistanceMetric.Euclidean
True
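The support shown in the doctest output above can be summarized in a small lookup table, which is handy for validating an algorithm and metric pair before constructing a wrapper. This is a sketch only: the table is transcribed by hand from the output above, and the names `SUPPORTED_METRICS`, `is_supported` and `algorithms_for` are hypothetical helpers, not part of nptsne. When the library is available, querying get_supported_metrics directly is the more reliable source.

```python
# Metric names per algorithm, copied from the doctest output above.
SUPPORTED_METRICS = {
    "Flann": ("Euclidean",),
    "Annoy": ("Cosine", "Dot", "Euclidean", "Manhattan"),
    "HNSW": ("Euclidean", "Inner Product"),
}


def is_supported(algorithm, metric):
    """Return True if the metric name is listed for the algorithm name."""
    return metric in SUPPORTED_METRICS.get(algorithm, ())


def algorithms_for(metric):
    """Return the sorted algorithm names that list the given metric."""
    return sorted(
        name for name, metrics in SUPPORTED_METRICS.items() if metric in metrics
    )


print(is_supported("Flann", "Cosine"))
print(algorithms_for("Euclidean"))
```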

Annoy = <KnnAlgorithm.Annoy: 1>¶

Flann = <KnnAlgorithm.Flann: -1>¶

HNSW = <KnnAlgorithm.HNSW: 0>¶

property name¶

property value¶

class nptsne.KnnDistanceMetric(self: nptsne.libs._nptsne.KnnDistanceMetric, value: int) → None¶

Bases: pybind11_builtins.pybind11_object

Enumeration used to select the knn distance metric used. Five possibilities are supported:

KnnDistanceMetric.Euclidean: Euclidean metric for all algorithms
KnnDistanceMetric.InnerProduct: Inner Product metric for HNSW
KnnDistanceMetric.Cosine: Cosine metric for Annoy
KnnDistanceMetric.Manhattan: Manhattan metric for Annoy
KnnDistanceMetric.Hamming: Hamming metric for Annoy, not supported
KnnDistanceMetric.Dot: Dot metric for Annoy

Members:

Euclidean

Cosine

InnerProduct

Manhattan

Hamming

Dot

Cosine = <KnnDistanceMetric.Cosine: 1>¶

Dot = <KnnDistanceMetric.Dot: 5>¶

Euclidean = <KnnDistanceMetric.Euclidean: 0>¶

Hamming = <KnnDistanceMetric.Hamming: 4>¶

InnerProduct = <KnnDistanceMetric.InnerProduct: 2>¶

Manhattan = <KnnDistanceMetric.Manhattan: 3>¶

property name¶

property value¶

class nptsne.TextureTsne(self: nptsne.libs._nptsne.TextureTsne, verbose: bool = False, iterations: int = 1000, num_target_dimensions: int = 2, perplexity: int = 30, exaggeration_iter: int = 250, knn_algorithm: nptsne.libs._nptsne.KnnAlgorithm = KnnAlgorithm.Flann, knn_metric: nptsne.libs._nptsne.KnnDistanceMetric = KnnDistanceMetric.Euclidean) → None¶

Bases: pybind11_builtins.pybind11_object

Create a wrapper class for the linear tSNE implementation.

Parameters

verbose: bool: Enable verbose logging to standard output
iterations: int: The number of iterations to perform. This must be at least 1000.
num_target_dimensions: int: The number of dimensions for the output embedding. Default is 2.
perplexity: int: The tSNE parameter that defines the neighborhood size. Usually between 10 and 30. Default is 30.
exaggeration_iter: int: The iteration when force exaggeration starts to decay.
knn_algorithm: KnnAlgorithm: The knn algorithm used for the nearest neighbor calculation. The default is Flann. For less than 50 dimensions HNSW may be faster.
knn_metric: KnnDistanceMetric: The knn distance metric used for the nearest neighbor calculation. The default is KnnDistanceMetric.Euclidean, the only supported metric for Flann.

See also

TextureTsneExtended

Notes

TextureTsne is a GPU compute shader implementation of the gradient descent linear tSNE. If the system does not support OpenGL 4.3 and above, the implementation falls back to a texture rendering approach as described in [1].

References

1: Pezzotti, N., Thijssen, J., Mordvintsev, A., Höllt, T., Van Lew, B., Lelieveldt, B.P.F., Eisemann, E., Vilanova, A. GPGPU Linear Complexity t-SNE Optimization IEEE Transactions on Visualization and Computer Graphics 26, 1172–1181

Examples

Create a TextureTsne wrapper

>>> import nptsne
>>> tsne = nptsne.TextureTsne(verbose=True, knn_algorithm=nptsne.KnnAlgorithm.Annoy)
>>> tsne.verbose
True
>>> tsne.iterations
1000
>>> tsne.num_target_dimensions
2
>>> tsne.perplexity
30
>>> tsne.exaggeration_iter
250
>>> tsne.knn_algorithm == nptsne.KnnAlgorithm.Annoy
True

fit_transform(self: nptsne.libs._nptsne.TextureTsne, X: numpy.ndarray[numpy.float32]) → numpy.ndarray[numpy.float32]¶

Fit X into an embedded space and return that transformed output.

Parameters

X: ndarray: The input data with shape (num. data points, num. dimensions)

Returns

ndarray: A numpy array containing a flattened (1D) embedding

Examples

A 2D embedding is returned in the form of a numpy array [x0, y0, x1, y1, …].

>>> import nptsne
>>> tsne = nptsne.TextureTsne()
>>> embedding = tsne.fit_transform(sample_tsne_data)  
>>> embedding.shape  
(4000,)
>>> import numpy  
>>> embedding.dtype == numpy.float32  
True
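Because the embedding comes back flattened as [x0, y0, x1, y1, …], it usually needs to be regrouped into one coordinate tuple per point before plotting. The sketch below shows the regrouping on a plain Python list. The helper name `unflatten_embedding` and the toy values are assumptions for illustration. On a real numpy result the same effect is obtained with `embedding.reshape(-1, 2)` for a 2D embedding.

```python
def unflatten_embedding(flat, num_target_dimensions=2):
    """Group a flat coordinate sequence into one tuple per point."""
    if len(flat) % num_target_dimensions != 0:
        raise ValueError("length is not a multiple of the number of dimensions")
    return [
        tuple(flat[i:i + num_target_dimensions])
        for i in range(0, len(flat), num_target_dimensions)
    ]


# Hypothetical flat 2D embedding of three points.
flat = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
points = unflatten_embedding(flat)
print(points)
```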

property exaggeration_iter¶

int: The iteration where attractive force exaggeration starts to decay, set at initialization.

Notes

The gradient of the cost function used to iteratively optimize the embedding points \(y_i\) is a sum of an attractive and a repulsive force: \(\frac{\delta C}{\delta y_i} = 4(\phi F_i^{attr} - F_i^{rep})\). The iterations up to exaggeration_iter increase the \(F_i^{attr}\) term by the factor \(\phi\), which then decays to 1.

Examples

>>> sample_texture_tsne.exaggeration_iter
250
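The effect of the exaggeration factor in the gradient formula from the Notes can be checked with a few numbers. The following is a minimal sketch that evaluates the formula per coordinate. The force values and the factor 4.0 are made up for illustration, and the helper `tsne_gradient` is hypothetical rather than part of nptsne.

```python
def tsne_gradient(f_attr, f_rep, phi):
    """Evaluate 4 * (phi * F_attr - F_rep) for each coordinate."""
    return [4.0 * (phi * a - r) for a, r in zip(f_attr, f_rep)]


# Hypothetical attractive and repulsive force components for one point.
f_attr = [1.0, 0.5]
f_rep = [0.5, 0.5]

plain = tsne_gradient(f_attr, f_rep, phi=1.0)        # no exaggeration
exaggerated = tsne_gradient(f_attr, f_rep, phi=4.0)  # attraction boosted
print(plain)
print(exaggerated)
```

With the factor at 1 the second coordinate is in balance, while the exaggerated variant pulls the point along both coordinates.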

property iterations¶

int: The number of iterations, set at initialization.

Examples

>>> sample_texture_tsne.iterations
1000

property knn_algorithm¶

int: The KnnAlgorithm value, set at initialization.

Examples

>>> import nptsne
>>> sample_texture_tsne.knn_algorithm == nptsne.KnnAlgorithm.Flann
True

property knn_distance_metric¶

int: KnnDistanceMetric value, set at initialization.

Examples

>>> import nptsne
>>> sample_texture_tsne.knn_distance_metric == nptsne.KnnDistanceMetric.Euclidean
True

property num_target_dimensions¶

int: The number of target dimensions, set at initialization.

Examples

>>> sample_texture_tsne.num_target_dimensions
2

property perplexity¶

int: The tsne perplexity, set at initialization.

Examples

>>> sample_texture_tsne.perplexity
30

property verbose¶

bool: True if verbose logging is enabled. Set at initialization.

Examples

>>> sample_texture_tsne.verbose
False

class nptsne.TextureTsneExtended(self: nptsne.libs._nptsne.TextureTsneExtended, verbose: bool = False, num_target_dimensions: int = 2, perplexity: int = 30, knn_algorithm: nptsne.libs._nptsne.KnnAlgorithm = KnnAlgorithm.Flann, knn_metric: nptsne.libs._nptsne.KnnDistanceMetric = KnnDistanceMetric.Euclidean) → None¶

Bases: pybind11_builtins.pybind11_object

Create an extended functionality wrapper for the linear tSNE implementation.

Parameters

verbose: bool: Enable verbose logging to standard output, default is False
num_target_dimensions: int: The number of dimensions for the output embedding. Default is 2.
perplexity: int: The tSNE parameter that defines the neighborhood size. Usually between 10 and 30. Default is 30.
knn_algorithm: KnnAlgorithm: The knn algorithm used for the nearest neighbor calculation. The default is ‘Flann’. For less than 50 dimensions ‘HNSW’ may be faster.
knn_metric: KnnDistanceMetric: The knn distance metric used for the nearest neighbor calculation. The default is KnnDistanceMetric.Euclidean, the only supported metric for Flann.

See also

TextureTsne

Notes

TextureTsneExtended offers additional control over the exaggeration decay compared to TextureTsne. Additionally it supports inputting an initial embedding. Linear tSNE is described in [1].

References

1: Pezzotti, N., Thijssen, J., Mordvintsev, A., Höllt, T., Van Lew, B., Lelieveldt, B.P.F., Eisemann, E., Vilanova, A. GPGPU Linear Complexity t-SNE Optimization IEEE Transactions on Visualization and Computer Graphics 26, 1172–1181

Examples

Create a TextureTsneExtended wrapper

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended(verbose=True, num_target_dimensions=2, perplexity=35, knn_algorithm=nptsne.KnnAlgorithm.Annoy)
>>> tsne.verbose
True
>>> tsne.num_target_dimensions
2
>>> tsne.perplexity
35
>>> tsne.knn_algorithm == nptsne.KnnAlgorithm.Annoy
True

Attributes

decay_started_at: int: The iteration number when exaggeration decay started.
iteration_count: int: The number of completed iterations of tSNE gradient descent.

close(self: nptsne.libs._nptsne.TextureTsneExtended) → None¶: Release GPU resources for the transform

init_transform(self: nptsne.libs._nptsne.TextureTsneExtended, X: numpy.ndarray[numpy.float32], initial_embedding: numpy.ndarray[numpy.float32] = array([], dtype=float32)) → bool¶

Initialize the transform with given data and optional initial embedding.

Parameters

X: ndarray: The input data with shape (num. data points, num. dimensions)
initial_embedding: ndarray: An optional initial embedding. Shape should be (num data points, num output dimensions)

Returns

bool: True if successful, False otherwise

Examples

Create a TextureTsneExtended wrapper and initialize the data. This step performs the knn.

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True

reinitialize_transform(self: nptsne.libs._nptsne.TextureTsneExtended, initial_embedding: numpy.ndarray[numpy.float32] = array([], dtype=float32)) → None¶

Reinitialize the transform with an optional initial embedding. Knn is not recomputed. If no initial_embedding is supplied the embedding is re-randomized.

Parameters

initial_embedding: ndarray: An optional initial embedding. Shape should be (num data points, num output dimensions)

Examples

Create a TextureTsneExtended wrapper, initialize the data, run for 100 iterations, then reinitialize. The iteration count is reset to 0.

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> embedding = tsne.run_transform(iterations=100)    
>>> tsne.iteration_count    
100
>>> tsne.reinitialize_transform()    
>>> tsne.iteration_count    
0

run_transform(self: nptsne.libs._nptsne.TextureTsneExtended, verbose: bool = False, iterations: int = 1000) → numpy.ndarray[numpy.float32]¶

Run the transform gradient descent for a number of iterations with the current settings for exaggeration.

Parameters

verbose: bool: Enable verbose logging to standard output.
iterations: int: The number of iterations to run.

Returns

ndarray: A numpy array containing a flattened (1D) embedding. Coordinates are arranged: x0, y0, x1, y1, …

Examples

Create a TextureTsneExtended wrapper, initialize the data and run for 250 iterations.

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> embedding = tsne.run_transform(iterations=250)    
>>> embedding.shape    
(4000,)
>>> tsne.iteration_count    
250

start_exaggeration_decay(self: nptsne.libs._nptsne.TextureTsneExtended) → None¶

Enable exaggeration decay. Effective on the next call to run_transform. From this point exaggeration decays over the following 150 iterations; the decay length is a fixed parameter. This call is only effective once.

Raises

RuntimeError: If the decay is already active. This can be ignored.

Examples

The start of exaggeration decay is recorded in the decay_started_at property.

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> tsne.decay_started_at
-1
>>> embedding = tsne.run_transform(iterations=100)    
>>> tsne.start_exaggeration_decay()    
>>> tsne.decay_started_at    
100

property decay_started_at¶

int: The iteration number when exaggeration decay started. Is -1 if exaggeration decay has not started.

Examples

Before exaggeration decay has started, decay_started_at is -1.

>>> sample_texture_tsne_extended.decay_started_at
-1

property iteration_count¶

int: The number of completed iterations of tSNE gradient descent.

>>> sample_texture_tsne_extended.iteration_count
0

property knn_algorithm¶

int: The KnnAlgorithm value, set at initialization.

Examples

>>> import nptsne
>>> sample_texture_tsne_extended.knn_algorithm == nptsne.KnnAlgorithm.Flann
True

property knn_distance_metric¶

int: The KnnDistanceMetric value, set at initialization.

Examples

>>> import nptsne
>>> sample_texture_tsne_extended.knn_distance_metric == nptsne.KnnDistanceMetric.Euclidean
True

property num_target_dimensions¶

int: The number of target dimensions, set at initialization.

Examples

>>> sample_texture_tsne_extended.num_target_dimensions
2

property perplexity¶

int: The tsne perplexity, set at initialization.

Examples

>>> sample_texture_tsne_extended.perplexity
30

property verbose¶

bool: True if verbose logging is enabled. Set at initialization.

Examples

>>> sample_texture_tsne_extended.verbose
False

`nptsne.hsne_analysis`: HSNE visual analysis support submodule¶

`nptsne.hsne_analysis.Analysis`	Create a new analysis as a child of an (optional) parent analysis.
`nptsne.hsne_analysis.AnalysisContainer`	A dict of dicts to store analyses
`nptsne.hsne_analysis.AnalysisModel`	Create an analysis model tree with a top level Analysis containing all landmarks at the highest scale. The AnalysisModel initially contains only the top analysis, i.e. the HSNE scale with the least number of points.
`nptsne.hsne_analysis.EmbedderType`	Enumeration used to select the embedder used. Two possibilities are
`nptsne.hsne_analysis.SparseTsne`	SparseTsne a wrapper for an approximating tSNE CPU implementation as described in [1].

class nptsne.hsne_analysis.Analysis(self: nptsne.libs._nptsne._hsne_analysis.Analysis, hnse: nptsne.libs._nptsne.HSne, embedder_type: nptsne.libs._nptsne._hsne_analysis.EmbedderType, parent: nptsne.libs._nptsne._hsne_analysis.Analysis = None, parent_selection: List[int] = []) → None¶

Bases: pybind11_builtins.pybind11_object

Create a new analysis as a child of an (optional) parent analysis.

Parameters

hsne: HSne: The hierarchical SNE being explored
embedder_type: EmbedderType: The tSNE embedder to use, CPU or GPU based
parent: Analysis, optional: The parent Analysis (where the selection was performed) if any
parent_selection: list, optional: List of selection indexes in the parent analysis.

Notes

Together with AnalysisModel, this class provides support for visual analytics of an hSNE. The Analysis class holds the chosen landmarks at a particular scale and also permits referencing back to the original data. Additionally, a t-SNE embedder is included (a choice is provided between GPU and CPU implementations), which can be used to create an embedding of the selected landmarks.

Examples

The Analysis constructor is meant for use by nptsne.hsne_analysis.AnalysisModel. The example here illustrates how a top level analysis would be created from a sample hsne.

>>> import nptsne
>>> top_analysis = nptsne.hsne_analysis.Analysis(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> top_analysis.scale_id
2
>>> sample_hsne.get_scale(top_analysis.scale_id).num_points == top_analysis.number_of_points
True

Attributes

number_of_points: int : number of landmarks in this Analysis
parent_id: int : Unique id of the parent analysis
transition_matrix: list(dict) : The transition (probability) matrix in this Analysis
landmark_weights: ndarray : the weights for the landmarks in this Analysis
landmark_indexes: ndarray : the indexes for the landmarks in this Analysis
landmark_orig_indexes: ndarray : the original data indexes for the landmarks in this Analysis
embedding: ndarray : the tSNE embedding generated for this Analysis

do_iteration(self: nptsne.libs._nptsne._hsne_analysis.Analysis) → None¶: Perform one iteration of the chosen embedder

get_area_of_influence(self: nptsne.libs._nptsne._hsne_analysis.Analysis, select_list: List[int], threshold: float = 0.3) → numpy.ndarray[numpy.float32]¶

Get the area of influence of the selection in the original data. For more information on the threshold refer to the HSNE paper section 4.2 Filtering and drilling down.

A fast but less accurate approach to obtaining the area of influence is get_fast_area_of_influence.

Parameters

select_list: list: A list of selection indexes for landmarks in this analysis
threshold: float, optional: The minimum value required for the underlying datapoint to be considered in the landmark’s region of influence. Default is 0.3. The parameter must be in the range 0 to 1.0; values outside this range are ignored.

Returns

ndarray: The mask of the original points represented by the selected landmarks. If the point is in the AOI the value is 1.

See also

get_fast_area_of_influence
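The returned mask has one entry per original data point, with the value 1 for points inside the area of influence. A common next step is to turn that mask into row indexes of the original data. The sketch below does this on plain lists. The mask, the data rows and the helper name `mask_to_indices` are hypothetical. With a numpy mask, `np.flatnonzero(mask)` gives the same indexes.

```python
def mask_to_indices(mask):
    """Return the positions whose mask value is 1."""
    return [index for index, flag in enumerate(mask) if flag == 1]


# Hypothetical mask for five original data points.
toy_mask = [0.0, 1.0, 1.0, 0.0, 1.0]
# Hypothetical original data, one row per data point.
toy_data = [[0.1, 0.2], [1.1, 1.2], [2.1, 2.2], [3.1, 3.2], [4.1, 4.2]]

selected = mask_to_indices(toy_mask)
selected_rows = [toy_data[i] for i in selected]
print(selected)
print(selected_rows)
```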

get_fast_area_of_influence(self: nptsne.libs._nptsne._hsne_analysis.Analysis, select_list: List[int]) → numpy.ndarray[numpy.float32]¶

Fast method to get the area of influence of the selection in the original data based on non overlapping \({1}\rightarrow{n}\) mapping of scale landmarks to original data points.

This mapping is derived by working bottom up from the data points and finding the landmarks at each scale with the maximum influence. The mapping is calculated once on the first call to this function so subsequent calls are fast.

Due to thresholding it is possible that a datapoint may have no representative landmark at a specific scale.

Parameters

select_list: list: A list of selection indexes for landmarks in this analysis

Returns

ndarray: The mask of the original points represented by the selected landmarks. If the point is in the AOI the value is 1.

See also

get_area_of_influence

Examples

Demonstrate the non-overlap of the area of influence for each landmark.

>>> import math
>>> import numpy as np
>>> all_top_landmarks=list(range(0,sample_analysis.number_of_points))
>>> all_influenced=sample_analysis.get_fast_area_of_influence(all_top_landmarks)
>>> all_influenced.shape[0] == 10000
True

Accumulate the individual landmark AOIs and check the total

>>> infl_accum = np.zeros((10000,), dtype=np.float32)
>>> total = 0
>>> for i in all_top_landmarks:
...     influenced = sample_analysis.get_fast_area_of_influence([i])
...     total = total + influenced.sum()
...     infl_accum = np.add(infl_accum, influenced)
>>> total == 10000
True

Verify that all AOIs are non-overlapping, each datapoint occurs once and only once.

>>> np.all(infl_accum == 1)
True
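The bottom-up mapping described for this method can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation. It assumes one area-of-influence matrix per scale in the list-of-lists layout used by HSneScale.area_of_influence, it ignores thresholding, and the toy matrices and the helper names `map_points_to_top` and `toy_fast_aoi` are hypothetical.

```python
def map_points_to_top(aoi_per_scale):
    """Follow the maximum influence upward, scale by scale.

    aoi_per_scale[k] has one row per landmark at scale k, each row a
    list of (landmark index at scale k + 1, influence) tuples.
    Returns the top scale representative for every data point.
    """
    mapping = list(range(len(aoi_per_scale[0])))  # scale 0: each point is itself
    for aoi in aoi_per_scale:
        next_mapping = []
        for current in mapping:
            if current is None or not aoi[current]:
                # No representative at the next scale.
                next_mapping.append(None)
            else:
                next_mapping.append(max(aoi[current], key=lambda t: t[1])[0])
        mapping = next_mapping
    return mapping


def toy_fast_aoi(mapping, select_list):
    """Build a 0/1 mask of data points represented by the selection."""
    selected = set(select_list)
    return [1.0 if rep in selected else 0.0 for rep in mapping]


# Hypothetical hierarchy: 4 data points, 3 landmarks at scale 1, 2 at scale 2.
aoi_scale1 = [[(0, 0.7), (1, 0.3)], [(1, 0.6), (2, 0.4)], [(2, 1.0)], [(0, 0.4), (1, 0.6)]]
aoi_scale2 = [[(0, 1.0)], [(0, 0.3), (1, 0.7)], [(1, 1.0)]]

mapping = map_points_to_top([aoi_scale1, aoi_scale2])
mask0 = toy_fast_aoi(mapping, [0])
mask1 = toy_fast_aoi(mapping, [1])
print(mapping, mask0, mask1)
```

Because every data point follows exactly one path upward, the masks for different top level landmarks cannot overlap, which is the property the doctest above verifies on the sample data.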

property embedding¶

ndarray : the tSNE embedding generated for this Analysis

Examples

An embedding is a 2d float array. One entry per point.

>>> import numpy as np
>>> sample_analysis.embedding.shape == (sample_analysis.number_of_points, 2)
True
>>> sample_analysis.embedding.dtype == np.float32
True

property id¶

int: Internally generated unique id for the analysis.

Examples

>>> sample_analysis.id
0

property landmark_indexes¶

ndarray : the indexes for the landmarks in this Analysis

Examples

In a complete top level analysis all points are present, in this case all the points at scale 2.

>>> import numpy as np
>>> np.array_equal(
... np.arange(sample_scale2.num_points, dtype=np.uint32), 
... sample_analysis.landmark_indexes)
True

property landmark_orig_indexes¶

ndarray : the original data indexes for the landmarks in this Analysis

Examples

The indexes are in the range of the original point indexes.

>>> import numpy as np
>>> np.logical_and(
... sample_analysis.landmark_orig_indexes >= 0,
... sample_analysis.landmark_orig_indexes < 10000).all()
True

property landmark_weights¶

ndarray : the weights for the landmarks in this Analysis

Examples

There will be a weight for every point.

>>> weights = sample_analysis.landmark_weights
>>> weights.shape == (sample_analysis.number_of_points,)
True

property number_of_points¶

int : number of landmarks in this Analysis

Examples

The sample analysis is all the top scale points

>>> sample_analysis.number_of_points == sample_scale2.num_points
True

property parent_id¶: int : Unique id of the parent analysis

property scale_id¶

int: The number of the HSNE scale at which this analysis was created.

Examples

>>> sample_analysis.scale_id
2

property transition_matrix¶: list(dict) : The transition (probability) matrix in this Analysis

class nptsne.hsne_analysis.AnalysisContainer(top_analysis: nptsne.libs._nptsne._hsne_analysis.Analysis)¶

Bases: object

A dict of dicts to store analyses

Parameters

top_analysis: Analysis: The Analysis at the highest scale level containing all landmarks

Notes

The outer dict represents the scales and the inner dicts at scale level are indexed by the unique self-generated Analysis ids.

add_analysis(analysis: nptsne.libs._nptsne._hsne_analysis.Analysis) → None¶

Add a new analysis to the container

Parameters

analysis: Analysis: The new analysis

get_analysis(analysis_id: int) → nptsne.libs._nptsne._hsne_analysis.Analysis¶

Get the analysis corresponding to the id

Parameters

analysis_id: int: The unique id of the analysis to retrieve

Returns

Analysis: The analysis corresponding to the id

Raises

ValueError: If the id does not correspond to an analysis

remove_analysis(analysis_id: int) → List[int]¶

Removes analysis and, recursively, child analyses

Returns

list[int]: A list of analysis ids removed including this one
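The dict-of-dicts layout from the Notes, together with the recursive removal, can be sketched as a small stand-alone class. This is an illustration under assumptions and not the nptsne source: `ToyAnalysis` is a hypothetical stand-in carrying only the id, scale_id and parent_id fields, and the order of the removed ids (children first) is a choice made here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyAnalysis:
    """Hypothetical stand-in for Analysis with only the fields used here."""
    id: int
    scale_id: int
    parent_id: Optional[int] = None


class ToyAnalysisContainer:
    """Outer dict keyed by scale, inner dicts keyed by analysis id."""

    def __init__(self, top_analysis: ToyAnalysis) -> None:
        self._by_scale: Dict[int, Dict[int, ToyAnalysis]] = {}
        self.add_analysis(top_analysis)

    def add_analysis(self, analysis: ToyAnalysis) -> None:
        self._by_scale.setdefault(analysis.scale_id, {})[analysis.id] = analysis

    def get_analysis(self, analysis_id: int) -> ToyAnalysis:
        for analyses in self._by_scale.values():
            if analysis_id in analyses:
                return analyses[analysis_id]
        raise ValueError(f"No analysis with id {analysis_id}")

    def remove_analysis(self, analysis_id: int) -> List[int]:
        analysis = self.get_analysis(analysis_id)
        # Collect child ids first so no dict is modified while iterating.
        children = [
            a.id
            for analyses in self._by_scale.values()
            for a in analyses.values()
            if a.parent_id == analysis_id
        ]
        removed: List[int] = []
        for child_id in children:
            removed.extend(self.remove_analysis(child_id))
        del self._by_scale[analysis.scale_id][analysis_id]
        removed.append(analysis_id)
        return removed


container = ToyAnalysisContainer(ToyAnalysis(id=0, scale_id=2))
container.add_analysis(ToyAnalysis(id=1, scale_id=1, parent_id=0))
container.add_analysis(ToyAnalysis(id=2, scale_id=0, parent_id=1))
container.add_analysis(ToyAnalysis(id=3, scale_id=1, parent_id=0))
removed = container.remove_analysis(1)
print(removed)
```

Removing analysis 1 also removes its child 2, while the sibling analysis 3 and the top analysis stay in the container.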

class nptsne.hsne_analysis.AnalysisModel(hsne: nptsne.libs._nptsne.HSne, embedder_type: nptsne.libs._nptsne._hsne_analysis.EmbedderType)¶

Bases: object

Create an analysis model tree with a top level Analysis containing all landmarks at the highest scale. The AnalysisModel initially contains only the top analysis, i.e. the HSNE scale with the least number of points. As selections are made starting from the top analysis, new sub analyses are added to the tree in AnalysisModel. The helper class AnalysisContainer is responsible for maintaining this tree of analyses.

Parameters

hsne: HSne: The python HSne wrapper class
embedder_type: hsne_analysis.EmbedderType: The embedder to be used when creating a new analysis, CPU or GPU

See also

hsne_analysis.Analysis
hsne_analysis.EmbedderType.CPU

Notes

The hsne_analysis.AnalysisModel contains the user driven selections when exploring an HSNE hierarchy. The AnalysisModel is created with a top level default hsne_analysis.Analysis containing all top level landmarks.

Examples

Initialize a model using loaded HSne data.

>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> model.top_scale_id
2

Attributes

top_analysis: hsne_analysis.Analysis: The top level analysis
analysis_container: The container for all analyses.
bottom_scale_id
top_scale_id

add_new_analysis(parent: nptsne.libs._nptsne._hsne_analysis.Analysis, parent_selection: numpy.ndarray) → nptsne.libs._nptsne._hsne_analysis.Analysis¶

Add a new analysis based on a selection in a parent analysis

Parameters

parent: Analysis: The parent analysis
parent_selection: ndarray<np.uint32>: The selection indices in the parent analysis

Examples

Make a child analysis by selecting half of the points in the top analysis. The analysis is created at the next scale down, is a child of the top level analysis, and contains an embedding of the right shape.

>>> import nptsne
>>> import numpy as np
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> sel = np.arange(int(model.top_analysis.number_of_points / 2))
>>> analysis = model.add_new_analysis(model.top_analysis, sel)
>>> analysis.scale_id
1
>>> analysis.parent_id == model.top_analysis.id
True
>>> analysis.embedding.shape == (analysis.number_of_points, 2)
True

get_analysis(id: int) → nptsne.libs._nptsne._hsne_analysis.Analysis¶

Get the Analysis for the given id

Parameters

id: int: An Analysis id

Returns

The Analysis corresponding to the id

Raises

ValueError: If the id does not correspond to an analysis

Examples

>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> id = model.top_analysis.id
>>> str(model.top_analysis) == str(model.get_analysis(id))
True
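
Because an unknown id raises ValueError, callers that cannot guarantee an id is still valid (for example after a removal) may want to guard the lookup. The following is a sketch: `find_analysis` and `FakeModel` are hypothetical, and `FakeModel` mimics only the documented ValueError behaviour so that the snippet runs without nptsne.

```python
def find_analysis(model, analysis_id):
    """Return model.get_analysis(analysis_id), or None if the id is unknown."""
    try:
        return model.get_analysis(analysis_id)
    except ValueError:
        return None


class FakeModel:
    """Hypothetical stand-in that mimics only the documented ValueError."""

    def __init__(self, analyses):
        self._analyses = analyses

    def get_analysis(self, id):
        if id not in self._analyses:
            raise ValueError(f"No analysis with id {id}")
        return self._analyses[id]


fake = FakeModel({0: "top analysis"})
known = find_analysis(fake, 0)
missing = find_analysis(fake, 42)
```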

remove_analysis(id: int) → List[int]¶

Remove the analysis and all children

Returns

list[int]: list of deleted ids

Examples

>>> import nptsne
>>> import numpy as np
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> sel = np.arange(int(model.top_analysis.number_of_points / 2))
>>> analysis = model.add_new_analysis(model.top_analysis, sel)
>>> id = analysis.id
>>> a_list = model.remove_analysis(analysis.id)
>>> a_list == [id]
True

property analysis_container¶

The container for all analyses.

This is an internal property exposed for debug purposes only

property top_analysis¶

hsne_analysis.Analysis: The top level analysis

Raises

ValueError: If there is no top analysis

Examples

Retrieve the top level analysis containing all points at the top level.

>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> analysis = model.top_analysis
>>> analysis.scale_id
2

class nptsne.hsne_analysis.EmbedderType(self: nptsne.libs._nptsne._hsne_analysis.EmbedderType, value: int) → None¶

Bases: pybind11_builtins.pybind11_object

Enumeration used to select the embedder used. Two possibilities are supported:

EmbedderType.CPU: CPU tSNE
EmbedderType.GPU: GPU tSNE

Members:

CPU

GPU

CPU = <EmbedderType.CPU: 0>¶

GPU = <EmbedderType.GPU: 1>¶

property name¶

property value¶
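
The member values and the `name` and `value` properties behave much like a standard Python integer enumeration. The following sketch mirrors the two documented members in pure Python: `SketchEmbedderType` and `pick_embedder` are hypothetical and are not part of nptsne.

```python
from enum import IntEnum


class SketchEmbedderType(IntEnum):
    """Hypothetical pure-Python mirror of the documented members."""
    CPU = 0
    GPU = 1


def pick_embedder(gpu_available):
    # Fall back to the CPU embedder when no GPU is available.
    return SketchEmbedderType.GPU if gpu_available else SketchEmbedderType.CPU


choice = pick_embedder(gpu_available=False)
label = f"{choice.name}={choice.value}"
```

With the real class, the chosen member would be passed as the embedder_type argument of AnalysisModel.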

class nptsne.hsne_analysis.SparseTsne¶

Bases: pybind11_builtins.pybind11_object

SparseTsne is a wrapper for an approximating tSNE CPU implementation as described in [1].

It is intended for internal use in the Analysis class, where it forms an alternative to TextureTsne when GPU acceleration is not available for creating the embedding.

See also

Analysis
EmbedderType

References

1: Pezzotti, N., Lelieveldt, B.P.F., Maaten, L. van der, Höllt, T., Eisemann, E., Vilanova, A., 2017. Approximated and User Steerable tSNE for Progressive Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 23, 1739–1752.

Attributes

embedding: ndarray: Embedding plot - shape embed dimensions x num points

do_iteration(self: nptsne.libs._nptsne._hsne_analysis.SparseTsne) → None¶: Perform a single tSNE iteration on the sparse data. Once complete, the embedding coordinates can be read via the embedding property.

property embedding¶: Embedding plot - shape embed dimensions x num points
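
Since each call to do_iteration advances the optimization by one step, an embedding is typically obtained by calling it in a loop and then reading the embedding property. The following is a sketch of that loop: `run_iterations` and `CountingEmbedder` are hypothetical, and `CountingEmbedder` only counts calls so that the snippet runs without nptsne.

```python
def run_iterations(embedder, iterations):
    """Call do_iteration() the given number of times, then read the embedding."""
    for _ in range(iterations):
        embedder.do_iteration()
    return embedder.embedding


class CountingEmbedder:
    """Hypothetical stand-in: counts iterations instead of optimizing."""

    def __init__(self):
        self.iterations_done = 0

    def do_iteration(self):
        self.iterations_done += 1

    @property
    def embedding(self):
        # A real embedder would return an ndarray of coordinates here.
        return [[0.0, 0.0]]


stand_in = CountingEmbedder()
result = run_iterations(stand_in, 100)
```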
