nptsne API Reference

Module summary

API Reference

The main API classes are:

t-SNE classes
HSNE classes
  • HSne : Hierarchical-SNE model builder

  • HSneScale : Wrapper for a scale in the HSNE model

Full details are in the reference below.

Code examples

The examples in this documentation rely on the doctest script run_doctest.py to prepare the sample data. Refer to the doctest code in the repository or to the demo list for more information.

nptsne: t-SNE and HSNE data embedding

nptsne.HSne

Initialize an HSne wrapper with logging state.

nptsne.HSneScale

Create a wrapper for the HSNE data scale.

nptsne.TextureTsne

Create a wrapper class for the linear tSNE implementation.

nptsne.TextureTsneExtended

Create an extended functionality wrapper for the linear tSNE implementation.

nptsne.KnnAlgorithm

Enumeration used to select the knn algorithm used. Three possibilities are supported: Flann, HNSW and Annoy.

A numpy compatible python extension for GPGPU linear complexity t-SNE and HSNE

This package contains classes that wrap linear complexity t-SNE and classes to support HSNE.

Available subpackages

hsne_analysis

Provides classes for selection-driven navigation of the HSNE model and mapping back to the original data. The classes are intended to support visual analytics.

Notes

ndarray types are the preferred parameter types for input. Where possible, internal data in the wrapped t-SNE [1] and HSNE [2] is returned in an ndarray without a copy.

References

[1] Pezzotti, N. et al., GPGPU Linear Complexity t-SNE Optimization

[2] Pezzotti, N. et al., Hierarchical Stochastic Neighbor Embedding

class nptsne.HSne(self: nptsne.libs._nptsne.HSne, verbose: bool = False) → None

Bases: pybind11_builtins.pybind11_object

Initialize an HSne wrapper with logging state.

Parameters
verbose : bool

Enable verbose logging to standard output, default is False

Notes

HSne is a simple wrapper API for the Hierarchical SNE implementation.

Hierarchical SNE is a GPU compute shader implementation of Hierarchical Stochastic Neighbor Embedding described in [1].

The wrapper can be used to create a new hSNE analysis or load an existing one. The hSNE analysis is then held in the HSne instance and can be accessed through the class API.

References

[1] Pezzotti, N. et al., Hierarchical Stochastic Neighbor Embedding

Examples

Create an HSNE wrapper

>>> import nptsne
>>> hsne = nptsne.HSne(True)
Attributes
num_data_points

int: The number of data points in the HSne.

num_dimensions

int: The number of dimensions associated with the original data.

num_scales

int: The number of scales in the HSne.

create_hsne(*args, **kwargs)

Overloaded function.

  1. create_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], num_scales: int) -> bool

  2. create_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], num_scales: int, point_ids: numpy.ndarray[numpy.uint64]) -> bool

    Create the hSNE analysis data hierarchy with user assigned point ids from the input data with the number of scales required.

Parameters
X : ndarray

The data used to create the HSNE hierarchy. Shape is: (num. data points, num. dimensions)

num_scales : int

How many scales to create in the HSNE analysis

point_ids : ndarray, optional

Array of ids associated with the data points

Examples

>>> import nptsne
>>> hsne = nptsne.HSne(True)
>>> hsne.create_hsne(sample_hsne_data, 3)
True
>>> hsne.num_data_points
10000
>>> hsne.num_dimensions
16
>>> hsne.num_scales
3
get_scale(self: nptsne.libs._nptsne.HSne, scale_number: int) → HSneScale

Get the scale information at the index. 0 is the HSNE data scale.

Parameters
scale_number : int

Index of the scale to retrieve

Returns
HSneScale

A wrapper for the HSNE scale at the given index

Examples

The number of landmarks in scale 0 is the number of data points.

>>> scale = sample_hsne.get_scale(0)
>>> scale.num_points
10000
load_hsne(self: nptsne.libs._nptsne.HSne, X: numpy.ndarray[numpy.float32], file_path: str) → bool

Load the HSNE analysis data hierarchy from a pre-existing HSNE file.

Parameters
X : ndarray

The data used to create the saved file. Shape is: (num. data points, num. dimensions)

file_path : str

Path to saved HSNE file

Examples

Load hsne from a file, and check that it contains the expected data

>>> import nptsne
>>> import doctest
>>> loaded_hsne = nptsne.HSne(True)
>>> loaded_hsne.load_hsne(sample_hsne_data, sample_hsne_file)  
True
>>> loaded_hsne.num_data_points
10000
>>> loaded_hsne.num_dimensions
16
>>> loaded_hsne.num_scales
3
static read_num_scales(file_path: str) → int

Read the number of scales defined in stored hSNE data without fully loading the file.

Parameters
file_path : str

The path to a saved hSNE

Returns
int

The number of scales in the saved hierarchy

Examples

Read the number of scales from a saved file

>>> import nptsne
>>> nptsne.HSne.read_num_scales(sample_hsne_file)
3
save(self: nptsne.libs._nptsne.HSne, file_path: str) → None

Save the HSNE as a binary structure to a file

Parameters
file_path : str

The file to save to. If it already exists it is overwritten.

Examples

Save the hsne to a file and check the number of scales was saved correctly.

>>> import nptsne
>>> from pathlib import Path
>>> from tempfile import gettempdir
>>> savepath = Path(gettempdir(), "save_test.hsne")
>>> sample_hsne.save(str(savepath))
>>> nptsne.HSne.read_num_scales(str(savepath))
3
property num_data_points

int: The number of data points in the HSne.

Examples

>>> sample_hsne.num_data_points
10000
property num_dimensions

int: The number of dimensions associated with the original data.

Examples

>>> sample_hsne.num_dimensions
16
property num_scales

int: The number of scales in the HSne.

Examples

>>> sample_hsne.num_scales
3
class nptsne.HSneScale(self: nptsne.libs._nptsne.HSneScale, hsne: nptsne.libs._nptsne.HSne, scale_number: int) → None

Bases: pybind11_builtins.pybind11_object

Create a wrapper for the HSNE data scale. The function HSne.get_scale() works more directly than calling the constructor on this class.

Parameters
hsne : HSne

The hierarchical SNE being explored

scale_number : int

The scale from the hsne to wrap

Examples

Using the initializer to create an HSneScale wrapper. Scale 0 contains the datapoints. (Prefer the HSne.get_scale function)

>>> import nptsne
>>> scale = nptsne.HSneScale(sample_hsne, 0)
>>> scale.num_points
10000
Attributes
num_points

int: The number of landmark points in this scale

transition_matrix

The transition (probability) matrix in this scale.

landmark_orig_indexes

Original data indexes for each landmark in this scale.

get_landmark_weight(self: nptsne.libs._nptsne.HSneScale) → numpy.ndarray[numpy.float32]

The weights per landmark in the scale.

Returns
ndarray

Weights array in landmark index order

Examples

The size of landmark weights should match the number of points

>>> num_points = sample_scale2.num_points
>>> weights = sample_scale2.get_landmark_weight()
>>> weights.shape[0] == num_points
True

All weights at scale 0 should be 1.0

>>> weights = sample_scale0.get_landmark_weight()
>>> test = weights == 1.0
>>> test.all()
True
property area_of_influence

The area of influence matrix in this scale.

Returns
list(list(tuple)):

The area of influence matrix in this scale

Notes

The return is in list-of-lists (LIL) format. The list returned has one entry for each landmark point i at scale s-1, \(\mathcal{L}_{i}^{s-1}\). Each entry is a list of tuples, where each tuple contains an index j for a landmark at scale s, \(\mathcal{L}_{j}^{s}\), and a value \(\mathit{I}^{s}(i,j)\) representing the probability that the landmark point i at scale s-1 is influenced by landmark j at scale s.

The resulting matrix is sparse.

Examples

The size of landmark area of influence should match the number of points in the more detailed (s-1) scale.

>>> len(sample_scale2.area_of_influence) == sample_scale1.num_points
True

Loop over all the landmarks, i, at scale 1. Sum the influences from each landmark j at scale 2 on the individual landmarks i in scale 1. For each landmark i at scale 1 the total influence from the j landmarks should be approximately 1.0. In this random data test the difference is assumed to be < \(1.5\mathrm{e}{-2}\).

>>> aoi_2on1 = sample_scale2.area_of_influence
>>> scale1_sum = {}
>>> all_tots_are_1 = True
>>> for i in aoi_2on1:
...     sum_inf = 0.0
...     for j_tup in i:
...         sum_inf += j_tup[1]
...     if abs(1 - sum_inf) > 0.015:
...         print(f"{1- sum_inf}")
...         all_tots_are_1 = False
>>> all_tots_are_1 == True
True
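
The row-sum check above can be written more compactly on any list-of-lists structure. This is a minimal sketch on a hand-made three-row matrix; `row_sums` and the values in `aoi` are hypothetical and only illustrate the (index, value) tuple layout described in the Notes.

```python
def row_sums(lil):
    """Sum the values of each row in a list-of-lists (LIL) sparse matrix."""
    return [sum(value for _, value in row) for row in lil]


# Hand-made example: three landmarks at scale s-1, each influenced by
# one or two landmarks at scale s. Each tuple is (landmark index j, influence).
aoi = [
    [(0, 0.75), (1, 0.25)],
    [(1, 1.0)],
    [(0, 0.5), (2, 0.5)],
]

sums = row_sums(aoi)
# Same tolerance as the doctest above.
rows_ok = all(abs(1.0 - s) <= 0.015 for s in sums)
print(sums, rows_ok)
```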
property landmark_orig_indexes

Original data indexes for each landmark in this scale.

Returns
ndarray:

An ndarray of the original data indexes.

Examples

At scale 0 the landmarks are all the data points.

>>> sample_scale0.landmark_orig_indexes.shape
(10000,)
>>> sample_scale0.landmark_orig_indexes[0]
0
>>> sample_scale0.landmark_orig_indexes[9999]
9999
property num_points

int: The number of landmark points in this scale

Examples

>>> sample_scale0.num_points
10000
property transition_matrix

The transition (probability) matrix in this scale.

Returns
list(list(tuple)):

The transition (probability) matrix in this scale in list-of-lists form

Notes

The list returned has one entry for each landmark point; each entry is a list. The inner list contains tuples where the first item is an integer landmark index in the scale and the second item is the transition matrix value for the two points.

The resulting matrix is sparse in list-of-lists (LIL) form, one list per row containing a list of (column number, value) tuples.

Examples

The size of the transition matrix should match the number of points

>>> sample_scale0.num_points == len(sample_scale0.transition_matrix)
True
>>> sample_scale1.num_points == len(sample_scale1.transition_matrix)
True
>>> sample_scale2.num_points == len(sample_scale2.transition_matrix)
True
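
Many sparse-matrix tools expect coordinate (COO) triplets rather than a list of lists. The sketch below shows one way to flatten the LIL structure described in the Notes into row, column and value lists. `lil_to_coo` is a hypothetical helper and the `transition` matrix is hand-made, not output from nptsne.

```python
def lil_to_coo(lil):
    """Flatten a list-of-lists sparse matrix into (rows, cols, values)."""
    rows, cols, values = [], [], []
    for row_index, row in enumerate(lil):
        for col_index, value in row:
            rows.append(row_index)
            cols.append(col_index)
            values.append(value)
    return rows, cols, values


# Hand-made 3 x 3 transition matrix in LIL form: one list per row,
# each holding (column number, value) tuples.
transition = [
    [(1, 0.5), (2, 0.5)],
    [(0, 1.0)],
    [(0, 0.25), (1, 0.75)],
]

rows, cols, values = lil_to_coo(transition)
print(rows, cols, values)
```

If SciPy is available, the three lists can be passed to `scipy.sparse.coo_matrix((values, (rows, cols)), shape=(n, n))`, where `n` is the number of points in the scale.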
class nptsne.KnnAlgorithm(self: nptsne.libs._nptsne.KnnAlgorithm, value: int) → None

Bases: pybind11_builtins.pybind11_object

Enumeration used to select the knn algorithm used. Three possibilities are supported:

  • KnnAlgorithm.Flann: Knn using FLANN - Fast Library for Approximate Nearest Neighbors

  • KnnAlgorithm.HNSW: Knn using Hnswlib - fast approximate nearest neighbor search

  • KnnAlgorithm.Annoy: Knn using Annoy - Spotify Approximate Nearest Neighbors Oh Yeah

Members:

Flann

HNSW

Annoy

get_supported_metrics(self: int) → Dict[str, object]

Get a dict containing KnnDistanceMetric values supported by the KnnAlgorithm.

Parameters
knn_lib : KnnAlgorithm

The algorithm being queried.

Returns
dict

A dict mapping metric names to the KnnDistanceMetric values supported by the algorithm

Examples

Each algorithm has different support. See the tests below.

>>> import nptsne
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.Flann)
>>> for i in support.items():
...     print(i[0])
Euclidean
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.Annoy)
>>> for i in support.items():
...     print(i[0])
Cosine
Dot
Euclidean
Manhattan
>>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.HNSW)
>>> for i in support.items():
...     print(i[0])
Euclidean
Inner Product
>>> support["Euclidean"] is nptsne.KnnDistanceMetric.Euclidean
True
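
The doctest output above amounts to a small compatibility table. The following sketch records it as plain data so that an algorithm and metric pairing can be checked before a wrapper is constructed. `SUPPORTED_METRICS` and `is_supported` are hypothetical names, and the table is a static copy of the output shown above; code running against an installed nptsne should query `get_supported_metrics` instead.

```python
# Static snapshot of the metric names printed in the doctest above.
SUPPORTED_METRICS = {
    "Flann": {"Euclidean"},
    "Annoy": {"Cosine", "Dot", "Euclidean", "Manhattan"},
    "HNSW": {"Euclidean", "Inner Product"},
}


def is_supported(algorithm_name, metric_name):
    """Return True if the metric name is listed for the algorithm name."""
    return metric_name in SUPPORTED_METRICS.get(algorithm_name, set())


print(is_supported("Annoy", "Cosine"))
print(is_supported("Flann", "Cosine"))
```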
Annoy = <KnnAlgorithm.Annoy: 1>
Flann = <KnnAlgorithm.Flann: -1>
HNSW = <KnnAlgorithm.HNSW: 0>
property name
property value
class nptsne.KnnDistanceMetric(self: nptsne.libs._nptsne.KnnDistanceMetric, value: int) → None

Bases: pybind11_builtins.pybind11_object

Enumeration used to select the knn distance metric used. Five possibilities are supported:

  • KnnDistanceMetric.Euclidean: Euclidean metric for all algorithms

  • KnnDistanceMetric.InnerProduct: Inner Product metric for HNSW

  • KnnDistanceMetric.Cosine: Cosine metric for Annoy

  • KnnDistanceMetric.Manhattan: Manhattan metric for Annoy

  • KnnDistanceMetric.Hamming: Hamming metric for Annoy, not supported

  • KnnDistanceMetric.Dot: Dot metric for Annoy

Members:

Euclidean

Cosine

InnerProduct

Manhattan

Hamming

Dot

Cosine = <KnnDistanceMetric.Cosine: 1>
Dot = <KnnDistanceMetric.Dot: 5>
Euclidean = <KnnDistanceMetric.Euclidean: 0>
Hamming = <KnnDistanceMetric.Hamming: 4>
InnerProduct = <KnnDistanceMetric.InnerProduct: 2>
Manhattan = <KnnDistanceMetric.Manhattan: 3>
property name
property value
class nptsne.TextureTsne(self: nptsne.libs._nptsne.TextureTsne, verbose: bool = False, iterations: int = 1000, num_target_dimensions: int = 2, perplexity: int = 30, exaggeration_iter: int = 250, knn_algorithm: nptsne.libs._nptsne.KnnAlgorithm = KnnAlgorithm.Flann, knn_metric: nptsne.libs._nptsne.KnnDistanceMetric = KnnDistanceMetric.Euclidean) → None

Bases: pybind11_builtins.pybind11_object

Create a wrapper class for the linear tSNE implementation.

Parameters
verbose : bool

Enable verbose logging to standard output. Default is False.

iterations : int

The number of iterations to perform. This must be at least 1000. Default is 1000.

num_target_dimensions : int

The number of dimensions for the output embedding. Default is 2.

perplexity : int

The tSNE parameter that defines the neighborhood size. Usually between 10 and 30. Default is 30.

exaggeration_iter : int

The iteration when force exaggeration starts to decay. Default is 250.

knn_algorithm : KnnAlgorithm

The knn algorithm used for the nearest neighbor calculation. The default is Flann; for less than 50 dimensions HNSW may be faster.

knn_metric : KnnDistanceMetric

The knn distance metric used for the nearest neighbor calculation. The default is KnnDistanceMetric.Euclidean, the only supported metric for Flann.

Notes

TextureTsne is a GPU compute shader implementation of the gradient descent linear tSNE. If the system does not support OpenGL 4.3 or above, the implementation falls back to a texture rendering approach as described in [1].

References

[1] Pezzotti, N., Thijssen, J., Mordvintsev, A., Höllt, T., Van Lew, B., Lelieveldt, B.P.F., Eisemann, E., Vilanova, A. GPGPU Linear Complexity t-SNE Optimization. IEEE Transactions on Visualization and Computer Graphics 26, 1172–1181

Examples

Create a TextureTsne wrapper

>>> import nptsne
>>> tsne = nptsne.TextureTsne(verbose=True, knn_algorithm=nptsne.KnnAlgorithm.Annoy)
>>> tsne.verbose
True
>>> tsne.iterations
1000
>>> tsne.num_target_dimensions
2
>>> tsne.perplexity
30
>>> tsne.exaggeration_iter
250
>>> tsne.knn_algorithm == nptsne.KnnAlgorithm.Annoy
True
fit_transform(self: nptsne.libs._nptsne.TextureTsne, X: numpy.ndarray[numpy.float32]) → numpy.ndarray[numpy.float32]

Fit X into an embedded space and return that transformed output.

Parameters
X : ndarray

The input data with shape (num. data points, num. dimensions)

Returns
ndarray

A numpy array containing a flattened (1D) embedding

Examples

A 2D embedding is returned in the form of a numpy array [x0, y0, x1, y1, …].

>>> import nptsne
>>> tsne = nptsne.TextureTsne()
>>> embedding = tsne.fit_transform(sample_tsne_data)  
>>> embedding.shape  
(4000,)
>>> import numpy  
>>> embedding.dtype == numpy.float32  
True
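
Because the embedding is returned flattened as [x0, y0, x1, y1, …], it usually has to be regrouped into one coordinate tuple per point before plotting. The following is a minimal sketch of that regrouping on a plain Python list. `to_points` is a hypothetical helper, not part of nptsne, and the input values are invented.

```python
def to_points(flat, num_dims=2):
    """Group a flattened embedding into one tuple of coordinates per point."""
    if len(flat) % num_dims != 0:
        raise ValueError("length is not a multiple of num_dims")
    return [tuple(flat[i:i + num_dims]) for i in range(0, len(flat), num_dims)]


# Three 2D points stored as x0, y0, x1, y1, x2, y2.
flat = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
points = to_points(flat)
print(points)
```

For a numpy array the same regrouping is `embedding.reshape(-1, 2)`, which turns the (4000,) array from the doctest above into shape (2000, 2).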
property exaggeration_iter

int: The iteration where attractive force exaggeration starts to decay, set at initialization.

Notes

The gradient of the cost function used to iteratively optimize the embedding points \(y_i\) combines an attractive and a repulsive force: \(\frac{\delta C}{\delta y_i} = 4(\phi \cdot F_i^{attr} - F_i^{rep})\). The iterations up to exaggeration_iter increase the \(F_i^{attr}\) term by the factor \(\phi\), which then decays to 1.

Examples

>>> sample_texture_tsne.exaggeration_iter
250
property iterations

int: The number of iterations, set at initialization.

Examples

>>> sample_texture_tsne.iterations
1000
property knn_algorithm

int: The KnnAlgorithm value, set at initialization.

Examples

>>> import nptsne
>>> sample_texture_tsne.knn_algorithm == nptsne.KnnAlgorithm.Flann
True
property knn_distance_metric

int: KnnDistanceMetric value, set at initialization.

Examples

>>> import nptsne
>>> sample_texture_tsne.knn_distance_metric == nptsne.KnnDistanceMetric.Euclidean
True
property num_target_dimensions

int: The number of target dimensions, set at initialization.

Examples

>>> sample_texture_tsne.num_target_dimensions
2
property perplexity

int: The tsne perplexity, set at initialization.

Examples

>>> sample_texture_tsne.perplexity
30
property verbose

bool: True if verbose logging is enabled. Set at initialization.

Examples

>>> sample_texture_tsne.verbose
False
class nptsne.TextureTsneExtended(self: nptsne.libs._nptsne.TextureTsneExtended, verbose: bool = False, num_target_dimensions: int = 2, perplexity: int = 30, knn_algorithm: nptsne.libs._nptsne.KnnAlgorithm = KnnAlgorithm.Flann, knn_metric: nptsne.libs._nptsne.KnnDistanceMetric = KnnDistanceMetric.Euclidean) → None

Bases: pybind11_builtins.pybind11_object

Create an extended functionality wrapper for the linear tSNE implementation.

Parameters
verbose : bool

Enable verbose logging to standard output, default is False

num_target_dimensions : int

The number of dimensions for the output embedding. Default is 2.

perplexity : int

The tSNE parameter that defines the neighborhood size. Usually between 10 and 30. Default is 30.

knn_algorithm : KnnAlgorithm

The knn algorithm used for the nearest neighbor calculation. The default is ‘Flann’; for less than 50 dimensions ‘HNSW’ may be faster.

knn_metric : KnnDistanceMetric

The knn distance metric used for the nearest neighbor calculation. The default is KnnDistanceMetric.Euclidean, the only supported metric for Flann.

See also

TextureTsne

Notes

TextureTsneExtended offers additional control over the exaggeration decay compared to TextureTsne. Additionally, it supports inputting an initial embedding. Linear tSNE is described in [1].

References

[1] Pezzotti, N., Thijssen, J., Mordvintsev, A., Höllt, T., Van Lew, B., Lelieveldt, B.P.F., Eisemann, E., Vilanova, A. GPGPU Linear Complexity t-SNE Optimization. IEEE Transactions on Visualization and Computer Graphics 26, 1172–1181

Examples

Create a TextureTsneExtended wrapper

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended(verbose=True, num_target_dimensions=2, perplexity=35, knn_algorithm=nptsne.KnnAlgorithm.Annoy)
>>> tsne.verbose
True
>>> tsne.num_target_dimensions
2
>>> tsne.perplexity
35
>>> tsne.knn_algorithm == nptsne.KnnAlgorithm.Annoy
True
Attributes
decay_started_at

int: The iteration number when exaggeration decay started.

iteration_count

int: The number of completed iterations of tSNE gradient descent.

close(self: nptsne.libs._nptsne.TextureTsneExtended) → None

Release GPU resources for the transform

init_transform(self: nptsne.libs._nptsne.TextureTsneExtended, X: numpy.ndarray[numpy.float32], initial_embedding: numpy.ndarray[numpy.float32] = array([], dtype=float32)) → bool

Initialize the transform with the given data and an optional initial embedding.

Parameters
X : ndarray

The input data with shape (num. data points, num. dimensions)

initial_embedding : ndarray, optional

An optional initial embedding. Shape should be (num. data points, num. output dimensions)

Returns
bool

True if successful, False otherwise

Examples

Create a TextureTsneExtended wrapper and initialize the data. This step performs the knn.

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
reinitialize_transform(self: nptsne.libs._nptsne.TextureTsneExtended, initial_embedding: numpy.ndarray[numpy.float32] = array([], dtype=float32)) → None

Reinitialize the transform, resetting the iteration count. Knn is not recomputed. If no initial_embedding is supplied the embedding is re-randomized.

Parameters
initial_embedding : ndarray, optional

An optional initial embedding. Shape should be (num. data points, num. output dimensions)

Examples

Create a TextureTsneExtended wrapper, initialize the data and run for 100 iterations. Reinitializing then resets the iteration count to 0.

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> embedding = tsne.run_transform(iterations=100)    
>>> tsne.iteration_count    
100
>>> tsne.reinitialize_transform()    
>>> tsne.iteration_count    
0
run_transform(self: nptsne.libs._nptsne.TextureTsneExtended, verbose: bool = False, iterations: int = 1000) → numpy.ndarray[numpy.float32]

Run the transform gradient descent for a number of iterations with the current settings for exaggeration.

Parameters
verbose : bool

Enable verbose logging to standard output.

iterations : int

The number of iterations to run.

Returns
ndarray

A numpy array containing a flattened (1D) embedding. Coordinates are arranged: x0, y0, x1, y1, …

Examples

Create a TextureTsneExtended wrapper, initialize the data and run for 250 iterations.

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> embedding = tsne.run_transform(iterations=250)    
>>> embedding.shape    
(4000,)
>>> tsne.iteration_count    
250
start_exaggeration_decay(self: nptsne.libs._nptsne.TextureTsneExtended) → None

Enable exaggeration decay. Effective on the next call to run_transform. From this point exaggeration decays over the following 150 iterations; the decay length is a fixed parameter. This call is only effective once.

Raises
RuntimeError

If the decay is already active. This can be ignored.

Examples

The start of exaggeration decay is recorded in the decay_started_at property.

>>> import nptsne
>>> tsne = nptsne.TextureTsneExtended()
>>> tsne.init_transform(sample_tsne_data)
True
>>> tsne.decay_started_at
-1
>>> embedding = tsne.run_transform(iterations=100)    
>>> tsne.start_exaggeration_decay()    
>>> tsne.decay_started_at    
100
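
To make the effect of the 150-iteration decay window concrete, the sketch below computes an exaggeration factor per iteration. It is an illustration under stated assumptions, not the library's implementation: the linear shape of the decay and the starting factor of 4.0 are assumptions, and `exaggeration_factor` is a hypothetical function. Only the 150-iteration length and the -1 "not started" value come from this documentation.

```python
def exaggeration_factor(iteration, decay_started_at, phi=4.0, decay_iterations=150):
    """Exaggeration factor at an iteration, assuming a linear decay to 1."""
    # -1 means the decay has not been started, so the full factor applies.
    if decay_started_at < 0 or iteration <= decay_started_at:
        return phi
    progress = min(1.0, (iteration - decay_started_at) / decay_iterations)
    return phi - (phi - 1.0) * progress


# Decay started at iteration 100, as in the doctest above.
for it in (50, 100, 175, 250, 400):
    print(it, exaggeration_factor(it, decay_started_at=100))
```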
property decay_started_at

int: The iteration number when exaggeration decay started. Is -1 if exaggeration decay has not started.

Examples

The start of exaggeration decay is recorded in the decay_started_at property.

>>> sample_texture_tsne_extended.decay_started_at
-1
property iteration_count

int: The number of completed iterations of tSNE gradient descent.

Examples

>>> sample_texture_tsne_extended.iteration_count
0
property knn_algorithm

int: The KnnAlgorithm value, set at initialization.

Examples

>>> import nptsne
>>> sample_texture_tsne_extended.knn_algorithm == nptsne.KnnAlgorithm.Flann
True
property knn_distance_metric

int: The KnnDistanceMetric value, set at initialization.

Examples

>>> import nptsne
>>> sample_texture_tsne_extended.knn_distance_metric == nptsne.KnnDistanceMetric.Euclidean
True
property num_target_dimensions

int: The number of target dimensions, set at initialization.

Examples

>>> sample_texture_tsne_extended.num_target_dimensions
2
property perplexity

int: The tsne perplexity, set at initialization.

Examples

>>> sample_texture_tsne_extended.perplexity
30
property verbose

bool: True if verbose logging is enabled. Set at initialization.

Examples

>>> sample_texture_tsne_extended.verbose
False

nptsne.hsne_analysis: HSNE visual analysis support submodule

nptsne.hsne_analysis.Analysis

Create a new analysis as a child of an (optional) parent analysis.

nptsne.hsne_analysis.AnalysisContainer

A dict of dicts to store analyses

nptsne.hsne_analysis.AnalysisModel

Create an analysis model tree with a top level Analysis containing all landmarks at the highest scale. The AnalysisModel initially contains only the top analysis, i.e. the HSNE scale with the least number of points.

nptsne.hsne_analysis.EmbedderType

Enumeration used to select the embedder used. Two possibilities are supported: CPU and GPU.

nptsne.hsne_analysis.SparseTsne

SparseTsne is a wrapper for an approximating tSNE CPU implementation as described in [1].

class nptsne.hsne_analysis.Analysis(self: nptsne.libs._nptsne._hsne_analysis.Analysis, hnse: nptsne.libs._nptsne.HSne, embedder_type: nptsne.libs._nptsne._hsne_analysis.EmbedderType, parent: nptsne.libs._nptsne._hsne_analysis.Analysis = None, parent_selection: List[int] = []) → None

Bases: pybind11_builtins.pybind11_object

Create a new analysis as a child of an (optional) parent analysis.

Parameters
hsne : HSne

The hierarchical SNE being explored

embedder_type : EmbedderType

The tSNE embedder to use, CPU or GPU based

parent : Analysis, optional

The parent Analysis (where the selection was performed), if any

parent_selection : list, optional

List of selection indexes in the parent analysis.

Notes

Together with AnalysisModel, this class provides support for visual analytics of an hSNE. The Analysis class not only holds the chosen landmarks at a particular scale but also permits referencing back to the original data. Additionally, a t-SNE embedder is included (a choice is provided between GPU and CPU implementations), which can be used to create an embedding of the selected landmarks.

Examples

The Analysis constructor is meant for use by nptsne.hsne_analysis.AnalysisModel. The example here illustrates how a top level analysis would be created from a sample hsne.

>>> import nptsne
>>> top_analysis = nptsne.hsne_analysis.Analysis(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> top_analysis.scale_id
2
>>> sample_hsne.get_scale(top_analysis.scale_id).num_points == top_analysis.number_of_points
True
Attributes
number_of_points

int : number of landmarks in this Analysis

parent_id

int : Unique id of the parent analysis

transition_matrix

list(dict) : The transition (probability) matrix in this Analysis

landmark_weights

ndarray : the weights for the landmarks in this Analysis

landmark_indexes

ndarray : the indexes for the landmarks in this Analysis

landmark_orig_indexes

ndarray : the original data indexes for the landmarks in this Analysis

embedding

ndarray : the tSNE embedding generated for this Analysis

do_iteration(self: nptsne.libs._nptsne._hsne_analysis.Analysis) → None

Perform one iteration of the chosen embedder

get_area_of_influence(self: nptsne.libs._nptsne._hsne_analysis.Analysis, select_list: List[int], threshold: float = 0.3) → numpy.ndarray[numpy.float32]

Get the area of influence of the selection in the original data. For more information on the threshold refer to the HSNE paper section 4.2 Filtering and drilling down.

A fast but less accurate approach to obtaining the area of influence is get_fast_area_of_influence.

Parameters
select_list : list

A list of selection indexes for landmarks in this analysis

threshold : float, optional

The minimum value required for the underlying datapoint to be considered in the landmark’s region of influence. Default is 0.3. The parameter must be in the range 0 to 1.0; values outside the range are ignored.

Returns
ndarray

The mask of the original points represented by the selected landmarks. If the point is in the AOI the value is 1.

get_fast_area_of_influence(self: nptsne.libs._nptsne._hsne_analysis.Analysis, select_list: List[int]) → numpy.ndarray[numpy.float32]

Fast method to get the area of influence of the selection in the original data based on non overlapping \({1}\rightarrow{n}\) mapping of scale landmarks to original data points.

This mapping is derived by working bottom up from the data points and finding the landmarks at each scale with the maximum influence. The mapping is calculated once on the first call to this function so subsequent calls are fast.

Due to thresholding it is possible that a datapoint may have no representative landmark at a specific scale.

Parameters
select_list : list

A list of selection indexes for landmarks in this analysis

Returns
ndarray

The mask of the original points represented by the selected landmarks. If the point is in the AOI the value is 1.

Examples

Demonstrate the non-overlap of the area of influence for each landmark.

>>> import math
>>> import numpy as np
>>> all_top_landmarks=list(range(0,sample_analysis.number_of_points))
>>> all_influenced=sample_analysis.get_fast_area_of_influence(all_top_landmarks)
>>> all_influenced.shape[0] == 10000
True

Accumulate the individual landmark AOIs and check the total

>>> infl_accum = np.zeros((10000,), dtype=np.float32)
>>> total = 0
>>> for i in all_top_landmarks:
...     influenced = sample_analysis.get_fast_area_of_influence([i])
...     total = total + influenced.sum()
...     infl_accum = np.add(infl_accum, influenced)
>>> total == 10000
True

Verify that all AOIs are non-overlapping, each datapoint occurs once and only once.

>>> np.all(infl_accum == 1)
True
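
The returned mask has one entry per original data point, so a common next step is to turn it into a list of data indexes. The following is a minimal sketch on a hand-made mask; `selected_indices` is a hypothetical helper and the mask values are invented.

```python
def selected_indices(mask):
    """Return the positions whose mask value is 1 (inside the AOI)."""
    return [index for index, value in enumerate(mask) if value == 1]


# Hand-made mask over five data points; 1 marks a point in the AOI.
mask = [0.0, 1.0, 1.0, 0.0, 1.0]
indices = selected_indices(mask)
print(indices)
```

For a numpy mask, `np.flatnonzero(mask == 1)` gives the same indexes as an array.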
property embedding

ndarray : the tSNE embedding generated for this Analysis

Examples

An embedding is a 2d float array. One entry per point.

>>> import numpy as np
>>> sample_analysis.embedding.shape == (sample_analysis.number_of_points, 2)
True
>>> sample_analysis.embedding.dtype == np.float32
True
property id

int: Internally generated unique id for the analysis.

Examples

>>> sample_analysis.id
0
property landmark_indexes

ndarray : the indexes for the landmarks in this Analysis

Examples

In a complete top level analysis all points are present; in this case, all the points at scale 2.

>>> import numpy as np
>>> np.array_equal(
... np.arange(sample_scale2.num_points, dtype=np.uint32), 
... sample_analysis.landmark_indexes)
True
property landmark_orig_indexes

ndarray : the original data indexes for the landmarks in this Analysis

Examples

The indexes are in the range of the original point indexes.

>>> import numpy as np
>>> np.logical_and(
... sample_analysis.landmark_orig_indexes >= 0,
... sample_analysis.landmark_orig_indexes < 10000).all()
True
property landmark_weights

ndarray : the weights for the landmarks in this Analysis

Examples

There will be a weight for every point.

>>> weights = sample_analysis.landmark_weights
>>> weights.shape == (sample_analysis.number_of_points,)
True
property number_of_points

int : number of landmarks in this Analysis

Examples

The sample analysis is all the top scale points

>>> sample_analysis.number_of_points == sample_scale2.num_points
True
property parent_id

int : Unique id of the parent analysis

property scale_id

int: The number of this HSNE scale where this analysis is created.

Examples

>>> sample_analysis.scale_id
2
property transition_matrix

list(dict) : The transition (probability) matrix in this Analysis

class nptsne.hsne_analysis.AnalysisContainer(top_analysis: nptsne.libs._nptsne._hsne_analysis.Analysis)

Bases: object

A dict of dicts to store analyses

Parameters
top_analysis : Analysis

The Analysis at the highest scale level containing all landmarks

Notes

The outer dict represents the scales and the inner dicts at scale level are indexed by the unique self-generated Analysis ids.

add_analysis(analysis: nptsne.libs._nptsne._hsne_analysis.Analysis) → None

Add a new analysis to the container

Parameters
analysis : Analysis

The new analysis

get_analysis(analysis_id: int) → nptsne.libs._nptsne._hsne_analysis.Analysis

Get the analysis corresponding to the id

Parameters
analysis_id : int

The unique id of the analysis to retrieve

Returns
Analysis

The analysis corresponding to the id

Raises
ValueError

If the id does not correspond to an analysis

remove_analysis(analysis_id: int) → List[int]

Removes analysis and, recursively, child analyses

Returns
list[int]

A list of analysis ids removed including this one
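
The Notes for this class describe a dict of dicts keyed by scale and then by analysis id, and remove_analysis removes children recursively. The sketch below illustrates that data structure in plain Python. It is not the nptsne implementation: `SketchContainer`, its methods, and the choice to store only the parent id per analysis are assumptions made for the example.

```python
class SketchContainer:
    """Illustrative store laid out as {scale_id: {analysis_id: parent_id}}."""

    def __init__(self):
        self._by_scale = {}

    def add(self, analysis_id, scale_id, parent_id=None):
        self._by_scale.setdefault(scale_id, {})[analysis_id] = parent_id

    def _find_scale(self, analysis_id):
        for scale_id, analyses in self._by_scale.items():
            if analysis_id in analyses:
                return scale_id
        raise ValueError(f"no analysis with id {analysis_id}")

    def remove(self, analysis_id):
        """Remove an analysis and, recursively, its children; return the ids."""
        scale_id = self._find_scale(analysis_id)
        # Collect the children first so no dict is modified while iterating.
        children = [
            child_id
            for analyses in self._by_scale.values()
            for child_id, parent_id in analyses.items()
            if parent_id == analysis_id
        ]
        removed = []
        for child_id in children:
            removed.extend(self.remove(child_id))
        del self._by_scale[scale_id][analysis_id]
        removed.append(analysis_id)
        return removed


container = SketchContainer()
container.add(0, scale_id=2)               # top analysis
container.add(1, scale_id=1, parent_id=0)  # two children of the top analysis
container.add(2, scale_id=1, parent_id=0)
container.add(3, scale_id=0, parent_id=1)  # grandchild under analysis 1
removed = container.remove(1)
print(removed)
```

Removing analysis 1 also removes its child 3, while its sibling 2 and the top analysis stay in place.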

class nptsne.hsne_analysis.AnalysisModel(hsne: nptsne.libs._nptsne.HSne, embedder_type: nptsne.libs._nptsne._hsne_analysis.EmbedderType)

Bases: object

Create an analysis model tree with a top level Analysis containing all landmarks at the highest scale. The AnalysisModel initially contains only the top analysis, i.e. the HSNE scale with the least number of points. As selections are made starting from the top analysis, new sub analyses are added to the tree in AnalysisModel. The helper class AnalysisContainer is responsible for maintaining this tree of analyses.

Parameters
hsne : HSne

The python HSne wrapper class

embedder_type : hsne_analysis.EmbedderType

The embedder to be used when creating a new analysis, CPU or GPU

See also

hsne_analysis.Analysis
hsne_analysis.EmbedderType.CPU

Notes

The hsne_analysis.AnalysisModel contains the user driven selections when exploring an HSNE hierarchy. The AnalysisModel is created with a top level default hsne_analysis.Analysis containing all top level landmarks.

Examples

Initialize a model using loaded HSne data.

>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> model.top_scale_id
2
Attributes
top_analysis

hsne_analysis.Analysis: The top level analysis

analysis_container

The container for all analyses.

bottom_scale_id
top_scale_id
add_new_analysis(parent: nptsne.libs._nptsne._hsne_analysis.Analysis, parent_selection: numpy.ndarray) → nptsne.libs._nptsne._hsne_analysis.Analysis

Add a new analysis based on a selection in a parent analysis

Parameters
parent: Analysis

The parent analysis

parent_selection: ndarray<np.uint32>

The selection indices in the parent analysis

Examples

Make a child analysis by selecting half of the points in the top analysis. The analysis is created at the next scale down, is a child of the top level analysis, and contains an embedding of the right shape.

>>> import numpy as np
>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> sel = np.arange(int(model.top_analysis.number_of_points / 2))
>>> analysis = model.add_new_analysis(model.top_analysis, sel)
>>> analysis.scale_id
1
>>> analysis.parent_id == model.top_analysis.id
True
>>> analysis.embedding.shape == (analysis.number_of_points, 2)
True
get_analysis(id: int) → nptsne.libs._nptsne._hsne_analysis.Analysis

Get the Analysis for the given id

Parameters
id: int

An Analysis id

Returns
The Analysis corresponding to the id
Raises
ValueError

If the id does not correspond to an analysis

Examples

>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> id = model.top_analysis.id
>>> str(model.top_analysis) == str(model.get_analysis(id))
True
remove_analysis(id: int) → List[int]

Remove the analysis and all children

Returns
list[int]

list of deleted ids

Examples

>>> import numpy as np
>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> sel = np.arange(int(model.top_analysis.number_of_points / 2))
>>> analysis = model.add_new_analysis(model.top_analysis, sel)
>>> id = analysis.id
>>> a_list = model.remove_analysis(analysis.id)
>>> a_list == [id]
True
property analysis_container

The container for all analyses.

This is an internal property exposed for debug purposes only

property top_analysis

hsne_analysis.Analysis: The top level analysis

Raises
ValueError

If there is no top analysis

Examples

Retrieve the top level analysis containing all points at the top level.

>>> import nptsne
>>> model = nptsne.hsne_analysis.AnalysisModel(sample_hsne, nptsne.hsne_analysis.EmbedderType.CPU)
>>> analysis = model.top_analysis
>>> analysis.scale_id
2
class nptsne.hsne_analysis.EmbedderType(self: nptsne.libs._nptsne._hsne_analysis.EmbedderType, value: int) → None

Bases: pybind11_builtins.pybind11_object

Enumeration used to select the embedder. Two possibilities are supported:

EmbedderType.CPU: CPU tSNE

EmbedderType.GPU: GPU tSNE

Members:

CPU

GPU

CPU = <EmbedderType.CPU: 0>
GPU = <EmbedderType.GPU: 1>
property name
property value
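The relationship between the members, their integer values, and the name and value properties can be mimicked with the standard library. This is only an illustrative sketch: `ToyEmbedderType` and `pick_embedder` are hypothetical, and the real class is a pybind11 enumeration whose behaviour may differ in details not listed here.

```python
from enum import IntEnum


class ToyEmbedderType(IntEnum):
    """Hypothetical stand-in mirroring the documented members and values."""
    CPU = 0
    GPU = 1


def pick_embedder(gpu_available: bool) -> ToyEmbedderType:
    """Hypothetical helper: prefer the GPU embedder, otherwise use the CPU."""
    return ToyEmbedderType.GPU if gpu_available else ToyEmbedderType.CPU


chosen = pick_embedder(gpu_available=False)
from_value = ToyEmbedderType(1)  # look a member up by its integer value
```

With `gpu_available=False` the helper returns the CPU member, whose value is 0, and looking up the value 1 gives the GPU member.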
class nptsne.hsne_analysis.SparseTsne

Bases: pybind11_builtins.pybind11_object

SparseTsne is a wrapper for an approximating tSNE CPU implementation as described in [1].

It is an alternative to TextureTsne, intended for internal use in the Analysis class when GPU acceleration is not available for creating the embedding.

References

1

Pezzotti, N., Lelieveldt, B.P.F., Maaten, L. van der, Höllt, T., Eisemann, E., Vilanova, A., 2017. Approximated and User Steerable tSNE for Progressive Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 23, 1739–1752.

Attributes
embedding: ndarray

Embedding plot - shape embed dimensions x num points

do_iteration(self: nptsne.libs._nptsne._hsne_analysis.SparseTsne) → None

Perform a single tSNE iteration on the sparse data. Once the iteration is complete, the embedding coordinates can be read via the embedding property.

property embedding

Embedding plot - shape embed dimensions x num points
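A typical way to drive an iteration-at-a-time embedder is a simple loop followed by a read of the embedding. The sketch below assumes only the two documented members, do_iteration() and embedding. The `run_iterations` helper and the `CountingTsne` stand-in are hypothetical, and how a real SparseTsne instance is obtained is not covered in this section.

```python
class CountingTsne:
    """Hypothetical stand-in exposing the two documented members."""

    def __init__(self) -> None:
        self.iterations_done = 0

    def do_iteration(self) -> None:
        # A real embedder would update its coordinates here.
        self.iterations_done += 1

    @property
    def embedding(self):
        # Placeholder value; the real property returns an ndarray.
        return [float(self.iterations_done)]


def run_iterations(tsne, iterations: int):
    """Call do_iteration() `iterations` times, then return the embedding."""
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    for _ in range(iterations):
        tsne.do_iteration()
    return tsne.embedding


tsne = CountingTsne()
result = run_iterations(tsne, 5)
```

Because the embedding is read only after the loop, a caller that wants progressive updates could instead read the property inside the loop after each iteration.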