Coding tips

These are a series of tips for using the t-SNE and HSNE functionality. For complete documentation refer to the nptsne API Reference.

The code given in the tips is all taken either from the documentation examples (in the docstrings or the nptsne API Reference) or from the demos.

Full demos and data are available in demos.zip (see demos + data).

t-SNE tips

1. Creating a simple t-SNE embedding

The simple t-SNE class is called TextureTsne. It uses either a GPU compute kernel (if available) or a GPU texture-driven approach to calculate t-SNE.

The code shown is from the TextureTsne demo code test.py and the API is documented at nptsne.TextureTsne.

Note that the number of iterations or perplexity can be set when creating the TextureTsne object but in the example the defaults are used.

Assuming mnist [1] is a dictionary containing numpy arrays, the following code will produce an embedding:

tsne = nptsne.TextureTsne(True)  # True triggers verbose output
embed = tsne.fit_transform(mnist["data"])
print(embed.shape)
embed = embed.reshape(70000, 2)
[1] mnist in this example is the well-known handwritten digits data.
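The flat layout of the result returned by fit_transform can be illustrated without running nptsne; this sketch assumes, as in the example above, a 1-D float array holding 70000 interleaved x, y pairs (the array here is a zero-filled stand-in, not real t-SNE output):

```python
import numpy as np

# Hypothetical stand-in for the 1-D output of fit_transform:
# 70000 (x, y) pairs flattened into a single float array.
n_points = 70000
flat_embedding = np.zeros(n_points * 2, dtype=np.float32)

# One reshape gives a row per data point, ready for scatter plotting.
points = flat_embedding.reshape(n_points, 2)
print(points.shape)  # (70000, 2)
```

The reshape is why the example above calls embed.reshape(70000, 2) before plotting.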

2. Creating an extended t-SNE embedding

The class nptsne.TextureTsneExtended adds flexibility to the API in nptsne.TextureTsne. It permits running a number of iterations, examining the embedding, then running more iterations. Force exaggeration decay (refer to the t-SNE GPU paper [NP2019]) can be triggered at a suitable point. If force exaggeration is maintained the resulting clusters will more closely resemble a UMAP plot. For a related effect see the umap to tSNE example.

The code shown here is adapted from the TextureTsneExtended demo code testtextureextended.py and the API is documented at nptsne.TextureTsneExtended.

tsne = nptsne.TextureTsneExtended(False)
tsne.init_transform(mnist["data"])
for i in range(20):
    # reduce the forces from iteration 1000
    if i == 10:
        tsne.start_exaggeration_decay()
    embedding = tsne.run_transform(verbose=False, iterations=100)
tsne.close()

A user-friendly way to explore nptsne.TextureTsne and nptsne.TextureTsneExtended is to use Jupyter notebooks with the tSNE Jupyter notebook demo.

HSNE tips

HSNE is designed for visual analysis of large multidimensional data. An HSNE visual analytics session typically follows a number of steps:

  1. Create a multi-scale HSNE hierarchy

  2. Display an embedding [2] based on all the landmarks [3] in the topmost scale

  3. Interact with clusters in the embedding to make sub-selections of landmarks.

    a) Handle landmark selections by displaying the Area of Influence corresponding to the landmark (how this is done is application dependent).

  4. Choose a selection to create a lower scale analysis with nptsne.hsne_analysis.Analysis

  5. Display an embedding based on that Analysis. Continue with step 3.

  6. Repeat steps 3, 4, and 5 to create a tree of analyses and corresponding visualizations.

[2] The examples use t-SNE embeddings.

[3] An HSNE landmark at scale n is defined to be a datapoint representing a number of neighbouring points (as defined by the chosen metric) at scale n-1.
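The session steps above can be sketched as a plain-Python tree. AnalysisNode here is a hypothetical stand-in for nptsne.hsne_analysis.Analysis, kept minimal to show only the parent/child bookkeeping of steps 2 to 6:

```python
# AnalysisNode is a hypothetical stand-in, not an nptsne class.
class AnalysisNode:
    def __init__(self, scale_id, landmark_indexes, parent=None):
        self.scale_id = scale_id                # HSNE scale this analysis lives at
        self.landmark_indexes = landmark_indexes
        self.parent = parent
        self.children = []                      # sub-analyses made from selections

    def drill_down(self, selected_indexes):
        """Step 4: create a lower-scale analysis from a selection."""
        child = AnalysisNode(self.scale_id - 1, selected_indexes, parent=self)
        self.children.append(child)
        return child

# Step 2: a top-level analysis over all landmarks in the topmost scale
top = AnalysisNode(scale_id=2, landmark_indexes=list(range(100)))
# Steps 3-5: a selection in the embedding spawns a child analysis one scale down
child = top.drill_down([3, 7, 42])
print(child.scale_id)  # one scale below the parent
```

Repeating drill_down on different selections (step 6) grows the tree that the demos manage with nptsne.hsne_analysis.AnalysisModel.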

Support for HSNE based visual analytics in nptsne

The submodule nptsne.hsne_analysis contains classes to assist in the creation and navigation of a hierarchy of analyses.

The embedded support for visual analytics in nptsne is limited to data management, but examples of how visualization can be done (using matplotlib and PyQt5) can be found in the demos. The nptsne.hsne_analysis system forms the core of both the Basic HSNE demo code and the Extended HSNE demo code.

A number of the steps are highlighted here (in simplified form):

1. Create a multi-scale HSNE hierarchy

Code is adapted from Doctest code run_doctest.py.

import nptsne
import numpy as np
data = np.random.randint(256, size=(10000, 16)) # create some random data
hsne = nptsne.HSne(True) # a verbose HSne object
hsne.create_hsne(data, 3) # create a three level hierarchy
# create the top level analysis using all the landmarks
top_analysis = nptsne.hsne_analysis.Analysis(hsne, nptsne.hsne_analysis.EmbedderType.CPU)

2. Creating an analysis hierarchy

This is a simplified overview showing one way to perform visual analytics with nptsne.HSne and the nptsne.hsne_analysis support classes in python. See Basic HSNE demo code for details.

2a. Display and iterate the analysis embedding

Code fragments are adapted from Basic HSNE demo code AnalysisGui.py

The Python library matplotlib supports interactive scatter plots and plot animation. This can be used to display and iterate the t-SNE embedding of the nptsne.hsne_analysis.Analysis. The actual code is more complex and includes selections and the display of the corresponding mnist digits on mouse over. In the actual AnalysisGui.py the code shown is part of an AnalysisGui class permitting multiple analysis embeddings to be shown simultaneously.

import matplotlib.pyplot as plt
import matplotlib.animation as animation
from nptsne import hsne_analysis
# input_analysis could be top_analysis as shown above
# or the result of a new selection
analysis: hsne_analysis.Analysis = input_analysis
fig = plt.figure(num=str(analysis))
# setup animation
ani = animation.FuncAnimation(
    fig,
    iterate_Tsne,
    init_func=start_plot,
    frames=range(num_frames),
    interval=100,
    repeat=True,
    blit=True,
)
stop_iter = False
num_iters = 350

def start_plot():
    # Reserve space for a scatter plot of the embedding
    #
    # ***********************************************************
    embedding = analysis.embedding   # Extract the embedding
    # ***********************************************************
    x = embedding[:, 0]
    y = embedding[:, 1]
    # ********************************************************************
    scatter = ax.scatter( # Point size represents the landmark weight
        x, y, s=analysis.landmark_weights * 8, c="b", alpha=0.4, picker=10
    )
    # ********************************************************************

def iterate_Tsne(i):
    # In practice do several iterations per animation frame
    # to give a smoother feeling to the embedding
    fig.canvas.flush_events()

    if not stop_iter:
        # *********************
        analysis.do_iteration()  # Perform an iteration of the embedding
        # *********************

        if i == num_iters:
            stop_iter = True

        # Update point positions
        # *************************************
        scatter.set_offsets(analysis.embedding) # Update the scatter plot
        # *************************************
        # At this point the embedding plot should be rescaled
        # as the size of the embedding changes.
        # See AnalysisGui.py update_scatter_plot_limits for details
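The rescaling mentioned in the last comment can be sketched without matplotlib. This helper is an assumption for illustration (it mirrors, but is not, the demo's update_scatter_plot_limits): it computes padded axis limits from the current embedding extent, which could then be passed to ax.set_xlim/ax.set_ylim.

```python
import numpy as np

def padded_plot_limits(embedding, padding=0.1):
    """Compute (xmin, xmax, ymin, ymax) for the scatter axes,
    expanded by a fractional padding around the embedding extent."""
    xmin, ymin = embedding.min(axis=0)
    xmax, ymax = embedding.max(axis=0)
    xpad = (xmax - xmin) * padding
    ypad = (ymax - ymin) * padding
    return tuple(float(v) for v in (xmin - xpad, xmax + xpad,
                                    ymin - ypad, ymax + ypad))

# Sample 2-point embedding spanning x in [0, 10] and y in [0, 20]
emb = np.array([[0.0, 0.0], [10.0, 20.0]])
print(padded_plot_limits(emb))  # (-1.0, 11.0, -2.0, 22.0)
```

Recomputing the limits each frame keeps the whole embedding visible as it expands during the early, exaggerated iterations.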

2b. Select a region in the embedding to create a new analysis

Code fragments adapted from Basic HSNE demo code AnalysisGui.py and ModelGui.py

The code concentrates on the conversion between a selection rectangle and the creation of the new analysis.

# The selection origin is tracked in the rorg_xy tuple (embedding coords)
# The current cursor coordinate is dim_xy (embedding coords)
def on_end_select(self, event):
    # ******************************
    if self.analysis.scale_id == 0:  # at the data level can't drill down
        return
    # ******************************

    # *********************************
    embedding = self.analysis.embedding  # Get the embedding points that fall in the current selection rectangle
    # *********************************
    # Get the ordered indexes at this analysis level
    indexes = np.arange(embedding.shape[0])
    selected_indexes = indexes[
        (embedding[:, 0] > rorg_xy[0])
        & (embedding[:, 0] < rorg_xy[0] + dim_xy[0])
        & (embedding[:, 1] > rorg_xy[1])
        & (embedding[:, 1] < rorg_xy[1] + dim_xy[1])
    ]
    if selected_indexes.shape[0] > 0:
        # ************************************************************************
        new_analysis = analysis_model.add_new_analysis(self.analysis, selected_indexes) # Add a new analysis to the model with the current one as parent
        # ************************************************************************
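The rectangle test in on_end_select can be exercised stand-alone with numpy; the embedding, rorg_xy and dim_xy values below are sample data rather than values captured from a matplotlib event:

```python
import numpy as np

# Hypothetical 2-D embedding and selection rectangle
embedding = np.array([[0.5, 0.5], [2.0, 2.0], [1.0, 1.5], [3.0, 0.2]])
rorg_xy = (0.0, 0.0)   # rectangle origin in embedding coords
dim_xy = (1.5, 2.0)    # rectangle width and height

# Boolean masking picks the indexes of points inside the rectangle
indexes = np.arange(embedding.shape[0])
selected_indexes = indexes[
    (embedding[:, 0] > rorg_xy[0])
    & (embedding[:, 0] < rorg_xy[0] + dim_xy[0])
    & (embedding[:, 1] > rorg_xy[1])
    & (embedding[:, 1] < rorg_xy[1] + dim_xy[1])
]
print(selected_indexes)  # points 0 and 2 fall inside the rectangle
```

The resulting indexes are local to the current analysis, which is why the AOI code later translates them via landmark_indexes.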

3. Extending the HSNE viewers

The Extended HSNE demo code, a simple but fairly complete visual analysis tool, includes two additional viewers capable of visualizing other types of multidimensional data:

  1. Image as datapoint - MNIST-like data

  2. Hyperspectral image - examples include a hyperspectral image of the sun and multispectral earth observation satellite imaging

  3. Point and meta data - for example cell data classified according to gene expression and meta data related to cell types that can be used to color the embeddings

Extended HSNE demo code extends the approach in Basic HSNE demo code. The AnalysisGui has been split into a reusable EmbeddingGui (for the analysis embedding) and separate viewers for the different data types: HyperspectralImageViewer, MetaDataViewer and CompositeImageViewer. An AnalysisController mediates between selections and the maintenance of the AnalysisModel.

A detailed explanation of these viewers and other support classes can be found in the Extended HSNE demo code README.md

3a. Area of influence from landmarks

In certain applications (for example hyperspectral imaging) visualizing the entirety of the data points that map to a landmark selection is required. This is termed the area of influence (AOI) of the landmark selection.

In Extended HSNE demo code AnalysisController.py the on_selection function illustrates that this mapping can be done either slowly and accurately (nptsne.hsne_analysis.Analysis.get_area_of_influence()) or quickly but less accurately (nptsne.hsne_analysis.Analysis.get_fast_area_of_influence()). Typically the fast AOI is suitable for mouse-over events.

Note that selection indexes, numbered from 0 to number_of_landmarks-1 at a scale, must be converted to data indexes using nptsne.hsne_analysis.Analysis.landmark_indexes.
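Since landmark_indexes behaves like a numpy array, that conversion can also be sketched as a single fancy-indexing step; the arrays below are sample data, not real HSNE output:

```python
import numpy as np

# Sample mapping from analysis-local indexes to scale landmark indexes
landmark_indexes = np.array([5, 9, 12, 40, 41])  # stands in for analysis.landmark_indexes
sel_indexes = [0, 2, 4]                          # a selection within this analysis

# Fancy indexing translates the whole selection in one step
scale_indexes = landmark_indexes[sel_indexes]
print(scale_indexes.tolist())  # [5, 12, 41]
```

The loop in landmark_index_from_selection below does the same translation element by element.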

def landmark_index_from_selection(self, sel_indexes: List[int]) -> List[int]:
    """Selection indexes in this analysis are converted to landmark
    indexes in this scale"""
    landmark_indexes = []
    # Translate the selection indexes to the scale indexes
    for i in sel_indexes:
        landmark_indexes.append(self.analysis.landmark_indexes[i])
    return landmark_indexes

def on_selection(
    self, analysis_selection: List[int], make_new_analysis: bool, fast: bool = False
) -> None:
    """analysis_selection is a list of indexes at this analysis scale
    If make_new_analysis is true start a new analysis controller"""
    # Selection indexes are from 0 - number of landmarks. The original
    # data indexes of the landmarks are needed for the AOI
    landmark_indexes = self.landmark_index_from_selection(analysis_selection)
    if self.demo_type == DemoType.HYPERSPECTRAL_DEMO:
        # Pass the area of influence to the hyperspectral viewer
        aoi: np.ndarray
        if fast:
            aoi = self.analysis.get_fast_area_of_influence(landmark_indexes)
        else:
            aoi = self.analysis.get_area_of_influence(landmark_indexes)
        self.data_gui.set_static_mask(aoi)
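What set_static_mask might then do with the AOI can be sketched with plain numpy; the image shape and the reading of aoi as flattened per-pixel weights in [0, 1] are assumptions for illustration, not the viewer's actual implementation:

```python
import numpy as np

def apply_static_mask(image, aoi):
    """Dim pixels outside the area of influence.
    aoi holds one weight in [0, 1] per pixel, flattened row-major."""
    mask = aoi.reshape(image.shape[:2])
    # Broadcast the per-pixel weight over the colour channels
    return image * mask[..., np.newaxis]

# 2x2 RGB image, with the AOI covering only the first row
image = np.ones((2, 2, 3))
aoi = np.array([1.0, 1.0, 0.0, 0.0])
masked = apply_static_mask(image, aoi)
print(masked[0, 0], masked[1, 1])  # first pixel kept, last pixel zeroed
```

Weighted (rather than binary) masking lets the fast, approximate AOI fade smoothly at its boundary.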