===========
Coding tips
===========

These are a series of tips for using the t-SNE and HSNE functionality. 
For complete documentation refer to the :doc:`nptsne API Reference <./nptsne>`. 


The code given in the tips is all taken from either the documentation examples 
(in the docstrings or see :doc:`nptsne API Reference <./nptsne>`)
or is available in the demos.

Full demos and data are available in demos.zip |zenodo_url|.

t-SNE tips
==========

1. Creating a simple t-SNE embedding
------------------------------------

The simple t-SNE class is called TextureTsne it uses either a GPU compute kernel (if available)
of a GPU texture driven approach to calculating t-SNE.

The code shown is from |TTdemo_github_url| test.py and API is documented at at :py:class:`nptsne.TextureTsne`

Note that the number of iterations or perplexity can be set when creating the `TextureTsne` object
but in the example the defaults are used.

Assuming *mnist* [#]_ is a dictionary containg numpy arrays the following code
will produce an embedding:

.. code-block:: python

    tsne = nptsne.TextureTsne(True)  # True triggers verbose output
    embed = tsne.fit_transform(mnist["data"])
    print(embed.shape)
    embed = embed.reshape(70000, 2)

.. [#] *mnist* in this example is the well-known `handwritten digits data <http://yann.lecun.com/exdb/mnist/>`_.

1. Creating an extended t-SNE embedding
---------------------------------------

The class :py:class:`nptsne.TextureTsneExtended` add flexibility to the API in :py:class:`nptsne.TextureTsne`.
It permits running a number of iterations, examine the embedding,
then running more embeddings. Force exaggeration decay (refer to the t-SNE GPU paper [NP2019]_)
can be triggered a suitable point. If force exaggeration is maintained the resulting
clusters will more closely resemble a *umap* plot. For a related effect see the :ref:`umap-tsne-label`.

The code shown here is adapted from |TTEdemo_github_url| testtextureextended.py and
the API is documented at :py:class:`nptsne.TextureTsneExtended`

.. code-block:: python

    tsne = nptsne.TextureTsneExtended(False)
    tsne.init_transform(mnist["data"])
    for i in range(20):
        # reduce the forces from iteration 1000
        if i == 10:
            tsne.start_exaggeration_decay()
        embedding = tsne.run_transform(verbose=False, iterations=100)
    tsne.close()

A user-friendly way to explore :py:class:`nptsne.TextureTsne` and :py:class:`nptsne.TextureTsneExtended`
is to use Jupyter notebooks with the |TNotebook_github_url|.

HSNE tips
=========

HSNE is designed for visual analyis of large multidimensional data. An HSNE visual analytics session typically
follows a number of steps:

1. Create a multi-scale *HSNE* hierarchy
2. Display an embedding [#]_ based on all the landmarks [#]_ in the topmost scale
3. Interact with clusters in the embedding to make sub-selections of landmarks.

    a) Handle landmark selections by displaying the *Area of Interest*
    corresponding to the landmark (how this is done this is application dependent).

4. Choose a selection to create a lower scale analysis with :py:class:`nptsne.hsne_analysis.Analysis`
5. Display an embedding based on that *Analysis*. Continue with step 3.
6. Repeat steps 3., 4, and 5. to create a tree of analysis and corresponding visualizations.


.. [#] The examples use *t-SNE* embeddings
.. [#] An *HSNE* landmark at scale n is defined to be a datapoint representing a number of neighbouring (as defined by the chosen metric) points at scale n-1

Support for HSNE based visual analytics in nptsne
-------------------------------------------------

The submodule :py:mod:`nptsne.hsne_analysis` contains classes to assist in the creation
and navigation of a hierarchy of analyses:

- :py:class:`nptsne.hsne_analysis.Analysis` - a selection of landmarks at one *HSNE* scale under examination
- :py:class:`nptsne.hsne_analysis.AnalysisModel` - a hierarchy of landmark selections (:py:class:`nptsne.hsne_analysis.Analysis`) representing the totality of a visual analytics session
- :py:class:`nptsne.hsne_analysis.AnalysisContainer` - a container type used by `nptsne.hsne_analysis.AnalysisModel` to hold :py:class:`nptsne.hsne_analysis.Analysis`


The nptsne embedded support for visual analytics is limited to data management but examples of how visualization 
can be done (using matplotlib and PyQt5) can be found in the demos.
The :py:mod:`nptsne.hsne_analysis` system forms the core of both the |HSNEdemo_github_url| and the |EXHSNEdemo_github_url|.

A number of the steps have been highlight (in simplified form) here:

1. Create a multi-scale HSNE hierarchy
--------------------------------------

Code is adapted from |doctest_github_url| run_doctest.py.

.. code-block:: python

    import nptsne
    import numpy as np
    data = np.random.randint(256, size=(10000, 16)) # create some random data
    hsne = nptsne.HSne(True) # a verbose HSne object
    hsne.create_hsne(hsne_data, 3) # create a three level hierarchy
    # create the ctop level analysis using all the landmarks
    top_analysis = nptsne.hsne_analysis.Analysis(hsne, nptsne.hsne_analysis.EmbedderType.CPU)

2. Creating an analysis hierarchy
---------------------------------

This is a simplified overview showing one way to perform visual analytics with :py:class:`nptsne.HSne` 
and the :py:mod:`nptsne.hsne_analysis` support classes in python. See |HSNEdemo_github_url| for details.

2a. Display and iterate the analysis embedding
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Code fragments are adapted from |HSNEdemo_github_url| AnalysisGui.py

Python library *matplotlib* supports interactive scatter plots and plot animation.
This can be used to display and iterate the t-SNE embedding of the - :py:class:`nptsne.hsne_analysis.Analysis`.
The actual code is more complex and includes selections and the display of the 
corresponding *mnist* digits on mouse over. In the actual AnalysisGui.py 
the code shown is part of a an *AnalysisGui* class 
permitting multiple analysis embeddings to be show simultaneously.

.. code-block:: python


    import matplotlib.pyplot as plt
    from nptsne import hsne_analysis
    # input_analysis could be top_analysis as shown above
    # or the result of a new selection
    analysis: hsne_analysis.Analysis = input_analysis
    fig = plt.figure(num=str(analysis))
    # setup animation
    ani = animation.FuncAnimation(
        fig,
        iterate_Tsne,
        init_func=self.start_plot,
        frames=range(self.num_frames),
        interval=100,
        repeat=True,
        blit=True,
    )
    stop_iter = False
    num_iters = 350

    def start_plot()
        # Reserve space for a scatter plot of the embedding,
        #
        # ***********************************************************
        embedding = self.analysis.embedding   # Extract the embedding
        # ***********************************************************
        x = embedding[:, 0]
        y = embedding[:, 1]
        # ********************************************************************
        scatter = ax.scatter( # Point size represents the landmark weight
            x, y, s=analysis.landmark_weights * 8, c="b", alpha=0.4, picker=10
        )
        # ********************************************************************

    def iterate_Tsne(i):
        # In practice do several iterations per animation frame
        # to give a smoother feeling to the embedding
        fig.canvas.flush_events()

        if not stop_iter:
            # *********************
            analysis.do_iteration()  # Perform an iteration of the embedding
            # *********************

            if i == num_iters:
                stop_iter = True

            # Update point positions
            # *************************************
            scatter.set_offsets(analysis.embedding) # Update the scatter plot
            # *************************************
            # At this point the embedding plot should be rescaled
            # as the size of the embedding changes.
            # See AnalysisGui.py update_scatter_plot_limits for details

2b. Select a region in the embedding to create a new analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Code fragments adapted from |HSNEdemo_github_url| AnalysisGui.py and ModelGui.py

The code concentrates on the conversion between a selection rectangle and 
the creation of the new analysis.

.. code-block:: python

    # The selection origin is tracked in the rorg_xy tuple (embedding coords)
    # The current cursor coordinate is dim_xy (embedding coords)
    def on_end_select(self, event):
        # ******************************
        if self.analysis.scale_id == 0:  # at the data level can't drill down
            return
        # ******************************

        # *********************************
        embedding = analysis.embedding  # Get the embedding points that fall in the current selection rectangle
        # *********************************
        # Get the ordered indexes at this analysis level 
        indexes = np.arange(embedding.shape[0])
        selected_indexes = indexes[
            (embedding[:, 0] > rorg_xy[0])
            & (embedding[:, 0] < rorg_xy[0] + dim_xy[0])
            & (embedding[:, 1] > rorg_xy[1])
            & (embedding[:, 1] < rorg_xy[1] + dim_xy[1])
        ]
        if selected_indexes.shape[0] > 0:
            # ************************************************************************
            new_analysis = analysis_model.add_new_analysis(analysis, selected_indexes) # Add a new analysis to the model with the current one as parent
            # ************************************************************************


3. Extending the *HSNE* viewers
-------------------------------

The |EXHSNEdemo_github_url|, a simple but fairly complete visual analysis tool, 
includes two additional viewers capable of visualizing other types of multidimensional data:

1. Image is datapoint - MNIST like data
2. Hyperspectral image - examples include hyperspectral image of sun and multispectral earth oberservation satellite imaging
3. Point and meta data - for example cell data classified according to gene expression and meta data related to cell types that can be used to color the embeddings


|EXHSNEdemo_github_url| extends the approach in |HSNEdemo_github_url|. The *AnalysisGui* has been
split into a reusable *EmbeddingGui* (for the analysis embedding) and separate viewers 
for the different data types: *HyperspectralImageViewer*,
*MetaDataViewer* and *CompositeImageViewer*. An *AnalysisController* mediates between selections and the manitenance
of the *AnalysisModel*.

A detailed explanation of these viewers and other support classes can be found in the |EXHSNEdemo_github_url| README.md

3a. Area of influence from landmarks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In certain applications (for example hyperspectral imaging) visualizing the entirety 
of the data points that map to a landmark selection is required. This is termed the 
*area of influence* (*AOI*) of the landmark selection. 

In |EXHSNEdemo_github_url| AnalysisController.py the *on_selection* function 
illustrates that this mapping can be done in a slow (accurate method 
:py:meth:`nptsne.hsne_analysis.Analysis.get_area_of_influence`) 
or fast (but less accurate method :py:meth:`nptsne.hsne_analysis.Analysis.get_fast_area_of_influence`) 
manner. Typically the fast *AOI* is suitable for mouse-over events.

Note that selections indexes numbered from 0 to number_of_landmarks-1 at a scale multi-scale
be converted data indexes using :py:attr:`nptsne.hsne_analysis.Analysis.landmark_indexes`.

.. code-block:: python

    def landmark_index_from_selection(self, sel_indexes: List[int]) -> List[int]:
        """Selection indexes in this analysis are converted to landmark
        indexes in this scale"""
        landmark_indexes = []
        # Translate the selection indexes to the scale indexes
        for i in sel_indexes:
            landmark_indexes.append(self.analysis.landmark_indexes[i])
        return landmark_indexes

    def on_selection(
        self, analysis_selection: List[int], make_new_analysis: bool, fast: bool = False
    ) -> None:
        """analysis_selection is a list of indexes at this analysis scale
        If make_new_analysis is true start a new analysis controller"""
        # Selection indexes are from 0 - number of landmarks. The original
        # data indexes of the landmarks are needed for the AOI
        landmark_indexes = self.landmark_index_from_selection(analysis_selection)
        if self.demo_type == DemoType.HYPERSPECTRAL_DEMO:
            # Pass area influenced to the hyperspectral viewer
            aoi: np.ndarray
            if fast:
                aoi = self.analysis.get_fast_area_of_influence(landmark_indexes)
            else:
                aoi = self.analysis.get_area_of_influence(landmark_indexes)
            self.data_gui.set_static_mask(aoi)