Interpretability of models with biological data

Source: The How and Why of Interpretability in the Biological Sciences - Lior Pachter (YouTube)

Theme of the explanation:

The video offers insights into the interpretability of data science models applied to biological data.

It turns out to be dangerous to rely only on mathematical calculations and equations to derive results from biological data. Interpretability in a biological setting should be a process driven by design and experimentation grounded in theory, rather than by judgments made on mathematical foundations alone.

Details:

1. Understanding what it means to be interpretable in a biological setting is essential.

2. How biologists think about mechanistic interpretability.

[Figure: Interpretability of models with biological data-8]

Biological data here can be related to a circle in 2D, a sphere in 3D, and a hypersphere in nD, since the data are associated with roughly round objects such as nuclei and cells. Among all closed shapes of a fixed diameter, the circle/sphere encloses the greatest area/volume, which makes it the natural shape for fitting in the most data.

They ran an experiment in which they analysed the reduced-dimensional data produced by an autoencoder, visualised with UMAP, and found the resulting clusters to be non-informative.

The lower-dimensional embeddings produced by autoencoders are not interpretable because of the nonlinear functions used in the neural network.

However, a variation of the autoencoder, the variational autoencoder (VAE), was able to yield some meaningful insights into the data and is more interpretable even in lower dimensions.

Variational autoencoder architecture:

Encoder (unchanged): uses nonlinear functions to reduce the dimensionality of the data.

Decoder (changed): uses linear functions to map the data back to something close to the original. Linear functions are interpretable, as in PCA.

This change in architecture comes at the cost of accuracy, since the linear constraint makes the original data harder to match, but the model provides better interpretability.

6. They found that such VAEs provide better interpretability even when the data are embedded in lower dimensions. However, these architectures provide meaningful interpretations only as long as the dimensionality and the data size are not extremely high.
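The nonlinear-encoder / linear-decoder idea described above can be sketched in a few lines of NumPy. This is only a toy forward pass under stated assumptions: the layer sizes, the tanh activation, the random weights, and the names `encode`, `sample_latent`, and `decode` are all hypothetical, and no training or count-specific likelihood is included. It is meant to show why a linear decoder is easy to read, not to reproduce the model from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 "genes", 8 hidden units, 2 latent dimensions.
n_genes, n_hidden, n_latent = 6, 8, 2

# Randomly initialised weights (assumption: a trained model would learn these).
W_enc = rng.normal(size=(n_genes, n_hidden))
W_mu = rng.normal(size=(n_hidden, n_latent))
W_logvar = rng.normal(size=(n_hidden, n_latent))
W_dec = rng.normal(size=(n_latent, n_genes))  # one row of loadings per latent axis
b_dec = np.zeros(n_genes)


def encode(x):
    """Nonlinear encoder: mean and log-variance of the latent distribution."""
    h = np.tanh(x @ W_enc)
    return h @ W_mu, h @ W_logvar


def sample_latent(mu, logvar, rng):
    """Reparameterisation step: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z):
    """Linear decoder: the reconstruction is an affine function of z."""
    return z @ W_dec + b_dec


x = rng.normal(size=(4, n_genes))  # four made-up samples
mu, logvar = encode(x)
z = sample_latent(mu, logvar, rng)
x_hat = decode(z)

# Moving one unit along latent axis 0 changes every reconstruction by the same
# vector, W_dec[0], regardless of the starting point. That is the PCA-like
# "loading" reading that a nonlinear decoder does not offer.
step = np.zeros(n_latent)
step[0] = 1.0
effect = decode(z + step) - decode(z)
```

Each row of `W_dec` can be read like a PCA loading vector: it says how much each output feature moves per unit change in that latent axis.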

[Figure: Interpretability of models with biological data-18]

The figure above shows that once there are more than three data points, they can no longer all be placed at equal distances from one another in a 2D space (in general, at most n+1 points can be mutually equidistant in n dimensions). The sphere or hypersphere has a limited amount of space, and trying to fit more biological data into it while preserving distances introduces distortion.

As a result, modern dimensionality reduction techniques can lead us to wrong conclusions.
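The distortion argument can be checked numerically. The sketch below is an illustration rather than anything taken from the talk: it builds k points that are all at distance 1 from each other, counts how many dimensions an exact embedding needs, and then squeezes them into 2D with classical multidimensional scaling (swapped in here for simplicity; it is not UMAP). The function names are hypothetical.

```python
import numpy as np


def equidistant_sq_distances(k):
    """Squared-distance matrix for k points all at distance 1 from each other."""
    return np.ones((k, k)) - np.eye(k)


def centered_gram(d2):
    """Double-centred Gram matrix used by classical MDS."""
    k = d2.shape[0]
    J = np.eye(k) - np.ones((k, k)) / k
    return -0.5 * J @ d2 @ J


def min_embedding_dim(k, tol=1e-9):
    """Number of dimensions needed to place k equidistant points exactly."""
    eigvals = np.linalg.eigvalsh(centered_gram(equidistant_sq_distances(k)))
    return int(np.sum(eigvals > tol))


def mds_2d_distances(k):
    """Embed k equidistant points in 2D and return their pairwise distances."""
    vals, vecs = np.linalg.eigh(centered_gram(equidistant_sq_distances(k)))
    top = np.argsort(vals)[::-1][:2]  # two largest eigenvalues
    coords = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def max_distortion(k):
    """Largest absolute error between embedded and true distances (true = 1)."""
    d = mds_2d_distances(k)
    off_diagonal = d[~np.eye(k, dtype=bool)]
    return float(np.max(np.abs(off_diagonal - 1.0)))


for k in (3, 4, 10):
    print(k, min_embedding_dim(k), round(max_distortion(k), 3))
```

Three points fit in the plane exactly (an equilateral triangle), while four or more need at least k-1 dimensions, so any 2D picture of them has to misrepresent some distances.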

The overall idea is to generate biological ideas and theory and to combine them with data and mathematical formulation in order to draw inferences in the biological domain.

Interpretability should be treated as a process that involves experiments and ideas, rather than as an output of inference procedures.

To put this into practice, we can set up a process in which we examine two different models, each capable of generating the kind of data we need. We then perform some mathematical analysis to find what sort of information distinguishes the two models. That information can in turn be used to design new ideas and perform experiments.
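As a rough illustration of that process, the sketch below compares two candidate generative models for count data and looks for a statistic that separates them. The choice of models (Poisson versus negative binomial), the parameter values, and the names `simulate_poisson`, `simulate_negative_binomial`, and `fano_factor` are all assumptions made for this example; the notes do not say which models were used in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

MEAN = 5.0         # hypothetical mean count, shared by both models
DISPERSION = 2.0   # hypothetical negative-binomial dispersion parameter
N_SAMPLES = 20_000


def simulate_poisson(mean, size, rng):
    """Model A: counts with variance equal to the mean."""
    return rng.poisson(mean, size=size)


def simulate_negative_binomial(mean, dispersion, size, rng):
    """Model B: same mean, but variance = mean + mean**2 / dispersion."""
    p = dispersion / (dispersion + mean)
    return rng.negative_binomial(dispersion, p, size=size)


def fano_factor(counts):
    """Variance-to-mean ratio: about 1 under model A, larger under model B."""
    return counts.var() / counts.mean()


counts_a = simulate_poisson(MEAN, N_SAMPLES, rng)
counts_b = simulate_negative_binomial(MEAN, DISPERSION, N_SAMPLES, rng)

# The means alone cannot tell the two models apart ...
mean_gap = abs(counts_a.mean() - counts_b.mean())

# ... but the variance-to-mean ratio can, which suggests what to measure next.
fano_a = fano_factor(counts_a)
fano_b = fano_factor(counts_b)
print(f"mean gap: {mean_gap:.3f}, Fano A: {fano_a:.2f}, Fano B: {fano_b:.2f}")
```

Here the distinguishing information is the spread of the counts rather than their average, so an experiment designed to tell the models apart would need enough replicates to estimate variance reliably.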
