Install and Use the tsnecuda Package
t-SNE is a great method for visualizing high-dimensional data by projecting it into two or three dimensions.
The scikit-learn implementation runs on the CPU (so it has great compatibility), but it is too slow on large amounts of data.
CannyLab has an optimized version that runs on Nvidia GPUs (with CUDA), which they claim is much faster than scikit-learn and other implementations of t-SNE, although it can only project to two dimensions.
However, installation is not as easy as you might think. I ran into several problems while installing it, so I am writing them down here in case I forget them.
Introduction to My Computer
Nvidia Tesla K80 GPU
Find CUDA version
The first step is to check which CUDA version I’m using. After digging for a while, I found several ways to do this. I tried all of them on my computer, and here is what I got.
After some more digging, I found this page. OK, I will go with nvidia-smi and believe it is CUDA 11.2. Sounds good, let’s install tsnecuda!
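Two of the usual checks can be scripted. The sketch below is a minimal example, assuming `nvidia-smi` and `nvcc` are on the PATH when they are installed; the helper names are made up for illustration. Keep in mind that `nvidia-smi` reports the highest CUDA version the driver supports, while `nvcc --version` reports the installed toolkit, so the two numbers can legitimately differ.

```python
import re
import shutil
import subprocess


def parse_nvidia_smi(text):
    """Return the 'CUDA Version' from nvidia-smi's header, or None."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return match.group(1) if match else None


def parse_nvcc(text):
    """Return the toolkit release from `nvcc --version` output, or None."""
    match = re.search(r"release\s+([\d.]+)", text)
    return match.group(1) if match else None


def run_if_available(command):
    """Run a command and return its stdout, or None if the tool is missing or fails."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def cuda_versions():
    """Collect the CUDA version reported by each tool (None when unavailable)."""
    smi_out = run_if_available(["nvidia-smi"])
    nvcc_out = run_if_available(["nvcc", "--version"])
    return {
        "nvidia-smi (driver supports up to)": parse_nvidia_smi(smi_out) if smi_out else None,
        "nvcc (installed toolkit)": parse_nvcc(nvcc_out) if nvcc_out else None,
    }


if __name__ == "__main__":
    for source, version in cuda_versions().items():
        print(f"{source}: {version}")
```

A `None` next to a tool simply means that tool was not found or its output did not contain a version string.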
Never Install tsnecuda Using pip
Their installation page recommends conda, but my CUDA version (11.2) is higher than the builds listed there. I tried to install the CUDA 10.1 build anyway:
conda install tsnecuda cuda101 -c cannylab
After 10 minutes, conda still had not managed to install it.
I canceled it and tried pip instead, because in my experience pip usually works faster than conda:
pip install tsnecuda
It installed, but when I used it, it complained:
OSError: libfaiss.so: cannot open shared object file: No such file or directory
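This error means the dynamic loader could not find the shared library at import time. A quick way to confirm that, independent of the Python package, is to try loading the library directly. This is a minimal sketch; `can_load` is a hypothetical helper name.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # libfaiss.so is the library named in the error message.
    print("libfaiss.so loadable:", can_load("libfaiss.so"))
```

If this prints `False`, the library is either not installed or not on the loader's search path.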
I went back to their installation page and found this
OK, it seems the problem is the package itself. I checked the version I had installed by running
pip list | grep tsnecuda
and found it was 0.1.1, released on Aug 1 2018, while the GitHub page shows version 2.1.
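The same version check can be done from inside Python, which is handy in a notebook. This is a small sketch assuming Python 3.8 or newer (for `importlib.metadata`); `installed_version` is a hypothetical helper name, and it only sees packages that ship standard metadata.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is not found."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("tsnecuda:", installed_version("tsnecuda"))
```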
Finally, I decided to install it offline using conda.
Install tsnecuda Offline Using conda
I first went to the conda repository for tsnecuda and downloaded the latest release built for CUDA 10.1 to my computer, then installed it offline:
conda install --offline tsnecuda-2.1.0-cuda101.tar.bz2
Compare speed with sk-learn
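A comparison along these lines can be timed with a small script. The sketch below is only an illustration: the random data and its size are assumptions, not the data used in this post, and `time_call` is a hypothetical helper. It relies on both libraries exposing a `TSNE` class with `fit_transform`.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def main(n_samples=5000, n_features=50):
    # Heavy, optional dependencies are imported here so the timing helper
    # stays usable without them. A missing libfaiss.so surfaces as OSError.
    try:
        import numpy as np
        from sklearn.manifold import TSNE as SklearnTSNE
        from tsnecuda import TSNE as CudaTSNE
    except (ImportError, OSError) as error:
        print(f"Skipping benchmark, missing dependency: {error}")
        return

    # Random data: an assumption for illustration only.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n_samples, n_features)).astype(np.float32)

    _, cpu_seconds = time_call(SklearnTSNE(n_components=2).fit_transform, X)
    _, gpu_seconds = time_call(CudaTSNE(n_components=2).fit_transform, X)
    print(f"scikit-learn: {cpu_seconds:.1f} s")
    print(f"tsnecuda:     {gpu_seconds:.1f} s")


if __name__ == "__main__":
    main()
```

Wall-clock timing like this includes one-off GPU startup cost on the first run, so repeating the measurement gives a fairer picture.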
Conclusions
- tsnecuda is indeed much faster
- tsnecuda works with a newer CUDA version (at least CUDA 11.2), even though its latest release targets CUDA 10.1
- The version on pip can be badly outdated
- Offline installation can be a solution when pulling from the conda repo directly is too slow
Run it on Google Colab
If you are also interested in running it on Colab, you can follow these two links.
https://github.com/CannyLab/tsne-cuda/issues/59
https://colab.research.google.com/drive/11YFkI_XA9DUE2lEnmGu08LZB7dv5-mbb