JupyterLab was my best friend

For a very long time, I have used Jupyter notebooks via JupyterLab to code in Python. I was taught to use it in my undergrad and found it was enough for my needs. I really liked its tight feedback loop: I can see what my code is doing immediately, without re-running the entire thing.

But the one thing that really, really grinds my gears is that I was coding the whole time without Vim. The default Jupyter keymap was fine when I was just starting to learn programming, but reaching for my trackpad to navigate within a Jupyter code cell today feels like a big step backward.

“But Richard, there are Jupyter extensions that enable Vim in JupyterLab.” And yes, the jupyterlab-vim extension exists to bring Vim keybindings to Jupyter cells. It worked, and I was somewhat happy with it. One odd thing was that <Esc>, which leaves Vim’s Insert mode and returns to Normal mode, conflicted with Jupyter’s own <Esc>, which returns to Jupyter’s Navigation mode, so one of them had to be re-bound to <Ctrl>-<Esc> (if I remember correctly). Then one day I updated it and there was no Vim. I was back to editing code like a caveman.

Apparently, when my JupyterLab was updated to version 4.0, the extensions needed time to catch up. Though it’s working now with version 4.0, I can’t help but feel that the Jupyter ecosystem has over-extended itself trying to become an IDE in the browser. Fixing things in JupyterLab has also been an exercise in frustration (e.g. theming it beyond light or dark mode, or setting fonts for code cells and making sure the changes persist). It felt like the whole thing was becoming a mess, and that some day my environment would fail on me when I needed it most.

But I still like the notebook workflow. I don’t think I can move to developing in .py files again anytime soon. So instead of trying to make Jupyter become an IDE, why not use Jupyter in an IDE?

Using VS Code with local conda environments

VS Code can open .ipynb files directly, and opening one feels right at home. Many of the familiar navigation keys and shortcuts still work:

  • J/K to navigate cells
  • gg to go to top-most cell
  • G to go to bottom-most cell
  • Escape to go back to navigation mode
  • A to insert new code cell above current cell
  • B to insert new code cell below current cell
  • C to copy current cell
  • V to paste the copied cell

But what about Vim? I installed the Vim extension in VS Code and was good to go. It just works.

But the power feature that changed everything was that VS Code knows how to start an IPython kernel without a Jupyter server. What??? Usually my workflow went like this:

  • open a terminal
  • conda activate <env>
  • jupyter lab
  • open the browser at localhost:8888 (JupyterLab’s default port)

Somehow VS Code, with the Jupyter extension, runs a kernel without any Jupyter server. All you have to do is press <Ctrl>+<Shift>+P, select “Python: Select Interpreter”, and point it to the python binary in the conda environment’s bin folder. Then you can start running your code! VS Code also remembers where your conda environment is and starts the kernel from it the next time, so all you have to do is open VS Code and start coding.
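To double-check that a notebook really runs on the conda environment you picked, a cell like the one below helps. It is a minimal sketch using only the standard library; describe_interpreter is my own name for it, and it assumes the usual conda layout where every environment has a conda-meta folder under its prefix.

```python
import os
import sys


def describe_interpreter() -> dict:
    """Report which Python the notebook kernel is actually running."""
    prefix = sys.prefix
    return {
        "executable": sys.executable,  # the python binary the kernel runs on
        "prefix": prefix,              # root folder of the active environment
        # assumption: conda environments keep package metadata in <prefix>/conda-meta
        "is_conda_env": os.path.isdir(os.path.join(prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the executable is not the binary inside your environment’s bin folder, run “Python: Select Interpreter” again.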

Remote machine Jupyter notebook

Another killer feature of VS Code is the ability to connect to a remote machine over SSH and run a VS Code server on it. Let me illustrate why this matters.

I always have my ThinkPad with me and editing code on it is a really enjoyable experience, but it has a 5-year-old 4-core processor and no significant GPU compute, so it cannot handle expensive data processing. I do, however, have a server with an Nvidia GPU in my lab. Normally that would mean going to the lab to use the more powerful machine. But if I could SSH to the lab machine, I could develop on it from my laptop wherever I am.

Connecting to the lab machine over SSH from VS Code gives me access to its files with the ease and comfort of my local VS Code. I can also use the conda environment on the remote machine, which gives my PyTorch code access to CUDA. I cannot be in the lab all the time, so being able to run notebooks on the remote machine whenever I want is very useful.
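A quick way to confirm that the remote kernel actually sees the GPU is to ask PyTorch from a notebook cell. This is a hedged sketch: cuda_status is a hypothetical helper name, and it reports rather than crashes when PyTorch is not installed in the selected environment.

```python
import importlib.util


def cuda_status() -> str:
    """Summarise whether PyTorch in this kernel can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the check also runs where torch is absent

    if not torch.cuda.is_available():
        return "PyTorch found, but CUDA is not available"
    count = torch.cuda.device_count()
    names = [torch.cuda.get_device_name(i) for i in range(count)]
    return f"CUDA available on {count} device(s): {', '.join(names)}"


print(cuda_status())
```

Run on the laptop, this should report no CUDA; run through the remote session, it should list the lab machine’s Nvidia GPU.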

The tricky part is SSHing to a machine that does not have a dedicated public IP.

One method is to use a service like ngrok that exposes your machine on a public address.

Another method is much more involved, but it avoids a third-party service, which is better for privacy and security. First, set up reverse SSH port forwarding from the lab machine to a virtual private server (VPS) that you own and that has a public IP. Because the connection can die from inactivity, using autossh to keep it alive is advised.

On the remote (lab) machine:

ssh -fN -R 5000:127.0.0.1:22 <public-ip server>
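Since autossh is advised, here is a sketch of what an autossh version of that reverse tunnel could look like, built in Python so each option can carry a comment. The host user@203.0.113.10 is a placeholder for your VPS and the keepalive values are assumptions to tune. -M 0 turns off autossh’s own monitoring port, so it relies on SSH’s ServerAlive keepalives to notice a dead connection and restarts ssh when it exits.

```python
import shlex


def autossh_reverse_tunnel(vps: str, remote_port: int = 5000, local_port: int = 22) -> list[str]:
    """Build an autossh command equivalent to `ssh -fN -R remote:127.0.0.1:local vps`.

    `vps` is a placeholder such as "user@203.0.113.10" (hypothetical host).
    """
    return [
        "autossh",
        "-M", "0",                         # no monitor port; rely on ssh keepalives
        "-f", "-N",                        # go to background, run no remote command
        "-o", "ServerAliveInterval=30",    # send a keepalive every 30 seconds
        "-o", "ServerAliveCountMax=3",     # drop the connection after 3 missed replies
        "-o", "ExitOnForwardFailure=yes",  # exit (so autossh retries) if -R fails
        "-R", f"{remote_port}:127.0.0.1:{local_port}",
        vps,
    ]


cmd = autossh_reverse_tunnel("user@203.0.113.10")
print(shlex.join(cmd))
# To start it from Python instead of pasting into a shell:
#   import subprocess; subprocess.run(cmd, check=True)
```

Running the printed command on the lab machine takes the place of the plain ssh -fN -R command.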

Next, forward that same port on the VPS to a local port on your client machine.

On your client machine:

ssh -L 5000:127.0.0.1:5000 <public-ip server>

Finally, SSH to localhost on that local port, and you have an SSH connection to the remote (lab) machine.

On your client machine:

ssh 127.0.0.1 -p 5000
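When this chain breaks, it helps to check it end to end. ssh -L accepts local connections even when the tunnel behind it is down, so merely connecting to port 5000 proves little; reading the SSH server’s identification line does exercise every hop. A sketch assuming the port 5000 used above; ssh_banner and parse_banner are hypothetical helper names.

```python
import socket
from typing import Optional


def parse_banner(data: bytes) -> Optional[str]:
    """Extract the SSH identification line, e.g. 'SSH-2.0-OpenSSH_9.6'."""
    line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", errors="replace")
    return line if line.startswith("SSH-") else None


def ssh_banner(host: str = "127.0.0.1", port: int = 5000, timeout: float = 5.0) -> Optional[str]:
    """Connect to host:port and return the SSH banner, or None if none arrives.

    None means the local forward is not listening, the far end closed the
    connection (e.g. the reverse tunnel on the VPS is down), or it timed out.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = conn.recv(256)  # an SSH server speaks first, with its banner
    except OSError:  # refused, reset, or timed out (socket.timeout is an OSError)
        return None
    return parse_banner(data)


if __name__ == "__main__":
    banner = ssh_banner()
    if banner:
        print(f"Tunnel OK, lab machine answered: {banner}")
    else:
        print("No SSH banner on 127.0.0.1:5000; check both ssh commands")
```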

Persistent VS Code remote notebook

One issue I found with the setup in the previous section is that when my SSH connection drops, the kernel is shut down. A common workflow for me is to open a remote notebook, start a training run, then close my laptop and go on with my life. That is not possible when the kernel lives and dies with the SSH session.

Unfortunately, this means I cannot use the IPython kernel that VS Code starts for me. Instead, I need to run a Jupyter server on the remote machine so the kernel survives when the SSH connection closes. Then, even when I close my laptop and the connection drops, the kernel keeps running on the remote machine and my training run is not interrupted.

The first way to go about it is to start the Jupyter server yourself. SSH to the remote machine and run nohup jupyter server & to start it in the background (nohup stops the server from being killed by the hangup signal when that SSH session ends). Then connect to the same machine from VS Code over SSH and connect to that Jupyter server’s kernel.
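To point VS Code at that server (by choosing an existing Jupyter server in the kernel picker), you need its URL and token. jupyter server list prints them in a terminal; the sketch below does the same from Python. It assumes the jupyter_server package’s list_running_servers function, which reads the server info files in the Jupyter runtime directory, and running_jupyter_servers is my own helper name.

```python
import importlib.util


def running_jupyter_servers() -> list:
    """Return connection URLs (with token) for Jupyter servers on this machine.

    Returns [] if the jupyter_server package is not installed here.
    """
    if importlib.util.find_spec("jupyter_server") is None:
        return []
    from jupyter_server.serverapp import list_running_servers

    urls = []
    for info in list_running_servers():
        token = info.get("token", "")
        # info["url"] looks like "http://localhost:8888/"; append the token if set
        urls.append(f"{info['url']}?token={token}" if token else info["url"])
    return urls


for url in running_jupyter_servers():
    print(url)
```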

The first method is good enough for getting a persistent notebook. But I wanted a less hacky solution that does not need two SSH connections.

The second method is to set up a persistent terminal in the VS Code remote session.

First, install tmux on the remote server.

Then, on your client machine, add the following to your VS Code settings.json:

"terminal.integrated.profiles.linux": {
	"tmux": {
		"path": "/usr/bin/tmux",
		"args": [
			"new-session",
			"-A",
			"-s",
			"main"
		],
	},
},
"terminal.integrated.defaultProfile.linux": "tmux",

These settings make every VS Code terminal start inside tmux. new-session -A -s main attaches to the tmux session named main if it already exists and creates it otherwise, so you resume your previous session instead of starting a fresh one. If you run jupyter server inside that tmux session, the server runs under tmux rather than under your SSH session. Disconnecting from SSH only detaches the tmux session, so the session and every process in it keep running. When you later reconnect to the remote machine in VS Code, the terminal reattaches to the same tmux session, and since the Jupyter server is still alive, your notebook should resume from where you left off.
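Before walking away from a long training run, it can be reassuring to confirm from inside the notebook that the Jupyter server really was started under tmux. tmux sets a TMUX environment variable for the programs it runs, and kernels normally inherit the server’s environment, so a cell like this sketch can check it. launched_inside_tmux is a hypothetical helper, and a kernel launched with a custom environment may not inherit the variable.

```python
import os
from collections.abc import Mapping
from typing import Optional


def launched_inside_tmux(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if this process inherited tmux's TMUX variable.

    tmux sets TMUX (pointing at its server socket) for programs it starts;
    a Jupyter kernel normally inherits it from the server's environment.
    """
    env = os.environ if env is None else env
    return bool(env.get("TMUX"))


if launched_inside_tmux():
    print("Server runs under tmux: safe to disconnect")
else:
    print("Not under tmux: make sure the server was started some other persistent way")
```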

That means after you press <Ctrl>+<Enter> on your model training cell, you can just put your laptop to sleep. Later, when you wake it, VS Code will prompt you to reconnect (if you use my port-forwarding setup for SSH, you might need to re-run the port forward from the public server to your client machine first). Reconnect and you will be right where you left off, with your model finished training and ready for you to evaluate the results.

To avoid having to re-establish the port forward from the public server to your client, set the following in your VS Code settings.json:

"remote.SSH.defaultForwardedPorts": [
    {
        "localPort": 5000,
        "remotePort": 5000,
        "name": "your public server ip"
    }
]

With this, your VS code remote session will resume without any additional steps.
