Background

In recent days one thing has really bugged me: my laptop has terrible battery life. I had long accepted that this was because my laptop was old, or because I was running Linux. I have always used my machine to type in a text editor, browse the web or listen to music. None of these seemed to me particularly intensive computing tasks, and yet my machine would drop a percentage point of battery every few minutes of use.

For a long time, my solution was to use power-profiles-daemon. It looked easy to use: just pick the power-saving, balanced or performance mode. When I set it to power-saving, my processor would run at its lowest speed (800MHz) and everything would slow to a crawl. But hey, if it meant that I could use my computer for longer, I didn’t mind the slower performance. After all, I mostly use emacs anyway, which could run on potato-level machines. But it still wasn’t enough.

I started to seriously consider getting an ARM machine. Those MacBook Airs do look nice, and my current machine is nearing 5 years of use. I was convinced that I was due for an upgrade, not because I needed a faster computer, but because I needed a more efficient one.

This topic cropped up in a conversation with one of my friends. His suggestion was to just use Windows. And so I wondered, why is the battery life of machines running Linux worse than on Windows? I just assumed that it was because Windows gets more attention from manufacturers, who actually produce functional drivers and firmware so that the machine and its components run well. I then tried to find a satisfactory reason on the internet… and it just reinforced my belief: Linux will always have worse battery life because it uses drivers that are not optimized for my machine.

First Realization

But then I came across a post on Reddit. It showed me so many things that I had never thought of before, like power capping and panel self-refresh. I tried to power cap my CPU to a few watts rather than the default 20 watt limit. I also enabled panel self-refresh to see if it conserved more power.

However, my power draw just would not go below 4 watts on a freshly booted system with no programs loaded. I had tried everything, from TLP and Powertop to following the Arch guide on power optimizations.

Then I saw this comment. This user experienced the opposite - his battery life on Linux was a lot better than on Windows. I was intrigued. Then he revealed the clue: Package C-states (package C0, C2, C3, C6, C8, and C10). This was a totally alien concept to me. Furthermore, this is one area in which Linux systems can trump Windows. Apparently a Windows system hardly ever goes below PC6 because there are so many background tasks running all the time, meaning that the machine cannot idle at the lowest power most of the time. Powertop can give you a read-out of C-state residency for your CPU, Package, and GPU. And true enough, my computer was not able to go down to the deepest Package C-state level (PC10).

Intel has a blog post on how to troubleshoot PC10 residency issues. The comment from earlier also mentioned the same things. These are the areas to troubleshoot:

  • enable PCIe runtime power management
  • enable panel self-refresh
  • ensure LTR requests are not preventing your device from reaching PC10

Guide

A quick way to check if PC10 can be reached is with the following command:

sudo cat /sys/kernel/debug/pmc_core/package_cstate_show

Run it a few times. When the number for PC10 increases between runs, you know that the state is being reached.
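Running the command by hand and comparing numbers gets tedious, so here is a minimal sketch that takes two readings and compares them. It assumes the file prints lines shaped like "Package C10 : 12345" (the exact format may differ between kernels), and the INTERVAL variable and function names are my own, not part of any tool. It needs to be run as root to read the debugfs file.

```shell
#!/bin/sh
# Extract the Package C10 counter from package_cstate_show-style text on stdin.
# Assumption: lines look like "Package C10 : 12345".
pc10_count() {
    awk -F: '/C10/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Report whether the counter grew between two readings.
pc10_verdict() {
    if [ "${2:-0}" -gt "${1:-0}" ]; then
        echo "PC10 reached"
    else
        echo "PC10 not reached"
    fi
}

# Take two readings INTERVAL seconds apart (hypothetical variable, default 10).
# Skipped when the debugfs file is not readable, i.e. when not run as root.
PKG_FILE=/sys/kernel/debug/pmc_core/package_cstate_show
if [ -r "$PKG_FILE" ]; then
    before=$(pc10_count < "$PKG_FILE")
    sleep "${INTERVAL:-10}"
    after=$(pc10_count < "$PKG_FILE")
    pc10_verdict "$before" "$after"
fi
```

Leave the machine untouched during the interval, otherwise the reading says more about your activity than about the idle state.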

PCIe power management (ASPM) can be forced on with the following kernel boot parameter. It belongs wherever you put your kernel boot options rather than in a file under /etc/modprobe.d/, since ASPM support is built into the kernel and is not a loadable module.

pcie_aspm=force

PCIe ASPM must also be set to a power-saving policy. I chose the most aggressive option, “powersupersave”.

echo "powersupersave" |
    sudo tee /sys/module/pcie_aspm/parameters/policy
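To confirm that the write took effect, the same file can be read back: it lists all policies and marks the active one with square brackets. A small sketch that extracts just the active policy (the function name is my own):

```shell
#!/bin/sh
# Print the active ASPM policy from policy-file text on stdin.
# The kernel marks it with square brackets,
# e.g. "default performance powersave [powersupersave]".
active_aspm_policy() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

POLICY_FILE=/sys/module/pcie_aspm/parameters/policy
if [ -r "$POLICY_FILE" ]; then
    active_aspm_policy < "$POLICY_FILE"
fi
```

If this still prints the old policy after the write, the firmware may be refusing to hand ASPM control to the OS.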

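Panel self-refresh can be enabled with the following module option, for example in a file under /etc/modprobe.d/ (a value of 0 would disable it). On the kernel command line, the equivalent is i915.enable_psr=1.

options i915 enable_psr=1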
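After rebooting, the value the driver was loaded with can be read back from sysfs. This is only a sketch: it shows what was requested, not whether the panel actually supports or is using self-refresh, and the function name is my own. As far as I know, -1 means the per-chip default, 0 means disabled, and a positive value means enabled.

```shell
#!/bin/sh
# Translate an i915 enable_psr value into words.
# -1 = use the per-chip default, 0 = disabled, positive = enabled.
describe_psr() {
    case "$1" in
        -1) echo "driver default" ;;
        0)  echo "disabled" ;;
        *)  echo "enabled" ;;
    esac
}

# The parameter file is normally readable by root only.
PSR_FILE=/sys/module/i915/parameters/enable_psr
if [ -r "$PSR_FILE" ]; then
    describe_psr "$(cat "$PSR_FILE")"
fi
```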

For issues with LTR (latency tolerance reporting), things are trickier. First, get the number of devices.

sudo cat /sys/kernel/debug/pmc_core/ltr_show

Then tell the system to ignore the LTR of one device at a time by passing its index, a digit from 0 to (number of devices - 1). For me, index 4 turned out to be the culprit preventing my CPU from entering PC10.

echo 4 |
    sudo tee /sys/kernel/debug/pmc_core/ltr_ignore
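To keep track of which index belongs to which device, the ltr_show output can be numbered. This sketch assumes one device per non-empty line with the device name in the first column, and that the index passed to ltr_ignore follows the line order starting at 0; both are assumptions on my part, so check them against your own output.

```shell
#!/bin/sh
# Number the entries of ltr_show-style text on stdin, starting at 0.
# Assumption: one device per non-empty line, name in the first column.
ltr_indices() {
    awk 'NF { printf "%d %s\n", n++, $1 }'
}

# Skipped when the debugfs file is not readable, i.e. when not run as root.
LTR_FILE=/sys/kernel/debug/pmc_core/ltr_show
if [ -r "$LTR_FILE" ]; then
    ltr_indices < "$LTR_FILE"
fi
```

Try one index at a time and check PC10 residency after each, so that you know exactly which device is responsible.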

Second Realization

After I set my computer to apply these changes on boot, it was able to enter the PC10 state right after booting. But this only applied in the TTY. Once I logged in and my window manager loaded, PC10 could no longer be reached.

A big clue was that when I booted into the default clean configuration of SwayWM, PC10 could be reached. This showed that something in my own configuration was preventing it. I started disabling things in my config one by one, logging out and back into SwayWM each time. In the end, I found that my status bar program, waybar, was the problem. I further narrowed it down to waybar's Tray component, which was preventing my computer from reaching the deeper PC10 state.

I then replaced waybar with sway-bar + i3status-rust. This status bar allowed my CPU to access the deeper PC10 idle mode.

What is happening?

If we refer to the Intel documentation, we see that the processor enters PC10 when:

  • All IA cores in C10 + Processor Graphic cores in RC6.
  • The platform components/devices allow proper LTR for entering Package C10.

This means that both the CPU and GPU must idle, and that device LTR must not have issues.

If something is constantly running in the background, the processor cannot enter PC10. Likewise, if a single component has a misconfigured LTR, the processor cannot enter PC10. This is why it is so tricky to get a Linux system with properly functioning package C-states. Either the default graphical environment is constantly running something in the background, or the LTR of one of the components is bugged, and either one prevents deep package C-states.

Results

Previously, my idle wattage would never go below 4 watts. Now with firefox and emacs open, the idle wattage is 2.5 watts.

In realistic use, power draw was previously around 4.5-5 watts, while it is now around 3-3.5 watts.
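One way to take readings like these while on battery is to read the kernel's battery interface directly. This is a sketch under the assumption that the battery exposes power_now in microwatts under /sys/class/power_supply (some batteries report current_now and voltage_now instead); the function name is my own.

```shell
#!/bin/sh
# Convert a microwatt reading (as found in power_now) to watts.
uw_to_watts() {
    awk -v uw="$1" 'BEGIN { printf "%.2f\n", uw / 1000000 }'
}

# Print the current draw of every battery that exposes power_now.
for f in /sys/class/power_supply/BAT*/power_now; do
    if [ -r "$f" ]; then
        echo "$f: $(uw_to_watts "$(cat "$f")") W"
    fi
done
```

The reading is only meaningful while discharging, so unplug the charger first.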

Due to the severely degraded state of my batteries, I only have 28Wh of capacity compared to the rated 42Wh. 28Wh lasts for roughly 6 hours and 13 minutes at 4.5 watts, and 8 hours at 3.5 watts. That is a significant battery life gain considering the age of the machine. Perhaps I do not need a new computer after all (tbh I really need to get new batteries).
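The runtime estimate is just capacity divided by power draw. A small sketch of the arithmetic, with a function name of my own choosing:

```shell
#!/bin/sh
# Estimated runtime for a battery of $1 watt-hours at a steady draw of $2 watts.
runtime() {
    awk -v wh="$1" -v w="$2" 'BEGIN {
        total = int(wh / w * 60 + 0.5)   # whole minutes, rounded
        printf "%dh %dm\n", int(total / 60), total % 60
    }'
}

runtime 28 4.5
runtime 28 3.5
```

For my numbers this prints 6h 13m and 8h 0m. With the rated 42Wh at 3.5 watts, the same calculation would give 12 hours.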

Thoughts

My past attempts to manage power consumption on my laptop were mainly about managing clock speeds. I thought that a processor consumed power based on how high it was clocked. Others have found big gains by just running powertop or tlp, but tweaking both did not give me significantly greater battery life. What I missed was processor C-states, especially Package C-states. The processor was designed to power-gate portions of itself when not in use. Clearly Intel has put in a lot of effort to allow the processor to idle at very low power. But because of a lack of knowledge, I just attributed the high idle power usage to Intel processors being Intel processors: hot and power-hungry.

I was pleasantly surprised at how efficient an Intel 8th gen processor can be. When allowed to idle, it can go down to 2-2.5 watts. I am glad that my assumptions were proven wrong and that Linux systems can indeed be very efficient. However, what I found is that many things can interfere with the system’s ability to enter the lowest-power PC10 state. These troubleshooting steps are highly technical and non-obvious. If not for the Reddit post and the subsequent Intel blog posts and documentation, I would not have realized that there was anything wrong with my computer.

Knowing all this, I am curious how many Linux (and even Windows) machines out there are misconfigured so that they can never idle in the deeper C-states. As most laptop reviews focus only on performance at maximum power, the very significant factor of idle power is often overlooked. Yet it might actually be the single largest contributor to the battery life that the user experiences. This might be one reason why Intel laptops are still able to last very long despite AMD chips having a much better power draw under load.

In my own usage, I usually have a browser and a text editor open. Most of the time, I am reading something in my browser and pausing to think before writing in my editor. I am essentially doing nothing with the computer in these moments. And in these moments of inactivity, with my current configuration, the panel engages self-refresh and allows the processor to deep-idle. When I need to do something again, like scrolling or typing, the machine exits its deep idle quickly and responds, before quickly going back to idling. This keeps the average power usage very low while the machine still feels snappy.

Coming back to where I started, it is nice to be able to squeeze just a little bit more out of my machine. In my next machine, the ability to achieve PC10 residency is a definite must-have.

Updates [2023-02-19 Sun]

I realized that doing the following actually rendered my USB unstable. Audio would consistently underrun, causing terrible audio artifacts when using USB audio.

echo 4 |
    sudo tee /sys/kernel/debug/pmc_core/ltr_ignore

This was always meant to be a hack and not meant to be used as the final solution. Hence I decided to do a bit of digging.

By referring to this document and searching for the keyword “IP OFFSET” under the “check device LTR value” section, I found that the IP OFFSET of 4 referred to the XHCI. In other words, the USB-host controller was being affected. That explained why my USB audio device was bugging out.

To see what devices were being used, the command lsusb -tv is helpful. It turns out that the XHCI was not only responsible for my USB ports, but was also handling my Bluetooth and camera.

I disabled my camera from within the BIOS firmware settings. And voila! My computer could enter PC10 states even without the “ltr_ignore” hack.

What was a further shock to me was that when I cut my screen-power to the minimum, an idle-power of 1.9 watts was possible. No matter what I did prior to this (killing all radios, turning off the display etc.), the lowest idle-power was always around 2.3 watts.

Damn it, it was the camera all along…