By Youness Alaoui
In my last blog post, I talked about the completion of the Purism coreboot port for the Librem 13 v1 and mentioned that I had some good news about the Intel Management Engine disablement efforts (going further than our existing quarantine), asking you to “stay tuned” for more information. Since then I got a little side-tracked with some more work on coreboot (more below), but now it’s time to share the good news with you!
Ladies and Gentlemen, Clean Your Engines!
I am happy to say that neutralizing the ME works! I investigated the effectiveness of neutralizing the Management Engine using the me_cleaner tool (which is an amazing feat of the community), and then I tested to make sure the ME was indeed neutralized and that the Librem 13 stays on for over 30 minutes. We plan to go even further than that in the future and reverse-engineer the remaining parts just so we can attain 100% freedom.
First of all, you need to understand what me_cleaner does. You can of course go read the technical details on their wiki on how it works, but to put it simply, the ME is organized in multiple modules, each handling a specific task. The me_cleaner tool deletes most modules: utilities, kernel, network stack, and a Java virtual machine (yes, you read that right). It removes pretty much everything except the hardware initialization module (BUP, for Bring UP) in the ME image. After the BUP module is executed, it can’t find the other modules, so it stops executing (as it has nothing to hand execution over to), but at that point the 30-minute watchdog has already been disabled by the BUP itself, so we can keep running. This is already a great improvement! The watchdog is precisely the issue that had been blocking us when we did our initial investigations a year ago.
After I ran the me_cleaner script on the BIOS image and flashed it, I needed to test and make sure that the ME was indeed neutralized. I used “intelmetool”, which comes with coreboot and communicates with the ME PCI device to get information from it. Unfortunately, intelmetool kept crashing, which was a good sign because it apparently couldn’t find the ME, but a crash (segmentation fault) is not really a conclusive answer… so I looked at its code, figured out what it did wrong, fixed it, then tested it again. This time, it gave me lots of output, and it confirmed that the ME was basically unresponsive. I then checked the output of “cbmem”, which shows coreboot’s debug log from the boot sequence, and it showed that the ME was now stuck in the “bring up” phase and that its state was “recovery” instead of “normal”.
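The check described above can be sketched as a small log parser. This is only an illustration: the `ME: <field> : <value>` line format, the field name `Current Working State`, and the sample log are assumptions about what the coreboot boot log looks like, not output captured from the Librem 13, and the function names are hypothetical.

```python
import re

# Assumed line shape: "ME: <field name>   : <value>"
ME_LINE = re.compile(r"^ME:\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")

# Illustrative sample only; not a log captured from real hardware.
SAMPLE_LOG = """\
ME: Current Working State   : Recovery
ME: Current Operation State : Bring up
PCI: 00:1f.0 init finished
"""


def parse_me_status(log_text):
    """Collect every 'ME: key : value' line of a boot log into a dict."""
    status = {}
    for line in log_text.splitlines():
        match = ME_LINE.match(line.strip())
        if match:
            status[match.group("key")] = match.group("value")
    return status


def looks_neutralized(status):
    """True if the (assumed) working-state field reports 'recovery'."""
    return status.get("Current Working State", "").lower() == "recovery"
```

The text to parse could come from saving the boot log to a file first, for example the console log that `cbmem -c` prints, and passing its contents to `parse_me_status`.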
Bring out the Champagne! The ME is not only quarantined, it is now officially neutralized and the Librem remains working beyond the 30-minute time limit that Intel had put in place!
A question remains, however: “What exactly did we remove, and what remains?” So I tried to dig into that as well.
First of all, the Intel ME image takes 2MB of space in the BIOS flash, but not all of those 2MB are used. It’s made of different modules, which can be compressed with LZMA or with a private/secret Huffman dictionary. There is a total of 1.2MB of actual compressed code in the image, which gives us a total of 1.6MB (1662976 bytes) of uncompressed code in 23 modules.
Of those 23 modules, 21 are completely removed from the ME partition, and we leave only 2: ROMP and BUP. The ROMP module is a “ROM bypass” module, used to bypass the ROM initialization code; it contains less than 1KB of code, and it loads the BUP module and executes it. The BUP module is a 116KB module which is used to initialize the ME hardware. So we end up with 120KB (122880 bytes) of data (108224 bytes actually, if we ignore the empty space at the end of the ROMP and BUP modules), which represents about 7.39% of the total ME code (122880 of 1662976 bytes). We have effectively removed over 92.6% of the ME code without any adverse effects (but see further below).
And so we removed plenty of stuff, but most importantly, we completely removed the ME kernel as well as the network stack. You can see the full list of modules here:
## Original ME modules :
total 1.6M
8.0K -rw-r--r-- 1 kakaroto kakaroto 8.0K Feb 28 17:08 AFWS-20687000.mod
 12K -rw-r--r-- 1 kakaroto kakaroto  12K Feb 28 17:08 BOP-20392000.mod
116K -rw-r--r-- 1 kakaroto kakaroto 116K Feb 28 17:08 BUP-200d4000.mod
 16K -rw-r--r-- 1 kakaroto kakaroto  16K Feb 28 17:08 CLS-206e0000.mod
4.0K -rw-r--r-- 1 kakaroto kakaroto 4.0K Feb 28 17:08 ClsPriv-20716000.mod
 12K -rw-r--r-- 1 kakaroto kakaroto  12K Feb 28 17:08 FPF-206b3000.mod
132K -rw-r--r-- 1 kakaroto kakaroto 140K Feb 28 17:08 FTPM-20777000.mod
 60K -rw-r--r-- 1 kakaroto kakaroto  60K Feb 28 17:08 HOSTCOMM-20396000.mod
 24K -rw-r--r-- 1 kakaroto kakaroto  24K Feb 28 17:08 HOTHAM-2032b000.mod
 16K -rw-r--r-- 1 kakaroto kakaroto  16K Feb 28 17:08 ICC-203ad000.mod
272K -rw-r--r-- 1 kakaroto kakaroto 272K Feb 28 17:08 JOM-208c2000.mod
344K -rw-r--r-- 1 kakaroto kakaroto 344K Feb 28 17:08 KERNEL-200f8000.mod
 28K -rw-r--r-- 1 kakaroto kakaroto  28K Feb 28 17:08 MCTP-20379000.mod
 28K -rw-r--r-- 1 kakaroto kakaroto  28K Feb 28 17:08 ME_TUNNEL-203b4000.mod
 52K -rw-r--r-- 1 kakaroto kakaroto  52K Feb 28 17:08 NET_STACK-20383000.mod
 20K -rw-r--r-- 1 kakaroto kakaroto  20K Feb 28 17:08 NFC-208bb000.mod
196K -rw-r--r-- 1 kakaroto kakaroto 204K Feb 28 17:08 Pavp-20040000.mod
124K -rw-r--r-- 1 kakaroto kakaroto 124K Feb 28 17:08 POLICY-2034d000.mod
4.0K -rw-r--r-- 1 kakaroto kakaroto 4.0K Feb 28 17:08 ROMP-200d2000.mod
 60K -rw-r--r-- 1 kakaroto kakaroto  60K Feb 28 17:08 SESSMGR-20719000.mod
 44K -rw-r--r-- 1 kakaroto kakaroto  44K Feb 28 17:08 SESSMGR_PRIV-2015a000.mod
4.0K -rw-r--r-- 1 kakaroto kakaroto 4.0K Feb 28 17:08 UPDATE-2003e000.mod
 32K -rw-r--r-- 1 kakaroto kakaroto  32K Feb 28 17:08 utilities-2036f000.mod

## Cleaned ME modules :
total 120K
4.0K -rw-r--r-- 1 kakaroto kakaroto 4.0K Feb 28 17:15 ROMP-200d2000.mod
116K -rw-r--r-- 1 kakaroto kakaroto 116K Feb 28 17:15 BUP-200d4000.mod
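The size figures quoted earlier can be re-derived from this listing. The sketch below assumes that the file-size column (the one just before the date) is exact in KiB and that 1K means 1024 bytes; the variable and function names are mine, for illustration only.

```python
KIB = 1024  # assumption: "K" in the listing means 1024 bytes

# File sizes in KiB, copied from the file-size column of the listing.
ORIGINAL_KIB = {
    "AFWS": 8, "BOP": 12, "BUP": 116, "CLS": 16, "ClsPriv": 4, "FPF": 12,
    "FTPM": 140, "HOSTCOMM": 60, "HOTHAM": 24, "ICC": 16, "JOM": 272,
    "KERNEL": 344, "MCTP": 28, "ME_TUNNEL": 28, "NET_STACK": 52, "NFC": 20,
    "Pavp": 204, "POLICY": 124, "ROMP": 4, "SESSMGR": 60,
    "SESSMGR_PRIV": 44, "UPDATE": 4, "utilities": 32,
}

# The two modules that survive the cleaning.
KEPT = ("ROMP", "BUP")


def summarize(sizes_kib, kept):
    """Compute module counts, byte totals and the share of code left."""
    total_bytes = sum(sizes_kib.values()) * KIB
    kept_bytes = sum(sizes_kib[name] for name in kept) * KIB
    return {
        "modules": len(sizes_kib),
        "removed_modules": len(sizes_kib) - len(kept),
        "total_bytes": total_bytes,
        "kept_bytes": kept_bytes,
        "kept_percent": 100 * kept_bytes / total_bytes,
    }


summary = summarize(ORIGINAL_KIB, KEPT)
print(summary)
```

Under those assumptions the totals match the 1662976 and 122880 bytes given in the text, and the remaining share works out to roughly 7.4% of the uncompressed code.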
A few things to watch out for
Possible graphics problems
Unfortunately for me, on one of my machine setups, the i915 graphics driver would constantly crash with Wayland. I tried an Ubuntu 16.04 live USB and didn’t have any problems with it, but when trying two different PureOS installs, one was extremely stable while the other had the graphics driver crashing. I could still SSH into the machine and do what I wanted, but I couldn’t log in to my desktop. Running “startx” in a terminal worked, however, without causing additional crashes of the graphics driver.
Other people on the team tested on their own Librem 13 and couldn’t reproduce the issue.
I spent some time trying to debug that phenomenon, but without much success. I tried updating/downgrading the kernel and comparing Wayland versions, and couldn’t figure out what was different between my two PureOS installs. I eventually put that aside because I had other things to do, and this could wait, given that I only experienced this problem on one particular machine.
Microcode or no microcode, that is the question!
Then came the idea of removing the microcode update from coreboot. This is a tricky question.
- The way the CPU is made, it comes with a predefined “microcode”, basically some sort of “arrangement” of the low-level transistor blocks to define the “high-level” x86 instruction sets the processor supports. Sometimes if an instruction doesn’t behave the way it should, Intel will release a microcode update to “re-arrange” the transistor blocks in order to fix bugs in how the instructions are behaving. Those bugs can be anything: silent data corruption, security flaws, or very visible kernel panics.
- Some people, however, may decide not to have a microcode update in their BIOS because it’s technically an unknown binary—even though the CPU hardware itself already comes with an initial microcode configuration pre-burned in its silicon.
After researching the implications of removing the microcode update from coreboot, I tested it. I ran prime95 for over 28 hours without any errors (what I forgot to mention in my previous blog post is that my prime95 results back then were actually made on a microcode-less system!). The system seems to run fine, boots, logs me in without problems and is perfectly usable, so that’s great… but it’s of course no guarantee that the system won’t have hidden bugs that I can’t notice, or small data corruption, etc. If anyone wants to remove the microcode updates from their BIOS, they can do that, and they can be safe in knowing that the system will be “usable”, but of course this comes with a big disclaimer on the risks involved. Todd (our CEO) has tested his machine extensively with coreboot without microcode updates and said that the machine would lock up completely in less than 24 hours. After a few days of testing, he added back the microcode updates and the system became stable again. Your mileage may vary.
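If you experiment with this yourself, it helps to confirm which microcode revision the CPU is actually running. On x86 Linux, /proc/cpuinfo reports a “microcode” field for each logical CPU; the sketch below (helper names are mine, not part of any tool) extracts those values from the file’s text. Keep in mind that the kernel can also apply its own microcode update early at boot, so the reported revision is whatever was loaded last, not necessarily what the BIOS image contained.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo text.

    Each logical CPU has a line such as "microcode : 0x2f"; the value is
    parsed as a hexadecimal integer.
    """
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions


def read_running_revisions(path="/proc/cpuinfo"):
    """Read the revisions from the running system (x86 Linux only)."""
    with open(path) as cpuinfo:
        return microcode_revisions(cpuinfo.read())
```

Comparing the result of `read_running_revisions()` before and after rebuilding coreboot without the update is one way to see whether the revision really changed; more than one value in the set would mean the cores disagree.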
Here are some comments about this that I’ve received from the #coreboot IRC channel:
<avph> microcode problems are weird. They can appear in many different problems. I have encountered: VT-X very broken, movies playing (SIMD broken?), wrong CPUID, …
<pgeorgi> I guess the most common issues fixed by microcode updates are typically related to caching bugs, which often have security impact. TLB broken, write-back unstable, stuff like that. There’s hardly a single tool you can run and hope that it’ll catch all those bugs.
<the7thstranger> I have never seen an x86 chip that didn’t fail in some way without ucode updates, so don’t get your hopes up
Back for more coreboot work
In the introduction, I mentioned some coreboot issues that distracted me. Nothing major at the beginning: there was a small typo (apparently without consequences) in one of the commits I wrote for coreboot, which was not noticed until after it was merged, so I had to fix it (quick and easy) and send the fix to coreboot for review and merging (not so quick and easy). The coreboot team was great in giving me good feedback pretty quickly (my commit message wasn’t up to their standards because I wrote it in a rush), and it got merged upstream.
Then, unfortunately, during beta testing, someone on the team bricked their Librem 13 (good thing we’re testing with our own devices first, huh?). We’re not yet sure why this happened, so I’m waiting to receive this person’s unit to debug it. In the meantime, I had to send them my Librem 13 so they could get back to work. Once I receive their “brick”, I’ll be able to investigate why it’s not booting: whether it was a problem in flashing coreboot (due to QubesOS, a bad version of flashrom, or user error), a problem with coreboot itself, or a problem with the hardware (that laptop might have been an early prototype, which means the hardware may be different; I’ll have to check to make sure).
For now, I’m also still working on a fool-proof (as much as possible) install script that will build coreboot for you and install it, limiting any risks of user-error that might cause a bricked machine. Once I know what happened to that bricked Librem (and fix it), then I’ll be able to continue working on that installation script (it’s hard to test it without any hardware on hand!), after which we’ll do some more “in-house” beta testing before releasing a public beta test for everybody.
Thankfully, any brick of the laptop can easily be recovered by using an external hardware flasher and the original BIOS. I have the original BIOS from that specific machine, since a backup was made before coreboot was flashed to it, so I can recover it quickly.