Starting today, our second generation of laptops (based on the 6th gen Intel Skylake platform) will come with the Intel Management Engine neutralized and disabled by default. Users who already received their orders can also update their flash to disable the ME on their machines.
In this post, I will dig deeper and explain in more detail what this means exactly, and why it wasn’t done before today for the laptops that were shipping this spring and summer.
Think of the ME as having 4 possible states:
In my previous blog post about taming the ME, we discussed how we neutralize the ME (note that this was on the first generation, Broadwell-based Purism laptops back then), but we’ve taken things one step further today by not only neutralizing the ME but also by disabling it. The difference between the two might not be immediately visible to some of you, so I’ll clarify below.
The two approaches are similar in that they both stop the execution of the ME during the hardware initialization (BUP) phase, but with the ME disabled through the HAP method, the ME stops on its own, without putting up a fight, potentially disabling things that the forceful “me_cleaner” approach, with the “unexpected error” state, wouldn’t have disabled. The PCI interface, for example, is entirely unable to communicate with the ME processor, and the status of the ME is not even retrievable.
So the big, visible difference for us between a neutralized and a disabled ME is that the neutralized ME might appear “normal” when coreboot accesses its status, or it might show that it has terminated due to an error, while a disabled ME simply doesn’t give us a status at all, so coreboot will even think that the ME partition is corrupted. Another advantage is that, from my understanding of Positive Technologies’ research, a disabled ME stops its execution before a neutralized ME does, so there is at least a little bit of extra code that doesn’t get executed when the ME is disabled, compared to a neutralized ME.
In our case, we went with an ME that is both neutered and disabled. By doing so, we provide maximum security: even if the disablement of the ME isn’t functioning properly, the ME would still fail to load its mission-critical modules and would therefore be safe from any potential exploits or backdoors (unless one is found in the very early boot process of the ME).
I want to talk about neutralizing the Skylake ME, then follow up on how the ME was disabled. However, I first want you to understand the differences between the ME on Broadwell systems (ME version 10.x) and the ME on Skylake systems (ME version 11.0.x).
In my last ME-related post, I gave everyone a rundown of the modules that were in the ME 10.x firmware and which ones were remaining after it was neutered, so, for Skylake, here is the list of modules in a regular ME 11.0.x firmware:
-rw-r--r-- 1 kakaroto kakaroto 184320 Aug 29 16:33 bup.mod -rw-r--r-- 1 kakaroto kakaroto 36864 Aug 29 16:33 busdrv.mod -rw-r--r-- 1 kakaroto kakaroto 32768 Aug 29 16:33 cls.mod -rw-r--r-- 1 kakaroto kakaroto 163840 Aug 29 16:33 crypto.mod -rw-r--r-- 1 kakaroto kakaroto 389120 Aug 29 16:33 dal_ivm.mod -rw-r--r-- 1 kakaroto kakaroto 24576 Aug 29 16:33 dal_lnch.mod -rw-r--r-- 1 kakaroto kakaroto 49152 Aug 29 16:33 dal_sdm.mod -rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 evtdisp.mod -rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 fpf.mod -rw-r--r-- 1 kakaroto kakaroto 45056 Aug 29 16:33 fwupdate.mod -rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 gpio.mod -rw-r--r-- 1 kakaroto kakaroto 8192 Aug 29 16:33 hci.mod -rw-r--r-- 1 kakaroto kakaroto 36864 Aug 29 16:33 heci.mod -rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 hotham.mod -rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 icc.mod -rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 ipc_drv.mod -rw-r--r-- 1 kakaroto kakaroto 11832 Aug 29 16:33 ish_bup.mod -rw-r--r-- 1 kakaroto kakaroto 24576 Aug 29 16:33 ish_srv.mod -rw-r--r-- 1 kakaroto kakaroto 73728 Aug 29 16:33 kernel.mod -rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 loadmgr.mod -rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 maestro.mod -rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 mca_boot.mod -rw-r--r-- 1 kakaroto kakaroto 24576 Aug 29 16:33 mca_srv.mod -rw-r--r-- 1 kakaroto kakaroto 36864 Aug 29 16:33 mctp.mod -rw-r--r-- 1 kakaroto kakaroto 32768 Aug 29 16:33 nfc.mod -rw-r--r-- 1 kakaroto kakaroto 409600 Aug 29 16:33 pavp.mod -rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 pmdrv.mod -rw-r--r-- 1 kakaroto kakaroto 24576 Aug 29 16:33 pm.mod -rw-r--r-- 1 kakaroto kakaroto 61440 Aug 29 16:33 policy.mod -rw-r--r-- 1 kakaroto kakaroto 12288 Aug 29 16:33 prtc.mod -rw-r--r-- 1 kakaroto kakaroto 167936 Aug 29 16:33 ptt.mod -rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 rbe.mod -rw-r--r-- 1 kakaroto kakaroto 12288 Aug 29 16:33 rosm.mod 
-rw-r--r-- 1 kakaroto kakaroto 49152 Aug 29 16:33 sensor.mod -rw-r--r-- 1 kakaroto kakaroto 110592 Aug 29 16:33 sigma.mod -rw-r--r-- 1 kakaroto kakaroto 20480 Aug 29 16:33 smbus.mod -rw-r--r-- 1 kakaroto kakaroto 36864 Aug 29 16:33 storage.mod -rw-r--r-- 1 kakaroto kakaroto 8192 Aug 29 16:33 syncman.mod -rw-r--r-- 1 kakaroto kakaroto 94208 Aug 29 16:33 syslib.mod -rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 tcb.mod -rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 touch_fw.mod -rw-r--r-- 1 kakaroto kakaroto 12288 Aug 29 16:33 vdm.mod -rw-r--r-- 1 kakaroto kakaroto 98304 Aug 29 16:33 vfs.mod
And here is the list of modules in a neutered ME:
-rw-r--r-- 1 kakaroto kakaroto 184320 Oct 4 16:21 bup.mod
-rw-r--r-- 1 kakaroto kakaroto 73728 Oct 4 16:21 kernel.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Oct 4 16:21 rbe.mod
-rw-r--r-- 1 kakaroto kakaroto 94208 Oct 4 16:21 syslib.mod
The total ME size dropped from 2.5MB to 360KB, which means that 14.42% of the code remains, while 85.58% of the code was neutralized with me_cleaner.
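The 360KB figure can be checked directly against the neutered listing. Below is a minimal sketch, with the four sizes copied from that listing. The helper names `total_size` and `remaining_percent` are made up for this example, and the "original" size passed to `remaining_percent` has to come from whichever full image is being compared.

```python
# Sizes in bytes, copied from the neutered ME listing above.
NEUTERED_MODULES = {
    "bup.mod": 184320,
    "kernel.mod": 73728,
    "rbe.mod": 16384,
    "syslib.mod": 94208,
}


def total_size(modules):
    """Return the combined size of the given modules, in bytes."""
    return sum(modules.values())


def remaining_percent(kept_bytes, original_bytes):
    """Percentage of the original code still present (hypothetical helper)."""
    return 100.0 * kept_bytes / original_bytes


kept = total_size(NEUTERED_MODULES)
print(kept, "bytes =", kept // 1024, "KB")  # 368640 bytes = 360 KB
```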
The reason the neutering on Skylake-based systems removed less code than on Broadwell-based systems is because of the code in the ME’s read-only memory (ROM). What this “ROM” means is that a small part of the ME firmware is actually burned in the silicon of the ME Core. The ROM content is the first code executed, loaded internally from the ROM, by the ME core, and it has the simple task of reading the ME firmware from the flash, verifying its signature, making sure it hasn’t been tampered with, loading it in the ME Core’s memory and executing it.
The problem with the code in the ROM is that it cannot be removed because it’s inside of the processor itself and, well, it’s Read-Only Memory—it cannot be overwritten in any way, by definition. On the bright side, it is nice to see that most of the code that was previously in the ROM has now been moved to the flash in Skylake systems.
The ME firmware itself has multiple “partitions”, each containing something that the ME firmware needs. Some of those partitions will contain code modules, some will contain configuration files, and some will contain “other data” (I don’t really know what). Either way, the ME firmware contains about a dozen different partitions, each for a specific purpose, and two of those partitions contain the majority of the code modules.
I’ll now explain what has been done to get to this point in the project. When I was done with the coreboot port to the new Skylake machines, I tried to neutralize the ME, thinking it would be a breeze, since me_cleaner claimed support for Skylake. Unfortunately, it wasn’t working as it should and I spent the entire hacking day at the coreboot conference trying to fix it.
The problem is that once the ME was neutralized with me_cleaner, the Wi-Fi module on the Librem was unpredictable: it sometimes would work and sometimes wouldn’t, which was confusing. I eventually realized that if I rebooted after replacing the ME, the Wi-Fi would keep the same state it was in before:
I figured that it had something to do with how the PCI-Express card is initialized, and I spent quite some time trying to “enable it” from coreboot with a neutralized ME. I’ll spare you the details, but I eventually realized that I couldn’t get it to work because the PCIe device completely ignored all my commands and would simply refuse to power up. It turns out that the ME controls the ICC (Integrated Clock Controller), so without the ME, the clock for the PCIe device never gets enabled, the Wi-Fi card doesn’t work, and there is nothing you can do about it because only the ME has control over the ICC registers.
I tested a handful of different ME firmware versions, but surprisingly, the Wi-Fi module never worked on any of those images, even when the ME was not neutralized. Obviously, that meant the ME firmware was not properly configured, so I used the Intel FIT tool (which is used to configure ME images, allowing us to set things like PCIe lanes and which clocks to enable). Unfortunately, even when an image was configured the exact same way as the original ME image we had, the Wi-Fi would still not work, and I couldn’t figure out why.
I shelved the problem to concentrate on the release of coreboot and eventually on the SATA issues we were experiencing. The decision was made to release the Librem 13 v2 and Librem 15 v3 with a regular ME until more work was done on that front, because we couldn’t hold back shipments any longer (and because we can provide updates after shipment). Also note that at that time, the support for Skylake in me_cleaner was very rough—it was removing only half of the ME code because the format of the new ME 11.x firmware wasn’t fully known yet.
A few weeks later, I saw the release of unME11 from Positive Technologies and a week later, Nicola Corna pushed more complete support for Skylake in a testing branch of me_cleaner. I immediately jumped on it and tested it on our machines. Unfortunately, the Wi-Fi issue was still there. I decided to debug the cause by figuring out what me_cleaner does that could be affecting the ME firmware that way.
As I mentioned earlier in this post, the ME firmware is made up of about a dozen partitions, some of them containing code modules, and me_cleaner removes all the partitions except one, in which it removes most of the modules and leaves only the critical modules needed for the startup of the system. Therefore, I started progressively whitelisting more modules so me_cleaner wouldn’t remove them, testing each time whether it affected the Wi-Fi module. This was annoying to test because I’d have to change me_cleaner, neutralize the ME firmware, copy the image from my main PC to the Librem, flash the new image, power off, then restart the machine, and if the Wi-Fi wasn’t working, which was 99% of the time, I had to copy the files through a USB drive.
I eventually restored all of the modules and it was still not working, which made me suspect the cause might be in one of the other partitions, so I gradually added one partition at a time, until the Wi-Fi suddenly worked. I had just added the “MFS” partition, so I started removing the other partitions again one at a time, while keeping the “MFS” partition, and the Wi-Fi was still working. I eventually removed all of the code modules (apart from the critical ones) while keeping the MFS partition, and the Wi-Fi was still working. So I had found my fix: I just needed to keep the “MFS” partition in the image and the Wi-Fi would work.
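The search described above can be sketched as a simple loop: add partitions back one at a time until the Wi-Fi works, then try dropping the others again while keeping the fix. This is only an illustration of the procedure, not code from me_cleaner. `wifi_works` stands in for the manual flash-and-reboot test, and every partition name except MFS is a made-up placeholder.

```python
def find_required_partitions(candidates, wifi_works):
    """Return a minimal set of extra partitions found by add-then-prune.

    candidates: partition names that the cleaner would normally remove.
    wifi_works: callable taking a set of kept partitions and returning True
                if the Wi-Fi works with that image (a manual test in reality).
    """
    kept = set()
    # Phase 1: add partitions back one at a time until the Wi-Fi works.
    for name in candidates:
        kept.add(name)
        if wifi_works(kept):
            break
    else:
        return None  # nothing in the candidate list fixes the problem

    # Phase 2: try removing each kept partition again, keeping the fix.
    for name in sorted(kept):
        trial = kept - {name}
        if wifi_works(trial):
            kept = trial
    return kept


# Hypothetical stand-in for flashing and rebooting: Wi-Fi needs only MFS.
def fake_wifi_test(kept):
    return "MFS" in kept


# "PART_A", "PART_B" and "PART_C" are placeholders, not real ME partitions.
result = find_required_partitions(
    ["PART_A", "PART_B", "MFS", "PART_C"], fake_wifi_test
)
print(result)  # {'MFS'}
```

Each call to `wifi_works` corresponds to one full flash cycle on real hardware, which is why the process was so slow in practice.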
So, what is this mysterious “MFS” partition? There’s not a lot of information about it anywhere online, other than one forum or mailing list user mentioning the MFS partition as “ME File System”. I decided to use a comparative approach.
The fun thing when comparing ME firmware images is that not only are there multiple versions (e.g., 10.x vs 11.x), but for each ME version there are also multiple “flavors”, such as “Consumer” or “Corporate”, as well as “mobile” and “desktop” variants.
After modifying me_cleaner to add support for the Librem, which allows us to neutralize the ME while keeping the Wi-Fi module working, I discussed with Nicola Corna how best to integrate the feature into me_cleaner. We came to the conclusion that a new option letting users select which partitions to keep would be a better method, so I sent a pull request that adds such a feature.
Unfortunately, while the Wi-Fi module was working with this change, adding the MFS partition back into the ME firmware also had adverse side effects: my machine would refuse to power off, for example, and would have trouble rebooting.
The next step for me was to start reverse-engineering the ME firmware, like I had done before. This is of course a very long and arduous process that took a while and for which I don’t really have much progress to show. One thing I wanted to reverse-engineer was the MFS file system format so I could see which configuration files are within it and to start eliminating as much from it as possible. I started from the beginning however, by reverse engineering the entry point in the ROM. I will spare you much of the detail and the troubles in trying to understand some of the instructions, and mostly some of the memory accesses. The important thing to know is that before I got too far along, Positive Technologies announced the discovery of a way to disable the Intel ME, and I needed to test it.
Unfortunately, enabling the HAP bit, which disables the ME Core, didn’t work on the Librem: it caused the power LED to blink very slowly, and nothing I could do would stop it until I removed the battery. I first thought the machine was stuck in a boot loop, but it was just blinking really slowly. I eventually figured out that the reason was that the “HAP” bit was not added in version 11.0.0, but rather in version 11.0.x (where x > 0). I decided to try a newer ME firmware version and the HAP bit did work on that, which confirmed that the ME disablement was a feature added to the ME after the version the Librem came with (11.0.0.1180). So now I have a newer ME (version 11.0.18.1002) that is disabled thanks to the HAP bit, but… no Wi-Fi again.
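The version rule described above can be written down as a small check. This is a sketch based only on what is stated here for ME 11.0.x (no HAP support in 11.0.0, support in later 11.0.x releases). `parse_me_version` and `hap_supported` are hypothetical helpers, and the check deliberately refuses to guess for any other version family.

```python
def parse_me_version(version):
    """Split a version string such as '11.0.18.1002' into four integers."""
    parts = version.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields, got %r" % version)
    return tuple(int(p) for p in parts)


def hap_supported(version):
    """Apply the rule from this post: within 11.0.x, HAP needs x > 0.

    Returns None for anything outside 11.0.x, since the post says nothing
    about those versions.
    """
    major, minor, third, _fourth = parse_me_version(version)
    if (major, minor) != (11, 0):
        return None
    return third > 0


print(hap_supported("11.0.0.1180"))   # False: the version the Librem came with
print(hap_supported("11.0.18.1002"))  # True: the newer version that worked
```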
I decided to retry using the FIT tool to configure the ME with the exact same settings as the old ME firmware. I went through every setting available to make sure it matched, and when I tried booting it again, the ME Core was disabled and the Wi-Fi module was working. Great Success!
Obviously, I then needed to do plenty of testing: make sure it all worked as it should, confirm that the ME Core was disabled, test the behavior of the system with an ME firmware both disabled and neutralized, and check that it had no side effects other than what we wanted.
My previous coreboot build script was using the ME image from the local machine, but unfortunately, I can’t do that now for disabling the ME, since it’s not supported on the ME image that most people have on their machines. So I updated my coreboot build script to make it download the new ME version from a public link, and I used bsdiff to patch the ME image with the proper configuration for the Wi-Fi to work. I made sure to check that the only changes to the ME image are in the MFS partition and are configuration data, so the binary patch does not contain any binary code and we can safely distribute it.
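Checking that a patched image only differs inside one partition can be sketched as a byte-by-byte comparison. This is an illustration, not the script used for the Librem builds. The function names and the offsets below are invented for the example, and in practice the start and size of the MFS partition would have to be read from the image’s partition table.

```python
def changed_offsets(original, patched):
    """Return the offsets at which two equally sized images differ."""
    if len(original) != len(patched):
        raise ValueError("images must have the same size")
    return [i for i, (a, b) in enumerate(zip(original, patched)) if a != b]


def changes_confined_to(original, patched, start, size):
    """True if every differing byte lies inside [start, start + size)."""
    return all(
        start <= off < start + size
        for off in changed_offsets(original, patched)
    )


# Toy 64-byte "images"; the region at offset 16, length 8, plays the role
# of the MFS partition (made-up numbers).
original = bytes(64)
patched = bytearray(original)
patched[18] = 0xAA  # a change inside the region
patched[23] = 0x55  # another change inside the region

print(changes_confined_to(original, bytes(patched), start=16, size=8))  # True
```

A check like this only shows where the changes are. Confirming that the changed bytes are configuration data rather than code still requires looking at what they contain.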
The next step will be to continue the reverse-engineering efforts, but for now, I’ve put that on hold because Positive Technologies have announced that they found an exploit in the ME firmware allowing the execution of unsigned code. This exploit will be announced at the BlackHat Europe 2017 conference in December, so we’ll have to wait and see how their exploit works and what we can achieve with it before going further. Also, once Positive Technologies release their information, it might be possible for us to work together and share our knowledge. I am hoping that I can get some information from them on code that they have already reverse-engineered, so I don’t have to duplicate all of their efforts. I’d also like to mention that, just as last time, Igor Skochinsky has generously shared his research with us, but getting data from Positive Technologies as well would be a tremendous help, considering how much work they have already invested in this.
Right now, I have decided to move my focus to investigating the FSP, which is another important binary that needs to be reverse-engineered and removed from coreboot. I don’t think that anyone is currently actively working on it, so hopefully, I can achieve something without duplicating someone else’s work, and we can advance the cause much faster this way. I think I will concentrate first on the PCH initialization code, then move to the memory initialization.