Starting today, our second generation of laptops (based on the 6th gen Intel Skylake platform) will now come with the Intel Management Engine neutralized and disabled by default. Users who already received their orders can also update their flash to disable the ME on their machines.
In this post, I will dig deeper and explain in more details what this means exactly, and why it wasn’t done before today for the laptops that were shipping this spring and summer.
The life and times of the ME
Think of the ME as having 4 possible states:
- Fully operational ME: the ME is running normally like it does on other manufacturers’ machines (note that this could be a consumer or corporate ME image, which vary widely in the features they ‘provide’)
- Neutralized ME: the ME is neutralized/neutered by removing the most “mission-critical” components from it, such as the kernel and network stack.
- Disabled ME: the ME is officially “disabled” and is known to be completely stopped and non-functional
- Removed ME: the ME is completely removed and doesn’t execute anything at any time, at all.
In my previous blog post about taming the ME, we discussed how we neutralize the ME (note that this was on the first generation, Broadwell-based Purism laptops back then), but we’ve taken things one step further today by not only neutralizing the ME but also by disabling it. The difference between the two might not be immediately visible to some of you, so I’ll clarify below.
- A neutralized ME is a ME image which had most of its code removed.
- The way the ME firmware is packaged on the flash, is in the form of multiple modules, and each module has a specific task, such as : Hardware initialization, Firmware updates, Kernel, Network stack, Audio/Video processing, HECI communication over PCI, Java virtual machine, etc. When the ME is neutralized using the me_cleaner tool, most of the modules will be removed. As we’ve seen on Broadwell, that meant almost 93% of the code is removed and only 7% remains (that proportion is different on Skylake, see further below).
- A neutralized ME means that the ME firmware will encounter an error during its regular boot cycle; It will not find some of its critical modules and it will throw an error and somehow fail to proceed. However, the ME remains operational, it just can’t do anything “valuable”. While it’s unable to communicate with the main CPU through the HECI commands, the PCI interface to the ME processor is still active and lets us poke at the status of the ME for example, which lets us see which error caused it to stop functioning.
- When the ME is disabled using the “HAP” method (thanks to the Positive Technologies for discovering this trick), however, it doesn’t throw an error “because it can’t load a module”: it actually stops itself in a graceful manner, by design.
The two approaches are similar in that they both stop the execution of the ME during the hardware initialization (BUP) phase, but with the ME disabled through the HAP method, the ME stops on its own, without putting up a fight, potentially disabling things that the forceful “me_cleaner” approach, with the “unexpected error” state, wouldn’t have disabled. The PCI interface for example, is entirely unable to communicate with the ME processor, and the status of the ME is not even retrievable.
So the big, visible difference for us, between a neutralized and a disabled ME, is that the neutralized ME might appear “normal” when coreboot accesses its status, or it might show that it has terminated due to an error, while a disabled ME simply doesn’t give us a status at all—so coreboot will even think that the ME partition is corrupted. Another advantage, is that, from my understanding of the Positive Technologies’s research, a disabled ME stops its execution before a neutralized ME does, so there is at least a little bit of extra code that doesn’t get executed when the ME is disabled, compared to a neutralized ME.
Kill it with fire! Then dump it into a volcano.
In our case, we went with an ME that is both neutered and disabled. By doing so, we provide maximum security; even if the disablement of the ME isn’t functioning properly, the ME would still fail to load its mission-critical modules and will therefore be safe from any potential exploits or backdoors (unless one is found in the very early boot process of the ME).
I want to talk about the neutralizing of the Skylake ME then follow up on how the ME was disabled. However, I first want you to understand the differences between the ME on Broadwell systems (ME version 10.x) and the ME on Skylake systems (ME version 11.0.x).
- The Intel Management Engine can be seen as two things; first, the isolated processor core that run the Management Engine is considered “The ME”, and second, the firmware that runs on the ME Core is also considered as being “the ME”. I often used the two terms interchangeably, but to avoid confusion, I will from now on (try to) refer to them, respectively, as the ME Core and the ME Firmware, but note that if I simply say the ME, then I am probably referring to the ME Firmware.
- The ME Firmware 10.x was used on Broadwell systems which had an ARC core, while the ME Firmware 11.0.x used on Skylake systems uses an x86 core. What this means is that the architecture used by the ME core is completely different (kind of like how PowerPC and Intel macs used a different architecture, or how most mobile devices use an ARM architecture, the Broadwell ME Core used an ARC architecture). This means that the difference between the 10.x and 11.0.x ME firmwares is major, and the cores themselves are also very different. It’s a bit like comparing arabic to korean!
- As the format of the ME firmware changed significantly, it took a while to figure out how to decompress the modules and understand how to remove the modules without breaking anything else. Nicola Corna, the author of the me_cleaner tool, recently was able to add support for Skylake machines by removing all the non essential modules.
In my last ME-related post, I gave everyone a rundown of the modules that were in the ME 10.x firmware and which ones were remaining after it was neutered, so, for Skylake, here is the list of modules in a regular ME 11.0.x firmware:
-rw-r--r-- 1 kakaroto kakaroto 184320 Aug 29 16:33 bup.mod
-rw-r--r-- 1 kakaroto kakaroto 36864 Aug 29 16:33 busdrv.mod
-rw-r--r-- 1 kakaroto kakaroto 32768 Aug 29 16:33 cls.mod
-rw-r--r-- 1 kakaroto kakaroto 163840 Aug 29 16:33 crypto.mod
-rw-r--r-- 1 kakaroto kakaroto 389120 Aug 29 16:33 dal_ivm.mod
-rw-r--r-- 1 kakaroto kakaroto 24576 Aug 29 16:33 dal_lnch.mod
-rw-r--r-- 1 kakaroto kakaroto 49152 Aug 29 16:33 dal_sdm.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 evtdisp.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 fpf.mod
-rw-r--r-- 1 kakaroto kakaroto 45056 Aug 29 16:33 fwupdate.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 gpio.mod
-rw-r--r-- 1 kakaroto kakaroto 8192 Aug 29 16:33 hci.mod
-rw-r--r-- 1 kakaroto kakaroto 36864 Aug 29 16:33 heci.mod
-rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 hotham.mod
-rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 icc.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 ipc_drv.mod
-rw-r--r-- 1 kakaroto kakaroto 11832 Aug 29 16:33 ish_bup.mod
-rw-r--r-- 1 kakaroto kakaroto 24576 Aug 29 16:33 ish_srv.mod
-rw-r--r-- 1 kakaroto kakaroto 73728 Aug 29 16:33 kernel.mod
-rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 loadmgr.mod
-rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 maestro.mod
-rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 mca_boot.mod
-rw-r--r-- 1 kakaroto kakaroto 24576 Aug 29 16:33 mca_srv.mod
-rw-r--r-- 1 kakaroto kakaroto 36864 Aug 29 16:33 mctp.mod
-rw-r--r-- 1 kakaroto kakaroto 32768 Aug 29 16:33 nfc.mod
-rw-r--r-- 1 kakaroto kakaroto 409600 Aug 29 16:33 pavp.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 pmdrv.mod
-rw-r--r-- 1 kakaroto kakaroto 24576 Aug 29 16:33 pm.mod
-rw-r--r-- 1 kakaroto kakaroto 61440 Aug 29 16:33 policy.mod
-rw-r--r-- 1 kakaroto kakaroto 12288 Aug 29 16:33 prtc.mod
-rw-r--r-- 1 kakaroto kakaroto 167936 Aug 29 16:33 ptt.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 rbe.mod
-rw-r--r-- 1 kakaroto kakaroto 12288 Aug 29 16:33 rosm.mod
-rw-r--r-- 1 kakaroto kakaroto 49152 Aug 29 16:33 sensor.mod
-rw-r--r-- 1 kakaroto kakaroto 110592 Aug 29 16:33 sigma.mod
-rw-r--r-- 1 kakaroto kakaroto 20480 Aug 29 16:33 smbus.mod
-rw-r--r-- 1 kakaroto kakaroto 36864 Aug 29 16:33 storage.mod
-rw-r--r-- 1 kakaroto kakaroto 8192 Aug 29 16:33 syncman.mod
-rw-r--r-- 1 kakaroto kakaroto 94208 Aug 29 16:33 syslib.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Aug 29 16:33 tcb.mod
-rw-r--r-- 1 kakaroto kakaroto 28672 Aug 29 16:33 touch_fw.mod
-rw-r--r-- 1 kakaroto kakaroto 12288 Aug 29 16:33 vdm.mod
-rw-r--r-- 1 kakaroto kakaroto 98304 Aug 29 16:33 vfs.mod
And here is the list of modules in a neutered ME :
-rw-r--r-- 1 kakaroto kakaroto 184320 Oct 4 16:21 bup.mod
-rw-r--r-- 1 kakaroto kakaroto 73728 Oct 4 16:21 kernel.mod
-rw-r--r-- 1 kakaroto kakaroto 16384 Oct 4 16:21 rbe.mod
-rw-r--r-- 1 kakaroto kakaroto 94208 Oct 4 16:21 syslib.mod
The total ME size dropped from 2.5MB to 360KB, which means that 14.42% of the code remains, while 85.58% of the code was neutralized with me_cleaner.
The reason the neutering on Skylake-based systems removed less code than on Broadwell-based systems is because of the code in the ME’s read-only memory (ROM). What this “ROM” means is that a small part of the ME firmware is actually burned in the silicon of the ME Core. The ROM content is the first code executed, loaded internally from the ROM, by the ME core, and it has the simple task of reading the ME firmware from the flash, verifying its signature, making sure it hasn’t been tampered with, loading it in the ME Core’s memory and executing it.
- On Broadwell, there is about 128KB of code burned in the ME Core’s ROM. That 128KB of code contains the bootloader as well as some system APIs that the other modules can use.
- On Skylake, the ROM code was decreased to 17KB, leaving only the basic bootloader, and moving the system APIs to a module of their own inside the ME firmware.
- This means that the total amount of code remaining, including the ROM is 360+17KB out of 2524+17KB = 377/2541 = 14.84% for Skylake, while on Broadwell, it’s 120 + 128KB out of 1624+128KB = 248/1752 = 14.15% of code remaining. The difference is much smaller now when we account for the code hidden in the ROM of the processor.
The problem with the code in the ROM is that it cannot be removed because it’s inside of the processor itself and, well, it’s Read-Only Memory—it cannot be overwritten in any way, by definition. On the bright side, it is nice to see that most of the code that was previously in the ROM has now been moved to the flash in Skylake systems.
The ME firmware itself has multiple “partitions”, each containing something that the ME firmware needs. Some of those partitions will contain code modules, some will contain configuration files, and some will contain “other data” (I don’t really know what). Either way, the ME firmware contains about a dozen different partitions, each for a specific purpose, and two of those partitions contain the majority of the code modules.
I’ll now explain what has been done to get to this point in the project. When I was done with the coreboot port to the new Skylake machines, I tried to neutralize the ME, thinking it would be a breeze, since me_cleaner claimed support for Skylake. Unfortunately, it wasn’t working as it should and I spent the entire hacking day at the coreboot conference trying to fix it.
The problem is that once the ME was neutralized with me_cleaner, the Wi-Fi module on the Librem was unpredictable: it sometimes would work and sometimes wouldn’t, which was confusing. I eventually realized that if I reboot after replacing the ME, the wifi would keep the same state as it was in before:
- if I neutralized the ME and reboot, it would still work, but after powering off the machine and turning it on, the wifi would stop working;
- if I restored a full ME (instead of a neutralized one) and rebooted, the wifi would remain dead;
- …but if I power off the machine and turn it back on, the wifi would finally be restored.
I figured that it has something to do with how the PCI-Express card is initialized, and I spent quite some time trying to “enable it” from coreboot with a neutralized ME. I’ll spare you the details but I eventually realized that I couldn’t get it to work because the PCIe device completely ignored all my commands and would simply refuse to power up. It turns out that the ME controls the ICC (Integrated Clock Controller) so without it, it would simply not enable the clock for the PCIe device, so the wifi card wouldn’t work and there is nothing you can do about it because only the ME has control over the ICC registers. I tried to test a handful of different ME firmware versions, but surprisingly, the wifi module never worked on any of those images, even when the ME was not neutralized. Obviously, it meant that the ME firmware was not properly configured, so I used the Intel FIT tool (which is used to configure ME images, allowing us to set things like PCIe lanes, and which clocks to enable, and all of that). Unfortunately, even when an image was configured the exact same way as the original ME image we had, the wifi would still not work, and I couldn’t figure out why.
I shelved the problem to concentrate on the release of coreboot and eventually on the SATA issues we were experiencing. The decision was made to release the Librem 13 v2 and Librem 15 v3 with a regular ME until more work was done on that front, because we couldn’t hold back shipments any longer (and because we can provide updates after shipment). Also note that at that time, the support for Skylake in me_cleaner was very rough—it was removing only half of the ME code because the format of the new ME 11.x firmware wasn’t fully known yet.
A few weeks later, I saw the release of unME11 from Positive Technologies and a week later, Nicola Corna pushed more complete support for Skylake in a testing branch of me_cleaner. I immediatly jumped on it and tested it on our machines. Unfortunately, the wifi issue was still there. I decided to debug the cause by figuring out what me_cleaner does that could be affecting the ME firmware that way.
As I mentioned earlier in this post, the ME firmware is made up of a dozen of partitions, some of those containing code modules, and me_cleaner will remove all the partitions except one, in which it will remove most of the modules and leave only the critical modules needed for the startup of the system. Therefore, I started progressively whitelisting more modules so me_cleaner wouldn’t remove them, and testing if it affected the wifi module. This was annoying to test because I’d have to change me_cleaner, neutralize the ME firmware, then copy the image from my main PC to the Librem then flash the new image, poweroff, then restart the machine, and if the Wifi wasn’t working, which was 99% of the time, I had to copy the files through a USB drive. I eventually restored all of the modules and it was still not working, which made me suspect the cause might be in one of the other partitions, so I gradually added one partition at a time, until the Wifi suddenly worked. I had just added the “MFS” partition, so I started removing the other partitions again one at a time, but keeping the “MFS” partition, and the Wifi was still working. I eventually removed all of the code modules (apart from the critical ones) but keeping the MFS partition, and the wifi was still working. So I had found my fix: I just need to keep the “MFS” partition in the image and the wifi would work.
So many firmwares, so little time
So, what is this mysterious “MFS” partition? There’s not a lot of information about it anywhere online, other than one forum or mailing list user mentioning the MFS partition as “ME File System”. I decided to use a comparative approach.
The fun thing when comparing ME firmware images: not only are there multiple versions (ex: 10.x vs 11.x), for each single ME version there are multiple “flavors” of it, such as “Consumer” or “Corporate”, and there are also multiple flavors for “mobile” and “desktop”.
- When I extracted and compared all the partitions of all the variants and flavors, the only difference between a mobile and a desktop image is in the MFS partition, as every other partition shares the same hash between two flavors of the same version.
- I then compared the various partitions between a configured and a non configured ME firmware, and noticed that what the Intel FIT tool does when you change the system’s configuration is to simply write that configuration inside of the MFS partition.
- This means that the MFS partition, which doesn’t contain any code modules, is used for storage of configuration files used by the ME firmware. This is somewhat confirmed by the fact that the MFS partition is marked as containing data.
After modifying me_cleaner to add support for the Librem, which allows us to neutralize the ME while keeping the Wifi module working, I discussed with Nicola Corna how to best integrate the feature into me_cleaner. We came to the conclusion that having a new option to allow users to select which partitions to keep would be a better method, so I sent a pull request that adds such a feature.
Unfortunately, while the wifi module was working with this change, I also had an adverse side-effect when adding the MFS partition back into the ME firmware: my machine would refuse to power off, for example, and would have trouble rebooting.
- The exact behavior is that if I power off the machine, Linux would do the entire power off sequence then stop, and I would have to manually force shutdown the Librem by holding the power button for 5 seconds. As for the rebooting issue, instead of actually rebooting when Linux finishes its poweroff sequence, the system will be frozen for a few seconds before suddenly shutting itself down forcibly, then turning itself back on 5 seconds later, on its own. This isn’t the most critical of issues, but it would be very annoying to users, and unfortunately, I couldn’t find the cause of this strange behavior. All I knew was that if I remove the MFS partition, coreboot says the ME partition is corrupted, and the wifi module doesn’t work, and if I keep the MFS partition, coreboot says the ME partition is valid, the wifi module works, but the poweroff/reboot issues automatically appear.
- The solution for these issues turned out to be unexpectedly simple. After another of our developers said he was ready to live with the poweroff/reboot issues, and I sent him a neutralized ME for his system, I was told that his machine was working fine with no side-effects at all. I didn’t know what the difference between his machine and mine was, other than the fact that my machine is a prototype and his was a “production” machine. I then tested my neutralized ME on the “production” Librem 13 unit I had on hand, and I didn’t have any side effects of the neutralizing of the ME firmware. I then updated my coreboot build script to add the neutralization option and asked users on our forums to test it, and every one who tested the neutralized ME reported back success with no side-effects. I then realized the problem is probably only caused by the prototype machine that I was using. Well, I can live with that.
Disabling the ME
The next step for me was to start reverse-engineering the ME firmware, like I had done before. This is of course a very long and arduous process that took a while and for which I don’t really have much progress to show. One thing I wanted to reverse-engineer was the MFS file system format so I could see which configuration files are within it and to start eliminating as much from it as possible. I started from the beginning however, by reverse engineering the entry point in the ROM. I will spare you much of the detail and the troubles in trying to understand some of the instructions, and mostly some of the memory accesses. The important thing to know is that before I got too far along, Positive Technologies announced the discovery of a way to disable the Intel ME, and I needed to test it.
Unfortunately, enabling the HAP bit which disables the ME Core, didn’t work on the Librem: it was causing the power LED to blink very slowly, and nothing I could do would stop it until I removed the battery. I first thought the machine was stuck in a boot loop, but it was just blinking really slowly. I figured out eventually that the reason was that the “HAP” bit was not added in version 11.0.0, but rather in version 11.0.x (where x > 0). I decided to try a newer ME firmware version and the HAP bit did work on that, which confirmed that the ME disablement was a feature added to the ME after the version the Librem came with (18.104.22.1680). So now I have a newer ME (version 22.214.171.1242) that is disabled thanks to the HAP bit, but… no Wi-Fi again.
I decided to retry using the FIT tool to configure the ME with the exact same settings as the old ME firmware. I went through every setting available to make sure it matches, and when I tried booting it again, the ME Core was disabled and the Wifi module was working. Great Success!
Obviously, I then needed to do plenty of testing, make sure it’s all working as it should, confirm that the ME Core was disabled, test the behavior of the system with a ME firmware both disabled and neutralized, and that it has no side effects other than what we wanted.
My previous coreboot build script was using the ME image from the local machine, but unfortunately, I can’t do that now for disabling the ME since it’s not supported on the ME image that most people have on their machines. So I updated my coreboot build script to make it download the new ME version from a public link (found here), and I used bsdiff to patch the ME image with the proper configuration for the WiFi to work. I made sure to check that the only changes to the ME image is in the MFS partition and is configuration data, so the binary patch does not contain any binary code and we can safely distribute it.
Moving towards the FSP
The next step will be to continue the reverse-engineering efforts, but for now, I’ve put that on hold because Positive Technologies have announced that they found an exploit in the ME Firmware allowing the executing of unsigned code. This exploit will be announced at the BlackHat Europe 2017 conference in December, so we’ll have to wait and see how their exploit works and what we can achieve with it before going further. Also, once Positive Technologies release their information, it might be possible for us to work together and share our knowledge. I am hoping that I can get some information from them on code that they already reverse engineered, so I don’t have to duplicate all of their efforts. I’d also like to mention that, just as last time, Igor Skochinsky has generously shared his research with us, but also getting data from Positive Technologies would be a tremendous help, considering how much work they have already invested on this.
Right now, I have decided to move my focus to investigating the FSP, which is another important binary that needs to be reverse-engineered and removed from coreboot. I don’t think that anyone is currently actively working on it, so hopefully, I can achieve something without duplicating someone else’s work, and we can advance the cause much faster this way. I think I will concentrate first on the PCH initialization code, then move to the memory initialization.