Coreboot on the Librem 13 v2, part 1

Hello everyone! I am very happy to announce that the coreboot port to the Librem 13 v2 as well as the Librem 15 v3 is done! Wow, what an adventure! The entire thing took about 2 weeks of hard work, and an additional week of testing, fixing small issues that kept popping up, and cleaning up the code/commits.

It was truly an adventure, and I would have liked to stop and write 10 blog posts along the way, one for every major bump in the road or milestone. But I was under a strict deadline: we needed to finish the port before we started shipping the new Librem 13 v2 hardware (from now on referred to as ‘the v2’), so that it could ship with coreboot pre-installed from day one. Now that the port is finished, I can finally start writing the first chapter in the story.

TL;DR: in the process of porting the Skylake-based Librem 13 v2 to coreboot, I implemented a new debugging method (“flashconsole”) and added it to coreboot. It has been reviewed and merged upstream. The “flashconsole” driver lets coreboot write its console log to the SPI flash itself. So if you want to port a board to coreboot and you don’t have access to UART (or don’t want to solder UART wires to the motherboard), and you can’t use USB debugging (on Skylake, for example), then you can enable the CONSOLE_SPI_FLASH configuration option and the console log will be written to the flash. When you use your external programmer, just dump the flash first, then use ‘cbfstool rom.bin read -r CONSOLE -f console.log’ to extract the console log from it. No wires, no mess, no soldering required. Well, you do still need the external flasher, but you already have to use it to unbrick the machine since it wasn’t booting (and if it was booting, then you already have log access through cbmem, so you don’t need UART or the flash console).

Getting your feet wet

Since I learned my lesson when I first tried to do the v2 port (my oh-so-embarrassing first attempt), I decided to grab all the logs I could get from the v2 before doing anything else. After running all the commands from the Motherboard Porting Guide, I copied the files over to my work laptop, and when I tried to look at the flash contents, I couldn’t find the rom.bin file! Maybe the cat ate it? It turns out that when I was trying to dump the flash, I hadn’t noticed that the ‘flashrom’ command returned this error:

Found chipset "Intel Sunrise Point (Skylake-U Premium)" with PCI ID 8086:9d48.
ERROR: This chipset is not supported yet.

Well, that’s interesting: flashrom doesn’t support the Skylake processors (confirmed here). So my first task would be to add Skylake support to flashrom. After I mentioned that on IRC, Nico Huber said that he had already ported flashrom to Skylake, but the patches were still untested, unreviewed and unmerged. I decided to start reviewing those changes, to get my feet wet, understand how flashrom works and how to read the Intel PCH datasheet, and also to contribute something to the coreboot/flashrom community other than new code. I sent my comments on a few of those patches, and some things got fixed and/or merged, until I had to stop because my deadline was catching up with me.

The boring part

The first thing I had to do when I started the port was to understand what needed to be done. I watched this talk by Shawn Nematbakhsh from the 2014 Chrome OS Firmware Summit, where he explained the process of porting a new Chromebook board to coreboot. Since I knew that the v2 was based on the Skylake family of Intel processors, I looked at existing Skylake boards in the coreboot tree and found a few: Google Chell, Google Glados, Google Lars, Intel KBLRVP and Intel Kunimitsu. I decided to use the Google Chell board as my starting point (this was a random choice), so I copied the mainboard/google/chell directory into mainboard/purism/librem13v2 and started to edit the files. I mainly edited the Kconfig, Kconfig.name and board_info.txt files to replace ‘google/chell’ with ‘purism/librem13’ everywhere I found it. Then I started removing files or references that were Chrome OS specific, and after fumbling a bit in the dark and removing anything that I thought wasn’t needed or that I didn’t understand, I managed to get coreboot to compile.

I also took the time of course to extract the vbios, download the FSP image, adapt the Kconfig file, and update the GPIO values to match the original firmware (more on that in a future post).

When I tested it though, it didn’t work, obviously. I needed to know what went wrong, so I needed to debug it! Unfortunately, I couldn’t find any UART pads on the motherboard, and when I looked at usbdebug (which I used for the v1), I realized that it is not implemented for Skylake. After some research, I found that usbdebug was actually a feature of Broadwell processors. On Skylake the equivalent is called DCI, and it requires proprietary hardware and proprietary software talking over a proprietary protocol to get debugging working. I did not want to do that, so I looked for an alternative.

Now the fun begins

For me, the obvious choice for debugging was to use the SPI flash itself. After all, I was testing different things blindly, and every time something failed to work, I was using my external flasher to write a new coreboot version to the flash. Why not read the flash at the same time and grab the log from it? This felt like such an obvious method for debugging, but to my surprise it wasn’t implemented in coreboot, so I decided to do just that.

First, I needed to be able to test my implementation, so I decided to implement it for Broadwell so I could test it on the v1 hardware first. Once it would work on the v1, then I could try it on the v2. I looked at the API used by coreboot to write to the flash, and used that to write my log to a fixed offset in the flash (which I knew was unused), then tested it. It worked! Wow, that was fast. This is going to be soooo easy! Yeah… right…

So now that it worked with a hard-coded offset, I needed to make it write to a CBFS file. The CBFS (CoreBoot FileSystem, I assume) has various sections in it called ‘files’, and after looking at the API for a bit, I figured out how to make it parse the CBFS and give me the offset and size of the ‘console’ file. I then changed the Makefile.inc so it would add a ‘console’ file to CBFS when the option is enabled. I tested it, and my ‘cbfsconsole’ logger didn’t work anymore. I played with it for a while without understanding the problem. Eventually, I found out that writing at offset 0x200000 (my hard-coded value, pointing to unused space) worked, but writing to offset 0x260000 (the offset of the ‘console’ file in CBFS) didn’t. Even when I removed the CBFS-related code and just hard-coded the value to 0x260000, it still didn’t work.

Aaron Durbin (adurbin on IRC) came to the rescue. After understanding what I was doing and the issues I was having, he asked me *how* I was creating the ‘console’ file that was added to CBFS, and he immediately saw the problem. The file was copied from /dev/zero, so it was all zeroes, but that’s not how NOR flash works. Apparently, a write to NOR flash can only change a ‘1’ bit into a ‘0’, never a ‘0’ into a ‘1’. Since my console file was all zeroes, none of my writes had any effect. Well, that was news to me, I didn’t know that’s how NOR flash worked.

So, what do you do if you want to turn a ‘0’ into a ‘1’? Well, that’s simple: you do a ‘sector erase’, which erases the entire sector (turning it entirely into 1s, i.e. 0xff data), and *then* you can write back your data. That also answers a question I had always had: “why does flashrom always erase sectors before it writes the data into them?”

So, here is the right way of populating any sector of the flash to which you expect to write anything:

dd if=/dev/zero count=1 bs=$(__cbfsconsole_size) | tr '\000' '\377'
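To make the NOR rule concrete, here is a tiny Python model of it. This is only a sketch for illustration, not coreboot code: the class name, the 4 KiB sector size and the region size are all assumptions. Programming is modelled as a bitwise AND with the existing contents, and erasing as resetting a whole sector to 0xff.

```python
class NorFlashModel:
    """Toy model of NOR flash: programming only clears bits, erasing sets them."""

    def __init__(self, size, sector_size=4096, fill=0xFF):
        self.sector_size = sector_size          # assumed sector size
        self.data = bytearray([fill]) * size

    def program(self, offset, payload):
        # Programming ANDs the new data into the cells: a 1 can become a 0,
        # but a 0 stays a 0 whatever you write.
        for i, byte in enumerate(payload):
            self.data[offset + i] &= byte

    def erase_sector(self, offset):
        # Erasing works on whole sectors and resets every bit to 1 (0xff).
        start = offset - (offset % self.sector_size)
        self.data[start:start + self.sector_size] = b"\xff" * self.sector_size


# A region pre-filled with zeroes (like a file copied from /dev/zero)
# silently swallows every write.
zeroed = NorFlashModel(8192, fill=0x00)
zeroed.program(0, b"coreboot")
print(bytes(zeroed.data[:8]))

# A region pre-filled with 0xff accepts the same write.
blank = NorFlashModel(8192)
blank.program(0, b"coreboot")
print(bytes(blank.data[:8]))
```

The first print shows eight zero bytes and the second shows the text that was written, which is the difference between my /dev/zero file and the 0xff-filled one.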

Now that I had a proper ‘console’ file and my driver was writing the log, I added code to do a sector erase, so that whenever the PC boots, the ‘console’ file would be erased entirely and the log could be written to it safely. After all, if I try to write over existing data, it will just be a mess of old and new data. Unfortunately, this caused me my second headache.

Before I get into that, I will first quickly talk about all the hours I wasted trying to enable cbfsconsole for bootblock and romstage. You see, coreboot has 3 stages, each executed as an independent program. The bootblock is the very first code that gets executed when you turn on the computer. It sets up the processor to use its cache as RAM, then it executes the romstage. The romstage initializes the RAM, then the ramstage is executed, which does most of the actual hardware initialization. When I wrote cbfsconsole, I initially had it enabled only for ramstage, and now that it was working, I decided to enable it for the bootblock and romstage as well. Unfortunately, that didn’t work, because I was using a global variable in my code, and because the SPI driver does crazy things like ‘malloc’ (allocating memory) and also uses global variables, which of course you can’t do before the RAM has been initialized.

So after messing with it for a while, I found out that coreboot uses a nice trick where you declare global variables in a certain way (using the CAR_GLOBAL macro) and access them differently (using car_get_var and car_set_var). Those global variables are then stored in the cache-as-ram (CAR) section of the binary instead of in RAM. After I ported my cbfsconsole to use the CAR_GLOBAL trick, and then spent a few hours trying to get it to compile, I eventually realized that it was a crazy idea. cbfsconsole used a global variable, but it also used the SPI flasher API, which itself used global variables, and the SPI flasher in turn used various SPI driver implementations, which themselves used malloc and global variables. Even after I disabled all the drivers except the Winbond driver and ported it to use CAR_GLOBAL, I was not finished, because the SPI driver uses Broadwell-specific PCI interfaces, which I had to include and which themselves used global variables. It was a never-ending game, and porting the entire thing to CAR_GLOBAL was far from trivial. So I eventually gave up and thought that I could never get the logger to work in the bootblock and romstage (which was crucially important, because the crash on the v2 was happening in those stages).

The following day, however, as I was trying to compile coreboot for the v2 (Skylake), I enabled the cbfsconsole by mistake and it compiled. Shocked, I realized that on Skylake there was a new hardware-sequencing implementation of the SPI driver which didn’t use any global variables or anything. So yeah, cbfsconsole would have worked if I had just tried it on Skylake before trying it on Broadwell!

Since my cbfsconsole was now properly compiling for Skylake, it was time to test it on the v2. Unfortunately, it was obviously not working. At that point though, I had already been joined by Matt “Mr. Chromebox” DeVillier, who had access to a Chromebook Chell with the Servo (debug) connector installed, and he volunteered to test the cbfsconsole on his Google Chell to see if it would work. Once he enabled cbfsconsole, his Chell Chromebook stopped booting, but using his servo debugger, he was able to get the debug logs, which showed him this peculiar line:

SPI Transaction Timeout (Exceeded 15 ms) at Flash Offset d0f000 HSFSTS = 0x3f066020

After a lot of debugging, reading the datasheet for the SPI hardware sequencer and testing various things, the only conclusion I could get to was that the Skylake hardware SPI sequencer was frozen for no particular reason, and that was causing everything to just freeze. Well, to be honest, it was freezing because the SPI transactions would timeout, causing the SPI driver to print the debug line “SPI Transaction Timeout” above, which itself would make it try to write it again to the flash, and we end up in an infinite loop.
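The feedback loop is easier to see in a small model. The Python sketch below is purely illustrative and not the coreboot driver: the class, its methods and the depth cap are all made up for the demonstration. It shows a console whose backend is the very flash driver that reports its own failures through the console.

```python
class FlashConsole:
    """Hypothetical console backed by a flash driver that logs its own errors."""

    def __init__(self, flash_is_stuck, max_depth=5):
        self.flash_is_stuck = flash_is_stuck
        self.max_depth = max_depth   # safety cap for this demo only
        self.depth = 0
        self.attempts = 0

    def flash_write(self, message):
        self.attempts += 1
        if self.flash_is_stuck:
            # The driver reports the failure through the console...
            self.log("SPI Transaction Timeout")

    def log(self, message):
        if self.depth >= self.max_depth:
            return  # the loop described above has no such cap, hence the hang
        self.depth += 1
        try:
            # ...and the console sends that report straight back to the flash.
            self.flash_write(message)
        finally:
            self.depth -= 1


console = FlashConsole(flash_is_stuck=True)
console.log("bootblock starting...")
print(console.attempts)  # one log call turned into max_depth write attempts
```

With a healthy flash, one log call produces exactly one write. With a stuck flash, every failed write produces another log message, and only the artificial cap stops it here.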

I gave up again on trying to understand it and went to the #coreboot IRC channel to ask if someone knew why this would happen. Aaron Durbin came to the rescue once again, telling us that some implementations might freeze if we try to erase a sector that is already erased. Of course the sector was already erased, since it was full of 0xff data (see above), and trying to erase it was what caused all our problems! Once I removed the code that erases the sectors and replaced it with code that reads the ‘console’ file and sets its starting offset to the first occurrence of the 0xff byte, the cbfsconsole worked! Matt reported that his Chell device booted, and when I tested on the v2, I finally got some output!
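The “find the first 0xff byte” step can be sketched in a few lines of Python. This is an illustration of the idea rather than the driver code, and the function name is made up. It assumes the log is plain ASCII, so a 0xff byte can only be erased, still-writable flash.

```python
def find_log_end(region: bytes) -> int:
    """Return the offset of the first erased (0xff) byte.

    If no 0xff byte is left, the region is full and its length is returned.
    """
    offset = region.find(b"\xff")
    return len(region) if offset < 0 else offset


# A region holding the log of a previous boot, followed by erased space.
region = b"old boot log\n" + b"\xff" * 19
print(find_log_end(region))  # new log lines would be appended from here
```

New output is then appended from that offset instead of erasing the region on every boot. The catch is that the region eventually fills up, at which point nothing more can be written until it is erased again.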

Finalizing things

Now that we had a way to get a debug log out of coreboot, it was time to get it to work on the v2. Unfortunately, it seemed to be crashing on the very first thing it was trying to do. This was the entire log:

coreboot-4.5-1805-g7da7ddf-dirty Fri May 12 20:15:36 UTC 2017 bootblock starting...
Calling FspTempRamInit

It didn’t take long to fix that because Matt had experience with that. It looks like the FSP needs us to give it a Microcode address and length that are not zero. Even though the FSP Integration Guide says that setting the MicrocodeRegionBase and MicrocodeRegionLength parameters to zero means that no microcode update is available, the reality is that the FSP simply freezes or crashes if a valid non-zero-sized Microcode is not provided to it.

In this case, coreboot was disappointing. On Broadwell, the microcode files were automatically included in the build since they are distributed with coreboot (in the ‘blobs’ repository), but for Skylake they are not available, because Intel makes you accept a non-redistribution license before downloading the microcode files, so they were of course never added to coreboot. The problem here is that coreboot simply adds a zero-sized cpu_microcode_blob.bin file to CBFS instead of complaining that the file could not be found. Also, we have to manually set the memory location and file size in the config file, even though those values could easily be retrieved programmatically using the same CBFS APIs I used for the ‘console’ file.

Anyways, once I added the microcode file and set its position and length in the config, the FspTempRamInit was successful and the bootblock finished and loaded the romstage, which of course crashed at the FspMemoryInit call.

I will spare you the details today of how we got the memory init to work, and I will leave that for a future blog post. Suffice it to say that it wasn’t trivial, but once we got memory working, we ended up with another crash. This time, the last line from the log was:

Calling FspTempRamExit API

After investigating the code, it looked like right after that call, the romstage main was returning and jumping back into assembly code to tear down the Cache-as-RAM system that had been set up initially, and then it would call the ramstage. I started trying to understand everything it did, and to figure out why it would crash in the assembly code, what might have been different in the various registers, etc. Thankfully, I didn’t waste too much time on this, because Matt was asked to test something on a Google Sentry Chromebook, so he switched his environment to the Sentry and tested what he had to test for someone else, and then he saw that it was crashing at the exact same place as ours on the v2. It was working for him before, and yet it wasn’t working now. He realized he still had the cbfsconsole enabled, and once he disabled it, it booted. We were incredibly lucky to realize (without wasting a week on this) that cbfsconsole was the one blocking coreboot. Once I disabled cbfsconsole support for bootblock and romstage, I was happy to see the ramstage booting and the screen light up!

For a while though, I kept it like that. We didn’t need to have cbfsconsole working in both romstage and ramstage. As long as ramstage was booting, those were all the logs I needed to see. Fast forward a couple of weeks, and after the port was done, I came back to this issue. Actually, I simply mentioned the problem that I “still have to fix” on IRC, and Aaron Durbin diagnosed the problem and came up with the solution right away. You see, when you use a CAR_GLOBAL variable and FspTempRamExit is called (which tears down the cache-as-ram), the contents of those global CAR variables are copied to the actual RAM, and it works just fine. However, if your variable itself contains a pointer to another CAR variable, then your data was copied but its content still points to the old cache address, so when you try to access it, you can’t. This is what was happening with cbfsconsole. The SPI handler itself had the bug, because it was storing a pointer to the SPI driver and that pointer was wrong after the CAR was removed. The trick to fix it was simply to replace the calls to “car_get_var_ptr” with “car_sync_var_ptr”, which migrates the data to the appropriate RAM location before returning the CAR pointer. Once I did that, the cbfsconsole was working in all 3 stages at the same time and without issues.
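Here is a toy Python model of that stale-pointer problem. It is a sketch of the idea only and says nothing about how car_sync_var_ptr is actually implemented: the addresses, the window size and the helper names are all made up, and memory is modelled as a dictionary from address to value.

```python
CAR_BASE = 0xFEF00000   # hypothetical cache-as-RAM window
CAR_SIZE = 0x1000       # hypothetical size of that window
RAM_BASE = 0x00100000   # hypothetical destination in real RAM


def migrate(car_memory):
    """Copy every CAR variable to RAM unchanged, as a plain memory copy would."""
    return {addr - CAR_BASE + RAM_BASE: value for addr, value in car_memory.items()}


def sync_ptr(ptr):
    """Translate a pointer into the old CAR window to its new RAM address."""
    if CAR_BASE <= ptr < CAR_BASE + CAR_SIZE:
        return ptr - CAR_BASE + RAM_BASE
    return ptr


driver_addr = CAR_BASE + 0x10
handler_addr = CAR_BASE + 0x20
car = {
    driver_addr: "spi driver state",
    handler_addr: driver_addr,   # the handler stores a pointer to the driver
}

ram = migrate(car)               # CAR is torn down; only `ram` is valid now
stale = ram[handler_addr - CAR_BASE + RAM_BASE]
print(stale in ram)              # False: still points into the old CAR window
print(ram[sync_ptr(stale)])      # the driver state, found at its RAM address
```

The copied handler is intact, but the address stored inside it was copied verbatim, so it has to be translated before it can be followed.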

Porting to FMAP

There was one discussion we had on IRC about cbfsconsole, which was about how it should not be writing to the CBFS in the first place. My argument was that the MRC cache is also written to a CBFS file. The MRC cache is a memory cache: memory discovery, test and initialization take about 10 seconds, but the resulting RAM settings are saved in the cache, which can be provided to the FSP, so that subsequent boots only take 200 ms to initialize the RAM. Aaron thought that was wrong too, and pointed out that the CBFS API doesn’t provide us with any writing mechanism. That is true after all: I used the CBFS API to find the offset and size of the console file in the flash, and used a different API to write to it. Coreboot has a different concept called FMAP (Flash map), which is used to map the different areas of the flash used by coreboot. The FMAP defines the size of the flash, the offset and size of the BIOS region, and in that region, the offset and size of the CBFS. Aaron believes that the console log should be written to a separate FMAP area beside the CBFS rather than inside a CBFS file. The reason is simple: while it is currently possible to write to it, the CBFS might implement a checksum system in the future, which would be broken if we wrote to it directly in the manner I just used (I think it actually already has checksums, I just ignore them).
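The nesting described above can be pictured with a small Python sketch. Everything in it is hypothetical: the offsets and sizes are invented for a 16 MiB flash, and the region names other than CONSOLE are assumptions. It is not the real layout of any board. The point is only that a console area sitting beside the CBFS never overlaps it.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    offset: int   # absolute offset from the start of the flash
    size: int
    children: list = field(default_factory=list)

    def find(self, name):
        """Depth-first lookup of a region by name."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


# Invented layout: a BIOS region containing a console area next to the CBFS.
flash = Region("FLASH", 0x000000, 0x1000000, [
    Region("BIOS", 0x800000, 0x800000, [
        Region("CONSOLE", 0x800000, 0x020000),
        Region("COREBOOT", 0x820000, 0x7E0000),  # assumed name of the CBFS area
    ]),
])

console = flash.find("CONSOLE")
cbfs = flash.find("COREBOOT")
overlap = (console.offset < cbfs.offset + cbfs.size
           and cbfs.offset < console.offset + console.size)
print(hex(console.offset), hex(console.size), overlap)
```

Because the two areas are disjoint, writing log data into the console area cannot invalidate anything the CBFS might checksum.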

After the discussion about the pros and cons of using FMAP instead of CBFS to store the console log, I decided to rename my cbfsconsole driver to ‘flashconsole’, and I implemented support for using an FMAP area. The code is even simpler now, because the FMAP API can directly return the right Read/Write object for us to use and there is no need to work around the limitations of the CBFS API. When the option is enabled, the CONSOLE area is automatically added to the resulting coreboot image, and once you dump your flash, you can extract the FMAP console using this command:

cbfstool rom.bin read -r CONSOLE -f console.log

And that’s it.
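One practical note: since the area starts out filled with 0xff, I would expect the extracted file to contain the log followed by 0xff padding up to the size of the area. Assuming that is the case, a few lines of Python are enough to trim it. This is a sketch, the function name is made up, and the sample bytes below are invented stand-ins for a real dump.

```python
def trim_console_dump(raw: bytes) -> str:
    """Drop the trailing 0xff padding and decode what is left as text."""
    return raw.rstrip(b"\xff").decode("ascii", errors="replace")


# Stand-in for the contents of console.log: two log lines plus erased space.
raw = b"bootblock starting...\nCalling FspTempRamInit\n" + b"\xff" * 32
print(trim_console_dump(raw), end="")
```

On a real dump you would read the bytes with open("console.log", "rb").read() and pass them to the same function.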

Conclusion

The Librem port was a great experience, but it would never have been possible without having access to a debug log. I am happy to have implemented the flashconsole method of debugging, which will be very useful in the future for anyone who wants to port coreboot to a new board, for which they do not have easy access to UART pads. I have sent this feature upstream for review, and it has been merged. You can see it here.

In my next blog post, I will explain how we got the memory init to work and the various issues we ran into. I will also cover the actual steps needed to port a new board to coreboot (GPIO, memory, PCI, ACPI) and what I’ve learned in the past 2 weeks.

Stay tuned!