Diving back into coreboot development

Youness Alaoui

Hardware enablement developer

Hello Purists!

Let me first introduce myself: I’m Youness Alaoui, mostly known as KaKaRoTo, and I’m a Free/Libre Software enthusiast and developer. I’ve been hired by Purism to work on porting coreboot to the Librem laptops, as well as to try and tackle the Intel ME issue afterwards.

I know many of you are very excited about the prospect of having coreboot running on your Librem and finally dropping the proprietary AMI BIOS that came with it. That’s why I’ll be posting reports here about progress I’m making—what I’ve done so far, and what is left to be done.

The first priority is to test and finish the initial port of the Librem 13 v1 that coreboot developer Duncan Laurie started (with one of the four Librem laptops that Purism donated to the coreboot team back then). The second task will be to port our newest Librem 13 “v2” hardware prototype to coreboot. In order to do all that, I have to learn a lot about the BIOS, coreboot internals, and general hardware initialization on modern computers. I started the learning process at the end of November, and I still have lots to learn.

Today’s progress report is quite lengthy, so if you’re not interested in the technical details, here’s the “TL;DR” version:

  • I’m still in the early stages of my learning curve.
  • I’ve just received the hardware, so I can now start experimenting/testing coreboot on it.
  • I want to proceed very carefully and not rush anything, in order to avoid making any irreversible mistakes and to make sure that nobody bricks or somehow damages their laptop when the update to coreboot is made available (the good news is that it will be possible to update the BIOS on your machine from software only).

Read on for the details of my progress since I started.


Initial Research

I have a diversified low-level development & reverse engineering background, but I have never had the chance to learn about the details of the boot process on a common PC before, so my first task was to learn about it.

Overall, I’ve been a bit disappointed in the lack of clear and understandable information on the process that happens at boot. I was hoping to see some sort of “BIOS for newbies” guide somewhere, but unfortunately, it isn’t that easy 😉

If standard “applications” development is like surfing and middleware+kernel development is like snorkelling, the world of coreboot and BIOSes is a bit like diving to the very bottom of the ocean—a vast and mostly uncharted territory, full of highly complex systems that make you wonder about how it all works. And you must be careful not to step on a stingray.

Going deep.

On the coreboot wiki and associated websites, there is a lot of information that merely triggers more questions in your head. For example:

  • The CS:IP value at boot points to 0xFFFFFFF0, which should be near the top of the BIOS, but how does the CPU map that address to the SPI flash ROM? Is the content read and stored in RAM (obviously not, since the RAM is not yet initialized), or is that segment special, automatically triggering hardware-based SPI reads from the flash when accessed? Does this mean that the CPU itself has a hardware controller for the flash? Or is that the super I/O, in which case how does the CPU tell the super I/O to do those reads?
  • What about the actual filesystem in the flash: who parses it? Or is that irrelevant, since the CPU starts at the top of the BIOS image and the filesystem headers are at the bottom? (That turns out not to be true, since the filesystem magic at offset 0x10 is the first thing being read at boot.)
  • What is the checklist of things that the BIOS needs to initialize, and how does the southbridge/northbridge/superIO affect it?
  • Assuming I have a superIO with part number ABCD1234 and coreboot supports ABCD1233, what should I be looking for (in the datasheets, presumably) in order to know what to change to make the new chip supported by coreboot?
  • coreboot also uses “device tree” files which appear to be a major component in porting a new board, and pretty much all the documentation I found on those files, their purpose, and their syntax boils down to, “Read the source code for the device tree compiler to understand how it works”.
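Part of the first question above comes down to address arithmetic. The sketch below assumes the common arrangement in which the chipset decodes the top of the 4 GiB physical address space straight to the flash, so that the last byte of the image sits at 0xFFFFFFFF. The 8 MiB chip size and the `flash_offset` helper are hypothetical, and real chipsets may map only part of the chip this way.

```python
FOUR_GIB = 1 << 32
RESET_VECTOR = 0xFFFFFFF0  # first instruction fetch on 32-bit and later x86 CPUs


def flash_offset(phys_addr: int, flash_size: int) -> int:
    """Translate a physical address into an offset inside the flash image.

    Assumption: the whole image is mapped so that it ends exactly at 4 GiB.
    """
    base = FOUR_GIB - flash_size
    if not base <= phys_addr < FOUR_GIB:
        raise ValueError("address is outside the memory-mapped flash window")
    return phys_addr - base


if __name__ == "__main__":
    size = 8 * 1024 * 1024  # hypothetical 8 MiB flash chip
    # Under this assumption the reset vector lands 16 bytes before the end of the image.
    print(hex(flash_offset(RESET_VECTOR, size)))
```

Under that assumption no RAM is involved: each fetch in the window becomes a read of the flash performed by hardware.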

I also watched a few YouTube videos of coreboot developers giving talks at various conferences, and while they were very interesting and educational, they were mostly meant to promote coreboot, to talk about its history, or to cover the higher-level picture rather than the details. There is one talk about coreboot internals, but it seems to be lower level than what I need right now.

Maybe there is good documentation that answers my questions and I just haven’t been able to find it yet, either because I’m not so good at searching, or because I keep getting lost and distracted whenever I search for something. While I complained earlier about the lack of documentation, I need to clarify: there actually is a lot of documentation. The problem (when you’re not in an area with missing parts) is that it’s easy to get lost and hard to figure out what to read in which order. In various cases the amount and complexity of documentation is simply staggering. Every time I find an interesting document, I start reading it, and it starts referencing things I have never heard of before (there are a lot of acronyms in this world). So I start researching those, and I find 100 documents that explain them, giving me either the short description (useless) or the detailed technical description, which is another few hundred pages of documentation that itself references other things I need to research. Rinse, repeat until exhaustion.

  • One good example is the Motherboard porting guide page, which I would have loved to find early but didn’t stumble on until my third week of research, even though it’s right there on the Documentation page. I didn’t see it because that page first lists a lot of other pages, such as “Creating valid IRQ tables”, which led me to research IRQ tables, PCI devices and how they work, what an IRQ port is, how interrupts work, and so on. Then I saw the link about how to solder a socket to your board, and I read a dozen other articles before scrolling down and seeing the motherboard porting guide link that I needed.
  • Another example is that I found various 1000+ page datasheets from Intel, along with other whitepapers and articles on the boot process, but there’s no way I’m going to be able to read and understand all of that (without forgetting 99% of it in the process). I usually learn by experimentation, so I saved those documents for later reference and kept researching.

As you can see, we’re in a paradoxical situation, sitting between two extremes: in many areas there is a certain lack of documentation (or missing parts, or hard-to-find docs), while in other areas there is an excess of documentation, and I can’t seem to find any docs that cover the middle ground: some kind of comprehensive Introduction to Everything that doesn’t skip over the important bits and doesn’t require you to stray away from the doc.

To summarize: I’ve learned a lot, about the hardware, about the boot process, and about coreboot itself, but I still have a lot of things to learn.

Adventures in Hardware Land

Probing the Librem 13 v2 prototype

I received the Librem 13 v2 prototype hardware fairly quickly, and my first task was to try and dump the BIOS from it. The BIOS that came with it was from InsydeH2O, and when I tried to run flashrom to get it to dump the flash, it failed and gave me an error message saying that flashrom does not support laptops (due to possible EC conflicts) and that I must use the official tool for reading/writing to the BIOS.

Unfortunately, I couldn’t find an InsydeH2O flasher tool that worked on the machine. I did find the (apparently) appropriate tool but it wasn’t working for me for some reason, so I decided to do things a little differently.

First, I identified the SOIC8 chips on the motherboard using a digital microscope and found the one which I assumed was the BIOS flash, then connected it to a Saleae Logic Pro 16 logic analyzer.

I was able to get a nice trace of the first 10 seconds of boot, showing the BIOS being read. Now I had to make sense of that, so I read parts of the chipset’s datasheet and wrote a script that parses a .csv file generated from the logic analyzer, outputs the entire SPI command trace with data, and rebuilds the data into a single buffer (I’ve pushed my code to GitHub if you want it). I was then able to get a nice image of the flash. Only about 43% of the flash was dumped, but it was all that was needed to boot the device, so I assume the data that was not read is simply unused portions of the flash.
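The general idea of rebuilding an image from a captured SPI trace can be sketched roughly as follows. This is not the script mentioned above, only a minimal illustration. It assumes a hypothetical CSV export with `packet_id`, `mosi` and `miso` columns (one row per byte, one packet per chip-select assertion), 24-bit addressing, and the standard SPI NOR read opcodes 0x03 and 0x0B. The function names are made up for the example.

```python
import csv
import io
from itertools import groupby

READ = 0x03       # standard SPI NOR "read data" opcode
FAST_READ = 0x0B  # "fast read": same layout plus one dummy byte after the address


def load_transactions(csv_text):
    """Group exported rows into transactions, one list of (mosi, miso) per packet.

    Assumed (hypothetical) columns: packet_id, mosi, miso, with hex byte values.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    parsed = [(r["packet_id"], int(r["mosi"], 16), int(r["miso"], 16)) for r in rows]
    return [[(mosi, miso) for _, mosi, miso in group]
            for _, group in groupby(parsed, key=lambda row: row[0])]


def rebuild(transactions, flash_size):
    """Replay read commands into a flash-sized buffer; return (image, seen flags)."""
    image = bytearray(b"\xff" * flash_size)  # erased flash reads back as 0xFF
    seen = bytearray(flash_size)             # 1 where a byte was observed on the bus
    for txn in transactions:
        if len(txn) < 4 or txn[0][0] not in (READ, FAST_READ):
            continue  # not a read command, or too short to carry an address
        address = (txn[1][0] << 16) | (txn[2][0] << 8) | txn[3][0]
        start = 5 if txn[0][0] == FAST_READ else 4  # skip opcode, address, dummy byte
        for i, (_, miso) in enumerate(txn[start:]):
            offset = (address + i) % flash_size  # assumption: reads wrap at the end
            image[offset] = miso
            seen[offset] = 1
    return image, seen


def coverage(seen):
    """Percentage of the flash that was actually observed in the trace."""
    return 100.0 * sum(seen) / len(seen)
```

A figure such as the 43% above corresponds to what `coverage` would report: the share of addresses that appeared in at least one read.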

logic_analyzer_read_commands

Unfortunately, although I did not know it yet, the data that was read was corrupted. The likely reason is that the relatively long logic analyzer wires introduced crosstalk and added noise to the signals; with the bus running at 50 MHz, the chances of noise were higher, causing the data to be slightly corrupted. You’ll soon know why this is a problem!

I now (presumably) had a nice dump of the BIOS on hand. Although it was all the data needed to boot the PC, it was still only 42.8% of the actual flash data, so I needed to do a new and more complete dump, but I didn’t know how I could achieve it without powering the chip:

  • If I turn on the laptop, so the flash is powered, then I can’t talk to the chip because the Southbridge is already controlling it and driving the pins high or low.
  • If instead, I keep the laptop off, and just power the chip, then I will be powering the entire 3.3V rail of the laptop, which could half-boot some other chips, and that just didn’t sound like a good thing for me to do. I didn’t want to be injecting 3.3V into the motherboard while all the other power rails are off, as it might fry some components (maybe not, but I don’t want to run that risk). I’ve seen others mention that they did exactly that (provide external 3.3V power to the chip in order to read it while the PC is off) in the coreboot mailing list and old blog posts, but I was too worried to attempt it myself, considering this is pretty much the only v2 hardware available right now.

So the only solution left for me to finish dumping the BIOS was to remove the chip from the motherboard and use an external flasher to dump it.

Preparing for transplants

I already had plans to replace the SOIC8 flash chip with a socket, because I wanted to have an external flasher and the ability to test my BIOS easily while developing coreboot—otherwise, as soon as I’d flash a bad BIOS, I would run the risk of bricking the laptop.

So the first thing I built was an external programmer, which consisted of an FTDI UM232H board on a breadboard, with a SOIC8 socket soldered onto an SMT breakout board from Adafruit. Then I wired everything up and tested it using some spare SOIC8 flash chips I had bought; flashrom was able to read and write the chip without problems.
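A programmer like this can be sanity-checked by reading the chip twice and comparing the two dumps. The sketch below assumes flashrom is installed and that the FT232H is driven through flashrom's `ft2232_spi` programmer with `type=232H`. The post does not say which options were actually used, and the file names and helper functions are hypothetical.

```python
import hashlib
import subprocess
from pathlib import Path


def flashrom_read_command(output, programmer="ft2232_spi:type=232H"):
    """Build the flashrom command line that reads the chip into `output`."""
    return ["flashrom", "-p", programmer, "-r", str(output)]


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def read_twice(first="dump1.bin", second="dump2.bin"):
    """Read the chip twice and report whether both dumps are identical."""
    for output in (first, second):
        # check=True raises CalledProcessError if flashrom exits with an error
        subprocess.run(flashrom_read_command(output), check=True)
    return sha256_of(first) == sha256_of(second)
```

Two identical dumps do not prove the data is right, but two different ones are a clear sign of a wiring or signal problem.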

Once that was done, I took every precaution possible before attempting to replace the flash chip on the motherboard itself. After all, I’m not purely a hardware guy—you could say I’m a software guy who plays around with hardware. Thankfully, in the past few years, I’ve learned some soldering skills, but looking at that incredible motherboard was scary. I mean, look at this miniature piece of art, this is what I had to de/re-solder:

2016-12-16-00-20-19

I know the popular method for desoldering a multi-pin chip is to use a heat gun, but I couldn’t see how I could put any kind of broad heat on that area without all these incredibly small resistors being melted and blown away by the hot air. Those resistors looked incredibly small and fragile, and I didn’t know if there were some components on the other side of the PCB that would just fall off the board due to gravity, so the heat gun was not an option.

I started looking for alternative methods to desolder such components and found a video on how to use a copper wire to lift the pins of the chip while desoldering. I planned to do that, so I first soldered one of the spare SOIC8 flash chips I had bought onto the SOIC8 breakout board, then tried to desolder it, but it came off with just the solder wick, so I didn’t even get to try this new method. I tried the solder wick on the flash chip on the motherboard as well, but that didn’t work, so I tried the copper wire method… and it didn’t work either. But while trying to do that, I noticed the chip moved easily, so I decided to just flood both sides with solder and remove it that way. It only took a couple of seconds to get it removed! And thus it went very easily, and it was clean, and it was great, and there was much rejoicing!

No notable damage to surrounding areas, besides the plastic vent grid that had to be damaged a bit by the soldering iron (there was no way around it, and it was needed for the socket to fit anyway) and some slight damage to the ground pin from the motherboard—thankfully, the ground pad is still usable since it remained attached to the motherboard even though the pad itself lifted off the board.

Strange things happen

As you know, by that time I already had a working “external programmer” setup, so I removed the solder from the pins of the original flash chip, put it into the socket on the breadboard, and tried to dump it… but flashrom couldn’t find the chip. No problem; I put back the flash chip that was there before (with which I had just tested the working programmer 5 minutes earlier)… and it wasn’t working either! Odd. I assumed some wire had come loose, so I checked everything twice, but nothing was wrong, so I tried both chips again and both were unreadable. I grabbed a brand new chip from the tube I got, tested it, and it worked. So, for some reason, both the original flash chip from the laptop and another flash chip that was on the breadboard (away from the laptop) decided to become completely unresponsive within 5 minutes of both working. “Hmmm.”

The only explanation I see for this would be some kind of ESD (electro-static discharge) that damaged the chip, but this wouldn’t make sense either because, first of all, I was wearing an anti-static wrist strap (for the first time in my life, because even though I never damaged anything with ESD before, I didn’t want to push my luck with this one-of-a-kind prototype laptop) and, secondly, if it was an ESD it should have damaged one chip—not two—considering that I wasn’t touching both at the same time. I had also disconnected the battery from the motherboard before attempting any work, so I don’t think I fried the chip by causing a short circuit with solder while desoldering (and that still wouldn’t explain the second chip from the breadboard also being damaged). I guess this is one of the big unresolved mysteries of life: why did the chicken cross the road, and why did those chips self-destruct?

Oh well! Not a big deal, right? I grabbed a new chip, flashed it with the dump I had made using the logic analyzer, and put it into the laptop… Except it didn’t boot.

facepalm

And that’s why I spoke earlier about the data getting corrupted. I eventually changed my code to check whether the same addresses were ever read more than once, and to compare the values between the reads when that happened. This confirmed that there was indeed corruption, since there were a few ranges where the data was read more than once and different reads returned different results.
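The consistency check described here can be sketched as follows. This is a minimal illustration rather than the code that was actually used. It assumes the trace has already been decoded into hypothetical `(start_address, data_bytes)` pairs, one per read command.

```python
def find_conflicts(reads):
    """Return {address: sorted distinct values} for addresses whose reads disagree.

    `reads` is an iterable of (start_address, data_bytes) pairs, one per read command.
    """
    observed = {}
    for start, data in reads:
        for i, value in enumerate(data):
            # Remember every distinct value ever seen at this address.
            observed.setdefault(start + i, set()).add(value)
    return {addr: sorted(vals) for addr, vals in observed.items() if len(vals) > 1}


if __name__ == "__main__":
    # Two overlapping reads that disagree about the byte at address 0x12.
    reads = [(0x10, b"\x01\x02\x03"), (0x11, b"\x02\x07")]
    for addr, values in find_conflicts(reads).items():
        print(hex(addr), [hex(v) for v in values])
```

An empty result only shows that the overlapping reads agree. Addresses that were read once cannot be checked this way.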

I know the socket on the Librem 13 v2 hardware is properly soldered though: if I put in a blank flash, the laptop shuts down immediately after turning on, but if I put in the corrupted flash, it stays on and the keyboard backlight is on, but the screen never turns on and the laptop doesn’t boot.

So, one week in, and I already “broke” my v2 hardware. Thankfully, it’s nothing major, at least nothing like what I was imagining (lots of resistors/components desoldered and loose, scratching the board with the iron, dropping a huge bead of solder somewhere on the motherboard, or shorting some pins that would make the motherboard throw lots of smoke on first boot, etc.). I just need to get a copy of the original BIOS, or maybe try to experiment with coreboot and gradually fix things until I get things working.

Getting the Librem 13 v1 hardware

Through all this, I had not yet received a Librem 13 v1 laptop for testing the existing coreboot port. Remember my statement at the start of this post that the first priority is to get the existing coreboot port to work on the v1 hardware? Well, to do that, I first needed to get my hands on a v1 Librem 13. Unfortunately, Purism had already sold all of the v1 laptops from inventory (with no plans to order new ones, since production has moved to the v2 hardware). Thankfully, after a long wait (3 weeks!) and numerous internal emails, we finally managed to get hold of one such unit. We had originally planned some complicated swapping (asking James to forfeit his Librem 13 and Jeff to send him his Librem 15 in exchange, which meant backups/transfers and complex shipping timing to minimize downtime), until we got a surprise email from Nicole, who offered to give up her Librem 13 unit, for Science! Nicole said, “The prospect of some progress on the Coreboot front is worth almost any effort so I gladly contribute what I can.”

Wrapping up

As you can see, we’re at the beginning of this new effort, and the Librem 13 v1 is another story that I’ll be covering in the next blog post(s). It’ll be interesting as well, so stay tuned for more news!