Category Archives: GCC

VC4 toolchain hardware testing

I’ve been wanting to get testing for my VC4 toolchain running on a real Raspberry Pi, in order to bash out bugs and then go on to do some improvements to the compiler. Using the simulator is OK, but it’s not complete and almost certainly not 100% accurate in the stuff it does, and I’ve made some “best guess” changes without really verifying them against hardware. So it could be wrong in all sorts of ways.

I managed to hack Kristina’s open firmware in order to load code (i.e. GCC regression test binaries) over the serial cable, and then run it. This is pretty much all you need for GCC testing (load test, run, observe output — again over serial), but you need to be careful not to let the state of the board deteriorate over time due to the effects of the tests you are running. E.g. they might miscompile in such a way that they scribble over the memory of the resident “shell” code that is returned to on completion of each test, or throw exceptions (divide by zero, for instance).

The latter in particular has been causing some trouble: some aspect of the VC4 processor appears to not have been understood properly (by me?). On taking an exception or handling an IRQ, the VC4 switches to exception mode (possibly aka “secure mode”). After we’ve done our exception or interrupt handling, we want to return to the original processor mode (supervisor mode) with an RTI instruction: but, for some reason, that doesn’t work and the processor stays stuck in exception mode. That’s a recipe for disaster: exception mode uses a separate stack pointer, and the sticking behaviour means that subsequently the two stacks trample over each other and the board inevitably crashes.

That still needs to be sorted out, but for today, I’ve come up with a workaround: just reset the board after running each test (no matter if it passes or fails). In a way, that’s kind of better, particularly when we don’t entirely understand the system: assuming the reset works correctly, it means we get a fresh state for each test we run. The method appears stable (though a full test run is rather slow), and results are OK: there are a few differences from a simulator run, which means there’s at least one code generation bug of some description somewhere that the simulator is hiding (something like two wrongs making a right?).

6502 compiler hacking

So I’ve been working again on my port of GCC to the 6502 family of processors, and fixed a bug in a particularly awkward little test case:

int N = 1;
void bar (char a[2][N]) { a[1][0] = N; }

This now compiles to:

        ; frame size 0, pretend size 0, outgoing size 0
        ldy N$
        lda N$+1
        adc _r1
        sta _r1
        sta (_r0),y

This is (a) correct, (b) not entirely obvious (because of the combination of variable-length arrays, implicit casting and so on) and (c) quite nicely efficient. That is all.

Assembler relaxation

It turns out that assembler relaxation (as in, choosing the right size of instruction automatically depending on the magnitude of its operands) wasn’t really working, which meant that the largest size of instruction was being used far more than necessary. This gave code that worked, but which was needlessly bloated.

This was mainly, or at least partly, because I was trying to use relaxation too much: CGEN’s RELAXABLE/RELAXED attributes for insns, or RELAX attribute for operands, seem to work best when they’re only used for PC-relative operands, but I was trying to use them for absolute operands too. It works a bit better when you don’t try to do the latter.

But, that’s not the whole story: we still want auto-ranging to work for immediate instructions, and for them generally to choose a sensible size of instruction. So if we have:

    mov r0,#0
    mov r0,#100

we want to choose the smallest (16-bit) instruction for the first, but a larger insn size for the second (there’s a 32-bit mov variant with a 16-bit immediate). The insn size can also depend on the size of some particular section of code:

    <some code>
    <some code>
    mov r0,#. - .L2

here, we still want to choose the smallest size of insn possible. In yet another case, we don’t know how big the immediate might be until link time:

    mov r0,#SomeSymbol

in this case, we want to use the largest insn size possible, since the address of SomeSymbol likely won’t fit any of the smaller insns.

Anyway: I thought we needed the relaxation machinery to make all that work, but it turns out that we don’t — the missing step is that we just need to be careful to reject immediates that would need relocations on small bitfields, and everything seems to work OK.
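As a rough illustration of that size-selection logic, here is a C sketch. The function name, immediate field widths and size boundaries are all made up for the example; they are not the real VC4 encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical instruction sizes for a variable-length encoding. */
typedef enum { INSN_16, INSN_32, INSN_48 } insn_size;

/* Pick the smallest mov-immediate encoding that fits.  Assume (purely
   for illustration) a 5-bit unsigned immediate in the 16-bit form and
   a 16-bit signed immediate in the 32-bit form. */
static insn_size
mov_imm_size (int32_t imm, bool known_at_assembly_time)
{
  /* A value only known at link time (e.g. #SomeSymbol) must take the
     largest encoding: we reject relocations against small bitfields. */
  if (!known_at_assembly_time)
    return INSN_48;
  if (imm >= 0 && imm < 32)
    return INSN_16;   /* e.g. mov r0,#0 */
  if (imm >= -32768 && imm < 32768)
    return INSN_32;   /* e.g. mov r0,#100 */
  return INSN_48;
}
```

The key point from the post is the first test: rather than letting the relaxation machinery grow an instruction later, an operand whose value needs a relocation is simply never allowed into the small forms in the first place.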

One other thing: sometimes it’s useful for the compiler to know exactly which insn it’s generating: for those cases, I’ve added optional .s/.m/.l/.x suffixes (for small/medium/large/extra-large) to several ambiguous instructions so that it can do so. But, those aren’t wired up in the compiler proper yet.

Testing 1, 2, 3…

I borrowed some bits from my 6502 GCC port to wire up the instruction-set simulator to GCC’s testsuite: there may be (probably are!) bugs in all of the simulator, binutils and GCC, but PASSes will mean they are all consistent with each other, which is something at least. I also fixed linking so it maps sections to segments (aka program headers) properly, which was a 1-line fix, but as ever, the hard part was figuring out why it wasn’t working to start with.

In GCC proper, register elimination (replacing offset frame pointer accesses with offset stack pointer accesses, etc.) is now working somewhat properly: it was broken for a long time because of a preprocessor misfeature. It turns out that if you have something like this:

#include <stdio.h>

enum bla {
  FIRST,
  SECOND
};

int main(int argc, char* argv[])
{
#if FIRST == SECOND
  printf ("Boom!\n");
#else
  printf ("This is what you might expect to execute\n");
#endif
  return 0;
}
Then it compiles without warning, but the “Boom!” line is executed: the undefined identifiers FIRST and SECOND are both replaced with 0 in the #if, since the preprocessor knows nothing about the enum. (GCC’s -Wundef option warns about exactly this.) Anyway, in the real code, this meant that (via a similar #if condition) eliminations happened to the wrong register, leading to much confusion!

Oh, and (update) here are the initial results:

                === gcc Summary ===

# of expected passes            46281
# of unexpected failures        9307
# of unexpected successes       1
# of expected failures          94
# of unresolved testcases       7623
# of unsupported tests          1556


This architecture has variable-length instruction encoding, which means that assembler relaxation (basically, choosing the right length of instruction to use from several overlapping alternatives to get the one which fits the operands best) has to work, at least in simple-ish cases. The previous binutils work implemented this, I think, but my reworking had broken it. But now it’s working again! Mostly, kind of…

There’s the possibility of having linker relaxation too (mostly to choose shorter instructions when possible), but we can get away without that to start with, I think.

Starting to be able to link again

After thoroughly breaking linking (in the name of “cleaning things up”), it’s now starting to work again. BFD seems to have an unfortunate amount of copy/pasted code in it, and the bits which look like target-independent abstractions kind of aren’t, really, at least not all the time. (It’s a bit ironic that the code in the “easy” part of the GNU toolchain — the assembler, linker and so on — is so much worse than the “hard” part, the compiler proper. I suppose because it’s less interesting so people don’t want to work on improving it?)

Also played Fallout 4 a bit.

Scalar insns & disassembly

I’ve been slowly filling in the support for scalar instructions (all done now I think, apart from some rough edges), and started looking at why disassembling with objdump wasn’t working right (I’ve been testing the new assembler with the old disassembler, so far). I partly fixed it by using a custom hash function (that CGEN-generated code uses to look up instructions, based on the value of the first word of each instruction). The other needed part is a corresponding change to the half-implemented feature to allow CGEN to handle multi-word opcodes: that’s still half-finished.
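The idea behind that hash, roughly, is to index the opcode table by some bits of the first instruction word. A made-up sketch follows; the field position and table size are illustrative, not what CGEN actually generates.

```c
#include <assert.h>
#include <stdint.h>

#define DIS_HASH_BITS 7
#define DIS_HASH_SIZE (1 << DIS_HASH_BITS)

/* Hash the first 16-bit word of an instruction down to an opcode-table
   bucket.  For a variable-length encoding, the top bits of the first
   word typically select the instruction format, so hashing on them
   spreads the opcodes reasonably evenly across the buckets. */
static unsigned
dis_hash_insn (uint16_t first_word)
{
  return (first_word >> (16 - DIS_HASH_BITS)) & (DIS_HASH_SIZE - 1);
}
```

The disassembler then only has to try the opcodes in one bucket against each instruction word, instead of scanning the whole table.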

The vector instructions will be fun — the syntax is a lot less uniform (less “assembly-like”) than CGEN supports easily, so it might need a bit of hacking. Not started that yet.

Migrating CVS to GIT

It seems like I might need to put together a couple of patches for CGEN, which happens to still be kept in CVS, and dealing with that isn’t something I’m likely to want to do for fun. Various read-only GIT mirrors of the CGEN repository exist, but none seem to be up-to-date right at the moment. So I thought I’d try doing one myself.

After some experimentation, I think I have a successful & up-to-date import of the relevant bits of the Sourceware tree:

SRC_SUPPORT="ChangeLog MAINTAINERS Makefile.def \
             Makefile.tpl README README-maintainer-mode compile \
             config config.guess config.if \
             config.rpath config.sub configure \
             contrib depcomp etc gettext.m4 \
             install-sh lt~obsolete.m4 ltgcc.m4 ltsugar.m4 \
             ltversion.m4 ltoptions.m4 libtool.m4 \
             ltconfig \
             makefile.vms missing mkdep mkinstalldirs \
             move-if-change src-release symlink-tree"

crap-clone -z9 src \
  $(for x in $SRC_SUPPORT; do echo "-d $x"; done)

Which seems to work OK (well enough at least). All those “-d” options are because this repository heavily uses CVS modules: those files/directories are a flattened-out version of the cgen module (deduced from “cvs [...] co -c”). Apparently crap will do an incremental update too if necessary, not that CGEN gets much love these days.