VC4 toolchain hardware testing

I’ve been wanting to get testing for my VC4 toolchain running on a real Raspberry Pi, in order to bash out bugs and then go on to do some improvements to the compiler. Using the simulator is OK, but it’s not complete and almost certainly not 100% accurate in the stuff it does, and I’ve made some “best guess” changes without really verifying them against hardware. So it could be wrong in all sorts of ways.

I managed to hack Kristina’s open firmware in order to load code (i.e. GCC regression test binaries) over the serial cable, and then run it. This is pretty much all you need for GCC testing (load test, run, observe output — again over serial), but you need to be careful not to let the state of the board deteriorate over time due to the effects of the tests you are running. E.g. a miscompiled test might scribble over the memory of the resident “shell” code that is returned to on completion of each test, or throw an exception (divide by zero, for instance).
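
For the record, the host side of that loop is conceptually simple. Here’s a minimal C sketch of it; the device path, baud rate, end-of-test marker and function name are all placeholder assumptions, not the open firmware’s real protocol:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <termios.h>

/* Stream one test binary down the serial line, then relay the board's
   output until an end-of-test marker appears.  */
static int
run_one_test (const char *tty, const char *binary)
{
  int fd = open (tty, O_RDWR | O_NOCTTY);
  if (fd < 0)
    return -1;

  struct termios t;
  tcgetattr (fd, &t);
  cfmakeraw (&t);
  cfsetspeed (&t, B115200);
  tcsetattr (fd, TCSANOW, &t);

  /* Send the test binary to the resident "shell".  */
  FILE *f = fopen (binary, "rb");
  if (!f)
    {
      close (fd);
      return -1;
    }
  char buf[4096];
  size_t n;
  while ((n = fread (buf, 1, sizeof buf, f)) > 0)
    write (fd, buf, n);
  fclose (f);

  /* Relay output (PASS/FAIL lines etc.) until the made-up marker shows
     up.  This is a naive substring matcher, good enough for a sketch.  */
  const char marker[] = "=== test done ===";
  size_t matched = 0;
  char c;
  while (read (fd, &c, 1) == 1)
    {
      putchar (c);
      matched = (c == marker[matched]) ? matched + 1 : (c == marker[0]);
      if (marker[matched] == '\0')
        break;
    }

  close (fd);
  return 0;
}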

The latter in particular has been causing some trouble: some aspect of the VC4 processor appears to not have been understood properly (by me?). On taking an exception or handling an IRQ, the VC4 switches to exception mode (possibly aka “secure mode”). After we’ve done our exception or interrupt handling, we want to return to the original processor mode (supervisor mode) with an RTI instruction: but, for some reason, that doesn’t work and the processor stays stuck in exception mode. That’s a recipe for disaster: exception mode uses a separate stack pointer, and the sticking behaviour means that subsequently the two stacks trample over each other and the board inevitably crashes.

That still needs to be sorted out, but for today, I’ve come up with a workaround: just reset the board after running each test (no matter if it passes or fails). In a way, that’s kind of better, particularly when we don’t entirely understand the system: assuming the reset works correctly, it means we get a fresh state for each test we run. The method appears stable (though a full test run is rather slow), and results are OK: there are a few differences from a simulator run, which means there’s at least one code generation bug of some description somewhere that the simulator is hiding (something like two wrongs making a right?).

6502 compiler hacking

So I’ve been working again on my port of GCC to the 6502 family of processors, and fixed a bug in a particularly awkward little test case:

int N = 1;
void bar (char a[2][N]) { a[1][0] = N; }

This now compiles to:

bar:
        ; frame size 0, pretend size 0, outgoing size 0
        ldy N$
        lda N$+1
        clc
        adc _r1
        sta _r1
        tya
        sta (_r0),y
        rts
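
In plain C terms, what the compiler has worked out is something like this hand-written equivalent (the function name is made up, and the implicit conversions are spelled out):

extern int N;  /* the N from the test case above */

/* a[1][0] lives N bytes past the start of a, because the row stride
   is the run-time value of N; the value stored is N converted to char.  */
void bar_by_hand (char *base)  /* base is (char *) a */
{
  base[N] = (char) N;
}

which maps directly onto the code above: the high byte of N gets added to the high byte of the pointer, and the low byte of N does double duty as the Y index and (via tya) the value stored.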

This is (a) correct, (b) not entirely obvious (because of the combination of variable-length arrays, implicit casting and so on) and (c) quite nicely efficient. That is all.

Assembler relaxation

It turns out that assembler relaxation (as in, choosing the right size of instruction automatically depending on the magnitude of its operands) wasn’t really working, which meant that the largest size of instruction was being used far more than necessary. This gave code that worked, but which was needlessly bloated.

This was mainly, or at least partly, because I was trying to use relaxation too much: CGEN’s RELAXABLE/RELAXED attributes for insns, and its RELAX attribute for operands, seem to work best when they’re only used for PC-relative operands, but I was trying to use them for absolute operands too. Things behave much better if you don’t.

But, that’s not the whole story: we still want auto-ranging to work for immediate instructions, and for them generally to choose a sensible size of instruction. So if we have:

    mov r0,#0
    mov r0,#100

we want to choose the smallest (16-bit) instruction for the first, but a larger insn size for the second (there’s a 32-bit mov variant with a 16-bit immediate). The insn size can also depend on the size of some particular section of code:

.L2:
    <some code>
    <some code>
    mov r0,#. - .L2

here, we still want to choose the smallest size of insn possible. In yet another case, we don’t know how big the immediate might be until link time:

    mov r0,#SomeSymbol

in this case, we want to use the largest insn size possible, since the address of SomeSymbol likely won’t fit any of the smaller insns.

Anyway: I thought we needed the relaxation machinery to make all that work, but it turns out that we don’t — the missing step is that we just need to be careful to reject immediates that would need relocations on small bitfields, and everything seems to work OK.
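
Concretely, the decision boils down to something like this C sketch (the names, encodings and immediate widths are all made-up placeholders; the real logic lives in the assembler’s operand handling):

enum insn_size { SIZE_16, SIZE_32, SIZE_48 };

/* Pick an encoding for "mov rd,#imm".  The bitfield widths here are
   placeholders; the point is the shape of the decision.  */
enum insn_size
pick_mov_size (long imm, int value_known_at_assembly_time)
{
  /* An unresolved symbol needs a relocation, and we must not put a
     relocation on a small bitfield: use the largest encoding.  */
  if (!value_known_at_assembly_time)
    return SIZE_48;

  if (imm >= 0 && imm < 32)           /* fits the 16-bit insn's immediate */
    return SIZE_16;
  if (imm >= -32768 && imm <= 32767)  /* 32-bit insn, 16-bit immediate */
    return SIZE_32;
  return SIZE_48;                     /* full-width immediate */
}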

One other thing: sometimes it’s useful for the compiler to know exactly which insn it’s generating: for those cases, I’ve added optional .s/.m/.l/.x suffixes (for small/medium/large/extra-large) to several ambiguous instructions so that it can do so. But, those aren’t wired up in the compiler proper yet.
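
So, for example (illustrative only, pairing the suffixes with the mov variants from earlier):

    mov.s r0,#0
    mov.l r0,#SomeSymbol

where the first insists on the smallest encoding and the second on one large enough to carry a relocation.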

Nonzero in C

Everyone’s favourite is-nonzero “operator” in C (evaluating to 0 for an input of zero, or 1 for non-zero) is of course:

!!a

However if you don’t like that, this also works fine:

a && 1
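
Both forms yield an int that is exactly 0 or 1, since && (like !) is defined to produce 0 or 1. A throwaway program to check, for the sceptical:

#include <stdio.h>

int main (void)
{
  int vals[] = { 0, 1, 42, -7 };
  for (int i = 0; i < 4; i++)
    printf ("%4d -> !!: %d, &&1: %d\n", vals[i], !!vals[i], vals[i] && 1);
  return 0;
}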

This amuses me for some reason.

Testing 1, 2, 3…

I borrowed some bits from my 6502 GCC port to wire up the instruction-set simulator to GCC’s testsuite: there may be (probably are!) bugs in all of the simulator, binutils and GCC, but PASSes will mean they are all consistent with each other, which is something at least. I also fixed linking so it maps sections to segments (aka program headers) properly, which was a 1-line fix, but as ever, the hard part was figuring out why it wasn’t working to start with.

In GCC proper, register elimination (replacing offset frame-pointer accesses with offset stack-pointer accesses, etc.) is now working somewhat properly: it had been broken for a long time because of a preprocessor misfeature. It turns out that if you have something like this:

#include <stdio.h>

enum bla {
  FIRST,
  SECOND
};

#define SOMETHING FIRST
#define OTHERTHING SECOND

int main(int argc, char* argv[])
{
  #if SOMETHING == OTHERTHING
  printf ("Boom!\n");
  #else
  printf ("This is what you might expect to execute\n");
  #endif
  return 0;
}

then it compiles without warning (by default, anyway: GCC’s -Wundef option diagnoses undefined identifiers being evaluated in #if directives, but it isn’t enabled by -Wall), yet the “Boom!” line is executed — the undefined identifiers FIRST and SECOND are both replaced with 0, since the preprocessor knows nothing about the enum. Anyway, in the real code, this meant that (via a similar #if condition) eliminations happened to the wrong register, leading to much confusion!
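
One fix among several, as a sketch: do the comparison in C proper rather than in the preprocessor, since the compiler (unlike cpp) can actually see the enum:

#include <stdio.h>

enum bla { FIRST, SECOND };

#define SOMETHING FIRST
#define OTHERTHING SECOND

int main (void)
{
  /* The enum constants are visible to the compiler here, so this
     really is 0 == 1, and the dead branch gets folded away.  */
  if (SOMETHING == OTHERTHING)
    printf ("Boom!\n");
  else
    printf ("This is what you might expect to execute\n");
  return 0;
}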

Oh, and (update) here are the initial results:

                === gcc Summary ===

# of expected passes            46281
# of unexpected failures        9307
# of unexpected successes       1
# of expected failures          94
# of unresolved testcases       7623
# of unsupported tests          1556

Hello world

“Hello world” is compiling and linking, more or less, but (predictably) doesn’t work, at least with the instruction-set simulator. Hmm, seems like debugging this might be hard!

My table is tidy now due to some unscheduled cleaning-up time. It might get replaced with a new desk soon!

Relaaaaaax

This architecture has variable-length instruction encoding, which means that assembler relaxation (basically, choosing the right length of instruction from several overlapping alternatives, to get the one which fits the operands best) has to work, at least in simple-ish cases. I think the previous binutils work implemented this, but my reworking had broken it. But now it’s working again! Mostly, kind of…

There’s the possibility of having linker relaxation too (mostly to choose shorter instructions when possible), but we can get away without that to start with, I think.

Starting to be able to link again

After thoroughly breaking linking (in the name of “cleaning things up”), it’s now starting to work again. BFD seems to have an unfortunate amount of copy/pasted code in it, and the bits which look like target-independent abstractions kind of aren’t, really, at least not all the time. (It’s a bit ironic that the code in the “easy” part of the GNU toolchain — the assembler, linker and so on — is so much worse than the “hard” part, the compiler proper. I suppose because it’s less interesting so people don’t want to work on improving it?)

Also played Fallout 4 a bit.