Scalar insns & disassembly

I’ve been slowly filling in support for scalar instructions (all done now, I think, apart from some rough edges), and have started looking at why disassembly with objdump wasn’t working properly (so far I’ve been testing the new assembler against the old disassembler). I partly fixed it by supplying a custom hash function, which CGEN-generated code uses to look up instructions based on the value of the first word of each instruction. The other part needed is a corresponding change to CGEN’s half-implemented support for multi-word opcodes, and that is still unfinished.
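To make the hash idea concrete, here is a minimal sketch of a lookup hash keyed on the first instruction word. Everything in it is an assumption for illustration: the 16-bit word size, the table size `DIS_HASH_SIZE`, the name `dis_hash`, and the choice of the top five bits are not taken from the real port or from CGEN’s generated code.

```c
#include <stdint.h>

/* Hypothetical number of buckets in the disassembler's lookup table. */
#define DIS_HASH_SIZE 32u

/* Hypothetical hash: pick the top five bits of the first 16-bit word,
   assuming (for illustration only) that those bits are enough to narrow
   the search down to a handful of candidate instructions. */
static unsigned int dis_hash(uint16_t first_word)
{
    return ((unsigned int)first_word >> 11) & (DIS_HASH_SIZE - 1u);
}
```

The point is only that the hash has to depend on bits that are actually fixed for every instruction that lands in a given bucket; if it looks at operand bits, lookups will miss.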

The vector instructions will be fun — the syntax is a lot less uniform (less “assembly-like”) than CGEN supports easily, so it might need a bit of hacking. Not started that yet.

Migrating CVS to GIT

It seems like I might need to put together a couple of patches for CGEN, which happens to still be kept in CVS, and dealing with that isn’t something I’m likely to want to do for fun. Various read-only GIT mirrors of the CGEN repository exist, but none seem to be up-to-date right at the moment. So I thought I’d try doing one myself.

After some experimentation, I think I have a successful & up-to-date import of the relevant bits of the Sourceware tree:

SRC_SUPPORT="cgen cpu COPYING COPYING3 COPYING.LIB \
             COPYING3.LIB COPYING.NEWLIB COPYING.LIBGLOSS \
             ChangeLog MAINTAINERS Makefile.def Makefile.in \
             Makefile.tpl README README-maintainer-mode compile \
             config config-ml.in config.guess config.if \
             config.rpath config.sub configure configure.ac \
             configure.in contrib depcomp etc gettext.m4 \
             install-sh lt~obsolete.m4 ltgcc.m4 ltsugar.m4 \
             ltversion.m4 ltoptions.m4 libtool.m4 ltcf-c.sh \
             ltcf-cxx.sh ltcf-gcj.sh ltconfig ltmain.sh \
             makefile.vms missing mkdep mkinstalldirs \
             move-if-change setup.com src-release symlink-tree \
             ylwrap"

crap-clone -z9 :pserver:anoncvs@sourceware.org:/cvs/src src \
  $(for x in $SRC_SUPPORT; do echo "-d $x"; done)

This seems to work OK (well enough, at least). All those “-d” options are needed because this repository makes heavy use of CVS modules: those files/directories are a flattened-out version of the cgen module (deduced from “cvs [...] co -c”). Apparently crap will do an incremental update too if necessary, not that CGEN gets much love these days.

CGEN bug/missing feature

Some earlier trouble with instructions getting mis-assembled has returned. It looks like a feature that CGEN needs for variable-length instruction encodings simply isn’t implemented, and the generated code just emits the wrong result instead (as if all instruction fields were in the first word). There are various hints in the CGEN source about this, alluding to its half-finished nature. Maybe it can be patched up.
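A small sketch of what is going wrong, under assumed details: instructions are treated as arrays of 16-bit words, and a field is described by which word it lives in plus a bit position and length. The function name `extract_field` and this field description are hypothetical, not CGEN’s own representation.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `length` bits starting at bit `lsb` (0 = least significant)
   from the 16-bit word at index `word_index` of an instruction.
   Hypothetical helper, not part of CGEN. */
static unsigned int extract_field(const uint16_t *insn_words,
                                  unsigned int word_index,
                                  unsigned int lsb,
                                  unsigned int length)
{
    assert(length >= 1 && lsb + length <= 16);
    uint16_t word = insn_words[word_index];
    return ((unsigned int)word >> lsb) & ((1u << length) - 1u);
}
```

For a two-word instruction `{0x1234, 0xABCD}`, a field in the low byte of the second word is `extract_field(insn, 1, 0, 8)`. Using word index 0 for that same field picks up bits of the first word instead, which is the kind of wrong result described above.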

Branches & relocations

Branches aren’t getting assembled correctly, and I’m not sure why! Lots of stuff in binutils — particularly the “bfd” bit — seems to be implemented per-target by mostly copy-pasting, even though each CPU is essentially pretty similar in principle in terms of how label addresses (and so forth) are patched into the instruction stream, and parts of the code look like they should support such “relocations” generically. (Edit: my CGEN pc-relative fields needed the PCREL-ADDR attribute set…)
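For reference, this is roughly what a pc-relative fix-up boils down to once the label address is known. It is a generic sketch, not the BFD or CGEN code path: the 16-bit branch format, the signed 8-bit byte displacement in the low bits, and the name `patch_pcrel8` are all made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical branch layout: opcode in the high byte, signed 8-bit
   displacement in the low byte, counted in bytes from the address of
   the branch instruction itself. */
static uint16_t patch_pcrel8(uint16_t insn, uint32_t insn_addr,
                             uint32_t target_addr)
{
    int32_t disp = (int32_t)(target_addr - insn_addr);
    assert(disp >= -128 && disp <= 127);   /* must fit in the field */
    return (uint16_t)((insn & 0xFF00u) | ((uint32_t)disp & 0xFFu));
}
```

If the field is not marked as pc-relative, the natural mistake is to store the absolute target address in the field rather than `target - pc`, which would give plausible-looking but wrong branch encodings.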

More CGEN stuff

Learning more about CGEN and slowly increasing coverage of the instruction set. Quite a lot of simple instructions are working now, but some stuff (e.g. branches, relocations) is still missing and/or not wired up properly yet.

Simulator & CGEN

After reshuffling the simulator into something that’ll hopefully work for GCC regression testing, and implementing some stub system calls in libgloss, things are still not working. I hit some rough edges in the binutils port that mean bits of the libgloss stuff won’t even compile.

The binutils bits (assembler, linker and disassembler, mostly) are based on somewhat ad-hoc parsing of an architecture description file. However, binutils already has its own system for automating the repetitive bits of implementing an assembler and disassembler (and instruction-set simulator): CGEN, written in Scheme (Guile). I thought it might be fun to try migrating the binutils implementation to that, in the hope that the “rough patches” can then be handled by already-existing generic code. It’s not yet clear if that’s a good idea.

Emulation

Once Newlib was built, the stage 2 compiler also built without too much more trouble, giving a “working” compiler executable. But nothing will link, because the runtime initialisation code is missing (usually called “crt0”, and providing a function “_start”, written in assembly language, that is called immediately on program load, sets up the C runtime environment and then calls “main”). I added very simple/dummy versions to libgloss, which is supplied as part of Newlib and contains such code for a variety of other processors/systems. That means we can almost link a “hello world”-type program, although some more stub functions are needed before that’ll work!

But before doing that, it’s important that we have an emulator or instruction-set simulator to run stuff on without getting too grubby with the real hardware just yet. So I looked around at what people had written, and found something that looks like it’ll almost do the trick, though it’s currently hard-wired for performing “in-circuit emulation” reverse-engineering. The requirements for testing GCC are fairly minimal: it just needs to emulate the CPU itself and a single (possibly fake) peripheral to provide character-stream output, so adapting it for that shouldn’t be too hard.

Handily the real hardware provides a simple memory-mapped UART device: using that for test output seems like a good idea, since then running the same code on hardware later on becomes a possibility. So, I’m adding basic support for the UART to the emulator.
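The emulator side of that can be very small. Here is a sketch of a byte-store handler that treats one address as the UART transmit register. The address `UART_TX_ADDR`, the `struct emu` layout and the function name `emu_store8` are placeholders invented for the example, not the real hardware’s register map or the existing emulator’s API. Output is collected in a buffer here so it is easy to check; a real emulator would more likely just write the byte to stdout.

```c
#include <stddef.h>
#include <stdint.h>

#define UART_TX_ADDR  0x10000000u  /* hypothetical transmit register */
#define UART_LOG_SIZE 256u

struct emu {
    uint8_t ram[1024];             /* tiny RAM, for illustration only */
    char    uart_log[UART_LOG_SIZE];
    size_t  uart_len;
};

/* Handle a byte store from the emulated CPU. Stores to the UART
   transmit address are captured as output characters; stores inside
   RAM are applied normally; anything else is silently ignored. */
static void emu_store8(struct emu *e, uint32_t addr, uint8_t value)
{
    if (addr == UART_TX_ADDR) {
        if (e->uart_len + 1 < UART_LOG_SIZE) {
            e->uart_log[e->uart_len++] = (char)value;
            e->uart_log[e->uart_len] = '\0';   /* keep it a C string */
        }
    } else if (addr < sizeof e->ram) {
        e->ram[addr] = value;
    }
}
```

With something like this in place, a test program’s output routine only needs to store each character to the UART address, and the same binary should behave the same way on hardware that has a UART at that location.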