HWPatcher
---------

1) Introduction
---------------

This tool is intended to be a scriptable, general-purpose file patcher.
The underlying principle is simple: the tool provides lua-callable functions
to load/save/modify binary objects (raw binary, elf, sb, edoc; other formats are
easily added). On top of that, a lua library provides useful functions to
manipulate addresses and to parse/generate ARM/Thumb code. This is very useful
to patch jump addresses or generate small stubs when inserting some code into
a firmware, for example.

2) Builtin library
------------------

The major concept of this tool is that of firmware. Conceptually, a firmware is
made up of "sections" and each section can be read from or written to. The
notion of section is specific to each format:
- binary: there is only one section called ""
- elf: these are usual elf sections
- sb: sections are named "sec" to refer to a complete section block, or "sec.idx"
      to refer to a particular section and a particular subsection. A subsection
      is defined as a list of instructions ending with a call or a jump. Indices
      start at 0.
- edoc: there is only one section called ""

A section is usually made up of a contiguous set of addresses but this is not
mandatory. When reading from a section, all addresses must be mapped for the
read to be successful. Writing to a section is guaranteed to succeed if all
addresses are mapped, but some formats may support on-the-fly section creation
or extension.

Since a file can contain several sections, the notion of address has to be
extended. For this reason, an "address" (given to the read/write code) is a table
containing two fields:
- address: contains the numeric address within the section
- section: optional section name
If no section is specified, the code will determine whether the address matches
a unique section. If not, an error will be issued; otherwise the unique matching
location will be used.

NOTE: see 3) on how to create addresses properly
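
The lookup rule for addresses without a section can be sketched in plain Lua.
This is only an illustration, not the tool's actual code: the section map, the
section names and the resolve() helper are hypothetical, and only the layout of
the address table (address, optional section) comes from the description above.

```lua
-- Hypothetical section map: name -> { addr = first address, size = byte count }
local sections = {
    [".text"] = { addr = 0x0000, size = 0x1000 },
    [".data"] = { addr = 0x8000, size = 0x0400 },
}

-- Return the name of the unique section containing addr.address, honouring
-- addr.section when it is present. Raise an error when no section matches or
-- when more than one does.
local function resolve(sections, addr)
    local match = nil
    for name, info in pairs(sections) do
        local wanted = addr.section == nil or addr.section == name
        local inside = addr.address >= info.addr
            and addr.address < info.addr + info.size
        if wanted and inside then
            if match ~= nil then
                error("ambiguous address: a section must be specified")
            end
            match = name
        end
    end
    if match == nil then
        error("address is not mapped")
    end
    return match
end

print(resolve(sections, { address = 0x8010 }))                    --> .data
print(resolve(sections, { address = 0x10, section = ".text" }))   --> .text
```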

The following functions are exported to the lua code:
- load_file(filename)       Load a firmware and guess type, return firmware
- load_elf_file(filename)   Load a firmware as ELF, return firmware
- load_sb_file(filename)    Load a firmware as SB, return firmware
- load_sb1_file(filename)   Load a firmware as SB1, return firmware
- load_bin_file(filename)   Load a firmware as binary, return firmware
- save_file(obj, filename)  Save a firmware to a file
- read(obj, addr, len)      Read data (array of bytes) from a firmware
- write(obj, addr, data)    Write data (array of bytes) to a firmware
- section_info(obj, sec)    Return information about a section in a table (or nil)

Section information is provided in a table with the following fields:
- addr: first address in the section
- size: number of bytes after the starting address

The builtin library also provides the following miscellaneous functions:
- md5sum(filename)          Compute the MD5 sum (array of bytes) of a file

TODO/MISSING:
- there is no way to get the list of sections
- currently the code won't extend sections on write
- section information is only implemented for raw binary
- section information needs to be extended to a list of [addr,size] and maybe
  make a difference between data and bss
- add support for jump/call instructions in SB?

3) The 'lib' library
--------------------

The builtin interface is rather crude to use so some extra functions are provided
in the 'lib' library which can be imported using:

require('lib')

It contains helper functions to create addresses:
- hwp.make_addr(addr, section)  Create a new address
- hwp.inc_addr(addr, amount)    Create a new address from another with offset

NOTE: you should always use hwp.make_addr() to create an address: it provides
safety checks and a default stringify method that displays the address nicely
as a string.

There are also some convenience functions to read/write integers to firmwares:
- hwp.read32(obj, addr)      Read a 32-bit integer
- hwp.write32(obj, addr, v)  Write a 32-bit integer
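
As an illustration, patching a single 32-bit word could look like the sketch
below. It assumes that the builtin functions of section 2 and the helpers above
are reachable through one table, passed in here as the parameter hwp; this is
not stated in this document. The function name patch_word and the file names in
the usage comment are made up for the example.

```lua
-- Replace the 32-bit word at `offset` (optionally within `section`) and save
-- the patched firmware. Returns the previous value of the word.
-- ASSUMPTION: `hwp` exposes load_file/save_file as well as make_addr,
-- read32 and write32.
local function patch_word(hwp, infile, outfile, section, offset, value)
    local fw = hwp.load_file(infile)
    local addr = hwp.make_addr(offset, section)
    local old = hwp.read32(fw, addr)
    hwp.write32(fw, addr, value)
    hwp.save_file(fw, outfile)
    return old
end

-- Possible use inside a script run by the tool (hypothetical file names):
--   require('lib')
--   local old = patch_word(hwp, "firmware.bin", "patched.bin", nil, 0x100, 0)
```

Passing the table as a parameter keeps the sketch independent of how the tool
actually exposes its functions.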

It also provides the following miscellaneous functions:
- hwp.md5str(md5)      Convert an MD5 sum into a readable string
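
For reference, turning an array of bytes into a readable string can be done as
in the following plain Lua sketch. bytes_to_hex is a hypothetical stand-in, and
the lowercase hexadecimal format is an assumption about what hwp.md5str()
returns.

```lua
-- Convert an array of byte values (0..255) into a hexadecimal string.
local function bytes_to_hex(bytes)
    local parts = {}
    for i, b in ipairs(bytes) do
        parts[i] = string.format("%02x", b)
    end
    return table.concat(parts)
end

print(bytes_to_hex({ 0x00, 0xab, 0x10 }))   --> 00ab10
```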

4) The 'arm' library
--------------------

In order to help patching ARM/Thumb code, the 'arm' library introduces useful
functions to parse and generate instructions. It can be imported using:

require('arm')

WARNING: although the arm library has been tested, it may generate wrong code,
so do not trust it blindly.

First, it is important to understand that the library differentiates ARM and
Thumb code by using the least significant bit of the addresses. In other words,
an address is Thumb if it's odd and ARM if it's even. An ARM address which is not
word-aligned is invalid.

The following functions provide wrappers around this concept:
- arm.is_thumb(addr)     Return true if the address is Thumb
- arm.xlate_addr(addr)   Translate an ARM/Thumb address (*)
- arm.to_thumb(addr)     Take any address and make it Thumb
- arm.to_arm(addr)       Take any address and make it ARM

(*) Translating means: for an ARM address, return the address, and for a Thumb
    address, return the address with the least significant bit cleared. In other
    words, this is the actual address of the instruction, which is always
    half-word aligned.
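
The least significant bit convention can be illustrated on plain numbers. The
helpers below are a sketch written for this document, not the library's code:
the real functions operate on addresses as created by hwp.make_addr(), and how
they react to a misaligned ARM address is an assumption.

```lua
-- An address is Thumb when its least significant bit is set.
local function is_thumb(a) return a % 2 == 1 end

-- Force the least significant bit to 1 (Thumb) or 0 (ARM).
local function to_thumb(a) return a - a % 2 + 1 end
local function to_arm(a) return a - a % 2 end

-- Return the actual address of the instruction.
local function xlate_addr(a)
    if is_thumb(a) then
        return a - 1                       -- clear the Thumb bit
    end
    if a % 4 ~= 0 then
        error("ARM address is not word-aligned")
    end
    return a
end

print(is_thumb(0x8001))                            --> true
print(string.format("0x%x", xlate_addr(0x8003)))   --> 0x8002
print(string.format("0x%x", to_thumb(0x8000)))     --> 0x8001
```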

It also contains a few useful integer manipulation routines to go from unsigned
integers to signed, either 32-bit or n-bit:
- arm.sign32(v)              Convert a 32-bit unsigned integer to a signed one
- arm.sign_extend(val, n)    Convert an n-bit unsigned integer to a signed one
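
The conversion itself is simple arithmetic, as the following plain Lua sketch
shows. The local functions only mirror the names above for readability; they
are not the library's implementation, and they assume that val lies in the
range 0 .. 2^n - 1.

```lua
-- Interpret the n-bit unsigned value `val` as a two's complement number.
local function sign_extend(val, n)
    if val >= 2 ^ (n - 1) then      -- top bit set: the value is negative
        return val - 2 ^ n
    end
    return val
end

local function sign32(v)
    return sign_extend(v, 32)
end

assert(sign_extend(0x7F, 8) == 127)
assert(sign_extend(0xFF, 8) == -1)
assert(sign32(0xFFFFFFFE) == -2)
```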

The vast majority of the code in the arm library is there to parse and generate
branch instructions. As such, branches are first-class citizens and have their
own representation as a table containing:
- type: always equal to "branch"
- addr: jump address
- link: boolean (true for branch & link, false for a plain branch)

NOTE: you should always use arm.make_branch() to create a branch

The library provides the following functions to work with branches:
- arm.make_branch(addr, link)                Create a branch instruction
- arm.parse_branch(fw, addr)                 Parse a branch instruction
- arm.write_branch(fw, addr, branch, pool)   Write an instruction (*)

Parsing a branch is conceptually easy: given an ARM or Thumb address, it will
parse the opcode and create the corresponding branch, properly handling link
and destination (either ARM or Thumb). If the instruction is not a branch,
it will report an error.
Writing a branch is slightly more involved: given an ARM or Thumb address, it
will try to write a branch there. However, given all the possible constraints
(ARM/Thumb, link, destination), this is not always possible at all, or not within
the 32 bits of the instruction. For this reason, the user can give an optional
pool (pass nil to forbid) where the library may put some values. This is useful
for indirect jumps like "ldr pc, [pool]". Note that the library does not report
whether it used the pool or not, so when in doubt you should always advance the
pool by 32 bits (4 bytes) after writing a branch, if you plan to use the pool for
other values.
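
To see why a direct branch does not always fit, consider the simplest case, an
ARM to ARM branch: the opcode only has room for a signed 24-bit word offset,
relative to the address of the instruction plus 8, so targets further than
about 32 MiB away cannot be encoded. The sketch below is a hypothetical helper,
not the library's implementation. It encodes such a branch and returns nil when
the target is out of reach, which is the kind of situation where an indirect
jump through the pool is needed.

```lua
-- Encode an ARM-state "b" (link = false) or "bl" (link = true) located at
-- address `from` and jumping to `to`. Both must be word-aligned ARM addresses.
-- Returns the 32-bit opcode, or nil if the offset does not fit in 24 bits.
local function encode_arm_branch(from, to, link)
    local offset = to - (from + 8)      -- PC reads as instruction address + 8
    assert(offset % 4 == 0, "addresses must be word-aligned")
    local words = math.floor(offset / 4)
    if words < -0x800000 or words >= 0x800000 then
        return nil                      -- out of range for a direct branch
    end
    local imm24 = words % 0x1000000     -- two's complement on 24 bits
    local base = link and 0xEB000000 or 0xEA000000
    return base + imm24
end

print(string.format("0x%08X", encode_arm_branch(0x1000, 0x1008, false)))
--> 0xEA000000
print(string.format("0x%08X", encode_arm_branch(0x1000, 0x1000, true)))
--> 0xEBFFFFFE
print(encode_arm_branch(0, 0x4000000, false))
--> nil
```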

Finally, the library provides some other useful code generation functions:
- arm.write_return(fw, addr)          Generate a "bx lr" instruction
- arm.write_xxx_regs(fw, addr, load)  Generate a "{stm,ldm}fd sp!,{r0-r12, lr}"
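
For reference, the ARM-state encodings of these instructions can be computed as
in the sketch below. The opcode values are the standard ARM encodings; that
load = true selects the ldmfd variant is an assumption based on the function
signature, and all names in the sketch are hypothetical.

```lua
local ARM_BX_LR   = 0xE12FFF1E   -- "bx lr" in ARM state (32-bit)
local THUMB_BX_LR = 0x4770       -- "bx lr" in Thumb state (16-bit)

-- Register list for {r0-r12, lr}: bits 0..12 plus bit 14.
local REGS_R0_R12_LR = 0x1FFF + 0x4000

-- Return the opcode of "ldmfd sp!, {r0-r12, lr}" when `load` is true and of
-- "stmfd sp!, {r0-r12, lr}" otherwise.
local function save_restore_opcode(load)
    local base = load and 0xE8BD0000 or 0xE92D0000
    return base + REGS_R0_R12_LR
end

print(string.format("0x%08X", save_restore_opcode(false)))   --> 0xE92D5FFF
print(string.format("0x%08X", save_restore_opcode(true)))    --> 0xE8BD5FFF
```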