Immediate Data Tracking

sol-azy tracks immediate values loaded from .rodata via LD_DW_IMM instructions.
This feature is crucial for recovering strings, error messages, and embedded constants that are otherwise opaque in the bytecode.


How it works

Solana eBPF programs often use this pattern to load a constant string:

lddw   r1, 0x1000043e8
mov64  r2, 9
  • The lddw instruction loads an address that points into .rodata
  • The mov64 gives a length (usually in bytes)
  • sol-azy uses these two values to extract a slice of memory and decode it

If the memory region looks printable (ASCII-compatible), it is rendered as a string like:

b"You lose!"

Otherwise, a hex-escaped byte string is emitted.
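
The slice extraction described above can be sketched as follows. This is a minimal illustration, not sol-azy's actual code: the helper name `slice_at` is hypothetical, and the base address 0x100000000 for MM_RODATA_START is an assumption inferred from the example addresses in this document.

```rust
use std::convert::TryFrom;

/// Assumed base address of the read-only region (inferred from the
/// example addresses in this document, not taken from sol-azy's source).
const MM_RODATA_START: u64 = 0x1_0000_0000;

/// Hypothetical helper: return the `len` bytes that `addr` points to,
/// or `None` if the address is below the base or the range runs past
/// the end of the provided buffer.
fn slice_at(image: &[u8], addr: u64, len: usize) -> Option<&[u8]> {
    // Convert the absolute address into an offset inside the buffer.
    let offset = usize::try_from(addr.checked_sub(MM_RODATA_START)?).ok()?;
    let end = offset.checked_add(len)?;
    image.get(offset..end)
}
```

With the pair `lddw r1, 0x1000043e8` / `mov64 r2, 9`, a caller would pass `addr = 0x1000043e8` and `len = 9`; an out-of-range request yields `None` instead of panicking.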


Output File: immediate_data_table.out

This file lists all detected .rodata ranges accessed via LD_DW_IMM, whether or not they were also used in disassembly.

Format

Each line contains:

<absolute_address> (+ <relative_offset>): <decoded_bytes>

Example:

0x1000043e0 (+ 0x43e0): b"You win!"
0x1000043e8 (+ 0x43e8): b"You lose!"
0x100004434 (+ 0x4434): b"Not enough data. Need two u32 values.src/entrypoint.rs"

The relative_offset is computed relative to MM_RODATA_START, and is used to index into the ELF's .rodata section.
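
As a rough sketch of how one such line could be produced, assuming MM_RODATA_START is 0x100000000 (consistent with the examples above) and using the standard library's `escape_ascii` as a stand-in for sol-azy's own byte formatting. The function name `format_entry` is hypothetical.

```rust
/// Assumed base address; chosen to match the example lines in this document.
const MM_RODATA_START: u64 = 0x1_0000_0000;

/// Hypothetical helper: build one `immediate_data_table.out`-style line.
/// Returns `None` when `addr` lies below the assumed .rodata base.
fn format_entry(addr: u64, bytes: &[u8]) -> Option<String> {
    // relative_offset = absolute_address - MM_RODATA_START
    let rel = addr.checked_sub(MM_RODATA_START)?;
    Some(format!(
        "0x{:x} (+ 0x{:x}): b\"{}\"",
        addr,
        rel,
        bytes.escape_ascii() // std escaping, not sol-azy's format_bytes
    ))
}
```

Calling `format_entry(0x1000043e0, b"You win!")` reproduces the first example line shown above.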


Visual Reference

Here's a screenshot of a real immediate_data_table.out file generated from a test case:

[Screenshot: Immediate Data Table Output]

We can see:

  • Success and failure strings: "You win!", "You lose!"
  • Panic messages
  • Rust format strings
  • Even full numeric patterns (e.g. "00010203...")

Behind the scenes

The logic is handled by this function:

fn disassemble_wrapper(
    program: &[u8],
    analysis: &mut Analysis,
    imm_tracker_wrapped: Option<&mut ImmediateTracker>,
    path: P,
)

Each LD_DW_IMM is analyzed, and its value is registered using:

imm_tracker.register_offset(insn.imm as usize);

Then, for each tracked range:

  • The program is sliced using offset logic
  • The result is passed to:
pub fn format_bytes(slice: &[u8]) -> String

which escapes non-printables and prints ASCII as-is.
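
A minimal sketch of what such a formatter might look like is given below. It is an assumption-based stand-in named `format_bytes_sketch`, not the body of sol-azy's `format_bytes`; the exact escaping rules of the real function may differ.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for `format_bytes`: printable ASCII is copied
/// through, quotes and backslashes are backslash-escaped, and every
/// other byte is written as a `\xNN` escape.
fn format_bytes_sketch(slice: &[u8]) -> String {
    let mut out = String::from("b\"");
    for &b in slice {
        match b {
            b'"' | b'\\' => {
                out.push('\\');
                out.push(b as char);
            }
            // Printable ASCII range: space through tilde.
            0x20..=0x7e => out.push(b as char),
            // Writing into a String cannot fail, so the result is ignored.
            _ => {
                let _ = write!(out, "\\x{:02x}", b);
            }
        }
    }
    out.push('"');
    out
}
```

For input `b"You lose!"` this yields `b"You lose!"`, while the bytes `[0x00, 0x41, 0xff]` come out as `b"\x00A\xff"`.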

LD_DW_IMM: Key Instructions and Address Keys

The tracking system is triggered exclusively by LD_DW_IMM instructions, which are used to load 64-bit constants.
When such an instruction loads an address greater than or equal to MM_RODATA_START, sol-azy considers it a .rodata access.

These addresses become the keys of the immediate_data_table.out output.

Example:

lddw   r1, 0x1000043e0   ; ← This address becomes a key
mov64  r2, 8             ; ← Length hint

This results in:

0x1000043e0 (+ 0x43e0): b"You win!"
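
To make the key selection concrete, the sketch below decodes the 64-bit immediate of a raw lddw and applies the `>= MM_RODATA_START` test. It relies on the standard eBPF encoding (opcode 0x18, with the immediate split across two 8-byte slots); the function names and the value 0x100000000 for MM_RODATA_START are assumptions made for illustration, not sol-azy's API.

```rust
/// Opcode of the 16-byte "load double-word immediate" instruction.
const LD_DW_IMM: u8 = 0x18;
/// Assumed base address, matching the example addresses in this document.
const MM_RODATA_START: u64 = 0x1_0000_0000;

/// Decode the 64-bit immediate of an `lddw` from its two 8-byte slots.
/// The low 32 bits sit in bytes 4..8, the high 32 bits in bytes 12..16.
fn lddw_imm(insn: &[u8]) -> Option<u64> {
    if insn.len() < 16 || insn[0] != LD_DW_IMM {
        return None;
    }
    let lo = u32::from_le_bytes([insn[4], insn[5], insn[6], insn[7]]) as u64;
    let hi = u32::from_le_bytes([insn[12], insn[13], insn[14], insn[15]]) as u64;
    Some((hi << 32) | lo)
}

/// Hypothetical filter: keep only immediates that look like .rodata addresses.
fn rodata_key(insn: &[u8]) -> Option<u64> {
    lddw_imm(insn).filter(|&imm| imm >= MM_RODATA_START)
}
```

Under these assumptions, the encoding of `lddw r1, 0x1000043e0` produces the key 0x1000043e0, whereas an lddw that loads a small plain constant such as 42 is skipped.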

Range Truncation: Avoiding Overlaps

In programs with many LD_DW_IMM instructions, multiple loads may point into the same .rodata segment.
To avoid overlap between two string regions, sol-azy performs forward truncation:

  • It registers each LD_DW_IMM address (new_start)
  • It finds the next closest start already known
  • It truncates any overlapping previous entry so that no two extracted ranges overlap

This ensures that the memory region allocated for one string or constant does not accidentally contain bytes meant for another.

⚠️ Important Note on Partial Overlaps

The truncation mechanism ensures that two tracked .rodata regions do not overlap, but this does not imply that only the non-overlapping portion of earlier data is relevant.

For example:

  • An lddw targeting 0x1 loads 4 bytes of useful data.
  • Later, an lddw targeting 0x3 uses only the high bits (e.g., the last 2 of those bytes).

Even if these regions partially overlap in memory, the system still treats 0x1 as a distinct valid address for its own usage.

This means:

  • The data at 0x1 is still considered to start at 0x1.
  • The data at 0x3 is separately tracked, even if it falls inside a previously registered range.

The truncation is only used to split visible ranges in the output, not to reinterpret or cut off the semantics of earlier loads.

Example

Suppose:

lddw r1, 0x1000043e0      ; key #1
lddw r2, 0x1000043e8      ; key #2, appears later in bytecode

Even if the length for key #1 is unclear (or too long), sol-azy will truncate its range to stop at 0x1000043e8.

This avoids having "You win!" + "You lose!" accidentally merged into one blob, since both of them will be used independently by separate LD_DW_IMM instructions.
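
One way to picture this rule is a small function that picks the end of a range from an optional length hint and the next known key. The name `range_end`, its parameters, and the way the hint and the truncation are combined are all assumptions for illustration; sol-azy's internal logic may be organized differently.

```rust
/// Hypothetical helper: decide where the range starting at `start` ends.
/// `length_hint` is the mov64 length if one was found, `next_key` is the
/// closest tracked address after `start`, and `default_len` is the
/// fallback length used when no hint exists.
fn range_end(start: u64, length_hint: Option<u64>, next_key: Option<u64>, default_len: u64) -> u64 {
    let wanted = start.saturating_add(length_hint.unwrap_or(default_len));
    match next_key {
        // Never run past the next tracked address.
        Some(next) if next > start => wanted.min(next),
        _ => wanted,
    }
}
```

With `start = 0x1000043e0`, no length hint, and `next_key = 0x1000043e8`, the range ends at 0x1000043e8, so key #1 covers exactly the 8 bytes of "You win!".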

Support for sBPF v2+: Address Construction via mov32 + hor64

In sBPF version 2 and above, the use of lddw for loading 64-bit constants is forbidden. Instead, addresses are manually constructed using:

mov32  r1, 0x3000         ; load lower 32 bits
hor64  r1, 0x10000000     ; set upper 32 bits → r1 = 0x1000000000003000

sol-azy handles this by:

  1. Tracking register values using a RegisterTracker
  2. Emulating mov and hor64 instructions
  3. Resolving loads like ldxdw r2, [dst + off] where dst + off points into .rodata
  4. Extracting and decoding the pointed memory, same as for lddw

This lets the disassembler annotate pointer-based loads even when addresses are assembled dynamically.
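
The register emulation can be sketched roughly as below. `RegTrackerSketch` is a hypothetical type written for this page, not sol-azy's `RegisterTracker`; it assumes that mov32 with an immediate zero-extends into the 64-bit register and that hor64 ORs its immediate into the upper 32 bits.

```rust
/// Hypothetical register tracker: one optional known value per register.
struct RegTrackerSketch {
    regs: [Option<u64>; 11], // r0..r10, None = value unknown
}

impl RegTrackerSketch {
    fn new() -> Self {
        RegTrackerSketch { regs: [None; 11] }
    }

    /// mov32 dst, imm: assumed to zero-extend the 32-bit immediate.
    fn mov32_imm(&mut self, dst: usize, imm: u32) {
        self.regs[dst] = Some(imm as u64);
    }

    /// hor64 dst, imm: OR the immediate into the upper 32 bits.
    /// An unknown register stays unknown.
    fn hor64_imm(&mut self, dst: usize, imm: u32) {
        if let Some(v) = self.regs[dst] {
            self.regs[dst] = Some(v | ((imm as u64) << 32));
        }
    }

    /// Resolve the address `[base + off]` if the base register is known.
    fn resolve(&self, base: usize, off: i16) -> Option<u64> {
        self.regs[base].map(|v| v.wrapping_add(off as i64 as u64))
    }
}
```

Replaying the two instructions from the example (`mov32_imm(1, 0x3000)` followed by `hor64_imm(1, 0x10000000)`) leaves r1 at 0x1000000000003000, and `resolve(1, 8)` then gives the address a load like `ldxdw r2, [r1 + 8]` would touch.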

Internal Implementation

The tracking structure is a BTreeMap<usize, usize>:

pub struct ImmediateTracker {
    ranges: BTreeMap<usize, usize>, // start => end
}

Each register_offset(new_start) will:

  • Locate the next start value already in ranges
  • Set new_end = next_start
  • Truncate any existing range that would overlap with new_start

This is enforced even if the underlying data could legitimately overlap: keeping the extracted ranges disjoint is prioritized.
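
The three steps above could look roughly like this. `TrackerSketch` is an illustrative reimplementation, not the real `ImmediateTracker`; in particular, the `limit` field (the end used when no later start is known) is an assumption introduced for this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical tracker mirroring the described behavior.
struct TrackerSketch {
    ranges: BTreeMap<usize, usize>, // start => end (exclusive)
    limit: usize,                   // assumed end when no later start exists
}

impl TrackerSketch {
    fn new(limit: usize) -> Self {
        TrackerSketch { ranges: BTreeMap::new(), limit }
    }

    fn register_offset(&mut self, new_start: usize) {
        if self.ranges.contains_key(&new_start) {
            return; // already tracked
        }
        // The new range ends at the next known start, or at the limit.
        let new_end = self
            .ranges
            .range(new_start..)
            .next()
            .map(|(&start, _)| start)
            .unwrap_or(self.limit.max(new_start));
        // Truncate the closest earlier range if it reaches past new_start.
        if let Some((_, prev_end)) = self.ranges.range_mut(..new_start).next_back() {
            if *prev_end > new_start {
                *prev_end = new_start;
            }
        }
        self.ranges.insert(new_start, new_end);
    }
}
```

Because a BTreeMap keeps its keys sorted, both neighbor lookups are logarithmic range queries, and registration order does not change the final set of disjoint ranges.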


When this matters

This tracking is especially useful when:

  • The program includes panic messages
  • You want to recover hardcoded strings (e.g. "owner mismatch")
  • You're analyzing solana_program syscalls with string-based I/O
  • You want to reverse undocumented or obfuscated logic

Tips

  • Use --mode disass or --mode both to enable this feature
  • If a string appears truncated, check the corresponding mov64 for its length
  • If no mov64 follows an lddw, the CFG rendering falls back to a default read length of about 50 bytes