Introduction

sol-azy is a modular, CLI-based toolchain designed for working with Solana programs.
It combines static analysis, reverse engineering, and project building features in one streamlined developer and auditor experience.

What Is sol-azy?

sol-azy provides tools for:

Building Solana programs:
- Supports both Anchor and native SBF workflows
- Handles compilation and artifact organization
Recap:
- Produces a compact, audit-friendly summary per program/IDL
- Extracts instruction-level metadata (Signers, Writable, Constrained, Seeded, Memory)
- Maps IDLs to Anchor crates and performs lightweight source parsing to surface constraints, seeds and memory usage
- Ideal as a quick starting report for security reviews and audits
Static Application Security Testing (SAST):
- Uses a custom Starlark-based rule engine
- Applies pattern-matching on the Rust AST
- Enables writing domain-specific security rules
Reverse Engineering:
- Disassembles compiled sBPF bytecode
- Exports Control Flow Graphs in .dot format
- Tracks and formats immediate data from RODATA
- Simplifies annotations with Rust-like pseudocode
Dotting:
- Lets you manually reinsert functions into reduced CFGs from the full .dot graph
- Useful for selectively exploring large or complex programs
Fetcher:
- Retrieves deployed .so binaries from Solana RPC endpoints using a program ID
- Makes it easy to reverse-engineer or audit programs without local builds

Why sol-azy?

While tools like solana, cargo build-sbf, or anchor build focus on building and deployment, sol-azy targets:

Security auditing workflows
Automated code review pipelines
Understanding bytecode-level structure
Writing and applying custom static rules

It integrates tightly with Solana's BPF toolchain and syn parsing to provide source-level and binary-level insights in one place.

Project Structure

sol-azy is structured into several engines and CLI commands:

build – Compile programs and prepare artifacts
recap – Generate quick IDL+source summaries for audits (instruction tables, constraints, seeds, memory flags)
sast – Run static analysis with Starlark rules
reverse – Perform bytecode reverse engineering
dotting – Post-process .dot graphs to manually restore functions in reduced CFGs
fetcher – Retrieve deployed on-chain bytecode for offline inspection

See the full CLI Usage section for more details.

Requirements

Rust + Cargo
Solana Toolchain (for cargo build-sbf)
(Optional) anchor for Anchor support
(Optional) mdBook, if you are contributing to or browsing the documentation locally


Installation

This page describes how to set up sol-azy and its required dependencies.

1. Prerequisites

Make sure the following tools are installed on your system:

| Tool | Purpose | Install link / command |
| --- | --- | --- |
| Rust | Required to compile sol-azy | https://rustup.rs |
| cargo | Rust package manager | Included with `rustup` |
| Solana CLI | Needed for SBF builds | https://docs.solana.com/cli/install-solana-cli |
| anchor (optional) | For Anchor-based projects | `cargo install --git https://github.com/coral-xyz/anchor anchor-cli --locked` |

Verify installations:

rustc --version
cargo --version
solana --version
anchor --version # optional

2. Clone the Repository

git clone https://github.com/FuzzingLabs/sol-azy
cd sol-azy

3. Build the Tool

cargo build --release

The binary will be available at:

./target/release/sol-azy

You can also run sol-azy in development using:

cargo run -- <command> [options]

4. Install `mdBook` (optional)

To build or view the documentation locally:

cargo install mdbook
mdbook serve docs

Then open http://localhost:3000


Known Issues & Troubleshooting

⚠️ `Cargo.lock` Version Mismatch Error

When running the build command, you might encounter the following error:

error: failed to parse lock file at: ...
Caused by: lock file version 4 requires -Znext-lockfile-bump

Root Cause

This issue arises due to a mismatch between the Cargo.lock file version and the Rust compiler version used by Solana's build tools. Specifically:

Cargo.lock version 4 is generated by newer versions of Cargo and can only be parsed by Cargo 1.78 or newer (the Cargo shipped with Rust 1.78).
However, Solana's cargo-build-sbf and anchor build commands may use an older rustc version (e.g., 1.75), leading to this incompatibility.

This discrepancy occurs because Solana's build tools bundle their own Rust toolchain, which might not match the system's Rust version managed by rustup.

Solutions

Update Solana CLI and Anchor

Ensure you're using compatible versions of Solana and Anchor that support the newer Cargo.lock format:

# Update Solana CLI to version 2.1.x or newer
sh -c "$(curl -sSfL https://release.anza.xyz/v2.1.0/install)"
# Update Anchor CLI
cargo install --git https://github.com/coral-xyz/anchor avm --locked
avm install latest
avm use latest

These updates align the toolchains with the expected Cargo.lock version and Rust compiler requirements.

Manually Downgrade Cargo.lock Version (Temporary Workaround)

If updating is not feasible, you can temporarily modify the Cargo.lock file:
- Open Cargo.lock in your project root.
- Change the version line from:
```
version = 4
```
  to:
```
version = 3
```
Note: This is a temporary fix. Running cargo update or similar commands may regenerate the Cargo.lock file with version 4.
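To see which lockfile format a project is on before building (and after any `cargo update`), you can read the top-level `version` key. A minimal sketch, assuming the lockfile keeps the unquoted top-level `version = N` line that Cargo writes; `lockfile_version` is a hypothetical helper, not part of sol-azy:

```python
import re
from typing import Optional

# The top-level lockfile version is an unquoted integer (e.g. `version = 4`).
# Package entries use quoted versions (`version = "1.0.0"`), so they never match.
_LOCK_VERSION = re.compile(r"^version[ \t]*=[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def lockfile_version(text: str) -> Optional[int]:
    """Return the Cargo.lock format version, or None if no top-level key exists."""
    match = _LOCK_VERSION.search(text)
    return int(match.group(1)) if match else None
```

For example, `lockfile_version(open("Cargo.lock").read())` returning `4` means an older SBF toolchain is likely to hit the error above.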

Ensure Consistent Rust Toolchain

Verify that the Rust version used by Solana's build tools matches the required version:
```
# Check Rust version used by Solana's cargo-build-sbf
cargo-build-sbf --version
```
If the version is older than required, updating the Solana CLI as shown above should resolve the issue.


By following these steps, you should be able to resolve the Cargo.lock version mismatch error and continue building your Solana programs successfully.

Need Help?

If something doesn't work, check:

Error messages in the CLI output
That cargo, solana, or anchor are in your PATH
That the bytecode you are reversing is a valid .so file; for instance, the `file` command should report something like:

test_cases/base_sbf_addition_checker/bytecodes/addition_checker.so: ELF 64-bit LSB shared object, eBPF, version 1 (SYSV), dynamically linked, stripped
test_cases/base_sbf_addition_checker/bytecodes/addition_checker_sbpf_solana.so: ELF 64-bit LSB shared object, eBPF, version 1 (SYSV), dynamically linked, not stripped
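The same check can be done programmatically by reading the ELF header. A minimal sketch, assuming the binary uses the eBPF machine type `EM_BPF` (247), which is what `file` reports as "eBPF" above; newer SBF binaries may use a different `e_machine` value, and `looks_like_sbf_so` is a hypothetical helper:

```python
import struct

EM_BPF = 247  # ELF e_machine value that `file` reports as "eBPF" (assumption: classic SBF)
ET_DYN = 3    # ELF e_type for shared objects


def looks_like_sbf_so(data: bytes) -> bool:
    """True if `data` starts with a 64-bit little-endian eBPF shared-object header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        return False
    ei_class, ei_data = data[4], data[5]
    if ei_class != 2 or ei_data != 1:  # 2 = ELFCLASS64, 1 = ELFDATA2LSB
        return False
    # e_type and e_machine are the two u16 fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from("<HH", data, 16)
    return e_type == ET_DYN and e_machine == EM_BPF
```

Usage: `looks_like_sbf_so(open("addition_checker.so", "rb").read(64))`.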

You can also open an issue or contact the maintainers.

CLI Usage

sol-azy provides a command-line interface (CLI) for interacting with Solana programs through various operations:

Building programs
Generating audit-friendly summaries of Anchor programs
Running static analysis
Reversing compiled bytecode
Modifying CFG .dot files
Fetching deployed bytecode
(Future) Fuzzing and testing support

All commands are accessible via:

cargo run -- <command> [options]

IMPORTANT: Running with --release (cargo run --release -- <command> [options]) is much faster, so use it unless you need debug logs.

Available Commands

`build`

Compiles a Solana project using either Anchor or the native SBF toolchain.

cargo run -- build --target-dir ./my_project --out-dir ./out/

`recap`

Generate a compact, audit-friendly summary (per IDL / program) of an Anchor project.

cargo run -- recap -d ../my-solana-project

`sast`

Runs static analysis using Starlark-based rules on the project's source code.

cargo run -- sast --target-dir ./my_project --rules-dir ./rules/ --syn-scan-only

`reverse`

Performs disassembly, control flow graph (CFG) generation, and immediate value extraction on compiled .so files.

cargo run -- reverse --mode both --out-dir ./out --bytecodes-file ./program.so --labeling

`dotting`

Allows you to edit a reduced control flow graph (.dot) by selectively re-inserting functions from the full graph. This is especially useful when working with large binaries where the full CFG is too dense.

cargo run -- dotting \
  -c temp_config.json \
  -r cfg_reduced.dot \
  -f cfg.dot

`fetcher`

Fetches an on-chain deployed Solana program’s bytecode (.so) using its program ID. Useful when you want to analyze a program without having its local source or compiled artifact.

cargo run -- fetcher \
  --program-id 4MEX8vDCZzAxQkuyd6onJCTeFdof6c1HJgznEtCGqA1N \
  --out-dir ./bytecodes/

Optional RPC override:

cargo run -- fetcher \
  -p 4MEX8vDCZzAxQkuyd6onJCTeFdof6c1HJgznEtCGqA1N \
  -o ./bytecodes/ \
  -r https://api.mainnet-beta.solana.com

`test` (TO DO)

`fuzz` (TO DO)


`build` Command (work in progress)

The build command compiles Solana programs located in a given target directory.
It supports both Anchor projects and native SBF (Solana BPF) programs.

Usage

cargo run -- build --target-dir ./examples/my_project --out-dir ./out/

Arguments:

--target-dir: Path to the Solana project root.
--out-dir: Path where build outputs should be saved.
--unsafe-version-switch: (Optional) Flag to automatically switch the Anchor version.

Behavior

sol-azy automatically detects the project type based on its contents:

| Type | Detection Criteria |
| --- | --- |
| Anchor | Presence of `Anchor.toml` |
| Native SBF | `Cargo.toml` includes `solana-program` |

Depending on the project type, it runs one of:

anchor build --skip-lint (for Anchor)
cargo build-sbf (for SBF)
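The detection table and the resulting command choice can be sketched in a few lines. This is an illustration under the assumption that a plain substring check on `Cargo.toml` is enough; sol-azy's real checks may differ, and `build_command` is a hypothetical name:

```python
from pathlib import Path
from typing import List, Optional


def build_command(project_dir: str) -> Optional[List[str]]:
    """Pick a build command from the project layout (hypothetical helper)."""
    root = Path(project_dir)
    if (root / "Anchor.toml").is_file():
        return ["anchor", "build", "--skip-lint"]
    cargo_toml = root / "Cargo.toml"
    # Assumption: a substring match on the manifest stands in for real TOML parsing.
    if cargo_toml.is_file() and "solana-program" in cargo_toml.read_text():
        return ["cargo", "build-sbf"]
    return None  # unknown project type
```

Anchor detection is checked first because Anchor workspaces also contain `Cargo.toml` files that depend on Solana crates.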

Before building, the tool runs a series of pre-checks:

Verifies that cargo and/or anchor is installed
Checks if the output directory exists or creates it
Validates the project directory structure

Output

By default, the output directory will contain:

Compiled .so file(s) in subdirectories defined by the framework
Any additional files generated by the Solana toolchain

Example

cargo run -- build \
  --target-dir test_cases/base_sbf_addition_checker \
  --out-dir test_cases/base_sbf_addition_checker/out

This builds a native Solana SBF program and saves the output in test_cases/base_sbf_addition_checker/out.

Reverse — You can use the compiled .so as input for disassembly
SAST — Optional static analysis can run on source before or after build

`recap` Command

The recap command generates a compact, audit-friendly summary for an Anchor project.
It inspects IDL(s) under target/idl/, tries to map each IDL to its Anchor crate, performs lightweight source parsing of #[derive(Accounts)] blocks, and emits per-program Markdown tables with: Instruction | Signers | Writable | Constrained | Seeded | Memory.

Usage

# run recap on current directory (default) -> creates ./recap-solazy.md
cargo run -- recap

# run recap on a specific project path (optional -d) -> creates ./recap-solazy.md in the cwd
cargo run -- recap -d ../my-solana-project

Arguments:

-d, --target-dir <PATH>: optional path to the project root. If omitted, the current working directory is used.

How It Works

Verify the target directory looks like an Anchor project (presence of Anchor.toml).
Discover IDL JSON files under target/idl/.
Parse each IDL to obtain instruction and account lists.
Find Anchor crates in the repo by scanning Cargo.toml files for an anchor-lang dependency, then attempt to map each IDL to the best-matching crate:
- prefer crate with the same package name as idl.name,
- otherwise pick the crate with the largest overlap between IDL instruction names and functions discovered in the crate source.
For each mapped crate, the implementation concatenates src/*.rs into a single string (heuristic) and performs lightweight parsing that:
- searches functions for Context<...> (the parser scans for any Context<...> occurrence in a function’s parameter list) and extracts the last top-level generic as the Accounts struct name (e.g. Context<'_, '_, '_, 'info, Foo<'info>> → Foo),
- extracts #[derive(Accounts)] structs and aggregates stacked #[account(...)] attributes attached to fields,
- detects markers inside the #[account(...)] attributes: seeds = [...], has_one = ..., address = ..., constraint/constraints, SPL helpers like token::mint, associated_token::mint, mint::authority, and memory-related flags like space, realloc, realloc::zero,
- flattens IDL account trees (via flatten_accounts) to map IDL leaf account names to struct fields and then annotates table columns (Constrained, Seeded, Memory).
Produce recap-solazy.md containing one section per IDL/program and a Markdown table per program.
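The `Context<...>` step amounts to bracket matching over the parameter text. A minimal sketch, not sol-azy's implementation: it finds the first `Context<` token, splits its top-level generic arguments, and keeps the last one without its own generics or path prefix. `accounts_struct_from_params` is a hypothetical name:

```python
import re
from typing import List, Optional

_CONTEXT = re.compile(r"\bContext\s*<")  # qualified or unqualified, but not MyContext<


def accounts_struct_from_params(params: str) -> Optional[str]:
    """Return the Accounts struct named by the last top-level generic of Context<...>."""
    match = _CONTEXT.search(params)
    if match is None:
        return None
    depth, args, current = 1, [], []  # type: int, List[str], List[str]
    for ch in params[match.end():]:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
            if depth == 0:
                break
        if ch == "," and depth == 1:
            args.append("".join(current).strip())  # top-level argument boundary
            current = []
        else:
            current.append(ch)
    else:
        return None  # no closing '>' found
    args.append("".join(current).strip())
    last = args[-1].split("<", 1)[0].strip()  # Foo<'info> -> Foo
    return last.rsplit("::", 1)[-1] or None    # crate::Foo -> Foo
```

Usage: `accounts_struct_from_params("ctx: Context<'_, '_, '_, 'info, Foo<'info>>")` yields `Foo`, matching the example above.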

Output

File generated: recap-solazy.md (created in the current working directory).
The file contains one section per IDL/program with a Markdown table listing, for each instruction:
- Signers — accounts flagged as signers in the IDL
- Writable — accounts flagged writable / mut in the IDL
- Constrained — fields with has_one, address, owner, constraint(s) or recognized SPL attribute markers
- Seeded — fields using seeds = [...] (detected from #[account(...)])
- Memory — fields using space or realloc / realloc::zero

The output is intended as a quick-start audit report — readable, compact, and suitable for inclusion in initial findings.

Limitations & Notes (important)

Anchor-only: the command expects Anchor-style IDLs and an Anchor.toml project marker. Native Rust / Shank projects are not covered by this command.
Heuristic file handling: the tool concatenates src/*.rs as a quick heuristic (it does not perform Rust module resolution). This is fast and works for many repos, but can miss or mis-attribute items in projects that rely heavily on mod ...; file layout, pub(crate) scope tricks, or macros that generate the Accounts structs.
Text-based parsing: account/attribute detection is implemented with lightweight parsing / regex heuristics:
- it finds #[derive(Accounts)] and groups stacked #[account(...)] attributes,
- it searches inside attributes for tokens like seeds = [, has_one =, address =, constraint, space, realloc, and SPL shorthand forms (e.g. associated_token::mint = ...),
- these heuristics are fast but can produce false negatives on extremely exotic code constructs, unusual macro expansions, or heavily nested generics inside attributes.
Context detection: the function-mapper looks for any Context<...> usage in the fn parameters (qualified or unqualified).
IDL → crate mapping: mapping is best-effort: exact idl.name match preferred; otherwise instruction-name overlap is used. In multi-program monorepos this heuristic generally works but may need manual review for ambiguous cases.
Output filename is fixed: the current tool writes results to recap-solazy.md. Changing this behavior is a small code tweak if you prefer stdout or a configurable filename.
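The best-effort IDL to crate mapping can be expressed as "exact name first, then largest instruction overlap". A minimal sketch under two assumptions not stated above: hyphens and underscores are treated as equivalent when comparing names (Cargo package names are often kebab-case, IDL names snake_case), and ties are broken alphabetically. `map_idl_to_crate` is a hypothetical name:

```python
from typing import Dict, Optional, Set


def _normalize(name: str) -> str:
    return name.replace("-", "_")  # assumption: kebab-case == snake_case


def map_idl_to_crate(
    idl_name: str, idl_instructions: Set[str], crates: Dict[str, Set[str]]
) -> Optional[str]:
    """`crates` maps each Anchor crate's package name to fn names found in its source."""
    for package in crates:
        if _normalize(package) == _normalize(idl_name):
            return package
    best, best_overlap = None, 0
    for package, fns in sorted(crates.items()):  # sorted: deterministic tie-breaking
        overlap = len(idl_instructions & fns)
        if overlap > best_overlap:
            best, best_overlap = package, overlap
    return best  # None when nothing overlaps: a case for manual review
```

Returning `None` rather than guessing keeps ambiguous monorepo cases visible to the reviewer.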

Example

# analyze current project and generate recap-solazy.md
cargo run -- recap
# or, with a compiled binary named solazy
solazy recap

# or analyze a specific project path
cargo run -- recap -d ../helium-program-library
# or, with a compiled binary named solazy
solazy recap -d ../helium-program-library

`sast` Command

The sast command performs Static Application Security Testing on Solana projects using a custom rule engine.
It parses the Rust source code, builds an AST, and applies Starlark-based rules to detect potential vulnerabilities or design patterns.

Usage

cargo run -- sast \
  --target-dir ./my_project \
  --rules-dir ./rules/ \
  --syn-scan-only

Arguments:

--target-dir: Path to the root of the Solana project.
--rules-dir: Directory containing .star rule files.
--syn-scan-only: If set, only perform syntactic scanning (no build required).

HIGHLY RECOMMENDED: Running with --release is much faster, so use it unless you need debug logs.

How It Works

The SAST engine:

Parses all .rs files under the target project (Anchor or native SBF)
Builds a syn AST enriched with source spans
Loads all .star rule files from the provided rules directory
Applies the rules and collects any matches (vulnerabilities, code smells, patterns)

Rules are written in Starlark, making them:

Secure
Sandboxable
Easy to reason about

Rule File Example

load("syn_ast.star", "syn_ast")

RULE_METADATA = struct(
    name = "DangerousPanicUsage",
    author = "FuzzingLabs",
    version = "0.1",
    severity = "High",
    certainty = "High",
    description = "Detects usage of `panic!` in logic paths",
)

def syn_ast_rule(ast):
    return [node for node in ast if node["ident"] == "panic"]
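The example rule treats `ast` as a flat sequence of nodes carrying an `ident` key. To try the same filter outside the engine, for instance on JSON from `ast-utils`, a nested tree has to be flattened first. A minimal Python sketch (Starlark is a Python dialect); the node shape with identifiers under `ident` is an assumption, and both helper names are hypothetical:

```python
from typing import Any, Dict, Iterator, List


def iter_nodes(tree: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict in a nested JSON-like tree, depth-first."""
    if isinstance(tree, dict):
        yield tree
        for value in tree.values():
            yield from iter_nodes(value)
    elif isinstance(tree, list):
        for item in tree:
            yield from iter_nodes(item)


def find_ident(tree: Any, name: str) -> List[Dict[str, Any]]:
    """Same filter as the example rule, tolerant of nodes without an `ident` key."""
    return [node for node in iter_nodes(tree) if node.get("ident") == name]
```

Using `node.get("ident")` instead of `node["ident"]` avoids failing on nodes that have no identifier.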

Output

sol-azy prints results as a terminal table or as JSON, including:

Rule metadata
File names
Matches and associated spans (if available)

Example

cargo run -- sast \
  --target-dir test_cases/base_anchor/programs/base_anchor \
  --rules-dir ./rules/ \
  --syn-scan-only

`fetcher` Command

The fetcher command retrieves the deployed bytecode of a Solana program and saves it locally as fetched_program.so (non-executable accounts are saved as fetched_account.bin instead, as described below).

This is useful for offline analysis, reverse engineering, or static checks without relying on local source code or a local Solana toolchain.

Usage

cargo run -- fetcher \
  --program-id <PROGRAM_ID> \
  --out-dir <OUTPUT_DIR> \
  [--rpc-url <CUSTOM_RPC_ENDPOINT>]

--program-id: The Solana program ID to fetch.
--out-dir: Directory where the bytecode file will be saved (as fetched_program.so).
--rpc-url: (Optional) Custom Solana RPC endpoint. Defaults to https://api.mainnet-beta.solana.com.

Behavior

Creates the output directory if it does not exist.
Checks that the account exists on-chain and whether it is executable.
Writes the bytecode (or raw account data) to the specified directory.
Logs the output file path and the RPC endpoint used, noting when the default is applied.
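A fetch like this maps naturally onto the standard Solana JSON-RPC method `getAccountInfo`. A minimal sketch, assuming that method with `base64` encoding (fetcher's exact request is not documented here); the helper names are hypothetical, and the sketch only builds the request and decodes a response so it runs offline:

```python
import base64
import json
from typing import Any, Dict, Optional, Tuple

DEFAULT_RPC = "https://api.mainnet-beta.solana.com"


def account_info_request(address: str) -> bytes:
    """JSON-RPC body for getAccountInfo with base64-encoded account data."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "getAccountInfo",
        "params": [address, {"encoding": "base64"}],
    }
    return json.dumps(payload).encode()


def decode_account_info(response: Dict[str, Any]) -> Optional[Tuple[bool, bytes]]:
    """Return (executable, raw data), or None when the account does not exist."""
    value = response["result"]["value"]
    if value is None:
        return None
    encoded, encoding = value["data"]
    if encoding != "base64":
        raise ValueError(f"unexpected encoding: {encoding}")
    return value["executable"], base64.b64decode(encoded)
```

To send it, POST the body to `DEFAULT_RPC` with `urllib.request.Request(DEFAULT_RPC, data=..., headers={"Content-Type": "application/json"})` and pass `json.load(urllib.request.urlopen(req))` to `decode_account_info`.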

Example

cargo run -- fetcher \
  --program-id srmqPvymJeFKQ4zGQed1GFppgkRHL9kaELCbyksJtPX \
  --out-dir ./out

This will fetch the bytecode of the program and save it to ./out/fetched_program.so.

How does it work?

Data Accounts vs Executable Accounts

On Solana, every account is just a blob of bytes, but the runtime sets one special flag:

| Flag `executable` | Typical content | File saved by fetcher |
| --- | --- | --- |
| `true` | BPF bytecode of a program (optionally behind an Upgradeable Loader “Program → ProgramData” indirection). | `fetched_program.so` |
| `false` | Arbitrary user-defined state: SPL token mints, AMM pools, governance realms, Anchor structs, sysvars, … | `fetched_account.bin` |

fetcher detects this flag automatically:

If the account is executable it resolves the ProgramData pointer (when present), trims everything before the ELF header, then writes a clean shared object.
Otherwise it dumps the raw data unchanged.
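For upgradeable programs, the Program account only stores a pointer to a separate ProgramData account, which holds a short metadata header followed by the ELF. A minimal sketch of both steps, assuming the BPF Upgradeable Loader's bincode layout (a little-endian `u32` variant tag, where 2 = Program) and trimming by searching for the ELF magic as described above; the helper names are hypothetical:

```python
import struct
from typing import Optional

PROGRAM_TAG = 2  # UpgradeableLoaderState::Program { programdata_address }
ELF_MAGIC = b"\x7fELF"


def programdata_address(program_account: bytes) -> Optional[bytes]:
    """Return the 32-byte ProgramData pubkey stored in an upgradeable Program account."""
    if len(program_account) < 36:
        return None
    (tag,) = struct.unpack_from("<I", program_account, 0)
    return program_account[4:36] if tag == PROGRAM_TAG else None


def trim_to_elf(programdata_account: bytes) -> Optional[bytes]:
    """Drop the loader metadata (tag, slot, upgrade authority) in front of the ELF."""
    start = programdata_account.find(ELF_MAGIC)
    return programdata_account[start:] if start != -1 else None
```

The returned 32 bytes are the raw pubkey; they would still need base58 encoding before being passed to an RPC call.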

If you want to look at the code, there are unit tests that illustrate both paths:

test_fetch_executable fetches the Serum DEX v3 program and asserts the ELF header is present.
test_fetch_non_executable_sysvar fetches the Sysvar Rent account and checks its 17-byte layout.
test_anchor_discriminator_for_onchain_account_info fetches the Marinade State PDA (a non-executable Anchor account) and verifies its first 8 bytes match the Anchor discriminator for the struct State.
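The 17 bytes checked for the Rent sysvar follow from its three fields. A minimal sketch decoding them, assuming the standard `Rent` layout (`u64` lamports per byte-year, `f64` exemption threshold, `u8` burn percent, little-endian without padding); `decode_rent` is a hypothetical helper:

```python
import struct
from typing import NamedTuple

RENT_FORMAT = "<QdB"  # u64, f64, u8; '<' disables padding, so 8 + 8 + 1 = 17 bytes


class Rent(NamedTuple):
    lamports_per_byte_year: int
    exemption_threshold: float
    burn_percent: int


def decode_rent(data: bytes) -> Rent:
    """Decode a fetched_account.bin snapshot of the Rent sysvar."""
    if len(data) != struct.calcsize(RENT_FORMAT):
        raise ValueError(f"expected 17 bytes, got {len(data)}")
    return Rent(*struct.unpack(RENT_FORMAT, data))
```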

Why the first 8 bytes matter (Anchor discriminator)

Anchor-based programs prefix every account's data with an 8-byte discriminator:

discriminator = sha256("account:<StructName>")[..8]

Those 8 bytes uniquely identify the struct on-chain. fetcher already prints them for any data account it downloads. In a future version we’ll reverse-map the discriminator to the struct name whenever the hash matches a known Anchor IDL, giving you an instant hint such as:

[fetcher] First 8 bytes (possible Anchor discriminator): 0xd8926b... -> looks like "State" struct name

This automatic recognition will only be possible for accounts that follow the Anchor convention; plain Borsh-only projects will continue to appear as raw bytes.
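Both the discriminator formula and the planned reverse lookup can be sketched with the standard library. A minimal sketch, not fetcher's implementation, assuming the candidate struct names come from an Anchor IDL you already have; the helper names are hypothetical:

```python
import hashlib
from typing import Dict, Iterable, Optional


def account_discriminator(struct_name: str) -> bytes:
    """First 8 bytes of sha256("account:<StructName>")."""
    return hashlib.sha256(f"account:{struct_name}".encode()).digest()[:8]


def guess_account_struct(data: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Match an account's first 8 bytes against known struct names."""
    table: Dict[bytes, str] = {account_discriminator(n): n for n in candidates}
    return table.get(data[:8])
```

Candidate names could come from an IDL's account list, e.g. `[a["name"] for a in idl["accounts"]]`.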

With these distinctions in mind you can:

Pull down byte-code for offline disassembly (.so).
Snapshot any on-chain state for local inspection or unit-test fixtures (.bin).
Potentially confirm whether a PDA is an Anchor account and which struct it represents. (WIP)

`reverse` Command

The reverse command performs static reverse engineering on compiled Solana eBPF bytecode (.so files).
It supports disassembly, control flow graph (CFG) generation, and immediate data inspection.

Usage

cargo run -- reverse \
  --mode both \
  --out-dir ./out/ \
  --bytecodes-file ./bytecodes/program.so \
  --labeling \
  --reduced \
  --only-entrypoint

Arguments:

--mode: Output mode. One of:
- disass: Disassemble the bytecode
- cfg: Export control flow graph
- both: Disassemble and export CFG
--out-dir: Output directory for result files.
--bytecodes-file: Path to the compiled .so file.
--labeling: Enables use of symbol and section labels when available.
--reduced: (Optional) Excludes functions defined before the entrypoint (often library or startup code).
--only-entrypoint: (Optional) Only generates the CFG for the entrypoint function, allowing custom extension via dotting.

Modes

| Mode | Description | Output Files |
| --- | --- | --- |
| `disass` | Disassembles bytecode and extracts immediates | `disassembly.out`, `immediate_data_table.out` |
| `cfg` | Builds a `.dot` graph from instruction flow | `cfg.dot` |
| `both` | Performs both operations | All of the above |

Output Files

Depending on the selected mode and options, the following files may be generated in --out-dir:

disassembly.out: Human-readable disassembly of eBPF instructions
immediate_data_table.out: Table of .rodata strings and constants
cfg.dot: Full control flow graph

You can visualize .dot files using tools like:

dot -Tpng cfg.dot -o cfg.png
xdot cfg.dot

⚠️ For very large programs, even the --reduced version of the CFG can take significant time to generate due to the size and complexity of the bytecode being analyzed and rendered by dot.

Example

cargo run -- reverse \
  --mode both \
  --out-dir test_cases/base_sbf_addition_checker/out1/ \
  --bytecodes-file test_cases/base_sbf_addition_checker/bytecodes/addition_checker.so \
  --labeling \
  --reduced

This command disassembles the program and generates the reduced CFG.

Advanced Use Case

If using --only-entrypoint, sol-azy will generate a minimal CFG with only the entrypoint's subgraph. You can later extend this graph manually using dotting with a JSON list of function clusters to add.

`ast-utils` Command

The ast-utils command generates and displays the Abstract Syntax Tree (AST) representation of Rust source files in JSON format.
It uses the syn crate to parse Rust code and syn-serde for JSON serialization.

Usage

cargo run -- ast-utils --file-path ./src/main.rs

Arguments:

--file-path (or -f): Path to the Rust source file to parse.

Behavior

The tool performs the following operations:

File Reading: Reads the specified Rust source file
AST Parsing: Uses syn::parse_file() to generate the AST
JSON Output: Converts the AST to pretty-printed JSON using syn-serde

The output includes detailed structural information about:

Function definitions
Struct and enum declarations
Import statements
Type definitions
Expression trees
And all other Rust language constructs

Output

The command outputs a JSON representation of the AST directly to stdout. The JSON contains:

Structural Information: Complete syntax tree with all language constructs
Pretty Formatting: Human-readable JSON with proper indentation
Comprehensive Details: All tokens, spans, and syntactic elements

Example

cargo run -- ast-utils --file-path examples/simple_program.rs

This would output something like:

{
  "shebang": null,
  "attrs": [],
  "items": [
    {
      "Fn": {
        "attrs": [],
        "vis": {
          "Public": {
            "pub_token": {
              "span": {
                "start": 0,
                "end": 3
              }
            }
          }
        },
        "sig": {
          "constness": null,
          "asyncness": null,
          "unsafety": null,
          "abi": null,
          "fn_token": {
            "span": {
              "start": 4,
              "end": 6
            }
          },
          "ident": "main",
          // ... more AST structure
        }
      }
    }
  ]
}
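Because the output is plain JSON, it can be post-processed with ordinary JSON tooling. A minimal sketch that lists top-level function names, assuming the `items` / `Fn` / `sig` / `ident` shape shown in the example and a stdout that contains only the JSON; other item kinds are ignored, and `top_level_fn_names` is a hypothetical helper:

```python
import json
from typing import List


def top_level_fn_names(ast_json: str) -> List[str]:
    """Return the names of top-level `fn` items from ast-utils JSON output."""
    tree = json.loads(ast_json)
    names = []
    for item in tree.get("items", []):
        fn = item.get("Fn")
        if fn is not None:
            names.append(fn["sig"]["ident"])
    return names
```

Usage: redirect `cargo run --release -- ast-utils --file-path examples/simple_program.rs` into `ast.json`, then call `top_level_fn_names(open("ast.json").read())`.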

Use Cases

Code Analysis: Understanding the structure of Rust source code
Tooling Development: Building custom analysis tools that work with Rust AST
Educational: Learning about Rust's syntax tree representation
Debugging: Inspecting how syn parses your code

SAST — Static analysis that operates on similar AST structures
Build — Compiles the source files that can be analyzed with ast-utils

`recap`: What it produces and how to read it

The recap module builds an audit-friendly snapshot of each program in an Anchor project. For every instruction, it emits a compact Markdown table with six columns:

| Instruction | Signers | Writable | Constrained | Seeded | Memory |

This section explains what each column means, how values are derived, and how to interpret the tags you’ll see inside Constrained and Memory.


Limitations

Anchor-focused: uses Anchor IDLs and attributes. Native Rust/Shank not (yet) covered.
Heuristic parsing: attributes are parsed from source text; exotic macro expansions or unusual patterns may not be detected.
File layout: analysis aggregates src/*.rs; complex module layouts may occasionally hide or duplicate definitions until deeper parsing is added.

Columns

This page explains what each column means.

Details

Instruction

The instruction name as declared in the IDL (e.g., distribute_v0, initialize_fanout_v0).
Source of truth: IDL.

Signers

Accounts that must sign the transaction for this instruction.
Comes from the IDL flags (signer / isSigner).
Audit cues:
- Unexpected sensitive signers (e.g., PDAs showing up as signers) are red flags.
- Privileged-path mapping: confirm that “owner” or “admin” signers align with the spec.

Writable

Accounts marked as writable/mutable.
Comes from the IDL flags (writable / isMut), flattened even if IDL uses nested account groups.
Audit cues:
- Writable + signer is a powerful combo; ensure it’s necessary.
- High-value state (treasuries, config, counters) being writable should be justified.
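Both the Signers and Writable columns can be derived from an instruction's IDL entry by walking nested account groups down to their leaves. A minimal sketch, not sol-azy's `flatten_accounts`: it accepts both the older (`isMut` / `isSigner`) and newer (`writable` / `signer`) flag spellings, treats any entry with its own `accounts` list as a group, and sorts names as the example tables do; the helper names are hypothetical:

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_leaf_accounts(accounts: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield leaf accounts, descending into nested account groups."""
    for account in accounts:
        if "accounts" in account:  # composite group (nested Accounts struct)
            yield from iter_leaf_accounts(account["accounts"])
        else:
            yield account


def signers_and_writables(instruction: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Signers and Writable column values for one IDL instruction."""
    signers, writables = set(), set()
    for account in iter_leaf_accounts(instruction.get("accounts", [])):
        if account.get("signer") or account.get("isSigner"):
            signers.add(account["name"])
        if account.get("writable") or account.get("isMut"):
            writables.add(account["name"])
    return sorted(signers), sorted(writables)
```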

Constrained

Fields from the #[account(...)] attributes that impose relationships or safety checks.
Shows as field(tag1,tag2,...), separated by semicolons for multiple fields.
Examples:
- fanout(has_one) — field has a has_one = ... constraint
- receipt_account(constraint,spl) — field has an explicit constraint = ... and SPL helper constraints
- sysvar_instructions(address) — explicit address = ... on a sysvar
Audit cues:
- More constraints generally means tighter coupling and safer invariants.
- Match constrained fields to business rules (who should “own” what, which mint pairs with which ATA, etc.).
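The `field(tag1,tag2); ...` rendering can be reproduced from a field-to-tags mapping. A minimal sketch, assuming fields are sorted by name and tags keep their detection order, which agrees with the example rows later on this page; `format_constrained` is a hypothetical name:

```python
from typing import Dict, List


def format_constrained(tags_by_field: Dict[str, List[str]]) -> str:
    """Render {'fanout': ['has_one']} as 'fanout(has_one)', joined with '; '."""
    return "; ".join(
        f"{field}({','.join(tags)})"
        for field, tags in sorted(tags_by_field.items())
        if tags  # fields without any tag are omitted
    )
```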

Seeded

Accounts that are Program-Derived Addresses (PDAs) with explicit seeds = [...] in attributes.
Lists the field names (e.g., voucher, collection, lazy_signer).
Audit cues:
- PDA derivations reveal authorization schemes. Verify that seeds include both program constants and caller-specific data where appropriate.
- Cross-check against CPIs that expect a signer PDA derived by these seeds.

Memory

Per-field memory management annotations found in #[account(...)]:
- space — hardcoded allocation size at init
- realloc — post-init reallocation
- realloc::zero — reallocation with zero-init
Shows as field(space) / field(realloc) / field(realloc::zero).
Audit cues:
- space: confirm it matches the struct + any dynamic payload; off-by-one or growth vectors matter.
- realloc: ensure rent, zeroing, and access control around growth are handled to avoid state smuggling.
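For the `space` check, the expected value for a fixed-layout Anchor account is the 8-byte discriminator plus the Borsh size of each field. A worked sketch with a hypothetical `Registry` struct, using standard Borsh sizes (`Pubkey` 32, `u64` 8, `bool` 1, `u8` 1, `Vec<T>` a 4-byte length prefix plus its maximum elements, `Option<T>` a 1-byte tag plus the size of `T`):

```python
DISCRIMINATOR = 8
PUBKEY, U64, BOOL, U8 = 32, 8, 1, 1
VEC_PREFIX = 4  # Borsh u32 length prefix for Vec<T> and String
OPTION_TAG = 1  # Borsh Option<T> tag byte


def vec_space(max_len: int, elem_size: int) -> int:
    """Bytes reserved for a Vec<T> capped at max_len elements."""
    return VEC_PREFIX + max_len * elem_size


# Hypothetical account:
#   pub struct Registry { authority: Pubkey, count: u64, paused: bool,
#                         delegate: Option<Pubkey>, members: Vec<Pubkey> /* max 10 */, bump: u8 }
REGISTRY_SPACE = (
    DISCRIMINATOR
    + PUBKEY                 # authority
    + U64                    # count
    + BOOL                   # paused
    + OPTION_TAG + PUBKEY    # delegate
    + vec_space(10, PUBKEY)  # members, capped at 10 entries
    + U8                     # bump
)
```

Here `REGISTRY_SPACE` is 407; a smaller `space = ...` in the attribute would mean the account cannot hold a full `members` list.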

How values are derived (at a glance)

Signers / Writable: read from the IDL per instruction (supports nested account groups via flattening).
Constrained: parsed from #[account(...)] attributes on the associated #[derive(Accounts)] struct for the instruction:
- has_one, address, constraint
- owner constraint
- SPL helpers (grouped as spl)
Seeded: presence of seeds = [ ... ] in the same attributes.
Memory: presence of space = ..., realloc = ..., or realloc::zero.

Note: constraints are attached to the field names, and only shown for fields that also appear in the instruction’s IDL account list (to avoid surfacing unrelated context fields).
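The attribute scan can be approximated with a few regular expressions over the text inside one `#[account(...)]`. A minimal sketch, not sol-azy's parser: the patterns, the `=(?!=)` guard against matching `==` inside predicates, the tag order (chosen to agree with the example rows), and `classify_account_attr` itself are assumptions:

```python
import re
from typing import Dict, List, Tuple

# `=(?!=)` avoids matching comparisons such as `a.owner == b.key()`.
CONSTRAINT_PATTERNS: List[Tuple[str, str]] = [
    ("has_one", r"\bhas_one\s*=(?!=)"),
    ("address", r"\baddress\s*=(?!=)"),
    ("owner", r"\bowner\s*=(?!=)"),
    ("constraint", r"\bconstraints?\s*=(?!=)"),
    ("spl", r"\b(?:associated_token|token|mint)::\w+\s*=(?!=)"),
]
MEMORY_PATTERNS: List[Tuple[str, str]] = [
    ("space", r"\bspace\s*=(?!=)"),
    ("realloc", r"\brealloc\s*=(?!=)"),
    ("realloc::zero", r"\brealloc::zero\b"),
]
SEEDS = re.compile(r"\bseeds\s*=\s*\[")


def classify_account_attr(attr: str) -> Dict[str, object]:
    """Tag one #[account(...)] body with Constrained, Seeded and Memory markers."""
    return {
        "constrained": [tag for tag, pat in CONSTRAINT_PATTERNS if re.search(pat, attr)],
        "seeded": SEEDS.search(attr) is not None,
        "memory": [tag for tag, pat in MEMORY_PATTERNS if re.search(pat, attr)],
    }
```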

Constraint & Helper Tags

Inside the Constrained column, tags are attached to each field name to indicate which checks or helper macros apply. These are the tags you’ll see:

- has_one: from has_one = <field> constraints. Ensures the current account’s stored pubkey equals another account’s key.
  Use: ownership/binding check.
- address: from address = <expr> constraints. Pins the account to a specific known address (often sysvars).
  Use: ensure the provided account is exactly the expected address.
- constraint: from constraint = <predicate>. Free-form runtime conditions the program checks before continuing.
  Use: anything from balance thresholds to relationship checks not covered by has_one.
- owner: from owner = <Program> constraints in the attribute. Enforces the owning program of an account (e.g., the Token Program).
  Use: ensure the account is controlled by the right program (critical for token/state integrity).
- spl: detected when SPL helper attributes are present on the field, e.g.:
  - associated_token::mint = ...
  - associated_token::authority = ...
  - token::mint = ..., token::authority = ...
  - mint::authority = ..., mint::freeze_authority = ..., mint::decimals = ...
  Use: these helpers encode common SPL invariants; the presence of spl implies ATAs and token relationships are checked declaratively.

Formatting reminder: field(tag1,tag2) indicates multiple constraint types apply to the same field.


Tips & Example

Reading the table like an Auditor

Use columns together to quickly spot intent and risk. All facts below come directly from the table; any inference is explicitly marked.

Privilege map
- Cross-check Signers vs Writable. A signer writing to critical state is normal, but deserves a closer look.
- If an instruction has many writables but few constraints, flag for manual review.
Authorization evidence
- In Constrained, the presence of has_one, address, constraint (or spl shorthands) indicates binding between accounts.
- If a Writable account has no has_one/address/constraint and is not Seeded, that’s a probable weak binding.
Seed usage
- Seeded lists PDAs with seeds = [...].
- If a Seeded PDA is also used as a signer in CPIs (you’ll only see this by reading code; recap itself doesn’t parse CPI signers), it’s likely the program signs with with_signer; verify the seeds match the intended authority scope.
SPL wiring
- spl in Constrained tells you the field uses SPL shorthands (e.g., associated_token::mint, token::authority, mint::authority).
- This usually reduces misbinding risk, but still review who controls those authorities.
Memory lifecycle
- Memory shows space, realloc, realloc::zero on fields. Any realloc likely implies the account can change size; verify who can trigger it and whether zeroing/rent logic is present in code.

The table is a map, not the territory. Treat it as a starting shortlist for manual code review.

Examples (based on truncated output for the Helium protocol)

`fanout` (excerpt)


| Instruction   | Signers                  | Writable                                                                                                                  | Constrained                                                                                        | Seeded                                                                            | Memory         |
| ------------- | ------------------------ | ------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- | -------------- |
| distribute_v0 | payer                    | fanout, payer, to_account, token_account, voucher                                                                         | fanout(has_one); receipt_account(constraint,spl); to_account(spl); voucher(has_one)                | voucher                                                                           | —              |
| stake_v0      | payer, staker            | collection_metadata, fanout, from_account, master_edition, metadata, mint, payer, receipt_account, stake_account, voucher | fanout(has_one); from_account(spl); mint(constraint,spl); receipt_account(spl); stake_account(spl) | collection_master_edition, collection_metadata, master_edition, metadata, voucher | voucher(space) |

How to read (facts + cautious inferences):

stake_v0:
- Fact: there are two signers (payer, staker).
- Fact: several token-related writables use spl shorthands (from_account, stake_account, receipt_account, mint).
- Fact: fanout shows has_one; multiple PDAs appear in Seeded (e.g., voucher, collection-related PDAs).
- Likely implication: the action depends on both user funding (payer) and staking authority (staker). Verify in code which signer is actually used for authority gates and CPIs; ensure the SPL account authorities match the intended signer(s).
distribute_v0 (for contrast):
- Fact: single signer (payer) with SPL-bound accounts and voucher(has_one).
- Likely implication: distribution flow is scoped via has_one and SPL wiring; review any CPI that signs with program PDAs to confirm seed coverage.

`lazy_transactions` (excerpt)


| Instruction                     | Signers | Writable                                                     | Constrained                                                       | Seeded             | Memory                   |
| ------------------------------- | ------- | ------------------------------------------------------------ | ----------------------------------------------------------------- | ------------------ | ------------------------ |
| initialize_lazy_transactions_v0 | payer   | canopy, executed_transactions, lazy_transactions, payer      | canopy(owner,constraint); executed_transactions(owner,constraint) | lazy_transactions  | lazy_transactions(space) |
| execute_transaction_v0          | payer   | executed_transactions, lazy_signer, lazy_transactions, payer | block(constraint); lazy_transactions(has_one,constraint)          | block, lazy_signer | —                        |

How to read (facts + cautious inferences):

initialize_lazy_transactions_v0:
- Fact: lazy_transactions is Seeded and allocated (space).
- Likely implication: central state initialized here; check later instructions for writes or possible size changes.
execute_transaction_v0:
- Fact: lazy_transactions carries both has_one and constraint.
- Likely implication: strong binding; still verify constraint logic (e.g., balance-vs-amount checks, signer-vs-owner checks).

Practical Tips for Security Researchers

Writables without bindings: For each row, compare Writable with Constrained/Seeded. If an account is writable but neither seeded nor constrained, mark it for manual review (probable weak binding).
Signer misuse: If a Signer is present but the writable target has no binding, watch for confused-deputy style flows (probable mis-authorization).
SPL edges: spl tags are good signs, but confirm that the authority in token::authority / mint::authority is the intended one (not user-controlled across instructions).
Memory growth: Any realloc in Memory should come with strong constraints (has_one, address or a predicate constraint). If the table shows none, the probable risks are size-controlled DoS or storage smuggling.
PDA spoofing attempts: For any Seeded PDA that later signs in CPIs (you’ll see it in code, not the table), try seed-collision or context-mismatch angles (e.g., same seeds across tenants). If seeds include user-controlled data, test cross-instance manipulation.
Token misrouting: When Writable includes token accounts with neither spl nor has_one, try redirecting deposits or withdrawals to user-owned ATAs, if the remaining constraints allow it.
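The first tip ("Writables without bindings") is mechanical enough to script. Below is a minimal Python sketch, assuming recap rows are available as dicts keyed by the column names and formatted like the excerpt tables above (comma-separated names, `name(tag,tag)` entries separated by `;`, and `—` for empty cells). The helper names are hypothetical and not part of sol-azy.

```python
def split_names(cell):
    """Split a comma-separated recap cell; an empty cell or "—" means no entries."""
    cell = cell.strip()
    if cell in ("", "—"):
        return []
    return [name.strip() for name in cell.split(",") if name.strip()]


def parse_constrained(cell):
    """Parse "fanout(has_one); mint(constraint,spl)" into {"fanout": ["has_one"], ...}."""
    cell = cell.strip()
    result = {}
    if cell in ("", "—"):
        return result
    for entry in cell.split(";"):
        name, _, tags = entry.strip().partition("(")
        result[name.strip()] = [t.strip() for t in tags.rstrip(")").split(",") if t.strip()]
    return result


def unbound_writables(row, skip_signers=False):
    """Writable accounts that appear in neither Constrained nor Seeded."""
    constrained = parse_constrained(row["Constrained"])
    seeded = set(split_names(row["Seeded"]))
    signers = set(split_names(row["Signers"]))
    flagged = []
    for account in split_names(row["Writable"]):
        if account in constrained or account in seeded:
            continue  # some binding is visible in the table
        if skip_signers and account in signers:
            continue  # e.g. a payer account that funds fees or rent
        flagged.append(account)
    return flagged


# Row copied from the fanout excerpt above.
distribute_v0 = {
    "Signers": "payer",
    "Writable": "fanout, payer, to_account, token_account, voucher",
    "Constrained": "fanout(has_one); receipt_account(constraint,spl); to_account(spl); voucher(has_one)",
    "Seeded": "voucher",
}

print(unbound_writables(distribute_v0))                     # ['payer', 'token_account']
print(unbound_writables(distribute_v0, skip_signers=True))  # ['token_account']
```

For distribute_v0, token_account is the only non-signer writable with no visible binding in the excerpt, so it is where manual review would start. It may well be validated in code in a way the table cannot show; the script only builds the shortlist.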

Static Analysis

sol-azy includes a flexible static analysis engine designed to scan Solana Rust source code for program vulnerabilities, code smells, or user-defined patterns.

This engine leverages the Starlark language to express detection logic in .star files, and operates directly on the parsed Rust Abstract Syntax Tree (AST).

Key Concepts

AST-Based: Operates purely on the Rust syntax tree using the syn crate; no type inference or semantic resolution is performed. (We're working on a future MIR...)
Declarative Rules: Users write .star scripts to describe what they want to detect.
Safe & Sandboxed: Rules are evaluated inside a restricted Starlark runtime.

Rule Engine Capabilities

The rule engine gives you access to:

Node inspection (e.g. calls, structs, attributes, visibility)
Parent-child relationships in AST
Span and file location tracking
Metadata enrichment (severity, certainty, etc.)
JSON-compatible output for integration

Use Cases

Anchor account declaration validation
Detection of unsafe CPI (Cross Program Invocation)
Missing signer or owner checks
Misuse of invoke_signed or unchecked sysvars
Custom security checks during CI


Note

The SAST engine core in sol-azy is based on the excellent open-source project
radar by Auditware.

We were heavily inspired by their approach and wanted a standalone binary with the same capabilities.

Rule Format

sol-azy allows developers and auditors to write custom static analysis rules using the Starlark language, a Python-like configuration language used by projects like Bazel and Buck/Buck2.

These rules are evaluated against the Rust AST (Abstract Syntax Tree) of a Solana program, enabling precise pattern matching to detect vulnerabilities or code smells.

Rule File Structure

A valid rule file is a .star script containing two main parts:

RULE_METADATA — a dictionary with basic info
syn_ast_rule(root) — the entrypoint function, called on each parsed file

RULE_METADATA = {
    "version": "0.1.0",
    "author": "your-name",
    "name": "Rule Name",
    "severity": "Medium",   # one of "Low", "Medium", "High", "Critical"
    "certainty": "Medium",  # one of "Low", "Medium", "High"
    "description": "What the rule checks for",
}

Example Rule: Arbitrary CPI

RULE_METADATA = {
    "version": "0.1.0",
    "author": "forefy",
    "name": "Arbitrary Cross-Program Invocation",
    "severity": "Medium",
    "certainty": "Medium",
    "description": "Detects CPIs made to arbitrary or unchecked program IDs."
}

def syn_ast_rule(root: dict) -> list[dict]:
    matches = []
    # Each raw node is one function, analyzed independently.
    raw_nodes = syn_ast.find_raw_nodes(root)
    for sink in raw_nodes:
        # Flag functions that call solana_program::program::invoke without the program-ID check template.
        if (template_manager.is_matching_template_by_key(sink, "CALL_FN_SOLANAPROGRAM_PROGRAM_INVOKE")
                and not template_manager.is_matching_template_by_key(sink, "CHECK_SPLTOKEN_ID_CTX_ACCOUNT_AUTHORITY_KEY")):
            matches.append(syn_ast.to_result(sink))
    return matches

Execution Flow

When sol-azy runs a rule:

It parses the source code into an AST
Converts it to JSON
Passes it as the root parameter to syn_ast_rule
The rule inspects the tree using helper functions (typically syn_ast.find_raw_nodes(), used here to gather each function independently)
Any result added to matches is reported
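To make the flow concrete, here is a small Python sketch of a driver that hands each file's AST to a rule as plain JSON data and collects the findings. Both the driver and the toy rule are hypothetical, and the AST shape (`items`, `fn`, `ident`, `calls`) is a simplified stand-in for the real serialized syn tree, not sol-azy's format.

```python
import json


def run_rule(syn_ast_rule, parsed_files):
    """Hypothetical driver: give each file's AST to the rule as plain JSON data and collect matches."""
    findings = []
    for path, ast in parsed_files.items():
        root = json.loads(json.dumps(ast))  # JSON round-trip: the rule only sees dicts/lists/strings
        for match in syn_ast_rule(root):
            findings.append({"file": path, **match})
    return findings


def toy_rule(root):
    """Toy rule over a simplified AST: report functions whose body calls `invoke`."""
    matches = []
    for item in root.get("items", []):
        fn = item.get("fn")
        if fn is not None and "invoke" in fn.get("calls", []):
            matches.append({"ident": fn["ident"]})
    return matches


files = {
    "src/lib.rs": {
        "items": [
            {"fn": {"ident": "cpi_insecure", "calls": ["invoke"]}},
            {"fn": {"ident": "helper", "calls": []}},
        ]
    },
}

print(run_rule(toy_rule, files))  # [{'file': 'src/lib.rs', 'ident': 'cpi_insecure'}]
```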

Helper Libraries

sol-azy ships with built-in helper libraries (written in Starlark):

src/static/starlark_libs/
├── syn_ast.star            # AST navigation utilities
└── template_manager.star   # Match against common templates

These can be imported and used in any rule. Examples:

raw_nodes = syn_ast.find_raw_nodes(root)
template_manager.is_matching_template_by_key(node, "CHECK_INSTRUCTION_DISCRIMINATOR")

📌 Note: The template_manager logic enables reusable pattern detection (documented in Templates).

Writing New Rules

To create a new rule:

Create a .star file in your rules directory
Define RULE_METADATA and syn_ast_rule(...)
Use cargo run -- sast ... to apply the rule

Documentation

What Are Templates?

Templates in sol-azy are reusable pattern matchers designed to identify specific AST fragments in Rust source code. They allow users to describe common logic constructs in a simple, declarative way, and can be used as building blocks in Starlark rules or during static pattern matching.

They are especially useful when:

A pattern appears frequently (e.g., ctx.accounts.authority.is_signer)
You want to simplify rule definitions by abstracting repetitive AST shapes

Template Anatomy

Each template includes:

A pattern: describing a shape to find in the AST, using simplified idents (Rust path segments)
A priority_rule: used to guide traversal and maintain node order consistency

Example Template

TEMPLATES["CHECK_CTX_ACCOUNT_AUTHORITY_KEY_TOKEN_OWNER"] = {
    "pattern": {
        "cond": {
            "binary": {
                "left": {"idents": ["ctx", "accounts", "authority", "key"]},
                "op": "!=",
                "right": {"idents": ["token", "owner"]},
            }
        }
    },
    "priority_rule": ["left", "op", "right"],
}

This matches AST code like:

if ctx.accounts.authority.key != &token.owner {
    // the template only constrains the condition, not this body
}

Each template describes a shallow structural pattern over AST nodes, at most 3 levels deep.

So far, this depth has proven sufficient; deeper recursive pattern support could be added directly to the template_manager.star logic if needed.

Fields Used

`idents`

Lists of identifier segments (e.g., ["ctx", "accounts", "authority", "key"]) that are matched in order, exactly.

`method`

Optional field for method calls like .key().

`op`

Operator such as "==" or "!=".

`macro`, `call`, `unary`, `binary`, `field`

These correspond to the Rust AST node types (extracted from syn) that can be matched.

`priority_rule`

The priority_rule defines the traversal order of keys inside a pattern node. It ensures that, during linearization of the AST, the relevant fields are matched in the correct order, especially in constructs like:

"priority_rule": ["left", "op", "right"]

This guarantees consistent matching across pattern instances.
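The effect of priority_rule can be illustrated with a hypothetical Python linearizer: it flattens a pattern into (key path, leaf) pairs, visiting keys in priority_rule order, so two patterns that differ only in dict key order produce the same sequence. Ordering unlisted keys alphabetically is an assumption of this sketch; the real linearization in template_manager.star may differ.

```python
def linearize(pattern, priority_rule):
    """Flatten a template pattern into (key_path, leaf) pairs in a deterministic order.

    Keys named in priority_rule come first, in that order; any other keys follow
    in sorted order (an assumption made for this sketch).
    """
    out = []

    def ordered(keys):
        ranked = [k for k in priority_rule if k in keys]
        return ranked + sorted(k for k in keys if k not in priority_rule)

    def walk(node, path):
        if isinstance(node, dict) and "idents" not in node:
            for key in ordered(list(node.keys())):
                walk(node[key], path + [key])
        elif isinstance(node, dict):
            out.append((".".join(path), ".".join(node["idents"])))  # identifier path leaf
        else:
            out.append((".".join(path), node))  # operator or other scalar leaf

    walk(pattern, [])
    return out


left = {"idents": ["ctx", "accounts", "authority", "key"]}
right = {"idents": ["token", "owner"]}
a = {"cond": {"binary": {"left": left, "op": "!=", "right": right}}}
b = {"cond": {"binary": {"right": right, "op": "!=", "left": left}}}  # same pattern, other key order

print(linearize(a, ["left", "op", "right"]))
# [('cond.binary.left', 'ctx.accounts.authority.key'), ('cond.binary.op', '!='), ('cond.binary.right', 'token.owner')]
print(linearize(a, ["left", "op", "right"]) == linearize(b, ["left", "op", "right"]))  # True
```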

Wildcard Support

You can use a wildcard * in the idents list to match any one identifier. For example:

"idents": ["ctx", "accounts", "*"]

...matches any field under ctx.accounts, such as ctx.accounts.user_a.
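A minimal Python sketch of wildcard matching on idents lists, assuming "*" stands for exactly one segment (so ["ctx", "accounts", "*"] does not match the four-segment ctx.accounts.user_a.key). The function name is hypothetical, not a template_manager helper.

```python
def idents_match(pattern_idents, node_idents):
    """Compare identifier paths segment by segment; "*" stands for exactly one segment."""
    if len(pattern_idents) != len(node_idents):
        return False  # assumption: a wildcard never absorbs several segments
    for expected, actual in zip(pattern_idents, node_idents):
        if expected != "*" and expected != actual:
            return False
    return True


print(idents_match(["ctx", "accounts", "*"], ["ctx", "accounts", "user_a"]))         # True
print(idents_match(["ctx", "accounts", "*"], ["ctx", "accounts", "user_a", "key"]))  # False
print(idents_match(["ctx", "accounts", "*"], ["ctx", "signer", "user_a"]))           # False
```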

Dynamic Template Creation

For convenience, you can generate common templates programmatically. For example:

def generate_call_fn_template(*idents):
    return {
        "pattern": {
            "call": {
                "args": "",  # ignored for now
                "func": {"idents": idents},
            }
        },
        "priority_rule": ["func", "args"],
    }

This allows you to match function calls like:

solana_program::program::invoke(...)

and by using the dynamic generation with generate_call_fn_template("solana_program", "program", "invoke"), you don't have to manually write a full template.
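A short Python sketch (Python shares Starlark's dict and function syntax) showing that a generated template can be identical to the hand-written CALL_FN_SOLANAPROGRAM_PROGRAM_INVOKE template used in the arbitrary-CPI example. This version wraps idents in list(): *idents arrives as a tuple, and a tuple never compares equal to a list, which would only matter if a matcher compared idents with == (an assumption; the real template_manager.star may compare element by element). The CALL_FN_SOLANAPROGRAM_PROGRAM_INVOKESIGNED key is hypothetical.

```python
TEMPLATES = {}

# Hand-written template, as in the arbitrary-CPI example of this documentation.
TEMPLATES["CALL_FN_SOLANAPROGRAM_PROGRAM_INVOKE"] = {
    "pattern": {
        "call": {
            "args": "",
            "func": {"idents": ["solana_program", "program", "invoke"]},
        }
    },
    "priority_rule": ["func", "args"],
}


def generate_call_fn_template(*idents):
    return {
        "pattern": {
            "call": {
                "args": "",  # arguments are ignored for now
                # *idents arrives as a tuple; list() makes it compare equal to hand-written lists
                "func": {"idents": list(idents)},
            }
        },
        "priority_rule": ["func", "args"],
    }


generated = generate_call_fn_template("solana_program", "program", "invoke")
print(generated == TEMPLATES["CALL_FN_SOLANAPROGRAM_PROGRAM_INVOKE"])  # True

# Hypothetical key: register a template for invoke_signed in one line.
TEMPLATES["CALL_FN_SOLANAPROGRAM_PROGRAM_INVOKESIGNED"] = generate_call_fn_template(
    "solana_program", "program", "invoke_signed"
)
```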

Template Testing

The folder test_starlark_condition_template/ contains a test.py script that serves as an example playground for templates.

It defines AST snippets and verifies that each pattern matches correctly:

    # if
    assert is_matching_template_by_key(AST, "CHECK_CTX_ACCOUNT_AUTHORITY_KEY_TOKEN_OWNER")
    assert is_matching_template_by_key(AST2, "CHECK_SPLTOKEN_ID_CTX_ACCOUNT_AUTHORITY_KEY")
    assert is_matching_template_by_key(AST3, "CHECK_NOT_CTX_ACCOUNTS_AUTHORITY_ISSIGNER")
    assert is_matching_template_by_key(AST4, "CHECK_CTX_ACCOUNTS_WILDCARD_KEY_EQ")

    # require
    assert is_matching_template_by_key(AST5, "REQUIRE_CTX_ACCOUNTS_RENT_KEY_SYSVAR_RENT_ID")

    # called function
    assert is_matching_template_by_key(AST2, "CALL_FN_SOLANAPROGRAM_PROGRAM_INVOKE")

    # dynamic template
    assert is_matching_template(AST2, generate_call_fn_template("solana_program", "program", "invoke"))

Summary of Supported Pattern Types (already implemented, easily extended)

Templates can express:

Binary comparisons (==, !=)
Unary operations (!some_flag)
Field access (ctx.accounts.x.is_signer)
Macro calls (require_eq!(...))
Method calls (ctx.accounts.user_a.key())
Function calls (example: solana_program::program::invoke(...))

They support wildcards like "*" to generalize over certain path segments.

Usage in Rules

Templates are often used within .star files like this:

for node in raw_nodes:
    if template_manager.is_matching_template_by_key(node, "CHECK_CTX_ACCOUNT_AUTHORITY_KEY_TOKEN_OWNER"):
        continue  # the check is present, so this node is probably not vulnerable
    matches.append(syn_ast.to_result(node))

This allows auditors or developers to write rules that check for high-level semantic conditions without diving into low-level AST fields every time.

Advantages

🔁 Reusability: templates can be applied across multiple rules
🔍 Precision: match nested expressions (up to 3 levels deep) in a fixed order
🔧 Extensibility: you can write custom templates without editing core logic

Example

🔍 Use Case: Arbitrary CPI Detection with Templates

Consider the following Rust function:

pub fn cpi_insecure(ctx: Context<Cpi>, amount: u64) -> ProgramResult {
    solana_program::program::invoke(
        &spl_token::instruction::transfer(
            ctx.accounts.token_program.key,
            ctx.accounts.source.key,
            ctx.accounts.destination.key,
            ctx.accounts.authority.key,
            &[],
            amount,
        )?,
        &[
            ctx.accounts.source.clone(),
            ctx.accounts.destination.clone(),
            ctx.accounts.authority.clone(),
        ],
    )
}

This function performs a Cross-Program Invocation (CPI) without validating that ctx.accounts.token_program.key matches the expected program (i.e., spl_token::ID). This is exactly the kind of vulnerability we want to detect using a template like this:

TEMPLATES["CALL_FN_SOLANAPROGRAM_PROGRAM_INVOKE"] = {
    "pattern": {
        "call": {
            "args": "",
            "func": {"idents": ["solana_program", "program", "invoke"]},
        }
    },
    "priority_rule": ["func", "args"],
}

What the Template Does

This template matches any call to solana_program::program::invoke(...), regardless of its arguments. It uses the idents field to match the identifier path in the AST and priority_rule to specify the matching order.
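A hypothetical Python sketch of how such a call template could be checked against a node. The node shape `{"call": {"func": {"idents": [...]}, "args": [...]}}` is a simplification invented here, not the real serialized syn node, and this is not the template_manager implementation.

```python
CALL_INVOKE = {
    "pattern": {
        "call": {
            "args": "",
            "func": {"idents": ["solana_program", "program", "invoke"]},
        }
    },
    "priority_rule": ["func", "args"],
}


def matches_call_template(node, template):
    """Match a simplified call node {"call": {"func": {"idents": [...]}, "args": [...]}}.

    Arguments are ignored, mirroring "args": "" in the template; "*" matches one segment.
    """
    call = node.get("call")
    if call is None:
        return False  # not a call expression
    expected = template["pattern"]["call"]["func"]["idents"]
    actual = call.get("func", {}).get("idents", [])
    if len(expected) != len(actual):
        return False
    return all(e == "*" or e == a for e, a in zip(expected, actual))


invoke_call = {"call": {"func": {"idents": ["solana_program", "program", "invoke"]}, "args": ["ix", "accounts"]}}
signed_call = {"call": {"func": {"idents": ["solana_program", "program", "invoke_signed"]}, "args": []}}
field_access = {"field": {"idents": ["ctx", "accounts", "token_program", "key"]}}

print(matches_call_template(invoke_call, CALL_INVOKE))   # True
print(matches_call_template(signed_call, CALL_INVOKE))   # False
print(matches_call_template(field_access, CALL_INVOKE))  # False
```

Because this sketch matches paths syntactically and exactly, a call written as invoke(...) after `use solana_program::program::invoke;` presents different idents and would need its own template.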

Detection

The full test program, which contains both secure and insecure variants, is:

#![allow(unused)]
fn main() {
use anchor_lang::prelude::*;
use anchor_lang::solana_program;

declare_id!("Fg6PaFpoGXkYsidMpWTK6W2BeZ7FEfcYkg476zPFsLnS");

pub mod arbitrary_cpi_secure2 {
    use super::*;
    pub fn cpi_secure2(ctx: Context<Cpi>, amount: u64) -> ProgramResult {
        if &spl_token::ID != ctx.accounts.token_program.key {
            return Err(ProgramError::IncorrectProgramId);
        }
        solana_program::program::invoke(
            &spl_token::instruction::transfer(
                ctx.accounts.token_program.key,
                ctx.accounts.source.key,
                ctx.accounts.destination.key,
                ctx.accounts.authority.key,
                &[],
                amount,
            )?,
            &[
                ctx.accounts.source.clone(),
                ctx.accounts.destination.clone(),
                ctx.accounts.authority.clone(),
            ],
        )
    }

    pub fn cpi_insecure2(ctx: Context<Cpi>, amount: u64) -> ProgramResult {
        solana_program::program::invoke(
            &spl_token::instruction::transfer(
                ctx.accounts.token_program.key,
                ctx.accounts.source.key,
                ctx.accounts.destination.key,
                ctx.accounts.authority.key,
                &[],
                amount,
            )?,
            &[
                ctx.accounts.source.clone(),
                ctx.accounts.destination.clone(),
                ctx.accounts.authority.clone(),
            ],
        )
    }
}

#[program]
pub mod arbitrary_cpi_secure {
    use super::*;

    pub fn cpi_secure(ctx: Context<Cpi>, amount: u64) -> ProgramResult {
        if &spl_token::ID != ctx.accounts.token_program.key {
            return Err(ProgramError::IncorrectProgramId);
        }
        solana_program::program::invoke(
            &spl_token::instruction::transfer(
                ctx.accounts.token_program.key,
                ctx.accounts.source.key,
                ctx.accounts.destination.key,
                ctx.accounts.authority.key,
                &[],
                amount,
            )?,
            &[
                ctx.accounts.source.clone(),
                ctx.accounts.destination.clone(),
                ctx.accounts.authority.clone(),
            ],
        )
    }

    pub fn cpi_insecure(ctx: Context<Cpi>, amount: u64) -> ProgramResult {
        solana_program::program::invoke(
            &spl_token::instruction::transfer(
                ctx.accounts.token_program.key,
                ctx.accounts.source.key,
                ctx.accounts.destination.key,
                ctx.accounts.authority.key,
                &[],
                amount,
            )?,
            &[
                ctx.accounts.source.clone(),
                ctx.accounts.destination.clone(),
                ctx.accounts.authority.clone(),
            ],
        )
    }
}

#[derive(Accounts)]
pub struct Cpi<'info> {
    source: AccountInfo<'info>,
    destination: AccountInfo<'info>,
    authority: AccountInfo<'info>,
    token_program: AccountInfo<'info>,
}
}

When we run:

cargo run --release -- \
  sast \
  --target-dir ../SolanaPlayground/sealevel-attacks/ \
  --rules-dir rules/syn_ast


sol-azy successfully detects three instances of the vulnerability, reported as:

../SolanaPlayground/sealevel-attacks//programs/5-arbitrary-cpi/secure/src/lib.rs:73:11
../SolanaPlayground/sealevel-attacks//programs/5-arbitrary-cpi/secure/src/lib.rs:29:11
../SolanaPlayground/sealevel-attacks//programs/5-arbitrary-cpi/insecure/src/lib.rs:10:11

Two are found in the secure/ module (the source code above), and one in the insecure/ module, which implements the same logic in a separate file containing a single vulnerable function.

Summary

Thanks to this template-based pattern matcher, sol-azy is able to:

Statically identify unvalidated CPI targets,
Highlight affected source locations with precision,
Save analysts from manually combing through source code.

💡 Tip: You can dynamically create templates like this using generate_call_fn_template(...) to reduce duplication. For instance:
is_matching_template(AST, generate_call_fn_template("solana_program", "program", "invoke"))
This matches the same call pattern with minimal boilerplate.

Starlark Libraries References

AST Node Structure

An AST node represents a single element in the Abstract Syntax Tree with a standardized structure. Each node contains: a raw_node with the original AST data, an access_path string representing the node's location in the tree, metadata for additional information like position and mutability flags, children and parent references for tree navigation, ident for the node's identifier, and optional fields like args for function arguments and root to indicate if it's a root node. This consistent structure facilitates tree traversal and pattern matching operations throughout the codebase.

{
    "raw_node": {},   # Original AST data from parser
    "access_path": "",# Path string showing location in tree (e.g. "root.expr.binary.left")
    "metadata": {},   # Additional data like position and mutability flags
    "children": [],   # List of child nodes
    "parent": {},     # Reference to parent node
    "root": False,    # Boolean indicating if this is a root node
    "args": [],       # Function arguments (if applicable)
    "ident": ""       # Node identifier/name
}

Syn AST Utilities

The Syn AST utilities module provides functions for working with Rust's syntactic ASTs, particularly for security analysis.

Core Components

Constants

EMPTY_ACCESS_PATH, EMPTY_IDENT, EMPTY_METADATA, EMPTY_NODE: Default values for empty nodes

Node Management

new_ast_node(syn_ast_node, metadata, access_path): Creates a new AST node
ast_node_add_child(node, child): Adds a child to an AST node
ast_node_add_children(node, children): Adds multiple children to an AST node
to_result(node): Converts a node to a result format
filter_result(result): Filters duplicate results

Tree Traversal

traverse_tree(node, collector): Traverses a tree with a collector function
flatten_tree(root): Flattens a tree into a list of nodes
first(nodes): Returns the first node from a list
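To show how the node fields fit together, here is a Python re-imagining of new_ast_node, ast_node_add_child and flatten_tree. It is not their actual source: the defaults, the ident extraction and the access paths are illustrative assumptions. Traversal uses an explicit stack because the Starlark specification does not allow recursion.

```python
def new_ast_node(raw_node, metadata=None, access_path=""):
    """Build a node with the fields listed in the AST Node Structure section."""
    return {
        "raw_node": raw_node,
        "access_path": access_path,
        "metadata": metadata if metadata is not None else {},
        "children": [],
        "parent": {},
        "root": False,
        "args": [],
        "ident": raw_node.get("ident", ""),
    }


def ast_node_add_child(node, child):
    child["parent"] = node
    node["children"].append(child)
    return node


def flatten_tree(root):
    """Pre-order flattening with an explicit stack instead of recursion."""
    nodes = []
    stack = [root]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(reversed(node["children"]))  # reversed so children come out left to right
    return nodes


fn = new_ast_node({"ident": "cpi_insecure"}, access_path="root")
fn["root"] = True
call = new_ast_node({"ident": "invoke"}, access_path="root.block.stmts")
arg = new_ast_node({"ident": "transfer"}, access_path="root.block.stmts.call.args")
ret = new_ast_node({"ident": "Ok"}, access_path="root.block.stmts")
ast_node_add_child(call, arg)
ast_node_add_child(fn, call)
ast_node_add_child(fn, ret)

print([n["ident"] for n in flatten_tree(fn)])  # ['cpi_insecure', 'invoke', 'transfer', 'Ok']
```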

Node Finding Functions

find_by_child(self, child_ident): Finds nodes by child identifier
find_chained_calls(self, *idents): Finds chained method calls
find_macro_attribute_by_names(self, *idents): Finds macro attributes by name
find_by_similar_access_path(self, access_path, stop_keyword): Finds nodes with similar access paths
find_comparisons(self, ident1, ident2): Finds comparisons between two identifiers
find_comparison_to_any(self, ident): Finds comparisons involving a specific identifier
find_functions_by_names(self, *function_names): Finds functions by name
find_by_names(self, *idents): Finds nodes by identifier names
find_method_calls(self, caller, method): Finds method calls on a specific caller
find_assignments(self, ident, value_ident): Finds assignment operations
find_mutables(self): Finds mutable variables
find_account_typed_nodes(self, ident): Finds account-typed nodes
find_member_accesses(self, ident): Finds member accesses for a specific identifier

AST Preparation

find_ident_src_node(sub_data, sub_access_path, metadata): Finds identifier source nodes
find_fn_names(node): Extracts function names from an AST
find_raw_nodes_by_fn_names(node, func_names): Finds raw nodes by function names
find_raw_nodes(ast): Finds all raw nodes in an AST
prepare_syn_ast(ast, access_path, parent): Prepares a Syn AST for analysis
prepare_ast(ast): Main function to prepare an AST for analysis

Usage Examples

For finding specific code patterns:
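As a hypothetical illustration (in Python, which shares Starlark's syntax), here is what find_by_names and find_comparisons could look like over an already-flattened list of nodes. The real helpers take a prepared tree as self and work on full syn nodes; the node shape below (kind, op, left, right) is invented for this sketch.

```python
def find_by_names(nodes, *idents):
    """Nodes whose ident is one of the given names."""
    wanted = set(idents)
    return [n for n in nodes if n.get("ident") in wanted]


def find_comparisons(nodes, ident1, ident2):
    """Binary ==/!= nodes where one operand mentions ident1 and the other mentions ident2."""
    hits = []
    for n in nodes:
        if n.get("kind") != "binary" or n.get("op") not in ("==", "!="):
            continue
        left, right = n["left"], n["right"]
        if (ident1 in left and ident2 in right) or (ident2 in left and ident1 in right):
            hits.append(n)
    return hits


# Flattened, simplified nodes for the cpi_secure function of the arbitrary-CPI example.
nodes = [
    {"kind": "fn", "ident": "cpi_secure"},
    {"kind": "binary", "ident": "", "op": "!=",
     "left": ["spl_token", "ID"], "right": ["ctx", "accounts", "token_program", "key"]},
    {"kind": "call", "ident": "invoke"},
]

print(len(find_by_names(nodes, "invoke", "invoke_signed")))        # 1
print(len(find_comparisons(nodes, "token_program", "spl_token")))  # 1
print(len(find_comparisons(nodes, "authority", "spl_token")))      # 0
```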

This documentation provides an overview of the functionality available in these Starlark libraries, which are designed for security analysis of Solana programs by detecting specific code patterns in their AST representations.

Reverse Engineering

sol-azy provides a reverse engineering module tailored for Solana programs compiled to eBPF.
It allows you to disassemble .so binaries, extract control flow, and track embedded immediate data.

This tooling is especially useful for:

Security researchers auditing deployed programs
Developers understanding bytecode behavior
Anyone comparing compiled output to source logic

Features

Disassembler: Converts raw bytecode into human-readable SBPF instructions, annotated with Rust-like equivalents
Control Flow Graph: Generates .dot files representing program structure
Immediate Tracker: Resolves strings or data loaded from .rodata

Each of these features is accessible through the reverse CLI command.

Input

The reverse engine operates on compiled Solana .so files, typically generated by:

anchor build
# or
cargo build-sbf

You pass the .so file using --bytecodes-file.

Output

Depending on the selected mode, sol-azy produces one or more of the following:

| File                       | Description                            |
| -------------------------- | -------------------------------------- |
| `disassembly.out`          | Instruction-by-instruction disassembly |
| `immediate_data_table.out` | Extracted strings or data from RODATA  |
| `cfg.dot`                  | Control flow graph (Graphviz-compatible) |

You can visualize cfg.dot with:

dot -Tpng cfg.dot -o cfg.png


Usage Example

cargo run -- reverse \
  --mode both \
  --out-dir ./out/ \
  --bytecodes-file ./bytecodes/program.so \
  --labeling

Compatibility

Supports .so files compiled using Solana's official toolchain
Compatible with both Anchor and native SBF programs
Works on programs targeting solana_rbpf / solana_sbpf

Note

The reverse engineering core in sol-azy is based on the excellent open-source project
sbpf-solana by Anza (anza-xyz).

We have modified and extended its disassembly and control flow analysis logic to better fit sol-azy’s needs, especially for static audits, immediate tracking, and custom export formats.

Reverse Overview

This section explains how sol-azy performs static reverse engineering on Solana programs compiled to SBF.

The reverse module combines disassembly, control flow analysis, and memory inspection, using a customized static analysis engine adapted from sbpf-solana.

How It Works

ELF Parsing

sol-azy loads the .so bytecode using Solana’s Executable abstraction (from solana_rbpf), which parses the ELF and loads its segments (e.g., .text, .rodata).

Instruction Analysis

Using the Analysis struct from sbpf-solana, the tool walks through all valid instruction addresses, building:
- A disassembled instruction list
- Basic block boundaries
- Cross-references and destination mappings

Immediate Tracking

When LD_DW_IMM instructions reference MM_RODATA, sol-azy tries to:
- Interpret the referenced memory slice
- Associate it with a MOV64_IMM or MOV32_IMM defining its length
- Format the result as a printable string (e.g., b"hello world")

Graph Generation

For control flow graphs, each basic block becomes a node in a .dot file, with edges linking jumps, calls, and returns.
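A minimal Python sketch of turning basic blocks and edges into Graphviz source, using instructions from the disassembly example in this documentation. The function name, the block labels entry and win, and the edge kinds are hypothetical; sol-azy's export_cfg_to_dot may lay out nodes and labels differently. Instruction text is escaped (backslashes first, then quotes) because decoded .rodata strings contain both.

```python
def cfg_to_dot(blocks, edges):
    """blocks: {label: [instruction text]}, edges: [(src, dst, kind)] -> Graphviz source."""

    def esc(text):
        return text.replace("\\", "\\\\").replace('"', '\\"')  # backslashes first, then quotes

    lines = ["digraph cfg {", "  node [shape=box, fontname=monospace];"]
    for label, insns in blocks.items():
        # \l ends a left-justified line inside a Graphviz label
        body = "\\l".join(esc(t) for t in [label + ":"] + insns) + "\\l"
        lines.append(f'  {label} [label="{body}"];')
    for src, dst, kind in edges:
        lines.append(f'  {src} -> {dst} [label="{kind}"];')
    lines.append("}")
    return "\n".join(lines)


blocks = {
    "entry": ["jne r2, 1337, lbb_58"],
    "win": ['lddw r1, 0x1000043e0 --> b"You win!"', "mov64 r2, 8", "ja lbb_63"],
    "lbb_58": ['lddw r1, 0x1000043e8 --> b"You lose!"', "mov64 r2, 9"],
}
edges = [
    ("entry", "lbb_58", "jump"),
    ("entry", "win", "fallthrough"),
    ("win", "lbb_63", "jump"),
]
print(cfg_to_dot(blocks, edges))
```

lbb_63 has no block of its own here; Graphviz simply creates an empty node for it. Pipe the output to dot -Tpng to render it.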

Internal Components

ImmediateTracker: Tracks memory ranges referenced by LD_DW_IMM
get_string_repr: Converts slices from .rodata into readable strings
export_cfg_to_dot: Emits Graphviz-compatible control flow graphs
disassemble_wrapper: Main entrypoint for disassembly + data extraction

ReverseOutputMode

The CLI dispatches different logic depending on this enum:

pub enum ReverseOutputMode {
    Disassembly(String),
    ControlFlowGraph(String),
    DisassemblyAndCFG(String),
}

Example Workflow (Recap)

cargo run -- reverse \
  --mode both \
  --out-dir ./out/ \
  --bytecodes-file ./bytecodes/program.so \
  --labeling

Disassembly

sol-azy statically disassembles compiled Solana eBPF programs into a readable, instruction-by-instruction view.
This view is enhanced with immediate data decoding, especially for strings loaded from .rodata.

Overview

The disassembly engine in sol-azy builds upon sbpf-solana's instruction decoder.
It adds layers of audit-focused context by:

Labeling basic blocks (e.g., lbb_42)
Resolving immediate values from .rodata
Emitting annotated output into disassembly.out
Adding Rust-like equivalents of each instruction for readability

Example

Here’s a disassembly snippet produced by sol-azy:

entrypoint:
    mov64 r2, r1                                    r2 = r1
    mov64 r1, r10                                   r1 = r10
    add64 r1, -96                                   r1 += -96   ///  r1 = r1.wrapping_add(-96 as i32 as i64 as u64)
    call function_308                       
    ldxdw r7, [r10-0x48]                    
    ldxdw r8, [r10-0x58]                    
    ldxdw r1, [r10-0x38]                    
    mov64 r2, 8                                     r2 = 8 as i32 as i64 as u64
    jgt r2, r1, lbb_91                              if r2 > r1 { pc += 79 }
    ldxdw r1, [r10-0x40]                    
    ldxw r2, [r1+0x0]                       
    stxw [r10-0xa8], r2                     
    ldxw r1, [r1+0x4]                       
    stxw [r10-0xa4], r1                     
    mov64 r1, 0                                     r1 = 0 as i32 as i64 as u64
    stxdw [r10-0x40], r1                    
    lddw r1, 0x100004610 --> b"\x00\x00\x00\x00\xd0C\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00…        r1 load str located at 4294985232
    stxdw [r10-0x60], r1                    
    mov64 r1, 2                                     r1 = 2 as i32 as i64 as u64
    stxdw [r10-0x58], r1                    
    stxdw [r10-0x48], r1                    
    mov64 r1, r10                                   r1 = r10
    add64 r1, -136                                  r1 += -136   ///  r1 = r1.wrapping_add(-136 as i32 as i64 as u64)
    stxdw [r10-0x50], r1                    
    mov64 r1, r10                                   r1 = r10
    add64 r1, -164                                  r1 += -164   ///  r1 = r1.wrapping_add(-164 as i32 as i64 as u64)
    stxdw [r10-0x78], r1                    
    lddw r1, 0x100004210 --> b"\xbf#\x00\x00\x00\x00\x00\x00a\x11\x00\x00\x00\x00\x00\x00\xb7\x02\x00\x0…        r1 load str located at 4294984208
    stxdw [r10-0x70], r1                    
    stxdw [r10-0x80], r1                    
    mov64 r1, r10                                   r1 = r10
    add64 r1, -168                                  r1 += -168   ///  r1 = r1.wrapping_add(-168 as i32 as i64 as u64)
    stxdw [r10-0x88], r1                    
    mov64 r1, r10                                   r1 = r10
    add64 r1, -160                                  r1 += -160   ///  r1 = r1.wrapping_add(-160 as i32 as i64 as u64)
    mov64 r2, r10                                   r2 = r10
    add64 r2, -96                                   r2 += -96   ///  r2 = r2.wrapping_add(-96 as i32 as i64 as u64)
    call function_858                       
    ldxdw r1, [r10-0xa0]                    
    ldxdw r2, [r10-0x90]                    
    syscall [invalid]                       
    ldxw r1, [r10-0xa8]                     
    ldxw r2, [r10-0xa4]                     
    add64 r2, r1                                    r2 += r1   ///  r2 = r2.wrapping_add(r1)
    lsh64 r2, 32                                    r2 <<= 32   ///  r2 = r2.wrapping_shl(32)
    rsh64 r2, 32                                    r2 >>= 32   ///  r2 = r2.wrapping_shr(32)
    jne r2, 1337, lbb_58                            if r2 != (1337 as i32 as i64 as u64) { pc += 6 }
    lddw r1, 0x1000043e0 --> b"You win!"            r1 load str located at 4294984672
    mov64 r2, 8                                     r2 = 8 as i32 as i64 as u64
    syscall [invalid]                       
    mov64 r1, 987654321                             r1 = 987654321 as i32 as i64 as u64
    ja lbb_63                                       if true { pc += 5 }
lbb_58:
    lddw r1, 0x1000043e8 --> b"You lose!"           r1 load str located at 4294984680
    mov64 r2, 9                                     r2 = 9 as i32 as i64 as u64
    syscall [invalid]                       
    mov64 r1, 123456789                             r1 = 123456789 as i32 as i64 as u64

Annotating Immediate Loads

Instructions like:

lddw   r1, 0x1000043e0

point into .rodata. sol-azy:

Checks if imm >= MM_RODATA_START
Extracts the corresponding bytes from the .so
Uses the next MOV64_IMM (here, mov64 r2, 8) to determine the length
Displays a byte string: b"You win!"

This process is handled by:

pub fn update_string_resolution(program: &[u8], insn: &Insn, next_insn_wrapped: Option<&Insn>, register_tracker: &mut RegisterTracker) -> String
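The lddw + mov64 steps listed above can be sketched in Python. This is an illustration, not update_string_resolution itself: instructions are assumed to be (mnemonic, dst, imm) tuples, MM_RODATA_START is assumed to be 0x100000000 (consistent with 0x1000043e0 mapping to offset 0x43e0 in the examples), and the read-only image is faked with the two strings from the disassembly example.

```python
MM_RODATA_START = 0x1_0000_0000  # assumed base; matches 0x1000043e0 -> offset 0x43e0 above


def resolve_lddw_strings(insns, rodata):
    """insns: (mnemonic, dst, imm) tuples in program order; rodata: bytes of the read-only image.

    For each lddw pointing at or above MM_RODATA_START, use the immediately following
    mov64 immediate as the length and slice the bytes out of rodata.
    """
    resolved = {}
    for i, (mnemonic, _dst, imm) in enumerate(insns):
        if mnemonic != "lddw" or imm < MM_RODATA_START:
            continue
        if i + 1 >= len(insns) or insns[i + 1][0] != "mov64":
            continue  # no length hint: this sketch leaves the load unannotated
        offset = imm - MM_RODATA_START
        length = insns[i + 1][2]
        if offset + length <= len(rodata):
            resolved[imm] = bytes(rodata[offset:offset + length])
    return resolved


# Fake read-only image with the two strings from the disassembly example at their offsets.
rodata = bytearray(0x4400)
rodata[0x43E0:0x43E8] = b"You win!"
rodata[0x43E8:0x43F1] = b"You lose!"

insns = [
    ("lddw", 1, 0x1000043E0), ("mov64", 2, 8), ("syscall", 0, 0),
    ("lddw", 1, 0x1000043E8), ("mov64", 2, 9), ("syscall", 0, 0),
]
print(resolve_lddw_strings(insns, rodata))
# {4294984672: b'You win!', 4294984680: b'You lose!'}
```

The decimal keys match the "r1 load str located at ..." annotations in the listing. The listing also shows bytes for lddw r1, 0x100004610, which is not followed by a mov64, so sol-azy evidently has a fallback that this sketch omits.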

Support for sBPF v2+: Address Construction via mov32 + hor64

In sBPF version 2 and above, lddw can no longer be used to load 64-bit constants. Instead, addresses are constructed manually:

mov32  r1, 0x3000         ; load lower 32 bits
hor64  r1, 0x10000000     ; set upper 32 bits → r1 = 0x1000000000003000

sol-azy handles this by:

Tracking register values using a RegisterTracker
Emulating mov and hor64 on the tracked registers
Resolving loads like ldxdw r2, [dst + off] where dst + off points into .rodata
Extracting and decoding the pointed-to memory, as for lddw
This lets the disassembler annotate pointer-based loads even when addresses are assembled dynamically.
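A toy Python version of this register tracking, assuming that mov32 leaves its immediate zero-extended in the 64-bit register (for immediates below 2^31, such as those here, sign-extension would give the same result) and that hor64 ORs its immediate into the upper 32 bits. ConstTracker is a hypothetical name, not sol-azy's RegisterTracker, and MM_RODATA_START = 0x100000000 is the same assumed base as in the lddw examples.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1
MM_RODATA_START = 0x1_0000_0000  # assumed base, as in the lddw examples


class ConstTracker:
    """Toy constant tracker: remembers registers whose value is known statically."""

    def __init__(self):
        self.known = {}

    def step(self, mnemonic, dst, imm=0):
        if mnemonic == "mov32":
            # assumption: the 32-bit immediate ends up zero-extended in the 64-bit register
            self.known[dst] = imm & MASK32
        elif mnemonic == "hor64" and dst in self.known:
            # hor64 ORs the immediate into the upper 32 bits, keeping the lower half
            self.known[dst] = (self.known[dst] | ((imm & MASK32) << 32)) & MASK64
        else:
            self.known.pop(dst, None)  # any other write makes the register unknown

    def rodata_offset(self, base_reg, off, rodata_len):
        """Offset into rodata for a load like ldxdw r2, [base_reg + off], if statically known."""
        if base_reg not in self.known:
            return None
        addr = (self.known[base_reg] + off) & MASK64
        if MM_RODATA_START <= addr < MM_RODATA_START + rodata_len:
            return addr - MM_RODATA_START
        return None


t = ConstTracker()
t.step("mov32", 1, 0x3000)
t.step("hor64", 1, 0x10000000)
print(hex(t.known[1]))  # 0x1000000000003000, the value from the example above

t.step("mov32", 3, 0x43E0)
t.step("hor64", 3, 0x1)
print(hex(t.rodata_offset(3, 8, 0x4400)))  # 0x43e8
```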

Visualization

Here is an example of a control flow graph with disassembly and immediate data decoded:

Disassembly with .rodata decoded

Arrows represent jumps or branches
Blocks show disassembled instructions
--> b"...string..." indicates .rodata interpretation

Output Files

When running:

cargo run -- reverse --mode disass --out-dir ./out --bytecodes-file ./program.so

You get:

| File                       | Description                               |
| -------------------------- | ----------------------------------------- |
| `disassembly.out`          | Main instruction listing with annotations |
| `immediate_data_table.out` | All tracked immediate memory ranges       |

Example from immediate_data_table.out:

0x1000043e0 (+ 0x43e0): b"You win!"
0x1000043e8 (+ 0x43e8): b"You lose!"

Tips

Enable --labeling to auto-generate labels.
Use --mode both to get disassembly and CFG together.

Immediate Data Tracking

sol-azy performs tracking of immediate values loaded from .rodata via LD_DW_IMM instructions.
This feature is crucial to recover strings, error messages, and embedded constants that are otherwise opaque in the bytecode.

How it works

Solana eBPF programs often use this pattern to load a constant string:

lddw   r1, 0x1000043e8
mov64  r2, 9

The lddw instruction loads an address that points into .rodata
The mov64 usually supplies the length in bytes
sol-azy uses these two to extract a slice of memory and decode it

If the memory region looks printable (ASCII-compatible), it is rendered as a string like:

b"You lose!"

Otherwise, a hex-escaped byte string is emitted.

Output File: `immediate_data_table.out`

This file lists all detected .rodata ranges accessed via LD_DW_IMM, whether or not they were also used in disassembly.

Format

Each line contains:

<absolute_address> (+ <relative_offset>): <decoded_bytes>

Example:

0x1000043e0 (+ 0x43e0): b"You win!"
0x1000043e8 (+ 0x43e8): b"You lose!"
0x100004434 (+ 0x4434): b"Not enough data. Need two u32 values.src/entrypoint.rs"

The relative_offset is computed relative to MM_RODATA_START, and is used to index into the ELF's .rodata section.
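As a worked example, assuming MM_RODATA_START = 0x1_0000_0000: 0x1000043e0 - 0x100000000 = 0x43e0. The hypothetical helper below builds one line in this format; it is a sketch, not sol-azy's writer.

```rust
/// Assumed start of .rodata in the sBPF address space.
const MM_RODATA_START: u64 = 0x1_0000_0000;

/// Builds one `<absolute_address> (+ <relative_offset>): <decoded_bytes>` line.
/// Returns `None` for addresses below MM_RODATA_START.
fn table_line(abs_addr: u64, decoded: &str) -> Option<String> {
    let rel = abs_addr.checked_sub(MM_RODATA_START)?; // offset into .rodata
    Some(format!("{:#x} (+ {:#x}): {}", abs_addr, rel, decoded))
}
```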

Visual Reference

Here's a screenshot of a real immediate_data_table.out file generated from a test case:

Immediate Data Table Output

We can see:

Success and failure strings: "You win!", "You lose!"
Panic messages
Rust format strings
Even full numeric patterns (e.g. "00010203...")

Behind the scenes

The logic is handled by this function:

fn disassemble_wrapper(
    program: &[u8],
    analysis: &mut Analysis,
    imm_tracker_wrapped: Option<&mut ImmediateTracker>,
    path: P,
)

Each LD_DW_IMM is analyzed, and its value is registered using:

imm_tracker.register_offset(insn.imm as usize);

Then, for each tracked range:

The corresponding bytes are sliced out of the program (using the offset relative to MM_RODATA_START)
The slice is passed to:

pub fn format_bytes(slice: &[u8]) -> String

which escapes non-printables and prints ASCII as-is.
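A minimal sketch of both steps, assuming ranges are stored as start => end offsets into the program bytes (a simplification; `render_ranges` is a hypothetical helper). Rust's standard `escape_ascii` is one way to get the described behavior: it leaves printable ASCII unchanged, backslash-escapes `"` and `\`, and emits lowercase `\xNN` escapes such as the `\xbf` visible in the CFG example.

```rust
use std::collections::BTreeMap;

/// Sketch of `format_bytes`: printable ASCII is kept, other bytes become
/// escapes (`\n`, `\xNN`, ...), and `"` / `\` are backslash-escaped.
fn format_bytes(slice: &[u8]) -> String {
    format!("b\"{}\"", slice.escape_ascii())
}

/// Hypothetical driver: `ranges` maps start => end offsets into `program`.
/// Ranges that are empty or run past the buffer are skipped.
fn render_ranges(program: &[u8], ranges: &BTreeMap<usize, usize>) -> Vec<String> {
    ranges
        .iter()
        .filter(|&(&start, &end)| start < end && end <= program.len())
        .map(|(&start, &end)| format_bytes(&program[start..end]))
        .collect()
}
```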

LD_DW_IMM: Key Instructions and Address Keys

The tracking system is triggered exclusively by LD_DW_IMM instructions, which are used to load 64-bit constants.
When such an instruction loads an address greater than or equal to MM_RODATA_START, sol-azy considers it a .rodata access.

These addresses become the keys of the immediate_data_table.out output.

Example:

lddw   r1, 0x1000043e0   ; ← This address becomes a key
mov64  r2, 8             ; ← Length hint

This results in:

0x1000043e0 (+ 0x43e0): b"You win!"

Range Truncation: Avoiding Overlaps

In programs with many LD_DW_IMM, multiple memory regions may point into the same .rodata segment.
To avoid overlap between two string regions, sol-azy performs forward truncation:

It registers each LD_DW_IMM address (new_start)
It finds the next closest start already known
It truncates any overlapping previous entry so that no two extracted ranges overlap

This ensures that the memory region allocated for one string or constant does not accidentally contain bytes meant for another.

⚠️ Important Note on Partial Overlaps

The truncation mechanism ensures that two tracked .rodata regions do not overlap, but this does not imply that only the non-overlapping portion of earlier data is relevant.

For example:

An lddw at 0x1 loads 4 bytes of useful data.
Later, an lddw at 0x3 uses only the high bytes (e.g., the last 2 bytes).

Even if these regions partially overlap in memory, the system still treats 0x1 as a distinct valid address for its own usage.

This means:

The data at 0x1 is still considered to start at 0x1.
The data at 0x3 is separately tracked, even if it falls inside a previously registered range.

The truncation is only used to split visible ranges in the output, not to reinterpret or cut off the semantics of earlier loads.

Example

Suppose:

lddw r1, 0x1000043e0      ; key #1
lddw r2, 0x1000043e8      ; key #2, appears later in bytecode

Even if the length for key #1 is unclear (or too long), sol-azy will truncate its range to stop at 0x1000043e8.

This avoids having "You win!" + "You lose!" accidentally merged into one blob, since both of them will be used independently by separate LD_DW_IMM instructions.


Internal Implementation

The tracking structure is a BTreeMap<usize, usize>:

pub struct ImmediateTracker {
    ranges: BTreeMap<usize, usize>, // start => end
}

Each register_offset(new_start) will:

Locate the next start value already in ranges
Set new_end = next_start
Truncate any existing range that would overlap with new_start

This is enforced even when the underlying bytes could legitimately belong to two loads: keeping the output ranges disjoint takes priority.
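The register_offset steps above can be sketched with the standard library's BTreeMap range queries. This is an illustrative reimplementation, not the actual sol-azy code: `limit` is a hypothetical end used for the last range (sol-azy may compute it differently), and ends are treated as exclusive.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

/// Sketch of the truncating tracker described above (not sol-azy's code).
struct ImmediateTracker {
    ranges: BTreeMap<usize, usize>, // start => end (exclusive)
    limit: usize,                   // hypothetical end for the last range
}

impl ImmediateTracker {
    fn new(limit: usize) -> Self {
        ImmediateTracker { ranges: BTreeMap::new(), limit }
    }

    fn register_offset(&mut self, new_start: usize) {
        if self.ranges.contains_key(&new_start) {
            return; // already a key
        }
        // The next known start (if any) becomes this range's end.
        let new_end = self
            .ranges
            .range((Excluded(new_start), Unbounded))
            .next()
            .map(|(&start, _)| start)
            .unwrap_or(self.limit.max(new_start));
        // A previous range running past `new_start` is truncated there.
        if let Some((_, prev_end)) = self.ranges.range_mut(..new_start).next_back() {
            if *prev_end > new_start {
                *prev_end = new_start;
            }
        }
        self.ranges.insert(new_start, new_end);
    }
}
```

Registering the keys in either order yields the same disjoint ranges, and registering an address inside an existing range splits that range without touching its neighbors.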

When this matters

This tracking is especially useful when:

The program includes panic messages
You want to recover hardcoded strings (e.g. "owner mismatch")
You're analyzing solana_program syscalls with string-based I/O
You want to reverse undocumented or obfuscated logic

Tips

Use --mode disass or --mode both to enable this feature
If a string appears truncated, check the corresponding mov64 for its length
If no mov64 follows a lddw, the default read length is ~50 bytes for the CFG rendering

Control Flow Graph (CFG)

sol-azy can extract a static control flow graph (CFG) from a compiled Solana eBPF program.
The output is a Graphviz-compatible .dot file representing function-level control flow between basic blocks.

This is useful for:

Visualizing branching behavior
Locating unreachable code
Detecting loop structures
Understanding high-level logic without source code

Overview

The CFG is generated via:

pub fn export_cfg_to_dot(
    program: &[u8],
    analysis: &mut Analysis,
    path: impl AsRef<Path>,
    reduced: bool,
    only_entrypoint: bool,
) -> std::io::Result<()>

sol-azy uses the static Analysis engine to:

Identify all functions
Segment them into basic blocks
Record all dominators and edges
Render each function as a subgraph cluster in Graphviz .dot syntax

Filtering the graph

--reduced: excludes library functions that appear before the program’s entrypoint, reducing noise.
--only-entrypoint: includes only the function where execution starts, allowing for very focused manual exploration (e.g., with dotting).

Structure of the Graph

Each basic block is rendered as a node with label:

lbb_<id> [label=<<table>...</table>>];

The instruction list is printed line-by-line
If a string is found via LD_DW_IMM + MOV64_IMM, it’s appended with: --> b"..."
Long strings are truncated
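A sketch of how such a node label can be produced. `render_block` and `html_escape` are hypothetical helpers; the escaping mirrors what the examples on this page show (`--&gt;`, `&quot;`), and instructions without operands (such as `exit`) get a single cell, as in the entrypoint example.

```rust
/// Escapes text for a Graphviz HTML-like label (`&` first, so it is not double-escaped).
fn html_escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Renders a basic block as `lbb_<id> [label=<<table>...</table>>];`,
/// one row per (mnemonic, operands) pair; operand-less rows get one cell.
fn render_block(id: usize, insns: &[(&str, &str)]) -> String {
    let mut rows = String::new();
    for (mnemonic, operands) in insns {
        rows.push_str("<tr><td align=\"left\">");
        rows.push_str(&html_escape(mnemonic));
        rows.push_str("</td>");
        if !operands.is_empty() {
            rows.push_str("<td align=\"left\">");
            rows.push_str(&html_escape(operands));
            rows.push_str("</td>");
        }
        rows.push_str("</tr>");
    }
    format!(
        "lbb_{} [label=<<table border=\"0\" cellborder=\"0\" cellpadding=\"3\">{}</table>>];",
        id, rows
    )
}
```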

Resolving Edges

Edges are derived from instruction flow and jump destinations:

Conditional jumps produce two outgoing edges
Unconditional jumps produce one
exit (return) ends a block; syscalls appear mid-block and do not split it
Dominator relationships (child to parent) are shown as dotted lines (style=dotted; arrowhead=none)

Detailed behavior:
sol-azy draws edges based on:

jne, jeq, etc. → conditional edges
ja (jump always) → unconditional
call → no edge to the callee (the calling block falls through to the next block)
exit and ret → no outgoing edge

dominator_parent → rendered from the block (lbb_A) to its dominator parent (lbb_B) with:

lbb_A -> lbb_B [style=dotted; arrowhead=none];
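These rules can be sketched as follows. The `Terminator` enum and both functions are hypothetical; successor ids are sorted because that matches the `{lbb_369309 lbb_369321}` ordering in the examples, which is an observation rather than a documented guarantee.

```rust
/// How a basic block ends (hypothetical classification for this sketch).
enum Terminator {
    CondJump { target: usize }, // jeq, jne, jgt, ...
    Jump { target: usize },     // ja
    Call,                       // internal call: continues at the next block
    FallThrough,                // next block is a jump target, no branch here
    Exit,                       // exit: no successor
}

/// Successor block ids; `next_block` is the block that follows in layout.
fn successors(term: &Terminator, next_block: Option<usize>) -> Vec<usize> {
    let mut succ: Vec<usize> = match term {
        Terminator::CondJump { target } => next_block.into_iter().chain([*target]).collect(),
        Terminator::Jump { target } => vec![*target],
        Terminator::Call | Terminator::FallThrough => next_block.into_iter().collect(),
        Terminator::Exit => Vec::new(),
    };
    succ.sort_unstable();
    succ.dedup();
    succ
}

/// DOT lines for one block: dominator link (child -> parent), then control-flow edges.
fn edge_lines(id: usize, succ: &[usize], dominator_parent: Option<usize>) -> Vec<String> {
    let mut lines = Vec::new();
    if let Some(parent) = dominator_parent {
        lines.push(format!("  lbb_{} -> lbb_{} [style=dotted; arrowhead=none];", id, parent));
    }
    if !succ.is_empty() {
        let targets: Vec<String> = succ.iter().map(|s| format!("lbb_{}", s)).collect();
        lines.push(format!("  lbb_{} -> {{{}}};", id, targets.join(" ")));
    }
    lines
}
```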

Example (DOT snippet)

Here’s a raw .dot snippet generated by sol-azy:

lbb_58 [label=<<table border="0" cellborder="0" cellpadding="3">
  <tr><td align="left">lddw</td><td align="left">r1, 0x1000043e8 --&gt; b"You lose!"</td></tr>
  <tr><td align="left">mov64</td><td align="left">r2, 9</td></tr>
  <tr><td align="left">syscall</td><td align="left">[invalid]</td></tr>
  <tr><td align="left">mov64</td><td align="left">r1, 123456789</td></tr>
</table>>];

Graphviz will render this as a block node with a 4-row table inside.

How sol-azy Generates It

CFG generation is implemented in:

pub fn export_cfg_to_dot(
    program: &[u8],
    analysis: &mut Analysis,
    path: impl AsRef<Path>,
    reduced: bool,
    only_entrypoint: bool,
) -> std::io::Result<()>

It walks:

The analysis.functions map
Each function’s set of cfg_nodes
Each node’s instructions (via analysis.instructions)
Control destinations (e.g., jne, ja, call)
Dominator relationships (cfg_node.dominator_parent)

The layout is rendered using Graphviz-style clusters:

subgraph cluster_42 {
    label="function_name";
    lbb_42 [ ... ];
}

Strings from `.rodata`

CFG generation is enhanced by the same string resolution logic used in disassembly:

fn get_string_repr(
    program: &[u8],
    insn: &Insn,
    next_insn: Option<&Insn>
) -> String

For example, when an instruction like:

lddw r1, 0x1000043e8

is followed by:

mov64 r2, 9

sol-azy uses this to resolve:

b"You lose!"

It is rendered like this:

<td align="left">r1, 0x1000043e8 --&gt; b"You lose!"</td>

This makes constant decoding directly visible in the graph.

Rendering the Graph

Once cfg.dot is generated, use:

dot -Tsvg cfg.dot -o cfg.svg
xdot cfg.dot         # for interactive navigation

⚠️ For very large programs, even the --reduced version of the CFG can take significant time to generate due to the size and complexity of the bytecode being analyzed and rendered by dot.

Function Grouping

Each function is placed into a subgraph cluster for clarity. This helps:

Separate function-level CFGs
Navigate large programs
Quickly locate the program's main logic

Entrypoint Example:

subgraph cluster_3 {
    label="entrypoint";
    tooltip=lbb_3;
    lbb_3 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r2, r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -96</td></tr><tr><td align="left">call</td><td align="left">function_308</td></tr></table>>];
    lbb_7 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r7, [r10-0x48]</td></tr><tr><td align="left">ldxdw</td><td align="left">r8, [r10-0x58]</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x38]</td></tr><tr><td align="left">mov64</td><td align="left">r2, 8</td></tr><tr><td align="left">jgt</td><td align="left">r2, r1, lbb_91</td></tr></table>>];
    lbb_91 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">lddw</td><td align="left">r1, 0x1000043f4 --&gt; b&quot;Not enough data. Need two u32 values.&quot;</td></tr><tr><td align="left">mov64</td><td align="left">r2, 37</td></tr><tr><td align="left">syscall</td><td align="left">[invalid]</td></tr><tr><td align="left">mov64</td><td align="left">r1, 2</td></tr><tr><td align="left">stxw</td><td align="left">[r10-0xc8], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -200</td></tr><tr><td align="left">call</td><td align="left">function_554</td></tr></table>>];
    lbb_100 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r6, r0</td></tr><tr><td align="left">jeq</td><td align="left">r7, 0, lbb_107</td></tr></table>>];
    lbb_12 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x40]</td></tr><tr><td align="left">ldxw</td><td align="left">r2, [r1+0x0]</td></tr><tr><td align="left">stxw</td><td align="left">[r10-0xa8], r2</td></tr><tr><td align="left">ldxw</td><td align="left">r1, [r1+0x4]</td></tr><tr><td align="left">stxw</td><td align="left">[r10-0xa4], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, 0</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x40], r1</td></tr><tr><td align="left">lddw</td><td align="left">r1, 0x100004610 --&gt; b&quot;\x00\x00\x00\x00\xd0C\x00\x00\x08\x00\x00\x…</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x60], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, 2</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x58], r1</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x48], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -136</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x50], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -164</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x78], r1</td></tr><tr><td align="left">lddw</td><td align="left">r1, 0x100004210 --&gt; b&quot;\xbf#\x00\x00\x00\x00\x00\x00a\x11\x00\x00\…</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x70], r1</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x80], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -168</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x88], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -160</td></tr><tr><td 
align="left">mov64</td><td align="left">r2, r10</td></tr><tr><td align="left">add64</td><td align="left">r2, -96</td></tr><tr><td align="left">call</td><td align="left">function_858</td></tr></table>>];
    lbb_43 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0xa0]</td></tr><tr><td align="left">ldxdw</td><td align="left">r2, [r10-0x90]</td></tr><tr><td align="left">syscall</td><td align="left">[invalid]</td></tr><tr><td align="left">ldxw</td><td align="left">r1, [r10-0xa8]</td></tr><tr><td align="left">ldxw</td><td align="left">r2, [r10-0xa4]</td></tr><tr><td align="left">add64</td><td align="left">r2, r1</td></tr><tr><td align="left">lsh64</td><td align="left">r2, 32</td></tr><tr><td align="left">rsh64</td><td align="left">r2, 32</td></tr><tr><td align="left">jne</td><td align="left">r2, 1337, lbb_58</td></tr></table>>];
    lbb_58 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">lddw</td><td align="left">r1, 0x1000043e8 --&gt; b&quot;You lose!&quot;</td></tr><tr><td align="left">mov64</td><td align="left">r2, 9</td></tr><tr><td align="left">syscall</td><td align="left">[invalid]</td></tr><tr><td align="left">mov64</td><td align="left">r1, 123456789</td></tr></table>>];
    lbb_52 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">lddw</td><td align="left">r1, 0x1000043e0 --&gt; b&quot;You win!&quot;</td></tr><tr><td align="left">mov64</td><td align="left">r2, 8</td></tr><tr><td align="left">syscall</td><td align="left">[invalid]</td></tr><tr><td align="left">mov64</td><td align="left">r1, 987654321</td></tr><tr><td align="left">ja</td><td align="left">lbb_63</td></tr></table>>];
    lbb_63 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">stxdw</td><td align="left">[r10-0x68], r1</td></tr><tr><td align="left">lddw</td><td align="left">r1, 0x100004630 --&gt; b&quot;\x00\x00\x00\x00\xd8C\x00\x00\x08\x00\x00\x…</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x60], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, 1</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x58], r1</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x48], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -160</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x50], r1</td></tr><tr><td align="left">lddw</td><td align="left">r1, 0x100004238 --&gt; b&quot;\xbf#\x00\x00\x00\x00\x00\x00y\x11\x00\x00\…</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x98], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -104</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0xa0], r1</td></tr><tr><td align="left">mov64</td><td align="left">r6, 0</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x40], r6</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -136</td></tr><tr><td align="left">mov64</td><td align="left">r2, r10</td></tr><tr><td align="left">add64</td><td align="left">r2, -96</td></tr><tr><td align="left">call</td><td align="left">function_858</td></tr></table>>];
    lbb_86 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x88]</td></tr><tr><td align="left">ldxdw</td><td align="left">r2, [r10-0x78]</td></tr><tr><td align="left">syscall</td><td align="left">[invalid]</td></tr><tr><td align="left">jeq</td><td align="left">r7, 0, lbb_107</td></tr></table>>];
    lbb_90 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ja</td><td align="left">lbb_102</td></tr></table>>];
    lbb_102 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">add64</td><td align="left">r8, 16</td></tr><tr><td align="left">ja</td><td align="left">lbb_109</td></tr></table>>];
    lbb_109 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r8+0x0]</td></tr><tr><td align="left">ldxdw</td><td align="left">r2, [r8-0x8]</td></tr><tr><td align="left">ldxdw</td><td align="left">r3, [r2+0x0]</td></tr><tr><td align="left">add64</td><td align="left">r3, -1</td></tr><tr><td align="left">stxdw</td><td align="left">[r2+0x0], r3</td></tr><tr><td align="left">jne</td><td align="left">r3, 0, lbb_118</td></tr></table>>];
    lbb_115 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r3, [r2+0x8]</td></tr><tr><td align="left">add64</td><td align="left">r3, -1</td></tr><tr><td align="left">stxdw</td><td align="left">[r2+0x8], r3</td></tr></table>>];
    lbb_118 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r2, [r1+0x0]</td></tr><tr><td align="left">add64</td><td align="left">r2, -1</td></tr><tr><td align="left">stxdw</td><td align="left">[r1+0x0], r2</td></tr><tr><td align="left">jne</td><td align="left">r2, 0, lbb_104</td></tr></table>>];
    lbb_122 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r2, [r1+0x8]</td></tr><tr><td align="left">add64</td><td align="left">r2, -1</td></tr><tr><td align="left">stxdw</td><td align="left">[r1+0x8], r2</td></tr><tr><td align="left">ja</td><td align="left">lbb_104</td></tr></table>>];
    lbb_104 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">add64</td><td align="left">r8, 48</td></tr><tr><td align="left">add64</td><td align="left">r7, -1</td></tr><tr><td align="left">jne</td><td align="left">r7, 0, lbb_109</td></tr></table>>];
    lbb_107 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r0, r6</td></tr><tr><td align="left">exit</td></tr></table>>];
  }


Full Example (Visual)

Here’s what a real sol-azy-generated CFG can look like:

Each rectangle = basic block
Solid arrows = jumps, branches, or fall-through
Dotted lines = dominator links

Code example:

use solana_program::{
    account_info::AccountInfo,
    entrypoint,
    entrypoint::ProgramResult,
    pubkey::Pubkey,
    msg,
};

entrypoint!(process_instruction);

fn win() -> u64 {
    msg!("You win!");
    987654321
}

fn lose() -> u64 {
    msg!("You lose!");
    123456789
}

pub fn process_instruction(
    _program_id: &Pubkey,
    _accounts: &[AccountInfo],
    instruction_data: &[u8],
) -> ProgramResult {
    if instruction_data.len() < 8 {
        msg!("Not enough data. Need two u32 values.");
        return Err(solana_program::program_error::ProgramError::InvalidInstructionData);
    }

    let a = u32::from_le_bytes(instruction_data[0..4].try_into().unwrap());
    let b = u32::from_le_bytes(instruction_data[4..8].try_into().unwrap());

    msg!("Inputs: {} + {}", a, b);

    let result = if a + b == 1337 {
        win()
    } else {
        lose()
    };

    msg!("Result: {}", result);

    Ok(())
}

CFG recovered from bytecode:

CFG Example

Reduced Control Flow Graph (CFG)

Analyzing large Solana eBPF programs can produce overwhelming control flow graphs (CFGs) due to the sheer number of functions and basic blocks. sol-azy offers two modes to reduce graph complexity:

--reduced: Only include functions defined after the entrypoint.
--only-entrypoint: Include only the function cluster of the entrypoint itself.

1. `--reduced`

The --reduced flag filters the generated CFG by discarding functions that are likely part of the runtime or standard library.

Example

cargo run -- reverse \
  --mode cfg \
  --out-dir ./out/ \
  --bytecodes-file ./program.so \
  --labeling \
  --reduced

What It Does

Keeps only functions that appear after the entrypoint in the binary layout.
Typically corresponds to user-defined logic.
Excludes Solana runtime boilerplate (e.g., abort_internal, core::fmt, etc.)

2. `--only-entrypoint`

The --only-entrypoint flag isolates just the entrypoint function, without including its callees or any other clusters.

Example

cargo run -- reverse \
  --mode cfg \
  --out-dir ./out/ \
  --bytecodes-file ./program.so \
  --labeling \
  --only-entrypoint

What It Does

Only exports the cluster corresponding to the entrypoint.
Skips all other functions, even if they are part of user logic.
Ideal for initializing a minimal CFG for manual extension.

Why It Matters

✅ Greatly improves readability for large programs
✅ Speeds up rendering in tools like xdot or Graphviz
✅ Useful for focused auditing and vulnerability research

⚠️ With --reduced, some library or utility functions may still be present if the linker places them after the entrypoint, since the filter is based on binary layout rather than on what the program calls.

Comparison

Flag	Includes Entry?	Includes Callees?	Includes Library Code?
(default / full)	✅	✅	✅
`--reduced`	✅	✅	❌
`--only-entrypoint`	✅	❌	❌
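The layout-based filtering described above can be sketched as a predicate over function start positions. This is an illustration, not sol-azy's implementation: `select_functions` is a hypothetical helper, functions are keyed by their start instruction index (as the `cluster_<id>` / `lbb_<id>` numbering in the examples suggests), and `only_entrypoint` takes precedence if both flags are set.

```rust
use std::collections::BTreeMap;

/// Which function clusters to export. `functions` maps a function's start
/// (instruction index) to its name.
fn select_functions(
    functions: &BTreeMap<usize, String>,
    entrypoint: usize,
    reduced: bool,
    only_entrypoint: bool,
) -> Vec<usize> {
    functions
        .keys()
        .copied()
        .filter(|&start| {
            if only_entrypoint {
                start == entrypoint // just the entrypoint cluster
            } else if reduced {
                start >= entrypoint // layout-based: keep what comes after the entrypoint
            } else {
                true // full graph
            }
        })
        .collect()
}
```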

Visualization

You can render the resulting .dot files as usual:

dot -Tsvg cfg.dot -o cfg.svg
xdot cfg.dot

Reduced graphs will render faster and be easier to navigate.

⚠️ For very large programs, even the --reduced version of the CFG can take significant time to generate due to the size and complexity of the bytecode being analyzed and rendered by dot.

When to Use

Scenario	Recommended Flag
You want to analyze app logic only	`--reduced`
You want to isolate `entrypoint` manually	`--only-entrypoint`
You need full picture including libraries	(default - no flags)

Dotting: Customizing Reduced CFGs

The dotting feature in sol-azy allows you to manually augment a reduced control flow graph (CFG) by reinserting specific function clusters from the full graph.

This is particularly useful when using --reduced or --only-entrypoint modes, which intentionally drop unused or library-heavy functions. With dotting, you can selectively restore those clusters for targeted analysis.

Motivation

Reduced graphs simplify reverse engineering, but sometimes:

Important logic is optimized into shared helpers
Runtime wrappers (e.g. error handling) live outside the entrypoint
Functions of interest are excluded unintentionally

With dotting, you don’t need to regenerate a new full CFG. Instead, you can grow your existing graph by manually appending clusters and their edges.

How It Works

You create a small JSON file listing function cluster IDs to reinsert.
You run the dotting command pointing to:
- The original full .dot file (reference),
- Your reduced .dot file,
- And the JSON config.
sol-azy:
- Adds matching subgraph cluster_XX blocks.
- Appends new edges only if both sides already exist in the reduced graph.
The result is saved as updated_<reduced>.dot.
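The merge step can be sketched as plain text processing over the two .dot files. This is a simplified, hypothetical implementation, not sol-azy's code: it finds clusters by brace counting (so it assumes labels contain no `{` or `}`), treats any line containing `->` as an edge, and checks edge endpoints against the updated graph after the new clusters are added.

```rust
use std::collections::HashSet;

/// Text of `subgraph cluster_<id> { ... }` in `dot`, found by brace counting.
fn extract_cluster<'a>(dot: &'a str, id: &str) -> Option<&'a str> {
    let start = dot.find(&format!("subgraph cluster_{} {{", id))?;
    let mut depth = 0usize;
    for (i, c) in dot[start..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&dot[start..start + i + 1]);
                }
            }
            _ => {}
        }
    }
    None
}

/// Block names declared in `dot` (lines of the form `lbb_N [label=...`).
fn declared_blocks(dot: &str) -> HashSet<String> {
    dot.lines()
        .map(str::trim)
        .filter(|l| l.starts_with("lbb_") && l.contains(" [label="))
        .filter_map(|l| l.split_whitespace().next())
        .map(String::from)
        .collect()
}

/// Endpoints of `lbb_1 -> {lbb_2 lbb_3};` or `lbb_2 -> lbb_1 [style=...];`.
fn edge_endpoints(line: &str) -> Option<Vec<String>> {
    let (src, rest) = line.trim().split_once("->")?;
    let targets = rest.split('[').next().unwrap_or(""); // drop [attributes]
    let mut names = vec![src.trim().to_string()];
    names.extend(
        targets
            .split(|c: char| c == '{' || c == '}' || c == ';' || c.is_whitespace())
            .filter(|t| !t.is_empty())
            .map(String::from),
    );
    Some(names)
}

/// Appends the requested clusters, then every new edge from `full` whose
/// endpoints are all declared in the updated graph.
fn merge(reduced: &str, full: &str, ids: &[&str]) -> String {
    // New content goes just before the digraph's closing `}`.
    let body_end = reduced.rfind('}').unwrap_or(reduced.len());
    let mut out = reduced[..body_end].to_string();
    for id in ids {
        if reduced.contains(&format!("subgraph cluster_{} {{", id)) {
            continue; // cluster already present
        }
        if let Some(cluster) = extract_cluster(full, id) {
            out.push_str("  ");
            out.push_str(cluster);
            out.push('\n');
        }
    }
    let known = declared_blocks(&out);
    let existing: HashSet<&str> = reduced.lines().map(str::trim).collect();
    for line in full.lines().map(str::trim) {
        if existing.contains(line) {
            continue;
        }
        if let Some(names) = edge_endpoints(line) {
            if names.iter().all(|n| known.contains(n)) {
                out.push_str("  ");
                out.push_str(line);
                out.push('\n');
            }
        }
    }
    out.push_str(&reduced[body_end..]);
    out
}
```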

CLI Usage

cargo run -- dotting \
  --config path/to/functions.json \
  --reduced-dot path/to/reduced.dot \
  --full-dot path/to/full.dot

Config Format

Your JSON file should look like:

{
  "functions": ["10", "42", "87"]
}

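Each entry is a cluster ID (i.e., the number in cluster_<id> from the .dot file). In the examples in this documentation, the ID matches the index of the function's first basic block: cluster_3 starts with lbb_3, and a call to function_17014 corresponds to cluster_17014.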

You can locate these IDs by inspecting the full .dot or searching for strings like:

subgraph cluster_42 {
    label="function_name";
    ...
}

Example Workflow

[one-time action] Generate a full graph (this lets you select specific clusters later without re-analyzing the bytecode every time a function needs to be added):
```
cargo run -- reverse \
  --mode cfg \
  --bytecodes-file program.so \
  --out-dir ./full
```

Generate a reduced graph with only the entrypoint:

cargo run -- reverse \
  --mode cfg \
  --bytecodes-file program.so \
  --out-dir ./out \
  --only-entrypoint

Create a functions.json file:
```
{
 "functions": ["17014"]
}
```

Run dotting:

cargo run -- dotting \
  --config ./functions.json \
  --reduced-dot ./out/cfg.dot \
  --full-dot ./full/cfg.dot

Visualize the result:
```
xdot ./out/updated_cfg.dot
```

Example showcase

Before

digraph {
graph [
rankdir=LR;
concentrate=True;
style=filled;
color=lightgrey;
];
node [
shape=rect;
style=filled;
fillcolor=white;
fontname="Courier New";
];
edge [
fontname="Courier New";
];
  subgraph cluster_369287 {
    label="entrypoint";
    tooltip=lbb_369287;
    lbb_369287 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r2, r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -80</td></tr><tr><td align="left">call</td><td align="left">function_387396</td></tr></table>>];
    lbb_369291 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r2, [r10-0x50]</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x48]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x68], r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r4, [r10-0x38]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x58], r4</td></tr><tr><td align="left">ldxdw</td><td align="left">r3, [r10-0x40]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x60], r3</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x28]</td></tr><tr><td align="left">ldxdw</td><td align="left">r5, [r10-0x30]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x1000], r5</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0xff8], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -32</td></tr><tr><td align="left">mov64</td><td align="left">r5, r10</td></tr><tr><td align="left">call</td><td align="left">function_336430</td></tr></table>>];
    lbb_369306 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r6, 0</td></tr><tr><td align="left">ldxw</td><td align="left">r1, [r10-0x20]</td></tr><tr><td align="left">jeq</td><td align="left">r1, 22, lbb_369321</td></tr></table>>];
    lbb_369309 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x8]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x38], r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x10]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x40], r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x18]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x48], r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x20]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x50], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -80</td></tr><tr><td align="left">call</td><td align="left">function_389183</td></tr></table>>];
    lbb_369320 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r6, r0</td></tr></table>>];
    lbb_369321 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -104</td></tr><tr><td align="left">call</td><td align="left">function_17014</td></tr></table>>];
    lbb_369324 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r0, r6</td></tr><tr><td align="left">exit</td></tr></table>>];
  }
  lbb_369287 -> lbb_409579 [style=dotted; arrowhead=none];
  lbb_369287 -> {lbb_369291};
  lbb_369291 -> lbb_369287 [style=dotted; arrowhead=none];
  lbb_369291 -> {lbb_369306};
  lbb_369306 -> lbb_369291 [style=dotted; arrowhead=none];
  lbb_369306 -> {lbb_369309 lbb_369321};
  lbb_369309 -> lbb_369306 [style=dotted; arrowhead=none];
  lbb_369309 -> {lbb_369320};
  lbb_369320 -> lbb_369309 [style=dotted; arrowhead=none];
  lbb_369320 -> {lbb_369321};
  lbb_369321 -> lbb_369306 [style=dotted; arrowhead=none];
  lbb_369321 -> {lbb_369324};
  lbb_369324 -> lbb_369321 [style=dotted; arrowhead=none];
}


After

digraph {
graph [
rankdir=LR;
concentrate=True;
style=filled;
color=lightgrey;
];
node [
shape=rect;
style=filled;
fillcolor=white;
fontname="Courier New";
];
edge [
fontname="Courier New";
];
  subgraph cluster_369287 {
    label="entrypoint";
    tooltip=lbb_369287;
    lbb_369287 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r2, r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -80</td></tr><tr><td align="left">call</td><td align="left">function_387396</td></tr></table>>];
    lbb_369291 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r2, [r10-0x50]</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x48]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x68], r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r4, [r10-0x38]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x58], r4</td></tr><tr><td align="left">ldxdw</td><td align="left">r3, [r10-0x40]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x60], r3</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x28]</td></tr><tr><td align="left">ldxdw</td><td align="left">r5, [r10-0x30]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x1000], r5</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0xff8], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -32</td></tr><tr><td align="left">mov64</td><td align="left">r5, r10</td></tr><tr><td align="left">call</td><td align="left">function_336430</td></tr></table>>];
    lbb_369306 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r6, 0</td></tr><tr><td align="left">ldxw</td><td align="left">r1, [r10-0x20]</td></tr><tr><td align="left">jeq</td><td align="left">r1, 22, lbb_369321</td></tr></table>>];
    lbb_369309 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x8]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x38], r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x10]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x40], r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x18]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x48], r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r1, [r10-0x20]</td></tr><tr><td align="left">stxdw</td><td align="left">[r10-0x50], r1</td></tr><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -80</td></tr><tr><td align="left">call</td><td align="left">function_389183</td></tr></table>>];
    lbb_369320 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r6, r0</td></tr></table>>];
    lbb_369321 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r1, r10</td></tr><tr><td align="left">add64</td><td align="left">r1, -104</td></tr><tr><td align="left">call</td><td align="left">function_17014</td></tr></table>>];
    lbb_369324 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r0, r6</td></tr><tr><td align="left">exit</td></tr></table>>];
  }
  lbb_369287 -> lbb_409579 [style=dotted; arrowhead=none];
  lbb_369287 -> {lbb_369291};
  lbb_369291 -> lbb_369287 [style=dotted; arrowhead=none];
  lbb_369291 -> {lbb_369306};
  lbb_369306 -> lbb_369291 [style=dotted; arrowhead=none];
  lbb_369306 -> {lbb_369309 lbb_369321};
  lbb_369309 -> lbb_369306 [style=dotted; arrowhead=none];
  lbb_369309 -> {lbb_369320};
  lbb_369320 -> lbb_369309 [style=dotted; arrowhead=none];
  lbb_369320 -> {lbb_369321};
  lbb_369321 -> lbb_369306 [style=dotted; arrowhead=none];
  lbb_369321 -> {lbb_369324};
  lbb_369324 -> lbb_369321 [style=dotted; arrowhead=none];

subgraph cluster_17014 {
    label="function_17014";
    tooltip=lbb_17014;
    lbb_17014 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r6, r1</td></tr><tr><td align="left">ldxdw</td><td align="left">r7, [r6+0x10]</td></tr><tr><td align="left">jeq</td><td align="left">r7, 0, lbb_17036</td></tr></table>>];
    lbb_17017 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r8, [r6+0x8]</td></tr><tr><td align="left">mul64</td><td align="left">r7, 48</td></tr><tr><td align="left">add64</td><td align="left">r8, 16</td></tr><tr><td align="left">ja</td><td align="left">lbb_17043</td></tr></table>>];
    lbb_17043 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r8-0x8]</td></tr><tr><td align="left">ldxdw</td><td align="left">r2, [r1+0x0]</td></tr><tr><td align="left">add64</td><td align="left">r2, -1</td></tr><tr><td align="left">stxdw</td><td align="left">[r1+0x0], r2</td></tr><tr><td align="left">jne</td><td align="left">r2, 0, lbb_17021</td></tr></table>>];
    lbb_17048 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r2, [r1+0x8]</td></tr><tr><td align="left">add64</td><td align="left">r2, -1</td></tr><tr><td align="left">stxdw</td><td align="left">[r1+0x8], r2</td></tr><tr><td align="left">jne</td><td align="left">r2, 0, lbb_17021</td></tr></table>>];
    lbb_17052 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r2, 32</td></tr><tr><td align="left">mov64</td><td align="left">r3, 8</td></tr><tr><td align="left">call</td><td align="left">function_373318</td></tr></table>>];
    lbb_17055 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ja</td><td align="left">lbb_17021</td></tr></table>>];
    lbb_17021 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r8+0x0]</td></tr><tr><td align="left">ldxdw</td><td align="left">r2, [r1+0x0]</td></tr><tr><td align="left">add64</td><td align="left">r2, -1</td></tr><tr><td align="left">stxdw</td><td align="left">[r1+0x0], r2</td></tr><tr><td align="left">jne</td><td align="left">r2, 0, lbb_17033</td></tr></table>>];
    lbb_17026 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r2, [r1+0x8]</td></tr><tr><td align="left">add64</td><td align="left">r2, -1</td></tr><tr><td align="left">stxdw</td><td align="left">[r1+0x8], r2</td></tr><tr><td align="left">jne</td><td align="left">r2, 0, lbb_17033</td></tr></table>>];
    lbb_17030 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">mov64</td><td align="left">r2, 40</td></tr><tr><td align="left">mov64</td><td align="left">r3, 8</td></tr><tr><td align="left">call</td><td align="left">function_373318</td></tr></table>>];
    lbb_17033 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">add64</td><td align="left">r8, 48</td></tr><tr><td align="left">add64</td><td align="left">r7, -48</td></tr><tr><td align="left">jne</td><td align="left">r7, 0, lbb_17043</td></tr></table>>];
    lbb_17036 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r2, [r6+0x0]</td></tr><tr><td align="left">jeq</td><td align="left">r2, 0, lbb_17056</td></tr></table>>];
    lbb_17038 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ldxdw</td><td align="left">r1, [r6+0x8]</td></tr><tr><td align="left">mul64</td><td align="left">r2, 48</td></tr><tr><td align="left">mov64</td><td align="left">r3, 8</td></tr><tr><td align="left">call</td><td align="left">function_373318</td></tr></table>>];
    lbb_17042 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">ja</td><td align="left">lbb_17056</td></tr></table>>];
    lbb_17056 [label=<<table border="0" cellborder="0" cellpadding="3"><tr><td align="left">exit</td></tr></table>>];
  }

lbb_17014 -> {lbb_17017 lbb_17036};
lbb_17017 -> {lbb_17043};
lbb_17021 -> {lbb_17026 lbb_17033};
lbb_17026 -> {lbb_17030 lbb_17033};
lbb_17030 -> {lbb_17033};
lbb_17033 -> {lbb_17036 lbb_17043};
lbb_17036 -> {lbb_17038 lbb_17056};
lbb_17038 -> {lbb_17042};
lbb_17042 -> {lbb_17056};
lbb_17043 -> {lbb_17021 lbb_17048};
lbb_17048 -> {lbb_17021 lbb_17052};
lbb_17052 -> {lbb_17055};
lbb_17055 -> {lbb_17021};
lbb_409579 -> {lbb_17014 lbb_369287};
}

[Figure: after_cfg]

Behavior Notes

Edges are only added if both source and target basic blocks are already present.
To get edges to blocks that are not yet in the graph, reinsert the clusters that contain those blocks as well.
The output, updated_cfg.dot, is written next to your original file.
The original cfg.dot is not modified.
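The edge rule above works as a filter over (source, target) pairs. This is a minimal sketch that assumes edges are plain pairs of block labels such as lbb_17014. `filter_edges` is a hypothetical helper, not the dotting module's actual code.

```rust
use std::collections::HashSet;

/// Keeps only the edges whose source AND target blocks are already in `present`.
/// Hypothetical helper illustrating the dotting rule; not sol-azy's actual code.
pub fn filter_edges<'a>(
    edges: &[(&'a str, &'a str)],
    present: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    edges
        .iter()
        .copied()
        // An edge survives only if both endpoints are known basic blocks.
        .filter(|(src, dst)| present.contains(*src) && present.contains(*dst))
        .collect()
}
```

A hash set gives constant-time membership checks on average, so the filter stays cheap even on very large graphs.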

Tips

Combine the --only-entrypoint option with dotting to build your CFG incrementally. Start from the entrypoint-only graph, then reinsert only the functions you need.

Architecture

sol-azy is a modular static analysis toolkit designed to work on Solana programs compiled to eBPF.
It can disassemble bytecode, analyze control flow, decode strings embedded in .rodata, and perform pattern-based syntactic analysis by matching rules against the Rust AST.

High-Level Design

sol-azy is structured around three main engines, supported by auxiliary modules:

Core Engines

Reverse Engine: Handles binary-level disassembly, control flow graph generation, and .rodata analysis. → Triggered via the reverse CLI command.
SAST Engine: Performs static source-level analysis using Starlark-based rule evaluation on Rust ASTs. → Triggered via the sast CLI command.
Build Engine: Detects the project type (Anchor, SBF) and compiles the bytecode accordingly. → Triggered via the build CLI command.

Supporting Modules

Dotting Module: Allows users to manually reintroduce function clusters into reduced CFGs by editing .dot files after generation. → Useful for large programs or targeted function exploration.
Fetcher Module: Retrieves deployed program bytecode directly from on-chain Solana accounts via RPC. → Enables reverse analysis even without access to local source code.

Each component is designed to be composable and scriptable, making sol-azy flexible for both auditing and program analysis workflows.

Component Overview

1. `reverse/`

Handles all operations on compiled .so eBPF files:

disass.rs: Disassembler with inline .rodata resolution
cfg.rs: Generates DOT graphs from the control flow
immediate_tracker.rs: Tracks data regions used by LD_DW_IMM
utils.rs: String formatting and decoding

It produces:

disassembly.out
immediate_data_table.out
cfg.dot
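To make the cfg.rs output concrete, the sketch below renders one function as a DOT cluster. It emits one node per basic block, followed by edges in the same `a -> {b c};` style used in cfg.dot. It is a simplified, hypothetical sketch: `BasicBlock` and `emit_cluster` are not sol-azy names, and it uses plain-text labels where the real output uses HTML-like instruction tables.

```rust
use std::fmt::Write;

/// Hypothetical basic block: a DOT label, its instruction text, and successor labels.
pub struct BasicBlock {
    pub label: String,
    pub instructions: Vec<String>,
    pub successors: Vec<String>,
}

/// Renders one function as a DOT `subgraph cluster_<id>`, followed by its edges.
pub fn emit_cluster(id: u64, name: &str, blocks: &[BasicBlock]) -> String {
    let mut out = String::new();
    writeln!(out, "  subgraph cluster_{id} {{").unwrap();
    writeln!(out, "    label=\"{name}\";").unwrap();
    for bb in blocks {
        // Plain-text label; `\l` left-aligns each instruction line in Graphviz.
        let body: String = bb.instructions.iter().map(|i| format!("{i}\\l")).collect();
        writeln!(out, "    {} [label=\"{}\"];", bb.label, body).unwrap();
    }
    writeln!(out, "  }}").unwrap();
    for bb in blocks {
        if !bb.successors.is_empty() {
            // Same grouped-successor edge syntax as in cfg.dot.
            writeln!(out, "  {} -> {{{}}};", bb.label, bb.successors.join(" ")).unwrap();
        }
    }
    out
}
```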

→ See Reverse Overview
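In the eBPF encoding that sBPF inherits, LD_DW_IMM (lddw) spans two 8-byte instruction slots so it can carry a full 64-bit immediate. That makes it a natural place to look for pointers into .rodata. The sketch below assumes the standard slot layout: an opcode byte, a register byte with dst in the low nibble, a 16-bit offset, and a 32-bit little-endian immediate. It also assumes the caller supplies the .rodata virtual-address range. `track_immediates` and `ImmRef` are hypothetical names, not the API of immediate_tracker.rs.

```rust
use std::ops::Range;

/// Opcode of the two-slot "load 64-bit immediate" instruction (LD_DW_IMM / lddw).
const LD_DW_IMM: u8 = 0x18;
/// Every instruction slot is 8 bytes.
const SLOT: usize = 8;

/// One lddw occurrence: slot index, destination register, and the 64-bit immediate.
#[derive(Debug, PartialEq)]
pub struct ImmRef {
    pub pc: usize,
    pub dst: u8,
    pub value: u64,
}

/// Scans raw .text bytes and returns the lddw immediates that fall inside `rodata`.
/// Hypothetical sketch; the address range is an assumption supplied by the caller.
pub fn track_immediates(text: &[u8], rodata: Range<u64>) -> Vec<ImmRef> {
    let mut hits = Vec::new();
    let mut pc = 0;
    while (pc + 1) * SLOT <= text.len() {
        let insn = &text[pc * SLOT..(pc + 1) * SLOT];
        if insn[0] == LD_DW_IMM && (pc + 2) * SLOT <= text.len() {
            let next = &text[(pc + 1) * SLOT..(pc + 2) * SLOT];
            // Low 32 bits come from the first slot's imm field, high 32 bits from the second.
            let lo = u32::from_le_bytes([insn[4], insn[5], insn[6], insn[7]]) as u64;
            let hi = u32::from_le_bytes([next[4], next[5], next[6], next[7]]) as u64;
            let value = lo | (hi << 32);
            if rodata.contains(&value) {
                hits.push(ImmRef { pc, dst: insn[1] & 0x0f, value });
            }
            pc += 2; // lddw occupies two slots
        } else {
            pc += 1;
        }
    }
    hits
}
```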

2. `parsers/` + `state/sast_state.rs`

Used in source-level static analysis.

Parses all .rs files into syn::File ASTs
Builds AstPositions with span references
Applies Starlark-based rules to nodes and attributes
Aggregates findings into a SastState

→ See SAST Overview
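To report a finding, the tool has to turn a source position into a line and column. syn spans may already carry line/column information through proc-macro2, so sol-azy may not need this step. The sketch below only shows the underlying mapping: a precomputed table of line-start offsets plus a binary search. `LineIndex` is a hypothetical name.

```rust
/// Maps byte offsets in a source file to 1-based (line, column) pairs.
/// Hypothetical helper; columns are counted in bytes, not characters.
pub struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    pub fn new(source: &str) -> Self {
        // Line 1 starts at offset 0; each '\n' starts a new line right after it.
        let mut line_starts = vec![0];
        for (i, b) in source.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Returns the 1-based line and column of `offset`.
    pub fn position(&self, offset: usize) -> (usize, usize) {
        // Find the last line start that is <= offset.
        let line = match self.line_starts.binary_search(&offset) {
            Ok(i) => i,
            Err(i) => i - 1, // safe: line_starts[0] == 0 <= offset
        };
        (line + 1, offset - self.line_starts[line] + 1)
    }
}
```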

3. `engines/starlark_engine.rs`

Embeds a Starlark interpreter to run user-defined rules against Rust ASTs.

Prepares the AST (JSON + span tracking)
Loads .star files from rules/
Invokes syn_ast_rule(...) with context
Collects matches and metadata as JSON

→ See Writing Rules
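The "Loads .star files from rules/" step can be sketched as a directory scan. The sketch assumes rules are flat `*.star` files and that a sorted, deterministic order is desirable. `discover_rules` is a hypothetical helper, not the engine's actual loader, and it does not evaluate any Starlark.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects every `*.star` file directly inside `rules_dir`, sorted for a stable rule order.
/// Hypothetical helper; it does not recurse into subdirectories.
pub fn discover_rules(rules_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut rules = Vec::new();
    for entry in fs::read_dir(rules_dir)? {
        let path = entry?.path();
        let is_star = path.extension().and_then(|e| e.to_str()) == Some("star");
        if path.is_file() && is_star {
            rules.push(path);
        }
    }
    rules.sort();
    Ok(rules)
}
```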

4. `commands/`

Command routing layer:

build_command.rs: Uses Anchor/Cargo to compile .so files
reverse_command.rs: Dispatches disass + cfg generation
sast_command.rs: Launches Starlark rule scanning

AppState::run_cli() calls into these modules to route each subcommand.

5. `helpers/`

Utilities to:

Detect project type (Anchor.toml, Cargo.toml)
Check external dependencies (cargo, anchor)
Run subprocesses with environment overrides
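Here is a minimal sketch of the project-type check. It assumes detection relies only on which of the two manifest files named above is present. `ProjectKind` and `detect_project` are hypothetical names.

```rust
use std::path::Path;

/// Project kinds the build step distinguishes (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum ProjectKind {
    Anchor,
    Sbf,
    Unknown,
}

/// Classifies a project root by the manifest files it contains.
pub fn detect_project(root: &Path) -> ProjectKind {
    // Check Anchor.toml first: an Anchor workspace also contains a Cargo.toml.
    if root.join("Anchor.toml").is_file() {
        ProjectKind::Anchor
    } else if root.join("Cargo.toml").is_file() {
        ProjectKind::Sbf
    } else {
        ProjectKind::Unknown
    }
}
```

The order of the checks decides the result. Testing for Cargo.toml first would misclassify every Anchor workspace as a plain SBF project.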

6. `dotting/`

Post-processing module for .dot control flow graphs:

Allows restoring function subgraphs in reduced or entrypoint-only graphs
Takes as input a list of function IDs (clusters) to reinject
Outputs an updated_*.dot file with the requested functions and edges

This module is especially useful when a full CFG is too large or noisy, letting analysts rebuild targeted graphs incrementally.

→ See Dotting
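Reinserting a function starts with copying its `subgraph cluster_<id>` block out of the full graph. This is a minimal sketch that assumes clusters are named `cluster_<id>`, as in the dotting example, and that node labels contain no raw braces. `extract_cluster` is hypothetical, not the dotting module's actual code.

```rust
/// Returns the full `subgraph cluster_<id> { ... }` block from a DOT document, if present.
/// Braces are matched by depth; braces inside quoted labels are not handled by this sketch.
pub fn extract_cluster(dot: &str, id: u64) -> Option<&str> {
    // The trailing " {" prevents cluster_17 from matching cluster_17014.
    let header = format!("subgraph cluster_{id} {{");
    let start = dot.find(&header)?;
    let mut depth = 0usize;
    for (i, c) in dot[start..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&dot[start..start + i + 1]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced braces
}
```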

7. `fetcher/`

Bytecode retrieval module for on-chain programs:

Connects to Solana RPC endpoints
Downloads the deployed .so bytecode of a program ID
Saves the ELF file locally for reverse analysis

This feature is useful for audits where source code is unavailable or unverifiable.

→ See Fetcher
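For programs deployed with the upgradeable BPF loader, the program account does not hold the bytecode itself. It stores the address of a separate ProgramData account, whose data begins with a 45-byte metadata header followed by the ELF. The sketch below shows that parsing step on raw account data, for example the base64-decoded `data` field of a `getAccountInfo` RPC response. It assumes the bincode layout of `UpgradeableLoaderState`, which starts with a 4-byte little-endian variant tag (2 = Program, 3 = ProgramData). The function names are hypothetical, and sol-azy's fetcher may obtain the bytes differently.

```rust
/// bincode variant tags of UpgradeableLoaderState (little-endian u32).
const TAG_PROGRAM: u32 = 2;
const TAG_PROGRAM_DATA: u32 = 3;
/// ProgramData header: tag (4) + slot (8) + Option tag (1) + authority pubkey (32).
const PROGRAMDATA_METADATA_LEN: usize = 45;

fn tag(data: &[u8]) -> Option<u32> {
    let b = data.get(0..4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// From a Program account's data, returns the 32-byte ProgramData address.
pub fn programdata_address(data: &[u8]) -> Option<[u8; 32]> {
    if tag(data)? != TAG_PROGRAM {
        return None;
    }
    let mut addr = [0u8; 32];
    addr.copy_from_slice(data.get(4..36)?);
    Some(addr)
}

/// From a ProgramData account's data, returns the ELF bytes after the metadata header.
pub fn elf_bytes(data: &[u8]) -> Option<&[u8]> {
    if tag(data)? != TAG_PROGRAM_DATA {
        return None;
    }
    let elf = data.get(PROGRAMDATA_METADATA_LEN..)?;
    // Sanity check: ELF files start with the magic bytes 0x7F 'E' 'L' 'F'.
    if elf.starts_with(b"\x7fELF") { Some(elf) } else { None }
}
```

If the ProgramData account was allocated with extra space for future upgrades, the returned slice may end with trailing zero bytes.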

Output Flow

        +----------------+
        | .so Bytecode   | ← built via cargo / anchor
        +----------------+
                 |
         [reverse_command]
                 ↓
        +--------------------------+
        |  Analysis (sbpf-solana)  |
        |  + immediate tracker     |
        +--------------------------+
         ↓                    ↓                 ↓
  disassembly.out  immediate_data_table.out  cfg.dot

--------------------------------------

        +----------------+
        |  Rust Source   | ← e.g. Anchor project
        +----------------+
                 |
           [sast_command]
                 ↓
        +----------------------+
        | syn::File + spans    |
        | + rule evaluation    |
        +----------------------+
                 ↓
             Findings
           (printed / JSON)

External Dependencies

sbpf-solana (anza-xyz): Disassembly / Analysis core
syn: Source AST parsing
starlark-rust: Rule evaluation engine

Extensibility

The architecture is modular and designed for extension:

Add new output formats by extending ReverseOutputMode
Plug in new analysis passes (e.g., MIR or LLVM IR) in engines/
Write new rules as .star files, without modifying any Rust code
Integrate sol-azy into CI pipelines through its CLI


`AppState` Architecture

The AppState struct acts as the central orchestrator for sol-azy’s CLI runtime.
It coordinates the execution of commands like build, sast and reverse, and stores the resulting internal states across executions.

Where It Lives

File: src/state/app_state.rs

Responsibilities

Parse and dispatch the correct CLI subcommand via run_cli()
Track cumulative state (e.g., build results, SAST matches)
Encapsulate application-wide control flow

Structure

pub struct AppState {
    pub cli: Cli,
    pub build_states: Vec<BuildState>,
    pub sast_states: Vec<SastState>,
}

cli: The parsed Cli object from Clap, holding user input
build_states: Stores results of build_command::run(...)
sast_states: Stores results of sast_command::run(...)

Core Method: `run_cli`

This is the entrypoint for CLI usage:

impl AppState {
    pub fn run_cli(&mut self) {
        // Matches on self.cli.command and delegates to the corresponding handler.
    }
}

It matches on the selected Commands enum variant:

match &self.cli.command {
    Commands::Build { ... } => self.build_project(...),
    Commands::Sast { ... } => self.run_sast(...),
    Commands::Reverse { ... } => self.run_reverse(...),
    ...
}

Each arm delegates to a method that:

Executes the logic of the command
Logs the outcome
Updates the matching state vector (build_states or sast_states); the reverse command stores no state
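The dispatch-and-record pattern described above fits in a few lines. This is a simplified, self-contained sketch. `Commands`, `BuildState`, `SastState`, their fields, and the arm bodies are hypothetical stand-ins, not sol-azy's real types. A plain `command` field replaces the Clap `Cli`. The sketch only shows that build and sast results are appended to their state vectors while reverse leaves state untouched.

```rust
/// Stand-in for the Clap subcommands (fields are hypothetical).
pub enum Commands {
    Build { target_dir: String },
    Sast { target_dir: String },
    Reverse { bytecode: String },
}

pub struct BuildState {
    pub target_dir: String,
}

pub struct SastState {
    pub target_dir: String,
    pub findings: usize,
}

/// Simplified AppState: holds the selected command plus accumulated results.
pub struct AppState {
    pub command: Commands,
    pub build_states: Vec<BuildState>,
    pub sast_states: Vec<SastState>,
}

impl AppState {
    pub fn run_cli(&mut self) {
        match &self.command {
            Commands::Build { target_dir } => {
                // A real implementation would compile the project here.
                self.build_states.push(BuildState { target_dir: target_dir.clone() });
            }
            Commands::Sast { target_dir } => {
                // A real implementation would run the Starlark rules here.
                self.sast_states.push(SastState { target_dir: target_dir.clone(), findings: 0 });
            }
            Commands::Reverse { bytecode } => {
                // Reverse only logs and writes files; no state vector is updated.
                println!("reverse: analyzing {bytecode}");
            }
        }
    }
}
```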

Why is `AppState` needed?

sol-azy is a multi-command CLI application, and AppState provides:

A consistent runtime container to track what’s been executed
A clean separation of CLI logic from actual analysis logic

Example Flow

cargo run -- build --target-dir myproj --out-dir myproj/out

Leads to:

AppState::run_cli()
→ AppState::build_project(...)
→ build_command::run(...)
→ Result stored in app_state.build_states

`SastEngine` Architecture

Evolution of sol-azy

sol-azy is under active development, and we have a few ideas for new features.