Learn Ghidra, Chapter 2: Exploring Control Flow & Data References

by Evgenii Studitskikh

In the previous chapter, we got Ghidra installed and walked through a basic reverse-engineering example: a simple password checker written in C. We imported the binary, viewed assembly and pseudocode, and even patched a jump instruction. In this chapter, we’ll go deeper: we’ll explore how to trace control flow (branches, loops) and how to follow data references (strings, global variables). These techniques are essential when dealing with more complex programs, where logic is spread across multiple functions and data is tucked away in unexpected places.

We’ll use another small example to illustrate—this time with branching and a lookup table—and then see how Ghidra helps us see the “shape” of the program, not just linear code.

Example: A “license key checker” with branching and a table

Here’s a toy C program that checks a license key against a small table and branches according to a match:

#include <stdio.h>
#include <string.h>

int validate(const char *key) {
    // A small table of valid keys
    const char *valid[] = {
        "ABC123", 
        "XYZ789", 
        "HELLO42"
    };
    int n = sizeof(valid) / sizeof(valid[0]);
    for (int i = 0; i < n; i++) {
        if (strcmp(key, valid[i]) == 0) {
            return i + 1;  // returns 1, 2, or 3
        }
    }
    return 0;  // invalid
}

int main(void) {
    char input[32];
    printf("Enter license key: ");
    if (!fgets(input, sizeof(input), stdin)) {
        return 1;
    }
    input[strcspn(input, "\n")] = 0;
    int res = validate(input);
    switch (res) {
        case 1:
            printf("Key ABC123 accepted\n");
            break;
        case 2:
            printf("Key XYZ789 accepted\n");
            break;
        case 3:
            printf("Key HELLO42 accepted\n");
            break;
        default:
            printf("Invalid key\n");
            break;
    }
    return 0;
}

Compile it (for example, with gcc -m64 -o license_checker license.c). The logic is simple, but the compiled binary no longer contains the switch statement as such, and local variable names are gone (function names disappear too if the binary is stripped). With Ghidra, we can tease out the flow and understand the branching.

Step 1: Import and Analyze as before

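Import license_checker into your Ghidra project, run the default auto-analysis, and open it in the CodeBrowser. In the Symbol Tree, expand Functions and locate main. If the binary is stripped and main has no name, start at the entry point (usually _start) and follow the function pointer it passes to __libc_start_main. Also look for validate.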

Step 2: Observe the control flow in main

Click on main. In the Listing (assembly side), you will see instructions corresponding to:

  • Calling validate
  • The switch dispatch (with only three cases, usually a chain of compares and conditional jumps rather than a jump table)
  • Different paths leading to different printf calls

Ghidra’s Decompiler window often reconstructs this dispatch as a switch (res) with case labels, or as an if/else chain when the compiler emitted plain compares. Either way, it gives you a higher-level view. But don’t rely solely on pseudocode: go back and forth between the decompiled view and the disassembly to confirm details.

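The Listing also draws flow arrows in its left margin, connecting each jump to its target. For a block-level picture, open the Function Graph (Window → Function Graph): each basic block becomes a node, and with the default colors a green edge marks a conditional jump that is taken, a red edge marks the fall-through path, and a blue edge marks an unconditional jump. This makes it easy to track visually which basic blocks jump where.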
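To connect the disassembly with the source, here is a rough sketch of how the switch in main might look once lowered to compares and jumps. It is written in C with goto so that each line maps onto one branch; message_for is a hypothetical helper that is not part of the original program, and the exact instruction sequence your compiler emits may differ.

```c
/* The switch from main(), rewritten as the compare-and-branch chain a
 * compiler may emit when there are only a few cases.
 * message_for() is a hypothetical helper used only for illustration. */
const char *message_for(int res) {
    if (res == 1) goto case_1;      /* roughly: cmp res, 1 ; je case_1 */
    if (res == 2) goto case_2;      /* roughly: cmp res, 2 ; je case_2 */
    if (res == 3) goto case_3;      /* roughly: cmp res, 3 ; je case_3 */
    goto case_default;              /* unconditional jump to the default arm */

case_1:
    return "Key ABC123 accepted";
case_2:
    return "Key XYZ789 accepted";
case_3:
    return "Key HELLO42 accepted";
case_default:
    return "Invalid key";
}
```

Each if/goto pair corresponds to one conditional edge in the graph, and the final goto is the path taken when no case matches.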

Step 3: Find the table of valid keys (data reference)

In your decompiled validate function, you’ll see something like:

for (i = 0; i < 3; i++) {
  if (strcmp(key, *(valid + i)) == 0) {
    ...
  }
}

But where is valid stored? The key strings themselves live in the read-only data section (.rodata). Open Window → Defined Strings (or run Search → For Strings) and you should see "ABC123", "XYZ789", and "HELLO42" in the list. Double-click one of them and the Listing jumps to the address where it’s stored.
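As a rough idea of what a string search does, the sketch below scans a byte buffer for NUL-terminated runs of printable characters. This is a simplified illustration, not Ghidra's actual analyzer, and count_c_strings is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Count NUL-terminated runs of at least min_len printable ASCII bytes.
 * A simplified sketch of string discovery; count_c_strings() is a
 * hypothetical name and real tools apply more heuristics. */
size_t count_c_strings(const unsigned char *buf, size_t len, size_t min_len) {
    size_t count = 0;
    size_t run = 0;                 /* length of the current printable run */

    for (size_t i = 0; i < len; i++) {
        if (isprint(buf[i])) {
            run++;
        } else {
            /* A run only counts as a string if a NUL byte ends it. */
            if (buf[i] == '\0' && run >= min_len) {
                count++;
            }
            run = 0;
        }
    }
    return count;
}
```

The minimum length matters: set it too low and short byte sequences that merely happen to be printable show up as strings.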

Then right-click the string and choose References → Show References to Address. You’ll see which instructions refer to it (e.g., the instruction in validate that loads the string’s address before the strcmp call). That gives you a trail: from a constant string to code logic.

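The array valid[] itself is a table of pointers to those strings. Because it is a non-static local in this example, an unoptimized build typically fills it in on the stack at the start of validate; had it been declared static, you would find the pointer table in a data section instead. In both cases, the loop computes the address of each entry as the table’s base address plus the index times the pointer size.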
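The address arithmetic behind valid[i] can be spelled out by hand. The sketch below assumes nothing beyond standard C; table_entry is a hypothetical helper, and the stride is sizeof(const char *), which is 8 bytes on a typical 64-bit build.

```c
#include <stddef.h>
#include <stdint.h>

/* Fetch entry i of a pointer table using explicit address arithmetic,
 * the way scaled-index addressing does it in the disassembly.
 * table_entry() is a hypothetical helper for illustration. */
const char *table_entry(const char *const *table, size_t i) {
    uintptr_t base = (uintptr_t)table;                   /* start of the table */
    uintptr_t slot = base + i * sizeof(const char *);    /* base + index * stride */
    return *(const char *const *)slot;                   /* load the stored pointer */
}
```

In the Listing, that multiplication usually appears as a scale factor inside the memory operand of the load.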

Step 4: Cross-function control flow and call graph

Once you’ve grasped main and validate, use Ghidra to see how functions interconnect. In the Code Browser:

  • With a function selected, open Window → Function Call Graph (or Window → Function Call Trees for a list of incoming and outgoing calls).
  • You’ll see nodes for main, validate, strcmp, printf, etc., with arrows showing who calls whom.

This is especially helpful as binaries grow to dozens of functions and many calls. The call graph gives you the “big picture”; then you can zoom into each part.
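A call graph is just a set of caller-to-callee edges. The sketch below encodes the edges read off the example's source code and answers "can this function reach that one?". The names call_edge, EDGES, and reaches are hypothetical, and the graph in your binary may differ slightly (for instance, a compiler may replace a printf of a constant string with puts).

```c
#include <stddef.h>
#include <string.h>

/* One edge of the call graph: caller invokes callee. */
struct call_edge {
    const char *caller;
    const char *callee;
};

/* Edges taken from the example's source code (an assumption about the
 * compiled binary, which may differ). */
static const struct call_edge EDGES[] = {
    { "main", "printf" },
    { "main", "fgets" },
    { "main", "strcspn" },
    { "main", "validate" },
    { "validate", "strcmp" },
};
static const size_t EDGE_COUNT = sizeof EDGES / sizeof EDGES[0];

/* Return 1 if 'to' is reachable from 'from' through one or more calls.
 * Plain recursion is enough here because this small graph has no cycles. */
int reaches(const char *from, const char *to) {
    for (size_t i = 0; i < EDGE_COUNT; i++) {
        if (strcmp(EDGES[i].caller, from) != 0) {
            continue;
        }
        if (strcmp(EDGES[i].callee, to) == 0 || reaches(EDGES[i].callee, to)) {
            return 1;
        }
    }
    return 0;
}
```

For example, reaches("main", "strcmp") is true only through validate, which is exactly the path the graph view shows. A graph with recursion would need a visited set to avoid looping forever.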

Step 5: Annotate, comment, and rename for clarity

One of Ghidra’s most helpful features is letting you rename functions, variables, and parameters as you understand them. E.g.:

  • Rename FUN_00401234 (the kind of placeholder name Ghidra assigns when a function has no symbol) to validate_key
  • Rename local_10 to key_input
  • Add comments above a block like “// branch for valid key 2 (XYZ789)”

These annotations don’t change the binary, but they greatly improve readability as you explore.
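To show how much difference naming makes, here is the same function twice. The first version is a hand-written imitation of unannotated decompiler output; FUN_00401234, param_1, local_c, and PTR_table are placeholders in the style Ghidra generates, and real output will differ. The second version is what it might look like after renaming and commenting.

```c
#include <string.h>

/* Before: imitation of raw decompiler output (all names are placeholders). */
static const char *const PTR_table[3] = { "ABC123", "XYZ789", "HELLO42" };

int FUN_00401234(const char *param_1) {
    int local_c;
    for (local_c = 0; local_c < 3; local_c++) {
        if (strcmp(param_1, PTR_table[local_c]) == 0) {
            return local_c + 1;
        }
    }
    return 0;
}

/* After: identical logic with descriptive names and a comment. */
static const char *const valid_keys[3] = { "ABC123", "XYZ789", "HELLO42" };

int validate_key(const char *key_input) {
    /* Returns the 1-based index of the matching key, or 0 if none matches. */
    for (int key_index = 0; key_index < 3; key_index++) {
        if (strcmp(key_input, valid_keys[key_index]) == 0) {
            return key_index + 1;
        }
    }
    return 0;
}
```

Both functions behave identically; only the second tells you what it is for at a glance.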

What did we learn?

  • Control flow exploration: Branches, loops, and switch statements get reconstructed in decompiled pseudocode, but checking them against the assembly and its flow arrows confirms what the code really does.
  • Data references: Strings and tables in the data section can be traced back to the code that uses them via the “References” tools; they often betray program logic (e.g., valid values, messages).
  • Call graphs: They show the large-scale structure of a program and how its functions interact.
  • Renaming & comments: Essential for turning a confusing disassembly into readable, documented code.

Exercises you can try

  1. Add another key: Recompile after inserting one more valid key (e.g. "OPEN123") and re-run Ghidra. See how the table shifts and how offsets change.
  2. Make it case-insensitive: Modify the program to compare keys ignoring upper/lower case (e.g. via strcasecmp from <strings.h>), and see how the imported function and the call site change in Ghidra.
  3. Obfuscate the table: Instead of storing the strings plainly, XOR each byte with a constant (e.g. 0x5A) and decode at runtime. Then reverse that logic in Ghidra to retrieve the original plaintext.
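As a starting point for the third exercise, here is a minimal sketch of the XOR step. XOR_KEY and xor_buffer are hypothetical names; the constant 0x5A is the one suggested in the exercise.

```c
#include <stddef.h>

#define XOR_KEY 0x5A   /* the constant suggested in the exercise */

/* XOR every byte of buf with XOR_KEY, in place.
 * Running it a second time restores the original bytes. */
void xor_buffer(unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= XOR_KEY;
    }
}
```

One thing to watch for: 'Z' has the value 0x5A, so the encoded form of "XYZ789" contains a zero byte. The obfuscated table therefore needs explicit lengths rather than relying on NUL terminators, and the encoded keys will not show up as ordinary strings in Ghidra.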
