Lightpanda Zig lightpanda-io/browser

Post-Chrome Browser Architecture

Walk the Zig build graph, the module map, the libcurl HTTP layer, the html5ever Rust-to-Zig parser bridge, the v8 embedding, and the headless event loop.

7 stops, ~22 min, verified 2026-05-04
What you will learn
  • How build.zig.zon pins v8, libcurl, BoringSSL, and html5ever into one Zig build
  • How src/lightpanda.zig acts as a directory listing of the architecture
  • How libcurl headers are wrapped in Zig with errdefer cleanup on partial failure
  • How html5ever drives DOM construction through a fourteen-callback table back into Zig
  • How Parser.zig connects the FFI boundary to per-page DOM ownership
  • How Env.zig creates one v8 isolate per browser, loads a snapshot, and installs callbacks
  • How Runner.zig polls libcurl, runs scripts, and pumps v8 queues without a compositor
Prerequisites
  • Comfort reading Zig (allocators, errdefer, extern struct)
  • Familiarity with C ABI FFI at a high level
  • No prior knowledge of v8, libcurl, or html5ever required
1 / 7

build.zig.zon: Three Engines, One Build

build.zig.zon:1

The package manifest pins v8, BoringSSL, curl, brotli, zlib, nghttp2, sqlite3, and libidn2 with content-addressed hashes.

A browser is not one engine; it is several, glued together. build.zig.zon is where Lightpanda lists what it borrows: v8 for JavaScript through the lightpanda-io/zig-v8-fork tag, curl 8.18.0 for HTTP, BoringSSL for TLS, plus brotli, zlib, nghttp2, sqlite3, and libidn2. html5ever is the one engine not in this file because it is built through cargo from src/html5ever/Cargo.toml, not through Zig's package manager.

The pinned minimum_zig_version = "0.15.2" is enforced at the top of build.zig, which raises a @compileError if your local Zig is older. Each dependency hash is content-addressed, so every dependency fetched through Zig's package manager is reproducible from this file alone; html5ever, built through cargo, is the exception. The commented-out // .v8 = .{ .path = "../zig-v8-fork" } line is the local-checkout escape hatch maintainers use when iterating on the v8 fork itself.

Key takeaway

One package manifest pins, by content hash, every native dependency that Zig fetches; html5ever is built separately through cargo. Reading it first tells you what foreign engines the browser links into Zig.

.{
    .name = .browser,
    .version = "1.0.0-dev",
    .fingerprint = 0xda130f3af836cea0, // Changing this has security and trust implications.
    .minimum_zig_version = "0.15.2",
    .dependencies = .{
        .v8 = .{
            .url = "https://github.com/lightpanda-io/zig-v8-fork/archive/refs/tags/v0.4.3.tar.gz",
            .hash = "v8-0.0.0-xddH61GNBABFJ11FJ8KDYXITyjKh4jQ54taEenYek2xJ",
        },
        // .v8 = .{ .path = "../zig-v8-fork" },
        .brotli = .{
2 / 7

src/lightpanda.zig Is the Module Map

src/lightpanda.zig:21

The top-level module reads like a directory listing of the architecture, with App, Network, Browser, Session, Page, Frame, js, and HttpClient as public exports.

Most projects bury the architecture under build configuration. Lightpanda surfaces it. src/lightpanda.zig is the single module the binary, the snapshot creator, and the legacy test all import as @import("lightpanda"). Reading it top to bottom tells you the shape of the system: App is the process root, Network is the curl multi handle, Server is the CDP listener, Browser owns one v8 isolate, Session holds a cookie jar, Page is a root frame plus a v8 context, Frame is a browsing context.

The grouping is intentional. Process and network types come first, then browsing context, then JS engine and dump formats, then wire-protocol pieces (CDP, MCP, cookies). Grep for any of these names elsewhere and you see them imported as const lp = @import("lightpanda"); followed by lp.Frame or lp.js. One file and one import make the whole API reachable.

Key takeaway

Read this file first. It is the directory listing of the architecture, in dependency order: process, network, browsing context, JS engine, output formats.

pub const log = @import("log.zig");
pub const App = @import("App.zig");
pub const Network = @import("network/Network.zig");
pub const Server = @import("Server.zig");
pub const Config = @import("Config.zig");
pub const String = @import("string.zig").String;
pub const Notification = @import("Notification.zig");

pub const URL = @import("browser/URL.zig");
pub const Page = @import("browser/Page.zig");
pub const Frame = @import("browser/Frame.zig");
pub const Browser = @import("browser/Browser.zig");
pub const Session = @import("browser/Session.zig");

pub const js = @import("browser/js/js.zig");
pub const dump = @import("browser/dump.zig");
pub const markdown = @import("browser/markdown.zig");
3 / 7

HTTP Headers Wrap libcurl's Linked List

src/network/http.zig:62

Headers wraps a curl_slist linked list, returning errors when curl_slist_append fails and using errdefer to clean up partial state.

curl_slist_append returns null on allocation failure and leaves any list it already built intact, which makes partial-failure cleanup the caller's job. Headers.init answers that with a Zig idiom: each curl_slist_append result is checked, a null becomes error.OutOfMemory, and errdefer libcurl.curl_slist_free_all(header_list) registers the cleanup as soon as the first append succeeds. If the second or third append fails, the errdefer runs and the partial list is freed; on success the errdefer never fires and ownership passes to the returned Headers.
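For readers more at home in Rust than Zig, the same shape can be sketched with a drop guard that is disarmed on success. This is an analogue under stated assumptions, not Lightpanda's code: RawList, raw_append, FreeOnError, and the budget counter that fakes allocation failure are all hypothetical names, and nothing here calls libcurl.

```rust
use std::cell::Cell;

#[derive(Debug)]
struct OutOfMemory;

/// Hypothetical stand-in for a C-owned list that needs an explicit free.
struct RawList {
    items: Vec<String>,
}

/// Mimics an append that reports failure with a null-like `None` and
/// leaves the existing list untouched. `budget` fakes an allocator.
fn raw_append(list: &mut RawList, item: &str, budget: &Cell<u32>) -> Option<()> {
    if budget.get() == 0 {
        return None;
    }
    budget.set(budget.get() - 1);
    list.items.push(item.to_string());
    Some(())
}

/// Plays the role of errdefer: "frees" the list on drop unless disarmed.
struct FreeOnError<'a> {
    list: Option<RawList>,
    frees: &'a Cell<u32>,
}

impl Drop for FreeOnError<'_> {
    fn drop(&mut self) {
        if self.list.take().is_some() {
            // Stands in for the free-all call on the partial list.
            self.frees.set(self.frees.get() + 1);
        }
    }
}

fn init_headers(
    headers: [&str; 3],
    budget: &Cell<u32>,
    frees: &Cell<u32>,
) -> Result<RawList, OutOfMemory> {
    // If the first append fails, no list exists yet and nothing is freed.
    let mut first = RawList { items: Vec::new() };
    raw_append(&mut first, headers[0], budget).ok_or(OutOfMemory)?;

    // From here on a list exists, so register the cleanup.
    let mut guard = FreeOnError { list: Some(first), frees };
    for h in &headers[1..] {
        let list = guard.list.as_mut().expect("guard holds the list");
        // A null-like result becomes an error; the early return drops the guard.
        raw_append(list, h, budget).ok_or(OutOfMemory)?;
    }

    // Success: disarm the guard and hand ownership to the caller.
    Ok(guard.list.take().expect("guard holds the list"))
}
```

Calling init_headers with a budget of 3 returns the full list and records no free; a budget of 2 fails on the third append and records exactly one free, which is the partial-failure path the errdefer covers.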

The two appended headers are not optional. The comment on Accept-Language calls out a real failure mode: Akamai's bot-protection rules flag clients that send Accept-Encoding without Accept-Language, so omitting Accept-Language is what triggers the block. sec-CH-UA is the User-Agent Client Hints variant of the user agent string, also expected on real browser traffic. Lightpanda's Config.HttpHeaders ships defaults that match a recent Chrome build.

Key takeaway

Curl's null-on-failure return becomes a Zig error, errdefer registers the slist cleanup, and the User-Agent / sec-CH-UA / Accept-Language defaults are the difference between real-browser traffic and an instant block.

    pub fn init(user_agent: [:0]const u8) !Headers {
        const header_list = libcurl.curl_slist_append(null, user_agent);
        if (header_list == null) {
            return error.OutOfMemory;
        }
        // libcurl leaves the list intact when curl_slist_append fails, so we own it.
        errdefer libcurl.curl_slist_free_all(header_list);

        // Always add sec-CH-UA header
        const with_sec_ch_ua = libcurl.curl_slist_append(header_list, Config.HttpHeaders.sec_ch_ua);
        if (with_sec_ch_ua == null) {
            return error.OutOfMemory;
        }

        // Always add Accept-Language. Omitting it triggers bot-protection on
        // some CDNs (Akamai) when Accept-Encoding is present.
        const updated_headers = libcurl.curl_slist_append(with_sec_ch_ua, Config.HttpHeaders.accept_language);
        if (updated_headers == null) {
            return error.OutOfMemory;
        }

        return .{ .headers = updated_headers };
    }
4 / 7

html5ever's C ABI: Fourteen Callbacks Per Parse

src/html5ever/lib.rs:35

The Rust parser exposes html5ever_parse_document as extern "C", taking a callback table that the Zig side implements.

An HTML parser talks to a tree, and the tree is owned by the host. html5ever solves this with the TreeSink trait, but Lightpanda's tree lives in Zig. The bridge is a #[no_mangle] extern "C" wrapper that takes the same operations as a callback table: create an element, append a node, pop, handle a parse error, create a comment, append a doctype, add attributes if missing, reparent children, append before sibling. Fourteen function pointers travel together in one parse call, alongside the HTML buffer, its length, the document handle, and ctx.

Every callback receives the same opaque ctx pointer the Zig side passed in, so the Rust code never knows what the host data structures look like. The real Rust TreeSink implementation in sink.rs is a thin shim that forwards each operation to the matching callback. The crate does the tokenizing, the state machine, and the HTML5 quirks; Zig owns the resulting tree.
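The opaque-ctx pattern can be sketched in a few lines of Rust. This is a toy under stated assumptions, not the html5ever bridge: toy_parse, Host, the two callback types, and the index-as-handle encoding are hypothetical, and the "parser" just emits one element per space-separated word. What it does show is the driver calling back through function pointers while treating ctx and node handles as opaque.

```rust
use std::ffi::c_void;

/// Opaque handle; the driver never looks inside it.
type Ref = *mut c_void;

type CreateElementCallback =
    unsafe extern "C" fn(ctx: Ref, name_ptr: *const u8, name_len: usize) -> Ref;
type AppendCallback = unsafe extern "C" fn(ctx: Ref, parent: Ref, child: Ref);

/// Toy driver: one element per word, each appended to `document`.
unsafe extern "C" fn toy_parse(
    input: *const u8,
    len: usize,
    document: Ref,
    ctx: Ref,
    create_element: CreateElementCallback,
    append: AppendCallback,
) {
    // SAFETY: the caller promises `input` points at `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(input, len) };
    for word in bytes.split(|b| *b == b' ').filter(|w| !w.is_empty()) {
        // The driver only forwards ctx; the host decides what a node is.
        let node = unsafe { create_element(ctx, word.as_ptr(), word.len()) };
        unsafe { append(ctx, document, node) };
    }
}

/// Hypothetical host tree: node names plus (parent, child) edges.
struct Host {
    names: Vec<String>,
    edges: Vec<(usize, usize)>,
}

// Handles are node indices shifted by one so that no handle is null.
fn handle(idx: usize) -> Ref {
    (idx + 1) as Ref
}
fn index(h: Ref) -> usize {
    h as usize - 1
}

unsafe extern "C" fn host_create_element(ctx: Ref, name_ptr: *const u8, name_len: usize) -> Ref {
    // SAFETY: ctx is the `*mut Host` passed in by parse_with_host.
    let host = unsafe { &mut *(ctx as *mut Host) };
    let name = unsafe { std::slice::from_raw_parts(name_ptr, name_len) };
    host.names.push(String::from_utf8_lossy(name).into_owned());
    handle(host.names.len() - 1)
}

unsafe extern "C" fn host_append(ctx: Ref, parent: Ref, child: Ref) {
    // SAFETY: ctx is the `*mut Host` passed in by parse_with_host.
    let host = unsafe { &mut *(ctx as *mut Host) };
    host.edges.push((index(parent), index(child)));
}

/// The tree is fully built by the time toy_parse returns.
fn parse_with_host(input: &str) -> Host {
    let mut host = Host { names: vec!["#document".to_string()], edges: Vec::new() };
    let ctx = &mut host as *mut Host as Ref;
    unsafe {
        toy_parse(input.as_ptr(), input.len(), handle(0), ctx, host_create_element, host_append);
    }
    host
}
```

Because every callback gets the same ctx, the driver side compiles without knowing that Host exists; swapping in a different host only means passing different function pointers.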

Key takeaway

html5ever exposes html5ever_parse_document as a C function with a fourteen-callback table. Zig owns the DOM; Rust just drives the construction.

#[no_mangle]
pub extern "C" fn html5ever_parse_document(
    html: *mut c_uchar,
    len: usize,
    document: Ref,
    ctx: Ref,
    create_element_callback: CreateElementCallback,
    get_data_callback: GetDataCallback,
    append_callback: AppendCallback,
    parse_error_callback: ParseErrorCallback,
    pop_callback: PopCallback,
    create_comment_callback: CreateCommentCallback,
    create_processing_instruction: CreateProcessingInstruction,
    append_doctype_to_document: AppendDoctypeToDocumentCallback,
    add_attrs_if_missing_callback: AddAttrsIfMissingCallback,
    get_template_contents_callback: GetTemplateContentsCallback,
    remove_from_parent_callback: RemoveFromParentCallback,
    reparent_children_callback: ReparentChildrenCallback,
    append_before_sibling_callback: AppendBeforeSiblingCallback,
    append_based_on_parent_node_callback: AppendBasedOnParentNodeCallback,
) -> () {
5 / 7

Parser.zig: The Other Side of the Bridge

src/browser/parser/Parser.zig:83

The Zig Parser passes its container, itself, and a table of Zig callbacks to the Rust extern function.

The Rust side defines the protocol; the Zig side implements it. Parser.parse takes the HTML bytes, the parser's container (the root node Rust will append into), the parser pointer itself as ctx, and a list of callconv(.c) function pointers. The callbacks are plain Zig functions whose signatures match what extern "c" fn html5ever_parse_document declared in html5ever.zig.

Each callback runs while html5ever is still on the call stack inside parse_document, so any allocation it does (creating a new Element, appending a child) lands in the Zig page arena before the Rust frame returns. By the time parse returns to its caller, the entire DOM is built and the typed_arena Rust used internally has been dropped. Two arenas in two languages produce one tree in one synchronous call.

Key takeaway

Parser.parse hands html5ever the host pointer, the root node, and a Zig callback table. The DOM is built inside one synchronous call, with allocations landing in the page arena.

pub fn parse(self: *Parser, html: []const u8) void {
    h5e.html5ever_parse_document(
        html.ptr,
        html.len,
        &self.container,
        self,
        createElementCallback,
        getDataCallback,
        appendCallback,
        parseErrorCallback,
        popCallback,
        createCommentCallback,
        createProcessingInstruction,
        appendDoctypeToDocument,
        addAttrsIfMissingCallback,
        getTemplateContentsCallback,
        removeFromParentCallback,
        reparentChildrenCallback,
        appendBeforeSiblingCallback,
        appendBasedOnParentNodeCallback,
    );
}
6 / 7

Env.zig: One v8 Isolate Per Browser

src/browser/js/Env.zig:103

Env.init creates a v8 isolate from the snapshot blob, installs error and promise callbacks, and enters the isolate.

v8 is heavy to spin up; a snapshot blob short-circuits most of it. Snapshot.load in App.init reads a precomputed startup heap so the isolate boots with all the built-in objects already in place. Env.init hands that snapshot to v8 through params.snapshot_blob, attaches a default ArrayBuffer allocator, registers the external references the snapshot needs to resolve C++ pointers, and creates the isolate.

The errdefer chain is the interesting shape. Each line that allocates a v8 resource (the params struct, the array buffer allocator, the isolate itself) registers the matching destructor on the next line. If any subsequent step fails (template registration, microtask policy, fatal error handler), every prior allocation is unwound in reverse order. It is the same pattern as the libcurl headers stop, but now it spans a C++ engine, the Zig allocator, and v8's own internal heap.
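The reverse-order unwinding can be illustrated in Rust, where locals are dropped in reverse declaration order on an early return. This is a sketch under stated assumptions rather than the Env.zig logic: Resource, Env, init_env, and the fail_late_step flag are hypothetical, and the log only records the order in which things would be released.

```rust
use std::cell::RefCell;

/// Illustrative resource that logs its own release.
struct Resource<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Resource<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// On success the resources move in here and are not released.
struct Env<'a> {
    params: Resource<'a>,
    array_buffer_allocator: Resource<'a>,
    isolate: Resource<'a>,
}

fn init_env<'a>(
    log: &'a RefCell<Vec<&'static str>>,
    fail_late_step: bool,
) -> Result<Env<'a>, &'static str> {
    // Acquire in the same order as the walkthrough describes.
    let params = Resource { name: "params", log };
    let array_buffer_allocator = Resource { name: "array_buffer_allocator", log };
    let isolate = Resource { name: "isolate", log };

    if fail_late_step {
        // Early return: the three locals are dropped newest-first.
        return Err("late setup step failed");
    }

    // Success: ownership moves into Env, so no cleanup runs here.
    Ok(Env { params, array_buffer_allocator, isolate })
}
```

One difference is worth keeping in mind: Zig's errdefer fires only on the error path, while Rust's Drop always fires unless the value is moved, which is why the success path moves all three resources into Env.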

Key takeaway

v8 boots from a snapshot blob and a Zig allocator, with errdefer chained at every step so partial init never leaks the isolate or the array buffer allocator.

pub fn init(app: *App, opts: InitOpts) !Env {
    if (comptime IS_DEBUG) {
        comptime {
            // V8 requirement for any data using SetAlignedPointerInInternalField
            const a = @alignOf(@import("TaggedOpaque.zig"));
            std.debug.assert(a >= 2 and a % 2 == 0);
        }
    }

    // Initialize class IDs once before any V8 work
    class_id_once.call();

    const allocator = app.allocator;
    const snapshot = &app.snapshot;

    var params = try allocator.create(v8.CreateParams);
    errdefer allocator.destroy(params);
    v8.v8__Isolate__CreateParams__CONSTRUCT(params);
    params.snapshot_blob = @ptrCast(&snapshot.startup_data);

    params.array_buffer_allocator = v8.v8__ArrayBuffer__Allocator__NewDefaultAllocator().?;
    errdefer v8.v8__ArrayBuffer__Allocator__DELETE(params.array_buffer_allocator.?);

    params.external_references = &snapshot.external_references;

    var isolate = js.Isolate.init(params);
    errdefer isolate.deinit();
7 / 7

Runner.zig: Headless Loop, No Compositor

src/browser/Runner.zig:68

The wait loop polls libcurl and v8 queues, sending v8 a moderate memory pressure hint once per second.

A normal browser ties its event loop to vsync, requestAnimationFrame, and a compositor. Lightpanda has none of those, so the loop is a polling timer. _wait calls _tick on a 200ms cadence, draining libcurl's multi handle, processing queued navigations, running pending scripts, and pumping v8's microtask and macrotask queues. There is nothing to render and nothing to composite, so the loop just waits on network and JS.

The piece that does not exist in Chrome is the manual GC hint. Wrappers and external references accumulate on long-running pages because v8 has no reason to think it is under memory pressure when the headless host is not. Calling memoryPressureNotification(.moderate) once per second keeps the 123MB peak from drifting upward, and the same once-per-second branch also calls cleanupClosedPopups on the page. The comment is explicit: v8 is otherwise only nudged on session or page teardown, so without this hint a page that stays alive for seconds running heavy JS holds onto wrappers and externally referenced Zig allocations that v8 has no reason to drop.
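The loop shape, a fixed tick budget plus a slower periodic hint, can be sketched in Rust. This is a minimal model under stated assumptions: the Host trait, wait, and CountingHost are hypothetical, and time is simulated by advancing one tick budget per iteration so the behaviour is deterministic, whereas the excerpt reads a real timer.

```rust
/// Hypothetical host hooks; the real Runner talks to libcurl and v8.
trait Host {
    /// Runs one tick with a time budget; returns true when the wait is over.
    fn tick(&mut self, budget_ms: u64) -> bool;
    /// Stands in for a moderate memory pressure notification.
    fn memory_pressure_hint(&mut self);
}

/// Polls `host` until it reports done, sending a hint whenever
/// `hint_period_ms` of simulated time has passed since the last one.
/// Returns the total simulated time in milliseconds.
fn wait<H: Host>(host: &mut H, tick_ms: u64, hint_period_ms: u64) -> u64 {
    let mut elapsed_ms = 0;
    let mut since_hint_ms = 0;
    loop {
        // Check the slow timer first, as the excerpt does.
        if since_hint_ms >= hint_period_ms {
            since_hint_ms = 0;
            host.memory_pressure_hint();
        }

        let done = host.tick(tick_ms);
        // Assumption: every tick uses its full budget.
        elapsed_ms += tick_ms;
        since_hint_ms += tick_ms;
        if done {
            return elapsed_ms;
        }
    }
}

/// Test double that finishes after a fixed number of ticks.
struct CountingHost {
    ticks_left: u32,
    ticks: u32,
    hints: u32,
}

impl Host for CountingHost {
    fn tick(&mut self, _budget_ms: u64) -> bool {
        self.ticks += 1;
        self.ticks_left -= 1;
        self.ticks_left == 0
    }

    fn memory_pressure_hint(&mut self) {
        self.hints += 1;
    }
}
```

With a 200 ms tick and a 1000 ms hint period, a host that needs twelve ticks sees two hints over 2400 ms of simulated time: the hint is checked before each tick, so it first fires at the start of the sixth iteration.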

Key takeaway

The headless loop polls libcurl and v8 on a 200ms tick, with a once-per-second moderate memory pressure hint to v8 so long-running pages do not drift upward in RSS.

fn _wait(self: *Runner, comptime is_cdp: bool, opts: WaitOpts) !CDPWaitResult {
    var timer = try std.time.Timer.start();

    const tick_opts = TickOpts{
        .ms = 200,
        .until = opts.until,
    };

    // Periodic V8 GC hint during long waits. V8 is otherwise only nudged on
    // session/page teardown (Browser.zig, Page.zig), so a page that stays
    // alive for seconds while running heavy JS accumulates wrappers and
    // external-ref'd Zig allocations V8 has no reason to drop. `.moderate`
    // speeds up incremental GC without stalling the tick.
    const gc_hint_period_ns: u64 = std.time.ns_per_s;
    var gc_hint_timer = std.time.Timer.start() catch unreachable;

    while (true) {
        if (gc_hint_timer.read() >= gc_hint_period_ns) {
            gc_hint_timer.reset();
            self.frame._page.cleanupClosedPopups();
            self.session.browser.env.memoryPressureNotification(.moderate);
        }

        const tick_result = self._tick(is_cdp, tick_opts) catch |err| {