TRK-013 · FILE-FORMATS / TOOLING

PNG chunks, annotated by hand

DATE  2026-05-30
READ  ~6 min
REV   2026-07-03
TAGS  file-formats, tooling

Open any PNG in a hex viewer and the first 8 bytes are always the same:

89 50 4E 47 0D 0A 1A 0A

That's the signature, and it is not really a header. It's institutional trauma, rendered in hexadecimal. Every one of those eight bytes exists because of a specific way files got mangled in transit in the early nineties, and the PNG authors decided that a corrupted file should fail loudly, at byte one, instead of decoding into modern art.

To walk the whole format we need a specimen. Here's the smallest interesting PNG I could produce — one red pixel, 69 bytes total, generated with the script at the end of this post:

00000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
00000010: 0000 0001 0000 0001 0802 0000 0090 7753  ..............wS
00000020: de00 0000 0c49 4441 5478 da63 f8cf c000  .....IDATx.c....
00000030: 0003 0101 00f7 0341 4300 0000 0049 454e  .......AC....IEN
00000040: 44ae 4260 82                             D.B`.

Sixty-nine bytes to say "red". We'll account for all of them.

The signature, byte by paranoid byte

Bytes     Value         What it's afraid of
89        high bit set  7-bit channels (old email gateways) that strip bit 7
50 4E 47  PNG           you, squinting at a hex dump, wondering what this file is
0D 0A     CR LF         "text mode" transfers converting CRLF → LF
1A        Ctrl-Z        MS-DOS TYPE printing the whole file to your terminal
0A        LF            the reverse conversion, LF → CRLF

The line-ending bytes are the clever ones. If you ever FTP'd a binary in ASCII mode by accident, you know exactly which transfer bug bytes 5, 6 and 8 are lying in wait for: any tool that "helpfully" translates line endings will alter them, the signature check fails, and the decoder rejects the file immediately instead of five megabytes later. The 1A is pure courtesy to 1995: under DOS, Ctrl-Z stops the TYPE command, so listing a PNG to your console printed four characters of garbage instead of ten minutes of beeping.
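To make those failure modes concrete, here is a minimal sketch of a signature check that also guesses which kind of mangling happened. The helper name diagnose_signature and its return strings are hypothetical, made up for this illustration; real decoders generally just reject the file. It assumes a naive converter that rewrites every line ending it sees.

```python
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

def diagnose_signature(head: bytes) -> str:
    """Classify the first bytes of a file (hypothetical helper, illustration only)."""
    if head[:8] == PNG_SIGNATURE:
        return "ok"
    # A 7-bit channel clears the top bit, so 0x89 arrives as 0x09.
    if head[:8] == bytes([0x09]) + PNG_SIGNATURE[1:]:
        return "bit 7 stripped"
    # CRLF -> LF: the 0D at byte 5 vanishes and the signature shrinks to 7 bytes.
    if head[:7] == b"\x89PNG\n\x1a\n":
        return "CRLF converted to LF"
    # LF -> CRLF (naive): every 0A gains a 0D in front, giving 10 bytes.
    if head[:10] == b"\x89PNG\r\r\n\x1a\r\n":
        return "LF converted to CRLF"
    return "not a PNG"

# Simulate the two text-mode accidents on a pristine signature.
print(diagnose_signature(PNG_SIGNATURE.replace(b"\r\n", b"\n")))
print(diagnose_signature(PNG_SIGNATURE.replace(b"\n", b"\r\n")))
```

Either conversion changes the length of the signature as well as its content, which is why the check can fail within the first ten bytes.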

The chunk model

After the signature, a PNG is nothing but chunks, laid end to end until the file stops:

Field   Size     Notes
length  4 bytes  big-endian, data only
type    4 bytes  ASCII, case carries meaning
data    n bytes  depends on type
crc     4 bytes  over type + data

That's the entire container format. No global table of contents, no offsets, no version negotiation. You read a length, a name, the payload, a checksum; repeat until you hit the end marker. A PNG parser's outer loop fits on an index card, which is exactly why the format has quietly outlived half a dozen "better" ones.

Note that length counts the data only — not the type or CRC — so every chunk costs 12 bytes of overhead. And it's capped at 2³¹−1, because the spec politely declines to trust anyone's signed-integer handling.
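Here is roughly what that index card might look like: a sketch of the outer loop, run against the 69-byte specimen transcribed from the hex dump. The name iter_chunks is my own, and the loop deliberately skips CRC verification and graceful handling of truncated files.

```python
import struct

PNG_SIGNATURE = bytes.fromhex("89504e470d0a1a0a")

# The specimen, transcribed from the hex dump above.
SPECIMEN = PNG_SIGNATURE + bytes.fromhex(
    "0000000d4948445200000001000000010802000000907753de"  # IHDR
    "0000000c4944415478da63f8cfc0000003010100f7034143"    # IDAT
    "0000000049454e44ae426082"                            # IEND
)

def iter_chunks(png: bytes):
    """Yield (type, data, crc) for each chunk. Sketch only: no CRC check."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("bad PNG signature")
    pos = 8
    while pos < len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        if length > 2**31 - 1:
            raise ValueError("chunk length exceeds 2^31 - 1")
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        yield ctype, data, crc
        pos += 12 + length          # 4 length + 4 type + 4 crc, plus the data
        if ctype == b"IEND":
            break

for ctype, data, crc in iter_chunks(SPECIMEN):
    print(ctype.decode("ascii"), len(data), f"{crc:08x}")
# IHDR 13 907753de
# IDAT 12 f7034143
# IEND 0 ae426082
```

The arithmetic checks out: 8 bytes of signature plus three chunks at 12 bytes of overhead each, plus 13 + 12 + 0 bytes of data, is 69.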

IHDR — 13 bytes of commitment

The first chunk must be IHDR. Ours reads:

00 00 00 0D  49 48 44 52   length = 13, type = "IHDR"
00 00 00 01               width  = 1
00 00 00 01               height = 1
08                        bit depth = 8
02                        colour type = 2 (truecolour RGB)
00                        compression = 0
00                        filter = 0
00                        interlace = 0
90 77 53 DE               CRC-32

Width and height are 32-bit, so PNG supports images two billion pixels on a side. My monitor does not, but it's nice that the format is ready.

The colour type is secretly a bitfield: 1 = palette, 2 = colour, 4 = alpha. Valid combinations give you types 0, 2, 3, 4 and 6. Types 1 and 5 are illegal combinations — a palette bit with nothing to index, roughly — so PNG's colour types simply skip two numbers and everyone has agreed not to talk about it.
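As a sketch of how the 13 data bytes and the colour-type bits come apart, the snippet below unpacks our IHDR payload with struct and decodes the flags. The constant and function names are assumptions of mine for illustration, not names from the spec.

```python
import struct

PALETTE, COLOUR, ALPHA = 1, 2, 4            # the three flag bits
VALID_COLOUR_TYPES = {0, 2, 3, 4, 6}        # the only combinations allowed

def colour_flags(colour_type: int) -> dict:
    """Split a colour type into its flag bits (hypothetical helper)."""
    if colour_type not in VALID_COLOUR_TYPES:
        raise ValueError(f"illegal colour type {colour_type}")
    return {
        "palette": bool(colour_type & PALETTE),
        "colour": bool(colour_type & COLOUR),
        "alpha": bool(colour_type & ALPHA),
    }

# The 13 data bytes of the specimen's IHDR chunk.
ihdr = bytes.fromhex("00000001000000010802000000")
width, height, depth, colour_type, compression, filt, interlace = struct.unpack(
    ">IIBBBBB", ihdr
)

print(width, height, depth, colour_type)    # 1 1 8 2
print(colour_flags(colour_type))
```

Type 2 comes out as colour only: no palette, no alpha.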

The compression field has exactly one defined value, 0, meaning zlib/deflate. It has held that value since 1996. It is a field reserved for a future that never arrived, which makes it the format's most honest byte.

Four boolean flags, hidden in capitalization

Chunk names look like words, but the case of each letter is a flag bit (bit 5 of the ASCII code, the one that toggles case):

  1. First letter — uppercase means critical (refuse to render without it), lowercase means ancillary (skippable).
  2. Second letter — uppercase means a public, registered chunk; lowercase means private.
  3. Third letter — reserved, always uppercase. A flag bit saved for later, like the compression byte.
  4. Fourth letter — lowercase means safe to copy if you edit the image without understanding this chunk.

So IHDR is critical/public, and tEXt (comments) is ancillary, public, safe to copy. Hiding four booleans inside the capitalization of an identifier is the kind of trick that would get you removed from a code review today, and yet here it's genuinely elegant: a dumb 1996 image editor could copy chunks it had never heard of and know whether that was safe, from the name alone.
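The four rules above can be sketched in a few lines. The function name chunk_flags and the dictionary keys are hypothetical, chosen for this illustration; the only fact the code relies on is that bit 5 (0x20) is set in lowercase ASCII letters and clear in uppercase ones.

```python
def chunk_flags(name: bytes) -> dict:
    """Decode the four property bits of a chunk type (illustrative helper)."""
    if len(name) != 4:
        raise ValueError("chunk types are exactly 4 bytes")
    lower = [bool(b & 0x20) for b in name]   # True where the letter is lowercase
    return {
        "critical": not lower[0],
        "public": not lower[1],
        "reserved_ok": not lower[2],         # third letter must stay uppercase
        "safe_to_copy": lower[3],
    }

for name in (b"IHDR", b"tEXt", b"acTL"):
    print(name.decode("ascii"), chunk_flags(name))
```

Run on acTL, this reports ancillary, private, and not safe to copy, so an editor that does not understand the chunk should drop it when it changes the image.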

NOTE: This is also forward compatibility that actually worked. When the APNG animation chunks (acTL, fcTL, fdAT) were bolted on years later, existing decoders skipped them as unknown ancillary chunks and rendered the ordinary still image, usually the first frame. No version bump, no flag day.

CRCs absolutely everywhere

Every chunk ends with a CRC-32 over its type and data (not the length), using the same polynomial as ZIP and Ethernet. This is the part that reads as excessive today: the pixel data inside IDAT is a zlib stream, and zlib already ends with its own Adler-32 checksum. Your pixels are checksummed twice, by two different algorithms, in the same file.

The nineties did not trust your modem. Honestly — fair.

The practical win is that per-chunk CRCs localize damage. A corrupted ancillary chunk can be detected and skipped without decompressing anything, and a decoder can tell you which chunk is bad instead of shrugging "file corrupt".
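A small sketch of that localization, using the specimen's IHDR chunk: the check needs nothing but the chunk's own bytes. The helper name chunk_crc_ok is mine, and I use zlib.crc32 here, which computes the same CRC-32 as binascii.crc32.

```python
import struct
import zlib

def chunk_crc_ok(chunk: bytes) -> bool:
    """Verify one complete chunk: length + type + data + crc."""
    (length,) = struct.unpack(">I", chunk[:4])
    covered = chunk[4:8 + length]            # type + data; the length is excluded
    (stored,) = struct.unpack(">I", chunk[8 + length:12 + length])
    return zlib.crc32(covered) == stored

ihdr_chunk = bytes.fromhex("0000000d4948445200000001000000010802000000907753de")

damaged = bytearray(ihdr_chunk)
damaged[11] ^= 0x01                          # flip one bit in the width field

print(chunk_crc_ok(ihdr_chunk))              # True
print(chunk_crc_ok(bytes(damaged)))          # False
```

A decoder that runs this per chunk can name the damaged chunk without ever touching the zlib stream.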

IDAT — where the pixel lives

00 00 00 0C  49 44 41 54   length = 12, type = "IDAT"
78 DA                      zlib header (deflate, 32K window, max compression)
63 F8 CF C0 00 00          deflate-compressed data
03 01 01 00                Adler-32 of the uncompressed bytes
F7 03 41 43                CRC-32 of the chunk

What got compressed is four bytes: 00 FF 00 00 — a filter byte (0 = None) followed by one RGB pixel of pure red. Every scanline in a PNG starts with a filter byte choosing one of five predictors (None, Sub, Up, Average, Paeth — the last named after Alan Paeth, one of the few people with an algorithm in every image on the web and almost no name recognition). Filters are why PNG compresses gradients well: you compress the prediction error, not the pixels. For one lonely pixel, filtering has nothing to predict, so: None.
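To see both halves of that paragraph, the sketch below first inflates the specimen's IDAT payload, then applies the Sub filter to a made-up gradient row. The gradient data and the sub_filter helper are illustrative assumptions; a real encoder picks a filter per scanline and handles all five types.

```python
import zlib

# The 12-byte IDAT payload from the specimen: zlib header, deflate data, Adler-32.
idat = bytes.fromhex("78da63f8cfc0000003010100")
raw = zlib.decompress(idat)
filter_type, pixel = raw[0], raw[1:]
print(filter_type, pixel.hex(" "))           # 0 ff 00 00

def sub_filter(row: bytes, bpp: int) -> bytes:
    """Filter type 1 (Sub): each byte minus the byte bpp positions left, mod 256."""
    return bytes(
        (row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
        for i in range(len(row))
    )

# A hypothetical 16-pixel greyscale gradient, brightness rising by 1 per pixel.
gradient = bytes(range(100, 116))
residual = sub_filter(gradient, bpp=1)
print(list(residual))                        # [100, 1, 1, 1, ...]
```

Sixteen distinct values become one value followed by fifteen identical ones, which is exactly the kind of input deflate likes.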

Yes, the zlib stream spends 6 bytes of framing (2 of header, 4 of Adler-32) to turn 4 bytes of pixel data into 6 bytes of deflate output, for 12 bytes in all. Compression is a long game.

IEND — the most computed constant in image history

00 00 00 00  49 45 4E 44  AE 42 60 82

IEND has zero data bytes, so its CRC covers only the four letters of its own name, which means it is the same in every PNG ever made. Every PNG on every webpage you have ever loaded ends with these exact 12 bytes, and billions of decoders have dutifully re-verified AE 42 60 82 billions of times, always with success. Somewhere, a CRC unit deserves a long-service award.
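You can rebuild the constant yourself in a few lines. This is only a sketch of the arithmetic, using zlib.crc32 from the standard library:

```python
import zlib

crc = zlib.crc32(b"IEND")                    # CRC over the type alone: no data
iend = (0).to_bytes(4, "big") + b"IEND" + crc.to_bytes(4, "big")

print(f"{crc:08x}")                          # ae426082
print(len(iend))                             # 12
```

Zero length, four letters, one fixed checksum: nothing in the chunk depends on the image it closes.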

Reproduce it

The specimen above, in a dozen lines of Python:

import zlib, struct, binascii

def chunk(ctype, data):
    c = struct.pack(">I", len(data)) + ctype + data
    return c + struct.pack(">I", binascii.crc32(ctype + data))

sig  = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)   # 1x1, 8-bit, truecolour
idat = zlib.compress(b"\x00\xff\x00\x00", 9)          # filter byte + one red pixel

with open("red.png", "wb") as f:
    f.write(sig + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b""))

Run it, xxd red.png, and compare against the dump at the top. Then go read the PNG specification — it's short, free, and includes the rationale sections where the authors explain, with visible weariness, exactly which disasters each byte is guarding against.