Why tar can’t do what zip does — and why that .rar needed a temp file.
In Lesson 1 you reached into a zip and pulled one member out instantly, because the central directory let the reader seek. Try the exact same move on a .tar.gz and it crawls. Try to pipe a .rar into unrar and it flat-out refuses. Both surprises come from one split: random-access vs streaming, and seekable vs not.
Every archive you’ll meet falls on one side of a line:
| Family | Examples | Has an index? | List / grab one member |
|---|---|---|---|
| Random-access | .zip, .rar | Yes — central directory | Cheap: seek to it |
| Streaming | .tar, .gz, .tar.gz | No | Costly: read front-to-back |
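The table sorts archives by extension. The sketch below makes the same split from a file's leading bytes instead. It assumes bash, coreutils (`head`, `od`, `dd`) and the standard signatures listed in the comments. `classify` is a hypothetical helper name. The rar check only looks at the `Rar!` prefix, and a gzip hit cannot say whether a tar is inside without decompressing.

```shell
#!/usr/bin/env bash
# Sketch: sort a file into the table's two families from its leading bytes.
# Signatures assumed: zip "PK\3\4" (or "PK\5\6" for an empty zip),
# rar "Rar!", gzip 1f 8b, tar "ustar" at byte offset 257.
set -euo pipefail

classify() {
  local f=$1 magic
  # First four bytes as a lowercase hex string, e.g. "504b0304".
  magic=$(head -c 4 "$f" | od -An -tx1 | tr -d ' \n')
  case $magic in
    504b0304|504b0506) echo "random-access: zip" ;;
    52617221)          echo "random-access: rar" ;;
    1f8b*)             echo "streaming: gzip" ;;
    *)
      # tar has no signature up front; the ustar magic sits at offset 257.
      if [ "$(dd if="$f" bs=1 skip=257 count=5 2>/dev/null)" = "ustar" ]; then
        echo "streaming: tar"
      else
        echo "unknown"
      fi ;;
  esac
}

# Demo on throwaway files.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'hi\n' > "$work/a.txt"
tar -cf "$work/a.tar" -C "$work" a.txt
gzip -c "$work/a.tar" > "$work/a.tar.gz"
{ printf 'PK\005\006'; head -c 18 /dev/zero; } > "$work/empty.zip"  # smallest valid zip
for f in "$work/a.tar" "$work/a.tar.gz" "$work/empty.zip" "$work/a.txt"; do
  printf '%s -> %s\n' "${f##*/}" "$(classify "$f")"
done
```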
tar (short for tape archive) was designed for magnetic tape: a sequential device you read front-to-back. So it has no central directory at all. Each member is just a header (name, size, timestamps) immediately followed by its bytes, then the next header, and so on.[1] There is no index to jump to: there is “no way of knowing how many files a tar archive contains unless the whole archive is traversed.”[1]
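Walking that header chain by hand makes the missing index concrete. The sketch below lists an archive by hopping from header to header. It assumes bash and a plain ustar archive. Field offsets follow the POSIX ustar layout:

- the name is in bytes 0 to 99;
- the size is octal text at byte 124;
- everything comes in 512-byte blocks.

The sketch ignores the pax and GNU extension records real archives may contain. `walk_tar` and `read_field` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: list a plain (uncompressed) ustar archive by walking its headers.
# Assumes classic 512-byte ustar headers with no pax/GNU extension records
# (the demo forces --format=ustar when building its archive).
set -euo pipefail

# Print COUNT bytes at byte OFFSET of FILE, dropping NUL padding.
read_field() {
  dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\0'
}

# There is no index to consult: each header tells us where the next one is.
walk_tar() {
  local archive=$1 offset=0 name size_octal size
  while :; do
    name=$(read_field "$archive" "$offset" 100)          # name: bytes 0-99
    [ -z "$name" ] && break                               # zero block = end of archive
    size_octal=$(read_field "$archive" $((offset + 124)) 12 | tr -d ' ')
    size=$((8#$size_octal))                               # size is stored as octal text
    echo "$name $size"
    # Next header = this 512-byte header + data rounded up to a 512-byte multiple.
    offset=$((offset + 512 + (size + 511) / 512 * 512))
  done
}

# Demo: build a tiny archive, then walk it.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'hello\n' > "$work/a.txt"                 # 6 bytes of data
printf 'x%.0s' $(seq 1 600) > "$work/b.txt"      # 600 bytes, spans two blocks
tar --format=ustar -cf "$work/demo.tar" -C "$work" a.txt b.txt
walk_tar "$work/demo.tar"
```

Note the arithmetic on the last line of the loop. Finding member N means computing the offsets of members 1 through N-1 first, which is exactly the cost the quote describes.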
.tar.gz makes it worse: one solid stream
Compress a tar with gzip and you wrap the entire archive in a single compressed stream. Now the data isn’t just index-less, it’s solid: “to find the 50th file, you must uncompress and read files 1 through 49 first.”[1] So tar -tzf big.tar.gz has to decompress and scan the whole thing just to list it. There is no cheap “jump to one member”: the structure to jump with doesn’t exist.
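The sketch below shows that a gzip stream has exactly one entry point, its first byte. It assumes bash, tar, gzip and coreutils, and all file names are made up. It puts a bulky member in front of the one we want, then shows two things. First, the inflated stream tar must walk is far larger than the file on disk. Second, starting partway into the compressed bytes gives gzip nothing it can decode.

```shell
#!/usr/bin/env bash
# Sketch: a .tar.gz has one entry point, byte 0 of the gzip stream.
# Assumes bash, tar, gzip and coreutils; all file names are made up.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

seq 1 200000 > "$work/first.txt"        # a bulky member up front
echo "tail member" > "$work/last.txt"   # the member we actually want
tar -czf "$work/demo.tar.gz" -C "$work" first.txt last.txt

# Asking for last.txt alone still means inflating everything before it.
tar -xzO -f "$work/demo.tar.gz" last.txt

# The inflated stream tar has to walk, vs. the compressed file on disk.
echo "compressed bytes:   $(wc -c < "$work/demo.tar.gz")"
echo "decompressed bytes: $(gzip -dc "$work/demo.tar.gz" | wc -c)"

# No jumping in: skip the first 100 compressed bytes and gzip finds no stream.
if tail -c +101 "$work/demo.tar.gz" | gzip -dc > /dev/null 2>&1; then
  echo "unexpected: decoded from mid-stream"
else
  echo "cannot start mid-stream"
fi
```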
A 2 GB .tar.gz looks just like a 2 GB .zip in your file manager. But “give me one file from it” is a seek in the zip and a full decompress-scan in the tar.gz. Format dictates cost.
The three verbs from Lesson 1 still exist for streaming formats. They work; they’re simply doing a front-to-back read under the hood:
```shell
# LIST: walks the whole archive (there is no index to read)
tar -tf archive.tar
tar -tzf archive.tar.gz   # add z for gzip-compressed

# STREAM ONE member to stdout; note -O (capital O, "to stdout")
tar -xO -f archive.tar path/in/archive.csv | head
tar -xzO -f archive.tar.gz path/in/archive.csv | head

# BULK extract to a directory (-C needs the directory to exist)
mkdir -p out
tar -xf archive.tar -C out/
tar -xzf archive.tar.gz -C out/

# A lone .gz wraps ONE file: just decompress its single stream to stdout
zcat prices.csv.gz | head   # zcat == gunzip -c == gzip -dc
gzip -dc prices.csv.gz | wc -l
```
-O is the tar equivalent of unzip -p: it decompresses one member straight to stdout, with nothing written to disk.[2] And zcat is the whole story for a single .gz: gzip compresses one stream, so there’s no member to pick; you just decompress it.[3]
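Both verbs can be checked end to end on throwaway data. The sketch below assumes bash, a tar that supports -O (GNU tar and bsdtar both do) and gzip. The paths and CSV contents are made up. It streams one member out of a .tar.gz from inside an empty directory, confirms nothing landed there, then handles a lone .gz.

```shell
#!/usr/bin/env bash
# Sketch: tar -O sends one member to stdout and writes nothing to disk;
# a lone .gz has no members, so you just decompress the one stream.
# Assumes bash, a tar with -O, and gzip; paths and contents are made up.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/data" "$work/scratch"
printf 'sku,price\nA1,10\nB2,12\n' > "$work/src/data/prices.csv"
printf 'not wanted\n' > "$work/src/notes.txt"
tar -czf "$work/archive.tar.gz" -C "$work/src" data/prices.csv notes.txt

cd "$work/scratch"
# Name the member by its path inside the archive, as tar -tzf shows it.
tar -xzO -f "$work/archive.tar.gz" data/prices.csv | head -n 2
ls -A    # lists nothing: the member went to stdout, not into this directory

# Single-stream case: there is no member name to pass.
gzip -c "$work/src/data/prices.csv" > "$work/prices.csv.gz"
gzip -dc "$work/prices.csv.gz" | wc -l    # same as zcat ... | wc -l with GNU gzip
```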
Why that .rar needed a temp file
Now the payoff. Remember the July data: .rar files sitting inside a .zip. I couldn’t pipe each rar into unrar; I had to write it to a temp file first. Here’s exactly why.
A pipe is sequential-only — you can read the bytes flowing past, but you can’t seek backwards or jump ahead. A random-access reader (unzip, unrar) needs to seek: to the index at the end, then back to a member’s offset. So it needs a seekable file — a real file on disk the OS can jump around in. A pipe can’t provide that, so unrar refuses stdin.
| You have… | Reader needs… | Through a pipe? | So you must… |
|---|---|---|---|
| tar / gz stream | sequential read | ✅ works | just pipe it (zcat … \|) |
| zip / rar (nested) | seek (random access) | ❌ no seek in a pipe | spill to a temp file, then read |
The rule of thumb: streaming formats flow through pipes; random-access formats often need a seekable file. When a seekable-only tool meets a pipe, you give it a temp file — that one workaround is forced by the format, not a quirk of the tool. (In code you sometimes dodge even that, by handing the reader an in-memory seekable buffer — which is exactly Lesson 3.)
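The temp-file workaround is small enough to wrap once. The sketch below assumes bash, `mktemp` and enough disk for the spilled archive. `with_seekable_file` is a hypothetical name, not a standard command. In the runnable part, `tar -tf` stands in for the seek-needing reader only because it is sure to be installed; tar itself would happily read the pipe. The unrar line is illustrative and not executed.

```shell
#!/usr/bin/env bash
# Sketch: spill a pipe to a temp file so a seek-needing reader can open it.
# with_seekable_file is a hypothetical helper, not a standard command.
set -euo pipefail

# Usage: producer | with_seekable_file CMD [ARGS...]
# Copies stdin to a temp file, runs "CMD ARGS... TMPFILE", then deletes it.
with_seekable_file() {
  local tmp status=0
  tmp=$(mktemp)
  cat > "$tmp"                  # drain the sequential stream to disk
  "$@" "$tmp" || status=$?      # the reader gets a real, seekable path
  rm -f "$tmp"
  return "$status"
}

# Runnable demo; tar only stands in for a reader that needs to seek.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'hi\n' > "$work/a.txt"
tar -cf "$work/demo.tar" -C "$work" a.txt
cat "$work/demo.tar" | with_seekable_file tar -tf

# The nested-rar case (not run here; needs unzip and unrar, names hypothetical):
#   unzip -p outer.zip inner.rar | with_seekable_file unrar l
```

Appending the temp path as the last argument fits readers like `unrar l`, which take the archive path after the command letter. A reader that wants the path elsewhere would need a slightly different wrapper.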
You can now classify any archive before touching it: random-access (zip/rar: seek, cheap to grab one) or streaming (tar/gz: walk, costly to grab one). You can reach into the streaming ones with tar -O / zcat, and explain the seekable rule that forces a temp file when a pipe meets unrar. Format → cost → the right verb.
Wikipedia — tar (computing), the “Format details” and limitations. The clearest account of why tar has no index and what that costs. ~8 minutes. For the verbs, the authoritative reference is the GNU tar manual — Extracting Specific Files (and GNU gzip manual for zcat).
Compare unzip -l on a big zip with tar -tzf on a big .tar.gz, then ask me to explain the gap. Or hand me a real archive and I’ll tell you which family it’s in and the cheapest way to reach into it.
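1. Wikipedia, tar (computing): “Format details” and limitations (tar and .tar.gz).
2. GNU tar manual: -O (--to-stdout) and Extracting Specific Files.
3. GNU gzip manual: zcat = gunzip -c = gzip -dc, decompress a single stream to stdout.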