Reaching Into Archives · Lesson 4 of the course

The Field Guide

The whole course as one decision flow, one cheat sheet, and two tools that cover almost everything.

You’ve got the model now: the index makes selective extraction possible, the streaming/seekable split says when it’s cheap and when it isn’t, and code adds the in-memory buffer. This capstone compresses all of it into a field guide you’ll actually keep — and ends with a task to lock it in.

Your mission: fluency is recall under time pressure. The point of a field guide is that you don’t re-derive the answer at 11pm with a 2 GB archive in front of you; you glance, decide, type.

The decision flow

One archive, one question at a time:

You have an archive.
  1. Need the WHOLE thing?
       yes → bulk extract: unzip -d · tar -xf · 7z x
       no, just some files → question 2
  2. Random-access format?
       yes (zip / rar) → seek: unzip -p · unrar p. One seek, cheap.
       no (tar / gz) → walk: tar -O · zcat. A sequential walk.
  3. Working in code?
       → zipfile / tarfile; a nested zip → BytesIO (no temp file)
  4. Shelling out to a tool that needs a path (e.g. unrar)?
       → spill to a temp file
Four questions — whole or part, which family, in code, through a pipe — settle every case in the course.
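The two code branches of the flow (a nested zip read through BytesIO, and a temp-file spill for tools that only accept a path) can be sketched with Python’s standard library. This is a minimal sketch, not the course’s own loader: the function names `read_nested_member` and `run_tool_on_bytes` are hypothetical, and `cmd` stands for whichever path-only tool you shell out to.

```python
import io
import subprocess
import tempfile
import zipfile
from pathlib import Path


def read_nested_member(outer_path, inner_zip_name, member_name):
    """Read one file from a zip stored inside another zip, with no temp file."""
    with zipfile.ZipFile(outer_path) as outer:
        # The outer index locates the inner zip; read() returns its raw bytes.
        inner_bytes = outer.read(inner_zip_name)
    # zipfile needs a seekable file object: BytesIO provides one in memory.
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return inner.read(member_name)


def run_tool_on_bytes(data, suffix, cmd):
    """Spill in-memory archive bytes to a temp file for a tool that needs a path.

    `cmd` is an argument list (hypothetical example: ["unrar", "l"]); the temp
    file's path is appended as its last argument. Returns the tool's stdout.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"archive{suffix}"
        path.write_bytes(data)
        result = subprocess.run(cmd + [str(path)], capture_output=True, check=True)
    # The temp directory and the spilled file are gone once the block exits.
    return result.stdout
```

For a .rar nested inside a zip you might call `run_tool_on_bytes(inner_bytes, ".rar", ["unrar", "l"])`, assuming unrar is on your PATH. The spill only happens on that branch; everything zipfile can read stays in memory.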

The cheat sheet

The three verbs across every format you’ll meet. Bookmark this one.

Format           List              Stream one → stdout   Bulk extract
.zip             unzip -l a.zip    unzip -p a.zip f      unzip -d out/ a.zip
.rar             unrar l a.rar     unrar p a.rar f       unrar x a.rar out/
.tar             tar -tf a.tar     tar -xO -f a.tar f    tar -xf a.tar -C out/
.tar.gz          tar -tzf a.tgz    tar -xzO -f a.tgz f   tar -xzf a.tgz -C out/
.gz (one file)                     zcat f.gz             gunzip f.gz
.7z              7z l a.7z         7z e -so a.7z f       7z x a.7z
any (universal)  bsdtar -tf a.*    bsdtar -xOf a.* f     bsdtar -xf a.*

(f is the file you want from the archive; out/ is the target directory.)
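In code, the same three verbs map onto Python’s zipfile and tarfile modules. A minimal sketch, assuming local .zip or .tar(.gz) files; the helper names are hypothetical, and the dispatch simply treats anything `zipfile.is_zipfile` recognizes as a zip and hands everything else to `tarfile`, whose default mode "r" auto-detects gzip, bz2 and xz compression.

```python
import tarfile
import zipfile


def list_members(path):
    """List: names from the zip index, or from a walk over the tar headers."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            return z.namelist()
    with tarfile.open(path) as t:  # default mode "r" auto-detects gz/bz2/xz
        return t.getnames()


def read_member(path, name):
    """Stream one: a member's bytes in memory, like unzip -p or tar -xO."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            return z.read(name)
    with tarfile.open(path) as t:
        f = t.extractfile(name)  # None for directories and other non-file members
        if f is None:
            raise ValueError(f"{name!r} is not a regular file")
        return f.read()


def extract_all(path, out_dir):
    """Bulk extract: everything into out_dir."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            z.extractall(out_dir)
        return
    with tarfile.open(path) as t:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse members that would land outside out_dir.
            t.extractall(out_dir, filter="data")
        else:
            t.extractall(out_dir)
```

The `hasattr` check is a design choice for portability: interpreters that ship tarfile’s extraction filters get the safer "data" filter, older ones fall back to the plain call.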

Two tools that cover (almost) everything

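Don’t want to remember per-format syntax? Two readers swallow most formats behind one interface:

  bsdtar (libarchive): detects compression and format automatically; -tf lists, -xOf streams one file to stdout, -xf extracts everything.
  7z (7-Zip): l lists, e -so streams one file to stdout, x extracts everything.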

The catch worth remembering from Lesson 2: a universal tool still obeys the format’s physics. bsdtar -tf on a .tar.gz is still a sequential walk; on a .zip it’s still a seek. One interface — not one cost.
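The same physics shows up in Python. A minimal sketch with hypothetical helper names: opening a .tar.gz in tarfile’s stream mode ("r|gz") forces a forward-only walk, so the tar helper counts the headers it passes on the way to the target; the zip helper looks the name up in the central directory and reports the offset it seeks to.

```python
import tarfile
import zipfile


def tar_stream_find(path, name):
    """Find one member in a .tar.gz opened as a pure stream ("r|gz").

    Returns (data, headers_walked). Stream mode only moves forward, so every
    member before the target is decompressed and skipped on the way.
    """
    walked = 0
    with tarfile.open(path, mode="r|gz") as t:
        for member in t:
            walked += 1
            if member.name == name:
                f = t.extractfile(member)  # must be read before advancing
                if f is None:
                    raise ValueError(f"{name!r} is not a regular file")
                return f.read(), walked
    raise KeyError(name)


def zip_index_lookup(path, name):
    """Look a member up in the zip's central directory, then seek to it.

    Returns (data, header_offset). The offset comes from the index, so no
    other member's data is read on the way.
    """
    with zipfile.ZipFile(path) as z:
        info = z.getinfo(name)
        return z.read(info), info.header_offset
```

In a five-member tarball, asking for the fourth member reports 4 headers walked; the zip read goes straight to `header_offset` wherever the member sits in the archive.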

Capstone — reach into your own archive

This is the part that builds the skill. Do it now on a real archive (one of your BSE zips is perfect).
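If the timing is what you want to check, here is one hypothetical way to measure it in Python; it is not the capstone’s own checklist. `time_one_vs_all` is an illustrative name, the zip path is yours to supply, and by default it reads the largest member so the single-file number isn’t trivially small.

```python
import tempfile
import time
import zipfile


def time_one_vs_all(zip_path, member=None):
    """Time reading one member against bulk-extracting the whole zip.

    Returns (seconds_for_one, seconds_for_all). If member is None, the
    largest member by uncompressed size is used.
    """
    with zipfile.ZipFile(zip_path) as z:
        if member is None:
            member = max(z.infolist(), key=lambda i: i.file_size).filename

        # Selective: one index lookup, one seek, one decompress.
        start = time.perf_counter()
        z.read(member)
        one = time.perf_counter() - start

        # Bulk: every member decompressed and written to a throwaway directory.
        with tempfile.TemporaryDirectory() as out:
            start = time.perf_counter()
            z.extractall(out)
            everything = time.perf_counter() - start
    return one, everything
```

Usage, with a placeholder path: `one, everything = time_one_vs_all("path/to/your.zip")`. Expect the gap to grow with the number and size of members you didn’t ask for.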

Course win

You started by asking “how did you extract those without unzipping them?” You can now answer it cold — the index, the seek, the streaming split, the in-memory buffer — and reach into any .zip, .rar, .tar(.gz), .7z three ways, choosing the cheap one on sight. That’s the fluency the mission was after.

Recall check — everything, interleaved

This one mixes all four lessons on purpose — interleaving is what proves the knowledge is yours, not just fresh. Retrieve each from memory.

Primary source — keep these two

libarchive / bsdtar — the universal reader; skim the front page on automatic format detection. And the 7-Zip command-line manual for l / x / e -so. Together they’re the two-tool kit behind the cheat sheet above.

I’m your teacher — use me. Did the capstone throw an error, or did the timing surprise you? Paste what happened. And when you’re ready, ask me to design Lesson 5 — encrypted archives, zip64 for >4 GB, or wiring all of this into your backtester’s loader. You pick the direction.

Sources

  1. 7-Zip manual — -so (write data to stdout) and the command reference (l list, e/x extract).
  2. libarchive — bsdtar. “On read, compression and format are always detected automatically, and the same API is used for all formats.” Reads tar, zip, 7-zip, cpio, iso, and more. See also the bsdtar(1) man page.