Why tar can't do what zip does, and why that .rar needed a temp file.
In Lesson 1 you reached into a zip and pulled one member out instantly, because the central directory let the reader seek. Make exactly the same move on a .tar.gz and it crawls. Try piping a .rar into unrar and it flatly refuses. Both surprises come from a single split: random-access vs streaming, and seekable vs not.
Every archive you'll meet falls on one side of this line:
| Family | Examples | Has an index? | Listing / extracting one member |
|---|---|---|---|
| Random-access | .zip, .rar | Yes: zip has a central directory; rar's file headers record sizes, so a reader can seek past each member's data | Cheap: seek to it |
| Streaming | .tar, .gz, .tar.gz | No | Expensive: read front to back |
tar, short for tape archive, was built for magnetic tape: a sequential device you read front to back. So it has no central directory at all. Each member is just a header (name, size, timestamps) followed immediately by its bytes, then the next header, and so on.[1] There is no index to jump to: there is “no way of knowing how many files a tar archive contains unless the whole archive is traversed.”[1]
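To make "a header, then its bytes" concrete, here is a minimal sketch that builds a one-member tar and reads two header fields straight out of the raw file. It assumes the standard ustar header layout (member name in bytes 0 to 99, size as octal text in bytes 124 to 135, data starting at the next 512-byte block); hello.txt and demo.tar are throwaway example names.

```shell
#!/bin/sh
# Minimal sketch: look at the raw layout of a one-member tar archive.
# Assumes the ustar header layout and tar, dd, tr, head on PATH;
# hello.txt and demo.tar are throwaway example names.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'hi\n' > hello.txt
tar -cf demo.tar hello.txt

# Header bytes 0-99: the member name, NUL-padded.
name=$(head -c 100 demo.tar | tr -d '\0')

# Header bytes 124-135: the size as octal text, padded with NULs/spaces.
size_octal=$(dd if=demo.tar bs=1 skip=124 count=12 2>/dev/null | tr -d '\0 ')
size=$(printf '%d' "0$size_octal")   # leading 0 => printf parses it as octal

# The member's bytes start right after the 512-byte header.
data=$(dd if=demo.tar bs=512 skip=1 count=1 2>/dev/null | tr -d '\0')

printf 'name=%s size=%s data=%s\n' "$name" "$size" "$data"
```

The next header would start at byte 512 plus the data rounded up to a whole 512-byte block. Nothing else in the file points to it, so finding member N means hopping header to header from the front.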
.tar.gz makes it worse: one solid stream
Compress a tar with gzip and you wrap the entire archive in a single compressed stream. Now the data isn't just index-less, it's solid: “to find the 50th file, you must uncompress and read files 1 through 49 first.”[1] So tar -tzf big.tar.gz has to decompress and scan the whole thing just to list it. There is no cheap “jump to one member”, because the structure you would jump with doesn't exist.
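One way to see that there is no shortcut: the listing tar -tzf produces is the same one you get by gunzipping the entire stream yourself and piping the result into a plain tar -tf. A minimal sketch, assuming tar and gzip on PATH; src/ and demo.tar.gz are throwaway example names.

```shell
#!/bin/sh
# Minimal sketch: listing a .tar.gz = decompress the whole stream, then walk the tar.
# Assumes tar and gzip on PATH; file names are throwaway examples.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir src
for i in 1 2 3; do printf 'row %s\n' "$i" > "src/file$i.csv"; done
tar -czf demo.tar.gz src

# tar's own -z decompresses internally while it walks the headers.
listed_by_tar=$(tar -tzf demo.tar.gz)

# Same work spelled out: gunzip the ENTIRE stream, pipe it into a plain listing.
listed_by_pipe=$(gzip -dc demo.tar.gz | tar -tf -)

printf '%s\n' "$listed_by_tar"
```

The pipe form also shows why streaming formats are pipe-friendly: tar -tf - only ever reads forward, so it never needs to seek.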
In your file manager, a 2 GB .tar.gz looks just like a 2 GB .zip. But “give me one file out of this” is a single seek in the zip and a full decompress-and-scan in the tar.gz. The format decides the cost.
Lesson 1's three verbs exist for streaming formats too. They work; under the hood they're just doing a front-to-back read:
```shell
# LIST: walks the whole archive (there is no index to read)
tar -tf archive.tar
tar -tzf archive.tar.gz            # add z for gzip-compressed

# STREAM ONE member to stdout. Note -O (capital o = "to stdout")
tar -xO -f archive.tar path/in/archive.csv | head
tar -xzO -f archive.tar.gz path/in/archive.csv | head

# BULK extract to a directory (-C needs the directory to exist)
mkdir -p out/
tar -xf archive.tar -C out/
tar -xzf archive.tar.gz -C out/

# A lone .gz wraps ONE file: just decompress its single stream to stdout
zcat prices.csv.gz | head          # zcat == gunzip -c == gzip -dc
gzip -dc prices.csv.gz | wc -l
```
-O is tar's equivalent of unzip -p: one member decompressed straight to stdout, nothing written to disk.[2] And for a lone .gz, zcat is the whole story: gzip compresses a single stream, so there is no member to choose; you just decompress it.[3]
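Here is a self-contained way to try both verbs: a minimal sketch, assuming tar and gzip on PATH, in which data/, bundle.tar.gz and prices.csv.gz are throwaway example names. The originals are deleted before extracting, so you can check that -O left nothing behind on disk.

```shell
#!/bin/sh
# Minimal sketch: pull ONE member of a .tar.gz to stdout, and decompress a lone .gz.
# Assumes tar and gzip on PATH; every file name here is a throwaway example.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir data
printf 'date,price\n2024-07-01,10\n' > data/prices.csv
printf 'unrelated\n' > data/notes.txt
tar -czf bundle.tar.gz data
gzip -c data/prices.csv > prices.csv.gz   # a lone .gz wrapping one file
rm -r data                                # only the two archives remain

# -O: decompress just the named member to stdout (the whole stream is still read).
member=$(tar -xzO -f bundle.tar.gz data/prices.csv)

# A lone .gz has one stream and no member names; decompress it whole.
lone=$(gzip -dc prices.csv.gz)

printf '%s\n' "$member"
[ -e data ] || printf '%s\n' 'tar -O wrote nothing to disk'
```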
Why the .rar needed a temp file
Now for the real payoff. Remember the July data: .rar files inside a .zip. I couldn't pipe each rar into unrar; I had to write it to a temp file first. Here's exactly why.
A pipe is sequential-only: you can read the bytes as they flow past, but you can't seek backward or forward. A random-access reader (unzip, unrar) has to seek; for a zip, that means jumping to the index at the end, then back to the member's offset. So it needs a seekable file, a real file on disk that the OS can jump around in. A pipe can't provide that, which is why unrar refuses stdin.
| You have… | The reader needs… | Works through a pipe? | So you… |
|---|---|---|---|
| tar / gz stream | a sequential read | ✅ yes | just pipe it (zcat … \|) |
| zip / rar (nested) | seeking (random access) | ❌ no seeking in a pipe | write it to a temp file, then read that |
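The "write it to a temp file, then read that" row can be packaged as a tiny helper. This is a minimal sketch, not a standard tool: spool_then is a hypothetical name, and it assumes mktemp (available on Linux, BSD and macOS, though not in POSIX) and cat on PATH. It drains the pipe into a regular file, appends that file's path to the command you give it, then cleans up.

```shell
#!/bin/sh
# Minimal sketch of the temp-file workaround. spool_then is a hypothetical
# helper name; assumes mktemp and cat on PATH.
set -eu

# Usage: some_stream | spool_then CMD [ARGS...]
# Drains stdin into a regular (seekable) temp file, runs CMD ARGS... TEMPFILE,
# deletes the temp file, and returns CMD's exit status.
spool_then() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"                 # the pipe is read once, front to back
  rc=0
  "$@" "$tmp" || rc=$?         # the tool gets a real path it can seek in
  rm -f "$tmp"
  return "$rc"
}

# Self-check with everyday tools: on a regular file, tail -c can seek to the end.
copy=$(printf 'abc' | spool_then cat)
last_two=$(printf 'abcdef' | spool_then tail -c 2)
printf '%s %s\n' "$copy" "$last_two"
```

For the July case this would look something like unzip -p outer.zip inner.rar | spool_then unrar l, where outer.zip and inner.rar stand in for the real names: unzip -p streams the nested rar out, and unrar receives a seekable copy instead of a pipe.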
Rule of thumb: streaming formats flow through pipes; random-access formats usually need a seekable file. When a seek-only tool meets a pipe, you hand it a temp file. That workaround is forced by the format, not by some quirk of the tool. (In code you can sometimes avoid even that by giving the reader an in-memory seekable buffer, which is exactly Lesson 3.)
You can now classify any archive before touching it: random-access (zip/rar: seek, so pulling one member is cheap) or streaming (tar/gz: walk, so pulling one member is expensive). You can reach into the streaming ones with tar -O / zcat, and you can explain the seekable rule that forces a temp file when a pipe meets unrar. Format → cost → right verb.
Wikipedia, tar (computing): “Format details” and the limitations section. The clearest account of why tar has no index and what that costs. ~8 minutes. The authoritative reference for the verbs is the GNU tar manual, Extracting Specific Files (and the GNU gzip manual for zcat).
Time unzip -l on a big .zip and tar -tzf on a big .tar.gz, and ask me to explain the gap. Or hand me a real archive and I'll tell you which family it belongs to and the cheapest way to reach into it.
[1] … .tar.gz.
[2] … (-O/--to-stdout) and Extracting Specific Files.
[3] zcat = gunzip -c = gzip -dc: decompresses a single stream to stdout.