How can you pull one file out of a zip without unpacking everything else?
An hour ago I reached into 17 archives and pulled exactly one file out of each, a single SENSEX.csv, even though every archive held ~150 index CSVs plus a pile of futures and options data. I did not unpack them. The gigabytes stayed compressed; I touched only the file I needed.
This is not a hack; it is a structural fact about how a zip is built. Understand that one fact and the whole technique follows from it: you stop unpacking archives and start reaching into them. This lesson is that fact.
A zip is not a folder. It is one file: every member stored back to back, and then, at the very end, a central directory: a list with one record per member that gives its name, its size, and, most importantly, its byte offset inside the file. [1]
So a reader never scans the whole archive. It jumps to the end, reads the End of Central Directory record, follows it to the index, looks up the name you asked for, and seeks straight to those bytes. [2] The other 149 members are never read.
Because the index stores offsets, you get random access: the ability to jump to any byte without reading what came before it. The benefit is concrete: with plain sequential reading you would scan, on average, half the archive to find one file; with the index you read only the index and that one file. [2]
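To make the "jump to the end" step concrete, here is a minimal sketch that reads the End of Central Directory (EOCD) record with nothing but `tail` and `od`. The function name `zip_eocd` is made up for illustration. It assumes the archive has no trailing comment (so the EOCD is exactly the last 22 bytes) and is not ZIP64; real readers search backwards for the signature instead of assuming a fixed position.

```shell
# Hypothetical helper: print entry count, central directory size and offset.
# Assumption: no archive comment and no ZIP64, so the EOCD is the last 22 bytes.
zip_eocd() {
    archive=$1
    # od prints the 22 bytes as unsigned decimals; word splitting
    # turns them into the positional parameters $1..$22.
    set -- $(tail -c 22 "$archive" | od -A n -v -t u1)
    # The EOCD signature is "PK\005\006", i.e. bytes 80 75 5 6.
    if [ "$#" -ne 22 ] || [ "$1 $2 $3 $4" != "80 75 5 6" ]; then
        echo "no EOCD in the last 22 bytes (archive comment, or not a zip?)" >&2
        return 1
    fi
    # All fields are little-endian: low byte first.
    entries=$(( ${11} + ${12} * 256 ))
    cd_size=$(( ${13} + ${14} * 256 + ${15} * 65536 + ${16} * 16777216 ))
    cd_offset=$(( ${17} + ${18} * 256 + ${19} * 65536 + ${20} * 16777216 ))
    echo "entries=$entries cd_size=$cd_size cd_offset=$cd_offset"
}

# Usage: zip_eocd data.zip
```

Those 22 bytes are all a reader needs in order to find the index: `cd_offset` says where the central directory starts and `cd_size` says how many bytes to read, regardless of how large the archive is.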
This one property separates two operations that people often conflate: reaching into an archive for one member, and unpacking all of it.
Every archive tool offers the same three verbs. Learn them as a set (list, stream-one, bulk) and you can reach into zip and rar the same way:
# 1 · LIST: read only the index (the central directory). Always do this first.
unzip -l data.zip # names + sizes, instantly
unrar l data.rar
# 2 · STREAM ONE member to stdout: decompress just that file, nothing to disk
unzip -p data.zip SENSEX.csv > SENSEX.csv
unrar p data.rar SENSEX.csv > SENSEX.csv
# 3 · BULK extract everything to a folder: the default, usually more than you need
unzip -d out/ data.zip
unrar x data.rar out/
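Verbs 1 and 2 combine naturally: consult the index first, then stream only if the member is really there. The sketch below assumes Info-ZIP `unzip`, whose zipinfo mode (`-Z1`) prints one member name per line; the function name `stream_if_listed` is made up for illustration.

```shell
# Hypothetical helper: stream a member to stdout only if the index lists it.
# Assumption: Info-ZIP unzip, where -Z1 prints bare member names, one per line.
stream_if_listed() {
    archive=$1
    member=$2
    # grep -F: fixed string, -x: whole line must match, -q: no output.
    if unzip -Z1 "$archive" | grep -Fxq -- "$member"; then
        unzip -p "$archive" "$member"
    else
        echo "$member is not in $archive" >&2
        return 1
    fi
}

# Usage: stream_if_listed data.zip SENSEX.csv | head
```

One caveat: `unzip` treats member arguments as wildcard patterns, so a name containing `*`, `?` or `[` would need escaping before the `unzip -p` step.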
Verb 2 is the one that changes how you work. -p / p send the member's bytes to stdout instead of writing a file [3], so you can pipe them straight into the next program and get the job done without writing a single temp file:
# Peek at the first rows of one member inside a 50 MB archive
unzip -p data.zip SENSEX.csv | head
# Count rows, or feed straight into awk / a program β no extraction step
unzip -p data.zip SENSEX.csv | wc -l
unzip -p data.zip SENSEX.csv | awk -F, '$2 > 80000'
This is what drove the extraction you watched: unzip -p for the months shipped as zip, unrar p for the months shipped as rar. One member streamed from each archive; the other ~149 stayed untouched. Extracting would have written gigabytes of files that I would then have deleted. Streaming wrote only what I asked for.
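A run like that can be sketched as a small loop that picks the streaming verb by file extension. This is an illustration, not the exact script that was used: the function name `stream_member` and the `archives/` path in the usage line are hypothetical. The `-inul` switch is the one RAR's documentation suggests pairing with `p` so that only file data reaches stdout.

```shell
# Hypothetical helper: stream one member from each archive to stdout.
# Usage: stream_member MEMBER ARCHIVE...
stream_member() {
    member=$1
    shift
    for archive in "$@"; do
        case "$archive" in
            *.zip) unzip -p "$archive" "$member" ;;
            # -inul silences unrar's own messages, leaving only file data.
            *.rar) unrar p -inul "$archive" "$member" ;;
            *) echo "skipping unsupported archive: $archive" >&2 ;;
        esac
    done
}

# Usage (paths are placeholders):
#   stream_member SENSEX.csv archives/*.zip archives/*.rar > sensex_all.csv
```

Concatenating CSVs this way repeats each file's header row, so a later step would still need to drop the duplicates.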
You can now (1) explain why pulling one file out of an archive is cheap (the central directory gives random access) and (2) reach in three ways: -l to list, -p to stream one member into a pipe, -d to bulk-extract. You have stopped thinking "unzip the whole thing" and started thinking "seek to the file I need".
Don't reread; retrieve. Effortful recall is what turns this into a memory that will still be there next week. Answer from your own head; feedback is immediate.
Wikipedia, ZIP (file format), "Structure" section. The clearest accessible account of the central directory, the EOCD record, and how placing the index at the end enables random access. ~10 minutes. For authoritative byte-level detail, the spec itself is PKWARE's APPNOTE.TXT.
.tar.gz, a node_modules tarball)? Paste the path and I'll demonstrate -l / -p on it. Curious why .tar.gz cannot do this cheaply, or why that nested .rar needed a temp file? That is Lesson 2, but ask now if it is bugging you.
-p extracts to stdout (pipe); -l lists; -d sets the extraction directory.