archive.tar #
Description
tar is a module to access tar archives.
Tape archives (tar) are a file format for storing a sequence of files that can be read and written as streams. This module covers reading the basic sections of archives produced by GNU tools, much like the Linux command tar -xvf does, but in memory instead of modifying the filesystem. It parses directories, files, and file contents, and handles paths longer than 100 characters.
Read Efficiency
An entire tar file can be read into memory at once or in chunks. When reading in chunks, the module keeps a single decompressed chunk of 32 KB and a single tar block of 512 bytes in memory at a time. Paths are not converted to strings until needed, and the user's Reader implementation can stop the reading process early.
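The sizes quoted above follow from simple arithmetic: a 32 KB chunk holds 32768 / 512 = 64 tar blocks. The following is a minimal, self-contained sketch of walking a decompressed buffer chunk by chunk and counting the blocks in each chunk. The names `block_size`, `chunk_size` and `count_chunks_and_blocks` are hypothetical and are not taken from the module's source.

```v
// Hypothetical constants mirroring the sizes quoted in the text.
const block_size = 512
const chunk_size = 32 * 1024

// count_chunks_and_blocks walks `data` in chunks of at most `chunk_size`
// bytes and counts the whole 512-byte blocks inside each chunk.
fn count_chunks_and_blocks(data []u8) (int, int) {
	mut chunks := 0
	mut blocks := 0
	for start := 0; start < data.len; start += chunk_size {
		end := if start + chunk_size < data.len { start + chunk_size } else { data.len }
		chunk := data[start..end]
		chunks++
		blocks += chunk.len / block_size
	}
	return chunks, blocks
}

fn main() {
	// 100 zeroed blocks stand in for real decompressed data.
	data := []u8{len: 100 * block_size}
	chunks, blocks := count_chunks_and_blocks(data)
	println('${chunk_size / block_size} blocks per chunk')
	println('${chunks} chunks, ${blocks} blocks')
}
```

Only one chunk-sized slice is looked at per iteration, which is the property the section describes.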
Read Example
The tar blocks are parsed and some of their fields are passed to the methods implemented by the Reader.
import os
import archive.tar
fn main() {
	os.chdir(@) or {}
	path := 'archive/tar/testdata/life.tar.gz'
	reader := tar.new_debug_reader()
	tar.read_tar_gz_file(path, reader)!
}
See also the tar_gz_reader.v program in the examples folder.
fn new_debug_reader #
fn new_debug_reader() &DebugReader
new_debug_reader returns a DebugReader
fn new_decompressor #
fn new_decompressor(untar &Untar) &Decompressor
new_decompressor returns a Decompressor to decompress a tar.gz file. The given Untar, with its registered Reader, will read the blocks.
fn new_untar #
fn new_untar(reader Reader) &Untar
new_untar builds an Untar with the given Reader.
fn read_tar_gz_file #
fn read_tar_gz_file(path string, reader Reader) !
read_tar_gz_file decompresses a given local file and reads all the blocks with a given reader.
interface Reader #
interface Reader {
mut:
// dir_block is called when untar reads a block of type directory.
// Call `Read.get_path()` to get the full name of the directory.
// `size` field is zero for directories.
// The implementor can set Read's field `stop_early` to suspend the reader.
dir_block(mut read Read, size u64)
// file_block is called when untar reads a block of type filename.
// Call `Read.get_path()` to get the full name of the file.
// `size` is the expected file size in bytes to be read later.
// The implementor can set Read's field `stop_early` to suspend the reader.
file_block(mut read Read, size u64)
// data_block is called when untar reads a block of type filedata.
// Call `Read.get_path()` to get the full name of the file data belongs to.
// The `data` size is 512 bytes or less. `pending` indicates how many bytes are left to read.
// The implementor can inspect the data and use the pending value
// to set Read's field `stop_early` to suspend the reader.
data_block(mut read Read, data []u8, pending int)
// other_block is called when untar reads a block type other than directory,
// filename or filedata. `Read.get_header()` and `details` give more info about
// the block, for example `block device` or `FIFO`.
// The implementor can set Read's field `stop_early` to suspend the reader.
other_block(mut read Read, details string)
}
Reader is used by Untar while it parses the blocks: each parsed block is passed to one of the Reader's methods.
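As an illustration of the interface above, the following sketch implements a Reader that counts directories and files and sums the announced file sizes. `CountingReader` is a hypothetical name introduced here; the method signatures are copied from the interface, and the archive path is the one used in the Read Example, so the program assumes it is run from a working directory where that path resolves.

```v
import archive.tar

// CountingReader is a hypothetical Reader that only gathers totals.
struct CountingReader {
mut:
	dirs  int
	files int
	bytes u64
}

// dir_block is called once per directory entry.
fn (mut c CountingReader) dir_block(mut read tar.Read, size u64) {
	c.dirs++
}

// file_block is called once per file entry, before its data blocks.
fn (mut c CountingReader) file_block(mut read tar.Read, size u64) {
	c.files++
	c.bytes += size
	println('${read.get_path()} (${size} bytes)')
}

// data_block receives the file contents in pieces of 512 bytes or less.
// Setting read.stop_early = true here would suspend the reader.
fn (mut c CountingReader) data_block(mut read tar.Read, data []u8, pending int) {
}

// other_block receives every other block type; this sketch ignores them.
fn (mut c CountingReader) other_block(mut read tar.Read, details string) {
}

fn main() {
	mut reader := &CountingReader{}
	tar.read_tar_gz_file('archive/tar/testdata/life.tar.gz', reader)!
	println('${reader.dirs} dirs, ${reader.files} files, ${reader.bytes} bytes')
}
```

Because the file contents are ignored in `data_block`, this reader never holds more than the totals in memory.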
enum BlockHeader #
enum BlockHeader as u8 {
file = u8(`0`) // 0x30
hard_link = u8(`1`) // 0x31
sym_link = u8(`2`) // 0x32
char_dev = u8(`3`) // 0x33
block_dev = u8(`4`) // 0x34
dir = u8(`5`) // 0x35
fifo = u8(`6`) // 0x36
long_name = u8(`L`) // 0x4c = 76 dec
global = u8(`g`) // 0x67 pax
}
ustar header block octets:
Field | Offset | Length
name | 0 | 100
mode | 100 | 8
uid | 108 | 8
gid | 116 | 8
size | 124 | 12
mtime | 136 | 12
chksum | 148 | 8
typeflag | 156 | 1
linkname | 157 | 100
magic | 257 | 6
version | 263 | 2
uname | 265 | 32
gname | 297 | 32
devmajor | 329 | 8
devminor | 337 | 8
prefix | 345 | 155
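To make the offsets concrete, here is a self-contained sketch that reads the name, prefix, size and typeflag fields out of a hand-built 512-byte header. It assumes the usual ustar conventions (text fields end at the first NUL byte, numeric fields are ASCII octal, and a non-empty prefix is joined to the name with `/`). The helpers `put`, `field_string` and `parse_octal_field` are hypothetical and are not the module's own functions.

```v
// put copies `text` into `block` starting at `offset` (test helper).
fn put(mut block []u8, offset int, text string) {
	for i in 0 .. text.len {
		block[offset + i] = text[i]
	}
}

// field_string returns a fixed-width field up to its first NUL byte.
fn field_string(block []u8, offset int, length int) string {
	field := block[offset..offset + length]
	mut end := 0
	for end < field.len && field[end] != 0 {
		end++
	}
	return field[..end].bytestr()
}

// parse_octal_field reads an ASCII octal number, skipping leading padding
// and stopping at the first non-octal byte after the digits.
fn parse_octal_field(block []u8, offset int, length int) u64 {
	mut value := u64(0)
	mut seen_digit := false
	for b in block[offset..offset + length] {
		if b >= `0` && b <= `7` {
			value = value * 8 + u64(b - `0`)
			seen_digit = true
		} else if seen_digit {
			break
		}
	}
	return value
}

fn main() {
	mut block := []u8{len: 512}
	put(mut block, 0, 'docs/readme.txt') // name
	put(mut block, 124, '00000001750') // size, octal 1750
	block[156] = u8(`0`) // typeflag: regular file
	put(mut block, 345, 'project') // prefix

	name := field_string(block, 0, 100)
	prefix := field_string(block, 345, 155)
	path := if prefix.len > 0 { prefix + '/' + name } else { name }
	size := parse_octal_field(block, 124, 12)
	println('${path}: ${size} bytes, typeflag ${block[156].ascii_str()}')
}
```

The prefix field is what allows paths longer than the 100 bytes of the name field without a separate long-name block.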
enum BlockSpecial #
enum BlockSpecial {
no // for headers `0`,`5` or data blocks
blank_1 // first blank block: continue
blank_2 // second blank block: end of archive
ignore // for headers `1`, `2`, `3`, `4`, `6`
long_name // for header `L`
global // for header `g`
unknown // for headers that are not defined
}
enum ReadResult #
enum ReadResult {
@continue
stop_early
end_of_file
end_archive
overflow
}
ReadResult is returned by ReadResultFn
struct DebugReader #
struct DebugReader implements Reader {
}
DebugReader implements a Reader and prints rows for blocks read as directories, files, file data blocks and special blocks.
struct Decompressor #
struct Decompressor {
mut:
untar &Untar
}
fn (Decompressor) read_all #
fn (mut d Decompressor) read_all(tar_gz []u8) !ReadResult
read_all decompresses the given tar_gz array with all the tar blocks, then calls the untar method read_all to read all the blocks at once. It returns a read result, which can be of the type stop early, or an error.
fn (Decompressor) read_chunks #
fn (mut d Decompressor) read_chunks(tar_gz []u8) !ReadResult
read_chunks decompresses the given tar_gz array in chunks of 32768 bytes, each of which can hold up to 64 tar blocks of 512 bytes. It then calls the untar method read_block with the ChunksReader dispatcher. It returns a read result, which can be of the type stop early, or an error.
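The lower-level functions can be combined by hand when the compressed bytes are already in memory. The sketch below uses only functions documented on this page plus `os.read_bytes`; the archive path is the one from the Read Example and is assumed to resolve from the current working directory.

```v
import os
import archive.tar

fn main() {
	// Load the compressed archive; any []u8 holding tar.gz data would do.
	tar_gz := os.read_bytes('archive/tar/testdata/life.tar.gz')!

	// Wire a Reader into an Untar, and the Untar into a Decompressor.
	reader := tar.new_debug_reader()
	mut untar := tar.new_untar(reader)
	mut decompressor := tar.new_decompressor(untar)

	// read_chunks decompresses chunk by chunk; read_all would decompress
	// everything first and then read all the blocks at once.
	result := decompressor.read_chunks(tar_gz)!
	println(result)
}
```

Choosing `read_chunks` over `read_all` trades one large decompressed buffer for a chunk-sized one, which matters mostly for large archives.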
struct Read #
struct Read {
mut:
block_number int
special BlockSpecial
prefix_len int
prefix_buf [131]u8
separator bool
path_len int
path_buf [100]u8
long_path &LongPath = unsafe { nil }
pub mut:
stop_early bool
}
Read is used by Untar to call the methods implemented by the Reader. The implementor can call get_block_number() and get_path() on the Read, and can set the field stop_early to true to suspend the reading.
fn (Read) get_path #
fn (b Read) get_path() string
get_path returns the path of this read. The path is valid for blocks of types directory, file and file data.
fn (Read) get_block_number #
fn (b Read) get_block_number() int
get_block_number returns the consecutive number of this read.
fn (Read) get_special #
fn (b Read) get_special() BlockSpecial
get_special returns the special type of the Read.
fn (Read) str #
fn (r Read) str() string
str returns a string representation with block number, path, special type and stop early.
struct Untar #
struct Untar {
mut:
reader Reader
max_blocks int
buffer [512]u8 // data to parse block
read Read // last read to send/receive to/from reader implementation
state State // true when reading data blocks or long names
size int // remaining data size during state_data
long_path &LongPath = unsafe { nil } // not nil to hold a file long_name
blank_block int = -1 // last no-data block with all-zeros
}
Untar uses a Reader to parse the contents of a Unix tar file. It reuses a fixed array of 512 bytes to parse each tar block.
fn (Untar) str #
fn (u Untar) str() string
str returns a string representation with max_blocks and last read.
fn (Untar) read_all_blocks #
fn (mut u Untar) read_all_blocks(blocks []u8) !ReadResult
read_all_blocks parses the data blocks of any decompressed *.tar.gz array. The data blocks length must be divisible by 512.
fn (Untar) read_single_block #
fn (mut u Untar) read_single_block(block []u8) !ReadResult
read_single_block parses one data block at a time. The data block length must be 512. Two consecutive blank blocks, each made of 512 zeroes, produce an .end_archive result.
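The end-of-archive rule can be sketched independently of the module: scan the data in 512-byte blocks and stop at the second consecutive all-zero block. The functions `is_blank` and `blocks_before_end` are hypothetical names used only for this illustration and do not reflect how Untar is implemented internally.

```v
const block_size = 512

// is_blank reports whether every byte of the block is zero.
fn is_blank(block []u8) bool {
	for b in block {
		if b != 0 {
			return false
		}
	}
	return true
}

// blocks_before_end returns the number of blocks that precede the two
// consecutive blank blocks, or none when no such marker is found.
fn blocks_before_end(data []u8) ?int {
	mut blanks := 0
	for i := 0; i + block_size <= data.len; i += block_size {
		if is_blank(data[i..i + block_size]) {
			blanks++
			if blanks == 2 {
				return i / block_size - 1
			}
		} else {
			blanks = 0
		}
	}
	return none
}

fn main() {
	// Three non-zero blocks followed by two blank blocks.
	mut data := []u8{len: 5 * block_size}
	for i in 0 .. 3 * block_size {
		data[i] = 1
	}
	n := blocks_before_end(data) or { -1 }
	println('blocks before end of archive: ${n}')
}
```

A single blank block resets nothing by itself; only a non-blank block in between clears the count, which matches the blank_1 and blank_2 states listed under BlockSpecial.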