microtar's Introduction

microtar

A lightweight tar library written in ANSI C

Basic Usage

The library consists of microtar.c and microtar.h. These two files can be dropped into an existing project and compiled along with it.

Reading

mtar_t tar;
mtar_header_t h;
char *p;

/* Open archive for reading */
mtar_open(&tar, "test.tar", "r");

/* Print all file names and sizes */
while ( mtar_read_header(&tar, &h) != MTAR_ENULLRECORD ) {
  printf("%s (%u bytes)\n", h.name, h.size);
  mtar_next(&tar);
}

/* Load and print contents of file "test.txt" */
mtar_find(&tar, "test.txt", &h);
p = calloc(1, h.size + 1);
mtar_read_data(&tar, p, h.size);
printf("%s", p);
free(p);

/* Close archive */
mtar_close(&tar);

Writing

mtar_t tar;
const char *str1 = "Hello world";
const char *str2 = "Goodbye world";

/* Open archive for writing */
mtar_open(&tar, "test.tar", "w");

/* Write strings to files `test1.txt` and `test2.txt` */
mtar_write_file_header(&tar, "test1.txt", strlen(str1));
mtar_write_data(&tar, str1, strlen(str1));
mtar_write_file_header(&tar, "test2.txt", strlen(str2));
mtar_write_data(&tar, str2, strlen(str2));

/* Finalize -- this needs to be the last thing done before closing */
mtar_finalize(&tar);

/* Close archive */
mtar_close(&tar);

Error handling

All functions that return an int return MTAR_ESUCCESS if the operation succeeds. If an error occurs, a negative error value is returned; this value can be passed to mtar_strerror() to get the corresponding error string.

Wrapping a stream

If you want to read from or write to something other than a file, the mtar_t struct can be initialized manually with your own callback functions and a stream pointer.

All callback functions are passed a pointer to the mtar_t struct as their first argument. They should return MTAR_ESUCCESS if the operation succeeds without an error, or an integer below zero if an error occurs.

Once the stream field is set, all required callbacks are set, and all unused fields are zeroed, the mtar_t struct can safely be used with the microtar functions. mtar_open() should not be called if the mtar_t struct was initialized manually.

Reading

The following callbacks should be set for reading an archive from a stream:

Name    Arguments                                        Description
read    mtar_t *tar, void *data, unsigned size           Read data from the stream
seek    mtar_t *tar, unsigned pos                        Set the position indicator
close   mtar_t *tar                                      Close the stream
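When the stream is a byte buffer already in memory, the read and seek callbacks listed above reduce to cursor arithmetic. The sketch below keeps that logic in a standalone, hypothetical mem_stream struct so it compiles without microtar. In real callbacks, tar->stream would point at such a struct, and the 0/-1 results would become MTAR_ESUCCESS and MTAR_EREADFAIL or MTAR_ESEEKFAIL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream: a read-only byte buffer plus a cursor. */
struct mem_stream {
  const unsigned char *data; /* start of the archive bytes */
  size_t size;               /* total number of bytes in the buffer */
  size_t pos;                /* current read position */
};

/* Copy `size` bytes from the cursor into `out`, refusing to read past the
   end of the buffer. Returns 0 on success, -1 on failure. */
static int mem_stream_read(struct mem_stream *m, void *out, unsigned size) {
  if (m->pos > m->size || size > m->size - m->pos) {
    return -1;
  }
  memcpy(out, m->data + m->pos, size);
  m->pos += size;
  return 0;
}

/* Move the cursor to an absolute offset; seeking to exactly the end is allowed. */
static int mem_stream_seek(struct mem_stream *m, unsigned pos) {
  if (pos > m->size) {
    return -1;
  }
  m->pos = pos;
  return 0;
}
```

Failing on an out-of-range read lets the caller report an error instead of reading stray memory; the buffer-based mem_read callbacks posted in the issues never record the buffer's size, so they cannot make this check.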

Writing

The following callbacks should be set for writing an archive to a stream:

Name    Arguments                                        Description
write   mtar_t *tar, const void *data, unsigned size     Write data to the stream
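For writing an archive into memory, the write callback can append to a growable heap buffer. This is a minimal sketch under that assumption; mem_sink and mem_sink_write are hypothetical names. A real callback would take the sink from tar->stream and return MTAR_EWRITEFAIL instead of -1, and the close callback would decide whether to free the buffer or hand it to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer for an in-memory write callback. */
struct mem_sink {
  unsigned char *data; /* heap buffer, NULL until the first write */
  size_t len;          /* bytes written so far */
  size_t cap;          /* allocated capacity */
};

/* Append `size` bytes, doubling the capacity as needed.
   Returns 0 on success, -1 if allocation fails. */
static int mem_sink_write(struct mem_sink *s, const void *data, unsigned size) {
  if (size == 0) {
    return 0;
  }
  if (s->len + size > s->cap) {
    size_t new_cap = s->cap ? s->cap : 1024;
    unsigned char *p;
    while (new_cap < s->len + size) {
      new_cap *= 2;
    }
    p = realloc(s->data, new_cap);
    if (p == NULL) {
      return -1; /* old buffer is still valid and owned by the sink */
    }
    s->data = p;
    s->cap = new_cap;
  }
  memcpy(s->data + s->len, data, size);
  s->len += size;
  return 0;
}
```

Doubling the capacity keeps the number of reallocations logarithmic in the archive size, which matters because microtar writes each 512-byte header and each data chunk through separate calls.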

License

This library is free software; you can redistribute it and/or modify it under the terms of the MIT license. See LICENSE for details.

microtar's People

Contributors

deadcast2, rxi

microtar's Issues

An empty archive can be written but not read back

Steps:

  1. Write zero files to an archive. The archive file is created successfully.
  2. Use mtar_open() to read the archive file.

Expected result: mtar_open() succeeds.
Actual result: mtar_open() fails with the "null record" error.

After writing N files into an archive (where N can possibly be zero), one would expect to read that archive and extract N files from it without a failure. Presently, when reading an archive, one has to check whether the tar file is empty (i.e., the file is 1KB of zeros, which is the end-of-archive entry). It would be nice if mtar_open() could handle this special case without failing.
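Until mtar_open() handles this case, a caller has two options. Since the reported failure is the null-record error, the caller could treat that specific return code from mtar_open() as "no entries". Alternatively, as an assumption rather than a library feature, the caller could inspect the first 512-byte record before opening: if it is entirely zero, the archive has no entries. The helper names below are hypothetical.

```c
#include <stddef.h>

#define TAR_RECORD_SIZE 512

/* Returns 1 if the 512-byte record at `rec` is all zeros
   (a tar end-of-archive marker), 0 otherwise. */
static int tar_is_null_record(const unsigned char *rec) {
  size_t i;
  for (i = 0; i < TAR_RECORD_SIZE; i++) {
    if (rec[i] != 0) {
      return 0;
    }
  }
  return 1;
}

/* An archive with no entries starts with a null record.
   `buf`/`len` hold (at least) the first bytes of the file. */
static int tar_is_empty_archive(const unsigned char *buf, size_t len) {
  return len >= TAR_RECORD_SIZE && tar_is_null_record(buf);
}
```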

unable to use on tar.xz

I am getting checksum and other errors when trying to read a tar.xz.

Is it not supported?
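microtar parses raw tar headers, and nothing in its README mentions decompression. A .tar.xz therefore has to be decompressed first (for example with the xz tool or liblzma) before its bytes reach microtar. Feeding the compressed bytes in directly would explain the checksum errors. A quick check of the leading magic bytes can turn that situation into a clearer error message. The sketch below is a hypothetical helper, not part of microtar.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of an archive's leading bytes. */
enum archive_kind { KIND_UNKNOWN, KIND_GZIP, KIND_XZ, KIND_USTAR };

static enum archive_kind detect_archive_kind(const unsigned char *buf, size_t len) {
  static const unsigned char xz_magic[6] = {0xFD, '7', 'z', 'X', 'Z', 0x00};
  if (len >= 2 && buf[0] == 0x1F && buf[1] == 0x8B) {
    return KIND_GZIP; /* gzip member header */
  }
  if (len >= sizeof xz_magic && memcmp(buf, xz_magic, sizeof xz_magic) == 0) {
    return KIND_XZ;   /* xz stream header */
  }
  /* POSIX ustar headers carry "ustar" at offset 257; older V7-style headers
     do not, so KIND_UNKNOWN does not rule out an uncompressed tar. */
  if (len >= 262 && memcmp(buf + 257, "ustar", 5) == 0) {
    return KIND_USTAR;
  }
  return KIND_UNKNOWN;
}
```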

Archiving gz files

When I try to archive a file that is already tar.gz-compressed, the file gets corrupted and I cannot extract it afterwards.
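The issue does not show the calling code, so the cause is unconfirmed. One plausible cause is measuring binary content with strlen(), as the README's string-based writing example does, or loading it in text mode. gzip data contains zero bytes, and on Windows text mode also translates line endings, so the size passed to mtar_write_file_header() would be wrong. The sketch below loads a file in binary mode and returns its exact byte count; read_file_binary is a hypothetical helper.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: load a whole file in binary mode ("rb") so that zero
   bytes and line endings are preserved. Returns a heap buffer the caller
   must free and stores the byte count in *out_size, or returns NULL. */
static unsigned char *read_file_binary(const char *path, size_t *out_size) {
  FILE *f = fopen(path, "rb");
  unsigned char *buf;
  long n;
  if (f == NULL) {
    return NULL;
  }
  if (fseek(f, 0, SEEK_END) != 0 || (n = ftell(f)) < 0 || fseek(f, 0, SEEK_SET) != 0) {
    fclose(f);
    return NULL;
  }
  buf = malloc(n > 0 ? (size_t)n : 1);
  if (buf != NULL && fread(buf, 1, (size_t)n, f) != (size_t)n) {
    free(buf);
    buf = NULL;
  }
  fclose(f);
  if (buf != NULL) {
    *out_size = (size_t)n;
  }
  return buf;
}
```

The returned size, not strlen(), is what would then be passed to both mtar_write_file_header() and mtar_write_data().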

Support writing files without knowing their size

Currently it seems you must know the size of the data before writing it to the tar, but in some cases where I'm routing I/O into a tar, I don't know the file size and I'd prefer to avoid buffering.

Is it trivial to enhance this to keep updating the size as I write more data? I see some code that starts writing nulls once the remaining size is 0; perhaps I could simply modify that line to keep going?

Also, it would be cool if I could write multiple files at the same time (not multi-threaded, though) by getting a descriptor when writing the file header. The API would then have you pass that descriptor to the write_data functions.

Appreciate the hard work :)
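Because the size is stored in the header, which precedes the data, streaming data of unknown length requires a seekable stream. One approach, an assumption rather than an existing microtar feature: write a placeholder header, stream the data while counting bytes, then seek back to the header's offset and rewrite its size field and checksum. The entry must still be padded with zeros to a multiple of 512 bytes afterwards. The sketch shows only the header patch on a raw 512-byte block, using the standard tar layout (size: 12 octal bytes at offset 124; checksum: 8 bytes at offset 148, computed with the checksum field counted as spaces). tar_patch_size and tar_header_checksum are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define TAR_SIZE_OFF   124 /* 12-byte octal size field */
#define TAR_CHKSUM_OFF 148 /* 8-byte octal checksum field */

/* Sum of all 512 header bytes, with the checksum field counted as 8 spaces. */
static unsigned tar_header_checksum(const unsigned char hdr[512]) {
  unsigned sum = 0;
  size_t i;
  for (i = 0; i < 512; i++) {
    int in_chksum = i >= TAR_CHKSUM_OFF && i < TAR_CHKSUM_OFF + 8;
    sum += in_chksum ? (unsigned)' ' : hdr[i];
  }
  return sum;
}

/* Rewrite the size field of an existing header, then recompute its checksum. */
static void tar_patch_size(unsigned char hdr[512], unsigned size) {
  char field[12];
  char chk[8];
  snprintf(field, sizeof field, "%011o", size); /* 11 octal digits + NUL */
  memcpy(hdr + TAR_SIZE_OFF, field, sizeof field);
  snprintf(chk, sizeof chk, "%06o", tar_header_checksum(hdr)); /* 6 digits + NUL */
  memcpy(hdr + TAR_CHKSUM_OFF, chk, 7);
  hdr[TAR_CHKSUM_OFF + 7] = ' ';
}
```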

No support for tarballs larger than 4GB

The implementation uses unsigned (and similar types) for sizes and offsets, and unsigned is commonly 32 bits wide. In that case, offsets silently wrap for tarballs larger than 4 GB and cause errors.

The easiest fix would be to add a separate method (mtar_add_pos(unsigned pos, unsigned offset)) which checks that the position doesn't wrap silently while doing the addition. Support for a 64-bit implementation would be nice as well :)
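A minimal sketch of the checked addition proposed above, assuming the function reports wrap-around through its return value. The name add_pos_checked and the out-parameter are hypothetical; the issue gives only mtar_add_pos(unsigned pos, unsigned offset).

```c
#include <limits.h>

/* Stores pos + offset in *out and returns 0, or returns -1 without
   touching *out if the sum would wrap past UINT_MAX. */
static int add_pos_checked(unsigned pos, unsigned offset, unsigned *out) {
  if (offset > UINT_MAX - pos) {
    return -1;
  }
  *out = pos + offset;
  return 0;
}
```

A 64-bit position type would lift the 4 GB limit, but unsigned long long is C99 and not part of the ANSI C that microtar targets. The callback signatures also take unsigned, so that change would alter the API.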

Finalize prohibits writing more later

I noticed that if I call mtar_finalize(), the tar cannot be opened for writing again later: files appended afterwards do not appear in it, but the tar's size grows.

If I skip calling finalize, I can open the tar for writing in append mode and write more files to it without any problems. My tar clients even seem to be able to open it.

Is finalize truly needed? Should I just call it at the end of my process, which opens and closes the tar multiple times to append its data? Or, as a compromise, can I "unfinalize" the tar by removing the last two null records? Why doesn't the library do that?
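A plausible explanation, based on the tar format: readers treat the first all-zero record as the end of the archive. Because mtar_finalize() writes two such records, entries appended after them are never reached, even though the file keeps growing. "Unfinalizing" therefore means resuming writes at the offset of the first null record instead of at end-of-file. The sketch below finds that offset in an archive held in memory; tar_end_offset and its helpers are hypothetical. Actually writing there needs a stream that can overwrite in place, for example one set up as described under "Wrapping a stream". A file opened in "a" mode always appends at the end, so it cannot do this.

```c
#include <stddef.h>

#define TAR_BLOCK 512

/* Parse an octal field (e.g. the 12-byte size field), stopping at NUL/space. */
static unsigned long tar_parse_octal(const unsigned char *field, size_t len) {
  unsigned long v = 0;
  size_t i = 0;
  while (i < len && field[i] == ' ') {
    i++; /* skip leading padding */
  }
  for (; i < len && field[i] >= '0' && field[i] <= '7'; i++) {
    v = v * 8 + (unsigned long)(field[i] - '0');
  }
  return v;
}

static int tar_block_is_zero(const unsigned char *b) {
  size_t i;
  for (i = 0; i < TAR_BLOCK; i++) {
    if (b[i] != 0) return 0;
  }
  return 1;
}

/* Offset of the first all-zero header block (where new entries belong),
   or len if no end-of-archive marker is found. */
static size_t tar_end_offset(const unsigned char *buf, size_t len) {
  size_t off = 0;
  while (off + TAR_BLOCK <= len) {
    unsigned long size;
    if (tar_block_is_zero(buf + off)) {
      return off;
    }
    size = tar_parse_octal(buf + off + 124, 12);
    /* skip the header plus the data rounded up to whole blocks */
    off += TAR_BLOCK + (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
  }
  return len;
}
```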

makefile and test.c

Please add a file with a main() function and a Makefile, like this:

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..a3f6503
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,10 @@
+*.tmp
+*.o
+*.*~
+*.d
+*.dSYM
+*.swp
+
+test
+*.tar
+*.txt
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..ef8d87e
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,20 @@
+CC  = gcc
+BIN = test
+
+# Flags
+CFLAGS += -std=c99 -Wall -pedantic -pedantic-errors -O2
+
+SRC = $(BIN).c
+OBJ = $(SRC:.c=.o)
+
+$(BIN): ./src/microtar.o $(BIN).c
+	rm -f $(BIN) $(OBJS)
+	$(CC) $(SRC) -I./src $(CFLAGS) -o $(BIN) $(LIBS) ./src/microtar.o
+
+./src/microtar.o: ./src/microtar.c ./src/microtar.h
+	$(CC) $(CFLAGS) -c -o ./src/microtar.o ./src/microtar.c
+
+.PHONY: clean
+
+clean:
+	rm -f ./$(BIN) ./*.tar ./*.txt ./src/*.o
diff --git a/test.c b/test.c
new file mode 100644
index 0000000..0a906dc
--- /dev/null
+++ b/test.c
@@ -0,0 +1,29 @@
+#include <string.h>
+#include "microtar.h"
+
+int main(void)
+{
+
+mtar_t tar;
+/* utf-8 chars */
+const char *str1 = "Hello world, Witaj świecie";
+const char *str2 = "Goodbye world, żegnam";
+
+/* Open archive for writing */
+mtar_open(&tar, "test.tar", "w");
+
+/* Write strings to files `test1.txt` and `test2.txt` */
+/* no setup data */
+mtar_write_file_header(&tar, "test1.txt", strlen(str1));
+mtar_write_data(&tar, str1, strlen(str1));
+mtar_write_file_header(&tar, "test2.txt", strlen(str2));
+mtar_write_data(&tar, str2, strlen(str2));
+
+/* Finalize -- this needs to be the last thing done before closing */
+mtar_finalize(&tar);
+
+/* Close archive */
+mtar_close(&tar);
+
+return 0;
+}

Overwrite file?

How easy would it be to overwrite a single file within a tar without re-creating the whole archive?

If modify doesn't work, what about deleting the record and appending a new one at the end with new content?
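In the tar format, each entry's data occupies whole 512-byte blocks directly after its header. An entry can therefore be overwritten in place only if the new contents need no more blocks than the old ones. The header's size field and checksum must then be rewritten, and the leftover bytes in the last block zeroed. Deleting an entry means moving every later entry, which amounts to rewriting the rest of the archive. Appending a new entry with the same name also has a catch: mtar_find() appears to return the first match, so it would keep finding the old copy. The helpers below (hypothetical names) show the block arithmetic behind the in-place check.

```c
/* Bytes an entry's data occupies in the archive: size rounded up to 512. */
static unsigned long tar_data_span(unsigned long size) {
  return (size + 511) / 512 * 512;
}

/* Can new contents replace the old ones without moving later entries? */
static int tar_fits_in_place(unsigned long old_size, unsigned long new_size) {
  return tar_data_span(new_size) <= tar_data_span(old_size);
}
```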

microtar from mem

I'm not sure whether this is good form, but it is useful (A) when your tar file is already in memory, and (B) when you have an inflated (decompressed) gzip buffer, e.g. from zlib.

// mtar from mem buffer

static int mem_write(mtar_t *tar, const void *data, unsigned size)
{
	return MTAR_EWRITEFAIL;
}

static int mem_read(mtar_t *tar, void *data, unsigned size)
{
	unsigned char *buf = (unsigned char *)tar->stream;
	memcpy(data, buf + tar->pos, size);
	return MTAR_ESUCCESS;
}

static int mem_seek(mtar_t *tar, unsigned offset)
{
	return MTAR_ESUCCESS;
}

static int mem_close(mtar_t *tar)
{
	// Todo, Delete the data?
	return MTAR_ESUCCESS;
}

int mtar_open_mem(mtar_t *tar, void *data)
{
	int err;
	mtar_header_t h;

	// Init tar struct and functions
	memset(tar, 0, sizeof(*tar));
	tar->write = mem_write;
	tar->read = mem_read;
	tar->seek = mem_seek;
	tar->close = mem_close;

	tar->stream = data;

	err = mtar_read_header(tar, &h);
	if (err != MTAR_ESUCCESS)
	{
		mtar_close(tar);
		return err;
	}

	// Return ok
	return MTAR_ESUCCESS;
}
// --------

Stack overflow inside mtar_write_file_header

It is possible to cause a stack buffer overflow by calling mtar_write_file_header with a file name that does not fit in the 100-byte name field.

Inside microtar.c, the name is copied with strcpy, which writes past the end of the destination buffer:

strcpy(h.name, name);

==73490==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016b1a687c at pc 0x00010517be68 bp 0x00016b1a6770 sp 0x00016b1a5f20
WRITE of size 201 at 0x00016b1a687c thread T0
    #0 0x10517be64 in wrap_strcpy+0x4fc (libclang_rt.asan_osx_dynamic.dylib:arm64+0x4be64) (BuildId: 4947f3677e4435f39b5765e7dbc19bf732000000200000000100000000000b00)
    #1 0x104c5dab4 in mtar_write_file_header microtar.c:336
    #2 0x104c5a618 in LLVMFuzzerTestOneInput target.cc:19
    #3 0x104c76584 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) FuzzerLoop.cpp:617
    #4 0x104c75e78 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) FuzzerLoop.cpp:519
    #5 0x104c77550 in fuzzer::Fuzzer::MutateAndTestOne() FuzzerLoop.cpp:763
    #6 0x104c78394 in fuzzer::Fuzzer::Loop(std::__1::vector<fuzzer::SizedFile, std::__1::allocator<fuzzer::SizedFile>>&) FuzzerLoop.cpp:908
    #7 0x104c6773c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) FuzzerDriver.cpp:912
    #8 0x104c94570 in main FuzzerMain.cpp:20
    #9 0x1a028ff24  (<unknown module>)
    #10 0xb47efffffffffffc  (<unknown module>)

Attached is a sample crash file.

crash.zip
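One way to remove the overflow is to check the length before copying and reject names that do not fit, leaving room for the terminating NUL. The sketch below is a hypothetical copy_name helper, not microtar's code. Inside mtar_write_file_header, its -1 would map to a negative microtar error code, for example MTAR_EFAILURE.

```c
#include <string.h>

#define TAR_NAME_LEN 100 /* size of the name field in a tar header */

/* Bounded replacement for strcpy(h.name, name): copies the name including
   its NUL, or returns -1 without writing anything if it does not fit. */
static int copy_name(char dst[TAR_NAME_LEN], const char *name) {
  size_t n = strlen(name);
  if (n >= TAR_NAME_LEN) {
    return -1;
  }
  memcpy(dst, name, n + 1);
  return 0;
}
```

Strictly, the tar format allows a name of exactly 100 bytes without a NUL. Rejecting such a name keeps h.name safe to pass to printf("%s"), as the README's reading example does.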

microtar from mem.. -6 bad checksum

Hi. Nice C library. I wrote my mem_ variants based on the file_ variants.

tar->stream is only used in these four callbacks, so I expected this to work, but I get error -6 (bad checksum).

static int mem_write(mtar_t *tar, const void *data, unsigned size)
{
	//unsigned res = fwrite(data, 1, size, (FILE *)tar->stream);
	//return (res == size) ? MTAR_ESUCCESS : MTAR_EWRITEFAIL;
	return MTAR_EWRITEFAIL;
}

static int mem_read(mtar_t *tar, void *data, unsigned size)
{
	unsigned char *buf = (unsigned char *)tar->stream;
	data = &buf[tar->pos];
	return MTAR_ESUCCESS;
}

static int mem_seek(mtar_t *tar, unsigned offset)
{
	return MTAR_ESUCCESS;
}

static int mem_close(mtar_t *tar)
{
	return MTAR_ESUCCESS;
}

int mtar_open_mem(mtar_t *tar, void *data)
{
	int err;
	mtar_header_t h;

	// Init tar struct and functions
	memset(tar, 0, sizeof(*tar));
	tar->write = mem_write;
	tar->read = mem_read;
	tar->seek = mem_seek;
	tar->close = mem_close;

	tar->stream = data;

	err = mtar_read_header(tar, &h);
	if (err != MTAR_ESUCCESS)
	{
		mtar_close(tar);
		return err;
	}

	// Return ok
	return MTAR_ESUCCESS;
}
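The -6 (bad checksum) is consistent with mem_read never filling the caller's buffer. data is a by-value parameter, so data = &buf[tar->pos]; changes only the callback's local copy of the pointer. microtar then checksums whatever bytes were already in its header buffer. Copying the bytes with memcpy, as the mem_read in the earlier "microtar from mem" post does, avoids this. The standalone demonstration below uses hypothetical read_by_pointer and read_by_copy functions with an explicit position instead of tar->pos.

```c
#include <string.h>

/* Broken pattern: reassigns the local parameter; the caller sees nothing. */
static int read_by_pointer(const unsigned char *buf, unsigned pos, void *data, unsigned size) {
  (void)size;
  data = (void *)&buf[pos]; /* only the local copy of the pointer changes */
  (void)data;
  return 0;
}

/* Working pattern: copy the bytes into the caller's buffer. */
static int read_by_copy(const unsigned char *buf, unsigned pos, void *data, unsigned size) {
  memcpy(data, buf + pos, size);
  return 0;
}
```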
