Comments (6)

hosseinmoein avatar hosseinmoein commented on May 19, 2024

That is an interesting question.
This library is as much about interface as it is about functionality. It is designed so that additional functionality can easily be added. It is meant to be a coherent container, so it is inherently a C++ thing.
Can you put an interface on top of it to make it C-usable? Yes.
But why? And what type of interface are you envisioning for C?
You can always use it as is in a predominantly C application. But I am curious to hear what you are envisioning.

Best,
HM

from dataframe.

alejandro-colomar avatar alejandro-colomar commented on May 19, 2024

As of now, I have a data structures library (with dynamic arrays, dynamic buffers, linked lists, and binary search trees) and I'm using it in the test program to build my DataFrame emulator.

First I did a working program where I hard-coded many things, and now I'm transforming the program into a library by generalizing it.

The API I envisioned is this one:

enum	Alx_DataFrame_Type {
	ALX_DF_S64 = 1,
	ALX_DF_DBL,
	ALX_DF_STR
};

struct	Alx_DataFrame_Cell {
	union {
		int64_t			z;
		double			r;
		struct Alx_DynBuf	*s;
	};
	int	err;
};

struct	Alx_DataFrame_Row {
	struct Alx_LinkedList	*cells;
	int			err;
};

union	Alx_DataFrame_Desc {
	struct	Alx_DataFrame_Desc_Txt {
		int	uniq;
		int	top;
		int	freq;
	} txt;	/* named member: a tagged struct cannot be an anonymous member */
	struct	Alx_DataFrame_Desc_Num {
		double	mean;
		double	std;
		double	min;
		double	q_25;
		double	q_50;
		double	q_75;
		double	max;
	} num;
};

struct	Alx_DataFrame_Col {
	int				type;   /* enum Alx_DataFrame_Type */
	struct Alx_DynBuf		*hdr;   /* column header string */
	cmp_f				*cmp;   /* user's custom comparison function for the data (if NULL, standard comparisons are used) */
	bool				ltd_values;   /* limited set of possible values? */
	struct Alx_BST			*values;   /* values; either stored by the parser, or passed by the user if limited set of values is true */
	struct Alx_DataFrame_Desc	*desc;
};

struct	Alx_DataFrame {
	struct Alx_LinkedList	*cols;
	struct Alx_LinkedList	*rows;
};

int	alx_df_init		(struct Alx_DataFrame **df);
void	alx_df_deinit		(struct Alx_DataFrame *df);
int	alx_df_add_col		(struct Alx_DataFrame *restrict df,
				 int type, const char *restrict hdr,
				 cmp_f *cmp,
				 struct Alx_BST *restrict values);
int	alx_df_parse		(struct Alx_DataFrame *restrict df,
				 FILE *restrict istream);
int	alx_df_drop_row		(struct Alx_DataFrame *df,
				 ptrdiff_t nrow);
int	alx_df_drop_col		(struct Alx_DataFrame *df,
				 ptrdiff_t ncol);
int	alx_df_dropna		(struct Alx_DataFrame *df);
int	alx_df_sort		(struct Alx_DataFrame *df,
				 ptrdiff_t ncol);
int	alx_df_sort_bwd		(struct Alx_DataFrame *df,
				 ptrdiff_t ncol);
int	alx_df_describe		(struct Alx_DataFrame *df);
int	alx_df_fprn_data	(FILE *restrict ostream,
				 struct Alx_DataFrame *restrict df);
int	alx_df_fprn_desc	(FILE *restrict ostream,
				 struct Alx_DataFrame *restrict df);

The dataframe would consist of a linked list of rows, and a linked list with column configurations and descriptions.

The rows are also linked lists of cells, which in the end contain the data in dynamic buffers.

A simple program using it would be the following (it's a prototype and may have errors; also, I didn't care about error handling):

enum Fields {
	FLDS_ID,
	FLDS_NAME,
	FLDS_AGE,
	FLDS_HEIGHT,

	FIELDS
};
const char *const hdrs[FIELDS] = {
	[FLDS_ID]	= "id",
	[FLDS_NAME]	= "name",
	[FLDS_AGE]	= "age",
	[FLDS_HEIGHT]	= "height"
};
const int types[FIELDS] = {	/* enum Alx_DataFrame_Type values */
	[FLDS_ID]	= ALX_DF_S64,
	[FLDS_NAME]	= ALX_DF_STR,
	[FLDS_AGE]	= ALX_DF_S64,
	[FLDS_HEIGHT]	= ALX_DF_DBL
};

int main(void)
{
	struct Alx_DataFrame	*df;
	FILE			*fp;
	FILE			*less;

	fp = fopen("file.csv", "r");

	alx_df_init(&df);
	for (ptrdiff_t i = 0; i < FIELDS; i++)
		alx_df_add_col(df, types[i], hdrs[i], NULL, NULL);
	alx_df_parse(df, fp);

	alx_df_sort_bwd(df, FLDS_AGE);	/* oldest first */
	less	= popen("less -S", "w");
	alx_df_fprn_data(less, df);	/* print data with less(1) */
	pclose(less);

	alx_df_describe(df);		/* calculate description */
	less	= popen("less -S", "w");
	alx_df_fprn_desc(less, df);	/* print description with less(1) */
	pclose(less);

	alx_df_dropna(df);		/* drop rows with invalid values */
	less	= popen("less -S", "w");
	alx_df_fprn_data(less, df);	/* print data with less(1) */
	pclose(less);
	alx_df_describe(df);		/* need to calculate description again */
	less	= popen("less -S", "w");
	alx_df_fprn_desc(less, df);	/* print description with less(1) */
	pclose(less);


	return	0;
}

alejandro-colomar avatar alejandro-colomar commented on May 19, 2024

The problem with what I have now, as I see it, is that I have zillions of mallocs, and I'm concerned about performance. Maybe your library could be faster.

Nevertheless, as it's relatively easy and simple, I'll first finish my library just to measure its performance. It'll take me some time, though. If yours relies on arrays, it will probably be much faster. That's why I thought of porting or wrapping it to C.

hosseinmoein avatar hosseinmoein commented on May 19, 2024

So, I followed a few principles in this library:

  1. I must support any type either built-in or user defined without needing new code
  2. Never chase pointers à la linked lists, including virtual function calls
  3. Have all column data in contiguous memory
  4. Never use more space than you need (i.e. avoid unions)
  5. Avoid copying data as much as possible. Unfortunately, sometimes you have to
  6. Use multi-threading but only when it makes sense
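
As a rough illustration of principles 2 and 3 in plain C, a column can live in one growable array instead of one allocation per cell. This is only a sketch and is not code from either library: `struct dbl_col` and the `dbl_col_*` functions are hypothetical names, and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type: one column of doubles in a single contiguous buffer. */
struct dbl_col {
	double	*data;	/* contiguous storage for all cells of the column */
	size_t	len;	/* number of cells in use */
	size_t	cap;	/* number of cells allocated */
};

/*
 * Append one value, doubling the capacity when the buffer is full.
 * Returns 0 on success, -1 if the allocation fails.
 */
int dbl_col_push(struct dbl_col *col, double x)
{
	assert(col != NULL);
	if (col->len == col->cap) {
		size_t	cap = col->cap ? col->cap * 2 : 16;
		double	*p = realloc(col->data, cap * sizeof(*p));

		if (p == NULL)
			return -1;
		col->data = p;
		col->cap = cap;
	}
	col->data[col->len++] = x;
	return 0;
}

/* Mean of the column: a linear scan over the array, no pointer chasing. */
double dbl_col_mean(const struct dbl_col *col)
{
	double	sum = 0.0;

	assert(col != NULL && col->len > 0);
	for (size_t i = 0; i < col->len; i++)
		sum += col->data[i];
	return sum / (double)col->len;
}

/* Release the buffer and reset the column to the empty state. */
void dbl_col_free(struct dbl_col *col)
{
	free(col->data);
	col->data = NULL;
	col->len = 0;
	col->cap = 0;
}
```

A column starts out zero-initialized (`struct dbl_col c = {0};`). With doubling, appending n values costs a logarithmic number of `realloc` calls rather than one `malloc` per cell.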

alejandro-colomar avatar alejandro-colomar commented on May 19, 2024

Regarding 2 & 3:

I first tried to do that, but I don't know how to do it, and I don't know if it is possible in C. How do you store all data from a column contiguously, if every field can have a different type? Do you use templates for that in C++?

Would you know how to do a C interface for your library similar to what I wrote? I don't know much about the internals of your library (I don't know much C++). I could help in the C code.

hosseinmoein avatar hosseinmoein commented on May 19, 2024

Yes, this library relies very heavily on templates. I am not sure how/if that is possible in C.
Columns could be of different types. But each element in a given column is of the same type.
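
One way this "one type per column" idea might be approximated in C without templates is to tag each column with its element type and keep the cells in an untyped contiguous buffer. The sketch below is an assumption about how that could look, not the C++ library's design; `enum col_type`, `struct col`, and the `col_*` functions are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tag: the type is stored once per column, not per cell. */
enum col_type { COL_S64, COL_DBL };

struct col {
	enum col_type	type;	/* element type shared by every cell */
	size_t		elsize;	/* size in bytes of one element */
	size_t		len;	/* cells in use */
	size_t		cap;	/* cells allocated */
	void		*data;	/* contiguous buffer of len * elsize bytes */
};

static size_t col_elsize(enum col_type type)
{
	switch (type) {
	case COL_S64:	return sizeof(int64_t);
	case COL_DBL:	return sizeof(double);
	}
	return 0;
}

void col_init(struct col *c, enum col_type type)
{
	c->type = type;
	c->elsize = col_elsize(type);
	c->len = 0;
	c->cap = 0;
	c->data = NULL;
}

/* Copy one element of the column's type to the end of the buffer. */
int col_push(struct col *c, const void *elem)
{
	if (c->len == c->cap) {
		size_t	cap = c->cap ? c->cap * 2 : 16;
		void	*p = realloc(c->data, cap * c->elsize);

		if (p == NULL)
			return -1;
		c->data = p;
		c->cap = cap;
	}
	memcpy((char *)c->data + c->len * c->elsize, elem, c->elsize);
	c->len++;
	return 0;
}

/* Typed accessors check the tag, so a column is never read as the wrong type. */
int64_t col_get_s64(const struct col *c, size_t i)
{
	assert(c->type == COL_S64 && i < c->len);
	return ((const int64_t *)c->data)[i];
}

double col_get_dbl(const struct col *c, size_t i)
{
	assert(c->type == COL_DBL && i < c->len);
	return ((const double *)c->data)[i];
}

void col_free(struct col *c)
{
	free(c->data);
	c->data = NULL;
	c->len = 0;
	c->cap = 0;
}
```

A data frame could then hold an array of such columns, for example an `age` column of type `COL_S64` next to a `height` column of type `COL_DBL`. The price compared with templates is that type checks happen at run time rather than at compile time.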

I suggest you look at my documentation and code to get some ideas. You could also just use the library as is in your apps.
