Comments (6)
That is an interesting question.
This library is as much about interface as it is about functionality. It is designed so that additional functionality can easily be added. It is meant to be a coherent container, so it is inherently a C++ thing.
Can you put an interface on top of it to make it C-usable? Yes.
But why? And what type of interface are you envisioning for C?
You can always use it as is in a predominantly C application. But I am curious to hear what you are envisioning.
Best,
HM
from dataframe.
As of now, I have a data structures library (with dynamic arrays, dynamic buffers, linked lists, and binary search trees) and I'm using it in the test program to build my DataFrame emulator.
First I wrote a working program in which I hard-coded many things, and now I'm turning it into a library by generalizing it.
The API I envisioned is this one:
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* struct Alx_DynBuf, struct Alx_LinkedList, struct Alx_BST and the cmp_f
 * typedef come from the data structures library mentioned above. */

enum Alx_DataFrame_Type {
	ALX_DF_S64 = 1,
	ALX_DF_DBL,
	ALX_DF_STR
};

struct Alx_DataFrame_Cell {
	union {
		int64_t z;
		double r;
		struct Alx_DynBuf *s;
	};
	int err;
};

struct Alx_DataFrame_Row {
	struct Alx_LinkedList *cells;
	int err;
};

union Alx_DataFrame_Desc {
	struct Alx_DataFrame_Desc_Txt {
		int uniq;
		int top;
		int freq;
	} txt;
	struct Alx_DataFrame_Desc_Num {
		double mean;
		double std;
		double min;
		double q_25;
		double q_50;
		double q_75;
		double max;
	} num;
};

struct Alx_DataFrame_Col {
	int type;			/* enum Alx_DataFrame_Type */
	struct Alx_DynBuf *hdr;		/* column header string */
	cmp_f *cmp;			/* user-supplied comparison function (if NULL, standard comparisons are used) */
	bool ltd_values;		/* limited set of possible values? */
	struct Alx_BST *values;		/* values; either stored by the parser, or passed by the user if ltd_values is true */
	union Alx_DataFrame_Desc *desc;
};

struct Alx_DataFrame {
	struct Alx_LinkedList *cols;
	struct Alx_LinkedList *rows;
};

int alx_df_init (struct Alx_DataFrame **df);
void alx_df_deinit (struct Alx_DataFrame *df);
int alx_df_add_col (struct Alx_DataFrame *restrict df,
		int type, const char *restrict hdr,
		cmp_f *cmp,
		struct Alx_BST *restrict values);
int alx_df_parse (struct Alx_DataFrame *restrict df,
		FILE *restrict istream);
int alx_df_drop_row (struct Alx_DataFrame *df,
		ptrdiff_t nrow);
int alx_df_drop_col (struct Alx_DataFrame *df,
		ptrdiff_t ncol);
int alx_df_dropna (struct Alx_DataFrame *df);
int alx_df_sort (struct Alx_DataFrame *df,
		ptrdiff_t ncol);
int alx_df_sort_bwd (struct Alx_DataFrame *df,
		ptrdiff_t ncol);
int alx_df_describe (struct Alx_DataFrame *df);
int alx_df_fprn_data (FILE *restrict ostream,
		struct Alx_DataFrame *restrict df);
int alx_df_fprn_desc (FILE *restrict ostream,
		struct Alx_DataFrame *restrict df);
The DataFrame would consist of a linked list of rows and a linked list of column configurations and descriptions.
Each row is in turn a linked list of cells, which hold the values themselves (integers and doubles directly, strings in dynamic buffers).
A simple program using it would be the following (it's a prototype, so it may have errors, and I didn't bother with error handling):
#define _POSIX_C_SOURCE 200809L	/* popen(), pclose() */
#include <stddef.h>
#include <stdio.h>
/* ...plus the header that declares the Alx_DataFrame API above */

enum Fields {
	FLDS_ID,
	FLDS_NAME,
	FLDS_AGE,
	FLDS_HEIGHT,
	FIELDS
};

const char *const hdrs[FIELDS] = {
	[FLDS_ID]	= "id",
	[FLDS_NAME]	= "name",
	[FLDS_AGE]	= "age",
	[FLDS_HEIGHT]	= "height"
};

const int types[FIELDS] = {
	[FLDS_ID]	= ALX_DF_S64,
	[FLDS_NAME]	= ALX_DF_STR,
	[FLDS_AGE]	= ALX_DF_S64,
	[FLDS_HEIGHT]	= ALX_DF_DBL
};

int main(void)
{
	struct Alx_DataFrame *df;
	FILE *fp;
	FILE *less;

	fp = fopen("file.csv", "r");
	alx_df_init(&df);
	for (ptrdiff_t i = 0; i < FIELDS; i++)
		alx_df_add_col(df, types[i], hdrs[i], NULL, NULL);
	alx_df_parse(df, fp);
	fclose(fp);

	alx_df_sort_bwd(df, FLDS_AGE);	/* oldest first */

	less = popen("less -S", "w");
	alx_df_fprn_data(less, df);	/* print data with less(1) */
	pclose(less);

	alx_df_describe(df);		/* calculate description */
	less = popen("less -S", "w");
	alx_df_fprn_desc(less, df);	/* print description with less(1) */
	pclose(less);

	alx_df_dropna(df);		/* drop rows with invalid values */
	less = popen("less -S", "w");
	alx_df_fprn_data(less, df);	/* print data with less(1) */
	pclose(less);

	alx_df_describe(df);		/* need to calculate description again */
	less = popen("less -S", "w");
	alx_df_fprn_desc(less, df);	/* print description with less(1) */
	pclose(less);

	alx_df_deinit(df);
	return 0;
}
The problem with what I have now, as I see it, is that I have zillions of mallocs, and I'm concerned about performance. Maybe your library is faster.
Nevertheless, since it's relatively simple, I'll first finish my library, if only to measure its performance. It'll take me some time, though. If yours relies on arrays, it will probably be much faster. That's why I thought of porting it to C or wrapping it with a C interface.
So, I followed a few principles in this library:
1. Support any type, built-in or user-defined, without needing new code
2. Never chase pointers à la linked lists, including virtual function calls
3. Keep all of a column's data in contiguous memory
4. Never use more space than you need (e.g. no unions)
5. Avoid copying data as much as possible. Unfortunately, sometimes you have to
6. Use multi-threading, but only when it makes sense
Regarding 2 & 3:
I first tried to do that, but I didn't know how, and I don't know whether it is possible in C. How do you store all the data of a column contiguously if every field can have a different type? Do you use templates for that in C++?
Would you know how to write a C interface for your library similar to what I wrote? I don't know the internals of your library (I don't know much C++), but I could help with the C code.
Yes, this library relies very heavily on templates. I am not sure whether, or how, that is possible in C.
Columns can be of different types, but every element in a given column has the same type.
I suggest you look at my documentation and code to get some ideas. You could also just use it as is in your apps.