dataframe's Issues

[Request] Make findByIndex return Iterable<DataRow>

Hey Alex,

If I put an index on my DataFrame, I can search for a row like this:

dataFrame.addIndex("idx", "barcode");
DataRow row = dataFrame.findByIndex("idx", "barcode1");

However, in some cases multiple rows match the key. Ideally it would return something like:

Iterable<DataRow> rows = dataFrame.findByIndex("idx", "barcode1");

Could you add this?

Best, Simon
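
A minimal sketch of the requested behavior, assuming nothing about the library's internals: an index that keeps every row stored under a key and returns all of them on lookup. The class `MultiValueIndex` and its methods `add` and `find` are hypothetical names used only for illustration, and rows are represented by a generic type parameter rather than the library's `DataRow`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the library's index implementation.
final class MultiValueIndex<K, R> {
    // Each key maps to every row that was added under it, in insertion order.
    private final Map<K, List<R>> rowsByKey = new HashMap<>();

    void add(K key, R row) {
        rowsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    // Returns a read-only view of all rows for the key; empty if the key is unknown.
    List<R> find(K key) {
        return Collections.unmodifiableList(
                rowsByKey.getOrDefault(key, Collections.emptyList()));
    }

    public static void main(String[] args) {
        MultiValueIndex<String, String> index = new MultiValueIndex<>();
        index.add("barcode1", "row A");
        index.add("barcode1", "row B");
        for (String row : index.find("barcode1")) {
            System.out.println(row);
        }
    }
}
```

Since `List` is an `Iterable`, a lookup shaped like this would fit the `Iterable<DataRow>` return type proposed above, and a missing key yields an empty result instead of `null`.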

Locale is not taken into account in operations such as column copy

Hi,
I changed the Locale to FRENCH in the NumberUtil class, but it is not applied when copying a column (I tried with the Double type, but I think it is the same for other types).

The only place where it is used is in the DataFrame.print function.
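
For reference, this is how locale-dependent number handling looks with the JDK's own `java.text.NumberFormat`. It is only a sketch of the expected behavior (a decimal comma under `Locale.FRENCH`), not the library's `NumberUtil` code; the class `LocaleAwareNumbers` and its methods are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper; shows JDK behavior only.
final class LocaleAwareNumbers {
    private LocaleAwareNumbers() {}

    // Parses text using the decimal separator of the given locale.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + text, e);
        }
    }

    // Formats a value using the decimal separator of the given locale.
    static String format(double value, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        format.setGroupingUsed(false); // keep the output free of grouping separators
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(3.14, Locale.FRENCH));
        System.out.println(parse("3,14", Locale.FRENCH));
    }
}
```

An operation that converts values through text without passing the configured locale along would read or write `3.14` where `3,14` is expected, which matches the symptom described above.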

Trying to filter data by applying conditions to a CSV file

Hi. I am currently facing an issue with the DataFrame.
I have downloaded a file from a private Amazon S3 bucket, and I am having trouble filtering the rows that satisfy a certain condition.
Here is my code:
`
//This function allows me to connect to the private s3 bucket
connection();
S3Object s3object = s3client.getObject(bucketName, sourceFile);
DataFrame file = DataFrame.load(s3object.getObjectContent(), FileFormat.CSV);

//listColumns & size displaying
System.out.println(file.getColumnNames().toString());
System.out.println(file.size());
//selecting the rows whose "AreaQ" column is greater than 2
file.select("(AreaQ > 2)").print();`

I get an error on the last line. The exception says that the column header was not found: "Exception in getValues() with cause = 'NULL' and exception = 'column header name not found 'AreaQ'' de.unknownreality.dataframe.DataFrameRuntimeException: column header name not found 'AreaQ'"
Yet I do have a column named AreaQ with numeric values greater than 2.
Can you help me please?
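
One thing worth checking, as an assumption rather than a confirmed diagnosis: header names read from a CSV file can carry invisible extras (a leading byte-order mark, surrounding whitespace, or quotes), or the whole header line can end up as a single column if the separator differs from the one the reader expects. The sketch below uses plain Java and a hypothetical helper `HeaderNames` to clean a header name and to make hidden characters visible when comparing it with the names printed by `getColumnNames()`.

```java
// Hypothetical diagnostic helper; not part of the library.
final class HeaderNames {
    private HeaderNames() {}

    // Removes a leading BOM, surrounding whitespace and one pair of double quotes.
    static String normalize(String raw) {
        String name = raw;
        if (!name.isEmpty() && name.charAt(0) == '\uFEFF') {
            name = name.substring(1);
        }
        name = name.trim();
        if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
            name = name.substring(1, name.length() - 1);
        }
        return name;
    }

    // Renders every character as a hex code point so hidden characters show up.
    static String codePoints(String raw) {
        StringBuilder out = new StringBuilder();
        raw.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String suspicious = "\uFEFFAreaQ";
        System.out.println(codePoints(suspicious));
        System.out.println(normalize(suspicious));
    }
}
```

If the printed column list shows one long name containing commas or semicolons, the separator is the more likely cause than hidden characters.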

Initialization of the parser map is not thread-safe

I discovered this while adding columns to data frames in a multi-threaded environment. Each thread has its own frame, so concurrency should not be an issue. However, ParserUtil#getParserMap() and ParserUtil#init() are not thread-safe. The lazy initialization of parserMap can lead to unexpected "Parser not found" errors when adding columns to multiple frames in multiple threads. This occurs because the if (parserMap == null) check no longer triggers for a second thread (since the map has already been created by init()), but the map has not finished initializing.

From just looking at ParserUtil.java, it doesn't seem like there is a reason to lazily initialize this. Making the parser map static final and initializing it in a static block seems like a good solution that avoids multi-threading issues.
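
The suggested fix can be sketched as follows. This is an illustration of eager initialization in a static block, not the actual contents of ParserUtil.java: the class `EagerParserRegistry`, its `parse` method, and the set of registered parsers are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry; illustrates the static-final approach only.
final class EagerParserRegistry {
    // Assigned exactly once during class initialization, which the JVM
    // performs under a lock, so no thread can observe a half-filled map.
    private static final Map<Class<?>, Function<String, ?>> PARSERS;

    static {
        Map<Class<?>, Function<String, ?>> parsers = new HashMap<>();
        parsers.put(Integer.class, Integer::parseInt);
        parsers.put(Double.class, Double::parseDouble);
        parsers.put(Boolean.class, Boolean::parseBoolean);
        parsers.put(String.class, text -> text);
        PARSERS = Collections.unmodifiableMap(parsers);
    }

    private EagerParserRegistry() {}

    static <T> T parse(Class<T> type, String text) {
        Function<String, ?> parser = PARSERS.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("Parser not found for " + type.getName());
        }
        return type.cast(parser.apply(text));
    }

    public static void main(String[] args) {
        System.out.println(parse(Integer.class, "42"));
    }
}
```

The map is filled in a local variable and published only when complete, and it is never mutated afterwards, so concurrent readers need no further synchronization.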

DataFrame needs a count function

In my use case, I need to get the DataFrame size frequently.

Looking at the API, the way I would do this is

df.toList().size()

but the toList method defined in BaseDataFrame is expensive, because it copies every value of every row:

    @Override
    public List<List> toList() {
        ArrayList<List> list = new ArrayList<>();

        for (DataRow row : this) {
            List data = new ArrayList();
            for (int i = 0; i < columns.length; i++) {
                data.add(row.get(i));
            }
            list.add(data);
        }

        return list;
    }

Any ideas?
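
As a stopgap, and assuming only what the quoted code shows (that the frame can be iterated row by row), rows can be counted without copying any cell values. The helper `RowCounting.count` is a hypothetical name; it still visits every row, so a direct size accessor on the frame, where one is available, would be cheaper.

```java
import java.util.Arrays;

// Hypothetical helper; works on any Iterable, not only data frames.
final class RowCounting {
    private RowCounting() {}

    // Counts elements by iterating once; no per-row lists are allocated.
    static long count(Iterable<?> rows) {
        long total = 0;
        for (Object ignored : rows) {
            total++;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("row 1", "row 2", "row 3")));
    }
}
```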

Binary File Format

With the introduction of custom value types (#22), all column values can be written to and read from DataStreams. This enables the implementation of a binary file format to improve performance and decrease file sizes.

Support of temporal column types

Hi Alexander,

I'm looking at "DataFrame" to use it in one of my pet projects.

  • How can temporal column types be supported, e.g. LocalDate or LocalDateTime? What I want to do is hide a parsed CSV or Excel sheet behind "DataFrame", but Excel has temporal column types.
  • Did you have a look at Commons CSV? Since you strive for minimal dependencies it might not be an option, but I wanted to ask :-)

Siegfried
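
For context, the two conversions a temporal column would need (text to value and value to text) are available in the JDK's `java.time` package. The sketch below shows them for `LocalDate` in ISO format; the class `LocalDateText` is a hypothetical name, and how such a converter would plug into the library is not shown here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical converter; uses only JDK classes.
final class LocalDateText {
    // ISO_LOCAL_DATE reads and writes dates such as 2017-03-05.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ISO_LOCAL_DATE;

    private LocalDateText() {}

    // Throws java.time.format.DateTimeParseException on malformed input.
    static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMAT);
    }

    static String format(LocalDate date) {
        return FORMAT.format(date);
    }

    public static void main(String[] args) {
        LocalDate date = parse("2017-03-05");
        System.out.println(format(date.plusDays(1)));
    }
}
```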

Support for custom column types

Introduce a value type abstraction to support custom column types (such as the temporal column types requested in #21).
Value types should provide the following methods:

  • read/write values to DataOutputStream
  • convert values to String
  • parse values from String

These value types must then be used in the following library parts:

  • CSV reader / writer
  • Automatic column type detection
  • Printer
  • Filter query parser

Value types are currently being worked on in the branch value-type-abstraction.

Progress:

  • There are value types for all previously available columns.
  • Value type objects are available from column, row and header objects.
  • The CSV writer uses value types to write String representations.
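
The method list above could be captured in an interface along these lines. This is a sketch under stated assumptions, not the code in the value-type-abstraction branch: the names `ValueTypeSketch`, `DoubleValueTypeSketch` and `ValueTypeRoundTrip` are hypothetical, and only `Double` is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface mirroring the listed methods.
interface ValueTypeSketch<T> {
    void write(DataOutputStream out, T value) throws IOException; // binary output
    T read(DataInputStream in) throws IOException;                // binary input
    String toText(T value);                                       // value to String
    T parse(String text);                                         // String to value
}

final class DoubleValueTypeSketch implements ValueTypeSketch<Double> {
    @Override
    public void write(DataOutputStream out, Double value) throws IOException {
        out.writeDouble(value);
    }

    @Override
    public Double read(DataInputStream in) throws IOException {
        return in.readDouble();
    }

    @Override
    public String toText(Double value) {
        return Double.toString(value);
    }

    @Override
    public Double parse(String text) {
        return Double.parseDouble(text);
    }
}

final class ValueTypeRoundTrip {
    private ValueTypeRoundTrip() {}

    // Writes one value to an in-memory buffer and reads it back.
    static <T> T binaryRoundTrip(ValueTypeSketch<T> type, T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                type.write(out, value);
            }
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return type.read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DoubleValueTypeSketch type = new DoubleValueTypeSketch();
        System.out.println(type.toText(binaryRoundTrip(type, type.parse("2.5"))));
    }
}
```

With one object per column type owning all four conversions, a CSV reader, a printer, or a binary writer could work against the interface alone, and a new column type would need only one new implementation.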
