Comments (7)

albertvillanova avatar albertvillanova commented on July 4, 2024

Note that the behavior described above is different if the format is JSON-Lines and the "pyarrow" engine is used:

import io

import pandas as pd

json_lines = b'{"col1": 1, "col2": 1.0}\n{"col1": 2, "col2": 2.0}'

df = pd.read_json(io.BytesIO(json_lines), lines=True, engine="pyarrow")
assert df["col1"].dtype != df["col2"].dtype

On the other hand, the downcasting appears again if the "ujson" engine (the default one) is used:

json_lines = b'{"col1": 1, "col2": 1.0}\n{"col1": 2, "col2": 2.0}'

df = pd.read_json(io.BytesIO(json_lines), lines=True)
assert df["col1"].dtype == df["col2"].dtype
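For contrast, here is a minimal sketch of a case where the default engine should not downcast: a float column containing a non-integral value. The sample records and the name `df_fractional` are made up for illustration; they are not from the original report.

```python
import io
import json

import pandas as pd

# Hypothetical sample: col2 holds one non-integral float (1.5).
records = [{"col1": 1, "col2": 1.5}, {"col1": 2, "col2": 2.0}]

df_fractional = pd.read_json(io.StringIO(json.dumps(records)))

# 1.5 cannot be represented as an integer without loss,
# so col2 is expected to keep a floating-point dtype.
print(df_fractional.dtypes)
```

This suggests the downcasting only kicks in when every value in the float column happens to be integral, which is what makes it easy to miss.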

from pandas.

PushpitSB avatar PushpitSB commented on July 4, 2024

That's a good point.

albertvillanova avatar albertvillanova commented on July 4, 2024

Also note that this downcasting is not performed by pandas.read_csv:

csv_content = "col1,col2\n1,1.0\n2,2.0"

df = pd.read_csv(io.StringIO(csv_content))
assert df["col1"].dtype != df["col2"].dtype

albertvillanova avatar albertvillanova commented on July 4, 2024

Additionally, a str column is also cast to int:

d = [{"col1": 1, "col2": 1.0, "col3": "1"}, {"col1": 2, "col2": 2.0, "col3": "2"}]

df = pd.read_json(io.StringIO(json.dumps(d)))
assert df["col1"].dtype == df["col2"].dtype == df["col3"].dtype

rhshadrach avatar rhshadrach commented on July 4, 2024

Passing dtype=False, I get the expected behavior of the OP. But the docstring doesn't seem clear to me:

If True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all, applies only to the data.

Perhaps the language can be improved.

@albertvillanova - can you confirm if dtype=False satisfies your use-case? Labeling this as just a docs issue for now.
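To make the quoted docstring options concrete, here is a minimal sketch of the `dtype=False` and dict forms, using the same two-column data as the earlier examples. The variable names (`payload`, `df_no_infer`, `df_forced`) are illustrative only, and this covers the default dtype backend, not `dtype_backend="pyarrow"`.

```python
import io
import json

import pandas as pd

records = [{"col1": 1, "col2": 1.0}, {"col1": 2, "col2": 2.0}]
payload = json.dumps(records)

# dtype=False: skip dtype inference, so col2 keeps the float type
# produced by the JSON parser.
df_no_infer = pd.read_json(io.StringIO(payload), dtype=False)

# dtype as a dict: force col2 to float64; columns not listed in the
# dict are still subject to inference.
df_forced = pd.read_json(io.StringIO(payload), dtype={"col2": "float64"})

print(df_no_infer.dtypes)
print(df_forced.dtypes)
```

The dict form is narrower than `dtype=False`: it pins only the named columns, which may be preferable when inference is still wanted elsewhere.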

albertvillanova avatar albertvillanova commented on July 4, 2024

@rhshadrach thanks for your reply.

Unfortunately, passing dtype=False does not satisfy my use-case, because I am also passing dtype_backend="pyarrow" (I left it out of the description to keep things simple).

As a result, the float-to-int downcasting persists with dtype_backend="pyarrow", even when dtype=False is also passed:

d = [{"col1": 1, "col2": 1.0}, {"col1": 2, "col2": 2.0}]

df = pd.read_json(io.StringIO(json.dumps(d)), dtype_backend="pyarrow")
assert df["col1"].dtype == df["col2"].dtype

df = pd.read_json(io.StringIO(json.dumps(d)), dtype_backend="pyarrow", dtype=False)
assert df["col1"].dtype == df["col2"].dtype

Additionally, I would like to ask whether, in the former case (when not passing dtype_backend="pyarrow"), passing dtype=False has other side effects. Would other dtypes be treated differently?
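One way to explore that question empirically is to read the same payload with and without inference and compare the resulting dtypes column by column. This is only a sketch: the sample records and the `dtype_report` helper are hypothetical, and the result may vary across pandas versions.

```python
import io
import json

import pandas as pd

# Hypothetical records mixing int, float, digit-only str, and bool values.
records = [
    {"col1": 1, "col2": 1.0, "col3": "1", "col4": True},
    {"col1": 2, "col2": 2.0, "col3": "2", "col4": False},
]
payload = json.dumps(records)


def dtype_report(payload: str) -> pd.DataFrame:
    """Return per-column dtypes from read_json with and without inference."""
    inferred = pd.read_json(io.StringIO(payload), dtype=True)
    raw = pd.read_json(io.StringIO(payload), dtype=False)
    return pd.DataFrame(
        {
            "dtype=True": inferred.dtypes.astype(str),
            "dtype=False": raw.dtypes.astype(str),
        }
    )


report = dtype_report(payload)
print(report)
```

Rows where the two columns of the report differ are the dtypes affected by turning inference off.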

rhshadrach avatar rhshadrach commented on July 4, 2024

Thanks @albertvillanova - I've reclassified this issue.
