
Comments (7)

jorisvandenbossche commented on May 28, 2024

I think pyogrio doesn't handle schemas, so I haven't tried.

Pyogrio doesn't support the schema keyword (as that is a fiona specific parameter), but it certainly does support writing the different data types. But because the input for pyogrio is a geopandas DataFrame or numpy arrays, the data already has a schema, and pyogrio uses that (instead of letting the user specify it separately).
So if you ensure that your input data has an int16 column, pyogrio should pass that information through to GDAL:

import geopandas
import numpy as np
from shapely.geometry import Point

gdf = geopandas.GeoDataFrame({"col": np.array([1, 2, 3], dtype="int16"), "geometry": [Point(i, i) for i in range(3)]})
gdf
#    col     geometry
# 0    1  POINT (0 0)
# 1    2  POINT (1 1)
# 2    3  POINT (2 2)

gdf.to_file("test_gdb.gdb", driver="OpenFileGDB", engine="pyogrio")

I am not fully sure how then to check independently whether it has actually written the correct data type to the OpenFileGDB, but ogrinfo indicates that it has:

$ ogrinfo test_gdb.gdb test_gdb
INFO: Open of `test_gdb.gdb'
      using driver `OpenFileGDB' successful.

Layer name: test_gdb
Geometry: Point
Feature Count: 3
Extent: (0.000000, 0.000000) - (2.000000, 2.000000)
Layer SRS WKT:
(unknown)
FID Column = OBJECTID
Geometry Column = SHAPE
col: Integer(Int16) (0.0)
OGRFeature(test_gdb):1
  col (Integer(Int16)) = 1
  POINT (0 0)

OGRFeature(test_gdb):2
  col (Integer(Int16)) = 2
  POINT (1 1)

OGRFeature(test_gdb):3
  col (Integer(Int16)) = 3
  POINT (2 2)

from fiona.

jorisvandenbossche commented on May 28, 2024

And if you want to control the exact OpenFileGDB types used by GDAL, the driver seems to have a COLUMN_TYPES creation option that can be passed (see https://gdal.org/drivers/vector/openfilegdb.html#layer-creation-options, but I didn't try this).


@sgillies OGR indeed only uses int32 or int64 data in its internal data model, but there is the concept of a "sub type" to annotate a type with additional information (I assume it doesn't change how the data is represented internally, still int32, but it is then used as a hint when writing): https://gdal.org/api/vector_c_api.html#_CPPv415OGRFieldSubType, and there is an OFSTInt16.
Pyogrio uses this when the input data has a bitwidth < 32, and based on the example above, it seems to have an effect. Fiona could use this as well. It's already declared:

Fiona/fiona/ogrext3.pxd

Lines 102 to 105 in 195579d

ctypedef int OGRFieldSubType
cdef int OFSTNone = 0
cdef int OFSTBoolean = 1
cdef int OFSTInt16 = 2

and could set it like is already done for bool subtype as well:

Fiona/fiona/ogrext.pyx

Lines 1293 to 1295 in 195579d

if value == 'bool':
    value = 'int32'
    field_subtype = OFSTBoolean
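
To make the suggestion concrete, here is a minimal pure-Python sketch of how an "int16" schema type could be handled the same way as "bool". The function name `normalize_field_type` is hypothetical and not part of Fiona, and the real change would live in the Cython code in ogrext.pyx. The subtype constants are copied from the declaration quoted above.

```python
# Subtype constants as declared in Fiona's ogrext3.pxd (quoted above).
OFSTNone = 0
OFSTBoolean = 1
OFSTInt16 = 2


def normalize_field_type(value):
    """Hypothetical helper: map a schema type name to (base type, OGR subtype)."""
    field_subtype = OFSTNone
    if value == "bool":
        # Existing behaviour quoted above: bool is an int32 with a subtype.
        value = "int32"
        field_subtype = OFSTBoolean
    elif value == "int16":
        # Assumption: int16 would be created as a 32-bit OGR integer field
        # annotated with the Int16 subtype, mirroring the bool case.
        value = "int32"
        field_subtype = OFSTInt16
    return value, field_subtype


print(normalize_field_type("int16"))  # ('int32', 2)
```

Other type names pass through unchanged with OFSTNone, so existing schemas would not be affected under this sketch.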


remi-braun commented on May 28, 2024

@sgillies thanks for taking this seriously!

  1. Sadly, I honestly don't know the specifics of this. My only goal was to integrate shapefiles into ESRI's geodatabases (don't blame me, I am forced to 😬) without using arcpy...
    My knowledge of this comes only from the answer I shared in the initial issue 😞
  2. Whatever works for my use case would be fine for me 😉


sgillies commented on May 28, 2024

@remi-braun Happy new year! OGR doesn't have a short integer type, only 32 and 64-bit integers. Neither does Fiona at this time, thus your layers are being constructed with 32 bit wide integer fields. I don't think there is any logic in the GDB driver to reduce the width at creation time.

Do you see different behavior if you use ogr2ogr or pyogrio?


remi-braun commented on May 28, 2024

Happy new year to you too 😉

I think pyogrio doesn't handle schemas, so I haven't tried.
[screenshot: 2024-01-11_08h37_44]

What's weird is that for ESRI a short isn't an int16 but also an int32, and I don't know exactly what the :4 at the end of int32:4 means.
Note that text:255 works for GDB, so the : mechanism is in some way already handled in OpenFileGDB.

And we made it all work for Shapefiles, so other drivers handle this mechanism.
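
For readers unfamiliar with the notation, specs such as int32:4 or str:255 can be read as a type name followed by a field width. The sketch below is only an illustration of that reading: `parse_field_spec` is a hypothetical helper, not Fiona's own parser, and the optional `.precision` part is an assumption about the format.

```python
def parse_field_spec(spec):
    """Split a schema field spec such as 'str:255' into (type, width, precision).

    Hypothetical helper for illustration only; width and precision are None
    when the spec does not provide them.
    """
    name, _, fmt = spec.partition(":")
    width = precision = None
    if fmt:
        # Assumption: an optional '.precision' may follow the width.
        width_text, _, precision_text = fmt.partition(".")
        width = int(width_text)
        precision = int(precision_text) if precision_text else None
    return name, width, precision


print(parse_field_spec("int32:4"))  # ('int32', 4, None)
print(parse_field_spec("str:255"))  # ('str', 255, None)
```

Read this way, int32:4 is still a 32-bit integer type with a display width of 4, which is not the same thing as a 16-bit storage type.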


remi-braun commented on May 28, 2024

With pyogrio, I successfully wrote short dtypes!
However, geopandas doesn't read the input types correctly, so I had to change the type of every column (which could be time-consuming):

import geopandas as gpd
gdb_path = "my_gdb.gdb"
layer = "B1_observed_event_a"

# Read layer
observed_event = gpd.read_file(gdb_path, layer=layer)

# Set correct types
observed_event.event_type = observed_event.event_type.astype("int32")
observed_event.obj_desc = observed_event.obj_desc.astype("int16")
observed_event.notation = observed_event.notation.astype("str")  # How can I set str:255 ?
observed_event.det_method = observed_event.det_method.astype("int16")
observed_event.dmg_src_id = observed_event.dmg_src_id.astype("int32")

# Write back in gdb
observed_event.to_file("my_gdb_copy.gdb", layer=layer, driver="OpenFileGDB", engine="pyogrio")
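
One caveat when downcasting like this: as far as I know, casting integer data to int16 can silently wrap values outside the 16-bit range instead of raising an error. A small range check before the cast, sketched below in plain Python, would catch that. The helper name `out_of_int16_range` is hypothetical, and the sample values are made up for illustration.

```python
# Bounds of a signed 16-bit integer.
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def out_of_int16_range(values):
    """Return the values that would not fit in a signed 16-bit integer."""
    return [v for v in values if not INT16_MIN <= v <= INT16_MAX]


print(out_of_int16_range([1, 2, 40000]))  # [40000]
```

An empty result means the column can be cast to int16 without losing information.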

With fiona's schema, I managed to set str:255 as a field type, but not with pyogrio. How can I do that?
[screenshot: 2024-01-11_10h18_02]

PS: the goal of all this is to allow the GDB domains to be recognized automatically, but I don't know if it will work even with the correct types


sgillies commented on May 28, 2024

@remi-braun I've begun working on this and have 2 questions.

  1. Is not the field width specifier 4 in int32:4 specific to Shapefiles? I'd love to not have to think about this anymore if we don't have to. It's not clear to me that OGR will coerce a 4 char wide OFTInt to OFTInt16 when saving.
  2. What would you think about an "int16" type as in #1358? If your GPKG file has a "short" field, Fiona would report "int16", and if you write a new schema from Fiona with "int16" type, it should manifest in GPKG as a "short".

