This is a dokcer environment to quickly get up a geolake service with
- a JDBC catalog (Postgres)
- a computing engine(Spark 3.3)
- a storage backend(local file system)
Geolake is built on top of Apache Iceberg, this project was also borrowed from docker-spark-iceberg.
docker compose up
Open http://localhost:8888 in browser, there are 2 notebooks inside :
-
geolake-scala-demo: shows an example of how to use geolake scala api to read/write data from/to geolake.
-
benchmark-portotaxi: this notebook runs a benchmark on portotaxi dataset which has 1.7M records. You will see the reading/writing performance of the 3 Parquet format(GeoLake Parqeut, GeoParquet, GeoParquet(bbox)) for spatial data. You will also see how the partition reolution parameter affects the performance.