pdal-parallelizer's People

Contributors

clementalba, dependabot[bot], jean-roc

pdal-parallelizer's Issues

Command-line temp directory

When running pdal-parallelizer from the command line with Miniforge on Windows, the temp directory apparently has to exist already; otherwise its path is not found:
FileNotFoundError: [WinError 3] Le chemin d’accès spécifié est introuvable: 'E:/0_en_cours/2020_classification/temp_parallelizer'
(In English: "The system cannot find the path specified.")
It would be nice to create this directory automatically.
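
A minimal sketch of the requested behaviour, assuming the tool receives the temp path as a string from the config file. The helper name `ensure_temp_dir` is hypothetical and is not part of pdal-parallelizer:

```python
from pathlib import Path


def ensure_temp_dir(temp_path: str) -> Path:
    """Create the temp directory (and any missing parents) if it does not exist."""
    path = Path(temp_path)
    # exist_ok=True makes the call a no-op when the directory is already there
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling something like this before any tile is written would avoid the FileNotFoundError without changing the behaviour for users who already create the directory themselves.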

-merge_tiles option writer issue

When using the merge_tiles option I face a writer issue.

pdal-parallelizer process-pipelines -c E:\0000_test\pdal_parallelizer\config.json -it single -nw 9 -tpw 1 -ts 60 60 -b 1 -mt

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\applications\miniforge3\envs\prod_env\Scripts\pdal-parallelizer.exe\__main__.py", line 7, in <module>
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\pdal_parallelizer\pdal_parallelizer_cli\__main__.py", line 44, in process_pipelines
    process(
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\pdal_parallelizer\__init__.py", line 186, in process_pipelines
    merge_ppln.execute()
RuntimeError: Couldn't create writer stage of type 'writers.laz'.
You probably have a version of PDAL that didn't come with a plugin
you're trying to load.

It seems that writers.laz is requested during the merge step when it should be writers.las.
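
One possible way to handle this, sketched under assumptions: the merge pipeline is available as a list of stage dictionaries before execution, and `normalize_writer` is a hypothetical helper, not an existing pdal-parallelizer function. It rewrites a `writers.laz` stage as `writers.las` with the `compression` option enabled, which is how PDAL's LAS writer is normally asked to produce LAZ output:

```python
def normalize_writer(stage: dict) -> dict:
    """Return a copy of the stage, rewriting 'writers.laz' as a compressed writers.las."""
    fixed = dict(stage)
    if fixed.get("type") == "writers.laz":
        # 'writers.laz' is not a PDAL stage; writers.las handles LAZ via compression
        fixed["type"] = "writers.las"
        fixed.setdefault("compression", True)
    return fixed


# Example with a hypothetical merge writer stage
print(normalize_writer({"type": "writers.laz", "filename": "merged.laz"}))
```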

Create a Python API

  • Create a new Python package for the CLI version
  • Port the main.py file from the CLI version in this new Python package
  • Create a new main.py file which will be the entry point of the API
  • Adapt the process_pipelines function from the CLI version for use via the API
  • Make the process_pipelines function in the CLI version now use the process_pipelines function in the API version
  • Check that all tests pass

Tile size warning displayed even with the -it dir option

With this command:

pdal-parallelizer process-pipelines -c E:\0000_test\pdal_parallelizer\config.json -it dir -nw 15 -tpw 1

I get this message:

WARNING - You are using the default value of the tile_size option (256 by 256 meters). Please check if your points cloud's dimensions are greater than this value.
Do you want to continue ?  [y/N]

Tile size impact on workers number

I faced a memory saturation when over-tiling a single input file.

A 500 m x 500 m input, divided into 35 tiles and processed by 15 workers, exceeds the available memory.
pdal-parallelizer process-pipelines -c E:\0000_test\pdal_parallelizer\config.json -it single -nw 15 -tpw 1 -ts 100 100 -mt

When the number of workers and the tile size are chosen so that each worker only processes one tile during the run, the unmanaged memory doesn't blow up.
pdal-parallelizer process-pipelines -c E:\0000_test\pdal_parallelizer\config.json -it single -nw 9 -tpw 1 -ts 180 180 -mt

The input file's extent could be computed and used to calculate the optimal tile size, or conversely.
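
The relation between extent, tile size and workers can be sketched as below. This is only an illustration: it assumes a regular grid with no buffer, and both function names are hypothetical:

```python
import math


def tile_count(width: float, height: float, tile_w: float, tile_h: float) -> int:
    """Number of tiles needed to cover a width x height extent with a regular grid."""
    return math.ceil(width / tile_w) * math.ceil(height / tile_h)


def rounds_per_worker(n_tiles: int, n_workers: int) -> int:
    """How many tiles the busiest worker has to process one after another."""
    return math.ceil(n_tiles / n_workers)


# Second command from the issue: 500 m x 500 m input, -ts 180 180, -nw 9
tiles = tile_count(500, 500, 180, 180)
print(tiles, rounds_per_worker(tiles, 9))  # 9 tiles, 1 round

# First command from the issue: 35 tiles reported, -nw 15
print(rounds_per_worker(35, 15))  # 3 rounds
```

With the second command each worker handles exactly one tile, which matches the case where memory stayed under control.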

-tile_size option needs -buffer

pdal-parallelizer process-pipelines -c E:\0000_test\pdal_parallelizer\config.json -it single -nw 9 -tpw 1 -ts 80 80 -mt -rb
Beginning of the execution

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\applications\miniforge3\envs\prod_env\Scripts\pdal-parallelizer.exe\__main__.py", line 7, in <module>
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\pdal_parallelizer\pdal_parallelizer_cli\__main__.py", line 44, in process_pipelines
    process(
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\pdal_parallelizer\__init__.py", line 140, in process_pipelines
    delayed = do.process_pipelines(output_dir=output, json_pipeline=pipeline, temp_dir=temp, iterator=iterator,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\pdal_parallelizer\do.py", line 58, in process_pipelines
    p = t.pipeline(is_single)
        ^^^^^^^^^^^^^^^^^^^^^
  File "D:\applications\miniforge3\envs\prod_env\Lib\site-packages\pdal_parallelizer\tile.py", line 69, in pipeline
    p.insert(len(p) - 1, bounds.removeBuffer(self.bounds_without_buffer))
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Tile' object has no attribute 'bounds_without_buffer'

What if we don't want to use any buffer? A null value isn't valid.
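
A sketch of how a missing buffer could be tolerated, assuming the tile bounds include the buffer when one is set. `TileSketch` is a hypothetical stand-in and does not reproduce the real `Tile` class from tile.py:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Bounds = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


@dataclass
class TileSketch:
    bounds: Bounds
    buffer: Optional[float] = None

    @property
    def bounds_without_buffer(self) -> Bounds:
        """Always defined: equal to the tile bounds when no buffer is set."""
        if not self.buffer:
            return self.bounds
        minx, miny, maxx, maxy = self.bounds
        b = self.buffer
        return (minx + b, miny + b, maxx - b, maxy - b)
```

Because the attribute always exists, a call such as the `removeBuffer` step in the traceback would no longer raise an AttributeError when the buffer option is omitted.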

Add support for Virtual Point Cloud

PDAL Wrench has introduced a new format called Virtual Point Cloud (.vpc), which now has a specification. It is based on STAC and is similar to GDAL's VRT.

It links several local or remote sources in one JSON file, with additional metadata (GeoJSON extent, statistics, etc.). It could be a new input type for the CLI.

Wrench uses it to parallelize the processing of multiple files, but on a file-by-file basis, not by generating new chunks.

-buffer Int

For the documentation: the buffer value has to be an integer.

Computation performance decreases with a large number of files

Working with the Conda version 2.0.3, I ran into underutilized computational capacity while processing a large dataset of about 5000 laz tiles. Each worker used around 0.5% of the CPU, when the expected behaviour is about 5%.
With a smaller subset of the exact same tiles, it works fine.

The command I run is:
pdal-parallelizer process-pipelines -c F:\nuages\2020_lidar\2022_06_pdal_mneh_config.json -it dir -nw 20 -tpw 1 -mt

Fails with json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I installed with conda and tried to run process(config = path_to_config.json, input_type="single"...
But it fails with the error below.

raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

My config.json is:
{
"input": "F:/",
"output": "F:/",
"temp": "F:/",
"pipeline": "F:/pipeline.json"
}
Please tell me how to solve it.

Thank you.
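
This particular message usually means the JSON parser found nothing it could parse at the very start of the text, for example because a file is empty or does not contain JSON. A small diagnostic sketch, assuming both the config file and the pipeline file it points to are read as JSON; `load_json_checked` is a hypothetical helper, not part of pdal-parallelizer:

```python
import json
from pathlib import Path


def load_json_checked(path: str):
    """Load a JSON file and report which file is empty or malformed."""
    # utf-8-sig transparently drops a byte-order mark if the editor added one
    text = Path(path).read_text(encoding="utf-8-sig")
    if not text.strip():
        raise ValueError(f"{path} is empty; expected a JSON document")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path} is not valid JSON: {exc}") from exc
```

Running it on config.json and then on the file given under "pipeline" would show which of the two triggers the error.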

Removing tiles when merge_tiles is on

It's useful to keep the tiles resulting from the -ts option, but it would also be useful to have an option to remove them when the result you're looking for is the merged output.

Default tile format is las

The default format of the tiles is las. It doesn't take into account the compression option specified in the pipeline (writers.las).
However, the merge step, called with -mt, does.

Reorganise the project

  • Use a "PipelineWrapper" or a "PipelineManager" class to simplify actions related to PDAL Pipelines
  • Simplify the tile.pipeline function by separating it in multiple functions
  • Simplify the tile.split function by separating it in multiple functions

CPU utilization limit

Working with the 2.0.3 release with the dir option on, I saw CPU underutilization when more than 6 workers were assigned.

Basically, with 3 to 6 workers, each worker uses around 4 to 5% of the total CPU capacity, but above 6 workers it falls to less than 2%.

StatisticsError('must have at least two data points')

Processing never starts, as it appears to be expecting that there are already files in the output directory.

CLI Command:
pdal-parallelizer process-pipelines -c /work/test/data_1/pdal-parallelizer.json -it single

pdal-parallelizer.json:

{
  "input": "/work/test/data_1/input/sample_points.laz",
  "output": "/work/test/data_1/output",
  "temp": "/work/test/data_1/temp",
  "pipeline": "/work/test/data_1/pipeline.json"
}

pipeline.json:

{
  "pipeline": [
    {
      "type": "readers.las",
      "filename": "/work/test/data_1/input/sample_points.laz"
    },
    {
      "type": "writers.laz",
      "filename": "/work/test/data_1/output/output.laz"
    }
  ]
}

Trace:

Parallelization started.

Traceback (most recent call last):
  File "/root/miniconda3/bin/pdal-parallelizer", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/root/miniconda3/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/miniconda3/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/miniconda3/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/pdal_parallelizer/pdal_parallelizer_cli/__main__.py", line 42, in process_pipelines
    process(
  File "/root/miniconda3/lib/python3.10/site-packages/pdal_parallelizer/__init__.py", line 130, in process_pipelines
    file_manager.getEmptyWeight(output_directory=output)
  File "/root/miniconda3/lib/python3.10/site-packages/pdal_parallelizer/file_manager.py", line 60, in getEmptyWeight
    deciles = [round(q, 2) for q in statistics.quantiles(weights_ko, n=10)]
  File "/root/miniconda3/lib/python3.10/statistics.py", line 662, in quantiles
    raise StatisticsError('must have at least two data points')
statistics.StatisticsError: must have at least two data points
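
The traceback shows `statistics.quantiles` being called on a list with fewer than two values, which the standard library rejects. A guard along these lines would avoid the crash; this is a sketch only, and the name `safe_deciles` and the choice to return an empty list are assumptions, not the project's actual fix:

```python
import statistics


def safe_deciles(weights_ko):
    """Return the nine decile cut points, or an empty list if there is too little data."""
    # statistics.quantiles raises StatisticsError with fewer than two data points
    if len(weights_ko) < 2:
        return []
    return [round(q, 2) for q in statistics.quantiles(weights_ko, n=10)]
```

The caller would then need to treat an empty result as "no reference weights available yet", for instance when the output directory is still empty.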

Create a package for Conda

Having pdal-parallelizer available through conda-forge would eliminate the need for pip. It would be simpler, since conda is already needed to install PDAL as a dependency.

Crash due to multiple expressions on filters.assign

With pdal-parallelizer 2.0.3 on Windows, with both the single and dir options, multiple expressions in a single filters.assign stage make pdal-parallelizer crash. For example,

    {
        "type":"filters.assign",
        "value":
        [
            "Dimension = ValueExpression [WHERE ConditionalExpression]",
            "Dimension = ValueExpression [WHERE ConditionalExpression]",
            "Dimension = ValueExpression [WHERE ConditionalExpression]"
        ]
    },

has to be replaced by:

    {
        "type":"filters.assign",
        "value":
        [
            "Dimension = ValueExpression [WHERE ConditionalExpression]"
        ]
    },
    {
        "type":"filters.assign",
        "value":
        [
            "Dimension = ValueExpression [WHERE ConditionalExpression]"
        ]
    },
    {
        "type":"filters.assign",
        "value":
        [
            "Dimension = ValueExpression [WHERE ConditionalExpression]"
        ]
    },
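
Until this is fixed, the workaround above can be applied automatically before handing the pipeline to pdal-parallelizer. A sketch, assuming the pipeline is loaded as a list of stages and that "value" holds a list of expression strings; `split_assign_stages` is a hypothetical helper:

```python
import copy


def split_assign_stages(pipeline: list) -> list:
    """Expand each filters.assign stage with several expressions into one stage per expression."""
    result = []
    for stage in pipeline:
        values = stage.get("value") if isinstance(stage, dict) else None
        is_multi_assign = (
            isinstance(stage, dict)
            and stage.get("type") == "filters.assign"
            and isinstance(values, list)
            and len(values) > 1
        )
        if is_multi_assign:
            for expr in values:
                # Keep every other option of the stage, replace only the expression list
                new_stage = copy.deepcopy(stage)
                new_stage["value"] = [expr]
                result.append(new_stage)
        else:
            result.append(stage)
    return result
```

The order of the expressions is preserved, so the assignments are still applied in the same sequence.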

Command-line tiling and renaming single input

When the single input option is activated, the output is tiled based on Dask parameters and the tiles are named according to the xmin/ymin of their bounding boxes.

Would it be possible to

  • merge the outputs,
  • exclude the buffers from the output when used, and
  • rename the merged output based on the input name?

So
input_name.laz -> xmins/ymins.laz & xmins/ymins.laz & xmins/ymins.laz & xmins/ymins.laz
would be
input_name.laz -> input_name.laz
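
The renaming part of this request is simple to express. A sketch, assuming the merged file should keep the input's file name and land in the configured output directory; `merged_output_path` is a hypothetical name:

```python
from pathlib import Path


def merged_output_path(input_file: str, output_dir: str) -> Path:
    """Build the path of the merged output from the input file name."""
    return Path(output_dir) / Path(input_file).name


# Example with hypothetical paths
print(merged_output_path("data/input_name.laz", "results"))
```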
