large-network-analysis-tools

The tools and code samples here help you solve large network analysis problems in ArcGIS Pro.

We have provided some Python script tools designed to solve large network analysis problems by splitting the input data into chunks and solving the chunks in parallel. You can use these tools as is, modify the provided scripts to suit your needs, or use them as an example when writing your own code.

Features

The LargeNetworkAnalysisTools.pyt toolbox has three geoprocessing tools:

  • Solve Large OD Cost Matrix - solves a large origin-destination cost matrix by chunking up the problem and solving in parallel
  • Solve Large Analysis With Known OD Pairs - calculates travel time, distance, and routes between preassigned origin-destination pairs in parallel
  • Parallel Calculate Locations - precalculates network locations for a large dataset in parallel

Requirements

  • ArcGIS Pro 2.5 or later (ArcGIS Pro 2.9 or later is recommended for improved performance and functionality, and older Pro versions have not been thoroughly tested)
  • One of the following three options:
    • A network dataset and an ArcGIS Network Analyst extension license
    • An ArcGIS Online account with network analysis privileges and sufficient credits
    • An ArcGIS Enterprise portal with network analysis services configured
  • Origin and destination points you wish to analyze

Instructions

  1. Download the latest release
  2. Modify the code to suit your needs if desired
  3. Run the code in standalone Python, or run the provided geoprocessing tools from within ArcGIS Pro.

Solve Large OD Cost Matrix tool

The Solve Large OD Cost Matrix tool can be used to solve a large origin-destination cost matrix, calculating the travel time and distance from a set of origins to a set of destinations. You can reduce the problem size by setting a time or distance cutoff and by limiting the number of destinations to find for each origin, and the calculations are optimized by spatially sorting the inputs. The tool can calculate extremely large OD cost matrices by chunking up the problem and solving in parallel. You can choose to save the outputs to a feature class, a set of CSV files, or a set of Apache Arrow tables.

Solve Large OD Cost Matrix tool inputs

  • Origins (Python: Origins) - The feature class or layer containing the origins.
  • Destinations (Python: Destinations) - The feature class or layer containing the destinations.
  • Network Data Source (Python: Network_Data_Source) - Network dataset, network dataset layer, or portal URL to use when calculating the OD Cost Matrix.
  • Travel Mode (Python: Travel_Mode) - Network travel mode to use when calculating the OD Cost Matrix.
  • Time Units (Python: Time_Units) - The time units the output Total_Time field will be reported in.
  • Distance Units (Python: Distance_Units) - The distance units the output Total_Distance field will be reported in.
  • Maximum Origins and Destinations per Chunk (Python: Max_Inputs_Per_Chunk) - Defines the chunk size for parallel OD Cost Matrix calculations. For example, if you want to process a maximum of 1000 origins and 1000 destinations in a single chunk, set this parameter to 1000.
  • Maximum Number of Parallel Processes (Python: Max_Processes) - Defines the maximum number of parallel processes to run at once. Do not exceed the number of logical processors of your machine.
  • Output Updated Origins (Python: Output_Updated_Origins) - Path to the output feature class that will contain the updated origins, which may be spatially sorted and have added fields. The OriginOID field in the Output OD Lines Feature Class refers to the ObjectID of the Output Updated Origins and not the original input origins.
  • Output Updated Destinations (Python: Output_Updated_Destinations) - Path to the output feature class that will contain the updated destinations, which may be spatially sorted and have added fields. The DestinationOID field in the Output OD Lines Feature Class refers to the ObjectID of the Output Updated Destinations and not the original input destinations.
  • Output OD Cost Matrix Format (Python: Output_Format) - The desired output format for the OD Cost Matrix Lines. The available choices are:
    • "Feature class" - A single, combined feature class. This option is the slowest to create and will likely fail for extremely large problems. Note: For performance reasons, the feature class output does not generate line shapes; the geometry of the features in the output is null. If you want straight line shapes, find the config file, od_config.py, in the toolbox's root directory and change "lineShapeType": arcpy.nax.LineShapeType.NoLine, to "lineShapeType": arcpy.nax.LineShapeType.StraightLine,.
    • "CSV files" - A set of .csv files. Each file represents the OD Cost Matrix Lines output, without shape geometry, for a chunk of origins and a chunk of destinations, using the naming scheme ODLines_O_#_#_D_#_#.csv, where the # signs represent the ObjectID ranges of the origins and destinations in the chunk. If you have set a value for the Number of Destinations to Find for Each Origin parameter, you may find some output files using the naming scheme ODLines_O_#_#.csv because results from all destinations have been combined into one file.
    • "Apache Arrow files" - A set of Apache Arrow files. Each file represents the OD Cost Matrix Lines output, without shape geometry, for a chunk of origins and a chunk of destinations, using the naming scheme ODLines_O_#_#_D_#_#.arrow, where the # signs represent the ObjectID ranges of the origins and destinations in the chunk. If you have set a value for the Number of Destinations to Find for Each Origin parameter, you may find some output files using the naming scheme ODLines_O_#_#.arrow because results from all destinations have been combined into one file. This option is not available in versions of ArcGIS Pro prior to 2.9 and is not available if the network data source is a service.
  • Output OD Lines Feature Class (Python: Output_OD_Lines_Feature_Class) - Path to the output feature class that will contain the OD Cost Matrix Lines output computed by the tool if the Output OD Cost Matrix Format parameter value is "Feature class". The schema of this feature class is described in the arcpy documentation. Note that the OriginOID and DestinationOID fields refer to the ObjectIDs of the Output Updated Origins and Output Updated Destinations feature classes and not the original input data.
  • Output Folder (Python: Output_Folder) - Path to a folder, which will be created by the tool, that will contain the CSV or Arrow files representing the OD Cost Matrix Lines results if the Output OD Cost Matrix Format parameter value is "CSV files" or "Apache Arrow files". The schema of the files is described in the arcpy documentation, although the shape, OriginName, and DestinationName fields are not included in these outputs. Note that the OriginOID and DestinationOID fields refer to the ObjectIDs of the Output Updated Origins and Output Updated Destinations feature classes and not the original input data.
  • Cutoff (Python: Cutoff) - Impedance cutoff limiting the search distance for each origin. For example, you could set up the problem to find only destinations within a 15-minute drive time of the origins. This parameter is optional. Leaving it blank uses no cutoff.
    • If your travel mode has time-based impedance units, Cutoff represents a time and is interpreted in the units specified in the Time Units parameter.
    • If your travel mode has distance-based impedance units, Cutoff represents a distance and is interpreted in the units specified in the Distance Units parameter.
    • If your travel mode has other units (not time- or distance-based), Cutoff should be specified in the units of your travel mode's impedance attribute.
  • Number of Destinations to Find for Each Origin (Python: Num_Destinations) - The number of destinations to find for each origin. For example, setting this to 3 will result in the output including the travel time and distance from each origin to its three closest destinations. This parameter is optional. Leaving it blank results in finding the travel time and distance from each origin to all destinations.
  • Time of Day (Python: Time_Of_Day) - The start time of day for the analysis. No value indicates a time neutral analysis.
  • Barriers (Python: Barriers) - Point, line, or polygon barriers to use in the OD Cost Matrix analysis. This parameter is optional.
  • Precalculate Network Locations (Python: Precalculate_Network_Locations) - When you solve a network analysis, the input points must "locate" on the network used for the analysis. When chunking your inputs to solve in parallel, inputs may be used many times. Rather than calculating the network location fields for each input every time it is used, it is more efficient to calculate all the network location fields up front and re-use them. Set this parameter to True to pre-calculate the network location fields. This is recommended for every situation unless:
    • You are using a portal URL as the network data source. In this case, pre-calculating network locations is not possible, and the parameter is hidden.
    • You have already pre-calculated the network location fields using the network dataset and travel mode you are using for this analysis. In this case, you can save time by not precalculating them again.
  • Spatially Sort Inputs (Python: Sort_Inputs) - A Boolean indicating whether to spatially sort origins and destinations prior to commencing the parallel solve. Using sorted data will improve the efficiency of the chunking behavior significantly, and it may reduce the number of credits consumed if you're using a service that charges credits. If your input data is already sorted, then sorting is not useful, and you should set this parameter to false. Otherwise, you should set this parameter to true. Note, however, that spatial sorting is only available if you have the Advanced license. Spatial sorting will be skipped automatically if you don't have the necessary license, and the parameter will be hidden in the tool dialog.
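As referenced in the "Feature class" output format description above, the straight-line-shapes change is a one-line edit in od_config.py. The sketch below shows what that edit might look like; the dictionary name OD_PROPS and the surrounding structure are assumptions for illustration, and only the lineShapeType entry comes from the parameter description.

import arcpy

# od_config.py (sketch): OD Cost Matrix settings applied to every chunked solve.
# The dictionary name is hypothetical; check the actual file for its structure.
OD_PROPS = {
    # Default shipped with the tool (no line geometry, for performance):
    # "lineShapeType": arcpy.nax.LineShapeType.NoLine,
    # Change to this if you want straight line shapes in the feature class output:
    "lineShapeType": arcpy.nax.LineShapeType.StraightLine,
}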

Note: This tool does not utilize the geoprocessing environments for parallel processing or processor type. The settings for parallel processing are controlled using the tool parameters.

Running the tool from ArcGIS Pro

You can run the tool in ArcGIS Pro just like any other geoprocessing tool. You just need to connect to the provided Python toolbox from the Catalog Pane either in the Toolboxes section or the Folders section.

If you plan to use ArcGIS Online or a portal as your network data source, make sure you're connected to that portal in your current Pro session.

Screenshot of tool dialog

Running the tool from standalone Python

You can call the tool from your own standalone Python script.

As with any custom script tool, you must first import the toolbox within your standalone script: arcpy.ImportToolbox(<full path to LargeNetworkAnalysisTools.pyt>)

Then, you can call the tool in your script: arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix(<tool parameters>)

Here is the full tool signature:

arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix(
    Origins, Destinations, Network_Data_Source, Travel_Mode, Time_Units, Distance_Units,
    Max_Inputs_Per_Chunk, Max_Processes, Output_Updated_Origins, Output_Updated_Destinations,
    Output_Format, Output_OD_Lines_Feature_Class, Output_Folder,
    Cutoff, Num_Destinations, Time_Of_Day, Barriers, Precalculate_Network_Locations
)
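For example, a call with hypothetical paths and values might look like the following sketch. Every path, travel mode name, and parameter value shown here is an assumption to replace with your own data; the parameter order follows the signature above.

import arcpy

# Hypothetical path to the downloaded toolbox.
arcpy.ImportToolbox(r"C:\tools\large-network-analysis-tools\LargeNetworkAnalysisTools.pyt")

arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix(
    r"C:\data\inputs.gdb\Origins",                       # Origins
    r"C:\data\inputs.gdb\Destinations",                  # Destinations
    r"C:\data\network.gdb\Transportation\Streets_ND",    # Network_Data_Source
    "Driving Time",                                      # Travel_Mode
    "Minutes",                                           # Time_Units
    "Miles",                                             # Distance_Units
    1000,                                                # Max_Inputs_Per_Chunk
    4,                                                   # Max_Processes
    r"C:\data\outputs.gdb\UpdatedOrigins",               # Output_Updated_Origins
    r"C:\data\outputs.gdb\UpdatedDestinations",          # Output_Updated_Destinations
    "CSV files",                                         # Output_Format
    None,                                                # Output_OD_Lines_Feature_Class (unused for CSV output)
    r"C:\data\od_outputs",                               # Output_Folder
    30,                                                  # Cutoff (in Time_Units for a time-based travel mode)
    5,                                                   # Num_Destinations
    None,                                                # Time_Of_Day (time neutral)
    None,                                                # Barriers
    True                                                 # Precalculate_Network_Locations
)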

You can also run the provided scripts by directly calling solve_large_odcm.py from the command line instead of using the geoprocessing tool as the code's gateway. Call python solve_large_odcm.py -h to print the command line help to show you how to do this.

Recommended settings for best performance

The tool includes several settings that can impact the overall run time.

For best performance, use the "CSV files" or "Apache Arrow files" option for the Output OD Cost Matrix Format instead of the "Feature class" option, which is much slower to process.
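If you choose the "CSV files" option, the per-chunk output files can easily be combined afterward with standard Python tooling. Here is a minimal sketch, assuming pandas (included in ArcGIS Pro's Python environment) and a hypothetical Output Folder path.

import glob
import os

import pandas as pd

# Hypothetical Output Folder specified when running the tool.
output_folder = r"C:\data\od_outputs"

# Read every per-chunk ODLines CSV file and combine them into one table.
csv_files = glob.glob(os.path.join(output_folder, "ODLines_*.csv"))
od_lines = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
print(f"Combined {len(csv_files)} files into {len(od_lines)} OD lines.")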

The other main consideration is what type of network data source is being used for the analysis, and the optimal chunk size and number of parallel processes depend on this choice.

If the network data source is ArcGIS Online, the Maximum Number of Parallel Processes parameter is capped at 4 concurrent processes so as not to overload the service for other users. The ArcGIS Online OD Cost Matrix service also limits the number of origins and destinations allowed in a single problem. As of this writing, that number is 1000, so the Maximum Origins and Destinations per Chunk parameter value cannot be greater than 1000. If you enter a larger number, the tool will automatically reduce the chunk size to the maximum allowed.

If the network data source is an ArcGIS Enterprise service, the service configuration may limit the number of allowed concurrent processes, and this number should not be exceeded for the Maximum Number of Parallel Processes parameter. (If you are the service administrator, you can update the service configuration to increase this number.) However, you also shouldn't exceed the number of logical processors of your machine (the client) because the client manages the jobs sent to the server and cannot manage more concurrent processes than it has logical cores available. Uncommonly, ArcGIS Enterprise services also limit the number of allowed inputs, and in this case the tool will automatically adjust the Maximum Origins and Destinations per Chunk to that limit if the input value is too large. However, usually ArcGIS Enterprise services do not include such limits, and the recommended chunk size depends on whether the service's network dataset is in a file geodatabase or a mobile geodatabase as discussed below.

If the network data source is a network dataset in a file geodatabase, set the Maximum Number of Parallel Processes to the number of logical processors of your machine. A Maximum Origins and Destinations per Chunk value of around 1000 or 2000 typically works best, even for very large input datasets, because these small chunks solve very quickly.

If the network data source is a network dataset in a mobile geodatabase, the internal OD Cost Matrix solver functions a little differently than it does for file geodatabase network datasets. The internal solver does its own multithreaded, parallelized operations spread across your machine's resources, so additional parallelization on the client side will not improve performance. A Maximum Number of Parallel Processes value of 2 to 4 is recommended. Additionally, because of this internal parallelization, larger OD Cost Matrix problems solve more quickly than with file geodatabase data, so you may have better overall tool run times using a Maximum Origins and Destinations per Chunk value around 10,000.
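These rules of thumb can be summarized in a small helper like the sketch below. The function and its threshold values simply restate the recommendations in the preceding paragraphs and are not part of the toolbox.

import os

def recommended_settings(network_type):
    """Return a (max_inputs_per_chunk, max_processes) pair per the guidance above.

    network_type is a hypothetical label: "file_gdb", "mobile_gdb", or "agol".
    For an ArcGIS Enterprise service, cap max_processes at the service's
    concurrency limit and base the chunk size on the type of geodatabase
    holding the service's network dataset.
    """
    logical_cores = os.cpu_count() or 4
    if network_type == "file_gdb":
        # Small chunks solve quickly; use all logical processors.
        return 2000, logical_cores
    if network_type == "mobile_gdb":
        # The solver parallelizes internally; keep client-side processes low.
        return 10000, 4
    if network_type == "agol":
        # ArcGIS Online caps both the chunk size and the parallel processes.
        return 1000, 4
    raise ValueError(f"Unknown network type: {network_type}")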

To some extent, the best chunk size depends on the configuration of your input data. The tool will spatially sort the input data if you have the Advanced license, and sorted data allows for smarter chunking. Before solving the OD Cost Matrix for each chunk of origins and destinations, the tool first applies a simple, quick straight-line filter to remove any destinations that are very far away, and if all destinations are filtered out, the chunk is skipped entirely. Smaller chunks are more likely to be skipped than larger chunks, particularly if your data is highly clustered.

Technical explanation of how this tool works

The tool consists of several scripts:

  • LargeNetworkAnalysisTools.pyt: This defines the python toolbox and the tool as you see it in the ArcGIS Pro UI. It does some minimal parameter validation and calls solve_large_odcm.py to actually run the analysis.
  • solve_large_odcm.py: This defines a class, ODCostMatrixSolver, that validates and preprocesses the inputs and then calls parallel_odcm.py as a subprocess to do the parallel solves. The class also parses log messages from the parallel_odcm.py and writes them out as geoprocessing messages.
  • parallel_odcm.py: This script chunks the inputs, solves the OD Cost Matrices in parallel, and postprocesses the results.
  • od_config.py: In this file, you can override some OD Cost Matrix analysis settings that are not included in the tool dialog. This is provided to make the scripts easier to customize so you don't have to dig through the more complex parts of the code to find these simple settings.
  • helpers.py: Contains some helper methods and global variables.

Why do we have both solve_large_odcm.py and parallel_odcm.py? Why do we call parallel_odcm.py as a subprocess? This is necessary to accommodate running this tool from the ArcGIS Pro UI. A script tool running in the ArcGIS Pro UI cannot directly call multiprocessing using concurrent.futures. We must instead spin up a subprocess, and the subprocess must spawn parallel processes for the calculations. Thus, solve_large_odcm.py does all the pre-processing in the main python process, but it passes the inputs to parallel_odcm.py as a separate subprocess, and that subprocess can, in turn, spin up parallel processes for the OD Cost Matrix calculations.
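The sketch below illustrates this two-level pattern generically. It is not the toolbox's actual code; parallel_worker.py is a hypothetical stand-in for parallel_odcm.py, and the real scripts pass many more arguments and use arcpy logging.

# In the script tool's main process (the role of solve_large_odcm.py):
import subprocess
import sys

def run_parallel_step(args):
    """Launch the parallel script as a subprocess and relay its log output."""
    cmd = [sys.executable, "parallel_worker.py"] + list(args)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            print(line.rstrip())  # a real tool re-emits these as GP messages

# In the subprocess (the role of parallel_odcm.py), which is free to spawn
# its own parallel worker processes:
from concurrent.futures import ProcessPoolExecutor

def solve_chunk(chunk):
    """Placeholder for solving one OD Cost Matrix chunk."""
    return f"solved chunk {chunk}"

def solve_all_chunks(chunks, max_processes):
    with ProcessPoolExecutor(max_workers=max_processes) as executor:
        return list(executor.map(solve_chunk, chunks))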

Unit tests are available in the unittests folder and can help identify problems if you're editing the code.

Solve Large Analysis With Known OD Pairs tool

The Solve Large Analysis With Known OD Pairs tool can be used to calculate the travel time and distance and generate routes between preassigned origin-destination pairs. It can calculate many routes simultaneously by chunking up the problem and solving in parallel.

Multiple types of origin-destination pairs are supported:

  • one-to-one: A field in the input origins table indicates which destination the origin is assigned to. Each origin can be assigned to only one destination.
  • many-to-many: A separate table defines the preassigned origin-destination pairs, as shown in the sketch below. A single origin may be assigned to multiple destinations.
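For the many-to-many case, the pair table is simply one column of origin IDs and one column of destination IDs. Here is a minimal sketch that writes such a table as a CSV file; the file path, column names, and ID values are hypothetical, and the column names just need to match the pair table field parameters described below.

import csv

# Hypothetical pair table; the same origin may appear on multiple rows.
pair_table = r"C:\data\od_pairs.csv"
pairs = [
    ("Store_1", "Warehouse_A"),
    ("Store_1", "Warehouse_B"),
    ("Store_2", "Warehouse_A"),
]

with open(pair_table, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["OriginID", "DestinationID"])  # hypothetical field names
    writer.writerows(pairs)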

Solve Large Analysis With Known OD Pairs tool inputs

  • Origins (Python: Origins) - The feature class or layer containing the origins. If your feature class includes fields in the Route solver object input Stops schema, with the exception of "RouteName", "Sequence", and "LocationType", these fields will automatically be used in the analysis. For the one-to-one Origin-Destination Assignment Type, the origins dataset must have a field populated with the ID of the destination the origin is assigned to.
  • Origin Unique ID Field (Python: Origin_Unique_ID_Field) - A field in origins representing the origin's unique ID. For the many-to-many Origin-Destination Assignment Type, the values in the Origin-Destination Pair Table's origin ID field must correspond to these unique origin IDs.
  • Destinations (Python: Destinations) - The feature class or layer containing the destinations. If your feature class includes fields in the Route solver object input Stops schema, with the exception of "RouteName", "Sequence", and "LocationType", these fields will automatically be used in the analysis.
  • Destination Unique ID Field (Python: Destination_Unique_ID_Field) - A field in destinations representing the destination's unique ID. For the one-to-one Origin-Destination Assignment Type, the values in the origins table's Assigned Destination Field should correspond to these unique destination IDs. For the many-to-many Origin-Destination Assignment Type, the values in the Origin-Destination Pair Table's destination ID field must correspond to these unique destination IDs.
  • Origin-Destination Assignment Type (Python: OD_Pair_Type) - A text string indicating which type of preassigned origin-destination pairs to use for the analysis. The options are:
    • A field in Origins defines the assigned Destination (one-to-one)
    • A separate table defines the origin-destination pairs (many-to-many)
  • Assigned Destination Field (Python: Assigned_Destination_Field) - A field in Origins indicating the ID of the destination each origin is assigned to. Any origin with a null value or a value that does not match a valid destination ID will be ignored in the analysis. This parameter is only applicable for the one-to-one Origin-Destination Assignment Type.
  • Origin-Destination Pair Table (Python: OD_Pair_Table) - A table or CSV file defining origin-destination pairs. The table must be populated with a column of origin IDs matching values in the Origin Unique ID Field of the Origins table and a column of destination IDs matching values in the Destination Unique ID Field of the Destinations table. This parameter is only applicable for the many-to-many Origin-Destination Assignment Type.
  • Origin ID Field in Origin-Destination Pair Table (Python: Pair_Table_Origin_Unique_ID_Field) - The field name in the Origin-Destination Pair Table defining the origin IDs. This parameter is only applicable for the many-to-many Origin-Destination Assignment Type.
  • Destination ID Field in Origin-Destination Pair Table (Python: Pair_Table_Destination_Unique_ID_Field) - The field name in the Origin-Destination Pair Table defining the destination IDs. This parameter is only applicable for the many-to-many Origin-Destination Assignment Type.
  • Network Data Source (Python: Network_Data_Source) - Network dataset, network dataset layer, or portal URL to use when calculating the Route analysis.
  • Travel Mode (Python: Travel_Mode) - Network travel mode to use when calculating the Route analysis.
  • Time Units (Python: Time_Units) - The time units the output travel time will be reported in.
  • Distance Units (Python: Distance_Units) - The distance units the output travel distance will be reported in.
  • Maximum OD Pairs per Chunk (Python: Max_Pairs_Per_Chunk) - Defines the chunk size for parallel Route calculations, the number of origin-destination routes to calculate simultaneously. For example, if you want to process a maximum of 1000 origins and 1000 destinations in a single chunk, for a total of 1000 paired routes, set this parameter to 1000.
  • Maximum Number of Parallel Processes (Python: Max_Processes) - Defines the maximum number of parallel processes to run at once. Do not exceed the number of logical processors of your machine.
  • Output Routes (Python: Output_Routes) - Path to the output feature class that will contain the calculated routes between origins and their assigned destinations. The schema of this feature class is described in the arcpy documentation. The feature class's schema matches that described for the Routes table in the documentation, plus two additional fields, "OriginUniqueID" and "DestinationUniqueID", containing the unique ID field values for the origin and destination this route connects.
  • Time of Day (Python: Time_Of_Day) - The start time of day for the analysis. No value indicates a time neutral analysis.
  • Barriers (Python: Barriers) - Point, line, or polygon barriers to use in the Route analysis. This parameter is optional.
  • Precalculate Network Locations (Python: Precalculate_Network_Locations) - When you solve a network analysis, the input points must "locate" on the network used for the analysis. If origins and destinations are used more than once, it is more efficient to calculate the network location fields up front and re-use them. Set this parameter to True to pre-calculate the network location fields. This is recommended unless:
    • You are using a portal URL as the network data source. In this case, pre-calculating network locations is not possible, and the parameter is hidden.
    • You have already pre-calculated the network location fields using the network dataset and travel mode you are using for this analysis. In this case, you can save time by not precalculating them again.
    • Each destination has only one assigned origin. In this case, there is no efficiency gain in calculating the location fields in advance.
  • Sort Origins by Assigned Destination (Python: Sort_Origins) - A Boolean indicating whether to sort origins by their assigned destination prior to commencing the parallel solve. Using sorted data will improve the efficiency of the solve slightly. If your input data is already sorted, or if no origins are assigned to the same destinations, then sorting is not useful, and you should set this parameter to false. This parameter is only applicable for the one-to-one Origin-Destination Assignment Type.
  • Reverse Direction of Travel (Python: Reverse_Direction) - A Boolean indicating whether to reverse the direction of travel and calculate the route from the destination to the origin. The default is false. This parameter is only applicable for the one-to-one Origin-Destination Assignment Type.

Note: This tool does not utilize the geoprocessing environments for parallel processing or processor type. The settings for parallel processing are controlled using the tool parameters.

Running the tool from ArcGIS Pro

You can run the tool in ArcGIS Pro just like any other geoprocessing tool. You just need to connect to the provided Python toolbox from the Catalog Pane either in the Toolboxes section or the Folders section.

If you plan to use ArcGIS Online or a portal as your network data source, make sure you're connected to that portal in your current Pro session.

Screenshot of tool dialog

Running the tool from standalone Python

You can call the tool from your own standalone Python script.

As with any custom script tool, you must first import the toolbox within your standalone script: arcpy.ImportToolbox(<full path to LargeNetworkAnalysisTools.pyt>)

Then, you can call the tool in your script: arcpy.LargeNetworkAnalysisTools.SolveLargeAnalysisWithKnownPairs(<tool parameters>)

Here is the full tool signature:

arcpy.LargeNetworkAnalysisTools.SolveLargeAnalysisWithKnownPairs(
    Origins, Origin_Unique_ID_Field, Destinations, Destination_Unique_ID_Field,
    OD_Pair_Type, Assigned_Destination_Field, OD_Pair_Table,
    Pair_Table_Origin_Unique_ID_Field, Pair_Table_Destination_Unique_ID_Field,
    Network_Data_Source, Travel_Mode, Time_Units, Distance_Units,
    Max_Pairs_Per_Chunk, Max_Processes, Output_Routes,
    Time_Of_Day, Barriers, Precalculate_Network_Locations, Sort_Origins, Reverse_Direction
)
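For example, a call using the many-to-many pair table option might look like the sketch below. All paths, field names, and travel mode names are assumptions to replace with your own; the parameter order follows the signature above.

import arcpy

arcpy.ImportToolbox(r"C:\tools\large-network-analysis-tools\LargeNetworkAnalysisTools.pyt")

arcpy.LargeNetworkAnalysisTools.SolveLargeAnalysisWithKnownPairs(
    r"C:\data\inputs.gdb\Stores",                        # Origins
    "StoreID",                                           # Origin_Unique_ID_Field
    r"C:\data\inputs.gdb\Warehouses",                    # Destinations
    "WarehouseID",                                       # Destination_Unique_ID_Field
    "A separate table defines the origin-destination pairs (many-to-many)",  # OD_Pair_Type
    None,                                                # Assigned_Destination_Field (one-to-one only)
    r"C:\data\od_pairs.csv",                             # OD_Pair_Table
    "OriginID",                                          # Pair_Table_Origin_Unique_ID_Field
    "DestinationID",                                     # Pair_Table_Destination_Unique_ID_Field
    r"C:\data\network.gdb\Transportation\Streets_ND",    # Network_Data_Source
    "Driving Time",                                      # Travel_Mode
    "Minutes",                                           # Time_Units
    "Miles",                                             # Distance_Units
    1000,                                                # Max_Pairs_Per_Chunk
    4,                                                   # Max_Processes
    r"C:\data\outputs.gdb\PairedRoutes",                 # Output_Routes
    None,                                                # Time_Of_Day (time neutral)
    None,                                                # Barriers
    True,                                                # Precalculate_Network_Locations
    False,                                               # Sort_Origins (one-to-one only)
    False                                                # Reverse_Direction (one-to-one only)
)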

You can also run the provided scripts by directly calling solve_large_route_pair_analysis.py from the command line instead of using the geoprocessing tool as the code's gateway. Call python solve_large_route_pair_analysis.py -h to print the command line help to show you how to do this.

Technical explanation of how this tool works

The tool consists of several scripts:

  • LargeNetworkAnalysisTools.pyt: This defines the python toolbox and the tool as you see it in the ArcGIS Pro UI. It does some minimal parameter validation and calls solve_large_route_pair_analysis.py to actually run the analysis.
  • solve_large_route_pair_analysis.py: This defines a class, RoutePairSolver, that validates and preprocesses the inputs and then calls parallel_route_pairs.py as a subprocess to do the parallel solves. The class also parses log messages from the parallel_route_pairs.py and writes them out as geoprocessing messages.
  • parallel_route_pairs.py: This script chunks the inputs, solves the Route analyses in parallel, and postprocesses the results.
  • rt_config.py: In this file, you can override some Route analysis settings that are not included in the tool dialog. This is provided to make the scripts easier to customize so you don't have to dig through the more complex parts of the code to find these simple settings.
  • helpers.py: Contains some helper methods and global variables.

Why do we have both solve_large_route_pair_analysis.py and parallel_route_pairs.py? Why do we call parallel_route_pairs.py as a subprocess? This is necessary to accommodate running this tool from the ArcGIS Pro UI. A script tool running in the ArcGIS Pro UI cannot directly call multiprocessing using concurrent.futures. We must instead spin up a subprocess, and the subprocess must spawn parallel processes for the calculations. Thus, solve_large_route_pair_analysis.py does all the pre-processing in the main python process, but it passes the inputs to parallel_route_pairs.py as a separate subprocess, and that subprocess can, in turn, spin up parallel processes for the Route calculations.

Unit tests are available in the unittests folder and can help identify problems if you're editing the code.

Parallel Calculate Locations tool

The Parallel Calculate Locations tool can be used to efficiently precalculate network locations for a large dataset by chunking up the input feature class and calculating the network locations in parallel.

Note: This tool is provided in case the only thing you want to do is calculate network locations for a large dataset. If you're going to run the Solve Large OD Cost Matrix or Solve Large Analysis With Known OD Pairs tools, those tools can automatically precalculate network locations when you run them, and they use the same parallelized logic as the Parallel Calculate Locations tool.

Parallel Calculate Locations tool inputs

The tool inputs are similar to those in the core Calculate Locations tool. Please see that tool's official documentation for more details about some of the parameters.

  • Input Features (Python: Input_Features) - The point feature class or layer whose network locations you want to calculate.
  • Output Features (Python: Output_Features) - The catalog path to the output feature class. Unlike in the core Calculate Locations tool, this tool generates a new feature class instead of merely adding fields to the original. A new feature class must be generated during the parallel processing, and as a result, the ObjectIDs may change, so we ask the user to specify an output feature class path instead of overwriting the original. We also do this to avoid accidentally deleting the user's original data if the tool errors.
  • Network Dataset (Python: Network_Dataset) - Network dataset or network dataset layer to use when calculating network locations.
  • Maximum Features per Chunk (Python: Max_Features_Per_Chunk) - Defines the number of features that will be in each chunk in the parallel processing.
  • Maximum Number of Parallel Processes (Python: Max_Processes) - Defines the maximum number of parallel processes to run at once. Do not exceed the number of logical processors of your machine.
  • Travel Mode (Python: Travel_Mode) - Network travel mode to use when calculating network locations. This parameter is optional.
  • Search Tolerance (Python: Search_Tolerance) - The maximum search distance that will be used when locating the input features on the network. Features that are outside the search tolerance will be left unlocated. The default is 5000 meters.
  • Search Criteria (Python: Search_Criteria) - The network dataset source feature classes on which input features are allowed to locate. The default locatable sources for the network will be used if you don't specify a value for this parameter.
  • Search Query (Python: Search_Query) - An optional query for each network dataset source feature class filtering the source features that can be located on. By default, no query is used for any source.

Note: This tool does not utilize the geoprocessing environments for parallel processing or processor type. The settings for parallel processing are controlled using the tool parameters.

Running the tool from ArcGIS Pro

You can run the tool in ArcGIS Pro just like any other geoprocessing tool. You just need to connect to the provided Python toolbox from the Catalog Pane either in the Toolboxes section or the Folders section.

Screenshot of tool dialog

Note: Limitations of arcpy prevented us from using the standard SQL query builder control in the tool UI for the Search Query parameter, so you must specify the SQL query expression manually as a string. The tool does some validation to ensure that the strings are usable, but it doesn't provide any help in constructing them. The easiest way to get the queries right is as follows:

  1. Open the core Calculate Locations tool (the standard one in the Network Analyst Tools toolbox).

  2. Set the input features and the network dataset.

  3. Use the Search Query control in the Calculate Locations tool to construct the queries you want using the SQL expression builder.

    Screenshot of Calculate Locations tool query builder

  4. Click the SQL button on the query builder to see the raw SQL syntax and copy it.

    Screenshot of Calculate Locations tool query string

  5. Paste the SQL query string into the Parallel Calculate Locations tool dialog.

    Screenshot of Parallel Calculate Locations tool search query parameter

Running the tool from standalone Python

You can call the tool from your own standalone Python script.

As with any custom script tool, you must first import the toolbox within your standalone script: arcpy.ImportToolbox(<full path to LargeNetworkAnalysisTools.pyt>)

Then, you can call the tool in your script: arcpy.LargeNetworkAnalysisTools.ParallelCalculateLocations(<tool parameters>)

Here is the full tool signature:

arcpy.LargeNetworkAnalysisTools.ParallelCalculateLocations(
    Input_Features, Output_Features, Network_Dataset,
    Max_Features_Per_Chunk, Max_Processes,
    Travel_Mode, Search_Tolerance, Search_Criteria, Search_Query
)
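For example, a call with hypothetical paths and values might look like the sketch below; the parameter order follows the signature above, and the optional parameters are left empty.

import arcpy

arcpy.ImportToolbox(r"C:\tools\large-network-analysis-tools\LargeNetworkAnalysisTools.pyt")

arcpy.LargeNetworkAnalysisTools.ParallelCalculateLocations(
    r"C:\data\inputs.gdb\Customers",                     # Input_Features
    r"C:\data\outputs.gdb\Customers_Located",            # Output_Features
    r"C:\data\network.gdb\Transportation\Streets_ND",    # Network_Dataset
    50000,                                               # Max_Features_Per_Chunk
    4,                                                   # Max_Processes
    "Driving Time",                                      # Travel_Mode (optional)
    "5000 Meters",                                       # Search_Tolerance
    None,                                                # Search_Criteria (use network defaults)
    None                                                 # Search_Query (optional)
)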

Tool output

The output feature class will be a copy of the input feature class with the network location fields appended. Because the original ObjectIDs may have shifted, the output feature class includes an ORIG_OID field with the values of the original ObjectID. (If the feature class already had an ORIG_OID field, the new field may be called ORIG_OID1, ORIG_OID2, etc.)
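If you need the network location fields back on your original feature class, the ORIG_OID field provides the join key. Here is a minimal sketch, assuming hypothetical paths and the standard location fields added by Calculate Locations (such as SourceID, SourceOID, PosAlong, and SideOfEdge); check your output for the exact field list.

import arcpy

# Hypothetical paths for the original data and the tool's output.
original_fc = r"C:\data\inputs.gdb\Customers"
located_fc = r"C:\data\outputs.gdb\Customers_Located"

# Join the location fields back to the original features via ORIG_OID.
arcpy.management.JoinField(
    in_data=original_fc,
    in_field="OBJECTID",
    join_table=located_fc,
    join_field="ORIG_OID",
    fields=["SourceID", "SourceOID", "PosAlong", "SideOfEdge"],
)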

Technical explanation of how this tool works

The tool consists of several scripts:

  • LargeNetworkAnalysisTools.pyt: This defines the python toolbox and the tool as you see it in the ArcGIS Pro UI. It does some minimal parameter validation, makes a backup copy of the input locations, and calls parallel_calculate_locations.py as a subprocess to do the parallel processing.
  • parallel_calculate_locations.py: This script chunks the inputs and calculates the locations in parallel.
  • helpers.py: Contains some helper methods and global variables.

Calling parallel_calculate_locations.py as a subprocess is necessary to accommodate running this tool from the ArcGIS Pro UI. A script tool running in the ArcGIS Pro UI cannot directly call multiprocessing using concurrent.futures. We must instead spin up a subprocess, and the subprocess must spawn parallel processes for the calculations.

Unit tests are available in the unittests folder and can help identify problems if you're editing the code.

Resources

Issues

Find a bug or want to request a new feature? Please let us know by submitting an issue.

Contributing

Esri welcomes contributions from anyone and everyone. Please see our guidelines for contributing.

Licensing

Copyright 2023 Esri

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

A copy of the license is available in the repository's license.txt file.


large-network-analysis-tools's Issues

Enhance "Solve Large Analysis With Known OD Pairs" tool to do many-to-many problems and accept a table of OD pairs

First GitHub issue report. Please excuse any GitHub contributing guidelines faux pas...

I am executing the "Solve Large Analysis With Known OD Pairs" tool via a Python script. The script does the following:

  1. Reads an input OD Pairs csv file with 6 required variables: Origin_ID, Origin_Longitude, Origin_Latitude, Destination_ID, Destination_Longitude, Destination_Latitude
  2. Creates two layers as required by the tool: (1) Origins (Origin_ID, Origin_Longitude, Origin_Latitude, Destination_ID; unique on the combination of Origin_ID and Destination_ID); and (2) Destinations (Destination_ID, Destination_Longitude, Destination_Latitude; unique on Destination_ID).
  3. Executes the tool given user inputs and using the layers created in Step 2
  4. Does a table-to-table conversion on the output layer to return a CSV file with the calculated travel metrics merged onto a copy of the original file

However, I am encountering an unexpected error at step 3 when attempting to execute the tool. Specifically:

arcgisscripting.ExecuteError: Non-unique values were found in the unique ID field Origin_ID in <MappingLayerObject object at 0x0000026C2BE445E0>.
Invalid inputs.
Failed to execute (SolveLargeAnalysisWithKnownPairs).

This error message is surprising to me: the origins in the Origins input layer -- and hence the Origin_ID values -- are expected to be non-unique given the requirement to include for each origin the set of assigned destinations. Right? Or should I read the following literally to say that each origin can only be assigned a single destination:

Assigned Destination Field (Python: Assigned_Destination_Field) - A field in origins indicating the ID of the destination each origin is assigned to. Any origin with a null value or a value that does not match a valid destination ID will be ignored in the analysis.

If that's the case, no issue per se, but it would be helpful if the tool were generalized to accept a list of OD pairs, with each pair unique on the combination of origin ID and destination ID but not necessarily on either individually.

P.S. if this does in fact turn out to be a question of functionality rather than an issue, here's my minor issue contribution: the GitHub landing page for large-network-analysis-tools (https://github.com/Esri/large-network-analysis-tools) has a minor oversight in the "Solve Large Analysis With Known OD Pairs tool" section. The tool signature under the "Running the tool from standalone Python" subsection references the right arguments but the wrong function: "arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix" rather than "arcpy.LargeNetworkAnalysisTools.SolveLargeAnalysisWithKnownPairs" as above.

Thank you in advance!

Known pairs issues/questions

I am using the known pairs tool, and I noticed that when I put in 1/3/1990 7:20:00 AM, I got 6/8/2022 7:20 AM showing in the results. Is there something I am doing wrong? My goal is to get the traffic time for a Wednesday at 7:20:00 AM. I am also having an issue with how to set the curb approach for the known pairs tool; any guidance on getting that to work would be helpful. I have been comparing the times between the known pairs tool and the large OD cost matrix tool, and they are not turning out the same; I suspect that the differences may be caused by the issues described above.

Modifying for Service Area

How do I adapt this code to utilise Service Area analysis rather than OD Cost Matrix, and to also loop through multiple polygon barrier files?

Write logger messages from subprocess to GP UI

The script tool spins up a subprocess so it can launch further subprocesses with concurrent.futures to solve OD cost matrices in parallel. Currently the GP UI does not show any of the logger messages from the subprocess because it can't access them.

I found a way to do this. See this blog post:
https://fabianlee.org/2019/09/15/python-getting-live-output-from-subprocess-using-poll/

I am prototyping this for another project and will push updates here once I'm comfortable with the solution and have some time. Initial tests show that it actually does work.

Create input parameters to designate unique ID fields for the input origins and destinations

Suggestion from a user: It would be useful if users could specify the origin and destination fields to use as identifiers.

This user got tripped up matching the output lines to their original input data since the OriginOID and DestinationOID values refer to the values in the output Origins and Destinations tables instead of the input.

Resolving this would involve adding more joins on the output but would not be horrendously difficult.

Empty geometries in Line-FeatureClass result of SolveLargeODCostMatrix

Hi,

first, thank you for your work here!

I have a question about the featureClass Output of the SolveLargeODCostMatrix tool.
I use ArcGIS Pro 2.9.5 (Python 3.7.11) under Windows 10, 64-bit and downloaded the latest version of the tools here.

The tool SolveLargeODCostMatrix runs without an error or a warning from the Toolbox in ArcGIS Pro.
The table of the Output OD Matrix is fine: Total_Time, Total_Distance, and DestinationRank are calculated correctly.
Just the line features between origins and destinations are missing. There are empty geometries in the line layer output, and Shape_Length in the table is 0.
I tried a lot: different origin and destination input sources (from 5 up to a few hundred points), different projections of all layers (EPSG 4326 and 25832); nothing worked, always empty geometries. If I use the OD Cost Matrix from the Network Analyst toolbar with the same input, everything is fine.

My settings were (I tried some changes, but nothing changed my empty geometry results):
Travel mode "Driving time", Unit "Meters", max origins "100" up to "1000", parallel processes "4", Output OD "Feature class", cutoff "", Number of Destinations "2", Time of day "", Precalculate "Yes", Sort "Yes".

Is this a known issue?

Regards Christoph

Time and date

Hi, is there a way to put in time_of_day or some kind of day and time for the origins? Such as a Wednesday at 8 pm, or using Unix time converted to milliseconds?

Is it possible to set the Calculate Network Locations to parallel processing as well?

I regularly have to process 13m+ points which get updated monthly. It currently takes me well over a day to calculate the network locations before sending them through the OD Matrix part, and each month I have to redo them from scratch (they are new locations, so I can't reuse old values).

Is it possible to spread the Calculate Network Locations tool over multiple processes in the same way as the OD Matrix part?

That level of Python is well above my level of understanding, but any updates to the tool, or tips on how best to pre-process the data would be greatly appreciated!

@mmorang

Solve Large Analysis with known OD Pairs tool fails

I have been trying to use the tool within ArcGIS Pro, with a GTFS network dataset that I have developed. However, when running the tool, it fails and I get the following message in Pro:

File "C:\Users\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\concurrent\futures\process.py", line 239, in _process_worker

I have double-checked the data, and they don't appear to have any issue. Note that I have also performed route analysis using the same data and network dataset successfully.

I am not sure what the above message means. Any advice?

Thanks!

Curb approach

Is it possible to get curb approach added as a feature to this tool? I would like to set the curb approach for everything to the right hand side of the vehicle. I understand that I would have to do something with field mappings, but I am unsure how to do that.
