layers
build_data_map
Assign a GeoDataFrame to a rustalgos.graph.NetworkStructure. A NetworkStructure provides the backbone for the calculation of land-use and statistical aggregations over the network. Points will be assigned to the closest street edge. Polygons will be assigned to up to the closest n_nearest_candidates adjacent street edges.
Parameters
A GeoDataFrame representing data points. The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.
A rustalgos.graph.NetworkStructure. Best generated with the io.network_structure_from_nx function.
The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.
An optional column name for data point keys. This is used for deduplicating points representing a shared source of information. For example, where a single greenspace is represented by many entrances as datapoints, only the nearest entrance (from a respective location) will be considered (during aggregations) when the points share a datapoint identifier.
A GeoDataFrame representing barriers. These barriers will be considered during the assignment of data points to the network.
The number of nearest street edge candidates to consider when assigning data points to the network. This is used to determine the best assignments based on proximity. Edges are sorted by distance and the closest n_nearest_candidates are considered.
Returns
A rustalgos.data.DataMap instance.
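The assignment logic described above (sort edges by distance, keep the closest n_nearest_candidates within the maximum distance) can be sketched in plain Python. This is an illustration only and not the library's Rust implementation: the helper names `point_segment_distance` and `nearest_edge_candidates` are hypothetical, edges are simplified to straight segments, and barriers are ignored.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # degenerate segment: fall back to point-to-point distance
        return math.hypot(px - ax, py - ay)
    # project p onto the segment and clamp the projection to its ends
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_edge_candidates(point, edges, max_dist, n_nearest_candidates):
    """Return up to n (edge_key, distance) pairs within max_dist, nearest first."""
    scored = [(key, point_segment_distance(point, a, b)) for key, (a, b) in edges.items()]
    within = [(key, dist) for key, dist in scored if dist <= max_dist]
    within.sort(key=lambda item: item[1])
    return within[:n_nearest_candidates]


# hypothetical street edges as (start, end) coordinate pairs in metres
edges = {
    "e1": ((0.0, 0.0), (100.0, 0.0)),
    "e2": ((0.0, 50.0), (100.0, 50.0)),
    "e3": ((0.0, 500.0), (100.0, 500.0)),
}
candidates = nearest_edge_candidates(
    (50.0, 10.0), edges, max_dist=100.0, n_nearest_candidates=2
)
print(candidates)
```

Here the third edge is dropped because it lies beyond the maximum assignment distance, and the remaining two are returned nearest first.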
compute_accessibilities
Compute land-use accessibilities for the specified land-use classification keys over the street network. The landuses are aggregated and computed over the street network relative to the network nodes, with the implication that the measures are generated from the same locations as those used for centrality computations.
Parameters
A GeoDataFrame representing data points. The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.
The column label from which to take landuse categories, e.g. a column labelled “landuse_categories” might contain “shop”, “pub”, “school”, etc.
Land-use keys for which to compute accessibilities. The keys should be selected from the same land-use schema used for the landuse_labels parameter, e.g. “pub”. The calculations will be performed in both weighted (wt) and non-weighted (nw) variants.
A GeoDataFrame representing nodes. Best generated with the io.network_structure_from_nx function. The outputs of calculations will be written to this GeoDataFrame, which is then returned from the function.
A rustalgos.graph.NetworkStructure. Best generated with the io.network_structure_from_nx function.
The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.
Distances corresponding to the local thresholds to be used for calculations. The β values for distance-weighted metrics will be determined implicitly using min_threshold_wt. If the distances parameter is not provided, then the betas or minutes parameter must be provided instead.
A list of β values to be used for the exponential decay function for weighted metrics. The distance thresholds for unweighted metrics will be determined implicitly. If the betas parameter is not provided, then the distances or minutes parameter must be provided instead.
A list of walking times in minutes to be used for calculations. The thresholds for unweighted metrics and for distance-weighted metrics will be determined implicitly using the speed_m_s and min_threshold_wt parameters. If the minutes parameter is not provided, then the distances or betas parameters must be provided instead.
An optional column name for data point keys. This is used for deduplicating points representing a shared source of information. For example, where a single greenspace is represented by many entrances as datapoints, only the nearest entrance (from a respective location) will be considered (during aggregations) when the points share a datapoint identifier.
A GeoDataFrame representing barriers. These barriers will be considered during the assignment of data points to the network.
Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations and distances.
Tolerance in metres indicating a spatial buffer for datapoint accuracy. Intended for situations where datapoint locations are not precise. If greater than zero, weighted functions will clip the spatial impedance curve above weights corresponding to the given spatial tolerance and normalise to the new range. For background, see rustalgos.clip_weights_curve.
The number of nearest candidates to consider when assigning respective data points to the nearest adjacent streets.
The default min_threshold_wt parameter can be overridden to generate custom mappings between the distance and beta parameters. See rustalgos.distances_from_beta for more information.
The default speed_m_s parameter can be configured to generate custom mappings between walking times and distance thresholds.
The scale of random jitter to add to shortest path calculations, useful for situations with highly rectilinear grids or for smoothing metrics on messy network representations. A random sample is drawn from a range of zero to one and is then multiplied by the specified jitter_scale. This random value is added to the shortest path calculations to provide random variation to the paths traced through the network. When working with shortest paths in metres, the random value represents distance in metres. When using a simplest path heuristic, the jitter will represent angular change in degrees.
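The relationship between the distances, betas, and minutes parameters can be sketched as follows. This is a minimal illustration, assuming a negative exponential decay in which the weight at the distance threshold equals min_threshold_wt; the default values below (`exp(-4)` and 4/3 m/s) are assumptions and should be checked against the installed version. The helper names are local stand-ins, not the library's functions.

```python
import math

# Assumed defaults; verify against your cityseer version.
MIN_THRESHOLD_WT = math.exp(-4)  # roughly 0.0183
SPEED_M_S = 4 / 3  # roughly 1.33 metres per second


def beta_from_distance(distance, min_threshold_wt=MIN_THRESHOLD_WT):
    """Beta such that exp(-beta * distance) equals min_threshold_wt."""
    return -math.log(min_threshold_wt) / distance


def distance_from_beta(beta, min_threshold_wt=MIN_THRESHOLD_WT):
    """Distance at which exp(-beta * d) has decayed to min_threshold_wt."""
    return -math.log(min_threshold_wt) / beta


def distance_from_minutes(minutes, speed_m_s=SPEED_M_S):
    """Walking time in minutes converted to a distance threshold in metres."""
    return minutes * 60 * speed_m_s


for dist in [200, 400, 800]:
    print(dist, round(beta_from_distance(dist), 4))
print(round(distance_from_minutes(10)))
```

Under these assumptions a 400m threshold corresponds to a beta of approximately 0.01, and a 10 minute walk corresponds to approximately 800m.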
Returns
The input nodes_gdf parameter is returned with additional columns populated with the calculated metrics. Three columns will be returned for each input landuse class and distance combination: a simple count of reachable locations, a distance-weighted count of reachable locations, and the smallest distance to the nearest location.
The input data_gdf is returned with two additional columns: nearest_assigned and next_nearest_assign.
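To make the three returned measures concrete, the following sketch computes them for a single node from a list of already-known network distances. It is a simplified illustration under stated assumptions: the function name `accessibility_for_node` and the input tuples are hypothetical, the weighting is a plain negative exponential, and `math.inf` is used as this sketch's own placeholder when nothing is reachable. It also shows the deduplication behaviour, where only the nearest datapoint is kept for entries sharing an identifier.

```python
import math


def accessibility_for_node(reachable, distance, beta, target_key):
    """Return (count, weighted count, nearest distance) for one landuse key.

    reachable: iterable of (data_id, landuse, network_distance) tuples.
    """
    nearest_per_id = {}
    for data_id, landuse, dist in reachable:
        if landuse != target_key or dist > distance:
            continue
        # deduplicate: keep only the nearest datapoint per shared identifier
        if data_id not in nearest_per_id or dist < nearest_per_id[data_id]:
            nearest_per_id[data_id] = dist
    dists = list(nearest_per_id.values())
    count_nw = len(dists)
    count_wt = sum(math.exp(-beta * d) for d in dists)
    nearest = min(dists) if dists else math.inf
    return count_nw, count_wt, nearest


# hypothetical datapoints: park_1 has two entrances sharing one identifier
reachable = [
    ("park_1", "park", 120.0),
    ("park_1", "park", 300.0),
    ("park_2", "park", 380.0),
    ("pub_1", "pub", 50.0),
    ("park_3", "park", 900.0),
]
count_nw, count_wt, nearest = accessibility_for_node(
    reachable, distance=400, beta=0.01, target_key="park"
)
print(count_nw, round(count_wt, 3), nearest)
```

The two entrances of `park_1` count once, `park_3` falls outside the 400m threshold, and the pub is ignored because it belongs to a different key.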
Notes
from cityseer.metrics import networks, layers
from cityseer.tools import mock, graphs, io
# prepare a mock graph
G = mock.mock_graph()
G = graphs.nx_simple_geoms(G)
nodes_gdf, edges_gdf, network_structure = io.network_structure_from_nx(G)
print(nodes_gdf.head())
landuses_gdf = mock.mock_landuse_categorical_data(G)
print(landuses_gdf.head())
nodes_gdf, landuses_gdf = layers.compute_accessibilities(
    data_gdf=landuses_gdf,
    landuse_column_label="categorical_landuses",
    accessibility_keys=["a", "c"],
    nodes_gdf=nodes_gdf,
    network_structure=network_structure,
    distances=[200, 400, 800],
)
print(nodes_gdf.columns)
# weighted form
print(nodes_gdf["cc_c_400_wt"])
# non-weighted form
print(nodes_gdf["cc_c_400_nw"])
# nearest distance to landuse
print(nodes_gdf["cc_c_nearest_max_800"])
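The column labels printed above suggest a naming pattern that can be used to select outputs programmatically. The helper below is hypothetical and the pattern is inferred only from the three labels shown in the example (`cc_c_400_wt`, `cc_c_400_nw`, and `cc_c_nearest_max_800`), so it should be checked against `nodes_gdf.columns` before being relied upon.

```python
def expected_accessibility_columns(keys, distances):
    """Build column labels following the pattern seen in the example output."""
    columns = []
    for key in keys:
        for dist in distances:
            columns.append(f"cc_{key}_{dist}_nw")  # non-weighted count
            columns.append(f"cc_{key}_{dist}_wt")  # distance-weighted count
        # nearest distance appears to be reported once, for the largest threshold
        columns.append(f"cc_{key}_nearest_max_{max(distances)}")
    return columns


columns = expected_accessibility_columns(["a", "c"], [200, 400, 800])
print(columns)
```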
compute_mixed_uses
Compute landuse metrics. This function wraps the underlying Rust-optimised functions for aggregating and computing various mixed-use measures. These are computed simultaneously for any required combinations of measures (and distances). By default, Hill and distance-weighted Hill measures will be computed, but the available flags, e.g. compute_hill or compute_shannon, can be used to configure which classes of measures should run.
See the accompanying paper on arXiv for additional information about methods for computing mixed-use measures at the pedestrian scale.
The data is aggregated and computed over the street network, so mixed-use and land-use accessibility aggregations are generated from the same locations as centrality computations and can therefore be correlated or otherwise compared with them. The outputs of the calculations are written to the corresponding node indices in the same nodes_gdf GeoDataFrame used for centrality methods, where the calculated metrics appear under correspondingly labelled columns.
Parameters
A GeoDataFrame representing data points. The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.
The column label from which to take landuse categories, e.g. a column labelled “landuse_categories” might contain “shop”, “pub”, “school”, etc.
A GeoDataFrame representing nodes. Best generated with the io.network_structure_from_nx function. The outputs of calculations will be written to this GeoDataFrame, which is then returned from the function.
A rustalgos.graph.NetworkStructure. Best generated with the io.network_structure_from_nx function.
The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.
Compute Hill diversity. This is the recommended form of diversity index. Computed for q of 0, 1, and 2.
Compute distance weighted Hill diversity. This is the recommended form of diversity index. Computed for q of 0, 1, and 2.
Compute Shannon entropy. Hill diversity of q=1 is generally preferable.
Compute the Gini-Simpson form of diversity index. Hill diversity of q=2 is generally preferable.
Distances corresponding to the local thresholds to be used for calculations. The β values for distance-weighted metrics will be determined implicitly using min_threshold_wt. If the distances parameter is not provided, then the betas or minutes parameter must be provided instead.
A list of β values to be used for the exponential decay function for weighted metrics. The distance thresholds for unweighted metrics will be determined implicitly. If the betas parameter is not provided, then the distances or minutes parameter must be provided instead.
A list of walking times in minutes to be used for calculations. The thresholds for unweighted metrics and for distance-weighted metrics will be determined implicitly using the speed_m_s and min_threshold_wt parameters. If the minutes parameter is not provided, then the distances or betas parameters must be provided instead.
An optional column name for data point keys. This is used for deduplicating points representing a shared source of information. For example, where a single greenspace is represented by many entrances as datapoints, only the nearest entrance (from a respective location) will be considered (during aggregations) when the points share a datapoint identifier.
A GeoDataFrame representing barriers. These barriers will be considered during the assignment of data points to the network.
Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations and distances.
Tolerance in metres indicating a spatial buffer for datapoint accuracy. Intended for situations where datapoint locations are not precise. If greater than zero, weighted functions will clip the spatial impedance curve above weights corresponding to the given spatial tolerance and normalise to the new range. For background, see rustalgos.clip_weights_curve.
The number of nearest candidates to consider when assigning respective data points to the nearest adjacent streets.
The default min_threshold_wt parameter can be overridden to generate custom mappings between the distance and beta parameters. See rustalgos.distances_from_beta for more information.
The default speed_m_s parameter can be configured to generate custom mappings between walking times and distance thresholds.
The scale of random jitter to add to shortest path calculations, useful for situations with highly rectilinear grids or for smoothing metrics on messy network representations. A random sample is drawn from a range of zero to one and is then multiplied by the specified jitter_scale. This random value is added to the shortest path calculations to provide random variation to the paths traced through the network. When working with shortest paths in metres, the random value represents distance in metres. When using a simplest path heuristic, the jitter will represent angular change in degrees.
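The effect of the spatial tolerance on weighted measures can be sketched as below. This is an interpretation of the description above rather than the library's code: the function `clipped_weight` is hypothetical, and it assumes a negative exponential impedance curve that is capped at the weight corresponding to the tolerance distance and then rescaled so that the cap maps to 1.

```python
import math


def clipped_weight(dist, beta, spatial_tolerance=0.0):
    """Negative exponential weight, clipped and normalised for a spatial tolerance."""
    raw = math.exp(-beta * dist)
    if spatial_tolerance <= 0:
        return raw
    # the weight at the tolerance distance becomes the ceiling of the curve
    max_wt = math.exp(-beta * spatial_tolerance)
    # clip anything above the ceiling, then rescale to the 0 to 1 range
    return min(raw, max_wt) / max_wt


beta = 0.01
for dist in [0, 25, 50, 100, 400]:
    print(dist, round(clipped_weight(dist, beta), 3), round(clipped_weight(dist, beta, 50), 3))
```

With a tolerance of 50m, every datapoint within 50m receives the full weight of 1, so small positional errors near the node no longer change the result.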
Returns
The input nodes_gdf parameter is returned with additional columns populated with the calculated metrics.
The input data_gdf is returned with two additional columns: nearest_assigned and next_nearest_assign.
Notes
| key | formula | notes |
|---|---|---|
| hill | | Hill diversity: this is the preferred form of diversity metric because it adheres to the replication principle and uses units of effective species instead of measures of information or uncertainty. The q parameter controls the degree of emphasis on the richness of species as opposed to the balance of species. Over-emphasis on balance can be misleading in an urban context, for which reason research finds support for using q=0: this reduces to a simple count of distinct land-uses. |
| hill_wt | | This is a distance-weighted variant of Hill Diversity based on the distances from the point of computation to the nearest example of a particular land-use. It therefore gives a locally representative indication of the intensity of mixed-uses. The distance weighting is a negative exponential function where β controls the strength of the decay. (β is provided by the Network Layer, see rustalgos.distances_from_beta.) |
| shannon | | Shannon diversity (or information entropy) is one of the classic diversity indices. Note that it is preferable to use Hill Diversity with q=1, which is effectively a transformation of Shannon diversity into units of effective species. |
| gini | | Gini-Simpson is another classic diversity index. It can behave problematically because it does not adhere to the replication principle and places emphasis on the balance of species, which can be counter-productive for purposes of measuring mixed-uses. Note that where an emphasis on balance is desired, it is preferable to use Hill Diversity with q=2, which is effectively a transformation of Gini-Simpson diversity into units of effective species. |
hill_wt at q=0 is generally the best choice for granular landuse data, or else q=1 or q=2 for increasingly crude landuse classification schemas.
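The relationships between these indices can be checked with a short sketch using the standard textbook definitions of Hill numbers, Shannon entropy, and Gini-Simpson diversity. It covers only the unweighted forms and is not the library's implementation; the function names are chosen for illustration and the inputs are assumed to be non-empty counts of land-uses per class.

```python
import math


def proportions(counts):
    """Convert class counts to proportions, dropping empty classes."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def hill(counts, q):
    """Hill number of order q, in units of effective species."""
    p = proportions(counts)
    if math.isclose(q, 1.0):
        # the limit as q approaches 1 is the exponential of Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi**q for pi in p) ** (1 / (1 - q))


def shannon(counts):
    """Shannon entropy: -sum(p * ln(p))."""
    return -sum(pi * math.log(pi) for pi in proportions(counts))


def gini_simpson(counts):
    """Gini-Simpson index: 1 - sum(p^2)."""
    return 1 - sum(pi**2 for pi in proportions(counts))


four_classes = [5, 5, 5, 5]
eight_classes = [5] * 8
for q in (0, 1, 2):
    print(q, hill(four_classes, q), hill(eight_classes, q))
print(gini_simpson(four_classes), gini_simpson(eight_classes))
```

Doubling the number of equally common classes doubles every Hill number (the replication principle), whereas Gini-Simpson only moves from 0.75 to 0.875. The sketch also confirms that Hill at q=1 equals the exponential of Shannon entropy and Hill at q=2 equals 1 / (1 - Gini-Simpson).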
A worked example:
from cityseer.metrics import networks, layers
from cityseer.tools import mock, graphs, io
# prepare a mock graph
G = mock.mock_graph()
G = graphs.nx_simple_geoms(G)
nodes_gdf, edges_gdf, network_structure = io.network_structure_from_nx(G)
print(nodes_gdf.head())
landuses_gdf = mock.mock_landuse_categorical_data(G)
print(landuses_gdf.head())
nodes_gdf, landuses_gdf = layers.compute_mixed_uses(
    data_gdf=landuses_gdf,
    landuse_column_label="categorical_landuses",
    nodes_gdf=nodes_gdf,
    network_structure=network_structure,
    distances=[200, 400, 800],
)
# the data is written to the GeoDataFrame
print(nodes_gdf.columns)
# access accordingly, e.g. hill diversity at q=0 and 800m
print(nodes_gdf["cc_hill_q0_800_nw"])
Be cognisant that mixed-use and land-use accessibility measures are sensitive to the classification schema that has been used. Meaningful comparisons from one location to another are only possible where the same schemas have been applied.
compute_stats
Compute numerical statistics over the street network. This function wraps the underlying rust optimised function for computing statistical measures. The data is aggregated and computed over the street network relative to the network nodes, with the implication that statistical aggregations are generated from the same locations as for centrality computations, which can therefore be correlated or otherwise compared.
Parameters
A GeoDataFrame representing data points. The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.
The column labels corresponding to the columns in data_gdf from which to take numerical information.
A GeoDataFrame representing nodes. Best generated with the io.network_structure_from_nx function. The outputs of calculations will be written to this GeoDataFrame, which is then returned from the function.
A rustalgos.graph.NetworkStructure. Best generated with the io.network_structure_from_nx function.
The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.
Distances corresponding to the local thresholds to be used for calculations. The β values for distance-weighted metrics will be determined implicitly using min_threshold_wt. If the distances parameter is not provided, then the betas or minutes parameter must be provided instead.
A list of β values to be used for the exponential decay function for weighted metrics. The distance thresholds for unweighted metrics will be determined implicitly. If the betas parameter is not provided, then the distances or minutes parameter must be provided instead.
A list of walking times in minutes to be used for calculations. The thresholds for unweighted metrics and for distance-weighted metrics will be determined implicitly using the speed_m_s and min_threshold_wt parameters. If the minutes parameter is not provided, then the distances or betas parameters must be provided instead.
An optional column name for data point keys. This is used for deduplicating points representing a shared source of information. For example, where a single greenspace is represented by many entrances as datapoints, only the nearest entrance (from a respective location) will be considered (during aggregations) when the points share a datapoint identifier.
A GeoDataFrame representing barriers. These barriers will be considered during the assignment of data points to the network.
Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations and distances.
Tolerance in metres indicating a spatial buffer for datapoint accuracy. Intended for situations where datapoint locations are not precise. If greater than zero, weighted functions will clip the spatial impedance curve above weights corresponding to the given spatial tolerance and normalise to the new range. For background, see rustalgos.clip_weights_curve.
The number of nearest candidates to consider when assigning respective data points to the nearest adjacent streets.
The default min_threshold_wt parameter can be overridden to generate custom mappings between the distance and beta parameters. See rustalgos.distances_from_beta for more information.
The default speed_m_s parameter can be configured to generate custom mappings between walking times and distance thresholds.
The scale of random jitter to add to shortest path calculations, useful for situations with highly rectilinear grids or for smoothing metrics on messy network representations. A random sample is drawn from a range of zero to one and is then multiplied by the specified jitter_scale. This random value is added to the shortest path calculations to provide random variation to the paths traced through the network. When working with shortest paths in metres, the random value represents distance in metres. When using a simplest path heuristic, the jitter will represent angular change in degrees.
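The jitter described above amounts to adding a scaled uniform random value to an impedance. The sketch below shows only that arithmetic; the function `jittered_impedance` is hypothetical, and applying the jitter once per edge is an assumption of this example rather than a statement about where the library applies it.

```python
import random


def jittered_impedance(edge_impedance, jitter_scale, rng):
    """Add a uniform random value in [0, 1) multiplied by jitter_scale."""
    return edge_impedance + rng.random() * jitter_scale


rng = random.Random(0)  # seeded so the illustration is repeatable
# a 100m edge with a jitter scale of 2 varies between 100m and 102m
samples = [jittered_impedance(100.0, 2.0, rng) for _ in range(5)]
print([round(s, 2) for s in samples])
```

On a rectilinear grid many routes share an identical length, so even a small jitter_scale is enough to vary which of the tied paths is traced.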
Returns
The input nodes_gdf parameter is returned with additional columns populated with the calculated metrics.
The input data_gdf is returned with two additional columns: nearest_assigned and next_nearest_assign.
Notes
A worked example:
from cityseer.metrics import networks, layers
from cityseer.tools import mock, graphs, io
# prepare a mock graph
G = mock.mock_graph()
G = graphs.nx_simple_geoms(G)
nodes_gdf, edges_gdf, network_structure = io.network_structure_from_nx(G)
print(nodes_gdf.head())
numerical_gdf = mock.mock_numerical_data(G, num_arrs=3)
print(numerical_gdf.head())
nodes_gdf, numerical_gdf = layers.compute_stats(
    data_gdf=numerical_gdf,
    stats_column_label="mock_numerical_1",
    nodes_gdf=nodes_gdf,
    network_structure=network_structure,
    distances=[200, 400, 800],
)
print(nodes_gdf.columns)
# weighted form
print(nodes_gdf["cc_mock_numerical_1_mean_400_wt"])
# non-weighted form
print(nodes_gdf["cc_mock_numerical_1_mean_400_nw"])
The following stat types will be available for each stats_key for each of the computed distances:
- max and min
- sum and sum_wt
- mean and mean_wt
- count and count_wt
- median and median_wt
- variance and variance_wt
- mad and mad_wt (deviation from the median)
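As a rough guide to what these statistics represent, the sketch below computes most of them for a single node from values with known network distances. It is an illustration under assumptions: the function `stats_for_node` is hypothetical, the weights are a plain negative exponential of distance, the weighted mean and variance are normalised by the sum of weights, and the weighted median and weighted mad are omitted because their exact definition is not given here.

```python
import math
import statistics


def stats_for_node(values_and_dists, distance, beta):
    """Summarise (value, network_distance) pairs reachable within distance."""
    within = [(v, d) for v, d in values_and_dists if d <= distance]
    if not within:
        return {}
    values = [v for v, _ in within]
    weights = [math.exp(-beta * d) for _, d in within]
    count = len(values)
    count_wt = sum(weights)
    total = sum(values)
    total_wt = sum(w * v for w, v in zip(weights, values))
    mean = total / count
    mean_wt = total_wt / count_wt
    median = statistics.median(values)
    return {
        "max": max(values),
        "min": min(values),
        "sum": total,
        "sum_wt": total_wt,
        "mean": mean,
        "mean_wt": mean_wt,
        "count": count,
        "count_wt": count_wt,
        "median": median,
        "variance": sum((v - mean) ** 2 for v in values) / count,
        "variance_wt": sum(w * (v - mean_wt) ** 2 for w, v in zip(weights, values)) / count_wt,
        # median absolute deviation from the median
        "mad": statistics.median([abs(v - median) for v in values]),
    }


# hypothetical values: the third lies beyond the 400m threshold
result = stats_for_node([(10.0, 0.0), (20.0, 100.0), (30.0, 1000.0)], distance=400, beta=0.01)
print(result)
```

The weighted mean is pulled towards the value at the node itself because nearer datapoints carry larger weights.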