layers
assign_gdf_to_network
Assign a GeoDataFrame to a rustalgos.NetworkStructure. A NetworkStructure provides the backbone for the calculation of land-use and statistical aggregations over the network. Data points will be assigned to the two closest network nodes (one in either direction) based on the closest adjacent street edge. This facilitates a dynamic spatial aggregation strategy which will select the shortest distance to a data point relative to either direction of approach.
Parameters
A GeoDataFrame representing data points. The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.
The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.
An optional column name for data point keys. This is used for deduplicating points representing a shared source of information. For example, where a single greenspace is represented by many entrances as datapoints, only the nearest entrance (from a respective location) will be considered (during aggregations) when the points share a datapoint identifier.
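The deduplication behaviour described for shared data point keys can be pictured with a minimal plain-Python sketch. This is not cityseer's implementation: the `(data_key, distance)` tuple layout and the helper name `nearest_per_key` are hypothetical and exist only for illustration.

```python
def nearest_per_key(reachable):
    """Keep only the nearest data point for each shared key.

    `reachable` is a list of (data_key, distance_m) tuples as seen from a
    single network node. The layout is hypothetical, for illustration only.
    """
    nearest = {}
    for data_key, dist in reachable:
        if data_key not in nearest or dist < nearest[data_key]:
            nearest[data_key] = dist
    return nearest


# three entrances to the same greenspace plus one standalone playground
reachable_from_node = [
    ("park_1", 320.0),
    ("park_1", 150.0),
    ("park_1", 410.0),
    ("playground_7", 275.0),
]
deduplicated = nearest_per_key(reachable_from_node)
print(deduplicated)
```

From this node the greenspace is counted once, at the distance of its closest entrance, rather than three times.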
Returns
A rustalgos.DataMap instance.
The input data_gdf is returned with two additional columns: nearest_assigned and next_nearest_assign.
Notes
The max_assign_dist parameter should not be set overly low. It sets a crow-flies distance limit on how far the algorithm will search in its attempts to encircle the data point. If max_assign_dist is too small, then the algorithm is potentially hampered from finding a starting node; or, if a node is found, it may have to terminate exploration prematurely because it cannot travel sufficiently far from the data point to explore the surrounding network. If too many data points are not being successfully assigned to the correct street edges, then this distance should be increased. Conversely, if most of the data points are satisfactorily assigned, then it may be possible to decrease this threshold. A distance of around 400m may provide a good starting point.
The precision of assignment improves on decomposed networks (see graphs.nx_decompose), which offers the additional benefit of a more granular representation of variations of metrics along street-fronts.
Example assignment on a non-decomposed graph.
Assignment of data to network nodes becomes more contextually precise on decomposed graphs.
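The idea of assigning a data point to the two nodes of its closest street edge, subject to a maximum crow-flies distance, can be sketched geometrically. This is a deliberately simplified illustration and not the algorithm used by cityseer, which explores the network around each data point; the function names and the dictionary-based graph below are hypothetical.

```python
import math


def point_segment_distance(pt, a, b):
    """Shortest crow-flies distance from point pt to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # project pt onto the segment and clamp to the segment's extent
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def assign_to_nearest_edge(pt, nodes, edges, max_dist):
    """Return (nearest_node, next_nearest_node), or (None, None) if too far."""
    best_edge, best_dist = None, math.inf
    for start, end in edges:
        dist = point_segment_distance(pt, nodes[start], nodes[end])
        if dist < best_dist:
            best_edge, best_dist = (start, end), dist
    if best_edge is None or best_dist > max_dist:
        return None, None
    # order the two end nodes of the edge by their distance to the data point
    nearest, next_nearest = sorted(
        best_edge,
        key=lambda n: math.hypot(pt[0] - nodes[n][0], pt[1] - nodes[n][1]),
    )
    return nearest, next_nearest


nodes = {"a": (0.0, 0.0), "b": (100.0, 0.0), "c": (100.0, 100.0)}
edges = [("a", "b"), ("b", "c")]
assigned = assign_to_nearest_edge((30.0, 10.0), nodes, edges, max_dist=400)
unassigned = assign_to_nearest_edge((30.0, 10.0), nodes, edges, max_dist=5)
print(assigned, unassigned)
```

With a generous limit the point is assigned to both ends of the closest edge; with a limit of 5m no edge is within reach and the point stays unassigned, which mirrors the advice not to set the maximum distance too low. On a decomposed graph the edges are shorter, so the two assigned nodes sit closer to the data point.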
compute_accessibilities
Compute land-use accessibilities for the specified land-use classification keys over the street network. The landuses are aggregated and computed over the street network relative to the network nodes, with the implication that the measures are generated from the same locations as those used for centrality computations.
Parameters
A GeoDataFrame representing data points. The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.
The column label from which to take landuse categories, e.g. a column labelled “landuse_categories” might contain “shop”, “pub”, “school”, etc.
Land-use keys for which to compute accessibilities. The keys should be selected from the same land-use schema used in the landuse_column_label column, e.g. “pub”. The calculations will be performed in both weighted (wt) and non-weighted (nw) variants.
A GeoDataFrame representing nodes. Best generated with the io.network_structure_from_nx function. The outputs of calculations will be written to this GeoDataFrame, which is then returned from the function.
A rustalgos.NetworkStructure. Best generated with the io.network_structure_from_nx function.
The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.
Distances corresponding to the local distance thresholds to be used for calculations. The β parameters (for distance-weighted metrics) will be determined implicitly. If the distances parameter is not provided, then the betas parameter must be provided instead.
A β, or array of β, to be used for the exponential decay function for weighted metrics. The distance parameters for unweighted metrics will be determined implicitly. If the betas parameter is not provided, then the distances parameter must be provided instead.
An optional column name for data point keys. This is used for deduplicating points representing a shared source of information. For example, where a single greenspace is represented by many entrances as datapoints, only the nearest entrance (from a respective location) will be considered (during aggregations) when the points share a datapoint identifier.
Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations and distances.
Tolerance in metres indicating a spatial buffer for datapoint accuracy. Intended for situations where datapoint locations are not precise. If greater than zero, weighted functions will clip the spatial impedance curve above weights corresponding to the given spatial tolerance and normalise to the new range. For background, see rustalgos.clip_weights_curve.
The default min_threshold_wt parameter can be overridden to generate custom mappings between the distance and beta parameters. See rustalgos.distances_from_beta for more information.
The scale of random jitter to add to shortest path calculations, useful for situations with highly rectilinear grids or for smoothing metrics on messy network representations. A random sample is drawn from a range of zero to one and is then multiplied by the specified jitter_scale. This random value is added to the shortest path calculations to provide random variation to the paths traced through the network. When working with shortest paths in metres, the random value represents distance in metres. When using a simplest path heuristic, the jitter will represent angular change in degrees.
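The jitter mechanism described for jitter_scale can be sketched in a few lines. This is only an illustration of the arithmetic, assuming a uniform sample; the helper name `jittered_cost` is hypothetical and the library's internal sampling may differ.

```python
import random


def jittered_cost(path_cost, jitter_scale, rng):
    """Add a random offset in [0, jitter_scale) to a path cost."""
    return path_cost + rng.random() * jitter_scale


rng = random.Random(0)  # seeded only so that the sketch is repeatable
base_cost = 250.0  # metres for shortest paths, degrees for simplest paths
samples = [jittered_cost(base_cost, jitter_scale=2.0, rng=rng) for _ in range(5)]
print(samples)
```

With a jitter_scale of 2, each traced path cost is perturbed by less than 2 units, which is enough to break ties between equal-length routes on a rectilinear grid without materially changing distances.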
Returns
The input nodes_gdf parameter is returned with additional columns populated with the calculated metrics. Three columns will be returned for each input landuse class and distance combination: a simple count of reachable locations, a distance-weighted count of reachable locations, and the smallest distance to the nearest location.
The input data_gdf is returned with two additional columns: nearest_assigned and next_nearest_assign.
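The three kinds of output described above (a count, a distance-weighted count, and a nearest distance) can be illustrated for a single node with plain Python. This is a sketch of the concept rather than the library's code: the helper name, the input list of network distances, and the use of infinity as a placeholder when nothing is reachable are all assumptions.

```python
import math


def accessibility_summary(distances_m, max_dist, beta):
    """Summarise reachable locations of one land-use class from one node."""
    reachable = [d for d in distances_m if d <= max_dist]
    count_nw = len(reachable)  # simple count
    count_wt = sum(math.exp(-beta * d) for d in reachable)  # decayed count
    nearest = min(reachable) if reachable else math.inf  # placeholder if none
    return count_nw, count_wt, nearest


# hypothetical network distances (in metres) from one node to each pub
pub_distances = [120.0, 380.0, 650.0]
count_nw, count_wt, nearest = accessibility_summary(
    pub_distances, max_dist=400, beta=0.01
)
print(count_nw, round(count_wt, 4), nearest)
```

The weighted count is always at most the simple count, because each reachable location contributes a weight between zero and one that shrinks with distance.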
Notes
from cityseer.metrics import networks, layers
from cityseer.tools import mock, graphs, io
# prepare a mock graph
G = mock.mock_graph()
G = graphs.nx_simple_geoms(G)
nodes_gdf, edges_gdf, network_structure = io.network_structure_from_nx(G, crs=3395)
print(nodes_gdf.head())
landuses_gdf = mock.mock_landuse_categorical_data(G)
print(landuses_gdf.head())
nodes_gdf, landuses_gdf = layers.compute_accessibilities(
    data_gdf=landuses_gdf,
    landuse_column_label="categorical_landuses",
    accessibility_keys=["a", "c"],
    nodes_gdf=nodes_gdf,
    network_structure=network_structure,
    distances=[200, 400, 800],
)
print(nodes_gdf.columns)
# weighted form
print(nodes_gdf["cc_c_400_wt"])
# non-weighted form
print(nodes_gdf["cc_c_400_nw"])
# nearest distance to landuse
print(nodes_gdf["cc_c_nearest_max_800"])
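The implicit mapping between distance thresholds and β values can be sketched as follows. The relationship shown is the usual one for a negative exponential decay that falls to a cutoff weight at the distance threshold. The default cutoff of exp(-4) is an assumption here, as are the helper names; consult rustalgos.distances_from_beta for the behaviour of an installed version.

```python
import math

# assumed default cutoff weight, equal to exp(-4); verify against your version
MIN_THRESHOLD_WT = math.exp(-4)


def beta_from_distance(distance, min_threshold_wt=MIN_THRESHOLD_WT):
    """Decay rate at which exp(-beta * d) falls to the cutoff at `distance`."""
    return -math.log(min_threshold_wt) / distance


def distance_from_beta(beta, min_threshold_wt=MIN_THRESHOLD_WT):
    """Distance at which exp(-beta * d) falls to the cutoff weight."""
    return -math.log(min_threshold_wt) / beta


for dist in (200, 400, 800):
    beta = beta_from_distance(dist)
    print(dist, round(beta, 5), round(distance_from_beta(beta), 1))
```

Under this assumption a 400m threshold corresponds to a β of 0.01, and overriding min_threshold_wt changes how far the decay curve extends before it is cut off.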
compute_mixed_uses
Compute landuse metrics. This function wraps the underlying rust optimised functions for aggregating and computing various mixed-use measures. These are computed simultaneously for any required combinations of measures (and distances). By default, Hill and distance-weighted Hill measures will be computed, but the available flags, e.g. compute_hill or compute_shannon, can be used to configure which classes of measures should run.
See the accompanying paper on arXiv for additional information about methods for computing mixed-use measures at the pedestrian scale.
The data is aggregated and computed over the street network, with the implication that mixed-use and land-use accessibility aggregations are generated from the same locations as for centrality computations, which can therefore be correlated or otherwise compared. The outputs of the calculations are written to the corresponding node indices in the same nodes_gdf GeoDataFrame used for centrality methods, which will display the calculated metrics under correspondingly labelled columns.
Parameters
A GeoDataFrame representing data points. The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.
The column label from which to take landuse categories, e.g. a column labelled “landuse_categories” might contain “shop”, “pub”, “school”, etc.
A GeoDataFrame representing nodes. Best generated with the io.network_structure_from_nx function. The outputs of calculations will be written to this GeoDataFrame, which is then returned from the function.
A rustalgos.NetworkStructure. Best generated with the io.network_structure_from_nx function.
The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.
Compute Hill diversity. This is the recommended form of diversity index. Computed for q of 0, 1, and 2.
Compute distance weighted Hill diversity. This is the recommended form of diversity index. Computed for q of 0, 1, and 2.
Compute Shannon entropy. Hill diversity of q=1 is generally preferable.
Compute the Gini-Simpson form of diversity index. Hill diversity of q=2 is generally preferable.
Distances corresponding to the local distance thresholds to be used for calculations. The β parameters (for distance-weighted metrics) will be determined implicitly. If the distances parameter is not provided, then the betas parameter must be provided instead.
A β, or array of β, to be used for the exponential decay function for weighted metrics. The distance parameters for unweighted metrics will be determined implicitly. If the betas parameter is not provided, then the distances parameter must be provided instead.
An optional column name for data point keys. This is used for deduplicating points representing a shared source of information. For example, where a single greenspace is represented by many entrances as datapoints, only the nearest entrance (from a respective location) will be considered (during aggregations) when the points share a datapoint identifier.
Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations and distances.
Tolerance in metres indicating a spatial buffer for datapoint accuracy. Intended for situations where datapoint locations are not precise. If greater than zero, weighted functions will clip the spatial impedance curve above weights corresponding to the given spatial tolerance and normalise to the new range. For background, see rustalgos.clip_weights_curve.
The default min_threshold_wt parameter can be overridden to generate custom mappings between the distance and beta parameters. See rustalgos.distances_from_beta for more information.
The scale of random jitter to add to shortest path calculations, useful for situations with highly rectilinear grids or for smoothing metrics on messy network representations. A random sample is drawn from a range of zero to one and is then multiplied by the specified jitter_scale. This random value is added to the shortest path calculations to provide random variation to the paths traced through the network. When working with shortest paths in metres, the random value represents distance in metres. When using a simplest path heuristic, the jitter will represent angular change in degrees.
Returns
The input nodes_gdf parameter is returned with additional columns populated with the calculated metrics.
The input data_gdf is returned with two additional columns: nearest_assigned and next_nearest_assign.
Notes
key | formula | notes |
---|---|---|
hill | $\left(\sum_{i}^{S} p_{i}^{q}\right)^{1/(1-q)}$ for $q \geq 0,\ q \neq 1$; $\exp\left(-\sum_{i}^{S} p_{i} \log p_{i}\right)$ in the limit $q \to 1$, where $p_{i}$ is the proportion of land-use class $i$ and $S$ is the number of classes | Hill diversity: this is the preferred form of diversity metric because it adheres to the replication principle and uses units of effective species instead of measures of information or uncertainty. The q parameter controls the degree of emphasis on the richness of species as opposed to the balance of species. Over-emphasis on balance can be misleading in an urban context, for which reason research finds support for using q=0: this reduces to a simple count of distinct land-uses. |
hill_wt | | This is a distance-weighted variant of Hill Diversity based on the distances from the point of computation to the nearest example of a particular land-use. It therefore gives a locally representative indication of the intensity of mixed-uses. The distance weighting is a negative exponential function where β controls the strength of the decay. (β is provided by the Network Layer, see rustalgos.distances_from_beta.) |
shannon | $-\sum_{i}^{S} p_{i} \log p_{i}$ | Shannon diversity (or information entropy) is one of the classic diversity indices. Note that it is preferable to use Hill Diversity with q=1, which is effectively a transformation of Shannon diversity into units of effective species. |
gini | $1 - \sum_{i}^{S} p_{i}^{2}$ | Gini-Simpson is another classic diversity index. It can behave problematically because it does not adhere to the replication principle and places emphasis on the balance of species, which can be counter-productive for purposes of measuring mixed-uses. Note that where an emphasis on balance is desired, it is preferable to use Hill Diversity with q=2, which is effectively a transformation of Gini-Simpson diversity into units of effective species. |
hill_wt at q=0 is generally the best choice for granular landuse data, or else q=1 or q=2 for increasingly crude landuse classification schemas.
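The relationship between the unweighted diversity indices can be checked with a short plain-Python sketch that uses the standard textbook definitions of Hill numbers, Shannon entropy, and Gini-Simpson. It does not reproduce the distance-weighted hill_wt variant, and the function names are illustrative rather than part of the cityseer API.

```python
import math


def proportions(counts):
    """Convert per-class counts into proportions, skipping empty classes."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def hill_diversity(counts, q):
    """Hill number of order q, in units of effective species."""
    props = proportions(counts)
    if q == 1:
        # limit as q -> 1: the exponential of Shannon entropy
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p**q for p in props) ** (1 / (1 - q))


def shannon(counts):
    """Shannon entropy (natural log)."""
    return -sum(p * math.log(p) for p in proportions(counts))


def gini_simpson(counts):
    """Gini-Simpson index."""
    return 1 - sum(p**2 for p in proportions(counts))


# hypothetical counts of four land-use classes reachable from one node
landuse_counts = [4, 4, 1, 1]
for q in (0, 1, 2):
    print(q, round(hill_diversity(landuse_counts, q), 3))
```

At q=0 the result is the number of distinct land-uses; at q=1 it equals the exponential of Shannon entropy; at q=2 it equals the reciprocal of one minus Gini-Simpson. Higher orders of q yield lower values for uneven distributions because they place more emphasis on balance.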
A worked example:
from cityseer.metrics import networks, layers
from cityseer.tools import mock, graphs, io
# prepare a mock graph
G = mock.mock_graph()
G = graphs.nx_simple_geoms(G)
nodes_gdf, edges_gdf, network_structure = io.network_structure_from_nx(G, crs=3395)
print(nodes_gdf.head())
landuses_gdf = mock.mock_landuse_categorical_data(G)
print(landuses_gdf.head())
nodes_gdf, landuses_gdf = layers.compute_mixed_uses(
    data_gdf=landuses_gdf,
    landuse_column_label="categorical_landuses",
    nodes_gdf=nodes_gdf,
    network_structure=network_structure,
    distances=[200, 400, 800],
)
# the data is written to the GeoDataFrame
print(nodes_gdf.columns)
# access accordingly, e.g. hill diversity at q=0 and 800m
print(nodes_gdf["cc_hill_q0_800_nw"])
Be cognisant that mixed-use and land-use accessibility measures are sensitive to the classification schema that has been used. Meaningful comparisons from one location to another are only possible where the same schemas have been applied.
compute_stats
Compute numerical statistics over the street network. This function wraps the underlying rust optimised function for computing statistical measures. The data is aggregated and computed over the street network relative to the network nodes, with the implication that statistical aggregations are generated from the same locations as for centrality computations, which can therefore be correlated or otherwise compared.
Parameters
A GeoDataFrame representing data points. The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.
The column label corresponding to the column in data_gdf from which to take numerical information.
A GeoDataFrame representing nodes. Best generated with the io.network_structure_from_nx function. The outputs of calculations will be written to this GeoDataFrame, which is then returned from the function.
A rustalgos.NetworkStructure. Best generated with the io.network_structure_from_nx function.
The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.
Distances corresponding to the local distance thresholds to be used for calculations. The β parameters (for distance-weighted metrics) will be determined implicitly. If the distances parameter is not provided, then the betas parameter must be provided instead.
A β, or array of β, to be used for the exponential decay function for weighted metrics. The distance parameters for unweighted metrics will be determined implicitly. If the betas parameter is not provided, then the distances parameter must be provided instead.
An optional column name for data point keys. This is used for deduplicating points representing a shared source of information. For example, where a single greenspace is represented by many entrances as datapoints, only the nearest entrance (from a respective location) will be considered (during aggregations) when the points share a datapoint identifier.
Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations and distances.
Tolerance in metres indicating a spatial buffer for datapoint accuracy. Intended for situations where datapoint locations are not precise. If greater than zero, weighted functions will clip the spatial impedance curve above weights corresponding to the given spatial tolerance and normalise to the new range. For background, see rustalgos.clip_weights_curve.
The default min_threshold_wt parameter can be overridden to generate custom mappings between the distance and beta parameters. See rustalgos.distances_from_beta for more information.
The scale of random jitter to add to shortest path calculations, useful for situations with highly rectilinear grids or for smoothing metrics on messy network representations. A random sample is drawn from a range of zero to one and is then multiplied by the specified jitter_scale. This random value is added to the shortest path calculations to provide random variation to the paths traced through the network. When working with shortest paths in metres, the random value represents distance in metres. When using a simplest path heuristic, the jitter will represent angular change in degrees.
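The effect of a spatial tolerance on the weighting curve can be sketched as below. This is one reading of the description given for the spatial tolerance parameter (cap the weight at its value at the tolerance distance, then rescale so that the cap maps to one); it is an assumption and may not match rustalgos.clip_weights_curve in detail. The helper name is hypothetical.

```python
import math


def clipped_weight(distance, beta, spatial_tolerance):
    """Exponential decay weight, flattened within the tolerance and rescaled."""
    raw_wt = math.exp(-beta * distance)
    # weight at the tolerance distance: the highest weight allowed after clipping
    max_wt = math.exp(-beta * spatial_tolerance)
    # clip the top of the curve, then normalise back to the range 0 to 1
    return min(raw_wt, max_wt) / max_wt


beta = 0.01
for dist in (0, 25, 50, 100, 400):
    print(dist, round(clipped_weight(dist, beta, spatial_tolerance=50), 4))
```

Under this reading, every data point within 50m of the node receives the full weight of one, so small positional errors near the node no longer change the result, while more distant points keep their relative ordering.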
Returns
The input nodes_gdf parameter is returned with additional columns populated with the calculated metrics.
The input data_gdf is returned with two additional columns: nearest_assigned and next_nearest_assign.
Notes
A worked example:
from cityseer.metrics import networks, layers
from cityseer.tools import mock, graphs, io
# prepare a mock graph
G = mock.mock_graph()
G = graphs.nx_simple_geoms(G)
nodes_gdf, edges_gdf, network_structure = io.network_structure_from_nx(G, crs=3395)
print(nodes_gdf.head())
numerical_gdf = mock.mock_numerical_data(G, num_arrs=3)
print(numerical_gdf.head())
nodes_gdf, numerical_gdf = layers.compute_stats(
    data_gdf=numerical_gdf,
    stats_column_label="mock_numerical_1",
    nodes_gdf=nodes_gdf,
    network_structure=network_structure,
    distances=[200, 400, 800],
)
print(nodes_gdf.columns)
# weighted form
print(nodes_gdf["cc_mock_numerical_1_mean_400_wt"])
# non-weighted form
print(nodes_gdf["cc_mock_numerical_1_mean_400_nw"])
The following stat types will be available for each stats_key for each of the computed distances:
max and min
sum and sum_wt
mean and mean_wt
variance and variance_wt
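To make the difference between the plain and the _wt statistics concrete, the following plain-Python sketch computes each stat type for one node and one distance threshold. The weighting conventions used here (weights of exp(-β·d), a weighted sum of value times weight, and weight-normalised mean and variance) are assumptions for illustration and may differ from the library's internals; the function name and input layout are hypothetical.

```python
import math


def local_stats(values, distances_m, max_dist, beta):
    """Plain and distance-weighted statistics for data reachable from a node."""
    pairs = [
        (v, math.exp(-beta * d))
        for v, d in zip(values, distances_m)
        if d <= max_dist
    ]
    if not pairs:
        return None  # nothing reachable within the threshold
    vals = [v for v, _ in pairs]
    wt_total = sum(w for _, w in pairs)
    mean = sum(vals) / len(vals)
    sum_wt = sum(v * w for v, w in pairs)
    mean_wt = sum_wt / wt_total
    return {
        "max": max(vals),
        "min": min(vals),
        "sum": sum(vals),
        "sum_wt": sum_wt,
        "mean": mean,
        "mean_wt": mean_wt,
        "variance": sum((v - mean) ** 2 for v in vals) / len(vals),
        "variance_wt": sum(w * (v - mean_wt) ** 2 for v, w in pairs) / wt_total,
    }


# hypothetical values and network distances (in metres) from one node
stats = local_stats([10.0, 30.0, 50.0], [100.0, 300.0, 900.0], max_dist=400, beta=0.01)
print(stats)
```

Because the closer data point carries more weight, the weighted mean is pulled towards its value, whereas the plain mean treats both reachable points equally.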