cityseer.metrics.layers

dict_wgs_to_utm

dict_wgs_to_utm(data_dict)
                -> dict

Converts data dictionary x and y values from WGS84 lng, lat geographic coordinates to the local UTM projected coordinate system.

Parameters
data_dict
dict

A dictionary representing distinct data points, where each key represents a uid and each value represents a nested dictionary with x and y key-value pairs.

example_data_dict = {
    'uid_01': {
        'x': 6000956.463188213,
        'y': 600693.4059810264
    },
    'uid_02': {
        'x': 6000753.336609659,
        'y': 600758.7916663144
    }
}
Returns
dict

Returns a copy of the source dictionary with the x and y values converted to the local UTM coordinate system.

Notes
from cityseer.tools import mock
from cityseer.metrics import layers

# let's generate a mock data dictionary
G_wgs = mock.mock_graph(wgs84_coords=True)
# mock_data_dict takes on the same extents as the graph parameter
data_dict_WGS = mock.mock_data_dict(G_wgs, random_seed=25)
# the dictionary now contains wgs coordinates
for i, (key, value) in enumerate(data_dict_WGS.items()):
    print(key, value)
    # prints:
    # 0 {'x': -0.09600470559254023, 'y': 51.592916036617794}
    # 1 {'x': -0.10621770551738155, 'y': 51.58888719412964}
    if i == 1:
        break

# any data dictionary that follows this template can be passed to dict_wgs_to_utm()
data_dict_UTM = layers.dict_wgs_to_utm(data_dict_WGS)
# the coordinates have now been converted to UTM
for i, (key, value) in enumerate(data_dict_UTM.items()):
    print(key, value)
    # prints:
    # 0 {'x': 701144.5207785056, 'y': 5719758.706109629}
    # 1 {'x': 700455.0000341447, 'y': 5719282.703221394}
    if i == 1:
        break
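
The "local" UTM system depends on where the data sits. As a rough sketch, the standard zone formula can be applied to a WGS84 lng, lat pair to find the zone number and its EPSG code. This is not necessarily how dict_wgs_to_utm selects its zone; utm_zone_for is a hypothetical helper and the special-case zones around Norway and Svalbard are ignored.

```python
import math
from typing import Tuple


def utm_zone_for(lng: float, lat: float) -> Tuple[int, int]:
    """Return (UTM zone number, EPSG code) for a WGS84 lng, lat pair.

    Hypothetical helper for illustration only: it applies the standard
    6-degree zone formula and ignores the Norway / Svalbard exceptions.
    """
    if not -180 <= lng <= 180 or not -80 <= lat <= 84:
        raise ValueError('coordinates fall outside of the UTM domain')
    # zones are 6 degrees wide and numbered 1..60 eastwards from 180W
    zone = min(int(math.floor((lng + 180) / 6)) + 1, 60)
    # EPSG 326xx covers the northern hemisphere, 327xx the southern
    epsg = (32600 if lat >= 0 else 32700) + zone
    return zone, epsg


# the first mock WGS84 coordinate printed above (north London)
print(utm_zone_for(-0.09600470559254023, 51.592916036617794))
# prints: (30, 32630)
```

Zone 30 has its central meridian at 3°W, which is consistent with the converted x values of roughly 700 km shown in the example output.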

encode_categorical

encode_categorical(classes)
                   -> Tuple[tuple, np.ndarray]

Converts a list of land-use classes (or other categorical data) to an integer encoded version based on the unique elements.

Comment

It is generally not necessary to utilise this function directly. It will be called implicitly if calculating land-use metrics.

Parameters
classes
Union[list, tuple, np.ndarray]

A list, tuple or numpy array of classes to be encoded.

Returns
tuple

A tuple of unique class descriptors extracted from the input classes.

np.ndarray

A numpy array of the encoded classes. The value of the int encoding will correspond to the order of the class_descriptors.

Notes
from cityseer.metrics import layers

classes = ['cat', 'dog', 'cat', 'owl', 'dog']

class_descriptors, class_encodings = layers.encode_categorical(classes)
print(class_descriptors)  # prints: ('cat', 'dog', 'owl')
print(list(class_encodings))  # prints: [0, 1, 0, 2, 1]
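
The relationship between the two return values can be illustrated with a small pure-Python sketch. It assumes that the descriptors are the sorted unique classes, which agrees with the output printed above but is not stated explicitly; encode_sketch is a hypothetical stand-in and returns a list rather than a numpy array.

```python
def encode_sketch(classes):
    """Illustrative stand-in for encode_categorical (not the library code)."""
    # assumption: descriptors are the sorted unique classes
    descriptors = tuple(sorted(set(classes)))
    # each class is encoded as its position within the descriptors
    lookup = {label: idx for idx, label in enumerate(descriptors)}
    encodings = [lookup[label] for label in classes]
    return descriptors, encodings


classes = ['cat', 'dog', 'cat', 'owl', 'dog']
descriptors, encodings = encode_sketch(classes)

# the encodings index back into the descriptors, so the round trip is lossless
decoded = [descriptors[idx] for idx in encodings]
print(decoded == classes)  # prints: True
```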

data_map_from_dict

data_map_from_dict(data_dict)
                   -> Tuple[tuple, np.ndarray]

Converts a data dictionary into a numpy array for use by DataLayer classes.

Comment

It is generally not necessary to use this function directly. This function will be called implicitly when invoking DataLayerFromDict.

Parameters
data_dict
dict

A dictionary representing distinct data points, where each key represents a uid and each value represents a nested dictionary with x and y key-value pairs. The coordinates must be in a projected coordinate system matching that of the NetworkLayer to which the data will be assigned.

example_data_dict = {
    'uid_01': {
        'x': 6000956.463188213,
        'y': 600693.4059810264
    },
    'uid_02': {
        'x': 6000753.336609659,
        'y': 600758.7916663144
    }
}
Returns
tuple

A tuple of data uids corresponding to the data point identifiers in the source data_dict.

np.ndarray

A 2d numpy array representing the data points. The indices of the second dimension correspond as follows:

idx  property
0    x coordinate
1    y coordinate
2    assigned network index - nearest
3    assigned network index - next-nearest

The arrays at indices 2 and 3 will be initialised with np.nan. These will be populated when the DataLayer.assign_to_network method is invoked.
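
The column layout described above can be sketched as follows. This only mirrors the documented layout and is not the library's implementation: data_map_sketch is a hypothetical name, and plain nested lists with math.nan stand in for the 2d numpy array and np.nan so that the sketch has no dependencies.

```python
import math


def data_map_sketch(data_dict):
    """Illustrative layout only: returns (uids, rows) for a data dictionary."""
    uids = []
    rows = []
    for uid, coords in data_dict.items():
        uids.append(uid)
        # idx 0: x, idx 1: y, idx 2 and 3: unassigned network indices
        rows.append([float(coords['x']), float(coords['y']), math.nan, math.nan])
    return tuple(uids), rows


example_data_dict = {
    'uid_01': {'x': 6000956.463188213, 'y': 600693.4059810264},
    'uid_02': {'x': 6000753.336609659, 'y': 600758.7916663144}
}
uids, rows = data_map_sketch(example_data_dict)
print(uids)  # prints: ('uid_01', 'uid_02')
print(rows[0])  # prints: [6000956.463188213, 600693.4059810264, nan, nan]
```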

class DataLayer

Categorical data, such as land-use classifications, and numerical data can be assigned to the network as a DataLayer. A DataLayer represents the spatial locations of data points, and can be used to calculate various mixed-use, land-use accessibility, and statistical measures. Importantly, these measures are computed directly over the street network and offer distance-weighted variants; this combination makes them more contextually sensitive than methods based on simpler crow-flies aggregation.

The coordinates of data points should correspond as precisely as possible to the location of the feature in space; or, in the case of buildings, should ideally correspond to the location of the building entrance.

Note that in many cases, the DataLayerFromDict class will provide a more convenient alternative for instantiating this class.

DataLayer

DataLayer(data_uids,
          data_map)
Parameters
data_uids
Union[list, tuple]

A list or tuple of data identifiers corresponding to each data point. This list must be in the same order and of the same length as the data_map.

data_map
np.ndarray

A 2d numpy array representing the data points. The length of the first dimension should match that of the data_uids. The indices of the second dimension correspond as follows:

idx  property
0    x coordinate
1    y coordinate
2    assigned network index - nearest
3    assigned network index - next-nearest

The arrays at indices 2 and 3 will be populated when the DataLayer.assign_to_network method is invoked.

Returns
DataLayer

Returns a DataLayer.

Properties

DataLayer.uids

Unique ids corresponding to each location in the data_map.

DataLayer.data_x_arr

DataLayer.data_y_arr

DataLayer.Network

DataLayer.assign_to_network

DataLayer.assign_to_network(Network_Layer,
                            max_dist)

Once created, a DataLayer should be assigned to a NetworkLayer. The NetworkLayer provides the backbone for the localised spatial aggregation of data points over the street network. The measures will be computed over the same distance thresholds as used for the NetworkLayer.

The data points will be assigned to the two closest network nodes — one in either direction — based on the closest adjacent street edge. This enables a dynamic spatial aggregation method that more accurately describes distances over the network to data points, relative to the direction of approach.

Parameters
Network_Layer
networks.NetworkLayer
max_dist
Union[int, float]

The maximum distance to consider when assigning respective data points to the nearest adjacent network nodes.

Notes
Comment

The max_dist parameter should not be set too small. The assignment process has two steps: the first identifies the closest street node; the second sets out from this node and attempts to wind around the data point, akin to circling the block. The algorithm then reviews the discovered graph edges to identify the closest adjacent street-front. The max_dist parameter sets a crow-flies distance limit on how far the algorithm will search in its attempts to encircle the data point. If max_dist is too small, the algorithm may fail to find a starting node or, if a node is found, may have to terminate exploration prematurely because it cannot travel far enough from the data point to explore the surrounding network. If too many data points are not being assigned to the correct street edges, this distance should be increased. Conversely, if most data points are satisfactorily assigned, it may be possible to decrease the threshold. A distance of around 400m provides a good starting point.
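
The effect of max_dist on the first step can be illustrated with a brute-force crow-flies search. This is only a sketch of that first step under simplifying assumptions: nearest_node_within is a hypothetical helper, the library's own search is not shown in this documentation, and the second step (winding around the data point to find the adjacent street edge) is omitted entirely.

```python
import math


def nearest_node_within(data_xy, node_xys, max_dist):
    """Return the index of the closest node within max_dist, else None.

    Hypothetical helper: a brute-force stand-in for the first assignment step.
    """
    best_idx, best_dist = None, math.inf
    for idx, (node_x, node_y) in enumerate(node_xys):
        # crow-flies (euclidean) distance from the data point to the node
        dist = math.hypot(node_x - data_xy[0], node_y - data_xy[1])
        if dist <= max_dist and dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


nodes = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(nearest_node_within((90.0, 10.0), nodes, max_dist=400))  # prints: 1
# no node lies within 400m, so the data point would remain unassigned
print(nearest_node_within((900.0, 900.0), nodes, max_dist=400))  # prints: None
```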

Comment

The precision of assignment improves on decomposed networks (see graphs.nX_decompose), which offers the additional benefit of a more granular representation of variations in metrics along street-fronts.

Figure: Example assignment on a non-decomposed graph.

Figure: Assignment of data to network nodes becomes more contextually precise on decomposed graphs.

DataLayer.compute_aggregated

DataLayer.compute_aggregated(**kwargs)

This method is deprecated and, if invoked, will raise a DeprecationWarning. Please use compute_landuses or compute_stats instead.

DataLayer.compute_landuses

DataLayer.compute_landuses(landuse_labels,
                           mixed_use_keys=None,
                           accessibility_keys=None,
                           cl_disparity_wt_matrix=None,
                           qs=None,
                           angular=False)

This method wraps the underlying numba optimised functions for aggregating and computing various mixed-use and land-use accessibility measures. These are computed simultaneously for any required combinations of measures (and distances), which can have significant speed implications. Situations requiring only a single measure can instead make use of the simplified DataLayer.hill_diversity, DataLayer.hill_branch_wt_diversity, and DataLayer.compute_accessibilities methods.

See the accompanying paper on arXiv for additional information about methods for computing mixed-use measures at the pedestrian scale.

The data is aggregated and computed over the street network relative to the Network Layer nodes, with the implication that mixed-use and land-use accessibility aggregations are generated from the same locations as for centrality computations, which can therefore be correlated or otherwise compared. The outputs of the calculations are written to the corresponding node indices in the same NetworkLayer.metrics dictionary used for centrality methods, and will be categorised by the respective keys and parameters.

For example, if the hill and shannon mixed-use keys and the shops and factories accessibility keys are computed on a Network Layer instantiated with 800m and 1600m distance thresholds, then the dictionary would assume the following structure:

NetworkLayer.metrics = {
    'mixed_uses': {
        # note that hill measures have q keys
        'hill': {
            # here, q=0
            0: {
                800: [...],
                1600: [...]
            },
            # here, q=1
            1: {
                800: [...],
                1600: [...]
            }
        },
        # non-hill measures do not have q keys
        'shannon': {
            800: [...],
            1600: [...]
        }
    },
    'accessibility': {
        # accessibility keys are computed in both weighted and unweighted forms
        'weighted': {
            'shops': {
                800: [...],
                1600: [...]
            },
            'factories': {
                800: [...],
                1600: [...]
            }
        },
        'non_weighted': {
            'shops': {
                800: [...],
                1600: [...]
            },
            'factories': {
                800: [...],
                1600: [...]
            }
        }
    }
}
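
To make the nesting concrete, the following sketch walks a dictionary of this shape and lists the key path to each array. The dictionary is hand-built for illustration and its numeric values are invented placeholders; only the key layout follows the structure above, and iter_leaves is a hypothetical helper.

```python
# placeholder values: two nodes per array, numbers invented for illustration
metrics = {
    'mixed_uses': {
        'hill': {
            0: {800: [3.0, 2.0], 1600: [5.0, 4.0]},
            1: {800: [2.5, 1.8], 1600: [4.2, 3.1]}
        },
        'shannon': {800: [0.9, 0.6], 1600: [1.4, 1.1]}
    },
    'accessibility': {
        'weighted': {'shops': {800: [0.4, 0.1], 1600: [1.2, 0.7]}},
        'non_weighted': {'shops': {800: [2.0, 1.0], 1600: [6.0, 4.0]}}
    }
}


def iter_leaves(node, path=()):
    """Yield (key path, array) pairs for every array in a nested dictionary."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, path + (key,))
    else:
        yield path, node


for path, values in iter_leaves(metrics):
    print(path, values)

# hill paths carry a q key, whereas shannon paths do not
paths = [path for path, _ in iter_leaves(metrics)]
```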
Parameters
landuse_labels
Union[list, tuple, np.ndarray]

A set of land-use labels corresponding to the length and order of the data points. The labels should correspond to descriptors from the land-use schema, such as “retail” or “commercial”. This parameter is only required if computing mixed-uses or land-use accessibilities.

mixed_use_keys
Union[list, tuple]

An optional list of strings describing which mixed-use metrics to compute, containing any combination of the key values tabulated in the Notes, by default None.

accessibility_keys
Union[list, tuple]

An optional list or tuple of land-use classifications for which to calculate accessibilities. The keys should be selected from the same land-use schema used for the landuse_labels parameter, e.g. “retail”. The calculations will be performed in both weighted and non_weighted variants, by default None.

cl_disparity_wt_matrix
Union[list, tuple, np.ndarray]

A pairwise NxN disparity matrix numerically describing the degree of disparity between any pair of distinct land-uses. This parameter is only required if computing mixed-uses using hill_pairwise_disparity or raos_pairwise_disparity. The number and order of land-uses should match those implicitly generated by encode_categorical, by default None.

qs
Union[list, tuple, np.ndarray]

The values of q for which to compute Hill diversity. This parameter is only required if computing one of the Hill diversity mixed-use measures, by default None.

angular
bool

Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations and distances, by default False.

Notes
key, formula, notes:

hill
$\big(\sum_{i}^{S} p_{i}^{q}\big)^{1/(1-q)},\ q \geq 0,\ q \neq 1$; $\lim_{q \to 1} \exp\big(-\sum_{i}^{S} p_{i} \log p_{i}\big)$

Hill diversity: this is the preferred form of diversity metric because it adheres to the replication principle and uses units of effective species instead of measures of information or uncertainty. The q parameter controls the degree of emphasis on the richness of species as opposed to the balance of species. Over-emphasis on balance can be misleading in an urban context, for which reason research finds support for using q=0: this reduces to a simple count of distinct land-uses.

hill_branch_wt
$\big[\sum_{i}^{S} d_{i} \big(\frac{p_{i}}{\bar{T}}\big)^{q}\big]^{1/(1-q)}$; $\bar{T} = \sum_{i}^{S} d_{i} p_{i}$

This is a distance-weighted variant of Hill Diversity based on the distances from the point of computation to the nearest example of a particular land-use. It therefore gives a locally representative indication of the intensity of mixed-uses. $d_{i}$ is a negative exponential function where $\beta$ controls the strength of the decay. ($\beta$ is provided by the Network Layer, see distance_from_beta.)

hill_pairwise_wt
$\big[\sum_{i}^{S} \sum_{j \neq i}^{S} d_{ij} \big(\frac{p_{i} p_{j}}{Q}\big)^{q}\big]^{1/(1-q)}$; $Q = \sum_{i}^{S} \sum_{j \neq i}^{S} d_{ij} p_{i} p_{j}$

This is a pairwise-distance-weighted variant of Hill Diversity based on the respective distances between the closest examples of the pairwise distinct land-use combinations as routed through the point of computation. $d_{ij}$ represents a negative exponential function where $\beta$ controls the strength of the decay. ($\beta$ is provided by the Network Layer, see distance_from_beta.)

hill_pairwise_disparity
$\big[\sum_{i}^{S} \sum_{j \neq i}^{S} w_{ij} \big(\frac{p_{i} p_{j}}{Q}\big)^{q}\big]^{1/(1-q)}$; $Q = \sum_{i}^{S} \sum_{j \neq i}^{S} w_{ij} p_{i} p_{j}$

This is a disparity-weighted variant of Hill Diversity based on the pairwise disparities between land-uses. This variant requires the use of a disparity matrix provided through the cl_disparity_wt_matrix parameter.

shannon
$-\sum_{i}^{S} p_{i} \log p_{i}$

Shannon diversity (or information entropy) is one of the classic diversity indices. Note that it is preferable to use Hill Diversity with q=1, which is effectively a transformation of Shannon diversity into units of effective species.

gini_simpson
$1 - \sum_{i}^{S} p_{i}^{2}$

Gini-Simpson is another classic diversity index. It can behave problematically because it does not adhere to the replication principle and places emphasis on the balance of species, which can be counter-productive for purposes of measuring mixed-uses. Note that where an emphasis on balance is desired, it is preferable to use Hill Diversity with q=2, which is effectively a transformation of Gini-Simpson diversity into units of effective species.

raos_pairwise_disparity
$\sum_{i}^{S} \sum_{j \neq i}^{S} d_{ij} p_{i} p_{j}$

Rao diversity is a pairwise disparity measure and requires the use of a disparity matrix provided through the cl_disparity_wt_matrix parameter. It suffers from the same issues as Gini-Simpson. It is preferable to use disparity weighted Hill diversity with q=2.

Comment

The available choices of land-use diversity measures may seem overwhelming. hill_branch_wt paired with q=0 is generally the best choice for granular land-use data, or else q=1 or q=2 for increasingly crude land-use classification schemas.
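
The relationships between these measures can be checked numerically with a small pure-Python sketch of the hill, shannon, gini_simpson, and hill_branch_wt formulas. It is not the library's numba implementation: all function names are hypothetical, the proportions are taken from a flat list of labels rather than from data reachable over the network, and the distance weight is assumed to be exp(-beta * distance) with a positive beta (the library's sign convention may differ).

```python
import math
from collections import Counter


def proportions(landuse_labels):
    """Proportion p_i of each distinct land-use among the labels."""
    counts = Counter(landuse_labels)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def hill(probs, q):
    """Hill diversity, using the exponential-of-Shannon limit at q=1."""
    if q == 1:
        return math.exp(-sum(p * math.log(p) for p in probs))
    return sum(p ** q for p in probs) ** (1 / (1 - q))


def shannon(probs):
    return -sum(p * math.log(p) for p in probs)


def gini_simpson(probs):
    return 1 - sum(p ** 2 for p in probs)


def hill_branch_wt(probs, dists, q, beta):
    """Distance-weighted Hill diversity; dists are to the nearest example.

    Assumption: d_i = exp(-beta * distance) with beta > 0. The q=1 limit
    is left out of this sketch.
    """
    if q == 1:
        raise ValueError('the q=1 limit is not covered by this sketch')
    weights = [math.exp(-beta * d) for d in dists]
    t_bar = sum(w * p for w, p in zip(weights, probs))
    total = sum(w * (p / t_bar) ** q for w, p in zip(weights, probs))
    return total ** (1 / (1 - q))


probs = proportions(['retail', 'retail', 'office', 'cafe'])
print(hill(probs, 0))  # prints: 3.0
print(round(hill(probs, 1), 4))  # prints: 2.8284
print(round(hill(probs, 2), 4))  # prints: 2.6667
```

With q=0 the unweighted measure is a plain count of distinct land-uses, while q=1 and q=2 return the Shannon and Gini-Simpson indices expressed as effective species: exp(shannon) and 1 / (1 - gini_simpson) respectively. When every distance is zero the weights are all 1 and hill_branch_wt reduces to hill.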

A worked example:

from cityseer.metrics import networks, layers
from cityseer.tools import mock, graphs

# prepare a mock graph
G = mock.mock_graph()
G = graphs.nX_simple_geoms(G)

# generate the network layer
N = networks.NetworkLayerFromNX(G, distances=[200, 400, 800, 1600])

# prepare a mock data dictionary
data_dict = mock.mock_data_dict(G, random_seed=25)
# prepare some mock land-use classifications
landuses = mock.mock_categorical_data(len(data_dict), random_seed=25)

# generate a data layer
L = layers.DataLayerFromDict(data_dict)
# assign to the network
L.assign_to_network(N, max_dist=500)
# compute some metrics - here we'll use the full interface, see below for simplified interfaces
# FULL INTERFACE
# ==============
L.compute_landuses(landuse_labels=landuses,
                   mixed_use_keys=['hill'],
                   qs=[0, 1],
                   accessibility_keys=['c', 'd', 'e'])
# note that the above measures can optionally be run individually using simplified interfaces, e.g.
# SIMPLIFIED INTERFACES
# =====================
# L.hill_diversity(landuses, qs=[0])
# L.compute_accessibilities(landuses, ['a', 'b'])

# let's prepare some keys for accessing the computational outputs
# distance idx: any of the distances with which the NetworkLayer was initialised
distance_idx = 200
# q index: any of the invoked q parameters
q_idx = 0
# a node idx
node_idx = 0

# the data is available at N.metrics
print(N.metrics['mixed_uses']['hill'][q_idx][distance_idx][node_idx])
# prints: 4.0
print(N.metrics['accessibility']['weighted']['d'][distance_idx][node_idx])
# prints: 0.019168843947614676
print(N.metrics['accessibility']['non_weighted']['d'][distance_idx][node_idx])
# prints: 1.0

Note that the data can also be unpacked to a dictionary using NetworkLayer.metrics_to_dict, or transposed to a networkX graph using NetworkLayer.to_networkX.

Caution

Be cognisant that mixed-use and land-use accessibility measures are sensitive to the classification schema that has been used. Meaningful comparisons from one location to another are only possible where the same schemas have been applied.

DataLayer.hill_diversity

DataLayer.hill_diversity(landuse_labels,
                         qs=None)

Compute hill diversity for the provided landuse_labels at the specified values of q. See DataLayer.compute_landuses for additional information.

Parameters
landuse_labels
Union[list, tuple, np.ndarray]

A set of land-use labels corresponding to the length and order of the data points. The labels should correspond to descriptors from the land-use schema, such as “retail” or “commercial”.

qs
Union[list, tuple, np.ndarray]

The values of q for which to compute Hill diversity, by default None

Notes

The data key is hill, e.g.:

NetworkLayer.metrics['mixed_uses']['hill'][<<q key>>][<<distance key>>][<<node idx>>]

DataLayer.hill_branch_wt_diversity

DataLayer.hill_branch_wt_diversity(landuse_labels,
                                   qs=None)

Compute distance-weighted hill diversity for the provided landuse_labels at the specified values of q. See DataLayer.compute_landuses for additional information.

Parameters
landuse_labels
Union[list, tuple, np.ndarray]

A set of land-use labels corresponding to the length and order of the data points. The labels should correspond to descriptors from the land-use schema, such as “retail” or “commercial”.

qs
Union[list, tuple, np.ndarray]

The values of q for which to compute Hill diversity, by default None

Notes

The data key is hill_branch_wt, e.g.:

NetworkLayer.metrics['mixed_uses']['hill_branch_wt'][<<q key>>][<<distance key>>][<<node idx>>]

DataLayer.compute_accessibilities

DataLayer.compute_accessibilities(landuse_labels,
                                  accessibility_keys)

Compute land-use accessibilities for the specified land-use classification keys. See DataLayer.compute_landuses for additional information.

Parameters
landuse_labels
Union[list, tuple, np.ndarray]

A set of land-use labels corresponding to the length and order of the data points. The labels should correspond to descriptors from the land-use schema, such as “retail” or “commercial”.

accessibility_keys
Union[list, tuple]

The land-use keys for which to compute accessibilities. The keys should be selected from the same land-use schema used for the landuse_labels parameter, e.g. “retail”. The calculations will be performed in both weighted and non_weighted variants.

Notes

The data keys will correspond to the accessibility_keys specified, e.g. where computing retail accessibility:

NetworkLayer.metrics['accessibility']['weighted']['retail'][<<distance key>>][<<node idx>>]
NetworkLayer.metrics['accessibility']['non_weighted']['retail'][<<distance key>>][<<node idx>>]
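
The difference between the two variants can be sketched for a single node as follows. This assumes that the non_weighted form counts the matching data points reachable within the distance threshold and that the weighted form sums a negative exponential decay exp(-beta * distance) over the same points; both are assumptions for illustration. The network distances are supplied directly and accessibility_sketch is a hypothetical helper.

```python
import math


def accessibility_sketch(landuse_labels, distances, key, max_dist, beta):
    """Return (weighted, non_weighted) accessibility to key from one node.

    Assumptions: distances are network distances from the node to each data
    point, and the decay is exp(-beta * distance) with beta > 0.
    """
    weighted = 0.0
    non_weighted = 0.0
    for label, dist in zip(landuse_labels, distances):
        if label == key and dist <= max_dist:
            non_weighted += 1
            weighted += math.exp(-beta * dist)
    return weighted, non_weighted


labels = ['retail', 'retail', 'office', 'retail']
dists = [50, 300, 100, 900]
weighted, non_weighted = accessibility_sketch(labels, dists, 'retail', 800, 0.005)
# two retail points fall within 800m; the third lies beyond the threshold
print(non_weighted)  # prints: 2.0
```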

DataLayer.compute_stats

DataLayer.compute_stats(stats_keys,
                        stats_data_arrs,
                        angular=False)

This method wraps the underlying numba optimised functions for computing statistical measures. The data is aggregated and computed over the street network relative to the Network Layer nodes, with the implication that statistical aggregations are generated from the same locations as for centrality computations, which can therefore be correlated or otherwise compared. The outputs of the calculations are written to the corresponding node indices in the same NetworkLayer.metrics dictionary used for centrality methods, and will be categorised by the respective keys and parameters.

For example, if a valuations stats key is computed on a Network Layer instantiated with 800m and 1600m distance thresholds, then the dictionary would assume the following structure:

NetworkLayer.metrics = {
    'stats': {
        # stats grouped by each stats key
        'valuations': {
            # each stat will have the following key-value pairs
            'max': {
                800: [...],
                1600: [...]
            },
            'min': {
                800: [...],
                1600: [...]
            },
            'sum': {
                800: [...],
                1600: [...]
            },
            'sum_weighted': {
                800: [...],
                1600: [...]
            },
            'mean': {
                800: [...],
                1600: [...]
            },
            'mean_weighted': {
                800: [...],
                1600: [...]
            },
            'variance': {
                800: [...],
                1600: [...]
            },
            'variance_weighted': {
                800: [...],
                1600: [...]
            }
        }
    }
}
Parameters
stats_keys
Union[str, list, tuple]

If computing a single stat: a str key describing the stats computed for the stats_data_arrs parameter. If computing multiple stats: a list or tuple of keys. Computed stats will be saved under the supplied key to the N.metrics dictionary.

stats_data_arrs
Union[ List[Union[list, tuple, np.ndarray]], Tuple[Union[list, tuple, np.ndarray]], list, tuple, np.ndarray]

If computing a single stat: a 1d list, tuple or numpy array of numerical data, where the length corresponds to the number of data points in the DataLayer. If computing multiple stats keys: a 2d list, tuple, or numpy array of numerical data, where the first dimension corresponds to the number of keys in the stats_keys parameter and the second dimension corresponds to the number of data points in the DataLayer, e.g.:

# if computing three keys for a DataLayer containing 5 data points
stats_keys = ['valuations', 'floors', 'occupants']
stats_data_arrs = [
    [50000, 60000, 55000, 42000, 46000],  # valuations
    [3, 3, 2, 3, 5],  # floors
    [420, 300, 220, 250, 600]  # occupants
]
angular
bool

Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations and distances, by default False.

Notes

The data keys will correspond to the stats_keys parameter, e.g.:

NetworkLayer.metrics['stats']['valuations'][<<stat type>>][<<distance key>>][<<node idx>>]
NetworkLayer.metrics['stats']['floors'][<<stat type>>][<<distance key>>][<<node idx>>]
NetworkLayer.metrics['stats']['occupants'][<<stat type>>][<<distance key>>][<<node idx>>]

A worked example:

from cityseer.metrics import networks, layers
from cityseer.tools import mock, graphs

# prepare a mock graph
G = mock.mock_graph()
G = graphs.nX_simple_geoms(G)

# generate the network layer
N = networks.NetworkLayerFromNX(G, distances=[200, 400, 800, 1600])

# prepare a mock data dictionary
data_dict = mock.mock_data_dict(G, random_seed=25)
# let's prepare some numerical data
stats_data = mock.mock_numerical_data(len(data_dict), num_arrs=1, random_seed=25)

# generate a data layer
L = layers.DataLayerFromDict(data_dict)
# assign to the network
L.assign_to_network(N, max_dist=500)
# compute some metrics
L.compute_stats(stats_keys='mock_stat',
                stats_data_arrs=stats_data)
# let's prepare some keys for accessing the computational outputs
# distance idx: any of the distances with which the NetworkLayer was initialised
distance_idx = 200
# a node idx
node_idx = 0

# the data is available at N.metrics
print(N.metrics['stats']['mock_stat']['mean_weighted'][distance_idx][node_idx])
# prints: 71297.82967202332

Note that the data can also be unpacked to a dictionary using NetworkLayer.metrics_to_dict, or transposed to a networkX graph using NetworkLayer.to_networkX.

Comment

Per the above worked example, the following stat types will be available for each stats_key for each of the computed distances:

  • max and min
  • sum and sum_weighted
  • mean and mean_weighted
  • variance and variance_weighted
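
How the eight stat types relate to one another can be sketched for a single node. The weighted definitions below are assumptions for illustration rather than the library's confirmed formulas: weights are taken as exp(-beta * distance) with a positive beta, weighted means and variances are normalised by the sum of weights, and the variance is the population variance. stats_sketch is a hypothetical helper and network distances are supplied directly.

```python
import math


def stats_sketch(values, distances, max_dist, beta):
    """Return the eight stat types for one node, or None if nothing is reachable."""
    # keep only the data points within the distance threshold
    pairs = [(v, math.exp(-beta * d))
             for v, d in zip(values, distances) if d <= max_dist]
    if not pairs:
        return None
    vals = [v for v, _ in pairs]
    wt_total = sum(w for _, w in pairs)
    mean = sum(vals) / len(vals)
    sum_wt = sum(v * w for v, w in pairs)
    mean_wt = sum_wt / wt_total
    return {
        'max': max(vals),
        'min': min(vals),
        'sum': sum(vals),
        'sum_weighted': sum_wt,
        'mean': mean,
        'mean_weighted': mean_wt,
        'variance': sum((v - mean) ** 2 for v in vals) / len(vals),
        'variance_weighted': sum(w * (v - mean_wt) ** 2 for v, w in pairs) / wt_total
    }


valuations = [50000, 60000, 55000, 42000, 46000]
# the last data point lies beyond the 800m threshold and is excluded
stats = stats_sketch(valuations, [100, 200, 300, 400, 1000], max_dist=800, beta=0.005)
print(stats['max'], stats['sum'])  # prints: 60000 207000
```

Under these assumptions the weighted and unweighted forms coincide when every data point sits at distance zero, since all weights are then equal to 1.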

DataLayer.compute_stats_single

DataLayer.compute_stats_single(**kwargs)

This method is deprecated and, if invoked, will raise a DeprecationWarning. Please use compute_stats instead.

DataLayer.compute_stats_multiple

DataLayer.compute_stats_multiple(**kwargs)

This method is deprecated and, if invoked, will raise a DeprecationWarning. Please use compute_stats instead.

DataLayer.model_singly_constrained

DataLayer.model_singly_constrained(key,
                                   i_data_map,
                                   j_data_map,
                                   i_weights,
                                   j_weights,
                                   angular=False)

Undocumented method for computing singly-constrained interactions.

class DataLayerFromDict

Directly transposes an appropriately prepared data dictionary into a DataLayer. This class calls data_map_from_dict internally. Methods and properties are inherited from the parent DataLayer class, which can be referenced for more information.

DataLayerFromDict

DataLayerFromDict(data_dict)
Parameters
data_dict
dict

A dictionary representing distinct data points, where each key represents a uid and each value represents a nested dictionary with x and y key-value pairs. The coordinates must be in a projected coordinate system matching that of the NetworkLayer to which the data will be assigned.

Returns
DataLayer

Returns a DataLayer.

Notes

Example dictionary:

example_data_dict = {
    'uid_01': {
        'x': 6000956.463188213,
        'y': 600693.4059810264
    },
    'uid_02': {
        'x': 6000753.336609659,
        'y': 600758.7916663144
    }
}
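
Where the source data arrives as flat records rather than as a nested dictionary, a small helper can reshape it into this template. The sketch below is illustrative only: data_dict_from_rows is a hypothetical name and is not part of cityseer, and the coordinates are assumed to already be in the projected coordinate system of the NetworkLayer.

```python
def data_dict_from_rows(rows):
    """Build a data dictionary from (uid, x, y) records (hypothetical helper)."""
    data_dict = {}
    for uid, x, y in rows:
        # uids act as dictionary keys, so duplicates would silently overwrite
        if uid in data_dict:
            raise ValueError(f'duplicate uid: {uid}')
        data_dict[uid] = {'x': float(x), 'y': float(y)}
    return data_dict


rows = [
    ('uid_01', 6000956.463188213, 600693.4059810264),
    ('uid_02', 6000753.336609659, 600758.7916663144)
]
data_dict = data_dict_from_rows(rows)
print(list(data_dict))  # prints: ['uid_01', 'uid_02']
```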