Locating Street Lights in Point Clouds Using AI (Part 1)

Blog post by Niek IJzerman and Shayla Jansen; source: amsterdamintelligence.com

In recent years, the City of Amsterdam has been harnessing the power of point cloud technology to improve the lives of its citizens. For example, point clouds and corresponding technologies have been applied in various urban projects, ranging from assessing sidewalk width to extracting geometric features from trees.

Last year, the City of Amsterdam carried out a pilot project in which we automatically located streetlights in Weesp using point cloud technologies. The successful implementation of this approach allowed us to extract meaningful properties of the detected objects, such as their height and inclination. The project yielded useful results, for example better-targeted maintenance.

Since the pilot in Weesp demonstrated the usefulness of automated streetlight localization using point clouds, the City of Amsterdam decided to expand this initiative citywide. Here, the aim was to replicate the success experienced in Weesp and provide our residents with a more efficient, sustainable, and aesthetically pleasing urban environment.

In this blog post, we will explain the first steps of the pipeline we used for this project. This includes the data collection, training set generation, semantic segmentation of our point clouds and streetlight extraction from segmented point clouds. Furthermore, we will dive into some of the ethical considerations for this project.

Pipeline

Figure 1 visualizes the pipeline that we use to locate streetlights in point clouds. In the following sections, we elaborate on steps 1 through 6. Steps 7 and 8 will be explained in an upcoming blog post.

Figure 1: Street light localization pipeline

Step 1: Collection of point clouds

To locate streetlights in point clouds, we first need point cloud data. A common method for capturing real-world point clouds is laser scanning. To obtain point cloud data of Amsterdam, we use a laser scanner mounted on top of a vehicle, a technique known as Light Detection And Ranging or Laser Imaging Detection And Ranging (LiDAR). LiDAR scanners emit laser pulses and record their reflections. The return time of each pulse is measured precisely to estimate the distance to the reflecting point, and the strength of the reflection is recorded as well. The resulting points have an average accuracy of two centimeters, which makes them highly suitable for streetlight localization. Although the city is captured as a single point cloud, we divide it into smaller tiles of 50 m x 50 m to keep processing manageable.
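To make the two ideas in this step concrete, here is a minimal sketch of the time-of-flight relation (distance is half the round-trip time multiplied by the speed of light) and of cutting a large point cloud into 50 m x 50 m tiles. The function names and the tile indexing scheme are our own assumptions for illustration; this is not the code used in the project.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # metres per second
TILE_SIZE = 50.0                # metres, the tile size mentioned in the text


def pulse_range(round_trip_seconds):
    """Distance to the reflecting surface; the pulse travels out and back."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def split_into_tiles(points, tile_size=TILE_SIZE):
    """Group an (N, 3) array of x, y, z points by the square tile they fall in.

    Returns a dict mapping a hypothetical (column, row) tile index to the
    points inside that tile.
    """
    points = np.asarray(points, dtype=float)
    # Integer tile index per point, based on x and y only.
    idx = np.floor(points[:, :2] / tile_size).astype(int)
    tiles = {}
    for key in np.unique(idx, axis=0):
        mask = np.all(idx == key, axis=1)
        tiles[(int(key[0]), int(key[1]))] = points[mask]
    return tiles


example_points = np.array([
    [10.0, 10.0, 1.0],
    [49.9, 0.0, 2.0],
    [50.0, 0.0, 3.0],
    [120.0, 75.0, 4.0],
])
example_tiles = split_into_tiles(example_points)
```

In this toy example the first two points share a tile, while the other two each land in a tile of their own.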

Step 2: Selection of training area

After collecting our point cloud data, we now need to generate a point cloud training set to train our semantic segmentation algorithm. The first step in creating this training dataset is to select an appropriate city district from which we can sample point clouds to build the training set.

Choosing the right district for sampling a training set is not a straightforward decision. Amsterdam comprises multiple districts, each with its unique identity. While we'd like to consider as many relevant district characteristics as possible in the selection process, it's not feasible to cover them all. Therefore, we have identified two overarching aspects crucial for describing each district's identity: (1) its landscapes and (2) its residents.

First, the district's landscapes are important because we aim to train our segmentation model on a diverse range of point clouds that collectively represent Amsterdam as accurately as possible. For instance, relying solely on point clouds from Amsterdam Centrum won't adequately represent Nieuw-West, and vice versa. Second, we emphasize the importance of the district's residents because we want our segmentation model to perform consistently across different groups of people.

To illustrate this importance, consider a scenario where we select Oud-Zuid as the sampling district for our training set. Naturally, the model would excel at segmenting point clouds from Oud-Zuid, given that its landscapes are well-represented in the training data. Consequently, we could precisely identify streetlights there and address related issues or complaints. However, this could disproportionately benefit the predominantly white population living in Zuid. Conversely, choosing Oud-Zuid as a sampling district might lead to poor segmentation results in Zuidoost, since its landscapes aren't well-covered in the training data (see Image 1). Following the same reasoning, this could put the considerable number of people from ethnic minorities residing in Zuidoost at a disadvantage. We want to avoid such an outcome.

Image 1: Landscape in Zuidoost: Bijlmererhorst

In light of the considerations mentioned in the previous paragraph, we have decided to select Amsterdam Oost as the district to sample our training set from. Oost offers a wide variety of landscapes, including monumental and new construction buildings, and provides a representative cross-section of the diverse population that calls Amsterdam home.

Step 3: Selection of training point clouds

To select appropriate point clouds from Oost for creating our training set, we have developed a sampling algorithm. This algorithm takes into account various statistical properties of point clouds to assign a value score to each point cloud. Based on these value scores, we sample the point clouds. The statistical properties used to calculate these value scores include:

  1. The number of points in the point cloud
  2. The number of streetlights in the point cloud
  3. The number of different types of streetlights in the point cloud
  4. The rarity of streetlight composition (calculated using tf-idf) in the point cloud

The first property is determined using point cloud statistics, while properties 2, 3, and 4 are calculated using the city's existing streetlight database. The value of each property ranges from 0 to 1. We have also assigned weights to these properties to signify their relative importance. Specifically, we have applied weights of w1 = 1, w2 = 1, w3 = 2, and w4 = 2, where wn represents the weight associated with property n. Subsequently, the property values {v1, …, vn} are averaged, taking these weights into account, to yield a single value score, v, which falls within the range of 0 to 1.
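As an illustration, the value score could be computed as a weighted mean. The text does not spell out the exact formula, so the normalization by the sum of the weights is an assumption on our part (it is one simple way to keep the result between 0 and 1). The function name `value_score` is likewise hypothetical.

```python
WEIGHTS = (1.0, 1.0, 2.0, 2.0)  # w1..w4 as given in the text


def value_score(properties, weights=WEIGHTS):
    """Weighted mean of property values that each lie in [0, 1].

    Assumption: the weighted sum is divided by the sum of the weights,
    which keeps the score in [0, 1].
    """
    if len(properties) != len(weights):
        raise ValueError("need exactly one value per weight")
    if any(not 0.0 <= v <= 1.0 for v in properties):
        raise ValueError("property values must lie in [0, 1]")
    weighted_sum = sum(w * v for w, v in zip(weights, properties))
    return weighted_sum / sum(weights)


# A tile with average size and streetlight count, a maximal number of
# streetlight types, and a very common composition of streetlights.
score = value_score([0.5, 0.5, 1.0, 0.0])
```

With these weights, the number of streetlight types and the rarity of the composition each count twice as heavily as the point count or the streetlight count.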

Before calculating the value for all point clouds, we filtered out clouds that either lacked any streetlights or consisted of fewer than 1 million points, as they held no significance for our training set (see Figure 2.1). This initial filtering step reduced the sample pool from 4,905 to 3,768 point clouds.

Next, we applied our sampling algorithm to select n = 60 point clouds. In addition to using value scores for point cloud selection, we incorporated an exploration parameter, e, and a distance restriction parameter, d. The exploration parameter e ranges from 0 to 1 and introduces randomization in the sampling process, where e = 0 implies sampling solely based on value scores, while e = 1 signifies completely random sampling. We set the exploration rate at 0.4. The distance restriction rule specifies the minimum distance between two sampled point clouds, ensuring a broader coverage across Oost. We set d = 2, requiring a minimum distance of 2 point clouds between the sampled ones.
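The sketch below shows one way such a sampler could work. It is an interpretation rather than the project's implementation, and it rests on several assumptions that are ours: tiles are identified by a (column, row) grid index, the non-random draws greedily pick the highest remaining value score, and the distance restriction means a Chebyshev grid distance of at least d tiles between any two sampled tiles.

```python
import random


def sample_tiles(scores, n=60, e=0.4, d=2, seed=None):
    """Pick up to n tiles from `scores`, a dict {(col, row): value score}.

    With probability e a draw is uniformly random (exploration); otherwise
    the remaining tile with the highest score is taken (assumed greedy rule).
    After each draw, tiles closer than d grid cells are discarded.
    """
    rng = random.Random(seed)
    candidates = dict(scores)
    chosen = []
    while candidates and len(chosen) < n:
        ordered = sorted(candidates)  # fixed order makes runs reproducible
        if rng.random() < e:
            pick = rng.choice(ordered)
        else:
            pick = max(ordered, key=candidates.get)
        chosen.append(pick)
        candidates.pop(pick)
        # Enforce the distance restriction (assumed Chebyshev distance).
        candidates = {
            tile: s for tile, s in candidates.items()
            if max(abs(tile[0] - pick[0]), abs(tile[1] - pick[1])) >= d
        }
    return chosen


# Toy 10 x 10 grid whose score increases towards the top-right corner.
toy_scores = {(c, r): (c + r) / 18 for c in range(10) for r in range(10)}
greedy_sample = sample_tiles(toy_scores, n=5, e=0.0, d=2, seed=0)
mixed_sample = sample_tiles(toy_scores, n=5, e=0.4, d=2, seed=0)
```

With e = 0 the sampler only follows the value scores, so the first pick in the toy grid is the corner tile (9, 9). Raising e trades some value for broader, more random coverage.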

Figure 2: Point cloud sampling process

After applying our sampling algorithm, we have 60 sampled point clouds (as depicted in Figure 2.2). Figure 3 compares the value scores of these sampled clouds with the value scores of all clouds in Amsterdam Oost. It's evident that our approach has enabled us to create a training set in which each point cloud has a relatively high value score.

Figure 3: Value distribution training set Amsterdam Oost.

Step 4: Data fusion on pipeline

To use the sampled point clouds for training, we had to annotate them. In 2021, we developed a method for automatically labeling urban point clouds through data fusion. This method was presented at the 10th International Workshop on Urban Computing at ACM SIGSPATIAL 2021 and was applied in the Weesp pilot. The fusion pipeline allows us to swiftly annotate our training set. If you're interested in an explanation of our fusion algorithm, you can refer to this blogpost.

Step 5: Segmentation of city point clouds

With our training set in hand, we trained a semantic segmentation algorithm known as RandLA-Net. This algorithm has proven to be a robust method for semantic segmentation in our previous projects and internships, which is why we used it again here. After training, we ran inference on approximately 55,000 point clouds collected in Amsterdam, amounting to a total data size of 750 GB. Both training and inference were run on the Azure Machine Learning platform using 4 NVIDIA Tesla K80 GPUs.

Step 6: Extraction of streetlights

Figure 4. The detected streetlights in one point cloud tile

Now that we know which points are associated with streetlights, we want to identify individual poles (Figure 4). We perform connected component labeling to create clusters of points belonging to the street light class, removing noise along the way. Next, we extract properties of the poles. We take horizontal slices and find the center of each, using the smallest enclosing circle. A selection of these centers (using variance, for more details see the previous post) is used to fit a straight line to the pole. From this fit, we deduce the location, height and tilt of each pole. Finally, we remove some objects (16%) that are very likely false positives: objects taller than 16 meters, objects shorter than 1.8 meters, and objects located at a spot where we know there is a tree.
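To give an impression of the line fit, the sketch below fits a 3D line through a set of slice centers and derives a location, height and tilt from it. It assumes the centers have already been computed and selected. The SVD-based least-squares fit, the definition of height as the vertical extent of the centers, and all function names are our own choices for illustration, not necessarily what the project code does. The tree check is left out.

```python
import numpy as np


def fit_pole(centers):
    """Fit a straight line to (N, 3) slice centers of one pole.

    Returns the x, y location at the lowest slice, the height (vertical
    extent of the centers, an assumed definition) and the tilt in degrees
    from vertical.
    """
    centers = np.asarray(centers, dtype=float)
    mean = centers.mean(axis=0)
    # The first right-singular vector is the direction of largest variance,
    # i.e. the direction of the least-squares line through the centers.
    _, _, vt = np.linalg.svd(centers - mean)
    direction = vt[0]
    if direction[2] < 0:
        direction = -direction  # make the direction point upwards
    if direction[2] < 1e-9:
        raise ValueError("fitted line is horizontal; not a pole")
    tilt_deg = float(np.degrees(np.arccos(np.clip(direction[2], -1.0, 1.0))))
    z_min, z_max = centers[:, 2].min(), centers[:, 2].max()
    # Point on the fitted line at the height of the lowest slice.
    base = mean + (z_min - mean[2]) / direction[2] * direction
    return {
        "x": float(base[0]),
        "y": float(base[1]),
        "height": float(z_max - z_min),
        "tilt_deg": tilt_deg,
    }


def is_plausible_height(height, min_height=1.8, max_height=16.0):
    """Height thresholds from the text for rejecting likely false positives."""
    return min_height <= height <= max_height


# A synthetic pole that leans 0.1 m sideways per metre of height.
z = np.linspace(0.0, 10.0, 11)
leaning_pole = fit_pole(np.column_stack([0.1 * z, np.zeros_like(z), z]))
```

For the synthetic leaning pole, the tilt comes out at the arctangent of 0.1, a little under six degrees.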

To be able to validate the extracted poles, we create images for each of them from multiple angles, showing the colored point cloud and the fit (see for example Figure 5). As we still have more than 150,000 potential streetlights to deal with, we use Python's multiprocessing package to speed things up.

Figure 5. Colored point cloud for one extracted pole, including the fit (in red). 

Ethical considerations

Before moving on to validating and delivering streetlights, we want to ensure that this project is in line with our own ethical standards. Our approach is to create an ethical leaflet for the project, based on the Tada values. We assess the moral risks and opportunities and how we can act on those.

As our own scoring on the values in Figure 6 implies, there is room for improvement. The topic of user-centeredness will be taken into account in the next project, and the sharing of data is still up for debate. On some other topics, we can already make a difference.

Figure 6. Ethical leaflet scoring on values

First of all, there is much to gain in the area of openness and transparency. Consequently, we decided to share our project with a technical audience at PyData Amsterdam 2023. And of course, this blogpost is (next to entertaining you) also a way to share our methods. In addition, we fully open-sourced the latest version of our code. Check it out and feel free to use it for your own needs, ask questions or help out! You can open an issue, submit a PR or contact us.

Next, we want to know whether the climate impact of this project is proportional. The energy use for training and inference was approximately 200 kWh (4 GPUs x 300 W x 400 h x 0.4 utilization ≈ 192 kWh). This is roughly the energy consumption of one streetlight for a year. That sounds pretty reasonable to us.
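The estimate can be reproduced in a few lines. The input numbers are the ones given above; the function name is just for illustration.

```python
def energy_kwh(n_gpus, watts_per_gpu, hours, utilization):
    """Energy in kWh: devices x power draw x runtime x average utilization."""
    return n_gpus * watts_per_gpu * hours * utilization / 1000.0


# 4 GPUs of 300 W each, 400 hours of runtime, 40% average utilization.
total_kwh = energy_kwh(4, 300, 400, 0.4)  # roughly 192 kWh
```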

We also believe that a human-in-the-loop is important in this project. Finally, we are wondering how our work impacts different groups of people. Stay tuned for the next blog to check out what we did on these topics and what the final results are!

Additional info

Image credits

Header image: street lights ovl1 banner - by amsterdam smart city https://amsterdamsmartcity.com/updates/project/flexible-street-lighting