VoxelNet


Last updated 3 years ago


Zhou, Yin, and Oncel Tuzel. "VoxelNet: End-to-end learning for point cloud based 3D object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

Introduction

To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations

  • for example, a bird’s eye view projection.

VoxelNet removes the need for manual feature engineering on 3D point clouds. It is a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network.

VoxelNet divides a point cloud into equally spaced 3D voxels and transforms a group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer.

Contribution

  • An end-to-end trainable architecture that operates directly on sparse 3D points and avoids the information bottlenecks introduced by manual feature engineering.

  • An efficient implementation of VoxelNet that benefits from both the sparse point structure and efficient parallel processing on the voxel grid.

Architecture

Voxel feature encoding (VFE) layer enables inter-point interaction within a voxel, by combining point-wise features with a locally aggregated feature.

Stacking multiple VFE layers allows learning complex features for characterizing local 3D shape information.

Specifically, VoxelNet divides the point cloud into equally spaced 3D voxels, encodes each voxel via stacked VFE layers, and then 3D convolution further aggregates local voxel features, transforming the point cloud into a high-dimensional volumetric representation.

Finally, an RPN consumes the volumetric representation and yields the detection result. This efficient algorithm benefits from both the sparse point structure and efficient parallel processing on the voxel grid.

1. Feature Learning Network

Voxel Partition and Grouping:

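The input point cloud spans a 3D range of size (D, H, W) along the Z, Y, and X axes. Partition this space with a 3D grid of voxels of size (vD, vH, vW), which gives a grid of D' = D/vD, H' = H/vH, W' = W/vW voxels.

Note that this is a true 3D partition, not a bird's eye view projection.

Group the 3D points according to the voxel they reside in.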
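The partition and grouping steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary output, and the example ranges and voxel sizes are assumptions made for the sketch. Points are assumed to be stored as (x, y, z, ...) rows, while ranges and voxel sizes follow the (D, H, W) = (Z, Y, X) order.

```python
import numpy as np

def group_points_by_voxel(points, range_min, voxel_size, grid_shape):
    """Group points into voxels (hypothetical helper, for illustration only).

    points:     (N, 3+) array with columns (x, y, z, ...).
    range_min:  lower corner of the cropped space, ordered (z, y, x).
    voxel_size: (vD, vH, vW), ordered (z, y, x).
    grid_shape: (D', H', W') number of voxels along each axis.
    Returns a dict mapping a voxel index (d, h, w) to an array of its points.
    """
    points = np.asarray(points, dtype=np.float64)
    zyx = points[:, [2, 1, 0]]  # reorder coordinates to match (D, H, W)
    idx = np.floor((zyx - np.asarray(range_min)) / np.asarray(voxel_size))
    idx = idx.astype(np.int64)
    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    groups = {}
    for point, key in zip(points[inside], map(tuple, idx[inside].tolist())):
        groups.setdefault(key, []).append(point)
    return {key: np.stack(pts) for key, pts in groups.items()}

# Example with made-up values: a 0.4 x 1.0 x 1.0 space and 0.4 x 0.2 x 0.2 voxels.
example_points = [[0.1, 0.1, 0.1], [0.15, 0.1, 0.1], [0.9, 0.1, 0.1], [5.0, 5.0, 5.0]]
example_groups = group_points_by_voxel(
    example_points, range_min=(0.0, 0.0, 0.0),
    voxel_size=(0.4, 0.2, 0.2), grid_shape=(1, 5, 5))
print({key: len(pts) for key, pts in example_groups.items()})
```

Only non-empty voxels appear in the dictionary, which reflects the sparsity that the rest of the pipeline relies on.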

Random Sampling

One scan frame contains ~100k points, and processing all of them directly is computationally very expensive.

Randomly sample a fixed number T of points from each voxel that contains more than T points. This has two purposes:

  • decrease the computational cost

  • decrease the imbalance of points between voxels, which reduces sampling bias
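A minimal sketch of this sampling step, assuming the points are already grouped into a dictionary that maps a voxel index to an array of points (a hypothetical layout chosen for the example, as are the function name and the fixed seed):

```python
import numpy as np

def sample_voxel_points(groups, max_points, seed=0):
    """Keep at most `max_points` (T) randomly chosen points per voxel.

    groups: dict mapping a voxel index to a (t, C) array of points.
    Voxels with t <= T are returned unchanged.
    """
    rng = np.random.default_rng(seed)
    sampled = {}
    for key, pts in groups.items():
        if len(pts) > max_points:
            # Sample T distinct row indices without replacement.
            keep = rng.choice(len(pts), size=max_points, replace=False)
            pts = pts[keep]
        sampled[key] = pts
    return sampled

# Example with made-up data: one crowded voxel and one sparse voxel.
example = {
    (0, 0, 0): np.arange(30, dtype=np.float64).reshape(10, 3),
    (0, 0, 1): np.ones((2, 3)),
}
capped = sample_voxel_points(example, max_points=4)
print({key: pts.shape for key, pts in capped.items()})
```

Sampling without replacement keeps the retained points distinct, so a crowded voxel is thinned rather than duplicated.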

Stacked Voxel Feature Encoding

The key innovation is the chain of VFE layers.

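For each non-empty voxel, the same process is repeated: its points pass through the stacked VFE layers, and the result is max-pooled into a single voxel-wise feature vector of dimension C.

Collecting the voxel-wise features of all non-empty voxels gives a sparse 4D tensor of size C x D' x H' x W'.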
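A simplified NumPy sketch of the per-voxel encoding may help make this concrete. It follows my reading of the cited paper: each point [x, y, z, r] is augmented with its offset from the voxel centroid, passed through a fully connected layer, and concatenated with the element-wise max over all points in the voxel. The function names, the random weights, and the layer sizes are assumptions for illustration. Batch normalization is omitted, and a plain max-pool over the last layer stands in for the final fully connected layer plus max-pool.

```python
import numpy as np

def augment_with_centroid_offsets(pts):
    """(t, 4) points [x, y, z, r] -> (t, 7) by appending offsets from the centroid."""
    centroid = pts[:, :3].mean(axis=0)
    return np.hstack([pts, pts[:, :3] - centroid])

def vfe_layer(x, weight, bias):
    """One VFE layer. x: (t, c_in); weight: (c_in, c_out // 2). Returns (t, c_out)."""
    pointwise = np.maximum(x @ weight + bias, 0.0)     # linear + ReLU (BN omitted)
    aggregated = pointwise.max(axis=0, keepdims=True)  # element-wise max over points
    tiled = np.repeat(aggregated, len(x), axis=0)      # same aggregate for every point
    return np.hstack([pointwise, tiled])               # point-wise + locally aggregated

def encode_voxel(pts, layers):
    """Run stacked VFE layers on one voxel and max-pool to a single feature vector."""
    x = augment_with_centroid_offsets(pts)
    for weight, bias in layers:
        x = vfe_layer(x, weight, bias)
    return x.max(axis=0)

# Example with random (untrained) weights: two layers mapping 7 -> 32 -> 128.
rng = np.random.default_rng(0)
voxel_points = rng.normal(size=(5, 4))
layers = [
    (rng.normal(size=(7, 16)), np.zeros(16)),
    (rng.normal(size=(32, 64)), np.zeros(64)),
]
feature = encode_voxel(voxel_points, layers)
print(feature.shape)
```

Because the aggregated half of each output row is shared by every point in the voxel, the next layer sees both per-point detail and local context, which is what lets stacked layers describe local 3D shape.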

2. Convolutional Middle Layer

ConvMD(c_in, c_out, k, s, p) denotes an M-dimensional convolution operator, where c_in and c_out are the numbers of input and output channels, and k, s, and p are M-dimensional vectors giving the kernel size, stride, and padding, respectively.

  • e.g. k = (k, k, k) for 3D, written as the scalar k when the size is the same in every dimension

Each convolutional middle layer applies 3D convolution, batch normalization (BN), and ReLU sequentially.

3D convolution is slow and is the computational bottleneck of this architecture. See PointPillars.

The convolutional middle layers aggregate voxel-wise features within a progressively expanding receptive field, adding more context to the shape description.
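To see how the k, s, and p vectors shape the output, the standard convolution size formula, out = floor((n + 2p - k) / s) + 1 per dimension, can be traced through a stack of 3D convolutions. The helper names, the input grid size, and the three layer settings below are illustrative assumptions, not values taken from these notes.

```python
def as_vector(value, m=3):
    """Expand a scalar to an M-dimensional tuple, e.g. 3 -> (3, 3, 3)."""
    return tuple(value) if isinstance(value, (tuple, list)) else (value,) * m

def conv_output_size(size, k, s, p):
    """Spatial output size of one ConvMD layer: floor((n + 2p - k) / s) + 1 per axis."""
    k, s, p = as_vector(k, len(size)), as_vector(s, len(size)), as_vector(p, len(size))
    return tuple((n + 2 * pi - ki) // si + 1 for n, ki, si, pi in zip(size, k, s, p))

# Assumed example: a (D', H', W') = (10, 400, 352) grid and three 3D convolutions
# that shrink only the depth axis.
size = (10, 400, 352)
middle_layers = [
    (3, (2, 1, 1), (1, 1, 1)),  # (k, s, p)
    (3, (1, 1, 1), (0, 1, 1)),
    (3, (2, 1, 1), (1, 1, 1)),
]
for k, s, p in middle_layers:
    size = conv_output_size(size, k, s, p)
    print(size)
```

With these settings the depth axis shrinks from 10 to 2 while H' and W' are preserved, which illustrates how the receptive field can grow along the vertical axis without losing resolution in the ground plane.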

3. Region Proposal Network

The RPN used here is a modified version of the region proposal network of Faster R-CNN. See Faster R-CNN for the original design.

https://arxiv.org/pdf/1711.06396.pdf