Gated-SCNN: New State-of-the-art Method for Semantic Segmentation

A group of researchers from NVIDIA, the University of Waterloo, the University of Toronto and the Vector Institute have published a new state-of-the-art method for semantic segmentation.

The new method outperforms previous state-of-the-art methods on the Cityscapes benchmark by a wide margin: 2% in mean intersection-over-union (mIoU) and 4% in boundary F-score. The researchers propose a two-stream architecture made up of a segmentation stream and a shape stream. The key idea is to split image segmentation into two subtasks, semantic segmentation prediction and boundary prediction. To make the two streams (or branches) work together, the researchers introduce a new type of gate between them. These gates let the shape stream learn more robust features by using the higher-level activations from the classical (segmentation) stream. They call this architecture a “Gated Shape Convolutional Neural Network” (Gated-SCNN). A final module fuses the outputs of both streams to produce the final prediction. The network is trained with a segmentation loss as well as a “dual-task” loss.
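The gating idea can be illustrated with a small sketch. This is a simplified, single-channel illustration and not the authors' implementation: the function name `gate`, the scalar weights `w_shape` and `w_seg`, and the bias are all hypothetical stand-ins for what would be learned convolution parameters in a real network. It shows only the core mechanism, where an attention value between 0 and 1 is computed at each pixel from both streams and then used to scale the shape-stream feature.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate(shape_feat, seg_feat, w_shape=1.0, w_seg=1.0, bias=0.0):
    """Gate a single-channel shape feature map with a segmentation feature map.

    Both inputs are 2D lists of floats with the same height and width.
    The scalar weights are hypothetical placeholders for learned parameters.
    Returns (gated_shape_feat, attention_map).
    """
    gated, attention = [], []
    for s_row, r_row in zip(shape_feat, seg_feat):
        # Per-pixel attention computed from both streams.
        a_row = [sigmoid(w_shape * s + w_seg * r + bias)
                 for s, r in zip(s_row, r_row)]
        attention.append(a_row)
        # The attention scales the shape-stream activation at each pixel.
        gated.append([s * a for s, a in zip(s_row, a_row)])
    return gated, attention


if __name__ == "__main__":
    shape_feat = [[2.0, 4.0], [0.0, 1.0]]
    seg_feat = [[1.0, -1.0], [3.0, 0.5]]
    gated, attention = gate(shape_feat, seg_feat)
    print(gated)
    print(attention)
```

In a full network the inputs would be multi-channel tensors and the weighted sum would be a learned convolution, but the flow of information is the same: the segmentation stream decides which shape-stream activations pass through.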

 

The architecture of the proposed approach.

 

The researchers showed that this architecture learns more precise boundaries, especially when segmenting smaller objects. Evaluations show that it outperforms strong baseline models such as DeepLabV3 and PSP-Net on the Cityscapes dataset.
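For readers unfamiliar with the mIoU metric used in these comparisons, the following is a minimal sketch of how it can be computed from flat lists of predicted and ground-truth class labels. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration. Benchmark tooling typically accumulates counts over an entire dataset rather than a single image, so this should not be read as the official Cityscapes evaluation.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes.

    pred and target are equal-length sequences of integer class labels,
    one per pixel. Classes that appear in neither sequence are skipped.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou(pred, target, num_classes=2))
```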

 

A video showing the performance of the proposed method.

 

Outputs from the model, along with more details about the method, can be found on the official project website. The paper is available on arXiv, and according to the website, the code will be open-sourced soon.
