UEC Research Portal, The University of Electro-Communications


October 2022 Issue

Unsupervised segmentation and its applications

Tomoaki Nakamura
Associate Professor, Graduate School of Informatics and Engineering

Tomoaki Nakamura conducts research on unsupervised segmentation of time-series data based on hierarchical Bayesian models.

“Unsupervised means that a person does not have to provide the correct answers. In recent machine learning, supervised learning, which learns from data with correct labels attached in advance, is the mainstream,” explains Nakamura. “My approach, on the other hand, has the advantage of being able to handle data without correct labels. Segmentation, in turn, is a technique that extracts meaningful patterns from time-series data and classifies them into groups by meaning. Spoken language is an example: sound reaches us as vibrations of the air, yet humans divide it into phonemes, the units of sound, and recognize them discretely. Each phoneme sequence is in turn divided into words, the units of meaning, and recognized discretely. In this way, segmentation divides time-series data into segments, groups them according to meaning, and handles them discretely. Unsupervised segmentation makes it possible to analyze unlabeled time-series data.”
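To make the idea concrete, a segmentation result can be represented as a list of segments, each with a start index, an end index, and a group label; in the speech example this structure appears twice, once for phonemes and once for words. The sketch below is purely illustrative, with made-up values.

```python
# Toy illustration of a segmentation result: each segment is a
# (start, end, label) triple over the original time series. Values are made up.
phoneme_segments = [(0, 40, "a"), (40, 75, "o"), (75, 120, "i")]

# The phoneme sequence can itself be segmented into word-level units.
word_segments = [(0, 120, "aoi")]
```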

Focus on generative models for unsupervised segmentation

One unsupervised segmentation method is the Gaussian Process-Hidden Semi-Markov Model (GP-HSMM). This model assumes that time-series data is generated by drawing segments, which are partial time series, from the patterns associated with multiple classes and connecting them. A model that expresses the process by which data is generated in this way is called a generative model. Because a generative model describes the data-generation process probabilistically, it enables unsupervised learning in which the parameters are inferred probabilistically from the data alone.
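As a rough illustration of this generative process, the sketch below draws a class sequence from a Markov chain, gives each segment a random duration, samples its values from a class-specific Gaussian process, and concatenates the segments into one time series. It is a minimal one-dimensional sketch in Python with NumPy; the transition matrix, Poisson durations, and RBF kernel settings are illustrative assumptions, not the settings used in Nakamura's models.

```python
# A minimal sketch of a GP-HSMM-style generative process (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def sample_gp_segment(length, lengthscale=3.0, variance=1.0):
    """Draw one segment from a zero-mean Gaussian process with an RBF kernel."""
    t = np.arange(length)[:, None]
    K = variance * np.exp(-0.5 * (t - t.T) ** 2 / lengthscale ** 2)
    return rng.multivariate_normal(np.zeros(length), K + 1e-6 * np.eye(length))

n_classes = 3
trans = np.full((n_classes, n_classes), 1.0 / n_classes)  # class transition probabilities
lengthscales = [2.0, 5.0, 10.0]  # each class has its own length scale, so its segments look different

def generate(n_segments=5):
    series, labels = [], []
    c = rng.integers(n_classes)
    for _ in range(n_segments):
        length = rng.poisson(20) + 5            # segment duration (the semi-Markov part)
        seg = sample_gp_segment(length, lengthscales[c])
        series.append(seg)
        labels.extend([c] * length)
        c = rng.choice(n_classes, p=trans[c])   # Markov transition between classes
    return np.concatenate(series), np.array(labels)

S, true_labels = generate()
print(S.shape, true_labels[:30])
```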

The purpose of GP-HSMM training is to estimate the parameters that maximize the probability of the segments ‘X’ being generated, given only the time-series data ‘S’. In other words, from the data alone the model must estimate what kinds of recurring patterns exist, which pattern each partial time series in the data was generated from, and how long each segment is. Nakamura and his group have developed a method for efficiently inferring these parameters.
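To give a computational feel for "which pattern generated each partial series and how long each segment is", the sketch below finds the best segmentation by dynamic programming over segment end points, durations, and classes, assuming a per-class log-likelihood function is already given. In the actual GP-HSMM the class patterns are Gaussian processes and are inferred jointly with the segmentation (for example by sampling), so this shows only the search step, not the full learning procedure.

```python
# A minimal sketch of searching for the best segmentation when the class
# patterns are already known; the real GP-HSMM also infers the patterns.
import numpy as np

def segment(S, log_lik, n_classes, max_len=30):
    """Dynamic programming over segment end points, lengths, and classes.

    log_lik(segment, c) must return the log-likelihood of a partial time series
    under class c (in GP-HSMM this would be a Gaussian-process likelihood).
    """
    T = len(S)
    best = np.full(T + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (T + 1)                       # (previous end point, class)
    for t in range(1, T + 1):
        for k in range(1, min(max_len, t) + 1):   # candidate segment length
            for c in range(n_classes):            # candidate class
                score = best[t - k] + log_lik(S[t - k:t], c)
                if score > best[t]:
                    best[t], back[t] = score, (t - k, c)
    # Backtrack to recover (start, end, class) triples.
    segments, t = [], T
    while t > 0:
        prev, c = back[t]
        segments.append((prev, t, c))
        t = prev
    return segments[::-1]
```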

Here are two examples of actual analyses using this method. The first is human motion analysis, where the researchers extracted the basic motions contained in human movement from 96-dimensional motion capture data alone.

In the accompanying video, the graph on the left is the input time-series data, and the graph on the lower right is the result of segmenting only this data. The horizontal axis represents time, and the vertical axis represents the number of the basic movement each section was classified into; sections assigned the same number therefore represent the same basic movement. Comparing this with the motion capture data visualized in the middle of the video shows that the class changes when the action changes. In this way, six basic motions, automatically grouped by meaning, could be extracted from the time-series data alone.

Motion capture data of a marmoset

The other example involves animals: the method was successfully applied to motion capture data of a marmoset, rapidly segmenting it and automatically extracting characteristic behaviors. Nakamura has also extended the method and developed a model for segmenting data with a double segmentation structure. The speech described above is one example of data with such a structure.

“By segmenting the speech waveform, we can extract phonemes, and by segmenting the phoneme sequence, we can extract words,” explains Nakamura. “To extract words from speech waveforms, two stages of segmentation are required. Therefore, by hierarchically connecting segmentation models, we applied this method to speech with a double segmentation structure.”

Notably, in this model a segmentation model called the HSMM (hidden semi-Markov model) is placed on top of the GP-HSMM described earlier. With this combined model, characters, their acoustic characteristics, and words can all be learned without supervision from speech waveforms alone.
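A minimal sketch of how two segmentation stages can be chained is shown below, reusing the dynamic-programming routine sketched earlier. The feature and likelihood functions are hypothetical placeholders, and the actual model couples the two levels during learning rather than running a fixed one-way pipeline.

```python
# A minimal sketch of chaining two segmentation stages (waveform features ->
# phoneme-like units -> word-like units). `segment` is the dynamic-programming
# function sketched earlier; `acoustic_log_lik` and `word_log_lik` are
# hypothetical placeholders supplied by the caller.
def doubly_segment(waveform_features, acoustic_log_lik, word_log_lik,
                   n_phoneme_classes, n_word_classes):
    # Stage 1: segment the acoustic feature sequence into phoneme-like units.
    phone_segs = segment(waveform_features, acoustic_log_lik, n_phoneme_classes)
    phone_labels = [c for (_, _, c) in phone_segs]

    # Stage 2: segment the phoneme label sequence into word-like units.
    word_segs = segment(phone_labels, word_log_lik, n_word_classes, max_len=5)
    return phone_segs, word_segs
```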

The video shows the results of an actual analysis of speech, using a dataset of artificial words composed of the sounds "a, i, u, e, o". The data consists of two-word utterances, built from words such as "aoi" and "ao", and three-word utterances.

In the video, the graph on the left shows the result of segmenting only the speech waveform: the units "a, i, u, e, o" were correctly segmented as characters. Furthermore, by segmenting this character string, the words contained in the dataset were extracted. The graph on the right shows the ARI (adjusted Rand index), a measure of closeness to the correct answer, and this method achieves more accurate segmentation than the other methods compared.
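The adjusted Rand index compares an estimated grouping against ground-truth labels and equals 1.0 for a perfect match, regardless of how the group numbers themselves are named. Below is a minimal example of computing it, assuming scikit-learn is available (not necessarily the tooling used in the study).

```python
# Compare an estimated frame-by-frame segmentation with the ground truth using
# the adjusted Rand index (ARI). The label sequences here are toy examples.
from sklearn.metrics import adjusted_rand_score

true_labels      = [0, 0, 0, 1, 1, 2, 2, 2, 2]   # ground-truth class per frame
estimated_labels = [1, 1, 1, 0, 0, 2, 2, 2, 0]   # labels found by segmentation

print(adjusted_rand_score(true_labels, estimated_labels))  # 1.0 would mean a perfect match
```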

Future

“In this video I introduced a technique for unsupervised segmentation that can extract recurring patterns from time-series data without correct labels,” says Nakamura. “I introduced GP-HSMM as a basic model and showed that it can extract basic motions and characteristic motions from motion capture data. In addition, I introduced a double segmentation model that hierarchically connects segmentation models and showed that syllables and words can be extracted from speech alone. The examples this time used only motion and speech, but the method can also be applied to various other kinds of time-series data.”

Research Keywords: motion segmentation, Gaussian process, hidden semi-Markov model, motion capture data, high-dimensional time-series data

References and further information

1. Masatoshi Nagano, Tomoaki Nakamura, Takayuki Nagai, Daichi Mochihashi, Ichiro Kobayashi and Wataru Takano, “HVGH: Unsupervised Segmentation for High-dimensional Time Series Using Deep Neural Compression and Statistical Generative Model”, Frontiers in Robotics and AI, Vol. 6, Article 115, pp. 1-15, Nov. 2019

2. Masatoshi Nagano, Tomoaki Nakamura, Takayuki Nagai, Daichi Mochihashi, Ichiro Kobayashi and Wataru Takano, “High-dimensional Motion Segmentation by Variational Autoencoder and Gaussian Processes”, IROS2019, pp. 105-111, Nov. 2019

3. Masatoshi Nagano, Tomoaki Nakamura, Takayuki Nagai, Daichi Mochihashi, Ichiro Kobayashi and Masahide Kaneko, “Sequence Pattern Extraction by Segmenting Time Series Data Using GP-HSMM with Hierarchical Dirichlet Process”, IROS2018, pp. 4067-4074, Oct. 2018

4. Tomoaki Nakamura, Takayuki Nagai, Daichi Mochihashi, Ichiro Kobayashi, Hideki Asoh and Masahide Kaneko, “Segmenting Continuous Motions with Hidden Semi-Markov Models and Gaussian Processes”, Frontiers in Neurorobotics, vol.11, article 67, pp. 1-11, Dec. 2017