移动模式挖掘&用户意图发现 报告人:王月 时间:2016.10.9.

1 移动模式挖掘&用户意图发现 报告人:王月 时间: ...
Author: Bernadette Gallagher
0 downloads 0 Views

1 移动模式挖掘&用户意图发现 报告人:王月 时间:

2 移动模式挖掘 Gmove (KDD 2016) 移动模式挖掘方法概括 T-patterns (KDD 2007) Gathering patterns (ICDE 2013) Periodic patterns (KDD 2010) POIs and travel sequences (WWW 2009) discovering latent semantics (Wireless Sensor Networks 2015) 用户意图发现 主题模型 短文本中的数据稀疏问题 DCT (KDD 2016) PTM&SPTM (KDD 2016) 意图挖掘现状

3 移动模式挖掘

4 GMove: Group-Level Mobility Modeling Using Geo-Tagged Social Media (KDD 2016)Geo-tagged social media (GeoSM) : spatial, temporal, textual Highly sparse and complex GeoSM data : group-level mobility modeling Hidden Markov Models(HMMs) : learn the multi-view latent states and people’s transitions between them Iterative framework : alternate between user grouping and group- level mobility modeling Text sparsity : text augmenter (leverages keyword spatiotemporal correlations to augment raw messages)

5 GMove: Group-Level Mobility Modeling Using Geo-Tagged Social Media (KDD 2016)

6 移动模式挖掘 Datasets Applications pure GPS trajectory datasemantic trajectories (e.g.,GeoSM) Applications traffic jam prevention location prediction urban planning personalized activity recommendation

7 Types of mobility modeling approachesPattern-based approaches mine regular mobility patterns from movement data. Mining different types of movement patterns(e.g., T-patterns) Finding a set of objects that are frequently co-located(e.g., flock, swarm, and gathering patterns) Mining periodic movement patterns Law-based approaches investigate the physical laws that govern human’s moving behaviors. Model-based approaches attempt to learn statistical models from movement data.

8 Input : GPS trajectories Output : trajectory pattern (T-pattern)F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. In KDD, pages 330–339, 2007. Input : GPS trajectories Output : trajectory pattern (T-pattern) T-PATTERNS WITH STATIC ROI Discovering Regions-of-Interest T-PATTERNS WITH DYNAMIC ROI A trajectory pattern is a sequence of spatial regions that, on the basis of the source trajectory data, emerge as frequently visited in the order specified by the sequence; in addition, the transition between two consecutive regions in such a sequence is annotated with a typical travel time that, again, emerges from the input trajectories.

9 K. Zheng, Y. Zheng, N. J. Yuan, and S. ShangK. Zheng, Y. Zheng, N. J. Yuan, and S. Shang. On discovery of gathering patterns from trajectories. In ICDE, pages 242–253, 2013. Definition : gathering is a trajectory pattern modelling various group incidents such as celebrations, parades, protests, traffic jams and so on. Challenges : the process of discovering gathering patterns over large- scale trajectory databases can be quite lengthy. Solution : effective indexing structures fast pattern detection algorithms implemented with bit vectors incremental algorithms for handling new trajectory arrivals We propose a novel concept, called gathering, which is a trajectory pattern modelling various group incidents such as celebrations, parades, protests, traffic jams and so on. A key observation is that these incidents typically involve large congregations of individuals, which form durable and stable areas with high density. Since the process of discovering gathering patterns over large-scale trajectory databases can be quite lengthy, we further develop a set of well thought out techniques to improve the performance. These techniques, including effective indexing structures, fast pattern detection algorithms implemented with bit vectors, and incremental algorithms for handling new trajectory arrivals, collectively constitute an efficient solution for this challenging task.

10 Output : periodic behaviorsZ. Li, B. Ding, J. Han, R. Kays, and P. Nye. Mining periodic behaviors for moving objects. In KDD, pages 1099–1108, 2010. Input : tracking data Output : periodic behaviors In this paper, we address the problem of mining periodic behaviors for moving objects. It involves two sub-problems: how to detect the periods in complex movement, and how to mine periodic movement behaviors. Our main assumption is that the observed movement is generated from multiple interleaved periodic behaviors associated with certain reference locations. Based on this assumption, we propose a two-stage algorithm, Periodica, to solve the problem. At the first stage, the notion of reference spot is proposed to capture the reference locations. Through reference spots, multiple periods in the movement can be retrieved using a method that combines Fourier transform and autocorrelation. At the second stage, a probabilistic model is proposed to characterize the periodic behaviors. For a specific period, periodic behaviors are statistically generalized from partial movement sequences through hierarchical clustering. A reference spot is a dense area that is frequently visited in the movement.

11 Z. Li, B. Ding, J. Han, R. Kays, and P. NyeZ. Li, B. Ding, J. Han, R. Kays, and P. Nye. Mining periodic behaviors for moving objects. In KDD, pages 1099–1108, 2010. In this paper, we address the problem of mining periodic behaviors for moving objects. It involves two sub-problems: how to detect the periods in complex movement, and how to mine periodic movement behaviors. Our main assumption is that the observed movement is generated from multiple interleaved periodic behaviors associated with certain reference locations. Based on this assumption, we propose a two-stage algorithm, Periodica, to solve the problem. At the first stage, the notion of reference spot is proposed to capture the reference locations. Through reference spots, multiple periods in the movement can be retrieved using a method that combines Fourier transform and autocorrelation. At the second stage, a probabilistic model is proposed to characterize the periodic behaviors. For a specific period, periodic behaviors are statistically generalized from partial movement sequences through hierarchical clustering. A reference spot is a dense area that is frequently visited in the movement.

12 Y. Zheng, L. Zhang, X. Xie, and W. -Y. MaY. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining interesting locations and travel sequences from GPS trajectories. In WWW, pages 791–800, 2009. Input : GPS trajectories Output : interesting locations and travel sequences model multiple individuals’ location histories : tree-based hierarchical graph (TBHG) infer the interest of a location : HITS-based inference model mine the classical travel sequences among locations In this work, we first model multiple individuals’ location histories with a tree-based hierarchical graph (TBHG). Second, based on the TBHG, we propose a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual’s access on a location as a directed link from the user to that location. This model infers the interest of a location by taking into account the following three factors. 1) The interest of a location depends on not only the number of users visiting this location but also these users’ travel experiences. 2) Users’ travel experiences and location interests have a mutual reinforcement relationship. 3) The interest of a location and the travel experience of a user are relative values and are region-related. Third, we mine the classical travel sequences among locations considering the interests of these locations and users’ travel experiences.

13 B. Deb and P. Basu. Discovering latent semantic structure in human mobility traces. In Wireless Sensor Networks, pages 84–103. Springer, 2015. Goal : to extract the semantics of mobility that is latent in the available contact traces. Method : Probabilistic Latent Semantic Trajectories (PLST), an unsupervised approach to extract semantically different locations and sequential patterns of mobility from such traces.

14 B. Deb and P. Basu. Discovering latent semantic structure in human mobility traces. In Wireless Sensor Networks, pages 84–103. Springer, 2015.

15 以上轨迹数据挖掘工作集中在构建模型描述观 察到的时空轨迹本身,在用户意图的分析和理 解方面还鲜有涉及。

16 轨迹意图挖掘

17 轨迹意图挖掘 类似于文本挖掘领域中的主题挖掘!

18 主题模型 LDA (latent dirichlet allocation)Author-Topic model (ATM) : extends LDA to include authorship information Topics over Time (TOT) : models time jointly with word co-occurrence patterns Bigram Topic Model : extends LDA by incorporating a notion of word order NTM (Neural Topic Model) : three layer neural network model considers the order of the words in the text

19 短文本中的数据稀疏问题

20 如何解决数据稀疏问题? 在训练模型之前将短文本聚集成长文档 假设短文本仅包含单一的主题biterm topic model (BTM):直接在整个语料库中对词共现 模式(即biterms)的生成过程进行建模

21 Dynamic Clustering of Streaming Short Documents (KDD 2016)short text: data sparsity assigning a single topic to each short document steaming data: time-varying distributions of topics over documents and words over topics modeling temporal dynamics by a short-term or long-term dependency model over sequential data dynamic clustering topic model (DCT)

22 Graphical representation of DCTNote that short term dependence DCT model excludes the two blue curved lines; while long term dependence DCT model does include these two lines.

23 Topic Modeling of Short Texts: A Pseudo-Document View (KDD 2016)Problems :data sparsity pseudo document : implicitly aggregate short texts against data sparsity Pseudo-document-based Topic Model (PTM) : without using auxiliary information Assumption 1: huge volume of short texts are generated from much less yet regular-sized latent documents, called pseudo documents. Assumption2 : each short text belongs to one and only one pseudo document. Sparsity-enhanced PTM (SPTM) : to further eliminate undesired correlations between pseudo documents and latent topics applying Spike and Slab prior to topic distributions of pseudo documents The short texts are observed documents and the pseudo documents are latent ones. A multinomial distribution is introduced to model the distribution of short texts over pseudo documents.

24 Topic Modeling of Short Texts: A Pseudo-Document View (KDD 2016)形式化描述:

25 Topic Modeling of Short Texts: A Pseudo-Document View (KDD 2016)生成过程:

26 如何解决数据稀疏问题?—利用辅助信息 Research along this line can be categorized into two types. Type1 : aggregate short texts into longer ones directly according to auxiliary information Type2 : build specific models with those information during short texts generation ( i.e., take auxiliary information into modeling directly) Problem: Auxiliary information is not always available or just too costly for deployment. Solution: designing customized topic models for short texts

27 如何解决数据稀疏问题?—缺乏辅助信息 biterm topic model word network topic modeldual sparse topic model self-aggregation topic model (SATM) clustering-based topic models Pseudo-document-based Topic Model (PTM) & Sparsity- enhanced PTM (SPTM)

28 意图挖掘 对轨迹数据进行处理以直接应用LDA 使用新颖的生成模型来处理序列模式挖掘 ADR-SPLDA T-BiLDAW4 (Who+Where+When+What)

29 Discovery of activity patterns using topic models (Ubicomp 2008)New method to recognize daily routines as a probabilistic combination of activity patterns Using probabilistic topic models to automatically discover the underlying activity patterns Applicable without user annotation

30 用户意图发现面临的挑战 如何实现有序位置的意图挖掘? 如何反映用户的个体和时空特征? 如何实现多源定位数据的增强和融合?

31 Intention of Movement:IoM假设轨迹是这样产生的:给定一个时间,首先采样用户隐含的 意图,然后根据采样的意图再采样合适的轨迹。 轨迹意图被认为是隐含元素,它和三种属性有关: 对象(用户的个人信息) 位置序列(空间信息) 位置序列的时间(时间信息) 意图对于位置序列,对象和时间分别存在一种分布,轨迹被认 为包含若干种意图。

32 THANKS!