Recently, a friend of mine has been doing research on web mining of microblogs (Weibo), trying to derive association rules and marketing recommendations. He said there are few studies on web mining of microblogs, mainly because posts are too short to carry sufficient information and because the vocabulary and phrasing used in microblogs tend to be casual and informal, which hinders machine learning.
Figure 1. Shannon and Weaver's Model of Communication applied to Twitter Communication Channel
Weibo, the Chinese counterpart of Twitter, has been playing an increasingly important role in our daily life, just as Twitter has in Western society. I spent some time searching for articles about Twitter and found a study titled An Observational Study of Physical Activity-Related Tweets [1]. It is a PhD dissertation from Columbia University. The specific aims of this observational study of physical activity-related messages (Tweets) from the microblogging service Twitter were to determine the overall network structure and major communities among Tweet sources, and to describe Tweet contents. The research team applied web data mining methods, including social network analysis and n-gram based text mining, to discover network patterns among Tweet sources and the contents of 174,394 Tweets that mentioned at least one of 17 different physical activities.
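To make "n-gram based text mining" concrete, here is a minimal sketch of counting the most frequent word n-grams in a handful of messages. It is only an illustration, not the procedure used in the dissertation: the function names, the whitespace tokenization, and the sample tweets are all my own assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(tweets, n=2, k=3):
    """Count word n-grams across all tweets and return the k most frequent."""
    counts = Counter()
    for text in tweets:
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        tokens = text.lower().split()
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical sample messages, made up for this example.
sample_tweets = [
    "went running this morning",
    "running this morning felt great",
    "yoga class this morning",
]

if __name__ == "__main__":
    for gram, count in top_ngrams(sample_tweets, n=2, k=3):
        print(" ".join(gram), count)
```

On real microblog text, the tokenization step is where the short and informal wording mentioned above causes the most trouble, so a real pipeline would need more than a whitespace split.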
The primary framework underpinning this study is Shannon and Weaver's mathematical model of communication (Shannon, 1948; Weaver & Shannon, 1963), which Prof. Chan introduced earlier in our class. The social network analysis also uses some indicators we are familiar with to demonstrate that most physical activity Tweet networks are sparse, consisting of many isolates and small groups (averages per network: 2,000 Tweet users, density 0.00037, reciprocity 12.5%, total degree centralization 0.0113, link count 970, and 743 isolates). The analysis yielded graphical representations of Tweet communication network structures, produced network measures, and identified key actors and communities. Key actors in communities in most of the 17 physical activity networks were predominantly individuals rather than organizations, healthcare providers, or governments.
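As a reminder of what some of these indicators measure, the sketch below computes link count, density, reciprocity, and isolates for a tiny directed network. It uses common textbook definitions (density as links divided by n(n-1), reciprocity as the share of links that are returned), which I am assuming here; the dissertation may define them differently. The function name and the toy data are hypothetical.

```python
def network_measures(nodes, edges):
    """Basic measures for a directed network given as node names and (source, target) pairs."""
    nodes = set(nodes)
    # Drop self-loops and duplicate links so each directed pair counts once.
    edges = {(u, v) for u, v in edges if u != v}
    n = len(nodes)

    # Density: observed links over the n * (n - 1) possible directed links.
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0

    # Reciprocity: share of links whose reverse link also exists.
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    reciprocity = mutual / len(edges) if edges else 0.0

    # Isolates: nodes that send and receive no links.
    connected = {node for edge in edges for node in edge}
    isolates = nodes - connected

    return {
        "link_count": len(edges),
        "density": density,
        "reciprocity": reciprocity,
        "isolates": isolates,
    }


# Toy network, made up for this example: a and b mention each other, a mentions c.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "a"), ("a", "c")]

if __name__ == "__main__":
    print(network_measures(toy_nodes, toy_edges))
```

Even in this toy network two of the five users are isolates, which is the same kind of sparseness the study reports at a much larger scale.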
The study's results help advance the methodological breadth of mining social media for health-related purposes, and the study is also a good case study for microblog research with other goals.
[1] Sunmoo Yoon. Application of
Social Network Analysis and Text Mining to Characterize Network Structures and
Contents of Microblogging Messages: An Observational Study of Physical
Activity-Related Tweets. Columbia University, 2011.





