Recently, a friend of mine has been doing research
on web mining of microblogs (Weibo), trying to derive association
rules and marketing recommendations. He said there are few studies on web
mining of microblogs. The main reasons are that posts are too short to carry
sufficient information, and that the vocabulary and phrasing used in
microblogs tend to be casual and informal, which hinders machine learning.
Figure 1. Shannon and Weaver's Model of Communication applied to Twitter Communication Channel
Weibo, as the Chinese version of the microblog, has been playing an increasingly important role in our daily life, just as Twitter has in Western society. I spent some time searching for articles about Twitter and found a study titled "An Observational Study of Physical Activity-Related Tweets", a PhD dissertation from Columbia University. The specific aims of this observational study of physical activity-related messages (Tweets) from the microblogging service Twitter were to determine the overall network structure and major communities among Tweet sources, and to describe Tweet contents. The research team applied web data mining methods, including social network analysis and n-gram based text mining, to discover network patterns among Tweet sources and the contents of 174,394 Tweets that mentioned at least one of 17 different physical activities.
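To give a feel for what n-gram based text mining means here, below is a minimal Python sketch that counts word bigrams across a handful of tweets. The tweets are made up for illustration, and the tokenizer and the function names `ngrams` and `count_ngrams` are my own assumptions, not the dissertation's actual pipeline.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of a text as a list of tuples (lowercased)."""
    # Very rough tokenizer: keeps letters, digits, hashtags, mentions, apostrophes.
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tweets, n=2):
    """Count how often each n-gram occurs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(tweet, n))
    return counts


# Made-up example tweets, not data from the study.
tweets = [
    "Going for a run this morning",
    "Morning run done, feeling great",
    "Going for a run after work",
]

bigrams = count_ngrams(tweets, 2)
for gram, freq in bigrams.most_common(3):
    print(" ".join(gram), freq)
```

Frequent n-grams such as "a run" hint at what people talk about, even though each individual post is very short.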
The primary framework underpinning this study is Shannon
and Weaver's mathematical model of communication (Shannon, 1948; Weaver &
Shannon, 1963), which was introduced by Prof. Chan earlier in our class. The
social network analysis also uses some indicators we are familiar with to demonstrate
that most physical activity Tweet networks are sparse, consisting of
many isolates and small groups (averages per network: 2,000 Tweet users,
density 0.00037, reciprocity 12.5%, total degree centralization 0.0113,
link count 970, isolates 743). The analysis yielded
graphical representations of Tweet communication network structures and network
measures, and identified key actors and communities. Key actors in communities
in most of the 17 physical activity networks were predominantly individuals
rather than organizations, healthcare providers, or governments.
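The indicators above can be computed by hand on a small network. Here is a rough Python sketch on a made-up six-user network. The formulas are the common textbook ones (directed density, share of links that are reciprocated, and Freeman's degree centralization on the undirected version of the graph); I am assuming these are close to what the study's software reports, and the exact definitions there may differ. The function name `network_measures` is hypothetical.

```python
def network_measures(nodes, links):
    """Compute simple measures for a directed network.

    nodes: iterable of user ids
    links: iterable of (source, target) pairs, e.g. who replies to whom
    """
    links = {(a, b) for a, b in links if a != b}  # drop self-loops and duplicates
    nodes = set(nodes) | {v for link in links for v in link}
    n = len(nodes)

    # Density: existing links divided by all possible directed links.
    density = len(links) / (n * (n - 1)) if n > 1 else 0.0

    # Reciprocity: share of links whose reverse link also exists.
    reciprocated = sum(1 for a, b in links if (b, a) in links)
    reciprocity = reciprocated / len(links) if links else 0.0

    # Degree on the undirected version of the graph (number of distinct neighbours).
    neighbours = {v: set() for v in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degrees = [len(neighbours[v]) for v in nodes]

    # Isolates: users with no links at all.
    isolates = sum(1 for d in degrees if d == 0)

    # Freeman degree centralization: 1 for a star, 0 when all degrees are equal.
    if n > 2:
        max_d = max(degrees)
        centralization = sum(max_d - d for d in degrees) / ((n - 1) * (n - 2))
    else:
        centralization = 0.0

    return {
        "density": density,
        "reciprocity": reciprocity,
        "isolates": isolates,
        "centralization": centralization,
    }


# Made-up network: one mutual pair, one one-way link, two isolates.
toy = network_measures("abcdef", [("a", "b"), ("b", "a"), ("c", "d")])
# A star network for comparison: the hub links to everyone else.
star = network_measures(["hub", "a", "b", "c", "d"],
                        [("hub", x) for x in "abcd"])
print(toy)
print(star)
```

In the toy network the centralization is low because nobody dominates, while the star network reaches the maximum value. That is the contrast behind the very small centralization reported in the study.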
The study results contribute
to advancing the methodological breadth of mining social media for
health-related purposes, and the study is also a good case study for
microblog research with other purposes.
[1] Sunmoo Yoon. Application of Social Network Analysis and Text Mining to Characterize Network Structures and Contents of Microblogging Messages: An Observational Study of Physical Activity-Related Tweets. Columbia University, 2011.

The result suggests that an individual can be more influential than an organization; I guess that's what SNS brings along. BTW, Weibo isn't equal to microblog; it's no more than a Chinese Twitter. As Weibo splits interaction into forward and comment, it may be a more complicated and interesting subject than Twitter.
Thanks for your reminder that microblogging is far more than Weibo. I searched Wikipedia, which says microblogging services offer features such as privacy settings, which allow users to control who can read their microblogs, or alternative ways of publishing entries besides the web-based interface. These may include text messaging, instant messaging, e-mail, digital audio or digital video.
I use both a blog and FB. Actually, I find it very useful to keep a blog as a diary; it reminds me of what I got out of a whole day's work. And maybe I post a few words on FB, but do you think microblogging has a big market?
Well~ Maybe I'll try to do some SNA on something I'm using and find out more~
Yes, you may consider applying SNA in your group project for this course. It will be a good way to practice what you have learnt.
Dear Ling, it's a very interesting topic. While others are still clarifying the concepts and principles of SNA, you have already paid attention to its application. It's really a combination of theory with reality. From the concept we know that degree centrality is between 0 and 1, and the result of 0.0113 in the research is very small, which means everyone involved in the Twitter communication shares nearly the same position, which is totally different from a star network. The reciprocity of 12.5% shows that their interactions are all frequent. Is my understanding right?
Dear Xuan, thank you for your challenging question about my blog post and the article I referred to.
Let me try to clarify with the definitions and explanations from the article.
Reciprocity: The proportion of links in a unimodal network that are reciprocated (Wellman, 1999). It indicates how many Tweet IDs respond to each other within a network.
Total degree centralization: The degree of distribution concentration in a network. Measurements closer to 0 indicate decentralized information flow, reflecting a distributed communication style among Tweet users.
From the explanation above, the two figures measure different things and are not necessarily consistent with each other; that's why they are combined to present a better analysis result.
Regarding the small degree centralization in the analysis result, the main reason is that the tweets had been filtered and only physical activity-related tweets were studied. It is not the typical star network we learned about before.
Great point. I believe that the study of Weibo, blogs or other social network websites could be really helpful in discovering the behavior patterns of human beings.
Wow. A nice article which gives us some facts about the real world of the social network on Twitter. It is interesting to find that the key nodes are individuals instead of organizations or societies.
Woo... web mining of microblog (Weibo)? The term is really interesting. Is it an advanced technology in modern society? I'm so curious about this. Thanks for your introduction to this novel method.
I think it's worth doing web mining on Weibo, because nowadays more and more people post on Weibo rather than write a long blog. Although a Weibo post has fewer words than a blog, it also contains useful information. So we really can learn something through Weibo.
u do have a really awesome friend!
and the diagram is interesting
Web mining of microblogs is really interesting. Microblogging is a much more effective social medium nowadays; it can influence millions of people in quite a short time, so it is worth doing web mining on microblogs. Good study!
> Key actors in communities in most of the 17 physical activity networks were predominantly individuals rather than organizations, healthcare providers, or governments.
Not surprised to see this. IMHO this is one of the reasons why microblogs or even social networks are so popular -- everyone can create content and get noticed.