Saturday, November 24, 2012

An SNA Application: A Study of Twitter


Recently, a friend of mine has been doing research on web mining of microblogs (Weibo), trying to derive association rules and marketing recommendations. He said there are few studies on web mining of microblogs, mainly because posts are too short to carry sufficient information and because the vocabulary and phrasing tend to be casual and informal, which hinders machine learning.

Figure 1. Shannon and Weaver's Model of Communication applied to Twitter Communication Channel

Weibo, a Chinese microblogging service, plays an increasingly important role in our daily lives, much as Twitter does in Western society. I spent some time searching for articles about Twitter and found a study titled An Observational Study of Physical Activity-Related Tweets [1], a PhD dissertation from Columbia University. The study examined physical activity-related messages (Tweets) on the microblogging service Twitter, and its specific aims were to determine the overall network structure and major communities among Tweet sources and to describe Tweet contents. The research team applied web data mining methods, including social network analysis and n-gram based text mining, to discover network patterns among the sources and contents of 174,394 Tweets that mentioned at least one of 17 different physical activities.
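
To give a feel for what n-gram based text mining means here, below is a minimal sketch in Python. It is not the dissertation's actual pipeline: the tokenizer, the function names `ngrams` and `count_ngrams`, and the three sample tweets are all my own assumptions, made up for illustration.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Split text into lowercase tokens and return its n-grams as tuples."""
    # Simple assumed tokenizer: keeps letters, digits, hashtags and mentions.
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tweets, n=2):
    """Count how often each n-gram occurs across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(tweet, n))
    return counts


# Invented sample tweets, not data from the study.
tweets = [
    "Going for a run this morning",
    "Morning run done, feeling great",
    "Going for a run after work",
]

bigram_counts = count_ngrams(tweets, n=2)
print(bigram_counts.most_common(3))
```

Frequent n-grams such as ("a", "run") hint at what people say about an activity even though each post is very short, which is roughly why counting short word sequences suits microblog text.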

The primary framework underpinning this study is Shannon and Weaver's mathematical model of communication (Shannon, 1948; Weaver & Shannon, 1963), which Prof. Chan introduced earlier in our class. The social network analysis also uses some indicators we are familiar with to demonstrate that most physical activity Tweet networks are sparse, consisting of many isolates and small groups (averages per network: 2,000 Tweet users, density 0.00037, reciprocity 12.5%, total degree centralization 0.0113, link count 970, and 743 isolates). The analysis yielded graphical representations of Tweet communication network structures, produced network measures, and identified key actors and communities. Key actors in communities in most of the 17 physical activity networks were predominantly individuals rather than organizations, healthcare providers, or governments.
The study results help advance the methodological breadth of mining social media for health-related purposes, and they also make a good case study for microblog research with other aims.
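
The measures quoted above (density, reciprocity, degree centralization, isolates) can be computed by hand on a toy network. The sketch below uses common textbook definitions, which may differ from the exact formulas in the software the dissertation used: density is links divided by possible directed links, reciprocity is the share of links whose reverse link also exists, and centralization is Freeman's degree centralization on the undirected version of the graph. The function name `network_measures` and the six-user network are my own assumptions, not data from the study.

```python
def network_measures(nodes, edges):
    """Compute basic SNA measures for a directed network (assumed definitions)."""
    nodes = list(nodes)
    n = len(nodes)
    # Drop self-loops and duplicate links.
    edge_set = {(u, v) for u, v in edges if u != v}
    m = len(edge_set)

    # Density: existing directed links over all possible directed links.
    density = m / (n * (n - 1)) if n > 1 else 0.0

    # Reciprocity: share of links whose reverse link also exists.
    reciprocated = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    reciprocity = reciprocated / m if m else 0.0

    # Undirected degree of each node (number of distinct neighbours).
    neighbours = {node: set() for node in nodes}
    for u, v in edge_set:
        neighbours[u].add(v)
        neighbours[v].add(u)
    degrees = {node: len(nbrs) for node, nbrs in neighbours.items()}

    # Freeman degree centralization: 1.0 for a perfect star, 0.0 if all equal.
    max_degree = max(degrees.values(), default=0)
    if n > 2:
        spread = sum(max_degree - d for d in degrees.values())
        centralization = spread / ((n - 1) * (n - 2))
    else:
        centralization = 0.0

    isolates = sorted(node for node, d in degrees.items() if d == 0)
    return {
        "density": density,
        "reciprocity": reciprocity,
        "centralization": centralization,
        "isolates": isolates,
    }


# Toy network: an arrow u -> v means user u replied to or mentioned user v.
users = ["a", "b", "c", "d", "e", "f"]
links = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]

result = network_measures(users, links)
print(result)
```

In this toy network two of the six users have no links at all, and only the a-b pair talks in both directions. The real Tweet networks are far sparser and far less centralized than this example, which fits the picture of many isolates and small groups.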


[1] Sunmoo Yoon. Application of Social Network Analysis and Text Mining to Characterize Network Structures and Contents of Microblogging Messages: An Observational Study of Physical Activity-Related Tweets. Columbia University, 2011.

14 comments:

  1. The result suggests that an individual can be more influential than an organization; I guess that's what SNS brings along. BTW, Weibo isn't the same thing as microblogging; it's no more than a Chinese Twitter. As Weibo splits interaction into forwards and comments, it may be a more complicated and interesting subject than Twitter.

    1. Thanks for the reminder that microblogging is far more than Weibo. I looked it up on Wikipedia, which says that microblogging services offer features such as privacy settings, which allow users to control who can read their microblogs, or alternative ways of publishing entries besides the web-based interface. These may include text messaging, instant messaging, e-mail, digital audio, or digital video.

    2. I use both a blog and fb. Actually, I find it very useful to keep a blog as a diary; it reminds me of what I got out of a whole day's work. I may post a few words on fb too, but do you think microblogging has a big market?

  2. Well~Maybe I'll try to do some SNA on something I'm using and find more~

    1. Yes, you may consider bringing an application of SNA into your group project for this course. It will be a good way to practice what you learnt.

  3. Dear Ling, it's a very interesting topic. While others are still clarifying the concepts and principles of SNA, you have already turned your attention to an SNA application. It really combines theory with reality. From the concept we know that degree centrality lies between 0 and 1, and the 0.0113 in your result is very small, which means everyone involved in the Twitter communication holds nearly the same position, totally different from a star network. The reciprocity of 12.5% shows their interactions are all frequent. Is my understanding right?

  4. Dear Xuan, thank you for your challenging question about my blog post and the article I referred to.

    Let me try to clarify with the definition and explanation from the article.

    Reciprocity: The proportion of links in a unimodal network that are reciprocated (Wellman, 1999). It indicates how many Tweet IDs respond to each other within a network.

    Total degree centralization: The degree to which the distribution of links is concentrated in a network. Measurements closer to 0 indicate a decentralized information flow, reflecting a distributed communication style among Tweet users.

    From the explanations above, the two measures capture different things and need not agree with each other, which is why they are combined to give a fuller analysis.

    As for the small degree centralization in the results, the main reason is that the tweets were filtered so that only physical activity-related tweets were studied. It is not the usual star network we are familiar with.

  5. Great point. I believe that studying Weibo, blogs, or other social network websites could be really helpful in discovering patterns of human behavior.

  6. Wow. A nice article that gives us some real-world facts about the social network on Twitter. It is interesting to find that the key nodes are individuals instead of organizations or societies.

  7. woo...web mining of microblog (weibo)? The term is really interesting. Is it an advanced technology in modern society? I'm so curious about this. Thanks for your introduction to this novel method.

  8. I think it's worth doing web mining on Weibo, because nowadays more and more people post on Weibo rather than write a long blog. Although a Weibo post has fewer words than a blog, it also contains useful information. So we really can learn something through Weibo.

  9. u do have a really awesome friend!

    and the diagram is interesting

  10. Web mining of microblogs is really interesting, and microblogging is a much more effective social medium nowadays; it can influence millions of people in quite a short time, so it is worth doing web mining on microblogs. Good study!

  11. > Key actors in communities in most of the 17 physical activity networks were predominantly individuals rather than organizations, healthcare providers, or governments.
    Not surprised to see this. IMHO this is one of the reasons why microblogs, and social networks in general, are so popular: everyone can create content and get noticed.
