Big Data is playing an increasingly important role in decision-making. It is helping policy-makers, scientists and researchers make informed decisions and better understand human interactions and motives. For the uninitiated, Big Data refers to large datasets that can be analysed to reveal patterns and trends relating to human behaviour and interactions.
Digital epidemiology, in particular, is gaining ground quickly. Broadly, it refers to a branch of study that uses digital interactions to assess the health of a particular population.
How many times have you searched for symptoms, treatments or cures for ailments? These queries can be mapped to a particular area, which gives a rough idea of the health of the population living there. Sounds cool, right?
Google Flu Trends and Google Dengue Trends are cases in point.
Taking this idea forward, a bunch of researchers from the University of California and Virginia Tech studied how sexual risk behaviors are communicated on real-time social networking sites and how Big Data might inform HIV prevention and detection.
The study had three specific goals:
- To check the possibility of extracting real-time data on conversations pertaining to HIV;
- To understand the prevalence and content of these interactions; and
- To check the feasibility of using the real-time data to monitor and detect HIV transmission.
A total of 2,157,260 geolocated tweets were analysed. A tweet was classed as sexual risk-related or drug risk-related if it contained one or more risk-related words. Here’s how they made sure that the tweets were properly filtered:
“A sample of the filtered tweets was manually checked to ensure they were accurately related to HIV risk behavior. The text of each tweet was processed to maximize sensitivity and specificity of content identification by filtering out tweets that contained co-occurring words that were not associated with HIV risk behaviors (such as removing tweets if “coke” included references to the drink instead of the drug). Based on these results, the list of words in the algorithm was refined to improve the accuracy of the tweets as being related to sexual risk.”
The algorithm collected 8,538 sexual risk-related tweets and 1,342 stimulant drug use-related tweets.
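The filtering step described above can be sketched roughly as follows. This is only an illustration of keyword matching with co-occurring-word exclusion, not the researchers' actual algorithm: the word lists, the `EXCLUSION_WORDS` mapping and the function names are all hypothetical, since the study's real lists are not reproduced here.

```python
import re

# Hypothetical word lists, for illustration only.
# The study's actual risk-related words are not given in this article.
SEX_RISK_WORDS = {"sex"}
DRUG_RISK_WORDS = {"coke", "meth"}

# Hypothetical co-occurring words that suggest a non-risk meaning,
# e.g. "coke" the drink rather than the drug.
EXCLUSION_WORDS = {"coke": {"diet", "drink", "pepsi", "zero"}}


def tokenize(text):
    """Lowercase the tweet and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def _has_risk_word(tokens, risk_words):
    """True if any risk word appears without one of its exclusion words."""
    for word in tokens & risk_words:
        if not (tokens & EXCLUSION_WORDS.get(word, set())):
            return True
    return False


def classify_tweet(text):
    """Return the set of risk labels that apply to a tweet (possibly empty)."""
    tokens = tokenize(text)
    labels = set()
    if _has_risk_word(tokens, SEX_RISK_WORDS):
        labels.add("sex_risk")
    if _has_risk_word(tokens, DRUG_RISK_WORDS):
        labels.add("drug_risk")
    return labels
```

With these made-up lists, `classify_tweet("Grabbing a diet coke after class")` returns an empty set because "diet" co-occurs with "coke", while a tweet that mentions "coke" without any exclusion word is labelled `drug_risk`. In practice the lists would be refined after manually checking a sample, as the quote describes.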
Results of the study
The results showed a significant positive relationship between the proportion of sex risk-related and drug risk-related tweets in an area and the HIV prevalence level in that area. This suggests that real-time social media data might be used for extracting, detecting, and remotely monitoring health-related attitudes and behaviors.
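To get a feel for what a "positive relationship" between two area-level measures means, here is a minimal sketch that computes a Pearson correlation coefficient. It is an assumption made for illustration: the article does not say which statistical model the authors used, and the function names below are hypothetical.

```python
from math import sqrt


def risk_tweet_proportion(risk_count, total_count):
    """Share of an area's tweets that were flagged as risk-related."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return risk_count / total_count


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Here `xs` would hold the proportion of risk-related tweets for each area and `ys` the HIV prevalence for the same areas. A value close to 1 indicates that areas with a higher share of risk-related tweets also tend to have higher prevalence. A real analysis would also need to account for confounders such as population size.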
The authors contend, “This study is important because it not only provides support for use of “big data” and information on where and how people are communicating about HIV risk behaviors online, but because it also provides support for a method of testing whether these data can be used for HIV surveillance.”
Reference(s):
Sean D. Young, PhD, MS, Caitlin Rivers, MS, and Bryan Lewis, PhD. "Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes."