New Data Analysis Tool Developed

A recently published paper in Science reports a new data-analysis tool that can search complex data sets for relationships and trends that are invisible to other types of statistical analysis.
Take, for example, bacterial species that colonize the gut of humans and other mammals: There are trillions of bacteria; even narrowing down the data set to just seven thousand yields over 22 million potential relationships between assorted pairs of bacteria. How can microbiologists keep themselves from drowning in such a huge sea of data, and know in advance what kinds of patterns to look for? Challenges like this are faced not only by microbiologists: Large, complex data sets with thousands of variables are increasingly common in fields such as genomics, physics, political science, economics and more, and there is thus an increasing need for data-analysis tools to make sense of them.
The tool, developed by Yakir Reshef, David Reshef and Hilary Finucane under the guidance of advisers Michael Mitzenmacher of the Harvard University School of Engineering and Applied Sciences and Pardis Sabeti of the Broad Institute, is called the maximal information coefficient, or MIC. It is based on the idea that if two variables are related, there should be a way to draw a grid on their scatterplot that captures the relationship. The algorithm that calculates the MIC searches through many such grids and keeps the one that best quantifies the strength of the relationship. Researchers can compute the MIC for every pair of variables in a data set, rank the pairs by score (the higher the score, the stronger the relationship) and then examine the top-scoring pairs, that is, the pairs that are most strongly associated.
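To make the grid idea concrete, here is a minimal Python sketch of a MIC-like score and the rank-every-pair workflow. It is a simplified approximation, not the authors' algorithm: the published method optimizes where the grid lines go, while this sketch only tries equal-frequency (quantile) grids on both axes, so it can only underestimate the exhaustive score. For each grid of kx by ky cells it computes the mutual information between the two binned variables, divides by log2(min(kx, ky)) so scores fall between 0 and 1, and keeps the best. The cap of n**0.6 total cells mirrors the default reported for MIC but should be treated as an assumption here, and the function names (`approx_mic`, `rank_pairs`) are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def quantile_bins(values, k):
    """Assign each value to one of k equal-frequency bins by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def mutual_information(xb, yb):
    """Mutual information (in bits) between two discrete labelings of the same points."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def approx_mic(x, y, alpha=0.6):
    """Best normalized MI over quantile grids with kx * ky <= n**alpha (a simplification of MIC)."""
    max_cells = max(4, int(len(x) ** alpha))
    best = 0.0
    for kx in range(2, max_cells // 2 + 1):
        xb = quantile_bins(x, kx)
        for ky in range(2, max_cells // kx + 1):
            mi = mutual_information(xb, quantile_bins(y, ky))
            best = max(best, mi / math.log2(min(kx, ky)))
    return best


def rank_pairs(data):
    """Score every pair of named variables and sort from strongest to weakest."""
    scores = [(approx_mic(data[a], data[b]), a, b) for a, b in combinations(data, 2)]
    return sorted(scores, reverse=True)


# Usage: a made-up data set with one non-linear relationship and one unrelated variable.
random.seed(0)
xs = [-1 + 2 * i / 999 for i in range(1000)]
data = {
    "x": xs,
    "parabola": [v * v for v in xs],
    "noise": [random.random() for _ in xs],
}
for score, a, b in rank_pairs(data):
    print(f"{a:>8} ~ {b:<8} {score:.2f}")
```

The x/parabola pair should rank first even though its Pearson correlation is essentially zero (x is symmetric about 0), which is the kind of relationship a linear measure misses. Scoring all pairs is quadratic in the number of variables, which is why a data set with thousands of variables quickly produces millions of candidate pairs.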
To test how well the algorithm works, Yakir, David and Hilary applied MIC to data sets from a variety of fields (global health, gene expression, human gut microbiota and even major-league baseball) and compared its results with those of existing methods.
How did they fare? On the microbiota data, MIC narrowed 22 million variable pairs down to just a few hundred interesting relationships, many of which had not been observed before. For instance, it identified “non-coexistent” species pairs: when one bacterium is abundant, the other is not, and vice versa. Some of these non-coexistence relationships were familiar, known to be caused by differences in host diet, while others were novel. This finding suggests that additional factors, like diet, may shape the make-up of the human microbiome.

In another example, the team examined a World Health Organization data set covering 200 countries, with 357 variables per country. One of the relationships identified was between female obesity and household income in the Pacific Islands, where obesity increases with income, in contrast with other countries. It turns out that in the Pacific Islands obesity is considered a sign of status. Most methods would treat this separate trend as “noise,” but MIC can identify relationships, such as this one, that are made up of more than one trend.
MIC is part of a suite of statistical tools called MINE, for Maximal Information-based Nonparametric Exploration. One of the greatest strengths of this new tool within MINE is its ability to detect a broad spectrum of patterns and characterize them by several different properties a researcher might be interested in. Other statistical tools work well when searching a large data set for one specific kind of pattern, but cannot score and compare different kinds of relationships. Researchers can also use MINE to generate new ideas and connections.
Sourced and adapted from:

R-loops break walls of gene silencing

From: http://www.news.ucdavis.edu/search/news_detail.lasso?id=10165

Researchers at the University of California, Davis, have figured out how the human body keeps essential genes switched “on” and silences the vast stretches of genetic repeats and “junk” DNA.

Frédéric Chédin, associate professor in the Department of Molecular and Cellular Biology, describes the research in a paper published today (March 1) in the journal Molecular Cell. The work, which sheds light on the gene-silencing process known as cytosine methylation, could lead to treatments for lupus and other autoimmune diseases.

“R-loops” are the key, say graduate student Paul Ginno, Chédin and colleagues. The loops form during transcription of DNA regions that are rich in cytosine and guanine, the C and G of the four-letter DNA code. These C- and G-rich stretches serve as “on” switches, or promoters, for about 60 percent of human genes.

Scientists have known since the 1980s that these so-called CG island promoters are not subject to methylation. But, Chédin said, the mechanism has been a long-standing mystery.

The UC Davis researchers built a catalog of almost 8,000 CG islands in the human genome, analyzed their DNA sequences and found a strand asymmetry: one strand of the double helix tends to be rich in guanine, and the complementary strand rich in cytosine.
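Strand asymmetry of this kind is commonly summarized as GC skew, (G − C) / (G + C), computed on one strand. The Python sketch below is a generic illustration, not the UC Davis pipeline: the window and step sizes are arbitrary assumptions, and the toy sequence is made up rather than taken from a real CG island. Because complementing a strand swaps its G and C counts, the complementary strand always has the same skew with the opposite sign, which is the G-rich/C-rich pairing described above.

```python
def gc_skew(seq):
    """GC skew (G - C) / (G + C) of one strand; 0.0 if the sequence has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def reverse_complement(seq):
    """Complementary strand read 5' to 3' (only A, C, G, T and N are handled)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def sliding_gc_skew(seq, window=100, step=50):
    """(start, skew) for each window; window and step sizes are illustrative assumptions."""
    last_start = max(len(seq) - window, 0)
    return [(i, gc_skew(seq[i:i + window])) for i in range(0, last_start + 1, step)]


# Usage with a made-up toy sequence (not a real CG island).
toy = "GGGAGGCGGGAGCGGAGGTG"
print(gc_skew(toy), gc_skew(reverse_complement(toy)))
print(sliding_gc_skew(toy, window=10, step=5))
```

A positive skew on one strand paired with a negative skew on its complement marks the kind of region where, according to the study, the G-rich RNA can pair with the C-rich DNA strand during transcription.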

During transcription, the G-rich RNA remains stably bound to the C-rich DNA strand, forcing the G-rich DNA strand out into a loop, which in turn prevents methylation.

DNA methylation is considered part of the new field of epigenetics, which studies heritable changes in gene activity that are not directly coded in the DNA sequence. The new work shows, however, that at least at CG islands, the epigenetic state is determined by the DNA sequence.

Scientists know that reduced methylation of DNA plays a key role in triggering autoimmunity in lupus, Chédin said. However, the molecular events behind this DNA under-methylation have been unclear.

“Our work establishes that excessive R-loop formation may drive under-methylation and autoimmunity,” Chédin said.

Co-authors: Paul Lott, graduate student; Holly Christensen, undergraduate; and Ian Korf, associate professor in the Department of Molecular and Cellular Biology and the Genome Center.

The National Institutes of Health and the Foundation for Prader-Willi Research supported the project.

Computational Science: Many Scientists Have Poor Coding Skills

“When hackers leaked thousands of e-mails from the Climatic Research Unit (CRU) at the University of East Anglia in Norwich, UK, last year, global-warming sceptics pored over the documents for signs that researchers had manipulated data. No such evidence emerged, but the e-mails did reveal another problem — one described by a CRU employee named ‘Harry’, who often wrote of his wrestling matches with wonky computer software….”

Excerpt from Nature.com/news. Read more: http://bit.ly/bpZBOD
