Friday, August 29, 2014

Genomic cold fusion? Part II. Realities of mapping

Mapping to find genomic causes of a trait of interest, like a disease, is done when the basic physiology is not known—maybe we have zero ideas, or the physiology we think is involved doesn’t show obvious differences between cases and controls.  If you know the biology, you won't have to use mapping methods, because you can explore the relevant genes directly.  Otherwise, and today often, we have to go fishing, in the genome, to find places that may vary in association—statistical regularity—with the trait.

The classical way to do this is called linkage analysis.  That term generally refers to tracing cases and marker variants in known families.  If parents transmit a causal allele (variant at some place in the genome) to their children, then we can find clusters of cases in those families, but no cases in other families (assuming one cause only).  We have Mendel’s classical rules for the transmission pattern and can attempt to fit that pattern to the data—for example, to exclude some non-genetic trait sharing.  After all, family members might share many things just because they have similar interests or habits.  Even disease can be due to shared environmental exposures. Mendelian principles allow us, with enough data, to discriminate.

“Enough data” is the catch.  Linkage analysis works well if there is a strong genetic signal.  If there is only one cause, we can collect multiple families and analyze their transmission patterns jointly.  Or, in some circumstances, we can collect very large, multi-generational families (often called pedigrees) and try to track a marker allele with the trait across the generations.  This has worked very well for some very strong-effect variants conferring very high risk for very specific, even quite rare, disorders.  That is because linkage disequilibrium, the association between a marker allele and a causal variant due to their shared evolutionary history (as described in Part I), ties the two together in these families.

But it is often very costly or impractical to collect actual large pedigrees that include many children each generation, and multiple generations.  Family members who have died cannot be studied and medical records may be untrustworthy, or family members may have moved, refuse to participate in a study, or be inaccessible for many reasons.  So a generation or so ago the idea arose that if we collect cases from a population we may also collect copies of nearby marker alleles in linkage disequilibrium (shared evolutionary history in the population) with the causal variant.  As described in Part I, the marker allele will have been transmitted through many generations of an unknown but assumed pedigree along with the causal variant.  This is implicit linkage analysis, called genomewide association analysis (GWAS), about which we’ve commented many times in the past.  GWAS looks for association between marker and causal site in implicit but assumed pedigrees, and is another form of linkage analysis.

When genetic causation is simple enough, this will work.  Indeed, it is far easier and less costly to collect many cases and controls than many deep pedigrees, so that a carefully designed GWAS can identify causes that are reasonably strong.  But this may not always work, when a trait is ‘complex’, and has many different genetic and/or environmental contributing causes.

If causation is complex, families provide a more powerful kind of sample to use in searching for genetic factors.  The reason is simple: in general a single family will be transmitting fewer causal variants than a collection of separate families.  Related to this is the reason that isolate populations, like Finland or Iceland, can in principle be good places to search, because they represent very large, even if implicit, pedigrees.  Sometimes the pedigree can actually be documented in such populations.

If causation is complex, then linkage analysis in families will hopefully be better than big population samples for finding causal contributors, simply because a family will be segregating (transmitting) fewer different causal variants than a big population.  We might find the variant in linkage analysis in a big family, or an isolate population, but of course if there are many different variants, a given family may point us only to one or two of them.  For this reason, many argue that family analysis is useless for complex traits; one commenter on a Tweet we made from our course likened linkage analysis for complex traits to ‘cold fusion’.  In fact, that dismissal is a mistake.

Association analysis, the main alternative to linkage analysis, is just a combining of many different implicit families, for the population-history reason we’ve described here and in Part I.  The more families you combine, whether they are explicit or implicit, the more variation, including statistical ‘noise’, you incorporate.  The rather paltry findings of many GWAS are a testament to this fact, explaining as they have only a small fraction of most traits to which that method has been applied.  Worse, the greater the sample of this type, like cases vs controls, the more environmental variation you may be grouping together, again greatly watering down even the weak signal of many or, probably, by far most genetic causal factors.

In fact, if you are forced to go fishing for genetic cause, you may well be fishing in dreamland because you may simply be in denial of the implications of causal complexity.  But all mapping is a form of linkage analysis, so rather than dismissing one version of it, one should tailor one’s approach to the realities of data and trait.  Some complex trait genes have been found by linkage analysis (e.g., the BRCA breast-cancer associated genes), though of course here we might quibble about the definition of 'complexity'.

Sneering at linkage analysis because it is difficult to get big families, or because even single deep families may themselves be transmitting multiple causes (as is often found in isolate studies, in fact), is often simply a circle-the-wagons defense of Big Data studies that capture huge amounts of funding with relatively little payoff to date.

A biological approach?
Many linkage and association analyses are done because we don’t understand the basic biology of a trait well enough to go straight to ‘candidate’ genes to detect, prevent, or develop treatment for a trait.  Today, even though this approach has been the rule for nearly 20 years now, with little payoff, the defense is often still that more, more and even more data will solve the problem.  But if causation is too complex this can also be a costly, self-interested, weak defense.

The hope is that if we have whole genome sequence on huge numbers of people, or even everyone in a population, or in many populations so we can pool data, we will find the pot of gold (or is it cold fusion?) at the end of the rainbow.

One argument for this is to search population-wide genome sequenced biomedical data bases for variants that may be transmitted from parents to offspring, but that are so rare that they cannot generate a useful signal in huge, pooled GWAS studies.  This usually will still be in the form of linkage analysis if a marker in a given causal gene is transmitted with the trait in occasional families but the same gene is identified, even if via different families.  That is, if variation in the same gene is found to be involved in different individuals, but with different specific alleles, then one can take that gene seriously as a causal candidate.

This sometimes works, but usually only when the gene’s biology is known enough to have a reason to suspect it.  Otherwise, the problem is that so much is shared between close family members (whether implicitly or explicitly in known pedigrees) that if you don’t know the biology there will be too much to search through, too much co-transmitted variation.  Causal variation need not be in regular ‘genes’, but can be, and for complex traits seems typically to be, in regulatory or other regions of the genome, whose functional sites may not be known.  Also, we all harbor variation in genes that is not harmful, and we all carry ‘dead’ genes without problems, as many studies have now shown.

If one knows enough biology to suspect a set of genes, and finds variants of known effect (such as truncating a gene’s coding region so a normal protein isn’t made) in different affected individuals, then one has strong evidence s/he has found a target gene.  There are many examples of this for single-gene traits.  But for complex traits, even most genes that have been identified have only weak effects: most of the time the same variant is also found in healthy, unaffected individuals.  In this case, which seems often to be the biological truth, there is no big-cause gene to be found, or a gene has a big effect only in the presence of some unusual genotypes in the rest of the genome.

Even knowing the biology doesn't say whether a given gene's protein code is involved rather than its regulation or other related factors (like making the chromosomal region available in the right cells, downregulating its messenger RNA, and other genome functions).  Even in multiple instances of a gene region, there may be many nucleotide variants observed among cases and controls.  The hunt is usually not easy even knowing the biology--and this is, of course, especially true if the trait isn't well-defined, as is often the case, or if it is complex or has many different contributors.

Big Data, like any other method, works when it works.  The question is when and whether it is worth its cost, regardless of how advantageous for investigators who like playing with (or having and managing) huge resources.  Whether or not it is any less ‘cold fusion’ than classical linkage analysis in big families, is debatable.  

Again, most searches for causal variation in the genome rest on statistical linkage between marker sites and causal sites due to shared evolutionary history.  Good study design is always important.  Dismissal of one method over another is too often little more than advocacy of a scientist’s personal intellectual or vested interests.

The problem is that complex traits are properly named:  they are complex. Better ideas are needed than what are being proposed these days.  We know that Big Data is ‘in’ and the money will pour in that direction.  From such data bases all sorts of samples, family or otherwise, can be drawn.  Simulation of strategies (such as with programs like our ForSim that we discussed in our recent Logical Reasoning course in Finland) can be done to try to optimize studies. 

In the end, however, fishing in a pond of minnows, no matter how it’s done, will only find minnows. But these days they are very expensive minnows.

Thursday, August 28, 2014

Genomic cold fusion? Part I. Rational and irrational aspects of mapping

I’m sitting here on a smooth, quiet train from Zurich to Innsbruck, a few days after the mini-course that we taught in Helsinki. In this post I want to make a few reflections on things said by people reacting to Facebook or Twitter messages about the course, comments that were too short to do justice to what we actually said.

In particular, the issues have to do with the nature of genome mapping strategies and what they are or mean.  There seems to be a good bit of confusion in this area, perhaps because of a lack of proper explanation of what these methods do, and why and how they work.

First, nobody should be doing mapping, looking for genes causally responsible for traits, unless they have some legitimate reason for believing that a trait is substantially affected by genes—that is, that variation in the trait or risk of a trait like a disease is causally associated with variation in a particular spot in the genome.  Such a reason, at best, would be that the trait seems to segregate in families as if caused by a single Mendelian factor.  If the evidence is weaker than that—as it so often is—then mapping becomes the more problematic.

If we don’t know the part of the genome that affects the trait, then we use many measured variable sites, called markers, that span the genome with the idea that wherever the causal site is, it will be near one of our markers.  Essentially, that is, we are searching for statistically significant associations between the marker and trait, based on some basically subjectively chosen measure, like a p-value, in samples that we believe are appropriate for detecting causal effects.
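
To make "searching for statistically significant associations" a bit more concrete, here is a minimal sketch of one common kind of test: a Pearson chi-square test (1 degree of freedom) on a 2x2 table of marker-allele counts in cases and controls.  The function name and the counts are made up for illustration, and the sketch assumes every row and column total is nonzero.  It is not meant as a description of what any particular mapping package does.

```python
import math

def allele_association(case_a, case_b, control_a, control_b):
    """Pearson chi-square test on a 2x2 table of allele counts.

    case_a / case_b: counts of the two marker alleles among case chromosomes.
    control_a / control_b: the same counts among control chromosomes.
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    row_sums = [case_a + case_b, control_a + control_b]
    col_sums = [case_a + control_a, case_b + control_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and status were unrelated.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 60 vs 40 in cases, 40 vs 60 in controls.
chi2, p = allele_association(60, 40, 40, 60)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Where to draw the line on the resulting p-value is, as noted, a basically subjective choice, and it gets much more stringent when many markers are tested at once.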

What is perhaps not widely appreciated, is the nearly essential way that such searches rely on evolutionary assumptions.  We say ‘nearly’ because if one happens by huge luck to genotype the causal site itself, the test for association may be a bit more direct, as we’ll try to explain.

Mapping is based on evolutionary history
Evolution, or population history, generates the variation that causes the trait effect, and the variation we use as markers.  Mutational events generating these variants occur when they occur, and we choose markers based on the idea that they vary in our chosen type of sample, and that the instances of a given marker allele (variant) are descendant copies of some original mutation.  These instances of the same allele are said to be identical by descent (IBD) from that common ancestral copy.  Sets of instances of the marker also mark nearby chromosomal regions that have been passed down the same chain of descent.  That shared region is called a haplotype, and it gradually shortens over the post-mutation generations by a process called recombination.

If at some later time in the history of the haplotype ‘tagged’ by the marker variant another mutation occurs in a gene and alters that gene’s effects to generate the trait we are interested in, then the marker variant will be present in subsequent descendant copies of that twice-hit haplotype, and the causal signal will be associated with the presence of the marker variant.  This is called linkage disequilibrium (LD), and is the reason that mapping works.  That is, mapping works because of shared evolutionary (population) history of the marker and causal variants.
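
One standard way to put a number on LD is to compare how often the marker and causal variants actually sit on the same chromosome with how often they would if they were independent.  The sketch below computes the usual coefficient D and the squared correlation r² from counts of the four possible haplotypes.  The allele labels M (marker) and C (causal), the function name, and the counts are hypothetical, and the sketch assumes both sites are variable in the sample (no allele frequency of exactly 0 or 1).

```python
def ld_stats(n_mc, n_m_only, n_c_only, n_neither):
    """Linkage disequilibrium between a marker allele M and a causal allele C.

    n_mc:      chromosomes carrying both M and C
    n_m_only:  chromosomes carrying M but not C
    n_c_only:  chromosomes carrying C but not M
    n_neither: chromosomes carrying neither
    Returns (D, r_squared).
    """
    total = n_mc + n_m_only + n_c_only + n_neither
    p_mc = n_mc / total                  # frequency of the M-C haplotype
    p_m = (n_mc + n_m_only) / total      # frequency of marker allele M
    p_c = (n_mc + n_c_only) / total      # frequency of causal allele C

    d = p_mc - p_m * p_c                 # excess over independence
    r_squared = d * d / (p_m * (1 - p_m) * p_c * (1 - p_c))
    return d, r_squared

# Hypothetical sample: C is rare and found only on M-bearing chromosomes.
print(ld_stats(10, 40, 0, 50))
# Hypothetical sample with no association at all.
print(ld_stats(25, 25, 25, 25))
```

In the first sample every copy of C sits on an M-bearing chromosome, yet r² is well below 1 because most M-bearing chromosomes lack C.  That is the usual situation when the causal mutation is much younger than the marker.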

A hypothetical, simple example
[I’m continuing this post a couple of days from when I started it on the train to Innsbruck, and now finishing it in a nice hotel in Old Town, overlooking the Inn river.  Beautiful!]

Let’s say that we have a marker at which some people have a G nucleotide and others a T.   And let’s say the disease causal site, D, is near the G/T site, and that the D mutation, wherever it is on the chromosome, occurred on a copy of the chromosome that has the G on it at the marker site.  Then, what we hope is that the disease will be associated with the G: that sufficiently more people with the disease will have the G than people without the disease.  This is the kind of association between trait-cause and marker that mapping is looking for.  But what can make it happen?

If we’re lucky everyone with the D allele at the causal site will have the trait (the ‘D’ mutation is fully penetrant, as we’d say).  And if there has been no recombination, and no other way to get the trait, then nobody with a T at the marker will also have the D variant—none of the T-bearers will have the disease.  Cases will have the G, controls the T.

This sort of perfect association depends on when the D-mutation, wherever it is on the chromosome, occurred relative to the mutation that produced the T at the marker.  We usually pick marker sites because we know that the variation (here, G vs T) is common in the population, and that means that the mutation is rather old.  Enough generations have passed for there to be a substantial fraction of T-bearing, and G-bearing people in the population.

If the ‘D’ mutation occurred right after this G-T marker’s mutation, then all copies of the G variant at the marker will also carry the trait-causing D variant.  But if the trait-mutation occurred much later, then only a few of the G-bearing chromosomes will carry the D variant.  The association, even if true, will be weak.  If the D-site is far from the G-T marker site, the D mutation may have occurred long enough ago for most G-bearers also to have the trait, but there’s a trap: in this case there will also have been enough time for recombination to switch the D-site onto a T-bearing marker chromosome.  The G-D association will no longer be perfect.
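
The 'trap' can be put in rough numbers.  Under the textbook idealization of a very large, randomly mating population, the expected association between two sites shrinks by a factor of (1 - c) each generation, where c is the recombination fraction between them.  The sketch below just evaluates that formula; the function name and the example values of c and the number of generations are assumptions chosen for illustration, and real populations (small, structured, or recently expanded) will depart from it.

```python
def ld_remaining(recomb_fraction, generations):
    """Expected fraction of the original marker-cause association left.

    recomb_fraction: per-generation recombination fraction between the
                     marker and the causal site (0 = never separated,
                     0.5 = effectively unlinked).
    generations:     generations since the causal mutation arose.
    """
    return (1.0 - recomb_fraction) ** generations

# A close marker versus a more distant one, over recent and older history.
for c in (0.0001, 0.01):
    for t in (10, 100, 1000):
        print(f"c={c}, generations={t}: {ld_remaining(c, t):.3f} of the association remains")
```

The old mutation that has had time to become common is also the one whose association with any but the very closest markers has had time to erode.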

Likewise, if there are many different causes of the trait, then some cases will not be due to the D-variant (tagged by the G-allele at the nearby marker), even if the latter really is also a cause.  We’ll have cases with the T-marker variant, and in this case it’s not because of recombination.  The more causes of the trait, the weaker the association between the trait and any specific marker, like the G-T one.
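
A back-of-the-envelope sketch shows how quickly this dilution bites.  It assumes, purely for illustration, that cases caused by D always carry the G (no recombination, full penetrance), while cases due to any other cause carry the G only at its ordinary population frequency, as controls do.  The function name and the numbers are hypothetical.

```python
def g_freq_in_cases(frac_cases_from_d, g_freq_background):
    """Expected frequency of the G marker allele among cases.

    frac_cases_from_d:  fraction of cases actually caused by the D variant
                        (assumed always to sit on a G-bearing chromosome).
    g_freq_background:  frequency of G on chromosomes not carrying D,
                        taken here to be the same as in controls.
    """
    return frac_cases_from_d + (1.0 - frac_cases_from_d) * g_freq_background

background = 0.30  # assumed frequency of G in controls
for frac in (1.0, 0.5, 0.1, 0.01):
    cases = g_freq_in_cases(frac, background)
    print(f"D explains {frac:.0%} of cases: G frequency is {cases:.3f} in cases "
          f"vs {background:.3f} in controls")
```

Under these assumptions the case-control difference is simply the fraction of cases due to D times (1 - background frequency), so a variant responsible for one percent of cases shifts the marker frequency by well under one percentage point.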

Science or cold fusion?
So mapping is a multiple-edged sword.  Now, there are several ways to try to find trait-associated parts of the genome.  One is called linkage mapping, another association mapping (genomewide association, or GWAS).  And one can also think that causal sites can be found not by relying on linkage disequilibrium, but simply by looking for causal variants directly.

These various strategies have their strong and weak points, and there is just as strong disagreement as to which to apply when.  That’s why someone can, sometimes sneeringly, claim that this or that approach is ‘cold fusion’: that it’s imaginary, and won’t or can’t work.  But mapping for complex traits is not doing very well: as we’ve posted many times (and many others have repeatedly observed), we are usually explaining only a rather small if not trivial fraction of causation by mapping.  So the issues are serious, regardless of the vested interests of those contending with them.

In our next post we’ll discuss some of these issues about methods.

Choose a blog post and vote!

Choose your favorite blog post among the nominees at 3 Quarks Daily, and vote here!  (There are 3 MT nominees -- just sayin'!)

Wednesday, August 20, 2014

Blogging isn't catastrophic, but the opposite could be.

Ken and I just had an article published in Evolutionary Anthropology:

Catastrophes in evolution: Is Cuvier's world extinct or extant?

It's open access, so no need for a subscription to read it.

It's the second one we've done (first one is here). The piece is largely the product of many discussions we've had, mainly over email, and these discussions were sparked by posts we had each written for the MT.

Beyond how satisfying it was to have these discussions with Ken and to write this paper with him, it was a great excuse to read Elizabeth Kolbert's articles in The New Yorker (here and here) as well as her wonderful book that accompanies them:


Although the subtitle's irksome if you're not keen on separating human behavior from nature, the book's incredibly insightful. And, it's captivating if you just love tales of exploration and discovery, and if you eat up details about kit, gear and extraordinary travel conditions. It was sometimes difficult to read through my jealousy, and I consider that reason alone to recommend this read, regardless of the compelling scientific history, the exciting albeit depressing cutting-edge knowledge, as well as the important political message that only peeks out, from under the enormous pile of scientific evidence, in her final paragraphs.

It's because of our ongoing discussions and writings and then also Kolbert's, that Ken and I got to thinking about whether and how extinction, background and mass extinctions, and especially Cuvier's pre-Darwinian notions of "catastrophism" are playing out in paleoanthropology right now. This is the overall theme of our piece linked above.

Kolbert deals briefly with Neanderthals near the end of her book. However, Ken and I weren't so much concerned with what happened to the Neanderthals as whether, for instance, we could fairly consider what happened to them to be "extinction" given what we know about their DNA living on inside, probably, billions of us today. And, because of those genetic circumstances, it naturally made us wonder whether anything we call "extinct" truly is and if it is, how could we know? This of course begs for a thoughtful consideration of species and adaptation and, seemingly, all the ol' evolutionary chestnuts that are terribly difficult to crack.

I don't think that what Ken and I contributed in Evolutionary Anthropology was far different from anything that could have occurred before blogs were invented, but blogging certainly did facilitate it. What's more, if I didn't have The Mermaid's Tale, if I wasn't routinely reading it and writing for it, I probably wouldn't be thinking this regularly and this deeply about many of these marvelous things in the first place, especially not with the unimaginably wonderful benefit of engaging with Anne and Ken.  What a catastrophe that would be.

Tuesday, August 19, 2014

Nominate a blog post for the 3 Quarks Daily science writing prize

If you write about science or if you read about science, and if you like making new friends, earning praise and winning money, or if you would like science writers to make new friends, earn praise and win money, then you should definitely, by the August 22 deadline, nominate something for this:

The 5th annual 3 Quarks Daily science writing prize!

Information here: http://www.3quarksdaily.com/3quarksdaily/2014/08/frans-b-m-de-waal-to-judge-5th-annual-3qd-science-prize.html

3QD editor Abbas Raza says:
We are very honored and pleased to announce that Frans de Waal has agreed to be the final judge for our 5th annual prize for the best blog and online writing in the category of science. Details of the previous four science (and other) prizes can be seen on our prize page.

What a fantastic judge they scored this year.

Last round of this contest--thanks to readers of the MT who voted and to the 3QD editors and that year's judge, Sean Carroll--I won the Charm Quark for "Forget bipedalism, what about babyism?"*


It's a wonderfully inspiring thing to experience and I'm so excited for the writers who will win this year's contest. Please help to make it a good contest by nominating what's turned you on, lit you up, wizened, informed, enlightened, or inspired you.  All you have to do is choose something about science that you like, dating back only as far as August 10, 2013, and then post the URL to it in the comments section HERE.

Each person can only nominate one link, which encourages writers to nominate one of their own. So don't be humble or shy or insecure. Do it!

And if you're not a writer, nominate a link that you've really enjoyed reading. Support your science writers in this often thankless service!

This isn't a ploy to get you to nominate one of mine. For good fun, I already nominated this one anyway:



But if Ken, Anne, Dan, Jim, Reed or another guest writer posted something here, or if another writer posted something anywhere else in the last year that stuck with you or that struck you, then for the love of science and science writing, please nominate them before the August 22 deadline!


*Which now has dead links to cute photos because back in 2012 I didn't know what the hell I was doing with images in Blogger.

Monday, August 18, 2014

Logical Reasoning in Helsinki

Ken and I are in Finland this week co-teaching the Logical Reasoning in Human Genetics course that Ken and Joe Terwilliger have taught a number of times in a number of places over the last 10 years.  People in the class, and/or I, may do some live tweeting at #lrhg14.

We'll be away for another week or so after the course.  We will do some blogging this week or next if we find the time.  If not, we'll be back the first week of September.

Helsinki: Wikipedia

Friday, August 15, 2014

The abattoir, the lab, and pre-medieval behavior

It's a lazy August day and one wonders what to write about.  So I took a walk with my constant companion--sadly, not a dog, but my iPod.  I was listening to one of the BBC Radio4 program podcasts that we like, and I thought it would be worth putting down some thoughts, hoping to make them relevant.

Abattoirs, or slaughterhouses, are among the most sensitive kinds of industrial plants.  This post was stimulated by the BBC story I was listening to (File on 4: Inside the Abattoir, June 17, 2014).  A standard protocol for killing mammals is to stun them with an electric shock to the brain, knocking them out so they'll feel no pain or terror, and then quickly killing them by, for example, stringing them up, slitting their throat, and letting the blood drain. Then they are butchered. The treatment of food birds is something like this, as I understand it, but the birds are first hung up by their feet, so they probably feel more terror before the deed is done.  Of course, all of this may be more gruesomely done on the farm, for both birds and mammals, though there are certainly farmers who work hard to ensure that their animals are calm until their sudden end.  But an abattoir does it to numbers that would match a WWI battlefield--every day.

A properly run abattoir, gruesomely, uses the same idea we have with human execution: a nice last meal, and a blindfold or, for those to be done in with chemicals, a tranquilizer first.  Similar considerations are given to pets who are put 'to sleep' by a vet when they are old and suffering.

In the slaughterhouse, Lovis Corinth, 1893; Wikipedia

The BBC story described how this killing is done when done right.  It's properly supervised, sanitary, and the like.  If an animal has to go, well, it's better than how most wild animals have had to make their exit, being torn apart by a predator while alive or suffering an injury or disease without medical care or even (with some exceptions) sympathy from friends or relatives.

But the BBC story also describes how some Jews and Muslims are excused from this humaneness, and allowed to engage in pre-medieval slaughtering techniques (i.e., no stunning first), because, apparently, God (the loving one, that is) said we have to torment animals to please Him.  That doesn't seem very different from Aztecs cutting out the hearts of their living victims (although, I vaguely recall their victims were at least intoxicated on something first).  I only pick these examples because I am too ignorant to have any idea how much other savagery we humans allow today in the name of other Gods or for what rationales.

If stunning is humane and if we are to eat meat, the killing is probably not exceptionable.  However, the BBC story reports various lapses in the system, disturbing instances of lax inspection, and cheating for sport, out of anger, or for convenience. Even in this sensitive context, there are the insensitive among us.

What about fish?  We are generally quite happy with dragging them up from hearth and home, by the net-full, only to suffocate en masse, not so different from, say, the gas chambers, I guess.  Or, when undertaking mere individual slaughter, by hooking them (for sport) in the mouth before asphyxiating them.  Fortunately, thanks to research in part by faculty here at Penn State that shows that fish are not just automatons, there are growing numbers of humane fish abattoirs that use altered water or stunning to lull the animals to their doom, as humanely at least as the fate we dole out to mammals.

Our concern for doing our killing gently is clearly inconsistent even when applied to other humans. Just look at the latest news. Bombing of children and hospitals, beheading or crucifying captured people because our God (the loving one, that is) says it's the thing to do and (we say) doesn't like their God. He must be a blood-sport fan.  In that regard, it is interesting to read, as perchance I've been doing, Milton's Paradise Lost, in which there's a Hollywood-like tale of wars among the 'angels' in that Heaven we so aspire to attend.

We justify at least instant killing on the grounds that we have to eat and that, given those conditions, instant killing is at least terror and pain-free.  But one reason vegetarians believe as they do is that killing sentient animals (some would, properly, include all animals) is in itself cruel no matter how kindly done, and since we can live perfectly well, and more economically sustainably, on plants, that's what we should do (though, personally, I question the plant exceptionalism since plants clearly respond to environmental trauma and threats).

Experimental abattoirs
But MT is generally a science blog.  So let's talk about what goes on in the animal research lab.  IRBs (Institutional Rationalizing Boards) generally approve research procedures as being useful to human knowledge, and good for the research business, so long as they don't outright torture the animals. There are at least some limits.  But speaking of things pre-medieval, the reality is closer to saying that, as God (the loving one, that is) pronounced, the rest of animals and plants are just here for us to exploit, and we countenance a lot of things being done to animals, effectively under such an implicit assumption.

For example, what about, say, flies?  Here, the rationalizing gets even more contorted, or perhaps less. Insects and such simple creatures are said not to feel pain or experience terror.  The way they're sometimes treated, flies must absolutely like to have their body segments altered, or electrodes stuck into their brains.  Yet observations of insects in nature suggest they do sense and recoil from danger, and experience distress.

The arguments justifying research-based experimenting with animals are that that's how we learn about the world (and there's the widespread treatment of science as a largely unquestioned good), or that making countless animals experience a nasty disease or experimental 'procedure', often the only life they'll know, will eventually prevent humans from having to suffer in the same manner we make the mice suffer. We at least claim to try to minimize the trauma, but many in science know the more grim reality.  It's human exceptionalism, but since we're the ones in charge it's no surprise that we behave that way.

Just as we give life and then taketh it away from cows and chickens, so do we for lab mice.  They have their day (in the artificial light of the mouse room), at least, an existence they'd not experience were it not for our NIH grant.  Some even get to have a rather active sex life (though, if female, usually they are killed while pregnant, so we can study their not-yet-offspring).


If we accept the reality and inevitability of mortality, then one can accept the killing for food as well as research. But need we accept the torment?  Could we at least have more stringent limits? Animal rights lobbyists, descendants of anti-vivisectionists, are irritating to those running research labs, but perhaps at least help keep things somewhat tempered.  After all, this is nothing new:  The great Roman physician Galen was famous for doing dissections on live unanesthetized animals--in the name of science, and indeed somewhat theatrically.  We're not as savage as that!

We can always make up a rationale about human good or basic knowledge, or that the animals don't really suffer; but the fraction of lab animals who shed much light on scientific knowledge is small, and what we're allowed to do to them not so small, even though certainly many lab animals do 'contribute' to ultimate human good.  These are not easy issues (and I say this not in an accusatory way: I worked on developmental genetics of mice for many years).

We all have to die, humans as well as other animals.  The pre-scientific belief systems promise something better afterwards, and if you believe that kind of thing, then lucky you!  But we can at least do our best to make the exit of those enslaved by us a painless one......I had intended again to say a 'humane' one, but that now somehow seems an inappropriate word.  Thinking about the abattoir, and other aspects of human behavior, puts these issues in stark perspective.