BIOL 6330 Unit 2 2 3 Species v Gene History

this discussion serves as kind of a

segue or a transition

between our phylogenetics unit which is

most of unit 1 and

overlaps a little bit here in unit 2

towards moving on to other molecular

evolution topics

i know many of you are going to be sad

that we're leaving phylogeny behind

but you know it's important we've got

other things to do

so a little bit of this is review

looking back reminders

and then kind of a final couple

important concepts as we look

at an overall big picture of

relationships among

organisms or among different species

right because that's what a phylogeny is

it's trying to reconstruct something

that goes all the way back to the very

heart of evolution

that all living organisms share common

ancestry

and so a phylogeny is an attempt to

understand

map out and diagram all of that shared

history

and so you know this already but if a

group of organisms is monophyletic

then we know that they share a single

common ancestor that is unique from all

of the other ancestors to all the other

living organisms

and that says something unique about

that group and so similarities that we

see across that group that are widely

distributed

within that group almost certainly came

from that ancestor and so they're unique

to that group

many groups are classified and then

named

based on the characteristics that define

them

as monophyletic and today we have

large genetic databases to either

support

many of those things which have been

understood sometimes even for hundreds

of years

but also to revise and discover areas

where people made mistakes

and today we're going to be talking a

little bit about why some of these past

things were mistakes

and so when a non-monophyletic group is

put together

in this case polyphyletic right because

we have two very distantly related

branches and we have to cut out

all these close relatives before we get

down to the common ancestor so

that's polyphyletic because

there's more than one

clade that we'd have to cut out so

non-monophyletic polyphyletic in this

case

sometimes those were classified in the

past and they were mistakes we're going

to look at two examples of

non-monophyletic groups

and talk about why so the first is maybe

the most famous very very common

a paraphyletic group quote unquote the

reptiles

the crocodiles lizards turtles maybe we

throw in other ones right snakes are

closely related to lizards

right if you look at it actually it

turns out that lizards are a

paraphyletic group with respect to

snakes because snakes are just really

highly evolved lizards and are more

closely

related to some lizards than others and

we see that pattern over and over and

over again

here we have birds which are just really

highly evolved reptiles

and should have been included in the

original classification for reptiles

but they were not and so reptiles is

paraphyletic because we left out

that branch birds are more closely related

to crocodilians

than they are to the other living

reptiles and so we can't have a group

like that

without including birds in it okay

hopefully you understand that that's a

bit of a review
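As a minimal sketch of the monophyly idea reviewed above, the check below treats a tree as nested tuples and asks whether a named group matches exactly the set of tips under some node. The topology in `TREE` is a simplified, illustrative assumption (in particular the placement of turtles), not the figure from the lecture, and the function names are made up for this example.

```python
# Simplified, illustrative topology (an assumption for this sketch,
# not the lecture's figure). Tips are strings, nodes are tuples.
TREE = ("mammals", (("lizards", "snakes"), ("turtles", ("crocodilians", "birds"))))


def leaves(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Return the tip set of every node in the tree, single tips included."""
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """A group is monophyletic if some node has exactly those tips below it."""
    return frozenset(group) in clades(tree)


traditional_reptiles = {"lizards", "snakes", "turtles", "crocodilians"}

# Leaving birds out breaks the group; adding them back restores a clade.
print(is_monophyletic(TREE, traditional_reptiles))
print(is_monophyletic(TREE, traditional_reptiles | {"birds"}))
```

With this assumed tree, the group without birds fails the check and the group with birds passes it, which is the paraphyly argument in code form.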

and of course the reasons for this are

pretty obvious there have been

long branches lots of evolution and many

morphological features

along the bird branch right so birds

look very very different from their

ancestors if you were to take a crocodile

and a bird and find their common ancestor

crocodiles have many more overall

morphological similarities

to that common ancestor lizards right

same thing morphological similarities

snakes despite

losing their legs still have many

morphological similarities to that

common ancestor

but birds have evolved very very rapidly

and have lost some of those

morphological features

this by the way is one of my favorite

birds a shoebill stork look them up

they're very very cool

and if anyone ever says well i don't

believe birds are reptiles or maybe

even better right birds are dinosaurs

because birds are more closely related

to the

raptor dinosaurs t-rex velociraptor

all of those two-legged carnivorous

dinosaurs birds are just

a close relative of those and so they

really are not only reptiles they're

also

dinosaurs the only living group of

dinosaurs to survive

and if anyone doesn't believe you show

them the shoebill stork right it's

quite

phenomenal this is one that looks like

he's about to eat you

okay but birds were not recognized as

reptiles number one because they evolved

and look

quite different from their ancestors so

rapid morphological evolution

and then number two also many of the

lineages that would have helped us to

connect them and include them

into the reptiles those lineages were

extinct and it's not until recently that

we've had an

extensive enough fossil record to

find support for this bird dinosaur

and thus bird reptile hypothesis

but the molecules the molecular data

very very strongly support that and

people were beginning to figure it out

because of fossil evidence

and re-evaluation of morphological

characters even before we had

well-supported molecular phylogenies

but the fact that we do just is kind of

like extra

support for this idea and so we need to

make decisions about higher level

taxonomy

once we have well-supported phylogenies

and so we revise it

and so today the group reptilia although

it's kind of more or less of a formal term

we probably would say diapsids

now includes the birds so we didn't just

get rid of it we now recognize and we

didn't get rid of the term dinosaurs

because it was well known we just now

recognize that

birds are part of the dinosaur group and

in fact you might hear people refer to

the non-avian

dinosaurs that's everything else except

the birds it's kind of a long

clunky way but that's kind of the old

school traditional dinosaurs or really

just

most of the dinosaurs minus the birds

okay so just be aware of that this is

important

now a quick word that's not really

related to phylogenetics but i guess

it's tangentially related

higher level taxonomies i hope you

realize are not

equivalent so if i look at a genus of

insects

and a genus of birds those even though

those two groups have the same name

they're both

genera right that's the plural for genus

they're really not comparable and

equivalent

so really higher level divisions are

monophyletic groups but as far as

how many species are

included in a genus or how old a genus is

there's no rule

it's really just whatever is convenient

then we have genus

family order class all the way up

right and then many subcategories in

between

but realize that although species are a

real biological

population a phenomenon a good

classification although you can't find a

single

definition for species that seems to be

a real thing that we can at least

to a large extent compare one species to

another

genera families orders classes all the

other higher ranks really

are not based on objective criteria it's like

there's no

set of rules for what makes a genus it

can be one species it can be tens of

thousands of species

so just be aware of that now one other

last example

the old classification of vultures was

polyphyletic so as we began to

re-evaluate morphological characters

and evaluate all of the vast sources of

molecular data that came

in it became very apparent rather

quickly

that the old group vultures was not

valid

the old world vultures like you see in

the lion king or in those african nature

videos right in africa and asia and

europe

those old world vultures

are closely related to the other birds

of prey the eagles the hawks the owls

okay so they're part of that raptor

group but the new world vultures despite

looking very similar to the old world

vultures

are only very distantly related and

there are some morphological clues to

this which we don't need to worry about

but the molecular data very strongly

supports this

and it turns out the new world vultures

like we see flying around out here the

black vultures

right buzzards turkey vultures or the

california condor the the andean condor

those are actually more closely related

to storks than they are to the birds of

prey

and in this case it was convergence

right so they're

filling similar niches they're both

eating decayed material so they have

loss of feathers on their head to

prevent

accumulation of the nasty stuff they're

eating in their feathers

they have that hooked beak for ripping

meat

and so we see those many

convergent characteristics which

originally

allowed or tricked scientists into

putting them in the same group and today

these are the vulturidae these have been

given a new family name and so we've

we've just had to break them up and

revise

our taxonomy right so this was such a

bad representation because

of convergence right whereas with the

birds it was a little different it was

loss of ancestral characteristics or

plesiomorphies that made it difficult to

put the birds in with the reptiles

there are other groups too that are

largely recognized as

grossly or very strongly polyphyletic so

for instance

sometimes when people are trying to

organize

like a genus of insects where there are

just so many different species

if they have a genus or a species within

a genus let's say

that is let's do this let's say a genus

within a family

so they have a genus they know it's in

this family they have a species but they

can't decide where to put it

they don't know if it should go in genus

a b c or d

they will often have kind of what is

like a trash can

genus right a genus where they say if we

don't know we'll just stick it in there

and try and figure it out later

so sometimes these polyphyletic

groups are recognized and people say oh we just

don't have enough data to fix it yet but

someday

we'll fix it so be aware of that and

it's an ongoing process of organizing

grouping classifying and naming

all of these organisms and that's the

goal of

systematics and phylogeny is a very

critical important part of systematics

all right now there are three reasons

and we're going to list them i don't

have them numbered here but you should

write them down in your notes

there are three reasons why a gene

history

might not match a species history and by

and large we've been kind of just

you know inferring assuming that it

would that if we can accurately figure

out gene histories putting together

these

matrices and figuring out all these

complex ways to do phylogenies

that if we use those genes they will

give us the species history

and by and large that's true but there

are some exceptions that we need to be

aware of

the first one is gene duplication events

and you are already very clearly aware

of this

so if we have a gene duplication event

it means we have

paralogous copies or duplicated copies of

an ancestral gene and initially they're

identical they start to accumulate

mutations and become different but

they're still recognizable

and can be readily aligned to each other

in a

dna matrix so this could be alpha

hemoglobin and beta hemoglobin still

very similar and they accumulate more

and more mutations as they evolve

separately

because they are no longer a single copy

but think about this

so if this is the true set of

relationships species a is more closely

related to species b

than it is to species c this is you know

the out group for those two species

what would happen if we didn't recognize

this was a gene family with two copies

and we sampled gene 4 gene 2

and gene 3

we just randomly sample those genes we

amplify them by pcr or something

look at the relationships between gene 4

gene 2

and gene 3 if you trace them back 2 and

3 are down this

smaller line there is where 2 and 3

share a common

ancestor so gene 2 and 3 are more

closely related to one another

than they are to gene 4 which doesn't

connect back in until this gene

duplication event

so that's just one example how if we

don't recognize

the problem of paralogy in these

multi-copy genes and if we have

bad representation then it becomes a

problem and our gene history

may not reflect our species history now

today

with large sampling and better

understanding of all of the duplication

events and the big

families we can recognize these much

much more easily and it's kind of cool

because we can trace

we can map copies back onto a phylogeny

like this and see

where they've occurred and what we found

is that for some genes duplication is

rampant

so just as an example and we'll probably

look at this when we look at the

molecular evolution near the end of the

semester

but opsins in insects are an incredibly

diverse group

family of genes with multiple

duplications and multiple losses over

and over and over again it's a very very

twisted and complex story that we have

to tell okay so number one

reason why gene history will not match

species history

is if we have gene duplication events

and we're mixing up paralogous and

orthologous copies of genes
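The paralogy problem described above can be sketched in a few lines. This is only an illustration under stated assumptions: the gene tree below is hypothetical (one duplication before species A, B and C diverged, so every species carries copy 1 and copy 2), the tip labels like `A_1` are invented for the example and do not correspond to the numbered genes on the lecture slide, and the true species tree is taken to be ((A, B), C).

```python
# Hypothetical gene tree: a single duplication at the root, then each copy
# follows the true species history ((A, B), C). Labels are made up.
GENE_TREE = ((("A_1", "B_1"), "C_1"), (("A_2", "B_2"), "C_2"))
SPECIES_TREE = (("A", "B"), "C")


def prune(node, keep):
    """Keep only the sampled tips, collapsing nodes left with one child."""
    if isinstance(node, str):
        return node if node in keep else None
    kids = [k for k in (prune(child, keep) for child in node) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def to_species(node):
    """Relabel gene tips such as 'A_1' with the species name 'A'."""
    if isinstance(node, str):
        return node.split("_")[0]
    return tuple(to_species(child) for child in node)


def tips(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def groupings(node):
    """Return the tip sets of all internal nodes, ignoring child order."""
    if isinstance(node, str):
        return set()
    found = {tips(node)}
    for child in node:
        found |= groupings(child)
    return found


def sampled_species_tree(sample):
    """Species-level tree implied by the sampled gene copies."""
    return to_species(prune(GENE_TREE, set(sample)))


orthologs_only = sampled_species_tree(["A_1", "B_1", "C_1"])
mixed_paralogs = sampled_species_tree(["A_1", "B_2", "C_2"])

print(groupings(orthologs_only) == groupings(SPECIES_TREE))
print(groupings(mixed_paralogs) == groupings(SPECIES_TREE))
```

Sampling one copy consistently recovers the species groupings, while mixing copy 1 from one species with copy 2 from the others pairs B with C, because those two sequences only join the third at the duplication event.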

okay reason number two is called

xenology

if you've had the introductory evolution

course you're already familiar with this

at least with this term

xenology literally means

alien relationships or alien association

xenos is the greek root word for alien

and this means that we have a gene

that has been transferred over from a

very very distant relative

and when i say relative realize that

that's

a term that you can use for any organism

right

if i say oh look at that bacteria it's

your relative that's true very ancient

billions

of years ago relative but it's still a

relative but notice that

in bacteria this is very common it

happens all the time and so they and

also the archaea

so prokaryotes very readily transfer dna

from one

species to maybe even a very distantly

related species

and it happens in eukaryotes also but

less frequently so it's not as much of

an issue with eukaryotic relationships

but it happens so much among the

bacteria and archaea

that it really makes a phylogenetic

analysis of these groups very very

difficult in fact some would say even

not possible in fact you want to try to

infer

more of a network-like pattern like this

where it looks like a web

or a a net rather than a simple

phylogeny like we have over in the

eukaryotes and here again they try to

show one

rare horizontal gene transfer event they

do happen in eukaryotes sometimes

virally mediated but they're rare enough

that they are the exception to the rule and

usually don't have much of an impact

at all on figuring out the phylogeny of

this group

but for the

prokaryotes it's rampant

and so if we have a horizontal gene

transfer event we suddenly have a

bacteria that

can trace most of its genetic history

down one lineage but it shares some of

its genetic history down another lineage

so we need more than one

branch coming in to our quote unquote

tree which is now really more like a

network

right so to

represent relationships among

prokaryotes we need a network

or a web-like pattern okay so

there are different methodologies we're

not going to talk in detail about these

we might mention them briefly

and come back to them during

our population genetics unit because

realize that this network-like pattern

applies not only to phylogenies among

different bacterial species

but it also applies to relationships

within a sexually

reproducing species so you look at

me and say wait

what do you mean well if you trace your

own lineage back

far enough it might be six generations

it might be 100 generations

but if you do that you're going to find

that your paternal lineage

and your maternal lineage connect

so if you trace

your mother's ancestors and your

father's ancestors eventually you will

find that they connect at the same person

your great great great grandfather on

one side might be your great great maybe

three times more great grandfather on

the other side of the family also and so

we get that reticulation

which is a technical term or that

network-like pattern so

networks are better ways to show

relationships between species of

prokaryotes

they are also better ways to show

relationships within

a sexually reproducing species like a

genealogy of individuals in the human

species and i use

one of the european royal families because of

all of the inbreeding that occurred

in royals they tend to have this

network like pattern much more

strongly and much more recently among

their ancestors

because you know if you marry your

cousin

right or if you marry your niece which

happened actually more frequently than

you might think among

many royal groups then suddenly

any kids you have are

going to have that close connection on

both sides of the family okay
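A small sketch of that reticulation, assuming a made-up pedigree in which the parents of `child` are first cousins (all names and the dictionary layout are hypothetical): three generations back there are eight ancestor slots in the family tree, but fewer distinct people fill them, because one couple appears on both sides.

```python
# Hypothetical pedigree: each person maps to (father, mother).
# The two grandfathers are brothers, so the child's parents are first cousins.
PEDIGREE = {
    "child": ("father", "mother"),
    "father": ("paternal_grandfather", "paternal_grandmother"),
    "mother": ("maternal_grandfather", "maternal_grandmother"),
    "paternal_grandfather": ("shared_ggf", "shared_ggm"),
    "maternal_grandfather": ("shared_ggf", "shared_ggm"),
    "paternal_grandmother": ("ggf_1", "ggm_1"),
    "maternal_grandmother": ("ggf_2", "ggm_2"),
}


def ancestor_slots(pedigree, person, generations_back):
    """List every ancestor position that many generations back (with repeats)."""
    current = [person]
    for _ in range(generations_back):
        previous = []
        for individual in current:
            previous.extend(pedigree.get(individual, ()))
        current = previous
    return current


slots = ancestor_slots(PEDIGREE, "child", 3)
distinct = set(slots)

# Slots double every generation, but the same people can fill several slots.
print(len(slots), len(distinct))
```

A strictly branching tree cannot draw the shared great-grandparents only once while connecting them to both sides, which is why a network is the better picture for genealogies within a species.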

so just realize that so remember number

one gene duplication number two is

xenology

and this pattern that's produced by

xenology the network-like pattern can also

be produced within sexually reproducing

organisms and that makes

relationships within

a sexually reproducing species harder to

figure out and

more applicable to a network-like

pattern than a true phylogeny

okay now finally

the third reason is what we are

going to call lineage sorting

and to understand lineage sorting we need

to understand this idea of coalescence

coalescence is when we have different

alleles a wider range of alleles

and it may take a while for them to

coalesce into a single

version of that allele either through

natural selection or through genetic

drift but they will

eventually and we'll

talk about coalescence at a little

later time a little bit more it has to

do with the molecular clock also

but if speciation events are rapid and

coalescence is slow

we could have multiple alleles this

diagram is not the best diagram to

illustrate this

in fact let me grab another one

okay this is a better image one you're

actually kind of familiar with we

introduced this in a very early lecture

in the class

and we looked at how when we have

multiple alleles originally they're all

the same

we have a mutation now we have genetic

diversity and if it's a good allele or

if genetic drift randomly chooses it we

end up with

a fixation of that allele this process

if we look at it in reverse is

coalescence

where all of the diversity represented

in this population eventually coalesces

into and we can get back to one

single ancestral allele we may have to

go a little bit farther back but i think

maybe that allele right there

represents oh no we've got to go farther

back and maybe even more distantly

but we find the ancestor of all of these

alleles

now what would happen if in this time

period when they speciate there's

diversity

this one carries some black and red and

eventually red becomes fixed

this one carries some black and red but

then maybe either due to a change in the

environment and a switch in natural

selection

or again sometimes genetic drift random chance

what if we fix the black allele

instead of the red allele right if that

occurred then this species c would have only

the black allele dating back to these

guys maybe a few mutations here and

there but still that functional black

allele these guys would have all the

black allele

and these guys would have the allele

represented by the red dot

and so if we were only using this

gene

we might infer that species c is more

closely related to species a

even though it's more closely related to

species b so that is called lineage

sorting and it's the third reason why

gene history

may not match species history all right

so again when speciation events are

rapid and coalescence is long and drawn

out

then it's more likely for lineage

sorting to occur now

usually lineage sorting is only a

problem among a few genes but there are

those cases where

really really rapid speciation makes

lineage sorting more common

and so groups of organisms that speciate

very very rapidly like

the early birds there seemed to be this

explosion of species and many new groups

popping up very rapidly when birds first

evolved

it makes it very difficult to figure out

the relationships because there's a high

degree of lineage sorting

and so that can make it tricky to find

the true species relationships so again

review

reason number one is gene duplication

and the problems with orthology and

paralogy

reason number two is horizontal gene

transfer which we call

xenology reason number three is lineage

sorting

which occurs when genetic diversity is

carried through into populations

and then fixes in a pattern that's

different than the

relationships between the species okay

and again this is not the

best drawing but they're showing how you

have extinction these are supposed to

represent alleles

but they're descendants of each other

anyway it's a bad drawing so use the

other drawing okay

but with lineage sorting the key amounts

of time are how long it takes for

speciation to occur

and how long it takes for coalescence or

allele fixation

if speciation is fast and coalescence or

allele fixation is slow

you'll get lots of lineage sorting

and that makes phylogenies difficult to

figure out
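The trade-off between speciation time and coalescence time can be made concrete with a standard textbook result that is not part of this lecture: for three species under a neutral coalescent model with a constant diploid population size, the chance that a single gene tree disagrees with the species tree is (2/3)·exp(-T), where T is the time between the two speciation events measured in units of 2N generations. The sketch below just evaluates that formula; the function name and the example numbers are assumptions chosen for illustration.

```python
import math


def discordance_probability(generations_between_splits, effective_population_size):
    """Chance that one gene tree mismatches a three-species tree.

    Assumes a neutral coalescent with constant diploid population size,
    so time is scaled by 2N generations (a modelling assumption).
    """
    t = generations_between_splits / (2.0 * effective_population_size)
    return (2.0 / 3.0) * math.exp(-t)


# Rapid speciation (short internal branch) versus slow speciation,
# both with a hypothetical effective population size of 10,000.
rapid = discordance_probability(1_000, 10_000)
slow = discordance_probability(100_000, 10_000)

print(rapid, slow)
```

When the two speciation events are close together relative to population size, the value approaches two thirds, so most genes can mislead; when they are far apart it drops toward zero, which matches the point that lineage sorting is mainly a problem in rapid radiations.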

okay now the last thing i'm going to

talk about we won't go into detail

but i want to talk about consensus trees

a consensus tree is a way to take two or

more trees

and summarize them in a single tree

that summarizes all of the relationships

this is a strict consensus tree which

means that we only show relationships

that are present

in all of the trees that we're trying to

summarize so notice in both trees the b

which stands for baboon

is sister group to the rest so in our

consensus tree we put baboon as sister

group to the rest

orangutan o is sister group to the

remaining three so we put that there

but then tree number one and tree number

two disagree about relationships within

this group this is human chimpanzee gorilla

and so because they disagree we collapse

those down and show yeah

they're separate from these other two

but we don't know the relationships

within this group so we show it as a

polytomy
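A strict consensus can be sketched as keeping only the groupings found in every input tree. The two trees below are an assumption about how the slide's trees differ (the lecture only says they disagree within human, chimpanzee and gorilla), and the helper names are invented for this example.

```python
# Two hypothetical input trees that agree on baboon and orangutan
# but disagree inside the human / chimpanzee / gorilla group.
TREE_1 = ("baboon", ("orangutan", (("human", "chimpanzee"), "gorilla")))
TREE_2 = ("baboon", ("orangutan", ("human", ("chimpanzee", "gorilla"))))


def tips(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Return the tip sets of all internal nodes of a tree."""
    if isinstance(node, str):
        return set()
    found = {tips(node)}
    for child in node:
        found |= clades(child)
    return found


def strict_consensus_clades(trees):
    """Keep only the clades that appear in every one of the input trees."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared


consensus = strict_consensus_clades([TREE_1, TREE_2])
for clade in sorted(consensus, key=len):
    print(sorted(clade))
```

The three-species group survives, but neither of the conflicting pairs inside it does, so drawing the result leaves human, chimpanzee and gorilla as an unresolved polytomy.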

right so sometimes a polytomy represents

disagreement between maybe different

analyses

or it could also be that if we have one

character that supports here and one

character that supports here parsimony

can't

decide equal characters mapped onto

different branches

would also result in that but a

consensus tree is an overview

okay and there are different ways to do

consensus trees

i'm not going to make you do consensus

trees so you won't have to figure them

out if i give you subsets of trees

but i do want you to recognize them and

you'll see them frequently in the

literature

we might take a parsimony tree and a

maximum likelihood tree and a bayesian

tree

and then summarize them in a strict

consensus and then collapse down nodes

where they don't agree

so you should be able to look at these

three trees and see how we get there

we only show relationships that are

present and really the only relationship

that's present in all three trees

is this abc group which although

within it there are different

relationships a b and c a b and c a b

and c are all part of a monophyletic

group in all three

so we represent that here but then

collapse the node down because there's

one tree that

disagrees with the other ones two of

them say a b one of them says

no it's bc that are most closely related

and so people realized this was maybe a

little bit too rigorous

and that it didn't accurately reflect the data

because sometimes we have multiple trees

that agree and just one of them

disagrees we still have to collapse it

down under a strict consensus tree

and so a majority rule consensus tree

does not collapse everything down but it

puts numbers showing

how many of the constituent trees going

into the consensus tree

how many of them agree so in this case 2

out of 3 or 66 percent this technically should

be 66.666 percent

but two-thirds of the trees support that

a b

here and here so we put a 67 and

that one was a hundred so we know all

the trees support that group

and this one with d being more closely

related to a b and c that node right

there

is again two out of the three trees

which is in this tree and this tree

and notice that even though the d being

related to a b and c is in tree one and

two

and the a b together is in tree one and

three so different sets of trees it

doesn't matter we're just kind of going

with majority

rules it doesn't matter which trees they

appear in and so you will see consensus

trees all the time as ways to summarize

different analyses maybe done with

different genes

or trees that were

done with different phylogenetic methods

and so in a way a consensus tree is kind

of a very early

way to do super trees it's not quite a

super tree but it's maybe a

similar idea where we're trying to

summarize multiple analyses into one

with a little bit of a formal analysis

but

it's not a very tricky thing to do

and you can even do it by hand for these

smaller genes

sorry for these smaller trees
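The by-hand procedure for a majority rule consensus can be sketched as counting how many input trees contain each grouping and keeping those found in more than half. The three trees below are an assumption built to match the pattern described in the lecture (A, B and C together in all three, A with B in the first and third, D sister to that group in the first and second); taxon E and the function names are hypothetical.

```python
from collections import Counter

# Three hypothetical input trees; E is an invented extra taxon.
TREES = [
    (((("A", "B"), "C"), "D"), "E"),
    ((("A", ("B", "C")), "D"), "E"),
    (((("A", "B"), "C"), "E"), "D"),
]


def tips(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Return the tip sets of all internal nodes of a tree."""
    if isinstance(node, str):
        return set()
    found = {tips(node)}
    for child in node:
        found |= clades(child)
    return found


def majority_rule(trees):
    """Map each clade found in more than half the trees to its percent support."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    total = len(trees)
    return {
        clade: round(100 * count / total)
        for clade, count in counts.items()
        if count / total > 0.5
    }


support = majority_rule(TREES)
for clade, percent in sorted(support.items(), key=lambda item: len(item[0])):
    print(sorted(clade), percent)
```

Note that the two groupings with two-thirds support come from different pairs of trees, which the count does not care about; only the fraction of trees containing each grouping matters.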