FGRS: protocol


ChIP DNA Primer Design

Last Updated

February 12, 2010 7:04 AM

Primer Selection for ChIP-DNA Validation


You have a transcription factor with which you are working.

You expect that transcription factor to have some number of genomic binding sites.  These are places in the genome where the factor binds and regulates gene expression.  For each of the factors we are studying, many sites are already known from the previous literature and elsewhere.

You have used the technique of ChIP (chromatin immunoprecipitation) to purify DNA fragments that represent the actual in vivo genomic locations of binding.

Would it not be nice to perform some assay that would allow you to make heads or tails of whether your ChIP-purified DNA sample is in any way valid?

Yes, yes it would.

The most direct way to do this is to ask the question:

Does my ChIP-DNA have any fragments in it that were previously shown to be binding sites for the transcription factor I am studying?

Ponder that.  It’s a good question.

The way by which you can address this question is PCR.

Previously Validated Target Selection

Can you just go do PCR, however?  Well, no.  You need primers.

Halt, however.  There are other issues.

We must now decide how to select previously characterized sites of binding.

We can then and only then design primers that will allow us to probe our ChIP-DNA sample and determine whether or not we are also detecting these sites of binding.  So, hit the pause button on that primers issue.  We have a binding site decision to make first.

Previous research has shown that your transcription factor of interest binds some targets with stronger statistical support than others.  We should probably select the targets that are the best supported, the most believable, of all possible targets.  Seems like a good plan?

How to do this?

First, go here.

You’ll see something like this:

If you click on the link “Download Location Data” you will see a new page that gives us access to an interesting dump truck of data:

We are interested in the P value data used for analysis.

A P value is a statistical quantity that tells us how likely it is that an event occurred by random freakin’ chance.  The lower the P value, the lower the likelihood that the phenomenon is due to random chance.

Download the data in Excel or Text format.

In this case Excel would be best.

Open this Excel spreadsheet (pvalbygene_forpaper_abbr.xls).

It will look something like this:

This is a lot of data, huh?

What is it saying?

First, have you read the paper?  Do you have any idea what you are looking at, what data you are referencing?  If not, I highly suggest you go read it right now.  Stop working so much, start thinking more.

Okay, feeling confident now?

Across the columns are the many, many transcription factors that were studied in this research.  Down the rows are the > 6000 genes in the yeast genome.  Each cell, therefore, is a P value that says whether or not a particular transcription factor (column) regulates a certain gene (row).

For example, in the second data column we see the data for the transcription factor ABF1.  A few rows down we see the P value of “6.53E-01” that says NO NO NO the transcription factor ABF1 does not likely regulate the gene FUN14.  Conversely, a few dozen rows down we see a P value of “8.07E-05” that says that ABF1 does likely regulate ACS1.
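If it helps to see that reading of the matrix written down, here is a minimal Python sketch.  It holds only the two ABF1 cells quoted above.  The nested-dictionary layout and the cutoff of 1e-3 are assumptions made for illustration; use whatever cutoff the paper you are following uses.

```python
# Toy slice of the matrix: only the two ABF1 cells quoted in the text.
# The layout (gene -> factor -> P value) is an assumption for illustration.
P_VALUES = {
    "FUN14": {"ABF1": 6.53e-01},
    "ACS1": {"ABF1": 8.07e-05},
}

# Hypothetical cutoff, not taken from the protocol.
CUTOFF = 1e-3


def likely_bound(gene, factor, cutoff=CUTOFF):
    """Return True when the P value for (gene, factor) is below the cutoff."""
    return P_VALUES[gene][factor] < cutoff


for gene in P_VALUES:
    print(gene, likely_bound(gene, "ABF1"))
```

With this cutoff FUN14 comes back False and ACS1 comes back True, which matches the reading above.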

Get the picture?

This data matrix is wicked useful.  We can sort it by our transcription factor of interest and get at the top of the list the most strongly regulated gene targets.  That’s what we want - to know which genes were shown to be most strongly bound in previous studies.

First, delete the first row.  It’s in my way.

Now, select all.

Now select Data, and Sort.

You will want to tell Excel that the file does indeed have a header row.

See the picture below:

I selected “SWI6_YPD” to indicate that I am interested in my transcription factor SWI6.  I also selected the “Ascending” sort order because I want the really crazy low P values on top.

Give your computer a second or two to work on it.  It might not be as cool as mine.

I scroll WAY OVER and see that indeed Excel has done as commanded:

I have really low P values at the top of my list.

This means that if I now look left I can see the genes that are most strongly “likely to be regulated” by this factor (according to this study).

A few samples would be PCL1, YOR246C, SRL1, TOS11, CLN1.
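If you downloaded the Text format instead of Excel, the same ascending sort can be sketched in a few lines of Python.  This is only a sketch under assumptions: the table below is a made-up, tab-delimited stand-in (the gene names and P values are placeholders, not real data), the “gene” header name is hypothetical, and the real file may be laid out differently.  Only the “SWI6_YPD” column name comes from the steps above.

```python
import csv
import io
import math

# Made-up, tab-delimited stand-in for the text export. Gene names and
# P values are placeholders, not real data. The "gene" header is an
# assumption; only "SWI6_YPD" comes from the protocol.
TOY_TABLE = (
    "gene\tSWI6_YPD\n"
    "GENE_A\t5.0E-01\n"
    "GENE_B\t1.0E-06\n"
    "GENE_C\t\n"
    "GENE_D\t2.0E-02\n"
)


def rank_targets(handle, factor_column, gene_column="gene", top_n=5):
    """Return up to top_n (gene, p_value) pairs, lowest P value first."""
    ranked = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p_value = float(row[factor_column])
        except (TypeError, ValueError):
            continue  # skip blank or non-numeric cells
        if math.isnan(p_value):
            continue  # skip cells that parse as NaN
        ranked.append((row[gene_column], p_value))
    ranked.sort(key=lambda pair: pair[1])  # ascending, like the Excel sort
    return ranked[:top_n]


top = rank_targets(io.StringIO(TOY_TABLE), "SWI6_YPD")
for gene, p_value in top:
    print(gene, p_value)
```

To run it on the real download you would pass an open file handle instead of the toy string and adjust the column names to match the file.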

We are now ready to move to the next step - taking these genes and getting actual binding site coordinates.

Binding Sites Location

Open a web browser to SGD.

We now want to search for one of these targets.  Why?  I’ll show you why.

I searched for PCL1 and produced the following screen:

Scrolling down we see:   

Oh my!

Look - binding sites!

Go ahead, click on the map (next to Sequence Information).

This produces something like:

You can mouse-over each of the binding sites to get the exact coordinates of binding.

Watch me.

I now know that near the gene PCL1 there is a SWI6 binding site - a high confidence binding site (remember the low, low, low P value?).  I now know the coordinates of the binding site.

They are: chromosome 14, 87570 to 87577.  How exciting!

We are much closer to our goal.

DNA For this Region

We would now like to get the DNA sequence for this region and design PCR primers in order to amplify it in our ChIP-DNA sample.

You will notice that 87570 to 87577 is a coordinate difference of only 7 (8 base pairs if you count both ends).  I do NOT think we are going to PCR just a handful of base pairs.  We would never be able to see that band clearly on a gel, you know?

So, let’s arbitrarily add 100 to each end (200 in total).

We will now consider the genomic coordinates to be:

Chromosome: 14

Start: 87570-100 = 87470

Stop: 87577+100 = 87677

We now have a genomic distance of 207.
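The arithmetic above is small enough to do by hand, but if you are doing this for many sites a helper keeps it honest.  A minimal sketch: the function name and the clamp at position 1 are my own assumptions, and it does not check against the end of the chromosome.

```python
def pad_region(start, stop, flank=100):
    """Extend a binding-site interval by `flank` bases on each side."""
    if start > stop:
        raise ValueError("start must not exceed stop")
    # Assumption: coordinates are 1-based, so never go below position 1.
    padded_start = max(1, start - flank)
    return padded_start, stop + flank


start, stop = pad_region(87570, 87577)
print(start, stop, stop - start)  # 87470 87677 207
```

Note that 207 is the difference between the two coordinates; counting both end positions, the region covers 208 bases.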

Let’s look up DNA for this region.

From SGD you can select Gene/Seq Resources to get the following:

Use option 2 to fill out the form and get DNA sequence for this region.

Click on “Submit Form”.

You will now see something like:

If you were to click on “Chromosomal Features Map” you would see:

It is nice to see PCL1 in this region - it means that we did not do something wrong - we are still dealing with the correct region of the genome.

You can use the back button of your browser to go back to the previous page.

Now click on “Design Primers”.

You’ll see something like:

You can copy/paste that DNA sequence to keep it for your records.

This screen also sets the stage for us to design PCR primers.

Designing PCR Primers

Select “PCR” and press “Submit”.

This screen allows us to set the parameters for our PCR primer design:

We are not going to discuss these variables at this time but you can click on the [info] links to learn more.  I encourage you to read and research these parameters.

Leave the defaults as they are for now.

Press “Submit”.

We are now presented with a screen that shows us that PCR primers were successfully found according to these parameters and filters:

Click on the link “This is the BEST pair of primers”.

This is it!  You will want to record ALL of this information.  All of it.

There is one final gotcha, however.

You might not have received the message that primers were successfully found for that region.  This would have looked like:

The error might have been related to GC content as well.
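If you suspect GC content, you can check the sequence you copied earlier yourself.  Here is a minimal sketch that computes the fraction of G and C bases.  The function name is hypothetical, and the limits the primer design tool actually enforces are on its parameter screen, not in this code.

```python
def gc_fraction(sequence):
    """Return the fraction of A/C/G/T bases in `sequence` that are G or C."""
    # Ignore anything that is not A, C, G or T (for example N or whitespace).
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return gc_count / len(bases)


print(gc_fraction("ATGCGC"))  # 4 of the 6 bases are G or C
```

Paste in the padded region and compare the result with the GC settings on the primer parameter screen.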

What to do?

You have two options.  You will need to either select a completely new gene target (and start this process from scratch) or see whether another binding site near this gene target, if there is one, works for PCR primer design.  Life is hard.