ChIP-seq is probably the only technique available today for ‘mapping’ the genome-wide occupancy of a protein. There are some important considerations in such studies. The most important concerns are well discussed in a recent paper by Teytelman et al. (August 2009) (here). The key finding of this paper is that chromatin structure itself can bias read coverage and the reads that are detected, and so skew the final interpretation. This work shows that any study of such genome-wide patterns must contend with biases arising from the underlying chromatin state.
Having said this, let us walk through the steps in ChIP-seq analysis. The analysis of ChIP-seq data has two components:
- Determining the ‘locations’ of the reads. This can be extended by generating metadata for each location, such as the nearest genes and the nearest structural and/or functional segments of the genome.
- Generating a ‘motif’ for the putative binding site.
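The metadata step in the first component (finding the nearest gene for each read or peak location) can be sketched as a binary search over sorted transcription start sites (TSSs). This is a minimal Python sketch, not part of the ABI or MACS tool sets: the in-memory annotation format (`chromosome -> [(TSS, gene name)]`) and the functions `build_index` and `nearest_gene` are hypothetical, and strand is ignored, so the reported distance is simply the location minus the TSS coordinate.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical annotation: chromosome -> list of (TSS position, gene name).
Annotation = Dict[str, List[Tuple[int, str]]]
# Per-chromosome sorted TSS positions and the matching gene names.
Index = Dict[str, Tuple[List[int], List[str]]]


def build_index(genes: Annotation) -> Index:
    """Sort each chromosome's TSSs so they can be binary-searched."""
    index: Index = {}
    for chrom, entries in genes.items():
        ordered = sorted(entries)
        index[chrom] = ([pos for pos, _ in ordered], [name for _, name in ordered])
    return index


def nearest_gene(index: Index, chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (gene name, pos - TSS) for the closest TSS, or None if the chromosome is unknown."""
    positions, names = index.get(chrom, ([], []))
    if not positions:
        return None
    i = bisect.bisect_left(positions, pos)
    # The closest TSS is either the first one at or after pos, or the one just before it.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    best = min(candidates, key=lambda j: abs(positions[j] - pos))
    return names[best], pos - positions[best]
```

With TSSs at 1000, 5000 and 12000 on chr1, a read at 4200 is assigned to the gene at 5000 with a distance of -800. Sorting once and searching per location keeps the lookup logarithmic in the number of genes, which matters when millions of reads are annotated.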
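For the second component, once a set of putative binding sites has been extracted and aligned to a common length, a motif is commonly summarized as a position frequency matrix with a consensus sequence. The minimal Python sketch below assumes the sites are already aligned; it does not perform de novo motif discovery (dedicated motif-discovery tools do that), and `position_frequency_matrix` and `consensus` are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def position_frequency_matrix(sites: List[str]) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T at each position of equal-length, pre-aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        total = sum(counts[b] for b in BASES)  # ambiguous bases such as N are ignored
        pfm.append({b: counts[b] / total if total else 0.0 for b in BASES})
    return pfm


def consensus(pfm: List[Dict[str, float]]) -> str:
    """Most frequent base at each position (ties go to the earlier base in ACGT)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in pfm)
```

For the sites `TGACTCA`, `TGAGTCA`, `TGACTCA` and `TTACTCA`, the consensus is `TGACTCA`, and the matrix records that the second position is G in 75% of sites. Keeping the full matrix, rather than only the consensus, preserves the variability that a single string hides.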
The basic steps in ChIP-seq analysis are:
- Determining the locations of the reads and converting this information into a ‘visualizable’ format. As discussed in the previous post, the tools for this are part of the toolkit developed by ABI for the SOLiD platform. The output can be loaded as a ‘Custom Track’ into the UCSC Genome Browser (for a comprehensive tutorial on using the UCSC Genome Browser, please visit its home page at http://genome.ucsc.edu).
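- Binding-site (peak) detection is carried out with the MACS (Model-based Analysis of ChIP-Seq) toolbox, described by Zhang et al., Genome Biology, 2008 (here). The paper describes the tool in detail, and the help that comes with the tool is also very good. The peaks MACS reports are the regions from which a motif for the putative binding site can then be derived.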
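The conversion of read locations into a ‘visualizable’ format can be illustrated by writing aligned reads as a UCSC custom track in BED format. This is a hedged Python sketch: it assumes the reads have already been aligned (by the ABI SOLiD tools or any other aligner) and reduced to hypothetical `(chromosome, 0-based start, length, strand)` tuples, and it does not reproduce the ABI toolkit's own output. BED intervals are 0-based and half-open, so the end coordinate is simply start plus read length.

```python
from typing import Iterable, Tuple

# Hypothetical read record: (chromosome, 0-based leftmost position, read length, strand)
Read = Tuple[str, int, int, str]


def reads_to_bed_track(reads: Iterable[Read], name: str = "chipseq_reads") -> str:
    """Render aligned reads as a UCSC custom track in BED6 format."""
    # The track line tells the browser how to label and display the track.
    lines = [f'track name={name} description="ChIP-seq reads" visibility=dense']
    for i, (chrom, start, length, strand) in enumerate(reads):
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand {strand!r} for read {i}")
        end = start + length  # BED intervals are 0-based and half-open
        # BED6 columns: chrom, chromStart, chromEnd, name, score, strand
        lines.append(f"{chrom}\t{start}\t{end}\tread{i}\t0\t{strand}")
    return "\n".join(lines) + "\n"
```

The returned text can be saved to a file and uploaded (or pasted) on the browser's custom track page. For large runs, reads are usually summarized into a coverage track instead of one BED line per read, since a line per read quickly becomes unwieldy.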
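To give a feel for what MACS does, the sketch below reproduces two of its core ideas in simplified form: shifting each tag by d/2 toward its 3' end so that tags from both strands pile up over the binding site, and scoring a 2d window with a Poisson test. This is not MACS code. The fragment size `d` is assumed to be known (MACS estimates it from the data), and the sketch uses only a genome-wide background rate, whereas MACS uses a dynamic local rate, roughly the maximum of the genome-wide rate and rates estimated from the surrounding 1, 5 and 10 kb. The functions `shift_tags`, `poisson_upper_tail` and `window_pvalue` are hypothetical.

```python
import math
from typing import List, Tuple


def shift_tags(tags: List[Tuple[int, str]], d: int) -> List[int]:
    """Move each tag's 5' position d/2 toward its 3' end (toward the fragment centre).

    Plus-strand tags move right; minus-strand tags move left.
    """
    half = d // 2
    return [pos + half if strand == "+" else pos - half for pos, strand in tags]


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), using only the standard library."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def pmf(i: int) -> float:
        # Evaluate in log space so large k or lam does not overflow.
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # The upper tail is large here, so 1 - (lower tail) is accurate enough.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # For i > lam the terms shrink geometrically; sum until they are negligible.
    total, i, term = 0.0, k, pmf(k)
    while term > total * 1e-15:
        total += term
        i += 1
        term *= lam / i
    return total


def window_pvalue(shifted: List[int], centre: int, d: int, genome_size: int) -> float:
    """Poisson p-value for the tag count in a 2d window centred on `centre`.

    The expected count uses only the genome-wide background rate.
    """
    count = sum(1 for p in shifted if centre - d <= p < centre + d)
    lam = len(shifted) * (2 * d) / genome_size
    return poisson_upper_tail(count, lam)
```

A plus-strand tag at 100 and a minus-strand tag at 300 with d = 200 both land at 200 after shifting, which is why the shifted tags sharpen the pile-up over the site. The upper tail is summed directly when k exceeds the expected count so that very small p-values are not lost to rounding in 1 - CDF.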