SNP detection help file

Probe selection program

The arguments

Chromosome/Coordinates - the chromosome and location of the target region for the SNP
Exons only - choose to span the probes only on annotated exons, or evenly across the target region
Strand - probes can be designed to span either the (+), (-), or both strands
Length of probes - the oligo length (in bp) of the overlapping probes *
Number of probes - the number of probes available for the experiment (estimated**)

* To speed up the probe selection process, 50bp lengths for all chromosomes have been pre-cached. Other oligo lengths may not be pre-cached and will take longer to compute. Based on experimental results, 50bp is the recommended length. In addition, longer probes may exceed the 150 cycle cutoff used in the array synthesis step; such probes will not be included in the final list of probe sequences.
** The probes are distributed to maximally cover a target region. The algorithm tries to use the number entered here, but because of the filters applied, roundoff error, and spacing around the edges, the final number will diverge slightly.
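
As a rough illustration of the 150 cycle cutoff mentioned in the first note, the sketch below counts synthesis cycles for a probe sequence. It assumes that the synthesizer offers one base per cycle in a fixed repeating order and that a probe grows only when the offered base matches its next base. The function name, the flow order "ACGT", and the direction in which the sequence is read are assumptions made for this example, not details taken from the application.

```python
def synthesis_cycles(sequence: str, flow_order: str = "ACGT") -> int:
    """Count the cycles needed to build `sequence` one base at a time.

    Assumption: one base from `flow_order` is offered per cycle, in a
    repeating order, and the probe is extended only on a matching cycle.
    """
    cycles = 0
    for base in sequence.upper():
        if base not in flow_order:
            raise ValueError(f"unexpected base: {base!r}")
        # Skip cycles until the offered base matches the next probe base.
        while flow_order[cycles % len(flow_order)] != base:
            cycles += 1
        # The matching cycle itself is also consumed.
        cycles += 1
    return cycles


def within_cutoff(sequence: str, cutoff: int = 150) -> bool:
    """True if the probe can be built within `cutoff` cycles."""
    return synthesis_cycles(sequence) <= cutoff
```

Under these assumptions a probe that follows the flow order is cheap to build, while a homopolymer run costs nearly four cycles per base, which is why longer probes are more likely to be dropped.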
* * *

Range confirmation

The application filters out probes in repeat regions, probes whose sequences exceed the 150 cycle cutoff (set by the chip manufacturing process), and probes in intergenic regions if selected. It then displays the final range that the probes can cover and reports the average spacing between probes needed to maximally cover the region. Pressing the Synthesize Probes button creates the oligo file (using the filename specified) and makes it available in gzip format.

The application will indicate the spacing of the probes. Any spacing beyond 5 base pairs is not considered reliable for the experiment, and the application will not allow you to continue unless some of the parameters are changed. The smallest possible spacing between probes is a single base pair. When that happens, a warning appears, as it is likely that not all of the probes will be used.
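
The spacing rules above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and computing the spacing by integer division of the coverable range by the number of probes is an assumption, since the exact calculation used by the application is not described here.

```python
def average_spacing(coverable_bp: int, num_probes: int) -> int:
    """Approximate spacing (bp) between probe start positions.

    Assumption: spacing is the coverable range divided by the number of
    probes, rounded down, and never less than a single base pair.
    """
    if num_probes < 1:
        raise ValueError("at least one probe is required")
    return max(1, coverable_bp // num_probes)


def check_spacing(spacing: int, max_reliable: int = 5) -> str:
    """Classify a spacing according to the rules described above."""
    if spacing > max_reliable:
        return "error"    # too sparse: parameters must be changed
    if spacing == 1:
        return "warning"  # densest tiling: some probes may go unused
    return "ok"
```

For example, 250 probes over a 1000 bp coverable range give a spacing of 4 bp, which is accepted, while 500 probes over 100 bp give the minimum spacing of 1 bp and trigger the warning.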

* * *

The oligo file

Once the ranges have been confirmed and accepted, the application creates a downloadable oligo file. The oligo file lists the probes to be used in the experiment, formatted to be compatible with Nimblegen. The application uses the columns as follows:
SEQ_IDM - the chromosome name
PROBE_ID - a unique 16 character identifier of the probe made up of: the alias (4 characters), a strand code (2 characters: 00 for the + strand, 99 for the - strand), the letter 'P' for probe (1 character), and the probe position (9 digits, the same value as the next column)
POSITION - position of the probe in relation to the reference genome (the left most base)
PROBE_SEQUENCE - probe sequence
DESIGN_NOTE - the application uses this to indicate any curated exons that are present, separated by commas
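
To make the PROBE_ID layout concrete, here is a small sketch that builds an identifier from its parts. The function name and the example alias are hypothetical, and padding the position with leading zeros to 9 digits is an assumption based on the fixed 16 character length.

```python
def make_probe_id(alias: str, strand: str, position: int) -> str:
    """Build a 16 character PROBE_ID: alias(4) + strand(2) + 'P' + position(9)."""
    if len(alias) != 4:
        raise ValueError("alias must be exactly 4 characters")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not 0 <= position < 10**9:
        raise ValueError("position must fit in 9 digits")
    strand_code = "00" if strand == "+" else "99"
    # Assumption: the position is zero-padded on the left to 9 digits.
    return f"{alias}{strand_code}P{position:09d}"


print(make_probe_id("CHR1", "+", 1234567))  # CHR100P001234567
print(make_probe_id("CHR1", "-", 1234567))  # CHR199P001234567
```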

* * *

Upload Feature

The upload feature has been implemented to allow users to upload their own fasta files. The functionality is limited, as the annotated files used to remove repeats and select for exons are not available. The only filter applied removes probe sequences that exceed the cycle cutoff provided by the user.
Uploaded files can be zipped (.gz or .zip).
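
For users preparing files locally, the sketch below shows one way to read a fasta file whether it is plain, gzipped, or zipped. It is an illustration only and not the application's own code: the function name is hypothetical, the format is guessed from the file extension, and for .zip archives it assumes the fasta file is the first member.

```python
import gzip
import zipfile


def read_upload(path: str) -> str:
    """Return the text of a plain, .gz, or .zip fasta file."""
    if path.endswith(".gz"):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            return handle.read()
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds a single fasta file.
            first_member = archive.namelist()[0]
            return archive.read(first_member).decode("utf-8")
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```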
* * *
SNP probe selection program:
http://hokkaido.bcgsc.ca/SNPdetection/cgi-bin/probeselection.cgi


log2 ratio calculator program

The .pair files

The program requires two files, one for the sample and one for the reference.
The program outputs a file in CSV format (comma separated values) which contains the probes used and the raw intensity scores.
The program can also accept the original oligo file to add additional annotation to the CSV file. Namely, the Design_note column will be added containing the names of any exons that the probe covers. Note that like the upload feature above, uploaded files can be zipped (.gz or .zip).

* * *

The .csv file

This is the output of the program, containing the log2 ratios of the normalized intensity scores taken from the sample and reference. The output file uses the following columns:
Chromosome - the chromosome name (same as the SEQ_IDM value in the oligo file)
Position - position of the probe in relation to the reference genome
Log2Intensity - the raw log2 ratio calculated from the sample and reference intensities
Log2Ratio - the log2 ratio from the sample and reference normalized over the entire dataset
Strand - the strand of the probe, '1' corresponds to the (+) strand while '-1' corresponds to the (-) strand
Design_note - an optional column containing any extra notes, usually containing the names of any exons that are spanned by the probe
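
The relationship between the Log2Intensity and Log2Ratio columns can be sketched as below. The raw value is the log2 of the sample intensity divided by the reference intensity. The normalization step shown here (subtracting the median raw ratio across all probes) is an assumption chosen for illustration; the method actually used by the program is not described in this help file. The function name is hypothetical.

```python
import math
from statistics import median


def log2_ratios(sample, reference):
    """Return (raw, normalized) log2 ratios for paired probe intensities."""
    if len(sample) != len(reference):
        raise ValueError("sample and reference must have the same length")
    # Raw log2 ratio per probe (the Log2Intensity column).
    raw = [math.log2(s / r) for s, r in zip(sample, reference)]
    # Assumed normalization: center the ratios on the dataset median.
    center = median(raw)
    normalized = [value - center for value in raw]
    return raw, normalized


raw, normalized = log2_ratios([200, 400, 800], [100, 100, 100])
print(raw)         # [1.0, 2.0, 3.0]
print(normalized)  # [-1.0, 0.0, 1.0]
```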

log2 ratio calculator program:
http://hokkaido.bcgsc.ca/SNPdetection/cgi-bin/ratiocalculator.cgi


Back to the front page
http://hokkaido.bcgsc.ca/SNPdetection