awk

The -d argument sets the delimiter, here a space (" "). Replace n with the number of the last column to be printed.

cut -d ' ' -f 1-n file | head

Subset analysis results for a set of rsIDs

The following code subsets the result files for a set of rsIDs. File2 contains only the list of SNPs to keep, one per line. File1 is the main file (e.g. a result file) to be subsetted. The code looks for matching elements in column 2 of File1.

awk 'NR==FNR{pats[$0]; next} $2 in pats' File2 File1
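As a minimal sketch of how the two-file idiom behaves, the example below builds two small demo files and runs the same command on them. The file contents and the temporary directory are made up for illustration only.

```shell
# Hypothetical demo data; the contents are invented for illustration.
workdir=$(mktemp -d)

# File2: one rsID per line (the SNPs to keep).
printf '%s\n' rs2 rs4 > "$workdir/File2"

# File1: result rows with the rsID in column 2.
printf '%s\n' \
  'chr1 rs1 0.50' \
  'chr1 rs2 0.01' \
  'chr2 rs3 0.20' \
  'chr2 rs4 0.03' > "$workdir/File1"

# While reading the first file (NR==FNR), store each line as an array key.
# For the second file, print only rows whose column 2 is one of those keys.
subset=$(awk 'NR==FNR{pats[$0]; next} $2 in pats' "$workdir/File2" "$workdir/File1")
echo "$subset"

rm -r "$workdir"
```

Only the rs2 and rs4 rows are printed. Note that the list file must come first on the command line, because NR==FNR is only true while the first file is being read.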

Exclude rows based on a condition on certain columns

The following code excludes any line that satisfies the condition $3=="chr6" && 30000<$4 && $4<50000 and outputs all fields of every other row. The condition means: “column 3 is chr6 and column 4 lies strictly between 30000 and 50000”. NR==1 keeps the first line (typically the header) regardless of the condition.

awk 'NR==1 || !($3=="chr6" && 30000<$4 && $4<50000)'  file
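To check which rows the filter drops, here is a small sketch on made-up data. The column layout (id, snp, chr, pos) and the values are assumptions chosen so that two rows sit exactly on the boundaries.

```shell
# Hypothetical demo data with a header line; values are invented.
workdir=$(mktemp -d)
printf '%s\n' \
  'id snp chr pos' \
  's1 rs1 chr6 30000' \
  's2 rs2 chr6 40000' \
  's3 rs3 chr6 50000' \
  's4 rs4 chr1 40000' > "$workdir/file"

# Keep the header (NR==1) and every row that is NOT on chr6 strictly
# between positions 30000 and 50000.
kept=$(awk 'NR==1 || !($3=="chr6" && 30000<$4 && $4<50000)' "$workdir/file")
echo "$kept"

rm -r "$workdir"
```

Only the s2 row is removed. The rows at exactly 30000 and 50000 are kept because both comparisons are strict; use <= instead of < if the boundaries should be excluded as well.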

To return only selected fields from the operation above, list the columns you want in a print action, e.g. {print $1, $2}.

awk 'NR==1 || !($3=="chr6" && 30000<$4 && $4<50000) {print $1, $2}'  file

The code below keeps only the first 10 columns. -d defines the delimiter of the file, which is a space in this example.

cut -d " " -f 1-10 file
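One detail worth keeping in mind: cut treats every single delimiter character as a field separator, whereas awk by default splits on runs of whitespace. The sketch below, using a made-up input line with two consecutive spaces, shows the difference.

```shell
# Hypothetical input: two spaces between "a" and "b".
line='a  b c'

# cut sees an empty field between the two spaces, so fields 1-2 are "a" and "".
cut_out=$(printf '%s\n' "$line" | cut -d ' ' -f 1-2)

# awk (default field splitting) collapses the run of spaces into one separator.
awk_out=$(printf '%s\n' "$line" | awk '{print $1, $2}')

printf 'cut: [%s]\n' "$cut_out"
printf 'awk: [%s]\n' "$awk_out"
```

If a file is padded with repeated spaces, awk is usually the safer choice for selecting columns.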

ifelse examples

See some nice examples here

Other operations

Nested loop example

The following code was written to generate example data sets for the gwasurvivr package. It subsets the file chr1_good_snps.impute by rows (snps) and by samples (n). It also shows how to do arithmetic in the shell and assign the result to a new variable: here the let builtin defines col, the number of fields (columns) to keep from the impute file. The 5 accounts for the first 5 columns of the file, which describe the SNP, and *3 accounts for the three genotype probabilities of each sample. Note: be careful with the format of the output file name. Do not put _ directly after a loop variable: _ is a valid character in shell variable names, so $n_snp is read as the variable n_snp rather than $n followed by _snp, and the expected file name is not printed. Writing ${n}_snp avoids this.

for snps in 100 1000 10000 100000 150000
do
    for n in 100 1000 5000
    do
        # 5 SNP-description columns plus 3 genotype probabilities per sample
        let "col=n*3+5"
        head -n "$snps" chr1_good_snps.impute | cut -d " " -f 1-"$col" > "n$n.snp$snps.chr1.impute"
        echo "snp $snps n $n"
    done
done

echo "All Done!"
exit
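The short sketch below illustrates the file-name pitfall and the column arithmetic for a single iteration. The values of n and snps are just one combination from the loop, and $(( )) is used here as an alternative to let; it assumes the shell is not running with the nounset option.

```shell
n=100
snps=1000

# Without braces, the shell looks for a variable named "n_snp", which is
# unset here and expands to an empty string.
unbraced="n$n_snp$snps.chr1.impute"

# Braces mark where each variable name ends.
braced="n${n}_snp${snps}.chr1.impute"

# Arithmetic expansion: 5 leading columns plus 3 per sample.
col=$((n * 3 + 5))

echo "$unbraced"
echo "$braced"
echo "$col"
```

With braces the underscore can be used safely in file names inside the loop.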

Change the file permissions to executable

chmod +x filename