NGS Bioinformatics Workshop
1.2 Tutorial – Sequence Formats, Databases and
Visualization Tools
March 15th, 2012
BioSci room B9242
Facilitator: Richard Bruskiewich
Adjunct Professor, MBB
Learning Objectives
Linux revisited
Quick dive into the Open-Bio pool (BioPython)
A first look at NGS data:
NCBI short read archive
Processing NGS: FASTX tool kit et al.
Visualization: IGV
Files and Permissions
• Linux user permissions: owner, group, or others
Owner/user is the person who created the file
“OWNS” the file / directory
Group is a team of people who are associated together
GROUP project / team work
Others are all other users on the server
• Each file / directory can have its permissions set
to (r)ead, (w)rite, or e(x)ecute
chmod: change file permissions
Do a long listing (ls -l)
• dr-x-wxrw- Separated into four sections
(d)(r - x)(- w x)(r w -)
directory (d) or file (-)
user (owner)
group
others
chmod o+x foo.txt
• grant ‘execute’ permission to ‘others’ on foo.txt
chmod g-rw foo.txt
• remove ‘read’ and ‘write’ permission from group
chmod ugo+rwx foo.txt
• grant all rights to everyone
To change the user/group (‘owner’) of a file, use chown rather than chmod:
chown ubuntu:ubuntu foo.txt
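The permission sections and the chmod examples above can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the workshop material: the function names (split_mode, grant_others_execute, remove_group_read_write, demo) are hypothetical, and it assumes a Linux or other POSIX file system.

```python
import os
import stat
import tempfile


def split_mode(mode_string):
    """Split an `ls -l` mode string such as 'dr-x-wxrw-' into its four sections."""
    return {
        "type": "directory" if mode_string[0] == "d" else "file",
        "user": mode_string[1:4],
        "group": mode_string[4:7],
        "others": mode_string[7:10],
    }


def grant_others_execute(path):
    """Rough equivalent of `chmod o+x path`: set the others-execute bit."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXOTH)


def remove_group_read_write(path):
    """Rough equivalent of `chmod g-rw path`: clear group read and write bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IRGRP | stat.S_IWGRP))


def demo():
    """Apply both changes to a scratch file and return the mode strings seen."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "foo.txt")  # hypothetical scratch file
        open(path, "w").close()
        os.chmod(path, 0o664)  # start from a known state
        seen = [stat.filemode(os.stat(path).st_mode)]
        grant_others_execute(path)
        seen.append(stat.filemode(os.stat(path).st_mode))
        remove_group_read_write(path)
        seen.append(stat.filemode(os.stat(path).st_mode))
    return seen


if __name__ == "__main__":
    print(split_mode("dr-x-wxrw-"))
    print(demo())
```

The mode strings returned by demo() use the same layout as the first column of a long listing, so they can be compared directly with what ls -l prints.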
A few useful tips…
• Hitting “tab” will auto-complete file or program names (or
suggest possible names)
• Up arrow will let you return to previous commands
• Editing of text files: “nano” is an easier alternative to “emacs”,
but less powerful
• Alternatively, use an SSH client to transfer files to your Windows desktop, edit
them in Windows, then transfer them back
• BUT: make sure you use a text editor that knows the difference between a
Windows and a Linux text file (e.g. Notepad++)
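The difference between a Windows and a Linux text file is mainly the line ending: Windows conventionally ends each line with carriage return plus line feed, Linux with a line feed only. A minimal Python sketch of detecting and converting the Windows form follows; the function names are hypothetical and the example assumes the file content is already in memory as bytes.

```python
def looks_like_windows_text(data: bytes) -> bool:
    """Return True if the data contains Windows-style (CR LF) line endings."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF, as Linux tools expect."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    sample = b">seq1\r\nACGT\r\n"  # made-up content with Windows line endings
    print(looks_like_windows_text(sample))
    print(to_unix_line_endings(sample))
```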
Some more useful basic Linux commands
“cd” changes your directory, e.g. ‘cd /usr/local’
“man” displays the manual for a command, e.g. ‘man
“pwd” tells you the directory you are currently
in (= working directory)
“history” will list recent commands,
enumerated with line numbers. By typing an
exclamation point with the line number (e.g.
!123), you can redo the command
Accessing remote servers
“ssh” – Secure Shell
ssh -i private_keypair user@host
“scp” – Secure CoPy
scp -i private_keypair [user@host:]sourcefile
[user@host:]targetfile
Where user is the account (default: local user)
and host is the internet name of the computer
(default: local host)
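The optional account and host prefix on an scp source or target can be sketched as a small Python helper. This only builds the argument strings and does not contact any server; the helper names and the example account, host and file names are hypothetical.

```python
def remote_spec(path, user=None, host=None):
    """Format a file location for scp, leaving out the parts that default."""
    if host is None:
        return path  # no host given: a plain local path
    prefix = f"{user}@{host}" if user else host  # no user given: local user is assumed
    return f"{prefix}:{path}"


def scp_command(keyfile, source, target):
    """Build the argument list for `scp -i keyfile source target`."""
    return ["scp", "-i", keyfile, source, target]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    source = remote_spec("reads.fastq", user="alice", host="example.org")
    print(scp_command("private_keypair", source, remote_spec("reads.fastq")))
```

Building the command as a list rather than a single string means it could be handed to subprocess.run without shell quoting problems.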
OpenBio Case Study: BioPython
Linux, MacOSX or Unix only
Get the precompiled binary
wget http://hannonlab.cshl.edu/fastx_toolkit/
tar -xvf
sudo mv bin/* /usr/local/bin
FASTX tool kit I
• FASTQ-to-FASTA converter: converts FASTQ files to FASTA files
• FASTQ Information: charts quality statistics and nucleotide distribution
• FASTQ/A Collapser: collapses identical sequences in a FASTQ/A file into a single
sequence (while maintaining read counts)
• FASTQ/A Trimmer: shortens reads in a FASTQ or FASTA file (removing
barcodes or noise)
• FASTQ/A Renamer: renames the sequence identifiers in a FASTQ/A file
• FASTQ/A Clipper: removes sequencing adapters / linkers
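To make the first group of operations concrete, here is a minimal Python sketch of what conversion, trimming and collapsing do to the data. It is not the FASTX-Toolkit implementation and does not use its command-line options; the function names and the example reads are made up, and it assumes simple FASTQ records of exactly four lines each.

```python
from collections import Counter
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def fastq_to_fasta(handle):
    """Convert FASTQ records to FASTA text (the quality strings are discarded)."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in read_fastq(handle))


def trim(sequence, first, last):
    """Keep bases first..last (1-based, inclusive), e.g. to cut off a barcode."""
    return sequence[first - 1:last]


def collapse(sequences):
    """Collapse identical sequences, keeping a read count for each."""
    return Counter(sequences).most_common()


# Made-up example reads, for illustration only.
EXAMPLE = (
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTACGT\n+\nIIIIIIII\n"
    "@read3\nTTGCA\n+\nIIIII\n"
)

if __name__ == "__main__":
    print(fastq_to_fasta(StringIO(EXAMPLE)))
    print(collapse(seq for _, seq, _ in read_fastq(StringIO(EXAMPLE))))
```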
FASTX tool kit II
• FASTQ/A Reverse-Complement: produces the reverse-complement of each sequence in a
FASTQ/FASTA file
• FASTQ/A Barcode splitter: splits a FASTQ/FASTA file containing multiple samples
• FASTA Formatter: changes the width of sequence lines in a FASTA file
• FASTA Nucleotide Changer: converts FASTA sequences from/to RNA/DNA
• FASTQ Quality Filter: filters sequences based on quality
• FASTQ Quality Trimmer: trims (cuts) sequences based on quality
• FASTQ Masker: masks nucleotides with 'N' (or other character) based on
quality
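Several of these operations are simple string transformations, sketched below in Python. This is an illustration of the ideas only, not the toolkit's own code: the function names are hypothetical, the quality encoding is assumed to be Phred+33 (older Illumina files may use an offset of 64), and masking bases strictly below the threshold is an assumption of this sketch.

```python
# Translation table for complementing DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then reverse the result."""
    return sequence.translate(COMPLEMENT)[::-1]


def dna_to_rna(sequence):
    """Convert a DNA sequence to RNA by replacing T with U."""
    return sequence.replace("T", "U").replace("t", "u")


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def mask_low_quality(sequence, quality, threshold=10, mask="N", offset=33):
    """Replace each base whose score is below the threshold with the mask character."""
    scores = phred_scores(quality, offset)
    return "".join(
        base if score >= threshold else mask
        for base, score in zip(sequence, scores)
    )


if __name__ == "__main__":
    print(reverse_complement("AACGT"))
    print(mask_low_quality("ACGT", "II#I"))
```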
Integrative Genomics Viewer
