HaploReg, RegulomeDB and more on Python programming

Lin Liu
Yang Li
• HaploReg retrieves the ENCODE annotation for the
selected SNP, as well as for other SNPs in LD with it
• Using the “Set Options” tab, the user can configure
values such as the LD threshold and the 1000 Genomes
population used to calculate LD
Python programming wrap-up
if else
for and while loop
indexing: starts from 0, unlike R, which starts from 1
four important data structures:
list: a = [1, 2, 3, 4]; a.append(5)
tuple: a = ('cat', 'dog'); a = (a[1], a[0])  # tuples are immutable, so swapping builds a new tuple
dictionary: a = {'chr1': {10254: 'G', 13257: 'T'}}; a.keys()
set (built-in type; the old sets module no longer exists in Python 3):
species = set(['hs', 'mm', 'chimp'])
zoos = set(['mm', 'wolf', 'chimp'])
zoos | species  # union
zoos & species  # intersection
zoos - species  # difference
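The four data structures above can be tried out in a few lines. This is a minimal sketch assuming Python 3; the variable names are illustrative and not from the slides.

```python
# Lists are mutable and indexed from 0 (R starts at 1).
numbers = [1, 2, 3, 4]
numbers.append(5)
first = numbers[0]              # element at index 0

# Tuples are immutable: swapping means building a new tuple.
pets = ('cat', 'dog')
pets = (pets[1], pets[0])

# Dictionaries map keys to values and can be nested.
genotypes = {'chr1': {10254: 'G', 13257: 'T'}}
chromosomes = list(genotypes.keys())
allele = genotypes['chr1'][10254]

# Sets support union, intersection and difference.
species = {'hs', 'mm', 'chimp'}
zoos = {'mm', 'wolf', 'chimp'}
union = zoos | species
common = zoos & species
only_in_zoos = zoos - species
```

Set results are unordered, so compare them with other sets rather than relying on print order.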
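• Some tricky facts:
– Assignment, shallow copy and deep copy
• Assignment (no copy): a = [1, 2, 3]; b = a; b[2] = 4; print(a)  # [1, 2, 4]: b is another name for the same list
• Shallow copy: b = a[:] or copy.copy(a) makes a new list, but nested objects are still shared
• Deep copy:
– from copy import deepcopy
– a = [1, 2, 3]; b = deepcopy(a); b[2] = 4; print(a)  # [1, 2, 3]: a is unchanged
– List comprehension:
• As in R, explicit loops are slow; a list comprehension is a compact and usually faster alternative
• a = [1, 2, 3]; a = [b + 1 for b in a]; print(a)  # [2, 3, 4]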
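The difference between assignment, a shallow copy and a deep copy only shows up once a list is modified, so a short sketch may help. The nested lists and variable names below are illustrative assumptions, not taken from the slides.

```python
from copy import copy, deepcopy

# Plain assignment copies nothing: both names refer to one list.
a = [1, 2, 3]
b = a
b[2] = 4                     # a is now [1, 2, 4] as well

# A shallow copy creates a new outer list but shares nested objects.
nested = [[1, 2], [3, 4]]
shallow = copy(nested)
shallow[0][0] = 99           # also visible through `nested`

# A deep copy duplicates nested objects, so the original is untouched.
original = [[1, 2], [3, 4]]
deep = deepcopy(original)
deep[0][0] = 99

# A list comprehension builds a new list in a single expression.
values = [1, 2, 3]
incremented = [v + 1 for v in values]
```

For a flat list of numbers, a shallow copy is already enough; `deepcopy` matters when the list contains other mutable objects.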
• How to read BAM (binary) files in Python?
– import pybedtools
• How to perform numerical computation in Python?
– import numpy as np
– Includes array and matrix calculation, very useful
• How to do shell-script tasks in Python?
– Get all files in a folder
– import os
– os.listdir("yourdirectory")
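As a sketch of the folder-listing step, the example below wraps `os.listdir` in a small helper. The helper name `list_files` and the file names are hypothetical; a temporary directory is used only so the example runs anywhere.

```python
import os
import tempfile

def list_files(directory):
    """Return the sorted names of regular files directly inside `directory`."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )

# Build a throwaway directory with two files and one subfolder.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("sample2.bam", "sample1.bam"):
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, "subfolder"))
    found = list_files(demo_dir)   # subfolder is filtered out
```

`os.listdir` returns names in arbitrary order and includes subdirectories, which is why the helper sorts and filters.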
Object-oriented programming
Classes and objects in Python
class HMM:
    # transition_probs[i, j] is the probability of transitioning to state i from state j
    # emission_probs[i, j] is the probability of emitting emission j while in state i
    def __init__(self, transition_probs, emission_probs):
        self._transition_probs = transition_probs
        self._emission_probs = emission_probs

    def emission_dist(self, emission):
        return self._emission_probs[:, emission]

    def num_states(self):
        return self._transition_probs.shape[0]

    def transition_probs(self):
        return self._transition_probs
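To see how an object of this class is created and used, here is a minimal sketch. It assumes the probabilities are passed as NumPy arrays (the `.shape` and `[:, emission]` calls suggest this), and the two-state numbers are made up for illustration.

```python
import numpy as np

class HMM:
    # transition_probs[i, j]: probability of moving to state i from state j
    # emission_probs[i, j]: probability of emitting symbol j while in state i
    def __init__(self, transition_probs, emission_probs):
        self._transition_probs = transition_probs
        self._emission_probs = emission_probs

    def emission_dist(self, emission):
        return self._emission_probs[:, emission]

    def num_states(self):
        return self._transition_probs.shape[0]

    def transition_probs(self):
        return self._transition_probs

# Hypothetical two-state model with two possible emissions.
# Column j of the transition matrix is "from state j", so columns sum to 1.
transition = np.array([[0.9, 0.2],
                       [0.1, 0.8]])
# Row i of the emission matrix is "in state i", so rows sum to 1.
emission = np.array([[0.7, 0.3],
                     [0.1, 0.9]])

model = HMM(transition, emission)      # create an object from the class
n_states = model.num_states()
dist_symbol_1 = model.emission_dist(1) # P(emit symbol 1 | each state)

# One step of state propagation as a matrix-vector product.
start = np.array([1.0, 0.0])
next_state_probs = model.transition_probs() @ start
```

The leading underscore in `_transition_probs` is a naming convention for "internal" attributes; callers go through the methods instead.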
Interface with other programming languages
• Rpy: R and Python interface
• Cython: Python and C interface
• When to use Python?
– Text manipulation
– Simple machine learning implementations
(as you would in MATLAB)
– Some very well-written packages are available: PyStan
(Bayesian MCMC sampler), matplotlib, pybedtools
• When not to use python:
– Large scale simulation: most often you cannot get
rid of loops
– Statistical analysis: R is much better and well
– Best strategy: interface Python with C
Some good reference code for Python
• Check the MACS14 Python scripts
• From MACS14 you can learn how to turn a Python
script into executable software
