171 lines
5.3 KiB
Plaintext
171 lines
5.3 KiB
Plaintext
CS 325, Algorithms (MS/MEng-level), Fall 2019
|
|
|
|
HW10 - Challenge Problem - RNA Structure Prediction (6%)
|
|
This problem combines dynamic programming and priority queues.
|
|
|
|
Due Wednesday 12/4, 11:59pm.
|
|
No late submission will be accepted.
|
|
|
|
Include in your submission: report.txt, rna.py.
|
|
Grading:
|
|
* report.txt -- 1%
|
|
* 1-best structure -- 2%
|
|
* number of structures -- 1%
|
|
* k-best structures -- 2%
|
|
|
|
Textbooks for References:
|
|
[1] KT Ch. 6.5 (DP over intervals -- RNA structure)
|
|
[2] KT slides: DP I (RNA section)
|
|
http://www.cs.princeton.edu/~wayne/kleinberg-tardos/
|
|
|
|
***Please analyze time/space complexities for each problem in report.txt.
|
|
|
|
1. Given an RNA sequence, such as ACAGU, we can predict its secondary structure
|
|
by tagging each nucleotide as (, ., or ). Each matching pair of () must be
|
|
AU, GC, or GU (or their mirror symmetries: UA, CG, UG).
|
|
We also assume pairs can _not_ cross each other.
|
|
The following are valid structures for ACAGU:
|
|
|
|
ACAGU
|
|
.....
|
|
...()
|
|
..(.)
|
|
.(.).
|
|
(...)
|
|
((.))
|
|
|
|
We want to find the structure with the maximum number of matching pairs.
|
|
In the above example, the last structure is optimal (2 pairs).
|
|
|
|
>>> best("ACAGU")
|
|
(2, '((.))')
|
|
|
|
Tie-breaking: arbitrary. Don't worry as long as your structure
|
|
is one of the correct best structures.
|
|
|
|
some other cases (more cases at the bottom):
|
|
|
|
GCACG
|
|
(2, '().()')
|
|
UUCAGGA
|
|
(3, '(((.)))')
|
|
GUUAGAGUCU
|
|
(4, '(.()((.)))')
|
|
AUAACCUUAUAGGGCUCUG
|
|
(8, '.(((..)()()((()))))')
|
|
AACCGCUGUGUCAAGCCCAUCCUGCCUUGUU
|
|
(11, '(((.(..(.((.)((...().))()))))))')
|
|
GAUGCCGUGUAGUCCAAAGACUUCACCGUUGG
|
|
(14, '.()()(()(()())(((.((.)(.))()))))')
|
|
CAUCGGGGUCUGAGAUGGCCAUGAAGGGCACGUACUGUUU
|
|
(18, '(()())(((((.)))()(((())(.(.().()()))))))')
|
|
ACGGCCAGUAAAGGUCAUAUACGCGGAAUGACAGGUCUAUCUAC
|
|
(19, '.()(((.)(..))(((.()()(())))(((.)((())))))())')
|
|
AGGCAUCAAACCCUGCAUGGGAGCACCGCCACUGGCGAUUUUGGUA
|
|
(20, '.(()())...((((()()))((()(.()(((.)))()())))))()')
|
|
|
|
2. Total number of all possible structures
|
|
|
|
>>> total("ACAGU")
|
|
6
|
|
|
|
3. k-best structures: output the 1-best, 2nd-best, ... kth-best structures.
|
|
|
|
>>> kbest("ACAGU", 3)
|
|
[(2, '((.))'), (1, '(...)'), (1, '.(.).')]
|
|
|
|
The list must be sorted.
|
|
Tie-breaking: arbitrary.
|
|
|
|
In case the input k is bigger than the number of possible structures, output all.
|
|
|
|
Sanity check: kbest(s, 1)[0][0] == best(s)[0] for each RNA sequence s.
|
|
|
|
All three functions should be in one file: rna.py.
|
|
|
|
See more testcases at the end.
|
|
|
|
Debriefing (required!): --------------------------
|
|
|
|
0. What's your name?
|
|
1. Approximately how many hours did you spend on this assignment?
|
|
2. Would you rate it as easy, moderate, or difficult?
|
|
3. Did you work on it mostly alone, or mostly with other people?
|
|
4. How deeply do you feel you understand the material it covers (0%-100%)?
|
|
5. Any other comments?
|
|
|
|
This section is intended to help us calibrate the homework assignments.
|
|
Your answers to this section will *not* affect your grade; however, skipping it
|
|
will certainly do.
|
|
|
|
|
|
TESTCASES:
|
|
|
|
for each sequence s, we list three lines:
|
|
best(s)
|
|
total(s)
|
|
kbest(s, 10)
|
|
|
|
|
|
|
|
ACAGU
|
|
(2, '((.))')
|
|
6
|
|
[(2, '((.))'), (1, '.(.).'), (1, '..(.)'), (1, '...()'), (1, '(...)'), (0, '.....')]
|
|
------
|
|
AC
|
|
(0, '..')
|
|
1
|
|
[(0, '..')]
|
|
------
|
|
GUAC
|
|
(2, '(())')
|
|
5
|
|
[(2, '(())'), (1, '()..'), (1, '.().'), (1, '(..)'), (0, '....')]
|
|
------
|
|
GCACG
|
|
(2, '().()')
|
|
6
|
|
[(2, '().()'), (1, '(..).'), (1, '()...'), (1, '.(..)'), (1, '...()'), (0, '.....')]
|
|
------
|
|
CCGG
|
|
(2, '(())')
|
|
6
|
|
[(2, '(())'), (1, '(.).'), (1, '.().'), (1, '.(.)'), (1, '(..)'), (0, '....')]
|
|
------
|
|
CCCGGG
|
|
(3, '((()))')
|
|
20
|
|
[(3, '((()))'), (2, '((.)).'), (2, '(.()).'), (2, '.(()).'), (2, '.(().)'), (2, '.((.))'), (2, '((.).)'), (2, '(.(.))'), (2, '(.().)'), (2, '((..))')]
|
|
------
|
|
UUCAGGA
|
|
(3, '(((.)))')
|
|
24
|
|
[(3, '(((.)))'), (2, '((.).).'), (2, '((..)).'), (2, '(.(.)).'), (2, '((.))..'), (2, '.((.)).'), (2, '.((.).)'), (2, '.((..))'), (2, '((..).)'), (2, '((.)..)')]
|
|
------
|
|
AUAACCUA
|
|
(2, '.((...))')
|
|
19
|
|
[(2, '((.)..).'), (2, '(()...).'), (2, '()(...).'), (2, '().(..).'), (2, '()....()'), (2, '.()(..).'), (2, '.()...()'), (2, '.(.)..()'), (2, '.((...))'), (2, '.(.(..))')]
|
|
------
|
|
UUGGACUUG
|
|
(4, '(()((.)))')
|
|
129
|
|
[(4, '(())(.)()'), (4, '(()((.)))'), (3, '(().)..()'), (3, '(().).(.)'), (3, '(().)(..)'), (3, '((.))..()'), (3, '((.)).(.)'), (3, '((.))(..)'), (3, '(())(..).'), (3, '(())(.)..')]
|
|
------
|
|
UUUGGCACUA
|
|
(4, '(.()()(.))')
|
|
179
|
|
[(4, '((()).).()'), (4, '((.)()).()'), (4, '(.()()).()'), (4, '.(()()).()'), (4, '.(()()(.))'), (4, '((()).(.))'), (4, '((.)()(.))'), (4, '((()())..)'), (4, '(.()()(.))'), (3, '((()).)...')]
|
|
------
|
|
GAUGCCGUGUAGUCCAAAGACUUC
|
|
(11, '(((()()((()(.))))((.))))')
|
|
2977987
|
|
[(11, '(()())(((()().))(((.))))'), (11, '(()())(((()()).)(((.))))'), (11, '(()())(((()(.)))(((.))))'), (11, '(()()()((()(.)))(((.))))'), (11, '(((()()((()().)))((.))))'), (11, '(((()()((()(.))))((.))))'), (11, '(()()()((()()).)(((.))))'), (11, '(()()()((()().))(((.))))'), (11, '(((()()((()()).))((.))))'), (10, '(()()()((()().).)((.))).')]
|
|
------
|
|
AGGCAUCAAACCCUGCAUGGGAGCG
|
|
(10, '.(()())...((((()()))).())')
|
|
560580
|
|
[(10, '.(()())...((((())())).)()'), (10, '.(()())...((((()()))).)()'), (10, '.(()())...(((()(()))).)()'), (10, '.(()())...(((()(()))).())'), (10, '.(()())...((((())())).())'), (10, '.(()())...((((()()))).())'), (9, '((.).)(...(.((()()))).)()'), (9, '((.).)(...(((.)(()))).)()'), (9, '((.).)(...(.(()(()))).)()'), (9, '((.).)(...((.(()()))).)()')]
|
|
------
|