Compare commits


7 Commits

19 changed files with 2414 additions and 23 deletions


@ -0,0 +1,119 @@
CS 325-001, Analysis of Algorithms, Fall 2019
HW1 - Python 3, qsort, BST, and qselect
Due electronically on flip on Monday 9/30 at 11:59pm.
No late submission will be accepted.
Need to submit on flip: report.txt, qsort.py, and qselect.py.
qselect.py will be automatically graded for correctness (1%).
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/submit hw1 qselect.py qsort.py report.txt
Note:
1. You can ssh to flip machines from your own machine by:
$ ssh access.engr.oregonstate.edu
2. You can add /nfs/farm/classes/eecs/fall2019/cs325-001/ to your $PATH:
$ export PATH=$PATH:/nfs/farm/classes/eecs/fall2019/cs325-001/
and add the above command to your ~/.bash_profile,
so that you don't need to type it every time.
(alternatively, you can use symbolic links or aliases to avoid typing the long path)
3. You can choose to submit each file separately, or submit them together.
Textbooks for References:
[1] CLRS Ch. 9.2 and Ch. 12
0. Q: What are the best-case, worst-case, and average-case time complexities of quicksort?
Briefly explain each case.
1. [WILL BE GRADED]
Quickselect with Randomized Pivot (CLRS Ch. 9.2).
>>> from qselect import *
>>> qselect(2, [3, 10, 4, 7, 19])
4
>>> qselect(4, [11, 2, 8, 3])
11
Q: What are the best-case, worst-case, and average-case time complexities? Briefly explain.
Filename: qselect.py
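A minimal sketch of randomized quickselect, for orientation only. It assumes k is 1-indexed (as the two examples above suggest) and that returning None for an out-of-range k is acceptable; the three-way partition is one choice among several.

```python
import random

def qselect(k, a):
    """Return the k-th smallest element of a (k is 1-indexed, an assumption)."""
    if not 1 <= k <= len(a):
        return None  # assumption: out-of-range k yields None
    pivot = random.choice(a)
    left = [x for x in a if x < pivot]
    right = [x for x in a if x > pivot]
    n_equal = len(a) - len(left) - len(right)  # copies of the pivot
    if k <= len(left):
        return qselect(k, left)
    if k <= len(left) + n_equal:
        return pivot
    return qselect(k - len(left) - n_equal, right)
```

Only one side of the partition is recursed into, which is what separates quickselect from quicksort in the complexity analysis.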
2. Buggy Qsort Revisited
In the slides we showed a buggy version of qsort which is weird in an interesting way:
it actually returns a binary search tree for the given array, rooted at the pivot:
>>> from qsort import *
>>> tree = sort([4,2,6,3,5,7,1,9])
>>> tree
[[[[], 1, []], 2, [[], 3, []]], 4, [[[], 5, []], 6, [[], 7, [[], 9, []]]]]
which encodes a binary search tree:
        4
      /   \
     2     6
    / \   / \
   1   3 5   7
              \
               9
Now on top of that piece of code, add three functions:
* sorted(t): returns the sorted order (in-order traversal)
* search(t, x): returns whether x is in t
* insert(t, x): inserts x into t (in-place) if it is missing, otherwise does nothing.
>>> sorted(tree)
[1, 2, 3, 4, 5, 6, 7, 9]
>>> search(tree, 6)
True
>>> search(tree, 6.5)
False
>>> insert(tree, 6.5)
>>> tree
[[[[], 1, []], 2, [[], 3, []]], 4, [[[], 5, []], 6, [[[], 6.5, []], 7, [[], 9, []]]]]
>>> insert(tree, 3)
>>> tree
[[[[], 1, []], 2, [[], 3, []]], 4, [[[], 5, []], 6, [[[], 6.5, []], 7, [[], 9, []]]]]
Hint: both search and insert should depend on a helper function _search(tree, x) which
returns the subtree (a list) rooted at x when x is found, or the [] where x should
be inserted.
e.g.,
>>> tree = sort([4,2,6,3,5,7,1,9]) # starting from the initial tree
>>> _search(tree, 3)
[[], 3, []]
>>> _search(tree, 0)
[]
>>> _search(tree, 6.5)
[]
>>> _search(tree, 0) is _search(tree, 6.5)
False
>>> _search(tree, 0) == _search(tree, 6.5)
True
Note the last two []'s are different nodes (with different memory addresses):
the first one is the left child of 1, while the second one is the left child of 7
(so that insert is very easy).
Filename: qsort.py
Q: What are the time complexities for the operations implemented?
Debriefing (required!): --------------------------
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Any other comments?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.


@ -0,0 +1,170 @@
CS 325, Algorithms (MS/MEng-level), Fall 2019
HW10 - Challenge Problem - RNA Structure Prediction (6%)
This problem combines dynamic programming and priority queues.
Due Wednesday 12/4, 11:59pm.
No late submission will be accepted.
Include in your submission: report.txt, rna.py.
Grading:
* report.txt -- 1%
* 1-best structure -- 2%
* number of structures -- 1%
* k-best structures -- 2%
Textbooks for References:
[1] KT Ch. 6.5 (DP over intervals -- RNA structure)
[2] KT slides: DP I (RNA section)
http://www.cs.princeton.edu/~wayne/kleinberg-tardos/
***Please analyze time/space complexities for each problem in report.txt.
1. Given an RNA sequence, such as ACAGU, we can predict its secondary structure
by tagging each nucleotide as (, ., or ). Each matching pair of () must be
AU, GC, or GU (or their mirror symmetries: UA, CG, UG).
We also assume pairs can _not_ cross each other.
The following are valid structures for ACAGU:
ACAGU
.....
...()
..(.)
.(.).
(...)
((.))
We want to find the structure with the maximum number of matching pairs.
In the above example, the last structure is optimal (2 pairs).
>>> best("ACAGU")
(2, '((.))')
Tie-breaking: arbitrary. Don't worry as long as your structure
is one of the correct best structures.
some other cases (more cases at the bottom):
GCACG
(2, '().()')
UUCAGGA
(3, '(((.)))')
GUUAGAGUCU
(4, '(.()((.)))')
AUAACCUUAUAGGGCUCUG
(8, '.(((..)()()((()))))')
AACCGCUGUGUCAAGCCCAUCCUGCCUUGUU
(11, '(((.(..(.((.)((...().))()))))))')
GAUGCCGUGUAGUCCAAAGACUUCACCGUUGG
(14, '.()()(()(()())(((.((.)(.))()))))')
CAUCGGGGUCUGAGAUGGCCAUGAAGGGCACGUACUGUUU
(18, '(()())(((((.)))()(((())(.(.().()()))))))')
ACGGCCAGUAAAGGUCAUAUACGCGGAAUGACAGGUCUAUCUAC
(19, '.()(((.)(..))(((.()()(())))(((.)((())))))())')
AGGCAUCAAACCCUGCAUGGGAGCACCGCCACUGGCGAUUUUGGUA
(20, '.(()())...((((()()))((()(.()(((.)))()())))))()')
2. Total number of all possible structures
>>> total("ACAGU")
6
3. k-best structures: output the 1-best, 2nd-best, ... kth-best structures.
>>> kbest("ACAGU", 3)
[(2, '((.))'), (1, '(...)'), (1, '.(.).')]
The list must be sorted.
Tie-breaking: arbitrary.
In case the input k is bigger than the number of possible structures, output all.
Sanity check: kbest(s, 1)[0][0] == best(s)[0] for each RNA sequence s.
All three functions should be in one file: rna.py.
See more testcases at the end.
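As a starting point, here is a hedged sketch of best() and total() using DP over intervals; kbest() is left out. It assumes adjacent nucleotides may pair (the valid structure "...()" above implies no minimum loop length). The names PAIRS, opt, and count are illustrative, not part of the assignment.

```python
from functools import lru_cache

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # allowed pairs, from the text

def best(s):
    @lru_cache(maxsize=None)
    def opt(i, j):
        # Best (pairs, structure) for the half-open span s[i:j].
        if j - i < 2:
            return 0, "." * (j - i)
        score, left = opt(i, j - 1)       # case 1: last nucleotide unpaired
        result = (score, left + ".")
        for k in range(i, j - 1):          # case 2: s[k] pairs with s[j-1]
            if s[k] + s[j - 1] in PAIRS:
                ls, lstruct = opt(i, k)
                ms, mstruct = opt(k + 1, j - 1)
                if ls + ms + 1 > result[0]:
                    result = (ls + ms + 1, lstruct + "(" + mstruct + ")")
        return result
    return opt(0, len(s))

def total(s):
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of valid structures for s[i:j]; each is counted once,
        # keyed on what happens to the last nucleotide.
        if j - i < 2:
            return 1
        n = count(i, j - 1)
        for k in range(i, j - 1):
            if s[k] + s[j - 1] in PAIRS:
                n += count(i, k) * count(k + 1, j - 1)
        return n
    return count(0, len(s))
```

Both functions share the same case split, so replacing max with sum turns the optimization into a count. Carrying the structure strings in the table is convenient for a sketch; storing back-pointers instead would keep the table entries constant-size.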
Debriefing (required!): --------------------------
0. What's your name?
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Any other comments?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.
TESTCASES:
for each sequence s, we list three lines:
best(s)
total(s)
kbest(s, 10)
ACAGU
(2, '((.))')
6
[(2, '((.))'), (1, '.(.).'), (1, '..(.)'), (1, '...()'), (1, '(...)'), (0, '.....')]
------
AC
(0, '..')
1
[(0, '..')]
------
GUAC
(2, '(())')
5
[(2, '(())'), (1, '()..'), (1, '.().'), (1, '(..)'), (0, '....')]
------
GCACG
(2, '().()')
6
[(2, '().()'), (1, '(..).'), (1, '()...'), (1, '.(..)'), (1, '...()'), (0, '.....')]
------
CCGG
(2, '(())')
6
[(2, '(())'), (1, '(.).'), (1, '.().'), (1, '.(.)'), (1, '(..)'), (0, '....')]
------
CCCGGG
(3, '((()))')
20
[(3, '((()))'), (2, '((.)).'), (2, '(.()).'), (2, '.(()).'), (2, '.(().)'), (2, '.((.))'), (2, '((.).)'), (2, '(.(.))'), (2, '(.().)'), (2, '((..))')]
------
UUCAGGA
(3, '(((.)))')
24
[(3, '(((.)))'), (2, '((.).).'), (2, '((..)).'), (2, '(.(.)).'), (2, '((.))..'), (2, '.((.)).'), (2, '.((.).)'), (2, '.((..))'), (2, '((..).)'), (2, '((.)..)')]
------
AUAACCUA
(2, '.((...))')
19
[(2, '((.)..).'), (2, '(()...).'), (2, '()(...).'), (2, '().(..).'), (2, '()....()'), (2, '.()(..).'), (2, '.()...()'), (2, '.(.)..()'), (2, '.((...))'), (2, '.(.(..))')]
------
UUGGACUUG
(4, '(()((.)))')
129
[(4, '(())(.)()'), (4, '(()((.)))'), (3, '(().)..()'), (3, '(().).(.)'), (3, '(().)(..)'), (3, '((.))..()'), (3, '((.)).(.)'), (3, '((.))(..)'), (3, '(())(..).'), (3, '(())(.)..')]
------
UUUGGCACUA
(4, '(.()()(.))')
179
[(4, '((()).).()'), (4, '((.)()).()'), (4, '(.()()).()'), (4, '.(()()).()'), (4, '.(()()(.))'), (4, '((()).(.))'), (4, '((.)()(.))'), (4, '((()())..)'), (4, '(.()()(.))'), (3, '((()).)...')]
------
GAUGCCGUGUAGUCCAAAGACUUC
(11, '(((()()((()(.))))((.))))')
2977987
[(11, '(()())(((()().))(((.))))'), (11, '(()())(((()()).)(((.))))'), (11, '(()())(((()(.)))(((.))))'), (11, '(()()()((()(.)))(((.))))'), (11, '(((()()((()().)))((.))))'), (11, '(((()()((()(.))))((.))))'), (11, '(()()()((()()).)(((.))))'), (11, '(()()()((()().))(((.))))'), (11, '(((()()((()()).))((.))))'), (10, '(()()()((()().).)((.))).')]
------
AGGCAUCAAACCCUGCAUGGGAGCG
(10, '.(()())...((((()()))).())')
560580
[(10, '.(()())...((((())())).)()'), (10, '.(()())...((((()()))).)()'), (10, '.(()())...(((()(()))).)()'), (10, '.(()())...(((()(()))).())'), (10, '.(()())...((((())())).())'), (10, '.(()())...((((()()))).())'), (9, '((.).)(...(.((()()))).)()'), (9, '((.).)(...(((.)(()))).)()'), (9, '((.).)(...(.(()(()))).)()'), (9, '((.).)(...((.(()()))).)()')]
------


@ -0,0 +1,42 @@
HW11 -- OPTIONAL (for your practice only -- solutions will be released on Tuesday)
Edit Distance (see updated final review solutions)
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/submit hw11 edit.py
Implement two functions:
* distance1(s, t): Viterbi-style (either top-down or bottom-up)
* distance2(s, t): Dijkstra-style (best-first)
For Dijkstra, you can use either heapdict or heapq (see review problem 7).
Given that this graph is extremely sparse (why?), heapq (O(E log E)) might be faster than heapdict (O(E log V))
because the latter incurs hashing overhead.
They should return the same result (just return the edit distance).
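One possible shape for the two functions, shown as a sketch under assumptions: unit costs for insertion, deletion, and substitution, a bottom-up table for distance1, and heapq with lazy deletion for distance2. The graph view (nodes are index pairs (i, j)) is one reading of "Dijkstra-style" here.

```python
import heapq

def distance1(s, t):
    # Bottom-up table: d[i][j] = edit distance between s[:i] and t[:j].
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                          # delete
                          d[i][j - 1] + 1,                          # insert
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))  # match/substitute
    return d[m][n]

def distance2(s, t):
    # Best-first search from (0, 0) to (len(s), len(t)); stale heap
    # entries are skipped when popped (lazy deletion).
    m, n = len(s), len(t)
    heap = [(0, 0, 0)]
    done = set()
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) in done:
            continue
        done.add((i, j))
        if i == m and j == n:
            return d
        if i < m:
            heapq.heappush(heap, (d + 1, i + 1, j))
        if j < n:
            heapq.heappush(heap, (d + 1, i, j + 1))
        if i < m and j < n:
            heapq.heappush(heap, (d + (s[i] != t[j]), i + 1, j + 1))
```

When the two strings are nearly identical, the best-first version tends to explore only a narrow band around the diagonal, which may explain why it can be competitive on long, similar inputs.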
We have 10 testcases (listed below); the first 5 test distance1(),
and the second 5 test distance2() on the same 5 string pairs.
My solutions (on flip2):
Testing Case 1 (open)... 0.001 s, Correct
Testing Case 2 (open)... 0.000 s, Correct
Testing Case 3 (open)... 0.012 s, Correct
Testing Case 4 (open)... 0.155 s, Correct
Testing Case 5 (open)... 0.112 s, Correct
Testing Case 6 (hidden)... 0.000 s, Correct
Testing Case 7 (hidden)... 0.000 s, Correct
Testing Case 8 (hidden)... 0.004 s, Correct
Testing Case 9 (hidden)... 0.009 s, Correct
Testing Case 10 (hidden)... 0.021 s, Correct
Total Time: 0.316 s
distance1("abcdefh", "abbcdfg") == 3
distance1("pretty", "prettier") == 3
distance1("aaaaaaadaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaxaaaaaaaaaaaaaaaaaaaaaa") == 5
distance1('cpuyedzrwcbritzclzhwwabmlyresvewkdxwkamyzbxtwiqzvokqpkecyywrbvhlqgxzutdjfmvlhsezfbhfjbllmfhzlqlcwibubyyjupbwhztskyksfthkptxqlmhivfjbgclwsombvytdztapwpzmdqfwwrhqsgztobeuiatcwmrzfbwhfnpzzasomrhotoqiwvexlgxsnafiagfewmopdzwanxswfsmbxsmsczbwsgnwy', 'cpuyedzrwcbritzclzhwwabmlyresvewkdxwkamyzbtwiqzvokqpkecyywrbvhlqgxzutdjfmvlhsezfbhfjbllmfhzlqlcwibubyyjupbwhztskyksfthkptxqlmhivfbgclwsombvytdztapwpzmdqfwwrhqsgztobeuiatcwmrzfbwhfnpzzasonrhotoqiwvexlgxsnafiagfewmopdzwanxswfsmbxsmsczbwsgnwy') == 3
distance1('cpuyedzrwcbritzclzhwwabmlyresvewkdxwkamyzbtwiqzvokqpasdfkecyywrbvhlqgxzutdjfmvlhsezfbhbllmfhzlqlcwibubyyjupbwhztsxyksfthkptxqlmhivfjbgclhombvytdztapwpzmdqfwwrhqsgztobeuiatcwmrzfbwhfnpzzasomrttoqiwvexlgxsnafiagfewmopdzwanxswfsmbxsmsczbwsgnwydmbihjkvziitusmkjljrsbafytsinql', 'cpuyedzrwcbritzclzhwwabmlyresvewkdxwkamyzbtwiqzvokqpkecyywrbvhlqgxzutdjfmvlhsezfbhfjbllmfhzlqlcwibubyyjupbwhztskyksfthkptxqlmhivfjbgclwsombvytdztapwpzmdqfwwrhqsgztobeuiatcwmrzfbwhfnpzzasomrhotoqiwvexlgxsnafiagfewmopdzwanxswfsmbxsmsczbwsgnwydmbihjkvziitusmkjljrsbafytsinql') == 11
distance2("abcdefh", "abbcdfg") == 3
distance2("pretty", "prettier") == 3
distance2("aaaaaaadaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaxaaaaaaaaaaaaaaaaaaaaaa") == 5
distance2('cpuyedzrwcbritzclzhwwabmlyresvewkdxwkamyzbxtwiqzvokqpkecyywrbvhlqgxzutdjfmvlhsezfbhfjbllmfhzlqlcwibubyyjupbwhztskyksfthkptxqlmhivfjbgclwsombvytdztapwpzmdqfwwrhqsgztobeuiatcwmrzfbwhfnpzzasomrhotoqiwvexlgxsnafiagfewmopdzwanxswfsmbxsmsczbwsgnwy', 'cpuyedzrwcbritzclzhwwabmlyresvewkdxwkamyzbtwiqzvokqpkecyywrbvhlqgxzutdjfmvlhsezfbhfjbllmfhzlqlcwibubyyjupbwhztskyksfthkptxqlmhivfbgclwsombvytdztapwpzmdqfwwrhqsgztobeuiatcwmrzfbwhfnpzzasonrhotoqiwvexlgxsnafiagfewmopdzwanxswfsmbxsmsczbwsgnwy') == 3
distance2('cpuyedzrwcbritzclzhwwabmlyresvewkdxwkamyzbtwiqzvokqpasdfkecyywrbvhlqgxzutdjfmvlhsezfbhbllmfhzlqlcwibubyyjupbwhztsxyksfthkptxqlmhivfjbgclhombvytdztapwpzmdqfwwrhqsgztobeuiatcwmrzfbwhfnpzzasomrttoqiwvexlgxsnafiagfewmopdzwanxswfsmbxsmsczbwsgnwydmbihjkvziitusmkjljrsbafytsinql', 'cpuyedzrwcbritzclzhwwabmlyresvewkdxwkamyzbtwiqzvokqpkecyywrbvhlqgxzutdjfmvlhsezfbhfjbllmfhzlqlcwibubyyjupbwhztskyksfthkptxqlmhivfjbgclwsombvytdztapwpzmdqfwwrhqsgztobeuiatcwmrzfbwhfnpzzasomrhotoqiwvexlgxsnafiagfewmopdzwanxswfsmbxsmsczbwsgnwydmbihjkvziitusmkjljrsbafytsinql') == 11


@ -0,0 +1,80 @@
CS 325-001, Analysis of Algorithms, Fall 2019
HW2 - Divide-n-conquer: mergesort, number of inversions, longest path
Due Monday Oct 7, 11:59pm (same submission instructions as HW1).
No late submission will be accepted.
Need to submit: report.txt, msort.py, inversions.py, and longest.py.
longest.py will be graded for correctness (1%).
To submit:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/submit hw2 report.txt {msort,inversions,longest}.py
(You can submit each file separately, or submit them together.)
To see your best results so far:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/query hw2
Textbooks for References:
[1] CLRS Ch. 2
0. Which of the following sorting algorithms are (or can be made) stable?
(a) mergesort
(b) quicksort with the first element as pivot
(c) quicksort with randomized pivot
(d) selection sort
(e) insertion sort
(f) heap sort --- not covered yet (see CLRS Ch. 6)
1. Implement mergesort.
>>> mergesort([4, 2, 5, 1, 6, 3])
[1, 2, 3, 4, 5, 6]
Filename: msort.py
2. Calculate the number of inversions in a list.
>>> num_inversions([4, 1, 3, 2])
4
>>> num_inversions([2, 4, 1, 3])
3
Filename: inversions.py
Must run in O(nlogn) time.
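A sketch of how counting inversions can piggyback on mergesort. The shared helper _sort_count is a hypothetical name; in the actual submission the two functions live in separate files.

```python
def _sort_count(a):
    # Returns (sorted copy of a, number of inversions in a).
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, count_left = _sort_count(a[:mid])
    right, count_right = _sort_count(a[mid:])
    merged, count = [], count_left + count_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps the merge stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            count += len(left) - i   # right[j] jumps over the rest of left
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count

def mergesort(a):
    return _sort_count(a)[0]

def num_inversions(a):
    return _sort_count(a)[1]
```

Each cross-half inversion is counted exactly once, at the moment an element of the right half is merged ahead of the remaining left elements.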
3. [WILL BE GRADED]
Length of the longest path in a binary tree (number of edges).
We will use the "buggy qsort" representation of binary trees from HW1:
[left_subtree, root, right_subtree]
>>> longest([[], 1, []])
0
>>> longest([[[], 1, []], 2, [[], 3, []]])
2
>>> longest([[[[], 1, []], 2, [[], 3, []]], 4, [[[], 5, []], 6, [[], 7, [[], 9, []]]]])
5
Note the answer is 5 because the longest path is 1-2-4-6-7-9.
Filename: longest.py
Must run in O(n) time.
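A sketch of a single-pass solution, assuming the convention that an empty tree has height -1 (so a leaf has height 0). The helper name walk is illustrative.

```python
def longest(tree):
    def walk(t):
        # Returns (height in edges, longest path in edges) for subtree t.
        if t == []:
            return -1, 0  # assumption: empty tree has height -1
        h_left, best_left = walk(t[0])
        h_right, best_right = walk(t[2])
        through_root = h_left + h_right + 2  # path bending at this node
        return max(h_left, h_right) + 1, max(best_left, best_right, through_root)
    return walk(tree)[1]
```

Returning the height together with the best path avoids recomputing heights at every node, which is what keeps the running time linear.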
Debriefing (required!): --------------------------
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
Note you are encouraged to discuss with your classmates,
but each student should submit their own code.
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Any other comments?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.


@ -0,0 +1,83 @@
CS 325, Algorithms, Fall 2019
HW3 - K closest numbers; Two Pointers
Due Monday Oct 14, 11:59pm. (same submission instructions as HW1-2).
No late submission will be accepted.
Need to submit: report.txt, closest_unsorted.py, closest_sorted.py, xyz.py.
closest_sorted.py will be graded for correctness (1%).
To submit:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/submit hw3 report.txt {closest*,xyz}.py
(You can submit each file separately, or submit them together.)
To see your best results so far:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/query hw3
1. Given an array A of n numbers, a query x, and a number k,
find the k numbers in A that are closest (in value) to x.
For example:
find([4,1,3,2,7,4], 5.2, 2) returns [4,4]
find([4,1,3,2,7,4], 6.5, 3) returns [4,7,4]
find([5,3,4,1,6,3], 3.5, 2) returns [3,4]
Filename: closest_unsorted.py
Must run in O(n) time.
The elements in the returned list must be in the original order.
In case two numbers are equally close to x, choose the earlier one.
2. [WILL BE GRADED]
Now what if the input array is sorted? Can you do it faster?
find([1,2,3,4,4,7], 5.2, 2) returns [4,4]
find([1,2,3,4,4,7], 6.5, 3) returns [4,4,7]
Filename: closest_sorted.py
Must run in O(logn + k) time.
The elements in the returned list must be in the original order.
Note: in case two numbers are equally close to x, choose the smaller one:
find([1,2,3,4,4,6,6], 5, 3) returns [4,4,6]
find([1,2,3,4,4,5,6], 4, 5) returns [2,3,4,4,5]
Hint: you can use Python's bisect.bisect for binary search.
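A sketch of the binary-search-plus-two-pointers idea. It uses bisect.bisect_left rather than bisect.bisect (a choice made here, not stated in the hint), and grows a window outward from the insertion point, preferring the left side on ties.

```python
import bisect

def find(a, x, k):
    right = bisect.bisect_left(a, x)  # first index with a[right] >= x
    left = right - 1                  # last index with a[left] < x
    for _ in range(min(k, len(a))):
        # Take from the left when the right is exhausted, or when the left
        # candidate is at least as close (ties go to the smaller number).
        if right >= len(a) or (left >= 0 and x - a[left] <= a[right] - x):
            left -= 1
        else:
            right += 1
    return a[left + 1:right]
```

The window a[left+1:right] is always contiguous, so the result comes out in the original (sorted) order for free.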
3. For a given array A of n *distinct* numbers, find all triples (x,y,z)
s.t. x + y = z. (x, y, z are distinct numbers)
e.g.,
find([1, 4, 2, 3, 5]) returns [(1,3,4), (1,2,3), (1,4,5), (2,3,5)]
Note that:
1) no duplicates in the input array
2) you can choose any arbitrary order for triples in the returned list.
Filename: xyz.py
Must run in O(n^2) time.
Hint: you can use any built-in sort in Python.
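One way to reach O(n^2), sketched with two pointers: sort once, then for each candidate z scan inward from both ends, skipping z's own position. The order of the returned triples differs from the example, which the note above permits.

```python
def find(a):
    a = sorted(a)
    triples = []
    for zi, z in enumerate(a):
        lo, hi = 0, len(a) - 1
        while lo < hi:
            if lo == zi:            # z itself cannot serve as x
                lo += 1
            elif hi == zi:          # nor as y
                hi -= 1
            elif a[lo] + a[hi] < z:
                lo += 1
            elif a[lo] + a[hi] > z:
                hi -= 1
            else:
                triples.append((a[lo], a[hi], z))
                lo += 1
                hi -= 1
    return triples
```

Each z costs one linear scan, so the total is O(n^2) after the O(n log n) sort.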
Debriefing (required!): --------------------------
0. What's your name?
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
Note you are encouraged to discuss with your classmates,
but each student should submit their own code.
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Which part(s) of the course do you like the most so far?
6. Which part(s) of the course do you dislike the most so far?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.


@ -0,0 +1,114 @@
CS 325-001, Algorithms, Fall 2019
HW4 - Priority Queue and Heaps
Due via the submit program on Monday Oct 21, 11:59pm.
No late submission will be accepted.
Need to submit: report.txt, nbest.py, kmergesort.py, datastream.py.
datastream.py will be graded for correctness (1%).
To submit:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/submit hw4 report.txt {nbest,kmergesort,datastream}.py
(You can submit each file separately, or submit them together.)
To see your best results so far:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/query hw4
Textbooks for References:
[1] CLRS Ch. 6
[2] KT slides for binary heaps (only read the first 20 pages!):
https://www.cs.princeton.edu/~wayne/kleinberg-tardos/pdf/BinomialHeaps.pdf
[3] Python heapq module
0. There are two methods for building a heap from an unsorted array:
(1) insert each element into the heap --- O(nlogn) -- heapq.heappush()
(2) heapify (top-down) --- O(n) -- heapq.heapify()
(a) Derive these time complexities.
(b) Use a long list of random numbers to show the difference in time. (Hint: random.shuffle or random.sample)
(c) What about sorted or reversely-sorted numbers?
1. Given two lists A and B, each with n integers, return
a sorted list C that contains the smallest n elements from AxB:
AxB = { (x, y) | x in A, y in B }
i.e., AxB is the Cartesian Product of A and B.
ordering: (x,y) < (x',y') iff. x+y < x'+y' or (x+y==x'+y' and y<y')
You need to implement three algorithms and compare:
(a) enumerate all n^2 pairs, sort, and take top n.
(b) enumerate all n^2 pairs, but use qselect from hw1.
(c) Dijkstra-style best-first, only enumerate O(n) (at most 2n) pairs.
Hint: you can use Python's heapq module for priority queue.
Q: What are the time complexities of these algorithms?
>>> a, b = [4, 1, 5, 3], [2, 6, 3, 4]
>>> nbesta(a, b) # algorithm (a), slowest
[(1, 2), (1, 3), (3, 2), (1, 4)]
>>> nbestb(a, b) # algorithm (b), slow
[(1, 2), (1, 3), (3, 2), (1, 4)]
>>> nbestc(a, b) # algorithm (c), fast
[(1, 2), (1, 3), (3, 2), (1, 4)]
Filename: nbest.py
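A sketch of algorithm (c) only. It assumes both lists have the same length n, sorts them first, and explores the grid of index pairs best-first from (0, 0). The heap key (x+y, y) mirrors the ordering defined above; the seen set is an implementation detail chosen here to avoid pushing a pair twice.

```python
import heapq

def nbestc(a, b):
    a, b = sorted(a), sorted(b)
    n = len(a)
    if n == 0:
        return []
    heap = [(a[0] + b[0], b[0], 0, 0)]  # (sum, y, index into a, index into b)
    seen = {(0, 0)}
    result = []
    while heap and len(result) < n:
        _, _, i, j = heapq.heappop(heap)
        result.append((a[i], b[j]))
        # Both successors are no smaller than the popped pair in the ordering.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < n and nj < n and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], b[nj], ni, nj))
    return result
```

Each pop pushes at most two new pairs, which is where the "at most 2n pairs" bound comes from.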
2. k-way mergesort (the classical mergesort is a special case where k=2).
>>> kmergesort([4,1,5,2,6,3,7,0], 3) # k=3
[0, 1, 2, 3, 4, 5, 6, 7]
Q: What is the complexity? Write down the detailed analysis in report.txt.
Filename: kmergesort.py
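A sketch of k-way mergesort that merges the k sorted parts with a binary heap. It assumes k >= 2 and splits into chunks of size ceil(n/k); other splitting schemes would work equally well.

```python
import heapq

def kmergesort(a, k):
    # Assumption: k >= 2, so every chunk is strictly shorter than a.
    if len(a) <= 1:
        return list(a)
    size = -(-len(a) // k)  # ceil(len(a) / k)
    parts = [kmergesort(a[i:i + size], k) for i in range(0, len(a), size)]
    # Heap entries: (value, part index, position inside that part).
    heap = [(part[0], idx, 0) for idx, part in enumerate(parts)]
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(parts[idx]):
            heapq.heappush(heap, (parts[idx][pos + 1], idx, pos + 1))
    return out
```

The heap never holds more than k entries, so each of the n output elements costs O(log k) at every level of the recursion.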
3. [WILL BE GRADED]
Find the k smallest numbers in a data stream of length n (k<<n),
using only O(k) space (the stream itself might be too big to fit in memory).
>>> ksmallest(4, [10, 2, 9, 3, 7, 8, 11, 5, 7])
[2, 3, 5, 7]
>>> ksmallest(3, range(1000000, 0, -1))
[1, 2, 3]
Note:
a) it should work with both lists and lazy lists
b) the output list should be sorted
Q: What is your complexity? Write down the detailed analysis in report.txt.
Filename: datastream.py
[UPDATE] The built-in function heapq.nsmallest() is _not_ allowed for this problem.
The whole point is to implement it yourself. :)
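A sketch that keeps a bounded max-heap of the k smallest values seen so far. Since heapq only provides a min-heap, values are negated (which assumes numeric input). heapq.nsmallest() is not used.

```python
import heapq

def ksmallest(k, stream):
    heap = []  # negated values: -heap[0] is the largest of the kept k
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif heap and x < -heap[0]:
            heapq.heapreplace(heap, -x)  # evict the current largest
    return sorted(-v for v in heap)
```

The stream is consumed one element at a time, so the same code works for lists, ranges, and generators.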
4. (optional) Summarize the time complexities of the basic operations (push, pop-min, peek, heapify) for these implementations of priority queue:
(a) unsorted array
(b) sorted array (highest priority first)
(c) reverse-sorted array (lowest priority first)
(d) linked list
(e) binary heap
Debriefing (required!): --------------------------
0. What's your name?
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
Note you are encouraged to discuss with your classmates,
but each student should submit their own code.
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Which part(s) of the course do you like the most so far?
6. Which part(s) of the course do you dislike the most so far?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.


@ -0,0 +1,130 @@
CS 325-001, Algorithms, Fall 2019
HW5 - DP (part 1: simple)
HWs 5-7 are all on DPs.
Due Monday Oct 28, 11:59pm.
No late submission will be accepted.
Need to submit report.txt, mis.py, bsts.py, bitstrings.py.
mis.py will be graded for correctness (1%).
To submit:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/submit hw5 report.txt {mis,bsts,bitstrings}.py
(You can submit each file separately, or submit them together.)
To see your best results so far:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/query hw5
Textbooks for References:
[1] CLRS Ch. 15
[2] KT Ch. 6
or Ch. 5 in a previous version:
http://cs.furman.edu/~chealy/cs361/kleinbergbook.pdf
Hint: Among the three coding questions, p3 is the easiest, and p1 is similar to p3.
You'll realize that both are very similar to p0 (Fibonacci).
p2 is slightly different from these, but still very easy.
0. (Optional) Is Fibonacci REALLY O(n)?
Hint: the value of f(n) itself grows exponentially.
1. [WILL BE GRADED]
Maximum Weighted Independent Set
[HINT] independent set is a set where no two numbers are neighbors in the original list.
see also https://en.wikipedia.org/wiki/Independent_set_(graph_theory)
input: a list of numbers (could be negative)
output: a pair of the max sum and the list of numbers chosen
>>> max_wis([7,8,5])
(12, [7, 5])
>>> max_wis([-1,8,10])
(10, [10])
>>> max_wis([])
(0, [])
[HINT] if all numbers are negative, the optimal solution is 0,
since [] is an independent set according to the definition above.
>>> max_wis([-5, -1, -4])
(0, [])
Q: What's the complexity?
Include both top-down (max_wis()) and bottom-up (max_wis2()) solutions,
and make sure they produce exactly the same results.
We'll only grade the top-down version.
Tie-breaking: any best solution is considered correct.
Filename: mis.py
[HINT] you can also use the naive O(2^n) exhaustive search method to verify your answer.
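A sketch of the top-down version only, with the chosen numbers recovered by backtracking through the memoized values. The inner name best is illustrative; the recursion depth grows with the list length, so very long inputs would need a higher recursion limit or the bottom-up variant.

```python
from functools import lru_cache

def max_wis(a):
    @lru_cache(maxsize=None)
    def best(i):
        # Best sum using only the first i numbers, a[:i].
        if i <= 0:
            return 0
        return max(best(i - 1),              # skip a[i-1]
                   best(i - 2) + a[i - 1])   # take a[i-1], skip its neighbor
    chosen = []
    i = len(a)
    while i > 0:
        if best(i) == best(i - 1):   # a[i-1] not needed
            i -= 1
        else:                        # a[i-1] was taken
            chosen.append(a[i - 1])
            i -= 2
    return best(len(a)), chosen[::-1]
```

The recurrence has the same two-term shape as Fibonacci, which is the similarity the hint above points to.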
2. Number of n-node BSTs
input: n
output: number of n-node BSTs
>>> bsts(2)
2
>>> bsts(3)
5
>>> bsts(5)
42
[HINT] There are two 2-node BSTs:
      2        1
     /          \
    1            2
Note that all other 2-node BSTs are *isomorphic* to either one.
Qa: What's the complexity of this DP?
Qb: What's the name of this famous number series?
Feel free to use any implementation style.
Filename: bsts.py
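A bottom-up sketch: pick the root, then multiply the number of possible left subtrees by the number of possible right subtrees. The table name count is illustrative.

```python
def bsts(n):
    count = [1] + [0] * n  # count[0] = 1: the empty tree
    for size in range(1, n + 1):
        # left subtree gets `left` nodes, root takes one, right gets the rest
        count[size] = sum(count[left] * count[size - 1 - left]
                          for left in range(size))
    return count[n]
```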
3. Number of bit strings of length n that have
1) no two consecutive 0s.
2) two consecutive 0s.
>>> num_no(3)
5
>>> num_yes(3)
3
[HINT] There are three 3-bit 0/1-strings that have two consecutive 0s.
001 100 000
The other five 3-bit 0/1-strings have no two consecutive 0s:
010 011 101 110 111
Feel free to choose any implementation style.
Filename: bitstrings.py
[HINT] Like problem 1, you can also use the O(2^n) exhaustive search method to verify your answer.
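A sketch based on the observation that a valid string either ends in 1 (preceded by any valid string of length n-1) or ends in 10 (preceded by any valid string of length n-2). num_yes is obtained here by complement, which is one option among several.

```python
def num_no(n):
    a, b = 1, 2  # counts for lengths 0 and 1
    for _ in range(n):
        a, b = b, a + b
    return a

def num_yes(n):
    # Every length-n bit string either has two consecutive 0s or it does not.
    return 2 ** n - num_no(n)
```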
Debriefing (required!): --------------------------
0. What's your name?
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Which part(s) of the course do you like the most so far?
6. Which part(s) of the course do you dislike the most so far?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.


@ -0,0 +1,114 @@
CS 325-001, Algorithms, Fall 2019
HW6 - DP (part 2)
Due on Monday Nov 4, 11:59pm.
No late submission will be accepted.
Need to submit: report.txt, knapsack_unbounded.py, knapsack_bounded.py.
knapsack_bounded.py will be graded for correctness (1%).
To submit:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/submit hw6 report.txt knapsack*.py
(You can submit each file separately, or submit them together.)
To see your best results so far:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/query hw6
Textbooks for References:
[1] KT Ch. 6.4
or Ch. 5.3 in a previous version:
http://cs.furman.edu/~chealy/cs361/kleinbergbook.pdf
[2] KT slides for DP (pages 1-37):
https://www.cs.princeton.edu/~wayne/kleinberg-tardos/pdf/06DynamicProgrammingI.pdf
[3] Wikipedia: Knapsack (unbounded and 0/1)
[4] CLRS Ch. 15
Please answer time/space complexities for each problem in report.txt.
0. For each of the coding problems below:
(a) Describe a greedy solution.
(b) Show a counterexample to the greedy solution.
(c) Define the DP subproblem
(d) Write the recurrence relations
(e) Do not forget base cases
(f) Analyze the space and time complexities
1. Unbounded Knapsack
You have n items, each with weight w_i and value v_i, and each has infinite copies.
**All numbers are positive integers.**
What's the best value for a bag of W?
>>> best(3, [(2, 4), (3, 5)])
(5, [0, 1])
the input to the best() function is W and a list of pairs (w_i, v_i).
this output means to take 0 copies of item 1 and 1 copy of item 2.
tie-breaking: *reverse* lexicographical: i.e., [1, 0] is better than [0, 1]:
(i.e., take as many copies from the first item as possible, etc.)
>>> best(3, [(1, 5), (1, 5)])
(15, [3, 0])
>>> best(3, [(1, 2), (1, 5)])
(15, [0, 3])
>>> best(3, [(1, 2), (2, 5)])
(7, [1, 1])
>>> best(58, [(5, 9), (9, 18), (6, 12)])
(114, [2, 4, 2])
>>> best(92, [(8, 9), (9, 10), (10, 12), (5, 6)])
(109, [1, 1, 7, 1])
Q: What are the time and space complexities?
filename: knapsack_unbounded.py
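A simple sketch that handles the tie-breaking by storing, for every capacity, a (value, counts) tuple and taking the tuple maximum: Python compares the value first and then the counts lexicographically, so more copies of earlier items win ties. Carrying the counts costs an extra factor of n in time and space; back-pointers would be leaner. The table name opt is illustrative.

```python
def best(W, items):
    n = len(items)
    # opt[w] = best (value, counts) using total weight at most w
    opt = [(0, (0,) * n)] * (W + 1)
    for w in range(1, W + 1):
        best_here = (0, (0,) * n)
        for i, (wi, vi) in enumerate(items):
            if wi <= w:
                value, counts = opt[w - wi]
                cand = (value + vi,
                        counts[:i] + (counts[i] + 1,) + counts[i + 1:])
                best_here = max(best_here, cand)
        opt[w] = best_here
    value, counts = opt[W]
    return value, list(counts)
```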
2. [WILL BE GRADED]
Bounded Knapsack
You have n items, each with weight w_i and value v_i, and each has c_i copies.
**All numbers are positive integers.**
What's the best value for a bag of W?
>>> best(3, [(2, 4, 2), (3, 5, 3)])
(5, [0, 1])
the input to the best() function is W and a list of triples (w_i, v_i, c_i).
tie-breaking: same as in p1:
>>> best(3, [(1, 5, 2), (1, 5, 3)])
(15, [2, 1])
>>> best(3, [(1, 5, 1), (1, 5, 3)])
(15, [1, 2])
>>> best(20, [(1, 10, 6), (3, 15, 4), (2, 10, 3)])
(130, [6, 4, 1])
>>> best(92, [(1, 6, 6), (6, 15, 7), (8, 9, 8), (2, 4, 7), (2, 20, 2)])
(236, [6, 7, 3, 7, 2])
Q: What are the time and space complexities?
filename: knapsack_bounded.py
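A sketch that processes the items from last to first, so the count of the first item becomes the leading component of the stored tuple and therefore dominates the tie-breaking. As with any tuple-carrying approach, this is simple rather than fast. The names nxt and cur are illustrative.

```python
def best(W, items):
    # nxt[w] = best (value, counts) for the items after the current one,
    # using total weight at most w.
    nxt = [(0, ())] * (W + 1)
    for wi, vi, ci in reversed(items):
        cur = []
        for w in range(W + 1):
            cur.append(max(
                (c * vi + nxt[w - c * wi][0], (c,) + nxt[w - c * wi][1])
                for c in range(min(ci, w // wi) + 1)
            ))
        nxt = cur
    value, counts = nxt[W]
    return value, list(counts)
```

Taking the maximum over tuples prefers a higher value first, then a larger count for the current item, then the best tie-broken choice for the remaining items.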
You are encouraged to come up with a few other testcases yourself to test your code!
Debriefing (required!): --------------------------
0. What's your name?
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Which part(s) of the course do you like the most so far?
6. Which part(s) of the course do you dislike the most so far?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.


@ -0,0 +1,147 @@
CS 325-001, Algorithms, Fall 2019
HW8 - Graphs (part I); DP (part III)
Due on Monday November 18, 11:59pm.
No late submission will be accepted.
Include in your submission: report.txt, topol.py, viterbi.py.
viterbi.py will be graded for correctness (1%).
To submit:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/submit hw8 report.txt {topol,viterbi}.py
(You can submit each file separately, or submit them together.)
To see your best results so far:
flip $ /nfs/farm/classes/eecs/fall2019/cs325-001/query hw8
Textbooks for References:
[1] CLRS Ch. 22 (Elementary Graph Algorithms)
[2] KT Ch. 3 (graphs), or Ch. 2 in this earlier version:
http://cs.furman.edu/~chealy/cs361/kleinbergbook.pdf
[3] KT slides (highly recommend!):
https://www.cs.princeton.edu/~wayne/kleinberg-tardos/pdf/03Graphs.pdf
[4] Jeff Erickson: Ch. 5 (Basic Graph Algorithms):
http://jeffe.cs.illinois.edu/teaching/algorithms/book/05-graphs.pdf
[5] DPV Ch. 3, 4.2, 4.4, 4.7 (Dasgupta, Papadimitriou, Vazirani)
https://www.cs.berkeley.edu/~vazirani/algorithms/chap3.pdf (decomposition of graphs)
https://www.cs.berkeley.edu/~vazirani/algorithms/chap4.pdf (paths, shortest paths)
[6] my advanced DP tutorial (up to page 16):
http://web.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdf
Please answer non-coding questions in report.txt.
0. For the following graphs, decide whether they are
(1) directed or undirected, (2) dense or sparse, and (3) cyclic or acyclic:
(a) Facebook
(b) Twitter
(c) a family
(d) V=airports, E=direct_flights
(e) a mesh
(f) V=courses, E=prerequisites
(g) a tree
(h) V=linux_software_packages, E=dependencies
(i) DP subproblems for 0-1 knapsack
Can you name a very big dense graph?
1. Topological Sort
For a given directed graph, output a topological order if it exists.
Tie-breaking: ARBITRARY tie-breaking. This will make the code
and time complexity analysis a lot easier.
e.g., for the following example:
0 --> 2 --> 3 --> 5 --> 6
     / \    |    / \
    /   \   v   /   \
   1     >  4        > 7
>>> order(8, [(0,2), (1,2), (2,3), (2,4), (3,4), (3,5), (4,5), (5,6), (5,7)])
[0, 1, 2, 3, 4, 5, 6, 7]
Note that order() takes two arguments, n and list_of_edges,
where n specifies that the nodes are named 0..(n-1).
If we flip the (3,4) edge:
>>> order(8, [(0,2), (1,2), (2,3), (2,4), (4,3), (3,5), (4,5), (5,6), (5,7)])
[0, 1, 2, 4, 3, 5, 6, 7]
If there is a cycle, return None
>>> order(4, [(0,1), (1,2), (2,1), (2,3)])
None
Other cases:
>>> order(5, [(0,1), (1,2), (2,3), (3,4)])
[0, 1, 2, 3, 4]
>>> order(5, [])
[0, 1, 2, 3, 4] # could be any order
>>> order(3, [(1,2), (2,1)])
None
>>> order(1, [(0,0)]) # self-loop
None
Tie-breaking: arbitrary (any valid topological order is fine).
filename: topol.py
questions:
(a) did you realize that bottom-up implementations of DP use (implicit) topological orderings?
e.g., what is the topological ordering in your (or my) bottom-up bounded knapsack code?
(b) what about top-down implementations? what order do they use to traverse the graph?
(c) does that suggest there is a top-down solution for topological sort as well?
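A sketch of the bottom-up (in-degree based) approach, often called Kahn's algorithm; it is one of several valid ways to solve problem 1. The names succ, indegree, and queue are illustrative.

```python
from collections import deque

def order(n, edges):
    succ = [[] for _ in range(n)]
    indegree = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1
    queue = deque(u for u in range(n) if indegree[u] == 0)
    result = []
    while queue:
        u = queue.popleft()
        result.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:  # all predecessors of v are already output
                queue.append(v)
    # Nodes on a cycle never reach in-degree 0, so they are never output.
    return result if len(result) == n else None
```

Every node and edge is handled once, giving O(V + E) time; any container could replace the queue, since tie-breaking is arbitrary.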
2. [WILL BE GRADED]
Viterbi Algorithm For Longest Path in DAG (see DPV 4.7, [2], CLRS problem 15-1)
Recall that the Viterbi algorithm has just two steps:
a) get a topological order (use problem 1 above)
b) follow that order, and do either forward or backward updates
This algorithm captures all DP problems on DAGs, for example,
longest path, shortest path, number of paths, etc.
In this problem, given a DAG (guaranteed acyclic!), output a pair (l, p)
where l is the length of the longest path (number of edges), and p is the path. (you can think of each edge being unit cost)
e.g., for the above example:
>>> longest(8, [(0,2), (1,2), (2,3), (2,4), (3,4), (3,5), (4,5), (5,6), (5,7)])
(5, [0, 2, 3, 4, 5, 6])
>>> longest(8, [(0,2), (1,2), (2,3), (2,4), (4,3), (3,5), (4,5), (5,6), (5,7)])
(5, [0, 2, 4, 3, 5, 6])
>>> longest(8, [(0,1), (0,2), (1,2), (2,3), (2,4), (4,3), (3,5), (4,5), (5,6), (5,7), (6,7)])
(7, [0, 1, 2, 4, 3, 5, 6, 7]) # unique answer
Note that longest() takes two arguments, n and list_of_edges,
where n specifies that the nodes are named 0..(n-1).
Tie-breaking: arbitrary. any longest path is fine.
Filename: viterbi.py
Note: you can use this program to solve MIS, knapsacks, coins, etc.
Debriefing (required!): --------------------------
0. What's your name?
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Any other comments?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.

@ -0,0 +1,166 @@
CS 325, Algorithms, Fall 2019
HW9 - Graphs (part 2), DP (part 4)
Due Monday Nov 25, 11:59pm.
No late submission will be accepted.
Include in your submission: report.txt, dijkstra.py, nbest.py.
dijkstra.py will be graded for correctness (1%).
Textbooks for References:
[1] CLRS Ch. 22 (graph)
[2] my DP tutorial (up to page 16):
http://web.engr.oregonstate.edu/~huanlian/slides/COLING-tutorial-anim.pdf
[3] DPV Ch. 3, 4.2, 4.4, 4.7, 6 (Dasgupta, Papadimitriou, Vazirani)
https://www.cs.berkeley.edu/~vazirani/algorithms/chap3.pdf
https://www.cs.berkeley.edu/~vazirani/algorithms/chap4.pdf
https://www.cs.berkeley.edu/~vazirani/algorithms/chap6.pdf
[4] KT Ch. 6 (DP)
http://www.aw-bc.com/info/kleinberg/assets/downloads/ch6.pdf
[5] KT slides: Greedy II (Dijkstra)
http://www.cs.princeton.edu/~wayne/kleinberg-tardos/
***Please answer time/space complexities for each problem in report.txt.
1. [WILL BE GRADED]
Dijkstra (see CLRS 24.3 and DPV 4.4)
Given an undirected graph, find the shortest path from source (node 0)
to target (node n-1).
Edge weights are guaranteed to be non-negative, since Dijkstra doesn't work
with negative weights, e.g.
      3
 0 ------ 1
  \      /
 2 \    / -2
     \/
     2
in this example, Dijkstra would return length 2 (path 0-2),
but path 0-1-2 is better (length 1).
For example (return a pair of shortest-distance and shortest-path):
      1
 0 ------ 1
  \      / \
 5 \    /1  \6
     \/  2   \
     2 ------ 3
>>> shortest(4, [(0,1,1), (0,2,5), (1,2,1), (2,3,2), (1,3,6)])
(4, [0,1,2,3])
If the target node (n-1) is unreachable from the source (0),
return None:
>>> shortest(5, [(0,1,1), (0,2,5), (1,2,1), (2,3,2), (1,3,6)])
None
Another example:
   1         1
0-----1   2-----3
>>> shortest(4, [(0,1,1), (2,3,1)])
None
Tiebreaking: arbitrary. Any shortest path would do.
Filename: dijkstra.py
Hint: please use heapdict from here:
https://raw.githubusercontent.com/DanielStutzbach/heapdict/master/heapdict.py
>>> from heapdict import heapdict
>>> h = heapdict()
>>> h['a'] = 3
>>> h['b'] = 1
>>> h.peekitem()
('b', 1)
>>> h['a'] = 0
>>> h.peekitem()
('a', 0)
>>> h.popitem()
('a', 0)
>>> len(h)
1
>>> 'a' in h
False
>>> 'b' in h
True
You don't need to submit heapdict.py; we have it in our grader.
2. [Redo the nbest question from Midterm, preparing for HW10 part 3]
Given k pairs of lists A_i and B_i (0 <= i < k), each with n sorted numbers,
find the n smallest pairs in all the (k n^2) pairs.
We say (x,y) < (x', y') if and only if x+y < x'+y'.
Tie-breaking: lexicographical (i.e., prefer smaller x).
You can base your code on the skeleton from the Midterm:
from heapq import heappush, heappop
def nbest(ABs): # no need to pass in k or n
    k = len(ABs)
    n = len(ABs[0][0])
    def trypush(i, p, q): # push pair (A_i,p, B_i,q) if possible
        A, B = ABs[i] # A_i, B_i
        if p < n and q < n and ______________________________:
            heappush(h, (________________, i, p, q, (A[p],B[q])))
            used.add((i, p, q))
    h, used = ___________________ # initialize
    for i in range(k): # NEED TO OPTIMIZE
        trypush(______________)
    for _ in range(n):
        _, i, p, q, pair = ________________
        yield pair # return the next pair (in a lazy list)
        _______________________
        _______________________
But recall we had two optimizations to speed up the first for-loop (queue initialization):
(1) using heapify instead of k initial pushes. You need to implement this (very easy).
(2) using qselect to choose top n out of the k bests. This one is OPTIONAL.
Analyze the time complexity for the version you implemented.
>>> list(nbest([([1,2,4], [2,3,5]), ([0,2,4], [3,4,5])]))
[(0, 3), (1, 2), (0, 4)]
>>> list(nbest([([-1,2],[1,4]), ([0,2],[3,4]), ([0,1],[4,6]), ([-1,2],[1,5])]))
[(-1, 1), (-1, 1)]
>>> list(nbest([([5,6,10,14],[3,5,10,14]),([2,7,9,11],[3,8,12,16]),([1,3,8,10],[5,9,10,11]),([1,2,3,5],[3,4,9,10]),([4,5,9,10],[2,4,6,11]),([4,6,10,13],[2,3,5,9]),([3,7,10,12],[1,2,5,10]),([5,9,14,15],[4,8,13,14])]))
[(1, 3), (3, 1), (1, 4), (2, 3)]
>>> list(nbest([([1,6,8,13],[5,8,11,12]),([1,2,3,5],[5,9,11,13]),([3,5,7,10],[4,6,7,11]),([1,4,7,8],[4,9,11,15]),([4,8,10,13],[4,6,10,11]),([4,8,12,15],[5,10,11,13]),([2,3,4,8],[4,7,11,15]),([4,5,10,15],[5,6,7,8])]))
[(1, 4), (1, 5), (1, 5), (2, 4)]
This problem prepares you for the hardest question in HW10 (part 3).
Filename: nbest.py
Debriefing (required!): --------------------------
0. What's your name?
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone, or mostly with other people?
4. How deeply do you feel you understand the material it covers (0%-100%)?
5. Any other comments?
This section is intended to help us calibrate the homework assignments.
Your answers to this section will *not* affect your grade; however, skipping it
will certainly do.

@ -0,0 +1,19 @@
qselect(xs,k) =
    ~xs -> {
        pivot <- xs[0]!
        left <- xs[#0 <= pivot]
        right <- xs[#0 > pivot]
    } ->
        if k > |left| + 1 then qselect(right, k - |left| - 1)
        else if k == |left| + 1 then [pivot]
        else qselect(left, k);
_search(xs, k) =
    if xs[1] == k then xs
    else if xs[1] > k then _search(xs[0], k)
    else _search(xs[2], k);
sorted(xs) = sorted(xs[0]) ++ [xs[1]] ++ sorted(xs[2]);
search(xs, k) = |_search(xs, k)| != 0;
insert(xs, k) = _insert(k, _search(xs, k));
_insert(k, xs) = if |xs| == 0 then xs << [] << k << [] else xs

@ -0,0 +1,15 @@
module Common where
import PythonAst
import PythonGen
import Text.Parsec
compile :: (String -> String -> Either ParseError p) -> (p -> [PyStmt]) -> String -> IO ()
compile p t f = do
    let inputName = f ++ ".lang"
    let outputName = f ++ ".py"
    file <- readFile inputName
    let either = p inputName file
    case either of
        Right prog -> writeFile outputName (translate $ t prog)
        Left e -> print e

@ -0,0 +1,432 @@
module LanguageOne where
import qualified PythonAst as Py
import Data.Bifunctor
import Data.Char
import Data.Functor
import qualified Data.Map as Map
import Data.Maybe
import qualified Data.Set as Set
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Combinator
import Control.Monad.State
{- Data Types -}
data PossibleType = List | Any deriving Eq
data SelectorMarker = None | Remove
data Op
= Add
| Subtract
| Multiply
| Divide
| Insert
| Concat
| LessThan
| LessThanEq
| GreaterThan
| GreaterThanEq
| Equal
| NotEqual
| And
| Or
data Selector = Selector String Expr
data Expr
= Var String
| IntLiteral Int
| ListLiteral [Expr]
| Split Expr [Selector] Expr
| IfElse Expr Expr Expr
| BinOp Op Expr Expr
| FunctionCall Expr [Expr]
| LengthOf Expr
| Random
| Access Expr Expr SelectorMarker
| Parameter Int
data Function = Function String [String] Expr
data Prog = Prog [Function]
{- Parser -}
type Parser = Parsec String (Maybe Int)
parseInt :: Parser Int
parseInt = read <$> (many1 digit <* spaces)
parseVar :: Parser String
parseVar =
do
c <- satisfy (\c -> (isLetter c && isLower c) || c == '_')
cs <- many (satisfy isLetter <|> digit)
spaces
let var = c:cs
if var `elem` ["if", "then", "else", "rand"]
then fail "reserved"
else return var
parseKwIf :: Parser ()
parseKwIf = string "if" $> ()
parseKwThen :: Parser ()
parseKwThen = string "then" $> ()
parseKwElse :: Parser ()
parseKwElse = string "else" $> ()
parseKwRand :: Parser Expr
parseKwRand = string "rand" $> Random
parseThis :: Parser Expr
parseThis =
do
char '&'
contextNum <- getState
spaces
return (Var $ "context_" ++ show contextNum)
parseList :: Parser Expr
parseList = ListLiteral <$>
do
char '[' >> spaces
es <- sepBy parseExpr (char ',' >> spaces)
spaces >> char ']' >> spaces
return es
parseSplit :: Parser Expr
parseSplit =
do
char '~' >> spaces
e <- parseExpr
spaces >> string "->"
spaces >> char '{'
contextNum <- getState
putState $ return $ 1 + fromMaybe (-1) contextNum
es <- many1 (spaces >> parseSelector)
putState contextNum
spaces >> char '}' >> spaces >> string "->" >> spaces
e' <- parseExpr
spaces
return $ Split e es e'
parseSelectorMarker :: Parser SelectorMarker
parseSelectorMarker = (char '!' >> return Remove) <|> return None
parseSelector :: Parser Selector
parseSelector =
do
name <- parseVar
spaces >> string "<-" >> spaces
expr <- parseExpr
spaces
return $ Selector name expr
parseIfElse :: Parser Expr
parseIfElse =
do
parseKwIf >> spaces
ec <- parseExpr
spaces >> parseKwThen >> spaces
et <- parseExpr
spaces >> parseKwElse >> spaces
ee <- parseExpr
spaces
return $ IfElse ec et ee
parseLength :: Parser Expr
parseLength =
do
char '|' >> spaces
e <- parseExpr
spaces >> char '|' >> spaces
return $ LengthOf e
parseParameter :: Parser Expr
parseParameter =
do
char '#'
d <- digit
spaces
return $ Parameter $ read [d]
parseParenthesized :: Parser Expr
parseParenthesized =
do
char '(' >> spaces
e <- parseExpr
spaces >> char ')' >> spaces
return e
parseBasicExpr :: Parser Expr
parseBasicExpr = choice
[ IntLiteral <$> parseInt
, parseThis
, parseList
, parseSplit
, parseLength
, parseParameter
, parseParenthesized
, Var <$> try parseVar
, parseKwRand
, parseIfElse
]
parsePostfix :: Parser (Expr -> Expr)
parsePostfix = parsePostfixAccess <|> parsePostfixCall
parsePostfixAccess :: Parser (Expr -> Expr)
parsePostfixAccess =
do
char '[' >> spaces
e <- parseExpr
spaces >> char ']' >> spaces
marker <- parseSelectorMarker
spaces
return $ \e' -> Access e' e marker
parsePostfixCall :: Parser (Expr -> Expr)
parsePostfixCall =
do
char '(' >> spaces
es <- sepBy parseExpr (char ',' >> spaces)
char ')' >> spaces
return $ flip FunctionCall es
parsePostfixedExpr :: Parser Expr
parsePostfixedExpr =
do
eb <- parseBasicExpr
spaces
ps <- many parsePostfix
return $ foldl (flip ($)) eb ps
parseOp :: String -> Op -> Parser Op
parseOp s o = try (string s) >> return o
parseLevel :: Parser Expr -> Parser Op -> Parser Expr
parseLevel pe po =
do
start <- pe
spaces
ops <- many $ try $ do
op <- po
spaces
val <- pe
spaces
return (op, val)
spaces
return $ foldl (\l (o, r) -> BinOp o l r) start ops
parseExpr :: Parser Expr
parseExpr = foldl parseLevel parsePostfixedExpr
[ parseOp "*" Multiply, parseOp "/" Divide
, parseOp "+" Add, parseOp "-" Subtract
, parseOp "<<" Insert
, parseOp "++" Concat
, parseOp "<=" LessThanEq <|> parseOp ">=" GreaterThanEq <|>
parseOp "<" LessThan <|> parseOp ">" GreaterThan <|>
parseOp "==" Equal <|> parseOp "!=" NotEqual
, parseOp "&&" And <|> parseOp "||" Or
]
parseFunction :: Parser Function
parseFunction =
do
name <- parseVar
spaces >> char '(' >> spaces
vs <- sepBy parseVar (char ',' >> spaces)
spaces >> char ')' >> spaces >> char '=' >> spaces
body <- parseExpr
spaces
return $ Function name vs body
parseProg :: Parser Prog
parseProg = Prog <$> sepBy1 parseFunction (char ';' >> spaces)
parse :: SourceName -> String -> Either ParseError Prog
parse = runParser parseProg Nothing
{- "Type" checker -}
mergePossibleType :: PossibleType -> PossibleType -> PossibleType
mergePossibleType List _ = List
mergePossibleType _ List = List
mergePossibleType _ _ = Any
getPossibleType :: String -> Expr -> PossibleType
getPossibleType s (Var s') = if s == s' then List else Any
getPossibleType _ (ListLiteral _) = List
getPossibleType s (Split _ _ e) = getPossibleType s e
getPossibleType s (IfElse i t e) =
foldl1 mergePossibleType $ map (getPossibleType s) [i, t, e]
getPossibleType _ (BinOp Insert _ _) = List
getPossibleType _ (BinOp Concat _ _) = List
getPossibleType _ _ = Any
{- Translator -}
type Translator = Control.Monad.State.State (Map.Map String [String], Int)
currentTemp :: Translator String
currentTemp = do
t <- gets snd
return $ "temp" ++ show t
incrementTemp :: Translator String
incrementTemp = do
modify (second (+1))
currentTemp
hasLambda :: Expr -> Bool
hasLambda (ListLiteral es) = any hasLambda es
hasLambda (Split e ss r) =
hasLambda e || any (\(Selector _ e') -> hasLambda e') ss || hasLambda r
hasLambda (IfElse i t e) = hasLambda i || hasLambda t || hasLambda e
hasLambda (BinOp o l r) = hasLambda l || hasLambda r
hasLambda (FunctionCall e es) = any hasLambda $ e : es
hasLambda (LengthOf e) = hasLambda e
hasLambda (Access e _ _) = hasLambda e
hasLambda Parameter{} = True
hasLambda _ = False
translate :: Prog -> [Py.PyStmt]
translate p = fst $ runState (translateProg p) (Map.empty, 0)
translateProg :: Prog -> Translator [Py.PyStmt]
translateProg (Prog fs) = concat <$> traverse translateFunction fs
translateFunction :: Function -> Translator [Py.PyStmt]
translateFunction (Function n ps ex) = do
let createIf p = Py.BinOp Py.Equal (Py.Var p) (Py.ListLiteral [])
let createReturn p = Py.IfElse (createIf p) [Py.Return (Py.Var p)] [] Nothing
let fastReturn = [createReturn p | p <- take 1 ps, getPossibleType p ex == List]
(ss, e) <- translateExpr ex
return $ return $ Py.FunctionDef n ps $ fastReturn ++ ss ++ [Py.Return e]
translateSelector :: Selector -> Translator Py.PyStmt
translateSelector (Selector n e) =
let
cacheCheck = Py.NotIn (Py.StrLiteral n) (Py.Var "cache")
cacheAccess = Py.Access (Py.Var "cache") [Py.StrLiteral n]
cacheSet = Py.Assign (Py.AccessPat (Py.Var "cache") [Py.StrLiteral n])
body e' = [ Py.IfElse cacheCheck [cacheSet e'] [] Nothing, Py.Return cacheAccess]
in
do
(ss, e') <- translateExpr e
vs <- gets fst
let callPrereq p = Py.Standalone $ Py.FunctionCall (Py.Var p) []
let prereqs = maybe [] (map callPrereq) $ Map.lookup n vs
return $ Py.FunctionDef n [] $ ss ++ prereqs ++ body e'
translateExpr :: Expr -> Translator ([Py.PyStmt], Py.PyExpr)
translateExpr (Var s) = do
vs <- gets fst
let sVar = Py.Var s
let expr = if Map.member s vs then Py.FunctionCall sVar [] else sVar
return ([], expr)
translateExpr (IntLiteral i) = return ([], Py.IntLiteral i)
translateExpr (ListLiteral l) = do
tl <- mapM translateExpr l
return (concatMap fst tl, Py.ListLiteral $ map snd tl)
translateExpr (Split e ss e') = do
vs <- gets fst
let cacheAssign = Py.Assign (Py.VarPat "cache") (Py.DictLiteral [])
let cacheStmt = [ cacheAssign | Map.size vs == 0 ]
let vnames = map (\(Selector n es) -> n) ss
let prereqs = snd $ foldl (\(ds, m) (Selector n es) -> (n:ds, Map.insert n ds m)) ([], Map.empty) ss
modify $ first $ Map.union prereqs
fs <- mapM translateSelector ss
(sts, te) <- translateExpr e'
modify $ first $ const vs
return (cacheStmt ++ fs ++ sts, te)
translateExpr (IfElse i t e) = do
temp <- incrementTemp
let tempPat = Py.VarPat temp
(ists, ie) <- translateExpr i
(tsts, te) <- translateExpr t
(ests, ee) <- translateExpr e
let thenSts = tsts ++ [Py.Assign tempPat te]
let elseSts = ests ++ [Py.Assign tempPat ee]
let newIf = Py.IfElse ie thenSts [] $ Just elseSts
return (ists ++ [newIf], Py.Var temp)
translateExpr (BinOp o l r) = do
(lsts, le) <- translateExpr l
(rsts, re) <- translateExpr r
(opsts, oe) <- translateOp o le re
return (lsts ++ rsts ++ opsts, oe)
translateExpr (FunctionCall f ps) = do
(fsts, fe) <- translateExpr f
tps <- mapM translateExpr ps
return (fsts ++ concatMap fst tps, Py.FunctionCall fe $ map snd tps)
translateExpr (LengthOf e) =
second (Py.FunctionCall (Py.Var "len") . return) <$> translateExpr e
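translateExpr (Access e Random m) = do
    temp <- incrementTemp
    (sts, ce) <- translateExpr e
    -- Bind the container to the temporary so that len() has something to refer to.
    let tempAssign = Py.Assign (Py.VarPat temp) ce
    let lenExpr = Py.FunctionCall (Py.Var "len") [Py.Var temp]
    -- Python's randint is inclusive on both ends, so the upper bound is len - 1.
    let maxIndex = Py.BinOp Py.Subtract lenExpr (Py.IntLiteral 1)
    let randExpr = Py.FunctionCall (Py.Var "randint") [ Py.IntLiteral 0, maxIndex ]
    return (sts ++ [tempAssign], singleAccess (Py.Var temp) randExpr m)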
translateExpr (Access c i m) = do
(csts, ce) <- translateExpr c
(ists, ie) <- translateExpr i
temp <- incrementTemp
if hasLambda i
then return (csts ++ ists ++ [createFilterLambda temp ie m], Py.FunctionCall (Py.Var temp) [ce])
else return (csts ++ ists, singleAccess ce ie m)
translateExpr (Parameter i) = return $ ([], Py.Var $ "arg" ++ show i)
translateExpr _ = fail "Invalid expression"
singleAccess :: Py.PyExpr -> Py.PyExpr -> SelectorMarker -> Py.PyExpr
singleAccess c i None = Py.Access c [i]
singleAccess c i Remove = Py.FunctionCall (Py.Member c "pop") [i]
createFilterLambda :: String -> Py.PyExpr -> SelectorMarker -> Py.PyStmt
createFilterLambda s e None = Py.FunctionDef s ["arg"]
[ Py.Assign (Py.VarPat "out") (Py.ListLiteral [])
, Py.For (Py.VarPat "arg0") (Py.Var "arg")
[ Py.IfElse e
[ Py.Standalone $ Py.FunctionCall (Py.Member (Py.Var "out") "append")
[ Py.Var "arg0" ]
]
[]
Nothing
]
, Py.Return $ Py.Var "out"
]
createFilterLambda s e Remove = Py.FunctionDef s ["arg"]
[ Py.Assign (Py.VarPat "i") $ Py.IntLiteral 0
, Py.Assign (Py.VarPat "out") (Py.ListLiteral [])
, Py.While (Py.BinOp Py.LessThan (Py.Var "i") $ Py.FunctionCall (Py.Var "len") [Py.Var "arg"])
[ Py.IfElse e
[ Py.Standalone $ Py.FunctionCall (Py.Member (Py.Var "out") "append")
[ singleAccess (Py.Var "arg") (Py.Var "i") Remove
]
]
[]
Nothing
, Py.Assign (Py.VarPat "i") (Py.BinOp Py.Add (Py.Var "i") (Py.IntLiteral 1))
]
, Py.Return $ Py.Var "out"
]
translateOp :: Op -> Py.PyExpr -> Py.PyExpr -> Translator ([Py.PyStmt], Py.PyExpr)
translateOp Add l r = return ([], Py.BinOp Py.Add l r)
translateOp Subtract l r = return ([], Py.BinOp Py.Subtract l r)
translateOp Multiply l r = return ([], Py.BinOp Py.Multiply l r)
translateOp Divide l r = return ([], Py.BinOp Py.Divide l r)
translateOp LessThan l r = return ([], Py.BinOp Py.LessThan l r)
translateOp LessThanEq l r = return ([], Py.BinOp Py.LessThanEq l r)
translateOp GreaterThan l r = return ([], Py.BinOp Py.GreaterThan l r)
translateOp GreaterThanEq l r = return ([], Py.BinOp Py.GreaterThanEq l r)
translateOp Equal l r = return ([], Py.BinOp Py.Equal l r)
translateOp NotEqual l r = return ([], Py.BinOp Py.NotEqual l r)
translateOp And l r = return ([], Py.BinOp Py.And l r)
translateOp Or l r = return ([], Py.BinOp Py.Or l r)
translateOp Concat l r = return ([], Py.BinOp Py.Add l r)
translateOp Insert l r = do
temp <- incrementTemp
let assignStmt = Py.Assign (Py.VarPat temp) l
let appendFunc = Py.Member (Py.Var temp) "append"
let insertStmt = Py.Standalone $ Py.FunctionCall appendFunc [r]
return ([assignStmt, insertStmt], Py.Var temp)

@ -0,0 +1,47 @@
module PythonAst where
data PyBinOp
= Add
| Subtract
| Multiply
| Divide
| LessThan
| LessThanEq
| GreaterThan
| GreaterThanEq
| Equal
| NotEqual
| And
| Or
data PyExpr
= BinOp PyBinOp PyExpr PyExpr
| IntLiteral Int
| StrLiteral String
| BoolLiteral Bool
| ListLiteral [PyExpr]
| DictLiteral [(PyExpr, PyExpr)]
| Lambda [PyPat] PyExpr
| Var String
| Tuple [PyExpr]
| FunctionCall PyExpr [PyExpr]
| Access PyExpr [PyExpr]
| Ternary PyExpr PyExpr PyExpr
| Member PyExpr String
| In PyExpr PyExpr
| NotIn PyExpr PyExpr
data PyPat
= VarPat String
| IgnorePat
| TuplePat [PyPat]
| AccessPat PyExpr [PyExpr]
data PyStmt
= Assign PyPat PyExpr
| IfElse PyExpr [PyStmt] [(PyExpr, [PyStmt])] (Maybe [PyStmt])
| While PyExpr [PyStmt]
| For PyPat PyExpr [PyStmt]
| FunctionDef String [String] [PyStmt]
| Return PyExpr
| Standalone PyExpr

@ -0,0 +1,132 @@
module PythonGen where
import PythonAst
import Data.List
import Data.Bifunctor
import Data.Maybe
indent :: String -> String
indent = (" " ++)
stmtBlock :: [PyStmt] -> [String]
stmtBlock = concatMap translateStmt
block :: String -> [String] -> [String]
block s ss = (s ++ ":") : map indent ss
prefix :: String -> PyExpr -> [PyStmt] -> [String]
prefix s e sts = block (s ++ " " ++ translateExpr e) $ stmtBlock sts
if_ :: PyExpr -> [PyStmt] -> [String]
if_ = prefix "if"
elif :: PyExpr -> [PyStmt] -> [String]
elif = prefix "elif"
else_ :: [PyStmt] -> [String]
else_ = block "else" . stmtBlock
while :: PyExpr -> [PyStmt] -> [String]
while = prefix "while"
parenth :: String -> String
parenth s = "(" ++ s ++ ")"
translateStmt :: PyStmt -> [String]
translateStmt (Assign p e) = [translatePat p ++ " = " ++ translateExpr e]
translateStmt (IfElse i t es e) =
if_ i t ++ concatMap (uncurry elif) es ++ maybe [] else_ e
translateStmt (While c t) = while c t
translateStmt (For x in_ b) = block head body
where
head = "for " ++ translatePat x ++ " in " ++ translateExpr in_
body = stmtBlock b
translateStmt (FunctionDef s ps b) = block head body
where
head = "def " ++ s ++ "(" ++ intercalate "," ps ++ ")"
body = stmtBlock b
translateStmt (Return e) = ["return " ++ translateExpr e]
translateStmt (Standalone e) = [translateExpr e]
precedence :: PyBinOp -> Int
precedence Add = 3
precedence Subtract = 3
precedence Multiply = 4
precedence Divide = 4
precedence LessThan = 2
precedence LessThanEq = 2
precedence GreaterThan = 2
precedence GreaterThanEq = 2
precedence Equal = 2
precedence NotEqual = 2
precedence And = 1
precedence Or = 0
opString :: PyBinOp -> String
opString Add = "+"
opString Subtract = "-"
opString Multiply = "*"
opString Divide = "/"
opString LessThan = "<"
opString LessThanEq = "<="
opString GreaterThan = ">"
opString GreaterThanEq = ">="
opString Equal = "=="
opString NotEqual = "!="
opString And = " and "
opString Or = " or "
translateOp :: PyBinOp -> PyBinOp -> PyExpr -> String
translateOp o o' =
    -- Parenthesize an operand whose operator binds more loosely than its parent's.
    if precedence o' < precedence o
        then parenth . translateExpr
        else translateExpr
dictMapping :: PyExpr -> PyExpr -> String
dictMapping f t = translateExpr f ++ ": " ++ translateExpr t
list :: String -> String -> [PyExpr] -> String
list o c es = o ++ intercalate ", " (map translateExpr es) ++ c
translateExpr :: PyExpr -> String
translateExpr (BinOp o l@(BinOp o1 _ _) r@(BinOp o2 _ _)) =
translateOp o o1 l ++ opString o ++ translateOp o o2 r
translateExpr (BinOp o l@(BinOp o1 _ _) r) =
translateOp o o1 l ++ opString o ++ translateExpr r
translateExpr (BinOp o l r@(BinOp o2 _ _)) =
translateExpr l ++ opString o ++ translateOp o o2 r
translateExpr (BinOp o l r) =
translateExpr l ++ opString o ++ translateExpr r
translateExpr (IntLiteral i) = show i
translateExpr (StrLiteral s) = "\"" ++ s ++ "\""
translateExpr (BoolLiteral b) = if b then "True" else "False"
translateExpr (ListLiteral l) = list "[" "]" l
translateExpr (DictLiteral l) =
"{" ++ intercalate ", " (map (uncurry dictMapping) l) ++ "}"
translateExpr (Lambda ps e) = parenth (head ++ ": " ++ body)
where
head = "lambda " ++ intercalate ", " (map translatePat ps)
body = translateExpr e
translateExpr (Var s) = s
translateExpr (Tuple es) = list "(" ")" es
translateExpr (FunctionCall f ps) = translateExpr f ++ list "(" ")" ps
translateExpr (Access (Var s) e) = s ++ list "[" "]" e
translateExpr (Access e@Access{} i) = translateExpr e ++ list "[" "]" i
translateExpr (Access e i) = "(" ++ translateExpr e ++ ")" ++ list "[" "]" i
translateExpr (Ternary c t e) =
translateExpr t ++ " if " ++ translateExpr c ++ " else " ++ translateExpr e
translateExpr (Member (Var s) m) = s ++ "." ++ m
translateExpr (Member e@Member{} m) = translateExpr e ++ "." ++ m
translateExpr (Member e m) = "(" ++ translateExpr e ++ ")." ++ m
translateExpr (In m c) =
"(" ++ translateExpr m ++ ") in (" ++ translateExpr c ++ ")"
translateExpr (NotIn m c) =
"(" ++ translateExpr m ++ ") not in (" ++ translateExpr c ++ ")"
translatePat :: PyPat -> String
translatePat (VarPat s) = s
translatePat IgnorePat = "_"
translatePat (TuplePat ps) =
"(" ++ intercalate "," (map translatePat ps) ++ ")"
translatePat (AccessPat e es) = translateExpr (Access e es)
translate :: [PyStmt] -> String
translate = intercalate "\n" . concatMap translateStmt

@ -0,0 +1,509 @@
---
title: A Language for an Assignment - Homework 1
date: 2019-12-27T23:27:09-08:00
tags: ["Haskell", "Python", "Algorithms"]
---
On a rainy Oregon day, I was walking between classes with a group of friends.
We were discussing the various ways to obfuscate solutions to the weekly
homework assignments in our Algorithms course: replace every `if` with
a ternary expression, use single variable names, put everything on one line.
I said:
> The
{{< sidenote "right" "chad-note" "chad" >}}
This is in reference to a meme, <a href="https://knowyourmeme.com/memes/virgin-vs-chad">Virgin vs Chad</a>.
A "chad" characteristic is masculine or "alpha" to the point of absurdity.
{{< /sidenote >}} move would be to make your own, different language for every homework assignment.
It was required of us to use
{{< sidenote "left" "python-note" "Python" >}}
A friend suggested making a Haskell program
that generates Python-based interpreters for languages. While that would be truly
absurd, I'll leave <em>this</em> challenge for another day.
{{< /sidenote >}} for our solutions, so that was the first limitation on this challenge.
Someone suggested writing the languages in Haskell, since that's what we used
in our Programming Languages class. So the final goal ended up:
* For each of the 10 homework assignments in CS325 - Analysis of Algorithms,
* Create a Haskell program that translates a language into,
* A valid Python program that works (nearly) out of the box and passes all the test cases.
It may not be worth it to create a whole
{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
A general purpose language is one that's designed to be used in various
domains. For instance, C++ is a general-purpose language because it can
be used for embedded systems, GUI programs, and pretty much anything else.
This is in contrast to a domain-specific language, such as Game Maker Language,
which is aimed at a much narrower set of uses.
{{< /sidenote >}} language for each problem,
but nowhere in the challenge did we say that it had to be general-purpose. In
fact, some interesting design thinking can go into designing a domain-specific
language for a particular assignment. So let's jump right into it, and make
a language for the first homework assignment.
### Homework 1
There are two problems in Homework 1. Here they are, verbatim:
{{< codelines "text" "cs325-langs/hws/hw1.txt" 32 38 >}}
And the second:
{{< codelines "text" "cs325-langs/hws/hw1.txt" 47 68 >}}
We want to make a language __specifically__ for these two tasks (one of which
is split into many tasks). What common things can we isolate? I see two:
First, __all the problems deal with lists__. This may seem like a trivial observation,
but these two problems are the __only__ thing we use our language for. We have
list access,
{{< sidenote "right" "filterting-note" "list filtering" >}}
Quickselect is a variation on quicksort, which itself
finds all the "lesser" and "greater" elements in the input array.
{{< /sidenote >}} and list creation. That should serve as a good base!
If you squint a little bit, __all the problems are recursive with the same base case__.
Consider the first few lines of `search`, implemented naively:
```Python
def search(xs, k):
    if xs == []:
        return False
```
How about `sorted`? Take a look:
```Python
def sorted(xs):
    if xs == []:
        return []
```
I'm sure you see the picture. But it will take some real mental gymnastics to twist the
rest of the problems into this shape. What about `qselect`, for instance? There are two
cases for what it may return:
* `None` or equivalent if the index is out of bounds (we give it `4` and a list `[1, 2]`).
* A number if `qselect` worked.
The test cases never provide a concrete example of what should be returned from
`qselect` in the first case, so we'll interpret it like
{{< sidenote "right" "undefined-note" "undefined behavior" >}}
For a quick sidenote about undefined behavior, check out how
C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture function</a>.
Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
but functions that don't terminate are undefined behavior. There's only one other way the function
returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call.
{{< /sidenote >}} in C++:
we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
This makes this base case valid:
```Python
def qselect(xs, k):
    if xs == []:
        return []
```
"Oh yeah, now it's all coming together." With one more observation (which will come
from a piece I haven't yet shown you!), we'll be able to generalize this base case.
The observation is this section in the assignment:
{{< codelines "text" "cs325-langs/hws/hw1.txt" 83 98 >}}
The real key is the part about "returning the `[]` where x should be inserted". It so
happens that when the list given to the function is empty, the number should be inserted
precisely into that list. Thus:
```Python
def _search(xs, k):
    if xs == []:
        return xs
```
The same works for `qselect`:
```Python
def qselect(xs, k):
    if xs == []:
        return xs
```
And for sorted, too:
```Python
def sorted(xs):
    if xs == []:
        return xs
```
There are some functions that are exceptions, though:
```Python
def insert(xs, k):
    # We can't return early here!
    # If we do, we'll never insert anything.
```
Also:
```Python
def search(xs, k):
    # We have to return true or false, never
    # an empty list.
```
So, whenever we __don't__ return a list, we don't want to add a special case.
We arrive at the following common base case: __whenever a function returns a list, if its first argument
is the empty list, the first argument is immediately returned__.
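To picture this rule in plain Python, here is a rough sketch that writes the shared base case once, as a decorator. The names `returns_list` and `sorted_tree` are made up for this illustration, and nothing in the assignment asks for a decorator; it is just one way to see that the check is the same everywhere.

```python
import functools

def returns_list(f):
    """Hypothetical helper: give a list-returning function the shared base case."""
    @functools.wraps(f)
    def wrapper(xs, *args):
        if xs == []:
            return xs  # hand back the very same empty list
        return f(xs, *args)
    return wrapper

@returns_list
def sorted_tree(t):
    # t is either [] or [left, value, right], as in the assignment
    return sorted_tree(t[0]) + [t[1]] + sorted_tree(t[2])

print(sorted_tree([[[], 1, []], 2, [[], 3, []]]))
```

Functions like `insert` and `search` would simply not get the decorator, which matches the exceptions listed above.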
We've largely exhausted the conclusions we can draw from these problems. Let's get to designing a language.
### A Silly Language
Let's start by visualizing our goals. Without base cases, the solution to `_search`
would be something like this:
{{< codelines "text" "cs325-langs/sols/hw1.lang" 11 14 >}}
Here we have an __`if`-expression__. It has to have an `else`, and evaluates to the value
of the chosen branch. That is, `if true then 0 else 1` evaluates to `0`, while
`if false then 0 else 1` evaluates to `1`. Otherwise, we follow the binary tree search
algorithm faithfully.
Using this definition of `_search`, we can define `search` pretty easily:
{{< codelines "text" "cs325-langs/sols/hw1.lang" 17 17 >}}
Let's use Haskell's `(++)` operator for concatenation. This will help us understand
when the user is operating on lists, and when they're not. With this, `sorted` becomes:
{{< codelines "text" "cs325-langs/sols/hw1.lang" 16 16 >}}
Let's go for `qselect` now. We'll introduce a very silly language feature for this
problem:
{{< sidenote "right" "selector-note" "list selectors" >}}
You've probably never heard of list selectors, and for a good reason:
this is a <em>terrible</em> language feature. I'll go in more detail
later, but I wanted to make this clear right away.
{{< /sidenote >}}. We observe that `qselect` aims to partition the list into
other lists. We thus add the following pieces of syntax:
```
~xs -> {
    pivot <- xs[rand]!
    left <- xs[#0 <= pivot]
    ...
} -> ...
```
There are three new things here.
1. The actual "list selector": `~xs -> { .. } -> ...`. Between the curly braces
are branches which select parts of the list and assign them to new variables.
Thus, `pivot <- xs[rand]!` assigns the element at a random index to the variable `pivot`.
The `!` at the end means "after taking this out of `xs`, delete it from `xs`". The
syntax {{< sidenote "right" "curly-note" "starts with \"~\"" >}}
An observant reader will note that there's no need for the "xs" after the "~".
The idea was to add a special case syntax to reference the "selected list", but
I ended up not bothering. So in fact, this part of the syntax is useless.
{{< /sidenote >}} to make it easier to parse.
2. The `rand` list access syntax. `xs[rand]` is a special case that picks a random
element from `xs`.
3. The `xs[#0 <= pivot]` syntax. This is another special case that selects all elements
from `xs` that match the given predicate (where `#0` is replaced with each element in `xs`).
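To make the three pieces concrete, here is a minimal eager Python sketch of what each one means. The helper names `take_random` and `select` are made up for this illustration, and the sketch deliberately ignores laziness.

```python
import random

def take_random(xs):
    # xs[rand]! : pick the element at a random index, and (because of "!")
    # delete it from xs.
    return xs.pop(random.randrange(len(xs)))

def select(xs, predicate):
    # xs[#0 <= pivot] : keep every element for which the predicate holds,
    # with #0 standing for each element of xs in turn.
    return [x for x in xs if predicate(x)]

xs = [3, 10, 4, 7, 19]
pivot = take_random(xs)
left = select(xs, lambda x: x <= pivot)
right = select(xs, lambda x: x > pivot)
```

After this runs, `xs` no longer contains `pivot`, and `left` and `right` split the remaining elements between them.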
The key to `qselect` is not evaluating `right` unless we have to. So, we shouldn't
eagerly evaluate the list selector. We also don't want something like `right[|right|-1]` to evaluate
`right` twice. So we settle on
{{< sidenote "right" "lazy-note" "lazy evaluation" >}}
Lazy evaluation means only evaluating an expression when we need to. Thus,
although we might encounter the expression for <code>right</code>, we
only evaluate it when the time comes. Lazy evaluation, at least
the way that Haskell has it, is more specific: an expression is evaluated only
once, or not at all.
{{</ sidenote >}}.
Ah, but the `!` marker introduces
{{< sidenote "left" "side-effect-note" "side effects" >}}
A side effect is a term frequently used when talking about functional programming.
Evaluating the expression <code>xs[rand]!</code> doesn't just get a random element,
it also changes <em>something else</em>. In this case, that something else is
the <code>xs</code> list.
{{< /sidenote >}}. So we can't just evaluate these things all willy-nilly.
So, let's make it so that each expression in the selector list requires the ones above it. Thus,
`left` will require `pivot`, and `right` will require `left` and `pivot`. So,
lazily evaluated, ordered expressions. The whole `qselect` becomes:
{{< codelines "text" "cs325-langs/sols/hw1.lang" 1 9 >}}
We've now figured out all the language constructs. Let's start working on
some implementation!
#### Implementation
It would be silly of me to explain every detail of creating a language in Haskell
in this post; this is neither the purpose of the post, nor is it plausible
to do this without covering monads, parser combinators, grammars, abstract syntax
trees, and more. So, instead, I'll discuss the _interesting_ parts of the
implementation.
##### Temporary Variables
Our language is expression-based, yes. A function is a single,
arbitrarily complex expression (involving `if/else`, list
selectors, and more). So it would make sense to translate
a function to a single, arbitrarily complex Python expression.
However, the way we've designed our language makes it
not-so-suitable for converting to a single expression! For
instance, consider `xs[rand]`. We need to compute the list,
get its length, generate a random number, and then access
the corresponding element in the list. We use the list
here twice, and simply repeating the expression would not
be very smart: we'd be evaluating twice. So instead,
we'll use a variable, assign the list to that variable,
and then access that variable multiple times.
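A small Python experiment shows the difference. Here `make_list` is a hypothetical stand-in for an arbitrary list-producing expression, and it counts how often it runs:

```python
import random

calls = 0

def make_list():
    # Stand-in for an arbitrary (possibly expensive) list expression.
    global calls
    calls += 1
    return [3, 10, 4, 7, 19]

# Naive single expression: the list expression appears, and runs, twice.
naive = make_list()[random.randrange(len(make_list()))]
calls_naive = calls

# With a temporary: evaluate the list once, then reuse the variable.
calls = 0
temp0 = make_list()
element = temp0[random.randrange(len(temp0))]
calls_temp = calls
```

The naive version evaluates the list twice; the version with `temp0` evaluates it once.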
To be extra safe, let's use a fresh temporary variable
every time we need to store something. The simplest
way is to simply maintain a counter of how many temporary
variables we've already used, and generate a new variable
by prepending the word "temp" to that number. We start
with `temp0`, then `temp1`, and so on. To keep a counter,
we can use a state monad:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 269 269 >}}
Don't worry about the `Map.Map String [String]`, we'll get to that in a bit.
For now, all we have to worry about is the second element of the tuple,
the integer counting how many temporary variables we've used. We can
get the current temporary variable as follows:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 271 274 >}}
We can also get a fresh temporary variable like this:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 276 279 >}}
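If the state monad is unfamiliar, the same bookkeeping can be sketched in Python with a mutable counter. This is only an analogue: the class name `TempNames` is invented for the sketch, and the real translator threads the counter through the state monad instead of mutating an object.

```python
class TempNames:
    """Hands out temp0, temp1, ... in order (hypothetical Python analogue)."""

    def __init__(self):
        self.used = 0  # how many temporaries have been handed out so far

    def fresh(self):
        # Build a name from the current count, then bump the count so that
        # the next call can never return the same name.
        name = f"temp{self.used}"
        self.used += 1
        return name

names = TempNames()
first = names.fresh()
second = names.fresh()
```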
Now, the
{{< sidenote "left" "" "code" >}}
Since we are translating an expression, we must have the result of
the translation yield a Python expression we can use in generating
larger Python expressions. However, as we've seen, we occasionally
have to use statements. Thus, the <code>translateExpr</code> function
returns a <code>Translator ([Py.PyStmt], Py.PyExpr)</code>.
{{< /sidenote >}} for generating a random list access looks like
{{< sidenote "right" "ast-note" "this:" >}}
The <code>Py.*</code> constructors are a part of a Python AST module I quickly
threw together. I won't showcase it here, but you can always look at the
source code for the blog (which includes this project)
<a href="https://dev.danilafe.com/Web-Projects/blog-static">here</a>.
{{< /sidenote >}}
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 364 369 >}}
##### Implementing "lazy evaluation"
Lazy evaluation in functional programs usually arises from
{{< sidenote "right" "graph-note" "graph reduction" >}}
Graph reduction, more specifically the <em>Spineless,
Tagless G-machine</em>, is at the core of the Glasgow Haskell
Compiler (GHC). Simon Peyton Jones' earlier book,
<em>Implementing Functional Languages: a tutorial</em>
details an earlier version of the G-machine.
{{< /sidenote >}}. However, Python is neither
functional nor graph-based, and we only lazily
evaluate list selectors. Thus, we'll have to do
some work to get our lazy evaluation to work as we desire.
Here's what I came up with:
1. It's difficult to insert Python statements where they are
needed: we'd have to figure out in which scope each variable
has already been declared, and in which scope it's yet
to be assigned.
2. Instead, we can use a Python dictionary, called `cache`,
and store computed versions of each variable in the cache.
3. It's pretty difficult to check if a variable
is in the cache, compute it if not, and then return the
result of the computation, in one expression. This is
true, unless that single expression is a function call, and we have a dedicated
function that takes no arguments, computes the expression if needed,
and uses the cache otherwise. We choose this route.
4. We have already promised that we'd evaluate all the selected
variables above a given variable before evaluating the variable
itself. So, each function will first call (and therefore
{{< sidenote "right" "force-note" "force" >}}
{{< todo >}}Explain forcing{{< /todo >}}
{{< /sidenote >}}) the functions
generated for variables declared above the function's own variable.
5. To keep track of all of this, we use the already-existing state monad
as a reader monad (that is, we clear the changes we make to the monad
after we're done translating the list selector). This is where the `Map.Map String [String]`
comes from.
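Points 2 through 4 can be sketched by hand in Python as follows. The helper `make_lazy` is hypothetical: the compiler emits a separate function per variable rather than going through a shared helper, but the caching and forcing order are the same idea.

```python
def make_lazy(cache, name, compute, dependencies=()):
    def force():
        # Force everything declared above this variable first, in order.
        for dependency in dependencies:
            dependency()
        # Compute at most once; later calls reuse the cached value.
        if name not in cache:
            cache[name] = compute()
        return cache[name]
    return force

xs = [4, 2, 6, 3]
cache = {}
log = []  # records which computations actually ran

def compute_pivot():
    log.append("pivot")
    return xs.pop(0)  # side effect: removes the pivot from xs

def compute_left():
    log.append("left")
    return [x for x in xs if x <= pivot()]

pivot = make_lazy(cache, "pivot", compute_pivot)
left = make_lazy(cache, "left", compute_left, dependencies=[pivot])

first_left = left()
second_left = left()
```

Nothing runs until `left()` is first called; at that point `pivot` is forced before `left` is computed, and the second call to `left()` only reads the cache.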
The `Map.Map String [String]` keeps track of variables that will be lazily computed,
and also of the dependencies of each variable (the variables that need
to be accessed before the variable itself). We compute such a map for
each selector as follows:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 337 337 >}}
We update the existing map using `Map.union`:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 338 338 >}}
And, after we're done generating expressions in the body of this selector,
we clear it to its previous value `vs`:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 341 341 >}}
We generate a single selector as follows:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 307 320 >}}
This generates a function definition statement, which we will examine in
generated Python code later on.
Solving the problem this way also introduces another gotcha: sometimes,
a variable is produced by a function call, and other times the variable
is just a Python variable. We write this as follows:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 322 327 >}}
##### Special Case Insertion
This is a silly language for a single homework assignment. I'm not
planning to implement Hindley-Milner type inference, or anything
of that sort. For the purpose of this language, things will be
either a list, or not a list. And as long as a function __can__ return
a list, it can also return the list from its base case. Thus,
that's all we will try to figure out. The checking code is so
short that we can include the whole snippet at once:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 258 266 >}}
`mergePossibleType`
{{< sidenote "right" "bool-identity-note" "figures out" >}}
An observant reader will note that this is just a logical
OR function. It's not, however, good practice to use
booleans for types that have two constructors with no arguments.
Check out this <a href="https://programming-elm.com/blog/2019-05-20-solving-the-boolean-identity-crisis-part-1/">
Elm-based article</a> about this, which the author calls the
Boolean Identity Crisis.
{{< /sidenote >}}, given two possible types for an
expression, the final type for the expression.
There's only one real trick to this. Sometimes, like in
`_search`, the only time we return something _known_ to be a list, that
something is `xs`. Since we're making a list manipulation language,
let's __assume the first argument to the function is a list__, and
__use this information to determine expression types__. We guess
types in a very basic manner otherwise: If you use the concatenation
operator, or a list literal, then obviously we're working on a list.
If you're returning the first argument of the function, that's also
a list. Otherwise, it could be anything.
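The guessing rules above are simple enough to sketch in a few lines of Python. Everything about the encoding here is an assumption made for the sketch: expressions are plain tuples tagged with a string, which is not how the Haskell AST in this project looks.

```python
from enum import Enum

class PossibleType(Enum):
    LIST = "list"
    ANY = "any"

def merge(a, b):
    # If either branch can produce a list, the whole expression can.
    return PossibleType.LIST if PossibleType.LIST in (a, b) else PossibleType.ANY

def possible_type(expr, first_param):
    kind = expr[0]
    if kind in ("concat", "list_literal"):
        return PossibleType.LIST  # obviously working on a list
    if kind == "var":
        # Only the first parameter is assumed to be a list.
        return PossibleType.LIST if expr[1] == first_param else PossibleType.ANY
    if kind == "if":
        _, _condition, then_branch, else_branch = expr
        return merge(possible_type(then_branch, first_param),
                     possible_type(else_branch, first_param))
    return PossibleType.ANY  # could be anything

# Roughly the shape of _search: one branch returns xs, the other recurses.
search_body = ("if", ("eq",), ("var", "xs"), ("call", "_search"))
search_type = possible_type(search_body, "xs")
```

Because one branch returns the first parameter, `search_type` comes out as `PossibleType.LIST`, which is exactly the case the trick is meant to catch.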
My Haskell linter actually suggested a pretty clever way of writing
the whole "add a base case if this function returns a list" code.
Check it out:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 299 305 >}}
Specifically, look at the line with `let fastReturn = ...`. It
uses a list comprehension: we take a parameter `p` from the list of
parameters `ps`, but only produce the statements for the base case
if the possible type computed using `p` is `List`.
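The same "zero or one results" trick works with a Python list comprehension. This is a loose analogue, not a transcription of the Haskell: `base_case_statements` and the `returns_list` callback are invented for the sketch, and the statements are plain strings instead of AST nodes.

```python
def base_case_statements(params, returns_list):
    # At most one parameter is drawn (the first), and the filter decides
    # whether the base case is produced once or not at all.
    return [
        line
        for p in params[:1]
        if returns_list(p)
        for line in (f"if {p}==[]:", f"    return {p}")
    ]

with_base_case = base_case_statements(["xs", "k"], lambda p: True)
without_base_case = base_case_statements(["xs", "k"], lambda p: False)
```

The comprehension replaces an explicit `if`/`else` that would otherwise return either the base case statements or an empty list.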
### The Output
What kind of beast have we created? Take a look for yourself:
```Python
def qselect(xs,k):
    if xs==[]:
        return xs
    cache = {}
    def pivot():
        if ("pivot") not in (cache):
            cache["pivot"] = xs.pop(0)
        return cache["pivot"]
    def left():
        def temp2(arg):
            out = []
            for arg0 in arg:
                if arg0<=pivot():
                    out.append(arg0)
            return out
        pivot()
        if ("left") not in (cache):
            cache["left"] = temp2(xs)
        return cache["left"]
    def right():
        def temp3(arg):
            out = []
            for arg0 in arg:
                if arg0>pivot():
                    out.append(arg0)
            return out
        left()
        pivot()
        if ("right") not in (cache):
            cache["right"] = temp3(xs)
        return cache["right"]
    if k>(len(left())+1):
        temp4 = qselect(right(), k-len(left())-1)
    else:
        if k==(len(left())+1):
            temp5 = [pivot()]
        else:
            temp5 = qselect(left(), k)
        temp4 = temp5
    return temp4
def _search(xs,k):
    if xs==[]:
        return xs
    if xs[1]==k:
        temp6 = xs
    else:
        if xs[1]>k:
            temp8 = _search(xs[0], k)
        else:
            temp8 = _search(xs[2], k)
        temp6 = temp8
    return temp6
def sorted(xs):
    if xs==[]:
        return xs
    return sorted(xs[0])+[xs[1]]+sorted(xs[2])
def search(xs,k):
    return len(_search(xs, k))!=0
def insert(xs,k):
    return _insert(k, _search(xs, k))
def _insert(k,xs):
    if k==[]:
        return k
    if len(xs)==0:
        temp16 = xs
        temp16.append([])
        temp17 = temp16
        temp17.append(k)
        temp18 = temp17
        temp18.append([])
        temp15 = temp18
    else:
        temp15 = xs
    return temp15
```
It's...horrible! All the `tempX` variables, __three layers of nested function declarations__, hardcoded cache access. This is not something you'd ever want to write.
Even to get this code, I had to come up with hacks __in a language I created__.
The first hack is to make the `qselect` function use the `xs == []` base
case. This doesn't happen by default, because `qselect` doesn't return a list!
To "fix" this, I made `qselect` return the number it found, wrapped in a
list literal. This is not up to spec, and would require another function
to unwrap this list.
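Such an unwrapping function might look like the sketch below. Both `unwrap` and `stand_in_qselect` are hypothetical: the stand-in simply sorts the list so that the example runs on its own, and it only mimics the interface of the generated code (list first, `k` second, result wrapped in a list).

```python
def unwrap(list_returning_qselect):
    def qselect(k, xs):
        # The assignment expects qselect(k, xs) to return a plain number.
        # Pass a copy, since the generated code removes the pivot from xs.
        return list_returning_qselect(list(xs), k)[0]
    return qselect

def stand_in_qselect(xs, k):
    # Stand-in with the generated function's interface, NOT the real code.
    return [sorted(xs)[k - 1]] if xs else xs

qselect = unwrap(stand_in_qselect)
second_smallest = qselect(2, [3, 10, 4, 7, 19])
fourth_smallest = qselect(4, [11, 2, 8, 3])
```

Note that this sketch makes no attempt to handle an empty input list, where there is nothing to unwrap.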
While `qselect` was struggling with not having the base case, `insert` had
a base case it didn't need: `insert` shouldn't return the list itself
when it's empty, it should insert into it! However, when we use the `<<`
list insertion operator, the language infers `insert` to be a list-returning
function itself, which means inserting into an empty list will always fail. So, we
make a function `_insert`, which __takes the arguments in reverse__.
The base case will still be generated, but the first argument (against
which the base case is checked) will be a number, so the `k == []` check
will always fail.
That concludes this post. I'll be working on more solutions to homework
assignments in self-made languages, so keep an eye out!


@ -0,0 +1,65 @@
---
title: Compiling a Functional Language Using C++, Part 9 - Polymorphism
date: 2019-12-09T23:26:46-08:00
tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
---
Last time, we wrote some pretty interesting programs in our little language.
We successfully expressed arithmetic and recursion. But there's one thing
that we cannot express in our language without further changes: an `if` statement.
Suppose we didn't want to add a special `if/else` expression into our language.
Thanks to lazy evaluation, we can express it using a function:
```
defn if c t e = {
    case c of {
        True -> { t }
        False -> { e }
    }
}
```
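To see why laziness matters here, consider what the same function looks like in a strict language. In the Python sketch below (an illustration only; `if_` and `safe_div` are made-up names), the branches have to be wrapped in zero-argument functions by hand, since otherwise both would be evaluated before `if_` is even called:

```python
def if_(c, t, e):
    # t and e are thunks (zero-argument functions), so only the chosen
    # branch is ever evaluated.
    return t() if c else e()

def safe_div(a, b):
    # Without the lambdas, a // b would run (and fail) even when b == 0.
    return if_(b == 0, lambda: 0, lambda: a // b)

nested = if_(if_(True, lambda: False, lambda: True), lambda: 11, lambda: 3)
```

In our lazy language the wrapping is implicit, which is why the function-based `if` works without special syntax.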
But an issue still remains: so far, our compiler remains __monomorphic__. That
is, a particular function can only have one possible type for each one of its
arguments. With our current setup, something like this
{{< sidenote "right" "if-note" "would not work:" >}}
In a polymorphically typed language, the inner <code>if</code> would just evaluate to
<code>False</code>, and the whole expression to 3.
{{< /sidenote >}}
```
if (if True False True) 11 3
```
This is because, for this to work, both of the following would need to hold (borrowing
some of our notation from the [typechecking]({{< relref "03_compiler_typechecking.md" >}}) post):
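$$
\\text{if} : \\text{Bool} \\rightarrow \\text{Int} \\rightarrow \\text{Int} \\rightarrow \\text{Int}
$$
$$
\\text{if} : \\text{Bool} \\rightarrow \\text{Bool} \\rightarrow \\text{Bool} \\rightarrow \\text{Bool}
$$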
But using our rules so far, such a thing is impossible, since there is no way for
\\(\text{Int}\\) to be unified with \\(\text{Bool}\\). We need a more powerful
set of rules to describe our program's types. One such set of rules is
the [Hindley-Milner type system](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system),
which we have previously alluded to. In fact, the rules we came up
with were already very close to Hindley-Milner, with the exception of two:
__generalization__ and __instantiation__. Instantiation first:
$$
\frac
{\\Gamma \\vdash e : \\sigma' \\quad \\sigma' \\sqsubseteq \\sigma}
{\\Gamma \\vdash e : \\sigma}
$$
Next, generalization:
$$
\frac
{\\Gamma \\vdash e : \\sigma \\quad \\alpha \\not \\in \\text{free}(\\Gamma)}
{\\Gamma \\vdash e : \\forall \\alpha . \\sigma}
$$
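As an illustration of where these rules lead (a sketch, not something derived in the post so far): generalization would let us give `if` a single polymorphic type, quantified over the type of its branches:

$$
\\text{if} : \\forall \\alpha . \\text{Bool} \\rightarrow \\alpha \\rightarrow \\alpha \\rightarrow \\alpha
$$

Instantiation would then let each use of `if` pick its own \\(\\alpha\\): \\(\text{Int}\\) for the outer call in the nested `if` example, and \\(\text{Bool}\\) for the inner one.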


@ -1,7 +1,9 @@
@import "style.scss";
$sidenote-width: 350px;
$sidenote-offset: 15px;
$sidenote-width: 30rem;
$sidenote-offset: 1.5rem;
$sidenote-padding: 1rem;
$sidenote-highlight-border-width: .2rem;
.sidenote {
&:hover {
@ -11,15 +13,16 @@ $sidenote-offset: 15px;
}
.sidenote-content {
border: 2px dashed;
padding: 9px;
border: $sidenote-highlight-border-width dashed;
padding: $sidenote-padding -
($sidenote-highlight-border-width - $standard-border-width);
border-color: $primary-color;
}
}
}
.sidenote-label {
border-bottom: 2px solid $primary-color;
border-bottom: .2rem solid $primary-color;
}
.sidenote-checkbox {
@ -30,7 +33,7 @@ $sidenote-offset: 15px;
display: block;
position: absolute;
width: $sidenote-width;
margin-top: -1.5em;
margin-top: -1.5rem;
&.sidenote-right {
right: 0;
@ -45,8 +48,8 @@ $sidenote-offset: 15px;
@media screen and
(max-width: $container-width + 2 * ($sidenote-width + 2 * $sidenote-offset)) {
position: static;
margin-top: 10px;
margin-bottom: 10px;
margin-top: 1rem;
margin-bottom: 1rem;
width: 100%;
display: none;
@ -55,16 +58,16 @@ $sidenote-offset: 15px;
}
&.sidenote-left {
margin-left: 0px;
margin-left: 0rem;
}
&.sidenote-right {
margin-right: 0px;
margin-right: 0rem;
}
}
@include bordered-block;
padding: 10px;
padding: $sidenote-padding;
box-sizing: border-box;
text-align: left;
}


@ -1,24 +1,28 @@
$container-width: 800px;
$container-width: 50rem;
$standard-border-width: .075rem;
$primary-color: #36e281;
$primary-color-dark: darken($primary-color, 10%);
$code-color: #f0f0f0;
$code-color-dark: darken($code-color, 10%);
$border-color: #bfbfbf;
$font-heading: "Lora", serif;
$font-body: "Raleway", serif;
$font-code: "Inconsolata", monospace;
$standard-border: 1px solid $border-color;
$standard-border: $standard-border-width solid $border-color;
@mixin bordered-block {
border: $standard-border;
border-radius: 2px;
border-radius: .2rem;
}
body {
font-family: $font-body;
font-size: 1.0em;
font-size: 1.0rem;
line-height: 1.5;
margin-bottom: 1em;
margin-bottom: 1rem;
text-align: justify;
}
@ -27,8 +31,8 @@ main {
}
h1, h2, h3, h4, h5, h6 {
margin-bottom: .1em;
margin-top: .5em;
margin-bottom: .1rem;
margin-top: .5rem;
font-family: $font-heading;
font-weight: normal;
text-align: left;
@ -49,7 +53,7 @@ code {
pre code {
display: block;
padding: 0.5em;
padding: 0.5rem;
overflow-x: auto;
background-color: $code-color;
}
@ -61,12 +65,12 @@ pre code {
box-sizing: border-box;
@media screen and (max-width: $container-width){
padding: 0em 1em 0em 1em;
padding: 0rem 1rem 0rem 1rem;
}
}
.button, input[type="submit"] {
padding: 0.5em;
padding: 0.5rem;
background-color: $primary-color;
border: none;
color: white;
@ -87,7 +91,7 @@ pre code {
nav {
background-color: $primary-color;
width: 100%;
margin: 1em 0px 1em 0px;
margin: 1rem 0rem 1rem 0rem;
}
nav a {
@ -110,7 +114,7 @@ nav a {
}
.post-content {
margin-top: .5em;
margin-top: .5rem;
}
h1 {