@@ -26,14 +26,14 @@ the latter was unpleasant, making me constantly break from the prose

In the end, I decided to underline the words, and come back to them later.
However, even then, the task is fairly arduous. For one, words I don't recognize
-aren't always in their canonical form (they can conjugated, plural, compound,
+aren't always in their canonical form (they can be conjugated, plural, compound,
and more): I have to spend some time deciphering what I should add to a
flashcard. For another, I had to bounce between a PDF of my book
(from where, fortunately, I can copy-paste) and my computer. Often, a word
confused the translation software out of context, so I had to copy more of the
surrounding text. Finally, I learned that given these limitations, the pace of
my reading far exceeds the rate of my translation. This led me to underline
-less words.
+fewer words.

I thought,
@@ -60,10 +60,10 @@ interleaved with the technical details.
### The Core Solution
The core idea has always been:

-1. Find thing that look like underlines
+1. Find things that look like underlines
2. See which words they correspond to
3. Perform {{< sidenote "right" "lemmatization-node" "lemmatization" >}}
-Lemmatization (<a href="https://en.wikipedia.org/wiki/Lemmatization">wikipedia</a>) is the
+Lemmatization (<a href="https://en.wikipedia.org/wiki/Lemmatization">Wikipedia</a>) is the
process of turning non-canonical forms of words (like <code>am</code> (eng) /
<code>suis</code> (fr)) into their canonical form which might be found in the
dictionary (<code>to be</code> / <code>être</code>).
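To make steps 2 and 3 a little more concrete, here is a minimal, stdlib-only Python sketch. It assumes step 1 has already produced a bounding box for each underline, and that word bounding boxes are available (for example from OCR). The `Box`/`Word` types, the `max_gap`/`min_cover` thresholds, and the tiny `LEMMAS` table are all hypothetical stand-ins, not the code behind this post. A real pipeline would use a proper lemmatizer rather than a hand-written lookup table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


# Hypothetical lemma table (inflected form -> dictionary form), illustration only.
LEMMAS = {"suis": "être", "es": "être", "est": "être", "yeux": "œil"}


def horizontal_overlap(a: Box, b: Box) -> float:
    """Length of the overlap between the x-extents of a and b (0.0 if disjoint)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def underlined_words(words: list[Word], underline: Box,
                     max_gap: float = 5.0, min_cover: float = 0.5) -> list[Word]:
    """Step 2 (sketch): keep words whose bottom edge lies within `max_gap` of the
    underline's top edge and whose width the underline covers by >= `min_cover`."""
    hits = []
    for w in words:
        width = w.box.x1 - w.box.x0
        gap = underline.y0 - w.box.y1  # negative if the stroke cuts into the word
        if width > 0 and abs(gap) <= max_gap:
            if horizontal_overlap(w.box, underline) / width >= min_cover:
                hits.append(w)
    return hits


def lemmatize(token: str) -> str:
    """Step 3 (sketch): strip punctuation, lowercase, then look up the canonical
    form; unknown tokens are returned as-is after normalization."""
    key = token.strip(".,;:!?«»\"'").lower()
    return LEMMAS.get(key, key)


# Usage: a pen stroke drawn under "suis" in "Je suis content."
words = [
    Word("Je", Box(0, 0, 20, 12)),
    Word("suis", Box(25, 0, 55, 12)),
    Word("content.", Box(60, 0, 115, 12)),
]
stroke = Box(24, 13, 56, 14)
print([lemmatize(w.text) for w in underlined_words(words, stroke)])  # ['être']
```

Matching on horizontal coverage rather than strict containment is meant to tolerate hand-drawn strokes that start a little early or stop a little short.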
@@ -182,7 +182,7 @@ plenty of Flask applications in Codex's training dataset. In one shot,
it generated a little web application that enabled me to tweak the source word
and final translation. It also enabled me to throw away certain underlines.
This was useful when, across different sessions, I forgot and underlined
-the same word, or when I underlined a word but later decided it not worth
+the same word, or when I underlined a word but later decided it was not worth
including in my studying. This application produced an Anki deck, using
the Python library [`genanki`](https://github.com/kerrickstaley/genanki).
Anki has a nice mechanism to de-duplicate decks, which meant that every