Grammar pass

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2026-04-05 16:09:27 -07:00
parent 767545dda4
commit aabbc66bb2
2 changed files with 13 additions and 13 deletions

@@ -26,14 +26,14 @@ the latter was unpleasant, making me constantly break from the prose
In the end, I decided to underline the words, and come back to them later.
However, even then, the task is fairly arduous. For one, words I don't recognize
-aren't always in their canonical form (they can conjugated, plural, compound,
+aren't always in their canonical form (they can be conjugated, plural, compound,
and more): I have to spend some time deciphering what I should add to a
flashcard. For another, I had to bounce between a PDF of my book
(from where, fortunately, I can copy-paste) and my computer. Often, a word
confused the translation software out of context, so I had to copy more of the
surrounding text. Finally, I learned that given these limitations, the pace of
my reading far exceeds the rate of my translation. This led me to underline
-less words.
+fewer words.
I thought,
@@ -60,10 +60,10 @@ interleaved with the technical details.
### The Core Solution
The core idea has always been:
-1. Find thing that look like underlines
+1. Find things that look like underlines
2. See which words they correspond to
3. Perform {{< sidenote "right" "lemmatization-node" "lemmatization" >}}
-Lemmatization (<a href="https://en.wikipedia.org/wiki/Lemmatization">wikipedia</a>) is the
+Lemmatization (<a href="https://en.wikipedia.org/wiki/Lemmatization">Wikipedia</a>) is the
process of turning non-canonical forms of words (like <code>am</code> (eng) /
<code>suis</code> (fr)) into their canonical form which might be found in the
dictionary (<code>to be</code> / <code>être</code>).
@@ -182,7 +182,7 @@ plenty of Flask applications in Codex's training dataset. In one shot,
it generated a little web application that enabled me to tweak the source word
and final translation. It also enabled me to throw away certain underlines.
This was useful when, across different sessions, I forgot and underlined
-the same word, or when I underlined a word but later decided it not worth
+the same word, or when I underlined a word but later decided it was not worth
including in my studying. This application produced an Anki deck, using
the Python library [`genanki`](https://github.com/kerrickstaley/genanki).
Anki has a nice mechanism to de-duplicate decks, which meant that every