@@ -18,12 +18,12 @@ and, viewed from this perspective, I think the experience has been a
colossal success.

As someone who works on software, I am always reminded that end-users rarely
care about the technology as much as we technologists do; they care about
having their problems solved. I find taking that perspective challenging
(though valuable) because software is my craft, and because in thinking
about the solution, I have to think about the elements that bring it to life.

With LLMs, I was able --- allowed? --- to view things more from the
end-user perspective. I didn't know, and didn't need to know, the API
for `PyMuPDF`, `argostranslate`, or `spaCy`. I didn't need to understand
the PDF format. I could move one step away from the nitty-gritty and focus
@@ -45,7 +45,7 @@ on code as a medium.
There are two perspectives through which one may view software:
as a craft in and of itself, and as a means to some end.
My flashcard extractor looks vastly different depending on which of
these two perspectives I take. In terms of craft, I think that it is at best
mediocre; most of the code is generated, slightly verbose, and somewhat
tedious. The codebase is far from inspiring, and if I had written it by hand,
I would not be particularly proud of it. In terms of product, though,
@@ -57,7 +57,7 @@ The truth is, the "builder vs. craftsman" distinction is a simplifying one,
another in the long line of "us vs. them" classifications. Any one person can
fall into any combination of these two camps at any given time. Indeed,
different sorts of software demand to be viewed through different lenses.
I will _still_ treat work on my long-term projects as a craft, because
I will come back to them again and again, and because our craft has evolved
to engender stability and maintainability.

@@ -93,11 +93,11 @@ I think that my flashcard generator is an early instance of such software.
It doesn't worry about various book formats, or various languages, or
various page layouts. The heuristic was tweaked to fit my use case, and
now works 100% of the time. I understand the software in its entirety.
I thought about sharing it --- and, in a way, I did, since it's
[open source](https://dev.danilafe.com/DanilaFe/vocab-builder) --- but realized
that outside of the constraints of my own problem, it likely would not be
of much use. I _could_ experiment with more varied constraints, but
that would turn it back into the sort of software I discussed above:
general, robust, and complex.

Today, I think that there is a whole class of software that is amenable to
@@ -112,7 +112,7 @@ if I had to give a rough heuristic, it would be problems that:
  etc. significantly raises the bar for quality.
  * e.g., I collect flashcards once every two weeks;
    I organize my filesystem once a month; I don't spend nearly enough money
    to want to regenerate cash flow charts very often
* __have an "answer" that's relatively easy to assess__, because
  LLMs are not perfect and iteration must be possible and easy.
  * e.g., I can see that all the underlined words are listed in my web app;
@@ -137,7 +137,7 @@ with others --- that last one because they can just ask as well.
#### The Unfair Advantage of Being Technical
I recognize that the success I've described here did not come for free. There
were numerous parts of the process where my software background helped
me get the most out of Codex.

For one thing, writing software trains us to think precisely about problems.
We learn to state exactly what we want, to decompose tasks into steps,
@@ -26,14 +26,14 @@ the latter was unpleasant, making me constantly break from the prose

In the end, I decided to underline the words and come back to them later.
However, even then, the task was fairly arduous. For one, words I didn't recognize
weren't always in their canonical form (they could be conjugated, plural, compound,
and more): I had to spend some time deciphering what I should add to a
flashcard. For another, I had to bounce between a PDF of my book
(from which, fortunately, I could copy-paste) and my computer. Often, a word
taken out of context confused the translation software, so I had to copy more of the
surrounding text. Finally, I learned that given these limitations, the pace of
my reading far exceeded the rate of my translation. This led me to underline
fewer words.

I thought,

@@ -60,10 +60,10 @@ interleaved with the technical details.
### The Core Solution
The core idea has always been:

1. Find things that look like underlines
2. See which words they correspond to
3. Perform {{< sidenote "right" "lemmatization-node" "lemmatization" >}}
Lemmatization (<a href="https://en.wikipedia.org/wiki/Lemmatization">Wikipedia</a>) is the
process of turning non-canonical forms of words (like <code>am</code> (eng) /
<code>suis</code> (fr)) into their canonical form, which might be found in the
dictionary (<code>to be</code> / <code>être</code>).
@@ -182,7 +182,7 @@ plenty of Flask applications in Codex's training dataset. In one shot,
it generated a little web application that enabled me to tweak the source word
and final translation. It also let me discard certain underlines.
This was useful when I forgot that I had already underlined a word and
underlined it again in a later session, or when I underlined a word but later
decided it was not worth including in my studying. The application produced an
Anki deck using the Python library [`genanki`](https://github.com/kerrickstaley/genanki).
Anki has a nice mechanism to de-duplicate decks, which meant that every