The quest for a better sweater pattern finder

Christine Lloyd
6 min read · Dec 15, 2020

The upside of pursuing a passion project is that it’s an excuse to learn new things, dive headfirst into new libraries, structure your program however you’d like, and generally have fun. The downside of pursuing a passion project is that, for me at least, the scope always seems to be substantially beyond my actual skills. At the moment, I have a small thing that sort of works, and what I’ve mostly learned is how I want to do everything differently when I move from Sweater Getter v0 to v1.

Let’s talk about it.

The problem I’m trying to solve:

If you’re a knitter and not yet skilled enough to write your own patterns, which definitely describes me, finding a pattern for the sweater that you want to knit is… kind of a slog, if not a total pain. In the days before Ravelry’s advanced search, it was undoubtedly enough hassle that a lot of people probably either learned to draft their own patterns or gave up. Now, if I know enough about the pattern I want (i.e. an adult cardigan with modified drop sleeves and negative ease), I can filter the number of sweater patterns down from 140k to mere hundreds or thousands. Did you know that there are a LOT of patterns on Ravelry for crewneck pullovers with interesting colorwork around the neck? So many. So very very many.

So what if we trained a neural network to recognize similar sweaters? I could then use it to take a picture of a sweater and recommend similar patterns from Ravelry!

It’s not a bad idea. It’s just not a very well-realized one yet.

Neural nets are smartest as one-trick ponies.

I’ve just finished listening to Janelle Shane’s You Look Like A Thing And I Love You, which is an absolute delight, and it helped reinforce a few things I’d already known about machine learning. The most important is that our current architectures are very much artificial narrow intelligences (ANI), not artificial general intelligences. And the more tightly you constrain the problem, the higher the performance ceiling of the ANI is likely to be. Neural nets have around as many connections as a worm’s brain does, and you can’t train a worm to drive a car. You can, however, train one worm to recognize common objects, another to understand how turning the wheel will change the trajectory at the current speed and road conditions, another to stay in its lane, etc.

My initial approach with Sweater Getter v0 was to find the eigensweaters present in the 250 most popular patterns on Ravelry. It shouldn’t have been surprising that there’s a lot of class imbalance between sorts of sweaters, meaning that I ended up with 1000 good pictures of crewneck sweaters with colorwork after my first pass but only 27 decent pictures of ballet neck sweaters of any kind. I kept on trucking with the 22 eigensweaters I’d identified, but the classification performance was never all that great. 65% accuracy is pretty good for 22 categories (random guessing would land under 5%), but I struggled to get anything better out of the model.

Some of the problem was certainly the imbalanced dataset: with 1200 pictures out of the ~4000 total being crewneck sweaters with colorwork of some kind, my attempt at regularization (trying to keep the network from just memorizing the training data) led to a network that just called everything a colorwork crewneck sweater. However, a large part of the problem was also that I was trying to make a relatively small classification head on top of a pretrained feature extractor do far too much. There were plenty of photos where I had to squint to figure out whether the color variation near the neckline was subtle colorwork or intricate cables. Some sweater photos scraped from the same pattern page got categorized as unstructured cardigans, while others where the collar was folded were obviously shawlneck cardigans. This is a hard problem for a human to solve; why am I surprised that an algorithm is also struggling?

Solution: separate classification heads. Train one to recognize necklines and collars, another to recognize overall shape of the sweater, another to recognize texture, and yet another to recognize colorwork. This is also helpful because it ups the number of photos I have for any one class of sweater — there are more librarian cardigans than there are librarian cardigans with stripes.
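To make that concrete, here’s a minimal sketch of what separate heads on one shared feature extractor could look like in Keras. The ResNet50 backbone, the head sizes, and the class counts are all placeholders I’m making up for illustration, not decisions I’ve actually committed to.

```python
from tensorflow import keras

# Frozen pretrained backbone shared by every head.
backbone = keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(224, 224, 3))
backbone.trainable = False

inputs = keras.Input(shape=(224, 224, 3))
features = backbone(inputs, training=False)

def head(name, n_classes):
    # One small, independent classification head per sweater attribute.
    x = keras.layers.Dense(128, activation="relu")(features)
    x = keras.layers.Dropout(0.3)(x)
    return keras.layers.Dense(n_classes, activation="softmax", name=name)(x)

model = keras.Model(inputs=inputs, outputs={
    "neckline": head("neckline", 8),    # crewneck, shawlneck, ballet neck, ...
    "shape": head("shape", 6),          # pullover, cardigan, ...
    "texture": head("texture", 5),      # cables, lace, plain, ...
    "colorwork": head("colorwork", 4),  # none, stripes, stranded, ...
})
model.compile(
    optimizer="adam",
    loss={name: "sparse_categorical_crossentropy"
          for name in ["neckline", "shape", "texture", "colorwork"]})
```

The nice part of this layout is that the expensive backbone runs once per image no matter how many attribute heads I tack on.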

Training a neural network can take half of forever

Even using a pre-trained feature extractor, one epoch of training (one pass through all the training images) took 1.5–2 hours on Google Colab. That really limited how thoroughly I could train the classification head, not to mention how many different classification head architectures I could iterate over.

Solution part 1: run all the images through the feature extractor once and save the resultant output tensors. Since the expensive calculations are done once and cached, training the classification heads requires far fewer computations and won’t take as long. In turn, that means I can experiment more to find the best architecture and train the heads more thoroughly. Once I’ve got an architecture that’s performing well, I can fine-tune the top few layers of the feature extractor to get better performance. That will require figuring out how to send input to the several different heads at the branching point, but that’s a problem for future me to tackle, and it is a solvable one.
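Here’s roughly what that caching step might look like, again in Keras. The directory name, the (224, 224) image size, and the ResNet50 backbone are assumptions for the sketch, not what v0 actually used.

```python
import numpy as np
from tensorflow import keras

backbone = keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg")

# One folder per class; shuffle=False keeps the order stable so the
# saved features and labels line up with each other.
dataset = keras.utils.image_dataset_from_directory(
    "sweater_photos/", image_size=(224, 224), batch_size=32, shuffle=False)

all_features, all_labels = [], []
for images, labels in dataset:
    x = keras.applications.resnet50.preprocess_input(images)
    all_features.append(backbone(x, training=False).numpy())
    all_labels.append(labels.numpy())

# Each image is now a 2048-dimensional vector; the heads train on these.
np.save("features.npy", np.concatenate(all_features))
np.save("labels.npy", np.concatenate(all_labels))
```

With the feature vectors saved to disk, a training epoch for a head becomes a pass over small vectors instead of full images, which should be dramatically faster.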

Bigger is both better and worse

Neural networks that have been trained on larger image sets tend to perform better, particularly if the architecture is larger; see Figure 5 of the Big Transfer paper for some support of this statement. However, that performance comes with costs. A bigger network takes longer to go from input to output, all other things being equal, because there are simply more calculations to do, and that’s to say nothing of the increased computational time to train the thing in the first place.

My best-performing sweater recognizer took minutes to go from image to output vector. I’m sure some people will be fine with letting that process run in the background if they’re warned that it’s going to take a while, but…

Solution (and part 2 of the training-speed fix): find a zippier pre-trained object-identification network to build on top of. In this context, it’s better to have fair-to-good results in under a minute than excellent results in tens of minutes.
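I haven’t picked the replacement backbone yet, but the comparison is easy to sketch. MobileNetV2 below is just my stand-in for “something zippier”; the point is to time candidate backbones on a single image before committing to one.

```python
import time
import numpy as np
from tensorflow import keras

# Two candidate backbones at opposite ends of the size spectrum.
candidates = {
    "MobileNetV2": keras.applications.MobileNetV2,
    "ResNet152": keras.applications.ResNet152,
}

image = np.random.rand(1, 224, 224, 3).astype("float32")
for name, build in candidates.items():
    net = build(include_top=False, weights="imagenet", pooling="avg")
    net(image)  # warm-up call so one-time setup cost isn't counted
    start = time.perf_counter()
    net(image)
    print(f"{name}: {time.perf_counter() - start:.3f} s per image")
```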

When training a machine learning algorithm, data acquisition will take much if not most of your time

I’m not going to tell you how much time I spent working on image scraping, eigensweater identification, and deleting not-so-useful-to-the-algorithm pictures like artful shots of skeins of yarn and loving closeups of cabling details on a pocket. This is partly because I didn’t track it but mostly because I really don’t want to try and calculate how many tens of hours of my life were dedicated to something that I could probably train a monkey to do.

Solution part 1: crowdsource pattern classification. The delightful success of SkyKnit shows that there are plenty of fiber artists out there who are happy to spend time on somebody’s weird AI project. I’m neither as hilarious nor as popular as Janelle Shane, but I’m hoping the potential usefulness of this project will convince some Ravelers to spend some time helping me out. (Some of you reading this may even be people who followed a link on a Ravelry discussion.) This will involve me actually figuring out how the Ravelry API works so I can serve up random sweater patterns, and deciding on vote thresholds for inclusion in the training data, but it’s a definite step forward from manually categorizing and labeling ALL THOSE SWEATER PHOTOS.
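The vote-threshold piece is the part I can prototype now, even before touching the API. A toy version, where both threshold numbers are placeholders I haven’t actually settled on:

```python
from collections import Counter

def accepted_label(votes, min_votes=5, min_agreement=0.8):
    """Return the winning label only if enough Ravelers agree on it."""
    if len(votes) < min_votes:
        return None  # not enough eyes on this photo yet
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

# Five of six voters say "crewneck": 5/6 ≈ 0.83 clears the 0.8 bar.
print(accepted_label(["crewneck"] * 5 + ["ballet neck"]))  # -> crewneck
print(accepted_label(["crewneck", "shawlneck"]))           # -> None (too few votes)
```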

Solution part 2: train a neural net to do it. Jacques Mattheij trained a neural network to recognize and sort LEGO pieces, so why not train one to recognize useful-to-the-sweater-getter and not-useful-to-the-sweater-getter photos? I don’t expect it will be perfect, but scanning for less-useful pictures that got through is much easier than examining every photo in detail.
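If the caching idea from earlier pans out, this filter could be a tiny binary head sitting on the same saved features. The seed-label file and the 0.5 cutoff below are pure assumptions for the sketch:

```python
import numpy as np
from tensorflow import keras

features = np.load("features.npy")    # cached backbone outputs from earlier
is_useful = np.load("is_useful.npy")  # hypothetical hand-labeled 0/1 seed set

filter_net = keras.Sequential([
    keras.Input(shape=(features.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
filter_net.compile(optimizer="adam", loss="binary_crossentropy")
filter_net.fit(features, is_useful, epochs=10, validation_split=0.2)

# Keep the confident yeses; spot-check the borderline ones by hand.
keep = filter_net.predict(features)[:, 0] > 0.5
```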

In conclusion, I’ve got a lot of delightful work to do

The upside of discovering with version 0 that you need to change almost everything about the project is that you can just… start over and start better. I can take the time to compare PyTorch with Keras+TF2. I can go slowly and make sure I’m coding well in VSCode rather than in a frenetic and impossible-to-replicate flurry of cells in Jupyter Lab (which is fabulous for experimentation and a terrible place for code to live long-term). I can actually talk to some other knitters and crocheters to get their feedback on what works best! I can learn some JavaScript instead of blindly hacking something together from example code. And there’s no grade at the end, just a potentially useful little web app.

There will probably be a number of update posts to come as I figure out various aspects of Sweater Getter v1, an actually functional thing, coming sometime to a webpage you can access easily from where you are now.

Written by Christine Lloyd

Into science communication and public health. Simultaneously overqualified and underqualified. Happy to geek out over many different subjects.
