Deeply Learned Typography

Tom7 creates the uppestcase and lowestcase letters by training two deep learning models: one to create uppercase letters from lowercase ones, and the other to create lowercase letters from uppercase ones. Then he pushes things (beyond) their logical conclusions, such as creating lowercase versions of lowercase letters, and uppercase versions of uppercase letters. The results are fonts you can download and “use.” In the process, he builds a lot of really neat custom UIs to visualize what the models are doing. It’s entertaining and interesting.

Via Macdrifter.

Two loaves of Ken Forkish’s Overnight Country Blonde. After a 15h bulk rise at 23°C, the dough had almost quadrupled in volume. Much more than usual. The final bread is a tad flat. I think it’s because it ran out of energy, but maybe I could have made deeper cuts.

Publishing datasette to Google Cloud Compute with GitHub Actions

Simon Willison has a fascinating data-publishing and data-management project named datasette. A few months ago, he put together a plugin named datasette-ripgrep that uses ripgrep (you use ripgrep, right?) to search folders of files and display the results using datasette’s machinery.

I thought of creating a datasette-ripgrep instance to search all the packages from the Enthought Tool Suite. Using GitHub to search across this cohesive set of tools, and only this set of tools, doesn’t really work.

Setting datasette-ripgrep up locally turned out to be pretty easy. But publishing it to Google Cloud Compute (GCP) using GitHub Actions, so I could automate the daily update of the content of the indexed repositories, turned out to be a multi-month effort.

I started working off the demo deploy action, which took me most of the way there. But I kept running into GCP authentication issues. It complained that “No credentials provided, skipping authentication”. That is, until I realized, after 2 months of on-and-off attempts, that I was putting GitHub secrets in Settings > Environment > Secrets, and not in Settings > Secrets. *slaps forehead* I’m sure actions can see secrets in the Environment section somehow, but I don’t know how. Another thing I learned is that when the GCP docs ask you to put the service account key in a GitHub secret, you can just paste the whole JSON as-is.

The next hurdle was that the datasette publish cloudrun command would fail with the error “You do not appear to have access to project […]”. I tried many things related to IAM, roles, service accounts and the like, but without success. The ah ha! moment came when I realized/remembered that datasette.publish.cloudrun actually talks to GCP using the gcloud command line tool. I identified that it calls the builds and deploy subcommands. Using that information, I could search for the permissions required to execute those commands. The one I was missing was Cloud Build Editor (and maybe Viewer).

In the end, the Service Account has the following roles (I’m not 100% sure they’re all necessary):

Cloud Build Editor
Compute Engine Service Agent
Service Account User
Cloud Run Admin
Storage Admin
Viewer
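As a rough sketch of how the roles above could be granted from the command line, the snippet below only builds and prints gcloud commands; it does not run them. The role IDs are my best guess at the identifiers behind the display names listed above, and the project ID and service account email are made-up placeholders, so check everything against the IAM documentation before using it.

```python
# Assumption: these role IDs correspond to the display names in the list above.
ROLES = {
    "Cloud Build Editor": "roles/cloudbuild.builds.editor",
    "Compute Engine Service Agent": "roles/compute.serviceAgent",
    "Service Account User": "roles/iam.serviceAccountUser",
    "Cloud Run Admin": "roles/run.admin",
    "Storage Admin": "roles/storage.admin",
    "Viewer": "roles/viewer",
}


def grant_commands(project_id, service_account_email):
    """Build one `gcloud projects add-iam-policy-binding` command per role."""
    member = f"serviceAccount:{service_account_email}"
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member={member} --role={role}"
        for role in ROLES.values()
    ]


# Hypothetical project and service account, for illustration only.
for command in grant_commands(
    "my-project", "deployer@my-project.iam.gserviceaccount.com"
):
    print(command)
```

Printing the commands instead of executing them makes it easy to review each binding, and to drop the roles that turn out to be unnecessary.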

After 100 failed deploys and much reading of mediocre Medium articles and of Google’s (seemingly) incomplete and incorrect READMEs, the 101st deploy succeeded! You can now search the ETS repos at the very unglamorous URL of https://datasette-ripgrep-ets-alicuzwd4a-uc.a.run.app and see the source on GitHub.

The discussion on symbiotic relationships between apps on the latest Core Intuition by @manton and @danielpunkass reminded me of Eastgate’s SummerFest/WinterFest for “artisanal” research/thinking/writing apps. It’s a great example of a collection of complementary apps.

What is the maximum one can be doing with someone else and still be hanging out?

Austin Kleon talks about the 13-month International Fixed Calendar. It reminds me of the ISO week numbers that the Danes love so much. It always weirded me out when they asked something like “Are you free week 14?” But thinking back, they might have been onto something.
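For anyone who needs to decode a “week 14” invitation, here is a small sketch using Python’s standard library (date.fromisocalendar needs Python 3.8 or later). The helper name iso_week_range is mine, and 2021 is just an example year.

```python
from datetime import date, timedelta


def iso_week_range(year, week):
    """Return the Monday and Sunday of the given ISO 8601 week."""
    monday = date.fromisocalendar(year, week, 1)  # ISO weekday 1 is Monday
    return monday, monday + timedelta(days=6)


start, end = iso_week_range(2021, 14)
print(start, end)  # 2021-04-05 2021-04-11
```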

Despite all the damage done by the ice in Texas, it still created quite a bit of beauty (or at least novelty).

Icicle agave during the Texas winter storm.

Lessons Learned During My PhD

I shared these notes with my friends and colleagues at the Technical University of Denmark (DTU) when I finished my PhD in December 2015.

Here are things that I either learned during my PhD, or that helped me get through it. I in no way have all the answers, but there are a few things that helped me that I think would be helpful to other people as well.

I split the content into six different sections: meta-things, writing, reading, speaking, programming, and mind and body. I included links to articles, books, and tools that are related to each section.

Writing

Build an Outline When Writing a Paper

At first, I did not have a method for writing, but then I read this article by Timothée Poisot, which really helped me. The main idea is to start building an outline as soon as you start working on a new project. As you read papers, or have ideas, you add quotes, citations, and snippets to your outline. You can start writing sentences and whole paragraphs whenever you want. The big advantage is that the relevant things you find for a given project are all together; no need to go hunting for where you read that thing six months ago.

On the Mac, I use Tree², which I love. OmniOutliner is the most famous alternative, but it is much more expensive. I don’t know about Windows or Linux. I know Microsoft Word has an outlining mode that could work. You could also outline in a normal Word document, or a text file, but good outliners give you many shortcuts to move things around and to insert new ideas. The new Manuscripts application for Mac looks amazing. Scrivener is an amazing writing tool for both Mac and Windows. It is particularly good for large projects.

Other good resources include:

Scientific / Academic Paper Writing Template | ORGANIZING CREATIVITY
Daniel Wessel: Using Content Outlines and Circus Ponies Notebooks for Writing Articles and Theses | Butler Library Blog explains in even more detail the idea of building an outline when writing.

Get Good Early and Write More

Do not despair, you will get better at writing! I don’t really know how to do it, but the earlier you get good, the easier your life will be. :-) A possible way to get better is certainly to write more. Maybe having a writing club would help. There are many good books about writing. I read How to Write a Lot, by Paul J. Silvia, and really liked it.

Here are a few articles about writing:

10 writing tips and the psychology behind them.
How to Be a Speed Writer supports the outlining idea.
Advice on Research and Writing is a collection of other articles on research, writing, speaking, etc.

Reading

Make Time

You never find time; you have to make some. Personally, I found out that I could read an article on the bus and while walking. When I needed to read many articles, I would take the bus mornings and nights for a week. That way I could read about 10 articles in a week. It helped me to keep a stack of papers that I wanted to read on my desk; I could grab one whenever I had a moment.

Read With a Goal

I find that it’s much easier to understand and remember papers if I read with a purpose, as a way of answering a question. Also, if that question is related to what you’re currently working on, you can add your discoveries in the outline of your paper.

Give Faces to Names

I found that being able to associate a face with an author name helped me remember who was talking about what and made it easier to remember different papers and ideas.

Take Notes and Highlights

That one is pretty obvious, but writing down notes and ideas while reading papers really helps. It helps while reading the paper but also later when coming back to it. I designed a highlight system for myself based on the idea of Walton Jones. I used specific colors to code different things: for example, paper-specific results were in yellow, new references in green, and paper summaries in red. This way I could look back at a paper and have a general overview of what was important without reading the whole paper again.

On the Mac, Skim allows you to export notes including their color. It also works well with DEVONthink (see below).

Have a Good “Personal Search Engine”

I recommend having a good search system for your notes and papers. On the Mac I use an application called DEVONthink which can show you documents related to the one you’re currently reading. It also does non-exact searches, e.g., it can search for the meaning of your query, not just the exact words you typed. It really is magic. I am not aware of anything exactly the same on Windows or Linux, but The Brain (Windows) and Recoll might be good places to start.

Give Your Articles Unique Identifiers

I gave all my articles the same file name structure, authorYEARfirstword, e.g. chabotleclerc2014predicting. I picked this format because it has no spaces and it’s also very short and compact. It’s essentially a unique identifier. I used the identifier everywhere: in handwritten notes, computer notes, outlines, marginalia. This way, I knew exactly which paper I was talking about.
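If you would rather generate these identifiers than type them, something like the sketch below could work. It is only my guess at the rules behind authorYEARfirstword (lowercase, accents and punctuation stripped, first word of the title); the function names and the example title are made up for illustration.

```python
import re
import unicodedata


def _ascii_lower(text):
    """Lowercase the text and strip accents, spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def article_id(first_author_surname, year, title):
    """Build an authorYEARfirstword identifier for a paper."""
    words = title.split()
    first_word = _ascii_lower(words[0]) if words else ""
    return f"{_ascii_lower(first_author_surname)}{year}{first_word}"


# Hypothetical title, for illustration only.
print(article_id("Chabot-Leclerc", 2014, "Predicting a hypothetical outcome"))
# chabotleclerc2014predicting
```

Two papers by the same author in the same year that start with the same word would collide, so a real version would need a tie-breaker such as a trailing letter.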

Remembering Things

I tried to use spaced repetition (also see this link) for a while, but it didn’t stick for very long. Anki is probably the best tool for the job.

Things That Didn’t Work

Having a Wiki

Based on the idea of this crazy guy, Stian Håklev, I tried setting up a wiki using either DokuWiki or TiddlyWiki. I managed to keep it updated for about a month and then abandoned it. It required a lot of maintenance and a lot of diligence to keep up with the linking between subjects.

2021 update: These days, I would strongly recommend looking into Zettelkasten as an alternative.

A Single Giant Outline, Mind Map or Concept Map

I tried building a gigantic outline or mind map or concept map of all the things I learned, but it became unmanageable. I still find these methods really useful when fleshing out an idea for a project or paper, but they were not really good tools for me to manage such a large amount of knowledge. If you want to try these tools, on the Mac I recommend MindNode (simple and cheap), and iThoughtsX (more powerful, more expensive). MindMeister is online and very popular. Docear mixes an outliner and a reference manager (JabRef). It’s powerful and complex. If you want to know everything about mind mapping and the best tools, check out Brett Terpstra’s blog.

For concept maps, I love Scapple, which works on Mac and Windows. Cmap is also multi-platform, but heavier. I use them when I want to find structure in a mess of ideas floating in my head.

Speaking

This series of articles by David L Stern really influenced me: How to Give a Talk. He suggests 5 rules³:

Don’t put words on slides
Use black slides
Show your data
Don’t tell jokes
Don’t take a data dump on your audience
Practice, practice, practice.

Of course, some rules are meant to be broken, but it’s worth a read. I also liked the book Presentation Zen: Simple Ideas on Presentation Design and Delivery by Garr Reynolds.

Programming

When Modeling, Optimize for a Metric

When working on a new model, first decide what you’re optimizing for: is it a correlation, the mean squared error, or some other metric? Make sure this metric is computed automatically with each simulation, so you have immediate feedback about how good your model is. Just eyeballing the results is a really bad idea.
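Here is a minimal sketch of what “computed automatically with each simulation” could look like. The run_simulation wrapper and the toy model are hypothetical; the point is only that every run returns its metrics alongside its predictions.

```python
import math


def mean_squared_error(predicted, observed):
    """Average of the squared differences between predictions and data."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between predictions and data."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def run_simulation(model, inputs, observed):
    """Run the model and always report the metrics with the predictions."""
    predicted = [model(x) for x in inputs]
    metrics = {
        "mse": mean_squared_error(predicted, observed),
        "correlation": pearson_correlation(predicted, observed),
    }
    return predicted, metrics


# Toy model and data, for illustration only.
_, metrics = run_simulation(lambda x: 2 * x, [1, 2, 3], [2.5, 3.5, 6.5])
print(metrics)
```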

D.R.Y.

Don’t. Repeat. Yourself. When you see that you have the same code in multiple places, it’s a good hint that you need to refactor it into a function, and to call that function. Also, if your function is a hundred lines long, it’s probably a sign that it should be chopped into smaller functions that have clear names.
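As a made-up illustration, imagine the same peak normalization copy-pasted for two signals. Pulling it into one named function means there is only one place to fix when it turns out to be wrong.

```python
# Before (repeated): the same expression written out for each signal.
#   left = [s / max(abs(v) for v in left_raw) for s in left_raw]
#   right = [s / max(abs(v) for v in right_raw) for s in right_raw]


def normalize_peak(samples):
    """Scale samples so that the largest absolute value is 1."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # an all-zero signal stays as it is
    return [s / peak for s in samples]


# After: one function, called wherever it is needed.
left = normalize_peak([0.5, -2.0, 1.0])
right = normalize_peak([4.0, 2.0])
print(left, right)  # [0.25, -1.0, 0.5] [1.0, 0.5]
```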

If you’re interested in becoming a better programmer, I recommend The Pragmatic Programmer, by Andrew Hunt and David Thomas, Clean Code and Code Complete.

Write Tests

You should write tests for your functions to make sure that they are actually doing what you think they are doing. They help make sure that code is correct, they help detect regressions (when you break things that used to work), they act as a specification, design and documentation, and they make refactoring easier.

I cannot tell you how many times I found small and huge bugs because of the tests I wrote. All decent programming languages, including Matlab, include a framework to automatically run tests. The words you want to look for are “unit testing”. You should run tests every time you make changes. You can even set Git to run the tests before every commit. This way you can make sure you never commit broken code.
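To give an idea of what a unit test looks like, here is a small sketch using Python’s built-in unittest module. The rms function is a made-up example; the tests pin down what it should do for a few easy cases, including one that should fail loudly.

```python
import unittest


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return (sum(s ** 2 for s in samples) / len(samples)) ** 0.5


class TestRms(unittest.TestCase):
    def test_constant_signal(self):
        self.assertAlmostEqual(rms([2.0, 2.0, 2.0]), 2.0)

    def test_sign_does_not_matter(self):
        self.assertAlmostEqual(rms([1.0, -1.0]), 1.0)

    def test_empty_input_raises(self):
        # Dividing by the length of an empty list should not pass silently.
        with self.assertRaises(ZeroDivisionError):
            rms([])
```

If this is saved in a file named, say, test_rms.py, running python -m unittest in the same folder finds and runs the three tests.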

The advantage of working with a free language (Python!) and in the open is that you can use a “continuous integration” system to run your tests every time you push new changes to GitHub (or BitBucket). Travis CI is one of those services.

These articles or sites are worth reading for info about testing.

Test-Driven Data Analysis. Start here.
Is your research software correct, by Mike Croucher, a Research Software Engineer at The University of Sheffield.

Don’t Overuse Code Comments

Code comments are useful to explain why you’re doing something, but they should not explain every single line of code. Otherwise one day you will change the comment or the code and then one of them will be lying and you won’t be able to know which one.

Instead, use explicit names for variables and functions. Write function help files. Write documentation. Write README files. And break up long and complicated lines of code into smaller ones where you can name concepts and variables.
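A small, made-up example of the idea: instead of one dense line with a comment explaining it, name the concepts so that the code explains itself. The thresholds and the data are hypothetical.

```python
# Before: one dense line that needs a comment to be understood.
#   keep = [r for r in res if r[1] > 0.5 and r[2] < 20]  # good score, low noise

SCORE_THRESHOLD = 0.5  # hypothetical cut-off for a usable score
MAX_NOISE_DB = 20.0    # hypothetical upper bound on the noise level


def is_reliable(result):
    """Tell whether a (subject, score, noise_db) result should be kept."""
    _subject, score, noise_db = result
    has_good_score = score > SCORE_THRESHOLD
    has_low_noise = noise_db < MAX_NOISE_DB
    return has_good_score and has_low_noise


results = [("s1", 0.8, 10.0), ("s2", 0.3, 5.0), ("s3", 0.9, 25.0)]
reliable_results = [r for r in results if is_reliable(r)]
print(reliable_results)  # [('s1', 0.8, 10.0)]
```

The only comments left say why the numbers are what they are, which is the part the code cannot say on its own.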

Use Version Control

Version control is a communication tool and a collaboration tool. It’s invaluable even if you don’t work with anyone else: I can assure you that your future self will be really happy to know what your past self was thinking when he/she wrote that code/text. I heard your future self can get really upset. :-)

Track Where Stuff Comes From

Track where experiment results and figures come from. You can do it by hand, by keeping a log book and tracking output folders and input commits, but you can also use automated solutions. I used Sumatra, which is written in Python but can be used with Matlab. Every time you run a simulation, it saves to a database the experimental parameters used, the input files, and the command line outputs. It also allows you to comment and tag results. You can look back and see where things are from, what worked and what didn’t, and rerun experiments with exactly the same parameters. A provenance-tracking tool like Sumatra paired with version control is a good step towards making reproducible research. Recipy is a new Python-only solution that looks interesting.
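To make the “by hand” option concrete, here is a bare-bones sketch of a run log. It is not Sumatra’s API, just an illustration of the kind of record worth keeping; the function names, the field names and the JSON-lines format are all my own choices, and code_version is whatever identifier you use for your code (a Git commit hash, for example).

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Fingerprint an input file so that later changes can be detected."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_run(log_path, parameters, input_files, code_version, comment=""):
    """Append one record describing a simulation run to a JSON-lines file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,
        "parameters": parameters,
        "inputs": {str(p): file_sha256(p) for p in input_files},
        "comment": comment,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Calling log_run at the start of every simulation gives you one line per run, which you can later search to find which parameters and inputs produced a given figure.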

Follow “Best practices”

Some very bright people have written “Best practices” for scientific computing. It’s worth reading the two or three articles below and applying their recommendations. I also recommend looking at the shablona project, which proposes a standard folder structure for a given project. I found it to work really well. It’s written with Python in mind, but it could easily be adapted for Matlab.

Best Practices for Scientific Computing
Good Enough Practices for Scientific Computing is a follow-up article.

Learn about literate programming, a useful concept where the code and the text (of, e.g., an article) live together. This can be done in Matlab using the publish command. In Python, the Jupyter notebook is the way to go. Actually, you can use the Jupyter notebook with Matlab.

If you want to do more than just read things, you should try organizing a Software Carpentry workshop. It’s pretty much free and would probably help a lot of people.

Your Computer Should Work for You

If you find out you’re doing the same thing over and over again, it might be worth automating it. On the Mac, Keyboard Maestro can do magic (like activating menu items for you, clicking places, moving windows, etc.). On Windows, AutoHotkey can do similar things, and on Linux, I found AutoKey to be the best.

Use a shortcut expander application. For example, I never type “intelligibility”, I just type inty and it expands to the full word. Same thing with my name, my bank account, my phone number, etc. On the Mac, Keyboard Maestro and TextExpander are the best. On Windows, PhraseExpress is compatible with TextExpander; you can sync your snippets via Dropbox. On Linux, AutoKey can also do snippet expansions.

Mind and body

Take Care of Your Body

Seriously, take care of your body. If you start having wrist, forearm, or shoulder pain, don’t just live with it. Seek help and get a proper keyboard, mouse and chair. Learn some basic ergonomic practices about keyboard, mouse and screen placement. See the articles below for more info and recommendations.

If you are in pain, really look for help: find a massage therapist, a doctor, and stretch often. Here are a few good resources.

10 Tips for Using a Computer Mouse
A long list of ergonomic keyboards, by xahlee.
The Microsoft Sculpt is probably the most affordable ergonomic keyboard.
Kinesis makes amazing products. I have an Advantage keyboard and an Evoluent Vertical Mouse. The Advantage is amazing.
If you have any wrist pain, I (Alex) highly recommend the book It’s Not Carpal Tunnel Syndrome!: RSI Theory and Therapy for Computer Professionals by Suparna Damany and Jack Bellis. The exercises in it really helped me.

Spend Money Where Your Time Is

It’s worth paying for things that make your life better, more comfortable, and easier, especially if you use those things all day long. Buy a nice keyboard; you probably write eight or nine hours a day. Buy nice headphones, good software, a mattress, a bike, etc.

Ask for Help

Ask for help when you’re stuck. Ask for a meeting if you need one. Everyone is super busy, but everyone is super generous. People rarely say no, but they might say “later”. In a perfect world, everyone would be proactive, everyone would have their slot in the calendar, but sometimes the world is not perfect. Also, talking to people is often waaaay faster than googling for an answer.

Make Every Day a Non-Zero Day

This idea is not from me, it is from ryans01 on Reddit (and it’s also in the PhD Starter Kit, below). His idea is to make sure that every day you do at least one small thing towards your goal. No day should be completely wasted, even if the thing you do is really small. Keeping a log or journal helps you realize all the things you’ve done in a day and keeps your mood up.

Other Guides Like This One

Other people have written really good and more in-depth guides than this. The PhD Starter Kit is simply amazing. Philip Guo’s Advice for first year PhD students is a must-read/must-watch (as well as many of his other articles). He also wrote: “The Ph.D. Grind, a 115-page e-book, is the first known detailed account of an entire Ph.D. experience.” That guy is amazing.

¹ Make it 13 years in 2021.
² Not available anymore, 2021-01-23.
³ Yes, that’s 6 rules. And in 2021, it grew to 8 rules. ¯\_(ツ)_/¯

They could have used additional time during the bulk fermentation to account for the cold, but it’s still a good rise.

Follow @alexchabot on Micro.blog.