Publishing datasette to Google Cloud Compute with GitHub Actions

Simon Willison has a fascinating data-publishing and data-management project named datasette. A few months ago, he put together a plugin named datasette-ripgrep that uses ripgrep (you use ripgrep, right?) to search folders of files and display the results using datasette’s machinery.

I thought of creating a datasette-ripgrep instance to search all the packages from the Enthought Tool Suite. Using GitHub to search across this cohesive set of tools, and only this set of tools, doesn’t really work.

Setting datasette-ripgrep up locally turned out to be pretty easy. But publishing it to Google Cloud Compute (GCP) using GitHub Actions, so I could automate the daily update of the content of the indexed repositories, turned out to be a multi-month effort.

I started working off the demo deploy action, which took me most of the way there. But I kept running into GCP authentication issues. It complained that “No credentials provided, skipping authentication”. That is, until I realized, after 2 months of on-and-off attempts, that I was putting GitHub secrets in Settings > Environment > Secrets, and not in Settings > Secrets. *slaps forehead* I’m sure actions can see secrets in the Environment section somehow, but I don’t know how. Another thing I learned is that when the GCP docs ask you to put the service account key in a GitHub secret, you can just paste the whole JSON as-is.

The next hurdle was that the datasette publish cloudrun command would fail with the error “You do not appear to have access to project […]”. I tried many things related to IAM, roles, service accounts and the like, but without success. The aha! moment came when I realized/remembered that datasette.publish.cloudrun actually talks to GCP using the gcloud command line tool. I identified that it calls the builds and deploy subcommands. Using that information, I could search for which permissions were required to execute those commands. The one I was missing was Cloud Build Editor (and maybe Viewer).

In the end, the Service Account has the following roles (I’m not 100% sure they’re all necessary):

  • Cloud Build Editor
  • Compute Engine Service Agent
  • Service Account User
  • Cloud Run Admin
  • Storage Admin
  • Viewer

After 100 failed deploys and much reading of mediocre Medium articles and of Google’s (seemingly) incomplete and incorrect READMEs, the 101st deploy succeeded! You can now search the ETS repos at the very unglamorous URL of https://datasette-ripgrep-ets-alicuzwd4a-uc.a.run.app and see the source on GitHub.

The discussion on symbiotic relationships between apps on the latest Core Intuition by @manton and @danielpunkass reminded me of Eastgate’s SummerFest/WinterFest for “artisanal” research/thinking/writing apps. It’s a great example of a collection of complementary apps.

What is the maximum one can be doing with someone else and still be hanging out?

Austin Kleon talks about the 13-month International Fixed Calendar. It reminds me of the ISO week numbers that the Danes love so much. It always weirded me out when they asked something like “Are you free week 14?” But thinking back, they might have been onto something.

Despite all the damage done by the ice in Texas, it still created quite a bit of beauty (or at least novelty).

Icicle agave during the Texas winter storm.

Lessons Learned During My PhD

I shared these notes with my friends and colleagues at the Technical University of Denmark (DTU) when I finished my PhD in December 2015.

Here are things that I either learned during my PhD, or that helped me get through it. I in no way have all the answers, but there are a few things that helped me that I think would be helpful to other people as well.

I split the content into six sections: meta-things, writing, reading, speaking, programming, and mind and body. I included links to articles, books, and tools that are related to each section.

Meta

The main recommendation here is to have a system to deal with all the things you have to do and the things you want to do. I personally used the GTD method, which stands for Getting Things Done. I highly recommend getting the latest book. You can find it delivered at your desk for 100 kr. The main ideas of the system are: everything goes in an inbox (physical and/or on a computer). Periodically, process the inbox and decide if you’ll do the thing, trash it, or store it in your reference system. “Tasks” that require more than two actions are considered projects, and a project consists of a list of physical, actionable actions. The other important thing about GTD is that it includes a weekly review of all the projects in your system, as well as your calendar and other active things in your life. I learned about this almost 10 years ago now¹ and I would go as far as to say that it changed my life.

There are many applications designed to implement the GTD system. On the Mac, there’s OmniFocus, Things, and TaskPaper (really worth checking). Todoist is multi-platform.

I also recommend reading Getting Results the Agile Way, by J.D. Meier, which proposes a different take on selecting what you should work on. The basic idea is to pick three things to focus on every day, three per week, three per month, and three per year. The things you do every day should go towards your weekly goals, etc. The Asian Efficiency blog (weird name, I know) is a good noise-free resource about that kind of material (check the “specialty topics” in the sidebar).

Make checklists

Keep checklists for processes and things you do often but with lots of time in between, like logging on to remote computers, adding printers to your computer, etc. It will save you a lot of time in the long run. Similarly, save info about things you never remember how to do, such as how the FFT implementation in MATLAB behaves, how to check if a file exists using bash, how to undo a Git merge, etc.

I keep all my notes in individual text files. I have about 800. They live on Dropbox, which means I can edit them and view them anywhere. On my Mac I use nvAlt to search and write them. ResophNotes does the same thing on Windows. Evernote is a web-based service that can be used for a similar purpose. I write all my notes in Markdown (I even wrote a paper in it!), which also allows me to easily export my notes or view them in nice previews. Both nvAlt and ResophNotes are “markdown-aware”.

Keep a Logbook

I kept a daily or weekly text file in which I wrote almost everything I did within the day. I would write down things that I wanted to do, notes to clarify my thoughts, or the things I had just done. For example, I would save the Git commit for a particular simulation and my comments about the results. It helped me when I wanted to look back at why I did certain things or to find the source of certain ideas.

Sometimes Paper Is the Best Tool for the Job

I tried doing everything on the computer, but sometimes paper really is the best tool for the job. The final product doesn’t have to be on paper, but paper is often really helpful to develop ideas because of the freedom you have to place things wherever you want!

If It’s Broken, Fix It

If you see something that is broken, or that could be better, just fix it. Especially if other people have noticed the same problem. I mean both immaterial things (BitBucket, for example) and physical things. Did you know you can report broken things to CAS and they’ll come repair them within a day or so? Blocked toilet, dead light, water leak? Report it at https://fejlmeld.cas.dtu.dk or use their iOS or Android apps (can’t find the link, but look for DTU Fejlrapportering).

Writing

Build an Outline When Writing a Paper

At first, I did not have a method for writing but then I read this article by Timothée Poisot, which really helped me. The main idea is to start building an outline as soon as you’re working on the new project. As you read papers, or have ideas, you add quotes, citations, and snippets to your outline. You can start writing sentences and whole paragraphs whenever you want. The big advantage is that all the relevant things you find for a given project are all together; no need to go hunting for where you read that thing, six months ago.

On the Mac, I use Tree2, which I love. OmniOutliner is the most famous alternative, but it is much more expensive. I don’t know about Windows or Linux. I know Microsoft Word has an outlining mode that could work. You could also outline in a normal Word document, or a text file, but good outliners give you many shortcuts to move things around and to insert new ideas. The new Manuscripts application for Mac looks amazing. Scrivener is an amazing writing tool for both Mac and Windows. It is particularly good for large projects.

Other good resources include:

Get Good Early and Write More

Do not despair, you will get better at writing! I don’t really know how to do it, but the earlier you get good the easier your life will be. :-) A possible way to get better is certainly to write more. Maybe having a writing club would help. There are many good books about writing. I read How to write a lot, by Paul J. Silvia and really liked it.

Here are a few articles about writing:

Reading

Make Time

You never find time, you’ll have to make some. Personally, I found out that I could read an article on the bus and while walking. When I needed to read many articles, I would take the bus mornings and nights for a week. That way I could read about 10 articles in a week. It helped me to keep a stack of papers that I wanted to read on my desk; I could grab one whenever I had a moment.

Read With a Goal

I find that it’s much easier to understand and remember papers if I read with a purpose, as a way of answering a question. Also, if that question is related to what you’re currently working on, you can add your discoveries in the outline of your paper.

Give Faces to Names

I found that being able to associate a face with an author name helped me remember who was talking about what and made it easier to remember different papers and ideas.

Take Notes and Highlights

That one is pretty obvious, but writing down notes and ideas while reading papers really helps. It helps while reading the paper but also later when coming back to it. I designed a highlight system for myself based on the idea of Walton Jones. I used specific colors to code different things, for example paper-specific results were in yellow, new references in green, and paper summaries in red. This way I could look back at a paper and have a general overview of what was important without reading the whole paper again.

On the Mac, Skim allows you to export notes including their color. It also works well with DEVONthink (see below).

Have a Good “Personal Search Engine”

I recommend having a good search system for your notes and papers. On the Mac I use an application called DEVONthink which can show you documents related to the one you’re currently reading. It also does non-exact searches, e.g., it can search for the meaning of your query, not just the exact words you typed. It really is magic. I am not aware of anything exactly the same on Windows or Linux, but The Brain (Windows) and Recoll might be good places to start.

Give Your Articles Unique Identifiers

I gave all my articles the same file name structure, authorYEARfirstword, e.g. chabotleclerc2014predicting. I picked this format because it has no space and it’s also very short and compact. It’s essentially a unique identifier. I used the identifier everywhere: in handwritten notes, computer notes, outlines, marginalia. This way, I knew exactly which paper I was talking about.
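As a minimal sketch of this naming scheme, assuming the identifier is built from the first author's surname, the year, and the first word of the title: the function name and the accent-stripping step are illustrative choices, not part of the original scheme, and the title below is a placeholder.

```python
import re
import unicodedata


def article_identifier(first_author_surname: str, year: int, title: str) -> str:
    """Build an authorYEARfirstword identifier for a paper."""

    def squash(text: str) -> str:
        # Strip accents, lowercase, and drop anything that is not a letter
        # or digit, so the identifier has no spaces or punctuation.
        ascii_text = (
            unicodedata.normalize("NFKD", text)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        return re.sub(r"[^a-z0-9]", "", ascii_text.lower())

    words = title.split()
    first_word = words[0] if words else ""
    return f"{squash(first_author_surname)}{year}{squash(first_word)}"


# Placeholder title: only its first word matters for the identifier.
print(article_identifier("Chabot-Leclerc", 2014, "Predicting (rest of the title)"))
# prints chabotleclerc2014predicting
```

Because the result contains only lowercase letters and digits, it works equally well as a file name, a citation key, or a tag in handwritten notes.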

Remembering Things

I tried to use spaced repetition (also see this link) for a while but it didn’t stick for very long. Anki is probably the best tool for the job.

Things That Didn’t Work

Having a Wiki

Based on the idea of this crazy guy, Stian Håklev, I tried setting up a wiki using either DokuWiki or TiddlyWiki. I managed to keep it updated for about a month and then abandoned it. It required a lot of maintenance and a lot of diligence to keep up with the linking between subjects.

2021 update: These days, I would strongly recommend looking into Zettelkasten as an alternative.

A Single Giant Outline, Mind Map or Concept Map

I tried building a gigantic outline or mind map or concept map of all the things I learned, but it became unmanageable. I still find these methods really useful when fleshing out an idea for a project or paper, but they were not really good tools for me to manage such a large amount of knowledge. If you want to try these tools, on the Mac I recommend MindNode (simple and cheap), and iThoughtsX (more powerful, more expensive). MindMeister is online and very popular. Docear mixes a mind mapper, a reference manager (JabRef) and an outliner. It’s powerful and complex. If you want to know everything about mind mapping and the best tools, check out Brett Terpstra’s blog.

For concept maps, I love Scapple, which works on Mac and Windows. Cmap is also multi-platform, but heavier. I use them when I want to find structure in a mess of ideas floating in my head.

Speaking

This series of articles by David L Stern really influenced me: How to Give a Talk. He suggests 5 rules³:

  1. Don’t put words on slides
  2. Use black slides
  3. Show your data
  4. Don’t tell jokes
  5. Don’t take a data dump on your audience
  6. Practice, practice, practice.

Of course, some rules are meant to be broken, but it’s worth a read. I also liked the book Presentation Zen: Simple Ideas on Presentation Design and Delivery by Garr Reynolds.

Programming

When Modeling, Optimize for a Metric

When working on a new model, first decide what you’re optimizing for: is it a correlation, the mean squared error, or some other metric? Make sure this metric is computed automatically with each simulation, so you have immediate feedback about how good your model is. Just eyeballing the results is a really bad idea.
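Here is a minimal sketch of what "computed automatically with each simulation" could look like, using only the Python standard library. The function names and the idea of returning both metrics in one dictionary are assumptions made for illustration; pick whichever single metric you decided to optimize for.

```python
import math
from typing import Dict, Sequence


def mean_squared_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average of the squared differences between predictions and data."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)


def pearson_correlation(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and data."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    covariance = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / math.sqrt(var_p * var_m)


def report_metrics(predicted: Sequence[float], measured: Sequence[float]) -> Dict[str, float]:
    """Call this at the end of every simulation run for immediate feedback."""
    return {
        "mse": mean_squared_error(predicted, measured),
        "correlation": pearson_correlation(predicted, measured),
    }
```

Calling report_metrics at the end of every run, and saving its output next to the results, replaces eyeballing with a number you can compare across runs.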

D.R.Y.

Don’t. Repeat. Yourself. When you see that you have the same code in multiple places, it’s a good hint that you need to refactor it into a function, and to call that function. Also, if your function is a hundred lines long, it’s probably a sign that it should be chopped into smaller functions that have clear names.
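A small illustration of the idea, with made-up signal values: instead of copying the same scaling arithmetic for every signal, the repeated steps live in two short, clearly named functions. The function names and the RMS-scaling task are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def scale_to_rms(signal: Sequence[float], target_rms: float) -> List[float]:
    """Scale a signal so that its RMS level equals target_rms."""
    current_rms = rms(signal)
    if current_rms == 0:
        raise ValueError("cannot scale a silent signal")
    gain = target_rms / current_rms
    return [x * gain for x in signal]


# The same function is reused for every signal instead of repeating
# the scaling arithmetic in several places.
speech = scale_to_rms([0.1, -0.2, 0.3], target_rms=1.0)
noise = scale_to_rms([0.5, 0.5, -0.5], target_rms=1.0)
```

If the scaling rule ever changes, there is now exactly one place to change it.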

If you’re interested in becoming a better programmer, I recommend The Pragmatic Programmer, by Andrew Hunt and David Thomas, Clean Code and Code Complete.

Write Tests

You should write tests for your functions to make sure that they are actually doing what you think they are doing. They help make sure that code is correct, they help detect regressions (when you break things that used to work), they act as a specification, design and documentation, and they make refactoring easier.

I cannot tell you how many times I found small and huge bugs because of the tests I wrote. All decent programming languages, including Matlab, include a framework to automatically run tests. The words you want to look for are “unit testing”. You should run tests every time you make changes. You can even set Git to run the tests before every commit. This way you can make sure you never commit broken code.
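For example, here is a minimal sketch of unit tests using Python's built-in unittest framework. The function under test, snr_db, is a hypothetical example written for this illustration; the point is the shape of the tests, not the function itself.

```python
import math
import unittest


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels, from two power values."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10 * math.log10(signal_power / noise_power)


class SnrDbTest(unittest.TestCase):
    def test_equal_powers_give_zero_db(self):
        self.assertAlmostEqual(snr_db(2.0, 2.0), 0.0)

    def test_tenfold_power_ratio_is_ten_db(self):
        self.assertAlmostEqual(snr_db(10.0, 1.0), 10.0)

    def test_rejects_non_positive_power(self):
        # Documents the expected behavior for invalid input.
        with self.assertRaises(ValueError):
            snr_db(1.0, 0.0)
```

Saved in a file whose name starts with test_, these tests run with python -m unittest from the project folder. Each test name doubles as a line of specification: reading the names tells you what the function promises to do.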

The advantage of working with a free language (Python!) and in the open is that you can use a “continuous integration” system to run the tests every time you push new changes to GitHub (or BitBucket). Travis CI is one of those services.

These articles or sites are worth reading for info about testing.

Don’t Overuse Code Comments

Code comments are useful to explain why you’re doing something, but they should not explain every single line of code. Otherwise one day you will change the comment or the code and then one of them will be lying and you won’t be able to know which one.

Instead, use explicit names for variables and functions. Write function help files. Write documentation. Write README files. And break up long and complicated lines of code into smaller ones where you can name concepts and variables.
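To make this concrete, here is a small sketch where a one-line formula is broken into named steps, and the only comment explains why rather than what. The function and variable names, and the choice of scaling the noise, are assumptions made for the illustration.

```python
import math


def noise_gain_for_target_snr(speech_rms: float, noise_rms: float, target_snr_db: float) -> float:
    """Return the linear gain to apply to the noise to reach the target SNR."""
    # Why scale the noise rather than the speech: the speech level then
    # stays the same across conditions (an assumption of this example).
    current_snr_db = 20 * math.log10(speech_rms / noise_rms)
    required_noise_change_db = current_snr_db - target_snr_db
    return 10 ** (required_noise_change_db / 20)
```

The same computation fits on one line, but the intermediate names say what each quantity means, so no line-by-line comments are needed and there is nothing to fall out of sync with the code.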

Use Version Control

Version control is a communication tool and a collaboration tool. It’s invaluable even if you don’t work with someone else. I can assure you that your future self will be really happy to know what your past self was thinking when he/she wrote that code/text. I heard your future self can get really upset. :-)

Track Where Stuff Comes From

Track where experiment results and figures come from. You can do it by hand, by keeping your log book and tracking output folders and input commits, but you can also use automated solutions. I used Sumatra, which is written in Python but can be used with Matlab. Every time you run a simulation it saves to a database: the experimental parameters used, the input files, and the command line outputs. It also allows you to comment and tag results. You can look back and see where things are from, what worked and what didn’t, and rerun experiments with exactly the same parameters. A provenance-tracking tool like Sumatra paired with version control is a good step towards making reproducible research. Recipy is a new Python-only solution that looks interesting.
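To show the kind of information worth recording, here is a hand-rolled sketch that appends one JSON record per run to a log file. This is not Sumatra's or Recipy's API; the function names, the record fields, and the JSON-lines file format are all assumptions made for the illustration.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Iterable


def current_git_commit() -> str:
    """Return the current commit hash, or "unknown" outside a Git repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"
    return result.stdout.strip()


def record_run(
    log_path: str,
    parameters: Dict[str, Any],
    input_files: Iterable[str],
    outcome: Dict[str, Any],
    comment: str = "",
) -> Dict[str, Any]:
    """Append one provenance record (one JSON object per line) to log_path."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "commit": current_git_commit(),
        "parameters": parameters,
        "input_files": [str(path) for path in input_files],
        "outcome": outcome,
        "comment": comment,
    }
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record
```

Calling record_run at the end of each simulation gives you a searchable history linking every result to the code version and parameters that produced it. A dedicated tool does much more, but even this much makes it possible to answer "where did this figure come from?".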

Follow “Best practices”

Some very bright people have written “Best practices” for scientific computing. It’s worth reading the two or three articles below and applying their recommendations. I also recommend looking at this shablona project, which proposes a standard folder structure for a given project. I found it to work really well. It’s written with Python in mind but it could easily be adapted for Matlab.

Learn about literate programming, a useful concept where the code and the text (of, e.g., an article) live together. This can be done in Matlab using the publish command. In Python, the Jupyter notebook is the way to go. Actually, you can use the Jupyter notebook with Matlab.

If you want to do more than just read things, you should try organizing a Software Carpentry workshop. It’s pretty much free and would probably help a lot of people.

Your Computer Should Work for You

If you find out you’re doing the same thing over and over again, it might be worth automating it. On the Mac, Keyboard Maestro can do magic (like activating menu items for you, clicking places, moving windows, etc.). On Windows, AutoHotkey can do similar things, and on Linux, I found AutoKey to be the best.

Use a shortcut expander application. For example, I never type “intelligibility”, I just type inty and it expands to the full word. Same thing with my name, my bank account, my phone number, etc. On the Mac, Keyboard Maestro and TextExpander are the best. On Windows, PhraseExpress is compatible with TextExpander; you can sync your snippets via Dropbox. On Linux, AutoKey can also do snippet expansions.

Mind and body

Take Care of Your Body

Seriously, take care of your body. If you start having wrist, forearm, or shoulder pain, don’t just live with it. Seek help and get a proper keyboard, mouse and chair. Learn some basic ergonomic practices about keyboard, mouse and screen placement. See the articles below for more info and recommendations.

If you are in pain, really look for help: find a massage therapist, a doctor, and stretch often. Here are a few good resources.

Spend Money Where Your Time Is

It’s worth paying for things that make your life better, more comfortable, and easier, especially if you use those things all day long. Buy a nice keyboard, you probably write eight or nine hours a day. Buy nice headphones, good software, a mattress, a bike, etc.

Ask for Help

Ask for help when you’re stuck. Ask for a meeting if you need one. Everyone is super busy, but everyone is super generous. People rarely say no, but they might say “later”. In a perfect world, everyone would be proactive, everyone would have their slot in the calendar, but sometimes the world is not perfect. Also, talking to people is often waaaay faster than googling for an answer.

Make Every Day a Non-Zero Day

This idea is not from me, it is from ryans01 on Reddit (and it’s also in the PhD Starter Kit, below). His idea is to make sure that every day you do at least one small thing towards your goal. No day should be completely wasted, even if the thing you do is really small. Keeping a log or journal helps you realize all the things you’ve done in a day and keeps your mood up.

Other people have written really good and more in-depth guides than this. The PhD Starter Kit is simply amazing. Philip Guo’s Advice for first year PhD students is a must-read/must-watch (as well as many of his other articles). He also wrote The Ph.D. Grind: “The Ph.D. Grind, a 115-page e-book, is the first known detailed account of an entire Ph.D. experience.” That guy is amazing.


  1. Make it 13 years in 2021.
  2. Not available anymore, 2021-01-23.
  3. Yes, that’s 6 rules. And in 2021, it grew to 8 rules. ¯\_(ツ)_/¯

They could have used additional time during the bulk fermentation to account for the cold, but it’s still a good rise.

Great Resident Advisor podcast episode. Noisy, glitchy, grimy, dark. RA.760 Hyph11E ⟋ RA Podcast🎵

Installing air filters in classrooms has surprisingly large educational benefits

“[…] Math scores went up by 0.20 standard deviations and English scores by 0.18 standard deviations […] this is comparable in scale to […] the potential benefits of smaller class sizes.”