5 Things You Can Do With a Locally Cloned GitHub Wiki

There’s a feature of every GitHub repo that in my experience doesn’t get a ton of love, and that's the wiki. In all fairness, I'm not sure how much love it deserves - it's sorely lacking in features. But did you know it's a separate repo that you can clone and manipulate locally?

If you’ve been developing software for any length of time, you’ve probably used GitHub, whether to host a personal project for free, to search for a library, or to collaborate with a team.

There’s a feature of every GitHub repo that in my experience doesn’t get a ton of love, and that's the GitHub wiki. In all fairness, I'm not sure how much love it deserves. Sure, you can take notes in it, but its lack of shortcodes/widgets (such as an easy way to add a table of contents to the top of your pages) and other basic features (like uploading images to include in pages) makes it a somewhat... over-simplified tool.

I've been using it at work for internal documentation on our project for nearly a year, and its shortcomings are frustrating. For a long time you couldn't search by full content, only by title! (There's an extension for Chrome users called Wiki Search for GitHub, but it's a band-aid over a broken experience.) That particular limitation is no longer true; more on that below.

How to Clone a GitHub Wiki

I discovered something recently that opens the door to some possibilities I wasn't aware existed. You can clone a GitHub wiki!

GitHub uses a wiki system called Gollum, which is built on top of Git and stores its files in a Git repository. You can imagine why that'd be attractive to GitHub. I'm not sure if they had a hand in creating it or not, but I'd imagine they provided support along the way.

What am I getting at? Well, your entire wiki is a Git repository, separate from your main project. You can clone it, look at it, mess with pages and structure, and push your changes back up if you'd like.

GitHub placed the link in a nice convenient spot, near the bottom of the wiki. Sigh. Scroll down and look for it in the bottom-right corner (unlike the main project, which lists the "clone" link at the top).

They only provide the "https" link, but you can change it to the SSH link if you'd like.

  • https: https://github.com/your-account/your-project.wiki.git
  • ssh: git@github.com:your-account/your-project.wiki.git
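If you switch between the two forms often, the conversion is mechanical enough to script. Here's a minimal Ruby sketch; the helper name https_to_ssh is my own invention, and it assumes the usual GitHub layout where the SSH form is "git@" plus the host, a colon, and the path.

```ruby
require 'uri'

# Turn a wiki's HTTPS clone URL into the equivalent SSH form.
# (https_to_ssh is a hypothetical helper name, not part of any tool.)
def https_to_ssh(https_url)
  uri = URI.parse(https_url)
  # uri.path starts with "/"; the SSH form puts ":" between host and path instead
  "git@#{uri.host}:#{uri.path.sub(%r{\A/}, '')}"
end

puts https_to_ssh('https://github.com/your-account/your-project.wiki.git')
# prints git@github.com:your-account/your-project.wiki.git
```

Pass the result to git clone the same way you would the HTTPS link.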

So now that we can clone a wiki, what can we do with it?


Edit Your Wiki Offline

GitHub uses Gollum to power its wiki, but you can install Gollum on your machine to browse your (cloned) wiki too. Edit your wiki pages even when you don't have an Internet connection; after all, it's just a git repo and every git repo is a complete clone with full history.

Installing Gollum locally is as simple as installing a gem: `gem install gollum`

My Mac complained about "icu required (brew install icu4c or apt-get install libicu-dev)", so I had to run brew install icu4c first (it's a Unicode support library that Gollum expects to be present), and then installing the gem worked. If it hangs on the step "Installing ri documentation for gollum-4.0.1", let it go for a few minutes; it took a while for me. For more details, check out the installation instructions.

Type "gollum" to fire up the Gollum server (make sure you're in the directory where you cloned the wiki or you'll get an InvalidGitRepositoryError), which uses Sinatra running WEBrick to serve up your wiki pages in a browser. If it starts okay, it should show you something like this:

[2017-01-07 00:07:40] INFO WEBrick 1.3.1
[2017-01-07 00:07:40] INFO ruby 2.0.0 (2015-12-16) [universal.x86_64-darwin15]
== Sinatra (v1.4.7) has taken the stage on 4567 for development with backup from WEBrick
[2017-01-07 00:07:40] INFO WEBrick::HTTPServer#start: pid=15303 port=4567

You could create a shell (.sh) script to automate the process. Just create a file named "start_gollum.sh" or whatever you want, and make it executable with a chmod +x start_gollum.sh if necessary.

#!/bin/sh
cd /Users/jdoe/your-git-project.wiki
gollum

Now open up your browser to http://localhost:4567 and check it out.

Compare the two experiences below - the first is GitHub, the second is running Gollum locally.

The wiki files are just text files with a ".md" extension on them. You could use any editor, such as Notepad++, Atom or whatever your preference is. But it's nice to have access to something that looks similar to what GitHub provides. And actually, it's better.

For one thing, full-text search actually works! When I first wrote this, GitHub's wiki search only matched page titles, not the body text. Pretty lame. As of 2019, you can search the wiki body right in GitHub - a huge improvement in functionality! (Thank you @fordsfords for pointing this out in the comments.)

Secondly, whenever you add, delete, or edit and save a page, a commit is automatically made to the git repo. When you're finished editing, a git push is all you need to push your changes up to GitHub. And when you rename a file, it's recorded in git appropriately as a file rename - not a deletion and add - maintaining history.

Note: Gollum will only show changes in your wiki pages that have been committed. So if you have Gollum running, and then change a page in a text editor, Gollum will not show the updated page (even if you restart it) until the change has been committed to the repo (you don't have to push it up).


Generate HTML Docs

Say you have a project you're developing that you don't want to expose to the world. Private GitHub repos are a very common use case. If your project is hidden, so is your wiki. But you've got some useful documentation that you'd like to make available to whomever.

You can run your local wiki pages through a conversion tool and generate HTML pages from them. Then you could publish them to a website, and avoid creating documentation twice.

Pandoc is one such tool, capable of converting between many of the markup languages Gollum (and GitHub) supports, as well as quite a few other formats.

Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, LibreOffice ODT, EPUB, or Haddock markup to HTML formats ...

The installation page has instructions for different systems. If you have brew installed, it's as simple as running brew install pandoc.  

If all goes well, you should see something like the following:

==> Downloading https://homebrew.bintray.com/bottles/pandoc-1.19.1.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring pandoc-1.19.1.el_capitan.bottle.tar.gz
==> Caveats
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d
==> Summary
🍺  /usr/local/Cellar/pandoc/1.19.1: 72 files, 88.6M

Run a file through it with:

pandoc -f markdown some-file.md > some-file.html

I threw a bunch of nonsense (but valid) markdown at it…

# Important stuff!

Oh yes, it's important.

---

![settings menu](images/settings_menu.png)

## Less Important but Relevant Stuff

Read _me_.

    ```
    function greeting() {
        console.log("hello");
        }
    ```

Read _**me**_ `too`.

    function greeting() {
        console.log("hello");
    }

### Wassup

Just ignore [this](http://www.google.com).

# Next major point!

* One
* Two
* Three

***

> And you can quote me on this...

Ugh.

... and it rendered the HTML very nicely.

<h1 id="important-stuff">Important stuff!</h1>
<p>Oh yes, it's important.</p>
<hr />
<div class="figure">
<img src="images/settings_menu.png" alt="settings menu" />
<p class="caption">settings menu</p>
</div>

<h2 id="less-important-but-relevant-stuff">Less Important but Relevant Stuff</h2>
<p>Read <em>me</em>.</p>
<pre><code>function greeting() {
    console.log(&quot;hello&quot;);
}</code></pre>
<p>Read <em><strong>me</strong></em> <code>too</code>.</p>
<pre><code>function greeting() {
    console.log(&quot;hello&quot;);
}</code></pre>
<h3 id="wassup">Wassup</h3>
<p>Just ignore <a href="http://www.google.com">this</a>.</p>
<h1 id="next-major-point">Next major point!</h1>
<ul>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ul>
<hr />
<blockquote>
<p>And you can quote me on this...</p>
</blockquote>
<p>Ugh.</p>

Here's a screenshot of the output side-by-side, first on GitHub, then Gollum locally, and finally the rendered HTML. Looks identical to me, minus some CSS styling.

You could also write a script to do something more complicated, like specifying which files to render, or manipulating the output afterwards.

Here's a short bash script that loops through all the ".md" files in a directory, generating an HTML file for each.

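# Convert every markdown file in the current directory to standalone HTML.
for f in *.md
do
    [ -e "$f" ] || continue    # skip the literal "*.md" when nothing matches
    base=$(basename "$f" .md)
    pandoc -s -f markdown "$f" > "$base.html"
done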

There are a lot of possibilities. You could further modify the output by adding a copyright notice, or generate a master document that acts as a table of contents, or even call your script from a build server like Travis CI or TeamCity as part of your build to generate HTML documentation automatically.
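To make the "master document" idea a bit more concrete, here's a rough Ruby sketch that builds an index page linking to every HTML file in a directory. The function name build_index and the default title are my own choices, and it assumes the wiki's usual hyphenated file names (no spaces or other characters that would need URL-encoding).

```ruby
require 'cgi'

# Build a simple index page that links to every generated HTML file in `dir`.
# (build_index is a hypothetical helper; adjust the markup to taste.)
def build_index(dir, title: 'Wiki Contents')
  pages = Dir.glob(File.join(dir, '*.html'))
             .map { |path| File.basename(path) }
             .reject { |name| name == 'index.html' } # don't link the index to itself
             .sort

  items = pages.map do |name|
    # "Manual-Test-Scenarios.html" becomes the label "Manual Test Scenarios"
    label = File.basename(name, '.html').tr('-', ' ')
    %(  <li><a href="#{CGI.escapeHTML(name)}">#{CGI.escapeHTML(label)}</a></li>)
  end

  ["<h1>#{CGI.escapeHTML(title)}</h1>", '<ul>', *items, '</ul>'].join("\n") + "\n"
end
```

Running something like File.write('index.html', build_index('.')) after the pandoc loop would drop the index next to the generated pages.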


Insert a Table of Contents

Even though you could have huge wiki pages, with loads of headers, there is no way to tell it to generate a table of contents at the top of your pages. Gollum supports TOCs with a simple [[_TOC_]] tag, but GitHub does not. If you've ever been annoyed by this, you're not the only one (but the odds of it changing anytime soon are slim).

One option is to write a short script in the language of your choice that parses the file and creates a TOC for you. Here's one I wrote in Perl that parses the file for headers, then inserts them at the top of the file. It surrounds the TOC with the best approximation of a markdown comment I could find, so the script can locate and update the TOC later.

use strict;
use warnings;
use File::Copy;
 
my $tocBegin = "[//]: # (Start of TOC)\n";
my $tocEnd = "[//]: # (End of TOC)\n";
 
foreach my $file (<*.md>)
{
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    my @lines = <$fh>;
    close $fh or die "Can't close $file: $!";
 
    my @headers = ();
    foreach (@lines)
    {
        if ($_ =~ /^###/) {
            push @headers, createLink($_, 3);
        }
        elsif ($_ =~ /^##/) {
            push @headers, createLink($_, 2);
        }
        elsif ($_ =~ /^#/) {
            push @headers, createLink($_, 1);
        }
    }
 
    if (scalar(@headers) == 0) {
        next;
    }
 
    open(my $in, '<', $file) or die "Can't open $file: $!";
    open(my $out, '>', "$file.new") or die "Can't write $file.new: $!";
 
    print $out $tocBegin;
    print $out "**TABLE OF CONTENTS**\n";
    foreach(@headers) {
        print $out $_;
    }
    print $out "\n---\n";
    print $out $tocEnd;
 
    my $traversingOldToc = 0;
    while(<$in>) {
        if ($_ eq $tocBegin) {
            $traversingOldToc = 1;
        }
        elsif ($_ eq $tocEnd) {
            $traversingOldToc = 0;
        }
        elsif ($traversingOldToc == 0) {
            print $out $_;
        }
    }
 
    close $in or die "Can't close $file: $!";
    close $out or die "Can't close $file.new: $!";
 
    move("$file.new", $file) or die "Can't rename $file.new to $file: $!";
}
 
sub createLink {
    my $currentLine = $_[0];
    my $indent = $_[1];
 
    my $text = substr($currentLine, $indent);
    $text =~ s/^\s+|\s+$//g;
    my $link = lc $text =~ s/ /-/rg;
    return " " x (($indent-1)*2) . "- " . "<a href=\"#user-content-$link\">$text</a>\n";
}

You'd need to do some work to get this production-ready (such as stripping punctuation from the headers since they aren't included in the anchors) but it does the trick in simple scenarios. Here's how it renders:
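On the punctuation point, here's a small Ruby sketch of how a header could be turned into an anchor. It's only an approximation of what GitHub does, under my own assumptions: ASCII text only, and no handling of duplicate headers. The name header_anchor is made up for the example.

```ruby
# Approximate the anchor derived from a header's text:
# lowercase it, drop punctuation, and turn spaces into hyphens.
# (A simplification: ASCII only, duplicate headers are not disambiguated.)
def header_anchor(text)
  text.strip
      .downcase
      .gsub(/[^a-z0-9 _-]/, '') # keep only letters, digits, spaces, "_" and "-"
      .tr(' ', '-')
end

puts header_anchor('Important stuff!')   # prints important-stuff
puts header_anchor('Next major point!')  # prints next-major-point
```

The same clean-up could be ported into createLink before the link is built.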

Now that I showed you the harder way, you could search for one of the many projects out there that do it for you. For example, a quick search turned up DocToc, which you can install with npm. If you check the output below, you'll see that DocToc recursively checks inside other directories (like "images"), just in case you're structuring your wiki differently than the default "all pages in a single folder" style.

grantwMac:pinboard-bookmarks-to-chrome.wiki gwinney$ doctoc .
 
DocToccing "." and its sub directories for github.com.
 
Found Another-test.md, Home.md, Manual-Test-Scenarios.md, My-test-wiki-page.md, Rate-Limiting-Retrieval-of-URLs-from-Pinboard.md, Rationale-for-Not-Using-Sync-Storage.md in "."
 
Found nothing in "images"
 
==================
 
"Home.md" is up to date
"Manual-Test-Scenarios.md" is up to date
"Rate-Limiting-Retrieval-of-URLs-from-Pinboard.md" is up to date
"Rationale-for-Not-Using-Sync-Storage.md" is up to date
"Another-test.md" will be updated
"My-test-wiki-page.md" will be updated
 
Everything is OK.
 
grantwMac:pinboard-bookmarks-to-chrome.wiki gwinney$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
 
 modified:   Another-test.md
 modified:   My-test-wiki-page.md

The end result is similar in appearance to mine, even using comments (albeit regular HTML comments) to mark the TOC so it can be updated later.

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
 
- [Important stuff](#important-stuff)
  - [Less Important but Relevant Stuff](#less-important-but-relevant-stuff)
    - [Wassup](#wassup)
- [Next major point](#next-major-point)
 
<!-- END doctoc generated TOC please keep comment here to allow auto update -->

While we're at it, pandoc can do it too by specifying the --toc argument. Here's a quick script for converting your markdown files into HTML documentation that includes a table of contents.

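# Convert every markdown file to standalone HTML with a table of contents.
for f in *.md
do
    [ -e "$f" ] || continue    # skip the literal "*.md" when nothing matches
    base=$(basename "$f" .md)
    pandoc --toc -s -f markdown "$f" > "$base.html"
done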

Easily Upload (and Add) Images

The GitHub wiki has a button for inserting image links into your wiki page, but it assumes the image is already uploaded somewhere and that you know the URL. There's no mechanism for dropping an image into the editor or otherwise uploading files to the wiki.

If you've got a wiki page to create that involves a lot of images, you may want to clone locally. Gollum supports uploading files when you start it with the right option, as well as a myriad of other options you'll want to check out too.

gollum --allow-uploads dir

Drag a file over the editor and it'll show a light green border around it. The file is copied to the repository, and a link to the file is inserted into the wiki page. If the file is an image, it should be displayed.

[[/uploads/custom toc from doctoc wiki5.png]]

Files dropped onto the editor this way are committed, and will be pushed up with the rest of your wiki.

The show-all flag complements this nicely too.

gollum --allow-uploads dir --show-all

With that option enabled, clicking the "All" or "Files" buttons in the wiki will show everything, not just pages.


Manipulate Pages with the Ruby API

One more option for manipulating pages is the Ruby API called Gollum-lib. To install it, just run `gem install gollum-lib`.

Gollum-lib is nice because it abstracts away some of the nitty-gritty details. For example, the Perl script I wrote earlier could be modified to print out the contents of the file it's iterating through. Or we could just write a few lines of Ruby using gollum-lib.

Here's a short script that loads the local wiki (use the relative path from your Ruby script), then loads a page from the wiki and calls the raw_data function to output the contents of the file. You can see the output in the terminal window in the bottom half of the screenshot.
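A minimal sketch of that kind of script might look like the following. The method name print_raw_page and the page name 'Home' are placeholders of mine, and I'm relying on wiki.page returning nil for a page that doesn't exist, which I believe is how gollum-lib behaves. The gollum-lib calls sit in the usage comment so the method itself has no dependencies.

```ruby
# Print the raw (unrendered) markup of one wiki page.
# `wiki` is anything that responds to #page(name) the way Gollum::Wiki does.
# (print_raw_page is a hypothetical helper name.)
def print_raw_page(wiki, name, out: $stdout)
  page = wiki.page(name)
  if page.nil?
    out.puts "No page named #{name}"
  else
    out.puts page.raw_data
  end
end

# Usage, once gollum-lib is installed ('Home' is a placeholder page name):
#   require 'gollum-lib'
#   wiki = Gollum::Wiki.new('pinboard-bookmarks-to-chrome.wiki')
#   print_raw_page(wiki, 'Home')
```

Taking the wiki as a parameter keeps the helper easy to try out with a stand-in object before pointing it at a real repository.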

What else can we do with the API?

A one-liner change from page.raw_data to page.formatted_data renders the page in HTML output:

<p><strong>Table of Contents</strong>  <em>generated with <a href="https://github.com/thlorenz/doctoc">DocToc</a></em></p>
 
<ul>
  <li>
<a href="#heading-one">Heading One</a>
    <ul>
      <li><a href="#heading-one-sub-blahlblah">Heading One Sub bLahlblah</a></li>
    </ul>
  </li>
</ul>
 
 
 
<p><strong>TABLE OF CONTENTS</strong>
- <a href="#user-content-heading-one">Heading One</a>
  - <a href="#user-content-heading-one-sub-blahlblah">Heading One Sub bLahlblah</a></p>
 
<hr />
<p># Heading One</p>
 
<p>Headin' on...</p>
 
<h2><a class="anchor" id="heading-one-sub-blahlblah" href="#heading-one-sub-blahlblah"><i class="fa fa-link"></i></a>Heading One Sub bLahlblah</h2>
 
<p>Read <em>me</em>.</p>

If you wanted a list of all versions of your pages, you could easily get to that information with the API:

require 'rubygems'
require 'gollum-lib'
 
wiki = Gollum::Wiki.new('pinboard-bookmarks-to-chrome.wiki')
Dir.foreach('pinboard-bookmarks-to-chrome.wiki') do |item|
  ext = File.extname(item)
  next if ext != '.md'
  page = wiki.page(File.basename(item, ext))
  puts "#{item}: #{page.version.id}\n"
end

Here's what it outputs:

Another-test.md: dde4a0571de2a1791cdb8bf3575e957e45944873
Home.md: dde4a0571de2a1791cdb8bf3575e957e45944873
Manual-Test-Scenarios.md: dde4a0571de2a1791cdb8bf3575e957e45944873
My-test-wiki-page.md: dde4a0571de2a1791cdb8bf3575e957e45944873
Rate-Limiting-Retrieval-of-URLs-from-Pinboard.md: dde4a0571de2a1791cdb8bf3575e957e45944873
Rationale-for-Not-Using-Sync-Storage.md: dde4a0571de2a1791cdb8bf3575e957e45944873

Or maybe you want an outline of all pages, including a table-of-contents if headers are present, that links to the original wiki:

require 'rubygems'
require 'gollum-lib'
 
wiki = Gollum::Wiki.new('pinboard-bookmarks-to-chrome.wiki')
html = '<h1>Wiki Contents</h1>'
html << '<style>p { font-size:larger; font-weight:bold }</style>'
 
Dir.foreach('pinboard-bookmarks-to-chrome.wiki') do |item|
  ext = File.extname(item)
  next if ext != '.md'
  basename = File.basename(item, ext)
  page = wiki.page(basename)
  html << "<hr><p><a href=\"https://github.com/grantwinney/pinboard-bookmarks-to-chrome/wiki/#{basename}\">#{page.name}</a></p>"
  toc = page.toc_data
  next if toc.nil?
  html << "#{toc}"
end
 
File.write('outline.html', html)

That'll produce a small HTML page with a link to each wiki page and a table of contents if available. (The missing header below was due to some wonky markdown in that file.)

So far, all we've seen is how to query data.

It's also possible to commit your changes from the API. Here's a Ruby script that loops through your pages, inserts a copyright notice at the very top (if there isn't already one), and then commits the modified files to your wiki repository.

require 'rubygems'
require 'gollum-lib'
 
wiki = Gollum::Wiki.new('pinboard-bookmarks-to-chrome.wiki')
 
def create_commit(page)
  { :message => "added copyright to #{page}",
    :name => 'Grant Winney',
    :email => 'user@email.com' }
end
 
copyright = '<p><em>Copyright 2017 - Grant Winney - <a href="https://opensource.org/licenses/MIT">MIT License</a></em></p>'
 
Dir.foreach('pinboard-bookmarks-to-chrome.wiki') do |item|
  ext = File.extname(item)
  next if ext != '.md'
  basename = File.basename(item, ext)
  page = wiki.page(basename)
  if (!page.raw_data.start_with?(copyright))
    wiki.update_page(page,
                     page.name,
                     page.format,
                     "#{copyright}\n\n#{page.raw_data}",
                     create_commit(page.name))
  end
end

Here's the rendered output:

What's Next..?

There are still some shortcomings in Gollum, but far fewer of them when working locally rather than through GitHub.

I hope this helped you out, and that you learned something new! If you discover any other good uses of cloning your wiki locally, or create something cool with the Ruby API, let me know! I'd love to check it out.

Good luck!