Introducing MarkdownRecord

Published: 2023-08-17

Updated: 2023-12-20



Welcome to the newest iteration of my blog!

This blog used to be a shameful WordPress site... but I can proudly declare that it is now a custom app that I built myself. It's not the first blog I have built, and it wasn't exactly a big challenge at this point in my career as a software engineer, but I did it very differently this time, and I am going to use this occasion to write about how. Along the way, I will introduce you to a Rails engine that I put together to make this sort of project much easier and faster.

First, I think that building a blog site from the ground up is a great challenge for a software engineer to take on, but it's a challenge with diminishing returns. Once you have built a blog once or twice, it gets fairly repetitive. Beyond that, maintaining your own blog site can be more tedium than it's worth, especially with so many powerful, ready-to-use blogging solutions out there.

The task of building a blog site itself requires solving the same old problems that have been solved countless times already. Honestly, it's pretty boring, which is why I didn't want to do it for this blog when I decided (once again) that I should try to get a blog up and running. That's why this blog was initially a WordPress site. But I don't particularly like WordPress (or any blogging platform, for that matter)... and I like being in control of things, so I eventually broke down and decided to build this thing you are now reading from.

But I didn't want to go through the usual tedium of implementing a client-side text editor, a Post table migration, and so on, even though doing so is dead simple with Ruby on Rails (my go-to language and web framework). All of that is pretty boring, especially since I have not only done it many times myself over the years (hosting fees have limited the lifespan of many previous blogs), but have also watched countless other people do it in tutorials, tech blogs, etc. And when you think about it, it just doesn't make sense to put so much work into client-side editing for an app that I built myself, whose source code I have full access to, and that only I will ever update, at least in the near term. There had to be a better way that didn't require writing blog posts in HTML or ERB files, and didn't require a client-side text editor that nobody besides me will ever use.

Fortunately, I had already been wrestling with a similar use case and problem, and I had a solution in the works. Let me tell you a story about solving that related but different problem. Then we'll get back to the blog problem, I promise.

The Quantum Entanglement of Referenced Data and Written Copy

One of my side projects that I work on intermittently is a table-top wargame that I designed (because I'm a nerd). I have wanted to host the rules for this game in an interactive web app, but the rules are quite lengthy and consist of many chapters of written content. Furthermore, I wanted the web app that hosts the rules to also provide some tools to help people play the game. The features I had in mind would rely on many concepts and bits of information that are defined throughout the written rules. For example, I wanted to build a feature that would let users construct an army by selecting from a list of various kinds of units. To do this, I would need to populate a form with the various options allowed by the rules.

If you have ever played a table-top wargame like Battletech or Warhammer, or a table-top RPG like Dungeons and Dragons, you will know how much game rules can look like database records. The rulebooks for these games are filled with tables of data that players reference while preparing for or playing the game. When you digitize game rules like these, however, you immediately run into problems. Application code can't easily pick out the relevant bits of information needed to populate a form the way a human can thumb through a game manual to find the table or section they need. Embedding key bits of information in prose like this means that your reference data (what we'll call the stuff needed for forms and similar things from now on) is tightly coupled to your written copy.

If you were to use the traditional pattern of building web apps for this use case, you would need to decide how to store the written copy and then figure out how to extract the reference data. The path of least resistance might be to simply duplicate the reference data in some other format and not even bother trying to extract it, but that has its own problems. If you store the reference data separately, it becomes a consistency and maintainability nightmare when things change (as game rules tend to do). My game's rules are versioned and will always be changing, so duplicated data was a big problem that I had to worry about.

If, on the other hand, you decide to keep the reference data in its original form in the written copy, you would need a way to easily parse the written copy, identify relevant data, and extract it as its own value in memory. This is possible in a variety of ways, and your written copy could be in a variety of formats, but you would undoubtedly need to invent your own DSL appropriate to that format to let you delineate the data you want to extract. Most in-browser text editors aren't going to be very friendly with whatever DSL you create, and storing your written content as HTML (as most editors, like Trix, do) is also problematic if you want to use special characters or XML-style tags in your DSL.

Then there is the enormous inconvenience of using an in-browser text editor to write and edit anything at all. Even though Rails gives you great libraries for this, writing in a native text editor will always be a better and more consistent experience, one that is less reliant on internet connections, JavaScript libraries, and quirky browser behavior.

My first attempt to build an app to host my game rules exemplifies all the problems I described above. In that case, I took the "duplicate the necessary data" approach. I had to implement dozens of forms to populate my application's data models, including rich text editors for storing and editing the game rules themselves. I also had to implement an enormous web of associations and an over-complicated versioning system to keep it all in sync. Getting the in-browser text editor to match my theme and behave the way I wanted felt like voodoo, to the point that I considered buying a fancy JavaScript text editor library for hundreds of dollars. Yikes!

After much effort the system worked... but it was ugly. When I came back to the project after taking a break from it, I chucked the whole thing in the trash in disgust. I shudder when I think about having to maintain that code base for any amount of time.

My next attempt is the main point of this story. I needed a better way to have large amounts of written copy with embedded data that my application could access without all the boilerplate, bloat, and duplication. I also needed a way to store all that written copy without implementing a client-side rich text editor. Since I am in control of the code base, there is absolutely no reason to edit content on the client side (spoiler: the same is true of this blog!).

So I began looking for a solution.

Enter: Redcarpet - The safe Markdown parser, reloaded.

Markdown is an amazing format. It's simple and lets you focus on writing instead of messing around with formatting. The Redcarpet Ruby gem is an awesome Markdown parser. You can use it to effortlessly turn Markdown into HTML, which is just what I needed to present my game rules online. But Redcarpet's functionality is also fairly easy to extend or override, and that's exactly what I did to solve my content-data coupling problem. This time, I took the other approach mentioned above and created a DSL that lets me easily delineate and extract data from my written copy.

By wrapping Redcarpet's functionality in a Rails engine that I shamelessly named MarkdownRecord, I was able to solve all the problems of my particular use case. The engine does a lot of things, but its primary features are a fairly simple DSL that lets you embed JSON data inside Markdown files using HTML comment tags, an ActiveRecord-like ORM that lets you use that data as if it were stored in a database, and some tooling to compile the Markdown into the HTML and JSON files that the application uses at runtime.
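To make the comment-tag DSL more concrete, here is a minimal sketch of how embedded JSON could be pulled out of a Markdown string and then stripped before rendering. It assumes the `<!--model ... -->` syntax shown later in this post and uses only Ruby's standard library. `MODEL_BLOCK`, `extract_models`, and `strip_models` are hypothetical names, and this is not MarkdownRecord's actual parser. With Redcarpet, a custom renderer's `preprocess(full_document)` hook would be one natural place to call something like `strip_models`.

```ruby
require "json"

# Hypothetical pattern for the comment-tag DSL: "<!--model", then a JSON
# object, then "-->". The lazy match stops at the first closing brace that is
# followed by optional whitespace and "-->", so nested objects still work.
MODEL_BLOCK = /<!--model\s*(\{.*?\})\s*-->/m

# Parse every embedded JSON object in a Markdown document.
def extract_models(markdown)
  markdown.scan(MODEL_BLOCK).map { |(json)| JSON.parse(json) }
end

# Remove the embedded blocks so only the written copy reaches the HTML renderer.
def strip_models(markdown)
  markdown.gsub(MODEL_BLOCK, "")
end

doc = <<~MD
  # Introducing MarkdownRecord
  <!--model
    { "type": "post", "id": 4, "tags": ["Rails", "Ruby"] }
  -->
  Welcome to the newest iteration of my blog!
MD

extract_models(doc).first["id"]          # => 4
strip_models(doc).include?("<!--model")  # => false
```

A real implementation would also have to decide how to report malformed JSON; in this sketch `JSON.parse` simply raises `JSON::ParserError`.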

With the engine, my game rules can now have embedded JSON that I can query just like ActiveRecord models (they are MarkdownRecord models instead, but have a similar API) in order to populate forms, and I can render my game rules written in Markdown as HTML (the embedded JSON is removed before the HTML is rendered) without needing a database or a client-side text editor. Finally, I can do all this without multiple sources of truth or unnecessary duplication, so I don't need to worry about versioning or things getting out of sync when I make changes.

Fun fact: using git submodules, I could even manage my Markdown files independently from the app! This would essentially be like using git as a database.

But that's not all! While developing the engine, I found ways to add tons of powerful features that completely change the way you might go about building a web application. For example, the engine has a built-in ContentFragment class, which is just a MarkdownRecord model that is automatically populated with data about your Markdown source files. That means you can interact with your source files just like you interact with your embedded JSON data or database records. Because the DSL also lets you store metadata on ContentFragments, you can do a lot of powerful things in your application right out of the box, without defining any models, database migrations, or views.

When it comes to rendering and navigating your content, the MarkdownRecord engine provides view helpers that let you build links to your MarkdownRecord models for either HTML or JSON rendering (this works especially well with content fragments). When a ContentFragment is rendered as JSON, the data returned will be the embedded JSON data instead of the HTML that was compiled from the written copy. This allows you to write content for the human reader and define a JSON API at the same time, in the same place.
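Here is a simplified, framework-free sketch of that idea: one fragment, two representations. `Fragment` and `represent` are hypothetical names chosen for illustration (they are not the engine's `render_fragment` view helper), and the request format is reduced to a plain symbol argument.

```ruby
require "json"

# Hypothetical, simplified fragment: compiled HTML plus the JSON objects that
# were embedded in the same Markdown source.
Fragment = Struct.new(:id, :html, :models, keyword_init: true)

# Return the human-facing HTML or the machine-facing embedded data,
# depending on the requested format.
def represent(fragment, format)
  case format
  when :html then fragment.html
  when :json then JSON.generate(fragment.models)
  else raise ArgumentError, "unsupported format: #{format.inspect}"
  end
end

post = Fragment.new(
  id: "content/posts/2023_08_15",
  html: "<h1>Introducing MarkdownRecord</h1>",
  models: [{ "type" => "post", "id" => 4 }]
)

represent(post, :html)  # => "<h1>Introducing MarkdownRecord</h1>"
represent(post, :json)  # => "[{\"type\":\"post\",\"id\":4}]"
```

In a Rails controller the same split would normally be expressed with `respond_to`, keyed on the request format.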

The engine also comes with built-in controllers, meaning that you can make your HTML and JSON available without writing a single custom controller. On top of that, it adds full ERB support and layouts to your Markdown files, so you can use them as data-rich views that can be rendered contextually or queried. And honestly, I have only scratched the surface of what it can do... I won't even go on about how your embedded JSON data can have associations...

What all this means is that you could build a doc site or a static API, or both simultaneously, just by writing Markdown files. No code is required beyond what rails new ... gives you (plus perhaps some minor configuration). And now we come full circle, because this is all you really need for a blog, too. And so this blog became the proof-of-concept project for the MarkdownRecord engine.

This Blog is the Proof of Concept

I am currently typing this out in a markdown file, and when it is done I will run a compile command which will parse the Markdown to generate HTML and JSON files. The embedded JSON I added to this file looks like this:

<!--model 
  {
    "type": "post",
    "id": 4,
    "published": true,
    "published_date": "2023-08-17",
    "updated_date": "2023-08-17",
    "title": "Introducing MarkdownRecord",
    "tags": ["MarkdownRecord", "Rails", "Ruby"]
  }
-->

The above JSON provides the metadata I want to associate with this blog post, such as the tags I give it, the published date, etc. This allows me to write code in my application like this:

Post.where(:tags => { :__include__ => @tag }).sort_by(&:published_date).reverse

Using a simple MarkdownRecord model that is defined like this:

class Post < MarkdownRecord::Base
  attribute :title
  attribute :published_date, :type => Date
  attribute :updated_date, :type => Date
  attribute :published, :type => Boolean
  attribute :tags
end
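For readers curious how a query like the one above can work without a database, here is a hypothetical in-memory version of `where` over plain Ruby hashes. It handles only the two cases this post uses: a plain value means equality, and `{ __include__: value }` means the attribute is an array containing that value. The method names and the sample posts (other than this one) are made up, and this is not how MarkdownRecord implements its query API.

```ruby
require "date"

# True when a record (a Hash of attributes) satisfies every condition.
# A plain value means equality; { __include__: v } means the attribute is a
# collection that contains v. Only these two cases are sketched here.
def matches?(record, conditions)
  conditions.all? do |attr, expected|
    actual = record[attr]
    if expected.is_a?(Hash) && expected.key?(:__include__)
      Array(actual).include?(expected[:__include__])
    else
      actual == expected
    end
  end
end

def where(records, conditions)
  records.select { |record| matches?(record, conditions) }
end

# Sample data; only the first entry corresponds to a real post.
posts = [
  { title: "Introducing MarkdownRecord", published_date: Date.new(2023, 8, 17),
    tags: ["MarkdownRecord", "Rails", "Ruby"] },
  { title: "An older Ruby post", published_date: Date.new(2023, 3, 1), tags: ["Ruby"] },
  { title: "A wargame post", published_date: Date.new(2023, 5, 9), tags: ["Wargames"] }
]

# Same shape as the Post query above: filter by tag, then newest first.
where(posts, tags: { __include__: "Ruby" })
  .sort_by { |post| post[:published_date] }
  .reverse
  .map { |post| post[:title] }
# => ["Introducing MarkdownRecord", "An older Ruby post"]
```

Keeping the operator inside a nested hash (rather than a separate method) lets plain equality and "contains" conditions be mixed in a single call.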

Or I can interact with the ContentFragment that represents this particular markdown file like this:

frag = MarkdownRecord::ContentFragment.find("content/posts/2023_08_15")
# renders the HTML that was compiled from the markdown file at content/posts/2023_08_15.md
render_fragment(frag) 

Note: ContentFragments have ids that indicate the relative path of the source file.
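As a small, hypothetical sketch of how path-based ids like `content/posts/2023_08_15` could be derived and used to index source files: it assumes an id is the path relative to the project root with every extension dropped (so an ERB-enabled `.md.erb` file gets the same kind of id as a plain `.md` file). The engine's exact rules may differ, and `fragment_id` and `index_fragments` are illustrative names, not engine API.

```ruby
require "pathname"

# Hypothetical: turn a source path into a fragment id by making it relative to
# the project root and dropping everything after the first dot in the file
# name. This assumes file names contain no other dots.
def fragment_id(path, root)
  relative = Pathname.new(path).relative_path_from(Pathname.new(root)).to_s
  dir, base = File.split(relative)
  id_base = base.split(".").first
  dir == "." ? id_base : File.join(dir, id_base)
end

# Map every .md or .md.erb file under root/content to its fragment id.
def index_fragments(root)
  pattern = File.join(root, "content", "**", "*.md{,.erb}")
  Dir.glob(pattern).sort.to_h { |path| [fragment_id(path, root), path] }
end

fragment_id("/app/content/posts/2023_08_15.md", "/app")  # => "content/posts/2023_08_15"
```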

I hope to utilize MarkdownRecord to great effect for my other projects too, and so far, it seems to do exactly what I need it to. Of course, this is a fairly niche use case, and I am fully cognizant that many things can't be accomplished with the MarkdownRecord engine alone. But I hope others will find use for it where it can really excel.

Note: MarkdownRecord is still in beta, and there is still a lot of work to do on it, so I welcome any contributions.

Feel free to check it out and give it a test drive. There is a dedicated docs site here (and yes, the docs site is built using the engine too!).

Thanks for reading. Have a great day!


