Blog Conversion Steps

If you read my previous post, you found out why I chose to migrate away from a CMS (Content Management System).

The conversion that I did for my website took about 30-40 hours. That was a combination of a few evenings after work and a full weekend. To give you some perspective, I only had about 120 nodes on my Drupal site that had to be converted. Some of the more recent nodes were already in Markdown format, so only formatting and presentation changes needed to be made. The other nodes were in various formats, including plain text and HTML (filtered and full).

Do The Setup

To create a website with MkDocs, you will need to do some initial setup. I did my updates on Ubuntu, so I was able to run

sudo apt-get install mkdocs

and the rest was done auto-magically. Setup is similar for Mac users. Windows users, you’ll need to install Python and then the MkDocs pip package. Instructions on how to do this are available on the MkDocs website.

Stop Creating Content

Just as it is difficult to drain a bathtub while the water is still running, you will need to stop creating content until you finish your migration. Otherwise, you will never complete the conversion process.

I will admit, I did write two new posts while going through the conversion process: one about switching to Markdown and MkDocs, and the other about raking leaves. Those posts were written in the new site that I was actively working on and were not deployed to the old site, since it was on its way to the digital landfill.

Backup and Export The Database

Before you start messing with your content, make a backup of the database and your website code, and put it on a system that you will not be working on. That way, if you do something wrong (yes, it happened several times), you have a clean copy to start over from.

I copied the production database of my website, as well as the master branch of code to my development server. Then I performed all of the updates in my development environment.
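A backup like this can be scripted. The following is only a minimal sketch, assuming a MySQL database whose credentials are already set up for mysqldump (for example in ~/.my.cnf). The function name backup_site and its arguments are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: backup_site and its arguments are placeholders,
# not part of the original migration scripts.
backup_site() {
  local db="$1" webroot="$2" dest="$3" stamp
  stamp=$(date +%Y%m%d-%H%M%S)
  mkdir -p "$dest"
  # Dump the database; --single-transaction avoids locking InnoDB tables.
  mysqldump --single-transaction "$db" > "$dest/$db-$stamp.sql"
  # Archive the website code next to the database dump.
  tar -czf "$dest/code-$stamp.tar.gz" -C "$webroot" .
}
```

Calling backup_site drupal_db /var/www/drupal /mnt/backups would write one timestamped SQL dump and one code archive. Copy both to a machine other than the one you are working on.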

I did it this way because it allowed my website to remain active. Also, in the worst case of making mistakes, which did happen, those mistakes were not seen by end users, as only I have access to my development server.

Convert Tables to Markdown

Next was to get the posts and other website content into Markdown format. As previously mentioned, some of the content was written in HTML or plain text formats.

What I chose to do was write some database queries that would format the existing data into, or close to, the format that I wanted the Markdown file to be in. The final output was stored in a column of a temporary table that I used to make the necessary changes and adjustments. The queries that I used are in the SQL file located at https://github.com/almostengr/almostengrwebsite/tree/master/migration_scripts.

Using a number of update statements, I was able to convert the HTML syntax to Markdown. Some of the tags were too complex to convert, so I left those in the database and fixed them after I did the export to a text file.
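The same kind of tag replacement can also be sketched outside the database with sed. This is not the SQL from the migration_scripts directory, only an illustration. It assumes each tag opens and closes on the same line and carries no extra attributes, and the function name html_to_md is made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative only: a sed stand-in for the SQL update statements.
# Assumes simple tags with no attributes that open and close on one line.
html_to_md() {
  sed -E \
    -e 's@<h2>([^<]*)</h2>@## \1@g' \
    -e 's@<(strong|b)>([^<]*)</(strong|b)>@**\2**@g' \
    -e 's@<(em|i)>([^<]*)</(em|i)>@*\2*@g' \
    -e 's@<a href="([^"]*)">([^<]*)</a>@[\2](\1)@g'
}
```

It reads from standard input, so html_to_md < old_post.html > new_post.md converts one file. Anything more complex than these patterns passes through unchanged and still has to be fixed by hand.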

Export the Markdown

Next, I exported the data to a CSV file. I only exported the column that had the finalized output and the title of each post. I did this using phpMyAdmin, as I have a MySQL database installed. Be sure to select the option that removes the carriage returns and line breaks from the export.

After that, I assembled a command that would read each line in the CSV file, put the first column into the body of a Markdown file, and use the second column to name that Markdown file. This was done with a while loop and the awk command. Take a look at the csv_to_individual_files.sh script in the migration_scripts directory.
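For a rough idea of what such a loop can look like, here is a minimal sketch. It is not the script from the repository. It assumes every CSV line has the form "body","title", that the three-character sequence "," never appears inside a body, and that titles contain no slashes. The function name csv_to_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the csv_to_individual_files.sh from the repo.
# Assumes every CSV line looks like "body","title".
csv_to_files() {
  local csv="$1" outdir="$2" line body title name
  mkdir -p "$outdir"
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    # First column: the finished Markdown body.
    body=$(printf '%s\n' "$line" | awk -F'","' '{ sub(/^"/, "", $1); print $1 }')
    # Second column: the post title, turned into a file name.
    title=$(printf '%s\n' "$line" | awk -F'","' '{ sub(/"$/, "", $2); print $2 }')
    name=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
    printf '%s\n' "$body" > "$outdir/$name.md"
  done < "$csv"
}
```

Running csv_to_files export.csv docs would create one .md file per row, named after the lowercased title with spaces turned into dashes.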

After that, I did the file cleanup. The cleanup removed the extra characters left over from the CSV file and converted the line breaks (<br /> tags) to newline characters. This way the final file will not be one long line.
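A cleanup pass along these lines could be sketched with awk. This assumes the leftover characters are the double quotes that wrapped each CSV field. The function name clean_post is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assumes each post is one long line that may still
# be wrapped in the double quotes from the CSV export.
clean_post() {
  awk '{
    sub(/^"/, "")            # drop a leading CSV quote, if any
    sub(/"$/, "")            # drop a trailing CSV quote, if any
    gsub(/<br ?\/?>/, "\n")  # turn <br>, <br/> and <br /> into newlines
    print
  }'
}
```

Used as clean_post < raw.md > cleaned.md, it leaves every other character in the post untouched.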

Copy Images and Supporting Files

Next, I copied the images that were used on my Drupal site into an images directory. I put all of the images into a single directory. Then I had to update the Markdown files to use relative paths to these files. Now you can use absolute references to the files, but that may cause you to miss errors when you are developing new content as you will need to use production URLs when setting up the links. If you do not have the image in production when creating new content, you will get no image or the alternate text for that image.
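Rewriting the links can be automated for the common case. The sketch below assumes the old images were served from Drupal's default /sites/default/files/ path and that ../images/ is the correct relative path from a post to the new directory. Both are assumptions to adjust for your own layout, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the old path and the ../images/ target are assumptions.
relative_image_links() {
  # Matches "](" followed by an optional scheme and host, then the old path.
  sed -E 's#\]\((https?://[^/)]+)?/sites/default/files/#](../images/#g'
}
```

It handles both absolute URLs and root-relative paths, so relative_image_links < post.md > post.new.md covers links written either way.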

Structure and Configure Your Website

Blog Categories

How you structure your website can make a big difference when it comes to SEO and user experience. When I used Drupal, all I had to do was select the appropriate taxonomy category and the URL part was handled. With the Markdown site, I’m having to do the structure manually. I found it best to place each post in a folder based upon the category that it was intended for. In addition, I named each file with the date that I created the article and the title of the article. By having this information in the URL, it better helps the search engines index my blog.

My folder tree looks similar to the following:

docs
 | index.md
 | 2018
    | 2018.02.13-use-twitter-and-rules-for-farmos-notifications/
    | 2018.01.23-raspitraffic-demo,-us-signaling/
    | 2018.01.16-raspberry-pi-first-run-and-installing-updates/
 | 2019
    | 2019.12.21-switched-blog-from-drupal-to-mkdocs/
    | 2019.12.20-blog-reverse-sort-with-mkdocs/
    | 2019.12.24-blog-conversion-steps
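Folder names in this style can be generated instead of typed by hand. This is a small sketch that assumes the date is given as YYYY-MM-DD. The function name post_dir is hypothetical, and unlike the tree above it drops punctuation such as commas from the title.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds "YYYY/YYYY.MM.DD-title-in-lowercase".
post_dir() {
  local date="$1" title="$2" year stamp slug
  year="${date%%-*}"     # text before the first dash, e.g. 2019
  stamp="${date//-/.}"   # 2019-12-24 becomes 2019.12.24
  # Lowercase the title and collapse every run of other characters to one dash.
  slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//')
  printf '%s/%s-%s\n' "$year" "$stamp" "$slug"
}
```

For example, mkdir -p "docs/$(post_dir 2019-12-24 'Blog Conversion Steps')" creates the folder for this post.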

Sort by Date

One of the downsides of MkDocs is that it lists pages in alphabetical and numeric order, meaning that my oldest posts would show up first. Nobody wants to scroll down to the bottom of the page to see the latest post that you wrote. What I ended up doing was using the find, sort, and sed commands to list the files in reverse order. Once I had them in most-recent-first order, I put them into the MkDocs configuration file (mkdocs.yml). After they were in the configuration file, I added group headings based on the categories. The code that I used for this is available in the sort_by_latest_post.sh file in the migration_scripts directory.
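The find, sort, and sed combination can be sketched roughly as follows. This is not the sort_by_latest_post.sh script itself. It assumes posts live at least one folder below the docs directory and that their paths begin with the date, as in the tree above. The function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prints posts newest first as YAML list items.
list_posts_newest_first() {
  local docs="$1"
  (
    cd "$docs" || exit 1
    # Skip top-level files such as index.md, then sort paths in reverse.
    find . -mindepth 2 -type f -name '*.md' \
      | sed 's#^\./##' \
      | LC_ALL=C sort -r \
      | sed 's#^#  - #'
  )
}
```

The output of list_posts_newest_first docs can be pasted under the nav: key in mkdocs.yml and then grouped under headings by hand.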

Proofread, Proofread, Proofread

I went through each of the posts and attempted to correct as many of the formatting and Markdown errors as I could. I realized that some of the errors were caused by formatting that had been missed during the database step. So I truncated the temporary table and revised the update queries so that the additional formatting changes could be applied there, instead of having to use sed or some other command in the terminal. Some of the HTML formatting I could not remove, because the text was too difficult to match reliably, so I left it in place to be fixed manually. As I mentioned earlier, I only had about 120 pages to go through and correct. If you have a blog with thousands of posts or more, you may want to have a team help you, or build more queries that make the necessary updates before deploying.

Deploy To Production

Not Exactly Perfect

Once I was satisfied, I deployed it to production. There are some articles that do not have images contained in them because the image link is incorrect. There are other images that are way too large for the post, and I will need to resize them to make them more appropriate. I will do that at a later date.

I deployed my updated site by using the mkdocs gh-deploy command. It generates the static files for your website and pushes those files to a separate gh-pages branch on your repo, creating that branch if it does not exist yet.

Set up Automated Pull or CI

I added a cron job on my production instance that periodically pulls the latest version of the gh-pages branch to the production server. This eliminates my need to log in to the production instance unless there is a critical failure with the script or something else goes wrong.
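A pull job of this kind might look like the sketch below. The paths, schedule, and function name are placeholders rather than my actual setup. It assumes the web root is a git clone that tracks the gh-pages branch of a remote named origin.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: update_site and all paths are placeholders.
update_site() {
  local site_dir="$1" branch="${2:-gh-pages}"
  # --ff-only refuses to merge if the server copy has diverged.
  git -C "$site_dir" pull --quiet --ff-only origin "$branch"
}

# Example crontab entry (every 15 minutes), assuming a wrapper script
# at this made-up location calls update_site with the web root:
# */15 * * * * /usr/local/bin/update_site.sh >> /var/log/update_site.log 2>&1
```

Using --ff-only means the job fails loudly instead of creating merge commits if someone edits files directly on the server.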

If you are not able to set up a cron job, you can also use a CI server that will do the production deployment for you. Travis CI is a popular one to use with GitHub.

Conclusion

I am happy with the new setup. The biggest advantage is that I do not have to spend time performing module updates or backups of the Drupal database. Furthermore, writing blog posts is much simpler, as I do not need specialized tools to complete the task.

Updated: 2020-07-15 | Posted: 2019-12-24
Author: Kenny Robinson