
Using Clonezilla expert mode and sfdisk for replacing a ~1 TB disk with a ~240 GB disk

Introduction

I’ve been using Clonezilla (a small bootable GNU/Linux distribution) for making full hard-disk backup images for perhaps more than 10 years (I don’t remember exactly, but a long time). Making a simple Clonezilla backup image is too trivial a task to write a blog post about. Recently, however, I did something a bit unusual for me, and I consider it a more advanced task than what the tool is typically used for. Why was this task “semi-advanced”?

This is the first time I remember having to manually calculate partition boundaries and customize the partition table – and that, I think, is interesting to write about. I made several mistakes along the way, so I’ve decided to write a blog post covering both the mistakes and the solution I came up with, to support this excellent free open-source tool. If you know of a better free open-source tool than Clonezilla that can do the same (or anything better), please write in the comments section (but I think it’s the best free tool for me, at least). Clonezilla describes itself as:

“Clonezilla is a partition and disk imaging/cloning program similar to True Image® or Norton Ghost®. It helps you to do system deployment, bare metal backup and recovery.”

From the Clonezilla webpage

The use case: Upgrade the mechanical hard-disk in an old PC to SSD

I usually handle IT-related tasks for my parents’ private computer setup. The PC they currently use is one I gave them around 2 years ago; it replaced an older PC with an Intel i3 processor, which is now probably around 4-5 years old. For a long time, this older PC had been unbearably slow, and for probably 2 years it has just been turned off, doing absolutely nothing – not even receiving the latest Windows updates. People don’t want to work with a computer that takes several minutes to boot, and even after it has booted, it still takes a minute or two before you can open a web browser or begin to work. I know from experience that significant speed increases can easily be obtained just by swapping the mechanical hard drive for a newer, modern SSD.

The reason: Mechanical hard drives can typically transfer data at speeds from 80 MB/s to 160 MB/s, while SATA SSDs offer speeds of 200 MB/s to 550 MB/s, and NVMe M.2 SSDs can deliver speeds of roughly 5000 to 7000 MB/s. I bought a cheap Kingston SSDNow A400 240 GB SSD; according to the specs it has read speeds up to 500 MB/s and write speeds up to 350 MB/s. By the way, in case you’re wondering: the screenshots collected below were all made using my KVM-over-IP setup, aka my BliKVM Raspberry Pi CM4 solution described in https://mfj.one/2023/01/why-and-how-i-use-kvm-over-ip-with-the-blikvm-box-based-on-the-raspberry-pi-compute-module-4/. Now, let’s get started…

The mechanical 1 TB drive – before partition modifications

The most important partition on the original (source) disk is the (NTFS) C:\ Windows drive, which took up almost 600 GB. Furthermore, this disk had an additional (NTFS) E:\ partition taking up almost 200 GB. I had also once installed a Linux partition taking up around 90 GB, with a corresponding linux-swap partition of almost 6 GB (ignore that the language in the screenshot is not English):

Original partition layout

If we boot up in Linux, this is the same partition table information:

The task is to resize all the partitions with data we want to keep, so the sum of all data can fit on the smaller ~240 GB SSD. In general it’s easy to expand a partition if there’s free space around it – and more difficult to shrink one, but it can be done. In Windows, you right-click “My Computer” and select “Manage”; in Computer Management you then click “Disk Management”, right-click the partition and choose something like “Resize Volume” or “Shrink Volume”.

I had some problems, but they were solved by googling for “cannot shrink windows volume” (or similar). As I remember it, my problem was that at the end of the volume there were a hibernation file, page file(s) and possibly the System Volume Information folder used by System Restore. So I turned off System Restore (try running “systempropertiesprotection.exe” and choose “Disable system protection”), I disabled the hibernation file (run “powercfg /hibernate off” in an administrator cmd window), and I think I also disabled “fast boot”. After doing these things I could shrink the C:\ volume.

The mechanical 1 TB drive – after partition modifications

The result was the following partition layout:

I decided I didn’t want the E:\ drive (sda6) at all and also wanted to exclude the two Linux partitions sda8 and sda9. I didn’t do any calculations; just by looking at this I thought it looked good, because if I roughly summed up the sizes of the partitions I wanted to keep, they should fit on a ~240 GB SSD. But I made a stupid mistake here – the “End” sector was already outside the physical disk boundaries, which I didn’t realize until later… If I have to do this again in the future, I might as well begin by calculating the precise partition boundaries to use and also verify that the last “End” sector fits within the total disk boundary, but more about that later…

With the new partition layout on the new SSD, I would end up having sda6 for the Windows recovery environment (instead of sda7), and sda7 would become the 13.3 GB “Microsoft basic data” partition – which I suspect is also where the Windows recovery image is stored.

Using Clonezilla to backup and restore

I didn’t write down all the steps for doing the backup, because that part should be relatively trivial and I think it can be done using “Beginner” mode – just create an image of the whole hard disk. When the image has to be restored, things become more complicated, and I’ll add some extra detailed screenshots to explain the errors I made and what I should remember if (or when) I do something like this again in the future.

Creating the disk image from the mechanical hard-drive as source-disk

For people not so familiar with Clonezilla, these are roughly the steps needed to create a backup image of the hard drive:

  • Create a bootable USB stick with Clonezilla (or, like me, use BliKVM, PiKVM, IPMI or similar so you can mount a virtual DVD drive and boot Clonezilla from that).
  • In the boot-menu choose the default Clonezilla live CD/USB-setting (I think it’s called “Default setting, VGA 800×600” or similar).
  • Choose language and keymap settings – either use defaults or as you prefer it.
  • Choose something like “Device-image” – (to work with disks or partitions using images).
  • Choose something like: “Use local device (E.g.: hard drive, USB drive)” – and follow the instructions. In my case, this is also where I plugged in the external USB hard drive that I used for storing the disk backup image.
  • You need to choose the proper disk and directory for the image, such that it gets mounted under /home/partimag – which is where the Clonezilla image data must be located for things to work.
  • Select something like “Savedisk Save_local_disk_as_an_image” and input filename for the saved image.
  • Select the source disk for the full-disk backup. I usually also choose “Skip checking/repairing source file system”, because checking makes things slower, and if there are disk errors, what can we do about them at this point? The operating system is usually capable of fixing itself, and I personally don’t remember any situation where checking/repairing the source file system at this stage helped me. If you suspect your disk is failing, you could opt for the check just to get a warning if there are problems – but if your disk is in bad shape you’re screwed no matter what and should replace it soon…
  • I also don’t check the saved disk image, because I usually don’t have disk problems, and if they happen I deal with them later. I throw bad disks away very quickly, because I cannot and don’t want to live with unreliable drives – but if you have the time or think it’s a good idea, I believe the default setting is to check the saved image (see screenshot below).
  • Now start the full-disk backup and wait for it to finish. If anything goes wrong later, rest assured that we now have a full disk backup and can always go back and restore all data. The idea is that from now on we don’t have to touch the source disk any longer: the new SSD will replace the old mechanical drive completely, and we’ll work from the written image instead of the original hard disk.

I don’t remember the exact details, but I think I took this screenshot at the end of the backup procedure:

This screenshot tells us that it didn’t make an image of sda9 – the Linux swap partition (which makes good sense). It also says: “All the images of partition or LV devices in this image were checked and they are restorable“. It’s up to you whether you want to check the disk image; apparently in this case I chose to check the image after it had been written. So far things have been relatively easy – it’s just a plain “vanilla” full-disk backup.

Restoring requires an extra effort and this is where things become “semi-advanced”: we need to mess with the partition table information (notice the message: “The partition table file for this disk was found: sda, /home/partimag/2023-05-10-01-img/sda-pt.sf” – we’ll use this shortly). Clonezilla now asks what you want to do next, and at this point you want to power everything off. Unfortunately this old PC only had room for a single SATA hard disk, so I had to replace the old mechanical drive with the new SSD. I’ll keep the old 1 TB mechanical drive and use it for e.g. backups in another computer in the future – no need to throw it away, and it isn’t worth much on e.g. eBay.

Restoring the disk image data to the new SSD hard-disk

With the new SSD plugged in, and with a USB cable connected to the portable hard drive containing the stored image, I booted up Clonezilla again (either via a physical USB stick or via a virtual CD/DVD-ROM drive, if your setup supports that):

We now want to restore from the image, and at some point you need to select the image directory to mount under /home/partimag. The screenshot above shows that I have a 2 TB disk known as /dev/sdc1 and that on this disk I made a subdirectory called “Egernvej_clonezilla” for the image. The image itself has the default name “2023-05-10-01-img”, telling the user the image creation date. After clicking “Done”, this directory becomes mounted as /home/partimag, which is where Clonezilla expects backup images to be stored to and read from.

I knew that at some point an error would most likely come up that I would have to react to, but I ignored that for now. I selected the partitions I wanted to restore and pressed Enter:

First attempt – the “naive” approach

I chose “restoreparts” from the menu:

And received the error message: “Two or more partitions from image dir were selected. Only the same partitions on the destination disk could be restored“. I had suspected things would fail, because I had just used the default settings all along. I think the error message is a bit confusing, so I googled it to understand things better, and the search came up with a GitHub link showing parts of the program source code:

The comments in the source code reveal some extra information that helps to understand the error:

In this case, it’s too complicated to restore partitions to different partitions on destination disk. Therefore we only allow original partitions to partitions, i.e. e.g., sda1, sda2 from image -> sda1, sda2 on destination disk.

Comment from https://github.com/stevenshiau/clonezilla/blob/master/sbin/ocs-sr – the source code for the Clonezilla commandline tool to save to or restore from images.

I’m guessing it’s trying to tell us that “you can only restore the exact same partitions that were on the source disk” – and I deliberately told it to exclude sda6 and sda8 – so I need to go back and do something else…

Second attempt (using options “-k1” and “-icds” in “expert mode”)

The “naive” approach didn’t work. We need to use some “expert”-settings. I googled and found https://drbl.org/fine-print.php?path=./faq/2_System/119_larger_to_smaller_disk_restore_clone.faq#119_larger_to_smaller_disk_restore_clone.faq:

The screenshot above explains that there are 2 methods. The first is the simplest – but it won’t work here, because it requires that “… the used data blocks on the source disk are within the disk boundary of destination one“. The meaning of these options is shown below (I have combined 2 screenshots into a single image to save a bit of space):

I suspected that method 1 wouldn’t work, but because I rarely use the “expert settings” I was curious to see what would happen if I ignored the prerequisite that the data blocks on the source disk must fit within the boundary of the destination disk:

After a while, this came up:

The red error messages tell us that “The ratio for target disk size to original disk size is ~0.24” and that it cannot create a partition table on /dev/sda.

Remaining attempts (using option “-k2” to create the partition table manually)

I realized Clonezilla could not guess what I wanted, and that I needed to tell it specifically how to create the partition table, which requires running with the “-k2” option (“create partition manually”):

After a while, this came up:

It tells us that it wipes some filesystem header information, drops you to a shell and asks you to fix the partition table before exiting.

First attempt at creating the partition table

While creating the image, Clonezilla stored the original partition table information in different files, e.g. a text file called “sda-pt.sf”. I copied the file and modified it, more or less as shown below – and this is where I began making some stupid mistakes, which made me think I might as well write a blog post about them so they can perhaps help someone else in the future:

I didn’t want sda{6,8,9} from the old drive (shown on the left, i.e. the original “sda-pt.sf”; the new partition table text file is shown on the right). When the excluded partitions are left out of the new partition layout, the remaining partitions move up, so sda6 on the new drive corresponds to sda7 on the old one, and so forth. The partition boundaries have to be calculated to ensure that every partition can at least hold the data from the backup used for creating the image. That is easy to do using the “start” sector and the “size”, each of which has a column in the “sda-pt.sf” text file.

For each partition, the start sector plus the size gives the start sector of the next partition (at least that’s the idea, unless there is unused space between partitions). People more experienced than me can already see that I made a stupid mistake here by not modifying the “last-lba” value, and also by keeping the huge start-sector offset on sda5, which came from the original, unshrunk partition table. There was simply not enough space for this partition layout to fit on the smaller 240 GB SSD, and it took me quite some time to understand this error and what I should do:

For writing the partition data to disk I used “sfdisk /dev/sda < TEST.sf”, where TEST.sf is a simple text file with the new partition table information (the contents are shown above). I couldn’t understand the error message because I was focused on the “Last LBA specified by script is out of range” part – which was also a mistake:

I found two methods to output the maximum LBA (Logical Block Addressing):

  • Using the contents of /proc/partitions
  • Using “fdisk -l” (both methods are sketched right below)
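As a hedged sketch (assuming 512-byte logical sectors, and run as root), the two lookups – plus a third cross-check via blockdev – could look roughly like this:

# /proc/partitions lists sizes in 1 KiB blocks, so sectors = blocks * 2:
grep -w sda /proc/partitions

# fdisk prints "Disk /dev/sda: ... bytes, NNN sectors" in its header:
fdisk -l /dev/sda | head -n 2

# Extra cross-check: blockdev reports the size directly in 512-byte sectors:
blockdev --getsz /dev/sda

I believe the reason “total sectors minus 1” was still rejected later on (see below) is that for GPT the last usable LBA is typically the total sector count minus 34, because the backup GPT header and its partition-entry array occupy the last 33 sectors of the disk – but treat that as an educated guess.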

Luckily both methods gave the same value, as shown in the screenshot above. I tried to insert this value in my “TEST.sf” file and again got the same error message. I then subtracted 1 from the “last-lba” line and I think I still got the same error (which I find weird, but maybe I’m misunderstanding how to use this line – if you know, please drop a comment and I’ll update this). Finally I deleted the whole line, and that helped, because now I got:

The message “Sector XXX already used. Failed to add #5 partition: Numerical result out of range” is a new error, and therefore a new problem to solve. Unfortunately I got myself completely confused, because I didn’t see the real problem, and I found a bug report saying that I should reboot in UEFI mode and the error would disappear. So I did that – booted from an Arch Linux USB stick – but I still got the same error message…

Creating the partition table – the right way…

I discovered that I made a stupid mistake, which is best illustrated with the screenshot below:

On the original disk I had correctly shrunk the Windows C:\ partition – but I hadn’t taken into account that there was now 380 GB of free/unallocated disk space before the remaining partition data. I had never tried this with Clonezilla, but I had a feeling it could handle it, because it’s such an old, well-tested free Linux tool. I modified my partition data text file:

I cannot increase the “size” of each partition, because there is not enough space on the target disk – and I cannot decrease it either, because the image won’t restore if it doesn’t fit inside each partition. The solution is therefore to modify the start sectors so there is no free or unused space between the partitions – such gaps are simply wasted space – and this is how I made everything fit onto the new SSD.
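To illustrate the idea, here is a hedged sketch of such a gap-free layout with made-up sector counts (500 MiB EFI, 16 MiB MSR, 200 GiB Windows, 1 GiB recovery) – every start value is simply the previous start plus the previous size, and the type= GUIDs shown are the standard EFI/Microsoft ones (in practice you would copy them unchanged from the original sda-pt.sf dump):

label: gpt
device: /dev/sda
unit: sectors

# Each start = previous start + previous size, so no space is wasted between partitions:
/dev/sda1 : start=2048, size=1024000, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B
# 2048 + 1024000 = 1026048
/dev/sda2 : start=1026048, size=32768, type=E3C9E316-0B5C-4DB8-817D-F92DF00215AE
# 1026048 + 32768 = 1058816
/dev/sda3 : start=1058816, size=419430400, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7
# 1058816 + 419430400 = 420489216
/dev/sda4 : start=420489216, size=2097152, type=DE94BBA4-06D1-4D40-A16A-BFD50179D6AC

If I read the sfdisk manual correctly, “sfdisk --dump /dev/sda > sda-pt.sf” produces a dump in this same format, and “sfdisk --no-act /dev/sda < TEST.sf” parses and validates such a script without writing anything to the disk – a useful dry run before the real “sfdisk /dev/sda < TEST.sf”.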

By the way, maybe it would have worked if I had now re-inserted the “last-lba” line – but I figured that if there were no errors or warnings, Clonezilla and/or sfdisk would figure out what needs to be done (please write a comment if you know more about this and have something relevant to add). I also made myself a small partition-validation one-liner, which I’ll paste here so it can easily be copy/pasted in the future, should you decide to use this information yourself:

IFS=$'\n'; for l in $(cat TEST.sf | grep -i start | awk '{print $4,$6}' | tr ',' ' '); do start=$(echo $l | awk '{print $1}'); sz=$(echo $l | awk '{print $2}'); next=$(($start + $sz)); echo "l=$l ; start=$start; sz=$sz; next=$next"; done

The command output let me easily and automatically compare the partition boundaries with the contents of my partition data text file and see if I had made a wrong calculation somewhere. I could now write the partition data to disk.
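For readability, the same check can also be written as a small multi-line script – this is equivalent logic, a sketch assuming the sfdisk dump format shown earlier, where fields 4 and 6 on each partition line are the start and size values followed by a comma:

#!/bin/bash
# For every partition line in the sfdisk script, print where it starts, how big
# it is, and where the next partition would have to start (start + size):
while read -r line; do
    start=$(echo "$line" | awk '{print $4}' | tr -d ',')
    size=$(echo "$line" | awk '{print $6}' | tr -d ',')
    echo "start=$start size=$size next_start=$((start + size))"
done < <(grep 'start=' TEST.sf)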

I didn’t try writing the partition data to the disk from within Clonezilla, because it hadn’t booted in UEFI mode. To be safe I did it from within Arch Linux, booted from a USB stick in UEFI mode (if this wasn’t actually needed, please write a comment – but I think it probably was). This is how it looked:

No errors – everything looks good now! Now is the time to shutdown Arch Linux and reboot into the Clonezilla environment…

Restoring the data – final attempt (using options “-k” and “-icds” in “expert mode”)

With the fixed partition table written to disk, I wanted to see if Clonezilla understood what to do next. I used the “-k” option, meaning “DO NOT create a partition table on the target disk” (because we just wrote it manually; Clonezilla couldn’t figure it out without our help), and “-icds”, meaning “Skip checking destination disk size before creating partition table“. These were the settings I thought made the most sense, and this is how it looked (I again combined two screenshots into a single screenshot to save a bit of space):

It’s a bit weird that it came up with these two error messages (in red): “Unable to find target partition ‘sda10’” and “… finished with error!” – hopefully I didn’t mix up my screenshots. If anyone reading this can explain why it wrote this error message, or whether I made a mistake, please write a comment below…

But apart from that I felt good – it must mean everything up to sda10 went OK, so I shut everything down and booted up again to see if it worked…
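For reference, I believe the restore Clonezilla performs here corresponds under the hood to an ocs-sr command roughly along these lines (a hedged sketch using only the options discussed in this post – the actual command Clonezilla prints on screen contains several more options that I’m leaving out):

# Restore the saved image to sda without (re)creating a partition table (-k)
# and without checking the destination disk size first (-icds):
ocs-sr -icds -k restoredisk 2023-05-10-01-img sda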

Fixing everything so it boots up correctly…

If I remember correctly, after booting up this is what happened:

I cannot remember exactly what happened after attempting to boot (because several days passed before I actually wrote this blog post, based on the collected screenshots). Around this moment, I got the following “Secure Boot Violation” screen:

I figured I had to re-enable “Secure Boot”, because I had temporarily turned it off in order to boot from my Arch Linux USB stick in UEFI mode (sorry I didn’t explain that above; I try to include many details, but not everything, and some pieces you have to put together yourself). The following screenshot is slightly manipulated:

After enabling “Secure Boot”, you obviously need to go to “File” -> “Save Changes and Exit” and select “Yes” to “Are you sure you want to Save Changes and Exit?”. At this point I probably tried to boot into Windows, but I don’t have any screenshots of that – or maybe this is where I got the “Secure Boot Violation” error message?

The next thing I remember is going into “System Recovery” – and then into “Windows Recovery” (the screenshot below is again a combination of 2 screenshots):

I was very optimistic now – would it all work now after another reboot?

It did! The next time, the system automatically booted into Windows, I installed a lot of upgrades, and as you can see I ended up with 2.49 GB of (wasted) unallocated disk space. Ideally that space should have been added to the Windows C:\ drive, because it’s difficult to use 2.49 GB on its own. On the other hand, wasting 2.49 GB out of 240 GB isn’t important, and I had already spent enough hours on this old PC. I also know from experience that probably only 140-150 GB will ever be used, so there’s enough space.

Should I repeat this in the future, I would put in the extra effort to end exactly at the 13.28 GB NTFS partition boundary, aligned correctly against the last sector of the disk. I would just have to expand the C:\ drive by the number of sectors corresponding to 2.49 GB of data, and then all the subsequent start sectors would increase by that same number of sectors. Done properly, there would be no unallocated free (wasted) disk space at the end of the disk.

Minor/remaining issues to fix…

While shrinking partitions, I had to do some of the things described earlier (disabling System Restore, the hibernation file and fast boot):

Now that things were working again, I had to revert these changes. At least I remember doing:

YMMV: It’s up to you what you do from here, if you’ve used this information to do something similar as I did here…

Conclusion

In this blog post, I’ve described how to use Clonezilla for moving several partitions (incl. some NTFS partitions) from a larger disk drive to a smaller one. It’s a task that requires some of the “expert” features, and I’ve described pretty much all the mistakes I made along the way. I think Clonezilla is a great free open-source software tool, and if you’re about to do something similar with it, you don’t have to make the same mistakes I did – hopefully you’ll be able to accomplish your goal more quickly than I did.

The most important outcome of this blog post is that I hope the reader now understands more about how to create a partition layout text file and how to write that layout to disk using e.g. sfdisk (I believe the concept is the same across partitioning tools, so feel free to use another tool for this task). Furthermore, you should have gained a reasonable understanding of how to align the partition boundaries on the disk so that a Clonezilla image containing several partitions can fit into your manually created partitions.

I have not explained how to use the tool as a beginner – however, I think I’ve included a lot of detailed explanations and screenshots to avoid any possible confusion, and if you understand this, I think you can easily use “Beginner” mode too. The one place where I lost the exact details for the corresponding screenshot was when the “Secure Boot Violation” BIOS message appeared – but based on the rest of the descriptions, you should easily be able to figure out how to handle that.

Finally, thanks for reading this. If you have any comments, suggestions etc, please write in the “comments”-section just below here, thanks again!


More sporadic blog posts in the future, unfortunately – but still alive…

Background intention: To publish once per month

When I started this blog some months ago, I had a few ideas about topics I could at least “quickly” write about if I suddenly became too busy with other things and didn’t have enough time to come up with a good topic. The past month or more has been a period where I haven’t had much free spare time available. I’ve realized that I cannot come up with enough high-quality content every month, and I (at least currently) don’t want to write blog posts just for the sake of writing them – they must provide some value and be useful to the people reading them, and not just be something copy/pasted from another website.

I have a lot of side projects going on, and many of them are projects I never fully complete and which therefore don’t make sense to write about – and other side projects are just very long-term, like the whole Proxmox/pfSense story, which never fully ends. There are many IT-related skills I want to improve, but for those of you working with IT, you probably know it takes a lot of time to become good at something. Getting good ideas for meaningful blog posts and writing everything down with screenshots/illustrations, relevant references etc. is a slow process, and it always takes me longer than I imagined.

I have ideas about writing about k3s/k8s, my running Docker containers and cyber-security (hacking/cracking) in the future, but my cyber-security experience comes from particular CTF competitions, and perhaps it’s best if I can generalize it instead of doing write-ups for particular challenges. No matter what, I can see that having the blog, maintaining it and writing posts is rather time-consuming.

Future intention: To publish less and still keep the quality

I’ve therefore decided that future posts will be less frequent – more sporadic – than my original goal of publishing 1 blog post per month (that’s impossible as I see it). Lately I’ve been focused on job interviews, and I’ll also change job/career from the first of June, which will probably take some extra time and energy. Furthermore, in July/August I’m usually away travelling in Europe/UK for almost 3 weeks, and with 3 weeks out of the calendar around that time, it’ll be almost impossible to also develop a new blog post of decent quality. Hence this explanation.

My future goal is not to post a blog post every month (as I earlier communicated, e.g. in comments), but hopefully at least once every second or third month (no promises; I cannot guarantee anything). The good thing is that with the RSS-feed subscription method mentioned at https://mfj.one/, the section “Consider subscribing to the RSS-feed, if you like the content” tells you how to automatically get updates when new posts are written. The bad thing is that search engines don’t like it when I don’t update frequently, leading to fewer readers, but that’s just how it has to be for a one-man hobby project…

Conclusion

Thanks for reading this – the blog website is not dead just because it’ll be updated less than every month going forward; I just also have to focus on other things, like having a life, my daily job (soon a new job) and soon the summer holiday. I also appreciate all the good comments below the blog posts – they make me want to continue writing once in a while, so thanks a lot for that!


How to get (set up and run) a totally free hosted email server in the cloud?

Introduction

As some readers of this blog may know, I’m pretty happy with Oracle Cloud for learning, experimenting and improving my cloud skills. The reason is that it’s free – and not only that: I feel I get a lot of free resources (up to 4 instances, 24 GB RAM and 200 GB of storage; see my previous blog posts about this). This blog post describes how I’ve found another way to utilize the free Oracle Cloud resources (in addition to this self-hosted website). The intention is to inspire, and I see four possible options:

  1. Opt for a free tier: https://www.oracle.com/cloud/free/ and leave it that way (NB: don’t misuse it).
  2. Upgrade the free tier to “Pay As You Go” (PAYG) which I did (don’t misuse it, because it’s still free!).
  3. Self-host on your own (typically older, unused?) hardware and open up your firewall to the internet (allow in-bound traffic) via your router’s admin configuration pages… You’ll pay for the consumed electricity, but that’s probably still cheap…
  4. Use another cloud provider of your choice (typically you’ll have to pay, so I see this as the most expensive option).

Notice that with OCI (https://www.oracle.com/cloud/), even though you’re registered as a PAYG customer, you still don’t have to use any resources that cost anything. Therefore I’m emphasizing: please don’t misuse the free resources, and don’t leave instances running if they’re idle and not doing anything.

Next, it’s important to mention that in this blog post I’ll describe the initial steps of how to set up an email server and run it for free via OCI (but you should be able to do everything with a normal Ubuntu instance running wherever you want). I used “Ubuntu 22.04.2 LTS”…

Explanation of why I almost gave up and why care?

The reason I began looking into this topic is that my (soon-to-be previous) hosting provider increased their prices in 2022-2023 much more than I think is reasonable (blaming it on the Ukraine war, increasing energy prices, inflation etc.), so what can you do?

It’s not that I cannot afford it – I just thought it was (and is) unreasonably expensive, and I also wanted to experiment and learn new things. For this domain I don’t care about web hosting, because I have a very simple webpage and it’s relatively simple to set up Apache/Nginx… It’s not so common to control your own mail server (compared to a web or file server), so I thought this was an interesting and good challenge for me. I began messing around with different solutions and arrived at Mailcow: “The mailserver suite with the ‘moo’“.

It doesn’t run directly on the OCI Ampere ARM64 platform that I’m using, and that caused me a lot of pain… It seems it’s primarily made for the AMD64 platform (also known as x86-64 or x64).

So you might ask: doesn’t Oracle provide a free AMD64 instance you can use? They do, and I actually have one of them, but it only comes with 1 GB of RAM, whereas my 3 ARM instances can share up to 24 GB of RAM in total (for free) – so each of my ARM instances has 8 GB of RAM, and Mailcow requires at least 6-7 GB of RAM… It took a long time to figure out how to run Mailcow on ARM, so also for that reason I wanted to share my experiences – hopefully this can make things easier for other people in the same boat.

What this blog post will not cover…

Because I couldn’t make Mailcow work on the ARM processor for several days, I almost gave up and began looking into other alternatives. Eventually, however, I figured that Mailcow looked like the most promising of the mail servers I wanted to run. And this is where I (maybe) made a mistake: I found a really cheap hosting provider and moved my email and web domain (not the one for this site) to that provider. I can live with that, because setting up all the outgoing-email mumbo jumbo is boring and tiresome, and therefore the main limitation of this blog post is that I’ll cover neither DNS nor the outgoing-email setup (at least not this time).

With that decision I’ll be using a self-signed certificate, which doesn’t work well for encrypted or outgoing email communication… Sounds bad? Maybe – but I’m sure these things can be fixed relatively easily after the preliminary steps covered here have been taken, and then it’s up to the reader whether to continue where I left off. NB: I’ll provide some hints later, because I did experiment with setting up outgoing email on OCI around 6 months ago.

About the DNS records: setting them up isn’t difficult to do yourself, and you can easily find that information on Google (and ChatGPT probably also knows it, if you’re using that). Use keywords such as: “outgoing email server setup”, “A, MX, CNAME, TXT, PTR records” and “ARC/DKIM key”.
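If you do go down that road later, a quick hedged way to inspect what a domain’s mail-related records currently look like is the dig tool (using the placeholder domain and IP address that appear later in this post):

$ dig +short MX myDomain.com        # where mail for the domain is delivered
$ dig +short TXT myDomain.com       # SPF / DKIM / verification records
$ dig +short A mail.myDomain.com    # the address of the mail host itself
$ dig +short -x 151.123.36.103      # PTR (reverse DNS) for the mail server IP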

Having a self-hosted Mailcow solution is more risky than using a hosting provider. Why should I set up DNS for my email domain when I just bought a cheap 1-year hosting solution, and that provider has a much more professional setup (automated backups, support etc.)? With the paid solution I’m also pretty sure I won’t lose any emails, and outgoing email works right out of the box. On the other hand, self-hosting is a good way to learn things.

It’s up to you to decide whether you want to continue along this path, but if you read through this guide and also take the extra steps with DNS for your mail domain and make it work properly with certificates and in-/outgoing email, I’d appreciate it if you write about it in the comments with your remarks. WARNING: be sure to test outgoing emails with e.g. https://www.mail-tester.com/ – it’ll help a lot… I might also continue this blog post later, perhaps just before my new 1-year hosting subscription runs out…

What I’m not covering here should hopefully not discourage people from going for the full-fledged self-hosted Mailcow solution – I’ve seen a lot of people who are really happy with Mailcow and report that it just runs for years without problems…

I’ll also not cover anything related to IT security in this blog post, because I expect that Mailcow has e.g. brute-force password protection and similar built in by default – otherwise, please leave a comment and I’ll look into it and possibly update this blog post.

Requirements / software dependencies

I’ll/we’ll be using:

As mentioned, I’ve used a setup with a simple Ubuntu 22.04.2 LTS server, but I think other versions and distributions, including Debian, would work in a very similar way.

Installation procedure

Unfortunately I haven’t written down every detail but the following is the installation procedure I used (YMMV):

In general, you should follow the guidelines and steps from:
https://docs.mailcow.email/i_u_m/i_u_m_install/#installation-via-script-standalone which are:

  • Install GIT
  • Install Docker
  • Install Docker-compose

I’ll not cover how to do these things, because if you’ve read my blog posts and find them interesting, I assume you know how to do them – or can figure it out quickly. I’m also not completely sure I did things exactly as described in the Mailcow documentation, but at least I ended up having:

docker-buildx-plugin/jammy,now 0.10.2-1~ubuntu.22.04~jammy arm64 [installed,upgradable to: 0.10.4-1~ubuntu.22.04~jammy]
docker-ce-cli/jammy,now 5:23.0.1-1~ubuntu.22.04~jammy arm64 [installed,upgradable to: 5:23.0.2-1~ubuntu.22.04~jammy]
docker-ce-rootless-extras/jammy,now 5:23.0.1-1~ubuntu.22.04~jammy arm64 [installed,upgradable to: 5:23.0.2-1~ubuntu.22.04~jammy]
docker-ce/jammy,now 5:23.0.1-1~ubuntu.22.04~jammy arm64 [installed,upgradable to: 5:23.0.2-1~ubuntu.22.04~jammy]
docker-compose-plugin/jammy,now 2.16.0-1~ubuntu.22.04~jammy arm64 [installed,upgradable to: 2.17.2-1~ubuntu.22.04~jammy]
python3-docker/jammy,now 5.0.3-1 all [installed,auto-removable]
python3-dockerpty/jammy,now 0.4.1-2 all [installed,auto-removable]

It shouldn’t cause you too many problems to install Git, Docker or Docker Compose. Next, we need to clone the Mailcow repository:

$ su
# umask
0022 # <- Verify it is 0022
# cd /opt
# git clone https://github.com/mailcow/mailcow-dockerized
# cd /opt/mailcow-dockerized

And we can generate the configuration:
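If I remember correctly, this is done with the generator script shipped in the repository (it asks for the mail server FQDN and the timezone):

/opt/mailcow-dockerized# ./generate_config.sh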

You might want to manually check the generated configuration, e.g. run: “vim mailcow.conf” – especially if you want to enable the webmail client “SOGo”.

At this point you should decide whether you want to enable the SOGo webmail client, and I strongly recommend it – at least try it out; you can always disable it again later. Note that by default it’s disabled, so you need to actively do something to enable it.

The webmail client SOGo also has a calendar and an address book, but the webmail part is what I’ve been using it for, and I think it’s useful when we later migrate emails from an existing (old) mail server to the new one. If you decide not to use SOGo as your webmail client, you can also set up e.g. Thunderbird or a similar mail program – I tried that too.

With Thunderbird, however, I manually had to right-click the account, click “Get messages” and accept a warning so it would ignore the missing encryption. This is a consequence of not yet having assigned the domain correctly via DNS, which means Mailcow uses a self-signed certificate – not a big deal for what I did, but if you’re handling sensitive emails you also don’t want unencrypted email communication. For testing that the mail server works, the missing trusted certificate is a good reason to just use the webmail client (no mail data crosses the internet when the webmail client runs on the mail server itself).

You enable the webmail client by editing the “mailcow.conf” file with an editor (I use vim; install whatever you prefer) and making the modification seen below (SKIP_SOGO should be “n” for “no” to enable it; by default it’s “y” for “yes”):

# Skip SOGo: Will disable SOGo integration and therefore webmail, DAV protocols and ActiveSync support (experimental, unsupported, not fully implemented) - y/n

#SKIP_SOGO=y
SKIP_SOGO=n

I’m not sure I understand the reasons for not enabling SOGo, but the comment says it’s experimental, unsupported and not fully implemented. For what I used it for, it worked perfectly fine during the days I tested it. Also, it’s easy to disable again in case of unexpected problems.

Fixing platform-dependent Docker-image errors, using docker-compose.override.yml

At this stage, the official Mailcow-documentation tells us to:

docker compose pull
docker compose up -d

But that won’t work (try it if you want). If you do, you’ll notice 4 of the containers keep restarting every minute.
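A quick way to see which containers are crash-looping, and why, is roughly the following (rspamd-mailcow is only used as an example service name here):

/opt/mailcow-dockerized# docker compose ps                             # look for services stuck in a restart loop
/opt/mailcow-dockerized# docker compose logs --tail=50 rspamd-mailcow  # read the last log lines of a failing service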

There is more than one problem here – the first has already been mentioned: we need images that work on linux/arm64 instead of linux/amd64 (or QEMU emulation for the ones that only exist for amd64), and we need some extra packages installed. So you should do:

# First install some packages that allow us to do some QEMU ARM emulation:
/opt/mailcow-dockerized# apt install qemu-system-arm binfmt-support qemu-user-static

The next problem is that we need to downgrade some of the images – create a file named “docker-compose.override.yml” with the following:

/opt/mailcow-dockerized# cat docker-compose.override.yml
version: '2.4'
services:
  unbound-mailcow:
    image: quay.io/mailcowarm64/unbound
  clamd-mailcow:
    platform: linux/amd64
  rspamd-mailcow:
    platform: linux/amd64
  php-fpm-mailcow:
    image: quay.io/mailcowarm64/phpfpm
  sogo-mailcow:
    image: mailcow/sogo:1.114
    platform: linux/amd64
  dovecot-mailcow:
    image: mailcow/dovecot:1.21
    platform: linux/amd64
  postfix-mailcow:
    image: quay.io/mailcowarm64/postfix
  acme-mailcow:
    image: quay.io/mailcowarm64/acme
  netfilter-mailcow:
    image: quay.io/mailcowarm64/netfilter
  watchdog-mailcow:
    image: quay.io/mailcowarm64/watchdog
  dockerapi-mailcow:
    image: quay.io/mailcowarm64/dockerapi
  solr-mailcow:
    image: quay.io/mailcowarm64/solr
  olefy-mailcow:
    image: quay.io/mailcowarm64/olefy

We can now spin up all the containers, just as the official documentation told us:

/opt/mailcow-dockerized# docker compose pull
/opt/mailcow-dockerized# docker compose up -d

After a minute or two, you should be able to access https://${MAILCOW_HOSTNAME_OR_IP_ADDRESS} with the default username “admin” and default password “moohoo” – except that you’ll probably get a certificate warning: “Your connection to this site is not secure”. I assume that if you’re interested in the stuff I write about on this blog, you’ll know how to ignore or handle this – or can quickly figure it out:

First login: Enter default credentials and you can login – but you might still experience issues…

As you can see in the screenshot above, I accessed the mail server directly via its IP address – if (or when) you make the proper DNS changes pointing to your domain, you should be able to access it via the domain name that you provided in the Mailcow configuration. Until then, it obviously doesn’t matter which FQDN you gave it earlier.
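A quick hedged sanity check that the web UI is answering on the bare IP address, despite the self-signed certificate, could be (again using the placeholder IP from the screenshots):

$ curl -k -I https://151.123.36.103/   # -k skips certificate verification; any HTTP status line beats a connection error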

WARNING: Immediately after I tried to login, I noticed that nothing happened – I was redirected to “…/debug”, see below:

After every login, absolutely nothing happened – and the page redirects to some “debug”-page? A work-around is to manually just replace “debug” with “admin”…

I’m not sure what the real issue behind the empty “debug” login page is – it’s not something I investigated a lot. I found out that the solution is simply to replace “debug” with “admin” in the URL, then refresh the page, and it’ll work.

NB: If you find a better solution or perhaps an explanation, please write it in the comments and I’ll update this blog post, thanks!

For testing the mail server, I think this procedure is perfectly valid (without changing DNS yet). For real production use, I believe you can still do as described here and wait with the DNS changes pointing to your domain until all emails have been migrated. You don’t want to lose emails, so be sure you’ve set up all email accounts on the new server before the DNS changes are in place – and preferably you want all emails in place when you decide to go live…

This procedure should hopefully have the effect that no emails are rejected or lost until the new DNS settings are applied and you can turn off the old mail server. So far, the email server and accounts haven’t been configured – we’ll do that now, and also transfer (i.e. copy) the emails from the old to the new mail server (i.e. perform the email migration procedure).

Email migration, configuration and working with the mailserver-UI

If you’ve arrived here and followed the steps I’ve described, you’ll now have successfully logged into your new free mail server. Congratulations: I think the worst is behind us!

Although I’m not using the Mailcow email server for “production” (maybe in the future I will), I want to take this blog post a few extra steps ahead and explain what to be aware of and how to proceed, should you decide to use Mailcow in a similar way as outlined here:

The first thing to do, is to replace the credentials for the default admin-user…

The user-interface helps us a lot and I find many of the options or settings to be pretty self-explanatory and user-friendly.

Setting up mailboxes (user accounts)

Before we can setup mailboxes, we need to configure the domain:

Select “Email”-configuration in the top right corner – next click “Add domain”.

Once the domain has been configured, we can begin to add individual mailboxes (for each user on the system):

In this example, the “testuser”-mailbox has been added. Expand by clicking the “+”-icon for more settings.

At this moment I think it’s very helpful to have enabled the SOGo-webmail client, because it’s now easy to test that the mailbox has been configured properly and works:

First login via the SOGo webmail-client to test the mailbox-configuration.

You should now try to log in to the webmail client, if you enabled it via the mailcow.conf configuration file above. Otherwise you should set up your mail program, e.g. Thunderbird, to connect to the new mail server.

Assuming you did as I proposed (you can always connect your mail program later), this is what we’ll see after logging into the webmail client:

We cannot write outgoing messages because it requires us to properly configure DNS-records or our outgoing emails will (typically) be rejected.

It looks pretty nice, doesn’t it? At least it looks like most other webmail client interfaces I’ve seen (just with slightly different colors). The SOGo webmail client also has an option to show an address book and a calendar, but I’ve decided not to show these features here, mainly because I doubt I’ll ever need yet another online calendar/address book besides what I already have on my mobile phone and in my email program…

Email migration using “sync jobs”

We now want to copy messages over from the old mail server to the new one. The easiest way to perform the email migration is to use the built-in “sync jobs” feature:

Select “Sync Jobs” and click the green “Create new sync job”-button.

After clicking the green button, one needs to fill out some information:

Information needed for a “sync job”.

That information is the same as for the “alternative method” described below, and the reason is that both methods use the “imapsync” tool.

Notice that it took a while before I could see that the job had started (I cannot remember exactly what happened, but suddenly it was running). After it completed, the job was still there, so apparently it was set to periodically retrieve emails from the old mail server and copy them to the new one. I ended up deleting the job so it didn’t keep polling the old mail server.

If I hadn’t just been testing, I would have kept the sync job for perhaps a week and/or manually kept an eye out for emails not arriving at the new mail server. Finally, I would unplug the old mail server once all DNS servers had had a realistic chance to update their records with the new domain’s IP address (a day or a few days should be enough).

Email migration using “imapsync”

The “imapsync”-tool has a webpage at https://imapsync.lamiral.info/ and it describes itself as:

a command-line tool that allows incremental and recursive IMAP transfers from one mailbox to another, both anywhere on the internet or in your local network. Imapsync runs on Windows, Linux, Mac OS X. “Incremental” means you can stop the transfer at any time and restart it later efficiently, without generating duplicates. “Recursive” means the complete folders hierarchy can be copied, all folders, all subfolders etc. “Command-line” means it’s not a graphical tool,

Gilles LAMIRAL, author of the Imapsync

So what is it and how does it work? I think it’s a well-tested and well-known program for doing exactly this task: Moving/copying emails from one server to another.

According to https://imapsync.lamiral.info/S/imapservers.shtml the tool has been reported to successfully work with at least 83 different imap servers, incl. Dovecot, Apple Server, Exchange Server, Gimap (Gmail IMAP), Hotmail, Office 365, Yahoo and several more. In order to use the tool, either you:

  • use a free online version of the tool (I however did not test it – test on your own responsibility) – see: https://imapsync.lamiral.info/X/
  • or download it and run it from your own computer (this doesn’t have to be a mailserver, a normal laptop is fine).

The reason I don’t like the free online imapsync tools is that I don’t know if I can trust whoever is behind that webpage. I don’t know if they’ll keep a copy of my private emails, and I would just have handed over my login credentials and passwords… At the very least, quickly change your passwords after having used such a tool…

I recommend doing as I did: download and install imapsync on a PC and specify source and destination servers (maybe you can install it on the mail server itself and use localhost as the destination, but I didn’t try that). Once installed, the most basic thing one can do is to query an IMAP server for its software name and version:

$ imapsync --host1 imap.gmail.com
...
...
Host1: probing ssl on port 993 ( use --nosslcheck to avoid this ssl probe ) 
Host1: sslcheck detected open ssl port 993 so turning ssl on (use --nossl1 --notls1 to turn off SSL and TLS wizardry)
SSL debug mode level is --debugssl 1 (can be set from 0 meaning no debug to 4 meaning max debug)
Host1: SSL default mode is like --sslargs1 "SSL_verify_mode=0", meaning for host1 SSL_VERIFY_NONE, ie, do not check the server certificate.
Host1: Use --sslargs1 SSL_verify_mode=1 to have SSL_VERIFY_PEER, ie, check the server certificate. of host1
Host1: Will just connect to imap.gmail.com without login
Host1: connecting on host1 [imap.gmail.com] port [993]
Host1 IP address: 142.250.147.108 Local IP address: 192.168.1.57
Host1 banner: * OK Gimap ready for requests from 101.94.73.38 b8mb128406701wrp
Host1 capability: IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 XYZZY SASL-IR AUTH=XOAUTH2 AUTH=PLAIN AUTH=PLAIN-CLIENTTOKEN AUTH=OAUTHBEARER AUTH=XOAUTH AUTH
Host1: found ID capability. Sending/receiving ID, presented in raw IMAP for now.
In order to avoid sending/receiving ID, use option --noid
Sending: 2 ID ("name" "imapsync" "version" "2.229" "os" "linux" "vendor" "Gilles LAMIRAL" "support-url" "https://imapsync.lamiral.info/" "date" "14-Sep-2022 18:08:24 +0000" "side" "host1")
Sent 181 bytes
Read: 	* ID ("name" "GImap" "vendor" "Google, Inc." "support-url" "http://support.google.com/mail" "remote-host" "101.94.73.38" "connection-token" "b8mb128406701wrp")
  	2 OK Success b8mb128406701wrp
Exiting after a justconnect on host(s): imap.gmail.com

It writes a lot of information to the screen, and the --host1 argument tells it which “source” host to use. In this case Google presents itself as an IMAP server with the ID “GImap”, followed by some host capabilities which we can probably ignore.

For doing the migration, we need to add more than “--host1 <sourceServerName>” to the command line. I recommend looking in the official manual https://imapsync.lamiral.info/README and/or just trying something like this, which is what worked for me:

$ imapsync --nofoldersizes --addheader --subscribeall --automap --tls1 --host1 mail.myoldexpensiveprovider.com --user1 postmaster@myDomain.com --password1 BAD_PASSWORD_OLD_SERVER --port1 143 --host2 151.123.36.103 --user2 postmaster@myDomain.com --password2 BAD_PASSWORD_NEW_SERVER --no-modulesversion --noreleasecheck

In this example, assume the domain “myDomain.com” points to the existing/old mail server and that my new mail server’s IP address is 151.123.36.103 (it isn’t really, but I also used that in the screenshots). So “host1” is the source and “host2” is the destination – likewise for “password1” and “password2”. The remaining settings stem from when I first ran imapsync via the “sync job” method: when I later did the same thing from the command line, I copy/pasted the command-line options it had used, which I knew worked.

All I had to do was change a few arguments relating to --host2 (e.g. I added the --user2 and --password2 arguments, because I was no longer running imapsync on localhost). In any case, the command line shown above is the one that worked for me, so feel free to be inspired and do something similar.

I’ll not show all the output this command produces, but it automatically logs the output to a subdirectory named LOG_imapsync, so you don’t need to pipe the output through “tee”, use shell redirection or anything like that.
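A hedged way to keep an eye on a running sync, without guessing the exact log file name, is to follow the newest file in that directory:

$ ls -lt LOG_imapsync/ | head -n 5                           # newest log files first
$ tail -f "LOG_imapsync/$(ls -t LOG_imapsync/ | head -n 1)"  # follow the most recent log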

Future steps about this matter

Obviously, with this solution there’s no automated backup – you probably want to add that functionality if you store important emails.

If you want to be just a bit serious, you also want to set up DNS and outgoing email so it passes the tests at https://www.mail-tester.com/ (and/or receives a high score). I currently have another instance running on OCI, running https://hestiacp.com/ – and in that regard I remember having problems with setting up outgoing email, because the free OCI IP address range contained IP addresses with a very bad email reputation.

This bad IP address range automatically results in a low score on mail-tester.com. I fixed that problem by using a free email relay service (smtp-relay.sendinblue.com, port 587), but that’s another story for another day. Outgoing email is cumbersome to set up, but I know it’s possible, because I’m doing it on another OCI instance. This blog post describes the steps needed to get started, but I didn’t want to spend time on outgoing email and DNS records this time. An alternative to using e.g. sendinblue.com for SMTP relaying is probably to pay Oracle (or whoever you’re using) for access to a good IP address range, i.e. one that isn’t blacklisted (that’s beyond the scope of this blog post; I’m just mentioning it in case you’re heading in that direction).
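If you head down the relay route at some point, a quick hedged way to confirm that a relay endpoint is reachable and speaks STARTTLS on port 587 – before touching any mail server configuration – is something like:

$ openssl s_client -starttls smtp -connect smtp-relay.sendinblue.com:587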

In general, I don’t think it’s difficult to set up all the DNS and outgoing-email stuff. If you continue along this path and point your domain to the new IP address, this will also enable you to get a proper, valid, signed certificate for HTTPS encryption and to activate SSL/TLS encryption and so on. I’m just not going there this time, because:

  • I’ve already been there, done that – it’s not so fun to do things when you already know how to do it.
  • I don’t need it until perhaps beginning of next year and when we get there, I’ll consider it and make a part 2 of this blog post (still with Mailcow I think).
  • It just takes too much time to describe and would make this blog post unreasonable long.

PRO/BONUS TIP: In connection with moving my emails from my old hosted mail server to the new one, I also changed the DNS settings and sent some test emails. I discovered that:

  • Google GMail almost immediately picked up the DNS-changes (within a minute or so) and correctly delivered the emails to the new mail-server.
  • I also tried that same thing with other mail-accounts whose mail-servers didn’t deliver to the new server until after several hours or half a day or something…

I found it very interesting that I could immediately see that Gmail picks up DNS changes quickly, and once I knew I could both send and receive, I knew everything had been done correctly. I could relax and go to sleep, and just wait for the remaining DNS servers across the world to pick up the changed DNS records…

Conclusion

In this blog post I’ve explained how you can set up a fully working email server – without, however, going into the details of setting up DNS and outgoing email. The best thing about it is that it’s completely free, and if you do as I did, you don’t even pay the electricity bill! Isn’t that great?

As you might have realized by now, I think Mailcow is a great project, and you can obviously use the same method to freely host a web server, which is even easier to do on the ARM platform (and probably better supported)… I’m really happy that the issues I faced eventually worked out and that I figured out how to use the docker-compose.override.yml file not only to make everything work on ARM CPUs, but also to pin the image versions that otherwise caused some weird errors (which I’ve luckily forgotten all about by now).

There were other minor issues I discovered along the way, but I encourage readers to test my setup, go through this blog post and write me a comment on whether you succeed and whether this is useful. Also write me a comment if you think I should describe something better, rephrase something, or if there are better options available that I don’t know about – and thanks for reading this to the end.


Introduction to keyboard ergonomics (mostly for people typing +6 hours a day on a keyboard)

Why I think keyboard ergonomics is important

In this post I’ll explain what I do, and have done for many years, to minimize the problems that come from typing on a keyboard for sometimes many hours a day – first at work and later back home in the evenings/weekends.

I’ll go through my own experiences and explain how keyboard ergonomics became more important to me over the years. I’ll describe some mistakes I didn’t know were mistakes for many years, and I’ll explain why it’s a problem if you type for many hours on a keyboard and begin to feel pain. I’ll go through some different keyboard types and explain why I changed keyboard layout. I’ll also use this post as an introduction to the concept of automated (software-based) keyboard layout analysis and introduce some concepts that I – and most people – normally don’t think about in the search for a so-called “optimal keyboard layout”. Finally, I’ll add some links to additional advanced background information that I didn’t feel I had time to write about.

Historical background:

When I was a young boy and started using computers, I, like many others, didn’t think about keyboard ergonomics. When I became a teenager I had begun programming simple things, and in my mid-teens I began learning and using assembly/machine code, mainly because the PC I had was too old for many of the games my classmates had on their newer and more modern PCs. The years up until I was almost 30 were spent mostly in the education system, and I used computers in probably the same way people use them today: for school work, email/internet and video games. But then something happened to me:

I began using Emacs on Linux “professionally”…

As a young engineering student, I remember when I started doing more “advanced” reports for higher-level engineering courses, all of which needed proper mathematical typesetting. I looked around me and all the brightest people recommended not using Microsoft Word for such reports. The alternative to MS Word – and an open source one – was to learn LaTeX. If you don’t know it, LaTeX is the preferred typesetting system for researchers, mathematicians, scientists and engineers, and it is an incredible tool for making nice and great-looking reports with proper mathematical expressions and symbols – although there is a steep learning curve, so it isn’t for everyone. Also, LaTeX isn’t a WYSIWYG-editor like MS Word and similar editors – you write articles as plain text-files with a syntax somewhat similar to HTML and then run a program to process that text-file and generate a PDF-file.

Because I was already, by that time, a proficient and experienced Linux-user, I thought about what the best editor would be for producing my LaTeX-reports. It came down to a choice between either VIM or Emacs – and I remember it took a long while to make my decision and there were pros and cons for each, see e.g. https://en.wikipedia.org/wiki/Editor_war. I joined the comp.text.tex-newsgroup, bought LaTeX-books and eventually decided to go with Emacs, because it had AUCTeX and preview-latex which looked great as seen below:

The screenshot examples above are from the preview-latex webpage. Although LaTeX input files are plain text, this combination helped to visualize graphics and equations without compiling the whole document, so it was easy to quickly check if something looked correct (like mathematical expressions).

Everything was fine for many years…

In a typical work week I spend around 37 hours doing my primary work and, give or take, an average of 15 additional hours at home working on my side projects – roughly +50-55 hours a week. That’s a lot of hours typing on a plastic keyboard. It’s probably been like this for +20 years and it’s a good balance for me, because I’ve seen that when I work more than 55 hours per week, it becomes too much.

For years I didn’t notice that working too much with a keyboard (and mouse) could be a problem. When I was around 30 years old, however, I began to feel pain or aching in my fingers, hands or wrists, typically after working too many hours per week in front of a keyboard and computer. I understood that these were RSI warning-signs. I’ve read about this many times since then, and typically people experience problems with:

  • shoulders
  • elbows
  • forearms and wrists
  • hands and fingers

When I’ve been typing on a keyboard for too many hours (>55-60 hours per week) I know from experience that I usually experience pain, aching etc. in arms, wrists, hands and/or fingers (the last 2 categories), and this is also when I know that I should soon begin to relax – do something without a computer for some days, e.g. in the weekends. The pain doesn’t come from one day to the next – it gradually becomes worse if ignored… I don’t remember ever having problems with shoulders or elbows. I’ve read that the most common repetitive overuse injury in the hand is called “tendinitis”, i.e. what happens when a tendon becomes inflamed – and for me, this is when I should go to the gym, watch movies or do something completely without a computer.

Part of the solution

I remember that when I was a PhD-student – around 12 years ago – I made the decision to switch keyboard layout and go away from the typical QWERTY-layout. I started using Colemak instead of Dvorak, which was also a serious candidate (more about that decision below). I switched to avoid RSI-symptoms and thereby improve productivity – I believe it’s more ergonomically correct and healthy. I found a lot of internet discussions where people argued about why a given keyboard layout was or is “optimal”. I also joined internet forums with people discussing keyboard-layouts. Overall, Colemak still allows me to fall back to a normal QWERTY-layout because not all keys are different – only 17 keys are changed compared to QWERTY. Furthermore, “Z-X-C”, typically used for undo, cut and copy-operations, were left in place. This made it relatively easy to switch between layouts when I e.g. used other computers than my own.
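
To put a number on the “17 keys” claim, here is a tiny Python sketch (my own illustration, not an official Colemak tool) that compares the three letter rows of the two layouts position by position; it ignores Colemak’s Caps Lock-to-Backspace remap:

# Compare the three letter rows of QWERTY and Colemak, key by key.
QWERTY  = "qwertyuiop" + "asdfghjkl;" + "zxcvbnm"
COLEMAK = "qwfpgjluy;" + "arstdhneio" + "zxcvbkm"
changed = [(q, c) for q, c in zip(QWERTY, COLEMAK) if q != c]
print(len(changed))  # 17 keys differ
print(changed)       # e.g. ('e', 'f'), ('r', 'p'), ('t', 'g'), ...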

It was a big decision because it took probably 2-3 months to become comfortable with Colemak and obtain reasonable typing speed/accuracy, and things were slow in the beginning (there are free online websites and applications with practice lessons to help make the transition). Colemak was/is also a smaller step to learn than Dvorak, where pretty much all keys have been swapped. I think in terms of “muscle-memory” the switch between QWERTY and Dvorak is much harder compared to the switch between QWERTY and Colemak.

In the years since I switched, I’ve also only used mechanical keyboards instead of the cheaper plastic membrane keyboards that are used in many workplaces – but are ergonomically horrible (at least for me). Even among experienced software engineers or people typing all day, I see people who don’t understand the difference between a bad and a good keyboard. People tend to stick with cheap plastic membrane keyboards and also recommend these – and probably it’s because we’re all different (if that works for them, it’s fine with me). I know from experience that it isn’t good for me, long-term, to use a plastic-membrane keyboard, even if it’s called an “ergonomic keyboard” (these often come with a slightly different layout, but the quality is the same as the cheapest keyboards). When I read internet forum discussions about this problem, it’s generally also mostly people who type on a keyboard for at least 6-8 hours a day who have a qualified opinion about it.

But it wasn’t just the keyboard layout and the keyboard – it was also the editor…

Eventually, only around 4 years ago, I came across a webpage that told me that “Emacs is evil”. While making this post I’ve tried to Google and search for that webpage, but unfortunately I cannot find it, so it probably doesn’t exist any longer. I however think that the “Emacs is evil”-website was a big eye-opener for me, and it wasn’t the editor itself – it was the default keybindings or key-combinations where modifier keys (e.g. the control-key) were needed for a lot of common tasks such as opening/saving files, compiling a program/document, changing file, searching for something (grep) etc.

I always had to type “control”-something (e.g. C-x C-f and C-p+ or C-h+something) too many times per day, and that pinky-finger stretching became problematic. In retrospect, I think if it wasn’t for all that Emacs finger-stretching the whole time, I wouldn’t have looked into keyboard ergonomics, at least not at that age. I’ll now quote an almost 10-year-old question and a bit of the answer. Question: Does Emacs cause “Emacs pinky”? The answer from http://ergoemacs.org/emacs/emacs_pinky.html is from a page that doesn’t exist anymore, but it was quoted while it existed and I’ll repeat the quote below because I think it’s correct and useful:

Emacs makes frequent use of the Control key. On a conventional keyboard, the Control Key is at the lower left corner of the keyboard, usually not very large and is pressed by the pinky finger. For those who use Emacs all day, this will result in Repetitive Strain Injury.

From the (now removed) ergoemacs.org-website

The Wikipedia article on Emacs also has a small section about “Emacs pinky”: https://en.wikipedia.org/wiki/Emacs#Emacs_pinky – and the website http://wiki.c2.com/?EmacsPinky defines “Emacs pinky” as “hitting Control so much your pinky hurts“. I believe the eye-opening, lost “Emacs is evil”-website was created by Xah Lee (based on my memory), who has written a lot about Emacs and also created some packages, e.g. “ErgoEmacs Keybinding: an Ergonomics Based Keyboard Shortcut System“, xah-fly-keys (claiming “this is the most efficient editing system in the universe”) and who previously ran http://ergoemacs.org/ (no longer working).

It’s a pity that some of these webpages aren’t there any longer, but at least the gist of the problem is hereby passed on. One thing to add here is that Xah Lee has a page about ergonomic keyboards which might be useful, if you’re reading this and find it to be of interest. Last but not least, I found it a bit interesting to read Xah Lee’s article about the History of Key Shortcuts: Emacs, vim, WASD, Etc. After having read all about the Emacs pinky-issue, I understood that I shouldn’t continue to work as I had done earlier. I tried to look for a solution and Xah Lee also has an article about that: How to Avoid the Emacs Pinky Problem. One of the solutions I could see was to manually remap my keys, at least the modifier keys – another solution was to use “Evil Mode” in Emacs, which is a way to get “vim”-keys working in Emacs.

I went away from using Emacs and changed to VIM/Neovim

The idea of using a solution where some keybindings are taken from another editor and used in Emacs, or where keybindings are remapped from the standard layout, wasn’t an idea I liked. When I customize keys I don’t want to customize everything – only small changes that deviate from the standard. I decided to switch my main editor to Vim/Neovim, which I believe is a really good solution, and I haven’t looked back since. Vim-key emulation is also relatively common and found in e.g. Visual Studio Code and Qt Creator, where I’ve obviously enabled it for quick navigation. VIM is wonderful – I feel it’s incredibly powerful and it’s one of the reasons I love Linux.

The sections below are gradually more advanced – probably not for everyone – but it pretty much summarizes my knowledge today in this area.

About the mechanical keyboards I’ve owned

The section below is about the mechanical keyboards I have owned, and I’ll explain a bit about why I bought them and what I discovered. If you’re new to the concept of mechanical keyboards, you might not know about the concept of different key-switches – I recommend googling for related keywords.

I have hands-on experience with the following keyboards:

  1. My first mechanical keyboard was a Das Keyboard – I don’t remember the exact model – and it was excellent in the way that it didn’t look like a spaceship or something completely untraditional. It also looks good in a workplace-setting and nobody comes asking questions about why you have such a “weird keyboard” on your desk.
  2. I think I later bought the ErgoDox EZ (not the Moonlander, it was some years before that) – but this is the kind of keyboard that attracts attention in the office, because it’s a split keyboard and not many have them. The theory about using split keyboards is that they’re more ergonomic because the user can easily adjust the distance between the halves, which should reduce shoulder tension etc. I think that’s correct, but my main concern isn’t with my shoulders.
  3. I then bought my first Kinesis Advantage 2 keyboard around this time – and it’s overall the best keyboard I’ve ever had, although it’s also the most expensive I’ve ever bought. It’s a very special keyboard, because it has concave keywells, scooped into a bowl shape. This reduces hand and finger extension, and its separate keywell-positions keep the wrists straight and perpendicular to the home row. Like the Typematrix-keyboard, this keyboard also has keys arranged in vertical columns, which supposedly should be more “natural” – and I think it is.
  4. Next up, I bought the Typematrix 2030 keyboard, because it was different: it was small and portable, so I could bring it around to other places, and at the same time it had straight vertical key columns. I think the straight vertical key columns were a bit overrated for me, and since it wasn’t a mechanical keyboard, what I didn’t know was whether the advantage of vertical columns outweighed the disadvantage of using yet another plastic-membrane keyboard. In my experience it wasn’t worth it – but the experience gained was fine and I don’t have anything bad to say about the vertical column layout, I just don’t feel it made much of a difference for me, and I can see that it’s much more important (for me) to use a mechanical keyboard. Furthermore, I felt it was a bit annoying if I swapped too much between vertical and staggered/normal columns, reducing my typing speed and/or accuracy (and at that time I had a job where I often had to use other shared computers, e.g. for optical/tactile measurement equipment).
  5. After some years – and because I had been (and still am) so happy about ergonomic keyboards from Kinesis (at least the Advantage 2) – I bought the Kinesis Freestyle Edge RGB split mechanical gaming keyboard – another split keyboard, like the ErgoDox EZ I had experience with. I found out that, once again, it’s not so much the split that does it for me – but since it’s a mechanical keyboard (with Cherry MX brown switches) it’s definitely not bad. It just isn’t as good as the Advantage 2-keyboard. This taught me a lesson: I think what works really, really well for me is the concave bowl-shaped keywells with vertical columns, plus the fact that it’s a mechanical keyboard…
  6. My last acquisition was 2-3 months ago, when I bought the Keychron K8 Pro QMK/VIA Wireless Mechanical Keyboard with hot-swappable Gateron G blue switches. At first this looks like a normal keyboard, but there were a few interesting considerations I made this time:
    • Before this keyboard I usually bought brown Cherry MX-switches (or their clones), because this is typically a “safe choice” if I e.g. wanted to bring my keyboard somewhere.
    • This is a “Hot-swappable”-keyboard and I already use Kinesis-Advantage 2 as my primary keyboards. Being hot-swappable means I can replace the switches. Last month I received very cheap Gateron G brown switches too, so in terms of changing switches this is the most customization-friendly keyboard I’ve had.
    • I wouldn’t like to use red switches, due to the missing tactile feedback, but it’s a huge plus for me that these Gateron-switches are cheap, so I can easily experiment with different switches – and with this keyboard, I’ll have to figure out if it’s worth buying something other than brown switches next time…
    • I wanted a (relatively) compact keyboard I could bring with me to e.g. CTF-challenges or other places so I avoid using my laptop-keyboard (which is also typically bad, if used for many hours).
    • It has Bluetooth – but that has no, or at least only limited, value to me. Instead it’s very interesting that it has QMK/VIA firmware, meaning I can remap the keys in hardware, so I don’t need to install software in order to change the default/normal QWERTY-layout to the Colemak-layout (installing such software could potentially be a problem in some workplaces) – and the keyboard is priced competitively. Read more about QMK/VIA here.

The slanted, staggered key columns seen on almost all keyboards, especially QWERTY-keyboards, are offset and designed that way for historical reasons. In the early 1870s a man named Christopher Latham Sholes created the QWERTY-layout for early typewriters. But those machines had each key attached to a lever, and each lever needed an offset to prevent the levers from jamming into each other.

Pretty much all other serious contenders to the QWERTY-layout have been created to do things better than QWERTY. In 1936 the Dvorak-layout was patented by August Dvorak and his brother-in-law, William Dealey, claiming it to be faster and more ergonomic. The “best” keyboard layout is one that requires less finger motion, increases typing speed or is more comfortable than QWERTY. However, Dvorak has not replaced QWERTY yet – and probably never will, and it’s probably the same with Colemak: it’ll never replace QWERTY. It’s not only the keyboard that’s important, it’s also the keyboard layout. It’s interesting that even though QWERTY is bad and there are better alternatives, people stick with the worse option for historical reasons… This leads to part 2, which is technically harder to understand – but comes with some additional technical details.

Introduction to analysis of keyboard layouts

Colemak is a keyboard layout that places the most commonly typed characters on the home row. The hypothesis is that this is more ergonomic because you don’t need to move your fingers as much. According to https://colemak.com/Ergonomic the benefits include:

  • Colemak places the most common English letters on the home row, and therefore the home row is used 14% more than with Dvorak (and 122% more than with QWERTY) – see the small sketch after this list for how a home-row percentage can be computed for your own text.
  • Finger-travel distance is reduced: Your fingers move 10% more with Dvorak and 102% more with QWERTY, compared to Colemak.
  • Same-finger typing happens 60% more with Dvorak and 340% more with QWERTY, when compared to Colemak.
  • By grouping the vowels (except ‘A’) on the right hand, it reduces very long sequences of same-hand typing, such as “sweaterdresses” on QWERTY.
  • Reduced long sequences of same-hand typing – see e.g. the “Fraser Street comparison” from Computers and Automation magazine, November 1972, pp. 18-25 (Dvorak layout advocacy), which is text made up almost entirely of one-hand words on the QWERTY layout.
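
As referenced in the first bullet, a percentage like “home row usage” is easy to compute yourself. Below is a minimal Python sketch (my own illustration; the input file name is just a placeholder for any text sample you choose) that counts how large a share of the typed letters land on each layout’s home row:

# Share of letters that land on the home row, for a few layouts.
HOME_ROWS = {
    "QWERTY":  set("asdfghjkl"),
    "Dvorak":  set("aoeuidhtns"),
    "Colemak": set("arstdhneio"),
}

def home_row_share(text, layout):
    letters = [ch for ch in text.lower() if ch.isalpha()]
    on_home = sum(1 for ch in letters if ch in HOME_ROWS[layout])
    return on_home / len(letters) if letters else 0.0

sample = open("my_sample_text.txt", encoding="utf-8", errors="ignore").read()
for name in HOME_ROWS:
    print(f"{name:8s} {home_row_share(sample, name):.1%} of letters on the home row")

The exact percentages will of course differ from the numbers quoted above, because they depend entirely on the input text.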

According to https://www.news.com.au the benefits include:

  • Colemak allows 74 per cent of your typing to be done on the home row, which outshines Dvorak’s 70 per cent and QWERTY’s 32 per cent.
  • Your fingers move 2.2 times less than with QWERTY, it offers 35 more words from the home row, and most of the typing is done on the strongest and fastest fingers.

Software packages for analyzing input texts exist, so we can calculate metrics that we believe have an impact on keyboard ergonomics. Back in 2011 these tools also existed – but I have a hard time finding them now (and remembering what I used), so I’ll go with the next-best approach, which is what Google helped me find while producing this:

The article An Analysis of Keyboard Layouts explains how to use the “heatmap-keyboard” project (GitHub source code here: Keyboard Heatmap) – and it’s really simple: the author takes some recent email exchanges and runs the analysis on that. I felt I should do something like that, so I went into my Thunderbird mail folder:

$ cd ~/.thunderbird/cd9e1phk.default/ImapMail/imap.gmail.com/[Gmail].sbd
$ cp 'Sent Mail' ~/Downloads

I copied the ‘Sent Mail’-file to my Downloads-folder – unfortunately this was a 1.7 GB file and it seems to also include attachments… A better idea was to select e.g. 100 (?) of my latest sent emails in Thunderbird and export them as text, into an mbox-file. This resulted in a data-file that was only 17 MB and had around 240,000 text lines in it. Unfortunately it still had a lot of HTML-stuff, email headers, base64 content-encoded lines etc., and I decided to take 20,000 text lines for analysis. It was ASCII-text, producing the following keyboard heatmap analysis:

Initial “naive” heatmap, from “sent emails” – except that “E” and “T” are very frequent, all the other keys seem to be used with some kind of “average” frequency.

The good thing is that “E” and “T” are positioned on the home row in both Dvorak and Colemak. But the 20,000 input lines above are bad and the analysis is bad, because for instance there’s a huge section with base64-encoded text:

Content-Type: image/png; name="hwhLm8Qqsv7L09Bd.png"
Content-Disposition: inline; filename="hwhLm8Qqsv7L09Bd.png"
Content-Id: <part1.s0nd00ox.Eq3M0TV8@gmail.com>
Content-Transfer-Encoding: base64

iVBORw0KGgoAAAANSUhEUgAAAd4AAAWKCAIAAADZrJNmAAAAA3NCSVQICAjb4U/gAAAgAElE
QVR4Xuy9B5gcxfH+v7uzs3kv51POEkiAhEgmmGSyMWBs44xzzjlnG2f755xtsLFxwmCbZHIS
EighlE5Zupw3p9n5f0b9fea/zyXdnfZOd1Iteo7dnp7u6ndm3q6prq5yRiO9DvkIAoKAICAI
...

The base64-encoded text obviously does not consist of the typical letters or letter-combinations that you yourself would type on a keyboard. In this case, the base64-encoded text took up 13,500 text lines, which is a huge fraction of the total text-file. So this analysis is simply dead wrong. I’m pretty sure the base64-encoding is responsible for the extremely “average” yellow/green’ish-colored heatmap.

I decided to clean up the mbox text file and delete these lines, and was left with another input text file – now with 13,000 text-lines. Unfortunately this still had a lot of HTML-code, but that at least is more sane than all the base64-encoded strings. The result is shown for some of the most important keyboard layouts below:

I’ve added heatmaps for the Workman and Norman layouts, although I have not explained much about them. For this analysis, because we’re using sub-optimal input data, we should ignore that the shift-key is the most frequently used key (likely due to the many HTML-code lines in the input-file).
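
For anyone who wants to reproduce or improve this cleanup, it can be approximated with a few lines of Python – a rough sketch only (the file name and the base64/header/HTML heuristics are my own guesses, not a proper mbox parser):

# Rough cleanup of an exported mbox text file before counting key frequencies:
# skip header-looking lines and long base64-looking lines, strip HTML tags.
import re
from collections import Counter

BASE64_LINE = re.compile(r"^[A-Za-z0-9+/=]{60,}$")  # long runs of base64 characters
HEADER_LINE = re.compile(r"^[A-Za-z-]+: ")          # "Content-Type: ...", "Received: ..." etc.
HTML_TAG    = re.compile(r"<[^>]+>")

counts = Counter()
with open("sent-mail-export.mbox", encoding="utf-8", errors="ignore") as f:
    for line in f:
        line = line.rstrip("\n")
        if BASE64_LINE.match(line) or HEADER_LINE.match(line):
            continue                       # drop base64 payloads and mail headers
        line = HTML_TAG.sub("", line)      # strip HTML tags, keep the visible text
        counts.update(ch.lower() for ch in line if ch.isprintable() and not ch.isspace())

for ch, n in counts.most_common(15):
    print(ch, n)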

The position of the very frequent “E” and “T” letters on the QWERTY-keyboard makes no sense in terms of being economical with your finger movement. They should instead preferably be on the home row, in order to reduce finger stress. It’s not possible to use the home-row keys in a reasonable way with QWERTY for most normal input texts written in English. In terms of finger travel distance, QWERTY is generally also among the worst layouts of them all – this is however not directly visible from the heatmap…

Using additional keyboard layout statistics such as finger frequency, hand balance, finger bigrams and skipgrams, lateral stretch etc

We have the heatmap-technique for better understanding what an optimal keyboard layout is. But there are also a lot of other statistics to look at, for people who are really into the ergonomics of keyboard layouts. For that we need additional text-analysis tools, which were developed and became more popular in the years after Colemak was released (2006).

I’ve tested the a200-script, which seems to be created by or related to the website https://monkeytype.com/about – a typing test website for e.g. measuring typing speed and accuracy. By default it seems to use much better input than what I used above, so I’ll leave it with the default input and show the output from 2 datasets. It’s a Python-script that is launched using the “./a200”-shellscript. After removing a lot of (at least for me) unknown keyboard layouts and modifying the column output a bit, we get:

Default data set: “Monkeytype-quotes” (around 5800 text lines).
Extra data set: “Discord”-chat data (almost 190 thousand text lines).

One can see the data used in the “wordlists”-subdirectory and although we’re approaching the limit of what I know about these things I’ll try to explain what I’ve learned about the results:

  1. The easiest columns for me to understand are the 3 rightmost ones: “top, home, bottom”. These show that for the analysed input text and for both datasets, around 70% of the characters were typed on the home (middle) keyboard row with Colemak. For QWERTY that number was around 32%. It’s interesting to see that around 50% of the letters are on the top row using the QWERTY-layout – this is not very efficient.
  2. The “LTotal” and “RTotal” columns are a measure of the percentage of letters typed that belong to either the left or the right hand. I’ve been wondering why this doesn’t add up to 100%, so I looked into the source code and believe it has to do with how the keyboard layouts are defined in the “layouts/”-subdirectory. Each keyboard layout file only tells which letters on each of the 3 rows belong to which hand – so when numbers or symbols, e.g. question or exclamation marks, are included in the input data, the program is not capable of figuring out which finger that non-letter symbol belongs to; thus the total percentage is only a measure of letters from the input-text. The difference between LTotal and RTotal tells something about the “hand balance” – an equal amount is a perfect split. With this metric I don’t think QWERTY is particularly bad, and the difference between all layouts is low, so these columns are not really important here.
  3. Next we have the SFB- (same-finger bigram) and the DSFB- (disjointed SFB) columns. The idea is that we want a layout to avoid awkward typing motions, and same-finger movements are often seen as slow and awkward to type. SFBs are basically when you use the same finger to type two different, successive keys, e.g. “ed”, “un”, “ol” etc. on a QWERTY-keyboard – as happens when you type “loft” or “unheard”. The other column, the disjointed SFB, is a measure of how often two keys typed with the same finger are separated by X other letters. As an example, typing MAY with QWERTY is a same-finger skip-1-gram. The table shows that QWERTY scores high for both SFB and DSFB, which is bad. Colemak on the other hand scores well for SFB, while there isn’t much difference between layouts in the DSFB-column – see the small sketch after this list for how an SFB-percentage can be counted.
  4. The remaining 4 columns are related to “trigram” or 3-key sequence stats:
    • The “alternate”-column measures the frequency of pressing one key with one hand, then pressing a key with the other hand followed by pressing a key with the original hand again.
    • The “roll”-column is a measure of either when two keys are first pressed with one hand, but not using the same finger – and then a third key with the other hand is pressed – or the opposite way around (press a key with one hand followed by 2 keystrokes with the other hand, not using the same finger). This can also be divided into inward and outward rolls, where e.g. “DF” on a qwerty-keyboard is inward and “FD” is outward.
    • The “redirect”-column measures a one-handed trigram where the direction changes, e.g. “DFS” or “DSF” with QWERTY (first inward, then outward, or the opposite).
    • The “onehand”-column measures the frequency of one-handed trigram rolls, i.e. when the direction is either inward or outward. This is considered more comfortable than “redirect-trigrams”.
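
As mentioned in point 3 above, the SFB-metric is simple to compute once you have a finger assignment. The sketch below is my own illustration, using the standard touch-typing finger assignment for the three QWERTY letter rows – the real a200-script is more elaborate, so treat this only as a demonstration of what is being counted:

# Count same-finger bigrams (SFBs) for QWERTY, using the standard
# touch-typing finger assignment for the three letter rows.
FINGER_OF = {}
for finger, keys in {
    "LP": "qaz", "LR": "wsx", "LM": "edc", "LI": "rfvtgb",
    "RI": "yhnujm", "RM": "ik,", "RR": "ol.", "RP": "p;/",
}.items():
    for k in keys:
        FINGER_OF[k] = finger

def sfb_rate(text):
    keys = [ch for ch in text.lower() if ch in FINGER_OF]   # spaces etc. are ignored
    bigrams = list(zip(keys, keys[1:]))
    sfbs = [(a, b) for a, b in bigrams
            if a != b and FINGER_OF[a] == FINGER_OF[b]]     # same finger, different key
    return len(sfbs) / len(bigrams) if bigrams else 0.0

print(f"{sfb_rate('loft unheard sweater'):.1%}")  # e.g. 'lo', 'ft' and 'un' are QWERTY SFBs

A disjointed SFB (skipgram) count works the same way, except the two same-finger keys are compared with one or more other letters in between.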

There’s also an option for showing the finger-balance or distribution as seen in the image below.

Finger-balance – using the large “discord”-data. The thumbs are not included: LP = “left pinky”, LR = “left ring”, LM = “left middle”, LI = “left index” – and similar for right hand.

Finger-balance isn’t really an area of expertise for me, but I’m assuming that the best layout is one that distributes the load more or less evenly among the fingers. Furthermore, I think it’s safe to assume that in general we want less load on the weak fingers (LP, RP) compared to the stronger fingers (LM+LI, RI+RM). I don’t think this finger-balance analysis shows results different enough to justify any clear conclusion.

Running additional analyses

Colemak/Dvorak is not always the best – I’ve found some extra links and resources that could be of interest, which I’ll briefly summarize before making a conclusion:

  1. Another keyboard layout analyzer: This website clearly tells you how it defines the calculation of the optimal layout score (as it should, when presenting what it thinks is the best layout). In this case, it’s a weighted calculation that factors in the distance your fingers moved (33%), how often you use particular fingers (33%), and how often you switch fingers and hands while typing (34%) – see the small sketch after this list for what such a weighted score boils down to. This means that if you want, you can make your own program or calculation with your own weights – and come up with your own “optimal layout”, based on your preferences. The front page of this website has Dvorak ranked in first and second position – and Colemak as the third-highest ranking. You can then click “Distance”, “Finger Usage”, “Row Usage”, “Heat Maps” etc. to learn more. Also, in the top menu you can choose “Configuration” and the website already seems to know >20 layouts – many for the Ergodox.
  2. I also found the Keyboard Layout Analyzer: KLAnext v.0.06 to be interesting: In this case, the optimal layout score is a weighted calculation that factors in the distance your fingers moved (40%), same-finger bigrams (40%) and hand alternation (20%). The website allows one to run different comparisons, so I picked something simple as my input data set: “KLA Introduction” and clicked the “Compare Layouts”-button. I haven’t studied exactly which tests and which data are used, but the result is summarized in something called “effort” (a lower effort is better for doing the same task). Of the 12 ranked layouts, QWERTY is the worst with +48% “effort”, Colemak comes in second place with only 2% “effort” and top-ranked is something called “POQTEA-QP” with 0% “effort” (something about right-hand letters). It’s a layout I’ve never heard of before, but it seems to be a very new layout from 2021, whereas Colemak is from 2006 – so it could maybe be great…
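
As referenced in point 1, such an “optimal layout” ranking is ultimately just a weighted sum of a few per-layout metrics. Here is a toy Python sketch with made-up, normalized metric values (not the websites’ actual data or exact formulas) to show the principle and how changing the weights changes the “winner”:

# Toy weighted scoring of layouts: lower normalized metric = better.
WEIGHTS = {"distance": 0.33, "finger_usage": 0.33, "switching": 0.34}

LAYOUTS = {   # hypothetical values in 0..1, purely for illustration
    "QWERTY":  {"distance": 1.00, "finger_usage": 0.90, "switching": 0.80},
    "Dvorak":  {"distance": 0.55, "finger_usage": 0.50, "switching": 0.45},
    "Colemak": {"distance": 0.50, "finger_usage": 0.45, "switching": 0.55},
}

def score(metrics):
    return sum(WEIGHTS[m] * metrics[m] for m in WEIGHTS)

for name in sorted(LAYOUTS, key=lambda n: score(LAYOUTS[n])):
    print(f"{name:8s} weighted score = {score(LAYOUTS[name]):.3f}  (lower is better)")

Swap in your own weights and metric values and you get your own “optimal layout” – which is exactly why the different analyzers below disagree.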

It’s only fair that I also present the other side of the story, although it makes everything a bit more unclear. Using the tests below I found that Colemak was supposedly one of the worse layouts – almost as bad as QWERTY:

  1. Another of these “Keyboard Layout Analyzer”s (why are they all called the same?) again used different weights: the total distance your fingers travelled (3/16); how often you use stronger fingers (3/16); the direction of finger rolls (3/16); how often you switch fingers (3/16) and hands (2/16) between key presses; balancing finger and hand usage (1/16); and the effort to type whole words (1/16). I ran a few tests by choosing from the top “Comparison”-menu. No matter what I did, Colemak was almost as bad as QWERTY – but the layouts that “won” the test were pretty unknown to me, such as “BEAKL 35 Matrix”. It seems this test also knows the difference between different physical keyboards, such as the Ergodox – which I think is great. The bad thing is that I felt a lot of the input text data was too short to make the tests meaningful…
  2. And another very similar “Keyboard Layout Analyzer” generally placed QWERTY as the lowest-ranked layout and some unknown layouts at the top: X7.1H Ergolinear/ergolinear (1st place), BEAKL Opted4 Ergo Alt/ergodox (2nd place), BEAKL 15 Matrix/matrix (3rd place)…

I encourage anyone interested in this to do their own research and maybe post a comment in the comments section below if you think anything should be added/removed/changed.

Conclusion

With many available metrics it becomes difficult to choose the “best” keyboard layout, because it doesn’t exist – or rather, it is a theoretical concept, meaning that what we see from tests depends a lot on the input data, which again varies from person to person. The fact that the best keyboard layout doesn’t exist doesn’t mean that there aren’t bad options – QWERTY is one that scores poorly on a lot of measured parameters, e.g. finger travel distance, and it does so pretty consistently across different tests and different software tools.

The “best” layout is subjective and a compromise made up by trying to achieve competing objectives. Optimizing a layout for a particular stat prevents optimizing on a different stat but all in all I think Colemak, Dvorak, Workman and Norman are all good layouts that are better than QWERTY. This is subjective but it also comes from reading and trying to understand what other people have written about this – and lots of people have studied this much more than I did and I’ve tried to read a lot of it.

Among the things I learned while writing this blog post and reading up on everything related to the topic is that, with the improved analysis-tools, I’m not sure Dvorak and Colemak will keep such a large user-base in the future. If they do, it will probably be because they historically have had large user-bases compared to the newer, and in some cases probably better, upcoming layouts. It might be that newer layouts are more efficient, because newer analysis programs are better and people won’t stop talking about ergonomics and improving things.

One must also be careful not to choose a layout based on too small text samples, which I think is a mistake some of the online web-analyzers made. I normally don’t type “Alice in Wonderland”, which is one of the typical text-samples used – this was the reason I tried to export my “Sent mail”-text data, but the next problem I didn’t solve was removing all the bad HTML-stuff that was included in that data. Maybe someone in the future will make a small GitHub-project that removes that HTML-stuff, so one can perform a proper keyboard analysis on real sent emails (I don’t think I’ll do it, although it shouldn’t be too difficult) – and don’t forget to add code to your input text data, if you’re a programmer. All results can change when the analysed input text data does, so it’s also very individual which layouts are best – just don’t use QWERTY if you feel finger- or hand-related pain after many hours of typing on a keyboard. I think in general this is the conclusion from seeing many different test results.

I thought this would be a quick blog post – because I don’t currently have time to write about other Linux-projects. It unexpectedly became a very time-consuming blog post to write, because I cannot remember exactly why I chose Colemak out of the available layouts, and as I began typing I thought I had to find some kind of reasonable justification for the decision I took years ago – and/or at least try to justify that I still use Colemak. I hope this blog post helped someone (otherwise you shouldn’t have read all this). I hope to write more about Linux-projects in future blog posts.

Links to other resources or information that could be of interest

The following are links I didn’t have time to include or explain (just dropping the links):

  1. https://colemak.com/Ergonomic
  2. Mailing list for “Keyboard Builders’ Digest”: https://buttondown.email/kbdnews
  3. 178 text pages about keyboard layouts (Google Document): https://bit.ly/keyboard-layouts-doc
  4. Best keyboard layouts – Feb 2023: https://www.keyboard-design.com/best-layouts.html
  5. Alternative keyboard layouts: https://deskthority.net/wiki/Alternative_keyboard_layouts
  6. Github: https://github.com/binarybottle/engram
  7. Github: https://github.com/samuelxyz/trialyzer
  8. Norman layout vs other layouts: https://normanlayout.info/about.html
  9. Norman layout Wiki – Q&A: https://www.reddit.com/r/Norman/wiki/index/
  10. https://kennetchaz.github.io/symmetric-typing/layouts.html
  11. carpalx – design your own keyboard: http://mkweb.bcgsc.ca/carpalx/
  12. Layout Analysis Tool
  13. Layout Playground
Categories
2023

Why and how I use KVM over IP with the BliKVM-box, based on the Raspberry Pi Compute Module 4

Introduction

The problem I had, and have, with the usual KVM over IP-switches is that I want a solution where I can remotely control my Proxmox/pfSense-server even if my whole network is down – which it can or will be if either Proxmox or pfSense misbehaves, for whatever reason.

As described in earlier blog posts, my Proxmox-server runs pfSense virtualized and this virtual server/machine (VM) controls my whole LAN/VLAN-setup – see e.g. these posts for more background information:

Here’s a picture of the core of my home network setup:

The picture shows the critical parts of my home network system, incl. one of my 2 managed switches (for having and using VLANs, described in earlier posts). The Proxmox server runs headless, i.e. there’s no mouse, keyboard or monitor connected to it.

And, if anyone reaches this point without knowing it: KVM in this context is an abbreviation for “Keyboard, Video (monitor), Mouse”. IP in this context is just TCP/IP (Transmission Control Protocol/Internet Protocol).

KVM over IP (wired/wireless) is a device that allows you to remotely access a computer/server over a network – that’s what the following is about…

Description of the problem

In case I screw something up with my Proxmox/pfSense-server and e.g. lock myself out due to some misconfigured setting, or if it’s the system itself that suddenly won’t work, e.g. due to an upgrade, incompatibility or hardware fault, all my devices will lose the connection to my router and can’t access the internet – or each other on the LAN.

I have a backup router, the Asus RT-AC87U, that I can temporarily plug in and turn on (shown to the right in the image above) – but only for emergencies, if all else fails. When I screw something up and lose my whole network (you don’t learn anything if you never screw something up), a very annoying, tedious process starts:

I have to unplug all cables from the Proxmox-server (the HP t730 in the lower part of the image), move it into my home office, plug in a keyboard and monitor (optionally a mouse), boot up the system, restore from a backup or otherwise fix the system via manual settings or SSH. Once it works, I put the machine back, plug all the cables back in – and cross my fingers that I didn’t forget anything, or the whole process starts over again.

Narrowing down which KVM-solution could solve the problem

KVM-switches in general solve this problem, so I can leave everything in place and still have remote access to the Proxmox-/pfSense-server. But I didn’t want a solution with new/additional cables, because I have a setup with 4 small plastic PVC conduits with cat 5e-cables to 4 different rooms, and there isn’t room for another cable. Furthermore, I would have a problem making room for another outlet, and such a solution is just not very flexible in case I later change my mind and the cabling has to be changed.

I looked at different options on the market and traditional KVM over IP-switches exist, but I thought to myself: what good is such a solution if my whole network goes down? It’s useless without a router, because then I cannot access the router’s administration/admin page – and that defeats the whole purpose or idea.

I also think many of these solutions are pretty expensive (and probably mostly used in corporate environments), as I imagine not many consumers want to pay >1000 USD for such a solution in a home setup. After Googling and considering the different options with help from e.g. Amazon and eBay, I found the solution – and it was, or is, a great solution in several ways:

  • It’s Linux-based, i.e. based on open source code and therefore “hacker-friendly” (in case I want to customize or modify something)
  • It’s based on Raspberry Pi, which can also be used for a lot of fun home-automation stuff, should I ever decide to dig into that (I later bought another Raspberry Pi for this, so I have an extra for experiments)
  • It works even if my whole network goes down because it’s wireless and this feature is just so much better than what many of the more expensive alternatives offered
  • This KVM-solution is also an alternative to buying a computer/server with Intelligent Platform Management Interface (IPMI) – except IPMI is built for that single motherboard and the BLiKVM-solution is for pretty much all motherboards because it takes the HDMI-video output and simulates keyboard/mouse via a standard USB-port.
  • It’s cheap (both compared to the expensive corporate KVM-switches I also considered and also compared to a server with IPMI-module)

On the negative side, it’s not as user friendly as typical commercial / consumer products are and I also spent a lot of time tweaking and figuring out how I wanted it to work.

Buying a BliKVM CM4-box and a Raspberry Pi CM4

The solution I ended up with, was buying a BliKVM CM4. The “CM4”-part means it’s based on the Raspberry Pi Compute Module 4. The CM4 incorporates e.g. a quad-core ARM Cortex-A72 processor and is widely used in embedded applications. For a general overview of the solution, please see additional details here:

One of the problems with buying Raspberry Pi’s is that they’re out of stock, almost everywhere (at least most of year 2022 and beginning of 2023). It wasn’t easy to get my hands on the CM4, but I can highly recommend using https://rpilocator.com/ for finding online shops that have the CM4 in stock.

Since I live in the EU, I bought both of mine from Poland – https://botland.com.pl/ – they shipped fast and I could track the package and there were absolutely no problems, after I ordered. Ordering the BliKVM was never a problem, but since it’s being shipped from Hong Kong or China, delivery time is longer (I think I waited around 2 weeks or so, to Scandinavia).

Preparations and initial findings

It took a while before I figured out how I wanted things, so I didn’t know what the end result would be, but I knew it ran Linux and therefore was flexible. One should start at https://wiki.blicube.com/blikvm/en/BLIKVM-CM4-guide/ and then begin to flash the CM4 using the instructions at https://wiki.blicube.com/blikvm/en/flashing_os/.

It took a while (many hours) to figure out the details but there’s also excellent tech-support in a Discord-channel, which I had to use twice before I ended up being happy. I’ll go through the issues I had to overcome before being happy with the solution:

Problem 1: Enabling wireless access via the hotspot.sh-script

I bought the CM4 with WiFi (and bluetooth) so I had a solution that would work, just by connecting a laptop to a WiFi-hotspot whenever I need/needed the BLiKVM. This made the solution independent of cables – and it’s a perfect solution for me.

With Linux it’s normally not a big problem for me to create a WiFi-server or wireless access point. So I started looking through e.g. the Arch Linux Wiki for information about how to set up a hotspot, install and configure hostapd and dnsmasq, and during that process I began following the “tech-support”-channel on the BLiKVM Discord-server. I got lazy and instead of reinventing the wheel, I asked if anyone had done it before me: to make the BLiKVM automatically start up a WiFi hotspot that I can connect to and access the KVM-features. Luckily this solution had already been made. So I took the easy/lazy/quick solution and am sharing it here, in case anyone wants to do as I did:

  1. First flash the BLiKVM-image – it’s important to use the image obtained from Google Drive via the link in the sections “Download the image” and “PiKVM image (BLIKVM-CM4)” at https://wiki.blicube.com/blikvm/en/flashing_os/ (WARNING: DO NOT USE THE IMAGE OBTAINED USING THE DOWNLOAD-BUTTON HERE: https://www.blicube.com/blikvm-products/)
  2. Download the “hotspot”-script from (latest version) https://kvmnerds.com/PiKVM/hotspot – or find a previous version at https://github.com/srepac/pikvm/blob/master/hotspot

You might want to edit e.g. the SSID name, passphrase and network to your liking before running the script using “./hotspot.sh -f”, where the “-f” option makes the hotspot come up automatically on every reboot.

Problem 2: The download-link for the image used for flashing was wrong

After having flashed the CM4 with different images a few times and learned (a bit) about the different Raspberry Pi Linux-distributions, I decided I wanted to wrap this up and ensure that everything worked. I spent some hours looking into the BLiKVM-solution and found out that the “KM”-part, i.e. the keyboard and mouse, didn’t work:

  • At first, very typically, the mouse just moved by itself in a repeating, linear, infinite pattern.
  • The keyboard never worked.
  • The keyboard and mouse indicator lights were at random times green (meaning “ok”) and other times orange-coloured (meaning “not ok”) – see screenshots a bit below.
  • Later, after some reboots, the mouse most often never worked at all.

Using the BLiKVM to control my other Raspberry Pi – notice the orange mouse and keyboard-symbols in the upper right corner.

Such problems would never have happened with a more commercial/consumer-oriented product. But I asked for trouble myself by buying a Linux-product that I wanted to customize, and one should never do that if one doesn’t like the challenge of fixing problems or doesn’t have the patience or hours needed.

I wrote and asked for support in the “tech-support”-channel and eventually found out that I had used the wrong image.

The download-link pointed to a wrong image

After banging my head against the wall I found out what I’ve also explained above:

  • The download-button on the webpage https://www.blicube.com/blikvm-products/ points to a wrong image, and this is probably the image I used when my (virtual) keyboard and mouse didn’t work (at least it was wrong on January 3rd 2023; it might be fixed after this).
  • I later attempted to use the image obtained from https://wiki.blicube.com/blikvm/en/flashing_os/ which points to a Google Drive-file – see the section “Download the image” and “PiKVM image (BLIKVM-CM4)” – and then everything, i.e. virtual keyboard and mouse worked as it should.
  • I asked in the Discord-channel: “Why are there links to two different images?” and received the answer that I should use the image from Google Drive only – which is also the image that worked. Furthermore, I was told that a work order had now been placed for updating the link, so in the future this might not be an issue at all for people trying to do the same as I did.

This is what I like and think is extremely satisfying about Linux: when you contribute to or solve something using open source, in a way that nobody, or not many people, have done before you. It’s challenging and rewarding to solve such problems…

A nearly perfect KVM over IP-solution with built-in wireless WiFi-hotspot

Below are some additional screenshots that hopefully illustrate in a bit more detail why this is a good KVM-solution:

This screenshot shows some of the settings available in the upper right corner.

The screenshot above illustrates the issue I had when neither the virtual mouse nor the keyboard worked, but it also shows some additional settings. I decided to use 10.10.10.1 as the BliKVM (hotspot/WiFi) server IP address, to avoid a potential conflict with 192.168.1.1 in case I use my laptop with both a cable to the normal LAN and, at the same time, a connection to the BLiKVM hotspot.

Once the Raspberry Pi boots up into one of the graphical display modes, one can see something like shown above or below:

In this case, both the mouse and keyboard icon in the upper right corner aren’t orange anymore – at this time I’ve fixed the issue and both WiFi and KVM works as I expect.

The last screenshot is taken after I reflashed the BLiKVM with the correct image (from Google Drive). The BLiKVM-box also works on other machines.

Using the BLiKVM-box for remote administration of my Proxmox- (=virtualized pfSense) server

One is also able to e.g. enter the BIOS-setup and use mouse and keyboard from the beginning, i.e. before the operating system has been loaded. The only minor “problem” is that there’s a delay of maybe around 0.5-1 second (especially for IP over WiFi instead of LAN). On the other hand, I think that’s not an unusual or unexpected problem for a KVM-switch that communicates entirely wirelessly. As long as the BIOS-delay is maybe 5-10 seconds, this is absolutely no problem at all, as shown below:

This image/screenshot is obviously manipulated a bit. First, the user must enter the “Boot Menu (F9)” and then we can boot from a virtual CD-ROM – or flash-drive.

Notice everything is green in the upper right corner…

Problem 3: Unable to upload image-file for virtual CD-ROM

Unfortunately I seem to have lost a few screenshots along the way, but for illustrative purposes take a look at the screenshot below:

Mounting a virtual CD-ROM-drive requires uploading image files beforehand.

Initially I tried to upload the virtual CD-ROM-image using the GUI and I even think I tried 2-3 times. After around 25% it stopped with the error seen below and I took a screenshot:

Unable to upload image file via the GUI – there’s no explanation, just “Can’t upload”…

I was trying to upload Clonezilla so I could take a full backup of the harddrive. Luckily, there is a solution consisting of 4 steps described at https://docs.pikvm.org/msd/#upload-images-manually-without-web-ui:

Description of 4-step procedure, to upload ISO-images manually for BLiKVM / PiKVM.

I used the command “scp clonezilla-live-3.0.2-21-amd64.iso root@10.10.10.1:/var/lib/kvmd/msd/images” followed by “touch /var/lib/kvmd/msd/meta/clonezilla-live-3.0.2-21-amd64.iso”. The “scp”-command seems to be much more stable, as there were no errors and it worked the first time. My guess after this success is that the web UI upload handled the flaky connection worse, while scp runs over SSH/TCP, which automatically retransmits on errors (I’m not sure – if you have an idea or know about it, please write it in a comment). I didn’t check it, but it makes sense that the WiFi-connection wasn’t ideal or perfect, because I actually had 1-2 walls between the BLiKVM and my office/laptop, and once in a while I briefly lost the display for 2-3 seconds (and then it was stable for some minutes again – I suspect the walls, signal strength or maybe radio interference).

Demonstration

Having solved the problems described above I also managed to create a full-disk backup of my Proxmox (and virtualized pfSense)-server, without physically attaching either a monitor or keyboard/mouse:

Clonezilla completed – this means that I can always fully restore this image, if everything else fails.

I can now use BLiKVM to boot into Proxmox – and shut it down – enter the BIOS – or do whatever I need to, using the builtin BLiKVM WiFi-hotspot:

Proxmox shutting down.

Unfortunately – or maybe I should say “luckily” – I don’t have any Proxmox-problems to show at this moment. But I think I’ve shown what I wanted to show… When Proxmox fails to boot up correctly, I can access the terminal, log in as root, check system logs, restore, modify or fix configuration files – or if all else fails, I can restore the whole harddrive image from the Clonezilla-image.

I can however not access the Proxmox Web-UI via the BLiKVM WiFi, even if the BLiKVM and the server start up correctly. The reason is that the BLiKVM only accesses the display and provides keyboard/mouse input to the server – there’s no network connection between the BLiKVM and the machine it’s attached or connected to. I think I remember that I have assigned a special (physical) emergency LAN-port as passthrough to the pfSense-server, but that requires me to plug in a network cable from my laptop to the Proxmox/pfSense-server. In this case it’s nice that 10.10.10.0/24 doesn’t conflict with 192.168.1.0/24 (but hopefully I won’t ever need that, it’s just an extra safety precaution).
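
Just to illustrate the subnet point: a couple of lines with Python’s standard ipaddress module (nothing BLiKVM-specific) confirm that the two networks can’t collide:

import ipaddress
lan = ipaddress.ip_network("192.168.1.0/24")  # my normal LAN
kvm = ipaddress.ip_network("10.10.10.0/24")   # BLiKVM hotspot network
print(lan.overlaps(kvm))                      # False - the laptop can sit on both at once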

Conclusion

I can highly recommend this setup and KVM-solution, and if you do as above, things should work out just fine. There have been some minor problems along the way, but Open Source is full of challenges one has to solve/fix and that’s just how it is. My contribution here is to describe these things so other people can better decide if they want to buy the same KVM-solution I did – and I think this post can save some people both time and frustration.

Thanks for reading all this. I hope this blog post was useful or maybe even inspiring to some people. If you find it useful, find any mistakes or omissions, or have relevant comments, ideas etc., please write below and/or contact me directly, thanks (sorry, it can take up to a week or so for me to get back to you).