Log in
Seblog.nl

#100daysofindieweb

Day 5: getting old posts back (part 1)

I have good news and bad news. The good news it that I made a lot of progress in retrieving my old blogposts. The bad news is that I don’t think I can finish this today anymore.

tl;dr: I have my posts in Kirby-formatted .txt-files now, but I need more time to think about moving the images.

In an attempt to own my data, I posted all my Tweets and Instagram-photo’s onto my weblog. To do so, I started fresh, but I never got around to post the old posts back. I have 8000 posts on this site, but the old one’s are all social media reposts, no originals, even though I have had this domain since 2006.

There are roughly four periods between 2006 and now. In the first 5 years, I used a custom made CMS, written by a 16-year-old who taught himself PHP by reading a 175 paged book. (That’s me.) In 2009, I found it too insecure to keep it running that way, so I moved to Wordpress. Looking back at it it wasn’t so insecure at all. Yes, I was 16 years old when I wrote it, but I knew about SQL-injection at the time, and Wordpress isn’t known for it’s security. My own blog at least had security by obscurity.

At the beginning of the Wordpress era I made a fresh start. Luckily I found the old server still running, so my posts should still be there in that database. (You can send a GET request with Host: seblog.nl to my mother’s website to get the last pre-Kirby version.) If not, I have a PDF somewhere, but that’s not great for importing.

After a while I moved away from Wordpress in favor of .txt-files. I did a proper import back then, so that’s where I’m working with now: the .txt-version of my blog.

And those files are great, but hideous. Somehow I thought it was a nice idea to link them together with Javascript, so this version of my blog is not archived by the Web Archive. It worked like an infinite scroll: I had one file called ‘blog.txt’, which only contained a link to the last post. At the end of that post was a link to the next post. I think my 404-page showed a C implementation of the linked list, because that was my inspiration.

Next to ‘blog.txt’ I also had ‘verhalen.txt’ and ‘tekstbeelden.txt’ for other streams of posts. In my import I wrote a nice little recursion to get a list of all the stories, so I can tag them properly:

function findnext($from, $links, &$a) {
  if($links[$from]) {
    $a[] = $links[$from];
    findnext($links[$from], $links, $a);
  }
}

$verhalen = [];
findnext('verhalen', $links, $verhalen);

There is also a lot of Markdown-that-isn’t-Markdown in my posts, which complicates things. Oh, and Wordpress-HTML. I’m trying to regex a lot out of it.

The last problem I have now are images. In Wordpress and my own creation I stored all the images in one folder, but in Kirby the convention is to store them in the folder of the page. (Kirby is one big folder structure. I like that.) I need a way to either upload them when I post the posts via Micropub, or a way to identify the location of the newly created posts from the old slug, so I can move the files on the server. I don’t want to do it by hand.


This blogposts gets hopelessly long. Maybe I could’ve fixed my images-problem by now. But I want a clear head before I start importing. And a backup, but with 8000 posts that takes some time. I’ll resume tomorrow. At least I made progress!

Day 4: scheduled posts

I'm very late today. It's actually day 5 now in The Netherlands, but since I haven't slept yet I'm still calling this day 4.

Because I started so late, I chose an easy one for today. Unfortunately the easy ones always take longer than you expect. But hey, that makes this episode of #100daysofindieweb an episode!


I am now able to schedule posts on my blog. That's a simple one, because all I have to do is check wether the 'published' field of a post is in the past before I display it. It took a little longer because I had to do it in two places, the feed and the entry page.

The feed now only grabs posts where the 'published' is smaller than the current date, and my router now gives a 404 Not found if a post has a 'published' that is bigger than the current date.

Because my webmentions trigger on the first visit of the page, and the page never gets called, I effectively schedule webmentions with this too. Only one problem: I need to wait for a visitor to open the post once it's published to trigger the mentions. This includes the mention to Brid.gy/publish, to syndicate a post to Twitter. I leave that problem to another day though.

Anyway, it's a simple feature, but a neat one, so I'm happy with it.

Day 3: relative urls

Today I started with fiddling with reply-contexts, but that turned out to be a giant mess, because I store my own posts in a different format than posts by others, and I wanted to re-use my template. I need to come back to that another time.


Yesterday, I mentioned Martijn in my post. And when I mention urls in my posts, my server will try to notify the other site that I mentioned that page. So yesterday my site notified Martijn’s site. At least, it tried.

Martijn is known for giving parsers a hard time. My site came up with http://vanderven.se/mention.php as his webmention endpoint, which is not correct. The following things went wrong:

  • I mentioned http://vanderven.se/martijn, which is not his canonical url. That’s http://vanderven.se/martijn/, ending in a /.
  • Kirby Toolkit’s remote::get(), which I used to fetch the page, did follow the redirect, but my script didn’t update the mentioned url accordingly. Since Martijns endpoint link is relative (webmention.php), that / does matter: it defines Martijns page as a folder, so his endpoint is at http://vanderven.se/martijn/endpoint.php
  • Finally, it turned out that Kirby Toolkit’s url::makeAbsolute($path, $home) did not handle relative urls well. It just intelligently added $path to $home, but didn’t actually solve relative urls where $home is a folder or file.

Martijn encouraged me to write some tests for Kirby Toolkit, because they use phpunit and all that stuff. It was nice, because I didn’t have any experience with this kind of testing (I just refresh the page and hope it works). Learning to test things is a goal for 2017 for me.

When I had a list of good tests, I started to change the function. I might be a little too refresh-frenzy still, but in the end I managed to pass all my own tests. After the pull request and with a little tweaking of my own scripts, I finally got Martijn’s endpoint:

http://vanderven.se/martijn/endpoint.php

So, Martijn, here is another try, still without your canonical /, just to see if it works now. :)

Day 2: 410 Gone

One day I made an RSS-feed just for Martijn, and the other day he said:

[2017-01-09 19:27:07] <Zegnat> Oh, geen support voor /deleted? re: https://seblog.nl/2017/01/08/1/this-is-another-test-post

Nope, I just made a test post and deleted it right away, the hard way: a proper delete. How could I know his parser was just visiting right in the three minutes the posts existed? My site showed a 404 Not found, because the page did not exist.

According to /deleted, that’s not the way things should be. So today I re-posted This is another test post, and then unproperly deleted it.


So when you now go to https://seblog.nl/2017/01/08/1/this-is-another-test-post, you’ll get a 410 Gone, including a Dutch human readable page which explains what that means.

I do this by setting a dt-deleted on my post. I also taught my site that if I put a future date, it won’t 410 on people, until we past that date. This sets up my own Snapchat/Instagram Stories on Seblog!

In addition to a 410 Gone for direct hits on the url, the post does still pop up on feeds, but with the following markup:

<div class="h-entry" style="display:none">
  <a href="https://seblog.nl/2017/01/08/1/this-is-another-test-post" class="u-uid u-url"></a>
  <time datetime="2017-01-08 15:26:26" class="dt-deleted"></time>
</div>

This is not visible for normal users, but advanced feed readers could pick this up and delete the post in their cache. I haven’t added it to the RSS, because I doubt anyone supports this, but h-feed readers can pick it up!

I also don’t support delete via Micropub, but hey, we’re getting somewhere!


Oh and while I was at it: my private posts hide their slug now. My deleted posts don’t, because that would cause a redirect (which is 301 or 302) and I wanted a 410 Gone.

Day 1: fixing reposts

Okay, enough, I’m joining! Aaron has been doing this for ages now (that is, 26 days): 100 days of IndieWeb, in which he builds an IndieWeb related feature into his site or some other service.

I’ve been doing my own 100 days of IndieWeb for a while now, but I never blogged about the outcomes of each day. There also wasn’t much focus. I did multiple things at once and was never really satisfied. So my 100 days of IndieWeb is not about doing more, it’s about doing less, but more consistent.

My updates here won’t all be as spectacular or useful to other people as Aaron’s, but they will be updates. My site will improve a small bit every day.


I wanted to start off with something simple, and since I’m already copying Aaron I’m going to do something he did a few days ago, that is: fixing my reposts.

Before today, they looked like this:

As you can see I default to the hostname when I don’t have an author cached. In the case of a retweet, this is not useful at all! Luckily I already had the data cached from the Twitter import, so with a few tweaks I was able to show that on the page:

To make things more interesting I now have a /reposts feed too.