Day 6: getting old posts back (part 2)
I got my posts back! :D
Unfortunately the quality of the dump isn't very good. Markdown is sometimes not interpreted, there are double posts (tweets linking to my blog or Instagram) and a lot of images are at width:500px, which was my default width for over 10 years.
I've been coding some kind of post-matching code, to see if I can detect the double posts. It's quite hard, because a lot of tweets that are not the same post were sent within 15 minutes of each other, so published is not a good predictor on its own. Calculating the text similarity with levenshtein() seemed like the solution to all my problems, but there are still quite a few mismatches. I probably can't fully automate this.
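For illustration, here's a rough sketch of that matching idea in Python. It is not the actual code behind this site: the post fields `id`, `published` and `text`, the 15-minute `window` and the `threshold` of 0.8 are all assumptions made up for the example. It combines both signals: only posts published close together are compared, and a pair is flagged when the normalised edit distance of their text is small.

```python
from datetime import timedelta


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the edit distance to 0.0 (nothing shared) .. 1.0 (identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def find_candidates(posts, window=timedelta(minutes=15), threshold=0.8):
    """Return (id, id, score) for pairs that look like double posts.

    `posts` is assumed to be a list of dicts with hypothetical keys
    "id", "published" (a datetime) and "text".
    """
    ordered = sorted(posts, key=lambda p: p["published"])
    candidates = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Sorted by time, so once we leave the window we can stop.
            if second["published"] - first["published"] > window:
                break
            score = similarity(first["text"].lower(), second["text"].lower())
            if score >= threshold:
                candidates.append((first["id"], second["id"], round(score, 2)))
    return candidates
```

The result is a list of candidates, not a list of things to delete. A tweet that only quotes the title plus a link will score low against the full blog post, and two short unrelated tweets can score high by accident, so a manual review step over the flagged pairs still seems necessary.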
I now really have to go do some real-life stuff, so I guess today just has to be the day I imported the posts and added the images. It's not that I haven't put in the hours today, it's just that it's way too much. (Or I need to be more efficient.) You can check things out by scrolling way down in my feed, or at /blog or /tekstbeelden.