I have been writing a lot recently about little obstacles I have come across while helping the Church of England move content from their old sites to their new ones. Making a list of all the content to be moved is often the first job, but many of the older/proprietary platforms don’t have the ability to export a list of posts or pages, as a .csv for example. Usually copying and pasting a list of the files from the back end of the site is the only option, but then I need to strip out all the junk that copies over with it, such as the edit and delete buttons.
Today I needed to remove a lot of date/time stamps, and found the quickest way to do this was once more a regex find and replace in a text editor (Gedit). I had already removed various words from the text, such as “Edit” and “Delete”.
The regex I used basically matches any digit from 0 to 9 in the time stamp format used. Here is an example:
All removed just fine. I then used a different regex to remove all the double line breaks.
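For anyone who would rather script the two passes than run them in a text editor, here is a minimal sketch in Python. It is not the exact regex I used in Gedit: the `DD/MM/YYYY HH:MM` stamp format, the sample titles, and the function names `strip_stamps` and `collapse_blank_lines` are all assumptions made for illustration, so the pattern would need adjusting to whatever format your own site pastes out.

```python
import re

# Hypothetical pasted listing. The DD/MM/YYYY HH:MM stamp format is an
# assumption for this sketch, not the format from the original site.
pasted = (
    "Harvest Festival\n"
    "12/03/2014 09:45\n"
    "\n"
    "\n"
    "Parish News\n"
    "01/11/2013 16:20\n"
    "\n"
    "\n"
    "Contact Us\n"
)

# Two digits, slash, two digits, slash, four digits, a space, then HH:MM.
STAMP = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}")


def strip_stamps(text):
    """Delete every date/time stamp that matches the assumed format."""
    return STAMP.sub("", text)


def collapse_blank_lines(text):
    """Replace any run of two or more newlines with a single newline."""
    return re.sub(r"\n{2,}", "\n", text)


cleaned = collapse_blank_lines(strip_stamps(pasted))
print(cleaned)
```

Running the stamp removal first leaves empty lines where the stamps were, which is why the blank-line pass comes second.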
Here is another example with a different date format:
Here the regex was:
And the dates were: