UOMF: Semi-Automate Boring Tasks: Replacing Twitter-Snippets

This is an article from a series of blog postings. Please do read my "Using Org Mode Features" (UOMF) series page for explanations on articles of this series.

For this blog, I had to replace embedded Twitter snippets with images of the original tweets, captions and appropriate alt-texts. This article describes how I used Org-mode to support me in this project.

Basically, I wanted to replace Twitter-generated embedded HTML snippets like this ...

 #+BEGIN_EXPORT html
 <blockquote class="twitter-tweet" data-cards="hidden" data-lang="en">
 <p lang="en" dir="ltr">Org mode markup deserves to be adopted beyond Emacs
 <a href="https://t.co/08SgDN7Ldt">https://t.co/08SgDN7Ldt</a>
 </p>&mdash; Unix tool tip (@UnixToolTip)
 <a href="https://twitter.com/UnixToolTip/status/1115960157102587904?ref_src=twsrc%5Etfw">
 April 10, 2019</a></blockquote>
 <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
 #+END_EXPORT

... with image snippets like this:

 #+CAPTION: Tweet by UnixToolTip with a link to the article on Org-mode syntax.
 #+ATTR_HTML: :alt Org-mode markup deserves to be adopted beyond Emacs
 #+ATTR_HTML: :align center :width 590
 [[tsfile:2019-04-10T14.50 Twitter.com - UnixToolTip - Org-mode markup deserves to be adopted beyond Emacs -- screenshots publicvoit.png][https://twitter.com/UnixToolTip/status/1115960157102587904]]

This included generating the screenshot and making sure that the files got archived to their destination directories.

Motivation for publicvoit

Since I will get asked why I had to do this task, I will summarize my reasons here. If you're only interested in the Org-mode-related stuff, you can skip this section.

- My blog pages work even when cloud services are offline.
- Readers should see the content of the tweets even when the original source is offline.
- I want less external active content that readers have to enable when they browse with JavaScript blocked, for example via NoScript.
- I want fewer issues with the Atom feeds that are caused by external snippets.
- In the future, I may delete my older tweets, keeping only the last 14 months or so, without losing their context in my articles.
- I want to follow the principles for decentralization and POSSE (mentioned here).
- It enables much faster page loading and possibly a smaller amount of transferred data.

And yes, I also replaced embedded Mastodon toots, YouTube videos and other HTML-snippets. For the sake of simplicity, I'm only discussing the process for tweets here.

The Process

First, I wanted to get some basic insight into the number of snippets I needed to replace.

So I searched for "twitter-tweet" in my Org-mode files:

 #+BEGIN_SRC sh
 grep "twitter-tweet" ~/org/*org | wc -l
 #+END_SRC

 #+RESULTS:
 : 85

85 snippets. Quite a number when it comes to manual effort. How are those snippets distributed in my Org files?

 #+BEGIN_SRC sh
 grep "twitter-tweet" ~/org/*org | sed 's/\.org:.*//' | uniq -c
 #+END_SRC

 #+RESULTS:
 |  1 | /home/vk/org/misc        |
 |  5 | /home/vk/org/hardware    |
 |  4 | /home/vk/org/notes       |
 |  3 | /home/vk/org/projects    |
 | 69 | /home/vk/org/public_voit |

Okay. So the majority is in my public_voit.org file. Then let's start with the others. To track my progress, I used a list with checkboxes which I generated from the previous table result:

 - progress
   - [X]  1 | /home/vk/org/misc
   - [X]  5 | /home/vk/org/hardware
   - [ ]  4 | /home/vk/org/notes
   - [X]  3 | /home/vk/org/projects → false positives
   - [ ] 69 | /home/vk/org/public_voit
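
The counting and the checklist can also be produced in one go. The following is only a sketch and not necessarily how the list above was created: the function names count_snippets and to_checklist are made up for this illustration, and it assumes that all Org files live directly in one directory and end in ".org".

```sh
# Print "<count> <file>" for each *.org file in directory $1 that
# contains the marker $2 (default: twitter-tweet) at least once.
count_snippets() {
    # /dev/null forces the "file:count" output format even for a single file.
    grep -c -- "${2:-twitter-tweet}" "$1"/*.org /dev/null |
        awk -F: '$NF > 0 { n = $NF; sub(/:[0-9]+$/, ""); print n, $0 }'
}

# Turn "<count> <file>" lines from stdin into an Org-mode checkbox list.
to_checklist() {
    echo "- progress"
    while read -r count path; do
        printf '  - [ ] %2d | %s\n' "$count" "${path%.org}"
    done
}
```

Called as "count_snippets ~/org | to_checklist", this should print one unchecked item per file that contains at least one matching line. Like the grep call above, it counts matching lines, so false positives still need a manual check.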

The replacement tasks are tedious, boring and error-prone. This calls for a maximum level of automation. However, one should not over-engineer such a task. I'm going to describe the sweet spot I settled on.

Here is the process I came up with. I highlighted the most important task steps which I had to do all the time.

- Visit the Org file that contains embedded tweets by generating a sparse tree for "twitter-tweet".
- Visit the corresponding article in the web browser.
- Create a screenshot and save the file to a temporary directory.
  - File name example: 2019-04-10T14.50 Org-mode markup deserves to be adopted beyond Emacs.png
- Copy the URL of the tweet to the clipboard.
- Optionally: tag the screenshot file with further tags.
- Execute the babel snippet, which does the following:
  1. Check the content of the clipboard: does it look like a Twitter URL? Abort if not.
  2. Extract the Twitter user handle of the tweet.
  3. Switch to the temporary directory with the screenshot.
  4. Insert "Twitter.com" and the user handle as first parts of the file name.
  5. Add the filetags "screenshots publicvoit".
     - Now, a file name looks like: 2019-04-10T14.50 Twitter.com - UnixToolTip - Org-mode markup deserves to be adopted beyond Emacs -- screenshots publicvoit.png
  6. Extract the image width from the screenshot file.
  7. Generate a lazyblorg snippet which includes the screenshot image, links the original tweet and provides caption and alt-text.
  8. Archive the screenshot image file via move2archive.
- Copy the result to the blog article.
- Write the alt content and the caption.
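
To make the renaming in steps 4 and 5 more tangible, here is a small dry-run sketch that only prints the expected new file name. It does not call the actual tools and does not claim to mirror how they work internally. The function name preview_name is made up for this illustration, and it assumes that the file name starts with a time stamp followed by a space and ends with a file extension.

```sh
# Print the file name that is expected after steps 4 and 5.
# $1: current file name, $2: Twitter user handle, $3: tags to append
preview_name() {
    old=$1
    user=$2
    tags=$3
    stamp=${old%% *}   # time stamp: everything before the first space
    rest=${old#* }     # everything after the first space
    base=${rest%.*}    # description without the file extension
    ext=${old##*.}     # file extension without the dot
    printf '%s Twitter.com - %s - %s -- %s.%s\n' \
        "$stamp" "$user" "$base" "$tags" "$ext"
}
```

Running it with the example file name from the list, the handle UnixToolTip and the tags "screenshots publicvoit" should print the file name shown in step 5.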

The babel script looks like this:

## assumption: clipboard should contain URL of tweet
URL=$(xclip -o -rmlastnl)
## third "/"-separated field of https://twitter.com/USER/status/ID is the host name
CHECKSTRING=$(echo "${URL}" | cut -d "/" -f 3)
if [ "x${CHECKSTRING}" != "xtwitter.com" ]; then
    echo "ERROR: not a Twitter URL in clipboard!"
    exit 0
fi
## fourth field is the user handle
TWUSER=$(echo "${URL}" | cut -d "/" -f 4)
TWUSERCAPTION=${TWUSER}
[ "${TWUSER}" = "n0v0id" ] && TWUSERCAPTION="me"

## abort if the directory is missing so that no files elsewhere get renamed
cd "${HOME}/tmp/2del/2022-01-02-publicvoit-twitter-screenshots/" || exit 1
appendfilename --smart-prepend --text="Twitter.com - ${TWUSER} -" --quiet *png
filetags --quiet --tags "screenshots publicvoit" *png

## most recently modified file = the screenshot that was just renamed
MYLASTFILE=$(ls -t | head -n1)
## "file" prints e.g. "NAME: PNG image data, 590 x 300, ..."; commas within
## the file name shift the comma-separated field that holds the dimensions
NUMCOMMAS=$(echo "${MYLASTFILE}" | grep -o "," | wc -l)
MYINDEX=$(( 2 + NUMCOMMAS ))
MYWIDTH=$(file "${MYLASTFILE}" | cut -d ',' -f ${MYINDEX} | cut -d ' ' -f 2 )

echo "#+CAPTION: Tweet by ${TWUSERCAPTION}: ."
echo "#+ATTR_HTML: :alt "
echo "#+ATTR_HTML: :align center :width ${MYWIDTH}"
echo "[[tsfile:${MYLASTFILE}][${URL}]]"

m2a --batchmode "${MYLASTFILE}"
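
Extracting the width by counting commas in the output of "file" works, but it depends on the exact output format. As a possible alternative, here is a sketch that reads the width directly from the PNG header, where bytes 16 to 19 hold the image width as a big-endian integer. The function name png_width is made up for this illustration, and this is not what the script above does.

```sh
# Print the pixel width of a PNG file, or return 1 if it is no PNG.
png_width() {
    png_file=$1
    # Bytes 1-3 of the PNG signature are the letters "PNG".
    [ "$(dd if="$png_file" bs=1 skip=1 count=3 2>/dev/null)" = "PNG" ] || return 1
    # Read the four width bytes as unsigned decimal numbers.
    set -- $(od -An -tu1 -j16 -N4 "$png_file")
    [ "$#" -eq 4 ] || return 1
    echo $(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
}
```

With this function in place, the three lines that compute NUMCOMMAS, MYINDEX and MYWIDTH could be replaced by MYWIDTH=$(png_width "${MYLASTFILE}"). Commas in the file name then no longer matter.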

The Results

Voilà, this is what helped me get through the replacement steps. Of course, I could have automated even more, I guess. But that would have meant more effort for writing and testing the automation code. You have to find your personal sweet spot.

You can see the results all over my blog. Here is one example. First, this is the embedded tweet:

Org mode markup deserves to be adopted beyond Emacs https://t.co/08SgDN7Ldt
— Unix tool tip (@UnixToolTip) April 10, 2019

And this is the same tweet after being processed by the work-flow I described here:

[Image, alt-text: "Org-mode markup deserves to be adopted beyond Emacs". Caption: Tweet by UnixToolTip with a link to the article on Org-mode syntax.]

Notice the link behind the image itself. That's a feature which was implemented in lazyblorg only recently.

When you visit this page in a text-only web browser or in case the image is not available, you can see the alt-text which says "Org-mode markup deserves to be adopted beyond Emacs". Please note that there is an important difference between the HTML alt text and the HTML title of an image. The title is what you see when hovering your mouse over an image.