Matthew's Dev Blog

HTML2Markdown Swift Package

Yes, I know

"This seems like a bad idea." Rob Whitaker (@RobRWAPP), December 8, 2021

Why would I need to do this?

My Nearly Departed app uses an excellent API from National Rail to get its live data. And when I say "excellent", I mean it's well-designed and works really well.

The only downside is that it's XML/SOAP, and the responses occasionally contain HTML.

Until recently, I've had a really hacky set of regular expressions which would extract URLs from HTML anchor tags, and strip out all the other tags, which was good enough. It was only really used in "station alerts", which would contain a link to the National Rail website.
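For illustration, that kind of regex approach might look roughly like the sketch below. It is not the app's actual code: extractLinks and stripTags are hypothetical names, the patterns are deliberately naive (double-quoted href attributes only), and the sample alert text is made up.

```swift
import Foundation

/// Hypothetical helper: returns the href value of every <a> tag in `html`.
/// Only handles double-quoted attributes, which is part of why this is "hacky".
func extractLinks(from html: String) -> [String] {
    let pattern = "<a\\s[^>]*href\\s*=\\s*\"([^\"]*)\""
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    let range = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, options: [], range: range).compactMap { match in
        // Capture group 1 is the URL between the quotes.
        Range(match.range(at: 1), in: html).map { String(html[$0]) }
    }
}

/// Hypothetical helper: removes anything that looks like an HTML tag.
func stripTags(from html: String) -> String {
    html.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)
}

// Made-up station alert, for demonstration only.
let alert = "Lifts are out of order. <a href=\"https://www.nationalrail.co.uk\">More details</a>."
print(extractLinks(from: alert)) // ["https://www.nationalrail.co.uk"]
print(stripTags(from: alert))    // Lifts are out of order. More details.
```

This is fine for a single link in a short alert, but the link text and the URL end up separated, and all formatting is lost.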

But now I've added a feature to display details about individual stations: ticket office opening times, toilet locations, rail-replacement bus service details, etc. Much of that is HTML. Really bad HTML. The kind of HTML you'd get when exporting a Word document. Terrible.

(Here's an example of the XML data for one station, London Kings Cross.)

So I had a problem: how do I display this data, written in HTML, inside the app?

I had three options:

  1. strip out the tags, and just display the unstyled text
  2. embed a small WKWebView inside each row of the SwiftUI List view
  3. somehow use native SwiftUI Markdown support, which is new to SwiftUI in iOS 15

Option 1: stripping tags

While this would display the text, it would lose formatting and - more importantly - lose hyperlinks. There are lots of useful hyperlinks inside the HTML, which I definitely want to keep.

Option 2: embedding lots of WKWebViews inside the SwiftUI List

I haven't benchmarked it, but I'd worry about performance.

Also, the HTML is terrible, often with inline CSS containing hardcoded fonts, font sizes and colours.

Option 3: use the native Text Markdown support

This is a definite possibility. With nice clean Markdown, I would be able to show some basic formatting and support inline hyperlinks. Performance should be fine, and Dynamic Type and VoiceOver should just work, because that's Apple's problem.
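One detail worth knowing: Text only parses Markdown automatically for string literals (via LocalizedStringKey). A String held in a variable is shown verbatim, so it has to be converted to an AttributedString first. The sketch below shows one way to do that, assuming iOS 15 or macOS 12; StationDetailRow and attributedString(fromMarkdown:) are hypothetical names, not part of the app or the package.

```swift
import SwiftUI

/// Hypothetical helper: parses inline Markdown (emphasis, links) and keeps
/// the original whitespace and line breaks. Falls back to the raw string
/// if the Markdown cannot be parsed.
@available(iOS 15, macOS 12, *)
func attributedString(fromMarkdown markdown: String) -> AttributedString {
    let options = AttributedString.MarkdownParsingOptions(
        interpretedSyntax: .inlineOnlyPreservingWhitespace
    )
    return (try? AttributedString(markdown: markdown, options: options))
        ?? AttributedString(markdown)
}

/// Hypothetical row view that displays a Markdown string with native Text.
@available(iOS 15, macOS 12, *)
struct StationDetailRow: View {
    let markdown: String

    var body: some View {
        // Text(AttributedString) renders emphasis and tappable links.
        Text(attributedString(fromMarkdown: markdown))
    }
}
```

Using .inlineOnlyPreservingWhitespace is a design choice here: it keeps line breaks in the source string rather than collapsing them, which matters when paragraphs are separated by newlines.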

The only difficulty is... converting HTML to Markdown.

So I wrote a Swift Package which converts HTML to Markdown

Here's how it works:

import HTML2Markdown

let html = "<p>This is a <em>terrible</em> idea.<br/>I must be daft.</p>"

do {
	let dom = try HTMLParser().parse(html: html)
	let markdown = dom.toMarkdown(options: .unorderedListBullets)
	print(markdown)
} catch {
	// parsing error
}

This generates the following Markdown string:

This is a *terrible* idea.  \nI must be daft.

In Nearly Departed, if a parsing error is thrown then I fall back to the old behaviour - stripping out all the HTML tags and converting HTML entities (&amp;, etc) to readable characters.
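That fallback could be sketched as follows. This is an illustration under assumptions rather than the code from Nearly Departed: plainText(fromHTML:) and displayText(forHTML:convert:) are hypothetical names, only a handful of entities are decoded, and the converter is passed in as a closure so the example runs without the package.

```swift
import Foundation

/// Hypothetical fallback: strips tags and decodes a few common entities.
func plainText(fromHTML html: String) -> String {
    var text = html.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)
    // Decode &amp; last, so "&amp;lt;" becomes "&lt;" rather than "<".
    let entities = [("&lt;", "<"), ("&gt;", ">"), ("&quot;", "\""), ("&nbsp;", " "), ("&amp;", "&")]
    for (entity, character) in entities {
        text = text.replacingOccurrences(of: entity, with: character)
    }
    return text
}

/// Tries the Markdown conversion first; if it throws, falls back to plain text.
func displayText(forHTML html: String, convert: (String) throws -> String) -> String {
    do {
        return try convert(html)
    } catch {
        return plainText(fromHTML: html)
    }
}

// Demonstration with a converter that always fails.
struct ConversionFailed: Error {}
let text = displayText(forHTML: "<p>Tea &amp; coffee</p>") { _ in throw ConversionFailed() }
print(text) // Tea & coffee
```

With the package, the closure would wrap the calls shown earlier: { try HTMLParser().parse(html: $0).toMarkdown(options: .unorderedListBullets) }.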

How to get it

The Swift Package is available at https://gitlab.com/mflint/HTML2Markdown.

What is supported?

  • <strong> and <em> for highlighting text
  • ordered and unordered lists (<ol> and <ul>)
  • paragraphs (<p>) and line breaks (<br>)
  • hyperlinks (<a href="...">)

All other HTML tags are removed.

Note: SwiftUI.Text currently cannot render Markdown lists, so I've added a MarkdownGenerator.Options.unorderedListBullets option to generate nicer-looking bullets: • instead of *.

First published 21 December 2021