Ben Humphreys


Perl HTML::TableExtract, nbsp and ASCII 160

I was trying to match table cells that looked like they contained only spaces, using HTML::TableExtract, but for some reason nothing matched: neither an explicit $foo eq ' ' nor $foo =~ /\A\s*\z/.

After 30 minutes of banging my head against the wall, I found out that the spaces were actually &nbsp; in the source, and HTML::TableExtract unhelpfully decodes them to the corresponding character. This is not the standard ASCII code for space (which is 32), but freaky bizarro space 160, the no-break space, which is not in ASCII at all.

What is even worse is that Perl's regex \s doesn't even cover it, at least not when the string is being matched as plain bytes rather than under Unicode rules.

So I was forced to do something disgusting like:

if (ord($foo) == 160) ...
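
A slightly less disgusting option might be to name character 160 explicitly in the pattern. The sketch below uses only core Perl; is_blank and normalize_nbsp are hypothetical helper names of my own, not part of HTML::TableExtract, and $foo stands in for a cell that held a decoded &nbsp;.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a cell holds nothing but whitespace
# and/or no-break spaces (character 160, written \xA0).
sub is_blank {
    my ($cell) = @_;
    return 0 unless defined $cell;
    # \xA0 is listed explicitly so the match does not depend on whether
    # Unicode rules are in effect for \s.
    return $cell =~ /\A[\s\xA0]*\z/ ? 1 : 0;
}

# Hypothetical helper: return a copy with no-break spaces turned into
# ordinary spaces (character 32).
sub normalize_nbsp {
    my ($cell) = @_;
    (my $copy = $cell) =~ tr/\xA0/ /;
    return $copy;
}

my $foo = chr(160);    # stand-in for a cell that contained &nbsp;
print "blank\n"       if is_blank($foo);
print "plain space\n" if normalize_nbsp($foo) eq ' ';
```

Unlike the ord() check, which only looks at the first character, this also handles cells that mix ordinary whitespace with no-break spaces.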
    • #perl
    • #programming
  • 2 years ago

About

Computational linguistics researcher at Kyoto University, focussing on machine translation. Also learning Japanese, Korean, French and other badassery.
