Sunday, April 14, 2013

Creating an Image Gallery in HTML5



Disclaimer:


  1. This example works easily on any Linux distribution. OS X users need to install ImageMagick or GraphicsMagick via MacPorts or Homebrew. Windows users need to install Cygwin/X.
  2. People who believe that the "width" and "height" attributes of the HTML img tag are sufficient for scaling images should immediately shut down their computer and never, ever attempt to design a web page again.


This article describes how to create an image gallery for a web site. We are going to use HTML5 elements, which are now supported by a large number of browsers, including WebKit-based browsers such as Safari and Google Chrome, as well as Firefox and Opera. Internet Explorer 9 is supposed to catch up. Our objective is to use the most recent advances in browser development to create elegant and efficient applications. For the real world, however, additional steps are needed to guarantee users of older browsers a satisfying experience. View the live demo at http://petermolnar.us/demos/ImageGallery/

Our goal is to quickly create an image gallery from a large set of photographs. We want to be able to add pictures in the future, and even create additional galleries. We assume that new photographs are added in chunks, and only occasionally: daily, once a week, etc. Therefore, producing the gallery on the fly is not required.

The proposed method does not require server-side scripting. The gallery is created off-line, and the resulting files can be transferred to the server. However, we make use of the powerful Command Line Interface (CLI) and shell scripting.
The examples run on any POSIX-based operating system, including Mac OS X and Linux. Shell or batch scripting is also supported by Microsoft Windows, though this technique is less known among Windows users.

Image Manipulation

Image manipulation programs like Adobe Photoshop and GIMP are powerful tools for editing individual images. However, running the same manipulation on hundreds of images is not that easy. Instead, we are going to make use of the command line tool ImageMagick http://www.imagemagick.org/.

The package comes with a number of commands, of which convert is pretty much all we need. The page http://www.imagemagick.org/Usage/thumbnails/ shows how to create a number of different types of thumbnail images. For this example we use the quirky Polaroid feature.
Most photographers take pictures in very high resolution; too large for the web. Usually, images on web sites should fit within a box 600 to 800 pixels wide and high. The convert command offers an easy resize function that respects the aspect ratio of the image.

$ convert OriginalImage.jpg -resize 600x600 SmallerImage.jpg

The command works for images in landscape and portrait orientation. It also converts images from one format to another based on the given file extension. The ImageMagick package also features a command, identify, to get information about size, resolution, and color model of a given picture. Since identify is not included in the Mac OS X implementation, we have to resort to

$ convert Image.jpg -identify null:
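
On systems where identify is available, the same information can be obtained directly:

$ identify Image.jpg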

Processing Images

Save the following code into process_images.sh


#!/bin/bash
#
# Resize the originals in raw/ and create Polaroid-style thumbnails.
mkdir -p images thumbs
for i in raw/*.JPG; do
   n=$(basename "$i" .JPG)
   echo -n "processing $n ..."
   convert "$i" -resize 600x600 "images/$n.jpg"
   convert "$i" -resize 150x150 -bordercolor '#EEEEEE' -background '#333333' +polaroid "thumbs/$n.png"
   echo " done."
done

Producing re-sized images and thumbnails is just one part of the job. If we consider that we may have to deal with hundreds of images, we also need a way to automatically create the HTML code. The example given here produces a complete page. However, more comprehensive projects would use external CSS and JS files.

#!/bin/bash

echo '
 <!DOCTYPE HTML>
 <html>
 <head>

 <style>
 div.thumbnail { width: 200px; height: 200px; vertical-align:middle; text-align: center; display: inline; }
 div.thumbnail img:hover { position: relative; top:-5px; }
 </style>
 </head>
 <body>
 <div id="thumbs">
'

for i in raw/*.JPG; do
    n=$(basename "$i" .JPG)
    echo "<div class=\"thumbnail\" id=\"thumb$n\" >"
    echo "<img src=\"thumbs/$n.png\" onclick=\"showImage('$n')\" />"
    echo "</div>"
done

echo '
 </div><!-- #thumbs -->
 </body>
 </html>
'
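
With both scripts in place, building the gallery takes two commands (make_gallery.sh is just the name chosen here for the HTML-generating script above):

$ bash process_images.sh
$ bash make_gallery.sh > index.html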

The final output looks something like this:

<!DOCTYPE HTML>
<html>
<head>

<style>
/* #spacer { background: white; height: 500px; } */
#thumbs { background: white; text-align: center;}
div.thumbnail { vertical-align:middle; text-align: center; display: inline; }


div.thumbnail img:hover {
   position: relative; top:-5px; left: 5px;
   -webkit-transform: rotate(10deg);
   -moz-transform: rotate(10deg);
   -o-transform: rotate(10deg);
   transform: rotate(10deg);
}

/*#imagedisplay { width: 100%; position: fixed; top: 30px; left: 0px; text-align: center;  } */
#imagedisplay {position: fixed; top: 30px; left: 50%; width: 0px; }

.invisible { display: none;} 
#picture {
    position: relative; left: -320px; 
    border: #EEEEEE solid 20px;
    -webkit-box-shadow: 5px 5px 15px #000000; /* Safari and Chrome */
    box-shadow: 5px 5px 15px #000000;
}
</style>

<script>
function showImage(name) {
   var pic = document.getElementById("picture");
   pic.src = "images/" + name + ".jpg";
   // any class name other than "invisible" removes the display:none rule
   pic.className = "visible";
}
</script>

</head>
<body>

<div id ="imagedisplay">
  <img id="picture" class="invisible" onclick="this.className='invisible'" />
</div>
<div id="spacer"></div>
<div id="thumbs">

<div class="thumbnail" id="thumbCIMG0494" >
<img src="thumbs/CIMG0494.png" onclick="showImage('CIMG0494')" />
</div>

All the other images ...

<div class="thumbnail" id="thumbCIMG0740" >
<img src="thumbs/CIMG0740.png" onclick="showImage('CIMG0740')" />
</div>

</div><!-- #thumbs -->
</body>
</html>



Sunday, March 3, 2013

Processing class assignments on paper (Part I)

One of my classes is really old-school. I give out exercises on paper. The nature of the class requires the students to write a lot of equations and draw diagrams. These things are really hard to do electronically, unless every student had an iPad (or some other tablet). Collecting and keeping the papers is a pain. In addition to the hassle of filing or returning them to the students, I also have to enter the scores into something electronic after grading.

So, I'm developing a system to produce worksheets for the students. The returned sheets can either be graded with pen on paper and then scanned for further processing, or the stack can be scanned into a PDF file and graded on an iPad. I chose to use QR codes to mark the pages. Bar codes seem to be more reliable than deciphering text labels with OCR. In particular, QR codes can be reliably detected regardless of position and, to a certain degree, scale and rotation.

System Requirements

The solution is based on a number of open source packages and libraries. Installation is easy on UNIX systems. I produce the handouts on OS X or Linux; processing of the scanned papers runs on Linux.
Fortunately, these systems make it easy to install the various packages. Windows users may have a chance to get this working with Cygwin/X.

Creating Worksheets

The worksheets are produced in LaTeX (with the help of some PostScript). Fortunately, there are some neat packages, like pgffor, that enable loops and the use of arrays. Some helpful documents are listed here:
Basic conditional if-else structures can be achieved with conditional variables in TeX: http://handyfloss.net/2007.08/latex-programming-how-to-implement-conditionals/

Another thing to point out is how to create LaTeX style files: there is a rather complicated, proper way to do it with documentation files (.dtx). For right now, however, I just put the preamble section of my LaTeX document, i.e. almost everything between \documentclass and \begin{document}, into a file named "worksheet.sty". There is one caveat: all \usepackage commands need to be replaced with \RequirePackage (more about that at http://tex.stackexchange.com/questions/19919/whats-the-difference-between-requirepackage-and-usepackage).

An interesting effect: the last page of an assignment initially showed the name of the next student; apparently, the new header already became active for that last page. A trailing \newpage command forces the last page to be rendered before the new header is set. Also, multiple consecutive \newpage commands don't create unwanted blank pages.

The following shows an excerpt of the style file:
\RequirePackage[letterpaper,margin=0.75in]{geometry}
\RequirePackage{graphicx}
\RequirePackage{amssymb}
\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\RequirePackage{pstricks}
\RequirePackage{pst-barcode}
\RequirePackage{fancyhdr}
\RequirePackage{pgffor,pgfmath}
\pagestyle{fancy}

\newcommand{\hofilename}{XXX}
\newcommand{\hoclass}{ABC123}
\newcommand{\studentid}{A12345678}
\newcommand{\studentname}{John Doe}
\newcommand{\wsId}[1]{\renewcommand{\hoclass}{#1}}
\newcommand{\wsFile}[1]{\renewcommand{\hofilename}{#1}}
\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0.4pt}

\setlength{\topmargin}{-1.7in}
\setlength{\headheight}{2.1in}
\setlength{\textheight}{8in}
\setlength{\footskip}{0.5in}

\fancyfoot[C]{%
  \foreach \val in {A, 0, 1, 2, 3, 4, X} {
    \val~\begin{pspicture}(0.7in,0.7in)
                \psbarcode{score=\val}{eclevel=M}{qrcode} \end{pspicture}
  }
}
\fancyfoot[L]{}
\fancyfoot[R]{}
\fancyhead[R]{\footnotesize Do not write over QR codes!}

%%\def\wsStudentNames{{"Smith, Fritz", "Jacobs, Lashonda", "Wu, Sarah"}}
%%\def\wsStudentIds{M12345678, 900123456, 900555123}

\def\worksheets#1{%
  \foreach \id[count=\xi from 0] in \wsStudentIds {
    \renewcommand{\studentid}{\id}
    \newpage
    \fancyhead[C]{{\LARGE \hoclass:\ \@title}\\ \@date \\[1.5ex]%
        {\bf \pgfmathparse{\wsStudentNames[\xi]}\pgfmathresult}
    }
    \fancyhead[L]{%
        \begin{pspicture}(2in,2in)
          \psbarcode{c=\hoclass\&d=\@date\&f=\hofilename\&p=\thepage\&s=\studentid}{eclevel=M}{qrcode}
        \end{pspicture}
    }
    #1
    \newpage
  }
}
The two arrays, \wsStudentNames and \wsStudentIds, should be defined in the main document, or in a separate file (here students.tex, pulled in with \input{students}) that is included in each assignment.

An assignment would look something like this:

\documentclass[11pt]{article}
\usepackage{worksheet}

\wsId{CIS227}
\wsFile{CE02}

\title{Class Exercise 2}
\date{2013-01-24}

\input{students}

\def\handout{
  \noindent Prove the following tautologies by starting with the left side
  and finding a series of equivalent wff's that will convert the left
  side into the right side.
  \begin{enumerate}
    
  \item $(A \wedge B') \wedge C \leftrightarrow (A \wedge C) \wedge B'$
    \newpage
  \item $(A \vee B) \wedge (A \vee B') \leftrightarrow A$
    \newpage
  \item $A \vee ( B \wedge A') \leftrightarrow A \vee B $
    \newpage
  \item $(A \wedge B')' \vee B \leftrightarrow A' \vee B$
    \newpage
  \item $A \wedge (A \wedge B')' \leftrightarrow A \wedge B$
    \newpage  
  \end{enumerate}
}

\begin{document}
\worksheets{\handout}
\end{document}
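
Since the worksheets rely on pstricks and pst-barcode, they should be compiled via the classic DVI/PostScript route rather than pdflatex (a sketch; CE02.tex stands for the assignment file above):

$ latex CE02.tex
$ dvips CE02.dvi
$ ps2pdf CE02.ps CE02.pdf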

Now that I can produce these worksheets, I just need to develop the program that reads the scanned sheets and processes the grades.

Upgrading Python on CentOS 6.3

In recent weeks I have become quite a fan of Python. Just like in many other cases, I learn to appreciate something once I have to use it. The attraction to Python is partly due to the sheer number of available libraries and packages. It seems like for virtually any task, someone has already posted the ten lines of Python code to do it.

The majority of my computers run CentOS 5.5 and CentOS 6.3. Yes, there are some advantages to running CentOS, and there are good reasons not to upgrade a fine-tuned, running system. But CentOS shows its conservative side with respect to the versions of its packages: Python 2.4.3 on CentOS 5.5, and Python 2.6.6 on CentOS 6.3.

Unfortunately, a number of software packages utilize Python, and many of them don't seem to be too friendly to the older versions. Even though most of my scripting would work on the older versions, those packages forced the upgrade.

The installation of newer versions is not that difficult. Fortunately, one can maintain several different versions on the same system.

The blog http://toomuchdata.com/2012/06/25/how-to-install-python-2-7-3-on-centos-6-2/ describes how to download and install different versions quite nicely.

I chose /usr/local/Python-2.7.3 and /usr/local/Python-3.3.0 as the new locations. Fortunately, most of the work is done by running:
 
$ ./configure --prefix=/usr/local/Python-2.7.3
$ make
$ make install

Each Python installation has its own directory for packages:
/usr/local/Python-2.7.3/lib/python2.7/site-packages/
That means one has to re-install all the packages that are needed. A significant advantage of separate installation directories is, of course, that different versions don't interfere with each other. Certain packages may not be compatible with all versions, either.
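
To install a package into a particular installation, invoke that installation's interpreter explicitly. A sketch with a distutils-style source package (some-package-1.0 is a made-up name):

$ cd some-package-1.0
$ /usr/local/Python-2.7.3/bin/python setup.py install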

For users to select their Python version I created simple module files like this one:

#%Module1.0#####################################################################
##
proc ModulesHelp { } {
        global dotversion

        puts stderr "\tSelects Python 2.7.3"
        puts stderr "\n\tUse `python'."
        puts stderr "\n\t(PM 2013-02-15)\n"
}
module-whatis   "sets Python 2.7.3 as default interpreter"
prepend-path    PATH    /usr/local/Python-2.7.3/bin/

All it takes is to set the PATH environment variable, though there might be some other variables that should be defined.
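
With such a module file in place (assuming it is installed under the name python/2.7.3 in the module path), switching versions is a one-liner:

$ module load python/2.7.3
$ python --version
Python 2.7.3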

Sunday, February 24, 2013

Expanding Short URLs

Short URLs are used to save characters in a message, such as tweets or emails. They may also be used to create a permanent or easy-to-remember URL for sites that may change for some reason.
When a web browser sends a request to the short URL, it usually receives a redirect message with HTTP status 301. Most browsers will automatically load the redirected location, but your own programs may want to take care of this themselves (see Fetching_Instagram_Pictures).
Offering a short-URL service is a great way to collect user data. Not only can those sites track visitors to sites that they don't own, they can also place cookies in the visiting browsers.
Here's a quick example:
$ curl -v http://t.co/z9j0Drd7

* About to connect() to t.co port 80 (#0)
*   Trying 199.16.156.11... connected
* Connected to t.co (199.16.156.11) port 80 (#0)
> GET /z9j0Drd7 HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2
> Host: t.co
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Wed, 26 Sep 2012 02:17:41 GMT
< Cache-Control: private,max-age=300
< Expires: Wed, 26 Sep 2012 02:22:41 GMT
< Location: http://instagr.am/p/QBBn6jQPBQ/
< Content-Length: 0
< Server: tfe
< Connection: close
<
* Closing connection #0

Sometimes a short URL may actually refer to another short URL, and the process needs to be repeated in order to get to the final destination. The following shows examples of functions that expand short URLs. The same process also tests whether they are valid.

We use the CURL library http://curl.haxx.se, which has bindings for a large number of programming languages, including PHP and Python.

Using CURL entails three basic steps:

  1. Initialize with the target URL.
  2. Set options for the HTTP request, such as the GET or POST method, any payload data, and control parameters. Callback functions for producing data that will be sent to the URL, or for processing data from the server's response, are declared here as well.
  3. Execute the HTTP request. This function usually blocks until the response is received. (However, this can be changed.)

The example program below follows these three steps by calling the functions curl_init(), curl_setopt(), and curl_exec().


CURL can actually follow the redirects automatically, but there might be situations where one wants to see what happens in between. In the example, the line

$ret = curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);

prevents the library from following the chain of redirections. Instead, we resolve the short URL one step at a time.
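
(As an aside: when only the final destination matters, the curl command-line tool can follow the whole chain by itself; -L follows redirects and -w prints the URL it ended up at:

$ curl -sIL -o /dev/null -w '%{url_effective}\n' http://t.co/z9j0Drd7

The programs below resolve the chain manually instead, so that each hop remains visible.)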


#!/usr/bin/env php
<?php

$header = array();
$level = 0;

// Callback: collect each header line into the $header array,
// keyed by the (normalized) parameter name.
function getHeader($ch, $data) {
    global $header, $level;
    $hh = explode(":", $data, 2);
    if (count($hh) > 1) {
        $header[$level][str_replace("-", "_", strtolower($hh[0]))]
            = trim($hh[1]);
    }
    return strlen($data);
}

// Dummy function, unless we need to store the entire page.
function getBody($ch, $data) {
    return strlen($data);
}

function expandURL($url) {
    global $header, $level;

    $level = 0;
    $hc = 0;
    do {
        $header[$level] = array();

        // Create a curl handle
        $ch = curl_init($url);
        $ret = curl_setopt($ch, CURLOPT_HEADER,         1);
        $ret = curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
        $ret = curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
        $ret = curl_setopt($ch, CURLOPT_TIMEOUT,        30);
        $ret = curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'getHeader');
        $ret = curl_setopt($ch, CURLOPT_WRITEFUNCTION,  'getBody');

        // Execute
        curl_exec($ch);

        // Check if any error occurred
        $error = curl_errno($ch);
        $info = curl_getinfo($ch);
        $hc = $info["http_code"];
        // Close handle
        curl_close($ch);

        if (!$error) {
            // Follow one level of redirection, if present
            if (isset($header[$level]["location"])) {
                $url = $header[$level]["location"];
            }
            $level += 1;
        } else {
            echo "Error: $error\n";
            break;
        }
    } while ($hc == 301 or $hc == 302);

    return $url;
}

$expandedURL = expandURL($argv[1]);

echo "URL ".$argv[1]." ---> ".$expandedURL."\n";
$parsed = parse_url($expandedURL);
$hostpath = implode("/", array_reverse(explode(".", $parsed["host"])));
$parsed["hostpath"] = $hostpath;
$parsed["iterations"] = $level;
$parsed["shorturl"] = $argv[1];
$parsed["md5tail"] = md5($parsed["path"]);
print_r($parsed);
?>

The example uses a callback function getHeader() to process the header information. The function is called for each line in the HTTP header. Most lines in the header start with a parameter name, followed by a ':' and the value. The callback function adds these to the associative array $header. We also need to take care of the blank after the colon, hence the trim():
$header[$level][str_replace("-", "_", strtolower($hh[0]))] = trim($hh[1]);
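
Assuming the script is saved as expandurl.php (a name chosen here) and made executable, a test run looks like this:

$ chmod +x expandurl.php
$ ./expandurl.php http://t.co/z9j0Drd7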

Pretty much the same thing can be done with the CURL binding for Python, pycurl. However, the package "human_curl" makes it even easier:


#!/usr/bin/env python
# -*- coding: utf-8 -*-

import human_curl as hurl
import sys

url = sys.argv[1]

r = hurl.get(url)
sc = r.status_code
level = 0
while sc == 301 or sc == 302:
    # the location header contains parts of both URLs; keep the second part
    locations = r.headers['location'].split(' ')
    url = locations[1]
    r = hurl.get(url)
    sc = r.status_code
    level += 1

print "%s ---> %s (iterations: %d)\n" % (sys.argv[1], url, level)

The location entry in the header includes parts of both the original and the redirected URLs. We need to split the string and use the second part.


Links:
http://www.php.net/manual/en/function.curl-getinfo.php
http://www.php.net/manual/en/function.get-headers.php
http://stackoverflow.com/questions/472179/how-to-read-the-header-with-pycurl

Sunday, February 17, 2013

Uploading files via email

The following article describes a method for posting new documents on a web site, or, more generally, uploading files to a server for further processing. I often create white-board pictures and annotated view graphs in class, and need to post them on the class web site. I found the CamScanner iPhone app particularly useful for taking pictures of the whiteboard. The app finds the edges of the white-board, crops the image, and runs keystone correction and other image processing algorithms to enhance the picture. The other tool is Notability on the iPad, which I use to annotate my view graphs in class. (I prefer writing on my iPad over using a SmartBoard with its tedious notebook software: the handwriting has to be so big that one can hardly get anything on the board.)

The majority of these iOS apps have a number of ways to get your documents off the device, including Dropbox, Google Drive, and even built-in HTTP servers. However, I chose email because it will also support our departmental document scanner. Furthermore, my inbox fills up daily with announcements of workshops, internships, and other opportunities that I would like to post on my site. There wouldn't be anything easier than hitting the forward button.

Technically, I could email my documents directly to the server. However, enabling sendmail brings a whole bag of responsibilities with it, and negotiating port 25 with the IT authorities doesn't seem to be worth the trouble. Instead, the described method uses an external, publicly accessible email server, like GMAIL. For my project I set up a dedicated GMAIL account, though one could also use one's regular account and fetch emails from a particular folder (or label).

To get started, one needs a Linux box, fetchmail, procmail, and the nmh package. These should be available in every Linux distribution; in many cases they're already installed.

The basic fetchmail configuration is explained in http://www.daemonforums.org/showthread.php?t=5590; this blog http://badcherry.wordpress.com/2006/03/30/fetchmail-without-sendmail/ shows how to get around the sendmail daemon.

Here's the setup for CentOS 5:
  1. Install the packages:

     $ yum -y install fetchmail procmail nmh

  2. Create a user account under which the emails will be processed. I wouldn't use my regular user account, but it's possible to use the same account. In this example, the user account is "adriaan".
  3. Create a .fetchmailrc file to test the connection to GMAIL:

     poll imap.gmail.com protocol IMAP
        user "xxxxxxx@gmail.com" is adriaan here
        password 'mysecretpassword'
        fetchlimit 1
        keep
        ssl

  4. Test with:

     $ fetchmail -v -m '/usr/bin/procmail -d adriaan'

  5. When everything works, change the configuration to:

     poll imap.gmail.com protocol IMAP
        user "xxxxxxx@gmail.com" is adriaan here
        password 'mysecretpassword'
        fetchlimit 1000
        ssl

     The fetchlimit may prevent disaster if suddenly too many emails come to this account. We removed the "keep" option; from now on, mails will be removed from GMAIL after fetching. By default, the fetchmail program only loads unread messages.
  6. The next step is creating a script for downloading (and processing) the emails, saved as /home/adriaan/bin/getProcessMail.sh. It could look something like this:

     #!/bin/bash
     #
     fetchmail -m '/usr/bin/procmail -d adriaan'
     inc -file /var/spool/mail/adriaan -truncate +inbox
     # this is just collecting ... need to process ...

  7. The MH tools will be used to separate email messages into individual files; there are even tools to extract attachments. Having the email messages in separate files makes processing them easier. However, one may consider deleting the files once their content has been processed. In order to use MH for the first time, run the command:

     $ install-mh

  8. We need to run this script every ten minutes. Use the crontab -e command to edit the user's cron table, and add the following line:

     */10 * * * * /home/adriaan/bin/getProcessMail.sh

Now the email messages will be automatically saved on our system, and we're ready to process them. Anybody could send emails to the account; if this is not desired, the processing script may first check the sender's address and dismiss all messages that didn't originate from a list of approved senders. Alternatively, one could achieve the same with GMAIL's mail filters.
The MH package (http://www.nongnu.org/nmh/) has a number of tools to deal with the messages, headers, and attachments.
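
As a starting point for that processing, the MH commands can be explored interactively (a sketch; message numbers will vary):

$ scan +inbox          # list the messages in the inbox folder
$ mhlist 1             # show the MIME structure of message 1
$ mhstore 1            # write the attachments of message 1 to files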

Sunday, January 13, 2013

No more spying on site visitors ... How browsers limit the :visited style

I recently came across an article describing how one could estimate the gender of visitors by checking their browser history; apparently, female users visit different popular sites than male users: http://www.mikeonads.com/2008/07/13/using-your-browser-url-history-estimate-gender/

So, how does one get access to the users' browser history when they visit your site? There were some really smart ways to do it. Unfortunately, the popular browsers, at least, have caught up, and none of these techniques work anymore.

The idea is to trick the browser into revealing previously visited URLs through the different style it applies to those links; in plain CSS, using the pseudo-class a:visited. If these techniques worked, one could add a number of links to a web page, somewhere outside the visible area, or hidden by some other elements.

  • First up: jQuery. In a perfect world, $("a:visited") should find us all the visited links. Somebody even wrote a plugin: http://remysharp.com/2008/02/25/visited-plugin/. Behind the scenes the plugin uses the JavaScript function getComputedStyle().
  • Since that didn't work out, how about changing the style of visited anchor tags so that not only their placement would be distinct, but also the positions of surrounding elements? Well, that didn't work either. The browsers I tested refused to apply any style attributes that would alter the rendering of the page, i.e. only color and text-decoration were recognized.
  • In my last attempt I assigned an individual background image to the :visited pseudo-class of each anchor tag; something that could easily be generated on the server, e.g. by a PHP script. If the browser applied the style, additional HTTP requests would be generated for those visited links.
I was not able to trick any of the browsers into revealing their history. I assume the developers got smart enough to patch this security hole: http://dbaron.org/mozilla/visited-privacy

Of course, the desire to obtain demographic information from site users still prevails. I'm looking forward to seeing what else people come up with...