Weather station code now on GitHub

January 8th, 2013

We’re switching from Subversion to Git at work, and I decided to use the te923 code as an exercise to get started with Git.

Source code is at

Hive tables, partitions and LZO compression

February 24th, 2011

At Lijit we’ve been working with lots of the projects in the Hadoop ecosystem.  In particular, we’re using Hive quite a bit, since it abstracts map/reduce into a familiar SQL-like language.

We deal with fairly large amounts of webserver log data, so we are also saving HDFS space and job I/O by using the hadoop-lzo package. It gives fast compression that retains our ability to use the data through Hive queries.

If you are only interested in compression, and have Hadoop and Hive configured appropriately, you can even mix compressed and uncompressed data in separate partitions of a Hive table.  A normal table definition will work:

       CREATE EXTERNAL TABLE foo (
                       columnA string,
                       columnB string )
       PARTITIONED BY (date string)
       LOCATION '/path/to/hive/tables/foo';

One big advantage of LZO, though, is its ability to be split in map/reduce jobs. This is done by creating an index of the LZO file with the LzoIndexer tool of the hadoop-lzo project. To actually use the index, you will need to use a special input format for your Hive table:

    CREATE EXTERNAL TABLE foo (
         columnA string,
         columnB string )
    PARTITIONED BY (date string)
    STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
          OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
    LOCATION '/path/to/hive/tables/foo';
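
For reference, the index mentioned above is built by running the LzoIndexer class from the hadoop-lzo jar against each .lzo file; it writes a matching .index file next to the data. The following is a minimal sketch that only prints the command instead of running it. The jar location and the file name are placeholders for illustration, not real paths.

```shell
#!/bin/sh
# Assumed jar location (placeholder); override HADOOP_LZO_JAR to match your install.
: "${HADOOP_LZO_JAR:=/path/to/hadoop-lzo.jar}"

# Print the indexer command for one .lzo file (dry run, nothing is executed).
lzo_index_cmd() {
    echo "hadoop jar $HADOOP_LZO_JAR com.hadoop.compression.lzo.LzoIndexer $1"
}

# example.lzo is a made-up file name.
lzo_index_cmd /path/to/hive/tables/foo/example.lzo
```

Once the printed command looks right, it can be pasted into a shell on a machine that has the hadoop client installed.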

Now to actually come to the point. In my case, I had already created the table, and was trying to add indexing after the fact. Hive permits changing input format with an alter statement:

    ALTER TABLE foo SET FILEFORMAT
        INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
        OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

But this alters only future partitions, not existing partitions. They retain their TextInputFormat. So now when I ran my Hive queries, instead of the LZO index file being used for splitting the input, it would wind up being read as table data. My results were mostly correct, but there were some result rows that were garbage.

I fixed this by dropping and recreating the table and partitions with the correct input format. Because I use EXTERNAL tables, the data itself was preserved.
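
Re-adding many partitions by hand is tedious, so it can help to generate the statements. This is only a sketch: it assumes each partition lives in a directory named after its date under the table location, which is an illustrative layout and may not match yours. The dates in the example call are made up.

```shell
#!/bin/sh
# Emit one ADD PARTITION statement per date.
# $1 = table name, $2 = table location, remaining args = partition dates.
# Assumes (hypothetically) that each partition directory is <location>/<date>.
add_partition_sql() {
    table=$1
    base=$2
    shift 2
    for d in "$@"; do
        echo "ALTER TABLE $table ADD PARTITION (date='$d') LOCATION '$base/$d';"
    done
}

add_partition_sql foo /path/to/hive/tables/foo 2011-02-23 2011-02-24
```

The output can be redirected to a file and run with hive -f after the table has been recreated.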

While this is not a big deal, I have lost the ability to mix compressed and uncompressed data in the table. The Hive language manual claims I can alter partition metadata, which would be another way to deal with this, but so far I’ve not been able to make that work in versions 0.5 and 0.6.

Thanks to Dmitriy and Johan from Twitter for helping me understand all this.


The original hadoop-gpl-compression project:

Hive language manual:

Discussion of Hive and table attributes:

dvdstyler on debian

December 12th, 2010

Maybe this is obvious, but it took me more effort than I thought to get dvdstyler working on my debian machine. So here’s the quick-n-dirty(tm) recipe.

It is easily installable via apt-get from after all. Don’t bother with the package advice on the dvdstyler site. Just add deb lenny main to your /etc/apt/sources.list file and run apt-get update. You should then be able to install with a simple apt-get install dvdstyler

If you run a headless debian server as I do, vncserver works like a champ.
Install: apt-get install vnc4server.
Execute: vnc4server -geometry 1024x768 -depth 24.
Export your display: export DISPLAY=myserver:1.
Run: dvdstyler.
Connect a VNC client to myserver:5901 to drive dvdstyler.

See the man pages for vnc4server for more info about the display number and connecting a client.
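
The link between the display number and the client port in the steps above is simple arithmetic: VNC servers conventionally listen on TCP port 5900 plus the display number. A tiny sketch (the function name is made up for illustration):

```shell
#!/bin/sh
# Map a VNC display number to the TCP port a client connects to.
# Convention: port = 5900 + display number.
vnc_port() {
    echo $((5900 + $1))
}

vnc_port 1   # display :1, as used above
```

So if vnc4server reports that it started on display :2 instead, the client should connect to port 5902.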

Weather station – one last thing

May 9th, 2010

It is also possible to read version and status information from the weather station, and the te923con application gives access to this data as well. It is formatted the same way as the weather data, and so was also trivial to process in PHP. I wrote a script that gets the station and sensor status, and sends a notification email if one of the sensors has a low battery. That script is also included in the te923 zip file. Again, I know my PHP skills are weak, so if you have improvements I’d be interested in them.

I scheduled it to run once a day at midnight:

0 0 * * * sleep 30;php te923Status.php <notify email address>

The 30 second sleep is intended to offset the status check from the normal weather data check that happens exactly on the minute.

Debian out of the box doesn’t relay mail to external domains, so I had to do another tweak here. To enable forwarding, I reconfigured exim according to

It’s not really the proper way to create a real relay server, but since I’m behind a firewall and only using the server for this purpose, I didn’t feel it necessary to do more. The email does look suspicious to the receiving mail system, though, so if you send to an address outside your domain (like gmail, for example), the mail is likely to be flagged as spam. You’ll need to create whatever filters are necessary at the recipient account to avoid this.

And that’s it so far. I expect I’ll delve into RRDTool and RRDWeather now to see if I can create graphs of readings that Weather Underground does not graph (like humidity, for example).

Weather Station – fixing the bugs

May 9th, 2010

The first thing I discovered is that the te923con application has a bug in decoding UV index data from the station: the index jumps from 0.9 to 10.0. A simple patch to te923_com.h is required. The diff output below is actually a change to a single line, which I split up here for clarity:

@@ -138,7 +138,7 @@
     else {
-        data->uv = bcd2int( buf[18] & 0x0F ) / 10.0 + 
               bcd2int( buf[18] & 0xF0 ) + 
               bcd2int( buf[19] & 0x0F ) * 10.0;
+        data->uv = bcd2int( buf[18] & 0x0F ) / 10.0 + 
               bcd2int( ( buf[18] & 0xF0 ) >> 4 ) + 
               bcd2int( buf[19] & 0x0F ) * 10.0;
         data->_uv = 0;
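
To see why the missing shift produces the jump, here is the same arithmetic as a small shell sketch. The bcd2int function below is a hypothetical stand-in that assumes the real one decodes a byte as two BCD digits (high nibble is tens, low nibble is ones); the actual te923con implementation may differ. The byte value is an example, not captured station data.

```shell
#!/bin/sh
# Hypothetical stand-in for te923con's bcd2int: decode one byte holding
# two BCD digits (high nibble = tens, low nibble = ones).
bcd2int() {
    echo $(( (($1 >> 4) & 0x0F) * 10 + ($1 & 0x0F) ))
}

byte=0x10   # example value: high nibble 1, low nibble 0

# Before the patch: the masked high nibble is still in the tens position.
before=$(bcd2int $(( $byte & 0xF0 )))
# After the patch: shifting right by 4 moves it into the ones position.
after=$(bcd2int $(( ($byte & 0xF0) >> 4 )))

echo "before=$before after=$after"
```

Under that assumption the unpatched expression contributes 10 where the patched one contributes 1, which matches a reading that should be 1.0 showing up as 10.0.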

The next thing to address is the permissions problem. To this point, the only way to get data from the station was to be root. Otherwise, you get this error:

This is a generic issue with USB devices, and I found an item on a wiki about GPS units that got me going.

That page discusses how to set the group ownership on the device, as well as the permissions on the device. Long story short, I created the device rule set /etc/udev/rules.d/99-te923.rules (all on one line):

ATTRS{idVendor}=="1130", ATTRS{idProduct}=="6801",
                    MODE="0660", GROUP="plugdev"

idVendor and idProduct identify the TE923 weather station, MODE tells udev to give read/write permissions to the user and group that own the device node, and GROUP tells it to assign group ownership to “plugdev”. My user on the machine is a member of that group so I should be OK.
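
Group membership is easy to get wrong, so a quick check can save some head scratching. A minimal sketch using the standard id command (the function name is made up for illustration):

```shell
#!/bin/sh
# Succeed if the current user belongs to the named group.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

if in_group plugdev; then
    echo "in plugdev"
else
    echo "not in plugdev"
fi
```

Note that a newly added group membership only shows up in sessions started after the change, so log out and back in before testing.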

Ask the system to reload the USB rulesets:

[mcp:.../te923/te923] sudo udevadm control --reload_rules

And now I can get valid data back from the unit without being root:

[mcp:.../cwanek/cronjobs] te923con

Unfortunately, though, some readings (specifically current temp readings) are empty. Even wide-open permissions on the device don’t help. This doesn’t make sense to me, and I have not yet solved this issue, so I’m still stuck with being root to run te923con. I’d love to know why it would work only partially.

Still, I don’t really want to have root’s crontab running the script, so I configured sudo to skip the password prompt for the group plugdev for the te923con application. With visudo, add:

%plugdev ALL=NOPASSWD: /usr/local/bin/te923con

Moving on, I was still not able to run te923con without removing the USB human interface device module (sudo rmmod usbhid). While removing it allows access to the te923 station, it would also cause any other HID like a mouse or keyboard to stop functioning. So the trick is to get the usbhid module to release just the weather station.

There are many sites that document how to get usbhid to unbind a device. I found one that gave me the following command:

sudo bash -c "echo -n 2-1:1.0 > /sys/bus/usb/drivers/usbhid/unbind"

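bash -c is required so that the redirection to unbind is performed by the root shell. With a plain sudo echo, the redirection would be done by my own unprivileged shell and would fail.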
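
A common alternative is to pipe the value into sudo tee, so that the file is opened by a process already running as root. This is a sketch, not the command I actually used; the SUDO variable is a hypothetical hook that lets the function be tried against an ordinary file without root.

```shell
#!/bin/sh
# Write an interface id to a driver's unbind file without "bash -c".
# $1 = interface id such as 2-1:1.0
# $2 = path to the unbind file
# SUDO defaults to "sudo"; set it to an empty string to test on a normal file.
unbind_with_tee() {
    printf '%s' "$1" | ${SUDO-sudo} tee "$2" > /dev/null
}

# Real use (needs root and the device attached), not run here:
# unbind_with_tee 2-1:1.0 /sys/bus/usb/drivers/usbhid/unbind
```

printf '%s' is used in place of echo -n because it behaves the same way in every POSIX shell.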

Digging deeper, I learned that you can configure the unbind to happen immediately after the device is connected, by using the “RUN=” option in the device rules file (this is all on one line in /etc/udev/rules.d/99-te923.rules):

ATTRS{idVendor}=="1130", ATTRS{idProduct}=="6801",
    MODE="0660", GROUP="plugdev", 
    RUN="/bin/sh -c 'echo -n $id:1.0 > /sys/bus/usb/drivers/usbhid/unbind'"
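
In the rule, udev fills in $id itself. For the manual unbind command, the same id can be looked up in sysfs by matching the vendor and product ids. A rough sketch (the function name and the optional directory argument are made up for illustration; the directory argument exists only so the function can be tried against a fake tree):

```shell
#!/bin/sh
# Print the bus id (e.g. 2-1) of every USB device matching a vendor/product id.
# $1 = vendor id, $2 = product id, $3 = sysfs devices dir (optional)
find_usb_id() {
    root=${3:-/sys/bus/usb/devices}
    for dev in "$root"/*; do
        # Skip entries (such as interfaces) that have no id files.
        [ -r "$dev/idVendor" ] && [ -r "$dev/idProduct" ] || continue
        if [ "$(cat "$dev/idVendor")" = "$1" ] &&
           [ "$(cat "$dev/idProduct")" = "$2" ]; then
            basename "$dev"
        fi
    done
}

# TE923 ids from the rule above; :1.0 is configuration 1, interface 0.
for id in $(find_usb_id 1130 6801); do
    echo "$id:1.0"
done
```

If the station is not plugged in, the loop simply prints nothing.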

So now, apart from the missing data when running as a non-privileged user, the te923 weather station is coexisting with other USB devices, and I am able to schedule the upload script in my own user crontab.

There’s one last thing