MongoDB and Python [Review]

Sun, 14 Apr 2013
Vote on HN


MongoDB and Python

I needed a database for the Web App I'm making and thought a document store would be nice, because it makes it really easy to look up attributes of objects I store in it. Given I've been hearing good things about MongoDB the past year, I decided to finally give it a serious try. So I picked up MongoDB and Python: Patterns and processes for the popular document-oriented database hoping to pick up some best practices.

Sadly the book is really basic, so while it was helpful to get the hang of the syntax, I could have learned that from their documentation. They reuse a lot of code between examples, making them rather repetitive and don't really show off any examples that show the benefits of having a document store like MongoDB. Furthermore, the book is only 53 pages, which really doesn't justify the price. So I'd skip this one and just go for the online documentation and experimenting yourself.

O'Reilly Blogger Review Program


Introduction to Tornado [Review]

Sun, 14 Apr 2013
Vote on HN


Introduction to Tornado

I recently decided to try and create a web app and picked Tornado as my web server, because it is also being used in IPython. I like learning new tools by reading books about them, so I got my hands on a copy of Introduction to Tornado and got started.

The book is pretty thin, which I think in this case is a good thing. Its not meant to exhaustively describe all features Tornado has to offer, but rather a gentle introduction. The book covers all of the important elements to get you started:

  • Creating templates
  • Extending templates with Javascript and CSS
  • Interacting with databases (MongoDB in this case)
  • Making your web app asynchronously
  • Basic security features and authentication
  • Signing in with Twitter and Facebook's OAuth

The book features several nice examples, like a shopping cart for a bookstore, asynchronously keeping tabs on how many items are remaining. A simple Twitter client displaying your latest tweets, a Facebook client showing your timeline, both dealing with authentication. Most examples worked pretty well, though I had some issues getting the Twitter client working, because of errors I made in the callback url on localhost. I didn't get the Facebook example working for the same reason, but its not a big issue.

Overall, I found it a pretty useful book. While I was already somewhat familiar with Web Apps through Udacity and Coursera courses, it was good to get a bit more formal explanation about topics like routing, handlers and templates. I also liked the way they explained what each part of the code did, instead of assuming you had already figured it out. So while its a short read, I think its a nice introduction to Tornado to get you going.

O'Reilly Blogger Review Program


World, meet my alpha version (part III)

Mon, 28 Jan 2013
Vote on HN


On the top of my bucket list is fixing a ton of bugs or issues with regards to the workflow, that should make the app run more stable. A lot of things I want to change are caused by common beginner mistakes. Even though I’ve read the Pragmatic Programmer and tried to take a lot of their comments to heart, it’s funny how poorly I understood what it all meant until I really encountered a situation where it applied. Jeff Atwood posted a nice summary of the book here.

Some of the things I’d like to fix are:

  • Refactoring my early code to be more up to date based on my current experience. This applies to much of the main panel and how the GUI is controlled by PubSub.
  • Make the code more self-contained. Even though I tried to apply MVC by separating the GUI from the calculations and the database, when I started out a lot of database functions would pass along a message back to the GUI. You can imagine that later on when you want to reuse the same database functions, the GUI got called as well creating all sorts of nasty side effects.
  • Because I didn’t fully understand how to use debugging, I cheated by simply adding print statements to each function. Obviously that doesn’t scale (though it helped me understand my workflow tremendously), so instead I’d like critical functions to send a PubSub message to which I can subscribe. Then I can simply add a Settings option to print these messages from one central location or simply unsubscribe from them.
  • Currently when I change the database, I manually change my own MySQL database and then figure out how to replicate it. Up until now I’d advise my one user not to invest large amount of time in annotating contacts, because she’d risk having all that effort go to waste when I decided to update her table. In the future I’d like to make this more easier for the user, especially the non-technical ones, by only dropping tables if its trivial to compute the results again or update the database (rather than dropping it) when it’s an important table.
  • Interface several database action/settings and more general settings, like making a backup of the database, tweak the ratios of the color map or change the default location where the measurement data can be found.
  • Add exceptions or make sure the code doesn’t have any expectations about the data. A good example was that my results first required each subject to have 4 paws, else the code wouldn’t run. But some dogs or cats are amputees and humans obviously only have 2 feet, so I did my best to remove any of these expectations. However, I’m sure there are still some of these assumptions hidden in the code waiting to come across a trial that doesn’t meet them.
  • Document more! This is one major area where I’ve slacked off, telling myself that it’s just a waste of time and didn’t feel like wasting it. However, after spending several days trying to hunt down obscure bugs, because I didn’t fully remember what function triggered what other functions, I’ve definitely changed my mind. Another great advantage I’ve found was that when I write out what I want the code to do, I spot errors in my thinking much faster and get a much better grasp of what I actually want the code to achieve. On top of all this, I want to look into a library that takes all my doc strings and uses that to create proper documentation.

Furthermore, I’d like to keep adding features that either improve the usability or that make the app more useful. Examples of this are:

  • Allow the user to edit objects, rather than requiring them to delete it and create a new one. This applies to simple things adding measurements after a session has been saved, tweaking a single zone location or even being able to redo it without having to recalculate all the results.
  • Allow the user to drag the zone’s square to the right position rather than requiring to use the directional keys on the keyboard. Not everyone prefers using a keyboard, so they shouldn’t be forced to use it.
  • Make it easier to annotate the contacts when the keyboard lacks a numpad. While the numpad is definitely the fastest way to annotate them, laptops generally lack one and pressing Fn while trying to find the keys isn’t as easy. I used to have a version where you could click on the average contacts, but the event required to do so (EVTCHILDFOCUS) had the nasty habit of not being very reliable and getting called more often. One ‘shortcut’ would be to make a button and display the imshow() image as the face of the button.
  • Create a function to manually override the paw detection in a given slice of the entire plate, so if the automagic detection fails enabled the user to try and fix it.
  • Allow different shapes, sizes and number of zones. This should make the application more flexible for measuring other kinds of data, such as humans, horses or elephants (you have to think big!).

There are also features I’d like to add that allow for more data exploration and analysis:

  • Currently I don’t remove any outliers (other than ignoring incomplete contacts or those that didn’t get recognized properly), so it would be helpful if before calculating all the definitive results you could clean up the results. For example by displaying histograms with the distribution of certain variables or plotting all contacts in one graph. By listing all contacts in a list and allow the user to delete them if necessary they can perform any required data cleaning.
  • At the moment you can only analyze one protocol at a time (while the graphs do allow you to plot both at the same time), but obviously comparing them would be very interesting. As I’ve experienced in the past, just displaying two graphs next to each other is not comparing. It takes a lot of experience to interpret the differences especially without a clear frame of reference.
  • Another thing that I’d like to analyze are differences on a population level. Even though I already experimented with this, I wasn’t happy with the end result. Simply displaying multiple graphs over one another wasn’t really useful and since the dogs were subdivided into weight groups there was a lot of variety in numbers. Another issue was averaging dogs with a ‘normal’ and amble gait pattern. As you can see in this figure, the step lengths between the left front and right hind paw (bottom right) were either negative (right hind being behind the left front paw) or positive. However, when you calculate an average, you get a value around -10 which doesn’t really describe either pattern. Clearly there’s a need for some more diligence when segmenting the data.
  • I don’t only want to display the population data alone, but rather compare any dog with the ‘normal’ values based on the medical history of the dog. That way its results are much easier to quantify, because you get a sense of what they should be, so spotting abnormalities should be a lot easier.

Lastly there are several things that I’d like to work on in the future, but that will me require me to learn a lot more. First on the bucket list is: - Reading more books, first of all finishing Code Complete. I’ve got plenty of interesting books, I just haven’t had the time the past few months to try and read them. On top of that, in most cases its best to have a small pet project to try out all the news things you’re trying to learn. This obviously doesn’t go so well if you’re under any time constraints. - Experiment with OpenGL, because though I have an animation working, that library doesn’t work with wxPython. So I’d love to learn some OpenGL so I can learn how to draw to a Canvas myself, without the need of a library to do the heavy lifting. - I’d love to mess around with Microsoft’s Kinect. Not only because it’s a fun gadget, but also because the kinematic data could be a great asset to the gait analysis. First off, it allows me to measure the angles and make estimations about the moments around the joints. Second, it tells me which paw is where at what point in time, so a synchronized Kinect + pressure measurement would make it much easier to automate the paw detection. But obviously Kinect requires some OpenGL knowledge (for performant displaying), OpenCV knowledge for interpreting the image data and brushing up on my old courses on Inverse Kinematics... So I’m still a long way off ever getting this to work. - Try out if sorting results is any better when using MongoDB. Now I know that NoSQL isn’t the solution to all my database problems, but the problem I have with MySQL is how it requires me to break my data into pieces and stitch them back together when I need them again. I’d much rather leave things as they are and skip tedious parsing loops every time I need a different result. Furthermore, requiring a schema is an absolute pain in the ass when your design isn’t set in stone. I’ve spend nearly as much time writing code to ‘build’ a database as I needed to put things in the database. Off course, I don’t intend to break something that’s already working, so I’m probably going to work on a small pet project to see if I like this any better. - Given that I’m writing scientific software I need to be pretty darn sure whether the results are correct. I’d love to build in a dummy data set that can function as a ‘Mock’ object and allow me to check whether the results are correct. This should allow me to catch errors with my calculations that aren’t detectable with the human eye. Especially when you have highly dimensional data, errors easily slip in without a good way to spot them early on. - Above all else, I desperately want to be able to manage measuring myself! Currently I first need to do measurements in the vendor’s software, without any way to tell if the measurement went ok, then export it from their software and import it into mine. You can understand that most clinicians would find this process far too laborious and decide not to use my app. On top of that, the new drivers allow for continuous measuring in contrast to the 2 second limit of the current software version. This would allow me to greatly increase the pace at which the measurements can be performed and analyzed. However, I first need to convince the vendor to give me this access…

As you can see there’s more than enough work for me in store! On top of all this I have to await what the clinic thinks of my first version and how useful they find it. But I’ll be sure to keep you guys up to date on any progress I’m making. If you have any questions on how or why I did something a certain way, be sure to drop a comment!


World, meet my alpha version (part II)

Mon, 28 Jan 2013
Vote on HN


When all the toes are set and the final results are calculated the ‘fun’ part can begin! Now we get to analyze the results and draw smart conclusion based upon them. Note that some results are calculated based upon the average contacts (like the pressure per zone, the location of the zones and the foot axis) while the rest are calculated for each individual contact and then averaged.

The average pressure over time with a standard deviation (N/cm^2). This is basically the sum of all the sensors divided by the surface of all the activated sensors.

Average pressure over time

Note that the surface is probably an overestimation, because even when a sensor is partially loaded, it will still count the entire surface. There are some ways to counter this, but probably a better way would be to ignore the really low values (<0.1 N/sensor). Interestingly enough the values remain pretty constant once they reach their maximum.

Overestimation of the surface

The average force over time (N), which is simply the sum of all the activated sensors.

Average force over time

In humans we generally see an M-shaped curve (purple line), which is due to the ‘rockers’ in the human foot. The first peak is caused by loading the the rear foot (white line), the second peak is caused by shifting the weight towards the forefoot (green line) in order to push off. In the dogs case its just one peak, this most likely has to do with the way quadrupeds walk.

Human pressure over time

The average surface over time (cm^2). The surface is almost the reversed of the pressure, the dogs tend to put down their paws very flatly (all toes making contact) and don’t take start taking them off until late (> 60%) in the stance phase. The lift off also seems to happen pretty evenly in most cases, where first the central toe, then the medial and lateral toes and lastly the two front toes are lifted off.

Average surface over time

The average force (not pressure!) for each ‘toe’ or 2×2 zone (N/cm^2).

Average force (not pressure!) for each toe

Blue is the central/rear toe, then from medial to lateral: green, red, light blue and purple. Because the surface is the same, you can easily compare the forces. The vertical lines depict when the central toe reaches its maximal pressure and when its lifted off. This is analogous to the phases of gait as described by Willems et al (2004, pdf link), which describe very reliable phases during the roll off of a human foot. Starting at the landing of the heel to the first contact of a metatarsal to the contact of all metatarsals to the lifting of the heel and eventually the foot.

Roll off phases

As you can see there are a lot of similarities between the left and right paws and also within the paw the force values within each toe are very similar.

The center of pressure plotted on an image of the maximal values of each sensor.

Average center of pressure

It seems every paw first lands more on the lateral side, only to stabilize somewhere in the middle. In most cases the line is so straight which indicates there’s a very good balance between the medial and lateral side of the paw. Imagine the ankle as a weight scale, where the left and right side are in a constant battle to keep the scale in balance. In order to keep them in balance it requires the muscles on both sides of the paw need to contract with the right amount of force. I expect lame dogs, that may be lacking this balance, will show a completely different center of pressure.

Pronation-supination of the elbow of a dog

The location of the toes (manually set by me).

Toe locations for each paw

The size is fixed at the moment, though technically I could reduce it to 1×1 or scale it up for larger animals. Note that the location in the images may be slightly off, because the interpolation I use (scipy’s map_coordinates) slightly translates the paw, which frustrated me to no end. I haven’t found a solid solution for this sadly.

The paw axis, from the central toe to a point between the two central toes.

Paw axis for each paw

In humans I believe the angle is between 2 and 12 degrees, where a positive angle means the paw is exorotated. Now my definition isn’t perfect, especially because I found some large variations in the shape and loading pattern of the central toe. However, I think it will help find extreme outliers in either direction.

The step length, width, duration for each paw compared to the other paws with an image of the relative positions of each paw

Spatial-Temporal Information

I’d love to figure out how to change the image with the relative positions to an animation that shows how the paws are positioned to each other over time as well. Especially for the running trials, it looks like the paws land very close to each other and you can’t really imagine what this means in practice. Still the current image does help visualize how large the steps and strides are, if the dog were lame on one paw it would probably have an asymmetrical step length and easily stand out.

And lastly a dashboard with some useful stats.

Overview of average results

For Pressure/Force/Surface I calculated the maximal value, the percentage of the stance phase where the maximum was found and the ratio between left and right. I first used an Asymmetry Index (ASI), but I found the values much harder to interpret and probably only make sense if you can compare them between populations. For each of the forces per zone I calculated the same, with the exception that the percentages at the end aren’t left vs right, but the ratio between the toes (which seemed much more useful).

At the bottom you find the axis in degrees, where exorotation is positive. The ‘Timing’ next to it gives the step length, width and step duration just comparing the left vs right paws (step) or the paw with itself (stride).

I actually already made changes to the version you see above, because I initially had the id’s of the two protocols (running and walking) hardcoded in my calculations so I was sure everything would work. But it turned out to be fairly trivial to made it generic enough to allow any protocol id. The list of measurements on the side is therefore replaced with a list of protocols, so you can easily switch between them.

In my last post I’ll talk about the future changes I’d love to make, so read on!


World, meet my alpha version

Mon, 28 Jan 2013
Vote on HN


After several months of pain, sweat and tears I’ve finally wrapped up a first alpha version of my app! For this initial version I mainly focused on getting features working in a semi-usable way, which means it may look rough around the edges, but it gets the job done and in most cases relatively fast. As some of you may remember, we’ve measured over 30 dogs with each 24 measurements. Each measurement containing anything between 6 to 12 contacts, depending on the size of the dog. This leads to a total of about 7000 contacts(!) which I’ve manually assigned with labels for each paw (Left Front, Left Hind, Right Front and Right Hind). On top of that I manually assigned the location of each of the five toes, though I cheated by only calculating this for an average contact based on whether the dog was running or walking. Still this means 25 dogs, 2 types of trials, 4 contacts and 5 toes totally in around a 1000 toe positions. Note that the amount of dogs slightly reduced, because some of them were so small or light that it was impossible to discern any toes.

Now I bet you’re curious what this all looks like! Well I won’t keep you waiting any longer.

The database screen

The Ribbon

I’m a huge fan of Microsoft’s Ribbon and luckily wxPython had its own version. So I was quick to add one myself, because it allows me to use tabs to switch between logical sections of my app and gives me large icons which make for easier clicking (due to Fitts’Law). While the current 48px are probably a bit overkill I’m still fairly happy with them. The only thing that bothers me is all the stock icons and the depressing amount of duplication. Worse, because I compressed so many functionality into one screen I’m also stuck with a huge amount of buttons.

Basically it reminds me of this:

My app != Google's app
Image from

Now this isn’t a fair comparison because I wish you good luck trying to insert subject info or manage a database with just one button, but that’s not to say there isn’t room for improvement.

As you can see, the main tab consists of 4 elements: - Searching the database - Adding subjects to the database - Creating a medical history (anamnesis) based on tags (work in progress) - Creating a session and adding measurements to the session

I think I’m going to reorganize the panel so when you start you basically get a Google like interface: search for a subject and you can get started. If you need to analyze new data then you press a button to add a subject and the other relevant buttons appear (making the ribbon context sensitive like Office) and display the panel to insert the data. Finished inserting a subject? Great we switch to the panel to add measurements. Want to add a more detailed medical history, switch the panels to display just those panels and adjust the ribbon.

Another reason for wanting to change the main tab is that this was literally the first code I wrote, which means it’s horrible. Even though I tried to maintain some sense of a MVC structure, but due to my limited experience I failed pretty badly and a lot of functions need to be untangled.

Processing the measurements

Now on to the more interesting stuff: processing the data.

Processing the measurements

Again the Ribbon is crazy crowded at the moment; this is because there are so many actions required to allow for a flawless and usable paw annotation. Imagine this:

  • Search for contacts
  • Refresh loads the average contacts from the database
  • Remove any contacts that are ‘incomplete’
  • Save the results when you’re done
  • Delete contact removes it from the list
  • Previous/Next Contact let you switch between contacts
  • Undo it if you make a mistake
  • Delete all the contacts in case they are saved the wrong way
  • Marking the four contacts
  • Add a protocol so we can discern between measurements
  • My magic eight ball to parse measurement names into protocols
  • Cancel protocol set’s all the choices back to default in case you make an error
  • Delete all the protocols in case you made a mistake

Then there’s a couple of buttons which could be made context sensitive, because they aren’t needed until you need to assign the zone locations.

  • Find zones button was supposed to mark any zones it could find
  • Add a zone (moving is done with the keyboard arrows)
  • Save the zone locations in the database
  • Undo all the zones in case you made a mistake (before assigning)
  • Delete the zone

As you can see there’s a somewhat recurring pattern: create -> store -> delete

Perhaps I could ‘streamline’ this process by making the program assume what the user wants to do after certain actions, but honestly I think that’s far too error prone and we’re talking about science here. Shuttles have exploded for errors like this and we don’t want to fit a pair of orthotics based on erroneous data.

Besides, what computer geek honestly uses buttons anyway? I already added several keyboard accelerators to make very easy to be more productive. Ctrl+F to search, CTRL+O to remove incomplete contacts, 7 (Left Front), 9 (Right Front), 1 (Left Hind), 3 (Right Hind) which map to the anatomical order of the paws :

Mapping paws to keypad keys

Ctrl+S to save all the annotations to the database and rinse and repeat. In case there’s a contact you don’t want to have stored, simply don’t annotate it and it won’t get saved. When you save a measurement, it will automagically load the next measurement, to reduce any additional redundant key presses.

How can I improve my paw detection?

Off course, I couldn’t have made this feature without the help of Joe Kington, my Stack Overflow hero, who gave this awesome answer that helped me find and sort my paws. While I personally find the second answer more impressive, I didn’t end up using his principal component analysis, because in its current form it doesn’t perform much better than chance. However, based on the results I’ve gathered so far, I should be able to come up with additional heuristics to make the algorithm perform better.

This also shows why it’s so great to have an application wrapped around all the scripts I had in the beginning, I can extend existing functionality without having to redo a lot of work, because a lot of the foundation required is already in place. For example, the GUI allows me to add multiple panels, which let me easily switch between different views of the same data. If I didn’t have a GUI, I would have had to create several figures or switch between them with a command line interface. You can imagine that performing such tasks are very error prone and quite tedious if you constantly have to pass slightly different arguments to display the data you want.

Back to Joe’s code: his find_paws function while dead simple totally did the job:

def find_paws(data, smooth_radius=5, threshold=0.0001):
    data = sp.ndimage.uniform_filter(data, smooth_radius)
    thresh = data > threshold
    filled = sp.ndimage.morphology.binary_fill_holes(thresh)
    coded_paws, num_paws = sp.ndimage.label(filled)
    data_slices = sp.ndimage.find_objects(coded_paws)
    return object_slices

Now it’s not flawless, as you can see with the contact in the middle, it’s larger than the smooth radius used by the uniform filter. The result is that it recognizes them as separate contacts. The problem gets worse with human feet when any midfoot contact is lacking, because then it will split up the foot into a rear foot and a forefoot.

Example of where the paw detection goes wrong

While I could make the smoothing radius scale with the size of the dog (or the weight for instance) this is a slippery slope. When the dog is running fast, the front and rear paws land on nearly the same spot almost at the same time. If the smoothing radius is too large, it might ‘merge’ these two prints. As with most algorithms, when you try to optimize for certain cases it’s bound to perform worse in other cases. The ‘easy’ way out is to make it as easy as possible to manually correct the algorithm and using the adjustments as feedback for future corrections.

Manually labeling the paws

When you search for the contacts, every contact gets highlighted by a white square. As soon as a contact is annotated with appropriate paw, its color coded accordingly. On the left we see a list of all the found contacts, which shows the duration of the contact (in frames), maximal force (in Newtons) and the maximal size of the surface (in cm). Originally I thought this list might be useful so you can see the differences between contacts, but I found I never looked at the list.

Instead, I’d use the comparing tool at the bottom. What it does is compare the currently selected contact (yellow square) with average representations of the already annotated contacts. It also predicts what contact it probably is based on a simple subtraction: subtract the selected contact pixel by pixel from the other contacts, the one with the smallest difference is likely to match it most likely. I’m sure there’s a better comparison, like using PCA, compare all the frames, not just the maximal values or comparing multiple values, but honestly I found that after annotating 2/3 contacts the comparison would do a pretty good job and otherwise even I had a hard to ‘guesstimate’ which contact is resembled most.

Something that might be of use, like someone suggested on Stack Overflow, is using Inverse Kinematics because if two paws are in contact with the ground, the next contact can’t be one of those two. This should greatly reduce the amount of options at any given point in time. Furthermore, in a lot of cases there’s a clear pattern in which the paws are placed, i.e. Right Front, Left Hind, Left Front, Right Hind etc. One might even wonder if the measurements where this pattern doesn’t occur was even a valid one. Obviously, this doesn’t hold in all cases, because some dogs inherently walk seemingly random, alternating between a normal and an amble pattern.

Another important feature is the adding of protocols, which is basically a way to tag measurements. For instance when I try to fit someone a pair of running shoes or orthotics, I’d generally measure them both barefooted and shod, walking and running and wearing several shoe models or variations of an orthotic. This means I can’t calculate an average over all these different ‘protocols’, because they have very different results. My current version can only sort measurements based on one protocol and while technically I could add a slew of protocols like ‘Barefoot Walking’ to create what I want, but you can understand the list can become very long if I’d need to make combinations for every type or brand of shoe…

Peak detection in a 2D array

After processing all the measurements, the results get calculated. One of the results is calculating an average contact for each paw with the same protocol. This has several advantages, because loading all 3D arrays for each contact with the same protocol is quite data intensive. Furthermore, I’m mostly interested in average results not in single contacts, so I might as well compute the average up front. Now I understand this is an enormous data reduction, but as you’ll see next it comes with a similarly enormous advantage: I only have to set the toe’s positions (zones) once per paw!

You can image that with 4 paws and 5 zones and 2 protocols, it would become very laborious if I’d have to set the zones for every contact with the same protocol. Now I only need to set 40 zones, whereas else it would have probably been at least 200 per dog(!).

Manually setting the toe locations

Now there might be a lot of improvements I could make with regards to the averaging. Currently I’m not making any corrections, while slight paw rotations and shape differences effect the average. Rotations mean the toes aren’t aligned and it might increase the average’s values between the toes and lower those of the toes. The shape differences, for instance when the rear toe wasn’t fully loaded, would result in a ‘shorter’ contact. Since the arrays start behind the contact, this means the toes would end up half way through the average contact.

Again though, it would probably be a better decision to ignore any extreme outliers, unless the variation is more the rule than the exception. One possible solution would be to rotate and translate the contacts to have the optimal overlap. An interesting method I’d love to use is this form of shape matching, where any shape is transformed to a time series. I can then figure out how to transform the contact so they get a better overlap, which in this case would be the smallest Euclidean distance. Obviously thinking up ways to do so is easier said than done, but these calculations can be used for other purposes down the line as well, like tracking motion.

Converting a skull contour to a time series Recognizing pose from time series

Because the story was getting so long, I’ve cut it in two pieces. So read on for part two where I talk about the results!