Data, Diverse and Mighty

I’ve been thinking lately about how many different types of data I have actually worked with. Sure, there are the indoor light energy traces that I collected and analyzed in an extensive study, which resulted in 2 top-tier papers and a public dataset and was used to drive aspects of the design of the EnHANTs and of the algorithms and protocols for light-powered nodes. And there are also the human and object motion traces that we collected and analyzed in a study of motion energy for the Internet of Things that recently received some media coverage.

But there are also many others, including, but by no means limited to: 

  • Mouse breathing data (yes, mouse, not mouth) – my B.S. honors project. I examined a dataset of volume and pressure traces collected in a set of experiments with asthma-induced mice, and developed algorithms for breathing pattern recognition and for the associated calculations of important breathing parameters. Oh, those poor, poor asthmatic mice. Not all of them survived the experiments, and in the data I could see, clear as day, the moment when one of those poor creatures took its last erratic breath…
  • Network traffic captures, wireless and wired, my love from back in the days of my wireless network security research. Do you know how many wireless packets are flying by you every single minute in an urban environment? It’s madness. If you just run a sniffer with its standard settings, you see only a tiny fraction of what is out there, but if you get the right kind of wireless card and tinker with it a bit so that it can receive packets not addressed to you, oh, the trove of traffic you collect in mere moments!
  • Human typing patterns, which I played with for a keystroke biometrics authentication study project. Keystroke biometrics identify humans based not on what they type, but on how they type it – the algorithms consider the inter-stroke timing and its consistency, as well as some more advanced parameters. This was the first study with human participants that I coordinated, and it gave me a real taste of the remarkable complexity and richness of the human-related aspects of data collection and analysis. How do you motivate humans to participate in your study? Does human behavior change with the time of day? What about tiredness – does that have an effect? And what about the day of the week? And how similar is human behavior from one month to the next?
  • NYC subway turnstile data, made available by the MTA along with other developer resources, which can be used to deduce how many people went through each of the subway turnstiles at different times of day. I was looking to see how many people would step on the floor right underneath the turnstile, to see how much energy could be collected from that. And guess what? The subway floor actually gets stepped on much less than one would intuitively expect. Sure, the subway is crowded when you exit the train – along with everybody else! But what happens in between trains, and what happens outside of rush hour? The surprisingly low numbers I got make sense once these things are taken into account.
  • Student demographics data, which I carefully collected, and data from a student survey that we put together to assess what students actually thought of the experience. Preliminary results were just published; more fun analysis to come.
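
For the curious, here is a minimal Python sketch of the keystroke timing idea from the list above: take the gaps between consecutive key presses, then summarize their average and their spread (a stand-in for "consistency"). This is only an illustration and not the code from the study; the function names and the sample timestamps are made up, and a real system would use the more advanced parameters as well.

```python
from statistics import mean, pstdev


def inter_stroke_intervals(press_times):
    """Return the gaps (in seconds) between consecutive key presses."""
    return [later - earlier for earlier, later in zip(press_times, press_times[1:])]


def timing_profile(press_times):
    """Summarize a typing sample by its average gap and the spread of its gaps.

    A smaller spread means more consistent typing rhythm.
    """
    gaps = inter_stroke_intervals(press_times)
    if not gaps:
        raise ValueError("need at least two key presses")
    return {"mean_gap": mean(gaps), "gap_spread": pstdev(gaps)}


# Hypothetical key-press timestamps, in seconds since the first key press.
sample = [0.0, 0.25, 0.5, 1.0]
profile = timing_profile(sample)
print(profile)
```

Comparing such profiles between a stored sample and a fresh one is one simple way to ask whether the same person is typing; how close is "close enough" is exactly the kind of question the study had to wrestle with.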

And that is not to mention the 3+ years of my daily training logs, with ~1000 lines of custom MATLAB code developed around them over the years for all kinds of summary statistics and performance measures :).
