Optimization Part V: Applying Data-Oriented Design Principles

Optimizing Data Flow in a Data-Intensive C++ Application

Today I’m going to focus on optimizations inspired by data-oriented design. Previously, I made a handful of API improvements that a flame graph profile made obvious, and then replaced all uses of the STL “unordered_map” type with a more efficient flat hash map implementation. [Read More]
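
As a rough illustration of the container swap described above (not the DCP’s actual code), the sketch below assumes Abseil’s absl::flat_hash_map as the flat hash map and a hypothetical FlatMap alias so existing call sites keep the unordered_map-style interface.

```cpp
#include <cstdint>
#include <string>

#include "absl/container/flat_hash_map.h"

// Hypothetical central alias: swapping the container becomes a one-line
// change, while call sites keep the familiar unordered_map-style interface.
// A flat hash map stores entries in a contiguous open-addressing table,
// which is friendlier to the CPU cache than node-based std::unordered_map.
template <typename Key, typename Value>
using FlatMap = absl::flat_hash_map<Key, Value>;

int main() {
  FlatMap<std::string, int64_t> code_counts;

  // Same usage pattern as std::unordered_map.
  code_counts["occ1950"] += 1;

  auto it = code_counts.find("occ1950");
  return (it != code_counts.end()) ? static_cast<int>(it->second) : -1;
}
```

Any other flat hash map with an unordered_map-compatible interface would slot in the same way; only the alias definition changes.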

Notes for a Sequel/JRuby/SQLite Bug

A Bug with Sequel, JRuby, and SQLite DATETIME Columns

Accessing SQLite3 DATETIME column data with the Sequel gem’s “jdbc” SQLite adapter produces different Ruby types for DATETIME columns than the MRI Sequel adapter does: you get ‘Time’ objects in the result sets under standard Ruby (MRI), but ‘Date’ objects under JRuby. The ‘Date’ objects have no time component; the ‘Time’ objects carry both the date and the time. [Read More]

Developing Software is Developing Knowledge

Why Agile Methodologies Work

Software development is the art of transforming vague requirements into precise statements executable by a machine, resulting in a working software system. Almost by definition, you can’t begin a project with perfect requirements; if you could, the requirements would already be executable or mechanically translatable on their own, with no need for further development. The reason “Agile” development methodologies have persisted is that they are designed around this basic truth. [Read More]

Optimization Part II: Targeted Optimizations Assisted by Flame Graphing

Early last year IPUMS moved production of IPUMS-International micro-data to the latest version of the core DCP and a new data editing API. In doing so, we discovered a number of places where the new API, while performing better than the old one on our USA and CPS test datasets, performed worse than expected on some of the IPUMSI datasets. That wasn’t a big deal, except for a few datasets that took twenty or thirty times longer to process than we expected. [Read More]

Optimizing a Data-Intensive C++ Application, Part I

At IPUMS we continuously enhance our data products with newly available datasets, adding new variables and improving existing ones. We do this with the “Data Conversion Program” (DCP), a C++ application built to transform census and survey data into “harmonized” micro-data. When you visit ipums.org and make data extracts, you’re downloading data produced with the DCP. [Read More]