Optimization Part V: Applying Data-Oriented Design Principles

Optimizing Data Flow in a Data-Intensive C++ Application

Today I’m going to focus on optimizations inspired by data-oriented design. Previously, I made a handful of API improvements that a flame graph profile made obvious, and then replaced all uses of the STL “unordered_map” type with a more efficient flat hash map implementation. [Read More]
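
As a rough illustration of the container swap described above (not the DCP’s actual code), the sketch below assumes Abseil’s absl::flat_hash_map as the flat hash map and a hypothetical FlatMap alias so existing call sites keep the unordered_map-style interface.

```cpp
#include <cstdint>
#include <string>

#include "absl/container/flat_hash_map.h"

// Hypothetical central alias: swapping the container becomes a one-line
// change, while call sites keep the familiar unordered_map-style interface.
// A flat hash map stores entries in a contiguous open-addressing table,
// which is friendlier to the CPU cache than node-based std::unordered_map.
template <typename Key, typename Value>
using FlatMap = absl::flat_hash_map<Key, Value>;

int main() {
  FlatMap<std::string, int64_t> code_counts;

  // Same usage pattern as std::unordered_map.
  code_counts["occ1950"] += 1;

  auto it = code_counts.find("occ1950");
  return (it != code_counts.end()) ? static_cast<int>(it->second) : -1;
}
```

Any other flat hash map with an unordered_map-compatible interface would slot in the same way; only the alias definition changes.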

Notes for a Sequel/JRuby/SQLite Bug

A Bug with Sequel, JRuby, and SQLite DATETIME Columns

Accessing SQLite3 DATETIME column data with the Sequel gem’s “jdbc” SQLite adapter produces different Ruby types for DATETIME columns than the MRI Sequel adapter does: you get ‘Time’ objects in the result sets under standard Ruby (MRI), but ‘Date’ objects under JRuby. The ‘Date’ objects have no time component; the ‘Time’ objects carry both the date and the time. [Read More]

Developing Software is Developing Knowledge

Why Agile Methodologies Work

Software development is the art of transforming vague requirements into precise statements executable by a machine, resulting in a working software system. Almost by definition, you can’t begin a project with perfect requirements; if you could, the requirements would already be executable or mechanically translatable on their own, with no need for further development. The reason “Agile” development methodologies have persisted is that they are designed around this basic truth. [Read More]

Optimization Part II: Targeted Optimizations Assisted by Flame Graphing

Early last year IPUMS moved production of IPUMS-International micro-data to the latest version of the core DCP and a new data editing API. In doing so, we discovered a number of places where the new API, while performing better than the old one on our USA and CPS test datasets, performed worse than expected on some of the IPUMSI datasets. That wasn’t a big deal, except for a few datasets that took twenty or thirty times longer to process than we expected. [Read More]

Optimizing a Data-Intensive C++ Application, Part I

At IPUMS we continuously enhance our data products with newly available datasets, adding new variables and improving existing ones. We do this with the “Data Conversion Program” (DCP), a C++ application built to transform census and survey data into “harmonized” micro-data. When you visit ipums.org and make data extracts, you’re downloading data produced with the DCP. [Read More]