Government Data, APIs, and the Tipping Point

I've been discovering the mass of information that is created by the government and is very slowly making its way online. I will be posting my discoveries from time to time.

The process of making public information available online seems to go something like this:

  1. Law requires that information be collected and disseminated

  2. People submit forms, etc., recording such information

  3. Others then scan or otherwise process this information and put it into a database "silo"

  4. The database is put online on a "silo" web site

  5. Other NGOs or agencies then somehow get that data (often by scraping the web sites) and make it API accessible
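As a rough illustration of step 5, here is a minimal sketch in Python of scraping a table from a silo site and repackaging it as JSON, the way an API might return it. It assumes the site publishes its records as a plain HTML table. The sample page, the field names, and the helper names (`TableScraper`, `scrape_records`, `to_api_response`) are all made up for the example and are not taken from any real agency site or API.

```python
import json
from html.parser import HTMLParser

# Hypothetical stand-in for a page fetched from a "silo" web site.
SAMPLE_PAGE = """
<table>
  <tr><th>Agency</th><th>Form</th><th>Year</th></tr>
  <tr><td>Example Agency A</td><td>Form 1</td><td>2009</td></tr>
  <tr><td>Example Agency B</td><td>Form 2</td><td>2010</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being read, or None
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Trim surrounding whitespace once the row is complete.
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def scrape_records(html):
    """Turn the first row into lowercase keys and the rest into records."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    keys = [name.lower() for name in header]
    return [dict(zip(keys, row)) for row in body]


def to_api_response(records):
    """Serialize the records the way a simple JSON API might return them."""
    return json.dumps({"count": len(records), "results": records}, indent=2)


if __name__ == "__main__":
    print(to_api_response(scrape_records(SAMPLE_PAGE)))
```

A real scraper would also have to fetch the page, cope with markup that changes without notice, and re-run on a schedule, which may be part of why this last step lags so far behind the others.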

There seem to be one or more years between each of these steps, and different parts of government move at different rates. In fact, right now some of the total universe of government information is at a pre-1 stage, while other parts may be at stage 5.

I've been slowly working my way through representative parts of these APIs to try to come up with a cogent way to describe this universe of data (or at least to meta-describe it, as "it" is constantly changing).

In the meantime, I came across this interesting blog post, which is relevant to this quest:

Is Data at a tipping point? In the blog post he says:

"[…]A similar phase transition has already occurred with regards to data inside business ecosystems. For the past several decades, an increasing number of business processes – from sales, customer service, shipping – have come online, along with the data they throw off.

As these individual databases are linked, via common formats or labels, a tipping point is reached: suddenly, every part of the company organism is connected to the data center.

And every action — sales lead, mouse click, and shipping update — is stored. The result: organizations are overwhelmed by what feels like a tsunami of data.[…]" (from Is Big Data at a tipping point?)