As you may have read I have been playing with GIS data.  Much of the sample data I have to play with is broken down by postcode.  Which makes life really simple, image here for example is a map of the number of aardvark owners in a particular region plotted by registered aardvark club houses.  This map lets the club owners see the current club houses and distribution of people by postcode.  This way they could see if they need to site a new clubhouse or the effects of moving one.

I had a list of all  (most at least) the UK postcode sectors by lat/long.  My member data was by partial postcode. So the first thing I had to do was modify my matching postcode data to fit with my people data.   I did this by creating a new table of larger postcode areas and an average centre point for that new location.

insert into avgPcode
   select substring(pcode,0,len(pcode)), avg(lat), avg(long)

You get the idea.  I could then create my plotting data, which I did in xml, and as it was a knock up I just cut and pasted the xml from my sql and created a really bad sql statement that just created the XML, but with the columns for reference, similar to this (minus bad string concat for xml)

SELECT   member.pcode,, avgpcode.long, count(*)  as density
FROM     member
         INNER JOIN avgpcode ON avgpcode.pcode = member.pcode
         inner join locations on member.loc_id = location.loc_id
group by member.reg_pcode, member.loc_id, location.loc_icon,, avgpcode.long
order BY count(*) desc

Doddle really.  Well maybe not the select above is missing the code that determines density to icon size used in the Live maps GeoRss plot.

But the process has got me thinking about how to plot population density data that is more precise, supposing that my member data was not by postcode, a nice handy region grouping identifier, but by lat/log or full postcode and lat/long.

Taking the Full postcode example first.  I could strip it down to postcode section (2km radius) and plot densities that way simple group by pcodesector.pcode, count(*).

I also thought I could break it down by postcode sector, but rather than use the centre of the sector stored in the pcode file, I could calculate a weighted average using number of members and their lat/long to place the centre point within that 2km area.  For many purposes this would be more than enough.

I’d write the sql, but I don’t have the data to test so I won’t, but you get the gist.

Another approach I was toying with but still not sure what to do with is something called a QuadTree, the first picture at the top is a rough idea of what it is.

I started by thinking if we grid up the map into small enough sections we can store a population figure by grid, and plot circles of differing diameters based on that number and a scale.  The more people you throw at this and the more squares you have then that’s an awful lot lot of data as you drill down.  So I had an idea that based on a pre-determined threshold number (and that number could be one) you count the number of people in a square and if it is greater than the threshold then that square is split into four equal parts and you recurse the function.  In fact the whole thing is a recursive algorithm when doing the calculation.  I spoke to my games programmer friend who said its called a QuadTree and is used when drawing landscapes in 3d where height is concerned.

Here is what he said

I think you have rediscovered something called a “quad tree” – what you describe is the exact same thing you do when preparing a landscape for rendering – except the threshold value used is the flatness of the squares.   Basically, you divide a landscape defined by a height map (a rectangular grid of height values) up into large squares then for each square see how “unflat” it is.  If it fails the threshold value then you subdivide it and start again.

You wind up with a recursive structure like
                Object *m_TopLeft;
                Object *m_TopRight;
                Object *m_BotLeft;
                Object *m_BotRight;

Then as you process from the “root” object, you say
Func Process(Object *p) {

   If (all pointers are NULL) {
      Draw the square

In games we do this kind of thing real-time. 

Info – there are also “Oct Trees” which are used to subdivide volumes rather than flat sheets for shadowing, occlusion, Visibility culling, voxel representations,  etc etc etc

If you think carefully about your data representation you should be able to process it all quite fast.

Did you get that ?

So how am I going to use it,  I have some ideas.  They involve MS live maps and this algorithm, I can’t figure it all out yet but consider that

    1. The data would be pre-calculated on this Quadtree algorithm
    2. Each recursion level would be stored, even if no split is made the lat/long extremities of the parent are stored so that a simple sql query by threshold level can be made.  Imagine the Diagram above in 3d sliced into recursion levels in the z-plane.
    3. Live maps have 19 zoom levels
    4. At zoom level 14,  0.5 miles we move from landsat to aerial photographs
    5. somehow match zoom levels to recursion levels
    6. have indicator showing number of items in total plotted in density plot, option to switch to single plot on button.  Pull data direct from DB using live Map co-ordinates
    7. when plotting population densities take into account zoom level in calculation of diameter, a 22 pixel diameter on zoom level 1 blocks out the UK
    8. in fact in relation to 6 dont include items that fall outside a specific plot diameter

Anyone who has done any of this and feel like contributing please leave a comment.  In the mean time I’m gathering some public POI data to test stuff like this with.