
No Black Boxes
Demand Data
We model all of our own demand side data (geo-demographics, consumer spend and consumer wealth). The process relies on four key data inputs. First, Statistics Canada releases the full list of Census demographic variables (over 2,200) at the dissemination area level, or roughly one area for every 250 households. Second, Canada Post annually releases 6 digit postal codes with individual coordinates and household counts. Third, Statistics Canada releases quarterly or annual surveys covering some of the data for the years between Censuses. Fourth, Statistics Canada releases a detailed Consumer Price Index (CPI).
The process begins with survey data for the most recent year, which can be at the Provincial/Territory level (13), Census Division (293 regions), Census Metropolitan Area (35 regions), Economic Region (76 regions) or, as is often the case, some aggregation or combination of geographies. This data is then distributed down to the dissemination area (DA) level using the most recent Census data.
The limited availability of data at the postal code level creates a debate about the most appropriate level of detail for data products. DA level data is easily the most accurate because it’s the lowest level of geography where the full census variables are released. DA level data, however, lacks the convenience of postal code level data, particularly for targeted marketing and customer level analysis. For example, postal code market data can easily be linked to customer level data or other survey data for market share and potential analysis where the postal code is known. It’s effectively a convenience versus accuracy choice.
Exceed navigates this trade-off between accuracy and convenience by estimating data at the most accurate DA level. Postal code data at the 6 digit level, which includes household numbers, is then used to distribute the DA level data down to the individual latitude and longitude of each postal code. This ensures that data distribution mirrors the geographical population growth in Canada. It also means that although individual postal codes within a DA may have different population numbers, they will all have the same average value (i.e. demographics).
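The household-weighted distribution described above can be illustrated with a minimal sketch. It assumes a simple proportional split by household count; the function name, the postal codes and the figures are hypothetical and are not taken from Exceed's actual model.

```python
from typing import Dict


def distribute_da_value(da_total: float, households: Dict[str, int]) -> Dict[str, float]:
    """Split a DA-level total across postal codes in proportion to household counts."""
    total_households = sum(households.values())
    if total_households == 0:
        # No households to carry the value, so nothing is allocated.
        return {code: 0.0 for code in households}
    return {
        code: da_total * count / total_households
        for code, count in households.items()
    }


# Hypothetical DA with 150 residents spread over three postal codes.
households = {"X0X 0X1": 10, "X0X 0X2": 30, "X0X 0X3": 20}
population = distribute_da_value(150, households)

for code, value in population.items():
    # Print the allocated population and the resulting persons per household.
    print(code, value, value / households[code])
```

In this example each postal code receives a different population, but all three share the same average of 2.5 persons per household, which is the behaviour the paragraph describes.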
In lower population DAs, where the number of actual records for a particular variable is fewer than 4, Statistics Canada suppresses the data to protect the confidentiality of individual respondents’ personal information. This shows up as missing values, which are estimated using historical data and relationships.
Population estimates are based on Statistics Canada’s annual population survey, which is released at the most detailed level (all ages and gender) by Census Division. The distribution to lower levels of geography depends on the most recent Canada Post household counts. Household counts are aggregated and reconciled to dissemination area (DA) and dissemination block average household size control totals from the latest Census. The end result is a comprehensive estimate of population synchronized to Canada Post household counts.
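As a rough illustration of reconciling to a control total, the sketch below builds a preliminary population for each DA from current household counts and Census average household size, then scales the results so they sum to a regional survey total. The proportional scaling, the function name and all figures are assumptions made for the example, not a description of the actual production method.

```python
from typing import Dict


def reconcile_population(
    households: Dict[str, int],
    avg_household_size: Dict[str, float],
    control_total: float,
) -> Dict[str, float]:
    """Scale preliminary DA populations so they sum to a regional control total."""
    # Preliminary estimate: current household count times Census household size.
    preliminary = {da: households[da] * avg_household_size[da] for da in households}
    preliminary_total = sum(preliminary.values())
    if preliminary_total == 0:
        raise ValueError("cannot reconcile: preliminary population is zero")
    # Proportional (ratio) adjustment to the control total.
    return {
        da: value * control_total / preliminary_total
        for da, value in preliminary.items()
    }


# Hypothetical Census Division made up of three DAs.
da_households = {"DA1": 200, "DA2": 100, "DA3": 80}
da_household_size = {"DA1": 2.5, "DA2": 3.0, "DA3": 2.5}

reconciled = reconcile_population(da_households, da_household_size, control_total=1050)
print(reconciled)
```

Here the preliminary populations total 1,000, so every DA is scaled up by 5% to match the hypothetical control total of 1,050.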
Statistics Canada also released population and dwelling counts at the more detailed dissemination block level (489,676 regions) for the Census. While this would add detail to the population distribution within each DA, we don’t use this data because of a misalignment between dwelling counts from Canada Post postal code coordinates and Statistics Canada geography. Instead, we rely on the dwelling count relationship at the more aggregated DA level for a higher level of accuracy.
Income estimates are based on Statistics Canada’s annual income survey and taxfiler data when applicable. Distribution to lower levels of geography is based on the income distribution from the latest Census, combined with the impact of population and household counts from Canada Post. This assumes that population growth within a particular geographic region will have similar income and demographic characteristics to existing households.
The weakness of the income data is that its upper end category is too low. The upper end of the distribution is $150,000 and over for individual income and $200,000 and over for household income.
Marital status, visible minority, labour force and occupation estimates follow a similar process to income and population. Annual surveys for each category at various levels of geography are modeled down to the DA level using the latest Census data and then distributed across 6 digit postal codes using the latest population estimates and household counts.
Family Structure, Education, Ethnic Origin, Mother Tongue and Language Spoken Most Often at Home variables do not have current annual surveys. As a result, these estimates are based on the latest Census data at the DA level and then ratio-adjusted to the most recent household and population counts at the postal code level. What this means is that when household and population counts increase as a result of the growth in the Canada Post postal code data, the specific ethnic mix within a DA is assumed to remain unchanged from the last Census.
The fact that population growth in Canada is primarily based on immigration makes this an important assumption. And while immigration data is available by country of origin and by province of destination, the two are not available together. This means that adding further detail and modeling actual immigration from country of origin to individual Canadian neighbourhoods would require many qualitative assumptions that are open to debate, and would therefore introduce additional error.
Consumer spend data is based on Statistics Canada’s annual Survey of Household Expenditures. Provincial survey expenditures are estimated by household for each income quintile (5 levels). This survey data is then modeled to the DA level based on the latest Census income quintile data and then to the postal code level using average household expenditures by DA.
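The quintile-based calculation can be sketched as a household-weighted average. The example below assumes the DA average is the provincial spend per household in each income quintile, weighted by the number of DA households in that quintile, and that a postal code's spend is the DA average multiplied by its household count. The function names and dollar figures are hypothetical.

```python
from typing import Sequence


def da_average_spend(
    households_by_quintile: Sequence[int],
    spend_by_quintile: Sequence[float],
) -> float:
    """Average household spend for a DA, weighted by its income quintile mix."""
    if len(households_by_quintile) != 5 or len(spend_by_quintile) != 5:
        raise ValueError("expected exactly five income quintiles")
    total_households = sum(households_by_quintile)
    if total_households == 0:
        return 0.0
    total_spend = sum(
        count * spend
        for count, spend in zip(households_by_quintile, spend_by_quintile)
    )
    return total_spend / total_households


def postal_code_spend(da_average: float, households: int) -> float:
    """Total spend for a postal code, using the average spend of its DA."""
    return da_average * households


# Hypothetical provincial spend per household on one category, lowest to highest quintile.
provincial_spend = [2000.0, 3000.0, 4000.0, 5000.0, 6000.0]
# Hypothetical Census household counts by income quintile for one DA.
da_quintile_households = [10, 20, 40, 20, 10]

average = da_average_spend(da_quintile_households, provincial_spend)
print(average)                         # average spend per household in the DA
print(postal_code_spend(average, 15))  # spend for a postal code with 15 households
```

With these inputs the DA average works out to $4,000 per household, so a postal code with 15 households is assigned $60,000.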
The most technical part of modeling spend data involves imputing missing survey values where the number of survey responses is too low and the values are therefore suppressed. Data is imputed using a variety of methods, including previous surveys adjusted for growth using other quintiles, and aggregated regional geography ratio-adjusted for provincial differences.
Consumer wealth data is based on the Statistics Canada Survey of Financial Security, which is typically available every second year. Like consumer spend data, wealth survey results are available by income quintile and are therefore estimated at the DA level using the latest Census data. For the Northern Territories, which are not covered by the survey, data is modeled from a financial health index survey by the Canadian Council on Social Development.
There are a number of potential issues with the wealth data. First, the data is self-reported. Financial items are easily totaled, but it is much more difficult for survey respondents to report non-financial assets, particularly real estate and business equity. Second, the timing and nature of survey results, coupled with the pace of change in variables like real estate, mean that values will lag behind, and be lower than, actual values in real time.
Supply Data
The business to business (B2B) company directory (lead) data industry has changed significantly with the advent of the World Wide Web and the emergence of effective web scraping tools to capture the data. Search engines like Google use a type of web scraping (screen scraping), which has also become the normal collection method for B2B company directory data. Most of this data, however, is not as complete as one would expect. The problem is that scraping is complicated and even controversial, with some websites employing anti-scraping measures. The end result is that most scraping methods are not complete: they collect data that can be used for lead generation but miss too much data to be effectively used in location intelligence.
What differentiates Exceed B2B company directory data is that we are first and foremost users of our own data. This means that it is our own need for accurate and comprehensive company directory data that brings us into the B2B data market. Thinking in LAYERS, our location intelligence modelling solution for all Canadian retail, requires supply side retail competitor data that is accurate and complete for all retail categories in Canada. It’s how we eat our own data, and therefore our in-house quality control. We don’t sell data until it’s accurate and complete enough for our own location intelligence purposes – because inaccurate or incomplete input of competitor data equals bad location decisions.
Google Maps has long been the gold standard for general navigation, but it has also become the standard for its map points of interest (POIs), including businesses. It’s become a necessity for retailers to be located on Google Maps, and it’s why we use it as the gold standard for our Canadian B2B data for most of our retail categories. This data can help you gain a competitive advantage in finding new opportunities to scale your business, whether that means reaching your target audience or supporting data analytics, market research, competitive analysis and business intelligence.
Exceed is partnered with Applied Geographic Solutions, a leading US supplier of small area data, which brings over 30 years of data modeling experience to the table. We believe that collaboration makes for best-in-class solutions on both sides of the border. It’s interesting to note that US small area data is modeled at the Block Group level, the equivalent of the Canadian DA, and the lowest level of geography where both countries release their full census variables. Canadian DAs have roughly half the population of the US Block Group, which makes data suppression a greater problem and therefore also complicates the further modeling of data down to the 6 digit postal code. Nonetheless, even with the higher Block Group population, US data isn’t modeled down to the ZIP+4 level, which is the equivalent of the Canadian 6 digit postal code.