We tend to think that vast amounts
of data are representative by definition, but that is not necessarily the case.
Big data should also be subjected to stress tests. Three experts enlighten us
on the Ubers and Airbnbs of online access panels, the rise of the robots and the
promise of artificial intelligence (AI) in this article extracted from the ESOMAR 2019 Global
Market Research report and
edited for Research World.
With humans leaving an
ever-increasing digital footprint of what they think, feel, say and do,
computationally analysing ever-larger volumes of data to reveal key patterns,
trends and associations has – undoubtedly – been central to delivering actionable
insights at scale over the last few years. This observation, made by Caroline
Frankum, Global CEO of Kantar’s
Profiles Division, comes with an added warning, though. “Following the GDPR’s
implementation last May, and as data privacy legislation intensifies – e.g. the California Consumer Privacy Act (CCPA)
coming into effect in January 2020, and the growing uncertainty around third-party
cookies – it is increasingly apparent that it is not the amount of data that’s
important, but what organisations can do with data compliantly that matters.”
Even so, it is still often
believed that vast amounts of data are representative by definition, notes Andrew Konya, CEO of Remesh, a platform that allows users to get qualitative insights at a
quantitative scale to make better decisions. “We do encounter the common
misconception that more data – bigger N – means higher confidence in results.
However, most researchers seem very aware of how a non-representative sample of
participants in quantitative research translates to lower quality results.”
Presidential poll
Pete Doe, Chief
Research Officer at clypd, an
audience-based sales platform for television advertising, observed earlier that
online access panels have enabled cheap survey research to proliferate in the
past decade. He now confirms that there is a natural tendency for people to
think ‘bigger is better’, even among professionals with some statistical
training. “People are taught that margins of error reduce as sample sizes
increase, but they aren’t always taught about biases in measurement. So yes, it
is a fairly common misconception, but this is not a new problem.”
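To make Doe’s point concrete, here is a minimal sketch (not from the article; all figures are hypothetical) of why a bigger N does not fix a biased sample: the margin of error shrinks roughly as 1/√N, while a fixed selection bias does not shrink at all.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical numbers, purely for illustration.
true_value = 0.50   # what a perfectly representative sample would find
panel_bias = 0.05   # fixed skew from a non-representative panel
observed = true_value + panel_bias

for n in (500, 5_000, 500_000):
    moe = margin_of_error(observed, n)
    print(f"N={n:>7,}: sampling error shrinks to +/-{moe:.3f}, "
          f"but the bias stays at {panel_bias:.2f}")
```

Past a few thousand respondents the sampling error is already dwarfed by the bias, so extra sample only buys precision around the wrong answer.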
Honesty and transparency
Big data keeps revolutionising the
insights industry at breakneck speed, almost as if it deliberately ignores the
many red flags. But the representativity concerns are very real and need to be
tackled. For Doe, the main question that any practical research user has to
answer is: ‘Will I make a better decision if I use this data source?’ The
answer to that question is not always clear, he stresses.
Some of the concerns about
representativity are actually concerns about validity. “Inferring people’s
behaviour from device activity is not simple, especially if the device is used
by different people at different times.”
As for red flags, Doe thinks that
users should be wary of companies that are unwilling or unable to explain their
data sources. “No-one with a realistic outlook expects perfection in research,
but we should expect honesty and transparency.”
Billions at risk
Whilst online access panels have
enabled cheap survey research to proliferate in the past decade, the results
obtained from these convenience panels can be biased and unrepresentative due
to the sampling methods employed. Several checks and stress tests can be
applied to mitigate these effects.
Doe believes these checks need to
reflect the use of the data and the money at risk. For media transaction data
(ratings), oversight from Joint
Industry Committees (and the Media Rating Council in the USA)
is worth the expenditure to assure quality when billions of dollars of ad spend
ride on the data, he emphasises.
“And these services don’t use
cheap online survey research data for that very reason. For research with a
narrower scope and less money at risk, users should at least expect
transparency around data sources, the recency of any classification data,
details about the survey, response rates, sample sizes, data editing and
projection methods.”
Ubers and Airbnbs
In order to illustrate the
low-budget online panel trend, Frankum makes a telling comparison:
“Tech-enabled entrants offering cheaper online access panels are like the Ubers
and Airbnbs of the panel world – they do not own any panellists but focus on
renting opt-in panel assets to leverage their technical assets.”
“Clients are increasingly looking
for richer profiles of consumers. This means compliantly matching behavioural
data to proprietary profile attributes to create more addressable audiences,
before a single survey question has been asked. This is something the cheaper
online access panels are not set up for, or accredited to do. So, their data is
more limited in how it can be used.”
Increased integration
The use of big data for gaining
insights is still very much in development, and will see further changes over
the next few years. Doe believes it will continue to grow, and there will be
even cheaper options available. “But these will be unlikely to deliver quality
findings. There could be attempts to harness AI to synthesize insights, for
example by inferring behaviours and attitudes from online data, perhaps
harnessing voice-activated device data, within privacy constraints.”
The infrastructure and algorithms
people refer to as big data will enable increased integration between passively
collected data, across multiple channels, and first-person research data,
predicts Konya. The result is likely to be insights that are increasingly
linked to behavioural impacts.
Profound shift
Doe stresses that data is not
research. “Data is a raw material that needs to be refined before it can be
used as part of a research study, and that requires research and statistical
expertise.” As having lots of data becomes easier, Konya thinks people will
shift their focus from ‘what is the N?’ to ‘how confident are we in this
result?’ That shift will likely move the industry away from working to deliver
the minimum cost per participant – or data point – to working to deliver the
lowest cost per unit confidence.
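As a back-of-the-envelope illustration of that ‘cost per unit confidence’ framing (the metric and every number below are hypothetical assumptions, not from Konya), one could proxy confidence by the reciprocal of the confidence-interval width plus any irreducible bias, and compare a large, cheap panel against a smaller, better-sourced sample:

```python
import math

def ci_width(n: int, z: float = 1.96) -> float:
    """Conservative width of a 95% confidence interval for a proportion
    (uses the worst case p = 0.5, so the standard error is 0.5 / sqrt(n))."""
    return 2 * z * 0.5 / math.sqrt(n)

def cost_per_unit_confidence(n: int, cost_per_respondent: float, bias: float) -> float:
    # Hypothetical metric: 'confidence' = 1 / (CI width + irreducible bias).
    confidence = 1.0 / (ci_width(n) + bias)
    return (n * cost_per_respondent) / confidence

# Large, cheap, but biased panel vs. smaller, costlier, well-sourced sample.
print(cost_per_unit_confidence(n=50_000, cost_per_respondent=0.50, bias=0.05))  # ~1469
print(cost_per_unit_confidence(n=2_000,  cost_per_respondent=3.00, bias=0.01))  # ~323
```

Under these assumptions the smaller, cleaner sample delivers confidence at roughly a fifth of the cost, which is the shift from ‘what is the N?’ to ‘how confident are we?’ in miniature.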
At Kantar, Frankum and her people
are currently learning a lot from working with AI. “AI is also something to
keep a close eye on when it comes to making big data more representative for
insights and market research in the future.”
This article is an excerpt from the original “Big data
and representativity”, published in the ESOMAR 2019 Global Market Research report.
Read the full content by accessing the report.