Youniversity – how federated learning will change search

Mar 17th, 2021

Potentially revolutionary, definitely an antitrust lawsuit waiting to happen, federated learning is Google’s machine learning answer to delivering the personalised internet the future demands without the personal information our privacy depends on.

Federated learning – applied to paid media as Federated Learning of Cohorts (FLoC) in a recent announcement – is a machine learning process which uses edge devices to train a central model. There’s a great proof of concept piece here but, roughly translated, it’s a way of allowing users to anonymously train a machine learning model to better serve their needs without sacrificing their personal data.

As you can see from the image, this ‘upside down’ method hosts the main machine learning model centrally, while edge devices train copies of the latest version on local data. The locally updated weights are then uploaded back to the main servers, where they are aggregated into the next iteration of the model, which is sent back out to users to train again. Because it is the model – not the data – that is downloaded and uploaded each time, there is no need for the user’s personal data to leave their device.
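
To make that loop concrete, here’s a minimal toy sketch of the federated averaging idea in Python – everything in it (the simulated devices, the linear model, the data) is invented for illustration, and real deployments layer secure aggregation, device sampling and compression on top.

```python
import numpy as np

# Toy linear regression model; the "true" weights are what devices collectively learn.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])

# Five simulated edge devices, each holding its own private data (illustrative only).
devices = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    devices.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    """Train the downloaded model on-device and return the updated weights."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Federated averaging: download the model, train locally, upload weights, average.
w_global = np.zeros(2)
for round_num in range(10):
    local_weights = [local_update(w_global, X, y) for X, y in devices]
    w_global = np.mean(local_weights, axis=0)  # only weights are aggregated, never data

print(w_global)  # converges towards true_w without any raw data leaving a "device"
```

Only the averaged weights ever reach the central server; the (X, y) pairs standing in for user data stay on their device for the whole run.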

The term was first coined in a Google paper in 2016 and has become an active area of study in the years since – the technique was officially announced by Sundar Pichai at Google I/O 2019, and the interest has generated more than a thousand papers (many of which can be found on arXiv.org).

Despite all the academic research that has grown up around the topic, there’s still a pretty good outline of the technique (and clues as to its potential uses) in that initial paper:

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning.

Source: H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson and Blaise Agüera y Arcas, ‘Communication-Efficient Learning of Deep Networks from Decentralized Data’ (2016)

Since then, Google has followed a semi-open source approach to the technique, with TensorFlow Federated offering the company its best possible argument against antitrust lawsuits – as it gives other organisations the opportunity to use the open source framework to develop their own federated learning models.
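
For anyone who wants to experiment, the TensorFlow Federated tutorials wrap that whole loop up in a few calls. The sketch below follows the pattern those tutorials used around this time (federated EMNIST, a simple Keras model, federated averaging in simulation); the exact function names have shifted between TFF releases, so treat them as version-dependent rather than definitive.

```python
import tensorflow as tf
import tensorflow_federated as tff

# Federated EMNIST: each "client" is one writer's handwriting samples.
emnist_train, _ = tff.simulation.datasets.emnist.load_data()

def preprocess(ds):
    # Flatten 28x28 images and batch; elements become (features, label) pairs.
    return ds.batch(20).map(
        lambda e: (tf.reshape(e['pixels'], [-1, 784]), e['label'])
    )

sample_clients = emnist_train.client_ids[:5]
federated_data = [
    preprocess(emnist_train.create_tf_dataset_for_client(c))
    for c in sample_clients
]

def model_fn():
    # A fresh (uncompiled) Keras model wrapped for federated training.
    keras_model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=federated_data[0].element_spec,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    )

# Build and run a few rounds of federated averaging in simulation.
process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
)
state = process.initialize()
for round_num in range(3):
    state, metrics = process.next(state, federated_data)
    print(round_num, metrics)
```

In production the ‘clients’ would be real devices rather than a simulation, but the training loop has the same shape.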

There are also companies such as OpenMined (there are a few actually, but I liked the website, and their name is a pun, so they win) which are extending PyTorch and TensorFlow for privacy-preserving data analysis. The mission statement on their website is as follows:

With OpenMined, people and organizations can host private datasets, allowing data scientists to train or query on data they “cannot see”. The data owners retain complete control: data is never copied, moved, or shared.

This makes the method ideal for privacy-conscious development of machine learning models and paves the way for a variety of business uses which offer users the security of total data privacy.

It’s difficult to know for certain, but the early applications of the technique and the announcements surrounding it have all been privacy related – and there’s little reason to doubt that privacy was a key concern even if development began as far back as 2013/14. It’s been noted for some time that privacy concerns have been growing at around the same rate as demand for personalised experiences of the web, and the two trends seemed to pose an intractable problem.

However, with a remote learning model that can still employ the best of Google’s TensorFlow neural network methods, Google has the potential to provide a leading privacy-focused advertising and UX service – while retaining a level of plausible deniability against antitrust lawsuits if it can provide a workable open source option for other brands to create their own.

The technology also has utility beyond advertising and could well tie in with other Google products. Though Google has often advised that it doesn’t personalise search results, it does offer dozens of related products that could benefit, such as Google Discover, Google Assistant, YouTube (whose existing recommendations algorithm has come under fire repeatedly for its role in radicalisation) and many recommendation and map-related services.

If Google can train Google Assistant in the same way, it could prove to be a decisive factor in the battle it is waging with various other digital assistants – and a huge leap forward in the performance and usefulness of voice search.

At the end of 2020, in our 2021 trends report, we mentioned the possibility that federated learning might be Google’s answer to the end of third-party cookies – and the reason we made that prediction is that the technique lends itself to the use so well.

In the post-cookie world, federated learning will serve as a predictive model – like a supercharged, privacy-focused ‘similar audiences’ targeting option. By training on user devices and constantly updating, the model will be able to infer interests from on-device activity without requiring any third-party data.
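
The FLoC whitepaper Google links to below discusses several ways a browser could compute a cohort locally, one of which is a SimHash over the domains a user has visited. The toy sketch below is our own illustration of that idea, not Chrome’s implementation – the eight-bit cohort, the helper names and the example histories are all invented – but it shows how similar browsing histories can collapse into the same shared cohort ID without the history itself ever leaving the device.

```python
import hashlib
import numpy as np

COHORT_BITS = 8  # toy value; real proposals pair longer hashes with anonymity thresholds

def domain_vector(domain, dims=64):
    """Deterministic pseudo-random vector for a domain name."""
    seed = int.from_bytes(hashlib.sha256(domain.encode()).digest()[:8], 'big')
    return np.random.default_rng(seed).normal(size=dims)

def cohort_id(history, bits=COHORT_BITS):
    """SimHash: sum the domain vectors and keep only the signs of the first few dimensions."""
    total = sum(domain_vector(d) for d in set(history))
    sign_bits = (total[:bits] > 0).astype(int)
    return int(''.join(map(str, sign_bits)), 2)

# Similar histories are likely to share a cohort; a very different one usually won't.
print(cohort_id(['bbc.co.uk', 'theguardian.com', 'ft.com']))
print(cohort_id(['bbc.co.uk', 'theguardian.com', 'economist.com']))
print(cohort_id(['steamcommunity.com', 'twitch.tv', 'ign.com']))
```

Advertisers would then target the cohort ID rather than the individual, which is what makes it behave like a privacy-focused ‘similar audiences’ list.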

Google has already stated that advertisers can expect at least 95% of the conversions per dollar spent compared with cookie-based advertising – and that figure is likely to improve as the model receives real-world training over the next few years. It’s not unreasonable to assume that performance will eventually exceed what is presently expected of cookie-based paid media. Google’s section on FLoC on its Privacy Sandbox page states:

We’re encouraged by what we’ve observed and the value that this solution offers to users, publishers and advertisers. Chrome intends to make FLoC-based cohorts available for public testing through origin trials with its next release in March and we expect to begin testing FLoC-based cohorts with advertisers in Google Ads in Q2. If you’d like to get a head start, you can run your own simulations (as we did) based on the principles outlined in this FLoC whitepaper.

All this means we have two major changes coming to search. The first is that FLoC will mark a huge shift in online privacy, allowing for greater personalisation without personal data. The second, less positive, change is that we could see Google become even more opaque when it comes to providing brands with data – whether for organic or paid search.

While FLoC opens up a huge number of possibilities, how major these changes prove to be will depend on how well early tests perform, and on whether Google continues to open-source the project so that it doesn’t face legal challenges down the road.

One thing is certain, however: FLoC represents a major step forward in integrating machine learning into everyday use of the internet – and it’s likely to be the first of many such steps.

Contact us today if you want help keeping up to date with the rapidly changing world of search and digital marketing.
