Ir al contenido principal

Random memory adaptation model inspired by the paper: "Memory-based parameter adaptation (MbPA)"

Os dejo a continuación el contenido de un trabajo de investigation sobre machine learning que he realizado en mi escaso tiempo libre: https://github.com/Zeta36/random-memory-adaptation
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction.

I present in this repository a (very simple) random memory adaptation model (RMA) inspired by the Google DeepMind paper:
"Memory-based parameter adaptation (MbPA)" (https://arxiv.org/pdf/1802.10542.pdf)
In the paper, point 4.1. CONTINUAL LEARNING: SEQUENTIAL DISTRIBUTIONAL SHIFT (inside experiments and results), they study the improvements their model suppose in relation to the catastrophic forgetting problem.
They explored the effects of their model (MbPA) on continual learning, i.e. when dealing with the problem of sequentially learning multiple tasks without the ability to revisit a task. For this purposed they use the permuted MNIST setup (Goodfellow et al., 2013).
In page 6 they mention about that once a task is catastrophically forgotten, they found that only a few gradient steps on carefully selected data from memory are sufficient to recover performance.
That point got my attention and I wondered if the same improvement could be reached in an easier (and faster) way using a different approach.

Random memory adaptation model (RMA).

I played around with this idea and created a new model based in a random allocation access (instead of using as they did for MbPA a memory architecture similar to the Differentiable Neural Dictionary (DND) used in Neural Episodic Control (NEC) (Pritzel et al., 2017). The random memory I used has an asymptotic cost of insertion of O(1) and it retrieves pairs of {k,v} also in order O(1). This is much better than what a sorted tree (DND) can do.
The way the RMA works is very straightforward:
  1. We have a memory M = {(hi, vi)}. Keys {hi} are given by the embedding network (in the example of this repository the embedding function is the identity). More details in https://arxiv.org/pdf/1802.10542.pdf
  2. In each step i of the training process we append the current training batch {xi, yi} to the end of the memory M, if and only if m steps lapsed since the last inserting. I.e., we add only a tiny and random fraction of the whole training dataset into the memory M.
  3. In the training process, the backpropagation will be realized over an augmented batch composed by the current training batch i {xi, yi} plus a totally random tiny batch retrieved from the memory M.
  4. We repeat this continue learning process for the n tasks in the same model and without further modification.
That's all.

Results.

The development and use of the RMA is much easier than in the MPbA case, and also the performance of the algorithm is even better that in the DeepMind results. The RMA is able to keep the past knowledge almost perfectly and with a minimal impact in the training time. Moreover, the RMA model does not need access to the test dataset like it's the case in the MPbA.
The only two hyper-parameters we have to chose are the memory size and how much training data we are going to store in the memory.
In the case of the permuted MNIST setup for example it's enough to have sufficient memory allocation to store some examples of all the tasks in order to remember how to solve all of them. In fact, the task n only is forgotten once that all their past "memories" have been overwritten due to a memory no big enough (in relation to the number of tasks).
This is the DeepMind MbPA results in the permuted MNIST (Solving 20 tasks):
These are the results of a RMA with a memory of size 15.000 and where we memorize only 10 batches of the 10.000 used in each task, we can see we still got a great continuity learning ratio in test time.
Solving 10 tasks:
Solving 20 tasks:
Finally, these are the results of the same RMA with a memory of size 15.000, 10 batches of each 10.000 but resolving 40 tasks.
We can see that as long as there is memory for keeping some examples of every task the continual learning rate is very stable against the test dataset...and at the end we are just using a two layers MLP for all the job!! The augmented random memory really seems to do a great work against the catastrophic forgetting.

Future work.

I'd like to check this same model in other tasks and also use a different embedding function (network) different of the identity one used here. Also, there are a lot of possibilities to improve these results. An easy way would be just to keep an independent memory (Mi) for every different task in a similar way Neural Episodic Control(NEC) does for each action in the action space (Ma). Then we would retrieve a random equal proportion from every memory in any backpropagation step.
It'd be interesting too apply this method to some complex supervised task and look for some kind of improvement.

Conclusions.

There is no doubt that the use of some kind of external memory like the one used in MbPA or the used in Neural Episodic Control (NEC) (Pritzel et al., 2017) is going to be a fundamental part of any model able to solve the main shortcomings of neural networks such as catastrophic forgetting, fast, stable acquisition of new knowledge, learning with an imbalanced class labels, and fast learning during evaluation. I showed here that maybe an easy way to solve some of these problems resides just in the use of a random and fast external memory that stores "memories" of past events seen during the training process. It may be possible that we don't need to weighted over the whole memory, and that we don't need neither to use a sorted tree with a kernel able to return the nearest examples.

Entradas populares de este blog

¡Más potencia!

«¡Es la guerra! ¡Traed madera! ¡Más madera!»  (Los hermanos Marx) Introducción. El mundo de las ciencias de la computación están estos días de enhorabuena, un nievo hito histórico acaba de acontecer: hablamos por supuesto del casi milagroso desarrollo de Google DeepMind denominado AlphaZero , un modelo neuronal capaz de aprender de manera autónoma no supervisada (sin apoyo de datos etiquetados ofrecidos por el hombre) a jugar con capacidades sobrehumanas a varios juegos milenarios como el Go y el ajedrez ( aquí podéis descargar el paper de este proyecto). DeepMind acaba de demostrar así que la metodología que utilizaron para que un modelo neuronal aprendiera (con capacidades sobrehumanas) por sí misma sin apoyo de datos humanos el juego de Go, es generalizable a cualquier otro tipo de juego o situación. En el arriba comentado paper nos explican por ejemplo como en 4 horas (sí, sólo 4 horas), la red neuronal fue capaz de aprender a jugar al ajedrez (entre otros juegos) con una ca...

Replicando el desarrollo de Google DeepMind: AlphaGo Zero

Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0. If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society.  (Profesor David Silver) Hace unos meses   Google DeepMind   hizo público uno de sus resultados más asombrosos: una versión del modelo neuronal que fue capaz de derrotar al campeón del mundo de   Go , solo que esta vez no necesitaron hacer uso de ningún aprendizaje supervisado de juegos entre humanos (hablé en este mismo blog en   ...

Sobre el mito de la caja negra en el campo de la inteligencia artificial

En relación a esta  buena entrada de Santiago  donde trata el hito que  DeepMind  ha logrado con el sistema de inteligencia artificial  Alpha Zero , me gustaría comentar algo sobre la cuestión que más se malinterpreta actualmente de la moderna IA: ¿es cierto que no sabemos cómo hace lo que hace? ¿Se trata realmente de una misteriosa caja negra inexpugnable? Pues bien, la respuesta es no y no. Sabemos perfectamente (los que se dedican e investigan en este campo) por qué la moderna IA hace lo que hace y cómo lo hace. Y lo de "la caja negra" pues...sencillamente es un mito sensacionalista. Todo el machine learning actual ( Alpha Zero  incluido) es el resultado de procesos matemáticos algebraicos trabajando sobre números reales. Más en concreto, millones de operaciones de sumas y multiplicaciones tensoriales sobre un conjunto de (millones) de números reales almacenados en un fichero para tal fin. Como veis no hay misterio ni "magia" por ninguna parte. Y...