Ir al contenido principal

Implementación en TensorFlow de la red neuronal generativa (WaveNet) enfocada en la generación de texto

Enfocado en el reciente trabajo teórico de Google DeepMind (WaveNet), he desarrollado e implementado una versión en TensorFlow de dicha arquitectura, pero realizando modificaciones para enfocar el diseño en la generación automática de textos (el paper original es: https://arxiv.org/pdf/1609.03499.pdf).

El modelo ideado originalmente por DeepMind fue divulgado con detalle en este artículo: https://deepmind.com/blog/wavenet-generative-model-raw-audio/, y mi implementación del mismo (con las modificaciones necesarias para enfocarlo todo en la generación de texto), se encuentra en el siguiente repositorio de mi cuenta personal en GitHub: https://github.com/Zeta36/tensorflow-tex-wavenet. Se trata como podéis ver, de un desarrollo Python usando el framework de Google, TensorFlow.

Os dejo a continuación toda la información técnica sobre mi trabajo, aunque siento no tener tiempo para traducirlo y simplemente copiaré a continuación la introducción que hice del mismo en inglés:

A TensorFlow implementation of DeepMind's WaveNet paper for text generation.


This is a TensorFlow implementation of the WaveNet generative neural network architecture for text generation.

Previous Work

Originally, the WaveNet neural network architecture directly generates a raw audio waveform, showing excellent results in text-to-speech and general audio generation (see the DeepMind blog post and paper for details).
This network models the conditional probability to generate the next sample in the audio waveform, given all previous samples and possibly additional parameters.
After an audio preprocessing step, the input waveform is quantized to a fixed integer range. The integer amplitudes are then one-hot encoded to produce a tensor of shape (num_samples, num_channels).
A convolutional layer that only accesses the current and previous inputs then reduces the channel dimension.
The core of the network is constructed as a stack of causal dilated layers, each of which is a dilated convolution (convolution with holes), which only accesses the current and past audio samples.
The outputs of all layers are combined and extended back to the original number of channels by a series of dense postprocessing layers, followed by a softmax function to transform the outputs into a categorical distribution.
The loss function is the cross-entropy between the output for each timestep and the input at the next timestep.
In this repository, the network implementation can be found in wavenet.py.

New approach


This work is based in one implementation of the original WaveNet model (Wavenet), but applying some modifications.
In summary: we are going to use the WaveNet model as a text generator. We'll use raw text data (characters), instead of raw audio files, and once the network is trained, we'll use the conditional probability found to generate samples (characters) into an self-generating process.

Only printable ASCII characters (Dec. 0 up to 255) is supported right now.

Results


Pretty interesting results are reached!! Feeding the network with enough text and training, the model is able to memorize the probability of the characters disposition (in a lenguage), and generate later even a very similar text!!

For example, using the Penn Tree Bank (PTB) dataset, and only after 15000 steps of training (with low set of parameters setting) this was the self-generated output (the final loss was around 1.1):

"Prediction is: 300-servenns on the divide mushin attore and operations losers nis called him for investment it was with as pursicularly federal and sotheby d. reported firsts truckhe of the guarantees as paining at the available ransions i 'm new york for basicane as a facerement of its a set to the u.s. spected on install death about in the little there have a $ N million or N N bilot in closing is of a trading a congress of society or N cents for policy half feeling the does n't people of general and the crafted ended yesterday still also arjas trading an effectors that a can singaes about N bound who that mestituty was below for which unrecontimer 's have day simple d. frisons already earnings on the annual says had minority four-$ N sance for an advised in reclution by from $ N million morris selpiculations the not year break government these up why thief east down for his hobses weakness as equiped also plan amr. him loss appealle they operation after and the monthly spendings soa $ N million from cansident third-quarter loan was N pressure of new and the intended up he header because in luly of tept. N million crowd up lowers were to passed N while provision according to and canada said the 1980s defense reporters who west scheduled is a volume at broke also and national leader than N years on the sharing N million pro-m was our american piconmentalist profited himses but the measures from N in N N of social only announcistoner corp. say to average u.j. dey said he crew is vice phick-bar creating the drives will shares of customer with welm reporters involved in the continues after power good operationed retain medhay as the end consumer whitecs of the national inc. closed N million advanc"

This is really wonderful!! We can see that the original WaveNet model has a great capacity to learn and save long codified text information inside its nodes (and not only audio or image information). This "text generator" WaveNet was able to learn how to write English words and phrases just by predicting characters one by one, and sometimes was able even to learn what word to use based on context.

This output is far to be perfect, but It was trained in a only CPU machine (without GPU) using a low set of parameters configuration in just two hours!! I hope somebody with a better computer can explore the potential of this implementation.

If you want to check this results, you just have to type this in a command line terminal (this will use the trained model checkout I uploaded to the respository):
python generate.py --text_out_path=output.txt --samples 2000 
./logdir/train/2016-10-02T10-45-10/model.ckpt-14999

Requirements


TensorFlow needs to be installed before running the training script. TensorFlow 0.10 and the current master version are supported.

Training the network


You can use any text (.txt) file.

In order to train the network, execute
python train.py --data_dir=data
to train the network, where data is a directory containing .txt files. The script will recursively collect all .txt files in the directory.
You can see documentation on each of the training settings by running
python train.py --help
You can find the configuration of the model parameters in wavenet_params.json. These need to stay the same between training and generation.

Generating text


You can use the generate.py script to generate audio using a previously trained model.

Run
python generate.py --samples 16000 model.ckpt-1000
where model.ckpt-1000 needs to be a previously saved model. You can find these in the logdir. The --samples parameter specifies how many characters samples you would like to generate.
The generated waveform can be stored as a .txt file by using the --text_out_path parameter:
python generate.py --text_out_path=mytext.txt --samples 1500 model.ckpt-1000
Passing --save_every in addition to --text_out_path will save the in-progress wav file every n samples.

python generate.py --text_out_path=mytext.txt --save_every 2000 --samples 1500 model.ckpt-1000

Fast generation is enabled by default. It uses the implementation from the Fast Wavenet repository. You can follow the link for an explanation of how it works. This reduces the time needed to generate samples to a few minutes.

To disable fast generation:
python generate.py --samples 1500 model.ckpt-1000 --fast_generation=false

Missing features

Currently, there is no conditioning on extra information.

Entradas populares de este blog

¡Más potencia!

«¡Es la guerra! ¡Traed madera! ¡Más madera!»  (Los hermanos Marx) Introducción. El mundo de las ciencias de la computación están estos días de enhorabuena, un nievo hito histórico acaba de acontecer: hablamos por supuesto del casi milagroso desarrollo de Google DeepMind denominado AlphaZero , un modelo neuronal capaz de aprender de manera autónoma no supervisada (sin apoyo de datos etiquetados ofrecidos por el hombre) a jugar con capacidades sobrehumanas a varios juegos milenarios como el Go y el ajedrez ( aquí podéis descargar el paper de este proyecto). DeepMind acaba de demostrar así que la metodología que utilizaron para que un modelo neuronal aprendiera (con capacidades sobrehumanas) por sí misma sin apoyo de datos humanos el juego de Go, es generalizable a cualquier otro tipo de juego o situación. En el arriba comentado paper nos explican por ejemplo como en 4 horas (sí, sólo 4 horas), la red neuronal fue capaz de aprender a jugar al ajedrez (entre otros juegos) con una ca...

Replicando el desarrollo de Google DeepMind: AlphaGo Zero

Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0. If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society.  (Profesor David Silver) Hace unos meses   Google DeepMind   hizo público uno de sus resultados más asombrosos: una versión del modelo neuronal que fue capaz de derrotar al campeón del mundo de   Go , solo que esta vez no necesitaron hacer uso de ningún aprendizaje supervisado de juegos entre humanos (hablé en este mismo blog en   ...

Sobre el mito de la caja negra en el campo de la inteligencia artificial

En relación a esta  buena entrada de Santiago  donde trata el hito que  DeepMind  ha logrado con el sistema de inteligencia artificial  Alpha Zero , me gustaría comentar algo sobre la cuestión que más se malinterpreta actualmente de la moderna IA: ¿es cierto que no sabemos cómo hace lo que hace? ¿Se trata realmente de una misteriosa caja negra inexpugnable? Pues bien, la respuesta es no y no. Sabemos perfectamente (los que se dedican e investigan en este campo) por qué la moderna IA hace lo que hace y cómo lo hace. Y lo de "la caja negra" pues...sencillamente es un mito sensacionalista. Todo el machine learning actual ( Alpha Zero  incluido) es el resultado de procesos matemáticos algebraicos trabajando sobre números reales. Más en concreto, millones de operaciones de sumas y multiplicaciones tensoriales sobre un conjunto de (millones) de números reales almacenados en un fichero para tal fin. Como veis no hay misterio ni "magia" por ninguna parte. Y...