Ir al contenido principal

Implementación en TensorFlow de la red neuronal generativa (WaveNet) enfocada en la generación de texto

Enfocado en el reciente trabajo teórico de Google DeepMind (WaveNet), he desarrollado e implementado una versión en TensorFlow de dicha arquitectura, pero realizando modificaciones para enfocar el diseño en la generación automática de textos (el paper original es: https://arxiv.org/pdf/1609.03499.pdf).

El modelo ideado originalmente por DeepMind fue divulgado con detalle en este artículo: https://deepmind.com/blog/wavenet-generative-model-raw-audio/, y mi implementación del mismo (con las modificaciones necesarias para enfocarlo todo en la generación de texto), se encuentra en el siguiente repositorio de mi cuenta personal en GitHub: https://github.com/Zeta36/tensorflow-tex-wavenet. Se trata como podéis ver, de un desarrollo Python usando el framework de Google, TensorFlow.

Os dejo a continuación toda la información técnica sobre mi trabajo, aunque siento no tener tiempo para traducirlo y simplemente copiaré a continuación la introducción que hice del mismo en inglés:

A TensorFlow implementation of DeepMind's WaveNet paper for text generation.


This is a TensorFlow implementation of the WaveNet generative neural network architecture for text generation.

Previous Work

Originally, the WaveNet neural network architecture directly generates a raw audio waveform, showing excellent results in text-to-speech and general audio generation (see the DeepMind blog post and paper for details).
This network models the conditional probability to generate the next sample in the audio waveform, given all previous samples and possibly additional parameters.
After an audio preprocessing step, the input waveform is quantized to a fixed integer range. The integer amplitudes are then one-hot encoded to produce a tensor of shape (num_samples, num_channels).
A convolutional layer that only accesses the current and previous inputs then reduces the channel dimension.
The core of the network is constructed as a stack of causal dilated layers, each of which is a dilated convolution (convolution with holes), which only accesses the current and past audio samples.
The outputs of all layers are combined and extended back to the original number of channels by a series of dense postprocessing layers, followed by a softmax function to transform the outputs into a categorical distribution.
The loss function is the cross-entropy between the output for each timestep and the input at the next timestep.
In this repository, the network implementation can be found in wavenet.py.

New approach


This work is based in one implementation of the original WaveNet model (Wavenet), but applying some modifications.
In summary: we are going to use the WaveNet model as a text generator. We'll use raw text data (characters), instead of raw audio files, and once the network is trained, we'll use the conditional probability found to generate samples (characters) into an self-generating process.

Only printable ASCII characters (Dec. 0 up to 255) is supported right now.

Results


Pretty interesting results are reached!! Feeding the network with enough text and training, the model is able to memorize the probability of the characters disposition (in a lenguage), and generate later even a very similar text!!

For example, using the Penn Tree Bank (PTB) dataset, and only after 15000 steps of training (with low set of parameters setting) this was the self-generated output (the final loss was around 1.1):

"Prediction is: 300-servenns on the divide mushin attore and operations losers nis called him for investment it was with as pursicularly federal and sotheby d. reported firsts truckhe of the guarantees as paining at the available ransions i 'm new york for basicane as a facerement of its a set to the u.s. spected on install death about in the little there have a $ N million or N N bilot in closing is of a trading a congress of society or N cents for policy half feeling the does n't people of general and the crafted ended yesterday still also arjas trading an effectors that a can singaes about N bound who that mestituty was below for which unrecontimer 's have day simple d. frisons already earnings on the annual says had minority four-$ N sance for an advised in reclution by from $ N million morris selpiculations the not year break government these up why thief east down for his hobses weakness as equiped also plan amr. him loss appealle they operation after and the monthly spendings soa $ N million from cansident third-quarter loan was N pressure of new and the intended up he header because in luly of tept. N million crowd up lowers were to passed N while provision according to and canada said the 1980s defense reporters who west scheduled is a volume at broke also and national leader than N years on the sharing N million pro-m was our american piconmentalist profited himses but the measures from N in N N of social only announcistoner corp. say to average u.j. dey said he crew is vice phick-bar creating the drives will shares of customer with welm reporters involved in the continues after power good operationed retain medhay as the end consumer whitecs of the national inc. closed N million advanc"

This is really wonderful!! We can see that the original WaveNet model has a great capacity to learn and save long codified text information inside its nodes (and not only audio or image information). This "text generator" WaveNet was able to learn how to write English words and phrases just by predicting characters one by one, and sometimes was able even to learn what word to use based on context.

This output is far to be perfect, but It was trained in a only CPU machine (without GPU) using a low set of parameters configuration in just two hours!! I hope somebody with a better computer can explore the potential of this implementation.

If you want to check this results, you just have to type this in a command line terminal (this will use the trained model checkout I uploaded to the respository):
python generate.py --text_out_path=output.txt --samples 2000 
./logdir/train/2016-10-02T10-45-10/model.ckpt-14999

Requirements


TensorFlow needs to be installed before running the training script. TensorFlow 0.10 and the current master version are supported.

Training the network


You can use any text (.txt) file.

In order to train the network, execute
python train.py --data_dir=data
to train the network, where data is a directory containing .txt files. The script will recursively collect all .txt files in the directory.
You can see documentation on each of the training settings by running
python train.py --help
You can find the configuration of the model parameters in wavenet_params.json. These need to stay the same between training and generation.

Generating text


You can use the generate.py script to generate audio using a previously trained model.

Run
python generate.py --samples 16000 model.ckpt-1000
where model.ckpt-1000 needs to be a previously saved model. You can find these in the logdir. The --samples parameter specifies how many characters samples you would like to generate.
The generated waveform can be stored as a .txt file by using the --text_out_path parameter:
python generate.py --text_out_path=mytext.txt --samples 1500 model.ckpt-1000
Passing --save_every in addition to --text_out_path will save the in-progress wav file every n samples.

python generate.py --text_out_path=mytext.txt --save_every 2000 --samples 1500 model.ckpt-1000

Fast generation is enabled by default. It uses the implementation from the Fast Wavenet repository. You can follow the link for an explanation of how it works. This reduces the time needed to generate samples to a few minutes.

To disable fast generation:
python generate.py --samples 1500 model.ckpt-1000 --fast_generation=false

Missing features

Currently, there is no conditioning on extra information.

Entradas populares de este blog

Evidencia a favor de la teoría de Jeremy England (usando computación evolutiva)

"You start with a random clump of atoms, and if you shine light on it for long enough, it should not be so surprising that you get a plant." Jeremy England (2014), interview commentary with Natalie Wolchover Hace ya un mes que terminé de estudiar a fondo el interesante trabajo que el físico  Jeremy England  está realizando en el  MIT (Massachusetts Institute of Technology) . En mi blog he divulgado todo lo referente a este trabajo con mucho nivel de detalle, siendo esta entrada un compendio de todo lo que el trabajo cuenta. La idea de esta línea de investigación viene a decir, a grosso modo , que la física de nuestro mundo mantiene una relación implícita entre complejidad y energía . Esta relación indica que, cuanto más complejo es un fenómeno, más energía debe disiparse de modo que crezca la probabilidad de que tal fenómeno finalmente acontezca. Esta teoría de Jeremy parte, y se deduce, de una base termodinámica y de mecánica estadística ya establecida, por lo que sus concl

Aprendizaje automático mediante Deep Q Ntework (DQN + TensorFlow)

"[Las neuronas son] células de formas delicadas y elegantes, las misteriosas mariposas del alma, cuyo batir de alas quién sabe si esclarecerá algún día el secreto de la vida mental."  (Ramón y Cajal) Introducción. Este artículo es una continuación de mi entrada anterior "Las matemáticas de la mente" [2]. Vimos en ese artículo cómo era posible que un simple algoritmo de computación pudiese imitar el modo en que nuestro cerebro aprende a realizar tareas con éxito, simplemente a partir del equivalente computacional de una red neuronal. Sin embargo, a pesar de que en dicha entrada os comentaba el caso de cómo se puede programar un algoritmo capaz de conseguir  literalmente,  aprender a jugar al Conecta4 (4 en raya) sin especificar ( pre-programar ) en ningún momento las reglas del juego; es posible que muchos notasen que aún así, todavía había que pre-procesar la entrada de la red neuronal para ofrecerle a las neuronas (nodos) de la capa de entrada ( inputs ) qué ficha

Aprendizaje autónomo por computación evolutiva (Conecta 4)

"[Las neuronas son] células de formas delicadas y elegantes, las misteriosas mariposas del alma, cuyo batir de alas quién sabe si esclarecerá algún día el secreto de la vida mental."  (Ramón y Cajal) Introducción. Dibujo de Ramón y Cajal de las células del cerebelo de un pollo,  mostrado en "Estructura de los centros nerviosos de las aves", Madrid, 1905. Dos noticias muy importantes que han tenido lugar estas últimas semanas en el campo de la neurociencia y la inteligencia artificial (de las cuales me hice eco en este mismo blog: aquí [1][2] y  aquí [3]), me hizo recordar un trabajo de computación que hice allá por el 2011 cuando inicié el doctorado en ingeniería (el cual por cierto aún no terminé, y que tengo absolutamente abandonado :( Ya me gustaría tener tiempo libre para poder retomarlo; porque además odio dejar las cosas a medias). Pues bien, el trabajo original[4] (que he mejorado) consistía en ser el desarrollo de un algoritmo capaz de aprender a jugar a