arxiv_latex_cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Under Apache License 2.0
By google-research

This tool allows you to easily clean the LaTeX code of your paper to submit to
arXiv. From a folder containing all your LaTeX code, e.g. /path/to/latex/, it
creates a new folder, /path/to/latex_arXiv/, which is ready to ZIP and upload to
arXiv.


Example call:

```console
arxiv_latex_cleaner /path/to/latex --im_size 500 --images_whitelist='{"images/im.png":2000}'
```


Or, more simply, from a config file:


```console
arxiv_latex_cleaner /path/to/latex --config cleaner_config.yaml
```


Installation:

```console
pip install arxiv-latex-cleaner
```


| :exclamation: arxiv_latex_cleaner is only compatible with Python >=3 :exclamation: |
|--------------------------------------------------------------------------------------|


Alternatively, you can download the source code:


```console
git clone https://github.com/google-research/arxiv-latex-cleaner
cd arxiv-latex-cleaner/
python -m arxiv_latex_cleaner --help
```


And install as a command-line program directly from the source code:


```console
python setup.py install
```


Main features:
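Privacy-oriented

- Removes all comments from the .tex files. Comments are visible to anyone who
  downloads the source of your submission from arXiv.
- Removes auxiliary files generated during compilation (.aux, .log, .out, etc.).
- Optionally removes user-defined commands, such as \todo, passed with
  --commands_to_delete.
- Optionally applies custom regex replacement rules read from a config file
  (described below).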
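
Because arXiv publishes the LaTeX source of a submission, comment removal is the core privacy step. A minimal sketch of line-comment removal in Python, assuming a comment is anything from an unescaped % to the end of the line; it ignores verbatim environments, \url arguments and \begin{comment} blocks, and strip_comments is a hypothetical helper, not the tool's own code:

```python
import re

# A '%' not preceded by a backslash starts a comment that runs to end of line.
# Simplification: '\\%' (escaped backslash, then a real comment) is missed.
_COMMENT = re.compile(r"(?<!\\)%.*$")


def strip_comments(tex: str) -> str:
    """Remove LaTeX line comments while keeping escaped percent signs (\\%)."""
    kept = []
    for line in tex.splitlines():
        match = _COMMENT.search(line)
        if match is None:
            kept.append(line)
        elif line[: match.start()].strip():
            # Keep a bare '%' so LaTeX still ignores the end of the line,
            # exactly as it did while the comment text was there.
            kept.append(line[: match.start()] + "%")
        # Otherwise the whole line was a comment: drop it.
    return "\n".join(kept)


if __name__ == "__main__":
    print(strip_comments("Results % TODO: check with Bob\n% private note\n50\\% done"))
```

Keeping the trailing % rather than deleting it matters: a comment at the end of a line also swallows the line break, so removing the % too could introduce a space LaTeX did not see before.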

Size-oriented

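There is a 50MB limit on arXiv submissions, so to make it fit, the tool:

- Removes unused .tex files and unused images, i.e. those not included from any
  used .tex file.
- Optionally resizes images (--resize_images) so that their longest side is
  --im_size pixels.
- Optionally compresses PDF images with ghostscript (--compress_pdf, Linux and
  Mac only), resampling them to --pdf_im_resolution dpi.
- Lets you give individual images and PDFs their own size or resolution with
  --images_whitelist.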
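
The interplay of --im_size (longest side, in pixels) and --images_whitelist (per-file override) comes down to simple arithmetic. A minimal sketch under two assumptions: target_size is a hypothetical helper, and images already within the limit are left unchanged (whether the tool ever upscales is not stated here):

```python
def target_size(path, width, height, im_size, whitelist=None):
    """Return the (width, height) an image would be resized to.

    `whitelist` mirrors --images_whitelist: a per-path longest-side override.
    Assumption: images whose longest side is already within the limit are kept.
    """
    limit = (whitelist or {}).get(path, im_size)
    longest = max(width, height)
    if longest <= limit:
        return width, height
    # Scale both sides by the same factor to preserve the aspect ratio.
    scale = limit / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # Same values as the example call: --im_size 500, images/im.png kept at 2000.
    allow = {"images/im.png": 2000}
    print(target_size("images/photo.png", 4000, 3000, 500, allow))
    print(target_size("images/im.png", 4000, 3000, 500, allow))
```

With those settings a 4000x3000 photo becomes 500x375, while images/im.png keeps a 2000-pixel longest side (2000x1500).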



TikZ picture source code concealment

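To prevent the upload of tikzpicture source code or raw simulation data, this
feature:

- Replaces each tikzpicture environment with an \includegraphics command that
  loads the corresponding externalized PDF.
- Requires the TikZ pictures to be compiled externally beforehand, as PDF files
  in a folder inside the input folder.
- Is activated by passing that folder, relative to the input folder, to
  --use_external_tikz.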
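
How a picture is matched to its PDF is not described here. A minimal sketch, assuming each picture is named with \tikzsetnextfilename{...} from the TikZ external library so that <folder>/<name>.pdf holds its compiled output; externalize_tikz is a hypothetical helper, and nested or unnamed pictures are left untouched:

```python
import re

# Assumption: a \tikzsetnextfilename{name} line precedes each picture.
_TIKZ = re.compile(
    r"\\tikzsetnextfilename\{\s*(?P<name>[^}]*?)\s*\}\s*"
    r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
    re.DOTALL,  # let '.' span the multi-line picture body
)


def externalize_tikz(tex: str, folder: str) -> str:
    """Replace named tikzpicture environments by \\includegraphics calls."""
    return _TIKZ.sub(
        lambda m: r"\includegraphics{" + f"{folder}/{m.group('name')}.pdf" + "}",
        tex,
    )


if __name__ == "__main__":
    src = (
        "\\tikzsetnextfilename{plot}\n"
        "\\begin{tikzpicture}\n\\draw (0,0) -- (1,1);\n\\end{tikzpicture}"
    )
    print(externalize_tikz(src, "tikz"))
```

The lazy .*? stops at the first \end{tikzpicture}, which is why nested pictures are out of scope for this sketch.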



More sophisticated pattern replacement based on regex group captures

Sometimes it is useful to work with a set of custom LaTeX commands when writing
a paper. To get rid of them upon arXiv submission, you can revert them to plain
LaTeX with a regular-expression replacement rule.


```yaml
{
"pattern" : '(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}',
"insertion" : '\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}',
"description" : "Replace figcomp"
}
```


The pattern above will find all \figcomp{path}{w1}{w2} commands and replace
them with
\parbox[c]{w1\linewidth}{\includegraphics[width=w2\linewidth]{figures/path}}.
Note that the insertion template is filled with the named-group captures from
the pattern; literal braces in the template are doubled ({{ and }}), as in the
example above. The replacement is applied before the \includegraphics commands
are processed and their file paths copied, which ensures that every figure
referenced by the replaced text is copied to the cleaned version. See also
cleaner_config.yaml for details on how to specify the patterns.
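
To see how the named groups feed the template, the rule can be reproduced with Python's standard re module. This is a minimal sketch, not the tool's own code: it assumes the insertion is filled via str.format with the match's named groups (consistent with the doubled braces), and apply_rule is a hypothetical helper.

```python
import re

# Pattern and insertion copied from the YAML rule (YAML single quotes keep
# backslashes literally, so Python raw strings hold the same values).
PATTERN = r'(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}'
INSERTION = r'\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}'


def apply_rule(text: str, pattern: str, insertion: str) -> str:
    """Replace every match of `pattern` by `insertion` filled with its named groups."""
    # A function replacement avoids re.sub interpreting backslashes in the result.
    return re.sub(pattern, lambda m: insertion.format(**m.groupdict()), text)


if __name__ == "__main__":
    print(apply_rule(r"\figcomp{cat.png}{0.3}{0.9}", PATTERN, INSERTION))
```

Applied to \figcomp{cat.png}{0.3}{0.9}, this yields \parbox[c]{ 0.3 \linewidth} { \includegraphics[width= 0.9 \linewidth]{figures/cat.png } }; the extra spaces come straight from the template.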


Usage:

```
usage: arxiv_latex_cleaner [-h] [--resize_images] [--im_size IM_SIZE]
                           [--compress_pdf]
                           [--pdf_im_resolution PDF_IM_RESOLUTION]
                           [--images_whitelist IMAGES_WHITELIST]
                           [--keep_bib]
                           [--commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]]
                           [--use_external_tikz USE_EXTERNAL_TIKZ]
                           [--config CONFIG] [--verbose]
                           input_folder

Clean the LaTeX code of your paper to submit to arXiv. Check the README for
more information on the use.

positional arguments:
  input_folder          Input folder containing the LaTeX code.

optional arguments:
  -h, --help            show this help message and exit
  --resize_images       Resize images.
  --im_size IM_SIZE     Size of the output images (in pixels, longest side).
                        Fine tune this to get as close to 10MB as possible.
  --compress_pdf        Compress PDF images using ghostscript (Linux and Mac
                        only).
  --pdf_im_resolution PDF_IM_RESOLUTION
                        Resolution (in dpi) to which the tool resamples the
                        PDF images.
  --images_whitelist IMAGES_WHITELIST
                        Images (and PDFs) that won't be resized to the default
                        resolution, but the one provided here. Value is pixel
                        for images, and dpi for PDFs, as in --im_size and
                        --pdf_im_resolution, respectively. Format is a
                        dictionary as: '{"path/to/im.jpg": 1000}'
  --keep_bib            Avoid deleting the *.bib files.
  --commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]
                        LaTeX commands that will be deleted. Useful for e.g.
                        user-defined \todo commands. For example, to delete
                        all occurrences of \todo1{} and \todo2{}, run the tool
                        with --commands_to_delete todo1 todo2. Please note
                        that the positional argument input_folder cannot
                        come immediately after commands_to_delete, as the
                        parser does not have any way to know if it's another
                        command to delete.
  --use_external_tikz USE_EXTERNAL_TIKZ
                        Folder (relative to input folder) containing
                        externalized tikz figures in PDF format.
  --config CONFIG       Read settings from .yaml config file. If command
                        line arguments are provided additionally, the config
                        file parameters are updated with the command line
                        parameters.
  --verbose             Enable detailed output.
```
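
The --config description says command-line values update the config-file values. A minimal sketch of that precedence rule using only the standard argparse and json modules, assuming the YAML file has already been loaded into a dict (parsing YAML itself needs a third-party package such as PyYAML). The flag subset and the helpers build_parser and merge_settings are hypothetical, not the tool's own code:

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the documented flags.

    Every default is None, so flags that were not typed on the command line
    can be told apart from flags that were.
    """
    parser = argparse.ArgumentParser()
    # Optional here so that a config file could supply it instead.
    parser.add_argument("input_folder", nargs="?")
    parser.add_argument("--im_size", type=int)
    # Assumption: the dictionary is parsed as JSON, which the documented
    # '{"path/to/im.jpg": 1000}' format happens to be.
    parser.add_argument("--images_whitelist", type=json.loads)
    parser.add_argument("--commands_to_delete", nargs="+")
    return parser


def merge_settings(config: dict, argv: list) -> dict:
    """Start from config-file values and let explicit CLI flags override them."""
    cli = vars(build_parser().parse_args(argv))
    merged = dict(config)
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged


if __name__ == "__main__":
    # Stands in for the contents of cleaner_config.yaml after YAML parsing.
    config = {"im_size": 500, "compress_pdf": True}
    print(merge_settings(config, ["paper/", "--im_size", "800"]))
```

Because --commands_to_delete accepts one or more values, argparse greedily consumes every following non-flag argument: with this parser, ["--commands_to_delete", "todo1", "paper/"] stores "paper/" as a command to delete, which is the pitfall the help text warns about.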


Testing:

```console
python -m unittest arxiv_latex_cleaner.tests.arxiv_latex_cleaner_test
```


Note

This is not an officially supported Google product.