Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Ok, so I needed to convert .docx files to .pdf files on the fly, but none of the free php libraries that were available let me do it on my server (a webservice was not good enough).

Basically either I needed to pay for a library (and have it maybe suck) or just deal with the free ones that didn't convert the formatting well enough.

Not good enough!

I found that LibreOffice (OpenOffice's successor) allows command line conversion using the LibreOffice conversion engine (which DID preserve the formatting like I wanted and generally worked great).

I loaded the latest version of Ubuntu (http://www.ubuntu.com/download/ubuntu/download) onto VirtualBox (https://www.virtualbox.org/wiki/Downloads) on my computer and found that I could easily convert files from the command line like this:

libreoffice --headless -convert-to pdf fileToConvert.docx -outdir output/path/for/pdf

I thought: sweet...but I don't have admin rights on my host's web server. I tried to use a "portable" version of LibreOffice that I obtained from http://portablelinuxapps.org/ but I was unable to get it to work on my host's webserver, because the webserver didn't have all the dependencies (Dependency Hell! http://en.wikipedia.org/wiki/Dependency_hell).

I was at a loss as to how to make it work, until I ran across a cool project made by a Ph.D. student (Philip J. Guo) at Stanford called CDE: http://www.stanford.edu/~pgbovine/cde.html

I will let you look at his explanations of how it works (I followed what he did in his video, starting at about 32:00, as well as the directions on his site), but in short, it allows one to avoid dependency hell by copying all the files used when you run certain commands, recreating the Linux environment where the command worked. I was able to use this to run LibreOffice without having to resort to someone's portable version of it, and it worked just like it did when I did it on Ubuntu with the command above, with one tweak: I needed to run the LibreOffice wrapper that CDE generated.

So, below is my PHP code that calls it. In this code snippet, the filename to be converted is passed in as $_POST["filename"]. I copy the file to the same spot where I originally converted a file with CDE, convert it, copy the PDF back, and then delete all the files (so that the folder doesn't keep filling up).

I did it this way because I wasn't able to make it work otherwise on the webserver. If there is a linux + webserver ninja out there that can figure out how to make it work without doing this, I would be interested to know what you did. Please post a comment or something if you did that.

<?php
//keep only the base name so a posted value can't point at some other folder
$filename = basename($_POST["filename"]);
$pdfname = str_replace(".docx", ".pdf", $filename);
//first copy the file to the magic place where we can convert it to a pdf on the fly
copy($filename, "../LibreOffice/cde-package/cde-root/home/robert/Desktop/".$filename);
//change to that directory
chdir('../LibreOffice/cde-package/cde-root/home/robert');
//the magic command that does the conversion
//(escapeshellarg keeps spaces or shell characters in the name from breaking the command)
$myCommand = "./libreoffice.cde --headless -convert-to pdf ".escapeshellarg("Desktop/".$filename)." -outdir Desktop/";
exec($myCommand);
//copy the file back
copy("Desktop/".$pdfname, "../../../../../documents/".$pdfname);
//delete all the files out of the magic place where we can convert it to a pdf on the fly
$files1 = scandir('Desktop');
//my files that I generated all happened to start with a number.
$pattern = '/^[0-9]/';
foreach ($files1 as $value)
{
    //preg_match returns 1 when the file name starts with a digit
    if (preg_match($pattern, $value) === 1)
    {
        unlink("Desktop/".$value);
    }
}
//changing the header to the location of the file makes it work well on androids
header('Location: '.$pdfname);
?>
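
Several people in the comments report that exec() fails silently. The sketch below is one way to make the conversion step easier to debug: it builds the same command with every argument quoted, folds stderr into the captured output, and returns the exit code. The function names (buildConvertCommand, pdfNameFor, runCommand) are made up for illustration and are not part of the original code; the quoting shown assumes a Linux server.

```php
<?php
// Hypothetical helpers; only exec(), escapeshellarg(), basename() and
// pathinfo() are PHP built-ins.

/** Build the conversion command for a given wrapper, input file and output folder. */
function buildConvertCommand(string $wrapper, string $inputFile, string $outDir): string
{
    // escapeshellarg() quotes each argument so spaces or shell
    // metacharacters in a file name cannot change the command.
    return escapeshellarg($wrapper)
        . ' --headless -convert-to pdf '
        . escapeshellarg($inputFile)
        . ' -outdir ' . escapeshellarg($outDir)
        . ' 2>&1'; // send stderr to stdout so exec() captures error messages too
}

/** Derive the PDF file name that corresponds to an input document. */
function pdfNameFor(string $inputFile): string
{
    // PATHINFO_FILENAME is the base name without its last extension.
    return pathinfo(basename($inputFile), PATHINFO_FILENAME) . '.pdf';
}

/** Run a command and return [exitCode, outputLines]. */
function runCommand(string $command): array
{
    $output = [];
    $exitCode = 0;
    exec($command, $output, $exitCode);
    return [$exitCode, $output];
}
```

A call such as runCommand(buildConvertCommand('./libreoffice.cde', 'Desktop/1report.docx', 'Desktop/')) returns the exit code and whatever LibreOffice printed, so a non-zero code can be logged along with the output instead of disappearing.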

And here is the tar.gz file I generated with CDE. See below for a working example and complete, documented code.

Success! I made a truly portable version of LibreOffice that can convert files on the fly on a webserver using 100% free, open source software!

Note: since when I used CDE I only converted a .docx to a .pdf, my tar.gz file above will probably only work to do that. To get it to do other things, you will have to do them with CDE first.

*****************************************************************************

UPDATE: since several people have had questions on how to get it working or had issues making it work, I am putting a complete working example out there for you to play with and modify.

Click here for working example.

And here is the tar.gz of the working example, tied up in a nice bow for you. To make sure the permissions don't get screwed up, I recommend uploading the tar.gz file to your server and then unpacking it there.

This is my way of giving back to all the great people out there that have helped me out by doing these kinds of things for me. Pay it forward, guys! [licensed under the MIT license.]

Posted on Saturday, November 19, 2011 6:07 PM

Feedback

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Thanks for this, I too have a shared server (with 1and1). I managed to package up LibreOffice to process .doc, .docx, ppt and pptx files and convert them to PDF. I can ssh into the server and the commands all work fine. However, when I use exec, or shell_exec, nothing happens! I don't even get an error message. Everything seems fine, it just doesn't create the PDF file at the end of it. Did you have any problems with your shared hosting?
1/8/2012 11:39 AM | tricky

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Try adding "-nologo -nofirststartwizard"(omitting the quotes ;) ) to the parameters... That did the trick for me on my box even though I run it in bash rather than via a php file..
1/10/2012 7:50 AM | Niels

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Great article! :) I've been playing around with this and it seems to work well, but there are two questions I have.

I can convert the first page of a TIFF file to a PDF, but how do I read all pages?

What do I need to install to get it to output a TIFF file?

No worries if you don't know, but if you could help me out with those two points I'd be very grateful :)
2/1/2012 7:04 AM | Paul

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

I can only convert to PDF and not HTML.

convert /tmp/math.docx -> /tmp//math.html using XHTML Writer File
Error: Please reverify input parameters...

Have you made any hardcoding in the CDE for PDF?
2/29/2012 7:56 AM | Vs

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

To convert to something other than a PDF (like a TIFF or HTML), you will need to redo the CDE steps with LibreOffice or some other Linux program that can do the conversion you want.
The tar.gz file I have hanging on the server above was specifically geared towards converting to a PDF.
2/29/2012 8:22 AM | Robert Hyatt

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Hi, Can you please make CDE for HTML
Thanks
2/29/2012 9:12 AM | Vs

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

do you use a portable version of Libreoffice ?

I could find only a 32bit version of the portable version of Libreoffice for linux, and my server runs on 64bit..

I tried to use CDE with the installed version of LibreOffice. When I use exec(..), it fails and the return code is 77 (I couldn't find out what that means...).

Any suggestion ?
3/13/2012 3:28 AM | kamel

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

No I didn't use the portable version of Libreoffice.

I made a CDE version of libreoffice in linux and used that instead. I recommend you just use the tar.gz file I have at the end of my article and follow my directions to get that working.

I hope that helps :)
3/13/2012 10:23 PM | Robert Hyatt

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

I finally made it work. The return code 77 meant my PHP script didn't have permission to execute the wrapper libreoffice.cde.
So I changed the owner and group of the folder cde-package to www-data, and it works perfectly. You can see a demo here: http://www.flexilivre.com/doc/

Thanks a lot !
3/14/2012 1:37 AM | kamel
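
A permission problem like the one described above can be caught before calling exec(). This is a minimal sketch, assuming the wrapper is the libreoffice.cde file from the article; wrapperProblems is a hypothetical helper name, while file_exists() and is_executable() are PHP built-ins that check the file from the point of view of the user PHP runs as.

```php
<?php
/**
 * Return a list of reasons why PHP could not run $wrapper.
 * An empty list means the file exists and is executable by the
 * user the web server runs as. Hypothetical helper for illustration.
 */
function wrapperProblems(string $wrapper): array
{
    $problems = [];
    if (!file_exists($wrapper)) {
        $problems[] = "not found: $wrapper";
    } elseif (!is_executable($wrapper)) {
        $problems[] = "not executable by the web server user: $wrapper";
    }
    return $problems;
}
```

If wrapperProblems('./libreoffice.cde') comes back non-empty, fixing the owner, group, or execute bit on the cde-package folder (as in the comment above) is the likely next step.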

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

That is awesome! I am glad that it helped you out!

This article is my way of giving back to all the awesome articles and open source tools and such that folks have put out there. Hearing that it made your life easier makes me happy :)

3/14/2012 7:05 AM | Robert Hyatt

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

How to convert docx to pdf without color ?
3/30/2012 3:22 AM | Yau Tee Kuan

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Hi there,

I am trying to use libreoffice to convert docx to pdf from PHP. I have used the following command. I am using Ubuntu linux with Apache.

$myCommand = "/usr/bin/libreoffice -headless -invisible -convert-to pdf {$file_name} -outdir /www-disk/temp/";
exec($myCommand, $output, $ret_var);

I have set the permissions for the user www-data as below:
chown www-data:www-data /usr/bin/libreoffice
chown -R www-data:www-data /usr/share/libreoffice/

The file is not getting converted when I run it through php.

When I run it from the command prompt, it converts perfectly.
One thing I noticed: after running the above command, another line automatically appears in the terminal:

# convert ...filepath.docx -> ...filepath.pdf using writer_pdf_Export

Thanks for your help in this...
4/4/2012 2:26 AM | Hameed

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Dear Robert,

Any suggestions from your side? It would be an immense help actually for us here.

Thanks a lot!


4/6/2012 12:43 AM | Hameed

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Have you checked to see if PHP has permissions to execute any command with exec(), like just to rename a file or something? Since the command works from the command line, it should work from PHP as long as PHP is properly set up and updated.

I hope that helps!

Robert
4/10/2012 8:08 PM | Robert Hyatt
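
The check suggested above can be sketched in a few lines. This is an illustration under assumptions, not a complete diagnosis: the helper names are hypothetical, and it only covers the case where the host has switched exec() off through the disable_functions ini setting or where even a trivial command fails.

```php
<?php
/** True if $name appears in a comma-separated disable_functions string. */
function isFunctionDisabled(string $name, string $disableFunctions): bool
{
    $disabled = array_map('trim', explode(',', $disableFunctions));
    return in_array($name, $disabled, true);
}

/** True if exec() exists and has not been disabled in php.ini. */
function execIsAvailable(): bool
{
    return function_exists('exec')
        && !isFunctionDisabled('exec', (string) ini_get('disable_functions'));
}

/** Run a harmless command to confirm that exec() really works. */
function execSmokeTest(): bool
{
    if (!execIsAvailable()) {
        return false;
    }
    $output = [];
    $exitCode = 1;
    exec('echo ok', $output, $exitCode);
    return $exitCode === 0 && $output === ['ok'];
}
```

If execSmokeTest() returns false on the web server but the same command works over ssh, the problem lies in the PHP configuration rather than in LibreOffice.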

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Hi there,

I uploaded your files to my server and tried to change them accordingly, but I am running into a roadblock.

When executing:

$myCommand = "./libreoffice.cde --headless -convert-to pdf Desktop/".$document." -outdir Desktop/";

I am told ../../../cde-exec: not found

Any thoughts?
4/12/2012 2:43 PM | adjarbde

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

The code above does work.

I have made it work successfully on two different servers (and when it came time to make it work on the 2nd server, I actually went back and copied my code from this article to make it work), and have had several people thank me for it after they were able to get it to work.

This means that, assuming you have properly extracted the tar.gz file above and are running a somewhat updated LAMP server with a somewhat normal set of web server settings, you must be trying to call the command from the wrong directory.

I hope that helps. If I get some time this next week, I will make a working example project and upload it to my server and put a link to the code here in this article.

Robert
4/17/2012 4:18 PM | Robert Hyatt
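
Since the wrapper refers to cde-exec by a relative path, the working directory matters. One way to rule out a wrong directory is to resolve it to an absolute path and confirm the wrapper is there before running anything. This is a sketch only; enterWrapperDir is a hypothetical helper, and the default wrapper name is taken from the article's code.

```php
<?php
/**
 * Change into $dir and confirm that $wrapper exists there.
 * Returns the absolute directory on success, or null if the directory
 * or the wrapper is missing. Hypothetical helper for illustration.
 */
function enterWrapperDir(string $dir, string $wrapper = 'libreoffice.cde'): ?string
{
    $absolute = realpath($dir); // false when the path does not exist
    if ($absolute === false || !is_dir($absolute)) {
        return null;
    }
    if (!is_file($absolute . DIRECTORY_SEPARATOR . $wrapper)) {
        return null;
    }
    if (!chdir($absolute)) {
        return null;
    }
    return $absolute;
}
```

Calling it with a path built from __DIR__, for example enterWrapperDir(__DIR__ . '/../LibreOffice/cde-package/cde-root/home/robert'), makes the result independent of whatever directory the web server happened to start the script in.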

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

I put a working example of it on my server, and included the code for it. See the update. I hope that helps.

Do great things and pay it forward! :)

Robert
4/17/2012 8:49 PM | Robert Hyatt

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Thanks for putting so much effort into this, I've currently got my setup successfully converting from docx to PDF. My only question for you is that it seems the fonts in my word documents get screwed up in the conversion process.

I downloaded & installed the Microsoft Core Fonts and even got the Microsoft Vista fonts but there doesn't seem to be any effect. Any tips or ideas on what to do with that?
4/19/2012 1:16 PM | Jeffry56

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

You are welcome!
Well, as far as the font conversion is concerned, it is only going to be as good as the fonts on the Ubuntu/LibreOffice combination that I used when I made the portable CDE copy of LibreOffice. I didn't load any special fonts because I didn't need them.
To fix this, I suggest that you fire up Linux, install LibreOffice, and make sure all your fonts appear in LibreOffice. Then convert a file once on the command line with CDE, like I did (see the instructions above and the CDE website), and replace the LibreOffice folder in my example with the copy of LibreOffice that CDE kicks out, being sure to adjust folder names as necessary.
I believe that will fix your problem. I hope that helps.

Robert
4/19/2012 1:25 PM | Robert Hyatt

# re: Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

Thanks for this helpful article.
But I'm on Windows. I've installed LibreOffice and tried the conversion manually from the GUI, and it works fine.
But the command line "soffice --headless -convert-to pdf M:\DataMourad\EnvDev_VB\CoursVB\AdoNet.docx -outdir M:\DataMourad\EnvDev_VB\CoursVB\" doesn't work yet.
Can anyone tell me what I need to do? Is CDE necessary for using the PHP code?
Thanks a lot for your help.
4/27/2012 3:26 AM | Mourad