Friday, June 12, 2009

PowerShell script for converting Word documents to PDF format

In my job, we only send documents to clients in PDF format. Sometimes we have to send several documents at once, and opening each one to save it as a PDF takes a long time. It's not a very pleasant task.

To minimize this problem, I've created a simple PowerShell script that opens a Word document and saves it as a PDF without any user interaction. With such a script, I can convert a lot of files at once, just like this:

ls . *.doc* -Recurse | %{ ~\scripts\doc2pdf.ps1 $_.FullName }

With this command, I recursively list every doc/docx file in a directory and save each one as a PDF.
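One caveat with the *.doc* filter: it also matches other extensions such as .docm, and the script only knows how to strip "doc" and "docx" (a path ending in .docm would come out ending in .dpdf). A minimal sketch of a stricter filter follows. Test-WordDocument and Convert-AllWordDocuments are hypothetical helper names introduced here for illustration, not part of the script.

```powershell
# Hypothetical helper: $true only for paths ending in .doc or .docx.
function Test-WordDocument([string]$Path) {
    $ext = [System.IO.Path]::GetExtension($Path)
    return @('.doc', '.docx') -contains $ext.ToLowerInvariant()
}

# Hypothetical wrapper around the one-liner above: same recursion,
# but only .doc/.docx files are handed to the conversion script.
function Convert-AllWordDocuments([string]$Root, [string]$ScriptPath) {
    Get-ChildItem $Root -Filter *.doc* -Recurse |
        Where-Object { Test-WordDocument $_.FullName } |
        ForEach-Object { & $ScriptPath $_.FullName }
}
```

It could then be called as Convert-AllWordDocuments . ~\scripts\doc2pdf.ps1, assuming the script lives at that path.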

By default, Office 2007 apps are not able to save files as PDF, so you have to install the add-in from Microsoft first to use the script. The add-in is available here.

Below is the script. It's also available in my PowerShell scripts repository at GitHub (http://github.com/dougfernando/utility-scripts/tree/master) as doc2pdf.ps1.

param (
    [string]$source = $(Throw "You have to specify a source path."))

$extensionSize = 3  # length of "doc"
if ($source.EndsWith("docx")) {
    $extensionSize = 4  # length of "docx"
}

$destiny = $source.Substring(0, $source.Length - $extensionSize) + "pdf"  # same path, pdf extension
$saveaspath = [ref] $destiny
$formatPDF = [ref] 17  # 17 = wdFormatPDF in the WdSaveFormat enumeration

$word = new-object -ComObject "word.application"
$doc = $word.documents.open($source)
$doc.SaveAs($saveaspath, $formatPDF)
$doc.Close()

echo "Converted file: $source"

ps winword | kill  # kills EVERY running Word process, not only this one
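
The 17 passed as the second argument to SaveAs is the numeric value of wdFormatPDF in Word's WdSaveFormat enumeration. As a sketch, a small lookup table makes other target formats easier to read. The $WdSaveFormat variable name is an assumption introduced here, not something Word provides, and only a handful of members are listed.

```powershell
# Partial copy of the WdSaveFormat enumeration values (name -> number).
# $WdSaveFormat is a hypothetical variable name used for illustration.
$WdSaveFormat = @{
    wdFormatDocument        = 0   # Word 97-2003 (.doc)
    wdFormatRTF             = 6   # Rich Text Format
    wdFormatXMLDocument     = 12  # Open XML (.docx)
    wdFormatDocumentDefault = 16  # Word's default format
    wdFormatPDF             = 17  # PDF
    wdFormatXPS             = 18  # XPS
}
```

For example, saving as Word 97-2003 instead of PDF would pass [ref] $WdSaveFormat.wdFormatDocument together with a destination path ending in doc.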

An important note: the script closes every Word instance at the end, so save any open work before using it.
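
If killing every Word process is too aggressive, a gentler variant is to quit only the instance the script itself created. This is a minimal sketch that assumes the same COM calls as the script; Convert-WordToPdf is a hypothetical function name, and the use of [System.IO.Path]::ChangeExtension to build the destination path is a swapped-in alternative to the Substring arithmetic.

```powershell
# Hypothetical variant: converts one document and quits only its own Word instance.
function Convert-WordToPdf([string]$Source) {
    # Replace whatever extension the source has with "pdf".
    $destination = [System.IO.Path]::ChangeExtension($Source, 'pdf')

    $word = New-Object -ComObject Word.Application
    try {
        $doc = $word.Documents.Open($Source)
        try {
            # 17 = wdFormatPDF; some environments may need these arguments without [ref].
            $doc.SaveAs([ref] $destination, [ref] 17)
        } finally {
            $doc.Close()
        }
    } finally {
        # Quit this instance only, then release the COM reference.
        $word.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($word)
    }
    Write-Output "Converted file: $Source"
}
```

Calling Convert-WordToPdf with a full path such as C:\docs\report.docx should write C:\docs\report.pdf and leave any other open Word windows alone.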

6 comments:

Anonymous said...

Hello

Your script is great. At the moment I need to convert docx files to Word 97-2003 format, and I was just wondering how you worked out that the PDF format is [ref] 17?

thanks

Dermot

Anonymous said...

Damn, truly great topic. How will I find your subscription?

Kate Watcerson
bug finders

Steve B. said...

Hi,

Why did you kill the Word processes at the end?

Anonymous said...

I'll try to explain why it is 17 and not 18 or whatever.
If you look at the Document.SaveAs method (http://msdn.microsoft.com/en-us/library/microsoft.office.tools.word.document.saveas(v=vs.80).aspx), you'll notice that its arguments are passed by reference. Here we need FileFormat. Going to the description of the WdSaveFormat enumeration (http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.wdsaveformat.aspx), we see that wdFormatPDF, the enumeration member responsible for saving to PDF, has a value of 17. (I don't know why the member table does not have a column with the numbers; I think the leftmost column was supposed to hold them but was eventually published empty.)

Raju said...

PDF => 17
Other numbers can be discovered
using .NET Reflector on Microsoft.Office.Interop.Word.dll and searching for WdSaveFormat

Anonymous said...

Thanks... very helpful.
There's a small error in my environment:
on line 15 (the SaveAs call), I had to drop the [ref].