Index PDF documents on SharePoint using Adobe PDF IFilter 9

With SharePoint Search you can find documents by filename, by metadata, or by the content inside a document. By default the SharePoint crawler indexes the content of Office documents, but PDF files are not crawled. To add support for PDF files you have to install an IFilter, which the SharePoint crawler uses to read PDF files and add their content to the search index.

To obtain an IFilter for PDF you can purchase the Foxit PDF IFilter from Foxit Software. Adobe also offers a free PDF IFilter, which does what is needed here and is the subject of this post.

 

Getting Adobe IFilter 9 to work with SharePoint

In the past you had to download Adobe's IFilter as a separate file. Since version 8.0 it has been included with the Adobe Acrobat and Adobe Reader products. The current version of Adobe Reader, 9.0, includes an IFilter that is compatible with the latest PDF implementations.

To enable PDF indexing use the following steps:

  • Download Adobe Reader 9.0, which includes IFilter 9.0.0.0, from http://www.adobe.com/products/acrobat/
  • Download the Acrobat PDF Picture, to display in front of PDF search result items, from http://www.adobe.com/misc/linking.html
  • Add the PDF file type to the Extensions List for WSS search by editing the registry
    • Start regedit
    • Open the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\{Random GUID}\Gather\Search\Extensions\ExtensionList
    • Add pdf to the list as a new String Value. Name it with the next unused number and set its data to "pdf", e.g. if 37 is the highest existing name, use "38" as the name with the value "pdf"
  • Add the Acrobat PDF picture to the SharePoint templates directory. Copy the Acrobat PDF picture, called pdficon_small.gif, to the 12 Hive\TEMPLATE\IMAGES folder, e.g. %programfiles%\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\IMAGES.
  • Bind the Acrobat PDF picture to the PDF file type
    • Open the 12 Hive\TEMPLATE\XML\DOCICON.XML file
    • Find the <ByExtension> element inside <DocIcons>
    • Add the following mapping (element names are case-sensitive, so write Mapping with a capital M):
      <Mapping Key="pdf" Value="pdficon_small.gif" OpenControl="" />
  • Change IFilter mapping in registry
    • Start regedit
    • Open the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\
    • Add (or modify) the .pdf key
    • Inside the .pdf key, add a Multi-String value set to {E8978DA6-047F-4E3D-9C78-CDBE46041603}. If a value holding another GUID already exists, change it to this GUID instead.
    • Open the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\
    • Add (or modify) the .pdf key
    • Inside the .pdf key, add a Multi-String value set to {E8978DA6-047F-4E3D-9C78-CDBE46041603}. If a value holding another GUID already exists, change it to this GUID instead.
  • Add the Adobe Reader folder to the environment path variable
    • Right Click on My Computer
    • Open Properties
    • Open the Advanced tab
    • Go to the Environment variables
    • Edit the Path variable
    • Add your Reader folder to the Path list, e.g. C:\Program Files\Adobe\Reader 9.0\Reader
  • Restart the Search service by restarting your server or executing the following commands:
    • Run: net stop osearch
    • Run: net start osearch
  • Crawl the PDF documents
    • Existing PDF documents that were crawled before the Adobe PDF IFilter was installed are not reindexed during an incremental crawl. You would have to edit each existing PDF file to make the crawler pick it up again during an incremental crawl. It's easier to run a full crawl after you have installed the Adobe PDF IFilter.
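
The numbering rule for the ExtensionList step can be sketched in a few lines of Python. This is only an illustration, not part of the procedure: the function name next_extension_entry and the sample entries are hypothetical, and the sketch deliberately works on a plain dictionary instead of touching the registry, so you would still add the value yourself in regedit.

```python
from typing import Mapping, Optional, Tuple


def next_extension_entry(
    existing: Mapping[str, str], extension: str = "pdf"
) -> Optional[Tuple[str, str]]:
    """Return the (name, data) pair to add to ExtensionList.

    `existing` maps value names ("1", "2", ...) to extensions ("doc", ...).
    Returns None when the extension is already listed.
    """
    wanted = extension.lower().lstrip(".")
    if any(data.lower() == wanted for data in existing.values()):
        return None
    # Only purely numeric names take part in the numbering.
    numbers = [int(name) for name in existing if name.isdigit()]
    return str(max(numbers, default=0) + 1), wanted


if __name__ == "__main__":
    # Hypothetical ExtensionList contents; a real list has more entries.
    sample = {"1": "doc", "2": "xls", "37": "txt"}
    print(next_extension_entry(sample))  # ('38', 'pdf')
```

Checking for an existing "pdf" entry first avoids adding a duplicate when the steps are repeated on a server that was already configured.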

Now that all PDF documents have been crawled, you can search on the content inside a PDF document.
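
If you prefer not to edit DOCICON.XML by hand, the icon mapping from the steps above could be added with a small script. The following is a minimal sketch under some assumptions: the function name add_icon_mapping is made up for this example, the sample XML is a heavily shortened stand-in for the real file, and the element is assumed to be spelled Mapping with a capital M, as reported in the comments below (XML names are case-sensitive). Make a backup of the real file before trying anything like this.

```python
import xml.etree.ElementTree as ET


def add_icon_mapping(docicon_xml: str, extension: str, icon_file: str) -> str:
    """Add or update a <Mapping> for `extension` under <ByExtension>."""
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> element found")
    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            # The extension is already mapped: only replace the icon.
            mapping.set("Value", icon_file)
            break
    else:
        # No mapping yet: append a new one at the end of the section.
        ET.SubElement(
            by_extension,
            "Mapping",
            {"Key": extension, "Value": icon_file, "OpenControl": ""},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Shortened, hypothetical stand-in for DOCICON.XML.
    sample = (
        "<DocIcons><ByExtension>"
        '<Mapping Key="doc" Value="icdoc.gif" />'
        "</ByExtension></DocIcons>"
    )
    print(add_icon_mapping(sample, "pdf", "pdficon_small.gif"))
```

Updating an existing mapping instead of always appending keeps the file free of duplicate pdf entries if the script is run twice.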

Published Thu, Oct 2 2008 3:30 PM by Harold van de Kamp

Comments

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Monday, October 13, 2008 2:24 PM by David Weber

Dear Harold,

We have overlapping interests at a business level. Would you please contact me via "davidweber at telfort dot nl"

Regards,

David.

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Wednesday, November 12, 2008 9:41 PM by Rocky

I followed these steps, and then I tried these labs.adobe.com/.../PDF_iFilter_8_-_64-bit_Support but still can't index pdf files. The Crawl logs show a message saying the file couldn't be crawled because the filter is missing. Any Ideas? Thanks.

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Saturday, November 22, 2008 7:45 PM by John Kowalczyk

After searching for 3 days, this was the best, most current information on how to integrate the latest Acrobat v9 iFilter into SharePoint 2007. I kept finding old information on v5/v6 and SharePoint 2003, and everything I found stated that since v7 the iFilter is included in the Reader download, but there was no information on how to implement it. And although I've only completed the steps about an hour ago and cannot confirm that the PDF content is properly indexed, I can say that there are no obvious errors in the suggested steps contained in this post.

# SharePoint Adbobe PDF IFilter Index Configuration | tomfusion.com

Pingback from  SharePoint Adbobe PDF IFilter Index Configuration | tomfusion.com

# Couldn't get it going used IFilter 6 instead

Tuesday, December 02, 2008 3:07 AM by Will

Hi, I couldn't get it working, then I found MS KB 927675 for Adobe IFilter 6 and got it to work.  It is a bit simpler to configure. It does not show how to do the icon as described above though.

Searching through pdf's in SP is a beautiful thing!

Keep up the good work Kamp!

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Tuesday, December 02, 2008 6:15 PM by LarryH

I completed the steps above on my MOSS2007 32-bit instance and it worked.  I would suggest adding an iisreset to the last step after the full crawl.  The pdf icon was not showing up until the iisreset.

Thanks for the great instructions and research on this!

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Thursday, December 11, 2008 5:13 PM by Tom

Thanks, worked great for me with sharepoint server 2007

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Monday, December 15, 2008 8:16 PM by Tinch

going to be taking care of this later this week! thanks for the great information

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Thursday, January 08, 2009 11:15 PM by rod bergren

the iisreset is important. It didn't quite work as expected until I did that.

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Thursday, January 22, 2009 2:11 PM by onlyme

All I can say is.... what a royal pain in the ass to have to go through!

In this day and age we should all be able to expect a simple download/run/finish type of operation instead of this hassle. Why can't these software giants actually pull their asses into gear and do something right for a change !!

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Thursday, January 22, 2009 2:12 PM by onlyme

oh, ps.. thanks for this info :)

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Thursday, February 05, 2009 11:37 AM by Michael

PDF Search is working fine. Very good documentation.

Thank you :)

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Monday, February 09, 2009 7:10 PM by Matt

I tried this and adobe's documentation.  Although it indexes the file name, it does not index the content of the document.  Guess I'll go back to using the IFilter 6.0...

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Wednesday, March 04, 2009 8:14 PM by Charan V

This works like a magic....Very good documentation!!!

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Friday, March 06, 2009 8:30 PM by Ariel

This works great.  Thanks very much for posting this.  Although this worked great for me on one server, for some reason the new string value in:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\{Random GUID}\Gather\Search\Extensions\ExtensionList

... keeps disappearing after a reboot on another server.  The result being pdf search won't work.  Any ideas?

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Monday, March 09, 2009 8:36 AM by Simon

Works for me but, like one of the other commenters, I lose part of it after a reboot, i.e. Applications\{Random GUID}\Gather\Search\Extensions\ExtensionList

Also

the search says it found a <PDF icon> "Microsoft Word" <Valid filename> ".doc" , but with the hyperlink being to the correct pdf

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Monday, March 09, 2009 10:48 PM by aditi

I only followed these 3 steps and pdf crawling is successful:

• Downloaded Adobe Reader 9.0

• Go to Search Settings -> File Types – Add pdf

       (Note: Pdf image shows up automatically with the pdf extension.)

• Added the PDF file type to the Extensions List:

o Start regedit

o Open the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\{GUID}\Gather\Search\Extensions\ExtensionList

o Added PDF to the list as a new String Value. Used a new high value e.g. used "38" as the key with the value "pdf".

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Tuesday, May 05, 2009 1:47 PM by Freek

Hi Harold,

The PDF icon did not show up until I changed 'mapping' into 'Mapping': "<Mapping Key="pdf" Value="pdficon_small.gif" OpenControl="" />" and did an IISRESET.

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Monday, May 18, 2009 8:53 PM by jr

I appreciate the info and have followed the steps and have done a full crawl but I still cannot search within a pdf.  I can find no errors in the logs.  I have tried rebooting and reinstalling but the pdf search still doesn't work.  Any ideas why that may be?  Environment: Windows Server 2008, MOSS 2007 Enterprise Edition, Adobe 9.0

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Wednesday, May 20, 2009 9:46 PM by Geoff

After chasing my tail for a few hours and also not being able to get the content of PDFs to show up, it turned out the PDFs I was indexing were larger than the default sizes allowed by Search Server. Once I adjusted the MaxDownloadSize and MaxGrowFactor reg keys they started showing up.

Check out:

support.microsoft.com/.../318747

support.microsoft.com/.../927675

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Saturday, May 23, 2009 5:49 AM by Nerble

I can confirm that the difference in GUIDs between the Adobe PDF filter earlier versions and the one used by version 9 (E8978DA6-047F-4E3D-9C78-CDBE46041603) is indeed the cause of my particular manifestation of this problem.  I write this comment to hopefully save some of you other sysadmins some pain by pointing out that a full server bounce is required before the fix takes effect... just cycling the services involved doesn't do the trick.

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Friday, June 05, 2009 3:40 PM by Lise

I've followed the instructions and my pdfs are now indexed and searchable. But the document properties for pdfs are not indexed. I want to show the last-modified date of the documents in my search result site, but how do I find it? In the crawled properties section I see several folders, among others office and web, and I was told that crawled properties of pdfs were supposed to be in a folder named pdf. There is no such folder... Any ideas?

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Thursday, June 18, 2009 10:24 AM by alanc

What do you mean by:

"Add a Multi-String value with value ... or modify if another GUID value already exists."

?

When I look in my registry keys for .pdf, they already have 4 entries inside, including a Multi-String.

So should I rename the existing Multi-String or create a new one, and either way, what am I supposed to call it?

# re: Index PDF documents on SharePoint using Adobe PDF IFilter 9

Friday, June 19, 2009 8:05 AM by alanc

Doh! Was looking in the wrong key, which was inconveniently close to the one you specified.
