Monday 1 August 2016

Reading data from PDF files using QTP/UFT tool

While automating test cases, we often need to read PDF files (typically reports). Many automation testers have had difficulty accessing PDF files, but with the approach below we can overcome this and work with PDF files from UFT.

Accessibility of PDF
The Adobe Portable Document Format (PDF) is a file format for representing documents in a manner independent of the application software, hardware, and operating system used to create them, as well as of the output device on which they are to be displayed or printed.
PDF files specify the appearance of pages in a document in a reliable, device-independent manner. Adobe provides methods to make the content of a PDF file available to assistive technology such as screen readers. On the Microsoft Windows operating system, Adobe Acrobat and Adobe Reader export PDF content as COM objects. Applications can interface with Acrobat or Adobe Reader in two ways:

1. Through Microsoft's Active Accessibility (MSAA) interface, using MSAA objects that Acrobat or Adobe Reader exports.
2. Directly through exported COM objects that allow access to the PDF document's internal structure, called the Document Object Model (DOM).

The Microsoft Windows version of Acrobat is an OLE Automation server. To use the OLE objects made available by Acrobat, the full Acrobat product must be installed on the system.

To create the OLE AcroExch.App object, we require a trial or licensed version of Acrobat Professional, or a licensed version of Acrobat Standard.
We cannot create this object without installing Acrobat Professional or Standard; attempting it gives an ActiveX error. No trial version of Acrobat Standard is available.
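
The ActiveX error mentioned above can be detected before a test starts. The following is a minimal sketch, not part of the original post: the function name PDF_IsAcrobatAvailable is hypothetical, and it assumes the failure surfaces as a normal VBScript runtime error (usually error 429, "ActiveX component can't create object").

```vbscript
' Hypothetical helper (not from the original post): returns True when the
' AcroExch.App OLE server can be created on this machine.
Public Function PDF_IsAcrobatAvailable()
    Dim objApp
    PDF_IsAcrobatAvailable = False

    On Error Resume Next
    Set objApp = CreateObject("AcroExch.App")
    If Err.Number = 0 Then
        PDF_IsAcrobatAvailable = True
    Else
        ' Typically error 429 when Acrobat Professional/Standard is missing
        Err.Clear
    End If
    On Error GoTo 0

    ' Release the reference; Acrobat itself is not explicitly closed here
    Set objApp = Nothing
End Function

If Not PDF_IsAcrobatAvailable() Then
    MsgBox "AcroExch.App could not be created. Is Acrobat Professional or Standard installed?"
End If
```

Running a check like this at the start of a test gives a clear message instead of a failure later in the script.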

How to install the Acrobat Professional software:
Download the trial from this URL: https://acrobat.adobe.com/us/en/free-trial-download.html?sdid=KQUSA

Click the “Get started” button.
The download takes a few minutes.
When it finishes, you will have an installer named: acrobatProDCXXXXXXXXXX.exe
Install it with admin rights.
The whole installation process takes 20-30 minutes.
Once the installation is complete, you can create the Acrobat COM and DOM objects (AcroExch.App, AcroExch.AVDoc).

Below is sample code to read a PDF file with the AcroExch OLE objects.
'The function below reads the PDF file and puts its content in a text file.
Call PDF_ReadPDFFileAndSaveinTextFile("C:\Govardhan\Testing\mi-certificate1.pdf","C:\Govardhan\Testing\testt.txt")

Public Function PDF_ReadPDFFileAndSaveinTextFile(strPDFFilePath,strTxtFilePath)
     
    On Error Resume Next
 
    ' Start Acrobat and open the PDF passed in strPDFFilePath
    Set AcroApp = CreateObject("AcroExch.App")
    AcroApp.Show
    Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
    AcroAVDoc.Open strPDFFilePath, ""
    Set AcroAVDoc = AcroApp.GetActiveDoc
    Set AcroPDDoc = AcroAVDoc.GetPDDoc
 
    ' Page numbers are zero-based, so the last page is GetNumPages - 1
    For i = 0 To AcroPDDoc.GetNumPages - 1
 
      ' AcquirePage: Acquires the specified page. The first page in a PDDoc is always 0. Returns the AcroExch.PDPage object for that page.
 
      Set PageNumber = AcroPDDoc.AcquirePage(i)
 
      'the Hilite list object is being created
 
       Set PageContent = CreateObject("AcroExch.HiliteList")
       PageContent.Add 0, 20 ' select 20 words starting at word offset 0 on this page
 
     'text selection AcroTextSelect is being created
 
       Set AcroTextSelect = PageNumber.CreateWordHilite(PageContent)
 
       'GetNumText: Gets the number of text elements in a text selection. Use this method to determine how many times to call the PDTextSelect.GetText method to obtain all of a text selection’s text.
 
       For j = 0 To AcroTextSelect.GetNumText -1
            Content = Content & AcroTextSelect.GetText(j)
        Next
    Next
 
    msgbox Content
 
    'strFile = "C:\Govardhan\Testing\test.txt"
    strText = Content
 
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Const ForAppending = 8
 
    Set objTextFile = objFSO.OpenTextFile(strTxtFilePath, ForAppending, True)
    objTextFile.WriteLine(strText)
    objTextFile.Close
 
    AcroAVDoc.Close True
    AcroApp.Exit
    Set AcroPDDoc = Nothing
    Set AcroAVDoc = Nothing
    Set AcroApp = Nothing
    On Error GoTo 0
End Function
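
The per-page extraction step can also be pulled out into a small helper. This is a sketch under assumptions, not the original author's code: the name PDF_GetPageText and its intMaxWords parameter are hypothetical, and it assumes that asking the hilite list for more words than the page holds simply returns the words that exist. It opens the file through AcroExch.PDDoc directly, so no document window is shown.

```vbscript
' Hypothetical helper: returns the text of one page (zero-based index),
' reading at most intMaxWords words from the start of the page.
Public Function PDF_GetPageText(objPDDoc, intPageIndex, intMaxWords)
    Dim objPage, objHilite, objSelection, strText, j
    strText = ""

    Set objPage = objPDDoc.AcquirePage(intPageIndex)
    Set objHilite = CreateObject("AcroExch.HiliteList")
    objHilite.Add 0, intMaxWords            ' word offset 0, up to intMaxWords words

    Set objSelection = objPage.CreateWordHilite(objHilite)
    If Not objSelection Is Nothing Then     ' guard against a page with no text
        For j = 0 To objSelection.GetNumText - 1
            strText = strText & objSelection.GetText(j)
        Next
    End If

    PDF_GetPageText = strText
End Function

' Usage: collect the text of every page of the sample file used above
Dim objPDDoc, strAllText, i
Set objPDDoc = CreateObject("AcroExch.PDDoc")
If objPDDoc.Open("C:\Govardhan\Testing\mi-certificate1.pdf") Then
    For i = 0 To objPDDoc.GetNumPages - 1
        strAllText = strAllText & PDF_GetPageText(objPDDoc, i, 32767) & vbCrLf
    Next
    objPDDoc.Close
End If
Set objPDDoc = Nothing
```

The value 32767 is used because the hilite length is a 16-bit integer, so this is the largest word count one call can request.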

With these OLE objects we can use, among others, the following properties and methods on a PDF file:
Time, Show, Minimize, Maximize, Hide, FindText, GetTitle, Open, Print size, ShowTextSelect, GetPageNum, GetText, GetPage, Create, Delete, etc.
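
As an illustration of one of these methods, the sketch below uses FindText on an AcroExch.AVDoc to check whether a PDF contains a given string. The function name PDF_ContainsText is hypothetical, and the search text in the usage line is only a placeholder. The sketch assumes the PDF has a real text layer; a scanned image has no text to find.

```vbscript
' Hypothetical helper: returns True when strSearchText is found in the PDF.
Public Function PDF_ContainsText(strPDFFilePath, strSearchText)
    Dim objApp, objAVDoc
    PDF_ContainsText = False

    Set objApp = CreateObject("AcroExch.App")
    Set objAVDoc = CreateObject("AcroExch.AVDoc")

    If objAVDoc.Open(strPDFFilePath, "") Then
        ' FindText arguments: text, case-sensitive, whole words only, reset.
        ' The flags are numbers: a positive value turns the option on.
        ' Here: case-insensitive (0), partial words allowed (0),
        ' search from the first page (1).
        PDF_ContainsText = CBool(objAVDoc.FindText(strSearchText, 0, 0, 1))
        objAVDoc.Close True      ' close without saving
    End If

    objApp.Exit
    Set objAVDoc = Nothing
    Set objApp = Nothing
End Function

' Usage with the sample file from this post; "Certificate" is a placeholder
If PDF_ContainsText("C:\Govardhan\Testing\mi-certificate1.pdf", "Certificate") Then
    MsgBox "Text found"
Else
    MsgBox "Text not found"
End If
```

Passing True for a flag would not work as intended, because True is -1 in VBScript, which is not a positive number.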

2 comments:

  1. Hi Colleagues,
    I need some help from you.
    I need to write test case in VBscript for QTP / UFT (12.51)
    QTP should launch Adobe Acrobat Pro 9, 10, 11 (all available on testing machines) open existing PDF file from testing folder. Find some text in the PDF document. Select this text and replace with another text. Save changes. Close file. Close Acrobat. Everything looks very simple but I spent 2 days without results. And didn’t find any working code solution.
    findText() method didn’t find the text and I can’t proceed
    Thank you in advance for any help and advice.
