Reading data from PDF files using UFT tool
While automating test cases, we often need to read PDF files, typically reports. Many automation testers have had difficulty accessing PDF content, but the approach below overcomes this and lets us work with PDF files from UFT.
Accessibility of PDF
The Adobe Portable Document Format (PDF) is a file format for representing documents in a manner independent of the application software, hardware, and operating system used to create them, as well as of the output device on which they are to be displayed or printed.
PDF files specify the appearance of pages in a document in a reliable, device-independent manner. Adobe provides methods to make the content of a PDF file available to assistive technology such as screen readers. On the Microsoft Windows operating system, Adobe Acrobat and Adobe Reader export PDF content as COM objects. Applications can interface with Acrobat or Adobe Reader in two ways:
1. Through the Microsoft Active Accessibility (MSAA) interface, using MSAA objects that Acrobat or Adobe Reader exports.
2. Directly through exported COM objects that allow access to the PDF document's internal structure, called the Document Object Model (DOM).
The Microsoft Windows version of Acrobat is an OLE Automation server. To use the OLE objects that Acrobat makes available, the full Acrobat product must be installed on the system.
To create the OLE AcroExch.App object, we need a trial or licensed version of Acrobat Professional, or a licensed version of Acrobat Standard.
Without Acrobat Professional or Standard installed, the object cannot be created and the call fails with an ActiveX error. No trial version of Acrobat Standard is available.
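The ActiveX error described above can be caught before a test run starts. The following is a minimal sketch, assuming it runs in UFT or any VBScript host; the function name PDF_IsAcrobatAvailable is a hypothetical helper introduced here for illustration and is not part of the original sample.

```vbscript
' Hypothetical helper (not part of the original sample): returns True when
' the AcroExch.App OLE object can be created on this machine.
Public Function PDF_IsAcrobatAvailable()
    Dim AcroApp
    PDF_IsAcrobatAvailable = False
    On Error Resume Next
    Set AcroApp = CreateObject("AcroExch.App")
    If Err.Number = 0 Then
        PDF_IsAcrobatAvailable = True
        ' Close the Acrobat instance that was started only for this check.
        AcroApp.Exit
    Else
        ' This branch is reached when CreateObject fails, for example
        ' because Acrobat Professional/Standard is not installed.
        MsgBox "Cannot create AcroExch.App: " & Err.Number & " - " & Err.Description
        Err.Clear
    End If
    Set AcroApp = Nothing
    On Error GoTo 0
End Function
```

A test could call PDF_IsAcrobatAvailable() first and skip its PDF checks when the function returns False, rather than failing midway through the script.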
How to install Acrobat Professional:
Open the URL: https://acrobat.adobe.com/us/en/free-trial-download.html?sdid=KQUSA
Click the “Get started” button.
The download takes a few minutes.
When it finishes, the downloaded installer is named acrobatProDCXXXXXXXXXX.exe.
Install it with admin rights.
The whole installation takes 20-30 minutes.
Once the installation is complete, you can create the Acrobat COM and DOM objects (AcroExch.App, AcroExch.AVDoc).
The sample code below reads a PDF file using the AcroExch OLE objects.
' The function below reads a PDF file and writes its content to a text file.
Call PDF_ReadPDFFileAndSaveinTextFile("C:\Govardhan\Testing\mi-certificate1.pdf","C:\Govardhan\Testing\testt.txt")
Public Function PDF_ReadPDFFileAndSaveinTextFile(strPDFFilePath, strTxtFilePath)
    Const ForAppending = 8
    Dim AcroApp, AcroAVDoc, AcroPDDoc, PageNumber, PageContent, AcroTextSelect
    Dim objFSO, objTextFile, Content, i, j
    On Error Resume Next
    Set AcroApp = CreateObject("AcroExch.App")
    AcroApp.Show
    Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
    ' Open the PDF passed in as the first argument.
    AcroAVDoc.Open strPDFFilePath, ""
    Set AcroAVDoc = AcroApp.GetActiveDoc
    Set AcroPDDoc = AcroAVDoc.GetPDDoc
    ' Page numbers are zero-based, so the last page is GetNumPages - 1.
    For i = 0 To AcroPDDoc.GetNumPages - 1
        ' AcquirePage returns the page object for the given page number. The first page in a PDDoc is always 0.
        Set PageNumber = AcroPDDoc.AcquirePage(i)
        ' Create the highlight list that selects words on the page.
        Set PageContent = CreateObject("AcroExch.HiliteList")
        PageContent.Add 0, 20 ' offset 0, length 20: the first 20 words of the page; raise the length to capture more
        ' Create the text selection for the highlighted words.
        Set AcroTextSelect = PageNumber.CreateWordHilite(PageContent)
        ' GetNumText returns the number of text elements in the selection, which is
        ' how many times GetText must be called to obtain all of the selection's text.
        For j = 0 To AcroTextSelect.GetNumText - 1
            Content = Content & AcroTextSelect.GetText(j)
        Next
    Next
    MsgBox Content
    ' Append the extracted text to the file passed in as the second argument.
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTextFile = objFSO.OpenTextFile(strTxtFilePath, ForAppending, True)
    objTextFile.WriteLine Content
    objTextFile.Close
    AcroAVDoc.Close True
    AcroApp.Exit
    Set AcroPDDoc = Nothing
    Set AcroAVDoc = Nothing
    Set AcroApp = Nothing
    On Error GoTo 0
End Function
The AcroExch OLE objects also expose many other objects, methods, and properties for working with a PDF file, for example:
Time, Show, Minimize, Maximize, Hide, FindText, GetTitle, Open, Print size, ShowTextSelect, GetPageNum, GetText, GetPage, Create, Delete, etc.
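As an illustration of one of these methods, the sketch below uses FindText on an AcroExch.AVDoc to check whether a PDF contains a given string. It is an assumption-laden example, not the author's code: the function name PDF_ContainsText and its parameters are hypothetical, and it assumes Acrobat is installed and that the document can be searched once it is open in the Acrobat window.

```vbscript
' Hypothetical helper (not from the original article): returns True when
' strSearchText is found in the PDF at strPDFFilePath.
Public Function PDF_ContainsText(strPDFFilePath, strSearchText)
    Dim AcroApp, AcroAVDoc
    PDF_ContainsText = False
    Set AcroApp = CreateObject("AcroExch.App")
    AcroApp.Show
    Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
    ' Open returns a non-zero value when the file was opened successfully.
    If AcroAVDoc.Open(strPDFFilePath, "") Then
        ' FindText arguments: text, case-sensitive (0 = no),
        ' whole words only (0 = no), reset (1 = start from the first page).
        If AcroAVDoc.FindText(strSearchText, 0, 0, 1) Then
            PDF_ContainsText = True
        End If
        AcroAVDoc.Close True ' True = close without saving
    End If
    AcroApp.Exit
    Set AcroAVDoc = Nothing
    Set AcroApp = Nothing
End Function
```

A call such as PDF_ContainsText("C:\Govardhan\Testing\mi-certificate1.pdf", "Certificate") could then serve as a simple checkpoint in a test; the search word here is only a placeholder.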