Import PDF into Excel: The Complete Step-by-Step Guide to Converting PDF Tables, Reports, and Data into Spreadsheets in 2026
Learn how to import PDF into Excel using Power Query, Get Data, and online tools. Step-by-step methods to convert tables, reports, and scanned PDFs.

Learning how to import PDF into Excel is one of the most practical skills you can pick up in 2026, whether you handle financial statements, vendor invoices, research reports, or scanned forms that arrive locked inside a PDF.
The good news is that Microsoft has invested heavily in native PDF connectivity through Power Query, meaning you no longer need third-party converters or copy-paste gymnastics to pull structured tables from a PDF into a worksheet. With a few clicks you can preview every table the engine detects, reshape the columns, and load the data straight into an Excel range or data model.
The reason this matters is simple. Most real-world business data still travels as PDF. Banks deliver statements as PDF. Government agencies publish census tables as PDF. Suppliers email purchase orders as PDF. If you cannot quickly transfer those numbers into rows and columns, you cannot analyze, pivot, or chart them. That is why mastering PDF imports sits alongside foundational skills like VLOOKUP formulas and drop-down lists as a core productivity capability for analysts, accountants, and operations professionals.
This guide walks through every method available in modern Excel, from the built-in Get Data From PDF connector introduced in Microsoft 365 to fallback options for older versions, Excel for Mac, and scanned image-based PDFs that require optical character recognition. You will learn which approach suits a clean digital PDF, which one handles multi-page reports, and which workflow you should reach for when the source file contains hundreds of merged cells, headers, and footers that would otherwise destroy your data integrity.
We will also cover the common pitfalls that trip up beginners. PDFs were designed for visual fidelity, not data exchange, so the same table can render correctly to the human eye while presenting itself to Excel as a tangled mess of overlapping text boxes. Knowing how to spot these problem files in advance saves hours of cleanup. We will share the diagnostic checks that experienced power users run before they ever click Import, including how to test whether a PDF is text-based or scanned bitmap.
By the end of this article you will have a repeatable workflow you can apply to any PDF, plus a troubleshooting playbook for the inevitable cases where the import does not produce a perfect grid on the first attempt. You will also see how Power Query lets you save the transformation steps so that next month, when an updated version of the same PDF arrives, refreshing the data takes a single click. That refresh capability is the real prize and the reason so many finance teams have standardized on this approach.
We will reference real examples throughout, including a sample bank statement PDF, a multi-page sales report with subtotals, and a scanned tax form. Each example highlights different challenges and shows the exact buttons to click, the menus to expand, and the transformations to apply. You will see screenshots described in detail along with the Power Query M code that runs behind the scenes, so you can adapt the steps to your specific files.
Finally, we will look at when to skip Excel entirely and use Adobe Acrobat, Power BI, or specialized OCR services instead. Excel is powerful, but it is not always the right tool for every PDF, especially those with complex layouts spanning multiple columns. Knowing the limits of the native importer helps you choose the fastest path to clean data and protects you from spending an afternoon wrestling with a file that simply was not designed to be parsed.

Five Ways to Import a PDF Into Excel
- Power Query (Get Data From PDF): The native connector in Microsoft 365 and Excel 2021. It detects tables automatically, lets you preview each one, and loads structured data into a worksheet or data model with full refresh support.
- Copy and paste: The fastest method for a single small table. Open the PDF in a reader, select the table, copy, and paste into Excel. It works for simple grids but loses formatting and often misaligns columns.
- Adobe Acrobat Pro: Acrobat Pro converts PDFs directly to XLSX with high fidelity, preserving headers and merged cells. It is best for complex multi-page reports where Power Query struggles with layout detection.
- OCR tools: For scanned or image-based PDFs, use Adobe OCR, Google Drive, or dedicated tools like ABBYY FineReader to recognize text first, then import the resulting digital PDF using Power Query.
- Online converters: Free web tools like Smallpdf or ILovePDF convert PDF to XLSX in seconds. They are convenient for one-off jobs but raise data privacy concerns for confidential financial or HR documents.
The native Power Query workflow is the recommended starting point for anyone running Microsoft 365 or Excel 2021 and later on Windows. To begin, open a blank workbook and navigate to the Data tab on the ribbon. Look for the Get Data dropdown on the far left, click it, then expand From File and select From PDF. A file picker appears. Browse to your PDF, select it, and click Import. After a few seconds of processing, the Navigator dialog opens and lists every table and page the engine detected, prefixed with Table001, Table002, Page001, and so on.
The Navigator is where most of the value lives. Click any item to see a live preview on the right side of the window. Tables are highlighted with structured rows and columns, while Pages show the entire page contents as a continuous text block.
For most data work you want to select the table objects rather than the page objects because tables come pre-parsed with column boundaries and, in most cases, header rows. If your data spans multiple pages of a recurring report, tick the Select multiple items checkbox at the top of the Navigator, check each matching table entry, then use the Transform Data button rather than Load.
Transform Data opens the Power Query Editor, which is the unsung hero of this entire workflow. Here you can append multiple tables into a single dataset using Home then Append Queries, promote the first row to headers, change column data types, filter out subtotal rows, replace stray characters, and split combined fields. Every action you take is recorded as a step in the Applied Steps pane on the right, building a repeatable recipe. When you click Close and Load, Excel writes the cleaned data into a worksheet table and remembers the recipe for future refreshes.
If you regularly receive monthly versions of the same PDF, the refresh feature transforms your workflow. Save the original PDF to a stable file path or a SharePoint location, then edit the Source step in Power Query to point to that location. Next month when the new PDF arrives, save it over the previous file using the same name, return to Excel, and click Data then Refresh All. Power Query reruns every transformation step against the new file and updates your worksheet. This is the same principle behind dashboards that update overnight without human intervention.
For Excel on Mac, the Get Data From PDF option arrived later and may not appear in all versions. If it is missing, your fallback is to build the query in Excel on Windows, then save the workbook and open it on Mac, where the loaded data remains available. Whether the query can also be refreshed there depends on your Mac version, so test it before relying on it. Alternatively, use a cloud service such as a Microsoft Power Automate flow that converts the PDF to CSV before Excel ever sees it. The principles of clean data and repeatable transformations remain identical regardless of which platform you start from.
One detail many users miss is that Power Query can also pull PDFs directly from a URL. Choose Get Data, From Other Sources, From Web, and paste a link to a publicly hosted PDF such as a government report or financial filing. Excel downloads the file in the background and presents the same Navigator dialog. This is especially useful for analysts who track quarterly investor presentations or regulatory filings, since each new release can be refreshed without manually downloading anything. Freezing the header row becomes useful when you later scroll through long imported tables.
Performance varies with file size and complexity. A ten-page financial statement typically imports in under thirty seconds on a modern laptop, but a two-hundred-page audit report with hundreds of tables can take several minutes and consume significant memory. Close other workbooks during large imports, and consider loading data directly to the Power Pivot data model instead of a worksheet if the row count approaches the worksheet limit of 1,048,576 rows. The data model has no fixed row limit, being constrained mainly by available memory, and it compresses data efficiently, which keeps your workbook responsive.
PDF Types and How Excel Handles Each
A text-based PDF was generated digitally from a source application such as Word, Excel, or accounting software. The underlying file contains real text strings, font information, and positional coordinates, which means Power Query can read characters directly without any image recognition. These are the easiest PDFs to import and typically produce clean tables with minimal cleanup.
To confirm a PDF is text-based, open it in any reader and try selecting a word with your cursor. If the selection highlights individual characters cleanly and you can copy the text, you have a digital PDF. The Get Data From PDF connector will recognize tables automatically, and your import accuracy should approach ninety-five percent on well-structured documents like bank statements or tax forms.

Should You Use Power Query or Adobe Acrobat for PDF Imports?
- +Power Query is built into Microsoft 365 with no additional license cost
- +Transformations are recorded and replay automatically on refresh
- +Handles multi-table merges and complex column splits natively
- +Loads data directly into the Excel data model for big datasets
- +Works with PDFs stored locally, on SharePoint, or accessed via URL
- +Integrates seamlessly with pivot tables, Power BI, and dashboards
- +M formula language supports custom functions for bespoke logic
- −Cannot read scanned image-based PDFs without external OCR
- −Mac and web versions of Excel still lag behind Windows features
- −Complex layouts with merged cells often need manual cleanup
- −Large PDFs over fifty megabytes can freeze the import dialog
- −No control over PDF passwords or encrypted document handling
- −Refresh fails with an error if the source file path changes
- −Learning curve for the Power Query Editor interface
Pre-Import Checklist for Importing PDF Into Excel
- ✓Verify the PDF is text-based by attempting to select and copy text inside it
- ✓Check that the file is not password protected or encrypted before importing
- ✓Confirm you are running Microsoft 365 or Excel 2021 or later on Windows, where the From PDF connector is available
- ✓Save the PDF to a stable local or SharePoint path that will not move
- ✓Note the total number of pages and tables you expect to find
- ✓Close other large workbooks to free memory before starting the import
- ✓Decide in advance whether to load to a worksheet or the data model
- ✓Identify which columns will need data type changes after import
- ✓Plan how to handle subtotal and header rows that repeat across pages
- ✓Document the source URL or vendor for traceability and auditing
- ✓Test the import on one page before processing the entire document
- ✓Back up the original PDF in case you need to re-run the workflow later
Repeatability is the entire point
Every minute you spend cleaning imported PDF data should be reusable. Power Query records each transformation as a replayable step, so next month the same report imports cleanly with one click. Copy and paste forces you to redo every cleanup task from scratch, which is why finance teams standardize on the Get Data From PDF connector for any recurring file.
Once data lands in your worksheet, the real work begins. Even well-formed PDF tables typically arrive with several common issues that need attention before the numbers are trustworthy. The first is data types. Power Query often defaults numeric columns to text, which means SUM formulas return zero and pivot tables refuse to aggregate. Right-click each numeric column in the Power Query Editor, choose Change Type, and select Whole Number, Decimal, or Currency. Date columns need the same treatment or they will sort alphabetically instead of chronologically.
The second issue is duplicate header rows. Multi-page PDFs repeat their column headers on every page, so when Power Query appends the pages, those headers end up as data rows throughout your dataset. Open the filter dropdown on the first column and uncheck the header text to drop every repeated header in one action. The step gets saved into your query and reapplies automatically the next time you refresh. Once the workflow is set up, this is dramatically faster than deleting the rows by hand or relying on Excel's Remove Duplicates feature in the worksheet.
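The header-removal rule is easy to reason about outside Excel as well. The Python sketch below is an illustration of the logic only, not what Power Query runs internally; the sample rows and the function name drop_repeated_headers are made up for the example.

```python
def drop_repeated_headers(rows):
    """Keep the first row as the header and drop later rows that repeat it."""
    if not rows:
        return []
    header = rows[0]
    # A repeated header is any later row whose cells match the header exactly.
    body = [row for row in rows[1:] if row != header]
    return [header] + body


# Hypothetical rows from a two-page PDF whose header repeats on page 2.
pages = [
    ["Date", "Description", "Amount"],
    ["2026-01-03", "Card payment", "-42.10"],
    ["Date", "Description", "Amount"],
    ["2026-01-04", "Salary", "2500.00"],
]
cleaned = drop_repeated_headers(pages)
```

Matching on the whole row rather than on one cell reduces the risk of dropping a genuine data row that happens to contain the word Date or Amount.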
A third common cleanup is splitting combined fields. PDFs sometimes pack multiple data points into a single cell, such as a date and description joined by a space or a name followed by an employee ID in parentheses. Use Split Column by Delimiter in the Transform tab, choose space, comma, or a custom character, and Excel breaks the field into separate columns. You can then rename each new column with meaningful headers and adjust data types as needed for downstream analysis.
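To make the two splitting cases concrete, here is a small Python sketch of the same idea. It is illustrative only: the cell formats, the sample values, and both function names are assumptions, and your own PDF may need a different pattern.

```python
import re

# Matches "Name (ID)", for example "Dana Smith (E1042)". The ID format is assumed.
NAME_WITH_ID = re.compile(r"^\s*(?P<name>.*?)\s*\((?P<emp_id>[^()]+)\)\s*$")


def split_name_and_id(cell):
    """Return (name, employee_id); employee_id is None when no ID is present."""
    match = NAME_WITH_ID.match(cell)
    if match is None:
        return cell.strip(), None
    return match.group("name"), match.group("emp_id").strip()


def split_date_and_description(cell):
    """Split on the first space only, so spaces in the description survive."""
    date_part, _, description = cell.strip().partition(" ")
    return date_part, description
```

Splitting on the first space only corresponds to choosing the left-most delimiter in the Split Column by Delimiter dialog. Splitting at every space would scatter a multi-word description across several columns.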
Footers and page numbers also love to sneak into your data. They typically appear as a row at the bottom of each imported page with values like Page 1 of 10 or Confidential and Proprietary. The cleanest fix is to add a filter step that excludes any row where a key column is null or where a specific text fragment appears. Power Query supports text-contains filters, M text functions such as Text.Contains and Text.StartsWith, and conditional column logic to handle even the messiest layouts. Note that M has no built-in regular expression functions, so pattern matching has to be composed from these text functions.
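The footer rule can be written down precisely as a small predicate. The Python sketch below is an illustration under stated assumptions: the page-number pattern, the footer fragment, and the sample rows are hypothetical, and in Power Query the same rule would be expressed with filter steps rather than this code.

```python
import re

# Patterns are assumptions; adjust them to the footer text in your own PDF.
PAGE_NUMBER = re.compile(r"^\s*Page\s+\d+\s+of\s+\d+\s*$", re.IGNORECASE)
FOOTER_FRAGMENTS = ("confidential and proprietary",)


def is_footer_row(row, key_index=0):
    """True when the key cell is empty or the row looks like footer text."""
    key = row[key_index]
    if key is None or str(key).strip() == "":
        return True
    if any(PAGE_NUMBER.match(str(cell)) for cell in row if cell is not None):
        return True
    text = " ".join(str(cell) for cell in row if cell is not None).lower()
    return any(fragment in text for fragment in FOOTER_FRAGMENTS)


rows = [
    ["2026-01-03", "Card payment", "-42.10"],
    ["Page 1 of 10", None, None],
    [None, "Confidential and Proprietary", None],
    ["2026-01-04", "Salary", "2500.00"],
]
data_rows = [row for row in rows if not is_footer_row(row)]
```

Treating an empty key column as a footer is a deliberate choice here. If your data legitimately has blanks in that column, for example because of merged cells, fill those first or pick a different key column.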
Merged cells in the source PDF translate to repeated values or unexpected blanks in your import. If the original file used merged cells for category groupings such as a region name spanning several rows, those cells often appear only once with subsequent rows showing null. Use the Fill Down feature in Power Query under Transform then Fill then Down to propagate the value into every row beneath it. This single click can save twenty minutes of manual copying and is essential when preparing data for pivot tables or charts.
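What Fill Down does to a column can be shown in a few lines. This Python sketch is only an illustration of the behavior; the region and product rows are invented, and the function name fill_down is hypothetical.

```python
def fill_down(rows, column):
    """Copy the last non-empty value in `column` into the empty cells below it."""
    filled = []
    last_value = None
    for row in rows:
        new_row = list(row)  # copy so the input rows are left untouched
        # Power Query's Fill Down acts on null cells; treating empty strings
        # as blank too is a choice made for this sketch.
        if new_row[column] in (None, ""):
            new_row[column] = last_value
        else:
            last_value = new_row[column]
        filled.append(new_row)
    return filled


rows = [
    ["North", "Widgets", 120],
    [None, "Gadgets", 80],
    [None, "Gizmos", 45],
    ["South", "Widgets", 95],
    [None, "Gadgets", 60],
]
filled = fill_down(rows, column=0)
```

After filling, every row carries its own region, which is the shape pivot tables and charts expect.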
Currency and number formatting from international PDFs can cause headaches if your locale differs from the source. Many European formats use commas for decimals and periods for thousands, while American formats reverse the convention. Power Query lets you specify the locale used to parse each column through Change Type then Using Locale, ensuring that 1.234,56 from a German report becomes 1234.56 in your American workbook. Always check a few sample values after import to confirm the magnitudes match the original PDF before trusting the numbers in calculations.
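The separator swap is worth seeing spelled out, because a wrong guess changes a value by orders of magnitude. The following Python sketch illustrates the parsing rule only; the function name and its parameters are hypothetical, and it assumes you already know which convention the source uses.

```python
from decimal import Decimal


def parse_localized_number(text, decimal_sep=",", thousands_sep="."):
    """Parse a number written with the given separators into a Decimal."""
    # Drop the grouping separator first, then normalize the decimal mark.
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


german = parse_localized_number("1.234,56")
american = parse_localized_number("1,234.56", decimal_sep=".", thousands_sep=",")
```

Both calls yield the same value, 1234.56. Parsing the German string with American settings would instead fail or produce a different magnitude, which is why spot-checking sample values against the PDF matters.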
Finally, document your cleanup. Add a Description to each step in the Applied Steps pane by right-clicking and choosing Properties. Six months from now when someone else inherits the workbook, those descriptions explain why each transformation exists. Combined with a cleanly formatted presentation layer, you build workbooks that are both robust against data changes and friendly to future maintainers, which is the hallmark of a professional Excel deliverable.

PDFs often contain invisible characters like non-breaking spaces, soft hyphens, and zero-width joiners that survive the import process. These break exact-match VLOOKUPs, INDEX MATCH formulas, and joins to other datasets even when cells look identical. Excel's CLEAN and TRIM functions do not remove these characters on their own: CLEAN strips only the ASCII control characters and TRIM only the regular space. Wrap the text in SUBSTITUTE first, for example replacing UNICHAR(160) with a regular space, or add Power Query steps to replace U+00A0 and similar codes with regular spaces before loading the data.
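To see why two cells that look identical fail to match, it helps to treat the invisible characters explicitly. The Python sketch below is illustrative; the character lists are an assumption covering common cases rather than an exhaustive set, and the function name normalize_text is hypothetical.

```python
import re

# Characters that look like spaces but are not the ASCII space:
# no-break space, figure space, narrow no-break space.
SPACE_LIKE = {"\u00a0": " ", "\u2007": " ", "\u202f": " "}
# Characters with no visible width that should simply be removed:
# soft hyphen, zero-width space, zero-width non-joiner, zero-width joiner, BOM.
INVISIBLE = ["\u00ad", "\u200b", "\u200c", "\u200d", "\ufeff"]

TRANSLATION = str.maketrans({**SPACE_LIKE, **{ch: None for ch in INVISIBLE}})


def normalize_text(value):
    """Replace space look-alikes, drop invisible characters, tidy whitespace."""
    cleaned = value.translate(TRANSLATION)
    return re.sub(r" {2,}", " ", cleaned).strip()


imported = "ACME\u00a0Ltd\u200d"
lookup_key = "ACME Ltd"
```

Before normalization, imported and lookup_key compare as different strings even though they render the same. After normalize_text(imported), they are equal, which is the condition an exact-match lookup needs.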
Automation is where importing PDFs becomes a force multiplier. If you process the same vendor invoices, bank statements, or regulatory filings every month, you can build a workflow that takes a folder of new PDFs and produces a clean consolidated worksheet in minutes. The foundation is Power Query's Folder connector. Choose Get Data, From File, From Folder, and point Excel at a directory containing all your PDFs. Excel returns a list of files, and you can apply a custom function that runs the same PDF import logic on each one.
The custom function pattern is straightforward once you see it. First, build a working Power Query that imports a single sample PDF and performs all the cleanup steps you need. Then in the Advanced Editor, convert the query into a function by adding a parameter for the file content. Back in the folder query, click Add Column then Invoke Custom Function, choose your new function, and pass in the Content column from the folder listing, which holds each file's binary data. Excel processes every PDF, and expanding the resulting column appends the results into a single combined table ready for analysis.
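The shape of this pattern, one function applied to every file with the results appended, can be sketched in Python for illustration. This is not Power Query code: parse_pdf is a hypothetical callable standing in for your single-file query, and combine_folder is a made-up name.

```python
from pathlib import Path


def combine_folder(folder, parse_pdf):
    """Run `parse_pdf` on every PDF in `folder` and append the results.

    `parse_pdf` is a hypothetical callable that takes a file's bytes and
    returns a list of rows; it stands in for the single-file import logic.
    """
    combined = []
    # "*.pdf" is matched as written; on case-sensitive file systems,
    # files ending in ".PDF" would need their own pattern.
    for path in sorted(Path(folder).glob("*.pdf")):
        for row in parse_pdf(path.read_bytes()):
            # Keep the file name so each row can be traced back to its source.
            combined.append([path.name, *row])
    return combined
```

Carrying the source file name into every row mirrors keeping the Name column from the folder listing, which makes it much easier to track down the PDF behind a suspicious value.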
For teams that want to skip Excel for the initial parsing step entirely, Microsoft Power Automate offers a PDF action library that extracts text, tables, and form fields and writes them to Excel Online or SharePoint lists. Combined with email triggers, a flow can watch a mailbox for incoming invoices, extract the line items automatically, and append them to a master ledger. This eliminates manual handling for high-volume workflows and integrates with approval steps for finance compliance.
Another powerful pattern is combining PDF imports with named ranges and structured tables. Power Query loads its output into a worksheet Table by default. For data that arrived another way, such as copy and paste, convert the range to a Table using Ctrl plus T. Tables automatically expand when new data is added, formulas reference column names instead of cell coordinates, and pivot tables update their source ranges without manual intervention. This pairs especially well with refreshable Power Query connections because each refresh feeds the latest PDF data straight into a Table that all your downstream analysis depends on.
Privacy and data residency deserve careful attention when automating PDF imports. If you use online converters or cloud OCR services, your data may transit through third-party servers in unknown locations. For confidential financial or personally identifiable information, stick to fully local solutions like Power Query and Adobe Acrobat Pro installed on your own machine, or use enterprise services with signed data processing agreements. Most regulated industries explicitly forbid uploading client data to free online tools, so check your compliance policies first.
Excel for the web introduces additional considerations because some Power Query features are not yet fully supported in browser mode. If you build a workbook that imports PDFs and share it via OneDrive or SharePoint, colleagues opening it in Excel Online may see the cached data but cannot trigger a refresh themselves. The standard solution is to schedule refreshes through Power Automate or to ensure that anyone needing a fresh dataset opens the workbook in the desktop application. Power BI offers an even better path for shared dashboards built on PDF sources.
The skills you build importing PDFs transfer directly to other data sources. The same Power Query Editor handles CSV files, Excel workbooks, SQL databases, web pages, JSON feeds, and SharePoint lists. Once you understand the Applied Steps pattern, the function syntax, and the refresh model, you can connect Excel to virtually any data source in your organization. Combined with foundational skills explored in our guide to the standard deviation formula in Excel, PDF importing becomes one tool among many in a comprehensive data analysis toolkit.
Practical tips separate the experts from the beginners when working with PDF imports. The first tip is always to test on a single page before processing an entire document. Select a single table in the Navigator and click Transform Data to open just that table in the Power Query Editor. Validate that the columns, headers, and data types come through correctly. Only after a single page imports cleanly should you go back and select the full set of pages for processing. This habit saves hours when a PDF turns out to have an unexpected layout change halfway through.
The second tip is to name your queries descriptively from the start. The default names like Table001 and Table002 are meaningless three months later when you reopen the workbook. In the Power Query Editor, right-click each query in the Queries pane and rename it to something like InvoiceLineItems or BankTransactions. This pays dividends every time you maintain the workbook and is especially important if you reference the query from formulas, pivot tables, or other queries via merges and appends.
The third tip concerns memory management. Power Query loads PDF data into memory during processing, and large files can consume several gigabytes briefly. If Excel hangs or crashes mid-import, restart with no other applications running, increase your virtual memory if possible, and consider splitting the PDF into smaller chunks using a free tool like PDFsam Basic. Importing five fifty-page PDFs separately and then appending the results is often more reliable than importing one two-hundred-fifty-page monster file in a single operation.
The fourth tip is to validate totals against the source. Once your data lands in Excel, compute a SUM of key numeric columns and compare it against the totals printed in the original PDF. Discrepancies usually point to header rows that snuck in as data, footer rows that contained subtotals, or rows that fell out due to merged cell handling. This sanity check takes ninety seconds and prevents the embarrassment of presenting analysis built on incomplete or duplicated data to stakeholders.
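The reconciliation itself is simple arithmetic, shown here as a Python sketch for illustration. The amounts and the printed total are invented sample values, and check_total is a hypothetical helper name.

```python
from decimal import Decimal


def check_total(amounts, printed_total, tolerance=Decimal("0.00")):
    """Compare the sum of imported amounts with the total printed in the PDF.

    Returns (matches, difference), where difference is computed minus printed.
    """
    computed = sum((Decimal(a) for a in amounts), Decimal("0"))
    difference = computed - Decimal(printed_total)
    return abs(difference) <= tolerance, difference


# Hypothetical imported amounts and the total printed on the statement.
amounts = ["-42.10", "2500.00", "-18.75"]
ok, difference = check_total(amounts, "2439.15")

# A row imported twice shows up as a difference equal to that row's amount.
dup_ok, dup_difference = check_total(amounts + ["2500.00"], "2439.15")
```

Using Decimal rather than floating point keeps cent-level comparisons exact. The size of the difference is often a clue in itself: if it equals one row's amount, look for a duplicated or missing row.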
The fifth tip is to leverage parameters for flexibility. Instead of hardcoding a file path or a date range in your Power Query, define a parameter and reference it throughout your queries. Then when you need to point at a different file or a different period, you change one parameter value and every query updates automatically. Parameters live under Home then Manage Parameters in the Power Query Editor and become indispensable as your workbooks grow more sophisticated.
The sixth tip addresses error handling. Power Query offers a Replace Errors transformation that converts error cells into a value you specify, such as zero or null. Apply this to numeric columns that occasionally contain non-numeric stray characters from the PDF. The query continues to load successfully instead of failing on a single bad row, and you can audit which rows were affected by adding a conditional column that flags them. This produces resilient workflows that survive minor variations in the source PDF format month to month.
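The replace-and-flag idea can be illustrated with a short Python sketch. It is not Power Query code; the sample values are invented, and to_number_or_default is a hypothetical helper that plays the role of a type change followed by Replace Errors.

```python
from decimal import Decimal, InvalidOperation


def to_number_or_default(text, default=None):
    """Return (value, had_error); fall back to `default` for unparseable text."""
    try:
        return Decimal(text.strip()), False
    except (InvalidOperation, AttributeError):
        # InvalidOperation: text is not a number; AttributeError: cell is None.
        return default, True


# Hypothetical column values; the last one contains the letter O, not a zero.
raw_amounts = ["120.50", "n/a", " 75 ", "12O.00"]
converted = [to_number_or_default(value) for value in raw_amounts]
values = [value for value, _ in converted]
flagged_rows = [i for i, (_, had_error) in enumerate(converted) if had_error]
```

Recording which rows failed at the moment of conversion is the important part. Once an error has been replaced by zero or null, it is no longer distinguishable from a genuine zero or blank.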
The seventh and final tip is to learn just enough M language to extend the standard transformations. M is the formula language behind Power Query, accessible through the Advanced Editor. With basic M skills you can write custom functions, build conditional logic that depends on file contents, and parameterize transformations in ways the graphical interface does not expose. Even a working knowledge of Text.Replace, List.Sum, and Table.SelectRows opens significant capabilities and transforms how you handle any data source, not just PDFs imported into Excel.
About the Author
Business Consultant & Professional Certification Advisor
Wharton School, University of Pennsylvania
Katherine Lee earned her MBA from the Wharton School at the University of Pennsylvania and holds CPA, PHR, and PMP certifications. With a background spanning corporate finance, human resources, and project management, she has coached professionals preparing for CPA, CMA, PHR/SPHR, PMP, and financial services licensing exams.