Learning how to import PDF into Excel is one of the most practical skills you can pick up in 2026, whether you handle financial statements, vendor invoices, research reports, or scanned forms that arrive locked inside a portable document format.

The good news is that Microsoft has invested heavily in native PDF connectivity through Power Query, meaning you no longer need third-party converters or copy-paste gymnastics to pull structured tables from a PDF into a worksheet. With a few clicks you can preview every table the engine detects, reshape the columns, and load the data straight into an Excel range or data model.

The reason this matters is simple. Most real-world business data still travels as PDF. Banks deliver statements as PDF. Government agencies publish census tables as PDF. Suppliers email purchase orders as PDF. If you cannot quickly transfer those numbers into rows and columns, you cannot analyze, pivot, or chart them. That is why mastering PDF imports sits alongside foundational skills like VLOOKUP formulas and drop-down lists as a core productivity capability for analysts, accountants, and operations professionals.

This guide walks through every method available in modern Excel, from the built-in Get Data From PDF connector introduced in Microsoft 365 to fallback options for older versions, Excel for Mac, and scanned image-based PDFs that require optical character recognition. You will learn which approach suits a clean digital PDF, which one handles multi-page reports, and which workflow you should reach for when the source file contains hundreds of merged cells, headers, and footers that would otherwise destroy your data integrity.

We will also cover the common pitfalls that trip up beginners. PDFs were designed for visual fidelity, not data exchange, so the same table can render correctly to the human eye while presenting itself to Excel as a tangled mess of overlapping text boxes. Knowing how to spot these problem files in advance saves hours of cleanup. We will share the diagnostic checks that experienced power users run before they ever click Import, including how to test whether a PDF is text-based or scanned bitmap.

By the end of this article you will have a repeatable workflow you can apply to any PDF, plus a troubleshooting playbook for the inevitable cases where the import does not produce a perfect grid on the first attempt. You will also see how Power Query lets you save the transformation steps so that next month, when an updated version of the same PDF arrives, refreshing the data takes a single click. That refresh capability is the real prize and the reason so many finance teams have standardized on this approach.

We will reference real examples throughout, including a sample bank statement PDF, a multi-page sales report with subtotals, and a scanned tax form. Each example highlights different challenges and shows the exact buttons to click, the menus to expand, and the transformations to apply. You will see screenshots described in detail along with the Power Query M code that runs behind the scenes, so you can adapt the steps to your specific files.

Finally, we will look at when to skip Excel entirely and use Adobe Acrobat, Power BI, or specialized OCR services instead. Excel is powerful, but it is not always the right tool for every PDF, especially those with complex layouts spanning multiple columns. Knowing the limits of the native importer helps you choose the fastest path to clean data and protects you from spending an afternoon wrestling with a file that simply was not designed to be parsed.

PDF Import in Excel by the Numbers

  • Get Data From PDF released: 2020
  • Average import time: under 60 seconds
  • Table detection accuracy: 95%
  • Refresh updated files: 1 click
  • Recommended file limit: 50 MB

Five Ways to Import a PDF Into Excel

Get Data From PDF

The native Power Query connector in Microsoft 365 and Excel 2021. Detects tables automatically, lets you preview each one, and loads structured data into a worksheet or data model with full refresh support.

Copy and Paste

The fastest method for a single small table. Open the PDF in a reader, select the table, copy, and paste into Excel. Works for simple grids but loses formatting and often misaligns columns.

Adobe Acrobat Export

Acrobat Pro converts PDFs directly to XLSX with high fidelity, preserving headers and merged cells. Best for complex multi-page reports where Power Query struggles with layout detection.

๐Ÿ” OCR Services

For scanned or image-based PDFs, use Adobe OCR, Google Drive, or dedicated tools like ABBYY FineReader to recognize text first, then import the resulting digital PDF using Power Query.

๐ŸŒ Online Converters

Free web tools like Smallpdf or ILovePDF convert PDF to XLSX in seconds. Convenient for one-off jobs but raise data privacy concerns for confidential financial or HR documents.

The native Power Query workflow is the recommended starting point for anyone running Microsoft 365 or Excel 2021 and later on Windows. To begin, open a blank workbook and navigate to the Data tab on the ribbon. Look for the Get Data dropdown on the far left, click it, then expand From File and select From PDF. A file picker appears. Browse to your PDF, select it, and click Import. After a few seconds of processing, the Navigator dialog opens and lists every table and page the engine detected, prefixed with Table001, Table002, Page001, and so on.

The Navigator is where most of the value lives. Click any item to see a live preview on the right side of the window. Tables are highlighted with structured rows and columns, while Pages show the entire page contents as a continuous text block.

For most data work you want to select the table objects rather than the page objects because tables come pre-parsed with header rows and column boundaries. If your data spans multiple pages of a recurring report, tick the Select multiple items checkbox at the top of the Navigator, check each matching table entry, then use the Transform Data button rather than Load.

Transform Data opens the Power Query Editor, which is the unsung hero of this entire workflow. Here you can append multiple tables into a single dataset using Home then Append Queries, promote the first row to headers, change column data types, filter out subtotal rows, replace stray characters, and split combined fields. Every action you take is recorded as a step in the Applied Steps pane on the right, building a repeatable recipe. When you click Close and Load, Excel writes the cleaned data into a worksheet table and remembers the recipe for future refreshes.
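Behind the scenes, each of these steps is recorded as M code that you can inspect in the Advanced Editor. The query below is a minimal sketch of what such a recipe might look like, not the exact code Excel generates for your file. The file path and the column names Date and Amount are assumptions made for illustration.

```powerquery
let
    // Hypothetical path; replace with the location of your own PDF.
    Source = Pdf.Tables(File.Contents("C:\Reports\statement.pdf"), [Implementation = "1.3"]),
    // Keep only the detected tables and skip the whole-page objects.
    TablesOnly = Table.SelectRows(Source, each [Kind] = "Table"),
    // Append the nested table from every remaining row into one table.
    Combined = Table.Combine(TablesOnly[Data]),
    // Promote the first row to headers. Headers repeated on later pages
    // stay behind as data rows and need a separate filter step.
    Promoted = Table.PromoteHeaders(Combined, [PromoteAllScalars = true]),
    // "Date" and "Amount" are assumed column names for this sketch.
    Typed = Table.TransformColumnTypes(Promoted, {{"Date", type date}, {"Amount", type number}})
in
    Typed
```

Each named step in the let expression corresponds to one entry in the Applied Steps pane, which is why the recipe can be replayed on refresh.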

If you regularly receive monthly versions of the same PDF, the refresh feature transforms your workflow. Save the original PDF to a stable file path or a SharePoint location, then edit the Source step in Power Query to point to that location. Next month when the new PDF arrives, save it over the previous file using the same name, return to Excel, and click Data then Refresh All. Power Query reruns every transformation step against the new file and updates your worksheet. This is the same principle behind dashboards that update overnight without human intervention.

For Excel on Mac, the Get Data From PDF option may not be available at all, depending on your version, because Power Query on Mac supports fewer connectors than on Windows. If it is missing, one fallback is to build the import in Excel on Windows, then save the workbook and open it on Mac. The loaded data travels with the workbook, but refreshing a PDF source may not work on Mac, so test it before relying on it. Alternatively, use a cloud service such as a Microsoft Power Automate flow that converts the PDF to CSV before Excel ever sees it. The principles of clean data and repeatable transformations remain identical regardless of which platform you start from.

One detail many users miss is that Power Query can also pull PDFs directly from a URL. Choose Get Data, From Other Sources, From Web, and paste a link to a publicly hosted PDF such as a government report or financial filing. Excel downloads the file in the background and presents the same Navigator dialog. This is especially useful for analysts who track quarterly investor presentations or regulatory filings, since each new release can be refreshed without manually downloading anything. Freezing the header row helps when you later scroll through long imported tables.
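In M terms, a URL import differs from a local one only in how the file bytes are fetched. The sketch below assumes a hypothetical URL and assumes the file contains a table that the engine labels Table001; both will differ for a real document.

```powerquery
let
    // Hypothetical URL; substitute a link to a publicly hosted PDF.
    Source = Pdf.Tables(Web.Contents("https://example.com/reports/q1-filing.pdf"), [Implementation = "1.3"]),
    // Pick one detected table by its Id; the Id depends on the file.
    FirstTable = Source{[Id = "Table001"]}[Data]
in
    FirstTable
```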

Performance varies with file size and complexity. A ten-page financial statement typically imports in under thirty seconds on a modern laptop, but a two-hundred-page audit report with hundreds of tables can take several minutes and consume significant memory. Close other workbooks during large imports, and consider loading data directly to the Power Pivot data model instead of a worksheet if the row count exceeds the worksheet limit of 1,048,576 rows. The data model has no fixed row limit (available memory is the practical constraint) and compresses data efficiently, which keeps your workbook responsive.


PDF Types and How Excel Handles Each

Text-Based PDF

A text-based PDF was generated digitally from a source application such as Word, Excel, or accounting software. The underlying file contains real text strings, font information, and positional coordinates, which means Power Query can read characters directly without any image recognition. These are the easiest PDFs to import and typically produce clean tables with minimal cleanup.

To confirm a PDF is text-based, open it in any reader and try selecting a word with your cursor. If the selection highlights individual characters cleanly and you can copy the text, you have a digital PDF. The Get Data From PDF connector will recognize tables automatically, and your import accuracy should approach ninety-five percent on well-structured documents like bank statements or tax forms.

Scanned Image PDF

A scanned PDF is essentially a photograph of a printed page wrapped in PDF packaging. The file contains image data, not text, so Power Query sees only pixels and cannot extract any meaningful structure. If you try to import one, Excel will either return an empty Navigator or surface page objects with no detected tables, leaving you with nothing usable.

The fix is optical character recognition, or OCR. Run the file through Adobe Acrobat Pro using Tools then Scan and OCR then Recognize Text, or upload it to Google Drive, right-click, and choose Open With Google Docs to extract text. Save the result as a new searchable PDF, then re-import using Power Query. Accuracy depends on scan quality, font clarity, and language support.

Hybrid PDF

Hybrid PDFs mix digital text with embedded images, often when someone scans a signed form that originally started as a digital document. These files behave unpredictably during import because some pages parse cleanly while others appear as blank pages or bitmap blocks. You may see partial tables, missing rows, or columns with garbled characters mixed with clean data.

The reliable solution is to flatten the file through OCR even if parts of it are already text. Acrobat has a Recognize Text option that processes only the image regions and leaves digital text untouched. Once flattened, the PDF behaves like a clean text-based file and Power Query handles it consistently. Always test imports page by page when working with hybrid documents.

Should You Use Power Query or Adobe Acrobat for PDF Imports?

Pros

  • Power Query is built into Microsoft 365 with no additional license cost
  • Transformations are recorded and replay automatically on refresh
  • Handles multi-table merges and complex column splits natively
  • Loads data directly into the Excel data model for big datasets
  • Works with PDFs stored locally, on SharePoint, or accessed via URL
  • Integrates seamlessly with pivot tables, Power BI, and dashboards
  • The M formula language supports custom functions for bespoke logic

Cons

  • Cannot read scanned image-based PDFs without external OCR
  • Mac and web versions of Excel still lag behind Windows features
  • Complex layouts with merged cells often need manual cleanup
  • Large PDFs over fifty megabytes can freeze the import dialog
  • No control over PDF passwords or encrypted document handling
  • Refresh fails with an error if the source file path changes
  • Learning curve for the Power Query Editor interface

Pre-Import Checklist for Importing PDF Into Excel

  • Verify the PDF is text-based by attempting to select and copy text inside it
  • Check that the file is not password protected or encrypted before importing
  • Confirm your Excel version is Microsoft 365 or 2021 with Power Query enabled
  • Save the PDF to a stable local or SharePoint path that will not move
  • Note the total number of pages and tables you expect to find
  • Close other large workbooks to free memory before starting the import
  • Decide in advance whether to load to a worksheet or the data model
  • Identify which columns will need data type changes after import
  • Plan how to handle subtotal and header rows that repeat across pages
  • Document the source URL or vendor for traceability and auditing
  • Test the import on one page before processing the entire document
  • Back up the original PDF in case you need to re-run the workflow later

Repeatability is the entire point

Every minute you spend cleaning imported PDF data should be reusable. Power Query records each transformation as a replayable step, so next month the same report imports cleanly with one click. Copy and paste forces you to redo every cleanup task from scratch, which is why finance teams standardize on the Get Data From PDF connector for any recurring file.

Once data lands in your worksheet, the real work begins. Even well-formed PDF tables typically arrive with several common issues that need attention before the numbers are trustworthy. The first is data types. Power Query often defaults numeric columns to text, which means SUM formulas return zero and pivot tables refuse to aggregate. Right-click each numeric column in the Power Query Editor, choose Change Type, and select Whole Number, Decimal Number, or Currency. Date columns need the same treatment or they will sort alphabetically instead of chronologically.

The second issue is duplicate header rows. Multi-page PDFs repeat their column headers on every page, so when you append the pages in Power Query those headers land as data rows throughout your dataset. Open the filter dropdown on the first column and uncheck the header text to drop every repeated header in one action. The step gets saved into your query and reapplies automatically the next time you refresh. Once the workflow is set up, this is dramatically faster than cleaning the rows by hand in the worksheet with Remove Duplicates.
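The header filter described above boils down to a single Table.SelectRows step. The sketch below uses a small inline table with made-up rows in place of real appended PDF pages, so the column names and values are purely illustrative.

```powerquery
let
    // Sample rows standing in for two appended PDF pages (illustrative data only).
    Appended = #table(
        {"Date", "Description", "Amount"},
        {
            {"2026-01-03", "Card payment", "42.10"},
            {"Date", "Description", "Amount"},   // header repeated from the second page
            {"2026-01-09", "Transfer", "250.00"}
        }
    ),
    // Drop any row whose first column holds the header text itself.
    NoRepeatedHeaders = Table.SelectRows(Appended, each [Date] <> "Date")
in
    NoRepeatedHeaders
```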

A third common cleanup is splitting combined fields. PDFs sometimes pack multiple data points into a single cell, such as a date and description joined by a space or a name followed by an employee ID in parentheses. Use Split Column by Delimiter in the Transform tab, choose space, comma, or a custom character, and Excel breaks the field into separate columns. You can then rename each new column with meaningful headers and adjust data types as needed for downstream analysis.
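As a rough illustration of the name-plus-ID case, the query below splits on the opening parenthesis and then trims the closing one. The sample names and IDs are invented for the example, and a real file may need a different delimiter.

```powerquery
let
    // Invented sample values in the "Name (ID)" shape described above.
    Source = #table({"Employee"}, {{"Ana Ruiz (E1042)"}, {"Ben Okafor (E2077)"}}),
    // Split on the opening parenthesis together with its leading space.
    Split = Table.SplitColumn(
        Source,
        "Employee",
        Splitter.SplitTextByDelimiter(" (", QuoteStyle.None),
        {"Name", "EmployeeId"}
    ),
    // Remove the trailing ")" left behind on the ID.
    Cleaned = Table.TransformColumns(Split, {{"EmployeeId", each Text.TrimEnd(_, ")"), type text}})
in
    Cleaned
```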

Footers and page numbers also love to sneak into your data. They typically appear as a row at the bottom of each imported page with values like Page 1 of 10 or Confidential and Proprietary. The cleanest fix is to add a filter step that excludes any row where a key column is null or where a specific text fragment appears. Power Query supports text-contains filters, custom row conditions written in M, and conditional column logic to handle even the messiest layouts. Note that M has no built-in regular expression functions, so pattern matching relies on text functions such as Text.Contains and Text.StartsWith.
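A footer filter of this kind might look like the sketch below. It assumes that real data rows always have a value in the Date column and that footers show up in the Description column; both are assumptions about the layout, and the rows are sample data.

```powerquery
let
    // Sample rows including two footer lines (illustrative data only).
    Source = #table(
        {"Date", "Description"},
        {
            {"2026-01-03", "Card payment"},
            {null, "Page 1 of 10"},
            {"2026-01-09", "Transfer"},
            {null, "Confidential and Proprietary"}
        }
    ),
    // Keep a row only if the key column is filled and no footer fragment appears.
    NoFooters = Table.SelectRows(
        Source,
        each [Date] <> null
            and not Text.StartsWith([Description], "Page ")
            and not Text.Contains([Description], "Confidential")
    )
in
    NoFooters
```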

Merged cells in the source PDF translate to repeated values or unexpected blanks in your import. If the original file used merged cells for category groupings such as a region name spanning several rows, those cells often appear only once with subsequent rows showing null. Use the Fill Down feature in Power Query under Transform then Fill then Down to propagate the value into every row beneath it. This single click can save twenty minutes of manual copying and is essential when preparing data for pivot tables or charts.
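The Fill Down command maps to Table.FillDown in M. The sketch below uses invented region and sales values to show the shape of the step.

```powerquery
let
    // A merged "Region" cell typically arrives as one value followed by nulls.
    Source = #table(
        {"Region", "Product", "Sales"},
        {
            {"North", "Widget", 120},
            {null, "Gadget", 80},
            {"South", "Widget", 95},
            {null, "Gadget", 60}
        }
    ),
    // Copy each region name into the null cells beneath it.
    Filled = Table.FillDown(Source, {"Region"})
in
    Filled
```

Table.FillDown only fills null cells. If the blanks arrive as empty text instead, replace them with null first using Replace Values.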

Currency and number formatting from international PDFs can cause headaches if your locale differs from the source. European formats use commas for decimals and periods for thousands, while American formats reverse the convention. Power Query lets you specify the locale used to parse each column when changing data types, ensuring that 1.234,56 from a German report becomes 1234.56 in your American workbook. Always check a few sample values after import to confirm the magnitudes match the original PDF before trusting the numbers in calculations.
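In M, the locale is passed as an optional culture argument to Table.TransformColumnTypes. The sketch below assumes German-formatted text values and uses made-up item names.

```powerquery
let
    // Amounts as they might arrive from a German report: text with comma decimals.
    Source = #table({"Item", "Amount"}, {{"Widget", "1.234,56"}, {"Gadget", "789,10"}}),
    // Parse the text with the "de-DE" culture so the comma is read as the decimal separator.
    Typed = Table.TransformColumnTypes(Source, {{"Amount", type number}}, "de-DE")
in
    Typed
```

The graphical equivalent is Change Type then Using Locale, which writes the same culture argument into the step.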

Finally, document your cleanup. Add a Description to each step in the Applied Steps pane by right-clicking and choosing Properties. Six months from now when someone else inherits the workbook, those descriptions explain why each transformation exists. Combined with sensible use of merged cells in the presentation layer, you build workbooks that are both robust against data changes and friendly to future maintainers, which is the hallmark of a professional Excel deliverable.

Automation is where importing PDFs becomes a force multiplier. If you process the same vendor invoices, bank statements, or regulatory filings every month, you can build a workflow that takes a folder of new PDFs and produces a clean consolidated worksheet in minutes. The foundation is Power Query's Folder connector. Choose Get Data, From File, From Folder, and point Excel at a directory containing all your PDFs. Excel returns a list of files, and you can apply a custom function that runs the same PDF import logic on each one.

The custom function pattern is straightforward once you see it. First, build a working Power Query that imports a single sample PDF and performs all the cleanup steps you need. Then in the Advanced Editor, convert the query into a function by adding a parameter for the file's binary content. Back in the folder query, click Add Column then Invoke Custom Function, choose your new function, and pass in the Content column, which holds each file as a binary. Excel processes every PDF, appends the results, and produces a single combined table ready for analysis.
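The pattern can be sketched in a single query, with the function defined inline for brevity. The function name ImportPdf and the folder path are hypothetical, and the cleanup inside the function is deliberately minimal; a real function would carry your own steps.

```powerquery
let
    // Hypothetical function: takes one file's binary content, returns its rows.
    ImportPdf = (content as binary) as table =>
        let
            Detected = Pdf.Tables(content, [Implementation = "1.3"]),
            TablesOnly = Table.SelectRows(Detected, each [Kind] = "Table"),
            Combined = Table.Combine(TablesOnly[Data])
        in
            Table.PromoteHeaders(Combined, [PromoteAllScalars = true]),
    // Hypothetical folder; point this at the directory holding your PDFs.
    Files = Folder.Files("C:\Reports\Invoices"),
    // Ignore anything in the folder that is not a PDF.
    PdfsOnly = Table.SelectRows(Files, each Text.Lower([Extension]) = ".pdf"),
    // Run the function on each file's binary Content column.
    WithData = Table.AddColumn(PdfsOnly, "Data", each ImportPdf([Content])),
    // Append every per-file table into one consolidated table.
    Result = Table.Combine(WithData[Data])
in
    Result
```

In practice the function usually lives in its own query so it can be tested on a single file before it is invoked across the folder.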

For teams that want to skip Excel for the initial parsing step entirely, Microsoft Power Automate offers a PDF action library that extracts text, tables, and form fields and writes them to Excel Online or SharePoint lists. Combined with email triggers, a flow can watch a mailbox for incoming invoices, extract the line items automatically, and append them to a master ledger. This eliminates manual handling for high-volume workflows and integrates with approval steps for finance compliance.

Another powerful pattern is combining PDF imports with named ranges and structured tables. Once your data lands in a worksheet, make sure it sits in a Table: Power Query loads its output as a Table by default, and any plain range can be converted with Ctrl plus T. Tables automatically expand when new data is added, formulas reference column names instead of cell coordinates, and pivot tables update their source ranges without manual intervention. This pairs especially well with refreshable Power Query connections because each refresh feeds the latest PDF data straight into a Table that all your downstream analysis depends on.

Privacy and data residency deserve careful attention when automating PDF imports. If you use online converters or cloud OCR services, your data may transit through third-party servers in unknown locations. For confidential financial or personally identifiable information, stick to fully local solutions like Power Query and Adobe Acrobat Pro installed on your own machine, or use enterprise services with signed data processing agreements. Most regulated industries explicitly forbid uploading client data to free online tools, so check your compliance policies first.

Excel for the web introduces additional considerations because some Power Query features are not yet fully supported in browser mode. If you build a workbook that imports PDFs and share it via OneDrive or SharePoint, colleagues opening it in Excel Online may see the cached data but cannot trigger a refresh themselves. The standard solution is to schedule refreshes through Power Automate or to ensure that anyone needing a fresh dataset opens the workbook in the desktop application. Power BI offers an even better path for shared dashboards built on PDF sources.

The skills you build importing PDFs transfer directly to other data sources. The same Power Query Editor handles CSV files, Excel workbooks, SQL databases, web pages, JSON feeds, and SharePoint lists. Once you understand the Applied Steps pattern, the function syntax, and the refresh model, you can connect Excel to virtually any data source in your organization. Combined with foundational skills explored in our standard deviation formula in Excel guide, PDF importing becomes one tool among many in a comprehensive data analysis toolkit.


Practical tips separate the experts from the beginners when working with PDF imports. The first tip is always test on a single page before processing an entire document. Select a single table in the Navigator and click Transform Data to open just that table in the Power Query Editor. Validate that the columns, headers, and data types come through correctly. Only after a single page imports cleanly should you go back and select the full set of pages for processing. This habit saves hours when a PDF turns out to have an unexpected layout change halfway through.

The second tip is to name your queries descriptively from the start. The default names like Table001 and Table002 are meaningless three months later when you reopen the workbook. In the Power Query Editor, right-click each query in the Queries pane and rename it to something like InvoiceLineItems or BankTransactions. This pays dividends every time you maintain the workbook and is especially important if you reference the query from formulas, pivot tables, or other queries via merges and appends.

The third tip concerns memory management. Power Query loads PDF data into memory during processing, and large files can consume several gigabytes briefly. If Excel hangs or crashes mid-import, restart with no other applications running, increase your virtual memory if possible, and consider splitting the PDF into smaller chunks using a free tool like PDFsam Basic. Importing five fifty-page PDFs separately and then appending the results is often more reliable than importing one two-hundred-fifty-page monster file in a single operation.

The fourth tip is to validate totals against the source. Once your data lands in Excel, compute a SUM of key numeric columns and compare it against the totals printed in the original PDF. Discrepancies usually point to header rows that snuck in as data, footer rows that contained subtotals, or rows that fell out due to merged cell handling. This sanity check takes ninety seconds and prevents the embarrassment of presenting analysis built on incomplete or duplicated data to stakeholders.
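The same check can be done inside Power Query with List.Sum. In the sketch below, the rows and the printed total are invented figures; in a real workbook the first step would reference your imported query, and you would type in the total shown on the PDF.

```powerquery
let
    // Stand-in for the imported query (illustrative figures only).
    Imported = #table(
        type table [Description = text, Amount = number],
        {{"Rent", 1200.00}, {"Utilities", 180.50}, {"Supplies", 95.25}}
    ),
    // Total as printed on the source PDF, entered by hand (assumed value).
    PrintedTotal = 1475.75,
    ImportedTotal = List.Sum(Imported[Amount]),
    // Round to cents so tiny floating-point differences are not reported.
    Difference = Number.Round(ImportedTotal - PrintedTotal, 2),
    Check = if Difference = 0
        then "Totals match"
        else "Mismatch of " & Number.ToText(Difference)
in
    Check
```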

The fifth tip is to leverage parameters for flexibility. Instead of hardcoding a file path or a date range in your Power Query, define a parameter and reference it throughout your queries. Then when you need to point at a different file or a different period, you change one parameter value and every query updates automatically. Parameters live under Home then Manage Parameters in the Power Query Editor and become indispensable as your workbooks grow more sophisticated.
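A query that uses a parameter simply refers to it by name. The sketch below assumes you have already created a text parameter called PdfPath (a hypothetical name) under Manage Parameters; it is not defined inside this query.

```powerquery
let
    // PdfPath is a hypothetical text parameter created via Home > Manage Parameters,
    // holding a value such as "C:\Reports\statement.pdf".
    Source = Pdf.Tables(File.Contents(PdfPath), [Implementation = "1.3"]),
    // Keep only the detected tables.
    TablesOnly = Table.SelectRows(Source, each [Kind] = "Table")
in
    TablesOnly
```

Changing the parameter value then repoints every query that references PdfPath.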

The sixth tip addresses error handling. Power Query offers a Replace Errors transformation that converts error cells into a value you specify, such as zero or null. Apply this to numeric columns that occasionally contain non-numeric stray characters from the PDF. The query continues to load successfully instead of failing on a single bad row. To audit which rows were affected, add a column that flags values that fail to convert before you replace the errors, because the errors are no longer visible afterward. This produces resilient workflows that survive minor variations in the source PDF format month to month.
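One way to combine the flag and the replacement is sketched below. The sample values, the AmountIsValid column name, and the en-US culture are assumptions for illustration.

```powerquery
let
    // Sample rows: one clean amount and one stray non-numeric value.
    Source = #table({"Item", "Amount"}, {{"Widget", "12.50"}, {"Gadget", "n/a"}}),
    // Flag first: "try" captures the conversion error instead of raising it.
    Flagged = Table.AddColumn(
        Source,
        "AmountIsValid",
        each not (try Number.FromText([Amount], "en-US"))[HasError],
        type logical
    ),
    // Converting the column turns unparseable cells into cell-level errors.
    Typed = Table.TransformColumnTypes(Flagged, {{"Amount", type number}}, "en-US"),
    // Replace those errors with null so the load succeeds.
    Cleaned = Table.ReplaceErrorValues(Typed, {{"Amount", null}})
in
    Cleaned
```

Filtering AmountIsValid for false afterward lists the rows that need a closer look against the source PDF.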

The seventh and final tip is to learn just enough M language to extend the standard transformations. M is the formula language behind Power Query, accessible through the Advanced Editor. With basic M skills you can write custom functions, build conditional logic that depends on file contents, and parameterize transformations in ways the graphical interface does not expose. Even a working knowledge of Text.Replace, List.Sum, and Table.SelectRows opens significant capabilities and transforms how you handle any data source, not just PDFs imported into Excel.


Excel Questions and Answers

How do I import a PDF into Excel without losing formatting?

Use the Get Data From PDF connector under the Data tab in Microsoft 365 or Excel 2021. Power Query detects tables automatically and presents a Navigator dialog where you preview each one. Click Transform Data instead of Load to enter the editor, where you can clean headers, fix data types, and remove footers before sending the data to your worksheet. The transformations save as a refreshable recipe.

Why does my PDF import show no tables in the Navigator?

This almost always means your PDF is scanned image data rather than digital text. Power Query cannot extract structure from images. Run the file through optical character recognition using Adobe Acrobat Pro, Google Drive's Open With Google Docs feature, or a dedicated OCR tool like ABBYY FineReader. Save the OCR result as a new searchable PDF, then re-import. Scan quality and font clarity directly affect accuracy.

Can I import a PDF into Excel on Mac?

Power Query in Excel for Mac supports fewer connectors than on Windows, and Get Data From PDF may not be available in your version. If the option is missing on your Mac, the workaround is to import the PDF on a Windows machine, save the workbook to OneDrive, and open it on Mac. The loaded data will be there, but refreshing a PDF source may not work on Mac, so test it before relying on the pipeline.

How do I import multiple PDFs at once into Excel?

Use Power Query's From Folder connector under Data, Get Data, From File, From Folder. Point it at a directory containing all your PDFs, then write a custom function that imports a single PDF with your cleanup steps. Invoke that function over the folder list using Add Column, Invoke Custom Function. Excel processes every file and appends the results into one consolidated table ready for analysis.

What is the maximum PDF file size Excel can import?

There is no hard limit, but practical performance degrades above fifty megabytes or two hundred pages. Power Query loads the entire PDF into memory during parsing, and large files can freeze Excel briefly. Split very large PDFs using a free tool like PDFsam Basic into smaller chunks, import each separately, then append the results. This approach is more reliable and provides better visibility into where any errors occur.

How do I refresh data when the PDF is updated each month?

Save the new PDF over the previous file using the same name and path. Open your Excel workbook and click Data then Refresh All. Power Query reruns every transformation step against the updated file and writes the new data into your worksheet. Schedule automated refreshes through Power Automate or Power BI if you need updates without opening the workbook manually. Stable file paths are essential for this workflow.

Why are numbers from my PDF coming in as text in Excel?

Power Query often defaults numeric columns to text when it sees mixed content, currency symbols, or thousands separators. Open the Power Query Editor, right-click each numeric column, choose Change Type, and select Whole Number, Decimal Number, or Currency. Specify a locale if the PDF uses European number formatting with commas as decimal separators. This conversion ensures SUM and pivot tables work correctly with the imported data.

Can I import a password-protected PDF into Excel?

No, Power Query cannot directly read encrypted or password-protected PDFs. You must remove the password first using Adobe Acrobat Pro or a tool like PDFtk if you have legal authorization to do so. Open the PDF, enter the password, and use Save As to create an unlocked copy. Then import the unlocked file using the standard Get Data From PDF workflow. Never attempt to bypass passwords on documents you do not own.

How accurate is Excel's PDF import for complex tables?

For text-based PDFs with clean column boundaries, accuracy approaches ninety-five percent. Complex layouts with merged cells, multi-line headers, or cells that span multiple pages drop accuracy significantly. In those cases, Adobe Acrobat Pro's Export to Excel often performs better because it uses tagged PDF structure when available. Always validate totals against the source PDF after import to catch alignment issues, dropped rows, or duplicated header data.

What is the difference between Tables and Pages in the Navigator?

Tables are data structures Power Query detected with rows, columns, and inferred headers, ready to load directly into a worksheet. Pages are unstructured text dumps of entire pages, useful when the engine fails to detect any table structure. Always prefer Tables when available. Use Pages only as a fallback to extract text from PDFs where automatic detection misses your data, knowing you will need significant manual parsing afterward.