PDF Extractor SDK for Windows software developers: PDF to Text, PDF to XML, Images from PDF, Read PDF information, PDF to CSV for Excel.
Bytescout PDF Extractor SDK lets you convert PDF to text, XML and CSV, extract images from PDF files, and read information about PDF files through .NET and ActiveX interfaces, with no additional software required.
converts PDF to plain text (and can follow columns if you are converting a newspaper in PDF format), including extraction of invisible text;
converts tables in PDF to CSV for Excel by reading cells from a given rectangle;
converts tables in PDF to XML files;
extracts PDF file metadata (title, author, description) and gets other information about the file (number of pages, encrypted or not);
extracts embedded images from PDF documents (in ASP.NET, VB.NET, C#, VB6 and VBScript);
provides DocumentMerger and DocumentSplitter interfaces and classes to merge and split PDF documents;
doesn't require Adobe Reader or any other PDF reader software to be installed;
provides .NET and ActiveX interfaces;
is written in 100% managed C# code.
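The table-to-CSV feature above is described as reading cells from a given rectangle. The general idea can be sketched in Python as follows. This is only an illustration under stated assumptions and is not the SDK's API: the Fragment type, the rectangle_to_csv function and the row_tolerance parameter are hypothetical names, and the text fragments with their coordinates are assumed to have been extracted from the page already.

```python
import csv
import io
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical text fragment: its text and top-left position in points."""
    text: str
    x: float
    y: float


def rectangle_to_csv(fragments, left, top, width, height, row_tolerance=2.0):
    """Return CSV text built from fragments whose origin lies inside the rectangle.

    Assumption: y grows downward, and fragments whose y values differ by at
    most row_tolerance belong to the same table row.
    """
    inside = [
        f for f in fragments
        if left <= f.x <= left + width and top <= f.y <= top + height
    ]
    # Order top-to-bottom, then left-to-right.
    inside.sort(key=lambda f: (f.y, f.x))

    # Group fragments into rows by comparing with the first fragment of the row.
    rows = []
    for frag in inside:
        if rows and abs(frag.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Within a row, cells are written in left-to-right order.
        writer.writerow([f.text for f in sorted(row, key=lambda f: f.x)])
    return out.getvalue()


# Made-up sample data: a header, a two-row table and a footer.
sample = [
    Fragment("Page header", 50, 20),
    Fragment("Item", 50, 100), Fragment("Qty", 150, 100.5),
    Fragment("Widget", 50, 120), Fragment("3", 150, 120),
    Fragment("Footer", 50, 700),
]
table_csv = rectangle_to_csv(sample, left=40, top=90, width=200, height=50)
```

Only the fragments inside the rectangle end up in the CSV, so the header and footer are dropped. A real extractor would also have to handle multi-line cells and column alignment, which this sketch ignores.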
Version 188.8.131.5279: Added filtering of extracted content by font name, font size and color.
Updated the OCR engine to the latest version. Update the language files from the 'tessdata' folder.
Improved text extraction, line grouping in tabular data, performance, XFA form extraction and TableDetector; fixed PDF parsing issues.