The page number "pno" is a 0-based integer `-∞ < pno < page_count`.
+.. note::
+
+   Most document methods (left column) exist for convenience reasons, and are just wrappers for: *Document[pno].<page method>*. So they **load and discard the page** on each execution.
+
+   However, the first two methods work differently. They only need a page's object definition statement - the page itself will **not** be loaded. So e.g. :meth:`Page.get_fonts` is a wrapper the other way round and defined as follows: `page.get_fonts` == `page.parent.get_page_fonts(page.number)`.
+
+
+When calling the :ref:`Document` equivalent methods then the page number is sent through as a parameter, e.g.:
+
+`Document.get_page_images(pno)` or `Document.get_page_text(pno)`
+
+.. tip::
+
+   The page number parameter, ``pno``, is a 0-based integer `-∞ < pno < page_count`.
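
As a minimal sketch of what the range `-∞ < pno < page_count` implies, the helper below maps any admissible page number to a 0-based index. The name ``normalize_pno`` is hypothetical and not part of the library. It assumes that negative numbers count backwards from the end of the document and may wrap around more than once.

```python
def normalize_pno(pno: int, page_count: int) -> int:
    """Map a page number in the range -inf < pno < page_count to a 0-based index.

    Hypothetical helper for illustration only. Assumption: negative values
    count backwards from the end and may wrap around more than once.
    """
    if page_count <= 0:
        raise ValueError("document has no pages")
    if pno >= page_count:
        raise ValueError(f"pno must be < page_count ({page_count})")
    # Python's modulo yields a non-negative result for a positive divisor,
    # which is the same as adding page_count until the value is >= 0.
    return pno % page_count


print(normalize_pno(0, 10))    # first page
print(normalize_pno(-1, 10))   # last page
print(normalize_pno(-11, 10))  # wraps around to the last page again
```

Values at or above ``page_count`` are rejected here because they fall outside the documented range.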
+
+
+
+
+
+Tables and Related Classes
+------------------------------------
+
+The `TableFinder` class is returned by :meth:`Page.find_tables` and has related classes as follows:
 
 
 .. class:: TableFinder
 
    An object always returned by :meth:`Page.find_tables`. Attributes of interest:
 
-   ... attribute:: tables
+   .. attribute:: tables
 
-      A list of :ref:`Table` objects, each of which represents a table found on the page. Empty list if no table found.
+      A list of :class:`Table` objects, each of which represents a table found on the page. An empty list if no tables are found.
 
-   ... attribute:: page
+   .. attribute:: page
 
       A reference to the :ref:`Page` object.
 
 
 .. class:: Table
 
-   An object representing a table found on the page. Attributes of interest:
+   An object representing a table found on the page.
+
+
+   .. attribute:: page
+
+      A description of the page instance for the table.
+
+      :type: `string`
+
+   .. attribute:: cells
+
+      An array of `Rect` objects for each cell in the table.
+
+      :type: list
+
+
+   .. attribute:: header
+
+      A `TableHeader` object if detected.
+
+      :type: `TableHeader`
+
 
    .. attribute:: bbox
 
       The bounding box of the table given as a tuple `(x0, y0, x1, y1)`. This is the rectangle that contains all cells of the table.
 
-
 
-   .. attribute:: cells
+      :type: :ref:`Rect`
+
+
+
+   .. attribute:: row_count
+
+      Number of rows in the table.
+
+      :type: int
+
+
+   .. attribute:: col_count
+
+      Number of columns in the table.
+
+      :type: int
+
+
+   .. attribute:: rows
+
+      An array of `TableRow` objects for each row in the table.
[...]
+      :arg bool clean: If ``True`` then markdown syntax is removed from cell content.
+      :arg bool fill_empty: If ``True`` then cell content `None` is replaced by the values above (columns) or left (rows) in an effort to approximate row and column spans.
+
+
+      :type: string
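
The effect described for ``fill_empty`` can be illustrated with a small standalone sketch. The helper ``fill_empty_cells`` is hypothetical and does not reproduce the library's code. It assumes that each `None` cell is first filled from its left neighbour within the row and then from the cell above within the column; the order of the two passes is an assumption made for this example.

```python
from typing import List, Optional

Grid = List[List[Optional[str]]]


def fill_empty_cells(rows: Grid) -> Grid:
    """Return a copy of `rows` with None cells filled from neighbours.

    Hypothetical helper: fills from the left first, then from above.
    The order of the two passes is an assumption made for this sketch.
    """
    filled = [list(row) for row in rows]  # do not mutate the caller's data
    # Pass 1: within each row, copy the value from the cell to the left.
    for row in filled:
        for i in range(1, len(row)):
            if row[i] is None:
                row[i] = row[i - 1]
    # Pass 2: within each column, copy the value from the cell above.
    for r in range(1, len(filled)):
        for c in range(len(filled[r])):
            if filled[r][c] is None and c < len(filled[r - 1]):
                filled[r][c] = filled[r - 1][c]
    return filled


table = [
    ["Region", None, "Total"],
    [None, "Q1", "10"],
    [None, "Q2", "12"],
]
for row in fill_empty_cells(table):
    print(row)
```

A cell stays `None` only if it has neither a filled neighbour to its left nor one above it.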
+
+
+   .. method:: to_pandas()
+
+      Return a `pandas <https://pypi.org/project/pandas/>`_ `DataFrame <https://pandas.pydata.org/docs/reference/frame.html>`_ version of the table.
+
+      :type: pandas DataFrame
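
To show how the attributes above fit together, here is a minimal sketch that walks the `tables` list of the object returned by :meth:`Page.find_tables` and reads `bbox`, `row_count` and `col_count` from each table. The function name ``summarize_tables`` is hypothetical. The stand-in objects exist only so the example runs without a PDF file; they are not library classes.

```python
from types import SimpleNamespace


def summarize_tables(page):
    """Collect bbox, row_count and col_count for every table found on `page`.

    `page` is expected to behave like a Page object, i.e. to offer
    find_tables() returning an object with a `tables` list.
    """
    finder = page.find_tables()
    return [
        {"bbox": tuple(t.bbox), "rows": t.row_count, "cols": t.col_count}
        for t in finder.tables
    ]


# Stand-in objects (assumptions for this sketch, not library classes).
fake_table = SimpleNamespace(bbox=(72.0, 100.0, 300.0, 180.0), row_count=3, col_count=2)
fake_page = SimpleNamespace(find_tables=lambda: SimpleNamespace(tables=[fake_table]))

print(summarize_tables(fake_page))
```

With the library installed, the same function should accept a real page in place of ``fake_page``. A page without tables yields an empty list, matching the description of `tables`.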
 
 
 
 .. class:: TableHeader
 
-.. class:: TableRow
 
+   Dedicated class for table headers.
 
+   .. attribute:: bbox
 
+      The bounding box of the header given as a tuple `(x0, y0, x1, y1)`. This is the rectangle that contains all cells of the header.
 
+      :type: `Rect`
 
-.. note::
+   .. attribute:: cells
 
-   Most document methods (left column) exist for convenience reasons, and are just wrappers for: *Document[pno].<page method>*. So they **load and discard the page** on each execution.
+      A list of tuples for each bbox of a column header.
+
+      :type: list
+
+   .. attribute:: names
+
+      A list of strings with column header text.
+
+      :type: list
+
+   .. attribute:: external
+
+      A boolean indicating whether the header is outside the table cells.
+
+      :type: `bool`
+
+
+.. class:: TableRow
+
+   Dedicated class for table rows.
+
+
+----
 
-   However, the first two methods work differently. They only need a page's object definition statement - the page itself will **not** be loaded. So e.g. :meth:`Page.get_fonts` is a wrapper the other way round and defined as follows: *page.get_fonts == page.parent.get_page_fonts(page.number)*.