Commit 07e262c

Documentation: Elaborates on Table related classes.
1 parent 4a53405 commit 07e262c

docs/page.rst

Lines changed: 129 additions & 16 deletions
@@ -2329,56 +2329,169 @@ This is an overview of homologous methods on the :ref:`Document` and on the :ref
23292329
====================================== =====================================
23302330
**Document Level** **Page Level**
23312331
====================================== =====================================
2332-
*Document.get_page_fonts(pno)* :meth:`Page.get_fonts`
2333-
*Document.get_page_images(pno)* :meth:`Page.get_images`
2334-
*Document.get_page_pixmap(pno, ...)* :meth:`Page.get_pixmap`
2335-
*Document.get_page_text(pno, ...)* :meth:`Page.get_text`
2336-
*Document.search_page_for(pno, ...)* :meth:`Page.search_for`
2332+
:meth:`Document.get_page_fonts` :meth:`Page.get_fonts`
2333+
:meth:`Document.get_page_images` :meth:`Page.get_images`
2334+
:meth:`Document.get_page_pixmap` :meth:`Page.get_pixmap`
2335+
:meth:`Document.get_page_text` :meth:`Page.get_text`
2336+
:meth:`Document.search_page_for` :meth:`Page.search_for`
23372337
====================================== =====================================
23382338

2339-
The page number "pno" is a 0-based integer `-∞ < pno < page_count`.
2339+
.. note::
2340+
2341+
Most document methods (left column) exist for convenience reasons, and are just wrappers for: *Document[pno].<page method>*. So they **load and discard the page** on each execution.
2342+
2343+
However, the first two methods work differently. They only need a page's object definition statement - the page itself will **not** be loaded. So e.g. :meth:`Page.get_fonts` is a wrapper the other way round and defined as follows: `page.get_fonts` == `page.parent.get_page_fonts(page.number)`.
2344+
2345+
2346+
When calling the equivalent :ref:`Document` methods, the page number is passed as a parameter, e.g.:
2347+
2348+
`Document.get_page_images(pno)` or `Document.get_page_text(pno)`
2349+
2350+
.. tip::
2351+
2352+
The page number parameter, ``pno``, is a 0-based integer `-∞ < pno < page_count`.
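
The accepted range can be illustrated with a small sketch. It assumes that negative page numbers count backwards from the end of the document, as in Python sequence indexing; ``normalize_pno`` is a hypothetical helper written for this illustration, not a library function.

```python
def normalize_pno(pno: int, page_count: int) -> int:
    """Map any integer pno < page_count to a 0-based page index.

    Hypothetical helper: it assumes negative numbers wrap around from
    the end of the document (-1 is the last page).
    """
    if page_count <= 0:
        raise ValueError("document has no pages")
    if pno >= page_count:
        raise ValueError("pno must be less than page_count")
    # Python's modulo is non-negative for a positive divisor,
    # so -1 becomes page_count - 1, -page_count becomes 0, and so on.
    return pno % page_count
```

With a five-page document, ``normalize_pno(-1, 5)`` yields the last page index and ``normalize_pno(5, 5)`` raises ``ValueError``.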
2353+
2354+
2355+
2356+
2357+
2358+
Tables and Related Classes
2359+
------------------------------------
2360+
2361+
A `TableFinder` object is returned by :meth:`Page.find_tables`. It and its related classes are described below:
23402362

23412363

23422364
.. class:: TableFinder
23432365

23442366
An object always returned by :meth:`Page.find_tables`. Attributes of interest:
23452367

2346-
... attribute:: tables
2368+
.. attribute:: tables
23472369

2348-
A list of :ref:`Table` objects, each of which represents a table found on the page. Empty list if no table found.
2370+
A list of :class:`Table` objects, each of which represents a table found on the page. An empty list if no tables are found.
23492371

2350-
... attribute:: page
2372+
.. attribute:: page
23512373

23522374
A reference to the :ref:`Page` object.
23532375

23542376

23552377
.. class:: Table
23562378

2357-
An object representing a table found on the page. Attributes of interest:
2379+
An object representing a table found on the page.
2380+
2381+
2382+
.. attribute:: page
2383+
2384+
A description of the page instance for the table.
2385+
2386+
:type: `string`
2387+
2388+
.. attribute:: cells
2389+
2390+
An array of `Rect` objects for each cell in the table.
2391+
2392+
:type: list
2393+
2394+
2395+
.. attribute:: header
2396+
2397+
A `TableHeader` object if detected.
2398+
2399+
:type: `TableHeader`
2400+
23582401

23592402
.. attribute:: bbox
23602403

23612404
The bounding box of the table given as a tuple `(x0, y0, x1, y1)`. This is the rectangle that contains all cells of the table.
23622405

2363-
23642406

2365-
.. attribute:: cells
2407+
:type: :ref:`Rect`
2408+
2409+
2410+
2411+
.. attribute:: row_count
2412+
2413+
Number of rows in the table.
2414+
2415+
:type: int
2416+
2417+
2418+
.. attribute:: col_count
2419+
2420+
Number of columns in the table.
2421+
2422+
:type: int
2423+
2424+
2425+
.. attribute:: rows
2426+
2427+
An array of `TableRow` objects for each row in the table.
2428+
2429+
:type: list
2430+
2431+
2432+
.. method:: extract()
2433+
2434+
Extracts table data into a list.
2435+
2436+
:type: list
2437+
2438+
.. method:: to_markdown(clean=False, fill_empty=True)
2439+
2440+
Returns the table data as a string in markdown format.
2441+
2442+
2443+
:arg bool clean: If ``True`` then markdown syntax is removed from cell content.
2444+
:arg bool fill_empty: If ``True`` then cell content `None` is replaced by the value above (columns) or to the left (rows) in an effort to approximate row and column spans.
2445+
2446+
2447+
:type: string
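
The effect of ``fill_empty`` can be pictured with a minimal sketch. It is an illustration only, not the library's implementation: the helper name ``fill_empty_cells`` and the order of the two passes (rows first, then columns) are assumptions.

```python
def fill_empty_cells(rows):
    """Return a copy of `rows` with None cells filled from neighbours.

    Hypothetical helper: first copy values left to right within each row,
    then top to bottom within each column, approximating spanned cells.
    """
    cells = [list(row) for row in rows]  # do not mutate the caller's data
    n_rows = len(cells)
    n_cols = len(cells[0]) if cells else 0

    # Row pass: a None cell takes the value of its left neighbour.
    for r in range(n_rows):
        for c in range(1, n_cols):
            if cells[r][c] is None:
                cells[r][c] = cells[r][c - 1]

    # Column pass: a None cell takes the value of the cell above it.
    for c in range(n_cols):
        for r in range(1, n_rows):
            if cells[r][c] is None:
                cells[r][c] = cells[r - 1][c]
    return cells
```

Cells that have no filled neighbour to the left or above remain `None`.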
2448+
2449+
2450+
.. method:: to_pandas()
2451+
2452+
Return a `pandas <https://pypi.org/project/pandas/>`_ `DataFrame <https://pandas.pydata.org/docs/reference/frame.html>`_ version of the table.
2453+
2454+
:type: pandas DataFrame
23662455

23672456

23682457

23692458
.. class:: TableHeader
23702459

2371-
.. class:: TableRow
23722460

2461+
Dedicated class for table headers.
23732462

2463+
.. attribute:: bbox
23742464

2465+
The bounding box of the header given as a tuple `(x0, y0, x1, y1)`. This is the rectangle that contains all cells of the header.
23752466

2467+
:type: `Rect`
23762468

2377-
.. note::
2469+
.. attribute:: cells
23782470

2379-
Most document methods (left column) exist for convenience reasons, and are just wrappers for: *Document[pno].<page method>*. So they **load and discard the page** on each execution.
2471+
A list of tuples for each bbox of a column header.
2472+
2473+
:type: list
2474+
2475+
.. attribute:: names
2476+
2477+
A list of strings with column header text.
2478+
2479+
:type: list
2480+
2481+
.. attribute:: external
2482+
2483+
A boolean indicating whether the header is outside the table cells.
2484+
2485+
:type: `bool`
2486+
2487+
2488+
.. class:: TableRow
2489+
2490+
Dedicated class for table rows.
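
As a rough sketch of how the classes above fit together, the following function collects a few of the documented attributes for every table on a page. The function name ``summarize_tables`` and the file name in the usage comment are placeholders, and the code assumes ``header`` may be absent when no header is detected.

```python
def summarize_tables(page):
    """Return one summary dict per table reported by page.find_tables()."""
    finder = page.find_tables()  # a TableFinder object
    summaries = []
    for table in finder.tables:  # an empty list if no tables are found
        header = table.header  # assumed to be falsy if none was detected
        summaries.append(
            {
                "bbox": tuple(table.bbox),
                "rows": table.row_count,
                "cols": table.col_count,
                "header": list(header.names) if header else [],
                "data": table.extract(),
            }
        )
    return summaries


# Possible use, assuming PyMuPDF is installed and "input.pdf" exists:
#   import pymupdf
#   doc = pymupdf.open("input.pdf")
#   for summary in summarize_tables(doc[0]):
#       print(summary["rows"], summary["cols"], summary["header"])
```

The function only relies on the attributes and methods listed in this section, so it can also be tried with a simple stand-in object in place of a real page.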
2491+
2492+
2493+
----
23802494

2381-
However, the first two methods work differently. They only need a page's object definition statement - the page itself will **not** be loaded. So e.g. :meth:`Page.get_fonts` is a wrapper the other way round and defined as follows: *page.get_fonts == page.parent.get_page_fonts(page.number)*.
23822495

23832496
.. rubric:: Footnotes
23842497
