Skip to content

Commit 8d196e6

Browse files
committed
GH-546: Migrate Sphinx documentation
Fixes #546.
1 parent 7cfc89d commit 8d196e6

29 files changed

+5095
-0
lines changed

.github/workflows/rc.yml

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -412,6 +412,38 @@ jobs:
412412
with:
413413
name: release-docs
414414
path: docs.tar.gz
415+
docs:
416+
name: Docs
417+
needs:
418+
- source
419+
runs-on: ubuntu-latest
420+
permissions:
421+
contents: read
422+
packages: write
423+
steps:
424+
- uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
425+
with:
426+
cache: 'pip'
427+
- name: Install Sphinx
428+
run: pip install -r docs/requirements.txt
429+
- name: Download source archive
430+
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
431+
with:
432+
name: release-source
433+
- name: Extract source archive
434+
run: |
435+
tar -xf apache-arrow-java-*.tar.gz --strip-components=1
436+
- name: Build
437+
run: |
438+
cd docs
439+
make html
440+
- name: Compress into single artifact to keep directory structure
441+
run: tar -cvzf docs.tar.gz -C docs/build html
442+
- name: Upload artifacts
443+
uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08 # v4.6.0
444+
with:
445+
name: release-sphinx-docs
446+
path: docs.tar.gz
415447
verify:
416448
name: Verify
417449
needs:
@@ -452,6 +484,7 @@ jobs:
452484
name: Upload
453485
if: github.ref_type == 'tag'
454486
needs:
487+
- docs
455488
- verify
456489
runs-on: ubuntu-latest
457490
permissions:

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@
2020
/dev/release/apache-rat-0.16.1.jar
2121
/dev/release/filtered_rat.txt
2222
/dev/release/rat.xml
23+
/docs/build/
2324
CMakeCache.txt
2425
CMakeFiles/
2526
Makefile

docs/README.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
<!---
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# Documentation
21+
22+
Build with Sphinx.
23+
24+
```
25+
cd docs
26+
pip install -r requirements.txt
27+
make html
28+
```

docs/requirements.txt

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
# Licensed to the Apache Software Foundation (ASF) under one
2+
# or more contributor license agreements. See the NOTICE file
3+
# distributed with this work for additional information
4+
# regarding copyright ownership. The ASF licenses this file
5+
# to you under the Apache License, Version 2.0 (the
6+
# "License"); you may not use this file except in compliance
7+
# with the License. You may obtain a copy of the License at
8+
#
9+
# http://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# Unless required by applicable law or agreed to in writing,
12+
# software distributed under the License is distributed on an
13+
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
14+
# KIND, either express or implied. See the License for the
15+
# specific language governing permissions and limitations
16+
# under the License.
17+
18+
furo==2024.8.6
19+
myst-parser==4.0.0
20+
Sphinx==8.1.3
21+
sphinx-autobuild==2024.10.3
22+
sphinx-basic-ng==1.0.0b2
23+
sphinxcontrib-applehelp==2.0.0
24+
sphinxcontrib-devhelp==2.0.0
25+
sphinxcontrib-htmlhelp==2.1.0
26+
sphinxcontrib-jsmath==1.0.1
27+
sphinxcontrib-qthelp==2.0.0
28+
sphinxcontrib-serializinghtml==2.0.0

docs/source/algorithm.rst

Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
.. Licensed to the Apache Software Foundation (ASF) under one
2+
.. or more contributor license agreements. See the NOTICE file
3+
.. distributed with this work for additional information
4+
.. regarding copyright ownership. The ASF licenses this file
5+
.. to you under the Apache License, Version 2.0 (the
6+
.. "License"); you may not use this file except in compliance
7+
.. with the License. You may obtain a copy of the License at
8+
9+
.. http://www.apache.org/licenses/LICENSE-2.0
10+
11+
.. Unless required by applicable law or agreed to in writing,
12+
.. software distributed under the License is distributed on an
13+
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
14+
.. KIND, either express or implied. See the License for the
15+
.. specific language governing permissions and limitations
16+
.. under the License.
17+
18+
Java Algorithms
19+
===============
20+
21+
Arrow's Java library provides algorithms for some commonly-used
22+
functionalities. The algorithms are provided in the ``org.apache.arrow.algorithm``
23+
package of the ``algorithm`` module.
24+
25+
Comparing Vector Elements
26+
-------------------------
27+
28+
Comparing vector elements is the basic for many algorithms. Vector
29+
elements can be compared in one of the two ways:
30+
31+
1. **Equality comparison**: there are two possible results for this type of comparisons: ``equal`` and ``unequal``.
32+
Currently, this type of comparison is supported through the ``org.apache.arrow.vector.compare.VectorValueEqualizer``
33+
interface.
34+
35+
2. **Ordering comparison**: there are three possible results for this type of comparisons: ``less than``, ``equal to``
36+
and ``greater than``. This comparison is supported by the abstract class ``org.apache.arrow.algorithm.sort.VectorValueComparator``.
37+
38+
We provide default implementations to compare vector elements. However, users can also define ways
39+
for customized comparisons.
40+
41+
Vector Element Search
42+
---------------------
43+
44+
A search algorithm tries to find a particular value in a vector. When successful, a vector index is
45+
returned; otherwise, a ``-1`` is returned. The following search algorithms are provided:
46+
47+
1. **Linear search**: this algorithm simply traverses the vector from the beginning, until a match is
48+
found, or the end of the vector is reached. So it takes ``O(n)`` time, where ``n`` is the number of elements
49+
in the vector. This algorithm is implemented in ``org.apache.arrow.algorithm.search.VectorSearcher#linearSearch``.
50+
51+
2. **Binary search**: this represents a more efficient search algorithm, as it runs in ``O(log(n))`` time.
52+
However, it is only applicable to sorted vectors. To get a sorted vector,
53+
one can use one of our sorting algorithms, which will be discussed in the next section. This algorithm
54+
is implemented in ``org.apache.arrow.algorithm.search.VectorSearcher#binarySearch``.
55+
56+
3. **Parallel search**: when the vector is large, it takes a long time to traverse the elements to search
57+
for a value. To make this process faster, one can split the vector into multiple partitions, and perform the
58+
search for each partition in parallel. This is supported by ``org.apache.arrow.algorithm.search.ParallelSearcher``.
59+
60+
4. **Range search**: for many scenarios, there can be multiple matching values in the vector.
61+
If the vector is sorted, the matching values reside in a contiguous region in the vector. The
62+
range search algorithm tries to find the upper/lower bound of the region in ``O(log(n))`` time.
63+
An implementation is provided in ``org.apache.arrow.algorithm.search.VectorRangeSearcher``.
64+
65+
Vector Sorting
66+
--------------
67+
68+
Given a vector, a sorting algorithm turns it into a sorted one. The sorting criteria must
69+
be specified by some ordering comparison operation. The sorting algorithms can be
70+
classified into the following categories:
71+
72+
1. **In-place sorter**: an in-place sorter performs the sorting by manipulating the original
73+
vector, without creating any new vector. So it just returns the original vector after the sorting operations.
74+
Currently, we have ``org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter`` for in-place
75+
sorting in ``O(nlog(n))`` time. As the name suggests, it only supports fixed width vectors.
76+
77+
2. **Out-of-place sorter**: an out-of-place sorter does not mutate the original vector. Instead,
78+
it copies vector elements to a new vector in sorted order, and returns the new vector.
79+
We have ``org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter.FixedWidthOutOfPlaceVectorSorter``
80+
and ``org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter.VariableWidthOutOfPlaceVectorSorter``
81+
for fixed width and variable width vectors, respectively. Both algorithms run in ``O(nlog(n))`` time.
82+
83+
3. **Index sorter**: this sorter does not actually sort the vector. Instead, it returns an integer
84+
vector, which correspond to indices of vector elements in sorted order. With the index vector, one can
85+
easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k`` th
86+
smallest value in the vector. Index sorting is supported by ``org.apache.arrow.algorithm.sort.IndexSorter``,
87+
which runs in ``O(nlog(n))`` time. It is applicable to vectors of any type.
88+
89+
Other Algorithms
90+
----------------
91+
92+
Other algorithms include vector deduplication, dictionary encoding, etc., in the ``algorithm`` module.

0 commit comments

Comments
 (0)