
Version: 0.7.0.dev132 


Metadata-Version: 2.1
Name: nanoarrow
Version: 0.7.0.dev132
Summary: Python bindings to the nanoarrow C library
Author-email: Apache Arrow Developers <dev@arrow.apache.org>
Maintainer-email: Apache Arrow Developers <dev@arrow.apache.org>
License: Apache-2.0
Project-URL: Homepage, https://arrow.apache.org
Project-URL: Repository, https://github.com/apache/arrow-nanoarrow
Project-URL: Issues, https://github.com/apache/arrow-nanoarrow/issues
Project-URL: Changelog, https://github.com/apache/arrow-nanoarrow/blob/main/CHANGELOG.md
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: test
Requires-Dist: pyarrow; extra == "test"
Requires-Dist: python-dateutil; extra == "test"
Requires-Dist: pytest; extra == "test"
Requires-Dist: numpy; extra == "test"
Provides-Extra: verify
Requires-Dist: python-dateutil; extra == "verify"
Requires-Dist: pytest; extra == "verify"
Requires-Dist: numpy; extra == "verify"

<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->

<!-- Render with jupyter nbconvert --to markdown README.ipynb -->

# nanoarrow for Python

The nanoarrow Python package provides bindings to the nanoarrow C library. Like
the nanoarrow C library, it provides tools to facilitate the use of the
[Arrow C Data](https://arrow.apache.org/docs/format/CDataInterface.html)
and [Arrow C Stream](https://arrow.apache.org/docs/format/CStreamInterface.html)
interfaces.

## Installation

The nanoarrow Python bindings are available from [PyPI](https://pypi.org/) and
[conda-forge](https://conda-forge.org/):

```shell
# From PyPI
pip install nanoarrow
# Or from conda-forge
conda install nanoarrow -c conda-forge
```

Development versions (based on the `main` branch) are also available:

```shell
pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ \
    --prefer-binary --pre nanoarrow
```

If you can import the namespace, you're good to go!


```python
import nanoarrow as na
```

## Data types, arrays, and array streams

The Arrow C Data and Arrow C Stream interfaces consist of three structures: the `ArrowSchema`, which represents the data type of an array; the `ArrowArray`, which represents the values of an array; and the `ArrowArrayStream`, which represents zero or more `ArrowArray`s with a common `ArrowSchema`. These concepts map to `nanoarrow.Schema`, `nanoarrow.Array`, and `nanoarrow.ArrayStream` in the Python package.


```python
na.int32()
```




    <Schema> int32




```python
na.Array([1, 2, 3], na.int32())
```




    nanoarrow.Array<int32>[3]
    1
    2
    3



A `nanoarrow.Array` can hold any number of chunks, reflecting the fact that many array containers (e.g., `pyarrow.ChunkedArray`, `polars.Series`) store their data in chunks.


```python
chunked = na.Array.from_chunks([[1, 2, 3], [4, 5, 6]], na.int32())
chunked
```




    nanoarrow.Array<int32>[6]
    1
    2
    3
    4
    5
    6

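Chunking means a global element index has to be mapped to a chunk before it can be read. The sketch below shows one common way to do that with a cumulative-offset table and binary search; it is an illustrative pure-Python model, not nanoarrow's implementation, and `locate()` is a hypothetical helper:

```python
# Illustrative model only: how a global index can map onto chunks.
from bisect import bisect_right
from itertools import accumulate


def locate(chunk_lengths, i):
    """Map a global index i to (chunk_index, index_within_chunk)."""
    # offsets[k] is the global index of the first element of chunk k
    offsets = [0, *accumulate(chunk_lengths)]
    if not 0 <= i < offsets[-1]:
        raise IndexError(i)
    # The last offset <= i identifies the chunk (empty chunks are skipped)
    k = bisect_right(offsets, i) - 1
    return k, i - offsets[k]


chunks = [[1, 2, 3], [4, 5, 6]]
k, j = locate([len(c) for c in chunks], 4)
print(k, j, chunks[k][j])  # 1 1 5
```

Binary search keeps each lookup at O(log n) in the number of chunks, which matters when a container holds many small chunks.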


Whereas the chunks of an `Array` are always fully materialized when the object is constructed, the chunks of an `ArrayStream` may not have been resolved yet and may only be produced as the stream is consumed.


```python
stream = na.ArrayStream(chunked)
stream
```




    nanoarrow.ArrayStream<int32>




```python
with stream:
    # Iterating consumes the stream; leaving the block releases it
    for chunk in stream:
        print(chunk)
```

    nanoarrow.Array<int32>[3]
    1
    2
    3
    nanoarrow.Array<int32>[3]
    4
    5
    6

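As a rough pure-Python analogy (a toy model, not how nanoarrow implements `ArrayStream`), a stream behaves like a one-shot iterator that produces each chunk only when asked and is released when the `with` block exits. `LazyStream` and `make_chunks` are hypothetical names used only for illustration:

```python
# Toy model of a lazy, one-shot stream; not nanoarrow's implementation.
class LazyStream:
    """A one-shot stream of chunks that share a single schema."""

    def __init__(self, schema, make_chunks):
        self.schema = schema
        self._chunks = make_chunks()  # a generator: nothing is computed yet
        self.released = False
        self.produced = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.released:
            raise RuntimeError("stream already released")
        chunk = next(self._chunks)  # StopIteration ends the stream
        self.produced += 1
        return chunk

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Release resources held by the producer
        self._chunks.close()
        self.released = True
        return False


def make_chunks():
    # Each chunk is only built when the consumer asks for it
    for start in (1, 4):
        yield list(range(start, start + 3))


stream = LazyStream("int32", make_chunks)
print(stream.produced)  # 0
with stream:
    for chunk in stream:
        print(chunk)
# [1, 2, 3]
# [4, 5, 6]
```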

The `nanoarrow.ArrayStream` also provides an interface to nanoarrow's [Arrow IPC](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc) reader:


```python
url = "https://github.com/apache/arrow-experiments/raw/main/data/arrow-commits/arrow-commits.arrows"
na.ArrayStream.from_url(url)
```




    nanoarrow.ArrayStream<non-nullable struct<commit: string, time: timestamp('us', 'UTC'), files: int3...>



These objects implement the [Arrow PyCapsule interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) for both producing and consuming and are interchangeable with `pyarrow` objects in many cases:


```python
import pyarrow as pa

pa.field(na.int32())
```




    pyarrow.Field<: int32>




```python
pa.chunked_array(chunked)
```




    <pyarrow.lib.ChunkedArray object at 0x12a49a250>
    [
      [
        1,
        2,
        3
      ],
      [
        4,
        5,
        6
      ]
    ]




```python
pa.array(chunked.chunk(1))
```




    <pyarrow.lib.Int32Array object at 0x11b552500>
    [
      4,
      5,
      6
    ]




```python
na.Array(pa.array([10, 11, 12]))
```




    nanoarrow.Array<int64>[3]
    10
    11
    12




```python
na.Schema(pa.string())
```




    <Schema> string

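This interchange works because producers and consumers agree on a few dunder methods defined by the PyCapsule interface: `__arrow_c_schema__`, `__arrow_c_array__`, and `__arrow_c_stream__`. As a rough sketch, a consumer might check which of them an object offers before deciding how to import it. `arrow_protocols()` and `OnlySchema` are hypothetical names, not part of nanoarrow or pyarrow:

```python
# Hypothetical helper: inspect which Arrow PyCapsule methods an object exposes.
PROTOCOLS = ("__arrow_c_stream__", "__arrow_c_array__", "__arrow_c_schema__")


def arrow_protocols(obj):
    """Return the Arrow PyCapsule protocol methods that obj exposes."""
    return [name for name in PROTOCOLS if callable(getattr(obj, name, None))]


class OnlySchema:
    # A real producer returns a PyCapsule named "arrow_schema" here
    def __arrow_c_schema__(self):
        raise NotImplementedError


print(arrow_protocols(OnlySchema()))  # ['__arrow_c_schema__']
print(arrow_protocols([1, 2, 3]))     # []
```

Because the check is duck-typed, any library can participate without importing nanoarrow or pyarrow.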


## Low-level C library bindings

The nanoarrow Python package also provides lower-level wrappers around the Arrow C interface structures. You can create these using `nanoarrow.c_schema()`, `nanoarrow.c_array()`, and `nanoarrow.c_array_stream()`.

### Schemas

Use `nanoarrow.c_schema()` to convert an object to an `ArrowSchema` and wrap it as a Python object. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) (e.g., `pyarrow.Schema`, `pyarrow.DataType`, and `pyarrow.Field`).


```python
na.c_schema(pa.decimal128(10, 3))
```




    <nanoarrow.c_schema.CSchema decimal128(10, 3)>
    - format: 'd:10,3'
    - name: ''
    - flags: 2
    - metadata: NULL
    - dictionary: NULL
    - children[0]:

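The `format` and `flags` fields follow the Arrow C data interface specification: `d:10,3` encodes a decimal128 with precision 10 and scale 3, and `flags: 2` is the `ARROW_FLAG_NULLABLE` bit. A small pure-Python decoder, for illustration only (`Schema()` already does this for you, and `parse_decimal_format()` and `decode_flags()` are hypothetical helpers), could look like:

```python
# Illustrative decoder for two ArrowSchema fields; not part of nanoarrow.

# Flag bits defined by the Arrow C data interface specification
ARROW_FLAG_DICTIONARY_ORDERED = 1
ARROW_FLAG_NULLABLE = 2
ARROW_FLAG_MAP_KEYS_SORTED = 4


def decode_flags(flags):
    """Return the names of the flag bits set in an ArrowSchema.flags value."""
    names = {
        ARROW_FLAG_DICTIONARY_ORDERED: "dictionary_ordered",
        ARROW_FLAG_NULLABLE: "nullable",
        ARROW_FLAG_MAP_KEYS_SORTED: "map_keys_sorted",
    }
    return [name for bit, name in names.items() if flags & bit]


def parse_decimal_format(fmt):
    """Parse 'd:P,S' or 'd:P,S,BITWIDTH' into (precision, scale, bitwidth)."""
    if not fmt.startswith("d:"):
        raise ValueError(f"not a decimal format string: {fmt!r}")
    parts = [int(p) for p in fmt[2:].split(",")]
    precision, scale = parts[0], parts[1]
    # Without an explicit bitwidth the type is decimal128
    bitwidth = parts[2] if len(parts) > 2 else 128
    return precision, scale, bitwidth


print(parse_decimal_format("d:10,3"))  # (10, 3, 128)
print(decode_flags(2))                 # ['nullable']
```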


Using `c_schema()` is a good fit for testing and for ephemeral schema objects that are being passed from one library to another. To access a schema's properties (such as a decimal type's precision and scale) in a more convenient form, use `Schema()`:


```python
schema = na.Schema(pa.decimal128(10, 3))
schema.precision, schema.scale
```




    (10, 3)



The `CSchema` object cleans up after itself: when the object is deleted, the underlying `ArrowSchema` is released.

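Under the C data interface, releasing an `ArrowSchema` means invoking its `release` callback exactly once. The pure-Python sketch below models that ownership pattern with `weakref.finalize`; `Owned` is a hypothetical stand-in that appends to a log instead of calling a real C callback, and the immediate cleanup on `del` assumes CPython's reference counting:

```python
# Toy model of "cleans up after itself"; not nanoarrow's implementation.
import weakref


class Owned:
    """Stand-in for a wrapper that owns a C structure with a release callback."""

    def __init__(self, name, log):
        self.name = name
        # Runs once: on explicit release, at garbage collection, or at exit
        self._finalizer = weakref.finalize(self, log.append, f"released {name}")

    def release(self):
        self._finalizer()  # idempotent: later calls are no-ops


log = []
schema = Owned("schema", log)
del schema  # in CPython the object is freed as soon as its refcount hits zero
print(log)  # ['released schema']
```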
### Arrays

You can use `nanoarrow.c_array()` to convert an array-like object to an `ArrowArray`, wrap it as a Python object, and attach a schema that can be used to interpret its contents. This works for any object implementing the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) (e.g., `pyarrow.Array`, `pyarrow.RecordBatch`).


```python
na.c_array(["one", "two", "three", None], na.string())
```




    <nanoarrow.c_array.CArray string>
    - length: 4
    - offset: 0
    - null_count: 1
    - buffers: (4754305168, 4754307808, 4754310464)
    - dictionary: NULL
    - children[0]:



Using `c_array()` is a good fit for testing and for ephemeral array objects that are being passed from one library to another. For a higher level interface, use `Array()`:


```python
array = na.Array(["one", "two", "three", None], na.string())
array.to_pylist()
```




    ['one', 'two', 'three', None]




```python
array.buffers
```

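The `CArray` repr above lists three buffer addresses for the string array. Under the Arrow columnar format these are a validity bitmap, 32-bit offsets, and UTF-8 character data. The sketch below builds and decodes that layout with plain Python objects to show what each buffer holds; it does not call nanoarrow, and `string_buffers()` and `to_pylist()` here are hypothetical helpers:

```python
# Illustrative model of the Arrow string layout; not nanoarrow's implementation.
import struct


def string_buffers(values):
    """Build validity, offsets, and data buffers for a string array."""
    validity = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)  # least-significant bit first
            data += value.encode("utf-8")
        offsets.append(len(data))  # a null slot repeats the previous offset
    packed = struct.pack(f"<{len(offsets)}i", *offsets)  # little-endian int32
    return bytes(validity), packed, bytes(data)


def to_pylist(validity, offsets_buf, data, length):
    """Decode the three buffers back into a list of str or None."""
    offsets = struct.unpack(f"<{length + 1}i", offsets_buf)
    out = []
    for i in range(length):
        if validity[i // 8] & (1 << (i % 8)):
            out.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            out.append(None)
    return out


values = ["one", "two", "three", None]
validity, offsets, data = string_buffers(values)
print(validity, struct.unpack("<5i", offsets), data)
# b'\x07' (0, 3, 6, 11, 11) b'onetwothree'
print(to_pylist(validity, offsets, data, len(values)))
# ['one', 'two', 'three', None]
```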