lakeFS High-Level Python SDK

The lakeFS High-Level SDK for Python provides developers with the following features:

  1. Simpler programming interface with less configuration

  2. Inferring identity from environment

  3. Better abstractions for common, more complex operations (I/O, transactions, imports)

Requirements

Python 3.9+

Installation & Usage

pip install

pip install lakefs

Import the package

import lakefs

Getting Started

After installing the package, use the following snippet for a quick start:


import lakefs
from lakefs.client import Client

# The default client attempts to authenticate with the lakeFS server using configured credentials,
# taken from environment variables or a .lakectl.yaml file, if either exists
repo = lakefs.repository(repository_id="my-repo")

# Or explicitly initialize and provide a Client object
clt = Client(username="<lakefs_access_key_id>", password="<lakefs_secret_access_key>", host="<lakefs_endpoint>")
repo = lakefs.Repository(repository_id="my-repo", client=clt)

# From this point, proceed using the package according to the documentation,
# for example by creating the repository and getting its main branch
main_branch = repo.create(storage_namespace="<storage_namespace>").branch(branch_id="main")
...
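
Before relying on the default client, it can help to check which settings are available in the environment. The following is a minimal sketch, assuming the variable names used by the lakectl configuration convention (LAKECTL_SERVER_ENDPOINT_URL, LAKECTL_CREDENTIALS_ACCESS_KEY_ID, LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY). The helper missing_lakefs_env is hypothetical and not part of the SDK, and it does not inspect .lakectl.yaml.

```python
import os

# Variable names assumed from the lakectl configuration convention;
# verify them against the lakeFS documentation for your version.
LAKEFS_ENV_VARS = (
    "LAKECTL_SERVER_ENDPOINT_URL",
    "LAKECTL_CREDENTIALS_ACCESS_KEY_ID",
    "LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY",
)


def missing_lakefs_env(environ=None):
    """Return the assumed variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in LAKEFS_ENV_VARS if not env.get(name)]
```

Calling missing_lakefs_env() with no arguments checks the current process environment. An empty list suggests the default client has what it needs from the environment alone.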

Examples

Difference between two branches

for i in lakefs.Repository("repo").ref("main").diff("twig"):
    print(i)

You can also use ref expressions here; for instance, .diff("main~2") works as well. Ref expressions are the lakeFS analogue of Git revision specifiers.
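
The diff call and the ref expression can be wrapped in small reusable helpers. This is a sketch, not part of the SDK: changed_paths and ancestor are hypothetical names, repo is expected to be a lakefs.Repository (or any object with the same ref(...).diff(...) interface), and each yielded change is assumed to expose a path attribute.

```python
def ancestor(ref_id, generations):
    """Build a ref expression such as "main~2" from a ref and a step count."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return ref_id if generations == 0 else f"{ref_id}~{generations}"


def changed_paths(repo, ref_id, other_ref):
    """Return the paths reported by diffing ref_id against other_ref.

    Assumes every item yielded by diff() has a `path` attribute.
    """
    return [change.path for change in repo.ref(ref_id).diff(other_ref)]
```

For example, changed_paths(lakefs.Repository("repo"), "main", ancestor("main", 2)) would list the paths that differ between main and the expression "main~2".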

Search a stored object for a string

with lakefs.Repository("repo").ref("main").object("path/to/data").reader(mode="r") as f:
    for line in f:
        if "quick" in line:
            print(line)
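
The same search can be generalized to any ref, path, and search string. The sketch below uses a hypothetical helper, grep_object, which is not part of the SDK. It assumes repo is a lakefs.Repository (or an object with the same ref(...).object(...).reader(...) interface) and that the object is text that can be read line by line.

```python
def grep_object(repo, ref_id, path, needle):
    """Yield (line_number, line) for each line of the object containing needle."""
    with repo.ref(ref_id).object(path).reader(mode="r") as f:
        # Line numbers start at 1, as in most text tools.
        for number, line in enumerate(f, start=1):
            if needle in line:
                yield number, line.rstrip("\n")
```

Because the helper is a generator, the object is read lazily, and the reader is closed once iteration finishes.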

Upload and commit some data

with lakefs.Repository("golden").branch("main").object("path/to/new").writer(mode="wb") as f:
    f.write(b"my data")

# Returns a Reference
lakefs.Repository("golden").branch("main").commit("added my data using lakeFS high-level SDK")

# Prints "my data"
with lakefs.Repository("golden").branch("main").object("path/to/new").reader(mode="r") as f:
    for line in f:
        print(line)

Unlike references, branches are writable. This example would not work with a ref, because a ref can only be read.
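
The write-then-commit sequence can be gathered into one function so that the branch object is looked up only once. This is a sketch with a hypothetical helper name, write_and_commit. It assumes repo is a lakefs.Repository (or an object exposing the same branch(...), object(...).writer(...), and commit(...) calls used in the example).

```python
def write_and_commit(repo, branch_id, path, data, message):
    """Write bytes to path on a branch, then commit.

    Returns whatever the branch's commit() call returns.
    """
    branch = repo.branch(branch_id)
    # The writer is a context manager; the upload completes when it exits.
    with branch.object(path).writer(mode="wb") as f:
        f.write(data)
    return branch.commit(message)
```

For example, write_and_commit(lakefs.Repository("golden"), "main", "path/to/new", b"my data", "added my data") mirrors the two steps shown in the example.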

Tests

To run the tests using pytest, first clone the lakeFS Git repository:

git clone https://github.com/treeverse/lakeFS.git
cd lakeFS/clients/python-wrapper

Unit Tests

Inside the tests folder, execute pytest utests to run the unit tests.

Integration Tests

See the testing documentation for more information.

Documentation

lakeFS Python SDK

Author

services@treeverse.io
