lakefs.branch module

Module containing lakeFS branch implementation

class lakefs.branch.Branch(repository_id, branch_id, client=None)[source]

Bases: _BaseBranch

Class representing a branch in lakeFS.

cherry_pick(reference, parent_number=None)[source]

Cherry-pick a given reference onto the branch.
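As a minimal sketch of how this might be used, the helper below applies several references to a branch in order and collects the IDs of the resulting commits. The helper name `cherry_pick_all` is hypothetical, and the code assumes the returned Commit exposes an `id` attribute.

```python
def cherry_pick_all(branch, references, parent_number=None):
    """Cherry-pick each reference onto `branch`, in the order given.

    `branch` is expected to behave like a lakefs Branch. `references` is an
    iterable of reference IDs (or Reference/Commit objects).
    Returns the IDs of the commits created on the branch.
    """
    new_ids = []
    for ref in references:
        # parent_number is only relevant when `ref` is a merge commit.
        commit = branch.cherry_pick(ref, parent_number=parent_number)
        new_ids.append(commit.id)  # assumption: Commit has an `id` attribute
    return new_ids
```

With a real branch, this could be called as `cherry_pick_all(lakefs.repository("<repository_name>").branch("<branch_name>"), ["<commit_id>"])`.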

Parameters:
  • reference (Union[str, Reference, Commit]) – ID of the reference to cherry-pick.

  • parent_number (Optional[int]) – When cherry-picking a merge commit, the parent number (starting from 1) with which to perform the diff. The default branch is parent 1.

Return type:

Commit

Returns:

The cherry-picked commit at the head of the branch.

Raises:

commit(message, metadata=None, **kwargs)[source]

Commit changes on the current branch
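A possible pattern, sketched here under stated assumptions, is to commit only when the branch actually has uncommitted changes. The helper name `commit_if_dirty` is hypothetical; it relies only on the `uncommitted` and `commit` methods documented in this module.

```python
def commit_if_dirty(branch, message, metadata=None):
    """Commit only when `branch` has at least one uncommitted change.

    Returns whatever `branch.commit` returns, or None when nothing changed.
    """
    # Ask for a single change: enough to know whether anything is pending.
    first_change = next(iter(branch.uncommitted(max_amount=1)), None)
    if first_change is None:
        return None
    return branch.commit(message=message, metadata=metadata)
```

For example, `commit_if_dirty(branch, "nightly load", metadata={"job": "etl"})` would return None on a clean branch instead of requesting a commit.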

Parameters:
  • message (str) – Commit message

  • metadata (Optional[dict]) – Metadata to attach to the commit

  • kwargs – Additional Keyword Arguments for commit creation

Return type:

Reference

Returns:

The new reference after the commit

Raises:

create(source_reference, exist_ok=False)[source]

Create a new branch in lakeFS from this object

Example of creating a new branch:

import lakefs

branch = lakefs.repository("<repository_name>").branch("<branch_name>").create("<source_reference>")

Parameters:
  • source_reference (Union[str, Reference, Commit]) – The reference to create the branch from (reference ID, object or Commit object)

  • exist_ok (bool) – If False, raise an exception when a branch with this name already exists. If True, return the existing branch without creating a new one

Return type:

Branch

Returns:

The lakeFS SDK object representing the branch

Raises:

delete()[source]

Delete branch from lakeFS server

Raises:

Return type:

None

delete_objects(object_paths)

Delete objects from lakeFS

This method deletes one or more objects from the branch. It accepts str and StoredObject values, as well as iterables of either type. It is more performant than calling delete on each object in turn, because it avoids a round trip to the server for every object.

This can also be used in combination with object listing. For example:

import lakefs

branch = lakefs.repository("<repository_name>").branch("<branch_name>")
# list objects on a common prefix
objs = branch.objects(prefix="my-object-prefix/", max_amount=100)
# delete objects which have "foo" in their name
branch.delete_objects([o.path for o in objs if "foo" in o.path])

Parameters:

object_paths (str | StoredObject | Iterable[str | StoredObject]) – a single path or an iterable of paths to delete

Raises:

Return type:

None

diff(other_ref, max_amount=None, after=None, prefix=None, delimiter=None, **kwargs)

Returns a diff generator of changes between this reference and other_ref
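One way to consume the generator, shown as a rough sketch, is to count the changes by type. The helper name `summarize_diff` is hypothetical, and the code assumes each yielded change exposes a `type` attribute (for example "added" or "removed").

```python
from collections import Counter


def summarize_diff(branch, other_ref, prefix=None):
    """Count the changes between `branch` and `other_ref`, grouped by type."""
    # assumption: every item yielded by diff() has a `type` attribute
    return Counter(change.type for change in branch.diff(other_ref, prefix=prefix))
```

The result behaves like a dictionary, so `summarize_diff(branch, "main")["added"]` would give the number of added paths, with "main" standing in for any reference ID.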

Parameters:
  • other_ref – The other ref to diff against

  • max_amount – Stop showing changes after this amount

  • after – Return items after this value

  • prefix – Return items prefixed with this value

  • delimiter – Group common prefixes by this delimiter

  • kwargs – Additional Keyword Arguments to send to the server

Raises:

get_commit()[source]

Overrides the default _get_commit method for branches, so that the latest head is always fetched

property head: Reference

Get the commit reference this branch is pointing to

Returns:

The commit reference this branch is pointing to

Raises:

property id: str

Returns the reference id

import_data(commit_message='', metadata=None)[source]

Import data to lakeFS
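The sketch below illustrates one possible import flow. It is not taken from this page: it assumes the returned ImportManager offers a `prefix(object_store_uri, destination=...)` method to register a source and a blocking `run()` method, and the helper name `import_prefix` is hypothetical. Check the ImportManager documentation before relying on either call.

```python
def import_prefix(branch, source_uri, destination, commit_message=""):
    """Import everything under `source_uri` into `destination` on `branch`."""
    importer = branch.import_data(commit_message=commit_message)
    # assumption: ImportManager.prefix() registers an object-store prefix
    importer.prefix(source_uri, destination=destination)
    # assumption: ImportManager.run() starts the import and waits for it
    return importer.run()
```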

Parameters:
  • metadata (Optional[dict]) – metadata to attach to the commit

  • commit_message (str) – once the data is imported, a commit is created with this message. If the default (empty) message is given, the server's default commit message for imports is used.

Return type:

ImportManager

Returns:

an ImportManager object

log(max_amount=None, **kwargs)

Returns a generator of commits starting with this reference id
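As an illustration, the generator can be scanned for the most recent commit whose message contains some text. The helper name `find_commit` is hypothetical, and the code assumes each yielded commit exposes a `message` attribute.

```python
def find_commit(branch, needle, search_limit=100):
    """Return the first commit in the log whose message contains `needle`.

    Looks at no more than `search_limit` commits and returns None when
    no commit matches.
    """
    for commit in branch.log(max_amount=search_limit):
        # assumption: commits in the log have a `message` attribute
        if needle in commit.message:
            return commit
    return None
```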

Parameters:
  • max_amount – (Optional) limits the amount of results to return from the server

  • kwargs – Additional Keyword Arguments to send to the server

Raises:

merge_into(destination_branch, **kwargs)

Merge this reference into destination branch
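A common flow is to commit pending work on a feature branch and then merge it into a destination. The sketch below shows that flow using only `commit` and `merge_into` as documented here. The helper name `publish` is hypothetical.

```python
def publish(source_branch, destination_branch, message):
    """Commit pending changes on `source_branch`, then merge it.

    `destination_branch` may be a branch ID or a branch object.
    Returns the reference ID of the merge commit.
    """
    source_branch.commit(message=message)
    return source_branch.merge_into(destination_branch)
```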

Parameters:
  • destination_branch (Union[str, Reference, Commit]) – The merge destination (either ID or branch object)

  • kwargs – Additional Keyword Arguments to send to the server

Return type:

str

Returns:

The reference id of the merge commit

Raises:

object(path)

Returns a writable object using the current repo id, reference and path
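The returned object can be written to with `upload`, as the transaction example in this module does. The sketch below writes several objects from a mapping of path to content. The helper name `upload_all` is hypothetical.

```python
def upload_all(branch, files):
    """Upload each `path -> data` pair in `files` to `branch`.

    Returns the list of paths written, in iteration order. The uploads
    stay uncommitted until the caller commits the branch.
    """
    written = []
    for path, data in files.items():
        branch.object(path).upload(data)
        written.append(path)
    return written
```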

Parameters:

path (str) – The object’s path

Return type:

WriteableObject

objects(max_amount=None, after=None, prefix=None, delimiter=None, **kwargs)

Returns an object generator for this reference. The generator yields either a StoredObject or a CommonPrefix object, depending on the listing parameters provided.
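To illustrate the two kinds of results, the sketch below lists a single "directory" level by passing a delimiter, then separates common prefixes from objects. The helper name `list_directory` is hypothetical. The code assumes both result types expose a `path` attribute, and it tells them apart by class name only so that the sketch needs no imports; an `isinstance` check against the SDK's CommonPrefix class would be the more robust choice in real code.

```python
def list_directory(branch, prefix=""):
    """Return `(prefixes, objects)` found directly under `prefix`."""
    prefixes, objects = [], []
    for entry in branch.objects(prefix=prefix, delimiter="/"):
        # assumption: every entry has a `path` attribute
        if type(entry).__name__ == "CommonPrefix":
            prefixes.append(entry.path)
        else:
            objects.append(entry.path)
    return prefixes, objects
```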

Parameters:
  • max_amount (Optional[int]) – Stop showing objects after this amount

  • after (Optional[str]) – Return items after this value

  • prefix (Optional[str]) – Return items prefixed with this value

  • delimiter (Optional[str]) – Group common prefixes by this delimiter

  • kwargs – Additional Keyword Arguments to send to the server

Raises:

Return type:

Generator[StoredObject | CommonPrefix]

property repo_id: str

Return the repository id for this reference

reset_changes(path_type='reset', path=None)

Reset uncommitted changes (if any) on this branch
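The three path types can be wrapped in one small helper, sketched below. The helper name `discard_changes` is hypothetical, and so is its convention of treating a path that ends with "/" as a common prefix; that convention is a choice made for this example, not SDK behavior.

```python
def discard_changes(branch, path=None):
    """Discard uncommitted changes on `branch`.

    With no path, everything is reset. A path ending in "/" is treated
    as a common prefix, anything else as a single object.
    """
    if path is None:
        branch.reset_changes()  # path_type defaults to 'reset'
    elif path.endswith("/"):
        branch.reset_changes(path_type="common_prefix", path=path)
    else:
        branch.reset_changes(path_type="object", path=path)
```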

Parameters:
  • path_type (Literal['common_prefix', 'object', 'reset']) – the type of path to reset: ‘common_prefix’, ‘object’, or ‘reset’ (which resets all changes)

  • path (Optional[str]) – the path to reset; ignored when path_type is ‘reset’

Raises:

Return type:

None

revert(reference, parent_number=0, *, reference_id=None)[source]

Revert the changes made by the provided reference on the current branch
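Because merge commits need a parent number and ordinary commits do not, a caller may want to choose the value automatically. The sketch below does that. The helper name `revert_commit` is hypothetical, and the code assumes the commit object exposes `id` and a `parents` list.

```python
def revert_commit(branch, commit, mainline_parent=1):
    """Revert `commit` on `branch`, choosing parent_number automatically.

    Merge commits (more than one parent) are reverted relative to
    `mainline_parent`; all other commits use parent_number 0.
    """
    # assumption: Commit exposes `parents` (a list of IDs) and `id`
    is_merge = len(commit.parents) > 1
    parent_number = mainline_parent if is_merge else 0
    return branch.revert(commit.id, parent_number=parent_number)
```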

Parameters:
  • reference_id (Optional[str]) –

    (Optional) The reference ID to revert

    Deprecated since version 0.4.0: Use reference instead.

  • parent_number (int) – when reverting a merge commit, the parent number (starting from 1) relative to which to perform the revert. The default for non-merge commits is 0

  • reference (Union[str, Reference, Commit, None]) – the reference to revert

Return type:

Commit

Returns:

The commit created by the revert

Raises:

transact(commit_message='', commit_metadata=None, delete_branch_on_error=True)[source]

Create a transaction for multiple operations. A transaction allows multiple modifications to be performed atomically on a branch, similar to a database transaction. It ensures that the branch remains unaffected until the transaction is successfully completed. The process includes:

  1. Create an ephemeral branch from this branch

  2. Perform object operations on the ephemeral branch

  3. Commit the changes

  4. Merge back into the source branch

  5. Delete the ephemeral branch

Using a transaction, the code for this flow looks like this:

import lakefs

branch = lakefs.repository("<repository_name>").branch("<branch_name>")
with branch.transact(commit_message="my transaction") as tx:
    for obj in tx.objects(prefix="prefix_to_delete/"):  # Delete some objects
        obj.delete()

    # Create new object
    tx.object("new_object").upload("new object data")

Note that unlike database transactions, a lakeFS transaction does not take a “lock” on the branch, so it might fail if the source branch changes after the transaction was created.
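Since a transaction can fail when the source branch moves, one option is to retry it from scratch. The sketch below is a hypothetical helper, `transact_with_retry`, not part of the SDK. It catches `Exception` by default only because this page does not name the exception raised on failure; real code should pass a narrower type in `retry_on`. The `operation` callable runs again on every attempt, so it must be safe to repeat.

```python
def transact_with_retry(branch, operation, commit_message, attempts=3,
                        retry_on=(Exception,)):
    """Run `operation(tx)` inside a transaction, retrying on failure.

    Returns the number of the attempt that succeeded and re-raises the
    last error once `attempts` is exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # Each attempt opens a fresh transaction on the branch.
            with branch.transact(commit_message=commit_message) as tx:
                operation(tx)
            return attempt
        except retry_on:
            if attempt == attempts:
                raise
```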

Parameters:
  • commit_message (str) – once the transaction is committed, a commit is created with this message

  • commit_metadata (Optional[Dict]) – user metadata for the transaction commit

  • delete_branch_on_error (bool) – If True (the default), the ephemeral branch is deleted when an error occurs.

Return type:

_Transaction

Returns:

a Transaction object to perform the operations on

uncommitted(max_amount=None, after=None, prefix=None, **kwargs)

Returns a diff generator of uncommitted changes on this branch

Parameters:
  • max_amount – Stop showing changes after this amount

  • after – Return items after this value

  • prefix – Return items prefixed with this value

  • kwargs – Additional Keyword Arguments to send to the server

Raises:

exception lakefs.branch.LakeFSDeprecationWarning[source]

Bases: Warning

Warning about use of a deprecated lakeFS or client feature. Unlike DeprecationWarning, this class is displayed by default. See default warning filter for how to disable it.
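As a sketch of how such a warning can be silenced for a single call, the helper below uses the standard library `warnings` module. The helper name `call_ignoring` is hypothetical. The warning category is passed in as a parameter so that the example has no dependency on the SDK.

```python
import warnings


def call_ignoring(category, func, *args, **kwargs):
    """Call `func(*args, **kwargs)` with warnings of `category` suppressed.

    The previous warning filters are restored once the call returns.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=category)
        return func(*args, **kwargs)
```

A call such as `call_ignoring(lakefs.branch.LakeFSDeprecationWarning, branch.revert, None, reference_id="<commit_id>")` would then run the deprecated form without displaying the warning.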

args

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class lakefs.branch.Transaction(repository_id, branch_id, commit_message='', commit_metadata=None, delete_branch_on_error=True, client=None)[source]

Bases: object

Manage a transaction on a given branch

The transaction creates an ephemeral branch from the source branch. The transaction can then be used to perform operations on the ephemeral branch, which is later merged back into the source branch. Currently, transactions are supported only as a context manager.