Re: [BuildStream] Partial local CAS

On 10/12/2018 12:18, Sander Striker wrote:

On Mon, Dec 10, 2018 at 1:11 PM Jürg Billeter <j bitron ch> wrote:
Hi Jim,

On Mon, 2018-11-26 at 17:59 +0000, Jim MacArthur via BuildStream-list wrote:
> In the simplest case, we could just check whether a build requirement
> exists on the remote execution storage service first, and if it does,
> and we are using remote execution, we don't need to download that
> artifact or source to the `bst` client machine. It's quite probable that
> someone else sharing the remote execution service will have uploaded
> that artifact already, or a previous run by ourselves will have done.

We generally still need to download the tree (Directory objects) even
for build-only dependencies as that's required for (virtual) staging.
We could potentially download Directory objects on demand instead of
downloading the whole tree, but I think the most important optimization
is to download blobs only if/when required. I consider always fetching
all Directory objects (via GetTree) to be acceptable, at least in a
first step.

The client needs to verify that the blobs are actually on the CAS
server, either by calling FindMissingBlobs() or possibly by having the
ReferenceStorage service's GetReference() method return success only if
all referenced objects are available. The latter would require a
ReferenceStorage service being available and might be problematic with
potentially optional subdirectories such as 'buildtree'. The former
requires a bit more work on the client side but should be less
problematic, so I would follow that path.
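
As a rough illustration of the FindMissingBlobs() path, here is a minimal sketch. It assumes an in-memory stand-in for the CAS server; `Digest`, `FakeRemoteCAS`, `find_missing_blobs` and `artifact_available` are hypothetical names for this example, not BuildStream or Remote Execution API code. The client collects the file digests referenced by the tree and treats the artifact as usable without download only if the server reports none of them missing.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Digest:
    """Content address of a blob: hash plus size (hypothetical stand-in)."""
    hash: str
    size_bytes: int


def digest_of(data: bytes) -> Digest:
    return Digest(hashlib.sha256(data).hexdigest(), len(data))


@dataclass
class FakeRemoteCAS:
    """In-memory stand-in for a remote CAS server (assumption for this sketch)."""
    blobs: dict = field(default_factory=dict)

    def upload(self, data: bytes) -> Digest:
        digest = digest_of(data)
        self.blobs[digest] = data
        return digest

    def find_missing_blobs(self, digests):
        # Mirrors the idea of FindMissingBlobs(): return the subset of the
        # requested digests that the server does not hold.
        return [d for d in digests if d not in self.blobs]


def artifact_available(remote, file_digests):
    """Return (ok, missing): ok is True only if every blob is on the remote."""
    missing = remote.find_missing_blobs(list(file_digests))
    return not missing, missing
```

With this shape, an optional subdirectory such as 'buildtree' could simply be left out of `file_digests` when it is not needed, which is the flexibility the GetReference() approach would lack.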


OK, let's just say for now that we want the local CASBasedDirectory object to cope with not having the underlying file objects. I can't see any big problems with that, but I don't know off the top of my head whether it currently does, so I'll need to survey the code again and see where it touches actual files.

When we need the file objects, such as on export() or running import_files from CASBasedDirectory into a FileBasedDirectory, we then need to retrieve them from somewhere. I don't really want to put CASRemote information into the Directory system, so I'd make this external, perhaps modifying the CASBasedDirectory to return its own list of missing blobs. Someone then needs to know where to look for them, since they could be on either one of n remote artifact caches or the remote execution storage service. I think we may need to query all of them.
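
A minimal sketch of that split, under stated assumptions: `SketchDirectory`, `missing_blobs` and `fetch_missing` are hypothetical names, digests are plain strings, and the local store and each remote are modelled as dicts mapping digest to bytes. The directory only reports what it lacks; an external helper, which knows about the remotes, tries each one in turn.

```python
from dataclasses import dataclass, field


@dataclass
class SketchDirectory:
    """Toy stand-in for a CAS-backed directory; holds digests, not file data."""
    files: dict = field(default_factory=dict)    # file name -> digest string
    subdirs: dict = field(default_factory=dict)  # dir name -> SketchDirectory

    def missing_blobs(self, local_store):
        """Return the set of file digests in this tree absent from local_store."""
        missing = {d for d in self.files.values() if d not in local_store}
        for sub in self.subdirs.values():
            missing |= sub.missing_blobs(local_store)
        return missing


def fetch_missing(directory, local_store, remotes):
    """Fill local_store from the remotes, in order.

    The directory never sees the remotes. Returns the set of digests that
    no remote could supply.
    """
    outstanding = directory.missing_blobs(local_store)
    for remote in remotes:
        if not outstanding:
            break
        found = {d for d in outstanding if d in remote}
        for digest in found:
            local_store[digest] = remote[digest]
        outstanding -= found
    return outstanding
```

A caller would run `fetch_missing()` before export() or import_files and fail if the returned set is non-empty. The order of `remotes` (artifact caches first, or the remote execution storage first) is a policy choice this sketch leaves open.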

