Xvc Devlog - 221105

Posted on 2022-11-07 :: Tags: xvc, git, dvc, git-lfs, git-annex, shell, process, which

It’s Saturday, November 5th. The best part of free software development seems to be doing it whenever you want, including Saturdays.

Ah yeah, when you work for free, you can do it any time you want, perhaps. You also don’t have team members, and that means the project moves only when you sit in front of it.

Umm, right. Let’s take a look at the outstanding PRs.

You have a documentation PR#93. You also have the work you began on integrating Git. I think it’s better to focus on the latter today.

Right. Let’s think about the relationship between Git and Xvc. I believe we should define a general relationship so that we don’t end up in a mess like DVC and Git.

Why do you think the relationship between DVC and Git is a mess?

They don’t automate common Git operations, like committing after dvc add. There is only auto-stage, and that’s turned off by default. This made it seem to me that DVC wants to intervene as little as possible in the user’s Git workflow. That’s understandable, and I support it. But on the other hand, they use the .git/ directory itself to store and manage experiments, in a custom way that creates custom stash objects for experiments. This goes against that principle of minimum intervention.

So this makes it a mess?

The mess is caused, in my opinion, by the second factor. If DVC didn’t perform any Git operations, that would be all right. It was meant to be VCS-agnostic. Then experiments came and used Git internals no other similar tool uses.

Git-LFS and Git-Annex seem to use some non-standard machinery as well.

OK. Not internals that no other tool uses, but internals used in a way that no other tool uses them.

You know, GitHub PRs are also stored in a similar way. They also use non-standard machinery.

Yeah, but these tools are all Git-specific tools. They accept the dominion of Git, and don’t try to bring any VCS-agnosticism.

And Xvc tries to have this agnosticism?

I believe the initial design goal of DVC, to be VCS-agnostic or even to run without a VCS, is valuable. I like the idea behind Git, but its interface and implementation show the marks of gradual development. There is no library behind it.

Libgit?

Libgit2 is something different. Although it’s said to have some common code, it doesn’t support all features. Git is command line software with a mix of scripts and compiled executables, and not all of its code seems to be written in a way that external tools could use.

Hmm. The comment you added to the issue says git stash push --staged is not available in libgit2. Can’t you mimic it like DVC does for branches?

I don’t want to depend on Git at that level.

So, you’ll be using the CLI and shell for Git?

Yes. At the moment, before any performance tests, I believe this doesn’t matter much. Running Git commands once in a while through the shell shouldn’t make much difference in overall performance.

Then you’ll use it like a command line tool, like the user?

Yes, and I’ll make it run outside of the usual threads. Git will be like a sandwich, wrapping Xvc operations. If there is a --git-ref option in an xvc command, the checkout will run before Xvc performs the command, and if there are any changes in Xvc metafiles, they will be committed to the current branch.

graph LR

co["git checkout"] --> xvc
xvc --> cm["git commit"]


The first step could be a branch as well. So we have:

graph LR

br["git branch"] --> xvc
co["git checkout"] --> xvc
xvc --> cm["git commit"]


Looks sensible. How will you reflect these in the command line?

With something like xvc --git-checkout my-branch file list

Hmm, and for branch?

I think instead of different options for branch, checkout or tag, we can have a --git-ref option whose value is treated as a Git reference. It will be checked out, or created as a branch from the current one if it doesn’t exist?

I think creating a branch is not a good idea. It should be explicit. You can just send the --git-ref value to git checkout and perform the Xvc operation. If the user wants to create a branch, I think they can do it themselves.

What about storing the results in a branch? After adding a bunch of files, they may want to store them in another branch.

That’s sensible. We can have another option, like --to-branch in certain operations.

Or in the xvc command as a general option. In that case, we can change the option names to --from-ref and --to-branch. It will be like:

graph LR

fr["git checkout $(--from-ref)"] --> xvc
xvc --> tb["git checkout -b $(--to-branch)"]
tb --> co["git add .xvc && git commit -m 'xvc cmd'"]


If no such options are given, xvc will run without branching, right?

Yep. --from-ref and --to-branch options are just shortcuts for user behavior. Any other VCS tool could be used this way. We don’t need to integrate Git at the library level.
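The flow in the last diagram can be sketched as a pure function that only plans the Git argument lists, without running anything. This is a minimal sketch under assumptions: the names GitOptions, GitPlan and plan_git_steps are hypothetical and not part of Xvc, and the commit message simply follows the 'xvc cmd' placeholder from the diagram.

```rust
/// Hypothetical holder for the proposed `--from-ref` and `--to-branch` options.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct GitOptions {
    pub from_ref: Option<String>,
    pub to_branch: Option<String>,
}

/// Argument lists to pass to the git executable before and after the Xvc operation.
#[derive(Debug, Default, PartialEq)]
pub struct GitPlan {
    pub before: Vec<Vec<String>>,
    pub after: Vec<Vec<String>>,
}

/// Convert a slice of string slices into owned arguments.
fn args(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

/// Plan the Git "sandwich" around an Xvc command.
/// With no options and no metafile changes, the plan is empty and Xvc runs alone.
pub fn plan_git_steps(opts: &GitOptions, xvc_cmd: &str, metafiles_changed: bool) -> GitPlan {
    let mut plan = GitPlan::default();

    // Before: check out the requested reference, if any. No branch is created here.
    if let Some(from_ref) = &opts.from_ref {
        plan.before.push(args(&["checkout", from_ref.as_str()]));
    }

    // After: optionally create and switch to the target branch.
    if let Some(to_branch) = &opts.to_branch {
        plan.after.push(args(&["checkout", "-b", to_branch.as_str()]));
    }

    // After: commit the Xvc metafiles only when they have changed.
    if metafiles_changed {
        let message = format!("xvc {}", xvc_cmd);
        plan.after.push(args(&["add", ".xvc"]));
        plan.after.push(args(&["commit", "-m", message.as_str()]));
    }

    plan
}
```

Keeping the planning separate from the execution means the option handling can be tested without a Git repository, and another VCS could get its own planner later.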

This brings up the question of portability though. When you aim for the software to be portable, you can’t rely on the existence of Git on the host, right?

I think a git.command option in the configuration is a good idea. Xvc will issue a warning if it can’t run the commands.

Will you use the shell to run this command? Otherwise no $PATH configuration is possible, you know.

I believe that could be another option: git.use_shell. If git.command is set to an absolute path, Xvc may use it without the shell. Otherwise it can use the shell. Running the process directly, without a shell in between, will make it faster and more secure.

There is also the option of running Xvc in another process. If we can already access Git from the shell that runs Xvc, maybe we don’t need shell execution inside the process.

That’s a cool idea. But I wouldn’t add that extra complexity. Instead, we can try to find the git executable if git.command is not an absolute path. If git.command = /usr/bin/git in the configuration, we use it as is. Otherwise we can get $PATH or %PATH% from the environment, and search for git.command in it to find the exact executable.
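That lookup can be sketched with the standard library alone. This is a simplified illustration, not Xvc’s implementation: resolve_executable is a hypothetical name, and the sketch ignores Windows executable extensions and the Unix executable permission bit, both of which a real search would have to handle.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Resolve the value of the proposed `git.command` option to an executable path.
///
/// An absolute path is returned as is, without checking that it exists.
/// Anything else is looked up in the directories listed in `path_var`,
/// which is expected to hold the value of $PATH or %PATH%.
pub fn resolve_executable(git_command: &str, path_var: &OsStr) -> Option<PathBuf> {
    let candidate = Path::new(git_command);
    if candidate.is_absolute() {
        return Some(candidate.to_path_buf());
    }

    // split_paths handles the platform's separator (':' on Unix, ';' on Windows).
    env::split_paths(path_var)
        .map(|dir| dir.join(candidate))
        .find(|full_path| full_path.is_file())
}
```

A caller would pass the environment value, for example `env::var_os("PATH")`, and issue the warning mentioned earlier when the result is `None`.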

It looks like there is a crate called which that does just what we are looking for.

Ah, cool. Then we can just ask that for the executable, and run it. We don’t need to drop to shell.
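Once the executable path is known, whether it comes from the which crate or from a manual search, running it without a shell could look like the sketch below. The function names git_command and run_git are hypothetical, and the warning text is only an illustration of the "warn if it can’t run" behavior discussed above.

```rust
use std::path::Path;
use std::process::Command;

/// Build a Git invocation that runs the executable directly, with no shell in between.
/// Arguments are passed as a list, so no quoting or escaping is involved.
pub fn git_command(executable: &Path, repo_dir: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new(executable);
    cmd.current_dir(repo_dir).args(args);
    cmd
}

/// Run Git and report success. If the process cannot be started at all,
/// print a warning instead of failing the whole Xvc operation.
pub fn run_git(executable: &Path, repo_dir: &Path, args: &[&str]) -> bool {
    match git_command(executable, repo_dir, args).status() {
        Ok(status) => status.success(),
        Err(err) => {
            eprintln!("warning: cannot run {}: {}", executable.display(), err);
            false
        }
    }
}
```

With this shape, something like `run_git(&git_path, &repo_root, &["checkout", "my-branch"])` would cover the first slice of the sandwich, where git_path and repo_root are placeholders for values resolved elsewhere.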

Yep. Let’s go back to implementation now.