Testing infrastructure

I should clarify that passing the “long running tests” must be a precondition for entering passing – that’s the reason for having the branch. Short tests can run directly on the PR and can easily be repeated there until everything is globally consistent.

For “obscure configuration”, I’d apply the same criteria as for platforms. If it’s officially supported it must pass, but there are plenty of configurations that would not be, and they don’t have to.

Maybe I should clarify my idea:

  • people are supposed to file PRs against master
  • automation will pick them up and create a PR-xxx branch. There can be multiple of these, and a dashboard may show where they stand in quality and priority.
  • the staging branch is actually the PR that is most likely to become the next master
  • staging becomes passing when the only action left is fast-forward-merging master. The rationale here is that, if automatic merging to master does not happen, the passing state is a clear indication that this is good until somebody does the fast-forward merge.
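The promotion step in the last bullet can be sketched with plain git. This is a runnable illustration in a throwaway repository; the branch names follow the proposal, but the exact workflow is an assumption, not an agreed process:

```shell
# Runnable sketch: master only ever moves by a fast-forward merge of the
# staging branch. The demo repository and commit messages are illustrative.
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git config
git -c user.email=ci@example.org -c user.name=ci commit -q --allow-empty -m "current master"

# staging carries the candidate commits that have passed all tests
git checkout -q -b staging
git -c user.email=ci@example.org -c user.name=ci commit -q --allow-empty -m "tested candidate"

# promotion: --ff-only refuses anything that is not a pure fast-forward,
# so master's history can never be rewritten by this step
git checkout -q "$main"
git merge --ff-only -q staging
echo "master is now at: $(git log -1 --format=%s)"
```

The `--ff-only` flag is what makes this safe: if master has diverged from staging for any reason, the merge is rejected instead of creating a merge commit or rewriting history.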

I would prefer that master is really rock-solid and never changes (unless really needed), so it can be used without worries, as I guess this is what people using seL4/CAmkES as a black box would expect. I’m not sure this can be guaranteed with the model you are proposing. Then there is a commit chain staging → passing → master, where passing flags something that is very likely to become the next master (but might still be changed), and staging is something you can use if you are keen on cutting-edge things, but with no guarantee at all that there are no force-pushes there. That might all be a bit of overkill at the moment, but it allows updating master roughly once a week only, while passing might get updated immediately. I have a feeling you have something like this internally already, because I keep hearing things like “master still needs to be pushed to GitHub”.

I think I’m more confused now, sorry :-). There are 3 branches in your proposal, staging, passing, and master? If we have staging, why do we need passing? Or is that the same as master?

In either case, though, I think this would essentially recreate the situation that we have right now – bitbucket master is the staging branch, GitHub master is the passing/master branch. The frustration with this setup is not only that bitbucket is currently invisible (your proposal would fix that), but also that there is a mismatch between PR targets and the actual development head. This has created frustration in the past, because passing tests on a PR to master/passing don’t mean that they pass on the development head, and vice versa. Even with automation that tests both (essentially what we now have between GitHub and Bamboo), you still get this moment where you’ve tested your code, believe that everything is fine, make the PR, and then things don’t work, because master is behind the development head, or manifests haven’t been updated yet, etc. The reason for the failure would be more visible, but I do think it would still be frustrating to people.

Define “change”. In my (and Kent’s/Curtis’ I assume) proposal master never changes either apart from new commits, which I assume your proposal also has, otherwise nothing ever happens (the proposal would add new commits to master to fix global consistency issues if they had not already been caught at PR time). I might be wrong, but I think the difference between the two proposals is that one has to be locally always working (for all checks that are runnable within the repo), but not necessarily globally (checks with other repos), and the other (yours) is always globally working?

In terms of solidity: that’s what versioned releases are for. We explicitly do not want people to check out the CAmkES master branch and expect that to work with the seL4 master branch. We were trying to achieve that with pushing out only globally consistent versions to GitHub, but that a) didn’t really work (we still had failures because sometimes we need/want to push manually), and b) it seems to be holding up development.

I don’t think we need development versions that always fully work with everything. We do need code that is always releasable, but that might lag behind development.

Managing dozens of pull requests as separate development branches seems like a lot of overhead that maybe can be managed with automation, but it also has to be. I don’t think we currently have the time and resources to write a whole development branch dashboard. In the other proposal, GitHub basically is that dashboard.

Maybe we’re not actually that far apart, the main difference seems to be which branch PRs are targeted to and which is named what (master vs staging). I think I should start writing this up with slightly more detail so it becomes clearer. There are also multiple parts to this:

  • GitHub being the main host for everything
  • policy change on what GitHub master means (to what bitbucket master currently is), including a passing branch to preserve releasability
  • port test infrastructure to more open foundation management

The last part can be done more incrementally than the first two.

In what I proposed, the passing branches across all repositories provide this to the same extent that master currently does on GitHub: if you take a collection of repositories at their current passing tag, they should (more or less) work together.

This isn’t actually fully guaranteed, as those branches are updated whenever one of the most recent manifests passes tests, rather than for some collection of commits that passes tests on all manifests at the same time. So it is possible that the current master across GitHub is only there because it passed camkes-examples while the latest sel4bench is failing.

This could potentially be resolved as part of any planned changes to CI.

Again, my wording was bad. What I wanted to say is that the history of master should not be changed (by force-push). So nothing really new.

GitHub primary host
I agree that GitHub is the main host for everything. People can clone or fork foundation repositories from here and expect to get a history of changes for all of the projects. And changes can be PR’d and merged into GitHub repositories.

Each repository has a default branch (typically called master currently, but it could also start being referred to as main) that tracks the main version of that project. However, some repositories may have additional streams of work, indicated by additional long-lived branches with different names (e.g. musllibc’s sel4 branch, which corresponds to our changes from upstream musl). There are currently no requirements about how separate branches in the same repository are connected, other than an expectation that they belong to that repository. We try our hardest to ensure that commits appearing under these main branches will always be part of the ancestry of any future commits associated with those branches.

Occasionally we mark some commits of these commit streams as releases, and this is what we expect people wanting stability to use. There is always the chance that future changes to a released version will create a new child and potentially a new version. Otherwise, all changes should be merged via a PR that adds the new changes on top of the most current version of the repository (and updates the main branch).
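As a concrete illustration, marking a release is just an annotated tag on a commit of the main branch. This is a runnable sketch in a throwaway repository; the version number and commit message are hypothetical:

```shell
# Runnable sketch: a release is an annotated tag on a commit of the main
# branch. The repository, version number, and messages are hypothetical.
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q
git -c user.email=rel@example.org -c user.name=rel commit -q --allow-empty -m "release candidate"

# annotated tags carry a message and their own object, unlike lightweight tags
git tag -a 12.0.0 -m "hypothetical 12.0.0 release"
git tag -l

# in a real repository the tag would then be published with:
#   git push origin 12.0.0
```

Because the tag points at an immutable commit, later development on the main branch cannot move the release out from under anyone depending on it.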

Reporting test results
I see testing as something that happens adjacently and asynchronously to this. I don’t expect that we will be able to have a single standardized CI system that has global control over spawning tests. However it would be nice if there was a standardized server that could host test results and logs and be indexed based on repository commits. Something like what’s under the Status History section here: https://status.aws.amazon.com/
I would like to be able to look up a particular repository revision and see all of the tests that were run involving that revision, then for each pass/fail be able to click through to the log. From the log I can then reverse engineer which commands were run and see what the console output was. This would allow me to look at test results for revisions that are in a pending PR awaiting merge, for revisions that were merged already and have had long-running tests finish later, or even for new tests added in the future and run against old revisions.

Additional branch names
I think that, in addition to the main branch and release tags (and maybe release branches), any other branches get confusing or hard to maintain without a high degree of tooling support. I don’t think it’s easy to keep a passing branch tightly behind the main branch, because within a project there always seem to end up being sub-components that get broken and aren’t important enough to hold up the rest of the project. So the passing branch ends up being some undocumented subset of components that are passing, or falls far behind the main branch until things get fixed. The manifests were supposed to provide a similar mechanism: if you want a version of all of the repositories that are passing according to sel4test, you can check out the sel4test-manifest, and likewise for camkes-manifest, camkes-arm-vm-manifest, etc.

I think something like this could be useful at times, such as right now. We could temporarily introduce it when PRs start piling up and there is a large processing lag. But if the queue gets short again, I think we should stop using it, as the PR list itself should again make clear what the strong candidates are.

In conclusion, I think we should prioritize setting up a server that lets us view test results and logs from whatever testing systems we currently have, and then allow merging PRs on GitHub based on being able to access this test information. This would be sufficient to get on top of existing backlogs, and would allow remote reviewers to see the results of tests automatically and merge PRs without requiring intervention from Data61 (apart from maybe rerunning flaky tests).

Then we can continue to review how to add more testing tools and how people can contribute test data from their own on-premises setups if they wish as an ongoing process.

Ah, good. Yes, we’re all in agreement on this one.

I agree with everything in there.

It’d be nice if we could end up with whichever diverse testing infrastructure is used reporting to GitHub checks (might need an auth token, but that can be provided). Even the tests that run asynchronously could report to, e.g., the manifest repo checks. That would mean that GitHub can become the dashboard (if we want overall status across multiple repos, we can add a web page, or even just an .md file that includes the passing/failing status of each repo; GitHub has a nice API for that). That would be fairly easy to achieve.

If a separate server/web app turns out to be easier, I’m not opposed to that either, but I’d try GH as dashboard first. The closer to the code, the better.

This is probably where we still need the most discussion. I agree that the manifests already record the relevant information, so a passing branch would be mostly convenience.

We do want to know when components fall behind, though, and ideally be nagged when that happens. Otherwise it’s very easy to end up in a state where we want to make a release and then find there are first multiple weeks of work to be done before that can happen.

I would like to get a persistent store of test results somewhere that I could check out (and that could be backed up) and update. Then they could subsequently get mirrored in the GitHub checks dashboard. I don’t think that test results for immutable commits in a mainline branch need to be as ephemeral as they have been so far. It would be nice to settle on a storage format that would mean we don’t lose historical test data whenever we move across CI services. Even on bamboo we can’t easily look at the results from a test run that is too old.

GitHub seems to take a change-based view of test results, where each change (by PR) has associated tests. I think it is also necessary to have an absolute view of test results, where you can look at test status at a particular instant (by commit). An example of this distinction is the difference between style checks on a PR and style checks on the entire repository.

Maybe just a separate git repository that gets test results pushed to it would be a good start. Then this could start being served as a webpage or published into GitHub checks/actions or migrated to a database.
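A minimal sketch of what such a results repository could look like, keyed by repository name, commit sha, and test configuration. The layout and all names here (`sel4test-gcc-arm`, the sha, the file names) are assumptions, not an agreed format:

```shell
# Runnable sketch of a git-backed test-result store. The directory layout
# (repo/sha/configuration) and all names are assumptions for illustration.
set -e
store=$(mktemp -d) && cd "$store"
git init -q

repo=seL4
sha=0123456789abcdef0123456789abcdef01234567   # placeholder commit sha
config=sel4test-gcc-arm                        # hypothetical test configuration

# one directory per (repo, commit, configuration), holding result + log
mkdir -p "results/$repo/$sha/$config"
echo pass > "results/$repo/$sha/$config/result"
echo "console output of the test run would go here" > "results/$repo/$sha/$config/log.txt"

git add results
git -c user.email=ci@example.org -c user.name=ci commit -q -m "results: $repo@$sha $config"
cat "results/$repo/$sha/$config/result"
```

Because results are keyed by immutable commit sha rather than by CI build number, the history survives a move to a different CI service, and the same store can later be rendered as a web page or mirrored into GitHub checks.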

I agree that would be nice.

It’s probably easier as a very first step for bamboo to upload logs to something like sel4.systems as plain text files. Won’t be very nice to look at yet, but could be improved over time, and it’d be a start.

I’ll experiment a little to see if I can make that happen. The GitHub statuses API seems to not require a full web app, just a personal access token, and has provisions for including a URL with the result. I’m not fully clear on how things would get triggered, and with what exactly each test should be associated (commit sha seems to be the only target, but I think that would show both on the PR and on the repo). I’m sure we can figure out details.
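For reference, setting a commit status is a single authenticated HTTP call to the statuses API. In this sketch only the payload construction actually runs; the curl invocation is shown as a comment, and OWNER, REPO, SHA, and the context/target_url values are all placeholders:

```shell
# Sketch of a status update via the GitHub statuses API
# (POST /repos/{owner}/{repo}/statuses/{sha}). All identifiers below
# are placeholders; the network call itself is commented out.
set -e
payload='{
  "state": "success",
  "context": "sel4test/hardware",
  "description": "all runs passed",
  "target_url": "https://sel4.systems/~bamboo/logs/EXAMPLE/1"
}'
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload is valid JSON"

# with a personal access token in $GITHUB_TOKEN, the status would be set by:
#   curl -X POST \
#     -H "Authorization: token $GITHUB_TOKEN" \
#     -H "Accept: application/vnd.github+json" \
#     -d "$payload" \
#     "https://api.github.com/repos/OWNER/REPO/statuses/SHA"
```

The `state` field accepts `pending`, `success`, `failure`, or `error`, and the `context` string is what keeps statuses from different test systems distinct on the same commit, which is what makes distributed reporting possible.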

Then from there, we can improve the interface, organisation, display and storage location of logs/benchmarking data, add other test platforms etc.

Ok, status update after some experimentation:

  • bamboo now indicates test status with the GitHub status API on commits
  • this also works on commits on PRs, once the corresponding tests are kicked off via the bot
  • logs are uploaded as well, including nice small summary pages such as https://sel4.systems/~bamboo/logs/RELEASE-SEL4TEST-SIMULATIONBUILDSGHSTATUS/1025/
  • bamboo can’t upload a log immediately when a job is finished, because the log doesn’t necessarily exist yet from within that job. You need to wait until all jobs in a stage are done, and if that stage never finishes, you get no logs. This is currently the case for seL4test with its 350 hardware jobs, where something somewhere always doesn’t work (especially now that nobody is in the office to quickly have a look at the hardware).

Apart from that last point, I think it’s working pretty well. It also shows that the GitHub API is simple enough to use for distribution:

  • we can hand out an access token for just status update
  • anyone who runs tests can set these
  • we can do the same for uploading logs to sel4.systems

That means anyone can listen for GitHub changes and run any kind of test and report back. We’d need to provide a bit more infrastructure and scripts to make this easy to do, but it’s very possible.

I propose, as a next step to make GitHub the main repo (switch from Bitbucket), and follow the weak global consistency model proposed by Kent initially, building up towards more tests and more responsive tests over time so that we can get stronger global consistency on pull requests eventually. We still have the strong consistency guaranteed by the manifests which will for now still be produced and pushed by Bamboo.

All of this (apart from bamboo pushing manifests) is still agnostic of specific test-infrastructure.

Unless there is more discussion on that, I’d write that up in a bit more detail and put to the vote for the TSC.