2024-03-16T10:29:00+00:00https://programming-journal.org//The Art, Science, and Engineering of ProgrammingThe Art, Science, and Engineering of Programming journal is a fully refereed, open access, free, electronic journal. It welcomes papers on the art of programming, broadly construed.The editors of The Art, Science, and Engineering of Programmingeditors@programming-journal.orgScheduling Garbage Collection for Energy Efficiency on Asymmetric Multicore Processors2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F10Shimchenko, MarinaÖsterlund, ErikWrigstad, Tobias<p>The growing concern for energy efficiency in the Information and Communication Technology (ICT) sector has prompted the exploration of resource management techniques. While hardware architectures, such as single-ISA asymmetric multicore processors (AMP), offer potential energy savings, there is still untapped potential for software optimizations. This paper aims to bridge this gap by investigating the scheduling of garbage collection (GC) activities on a heterogeneous architecture with both performance cores (“p-cores”) and energy cores (“e-cores”) to achieve energy savings.</p>
<p>Our study focuses on the concurrent ZGC collector in the context of Java Virtual Machines (JVM), as the energy aspect is not well studied in the context of latency-sensitive Java workloads. By comparing the energy efficiency, performance, latency, and memory utilization of executing GC on p-cores versus e-cores, we present compelling findings.</p>
<p>We demonstrate that scheduling GC work on e-cores leads to approximately 3% overall energy savings without degrading performance or mean latency, while requiring no additional effort from developers. The overall energy reduction can increase to 5.3±0.0225% by tuning the number of e-cores (still without changing the program!).</p>
<p>Our findings highlight the practicality and benefits of scheduling GC on e-cores, showcasing the potential for energy savings in heterogeneous architectures running Java workloads while meeting critical latency requirements. Our research contributes to the ongoing efforts toward achieving a more sustainable and efficient ICT sector.</p>
Collective Allocator Abstraction to Control Object Spatial Locality in C++2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F15Hideshima, TakatoSato, ShigeyukiUgawa, Tomoharu<p>Disaggregated memory is promising for improving memory utilization in
computer clusters in which memory demands vary significantly across
computer nodes, leaving memory under-utilized. It allows applications with high
memory demands to use memory on other computer nodes.</p>
<p>However, disaggregated memory is not easy to use for implementing data
structures in C++ because the C++ standard does not provide an adequate
abstraction to use it efficiently in a high-level, modular manner.
Because accessing remote memory involves high latency, disaggregated
memory is often used as a far-memory system, which forms a kind of swap
memory where part of local memory is used
as a cache area, while the remaining memory is not subject to swapping.
To pursue performance, programmers have to be aware of this
nonuniform memory view and place data appropriately to minimize swapping.</p>
<p>In this work, we model the address space of memory-disaggregated systems
as the far-memory model, present the collective allocator abstraction,
which enables us to specify object placement aware of memory address
subspaces, and apply it to programming aware of the far-memory model.</p>
<p>The far-memory model provides a view of the nonuniform memory space
while hiding the details. In the model, the virtual address space is
divided into two subspaces: one is subject to swapping and the other is
not. The swapping subspace is further divided into even-sized pages,
which are units of swapping. The collective allocator abstraction
forms an allocator as a collection of sub-allocators, each of which
owns a distinct subspace, where every allocation is done via
sub-allocators. It enables us to control object placement at allocation
time by selecting an appropriate sub-allocator according to different
criteria, such as subspace characteristics and object collocation.
It greatly facilitates implementing container data structures aware of
the far-memory model.</p>
<p>We develop an allocator based on the collective allocator abstraction by
extending the C++ standard allocator for container data structures on
the far-memory model and experimentally demonstrate that it facilitates
implementing containers equipped with object placement strategies aware of
spatial locality under the far-memory model in a high-level, modular
manner. More specifically, we have successfully implemented B-trees and
skip lists with the combined use of two placement strategies. The
modifications to the original implementations are fairly modest:
additions are mostly due to specifying object placement; deletions and
modifications amount to at most 1.2% and 3.2% of the lines of the
original code, respectively. We have experimentally confirmed that the
modified implementations produce data layouts that suppress swapping.</p>
<p>We forecast that the collective allocator abstraction would be a key to
high-level integration with different memory hardware technologies
because it straightforwardly accommodates new interfaces for subspaces.</p>
Privacy-Respecting Type Error Telemetry at Scale2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F12Greenman, BenJeffrey, AlanKrishnamurthi, ShriramShah, Mitesh<p><strong>Context</strong>: Roblox Studio lets millions of creators build interactive experiences by programming in a variant of Lua called Luau. The creators form a broad group, ranging from novices writing their first script to professional developers; thus, Luau must support a wide audience. As part of its efforts to support all kinds of programmers, Luau includes an optional, gradual type system and goes to great lengths to minimize false positive errors.</p>
<p><strong>Inquiry</strong>: Since Luau is currently being used by many creators, we want to collect data to improve the language and, in particular, the type system. The standard way to collect data is to deploy client-side telemetry; however, we cannot scrape personal data or proprietary information, which means we cannot collect source code fragments, error messages, or even filepaths. The research questions are thus about how to conduct telemetry that is not invasive and obtain insights from it about type errors.</p>
<p><strong>Approach</strong>: We designed and implemented a pseudonymized, randomly-sampling telemetry system for Luau. Telemetry records include a timestamp, a session id, a reason for sending, and a numeric summary of the most recent type analyses. This information lets us study type errors over time without revealing private data. We deployed the system in Roblox Studio during Spring 2023 and collected over 1.5 million telemetry records from over 340,000 sessions.</p>
<p><strong>Knowledge</strong>: We present several findings about Luau, all of which suggest that telemetry is an effective way to study type error pragmatics. One of the less-surprising findings is that opt-in gradual types are unpopular: there is a 100x gap between the number of untyped Luau sessions and the number of typed ones. One surprise is that the strict mode for type analysis is overly conservative about interactions with data assets. A reassuring finding is that type analysis rarely hits its internal limits on problem size.</p>
<p><strong>Grounding</strong>: Our findings are supported by a dataset of over 1.5 million telemetry records. The data and scripts for analyzing it are available in an artifact.</p>
<p><strong>Importance</strong>: Beyond the immediate benefits to Luau, our findings about types and type errors have implications for adoption and ergonomics in other gradual languages such as TypeScript, Elixir, and Typed Racket. Our telemetry design is of broad interest, as it reports on type errors without revealing sensitive information.</p>
Let a Thousand Flowers Bloom: An Algebraic Representation for Edge Graphs2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F9Liell-Cock, JackSchrijvers, Tom<p><strong>Context</strong>: Edge graphs are graphs whose edges are labelled with identifiers, and nodes can have multiple edges between them. They are used to model a wide range of systems, including networks with distances or degrees of connection and complex relational data.</p>
<p><strong>Inquiry</strong>: Unfortunately, the homogeneity of this graph structure prevents an effective representation in (functional) programs. Either their interface is riddled with partial functions, or the representations are computationally inefficient to process.</p>
<p><strong>Approach</strong>: We present a novel data type for edge graphs, based on total and recursive definitions, that prevents usage errors from partial APIs and promotes structurally recursive computations. We follow an algebraic approach and provide a set of primitive constructors and combinators, along with equational laws that identify semantically equivalent constructions.</p>
<p><strong>Knowledge</strong>: This algebra translates directly into an implementation using algebraic data types, and its homomorphisms give rise to functions for manipulating and transforming these edge graphs.</p>
<p><strong>Grounding</strong>: We exploit the fact that many common graph algorithms are such homomorphisms to implement them in our framework.</p>
<p><strong>Importance</strong>: In giving a theoretical grounding for the edge graph data type, we can formalise properties such as soundness and completeness of the representation while also minimising usage errors and maximising re-usability.</p>
Broadening the View of Live Programmers: Integrating a Cross-Cutting Perspective on Run-Time Behavior into a Live Programming Environment2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F13Rein, PatrickFlach, ChristianRamson, StefanKrebs, EvaHirschfeld, Robert<p>Live programming provides feedback on run-time behavior by visualizing concrete values of expressions close to the source code. When using such a local perspective on run-time behavior, programmers have to mentally reconstruct the control flow if they want to understand the relation between observed values. As this requires complete and correct knowledge of all relevant code, this reconstruction is impractical for larger programs as well as in the case of unexpected program behavior. In turn, cross-cutting perspectives on run-time behavior can visualize the actual control flow directly. At the same time, cross-cutting perspectives are often difficult to navigate due to the large number of run-time events.</p>
<p>We propose to integrate cross-cutting perspectives into live programming environments based on local perspectives so that the two complement each other: the cross-cutting perspective provides an overview of the run-time behavior; the local perspective provides detailed feedback as well as points of interest to navigate the cross-cutting perspective. We present a cross-cutting perspective prototype in the form of a call tree browser integrated into the Babylonian/S live programming environment. In an exploratory user study, we observed that programmers found the tool useful for debugging, code comprehension, and navigation. Finally, we discuss how our prototype illustrates how the features of live programming environments may serve as the basis for other powerful dynamic development tools.</p>
Arrays in Practice: An Empirical Study of Array Access Patterns on the JVM2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F14Åkerblom, BeatriceCastegren, Elias<p>The array is a data structure used in a wide range of programs.
Its compact storage and constant time random access makes it
highly efficient, but arbitrary indexing complicates the
analysis of code containing array accesses. Such analyses are
important for compiler optimisations such as bounds check
elimination.
The aim of this work is to gain a better understanding of how
arrays are used in real-world programs. While previous work has
applied static analyses to understand how arrays are accessed
and used, we take a dynamic approach.
We empirically examine various characteristics of array usage by
instrumenting programs to log all array accesses, allowing for
analysis of array sizes, element types, where arrays are
accessed from, and the extent to which sequences of array accesses form
recognizable patterns. The programs in the study were collected
from the Renaissance benchmark suite, all running on the Java
Virtual Machine.</p>
<p>We report on the characteristics displayed by the arrays
investigated, finding that most arrays are small in size, are
accessed by only one or two classes, and by a single thread. On
average over the benchmarks, 69.8% of the access patterns
consist of uncomplicated traversals. Most of the instrumented
classes (over 95%) do not use arrays directly at all.
These results come from tracing data covering 3,803,043,390
array accesses made across 168,686 classes.
While our analysis has only been applied to the Renaissance
benchmark suite, the methodology can be applied to any program
running on the Java Virtual Machine. This study, and the
methodology in general, can inform future runtime
implementations and compiler optimisations.</p>
Reactive Programming without Functions2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F11Oeyen, BjarnoDe Koster, JoeriDe Meuter, Wolfgang<p><strong>Context</strong>:
Reactive programming (RP) is a declarative programming paradigm suitable for expressing the handling of events.
It enables programmers to create applications that react automatically to changes over time.
Whenever a time-varying <strong>signal</strong> changes — e.g., in response to values produced by an event stream (sensor data, user input, …) — the program state is updated automatically in tandem with that change.
This makes RP well-suited for building interactive applications and reactive (soft real-time) systems.</p>
<p><strong>Inquiry</strong>:
RP language implementations are often built on top of an existing (host) language as an Embedded Domain Specific Language (EDSL).
This results in application code in which reactive and non-reactive code are inherently entangled.
Using a mechanism known as <strong>lifting</strong>, one usually has access to the full feature set of the (non-reactive) host language in the RP program.
However, lifting is also dangerous.
First, host code expressed in a Turing-complete language may diverge, resulting in unresponsive programs: i.e. reactive programs that are not actually reactive.
Second, the bi-directional integration of reactive and non-reactive code results in a paradigmatic mismatch that, when unchecked, leads to faulty behaviour in programs.</p>
<p><strong>Approach</strong>:
We propose a new reactive programming language that has been meticulously designed to be reactive-only.
We start with a simple (first-order) model for reactivity, based on <strong>reactors</strong> (i.e. uninstantiated descriptions of signals and their dependencies) and <strong>deployments</strong> (i.e. instances of reactors) that consist of <strong>signals</strong>.
The language does not have the notion of functions, and thus, unlike other RP languages, there is no lifting either.
We extend this simple model incrementally with additional features found in other programming languages, RP or otherwise.
These features include stateful reactors (that allow for time-based accumulation), signals with dynamic dependencies by means of conditionals and polymorphic deployments, recursively-defined reactors, and (anonymous) reactors with lexical scope.</p>
<p><strong>Knowledge</strong>:
In our description of these language features, we not only describe their syntax and semantics, but also how each feature relates to the problems that exist in (EDSL) RP languages.
That is, by starting from a reactive-only model, we identify which reactive features (that, in other RP languages, are typically expressed in non-reactive code) affect the <strong>reactive guarantees</strong> that can be enforced by the language.
<p><strong>Grounding</strong>:
We ground our arguments in an analysis of the effect that each feature has on our language: e.g., how signals are updated, how they are created, and how dependencies between signals can be affected.
When applicable, we draw parallels with other languages: i.e. similarities shared with other RP languages will be highlighted and thoroughly analysed, and where relevant the same will also be done with non-reactive languages.</p>
<p><strong>Importance</strong>:
Our language shows how purely reactive programming is able to express the same kinds of programs as other RP languages that require the use of (unchecked) functions.
By considering reactive programs as a collection of pure (reactive-only) reactors, we aim to improve how reactive programming is understood by both language designers and its users.
LiveRec: Prototyping Probes by Framing Debug Protocols2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F16Döderlein, Jean-BaptisteRozen, Riemer vanStorm, Tijs van der<p><strong>Context</strong>: In the first part of his 2012 presentation “Inventing on Principle”, Bret Victor gives a demo of a live code editor for Javascript which shows the dynamic history of values of variables in real time. This form of live programming has become known as “probes”. Probes provide the programmer with permanent and continuous insight into the dynamic evolution of function or method variables, thus improving feedback and developer experience.</p>
<p><strong>Inquiry</strong>: Although Victor shows a working prototype of live probes in the context of Javascript, he does not discuss strategies for implementing them. Later work provides an implementation approach, but this requires a programming language to be implemented on top of the GraalVM runtime. In this paper we present <strong>LiveRec</strong>, a generic approach for implementing probes which can be applied in the context of many programming languages, without requiring the modification of compilers or run-time systems.</p>
<p><strong>Approach</strong>: <strong>LiveRec</strong> is based on reusing existing debug protocols to implement probes. Methods or functions are compiled after every code change and executed inside the debugger. During execution the evolution of all local variables in the current stack frame are recorded and communicated back to the editor or IDE for display to the user.</p>
<p><strong>Knowledge</strong>: It turns out that mainstream debug protocols are rich enough for implementing live probes. Step-wise execution, code hot swapping, and stack frame inspection provide the right granularity and sufficient information to realize live probes, without modifying compilers or language runtimes. Furthermore, it turns out that the recently proposed Debugger Adapter Protocol (DAP) provides an even more generic approach of implementing live probes, but, in some cases, at the cost of a significant performance penalty.</p>
<p><strong>Grounding</strong>: We have applied <strong>LiveRec</strong> to implement probes using stack recording natively for Java through the Java Debug Interface (JDI), and through the DAP for Java, Python, C, and Javascript, all requiring just modest amounts of configuration code. We evaluate the run-time performance of all four probe prototypes, decomposed into: compile-after-change, hot swap, single step overhead, and stack recording overhead. Our initial results show that live probes on top of native debug APIs can be performant enough for interactive use. In the case of DAP, however, it highly depends on characteristics of the programming language implementation and its associated debugging infrastructure.</p>
<p><strong>Importance</strong>: Live programming improves the programmer experience by providing immediate feedback about a program’s execution and eliminating disruptive edit-compile-restart sequences. Probes are one way to shorten the programmer feedback loop at the level of functions and methods. Although probes are not new, and have been implemented in (prototype) systems, <strong>LiveRec</strong>’s approach of building live probes on top of existing and generic debug protocols promises a path towards probes for a host of mainstream programming languages, with reasonable effort.</p>
The Design Principles of the Elixir Type System2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F4Castagna, GiuseppeDuboc, GuillaumeValim, José<p>Elixir is a dynamically-typed functional language running on the Erlang Virtual Machine, designed for building scalable and maintainable applications. Its characteristics have earned it a surging adoption by hundreds of industrial actors and tens of thousands of developers. Static typing seems nowadays to be the most important request coming from the Elixir community. We present a gradual type system we plan to include in the Elixir compiler, outline its characteristics and design principles, and show by some short examples how to use it in practice.</p>
<p>Developing a static type system suitable for Erlang’s family of languages has been an open research problem for almost two decades. Our system transposes to this family of languages a polymorphic type system with set-theoretic types and semantic subtyping. To do that, we had to improve and extend both semantic subtyping and the typing techniques thereof, to account for several characteristics of these languages—and of Elixir in particular—such as the arity of functions, the use of guards, a uniform treatment of records and dictionaries, and the need for a new sound gradual typing discipline that does not rely on the insertion at compile time of specific run-time type-tests but, rather, takes into account both the type tests performed by the virtual machine and those explicitly added by the programmer.</p>
<p>The system presented here is “gradually” being implemented and integrated in Elixir, but a prototype implementation is already available.</p>
<p>The aim of this work is to serve as a longstanding reference that will be used to introduce types to Elixir programmers, as well as to hint at some future directions and possible evolutions of the Elixir language.</p>
Real-World Choreographic Programming: Full-Duplex Asynchrony and Interoperability2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F8Lugović, LovroMontesi, Fabrizio<p>In the paradigm of choreographic programming, the overall behaviour of a distributed system is coded as a choreography from a global viewpoint.
The choreography can then be automatically projected (compiled) to a correct implementation for each participant.
This paradigm is interesting because it relieves the programmer from manually writing the separate send and receive actions performed by participants, which simplifies development and avoids communication mismatches.</p>
<p>However, the applicability of choreographic programming in the real world remains largely unexplored.
The reason is twofold.
First, while there have been several proposals of choreographic programming languages, none of these languages have been used to implement a realistic, widely-used protocol.
Thus there is a lack of experience on how realistic choreographic programs are structured and on the relevance of the different features explored in theoretical models.
Second, applications of choreographic programming shown so far are intrusive, in the sense that each participant must use exactly the code projected from the choreography.
This prevents using the code generated from choreographies with existing third-party implementations of some participants, something that is very beneficial for testing or might even come as a requirement.</p>
<p>This paper addresses both problems.
In particular, we carry out the first development in choreographic programming of a widespread real-world protocol: the Internet Relay Chat (IRC) client–server protocol.
The development is based on Choral, an object-oriented higher-order choreographic programming language (choreographies can be parametric on choreographies and carry state).</p>
<p>We find that two of Choral’s features are key to our implementation: higher-order choreographies are used for modelling the complex interaction patterns that arise due to IRC’s asynchronous nature, while user-definable communication semantics are relevant for achieving interoperability with third-party implementations.
In the process we also discover a missing piece: the capability of statically detecting that choices on alternative distributed behaviours are appropriately communicated by means of message types, for which we extend the Choral compiler with an elegant solution based on subtyping.</p>
<p>Our Choral implementation of IRC arguably represents a milestone for choreographic programming, since it is the first empirical evidence that the paradigm can be used to faithfully codify protocols found “in the wild”.
We observe that the choreographic approach reduces the interaction complexity of our program, compared to the traditional approach of writing separate send and receive actions.
To check that our implementation is indeed interoperable with third-party software, we test it against publicly available conformance tests for IRC and some of the most popular IRC client and server software.
We also evaluate the performance and scalability of our implementation by means of dedicated performance tests.</p>
<p>Our experience shows that even if choreographic programming is still in its early life, it can already be applied to a realistic setting.</p>
Live Objects All The Way Down: Removing the Barriers between Applications and Virtual Machines2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F5Pimás, Javier E.Marr, StefanGarbervetsky, Diego<p>Object-oriented languages often use virtual machines (VMs)
that provide mechanisms such as just-in-time (JIT) compilation
and garbage collection (GC).
These VM components are typically implemented in a separate layer,
isolating them from the application.</p>
<p>While this approach brings the software engineering benefits
of clear separation and decoupling,
it introduces barriers for both understanding VM behavior
and evolving the VM implementation.
For example, the GC
and JIT compiler are typically
fixed at VM build time, limiting arbitrary adaptation at run time.
Furthermore,
because of this separation,
the implementation of the VM cannot typically be inspected and debugged
in the same way as application code,
enshrining a distinction between easy-to-work-with application code
and hard-to-work-with VM code.
These characteristics pose a barrier for application developers to
understand the engine on top of which their own code runs, and foster
a knowledge gap that prevents application developers from changing the VM.</p>
<p>We propose Live Metacircular Runtimes (LMRs) to overcome this
problem. LMRs are language runtime systems that seamlessly integrate the
VM into the application in live programming environments.
Unlike classic metacircular approaches, we propose to completely remove
the separation between application and VM.
By systematically applying object-oriented design to VM components,
we can build live runtime systems that are small and flexible
enough that VM engineers can benefit from live programming features such
as short feedback loops, and application developers with less VM expertise
can benefit from the stronger causal connections between their programs and
the VM implementation.</p>
<p>To evaluate our proposal, we implemented Bee/LMR, a live VM for a
Smalltalk-derivative environment in 22,057 lines of code.
We analyze case studies on tuning the garbage collector,
avoiding recompilations by the just-in-time compiler,
and adding support to optimize code with vector instructions
to demonstrate the trade-offs of extending exploratory
programming to VM development
in the context of an industrial application used in production.
Based on the case studies, we illustrate how our approach
facilitates the daily development work
of a small team of application developers.</p>
<p>Our approach enables VM developers to gain access to live
programming tools traditionally reserved for application
developers, while application developers can interact with the VM
and modify it using the high-level tools they use every day.
Both application and VM developers can seamlessly inspect, debug,
understand, and modify the different parts of the VM with shorter
feedback loops and higher-level tools.</p>
Conceptual Mutation Testing for Student Programming Misconceptions2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F7Prasad, SiddharthaGreenman, BenNelson, TimKrishnamurthi, Shriram<h4 id="context">Context</h4>
<p>Students often misunderstand programming problem descriptions. This
can lead them to solve the wrong problem, which creates frustration,
obstructs learning, and imperils grades.
Researchers have found that students can be made to better
understand the problem by writing <strong>examples</strong> before they start
programming. These examples are checked against correct and wrong
implementations—analogous to mutation testing—provided by course
staff. Doing so results in better student understanding of the problem
as well as better test suites to accompany the program, both of
which are desirable educational outcomes.</p>
<h4 id="inquiry">Inquiry</h4>
<p>Producing mutant implementations requires care. If there are
too many, or they are too obscure, students will end up spending a
lot of time on an unproductive task and also become
frustrated. Instead, we want a small number of mutants that each
correspond to <strong>common problem misconceptions</strong>. This paper
presents a workflow with partial automation to produce mutants of
this form which, notably, are <strong>not</strong> those produced by
mutation-testing tools.</p>
<h4 id="approach">Approach</h4>
<p>We comb through student tests that <strong>fail</strong> a correct
implementation. The student misconceptions are embedded in these
failures. We then use methods to <strong>semantically</strong> cluster these
failures. These clusters are then translated into <strong>conceptual
mutants</strong>. These can then be run against student data to determine
whether they are better than prior methods. Some of these
processes also enjoy automation.</p>
<h4 id="knowledge">Knowledge</h4>
<p>We find that student misconceptions illustrated by failing tests
can be operationalized by the above process. The resulting
mutants do much better at identifying student
misconceptions.</p>
<h4 id="grounding">Grounding</h4>
<p>Our findings are grounded in a manual analysis of student examples
and a quantitative evaluation of both our clustering techniques and our
process for making conceptual mutants.
The clustering evaluation compares against a ground truth using
standard cluster-correspondence measures, while the mutant evaluation
examines how conceptual mutants perform against student data.</p>
<h4 id="importance">Importance</h4>
<p>Our work contributes a workflow, with some automation, to reduce the
cost and increase the effectiveness of generating conceptually
interesting mutants. Such mutants can both improve learning
outcomes and reduce student frustration, leading to better
educational outcomes. In the process, we also identify a variation
of mutation testing not commonly discussed in the software
literature.</p>
Provably Fair Cooperative Scheduling2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F6Hähnle, ReinerHenrio, Ludovic<p>The <strong>context</strong> of this work is cooperative scheduling, a
concurrency paradigm, where task execution is not arbitrarily
preempted. Instead, language constructs exist that let a task
voluntarily yield the right to execute to another task.</p>
<p>The <strong>inquiry</strong> is the design of provably fair schedulers and
suitable notions of fairness for cooperative scheduling
languages. To the best of our knowledge, this problem has not been
addressed so far.</p>
<p>Our <strong>approach</strong> is to study fairness independently from
syntactic constructs or environments, purely from the point of view
of the semantics of programming languages, i.e., we consider
fairness criteria using the formal definition of a program
execution. We develop our concepts for classic structural
operational semantics (SOS) as well as for the recent <strong>locally
abstract, globally concrete</strong> (LAGC) semantics. The latter is a
highly modular approach to semantics ensuring the separation of
concerns between local statement evaluation and scheduling
decisions.</p>
<p>The new <strong>knowledge</strong> contributed by our work is threefold:
first, we show that a new fairness notion, called <strong>quiescent</strong>
fairness, is needed to characterize fairness adequately in the
context of cooperative scheduling; second, we define a provably fair
scheduler for cooperative scheduling languages; third, a qualitative
comparison between the SOS and LAGC versions yields that the latter,
while taking higher initial effort, is more amenable to proving
fairness and scales better under language extensions than SOS.</p>
<p>The <strong>grounding</strong> of our work is a detailed formal proof
of quiescent fairness for the scheduler defined in LAGC semantics.</p>
<p>The <strong>importance</strong> of our work is that it provides a formal
foundation for the implementation of fair schedulers for cooperative
scheduling languages, an increasingly popular paradigm (for example:
akka/Scala, JavaScript, async Rust). Being based solely on
semantics, our ideas are widely applicable. Further, our work makes
clear that the standard notion of fairness in concurrent languages
needs to be adapted for cooperative scheduling and, more generally,
for any language that combines atomic execution sequences with some
form of preemption.</p>
Coqlex: Generating Formally Verified Lexers2023-06-15T00:00:00+00:002023-06-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F3Ouedraogo, WendlasidaScherer, GabrielStrassburger, Lutz<p>A compiler consists of a sequence of phases going from lexical analysis to
code generation. Ideally, the formal verification of a compiler should include
the formal verification of each component of the tool-chain. An example is the
CompCert project, a formally verified C compiler, which comes with associated
tools and proofs that make it possible to formally verify most of those components.</p>
<p>However, some components, in particular the lexer, remain unverified. In fact,
the lexer of CompCert is generated using OCamllex, a lex-like OCaml lexer
generator that produces lexers from a set of regular expressions with
associated semantic actions. Even though there exist various approaches, like
CakeML or Verbatim++, to write verified lexers, they all have only limited
practical applicability.</p>
<p>In order to contribute to the end-to-end verification of compilers, we
implemented a generator of verified lexers whose usage is similar to OCamllex.
Our software, called Coqlex, reads a lexer specification and generates a lexer
equipped with a Coq proof of its correctness. It provides a formally verified
implementation of most features of standard, unverified lexer generators.</p>
<p>The conclusions of our work are two-fold: Firstly, verified lexers benefit
from following a user experience similar to lex/flex or OCamllex, with a
domain-specific syntax for writing lexers comfortably. This introduces a small gap
between the written artifact and the verified lexer, but our design minimizes
this gap and makes it practical to review the generated lexer. The user remains
able to prove further properties of their lexer. Secondly, it is possible to
combine simplicity and decent performance. Our implementation approach, which
uses Brzozowski derivatives, is noticeably simpler than the previous work in
Verbatim++, which generates a deterministic finite automaton (DFA) ahead
of time, and it is also noticeably faster thanks to careful design choices.</p>
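<p>To illustrate the derivative-based approach, the following is a minimal, unverified Python sketch (not Coqlex's verified Coq implementation, which also handles semantic actions and lexer-specific features): the derivative of a regular expression with respect to a character is the expression matching the remaining suffixes, so matching reduces to repeated derivation followed by a nullability check.</p>

```python
# Minimal, unverified sketch of Brzozowski-derivative matching
# (illustrative only; Coqlex's implementation is verified in Coq
# and far more complete).

EMPTY, EPS = ("empty",), ("eps",)  # the empty language and epsilon

def lit(c):    return ("lit", c)
def seq(a, b): return ("seq", a, b)
def alt(a, b): return ("alt", a, b)
def star(a):   return ("star", a)

def nullable(r):
    # Does r accept the empty string?
    tag = r[0]
    if tag in ("empty", "lit"): return False
    if tag in ("eps", "star"):  return True
    if tag == "seq": return nullable(r[1]) and nullable(r[2])
    if tag == "alt": return nullable(r[1]) or nullable(r[2])

def deriv(r, c):
    # The derivative of r w.r.t. c: the language of suffixes after c.
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "lit":
        return EPS if r[1] == c else EMPTY
    if tag == "seq":
        d = seq(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return seq(deriv(r[1], c), r)

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

ab_star = star(alt(lit("a"), lit("b")))  # (a|b)*
print(matches(ab_star, "abba"))  # True
print(matches(ab_star, "abc"))   # False
```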
<p>We wrote several example lexers that suggest that the convenience of using
Coqlex is close to that of standard, unverified generators, in particular,
OCamllex. We used Coqlex in an industrial project to implement a verified lexer
of Ada. This lexer is part of a tool to optimize safety-critical programs, some
of which are very large. This experience confirmed that Coqlex is usable in
practice, and in particular that its performance is good enough. Finally, we
performed detailed performance comparisons between Coqlex, OCamllex, and
Verbatim++. Verbatim++ is the state-of-the-art tool for verified lexers in Coq,
and the performance of its lexer was carefully optimized in previous work by
Egolf et al. (2022). Our results suggest that Coqlex is two orders of
magnitude slower than OCamllex, but two orders of magnitude faster than
Verbatim++.</p>
<p>Verified compilers and other language-processing tools are becoming important
tools for safety-critical or security-critical applications. They provide trust
and replace more costly approaches to certification, such as manually reading
the generated code. Verified lexers are a missing piece in several Coq-based
verified compilers today. Coqlex comes with safety guarantees, and thus shows
that it is possible to build formally verified front-ends.</p>
A VM-Agnostic and Backwards Compatible Protected Modifier for Dynamically-Typed Languages2023-06-15T00:00:00+00:002023-06-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F2Thomas, IonaAranega, VincentDucasse, StéphanePolito, GuillermoTesone, Pablo<p>In object-oriented languages, method visibility modifiers hold a key role in separating internal methods from the public API. Protected visibility modifiers offer a way to hide methods from external objects while authorizing internal use and overriding in subclasses. While present in mainstream statically-typed languages, visibility modifiers are not as common or mature in dynamically-typed languages.</p>
<p>In this article, we present ProtDyn, a self-send-based visibility model calculated at compile time for dynamically-typed languages relying on name-mangling and syntactic differentiation of self vs non self sends.</p>
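<p>The flavor of the model can be sketched in a few lines of Python. This is a hypothetical illustration, not the #Pharo or Python implementation from the article, and the mangling scheme and all names below are invented: a protected method is registered under a mangled name (double registration), self-sends are syntactically rewritten to the mangled name, and the public name is removed so external sends fail.</p>

```python
# Hypothetical sketch of self-send-based protected visibility via name
# mangling and double registration, in the spirit of ProtDyn (NOT the
# paper's implementation; the mangling scheme is invented here).

def mangle(name):
    # Invented mangling scheme for illustration.
    return f"__prot{name}"

class Account:
    def __init__(self):
        self.balance = 0

    def _credit(self, amount):  # intended to be protected
        self.balance += amount

    def deposit(self, amount):
        # A self-send: the compiler would syntactically rewrite this
        # call to the mangled name, authorizing internal use.
        getattr(self, mangle("_credit"))(amount)

# Double registration: reachable under the mangled name for self-sends
# (and for overriding in subclasses) ...
setattr(Account, mangle("_credit"), Account._credit)
# ... and removed from the public surface, so external sends fail.
del Account._credit

a = Account()
a.deposit(10)       # internal (self) send succeeds
print(a.balance)    # 10
try:
    a._credit(5)    # external send: method is no longer visible
except AttributeError:
    print("protected")
```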
<p>We present #Pharo, a ProtDyn implementation of this model that is backwards compatible with existing programs, and its port to Python. Using these implementations we study the performance impact of ProtDyn on the method lookup, in the presence of global lookup caches and polymorphic inline caches. We show that our name mangling and double method registration technique has a very low impact on performance and keeps the benefits from the global lookup cache and polymorphic inline cache. We also show that the memory overhead on a real use case is between 2% and 13% in the worst-case scenario.</p>
<p>Protected modifier semantics enforces encapsulation as private semantics does, while still allowing developers to extend the class in subclasses. ProtDyn offers a VM-agnostic and backwards-compatible design for introducing protected semantics into dynamically-typed languages.</p>
McMini: A Programmable DPOR-Based Model Checker for Multithreaded Programs2023-06-15T00:00:00+00:002023-06-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F1Pirtle, MaxwellJovanovic, LukaCooperman, Gene<h4 id="context">Context</h4>
<p>Model checking has become a key tool for gaining confidence
in correctness of multi-threaded programs. Unit tests and
functional tests do not suffice because of race conditions that
are not discovered by those tests. This problem is addressed
by model checking tools. A simple model checker is
useful for detecting race conditions prior to production.</p>
<h4 id="inquiry">Inquiry</h4>
<p>Current model checkers hardwire the
behavior of common thread operations, and do not recognize
application-dependent thread paradigms or functions built from
simpler primitive operations.
Modeling such paradigms in terms of the hardwired operations
introduces additional operations, causing current
model checkers to be excessively slow.
In addition, there is no mechanism to model the
semantics of the actual thread wakeup policies implemented in
the underlying thread library or operating system.
Eliminating these constraints can make model checkers faster.</p>
<h4 id="approach">Approach</h4>
<p>McMini is an <strong>extensible</strong> model checker
based on DPOR (Dynamic Partial Order Reduction). A mechanism
was invented to declare to McMini new, primitive thread
operations, typically in 100 lines or less of C code. The mechanism
was extended to also allow a user of McMini to declare alternative
thread wakeup policies, including spurious wakeups from condition
variables.</p>
<h4 id="knowledge">Knowledge</h4>
<p>In McMini, the user defines new
thread operations. The user optimizes these operations by declaring to
the DPOR algorithm information
that reduces the number of thread schedules to be searched.
One declares:
(i) under what conditions an operation
is enabled; (ii) which thread operations are independent of
each other; and (iii) when two operations can be considered
as co-enabled. An optional wakeup policy is implemented
by defining when a wait operation (on a semaphore, condition
variable, etc.) is enabled.
A new enqueue thread operation is described, allowing a user to
declare alternative wakeup policies.</p>
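<p>The three declarations can be sketched as follows. This is an illustrative Python stand-in, not McMini's actual C interface, and every name below is invented: an operation is declared together with predicates stating when it is enabled, which operations it is independent of, and when two operations are co-enabled.</p>

```python
# Illustrative sketch of declaring a primitive operation to a DPOR
# engine (McMini's real interface is C; all names here are invented).

from dataclasses import dataclass
from typing import Callable

@dataclass
class OpSpec:
    name: str
    enabled: Callable      # (state, op) -> bool: may op run now?
    independent: Callable  # (op_a, op_b) -> bool: do they commute?
    co_enabled: Callable   # (op_a, op_b) -> bool: can both be enabled?

registry = {}

def declare(spec: OpSpec):
    registry[spec.name] = spec

# Example declaration: a counting-semaphore wait is enabled only when
# the count is positive; waits on distinct semaphores are independent.
declare(OpSpec(
    name="sem_wait",
    enabled=lambda state, op: state["count"][op["sem"]] > 0,
    independent=lambda a, b: a["sem"] != b["sem"],
    co_enabled=lambda a, b: True,
))

state = {"count": {"s1": 1, "s2": 0}}
w1 = {"name": "sem_wait", "sem": "s1"}
w2 = {"name": "sem_wait", "sem": "s2"}
spec = registry["sem_wait"]
print(spec.enabled(state, w1))   # True
print(spec.enabled(state, w2))   # False
print(spec.independent(w1, w2))  # True
```

<p>Declaring a wait on a distinct semaphore as independent lets the DPOR search prune interleavings that differ only in the order of commuting operations.</p>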
<h4 id="grounding">Grounding</h4>
<p>McMini was first confirmed to operate correctly
and efficiently as a traditional, but extensible model checker
for mutex, semaphore, condition variable, and reader-writer lock.
McMini’s extensibility was then tested on novel primitive
operations, representing other useful paradigms for multithreaded
operations. An example is readers-and-two-writers.
Model checking was found to be five times faster
or more, compared to traditional implementations on top
of condition variables.
Alternative wakeup
policies (e.g., FIFO, LIFO, arbitrary,
etc.) were then tested using an enqueue operation.
Finally, spurious wakeups were tested with a program that exposes
a bug <strong>only</strong> in the presence of a spurious wakeup.</p>
<h4 id="importance">Importance</h4>
<p>Many applications employ functions for multithreaded paradigms that
go beyond the traditional mutex, semaphore, and condition
variables. They are defined on top of basic operations.
The ability to directly define new primitives for these
paradigms makes model checkers run faster by searching fewer
thread schedules.
The ability to model particular thread wakeup policies,
including spurious wakeup for condition variables, is
also important. Note that POSIX leaves undefined the
wakeup policies of <code class="language-plaintext highlighter-rouge">pthread_mutex_lock</code>,
<code class="language-plaintext highlighter-rouge">sem_wait</code>, and <code class="language-plaintext highlighter-rouge">pthread_cond_wait</code>. The POSIX
thread implementation then chooses a particular policy (e.g.,
FIFO, arbitrary), which can be directly modeled by McMini.</p>
Profiling and Optimizing Java Streams2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F10Rosales, EduardoBasso, MatteoRosà, AndreaBinder, Walter<p>The Stream API was added in Java 8 to allow the declarative expression of data-processing logic, typically map-reduce-like data transformations on collections and datasets. The Stream API introduces two key abstractions. The stream, which is a sequence of elements available in a data source, and the stream pipeline, which contains operations (e.g., map, filter, reduce) that are applied to the elements in the stream upon execution. Streams are getting popular among Java developers as they leverage the conciseness of functional programming and ease the parallelization of data processing.</p>
<p>Despite the benefits of streams, in comparison to data processing relying on imperative code, streams can introduce significant overheads, which are mainly caused by extra object allocations and reclamations, and the use of virtual method calls. As a result, developers need means to study the runtime behavior of streams with the goal of both mitigating such abstraction overheads and optimizing stream processing. Unfortunately, there is a lack of dedicated tools able to dynamically analyze streams to help developers specifically locate issues degrading application performance.</p>
<p>In this paper, we address the profiling and optimization of streams. We present a novel profiling technique for measuring the computations performed by a stream in terms of elapsed reference cycles, which we use to locate problematic streams with a major impact on application performance. While accuracy is crucial to this end, the inserted instrumentation code causes the execution of extra cycles, which are partially included in the profiles. To mitigate this issue, we estimate and compensate for the extra cycles caused by the inserted instrumentation code.</p>
<p>We implement our approach in StreamProf, which, to the best of our knowledge, is the first dedicated stream profiler for the Java Virtual Machine (JVM). With StreamProf, we find that cycle profiling is effective in detecting problematic streams whose optimization can enable significant performance gains. We also find that the accurate profiling of tasks supporting parallel stream processing allows the diagnosis of load imbalance according to the distribution of stream-related cycles at a thread level.</p>
<p>We conduct an evaluation on sequential and parallel stream-based workloads that are publicly available in three different sources. The evaluation shows that our profiling technique is efficient and yields accurate profiles. Moreover, we show the actionability of our profiles by guiding stream-related optimizations on two workloads from Renaissance. Our optimizations require the modification of only a few lines of code while achieving speedups up to a factor of 5x.</p>
<p>Java streams have been extensively studied by recent work, focusing on both how developers are using streams and how to optimize them. Current approaches in the optimization of streams mainly rely on static analysis techniques that overlook runtime information, suffer from important limitations to detect all streams executed by a Java application, or are not suitable for the analysis of parallel streams. Understanding the dynamic behavior of both sequential and parallel stream processing and its impact on application performance is crucial to help users make better decisions while using streams.</p>
Black Boxes, White Noise: Similarity Detection for Neural Functions2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F12Farmahinifarahani, FarimaLopes, Cristina V.<p>Similarity, or clone, detection has important applications in copyright violation, software theft, code search, and the detection of malicious components. There is now a good number of open source and proprietary clone detectors for programs written in traditional programming languages.
However, the increasing adoption of deep learning models in software poses a challenge to these tools: these models implement functions that are inscrutable black boxes.
As more software includes these DNN functions, new techniques are needed in order to assess the similarity between deep learning components of software.</p>
<p>Previous work has unveiled techniques for comparing the representations learned at various layers of deep neural network models by feeding canonical inputs to the models. Our goal is to be able to compare DNN functions when canonical inputs are not available, as they may not be in many application scenarios. The challenge, then, is to generate appropriate inputs and to identify a metric that, for those inputs, is capable of representing the degree of functional similarity between two comparable DNN functions.</p>
<p>Our approach uses random input with values between -1 and 1, in a shape that is compatible with what the DNN models expect. We then compare the outputs by performing correlation analysis.</p>
<p>Our study shows how it is possible to perform similarity analysis even in the absence of meaningful canonical inputs. The response to random inputs of two comparable DNN functions exposes those functions’ similarity, or lack thereof. Of all the metrics tried, we find that Spearman’s rank correlation coefficient is the most powerful and versatile, although in special cases other methods and metrics are more expressive.</p>
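<p>The core idea can be sketched as follows. The "models" here are plain numeric functions standing in for black-box DNNs, purely for illustration: random inputs in [-1, 1] are fed to each function, and Spearman's rank correlation of the outputs exposes behavioral similarity.</p>

```python
import random

# Stand-in sketch: the "models" are plain numeric functions, not real
# DNNs; the point is only random probing plus rank correlation.

def spearman(xs, ys):
    # Spearman's rho = Pearson correlation of the ranks
    # (no tie handling; fine for continuous random inputs).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

model_a = lambda x: 2.0 * x + 1.0   # behaviorally similar to model_b
model_b = lambda x: 2.1 * x + 0.9   # (same ordering of outputs)
model_c = lambda x: -x              # behaviorally dissimilar

random.seed(0)
inputs = [random.uniform(-1, 1) for _ in range(200)]  # random probes
ya = [model_a(x) for x in inputs]
yb = [model_b(x) for x in inputs]
yc = [model_c(x) for x in inputs]
print(round(spearman(ya, yb), 3))  # 1.0
print(round(spearman(ya, yc), 3))  # -1.0
```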
<p>We present a systematic empirical study comparing the effectiveness of several similarity metrics using a dataset of 56,355 classifiers collected from GitHub. This is accompanied by a sensitivity analysis that reveals how certain models’ training related properties affect the effectiveness of the similarity metrics.</p>
<p>To the best of our knowledge, this is the first work that shows how similarity of DNN functions can be detected by using random inputs. Our study of correlation metrics, and the identification of Spearman correlation coefficient as the most powerful among them for this purpose, establishes a complete and practical method for DNN clone detection that can be used in the design of new tools. It may also serve as inspiration for other program analysis tasks whose approaches break in the presence of DNN components.</p>
Control Flow Duplication for Columnar Arrays in a Dynamic Compiler2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F9Kloibhofer, SebastianMakor, LukasLeopoldseder, DavidBonetta, DanieleStadler, LukasMössenböck, Hanspeter<p>Columnar databases are an established way to speed up online analytical processing (OLAP) queries. Nowadays, data processing (e.g., storage, visualization, and analytics) is often performed at the programming language level, hence it is desirable to also adopt columnar data structures for common language runtimes.</p>
<p>While there are frameworks, libraries, and APIs to enable columnar data stores in programming languages, their integration into applications typically requires developer intervention.
In prior work, researchers implemented an approach for <strong>automated</strong> transformation of arrays into columnar arrays in the GraalVM JavaScript runtime.
However, this approach suffers from performance issues on smaller workloads as well as on more complex nested data structures.
We find that the key to optimizing accesses to columnar arrays is to identify queries and apply specific optimizations to them.</p>
<p>In this paper, we describe novel compiler optimizations in the GraalVM Compiler that optimize queries on columnar arrays.
At JIT compile time, we identify loops that access potentially columnar arrays and duplicate them in order to specifically optimize accesses to columnar arrays.
Additionally, we describe a new approach for creating columnar arrays from arrays consisting of complex objects by performing <strong>multi-level storage transformation</strong>. We demonstrate our approach via an implementation for JavaScript <code class="language-plaintext highlighter-rouge">Date</code> objects.</p>
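<p>The storage transformation itself can be sketched as follows: a hypothetical Python stand-in for what the JavaScript runtime does internally, rewriting an array of homogeneous objects into one contiguous column per field.</p>

```python
# Hypothetical stand-in for the runtime's storage transformation:
# an array of homogeneous objects becomes a struct-of-arrays.

rows = [
    {"year": 2021, "month": 5},
    {"year": 2022, "month": 6},
]

def to_columnar(rows):
    # One list per field, preserving row order.
    return {k: [r[k] for r in rows] for k in rows[0]}

cols = to_columnar(rows)
# An analytical query over one field now scans a single contiguous
# column instead of touching every object:
print(cols["year"])       # [2021, 2022]
print(sum(cols["year"]))  # 4043
```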
<p>Our work shows that automatic transformation of arrays to columnar storage is feasible even for small workloads and that more complex arrays of objects could benefit from a multi-level transformation.
Furthermore, we show how we can optimize methods that handle arrays in different states by the use of duplication.
We evaluated our work on microbenchmarks and established data analytics workloads (TPC-H) to demonstrate that it significantly outperforms previous efforts, with speedups of up to 10x for particular queries.
Queries additionally benefit from multi-level transformation, reaching speedups of around 2x.
Additionally, we show that we do not cause significant overhead on workloads not suitable for storage transformation.</p>
<p>We argue that automatically created columnar arrays could aid developers in data-centric applications as an alternative approach to using dedicated APIs on manually created columnar arrays. Via automatic detection and optimization of queries on potentially columnar arrays, we can improve performance of data processing and further enable its use in common—particularly dynamic—programming languages.</p>
Notes on “Notes on the Synthesis of Form”: Dawning Insights in Early Christopher Alexander2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F8Gabriel, Richard P.<p>This essay is a picaresque—a first-person narrative relating the adventures of a rogue (me) sifting through the mind of Christopher Alexander as he left behind formalized design thinking in favor of a more intuitive, almost spiritual process.</p>
<p>The work of Christopher Alexander is familiar to many computer scientists: for some it’s patterns, for some it’s the mystical <strong>quality without a name</strong> and “Nature of Order”; for many more it’s “Notes on the Synthesis of Form”—Alexander’s formalized design method and foreshadowing ideas about cohesion and coupling in software. Since the publication of “Design Patterns” by Gamma et al. in 1994, there have been hundreds of books published about design / software patterns, thousands of published pattern languages, and tens of thousands of published patterns.</p>
<p>“Notes,” published in 1964, was quickly followed by one of Alexander’s most important essays, “A City is Not a Tree,” in which he repudiates the formal method described in “Notes,” and his Preface to the paperback edition of “Notes” in 1971 repeats the repudiation. For many close readers of Alexander, this discontinuity is startling and unexplained.</p>
<p>When I finally read “Notes” in 2015, I was struck by the detailed worked example, along with a peculiar mathematical treatment of the method, and a hint that the modularization presented in the example was reckoned by a computer program he had written—all in the late 1950s and early 1960s. Because of my fascination with metaheuristic optimization, I couldn’t resist trying to replicate his experimental results.</p>
<p>Computers and their programs relish dwelling on flaws in your thinking—Alexander was not exempt. By engaging in hermeneutics and software archeology, I was able to uncover / discover the trajectory of his thinking as he encountered failures and setbacks with his computer programs. My attempted replication also failed, and that led me to try to unearth the five different programs he wrote, understand them, and figure out how one led to the next. They are not described in published papers, only in internal reports. My search for these reports led to their being made available on the Net.</p>
<p>What I found in my voyage were the early parts of a chain of thought that started with cybernetics, mathematics, and a plain-spoken computer; passed through “A City is Not a Tree”; paused to “make God appear in the middle of a field”; and ended with this fundamental design goal: <strong>I try to make the volume of the building so that it carries in it all feeling. To reach this feeling, I try to make the building so that it carries my eternal sadness. It comes, as nearly as I can in a building, to the point of tears.</strong></p>
Technical Dimensions of Programming Systems2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F13Jakubovic, JoelEdwards, JonathanPetricek, Tomas<p>Programming requires much more than just writing code in a programming language. It is usually done in the context of a stateful environment, by interacting with a system through a graphical user interface. Yet, this wide space of possibilities lacks a common structure for navigation. Work on programming systems fails to form a coherent body of research, making it hard to improve on past work and advance the state of the art.</p>
<p>In computer science, much has been said and done to allow comparison of <strong>programming languages</strong>, yet no similar theory exists for <strong>programming systems;</strong> we believe that programming systems deserve a theory too.</p>
<p>We present a framework of <strong>technical dimensions</strong> which capture the underlying characteristics of programming systems and provide a means for conceptualizing and comparing them.</p>
<p>We identify technical dimensions by examining past influential programming systems and reviewing their design principles, technical capabilities, and styles of user interaction. Technical dimensions capture characteristics that may be studied, compared and advanced independently. This makes it possible to talk about programming systems in a way that can be shared and constructively debated rather than relying solely on personal impressions.</p>
<p>Our framework is derived using a qualitative analysis of past programming systems. We outline two concrete ways of using our framework. First, we show how it can analyze a recently developed novel programming system. Then, we use it to identify an interesting unexplored point in the design space of programming systems.</p>
<p>Much research effort focuses on building programming systems that are easier to use, accessible to non-experts, moldable and/or powerful, but such efforts are disconnected. They are informal, guided by the personal vision of their authors and thus are only evaluable and comparable on the basis of individual experience using them. By providing foundations for more systematic research, we can help programming systems researchers to stand, at last, on the shoulders of giants.</p>
Symphony: Expressive Secure Multiparty Computation with Coordination2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F14Sweet, IanDarais, DavidHeath, DavidHarris, WilliamEstes, RyanHicks, Michael<h4 id="context">Context</h4>
<p>Secure Multiparty Computation (MPC) refers to a family of
cryptographic techniques where
mutually untrusting parties may compute functions of
their private inputs while revealing only the function output.</p>
<h4 id="inquiry">Inquiry</h4>
<p>It can be hard to program MPCs correctly and
efficiently using existing languages and frameworks, especially when
they require coordinating disparate computational roles. How can we
make this easier?</p>
<h4 id="approach">Approach</h4>
<p>We present Symphony, a new functional programming
language for MPCs among two or more parties. Symphony starts from the
single-instruction, multiple-data (SIMD) semantics of prior MPC
languages, in which each party carries out symmetric responsibilities,
and generalizes it using constructs that can coordinate many
parties. Symphony introduces <strong>first-class shares</strong> and
<strong>first-class party sets</strong> to provide unmatched language-level
expressive power with high efficiency.</p>
<h4 id="knowledge">Knowledge</h4>
<p>Developing a core formal language called λ-Symphony,
we prove that the intuitive, generalized SIMD view of a program
coincides with its actual distributed semantics. Thus the programmer
can reason about her programs by reading them from top to bottom, even
though in reality the program runs in a coordinated fashion,
distributed across many machines. We implemented a prototype
interpreter for Symphony leveraging multiple cryptographic backends.
With it we wrote a variety of MPC programs, finding that Symphony can express
optimized protocols that other languages cannot, and that in general
Symphony programs operate efficiently.</p>
<h4 id="grounding">Grounding</h4>
<p>In addition to developing the formal proofs, the
prototype implementation, and the MPC program case studies, we
measured the performance of Symphony’s implementation on several
benchmark programs and found it had performance comparable to Obliv-C, a
state-of-the-art two-party MPC framework for C, when running the same
programs. We also measured Symphony’s performance on an optimized
<strong>secure shuffle</strong> protocol based on a coordination pattern that no
prior language can express, and found it has far superior performance
to the standard alternative.</p>
<h4 id="importance">Importance</h4>
<p>Programming MPCs is in increasing demand, with a
proliferation of languages and frameworks. This work lowers the bar
for programmers wanting to write efficient, coordinated MPCs that they
can reason about and understand. The work applies to developers and
cryptographers wanting to design new applications and protocols, which
they are able to do at the language level, above the cryptographic
details. The λ-Symphony formalization of Symphony, and the proofs about
it, are also surprisingly simple, and can be a basis for follow-on
formalization work in MPC and distributed programming. All code and
artifacts are available, open-source.</p>
Primrose: Selecting Container Data Types by Their Properties2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F11Qin, XueyingO'Connor, LiamSteuwer, Michel<h4 id="context">Context</h4>
<p>Container data types are ubiquitous in computer programming, enabling developers to efficiently store and process collections of data with an easy-to-use programming interface.
Many programming languages offer a variety of container implementations in their standard libraries based on data structures offering different capabilities and performance characteristics.</p>
<h4 id="inquiry">Inquiry</h4>
<p>Choosing the <strong>best</strong> container for an application is not always straightforward, as performance characteristics can change drastically in different scenarios, and as real-world performance is not always correlated to theoretical complexity.</p>
<h4 id="approach">Approach</h4>
<p>We present Primrose, a language-agnostic tool for selecting the best performing valid container implementation from a set of container data types that satisfy <strong>properties</strong> given by application developers.
Primrose automatically selects the set of valid container implementations for which the <strong>library specifications</strong>, written by the developers of container libraries, satisfy the specified properties.
Finally, Primrose ranks the valid library implementations based on their runtime performance.</p>
<h4 id="knowledge">Knowledge</h4>
<p>With Primrose, application developers can specify the expected behaviour of a container as a type refinement with <strong>semantic properties</strong>, e.g., if the container should only contain unique values (such as a <code class="language-plaintext highlighter-rouge">set</code>) or should satisfy the LIFO property of a <code class="language-plaintext highlighter-rouge">stack</code>.
Semantic properties nicely complement <strong>syntactic properties</strong> (i.e., traits, interfaces, or type classes), together allowing developers to specify a container’s programming interface <strong>and</strong> behaviour without committing to a concrete implementation.</p>
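<p>For illustration only: Primrose encodes properties and library specifications as SMT verification conditions (via Rosette), but the idea of selecting a container by a semantic property such as uniqueness can be sketched with a simple property-based check in Python (all names below are invented).</p>

```python
# Illustration of property-driven container selection (this uses a
# property-based check; Primrose itself solves SMT verification
# conditions over Rust container specifications).

def insert(c, v):
    # Uniform insertion over Python's differing container interfaces.
    c.add(v) if hasattr(c, "add") else c.append(v)

def satisfies_unique(make):
    # Property: after inserting duplicates, each value appears once.
    c = make()
    for v in [1, 2, 2, 3, 3, 3]:
        insert(c, v)
    items = list(c)
    return len(items) == len(set(items))

candidates = {"list": list, "set": set}
valid = [name for name, make in candidates.items()
         if satisfies_unique(make)]
print(valid)  # ['set']
```

<p>Among the valid candidates, a tool like Primrose would then rank implementations by measured runtime performance.</p>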
<h4 id="grounding">Grounding</h4>
<p>We present our prototype implementation of Primrose that preprocesses annotated Rust code,
selects valid container implementations and ranks them on their performance. The design of Primrose is, however, language-agnostic, and is easy to integrate into other programming languages that support container data types and traits, interfaces, or type classes. Our implementation encodes properties and library specifications into verification conditions in Rosette, an interface for SMT solvers, which determines the set of valid container implementations. We evaluate Primrose by specifying several container implementations,
and measuring the time the solver takes to select valid implementations for various combinations of properties. We automatically validate that container implementations conform to their library specifications via property-based testing.</p>
<h4 id="importance">Importance</h4>
<p>This work provides a novel approach to bring abstract modelling and specification of container types directly into the programmer’s workflow.
Instead of selecting concrete container implementations, application programmers can now work on the level of specification, merely stating the behaviours they require from their container types,
and the best implementation can be selected automatically.</p>
Revisiting Language Support for Generic Programming: When Genericity Is a Core Design Goal2022-10-15T00:00:00+00:002022-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F4Chetioui, BenjaminJärvi, JaakkoHaveraaen, Magne
<h4 id="context">Context</h4>
<p>Generic programming, as defined by Stepanov, is a methodology
for writing efficient and reusable algorithms by considering only the required
properties of their underlying data types and operations. Generic programming
has proven to be an effective means of constructing libraries of reusable
software components in languages that support it. Generics-related language
design choices play a major role in how conducive a language is to generic programming in
practice.</p>
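The methodology can be illustrated with a minimal Rust sketch (an illustration of Stepanov-style genericity in general, not Magnolia code): the algorithm is written against only the properties it requires of its operations, here an associative `combine` with an `identity` element, and is then reused unchanged with any model of that concept.

```rust
// Stepanov-style generic programming: `reduce` depends only on the
// required properties of its operations (a monoid: an associative
// `combine` with an `identity` element), not on concrete types.
trait Monoid {
    fn identity() -> Self;
    fn combine(self, other: Self) -> Self;
}

// One model of the concept: integers under addition...
#[derive(Clone, Copy, Debug, PartialEq)]
struct Sum(i64);
impl Monoid for Sum {
    fn identity() -> Self { Sum(0) }
    fn combine(self, other: Self) -> Self { Sum(self.0 + other.0) }
}

// ...and another: integers under multiplication.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Product(i64);
impl Monoid for Product {
    fn identity() -> Self { Product(1) }
    fn combine(self, other: Self) -> Self { Product(self.0 * other.0) }
}

// One reusable algorithm, valid for every model of the concept.
fn reduce<M: Monoid>(items: impl IntoIterator<Item = M>) -> M {
    items.into_iter().fold(M::identity(), M::combine)
}

fn main() {
    assert_eq!(reduce(vec![Sum(1), Sum(2), Sum(3)]), Sum(6));
    assert_eq!(reduce(vec![Product(2), Product(3), Product(4)]), Product(24));
}
```

Rust traits express such requirements syntactically; the associativity property itself is left implicit, which is precisely the kind of gap that algebraic-specification languages in the style of Magnolia aim to close.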
<h4 id="inquiry">Inquiry</h4>
<p>Several mainstream programming languages (e.g. Java and C++)
were first created without generics; features to support generic programming
were added later, gradually. Much of the existing literature on supporting
generic programming focuses thus on retrofitting generic programming into
existing languages and identifying the related implementation challenges. Is the programming experience significantly better, or at least different, with a language designed for generic programming from the outset, free from limitations imposed by prior language design choices?</p>
<h4 id="approach">Approach</h4>
<p>We examine Magnolia, a language designed to embody generic
programming. Magnolia is representative of an approach to language design rooted
in algebraic specifications. We repeat a well-known experiment, where we put
Magnolia’s generic programming facilities under scrutiny by implementing a
subset of the Boost Graph Library, and reflect on our development experience.</p>
<h4 id="knowledge">Knowledge</h4>
<p>We discover that the idioms identified in previous studies as key features for
supporting Stepanov-style generic programming do not tell the full story.
We clarify which of them are more of a
means to an end rather than fundamental features for supporting generic
programming. Based on the development experience with Magnolia, we identify variadics as an additional key feature for generic programming and
point out limitations and challenges of genericity by property.</p>
<h4 id="grounding">Grounding</h4>
<p>Our work uses a well-known framework from the literature for evaluating the
generic programming facilities of a language to assess
the algebraic approach through Magnolia, and we draw comparisons with
well-known programming languages.</p>
<h4 id="importance">Importance</h4>
<p>This work gives a fresh perspective on generic programming,
and clarifies which language properties are fundamental, and what their trade-offs are,
when supporting Stepanov-style generic programming. This understanding of how to set the ground for generic programming will inform future language design.</p>
Out-of-Things Debugging: A Live Debugging Approach for Internet of Things2022-10-15T00:00:00+00:002022-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F5Rojas Castillo, CarlosMarra, MatteoBauwens, JimGonzalez Boix, Elisa
<h4 id="context">Context</h4>
<p>The Internet of Things (IoT) has become an important class of distributed systems thanks to the widespread availability of cheap embedded devices equipped with different networking technologies. Although ubiquitous, IoT systems remain challenging to develop.</p>
<h4 id="inquiry">Inquiry</h4>
<p>A recent field study with 194 IoT developers identifies debugging as one of the main challenges faced when developing IoT systems. This stems from the lack of debugging tools that take into account the unique properties of IoT systems, such as non-deterministic data and hardware-restricted devices. On the one hand, offline debuggers allow developers to analyse post-failure recorded program information, but impose too much overhead on the devices while generating such information.
Furthermore, the analysis process is time-consuming and might miss contextual information relevant to finding the root cause of bugs. On the other hand, online debuggers do allow debugging a program upon a failure while providing contextual information (e.g., a stack trace). In particular, remote online debuggers enable debugging devices without physical access to them. However, they suffer from debugging interference due to network delays, which complicates bug reproducibility, and they have limited support for dynamic software updates on remote devices.</p>
<h4 id="approach">Approach</h4>
<p>This paper proposes <em>out-of-things</em> debugging, an online debugging approach designed especially for IoT systems. The debugger is always-on: it is constantly available to debug, for instance, post-deployment failures. Upon a failure or breakpoint, out-of-things debugging moves the state of the deployed application to the developer’s machine. Developers can then debug the application locally by applying operations (e.g., step commands) to the retrieved state. Once debugging is finished, developers can commit bug fixes to the device through live update capabilities. Finally, by means of a fine-grained, flexible interface for accessing remote resources, developers have full control over the debugging overhead imposed on the device and over the access to device hardware resources (e.g., sensors) needed during local debugging.</p>
<h4 id="knowledge">Knowledge</h4>
<p>Out-of-things debugging maintains the good properties of remote debugging, as it does not require physical access to the device, while reducing debugging interference: operations (e.g., stepping) issued in the debugger happen locally and thus incur no network delays. Furthermore, device resources are accessed only when requested by the user, which further mitigates overhead and opens avenues for mocking or simulating non-accessed resources.</p>
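A deliberately simplified sketch of this idea, with hypothetical types rather than the actual WebAssembly debugger's API: the device ships one snapshot of the program state when a breakpoint fires, and every subsequent debugging operation runs on the developer's machine against that local copy.

```rust
// Hypothetical, much-simplified model of out-of-things debugging:
// one state transfer, then purely local stepping and inspection.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    pc: usize,        // program counter at the breakpoint
    locals: Vec<i64>, // captured local variables
}

// Stand-in for the device side: produce a snapshot at a breakpoint.
fn capture_on_device() -> Snapshot {
    Snapshot { pc: 42, locals: vec![1, 2, 3] }
}

// Local debugging session: operates only on the transferred state,
// so step commands incur no network round-trips.
struct LocalSession {
    state: Snapshot,
}

impl LocalSession {
    fn new(state: Snapshot) -> Self {
        LocalSession { state }
    }
    // A "step" advances the local copy; the device is not contacted.
    fn step(&mut self) {
        self.state.pc += 1;
    }
    fn inspect(&self, slot: usize) -> i64 {
        self.state.locals[slot]
    }
}

fn main() {
    let snapshot = capture_on_device(); // the single network transfer
    let mut session = LocalSession::new(snapshot);
    session.step(); // local, no network delay
    session.step();
    assert_eq!(session.state.pc, 44);
    assert_eq!(session.inspect(1), 2);
}
```

In the real system, device hardware resources (e.g., sensors) that the snapshot does not capture are fetched on demand through the fine-grained resource interface, or mocked locally.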
<h4 id="grounding">Grounding</h4>
<p>We implemented an out-of-things debugger as an extension to a WebAssembly virtual machine and benchmarked its suitability for IoT. In particular, we compared our solution to remote debugging alternatives on metrics such as network overhead, memory usage, scalability, and usability in production settings. From the benchmarks, we conclude that our debugger exhibits competitive performance and contains overhead, without sacrificing debugging convenience and flexibility.</p>
<h4 id="importance">Importance</h4>
<p>Out-of-things debugging enables debugging of IoT systems by means of classical online operations (e.g., stepwise execution) while addressing IoT-specific concerns (e.g., hardware limitations). We show that keeping the debugger always-on does not have to come at the cost of performance loss or increased overhead; instead, it can enable a smooth and flexible debugging experience for IoT systems.</p>