2024-03-16T10:29:00+00:00https://programming-journal.org//The Art, Science, and Engineering of ProgrammingThe Art, Science, and Engineering of Programming journal is a fully refereed, open access, free, electronic journal. It welcomes papers on the art of programming, broadly construed.The editors of The Art, Science, and Engineering of Programmingeditors@programming-journal.orgScheduling Garbage Collection for Energy Efficiency on Asymmetric Multicore Processors2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F10Shimchenko, MarinaÖsterlund, ErikWrigstad, Tobias<p>The growing concern for energy efficiency in the Information and Communication Technology (ICT) sector has prompted the exploration of resource management techniques. While hardware architectures, such as single-ISA asymmetric multicore processors (AMP), offer potential energy savings, there is still untapped potential for software optimizations. This paper aims to bridge this gap by investigating the scheduling of garbage collection (GC) activities on a heterogeneous architecture with both performance cores (“p-cores”) and energy cores (“e-cores”) to achieve energy savings.</p>
<p>Our study focuses on the concurrent ZGC collector in the context of Java Virtual Machines (JVM), as the energy aspect is not well studied in the context of latency-sensitive Java workloads. By comparing the energy efficiency, performance, latency, and memory utilization of executing GC on p-cores versus e-cores, we present compelling findings.</p>
<p>We demonstrate that scheduling GC work on e-cores leads to approximately 3% overall energy savings without degrading performance or mean latency, while requiring no additional effort from developers. The overall energy reduction can increase to 5.3±0.0225% by tuning the number of e-cores (still without changing the program!).</p>
<p>Our findings highlight the practicality and benefits of scheduling GC on e-cores, showcasing the potential for energy savings in heterogeneous architectures running Java workloads while meeting critical latency requirements. Our research contributes to the ongoing efforts toward achieving a more sustainable and efficient ICT sector.</p>
Collective Allocator Abstraction to Control Object Spatial Locality in C++2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F15Hideshima, TakatoSato, ShigeyukiUgawa, Tomoharu<p>Disaggregated memory is promising for improving memory utilization in
computer clusters in which memory demands vary significantly across
computer nodes, leaving memory under-utilized. It allows applications with high
memory demands to use memory on other computer nodes.</p>
<p>However, disaggregated memory is not easy to use for implementing data
structures in C++ because the C++ standard does not provide an adequate
abstraction to use it efficiently in a high-level, modular manner.
Because accessing remote memory involves high latency, disaggregated
memory is often used as a far-memory system, which forms a kind of swap
memory where part of local memory is used
as a cache area, while the remaining memory is not subject to swapping.
To pursue performance, programmers have to be aware of this
nonuniform memory view and place data appropriately to minimize swapping.</p>
<p>In this work, we model the address space of memory-disaggregated systems
as the far-memory model, present the collective allocator abstraction,
which enables us to specify object placement aware of memory address
subspaces, and apply it to programming aware of the far-memory model.</p>
<p>The far-memory model provides a view of the nonuniform memory space
while hiding the details. In the model, the virtual address space is
divided into two subspaces: one is subject to swapping and the other is
not. The swapping subspace is further divided into even-sized pages,
which are units of swapping. The collective allocator abstraction
forms an allocator as a collection of sub-allocators, each of which
owns a distinct subspace, where every allocation is done via
sub-allocators. It enables us to control object placement at allocation
time by selecting an appropriate sub-allocator according to different
criteria, such as subspace characteristics and object collocation.
It greatly facilitates implementing container data structures aware of
the far-memory model.</p>
<p>We develop an allocator based on the collective allocator abstraction by
extending the C++ standard allocator for container data structures on
the far-memory model and experimentally demonstrate that it facilitates
implementing containers equipped with object placement strategies aware of
spatial locality under the far-memory model in a high-level, modular
manner. More specifically, we have successfully implemented B-trees and
skip lists with the combined use of two placement strategies. The
modifications to the original implementations are fairly modest:
additions are mostly due to specifying object placement; deletions and
modifications amount to at most 1.2% and 3.2% of the lines of the
original code, respectively. We have experimentally confirmed that the
modified implementations produce data layouts that suppress swapping.</p>
<p>We forecast that the collective allocator abstraction would be a key to
high-level integration with different memory hardware technologies
because it straightforwardly accommodates new interfaces for subspaces.</p>
Privacy-Respecting Type Error Telemetry at Scale2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F12Greenman, BenJeffrey, AlanKrishnamurthi, ShriramShah, Mitesh<p><strong>Context</strong>: Roblox Studio lets millions of creators build interactive experiences by programming in a variant of Lua called Luau. The creators form a broad group, ranging from novices writing their first script to professional developers; thus, Luau must support a wide audience. As part of its efforts to support all kinds of programmers, Luau includes an optional, gradual type system and goes to great lengths to minimize false positive errors.</p>
<p><strong>Inquiry</strong>: Since Luau is currently being used by many creators, we want to collect data to improve the language and, in particular, the type system. The standard way to collect data is to deploy client-side telemetry; however, we cannot scrape personal data or proprietary information, which means we cannot collect source code fragments, error messages, or even filepaths. The research questions are thus about how to conduct telemetry that is not invasive and obtain insights from it about type errors.</p>
<p><strong>Approach</strong>: We designed and implemented a pseudonymized, randomly-sampling telemetry system for Luau. Telemetry records include a timestamp, a session id, a reason for sending, and a numeric summary of the most recent type analyses. This information lets us study type errors over time without revealing private data. We deployed the system in Roblox Studio during Spring 2023 and collected over 1.5 million telemetry records from over 340,000 sessions.</p>
<p><strong>Knowledge</strong>: We present several findings about Luau, all of which suggest that telemetry is an effective way to study type error pragmatics. One of the less-surprising findings is that opt-in gradual types are unpopular: there is a 100x gap between the number of untyped Luau sessions and the number of typed ones. One surprise is that the strict mode for type analysis is overly conservative about interactions with data assets. A reassuring finding is that type analysis rarely hits its internal limits on problem size.</p>
<p><strong>Grounding</strong>: Our findings are supported by a dataset of over 1.5 million telemetry records. The data and scripts for analyzing it are available in an artifact.</p>
<p><strong>Importance</strong>: Beyond the immediate benefits to Luau, our findings about types and type errors have implications for adoption and ergonomics in other gradual languages such as TypeScript, Elixir, and Typed Racket. Our telemetry design is of broad interest, as it reports on type errors without revealing sensitive information.</p>
Let a Thousand Flowers Bloom: An Algebraic Representation for Edge Graphs2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F9Liell-Cock, JackSchrijvers, Tom<p><strong>Context</strong>: Edge graphs are graphs whose edges are labelled with identifiers, and nodes can have multiple edges between them. They are used to model a wide range of systems, including networks with distances or degrees of connection and complex relational data.</p>
<p><strong>Inquiry</strong>: Unfortunately, the homogeneity of this graph structure prevents an effective representation in (functional) programs. Either their interface is riddled with partial functions, or the representations are computationally inefficient to process.</p>
<p><strong>Approach</strong>: We present a novel data type for edge graphs, based on total and recursive definitions, that prevents usage errors from partial APIs and promotes structurally recursive computations. We follow an algebraic approach and provide a set of primitive constructors and combinators, along with equational laws that identify semantically equivalent constructions.</p>
<p><strong>Knowledge</strong>: This algebra translates directly into an implementation using algebraic data types, and its homomorphisms give rise to functions for manipulating and transforming these edge graphs.</p>
<p><strong>Grounding</strong>: We exploit the fact that many common graph algorithms are such homomorphisms to implement them in our framework.</p>
<p><strong>Importance</strong>: In giving a theoretical grounding for the edge graph data type, we can formalise properties such as soundness and completeness of the representation while also minimising usage errors and maximising re-usability.</p>
Broadening the View of Live Programmers: Integrating a Cross-Cutting Perspective on Run-Time Behavior into a Live Programming Environment2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F13Rein, PatrickFlach, ChristianRamson, StefanKrebs, EvaHirschfeld, Robert<p>Live programming provides feedback on run-time behavior by visualizing concrete values of expressions close to the source code. When using such a local perspective on run-time behavior, programmers have to mentally reconstruct the control flow if they want to understand the relation between observed values. As this requires complete and correct knowledge of all relevant code, this reconstruction is impractical for larger programs as well as in the case of unexpected program behavior. In turn, cross-cutting perspectives on run-time behavior can visualize the actual control flow directly. At the same time, cross-cutting perspectives are often difficult to navigate due to the large number of run-time events.</p>
<p>We propose to integrate cross-cutting perspectives into live programming environments based on local perspectives so that the two complement each other: the cross-cutting perspective provides an overview of the run-time behavior; the local perspective provides detailed feedback as well as points of interest to navigate the cross-cutting perspective. We present a cross-cutting perspective prototype in the form of a call tree browser integrated into the Babylonian/S live programming environment. In an exploratory user study, we observed that programmers found the tool useful for debugging, code comprehension, and navigation. Finally, we discuss how our prototype illustrates how the features of live programming environments may serve as the basis for other powerful dynamic development tools.</p>
Arrays in Practice: An Empirical Study of Array Access Patterns on the JVM2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F14Åkerblom, BeatriceCastegren, Elias<p>The array is a data structure used in a wide range of programs.
Its compact storage and constant time random access makes it
highly efficient, but arbitrary indexing complicates the
analysis of code containing array accesses. Such analyses are
important for compiler optimisations such as bounds check
elimination.
The aim of this work is to gain a better understanding of how
arrays are used in real-world programs. While previous work has
applied static analyses to understand how arrays are accessed
and used, we take a dynamic approach.
We empirically examine various characteristics of array usage by
instrumenting programs to log all array accesses, allowing for
analysis of array sizes, element types, where arrays are
accessed from, and the extent to which sequences of array accesses form
recognizable patterns. The programs in the study were collected
from the Renaissance benchmark suite, all running on the Java
Virtual Machine.</p>
<p>We report on the characteristics displayed by the arrays
investigated, finding that most arrays are small in size, are
accessed by only one or two classes, and by a single thread. On
average over the benchmarks, 69.8% of the access patterns
consist of uncomplicated traversals. Most of the instrumented
classes (over 95%) do not use arrays directly at all.
These results come from tracing data covering 3,803,043,390
array accesses made across 168,686 classes.
While our analysis has only been applied to the Renaissance
benchmark suite, the methodology can be applied to any program
running on the Java Virtual Machine. This study, and the
methodology in general, can inform future runtime
implementations and compiler optimisations.</p>
Reactive Programming without Functions2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F11Oeyen, BjarnoDe Koster, JoeriDe Meuter, Wolfgang<p><strong>Context</strong>:
Reactive programming (RP) is a declarative programming paradigm suitable for expressing the handling of events.
It enables programmers to create applications that react automatically to changes over time.
Whenever a time-varying <strong>signal</strong> changes — e.g., in response to values produced by an event stream (sensor data, user input, …) — the program state is updated automatically in tandem with that change.
This makes RP well-suited for building interactive applications and reactive (soft real-time) systems.</p>
<p><strong>Inquiry</strong>:
RP language implementations are often built on top of an existing (host) language as an Embedded Domain Specific Language (EDSL).
This results in application code in which reactive and non-reactive code are inherently entangled.
Using a mechanism known as <strong>lifting</strong>, one usually has access to the full feature set of the (non-reactive) host language in the RP program.
However, lifting is also dangerous.
First, host code expressed in a Turing-complete language may diverge, resulting in unresponsive programs: i.e. reactive programs that are not actually reactive.
Second, the bi-directional integration of reactive and non-reactive code results in a paradigmatic mismatch that, when unchecked, leads to faulty behaviour in programs.</p>
<p><strong>Approach</strong>:
We propose a new reactive programming language that has been meticulously designed to be reactive-only.
We start with a simple (first-order) model for reactivity, based on <strong>reactors</strong> (i.e. uninstantiated descriptions of signals and their dependencies) and <strong>deployments</strong> (i.e. instances of reactors) that consist of <strong>signals</strong>.
The language does not have the notion of functions, and thus, unlike other RP languages, there is no lifting either.
We extend this simple model incrementally with additional features found in other programming languages, RP or otherwise.
These features include stateful reactors (that allow for time-based accumulation), signals with dynamic dependencies by means of conditionals and polymorphic deployments, recursively-defined reactors, and (anonymous) reactors with lexical scope.</p>
<p><strong>Knowledge</strong>:
In our description of these language features, we not only describe their syntax and semantics, but also how each feature relates to the problems that exist in (EDSL) RP languages.
That is, by starting from a reactive-only model, we identify which reactive features (that, in other RP languages, are typically expressed in non-reactive code) affect the <strong>reactive guarantees</strong> that can be enforced by the language.
<p><strong>Grounding</strong>:
We ground our arguments in an analysis of the effect that each feature has on our language: e.g., how signals are updated, how they are created, and how dependencies between signals can be affected.
When applicable, we draw parallels with other languages: i.e. similarities shared with other RP languages will be highlighted and thoroughly analysed, and where relevant the same will also be done with non-reactive languages.</p>
<p><strong>Importance</strong>:
Our language shows how purely reactive programming is able to express the same kinds of programs as other RP languages that require the use of (unchecked) functions.
By considering reactive programs as a collection of pure (reactive-only) reactors, we aim to improve how reactive programming is understood by both language designers and its users.
LiveRec: Prototyping Probes by Framing Debug Protocols2024-02-15T00:00:00+00:002024-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F16Döderlein, Jean-BaptisteRozen, Riemer vanStorm, Tijs van der<p><strong>Context</strong>: In the first part of his 2012 presentation “Inventing on Principle”, Bret Victor gives a demo of a live code editor for Javascript which shows the dynamic history of values of variables in real time. This form of live programming has become known as “probes”. Probes provide the programmer with permanent and continuous insight into the dynamic evolution of function or method variables, thus improving feedback and developer experience.</p>
<p><strong>Inquiry</strong>: Although Victor shows a working prototype of live probes in the context of Javascript, he does not discuss strategies for implementing them. Later work provides an implementation approach, but this requires a programming language to be implemented on top of the GraalVM runtime. In this paper we present <strong>LiveRec</strong>, a generic approach for implementing probes which can be applied in the context of many programming languages, without requiring the modification of compilers or run-time systems.</p>
<p><strong>Approach</strong>: <strong>LiveRec</strong> is based on reusing existing debug protocols to implement probes. Methods or functions are compiled after every code change and executed inside the debugger. During execution the evolution of all local variables in the current stack frame are recorded and communicated back to the editor or IDE for display to the user.</p>
<p><strong>Knowledge</strong>: It turns out that mainstream debug protocols are rich enough for implementing live probes. Step-wise execution, code hot swapping, and stack frame inspection provide the right granularity and sufficient information to realize live probes, without modifying compilers or language runtimes. Furthermore, it turns out that the recently proposed Debugger Adapter Protocol (DAP) provides an even more generic approach of implementing live probes, but, in some cases, at the cost of a significant performance penalty.</p>
<p><strong>Grounding</strong>: We have applied <strong>LiveRec</strong> to implement probes using stack recording natively for Java through the Java Debug Interface (JDI), and through the DAP for Java, Python, C, and Javascript, all requiring just modest amounts of configuration code. We evaluate the run-time performance of all four probe prototypes, decomposed into: compile-after-change, hot swap, single step overhead, and stack recording overhead. Our initial results show that live probes on top of native debug APIs can be performant enough for interactive use. In the case of DAP, however, it highly depends on characteristics of the programming language implementation and its associated debugging infrastructure.</p>
<p><strong>Importance</strong>: Live programming improves the programmer experience by providing immediate feedback about a program’s execution and eliminating disruptive edit-compile-restart sequences. Probes are one way to shorten the programmer feedback loop at the level of functions and methods. Although probes are not new, and have been implemented in (prototype) systems, <strong>LiveRec</strong>’s approach of building live probes on top of existing and generic debug protocols promises a path towards probes for a host of mainstream programming languages, with reasonable effort.</p>
The Design Principles of the Elixir Type System2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F4Castagna, GiuseppeDuboc, GuillaumeValim, José<p>Elixir is a dynamically-typed functional language running on the Erlang Virtual Machine, designed for building scalable and maintainable applications. Its characteristics have earned it a surging adoption by hundreds of industrial actors and tens of thousands of developers. Static typing seems nowadays to be the most important request coming from the Elixir community. We present a gradual type system we plan to include in the Elixir compiler, outline its characteristics and design principles, and show by some short examples how to use it in practice.</p>
<p>Developing a static type system suitable for Erlang’s family of languages has been an open research problem for almost two decades. Our system transposes to this family of languages a polymorphic type system with set-theoretic types and semantic subtyping. To do that, we had to improve and extend both semantic subtyping and the typing techniques thereof, to account for several characteristics of these languages—and of Elixir in particular—such as the arity of functions, the use of guards, a uniform treatment of records and dictionaries, and the need for a new sound gradual typing discipline that does not rely on the insertion at compile time of specific run-time type-tests but, rather, takes into account both the type tests performed by the virtual machine and those explicitly added by the programmer.</p>
<p>The system presented here is “gradually” being implemented and integrated in Elixir, but a prototype implementation is already available.</p>
<p>The aim of this work is to serve as a longstanding reference that will be used to introduce types to Elixir programmers, as well as to hint at some future directions and possible evolutions of the Elixir language.</p>
Real-World Choreographic Programming: Full-Duplex Asynchrony and Interoperability2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F8Lugović, LovroMontesi, Fabrizio<p>In the paradigm of choreographic programming, the overall behaviour of a distributed system is coded as a choreography from a global viewpoint.
The choreography can then be automatically projected (compiled) to a correct implementation for each participant.
This paradigm is interesting because it relieves the programmer from manually writing the separate send and receive actions performed by participants, which simplifies development and avoids communication mismatches.</p>
<p>However, the applicability of choreographic programming in the real world remains largely unexplored.
The reason is twofold.
First, while there have been several proposals of choreographic programming languages, none of these languages have been used to implement a realistic, widely-used protocol.
Thus there is a lack of experience on how realistic choreographic programs are structured and on the relevance of the different features explored in theoretical models.
Second, applications of choreographic programming shown so far are intrusive, in the sense that each participant must use exactly the code projected from the choreography.
This prevents using the code generated from choreographies with existing third-party implementations of some participants, something that is very beneficial for testing or might even come as a requirement.</p>
<p>This paper addresses both problems.
In particular, we carry out the first development in choreographic programming of a widespread real-world protocol: the Internet Relay Chat (IRC) client–server protocol.
The development is based on Choral, an object-oriented higher-order choreographic programming language (choreographies can be parametric on choreographies and carry state).</p>
<p>We find that two of Choral’s features are key to our implementation: higher-order choreographies are used for modelling the complex interaction patterns that arise due to IRC’s asynchronous nature, while user-definable communication semantics are relevant for achieving interoperability with third-party implementations.
In the process we also discover a missing piece: the capability of statically detecting that choices on alternative distributed behaviours are appropriately communicated by means of message types, for which we extend the Choral compiler with an elegant solution based on subtyping.</p>
<p>Our Choral implementation of IRC arguably represents a milestone for choreographic programming, since it is the first empirical evidence that the paradigm can be used to faithfully codify protocols found “in the wild”.
We observe that the choreographic approach reduces the interaction complexity of our program, compared to the traditional approach of writing separate send and receive actions.
To check that our implementation is indeed interoperable with third-party software, we test it against publicly available conformance tests for IRC and some of the most popular IRC client and server software.
We also evaluate the performance and scalability of our implementation by means of dedicated performance tests.</p>
<p>Our experience shows that even if choreographic programming is still in its early life, it can already be applied to a realistic setting.</p>
Live Objects All The Way Down: Removing the Barriers between Applications and Virtual Machines2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F5Pimás, Javier E.Marr, StefanGarbervetsky, Diego<p>Object-oriented languages often use virtual machines (VMs)
that provide mechanisms such as just-in-time (JIT) compilation
and garbage collection (GC).
These VM components are typically implemented in a separate layer,
isolating them from the application.</p>
<p>While this approach brings the software engineering benefits
of clear separation and decoupling,
it introduces barriers for both understanding VM behavior
and evolving the VM implementation.
For example, the GC
and JIT compiler are typically
fixed at VM build time, limiting arbitrary adaptation at run time.
Furthermore,
because of this separation,
the implementation of the VM cannot typically be inspected and debugged
in the same way as application code,
enshrining a distinction between easy-to-work-with application code
and hard-to-work-with VM code.
These characteristics pose a barrier for application developers to
understand the engine on top of which their own code runs, and foster
a knowledge gap that prevents application developers from changing the VM.</p>
<p>We propose Live Metacircular Runtimes (LMRs) to overcome this
problem. LMRs are language runtime systems that seamlessly integrate the
VM into the application in live programming environments.
Unlike classic metacircular approaches, we propose to completely remove
the separation between application and VM.
By systematically applying object-oriented design to VM components,
we can build live runtime systems that are small and flexible
enough that VM engineers can benefit from live programming features such
as short feedback loops, and application developers with less VM expertise
can benefit from the stronger causal connections between their programs and
the VM implementation.</p>
<p>To evaluate our proposal, we implemented Bee/LMR, a live VM for a
Smalltalk-derivative environment in 22,057 lines of code.
We analyze case studies on tuning the garbage collector,
avoiding recompilations by the just-in-time compiler,
and adding support to optimize code with vector instructions
to demonstrate the trade-offs of extending exploratory
programming to VM development
in the context of an industrial application used in production.
Based on the case studies, we illustrate how our approach
facilitates the daily development work
of a small team of application developers.</p>
<p>Our approach enables VM developers to gain access to live
programming tools traditionally reserved for application
developers, while application developers can interact with the VM
and modify it using the high-level tools they use every day.
Both application and VM developers can seamlessly inspect, debug,
understand, and modify the different parts of the VM with shorter
feedback loops and higher-level tools.</p>
Conceptual Mutation Testing for Student Programming Misconceptions2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F7Prasad, SiddharthaGreenman, BenNelson, TimKrishnamurthi, Shriram<h4 id="context">Context</h4>
<p>Students often misunderstand programming problem descriptions. This
can lead them to solve the wrong problem, which creates frustration,
obstructs learning, and imperils grades.
Researchers have found that students can be made to better
understand the problem by writing <strong>examples</strong> before they start
programming. These examples are checked against correct and wrong
implementations—analogous to mutation testing—provided by course
staff. Doing so results in better student understanding of the problem
as well as better test suites to accompany the program, both of
which are desirable educational outcomes.</p>
<h4 id="inquiry">Inquiry</h4>
<p>Producing mutant implementations requires care. If there are
too many, or they are too obscure, students will end up spending a
lot of time on an unproductive task and also become
frustrated. Instead, we want a small number of mutants that each
correspond to <strong>common problem misconceptions</strong>. This paper
presents a workflow with partial automation to produce mutants of
this form which, notably, are <strong>not</strong> those produced by
mutation-testing tools.</p>
<h4 id="approach">Approach</h4>
<p>We comb through student tests that <strong>fail</strong> a correct
implementation. The student misconceptions are embedded in these
failures. We then use methods to <strong>semantically</strong> cluster these
failures. These clusters are then translated into <strong>conceptual
mutants</strong>. These can then be run against student data to determine
whether they are better than prior methods. Some of these
processes also enjoy automation.</p>
<h4 id="knowledge">Knowledge</h4>
<p>We find that student misconceptions illustrated by failing tests
can be operationalized by the above process. The resulting
mutants do much better at identifying student
misconceptions.</p>
<h4 id="grounding">Grounding</h4>
<p>Our findings are grounded in a manual analysis of student examples
and a quantitative evaluation of both our clustering techniques and our
process for making conceptual mutants.
The clustering evaluation compares against a ground truth using
standard cluster-correspondence measures, while the mutant evaluation
examines how conceptual mutants perform against student data.</p>
<h4 id="importance">Importance</h4>
<p>Our work contributes a workflow, with some automation, to reduce the
cost and increase the effectiveness of generating conceptually
interesting mutants. Such mutants can both improve learning
outcomes and reduce student frustration, leading to better
educational outcomes. In the process, we also identify a variation
of mutation testing not commonly discussed in the software
literature.</p>
Provably Fair Cooperative Scheduling2023-10-15T00:00:00+00:002023-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F6Hähnle, ReinerHenrio, Ludovic<p>The <strong>context</strong> of this work is cooperative scheduling, a
concurrency paradigm, where task execution is not arbitrarily
preempted. Instead, language constructs exist that let a task
voluntarily yield the right to execute to another task.</p>
<p>The <strong>inquiry</strong> is the design of provably fair schedulers and
suitable notions of fairness for cooperative scheduling
languages. To the best of our knowledge, this problem has not been
addressed so far.</p>
<p>Our <strong>approach</strong> is to study fairness independently from
syntactic constructs or environments, purely from the point of view
of the semantics of programming languages, i.e., we consider
fairness criteria using the formal definition of a program
execution. We develop our concepts for classic structural
operational semantics (SOS) as well as for the recent <strong>locally
abstract, globally concrete</strong> (LAGC) semantics. The latter is a
highly modular approach to semantics ensuring the separation of
concerns between local statement evaluation and scheduling
decisions.</p>
<p>The new <strong>knowledge</strong> contributed by our work is threefold:
first, we show that a new fairness notion, called <strong>quiescent</strong>
fairness, is needed to characterize fairness adequately in the
context of cooperative scheduling; second, we define a provably fair
scheduler for cooperative scheduling languages; third, a qualitative
comparison between the SOS and LAGC versions yields that the latter,
while taking higher initial effort, is more amenable to proving
fairness and scales better under language extensions than SOS.</p>
<p>The <strong>grounding</strong> of our work is a detailed formal proof
of quiescent fairness for the scheduler defined in LAGC semantics.</p>
<p>The <strong>importance</strong> of our work is that it provides a formal
foundation for the implementation of fair schedulers for cooperative
scheduling languages, an increasingly popular paradigm (for example:
akka/Scala, JavaScript, async Rust). Being based solely on
semantics, our ideas are widely applicable. Further, our work makes
clear that the standard notion of fairness in concurrent languages
needs to be adapted for cooperative scheduling and, more generally,
for any language that combines atomic execution sequences with some
form of preemption.</p>
Coqlex: Generating Formally Verified Lexers2023-06-15T00:00:00+00:002023-06-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F3Ouedraogo, WendlasidaScherer, GabrielStrassburger, Lutz<p>A compiler consists of a sequence of phases going from lexical analysis to
code generation. Ideally, the formal verification of a compiler should include
the formal verification of each component of the tool-chain. An example is the
CompCert project, a formally verified C compiler, which comes with associated
tools and proofs that make it possible to formally verify most of those components.</p>
<p>However, some components, in particular the lexer, remain unverified. In fact,
the lexer of CompCert is generated using OCamllex, a lex-like OCaml lexer
generator that produces lexers from a set of regular expressions with
associated semantic actions. Even though there exist various approaches, like
CakeML or Verbatim++, to write verified lexers, they all have only limited
practical applicability.</p>
<p>In order to contribute to the end-to-end verification of compilers, we
implemented a generator of verified lexers whose usage is similar to OCamllex.
Our software, called Coqlex, reads a lexer specification and generates a lexer
equipped with a Coq proof of its correctness. It provides a formally verified
implementation of most features of standard, unverified lexer generators.</p>
<p>The conclusions of our work are two-fold: Firstly, verified lexers benefit
from following a user experience similar to lex/flex or OCamllex, with a
domain-specific syntax for writing lexers comfortably. This introduces a small gap
between the written artifact and the verified lexer, but our design minimizes
this gap and makes it practical to review the generated lexer. The user remains
able to prove further properties of their lexer. Secondly, it is possible to
combine simplicity and decent performance. Our implementation approach, which
uses Brzozowski derivatives, is noticeably simpler than the previous work in
Verbatim++, which generates a deterministic finite automaton (DFA) ahead
of time, and it is also noticeably faster thanks to careful design choices.</p>
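<p>To illustrate the derivative-based approach, the following is a minimal, unverified Python sketch (not Coqlex's verified Coq implementation, which also handles semantic actions and lexer-specific features): the derivative of a regular expression with respect to a character is the expression matching the remaining suffixes, so matching reduces to repeated derivation followed by a nullability check.</p>

```python
# Minimal, unverified sketch of Brzozowski-derivative matching
# (illustrative only; Coqlex's implementation is verified in Coq
# and far more complete).

EMPTY, EPS = ("empty",), ("eps",)  # the empty language and epsilon

def lit(c):    return ("lit", c)
def seq(a, b): return ("seq", a, b)
def alt(a, b): return ("alt", a, b)
def star(a):   return ("star", a)

def nullable(r):
    # Does r accept the empty string?
    tag = r[0]
    if tag in ("empty", "lit"): return False
    if tag in ("eps", "star"):  return True
    if tag == "seq": return nullable(r[1]) and nullable(r[2])
    if tag == "alt": return nullable(r[1]) or nullable(r[2])

def deriv(r, c):
    # The derivative of r w.r.t. c: the language of suffixes after c.
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "lit":
        return EPS if r[1] == c else EMPTY
    if tag == "seq":
        d = seq(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return seq(deriv(r[1], c), r)

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

ab_star = star(alt(lit("a"), lit("b")))  # (a|b)*
print(matches(ab_star, "abba"))  # True
print(matches(ab_star, "abc"))   # False
```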
<p>We wrote several example lexers that suggest that the convenience of using
Coqlex is close to that of standard, unverified generators, in particular,
OCamllex. We used Coqlex in an industrial project to implement a verified lexer
of Ada. This lexer is part of a tool to optimize safety-critical programs, some
of which are very large. This experience confirmed that Coqlex is usable in
practice, and in particular that its performance is good enough. Finally, we
performed detailed performance comparisons between Coqlex, OCamllex, and
Verbatim++. Verbatim++ is the state-of-the-art tool for verified lexers in Coq,
and the performance of its lexer was carefully optimized in previous work by
Egolf et al. (2022). Our results suggest that Coqlex is two orders of
magnitude slower than OCamllex, but two orders of magnitude faster than
Verbatim++.</p>
<p>Verified compilers and other language-processing tools are becoming important
tools for safety-critical or security-critical applications. They provide trust
and replace more costly approaches to certification, such as manually reading
the generated code. Verified lexers are a missing piece in several Coq-based
verified compilers today. Coqlex comes with safety guarantees, and thus shows
that it is possible to build formally verified front-ends.</p>
A VM-Agnostic and Backwards Compatible Protected Modifier for Dynamically-Typed Languages2023-06-15T00:00:00+00:002023-06-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F2Thomas, IonaAranega, VincentDucasse, StéphanePolito, GuillermoTesone, Pablo<p>In object-oriented languages, method visibility modifiers hold a key role in separating internal methods from the public API. Protected visibility modifiers offer a way to hide methods from external objects while authorizing internal use and overriding in subclasses. While present in mainstream statically-typed languages, visibility modifiers are not as common or mature in dynamically-typed languages.</p>
<p>In this article, we present ProtDyn, a self-send-based visibility model calculated at compile time for dynamically-typed languages relying on name-mangling and syntactic differentiation of self vs non self sends.</p>
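<p>The flavor of the model can be sketched in a few lines of Python. This is a hypothetical illustration, not the #Pharo or Python implementation from the article, and the mangling scheme and all names below are invented: a protected method is registered under a mangled name (double registration), self-sends are syntactically rewritten to the mangled name, and the public name is removed so external sends fail.</p>

```python
# Hypothetical sketch of self-send-based protected visibility via name
# mangling and double registration, in the spirit of ProtDyn (NOT the
# paper's implementation; the mangling scheme is invented here).

def mangle(name):
    # Invented mangling scheme for illustration.
    return f"__prot{name}"

class Account:
    def __init__(self):
        self.balance = 0

    def _credit(self, amount):  # intended to be protected
        self.balance += amount

    def deposit(self, amount):
        # A self-send: the compiler would syntactically rewrite this
        # call to the mangled name, authorizing internal use.
        getattr(self, mangle("_credit"))(amount)

# Double registration: reachable under the mangled name for self-sends
# (and for overriding in subclasses) ...
setattr(Account, mangle("_credit"), Account._credit)
# ... and removed from the public surface, so external sends fail.
del Account._credit

a = Account()
a.deposit(10)       # internal (self) send succeeds
print(a.balance)    # 10
try:
    a._credit(5)    # external send: method is no longer visible
except AttributeError:
    print("protected")
```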
<p>We present #Pharo, a ProtDyn implementation of this model that is backwards compatible with existing programs, and its port to Python. Using these implementations we study the performance impact of ProtDyn on the method lookup, in the presence of global lookup caches and polymorphic inline caches. We show that our name mangling and double method registration technique has a very low impact on performance and keeps the benefits from the global lookup cache and polymorphic inline cache. We also show that the memory overhead on a real use case is between 2% and 13% in the worst-case scenario.</p>
<p>Protected modifier semantics enforces encapsulation as private semantics does, while still allowing developers to extend the class in subclasses. ProtDyn offers a VM-agnostic and backwards-compatible design for introducing protected semantics into dynamically-typed languages.</p>
McMini: A Programmable DPOR-Based Model Checker for Multithreaded Programs2023-06-15T00:00:00+00:002023-06-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2024%2F8%2F1Pirtle, MaxwellJovanovic, LukaCooperman, Gene<h4 id="context">Context</h4>
<p>Model checking has become a key tool for gaining confidence
in correctness of multi-threaded programs. Unit tests and
functional tests do not suffice because of race conditions that
are not discovered by those tests. This problem is addressed
by model checking tools. A simple model checker is
useful for detecting race conditions prior to production.</p>
<h4 id="inquiry">Inquiry</h4>
<p>Current model checkers hardwire the
behavior of common thread operations, and do not recognize
application-dependent thread paradigms or functions built from
simpler primitive operations.
Modeling such paradigms in terms of the hardwired operations
introduces additional operations, causing current
model checkers to be excessively slow.
In addition, there is no mechanism to model the
semantics of the actual thread wakeup policies implemented in
the underlying thread library or operating system.
Eliminating these constraints can make model checkers faster.</p>
<h4 id="approach">Approach</h4>
<p>McMini is an <strong>extensible</strong> model checker
based on DPOR (Dynamic Partial Order Reduction). A mechanism
was invented to declare to McMini new, primitive thread
operations, typically in 100 lines or less of C code. The mechanism
was extended to also allow a user of McMini to declare alternative
thread wakeup policies, including spurious wakeups from condition
variables.</p>
<h4 id="knowledge">Knowledge</h4>
<p>In McMini, the user defines new
thread operations. The user optimizes these operations by declaring to
the DPOR algorithm information
that reduces the number of thread schedules to be searched.
One declares:
(i) under what conditions an operation
is enabled; (ii) which thread operations are independent of
each other; and (iii) when two operations can be considered
as co-enabled. An optional wakeup policy is implemented
by defining when a wait operation (on a semaphore, condition
variable, etc.) is enabled.
A new enqueue thread operation is described, allowing a user to
declare alternative wakeup policies.</p>
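<p>The three declarations can be sketched as follows. This is an illustrative Python stand-in, not McMini's actual C interface, and every name below is invented: an operation is declared together with predicates stating when it is enabled, which operations it is independent of, and when two operations are co-enabled.</p>

```python
# Illustrative sketch of declaring a primitive operation to a DPOR
# engine (McMini's real interface is C; all names here are invented).

from dataclasses import dataclass
from typing import Callable

@dataclass
class OpSpec:
    name: str
    enabled: Callable      # (state, op) -> bool: may op run now?
    independent: Callable  # (op_a, op_b) -> bool: do they commute?
    co_enabled: Callable   # (op_a, op_b) -> bool: can both be enabled?

registry = {}

def declare(spec: OpSpec):
    registry[spec.name] = spec

# Example declaration: a counting-semaphore wait is enabled only when
# the count is positive; waits on distinct semaphores are independent.
declare(OpSpec(
    name="sem_wait",
    enabled=lambda state, op: state["count"][op["sem"]] > 0,
    independent=lambda a, b: a["sem"] != b["sem"],
    co_enabled=lambda a, b: True,
))

state = {"count": {"s1": 1, "s2": 0}}
w1 = {"name": "sem_wait", "sem": "s1"}
w2 = {"name": "sem_wait", "sem": "s2"}
spec = registry["sem_wait"]
print(spec.enabled(state, w1))   # True
print(spec.enabled(state, w2))   # False
print(spec.independent(w1, w2))  # True
```

<p>Declaring a wait on a distinct semaphore as independent lets the DPOR search prune interleavings that differ only in the order of commuting operations.</p>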
<h4 id="grounding">Grounding</h4>
<p>McMini was first confirmed to operate correctly
and efficiently as a traditional, but extensible model checker
for mutex, semaphore, condition variable, and reader-writer lock.
McMini’s extensibility was then tested on novel primitive
operations, representing other useful paradigms for multithreaded
operations. An example is readers-and-two-writers.
Model checking was found to be five times faster
or more, compared to traditional implementations on top
of condition variables.
Alternative wakeup
policies (e.g., FIFO, LIFO, arbitrary,
etc.) were then tested using an enqueue operation.
Finally, spurious wakeups were tested with a program that exposes
a bug <strong>only</strong> in the presence of a spurious wakeup.</p>
<h4 id="importance">Importance</h4>
<p>Many applications employ functions for multithreaded paradigms that
go beyond the traditional mutex, semaphore, and condition
variables. They are defined on top of basic operations.
The ability to directly define new primitives for these
paradigms makes model checkers run faster by searching fewer
thread schedules.
The ability to model particular thread wakeup policies,
including spurious wakeup for condition variables, is
also important. Note that POSIX leaves undefined the
wakeup policies of <code class="language-plaintext highlighter-rouge">pthread_mutex_lock</code>,
<code class="language-plaintext highlighter-rouge">sem_wait</code>, and <code class="language-plaintext highlighter-rouge">pthread_cond_wait</code>. The POSIX
thread implementation then chooses a particular policy (e.g.,
FIFO, arbitrary), which can be directly modeled by McMini.</p>
Profiling and Optimizing Java Streams2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F10Rosales, EduardoBasso, MatteoRosà, AndreaBinder, Walter<p>The Stream API was added in Java 8 to allow the declarative expression of data-processing logic, typically map-reduce-like data transformations on collections and datasets. The Stream API introduces two key abstractions. The stream, which is a sequence of elements available in a data source, and the stream pipeline, which contains operations (e.g., map, filter, reduce) that are applied to the elements in the stream upon execution. Streams are getting popular among Java developers as they leverage the conciseness of functional programming and ease the parallelization of data processing.</p>
<p>Despite the benefits of streams, in comparison to data processing relying on imperative code, streams can introduce significant overheads, which are mainly caused by extra object allocations and reclamations, and the use of virtual method calls. As a result, developers need means to study the runtime behavior of streams with the goal of both mitigating such abstraction overheads and optimizing stream processing. Unfortunately, there is a lack of dedicated tools able to dynamically analyze streams to help developers specifically locate issues degrading application performance.</p>
<p>In this paper, we address the profiling and optimization of streams. We present a novel profiling technique for measuring the computations performed by a stream in terms of elapsed reference cycles, which we use to locate problematic streams with a major impact on application performance. While accuracy is crucial to this end, the inserted instrumentation code causes the execution of extra cycles, which are partially included in the profiles. To mitigate this issue, we estimate and compensate for the extra cycles caused by the inserted instrumentation code.</p>
<p>We implement our approach in StreamProf, which, to the best of our knowledge, is the first dedicated stream profiler for the Java Virtual Machine (JVM). With StreamProf, we find that cycle profiling is effective in detecting problematic streams whose optimization can enable significant performance gains. We also find that the accurate profiling of tasks supporting parallel stream processing allows the diagnosis of load imbalance according to the distribution of stream-related cycles at a thread level.</p>
<p>We conduct an evaluation on sequential and parallel stream-based workloads that are publicly available in three different sources. The evaluation shows that our profiling technique is efficient and yields accurate profiles. Moreover, we show the actionability of our profiles by guiding stream-related optimizations on two workloads from Renaissance. Our optimizations require the modification of only a few lines of code while achieving speedups up to a factor of 5x.</p>
<p>Java streams have been extensively studied by recent work, focusing on both how developers are using streams and how to optimize them. Current approaches in the optimization of streams mainly rely on static analysis techniques that overlook runtime information, suffer from important limitations to detect all streams executed by a Java application, or are not suitable for the analysis of parallel streams. Understanding the dynamic behavior of both sequential and parallel stream processing and its impact on application performance is crucial to help users make better decisions while using streams.</p>
Black Boxes, White Noise: Similarity Detection for Neural Functions2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F12Farmahinifarahani, FarimaLopes, Cristina V.<p>Similarity, or clone, detection has important applications in copyright violation, software theft, code search, and the detection of malicious components. There is now a good number of open source and proprietary clone detectors for programs written in traditional programming languages.
However, the increasing adoption of deep learning models in software poses a challenge to these tools: these models implement functions that are inscrutable black boxes.
As more software includes these DNN functions, new techniques are needed in order to assess the similarity between deep learning components of software.</p>
<p>Previous work has unveiled techniques for comparing the representations learned at various layers of deep neural network models by feeding canonical inputs to the models. Our goal is to be able to compare DNN functions when canonical inputs are not available, as they may not be in many application scenarios. The challenge, then, is to generate appropriate inputs and to identify a metric that, for those inputs, is capable of representing the degree of functional similarity between two comparable DNN functions.</p>
<p>Our approach uses random input with values between -1 and 1, in a shape that is compatible with what the DNN models expect. We then compare the outputs by performing correlation analysis.</p>
<p>Our study shows how it is possible to perform similarity analysis even in the absence of meaningful canonical inputs. The response to random inputs of two comparable DNN functions exposes those functions’ similarity, or lack thereof. Of all the metrics tried, we find that Spearman’s rank correlation coefficient is the most powerful and versatile, although in special cases other methods and metrics are more expressive.</p>
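<p>The core idea can be sketched as follows. The "models" here are plain numeric functions standing in for black-box DNNs, purely for illustration: random inputs in [-1, 1] are fed to each function, and Spearman's rank correlation of the outputs exposes behavioral similarity.</p>

```python
import random

# Stand-in sketch: the "models" are plain numeric functions, not real
# DNNs; the point is only random probing plus rank correlation.

def spearman(xs, ys):
    # Spearman's rho = Pearson correlation of the ranks
    # (no tie handling; fine for continuous random inputs).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

model_a = lambda x: 2.0 * x + 1.0   # behaviorally similar to model_b
model_b = lambda x: 2.1 * x + 0.9   # (same ordering of outputs)
model_c = lambda x: -x              # behaviorally dissimilar

random.seed(0)
inputs = [random.uniform(-1, 1) for _ in range(200)]  # random probes
ya = [model_a(x) for x in inputs]
yb = [model_b(x) for x in inputs]
yc = [model_c(x) for x in inputs]
print(round(spearman(ya, yb), 3))  # 1.0
print(round(spearman(ya, yc), 3))  # -1.0
```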
<p>We present a systematic empirical study comparing the effectiveness of several similarity metrics using a dataset of 56,355 classifiers collected from GitHub. This is accompanied by a sensitivity analysis that reveals how certain models’ training related properties affect the effectiveness of the similarity metrics.</p>
<p>To the best of our knowledge, this is the first work that shows how similarity of DNN functions can be detected by using random inputs. Our study of correlation metrics, and the identification of Spearman correlation coefficient as the most powerful among them for this purpose, establishes a complete and practical method for DNN clone detection that can be used in the design of new tools. It may also serve as inspiration for other program analysis tasks whose approaches break in the presence of DNN components.</p>
Control Flow Duplication for Columnar Arrays in a Dynamic Compiler2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F9Kloibhofer, SebastianMakor, LukasLeopoldseder, DavidBonetta, DanieleStadler, LukasMössenböck, Hanspeter<p>Columnar databases are an established way to speed up online analytical processing (OLAP) queries. Nowadays, data processing (e.g., storage, visualization, and analytics) is often performed at the programming language level, hence it is desirable to also adopt columnar data structures for common language runtimes.</p>
<p>While there are frameworks, libraries, and APIs to enable columnar data stores in programming languages, their integration into applications typically requires developer intervention.
In prior work, researchers implemented an approach for <strong>automated</strong> transformation of arrays into columnar arrays in the GraalVM JavaScript runtime.
However, this approach suffers from performance issues on smaller workloads as well as on more complex nested data structures.
We find that the key to optimizing accesses to columnar arrays is to identify queries and apply specific optimizations to them.</p>
<p>In this paper, we describe novel compiler optimizations in the GraalVM Compiler that optimize queries on columnar arrays.
At JIT compile time, we identify loops that access potentially columnar arrays and duplicate them in order to specifically optimize accesses to columnar arrays.
Additionally, we describe a new approach for creating columnar arrays from arrays consisting of complex objects by performing <strong>multi-level storage transformation</strong>. We demonstrate our approach via an implementation for JavaScript <code class="language-plaintext highlighter-rouge">Date</code> objects.</p>
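<p>The storage transformation itself can be sketched as follows: a hypothetical Python stand-in for what the JavaScript runtime does internally, rewriting an array of homogeneous objects into one contiguous column per field.</p>

```python
# Hypothetical stand-in for the runtime's storage transformation:
# an array of homogeneous objects becomes a struct-of-arrays.

rows = [
    {"year": 2021, "month": 5},
    {"year": 2022, "month": 6},
]

def to_columnar(rows):
    # One list per field, preserving row order.
    return {k: [r[k] for r in rows] for k in rows[0]}

cols = to_columnar(rows)
# An analytical query over one field now scans a single contiguous
# column instead of touching every object:
print(cols["year"])       # [2021, 2022]
print(sum(cols["year"]))  # 4043
```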
<p>Our work shows that automatic transformation of arrays to columnar storage is feasible even for small workloads and that more complex arrays of objects could benefit from a multi-level transformation.
Furthermore, we show how we can optimize methods that handle arrays in different states by the use of duplication.
We evaluated our work on microbenchmarks and established data analytics workloads (TPC-H) to demonstrate that it significantly outperforms previous efforts, with speedups of up to 10x for particular queries.
Queries additionally benefit from multi-level transformation, reaching speedups of around 2x.
Additionally, we show that we do not cause significant overhead on workloads not suitable for storage transformation.</p>
<p>We argue that automatically created columnar arrays could aid developers in data-centric applications as an alternative approach to using dedicated APIs on manually created columnar arrays. Via automatic detection and optimization of queries on potentially columnar arrays, we can improve performance of data processing and further enable its use in common—particularly dynamic—programming languages.</p>
Notes on “Notes on the Synthesis of Form”: Dawning Insights in Early Christopher Alexander2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F8Gabriel, Richard P.<p>This essay is a picaresque—a first-person narrative relating the adventures of a rogue (me) sifting through the mind of Christopher Alexander as he left behind formalized design thinking in favor of a more intuitive, almost spiritual process.</p>
<p>The work of Christopher Alexander is familiar to many computer scientists: for some it’s patterns, for some it’s the mystical <strong>quality without a name</strong> and “Nature of Order”; for many more it’s “Notes on the Synthesis of Form”—Alexander’s formalized design method and foreshadowing ideas about cohesion and coupling in software. Since the publication of “Design Patterns” by Gamma et al. in 1994, there have been hundreds of books published about design / software patterns, thousands of published pattern languages, and tens of thousands of published patterns.</p>
<p>“Notes,” published in 1964, was quickly followed by one of Alexander’s most important essays, “A City is Not a Tree,” in which he repudiates the formal method described in “Notes,” and his Preface to the paperback edition of “Notes” in 1971 repeats the repudiation. For many close readers of Alexander, this discontinuity is startling and unexplained.</p>
<p>When I finally read “Notes” in 2015, I was struck by the detailed worked example, along with a peculiar mathematical treatment of the method, and a hint that the modularization presented in the example was reckoned by a computer program he had written—all in the late 1950s and early 1960s. Because of my fascination with metaheuristic optimization, I couldn’t resist trying to replicate his experimental results.</p>
<p>Computers and their programs relish dwelling on flaws in your thinking—Alexander was not exempt. By engaging in hermeneutics and software archeology, I was able to uncover / discover the trajectory of his thinking as he encountered failures and setbacks with his computer programs. My attempted replication also failed, and that led me to try to unearth the five different programs he wrote, understand them, and figure out how one led to the next. They are not described in published papers, only in internal reports. My search for these reports led to their being made available on the Net.</p>
<p>What I found in my voyage were the early parts of a chain of thought that started with cybernetics, mathematics, and a plain-spoken computer; passed through “A City is Not a Tree”; paused to “make God appear in the middle of a field”; and ended with this fundamental design goal: <strong>I try to make the volume of the building so that it carries in it all feeling. To reach this feeling, I try to make the building so that it carries my eternal sadness. It comes, as nearly as I can in a building, to the point of tears.</strong></p>
Technical Dimensions of Programming Systems2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F13Jakubovic, JoelEdwards, JonathanPetricek, Tomas<p>Programming requires much more than just writing code in a programming language. It is usually done in the context of a stateful environment, by interacting with a system through a graphical user interface. Yet, this wide space of possibilities lacks a common structure for navigation. Work on programming systems fails to form a coherent body of research, making it hard to improve on past work and advance the state of the art.</p>
<p>In computer science, much has been said and done to allow comparison of <strong>programming languages</strong>, yet no similar theory exists for <strong>programming systems;</strong> we believe that programming systems deserve a theory too.</p>
<p>We present a framework of <strong>technical dimensions</strong> which capture the underlying characteristics of programming systems and provide a means for conceptualizing and comparing them.</p>
<p>We identify technical dimensions by examining past influential programming systems and reviewing their design principles, technical capabilities, and styles of user interaction. Technical dimensions capture characteristics that may be studied, compared and advanced independently. This makes it possible to talk about programming systems in a way that can be shared and constructively debated rather than relying solely on personal impressions.</p>
<p>Our framework is derived using a qualitative analysis of past programming systems. We outline two concrete ways of using our framework. First, we show how it can analyze a recently developed novel programming system. Then, we use it to identify an interesting unexplored point in the design space of programming systems.</p>
<p>Much research effort focuses on building programming systems that are easier to use, accessible to non-experts, moldable and/or powerful, but such efforts are disconnected. They are informal, guided by the personal vision of their authors and thus are only evaluable and comparable on the basis of individual experience using them. By providing foundations for more systematic research, we can help programming systems researchers to stand, at last, on the shoulders of giants.</p>
Symphony: Expressive Secure Multiparty Computation with Coordination2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F14Sweet, IanDarais, DavidHeath, DavidHarris, WilliamEstes, RyanHicks, Michael<h4 id="context">Context</h4>
<p>Secure Multiparty Computation (MPC) refers to a family of
cryptographic techniques where
mutually untrusting parties may compute functions of
their private inputs while revealing only the function output.</p>
<h4 id="inquiry">Inquiry</h4>
<p>It can be hard to program MPCs correctly and
efficiently using existing languages and frameworks, especially when
they require coordinating disparate computational roles. How can we
make this easier?</p>
<h4 id="approach">Approach</h4>
<p>We present Symphony, a new functional programming
language for MPCs among two or more parties. Symphony starts from the
single-instruction, multiple-data (SIMD) semantics of prior MPC
languages, in which each party carries out symmetric responsibilities,
and generalizes it using constructs that can coordinate many
parties. Symphony introduces <strong>first-class shares</strong> and
<strong>first-class party sets</strong> to provide unmatched language-level
expressive power with high efficiency.</p>
<h4 id="knowledge">Knowledge</h4>
<p>Developing a core formal language called λ-Symphony,
we prove that the intuitive, generalized SIMD view of a program
coincides with its actual distributed semantics. Thus the programmer
can reason about her programs by reading them from top to bottom, even
though in reality the program runs in a coordinated fashion,
distributed across many machines. We implemented a prototype
interpreter for Symphony leveraging multiple cryptographic backends.
With it we wrote a variety of MPC programs, finding that Symphony can express
optimized protocols that other languages cannot, and that in general
Symphony programs operate efficiently.</p>
<h4 id="grounding">Grounding</h4>
<p>In addition to developing the formal proofs, the
prototype implementation, and the MPC program case studies, we
measured the performance of Symphony’s implementation on several
benchmark programs and found it had performance comparable to Obliv-C, a
state-of-the-art two-party MPC framework for C, when running the same
programs. We also measured Symphony’s performance on an optimized
<strong>secure shuffle</strong> protocol based on a coordination pattern that no
prior language can express, and found it has far superior performance
to the standard alternative.</p>
<h4 id="importance">Importance</h4>
<p>Programming MPCs is in increasing demand, with a
proliferation of languages and frameworks. This work lowers the bar
for programmers wanting to write efficient, coordinated MPCs that they
can reason about and understand. The work applies to developers and
cryptographers wanting to design new applications and protocols, which
they are able to do at the language level, above the cryptographic
details. The λ-Symphony formalization of Symphony, and the proofs about
it, are also surprisingly simple, and can be a basis for follow-on
formalization work in MPC and distributed programming. All code and
artifacts are available, open-source.</p>
Primrose: Selecting Container Data Types by Their Properties2023-02-15T00:00:00+00:002023-02-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F11Qin, XueyingO'Connor, LiamSteuwer, Michel<h4 id="context">Context</h4>
<p>Container data types are ubiquitous in computer programming, enabling developers to efficiently store and process collections of data with an easy-to-use programming interface.
Many programming languages offer a variety of container implementations in their standard libraries based on data structures offering different capabilities and performance characteristics.</p>
<h4 id="inquiry">Inquiry</h4>
<p>Choosing the <strong>best</strong> container for an application is not always straightforward, as performance characteristics can change drastically in different scenarios, and as real-world performance is not always correlated to theoretical complexity.</p>
<h4 id="approach">Approach</h4>
<p>We present Primrose, a language-agnostic tool for selecting the best performing valid container implementation from a set of container data types that satisfy <strong>properties</strong> given by application developers.
Primrose automatically selects the set of valid container implementations for which the <strong>library specifications</strong>, written by the developers of container libraries, satisfy the specified properties.
Finally, Primrose ranks the valid library implementations based on their runtime performance.</p>
<h4 id="knowledge">Knowledge</h4>
<p>With Primrose, application developers can specify the expected behaviour of a container as a type refinement with <strong>semantic properties</strong>, e.g., if the container should only contain unique values (such as a <code class="language-plaintext highlighter-rouge">set</code>) or should satisfy the LIFO property of a <code class="language-plaintext highlighter-rouge">stack</code>.
Semantic properties nicely complement <strong>syntactic properties</strong> (i.e., traits, interfaces, or type classes), together allowing developers to specify a container’s programming interface <strong>and</strong> behaviour without committing to a concrete implementation.</p>
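<p>For illustration only: Primrose encodes properties and library specifications as SMT verification conditions (via Rosette), but the idea of selecting a container by a semantic property such as uniqueness can be sketched with a simple property-based check in Python (all names below are invented).</p>

```python
# Illustration of property-driven container selection (this uses a
# property-based check; Primrose itself solves SMT verification
# conditions over Rust container specifications).

def insert(c, v):
    # Uniform insertion over Python's differing container interfaces.
    c.add(v) if hasattr(c, "add") else c.append(v)

def satisfies_unique(make):
    # Property: after inserting duplicates, each value appears once.
    c = make()
    for v in [1, 2, 2, 3, 3, 3]:
        insert(c, v)
    items = list(c)
    return len(items) == len(set(items))

candidates = {"list": list, "set": set}
valid = [name for name, make in candidates.items()
         if satisfies_unique(make)]
print(valid)  # ['set']
```

<p>Among the valid candidates, a tool like Primrose would then rank implementations by measured runtime performance.</p>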
<h4 id="grounding">Grounding</h4>
<p>We present our prototype implementation of Primrose that preprocesses annotated Rust code,
selects valid container implementations and ranks them on their performance. The design of Primrose is, however, language-agnostic, and is easy to integrate into other programming languages that support container data types and traits, interfaces, or type classes. Our implementation encodes properties and library specifications into verification conditions in Rosette, an interface for SMT solvers, which determines the set of valid container implementations. We evaluate Primrose by specifying several container implementations,
and measuring the time the solver takes to select valid implementations for various combinations of properties. We automatically validate that container implementations conform to their library specifications via property-based testing.</p>
<h4 id="importance">Importance</h4>
<p>This work provides a novel approach to bring abstract modelling and specification of container types directly into the programmer’s workflow.
Instead of selecting concrete container implementations, application programmers can now work on the level of specification, merely stating the behaviours they require from their container types,
and the best implementation can be selected automatically.</p>
Revisiting Language Support for Generic Programming: When Genericity Is a Core Design Goal2022-10-15T00:00:00+00:002022-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F4Chetioui, BenjaminJärvi, JaakkoHaveraaen, Magne
<h4 id="context">Context</h4>
<p>Generic programming, as defined by Stepanov, is a methodology
for writing efficient and reusable algorithms by considering only the required
properties of their underlying data types and operations. Generic programming
has proven to be an effective means of constructing libraries of reusable
software components in languages that support it. Generics-related language
design choices play a major role in how conducive a language is to generic programming in
practice.</p>
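The methodology can be illustrated with a minimal Rust sketch (an illustration of Stepanov-style genericity in general, not Magnolia code): the algorithm is written against only the properties it requires of its operations, here an associative `combine` with an `identity` element, and is then reused unchanged with any model of that concept.

```rust
// Stepanov-style generic programming: `reduce` depends only on the
// required properties of its operations (a monoid: an associative
// `combine` with an `identity` element), not on concrete types.
trait Monoid {
    fn identity() -> Self;
    fn combine(self, other: Self) -> Self;
}

// One model of the concept: integers under addition...
#[derive(Clone, Copy, Debug, PartialEq)]
struct Sum(i64);
impl Monoid for Sum {
    fn identity() -> Self { Sum(0) }
    fn combine(self, other: Self) -> Self { Sum(self.0 + other.0) }
}

// ...and another: integers under multiplication.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Product(i64);
impl Monoid for Product {
    fn identity() -> Self { Product(1) }
    fn combine(self, other: Self) -> Self { Product(self.0 * other.0) }
}

// One reusable algorithm, valid for every model of the concept.
fn reduce<M: Monoid>(items: impl IntoIterator<Item = M>) -> M {
    items.into_iter().fold(M::identity(), M::combine)
}

fn main() {
    assert_eq!(reduce(vec![Sum(1), Sum(2), Sum(3)]), Sum(6));
    assert_eq!(reduce(vec![Product(2), Product(3), Product(4)]), Product(24));
}
```

Rust traits express such requirements syntactically; the associativity property itself is left implicit, which is precisely the kind of gap that algebraic-specification languages in the style of Magnolia aim to close.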
<h4 id="inquiry">Inquiry</h4>
<p>Several mainstream programming languages (e.g. Java and C++)
were first created without generics; features to support generic programming
were added later, gradually. Much of the existing literature on supporting
generic programming focuses thus on retrofitting generic programming into
existing languages and identifying the related implementation challenges. Is the programming experience significantly better, or at least different, with a language designed for generic programming from the outset, free from limitations imposed by prior language design choices?</p>
<h4 id="approach">Approach</h4>
<p>We examine Magnolia, a language designed to embody generic
programming. Magnolia is representative of an approach to language design rooted
in algebraic specifications. We repeat a well-known experiment, where we put
Magnolia’s generic programming facilities under scrutiny by implementing a
subset of the Boost Graph Library, and reflect on our development experience.</p>
<h4 id="knowledge">Knowledge</h4>
<p>We discover that the idioms identified in previous studies as key features for
supporting Stepanov-style generic programming do not tell the full story.
We clarify which of them are more of a
means to an end rather than fundamental features for supporting generic
programming. Based on the development experience with Magnolia, we identify variadics as an additional key feature for generic programming and
point out limitations and challenges of genericity by property.</p>
<h4 id="grounding">Grounding</h4>
<p>Our work uses a well-known framework from the literature for evaluating the
generic programming facilities of a language to assess
the algebraic approach through Magnolia, and we draw comparisons with
well-known programming languages.</p>
<h4 id="importance">Importance</h4>
<p>This work gives a fresh perspective on generic programming,
and clarifies which language properties are fundamental, and what their trade-offs are,
when supporting Stepanov-style generic programming. This understanding of how to set the ground for generic programming will inform future language design.</p>
Out-of-Things Debugging: A Live Debugging Approach for Internet of Things2022-10-15T00:00:00+00:002022-10-15T00:00:00+00:00urn:doi:10.22152%2Fprogramming-journal.org%2F2023%2F7%2F5Rojas Castillo, CarlosMarra, MatteoBauwens, JimGonzalez Boix, Elisa
<h4 id="context">Context</h4>
<p>The Internet of Things (IoT) has become an important class of distributed systems thanks to the widespread availability of cheap embedded devices equipped with different networking technologies. Although ubiquitous, IoT systems remain challenging to develop.</p>
<h4 id="inquiry">Inquiry</h4>
<p>A recent field study with 194 IoT developers identifies debugging as one of the main challenges faced when developing IoT systems. This stems from the lack of debugging tools that take into account the unique properties of IoT systems, such as non-deterministic data and hardware-restricted devices. On the one hand, offline debuggers allow developers to analyse post-failure recorded program information, but impose too much overhead on the devices while generating such information.
Furthermore, the analysis process is time-consuming and might miss contextual information relevant to finding the root cause of bugs. On the other hand, online debuggers do allow debugging a program upon a failure while providing contextual information (e.g., a stack trace). In particular, remote online debuggers enable debugging devices without physical access to them. However, they suffer from debugging interference due to network delays, which complicates bug reproducibility, and they have limited support for dynamic software updates on remote devices.</p>
<h4 id="approach">Approach</h4>
<p>This paper proposes <em>out-of-things</em> debugging, an online debugging approach designed especially for IoT systems. The debugger is always-on: it is constantly available to debug, for instance, post-deployment failures. Upon a failure or breakpoint, out-of-things debugging moves the state of the deployed application to the developer’s machine. Developers can then debug the application locally by applying operations (e.g., step commands) to the retrieved state. Once debugging is finished, developers can commit bug fixes to the device through live update capabilities. Finally, by means of a fine-grained, flexible interface for accessing remote resources, developers have full control over the debugging overhead imposed on the device and over the access to device hardware resources (e.g., sensors) needed during local debugging.</p>
<h4 id="knowledge">Knowledge</h4>
<p>Out-of-things debugging maintains the good properties of remote debugging, as it does not require physical access to the device, while reducing debugging interference: operations (e.g., stepping) issued in the debugger happen locally and thus incur no network delays. Furthermore, device resources are accessed only when requested by the user, which further mitigates overhead and opens avenues for mocking or simulating non-accessed resources.</p>
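A deliberately simplified sketch of this idea, with hypothetical types rather than the actual WebAssembly debugger's API: the device ships one snapshot of the program state when a breakpoint fires, and every subsequent debugging operation runs on the developer's machine against that local copy.

```rust
// Hypothetical, much-simplified model of out-of-things debugging:
// one state transfer, then purely local stepping and inspection.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    pc: usize,        // program counter at the breakpoint
    locals: Vec<i64>, // captured local variables
}

// Stand-in for the device side: produce a snapshot at a breakpoint.
fn capture_on_device() -> Snapshot {
    Snapshot { pc: 42, locals: vec![1, 2, 3] }
}

// Local debugging session: operates only on the transferred state,
// so step commands incur no network round-trips.
struct LocalSession {
    state: Snapshot,
}

impl LocalSession {
    fn new(state: Snapshot) -> Self {
        LocalSession { state }
    }
    // A "step" advances the local copy; the device is not contacted.
    fn step(&mut self) {
        self.state.pc += 1;
    }
    fn inspect(&self, slot: usize) -> i64 {
        self.state.locals[slot]
    }
}

fn main() {
    let snapshot = capture_on_device(); // the single network transfer
    let mut session = LocalSession::new(snapshot);
    session.step(); // local, no network delay
    session.step();
    assert_eq!(session.state.pc, 44);
    assert_eq!(session.inspect(1), 2);
}
```

In the real system, device hardware resources (e.g., sensors) that the snapshot does not capture are fetched on demand through the fine-grained resource interface, or mocked locally.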
<h4 id="grounding">Grounding</h4>
<p>We implemented an out-of-things debugger as an extension to a WebAssembly virtual machine and benchmarked its suitability for IoT. In particular, we compared our solution to remote debugging alternatives on metrics such as network overhead, memory usage, scalability, and usability in production settings. From the benchmarks, we conclude that our debugger exhibits competitive performance and contains overhead, without sacrificing debugging convenience and flexibility.</p>
<h4 id="importance">Importance</h4>
<p>Out-of-things debugging enables debugging of IoT systems by means of classical online operations (e.g., stepwise execution) while addressing IoT-specific concerns (e.g., hardware limitations). We show that keeping the debugger always-on does not have to come at the cost of performance loss or increased overhead; instead, it can enable a smooth and flexible debugging experience for IoT systems.</p>