Type-Safe Unions in C++ and Rust

2016-10-07

C++, Rust

Type-Safe Unions in C++

Last night I watched Ben Deane’s talk from CppCon on “Using Types Effectively”. In this talk he describes how to use the C++ type system to enforce invariants at compile time and make code safer. I highly recommend watching the whole talk. I want to focus on one particular idea which formed the core of this presentation: the implementation of type-safe unions with std::variant.

In his talk, Ben outlines a situation I have certainly seen in C++ code, where a stateful class bundles data for several different states together, even though some of those values might be invalid or unused in certain states.

enum class ConnectionState {
    DISCONNECTED,
    CONNECTING,
    CONNECTED,
    CONNECTION_INTERRUPTED
};
struct Connection {
    ConnectionState m_connectionState;
    
    std::string m_serverAddress;
    ConnectionId m_id;
    std::chrono::system_clock::time_point m_connectedTime;
    std::chrono::milliseconds m_lastPingTime;
    Timer m_reconnectTimer;
};

As Ben points out in his talk, there are many problems with this layout. For example, m_id won’t be used unless the ConnectionState is CONNECTED. In all other states this value could be (and probably is) invalid, so it doesn’t make sense to allow access to it in those states. The solution Ben presented is to use a separate struct for each state and combine them with a std::variant.

struct Connection {
    std::string m_serverAddress;
    
    struct Disconnected {};
    struct Connecting {};
    struct Connected {
        ConnectionId m_id;
        std::chrono::system_clock::time_point m_connectedTime;
        std::chrono::milliseconds m_lastPingTime;
    };
    struct ConnectionInterrupted {
        std::chrono::system_clock::time_point m_disconnectedTime;
        Timer m_reconnectTimer;
    };
    
    std::variant<Disconnected,
                 Connecting,
                 Connected,
                 ConnectionInterrupted> m_connection;
};

This immediately makes the enum unnecessary, since each state is now represented by its own type, any one of which may be held by the std::variant. It also separates out the variables for each state, making their meaning clearer by embedding them within the context they relate to. Additionally, this struct should typically be smaller in memory than the original: the variant only takes up as much space as its largest alternative (each of which is smaller than the original set of fields), plus a little overhead for the index (tag) that records which alternative is currently held, and any alignment padding.
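To make the size claim concrete, here is a minimal sketch comparing the two layouts with sizeof. ConnectionId and Timer are not defined in the post, so the definitions below are hypothetical stand-ins I picked for illustration, and FlatConnection and VariantConnection are just my names for the two versions of Connection. The exact numbers depend on the real types, the standard library, and the platform.

```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical stand-ins: the post does not define these two types.
using ConnectionId = std::uint64_t;
struct Timer { std::chrono::milliseconds m_remaining{0}; };

enum class ConnectionState { DISCONNECTED, CONNECTING, CONNECTED, CONNECTION_INTERRUPTED };

// The original layout: every field exists in every state.
struct FlatConnection {
    ConnectionState m_connectionState;
    std::string m_serverAddress;
    ConnectionId m_id;
    std::chrono::system_clock::time_point m_connectedTime;
    std::chrono::milliseconds m_lastPingTime;
    Timer m_reconnectTimer;
};

// The variant layout: only the fields of the current state are stored.
struct VariantConnection {
    std::string m_serverAddress;

    struct Disconnected {};
    struct Connecting {};
    struct Connected {
        ConnectionId m_id;
        std::chrono::system_clock::time_point m_connectedTime;
        std::chrono::milliseconds m_lastPingTime;
    };
    struct ConnectionInterrupted {
        std::chrono::system_clock::time_point m_disconnectedTime;
        Timer m_reconnectTimer;
    };

    std::variant<Disconnected, Connecting, Connected, ConnectionInterrupted> m_connection;
};

// True when the variant-based struct occupies fewer bytes than the flat one.
constexpr bool variant_layout_is_smaller() {
    return sizeof(VariantConnection) < sizeof(FlatConnection);
}
```

With these stand-ins the variant version comes out smaller on the mainstream implementations I would expect people to use, but the saving is modest here because the state structs are small. It grows as the per-state data gets larger.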

Type-Safe Unions in Rust

While watching Ben’s presentation, I couldn’t help but feel I’d seen this all before… in Rust! I’m aware the idea for type-safe unions isn’t unique or original to Rust, but it’s the first language I’ve experimented with that has made them a first-class feature. In Rust, these type-safe unions are implemented using the language’s very powerful enumerations. The example from above adapted for Rust would look something like the following:

use time::{PreciseTime, SteadyTime};
enum ConnectionState {
    Disconnected,
    Connecting,
    Connected {
        id: ConnectionId,
        connected_time: PreciseTime,
        last_ping_time: SteadyTime
    },
    ConnectionInterrupted {
        connected_time: PreciseTime,
        reconnect_timer: Timer
    }
}
struct Connection {
    server_address: String,
    state: ConnectionState
}

This doesn’t look all that dissimilar to the C++ implementation, except that we retain the separate declaration for the enum (rather than having it embedded in the Connection struct).

Using Type-Safe Unions

After looking at std::variant next to Rust’s tagged enum, I can’t help but feel that usage is slightly more ergonomic in Rust. This isn’t particularly surprising given the legacy C++ is bound to.

Initializing the union is easy and concise in both languages:

Connection conn{"my connection", Connection::Disconnected{}};

let conn = Connection{
    server_address: "my_connection".to_string(),
    state: ConnectionState::Disconnected
};

There are three main options for extracting a value from a std::variant. The first two are:

Use std::get<type>(variant) to get the value for a specific alternative of the variant. Throws a std::bad_variant_access exception if the variant isn’t currently holding a value of the given type.
Use std::get_if<type>(&variant) to get a pointer to the value contained in the variant. Returns nullptr if the variant doesn’t contain a value of the given type.

The exception thrown by std::get will be problematic for some use cases, but can be avoided by checking the variant state using the std::holds_alternative<type>(variant) or std::get_if<type>(&variant) non-member functions.
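The access functions described above can be sketched as follows. The State variant and its two alternatives are cut-down, hypothetical versions of the Connection example, and the helper names are mine.

```cpp
#include <variant>

// Cut-down, hypothetical versions of the connection states.
struct Disconnected {};
struct Connected { int m_id; };
using State = std::variant<Disconnected, Connected>;

// std::get_if never throws: it returns nullptr when the variant
// holds a different alternative.
int id_or_default(const State& state) {
    if (const Connected* connected = std::get_if<Connected>(&state)) {
        return connected->m_id;
    }
    return -1;
}

// std::get throws std::bad_variant_access when the variant holds a
// different alternative. Returns true if that happened.
bool get_connected_throws(const State& state) {
    try {
        (void)std::get<Connected>(state);
        return false;
    } catch (const std::bad_variant_access&) {
        return true;
    }
}

// std::holds_alternative only inspects the tag; it does not touch the value.
bool is_connected(const State& state) {
    return std::holds_alternative<Connected>(state);
}
```

Note that std::get_if takes a pointer to the variant rather than a reference, which is easy to get wrong the first time.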

The third option [2] is the std::visit non-member function. This function takes a function object (e.g. a lambda) with an overload for each type the variant can hold, plus one or more variants. It calls the operator() overload corresponding to the type currently held by each variant passed to the function. The function object may also be a generic lambda:

std::visit(
    [](auto&& con) {
        // Strip references and cv-qualifiers to get the alternative's type.
        using T = std::decay_t<decltype(con)>;
        if constexpr (std::is_same_v<T, Connection::Disconnected>)
            std::cout << "Connection disconnected" << std::endl;
        else if constexpr (std::is_same_v<T, Connection::Connecting>)
            std::cout << "Connection connecting..." << std::endl;
        else if constexpr (std::is_same_v<T, Connection::Connected>)
            // Assumes ConnectionId is streamable; time_point has no operator<< in C++17.
            std::cout << "Connection id " << con.m_id
                << " connected for "
                << std::chrono::duration_cast<std::chrono::seconds>(
                       std::chrono::system_clock::now() - con.m_connectedTime).count()
                << "s" << std::endl;
        else if constexpr (std::is_same_v<T, Connection::ConnectionInterrupted>)
            std::cout << "Connection interrupted!" << std::endl;
    },
    conn.m_connection
);
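The “overload for each type” form of the visitor can be sketched with the common overloaded helper idiom. The helper is not part of the standard library and has to be written by hand, and the state types below are simplified, hypothetical versions of the ones in Connection.

```cpp
#include <string>
#include <variant>

// Hand-written helper (not in the standard library): inherits the
// operator() of every lambda passed to it.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// Simplified, hypothetical versions of the connection states.
struct Disconnected {};
struct Connecting {};
struct Connected { int m_id; };
struct ConnectionInterrupted {};
using State = std::variant<Disconnected, Connecting, Connected, ConnectionInterrupted>;

// One lambda per alternative; std::visit picks the one that matches
// the alternative currently held by the variant.
std::string describe(const State& state) {
    return std::visit(
        overloaded{
            [](const Disconnected&) { return std::string("Connection disconnected"); },
            [](const Connecting&) { return std::string("Connection connecting..."); },
            [](const Connected& c) {
                return "Connection id " + std::to_string(c.m_id) + " connected";
            },
            [](const ConnectionInterrupted&) { return std::string("Connection interrupted!"); }
        },
        state);
}
```

Leaving out one of the lambdas makes the std::visit call fail to compile, which gives a rough equivalent of an exhaustiveness check, though the resulting error message is far less friendly than one would like.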

Rust has two main ways to access the value inside an enum. The first is the match construct. match is similar to C++’s switch statement, but it is much more powerful and doesn’t have the same foot-cannons as switch (such as implicit fall-through).

match &conn.state {
    ConnectionState::Disconnected => println!("Connection disconnected"),
    ConnectionState::Connecting => println!("Connection connecting..."),
    // Struct-like variants are destructured with braces; `..` ignores the remaining fields.
    ConnectionState::Connected { id, connected_time: ct, .. } =>
        println!("Connection id {} connected for {:?}", id, ct),
    _ => println!("Connection is in an unknown state!")
}

This is a pretty nice way to handle extracting values from the enum, and it has features like pattern matching and exhaustiveness checking which help make it versatile and safe at the same time. Intuition tells me that it may be possible to do something like this in C++ with std::variant using a mix of the switch statement, std::variant_alternative, and some constexpr or template metaprogramming. Until C++17 ships and implementations appear in the wild (likely sometime next year) I’ll just have to imagine how/whether this would work. [2] In C++ the std::visit function can be used on std::variant to similar effect, but without some of the extra goodies and guarantees offered by Rust’s match.

The other way to extract a value from an enum in Rust is to use the if let or while let expressions. This allows conditional binding to the enum value if the tag matches the one specified.

if let ConnectionState::ConnectionInterrupted { connected_time: ct, reconnect_timer: rt } = &conn.state {
    println!("Connection interrupted {:?} ago! Reconnecting in {:?}", ct, rt);
}

The while let expression in Rust is a similar conditional binding, but with the ability to loop until the enum can no longer be unpacked. The if let form can be simulated in C++ using std::get and std::holds_alternative:

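if (std::holds_alternative<Connection::ConnectionInterrupted>(conn.m_connection)) {
    const auto& conn_interrupted = std::get<Connection::ConnectionInterrupted>(conn.m_connection);
    // time_point is not streamable in C++17, so print the elapsed seconds instead.
    const auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(
        std::chrono::system_clock::now() - conn_interrupted.m_disconnectedTime);
    // Assumes Timer provides an operator<< overload.
    std::cout << "Connection Interrupted " << elapsed.count()
        << "s ago! Reconnecting in " << conn_interrupted.m_reconnectTimer << std::endl;
}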
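The looping behaviour of while let can be approximated in a similar way by putting std::get_if in the condition of a while loop. This is only a sketch: the retry counter and the drain_retries function are hypothetical and exist purely to give the loop something to do.

```cpp
#include <variant>

// Hypothetical, simplified states: the retry counter is invented for this sketch.
struct Disconnected {};
struct ConnectionInterrupted { int m_retriesLeft; };
using State = std::variant<Disconnected, ConnectionInterrupted>;

// Loops for as long as the variant holds ConnectionInterrupted, much like
// `while let ConnectionInterrupted { .. } = state` would in Rust.
// Returns the number of reconnect attempts that were made.
int drain_retries(State& state) {
    int attempts = 0;
    while (auto* interrupted = std::get_if<ConnectionInterrupted>(&state)) {
        if (interrupted->m_retriesLeft == 0) {
            // Give up: switching alternatives makes the next get_if return
            // nullptr, which ends the loop. `interrupted` must not be used
            // after this assignment.
            state = Disconnected{};
        } else {
            --interrupted->m_retriesLeft;
            ++attempts;
        }
    }
    return attempts;
}
```

Unlike the Rust version, nothing stops the code from touching the pointer after the variant has been reassigned, so the body of the loop needs a little care.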

Closing Comments

I agree with Ben Deane’s closing statement: std::variant will be one of the most important additions to C++ with the introduction of the C++17 standard. It clearly has some deficiencies and ergonomic issues, but it’s largely the tool C++ deserves (and one it needs). To me, the biggest disappointment in the std::variant API is the use of a pointer for the std::get_if return value instead of a std::optional (also coming in C++17). However, given std::optional’s history of being delayed from standardisation, I can understand the reluctance of the std::variant authors to depend on it.

Rust provides an interesting insight into what a “clean room” implementation of such a variant type might look like, and has excellent first-class facilities for handling tagged unions/variants. I prefer the ergonomics of Rust’s approach, but for integration into existing projects and codebases std::variant strikes a practical compromise. I look forward to making use of it when C++17 finally arrives.

Updates:
[1] /u/ssokolow mentioned on the Rust subreddit that this pattern can be taken even further for state machines (at least in Rust).
[2] /u/evaned correctly pointed out that I forgot to mention the std::visit function, which works much the same as Rust’s match and improves the ergonomics of std::variant.