Routing Problems¶

See also the Multi-Task VRP at the bottom of this page, that includes 16 variants!

Asymmetric Traveling Salesman Problem (ATSP)¶

ATSPEnv ¶

ATSPEnv(
    generator: ATSPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: RL4COEnvBase

Asymmetric Traveling Salesman Problem (ATSP) environment At each step, the agent chooses a customer to visit. The reward is 0 unless the agent visits all the customers. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length. Unlike the TSP, the distance matrix is asymmetric, i.e., the distance from A to B is not necessarily the same as the distance from B to A.

Observations

distance matrix between customers
the current customer
the first customer (for calculating the reward)
the remaining unvisited customers

Constraints

the tour starts and ends at the same customer.
each customer must be visited exactly once.

Finish Condition

the agent has visited all customers.

Reward

(minus) the negative length of the path.

Parameters:

generator (ATSPGenerator, default: None ) –

ATSPGenerator instance as the data generator
generator_params (dict, default: {} ) –

parameters for the generator

Source code in rl4co/envs/routing/atsp/env.py

def __init__(
    self,
    generator: ATSPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = ATSPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

ATSPGenerator ¶

ATSPGenerator(
    num_loc: int = 10,
    min_dist: float = 0.0,
    max_dist: float = 1.0,
    dist_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    tmat_class: bool = True,
    **kwargs
)

Bases: Generator

Data generator for the Asymmetric Travelling Salesman Problem (ATSP) Generate distance matrices inspired by the reference MatNet (Kwon et al., 2021) We satifsy the triangle inequality (TMAT class) in a batch

Parameters:

num_loc (int, default: 10 ) –

number of locations (customers) in the TSP
min_dist (float, default: 0.0 ) –

minimum value for the distance between nodes
max_dist (float, default: 1.0 ) –

maximum value for the distance between nodes
dist_distribution (Union[int, float, str, type, Callable], default: Uniform ) –

distribution for the distance between nodes
tmat_class (bool, default: True ) –

whether to generate a class of distance matrix

Returns:

–

A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer

Source code in rl4co/envs/routing/atsp/generator.py

def __init__(
    self,
    num_loc: int = 10,
    min_dist: float = 0.0,
    max_dist: float = 1.0,
    dist_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    tmat_class: bool = True,
    **kwargs
):
    self.num_loc = num_loc
    self.min_dist = min_dist
    self.max_dist = max_dist
    self.tmat_class = tmat_class

    # Distance distribution
    if kwargs.get("dist_sampler", None) is not None:
        self.dist_sampler = kwargs["dist_sampler"]
    else:
        self.dist_sampler = get_sampler("dist", dist_distribution, 0.0, 1.0, **kwargs)

Capacitated Vehicle Routing Problem (CVRP)¶

CVRPEnv ¶

CVRPEnv(
    generator: CVRPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: RL4COEnvBase

Capacitated Vehicle Routing Problem (CVRP) environment. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Observations

location of the depot.
locations and demand of each customer.
current location of the vehicle.
the remaining customer of the vehicle,

Constraints

the tour starts and ends at the depot.
each customer must be visited exactly once.
the vehicle cannot visit customers exceed the remaining capacity.
the vehicle can return to the depot to refill the capacity.

Finish Condition

the vehicle has visited all customers and returned to the depot.

Reward

(minus) the negative length of the path.

Parameters:

generator (CVRPGenerator, default: None ) –

CVRPGenerator instance as the data generator
generator_params (dict, default: {} ) –

parameters for the generator

Methods:

check_solution_validity –

Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded
load_data –

Dataset loading from file
replace_selected_actions –

Replace selected current actions with updated actions based on selection_mask.

Source code in rl4co/envs/routing/cvrp/env.py

def __init__(
    self,
    generator: CVRPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = CVRPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

check_solution_validity `staticmethod` ¶

check_solution_validity(td: TensorDict, actions: Tensor)

Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded

Source code in rl4co/envs/routing/cvrp/env.py

@staticmethod
def check_solution_validity(td: TensorDict, actions: torch.Tensor):
    """Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded"""
    # Check if tour is valid, i.e. contain 0 to n-1
    batch_size, graph_size = td["demand"].size()
    sorted_pi = actions.data.sort(1)[0]

    # Sorting it should give all zeros at front and then 1...n
    assert (
        torch.arange(1, graph_size + 1, out=sorted_pi.data.new())
        .view(1, -1)
        .expand(batch_size, graph_size)
        == sorted_pi[:, -graph_size:]
    ).all() and (sorted_pi[:, :-graph_size] == 0).all(), "Invalid tour"

    # Visiting depot resets capacity so we add demand = -capacity (we make sure it does not become negative)
    demand_with_depot = torch.cat((-td["vehicle_capacity"], td["demand"]), 1)
    d = demand_with_depot.gather(1, actions)

    used_cap = torch.zeros_like(td["demand"][:, 0])
    for i in range(actions.size(1)):
        used_cap += d[
            :, i
        ]  # This will reset/make capacity negative if i == 0, e.g. depot visited
        # Cannot use less than 0
        used_cap[used_cap < 0] = 0
        assert (
            used_cap <= td["vehicle_capacity"] + 1e-5
        ).all(), "Used more than capacity"

load_data `staticmethod` ¶

load_data(fpath, batch_size=[])

Dataset loading from file Normalize demand by capacity to be in [0, 1]

Source code in rl4co/envs/routing/cvrp/env.py

@staticmethod
def load_data(fpath, batch_size=[]):
    """Dataset loading from file
    Normalize demand by capacity to be in [0, 1]
    """
    td_load = load_npz_to_tensordict(fpath)
    td_load.set("demand", td_load["demand"] / td_load["capacity"][:, None])
    return td_load

replace_selected_actions ¶

replace_selected_actions(
    cur_actions: Tensor,
    new_actions: Tensor,
    selection_mask: Tensor,
) -> Tensor

Replace selected current actions with updated actions based on selection_mask.

Source code in rl4co/envs/routing/cvrp/env.py

def replace_selected_actions(
    self,
    cur_actions: torch.Tensor,
    new_actions: torch.Tensor,
    selection_mask: torch.Tensor,
) -> torch.Tensor:
    """
    Replace selected current actions with updated actions based on `selection_mask`.

    Args:
        cur_actions [batch_size, num_loc]
        new_actions [batch_size, num_loc]
        selection_mask [batch_size,]
    """
    diff_length = cur_actions.size(-1) - new_actions.size(-1)
    if diff_length > 0:
        new_actions = torch.nn.functional.pad(
            new_actions, (0, diff_length, 0, 0), mode="constant", value=0
        )
    elif diff_length < 0:
        cur_actions = torch.nn.functional.pad(
            cur_actions, (0, -diff_length, 0, 0), mode="constant", value=0
        )
    cur_actions[selection_mask] = new_actions[selection_mask]
    return cur_actions

CVRPGenerator ¶

CVRPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    depot_distribution: Union[
        int, float, str, type, Callable
    ] = None,
    min_demand: int = 1,
    max_demand: int = 10,
    demand_distribution: Union[
        int, float, type, Callable
    ] = Uniform,
    vehicle_capacity: float = 1.0,
    capacity: float = None,
    **kwargs
)

Bases: Generator

Data generator for the Capacitated Vehicle Routing Problem (CVRP).

Parameters:

num_loc (int, default: 20 ) –

number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)
min_loc (float, default: 0.0 ) –

minimum value for the location coordinates
max_loc (float, default: 1.0 ) –

maximum value for the location coordinates
loc_distribution (Union[int, float, str, type, Callable], default: Uniform ) –

distribution for the location coordinates
depot_distribution (Union[int, float, str, type, Callable], default: None ) –

distribution for the depot location. If None, sample the depot from the locations
min_demand (int, default: 1 ) –

minimum value for the demand of each customer
max_demand (int, default: 10 ) –

maximum value for the demand of each customer
demand_distribution (Union[int, float, type, Callable], default: Uniform ) –

distribution for the demand of each customer
capacity (float, default: None ) –

capacity of the vehicle

Returns:

–

A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot demand [batch_size, num_loc]: demand of each customer capacity [batch_size]: capacity of the vehicle

Source code in rl4co/envs/routing/cvrp/generator.py

def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[int, float, str, type, Callable] = Uniform,
    depot_distribution: Union[int, float, str, type, Callable] = None,
    min_demand: int = 1,
    max_demand: int = 10,
    demand_distribution: Union[int, float, type, Callable] = Uniform,
    vehicle_capacity: float = 1.0,
    capacity: float = None,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.min_demand = min_demand
    self.max_demand = max_demand
    self.vehicle_capacity = vehicle_capacity

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    # Depot distribution
    if kwargs.get("depot_sampler", None) is not None:
        self.depot_sampler = kwargs["depot_sampler"]
    else:
        self.depot_sampler = (
            get_sampler("depot", depot_distribution, min_loc, max_loc, **kwargs)
            if depot_distribution is not None
            else None
        )

    # Demand distribution
    if kwargs.get("demand_sampler", None) is not None:
        self.demand_sampler = kwargs["demand_sampler"]
    else:
        self.demand_sampler = get_sampler(
            "demand", demand_distribution, min_demand - 1, max_demand - 1, **kwargs
        )

    # Capacity
    if (
        capacity is None
    ):  # If not provided, use the default capacity from Kool et al. 2019
        capacity = CAPACITIES.get(num_loc, None)
    if (
        capacity is None
    ):  # If not in the table keys, find the closest number of nodes as the key
        closest_num_loc = min(CAPACITIES.keys(), key=lambda x: abs(x - num_loc))
        capacity = CAPACITIES[closest_num_loc]
        log.warning(
            f"The capacity capacity for {num_loc} locations is not defined. Using the closest capacity: {capacity}\
                with {closest_num_loc} locations."
        )
    self.capacity = capacity

Multiple Traveling Salesman Problem (mTSP)¶

MTSPEnv ¶

MTSPEnv(
    generator: MTSPGenerator = None,
    generator_params: dict = {},
    cost_type: str = "minmax",
    **kwargs
)

Bases: RL4COEnvBase

Multiple Traveling Salesman Problem environment At each step, an agent chooses to visit a city. A maximum of num_agents agents can be employed to visit the cities. The cost can be defined in two ways:

- `minmax`: (default) the reward is the maximum of the path lengths of all the agents
- `sum`: the cost is the sum of the path lengths of all the agents

Reward is - cost, so the goal is to maximize the reward (minimize the cost).

Observations

locations of the depot and each customer.
number of agents.
the current agent index.
the current location of the vehicle.

Constrains

each agent's tour starts and ends at the depot.
each customer must be visited exactly once.

Finish condition

all customers are visited and all agents back to the depot.

Reward

There are two ways to calculate the cost (-reward):

minmax: (default) the cost is the maximum of the path lengths of all the agents.
sum: the cost is the sum of the path lengths of all the agents.

Parameters:

cost_type (str, default: 'minmax' ) –

type of cost to use, either minmax or sum
generator (MTSPGenerator, default: None ) –

MTSPGenerator instance as the data generator
generator_params (dict, default: {} ) –

parameters for the generator

Source code in rl4co/envs/routing/mtsp/env.py

def __init__(
    self,
    generator: MTSPGenerator = None,
    generator_params: dict = {},
    cost_type: str = "minmax",
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = MTSPGenerator(**generator_params)
    self.generator = generator
    self.cost_type = cost_type
    self._make_spec(self.generator)

MTSPGenerator ¶

MTSPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    min_num_agents: int = 5,
    max_num_agents: int = 5,
    **kwargs
)

Bases: Generator

Data generator for the Multiple Travelling Salesman Problem (mTSP).

Parameters:

num_loc (int, default: 20 ) –

number of locations (customers) in the TSP
min_loc (float, default: 0.0 ) –

minimum value for the location coordinates
max_loc (float, default: 1.0 ) –

maximum value for the location coordinates
loc_distribution (Union[int, float, str, type, Callable], default: Uniform ) –

distribution for the location coordinates
min_num_agents (int, default: 5 ) –

minimum number of agents (vehicles), include
max_num_agents (int, default: 5 ) –

maximum number of agents (vehicles), include

Returns:

–

A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer num_agents [batch_size]: number of agents (vehicles)

Source code in rl4co/envs/routing/mtsp/generator.py

def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[int, float, str, type, Callable] = Uniform,
    min_num_agents: int = 5,
    max_num_agents: int = 5,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.min_num_agents = min_num_agents
    self.max_num_agents = max_num_agents

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

Orienteering Problem (OP)¶

OPEnv ¶

OPEnv(
    generator: OPGenerator = None,
    generator_params: dict = {},
    prize_type: str = "dist",
    **kwargs
)

Bases: RL4COEnvBase

Orienteering Problem (OP) environment. At each step, the agent chooses a location to visit in order to maximize the collected prize. The total length of the path must not exceed a given threshold.

Observations

location of the depot
locations and prize of each customer
current location of the vehicle
current tour length
current total prize
the remaining length of the path

Constraints

the tour starts and ends at the depot
not all customers need to be visited
the vehicle cannot visit customers exceed the remaining length of the path

Finish Condition

the vehicle back to the depot

Reward

the sum of the prizes of visited nodes

Parameters:

generator (OPGenerator, default: None ) –

OPGenerator instance as the data generator
generator_params (dict, default: {} ) –

parameters for the generator

Methods:

get_action_mask –

Get action mask with 1 = feasible action, 0 = infeasible action.
check_solution_validity –

Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded.

Source code in rl4co/envs/routing/op/env.py

def __init__(
    self,
    generator: OPGenerator = None,
    generator_params: dict = {},
    prize_type: str = "dist",
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = OPGenerator(**generator_params)
    self.generator = generator
    self.prize_type = prize_type
    assert self.prize_type in [
        "dist",
        "unif",
        "const",
    ], f"Invalid prize_type: {self.prize_type}"
    self._make_spec(self.generator)

get_action_mask `staticmethod` ¶

get_action_mask(td: TensorDict) -> Tensor

Get action mask with 1 = feasible action, 0 = infeasible action. Cannot visit if already visited, if depot has been visited, or if the length exceeds the maximum length.

Source code in rl4co/envs/routing/op/env.py

@staticmethod
def get_action_mask(td: TensorDict) -> torch.Tensor:
    """Get action mask with 1 = feasible action, 0 = infeasible action.
    Cannot visit if already visited, if depot has been visited, or if the length exceeds the maximum length.
    """
    current_loc = gather_by_index(td["locs"], td["current_node"])[..., None, :]
    exceeds_length = (
        td["tour_length"][..., None] + (td["locs"] - current_loc).norm(p=2, dim=-1)
        > td["max_length"]
    )
    mask = td["visited"] | td["visited"][..., 0:1] | exceeds_length

    action_mask = ~mask  # 1 = feasible action, 0 = infeasible action

    # Depot can always be visited: we do not hardcode knowledge that this is strictly suboptimal if other options are available
    action_mask[..., 0] = 1
    return action_mask

check_solution_validity `staticmethod` ¶

check_solution_validity(
    td: TensorDict,
    actions: Tensor,
    add_distance_to_depot: bool = True,
) -> None

Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded. If add_distance_to_depot if True, then the distance to the depot is added to max length since by default, the max length is modified in the reset function to account for the distance to the depot.

Source code in rl4co/envs/routing/op/env.py

@staticmethod
def check_solution_validity(
    td: TensorDict, actions: torch.Tensor, add_distance_to_depot: bool = True
) -> None:
    """Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded.
    If `add_distance_to_depot` if True, then the distance to the depot is added to max length since by default, the max length is
    modified in the reset function to account for the distance to the depot.
    """

    # Check that tours are valid, i.e. contain 0 to n -1
    sorted_actions = actions.data.sort(1)[0]
    # Make sure each node visited once at most (except for depot)
    assert (
        (sorted_actions[:, 1:] == 0)
        | (sorted_actions[:, 1:] > sorted_actions[:, :-1])
    ).all(), "Duplicates"

    # Gather locations in order of tour and get the length of tours
    locs_ordered = gather_by_index(td["locs"], actions)
    length = get_tour_length(locs_ordered)

    max_length = td["max_length"]
    if add_distance_to_depot:
        max_length = (
            max_length
            + (td["locs"][..., 0:1, :] - td["locs"]).norm(p=2, dim=-1)
            + 1e-6
        )
    assert (
        length[..., None] <= max_length + 1e-5
    ).all(), "Max length exceeded by {}".format(
        (length[..., None] - max_length).max()
    )

OPGenerator ¶

OPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    depot_distribution: Union[
        int, float, str, type, Callable
    ] = None,
    min_prize: float = 1.0,
    max_prize: float = 1.0,
    prize_distribution: Union[
        int, float, type, Callable
    ] = Uniform,
    prize_type: str = "dist",
    max_length: Union[float, Tensor] = None,
    **kwargs
)

Bases: Generator

Data generator for the Orienteering Problem (OP).

Parameters:

num_loc (int, default: 20 ) –

number of locations (customers) in the OP, without the depot. (e.g. 10 means 10 locs + 1 depot)
min_loc (float, default: 0.0 ) –

minimum value for the location coordinates
max_loc (float, default: 1.0 ) –

maximum value for the location coordinates
loc_distribution (Union[int, float, str, type, Callable], default: Uniform ) –

distribution for the location coordinates
depot_distribution (Union[int, float, str, type, Callable], default: None ) –

distribution for the depot location. If None, sample the depot from the locations
min_prize (float, default: 1.0 ) –

minimum value for the prize of each customer
max_prize (float, default: 1.0 ) –

maximum value for the prize of each customer
prize_distribution (Union[int, float, type, Callable], default: Uniform ) –

distribution for the prize of each customer
max_length (Union[float, Tensor], default: None ) –

maximum length of the path

Returns:

–

A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot prize [batch_size, num_loc]: prize of each customer max_length [batch_size, 1]: maximum length of the path for each customer

Source code in rl4co/envs/routing/op/generator.py

def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[int, float, str, type, Callable] = Uniform,
    depot_distribution: Union[int, float, str, type, Callable] = None,
    min_prize: float = 1.0,
    max_prize: float = 1.0,
    prize_distribution: Union[int, float, type, Callable] = Uniform,
    prize_type: str = "dist",
    max_length: Union[float, torch.Tensor] = None,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.min_prize = min_prize
    self.max_prize = max_prize
    self.prize_type = prize_type
    self.max_length = max_length

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    # Depot distribution
    if kwargs.get("depot_sampler", None) is not None:
        self.depot_sampler = kwargs["depot_sampler"]
    else:
        self.depot_sampler = get_sampler(
            "depot", depot_distribution, min_loc, max_loc, **kwargs
        ) if depot_distribution is not None else None

    # Prize distribution
    if kwargs.get("prize_sampler", None) is not None:
        self.prize_sampler = kwargs["prize_sampler"]
    elif (
        prize_distribution == "dist"
    ):  # If prize_distribution is 'dist', then the prize is the distance from the depot
        self.prize_sampler = None
    else:
        self.prize_sampler = get_sampler(
            "prize", prize_distribution, min_prize, max_prize, **kwargs
        )

    # Max length
    if max_length is not None:
        self.max_length = max_length
    else:
        self.max_length = MAX_LENGTHS.get(num_loc, None)
    if self.max_length is None:
        closest_num_loc = min(MAX_LENGTHS.keys(), key=lambda x: abs(x - num_loc))
        self.max_length = MAX_LENGTHS[closest_num_loc]
        log.warning(
            f"The max length for {num_loc} locations is not defined. Using the closest max length: {self.max_length}\
                with {closest_num_loc} locations."
        )

Pickup and Delivery Problem (PDP)¶

PDPEnv ¶

PDPEnv(
    generator: PDPGenerator = None,
    generator_params: dict = {},
    force_start_at_depot: bool = False,
    **kwargs
)

Bases: RL4COEnvBase

Pickup and Delivery Problem (PDP) environment. The environment is made of num_loc + 1 locations (cities):

- 1 depot
- `num_loc` / 2 pickup locations
- `num_loc` / 2 delivery locations

The goal is to visit all the pickup and delivery locations in the shortest path possible starting from the depot The conditions is that the agent must visit a pickup location before visiting its corresponding delivery location

Observations

locations of the depot, pickup, and delivery locations
current location of the vehicle
the remaining locations to deliver
the visited locations
the current step

Constraints

the tour starts and ends at the depot
each pickup location must be visited before its corresponding delivery location
the vehicle cannot visit the same location twice

Finish Condition

the vehicle has visited all locations

Reward

(minus) the negative length of the path

Parameters:

generator (PDPGenerator, default: None ) –

PDPGenerator instance as the data generator
generator_params (dict, default: {} ) –

parameters for the generator
force_start_at_depot (bool, default: False ) –

whether to force the agent to start at the depot If False (default), the agent won't consider the depot, which is added in the get_reward method If True, the only valid action at the first step is to visit the depot (=0)

Methods:

get_num_starts –

Only half of the nodes (i.e. pickup nodes) can be start nodes
select_start_nodes –

Only nodes from [1 : num_loc // 2 +1] (i.e. pickups) can be selected

Source code in rl4co/envs/routing/pdp/env.py

def __init__(
    self,
    generator: PDPGenerator = None,
    generator_params: dict = {},
    force_start_at_depot: bool = False,
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = PDPGenerator(**generator_params)
    self.generator = generator
    self.force_start_at_depot = force_start_at_depot
    self._make_spec(self.generator)

get_num_starts ¶

get_num_starts(td)

Only half of the nodes (i.e. pickup nodes) can be start nodes

Source code in rl4co/envs/routing/pdp/env.py

def get_num_starts(self, td):
    """Only half of the nodes (i.e. pickup nodes) can be start nodes"""
    return (td["locs"].shape[-2] - 1) // 2

select_start_nodes ¶

select_start_nodes(td, num_starts)

Only nodes from [1 : num_loc // 2 +1] (i.e. pickups) can be selected

Source code in rl4co/envs/routing/pdp/env.py

def select_start_nodes(self, td, num_starts):
    """Only nodes from [1 : num_loc // 2 +1] (i.e. pickups) can be selected"""
    num_possible_starts = (td["locs"].shape[-2] - 1) // 2
    selected = (
        torch.arange(num_starts, device=td.device).repeat_interleave(td.shape[0])
        % num_possible_starts
        + 1
    )
    return selected

PDPGenerator ¶

PDPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    init_sol_type: str = "random",
    loc_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    depot_distribution: Union[
        int, float, str, type, Callable
    ] = None,
    **kwargs
)

Bases: Generator

Data generator for the Pickup and Delivery Problem (PDP). Args: num_loc: number of locations (customers) in the PDP, without the depot. (e.g. 10 means 10 locs + 1 depot)

    - 1 depot
    - `num_loc` / 2 pickup locations
    - `num_loc` / 2 delivery locations
min_loc: minimum value for the location coordinates
max_loc: maximum value for the location coordinates
init_sol_type: the method type used for generating initial solutions (random or greedy)
loc_distribution: distribution for the location coordinates
depot_distribution: distribution for the depot location. If None, sample the depot from the locations

Returns:

–

A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot

Source code in rl4co/envs/routing/pdp/generator.py

def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    init_sol_type: str = "random",
    loc_distribution: Union[int, float, str, type, Callable] = Uniform,
    depot_distribution: Union[int, float, str, type, Callable] = None,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.init_sol_type = init_sol_type

    # Number of locations must be even
    if num_loc % 2 != 0:
        log.warn(
            "Number of locations must be even. Adding 1 to the number of locations."
        )
        self.num_loc += 1

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    # Depot distribution
    if kwargs.get("depot_sampler", None) is not None:
        self.depot_sampler = kwargs["depot_sampler"]
    else:
        self.depot_sampler = get_sampler(
            "depot", depot_distribution, min_loc, max_loc, **kwargs
        ) if depot_distribution is not None else None

Prize Collecting Traveling Salesman Problem (PCTSP)¶

PCTSPEnv ¶

PCTSPEnv(
    generator: PCTSPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: RL4COEnvBase

Prize-collecting TSP (PCTSP) environment. The goal is to collect as much prize as possible while minimizing the total travel cost. The environment is stochastic, the prize is only revealed when the node is visited.

Observations

locations of the nodes
prize and penalty of each node
current location of the vehicle
current total prize
current total penalty
visited nodes
prize required to visit a node
the current step

Constraints

the tour starts and ends at the depot
the vehicle cannot visit nodes exceed the remaining prize

Finish Condition

the vehicle back to the depot

Reward

the sum of the saved penalties

Parameters:

generator (PCTSPGenerator, default: None ) –

OPGenerator instance as the data generator
generator_params (dict, default: {} ) –

parameters for the generator

Methods:

get_action_mask –

Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes
check_solution_validity –

Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited

Source code in rl4co/envs/routing/pctsp/env.py

def __init__(
    self,
    generator: PCTSPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = PCTSPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

get_action_mask `staticmethod` ¶

get_action_mask(td: TensorDict) -> Tensor

Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes

Source code in rl4co/envs/routing/pctsp/env.py

@staticmethod
def get_action_mask(td: TensorDict) -> torch.Tensor:
    """Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes"""
    mask = td["visited"] | td["visited"][..., 0:1]
    mask[..., 0] = (td["cur_total_prize"] < 1.0) & (
        td["visited"][..., 1:].int().sum(-1) < td["visited"][..., 1:].size(-1)
    )
    return ~(mask > 0)  # Invert mask, since 1 means feasible action

check_solution_validity `staticmethod` ¶

check_solution_validity(
    td: TensorDict, actions: Tensor
) -> None

Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited

Source code in rl4co/envs/routing/pctsp/env.py

@staticmethod
def check_solution_validity(td: TensorDict, actions: torch.Tensor) -> None:
    """Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited"""

    # Check that tours are valid, i.e. contain 0 to n -1
    sorted_actions = actions.data.sort(1)[0]

    # Make sure each node visited once at most (except for depot)
    assert (
        (sorted_actions[..., 1:] == 0)
        | (sorted_actions[..., 1:] > sorted_actions[..., :-1])
    ).all(), "Duplicates"

    prize = td["real_prize"][..., 1:]  # Remove depot
    prize_with_depot = torch.cat((torch.zeros_like(prize[:, :1]), prize), 1)
    p = prize_with_depot.gather(1, actions)

    # Either prize constraint should be satisfied or all prizes should be visited
    assert (
        (p.sum(-1) >= 1 - 1e-5)
        | (
            sorted_actions.size(-1) - (sorted_actions == 0).int().sum(-1)
            == (td["locs"].size(-2) - 1)
        )  # no depot
    ).all(), "Total prize does not satisfy min total prize"

PCTSPGenerator ¶

PCTSPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    depot_distribution: Union[
        int, float, str, type, Callable
    ] = None,
    penalty_factor: float = 3.0,
    prize_required: float = 1.0,
    **kwargs
)

Bases: Generator

Data generator for the Prize-collecting Traveling Salesman Problem (PCTSP).

Parameters:

num_loc (int, default: 20 ) –

number of locations (customers) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)
min_loc (float, default: 0.0 ) –

minimum value for the location coordinates
max_loc (float, default: 1.0 ) –

maximum value for the location coordinates
loc_distribution (Union[int, float, str, type, Callable], default: Uniform ) –

distribution for the location coordinates
depot_distribution (Union[int, float, str, type, Callable], default: None ) –

distribution for the depot location. If None, sample the depot from the locations
min_demand –

minimum value for the demand of each customer
max_demand –

maximum value for the demand of each customer
demand_distribution –

distribution for the demand of each customer
capacity –

capacity of the vehicle

Returns:

–

A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each city depot [batch_size, 2]: location of the depot demand [batch_size, num_loc]: demand of each customer capacity [batch_size, 1]: capacity of the vehicle

Source code in rl4co/envs/routing/pctsp/generator.py

def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[int, float, str, type, Callable] = Uniform,
    depot_distribution: Union[int, float, str, type, Callable] = None,
    penalty_factor: float = 3.0,
    prize_required: float = 1.0,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.penalty_fctor = penalty_factor
    self.prize_required = prize_required

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    # Depot distribution
    if kwargs.get("depot_sampler", None) is not None:
        self.depot_sampler = kwargs["depot_sampler"]
    else:
        self.depot_sampler = get_sampler(
            "depot", depot_distribution, min_loc, max_loc, **kwargs
        ) if depot_distribution is not None else None

    # Prize distribution
    self.deterministic_prize_sampler = get_sampler(
        "deterministric_prize", "uniform", 0.0, 4.0 / self.num_loc, **kwargs
    )
    self.stochastic_prize_sampler = get_sampler(
        "stochastic_prize", "uniform", 0.0, 2.0, **kwargs
    )

    # For the penalty to make sense it should be not too large (in which case all nodes will be visited) nor too small
    # so we want the objective term to be approximately equal to the length of the tour, which we estimate with half
    # of the nodes by half of the tour length (which is very rough but similar to op)
    # This means that the sum of penalties for all nodes will be approximately equal to the tour length (on average)
    # The expected total (uniform) penalty of half of the nodes (since approx half will be visited by the constraint)
    # is (n / 2) / 2 = n / 4 so divide by this means multiply by 4 / n,
    # However instead of 4 we use penalty_factor (3 works well) so we can make them larger or smaller        
    self.max_penalty = kwargs.get("max_penalty", None)
    if self.max_penalty is None:  # If not provided, use the default max penalty
        self.max_penalty = MAX_LENGTHS.get(num_loc, None)
    if (
        self.max_penalty is None
    ):  # If not in the table keys, find the closest number of nodes as the key
        closest_num_loc = min(MAX_LENGTHS.keys(), key=lambda x: abs(x - num_loc))
        self.max_penalty = MAX_LENGTHS[closest_num_loc]
        log.warning(
            f"The max penalty for {num_loc} locations is not defined. Using the closest max penalty: {self.max_penalty}\
                with {closest_num_loc} locations."
        )

    # Adjust as in Kool et al. (2019)
    self.max_penalty *= penalty_factor / self.num_loc
    self.penalty_sampler = get_sampler(
        "penalty", "uniform", 0.0, self.max_penalty, **kwargs
    )

Split Delivery Vehicle Routing Problem (SDVRP)¶

SDVRPEnv ¶

SDVRPEnv(
    generator: CVRPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: CVRPEnv

Split Delivery Vehicle Routing Problem (SDVRP) environment. SDVRP is a generalization of CVRP, where nodes can be visited multiple times and a fraction of the demand can be met. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is the -infinite unless the agent visits all the customers. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Observations

location of the depot.
locations and demand/remaining demand of each customer
current location of the vehicle.
the remaining capacity of the vehicle.

Constraints

the tour starts and ends at the depot.
each customer can be visited multiple times.
the vehicle cannot visit customers exceed the remaining capacity.
the vehicle can return to the depot to refill the capacity.

Finish Condition

the vehicle has finished all customers demand and returned to the depot.

Reward

(minus) the negative length of the path.

Parameters:

generator (CVRPGenerator, default: None ) –

CVRPGenerator instance as the data generator
generator_params (dict, default: {} ) –

parameters for the generator

Methods:

check_solution_validity –

Check that the solution is valid (all demand is satisfied)

Source code in rl4co/envs/routing/sdvrp/env.py

def __init__(
    self,
    generator: CVRPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(generator, generator_params, **kwargs)

check_solution_validity `staticmethod` ¶

check_solution_validity(
    td: TensorDict, actions: Tensor
) -> None

Check that the solution is valid (all demand is satisfied)

Source code in rl4co/envs/routing/sdvrp/env.py

@staticmethod
def check_solution_validity(td: TensorDict, actions: torch.Tensor) -> None:
    """Check that the solution is valid (all demand is satisfied)"""

    batch_size, graph_size = td["demand"].size()

    # Each node can be visited multiple times, but we always deliver as much demand as possible
    # We check that at the end all demand has been satisfied
    demands = torch.cat((-td["vehicle_capacity"], td["demand"]), 1)

    rng = torch.arange(batch_size, out=demands.data.new().long())
    used_cap = torch.zeros_like(td["demand"][..., 0])
    a_prev = None
    for a in actions.transpose(0, 1):
        assert (
            a_prev is None or (demands[((a_prev == 0) & (a == 0)), :] == 0).all()
        ), "Cannot visit depot twice if any nonzero demand"
        d = torch.min(demands[rng, a], td["vehicle_capacity"].squeeze(-1) - used_cap)
        demands[rng, a] -= d
        used_cap += d
        used_cap[a == 0] = 0
        a_prev = a
    assert (demands == 0).all(), "All demand must be satisfied"

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP)¶

SPCTSPEnv ¶

SPCTSPEnv(**kwargs)

Bases: PCTSPEnv

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP) environment.

Note

The only difference with deterministic PCTSP is that the prizes are stochastic (i.e. the expected prize is not the same as the real prize).

Source code in rl4co/envs/routing/spctsp/env.py

def __init__(self, **kwargs):
    super().__init__(**kwargs)

Traveling Salesman Problem (TSP)¶

TSPEnv ¶

TSPEnv(
    generator: TSPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: RL4COEnvBase

Traveling Salesman Problem (TSP) environment At each step, the agent chooses a city to visit. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Observations

locations of each customer.
the current location of the vehicle.

Constraints

the tour must return to the starting customer.
each customer must be visited exactly once.

Finish condition

the agent has visited all customers and returned to the starting customer.

Reward

(minus) the negative length of the path.

Parameters:

generator (TSPGenerator, default: None ) –

TSPGenerator instance as the data generator
generator_params (dict, default: {} ) –

parameters for the generator

Methods:

check_solution_validity –

Check that solution is valid: nodes are visited exactly once
replace_selected_actions –

Replace selected current actions with updated actions based on selection_mask.

Source code in rl4co/envs/routing/tsp/env.py

def __init__(
    self,
    generator: TSPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = TSPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

check_solution_validity `staticmethod` ¶

check_solution_validity(
    td: TensorDict, actions: Tensor
) -> None

Check that solution is valid: nodes are visited exactly once

Source code in rl4co/envs/routing/tsp/env.py

@staticmethod
def check_solution_validity(td: TensorDict, actions: torch.Tensor) -> None:
    """Check that solution is valid: nodes are visited exactly once"""
    assert (
        torch.arange(actions.size(1), out=actions.data.new())
        .view(1, -1)
        .expand_as(actions)
        == actions.data.sort(1)[0]
    ).all(), "Invalid tour"

replace_selected_actions ¶

replace_selected_actions(
    cur_actions: Tensor,
    new_actions: Tensor,
    selection_mask: Tensor,
) -> Tensor

Replace selected current actions with updated actions based on selection_mask.

Source code in rl4co/envs/routing/tsp/env.py

def replace_selected_actions(
    self,
    cur_actions: torch.Tensor,
    new_actions: torch.Tensor,
    selection_mask: torch.Tensor,
) -> torch.Tensor:
    """
    Replace selected current actions with updated actions based on `selection_mask`.

    Args:
        cur_actions [batch_size, num_loc]
        new_actions [batch_size, num_loc]
        selection_mask [batch_size,]
    """
    cur_actions[selection_mask] = new_actions[selection_mask]
    return cur_actions

TSPGenerator ¶

TSPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    init_sol_type: str = "random",
    loc_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    **kwargs
)

Bases: Generator

Data generator for the Travelling Salesman Problem (TSP).

Parameters:

num_loc (int, default: 20 ) –

number of locations (customers) in the TSP
min_loc (float, default: 0.0 ) –

minimum value for the location coordinates
max_loc (float, default: 1.0 ) –

maximum value for the location coordinates
init_sol_type (str, default: 'random' ) –

the method type used for generating initial solutions (random or greedy)
loc_distribution (Union[int, float, str, type, Callable], default: Uniform ) –

distribution for the location coordinates

Returns:

–

A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer

Source code in rl4co/envs/routing/tsp/generator.py

def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    init_sol_type: str = "random",
    loc_distribution: Union[int, float, str, type, Callable] = Uniform,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.init_sol_type = init_sol_type

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

Multi-Task Vehicle Routing Problem (MTVRP)¶

MTVRPEnv ¶

MTVRPEnv(
    generator: MTVRPGenerator = None,
    generator_params: dict = {},
    check_solution: bool = False,
    **kwargs
)

Bases: RL4COEnvBase

MTVRPEnv is a Multi-Task VRP environment which can take any combination of the following constraints:

Features:

Capacity (C) - Each vehicle has a maximum capacity \(Q\), restricting the total load that can be in the vehicle at any point of the route. - The route must be planned such that the sum of demands and pickups for all customers visited does not exceed this capacity.
Time Windows (TW) - Every node \(i\) has an associated time window \([e_i, l_i]\) during which service must commence. - Additionally, each node has a service time \(s_i\). Vehicles must reach node \(i\) within its time window; early arrivals must wait at the node location until time \(e_i\).
Open Routes (O) - Vehicles are not required to return to the depot after serving all customers. - Note that this does not need to be counted as a constraint since it can be modelled by setting zero costs on arcs returning to the depot \(c_{i0} = 0\) from any customer \(i \in C\), and not counting the return arc as part of the route.
Backhauls (B) - Backhauls generalize demand to also account for return shipments. Customers are either linehaul or backhaul customers. - Linehaul customers require delivery of a demand \(q_i > 0\) that needs to be transported from the depot to the customer, whereas backhaul customers need a pickup of an amount \(p_i > 0\) that is transported from the client back to the depot. - It is possible for vehicles to serve a combination of linehaul and backhaul customers in a single route, but then any linehaul customers must precede the backhaul customers in the route.
Duration Limits (L) - Imposes a limit on the total travel duration (or length) of each route, ensuring a balanced workload across vehicles.

The environment covers the following 16 variants depending on the data generation:

VRP Variant	Capacity (C)	Open Route (O)	Backhaul (B)	Duration Limit (L)	Time Window (TW)
CVRP	✔
OVRP	✔	✔
VRPB	✔		✔
VRPL	✔			✔
VRPTW	✔				✔
OVRPTW	✔	✔			✔
OVRPB	✔	✔	✔
OVRPL	✔	✔		✔
VRPBL	✔		✔	✔
VRPBTW	✔		✔		✔
VRPLTW	✔			✔	✔
OVRPBL	✔	✔	✔	✔
OVRPBTW	✔	✔	✔		✔
OVRPLTW	✔	✔		✔	✔
VRPBLTW	✔		✔	✔	✔
OVRPBLTW	✔	✔	✔	✔	✔

You may also check out the following papers as reference:

Tip

Have a look at https://pyvrp.org/ for more information about VRP and its variants and their solutions. Kudos to their help and great job!

Parameters:

generator (MTVRPGenerator, default: None ) –

Generator for the environment, see :class:MTVRPGenerator.
generator_params (dict, default: {} ) –

Parameters for the generator.

Methods:

load_data –

Dataset loading from file
render –

Simple wrapper for render function
select_start_nodes –

Select available start nodes for the environment (e.g. for POMO-based training)
solve –

Classical solver for the environment. This is a wrapper for the baselines solver.
check_variants –

Check if the problem has the variants

Source code in rl4co/envs/routing/mtvrp/env.py

def __init__(
    self,
    generator: MTVRPGenerator = None,
    generator_params: dict = {},
    check_solution: bool = False,
    **kwargs,
):
    if check_solution:
        log.warning(
            "Solution checking is enabled. This may slow down the environment."
            " We recommend disabling this for training by passing `check_solution=False`."
        )

    super().__init__(check_solution=check_solution, **kwargs)

    if generator is None:
        generator = MTVRPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

load_data ¶

load_data(fpath, batch_size=[], scale=False)

Dataset loading from file Normalize demand by capacity to be in [0, 1]

Source code in rl4co/envs/routing/mtvrp/env.py

def load_data(self, fpath, batch_size=[], scale=False):
    """Dataset loading from file
    Normalize demand by capacity to be in [0, 1]
    """
    td_load = load_npz_to_tensordict(fpath)
    if scale:
        td_load.set(
            "demand_linehaul",
            td_load["demand_linehaul"] / td_load["capacity_original"],
        )
        td_load.set(
            "demand_backhaul",
            td_load["demand_backhaul"] / td_load["capacity_original"],
        )
    return td_load

render `staticmethod` ¶

render(*args, **kwargs)

Simple wrapper for render function

Source code in rl4co/envs/routing/mtvrp/env.py

@staticmethod
def render(*args, **kwargs):
    """Simple wrapper for render function"""
    from .render import render

    return render(*args, **kwargs)

select_start_nodes ¶

select_start_nodes(td, num_starts)

Select available start nodes for the environment (e.g. for POMO-based training)

Source code in rl4co/envs/routing/mtvrp/env.py

def select_start_nodes(self, td, num_starts):
    """Select available start nodes for the environment (e.g. for POMO-based training)"""
    num_loc = td["locs"].shape[-2] - 1
    selected = (
        torch.arange(num_starts, device=td.device).repeat_interleave(td.shape[0])
        % num_loc
        + 1
    )
    return selected

solve `staticmethod` ¶

solve(
    instances: TensorDict,
    max_runtime: float,
    num_procs: int = 1,
    solver: str = "pyvrp",
    **kwargs
) -> tuple[Tensor, Tensor]

Classical solver for the environment. This is a wrapper for the baselines solver. Available solvers are: pyvrp, ortools, lkh. Returns the actions and costs.

Source code in rl4co/envs/routing/mtvrp/env.py

@staticmethod
def solve(
    instances: TensorDict,
    max_runtime: float,
    num_procs: int = 1,
    solver: str = "pyvrp",
    **kwargs,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Classical solver for the environment. This is a wrapper for the baselines solver.
    Available solvers are: `pyvrp`, `ortools`, `lkh`. Returns the actions and costs.
    """
    from .baselines.solve import solve

    return solve(instances, max_runtime, num_procs, solver, **kwargs)

check_variants `staticmethod` ¶

check_variants(td)

Check if the problem has the variants

Source code in rl4co/envs/routing/mtvrp/env.py

@staticmethod
def check_variants(td):
    """Check if the problem has the variants"""
    has_open = td["open_route"].squeeze(-1)
    has_tw = (td["time_windows"][:, :, 1] != float("inf")).any(-1)
    has_limit = (td["distance_limit"] != float("inf")).squeeze(-1)
    has_backhaul = (td["demand_backhaul"] != 0).any(-1)
    return has_open, has_tw, has_limit, has_backhaul

MTVRPGenerator ¶

MTVRPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[
        int, float, str, type, Callable
    ] = Uniform,
    capacity: float = None,
    min_demand: int = 1,
    max_demand: int = 10,
    min_backhaul: int = 1,
    max_backhaul: int = 10,
    scale_demand: bool = True,
    max_time: float = 4.6,
    backhaul_ratio: float = 0.2,
    distance_limit: float = 3.0,
    speed: float = 1.0,
    prob_open: float = 0.5,
    prob_time_window: float = 0.5,
    prob_limit: float = 0.5,
    prob_backhaul: float = 0.5,
    variant_preset=None,
    use_combinations=True,
    subsample=True,
    **kwargs
)

Bases: Generator

MTVRP Generator. Class to generate instances of the MTVRP problem. If a variant is declared and Subsample is True, the generator will sample the problem based on the variant probabilities. By default, we use Mixed-Batch Training as in Berto et al. 2024 (RouteFinder), i.e. one batch can contain multiple variants.

Example presets:

"all": Sample uniformly from 16 variants
"single_feat": Sample uniformly between CVRP, OVRP, VRPB, VRPL, VRPTW (as done in Liu et al. 2024 (MTPOMO))
"single_feat_otw": Sample uniformly between CVRP, OVRP, VRPB, VRPL, VRPTW, OVRPTW (as done in Zhou et al. 2024 (MVMoE))
"cvrp": Only CVRP (similarly for other variants)

Parameters:

num_loc (int, default: 20 ) –

Number of locations to generate
min_loc (float, default: 0.0 ) –

Minimum location value
max_loc (float, default: 1.0 ) –

Maximum location value
loc_distribution (Union[int, float, str, type, Callable], default: Uniform ) –

Distribution to sample locations from
capacity (float, default: None ) –

Vehicle capacity. If None, get value based on get_vehicle_capacity
min_demand (int, default: 1 ) –

Minimum demand value
max_demand (int, default: 10 ) –

Maximum demand value
min_backhaul (int, default: 1 ) –

Minimum backhaul value
max_backhaul (int, default: 10 ) –

Maximum backhaul value
scale_demand (bool, default: True ) –

Scale demand values (by default, generate between 1 and 10)
max_time (float, default: 4.6 ) –

Maximum time window value (at depot)
backhaul_ratio (float, default: 0.2 ) –

Fraction of backhauls (e.g. 0.2 means 20% of nodes are backhaul)
distance_limit (float, default: 3.0 ) –

Distance limit
speed (float, default: 1.0 ) –

Speed of vehicle. Defaults to 1
subsample –

If False, we always sample all attributes (i.e., OVRPBLTW) If true, we use the
**kwargs –

Additional keyword arguments

Methods:

subsample_problems –

Create subproblems starting from seed probabilities depending on their variant.
generate_locations –

Generate seed locations.
generate_demands –

Classical lineahul demand / delivery from depot (C) and backhaul demand / pickup to depot (B) generation.
generate_time_windows –

Generate time windows (TW) and service times for each location including depot.
generate_distance_limit –

Generates distance limits (L) and checks their feasibilities.
generate_open_route –

Generate open route flags (O). Here we could have a sampler but we simply return True here so all
generate_speed –

We simply generate the speed as constant here

Source code in rl4co/envs/routing/mtvrp/generator.py

def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: Union[int, float, str, type, Callable] = Uniform,
    capacity: float = None,
    min_demand: int = 1,
    max_demand: int = 10,
    min_backhaul: int = 1,
    max_backhaul: int = 10,
    scale_demand: bool = True,
    max_time: float = 4.6,
    backhaul_ratio: float = 0.2,
    distance_limit: float = 3.0,
    speed: float = 1.0,
    prob_open: float = 0.5,
    prob_time_window: float = 0.5,
    prob_limit: float = 0.5,
    prob_backhaul: float = 0.5,
    variant_preset=None,
    use_combinations=True,
    subsample=True,
    **kwargs,
) -> None:
    # Location distribution
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    if capacity is None:
        capacity = get_vehicle_capacity(num_loc)
    self.capacity = capacity
    self.min_demand = min_demand
    self.max_demand = max_demand
    self.min_backhaul = min_backhaul
    self.max_backhaul = max_backhaul
    self.scale_demand = scale_demand
    self.backhaul_ratio = backhaul_ratio

    self.max_time = max_time
    self.distance_limit = distance_limit
    self.speed = speed

    assert not (subsample and (variant_preset is None)), (
        "Cannot use subsample if variant_preset is not specified. "
    )
    if variant_preset is not None:
        log.info(f"Using variant generation preset {variant_preset}")
        variant_probs = VARIANT_GENERATION_PRESETS.get(variant_preset)
        assert (
            variant_probs is not None
        ), f"Variant generation preset {variant_preset} not found. \
            Available presets are {VARIANT_GENERATION_PRESETS.keys()} with probabilities {VARIANT_GENERATION_PRESETS.values()}"
    else:
        variant_probs = {
            "O": prob_open,
            "TW": prob_time_window,
            "L": prob_limit,
            "B": prob_backhaul,
        }
    # check probabilities
    for key, prob in variant_probs.items():
        assert 0 <= prob <= 1, f"Probability {key} must be between 0 and 1"
    self.variant_probs = variant_probs
    self.variant_preset = variant_preset
    if isinstance(variant_preset, str) and variant_preset != "all":
        log.warning(f"{variant_preset} selected. Will not use feature combination!")
        use_combinations = False
    self.use_combinations = use_combinations
    self.subsample = subsample

subsample_problems ¶

subsample_problems(td)

Create subproblems starting from seed probabilities depending on their variant. If random seed sampled in [0, 1] in batch is greater than prob, remove the constraint thus, if prob high, it is less likely to remove the constraint (i.e. prob=0.9, 90% chance to keep constraint)

Source code in rl4co/envs/routing/mtvrp/generator.py

def subsample_problems(self, td):
    """Create subproblems starting from seed probabilities depending on their variant.
    If random seed sampled in [0, 1] in batch is greater than prob, remove the constraint
    thus, if prob high, it is less likely to remove the constraint (i.e. prob=0.9, 90% chance to keep constraint)
    """
    batch_size = td.batch_size[0]

    variant_probs = torch.tensor(list(self.variant_probs.values()))

    if self.use_combinations:
        # in a batch, multiple variants combinations can be picked
        keep_mask = torch.rand(batch_size, 4) >= variant_probs  # O, TW, L, B
    else:
        # in a batch, only a variant can be picked.
        # we assign a 0.5 prob to the last variant (which is normal cvrp)
        if self.variant_preset in list(
            VARIANT_GENERATION_PRESETS.keys()
        ) and self.variant_preset not in (
            "all",
            "cvrp",
            "single_feat",
            "single_feat_otw",
        ):
            cvrp_prob = 0
        else:
            cvrp_prob = 0.5
        if self.variant_preset in ("all", "cvrp", "single_feat", "single_feat_otw"):
            indices = torch.distributions.Categorical(
                torch.Tensor(list(self.variant_probs.values()) + [cvrp_prob])[
                    None
                ].repeat(batch_size, 1)
            ).sample()
            if self.variant_preset == "single_feat_otw":
                keep_mask = torch.zeros((batch_size, 6), dtype=torch.bool)
                keep_mask[torch.arange(batch_size), indices] = True

                # If keep_mask[:, 4] is True, make both keep_mask[:, 0] and keep_mask[:, 1] True
                keep_mask[:, :2] |= keep_mask[:, 4:5]
            else:
                keep_mask = torch.zeros((batch_size, 5), dtype=torch.bool)
                keep_mask[torch.arange(batch_size), indices] = True
        else:
            # if the variant is specified, we keep the attributes with probability > 0
            keep_mask = torch.zeros((batch_size, 4), dtype=torch.bool)
            indices = torch.nonzero(variant_probs).squeeze()
            keep_mask[:, indices] = True

    td = self._default_open(td, ~keep_mask[:, 0])
    td = self._default_time_window(td, ~keep_mask[:, 1])
    td = self._default_distance_limit(td, ~keep_mask[:, 2])
    td = self._default_backhaul(td, ~keep_mask[:, 3])

    return td

generate_locations ¶

generate_locations(batch_size, num_loc) -> Tensor

Generate seed locations.

Returns:

locs ( Tensor ) –

[B, N+1, 2] where the first location is the depot.

Source code in rl4co/envs/routing/mtvrp/generator.py

def generate_locations(self, batch_size, num_loc) -> torch.Tensor:
    """Generate seed locations.

    Returns:
        locs: [B, N+1, 2] where the first location is the depot.
    """
    locs = torch.FloatTensor(*batch_size, num_loc + 1, 2).uniform_(
        self.min_loc, self.max_loc
    )
    return locs

generate_demands ¶

generate_demands(batch_size: int, num_loc: int) -> Tensor

Classical lineahul demand / delivery from depot (C) and backhaul demand / pickup to depot (B) generation. Initialize the demand for nodes except the depot, which are added during reset. Demand sampling Following Kool et al. (2019), demands as integers between 1 and 10. Generates a slightly different distribution than using torch.randint.

Returns:

linehaul_demand ( Tensor ) –

[B, N]
backhaul_demand ( Tensor ) –

[B, N]

Source code in rl4co/envs/routing/mtvrp/generator.py

def generate_demands(self, batch_size: int, num_loc: int) -> torch.Tensor:
    """Classical lineahul demand / delivery from depot (C) and backhaul demand / pickup to depot (B) generation.
    Initialize the demand for nodes except the depot, which are added during reset.
    Demand sampling Following Kool et al. (2019), demands as integers between 1 and 10.
    Generates a slightly different distribution than using torch.randint.

    Returns:
        linehaul_demand: [B, N]
        backhaul_demand: [B, N]
    """
    linehaul_demand = (
        torch.FloatTensor(*batch_size, num_loc)
        .uniform_(self.min_demand - 1, self.max_demand - 1)
        .int()
        + 1
    ).float()
    # Backhaul demand sampling
    backhaul_demand = (
        torch.FloatTensor(*batch_size, num_loc)
        .uniform_(self.min_backhaul - 1, self.max_backhaul - 1)
        .int()
        + 1
    ).float()
    is_linehaul = torch.rand(*batch_size, num_loc) > self.backhaul_ratio
    backhaul_demand = (
        backhaul_demand * ~is_linehaul
    )  # keep only values where they are not linehauls
    linehaul_demand = (
        linehaul_demand * is_linehaul
    )
    return linehaul_demand, backhaul_demand

generate_time_windows ¶

generate_time_windows(
    locs: Tensor, speed: Tensor
) -> Tensor

Generate time windows (TW) and service times for each location including depot. We refer to the generation process in "Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization" (Liu et al., 2024). Note that another way to generate is from "Learning to Delegate for Large-scale Vehicle Routing" (Li et al, 2021) which is used in "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts" (Zhou et al, 2024). Note that however, in that case the distance limit would have no influence when time windows are present, since the tw for depot is the same as distance with speed=1. This function can be overridden for that implementation. See also https://github.com/RoyalSkye/Routing-MVMoE

Parameters:

locs (Tensor) –

[B, N+1, 2] (depot, locs)
speed (Tensor) –

[B]

Returns:

time_windows ( Tensor ) –

[B, N+1, 2]
service_time ( Tensor ) –

[B, N+1]

Source code in rl4co/envs/routing/mtvrp/generator.py

def generate_time_windows(
    self,
    locs: torch.Tensor,
    speed: torch.Tensor,
) -> torch.Tensor:
    """Generate time windows (TW) and service times for each location including depot.
    We refer to the generation process in "Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization"
    (Liu et al., 2024). Note that another way to generate is from "Learning to Delegate for Large-scale Vehicle Routing" (Li et al, 2021) which
    is used in "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts" (Zhou et al, 2024). Note that however, in that case
    the distance limit would have no influence when time windows are present, since the tw for depot is the same as distance with speed=1.
    This function can be overridden for that implementation.
    See also https://github.com/RoyalSkye/Routing-MVMoE

    Args:
        locs: [B, N+1, 2] (depot, locs)
        speed: [B]

    Returns:
        time_windows: [B, N+1, 2]
        service_time: [B, N+1]
    """

    batch_size, n_loc = locs.shape[0], locs.shape[1] - 1  # no depot

    a, b, c = 0.15, 0.18, 0.2
    service_time = a + (b - a) * torch.rand(batch_size, n_loc)
    tw_length = b + (c - b) * torch.rand(batch_size, n_loc)
    d_0i = get_distance(locs[:, 0:1], locs[:, 1:])
    h_max = (self.max_time - service_time - tw_length) / d_0i * speed - 1
    tw_start = (1 + (h_max - 1) * torch.rand(batch_size, n_loc)) * d_0i / speed
    tw_end = tw_start + tw_length

    # Depot tw is 0, max_time
    time_windows = torch.stack(
        (
            torch.cat((torch.zeros(batch_size, 1), tw_start), -1),  # start
            torch.cat((torch.full((batch_size, 1), self.max_time), tw_end), -1),
        ),  # en
        dim=-1,
    )
    # depot service time is 0
    service_time = torch.cat((torch.zeros(batch_size, 1), service_time), dim=-1)
    return time_windows, service_time  # [B, N+1, 2], [B, N+1]

generate_distance_limit ¶

generate_distance_limit(
    shape: Tuple[int, int], locs: Tensor
) -> Tensor

Generates distance limits (L) and checks their feasibilities.

Returns:

distance_limit ( Tensor ) –

[B, 1]

Source code in rl4co/envs/routing/mtvrp/generator.py

def generate_distance_limit(
    self, shape: Tuple[int, int], locs: torch.Tensor
) -> torch.Tensor:
    """Generates distance limits (L) and checks their feasibilities.

    Returns:
        distance_limit: [B, 1]
    """
    # calculate distance of all locations to depot
    dist_to_depot = torch.cdist(locs, locs[:, 0:1, :], p=2)
    assert (
        dist_to_depot * 2 < self.distance_limit  # go back and forth
    ).all(), "Distance limit too low, not all nodes can be reached from the depot."
    return torch.full(shape, self.distance_limit, dtype=torch.float32)

generate_open_route ¶

generate_open_route(shape: Tuple[int, int])

Generate open route flags (O). Here we could have a sampler but we simply return True here so all routes are open. Afterwards, we subsample the problems.

Source code in rl4co/envs/routing/mtvrp/generator.py

def generate_open_route(self, shape: Tuple[int, int]):
    """Generate open route flags (O). Here we could have a sampler but we simply return True here so all
    routes are open. Afterwards, we subsample the problems.
    """
    return torch.ones(shape, dtype=torch.bool)

generate_speed ¶

generate_speed(shape: Tuple[int, int])

We simply generate the speed as constant here

Source code in rl4co/envs/routing/mtvrp/generator.py

def generate_speed(self, shape: Tuple[int, int]):
    """We simply generate the speed as constant here"""
    # in this version, the speed is constant but this class may be overridden
    return torch.full(shape, self.speed, dtype=torch.float32)

Routing Problems¶

Asymmetric Traveling Salesman Problem (ATSP)¶

ATSPEnv ¶

ATSPGenerator ¶

Capacitated Vehicle Routing Problem (CVRP)¶

CVRPEnv ¶

check_solution_validity staticmethod ¶

load_data staticmethod ¶

replace_selected_actions ¶

CVRPGenerator ¶

Multiple Traveling Salesman Problem (mTSP)¶

MTSPEnv ¶

MTSPGenerator ¶

Orienteering Problem (OP)¶

OPEnv ¶

get_action_mask staticmethod ¶

check_solution_validity staticmethod ¶

OPGenerator ¶

Pickup and Delivery Problem (PDP)¶

PDPEnv ¶

get_num_starts ¶

select_start_nodes ¶

PDPGenerator ¶

Prize Collecting Traveling Salesman Problem (PCTSP)¶

PCTSPEnv ¶

get_action_mask staticmethod ¶

check_solution_validity staticmethod ¶

PCTSPGenerator ¶

Split Delivery Vehicle Routing Problem (SDVRP)¶

SDVRPEnv ¶

check_solution_validity staticmethod ¶

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP)¶

SPCTSPEnv ¶

Traveling Salesman Problem (TSP)¶

TSPEnv ¶

check_solution_validity staticmethod ¶

replace_selected_actions ¶

TSPGenerator ¶

Multi-Task Vehicle Routing Problem (MTVRP)¶

MTVRPEnv ¶

load_data ¶

render staticmethod ¶

select_start_nodes ¶

solve staticmethod ¶

check_variants staticmethod ¶

MTVRPGenerator ¶

subsample_problems ¶

generate_locations ¶

generate_demands ¶

generate_time_windows ¶

generate_distance_limit ¶

generate_open_route ¶

generate_speed ¶

check_solution_validity `staticmethod` ¶

load_data `staticmethod` ¶

get_action_mask `staticmethod` ¶

check_solution_validity `staticmethod` ¶

get_action_mask `staticmethod` ¶

check_solution_validity `staticmethod` ¶

check_solution_validity `staticmethod` ¶

check_solution_validity `staticmethod` ¶

render `staticmethod` ¶

solve `staticmethod` ¶

check_variants `staticmethod` ¶