Many pest species invade valuable ecosystems and damage ecosystem services. An important management goal is to limit the spread of these species or even to eradicate them. From a computational perspective, these problems can be formulated as Markov Decision Processes (MDPs), but solving them is difficult because of their spatial nature. Viewed abstractly, the current state of the ecosystem can be formalized as a graph. Each node of the graph may be invaded or not, and at each time step a “local” action must be selected at each node. Consequently, at the level of the whole MDP, each global action is a vector of local actions of length N, the number of nodes in the graph. This exponentially large action space poses a major computational challenge. During state transitions, the invasive species can spread along the edges of the graph.

We study a specific instance of this general problem based on the spread of Tamarisk in river networks. In this formulation, we work with the dual of the standard river network graph: each node corresponds to a stretch of river between two confluences (also called a “reach”). We divide each reach into H “slots”, where each slot can be occupied by a single native tree, occupied by a single Tamarisk tree, or empty. In each time step, the plants (both native and invasive) must first survive the winter (each with some probability). The surviving trees then produce seeds that disperse through the river network according to a dispersal kernel under which seeds move downstream with high probability. Each dispersing seed lands at some slot in some node of the network, and all seeds arriving at a slot compete to establish themselves there (if the slot is empty).

In our model, there are three local actions: eradicating invasive plants, eradicating invasive plants and then planting native plants, and doing nothing at all. Doing nothing costs nothing, eradication is more expensive, and eradication plus planting is the most expensive. One local action must be chosen for each node (reach) at each time step, and it applies to all H slots at that node; actions succeed only stochastically. With three possible contents per slot and three local actions per reach, the MDP has 3^(NH) states and 3^N global actions. The planning goal is to minimize the expected discounted sum of the cost of invasive damage and the cost of management actions.

To solve this planning problem, we apply the standard value iteration algorithm. However, standard value iteration requires a fully specified state transition matrix. In our problem, it is easy to write a simulator that draws samples of the dispersal and competition processes, but it is computationally intractable to compute the exact transition probabilities. Hence, we estimate these probabilities by drawing a large number of samples from the simulator.

Once we have an optimal policy for the approximate MDP, we explore its behaviour both directly and via simulated rollouts. The optimal policy can be visualized with one picture per state, annotated with the types of plants in each reach and the optimal action to take in each reach. We also run the optimal policy forward in time on many simulated trajectories to gather statistics about its behaviour, such as how long it takes to reach a steady state, the average cost of the policy over time, and the frequency with which completely invaded or completely uninvaded states are reached. Comparisons of these statistics show that in many situations our optimal policies greatly outperform established rules of thumb from the literature.
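As a concrete illustration of the reach-and-slot model described above, the following Python sketch implements one plausible version of the simulator. The class and parameter names (TamariskSimulator, survival, seeds, action_success), the ordering of management actions before the natural dynamics, and the resolution of slot competition in proportion to expected seed arrivals are all our own illustrative assumptions, not a specification of the exact model used in this work.

```python
import numpy as np

# Slot contents and local actions; the numeric codes are our own convention.
EMPTY, NATIVE, TAMARISK = 0, 1, 2
NOTHING, ERADICATE, RESTORE = 0, 1, 2

class TamariskSimulator:
    """Illustrative generative model of the reach/slot invasion dynamics."""

    def __init__(self, kernel, H=2, survival=(0.9, 0.8), seeds=(10, 20),
                 action_success=0.85, rng=None):
        self.K = np.asarray(kernel)   # K[i, j]: prob. a seed from reach i lands in reach j
        self.N = self.K.shape[0]      # number of reaches (nodes of the dual river graph)
        self.H = H                    # slots per reach
        self.survival = survival     # winter survival prob. of (native, Tamarisk) trees
        self.seeds = seeds           # seeds produced per surviving (native, Tamarisk) tree
        self.p_act = action_success  # per-slot success probability of a local action
        self.rng = rng or np.random.default_rng()

    def step(self, state, action):
        """state: (N, H) array of slot contents; action: length-N vector of local actions."""
        state = state.copy()
        # 1. Management actions (one plausible ordering: before the natural dynamics).
        for i in range(self.N):
            if action[i] == NOTHING:
                continue
            hit = self.rng.random(self.H) < self.p_act  # stochastic per-slot success
            state[i, hit & (state[i] == TAMARISK)] = EMPTY
            if action[i] == RESTORE:
                state[i, hit & (state[i] == EMPTY)] = NATIVE
        # 2. Winter survival: each tree dies with a species-specific probability.
        for species, p in ((NATIVE, self.survival[0]), (TAMARISK, self.survival[1])):
            died = (state == species) & (self.rng.random((self.N, self.H)) >= p)
            state[died] = EMPTY
        # 3. Seed production and dispersal through the network via the kernel.
        arrivals = np.zeros((2, self.N))
        for k, (species, s) in enumerate(((NATIVE, self.seeds[0]),
                                          (TAMARISK, self.seeds[1]))):
            produced = (state == species).sum(axis=1) * s  # seeds produced per reach
            arrivals[k] = produced @ self.K                # expected arrivals per reach
        # 4. Competition: each empty slot is won by a species in proportion to its
        #    expected seed arrivals (a simplification of sampling individual seeds).
        for i in range(self.N):
            total = arrivals[:, i].sum()
            if total == 0:
                continue
            p_tam = arrivals[1, i] / total
            for h in range(self.H):
                if state[i, h] == EMPTY:
                    state[i, h] = TAMARISK if self.rng.random() < p_tam else NATIVE
        return state
```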
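The sample-based estimation of the transition probabilities and the subsequent value iteration can be sketched as follows. The function names, n_samples, and the assumption that the cost function is deterministic are our own; on any non-trivial network the tables P and C are far too large to build explicitly, which is precisely the computational challenge discussed above.

```python
import numpy as np

def estimate_mdp(sim, states, actions, cost_fn, n_samples=1000):
    """Monte Carlo estimate of P(s' | s, a) from repeated calls to the simulator."""
    index = {s: k for k, s in enumerate(states)}
    P = np.zeros((len(states), len(actions), len(states)))
    C = np.zeros((len(states), len(actions)))
    for si, s in enumerate(states):
        for ai, a in enumerate(actions):
            C[si, ai] = cost_fn(s, a)
            for _ in range(n_samples):
                s2 = sim.step(np.array(s).reshape(sim.N, sim.H), a)
                P[si, ai, index[tuple(s2.ravel())]] += 1.0
    P /= n_samples
    return P, C

def value_iteration(P, C, gamma=0.95, tol=1e-6):
    """Standard value iteration on the sample-estimated MDP (minimizing cost)."""
    V = np.zeros(P.shape[0])
    while True:
        Q = C + gamma * (P @ V)   # Q[s, a] = C[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)  # value function and greedy policy
        V = V_new

# Usage on a toy instance where every configuration can be enumerated explicitly:
# states  = list(itertools.product([EMPTY, NATIVE, TAMARISK], repeat=sim.N * sim.H))
# actions = list(itertools.product([NOTHING, ERADICATE, RESTORE], repeat=sim.N))
# P, C = estimate_mdp(sim, states, actions, cost_fn)
# V, policy = value_iteration(P, C)
```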
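Continuing the sketches above, simulated rollouts of the computed policy can gather some of the statistics mentioned earlier, here the average discounted cost and the frequency of completely invaded or Tamarisk-free end states. The horizon, the number of trajectories, and the choice of which statistics to report are our own illustrative assumptions.

```python
import numpy as np

def rollout_stats(sim, policy, states, actions, cost_fn, s0,
                  n_traj=500, horizon=100, gamma=0.95):
    """Run the computed policy forward on many simulated trajectories."""
    index = {s: k for k, s in enumerate(states)}
    costs, n_invaded, n_clean = [], 0, 0
    for _ in range(n_traj):
        s, total = s0, 0.0
        for t in range(horizon):
            a = actions[policy[index[s]]]
            total += gamma ** t * cost_fn(s, a)
            s = tuple(sim.step(np.array(s).reshape(sim.N, sim.H), a).ravel())
        costs.append(total)
        n_invaded += all(slot == TAMARISK for slot in s)  # fully invaded at the horizon
        n_clean += all(slot != TAMARISK for slot in s)    # Tamarisk-free at the horizon
    return np.mean(costs), n_invaded / n_traj, n_clean / n_traj
```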
An example of such a rule of thumb is to always prioritize treatment of upstream reaches over downstream ones. Another rule targets the most heavily invaded reaches (those with the most slots occupied by Tamarisk) first; both rules are sketched in code below. We have found that the optimality of these rules of thumb depends strongly on the dispersal and competition parameters.

This work contributes to bioeconomics and ecology by providing evidence that the stochasticity of the invasion process must be modelled and that the spatial structure of the system under invasion is relevant to optimal management. The computational challenges raised by this problem have led us to research planning methods that can provide bounds on the optimality of the resulting policy while minimizing the number of calls to expensive simulators. We are also studying algorithms that can scale to large graphs with thousands of nodes and large out-degrees. Finally, we see potential for exploiting domain knowledge to reduce the size of the state space and the complexity of the policy description.
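For concreteness, the two rules of thumb can be encoded as baseline policies, reusing the constants from the simulator sketch above. The reach ordering (lower index means further upstream) and the per-step treatment budget are assumptions we introduce purely for illustration.

```python
def upstream_first(state, budget=1):
    """Always prioritize treatment of upstream reaches over downstream ones."""
    action = [NOTHING] * len(state)
    invaded = [i for i, reach in enumerate(state) if TAMARISK in reach]
    for i in invaded[:budget]:  # reaches are assumed listed in upstream order
        action[i] = ERADICATE
    return action

def most_invaded_first(state, budget=1):
    """Target the reaches with the most Tamarisk-occupied slots first."""
    action = [NOTHING] * len(state)
    load = sorted(((sum(slot == TAMARISK for slot in reach), i)
                   for i, reach in enumerate(state)), reverse=True)
    for count, i in load[:budget]:
        if count > 0:
            action[i] = ERADICATE
    return action
```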