FOCUS: CLIMATE CHANGE SOFTWARE

Enabling Open Development Methodologies in Climate Change Assessment Modeling

Joshua Introne, Robert Laubacher, and Thomas Malone, MIT Center for Collective Intelligence

// The Radically Open Modeling Architecture (ROMA) lets climate change policy stakeholders create and run surrogate simulations and composite models. //

IEEE Software, November/December 2011. Published by the IEEE Computer Society. 0740-7459/11/$26.00 © 2011 IEEE

COMPUTATIONAL SIMULATION MODELS help support scientifically grounded "what if" analyses by translating specialized knowledge into tools that can project the likely future impact of current actions. Models have thus become important in a variety of policy domains. In recent years, several software platforms for environmental policy-making and urban planning have added simulation models to decision support tools to provide stakeholders with direct access to these models. This trend is continuing to gain ground.1–3

Models also play a central role for climate change policy-makers, but are so complex and computationally demanding that experts must run them and interpret their results, creating a bottleneck between models and stakeholders. This reduces the flexibility that individual stakeholders have to explore alternative scenarios and limits the number of stakeholders that can query the models. It also makes models more opaque to stakeholders because experts summarize model results and tend to omit details about the models' assumptions. Complexity and opacity, in turn, reduce public confidence in such models.

Drawing inspiration from open source development practices, we wanted to address these problems by providing support for modularization of and open access to models that can inform climate policy deliberations. We thus developed a publicly accessible Web service called ROMA (Radically Open Modeling Architecture) that allows anyone to create, combine, and run modular simulations. ROMA currently provides the modeling functionality in the Climate CoLab (http://climatecolab.org), a collective intelligence application in which large numbers of people work together to develop proposals to address climate change.4,5 In time, we hope that ROMA will support a community focused on model development and analysis.

Design

We initially developed ROMA as part of the Climate CoLab, where community members run models to predict the outcomes of proposals to address climate change. Modeling in the CoLab helped crystallize two of ROMA's technical design challenges. First, it had to simplify complex, computationally expensive models for a broad, Web-based community; models must execute rapidly and without great loss of accuracy, and we must be able to flexibly tune any model's interface to meet diverse users' needs.

Our second design challenge was how to use modeling functionality to support collective intelligence. Research has demonstrated that large groups of diverse individuals can find better solutions than similarly sized groups of experts, but only if the individuals have a basic understanding of the domain and are free to explore the space.6 Thus, ROMA must provide modeling capabilities that could inject expertise into users' exploration of the domain yet still let individuals try out ideas that model creators haven't foreseen.

ROMA provides two core functionalities to meet these design challenges. It provides tools for generating and running surrogate simulations of much larger integrated assessment models (IAMs; integrative models that predict the impacts of climate change across a variety of domains). Clients can run surrogate models very quickly and easily customize them to reduce input number and complexity. ROMA also offers a uniform API and componentized view of models and stored model runs and lets clients combine components to create executable composite models.
These features let stakeholders explore climate and integrated assessment models directly; the design also supports a division of labor in which experts in different subspecialties can easily add new component models that stakeholders can then combine with others to explore competing assumptions about the world.

ROMA Service Architecture

ROMA describes simulation models by their inputs and outputs. These variables can be of any standard data type (for example, integers, doubles, or text) and can represent vector or scalar values. A model can also be associated with other metadata (such as a name and description). ROMA publishes its own URL that clients can use to run models.

FIGURE 1. Partial class diagram for the modeling service. The model and associated variable classes capture metadata about models known to ROMA. A scenario captures data for a particular run of a model in tuples. Both models and scenarios support versioning.

When a model runs, ROMA generates and stores a dataset called a scenario that contains all the concrete input and output values and a reference to the model that generated it. For composite models, the scenario will also contain subscenarios corresponding to the inputs and outputs of each component model in the composite. Because it maintains a connection between scenarios and the models that create them, ROMA can semiautomatically update stored scenarios if a model changes. It can also swap out subscenarios or replace component models to update a composite scenario. This enables less tightly coupled workflows around the creation of scenarios.
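As a rough illustration of the entities in Figure 1, the Python sketch below mirrors the diagram's main classes and fields. It is a sketch of the data model only, not ROMA's implementation; the actual service exposes these entities as XML over a RESTful interface, and the `run` method here merely records inputs rather than executing anything.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Variable:
    """One model input or output, with the metadata ROMA tracks (see Figure 1)."""
    name: str
    description: str
    unit: str
    cardinality: int   # 1 for scalar values, n for vectors
    data_type: str     # for example, "integer", "double", or "text"

@dataclass
class ValueTuple:
    """Concrete values bound to one variable for a single model run."""
    variable: Variable
    values: List[float]

@dataclass
class Scenario:
    """All inputs and outputs of one run, plus a reference to the generating model."""
    model_name: str
    tuples: List[ValueTuple] = field(default_factory=list)
    sub_scenarios: List["Scenario"] = field(default_factory=list)  # composite runs only
    version: int = 1

@dataclass
class Model:
    """Metadata for a model known to ROMA."""
    name: str
    description: str
    author: str
    version: str
    url: str  # endpoint the service posts input values to
    inputs: List[Variable] = field(default_factory=list)
    outputs: List[Variable] = field(default_factory=list)

    def run(self, input_tuples: List[ValueTuple]) -> Scenario:
        # The real service would execute the model behind self.url and record
        # its outputs too; this sketch only captures the inputs in a scenario.
        return Scenario(model_name=self.name, tuples=list(input_tuples))
```

Because every scenario carries the name (and version) of the model that produced it, the semiautomatic update described above amounts to re-running the referenced model against the stored input tuples.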
Such loosely coupled workflows mean that, for example, a team developing a scenario for a global emissions policy in the CoLab can plug in different national policy scenarios that were developed elsewhere.

Figure 1 shows a class diagram that describes ROMA's core components. ROMA exposes the four main elements (models, variables, scenarios, and tuples) via a RESTful interface that lets Web clients retrieve XML descriptions of these entities. All other functionality is available through a set of Web forms.

ROMA offers two kinds of support for combining models: mapped models can transform other models' inputs and outputs, and composite models can contain other models and maintain connections among them. Figure 2 shows how ROMA uses these types of models.

FIGURE 2. A notional schematic illustrating how to connect models. A composite model consists of several steps that embed component models.

The use of mapped models allows several transformations:

• Replication. A model can be repeated n times over incoming values with higher cardinality. For example, a mapped model with a replication value of n > 1 can transform a model that accepts scalar values into a model that accepts vectors with a cardinality n.
• Subsampling. Users can reduce any model's output cardinality by subsampling its outputs at a given frequency. For example, a user can sample a model that provides predictions for atmospheric CO2 for each year over the course of a century at a period of 10 years to generate data for another model that requires decadal CO2 values as inputs.
• Many-to-one mapping. Users can also reduce any model's output cardinality by applying a many-to-one mapping function (for example, sum, average, first, and last).

If users want to combine these transformations, ROMA applies them sequentially as ordered in the preceding list. Thus, ROMA first repeats a model over its inputs, subsamples the results of that operation, and finally combines them using the many-to-one mapping function if one is specified.

Composite models arrange their component models in a series of ordered, connected steps. Each step can contain any number of models as long as they have no dependencies on each other. A set of connection descriptors specifies connections from the composite model inputs to steps, connections between steps, and from steps to the composite model's outputs. ROMA allows users to connect only those variables that have the same units, data type, and cardinality. More sophisticated compatibility checking is left to the composite model creator. Connections between steps must be from output variables in an earlier step to input variables in a later (though not necessarily adjacent) step so that cycles can't occur.

When a user runs a composite model, ROMA executes the steps in order. Running a composite model produces a composite scenario that contains references to scenarios generated by each of the component models. Users can replace component scenarios (that don't have their inputs determined by upstream models) after ROMA has generated a composite scenario, in which case the system will calculate all downstream changes and update the composite scenario's version number. Similarly, users can replace a component model in a composite model with a new model that has the same inputs and outputs and request that the system update all scenarios to the new composite model.
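The two combination mechanisms can be sketched in a few lines of Python, with plain callables standing in for component models. The function names and data shapes here are ours, for illustration only; ROMA's real mapped and composite models operate on typed, unit-checked variables.

```python
from typing import Callable, Optional, Sequence

def run_mapped(scalar_model: Callable[[float], float],
               vector_input: Sequence[float],
               sampling_frequency: int = 1,
               reduce_fn: Optional[Callable[[Sequence[float]], float]] = None):
    """Apply the three mapped-model transformations in their fixed order."""
    # 1. Replication: repeat the scalar model over each element of the vector input.
    outputs = [scalar_model(x) for x in vector_input]
    # 2. Subsampling: keep one value per sampling period.
    outputs = outputs[::sampling_frequency]
    # 3. Many-to-one mapping (for example, sum, average, first, last), if specified.
    return reduce_fn(outputs) if reduce_fn is not None else outputs

def run_composite(steps, inputs):
    """Execute composite-model steps in order.

    Each step maps output names to functions of the values computed so far.
    Because connections only run from earlier steps to later ones, a single
    forward pass suffices and cycles cannot occur."""
    values = dict(inputs)
    for step in steps:
        # Models within one step are independent, so evaluation order is arbitrary.
        values.update({name: fn(values) for name, fn in step.items()})
    return values
```

For instance, a century of annual predictions can be replicated, subsampled to decadal values, and summed in one mapped-model pass, while `run_composite` threads each step's outputs into the inputs of later steps.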
Surrogate Models

Other domains often use surrogate models when "real" models are too expensive to run for all parameter combinations of interest or when model authors prefer to control access to their technology.7 Researchers construct surrogate models by interpolating between known data points that the actual model generated. In practice, a surrogate model is often elaborated as a researcher explores a model's parameter space. In the case of climate and integrated assessment models, though, we generally have access to published data instead of the actual models, so we construct surrogate models based on this data.

To simplify generating surrogate models in ROMA, the service accepts scenario-based data (a form commonly used for presenting data from IAMs) and automatically generates surrogate models. We're currently developing a user interface that will make it easy for anyone with access to such a set of scenarios (for instance, a model creator) to create a surrogate model within ROMA.

Surrogate models provide users with a very rapid estimate of much larger models for a bounded region of their parameter space. Of course, these estimates are only approximations, the accuracy of which depends on the curve-fitting algorithm used, the amount of data available, the complexity of the output surface, and other factors. Users must weigh the trade-offs between speed and accuracy for each particular application and domain in which they employ surrogate models.

Model Execution and Spreadsheet Models

We intended for ROMA to work with models that run on other servers. For it to run an external model, the model provider must present a URL that accepts a form post with values for each input variable in the model and be able to return data to ROMA. Although ROMA is agnostic with respect to the technology that runs individual models, no provision is currently made for models that have long execution times (greater than the HTTP request timeout) or that require scheduling.
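The external-model contract (form post in, data back) can be sketched as follows. The field names and the XML layout of the response are hypothetical; the article specifies only the general shape of the exchange, not a wire format, so treat this as an illustration of the idea rather than ROMA's protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def build_form_post(inputs: dict) -> str:
    """Encode one form field per input variable, as an external model's URL expects."""
    return urlencode(inputs)

def parse_scenario_xml(xml_text: str) -> dict:
    """Read output variable values back out of a scenario-style XML response."""
    root = ET.fromstring(xml_text)
    return {node.get("name"): float(node.text) for node in root.iter("output")}

# A real client would POST `body` to the model's URL with an HTTP client and
# would need the model to respond within the HTTP request timeout noted above.
body = build_form_post({"emissions_change": -30, "afforestation_pct": 10})

# Hypothetical response, invented for illustration:
response = '<scenario><output name="temperature_change">2.4</output></scenario>'
outputs = parse_scenario_xml(response)
```

Keeping the contract this thin is what lets ROMA stay agnostic about the technology behind each model's URL.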
In addition to externally hosted models, ROMA includes functionality that can transform a spreadsheet into an executable model. Spreadsheet models map input and output variables to cells and cell ranges; the user defines this association when uploading a spreadsheet to ROMA, and user-supplied functions embedded in the spreadsheet perform all model calculations. The system uses the spreadsheet engine available from the open source Apache POI project (http://poi.apache.org) to run this type of model. Although they're computationally limited, spreadsheets are widely understood, and many people use them to create informal models that support decision-making. Thus, spreadsheets provide an easy way to "open up" modeling to a broad community without requiring users to learn a domain-specific language for building models.

ROMA Application in the Climate CoLab

In the Climate CoLab, some kinds of user proposals must be attached to a ROMA-generated scenario that predicts the impacts of that proposal. The CoLab uses the XML data ROMA provides to generate an interface that lets users enter input variables, run models, and view stored results.

FIGURE 3. MIT composite model inputs, modules, and outputs. The three-, seven-, and 15-region inputs for emissions are interface options that let the user specify emissions reductions at different levels of granularity.

So far, all proposals that require models in the CoLab have been for a
global agreement to address climate change, and contributors have used one of three variants of a single composite model to develop scenarios. The composite model combines a climate simulation with models that predict economic and physical systems' impacts (see Figure 3). The model's variants differ in the degree of granularity with which users specify emissions reduction commitments for the world's nations. To run the model, users specify global land use goals and emissions commitments broken out as inputs by region. They can choose to specify emissions targets for three (developed countries, rapidly developing countries, and other developing countries), seven (with larger economies broken out), or 15 regions. Models in the seven- and 15-region variants of the composite model transform the emissions inputs into the three regions accepted by the C-Learn climate model.

The MIT composite model feeds emissions commitments for three regions and land use goals into the C-Learn climate simulation. C-Learn is a version of the Climate Rapid Overview and Decision-support Simulation (C-ROADS),8 a lightweight climate model developed by Climate Interactive (http://climateinteractive.org) that can run on personal computers. C-Learn runs as a separate Web service hosted on an internal server and produces predictions for several indicators, including the climate outputs in Figure 3. C-Learn outputs are for each year from 2000 to 2100 inclusive.

Two physical impact models produce a brief textual summary of the anticipated effects of temperature change on geophysical (water, land, ecosystems, and singular events) and human systems (health and food/agriculture).
This information is derived from published research that provides predictions for each Celsius degree of change.10,11 The CoLab model then captures this information as a simple spreadsheet model that looks up the appropriate output based on the temperature change by 2100, and uses a mapped model to transform the vector outputs from C-Learn into the scalar output that the physical impacts models require.

FIGURE 4. Development of mitigation cost surrogate. For each year, emissions values in each scenario are plotted against Gross Domestic Product (GDP) values. Surrogate models use the resultant curve to infer GDP over the entire range of emissions values for that year.

Based on data generated by a handful of well-known IAMs, several surrogate models can compute economic outputs. Typically, IAMs report two types of economic costs: damage cost (the cost of climate change, reported as a percentage deviation from anticipated future Gross Domestic Product [GDP] if climate change were not to occur9,11) and mitigation cost (the cost of reducing emissions, also reported as a percentage deviation in GDP from an anticipated baseline). The MIT composite reports seven predictions using surrogate models based on data from the Stanford Energy Modeling Forum's (EMF) 22 exercise.12

Preparing Mitigation Cost Surrogate Models

We based mitigation cost models in the CoLab on data generated during the EMF 22 exercise. Modeling teams who participated in EMF 22 simulated a group of scenarios that reflected a range of potential global mitigation policy approaches plus a reference, called a business as usual (BAU) scenario, with no mitigation policy. Each scenario involved stabilizing greenhouse gas concentrations at a particular target level.
Data reported included greenhouse gas (GHG) emissions and sequestration and a variety of economic indicators such as GDP. We created surrogate models to predict the effect of emissions reduction on anticipated GDP from 2000 to 2100. We chose changes in GHG emissions as input because emission reductions are the primary mechanism by which to achieve GHG stabilization and because actions to reduce emissions will be the primary driver of mitigation policy costs.

To construct the surrogates, we used two sets of data for each model: percentage change in fossil fuel CO2 emissions versus 2005 levels and percentage reduction in GDP versus the reference scenario (no policy, or BAU). Thus, for each model in each year, we had n points that associate an emissions level with a percent deviation in GDP, where n is the number of scenarios our analysis used for that model (see Figure 4). To determine the impact on GDP for any emissions level in a particular year, we located the point on a curve that best fit the n data points available for that year and then used linear piecewise interpolation to approximate this curve. More sophisticated approaches (for example, higher-order polynomials) are possible, but we didn't feel they were justified without additional data.

If emissions levels are lower than the most aggressive scenario in a particular year, the surrogate model doesn't report a value. If emission levels are higher than BAU (the scenario for which mitigation cost is zero), the model simply reports zero percent change in anticipated GDP. For policy proposals in the CoLab that are too aggressive to be simulated with a particular surrogate mitigation model (for example, emissions levels are too low in a particular year), the system reports that the modeling team in question likely judges the policy scenario to be technically or economically infeasible.
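The per-year lookup just described (linear piecewise interpolation between scenario points, plus the two boundary rules) can be sketched as below. The sample points are invented, and the sign conventions (negative values for emissions reductions, positive values for GDP loss) are our assumptions for illustration; the real surrogates are fit to EMF 22 data.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

def mitigation_cost(points: List[Tuple[float, float]],
                    emissions: float) -> Optional[float]:
    """Mitigation cost surrogate for one model and one year.

    `points` pairs a percentage change in fossil fuel CO2 emissions vs. 2005
    with a percentage GDP loss vs. the no-policy reference, sorted by emissions;
    the last point is the BAU scenario, where mitigation cost is zero."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if emissions < xs[0]:
        return None   # more aggressive than any simulated scenario: report no value
    if emissions >= xs[-1]:
        return 0.0    # at or above BAU: zero percent change in anticipated GDP
    i = bisect_left(xs, emissions)
    if xs[i] == emissions:
        return ys[i]
    # Linear piecewise interpolation between the two bracketing scenario points.
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (emissions - x0) / (x1 - x0)

# Invented sample points for a single year: (emissions change %, GDP loss %).
year_points = [(-80.0, 5.0), (-40.0, 2.0), (0.0, 0.0)]
```

A `None` result corresponds to the CoLab's "likely infeasible" message: the proposal cuts emissions further than any scenario the modeling team simulated.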
Some inaccuracies arose for the CoLab mitigation cost models because emissions values generated by C-Learn and used as inputs to the surrogate models didn't correspond in every detail with the emissions values the original mitigation models used. For example, CoLab users could specify land use goals to manipulate emissions levels, but the surrogate models didn't incorporate this as a source of emissions. Land use accounted for approximately 8 percent of total CO2 emissions in 2010, so differences in land use policy would have an incremental impact on both environmental and economic outcomes. To enhance the system's accuracy, we're exploring the incorporation of land use emissions in a future surrogate model.

The modeling functionality ROMA offers to Climate CoLab users is only a subset of its potential. We plan to introduce more advanced functionality as we develop organizational processes to help scaffold its use. Throughout 2011, the CoLab will launch a series of contests to create both national and global proposals for emissions reduction. These contests will run in parallel and will be phased, with interim evaluations at the end of each phase. Within the CoLab, the validity of the models has been established via a centrally administered review process with a panel of experts. To support the vision of an open-modeling community, we hope to design processes and technical support to better leverage the collective intelligence of the community. For instance, the community could be invited to look for obvious errors (infinite or impossible values at the extrema of the input space) for each model. Model creators and experts might attach key assumptions to individual models, and experts could weigh in on the validity of those assumptions. These assessments could be summarized to provide policy-makers with indications about model maturity and uncertainty.
The hurdles to creating community processes around model creation, analysis, and validation are as much social and organizational as they are technical. Integrated assessment models have traditionally been implemented as monolithic software projects developed by small teams of experts, and these development processes have led to the complexity and opacity that currently cause difficulties. By emphasizing modularity and offering a set of features that allow stakeholders to become more directly involved in climate and assessment modeling, we hope ROMA will enable the social and organizational processes that ultimately improve our chances of creating solutions to climate change.

ABOUT THE AUTHORS

JOSHUA INTRONE is a research scientist at the MIT Center for Collective Intelligence and the software architect of the Climate CoLab. His research interests include the design of mediating tools to improve collaborative and collective performance, the impact of social network structure on collaborative information processing, and the development of sociotechnical architectures for problem-solving. Introne has a PhD in computer science from Brandeis University. Contact him at jintrone@mit.edu.

ROBERT LAUBACHER is a research scientist and associate director at the MIT Center for Collective Intelligence, where he manages the Climate CoLab project. His research interests include developing approaches that can make complex simulation models accessible to interested citizens. Laubacher has an MA in modern history from Harvard. Contact him at rjl@mit.edu.

THOMAS W. MALONE is the Patrick J. McGovern Professor of Management at the MIT Sloan School of Management and the founding director of the MIT Center for Collective Intelligence. His research interests include collective intelligence, organizational design, and computer-supported cooperative work. Malone has a PhD in cognitive and social psychology from Stanford University. Contact him at malone@mit.edu.

References

1. M. Matthies, C.
Giupponi, and B. Ostendorf, "Environmental Decision Support Systems: Current Issues, Methods and Tools," Environmental Modelling & Software, vol. 22, no. 2, 2007, pp. 123–127.
2. I.S. Mayer et al., "Collaborative Decision Making for Sustainable Urban Renewal Projects: A Simulation-Gaming Approach," Environment and Planning B: Planning and Design, vol. 32, no. 3, 2005, pp. 403–423.
3. B. Friedman et al., "Laying the Foundations for Public Participation and Value Advocacy: Interaction Design for a Large-Scale Urban Simulation," Proc. 2008 Int'l Conf. Digital Govt. Research, Digital Gov't Soc. North America, 2008, pp. 305–314.
4. T.W. Malone, R. Laubacher, and C. Dellarocas, "The Collective Intelligence Genome," Sloan Management Rev., vol. 51, no. 3, 2010, pp. 21–31.
5. J. Introne et al., "The Climate CoLab: Large Scale Model-Based Collaborative Planning," Proc. 2011 Conf. Collaboration Technologies and Systems, IEEE CS Press, 2011, pp. 40–47.
6. L. Hong and S.E. Page, "Groups of Diverse Problem Solvers Can Outperform Groups of High-Ability Problem Solvers," Proc. Nat'l Academy of Sciences, vol. 101, no. 46, 2004, pp. 16385–16389.
7. D. Gorissen et al., "A Surrogate Modeling and Adaptive Sampling Toolbox for Computer-Based Design," J. Machine Learning Research, vol. 11, 2010, pp. 2051–2055.
8. T. Fiddaman et al., C-ROADS Simulator Reference Guide, Climate CoLab, 2011.
9. W.D. Nordhaus, A Question of Balance: Weighing the Options on Global Warming Policies, Yale Univ. Press, 2008.
10. M.L. Parry, O.F. Canziani, and J.P. Palutikof, "Technical Summary," Climate Change 2007: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the 4th Assessment Report of the Intergovernmental Panel on Climate Change, M.L. Parry et al., eds., Cambridge Univ. Press, 2007, pp. 23–78.
11. N.H. Stern, The Economics of Climate Change: The Stern Review, Cambridge Univ. Press, 2007.
12. L. Clarke et al., "International Climate Policy Architectures: Overview of the EMF 22 International Scenarios," Energy Economics, vol. 31, no. 2, 2009, pp. S64–S81.