Social Enterprising in Vietnam

I spent three months in Vietnam volunteering for an NGO. The Hanoi-based organization, CSIP (Center for Social Initiatives Promotion), promotes social enterprises and entrepreneurship in Vietnam through startup training, incubation, and acceleration programs. They also raise awareness about social enterprises, help NGOs become self-sustainable, and shape the political climate for social businesses through their online resources, published research, and public outreach efforts. Since 2009, they have funded and provided business support to over 40 self-sustainable enterprises, which as of 2013 are estimated to have improved the lives of over 200,000 disadvantaged people.

Operating with a double bottom line – business profit and social impact – social enterprises are gaining traction as a highly effective means of economic development. For example, in rural villages with high disability rates among children, families are burdened with providing specialized care and attention to their young while facing lost income from their children’s inability to work. One of CSIP’s social enterprises, Tòhe, offers an innovative solution. The company organizes art workshops for disadvantaged children, selects the best artwork produced, and prints it on merchandise (handbags, notebooks, clothing, etc.) that is sold across the country. Tòhe then pays the children for their art, allowing them to contribute to their family’s income while remaining engaged throughout the day. This is just one of many success stories that left a lasting impression on me over the course of my volunteering stint.

At CSIP, I was tasked with rebuilding their website. It loaded painfully slowly and was laden with inconsistent design schemes and distracting animations. The landing page didn’t define what CSIP was or did as an organization, and, worst of all, it didn’t cater to or try to solicit action from any of its target audiences. A lot of my work entailed identifying what CSIP expected to gain from their website and how they would measure success. After setting goals for their website and ranking them by importance, I was able to separate the important information from the less important and create a landing page that was consistent with the organization’s marketing ambitions.

On the technology front, administrators complained about the existing website’s performance and its clunky content management interface. I rebuilt the website in Drupal, which was faster and easier for admins to use than the Joomla installation it replaced. I also created a wireframe mockup of the new site and worked closely with a graphic designer to create a simple and attractive design. Over my few remaining weeks, I taught CSIP’s administrators how to use the new administrative backend.

The new CSIP website, in addition to offering a better user experience, ties more closely into the organization’s objectives. CSIP is now better positioned to attract applications to their training and incubation programs, to solicit investments and donations to fund their operations, and to spread the word about the social enterprise movement in Vietnam. Through the SEO techniques we applied, CSIP will also benefit from greater online traffic, finally receiving the exposure it deserves as the top social startup incubator in the country.

It pained me to leave Vietnam, a country I came to love so much. Long after my departure, I will continue to be amazed by CSIP’s approach and strong dedication to promoting sustainable social progress. I am honored to have been a part of their efforts and will certainly return there one day. So until then, to them I say: Hẹn sớm gặp lại bạn (see you soon)!

Predicting Train Accidents

On January 6th, 2005, a freight train moving northbound through Graniteville, South Carolina, intersected a railroad switch set in the wrong position. The train was directed into an industrial siding and collided with a parked train, derailing 3 locomotives and 17 freight cars – three of which contained chlorine gas. One of the chlorine tanks ruptured and killed 9 people, injured 554, and temporarily displaced 5,400 residents living within a 1-mile radius of the accident [1].

The Graniteville collision was just one of many fatal train accidents that occurred over the past decade. In 2008, after another deadly collision in Chatsworth, California [2], the government intervened and passed the Rail Safety Improvement Act (RSIA). The RSIA requires all Class I railroads carrying passengers or toxic-by-inhalation (TIH) materials to be equipped with Positive Train Control (PTC) technologies [3]. PTC systems prevent accidents by detecting and warning train crews of impending hazards, and by automatically stopping the train if necessary.

Final regulations published in 2010 required affected railroads to demonstrate through a risk assessment that their proposed PTC technology would provide at least an 80% reduction in risk over their existing system.  To ensure that assessments were reliable, the Federal Railroad Administration (FRA) hired our firm, DecisionTek, to investigate methods for evaluating the inherent safety risk of rail systems.  Given any rail territory, our challenge was to predict train accidents over its life cycle, with and without PTC technology, and to assess whether the rail operator’s PTC implementation on the territory provided a sufficient degree of additional safety.

How to predict train accidents

Train accidents are rare. They are also highly dependent on the territories on which they occur, specifically each territory’s track topology and operating environment. As a result, predictions of accident frequency cannot be based solely on empirical data, and must instead be determined using simulation analysis.

Ideally, we would simulate hundreds of years of train operations on a modeled territory that closely replicates the territory under study, and generate enough accidents on it to derive statistically significant estimates of their frequency. Such a simulation would take into account the physical characteristics of the territory (e.g. track connections, grades, curvatures, and speed zones) and its traffic conditions (e.g. train equipment, timetables, and schedules), and would incorporate the human errors and equipment failures that lead to train accidents. I helped to develop the simulation software that does exactly this.

However, our simulator had to overcome a big drawback of using traditional simulation methods (e.g. Monte Carlo) to predict rare events. Simulating railroad operations is computationally intensive; even when deployed on a 12-core high-performance server, our software required 22 seconds to simulate one day of operations on a low-traffic territory. If 1,000 years of simulated operations were required to generate statistically reliable estimates of accident frequency, our software would need 93 days to perform the analysis. To address this problem, we devised a simulation technique that employs the concept of multi-level splitting [4].
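The 93-day figure follows directly from the numbers quoted above. A minimal check of the arithmetic, assuming 365-day years (the helper name is only illustrative):

```python
SECONDS_PER_SIMULATED_DAY = 22        # run time quoted above for one day of operations
DAYS_PER_YEAR = 365                   # assumption: leap days are ignored
SECONDS_PER_REAL_DAY = 24 * 60 * 60


def runtime_in_days(simulated_years):
    """Wall-clock days needed to simulate the given number of years."""
    simulated_days = simulated_years * DAYS_PER_YEAR
    return simulated_days * SECONDS_PER_SIMULATED_DAY / SECONDS_PER_REAL_DAY


print(round(runtime_in_days(1000)))   # 365,000 days x 22 s is about 8.03 million s, or 93 days
```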

Multi-level splitting for rare-event simulation

The idea of multi-level splitting is to break the simulation process into several successive stages, where each stage generates events that are increasingly likely to lead to accidents or incidents.

In our implementation, the first stage focuses on generating human errors or equipment failures.  When those events occur in a simulation, the software pauses, freezes the current state (i.e. serializes all objects in memory), stores the state in the database, then resumes the simulation.  After capturing a sufficient number of events, the second simulation stage can begin.

In the second stage, the software randomly selects, deserializes, and resumes states that were stored in the first stage. Because each selected simulation state contains an error or failure event, simulations in the second stage are more likely to generate accident events. However, because accidents are still too difficult to generate at this point, in this stage we focus on generating hazardous events. These include trains exceeding their authority (i.e. “burning a red light”), overspeeding, or intersecting a switch that is set in the wrong position. When such events occur, the system state is again recorded and stored for use in the following stage.

The third and final stage randomly selects from events stored in the second stage and resumes them from the point at which they were stored.  This stage seeks to generate accident or incident events, such as collisions or derailments, and is more likely to generate them because all simulations begin from a hazardous situation.  When accidents occur, specifics such as train identification, time, location, and speed are recorded and the simulation for that selected sample terminates.  The stage terminates when a sufficient number of accidents are generated to obtain a reliable estimate for their frequency.
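The three stages can be illustrated with a toy model. This is only a sketch of the staging idea, not our simulator: the probabilities are made up, a small dictionary stands in for a serialized simulation state, and a single random draw stands in for resuming a full train simulation.

```python
import random

# Hypothetical probabilities, chosen only to make the toy example run quickly.
P_ERROR_PER_DAY = 0.01          # human error or equipment failure on a given day
P_HAZARD_GIVEN_ERROR = 0.05     # an error escalates into a hazardous event
P_ACCIDENT_GIVEN_HAZARD = 0.02  # a hazardous event ends in an accident


def run_stage_one(days, rng):
    """Simulate ordinary operations and snapshot the state at each error or failure."""
    snapshots = []
    for day in range(days):
        if rng.random() < P_ERROR_PER_DAY:
            snapshots.append({"day": day, "level": "error"})
    return snapshots


def run_next_stage(snapshots, trials, p_escalate, level, rng):
    """Resume randomly selected snapshots; keep the trials that reach the next level."""
    reached = []
    for _ in range(trials):
        state = dict(rng.choice(snapshots))  # the copy stands in for deserializing a stored state
        if rng.random() < p_escalate:        # stands in for resuming the full simulation
            state["level"] = level
            reached.append(state)
    return reached


rng = random.Random(42)
errors = run_stage_one(days=10_000, rng=rng)
hazards = run_next_stage(errors, 2_000, P_HAZARD_GIVEN_ERROR, "hazard", rng)
accidents = run_next_stage(hazards, 2_000, P_ACCIDENT_GIVEN_HAZARD, "accident", rng)
print(len(errors), len(hazards), len(accidents))
```

Each stage only spends effort on states that have already reached the previous level, which is why accidents show up after a few thousand trials rather than after centuries of simulated days.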

The math behind multi-level splitting

Each simulation stage is performed separately and requires unique simulation parameters. When simulating the first stage, users define railroad infrastructure and operational data, some period of analysis (e.g. January 1st, 2013 through December 31st, 2038), and reliability rates for human errors and equipment failures. Without parallel processing, the simulation can take up to two days for a typical 25-year period. During this time, the simulation can generate a set of human errors and equipment failures that is diverse enough to lead to every possible accident or incident event.

When the simulation terminates, we can calculate the mean time to error or failure using the equation:

MTTE = T/N, where T is the period of analysis and N is the number of errors or failures generated.

When running the next simulation stage, users select a stage 1 simulation result and specify the number of times they would like to sample from its events.  Each “trial” is a simulation that begins from a stage 1 event (i.e. human error or equipment failure) and that can lead either to a hazardous event or to a safe resolution.  A small number of trials may not be sufficient to generate all possible hazardous situations, whereas an excessively large number of trials may waste computer resources by generating duplicate events after exhausting all possible paths.

When the stage 2 simulation completes, we can calculate the probability of a hazardous event given an error or failure using the following formula:

p_HE = N_H / n_T1, where N_H is the number of hazardous events generated in the second stage and n_T1 is the number of trials used to generate those events.

This value is then used to calculate the mean time to hazardous event, defined as:

MTTH = MTTE / p_HE, where MTTE is the mean time to error calculated in the first simulation stage.

The final simulation stage requires users to specify a stage 2 simulation result and a number of trials. Each trial is a simulation that begins from a stage 2 event (i.e. hazardous event) and that can lead either to an accident event or to a safe resolution.

When the stage 3 simulation completes, we can calculate the probability of an accident given a hazardous event using the following formula:

p_AH = N_A / n_T2, where N_A is the number of accidents generated in stage 3 and n_T2 is the number of stage 3 trials.

This value is used to calculate the mean time to accident:

MTTA = MTTH / p_AH, where MTTH is the mean time to hazardous event calculated in the previous stage.
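To make the chain of formulas concrete, here is a small worked example. The function name and the event counts are invented for illustration and are not results from our study.

```python
def mean_times(period_days, n_errors, n_hazards, n_trials_stage2,
               n_accidents, n_trials_stage3):
    """Return (MTTE, MTTH, MTTA) in days from the counts produced by each stage."""
    mtte = period_days / n_errors           # MTTE = T / N
    p_he = n_hazards / n_trials_stage2      # P(hazardous event | error or failure)
    mtth = mtte / p_he                      # MTTH = MTTE / p_HE
    p_ah = n_accidents / n_trials_stage3    # P(accident | hazardous event)
    mtta = mtth / p_ah                      # MTTA = MTTH / p_AH
    return mtte, mtth, mtta


# Made-up counts: a 25-year period (9,125 days) with 365 errors in stage 1,
# 50 hazards out of 1,000 stage 2 trials, 20 accidents out of 2,000 stage 3 trials.
mtte, mtth, mtta = mean_times(9125, 365, 50, 1000, 20, 2000)
print(f"MTTE {mtte:.0f} days, MTTH {mtth:.0f} days, MTTA {mtta:.0f} days")
```

With these counts an error occurs every 25 days on average, a hazardous event every 500 days, and an accident roughly every 50,000 days (about 137 years).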

The statistical confidence of these results depends on the number of trials chosen in each stage.  Our software uses a formula that determines a minimum number of trials in each stage required to obtain statistically reliable estimates of accident frequency.

Sample predictions and conclusion

To assess PTC risk, two safety evaluations must be performed: one for a base case scenario where PTC is not installed, and another for an alternate case scenario where PTC is installed.  For each scenario we calculate accident frequency, then we compare the results to verify that PTC achieves an 80% reduction in risk.
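The comparison itself is simple once both scenarios have a mean time to accident. The sketch below assumes risk is measured purely as accident frequency (1 / MTTA), with no weighting for severity; the function names and sample values are hypothetical.

```python
def risk_reduction(mtta_base_days, mtta_ptc_days):
    """Fractional drop in accident frequency when moving from the base case to PTC."""
    frequency_base = 1 / mtta_base_days
    frequency_ptc = 1 / mtta_ptc_days
    return 1 - frequency_ptc / frequency_base


def meets_requirement(mtta_base_days, mtta_ptc_days, required=0.80):
    """True when the PTC scenario achieves at least the required risk reduction."""
    return risk_reduction(mtta_base_days, mtta_ptc_days) >= required


# Made-up values: accidents every 10,000 days without PTC, every 60,000 days with it.
print(f"{risk_reduction(10_000, 60_000):.1%}", meets_requirement(10_000, 60_000))
```

Under this definition, an 80% reduction means the mean time to accident with PTC must be at least five times the base-case value.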

Here are sample results taken from our publication:

Level 2 Event | Scenario | Mean Time to Level 2 Event (MTTA) in days
Work zone accident | Base | Over 300 years
Head-to-head collision | Base | Over 300 years
Head-to-tail collision | Base | Over 300 years
Sideswipe collision | Base |
Emergency brake derailment | Base |
Overspeed derailment | Base |
Misaligned switch derailment | Base | Over 300 years
Unauthorized switch derailment | Base | Over 300 years

In conclusion, by focusing computer resources only on paths that are likely to lead to accidents, multi-level splitting is an effective simulation technique for predicting rare events.  Our simulator can produce a PTC risk analysis in as little as three days and can guarantee high statistical confidence.  The PTC-implementing railroads can now conduct speedy analyses to demonstrate the safety of their new PTC systems, and the FRA can trust and approve those technologies to make America’s railroads safer.

  1. Taken from the Graniteville NTSB accident report.
  2. See the NTSB report on the Chatsworth collision.
  3. []
  4. The ideas, formulae, and results detailed on this page are taken directly from our research published in the 2012 Railways issue of the Transportation Research Record: Meyers, T., Stambouli, A., McClure, K., Brod, D. (2012). Risk Assessment of Positive Train Control by Using Simulation of Rare Events. Transportation Research Record, 2289, 34-41.

Crowdsourcing Homeless Location Data

Every day, homeless individuals, particularly those with mental or addiction issues, refuse shelter or are left unattended. Relief organizations search for these people to provide blankets, food, or transportation to a nearby shelter, but could use help tracking locations in need and coordinating among volunteers and staff.  I developed technology that addresses this information problem, and in the last year I worked with an organization in Seattle to pilot test my solution.

Seattle’s Union Gospel Mission (SUGM), the largest homeless relief organization in Seattle, has been rescuing homeless individuals for almost thirty years.  Every day, they comb the streets of Seattle in their “Search and Rescue van” to locate individuals in need. They provide on-the-spot relief to as many individuals as they can find, and when they are unequipped to help, they mark the person’s location and provide relief at a later time.

I developed an Android application that SUGM staff can use during their search and rescue missions to record and view locations of homeless individuals.  When encountering an individual in need, staff can use the app to record the individual’s GPS coordinates and his or her needs.  For example, they can specify that an adult and child require food and water, or that a man with mental or addiction issues is wounded and needs first aid.  After the information is submitted, it is stored in a secure centralized server and is made available to all SUGM staff.
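As a rough illustration of what one submitted record might contain, here is a minimal sketch. The class name, field names, and JSON payload are assumptions made for this example and do not describe the app’s actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Sighting:
    """One encounter recorded during a search and rescue mission (hypothetical schema)."""
    latitude: float
    longitude: float
    adults: int = 1
    children: int = 0
    needs: List[str] = field(default_factory=list)
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject coordinates that cannot be a real GPS fix.
        if not -90 <= self.latitude <= 90 or not -180 <= self.longitude <= 180:
            raise ValueError("coordinates out of range")


def to_payload(sighting):
    """Serialize a sighting to JSON, ready to send to a central server."""
    return json.dumps(asdict(sighting))


# An adult and a child who require food and water.
record = Sighting(47.6062, -122.3321, adults=1, children=1, needs=["food", "water"])
print(to_payload(record))
```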

I also developed a web application for the SUGM staff working in headquarters to view and manage the data recorded during search and rescue missions. The idea is for them to use the data to determine the resources required by the search and rescue team, and to dispatch relief and coordinate teams accordingly. Sharing data between all parties within the organization increases transparency and information exchange, decreases wait time, and reduces errors associated with manually writing and reading locations. Another benefit is improved volunteer engagement through the use of trendy mobile technology.

To learn more about this project, visit our website. If you are interested in working on this project and have experience developing web or mobile applications (especially iPhone apps), please contact me.

Visualizing Railroad Infrastructure Data

At DecisionTek, we developed a train movement simulator to help rail operators determine the safety of their territories. In order to produce precise and dependable results, the software relied on a large set of inputs, the most important of which were the territory’s railroad infrastructure data.

Railroad infrastructure data describe all infrastructure and equipment within a rail system or subdivision. These include properties of the physical track such as their length, connectivity, grades, and curvatures, as well as equipment and devices including railroad switches, grade crossings, and signals. To conduct a study using our simulator, users had to define these data across 12 spreadsheets, which was an onerous process for territories that spanned hundreds of miles and included complex track configurations. We decided to re-design the experience of entering and managing railroad infrastructure data to accelerate this process and make it more accessible to non-technical users.

Our goal was to create new software that would present a visual representation of the track data that users could interact with. By abstracting the information into simple shapes and symbols, we could provide an intuitive interface for building, editing, and viewing railroad data. I developed this software over the course of 4 months using Microsoft Silverlight and the .NET Framework.

Designing the Interface

Our first task was to create a wireframe for the software. The major graphical elements of a territory’s visualization are the physical track along with its grades, curvatures, and speed limits. We organized these four elements and assigned them graphical representations using standards set forth in track charts, the diagrams that private railroads create to depict their infrastructure. By preserving most of the design conventions used in track charts, we made the interface immediately familiar to our target users, giving them a sense of comfort when interacting with it.

Understanding the Data Model

Track segments and railroad switches form the basis of every railroad system. Track segments are the minimum unit of physical track that connect either to other segments or to switches. Railroad switches allow for one track to diverge into two tracks or for two tracks to converge back to one track. Together, track segments and switches form a ‘track network’ – a collection of inter-connected infrastructure elements that make up a rail system. This graph structure was reflected in the data model design using database columns that store both the preceding and succeeding infrastructure elements of each track or switch.
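A minimal sketch of such a data model is shown below. The class and field names are assumptions chosen for illustration; the original schema used database columns rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """A track segment or switch, linked to its neighbors by id (hypothetical schema)."""
    id: int
    kind: str                                            # "segment" or "switch"
    length_ft: float = 0.0                               # switches are treated as points
    preceding: List[int] = field(default_factory=list)   # ids of elements before this one
    succeeding: List[int] = field(default_factory=list)  # ids of elements after this one


def build_network(elements):
    """Index the elements by id so neighbors can be looked up directly."""
    return {element.id: element for element in elements}


def neighbors(network, element_id):
    """Return the elements directly connected to the given one, in either direction."""
    element = network[element_id]
    return [network[i] for i in element.preceding + element.succeeding]


# One track that diverges into two at a switch.
network: Dict[int, Element] = build_network([
    Element(1, "segment", 1000.0, succeeding=[2]),
    Element(2, "switch", preceding=[1], succeeding=[3, 4]),
    Element(3, "segment", 2000.0, preceding=[2]),
    Element(4, "segment", 2100.0, preceding=[2]),
])
print([element.id for element in neighbors(network, 2)])
```

Storing both directions on every element makes the network easy to walk from any entry point, at the cost of having to keep the two sides of each link in agreement.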

Executing the Track Visualization

One of the most common mistakes users made when preparing infrastructure data was specifying incorrect connectivity data (e.g. stating that segment 56 connects to both segments 55 and 73, when in fact it connects to segments 55 and 57). To validate against connectivity errors, I constructed the data visualization using a graph traversal algorithm. This method ensured that the territory depictions were only as accurate as their specified underlying data, and that only sections of the territory with connecting infrastructure elements would be displayed.
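The segment 56 example can be reproduced with a small sketch. It assumes that a link is only trusted when both elements agree on it, which is one plausible way to catch this class of error and not necessarily how the original software did it; the function names and dictionary layout are hypothetical.

```python
from collections import deque


def find_mismatches(network):
    """Return (from_id, to_id) links that only one of the two elements declares."""
    problems = set()
    for element_id, links in network.items():
        for nxt in links["next"]:
            if element_id not in network.get(nxt, {"prev": []})["prev"]:
                problems.add((element_id, nxt))
        for prv in links["prev"]:
            if element_id not in network.get(prv, {"next": []})["next"]:
                problems.add((prv, element_id))
    return sorted(problems)


def displayable(network, entry):
    """Breadth-first traversal from an entry point, following only confirmed links."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for nxt in network[current]["next"]:
            confirmed = nxt in network and current in network[nxt]["prev"]
            if confirmed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Segment 56 wrongly claims to continue to 73 when it should continue to 57.
network = {
    55: {"prev": [], "next": [56]},
    56: {"prev": [55], "next": [73]},
    57: {"prev": [56], "next": []},
    73: {"prev": [], "next": []},
}
print(find_mismatches(network))
print(sorted(displayable(network, 55)))
```

Here the traversal stops at segment 56, so the missing continuation is visible on screen as a gap instead of silently producing a wrong simulation input.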

Most importantly, the traversal algorithm was used to discover and position the infrastructure elements on the canvas. Because territories often had multiple entry and exit points, the algorithm searched through every combination of system entry point, exit point, and direction in between. After all paths were searched, the algorithm knew exactly where elements were positioned relative to each other and could safely assign them x-positions on the canvas. An element’s x-position was determined using the cumulative length of all segments preceding it. An element’s y-position was available in the data and assumed to be set by the user at the time the element was created (users would eventually be able to create territories from scratch using the software, not only visualize existing ones).
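A simplified version of the x-positioning step might look like the following. It assumes the network has no loops in the direction of travel and, where two paths of different length reach the same element, keeps the larger offset; both are assumptions of this sketch, and the names are illustrative.

```python
from collections import deque


def assign_x_positions(lengths, successors, entry_points):
    """Place each element at the cumulative length of the elements preceding it."""
    x = {entry: 0.0 for entry in entry_points}
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        end_of_current = x[current] + lengths[current]
        for nxt in successors.get(current, []):
            # Keep the larger offset when paths of different length meet.
            if nxt not in x or end_of_current > x[nxt]:
                x[nxt] = end_of_current
                queue.append(nxt)
    return x


# Main track (segment 3) and a slightly longer siding (segment 4) between switches 2 and 5.
lengths = {1: 1000, 2: 0, 3: 2000, 4: 2100, 5: 0, 6: 500}
successors = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6]}
print(assign_x_positions(lengths, successors, [1]))
```

In this example the second switch lands at 3,100 feet because the siding is the longer of the two paths.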

In some cases, adjustments to track length or position were required to account for topological differences between actual track positions and the simplified track visualization. A common example is a siding, which track charts display as a section of track parallel to and with the same length as the main track. In reality, the main and siding tracks can have diverging paths with differing lengths. As a result, the algorithm had to trim or extend segments to ensure that diverging sidings connected back to the mainline at exact switch locations. I faced similar complications when visualizing track crossovers and wyes.

Displaying Track Grades, Curvatures, and Speed Zones

Specifying segment grades, curvatures, and speed zones allows the simulator to replicate the real-life operating conditions of a railroad. After users input these data, visualizing them can highlight errors such as excessive grades, drop-offs in elevation, or exceedingly sharp turns. Because of limitations in canvas space, these three track characteristics can only be displayed for one user-selected path through the system. However, users can still click on any segment to view its characteristic data.

Grades derive from elevation points, which are points along segments that hold measures of elevation in feet. Curvatures derive from heading points, which are points along segments that hold angle measures between the lines tangent to each point and the North-South line. Elevation and heading data are often obtained by the private railroads using satellite data, whereas grade and curvature data can be found on track charts and originate from on-site measurements. Grades are displayed like any other elevation profile, and curvatures are represented by semi-ellipses that face up or down when track curves to the left or right, respectively. Speed zones are shown as gray lines with maximum speed displayed above for each train type.
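Both derivations reduce to differences between consecutive points. The sketch below assumes distances along the track are in feet and headings are measured in degrees clockwise from north, so that an increasing heading means a right-hand curve; the function names are hypothetical.

```python
def grades_percent(elevation_points):
    """Percent grade between consecutive (distance_ft, elevation_ft) points."""
    return [
        100 * (e2 - e1) / (d2 - d1)
        for (d1, e1), (d2, e2) in zip(elevation_points, elevation_points[1:])
    ]


def turn_direction(heading_from, heading_to, tolerance_deg=0.1):
    """Classify the change between two headings as left, right, or straight."""
    # Normalize the difference so that 350 -> 10 degrees is +20, not -340.
    delta = (heading_to - heading_from + 180) % 360 - 180
    if abs(delta) <= tolerance_deg:
        return "straight"
    return "right" if delta > 0 else "left"


def semi_ellipse_facing(direction):
    """Follow the display rule above: left curves face up, right curves face down."""
    return {"left": "up", "right": "down"}.get(direction)


print(grades_percent([(0, 100), (1000, 110), (3000, 100)]))
print(turn_direction(350, 10), semi_ellipse_facing(turn_direction(350, 10)))
```

A rise of 10 feet over 1,000 feet is a 1% grade, and a heading change from 350 to 10 degrees is a 20-degree turn to the right under the clockwise convention assumed here.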

Editing Infrastructure Data

After territories could be visualized, I added menu items, forms, and data grids for editing the underlying data. Users could click on any track segment to edit it, delete it, or append a new one to it. If new segments were added, re-running the graph traversal algorithm refreshed the visualization.

To edit track characteristic data, users could summon an editable data grid by right-clicking a grade/curvature/speed zone shape or track segment. Modifications were asynchronously pushed to the database and the visualization was refreshed immediately afterwards. Users could also make modifications in a local spreadsheet and upload their changes.

Creating New Rail Systems

The final task was to add the ability to build new systems from scratch. I allowed users to start with a blank canvas and a single entry point from which they could build a rail line through a series of clicks. They could also add new entry points and interconnect parallel lines through crossovers. Modeling a 100-mile double-track territory could take as little as three days, as opposed to the three weeks of spreadsheet preparation, error trapping, and validation required by the previous data entry method.

Our final product received overwhelmingly positive feedback from our users. They saved an estimated 10 hours per week using the software, and our monthly number of troubleshooting requests declined substantially. Best of all, the railroads complimented us on our accurate depictions and on the now-seamless process of running a full-fledged simulation analysis.

Empowering Chinese Labor NGOs

For my master’s thesis at Berkeley, my team built TIRO, a hotline management system designed to give small NGOs serving vulnerable clients in China better record-keeping and reporting capabilities. We identified this need during a field site research trip in China, where we studied how migrant workers adapt to city life and find reliable and safe work. During the trip we met with labor NGOs, and discovered opportunities to improve the efficiency and sustainability of their services.

Problem Statement

Chinese NGOs operate within a challenging political environment. While the country recognizes the value of the social services that NGOs provide, it imposes limits on their work in order to maintain what it considers a harmonious society. An NGO’s best chance for sustainability is to demonstrate clearly, compellingly, and forthrightly that its work is aligned with the Party’s goals; otherwise, it risks a crackdown whenever authorities doubt the true nature of its activities.

Many NGOs also lack the resources to operate efficiently, or to demonstrate the nature and value of their work. NGOs that operate hotline services still use paper logging and undergo laborious processes for manual data entry. Furthermore, they are unable to aggregate their data to demonstrate impact, which affects their fundraising efforts and their ability to share insights with government stakeholders.


Our approach was to equip Chinese NGOs with better, lower-cost tools for record-keeping and report-making. Specifically, because phone-based consultations remain the most integral part of an NGO’s operations, we developed a mobile phone application to record and manage the information that surfaces through their hotlines. Our Android app allows hotline operators to log conversation content and retrieve call details, and an accompanying web application permits NGOs to generate reports featuring demographic, caller relationship, and service provisioning metrics.
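The kind of aggregation behind such a report can be sketched in a few lines. The field names and sample call logs below are invented for illustration and are not TIRO’s actual data model.

```python
from collections import Counter


def summarize_calls(calls):
    """Aggregate logged calls into the counts a report would chart."""
    return {
        "total_calls": len(calls),
        "by_age_group": Counter(call["age_group"] for call in calls),
        "by_caller_relationship": Counter(call["caller_relationship"] for call in calls),
        "by_service": Counter(call["service"] for call in calls),
    }


# Hypothetical call logs as a hotline operator might record them.
calls = [
    {"age_group": "18-30", "caller_relationship": "new", "service": "legal advice"},
    {"age_group": "18-30", "caller_relationship": "returning", "service": "wage dispute"},
    {"age_group": "31-45", "caller_relationship": "new", "service": "legal advice"},
]
report = summarize_calls(calls)
print(report["total_calls"], dict(report["by_service"]))
```

Because the report only needs counts, it can be generated without exposing any individual caller’s details.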

For our project, we partnered with an NGO in China for usability testing and a pilot study. We wanted to demonstrate that by using this system, our partner NGO could operate more efficiently and better communicate its work in support of its sustainability efforts, all while protecting the privacy and security of its clients’ information.


Over the course of this project, and especially in the years since its completion in 2015, the climate for Chinese labor NGOs has worsened. Local governments and police have pressured them to stop their work through intimidation and, in some cases, violence and arrests. While we believe TIRO can still be a valuable tool for mobile helpline operators around the world, our original mission to strengthen relationships between Chinese labor NGOs and local governments is unfortunately out of touch with the realities of today’s political environment.

Learn more about the project, and go check out my amazing team members Faye Ip, Jenny Lo, and Sophia Lay.