C switch Statement
The switch statement allows us to execute one code block among many alternatives.

You can do the same thing with an if...else if ladder. However, the syntax of the switch statement is much easier to read and write.

Syntax of switch...case
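In outline, the statement compares one expression against a series of constant labels (a schematic sketch, not a complete program):

```c
switch (expression) {
    case constant1:
        // statements executed when expression == constant1
        break;
    case constant2:
        // statements executed when expression == constant2
        break;
    default:
        // statements executed when no case matches (optional)
}
```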

How does the switch statement work?

The expression is evaluated once and compared with the values of each case label.

  • If there is a match, the corresponding statements after the matching label are executed. For example, if the value of the expression is equal to constant2, the statements after case constant2: are executed until break is encountered.
  • If there is no match, the default statements are executed.
  • If we do not use the break statement, all statements after the matching label are also executed.
  • The default clause inside the switch statement is optional.
switch Statement Flowchart

Flowchart of switch statement

Example: Simple Calculator

The - operator entered by the user is stored in the operation variable, and the two operands 32.5 and 12.4 are stored in the variables n1 and n2, respectively.

Since operation is -, the control of the program jumps to the statements following case '-':.

Finally, the break statement terminates the switch statement.
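The walkthrough above can be sketched as follows (the variable names operation, n1 and n2 follow the text; the full program on the original page may differ slightly):

```c
#include <stdio.h>

// Applies an arithmetic operator to two operands, as in the
// walkthrough above. Returns the result of the chosen operation.
double calculate(char operation, double n1, double n2) {
    double result = 0.0;
    switch (operation) {
    case '+':
        result = n1 + n2;
        break;
    case '-':
        result = n1 - n2;  // operation is '-', so control jumps here
        break;
    case '*':
        result = n1 * n2;
        break;
    case '/':
        result = n1 / n2;
        break;
    default:
        // the default clause is optional, but useful for bad input
        printf("Error! operator is not correct\n");
    }
    return result;  // break above ends the switch; execution resumes here
}
```

With operation = '-', n1 = 32.5 and n2 = 12.4, calculate returns 20.1, matching the example.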


  • Computers & Technology
  • Programming Languages

Buy new: .savingPriceOverride { color:#CC0C39!important; font-weight: 300!important; } .reinventMobileHeaderPrice { font-weight: 400; } #apex_offerDisplay_mobile_feature_div .reinventPriceSavingsPercentageMargin, #apex_offerDisplay_mobile_feature_div .reinventPricePriceToPayMargin { margin-right: 4px; } $73.53 $ 73 . 53 FREE delivery Wednesday, June 5 Ships from: Goodvibes Books Sold by: Goodvibes Books

Save with used - like new .savingpriceoverride { color:#cc0c39important; font-weight: 300important; } .reinventmobileheaderprice { font-weight: 400; } #apex_offerdisplay_mobile_feature_div .reinventpricesavingspercentagemargin, #apex_offerdisplay_mobile_feature_div .reinventpricepricetopaymargin { margin-right: 4px; } $68.65 $ 68 . 65 $3.99 delivery june 5 - 6 ships from: sablevision sold by: sablevision.

Kindle app logo image

Download the free Kindle app and start reading Kindle books instantly on your smartphone, tablet, or computer - no Kindle device required .

Read instantly on your browser with Kindle for Web.

Using your mobile phone camera - scan the code below and download the Kindle app.

QR code to download the Kindle App

Image Unavailable

C How to Program: With Case Studies in Applications and Systems Programming, Global Edition

  • To view this video download Flash Player

Follow the author

H.M. Deitel

C How to Program: With Case Studies in Applications and Systems Programming, Global Edition 9th Edition

For courses in computer programming.

A user-friendly, code-intensive introduction to C programming with case studies introducing applications and systems programming. C How to Program is a comprehensive introduction to programming in C. Like other texts in the Deitels’ How to Program series, the book’s modular presentation serves as a detailed, beginner source of information for college students looking to embark on a career in coding, or instructors and software-development professionals seeking to learn how to program with C. The signature Deitel live-code approach presents concepts in the context of 142 fully working programs rather than incomplete snippets of code. This gives students a chance to run each program as they study it and see how their learning applies to real-world programming scenarios.

Current standards, contemporary practice, and hands-on learning opportunities are integrated throughout the 9th Edition. Over 350 new, integrated Self-Check exercises with answers allow students to test their understanding of important concepts ― and check their code ― as they read. New and enhanced case studies and exercises use real-world data and focus on the latest ACM/IEEE computing curricula recommendations, highlighting security, data science, ethics, privacy, and performance concepts.

  • ISBN-10: 1292437073
  • ISBN-13: 978-1292437071
  • Edition: 9th
  • Publisher: Pearson
  • Publication date: May 23, 2022
  • Language: English
  • Dimensions: 8.07 x 0.98 x 10.04 inches

Product details

  • Publisher: Pearson; 9th edition (May 23, 2022)
  • Language: English
  • ISBN-10: 1292437073
  • ISBN-13: 978-1292437071
  • Item Weight: 3.36 pounds
  • Dimensions: 8.07 x 0.98 x 10.04 inches

About the author

H.M. Deitel



InterviewBit

15+ Exciting C Projects Ideas With Source Code

  • Introduction
  • C Projects for Beginners: Simple Calculator, Student Record Management System, Mini Project for Phone Book, Unit Converter Project
  • Intermediate C Projects with Source Code: Mini Voting System, Tic-Tac-Toe Game, Matrix Calculator, Library Management System, Electricity Bill Calculator, Movie Ticket Booking System
  • Advanced C Projects with Source Code: Snakes and Ladders Game, Lexical Analyzer, Bus Reservation System, Pac-Man Game
  • Other Project Ideas
  • Frequently Asked Questions
  • Additional Resources

If you are looking for project ideas to enhance your C programming skills, you are in the right place. Programming is more about what you can figure out than what you know. With the technology landscape continually changing, problem-solving is the one skill that allows you to navigate innovations while also evolving. Start with C, the language from which many current programming languages are derived, to sharpen your essential programming skills and develop problem-solving abilities. C is widely used in practically every field and is still regarded as one of the best languages for novices, despite being introduced more than 50 years ago. This raises the question of what C is and why it is still so popular.

The C programming language is a procedural programming language. Dennis Ritchie created it as a system programming language for writing operating systems. Low-level memory access, a small set of keywords, and a clean style are all qualities that make C excellent for systems programming, such as operating system or compiler development. C quickly established itself as a powerful and reliable language in software development, with some of the most well-known names still linked with it today. C is used in Microsoft Windows, Apple’s OS X, and Symbian. The C language is also used by Google’s Chromium, MySQL, Oracle, and the majority of Adobe’s applications. It also plays an important role in our daily lives, as most of the smart devices we use today are powered by C-programmed technology.

Let’s see what are the features that make C a popular and demanded language.


  • Flexibility – The seamless flexibility it offers in terms of  memory management and allocation is one of the key reasons why C is so extensively used. Programmers have complete control over how they allocate and reallocate memory, resulting in increased efficiency and improved optimization. The C programming language provides several functions for memory allocation and management like calloc(), malloc() etc.
  • Portability – C continues to be a very portable assembly language. It comes with numerous libraries that improve its functionality and allow it to work with practically any processor architecture. Compilers, libraries, and interpreters for a variety of other programming languages are typically written in C. This enables more efficient computation and accessibility.
  • Simplicity – C is classified as a mid-level language, which implies it has characteristics of both high-level and low-level languages. It’s straightforward to understand and use as a result of this. It also helps users to break down code into smaller, more legible parts because it is a structured programming language.
  • Structured Language – C is a structured programming language in the sense that functions can be used to break down a program into smaller chunks (functions). These functions also allow you to reuse code. As a result, it is simple to comprehend and work on. 
  • Memory management – C supports dynamic memory allocation (that is, allocation of memory at runtime). We can free the allocated memory at any time in the C language by using pre-defined functions.
  • Speed – There is no denying that the compilation and execution times of C are fast, since there are fewer built-in functions and hence less overhead.

  • Compiled language – A compiler is used in the C language to compile the code into object code, which is nothing more than machine code that the computer understands. You can split your code into multiple source files in C; the files are compiled individually and then linked together for execution.
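As a minimal illustration of the dynamic allocation functions mentioned above (a sketch using malloc() and free(); the function name sum_first_n is mine, not from any project):

```c
#include <stdlib.h>

// Allocates an n-element int array at runtime, fills it with
// 1..n, and returns the sum; the memory is freed before returning.
int sum_first_n(int n) {
    int *a = malloc(n * sizeof *a);  // allocation at runtime
    if (a == NULL) return -1;        // allocation can fail
    int sum = 0;
    for (int i = 0; i < n; i++) {
        a[i] = i + 1;
        sum += a[i];
    }
    free(a);                         // release the memory when done
    return sum;
}
```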

We’ll look at some intriguing C projects that you may find on GitHub in this article. We believe that these project ideas will assist you in improving your problem-solving abilities, broadening your knowledge base, and enriching your learning experience. Mini projects, mini-games, and little apps are among the C projects described here. The majority of these programs make good use of functions, file handling, and data structure. Analyze and comprehend the source code of these projects, and you’ll be able to develop a similar project by learning how to add, modify, view, search, and delete data using files.

You can build a simple calculator in C using switch cases or if...else statements. This calculator takes two operands and an arithmetic operator (+, -, *, /) from the user; however, you can expand the program to accept more than two operands and one operator by adding the necessary logic. Based on the operator entered by the user, it performs the computation on the two operands. The input must be in the format “number1 operator number2” (e.g. 2+4).

Source Code – Calculator
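Parsing the “number1 operator number2” input format can be sketched as follows (a sketch only; the linked project's actual code may differ, and eval_expression is a name I chose for illustration):

```c
#include <stdio.h>

// Parses input of the form "number1 operator number2" (e.g. "2+4")
// and stores the result in *result. Returns 0 on success, -1 on a
// malformed expression or unknown operator.
int eval_expression(const char *input, double *result) {
    double a, b;
    char op;
    // the space before %c skips whitespace, so "2+4" and "2 + 4" both work
    if (sscanf(input, "%lf %c %lf", &a, &op, &b) != 3) return -1;
    switch (op) {
    case '+': *result = a + b; return 0;
    case '-': *result = a - b; return 0;
    case '*': *result = a * b; return 0;
    case '/': *result = a / b; return 0;
    default:  return -1;  // not an arithmetic operator
    }
}
```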

Using the C language, you can also create a student management system. To handle students’ records (such as roll number, name, and subject), it employs files as a database, conducting file-handling activities such as adding, searching, changing, and removing entries. It appears to be a simple project, but it can be handy for schools or colleges that have to store the records of thousands of students.

Source Code – Student Management
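The file-as-database idea can be sketched with fixed-size records written via fwrite() and scanned back via fread() (the struct layout and function names here are hypothetical; the linked project defines its own):

```c
#include <stdio.h>

// Hypothetical record layout for illustration.
struct student {
    int  roll;
    char name[32];
    char subject[32];
};

// Appends one record to a binary data file; returns 0 on success.
int add_student(const char *path, const struct student *s) {
    FILE *fp = fopen(path, "ab");
    if (fp == NULL) return -1;
    size_t written = fwrite(s, sizeof *s, 1, fp);
    fclose(fp);
    return written == 1 ? 0 : -1;
}

// Linear search by roll number. Returns 1 if found (record copied
// into *out), 0 if not found, -1 on I/O error.
int find_student(const char *path, int roll, struct student *out) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    struct student s;
    while (fread(&s, sizeof s, 1, fp) == 1) {
        if (s.roll == roll) {
            *out = s;
            fclose(fp);
            return 1;
        }
    }
    fclose(fp);
    return 0;
}
```

Updating and deleting work the same way: rewrite the file (or the record in place with fseek()) after locating the entry.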

If you have ever lost track of which day of the week it is or the number of days in a particular month, you should build a calendar yourself. This calendar, written in the C programming language, helps you determine the date and day you require. It can be implemented using simple if-else logic and switch-case statements. The display() function is used to display the calendar, and it can be modified accordingly. It also has some additional functions. The GitHub link of the calendar is provided below.

Source Code – Calendar
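One common way to compute the day of the week for a given date is Zeller's congruence (a sketch of a standard formula; the linked project may use a different method):

```c
// Zeller's congruence for the Gregorian calendar.
// Returns the day of week: 0 = Saturday, 1 = Sunday, 2 = Monday, ...
int day_of_week(int year, int month, int day) {
    if (month < 3) {  // January and February count as months 13 and 14
        month += 12;  // of the previous year in Zeller's formula
        year -= 1;
    }
    int k = year % 100;  // year within the century
    int j = year / 100;  // zero-based century
    return (day + 13 * (month + 1) / 5 + k + k / 4 + j / 4 + 5 * j) % 7;
}
```

For example, day_of_week(2022, 5, 23) returns 2, i.e. Monday.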

This phone book project generates an external file to permanently store the user’s data (name and phone number). The phone book is a very simple C project that will help you understand the core concepts of record keeping and data structures. This program shows how to add, list, edit, view, and delete data in a record.

Source Code – Phone Book

Forgot how to convert degrees Fahrenheit to Celsius? Don’t worry; we have a solution for you. This unit converter converts basic units such as temperature, currency, and mass.

Source Code – Unit Converter
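The temperature part of such a converter reduces to two one-line formulas (the function names are mine, for illustration):

```c
// Standard temperature conversions for a basic unit converter.
double fahrenheit_to_celsius(double f) { return (f - 32.0) * 5.0 / 9.0; }
double celsius_to_fahrenheit(double c) { return c * 9.0 / 5.0 + 32.0; }
```

Currency conversion works the same way but needs an up-to-date exchange rate as input rather than a fixed constant.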

An online voting system is a software platform that enables organizations to conduct votes and elections securely. A high-quality online voting system strikes a balance between ballot security, convenience, and the overall needs of a voting event. By collecting the input of your group in a systematic and verifiable manner, online voting tools and online election voting systems assist you in making crucial decisions. These decisions are frequently taken on a yearly basis – either during an event (such as your organization’s AGM) or at a specific time of the year. Alternatively, you may conduct regular polls among your colleagues (e.g. anonymous employee feedback surveys).

With this voting system, users can enter their preferences and the total votes and leading candidate can be calculated. It’s a straightforward C project that’s simple to grasp. Small-scale election efforts can benefit from this.

Source Code – Voting System

Tic-tac-toe, also known as noughts and crosses or Xs and Os, is a two-person paper-and-pencil game in which each player alternates marking squares in a three-by-three grid with an X or an O. The winner is the player who successfully places three of their marks in a horizontal, vertical, or diagonal row. You can implement this fun game using 2D arrays in the C programming language. Arrays are essential here: the Xs and Os are stored in arrays and passed across various functions in the code to keep track of the game’s progress. You can play the game against the computer by selecting either X or O. The source code for the project is given below.

Source Code – Tic Tac Toe
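The core of the 2D-array approach is a function that checks the grid for three marks in a row (a sketch; the linked project structures its code differently):

```c
// Checks a 3x3 board for a winner; cells hold 'X', 'O', or ' '.
// Returns the winning mark, or ' ' if no one has won yet.
char winner(char b[3][3]) {
    for (int i = 0; i < 3; i++) {
        if (b[i][0] != ' ' && b[i][0] == b[i][1] && b[i][1] == b[i][2])
            return b[i][0];  // three in a row on row i
        if (b[0][i] != ' ' && b[0][i] == b[1][i] && b[1][i] == b[2][i])
            return b[0][i];  // three in a row on column i
    }
    if (b[1][1] != ' ' && b[0][0] == b[1][1] && b[1][1] == b[2][2])
        return b[1][1];      // main diagonal
    if (b[1][1] != ' ' && b[0][2] == b[1][1] && b[1][1] == b[2][0])
        return b[1][1];      // anti-diagonal
    return ' ';              // no winner yet
}
```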

Mathematical operations are an everyday part of our lives. Matrices are mathematical structures in which numbers are arranged in rows and columns, and they appear in many real-life applications ― most commonly in the software industry, where pathfinding algorithms, image processing algorithms, and others are built on them. Some fundamental matrix operations are performed in this project: the user selects the operation to be performed, then enters the matrices and their sizes. It’s worth noting that the project only considers square matrices.
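Two of the fundamental operations on square matrices can be sketched as follows, with the n-by-n matrices stored in row-major order (the function names are mine; the project's own code may differ):

```c
// c = a + b for n-by-n matrices in row-major order.
void matrix_add(int n, const double *a, const double *b, double *c) {
    for (int i = 0; i < n * n; i++)
        c[i] = a[i] + b[i];
}

// c = a * b for n-by-n matrices in row-major order.
// c must not alias a or b.
void matrix_multiply(int n, const double *a, const double *b, double *c) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```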

Library management is a project that manages and preserves electronic book data based on the demands of students. Both students and library administrators can use the system to keep track of all the books available in the library. It allows both the administrator and the student to look for the desired book. The C files used to implement the system are: main.c, searchbook.c, issuebook.c, viewbook.c, and more.

Source Code – Library Management

The Electricity Cost Calculator project is an application-based micro project that predicts the following month’s electricity bill based on the appliances or loads used. Visual Studio Code was used to write the code for this project, which employs a multi-file, multi-platform (Linux and Windows) strategy. People who do not have a technical understanding of calculating power bills can use this program to forecast their electricity bills for the coming months. An electricity bill calculator must have the following features:

  • Power rating of all loads
  • Units consumed per day
  • Units consumed per month
  • Total load calculation

Source Code – Electricity Billing
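The underlying arithmetic is simple: one unit is 1 kWh, so consumption is power times usage time. A sketch (assuming a flat per-unit tariff; real tariffs are usually slabbed, and the function names are mine):

```c
// Units (kWh) consumed by one appliance over a billing period.
double monthly_units(double power_watts, double hours_per_day, int days) {
    return power_watts * hours_per_day * days / 1000.0;  // Wh -> kWh
}

// Estimated bill for a number of units at a flat rate per unit.
double monthly_bill(double units, double rate_per_unit) {
    return units * rate_per_unit;
}
```

For example, a 1000 W heater run 2 hours a day for 30 days consumes 60 units.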

The project’s goal is to provide a movie ticket booking system through which a consumer can order tickets. The project was created with the goal of making the process as simple and quick as possible. The user can book tickets, cancel tickets, and view all booking records using the system. The project’s major purpose is to supply various forms of client facilities as well as excellent customer service. It should meet nearly all the conditions for reserving a ticket.

Source Code – Movie Ticket Booking

Snakes and ladders, also known as Moksha Patam, is an ancient Indian board game for two or more players that is still considered a worldwide classic today. It’s played on a gridded game board with numbered squares. On the board, there are several “ladders” and “snakes,” each linking two distinct board squares. The dice value can either be provided by the user or generated randomly. If, after moving, the pointer lands on the square at the foot of a ladder, the pointer is moved to the top of the ladder. If, unfortunately, the pointer lands on the mouth of a snake, the pointer is redirected to the tail of the snake. The objective and rules of the game can be summarized as follows:

Objective – Given a snakes and ladders board, write a function that returns the minimum number of dice throws required to reach the destination (top) position.

You can assume every dice throw turns out in your favor; that is, you can control the dice.

Source Code – Snakes and Ladders
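Since you control the dice, the minimum-throws question is a shortest-path problem: treat each square as a graph node with edges to the next six squares (following any snake or ladder), and run breadth-first search. A sketch (the linked project may solve it differently):

```c
#define MAX_CELLS 128

// Minimum dice throws to travel from cell 0 to cell n-1.
// moves[i] == -1 for a plain cell; otherwise it is the cell that a
// snake or ladder at cell i transports the player to. BFS guarantees
// the first time we reach the last cell is with the fewest throws.
int min_throws(const int moves[], int n) {
    int dist[MAX_CELLS], queue[MAX_CELLS];
    for (int i = 0; i < n; i++) dist[i] = -1;  // -1 = not visited
    int head = 0, tail = 0;
    dist[0] = 0;
    queue[tail++] = 0;
    while (head < tail) {
        int v = queue[head++];
        if (v == n - 1) return dist[v];
        for (int die = 1; die <= 6 && v + die < n; die++) {
            int next = v + die;
            if (moves[next] != -1) next = moves[next];  // snake or ladder
            if (dist[next] == -1) {
                dist[next] = dist[v] + 1;
                queue[tail++] = next;
            }
        }
    }
    return -1;  // last cell unreachable
}
```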

The lexical analyzer program translates a stream of individual characters, generally grouped as lines, into a stream of lexical tokens: for example, tokenizing source-code words and punctuation symbols. The project’s main goal is to take a C file and generate a sequence of tokens that can be used in the next stage of compilation, while also accounting for any error handling requirements that may arise during tokenization.

Source Code – Lexical Analyzer
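The heart of a lexical analyzer is a function that consumes one token at a time from the input. A simplified sketch (identifiers, integer literals, and single-character symbols only; a real C lexer also handles keywords, multi-character operators, strings, and comments):

```c
#include <ctype.h>
#include <stddef.h>

enum token_type { TOK_IDENT, TOK_NUMBER, TOK_SYMBOL, TOK_END };

// Reads one token starting at *src, copies its text into buf
// (capacity cap), advances *src past it, and returns its type.
enum token_type next_token(const char **src, char *buf, size_t cap) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;       // skip whitespace
    if (*p == '\0') { *src = p; buf[0] = '\0'; return TOK_END; }
    size_t len = 0;
    enum token_type type;
    if (isalpha((unsigned char)*p) || *p == '_') {
        type = TOK_IDENT;                         // [A-Za-z_][A-Za-z0-9_]*
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (len + 1 < cap) buf[len++] = *p;
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {
        type = TOK_NUMBER;                        // run of digits
        while (isdigit((unsigned char)*p)) {
            if (len + 1 < cap) buf[len++] = *p;
            p++;
        }
    } else {
        type = TOK_SYMBOL;                        // single punctuation char
        if (len + 1 < cap) buf[len++] = *p;
        p++;
    }
    buf[len] = '\0';
    *src = p;
    return type;
}
```

Calling next_token repeatedly on "int x1 = 42;" yields the tokens int, x1, =, 42 and ; in order.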

This system is built on the concept of booking bus tickets in advance. The user can check the bus schedule, book tickets, cancel reservations, and check the bus status board using this system. When purchasing tickets, the user must first enter the bus number, after which the system displays the total number of bus seats along with the passengers’ names; the user then enters the number of tickets, seat numbers, and passenger names. We will use arrays, if-else logic, loop statements, and various functions such as login() and cancel() to implement the project.

Source Code – Bus Reservation System

This little project is a modest recreation of the offline Google Chrome game Dinosaur Jump. The game can be played at any moment by the user, and the entire project is written in the C programming language. The X key is used to exit the game, and the space bar is used to jump. Play and score as many points as you can; this is a fun, simple game designed specifically for novices, and it’s simple to use and understand.

Source Code – Dino Game

Pac-Man, like other classic games, is simple to play. In this game, you must eat as many small dots as possible to earn points. The entire game was created using the C programming language, and graphics were employed in its creation. To create the game, first define a grid function to manage the grid structure. To control movement, you can define functions such as move_right(), move_left(), move_up() and move_down(). C files to add ghosts and their functionality, position checks, etc. can be added to make the game more fun. Users will find this C game simple to understand and manage.

Source Code – Pac Man

Some project ideas are given below. These are just ideas, source code links for these have not been provided.

  • Bank management system
  • Airlines reservation system
  • Vaccine registration portal
  • A calculator
  • Tic-tac-toe game
  • Password management system
  • Phone book management system
  • Snake and ladders game
  • Rock-paper-scissors game
  • Unit conversion system
  • Tip calculator
  • Employee information management system
  • Library management system
  • Mini voting system
  • Vaccine registration system
  • Cricket Scorecard management system
  • Hangman game
  • Pac-Man game
  • Grocery list
  • Medical store management system
  • School billing system
  • Student record system
  • Typing tutor
  • Traffic control management system
  • Telephone billing system
  • Hotel accommodation system

We have collected some C language projects and ideas for you in this article. GitHub has built up a huge collection of projects from programmers who routinely examine and critique each other’s code as the world’s largest software development community. Furthermore, because the platform supports many programming languages, there are plenty of C project ideas on GitHub for anyone to draw from. As the developer, it’s up to you to think outside the box, come up with inventive solutions using available resources, and contribute to the future of software. For clarity, the projects are grouped under distinct headings. So, if you’re new to project development, start by understanding and analyzing a small project before moving on to a project with a broader scope and application.

Q. Is C good for big projects? A. C is indeed suitable for large projects. Programming in C requires a great deal more discipline than most modern programming languages. C aids in learning programming fundamentals, and because it is a procedural language, it requires writing a larger amount of code by hand than its competitors.

Q. Can you program games with C? A. The C programming language can be used to create games; however, most people prefer other languages for this.

Q. Is C still used today? A. Yes, C is still one of the most popular programming languages today.

Q. What should I build in C? A. Start with a little project to understand and analyze before moving on to a project with a larger scope and applicability if you’re new to project development. Some project ideas along with their source code are given in this article.

  • C Interview Questions
  • Online C Compiler
  • Features of C Language
  • Difference Between C and Python
  • Difference Between C and Java
  • Difference between C and C++
  • C Programming


Programming case studies

A case study consists of a programming problem, one or more solutions to the problem, and a narrative description of the process used by an expert to produce the solutions.

Rationale for case studies and ideas for using them in a programming course are laid out in the following papers:

  • “The Case for Case Studies of Programming Problems” , Marcia C. Linn and Michael J. Clancy, Communications of the ACM , volume 35, number 3, pages 121-132, March 1992.
  • “Case Studies in the Classroom”, Michael J. Clancy and Marcia C. Linn, proceedings of the 23rd SIGCSE Technical Symposium on Computer Science Education, Kansas City, Missouri, March 1992; published as SIGCSE Bulletin, volume 24, number 1, March 1992. (A revised version of this paper is available here.)

Marcia Linn and I put together two collections of case studies in Pascal programming. You can get them used at amazon.com for around $2—such a deal!

  • Designing Pascal Solutions: Case Studies with Data Structures , Michael J. Clancy and Marcia C. Linn, W.H. Freeman and Company, 1996.
  • Designing Pascal Solutions: A Case Study Approach , Michael J. Clancy and Marcia C. Linn, W.H. Freeman and Company, 1992.

Other textbooks with a strong case study approach include the following:

  • (books will be listed here as I find them)

Our Scheme-based introductory programming course for non-CS majors uses several case studies. The self-paced version of the course, CS 3S , uses the case studies listed below.

  • "Difference between Dates" case study
  • "Roman Numerals" case study
  • "Statistics" case study

I'm still developing case studies. In particular, I've tried two new ones in CS 61B , our course on data structures and programming methodology. Comments and feedback are welcome.

  • "Medical Diagnosis" case study (unfinished) This involves a program to match up reported symptoms with diseases that the symptoms may suggest. I had planned to solve the problem using "simple" data structures, identify bottlenecks revealed by runs on big data sets, and then replace the data base by something more efficiently accessed and updated. However, it didn't turn out the way I planned.
  • "Bowling Scores" case study Loosely based on http://www.xprogramming.com/xpmag/acsBowling.htm and http://www.xprogramming.com/xpmag/acsBowlingProcedural.htm —thanks much to Rodney Hoffman for these links—this describes the test-driven development of a "bowling scorer" object. Unfortunately, I had trouble linking this case study to the rest of the data structures course.


  • Programming case study: Encouraging cross-disciplinary projects
  • Programming case study: Going beyond the KA curriculum
  • Programming case study: Teaching an elementary school class


A case study investigating programming students’ peer review of codes and their perceptions of the online learning environment

  • Published: 05 February 2020
  • Volume 25, pages 3553–3575 (2020)


  • Roshni Sabarinath, ORCID: orcid.org/0000-0001-8473-7744
  • Choon Lang Gwendoline Quek

Programming in schools is no longer a novel subject; it is now quite commonly found in our schools in either the formal or informal curriculum. Programmers use creative learning tactics to solve problems and communicate ideas. Learning to program is generally considered challenging, so developing and implementing new methodologies for teaching programming is imperative to overcome the current challenges associated with the teaching and learning of programming. This case study aims to contribute to programming education in schools by investigating how students learn in an online programming environment while involved in peer review of code. The study subsequently examines students’ perceptions of the pedagogical, social and technical design of the online programming learning environment. When students are involved in providing and receiving feedback and creating their own identity in a programming community, they may be better prepared for learning and applying programming in their undergraduate studies and their future careers in the field.




Author information

Authors and Affiliations

National Institute of Education, Nanyang Technological University, 1 Nanyang Walk, Singapore, 637616, Singapore

Roshni Sabarinath & Choon Lang Gwendoline Quek


Corresponding author

Correspondence to Roshni Sabarinath.


1.1 Review criteria for assessing source code

The maximum mark awarded is 100.

In the case of a compilation error, no marks will be awarded.

Marks will be deducted if the coding rules are not followed.

1.2 Sample peer review comments


1.3 A survey on students’ perceptions of the online learning environment

Dear Participant,

The purpose of this survey is to capture students’ perceptions of the online learning environment. This survey will take about 15 minutes.

Your honest responses will provide valuable input for improving the quality of teaching and learning in this class. All responses will be kept confidential, and no names will be identified in the report of findings.

To what extent do you agree with the following statements?

(5-point Likert scale: 1 = Strongly Disagree to 5 = Strongly Agree)


About this article

Sabarinath, R., Quek, C.L.G. A case study investigating programming students’ peer review of codes and their perceptions of the online learning environment. Educ Inf Technol 25, 3553–3575 (2020). https://doi.org/10.1007/s10639-020-10111-9


Received: 15 August 2019

Accepted: 17 January 2020

Published: 05 February 2020

Issue Date: September 2020



Keywords:

  • Programming
  • Peer review
  • Online learning environment
  • Google apps
  • Students’ perceptions

National Academies Press: OpenBook

Guide to Establishing Monitoring Programs for Travel Time Reliability (2014)

Chapter: C, Case Studies


INTRODUCTION

This appendix contains five case studies that demonstrate approaches to the travel time reliability monitoring techniques described in the Guide. The case studies illustrate real-world examples of using a travel time reliability monitoring system (TTRMS) to quantify the effect of various influencing factors on the reliability of the system. The goal of each case study is to illustrate how agencies apply best practices for monitoring system deployment, travel time reliability calculation methodology, and agency use and analysis of the system. To accomplish this goal, a prototype TTRMS was implemented at each of the five sites. These systems take in sensor data in real time from a variety of transportation networks, process this data inside a large data warehouse, and generate reports on travel time reliability to help agencies better operate and plan their transportation systems. Each case study consists of the following sections:

  • Monitoring system;
  • Methodological advancement;
  • Use case analysis; and
  • Lessons learned.

These sections map to the master system components, as shown in Figure C.1. The remaining five parts of this appendix are titled by the location of the case study demonstrations: San Diego, California; Northern Virginia; Sacramento–Lake Tahoe, California; Atlanta, Georgia; and New York/New Jersey. Figure C.2 shows the case study locations.
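To make the report-generation step concrete, here is a minimal sketch, not taken from any TTRMS described in this appendix, of how two widely used reliability metrics (buffer index and planning time index) could be computed from a sample of route travel times. The sample values are invented, and the fastest observed trip is used as a crude free-flow proxy.

```python
import statistics

def percentile(sorted_values, p):
    """Nearest-rank percentile of an already-sorted list (p in 0..100)."""
    k = round(p / 100 * len(sorted_values)) - 1
    k = max(0, min(len(sorted_values) - 1, k))
    return sorted_values[k]

def reliability_report(travel_times_min):
    """Summarize a sample of travel times (minutes) into reliability metrics."""
    tt = sorted(travel_times_min)
    mean_tt = statistics.fmean(tt)
    tt95 = percentile(tt, 95)
    return {
        "mean_min": round(mean_tt, 2),
        "95th_pct_min": round(tt95, 2),
        # Buffer index: extra time, as a fraction of the mean, a traveler
        # must budget to arrive on time 95% of the time.
        "buffer_index": round((tt95 - mean_tt) / mean_tt, 2),
        # Planning time index: 95th percentile relative to free flow,
        # crudely proxied here by the fastest observed trip.
        "planning_time_index": round(tt95 / tt[0], 2),
    }

# Hypothetical travel times (minutes) for one route over several days:
sample = [12.1, 11.8, 12.5, 13.0, 12.2, 15.4, 12.0, 18.9, 12.3, 12.6]
print(reliability_report(sample))
```

A production system would compute these metrics per route and per time-of-day period from archived sensor data rather than from a flat list.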

Figure C.1. Reliability monitoring system overview, with boxes for modules and circles for inputs and outputs.

Figure C.2. Case study locations. Map data © 2012 Google.

Case Study 1: SAN DIEGO, CALIFORNIA

This case study focused on using a mature reliability monitoring system in San Diego, California, to illustrate the state of the art for existing practice. Led by its metropolitan planning organization, the San Diego Association of Governments (SANDAG), and the California Department of Transportation (Caltrans), the San Diego region has developed one of the most sophisticated regional travel time monitoring systems in the United States. This system is based on an extensive network of sensors on freeways, arterials, and transit vehicles. It includes a data warehouse and software system for calculating travel times automatically. Regional agencies use these data in sophisticated ways to make operations and planning decisions.

Because this technical and institutional infrastructure was already in place, the team focused on generating sophisticated reliability use case analyses. The rich, multimodal nature of the San Diego data presented numerous opportunities for state-of-the-art reliability monitoring, as well as challenges in implementing Guide methodologies on real data. The purpose of this case study was to

  • Assemble regimes and travel time probability density functions (TT-PDFs) from individual vehicle travel times.
  • Explore methods to analyze transit data from automated vehicle location (AVL) and automated passenger count (APC) equipment.
  • Demonstrate high-level use cases encompassing freeways, transit, and freight systems.
  • Relate travel time variability to the seven sources of congestion.

The monitoring system section further details the reasons for selecting San Diego as a case study and gives an overview of the region. It briefly summarizes agency monitoring practices, discusses the existing travel time sensor network, and describes the software system that the team used to analyze use cases.
The section also details the development of travel time reliability software systems and their relationships with other systems.
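The first objective above, assembling TT-PDFs from individual vehicle travel times, can be illustrated with a minimal histogram-based sketch. The travel time values and bin width below are invented for illustration, not San Diego data.

```python
def empirical_pdf(travel_times, bin_width=1.0):
    """Return (bin_start, density) pairs; the densities integrate to 1."""
    lo = min(travel_times)
    counts = {}
    for t in travel_times:
        b = int((t - lo) // bin_width)   # bin index for this vehicle
        counts[b] = counts.get(b, 0) + 1
    n = len(travel_times)
    return [(lo + b * bin_width, c / (n * bin_width))
            for b, c in sorted(counts.items())]

# Invented individual vehicle travel times (minutes) on one segment,
# with a fast regime around 10 to 12 min and a slow one around 15 to 16 min:
times = [10.2, 10.8, 11.1, 11.4, 12.0, 12.3, 14.9, 15.2, 15.8, 16.1]
pdf = empirical_pdf(times, bin_width=2.0)
print(pdf)  # three bins; total probability mass is 1
```

In practice one would fit or smooth such a histogram (or use a mixture model) before associating its modes with operational regimes.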

Methodology is the most experimental and least site specific. It is dedicated to an ongoing investigation, spread across all five case studies, to test, refine, and implement the Bayesian travel time reliability calculation methodology outlined in Chapter 3. For this section, the team is using, as appropriate, site data and other data to investigate this approach. The goal of each case study methodology section is to advance the team’s understanding of the theoretical framework and practical implementation of the new Bayesian methodology.

Use cases are less theoretical and more site specific. Their basic structure is derived from the user scenarios described in Appendix D, which were derived from the results of a series of interviews with transportation agency staff regarding agency practice with travel time reliability.

Lessons learned summarizes the key findings from this case study with regard to all aspects of travel time reliability monitoring: sensor systems, software systems, calculation methodology, and use. These lessons learned will be integrated into the final guide for practitioners.

MONITORING SYSTEM

Site Overview

The team selected San Diego as an exemplar of the leading edge of the state of the practice for using conventional monitoring systems within an urbanized metropolitan area. Led by SANDAG and Caltrans, the San Diego region has developed one of the most sophisticated regional travel time monitoring systems in the United States. This system is based on an extensive network of sensors on freeways, arterials, and transit vehicles. It includes a data warehouse and software system for calculating travel times automatically. Regional agencies use these data in sophisticated ways to make operations and planning decisions.
The San Diego metropolitan area encompasses all of San Diego County, which is approximately 4,200 square miles and the fifth most-populous county in the United States. The county, bordered by Orange and Riverside counties to the north, Imperial County to the east, Mexico to the south, and the Pacific Ocean to the west, contains over 3 million people. Approximately 1.3 million of these people live within the City of San Diego, with the rest concentrated within the southern suburbs of Chula Vista and National City; the beachside cities of Carlsbad, Oceanside, and Encinitas; the northern, inland suburbs of Escondido and San Marcos; and the eastern suburb of El Cajon. The metropolitan area also includes significant rural areas within and to the east of the Coastal Range Mountains, with the Sonoran Desert and the Cleveland National Forest on the far eastern edge and the Anza–Borrego Desert State Park in the northeast corner of the county. The county has a large military presence, containing numerous Naval, Marine Corps, and Coast Guard stations and bases. Tourism also plays a major role in the regional economy, behind the military and manufacturing, particularly during the summer months.

Over the past several years, transportation agencies operating within the San Diego region, through partnerships between SANDAG, Caltrans, local jurisdictions, transit agencies, and emergency responders, have been updating and integrating their traffic management systems, as well as developing new systems, under the concept of integrated corridor management (ICM). The goal of ICM is to improve system productivity, accessibility, safety, and connectivity by enabling travelers to make convenient and informed shifts between corridors and modes to complete trips. The partnering agencies selected I-15 from SR-52 in San Diego to SR-78 in Escondido as the corridor along which to implement an ICM pilot project using federal ICM initiative funding. A concept of operations document for this pilot project was completed in March 2008, and San Diego was selected for the demonstration phase of the ICM initiative early in 2010.

Because of this effort and others, San Diego has a sophisticated travel time monitoring software infrastructure. Among the systems that will share data as part of the planned Integrated Corridor Management System are the Advanced Transportation Management System (ATMS), Performance Measurement System (PeMS), ramp meter information system, lane closure system, the managed lane closure and congestion pricing systems on I-15, the regional arterial management system, and the regional transit management system.

Sensors

Freeway

District 11 of Caltrans manages San Diego’s freeway network. District 11 encompasses San Diego and Imperial counties, though only the managed portion of the freeway system in San Diego County will be considered as part of this case study. Within San Diego County, District 11 is responsible for 2,000 centerline miles of monitored freeways, 64 lane miles of which are managed high-occupancy vehicle (HOV)/high-occupancy toll lane facilities.
Several major Interstates pass through the district, including I-5, which passes through many major cities on the West Coast between Mexico and Canada; I-8, which connects Southern California with I-10 in Arizona; and I-15, which connects San Diego with Las Vegas. Within the county, I-5 connects downtown San Diego with the Mexican border at Tijuana to the south and the North County beachside suburbs and Orange County to the north. I-8 connects the north part of the City of San Diego with El Cajon and the Southern California desert. I-15 connects downtown San Diego with the inland suburbs of Rancho Bernardo and Escondido, then travels up through the Los Angeles suburbs in Riverside County. Other major freeways include I-805, which parallels I-5 on the inland side between the Mexican border and its intersection with I-5 between La Jolla and Del Mar. SR-163 connects I-5 in downtown San Diego with I-15 near the Marine Corps Air Station in Miramar. SR-94 links I-5 downtown with eastern suburbs, paralleling I-8 to the south. SR-78 is the major east–west freeway in North County, connecting Oceanside and Carlsbad with Escondido, and traveling further east into the mountainous regions of the county. A map of San Diego’s freeway network is shown in Figure C.3.

To monitor its freeways, District 11 has 3,592 intelligent transportation system traffic sensors deployed at 1,210 locations that collect and transmit data in real time to a central database. Of these, 2,558 sensors are in the freeway mainline lanes, 20 are in HOV lanes, and the rest are located at on-ramps, off-ramps, or interchanges. These sensors are a mixture of loop detectors and radar detectors. Approximately 90% of the intelligent transportation system detection is owned by Caltrans, with the remainder owned by NavTeq/Traffic.com. San Diego County has had freeway detection in place since 1999, with the number of detectors steadily increasing over time.

Detectors are spaced relatively frequently on major freeway facilities. Most monitored freeways have an average detector station spacing of between one-half mile and 1 mile. The number and average spacing of detector stations for each monitored mainline facility in the county are indicated in Table C.1.

Figure C.3. San Diego freeway network. Map data © 2012 Google.
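Loop detectors such as these typically report flow and occupancy rather than speed. A common single-loop estimation technique, sketched below in a simplified form of what archival systems like PeMS apply, derives speed from flow, occupancy, and an assumed effective vehicle length (sometimes called a g-factor); the 20-foot value here is an illustrative assumption.

```python
def estimate_speed_mph(flow_veh_per_30s, occupancy_fraction,
                       effective_length_ft=20.0):
    """Estimate speed (mph) from single-loop flow and occupancy.

    speed ~ flow * effective_vehicle_length / occupancy, with unit
    conversions: 30-second counts to veh/hour, feet to miles.
    """
    if occupancy_fraction <= 0:
        return None  # no vehicles (or a bad sample); speed is undefined
    flow_vph = flow_veh_per_30s * 120           # 30-s counts -> veh/hour
    length_miles = effective_length_ft / 5280   # feet -> miles
    return flow_vph * length_miles / occupancy_fraction

# 15 vehicles in a 30-second sample at 10% occupancy:
print(round(estimate_speed_mph(15, 0.10), 1))
```

Real systems calibrate the effective length per lane and time of day, since trucks and congestion both change the average vehicle footprint over the loop.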

TABLE C.1. SAN DIEGO COUNTY FREEWAY DETECTION

Freeway | No. of Monitored Lane Miles | No. of Detector Stations | Average Spacing (mi) | HOV
I-5 NB | 61.8 | 98 | 0.65 | X
I-5 SB | 60.8 | 89 | 0.70 | X
I-8 EB | 26.3 | 45 | 0.60 |
I-8 WB | 26.3 | 46 | 0.60 |
I-15 NB | 39.1 | 50 | 0.80 | X
I-15 SB | 37.9 | 45 | 0.85 | X
I-805 NB | 28.7 | 49 | 0.60 |
I-805 SB | 28.7 | 46 | 0.60 |
I-905 WB | 3.0 | 2 | 1.50 |
SR-52 EB | 14.8 | 17 | 0.90 |
SR-52 WB | 14.8 | 16 | 0.90 |
SR-54 EB | 7.0 | 3 | 2.30 |
SR-54 WB | 6.8 | 3 | 2.30 |
SR-56 EB | 5.7 | 3 | 1.90 |
SR-56 WB | 5.7 | 3 | 1.90 |
SR-78 EB | 20.2 | 17 | 1.20 |
SR-78 WB | 20.2 | 23 | 0.90 |
SR-94 EB | 11.1 | 14 | 0.80 |
SR-94 WB | 11.6 | 20 | 0.60 |
SR-125 NB | 10.8 | 13 | 0.85 |
SR-125 SB | 10.7 | 13 | 0.80 |
SR-163 NB | 11.1 | 15 | 0.75 |
SR-163 SB | 11.1 | 15 | 0.75 |

Note: NB, SB, EB, and WB = northbound, southbound, eastbound, and westbound, respectively; X = presence of HOV lane.

District 11 also owns and maintains almost 2,000 census count stations. All of these stations report data on traffic volumes, and 20 also provide vehicle classification and weight information. The stations do not report conditions in real time, but data are obtained and input into the PeMS database via an offline batch process.

In San Diego County, real-time flow, occupancy, and (at some locations) speed data are collected in the field by controller cabinets wired to the individual sensors. Data are transmitted from these controller cabinets to the Caltrans District 11 traffic management center via a front-end processor. The traffic management center’s ATMS parses the raw, binary field data and writes outputs into a traffic management center database. These values (measured flow and occupancy values for every 30-second time period at every detector) are then transmitted to the PeMS Oracle database in real time via the Caltrans wide-area network. PeMS performs database routines on the data, including detector diagnostics, imputation, speed calculations, performance measure computations, and aggregation. These processing steps have been fully described in Chapter 3 of the Guide.

Arterial

Although San Diego’s arterial facilities are managed by the cities in which they are physically located, SANDAG assists these local agencies in implementing the regional arterial management system, a regionwide traffic signal integration system that allows for interjurisdictional management and coordination of freeway–arterial interchanges.

As part of a project to evaluate technologies for monitoring arterial performance, SANDAG installed an arterial travel time monitoring system along 4 miles of Telegraph Canyon Road and Otay Lakes Road between I-805 and SR-125 in Chula Vista, a suburb in San Diego’s South Bay. The corridor has 18 sensor locations (nine in each direction of travel). The sensors deployed along this corridor are wireless magnetometer dots that directly measure travel times by reidentifying unique vehicle magnetic signatures across detector locations. In order to read a vehicle’s magnetic signature, the dots need to be deployed in series of five at each location. Consequently, 90 wireless magnetometer sensors have been deployed along this corridor.

After a vehicle passes over a sensor location, each set of five sensors wirelessly transmits the vehicle’s magnetic signature information to an access point on the side of the roadway. If the sensors are located further than 150 feet from the access point, a battery-operated repeater is needed to transmit the data from the sensor to the access point. The access point collects the sensor data and transmits it via Ethernet or a high-speed cellular modem to a data archive server in the traffic management center.
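These archived signatures are then matched between upstream and downstream stations to produce travel times. The toy sketch below shows one way such re-identification could work; the signature vectors, distance metric, and thresholds are illustrative assumptions, not the vendor's actual algorithm.

```python
def match_travel_times(upstream, downstream, max_dist=1.0,
                       min_tt=60.0, max_tt=900.0):
    """upstream/downstream: lists of (timestamp_s, signature_vector).

    For each upstream record, find the most similar downstream signature
    within a plausible travel time window; each good match yields one
    travel time sample (seconds).
    """
    def dist(a, b):
        # Euclidean distance between two signature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    travel_times = []
    for t_up, sig_up in upstream:
        candidates = [(dist(sig_up, sig_dn), t_dn - t_up)
                      for t_dn, sig_dn in downstream
                      if min_tt <= t_dn - t_up <= max_tt]
        if candidates:
            d, tt = min(candidates)       # closest signature match
            if d <= max_dist:             # reject implausible matches
                travel_times.append(tt)
    return travel_times

# Two vehicles observed at both stations (made-up 3-element signatures):
up = [(0.0, [1.0, 2.0, 3.0]), (5.0, [4.0, 1.0, 0.5])]
dn = [(240.0, [1.1, 2.0, 2.9]), (250.0, [4.0, 1.1, 0.4])]
print(match_travel_times(up, dn))
```

A deployed matcher would also handle missed detections and duplicate matches, typically via an assignment algorithm rather than independent nearest-neighbor picks.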
At the traffic management center, the magnetic signatures are matched between upstream and downstream sensor stations, and travel times are computed.

Transit

The largest share of San Diego County’s transit service is operated by the San Diego Metropolitan Transit System (MTS). MTS operates bus and light rail service (through its subsidiary, San Diego Trolley) in 570 square miles of the urbanized area of San Diego, as well as rural parts of East County, totaling 3,420 square miles of service area. To monitor its transit fleet, MTS has equipped over one-third of its bus fleet with AVL transponders and over one-half of its fleet with APC equipment. The AVL infrastructure allows for the real-time polling of buses to obtain real-time location and schedule adherence data. The APC data are not available in real time, but they can be used for offline analysis to report on system utilization and efficiency.

Data Management

Freeway

The primary data management software system in the region is PeMS. All Caltrans districts use PeMS for data archiving and performance measure reporting. PeMS integrates with a variety of other systems to obtain traffic, incident, and other types of data. It archives raw data, filters it for quality, computes performance measures, and reports them to users through the web at various levels of spatial and temporal granularity. It reports performance measures such as speed, delay, percentage of time spent in congestion, travel time, and travel time reliability. These performance measures can be obtained for specific freeways and routes and are also aggregated up to higher spatial levels, such as county, district, and state. These flexible reporting options are supported by the PeMS web interface, which allows users to select a date range over which to view data, as well as the days of the week and times of day to be processed into performance metrics. Since PeMS has archived data for San Diego County dating back to 1999, it provides a rich and detailed source of both current travel times and historical reliability information.

In Southern California, PeMS obtains volume and occupancy data for every detector every 30 seconds from the Caltrans ATMS, which governs operations at the district transportation management centers (TMCs). The ATMS is used for real-time operations such as automated incident detection and for handling special event traffic situations. ATMS data transmitted to the PeMS Oracle database support the majority of transportation performance measures reported by PeMS and serve as the primary source of data for the travel time system validations discussed in this case study.

PeMS integrates, archives, and reports on incident data collected from two sources: the California Highway Patrol (CHP) and Caltrans. CHP reports current incidents in real time on its website. PeMS obtains the text from the website, uses algorithms to parse the accompanying information, and inserts it into the PeMS database for display on a real-time map, as well as for archiving. Additionally, Caltrans maintains an incident database, called the Traffic Accident Surveillance and Analysis System (TASAS), which links to the highway database so that incidents and their locations can be analyzed.
PeMS obtains and archives TASAS incident data via a batch process approximately once per year. Incident data contained in PeMS have been leveraged to demonstrate use cases associated with how different sources of congestion affect travel time reliability. PeMS also integrates data on freeway construction zones from the Caltrans lane closure system, which is used by the Caltrans districts to report all approved closures for the next seven days, plus all current closures, updated every 15 minutes. PeMS obtains these data in real time from the lane closure system, displays them on a map, and lets users run reports on lane closures by freeway, county, district, or state. Lane closure data in PeMS were used in the validation of the use cases associated with how different sources of congestion affect travel time reliability.

Arterial

Arterial travel time systems are an emerging concept in San Diego. At present, detection to support arterial travel times is available on one corridor in the suburb of Chula Vista. San Diego uses the Arterial Performance Measurement System (A-PeMS), an arterial extension of PeMS that collects and stores arterial data. A-PeMS receives a live feed of travel times and volume data from a server at Sensys Networks (the manufacturer of the arterial sensors deployed on this corridor) and stores them in the PeMS database. Within PeMS, these data are integrated with information on each intersection’s signal timing, which allows for the computation of arterial performance measures. As part of the San Diego A-PeMS deployment, cycle-by-cycle timing plan

information is parsed from time-of-day signal-timing plans. A-PeMS can also integrate real-time signal-timing cycle lengths and phase green times from traffic signal controllers. The performance reporting capabilities within A-PeMS are similar to those within PeMS. Users can view arterial-specific performance measures such as control delay and effective green time, as well as general performance measures such as travel times.

Outside of the reliability and performance monitoring aspects of arterial operations, the various agencies operating within San Diego County, led by SANDAG, are working toward development of a regional arterial management system. This system has relevance to Project L02 because its signal-timing plan data could eventually be used to support the widespread monitoring of travel time variability on county arterials. This would facilitate a greater understanding of how different arterial facilities interact with one another, with transit service, and with freeway operations.

Transit

District 11 also uses a transit extension of PeMS, the Transit Performance Measurement System, to obtain schedule, AVL, and APC data from its existing real-time transit management system; compute performance measures from these data; and aggregate and store them for further analysis.

METHODOLOGICAL ADVANCES

Overview

One objective of the case studies was to test and refine the methods developed in Phase 2 for defining and identifying segment and route regimes for freeway and arterial networks. The team’s research to date has focused on identifying operational regimes based on individual vehicle travel times and determining how to relate these regimes to system-level information on average travel times. Since individual vehicle trip travel times on freeways are not available in the San Diego metropolitan region, data from the Berkeley Highway Laboratory (BHL) were used in this analysis.
Analysis Setting and Data

BHL is a 2.7-mile section of I-80 in west Berkeley and Emeryville, California. The BHL includes 14 surveillance cameras and 16 directional dual-inductive loop detector stations dedicated to monitoring traffic for research purposes. The sensors are a unique resource because they provide individual vehicle measurements. The system collects individual vehicle actuations from all 164 loops in the BHL every 1/60th of a second and archives both the actuation data and a large set of aggregated data, such as volumes and travel times. The loop data collection system currently generates approximately 100 megabytes of data per day. A suite of loop diagnostic tests has been developed over the last two years that continuously tests the data stream received from the loops and archives the test results.

The BHL loop data are unique because they provide event data on individual vehicle actuations, accurate to 1/60th of a second. Most other loop detector systems collect only aggregated data over periods of 20 seconds or longer. Collecting the individual

loop actuations allows the generation of data sets that are not found elsewhere, such as vehicle stream data, which can be used for headway studies, gap analysis, and merging studies. The BHL loops also provide individual vehicle length measurements, allowing for the classification of freeway traffic. Rich data sets of individual vehicle travel times are also available, stemming from research that developed a vehicle reidentification algorithm to calculate travel times between successive loop stations. A final benefit of the BHL data is that the corridor was temporarily instrumented with two Bluetooth reader stations (BTRs) along eastbound I-80. These stations record the time stamps and media access control (MAC) addresses of Bluetooth devices in passing vehicles. Travel times can be derived from the matching of MAC addresses between two readers. A map of the BTR locations is shown in Figure C.4.

Analysis was performed on a day’s worth of BHL data, collected on Tuesday, November 16, 2010. One data file was obtained for each of the two BTRs, with each file containing every MAC address captured by that sensor on that day. Some MAC address IDs were repeated within a file because passing devices can be sampled multiple times by a single reader. Since the BTRs are located along the eastbound side of the freeway, the majority of MAC address reidentifications were for eastbound traffic, though some westbound vehicles were also captured. There was a 1-hour gap in the data between 4:30 and 5:30 a.m. due to an error in the BHL database. Additionally, some of the initial time stamps in the file for the midnight hour were negative, possibly due to clock error. Six files of loop detector actuation data were also obtained. Together, these files contain all of the vehicle records at all of the BHL stations on this day.

Figure C.4. Bluetooth reader locations on I-80 eastbound.

Methodological Use Cases

Overview

Five concepts are important in this analysis:

1. Regardless of the data source, the methodology must always generate a full TT-PDF. All reliability measures can be generated from the PDF.

2. There are two types of PDFs:
   a. Those pertaining to the distribution of travel times derived from individual travelers along a segment or route, which accounts for travel time variability (for a route or a segment) among individual travelers and over time.
   b. Those pertaining to the distribution of the mean travel time along a segment or route, which accounts for variations in the mean travel time (for a segment or a route) over time.

3. It is desirable (and the team believes possible) to generate individual traveler TT-PDFs directly from some data sources (e.g., Bluetooth or global positioning system [GPS] sources) and indirectly from others (e.g., loop detectors or video).

4. The TT-PDFs can be reasonably characterized by a shifted gamma distribution with parameters α, β, and δ:
   a. α is the shape of the density function, with α > 1 implying that it has a lognormal-type shape.
   b. β is the spread in the density function, with larger values implying more spread.
   c. δ is the offset of the zero point from the value of zero, or, in this context, the smallest possible travel time.

5. A finite number of traffic states, or regimes, describe all possible TT-PDFs for a route or a segment. Regime PDFs can be continuously updated using real-time data.

For use cases that serve motorists in need of traveler information, the development of reliability statistics from individual TT-PDFs is ideal. The use cases examined in this section are shown in Table C.2. They are intended to provide information on recommended trip start times for constrained trips, subject to certain arrival time performance criteria.
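The shifted gamma density of concept 4 can be written directly in code. The following is an illustrative sketch, not part of the study's software; it uses only the Python standard library, and the example parameter values are made up.

```python
import math

def shifted_gamma_pdf(t, alpha, beta, delta):
    """Three-parameter (shifted) gamma density.

    alpha: shape; beta: scale (spread); delta: shift, i.e., the smallest
    possible travel time. Returns 0 for t < delta ("zero elsewhere").
    """
    if t < delta:
        return 0.0
    x = t - delta
    return (x ** (alpha - 1)) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

# With alpha = 1 the density degenerates to a shifted exponential,
# as the text notes below; here that gives e^(-2)/5 at t = 20.
print(shifted_gamma_pdf(20.0, 1.0, 5.0, 10.0))
```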
In this discussion, the analysis is focused on developing the PDF of travel times for those individual travelers who depart an origin in a prespecified time interval in order to meet a prespecified arrival time at the destination within an acceptable and specified level of risk. The size of the time interval is selected so as to ensure stationary travel conditions within the interval, as well as to capture a sufficient sample of travelers to characterize or update the developed travel time distribution.

It is hypothesized that the route travel time distribution can be stitched together from the distributions of the segment travel times that make up the route. This hypothesis is still subject to testing and validation using field data. Furthermore, it is assumed that there is a finite number of TT-PDFs (or regimes) that can fully characterize the travel time

distribution between an origin and a destination on a given route over a full year. Figure C.5 illustrates an example that uses four PDFs and a transition PDF (labeled T); each cell color represents a unique travel time regime based on historical travel time data for a given O–D pair on a given route.

TABLE C.2. USE CASES MC1, MC2, AND MC3

Use Case MC1
  Description: User wants to know in advance what time to leave for a trip and what route to take (planning-level analysis).
  What is known: Origin position, destination position, day of week, desired arrival time at destination.
  Desired deliverable: A list of alternative routes, with the mean travel time and required start time on each route to ensure meeting the arrival time 95% of the time.
  Metrics: Average O–D travel time by path, planning time.

Use Case MC2
  Description: User wants to know immediately what route to take and what time to leave for a trip to arrive on time at the destination (real-time analysis).
  What is known: Origin position, destination position, desired arrival time at destination.
  Desired deliverable: A ranked list of alternative routes, their mean travel time based on current conditions, and the required start time on each route to ensure meeting the arrival time 95% of the time.
  Metrics: Average O–D travel time by path, planning time.

Use Case MC3
  Description: User wants to know the extra time needed for a trip to arrive on time at the destination with a certain probability.
  What is known: Origin position, destination position, probability of arriving on time, day of week, time of day.
  Desired deliverable: Map of the route with the lowest travel time meeting the threshold, the route’s average travel time, the selected percentage travel time, and the buffer time.
  Metrics: Buffer time, percentage travel time, average travel time for O–D pair.

Note: O–D = origin–destination.

Figure C.5. Historical route TT-PDFs by time of day and day of the week (hours 05 through 23 across the top; days Monday through Sunday down the side; cells labeled AM, MD, PM, or T for transition).

It is further hypothesized that the individual auto travel times on links or routes can be characterized by a three-parameter (α, β, and δ) shifted gamma distribution, as shown in Equation C.1:

    g(t; \alpha, \beta, \delta) = \frac{(t - \delta)^{\alpha - 1} \, e^{-(t - \delta)/\beta}}{\beta^{\alpha} \, \Gamma(\alpha)}, \quad t \ge \delta \ (\text{and } 0 \text{ elsewhere})    (C.1)

where t = time. For α = 1, the gamma distribution degenerates into the shifted exponential distribution. Figure C.6 shows a diagram of the distribution for α > 1.0. There is a unique set of distribution parameters associated with each O–D pair, route, and PDF regime.

Figure C.6. Shifted gamma distribution of travel times (shift parameter δ, shape parameter α, scale parameter β; the horizontal axis is link or route travel time t, and the vertical axis is the probability density).

Use Case MC1: User Wants to Know in Advance What Time to Leave for a Trip and What Route to Take

The procedure for validating Use Case MC1 is depicted in Figure C.7. The top-right corner represents user-driven input, such as O–D selection, desired arrival time (DAT) at destination, and possible routes to be evaluated. The top-left corner represents field data collection of travel times to develop and update offline historical TT-PDFs that follow the shifted gamma distribution described above. The bottom section of Figure C.7 represents the actual algorithm used to determine the computed user start time in order to meet the DAT criterion. The outcomes, shown in the table in Figure C.7, are consistent with the Use Case MC1 results requirement, which is to create a list of alternative routes, each with a start time that allows for an on-time arrival 95% of the time, where the start time is based on the average travel time. Based on this example, the entry time PDF consistent with the DAT of 8:40 a.m. is the 8:00 to 10:00 a.m. entry time. An example application of the procedure using hypothetical travel time parameter values is shown in Figure C.8. The procedure works as follows:

1. User enters origin, destination, and a DAT of 8:40 a.m. at the destination on a Thursday.

2. The user or the system identifies (or retrieves from a route library) a finite number of routes connecting the input O–D (or nearby locations). The first route is labeled Route 1.

Figure C.7. Validation process for Use Case MC1.

Figure C.8. Example application of Use Case MC1.

3. The system identifies the relevant time-dependent PDF (the morning peak) consistent with the user-specified DAT and day of week. It represents all travel times for entry times between 8:00 and 10:00 a.m. on Thursdays.

4. Based on the retrieved PDF, achieving a 95% on-time arrival requires a planned 30-minute travel time, compared with the average travel time of 23 minutes.

5. Thus, the recommended start time (ST) is 8:10 a.m.

Other DAT scenarios and outcomes are also shown in the table in Figure C.7.

Use Case MC3: User Wants to Know the Extra Time Needed for a Trip to Arrive on Time at Destination with a Certain Probability

This use case represents a simple variation of Use Case MC1 and is therefore discussed before the real-time use case, which is MC2. Here, for a known O–D, DAT, and day of week, the user is interested in identifying a route, average travel time (AT), and planned travel time (PT) that will ensure his or her on-time arrival R% of the time. The algorithm for MC1 is adjusted slightly to meet these new requirements, as shown in Figure C.9. The hypothesized PDFs for the two candidate routes are shown in Figure C.10. These are designed to highlight the contrast between a shorter route (Route 2) and a more reliable route (Route 1). In this case, the system would recommend the selection of Route 1 and a departure time of no later than 8:44 a.m. to guarantee arrival at the destination by 9:40 a.m. with 90% certainty. The user would have to depart 10 minutes earlier on Route 2 to achieve the same probability of on-time arrival. This is confirmed by comparing the buffer times between the two routes.

Figure C.9. Validation process for Use Case MC3.
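The core computation shared by Use Cases MC1 and MC3, finding the planned travel time at a given on-time probability and the corresponding latest start time, can be sketched as follows. This is an illustrative stdlib-only sketch: the percentile is estimated by Monte Carlo because the standard library has no inverse incomplete gamma function, and the regime parameters are hypothetical, not fitted to case study data.

```python
import random

def planning_travel_time(alpha, beta, delta, on_time_prob=0.95, n=200_000, seed=42):
    """Estimate the on_time_prob percentile of a shifted gamma TT-PDF by simulation.

    alpha: shape; beta: scale (minutes); delta: shift, the minimum travel time (minutes).
    """
    rng = random.Random(seed)
    samples = sorted(delta + rng.gammavariate(alpha, beta) for _ in range(n))
    return samples[int(on_time_prob * n)]

def recommended_start(dat_minutes, alpha, beta, delta, on_time_prob=0.95):
    """Latest start time (minutes after midnight) that meets the DAT
    on_time_prob of the time under the given regime PDF."""
    return dat_minutes - planning_travel_time(alpha, beta, delta, on_time_prob)

# Hypothetical morning-peak regime; a DAT of 8:40 a.m. is 520 minutes after midnight.
start_minutes = recommended_start(520, alpha=4.0, beta=3.0, delta=15.0)
```

The same routine serves MC3 by passing the user-specified probability R (e.g., 0.90) as `on_time_prob`; the buffer time is then the planned travel time minus the mean, delta + alpha * beta.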

Use Case MC2: User Wants to Know Immediately What Route to Take and What Time to Leave for a Trip to Arrive on Time at a Destination

This use case is different and more challenging to demonstrate than MC1 or MC3. It also represents the application with the highest utility from the driver’s perspective because it will provide real-time information on the recommended trip start time, including the effects of incidents or other events not explicitly accounted for in historical TT-PDFs. The principal issue, therefore, is how to combine the historical and real-time data streams to provide up-to-date travel time estimates and predictions based on current conditions. As an example, during major weekend road construction projects, the more accurate distribution may be the weekday morning peak profile, rather than the historical weekend TT-PDF.

Several stipulations are important to note:

• It is possible that there are no feasible solutions to the current user request. A departure at the earliest departure time may not guarantee the user’s DAT at the specified probability R on some or all of the feasible routes.

• Although historical PDFs are important, they are not appropriate for use in a real-time context. The system must be able to detect in which PDF regime each link or route is operating, based on the real-time data stream.

• The PDF regime selection process is similar to the plan selection algorithm used in many urban traffic signal control systems. Those algorithms collect traffic data (typically key link volumes and occupancies) to be matched with the signal plans most appropriate for the collected data patterns.

Figure C.10. Illustration of a reliable (top) and a faster average (bottom) route PDF.

• In a real-time context, when computational speed is of the essence, the number of PDFs to be considered should be kept to a minimum. Each link or route could theoretically be considered to operate in four regimes: uncongested, transition from uncongested to congested, congested, and transition from congested to uncongested.

The procedure for Use Case MC2 is shown in Figure C.11. It assumes that there are three feasible alternate routes, that the earliest departure time is 8:15 a.m., and that the DAT is 9:40 a.m. The system checks which of the routes is feasible and determines the required start time assuming average and 95th percentile travel times. In this case, Route 3 is deemed infeasible, while Routes 1 and 2 are both feasible.

Route Selection Criteria

An interesting byproduct of the use case analyses is the possibility of developing additional route selection criteria that can account for the differential utilities of early and late arrivals. Thus far, the selection between routes has been made on the basis of the route yielding the latest trip start time while ensuring a prespecified on-time arrival probability (e.g., Route 1 in Figure C.11). Specifying different penalty functions for late and early arrivals could change the selection.

Figure C.11. Validation process for Use Case MC2.
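One simple way to realize the regime-matching idea from the stipulations above (analogous to signal plan selection) is to score each stored regime PDF by the log-likelihood of the most recent travel time observations and pick the best-scoring regime. This is an illustrative sketch only; the regime parameter values below are hypothetical, and a production system would use the regimes fitted for each link or route.

```python
import math

# Hypothetical shifted gamma (alpha, beta, delta) parameters per stored regime.
REGIME_PDFS = {
    "uncongested": (5.6, 1.5, 13.3),
    "transition": (3.2, 4.9, 14.8),
    "congested": (3.6, 8.8, 18.3),
}

def shifted_gamma_log_pdf(t, alpha, beta, delta):
    """Log of the shifted gamma density; -inf outside the support t > delta."""
    if t <= delta:
        return float("-inf")
    x = t - delta
    return (alpha - 1) * math.log(x) - x / beta - alpha * math.log(beta) - math.lgamma(alpha)

def detect_regime(recent_travel_times):
    """Return the regime whose stored TT-PDF best explains the recent
    observations (maximum summed log-likelihood)."""
    return max(
        REGIME_PDFS,
        key=lambda name: sum(
            shifted_gamma_log_pdf(t, *REGIME_PDFS[name]) for t in recent_travel_times
        ),
    )

print(detect_regime([52.0, 48.0, 61.0]))  # prints "congested"
```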

Analysis of Bluetooth Travel Times

To support the methodologies presented in Use Cases MC1, MC2, and MC3, Bluetooth data from the BHL were analyzed to see what could be learned about individual vehicle travel times and their PDFs. The raw data were filtered to remove MAC addresses with six or more time stamps on either reader. Contiguous time stamps from the same vehicle were averaged to obtain an estimate of when the vehicle was adjacent to the sensor. The filtering process resulted in a data set of 5,028 travel time measurements. These were filtered a second time to remove observations for which the speed between the readers was below 5 mph. This resulted in 5,012 final measurements. These travel times and speeds are plotted in Figure C.12 and Figure C.13, respectively. By inspection, three time periods of operative regimes were identified:

• Free flow: 0:00:00–14:30:00 and 19:45:00–23:59:59;
• Transition: 14:30:00–15:45:00 and 19:30:00–19:45:00; and
• Congested: 15:45:00–19:30:00.

Figure C.12. BHL Bluetooth-measured travel times for November 16, 2010.
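The matching-and-filtering pipeline described above might look like the following sketch. The thresholds mirror the description (drop MACs with six or more time stamps at a reader, discard matches implying under 5 mph); as a simplification, all of a MAC's time stamps at a reader are averaged rather than only contiguous clusters, and the reader spacing is a hypothetical value.

```python
from collections import defaultdict

READER_SPACING_MILES = 2.0  # hypothetical distance between the two BTRs

def mean_detection_time(records, max_hits=5):
    """Average each MAC address's time stamps (seconds) at one reader.

    MACs seen six or more times are dropped, as in the study's first filter.
    """
    by_mac = defaultdict(list)
    for mac, timestamp in records:
        by_mac[mac].append(timestamp)
    return {mac: sum(ts) / len(ts) for mac, ts in by_mac.items() if len(ts) <= max_hits}

def match_travel_times(upstream, downstream, min_mph=5.0):
    """Match MAC addresses across the two readers and keep travel times
    whose implied speed is at least min_mph (the study's second filter)."""
    up, down = mean_detection_time(upstream), mean_detection_time(downstream)
    result = {}
    for mac in up.keys() & down.keys():
        tt_seconds = down[mac] - up[mac]
        if tt_seconds > 0 and READER_SPACING_MILES / (tt_seconds / 3600.0) >= min_mph:
            result[mac] = tt_seconds
    return result
```

For example, a MAC detected at 100 s and 102 s upstream (averaged to 101 s) and at 220 s downstream yields a 119-second travel time, which is kept because the implied speed is well above 5 mph.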

The resulting distribution of the Bluetooth data regime classifications is shown in Table C.3.

TABLE C.3. BLUETOOTH DATA REGIME CLASSIFICATIONS

Category     Flag   Observations
Free flow    1      2,679
Transition   2      484
Congested    3      1,849

The data were analyzed using EasyFit software to see how different PDFs fit the data and to estimate the parameters for each density function. Tables C.4, C.5, and C.6 present the goodness-of-fit results down to the three-parameter gamma distribution [gamma (3P)], sorted by the Anderson–Darling statistic. Figures C.14, C.15, and C.16 show the resulting plots of the gamma (3P) density functions. The gamma (3P) fits relatively well for the free-flow and congested conditions. It is likely that there will be multiple transition regimes, and the gamma (3P) fit may be improved for stratified transition regimes. Later case studies determined that the distributions were too complex to be adequately characterized by gamma or similar distributions. A nonparametric representation of the distribution is generally recommended.

Figure C.13. BHL Bluetooth-measured speeds for November 16, 2010.
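EasyFit performs the parameter estimation internally. As an illustration of one simple alternative (not EasyFit's algorithm), a method-of-moments fit of the gamma (3P) can be sketched with the standard library: take the shift just below the minimum observation (a crude but common heuristic), then match the sample mean and variance, since the shifted gamma has mean δ + αβ and variance αβ².

```python
import random
import statistics

def fit_shifted_gamma_mom(samples):
    """Method-of-moments fit of a gamma (3P): returns (alpha, beta, delta).

    delta is set just below the minimum observation; alpha and beta then
    follow from matching the sample mean and variance.
    """
    delta = min(samples) * 0.99
    m = statistics.fmean(samples)
    v = statistics.variance(samples)
    beta = v / (m - delta)
    alpha = (m - delta) ** 2 / v
    return alpha, beta, delta

# Check on synthetic data: gamma(shape 4, scale 2) shifted by 20 seconds.
rng = random.Random(7)
demo = [20.0 + rng.gammavariate(4.0, 2.0) for _ in range(50_000)]
alpha, beta, delta = fit_shifted_gamma_mom(demo)
```

By construction the fitted distribution reproduces the sample mean exactly (delta + alpha * beta equals the sample mean), while the shape and scale are approximate.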

TABLE C.4. GOODNESS-OF-FIT RESULTS FOR FREE-FLOW REGIME

Each row gives the statistic and (rank) for the Kolmogorov–Smirnov (K–S), Anderson–Darling (A–D), and chi-squared tests.

Distribution                 K–S Stat (Rank)   A–D Stat (Rank)   Chi-Sq Stat (Rank)
Pearson 5 (3P)               0.01352 (1)       1.0731 (1)        6.7917 (1)
Pearson 6 (4P)               0.01377 (2)       1.0875 (2)        7.2844 (2)
Dagum                        0.01795 (4)       1.4442 (3)        13.684 (3)
Burr (4P)                    0.02118 (6)       2.1326 (4)        22.291 (5)
Generalized logistic         0.01975 (5)       2.3254 (5)        23.294 (6)
Log logistic (3P)            0.02309 (9)       2.4624 (6)        25.444 (8)
Frechet (3P)                 0.02139 (7)       2.726 (7)         23.419 (7)
Generalized extreme value    0.02174 (8)       2.9185 (8)        27.343 (9)
Burr                         0.02749 (11)      3.748 (9)         30.88 (10)
Lognormal (3P)               0.0172 (3)        5.798 (10)        16.413 (4)
Frechet                      0.03534 (15)      7.4445 (11)       44.274 (13)
Generalized gamma (4P)       0.02908 (12)      11.258 (12)       51.262 (14)
Inverse Gaussian (3P)        0.03043 (13)      11.749 (13)       36.427 (11)
Fatigue life (3P)            0.03055 (14)      11.915 (14)       38.129 (12)
Log logistic                 0.04617 (19)      12.611 (15)       117.0 (18)
Pearson 5                    0.03864 (17)      13.959 (16)       69.484 (15)
Gamma (3P)                   0.03686 (16)      18.252 (17)       84.176 (16)

Note: 3P and 4P = three parameter and four parameter, respectively.

Figure C.14. Three-parameter gamma distribution for free-flow regime (α = 5.6131, β = 1.5248, γ = 13.252).

TABLE C.5. GOODNESS-OF-FIT RESULTS FOR TRANSITION REGIME

Distribution                 K–S Stat (Rank)   A–D Stat (Rank)   Chi-Sq Stat (Rank)
Burr                         0.02676 (1)       0.81555 (1)       16.645 (1)
Burr (4P)                    0.02709 (2)       0.82053 (2)       19.224 (2)
Johnson SU                   0.03208 (4)       0.95065 (3)       22.543 (10)
Dagum (4P)                   0.03373 (6)       0.97157 (4)       21.446 (6)
Dagum                        0.03512 (7)       1.0092 (5)        21.472 (7)
Generalized extreme value    0.03317 (5)       1.0168 (6)        22.294 (8)
Frechet                      0.02965 (3)       1.058 (7)         20.188 (5)
Frechet (3P)                 0.03808 (10)      1.1188 (8)        22.351 (9)
Log logistic (3P)            0.03514 (8)       1.1732 (9)        19.289 (3)
Generalized logistic         0.03904 (12)      1.2923 (10)       19.714 (4)
Pearson 5 (3P)               0.04297 (13)      1.3807 (11)       23.961 (11)
Pearson 6 (4P)               0.04478 (14)      1.5089 (12)       25.479 (12)
Lognormal (3P)               0.05205 (15)      2.0513 (13)       29.111 (13)
Inverse Gaussian (3P)        0.05956 (16)      2.5343 (14)       33.83 (14)
Fatigue life (3P)            0.06274 (18)      2.8342 (15)       37.124 (16)
Generalized gamma (4P)       0.06117 (17)      3.0654 (16)       36.792 (15)
Log Pearson 3                0.03856 (11)      5.3555 (17)       na
Pearson 5                    0.08043 (21)      5.4889 (18)       47.002 (17)
Wakeby                       0.03568 (9)       5.4998 (19)       na
Gamma (3P)                   0.08052 (22)      5.6067 (20)       53.894 (20)

Note: na = not available.

Figure C.15. Three-parameter gamma distribution for transition regime (α = 3.2116, β = 4.9403, γ = 14.826).

TABLE C.6. GOODNESS-OF-FIT RESULTS FOR CONGESTED REGIME

Distribution                 K–S Stat (Rank)   A–D Stat (Rank)   Chi-Sq Stat (Rank)
Fatigue life (3P)            0.02266 (3)       0.9031 (1)        13.229 (4)
Inverse Gaussian (3P)        0.02297 (4)       0.94129 (2)       13.141 (3)
Gamma (3P)                   0.02734 (10)      1.0359 (3)        13.408 (5)

Figure C.16. Three-parameter gamma distribution for congested regime (α = 3.5995, β = 8.8466, γ = 18.318).

The three PDFs are superimposed in Figure C.17. It is apparent that the free-flow PDF has a lower mean travel time, a smaller standard deviation, and the lowest 95th percentile value. The congested PDF is at the other extreme, with the largest mean, the largest standard deviation, and the highest 95th percentile value. Not unexpectedly, the PDF for the transition regime lies between these two. The numerical values are presented in Table C.7.

TABLE C.7. THREE-PARAMETER GAMMA DISTRIBUTION MEANS, STANDARD DEVIATIONS, AND 95TH PERCENTILES

Condition     Mean (s)   SD (s)   95th Percentile (s)
Uncongested   21.8       3.57     28.3
Transition    30.4       9.11     47.7
Peak          50.0       17.0     83.5
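The Table C.7 values follow, approximately, from the fitted parameters: the shifted gamma has mean δ + αβ and standard deviation β√α in closed form, while the 95th percentile can be estimated by simulation when no inverse incomplete gamma routine is available. A stdlib-only sketch using the free-flow parameters reported in Figure C.14 (where the shift is labeled γ):

```python
import math
import random

def shifted_gamma_stats(alpha, beta, delta, n=200_000, seed=0):
    """Mean and SD via closed forms; 95th percentile by Monte Carlo."""
    mean = delta + alpha * beta
    sd = beta * math.sqrt(alpha)
    rng = random.Random(seed)
    samples = sorted(delta + rng.gammavariate(alpha, beta) for _ in range(n))
    return mean, sd, samples[int(0.95 * n)]

# Free-flow regime parameters from Figure C.14.
mean, sd, p95 = shifted_gamma_stats(5.6131, 1.5248, 13.252)
# mean is about 21.8 s and p95 about 28 s, close to the Table C.7 values.
```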

Conclusions

This analysis examined BHL data to see if operative regimes for individual vehicle travel times can be identified from Bluetooth data. The research team concluded that this can, indeed, be done. Based on more than 5,000 observations of individual travel times, three regimes were identified: (a) off peak or uncongested, (b) peak or congested, and (c) transition between congested and uncongested. All three can be characterized by three-parameter gamma density functions, although later case studies determined that a nonparametric representation is generally better than gamma or other simple density functions. The PDF for the free-flow condition has the lowest mean, the smallest standard deviation, and the lowest 95th percentile; the congested PDF is at the other extreme; and the transition PDF is in between.

Further investigation is needed into the individual vehicle PDFs and the parameters that describe them, but the efficacy of the concepts seems sound. Two issues that need to be explored in the near future are (a) how the PDFs for individual vehicle travel times relate to mean travel times (e.g., those computed from loop detectors) during the same time periods and (b) whether there are ways to retrieve information from loop detectors that would help to infer the PDFs that describe individual vehicle travel times.

Figure C.17. Three-parameter gamma distributions for three regimes.

USE CASE ANALYSIS

Overview

Chapter 4 of the Guide and Appendix D: Use Case Analyses present dozens of use cases intended to satisfy the myriad ways that different classes of users can derive value from a reliability monitoring system. For the San Diego case study, various use cases were combined to form six high-level use cases that broadly encompass the types of reliability information that users are most interested in and that were suited for validation using the San Diego data sources. These six use cases, their primary user groups, and the Guide use cases that they encompass are shown in Table C.8.

TABLE C.8. DEMONSTRATED USE CASES IN SAN DIEGO

Freeways
  Conducting offline analysis on the relationship between travel time variability and the seven sources of congestion
    Primary users: Planners and roadway managers
    Guide subuse cases: MC4, PE1, PE2, PE3, PE4, PE5, PE11, PP1
  Using planning-based reliability tools to determine departure time and travel time for a trip
    Primary users: Motorists
    Guide subuse cases: MC1, MC2, MC3
  Combining real-time and historical data to predict travel times in real time
    Primary users: Operations managers
    Guide subuse cases: MM1, MM2, MC5

Transit
  Using planning-based reliability tools to determine departure time and travel time for a trip
    Primary users: Transit riders
    Guide subuse cases: TP1, TS2, TO2, TC4
  Conducting offline analysis on the relationship between travel time variability and the seven sources of congestion
    Primary users: Transit planners and managers
    Guide subuse cases: PE1, PE2, PE3, PE4, PE5, PE11, PP1

Freight
  Using historical data to evaluate freight travel time reliability
    Primary users: Drivers and freight carriers
    Guide subuse cases: FP1, FP3, FP4, FP6

Each of the three following subsections presents the analytical results of validating the use cases for freeways, transit, and freight with reliability monitoring system data and methods.
Freeways

Use Case 1: Conducting Offline Analysis on the Relationship Between Travel Time Variability and the Seven Sources of Congestion

Summary

This use case aims to quantify the impacts on travel time variability of the seven sources of congestion: incidents, weather, work zones, fluctuations in demand, special events, traffic control devices, and inadequate base capacity. To perform this analysis,

methods were developed to create TT-PDFs from large data sets of travel times that occurred under each event condition. From these PDFs, summary metrics such as the median travel time and planning travel time were computed to show the variability impacts of each event condition.

Users

This use case has broad applications to different user groups. For planners, knowing the relative contributions of the different sources of congestion toward travel time reliability helps to better prioritize travel time variability mitigation measures on a facility-specific basis. For example, if unreliability on a particular route is predominantly caused by the frequent occurrence of incidents, planners may want to consider measures such as freeway service patrol tow truck deployments to help clear incidents faster. If unreliability on a particular route has a high contribution from special event traffic impacts, planners may want to consider providing better traveler information before events to inform travelers of alternate routes.

The outputs of this use case are also of value to operators, providing them with information on the range of operating conditions that can be expected on a route given certain source conditions. Knowing the historical impacts of the different sources of congestion helps operators better manage similar conditions in real time by, for example, changing ramp-metering schemes to mitigate congestion or posting expected travel times on variable message signs. It is important for operators to have outputs from this use case at a time-of-day level. For example, on some facilities, incidents may significantly affect reliability during one or more peak hours, but may have little impact during the midday due to lower baseline traffic volumes.
On some facilities, weather may have a major impact at all times of the day, since all vehicles may need to slow to travel safely in adverse weather conditions. Understanding the time dependency of variability impacts would help operators more effectively manage events as they occur.

Finally, the outputs of this use case have value to travelers by providing better predictive travel times under certain event conditions that could be posted in real time on variable message signs or traveler information websites. This information would help users better know what to expect during their trip, both during normal operating conditions and when an external event is occurring.

Sites

Two routes were selected for the evaluation of this use case to highlight the varying contributions of congestion factors to travel time reliability across different facilities, days of the week, and times of the year. These routes are shown in Figure C.18. The first route analyzed is a 10-mile stretch of westbound I-8 beginning at Lake Murray Boulevard in the eastern suburb of La Mesa and ending at I-5 north of the San Diego International Airport. This route was selected because it provides access to Qualcomm Stadium, located at the major interchange of I-8 and I-15, which hosts San Diego Chargers football games, as well as college football bowl games, concerts, and other events. Because this route is a major commute route, the impacts of the sources on

235 GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY travel time variability were investigated for weekdays between the months of Novem- ber and February (when Qualcomm Stadium regularly hosts events and when San Diego experiences the most inclement weather). The second route is a 27-mile stretch of northbound I-5 beginning just south of the I-805 interchange in San Diego and ending north of SR-78 in the northern suburb of Oceanside. This route was selected because it has a significant amount of congestion and incidents, and it sees special event traffic impacts during the summer months due to the San Diego County Fair and Del Mar horse races. The route also has significant traffic congestion on weekends. For this reason, travel time variability and its relation- ship with the sources of congestion were evaluated over a year-long period on Satur- days and Sundays. Methods These routes were analyzed to determine the travel time variability impacts caused by five sources of congestion: incidents, weather, special events, lane closures, and fluc- tuations in demand. Traffic control contributions were not investigated because ramp- metering location and timing data could not be obtained. The impacts of inadequate base capacity were also not considered due to the difficulty of quantifying this factor. For each route, 5-minute travel times were gathered from PeMS for each day in the time period of analysis (four months of weekdays for the westbound I-8 route and one year of weekends for the northbound I-5 route). To ensure data quality, 5-minute travel times computed from more than 20% imputed data were discarded from the data set. Figure C.18. Freeway Use Case 1 routes. Map data © 2012 Google. Westbound I-8 Northbound I-5

236 GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY To link travel times with the source condition active during their measurement, each 5-minute travel time was tagged with one of the following sources: baseline, incident, weather, special event, lane closure, or high demand. A travel time reliability monitoring system (TTRMS) that supports this use case would ideally integrate data on external sources of freeway congestion such as incidents, weather, lane closures, special events, and demand levels. The PeMS operational in San Diego integrates state- wide TASAS incident data and statewide lane closure data from Caltrans. PeMS also reports peak-period vehicle miles traveled (VMT) data for freeway routes. These PeMS data were used to evaluate the relationship between travel time variability and inci- dents, lane closures, and fluctuations in demand. Hourly weather data from the Auto- mated Weather Observing System station at the San Diego International Airport was obtained from the National Data Center of the National Oceanic and Atmospheric Administration. Special event data were collated manually from various sports and event calendars for venues adjacent to the study routes. Travel times were tagged as follows: • Baseline. A travel time was tagged with baseline if none of the factors was active during that 5-minute time period. • Incident. A travel time was tagged with incident if an incident was active anywhere on the route during that 5-minute time period. Incident start times and durations reported through PeMS were used to determine when incidents were active along the route. Incidents with durations shorter than 15 minutes were not considered. • Weather. A travel time was tagged with weather if the weather station used for data collection reported precipitation during that hour. • Special event. A travel time was tagged with special event if a special event was active at a venue along the route during that time period. 
Special event time periods were determined from the start time of the event and the expected duration of that event type. For example, if a football game at Qualcomm Stadium had a start time of 6:00 p.m. and was scheduled to end around 9:00 p.m., the event was consid- ered active between 4:00 and 6:00 p.m. and between 8:30 and 10:00 p.m., as this is when the majority of traffic would be accessing the freeways surrounding the venue. • Lane closure. A travel time was tagged with lane closure if a lane closure (sched- uled or emergency) was active anywhere along the route during that time period. • High demand. A travel time was tagged with high demand if the VMT measured during that time period were more than 10% higher than the average VMT for that time period. This approach was adapted from the SHRP 2 L03 project, which considered high demand to be any time period during which demand was 5% higher than the average for that time period. For the L02 research effort a 10% increase was selected because a 5% increase in demand had no measurable impact on travel times on either of the selected corridors.
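The tagging rules above can be sketched as a small classifier. The function below is illustrative, not from the report: the inputs are hypothetical precomputed flags, and only the thresholds (the 15-minute incident minimum and the 10% VMT criterion) and the incident-over-weather precedence come from the text; the rest of the priority ordering is an assumption.

```python
def tag_travel_time(incident_active, precipitation, special_event_active,
                    lane_closure_active, vmt, avg_vmt):
    """Tag one 5-minute travel time with its active congestion source.

    Inputs are hypothetical precomputed flags; incident_active should
    already exclude incidents shorter than 15 minutes. When several
    sources coincide, the report assigns the one judged to have the
    larger impact (e.g., incident over light precipitation); the full
    ordering below is an assumed priority, not the report's.
    """
    if incident_active:            # incident active anywhere on the route
        return "incident"
    if special_event_active:       # event access/egress window at a venue
        return "special event"
    if lane_closure_active:        # scheduled or emergency closure
        return "lane closure"
    if precipitation:              # weather station reported rain that hour
        return "weather"
    if vmt > 1.1 * avg_vmt:        # more than 10% above average VMT
        return "high demand"
    return "baseline"
```

Each 5-minute travel time in the data set would be passed through a function like this before the PDFs are assembled.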

237 GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY During a few time periods within each data set more than one factor was active during a single 5-minute period. In these cases, the travel time was tagged with the factor that was deemed to have the larger travel time impact (e.g., when an incident coincided with light precipitation, the travel time was tagged with incident). Tagged travel times were divided into categories based on the time of the day, because the impacts of the congestion sources are time dependent. For the westbound I-8 route, which was analyzed for weekdays, two time periods were evaluated: morn- ing peak, 7:00 to 9:00 a.m.; and afternoon peak, 4:00 to 8:00 p.m. For the northbound I-5 route, which was analyzed for weekends, two time periods were also evaluated: morning, 8:00 a.m. to noon; and afternoon, noon to 9:00 p.m. Finally, within each time period, TT-PDFs were assembled separately for all travel times and for those occurring during each source condition. The PDFs were plotted and summarized in various ways to give a thorough description of how the sources of congestion affect travel time variability and conditions on a route. Route 1 Results For the westbound I-8 route, travel time variability and its contributing factors were investigated for weekdays during the 4-month period between November 2008 and February 2009. Data on incidents, weather, lane closures, special events, and fluctua- tions in demand were collected from PeMS and external sources as described in the section on methods. Due to the preference of scheduling freeway lane closures dur- ing overnight, weekend hours, no lane closures were active on the route during the selected hours and date range. As a result, the contribution of lane closures to travel time variability on this route is zero. Analysis of VMT for the demand fluctuations component showed that demand is very steady and consistent on this corridor. 
Only three days were identified as having a demand level not otherwise attributable to a special event that exceeded 10% of the average weekday demand level. All of these hours of high demand were during the 4-month period. Morning Peak. Figure C.19 illustrates the distribution of 5-minute travel times in the morning period (7:00 to 9:00 a.m.), divided by source condition. The morning period is the peak period for commute traffic on this route, since it begins in the east- ern suburbs and terminates near downtown San Diego. As such, it is the time period with the most travel time variability. As shown by the plot, there is a wide distribution of travel times during the morning hours, ranging from approximately 8.5 minutes free flow to 25 minutes at a maximum, a travel time measured when there was an inci- dent. The only source conditions active during the weekday morning period over the 4-month study period were incidents and precipitation. No special events or hours of high demand were noted. The histogram shows that almost 25% of the time, the travel time is a near-free-flow 9 minutes. The travel time only falls below 9 minutes when there is no external source of congestion active. The tail end of the travel time distribu- tion, however, is dominated by weather and incident events. In particular, travel times ranging between 15 and 20 minutes (or double the free-flow travel time) only occur when either an incident or a weather event is active. Travel times greater than 20 min- utes only occur when there is an incident on the route.

Figure C.19. Weekday morning peak distribution of travel times for westbound I-8.

Interestingly, it is apparent from this graph that sometimes, even when an incident is active, the travel time falls below 10 minutes. This is likely because this analysis does not account for the severity of incidents in the travel time tagging process. The incident travel times shown in this figure that are near the median are likely minor incidents that were promptly moved to the shoulder and then cleared. Another way of viewing the travel time reliability impacts of different sources is to plot the TT-PDFs under each source condition. TT-PDFs for the baseline, incident, and weather conditions are shown in Figure C.20. The PDFs shown in this use case were assembled using nonparametric kernel density estimation. As the baseline PDF plot shows, the distribution of travel times is very narrow when there is no external congestion source active on the corridor; there is only a 2-minute difference between the median travel time and the 95th percentile travel time in this case. When an incident is active

on the corridor, the distribution of travel times is much wider. An incident increases the median travel time on the facility by 2 minutes over the baseline condition and, with a 95th percentile travel time of 18.7 minutes, requires travelers to add a buffer time of 9.8 minutes, almost doubling their typical commute, to arrive on time during an incident. A weather event increases the median travel time even higher, to 15 minutes, resulting in a buffer time comparable to that caused by an incident. A final way of summarizing this analysis is shown in Table C.9, which lists the percentage of time that each source condition was active when travel times exceeded the 85th percentile travel time (10.6 minutes) and the 95th percentile travel time (15.0 minutes). As shown in the table, each of the three source conditions (baseline, incidents, and weather) occurred approximately one-third of the time that travel times exceeded the 85th percentile. For travel times that exceeded the 95th percentile, weather was responsible for the largest share, followed by incidents. When the travel time exceeds the 95th percentile during the morning period on this facility, there is almost always some type of causal condition active on the roadway. Figure C.20. Weekday morning peak TT-PDFs for westbound I-8.
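One way to reproduce the kernel density estimation and the summary metrics quoted above (median, 95th percentile planning time, and their difference as buffer time) is sketched below. The sample data and bandwidth are illustrative assumptions; the report does not specify its kernel or estimator settings.

```python
import numpy as np

def tt_pdf(samples, grid, bandwidth=0.5):
    """Nonparametric Gaussian-kernel density estimate of a TT-PDF."""
    s = np.asarray(samples, dtype=float)[:, None]
    g = np.asarray(grid, dtype=float)[None, :]
    z = (g - s) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=0) / (len(s) * bandwidth * np.sqrt(2 * np.pi))

def reliability_metrics(samples):
    """Median, planning (95th percentile), and buffer time in minutes."""
    tt = np.asarray(samples, dtype=float)
    median = float(np.percentile(tt, 50))
    planning = float(np.percentile(tt, 95))
    return {"median": median, "planning": planning, "buffer": planning - median}

# Hypothetical incident-condition travel times (minutes):
incident_tts = [9.0, 9.2, 9.8, 10.5, 11.9, 13.4, 15.2, 18.7]
grid = np.linspace(5, 25, 401)
pdf = tt_pdf(incident_tts, grid)          # curve to plot, as in Figure C.20
metrics = reliability_metrics(incident_tts)
```

The buffer time here is planning time minus median, consistent with the figures quoted in the text (an 18.7-minute planning time against a roughly 8.9-minute median gives the 9.8-minute buffer).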

TABLE C.9. WEEKDAY MORNING PEAK TRAVEL TIME VARIABILITY CAUSALITY FOR WESTBOUND I-8

Source           Active When Travel Time Exceeded
                 85th Percentile   95th Percentile
Baseline              37.7%              3.3%
Incident              31.2%             41.1%
Weather               30.6%             55.6%

The conclusion from the morning time period analysis is that weather almost always slows down travel times significantly. Travelers need to plan to more than double their travel time over the typical condition when it is raining on this route. Incidents have a wider range of impacts on the corridor, depending on their severity. At the 95th percentile level, incidents increase travel times by almost 10 minutes over the median condition. Given no incidents or weather on this route, travelers can expect to see a travel time less than the 14.5-minute 95th percentile. Thus, when no nonrecurrent sources of congestion are active, travelers need only add a buffer time of 5.5 minutes to arrive at their destination on time.

Afternoon Peak. The same analysis was also conducted for the afternoon peak period. The travel time variability source analysis for the afternoon period includes two factors that were not active during the morning: special events and high demand. There were three special events active on this corridor over the study period: one San Diego Chargers Monday Night Football game and two college football games. All three events took place at Qualcomm Stadium. In addition, there were three time periods over the study date range that experienced greater than 1.1 times the normal demand level that were unrelated to special events. The breakdown of travel times by source is shown in Figure C.21. Since the majority of traffic on this route commutes during the morning time period, the distribution of travel times during the afternoon period is narrow: there is a difference of only 0.7 minutes between the median travel time and the 95th percentile travel time.
Travel times exceeding the 95th percentile have contributions from multiple factors. Travel times between 10 and 12 minutes appear to be predominately caused by precipitation. Travel times exceeding 12 min- utes appear to be caused by incidents or special events. The travel times measured dur- ing high-demand time periods do not vary significantly from the median travel time. Figure C.22 shows the different PDFs for the five source conditions active during the afternoon period over the 4 months. At a glance, it is clear that the baseline and high-demand event conditions have very tight, similarly shaped distributions, with less than a minute difference between the median and 95th percentile travel times. The lack of variability impacts of high demand is likely because the baseline volume is low enough during this time period that increasing it by 10% has minimal traffic impacts. Although special events are rare on weekdays on this route, they can have a significant travel time impact when they do occur. The large difference between the median special event travel time and the 95th percentile special event travel time is likely due to the uncertainty of determining when the special event’s travel time impacts would occur

during the data tagging process. The 16.2-minute 95th percentile travel time likely represents the short time period when the majority of people are trying to access the special event venue, and the faster special event travel times are likely from the periods before the events start, when attendees are just beginning to trickle in. The impacts of incidents during the afternoon period are similar to those in the morning period, though the travel time variability impact of incidents is larger during the heavier morning commute. The PDF for the weather condition has a different shape and a narrower distribution than it does for the other two time periods. This is possibly due to smaller amounts of precipitation in the afternoon period during the data collection process. Figure C.21. Weekday afternoon peak distribution of travel times for westbound I-8.
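Source-attribution summaries like Tables C.9 and C.10 can be derived from the tagged travel times in a few lines. This sketch assumes a hypothetical list of (travel time, source tag) pairs; counting only values strictly above the percentile threshold is one plausible reading of "exceeded," not a detail the report specifies.

```python
import numpy as np
from collections import Counter

def causality_breakdown(tagged_tts, pct):
    """Percentage share of each source among travel times exceeding
    the given percentile (cf. Tables C.9-C.12)."""
    times = np.array([t for t, _ in tagged_tts], dtype=float)
    threshold = np.percentile(times, pct)
    above = [tag for t, tag in tagged_tts if t > threshold]
    return {tag: 100.0 * n / len(above) for tag, n in Counter(above).items()}

# Hypothetical tagged sample: mostly baseline with a congested tail.
sample = ([(8.5, "baseline")] * 90
          + [(20.0, "incident")] * 6
          + [(22.0, "weather")] * 4)
shares_85 = causality_breakdown(sample, 85)
```

Running the same function at the 85th and 95th percentiles yields the two columns of the causality tables.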

Table C.10 summarizes the contribution of each source condition to travel times exceeding the 85th percentile (8.9 minutes) and the 95th percentile (9.2 minutes). The 85th percentile travel time is close to the median travel time, so there are many cases when the travel time exceeds the 85th percentile but no causal source is occurring. However, when travel times exceed the 95th percentile, there is a weather event 50% of the time and an incident 30% of the time. The contribution of the other factors to high travel times is low because of their infrequency. Figure C.22. Weekday afternoon peak TT-PDFs for westbound I-8.

TABLE C.10. TRAVEL TIME VARIABILITY CAUSALITY DURING AFTERNOON PEAK FOR WESTBOUND I-8

Source           Active When Travel Time Exceeded
                 85th Percentile   95th Percentile
Baseline              59.7%             15.2%
Incident              13.4%             29.8%
Weather               22.4%             50.0%
Special event          3.6%              4.5%
High demand            1.0%              0.6%

Synthesis. From a planning and operational standpoint, the only room for reliability improvement on this route exists during the morning period, as this is the only time period when substantial travel time variability exists. Although little can likely be done to reduce the variability caused by weather, focusing on better incident response or incident reduction methods could reduce the overall variability on the facility, which now requires travelers to add a buffer time of 5.6 minutes (63%) to their morning commute to consistently arrive on time. In the other two time periods, travel time variability is minimal, and the travel time impact of incidents is less severe than in the morning. From a traveler's perspective, this analysis provides insight into the range of conditions that can be expected given certain events. For instance, weather appears to slow down travel times across all time periods. It may prove useful to provide information to travelers on the travel times that they can expect to experience during rainy conditions, so that they can appropriately plan for an on-time arrival or defer a trip until conditions improve. In addition, special events, when they occur, cause travel times to more than double on this route. In these instances, operators may want to consider providing information for alternate routes so that through travelers can avoid the event-based congestion.

Route 2 Results

For the northbound I-5 route, travel time variability and its contributing factors were investigated for weekends during the entire year of 2009.
Data on incidents, weather, lane closures, special events, and demand fluctuations were collected from PeMS and external sources as described in the section on methods. Due to the preference of scheduling freeway lane closures during overnight, weekend hours, no lane closures were active on the route during the selected hours and date range. As a result, the contribution of lane closures to travel time variability on this route is zero. The contri- butions of the factors to travel time variability were investigated for two time periods that corresponded to observed traffic patterns on the facility: morning, 8:00 a.m. to noon; and afternoon, noon to 9:00 p.m. Morning Peak. Figure C.23 shows the distribution of travel times during the week- end morning hours on northbound I-5. There is very little spread in the travel times measured on this corridor during the morning period: only a difference of 1 minute

244 GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY between the median and 95th percentile travel times. The travel times exceeding the 95th percentile predominantly occurred under incident and weather conditions. There were a number of high-demand time periods on this corridor when VMT exceeded 1.1 times the average VMT for weekend mornings. This situation was most likely during the summer months as a result of increased beach traffic. However, travel times during high-demand time periods never exceeded the 95th percentile, so the demand increases in the morning were typically not significant enough to cause severe congestion. There were no special events recorded during the morning hours of the study period. Figure C.24 illustrates the TT-PDFs that were assembled for each source condi- tion. The baseline and weather PDFs have a small distribution. The lack of travel time variability during weather conditions is likely related to the fact that there were only a few weekend days of precipitation over the study year, and the precipitation was Figure C.23. Weekend morning distribution of travel times for northbound I-5.

relatively light during those days.

Figure C.24. Weekend morning TT-PDFs for northbound I-5.

The high-demand PDF has a longer tail, showing that enough demand can cause slower travel times on this facility. Incidents appear to have the biggest impact on travel time variability during the morning hours, requiring motorists to add a buffer time of 8.5 minutes to the typical travel time. Finally, Table C.11 summarizes which source conditions were active when travel times exceeded the 85th and 95th percentile travel times on this route. Although the high percentages for the baseline condition indicate that the sources of congestion cannot explain much of the variability, the variability on this route is very low. It is conceivable that a number of travel times that would be considered typical for the corridor fall outside the 95th percentile threshold. The results of the weekend morning analysis show that travel conditions remain relatively uniform throughout the year, though some variability is caused by incidents and rare levels of high demand.

TABLE C.11. WEEKEND MORNING TRAVEL TIME VARIABILITY CAUSALITY FOR NORTHBOUND I-5

Source           Active When Travel Time Exceeded
                 85th Percentile   95th Percentile
Baseline              79.5%             64.2%
Incident              11.3%             20.9%
Weather                0.4%              0.1%
High demand            6.5%              8.8%

Afternoon Peak. Figure C.25 shows the distribution of travel times by source condition during weekend afternoons and evenings on northbound I-5. Compared with the morning travel time distribution, the afternoon travel time distribution has a significantly longer tail, with travel times ranging from 23.5 minutes free flow to over 70 minutes, which occurred during a special event. Travel times exceed the 95th percentile travel time under various source conditions, in particular, during incidents and special events. The special events considered in this analysis were the San Diego County Fair and the Del Mar horse races. Both events are active on multiple days during the summertime and are known to have major impacts on corridor traffic. Figure C.26 illustrates the different TT-PDFs assembled for the various source conditions that occurred on weekend afternoons on this study corridor. Similar to the morning time period, the PDFs for the baseline condition and the weather condition show little travel time variability. The weather events recorded over the study period were very minor, which might explain the difference in weather variability impacts between this corridor and the westbound I-8 corridor analyzed above. High demand unrelated to any specific special event has the potential to increase travel times, but only in extreme circumstances; the typical demand fluctuations on the corridor incur only minor variability impacts. The sources that cause the most travel time variability are incidents and special events.
The median travel time during an incident is 3 minutes higher than the normal median travel time, and it can be almost double the free-flow travel time at the 95th percentile level. On this corridor, special events are the source that has the potential to cause the highest travel time variability. Though they are relatively infrequent in that they are concentrated in the summer months, the median travel time during a special event requires an additional travel time of 15 minutes, a 64% increase over the ordinary median travel time. The 95th percentile travel time during a special event requires a buffer time of 45 minutes over the normal median travel time, requiring travelers to almost triple their typical travel time during this time period. Finally, Table C.12 summarizes which source conditions were active when travel times exceeded the 85th and 95th percentile travel times on the route. Incidents and special events appear to be responsible for the majority of travel times that exceed the 95th percentile.

Figure C.25. Weekend afternoon distribution of travel times for northbound I-5.

TABLE C.12. WEEKEND AFTERNOON TRAVEL TIME VARIABILITY CAUSALITY FOR NORTHBOUND I-5

Source           Active When Travel Time Exceeded
                 85th Percentile   95th Percentile
Baseline              51.4%             20.2%
Incident              29.1%             48.2%
Weather                0.0%              0.0%
Special event          8.8%             25.3%
High demand           10.8%              6.3%

Figure C.26. Weekend afternoon TT-PDFs for northbound I-5.

249 GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY Synthesis. The morning weekend travel time variability on the corridor is very minor, leaving little room for improvement from planning or operational interven- tions. The afternoon period, however, has significant travel time variability. This vari- ability is predominantly caused by incidents throughout the year and by high demand and special events during the summer months. Because special events can cause such extraordinary travel time variability (causing travel times to double or triple the typi- cal travel time on the route), traveler information during these event time periods is key. Diverting travelers whose destination is not the event to alternate routes, or encouraging them to travel when the event is not active, could help mitigate the vari- ability caused by these events. Conclusion. This use case analysis illustrates one potential method for linking travel time variability with the sources of congestion. The methods used are relatively simple to perform with data that are generally available, either from the TTRMS or from external sources. The application of the methodology to the two study corridors in San Diego reveals key insights into how this type of analysis should be performed. To ensure that sufficient travel time samples within each source category are being captured, this analysis should be performed on no less than three months’ worth of data. It also should be performed separately for different days of the week, depending on the local traffic patterns. For example, the magnitude of the contributions of the sources to travel time variability on the northbound I-5 study corridor would likely be very different on weekday afternoons, when the corridor serves commuters, than on weekend afternoons, when the corridor serves recreational and event traffic. 
Addition- ally, it is important to consider the seasonal dependence of the congestion factors when selecting the time period for analysis and when reviewing the analyses. For example, weather was shown to be a large contributing factor to travel time variability on the westbound I-8 corridor because the study period was November through February. If the analysis period were over the summer, the contribution of weather to travel time variability on this corridor would be nearly zero, as San Diego receives virtually no precipitation outside of winter. Finally, the contributions of the sources should be ana- lyzed separately by time of the day in a manner consistent with local traffic patterns. For example, although incidents had a major impact on the median travel time and the planning time during the morning commute period on the westbound I-8 study corridor, they had little impact on variability during other parts of the day. Elucidating the time dependence of the factors is critical to providing outputs that can be used by planners and engineers to improve the reliability of their facilities. Use Case 2: Using Planning-Based Reliability Tools to Determine Departure Time and Travel Time for a Trip Summary The purpose of this use case is to demonstrate how a reliability monitoring system can help travelers better plan for trips of varying levels of time sensitivity. Currently, most traveler information systems that report travel times to end users focus solely on the average travel time, and give users little insight into the variability of their travel route. Although this may be fine for trips with a flexible arrival time, it is less useful for trips

250 GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY for which the traveler must arrive at the destination at or before a specified time (such as a typical morning commute to work). This use case demonstrates how a reliability monitoring system can provide information both on the average expected travel time and the worst-case planning travel time so that users can choose a departure time com- mensurate with their need for an on-time arrival. It also helps users choose between alternate routes; one route may offer a faster average travel time, but it may have more travel time variability than a parallel route that is slower on average but has more consistent travel times. Users This use case is of most value to travelers who are the end consumers of information that informs on the average and planning travel times for alternate routes between selected origins and destinations. The analysis behind this use case is also of value to operators, who can post estimated average and planning travel times throughout the day on variable message signs to help travelers on the road choose between different routes based on their need for an on-time arrival. Scope The use case demonstrated in this section is broad and could provide a range of travel time reliability metrics to end users in a variety of formats. To narrow the scope of this use case for validation purposes, this section explores the following specific use case: The user wants to view, for alternate routes, the latest departure times needed to arrive at a destination at 5:30 p.m. on a Friday both on average and to guarantee on-time arrival 95% of the time. This definition means that the system needs to provide, for each alternate route, the median travel time and planning time for trips traveling between 5:00 and 5:30 p.m. on Fridays. 
It is envisioned that this use case involves the traveler’s using the mon- itoring system for information in advance of a trip, likely from a computer, although other applications and dissemination methods are possible. Site Three alternate routes, which travel from just south of the I-5/I-805 diverge near La Jolla and Del Mar to the U.S. Naval Base in National City, south of downtown San Diego, are studied in this use case. Figure C.27 shows the three routes. Route 1 is approximately 17 miles long and travels only along southbound I-5. Route 2 is approximately 16 miles long and travels along southbound I-805, southbound I-15, and southbound I-5. Route 3 is also 16 miles long and travels along southbound I-805, southbound SR-163, and southbound I-5. Methods The state of the practice for the few agencies who report travel time reliability metrics through their traveler information systems is to compute them from TT-PDFs assembled based on the time of day and day of the week of the trip for which information is being requested. For example, to give a user the average and 95th percentile travel times for a Wednesday afternoon trip departing at 5:30 p.m., the system might obtain all of the travel times for trips that departed between 5:15 and 5:45 p.m. for the past 10 weekdays.
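The specific query defined above — the latest departure times for a 5:30 p.m. arrival, both on average and with a 95% on-time guarantee — reduces to subtracting the median and planning travel times from the target arrival time. The travel time values in this sketch are illustrative, not from the report:

```python
from datetime import datetime, timedelta

def latest_departures(arrival, median_tt_min, planning_tt_min):
    """Latest departure for an average-day on-time arrival (median
    travel time) and for a 95% on-time guarantee (planning time)."""
    return (arrival - timedelta(minutes=median_tt_min),
            arrival - timedelta(minutes=planning_tt_min))

# Illustrative: arrive by 5:30 p.m. on a Friday, given a 24-minute
# median travel time and a 41-minute planning time for one route.
arrive_by = datetime(2012, 6, 1, 17, 30)
avg_dep, guaranteed_dep = latest_departures(arrive_by, 24, 41)
# avg_dep is 5:06 p.m.; guaranteed_dep is 4:49 p.m.
```

Repeating the calculation for each alternate route lets the user compare both the average-day departure and the guaranteed-arrival departure across routes.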

Figure C.27. Freeway Use Case 2 alternate routes. Map data © 2012 Google. The time-of-day and day-of-week approach to travel time reliability is valid and is used to demonstrate multiple use cases at the San Diego site. However, this use case evaluation incorporates the work that the research team has conducted into categorizing a route's historical and current performance into regimes and assembling TT-PDFs based on similar regime designations. Regimes are a way of categorizing travel times based on the prevailing operating condition when the travel time was measured. Regimes can be considered an extension of the time-of-day approach to reliability; on most corridors, regimes typically have a strong relationship with the time of day of travel. For example, a route that travels from a suburb to a downtown area may have four operating regimes on weekdays: (1) a severely congested regime during the morning peak, (2) a mildly congested regime during the midday period, (3) a moderately congested regime during the afternoon peak, and (4) a free-flow regime that occurs during the middle of the night. There may also be transitional regimes that are observed when a route switches from congested to uncongested. Weekends may only have a free-flow regime and a slightly congested regime. An example regime assignment for a route that has five weekday regimes and two weekend regimes is shown in Figure C.28. As the figure shows, regimes are closely related to the time of day, but they help capture the variability in operating conditions that occur across different

days of the week, as well as show the similarity in operating conditions across certain hours of the day. Regime assignment is addressed in the discussion of methodological advances in this case study, and the team is further refining its regime assignment methodologies.

In this use case validation, each route is assigned a regime for each 5-minute time period of each day of the week. Routes are categorized into one of four regimes (free flow, slightly congested, moderately congested, and severely congested) based on the ratio of the average travel time during the time period to the free-flow travel time, otherwise known as the travel time index (TTI). This metric was selected for regime identification because it is travel time–based and it groups sets of travel times based on similar baseline operating conditions and levels of congestion, rather than a strictly time of day–based categorization.

After the regime identification process, travel times are assembled into regime-based PDFs based on the time of day and day of week of the traveler's request for trip information. From these PDFs, average travel times and planning times are computed and used to generate required departure times for each route based on the time sensitivity of the trip.

Validation

The validation consists of three steps: regime identification, PDF generation, and user output.

Regime Identification. In this use case, the TT-PDFs used to calculate reliability metrics for alternate routes are assembled based on regime conditions. In a TTRMS, this regime assignment step would be done before the user makes the request for travel time information for alternate routes. For the three alternate routes, regime assignments were made for each day-of-the-week type according to local traffic patterns.
The five day-of-the-week types selected for separate regime classifications were Monday; midweek days (Tuesday, Wednesday, Thursday); Friday; Saturday; and Sunday.

Figure C.28. Example regime assignment for a route.

Each 5-minute period of each day-of-the-week type was assigned to a regime based on average TTI during that time period. Average TTIs for each time period were calculated using 6 months of 5-minute travel time data (excluding holidays). The breakdown of regimes by TTI is shown in Table C.13. These TTIs were selected by assuming a free-flow speed of 65 mph, then assuming that average speeds less than 40 mph represent severely congested conditions, speeds between 40 and 50 mph represent moderately congested conditions, and speeds between 50 and 60 mph represent slightly congested conditions. Other routes in other regions may need different thresholds or numbers of regimes to accurately capture the varying levels of congestion along an individual corridor.

TABLE C.13. REGIMES BY TRAVEL TIME INDEX

Regime                 TTI       Travel Time (min)
                                 Route 1      Route 2      Route 3
Free flow              <1.1      <15.6        <13.4        <15.4
Slightly congested     1.1–1.3   15.6–18.5    13.4–15.8    15.4–18.2
Moderately congested   1.3–1.6   18.5–22.7    15.8–19.5    18.2–22.4
Severely congested     >1.6      >22.7        >19.5        >22.4

The connection between regimes and travel times for each of the three study routes is shown in Table C.13. The colors in the table correspond with the regime assignments by day-of-the-week type for each of the routes, shown in Figure C.29, Figure C.30, and Figure C.31. Although the regime assignments in these figures are shown for each 20-minute time period, regimes were actually assigned to each 5-minute time period.

The regime assignment allows for a comparison of the average performance by day of week and time of day on each of the three routes. The free-flow travel times on each route are fairly comparable. Route 2 is the shortest route and has the fastest free-flow travel time (12.2 minutes). Routes 1 and 3 are of comparable length; Route 1 has a slightly faster free-flow travel time (14 minutes) than Route 3 (14.2 minutes).
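As an illustration, the TTI-based regime assignment described above reduces to a threshold lookup. The sketch below applies the Table C.13 thresholds; the 14-minute free-flow travel time and the sample values are hypothetical.

```python
def assign_regime(avg_travel_time, free_flow_time):
    """Classify a 5-minute period using the TTI thresholds of Table C.13."""
    tti = avg_travel_time / free_flow_time  # travel time index
    if tti < 1.1:
        return "free flow"
    if tti < 1.3:
        return "slightly congested"
    if tti < 1.6:
        return "moderately congested"
    return "severely congested"

# Example with a hypothetical 14-minute free-flow travel time
regimes = [assign_regime(t, 14.0) for t in (15.0, 17.0, 20.0, 25.0)]
```

Other corridors would swap in their own thresholds, per the caveat in the text.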
Analysis of the regime tables shows that the duration of congestion on Route 1 is much shorter than it is on the other routes, and severe congestion only occurs right around the 5:00 p.m. hour during the midweek days. The duration of congestion on Route 2 is much longer; it lasts throughout the midday, is severe during the 5:00 p.m. hour on the midweek days, and is severe beginning at 4:00 p.m. on Friday. Route 3 is the only route to have morning congestion throughout the work week. It is also the only route to have weekend congestion, possibly because it traverses San Diego's Balboa Park, a popular tourist destination. Congestion is severe on Route 3 Tuesday through Friday during the 5:00 p.m. hour.

Figure C.29. Route 1 regimes for southbound I-5.

Figure C.30. Route 2 regimes for southbound I-15.

Figure C.31. Route 3 regimes for southbound SR-163.

PDF Generation. Although regime assignments are made offline, this validation assumes that the regime-based PDFs are assembled in real time in response to a user's request for information. Future work by the research team will develop methods for creating PDFs offline and storing them in advance of a user query to reduce the need for real-time computation.

This validation assumes that the user wants to know the average and planning departure times for three routes that allow for arrival at the destination at 5:30 p.m. on a Friday. PDFs are generated for each of the three routes' operating regimes during the Friday 5:00 p.m. hour. The regime matrices show that during this time period Route 1 is in the moderately congested regime, and Routes 2 and 3 are in the severely congested regime. This validation effort generates TT-PDFs for each route using all of the travel times within the same regime category measured on Fridays over the past six months. An alternate method is to generate PDFs based on travel times within the same regime category measured on any day. In this case, since six months of data were used to form the PDFs, it was determined that Friday data alone would generate enough travel time data points to form an accurate PDF.

Figure C.32. Alternate route TT-PDFs for a Friday trip at 5:30 p.m.

The plots of each PDF are shown in Figure C.32. Route 1 appears to have the smallest distribution of travel times during this time period; the most frequently occurring travel time is around 20 minutes. Route 2 has significantly more travel time variability during this time period; the most frequently occurring travel times on Friday during severe congestion are around 18 minutes, but the TT-PDF has a long tail end,
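The step from a regime-based PDF to recommended departure times can be sketched as below. The pooled travel times are invented, and taking the 95th percentile by nearest rank is an assumption about one plausible convention.

```python
from datetime import datetime, timedelta
from math import ceil
from statistics import median

def departure_times(regime_travel_times, arrival):
    """From travel times (minutes) pooled out of the matching regime,
    return departure times for 50% and 95% on-time arrival."""
    s = sorted(regime_travel_times)
    med = median(s)
    planning = s[min(len(s) - 1, ceil(0.95 * len(s)) - 1)]  # 95th percentile
    return arrival - timedelta(minutes=med), arrival - timedelta(minutes=planning)

arrival = datetime(2010, 8, 13, 17, 30)  # a Friday, 5:30 p.m.
# Hypothetical regime-pooled travel times of 15 through 34 minutes
dep50, dep95 = departure_times(list(range(15, 35)), arrival)
```

Subtracting the median travel time gives the 50% on-time departure, and subtracting the planning time gives the 95% on-time departure, mirroring Table C.15.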

and travel times upward of 30 minutes can occur. The most frequently occurring travel time on Route 3 is approximately 24 minutes, and the route has significant travel time variability on Fridays.

User Outputs. In this final step, the TT-PDFs are distilled into useful summary metrics to assist the user in itinerary planning. In this case, the goal is to provide the user the departure times needed to arrive on time on average and with a buffer time along each route. The median and planning travel times on each route during the user's desired time of travel are summarized in Table C.14. Route 2 has the fastest median travel time, but this route also has significant travel time variability, requiring a traveler with a nonflexible arrival time to add a buffer time of 14 minutes to the median travel time to ensure on-time arrival 95% of the time. Route 1 is almost 2 minutes slower than Route 2 on average, but it offers significant (5 minutes) time savings when variability is included. Even Route 3, which has a much slower median travel time than the other two routes, has a faster planning time than Route 2.

TABLE C.14. MEDIAN AND PLANNING TRAVEL TIMES ALONG ALTERNATE ROUTES

Route              Median Travel Time (min)   Planning Travel Time (min)
Route 1 (I-5)      20.8                       28.1
Route 2 (I-15)     19.1                       33.2
Route 3 (SR-163)   23.3                       32.4

Table C.15 synthesizes these travel time estimates into recommended departure times, the information that is of most use to the end user. These departure time estimates are distinguished by departure times for 50% and 95% on-time arrival to help the user plan the trip with consideration of the need for an on-time arrival. Other applications of this use case could provide departure times calculated from other reliability metrics, such as the 85th percentile travel time rather than the 95th, or the 99th percentile travel time for trips for which on-time arrival is imperative.

TABLE C.15. ALTERNATE ROUTE DEPARTURE TIME ESTIMATES

Route              Departure Time for 50% On-Time Arrival   Departure Time for 95% On-Time Arrival
Route 1 (I-5)      5:09 p.m.                                5:01 p.m.
Route 2 (I-15)     5:10 p.m.                                4:56 p.m.
Route 3 (SR-163)   5:06 p.m.                                4:57 p.m.

Conclusion

This use case validation illustrates the value of incorporating reliability-based travel time estimates into traveler information systems for use before trips so that travelers can plan itineraries based on their need for on-time arrival. As shown by the San

Diego validation, the route that is the fastest on average is not always the route that consistently gets travelers to their destination on time. Providing buffer time measures for alternate routes conveys this message to the end user, ultimately giving travelers more confidence in the ability of the transportation system to get them to their destination on time.

Use Case 3: Combining Real-Time and Historical Data to Predict Travel Times in Real Time

Summary

The purpose of this use case is to extend the system capabilities described in the freeway planning time use case to support the prediction of travel times along a route in real time, using both historical and real-time data. While various methods for performing this data fusion to predict travel times have been implemented in practice, most only generate a single expected travel time estimate. This use case validation extends the methodology to generate, in addition to a single expected travel time, a range of predictive travel times that incorporate the measured historical variability along a route.

Users

This use case is of most value to travelers, who currently lack quality real-time information on expected travel times while en route to a destination. The analysis behind this use case is also of value to operators, who can use these methodologies to provide better predictive travel times to post on variable message signs or via other dissemination technologies.

Scope

This use case validation describes methodologies for predicting near-term travel time ranges along a route. Specifically, it predicts travel time ranges for a 5:35 p.m. Thursday trip for two alternate routes.

Site

Two of the same alternate routes used to demonstrate freeway Use Case 2 (alternate route planning times) were used to demonstrate this predictive travel time use case. Both routes begin just south of the I-5/I-805 diverge and end near the U.S.
Naval Base in National City. The first route, called the I-15 route, travels along southbound I-805, southbound I-15, and southbound I-5. The second route, called the I-5 route, travels solely along I-5. Maps of these two routes are shown in Figure C.33.

Methods

According to the use case requirements, the validation needs to use both data from the historical archive and real-time data to generate travel time predictions for trips that are already occurring or are about to begin. To meet these requirements, a nearest-neighbors approach was adopted. This method uses the measured real-time conditions along a route to search for similar conditions in the past and predicts a travel time based on historical travel times measured under similar conditions. Similar approaches have been well documented in the literature, and a nearest-neighbors

approach is currently used in PeMS to predict travel times along a route for the rest of the day (1, 2). The method used in this validation extends traditional techniques to incorporate reliability information. Instead of providing one predictive travel time, this use case validation outputs a range of predictive travel times that incorporate the potential variability in travel times that may occur, as gathered from similar historical conditions. The employed methodology is only valid for near-term travel time prediction. This use case assumes that predictions are only made for the next three upcoming 5-minute time periods.

To estimate a real-time predictive travel time range for a route, the methodology compares travel time data collected over the past six 5-minute time periods with travel time data collected over the same six time periods on the most recent 15 days of the same day of the week. In this use case, which aims to predict travel times for a 5:35 p.m. Thursday trip, this means that travel times measured between 5:00 and 5:30 p.m. on the current day are compared with travel times measured between 5:00 and 5:30 p.m. over the 15 most recent Thursdays. The nearest neighbors to the current day are selected by comparing the "distance" between the measured 5-minute travel time on the historical day and the measured travel time for the same 5-minute period on the current day. The distances between travel times for different 5-minute periods are weighted differently: similarity for the 5-minute period that immediately precedes a trip is weighted more heavily than similarity for the 5-minute period that occurred 30 minutes before the current trip. The weighting factors used for each 5-minute period are shown in Table C.16.

Figure C.33. Freeway Use Case 3 alternate routes. Map data © 2012 Google.

The distance dh between the current day travel time and the historical day travel time is calculated using Equation C.2:

dh(x) = w(x) × [Th(x) − Tc(x)]²  (C.2)

where

Tc = current day travel time;
Th = historical day travel time;
dh = distance between current day 5-minute travel time and historical day 5-minute travel time;
Dh = total distance between current day travel time and historical day travel times for all 5-minute periods before a trip;
x = time period before trip start (ranges from 1 for 5 minutes before to 6 for 30 minutes before); and
w = weight factor.

The total distance Dh between a current day and a historical day is calculated by summing up all the distances dh using Equation C.3:

Dh = Σ (x = 1 to 6) dh(x)  (C.3)

TABLE C.16. WEIGHT FACTORS FOR MINUTES BEFORE TRIP

Minutes Before Trip   Weight Factor (w)
5 (x = 1)             1
10 (x = 2)            1/2
15 (x = 3)            1/4
20 (x = 4)            1/8
25 (x = 5)            1/16
30 (x = 6)            1/32

The result of the distance calculation step is a measure of travel time closeness between each historical day and the current day. From here, the method of k-nearest neighbors is followed; rather than selecting the travel time profile of the nearest day as the predicted travel time, the method considers the travel time profiles from the three nearest days to make a prediction.

The goal of this use case is to predict a travel time range for the next three 5-minute time periods. In this validation, the expected travel time for the next three time periods is computed as the median of the travel times from the three nearest-neighbor days. The lower bound of the predictive range is computed as the expected travel time minus the variance of the three neighbor travel times. The upper bound of the predictive range is computed as the expected travel time plus the variance of the three neighbor travel times.
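Equations C.2 and C.3 and the three-neighbor range can be sketched as follows. This is an illustrative implementation under stated assumptions: the text does not specify population versus sample variance (population variance is used here), and the history data are invented.

```python
from statistics import median, pvariance

# Table C.16 weights: x = 1 (5 min before trip) has weight 1, ..., x = 6 has 1/32
WEIGHTS = {x: 1 / 2 ** (x - 1) for x in range(1, 7)}

def total_distance(current, historical):
    """Equations C.2 and C.3: weighted squared distance between the current
    and a historical day's six pre-trip travel times (list index 0 is x = 1)."""
    return sum(WEIGHTS[x] * (historical[x - 1] - current[x - 1]) ** 2
               for x in range(1, 7))

def predict_range(current, history, k=3):
    """history maps a day label to (six pre-trip travel times, next-period
    travel time). Returns (lower, expected, upper) for the next period."""
    nearest = sorted(history,
                     key=lambda d: total_distance(current, history[d][0]))[:k]
    futures = [history[d][1] for d in nearest]
    expected = median(futures)
    spread = pvariance(futures)  # "variance" per the text; flavor is assumed
    return expected - spread, expected, expected + spread

history = {
    "day A": ([24.0] * 6, 23.0),
    "day B": ([24.5] * 6, 24.0),
    "day C": ([23.5] * 6, 25.0),
    "day D": ([40.0] * 6, 40.0),  # dissimilar day, excluded by the distance step
}
lower, expected, upper = predict_range([24.0] * 6, history)
```

The dissimilar day D accumulates a large weighted distance and so never enters the three-neighbor set, which is the intended behavior of the distance filter.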

Results

The travel time prediction methodology was used to compute predictive travel time ranges for the two example alternate routes between 5:35 and 5:45 p.m. on Thursday, August 12, 2010. Because data exist on what the travel times actually were on this day, this validation has a ground-truth source with which to compare the estimates generated by the selected methodology.

I-15 Route. To predict 5:35 to 5:45 p.m. travel times on Thursday, August 12, 2010, 5-minute travel times between 5:05 and 5:45 p.m. were obtained for 15 Thursdays between April 29, 2010, and August 12, 2010. The distance calculation method was used to determine the nearest neighbors. Table C.17 shows the travel times measured for each 5-minute time period over the 15 selected days. The first row shows the travel times measured on the current day of August 12, and all other rows show the travel times measured on each previous Thursday. The last column shows the total distance measured between the travel times on each day and the travel times on the current day. The three shaded rows indicate the days on which the distance was lowest, which were concluded to be similar to the current day.

Figure C.34 compares the travel times measured on the predicted day with those measured on the closest three Thursdays and extends the x-axis to show the travel times on these three days for the periods of 5:35 p.m., 5:40 p.m., and 5:45 p.m. These are the travel times from which the predictive range for the current day is to be calculated. The thick black line indicates the travel times for the current day up until 5:30 p.m.

Figure C.35 shows the results of using the median of the nearest-neighbor travel times to make a prediction of the expected travel times for the upcoming 15 minutes and compares these predictions to the travel times that were actually measured on this day.
Table C.18 shows this information in tabular form and also gives the predictive travel time ranges, which account for travel time variability in the evolving traffic conditions. As shown in the table, each actual measured travel time fell within the predictive range. The expected travel times differed from the measured travel times by no more than about 5%.
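The comparison reported in Table C.18 reduces to a simple check per time period. The sketch below reproduces the 5:35 p.m. I-15 column; the sign convention for the percent difference (predicted minus measured, relative to measured) is an assumption that happens to match the tabulated values.

```python
def compare(predicted, lower, upper, measured):
    """Return whether the measured travel time fell inside the predicted
    range, and the percent difference between predicted and measured."""
    in_range = lower <= measured <= upper
    pct_diff = 100.0 * (predicted - measured) / measured
    return in_range, round(pct_diff, 1)

# 5:35 p.m. column of Table C.18 (I-15 route)
in_range, pct = compare(predicted=25.1, lower=23.6, upper=26.7, measured=23.9)
```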

TABLE C.17. NEIGHBORING THURSDAY TRAVEL TIMES ON I-15 (MINUTES)

Date in 2010   5:05 p.m.   5:10 p.m.   5:15 p.m.   5:20 p.m.   5:25 p.m.   5:30 p.m.   Distance
Aug. 12        28.3        28.1        27.9        26.1        25.9        24.6        —
May 6          17.1        18.1        18.7        18.8        18.5        17.7        10.4
May 13         18.6        18.0        18.4        18.6        18.1        18.3        10.2
May 20         18.8        19.7        19.8        19.7        18.8        18.4        9.4
May 27         15.6        16.5        16.9        17.0        16.9        16.4        12.5
June 3         28.0        27.9        28.1        27.9        27.6        26.8        2.7
June 10        17.5        19.1        20.1        21.0        21.0        21.0        6.9
June 17        18.2        19.2        19.0        19.2        18.5        17.1        10.6
June 24        34.8        35.0        36.0        37.4        37.0        37.4        16.5
July 1         24.6        25.2        25.9        24.9        24.8        24.3        1.6
July 8         17.5        18.2        18.4        17.1        16.1        15.7        13.0
July 15        15.8        16.1        16.4        16.5        16.9        16.5        12.6
July 22        20.7        22.0        22.4        22.5        22.6        22.9        4.4
July 29        20.8        20.4        20.3        20.0        20.2        19.5        8.0
Aug. 5         22.9        24.5        26.2        26.4        25.7        25.1        1.6

Figure C.34. Travel time profiles of three closest Thursdays for I-15.

TABLE C.18. PREDICTED VERSUS ACTUAL TRAVEL TIMES FOR AUGUST 12, 2010, FOR I-15

Travel Time Measurement                         5:35 p.m.   5:40 p.m.   5:45 p.m.
Predicted lower range (min)                     23.6        22.3        20.6
Predicted upper range (min)                     26.7        26.2        24.1
Predicted (min)                                 25.1        24.3        22.3
Measured (min)                                  23.9        23.1        22.1
Measured in range of predicted?                 Yes         Yes         Yes
Difference between predicted and measured (%)   5.0%        –5.2%       –1.0%

I-5 Route. The same approach was taken to estimate a predictive travel time range for the alternate southbound I-5 route for the same 15-minute time period. Figure C.36 plots the travel times for the three closest Thursdays identified by the distance calculation method. The heavy black line indicates the travel times for the current day up until 5:30 p.m. Figure C.37 compares the median travel time prediction for the upcoming 15-minute period with the actual travel times that were measured on this route and day.

Table C.19 expands this information to show the lower and upper bounds of the predicted travel time ranges and compares the estimates with the travel times actually measured on this day. Each measured travel time fell within the predictive range, and expected travel times varied from the measured travel times by less than 5%.

Figure C.35. Measured and predicted travel times for August 12, 2010, for I-15.

Figure C.36. Travel time profiles from three closest Thursdays for I-5.

Figure C.37. Predicted and measured travel times for August 12, 2010, for I-5.

TABLE C.19. PREDICTED VERSUS ACTUAL TRAVEL TIMES FOR AUGUST 12, 2010, FOR I-5

Travel Time Measurement                         5:35 p.m.   5:40 p.m.   5:45 p.m.
Predicted lower range (min)                     21.6        18.2        18.4
Predicted upper range (min)                     25.9        27.3        26.4
Predicted (min)                                 23.8        22.8        22.4
Measured (min)                                  24.5        23.2        21.8
Measured in range of predicted?                 Yes         Yes         Yes
Difference between predicted and measured (%)   –2.9%       –1.7%       3.8%

Synthesis

It is envisioned that the results of the travel time prediction methodologies can be used to provide updated travel time information in real time to help users select alternate routes based on current traffic conditions, as well as historical travel time patterns and reliability. For the example case, the following information could be posted on a variable message sign to provide travelers with current information:

TRAVEL TIMES TO NATIONAL CITY
I-5: 21–26 MIN
I-805/I-15: 23–27 MIN

Conclusion

This use case validation shows that it is possible to provide predictive travel time ranges and expected near-term travel times by combining real-time and archived travel time data. The validation uses a k-nearest-neighbors approach to compare recent travel times from the current day with travel times measured on previous days. It then approximates near-term travel times based on the measurements from the most similar days. The travel time predictions for both study routes proved very similar to the actual travel times measured on the sample day. The travel time ranges output by the prediction method provide a way to report travel time reliability information in real time to give travelers a more realistic idea of the range of conditions they can expect to see during a trip.
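Generating sign text like the Synthesis example from predicted ranges is a small formatting step. The sketch below is illustrative; the function name and the choice to round bounds to whole minutes are assumptions.

```python
def vms_message(destination, routes):
    """Render predicted travel time ranges as variable message sign text.
    routes: iterable of (route label, lower bound min, upper bound min)."""
    lines = [f"TRAVEL TIMES TO {destination.upper()}"]
    lines += [f"{label}: {round(lo)}-{round(hi)} MIN" for label, lo, hi in routes]
    return "\n".join(lines)

# Hypothetical predicted ranges for the two study routes
msg = vms_message("National City",
                  [("I-5", 21.4, 25.9), ("I-805/I-15", 23.2, 26.7)])
```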

Transit Use Case 1: Conducting Offline Analysis on the Relationship Between Travel Time Variability and the Seven Sources of Congestion

Summary

This use case aims to quantify the impacts on travel time variability for transit trips of the seven sources of congestion: incidents, weather, work zones, fluctuations in demand, special events, traffic control devices, and inadequate base capacity. To perform this analysis, methods were developed to extract travel times from APC bus data. These travel times were then flagged with the type of event they occurred under (if any) and aggregated into TT-PDFs. From these PDFs, summary metrics such as the median travel time and planning travel time were computed to show the extent of the variability impacts of each event condition.

Users

This use case has broad applications to a number of different user groups. For transit planners, knowing the relative contributions of the different sources of congestion to travel time reliability would help them better prioritize travel time variability mitigation measures on a route-specific basis. The outputs of this use case would also be of value to operators, providing them with information on the range of operating conditions that can be expected on a route given certain event conditions. Finally, the outputs of this use case would have value to travelers, by providing better predictive travel times under certain event conditions that could be posted in real time on variable message signs at stops or on vehicles, or on traveler information websites. This information would help users better know what to expect during their trip, both during normal operating conditions and when a congestion-inducing event is occurring.
Site

Three routes were selected for the evaluation of this use case to highlight the varying contributions of congestion factors to travel time reliability across different routes, service patterns, and times of day. The first route analyzed is Route 20 southbound, which travels from the Kearny Mesa area down SR-163 into downtown San Diego. For this analysis, the team selected a subset of the route spanning 16.4 miles. This study section of Route 20 begins near the intersection of Miramar Road and Kearny Villa Road on the northern edge of the Marine Corps Air Station Miramar and continues south along SR-163 to downtown San Diego. At Balboa Avenue and SR-163, after traveling along SR-163 for 6.6 miles, Route 20 takes a detour to Fashion Valley Transit Center at Friars Road and SR-163 before reentering SR-163 at I-8. Finally, the route terminates in downtown San Diego at 10th Avenue and Broadway.

The second route analyzed, Route 20X, is identical to Route 20 except it does not stop at the Fashion Valley Transit Center. Here, the team studied a 14.7-mile-long stretch of Route 20X beginning near the intersection of Miramar Road and Kearny Villa Road on the northern edge of the Marine Corps Air Station Miramar and continuing south along SR-163 for 12.6 miles to downtown San Diego, terminating at 10th Avenue and Broadway.

The third route analyzed was Route 50 southbound, which travels along I-5 into downtown San Diego. This route begins near the Clairemont Drive on-ramp to I-5, continues south along I-5 for 6.4 miles, and ends 0.8 miles later at 10th Avenue and Broadway. The route is 7.2 miles long.

Routes 20 and 50 were chosen because they travel for significant distances along freeways, meaning that roadway incident data can be obtained for them through PeMS. Second, these routes were chosen because they travel toward downtown, which hosted several special events during the period of study, so their travel times can be analyzed for the effect of special events. Finally, these are routes for which a comparatively large amount of APC data is readily available. A map of all routes is shown in Figure C.38.

Figure C.38. Transit Use Case 2 routes. Routes 20 and 20X (dashed); Route 50 (dashed).

Methods

These routes were analyzed to determine the travel time variability impacts caused by three sources of transit congestion: incidents, special events, and fluctuations in demand. Traffic control contributions were not investigated because ramp-metering location and timing data could not be obtained. Weather contributions were not considered due to the lack of inclement weather in San Diego during the August 2010 study period (the only month for which the APC data could be obtained). Lane closures were also not considered, as they are expected to have little impact on transit service, even when the transit route runs along a freeway. The impacts of inadequate base capacity were not considered for the same reason.

For every weekday run for which data were available on each of the three routes described above, APC data were analyzed to determine the in-vehicle travel time from delivered service records. Passenger loadings were also extracted from the APC data. To link travel times with the event condition that was active during their measurement, each transit run for which a travel time was obtained was tagged with one of the following events: baseline (none), special event, incident, or high demand.

A travel time was tagged with baseline if none of the factors were active during that run.

A travel time was tagged with incident if an incident was active anywhere on the route during that run. Incident start times and durations reported through PeMS were used to determine when incidents were active along the route. Incidents with durations shorter than 15 minutes were not considered.

A travel time was tagged with special event if a special event was active at a venue along the route during that time period. Special event time periods were determined from the start time of the event and the expected duration of that event type. For example, if a football game at Qualcomm Stadium had a start time of 6:00 p.m. and was scheduled to end around 9:00 p.m., the event was considered active between 4:00 and 6:00 p.m. and between 8:30 and 10:00 p.m., when the majority of traffic would be accessing and leaving the venue.

Finally, a travel time was tagged with high demand if the number of passengers on board the transit vehicle reached or exceeded 50 at any point during the run. For cases in which more than one factor was active, the travel time was tagged with the factor that was deemed to have the larger travel time impact (e.g., when a long-lasting incident coincided with a trip that also ran during the edge of a low-attendance special event window, the travel time was tagged with incident).
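The tagging rules above can be sketched as a single function. The fixed precedence used here (incident over special event over high demand) is a simplification of the text's case-by-case judgment, and the window representation is hypothetical.

```python
def tag_run(run_start, run_end, max_load, incidents, special_events):
    """Tag a transit run with its active event condition.
    incidents / special_events: (start, end) windows in the same time units
    as run_start/run_end; incidents shorter than 15 minutes should already
    be filtered out by the caller, per the text."""
    def overlaps(windows):
        return any(s < run_end and e > run_start for s, e in windows)
    # Assumed precedence when several factors are active at once
    if overlaps(incidents):
        return "incident"
    if overlaps(special_events):
        return "special event"
    if max_load >= 50:
        return "high demand"
    return "baseline"

# Times expressed as hours of day for brevity: a 5:00-6:00 p.m. run that
# overlaps an incident window is tagged "incident"
tag = tag_run(17.0, 18.0, max_load=35,
              incidents=[(16.5, 17.25)], special_events=[])
```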
Tagged travel times were then divided into different categories based on the time of day, because the impacts of the congestion sources are time dependent. For all three transit routes, three time periods were evaluated: morning peak, 7:00 to 9:00 a.m.; midday, 9:00 a.m. to 4:00 p.m.; and afternoon peak, 4:00 to 8:00 p.m. Finally, within each time period, TT-PDFs were assembled from all measured travel times.

Results

This section describes the results for the different routes.

Route 20 Southbound. For Route 20 southbound, travel time variability and its contributing factors were investigated for the 22 weekdays in August 2010. The period of study was limited to a single month due to a shortage of data for other months. Data on incidents, special events, demand fluctuations, and travel times were collected from PeMS, external sources, and the in-vehicle APC sensors, as described above in the section on methods.

Scheduled travel times for the subset of Route 20 trips considered here over the period of study range from 39 to 60 minutes, averaging 51.7 minutes. In August 2010, vehicles took an average of 8.1 minutes longer than scheduled to complete this portion of the route. The travel time distribution of trips on this route appears to be roughly unimodal, with a high standard deviation, greater frequency on the shorter side of the mode, and several outlying trips with long travel times. The mode occurs at 54.2 minutes.

Over the period of study, 129 transit trips were made on this route. Of these 129 trips, seven special event, two incident, and 16 high-demand trips were recorded. Figure C.39 shows the travel time distribution for all trips over the study period according to the event present (if any) during that trip. Figure C.40 shows the distribution of travel times during the weekday morning peak of August 2010. Relatively few trips occurred during the morning peak on Route 20. Those that did occur appeared to be clustered together around 44.2 minutes. This could be due to fluctuations in the transit schedule throughout the day, with trips occurring early in the morning scheduled with shorter travel times than trips occurring later in the day. No events were flagged for trips occurring in this time period.

Figure C.39. Total travel time distribution for Route 20 for August 2010.

Figure C.40. Morning peak travel time distribution for Route 20 for August 2010.

Figure C.41 shows the travel time distribution over the month for midday trips. Travel times for the midday period, in contrast to those seen in the morning peak, appear clustered around the primary mode of 54.2 minutes, as shown in Figure C.39. The distribution of variability-causing events is interesting, with the two recorded incident trips associated with longer-than-average trips, and two of the six longest trips associated with special events. However, all 11 high-demand trips had shorter than average travel times, indicating that large passenger loadings do not have much effect on travel times along this route during the middle of the day. This is fortunate, as 11 of the 14 high-demand events on this route occurred during the middle of the day. Figure C.42 shows the travel time distribution of trips taken during the afternoon peak period. There is one special event, a San Diego Padres baseball game that occurred late in the evening, associated with a relatively low travel time of 42.6 minutes. High-demand events are also visible throughout the distribution, although they do not appear to be correlated with longer travel times. This is the most variable of the time periods analyzed for this route. Table C.20 summarizes the contribution of each event condition to all travel times, to those exceeding the 85th percentile (57.2 minutes), and to those exceeding the 95th percentile (70.6 minutes). Although just 3.82% of all trips were associated with a special event, 10% of trips on which travel times exceeded the 85th percentile were associated with a special event. When limiting the pool to trips that exceeded the 95th percentile travel time, a full 14.29% of that total can be associated with special events. From a planning and operational standpoint, this indicates that special events are associated with long travel times on this route.
Thus, there could be some room for reliability improvements by improving signage, adding capacity, or advertising alternative routes during special events.

Figure C.41. Midday travel time distribution for Route 20 for August 2010.
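Summaries like Table C.20 attribute trips above a percentile threshold to their tagged condition. Assuming the tagged travel times are available as (minutes, tag) pairs, that computation can be sketched as:

```python
from statistics import quantiles

def causality_summary(tagged, pct):
    """Share of each tag among trips whose travel time exceeds the pct-th
    percentile (e.g., pct=85 or pct=95).

    tagged -- list of (travel_time_min, tag) pairs; needs at least two trips
    """
    times = sorted(t for t, _ in tagged)
    threshold = quantiles(times, n=100)[pct - 1]  # pct-th percentile cut point
    slow_tags = [tag for t, tag in tagged if t > threshold]
    return {tag: slow_tags.count(tag) / len(slow_tags) for tag in set(slow_tags)}
```

Applied to, say, 99 baseline trips of 1 to 99 minutes plus one 200-minute incident trip, the 95th percentile summary attributes 20% of the slowest trips to the incident even though it was active on only 1% of all trips, which is the pattern of disproportionate contribution discussed above.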

TABLE C.20. TRAVEL TIME VARIABILITY CAUSALITY FOR ROUTE 20

                            Active When Travel Time Exceeded
Source          Active (%)   85th Percentile   95th Percentile
Baseline        82.4         80.0%             85.7%
Special event   3.8          10.0%             14.3%
Incident        1.5          5.0%              0.0%
Demand          12.2         5.0%              0.0%

Figure C.42. Afternoon peak travel time distribution for Route 20 for August 2010.

Route 20X Southbound. Travel time variability and its contributing factors were also investigated for Route 20X southbound for the 22 weekdays in August 2010. The period of study was limited to a single month due to a shortage of data on other months. Data on incidents, special events, demand fluctuations, and travel times were collected from PeMS, external sources, and the in-vehicle APC sensors, as described above. Scheduled travel times for the subset of Route 20X considered here over the period of study ranged from 29 to 35 minutes, averaging 32.5 minutes, nearly a full 10 minutes less than Route 20. In August 2010, buses took an average of 10.1 minutes longer than scheduled to complete this portion of the route. A bimodal distribution can immediately be seen in Figure C.43, which plots all trip travel times over the month, with most travel times clustered around the higher mode, 42 minutes, and a smaller grouping around 32 minutes. The source of the bimodal distribution is not immediately clear. There is virtually no correlation between the scheduled travel time and actual travel time (R² = 0.043) on this route; trips belonging to the lower mode do not necessarily have shorter scheduled travel times. However, of the 11 trips with travel times less than 36 minutes, 10 correspond to the 7:13 a.m. run,

and nine were made by the same driver. Of the 11 days when this smaller travel time was not seen, 10 had no 7:13 a.m. run scheduled. Thus, there seems to be an unknown factor associated with this particular run and driver that leads to a smaller travel time on this portion of the route. Figure C.44 depicts the distribution of travel times along Route 20X during the morning peak period (7:00 to 9:00 a.m.), labeled by event condition. The bimodal distribution described above can be seen most clearly here, as all of the low-travel-time trips occurred during the morning peak. This is in stark contrast to the distribution of travel times for the morning peak period on Route 20. Both modes appear to be tightly bunched. This bimodal distribution makes the morning peak the period with the largest travel time variability for this route. Only one event condition was noted during the morning peak on this route: a high-demand event that was associated with a travel time of 48.6 minutes. Figure C.45 depicts the distribution of travel times along Route 20X during the midday period (9:00 a.m. to 4:00 p.m.), labeled by event condition. Here, a single mode is seen around 42.6 minutes. As with Route 20, the midday period saw the largest number of high passenger loadings on this route, with eight high-demand events. However, also similar to Route 20, these high loadings do not appear to be associated with longer travel times. A single incident event was associated with the highest midday travel time seen on this route, 49.8 minutes. Figure C.46 depicts the distribution of travel times along Route 20X during the afternoon peak period (4:00 to 8:00 p.m.), labeled by event condition. Few trips were taken on this route during this time span, so no overwhelming travel time trend can be identified other than the high variability of travel times.
The largest travel times seen on this route occurred during the afternoon period. Of the five largest travel times, two were associated with high-demand events.

Figure C.43. Complete travel time distribution for Route 20X for August 2010.

Figure C.44. Morning peak travel time distribution for Route 20X for August 2010.

Figure C.45. Midday travel time distribution for Route 20X for August 2010.

Table C.21 summarizes the contribution of each event condition to all travel times, to those exceeding the 85th percentile (49.1 minutes), and to those exceeding the 95th percentile (51.4 minutes). Although 85.19% of all trips had no associated variability-inducing event, of those trips that exceeded the 85th percentile travel time, 25% were associated with either an incident or high demand, with high-demand events occurring more often. All of the high-demand trips that exceeded the 85th percentile travel time occurred during the afternoon peak period. From a planning and operational standpoint, this indicates that there could be some room for reliability improvements by adding capacity to high-demand trips on this route during the afternoon peak.

TABLE C.21. TRAVEL TIME VARIABILITY CAUSALITY FOR ROUTE 20X

                            Active When Travel Time Exceeded
Source          Active (%)   85th Percentile   95th Percentile
Baseline        85.2         75.0%             75.0%
Special event   0.0          0.0%              0.0%
Incident        1.2          8.3%              8.3%
Demand          13.6         16.7%             16.7%

Figure C.46. Afternoon peak travel time distribution for Route 20X for August 2010.

Route 50 Southbound. For the subset of Route 50 southbound travel times considered here, travel time variability and its contributing factors were investigated for the 22 weekdays in August 2010. The period of study was limited to a single month due to a shortage of data for other months. Data on incidents, special events, demand fluctuations, and travel times were collected from PeMS, external sources, and the

in-vehicle APC sensors as described above. Scheduled travel times for the 158 runs analyzed for this route ranged between 18 and 21 minutes, averaging 19.5 minutes. The average delivered travel time for this route was 28.75 minutes, a full 9.25 minutes more than the average scheduled travel time. Figure C.47 shows the total distribution of trip travel times by event condition over the study period. The morning peak distribution for this route, shown in Figure C.48, appears similar to Route 20X, with two widely distributed modes appearing on either side of the distribution. However, unlike Route 20X, this bimodal distribution was not exclusive to the morning peak period for this route. No events were flagged for trips occurring in this time period.

Figure C.47. Complete travel time distribution for Route 50 for August 2010.

Figure C.48. Morning peak travel time distribution for Route 50 for August 2010.

Similar to the other two routes analyzed here, the midday period, shown in Figure C.49, carried the majority (four of five) of the high-demand trips on this route. However, continuing the trend of Routes 20 and 20X, those high-demand trips are not strongly associated with longer travel times. A majority of the trips clustered around the low end of the travel time distribution occurred during the midday period. Figure C.50 depicts the travel time distribution of trips taken during the afternoon peak period on Route 50. Immediately visible is the apparent relationship between incident events and long travel times, as two of the three longest travel times seen during this month were associated with incidents (the third was associated with a special event).

Figure C.49. Midday travel time distribution for Route 50 for August 2010.

Figure C.50. Afternoon peak travel time distribution for Route 50 for August 2010.

Table C.22 summarizes the contribution of each event condition to all travel times, to those exceeding the 85th percentile (35.9 minutes), and to those exceeding the 95th percentile (37.1 minutes). Although 92.36% of all trips had no associated variability-inducing event, of those trips that exceeded the 85th percentile travel time, the percentage with no variability-inducing event dropped to 91.67%. When limiting the pool to trips that exceeded the 95th percentile travel time, a full 25% of that total can be associated with incidents (although incidents were associated with just 3.18% of all trips). From a planning and operational standpoint, this indicates that there could be some room for reliability improvements by focusing more resources on clearing roadway incidents more quickly along this route to lessen the severity of their impact.

TABLE C.22. TRAVEL TIME VARIABILITY CAUSALITY FOR ROUTE 50

                            Active When Travel Time Exceeded
Source          Active (%)   85th Percentile   95th Percentile
Baseline        92.4         91.7%             75.0%
Special event   0.6          0.0%              0.0%
Incident        3.2          8.3%              25.0%
Demand          5.1          0.0%              0.0%

Conclusion

This use case analysis illustrates one method for exploring the relationship between travel time variability and the sources of congestion. The methods used are relatively simple to perform provided that the transit APC data can be obtained and sufficiently cleaned. The application of the methodology to the three San Diego routes revealed key insights into how this type of analysis should be performed. Of note is the limited sample size used in this analysis. To ensure statistical significance and meaningful analysis, ideally at least three months' worth of data should be used to avoid invalid conclusions due to anomalies. Breaking the travel times down by time of day according to local traffic patterns is valuable, as it isolates the effects of sources of congestion by time of day.
For example, on Route 20 high passenger loadings are associated with longer trip times during the afternoon peak period, but not at other times of day.

Use Case 2: Using Planning-Based Reliability Tools to Determine Departure Time and Travel Time for a Trip

Overview

Perhaps the most commonly occurring use case related to transit data is that of the transit user seeking information about the system for trip-planning purposes. This happens thousands of times each day in cities across the country, and with good reason. The dissemination of traveler information such as real-time arrivals, in-trip guidance, and routing can lead to a more satisfactory transit experience for the user and potentially increase ridership.

Conversely, uncertainty can also have a significant effect on the traveler experience. The agony associated with waiting for transit service has been well documented; research suggests that passengers overestimate the time they spend waiting by a factor of two to three compared with in-vehicle time (3). Meanwhile, driving is often perceived as offering travelers a greater sense of control compared with other modes. Offering transit users accurate and easily accessible information on the transit system, although certainly stopping short of providing direct control over the trip, can give peace of mind to transit riders, reducing uncertainty along with the discomfort of waiting for service. As the reliability of this information improves, so will the experience of transit users. The use of planning-based reliability tools to determine departure times and travel times for a trip therefore has the potential to improve passenger understanding of the state of the transit network, leading to less uncertainty and greater ease of use of the transit system.

Site Characteristics

Transit agencies are rarely able to equip their entire fleets with APC or AVL sensors, making it difficult to conduct a thorough analysis of the entire network. For San Diego's bus network, approximately 40% of buses have APC or AVL sensors installed, though not all of these sensors are fully operational. Due to malfunctioning sensors and limitations in the distribution of APC- or AVL-equipped vehicles, only 30% of San Diego routes are covered by transit vehicles equipped with functioning APC or AVL sensors. Route 30 northbound was chosen for this study primarily because it is the route for which the largest quantity of APC data was available for the period of study (August 2010).
A subset of the route from the Grand Avenue exit on Highway 5 along the coast to the intersection of Torrey Pines Road and La Jolla Shores Drive (8.13 miles) was chosen for this study. For comparative purposes, Route 11 northbound was also examined. This route also contains a comparatively large amount of APC data for August 2010. It travels through the Southcrest neighborhood at 40th Street and National Avenue, west on National Avenue, through downtown, and north on 1st Avenue to University Avenue and Park Boulevard. The total length of the portion of the route analyzed here is 11.68 miles. Both routes are shown in Figure C.51.

Data

The data used in this analysis were obtained from SANDAG. They are APC data collected from August 1 to August 31, 2010, and consist of measurements taken every time the vehicle opens its doors. Each data point contains the following variables, among others:

• Operator ID;
• Vehicle ID;
• Trip ID;
• Route ID;

• Door open time;
• Door close time;
• Number of passengers boarding;
• Number of passengers alighting; and
• Passenger load.

Figure C.51. Analyzed portions of Routes 30 and 11.

Notably absent from these data is any kind of service pattern designation, which is necessary to group similar trips together for comparison purposes. Route ID is not a sufficient level at which to group trips, since a single route often consists of multiple service patterns (e.g., express patterns and alternate termination patterns). This means that the APC data must be preprocessed in order to identify which trip measurements can be grouped into the same service pattern. APC passenger count data are collected by detecting disturbances of dual light beams positioned at the doors of the transit vehicle. Boardings and alightings are detected based on the order in which the beams are broken by a passenger entering or exiting the vehicle. These data can be unreliable because some preprocessing occurs on the sensor itself; specifically, the passenger load is never allowed to drop below zero. For the subset of the Route 30 trip times considered here, scheduled trip times range from 32 to 38 minutes, and scheduled headways range between 13 and 46 minutes (the mean scheduled headway is 21.6 minutes). Approximately 700 vehicle trips over 20 weekdays in August 2010 were analyzed. Of the APC data for this entire route, 50% is imputed. It is necessary to impute data for points for which the measured data

are missing or do not make physical sense. For example, if a given transit stop has no passengers waiting at it, and no riding passengers have requested a stop there, it is common for the transit vehicle to skip this stop. This results in a missing APC data point for that stop that must be imputed. Because this practice is particularly common at the beginnings and ends of runs, for this subset of the route it is expected that the percentage of data imputed is lower than 50%. For the subset of Route 11 trip times considered here, scheduled trip times range from 40 to 56 minutes, and scheduled headways range between 15 and 76 minutes (the mean scheduled headway is 30 minutes). Approximately 850 vehicle trips over 20 weekdays in August 2010 were analyzed. Of the APC data for this route, 53.20% is imputed.

Approach

Most other analyses of AVL and APC data consider transit trip components (e.g., run time, dwell time, and headways) separately (4–7). This can be considered an agency-centric approach, as it attempts to answer questions that a transit system operator may be interested in, such as "How are dwell times affecting on-time performance?" and "What is an appropriate layover time?" In this analysis, the team combined headways and in-vehicle travel times in order to view transit performance measurement from a more passenger-centric perspective. The service experienced by the passenger is studied by focusing the analysis on answering the fundamental passenger question "If I were to go to the bus stop at a certain time, when would I arrive at my destination?" This study assumes that passengers do not plan their transit trips according to real-time or scheduled data, but rather follow a uniform arrival pattern throughout the day, beginning their transit trips independently of the state of the system.
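The passenger-centric question above ("if I reach the stop at time t, when do I arrive?") reduces to: wait for the next bus, then ride it. A sketch follows, assuming that sorted per-run departure times and matching in-vehicle travel times have already been extracted from the APC data (the names are illustrative):

```python
import bisect

def door_to_door_time(arrive_at_stop, departures, run_times):
    """Total passenger travel time (wait + in-vehicle) for a passenger who
    reaches the stop at minute `arrive_at_stop`.

    departures -- sorted list of bus departure minutes at the boarding stop
    run_times  -- in-vehicle travel time (min) of each corresponding run
    Returns None if no further bus departs that day.
    """
    i = bisect.bisect_left(departures, arrive_at_stop)  # next departure
    if i == len(departures):
        return None
    wait = departures[i] - arrive_at_stop
    return wait + run_times[i]
```

Evaluating this function at regular intervals throughout the day, over many days, yields the sawtooth travel time curves analyzed in this use case: passengers arriving just after a departure face the longest waits, and the curve slopes down until the next departure.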
Methods

To begin this validation, the literature was surveyed to determine the recommended planning-based means for calculating the best departure time for a trip in a general way. An appropriate departure time takes into account the variability within the transit system while being calculated in a way that is intuitive and useful to users. The focus group interviews conducted with passenger travelers showed that for daily, unconstrained trips, planning time is the most appropriate metric for passengers. Planning time is a travel time metric that accounts for variability within the system, representing a percentile (often the 85th or 95th) travel time for a trip. That is to say, the planning time for a trip is the travel time that should be budgeted in order for the traveler to be on time a certain percentage of the time. Trip here is taken to mean a pattern of movement between two points at a certain time of day; thus, planning time is always computed based on travel times for a single trip over a range of dates. In order to satisfy this use case and determine the planning time for a transit trip, the travel times for a single trip over a range of days must be found. It is possible to calculate such a table based on APC data alone. To do this, the team took the following steps:

1. A section of 8.13 miles of Route 30 northbound (from the Grand Avenue exit on Highway 5 along the coast to the intersection of Torrey Pines Road and La Jolla Shores Drive) was chosen to analyze for this use case.
2. APC data were used to measure actual travel times for trips along this route beginning every 2 minutes throughout the day. These trips begin independently of the bus schedule.
3. The previous step was repeated for each of the dates in the study range.
4. By using the data garnered from the previous steps, a table was created whose columns are dates, rows are times of day, and values are travel times along this transit route. The PDF distribution of travel times for each of the trips is computed from this table.

The notion of computing such a table of travel times is common in highway performance measurement, but less common for transit performance measurement. Transit performance measurement tends to focus on travel time in relation to a schedule (schedule adherence) rather than absolute travel time. The results of this analysis for August 31, 2010, can be seen in Figure C.52. The troughs correspond to trips that began immediately before the departure of a bus. The peaks represent trips that began just after the departure of a bus. The steadily downward sloping lines following peaks indicate trips that began between bus departures; the trips within a single downward sloping section are related in that they all go on to travel on the same bus, whose arrival is indicated by the following trough. The travel times are complemented by a Marey graph of the trips for this day. It can be seen that the troughs correspond to bus departures. A similar Marey graph and travel time plot, also for August 31, 2010, are shown in Figure C.53 for Route 11 northbound.

Results

Analyzing multiple days yields statistical measures of travel time variability.
Here, 22 weekdays in August 2010 are analyzed following the preceding methodology to obtain Figure C.54, which depicts average travel times and the distribution of travel times along the vertical axis, with darker shading corresponding to higher frequency. All that remains to complete the validation of this use case is to select a desired arrival time and subtract the expected travel time from it. The expected travel time can be extracted from the distributions presented in Figure C.54, and a range of expected travel times are given. Interpolation may be necessary to obtain precise arrival times depending on the sample size. The departure times and travel times resulting from this analysis are presented in Tables C.23 and C.24 for Routes 30 and 11, respectively. Because bus departures are discrete and not continuous events, it is possible that a range of departure times can correspond to a single arrival time. This effect goes away with larger sample sizes.
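The final step, choosing a departure time by subtracting a percentile travel time from the desired arrival time, can be sketched as follows. The table layout (departure minute mapped to travel time samples across days) and the simple order-statistic percentile are assumptions for illustration, not the study's exact procedure.

```python
def planning_time(times, pct=0.85):
    """Percentile travel time from samples of one trip across days
    (simple order-statistic approximation; small samples are coarse)."""
    s = sorted(times)
    return s[min(len(s) - 1, int(pct * len(s)))]

def latest_departure(arrival, samples, pct=0.85):
    """Latest departure minute whose pct-percentile travel time still makes
    the desired arrival minute. `samples` maps departure minute -> list of
    measured travel times (min) over the study days."""
    best = None
    for dep in sorted(samples):
        if dep + planning_time(samples[dep], pct) <= arrival:
            best = dep
    return best
```

Because departures are evaluated on a discrete grid, several candidate departure minutes can satisfy the same arrival time, matching the discreteness effect noted above; interpolation between grid points sharpens the answer as the sample grows.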

TABLE C.23. DEPARTURE TIMES AND TRAVEL TIMES ON ROUTE 30 NORTHBOUND

75th Percentile  75th Percentile  85th Percentile  85th Percentile  95th Percentile  95th Percentile  Arrival
Departure Time   Travel Time      Departure Time   Travel Time      Departure Time   Travel Time      Time
6:54 a.m.        1 h 6 min        6:53 a.m.        1 h 7 min        6:52 a.m.        1 h 8 min        8:00 a.m.
10:06 a.m.       54 min           9:53 a.m.        1 h 7 min        9:45 a.m.        1 h 15 min       11:00 a.m.
2:02 p.m.        58 min           2:00 p.m.        1 h              1:58 p.m.        1 h 2 min        3:00 p.m.
4:03 p.m.        57 min           4:00 p.m.        1 h              3:27 p.m.        1 h 33 min       5:00 p.m.

Figure C.52. Marey graph (top) and passenger travel times (bottom) by time of day for Route 30 on August 31, 2010.

Figure C.53. Marey graph (top) and passenger travel times (bottom) by time of day for Route 11 on August 31, 2010.

TABLE C.24. DEPARTURE TIMES AND TRAVEL TIMES ON ROUTE 11 NORTHBOUND

75th Percentile  75th Percentile  85th Percentile  85th Percentile  95th Percentile  95th Percentile  Arrival
Departure Time   Travel Time      Departure Time   Travel Time      Departure Time   Travel Time      Time
6:26 a.m.        1 h 34 min       —                —                —                —                8:00 a.m.
8:55 a.m.        2 h 5 min        8:41 a.m.        2 h 19 min       8:40 a.m.        2 h 20 min       11:00 a.m.
12:40 p.m.       2 h 20 min       12:38 p.m.       2 h 22 min       12:38 p.m.       2 h 22 min       3:00 p.m.
2:54 p.m.        2 h 6 min        2:52 p.m.        2 h 8 min        2:37 p.m.        2 h 23 min       5:00 p.m.

Conclusion

The most direct analysis would be achieved by restricting the date range to dates with identical schedules; however, in practice it can be rare to find days with the exact same schedule. Regardless, because for routes with headways smaller than 10 minutes it is common for passengers to arrive at bus stops independently of the schedule, the constant arrival pattern used in this simulation may be more meaningful. Agencies should strive either to reduce transit travel times across the day or to establish reliable times of day when the transit travel time can be expected to be low. As seen in the transition between Figure C.52 and Figure C.53, as more days are added to the analysis, the strong peaks correlating to regular bus departures can become obscured if the transit schedule is not regular day to day. This results in the slightly blurry look of the distributions in Figure C.54. However, if a period of study is selected in which the transit schedule is fixed, the troughs will always appear in the same locations, indicating good reliability across days from the transit user's perspective.

Figure C.54. Planning time for trips on (top) Route 30 northbound and (bottom) Route 11 northbound.

Use Case 3: Analyzing the Effects of Transfers on the Travel Time Reliability of Transit Trips

Summary

The goal of this use case is to demonstrate a methodology for quantifying the effects of missed transfers on travel time (and travel time reliability) for a particular transit trip. The likelihood of a transfer being missed is predicted based on three factors: the measured performance of the vehicles on the route, the schedule, and an assumed passenger arrival distribution. In this use case, two transfer trips in San Diego are simulated, and the resulting passenger travel time histograms (accounting for the effects of missed transfers) for each route are presented. The delay applied when a transfer is missed is based on the vehicles' measured performance, as well as the schedule. Practically, this methodology could aid in the identification of a pair of buses whose chronic schedule deviations at a particular location are likely to cause missed transfers. Missed transfers in a transit system are rarely monitored, despite the problems they cause for passengers. In practice, transit systems are most often evaluated according to the performance of individual vehicles, stops, and routes, not the interactions between them. In contrast, the likelihood of a missed transfer occurring depends on combinations of several factors, making it hard to estimate. This use case takes a systems approach to quantify the effects of three distributions (passenger arrival rate, on-time vehicle performance, and schedule-based transfer time) on passenger travel time distributions. Additionally, a sensitivity analysis is used to isolate the effects of changes in each of these three distributions on the percentage of transfers predicted to be missed and the total passenger travel time histogram for the route.
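One simple way to realize the prediction described in this summary is a Monte Carlo draw from the two vehicles' empirical schedule-deviation samples at the transfer stop. This is a sketch of the general idea rather than the study's exact simulation; the inputs and the fixed-slack model are assumptions.

```python
import random

def missed_transfer_rate(feeder_delays, connector_delays, slack_min,
                         trials=10000, seed=1):
    """Estimate the probability that a scheduled transfer is missed.

    feeder_delays    -- observed arrival delays (min) of bus A at the stop
    connector_delays -- observed departure delays (min) of bus B at the stop
    slack_min        -- scheduled minutes between A's arrival and B's departure
    A transfer is missed when the feeder arrives after the connector leaves.
    """
    rng = random.Random(seed)
    missed = sum(
        rng.choice(feeder_delays) > slack_min + rng.choice(connector_delays)
        for _ in range(trials)
    )
    return missed / trials
```

With the missed-transfer probability in hand, the extra delay of a miss (the wait for the next scheduled connecting departure) can be mixed into the trip's travel time histogram to produce the distributions presented in this use case.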
The simulation techniques found in this use case are made possible by the increasing availability of data from APC and AVL systems. These data are typically rich, containing vehicle arrival times and passenger loading information at stops along a route, often accompanied by contextual geographic information to relate records from multiple vehicles. All simulations in this use case are based entirely on APC data from the San Diego bus system, the bus schedule, and an assumed passenger arrival distribution.

Users

The anticipated users of this case study are transit agency operators with an interest in minimizing missed transfers and their negative effects on passenger travel time. Operators of transit agencies with APC data collection systems in place will find guidance on how to use their observed schedule adherence data to identify the predicted rates of transfers missed between a given pair of vehicles. Schedule or route adjustments can then be made to reduce the rate of missed transfers and decrease passenger travel times. Transit passengers are expected to be the prime beneficiaries of this use case. For the passenger, missing a transfer that should have been available according to the schedule is costly in terms of increased travel time and stress. Computer-based trip planners almost exclusively route passengers across transfers based on the transit schedule, not real-time data. Furthermore, trip planners can often recommend routes that transfer at unofficial transfer points. This means that any time a transfer is missed (i.e., the scheduled arrival order of two buses at a stop is reversed due to schedule

deviations), passengers may be affected, even if the transfer was officially untimed. Any efforts to reduce passenger travel times across the system must consider the effects that missed transfers can have on overall system travel times and travel time reliability.

Site

San Diego's transit network is extensive and well connected, containing many transfer points. This makes it an ideal test setting. It includes 88 bus routes and several light rail lines. Most importantly, many buses in this system are equipped with APC equipment to monitor on-time performance. Two routes containing transfers through San Diego were selected for this analysis and are described in Table C.25. These routes were chosen for their popularity with riders as well as their high data coverage rates. Maps of the routes are shown in Figure C.55. Route A is from the Gaslight District to the San Diego Zoo and has a predicted travel time of 39 minutes. Route B is from Sea World to the Birch Aquarium and has a predicted travel time of 55 minutes.

Trip times are simulated from APC data collected on these buses. These data originate from location-tracking devices installed directly in the buses themselves and are based on GPS technology. Each APC device keeps a detailed event-based record of the vehicle's performance as it drives along the route. A data point is created every time the vehicle makes a stop. For this use case, the relevant elements in each data point are as follows:

• The name of the route that the bus was on (e.g., Route 30);
• A unique ID corresponding to the individual run being made;
• A unique ID corresponding to the stop at which the record was made, enabling stops across routes to be cross-referenced;
• The time when the doors opened at the stop;
• The time when the doors closed at the stop; and
• The scheduled time when the stop was supposed to be made.

TABLE C.25.
ROUTE CHARACTERISTICS

Route | Bus A Distance (mi) | Bus B Distance (mi) | Total Distance (mi) | Transfer Location | Estimated Time* (min)
A: Gaslight District to the Zoo | 3.7 | 1.0 | 4.7 | Park Boulevard and University Avenue | 39
B: Sea World to Birch Aquarium | 3.7 | 6.4 | 10.1 | Mission Boulevard and Felspar Street | 55

* Estimated time is from the San Diego MTS trip planner for a trip departing at 10:00 a.m. on a weekday.
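The APC data elements listed above map naturally onto a small record type. The sketch below models one stop event in Python; the class, field, and method names are illustrative (the report lists the fields but not a schema), and the adherence convention (positive minutes = late) matches the distributions used later in this use case.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical schema for one APC stop event, following the fields
# enumerated in the text; names are assumptions, not the agency's.
@dataclass
class ApcRecord:
    route_name: str        # e.g., "Route 30"
    run_id: str            # unique ID for the individual run
    stop_id: str           # unique stop ID, cross-referenced across routes
    doors_open: datetime   # time the doors opened at the stop
    doors_close: datetime  # time the doors closed at the stop
    scheduled: datetime    # scheduled time for the stop

    def departure_adherence_min(self) -> float:
        """Departure schedule adherence in minutes (positive = late)."""
        return (self.doors_close - self.scheduled).total_seconds() / 60.0
```

A record whose doors closed three minutes after the scheduled stop time would report an adherence of 3.0 minutes.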

Methods

This section describes the data preparation and trip time methodologies.

Data Preparation. In order to relate trip times on these transfer routes to the on-time performance of the buses serving them, several issues with the raw APC data must first be addressed. Most critically, the data are not a complete record of all vehicle activity throughout the system. Only a portion of the vehicle fleet is instrumented with APC equipment, and certain routes have higher coverage rates than others. With data available on only a fraction of the runs, gaps in data coverage become problematic, particularly when exploring missed transfers. Because of the missing data, the number of directly observed transfers between two buses at a given stop and time is relatively low, as either the arriving or departing bus will often be uninstrumented. This means that in this setting it is impossible to simply observe the missed transfers and total trip times directly.

To circumvent this problem of incomplete instrumentation, a simulation-based method is used. This method works on the assumption that the on-time performance of the runs for which APC data exist is representative of the on-time performance of all trips. Rather than directly observing on-time performance that would result in a missed transfer, a large number of virtual trips on Routes A and B are simulated based on APC data, an empirical passenger arrival distribution (8), and the schedule.

Figure C.55. Routes A (left) and B (right). Route endpoints are hollow, and the transfer point is filled in.

The APC data contribute distributions of arrival schedule adherence, departure schedule adherence, and travel times for the relevant buses and stops. In order to construct these distributions accurately, only data from runs that serve both the origin and transfer (or transfer and destination) stops should be included. Grouping the data into service patterns facilitates this step. A service pattern is a finer unit of organization than a route and represents a grouping of trips that share the same stops in the same order. Route variations with alternate termination points or express service are examples of distinct service patterns within the same route. To detect service patterns in the data, repeating patterns of stops made by different vehicles within a single route were identified. Runs were then labeled according to the service pattern they represent. Considering APC traces at the service pattern level instead of the route level allows data from trips that do not serve the desired stops to be discarded.

The inclusion of the passenger's arrival time at the origin in the simulated trips means that there are actually two transfers on each route (from walking to Bus A and then from Bus A to Bus B). Thus, the simulated passenger can catch both buses, miss only Bus A, miss only Bus B, or miss both buses. The passenger arrival time distribution is based on a distribution empirically derived by Bowman and Turnquist (8), scaled to the 15-minute headway of Bus A (on both Routes A and B).

The distribution of schedule-based transfer times was constructed based on the daytime weekday schedule for Buses A and B on each route. The transfer times for both routes are irregular because they are untimed. However, despite their irregularity, in each there was some correlation between consecutive transfer times.
For example, if one transfer time was short, the following transfer time was scheduled to be longer. Because of this, missed connections at the transfer point were assessed a travel time penalty corresponding to the transfer time immediately following the one that was missed (rather than another independent sample). This additional travel time is the same as Bus B's headway at that time of day. The relevant distributions used to simulate travel times on Route B are shown in Figure C.56.

Figure C.56. Distributions used to simulate travel times on Route B.

Several of these distributions correlate with each other, affecting how the samples are drawn in each simulation; Figure C.57 shows an example. On both routes, there was some correlation between Bus A's departure time at the origin, Bus A's travel time, and Bus A's departure time at the transfer point. That is to say, a bus that departed late from the origin was more likely to be late when it left the transfer point. Correlation between Bus B's departure time at the transfer point and Bus B's travel time was also found. Because of these relationships between the distributions, simulated trips must not sample values from these related distributions independently. In a single travel time simulation, the values sampled from correlated distributions must come from the same APC trip record because they are related.

Approach to Obtain Trip Times. The procedure for determining a single trip time can be seen in Figure C.58. To begin, values are randomly sampled from the Bus A departure and passenger arrival distributions. These values (both relative to Bus A's scheduled departure at the origin) are then compared to determine whether Bus A is caught. If the departure time for Bus A is greater than the passenger's arrival time, the first bus is caught. Otherwise, the first bus is missed (as a result of the passenger's late arrival, the bus departing early, or some combination of the two). If the bus is missed, a single Bus A headway is added to the total trip time to represent the time spent waiting for the next bus. For both Routes A and B, Bus A maintained regular 15-minute headways during the daytime on weekdays. Once Bus A is caught, the Bus A travel time value from the same data record as Bus A's departure from the origin is added to the total trip time, bringing the virtual passenger to the transfer point.
Whether the transfer is made depends on three things: Bus A's departure time from the transfer point relative to the schedule, the scheduled transfer time, and Bus B's departure from the transfer point relative to the schedule. In order to be conservative and to acknowledge the time required by the passenger to move between buses, the measured time at which Bus A departs from the transfer point is actually used to construct the distribution of Bus A's arrival time at the transfer point.

Figure C.57. Positive correlation between Bus B's departure time at the transfer stop and its arrival time at the destination on Routes A and B.

Figure C.58. Procedure followed to generate travel times.

This represents a worst-case scenario. If Bus A's departure adherence is earlier than the sum of Bus B's departure adherence and the scheduled transfer time, Bus B is caught. Otherwise, Bus B is missed. Because of their correlation, the value used to represent Bus A's arrival at the transfer point is taken from the same run in the APC data as Bus A's schedule adherence at the origin and Bus A's travel time.

If Bus B is missed, then a penalty of one Bus B headway is assessed to the trip time. For both Routes A and B, Bus B's headways were irregular. Because of this, the time until the arrival of the next consecutive Bus B is taken, as opposed to simply sampling another transfer time value or an independent headway from Bus B. After the transfer, the Bus B travel time value from the same run as the sampled Bus B transfer point departure is applied to the total trip time. This completes the simulation, and the total travel time is computed as the sum of its components.

The arrival and departure adherence distributions (passenger arrival time, Bus A's departure time at the origin, Bus A's departure time at the transfer point, and Bus B's departure time at the transfer point) are all expressed as schedule adherence: actual time minus scheduled time. The other travel time and transfer time distributions are magnitudes of time. This process was repeated 10,000 times for each route to obtain travel time histograms that accurately reflect the sample distributions.

Results

This section describes the results for the different routes.

Route A: Gaslight to the San Diego Zoo. A simulation of 10,000 trips on Route A produced the PDF for travel time shown in Figure C.59. The shortest travel time is 22 minutes, and the longest is 96 minutes. The 50th percentile is reached at 47 minutes, and the 95th percentile is reached at 62 minutes.
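The trip-generation procedure described above can be sketched as one iteration of a Monte Carlo loop. The function below is a simplified illustration, not the report's implementation: the record structures, the treatment of waiting time at the transfer point, and the fixed 15-minute Bus A headway are assumptions, while the catch/miss rules and the requirement that correlated quantities come from a single APC run record follow the text.

```python
import random

def simulate_trip(bus_a_runs, bus_b_runs, transfer_times, pax_arrivals,
                  headway_a=15.0):
    """Return one simulated trip time in minutes.

    bus_a_runs: list of (dep_adherence_origin, travel_time, dep_adherence_transfer)
    bus_b_runs: list of (dep_adherence_transfer, travel_time)
    transfer_times: consecutive scheduled transfer times (min), so a missed
        connection is penalized with the *next* scheduled transfer time
    pax_arrivals: passenger arrival samples relative to schedule (min)
    Adherences are minutes past schedule (negative = early).
    """
    total = 0.0
    pax = random.choice(pax_arrivals)
    # Correlated Bus A values are drawn from a single APC run record.
    a_dep_origin, a_tt, a_dep_transfer = random.choice(bus_a_runs)
    if a_dep_origin <= pax:            # Bus A left at or before the passenger arrived
        total += headway_a             # wait one regular Bus A headway
    total += a_tt                      # ride Bus A to the transfer point
    i = random.randrange(len(transfer_times) - 1)
    sched_transfer = transfer_times[i]
    b_dep_transfer, b_tt = random.choice(bus_b_runs)
    # Bus B is caught if Bus A's departure adherence at the transfer point is
    # earlier than Bus B's departure adherence plus the scheduled transfer time.
    if a_dep_transfer < b_dep_transfer + sched_transfer:
        total += sched_transfer        # simplified wait at the transfer point
    else:
        total += transfer_times[i + 1] # penalty: next consecutive transfer time
    total += b_tt                      # ride Bus B to the destination
    return total
```

Repeating this call 10,000 times and histogramming the results yields a travel time PDF of the kind shown in Figures C.59 and C.60.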
The average is 47 minutes. The longest travel time is 104% longer than the mean and 336% longer than the shortest time. Guidance to potential passengers might be that they should expect the trip to take 47 minutes, but one out of every 20 trips will take longer than 62 minutes. The histogram of travel times appears normally distributed, with a portion of the simulated travel times skewed to the right. These trips represent times when a very long in-vehicle travel time was sampled for one of the legs of the trip, not necessarily

trips on which a connection was missed. Further insight into travel times on this route can be gained by dividing the simulated trips into those that made or missed each bus. Table C.26 presents a breakdown of the simulations by scenario. Four out of five simulated trips were able to catch both buses and enjoyed shorter average travel times. The median travel time increased by approximately 9 minutes for each bus that was missed. Surprisingly, the trip time histograms for trips that missed buses were more tightly grouped (and thus had better travel time reliability) than those that made both buses. This observation is discussed in further detail in the following section.

Route B: Sea World to the Birch Aquarium. A simulation of 10,000 trips on Route B produced the PDF shown in Figure C.60. The shortest travel time is 42 minutes, and the longest is 138 minutes. The 50th percentile is reached at 66 minutes, and the 95th percentile is reached at 85 minutes. The average travel time is 67 minutes. Thus, the longest travel time is 109% longer than the mean and 229% longer than the shortest time. Guidance to potential passengers might be that they should expect the trip to take 66 minutes, but one out of every 20 trips will take longer than 85 minutes. The histogram of travel times appears approximately normal, with a longer tail of high travel times.

Figure C.59. Histogram of 10,000 simulated trips on Route A.

TABLE C.26. TRAVEL TIME DISTRIBUTIONS UNDER DIFFERENT TRIP SCENARIOS ON ROUTE A

Scenario | Percentage (%) | Minimum (min) | Median (min) | 95th Percentile (min) | Maximum (min) | Mean (min) | Standard Deviation (min)
Make both | 81.71 | 22 | 45 | 59 | 96 | 46 | 7.78
Miss A, make B | 7.69 | 36 | 54 | 66 | 96 | 54 | 7.05
Make A, miss B | 9.76 | 34 | 52 | 65 | 79 | 53 | 6.84
Miss A, miss B | 0.84 | 42 | 63 | 69 | 71 | 62 | 5.25
Total | 100 | 22 | 47 | 62 | 96 | 47 | 8.25
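Scenario breakdowns of this kind can be produced by tagging each simulated trip with which buses were caught and then summarizing each group. A minimal sketch (the function and label names are illustrative):

```python
from statistics import median
from collections import defaultdict

def scenario_breakdown(trips):
    """Group simulated trips by which buses were caught and summarize.

    trips: iterable of (made_a: bool, made_b: bool, travel_time_min: float)
    Returns {scenario_label: (percent_of_trips, median_minutes)}.
    """
    labels = {(True, True): "Make both", (False, True): "Miss A, make B",
              (True, False): "Make A, miss B", (False, False): "Miss A, miss B"}
    groups = defaultdict(list)
    for made_a, made_b, tt in trips:
        groups[labels[(made_a, made_b)]].append(tt)
    n = sum(len(v) for v in groups.values())
    return {label: (100.0 * len(v) / n, median(v)) for label, v in groups.items()}
```

The same grouping extends directly to the minimum, maximum, mean, and standard deviation columns of the tables.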

On Route B, whether Bus A, Bus B, or both were missed or caught was tracked for the purposes of exploring the effects of missed transfers on travel time. Travel time histograms corresponding to each scenario are plotted in Figure C.61 and described in Table C.27. Missing one or more buses leads to increased travel times on this route, although (as with Route A) the travel time reliability actually improves as well.

This apparent improvement in reliability may be unexpected, but according to Figure C.58, simulated trips that miss Bus A or Bus B are subjected to little or no additional randomness. If Bus A is missed, a predetermined 15-minute headway is added to the trip time. If Bus B is missed, a Bus B headway (ranging between 13 and 16 minutes) is added to the trip time (note that the standard deviation is greater when missing Bus B than when missing Bus A). Thus, the smaller standard deviations when missing buses are attributed to the smaller sample sizes and the presence of outliers in the make-both case. The presence of a few extremely long travel times for Bus A and Bus B on each route contributed to these patterns. With a greater number of simulations catching both buses on each route, more make-both simulated trips had the opportunity to experience an extremely long in-vehicle travel time. Thus, the rare occurrence of an extremely long travel time (roughly twice the average travel time in these data) can have a greater effect than the occasional missed bus. However, it is important to note that trips that miss one or more buses do so unexpectedly, so even though the reliability in those scenarios is improved, the passenger cannot plan for them, and their existence diminishes the reliability of the trip as a whole.
Figure C.60. Histogram of 10,000 simulated trips on Route B.

Discussion

A sensitivity analysis comparing the effects on various measures of travel time (as well as the percentages of simulated passengers who miss one or more buses) on Route B is presented in Table C.28. The baseline case represents the results of the simulation with all distributions unaltered. The passenger arrival distribution, Bus B's departure adherence at the transfer stop, and the scheduled transfer time are then each incrementally

Figure C.61. Travel times when catching and missing buses on Route B.

shifted or scaled, and 10,000 trips with the adjusted distributions are simulated. The scheduled transfer time was held at zero instead of being allowed to go negative. Each of these 15 scenarios is designed to disrupt transfers. However, missed transfers do not directly affect in-vehicle travel time, which makes up the bulk of the total travel time.

TABLE C.27. TRAVEL TIME DISTRIBUTIONS UNDER DIFFERENT TRIP SCENARIOS ON ROUTE B

Scenario | Percentage (%) | Minimum (min) | Median (min) | 95th Percentile (min) | Maximum (min) | Mean (min) | Standard Deviation (min)
Make both | 85.93 | 42 | 65 | 82 | 138 | 66 | 9.22
Miss A, make B | 4.71 | 51 | 73 | 86 | 122 | 74 | 8.10
Make A, miss B | 8.91 | 56 | 75 | 92 | 122 | 76 | 8.53
Miss A, miss B | 0.45 | 69 | 82 | 100 | 117 | 83 | 8.67
Total | 100 | 42 | 66 | 85 | 138 | 67 | 9.75

TABLE C.28. SENSITIVITY ANALYSIS ON ROUTE B

Scenario | Mean (min) | Median (min) | 95th Percentile (min) | Standard Deviation (min) | Make Both (%) | Miss A, Make B (%) | Make A, Miss B (%) | Miss Both (%)
Baseline | 67 | 66 | 85 | 9.75 | 85.93 | 4.71 | 8.91 | 0.45
Pax arrival + 1 min | 67 | 65 | 85 | 9.93 | 82.54 | 7.33 | 9.29 | 0.84
Pax arrival + 2 min | 67 | 66 | 85 | 9.84 | 76.28 | 14.57 | 7.65 | 1.50
Pax arrival + 3 min | 67 | 66 | 86 | 10.17 | 66.49 | 23.90 | 6.95 | 2.66
Pax arrival * 1.2 | 68 | 67 | 87 | 10.03 | 86.86 | 3.38 | 9.35 | 0.41
Pax arrival * 1.4 | 69 | 68 | 88 | 10.46 | 87.05 | 3.24 | 9.30 | 0.41
Bus B departure – 1 min | 68 | 67 | 87 | 10.25 | 83.70 | 3.35 | 12.47 | 0.48
Bus B departure – 2 min | 68 | 67 | 87 | 10.40 | 79.23 | 3.24 | 17.04 | 0.49
Bus B departure – 3 min | 68 | 67 | 86 | 10.21 | 72.53 | 2.87 | 23.64 | 0.96
Bus B departure * 1.2 | 69 | 68 | 88 | 10.35 | 86.65 | 3.77 | 9.09 | 0.49
Bus B departure * 1.4 | 69 | 68 | 89 | 10.62 | 86.11 | 3.70 | 9.83 | 0.36
Scheduled transfer time – 1 min | 68 | 67 | 87 | 10.23 | 83.41 | 3.36 | 12.63 | 0.60
Scheduled transfer time – 2 min | 68 | 67 | 86 | 10.21 | 80.05 | 3.27 | 16.02 | 0.66
Scheduled transfer time – 3 min | 68 | 66 | 86 | 10.25 | 76.51 | 3.05 | 19.54 | 0.90
Scheduled transfer time * 0.8 | 67 | 66 | 85 | 9.77 | 85.08 | 3.37 | 11.02 | 0.53
Scheduled transfer time * 0.6 | 66 | 65 | 84 | 9.65 | 80.09 | 3.64 | 15.61 | 0.66
Note: Pax = passenger.
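Each sensitivity scenario amounts to shifting or scaling one sampled distribution and re-running the 10,000-trip simulation. The perturbation step itself can be sketched as follows; the helper name is hypothetical, and flooring at zero mirrors the report's decision to hold scheduled transfer times at zero rather than letting them go negative.

```python
def perturb(samples, shift=0.0, scale=1.0, floor=None):
    """Shift and/or scale a sampled distribution for sensitivity analysis.

    Scaling is applied about zero (adherence values are already measured
    relative to the scheduled time). With floor=0.0, perturbed scheduled
    transfer times are clamped to be non-negative.
    """
    out = [x * scale + shift for x in samples]
    if floor is not None:
        out = [max(floor, x) for x in out]
    return out
```

For example, the "Pax arrival + 2 min" scenario corresponds to `perturb(pax_samples, shift=2.0)`, and "Scheduled transfer time * 0.6" to `perturb(transfer_samples, scale=0.6, floor=0.0)`.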
For example, when Bus B’s departure was shifted 3 minutes earlier, 15.24%

more passengers missed the second bus, with each of those passengers experiencing a delay of one Bus B headway. However, the mean travel time in this scenario increased by only 1 minute. This could be because the duration of the transfer is a relatively small part of the total trip time on Route B due to its length. Also, shifting departures earlier makes all trips in which the bus is not missed start sooner, decreasing wait times overall and offsetting increases in the mean and median due to missed connections. This suggests that traditional performance metrics (even reliability-based metrics) may not be capable of capturing the full effects of missed transfers.

When the scheduled transfer time is confined to a tighter range, travel time reliability (as measured by standard deviation) increases. This is because the distribution of transfer times has such a wide range on this route (from 1 to 26 minutes) that when those long transfer times are cut nearly in half, as in the (scheduled transfer time * 0.6) case, each simulation benefits equally from reduced transfer times, even though the percentage of passengers who miss Bus B rises.

Conclusion

This use case has leveraged a simulation-based approach to demonstrate the possibility of simulating the percentages of missed transfers on a route based on APC data. These missed transfers could be due to late passenger arrivals, mistimed vehicle arrivals at the transfer point, or a transfer time that is scheduled too short. The impacts of missed transfers on travel time and travel time reliability are explored through a sensitivity analysis. It is concluded that unusually long in-vehicle travel times can have a larger effect on traditional reliability measures than missed transfers, potentially hiding the existence of missed transfers on a route.
Freight Use Case: Using Freight-Specific Data to Study Travel Times and Travel Time Variability Across an International Border Crossing

Overview

Calculating travel time reliability for freight poses unique data challenges and raises the question: How does travel time reliability for freight transportation systems differ from reliability in the overall surface transportation system? From the research performed in this project, the team determined that two primary factors differentiate freight systems from the overall surface transportation system: traveler context and trip characteristics.

Traveler context is a primary differentiator between freight trips and all other surface modes: rather than delivering travelers to a destination, a freight trip delivers goods. Because freight drivers are paid to perform a freight trip, the commercial ecosystem surrounding this concept means that the entire program of scheduling and executing freight trips is much more organized than a typical passenger trip. Thus, freight drivers acquire and use travel time reliability information in a fundamentally different manner than other travelers.

In terms of trip characteristics, freight and overall travel have spatial differences, temporal differences, and facility differences. Spatial differences refer to the fact that origins and destinations with the heaviest freight traffic do not necessarily also have the highest overall traffic volumes. Numerous O–D surveys have been employed to identify high-priority freight corridors, and these can be used to focus freight reliability monitoring efforts. In terms of temporal differences, freight traffic generally does not follow the morning and afternoon peak pattern of passenger travel. In fact, many freight trips are made during off-peak hours to avoid recurrent congestion. Finally, facility differences refer to the existence in some locations of freight-only lanes or corridors, which would need to be monitored separately from general purpose travel lanes.

Given these differences, the project team decided to take a different approach than that taken for the freeway and transit data and instead focus analysis on a very specific freight reliability concern: travel times and reliability across international border crossings.

Data Challenges

This freight use case validation presented a number of data challenges, mostly because distinguishing freight traffic within an overall traffic stream using conventional data sources is difficult. The project team considered estimating freight traffic volumes from single-loop detectors and computing freight reliability statistics using the same methodologies employed in the freeway use case validations. However, these estimates, which rely on algorithms that compare lane-by-lane speeds to estimate truck traffic percentages, were deemed too unreliable to support accurate travel time variability computations.
The team also considered using data from the handful of specialized weigh-in-motion sensors in the region that report vehicle classification data and truck weights, but these were too sparsely located to prove useful for travel time analysis. Because of the unsuitability of traditional traffic monitoring infrastructure for freight reliability calculations, the team's preference was to base the analysis on freight-specific data.

There are troves of data on freight vehicle movements (including data on route reliability) available from one stakeholder group: freight movers themselves. Companies such as Qualcomm and Novacom have developed data systems for freight mover operations. These systems rely on GPS devices outfitted on individual trucks to track position and speed, generally on a subhour basis. Although these data are frequently not fine grained enough to calculate some of the detailed urban reliability information that has been demonstrated elsewhere in this case study, they are adequate for freight movers to understand their travel time reliability environment and to schedule departures appropriately for the just-in-time delivery windows demanded by their customers.

However, this proprietary, competitive information, which freight movers gather on their own operations, is not generally available for studies such as L02. Although these companies have begun to share these data with some partners (such as third-party data providers), these deals are struck under terms of strict confidentiality and anonymity. There are some recent efforts to leverage this information for public sector agency analysis, such as the border crossing work at Otay Mesa by the Federal

Highway Administration (FHWA) described in the following section, but these efforts are still in the research phase and are not feasible for public sector agencies to put into operational practice.

There is a strong overlap between freeway and arterial data systems and the data required to understand reliability in freight systems, as freight vehicles are generally part of the overall traffic stream. Because of this, they share the same overall reliability characteristics of the freeway and arterial systems as a whole. However, in many cases, the data required to understand freight movements are scarcer than the data needed to understand the overall transportation system, simply because freight-related data pertain to a small percentage of overall trips in a given region.

The project team was fortunate to be given access by FHWA to freight-specific GPS data collected at the Otay Mesa truck-only border crossing facility from Mexico into the United States. Thus, although this use case validation has a narrow geographic scope, it explores a major issue in freight travel.

Site

The Otay Mesa crossing has a truck-only facility that, during peak season (October to December), provides access to the United States to approximately 2,000 trucks per day. The crossing is equipped to handle trucks that participate in the Free and Secure Trade (FAST) expedited customs processing program, as well as those required to undergo standard processing. U.S.-bound trucks pass through Mexican export processing prior to entering the United States and are required to be screened at a CHP commercial vehicle weighing and inspection station before accessing U.S. roadways.

For travel time analysis, the Otay Mesa crossing was broken into 10 districts, as shown in Figure C.62. These districts are

1. Export approach;
2. Departure east;
3. Departure west;
4. Mexico customs;
5. U.S. primary;
6. U.S. secondary gate;
7. U.S. secondary;
8. Secondary departure;
9. CHP approach; and
10. CHP inspection.

Data

As part of an FHWA project (9), data were collected at Otay Mesa from 175 trucks passing through the crossing repeatedly from December 2008 through March 2009. The total number of crossings for GPS-equipped trucks ranged from 5% to 12% of the total population of trucks passing through the Otay Mesa crossing. The resulting

Figure C.62. Otay Mesa border crossing district map. Source: Delcan Corporation.

data set contained 900,000 individual points. A number of these data points were outside the crossing analysis zone and thus were discarded prior to analysis. In addition, almost 30% of trip records contained no travel times, making them unusable for freight reliability analysis. Analysis was performed on the remaining 300,000 individual points, or a third of the total data set.

The Otay Mesa data were used for two types of reliability analysis: evaluation of the reliability within and across different districts, and evaluation of the reliability associated with different types of inspections. For the district-level analysis, one data complication is that the quantity of reported travel times varies by district. Most individual districts have tens of thousands of travel time records, as shown in Figure C.63. However, few trip records (0.07%) contain travel times for all districts. The sparseness of these data makes it challenging to monitor travel times across groups of districts. For example, analyzing the travel time reliability between Districts 4 and 7 requires a large set of trips with data points within both Districts 4 and 7. Figure C.64 shows the number of trips that spanned multiple districts. Those with zero districts indicate trips for which data points were all outside of the geographical analysis range.

Results

As outlined in the data section, analysis focused on investigating reliability across Otay Mesa districts and for vehicles subjected to different inspection types. The results of each type of analysis are detailed below.

Figure C.64. Otay Mesa trips spanning multiple districts.

Figure C.63. Otay Mesa GPS points by district.

District Reliability. To understand which geographical segments of the border crossing have the most travel time variability, the research team assembled the TT-PDFs for trips within each of the 10 individual districts and for two trips spanning multiple districts. The PDFs for Districts 1 through 6 are shown in Figure C.65, and the PDFs for Districts 7 through 10 are shown in Figure C.66. All of the PDFs are plotted on the same x-axis scale to facilitate comparison. These data are also summarized into

Figure C.65. TT-PDFs for Districts 1 through 6.

median, standard deviation, and 95th percentile travel times by district in Table C.29. From the plots, the district that stands out as having the most travel time variability is District 7 (U.S. secondary inspection). From the distribution, it appears that the most frequently occurring travel time through District 7 is about 15 minutes, but the trip can regularly take longer than an hour. The median travel time through this district is only 20 minutes, but the 95th percentile travel time is 90 minutes. Districts 1, 2, 3, 4, 8, and 10 also all have 95th percentile travel times at or greater than 1 hour, which is significantly higher than their median travel times of less than 10 minutes. The district with the most reliability is District 9 (CHP inspection approach). Here, the median travel time is only 12 seconds, with a 95th percentile travel time of 2 minutes.

Figure C.66. TT-PDFs for Districts 7 through 10.

TABLE C.29. DISTRICT-BY-DISTRICT TRAVEL TIMES AND VARIABILITY

District | Median Travel Time (min) | Standard Deviation (min) | 95th Percentile Travel Time (min)
1 | 4 | 27 | 65
2 | 4 | 21 | 59
3 | 7 | 20 | 56
4 | 5 | 24 | 68
5 | 5 | 7 | 22
6 | 3 | 16 | 29
7 | 21 | 32 | 90
8 | 1 | 83 | 65
9 | 0.2 | 8 | 2
10 | 7 | 36 | 87

The research team also looked at the travel times for trucks to get from District 1 to District 6 (the gate to the U.S. secondary inspection) and from District 1 to District 10. The PDFs for these two trips are shown in Figure C.67, and the results are summarized into the median, standard deviation, and 95th percentile travel times in Table C.30.

TABLE C.30. CROSS-DISTRICT TRAVEL TIMES AND VARIABILITY

Trip | Median Travel Time (min) | Standard Deviation (min) | 95th Percentile Travel Time (min)
Districts 1 to 6 | 37 | 40 | 132
Districts 1 to 10 | 50 | 48 | 157

The most commonly occurring travel time between District 1 and District 6 is slightly less than half an hour, though a significant number of trips can take upward of 1 or 2 hours. The median travel time for this trip is 37 minutes, but the 95th percentile travel time is 2 hours and 12 minutes. The median travel time to pass through the Otay Mesa crossing (as represented by the District 1 to 10 travel time samples) is only 50 minutes, but 5% of trips experience travel times exceeding 2.5 hours.

Checkpoint Reliability. The team also considered the average travel times and travel time variability of trucks passing through certain combinations of checkpoints at different times of the day. As described in the freight data section, many of the freight GPS data records included information on which checkpoints a truck had to pass through while making its trip. Although all trucks have to go through certain checkpoints (Mexican exports, U.S. inspection, and CHP inspection), some trucks are subjected to additional inspections (Mexican secondary inspection or U.S. secondary inspection, or both).
These were used to calculate travel times and reliability for each hour of the day for different checkpoint combinations.
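Reliability measures of this kind reduce to simple aggregations over per-trip travel time samples. A minimal sketch using pandas; the trip records and column names are hypothetical, not the actual Otay Mesa data:

```python
import pandas as pd

# Hypothetical per-trip records: one row per truck traversal of a district,
# with the observed travel time in minutes.
trips = pd.DataFrame({
    "district":    [7, 7, 7, 7, 9, 9, 9, 9],
    "travel_time": [15.0, 20.0, 25.0, 95.0, 0.2, 0.2, 0.3, 2.0],
})

# Median, standard deviation, and 95th percentile travel time per district,
# mirroring the columns of Table C.29.
stats = trips.groupby("district")["travel_time"].agg(
    median="median",
    std="std",
    p95=lambda s: s.quantile(0.95),
)
print(stats)
```

The same pattern, grouped by departure hour or checkpoint combination instead of district, yields the hour-of-day summaries described above.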

Figure C.67. Cross-district travel time PDFs.

Approximately 15% of trucks that use the crossing qualify for FAST status, which means that although they have to pass through all the required checkpoints, they can do so in designated FAST lanes (1). Figure C.68 shows, for all days over which data were received, the total number of sampled FAST-lane trucks that traveled during each hour and did not have to stop for any secondary inspections, the average travel time they experienced, and the standard deviation in the travel times they experienced. The data represent over 3,500 records of vehicles that made FAST-lane trips. As the plot shows, travel times and travel time variability are actually highest in the early morning hours, when the fewest sampled trucks were traveling. This may be because drivers are resting or because fewer staff are available to perform inspections. Use of the FAST lanes peaks at around noon and between 4:00 and 6:00 p.m. Average travel times are fairly steady throughout the day, hovering at or slightly above 1 hour. The standard deviation of the travel times also remains steady at 40 to 50 minutes, meaning that FAST-lane border crossings fairly frequently take almost 2 hours.

Figure C.68. FAST-lane truck counts, average travel times, and standard deviation travel times by hour.

Figure C.69 shows the same plot for 7,400 non-FAST trucks that were selected for U.S. secondary inspections. As in the FAST lanes, travel times are highest during the early morning hours. Throughout the rest of the day, travel times are steady, but are 20 to 30 minutes higher on average than the FAST travel times.

Figure C.69. U.S. secondary inspection truck counts, average travel times, and standard deviation travel times by hour.

Figure C.70 shows the hourly vehicle counts and travel times for FAST trucks that were selected for a U.S. secondary inspection. Interestingly, average travel times for FAST vehicles going through a U.S. secondary inspection are actually higher (between 90 and 100 minutes) during most hours than they are for non-FAST vehicles (between 80 and 90 minutes) going through a U.S. secondary inspection. The standard deviations of travel times for both types of trips are approximately the same.

Figure C.70. FAST U.S. secondary inspection truck counts, average travel times, and standard deviation travel times by hour.

Conclusions

This freight use case validation represents an initial use of the Otay Mesa truck travel time data to evaluate travel time reliability for different aspects of a border crossing. The research analyzed and compared travel time reliability across different physical sections of a freight-only border crossing, as well as for different combinations of inspection points passed through by individual trucks. By understanding where the bottlenecks are in the border crossing process and how they are affecting travel times and reliability, managers can begin to take steps to improve operations: for example, adding lanes to capacity-restricted locations or adding staff to checkpoints that affect reliability during peak hours of the day. Extensions of the district-level analysis would group travel times by hour of the day to explain not just where travel time variability is high, but also when. Extensions of the checkpoint-based analysis would look at travel time reliability for different days of the week and for different seasons, because truck border crossings have strong temporal patterns that affect the underlying reliability analysis.

LESSONS LEARNED

Overview

During this case study, the research team focused on fully using a mature reliability monitoring system to illustrate the state of the art for existing practice. This was possible because of many years of coordinated efforts by transportation agencies in the region, led by SANDAG and Caltrans. These efforts put in place a large sensor network, developed the software to process the data from these sensors, and created the institutional processes to utilize this information. Because this technical and institutional infrastructure was already in place, the team focused on generating sophisticated reliability use case analyses. The rich, multimodal nature of the San Diego data presented numerous opportunities for state-of-the-art reliability monitoring, as well as challenges in implementing Guide methodologies on real data.

Methodological Advancement

For methodological advancement, the team used data from the Berkeley Highway Laboratory section of I-80. This section is valuable because it has colocated dual-loop detectors and Bluetooth sensors. This data set provided an opportunity for the team to begin to assemble regimes and TT-PDFs from individual vehicle travel times.
These TT-PDFs are needed to support motorist and traveler information use cases. Because the majority of the other case study sites would not provide data on individual traveler variability, the availability of these data was important because it let the research team study the connection between individual travel time variability and aggregated travel times, and whether the former can be estimated from the latter. In general, it was pos- sible to divide the system into specific travel regimes, but the team was not able to harmonize these two types of data. Transit Data The biggest data challenge in this case study validation was processing the transit data, which are stored in a newly developed performance measurement system. This case study represented the first research effort to use these data and this system. The team found that data quality is a major issue when processing transit data to compute travel times. Many of the records reported by equipped buses had errors that had to be programmati- cally filtered out. Errors had various causes. Some buses reported that they were on one route, but were serving a completely different set of stops. GPS malfunctions resulted in erroneous locations. Passenger count sensors failed and left holes in the data.
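Checks like these can be implemented as rule-based filters applied to each automatic vehicle location (AVL) record before travel times are archived. The sketch below is illustrative only: the field names, service-area bounds, and route-to-stop mapping are assumptions, not details of the actual transit system.

```python
# Rule-based screening of AVL records before travel times are archived.
# Field names, coordinate bounds, and route definitions are illustrative.
def is_valid(record, route_stops):
    """Return True if an AVL record passes basic quality checks."""
    # Reject buses reporting one route while serving another route's stops.
    if record["stop_id"] not in route_stops.get(record["route_id"], set()):
        return False
    # Reject obvious GPS malfunctions (coordinates outside the service area).
    if not (32.5 <= record["lat"] <= 33.3 and -117.4 <= record["lon"] <= -116.8):
        return False
    # Reject holes left by failed passenger count sensors.
    if record["passenger_count"] is None or record["passenger_count"] < 0:
        return False
    return True

route_stops = {"R10": {"s1", "s2"}, "R20": {"s3"}}
records = [
    {"route_id": "R10", "stop_id": "s1", "lat": 32.7, "lon": -117.1, "passenger_count": 12},
    {"route_id": "R10", "stop_id": "s3", "lat": 32.7, "lon": -117.1, "passenger_count": 5},    # wrong route
    {"route_id": "R20", "stop_id": "s3", "lat": 0.0,  "lon": 0.0,    "passenger_count": 3},    # GPS error
    {"route_id": "R20", "stop_id": "s3", "lat": 32.8, "lon": -117.0, "passenger_count": None},  # sensor hole
]
clean = [r for r in records if is_valid(r, route_stops)]
print(len(clean))  # 1 record survives
```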

After the identification and removal of these data points, assembling route-based reliability statistics using a drastically reduced subset of good data presented the next challenge. This limited the number of routes that the research team could consider, as not all trips on all routes are made by equipped buses, and trips made by equipped buses contain holes due to erroneous data records. From this experience, the research team concluded that transit travel time reliability monitoring requires a robust data processing engine that can programmatically filter data to ensure that archived travel times are accurate. In addition, transit reliability analysis requires a long timeline of historical data, because, typically, a subset of buses is monitored and a large percentage of obtained data points will prove invalid.

Seven Sources Analysis

From a use case standpoint, the research team was challenged to find the best ways to leverage the unique data available in San Diego to demonstrate use cases that might not be possible to explore at other sites. For the freeway studies, the research team focused on relating travel time variability with the seven sources of congestion, as this data set was unique to San Diego and the results have high value to planners and operators. In the past, the team developed a sophisticated statistical model that can estimate the percentage of a route's buffer time attributable to each source of congestion. This model has been documented in Chapter 3 of the Guide. In this case study, the team opted to pursue a less sophisticated but more accessible approach that develops TT-PDFs for each source using a simple data tagging process. This approach was selected because it provides meaningful and actionable results without requiring agency staff to have advanced statistical knowledge.
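The tagging approach amounts to labeling each archived travel time with the congestion source active when it was observed, then building an empirical PDF per label. A minimal sketch; the source names, travel times, and bin choices are hypothetical:

```python
import numpy as np

# Hypothetical 5-minute route travel times (minutes), each tagged with the
# congestion source judged to be active when it was observed.
observations = [
    ("baseline", 12.0), ("baseline", 13.0), ("baseline", 12.5),
    ("incident", 25.0), ("incident", 31.0),
    ("weather",  18.0), ("weather",  20.0),
]

# Group the travel times by tag.
by_source = {}
for source, tt in observations:
    by_source.setdefault(source, []).append(tt)

# Empirical TT-PDF per source: a normalized histogram over shared bins,
# so the curves for different sources can be compared directly.
bins = np.arange(10.0, 40.0, 5.0)  # 5-minute-wide bins
pdfs = {
    source: np.histogram(times, bins=bins, density=True)[0]
    for source, times in by_source.items()
}
for source, density in pdfs.items():
    print(source, density.round(3))
```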
Conclusions

The San Diego case study validation provided the first opportunity for the team to test Guide recommendations, implement advanced methodologies, and formally respond to use cases.

REFERENCES

1. Kwon, J., B. Coifman, and P. Bickel. Day-to-Day Travel-Time Trends and Travel-Time Prediction from Loop-Detector Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 1717, TRB, National Research Council, Washington, D.C., 2000, pp. 120–129.
2. Rice, J., and E. van Zwet. A Simple and Effective Method for Predicting Travel Times on Freeways. IEEE Transactions on Intelligent Transportation Systems, Vol. 5, No. 3, 2004, pp. 200–207.
3. Mohring, H., J. Schroeter, and P. Wiboonchutikula. The Value of Waiting Time, Travel Time, and a Seat on a Bus. Rand Journal of Economics, Vol. 18, No. 1, 1987, pp. 40–56.

4. Berkow, M., A. El-Geneidy, R. Bertini, and D. Crout. Beyond Generating Transit Performance Measures: Visualizations and Statistical Analysis with Historical Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 2111, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 158–168.
5. Bertini, R. L., and A. M. El-Geneidy. Modeling Transit Trip Time Using Archived Bus Dispatch System Data. Journal of Transportation Engineering, Vol. 130, No. 1, 2004, pp. 56–67.
6. Bertini, R. L., and A. M. El-Geneidy. Generating Transit Performance Measures with Archived Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 1841, Transportation Research Board of the National Academies, Washington, D.C., 2003, pp. 109–119.
7. Pangilinan, C., A. Moore, and N. Wilson. Bus Supervision Deployment Strategies and the Use of Real-Time Automatic Vehicle Location for Improved Bus Service Reliability. In Transportation Research Record: Journal of the Transportation Research Board, No. 2063, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 28–33.
8. Bowman, L. A., and M. A. Turnquist. Service Frequency, Schedule Reliability and Passenger Wait Times at Transit Stops. Transportation Research Part A, Vol. 15A, No. 6, 1981, pp. 465–471.
9. Delcan Corporation. Measuring Cross-Border Travel Times for Freight: Otay Mesa International Border Crossing Final Report. Report No. FHWA-HOP-10-051. Federal Highway Administration, Washington, D.C., September 2010.

Case Study 2

NORTHERN VIRGINIA

This case study provides an example of a more traditional transportation data collection network operating in a mixture of urban and suburban environments. Northern Virginia was selected as a case study site because it provided an opportunity to integrate a reliability monitoring system into a preexisting, extensive data collection network. The focus of this case study was to describe the required steps and considerations for integrating a travel time reliability monitoring system (TTRMS) into existing data collection systems. The purpose of this case study was to

• Describe the data acquisition and processing steps needed to transfer information between the existing system and the Performance Measurement System (PeMS) reliability monitoring system.
• Demonstrate methods to ensure data quality of infrastructure-based sensors by comparing probe vehicle travel times using the procedures described in Chapter 3.
• Develop multistate travel time reliability distributions from traffic data.

The monitoring system section details the reasons for selecting Northern Virginia as a case study and gives an overview of the region. It briefly summarizes agency monitoring practices, discusses the existing sensor network, and describes the software system that the team used to analyze use cases. The section also details the development of travel time reliability software systems and their relationships with other systems. Specifically, it describes the steps and tasks that the research team completed to transfer data from a preexisting collection system into a TTRMS.

Methodology describes the implementation of a multistate travel time reliability model, developed by the SHRP 2 L10 research team, using the Northern Virginia freeway data.
It is intended to showcase a tractable method for assembling travel time probability density functions from historical travel time data, as well as to highlight the connections of this project to others under the SHRP 2 umbrella. This method was selected for emphasis in the present case study because the original work was performed using model-generated travel times from the same I-66 corridor being monitored as part of this case study. Work on refining the Bayesian travel time reliability calculation methodology outlined in Chapter 3 and introduced in the San Diego case study will resume as part of the final three case studies.

Use cases are less theoretical and more site specific. Their basic structure is derived from the user scenarios described in Appendix D, which in turn are based on a series of interviews with transportation agency staff regarding agency practice with travel time reliability. Since the focus of this case study is to describe the required steps and considerations for integrating a TTRMS into existing data collection systems, only one use case is described in this case study.

Lessons learned summarizes the lessons learned during this case study with regard to all aspects of travel time reliability monitoring: sensor systems, software systems, calculation methodology, and use.

MONITORING SYSTEM

Site Overview

The team selected Northern Virginia to provide an example of a more traditional transportation data collection network operating in a mixture of urban and suburban environments. The Northern Virginia (NOVA) District of the Virginia Department of Transportation (VDOT) includes over 4,000 miles of urban, suburban, and rural roadways in Fairfax, Arlington, Loudoun, and Prince William counties. Figure C.71 shows a map of the Northern Virginia District.

Figure C.71. Map of the NOVA District. Source: Virginia Department of Transportation.

Traffic operations in the district are overseen from the NOVA Traffic Operations Center (TOC), which manages more than 100 miles of instrumented roadways, including high-occupancy vehicle (HOV) facilities on I-95/I-395, I-295, I-66, and the Dulles Toll Road. To support these activities, the TOC has deployed a range of intelligent transportation system technologies, including

• 109 cameras;
• 222 dynamic message signs;
• 24 gates on I-66 HOV lanes for use during peak travel hours;
• 21 gates on I-95/I-395 for reversible HOV lanes;
• 25 ramp meters on I-66 and I-395;
• 30 lane control signals;
• 23 vehicle classification stations; and
• Approximately 250 traffic sensors (see Figure C.72 for deployment locations).

Overall, the NOVA TOC is a high-tech communications hub that manages some of the nation's busiest roadways. Its systems collect, archive, manage, and distribute data and video generated by these resources for use in transportation administration, policy evaluation, safety, planning, performance monitoring, program assessment, operations, and research applications. An archived data management system has been developed by the University of Virginia Smart Travel Lab to support VDOT in conducting these activities.

TOC staff use dynamic message signs and highway advisory radio sites to alert commuters about changing traffic conditions. Commuters and other travelers can also tune to AM 1620, call the Highway Helpline at 1-800-367-ROAD (7623) for real-time traffic information, or view the road conditions map on 511 Virginia.

Figure C.72. Locations of NOVA District freeway-based traffic sensors. Source: Virginia Department of Transportation.

VDOT's management strategy has undergone a dramatic change in the last few years, transitioning from a two-pronged build–maintain regime to a three-pronged build–operate–maintain scheme. As such, VDOT is evolving into a customer-driven organization with a focus on outcomes and a 24/7 performance orientation. As part of these efforts, VDOT has developed four Smart Travel goals:

• Enhance public safety;
• Enhance mobility;
• Make the transportation system user friendly; and
• Enable cross-cutting activities to support Goals 1 to 3.

These goals are geared toward providing better services to NOVA District customers by improving the quality of their travel and responding promptly to their issues. The focus is on attaining greater operating efficiencies from existing roadway infrastructure as an alternative to building additional capacity. The NOVA Smart Travel vision is as follows: "Integrated deployment of Intelligent Transportation Systems will help NOVA optimize its services, supporting a secure multimodal transportation system that improves quality of life and customer satisfaction by ensuring a safer and less-congested transportation network."

As part of its activities, the NOVA District has significant interaction with agencies in the District of Columbia and Maryland (in particular in Montgomery and Prince George's counties). Various federal, state, and local transportation stakeholders, including transit, police, emergency, medical, and other agencies, also play important roles in operating and managing area roadways and other regional transportation systems. There has been a recent push within the region to strive toward increased regional coordination and interoperability.
To that end, a regional coordinating entity called Capitol Region Communications and Coordination, or CapCOM, has been created to focus on collecting data from a variety of sources to facilitate the creation of an overall picture of regional traffic.

Due to the major transportation-related construction that began in the region during 2008, which is anticipated to continue through 2011, mitigation of construction-related congestion is a major focus for the district. Major projects are concurrently occurring, including

• Construction of 14 miles of high-occupancy toll lanes on I-495;
• Construction of 56 miles of high-occupancy toll lanes on I-395/I-95;
• Widening of I-95 between Newington and Dumfries;
• Widening of I-495; and
• Roadway improvements at the I-495/Telegraph Road interchange.

Sensors

Northern Virginia suffers from severe road congestion, and it is generally considered one of the most congested regions in the nation. To help alleviate gridlock, VDOT encourages use of Metrorail, carpooling, and other forms of mass transportation. Major limited-access highways include I-495 (the Capital Beltway), I-95, I-395, and I-66; the Fairfax County Parkway and Franconia–Springfield Parkway; the George Washington Memorial Parkway; and the Dulles Toll Road. HOV lanes are available for use by commuters and buses on I-66, I-95/I-395, and the Dulles Toll Road. A portion of the region's HOV lanes have been designed to be reversible, accommodating traffic flow heading north and east in the morning and south and west in the afternoon.

VDOT operates five regional TOCs: Northern Virginia, Hampton Roads, Richmond, Staunton, and Salem. At the core of each VDOT TOC is an advanced transportation management system that controls each region's field devices and manages information associated with the operation of the roadway network. Operators at each TOC monitor traffic and road conditions on a continuous basis via closed-circuit television cameras, vehicle detection infrastructure, and road weather information sensors. In Northern Virginia, VDOT has deployed an extensive network of point-based detectors (primarily inductive loops and radar-based detectors) to facilitate real-time data collection on freeways. Volume, occupancy, and (limited) speed data are collected from these detectors and used by NOVA TOC staff to manage traffic and incidents and provide information to motorists regarding current conditions. The breakdown of NOVA data sources is as follows:

• Multiple types of traffic sensors have been deployed along I-95, I-495, I-395, and I-66, including inductive loop detectors, remote traffic microwave sensor radar, magnetometers, SmartSensor digital radar, and SAS-1 sensors.
• Trichord has deployed acoustic sensors on I-95, I-395, I-495, and I-66.
• Traffic.com has deployed sensors on I-495, I-395, I-66, and the Dulles Toll Road.
TOC operators also enter incident data, planned events and work zones, and weather events into a web-based application called the Virginia Traffic Information Management System (VaTraffic), a statewide traffic information management and conditions reporting system developed by VDOT to provide an efficient, integrated platform for managing activities that affect the quality of travel experienced by motorists. VaTraffic comprises a suite of applications that VDOT staff use to manage planned events, such as roadway maintenance, and unplanned events, such as traffic accidents and heavy congestion, and to provide information for use by other VDOT systems. VaTraffic information is shared with the public, VDOT management, and other key state and local emergency response agencies.

Although a number of major Interstate roadways pass through the NOVA region, including I-95, I-495, I-395, and I-66, for the purposes of this study the team conducted analyses exclusively on I-395 and I-66, the two primary entry–egress interstates southwest of Washington, D.C.

On I-66 and I-395, point detectors are placed at approximately half-mile intervals. Due to accuracy and maintainability issues with inductive loop detectors and other older sensors, there are no plans to replace failed units that have been deployed on the mainline lanes. Instead, plans are in motion to transition to the use of nonintrusive radar-based detection technologies. These sensors are being deployed both as replacements for older failed units and at all locations where detection infrastructure is being deployed for the first time. As a result of a combination of older loop detector station failures, ongoing roadway construction, and the need to configure many of the newer radar-based units, data are available for only about 75 of the detectors. Figure C.73 provides a visual indication of the availability of data on I-66 and I-395; lighter-colored icons indicate working stations, and darker icons indicate nonworking stations.

Data Management

NOVA TOC staff use a regional freeway management system to monitor and manage traffic data from the advanced transportation management system, respond to incidents, and disseminate traveler information. The freeway management system is linked to VaTraffic. The data from these three sources are made available via a data gateway.

The data gateway was first deployed in VDOT in 2004 as an interconnection between the Virginia State Police (VSP) and the Richmond Traffic TOC. It has since grown into a statewide network that is used to exchange critical information. The data gateway is an XML Publish and Subscribe network fully compliant with the Emergency Data Exchange Language (EDXL) standard, providing the maximum degree of interoperability between systems. The data gateway allows a number of diverse systems to share data, including the following:

• VaTraffic uses the data gateway to exchange information with nearly 1,500 statewide users, the 511 Interactive Voice Response and web applications, and other Virginia DOT systems. VaTraffic publishes information for incidents, planned events, road conditions, snow conditions, and bridge schedules.
• OpenTMS, which is deployed in the North, Central, Northwest, and Southwest TOCs, publishes information concerning incidents and dynamic message sign messages. In the future, OpenTMS is planned to provide information on weather sensors, work zones, HOV gate control, and other lane control data.
Figure C.73. Map of working versus nonworking sensor stations at the time of this study. Map data © 2012 Google.

• VSP has used the data gateway to share VSP data since 2004. Data entered into the VSP computer-aided dispatch system are shared in real time with all participating TOCs.

VDOT currently reports on roadway conditions via various performance-related products, including its quarterly report, web-based Performance Dashboard, and bimonthly performance reports to the VDOT Commissioner (internal). The VDOT Performance Dashboard (http://dashboard.virginiadot.org/) provides a range of transportation performance–related data, including

• Travel times on key commuter routes;
• Congestion along interstates;
• HOV travel speeds;
• Incident duration; and
• Annual hours of delay.

Performance measurement, which has become an important function within VDOT, enables TOC engineers and operators to identify, measure, and report the status of both the freeway system and individual facilities at different geographic (spatial) and temporal scales.

System Integration Overview

For the purposes of this case study, data from NOVA's data collection network and system were integrated into a developed archived data user service and TTRMS. The steps and challenges encountered in enabling the information and data exchange between these two large and complex systems are described in detail in this section. The goal of this section is to provide agencies with a real-world example of the resources needed to accomplish data collection and monitoring system integration, and the likely challenges that will be encountered when procuring a monitoring system.

This section first describes the source system (VDOT's data collection system) and the reliability monitoring system (PeMS). It then describes the data acquisition and processing steps needed to transfer information between the two systems. Finally, it summarizes findings and lessons learned.
Source System

VDOT's Northern Region Operations site receives detector data from two systems: one that collects data along part of I-66 and one that collects data for the rest of I-66 and I-395. These two data streams are integrated into a standardized format in a single text file that is generated every minute. This text file is passed in real time to the Regional Integrated Transportation Information System (RITIS), which is developed and maintained by the CATT Laboratory at the University of Maryland. RITIS, without further processing the data, parses the text file and puts it into an XML document that is updated every minute on a page of the RITIS web site. Access to this web page is limited to preapproved Internet protocol (IP) addresses. These real-time detector data XML documents are the primary traffic data source for NOVA PeMS. When data quality proved to be a major issue impeding the study of reliability on the 2011 data, largely due to recent construction on monitored roadways, the research team also acquired a database dump of detector data along I-66 and I-395 for the entire year of 2009 from the CATT lab.

Reliability Monitoring System

PeMS is a traffic data collection, processing, and analysis tool that extracts information from real-time intelligent transportation systems data, saves it permanently in a data warehouse, and presents it in various forms to users via the web. PeMS can calculate many different performance measures; the requirements for linking PeMS with an existing system depend on the features being used. Because the function of PeMS in this case study was to collect traffic data from point detectors, quality control it, generate and store travel times, and report reliability statistics, PeMS needed the following data from the source systems:

• Metadata on the roadway linework of facilities being monitored;
• Metadata on the detection infrastructure, including the types of data collected and the locations of equipment; and
• Real-time traffic data in a constant format at a constant frequency (such as every 30 seconds or every minute).

The foundation of PeMS is the traffic detector, which reports at least two of the three fundamental parameters that describe traffic on a roadway: flow, occupancy, and speed. Detectors report or are polled for data in real time at a predefined time interval. In PeMS, detectors have a location denoted by a freeway number, direction of travel, latitude and longitude, and a milepost that marks the distance of a detector down a freeway.
Each detector is assigned a unique ID that remains with it throughout time and can never be assigned to another detector, even if the original detector is removed. Every detector belongs to a station, which is a logical grouping of detectors that monitor the same type of lane (e.g., mainline versus HOV) along the same direction of freeway at the same location. Each station has a unique ID, a type (e.g., mainline, HOV, ramp), a number of lanes, and a corresponding set of detectors. The final pieces of equipment in the PeMS framework are controllers, which are located along the roadside and collect data from one or more stations. They have a latitude–longitude and mile marker location, as well as a set of corresponding stations. This hierarchy—a controller collecting data from stations composed of detectors—gives structure to the roadway instrumentation configuration, making it easy to spatially aggregate data and diagnose problems in the data collection chain, such as a broken detector or controller or a failed communication line.

PeMS collects detector data, either by directly polling each detector or obtaining it from an existing data collection system, in real time and stores it in an Oracle database. The raw data are permanently stored in a raw database table, and they are also aggregated up to the 5-minute level, at which point PeMS computes the average 5-minute speed for detectors that transmit flow and occupancy, as well as the average 5-minute occupancy for detectors that transmit flow and speed. These data are stored in a 5-minute detector database table. At the 5-minute level, PeMS also aggregates the lane-by-lane detector data up to the station level, which represents the total flow, average occupancy, and average speed across all the lanes at that location during that 5-minute period. These data are stored in a 5-minute station database table. The station data are further aggregated up to the hourly and daily levels and stored in corresponding database tables.

PeMS computes travel times on routes, which can traverse more than one freeway and are defined by a starting on-ramp, freeway-to-freeway connectors (if any), and an ending off-ramp. It computes travel times for routes at the 5-minute and hourly levels from the data in the detector and station database tables using the infrastructure-based sensor calculation method described in Chapter 3 of the Guide. It stores these travel times permanently in 5-minute and hourly travel time database tables. These travel times can then be queried to assemble the historical distribution of travel times along a route for different times of the day and days of the week, as well as to compute reliability metrics such as the buffer time index and percentile travel times.

Data Acquisition

This section describes, in general, the transfer of data between the source system and the monitoring system in order to monitor travel time reliability. It also details the specific data exchanges occurring between the source system and PeMS in this case study.

General

Typically, reliability monitoring systems must acquire two categories of information from the source system to produce accurate performance metrics: (a) metadata on the roadway network and detection infrastructure and (b) traffic data.
The traffic data are unusable for travel time calculation purposes if they are not accompanied by a detailed description of the configuration of the system. Configuration information provides the contextual and spatial information on the sensor network needed to make sense of the real-time data. Ideally, these two types of information should be transmitted separately (i.e., not in the same file or data feed). Roadway and equipment configuration information is more static than traffic data, as it only needs to be updated when the roadway or the detection infrastructure changes. Keeping the reporting structure for these two types of information separate reduces the size of the traffic data files, allowing for faster data processing, better readability, and lower bandwidth cost for external parties who may be accessing the data through a feed.

In addition, the data acquisition step often involves reconciliation between the framework of the source system and that of the monitoring system. For example, different terminology can lead to incorrect interpretations of the data. This step often requires significant communication between the system contractor and the agency staff who are familiar with the data collection system to resolve open questions and ensure that accurate assumptions are being made.

Metadata

PeMS needs to acquire two types of metadata before traffic data can be stored in the database: roadway network information and equipment configuration data. To represent the monitored roadway network and draw it on maps, PeMS needs geographic information system (GIS)–type roadway polylines defined by latitudes and longitudes. To help the agency link PeMS data and performance metrics with its own linear referencing system, PeMS also associates these polylines with state roadway mileposts. In most state agencies, mileposts are a reference system used to track highway mileage and denote the locations of landmarks. Typically, these mileposts reset at county boundaries. In locations where freeway alignments have changed over time, it is likely that the difference between two milepost markers no longer represents the true physical distance along the roadway. For this reason, PeMS adds a third representation of the roadway network called an absolute postmile. Absolute postmiles are similar to mileposts, but they represent the true linear distance along a roadway as computed from the polylines. To facilitate the computation of performance metrics across long sections of freeway, absolute postmiles do not reset at county boundaries. In PeMS, this information is ultimately stored in a freeway configuration database table that contains a record for every 0.10 mile on every freeway. Each record contains the freeway number, direction of travel, latitude and longitude, state milepost, and absolute postmile.

The research team was not able to obtain any GIS data for the NOVA network within the project time frame. Since the monitored network consisted of only two corridors, roadway linework was obtained by entering the starting and ending points of each corridor into Google Maps and exporting the results to a KML (keyhole markup language) file.
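The freeway configuration table described above, one record per 0.10 mile, amounts to linear interpolation between known milepost values. A minimal sketch (the function name, corridor endpoints, and milepost values below are hypothetical, not taken from the NOVA data):

```python
def milepost_records(start_mp, end_mp, step=0.1):
    """Generate milepost values at fixed increments between two known mileposts.

    A full implementation would also interpolate a latitude-longitude along the
    polyline for each record; only the milepost column is sketched here.
    """
    n = int(round((end_mp - start_mp) / step))
    # Round to 2 decimals to avoid floating-point drift in the stored values.
    return [round(start_mp + i * step, 2) for i in range(n + 1)]

# A hypothetical half-mile corridor segment starting at milepost 47.0:
mps = milepost_records(47.0, 47.5)
```

In the NOVA case the same values would serve as both state mileposts and absolute postmiles, since the two coincide there.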
From these data, polylines and their latitudes and longitudes were parsed and placed in a PeMS database. The next step was to add state milepost markers to these latitude–longitude freeway locations. Since both of the monitored freeway segments fell within a single county, this was done by researching the mileposts at the county boundary and then interpolating the mileposts in at least 0.10-mile increments along the rest of the freeway segment. In the NOVA case, state mileposts and PeMS absolute postmiles are the same.

The second type of metadata required is information about the detection equipment from which the source system is collecting data. PeMS has a strict equipment configuration framework (described in the subsection on reliability monitoring systems) to which all source information must conform. The rigidity of this framework is due to the need to standardize data collection and processing across all agencies, regardless of their source system structures. Configuration information ultimately populates detector, station, and controller configuration database tables in PeMS, and it is used to correctly aggregate data and run equipment diagnostic algorithms.

NOVA equipment configuration information was obtained from an XML file posted on the RITIS website that is updated periodically (typically, no more than every few days). A representative section of this file is shown in Figure C.74. The file is composed of <detector> elements, each of which has a unique ID, a textual name that includes a mile marker, a latitude and longitude, a type (such as inductive loop), and one or more <detection-zone> elements. Each <detection-zone> element has a unique ID, a number of lanes, a latitude and longitude, a direction, and, sometimes, a type (such as shoulder or lane).

Figure C.74. NOVA RITIS detector configuration XML format.
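A configuration file with the structure just described could be parsed along the following lines. Since Figure C.74 is not reproduced here, the exact tag and attribute names in the sample below are assumptions based on the text, not the real RITIS schema; the zone-to-station and detector-to-controller mapping comments anticipate the reconciliation described later in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the structure described in the text.
SAMPLE = """
<detectors>
  <detector id="D100" type="inductive loop" lat="38.87" lon="-77.25">
    <detection-zone id="Z100-1" lanes="3" direction="EB"/>
    <detection-zone id="Z100-2" lanes="2" direction="EB" type="HOV"/>
  </detector>
</detectors>
"""

def parse_config(xml_text):
    zones = []
    for det in ET.fromstring(xml_text).iter("detector"):
        for zone in det.iter("detection-zone"):
            zones.append({
                "controller_id": det.get("id"),        # NOVA detector ~ PeMS controller
                "station_id": zone.get("id"),          # NOVA detection zone ~ PeMS station
                "lanes": int(zone.get("lanes")),
                # Many NOVA zones carry no type; defaulting to "mainline" here is
                # an illustrative placeholder (the team actually resolved missing
                # types by manual inspection in Google Earth).
                "type": zone.get("type", "mainline"),
            })
    return zones

zones = parse_config(SAMPLE)
```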

Once the file was obtained, the second step was to fit the data into the PeMS configuration framework, and the third was to parse the XML file, insert relevant fields into the PeMS database, and write a program to automatically download the configuration file from the RITIS website and populate relevant information into the database whenever the file updated. Because the XML file was not accompanied by an explanatory text file, the second step took considerable time and effort, as a number of issues were uncovered that made it challenging to map the NOVA information into the PeMS database. These issues, described below, related to conflicting terminologies, information required by PeMS that was missing from the configuration file, and equipment types not supported by PeMS.

The first challenge was to determine how the NOVA <detector> and <detection-zone> elements should map to the PeMS equipment framework of detectors, stations, and controllers. From the properties of the NOVA detectors, it was clear that they did not refer to the same entity as a PeMS detector. NOVA detectors contain multiple zones, and each zone has a lane count, a location, a direction, and a type. From these attributes, it was concluded that the NOVA detection zone was conceptually equivalent to the PeMS station and that the NOVA detector was the conceptual equivalent of the PeMS controller. This conclusion was confirmed by looking at samples of the RITIS traffic data XML files, which report flow, occupancy, and speed data for each <detection-zone>. After performing this matching and reviewing the traffic data, the team concluded that, despite the terminology used, the NOVA configuration information had no notion of a detector in the PeMS or the conventional sense (i.e., a sensor that monitors traffic in a single lane at a single location).
As PeMS is built around a collection of lane-specific data from detectors, which enables the reporting of lane-by-lane flows, volumes, and occupancies at point locations and lane-by-lane travel times along routes, this presented a challenge. The problem was ultimately solved by using the number of lanes reported for each NOVA detection zone to assign artificially constructed PeMS detectors to monitor each lane. Each detector was given an ID formed by appending an integer representing the lane number to the detection zone ID. During the real-time data integration, the flows, volumes, and occupancies reported by each detection zone were divided by the number of lanes and assigned to each detector.

Another challenge was matching the NOVA detection zone types with the station types supported by PeMS. Every station in PeMS is assigned a type to denote the lane type that it monitors. Station types must be one of the following: mainline, HOV, collector–distributor, freeway–freeway connector, off-ramp, or on-ramp. In the NOVA configuration XML file, not every detection zone is assigned a type, and the types that are assigned (shoulder, lane, exit ramp, RHOV, and HOV) do not align with those defined in PeMS. The NOVA shoulder zone type reflects the fact that, during peak hours, the shoulder lanes on I-66 are open to traffic. The RHOV zone type is assigned to HOV lanes that are reversible based on the time of day. These two operational characteristics added significant complexity to the monitoring process. The operation of shoulder lanes meant that the number of lanes at a given location changed by time of day, a characteristic that PeMS could not accurately represent. Similarly, the reversible HOV lane operation meant that sensors monitored different directions of travel based on the time of day, which PeMS also could not accurately configure. For this reason, shoulder and RHOV stations were not stored in the PeMS database. A related problem was that many detection zones were not assigned types in the configuration file. To solve this problem, the latitude and longitude of each NOVA detection zone was mapped in Google Earth and manually inspected to determine which PeMS station category it belonged to. The end product of this step was a comma-separated values (csv) file that listed each detection zone ID and its corresponding PeMS station type.

A third issue was that, through the metadata, PeMS needed to learn what types of data it would be receiving from each station. Typically, detectors can report up to three values: flow, occupancy, and speed. Some detectors, such as on- and off-ramp loop detectors, report only flows. Single inductive loop detectors report flow and occupancy. Radar detectors report flow and speed. Double-loop detectors report flow, occupancy, and speed. PeMS needs to know which detectors report which values so that, for detectors reporting two of the three values, the third can be calculated via an algorithm. This information is not directly present in the NOVA configuration XML file. NOVA detectors (PeMS controllers) are assigned types (either inductive loop or microwave radar) in the configuration file. Since Virginia DOT staff confirmed that the inductive loops are single-loop detectors, the team expected that zones made up of inductive loops would report flow and occupancy, and zones made up of microwave radar sensors would report flow and speed. However, in the traffic data XML file, all zones, regardless of their detector type in the configuration file, reported either all three values or only flow.
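The artificial lane-by-lane detectors described earlier in this subsection can be sketched as below. The text says detector IDs were formed by appending the lane number to the zone ID; the exact format (plain string concatenation, and the sample zone ID "4015") is an assumption for illustration.

```python
def make_lane_detectors(zone_id, num_lanes):
    """Construct one artificial per-lane detector ID for each lane of a zone.

    Lane 1 is the leftmost lane. Plain concatenation is assumed here; a real
    scheme would need to guard against ID collisions (e.g., 10+ lanes).
    """
    return [f"{zone_id}{lane}" for lane in range(1, num_lanes + 1)]

# A hypothetical 3-lane detection zone "4015" yields three PeMS detectors:
dets = make_lane_detectors("4015", 3)
```

During real-time integration, each zone's reported values would then be split across these detector IDs (the remainder-handling rule for flows is described later, under traffic data).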
The implications of this finding are further described in the subsection below on traffic data. From a metadata perspective, there was no sure way of tagging NOVA zones with the types of data expected to be received. For this reason, PeMS ultimately stored whatever values each zone transmitted via the XML file. This meant that for detectors reporting only flow, speeds and occupancies were entered as zero, even though this clearly did not reflect the actual field conditions.

The metadata quality control steps described above were the bulk of the work to insert NOVA configuration information into PeMS. After this, a custom program was written to parse the PeMS-required fields from the XML configuration file, supplement them with the zone type information in the csv file, discard metadata for elements that PeMS could not support, and insert information into the required database tables in PeMS. Ultimately, PeMS acquired configuration information for 260 mainline zones and 69 HOV zones, which became the equivalent of PeMS mainline and HOV stations, respectively.

Traffic Data

After the metadata acquisition, the next step was to acquire and archive traffic data. Real-time traffic data were acquired via an XML file posted every minute onto the RITIS web page in the same location as the configuration XML file. The end goal of the traffic data acquisition process was to take 1-minute traffic data from the XML file and insert the data into the appropriate tables in the PeMS database. Before this could be done, the research team had to develop a full and accurate understanding of the NOVA real-time data. Because the generation of accurate reliability information requires a large set of historical travel times, the team wanted to minimize the delay in acquiring traffic data. For this reason, as soon as the metadata were inserted into PeMS, the team implemented a program to download the traffic data XML file from the RITIS website every minute and save it, so that data could be parsed from the files and placed into the PeMS database as soon as the file format was thoroughly understood.

A sample of a real-time traffic data XML file is shown in Figure C.75. It is composed of <collection-period-item> elements, each defined by a time stamp and a 60-second measurement duration. Each such element contains the most recent measurements from every NOVA <detector> that last sent data during that time stamp. Working controllers are reported in the <collection-period-item> element marked by the most recent time stamp. If a controller is not currently transmitting data, its most recent data transmission is included in a <collection-period-item> marked by the time stamp for which the system last received data from it. Each <collection-period-item> element contains a <zone-report> element for each controller that last reported data during that time stamp. Each <zone-report> element then contains a <zone-data-item> with each zone's most recent flow, occupancy, and speed values. For many zones, the flows are nonzero, but the occupancies and speeds are zero. For others, all three values are nonzero.

Figure C.75. NOVA RITIS XML traffic data format.

Review of the data led to a number of questions. The team first wanted to know what processing was done on the data to generate the values in the XML files. This relates to a fundamental issue that agencies collecting data should consider. Many agencies encounter external parties that have an interest in obtaining a traffic data feed generated from public-sector detection infrastructure. The level of interest in raw versus processed data differs depending on the intended use. Maintaining even one data feed can be a challenge; maintaining multiple data feeds is likely to be infeasible for many agencies. If agencies want to provide a feed of processed data, all steps should be documented in as much detail as possible to inform data users of what is being reported and how values are being generated. Optimally, a mature reliability monitoring system would collect raw data, apply quality controls, and aggregate and report the data using robust methods. Such a process would ensure uniformity in the data at the lowest temporal and spatial level possible while accurately evaluating and reporting the quality of the data.
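The feed semantics just described (a working controller appears under the newest time stamp; a silent controller's last report lingers under an older one) reduce to keeping, for each zone, the report with the latest time stamp seen so far. A small sketch, independent of the XML details (the record shape and sample values are invented for illustration):

```python
def latest_reports(reports):
    """Resolve a feed into one 'most recent' record per zone.

    reports: iterable of (timestamp, zone_id, values) tuples, in any order.
    ISO 8601 timestamp strings compare correctly as plain strings.
    """
    latest = {}
    for ts, zone, values in reports:
        if zone not in latest or ts > latest[zone][0]:
            latest[zone] = (ts, values)
    return {zone: values for zone, (ts, values) in latest.items()}

feed = [
    ("2011-03-01T08:00:00", "Z1", {"flow": 30}),
    ("2011-03-01T08:01:00", "Z1", {"flow": 32}),  # newer report wins
    ("2011-03-01T07:45:00", "Z2", {"flow": 12}),  # stale zone: last known data
]
latest = latest_reports(feed)
```

A consumer would also want to flag how stale each zone's record is, since a lingering old report is exactly the "controller not currently transmitting" case.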
The team concluded that the NOVA data were heavily preprocessed before being placed in the data feed. First, the XML file contains no lane-by-lane data, despite the fact that a number of the NOVA detectors are single inductive loop detectors, which monitor individual lanes. This means that, at some point, the source system aggregates values from individual lanes into total flow and average occupancy and speed across all lanes at a given location. As the foundation of PeMS is lane-by-lane data, this issue was addressed by dividing the flows by the number of lanes and assuming that the reported average speeds and occupancies applied to all lanes. Although this allowed the data to be transformed into the PeMS framework, it showed that a loss of information can occur when an agency preprocesses the data. In this case, the reliability monitoring system no longer has the ability to report on the differences in travel times among different lanes on the same route.

Another sign that the NOVA data were preprocessed lay in the fact that many zones reported flow, occupancy, and speed. Since the corridor detectors were all single-loop detectors or microwave radar detectors, they directly transmitted only two of the three values. In these cases, PeMS would normally calculate the third value from the known two using a lane-, location-, and time-specific assumption about the average vehicle length, called a g-factor (1). When receiving all three values, PeMS does not have to perform this calculation step, but this comes at the expense of not knowing how the third value was computed. The team contacted University of Maryland and VDOT staff to determine what was being done, but both organizations stated that they do not process the data ultimately posted in the RITIS XML file. From this, it was concluded that the data collection system in the field is doing the preprocessing, but the team was not able to ascertain exactly what was being done. Without being able to evaluate the methods, the team decided to have PeMS store whatever data it received via the XML files. In all cases, whether PeMS received all three values or a nonzero flow with zero occupancy and speed, it stored these values in the database.

The second issue that had to be addressed was determining the units of the occupancy values being reported. Typically, occupancy is reported as the percentage of the reporting period during which a vehicle was directly above the detector. Reasonable values range from zero to 15. When reviewing the traffic data XML file, the team noted that many of the occupancy values were high, with some consistently exceeding 100. The team surmised that perhaps occupancy was being reported in tenths of a percent. The issue was ultimately discussed with VDOT staff, who confirmed that the units of occupancy are whole percentage points and that zones reporting high occupancies are broken, largely due to construction projects in the vicinity.

The third issue related to the discrepancy between the NOVA data being reported at the zone, or station, level and the PeMS requirement for lane-by-lane data. From a metadata perspective, as previously described, this was resolved by assigning detectors to each zone within PeMS. For the real-time data, the team decided to simply divide the zone flows by the number of lanes at the zone and assign them to each lane. To keep flows as whole numbers, any remainders after the division were assigned starting at Lane 1 (the leftmost lane), resulting in an overall upward bias of vehicle counts in the left-hand lanes. For occupancy and speed, the team assigned the values reported by the zone to each of its detectors.
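The lane-assignment rule in the preceding paragraph is a small piece of integer arithmetic worth pinning down: divide evenly, then hand out the remainder one vehicle at a time starting at Lane 1 (the leftmost lane), so that per-lane flows stay whole numbers. A sketch:

```python
def split_zone_flow(zone_flow, num_lanes):
    """Split a zone's vehicle count across lanes, remainder to the left lanes.

    Index 0 of the result is Lane 1 (leftmost). As the text notes, this rule
    biases counts slightly upward in the left-hand lanes.
    """
    base, remainder = divmod(zone_flow, num_lanes)
    return [base + (1 if lane < remainder else 0) for lane in range(num_lanes)]

flows = split_zone_flow(100, 3)
```

Occupancy and speed, by contrast, are intensive quantities, so the zone's values are simply copied to every lane rather than divided.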
By the time the issues described above had been resolved, the team had been downloading and saving the 1-minute traffic data XML files from the RITIS website for 3 weeks. The next step was to write a program to parse the zone values, assign them to the PeMS detectors, and store them in the PeMS database according to the <detection-time-stamp> element. The team did this for the 3 weeks of archived traffic data XML files and also developed a program to download the XML files every minute from the RITIS website and perform the same steps to place data in the database in real time.

Data Processing

The data acquisition phase resolved all discrepancies between the NOVA framework and the PeMS framework and successfully mapped all of the relevant fields in the XML files to the PeMS database. It also resulted in an automated, real-time acquisition chain between the RITIS web page and PeMS, with PeMS obtaining data from the web page every minute and inserting it into the PeMS database. From this point forward, PeMS could perform its standard data processing to assess the health of the NOVA detection infrastructure, discard bad data and impute values, aggregate data across lanes and over time, and calculate performance metrics such as travel times.

In its detector health assessment step, PeMS looks at the data transmitted by each detector over a single day and determines whether the data are good or problematic. PeMS makes this assessment based on the flow and occupancy values for each detector. There are a few common problems with detection infrastructure, and they manifest themselves in distinct ways in the transmitted data, allowing for an automated quality control process. One example is the situation in which PeMS receives no data or few data samples from a detector, a station, or a controller over a day. This is most likely evidence that a communication line is down or that there is a hardware malfunction in the device. Another example is a detector repeating the same flow and/or occupancy values across multiple time periods. Other examples include detectors reporting high occupancy values, indicating that the detector is stuck on, or reporting mismatched flow and occupancy values (e.g., zero flow and nonzero occupancy, or vice versa), indicating that the detector is hanging on. If PeMS detects any of these scenarios over a day, it discards the detector's data and imputes replacement values. In this imputation process, PeMS estimates what the detector's data might have been, based on statistical relationships developed with nearby detectors or on historical averages observed at the broken detector. PeMS then aggregates the full set of observed and imputed detector data across all lanes to the station level and computes spatial performance metrics. To inform users about the quality of the data or performance measure they are viewing, PeMS reports the "percent observed" of every metric, which represents the percentage of the data points used to compute the metric that were directly observed from a detector, as opposed to imputed. For example, if the percent observed for a 5-minute travel time along a route is 75%, then 75% of the detectors on the route were reporting good data, and 25% were reporting bad data.
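The daily health checks just described lend themselves to a simple rule-based classifier. The sketch below is illustrative only: the thresholds (minimum sample count, the 80% occupancy cutoff) are invented for the example, and PeMS's actual diagnostic algorithms are more involved.

```python
def diagnose(samples):
    """Classify one detector's day of (flow, occupancy) samples.

    Returns one of: "no data", "stuck", "stuck on", "hanging on", "good".
    All thresholds are illustrative assumptions, not PeMS's real values.
    """
    if len(samples) < 10:                                  # few or no samples received
        return "no data"
    occs = [occ for _, occ in samples]
    if len(set(samples)) == 1:                             # identical values all day
        return "stuck"
    if sum(occ > 80 for occ in occs) > 0.5 * len(occs):    # occupancy pinned high
        return "stuck on"
    if any((flow == 0) != (occ == 0) for flow, occ in samples):
        return "hanging on"                                # mismatched flow/occupancy
    return "good"

status = diagnose([(30, 8.0), (28, 7.5), (35, 9.0)] * 10)
```

A "percent observed" figure then falls out directly: the share of detectors (or samples) classified as good among all those feeding a given metric.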
When the detector health algorithms were run on the NOVA data, the team realized that the majority of the detectors on the selected corridors were reporting no data or bad data. Figure C.76 plots the daily percentage of good detectors between March 1, 2011, and May 9, 2011, as well as the percentage of bad detectors attributable to the two leading causes: no data and stuck. The percentage of good detectors never exceeds 30% and generally hovers around 25%. The highest percentage of detectors is in the stuck category, meaning that they are reporting constant flow or occupancy values, or both. VDOT staff attributed this high percentage to the need to calibrate new detectors after large-scale construction projects. Additionally, a significant percentage (around 30%) of the detectors never sent PeMS any data. The days on which there were drops in every category represent times when internet outages prevented the research team from acquiring the XML files from the RITIS website.

Figure C.76. Daily detector health status, NOVA PeMS deployment, 2011.

The low percentage of usable data available over the 2011 study time frame greatly concerned the research team, as the quality of computed travel times would be poorer given the missing data. In addition, because the majority of detectors in the network never sent PeMS any good data, it was not possible to develop the historical statistical relationships with the data at nearby working detectors needed to drive the most accurate imputation algorithms. Without these statistical relationships, PeMS had to use less-robust imputation algorithms, which further decreased the accuracy of computed travel times. To show the effect this has on the detector data recorded, consider a detector on westbound I-66 that fell into the stuck category for 2 weeks (Monday to Sunday). This resulted in imputed flows across that entire time period, as shown in Figure C.77. Because this detector never sent PeMS any good data, PeMS could impute only crude estimates of its flow values based on flows observed at nearby detectors. This meant that PeMS repeated the same flow, occupancy, and speed data for a given hour from week to week. In the sample plot, the hourly flows imputed for the first week are identical to those imputed for the second week. This constancy in imputed data is not ideal for computing travel time reliability, which relies on the ability of the monitoring system to capture the real variability in conditions over time.

Figure C.77. Imputed flow values at a broken detector.

Because data had to be imputed for such a large percentage of the detectors in 2011, the research team decided to seek additional data for 2009, hoping that the data quality would be sufficient to support methodological advancement and use case analysis. This effort is described in the following section.

Historical Data

The research team worked with the University of Maryland CATT Lab to obtain traffic data for 2009. The historical data were delivered in 12 zipped csv files, each about 45 megabytes in size. The csv files contained the same information as the traffic data XML files, so it was straightforward to write a program to parse information from the csv files and put it in the correct database tables in PeMS, with an associated time stamp corresponding to when the data were collected. The one issue that was encountered was that no historical configuration data were available. The team manually compared the IDs of detectors and zones reported in the archived data with those present in the 2011 configuration XML file and determined that the 2011 configuration data would suffice to represent the 2009 detector locations.

After the historical data were entered into the PeMS database and processed, detector health was investigated to see if the 2009 data were better than the 2011 data. Figure C.78 plots the weekly percentage of good detectors over 2009, as well as the percentage of detectors falling into the leading two error categories: no data and stuck. During 2009, the percentage of working detectors was significantly higher than in 2011, generally hovering above 70% for most of the year. The percentage of detectors that were stuck and transmitting constant data was much lower, remaining below 5% across the whole year. The biggest error category remained the no-data condition, which likely represented detectors that were listed in the configuration file but were not yet calibrated to send data.

Figure C.78. Weekly detector health status, NOVA PeMS deployment, 2009.

Travel Time Results

After acquisition of the real-time and 2009 data, eight routes were constructed in PeMS to monitor reliability across different segments of the four study freeway directions. Five-minute and hourly travel times were created for these eight routes for the entire year of 2009 and for March through May of 2011. To evaluate the impact of individual detector data quality on route-level travel times, the team compared the route travel times for 2009 with those for 2011. Figure C.79 plots the hourly travel times calculated on a 26-mile stretch of westbound I-66 for March through April of 2009. Overall, the PeMS percent observed for these travel time data was 79%. Figure C.80 plots the same data for the same months of 2011; in this case, only 22% of the data were observed. Overall, the 2009 data follow the pattern expected of a highly congested facility with a peak-hour commute: travel times are high on every weekday, but the peak value varies from day to day. For the month of April 2011, by contrast, the weekly travel time patterns look almost identical. It is doubtful that such consistency exists; these patterns are more likely caused by the high percentage of imputed data.
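The reliability metrics this case study relies on, percentile travel times and the buffer time index, can be computed directly from a travel time series. A first-principles sketch (PeMS's exact percentile and buffer index definitions may differ; nearest-rank is assumed here, and the sample travel times are invented):

```python
def percentile(values, p):
    """Nearest-rank percentile; adequate for a sketch."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, int(round(p / 100 * len(ordered))) - 1))
    return ordered[k]

def buffer_index(travel_times):
    """(95th percentile - mean) / mean: the extra buffer a traveler should allow."""
    mean = sum(travel_times) / len(travel_times)
    return (percentile(travel_times, 95) - mean) / mean

# Hypothetical hourly travel times (minutes) for one route and time slice:
tt = [30, 31, 29, 33, 45, 30, 32, 31, 30, 60]
```

Heavily imputed data flattens exactly these statistics: if the same values repeat week after week, the 95th percentile collapses toward the mean and the buffer index understates the true unreliability.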
For this reason, the team chose to base the methodological advancements of this case study on the 2009 data and to use the 2011 data only for comparison with probe travel time runs to further evaluate the data quality.

Figure C.79. Travel time, westbound I-66, March 1 through April 30, 2009.

Figure C.80. Hourly travel times, westbound I-66, March 1 through April 30, 2011.

Summary

Data collection is an essential part of any transportation planning or operations activity. Today, transportation agencies are increasingly turning to sophisticated sensor arrays to monitor the performance of their infrastructure, which allow for the use of advanced traffic management techniques and traveler information services. External systems, such as a reliability monitoring system, can leverage these data to further maximize the value of installing and maintaining detection. As shown by this case study, data collection for a TTRMS can be automated, but it requires significant time and resources to get started.

METHODOLOGY EXPERIMENTS

Overview

Because of the type of data available in this case study and investigations done previously in the I-66 corridor, the research team elected to experiment with travel time reliability monitoring ideas being developed in SHRP 2 Project L10, Feasibility of Using In-Vehicle Video Data to Explore How to Modify Driver Behavior That Causes Nonrecurring Congestion. Project L10 researchers are experimenting with a multistate travel time reliability modeling framework that uses mixed-mode normal distributions to represent the probability density functions of travel time data from a simulation model of eastbound I-66 in Northern Virginia. They are also using this same technique to analyze travel times from toll tag data collected on I-35 in San Antonio, Texas (1). The present case study adopted that technique and applied it to the travel times calculated from the freeway loop detectors on eastbound I-66.
According to the SHRP 2 L10 research, multistate models are appropriate for modeling travel time distributions because most freeways operate in multiple states (conditions) across the year (or some other time frame): for example, an uncongested state; a congested state; and a congested state caused by nonrecurrent events, such as

330 GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY incidents, construction, weather, or fluctuations in demand. This concept is illustrated in Figure C.81, which shows the distribution of weekday travel times on a corridor in Northern Virginia. Three travel time modes are evident, which may be interpreted as the most frequently occurring travel times for the uncongested state, the congested state, and the nonrecurring congestion state. Multistate models also provide a helpful framework for delivering understandable information to the end consumer of travel time reliability information: the driver. They provide two pieces of information: the probability that a particular state will be extant during a given time period and the travel time distribution for that state during that time period. This provides a way of creating reliability information that is similar to how people are accustomed to receiv- ing weather forecasts; for example, “There is a 60% chance that it will rain tomorrow, and if it does rain, the expected precipitation will be 1 inch.” The reliability analog to this is, for example, “The percentage chance of encountering an incident-based con- gestion state during the morning peak period is 20%. If one does occur, the expected average travel time is 45 minutes and the 95th percentile travel time is 1 hour.” Beyond its suitability for modeling travel time distributions and providing useful metrics, this methodology also fits well with the work that the SHRP 2 L02 team is doing to develop travel time distributions for different operating regimes. The differ- ent states of the normal-mixture models are the conceptual equivalent of the regimes that L02 is working on to classify the operating conditions of routes and facilities. It Figure C.81. Distribution of travel times on eastbound I-66, 5:00 a.m. to 9:00 p.m.

also provides an opportunity to test a methodology that was developed for modeling the distribution of individual vehicle travel times on aggregated travel times calculated from loop detectors.

Site Description

A multistate model was developed for a 26-mile stretch of eastbound I-66 from Manassas to Arlington, Virginia. A map of the corridor is shown in Figure C.82. This segment of freeway is monitored by 96 sensors that are a mix of radar detectors and loop detectors. The selected data set consists of 17,568 travel times at a 5-minute-level aggregation, which represents the average travel time experienced by vehicles departing the route origin during that 5-minute time period. This data set covers the travel times for departures every 5 minutes during the weekdays between January 1, 2009, and March 30, 2009.

The route is a major commute path from the suburbs of Northern Virginia into Washington, D.C. It sees the highest demand levels during the morning peak period, as well as a smaller increase in demand during the afternoon peak. A PeMS-generated plot of the minimum, average, and maximum travel times by hour of the day measured over the study time frame is shown in Figure C.83.

Figure C.82. Map of eastbound I-66 study corridor. Map data © 2012 Google.

Figure C.83. Minimum, average or mean, and maximum study corridor travel times on eastbound I-66, weekdays, January 1 through March 30, 2009.

Method

The goal of this study was to generate, for each hour of the day, two outputs: the percentage chance that the traveler would encounter a certain condition; and for each condition, the average and 95th percentile travel time. The mathematical details of these steps are explained by Guo et al. (2). Under this framework, three questions are answered for each time period:

1. How many states are needed to model the travel time distribution?
2. What is the probability of each state occurring?
3. What parameters describe the normal distribution for each state?

Analysis was performed using the MCLUST package in R, which provides functions to support normal-mixture modeling and model-based clustering (3). Normal-mixture models were developed to represent travel times for each hour of the day between 4:30 a.m. and 12:30 a.m. The early-morning hours were not considered due to the lack of any congestion. The first question above was answered by putting the data set for each hour into a function that initially clusters the data into the number of states that provide a best fit (in this guide, the "optimal" number of states). The best fit was determined using the Bayesian information criterion (BIC), defined as −2 log(L) + k log(n), where L is the likelihood function of the model parameters, k is the number of parameters, and n is the sample size of the data. This function considers the fit of the model while penalizing an increased number of parameters to protect against overfitting. The model with the number of states that produces the lowest BIC

is selected as the optimal model, and each data point is given an initial probability of belonging to each state.

The outputs of this step (the model type, number of states, and initial probabilities of a data point belonging to each state) are then put into an expectation–maximization algorithm, an iterative method appropriate for mixture models, which is used to find maximum likelihood estimates of parameters. The expectation–maximization algorithm outputs the mixture component for each state (its probability of occurrence), the mean and standard deviation of each state, and the final estimates of the probability that a data point belongs to each state. These estimates are used to form a user-centric output; for example, "If you depart on a trip at noon, you will have a 20% chance of experiencing congestion. During congestion, the average travel time is 30 minutes and the 95th percentile travel time is 45 minutes."

During the analysis process, complications arose that required the research team to balance the desire for a best-fitting model with the need to provide useful and clear information to the end user. The initial clustering step suggested that either three or four states were needed to optimally model the travel times for each hour. The optimal number of states for each hour's model is summarized in Table C.31, along with the associated BICs. However, in the practical realm, a historical set of travel times from a given hour can be conceptualized as consisting of no more than three states. Early-morning time periods may only have one state, noncongested, and can thus be described by a single distribution. Time periods when demand fluctuates may have two states: noncongested and congested, with congestion being triggered either by high demand or a nonrecurrent condition.
Finally, the peak periods may have three states: a noncongested state, likely rare, when demand is low; a congested state, which is common; and a very congested state, which may be triggered by an incident or special event. The fourth state has no clear physical explanation that can be effectively conveyed to the end user. As such, each hour's data set was run through the clustering algorithm again, this time with a constraint of three maximum states. The constrained best-fit state for each hour and its associated BIC are shown in Table C.31. Three states provided the best fit for all but two hours (12:30 and 4:30 p.m.), when two states provided the best fit.

Following the expectation–maximization step using the constrained number of states, the mean travel time estimates for each state were evaluated. These mean travel times are summarized in Table C.32. For the majority of hours (all hours outside of the morning peak), the mean travel times for State 1 (S1) and State 2 (S2) were very similar (within 3 minutes of each other). These are denoted in Table C.32 by gray shading. Because such small differences in average travel times are not meaningful enough to the end user to be considered different states, any hour for which three states were suggested, but mean travel times between consecutive states differed by less than 3 minutes, was reduced to two states. The model parameters were then reestimated for this final number of states. In the end, the models for each hour were composed of two states, with the exception of the morning peak hours (6:30 to 10:30 a.m.), which remained composed of three states. The final number of states and associated BICs for each hour are shown in the final column in Table C.31.
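The two modeling steps just described — an expectation–maximization fit of a k-state normal mixture, followed by the 3-minute merging rule — can be sketched in plain Python. This is a generic, simplified stand-in for the MCLUST-based workflow (naive quantile initialization, fixed iteration count), not the research team's actual code:

```python
import math

def normal_pdf(x, mean, sd):
    # Density of a normal distribution at x
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_normal_mixture(data, k, iterations=200):
    """Fit a k-state univariate normal mixture by EM (simplified sketch).

    Returns (weights, means, sds): each state's probability of occurrence
    and the normal parameters describing travel times within that state.
    """
    n = len(data)
    xs = sorted(data)
    # Naive initialization: means at evenly spaced quantiles, common spread
    means = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    sds = [max((xs[-1] - xs[0]) / k, 1e-3)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility resp[i][j] = P(state j | observation i)
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate each state's weight, mean, and spread
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, sds

def final_state_count(state_means, min_gap=3.0):
    # The report's reduction rule: a suggested 3-state model collapses to
    # 2 states when consecutive state means differ by less than min_gap minutes
    means = sorted(state_means)
    gaps = [b - a for a, b in zip(means, means[1:])]
    if len(means) == 3 and any(g < min_gap for g in gaps):
        return 2
    return len(means)

# Example: two clearly separated synthetic "states" of travel times (minutes)
weights, means, sds = em_normal_mixture(
    [25 + 0.1 * i for i in range(-5, 6)] + [40 + 0.2 * i for i in range(-5, 6)], 2)
```

On well-separated clusters like the example, the EM fit recovers roughly 50/50 weights and means near 25 and 40 minutes; applied to the Table C.32 means, `final_state_count` keeps three states for the 7:30 a.m. hour (28, 39, 46) but drops the 4:30 a.m. hour (24, 25, 29) to two.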

TABLE C.31. SELECTION OF STATES

                              Optimal           Constrained       Final
Hour                          States  BIC       States  BIC       States  BIC
4:30 to 5:30 a.m.             3       1387      3       1387      2       1443
5:30 to 6:30 a.m.             4       3580      3       3595      2       3620
6:30 to 7:30 a.m.             3       4322      3       4322      3       4322
7:30 to 8:30 a.m.             3       5017      3       5017      3       5017
8:30 to 9:30 a.m.             4       4854      3       4855      3       4855
9:30 to 10:30 a.m.            3       3876      3       3876      3       3876
10:30 to 11:30 a.m.           4       2561      3       2567      2       2622
11:30 a.m. to 12:30 p.m.      3       1578      3       1578      2       1640
12:30 to 1:30 p.m.            4       960       2       968       2       968
1:30 to 2:30 p.m.             3       1081      3       1081      2       1132
2:30 to 3:30 p.m.             3       1118      3       1118      2       1153
3:30 to 4:30 p.m.             3       1675      3       1675      2       1725
4:30 to 5:30 p.m.             2       3074      2       3074      2       3074
5:30 to 6:30 p.m.             3       3160      3       3160      2       3170
6:30 to 7:30 p.m.             3       2793      3       2793      2       2812
7:30 to 8:30 p.m.             4       1459      3       1464      2       1477
8:30 to 9:30 p.m.             3       1283      3       1283      2       1291
9:30 to 10:30 p.m.            4       1220      3       1220      2       1233
10:30 to 11:30 p.m.           3       2398      3       2398      2       2488
11:30 p.m. to 12:30 a.m.      3       2162      3       2162      2       2178

Results

This section first summarizes the travel time reliability findings for each weekday hour. It then provides an in-depth analysis of model results for the morning peak hours.

Overall

Figure C.84 presents, for each hour of the day and for each state, the probability of the state's occurrence (top left), the mean travel time (top right), the standard deviation of its travel times (bottom left), and the 95th percentile travel time (bottom right). Estimates for State 1 are shown by the dashed line, State 2 by the solid line, and State 3 (when applicable) by the bold line. Values are also summarized in Table C.33. As can be seen in the plot of each state's probability, State 1 is by far the most common state encountered during the early-morning, the midday, and the late-night hours. When this state is active during these hours, the mean travel time tends to be near free flow at around 25 minutes, the standard deviation is low, and the 95th percentile is close to the mean.
During these off-peak periods, the percentage chance of congestion (State 2) generally stays between 10% and 20%. Even when the congested state is

active during these hours, the mean travel time is still generally less than 30 minutes, and the 95th percentile travel time is generally less than 35 minutes.

TABLE C.32. MEAN TRAVEL TIMES BY STATE FOR CONSTRAINED PARAMETERS

Hour                          S1    S2    S3
4:30 to 5:30 a.m.             24    25    29
5:30 to 6:30 a.m.             25    26    31
6:30 to 7:30 a.m.             28    33    37
7:30 to 8:30 a.m.             28    39    46
8:30 to 9:30 a.m.             28    34    42
9:30 to 10:30 a.m.            26    29    34
10:30 to 11:30 a.m.           25    26    28
11:30 a.m. to 12:30 p.m.      25    25    29
12:30 to 1:30 p.m.            23    25    N/A
1:30 to 2:30 p.m.             24    26    30
2:30 to 3:30 p.m.             24    25    27
3:30 to 4:30 p.m.             24    26    29
4:30 to 5:30 p.m.             26    27    N/A
5:30 to 6:30 p.m.             25    31    31
6:30 to 7:30 p.m.             27    27    31
7:30 to 8:30 p.m.             25    26    28
8:30 to 9:30 p.m.             25    26    31
9:30 to 10:30 p.m.            25    26    28
10:30 to 11:30 p.m.           25    26    31
11:30 p.m. to 12:30 a.m.      25    26    30

Note: Shading denotes very similar mean travel times. N/A = not applicable.

At the beginning of the afternoon peak (4:30 to 5:30 p.m.), State 1 and State 2 each have a 50% chance of occurring. During the afternoon peak hour (5:30 to 6:30 p.m.), the probability of congestion increases to 67%. At the end of the afternoon peak hour (6:30 to 7:30 p.m.), the probabilities of State 1 and State 2 effectively swap; State 1 has a 64% chance of occurring and State 2, a 36% chance of occurring. Throughout the afternoon peak, the mean and 95th percentile travel times of each state are consistent. State 1 has a mean travel time of 26 to 27 minutes and a 95th percentile travel time of 27 to 28 minutes, and State 2 has a mean travel time of 30 to 31 minutes and a 95th percentile travel time of 33 to 34 minutes.

The 4 hours of the morning peak (6:30 to 10:30 a.m.) have three active states, as they have both the most congestion and travel time variability. Within these 4 hours, however, both the relative probabilities of each state and the parameters of each state differ significantly. State 3 (conceptualized as the nonrecurrent congestion state) has the greatest chance of occurring at the beginning of the morning peak (between 6:30 and 7:30 a.m.) and between 8:30 and 9:30 a.m. (41% and 39%, respectively). Its

likelihood is around 25% during the other 2 hours. The severity of congestion in this state differs across each hour. It has the highest mean travel time (46 minutes) and 95th percentile travel time (58 minutes) during the 7:30 a.m. hour, indicating that this is the true morning peak hour. At 8:30 a.m., the mean travel time of this state is reduced to 42 minutes, and the 95th percentile travel time to 51 minutes. On the shoulders of the morning peak, the mean travel times of State 3 are 34 and 37 minutes, and the 95th percentile travel times are 40 and 44 minutes.

Figure C.84. State probabilities, mean travel times, standard deviation, and 95th percentile travel times by time of day.

TABLE C.33. PROBABILITY, MEAN TRAVEL TIME, STANDARD DEVIATION, AND 95TH PERCENTILE TRAVEL TIME BY STATE

              Probability (%)     Mean              Standard Deviation    95th Percentile
Time          S1   S2   S3        S1   S2   S3      S1    S2    S3        S1   S2   S3
4:30 a.m.     92   8    N/A       24   27   N/A     0.5   2.1   N/A       25   30   N/A
5:30 a.m.     46   54   N/A       25   30   N/A     0.9   3.3   N/A       27   36   N/A
6:30 a.m.     23   36   41        28   33   37      1.3   1.9   4.5       30   36   44
7:30 a.m.     17   58   25        28   39   46      1.9   4.5   7.7       31   46   58
8:30 a.m.     29   32   39        28   34   42      1.9   3.6   5.7       31   40   51
9:30 a.m.     25   50   24        26   29   34      0.8   1.9   3.9       27   32   40
10:30 a.m.    68   32   N/A       25   28   N/A     0.6   2.3   N/A       26   32   N/A
11:30 a.m.    89   11   N/A       25   28   N/A     0.5   2.8   N/A       25   28   N/A
12:30 p.m.    89   11   N/A       25   26   N/A     0.3   1.0   N/A       25   28   N/A
1:30 p.m.     91   9    N/A       25   26   N/A     0.4   1.8   N/A       25   29   N/A
2:30 p.m.     85   15   N/A       25   26   N/A     0.3   1.2   N/A       25   28   N/A
3:30 p.m.     83   17   N/A       25   27   N/A     0.5   1.7   N/A       26   30   N/A
4:30 p.m.     50   50   N/A       26   30   N/A     0.8   1.6   N/A       27   33   N/A
5:30 p.m.     33   67   N/A       27   31   N/A     0.7   1.5   N/A       28   34   N/A
6:30 p.m.     64   36   N/A       26   30   N/A     0.7   1.6   N/A       27   33   N/A
7:30 p.m.     91   9    N/A       25   28   N/A     0.5   1.8   N/A       26   31   N/A
8:30 p.m.     96   4    N/A       25   28   N/A     0.4   4.5   N/A       26   37   N/A
9:30 p.m.     95   5    N/A       25   28   N/A     0.4   2.3   N/A       26   31   N/A
10:30 p.m.    74   26   N/A       25   29   N/A     0.6   2.6   N/A       26   33   N/A
11:30 p.m.    89   11   N/A       26   29   N/A     0.7   1.7   N/A       27   32   N/A

Note: N/A = not applicable.

State 2 occurs with varying probabilities during the morning peak, ranging from a low of 32% at 8:30 a.m. to a high of 58% at 7:30 a.m. The mean and 95th percentile travel times of State 2 are significantly higher during the morning peak than at any other time period of the day. Even though this time period usually experiences congestion and some travel time variability, there are days (approximately one out of five) when the corridor operates in the uncongested state, and mean travel times are around 28 minutes.
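Because each state is modeled as a normal distribution, the per-state 95th percentile values in Table C.33 should follow directly from the mean and standard deviation as μ + 1.645σ (1.645 is the standard normal 95th percentile z-value). A quick consistency check against two table rows, allowing up to a minute of slack for the table's rounding:

```python
def percentile_95(mean, sd):
    # 95th percentile of a normal distribution: mean + z(0.95) * sd
    return mean + 1.645 * sd

# State 3 during the 7:30 a.m. hour (Table C.33): mean 46, sd 7.7, 95th pct 58
assert abs(percentile_95(46, 7.7) - 58) <= 1.0
# State 2 during the same hour: mean 39, sd 4.5, 95th pct 46
assert abs(percentile_95(39, 4.5) - 46) <= 1.0
```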
The information gained from the example plot and accompanying table can be used to provide intuitive and useful information to the traveling public, in ways illustrated in the following section, which focuses on interpreting the results for the morning peak hours.
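The state probabilities and per-state statistics map directly onto the weather-forecast-style phrasing used throughout this case study. A minimal formatter is sketched below; the sentence template paraphrases the examples in the text, and the function itself is purely illustrative:

```python
def reliability_message(state_name, probability, mean_tt, p95_tt):
    # Render one state's reliability statistics as a pretrip advisory
    return (f"There is a {probability:.0%} chance of {state_name}. "
            f"If it occurs, the expected travel time is {mean_tt:.0f} minutes "
            f"and the 95th percentile travel time is {p95_tt:.0f} minutes.")

# State 2 during the 7:30 a.m. hour, using the Table C.33 values
msg = reliability_message("congestion", 0.58, 39, 46)
```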

Morning Peak

As discussed in previous sections, a three-state normal-mixture model was selected to measure reliability statistics for the four morning peak hours. Figure C.85 provides a visual comparison of the relative model fits of the three-state normal-mixture model, a two-state normal-mixture model, and a lognormal distribution model. These fits are also quantitatively summarized in Table C.34, which compares the BICs for each model for each hour. Visually, it is clear that for every hour except 8:30 a.m., the three-state normal model approximates the data the most closely. This is also reflected in the BIC values, which are the lowest for the three-state normal-mixture model.

Figure C.85. Lognormal and two- and three-state normal-mixture models for morning peak hours.

During the

8:30 a.m. hour, the fits between the three-state and two-state mixture models appear comparable, and their BICs are essentially equivalent.

Figure C.86 provides a clearer visual comparison of the different travel time distributions within each morning hour by plotting them on the same x- and y-axis scales. It is evident that the two middle peak hours (7:30 and 8:30 a.m.) have the most travel time variability; the distributions for the shoulder hours are more tightly packed. In particular, there is a large spike in the travel time distribution for the 9:30 a.m. hour at 25 minutes, which is essentially free flow for this corridor. In this figure, each bar of the travel time histogram is shaded according to which state the model determined it was the most likely to fall into. There are no clearly defined boundaries for each state; rather, for each observed travel time, the model provides the percentage chance that the data point belongs to each state. For some values (e.g., 24 minutes), there is a near 100% likelihood that the travel time belongs in State 1. For others, such as a 46-minute travel time during the 7:30 a.m. hour, there is a near 50% chance that the data point belongs to State 2 and a near 50% chance that it belongs to State 3. Thus, these shadings are meant only to be a rough visualization of the component travel times of each state.

The desired final output of these analyses is reliability information that can be readily interpreted and used by corridor drivers who are planning to make a trip at a certain time. From the information presented above, the following examples convey information that could be provided to drivers on a pretrip basis to aid them in their planning process:

• For trips made between 7:30 and 8:30 a.m., there is a 60% chance of experiencing congestion.
If congestion occurs, the expected travel time is 39 minutes, and the 95th percentile travel time is 46 minutes. There is also a 25% chance of experiencing severe, incident-based congestion. If this occurs, the expected travel time is 46 minutes, and the 95th percentile travel time is 58 minutes.
• For trips made between 9:30 and 10:30 a.m., there is a 50% chance of experiencing congestion. If congestion occurs, the expected travel time is 29 minutes, and the 95th percentile travel time is 32 minutes. There is also a 25% chance of experiencing severe, incident-based congestion. If this occurs, the expected travel time is 34 minutes, and the 95th percentile travel time is 40 minutes.

TABLE C.34. BICS BY DISTRIBUTION MODEL

Time                    Three-State Normal    Two-State Normal    Lognormal
6:30 to 7:30 a.m.       4322                  4346                4330
7:30 to 8:30 a.m.       5017                  5053                5034
8:30 to 9:30 a.m.       4856                  4856                4910
9:30 to 10:30 a.m.      3876                  3954                3981
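The BIC values in Table C.34 combine goodness of fit with a penalty on parameter count: BIC = −2 log(L) + k log(n). For a univariate normal mixture, k counts each state's mean and standard deviation plus one fewer mixing weight than there are states (3·states − 1 in total); a two-parameter lognormal fit has k = 2. The comparison logic can be sketched as follows — the log-likelihoods and sample size below are made-up placeholders, since the report does not publish the underlying likelihood values:

```python
import math

def bic(log_likelihood, num_params, n):
    # Bayesian information criterion: lower is better
    return -2.0 * log_likelihood + num_params * math.log(n)

def mixture_param_count(num_states):
    # Each state contributes a mean and sd; mixing weights add num_states - 1
    return 3 * num_states - 1

# Hypothetical log-likelihoods for one hour of data (n is an assumption,
# not a figure from the report)
n = 1098
candidates = {
    "three-state normal": bic(-2150.0, mixture_param_count(3), n),
    "two-state normal": bic(-2170.0, mixture_param_count(2), n),
    "lognormal": bic(-2180.0, 2, n),
}
best = min(candidates, key=candidates.get)
```

With these placeholder likelihoods the three-state mixture wins despite its larger penalty, mirroring the pattern in Table C.34.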

Figure C.86. Travel time distributions and states, morning peak.

Summary

This case study leverages the methodologies developed by the SHRP 2 L10 research team and applies them to 3 months of 5-minute aggregated loop detector data collected on a 26-mile corridor of eastbound I-66 in Northern Virginia. The results indicate that normal-mixture models reasonably approximate travel time data observed within a given time period. Two-state models seem sufficient to accurately model off-peak hours, but three-state models are needed to capture the variability during the peak hours. Beyond providing a good fit to travel time data, mixture models also output data in a form that can be easily conveyed to help end users better plan for trips.

PROBE VEHICLE COMPARISONS

Introduction

To better understand the implications of the data quality issues on travel times, the team performed a quality control procedure. Probe vehicle runs were conducted along I-66 to amass ground-truth data that could be compared with the sensor data. A GPS-based data collection device capable of collecting data at 1-second intervals was used. The sections of roadway along which probe runs were conducted and details concerning the sensor data collected as part of this effort are described in Table C.35.

Along this corridor, as elsewhere in the study region, point detectors are placed at approximately half-mile intervals. Due to accuracy and maintainability issues with inductive loop detectors and other older sensors, there are no plans to replace the failed units deployed on the mainline lanes of NOVA-region freeways. Instead, plans are in motion to transition to the use of nonintrusive radar-based detection technologies along the freeways. These sensors are being deployed both as replacements for older failed units and as new installations. As a result of a combination of the failure of some older loop detector stations, ongoing roadway construction, and the need to configure many of the newer radar-based units, data are available for only about 75 of NOVA's freeway detectors. Figure C.87 provides a visual indication of the availability of data on I-66 and I-395; darker-colored icons indicate working stations, and lighter-colored icons indicate nonworking stations.

Data-Related Issues Associated with NOVA Sensors

As discussed, construction and maintenance issues resulted in a limited number of operational sensors from which data were available for use. In addition, a number of sensors that at first appeared to be in working order were actually transmitting speed or flow data of questionable quality.
For example, Figure C.87 shows five working sensors operating in close proximity to one another along I-395. However, a closer analysis of the data output by several of these sensors indicated conditions that are either decidedly irregular or simply inaccurate. Examples are shown in Figure C.88. Although the sensor providing the speed and flow data in Figure C.88 appeared to be functioning properly (as reported by the automated system used by the team to collect and to analyze data as part of this project), a review of the speed data (y-axis) and flow data (z-axis) indicates the following:

TABLE C.35. OVERVIEW OF PROBE RUNS

Segment   Route     Time Period        Runs             Start and End Mileposts   No. of Sensors   Date
A > B     I-66 EB   Afternoon peak     1, 2, and 3      68.5 to 74.3              4                April 19, 2011
C > D     I-66 WB   Afternoon peak     4, 5, and 6      74.2 to 69.9              3                April 19, 2011
E > F     I-66 EB   Morning off peak   7, 8, and 9      54.4 to 56.3              4                April 20, 2011
G > H     I-66 WB   Morning off peak   10, 11, and 12   56.3 to 54.4              4                April 20, 2011

Note: EB = eastbound; WB = westbound.

Figure C.87. Display of functioning versus nonfunctioning sensor stations. Map data © 2012 Google.

Figure C.88. Speed and flow data gathered from suspect sensor along I-395. Source: NOVA PeMS.

• Speeds reported by this sensor are approximately 27 mph at all times of day except during the middle of the night, when traffic speeds increase significantly.
• The reported traffic flows appear fairly normal (with the exception of an apparent issue occurring between approximately 1 and 5 p.m. on May 10, 2011), except that the peak traffic volume is reported as occurring between noon and 3 p.m., rather than the typical 4 to 7 p.m. A field review of conditions by team members at this location and during this time period does not support this suggested condition.

A review of data collected from other sensors along southbound I-395 adjacent to this detector shows similar conditions in that the peak traffic flow is reported as occurring between noon and 3 p.m., resulting in a concomitant drop in speeds to between 30 and 40 mph. As indicated above, a field review of conditions did not support this reported condition. Figure C.89 shows similar issues for a sensor along I-66.

As with the sensor data reported in Figure C.88, data from the sensor displayed in Figure C.89 indicate the existence of conditions along I-66 that diverge from conventional wisdom concerning the time of day at which the peak travel condition occurs. According to these data, peak volumes and the lowest speeds regularly occur at this location between approximately 2:00 and 6:30 a.m., with speeds near 70 mph present during the remainder of the day. Again, a field review conducted by team members indicated that these data do not accurately represent the conditions that really exist.

Figure C.89. Speed and flow data gathered from suspect sensor along I-66. Source: NOVA PeMS.
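One way to flag the behavior described above automatically — a sensor reporting near-constant low speeds throughout the day, like the ~27 mph readings in Figure C.88 — is a simple screening heuristic over an hourly speed profile. The thresholds below are illustrative assumptions, not values used by the research team:

```python
def looks_suspect(hourly_speeds, low_speed=35.0, min_spread=10.0):
    # Flag a 24-entry hourly speed profile (index = hour of day) whose
    # daytime speeds are both low and nearly constant, an implausible
    # pattern for a freeway mainline sensor.
    daytime = hourly_speeds[6:20]  # 6 a.m. to 8 p.m.
    nearly_constant = max(daytime) - min(daytime) < min_spread
    low = sum(daytime) / len(daytime) < low_speed
    return nearly_constant and low

# Profile resembling the suspect I-395 sensor: ~27 mph all day, fast overnight
suspect_profile = [65, 66, 66, 65, 64] + [27] * 19
# Profile with a plausible daily pattern: free flow overnight, peaks midday/evening
healthy_profile = [62] * 6 + [55, 40, 35, 45, 55, 58, 57, 55, 50, 38, 32, 40, 55, 60] + [62] * 4
```

A check like this only screens for one failure mode; the field reviews described in the text remain the ground truth.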

It is likely that some portions of these data-related issues are the result of the high percentage of imputed detector data being used to represent conditions at many detector stations (e.g., 59% of data used to generate the contents of Figure C.88 are imputed rather than observed). However, an even more significant issue is related to the need for these detectors to be fully calibrated on a systemwide basis to ensure they accurately represent real-world conditions. Although VDOT is in the process of doing this, the team recommends that speed, flow, and estimated travel time data derived from these quantities be used sparingly until this process is complete. Failure to do so may result in decisions based on largely erroneous data, potentially resulting in a significant waste of resources and labor.

Methodology

The primary question the team wanted to answer in this probe-based experiment was this: How well do the probe data align with the traffic speed and travel time estimates provided by the sparsely deployed point-based detectors? The primary method for answering this question was to compare data collected at 1-second intervals from a GPS-based data collection device against speed estimates based on data from VDOT sensors deployed along each of the four sections of I-66 described above. As part of this effort, the analytical approach described below was used.

For each segment of roadway, graphs were used to compare the speed of the probe vehicle with speeds reported by the sensors. Speeds were displayed on the vertical axis and mileposts on the horizontal axis. The solid line represented the speed estimates generated by the sensors (based on aggregate data collected from all lanes of travel), and the dotted line represented the probe vehicle speeds.
In cases in which data from the sensors were of suspect quality, the line representing the speed estimate provided by that sensor was dashed rather than completely solid. The locations of all the sensors from which data were collected along each roadway were indicated by a solid circle at the midpoint of each segment, accompanied by the sensor's identification number. The team subsequently analyzed the differences between these two data sets along each segment.

In addition to analyzing the speed data, the team analyzed the differences between the travel times experienced by the probe vehicle during each trip versus the estimated travel times generated from the sensor speeds. In situations in which unreliable sensor data were present, a combination of observed sensor speeds and imputed speeds was used to fill in the gaps. Results of each analysis were then compared to calculate the average (absolute) error for each segment of roadway, as well as for the complete set of runs as a whole.

Data Analysis

The speed data from the probe-based runs were compared with the speed estimates generated using the spot speed sensors located along the same sections of roadway.
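The sensor-based travel time estimate and the error metric described above reduce to two small calculations: each sensor's reported speed is applied over the stretch of roadway it covers, and estimated travel times are compared against probe-measured ones as a mean absolute percentage error. The segment lengths, speeds, and travel times below are illustrative placeholders, not the study's values:

```python
def sensor_travel_time_minutes(segments):
    # segments: (length_miles, speed_mph) pairs, one per sensor's coverage area
    return sum(length / speed * 60.0 for length, speed in segments)

def average_absolute_error_pct(measured, estimated):
    # Mean absolute percentage error of estimates against probe measurements
    errors = [abs(e - m) / m for m, e in zip(measured, estimated)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical 5.8-mile run covered by four sensors
est = sensor_travel_time_minutes([(1.5, 55.0), (1.0, 40.0), (1.6, 30.0), (1.7, 50.0)])
# Hypothetical probe-measured vs. sensor-estimated travel times for two runs
err = average_absolute_error_pct(measured=[9.5, 11.0], estimated=[8.0, 12.1])
```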

Data Analysis Along I-66 Inside of I-495 Eastbound

Figures C.90, C.91, and C.92 show plots of the instantaneous speeds recorded by the vehicle probe as it traversed I-66 eastbound inside of I-495 at three times on Tuesday, April 19, 2011, plotted against the speeds reported by the detectors (804, 822, 808, and 817) along that stretch of roadway at those same times.

Figure C.90. Segment A > B, Run 1, I-66 eastbound at 3:40 p.m., Tuesday, April 19, 2011.

Figure C.91. Segment A > B, Run 2, I-66 eastbound at 5:23 p.m., Tuesday, April 19, 2011.
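Comparing a 1-second GPS log against a point sensor requires reducing the probe trace to an average speed over each sensor's coverage segment. A sketch of that reduction is below; the log format, segment bounds, and speeds are illustrative assumptions, not the team's processing code:

```python
def probe_segment_speed(samples, start_mp, end_mp):
    """Average probe speed (mph) across one sensor's segment.

    samples: (seconds_from_trip_start, milepost) pairs from a 1-second GPS
    log, assumed sorted by time and increasing in milepost.
    """
    inside = [(t, mp) for t, mp in samples if start_mp <= mp <= end_mp]
    (t0, mp0), (t1, mp1) = inside[0], inside[-1]
    hours = (t1 - t0) / 3600.0
    return (mp1 - mp0) / hours

# Synthetic 10-minute log advancing 0.01 mile per second (36 mph) from MP 68.5
log = [(i, 68.5 + 0.01 * i) for i in range(0, 601)]
speed = probe_segment_speed(log, 68.5, 70.0)
```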

Figure C.92. Segment A > B, Run 3, I-66 eastbound at 6:18 p.m., Tuesday, April 19, 2011.

Comparison of the probe speeds with the sensor-based speeds suggests the following:

• Data generated by Sensor 804 (Mileposts 68.5 to 70.0) were not consistent with the probe data collected along this roadway segment. The most likely explanation is data quality issues with the sensor. The speed reported by this sensor for most of the day was about 27 mph.
• All data (100%) for Sensor 822 (Mileposts 70.0 to 71.05) were imputed. The imputed data suggest a sustained free-flow speed that is clearly inaccurate based on the speeds observed by the probe vehicle.
• Data generated by Sensor 808 (Mileposts 71.05 to 72.7) were not consistent with the probe data. Again, the explanation is likely to be data quality issues with the sensor. The speed reported by this sensor was about 28 mph for most of the day.
• Sensor 817 (Mileposts 72.7 to 74.3) is the one sensor that appeared to provide reliable speed data for the time periods during which the probe runs were conducted. Even so, the probe vehicle speeds were lower, and significantly so for probe Runs 1 and 2.

Data Analysis Along I-66 Inside of I-495 Westbound

Figures C.93, C.94, and C.95 show plots of the instantaneous speeds recorded by the vehicle probe as it traversed I-66 westbound inside of I-495 at three times on Tuesday, April 19, 2011, plotted against the speeds reported by the detectors (819, 1422, and 806) along that stretch of roadway at those same times. Comparison of these probe data with the sensor-based speeds suggests the following:

• Sensor 819 (Mileposts 74.2 to 72.7) reported fairly reliable speeds for the time periods during which the probe runs were conducted. Even so, the probe speeds

were lower than those reported by the sensor, especially during the latter portion of probe Run 3, during which significant congestion was encountered.
• Data generated by Sensor 1422 (Mileposts 72.7 to 70.8) were not consistent with the probe data due to data quality issues with the sensor. The speed reported by this sensor was about 28 mph for most of the day.
• All data (100%) for Sensor 806 (Mileposts 70.8 to 69.9) were imputed. No field observations were generated by the sensor during any of the probe runs. Imputed data for this section of roadway indicate near free-flow speeds that were demonstrated to be inaccurate by the probe vehicle.

Figure C.93. Segment C > D, Run 4, I-66 westbound at 3:27 p.m., Tuesday, April 19, 2011.

Figure C.94. Segment C > D, Run 5, I-66 westbound at 4:05 p.m., Tuesday, April 19, 2011.

Figure C.95. Segment C > D, Run 6, I-66 westbound at 6:38 p.m., Tuesday, April 19, 2011.

Data Analysis Along I-66 Outside of I-495 Eastbound

Figures C.96, C.97, and C.98 show plots of the instantaneous speeds recorded by the vehicle probe as it traversed I-66 eastbound outside of I-495 at three times on Wednesday, April 20, 2011, plotted against the speeds reported by the detectors (1139, 1157, 1141, and 1142) along that stretch of roadway at those same times.

Figure C.96. Segment E > F, Run 7, I-66 eastbound at 9:43 a.m., April 20, 2011.

Figure C.97. Segment E > F, Run 8, I-66 eastbound at 10:20 a.m., April 20, 2011.

Figure C.98. Segment E > F, Run 9, I-66 eastbound, 10:36 a.m., April 20, 2011.

Comparison of these probe data with the sensor-based speeds suggests the following:

• Only 15% of the speeds reported by Sensor 1139 (Mileposts 54.4 to 54.9) were actually observed. Consequently, although those speeds are reasonably consistent with the conditions observed by the probe vehicle, it is unclear whether this sensor would provide accurate data under other conditions.
• All speeds (100%) reported by Sensor 1157 (Mileposts 54.9 to 55.4) were imputed. Those imputed data suggested sustained free-flow speeds, which is consistent with the conditions encountered by the probe vehicle.
• All speeds (100%) reported by Sensor 1141 (Mileposts 55.4 to 55.8) were imputed. Those imputed speeds suggest sustained free-flow conditions, which is consistent with the conditions encountered by the probe vehicle (although the sensor shows slightly higher speeds during two of the three probe runs).
• As was reported by Sensor 1139, only 15% of the speeds reported by Sensor 1142 (Mileposts 55.8 to 56.3) were actually observed. Although the sensor data were consistent with the conditions encountered by the probe vehicle, it is unclear whether this sensor would provide accurate data under other conditions.

Data Analysis Along I-66 Outside of I-495 Westbound

Figures C.99, C.100, and C.101 show plots of the instantaneous speeds recorded by the vehicle probe as it traversed I-66 westbound outside of I-495 at three times on Wednesday, April 20, 2011, plotted against the speeds reported by the detectors (1143, 1156, 1158, and 1140) along that stretch of roadway at those same times.

Figure C.99. Segment G > H, Run 10, I-66 westbound at 9:34 a.m., April 20, 2011.

Figure C.100. Segment G > H, Run 11, I-66 westbound at 9:53 a.m., April 20, 2011.

Figure C.101. Segment G > H, Run 12, I-66 westbound at 10:27 a.m., April 20, 2011.

Comparison of these probe data with the sensor-based speeds suggests the following:

• Only 15% of the speeds generated by Sensor 1143 (Mileposts 56.3 to 55.7) were actually observed. Consequently, although the speeds reported by this sensor are close to those observed by the probe, it is unclear whether this sensor would provide accurate data under other conditions.
• All speeds (100%) reported by Sensor 1156 (Mileposts 55.7 to 55.3) were imputed. The imputed speeds suggest sustained free-flow speeds along this portion of the freeway mainline, which is consistent with the conditions encountered by the probe vehicle.
• All speeds (100%) for Sensor 1158 (Mileposts 55.3 to 54.85) were imputed. Those imputed speeds suggest sustained near free-flow conditions, somewhat lower than the speed data generated by the probe vehicle.
• As with Sensor 1143, only 15% of the data generated by Sensor 1140 (Mileposts 54.85 to 54.4) were observed. This lack of observed data helps to explain the lower speeds generated by this sensor versus those reported by the probe vehicle.

Comparison of Travel Times: Probe (Measured) versus Sensor (Estimated)

Based on the speed data from the probe vehicle runs and the speed estimates provided by the sensors, segment travel times were generated for each of the 12 probe runs described above. Two approaches were used to calculate roadway travel times based on the sensor data:

• Approach 1: All speed data received by the team from the sensors were used, regardless of whether the data were good, imputed, or suspect.
• Approach 2: Data from nearby sensors were used in place of data from sensors that were flagged (manually) as likely generating suspect data, based on their reporting of very low speeds over significant periods of time:
— Runs 1, 2, and 3: Substituted data for Sensors 804 and 808;
— Runs 4, 5, and 6: Substituted data for Sensor 1422;
— Runs 7, 8, and 9: No substitution of data; and
— Runs 10, 11, and 12: No substitution of data.

Because no substitution of sensor data occurred for Runs 7 to 12, Approach 2 was not employed as part of the travel time estimation process along those segments of roadway. Travel times are given below for Runs 1 to 3 (Table C.36), Runs 4 to 6 (Table C.37), Runs 7 to 9 (Table C.38), and Runs 10 to 12 (Table C.39).

TABLE C.36. TRAVEL TIMES FOR RUNS 1, 2, AND 3 (A > B), APRIL 19, 2011
Start Time  Road     Start MP  End MP  Probe Travel Time (min)  Sensor Travel Time (min), App. 1 / App. 2  Error (%), App. 1 / App. 2
3:40 p.m.   I-66 EB  68.5      74.3    6.3                      7.0 / 4.7                                  +11 / −25
5:23 p.m.   I-66 EB  68.5      74.3    10.1                     7.0 / 4.6                                  −31 / −54
6:18 p.m.   I-66 EB  68.5      74.3    7.4                      7.1 / 4.6                                  −4 / −37
Note: MP = milepost; App. = Approach; probe travel times are measured, sensor (VDOT) travel times are estimated.

TABLE C.37. TRAVEL TIMES FOR RUNS 4, 5, AND 6 (C > D), APRIL 19, 2011
Start Time  Road     Start MP  End MP  Probe Travel Time (min)  Sensor Travel Time (min), App. 1 / App. 2  Error (%), App. 1 / App. 2
3:27 p.m.   I-66 WB  74.2      69.9    7.2                      6.3 / 4.0                                  −12 / −44
4:05 p.m.   I-66 WB  74.2      69.9    4.6                      6.3 / 4.0                                  +37 / −13
6:38 p.m.   I-66 WB  74.2      69.9    12.2                     6.1 / 4.1                                  −50 / −66
Note: MP = milepost; App. = Approach; probe travel times are measured, sensor (VDOT) travel times are estimated.

TABLE C.38. TRAVEL TIMES FOR RUNS 7, 8, AND 9 (E > F), APRIL 20, 2011
Start Time  Road     Start MP  End MP  Probe Travel Time (min)  Sensor Travel Time (min), App. 1 / App. 2  Error (%), App. 1 / App. 2
9:43 a.m.   I-66 EB  54.4      56.3    1.8                      1.7 / N/A                                  −6 / N/A
10:20 a.m.  I-66 EB  54.4      56.3    1.8                      1.7 / N/A                                  −6 / N/A
10:36 a.m.  I-66 EB  54.4      56.3    1.8                      1.7 / N/A                                  −6 / N/A
Note: MP = milepost; App. = Approach; N/A = not applicable because no substitution of sensor data occurred for Runs 7 to 9, so Approach 2 was not used.

TABLE C.39. TRAVEL TIMES FOR RUNS 10, 11, AND 12 (G > H), APRIL 20, 2011
Start Time  Road     Start MP  End MP  Probe Travel Time (min)  Sensor Travel Time (min), App. 1 / App. 2  Error (%), App. 1 / App. 2
9:34 a.m.   I-66 WB  56.3      54.4    1.7                      1.8 / N/A                                  +6 / N/A
9:53 a.m.   I-66 WB  56.3      54.4    1.7                      1.8 / N/A                                  +6 / N/A
10:27 a.m.  I-66 WB  56.3      54.4    1.7                      1.8 / N/A                                  +6 / N/A
Note: MP = milepost; App. = Approach; N/A = not applicable because no substitution of sensor data occurred for Runs 10 to 12, so Approach 2 was not used.

Travel times collected during the first day of probe data collection (April 19) differed significantly from the estimated travel times generated from the sensor data (using either Approach 1 or Approach 2). For example, for Runs 1, 2, and 3 there was an overall absolute average error of 15% for Approach 1 and 39% for Approach 2. Although this might create the perception that the sensor data along this segment are useful for calculating travel times, it must be remembered that two of the sensors generated suspect speed data, in this case very low freeway speeds. Incorporating these speeds into the travel time estimation appears to have offset the speeds generated by the other two roadway sensors, which were generally much higher than those reported by the probe vehicle. Consequently, incorporating the likely erroneous low speeds produced travel times closer to those experienced by the probe vehicle, an unintended consequence of using these data.

Moreover, the nearly identical travel time estimates generated by both approaches over the course of several hours speak to the likely impact of the considerable amount of data imputation that occurred. The steadiness of these travel time estimates is not ideal for computing reliability, which depends on the system's ability to detect variability in traffic conditions over time. The histogram in Figure C.102, which provides a breakdown of afternoon peak period (3:00 to 7:00 p.m.) travel times along the roadway segment used for Runs 1, 2, and 3 (Segment A > B) over a 2-month period (March 15 to May 15), demonstrates a fairly low amount of travel time variability across the more than 2,000 5-minute data collection periods for which data were collected.

Travel times collected during the second day of probe runs conform much more closely to the estimates from the sensors, with an average error of 6% in each direction of travel. However, nearly all of these data were imputed (only 15% were observed data, provided by four of the eight sensors from which data were collected). As a result, it is highly unlikely that these sensors would provide accurate travel times under most congested conditions. The full extent of this problem is made clear by the histogram in Figure C.103, which shows that over the course of 2 months, only 44 of 2,156 total 5-minute time slices along Segment E > F (Runs 7, 8, and 9) were reported as having travel times in excess of 2 minutes during the morning peak period. A nearly identical travel time distribution exists for westbound travel times along this segment of I-66 during the morning peak period.

Figure C.102. I-66 eastbound afternoon peak travel times, Mileposts 68.5 to 74.3, from March 15, 2011, through May 15, 2011.
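The signed error percentages in Tables C.36 through C.39 (estimated minus measured, relative to measured) and the overall absolute average errors quoted in the text can be reproduced with a few lines. This helper is an illustration using the published table values, not the study's actual tooling.

```python
# Sketch: reproduce the signed error percentages and the overall absolute
# average error reported for Runs 1-3 (Table C.36). Travel time values are
# taken from the table; the helper itself is an illustrative assumption.

def error_pct(measured_min, estimated_min):
    """Signed percent error of a sensor estimate against the probe measurement."""
    return (estimated_min - measured_min) / measured_min * 100.0

# (probe measured, Approach 1 estimate, Approach 2 estimate) for Runs 1, 2, 3
runs_1_to_3 = [(6.3, 7.0, 4.7), (10.1, 7.0, 4.6), (7.4, 7.1, 4.6)]

approach1 = [error_pct(p, a1) for p, a1, _ in runs_1_to_3]
approach2 = [error_pct(p, a2) for p, _, a2 in runs_1_to_3]

mean_abs_1 = sum(abs(e) for e in approach1) / len(approach1)
mean_abs_2 = sum(abs(e) for e in approach2) / len(approach2)
print(round(mean_abs_1), round(mean_abs_2))  # prints: 15 39
```

The rounded per-run values match the table (+11, −31, −4 for Approach 1), and the absolute averages match the 15% and 39% figures quoted in the discussion.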

LESSONS LEARNED

Overview

The team selected Northern Virginia as a case study site because it provided an opportunity to integrate a reliability monitoring system into a preexisting, extensive data collection network. The data collected on NOVA roadways are already passed to a number of external systems, including RITIS at the University of Maryland, the archived data management system at the University of Virginia, and the statewide 511 system. Configuring PeMS to receive NOVA data helped define the requirements for complex traffic systems integration and illustrate what agencies can do to facilitate the process of implementing reliability monitoring.

Systems Integration

The process of fully integrating the NOVA data with PeMS took several weeks. Although this amount of effort is standard when integrating archived data user systems with traffic data collection systems, there are a number of steps agencies can take to make this integration go more smoothly and quickly.

It is important that the implementation and maintenance of a traffic data collection system be carried out with a broad audience in mind. Efforts such as the federal government's 2009 Open Government Initiative underscore the value of providing public access to government data. Often, increasing access to data outside an organization can help to further agency goals; for example, providing data to mobile application developers can help agencies distribute information in a way that increases the efficiency of the transportation network. It will also help agencies support contractors' efforts to implement procured systems, such as a TTRMS.

Figure C.103. I-66 eastbound morning peak travel times, Mileposts 54.4 to 56.3, from March 15, 2011, through May 15, 2011.

One of the ways agencies can facilitate the distribution of data from their data collection systems is by establishing one or more data feeds. As discussed in the first chapter, different parties will want to acquire data processed to different levels, depending on the intended use. For example, a mobile application developer may only be interested in heavily processed data, such as route-level travel times. A third-party data aggregator may be interested in obtaining speeds computed from loop detectors to fuse with other travel time data sources. A traffic engineering firm may prefer raw detector flow and occupancy data that it can quality check using its own established methods and then use to calculate performance measures.

Maintaining multiple data feeds can be a challenge. If agencies want to provide a feed of processed data, it will save resources in the long run to document the processing steps performed on the data. Such documentation will allow implementers of external systems to evaluate these steps and undo them, if needed. Aside from the processing documentation, maintaining clear documentation on the format of data files and the units of the data will greatly facilitate the use of data outside the agency. In addition, maintaining documentation on the path of data from a detector through the agency's internal systems can be of value to contractors and other external data users. Clearly explaining this information in a text file minimizes back-and-forth communication between agency staff and contractors and prevents inaccurate assumptions.

Methodological Advancement

From a methodological standpoint, this case study focused on implementing a multistate travel time reliability model developed by SHRP 2 Project L10.
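The multistate model just mentioned treats the travel time distribution as a mixture of normal "states." As a rough sketch of the idea, the example below uses scikit-learn's GaussianMixture in place of the R MCLUST package cited in the references, and synthetic travel times rather than the study's loop detector data; both substitutions are assumptions for illustration only.

```python
# Sketch of a multistate (mixture-of-normals) travel time model.
# Synthetic data and scikit-learn's GaussianMixture stand in for the
# study's loop detector travel times and MCLUST-based fitting.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic peak-period travel times (min): three underlying states
tt = np.concatenate([
    rng.normal(6.5, 0.4, 600),    # uncongested
    rng.normal(9.0, 1.0, 300),    # moderate congestion
    rng.normal(14.0, 2.0, 100),   # severe congestion
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(tt)

# Each fitted component is a "state": its weight approximates the chance a
# traveler encounters that state; its mean is the expected travel time in it.
for w, mu, var in sorted(zip(gmm.weights_, gmm.means_.ravel(),
                             gmm.covariances_.ravel())):
    print(f"P(state) = {w:.2f}, mean = {mu:.1f} min, sd = {var ** 0.5:.1f} min")
```

This is what allows the method's outputs, described below, to be phrased as "an X% chance of encountering severe congestion, with an expected travel time of Y minutes if you do."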
The original research developed this model on automated vehicle identification travel time measurements in San Antonio, as well as on travel times generated by a microsimulation traffic model of a section of I-66 in Northern Virginia. The present effort extended this research by applying it to point speeds generated by multiple loop detectors along a freeway segment.

The methodological findings of this case study are that multistate normal distribution models can approximate travel time distributions generated from loop detectors better than normal or lognormal distributions. During the peak hours on a congested facility, three states are generally sufficient to balance a good model (distribution) fit with the need to generate information that can be easily communicated to interested parties. During off-peak hours, two states typically provide a reasonable model (distribution) fit. The outputs of this method can inform travelers of the percentage chance that they will encounter moderate or severe congestion and, if they do, what their expected and 95th percentile travel times will be.

Probe Data Comparison

Most public agency–managed data processing systems rely on fixed sensor infrastructure to support the calculation of roadway travel times and the subsequent generation of travel time reliability metrics. This state of affairs may change over time as more private sector sources of data become available, but it will not happen overnight. To that end, agencies need to consider how to make the best use of the data currently available to them. As part of this use case, the team examined the data available from a network of fixed infrastructure sensors (a combination of single loops and radar-based sensors) that is being modernized by the Virginia Department of Transportation. The team's analysis of the data available from these sensors has yielded a number of findings of potential interest to a variety of agencies, particularly those facing maintenance and calibration issues associated with older sensor systems, as well as agencies with more sparsely spaced spot sensors. Overall, the team found that five primary factors accounted for differences between measured probe vehicle data and the speeds and estimated travel times derived from VDOT sensor data; these factors are detailed below.

Likely one of the most significant impacts on sensor-based speeds, and at the same time one of the most difficult to measure, stems from research suggesting that fixed roadway sensors may not always accurately measure very low speeds during highly congested conditions. Although impossible to evaluate definitively here, this issue should be taken into consideration as part of any such analysis.

The Virginia Department of Transportation is in the process of modernizing its sensor network in NOVA, and as yet the majority of sensors are not fully calibrated or fully configured to properly communicate with back-office data analysis systems. This resulted in the types of data quality issues discussed earlier in the case study. It also makes clear the need for public agencies to conduct regular sensor maintenance programs to ensure that their detection networks generate the most accurate data possible.
Beyond any issues spot sensors may have in accurately assessing low-speed, stop-and-go traffic conditions, another issue that sensor users must contend with is the problem of extrapolating speeds (and subsequently travel times) for a segment of roadway based solely on conditions within the sensor's field of detection. All speeds and travel times for a segment are based on the assumption that conditions along the segment are identical to those experienced within the sensor's field of view. As a result, it is likely that any data generated by spot sensors will fail to detect congestion and incidents that occur outside the sensor's immediate vicinity, with the impact becoming more pronounced the longer the segment.

Related to the problem of extrapolating spot sensor data to cover entire segments of roadway is the need to impute data from adjacent sensors or segments of roadway to fill gaps in sensor coverage. Although not necessarily a major problem in cases for which data for a single lane of travel are filled in based on conditions experienced by adjacent sensor stations, the types of imputation required as part of this case study resulted in speeds being generated for segments of roadway based largely on historical data for a sensor or on macroscopic speed and flow data for a section of the roadway network. Although such replacement data are a necessity for computing speed and estimated travel time for the given segment, their use further aggravated the data-related issues described above.

Another dynamic that affects the comparison of sensor data with probe vehicle data stems from a basic difference between the two data sets:

• Sensor data represent 5-minute average conditions across all lanes of travel observed at the sensor location; but
• Probe data represent the movement of a single vehicle through one lane of travel across the segment being evaluated.

These differences can result in significant differences in speed and estimated travel time between the two data sources if one lane of travel experiences significant congestion but the other lanes do not. This is especially true in cases in which the probe vehicle is slowed by congestion outside a sensor's detection zone while other lanes of travel are moving at higher, less-congested (or even free-flow) speeds.

Each of the factors described above almost certainly had some degree of impact on the differences between the probe vehicle speeds the team collected and the speeds and estimated travel times based on VDOT sensor data. Moreover, with the exception of the final factor (the basic differences between the probe and sensor data sets), each of these factors has the potential to affect the quality of data collected by spot sensor–based fixed data collection infrastructure. Public agency staff should take each of these factors into consideration when making decisions concerning both the deployment of new data collection infrastructure and the maintenance or expansion of existing systems.

REFERENCES

1. Jia, Z., C. Chen, B. Coifman, and P. Varaiya. The PeMS Algorithms for Accurate, Real-Time Estimates of g-Factors and Speeds from Single-Loop Detectors. University of California, Berkeley, undated. http://pems.eecs.berkeley.edu/Papers/gfactoritsc.pdf. Accessed Sept. 2, 2012.
2. Guo, F., H. Rakha, and S. Park. A Multistate Model for Travel Time Reliability. In Transportation Research Record: Journal of the Transportation Research Board, No. 2188, Transportation Research Board of the National Academies, Washington, D.C., 2010, pp. 46–54.
3. Fraley, C., and A. E. Raftery. MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering. Technical Report No. 504. Department of Statistics, University of Washington, 2009. http://www.stat.washington.edu/fraley/mclust. Accessed Sept. 2, 2012.

Case Study 3

SACRAMENTO–LAKE TAHOE, CALIFORNIA

The Sacramento–Lake Tahoe region of northern California was selected for this case study because it provided an example of a rural transportation network with sparse data collection infrastructure. The study region was of additional interest because it includes urban, suburban, and rural areas and has routes with heavy recreational traffic and areas where adverse weather can have a major effect on travel time reliability. The purpose of the case study was to

• Describe the process of transferring data from existing data collection systems into a travel time reliability monitoring system (TTRMS).
• Discuss how filtering techniques might be applied during the analytical process to refine the travel time estimates generated from Bluetooth-based data.
• Describe and assess the impact of detector network configuration on the data ultimately available for use by a TTRMS.
• Consider the privacy challenges associated with collecting and using toll tag– and Bluetooth-based data.

The monitoring system section details the reasons for selecting the Sacramento–Lake Tahoe region as a case study and provides an overview of the region. It briefly summarizes agency monitoring practices, discusses the existing sensor network, and describes the software system that the team used to analyze the use cases. Specifically, it describes the steps and tasks the research team completed to transfer data from the data collection systems into a TTRMS.

The methodological experiments describe the manner in which different types of filtering techniques might be applied at different stages of the analytical process to further refine the travel time estimates generated from Bluetooth-based data sets. The use cases are less theoretical and more site specific.
The first two use cases assess the impact of detector network configuration on the data ultimately available for use by a TTRMS. The third use case attempts to quantify the impact of adverse weather– and demand-related conditions on travel time reliability using data derived from the Bluetooth- and electronic toll collection (ETC)–based systems deployed in rural areas as part of this case study.

The privacy considerations section addresses the challenges associated with collecting data using toll tag– and Bluetooth-based technologies in a manner that respects the privacy of the individuals from whom the data are being collected. The lessons learned section summarizes the lessons learned during this case study with regard to all aspects of travel time reliability monitoring: sensor systems, software systems, calculation methodology, and use.

MONITORING SYSTEM

Site Overview

The team selected the Lake Tahoe region, located in California Department of Transportation (Caltrans) District 3, to provide an example of a rural transportation network with fairly sparse data collection infrastructure. Caltrans District 3 encompasses the Sacramento Valley and Northern Sierra regions of California. Its only metropolitan area is Sacramento. The district is responsible for maintaining and operating 1,410 centerline miles and 4,700 lane miles of freeway in 11 counties. District 3 includes urban, suburban, and rural areas, including areas near Lake Tahoe where weather is a serious travel time reliability concern and recreational traffic is heavy. The district also contains 64 lane miles of high-occupancy vehicle lanes, with an additional 140 lane miles proposed, all within the greater Sacramento region. Two major Interstates pass through the district: I-80, which runs east–west, and I-5, which runs north–south. Other major freeway facilities include US-50, which connects Sacramento and South Lake Tahoe, and SR-99.

Built in 2000, the District 3 regional traffic management center is located in Rancho Cordova, 15 miles east of Sacramento.
The traffic management center serves as the focal point for traffic information within District 3, and its staff are responsible for managing (1)

• A regional network of sensors, cameras, changeable message signs (CMS), highway advisory radios, and a road weather information system;
• Delivery of traveler information; and
• Dispatch of other Caltrans resources.

The weather-related conditions that contribute to serious travel time reliability concerns in District 3 include the following (1):

• Fog and visibility. The region is prone to thick tule fog after heavy rain.
• High winds. Several bridges in the district are exposed to high winds.
• Frost and ice. Freezing can occur on longer viaduct sections during cold weather.
• Snow in the Sierras. High winds combined with snow accumulation create white-out conditions on mountain roadways.

Caltrans and its regional partners are pursuing the creation of corridor system management plans (CSMPs), defined by Caltrans as follows:

A CSMP is a comprehensive, integrated management plan for increasing transportation options, decreasing congestion, and improving travel times in a transportation corridor. A CSMP includes all travel modes in a defined corridor—highways and freeways, parallel and connecting roadways, public transit (bus, bus rapid transit, light rail, intercity rail) and bikeways, along with intelligent transportation technologies, which include ramp metering, coordinated traffic signals, changeable message signs for traveler information, incident management, bus/carpool lanes and car/vanpool programs, and transit strategies. CSMP success is based on the premise of managing a selected set of transportation components within a designated corridor as a system rather than as independent units. Each CSMP identifies current management strategies, existing travel conditions and mobility challenges, corridor performance management, planning management strategies, and capital improvements. (2)

In District 3, six CSMPs have been developed, along I-80, I-5/SR-99, US-50, SR-99 North, SR-49, and SR-65 (2).

Sensors

Caltrans District 3 collects traffic data only along freeway facilities. It operates 2,251 point detectors (either radar or loop detectors) at over 1,000 roadway locations across the district. Point detection infrastructure in the mountainous regions of the district is sparser, with detectors often miles apart. To supplement the point detection network in rural portions of the Sierra Nevada near Lake Tahoe, the district has installed ETC readers on I-80 and Bluetooth-based data collection readers along I-5 and US-50 (see Figure C.104).
These readers register the movement of vehicles equipped with FasTrak tags (Northern California's ETC system) and Bluetooth-enabled devices (e.g., smartphones) for the purpose of generating roadway travel times. Table C.40 and Table C.41 provide details about the ETC readers and Bluetooth readers (BTRs), respectively, used in this case study.

Both ETC- and Bluetooth-based data collection technologies use vehicle-identification technology to record the presence of vehicles as they pass instrumented points along a roadway. Field controllers typically record location, time, and vehicle-identification information for each vehicle to support the calculation of travel times. By knowing the length of the road segment between two instrumented points and the starting and ending times at which travel between those points took place, the travel time for that section of roadway can be determined.
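The matching step just described can be sketched in a few lines. The record layout, field names, and segment length below are illustrative assumptions rather than the actual output of the Caltrans field controllers.

```python
# Sketch: match anonymized vehicle IDs (toll tags or Bluetooth MAC hashes)
# seen at an upstream and a downstream reader, and turn the time difference
# into a travel time and speed. Record layout is an illustrative assumption.
from datetime import datetime

SEGMENT_MILES = 5.0  # illustrative distance between the two readers

upstream = {   # vehicle_id -> detection time at reader A
    "tag_017": datetime(2011, 4, 20, 9, 40, 0),
    "tag_042": datetime(2011, 4, 20, 9, 41, 30),
}
downstream = {  # vehicle_id -> detection time at reader B
    "tag_017": datetime(2011, 4, 20, 9, 45, 0),
    "tag_099": datetime(2011, 4, 20, 9, 46, 10),  # never seen upstream
}

travel_times = {}
for vid, t_up in upstream.items():
    t_down = downstream.get(vid)
    if t_down is not None and t_down > t_up:
        travel_times[vid] = (t_down - t_up).total_seconds() / 60.0  # minutes

for vid, tt_min in travel_times.items():
    mph = SEGMENT_MILES / (tt_min / 60.0)
    print(f"{vid}: {tt_min:.1f} min, {mph:.0f} mph")
```

In practice the same ID can be detected several times per passage, and outliers (vehicles that stopped or detoured between readers) must be filtered out; those refinements are the subject of the methodological experiments described earlier in this case study.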

Figure C.104. Map of electronic toll collection and Bluetooth readers deployed in Caltrans District 3.

TABLE C.40. BREAKDOWN OF DEPLOYED ELECTRONIC TOLL COLLECTION READERS IN STUDY AREA
ETC ID (Figure C.104)  Roadway and Direction  ETC Reader ID  Nearest Crossroad  Postmile
ETC 1                  I-80 EB                42003          Auburn             123.1
ETC 2                  I-80 WB                42035          Baxter             148.5
ETC 3                  I-80 WB                42041          Kingvale           168.0
ETC 4                  I-80 EB                42036          Rainbow            168.1
ETC 5                  I-80 EB                42042          Rest area          176.2
ETC 6                  I-80 EB                42044          Donner Lake        179.9
ETC 7                  I-80 WB                42006          Prosser Village    189.0
ETC 8                  I-80 WB                42015          Hirschdale         193.4
Note: EB = eastbound; WB = westbound.

TABLE C.41. BREAKDOWN OF DEPLOYED BLUETOOTH READERS IN STUDY AREA
BTR ID (Figure C.104)  Roadway and Direction  Bluetooth Reader ID  Nearest Crossroad  Postmile
Bluetooth 1            I-5 NB                 1005                 Elk Grove          506.4
Bluetooth 2            I-5 NB                 1011                 Pocket             511.5
Bluetooth 3            I-5 SB                 2101                 Florin             512.4
Bluetooth 4            I-5 SB                 2009                 Gloria             513.5
Bluetooth 5            I-5 NB                 1039                 Vallejo            517.2
Bluetooth 6            I-5 NB                 1004                 L Street           518.9
Bluetooth 7            US-50 EB               1054                 Placerville        48.4
Bluetooth 8            US-50 EB               2055                 Twin Bridges       87.1
Bluetooth 9            US-50 WB               2058                 Echo Summit        94.9
Bluetooth 10           US-50 EB               2056                 Meyers             98.7
Note: NB = northbound; SB = southbound.

Electronic Toll Collection–Based Data Collection in District 3

The ETC-based data collection infrastructure deployed along I-80 consists of eight FasTrak toll tag reader stations installed and operated by Caltrans District 3. The readers were initially installed to provide the Bay Area's 511 system with travel times to Lake Tahoe, but they have not yet been used for that purpose.

According to Caltrans, each reader is mounted on an overhead CMS or other fixed overhead sign. Each reader station consists of a cabinet mounted to the sign pole and connected to antennae mounted on the edge of the sign closest to the roadway; the antennae are aimed so that they monitor traffic in each lane of travel. All of the readers are deployed at roadway sections with two lanes of travel in each direction, with the exception of one location that has three lanes in each direction. Each ETC transponder passing these readers is encoded with a unique identification number. Data from these transponders are collected by the readers via dedicated short-range communication radio and assigned time and date stamps, as well as an antenna identification stamp, for use in calculating travel time.
Bluetooth-Based Data Collection in District 3

This case study also leverages data from BTRs installed by Caltrans' research division on I-5 in Sacramento and along US-50 between Placerville and Lake Tahoe.

BTRs are typically placed at the side of a roadway, ideally at vehicle windshield height or higher to minimize obstructions between the reader and in-vehicle Bluetooth-enabled devices. In Caltrans' case, each BTR was mounted inside an equipment cabinet strapped to a pole along the freeway.

The BTRs deployed by Caltrans used the standard Bluetooth device inquiry algorithm, scanning all 32 available channels every 5.12 seconds (split into two 2.56-second phases of 16 channels each). Each BTR records the unique media access control (MAC) address of every Bluetooth device it detects during each scan cycle for use in calculating travel time.
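Because a device moving slowly past a reader can be detected on several consecutive scan cycles, a single representative timestamp per passage is usually chosen before matching detections across readers. The sketch below illustrates one simple way to do this; the first-detection rule and the 60-second gap are illustrative assumptions, not Caltrans' documented procedure.

```python
# Sketch: collapse repeated detections of the same MAC address at one
# reader into one record per passage. Detections separated by more than
# GAP_S seconds are treated as separate passages (illustrative rule).

GAP_S = 60.0

def passages(detection_times):
    """detection_times: sorted timestamps (s) for one MAC at one reader.
    Returns the first-detection time of each distinct passage."""
    result = []
    previous = None
    for t in detection_times:
        if previous is None or t - previous > GAP_S:
            result.append(t)  # start of a new passage
        previous = t
    return result

# A device detected on three consecutive 5.12 s scan cycles, then again
# roughly 30 minutes later on a return trip past the same reader:
times = [100.0, 105.12, 110.24, 1910.0]
print(passages(times))  # prints: [100.0, 1910.0]
```

The choice of representative time (first, last, or strongest-signal detection) is one of the filtering decisions examined in the methodological experiments section of this case study.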

Data Management

The primary data management software system in the District 3 region is Caltrans' Performance Measurement System (PeMS). All Caltrans districts use PeMS for data archiving and performance measure reporting. PeMS integrates with a variety of other systems to obtain traffic, incident, and other types of data. It archives raw data, filters them for quality, computes performance measures, and reports them to users through the web at various levels of spatial and temporal granularity. It reports performance measures such as speed, delay, percentage of time spent in congestion, travel time, and travel time reliability. These performance measures can be obtained for specific freeways and routes and are also aggregated up to higher spatial levels, such as county, district, and state. These flexible reporting options are supported by the PeMS web interface, which allows users to select a date range over which to view data, as well as the days of the week and times of day to be processed into performance metrics. Because PeMS has archived Caltrans data dating back to 1999, it provides a rich and detailed source of both current travel times and historical reliability information.

PeMS integrates, archives, and reports on incident data collected from two sources: the California Highway Patrol and Caltrans. The California Highway Patrol reports current incidents in real time on its website. PeMS obtains the text from the website, uses algorithms to parse the accompanying information, and inserts it into the PeMS database for display on a real-time map, as well as for archiving. In addition, Caltrans maintains an incident database, called the Traffic Accident Surveillance and Analysis System (TASAS), which links to the highway database so that incidents and their locations can be analyzed. PeMS obtains and archives TASAS incident data via a batch process approximately once per year.
Incident data contained in PeMS have been lev- eraged to validate use cases associated with how different sources of congestion affect travel time reliability. PeMS also integrates data on freeway construction zones from the Caltrans lane closure system, which is used by Caltrans districts to report all approved closures for the next seven days, plus all current closures, updated every 15 minutes. PeMS obtains these data in real time from the lane closure system, displays them on a map, and lets users run reports on lane closures by freeway, county, district, or state. Lane closure data in PeMS were used in the validation of the use cases associated with how different sources of congestion affect travel time reliability. Systems Integration Data Acquisition in Support of Travel Time Reliability Analysis PeMS can calculate many types of performance measures; the requirements for link- ing PeMS with an existing system depend on the features being used. The basic data that PeMS requires from the source system to support these functions include the following: • Metadata on the roadway linework of facilities being monitored; • Metadata on the detection infrastructure, including the types of data collected and the locations of equipment (configuration); and

• Real-time traffic data in a constant format at a constant frequency (such as every 30 seconds or every minute).

Traffic data are generally unusable for travel time calculation purposes if not accompanied by a detailed description of the configuration of the system. Configuration information provides the contextual and spatial information on the sensor network needed to make sense of the real-time data. These two types of information should be transmitted separately (i.e., not in the same file or data feed). Roadway and equipment configuration information is more static than traffic data, as it only needs to be updated with changes to the roadway or the detection infrastructure. Keeping the reporting structure for these two types of information separate reduces the size of the traffic data files, allowing for faster data processing, better readability, and lower bandwidth cost for external parties who may be accessing the data through a feed.

To represent the monitored roadway network and draw it on maps, PeMS requires geographic information system–type roadway polylines defined by latitudes and longitudes. To help the agency link PeMS data and performance metrics with their own linear referencing system, PeMS also associates these polylines with state roadway mileposts. In most state agencies, mileposts are a reference system used to track highway mileage and denote the locations of landmarks. Typically, these mileposts reset at county boundaries. In locations where freeway alignments have changed over time, it is likely that the difference between two milepost markers no longer represents the true physical distance down the roadway. For this reason, PeMS adds a third representation of the roadway network called an absolute postmile. These are similar to mileposts, but they represent the true linear distance down a roadway, as computed from the polylines.
To facilitate the computation of performance metrics across long sections of freeway, absolute postmiles do not reset at county boundaries. In PeMS, this information is ultimately stored in a freeway configuration database table that contains a record for every 0.10 mile on every freeway. Each record contains the freeway number, direction of travel, latitude and longitude, state milepost, and absolute postmile.

PeMS also requires metadata concerning the detection equipment from which the source system is collecting data. This requirement is due to the need to standardize data collection and processing across all agencies, regardless of their source system structures. Configuration information ultimately populates detector, station, and controller configuration database tables in PeMS and is used to correctly aggregate data and run equipment diagnostic algorithms.

Finally, the data acquisition step often involves reconciliation between the framework of the source system and the monitoring system. For example, different terminology can lead to incorrect interpretations of the data. This step often requires significant communication between the system contractor and the agency staff who have familiarity with the data collection system to resolve open questions and make sure that accurate assumptions are being made.
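The per-0.10-mile freeway configuration record described above can be sketched as a simple data structure. This is an illustrative sketch only; the field names are our assumptions, not the actual PeMS schema, and the sample coordinates and postmile values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FreewayConfigRecord:
    # Illustrative fields mirroring the record described in the text;
    # not the actual PeMS table definition.
    freeway: int           # freeway number
    direction: str         # direction of travel, e.g. "E"
    latitude: float
    longitude: float
    state_milepost: float  # resets at county boundaries
    abs_postmile: float    # true linear distance; does not reset

def segment_length(a: FreewayConfigRecord, b: FreewayConfigRecord) -> float:
    """True distance (miles) between two points on the same freeway.

    Absolute postmiles make this a simple subtraction, even across a
    county boundary where the state milepost would reset to zero.
    """
    return abs(b.abs_postmile - a.abs_postmile)

# Hypothetical records spaced along one freeway; the second sits just
# past a county line, so its state milepost has reset.
p1 = FreewayConfigRecord(80, "E", 38.57, -121.48, 12.3, 101.2)
p2 = FreewayConfigRecord(80, "E", 38.58, -121.43, 0.1, 104.5)
print(round(segment_length(p1, p2), 2))  # 3.3 miles despite the milepost reset
```

This illustrates why the absolute postmile is needed: subtracting state mileposts across the county boundary above would give a nonsensical negative distance.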

Integration of District 3 Case Study Data Sources into PeMS

The two sources of data used in support of this case study, which are based on the movement of vehicles equipped with ETC and Bluetooth devices, are extremely new and are not currently integrated into Caltrans District 3’s existing PeMS data feed. Consequently, it was necessary to incorporate these data sets into project-specific instances of PeMS for analysis as part of this project. This section provides an overview of the resources needed to conduct the prerequisite data collection through monitoring system integration–related activities and discusses some of the challenges likely to be encountered when developing such a monitoring system. Such activities included the following:

• With the Tahoe area ETC data, the goal was to use preexisting PeMS ETC processing and equipment configuration software, as well as the road network definitions in use by PeMS for Caltrans. This effort proved to be fairly straightforward, and no special accommodations were required other than dealing with detectors that would occasionally go offline during real-time collection. Following public agency policy, all individual toll tag identifiers from the ETC readers were deleted every 24 hours;

• For each ETC reader station, the research team was provided with information regarding the county in which it was deployed, the freeway on which it was located, the direction of travel for which it was collecting data, milepost, textual location, and the Internet protocol (IP) address used to communicate with it to obtain data. To integrate each reader into PeMS so that data could be collected in real time, the research team assigned each reader a unique ID and determined its latitude and longitude.
Software was developed to communicate with each reader’s IP address, obtain its data, and incorporate that data into the PeMS database; and

• With the Lake Tahoe area Bluetooth data, the goal was to configure PeMS so that the BTRs and the data they produced could be used as if the data were from standard ETC reader stations. For each BTR, the research team received configuration data in a text file, with fields for the node (reader) ID, a textual location, and a latitude–longitude reading. Configuration data were provided for 26 BTRs. Caltrans also provided the research team with a 2-gigabyte SQL file containing all of the Bluetooth data collected by the BTRs between December 25, 2010, and April 21, 2011. The research team subsequently integrated these data into PeMS and processed them to compute travel times between each pair of BTRs.

Analyzing Electronic Toll Collection and Bluetooth Data

PeMS collects sensor data either by directly polling each detector, by obtaining data from an existing data collection system, or by integrating data from another archival resource; it then stores the data in an Oracle database. The reliability measures available from these data depend on the type of detector from which they have been collected; for example, loop detectors provide different raw data for analysis than ETC- or Bluetooth-based data collection systems. Reliability metrics available in PeMS based on data from the ETC and Bluetooth systems are as follows:

• Minimum: the fastest vehicles that traveled across a roadway segment during a given period of time;

• 25th percentile: the 25th percentile travel time during a given period of time;

• Mean: the mean travel time during a given period of time;

• Median: the median travel time during a given period of time;

• 75th percentile: the 75th percentile travel time during a given period of time; and

• Maximum: the slowest-moving vehicles that traveled across a roadway segment during a given period of time. Much (if not all) of the maximum travel time data is likely composed of outliers: vehicles that made at least one stop between two consecutive readers before completing their trip.

Each of the reliability measures described above is available for analysis based on 5-minute and hourly time periods.

The research team used preexisting PeMS ETC processing and equipment configuration software to support the development and deployment of ETC and BTR instances of PeMS. Existing PeMS analysis tools create reports of travel time versus starting time. For a given starting (or source) tag reader, the travel time to a destination tag reader is defined as the amount of time it takes for a specific tag to be seen at the destination tag reader. Following public agency policy, PeMS does not store travel times for individual ETC tag reads, only recording summary statistics for all of the tags that traversed the distance between each consecutive pair of readers during a given period of time. However, because similar regulations do not exist regarding the use of data collected from Bluetooth devices, the research team had access to a much wider variety of raw and summary data concerning the movement of Bluetooth-enabled vehicles for use as part of this case study.
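The tag matching and the six summary measures described above can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not PeMS code: each tag read at the upstream reader is paired with that tag's next read at the downstream reader, and the resulting travel times are summarized per reporting period. Function names and the sample data are ours.

```python
import statistics

def match_travel_times(origin_reads, dest_reads):
    """Pair each (tag_id, time) read at the origin reader with that tag's
    next read at the destination reader; the difference is the travel time.

    No sanity checking is done, so a stop between readers surfaces as a
    long outlier travel time, as the text notes for the maximum measure.
    """
    dest_by_tag = {}
    for tag, t in dest_reads:
        dest_by_tag.setdefault(tag, []).append(t)
    times = []
    for tag, t0 in origin_reads:
        later = [t for t in dest_by_tag.get(tag, []) if t > t0]
        if later:
            times.append(min(later) - t0)
    return times

def travel_time_summary(times):
    """The six per-period summary measures listed above (sketch only)."""
    times = sorted(times)
    q = statistics.quantiles(times, n=4, method="inclusive")  # quartiles
    return {
        "min": times[0],                   # fastest vehicles
        "p25": q[0],
        "mean": statistics.fmean(times),
        "median": statistics.median(times),
        "p75": q[2],
        "max": times[-1],                  # slowest vehicles, often outliers
    }

# Hypothetical tag reads (seconds); tag "E" stopped en route (900 s trip).
origin = [("A", 0.0), ("B", 10.0), ("C", 20.0), ("D", 30.0), ("E", 40.0)]
dest = [("A", 295.0), ("B", 310.0), ("C", 325.0), ("D", 340.0), ("E", 940.0)]
s = travel_time_summary(match_travel_times(origin, dest))
print(s["median"], s["max"])  # 305.0 900.0
```

Note how a single stopped vehicle dominates the maximum while barely moving the median, which is why the percentile measures are the more robust reliability indicators.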
The algorithm currently used by PeMS to calculate travel times based on ETC and Bluetooth data is fairly simple: it identifies travel times for vehicles that pass between consecutive readers, regardless of whether the resultant travel time makes logical sense. For example, there is no way of knowing whether a given vehicle left the freeway between reader stations; the team only knows when vehicles were seen at each station. As a result, the travel times produced by PeMS based on these data can be significantly influenced by outliers and can at times be quite noisy.

Two key differences between the ETC and Bluetooth technologies, directionality and background noise, needed to be accounted for as part of the research team’s efforts to use the available BTR data. In the first major difference, ETC detectors are aimed in such a way as to sense traffic flowing in a particular direction. In most cases, well over 95% of data collected by an ETC device is from traffic flowing in the direction that the detector is intended to measure. In contrast, BTRs do not have this directional bias. Both ETC readers and BTRs are capable of recording the presence of a single vehicle multiple times as it passes through the reader’s detection zone. In the case of ETC readers, a vehicle is seldom detected more than twice because of the limited range and directionality (aimed down at a spot on the road, not parallel to the ground) of

the ETC antenna. However, BTRs can record any device generating a Bluetooth signal within their sensing radius, sometimes from 100 meters away. This detection ability can result in a single Bluetooth device being detected many times as it passes through the reader’s detection zone, especially if it is traveling slowly or is stopped.

PeMS expects data to come from devices that have a directional bias. To accommodate this issue, the research team configured PeMS to view each BTR as generating data for two directions of travel and fed the data into PeMS twice, assigning it first to one detector in one direction of travel and then assigning a copy of that data to the other direction of travel.

Background noise, the second major difference between Bluetooth and ETC technologies, was problematic because several BTRs deployed as part of this project are located within a few dozen meters of office buildings, homes, or parking lots. Consequently, there are many stationary (or nearly so) Bluetooth devices residing within these locations that produce a reading every few seconds for hours on end. These data have the potential to overwhelm legitimate vehicular data, sometimes by a factor of 10 or greater. The research team’s initial solution for dealing with this issue in order to generate roadway travel times for analysis was to eliminate all subsequent reports of each unique Bluetooth MAC address collected within 1 hour of its initial report.

Additional information concerning activities undertaken by the project team to optimize the usefulness of these data sets is contained in the section below on methodological experiments and the first two use cases.
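The team's initial background-noise filter, as described above, can be sketched as follows. This is our interpretation, not the team's actual code: we assume the 1-hour suppression window restarts from each report that is kept, and the function and parameter names are ours.

```python
def suppress_stationary_noise(observations, window_s=3600):
    """Drop repeated reports of the same MAC address.

    For each MAC, keep the first report, then drop every subsequent
    report within window_s (1 hour) of the most recently kept report.
    Assumption: the window restarts at each kept report; the source
    text does not specify this detail.
    """
    last_kept = {}
    kept = []
    for mac, t in sorted(observations, key=lambda o: o[1]):
        if mac not in last_kept or t - last_kept[mac] >= window_s:
            kept.append((mac, t))
            last_kept[mac] = t
    return kept

# A stationary device ("aa:bb") reporting repeatedly near a BTR,
# plus one passing vehicle ("cc:dd"); times in seconds.
obs = [("aa:bb", 0), ("aa:bb", 5), ("aa:bb", 3599), ("aa:bb", 3601), ("cc:dd", 10)]
print(suppress_stationary_noise(obs))
# [('aa:bb', 0), ('cc:dd', 10), ('aa:bb', 3601)]
```

The filter collapses the stationary device's stream of readings to roughly one per hour while leaving the passing vehicle's single report untouched.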
METHODOLOGICAL EXPERIMENTS

Overview

Due to the significant amount of Bluetooth-based travel time data available for analysis as part of this case study, the research team elected to focus its methodological efforts on this data set rather than on data generated by the ETC-based system. This decision stemmed from an awareness that Bluetooth-based systems, although new, have been rapidly embraced by a wide range of transportation agencies interested in identifying low-cost, easily deployed solutions for collecting roadway travel times. A great deal remains unknown regarding the underlying nature of these data, including how filtering techniques might be applied at different stages of the analytical process to further refine generated travel times. This section focuses on the evaluation of methods for identifying individual vehicle trips between BTRs and includes a statistical analysis of procedurally generated vehicle travel times. Filtering techniques at both the procedural and statistical levels are also explored as methods for improving the quality of travel time estimates. The primary output of this section is a methodology for obtaining filtered travel time histograms that depict the distribution of travel times within a sample of Bluetooth data. It is possible to generate parameterized probability density functions (PDFs) from these histograms, as was done in the San Diego case study; however, this step is omitted in favor of analysis of the underlying data issues.

Bluetooth Device Data

Impact of Bluetooth Reader Hardware on Available Data

The characteristics of Bluetooth-based data available for analysis are determined largely by the capabilities of the BTR deployed at the roadside. For example, only 5 of the 10 BTRs deployed by Caltrans had the ability to read and store signal strength measurements for each observation. Signal strength measurements are important because they provide the ability to determine the relative distance of each Bluetooth-enabled mobile device from the reader. Whether a specific BTR can read and report signal strength values for each mobile device depends on the nature of the BTR’s host controller interface, which is an interface between the Bluetooth protocol stack and the device’s controller hardware. BTRs that reported signal strength values were based on Linux boards using the BlueZ protocol stack; units not reporting signal strength values used a microcontroller-based implementation.

In the case of the Bluetooth Class 1 devices deployed by Caltrans, which have a range (radius) of detection of approximately 100 meters (see Figure C.105), knowing the signal strength of each mobile device observation can be important to accurately calculate the travel times of those devices to the next BTR. To clarify, if a vehicle is traveling at 40 mph, it will pass through the device’s full 200-meter detection zone in approximately 10 seconds. However, if heavy congestion is present and the BTR zone traversal speed is only 5 mph, it will take approximately 82 seconds. When BTRs are fairly close together, the accurate calculation of travel time can be significantly affected by whether the travel time analysis system is able to determine the time at which each Bluetooth device is closest to each BTR; the impact is even greater during periods of congestion, when vehicles are moving slowly and generating many more observations.
This issue is underscored by the fact that within the Caltrans data set, Bluetooth-enabled mobile devices each generated approximately one observation (on average) per second (see Table C.42), resulting in a mean number of observations per mobile device per visit to each BTR of between 1.06 and 21.30.

TABLE C.42. BLUETOOTH READER DETECTION ZONE TRAVERSAL TIMES AND OBSERVATIONS

Bluetooth Reader ID | Mean Zone Traversal Time (s) | Standard Deviation of Zone Traversal Time (s) | Mean Observations per Visit
BTR 1  |  1.43 |  12.02 |  1.06
BTR 2  |  0.88 |   9.88 |  1.10
BTR 3  |  1.27 |  25.45 |  1.16
BTR 4  |  4.20 |  48.56 |  1.44
BTR 5  |  0.58 |  10.91 |  1.09
BTR 6  |  7.93 |  48.01 |  1.38
BTR 7  |  8.77 |  37.49 | 11.49
BTR 8  |  8.73 |  65.38 | 11.44
BTR 9  |  5.77 |  36.78 |  8.19
BTR 10 | 23.10 | 116.65 | 21.30

Figure C.105 depicts the detection zones generated by ETC- and Bluetooth-based data collection technologies. Table C.43 provides examples of mean, maximum, and standard deviation of mobile device signal strengths collected by the various BTRs involved in this study. BTRs with signal strength characteristics noted as N/A (not available) did not have the capability to collect signal strength data. Signal strength readings are a function of the distance between a BTR and each mobile device. A BTR’s mean signal strength is therefore a function of the location of the BTR relative to the roadway. In addition, BTR antenna gain varies as a function of manufacturer and type, which affects mean signal strength (3).

Figure C.105. Bluetooth and ETC reader detection zones.

TABLE C.43. BTR SIGNAL STRENGTH CHARACTERISTICS

Bluetooth Reader ID | No. of Observations | Mean Signal Strength (%)a | Maximum Signal Strength (%) | Signal Strength Standard Deviation (%)
BTR 1  |       319 |   N/A | N/A |   N/A
BTR 2  |   430,679 |   N/A | N/A |   N/A
BTR 3  |   442,739 |   N/A | N/A |   N/A
BTR 4  | 1,055,037 | 27.24 |  55 | 16.04
BTR 5  |   870,362 |   N/A | N/A |   N/A
BTR 6  |       401 |   N/A | N/A |   N/A
BTR 7  | 1,507,667 | 77.66 |  96 |  3.53
BTR 8  |   893,232 | 77.68 |  94 |  3.38
BTR 9  |   403,628 | 77.32 |  93 |  3.01
BTR 10 | 2,178,002 | 77.18 |  95 |  3.38
a Percent of the maximum signal strength the Bluetooth reader expects to see.
Note: N/A = not available.

Figure C.106 compares observed signal strengths over time for three vehicles traveling through BTR detection zones; each plot is centered (from a temporal perspective) on the time at which the peak signal strength is detected for each vehicle. The first vehicle (top) arrives in the detection zone, travels past the reader, and stops for approximately 11 minutes within the detection zone. The second vehicle (middle) passes through the detection zone in approximately 17 seconds, traveling at 24 mph. The third vehicle (bottom) enters the BTR detection zone, pauses for approximately 18 seconds, passes the BTR, and then departs the detection zone.

Figure C.106. Comparison of observed signal strengths versus time for three vehicles.

Mobile Device Data Characteristics

Bluetooth device data collected as part of this case study exhibited various characteristics that should be understood before attempting the calculation of roadway travel times; these are discussed below.

Devices Visiting Only One BTR

One way to classify mobile devices is by the total number of unique BTRs they visit. For the purposes of calculating segment (BTR-to-BTR) travel times, observations generated by devices that visit only a single BTR can be ignored. Based on the team’s analysis, approximately 29% of all mobile devices represented in the Caltrans data set visited only a single BTR during a given trip; these devices contributed 12.5% of all mobile device observations (Table C.44).

TABLE C.44. OBSERVATIONS GENERATED BY DEVICES BY NUMBER OF BLUETOOTH READERS VISITED

                    | Visited 1 BTR   | Visited > 1 BTR  | Total
No. of Devices      | 146,075 (29%)   | 356,408 (71%)    | 502,483 (100%)
No. of Observations | 2,315,389 (13%) | 16,176,143 (88%) | 18,491,532 (100%)

Variable Bluetooth Reader Detection Zone Traversal Times

Mobile devices take varying amounts of time and generate unpredictable numbers of observations each time they pass through a given BTR’s detection zone. Generally, the number of observations generated by a device is proportional to the amount of time the vehicle is present within the detection zone, and that amount of time is inversely proportional to the vehicle’s speed.
Based on analysis conducted as part of this case study, the research team believes that the mean zone traversal time (see Table C.42) is affected by a combination of the physical location of the reader relative to the roadway and other roadway characteristics. For example, BTR 2 (see Table C.41 for the location of each BTR) is located at the end of an entrance ramp and is isolated from nearby arterials and buildings. It has a mean detection zone traversal time of 0.88 seconds, with approximately 1.10 observations per visit. This can be seen in the zone traversal time frequency distribution (top distribution in Figure C.107), which shows no delay time for vehicles passing through the detection zone. This reader contrasts with BTR 10, which has a mean detection zone traversal time of 23.1 seconds and 21.3 observations per visit (bottom distribution in Figure C.107). BTR 10 is located on one leg of a T-intersection with a single stop sign. Consequently, cars queuing at the stop sign may be contributing significantly to the long zone traversal times.

Figure C.107. Node traversal time frequency distribution for BTRs 2 (top) and 10 (bottom).

Multiple Mobile Device Observations per Bluetooth Reader

Individual mobile devices can enter and exit a single BTR’s detection zone multiple times during a sufficiently long period of time. Depending on the size of the window of time, these individual observations can potentially be matched with a significant number of observations from other BTRs. Table C.45 displays the results of one vehicle visiting BTR 10 four times during 1 day. The final two visits are separated by just 10 minutes (the third and fourth visits are shown in Figure C.108). This demonstrates that a travel time algorithm that processes device observation data must have the ability to aggregate and differentiate between clouds of such observations separated in time as a step in the process of calculating travel times between BTRs.

TABLE C.45. MULTIPLE DEVICE OBSERVATIONS FOR ONE DEVICE AT BTR 10

Visit No. | Time of Day | No. of Observations | Time Delta (s)
1 | 09:50 |  1 |  0.00
2 | 15:00 |  5 |  2.54
3 | 16:33 |  2 | 18.05
4 | 16:43 | 39 | 42.98

Calculating Travel Times Based on Bluetooth Device Data

The primary goal of BTR-based data analysis is to characterize segment travel times between BTRs based on the reidentification of observations derived from unique mobile devices. Generally, the data processing procedures associated with the calculation of BTR-to-BTR travel times can be broadly broken down into three processes, as shown below. The first two processes are procedural; the third process is statistical:

1. Identification of passage times
   a. Aggregate device observations into visits.
   b. Select BTR passage time.
2. Generation of passage time pairs
   a. Method 1: Determine maximum origin and destination permutations.
   b. Method 2: Use all origin visits.
   c. Method 3: Aggregate visits.
3. Generation of segment travel time histograms
   a. Filter outliers across days.
   b. Filter outliers across time intervals.
   c. Remove intervals with few observations.
   d. Remove highly variable intervals.

The steps involved in these three processes are discussed below.

Figure C.108. Details of Visits 3 and 4 for a single mobile device at BTR 10.

Process 1: Identification of Passage Times

The first step in the process of calculating segment travel time PDFs (TT-PDFs) for a roadway is the calculation of segment travel times for individual vehicles. A vehicle segment travel time is calculated as the difference between the vehicle’s passage times at the origin and destination BTRs. Passage time is defined as the single point in time selected to represent when a vehicle passed through a BTR’s detection zone. Because mobile devices typically generate multiple observations as they pass through a BTR’s detection zone, selection of appropriate passage times is an important step in maximizing the accuracy of calculated segment travel times for individual vehicles.

Aggregating Device Observations into Visits

The goal in this step is to identify clusters of observations that represent a vehicle’s continuous presence in the detection zone. Each group of observations is referred to as a visit. For example, Figure C.109 depicts clusters of observations associated with two distinct visits by a single vehicle to the same BTR over the course of several minutes; the two visits (each with multiple observations) are separated in time by a stop outside of the detection zone. Identifying unique visits is an important step in increasing the accuracy of segment travel time calculations. Associating multiple observations clustered in time as part of a single visit rationalizes the selection of a single passage time for calculating the vehicle’s travel time to a destination BTR. The alternative, which makes little sense, would be to calculate a travel time from each origin observation to the destination BTR.
Identifying visits also enables the assessment of arrival (first mobile device observation) and departure (last mobile device observation) times for each mobile device, which can, depending on the circumstances, be used as the passage time.

Figure C.109. Visits as clusters of observations in time.

The method used to aggregate visits is a causal sliding time window filter. This method is a filter in that it removes unnecessary observations during the aggregation process. It is causal in that it uses only past and present observations to support its decision making. This filter discards all subsequent observations that are within a fixed time span from the time of a prior observation. However, when the time between an observation and the prior observation with which it is being compared exceeds this time span, the observation is considered to be part of a new visit. This has the effect of aggregating observations into visits by arrival time (or departure time, depending on how the filter is implemented) and discarding all other observations. Figure C.110 displays 11 observations that have been aggregated into two visits due to a sufficiently large time gap between the fifth (part of Visit 1) and sixth (part of Visit 2) observations. This filter is an efficient method of processing real-time observations and compressing large quantities of observation data for efficient storage.

The size of the filter interval time depicted in Figure C.110 determines the granularity of identified visits. The effect of different-sized interval times is shown in Figure C.111. In general, selecting the largest reasonable interval time is desirable because it results in more accurate estimates for arrival and departure times (and hence passage times, depending on the method used). However, overaggregating visits is potentially problematic. The research team identified the following two error types to consider when selecting an interval time:

• Observation overaggregation. When observations belonging to multiple visits are incorrectly aggregated as a single visit, the arrival passage time and departure passage time may be calculated as too early or too late, depending on the method used.
This may also result in the classification of stopped nondelay time as stopped delay time because the vehicle is incorrectly identified as being continuously in the detection zone. For example, if a filter time interval of 20 minutes is used and the vehicle leaves the detection area and returns 10 minutes later, then this 10-minute absence would be classified as time spent within the zone. If the distance to adjacent BTRs is small, overaggregation risks subsuming valid origin visits, resulting in the deletion of valid trips. For these reasons, underaggregation is preferred to overaggregation.

Figure C.110. Aggregation of observations into visits using time intervals.

• Observation underaggregation. Incorrectly subdividing observations from a single visit into multiple visits may result in the incorrect calculation of passage time, depending on the method used. Underaggregation is less problematic because multiple sequential visits that are not interwoven with visits to other BTRs can be aggregated and considered to be a single visit (see below).

The Caltrans Bluetooth data were processed by storing the arrival, departure, and maximum signal strength (when available) for each identified visit. Observations were aggregated using a 120-second time window. The 120-second window size was selected due to the small distance between BTRs 9 and 10 (about 3.8 miles) and the preference for underaggregating visits. A 60-second time interval was found to underaggregate observations in too many cases. Other researchers appear to be using a 5-minute interval (4), which may be appropriate for large BTR-to-BTR distances. When deploying permanent travel time data collection systems based on BTR (or related) technologies, the filter interval should likely be adjusted for each BTR as a function of its location and the characteristics of the surrounding region. For example, if a snow chain–fitting area is nearby, longer interval times may be optimal.

Figure C.111. Influence of window sizes on observation aggregation.
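The causal sliding time window filter described above can be sketched as follows. This is an illustrative reconstruction, not the research team's actual code; it retains only the arrival and departure time per visit, as in the Caltrans processing, using the 120-second window discussed in the text.

```python
def aggregate_visits(observation_times, interval_s=120):
    """Causal sliding time window filter.

    Groups one device's observation timestamps (at one reader) into
    visits: an observation more than interval_s after the previous
    observation starts a new visit. Only past and present observations
    are consulted, so the filter can run on a real-time stream.
    Returns (arrival, departure) per visit; intermediate observations
    are discarded, compressing the data for storage.
    """
    visits = []
    for t in sorted(observation_times):
        if visits and t - visits[-1][1] <= interval_s:
            visits[-1][1] = t      # within the window: extend current visit
        else:
            visits.append([t, t])  # gap exceeded: start a new visit
    return [tuple(v) for v in visits]

# 11 observations (seconds) with one large gap, mirroring the two-visit
# example of Figure C.110; the gap between 8 s and 300 s splits the visits.
times = [0, 2, 4, 6, 8, 300, 302, 304, 306, 308, 310]
print(aggregate_visits(times))  # [(0, 8), (300, 310)]
```

Shrinking `interval_s` below the largest intra-visit gap would split a true visit in two (underaggregation); enlarging it past the inter-visit gap would merge the two visits (overaggregation), illustrating the trade-off discussed above.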

Vehicles that are continuously within a BTR’s detection zone (and generating observations) are either in travel mode (e.g., driving, in congestion, at a stop light) or trip mode (e.g., stopped at a fuel station, parked at the side of the road). Without more information, distinguishing between trip and travel behavior within a single visit is difficult. In contrast, distinguishing between trip and travel behaviors across multiple visits is possible. Repeat visits to the same BTR (without visiting any other BTR) can be assumed to be unrelated to travel time and therefore eliminated. For example, if the vehicle in Table C.45 did not visit other BTRs between 9:50 and 16:43, then these visits can be eliminated from travel time calculations.

Selecting Bluetooth Reader Passage Time

The precise methodology used to determine a vehicle’s passage time depends on the availability of signal strength data, the distance to adjacent BTRs, and traffic flow patterns in the area surrounding a BTR. When signal strength data are available, passage time can be taken as the time of the mobile device observation with the greatest signal strength. When signal strength data are not available and the distance to adjacent BTRs is large, the arrival, mean, or departure time may be used as the passage time without introducing a significant bias. However, if traffic through the detection zone is subject to stop delay time (e.g., traffic signals, stop signs, congestion), then the use of arrival or departure times may either introduce or eliminate significant bias. This is illustrated below with BTRs 9 and 10.

BTR 10 provides an example of how the use of arrival versus departure times as a proxy for passage time (in cases for which no signal strength data are available) can influence the calculation of segment travel times.
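The passage time selection logic just described can be sketched as follows. The function and method names are ours, not PeMS's: use the peak-signal-strength observation when signal strength is available, otherwise fall back to the arrival, mean, or departure time of the visit.

```python
def select_passage_time(visit_obs, method="auto"):
    """Pick a single passage time for one visit.

    visit_obs: list of (timestamp, signal_strength_or_None) for one
    device's visit to one BTR. With signal strength, the strongest
    observation marks the device's closest approach to the reader;
    without it, arrival, mean, or departure time is the proxy.
    A sketch under our own naming assumptions.
    """
    times = sorted(t for t, _ in visit_obs)
    with_rssi = [(t, s) for t, s in visit_obs if s is not None]
    if method == "auto" and with_rssi:
        return max(with_rssi, key=lambda ts: ts[1])[0]  # peak signal strength
    if method == "departure":
        return times[-1]  # strips queuing delay, e.g. at a stop sign
    if method == "mean":
        return sum(times) / len(times)
    return times[0]       # arrival time (fallback)

# Three readings during one visit; the strongest (78%) is mid-visit.
obs = [(100.0, 40), (104.0, 78), (108.0, 35)]
print(select_passage_time(obs))  # 104.0
```

With signal strength absent (all readings `None`), the choice among arrival, mean, and departure becomes consequential, which is exactly the BTR 10 situation analyzed next in the text.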
BTR 10 is located on one leg of a T-intersection with a single stop sign, as shown in Figure C.112. The closest BTR is 3.8 miles away. The mean detection zone traversal time for BTR 10 is 23.1 seconds, which is partially the result of vehicles queuing at the nearby stop sign. Vehicles queued at the stop sign either turn right, away from BTR 10, or turn left and pass it. The free-flow speed of traffic passing the BTR is approximately 45 mph. At this speed, traffic passes through the detection zone in 9 seconds. For vehicles not queued at the stop sign, arrival times are (on average) 4.5 seconds earlier and departure times 4.5 seconds later than the moment the mobile device passes the BTR. If BTR 10 is used as the point of origin for generating a segment travel time (e.g., with BTR 9 as the destination), then left-turning vehicles proceeding from the stop sign will pass the reader and generate an arrival time (and consequently a passage time) that is approximately 23.1 − 4.5 = 18.6 seconds early, or nearly 7% of the travel time to the next BTR (3.8 miles away with a free-flow speed of 45 mph). This error may be further compounded by heavy traffic causing longer queues at the stop sign within the BTR 10 detection zone. In contrast, basing vehicle passage time on departure time, which removes the delay associated with the presence of the stop sign, would introduce only about 4.5 seconds of error, representing a substantial improvement over the use of arrival time. This example demonstrates why significant attention needs to be paid to the process used to calculate passage time when signal strength data are not available.
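The BTR 10 arithmetic above can be checked with a short worked computation using only the values given in the text (23.1 s mean traversal, 9 s free-flow crossing, 3.8 miles to BTR 9 at 45 mph).

```python
# Worked check of the BTR 10 arrival-time bias described in the text.
mean_traversal_s = 23.1   # mean detection zone traversal time at BTR 10
half_zone_s = 9 / 2       # 9 s free-flow crossing -> 4.5 s from zone edge to reader

arrival_bias_s = mean_traversal_s - half_zone_s  # how early the arrival time is
segment_s = 3.8 / 45 * 3600                      # 3.8 mi at 45 mph = 304 s

print(round(arrival_bias_s, 1), round(arrival_bias_s / segment_s * 100, 1))
# 18.6 6.1
```

The bias is 18.6 seconds, about 6.1% of the 304-second segment travel time, consistent with the text's "nearly 7%"; the departure-time proxy would reduce this to the 4.5-second half-zone error.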

Figure C.112. BTR 10 geometry in relation to adjacent BTR 9.

In addition to considering the impact of BTR passage times, users of Bluetooth data must consider that the accurate calculation of segment travel time is a function of the relationship between BTR-to-BTR distance and the maximum speed error. Following the analysis performed by Haghani et al. (4), this relationship is depicted in Figures C.113 and C.114 as the maximum error in segment speed versus BTR-to-BTR distance (marked "Distance between nodes" in both figures) for four speeds. Equation C.4 shows BTR-to-BTR distance L:

L = S * T (C.4)

where S is vehicle speed and T is the travel time between adjacent BTRs. Equation C.5 introduces error terms for L, S, and T. In Equation C.6, the maximum error in distance, ΔLmax, is assumed to be 600 feet (the diameter of each BTR's detection zone).

L + ΔL = (S + ΔS)(T + ΔT) (C.5)

ΔSmax ≤ (ΔLmax − ΔT * S)/(L/S + ΔT) (C.6)

As shown in Figure C.113, if the time error (ΔT) equals zero, then the speed error grows as vehicle speed increases. As a result, for BTRs spaced less than 2 miles apart collecting Bluetooth data from vehicles traveling at high speeds, the maximum speed error becomes quite significant. However, because of clock synchronization error, Bluetooth time stamp inaccuracies, or both, it is highly unlikely that ΔT will often (if ever) equal zero.

As shown in Figure C.114, if the time error (ΔT) is greater than zero, then both slower and faster vehicle speeds have the potential to maximize speed errors. Within the context of this graph, the influence of time errors has a tendency to negate the

Figure C.113. Relationship between maximum speed error and BTR-to-BTR distance with ΔT = 0.

Figure C.114. Relationship between maximum speed error and BTR-to-BTR distance with ΔT > 0.

effect of distance errors. A time error of 4 seconds was used based on the clock synchronization error associated with Caltrans' method for synchronizing BTRs when local time differed from network time by more than 2 seconds.

Process 2: Generation of Passage Time Pairs

It is common for vehicles to generate multiple sequential visits per BTR, which may be interwoven in time with visits at other BTRs (see Table C.46). For BTRs with a significant mean zone traversal time, it is common for vehicles to generate multiple visits close in time. The motivation for grouping visits is evident in Table C.46, which shows that the vehicle was at the origin BTR multiple times (see Rows 1 to 3) before traveling to the destination BTR (see Row 4). Based on these data, three travel time pairs could be calculated: 1 to 4, 2 to 4, or 3 to 4. Which pair or pairs represent the most likely trip? The benefit of performing more complex analysis of visits is that many likely false trips can be eliminated, increasing the quality of the calculated travel time. Three methods of identifying segment trips are discussed below.

TABLE C.46. VISITS FOR A SINGLE VEHICLE BETWEEN TWO BLUETOOTH READERS

Row | Origin BTR Visit | Destination BTR Visit | Time | No. of Observations per Visit
1 | 1 | | Friday, Jan. 28, 13:07:19 | 7
2 | 2 | | Friday, Jan. 28, 16:24:06 | 13
3 | 3 | | Friday, Jan. 28, 17:41:50 | 25
4 | | 1 | Friday, Jan. 28, 22:07:36 | 3
5 | 4 | | Saturday, Jan. 29, 15:07:40 | 4
6 | | 2 | Sunday, Jan. 30, 10:49:03 | 129
7 | 5 | | Sunday, Jan. 30, 12:15:33 | 3
8 | 6 | | Monday, Jan. 31, 13:05:54 | 10

Note: Year is 2011 for all observations.

Method 1

The first potential method for identifying segment trips is simple: create an origin and destination pair for every possible permutation of visits, except those generating negative travel times (Figure C.115). For example, the visits in Table C.46 show six origin and two destination visits, resulting in 12 possible pairs.
Five pairs can be discarded because they generate negative travel times. Even so, this approach will generate many passage time pairs that do not represent actual trips. Using this method, 243,777 travel times were generated between one pair of BTRs over a 3-month period.

Method 2

The second potential method for identifying segment trips is also simple but represents an improvement over the first method. It creates an origin and destination pair for every origin visit and the closest (in time) subsequent destination visit, as shown in Figure C.116.
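The first two pairing methods can be sketched as follows. Visits are encoded here as (row, timestamp) tuples using the data from Table C.46; the function names and data structures are our own illustration, not code from the study.

```python
# Sketch of passage time pairing Methods 1 and 2 (our own illustration).
from datetime import datetime as dt
from itertools import product

def method1_pairs(origins, dests):
    """Method 1: every origin-destination permutation with positive travel time."""
    return [(o[0], d[0]) for o, d in product(origins, dests) if d[1] > o[1]]

def method2_pairs(origins, dests):
    """Method 2: each origin visit paired with the closest later destination visit."""
    pairs = []
    for o in origins:
        later = [d for d in dests if d[1] > o[1]]
        if later:
            pairs.append((o[0], min(later, key=lambda d: d[1])[0]))
    return pairs

# Visits from Table C.46 as (row number, passage time).
origins = [(1, dt(2011, 1, 28, 13, 7, 19)), (2, dt(2011, 1, 28, 16, 24, 6)),
           (3, dt(2011, 1, 28, 17, 41, 50)), (5, dt(2011, 1, 29, 15, 7, 40)),
           (7, dt(2011, 1, 30, 12, 15, 33)), (8, dt(2011, 1, 31, 13, 5, 54))]
dests = [(4, dt(2011, 1, 28, 22, 7, 36)), (6, dt(2011, 1, 30, 10, 49, 3))]

print(len(method1_pairs(origins, dests)))  # 12 permutations minus 5 negative = 7
print(method2_pairs(origins, dests))       # [(1, 4), (2, 4), (3, 4), (5, 6)]
```

Run on the table's visits, Method 1 keeps the 7 non-negative permutations and Method 2 reduces these to the four pairs named in the text.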

Figure C.115. Trips generated from all visit permutations.

Figure C.116. Trips generated from all origin visits to first destination visit.

Multiple origin visits would therefore potentially be paired with a single destination visit. Using this method with the data in Table C.46 would generate four pairs: 1 to 4, 2 to 4, 3 to 4, and 5 to 6. This method generated 60,537 travel times between the origin and destination BTRs, eliminating 183,240 (75%) potential trips compared with the first method.

Method 3

Vehicles frequently make multiple visits to an origin BTR before traveling to a destination BTR. The third method of eliminating invalid segment trips aggregates those origin visits that would otherwise be interpreted incorrectly as multiple trips between the origin and destination readers. This method can be described as aggregating visits at the BTR network level, an additional level of aggregation beyond aggregating individual observations into visits. Logically, a single visit represents a vehicle's continuous presence within a BTR's detection zone. In contrast, multiple visits aggregated into a single grouping represent a vehicle's continuous presence within the geographic region around the BTR, as determined by the distance to adjacent BTRs. This method is an example of using knowledge of network topology to identify valid trips.

This method can be applied to the data displayed in Table C.46, which show three origin visits in Rows 1, 2, and 3. The question is whether any of these visits can be aggregated or whether each should be considered a valid origin departure. The distance from the origin (BTR 7) to the destination (BTR 10) is 50 miles (100 miles round trip). Driving at some maximum reasonable speed (for that road segment, anything over 80 mph is unreasonable), a vehicle would take 76 minutes for the round trip. Therefore, if the time between visits at the origin is less than 76 minutes, they can be aggregated and considered as a single visit. In Table C.46, Visits 2 and 3 (Rows 2 and 3) meet this criterion and can therefore be aggregated (Figure C.117). This aggregation eliminates one of three origin visits that could potentially be paired with the destination visit in Row 4. Again, the idea is to identify when the vehicle was continuously within the geographic region around the origin BTR and to eliminate departure visits whenever possible. When this method was applied to the data set, it generated 39,836 travel times, eliminating 20,701 (34%) potential trips compared with Method 2.

Additional filters could be used to identify and eliminate greater numbers of trips. For example, an algorithm could take advantage of graph topology and interspersed trips to other BTRs to aggregate larger numbers of visits. In addition, the algorithm could potentially track which destination visits had previously been paired with origin visits, eliminating unlikely trips. If PDFs are developed based on historical data, selection among multiple competing origin visits paired with a single destination visit could be probabilistic. These are potential topics for future research.
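The Method 3 aggregation rule can be sketched as a single pass over sorted visit times. The function and the illustrative timestamps below are our own; the 76-minute round-trip threshold is the one derived in the text.

```python
# Sketch of Method 3's origin-visit aggregation (our own illustration).

def aggregate_origin_visits(visit_times_s, round_trip_s):
    """Collapse consecutive origin visits separated by less than the minimum
    plausible round-trip time into a single visit, keeping the latest time in
    each group as the candidate departure.

    visit_times_s: visit times in seconds; round_trip_s: aggregation threshold.
    """
    groups = []
    for t in sorted(visit_times_s):
        if groups and t - groups[-1][-1] < round_trip_s:
            groups[-1].append(t)     # still within the origin's geographic region
        else:
            groups.append([t])       # a genuinely new visit to the region
    return [g[-1] for g in groups]   # one candidate departure per group

# Illustrative times (seconds) with the 76-minute (4,560 s) threshold:
print(aggregate_origin_visits([0, 3600, 20000], 4560))  # [3600, 20000]
```

The first two visits, an hour apart, collapse into one candidate departure; the third, well beyond a plausible round trip, survives as a separate visit.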
Figure C.117. Trips generated from aggregating origin visits.

Process 3: Generation of Segment Travel Time Histograms

Determining travel times based on Bluetooth data is done by first identifying vehicle passage times at each of the BTRs and then pairing those passage times from the same vehicle at origin and destination locations. These techniques were developed with the goal of maximizing the validity of the travel times. However, because of Bluetooth data's susceptibility to erroneous travel time measurements, even the most careful pairing methodology will result in trip times (which could include stops and detours) that need to be filtered to obtain accurate ground-truth travel times (the actual driving time).

This section of the methodology describes a four-step technique for filtering travel times, presents travel time histograms before and after filtering, and compares the effects of two passage time pairing techniques (Method 2 and Method 3, described above) on the resulting travel time histograms. The underlying parameterized TT-PDFs could be approximated from the filtered travel time histograms presented here. However, this step is omitted from this methodology section in order to focus closely on the low-level issues associated with obtaining travel time distributions from Bluetooth data.

To begin, the distribution of raw travel times obtained from two passage time pairing methods can be seen in Figure C.118. The data presented here as Method 2 were developed using the second passage time pairing method described above in the section on generating passage time pairs. Data labeled as belonging to Method 3 were built according to the third passage time pairing method in that same section. No Method 1 analysis is included here because of that method's lack of sophistication.

In Figure C.118, the unfiltered travel time distributions appear similar apart from the quantity of data present. Both distributions have extremely long tails: most trips last an hour or less, but some apparent trips last for months. It is clear from these figures that even the carefully constructed Method 3 data are unusable before filtering.

Several plans have been developed to filter Bluetooth data. Here, the team adopts a four-step filtering method proposed by Haghani et al. (4) in which points are discarded based on their statistical characteristics, such as coefficient of variation and distance from the mean. The four data-filtering steps are as follows:

1. Filter outliers across days.
Figure C.118. Unfiltered travel times between one pair of Bluetooth readers, February 6, 2011, to February 12, 2011 (frequency versus travel time in hours for Methods 2 and 3).

This step is intended to remove measurements that do not represent an actual trip but rather a data artifact (e.g., a vehicle being missed one day and detected the next). Here, the team groups the travel times by day and plots PDFs of the speeds observed in each day (rounded to the nearest integer). To filter the data, thresholds are defined based on the moving average of the

distribution of the speeds (with a recommended radius of 4 mph). The low and high thresholds are defined as the minima of the moving average on either side of the modal speed (i.e., the first speed on either side of the mode at which the moving average increases with distance from the mode). All speeds above or below these values are discarded (see Figure C.119).

2. Filter outliers across time intervals. For the remaining Steps 2 to 4, time intervals smaller than one full day are considered (the team used both 5- and 30-minute intervals). In this step, speed observations beyond the range mean ± 1.5σ within an interval are discarded. The mean and standard deviation are based on the measurements within the interval.

3. Remove intervals with few observations. Haghani et al. (4) determine the minimum number of observations in a time interval required to effectively estimate ground-truth speeds. This number is based on the minimum detectable traffic volume and the length of the interval. Based on these factors, intervals with fewer than three measurements per 5 minutes (or 18 measurements per 30 minutes) are discarded.

4. Remove highly variable intervals. In Step 4, the variability among speed observations is kept to a reasonable level by discarding all measurements from time intervals whose coefficient of variation is greater than one.

To carry out Step 1, the moving average (with a radius of 4 mph) is computed over the speed distribution for each day (speeds are found by simply dividing route length by travel time). The moving average and distribution of speeds from a single day can be seen in Figure C.119. To exclude unreasonably low speeds, the modal speed is defined as the speed corresponding to the peak of the moving average above 20 mph (53 mph in this case).

Figure C.119. Distribution of speeds.

On this day, as a result of filtering Step 1, the upper threshold

was set to the maximum observed speed (the minimum of the moving average above the modal speed), and the lower threshold was set to 25 mph (the minimum of the moving average below the modal speed). Thus, on this day, all data points representing speeds below 25 mph or above 62 mph were discarded as a result of Step 1.

Step 1 is carried out across days; Steps 2 to 4 are carried out across 5- and 30-minute intervals. The 5-minute interval matches the practice of Haghani et al. (4) and represents a standard baseline filter. Filtering results based on a 30-minute interval are also included to compare the effects of a wider filtering interval. A wider filtering interval may be more appropriate for sparse data sets like that available in the rural Lake Tahoe area, where many 5-minute intervals contain no measurements at all. The particular details of Steps 2 to 4 are more straightforward and are omitted from further discussion.

The results of the four-step filtering method on the data obtained using passage time pairing Method 2 are presented in Figure C.120 for the week beginning February 6, 2011. The white points are points identified to be discarded in that step, and the black points are points to be kept following that step. Note that the steps are performed sequentially, so points discarded after Step 1 are not considered in Step 2, and so on. Higher traffic volumes due to weekend traffic in the area can clearly be seen on Friday and Saturday.

After the data have been filtered, the travel time distributions over the week appear more meaningful, as can be seen by comparing Figure C.121 with Figure C.118. The filtered travel times have lost their unreasonably long values, and in both cases a nicely shaped distribution is visible.
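Steps 1 and 2 of the filter can be sketched as follows. This is our own reading of the Haghani et al. procedure, applied to an illustrative speed histogram rather than the study's data; the function names and the test histogram are assumptions.

```python
# Sketch of filtering Steps 1 and 2 (our own reading of the procedure).
import statistics

def moving_average(freqs, radius=4):
    """Centered moving average over an integer-mph speed histogram (index = mph)."""
    n = len(freqs)
    return [statistics.fmean(freqs[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]

def step1_thresholds(freqs, min_modal_mph=20, radius=4):
    """Step 1: low/high cutoffs at the first minima of the moving average on
    either side of the modal speed (mode restricted to speeds above 20 mph)."""
    ma = moving_average(freqs, radius)
    mode = max(range(min_modal_mph, len(ma)), key=lambda s: ma[s])
    lo = mode
    while lo > 0 and ma[lo - 1] <= ma[lo]:            # walk down until it rises again
        lo -= 1
    hi = mode
    while hi < len(ma) - 1 and ma[hi + 1] <= ma[hi]:  # walk up likewise
        hi += 1
    return lo, hi

def step2_filter(speeds):
    """Step 2: discard observations outside mean +/- 1.5 sigma in an interval."""
    m, sd = statistics.fmean(speeds), statistics.pstdev(speeds)
    return [s for s in speeds if m - 1.5 * sd <= s <= m + 1.5 * sd]

# Illustrative histogram: a main peak near 50 mph and a small low-speed bump.
freqs = [0] * 71
for s in range(46, 55):
    freqs[s] = 10 - abs(50 - s)
freqs[10] = 5
print(step1_thresholds(freqs))
```

On this synthetic histogram, the lower cutoff lands just above the low-speed bump and the upper cutoff at the top of the observed range, mirroring the 25/62 mph thresholds in the worked example.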
Note that the earlier comparison of the data sets remains true: both have similarly shaped distributions, but the data set prepared using passage time pairing Method 2 contains a greater quantity of data. This is because that data set was larger initially and because a larger percentage of its points survived filtering.

Table C.47 presents a summary of the filtering results on both data sets using 5- and 30-minute intervals. It can be seen that Step 1, which removes outliers by day, takes out a much smaller percentage of the data generated from Method 2. This is because Method 2, although prepared less carefully, produces travel times that are grouped more closely. For example, if a particular origin–destination pair contained three vehicle passage times at the origin and one at the destination, Method 3 would report the single travel time from the latest origin time stamp to the destination time stamp. Method 2, however, would report this as three separate travel times, all likely with similar magnitudes. Thus, the data in Method 3 are of higher quality but are not as closely grouped, and they are penalized for this in filtering Step 1.

Method 3, which has fewer points, is more vulnerable to overaggressive filtering in Step 3 (which removes sparse intervals). This can be seen in the larger bands of dark gray in the Method 3 columns of Figure C.122. The data sets prepared using Method 3 were much sparser initially. As a result, filtering routines that discard intervals with sparse detection may be overaggressive for sparse data sets such as those prepared by Method 3, even if the data are more meaningful.

Figure C.120. Four-step filtering on passage time pairing Method 2 with 5-minute intervals.

Figure C.121. Filtered travel times (5-minute intervals), showing frequency versus travel time in minutes for Methods 2 and 3.

TABLE C.47. COMPARISON OF PASSAGE TIME PAIRING METHODS

Step | Method 2 (5-min) | Method 3 (5-min) | Method 2 (30-min) | Method 3 (30-min)
Total | 5,886 points | 4,185 points | 5,886 points | 4,185 points
Removed at Step 1 | 3,118 points (53%) | 2,687 points (64%) | 3,118 points (53%) | 2,687 points (64%)
Removed at Step 2 | 117 points (2%) | 20 points (0%) | 273 points (5%) | 119 points (3%)
Removed at Step 3 | 915 points (16%) | 883 points (21%) | 1,272 points (22%) | 1,185 points (28%)
Removed at Step 4 | 0 points (0%) | 0 points (0%) | 0 points (0%) | 0 points (0%)
Total removed | 4,150 points (71%) | 3,590 points (86%) | 4,663 points (79%) | 3,991 points (95%)
Remaining | 1,736 points (29%) | 595 points (14%) | 1,223 points (21%) | 194 points (5%)
Mean after filtering | 58.3 min | 58.4 min | 57.9 min | 58.2 min
Standard deviation after filtering | 5.9 min | 5.7 min | 3.8 min | 4.2 min

Overall, data sets constructed with passage time pairing Method 2 had a higher percentage of points survive the filtering process with both 5- and 30-minute intervals (see Table C.47 and Figure C.122), although both data sets performed more poorly when 30-minute intervals were used. This could be because the longer time intervals do not allow for quickly changing conditions such as weekend traffic congestion or adverse weather events.
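The bookkeeping in Table C.47 can be cross-checked with a few lines of arithmetic; the helper below is our own, with counts copied from the Method 2, 5-minute column.

```python
# Cross-checking Table C.47: per-step removals sum to the total removed, and
# the remainder matches the "Remaining" row. Counts are from the table.

def survivors(total, removed_per_step):
    """Return (points remaining, fraction removed) for one filtering run."""
    removed = sum(removed_per_step)
    return total - removed, removed / total

left, frac = survivors(5886, [3118, 117, 915, 0])  # Method 2, 5-minute intervals
print(left, round(frac * 100))  # 1736 remaining, 71% removed
```

The same check reproduces the other three columns (595 of 4,185; 1,223 of 5,886; 194 of 4,185).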

SUMMARY

This section evaluates various methodological approaches and processes for estimating ground-truth segment travel times based on Bluetooth data. The characteristics of Bluetooth data at each node were found to vary significantly as a function of the surrounding roadway configuration. When internode distances are small, the availability of signal strength data was determined to be an important factor in increasing the accuracy of calculated travel times. Methods were also explored for identifying invalid segment trips, especially via the analysis of network topology, which facilitates the generation of fewer and higher-quality segment trips for use in statistical analysis.

The generation of travel time histograms used filters proposed by Haghani et al. (4). A comparison of two passage time pairing methods was made through histograms of filtered and unfiltered data. Potential pitfalls of using standard filtering procedures on Bluetooth data (such as discarding sparsely populated intervals) were also identified. The filtering methodology demonstrated is statistical in nature, in the sense that data points are discarded based on their statistical characteristics, such as coefficient of variation and distance from the mean. By comparison, passage time pairing strategies are based on the physical characteristics of the network. This exercise showed that to obtain valid travel times, knowledge of the characteristics of the network should be leveraged to the greatest extent possible, although there will still likely be a need for statistics-based filtering due to the nature of Bluetooth data.

Figure C.122. Proportions of data points discarded at each step (none discarded at Step 4).

USE CASE ANALYSIS

This case study explores the use of ETC- and Bluetooth-based vehicle reidentification technologies in support of travel time reliability monitoring in a rural setting. These data collection technologies work by sampling the population of vehicles along the roadway and subsequently matching unique toll tag IDs or Bluetooth MAC addresses between contiguous reader stations. Their effectiveness in accurately calculating roadway travel times depends on various factors, including the percentage of the total traffic stream sampled at individual readers and the reidentification rate between pairs of readers.

In general, the percentage of the vehicle population sampled by individual readers depends on the penetration rate of the technology within the vehicle population, the positioning and mounting of the reader, and the roadway configuration at the reader's location. The reidentification rate between pairs of readers can depend on all of these factors, as well as the distance between readers and the likelihood of a trip diverting between the origin and destination reader. Little can be done to increase the technology's penetration rate when deploying a reliability monitoring system, so locating, positioning, and configuring readers to maximize their collection of quality data are crucial to the success of the system.

Because this case study leveraged data generated by networks of existing data collection devices, the research team could not evaluate the process for installing and configuring detection infrastructure. However, the team was able to analyze the impacts of the configuration of existing ETC and BTR networks on the nature of the data ultimately collected for use by a TTRMS. The team developed two network configuration-related use cases.
The first use case details the findings of the research team's investigation into the configuration of the Lake Tahoe ETC network and discusses the time-of-day dependency of the toll tag penetration rate, the number of lanes that can be monitored using ETC infrastructure, and the reidentification of toll tags between readers separated by different distances. The second use case details the team's investigation into configuration-related issues associated with the BTR network, including the relationship between reader location and the number of lanes monitored, as well as the sample sizes measured between readers on different freeways over varying distances.

A third use case seeks to quantify the impact of adverse weather- and demand-related conditions on travel time reliability using data derived from the case study's Bluetooth- and ETC-based systems deployed in rural areas. To examine travel time reliability within the context of this use case, methods were developed to generate PDFs from large quantities of travel time data representing different operating conditions. To facilitate this analysis, travel time and flow data from ETC readers deployed on I-80 westbound and BTRs deployed on US-50 eastbound and US-50 westbound were obtained from PeMS and compared with weather data from local surface observation stations. PDFs were subsequently constructed to reflect reliability conditions along these routes during adverse weather conditions, as well as according to time of day and day of the week. Practical data quality issues specific to Bluetooth and ETC data were also explored.

Impact of Electronic Toll Collection Reader Deployment Configuration on Data Quality

This use case details the findings of the research team's investigation into the impact of the configuration of the Lake Tahoe ETC network on the quality of the travel time data collected. The ETC detection network consisted of eight FasTrak readers located on I-80 between the eastern outskirts of Sacramento and North Lake Tahoe. Caltrans provided the team with each reader's county, freeway, a single direction of travel, milepost, a textual location, and the IP address that could be used to communicate with it and obtain its data. To place data from this network into PeMS, the research team assigned each reader a unique ID and determined its latitude and longitude from the provided milepost. Code was then written to connect with each reader's IP address, and data were obtained once per minute for storage in the PeMS database.

Methodology

The configuration data obtained from Caltrans were sufficient to place each reader at a location alongside the roadway. Using that information, the team sought to answer the following questions to more fully understand the impacts of the network's configuration on the characteristics of the reported travel times:

1. Are the readers where they are reported to be?
2. Are any of the readers monitoring multiple directions of travel?
3. What percentage of total traffic is being detected?
4. What percentage of toll tags is matched between pairs of readers?

The first question, which addresses where the readers are located, appears straightforward, but agencies often struggle to track detection equipment in the field. This is especially problematic with vehicle reidentification technologies, which can be easily moved from location to location. Although one solution to this problem is to equip readers with global positioning system units, this is not common practice.
The issue is compounded when multiple departments within a single agency, or multiple agencies, use the data from these readers for their own purposes and are not informed in a timely manner of configuration changes. To verify reader locations, the team evaluated the travel time data reported between each pair of readers to make sure that the travel times and the number of samples reported within a given time period were reasonable given the distance between the readers and the direction of travel for which they were supposed to be collecting data.

Answering the second question is important because, in some cases, ETC readers can be deployed such that they monitor two directions of travel. This question was addressed by examining the roadway configuration of each reader deployment and evaluating the ETC data collected to determine whether a significant number of toll tag matches occurred between that reader and the neighboring reader in each direction of travel.

The third question addresses the hit rate occurring at each reader. The team calculated the hit rate by comparing hourly ETC tag reads against hourly volumes collected from nearby loop detectors. In an effort to relate mounting configuration to the percentage of traffic sampled, hit rates were subsequently compared between readers.
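The hit-rate and match-rate calculations described here reduce to simple ratios; the sketch below is our own framing of those ratios, with a nearby loop detector's count assumed as the volume denominator.

```python
# Sketch of the hit-rate and match-rate ratios (our own framing).

def hit_rate(hourly_tag_reads, hourly_loop_volume):
    """Share of passing vehicles detected by a reader, using a nearby loop
    detector's hourly count as the total-volume denominator."""
    if hourly_loop_volume <= 0:
        return float("nan")
    return hourly_tag_reads / hourly_loop_volume

def match_rate(matched_tags, upstream_reads):
    """Share of tags read upstream that were reidentified downstream."""
    if upstream_reads <= 0:
        return float("nan")
    return matched_tags / upstream_reads

# Hypothetical hour: 1,000 vehicles pass, 120 tags read, 84 matched downstream.
print(hit_rate(120, 1000))   # 0.12
print(match_rate(84, 120))   # 0.7
```

Comparing these ratios across readers and reader pairs is what lets mounting configuration and spacing effects be isolated, as the text describes.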

The final question relates to the quality of the travel times being reported. Because a higher percentage of matches is likely to result in a more accurate travel time estimate, the research team assessed the percentage of tags matched between all possible combinations of upstream and downstream ETC readers. Results from each combination of ETC readers were then compared to determine how the percentage of matches is affected by the hit rates of the individual readers, as well as the distance between readers.

Analysis

This subsection documents the process used and the analysis conducted by the team to develop answers to the four questions discussed above.

Are the Readers Where They Are Reported to Be?

According to Caltrans, each reader is mounted to an overhead CMS or an overhead fixed sign. Each reader consists of a cabinet mounted to the sign pole, which is connected to two antennae mounted on the edge of the sign closest to the roadway. Figure C.123 shows a photograph taken during the installation of the ETC reader at the Donner Lake exit on eastbound I-80.

Figure C.123. Electronic toll collection reader installation. Source: Caltrans District 3.

Using the information provided by Caltrans, the team verified that there was a CMS or overhead sign at the latitude–longitude reported for each ETC station. Photographs of each deployment were obtained to determine each ETC's mounting configuration, its positioning over the roadway, and the roadway geometry at that location. Photographs of each reader's mounting structure, as indicated by Caltrans, are displayed in Figure C.124.

Figure C.124. Electronic toll collection locations: Auburn–Bell Road/I-80 eastbound; Rainbow/I-80 eastbound; rest area/I-80 eastbound; Donner Lake/I-80 eastbound; Hirschdale/I-80 westbound; Prosser Village/I-80 westbound; Baxter/I-80 westbound; Kingvale/I-80 westbound.

The team next evaluated the minimum travel times reported between each pair of readers to ensure they were reasonable given the distances involved. All travel times were determined to be reasonable with the exception of trips that originated or ended at the Kingvale reader, stated by Caltrans as being located on I-80 westbound, adjacent to the Rainbow reader on I-80 eastbound. Results of the team's analysis indicated that the Kingvale reader was actually located on I-80 eastbound, approximately 3 minutes downstream of the Donner Lake reader, which was later confirmed by Caltrans (see Figure C.125).

Are Any of the Readers Monitoring Multiple Directions of Travel?

The next step in understanding the impact of the various ETC reader configurations on the nature of the data collected was to determine whether any readers were capturing traffic in both directions of travel. The photographs in Figure C.124 indicated that the eastbound and westbound directions of travel at the Rainbow, rest area, and Donner Lake reader deployments were completely separated from one another. As a result, it was not possible for these readers to monitor the opposite direction of travel. The ability of the other readers to capture bidirectional traffic depended on the size and orientation of the detection zone generated by their antennae.

To conduct this analysis, the research team calculated the minimum and median travel times and the number of matches reported between each pair of adjacent readers monitoring opposite directions of travel along I-80. When the minimum travel times reported between two readers approximated the free-flow speed given their geographic distance, and significant numbers of matches were generated that approximated that speed, the research team determined that the destination reader was likely capable of monitoring bidirectional traffic.
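The decision rule just described can be sketched as a simple heuristic. The structure follows the text, but the match-count and tolerance thresholds below are illustrative values chosen here, not thresholds from the source, and the 65 mph free-flow speed is assumed.

```python
# Sketch of the bidirectional-monitoring heuristic (thresholds are our own).

def likely_bidirectional(min_tt_min, distance_mi, free_flow_mph,
                         n_matches, min_matches=30, tolerance=1.5):
    """A destination reader probably monitors both directions if the minimum
    observed travel time approximates the free-flow time for the readers'
    separation and a significant number of matches occur near that speed.

    min_matches and tolerance are illustrative, not values from the study.
    """
    free_flow_min = distance_mi / free_flow_mph * 60.0
    return min_tt_min <= tolerance * free_flow_min and n_matches >= min_matches

# 66-mile separation (Auburn-Bell to Prosser Village) at an assumed 65 mph:
print(likely_bidirectional(62.0, 66.0, 65.0, n_matches=200))  # near free flow
print(likely_bidirectional(300.0, 66.0, 65.0, n_matches=5))   # round trips
```

High minimum travel times with few matches fail the test, consistent with the round-trip interpretation discussed next.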
Alternatively, if the minimum travel times were high and the number of matches low, then the matches likely represented vehicles making a round trip. Figure C.126 shows a depiction of a one-way trip versus a round trip.

Figure C.125. Updated Kingvale ETC reader location, I-80 eastbound.

Figure C.127 displays the hourly travel times and tag matches from Friday, May 27, through Saturday, May 28, 2011, between Auburn–Bell on I-80 eastbound (origin) and Prosser Village on I-80 westbound (destination). The Prosser Village reader is 66 miles east of the Auburn–Bell reader and is deployed in the freeway median. As

Figure C.126. Depiction of a one-way trip versus a round trip.

Figure C.127. Travel times between origin I-80 eastbound at Auburn–Bell and destination I-80 westbound at Prosser Village. Min = minimum.

indicated in Figure C.127, the minimum travel times between Auburn–Bell on I-80 eastbound and Prosser Village on I-80 westbound range between 60 and 80 minutes, which is reasonable given the 60-mile distance between them. The 75th percentile travel times are higher, likely reflecting the travel times of vehicles detected passing the Prosser Village reader on I-80 westbound as part of a round trip, after having first passed the Auburn–Bell reader (and the Prosser Village reader, without being detected by it) while traveling east. Finally, the median travel times more closely reflect the minimum travel times, indicating that the Prosser Village reader was matching more toll tags traveling past it along I-80 eastbound than were being generated by I-80 westbound round trips, as reflected in the 75th percentile travel time. Overall, the research team's analysis indicated that only the Prosser Village reader was capable of monitoring bidirectional traffic; Caltrans later indicated that the Prosser Village reader had been deployed with antennae facing in both directions of travel.

What Percentage of Total Traffic Is Being Detected?

The percentage of the vehicle population sampled by individual readers depends on various factors, including the penetration rate of the technology within the vehicle population and the positioning and mounting of the reader. The Bay Area Toll Authority reported in January 2011 that 53% of drivers passing through its toll plazas were equipped with FasTrak tags, with that percentage increasing to 65% during weekday peak periods (5). Even so, the ETCs in the Tahoe area are more than 100 miles from the nearest toll plaza. Consequently, the percentage of vehicles equipped with FasTrak tags depends, to a great extent, on traffic patterns between the Bay Area and Lake Tahoe.
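The separation of one-way and round-trip travel in the distribution just described rests on simple order statistics of the matched travel times. A minimal sketch, with made-up sample values:

```python
# Computing the minimum, median, and 75th-percentile travel times for one hour
# of matched tag reads. The sample values (minutes) are invented: a cluster
# near free flow (direct one-way matches) plus a few long round-trip
# observations that pull the 75th percentile well above the median.
from statistics import median, quantiles

samples = [62, 63, 64, 65, 66, 68, 70, 150, 180, 240]

p75 = quantiles(samples, n=4)[2]   # third quartile
print(min(samples), median(samples), p75)
```

When the median hugs the minimum while the 75th percentile drifts upward, the bulk of matches are direct trips and the tail is round-trip contamination, which is the pattern the team observed at Prosser Village.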
Previous studies of ETC-based travel time data collection deployments have noted that a number of configuration-related factors can potentially affect the quantity and quality of tag reads (6). For example, when readers are positioned directly overhead, such as at tolling facilities, they reliably capture data from almost all toll tags. But in other real-world traffic monitoring deployments, such as at Lake Tahoe, ETC readers are placed at the side of the road, increasing their distance from vehicles and reducing the efficiency of their tag reads. Such configurations also make it more difficult for readers to capture traffic across all lanes of travel, particularly when there are multiple lanes of traffic.

To calculate the hit rate, or the percentage of vehicles sampled at each reader, the research team compared ETC tag reads with the traffic flows measured at nearby loop detectors. The hit rate at Prosser Village was not analyzed because there were no working loop detectors nearby. Table C.48 displays the average daily hit rates, by day of the week, for each of the ETCs along I-80, collected during the week of May 9 to May 15, 2011. Low hit rates on Sunday and Monday were common to all of the eastbound readers, especially on the eastern end of the monitored corridor. Another trend common across all readers, though especially marked at the Auburn–Bell Road reader, was the spike in the hit rate during the overnight hours (see Figure C.128). This could be due to the higher percentage of freight traffic during these hours, which may be more likely to be equipped with FasTrak tags.
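The hit-rate calculation itself is a straightforward ratio. The sketch below uses invented counts; it is not the team's code.

```python
# Hit rate: share of vehicles passing a reader whose toll tag is actually read,
# estimated by dividing hourly tag reads by the hourly loop-detector volume at
# the same location. The numbers below are invented for illustration.

def hit_rate(tag_reads, loop_volume):
    """Percentage of the traffic stream sampled by the reader."""
    if loop_volume == 0:
        return None  # no traffic measured; rate undefined
    return 100.0 * tag_reads / loop_volume

# e.g. 54 tag reads against 1,350 vehicles counted by the loop detector:
print(round(hit_rate(54, 1350), 1))  # 4.0
```

Averaging this ratio over all hours in a day-of-week bin yields entries like those in Table C.48.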

Comparing average hourly hit rates for all of the readers on I-80 eastbound from Tuesday through Friday makes it clear that some readers are sampling a significantly higher percentage of traffic than others. An examination of photographs of the signs on which each reader was mounted provides no clear explanation for why the hit rates at some readers are approximately double those at other readers. The hit rate at Auburn–Bell Road may be lower because it is the only location with three lanes of travel (all other readers monitor only two lanes) being monitored by two antennae. The rest area reader, though appearing to be optimally positioned above the roadway, also has a low hit rate. A possible reason is that the antennae here are not properly aligned with the two lanes of travel, resulting in a reduced number of toll tag reads.

TABLE C.48. AVERAGE ETC HIT RATES BY DAY OF THE WEEK

Reader                     | Average (%) | Sunday (%) | Monday (%) | Tuesday–Thursday (%) | Friday (%) | Saturday (%)
Auburn–Bell Road (I-80 EB) | 3.4 | 2.9 | 2.6 | 4.0 | 4.0 | 3.6
Rainbow (I-80 EB)          | 5.4 | 2.9 | 3.3 | 6.9 | 7.3 | 6.7
Rest area (I-80 EB)        | 3.2 | 1.6 | 1.8 | 5.0 | 4.0 | 3.4
Donner Lake (I-80 EB)      | 6.3 | 3.6 | 3.7 | 8.9 | 8.5 | 6.8
Kingvale (I-80 EB)         | 6.4 | 3.9 | 4.4 | 9.6 | 7.2 | 7.0
Hirschdale (I-80 WB)       | 4.5 | 4.1 | 3.7 | 5.6 | 5.0 | 4.1
Baxter (I-80 WB)           | 6.7 | 6.2 | 6.0 | 8.1 | 7.1 | 5.9
Note: Time frame is 7:00 a.m. to 8:00 p.m.

Figure C.128. Hourly hit rates for Auburn–Bell Road reader.

To gauge the sampling rate in another way, the team also looked at the raw number of hourly tag reads reported by each reader, the results of which are displayed in Figure C.129 for the eastbound direction of travel. Despite its low percentage of tag reads, the reader at Auburn–Bell Road still records a large number of reads, simply because the traffic volumes are higher here than at any other reader. The highest number of reads recorded across readers is on Friday, due to the recreational pattern of weekend trips to Lake Tahoe.

What Percentage of Toll Tags Is Matched Between Pairs of Readers?

For the purpose of calculating travel time reliability–related metrics, it is most important to have the ability to quantify the typical percentages and volumes of toll tags matched between multiple readers, as this directly affects the quality of aggregated travel times. To quantify typical tag match rates between readers, the team looked at the percentage of vehicles being matched between the farthest upstream readers (Auburn–Bell Road in the eastbound direction and Hirschdale Road in the westbound direction) and all subsequent downstream readers between May 9 and May 15, 2011 (see Figure C.104 for the deployment layout).

Figure C.130 shows the percentage of toll tags detected at the Auburn–Bell reader that are reidentified at each downstream I-80 eastbound reader (ordered from left to right by distance from origin). If each reader's data collection capabilities, and therefore its hit rates, were identical, one would expect to see the percentage of matched tag reads decrease with distance from the origin reader as vehicles detected at Auburn–Bell Road deviated from I-80 eastbound. However, this trend does not hold for these readers. Instead, the highest matching percentage (91%) is seen between Auburn–Bell Road and the Kingvale reader, which are separated by a distance of 50 miles.
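Computing such a matching percentage amounts to a set intersection over tag identifiers. The sketch below uses invented tag IDs and is not the team's implementation.

```python
# Share of toll tags seen at an origin reader that are reidentified at a
# downstream reader. Tag IDs below are invented for illustration.

def match_rate(origin_tags, downstream_tags):
    """Percentage of origin detections reidentified downstream."""
    origin, downstream = set(origin_tags), set(downstream_tags)
    if not origin:
        return 0.0
    return 100.0 * len(origin & downstream) / len(origin)

origin = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
kingvale = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "X"]  # 9 of 10 matched
print(match_rate(origin, kingvale))  # 90.0
```

Repeating this for each downstream reader, ordered by distance, produces the profile plotted in Figure C.130.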
At the same time, matches between Auburn–Bell Road and the Rainbow and rest area readers are much lower, which is consistent with the low hit rates measured at these three stations.

Figure C.129. Hourly tag reads for I-80 eastbound readers.

Figure C.130. Average percentage of toll tags matched on I-80 eastbound from origin Auburn–Bell Road. (Chart shows matched versus unmatched percentages at Rainbow, rest area, Donner, Kingvale, and Prosser.)

Figure C.131 displays the total number of hourly matches measured on eastbound I-80 between the origin reader at Auburn–Bell Road and all downstream readers. At all readers, matches are the highest on Fridays, which is supported by local traffic patterns between the Bay Area and Lake Tahoe. Again, despite being the second farthest reader from the origin, the Kingvale reader often sees the most matches on I-80 eastbound. Daytime matches between Auburn–Bell and Kingvale generally exceed 30 per hour (three per 5 minutes or eight per 15 minutes). Although this number of samples is likely too low to compute accurate 5-minute average travel times, these data have the potential to be used to generate average 15-minute or hourly statistics throughout the week.
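The adequacy argument above (30 matches per hour supporting hourly but not 5-minute averages) can be generalized into a small check. The 10-sample floor below is an assumption for illustration, not a value from the study.

```python
# Rough check of which aggregation intervals an hourly match count supports.
# The minimum-samples threshold is an assumption, not a value from the study.

MIN_SAMPLES = 10  # assumed floor for a stable average travel time

def supported_intervals(matches_per_hour, interval_minutes=(5, 15, 60)):
    """Return the aggregation intervals with enough expected samples."""
    return [m for m in interval_minutes
            if matches_per_hour * m / 60 >= MIN_SAMPLES]

# ~30 matches/hour yields about 2.5 samples per 5 minutes and 7.5 per
# 15 minutes, so only hourly aggregation clears a 10-sample floor:
print(supported_intervals(30))  # [60]
```

Raising or lowering the floor shifts which intervals qualify; the choice depends on how much sampling noise an application can tolerate.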
Figure C.131. Hourly toll tag matches on I-80 eastbound for origin reader at Auburn–Bell Road.

Findings

Two primary variables affect hit rate: the total number of ETC tags in the population of vehicles that pass a reader and the number of tags actually read by a specific reader. The product of these factors has a significant influence on the accuracy of travel time data generated between any two ETC readers. This use case sought to demonstrate how the hit rates and matching percentages generated by the Tahoe region ETC network may be affected by the configuration of individual ETC readers.

Table C.49 summarizes the configuration and average data collection results for each of the single-directional ETC readers deployed along I-80 eastbound during Friday afternoons and evenings (noon to 8:00 p.m.), considered the peak period for this roadway due to weekend traffic between the Bay Area and Lake Tahoe. All of the readers included in Table C.49 are deployed along I-80, are mounted on overhead signs, and monitor two (or in one case three) lanes of traffic. These readers are deployed under what appear to be similar conditions, yet there are notable differences in the percentage of the traffic stream sampled at the different locations. To begin with, hit rates for the Donner Lake and Rainbow readers are more than twice those generated by the Auburn–Bell Road and rest area readers. Although the team was not able to investigate the underlying reason for these differences, it is believed they are most likely the result of reader antennae being misaligned at some locations and the inability of the reader at Auburn–Bell Road to accurately collect data from three lanes of traffic using only two antennae.
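The "product of factors" idea above can be made concrete: the expected number of matched samples between two readers is roughly the traffic volume times the tag penetration rate times each reader's read probability times the share of traffic continuing between them. A back-of-the-envelope sketch with invented values:

```python
# Expected matched samples between an origin and destination reader, modeled
# as volume x tag penetration x read probability at each reader x share of
# traffic continuing between them. All values below are illustrative.

def expected_matches(volume, penetration, read_prob_origin,
                     read_prob_dest, through_fraction=1.0):
    return (volume * penetration * read_prob_origin
            * read_prob_dest * through_fraction)

# 1,500 veh/h, 10% tagged, 60% read probability at each reader,
# 90% of traffic continuing downstream:
print(round(expected_matches(1500, 0.10, 0.6, 0.6, 0.9)))  # 49
```

Because the two read probabilities multiply, a modest shortfall at either reader cuts the matched sample count disproportionately, which is why small hit-rate differences matter so much in Table C.49.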
As seen in Table C.49, differences in hit rates of only 2% to 3% can make a significant difference in the number of tag reads collected, which is crucial for ensuring that a sufficient number of samples are reidentified downstream to generate accurate travel times and travel time distributions.

TABLE C.49. I-80 EASTBOUND ETC READER SUMMARY FOR FRIDAYS

Reader           | Mounting Type | No. of Lanes | No. of Tag Reads | Traffic Sampled (%) | Distance to Next Reader (mi) | No. of Exits Between Readers | Hits Reidentified Downstream (%)
Auburn–Bell Road | Roadside CMS  | 3 | 648 | 3.5 | 45 | 24 | 83
Rainbow          | Roadside CMS  | 2 | 789 | 7.4 |  8 |  1 | 42
Rest area        | Roadside sign | 2 | 380 | 3.6 |  4 |  1 | 99
Donner           | Roadside sign | 2 | 785 | 7.4 |  3 |  2 | 96
Kingvale         | Roadside sign | 2 | 696 | 6.6 |  6 | na | 61
Note: Time frame is noon to 8 p.m.; all signs and CMS were deployed in the eastbound direction; Kingvale is at the end of the test site, so na (not applicable).

As expected, the hit rate for an individual reader has a profound impact on that reader's ability to reidentify vehicles initially detected at upstream readers. For example, as shown in Table C.49, even though the Auburn–Bell Road reader is 45 miles and 24 exits from the downstream reader at Rainbow, the high hit rate at this downstream reader enables it to reidentify 83% of vehicles initially detected at Auburn–Bell Road. Given the number of opportunities to exit the freeway, this likely represents nearly all of the ETC-equipped vehicles that pass between the readers. Conversely, despite the fact that the rest area reader is only 8 miles from the Rainbow reader, with only one exit ramp between them, the rest area reader is only able to match 42% of vehicles initially identified at Rainbow. Overall, at least on rural roads experiencing fairly significant through traffic, the readers' hit rates appear to affect the percentage of matched vehicles to a greater extent than the distance between the readers.

However, even with ideal reader placement and configuration, the primary constraint on the percentage of traffic sampled will always be the penetration rate of toll tags in the population. In rural areas, it is uncommon to have electronic tolling infrastructure, so deploying ETCs in these locations requires that at least some portion of the traffic stream be composed of vehicles equipped with toll tags used in nearby urban areas. The results of this use case show that this penetration rate can vary by time of day and day of the week; for example, on I-80 eastbound, far fewer FasTrak-equipped vehicles travel the corridor on Sundays and Mondays than during the rest of the week.
Impact of Bluetooth Reader Deployment Configuration on Data Quality

This use case details the findings of the research team's investigation into the impact of the configuration of the Lake Tahoe Bluetooth network on the quality of travel time data collected. The BTR network leveraged in this case study was deployed along I-5 in Sacramento and US-50 between Placerville and Lake Tahoe. For each BTR, the research team received configuration data in a csv file, with fields for the node ID, a textual location, and a latitude–longitude reading. The research team was also provided with a 2-gigabyte SQL file containing all of the Bluetooth data collected at the readers between December 25, 2010, and April 21, 2011. This use case only uses the eight BTRs that provided more than a week of data. These data were downloaded into PeMS for use in computing travel times between each BTR pair.

Methodology

In evaluating the impact of the network's configuration on the characteristics of the reported travel times, the team sought to answer questions of a similar nature to those explored as part of the ETC use case, including

1. Are the readers where they are reported to be?
2. Which facilities is each reader monitoring?
3. What percentage of total traffic is being detected?
4. What percentage of Bluetooth devices is matched between pairs of readers?

The first question was particularly important for this use case as the BTRs were deployed as part of a test, and not as permanent data collection infrastructure. As a result, each BTR changed locations multiple times over a span of several months. To compute accurate travel times, the team had to ensure that the locations provided in the configuration file matched the data delivered in the SQL file. This was achieved by mapping the latitude and longitude provided by Caltrans to determine whether the data matched the textual locations provided.

Answering the second question is critical to all Bluetooth studies. Class 1 Bluetooth devices like the ones used in this case study have a detection radius of 300 feet. As a result, the potential exists for the BTRs to monitor bidirectional traffic along a roadway, as well as traffic along parallel facilities, which presents challenges when trying to compute accurate travel times. The research team evaluated the reader locations and data to approximate the lanes of travel each reader monitored; whether the readers were monitoring traffic bidirectionally; whether they might also be detecting vehicles on on-ramps, off-ramps, or frontage roads; and whether the potential existed for them to capture data concerning the movement of other modes of travel, such as from bicyclists.

The third question addresses the hit rate occurring at each reader. The team calculated this by comparing hourly BTR reads against hourly volumes collected from nearby loop detectors. In an effort to relate mounting configuration to the percentage of traffic sampled, hit rates were subsequently compared between readers.

The final question relates to the quality of travel times being reported.
As travel time estimates are more likely to be accurate with higher numbers of matches, the research team assessed the percentage of Bluetooth devices matched between all possible combinations of upstream and downstream BTRs. Results from each combination of BTRs were then compared to determine how the percentage of matches is affected by the hit rates of each individual reader, as well as the distance between readers.

Analysis

This subsection documents the process used and analysis conducted by the team to develop answers to the four questions discussed above.

Are the Readers Where They Are Reported to Be?

Using the information provided by Caltrans, the team verified the location of each BTR according to both its latitude and longitude and textual description. Although some of the readers represented in the configuration file were erroneously located (e.g., the latitude–longitude of one placed it in a lake), the eight readers used as part of this use case all appeared to be in roughly the correct location. Photographs of each BTR station used as part of this use case are displayed in Figure C.132. One BTR, deployed on US-50 at Echo Summit, is not visible as a result of being buried in snow. Despite this, the team was able to use the data collected from this station as part of its analysis. As a final location confirmation step, the team evaluated the minimum travel times computed between each BTR to ensure they were reasonable given the distances involved. All minimum travel times were subsequently determined to be reasonable, and the BTR locations deemed accurate.
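The location-confirmation step described above can be sketched as a two-part check: the great-circle distance between two readers implies a shortest plausible travel time, and a matched minimum travel time far below that suggests a misreported location. The coordinates and speed cap below are invented for illustration.

```python
# Sanity-checking reported reader locations: compare the minimum matched
# travel time against the great-circle distance between the two readers.
# Coordinates and the speed cap are hypothetical, not values from the study.
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat-long points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3959 * asin(sqrt(a))

def location_plausible(dist_mi, min_tt_min, max_speed_mph=85):
    """The minimum travel time must not imply an impossible speed."""
    return dist_mi / (min_tt_min / 60) <= max_speed_mph

d = haversine_miles(38.7, -121.1, 39.3, -120.3)  # two hypothetical readers
print(location_plausible(d, 60))   # plausible at about an hour
print(location_plausible(d, 10))   # implausible: implies well over 85 mph
```

A pair that fails the check, as the Kingvale ETC reader did in the earlier use case, warrants a field confirmation of the reported location.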

Which Facilities Is Each Reader Monitoring?

The next step in understanding the impact of each BTR's configuration on the nature of the data collected was to determine which readers might be capturing traffic data for multiple directions of travel. The BTRs evaluated as part of this use case were deployed as follows:

• Three of the readers on US-50 were deployed in locations where there is one lane of travel in each direction.
• The reader at US-50 and Placerville monitored two lanes in each direction.
• The reader at US-50 and Meyers was located near an intersection that might result in its picking up MAC addresses from vehicles turning on to US-50 from a cross street.
• The reader at I-5 and Vallejo had the potential to monitor up to five lanes in each direction.
• The reader at I-5 and Gloria (the only BTR along I-5 located on the southbound side of the roadway) had the potential to monitor up to four lanes of travel in each direction.
• The reader at I-5 and Florin was located in the middle of the cloverleaf on-ramp from Florin Road to I-5 northbound. It was adjacent to four mainline northbound and southbound lanes. Given the reader's location, it was likely detecting significant numbers of vehicles entering and exiting I-5, both traveling at slower speeds and being detected earlier (for on-ramp vehicles) or later (for off-ramp vehicles) than if they were actually traveling on I-5.
• The reader at I-5 and Pocket was located some distance from the northbound side of the roadway. It had the potential to monitor two on-ramp lanes to I-5 northbound, three mainline lanes in each direction, and a cloverleaf off-ramp from I-5 southbound.

Based on this analysis, the research team concluded that all BTRs were likely monitoring at least bidirectional traffic, a conclusion that was confirmed by the data analysis conducted in support of the following subsection.
This effort also provided some insight into how the readers' locations can potentially affect the sampling of nonrepresentative trips.

What Percentage of Total Traffic Is Being Detected?

As with the ETCs, the percentage of the traffic stream monitored by a BTR depends on the penetration of Bluetooth-enabled devices within the vehicle population. Although it is estimated that 20% of travelers now have Bluetooth devices with them in their vehicles, at least a quarter of them do not have the device set to discoverable mode. The detection rate also depends on the reader's configuration. Given its 300-foot detection radius, a single Class 1 BTR could easily monitor all lanes of a freeway that has four lanes of traffic in each direction of travel and is barrier separated. However, it might also collect undesired samples, such as Bluetooth devices on parallel facilities or within office buildings. All readers used in this case study had approximately the same average signal strength, so this variable was not a factor.
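The two penetration figures above combine into an upper bound on the achievable Bluetooth sample rate:

```python
# Upper bound on the Bluetooth sample rate implied by the figures above:
# 20% of travelers carry a device, but at least a quarter of those are not
# discoverable, capping detectable traffic at roughly 15%.
penetration = 0.20
discoverable_share = 0.75  # at most three-quarters are discoverable
print(f"{penetration * discoverable_share:.0%}")  # 15%
```

Any measured hit rate below this ceiling then reflects reader configuration (placement, antenna coverage) rather than device availability alone.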

Figure C.132. Bluetooth reader cabinet locations: US-50/Echo Summit (not visible in photograph); US-50/Meyers (off US-50 eastbound); US-50/Twin Bridges (off US-50 eastbound); US-50/Placerville (off US-50 eastbound); I-5/Vallejo (off I-5 northbound). (Continued on next page.)

Figure C.132. Bluetooth reader cabinet locations (continued): I-5/Gloria (off I-5 southbound); I-5/Florin (off Florin on-ramp to I-5 northbound); I-5/Pocket (off Pocket on-ramp to I-5 northbound).

To calculate the percentage of vehicles sampled at each BTR, the research team compared Bluetooth mobile device reads with the traffic flows measured at nearby loop detectors; the result is referred to as the hit rate. Hit rates were computed for the four readers on I-5 (there were no working loop detectors near the US-50 readers). Because all readers were presumed to monitor both directions of travel along I-5, and as it is impossible to assign a direction of travel to unmatched Bluetooth reads, hit rates were calculated by comparing hourly detections at each reader with hourly volumes summed from nearby northbound and southbound loop detectors over a week-long period (Monday, February 28, to Sunday, March 6, 2011). In addition, because the Florin and Pocket readers clearly detected traffic on roadway on-ramps, hit rates at these readers were computed by comparing the hourly reader detections with the hourly volumes summed from the mainline and on-ramp loop detectors to avoid upwardly biasing the hit rates at these readers.

Bluetooth hit rates were first evaluated to determine if they exhibited any time-of-day or day-of-the-week patterns. As with the ETC readers, hit rates were lowest during the early morning hours. There were no other discernible patterns. Figure C.133 compares the hourly hit rates measured over three days (Tuesday to Thursday) across the four readers. The hit rates at all readers generally ranged between 6% and 10%. Hit rates were usually highest (between 8% and 10%) at the Gloria reader, which was directly adjacent to the southbound lanes. The reader at Pocket typically had the lowest hit rates (between 6% and 8%), possibly due to its being set back from the roadway.

The data displayed in Figure C.134 present the raw number of hourly MAC address reads at each reader listed.
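Hourly read counts of this kind exclude repeat detections of the same device, since a device lingering in a reader's 300-foot detection zone would otherwise be counted many times. A minimal deduplication sketch, with invented records:

```python
# Collapsing repeat detections: keep each (reader, MAC, hour) combination once
# so a device idling near a reader is not counted multiple times. The records
# below are invented for illustration.
from datetime import datetime

raw = [
    ("Vallejo", "aa:01", datetime(2011, 2, 28, 8, 5)),
    ("Vallejo", "aa:01", datetime(2011, 2, 28, 8, 40)),  # duplicate within hour
    ("Vallejo", "aa:01", datetime(2011, 2, 28, 9, 10)),  # new hour, kept
    ("Vallejo", "bb:02", datetime(2011, 2, 28, 8, 12)),
]

def dedup_hourly(records):
    """Keep one read per (reader, device, hour)."""
    seen, kept = set(), []
    for reader, mac, ts in records:
        key = (reader, mac, ts.replace(minute=0, second=0, microsecond=0))
        if key not in seen:
            seen.add(key)
            kept.append((reader, mac, ts))
    return kept

print(len(dedup_hourly(raw)))  # 3
```

The choice of a one-hour window is an assumption for this sketch; shorter windows retain more repeat sightings of slow-moving or stopped vehicles.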
The reads shown in this plot are based on the number of MAC address reads remaining after filtering to remove duplicate IDs at the same reader during the same hour. Although the reader at I-5 and Vallejo does not have the highest hit rate, it generally records the largest number of MAC addresses per hour,

Figure C.133. Hourly hit rates for I-5 Bluetooth readers.

reaching nearly 1,000 reads per hour during the weekday afternoon peak. In contrast, the reader at I-5 and Pocket has both the lowest hit rate and the lowest number of reads, with between 500 and 600 MAC address reads per hour during the peak hours and only 300 to 400 per hour during the midday.

Figure C.134. Hourly reads for I-5 Bluetooth readers.

Figure C.135. Hourly reads for US-50 Bluetooth readers.

Although the hit rate could not be computed for the readers on US-50, the research team did evaluate the raw number of hourly MAC address reads at each reader on this road (see Figure C.135). The pattern of reads on US-50 differs from that on I-5, which follows the more typical morning and afternoon peak commute pattern. On US-50, each reader detects the most reads on Fridays, Saturdays, and Sundays, which reflects the recreational traffic patterns near Lake Tahoe. At the Meyers, Echo Summit, and Twin Bridges readers, which are all relatively closely spaced (within 12 miles of
408 GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY one another) near South Lake Tahoe, the number of hourly reads are fairly similar, and are quite low (30 to 50 per hour, or two to four per 5 minutes) from Monday through Thursday. The number of reads at the Placerville reader, which is closer to Sacramento, is higher, especially during the work week, when this location has higher traffic volumes. What Percentage of Bluetooth Devices Is Matched Between Pairs of Readers? For the purpose of supporting the calculation of travel time reliability–related metrics, it is most important to have the ability to quantify the typical percentages and vol- umes of Bluetooth devices matched between multiple readers, as this directly affects the quality of aggregated travel times. The first step in performing this analysis was to evaluate the percentage of each reader’s Bluetooth MAC address reads reidentified at downstream readers. Results for the readers along I-5 are shown in Figure C.136 and for the US-50 readers in Figure C.137. On I-5, the Vallejo (northernmost) and Pocket (southernmost) readers only have downstream readers in one direction of travel (see Figure C.104 for the deployment layout). Reidentification of devices between these readers occurred as follows: • For the Vallejo reader, approximately 42% of the MAC address reads were reidentified at the Gloria reader located about 4 miles to the south. • For the Gloria reader, about 48% of the reads were reidentified in the northbound direction at Vallejo, 50% of the reads were reidentified in the southbound direc- tion at Florin, and 2% were not reidentified at all. • For the Florin reader, 53% of the reads were reidentified in the northbound direc- tion at Gloria, 39% were reidentified in the southbound direction at Pocket, and 8% were not reidentified in either direction. • For the Pocket reader, 48% of the reads were reidentified in the northbound direc- tion at Florin. 
Overall, the rate of matching between readers was high, with the vast majority of Bluetooth devices matched at another sensor for use in generating travel times. Reidentification rates were also high between the readers along US-50, particularly the three deployments closest to Lake Tahoe. Reidentification of devices between these readers occurred as follows:

• For the Meyers (easternmost) reader, for which there is no downstream reader in the eastbound direction, 50% of the reads were reidentified at the Echo Summit reader, 4 miles to the west.
• For the Echo Summit reader, 57% of the reads were reidentified at Meyers and 43% were reidentified at Twin Bridges, 8 miles to the west. Virtually none of the reads captured at Echo Summit went unmatched, likely due to its location at a point on the roadway that has no parallel facilities, and the fact that there were few possible exits between Echo Summit and Meyers or Echo Summit and Twin Bridges.

• For the Twin Bridges reader, 40% of the reads were reidentified to the east at Echo Summit, 47% of the reads were reidentified at Placerville (39 miles to the west), and 13% of the reads were not reidentified.
• For the Placerville (westernmost) reader, 22% of the reads were reidentified downstream at Twin Bridges.

Figure C.136. MAC address matching rates for I-5 readers.

Figure C.137. MAC address matching rates for US-50 readers.
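The reidentification percentages above reduce to set intersections over each reader’s log of detected MAC addresses. A minimal sketch in Python, using invented MAC addresses rather than the study’s data:

```python
# Sketch of the first analysis step: for one reader, tally what share of its
# MAC address reads were re-seen at each neighboring reader. All reader logs
# below are invented examples, not the study's data.

def match_rates(origin_reads, northbound_reads, southbound_reads):
    """Return (northbound, southbound, unmatched) shares of the origin's reads."""
    origin = set(origin_reads)
    nb = len(origin & set(northbound_reads)) / len(origin)
    sb = len(origin & set(southbound_reads)) / len(origin)
    matched = origin & (set(northbound_reads) | set(southbound_reads))
    return nb, sb, 1.0 - len(matched) / len(origin)

# Toy log: of 4 reads at the origin, 2 re-seen northbound and 1 southbound.
rates = match_rates(
    ["aa:01", "aa:02", "aa:03", "aa:04"],  # origin reader
    ["aa:01", "aa:02", "bb:09"],           # northbound neighbor
    ["aa:03", "cc:11"],                    # southbound neighbor
)
print(rates)  # (0.5, 0.25, 0.25)
```

In practice each log entry would also carry a time stamp so that matches can be restricted to a plausible travel time window between the two readers.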

Based on these high reidentification rates, the team concluded that the readers on US-50 were capable of detecting and reidentifying a very high proportion of the Bluetooth devices that pass through their detection zones, likely due to the narrow roadway width at these locations and the limited options available to exit the roadway.

The next technique for evaluating Bluetooth device reidentification between readers was to examine the raw number of matches between readers to assess whether the match volumes were sufficient to yield accurate average travel times. This assessment was carried out by selecting an origin reader and computing the hourly matches to a series of destination readers. Figure C.138 displays the results of this analysis in the southbound direction along I-5 for the Gloria, Florin, and Pocket readers from the origin reader at Vallejo. Highlights of this analysis include the following:

• As the team expected, the greatest number of matches occurred with the closest downstream reader, Gloria, and the fewest matches with the reader farthest away, Pocket. These matches differed by about 100 during each afternoon peak hour, representing a difference of 25%.
• Matches between Vallejo and all downstream readers averaged about 16 per 5 minutes during daytime hours, likely sufficient for obtaining 5-minute travel times.
• The number of matches peaked during the afternoon period, at around 350 to 500 per hour, when travelers were departing Sacramento for its southern suburbs.
• Volumes were lower on weekends but still sufficient to support average and median travel time computations at a fine granularity.

Figure C.138. MAC address matches on I-5 south from Vallejo reader.

Figure C.139 displays results of this analysis for hourly northbound matches at the Florin, Gloria, and Vallejo readers from the origin reader at Pocket. Highlights of this analysis include the following:

• As Pocket had the lowest hit rate of the readers on I-5, a smaller percentage of vehicles was available for reidentification when using this reader as an origin.
• The numbers of matches at each of the three destination readers were similar, and generally differed by less than 25 per hour, representing a difference of about 10%.
• The number of matches peaked during the morning period at around 350 to 400 per hour, when the majority of traffic was commuting north to Sacramento.
• As seen in the southbound direction, matches were lower on Saturdays and Sundays, but still likely sufficient to calculate fine-grained average and median travel times.

Figure C.139. MAC address matches on I-5 north from Pocket reader.

Results for the number of hourly matches between the Meyers reader (easternmost) and downstream readers on US-50 are displayed in Figure C.140. Highlights include the following:

Figure C.140. MAC address matches on US-50 westbound from Meyers reader.

• Although the number of matches decreased with distance from the origin reader, the number of matches was similar between the three destinations due to a significant amount of traffic on US-50 traveling its entire length from Lake Tahoe to Sacramento.
• The number of matches was much lower than along I-5, likely due to the rural characteristics and lower traffic volumes on US-50.
• The number of matches was highest on Saturdays and Sundays due to recreational traffic, which peaked on Sunday afternoons, when travelers returned from Lake Tahoe to Sacramento and the Bay Area.
• During the peak hours on Sunday, there were 100 to 140 hourly matches (8 to 12 per 5 minutes or 25 to 35 per 15 minutes) between the Meyers reader and the Placerville reader, 50 miles away, likely sufficient to calculate 15-minute travel times, and possibly 5-minute travel times, for this facility’s peak hour.
• During the rest of the week, the number of hourly matches ranged from around 20 to 50 during the daylight hours (2 to 4 per 5 minutes or 5 to 12 per 15 minutes). This number of matches is not likely sufficient to compute average and median travel times every 5 minutes, though it might be used to compute 15-minute or hourly average and median travel times.

The number of hourly matches for traffic between the origin reader at Placerville (westernmost) and the destination readers at Twin Bridges, Echo Summit, and Meyers is shown in Figure C.141. Observations of particular note in the figure include the following:

• Matches peaked on Fridays and Saturdays as vehicles traveled from the Bay Area and Sacramento to Lake Tahoe.
• During the peak hours on Friday and Saturday afternoons, matches between the Placerville reader and the Meyers reader, near South Lake Tahoe, were around 100 per hour (8 per 5 minutes or 25 per 15 minutes), likely enough to compute average and median travel times at a 5- or 15-minute granularity. During the rest of the week, however, there were only about 20 matches per hour.

Figure C.141. MAC address matches on US-50 eastbound from Placerville reader.

As many travelers make trips between Sacramento and Lake Tahoe, there is potentially value in knowing the travel times between the two. For this reason, the team also examined the number of hourly matches between the readers along I-5 and the readers on US-50. Figure C.142 shows the number of matches between the reader at Meyers (closest to South Lake Tahoe) and other readers along I-5. As these readers are on different freeways and are about 100 miles apart, the key question is whether there are sufficient matches to compute travel times at any level of granularity. The results represent trips along US-50 westbound, exiting onto I-5 southbound. Key observations in the figure include the following:

• The peak number of matches occurred on Sunday afternoon, when some hours had up to 16 to 18 matches. However, even during this peak, there were some hours when the number of matches dipped to only five per hour. Consequently, it does not appear possible to consistently calculate travel times at a fine granularity, even on Sunday afternoons. However, there are sufficient matches to compute hourly travel times, which could provide a reasonable indication of travel time reliability for travelers who want to make this return trip from Lake Tahoe.
• During the rest of the week, there were insufficient matches to compute accurate average and median travel times by time of day and day of the week, though travel times could be collected over a period of many weeks to compute average and median travel times and travel time variability.
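Judgments like these (16 to 18 matches per hour supporting only hourly travel times, while 100 or more support 5- or 15-minute aggregation) follow from converting an hourly match volume into the expected number of samples per aggregation window. A small sketch, with an assumed minimum of five samples per window rather than a threshold taken from the study:

```python
# Sketch: converting an hourly match volume into the aggregation granularities
# it can plausibly support. The five-match minimum per window is an assumed
# threshold for illustration, not a value used by the research team.

def supported_granularities(matches_per_hour, min_samples=5):
    """Return aggregation window lengths (minutes) expected to contain
    at least min_samples matches."""
    return [w for w in (5, 15, 60) if matches_per_hour * w / 60.0 >= min_samples]

print(supported_granularities(120))  # [5, 15, 60] -- e.g., a weekend peak on US-50
print(supported_granularities(16))   # [60] -- a sparse cross-freeway pairing
```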
Figure C.142. MAC address matches on I-5 southbound from Meyers on US-50.

Figure C.143 displays the hourly matches between the Pocket reader (southernmost) on I-5 and each of the readers along US-50. These matches most likely represent vehicles traveling north on I-5 toward Sacramento, and then exiting to US-50 eastbound in the

direction of Lake Tahoe. The number of matches peaked at between 6 and 12 per hour on Friday afternoon, and was also higher on Saturday morning, at around 8 per hour. Matches during the rest of the week were lower, but could potentially be studied over time to better understand the variability of travel times by time period.

Figure C.143. MAC address matches on US-50 eastbound from Pocket reader on I-5.

Findings

This use case provided the opportunity to compare the performance of a Bluetooth-based travel time monitoring system deployed in an urban environment with that of one deployed in a rural environment, while simultaneously demonstrating how sensor configuration affects both the amount and quality of data collected.

One overarching finding of this use case is that the potential exists to use BTRs to generate travel times over long distances between urban and rural settings based on travel along adjoining roadways; however, this use is heavily dependent on the presence of the right set of conditions. For example, as indicated in Table C.50, during an average Friday afternoon and evening, 132 vehicles detected at the Vallejo reader on I-5 (2% of the Bluetooth reads at this location) were later reidentified at the Placerville reader on US-50, more than 46 miles away. For this origin–destination pair, this degree of mobile device reidentification was sustained only on Fridays and Saturdays; on other days of the week, far fewer matches were registered.

Within urban environments, BTRs placed along the same freeway have the capacity to produce sufficient numbers of matches to continuously compute fine-grained 5-minute travel times. In contrast, due to lower overall traffic volumes in rural areas, fewer travel time matches are generated, and this capacity is therefore reduced. Even so, at least in the area around Lake Tahoe, sufficient matches were generated to compute fine-grained travel times during peak days.

The research team’s results also indicate that a single BTR can typically be used to monitor bidirectional traffic. Although the number of lanes at each reader used as part of this case study ranged from 2 to 10, data indicate that each reader was able to capture traffic in most, if not all, of the lanes at its location.

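As a quick arithmetic check, the 2% figure quoted above is consistent with the 6,642 Friday MAC address reads listed for Vallejo in Table C.50:

```python
# Checking the share quoted in the findings: 132 vehicles reidentified out of
# 6,642 Friday MAC address reads at Vallejo (both values from Table C.50).
reads_at_vallejo = 6642
matched_at_placerville = 132
print(round(100 * matched_at_placerville / reads_at_vallejo, 1))  # 2.0
```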
Finally, this use case enabled the research team to compare hit rates and matching percentages for readers located in both urban and rural environments. In this study, as is typical for urban versus rural settings, the biggest differences between the readers deployed on I-5 and US-50 included the number of lanes at each reader, the distance between readers, and the traffic volumes at each reader. The I-5 readers were all configured to monitor traffic across six or more lanes of bidirectional traffic. Although three of the four I-5 readers were placed adjacent to the northbound lanes, the content of Tables C.50 and C.51 demonstrates that readers generate significant hit rates in both directions of travel. A similar situation exists for the Gloria reader, deployed adjacent to the southbound lanes. This demonstrates that BTRs have the potential to monitor wide bidirectional freeway segments.

Tables C.50 and C.51 also indicate that directional traffic patterns have a significant degree of influence on Bluetooth device-matching patterns. For example, on I-5, where northbound and southbound traffic volumes are comparable throughout the day, none of the readers reidentify more than 50% of the hits from upstream readers. In contrast, as Table C.51 shows, 68% of the hits from the Twin Bridges reader on rural US-50 are reidentified at the Placerville reader (39 miles away). These higher rural matching percentages, despite longer distances between readers, are in part because US-50 exhibits much stronger directional trends (e.g., eastbound US-50 carries the majority of traffic on Friday afternoons and evenings). Despite this, the volumes of Bluetooth reads along I-5 are several times greater than those along US-50, facilitating the calculation of more granular travel time reliability metrics.

TABLE C.50. BTR READER SUMMARY, I-5 NORTHBOUND TO US-50 EASTBOUND FOR FRIDAYS

Reader | Mounting Type | No. of Lanes | MAC ID Reads | Traffic Sampled (%) | Distance to Next Reader (mi) | No. of Exits Between Readers | Hits Reidentified Downstream (%)
Pocket (I-5) | NB roadside controller cabinet | 3 | 4,208 | 7.2 | 0.9 | 1 | 43
Florin (I-5) | NB roadside controller cabinet | 4 | 5,402 | 8.1 | 1.1 | 0 | 47
Gloria (I-5) | SB roadside controller cabinet | 4 | 5,843 | 9.6 | 4 | 2 | 45
Vallejo (I-5) | NB roadside controller cabinet | 5 | 6,642 | 7.7 | 46 | 27 | 2a
Placerville (US-50) | EB roadside controller cabinet | 1 | 1,676 | NA | 39 | 7 | 34
Twin Bridges (US-50) | EB roadside controller cabinet | 1 | 882 | NA | 8 | 3 | 55
Echo Summit (US-50) | WB roadside controller cabinet | 1 | 771 | NA | 4 | 2 | 74
Meyers (US-50) | EB roadside controller cabinet | 1 | 1,059 | NA | N/A | N/A | N/A
a Low reidentification rate.
Note: Time frame is noon to 9 p.m. NA = not available. N/A = not applicable—no downstream reader.

TABLE C.51. BTR READER SUMMARY, US-50 WESTBOUND TO I-5 SOUTHBOUND FOR SUNDAYS

Reader | Mounting Type | No. of Lanes | No. of MAC ID Reads | Traffic Sampled (%) | Distance to Next Reader (mi) | No. of Exits Between Readers | Hits Reidentified Downstream (%)
Meyers (US-50) | EB roadside controller cabinet | 1 | 936 | NA | 4 | 2 | 52
Echo Summit (US-50) | WB roadside controller cabinet | 1 | 771 | NA | 8 | 3 | 66
Twin Bridges (US-50) | EB roadside controller cabinet | 1 | 968 | NA | 39 | 7 | 68
Placerville (US-50) | EB roadside controller cabinet | 1 | 1,495 | NA | 46 | 27 | 6a
Vallejo (I-5) | NB roadside controller cabinet | 5 | 4,940 | 8.1 | 4 | 2 | 45
Gloria (I-5) | SB roadside controller cabinet | 4 | 4,352 | 8.7 | 1 | 1 | 48
Florin (I-5) | NB roadside controller cabinet | 4 | 4,003 | 8.5 | 1 | 1 | 42
Pocket (I-5) | NB roadside controller cabinet | 3 | 3,208 | 7.5 | N/A | N/A | N/A
a Low reidentification rate.
Note: Time frame is noon to 9 p.m. NA = not available. N/A = not applicable—no downstream reader.

Finally, each of the eight BTRs from which data were collected as part of this use case was mounted on a roadside controller cabinet and used directional antennae to focus signal strength toward the roadway. The fact that each of the readers had high hit rates and produced significant matching percentages with downstream readers demonstrates that this is an effective configuration for capturing multilane, bidirectional

traffic. However, the team also found that the readers are most effectively deployed in locations where they only monitor traffic in the mainline lanes. This is particularly a problem with readers placed adjacent to on-ramps or off-ramps, such as the readers on I-5 at Florin and Pocket, as the travel time computed between a vehicle’s time stamp at the on-ramp and its time stamp at the next downstream reader will be higher than the true travel time on the freeway; this is especially true if the ramp is congested or has ramp metering. For agencies using Bluetooth networks already in the field, it is important to determine which readers may be monitoring ramp traffic so that these travel time biases can be understood and mitigated. The low reidentification rates for Vallejo (2%) in Table C.50 and Placerville (6%) in Table C.51 are primarily the result of the next downstream reader for each being 46 miles away and on an adjoining roadway: I-5 to US-50 and US-50 to I-5, respectively.

Using Bluetooth and Electronic Toll Collection Data to Analyze Travel Time Reliability in a Rural Setting

This use case aims to quantify the impact of adverse weather– and demand-related conditions on travel time reliability using data derived from Bluetooth- and ETC-based systems deployed in rural areas.

At present, loop detectors provide the majority of transportation data used for highway analysis. These detectors must be embedded in the roadway and require regular quality checking and often costly maintenance. Bluetooth- and ETC-based systems, however, can be mounted on existing infrastructure either overhanging or adjacent to the roadway, thus reducing the costs of deployment, reconfiguration, repair, and replacement.
These systems work by scanning compatible devices deployed inside passing vehicles for unique identification information (i.e., MAC IDs for Bluetooth devices and tag ID numbers for toll transponders). If multiple readers detect identification information for a uniquely identifiable device, a record of that vehicle’s travel time can be constructed. Because these devices do not need to be permanently fixed on the roadway, they offer a more flexible and often more cost-effective method for detection, especially in rural locations.

To examine travel time reliability within the context of this use case, methods were developed to generate PDFs from large quantities of travel time data representing different operating conditions. To facilitate this analysis, travel time and flow data from ETC readers deployed on I-80 westbound and BTRs deployed on US-50 eastbound and US-50 westbound were obtained from PeMS and compared with weather data from local surface observation stations. PDFs were subsequently constructed to reflect reliability conditions along these routes during adverse weather conditions, as well as according to time of day and day of the week. Practical data quality issues specific to Bluetooth and ETC data were also explored.

This use case has value to a broad range of user groups. Transportation agencies with data collection needs in rural areas will benefit from seeing a travel time reliability analysis of real-world data obtained from Bluetooth and ETC devices. This type of data is still fairly uncommon in practice, and this use case should help to demystify it, demonstrating how such data sets compare with more commonly available types of

traffic data. Operators and analysts will benefit from a discussion of the quality and typical characteristics of this type of data. Transportation agencies with specific data needs and cost constraints seeking a flexible sensor deployment may find Bluetooth- or ETC-based systems more attractive based on the results of this analysis.

This use case also has value to operators who are interested in the effects of varying weather conditions and weekend travel on travel time reliability within a rural setting. Understanding the historical effects of different weather and demand conditions on the performance of a given roadway enables operators to respond to similar conditions as they occur; for example, the expected range of travel times could be posted on dynamic message signs located at key decision points.

Use Case Analysis Sites

Two sites were used in the validation of this use case to compare similar phenomena in different locations, as well as to highlight the different types of data available in this region (see Figure C.144). Site 1 is a 45.2-mile stretch of primarily four-lane divided highway along I-80 westbound with an estimated free-flow travel time of 46 minutes. It begins east of the Truckee–Tahoe Airport weather station and ends just after I-80 exits the western border of the Tahoe National Forest. This roadway is instrumented with ETC readers mounted on sign structures overhanging the roadway. Figure C.145 shows an example of Site 1.

Figure C.144. Use case site map showing Site 1 (top) and Site 2 (bottom).

Figure C.145. Example of Site 1 on I-80.

Site 2 is a 10.8-mile stretch of two-lane highway along US-50 with an estimated free-flow travel time of 14 minutes. This site was examined in both the eastbound and westbound directions of travel. It approaches the South Lake Tahoe airport on its eastern end and terminates in the west just outside of Twin Bridges. Site 2 is instrumented with BTRs deployed along the side of the roadway. Figure C.146 shows an example of Site 2. Table C.52 provides details about the two sites.

TABLE C.52. SITE CHARACTERISTICS

Site No. | Highway | Distance (mi) | Estimated Travel Time (min) | Type
1 | I-80 WB | 45.2 | 46 | Four lane, divided
2 | US-50 EB and US-50 WB | 10.8 | 14 | Two lane

Figure C.146. Example of Site 2 on US-50.

These two sites were selected due to their strong weekend traffic patterns, as well as their proximity to local weather observation stations. They were made as short as possible (within the constraints of the detection infrastructure) to enable the research team to closely tie its analysis to the data generated by the weather stations, thus maximizing the relevance of the weather data. Both sites are rural and receive relatively little commute or intercity traffic during the week. However, Lake Tahoe is a popular weekend and holiday destination for residents of the Bay Area, which is just a 3.5-hour drive away. I-80 and US-50 are both popular routes to take to get to Lake Tahoe from the Bay Area, and they are known for their heavy weekend traffic as large numbers of people enter and leave the area at nearly the same time.

Analysis Methodology

The routes included as part of this use case were analyzed to determine the effects of weather and weekend travel conditions on travel time reliability. To do this, TT-PDFs that isolate certain operating scenarios (e.g., snow on a weekday) were constructed. Time-of-day, day-of-the-week, and weather conditions–based PDFs were generated for Site 1, and time-of-day and day-of-the-week conditions–based PDFs were generated for Site 2.

To begin the analysis, travel time statistics for 5-minute windows at both sites were obtained from PeMS. For Site 1, where weather conditions were considered, travel time data were matched with weather data from the nearby Automated Weather Observing System III (AWOS-III) surface observation station. Each 5-minute time interval was marked with its corresponding weather event (rain, snow, fog, or thunderstorm), if any; visibility distance (zero to 10 miles); and precipitation (in inches). For Site 2, where only weekend travel effects were considered, intervals were grouped by day type: travel times were labeled as belonging to a weekday (Monday through Thursday), a Friday, a Saturday, a Sunday, or a holiday.

With the travel time data collected and labeled, an effort was made to determine which data points, if any, should be discarded. As discussed above and explored further in the data section of this use case, travel time data obtained from Bluetooth and ETC readers can, depending on a number of variables, contain artificially long vehicle travel times.

The travel time data, labeled with weather condition and day of the week, were used to construct PDFs of travel times under varying operating conditions. The effects of weather and weekend travel can be seen in the differences in travel time variability, as indicated in the PDFs reflecting different conditions. Finally, aggregate travel time reliability statistics such as the 95th percentile travel time were computed for different conditions.

Data Collected

To complete this use case, Bluetooth and ETC data were retrieved from PeMS for the two sites. ETC data were obtained at Site 1 and Bluetooth data at Site 2 (see Table C.53). To benefit from the availability of this rich data set, all available data were used in both cases. This was particularly desirable, as the available data do not span seasonal changes. The data obtained from PeMS included the following:

• Minimum travel time;
• Average travel time;
• Maximum travel time;
• 25th, 50th, and 75th percentile travel times; and
• Flow (number of vehicles observed during the window).

Each of these metrics was collected for a series of consecutive 5-minute windows. Not all 5-minute windows during the periods of observation contained usable data, so some gaps exist in the data.

TABLE C.53. DATA SET DESCRIPTIONS BY SITE
Roadway  | Data Type | Date Range                   | Data Completeness (%) | Quantity of Data
I-80 WB  | ETC       | April 25 to June 29, 2011    | 59.5                  | 11,071 points
US-50 WB | Bluetooth | January 28 to April 21, 2011 | 35.9                  | 8,576 points
US-50 EB | Bluetooth | January 28 to April 21, 2011 | 38.9                  | 9,376 points

To examine the effect of weather on travel times across Site 1, weather data were obtained from the nearby AWOS-III surface observation station located at the Truckee–Tahoe Airport. These data were available in windows ranging between 5 and 20 minutes and were sufficiently fine grained to match well with the 5-minute travel times. Here, the research team focused on optional event tags (fog, rain, snow, or thunderstorm), visibility (zero to 10 miles), and precipitation (in.).

After the weather and travel time data sets were obtained, the travel time data were quality checked to ensure no erroneous data points were included. As discussed above, Bluetooth- and ETC-based data collection systems are susceptible to data errors due to the way they measure travel times. These detectors work by recording a time stamp and the MAC address or toll tag ID of vehicles that pass them on the roadway. These identification data are matched between detectors such that a vehicle passing multiple BTRs produces a travel time for that link. However, if that vehicle stops somewhere along the roadway after passing the first BTR before it continues on to the second, an artificially large travel time will be seen. Similarly, if a vehicle visits the first BTR, then travels to an adjacent, but unmonitored roadway before returning to the monitored roadway and passing the second BTR, the travel time for that trip will be artificially large. In addition, vehicles traveling past the same BTR more than once in different directions can also cause data errors when readers are capable of measuring multiple directions of travel simultaneously.

To prepare the raw data for analysis, these inaccurate travel times should typically be removed individually. The data set for this use case was composed of aggregate statistics that had already been computed based on all available travel time data, including data that were potentially inaccurate. To prevent the analysis from being skewed

by those values, the research team used the median travel time for each 5-minute interval. In this case, working with the median as opposed to the mean has a significant effect on the analysis, reducing the appearance of implausible extreme values. This technique works well for periods of time with significant traffic flow because unreasonably long travel times are muted as the sample size increases. However, the problem remains when the sample size is small, as a time interval containing a single extreme value will still result in an unreasonable median. As can be seen in Figure C.147, which represents conditions for Site 1, virtually all long travel times in the data occurred during low-volume time periods. The flows shown in Figure C.147 are not sustained, but rather 5-minute aggregates.

However, this does not mean that poorly represented time intervals should be discarded. Although it is true that median travel times from sample intervals with larger numbers of vehicles should better conform to the expected value, and medians of smaller samples are more likely to contain outliers, this phenomenon is also representative of the fundamental behavior of traffic: both high (uncongested) and low (congested) speeds are seen at low flows. Thus, points from sparsely populated time intervals should not necessarily be discarded on those grounds alone (as long as the points can be assumed to be valid). Plotting the data for Site 1 from Figure C.147 another way yields an empirical fundamental diagram for speed and flow (see Figure C.148). The expected triangle shape can be seen with congested conditions represented by the points sloping down and to the left from the peak flow (seen around 60 mph) and uncongested conditions represented by the points sloping down and to the right from the peak flow (with speeds above 60 mph).
When viewed like this, all points appear to be valid, because they are behaving according to basic traffic flow theory. Because longer travel times can be reasonably expected for time periods with few observations (i.e., during congested flow), it was determined that for the purposes of this use case no data points would be excluded.

Figure C.147. Travel time versus flow on I-80 (Site 1).
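The sensitivity argument behind using the median can be made concrete with a small sketch (the numbers below are hypothetical, not from the study data): at healthy sample sizes the median mutes an isolated extreme value that would drag the mean upward, but in a sparse interval a lone outlier simply becomes the median.

```python
import statistics

# Hypothetical 5-minute interval: three ordinary trips plus one vehicle
# that stopped between readers, producing an artificially long travel time.
travel_times = [31.0, 32.0, 33.0, 240.0]  # minutes

mean_tt = statistics.mean(travel_times)      # pulled far upward by the outlier
median_tt = statistics.median(travel_times)  # barely affected
print(f"mean = {mean_tt:.1f} min, median = {median_tt:.1f} min")

# With very few observations the protection disappears: a single extreme
# value in a sparse interval is itself the median.
print(statistics.median([240.0]))
```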

Travel Time Analysis: Site 1

Site 1, which lies on I-80 westbound and begins just north of Lake Tahoe, is known to receive heavy traffic from vehicles returning to the Bay Area from weekend trips on Sunday evenings. The breakdown of travel times by day of the week from April 25 to June 29, 2011, shown in Figure C.149 indicates that the Sunday 95th percentile travel time exceeds that of a normal weekday by approximately 34%. This difference indicates increased travel time unreliability on Sundays, but the rest of the week appears fairly consistent. Because Sundays have a significantly different pattern of traffic, they were considered separately as part of the research team's weather analysis.

Figure C.148. Speed versus flow on I-80 (Site 1).

Figure C.149. 95th percentile and median travel times for Site 1.

Having assessed travel time reliability trends over the entire week, the team next examined travel times within individual days to determine if it was necessary to handle morning and afternoon peak conditions separately during the analysis. To facilitate this, the distribution of travel times for each 5-minute interval over the course of a full day was plotted as shown in Figure C.150. This figure demonstrates that no significant time-of-day trends exist on this route. If the typical day had shown some periodicity, it would have been necessary to examine weather effects during peak and off-peak hours separately. However, as travel times appear to be consistently between 30 and 45 minutes throughout the day, the research team was able to conduct its weather-related travel time reliability analysis without accounting for differences between daily peak conditions.

Figure C.150 also indicates the presence of significantly longer travel times (hovering near the top of the chart). As these travel times do not appear to follow any time-of-day trends, it was surmised that they might be the result of adverse weather conditions. The team explored this idea further by generating PDFs of travel times collected during varying weather conditions. These PDFs were built by placing each 5-minute median travel time into a bin, each of which was 5 minutes wide.

To define discrete weather conditions, the team adopted five labeled event categories (baseline, snow, rain, fog, and thunderstorm). The team then divided the quantitative measures of precipitation and visibility into categories. Precipitation was broken down into no precipitation and some precipitation cases. Visibility was subdivided into low-visibility, medium-visibility, and high-visibility cases corresponding to 0 to 3, 3 to 7, and 7 to 10 miles of visibility, respectively. The event conditions were all mutually exclusive, as were the visibility and precipitation categories. The baseline event condition does not necessarily mean that driving conditions were ideal, but that no event was associated with that time (e.g., there may have been precipitation or low visibility).
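The binning step used to build these PDFs can be sketched as follows; the record format and condition labels are illustrative assumptions, not the study's actual data structures.

```python
from collections import Counter, defaultdict

BIN_WIDTH = 5  # minutes, matching the 5-minute-wide bins used in the study

def bin_travel_times(records):
    """Count 5-minute median travel times into 5-minute-wide bins,
    grouped by weather condition (a discrete TT-PDF per condition)."""
    pdfs = defaultdict(Counter)
    for condition, tt in records:
        bin_start = int(tt // BIN_WIDTH) * BIN_WIDTH  # e.g., 37 min falls in the 35-40 bin
        pdfs[condition][bin_start] += 1
    return pdfs

# Hypothetical labeled observations: (condition, median travel time in minutes)
records = [("baseline", 32), ("baseline", 34), ("baseline", 41),
           ("snow", 58), ("snow", 63), ("snow", 61)]
pdfs = bin_travel_times(records)
print(dict(pdfs["baseline"]))  # {30: 2, 40: 1}
print(dict(pdfs["snow"]))      # {55: 1, 60: 2}
```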
The resulting weather PDFs can be seen in Figure C.151, and their effects on travel time are summarized in Figure C.152. The scale of the vertical axis in Figure C.151 is not consistent across each of the graphs because of the variable quantities of data available for each condition.

Figure C.150. Site 1 time-of-day travel time distribution.

Figure C.151. Site 1 TT-PDFs during various weather conditions (panels for Baseline, Rain, Snow, Fog, Thunderstorm, No Precipitation, Some Precipitation, and Low, Medium, and High Visibility; each plots frequency against travel time in minutes).

It is clear from Figure C.152 that snow, low-to-moderate visibility, and precipitation have a measurable effect on travel time reliability. The 95th percentile travel times during those weather conditions are significantly higher than their median travel times, indicating that the distribution of travel times is skewed toward the high end.

The team also explored these data by assessing which conditions were present during the longest travel times occurring on this route. The results of this analysis are presented in Table C.54. This perspective complements that of the PDFs displayed above by revealing that adverse weather events are present during many more long travel times than short travel times. In fact, the research team's analysis indicated that if a travel time exceeded the 95th percentile for this route, there was nearly a 50% chance that it was snowing, despite snow accounting for only 5% of all trips.

Figure C.152. Site 1 summary of weather effects.
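The kind of tail attribution reported in Table C.54 can be reproduced on synthetic data. The distributions and parameters below are invented purely for illustration, so the exact percentages will differ from the study's; the point is the mechanism by which a rare condition dominates the extreme tail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration: snow is rare (about 5% of intervals) but shifts
# travel times upward, so it dominates the extreme tail. Parameters are
# invented; they are not fitted to the study data.
n = 10_000
is_snow = rng.random(n) < 0.05
tt = np.where(is_snow,
              rng.normal(70, 25, n),  # snowy intervals: longer, more variable
              rng.normal(38, 6, n))   # all other intervals

p95 = np.percentile(tt, 95)
tail = tt > p95
print(f"snow share overall: {is_snow.mean():.1%}")
print(f"snow share when travel time > 95th percentile: {is_snow[tail].mean():.1%}")
```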

TABLE C.54. WEATHER CONDITIONS ACTIVE DURING LONG TRAVEL TIMES
Condition          | Active | Active When TT > 85th Percentile | Active When TT > 95th Percentile
No precipitation   | 90.3%  | 84.3%  | 76.6%
Precipitation      | 9.7%   | 15.7%  | 23.4%
Baseline           | 90.7%  | 74.9%  | 54.7%
Snow event         | 5.8%   | 23.0%  | 45.3%
Rain event         | 1.6%   | 1.6%   | 0.00%
Fog event          | 1.7%   | 0.5%   | 0.00%
Thunderstorm event | 0.3%   | 0.00%  | 0.00%
High visibility    | 84.7%  | 67.5%  | 48.4%
Medium visibility  | 4.8%   | 11.0%  | 11.0%
Low visibility     | 3.8%   | 14.1%  | 31.3%

Travel Time Analysis: Site 2

Site 2 is similar to Site 1 in that it is subject to periodic spikes in demand from weekend travel. However, whereas Site 1 is a four-lane divided highway, Site 2 is a two-lane highway (with only intermittent passing opportunities) and thus not as well equipped to handle the additional demand. Site 2 was equipped with Bluetooth detectors that were used to construct travel times in a similar manner to the ETC readers used for Site 1.

The goal of the Site 2 travel time analysis was to determine the effects of weekend travel on this site. The team began by examining a typical day on US-50 to check for the presence of morning or afternoon peak conditions, which would have to be controlled for as part of a day-of-week analysis. As at Site 1, 5-minute median travel times were obtained from PeMS. The time-of-day average of these median travel times is presented in Figure C.153, which does not appear to show any true peak conditions. Although the maximum daily travel time appears to occur at around 5:00 a.m., this does not appear to be a true morning peak, likely being attributable to artificially high travel times occurring during low-volume periods as discussed in the data collection section. This assessment is supported by the average daily flow data displayed in Figure C.154. Due to the absence of daily peak conditions at this site, the research team decided to consider each day as a whole.
If strong peak conditions had been observed, it would have been necessary to develop travel time distributions for peak and off-peak conditions separately. If this site were indeed subject to heavy weekend demand, it would be expected that travel times would be less reliable during the weekend. To explore whether the data supported this possibility, the team first plotted the average vehicle flow over the course of the week for each direction of traffic for this site. Figures C.155 and C.156 show that weekend demand dominates the traffic profile for this section of roadway. As a result, one would expect travel time unreliability to follow a similar pattern.

Figure C.153. US-50 westbound average travel time by time of day (Site 2).

Figure C.154. US-50 westbound average flow by time of day (Site 2).

To visualize travel time unreliability for this site, the research team constructed a travel time density plot, shown in Figure C.157, representing a full week for US-50 eastbound. This figure is a collection of PDFs for each 5-minute period over the course of the entire week. Since this figure represents travel times in the eastbound direction, one would expect more unreliability on Friday and Saturday as weekend travelers are making their way to Lake Tahoe from the Bay Area. The PDF appears to confirm this, as it can be seen at a glance that Friday is the day with the most severe unreliability, with Sunday through Thursday exhibiting much more consistent travel times in comparison.

Figure C.155. Weekly flow on US-50 eastbound (Site 2).

Figure C.156. Weekly flow on US-50 westbound (Site 2).

Figure C.157. Week-long distribution of travel times on US-50 eastbound (Site 2).

If this variation in travel time reliability over the course of the week is the result of weekend travel patterns and not adverse weather or some other factor, one would expect to see a complementary trend on US-50 westbound. Sunday should be the least reliable day in this direction of travel as heavy traffic causes unreliability for travelers returning to the Bay Area from Lake Tahoe at the end of the weekend. After constructing PDFs for the opposite direction of travel, one sees that this is in fact supported by the data, as shown in Figure C.158.

The travel time variability by weekday on US-50 can be expressed in terms of the 95th percentile of travel time. This is presented for both directions of travel, along with the mean by day, in Table C.55. Weekend travel patterns appear in the longer 95th percentile travel times seen on Friday in the eastbound direction and Sunday in the westbound direction.
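Per-day statistics of the kind shown in Table C.55 can be computed with a short routine; the day labels and travel times below are hypothetical, chosen only to show the aggregation.

```python
import numpy as np
from collections import defaultdict

def reliability_by_day(observations):
    """Aggregate labeled travel times into per-day mean and 95th percentile,
    the two statistics reported in Table C.55."""
    by_day = defaultdict(list)
    for day, tt in observations:
        by_day[day].append(tt)
    return {day: (float(np.mean(tts)), float(np.percentile(tts, 95)))
            for day, tts in by_day.items()}

# Hypothetical eastbound observations: (day label, travel time in minutes)
obs = [("Fri", 12), ("Fri", 14), ("Fri", 45),
       ("Tue", 8), ("Tue", 9), ("Tue", 10)]
stats = reliability_by_day(obs)
for day, (mean_tt, p95_tt) in sorted(stats.items()):
    print(f"{day}: mean {mean_tt:.1f} min, 95th percentile {p95_tt:.1f} min")
```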

Figure C.158. Week-long distribution of travel times on US-50 westbound (Site 2).

TABLE C.55. TRAVEL TIME RELIABILITY BY WEEKDAY ON US-50 (SITE 2), IN MINUTES
Travel Time Metric             | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
EB mean travel time            | 9.6    | 8.1    | 8.9     | 8.3       | 12.4     | 14.3   | 10.4
EB 95th percentile travel time | 25.4   | 22.0   | 18.2    | 23.7      | 36.0     | 41.1   | 32.2
WB mean travel time            | 12.7   | 11.5   | 10.7    | 11.3      | 14.4     | 12.3   | 13.1
WB 95th percentile travel time | 40.4   | 30.7   | 20.8    | 30.8      | 41.0     | 37.7   | 31.0

PRIVACY CONSIDERATIONS

Innovations in data collection technology are providing exciting opportunities in the area of roadway travel time measurement. At the same time, use of these technologies is not without challenges, some technical, others related to protecting the confidentiality of personal information contained in ETC toll tag and Bluetooth mobile device data sets. Because an individual driver's privacy may potentially be compromised when others have the ability to track the driver's movements across the public roadway network, users of these data, both public and private, have developed a variety of plans and programs to ensure that data gathered in support of the generation of roadway travel times cannot be linked back to individuals. Recognizing that the data collection technologies described in this case study have the potential to raise public concerns

over privacy, this section provides examples of the types of privacy protection policies and procedures currently in use by both public agencies and private sector companies to guard against the misuse of drivers' personal information.

Electronic Toll Tag–Based Data Collection

Overview of Personal Privacy Concerns

When used for toll collection purposes, toll transponders are automatically identified whenever they pass within the detection zone of a compatible ETC reader. Every time this occurs, the ETC reader prompts the tolling system to deduct a predetermined amount of money from the prepaid debit account associated with that transponder's unique ID number. Recognizing that this technology would make it possible to track the path of each transponder-enabled vehicle between successive ETC readers, a number of agencies have deployed supplemental (nonrevenue generating) ETC readers and back-office data analysis systems to facilitate the calculation of point-to-point travel times based on these data.

Although it is not instantaneous, a direct connection exists between a toll transponder's unique ID and the personal information of the transponder user that is stored in agency databases. The existence of this connection creates concerns for some users stemming from the potential loss of anonymity associated with their travel behavior.

Policies and Procedures to Protect the Privacy of Electronic Toll Tag Transponder Users

Two of the agencies best known for making use of anonymous ETC transponder data in support of travel time data collection are Houston TranStar (Houston, Texas) and the Metropolitan Transportation Commission (MTC) of the San Francisco Bay Area. Although both agencies have made significant efforts to protect the personal information of ETC toll tag users, only MTC has developed detailed guidelines concerning the use, archiving, and dissemination of these data.
Houston TranStar

Houston was the first city in the United States to apply ETC-based tolling technology to the collection of data concerning travel times and average speeds. The toll tag data on which this system is based is collected from ETC reader stations deployed at 1- to 5-mile intervals over 700 miles of Houston area roads. Traffic management center staff use this system to detect congestion along area freeways and high-occupancy vehicle lanes; these data are also provided to the public via media reports, travel times posted to roadside CMS, and the Houston TranStar website.

In an effort to protect the privacy of the drivers from whom travel time data are being collected, TranStar has configured its ETC readers to store only the last four digits of each toll tag's ID number. Truncating ID numbers in this way allows the agency's automated systems to track, but not identify, individual vehicles as they move across the data collection network. TranStar staff are acutely aware of drivers' concerns regarding the protection of their personal information and have made efforts to inform the public that not only do they collect just a portion of each toll tag's ID number but also that none of the information concerning the movement of individual transponders is available for use by agency staff or law enforcement.

Metropolitan Transportation Commission

In support of its 511 traveler information service, MTC operates a travel time data collection system based on information collected from the region's FasTrak toll system. As part of this effort, MTC takes the following steps to ensure the protection of toll tag users' personal information (7):

• Encryption software in the central software system encrypts each toll tag ID before any other processing is carried out to ensure that the toll tags are treated anonymously.
• Encrypted toll tag IDs are retained for no longer than 24 hours before being discarded. No historical database of encrypted IDs is maintained beyond that time period.

In addition to establishing the guidelines described above concerning the management of toll tag data, MTC has also developed the following principles regarding the protection of personal privacy (7):

1. All traffic data collection activities will be implemented in a manner consistent with federal and California laws governing an individual's right to privacy.
2. The tag user's consent will be secured before the operation of any data collection system based on toll tags.
3. No information about, or that is traceable to, any individual person will be collected, stored, or manipulated.
4. Information on the data collection, aggregation, and storage practices will be available at the 511.org website, which will include traffic data collection methods, privacy policy, and full disclosure on the use of the data.
5. Members of the public will be given the ability to contact the program to discuss any privacy questions or concerns.
6. All recipients of the data shall comply with these privacy principles.
7. An annual evaluation will be conducted to assure that individual privacy is protected.
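The first two MTC safeguards, one-way encryption of tag IDs before any other processing and a 24-hour retention limit, can be sketched as below. The key, data structures, and function names are illustrative assumptions, not MTC's actual implementation.

```python
import hashlib
import hmac
import time

SITE_KEY = b"rotate-me-periodically"  # assumed secret key; illustrative only
RETENTION_SECONDS = 24 * 60 * 60      # the 24-hour retention limit

def anonymize_tag(tag_id: str) -> str:
    """Keyed one-way hash applied before any other processing: the same tag
    still matches between readers, but the raw ID is never stored."""
    return hmac.new(SITE_KEY, tag_id.encode(), hashlib.sha256).hexdigest()

def purge_expired(store: dict, now: float) -> None:
    """Discard encrypted IDs older than the retention window."""
    for token in [t for t, seen in store.items() if now - seen > RETENTION_SECONDS]:
        del store[token]

store = {}
store[anonymize_tag("TAG-12345")] = time.time() - 2 * RETENTION_SECONDS  # stale
store[anonymize_tag("TAG-67890")] = time.time()                          # fresh
purge_expired(store, time.time())
print(len(store))  # only the fresh record survives the purge
```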
Although MTC provides the third-party contractors who operate its 511 and related services with access to the toll tag data collected as part of this system, as Items 6 and 7 indicate, these firms are required to observe all of MTC's privacy principles. They are also subject to an annual evaluation to verify their compliance.

Bluetooth-Based Data Collection

Overview of Personal Privacy Concerns

Like ETC-based systems, Bluetooth-based travel time data collection systems operate via the reidentification of mobile device ID data at successive locations along a roadway. However, although other technologies used to calculate roadway travel times based on the movement of probe vehicles (e.g., toll tag– and license plate reader–based systems) have the potential, if abused, to directly link a specific user to the movement of his or her vehicle, identification of individuals based on their Bluetooth signature

(i.e., MAC address) is much less straightforward. In theory, if the MAC address of the mobile device has been set by its manufacturer, the possibility exists, however remote, for a link to be established between the product part number and its owner via a product registration database or product warranty. Even so, the MAC addresses of mobile devices, though unique, are not linked to specific individuals or vehicles via any type of central database or user account.

Despite these facts, public perception regarding this method of data collection varies widely and has the potential to interfere with its implementation. As a result, users of this technology have implemented a range of procedures to minimize the possibility of infringing on users' privacy.

Policies and Procedures to Protect the Privacy of Users' Bluetooth ID Data

Two of the entities deploying Bluetooth-based data collection technologies for the purpose of calculating roadway travel times are Post Oak Traffic Systems, which uses technology developed at the Texas Transportation Institute, and Traffax, which uses technology developed at the University of Maryland.

Users of Bluetooth-based data collection technologies stress that the MAC addresses collected by their systems are not directly associated with a specific user and do not contain any personal data or information that could easily be used to identify or track an individual person's whereabouts. Nevertheless, all recommend taking additional steps to further ensure that the information collected from individual Bluetooth devices is kept as anonymous as possible.

Post Oak Traffic Systems

Post Oak Traffic Systems uses various techniques to help protect the personal privacy of drivers. Only the Bluetooth device information necessary to facilitate the calculation of travel times (MAC address, device reader location, and time stamp) is polled.
Although other data can be accessed as part of the Bluetooth device polling process, such as device name and packets of information concerning data exchanged between a mobile phone and its associated Bluetooth headset, Post Oak staff recommend only collecting the data absolutely necessary to calculate segment travel times. To further address potential privacy concerns, Post Oak field processors are programmed to encrypt all Bluetooth ID data immediately on receipt. Doing so ensures that the actual device ID is not sent or stored anywhere.

Traffax

Traffax company staff recommend implementing the following additional measures concerning the retention, encryption, and dissemination of Bluetooth MAC address data to ensure that no unauthorized use of data occurs (8):

• Destroy or encrypt any base-level MAC address information after processing.
• Use industry standard encryption and network security. Proper security protocols, passwords, encryption, and other methods should be incorporated into the data systems that store and process the MAC address data.

• Use data processing safeguards (encryption and randomization) to prevent the recovery of unique MAC addresses:
— Encryption methods transform MAC address data (at the sensor level) into an output form that requires special knowledge (such as an encryption key) to recover the original information. This activity preserves the uniqueness of the ID so that matching can still be performed without risking exposure of actual device ID data.
— Randomization methods deliberately degrade the data such that individual observations are no longer globally unique, and the ability to track individuals based on their MAC addresses becomes theoretically impossible. A simple example of this would be to truncate the final three characters of the MAC ID.

All of the privacy protection methods recommended by Traffax are implemented at the sensor level, not at the central processing station. This practice makes it virtually impossible to obtain the complete and globally unique MAC address of any particular device.

Application of Privacy Principles

It has been amply demonstrated that travel time data collection technologies based on device reidentification (e.g., ETC toll tags and Bluetooth devices) may potentially be abused in ways that would cause significant privacy-related concerns. Although this section of the case study has reviewed a number of techniques being used to ensure that drivers' anonymity is preserved, long-term acceptance of these technologies will ultimately rely on maintenance of the public's trust. To that end, the Intelligent Transportation Society of America has established a set of Fair Information and Privacy Principles (9) aimed at safeguarding individual privacy within the context of the deployment and operation of intelligent transportation systems.
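The encryption and randomization safeguards described above can be sketched as follows; the helper names, example address, and salt are illustrative assumptions, not Traffax's actual code.

```python
import hashlib

def truncate_mac(mac: str, drop: int = 3) -> str:
    """Randomization by truncation: drop the final characters so the
    observation is no longer globally unique."""
    compact = mac.replace(":", "").upper()
    return compact[:-drop]

def encrypt_mac(mac: str, salt: bytes) -> str:
    """Sensor-level one-way transform: preserves uniqueness so matching
    between readers still works, without exposing the raw address."""
    return hashlib.sha256(salt + mac.encode()).hexdigest()

mac = "00:1A:7D:DA:71:13"           # example address, not a real observation
print(truncate_mac(mac))            # '001A7DDA7', shared by many devices
token = encrypt_mac(mac, b"per-deployment-salt")
print(token == encrypt_mac(mac, b"per-deployment-salt"))  # True: still matchable
```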
Although advisory in nature, these principles are intended to act as guidelines for use by public agencies and private entities to protect drivers' right to privacy.

LESSONS LEARNED

Overview

The team selected the Lake Tahoe region located in Caltrans District 3 to provide an example of a rural transportation network with fairly sparse data collection infrastructure. The data used as part of this case study were generated by ETC readers on I-80 and Bluetooth-based data collection readers along I-5 and US-50 (see Figure C.104). These readers register the movement of vehicles equipped with FasTrak tags (Northern California's ETC system) and Bluetooth-based devices (e.g., smart phones) for the purpose of generating roadway travel times.

Methodological Experiments

This case study examined vehicle travel time calculation and reliability using Bluetooth and radio frequency ID reidentification systems. Various factors were identified that influence travel time reliability and guided the development of methods for processing

reidentification observations and calculating segment travel times. The results show that smart filtering and processing of Bluetooth data to better identify likely segment trips increase the quality of calculated segment travel time data. This approach helps preserve the integrity of the data set by retaining as many points as possible and basing decisions to discard points on the physical characteristics of the system, rather than their statistical qualities. To preserve the correctly measured variability of the data, it is important to only filter out unlikely trips. A more careful accounting procedure during the vehicle-identification stage allows later statistical filtering of the data to be milder, preserving more meaning. Filtering trips based on statistical properties is less desirable because criteria for eliminating points are not based on the physical system. If all of the data points in an interval are valid, it does not make sense to discard that entire interval simply because it does not contain many points. It is important to be aware of the interactions between preprocessing procedures.

Future research may explore other smarter methods for filtering out unlikely segment trips. For example, considering observations across the entire BTR network would be useful for identifying unlikely segment trips.

Various factors were found to influence vehicle segment travel times. For example, if the distance between BTRs is small, errors in calculated travel times may be significant, and methods for determining passage times must be carefully considered. Signal strength availability enables easy and accurate determination of passage times. Without signal strengths, using arrival and departure times for passage times may improve travel time accuracy.
This was found to be likely for BTR 10 based on the location of the reader relative to an intersection, the intersection configuration, and the short distance to the nearest BTR. Aggregating observations into visits was also found to be useful for distinguishing between trip and travel time for individual vehicles at a BTR.

Use Case Analysis

This case study explored four aspects of the ETC and BTR networks used in the Lake Tahoe case study: (a) detailed locations and mounting structures, (b) lanes and facilities monitored, (c) percentage of traffic sampled, and (d) percentage and number of vehicles reidentified between readers. As a whole, it showed that vehicle reidentification technologies are suitable for monitoring reliability in rural environments, provided that traffic volumes are high enough to generate a sufficient number of samples. For rural areas that have heavy recreational or event traffic, vehicle reidentification technologies such as ETC and Bluetooth can provide enough samples to calculate accurate average travel times at a fine granularity during high-traffic time periods. During these high-volume periods, vehicle reidentification technologies can be used to monitor travel times and reliability over long distances, such as between the rural region and nearby urban areas.

For agencies deploying vehicle reidentification monitoring networks, it is necessary to understand that the quality of the collected data is highly dependent on the decisions made during the design and installation process. The mounting position and antennae configuration of ETC readers affect the number of lanes sampled at a

given location. The positioning of BTRs, which have a large detection radius, dictates whether ramp, parallel facility, or multimodal traffic is also sampled. Such additional data can introduce large errors into travel time and reliability computations. In addition to choosing an optimal positioning of readers, it is also important to place and space them appropriately. Readers should be placed where they can provide travel time information for heavily traveled origins and destinations. Because vehicle reidentification readers can be easily moved, there are opportunities to do pilot tests to evaluate the quality, quantity, and value of collected data, so that the final deployment robustly supports the desired measures.

For agencies leveraging existing networks, it is important to fully understand the configuration of the network before using its data. At a minimum, this should include taking steps to verify that reader locations are correct and that the computed travel times and number of matches are reasonable given the distance and known traffic patterns between reader pairs. In locations where readers are closely spaced, computing reader hit rates and comparing between readers can help identify the reader most suited for monitoring travel times at a given location. Finally, evaluating the percentage and volume of matched reads between each reader pair by time of day and day of the week can indicate which time periods typically have sufficient matches to support average travel time computations at different granularities.

This case study also explored an approach for isolating and exploring the effects of weather and weekend travel on travel time reliability. As implemented here, the analysis should be fairly straightforward to replicate with data from a TTRMS such as PeMS and the appropriate weather data.
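The condition-based analysis described above, which partitions travel time observations by operating condition and summarizes each partition's distribution, can be sketched as follows. The condition labels and the use of a 95th percentile summary are illustrative choices, not the study's exact procedure.

```python
def percentile(values, p):
    """Linear-interpolation percentile of a list (p in [0, 100])."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def reliability_by_condition(observations):
    """Group (condition, travel_time) pairs and summarize each group with a
    sample count, a mean, and a 95th percentile ('planning time')."""
    groups = {}
    for condition, tt in observations:
        groups.setdefault(condition, []).append(tt)
    return {
        c: {"n": len(v), "mean": sum(v) / len(v), "p95": percentile(v, 95)}
        for c, v in groups.items()
    }

# Hypothetical travel times (minutes) tagged by weather condition.
obs = [("clear", 10.0), ("clear", 11.0), ("clear", 12.0),
       ("rain", 14.0), ("rain", 19.0), ("rain", 25.0)]
summary = reliability_by_condition(obs)
```

The spread between a condition's mean and its 95th percentile is one simple way to communicate the unreliability associated with that condition.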
The PDFs of travel times under different operating conditions consistently demonstrated the unreliability associated with low visibility, rain, and travel under high-demand conditions. This use case also described the travel time unreliability associated with such events in terms of 95th percentile travel time. Taken together, these tools should be valuable to planners, operators, and engineers interested in analyzing and communicating the travel time reliability of a section of roadway, especially one of a rural nature.

Finally, application of the research team's approach has revealed several insights into the nature of working with Bluetooth- and ETC-based sources of data. Specifically, due to the nature of how these data collection technologies calculate travel time, it is necessary to account for the artificially long travel times likely contained in the data set before conducting any analysis. Despite this shortcoming, both these technologies provide users with the potential to effectively assess roadway travel times and, consequently, reliability of travel in rural areas where the cost of deploying and maintaining spot-based sensors (e.g., loop detectors) makes their use impracticable.

Privacy Considerations

For either of the data collection technologies described in this case study to be successful over the long term, safeguards must be put in place to ensure that the privacy of individual drivers is protected. With this in mind, the team recommends that any probe data collection program implemented by public agencies or by private sector companies on their behalf should adhere to a predetermined set of privacy principles

(e.g., ITS America's Fair Information and Privacy Principles) aimed at maintaining the anonymity of specific users. In addition, any third-party data provider working for a public agency to implement a travel time data collection solution based on either of the technologies described in this case study should be required to submit an affidavit indicating that they will not use data collected on the agency's behalf in an inappropriate manner, including

• Renting, leasing, selling, or otherwise providing data to any entity without explicit written permission of the agency.
• Using data for any purposes other than those described as part of the project-specific requirements.
• Attempting to identify the ownership of individual vehicles or devices whose personal information is collected as part of the system's data collection infrastructure.

REFERENCES

1. Simi, B., and M. Heiman. Caltrans District 3 Regional Transportation Management Center (RTMC). Undated presentation. http://www.ntoctalks.com/webcast_archive/to_aug_6_09/to_aug_6_09_bs.pdf. Accessed Sept. 2, 2012.
2. California Department of Transportation. Corridor System Management Planning. Sacramento. http://www.dot.ca.gov/dist3/departments/planning/corridorplanning.html. Accessed Sept. 2, 2012.
3. Porter, J. D., D. S. Kim, M. E. Magana, P. Poocharoen, and C. A. Gutierrez Arriaga. Antenna Characterization for Bluetooth-Based Travel Time Data Collection. Presented at 90th Annual Meeting of the Transportation Research Board, Washington, D.C., 2011.
4. Haghani, A., M. Hamedi, K. F. Sadabadi, S. Young, and P. Tarnoff. Data Collection of Freeway Travel Time Ground Truth with Bluetooth Sensors. In Transportation Research Record: Journal of the Transportation Research Board, No. 2160, Transportation Research Board of the National Academies, Washington, D.C., 2010, pp. 60–68.
5. Bay Area Toll Authority.
FasTrak® Electronic Toll Collection (ETC). http://bata.mtc.ca.gov/tolls/fastrak.htm. Accessed Sept. 2, 2012.
6. Haas, R., M. Carter, E. Perry, J. Trombly, E. Bedsole, and R. Margiotta. iFlorida Model Deployment Evaluation Report. Report No. FHWA-HOP-08-050. Federal Highway Administration, Washington, D.C., Jan. 2009.
7. Metropolitan Transportation Commission. 511 Privacy Policy. Effective date Nov. 11, 2011; updated July 24, 2012. http://511.org/privacy.asp. Accessed Sept. 2, 2012.
8. Traffax, Inc. Privacy Concerns. http://www.traffaxinc.com/content/privacy-concerns. Accessed Sept. 2, 2012.
9. E-Squared Engineering. ITS Fair Information and Privacy Principles. http://www.e-squared.org/privacy.htm. Accessed Sept. 2, 2012.

Case Study 4

ATLANTA, GEORGIA

The team selected the Atlanta, Georgia, metropolitan region to provide an example of a mixed urban and suburban site that primarily relies on video detection cameras for real-time travel information. The main objectives of the Atlanta case study were to

• Demonstrate methods to resolve integration issues by using real-time data from Atlanta's traffic management system for travel time reliability monitoring.
• Compare probe data from a third-party provider with data reported by agency-owned infrastructure.
• Fuse the regime-estimation and nonrecurrent congestion analysis methodologies to inform on the reliability impacts of nonrecurrent congestion.

The monitoring system section details the reasons for selecting the Atlanta region as a case study and provides an overview of the region. It briefly summarizes agency monitoring practices, discusses the existing sensor network, and describes the software system that the team used to analyze the use cases. Specifically, it describes the steps and tasks that the research team completed to transfer data from the data collection systems into a travel time reliability monitoring system (TTRMS). The methodological advances section leverages methods developed in previous case studies to propose a framework for analyzing the impacts of nonrecurrent congestion on a given facility's operating travel time regimes. The use cases are less theoretical and more site specific. The first use case details the challenges of leveraging advanced traffic management system (ATMS) data to drive a TTRMS. The second use case compares the results of analyzing congestion using agency-owned infrastructure-based sensor data and third-party provider speed and travel time data. The third use case quantifies and explains the statistical difference between multiple sources of vehicle speed data.
The lessons learned section summarizes the lessons learned during this case study with regard to all aspects of travel time reliability monitoring: sensor systems, software systems, calculation methodology, and use. These lessons learned will be integrated into the final Guide for practitioners.

MONITORING SYSTEM

Site Overview

With a population of approximately five-and-a-half million people, Atlanta is the ninth largest metropolitan area in the United States. The layout of the freeway network follows a radial pattern. The core of the city is circled by a ring road (I-285, known locally as the Perimeter) that is intersected by various Interstates and state routes that radiate from downtown Atlanta into its outlying suburbs. Major radial highways include I-75 and I-85, which merge together to form a section of freeway called the Downtown Connector within the I-285 loop; I-20, which is the major east–west freeway in the region; and GA-400, which travels from north of downtown toward Alpharetta. A map of the major freeway facilities in the region is shown in Figure C.159.

Figure C.159. Map of Atlanta freeways.

The metropolitan

freeway network also contains 90 miles of high-occupancy vehicle (HOV) lanes that operate 24 hours a day, 7 days a week on the following facilities:

• I-75 inside the I-285 loop;
• The Downtown Connector;
• I-20 east of the Downtown Connector; and
• I-85 between Brookwood and SR-20.

In addition, on October 1, 2011, the Georgia Department of Transportation (GDOT) opened the state's first express lanes, which are operational on I-85 from I-285 to just south of the GA-365 split. The agency plans to deploy express lanes on I-75 north of Atlanta in 2015.

Atlanta's growing congestion is a major concern to GDOT and other agencies in the region. In 2008, the Atlanta region was granted $110 million by the U.S. DOT for a congestion-reduction demonstration program. Under this agreement, GDOT is partnering with the Georgia Regional Transportation Authority and the State Road and Tollway Authority to implement innovative strategies to alleviate congestion. The first phase of this program involved the conversion of HOV lanes to high-occupancy toll lanes on I-85. Future phases will add additional express lanes to major freeway facilities, enhance commuter bus service, and construct new park-and-ride lots. GDOT is also undertaking a radial freeway strategic improvement plan to investigate the implementation of operational improvements, managed lanes, and capacity expansion on congested freeways, as well as to study how to increase transit mode share.

GDOT monitors traffic in the Atlanta metropolitan area in real time through Navigator, its ATMS. The Transportation Management Center (TMC), located in downtown Atlanta, is the headquarters and information clearinghouse for Navigator. TMC staff support regional congestion and incident management through a three-phase process:

• Phase 1: Collect information.
TMC operators monitor the roadways and review real-time condition information from sensors deployed along regional Interstates. Operators also gather information provided by 511 users regarding traffic congestion and roadway incidents.
• Phase 2: Confirm and analyze information. TMC operators confirm all incidents by identifying the problem, the cause, and the effect it is anticipated to have on the roadway. Based on their analysis, proper authorities, such as police or fire responders, are notified.
• Phase 3: Communicate information. TMC operators communicate information regarding congestion and incidents to travelers by posting relevant messages to regional changeable message signs (CMS) and updating the Navigator website and 511 telephone service.

GDOT's traffic management system integrates with traffic sensors, closed-circuit televisions (CCTVs), CMS, ramp meters, weather stations, and highway advisory radio. At the TMC, staff use the real-time data and CCTV feed to detect congestion

and incidents. To minimize the disruption of traffic caused by lane-blocking incidents, TMC staff can dispatch highway emergency response operator patrols. GDOT estimates that the implementation of these patrols through the TMC has reduced the average incident duration by 23 minutes and reduced yearly delay time by 3.2 million hours during the peak commute (1). To facilitate information sharing and coordinated responses, the central TMC is linked to seven regional transportation control centers, as well as the City of Atlanta and the Metropolitan Atlanta Rapid Transit Authority.

Sensors

In the Atlanta region, GDOT collects data from over 2,100 roadway sensors, which include a mix of video detection sensors and radar detectors. Both of these types of sensors consist of single devices that monitor traffic across multiple lanes. The majority of active sensors monitor freeway lanes, with some limited coverage of conventional highways. Sensors in the active network are manufactured by four vendors, as shown in Table C.56.

TABLE C.56. GDOT SENSOR NETWORK SUMMARY

Vendor       Sensor Type    Use in Percentage of GDOT Network (%)
Traficon     Video          80
Autoscope    Video           8
NavTeq       Radar           8
EIS          Radar           4

The make and model of the sensor dictates the type of data that it collects and the frequency at which data are retrieved from the device (and thus, the level of aggregation of the data). Traficon video detection cameras make up approximately 80% of GDOT's active detection network. In Georgia, these sensors monitor flow, occupancy, and speed and report data to a centralized location every 20 seconds. Autoscope video detection sensors make up another 8% of the GDOT detection network. These cameras also monitor flow, occupancy, and speed, but in the Atlanta region, the cameras report these data to a centralized location every 75 seconds.
The remainder of the detection network is composed of radar detectors, which also report aggregated flows, occupancies, and speeds. NavTeq radar detectors make up 8% of GDOT's active detection network and report data every 1 minute. Finally, EIS's remote traffic microwave sensor radar detectors make up 4% of GDOT's active detection network and report data every 20 seconds. In addition to the aggregated flow, occupancy, and speed data, these sensors report on the percentage of passenger cars versus truck traffic.

In general, the different types of sensors are divided up by freeway. Figure C.160 shows the location of active mainline sensors in the GDOT network, broken down by manufacturer. The predominant sensors, the video detectors manufactured by Traficon, exclusively cover the I-285 ring road, I-75, the I-75/I-85 Downtown Connector, and I-575. Traficon sensors also monitor GA-400 north of the ring road and the majority of I-85 and share coverage of I-20 with NavTeq radar detectors. In most of the network,

Traficon sensors are placed with a very dense spacing of about one-third of a mile. Autoscope cameras monitor a small portion of I-85 near the Hartsfield–Jackson Atlanta International Airport with a spacing comparable to that of the Traficon cameras. In addition to sharing coverage of I-20 within the ring road with the Traficon sensors, NavTeq radar detectors exclusively monitor I-20 outside of the ring road, I-675, GA-400 inside of the ring road, and GA-316. NavTeq detectors are spaced approximately 1 mile apart. Finally, remote traffic microwave sensor radar detectors exclusively monitor US-78, GA-141, and GA-166.

All sensors in the network are capable of monitoring multiple lanes. For this reason, the same sensors that monitor mainline lanes can be configured to monitor HOV lanes. Figure C.161 shows the sensors that monitor HOV lanes. The monitored HOV lanes are I-75 inside of the ring road (Traficon), the I-75/I-85 Downtown Connector (Traficon), I-85 north of the I-75 split (Traficon), and I-20 from east of downtown Atlanta to east of the ring road. Along each of these freeway segments, HOV lanes are operational seven days a week, 24 hours a day along both directions of travel.

In addition to the real-time detection network, GDOT staff use approximately 500 CCTV cameras positioned at approximately 1-mile intervals on most major Interstates around Atlanta to monitor conditions.

Figure C.160. GDOT traffic detector network.
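Because the four sensor families report at different native intervals (every 20, 75, 60, and 20 seconds), a downstream system typically rebins the samples to a common interval before comparing locations. The sketch below shows one way to do this, assuming flow-weighted speed averaging over fixed time bins; the actual Navigator/PeMS aggregation logic may differ.

```python
from collections import defaultdict

def rebin(samples, bin_s=300):
    """Aggregate per-sensor samples reported at mixed native intervals into
    common time bins: sum the flows, flow-weight the speeds.

    `samples` is an iterable of (sensor_id, epoch_s, flow_veh, speed_mph).
    Returns {(sensor_id, bin_start_s): (total_flow, mean_speed_or_None)}.
    """
    bins = defaultdict(lambda: [0.0, 0.0])  # key -> [flow_sum, flow*speed_sum]
    for sensor, t, flow, speed in samples:
        b = bins[(sensor, int(t // bin_s) * bin_s)]
        b[0] += flow
        b[1] += flow * speed
    return {k: (f, (fs / f) if f else None) for k, (f, fs) in bins.items()}

# Hypothetical samples: a 20-second video sensor and a 1-minute radar sensor.
data = [("vds1", 0, 10, 60.0), ("vds1", 20, 20, 30.0), ("radar9", 60, 5, 55.0)]
out = rebin(data)
```

Flow-weighting matters here: the naive average of 60 and 30 mph is 45, but weighting by the 10 and 20 vehicles observed gives 40 mph, which better reflects what vehicles actually experienced.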

Data Management

The primary data management system used in the Atlanta region is the Georgia DOT's Navigator system. Navigator was initially deployed in metropolitan Atlanta in preparation for the 1996 Summer Olympic Games. Navigator collects traffic data from video and radar detectors in the field, automatically updates CMS with travel time information, and controls ramp metering. It also pushes information to the public through a variety of outlets, including a traveler information website and a 511 telephone information service. In addition, Navigator data are used by several private sector companies who enhance and package the data for distribution to media outlets. The Navigator system is divided into six subsystems (1):

1. Field data acquisition services;
2. Management services;
3. Audio–video services;
4. System services;
5. Geographical information services; and
6. System security services.

Figure C.161. GDOT managed-lane detector network.

The field data acquisition subsystem, which is responsible for device communication and management, acquires data from CMS, detector stations, ramp meters, a parking management system, and highway advisory radio. The management services subsystem helps TMC staff analyze data to determine conditions and develop response plans; it includes the Navigator graphical user interface, congestion and incident detection and management services, response plan management, and the historical logging of detector data. The audio–video subsystem lets TMC staff control CCTVs in the field, as well as the display of information within the TMC. The system services subsystem communicates speed information with GDOT's advanced traveler information system and logs system alarms. The geographical information services subsystem provides a graphical view of the roadway network and real-time data. The final subsystem provides system security.

The primary functions of Navigator are the monitoring of and response to real-time traffic conditions. Navigator collects lane-specific volume, speed, and occupancy data in real time from the disparate detector types at their respective sampling frequencies (e.g., every 20 seconds for the Traficon cameras) and stores the raw data in a database table for 30 minutes. This database table always contains the most recent 30-minute subset of collected data. An associated table contains configuration data (such as locations and detector types) for all of the devices that sent data within the past 30 minutes. Besides being accessible at the TMC, these raw data are used to compute travel times on key routes, which are then automatically displayed on regional CMS, as well as distributed through traveler information systems. The raw data are not processed or quality controlled before being stored in the real-time data table.
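The rolling 30-minute raw table just described can be mimicked with a simple structure that evicts samples older than the window whenever new data arrive. This is a minimal in-memory sketch, not Navigator's actual database implementation.

```python
class RollingTable:
    """Holds only the most recent `window_s` seconds of raw samples,
    mimicking a table that always contains the latest 30-minute subset."""

    def __init__(self, window_s=1800):
        self.window_s = window_s
        self.rows = []  # (epoch_s, detector_id, value), kept sorted by time

    def insert(self, t, detector_id, value):
        self.rows.append((t, detector_id, value))
        self.rows.sort(key=lambda r: r[0])
        # Evict everything older than the window, measured from the newest row.
        cutoff = self.rows[-1][0] - self.window_s
        self.rows = [r for r in self.rows if r[0] >= cutoff]

table = RollingTable()
table.insert(0, "det1", 42)
table.insert(900, "det1", 40)
table.insert(2000, "det1", 38)  # pushes the sample at t=0 out of the window
```

A consumer that polls such a table more often than the window length (as described later for PeMS) will see overlapping snapshots, which is why duplicate handling matters downstream.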
Every 15 minutes, the raw Navigator traffic data samples are aggregated up to lane-specific 15-minute volumes, average speeds, and average occupancies and are archived for each detector station. The data are not filtered or quality controlled before being archived. Many agencies and research institutions use this data set for performance measurement purposes; for example, the Georgia Regional Transportation Authority (the metropolitan planning organization for the Atlanta region) uses it to develop its yearly Transportation AP Report, which tracks the performance of the region's transportation system.

Aside from the traffic data, Navigator also maintains a historical log of incidents. When the TMC receives a call about an incident, TMC staff log it as a potential incident in Navigator until it can be confirmed through a camera or multiple calls. Once the incident has been confirmed, its information is updated in Navigator to include the county, type of incident, and estimated duration. This incident information is archived and stored.

Systems Integration

For the purposes of this case study, data from GDOT's Navigator system was integrated into the Performance Measurement System (PeMS), a developed archived data user service and TTRMS. This section briefly describes the steps involved in integrating the two systems. A more detailed account of the integration process and associated challenges is presented in the use case section.

PeMS is a traffic data collection, processing, and analysis tool that extracts information from real-time intelligent transportation systems, saves it permanently in a data warehouse, and presents it in various forms to users via the web. To report performance measures such as travel time reliability, PeMS requires three types of information from the data source system:

• Metadata on the roadway line work of facilities being monitored;
• Metadata on the detection infrastructure, including the types of data collected and the locations of equipment; and
• Real-time traffic data in a constant format at a constant frequency (such as every 30 seconds or every minute).

PeMS acquires the first piece of required information, roadway line work and mile marker information, from OpenStreetMap, an open source, user-generated mapping service. PeMS acquires the second piece of required information, detection infrastructure metadata, directly from GDOT database tables at the beginning of the integration process. The Navigator data framework is based around two components: devices and detectors. Devices are the physical units in the field (either the video detection system [VDS] or the radar detector) that collect the data. Detectors represent the specific lanes from which data are being collected. Because all GDOT detectors are VDS or radar, detectors in the GDOT network are virtual, rather than physical, entities. To define devices and detectors, GDOT has database tables that are modified each time field equipment is added, removed, or modified. The PeMS framework consists of two similar entities: stations (parallel to devices) and detectors. Because of this similarity, the mapping of GDOT infrastructure into PeMS was relatively straightforward. Challenges related to using metadata from GDOT's disparate detector types are described in the use case section.
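The device-to-station and lane-to-detector mapping described above can be sketched as follows. The field names and identifier formats are illustrative; the actual GDOT and PeMS schemas are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Device:
    """A physical field unit (VDS camera or radar) in Navigator terms."""
    device_id: str
    route: str
    lanes: int

def to_station(dev: Device) -> dict:
    """Map a Navigator device to a PeMS-style station, and its monitored
    lanes to virtual per-lane detectors (names are hypothetical)."""
    return {
        "station_id": dev.device_id,
        "route": dev.route,
        "detectors": [f"{dev.device_id}-lane{i}" for i in range(1, dev.lanes + 1)],
    }

station = to_station(Device("vds42", "I-285", 4))
```

The key design point reflected here is that the detectors are virtual: one physical device yields several per-lane records, so the station/detector split survives the mapping unchanged.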
PeMS continuously acquires the final piece of required information, real-time data, from GDOT database tables. As described in the data management section above, Navigator stores all of the raw data for the most recent 30-minute period in a database table. To obtain data, PeMS consumes and stores the entirety of this database table every 5 minutes and discards any duplicate records. The Navigator raw database table is copied into PeMS every 5 minutes rather than every 30 minutes to support the near-real-time computation of travel times.

Two aspects of the Navigator framework presented major challenges for incorporating the traffic data into PeMS: the frequency of data reporting differs for different device types, and many VDS device data samples were missing. These challenges are further discussed in the use case section.

Other Data Sources

To deepen the case study analysis and explore alternative data sources, the project team acquired a parallel probe traffic data set provided by NavTeq. The data set covers the entirety of the I-285 ring road and is reported by traffic message channel ID. The following data are reported every minute for each traffic message channel ID:

• Current travel time;
• Free-flow travel time;
• Current speed;
• Free-flow speed;
• Jam factor;
• Jam factor trend; and
• Confidence.

The lengths of the traffic message channel segments vary but are generally between 0.3 and 2 miles long. PeMS acquires the NavTeq data through a real-time data feed. Although the computational methods and sources of the data are proprietary, the data are generally computed from a mixture of probe and radar data. When there is not sufficient real-time data to generate the reported measures, the data are also based on historical averages. The confidence interval reflects the amount of real-time data used in the computation. This data set is addressed in more detail in the use case section.

To enable investigation into the impact of the seven sources of congestion on travel time reliability, the research team also acquired event data (consisting of incident and lane closure data) collected by Navigator. The issues involved in preparing this data set for use in analysis are detailed in the first use case. The results of the analysis into the impact of the sources of congestion on unreliability are discussed in the second use case.

Summary

The Atlanta metropolitan area offered the densest network of fixed point sensors of any of the five sites studied in this project and presented the challenges of adapting operational ATMS data for reliability monitoring. The site also provided the opportunity to analyze a third-party probe-based data set.

METHODOLOGICAL ADVANCES

Overview

The methodological advancement of this case study builds on methods established and validated in previous case studies.
Two of the main themes of the case study validations are (a) estimating the quantity and characteristics of the operating travel time regimes experienced by different facilities and (b) calculating the impacts of the seven sources of nonrecurrent congestion on travel time reliability.

To estimate regimes, the San Diego case study grouped time periods with similar average travel time indices, within which travel time probability density functions (PDFs) were assembled. To refine the regime-estimation process, the Northern Virginia case study validated the use of multistate normal density functions to model the multimodal nature (in a statistical sense) of travel time distributions for a particular facility and time of day. This approach has the advantage of providing a useful, traveler-centric output of the likelihood of congestion and the travel time variability under different congestion scenarios.
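The multistate normal model described above is, in effect, a Gaussian mixture. A minimal sketch of how such a two-state fit could be performed with a basic expectation-maximization (EM) loop follows; the synthetic travel times, initial values, and iteration count are illustrative assumptions, not the case study's actual R-based procedure.

```python
import numpy as np

# Synthetic travel times (minutes) drawn from two states: an uncongested
# state near 12 min and a congested state near 16 min (assumed values).
rng = np.random.default_rng(0)
tt = np.concatenate([rng.normal(12, 0.7, 300),   # uncongested state
                     rng.normal(16, 3.0, 300)])  # congested state

# Initialize two states from the data quantiles.
mu = np.quantile(tt, [0.25, 0.75])
sigma = np.array([tt.std(), tt.std()])
lam = np.array([0.5, 0.5])

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(200):
    # E-step: posterior probability that each travel time belongs to each state
    dens = lam * normal_pdf(tt[:, None], mu, sigma)   # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update state probabilities and parameters
    nk = resp.sum(axis=0)
    lam = nk / len(tt)
    mu = (resp * tt[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (tt[:, None] - mu) ** 2).sum(axis=0) / nk)

order = np.argsort(mu)  # state 1 = fastest mean travel time
print(lam[order], mu[order], sigma[order])
```

The recovered means land near the generating values of 12 and 16 minutes, mirroring how the fitted state parameters are interpreted in the case studies.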

The San Diego and Lake Tahoe case studies focused on estimating PDFs for travel times measured during instances of nonrecurrent congestion. These distributions help distinguish between the natural travel time variability of a facility due to the complex interactions between demand and capacity and the travel time variability during specific events.

The methodological goal of this case study is to fuse the previously developed regime-estimation and nonrecurrent congestion analysis methodologies by using multistate models to inform on the reliability impacts of nonrecurrent congestion. Providing a way for agencies to link the travel time regimes that their facilities experience with the factors that cause them, such as incidents or special events, would allow agencies to better predict travel times when these events occur in real time, as well as develop targeted projects to improve reliability over the long term. The background and steps of this analysis are described below, with detailed results presented in Use Case 2.

Site Description

The methodology was applied to the segment of southbound I-75 starting just north of the interchange with I-85 and ending just north of the I-20 interchange in downtown Atlanta. A map of this corridor is shown in Figure C.162. This corridor was selected because it has

• Significant recurrent congestion during the a.m. and p.m. weekday peak periods;
• A high frequency of incidents; and
• Proximity to special event venues, such as the Georgia Dome and Philips Arena.

Figure C.162. Downtown Connector study route.

Method

The method to develop the regimes and estimate the impacts of nonrecurrent congestion events consists of three steps:

1. Regime characterization, to estimate the number and characteristics of each travel time regime measured along the facility;
2. Data fusion, to link travel times with the source active during their measurement; and
3. Seven sources analysis, to calculate the contributions of each source to each travel time regime.

Regime Characterization

The details of how to implement multistate normal models for approximating travel time density functions are thoroughly described in the methodology section of the Northern Virginia case study. With multistate models, the data set is modeled as a function of the probability of each state's occurrence and the parameters of each state. In generalized form, multistate models take the form shown in Equation C.7:

f(T, λ, θ) = Σ_K λ_K f_K(T, θ_K)    (C.7)

where
T = travel time;
f(T, λ, θ) = travel time density function for the data set;
K = state number;
f_K(T, θ_K) = density function for travel times in the Kth state;
λ_K = probability of the Kth state occurring; and
θ_K = distribution parameters for the Kth state.

For the multistate normal distribution, θ_K is composed of the mean (μ) and the standard deviation (σ) of the state's travel times. More practically, if a three-state normal model provides the best fit to a set of travel times collected at the same time of day over multiple days, the first state can be considered the least-congested state, the second state a more-congested state, and the third state the most-congested state. Each state is defined by a mean travel time and a standard deviation of travel time, with the first state having the fastest mean travel time and the third state having the slowest mean travel time.
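Once the state parameters are known, Equation C.7 can be evaluated directly. The sketch below uses hypothetical three-state values (the λ, μ, and σ numbers are illustrative, not fitted results from the study) to compute the mixture density and the percentage chance that a measured travel time belongs to each state:

```python
import numpy as np

# Hypothetical three-state normal model (illustrative parameters).
lam   = np.array([0.52, 0.44, 0.04])   # probability of each state occurring
mu    = np.array([11.0, 14.0, 18.0])   # mean travel time (min) per state
sigma = np.array([0.5,  1.5,  2.0])    # std dev of travel time (min) per state

def components(T):
    """lambda_K * f_K(T, theta_K) for each state K."""
    return lam * np.exp(-0.5 * ((T - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_density(T):
    """Equation C.7: f(T, lambda, theta) = sum over K of lambda_K * f_K(T, theta_K)."""
    return components(T).sum()

def state_membership(T):
    """Percentage chance that a measured travel time T belongs to each state."""
    comp = components(T)
    return comp / comp.sum()

print(mixture_density(12.0))
print(state_membership(11.0))  # a travel time near 11 min is almost surely state 1
```

The `state_membership` output is the model feature used below to assign each measured travel time to the state to which it most likely belongs.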
The development of a multistate model consists of two steps: (a) identifying the optimal number of states to fit the data and (b) calculating the parameters (probability of occurrence and mean and standard deviation travel times) to define each state. The methods for performing these tasks are described in the Northern Virginia case study. In addition to providing the number of operating states and their parameters, the model also outputs, for each measured travel time, the percentage chance that it belongs within each state. By assigning each travel time to the state to which it is most likely to belong, it is possible to derive a set of travel times that belong within each

state. This output is used to drive the nonrecurrent congestion reliability analysis, which is described in the following subsection.

Data Fusion

To test the methodology, the research team downloaded 5-minute travel times measured on all nonholiday weekdays between September 9, 2011 (the first day that PeMS was set up for data collection), and December 31, 2011, from the reliability monitoring system. Due to drops in the data feed, there were many days of missing data during November and December. Each travel time was then manually tagged with the source active during its measurement, following the methodology used and described in the San Diego case study and briefly summarized below. The following sources were included in the fusion process:

• Baseline. No source was active during the 5-minute time period.
• Incident. Incident data were acquired from Georgia Tech's Navigator event data archive. The challenges of quality controlling the incident data set are described in the first use case below. The research team ultimately associated incident travel times with events that were marked as blocking at least one lane in the incident data set. The types of events are as follows:
— Accident or crash;
— Debris (all types);
— Fire or vehicle; and
— Stalled vehicle or lanes blocked.
In previous case studies, the research team assumed that incident impacts began at the start time of the incident and ended 15 minutes after the incident closed, to allow for queue discharge. However, because the incident durations seemed unusually long in this data set, it was assumed for this study that incident impacts ended at the incident closure time.
• Weather.
Hourly weather data were downloaded from the National Oceanic and Atmospheric Administration National Data Center and were measured at a weather station housed at the Hartsfield–Jackson Atlanta International Airport (located approximately 10 miles southwest of the study corridor). The research team assumed that weather impacts were incurred when greater than 0.1 inch of precipitation was measured during the hour. The Navigator event data set also documented instances of roadway flooding (through the incident type "weather/road flooding"). Travel times measured during these events were also associated with this source.
• Special events. Special event data from the Georgia Dome and Philips Arena were collated manually from sport and event calendars. Determining when special events affect traffic is challenging, as the impact of the event depends on the type of event. Typically, event traffic impacts begin before the start time and end after the event is over. However, although event start times are typically available, event

end times are rarely explicit and have to be assumed. In this study, a travel time was tagged with special event if it occurred up to 1 hour before the event start time and in the hour after the estimated end time.
• Lane closures. Lane closures were gathered from Georgia Tech's Navigator event data archive, which contained events marked as "planned/maintenance activity," "planned/construction," and "planned/rolling closure." The research team tagged travel times with the lane closure source if a closure affecting at least one lane was active during the 5-minute time period.

In the San Diego case study, fluctuations in demand were also measured. In Atlanta, fluctuations in demand could not be analyzed because of the large quantity of missing data samples, which affected the ability of the system to monitor traffic volumes. This problem is explained in Use Case 1.

Seven Sources Analysis

The model development process described above results in a set of travel times, each tagged with the nonrecurrent congestion source active during its measurement and categorized according to the state to which it belongs. From this it is possible to calculate two key measures concerning the relationships between nonrecurrent congestion and the travel time regimes: (1) within each state, the percentage of travel times measured during each source; and (2) for each source, the percentage of its travel times that belong in each state.

The use case section presents the results of these two measures for a freeway corridor in downtown Atlanta. It also visualizes the results through travel time histograms divided into states and color-coded according to the source active during the travel time's measurement.

Results

Results are presented in Use Case 2.
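The data fusion step above can be sketched as a simple interval-overlap tagger that labels each 5-minute period with the source active during its measurement. The event records, timestamps, and tagging priority (incident before weather) below are invented for illustration; the study tagged samples manually.

```python
from datetime import datetime, timedelta

# Invented event records for illustration.
incidents = [(datetime(2011, 10, 3, 8, 10), datetime(2011, 10, 3, 8, 55))]
wet_hours = {datetime(2011, 10, 3, 7)}   # hours with > 0.1 in. of precipitation

def tag_sample(start):
    """Return the source tag for the 5-minute period beginning at `start`."""
    end = start + timedelta(minutes=5)
    # Incident impacts are assumed to end at the incident closure time,
    # per the adjustment described above (no 15-minute queue-discharge buffer).
    if any(s < end and start < e for s, e in incidents):
        return "incident"
    if start.replace(minute=0, second=0) in wet_hours:
        return "weather"
    return "baseline"

print(tag_sample(datetime(2011, 10, 3, 8, 15)))  # falls inside the incident window
print(tag_sample(datetime(2011, 10, 3, 7, 30)))  # falls inside a wet hour
print(tag_sample(datetime(2011, 10, 3, 12, 0)))  # no source active
```

A fuller version would add special event and lane closure intervals in the same overlap pattern.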
USE CASE ANALYSIS

Use Case 1: Integrating ATMS Data into a Travel Time Reliability Monitoring System

Summary

For this case study, data from GDOT's Navigator ATMS were brought into a TTRMS (PeMS) and archived to support the computation of historical and real-time travel time and reliability metrics. This case study was the project team's first opportunity to use ATMS data, which are focused on real-time congestion and incident detection, for monitoring travel time reliability. In contrast, the San Diego and Lake Tahoe case study sites relied primarily on data within PeMS that had already been quality controlled and processed, and the Northern Virginia site leveraged data collected from an archived data user service at the University of Maryland. In each of these three cases, the data leveraged by the project team had already been processed to fill in any data holes and aggregated to ensure a consistent granularity across all of the raw data

samples. Because ATMS data are conventionally used only for real-time operations, the acceptable level of data quality is much lower than it is for the analysis of archived data. Conceptually, identifying gaps and errors in the real-time data is easier for TMC staff than it is for analysts; TMC staff have access to other data sources, such as CCTV cameras and reports from the field, but analysts evaluate historical travel times and performance measures without the benefit of any other contextual information.

Given the nature of the Atlanta data, initial case study efforts focused on the integration issues with acquiring unprocessed, incomplete data from disparate sensor types and using those data to compute travel time reliability. Encountered issues fell into two categories: (1) metadata integration, in which GDOT device and detector information is transferred into PeMS; and (2) data integration, in which real-time traffic data are acquired by PeMS, processed, cleaned, stored, and ultimately used to measure travel times and reliability. The project team acquired metadata and traffic data through direct access to the relevant Navigator database tables. This use case describes the challenges of interpreting the information in the database tables and inputting it into PeMS. It also describes the process for interpreting the Navigator event data acquired from Georgia Tech.

Metadata Integration

As described in the section on the monitoring system, the data model for Navigator detection devices (devices containing multiple detectors) is very similar to the PeMS data model (stations containing multiple detectors), and the mapping between the two system models was trivial. The primary metadata integration challenge was interpreting the fields and formats of the Navigator metadata database tables and filtering out nonactive infrastructure.
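One form this filtering takes is keeping only the most recent version of each device record and dropping devices without coordinates. A minimal sketch of that kind of metadata reduction follows; the records and field names are invented, not the actual Navigator schema.

```python
# Sketch of metadata filtering before loading devices into a monitoring
# system: keep only the highest-version record per device ID and drop
# devices missing latitude/longitude. Records and field names are invented.
devices = [
    {"id": 101, "version": 3, "lat": 33.75, "lon": -84.39},
    {"id": 101, "version": 7, "lat": 33.75, "lon": -84.39},  # newest record wins
    {"id": 102, "version": 1, "lat": None, "lon": None},     # no location -> dropped
]

# Reduce to a single record per device ID, keeping the highest version.
latest = {}
for rec in devices:
    kept = latest.get(rec["id"])
    if kept is None or rec["version"] > kept["version"]:
        latest[rec["id"]] = rec

# Exclude devices missing latitude/longitude, which the system requires.
located = [r for r in latest.values()
           if r["lat"] is not None and r["lon"] is not None]

print(len(located))           # one device survives both filters
print(located[0]["version"])
```

The same two-pass pattern (deduplicate, then filter on required attributes) matches the device-table reduction described in the following paragraphs.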
Navigator defines devices and detectors in two separate database tables. The project team acquired complete copies of these database tables at the beginning of the integration project and used them to generate the detection network for PeMS.

The device database table contained 14,581 rows, with nearly all device IDs having multiple records corresponding to different version numbers (up to 14 for some devices). The version number appeared to be driven by the modified date column, with the highest version numbers corresponding to the most recent modified date. The set of devices was reduced to a single record for each device ID with the highest version number. This step reduced the number of devices to 4,633. After excluding those missing latitude and longitude information, which PeMS requires, 3,406 unique devices remained.

The detector database table contained 40,496 records, which was reduced to 34,135 after excluding detectors associated with devices that had missing locations. Each detector was assigned a "lane_type." PeMS assigns detectors to one of six possible lane types: mainline, HOV, on-ramp, off-ramp, collector–distributor, and freeway-to-freeway connector. When assessing the Navigator detector lane types, the project team noted 21 possible categories. This high number is because Navigator, as a result of its operational nature, allows for the same type of lane to be identified in different ways. For example, in the detector database table, the lane types Entrance Ramp, Entrance_ramp, Left_entrance_ramp, Right_entrance_lane, and Right_entrance_ramp

are all used to denote on-ramp detectors. This multiplicity of terms required the development of a mapping structure to appropriately categorize Navigator detectors in PeMS, as shown in Table C.57. In doing this, the project team noted that a large percentage of the devices that had no locations monitored "arterial" detectors. The project team hypothesized that these devices were planned for deployment, but were not yet configured to report data into the system.

TABLE C.57. MAPPING OF LANE TYPES FROM NAVIGATOR TO PEMS

PeMS Lane Type: Navigator Lane Types
Mainline: Mainline; Through_lane; Through_lanes; Through-lanes; THRU/THRU; THRU/OFF-RAMP (THRU); THRU/ON-RAMP (THRU)
HOV: High Occupancy Vehicle; Hov_lanes; THRU/HOV
On-ramp: Entrance Ramp; Entrance_ramp; Left_entrance_ramp; Right_entrance_lane; Right_entrance_ramp
Off-ramp: Exit Ramp; Right_exit_lane; Right_exit_ramp
Collector–distributor: Collector/Distributor
Freeway-to-freeway connector: Connecting Lanes
Not applicable: Arterial

Using the above structure, Navigator devices and detectors were mapped as stations and detectors in PeMS. This mapping allowed for the real-time data integration, described in the next subsection, to begin.

Agency Data Integration

As described in the section on the monitoring system, two characteristics of the GDOT detection network presented major data integration challenges for the case study: variable data sampling rates across detectors and missing data samples for detectors and devices. Varying data sampling rates are problematic because PeMS assumes that all detectors within the same data feed report data at a constant, known frequency (e.g., in the San Diego case study, this frequency is every 30 seconds). This assumption enables the accurate aggregation of raw data up to the 5-minute level, from which travel times and other measures are then calculated. Although all GDOT detectors

report flow, occupancy, and speed, the frequency at which they report varies. GDOT stores the most recent 30 minutes of data from each active detector in a database table. PeMS obtains real-time data from GDOT by copying over the GDOT raw database table every 5 minutes and eliminating any duplicate records acquired in previous 5-minute periods. An initial manual review of the database table showed a data reporting frequency of every 20 seconds, and so this frequency was the basis for aggregation up to the 5-minute level. Through inspection of the aggregated data, however, it became evident that the frequency of data reporting varies by vendor. Table C.58 shows the observed reporting frequencies by vendor type.

TABLE C.58. DATA REPORTING FREQUENCIES BY VENDOR TYPE

Vendor: Reporting Frequency (s)
Traficon: 20
Autoscope: 75
NavTeq: 60
EIS: 20

Although the majority of GDOT detectors report data every 20 seconds, a significant number do not, and thus were not being aggregated correctly in PeMS. The project team decided that the best way to handle this issue was to change the process for extracting data from the GDOT raw database table. Instead of extracting data from all detectors in a single feed, the problem could be solved by establishing three data feeds, each with its own aggregation routine, to obtain data from all detectors that report at the same frequency (20, 60, and 75 seconds).

The second challenge identified by the project team was that a significant number of expected data samples were missing. For example, since Traficon detectors are configured to send data every 20 seconds, and GDOT stores the most recent 30 minutes of data from each detector, the research team expected to see 90 samples for each Traficon detector in each copy of the database table. Instead, many 20-second time periods were missing data for one or more detectors.
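The feed-splitting fix described above amounts to grouping detectors by their vendor's reporting frequency so each group gets its own aggregation routine. A sketch, with invented detector IDs and vendor assignments (the frequencies follow Table C.58):

```python
# Group detectors into per-frequency feeds so raw samples can be
# aggregated correctly to the 5-minute level. Detector IDs and vendor
# assignments are invented; frequencies are from Table C.58.
REPORTING_FREQ = {"Traficon": 20, "Autoscope": 75, "NavTeq": 60, "EIS": 20}

detectors = [("D1", "Traficon"), ("D2", "Autoscope"),
             ("D3", "NavTeq"), ("D4", "EIS")]

feeds = {}
for det_id, vendor in detectors:
    feeds.setdefault(REPORTING_FREQ[vendor], []).append(det_id)

# Expected number of raw samples per detector in one 5-minute window:
expected = {freq: 300 // freq for freq in feeds}
print(feeds)      # detectors grouped by 20-, 75-, and 60-second feeds
print(expected)   # e.g. a 20-second feed should yield 15 samples per window
```

Comparing the expected count per window against the samples actually received is also how missing-sample rates, discussed next, can be quantified.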
For many of the VDS detectors, almost no samples were reported during the nighttime hours. The research team concluded that some of the detectors were not able to monitor traffic in the dark. Many samples were also missing during the daytime hours. The missing samples, combined with the fact that none of the data samples ever reported zero volume, made it clear that the detectors send no data sample if they detect no vehicles during the time interval. This data reporting scheme is problematic because monitoring systems need to be able to distinguish between when the detector or data feed is broken (requiring data imputation to fill in the hole) and when no vehicles passed the location during the time interval (requiring a recording of zero volume in the database). With PeMS, the GDOT detector reporting framework caused two main problems:

1. PeMS performs detector diagnostics at the end of every day. If less than 60% of expected data samples are received, then the detector is deemed to be broken and all of its data are imputed.
2. PeMS performs imputation for missing data samples in real time. If the cause of the missing sample is that there were no vehicles at the location over the time period, then the imputation results in an overcounting of volumes.

In the Atlanta site, the first issue was deemed minimal because PeMS only runs the detector diagnostics on samples collected between the hours of 5:00 a.m. and 9:00 p.m. Since the majority of missing samples occur outside these hours (in the middle of the night), few detectors sent less than 60% of expected samples during the diagnostic hours. The second issue, however, was deemed more serious, because it means that volumes are overestimated and speeds are estimated from unnecessary amounts of imputed data.

The ideal, permanent solution to mitigate both issues would be to change the way that the field equipment interacts with the data collection system so that data samples are sent even when no traffic is measured. This change would need to be made at the device level. However, because this was a case study validation effort and not a procured monitoring system for GDOT, the team decided that the following solution would be more practical:

1. Turn off real-time imputation to allow missing data samples.
2. Calculate 5-minute volumes by summing the nonmissing raw data samples.
3. Calculate 5-minute speeds by taking the flow-weighted average of the nonmissing raw data samples.
4. Compute travel times from all detectors with nonmissing 5-minute travel time samples along a route.
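Steps 2 and 3 of this solution amount to summing and flow-weighting only the samples that actually arrived. A minimal sketch follows; the sample values are invented, and a shortened window is shown for brevity.

```python
# Aggregate raw detector samples to one period without imputation:
# volume = sum of nonmissing flows; speed = flow-weighted average of
# nonmissing speeds. Sample values are invented for illustration.
samples = [            # (vehicles counted, speed in mph); None = missing sample
    (4, 62.0), (3, 60.0), None, (5, 55.0), None, (2, 64.0),
]

present = [s for s in samples if s is not None]
volume = sum(flow for flow, _ in present)
speed = (sum(flow * spd for flow, spd in present) / volume) if volume else None

print(volume)          # missing samples contribute nothing to the count
print(round(speed, 1)) # flow-weighted, so busier samples count more
```

As the surrounding text notes, skipping the missing samples can underreport volume-based measures, but it avoids inflating speeds with imputed data.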
The end result of this solution is that volume-based performance measures (such as vehicle miles traveled and vehicle hours of delay) may be underreported, but speed-based measures are more accurate than they would be under the traditional PeMS real-time imputation regime.

Event Data Integration

To enable seven sources analysis, the research team acquired a database dump of all Navigator events (primarily incidents and lane closures) from September through December 2011 from Georgia Tech. The data, which were delivered in an Excel spreadsheet in a format summarized in Table C.59, contained 21,540 event records summarizing Navigator events within the Atlanta metropolitan region.

TABLE C.59. EVENT DATA FORMAT

Column  Name          Description                Example
1       ID            Unique ID                  244835
2       Primary Road  Freeway number             I-75
3       Dir           Direction of travel        N
4       MM            Mile marker                228
5       Cross         Cross street               Jonesboro Rd
6       County        County                     Clayton
7       Start         Event start date and time  09/01/2011 01:00
8       End           Event end date and time    09/02/2011 06:15
9       Type          Type of event              Accident/Crash
10      Status        Status of event            Terminated
11      Blockage      Number of lanes blocked    2

The breakdown of events by type in the data set, grouped and summed into event types in similar categories, is shown in Table C.60.

TABLE C.60. EVENT DATA SET BY EVENT TYPE

Event Type: No. of Events
Accident (crash, hazardous material spill, other): 3,311
Debris (animal, mattress, tire, tree, other): 1,896
Fire (structural, vehicle, other): 237
Infrastructure (bridge closure, downed utility lines, gas or water main break, road failure): 120
Planned (accident investigation, construction, emergency roadwork, maintenance activity, rolling closure, special event): 4,499
Signals (bulb out, flashing, not cycling): 638
Stall (lane[s] blocked, no lanes blocked): 10,690
Unplanned (live animal, police activity, presence detection, rolling closure): 55
Weather (dense fog, icy condition, road flooding): 99

The ability of the data to detail incidents and lane closures on a 10-mile segment of southbound I-75 was assessed for use in analyzing the impacts of the seven sources of congestion on travel time variability on this corridor (see the section on methodological advancement for more details). During this assessment, the team noted that the following data set characteristics complicated the assignment of incidents and lane closures to measured travel times:

• The same freeway was given different names in the Primary Road column.
• Mileposts were missing from some events.
• There were inconsistencies between the number of lanes blocked in the event type column and the blockage column.
• Durations for many of the events were longer than expected for the event type.

With respect to the first issue, the segment of I-75 studied was given the following names in the data set: 75/85, I-75, 75/85 SB, I-75/85, and 75. The project team had to ensure that all of the possible freeway names were evaluated and narrowed down by milepost to avoid missing any events on the study route. The second issue was dealt with by manually mapping the given cross street to determine if the location was on the study segment. The third issue related to the numerous events of the types "stall, lane(s) blocked" and "stall, no lanes blocked," for which the degree of lane blockage was contradicted by the number in the blockage column. In these cases, the team used the event type description to determine if there was lane blockage. The fourth issue related to event durations; in many cases, the event duration computed from the start and end times seemed longer than would be expected for an event of that type. For example, it was common to see events of type "stall, no lanes blocked" last for longer than 3 hours. Without any other source of data to reference, the team simply had to accept the reported duration, and note it as a potential inaccuracy in the analysis.

Conclusions

Because most metropolitan areas are already equipped with ATMS detection and software systems, ATMS data are a likely source of information for an urban TTRMS. The integration of ATMS data into a TTRMS presents challenges in ensuring data quality and quantity.
Practitioners may encounter the following issues when acquiring and integrating ATMS data for reliability monitoring purposes:

• Sensor metadata and event data with missing required attributes, such as location;
• Sensor metadata and event data with nonstandardized naming classification;
• Data at miscellaneous sampling rates; and
• Missing data samples.

When required sensor information is missing, the only alternative to obtaining the information from the field is to discard the sensor from the reliability monitoring system. For nonstandardized classifications, the best alternative is to manually translate ATMS terminology into the monitoring system framework, prioritizing the translation of mainline and managed-lane detectors. The data variability issues are more challenging to deal with and are best solved on a permanent level by changing the way that the field equipment communicates with the ATMS to ensure that all the information needed for historical travel time monitoring is acquired.

Use Case 2: Determining Travel Time Regimes and the Impact of the Seven Sources of Congestion

Summary

The Northern Virginia case study analyses developed methodologies for modeling the multimodal nature of travel time distributions to determine the operating regimes of a facility. The San Diego case study analyses validated ways to evaluate the impact of the seven sources of congestion on travel time variability. This use case seeks to combine these two methods to identify the impacts of the seven sources of congestion on the different travel time regimes that a facility experiences. The methodology that drives this analysis and a description of the study route are presented in the methodological advancements section. This use case documents the results of performing the regime characterization, data fusion, and seven sources analysis steps on a 10-mile study route through downtown Atlanta during the weekday morning, midday, and afternoon periods.

Results

Regime Characterization

The first step in the analysis was to identify the number of modes, or regimes, in the travel time distribution. In this study, the data set consisted of 5-minute travel times measured on nonholiday weekdays between September 9, 2011, and December 31, 2011. To appropriately identify the number of operating regimes along the study route, the travel time data set was grouped by similar typical operating conditions (defined by the mean travel time) and time of day into the following categories:

• Morning peak, 7:20 to 9:20 a.m. (mean travel times exceeding 14 minutes);
• Midday, 9:30 a.m. to 4:00 p.m. (mean travel times less than 13 minutes); and
• Afternoon peak, 5:00 to 6:20 p.m. (mean travel times exceeding 18 minutes).

An algorithm in the program R was used to identify the optimal number of multimodal normal regimes to model each of the three travel time data sets. Results showed that the a.m. and p.m.
peak time periods were best modeled with two normal distributions and that the midday period was best modeled with three normal distributions. Figures C.163, C.164, and C.165 show a histogram of the travel time distribution for the morning, midday, and afternoon periods, respectively, as well as the PDFs for each of the regimes (dashed lines) and the overall mixed-normal density function (solid line). Table C.61 summarizes the regime parameters (probability of occurrence and mean travel time) by time period.

Figure C.163. Morning multistate normal PDFs.

Figure C.164. Midday multistate PDFs.

Figure C.165. Afternoon multistate PDFs.

TABLE C.61. REGIME PARAMETERS BY TIME PERIOD

           Probability (%)              Mean Travel Time (min)
           State 1  State 2  State 3    State 1  State 2  State 3
Morning    47       53       N/A        12       16       N/A
Midday     52       44       4          11       14       18
Afternoon  92       7        N/A        20       30       N/A

Note: N/A = not applicable.

In the morning peak time period, each regime (uncongested and congested) occurs about half of the time. The mean of the first, uncongested regime is 12 minutes, with little travel time variability in the distribution. The mean of the congested regime is 16 minutes, and the distribution of travel times is wider.

The midday time period has three regimes. The uncongested regime happens 52% of the time, the slightly congested regime happens 44% of the time, and the congested regime happens only 4% of the time (this small percentage makes the regime invisible in Figure C.164). The mean of the uncongested regime is 11 minutes (free flow), the mean of the slightly congested regime is 14 minutes, and the mean of the most-congested regime is 18 minutes.

The afternoon time period is characterized by two regimes. The congested regime happens 92% of the time, with a mean travel time of 20 minutes (almost double the free-flow travel time). The very congested regime happens only 7% of the time, but it has a mean travel time of 30 minutes (almost three times the free-flow travel time).

Data Fusion

In the data fusion step, the seven sources data described in the methodological advancements section were fused with the 5-minute travel times. Table C.62 summarizes the number and percentage of travel time samples by source within each time period. Special events only occurred during the afternoon time period; lane closures only occurred during the morning and midday time periods. Incidents made up a similar percentage of the data set in all three time periods.

TABLE C.62. FIVE-MINUTE TRAVEL TIME SAMPLES BY SOURCE AND TIME PERIOD

Source          Morning     Midday       Afternoon
Baseline        297 (60%)   1,254 (71%)  413 (78%)
Incidents       77 (16%)    286 (16%)    73 (14%)
Weather         115 (23%)   119 (7%)     36 (9%)
Special events  0 (0%)      0 (0%)       10 (2%)
Lane closure    7 (2%)      115 (6%)     0 (0%)
Total           496         1,774        532

Seven Sources Analysis

The final step in the analysis was to assess the contributions of the sources of congestion to each travel time regime. Figures C.166, C.167, and C.168 illustrate the breakdown of travel times by source within each state in the morning, midday, and afternoon time periods, respectively. Tables C.63, C.64, and C.65 summarize each state's parameters, the percentages of each state's travel times tagged with each source, and the percentage of each source's travel times that occur within each state in the morning, midday, and afternoon time periods, respectively.

During the morning peak time period, State 2 has a 4-minute higher mean travel time than State 1, and also contains more variability (a standard deviation of 3 minutes versus less than a minute). Incident travel times are seen in both states, but incidents are three times more likely to result in the most-congested state. Weather events, in contrast, are found more frequently in the uncongested state (58%) than in the congested state (42%).
There were few lane closure samples to evaluate, so lane closures do not appear to be a driving factor of morning peak congestion and travel time variability on this route. State 2 contains a significant share of baseline travel times (51%), indicating that something other than incidents, weather, and lane closures is causing delay and unreliability on this corridor during the morning commute.

The midday peak time period has three states. The most-congested state, which occurs only 4% of the time, is composed of around one-third weather-influenced travel times, one-fifth incident-influenced travel times, and one-tenth lane closure travel times, with the remainder baseline travel times. The fact that the less-congested states contain a significant proportion of the congestion-influenced travel times indicates that only the most severe instances of the sources reduce capacity below midday demand levels.

During the afternoon peak time period, the congested state that occurs 93% of the time (State 1) contains nearly all of the congestion source travel times. However, this state has a wide distribution of travel times, and Figure C.168 shows that many of these incident- and weather-influenced travel times occupy the rightmost part of the State 1 travel time distribution. The very congested second state during the afternoon peak time period is composed of one-third weather-influenced travel times and one-tenth incident-influenced travel times, with the rest baseline travel times, indicating that this most unreliable state is caused by some other influence.

Figure C.166. Morning peak travel times by source.

Figure C.167. Midday peak travel times by source.

Figure C.168. Afternoon peak travel times by source.

TABLE C.63. SOURCE CONTRIBUTIONS TO MORNING PEAK REGIMES

Parameter                         State 1   State 2
Probability (%)                   47        53
Mean (min)                        12        16
Standard deviation (min)          0.7       3

State Travel Times by Source (%)
Baseline                          67        51
Incidents                         7         26
Weather                           24        22
Special events                    0         0
Lane closure                      2         1

Source Travel Times by State (%)
Baseline                          62        38
Incidents                         25        75
Weather                           58        42
Special events                    0         0
Lane closure                      71        29

TABLE C.64. SOURCE CONTRIBUTIONS TO MIDDAY REGIMES

Parameter                         State 1   State 2   State 3
Probability (%)                   52        44        4
Mean (min)                        11        14        18
Standard deviation (min)          0.2       3         4

State Travel Times by Source (%)
Baseline                          75        67        32
Incidents                         10        24        20
Weather                           6         6         35
Special events                    0         0         0
Lane closure                      9         3         13

Source Travel Times by State (%)
Baseline                          59        40        1
Incidents                         36        62        2
Weather                           54        34        2
Special events                    0         0         0
Lane closure                      78        17        4

TABLE C.65. SOURCE CONTRIBUTIONS TO AFTERNOON PEAK REGIMES

Parameter                         State 1   State 2
Probability (%)                   93        7
Mean (min)                        20        30
Standard deviation (min)          4         4

State Travel Times by Source (%)
Baseline                          79        59
Incidents                         14        7
Weather                           5         34
Special events                    2         0
Lane closure                      0         0

Source Travel Times by State (%)
Baseline                          96        4
Incidents                         97        3
Weather                           72        28
Special events                    100       0
Lane closure                      0         0

Conclusions

By combining the regime-estimation and seven sources analysis methodologies used in previous case studies, this application shows that it is possible to characterize the impact of the sources of nonrecurrent congestion on the different travel time states that a facility experiences. On the study route of I-75 into downtown Atlanta, the analysis showed that something other than weather, incidents, lane closures, and special events is a leading factor contributing to the high and unreliable travel times that make up the rightmost portion of the travel time distribution. This additional factor may be fluctuations in demand and capacity due to a bottleneck, which were not measurable at this case study site. On this route, weather is the source that, when it occurs, most frequently drives the travel time regime into the most-congested state.

Use Case 3: Quantifying and Explaining the Statistical Difference Between Multiple Sources of Vehicle Speed Data

Summary

This use case identifies issues associated with the integration of data feeds from multiple sources. Speed measurements from Traficon video detectors and NavTeq probe vehicle runs are compared. For each of these technologies, the data come from a 10-mile segment of I-285 in Atlanta, where peak period congestion is observed on weekdays. Some preprocessing was necessary to translate the data sets into a common format that could be easily compared, after which correlations between pairs of detectors of each type at the same location were computed.
A possible source of difference in the measurements, the distance between each pair of compared detectors, was analyzed and found to be moderately significant.

Data from multiple sources, if properly understood, can be aggregated to provide a rich set of performance monitoring information. Multiple data sources add redundancy to the system, preventing a data blackout in the event that one of the data feeds goes down. Multiple data sources also facilitate the cross validation of detectors, providing an additional way to identify malfunctioning equipment. However, if the additional data sources are integrated incorrectly, they can conflict with each other, decreasing the accuracy of the monitoring system in unpredictable ways.

The observed traffic data are the fundamental driver of the performance measures computed by a TTRMS. While the underlying traffic model also influences the performance measures, its influence is typically static. For example, a particular methodology for computing travel times may be consistently biased toward overestimating travel times. A systematic bias like this can be recognized and accounted for. However, the effects of misconfigured data sources can change as the incoming data change. Understanding the peculiarities of data from different sources is critical because the observed data feed directly into the measures computed by the monitoring system.

Users

This use case is applicable to all users of a TTRMS, particularly those systems that integrate data from multiple sources or technologies. It provides practical guidance on how to properly compare traffic measurements from multiple data sources. The data comparison techniques presented here are the necessary first steps to transform raw detector data from multiple sources into aggregated traffic information.
This information will give important context to TTRMS users by improving their understanding of the performance measures they compute.

Information technology professionals responsible for the data integration and preprocessing tasks necessary to build and maintain a TTRMS will also benefit directly from this use case. It provides guidance on the steps necessary to compare data from two sources, a necessary initial step in data integration. Understanding these issues can also help system managers more easily troubleshoot systems in which computed performance measures are suspect. For example, data feeds that are aggregated incorrectly can be compared using the techniques presented in this use case as part of a troubleshooting routine.

This use case is also valuable to transportation professionals interested in exploring new data sources. Global positioning system (GPS)–based probe data are increasing in availability and offer a rich roadway monitoring solution, with speed and position measurements taken from actual vehicles throughout their trips. Probe data are also appealing because they do not require any ongoing maintenance of detection equipment. With this technology, there is no roadway-based detection hardware; the data collection infrastructure resides entirely within the vehicles themselves. Compared with conventional infrastructure-based sensors, which only record roadway information at discrete locations and must be regularly maintained, probe data can be an attractive alternative. This use case provides guidance on how probe data compare with more traditional infrastructure-based data sources.

Data Characteristics

This use case compares two types of traffic data: speed data from vehicle probes, provided by NavTeq, and speed data from Traficon video detectors. The vehicle probe data come from GPS chips residing within individual vehicles, directly measuring their speed and location. Traficon data come from video cameras installed at fixed locations along the roadway that measure speed, volume, and density. Data from infrastructure-based sensors such as these (and loop detectors) are currently more common than probe data sources. For this reason, many TTRMS users conceptualize the data they see primarily in terms of fixed-infrastructure sensors. The rising availability of probe data for transportation system monitoring makes the NavTeq probe data a desirable data set to compare with fixed-infrastructure data.

Because the video data come from fixed-infrastructure sensors and the probe data come from in-vehicle sensors, they require different types of network configurations to relate them to the roadway. The video data are organized by device, with each device applying to a single location on the roadway. Data from each device correspond to traffic at that point. The probe data, in contrast, are organized directly by location through traffic message channel paths. Each traffic message channel path represents a stretch of roadway in a single direction that is explicitly defined by a starting and ending milepost. The lengths and locations of these paths are irregular, and there are gaps between them.

The NavTeq probe data differentiate between mainline speeds and speeds on managed lanes such as HOV or high-occupancy toll lanes, although they do not provide mainline speeds disaggregated by lane. A data point is calculated for each traffic message channel path roughly every 2 minutes (0.5 samples per minute).
This is a lower sampling rate than that of many other types of detectors, but because the measurements are taken directly from actual vehicles (representing ground-truth conditions), they are generally considered more accurate, making sampling frequency less important.

The Traficon video detector data closely resemble traditional infrastructure-based data, such as those from loop detectors. Each video detector is assigned to a specific milepost and lane on the roadway, and its measurements apply directly to that point location. Each video detector directly reports occupancy, speed, and flow at a maximum frequency of once every 20 seconds (3 samples per minute). This frequency is comparable to that of most loop detectors.

Sites

A 10-mile section of I-285 around Atlanta (known locally as the Perimeter) was chosen for this study for several reasons. As discussed above, I-285 is covered by both Traficon video detectors and NavTeq probe data, and this location has good data availability for both. The heavy commute traffic on I-285 leads to strong peak period congestion and a range of congestion levels; I-285 carries the largest volume of traffic of any Atlanta freeway, providing the metropolitan area access to major Interstates I-20, I-75, and I-85, which lead to several residential suburbs.

Data covering both the northbound and southbound directions of travel were examined. The study area spanned Mileposts 25 to 35 in the northbound direction and Mileposts 45 to 55 in the southbound direction. Although these milepost ranges

differ, they represent the same stretch of roadway (see Figure C.169). The study area extends from the Belvedere Park area at its southern end to the I-85 interchange at its northern end. During the time period studied, free-flow speed was measured at around 70 mph. The typical weekday flow was 80,000 to 90,000 vehicles per day northbound and approximately 100,000 vehicles per day southbound.

In the northbound direction, 3.9 of the 10 miles in the study area are covered by eight traffic message channel paths, with an average length of 0.5 miles. Of the 24 working Traficon detectors in the northbound direction, seven lie within a traffic message channel path. In the southbound direction, 5.3 of the 10 miles in the study area are covered by eight paths, with an average length of 0.7 miles. Of the 19 working Traficon detectors in the southbound direction, 12 lie within a traffic message channel path (see Figure C.170).

This site's congestion patterns made it a desirable choice. Morning peak time period congestion was seen in the northbound direction between 6 and 9 a.m., and afternoon peak time period congestion was seen in the southbound direction between 4 and 7 p.m. In both directions, the congestion was most pronounced on Tuesdays, Wednesdays, and Thursdays. Five-minute speed measurements as low as 15 mph were commonly observed in both directions.
Figure C.169. Locations of NavTeq traffic message channel paths (longitudinal black lines) and Traficon video detectors (perpendicular black lines).

Methods

The comparison of the probe and video speed data began with the procurement of those data. PeMS began collecting live Traficon video detector data in the Atlanta region on September 9, 2011. Data from this initial date through December 23, 2011 (the beginning of a gap in availability), were obtained for the 51 video detectors in the study area from PeMS. All available data for each detector were included, weekends in addition to weekdays, in order to compare the data sets across a range of conditions. PeMS stores Traficon video detector data at a 5-minute resolution at the finest, which is the level of aggregation used in the comparison. It was immediately observed that two northbound and six southbound video detectors were not reporting any data, and they were discarded.

PeMS began archiving NavTeq probe data in the Atlanta region on September 18, 2011. All available data from this date through December 23, 2011, were obtained from all 17 traffic message channel paths in the study area. Each probe data point is the result of NavTeq's aggregation of many GPS measurements from multiple vehicles into a single speed value for a particular traffic message channel path. PeMS stores these aggregated speed measurements at their finest provided resolution, which is one data point roughly every 2 minutes (0.5 samples per minute).

Figure C.170. Study area on I-285.

To properly compare the two data sets, it is first necessary to convert them to a common time standard. As obtained from PeMS, the video data and probe data have different time ranges and different sampling frequencies. A Perl script was written to fix the time range of all data sets to extend between September 9, 2011, and December 23, 2011, with empty cells for any time points without data. This same script fixed the probe data to the 5-minute resolution of the video data, the coarser of the two data resolutions. This was done by dividing the predefined time range into 5-minute windows and averaging all probe data points that fell inside each window (see Figure C.171). As discussed above, GDOT's Navigator system also aggregates Traficon data into 15-minute periods.

Each 5-minute Traficon video speed measurement is also accompanied by a value called "percent observed" that represents the degree to which that data point represents an actual roadway measurement. Certain time periods might have a low percentage observed due to errors in the detector or feed. In those cases, PeMS fills in the missing data according to certain estimation algorithms.

Figure C.171. Common temporal aggregation of comparison data.
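The windowing step performed by the Perl script can be sketched as follows. The sketch below is in Python rather than Perl, and the sample timestamps and speeds are hypothetical:

```python
from datetime import datetime, timedelta

def average_into_windows(samples, start, end, width_min=5):
    """Average (timestamp, speed) probe samples into fixed-width windows.
    Windows with no samples get None, mirroring the empty cells the Perl
    script wrote for time points without data. Illustrative sketch only."""
    width = timedelta(minutes=width_min)
    n = int((end - start) / width)
    sums = [0.0] * n
    counts = [0] * n
    for ts, speed in samples:
        if start <= ts < end:
            i = int((ts - start) / width)  # which 5-minute window ts falls in
            sums[i] += speed
            counts[i] += 1
    return [sums[i] / counts[i] if counts[i] else None for i in range(n)]

# Hypothetical probe points at irregular, roughly 2-minute spacing.
t0 = datetime(2011, 10, 11, 6, 0)
probe = [(t0 + timedelta(minutes=m), s)
         for m, s in [(0.5, 52.0), (2.4, 50.0), (6.1, 41.0), (8.0, 39.0)]]
windows = average_into_windows(probe, t0, t0 + timedelta(minutes=15))
```

Here the first window averages the two points that fall in it (51.0 mph), the second averages the next two (40.0 mph), and the empty third window stays None, which becomes an empty cell in the fused data set.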

To keep the comparison focused solely on the data generated by the sensors, only 100% observed data points were included. After this filtering, between 40% and 50% of 5-minute periods contained data for most Traficon video detectors. By comparison, the NavTeq probe data sets all contained data for 20% of all 5-minute periods, and all traffic message channel paths followed the same pattern of data availability. This pattern indicates that the few probe data outages were caused by system issues.

Once the video and probe data were all in the same temporal frame of reference, the comparison began by identifying the pairs of video detectors and traffic message channel paths that applied to the same stretch of roadway. Since video detectors are fixed to a point and traffic message channel paths span a length of roadway, each video detector can have no more than one associated path, but each path can have many matching video detectors (see Figure C.169). There were 7 pairs of video detectors and traffic message channel paths in the northbound direction and 12 in the southbound direction.

With video and probe detectors paired by location, their speed measurements can be plotted and compared visually. Figure C.172 shows video detector and probe speeds at the same location on I-285 in the northbound direction over three consecutive weekdays. Both data sets agree closely on the speed profile during the congested period. However, the NavTeq probe data are clearly capped at an artificial ceiling around 55 mph. This means that the probe data are only valid for times when speeds were below 55 mph. To maintain the integrity of the comparison, all 5-minute periods during which any traffic message channel path had a reported speed of 55 mph were identified as artificial and discarded.
Critically, the corresponding time period in the paired video detector data was also discarded in order to maintain the same temporal reference in both data sets. Figure C.173 plots the results of this filtering on the time range and data from Figure C.172, showing all of the time points from Figure C.172 during which both data sets contained directly observed data. The removal of data from certain time periods creates discontinuities in the time basis of the data, so each point is now identified by its index in the data set. This procedure effectively removes all noncongested time periods from each comparison, which means that the fundamental basis of comparison of these data sets is the observed speeds during congested periods.

Many techniques are available for numerically computing the similarity of two data sets. In this case, the Pearson correlation coefficient was computed between each pair of processed data sets. The correlation coefficient is defined as the covariance of the two data sets (a measure of their linear dependence) normalized by the product of their standard deviations. Covariance is a useful measure of the degree to which two data sets increase and decrease together, but its magnitude is difficult to interpret. Normalizing the covariance by the product of the standard deviations allows correlations to be compared across pairs of data sets. Correlation coefficients were computed between each pair of processed data sets in R to determine the degree to which the speed measurements from each source agree.
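Taken together, the ceiling filtering and correlation steps described above amount to something like the following sketch. The speed values are made up, and the original analysis was done in R rather than Python:

```python
import math

CEILING = 55.0  # artificial probe speed cap observed in the data

def paired_filter(video, probe):
    """Keep only indices where both series have data and the probe value
    sits below the 55 mph ceiling. Discarding the index from both series
    preserves a common temporal reference. Sketch only."""
    pairs = [(v, p) for v, p in zip(video, probe)
             if v is not None and p is not None and p < CEILING]
    return [v for v, _ in pairs], [p for _, p in pairs]

def pearson(x, y):
    """Pearson correlation: covariance normalized by the product of
    the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Hypothetical 5-minute speeds; 55.0 marks the probe ceiling and None
# marks a missing video sample.
video = [48.0, 35.0, 22.0, 18.0, 30.0, 52.0, None]
probe = [55.0, 33.0, 24.0, 20.0, 28.0, 55.0, 41.0]
v, p = paired_filter(video, probe)
r = pearson(v, p)
```

In this sketch, indices 0 and 5 are dropped because the probe value sits at the ceiling, and index 6 is dropped because the video value is missing; the correlation is then computed on the four surviving pairs.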

Figure C.172. Comparison of speeds from video (black) and probe (gray) sources (October 11, 12, and 13, 2011).

Inspection of Figure C.173 reveals that the probe data at this location appear to lag slightly behind the video detector data. This lag can be quantified by computing the cross-correlation of the two data sets. To demonstrate this, the cross-correlation for the data shown in Figure C.173 was computed. It can be seen in Figure C.174 that the peak correlation occurs at a lag of −1; the unshifted data have a correlation of 0.80. When the probe data are shifted earlier by one index position, as recommended by the cross-correlation function, the correlation of the two data sets improves to 0.93 (see Figure C.175). This technique can be used to calibrate sensor measurements.

I-285 Northbound Results

Correlations in speed measurements from the northbound direction of travel were strong, ranging from 0.75 to 0.87. Of the seven video detector–traffic message channel path pairs, five (71%) had correlations exceeding 0.8. The most poorly correlated pair was located at the northern end of the study segment, near the North Hills Shopping Center. The best correlation was seen between the longest traffic message channel path and the detector located near its middle, close to the Decatur Road exit.
Figure C.173. Comparison of speeds from video (black) and probe (gray) sources after filtering.

Figure C.174. Cross-correlation of data from Figure C.173.

Figure C.175. Data from Figure C.173 after shifting probe (gray) data.
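The lag search illustrated by Figure C.174 can be reproduced by computing the correlation at each candidate shift and keeping the shift that maximizes it, as in this sketch (the series below are made up; the original analysis used R's cross-correlation function):

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def best_lag(video, probe, max_lag=15):
    """Return (lag, correlation) for the shift of the probe series that
    maximizes its correlation with the video series. Sketch only."""
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:      # probe shifted earlier by |lag| positions
            v, p = video[:lag], probe[-lag:]
        elif lag > 0:    # probe shifted later by lag positions
            v, p = video[lag:], probe[:-lag]
        else:
            v, p = video, probe
        r = pearson(v, p)
        if r > best[1]:
            best = (lag, r)
    return best

# Made-up congestion dip; the probe copy lags the video by one index.
video = [50, 48, 40, 28, 20, 18, 22, 30, 42, 49, 51, 50, 49, 50, 48,
         47, 50, 49, 51, 50]
probe = [51, 50, 48, 40, 28, 20, 18, 22, 30, 42, 49, 51, 50, 49, 50,
         48, 47, 50, 49, 51]
lag, r = best_lag(video, probe, max_lag=5)
```

For the made-up series above, the search returns a lag of −1, matching the behavior described for Figure C.174: shifting the probe series one index earlier aligns the two congestion profiles.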

I-285 Southbound Results

Correlations in speed measurements from the southbound direction of travel were slightly weaker than in the northbound direction, ranging from 0.69 to 0.87. The range of correlations was greater in this direction of travel, perhaps because of the larger number of pairs. Of the 12 video detector–traffic message channel path pairs, only one (8%) had a correlation exceeding 0.8, although 10 (83%) exceeded 0.75, a good correlation. The most poorly correlated pair was located on the southern edge of the longest traffic message channel path, near Midvale Road. The best correlated pair was the path and detector located near the junction of US-78 and I-285.

Discussion

Although the video detector speeds and probe speeds correlated well with each other, a better understanding of the source of the differences in the measurements was sought. Some part of the difference is likely due to random error, but another part could be related to the locations of the video detectors and traffic message channel paths. Since each detector that sat along any part of a path was paired with that path, one source of difference could be the location of the video detector within its paired traffic message channel path. It seems reasonable to assume that a path paired with a video detector located at its midpoint would correlate better than a path paired with a video detector near the path's edge.

To investigate this hypothesis, the distance between each video detector and the midpoint of its paired traffic message channel path was calculated. These distances ranged from 0.02 to 0.27 miles in the northbound direction and from 0.01 to 0.72 miles in the southbound direction. Scatterplots were made between these distances and the correlation of the corresponding video detector and traffic message channel path for each freeway direction (see Figure C.176).
One would expect each pair's correlation to increase as the distance decreases, and indeed this negative relationship between distance and correlation appears in the southbound direction (R2 = 0.55). However, no linear relationship between correlation and distance is apparent in the northbound direction. When plotting distances and correlations from both directions of traffic together, the same approximate linear

GUIDE TO ESTABLISHING MONITORING PROGRAMS FOR TRAVEL TIME RELIABILITY

relationship that was seen in the southbound direction reemerges, with a slightly lower correlation coefficient (R2 = 0.43). This result indicates that part of the difference in the video detector and probe data speed measurements may be due to the distance between the video detector and the midpoint of the traffic message channel path.

Another way to compare two sets of speed measurements would be to simply compute the difference between them at each time point. Figure C.177 shows the difference in speed measurements for the same pair of detectors and time ranges shown in Figures C.172 and C.173. Speed measurements from this pair of detectors matched well, with a correlation coefficient of R2 = 0.85. Figure C.173 shows both speed profiles in general agreement. However, when the difference in speed measurements is plotted in Figure C.177, it can be seen that the measurements often differ by as much as 20 mph during individual 5-minute time periods. This level of discrepancy indicates that measurements from the two types of detectors may not agree at fine time resolutions, even if the detectors are properly configured and in good working order.

Figure C.176. Scatterplots comparing correlation of speed measurements with distance between detectors (midpoint distance in miles versus correlation; I-285 northbound and southbound combined, R2 = 0.429; northbound only, R2 = 0.003; southbound only, R2 = 0.547).

Figure C.177. Difference in speed measurements (video – probe).
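The correlation and difference checks described above are straightforward to reproduce. The sketch below computes R2 and the per-interval speed difference for a pair of matched 5-minute speed series (the speed values here are invented for illustration; the real analysis used matched video and probe measurements):

```python
import statistics

def r_squared(xs, ys):
    """Coefficient of determination (R^2) between two equal-length
    speed series, computed as the squared Pearson correlation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return (cov / (sx * sy)) ** 2

# Hypothetical 5-minute speeds (mph) from a video detector and a probe path.
video = [62, 58, 45, 20, 15, 35, 55, 60]
probe = [60, 57, 40, 25, 12, 30, 52, 61]

r2 = r_squared(video, probe)                   # squared correlation
diffs = [v - p for v, p in zip(video, probe)]  # video - probe differences
largest_gap = max(abs(d) for d in diffs)       # worst 5-minute disagreement
```

Even a pair of series that correlates well overall can show large per-interval gaps, which is exactly the hazard of comparing individual 5-minute periods noted above.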

That the speed difference appears to fluctuate around zero further indicates that this pair is still a good match. Since the detectors agree on the general duration and speed profile of congestion and their difference is centered at zero, their correlation will likely improve as the data are rolled up to coarser levels of temporal aggregation.

Conclusion

This use case explored the steps necessary to compare speed measurements from two types of detectors. Differences in sampling rate (3 versus 0.5 Hz), configuration basis (detector based versus traffic message channel path based), and data availability range were addressed by aggregating speed measurements at the finest available grain to 5-minute windows. Time points during which a video detector was less than 100% observed or a traffic message channel path reported the 55 mph speed ceiling were discarded. After this preprocessing, the speed values of detectors from the same roadway segment were compared by computing their correlations. The video detector speeds correlated well with probe-based speeds at the same location, particularly in terms of the magnitude of speed drops and their profiles. Thus, these disparate detector types can be used together to determine the time, duration, and extent of congestion. Additional analysis revealed that some part of the differences between the two types of measurements may be due to the distance of the video detector from the midpoint of its matched traffic message channel path. Finally, plotting the difference between two data sets reveals the hazards of comparing data from individual 5-minute periods.

LESSONS LEARNED

Overview

This case study shows that, with proper quality control and integration measures, ATMS data can be used for travel time reliability monitoring, including the linking of travel time variability with the sources of nonrecurrent congestion.
It shows that an ATMS can serve both as a source of traffic data and as a source of information on the relationship between travel time reliability and the seven sources of congestion. In evaluating the similarity between ATMS and third-party probe data, this case study also sheds light on points of consideration for integrating different data sources into a TTRMS. The remainder of this section describes lessons learned within each of these areas.

Systems Integration

The key systems integration finding from this case study is that ATMS data require significant evaluation and quality-control processing before they can be used to compute travel times and to identify the causes of unreliability. Four major issues were noted with ATMS data and metadata:

1. Sensor metadata and event data may not contain locational information at the accuracy required for travel time computation and analysis.
2. Descriptive information for sensor metadata and event data can be free form and nonstandardized.

3. Traffic data may not be received at constant sampling rates.
4. Expected data samples may be missing.

Due to the short-term nature of this case study, these issues were handled internally by the project team by changing the properties of the data collection feeds and discarding sensors and events that did not have sufficient information to allow for interpretation. For staff executing a long-term deployment of a reliability monitoring system, these issues highlight the need for a thorough understanding of the ATMS data model and processing steps. They also highlight the need for a good relationship with ATMS staff so that the necessary information can be acquired and problems resolved.

Methodological Advancement

The methodology work of this case study linked the regime-estimation work developed at the Northern Virginia case study site with the seven sources analysis developed for the San Diego site. At the San Diego site, analysis showed incidents and weather events to be leading drivers of travel time variability. On the Atlanta corridor, although incidents, weather, lane closures, and special events all contributed to the slowest and most variable travel time regimes, a large portion of travel time variability was not attributable to any of the measured seven sources. This observation indicates that, particularly for urban corridors that experience substantial recurrent congestion, the harder-to-measure sources of fluctuations in demand and inadequate base capacity are likely leading drivers of travel time variability.

Probe Data Comparison

This case study provided the first opportunity to compare speed data reported by infrastructure-based sensors with speeds obtained from a third-party data provider.
It showed that there are three main points of consideration for integrating different data sources into a reliability monitoring system:

1. Standardizing the data sampling rate (in this case study, 3 versus 0.5 Hz);
2. Standardizing the spatial aggregation of the data (in this case study, detector based versus traffic message channel path based); and
3. Handling instances of missing or low-quality data samples among the sources.

These issues must be dealt with before disparate data sources can be fused for reliability monitoring. After the necessary integration steps and the removal of any artificial speed bounds in the third-party data set (in this case study, third-party speeds were capped at 55 mph), the comparison analysis showed that the agency-owned video detection speeds correlated well with the corresponding probe-based speeds. However, results showed that speed differences between data sources may increase with the distance between the midpoint of the traffic message channel path and the infrastructure detector.
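The first two integration steps, putting feeds with different native sampling rates onto a common grid and then discarding periods where either source is missing, can be sketched as follows (timestamps and speeds are invented; the study aggregated to 5-minute windows):

```python
from collections import defaultdict

def to_five_minute_means(samples):
    """Average (timestamp_s, speed_mph) samples into 5-minute bins,
    regardless of the feed's native sampling rate."""
    bins = defaultdict(list)
    for ts, speed in samples:
        bins[int(ts // 300)].append(speed)  # 300 s = 5 minutes
    return {b: sum(v) / len(v) for b, v in bins.items()}

def common_bins(a, b):
    """Keep only bins where both sources reported, discarding
    missing-data periods (the third integration step)."""
    shared = sorted(set(a) & set(b))
    return [(blk, a[blk], b[blk]) for blk in shared]

# Hypothetical feeds: one fast-sampled detector, one sparser probe path.
video_5min = to_five_minute_means([(0, 60), (100, 62), (200, 64)])
probe_5min = to_five_minute_means([(10, 58), (310, 50)])
matched = common_bins(video_5min, probe_5min)
```

Once both sources are expressed on the same 5-minute grid, the correlation and difference analyses above can be applied directly to the matched pairs.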


Case Study 5: NEW YORK/NEW JERSEY

The New York City metropolitan area in the states of New York and New Jersey was chosen to provide insight into travel time monitoring in a high-density urban location. The main objectives of the New York/New Jersey case study included

• Obtaining time-of-day travel time distributions for a study route from probe data;
• Identifying the cause of bimodal travel time distributions on certain links; and
• Exploring the causal factors for travel times that vary significantly from the mean conditions.

The route analyzed in this case study begins in the Boerum Hill neighborhood of Brooklyn; traverses three major freeways, the Brooklyn–Queens Expressway (I-278), the Queens–Midtown Expressway (I-495), and the Van Wyck Expressway (I-678); and ends at JFK International Airport. The route is illustrated later in this section.

The monitoring system section details the reasons for selecting New York as a case study site and gives an overview of the setting. It briefly summarizes the archived probe vehicle data source and the underlying road network to which it corresponds, and it gives an overview of the approach the team took to analyze the data. The methodology section describes the steps necessary to obtain probability density functions (PDFs) of travel time distributions based entirely on probe data along a New York City route. Critically, these probe data are sparse, and no probe vehicles traverse the entire route. Techniques are presented to preserve the correlation in speed measurements on consecutive links while synthesizing the aggregate route travel time PDF (TT-PDF) from segments of multiple probe vehicle runs. The use case analysis is less theoretical and more site specific. It is motivated by the user scenarios described in Appendix D, which are the results of a series of interviews with transportation agency staff regarding agency practice with travel time reliability.
Although the methodology section of this case study describes the steps necessary to process and interpret probe vehicle data, the use case section focuses on a specific application of this methodology. This case study contains a single use case that focuses on three alternative methodologies for constructing TT-PDFs from probe data.

The lessons learned section summarizes the lessons learned during this case study with regard to all aspects of travel time reliability monitoring: sensor systems, software systems, calculation methodology, and use. These lessons will be integrated into the final Guide for practitioners.

MONITORING SYSTEM

Site Overview

The New York City site was chosen to provide insight into travel time monitoring in a high-density urban location. The 2000 U.S. Census estimated New York City's population to be in excess of 8 million residents, at a density near 26,500 people per square mile (1). Kings County (Brooklyn) is the second most densely populated county in the United States after New York County (Manhattan) (1). New York City has a low rate of auto ownership; only 55% of households had access to an automobile in 2010 (2). Among drivers of single-occupant vehicles, 53% of all commute trips take 30 minutes or more, with an average commute travel time of 31 minutes (2).

This site is covered by a probe vehicle data set provided to the research team by ALK Technologies, Inc. Probe vehicle data are collected from mobile devices inside vehicles and consist of two types of data: individual vehicle trajectories defined by time stamps and locations, and link-based speeds calculated from each vehicle's trajectory. Probe vehicle detection technology provides high-density information about the vehicle's entire path, allowing travel times to be directly monitored at the individual vehicle level. In contrast, infrastructure-based sensors such as loop detectors measure traffic only at discrete points along the roadway and do not track individual vehicles as they travel. Probe data rely on the roadway's users to generate performance data, greatly reducing detection maintenance costs to the agency. These features make probe vehicles an attractive roadway data source for agencies.
The research team obtained probe data for a region of New York City defined by a rectangular box running 25 miles east to west and 40 miles north to south. Figure C.178 shows this bounding box, which covers Manhattan, the Bronx, and Brooklyn in their entirety, along with most of Queens. Data from all roadway segments within this box were obtained. Probe runs that crossed the boundary of the box were truncated so that only the segments within the box were included in the data set; segments truncated in this way were treated as unique trips.

The data obtained for this site were a static collection of raw traces and processed speed measurements collected by probe vehicles between May 19, 2000, and December 29, 2011. No real-time data were acquired or analyzed for this case study because such data were unavailable. Unlike the other case study sites, the New York/New Jersey location did not deploy an archived data user service. All data processing and visualization were carried out through custom routines run offline.

Like roadway data from the other case studies, this probe data set was accompanied by a network configuration. A network configuration connects traffic data to the physical roadway network through a referencing system. This configuration is necessary for proper interpretation and analysis of the traffic data, such as computing

route travel times from point speeds. For these probe data, the network configuration is made up of links defined by ALK. Links are unique to a roadway segment and direction and are less than 0.1 mile long on average. Because of limitations in global positioning system (GPS) location accuracy, these links are not lane specific; as a result, link data are interpreted as the mean speed across all lanes. The full data set obtained for this case study contained 180,061 links representing 14,402 roadway miles over the 1,000-square-mile area enclosed by the bounding box (Figure C.178).

Figure C.178. Site map with data bounding box.

Data

The three probe vehicle–based data sets that contributed to this case study are listed in Table C.66. Each of these three data sets is based on the same original collection of probe vehicle runs collected in the raw data set. The raw data contain unaltered GPS sentences, as originally recorded by the probe vehicles, and were not obtained by the project team. The second data set, called gridded GPS track data (GGD), contains

most or all raw GPS points, matched to ALK's link-based network configuration. The third data set, called one monument, is an aggregation of the GGD data set. The one monument data contain a more manageable number of speed measurements that correspond with the vehicle's speed and time stamp at the midpoint of each ALK link.

TABLE C.66. PROBE VEHICLE DATA SETS
Data Set      | Description                                        | No. of Data Points   | Uncompressed Size
Raw           | Untouched NMEA sentences                           | 36,683,340 (or more) | 4.19 GB (or more)
GGD           | Data points reformatted and identified by ALK link | 36,683,340           | 4.19 GB
One monument  | One vehicle measurement per link midpoint          | 4,282,136            | 0.48 GB
Note: NMEA = National Marine Electronics Association.

The raw GPS data set was stored in the standard NMEA sentences originally recorded by the GPS device in the probe vehicle. A different file is typically created for each vehicle trip. The primary GPS data elements of interest for traffic analysis are location (latitude and longitude), speed, heading, and time stamp. GPS sampling frequency affects the temporal resolution of all three probe data sets. The data analyzed by the research team were based on GPS data recorded every 3 seconds.

The GGD data set was produced through the cleaning and map-matching routines carried out on the raw GPS data; it contains speeds on links and travel times between links, which are organized into trips. This data set is contained in a single file with entries that include time stamp, link ID, position along the link, speed, trip ID, and sequence within the trip. The organization into trips follows that of the GPS files. A gap greater than 4 minutes in a single GPS file is interpreted as the boundary between two trips made by the same vehicle. This preserves continuity in the data and ensures that only travel times (and not trip times) are represented.
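The 4-minute trip-boundary rule, together with the midpoint-based one monument aggregation described below, can be sketched as follows (the GgdPoint record is a simplified stand-in, not ALK's actual GGD layout):

```python
from dataclasses import dataclass

@dataclass
class GgdPoint:
    """Hypothetical GGD-style record; real ALK field names differ."""
    timestamp: float    # seconds since epoch
    link_id: int
    link_offset: float  # 0.0-1.0, position along the link
    speed: float        # mph

def split_trips(points, max_gap_s=240.0):
    """Split one vehicle's time-ordered points into trips wherever
    consecutive fixes are more than 4 minutes apart."""
    trips, current = [], []
    for p in points:
        if current and p.timestamp - current[-1].timestamp > max_gap_s:
            trips.append(current)
            current = []
        current.append(p)
    if current:
        trips.append(current)
    return trips

def one_monument(trip):
    """Keep, per link, the point closest to the link midpoint
    (offset 0.5), mimicking the one monument aggregation."""
    best = {}
    for p in trip:
        cur = best.get(p.link_id)
        if cur is None or abs(p.link_offset - 0.5) < abs(cur.link_offset - 0.5):
            best[p.link_id] = p
    return [best[k] for k in sorted(best)]
```

The gap threshold and midpoint rule follow the description in the text; everything else (units, field order) is illustrative.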
In this data set, each point is also map matched to a single ALK link and includes a value indicating how far along that link the point is located.

The one monument data set aggregates each trip's data points into a single time-stamped speed value for each ALK link that the trip traverses; it is a subset of the GGD data set. When there are multiple observations for the same link within a single trip, only the data point closest to the midpoint of the link is retained, and its time stamp is interpolated to the time the vehicle likely passed the link's center point. The speed values in this data set are computed from the total travel time along the link and the link's length, which effectively evens out the instantaneous speeds over the link. This data set aids travel time analysis by greatly reducing the number of data points required to compute travel times over road segments for a single trip.

The ALK links themselves are defined in three configuration files called links, nodes, and shapes. Each link lies within a cell of a rectangular grid and is uniquely identified by the combination of its grid ID and link ID. Links are bounded on either end by nodes whose coordinates are defined in the nodes file. The geometry of each link can be drawn from coordinates found in the shapes file. Additionally, links are labeled with a class identifier that corresponds to one of the following road types:

Interstate, Interstate without ramps, divided road, primary road, ferry, secondary road, ramp, and local road. Local roads make up the vast majority of the links in the network configuration.

Data Management

Analysis of this probe data set was primarily carried out on the aggregated trip-link speeds present in the one monument data. The aggregated speeds in this data set are similar in format to the traffic message channel path–based data analyzed in the Atlanta case study. The Atlanta case study compared GPS trace data with video detector data, but only after the data had been aggregated into link-based speed measurements. The complete GPS trace data in the GGD data set are the only data from any of the five case studies that trace the entire path of vehicle trips. Even though these GPS trace data are not analyzed directly in this case study, they deepen the analysis done on the one monument data and enable sophisticated computations, as described in the use case section below. The data were provided by ALK in flat files and managed by the research team manually through custom processing routines run offline. To focus on issues related to probe vehicle data processing, no additional data sources were considered.

METHODOLOGY

Overview

The central goal of this use case was to advance the understanding of practical techniques for working with probe vehicle data in travel time reliability monitoring applications. To accomplish this, the research team analyzed a collection of probe vehicle data. This section first describes the study route, illustrating how the probe data set was assembled and processed for the route and explaining the implications of data density on the resulting analysis. The section then describes methods for identifying and visualizing congestion and travel time reliability from sparse probe data.
Finally, the section lays the groundwork for computing route-level TT-PDFs, a methodological issue that is explored in depth in the use case.

Site Description

The methodological steps in this section were conducted on a 17.4-mile route in New York City that travels from the densely residential Boerum Hill neighborhood of Brooklyn to JFK International Airport. This route was chosen because it lies within a well-connected roadway network over which several alternate routes can be taken. This makes for a more interesting analysis, as drivers in the area likely base some of their travel decisions on the travel time and travel time reliability of this particular route. The route is also varied, traversing a series of arterials and three major freeways between Boerum Hill and JFK International Airport. The route begins at Atlantic Avenue and Flatbush Avenue; travels over the Brooklyn–Queens Expressway (I-278 eastbound), the Queens–Midtown Expressway (I-495 eastbound), and the Van Wyck Expressway (I-678 southbound); and ends near JFK International Airport's cell phone parking lot. This route is shown in Figure C.179, with the origin identified in white and the destination in black.

Figure C.179. Study route.

The first step in route analysis was to determine which ALK links make up the study route, which was done by visually identifying the ALK grids through which the route travels. From there, it was possible to map all Interstate-class links contained in the relevant grids and visually identify the links that make up the route. After completing this process, the team found the 17.4-mile route to be made up of 102 ALK links. The grid IDs and link IDs of these links were labeled with their order within the route and stored.

After the route links were identified, it was possible to calculate the number of data points recorded for each link. Probe data are sparser during times when fewer vehicles are traveling (i.e., at night), making certain types of time-of-day analysis more difficult. Because each data point contains a time stamp, counts of data points by link and time of day can be obtained directly from the data. The time stamps must be converted from coordinated universal time to local time (here, eastern standard time), with adjustments made for daylight saving time, before the counts can be interpreted.

Data availability on the study route during the 11-year period of coverage is displayed in Figure C.180. As the figure shows, data coverage over the route is generally quite sparse, with the most densely covered link–hour containing 71 points. Sparse

data coverage means that analysis requiring data partitioning, such as comparing weekday and weekend speeds, will likely not yield rich results. The three freeway segments have the best data coverage; coverage is sparser on the arterials near the origin, the freeway connectors, and the airport roads at the destination. Data coverage is highest in the evenings and around midday. Contributing to the sparseness of the data, no individual vehicle trips traversed the entire route from beginning to end.

Figure C.180. Quantity of data analyzed (data count per link per hour, by distance along route and time of day).

Methods

As there were no travel time records for the entire route, methodologies had to be developed to construct the route travel time distribution piecemeal from the individual link data. The advantage of this approach is that it uses the entirety of the data set, rather than a subset of long trips. Obtaining composite travel time distributions from vehicles that only traveled on a portion of the route is a complex process, primarily because, as this project has shown, travel times on consecutive links often have a strong linear dependence. This linear dependence must be accounted for when combining individual link travel times into an overall route travel time distribution. Accounting for this dependence is a core methodological challenge that is fully explored in the use case section. The research team first approached this complex topic by examining PDFs of speeds on an individual link, the results of which are presented in this section.

To understand the traffic conditions represented in the data set, time-of-day speed distributions on a single link can be plotted. Figure C.181 depicts hourly PDFs of speeds observed on Link 38 in the route (near the I-278/I-495 interchange). From this visual, it is clear that most speeds fall between 45 and 65 mph, with the exception of the p.m. peak. From 2:00 to 7:00 p.m., the speeds appear to be bimodally distributed, with a lower modal speed around 10 mph.

Figure C.181. Time-of-day speed distribution on Link 38.

With the knowledge that mixed traffic conditions occur during the afternoon period on the 38th link in the route, afternoon speeds along the entire route can be analyzed. Speed measurements on each link during the 3 to 8 p.m. commute period were obtained from the one monument data set. To illustrate speed changes along the route in the afternoon period, the median afternoon speed for each link is plotted, as shown in Figure C.182. Each link has multiple speed measurements over the 11-year study period during these hours, so speeds between the 25th and 75th percentiles for each link are shaded in gray to indicate the rough extent of each link's afternoon speed distribution. Speeds appear to dip in the middle of the freeway segments. Median speeds along the route outside of the afternoon period remain relatively high throughout the freeway segments, indicating afternoon period congestion.
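The per-link quartile computation behind Figures C.182 and C.183 can be sketched as follows (the record layout is assumed; the real inputs came from the one monument data set):

```python
from collections import defaultdict
import statistics

def linkhour_quartiles(records):
    """records: iterable of (link_index, hour_of_day, speed_mph).
    Returns {(link, hour): (p25, median, p75)}; link-hours with no
    data are simply absent (plotted as zero speed in Figure C.183)."""
    groups = defaultdict(list)
    for link, hour, speed in records:
        groups[(link, hour)].append(speed)
    out = {}
    for key, speeds in groups.items():
        # "inclusive" treats the samples as the full population span,
        # which is reasonable for small per-bin counts.
        q = statistics.quantiles(speeds, n=4, method="inclusive")
        out[key] = (q[0], q[1], q[2])
    return out
```

Sparse coverage matters here: a link-hour with only a handful of points yields unstable quartiles, which is why the text cautions against fine partitioning of these data.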

Figure C.182. Quartile speeds along route by time of day (25th percentile, median, and 75th percentile speeds by distance along route).

Next, the team examined how speeds vary across the route throughout the whole day, again considering the entire speed distribution on each link–hour. Speed measurements on each link during each hour of the day were extracted from the one monument data set, and the 25th percentile, median, and 75th percentile speeds for each link–hour were computed. The variation of speeds along the route throughout the day is presented in Figure C.183. Link–hours with no data (mostly at freeway interchanges and toward the end of the route at night) are marked with a speed of zero. The speed data appear to show three triangular regions in the afternoon period of each freeway segment. These triangular regions indicate bottleneck regions of low speeds during the afternoon commute period.

Results

Using the quartile speeds for each link throughout the day, it is possible to simulate trip trajectories along the route for any slice of the speed distribution. This is done by first choosing a virtual trip start time and then moving along the route link by link, simulating the arrival time at the next link based on the speed and length of the current link. The link speeds used to advance this simulation must correspond to the time of day in the virtual vehicle's trip. Figure C.184 shows the trajectory of trips simulated using afternoon period link mean speeds at 30-minute intervals. This type of time–space contour plot is practical in helping to identify locations or times that experience long travel times and to view how unreliable conditions affect trips at different times of the day. For example, the virtual trip departing at 5 p.m. appears to experience more congestion at the beginning of the I-678 segment than later trips do. This gives it a longer travel time than it would have experienced had it departed 30 minutes later.

Figure C.183. Route speed profile.
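The virtual-trip simulation can be sketched as follows (link lengths and the speed lookup are placeholders; the study advanced trips over median or quartile link speeds by time of day):

```python
def simulate_trip(links, speed_lookup, depart_s):
    """Walk the route link by link. links: list of (link_id, length_mi).
    speed_lookup(link_id, hour) -> mph for that time of day.
    Returns total travel time in seconds."""
    t = depart_s
    for link_id, length_mi in links:
        hour = int(t // 3600) % 24        # hour of day when entering this link
        speed = speed_lookup(link_id, hour)
        t += length_mi / speed * 3600.0   # traversal time in seconds
    return t - depart_s
```

Because the hour is re-evaluated at each link entry, a virtual trip that starts just before congestion forms will be slowed by the lower speeds it encounters downstream, which is exactly the effect visible for the 5 p.m. departure in Figure C.184.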

Figure C.184. Virtual trips simulated over median link speeds.

USE CASE ANALYSIS

The single use case evaluated in this case study is a site-specific application of the probe data processing and analysis techniques described in the methodology section. The motivation for this use case was to generate and compare travel time distributions along a route at different times of day using only probe data. The methodology section describes a technique for simulating trips based on probe speed measurements; however, these simulated trips apply only to a particular slice of the speed distribution (such as the median speed). A more complex approach is needed to measure and illustrate the variation in speeds and travel times on a route at a given time.

This use case demonstrates three methods for obtaining route travel time distributions from probe-based speed data. For continuity, analysis is performed on the route described above. The analysis in each of the three methods is performed on the one

monument data set. For this analysis, the most important variables in the data set are time stamp, speed, trip ID, and indexed position within a trip, if any (many trips are made up of a single point on the route). This yields two types of information useful for travel time analysis: (1) individual vehicle time-stamped link speeds and (2) individual vehicle link travel times, as derived from the differences in the time stamps of consecutive trip points (for trips with more than one point). The methods differ in how they use these features of the data set to construct TT-PDFs.

Method 1

The first method is the only one to use all available data elements in the one monument probe data set to construct the route TT-PDF. This method uses discrete link speeds, as well as trip-based travel times, to construct the travel time distribution for different time periods of the day. Because data coverage on the arterial links at the beginning of the route is so sparse, analysis is focused on the route from Link 17 to JFK International Airport. The method is divided into two stages: a preparatory stage and a distribution construction stage.

Preparatory Stage

In the preparatory stage, each link in the route is considered, and trips are identified that began on that link and traveled at least one link downstream on the route. The goal of this step is to calculate a link start point–to–link end point travel time for each multilink trip in the data set. Each one monument data point contains a LinkOffset value that indicates the distance along the link at which the speed value was taken (e.g., 0.5 indicates that the data point was taken at the link's midpoint).
This trip travel time calculation method uses the data point time stamps to determine the travel time between each trip's first and last links, and it uses the link speed, length, and offset to extend that travel time back to the start point of the first link and forward to the end point of the last link. For a trip that travels from Link 1 to Link n, the trip travel time is computed using Equation C.8:

TripTT = \frac{Length_1 \cdot LinkOffset_1}{Speed_1} + \left(Timestamp_n - Timestamp_1\right) + \frac{Length_n \cdot \left(1 - LinkOffset_n\right)}{Speed_n} \qquad (C.8)

This step results in a set of travel times for each link that measures trips from that link to some downstream link. The travel times were divided by time period (morning, midday, afternoon, and nighttime) and were then assembled into trip travel time distributions for each link and time period.
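Equation C.8 translates directly into code. In this sketch the argument names mirror the equation's terms, with lengths in miles, speeds in mph, and time stamps in seconds (the units are an assumption; the source does not state them):

```python
def trip_travel_time(len1_mi, off1, speed1_mph,
                     ts1_s, tsn_s,
                     lenn_mi, offn, speedn_mph):
    """Equation C.8: extend the observed first-point-to-last-point
    travel time back to the start of Link 1 and forward to the end
    of Link n, using each end link's speed, length, and offset."""
    lead_in = len1_mi * off1 / speed1_mph * 3600.0           # start of Link 1 -> first point
    observed = tsn_s - ts1_s                                 # first point -> last point
    lead_out = lenn_mi * (1.0 - offn) / speedn_mph * 3600.0  # last point -> end of Link n
    return lead_in + observed + lead_out
```

For example, a first point halfway along a 0.1-mile link at 60 mph adds 3 seconds of lead-in, and a last point a quarter of the way along a 0.2-mile link at 60 mph adds 9 seconds of lead-out to the observed time-stamp difference.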

Distribution Construction Stage
The distribution construction stage builds up the full travel time distribution along the route link by link in four steps. Each iteration of the steps adds the subsequent downstream link into the route travel time distribution. The route travel time distribution is initialized as the travel time distribution on the first link on the route, as computed from all data points on the first link. The following four steps are then carried out sequentially down the route for all links:
1. Compute the travel time distribution for the current link using all data points measured on the link.
2. Add the travel time distribution for the current link to the route travel time distribution computed in Step 4 for the upstream link, assuming independence. To add two independent distributions of data, each point of the first data set must be summed with each point of the second data set. If the size of one data set is m and the size of the other is n, the size of the data set resulting from their sum is the product of the two sizes: mn. This is equivalent to convolving the PDFs of the two independent distributions.
3. Obtain the set of travel times computed in the preparatory stage that end at the current link, and merge their adjusted data sets to the data set of the route travel time distribution computed in Step 2. The adjusted data set will have been computed in Step 4 for a previous link.
4. For all trips that start at the downstream link, add the route travel time distribution computed in Step 3 to their travel time. This adjusts these travel times such that they represent the travel time distribution between the beginning of the route and the end of the trip.
The resulting TT-PDFs computed using this method for four time periods are shown in Figure C.185. The odd multimodal distribution of the 10 p.m. to 12 a.m.
travel times is due to a proportionally larger number of trip-based speeds than discrete link speeds at night. At other times of the day, the number of link speeds overwhelms the number of trip-based speeds, smoothing out the effects of individual trips.

Method 2
The second method for computing route TT-PDFs ignores the linear dependence between consecutive links and directly computes the route travel time distribution as if all link travel times were independent. This method is based entirely on directly observed link speeds, discarding the time stamp differences between points in the same trip. It works by simply convolving the distributions of travel times on consecutive links down the route. For example, the frequency distribution of travel times on the first link is added to the frequency distribution of travel times on the second link, and so on until a full travel time distribution for the entire route is obtained. The resulting TT-PDFs computed using this method for four time periods are shown in Figure C.186. This is the simplest route TT-PDF creation method considered in this case study. Here every single measurement is treated as independent of all others, ignoring all trip relationships between points. As in Method 1, travel time distributions are computed for four time periods during the day. With the trip-based travel times discarded, the outlying spikes in the 10 p.m. to 12 a.m. travel time distribution are no longer seen.
[Figure C.185. Route PDF generation using Method 1: travel time distributions for 7 to 9 a.m., 12 to 2 p.m., 5 to 7 p.m., and 10 p.m. to 12 a.m.]

The speeds between 5 and 7 p.m. appear to be shifted by roughly the same amount as seen in Method 1. The 7 to 9 a.m. time period and the 12 to 2 p.m. time period appear to have similar bimodality to that generated by Method 1.
[Figure C.186. Route PDF generation using Method 2: travel time distributions for 7 to 9 a.m., 12 to 2 p.m., 5 to 7 p.m., and 10 p.m. to 12 a.m.]
Method 3
The third method developed for constructing route TT-PDFs computes and leverages the correlation between speeds on consecutive links within a trip. This method, which only requires speeds measured from trips that traveled on multiple links, uses the fewest one monument data elements. It builds route TT-PDFs by simulating trips along a route, taking into account the measured data on each link, as well as synthesized trips based on observed data and computed incidence matrices. It builds up travel times link by link. As with the previous two methods, due to the lack of data on the arterials near the beginning of the route, the team began the route on Link 17. The method begins by computing incidence matrices for each pair of consecutive links. These incidence matrices describe the correlation in speeds between the two links. To construct the incidence matrices, evenly spaced bins are defined to group the speed

data for each link. In this use case, 10 bins between 0 and 80 mph (each bin is 8 mph wide) were used. A two-dimensional incidence matrix is created for each pair of consecutive links to capture the nature of the speed relationship between the two links within different bins. Speed bins on Link 1 are represented in the incidence matrix's rows, and speed bins on Link 2 are represented in its columns. Because 10 bins were used in this use case, all incidence matrices are 10 × 10. Consider an incidence matrix for two consecutive links: Link 1 and Link 2. The incidence matrix describes the likelihood of a speed on Link 2 occurring given a speed on Link 1. The entry in the (m, n) cell of this incidence matrix contains the quantity of Link 2 speed measurements that fell into the nth bin when the Link 1 speed came from the mth bin. The counts in the cells of the incidence matrix become synthesized trip points for each observed data point on Link 1. For example, suppose a single Link 17 speed observation falls within the fourth speed bin, and the incidence matrix for Links 17 and 18 lists two speeds in the fifth bin and three speeds in the fourth bin on Link 18 following a fourth-bin speed on Link 17. This single observed speed on Link 17 has resulted in five pairs of speeds across Links 17 and 18 (two between it and the fifth speed bin, and three between it and the fourth speed bin). These five speed pairs can be thought of as synthesized trips between the two links because they capture the correlation between speeds on the two consecutive links while using the observed data. This process is repeated for each observed speed on Link 17, and all synthesized trips over the first two links are recorded.
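The binning and counting just described can be sketched as follows. This is a sketch under stated assumptions: the function names, the flat pair arrays, and the clamping of out-of-range speeds into the first or last bin are choices made here, not details taken from the report.

```c
#include <stdio.h>

#define BINS 10         /* 10 bins between 0 and 80 mph, as in the case study */
#define BIN_WIDTH 8.0   /* each bin is 8 mph wide */

/* Maps a speed in mph to its bin index; out-of-range speeds are clamped
   (an assumption -- the report does not say how they were handled). */
static int speed_bin(double mph) {
    int b = (int)(mph / BIN_WIDTH);
    if (b < 0) b = 0;
    if (b > BINS - 1) b = BINS - 1;
    return b;
}

/* Builds the incidence matrix for one pair of consecutive links from
   npairs observed (upstream speed, downstream speed) trip pairs:
   cell (m, n) counts downstream speeds in bin n that followed an
   upstream speed in bin m. */
void build_incidence(const double *up, const double *down, int npairs,
                     int matrix[BINS][BINS]) {
    for (int i = 0; i < BINS; i++)
        for (int j = 0; j < BINS; j++)
            matrix[i][j] = 0;
    for (int k = 0; k < npairs; k++)
        matrix[speed_bin(up[k])][speed_bin(down[k])]++;
}
```

An observed Link 17 speed in bin m then spawns one synthesized Link 18 speed per count in row m, which is how the method turns single-link observations into synthesized trips.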
To continue the process on the next pair, Links 18 and 19, the speeds on Link 18 resulting from the incidence matrix technique described above (there were five speeds in the example) are combined with the directly observed speeds on Link 18. This collection of speeds is then subjected to the same incidence matrix procedure to obtain synthesized Link 19 speeds for each speed on Link 18 that was either directly observed or synthesized from Link 17's directly observed speeds. When the final link in the route is reached in this way, the speed on each link in each synthesized trip can be used to obtain its travel time. The distributions of these travel times calculated at different times of day are shown in Figure C.187. Since each preceding link speed generates multiple speeds for the following link, this method generates a large amount of data very quickly. To keep the travel time data set manageable, the growing data set of synthesized speeds was periodically reduced to a random sample whenever it grew too large to efficiently process. The multimodal pattern seen in the 10 p.m. to 12 a.m. data from Method 1 is even more pronounced in travel times synthesized with this method. Both of these methods leverage individual trip travel times across multiple links. The low quantity of data at night exaggerates the influence of individual trips on the data, creating these spikes. This method produces very narrow travel time distributions that are offset slightly from those generated by the other two methods. Here it can be seen that travel times during the morning and midday time periods are faster by 5 minutes compared with the other methods, with dramatically fewer long travel times. The 5 to 7 p.m. travel time distribution is again the most widely distributed, but travel times are shifted to the right (slower) by 10 minutes compared with the results from Methods 1 and 2.

Conclusions
Each of the three methods presented for assembling route TT-PDFs from probe vehicle data is enabled by the techniques introduced in the methodology section. Constructing these PDFs requires identification of the data points corresponding to a particular route, separation of data by time of day when possible, and an understanding of the relationships between link speed distributions and route speed distributions. These tools, combined with the research team's findings related to speed correlations between consecutive links within a trip, led to the development of these three PDF-generation methods.
Methods 1 and 2 compared well with each other, but the results of Method 3 differed in terms of travel time magnitude and variability. The differences in the shapes of the distributions across methods, particularly in the nighttime period when data were sparse, demonstrate the strong influence of the correlations of speeds along consecutive links within a route. With most of the nighttime coverage made by full trips composed of two or more points, the time stamp–based travel times dominated the nighttime data set. The modes of these unusually shaped distributions reveal individual trips in the data.
[Figure C.187. Route PDF generation using Method 3: travel time distributions for 7 to 9 a.m., 12 to 2 p.m., 5 to 7 p.m., and 10 p.m. to 12 a.m.]

Although results were not validated with a different data source, the PDFs generated using Methods 1 and 2 appear to match expectations. An online trip planner estimates the travel time on this route to be 28 minutes, which generally agrees with the distributions seen here. They resemble typical route travel time distributions, even though no trips were observed traveling along the entire route. It is possible to extract quantitative travel time reliability metrics from the time-of-day travel time distributions compiled and presented in this section. Knowing the distribution of travel times on a route enables the data user to compute any reliability metric, such as planning time or buffer time.
LESSONS LEARNED
Overview
This case study demonstrates that it is possible to obtain trip reliability measures based on probe data, even when the probe data are sparse. The travel time distribution for the route is constructed from vehicles that travel on only a portion of the route, and it takes into account the linear dependence of speeds on consecutive links. This case study also contributes techniques for creating time–space contour plots based on probe speeds. These contour plots can be made to represent any measured speed percentile, so that contours for the worst observed conditions can be compared with typical conditions.
Probe Data Characteristics
Much of this case study effort focused on understanding the aggregation steps used to convert data from GPS receivers into link-based speeds. Understanding the way raw GPS data are processed and aggregated is vital for proper interpretation of data elements. It also enables all components of the data set to be used to increase the richness of TT-PDFs. As probe data find wider adoption for travel time monitoring, it is important for users to understand that data from these sources are still sparse.
This sparseness necessitates complex processes for determining travel time distributions on routes of interest. When GPS and other technologies reach a certain penetration rate in the population and more vehicles traverse entire routes, the assemblage of route travel time distributions will be simplified. However, the construction of well-formed PDFs requires that every element in the data set (from speeds on single links to complex travel times across multiple links) should be used to generate the distribution. Probe data sparseness also increases the minimum level of temporal aggregation that can be supported by the data set. For example, in this case study, the quantity of data was not sufficient to measure route travel time reliability at a granularity of 5 minutes, which was the common reporting unit for case studies that relied on loop detector data. Instead, aggregation had to be done at the peak period, multihour level. Additionally, in this case study, weekend trips could not be removed from the data set, as there were not sufficient weekday data points to generate full PDFs. Finally, in order to generate the presented results, all data points collected over the 11-year span of the data set had to be used. In practice, this long time frame does not allow for

trend analysis. Transportation planners and operators often require an understanding of how route travel times vary on a day-by-day, week-by-week, and month-by-month basis. One probe data characteristic that counteracts the sparseness problem is that data coverage is highest during the time periods when and at the locations where the most vehicles are traveling on the roadway. These are also the time periods and locations at which reliability monitoring is the most critical. As probe technologies become more common in vehicles, the availability of data points and route-level trip data will naturally increase, resulting in richer data sets that can be analyzed at a finer-grained interval than was possible in this case study.

TRB’s second Strategic Highway Research Program (SHRP 2) Report S2-L02-RR-2: Guide to Establishing Monitoring Programs for Travel Time Reliability describes how to develop and use a Travel Time Reliability Monitoring System (TTRMS).

The guide also explains why such a system is useful, how it helps agencies do a better job of managing network performance, and what a traffic management center (TMC) team needs to do to put a TTRMS in place.

SHRP 2 Reliability Project L02 has also released Establishing Monitoring Programs for Travel Time Reliability, which describes what reliability is and how it can be measured and analyzed, and Handbook for Communicating Travel Time Reliability Through Graphics and Tables, which offers ideas on how to communicate reliability information in graphical and tabular form.

A related paper in TRB's Transportation Research Record, “Synthesizing Route Travel Time Distributions from Segment Travel Time Distributions,” examines a way to synthesize route travel time probability density functions (PDFs) on the basis of segment-level PDFs in Sacramento, California.



Nested Loops in C Programming: Examples and Use Cases

  • Key Takeaways

Clearly define loop conditions to prevent infinite loops and ensure loops terminate as expected.

Always update loop variables correctly within the loop to avoid running into infinite loops.

Carefully check and test loop boundaries to avoid off-by-one errors, which can lead to incorrect results or crashes.

Leverage debugging tools to step through your code and monitor loop behavior, making it easier to spot and fix errors.

Break down complex nested loops into smaller, manageable functions to simplify debugging and enhance code readability.

Develop comprehensive test cases, especially for edge conditions, to catch and fix errors early in the development process.

Programming is a meticulous task, and even seasoned developers can fall prey to common errors like infinite loops and off-by-one mistakes. These errors can lead to significant issues, from performance problems to complete system failures. How can you ensure that your loops run efficiently and correctly, avoiding these common pitfalls? In this article, we’ll explore these typical mistakes and provide practical tips for debugging and preventing them, helping you write more reliable code.

Introduction to Nested Loops in C

Nested loops are a fundamental concept in C programming that involve placing one loop inside another. This technique allows developers to execute a set of instructions repeatedly for each iteration of an outer loop. The outer loop runs through its sequence, and for each iteration, the inner loop completes its entire sequence. This process continues until both loops have executed their respective iterations. Understanding and utilizing nested loops effectively can lead to efficient and powerful code, especially when dealing with multi-dimensional data structures or complex algorithms.

  • What are Nested Loops?

Nested loops occur when a loop is placed inside another loop. This means that for each iteration of the outer loop, the inner loop will execute all its iterations. For example, if the outer loop runs ten times and the inner loop runs five times, the inner loop will execute a total of fifty times. This structure is useful for tasks that require multiple levels of repetition, such as iterating through multi-dimensional arrays or generating combinations of elements.

  • Why Use Nested Loops?

Nested loops are particularly useful when dealing with problems that require multi-dimensional iteration. For instance, in matrix operations, nested loops allow for the traversal of rows and columns efficiently. They are also essential in scenarios where you need to perform repetitive tasks within another set of repetitive tasks, such as processing items in a nested list or array. The ability to nest loops provides a powerful tool for managing complex data structures and implementing algorithms that require multiple layers of iteration.

  • Basic Structure of Nested Loops

The basic structure of nested loops in C involves placing one loop inside another. The outer loop typically controls the number of iterations of the inner loop. Here’s a simple example:

In this example, the outer loop runs three times, and for each iteration of the outer loop, the inner loop runs two times. This results in the inner loop executing a total of six times. The structure of nested loops can be extended to more levels, allowing for even more complex iterations, depending on the requirements of the program.

Nested loops are a versatile tool in C programming, enabling the handling of complex tasks with multiple levels of iteration. By mastering nested loops, programmers can write more efficient and effective code for a wide range of applications.

Understanding Nested Loop Control Flow

  • Loop Structure and Iteration

Nested loops involve placing one loop inside another. The outer loop controls the number of complete iterations of the inner loop. Each time the outer loop executes once, the inner loop runs through all its iterations. This structure is common in scenarios where multiple levels of iteration are necessary, such as in multi-dimensional array processing or generating complex patterns.

  • Code Example (Simple Nested for Loops)

Here’s a basic example of nested loops in C:


This code features two for loops. The outer loop, controlled by i , runs three times. Each iteration of the outer loop triggers the inner loop, controlled by j , which also runs three times.

  • Explanation of Loop Progression

In the provided example, the outer loop starts with i = 1 . For this value of i , the inner loop runs from j = 1 to j = 3 , printing the values of i and j each time. Once the inner loop completes its three iterations, the outer loop increments i to 2, and the inner loop runs again from j = 1 to j = 3 . This process continues until the outer loop completes all its iterations. This control flow ensures that every combination of i and j within the specified range is printed, demonstrating the comprehensive iteration mechanism of nested loops.

Practical Examples of Nested Loops in C

  • Example 1: Multiplication Table

Nested loops are essential for generating multiplication tables in C programming. The outer loop iterates through the rows, representing the multiplicand. The inner loop runs through the columns, representing the multiplier. Each iteration of the inner loop computes the product of the current row and column indices, storing or displaying the result. This approach efficiently produces a complete multiplication table, demonstrating the power and simplicity of nested loops in handling repetitive tasks in a structured manner.

  • Example 2: Matrix Operations

Matrix operations, such as addition, subtraction, and multiplication, heavily rely on nested loops. In C, performing matrix multiplication involves three nested loops. The outer loop iterates over the rows of the first matrix, the middle loop iterates over the columns of the second matrix, and the innermost loop performs the multiplication and summation of corresponding elements. This structure allows for precise element-wise operations, showcasing the versatility of nested loops in handling complex mathematical computations.

  • Example 3: Pattern Printing (e.g., Pyramids, Diamonds)

Pattern printing, such as creating pyramids or diamond shapes, is another common application of nested loops in C programming. The outer loop controls the number of rows, while the inner loop manages the number of spaces and stars printed on each row. By adjusting the conditions and increments of these loops, various intricate patterns can be generated. This example illustrates how nested loops can be used creatively to produce visually appealing outputs, reinforcing their utility in both academic exercises and real-world applications.

Use Cases of Nested Loops in Real-World Applications

  • Sorting Algorithms (e.g., Bubble Sort, Selection Sort)

Nested loops play a crucial role in sorting algorithms like Bubble Sort and Selection Sort. In Bubble Sort, the outer loop iterates through the entire list, while the inner loop compares adjacent elements, swapping them if necessary. This process continues until the list is sorted. Selection Sort, on the other hand, uses nested loops to find the minimum element in the unsorted portion of the list and swap it with the first unsorted element. These sorting algorithms demonstrate the importance of nested loops in organizing data efficiently.

  • Searching Algorithms

Searching algorithms often utilize nested loops to locate specific elements within a dataset. For instance, in a linear search through a multi-dimensional array, the outer loop iterates through rows, while the inner loop searches through columns. This nested structure allows the algorithm to traverse and examine each element systematically. Similarly, in more complex searching algorithms like Depth-First Search (DFS) or Breadth-First Search (BFS) in graph theory, nested loops are employed to explore nodes and edges, ensuring thorough examination and accurate results.

  • Data Analysis

Nested loops are indispensable in data analysis tasks, particularly when dealing with multi-dimensional data structures. For example, in matrix multiplication, the outer loop iterates over rows of the first matrix, the middle loop over columns of the second matrix, and the innermost loop performs the multiplication and addition of corresponding elements. This nested loop structure enables the efficient processing of large datasets, ensuring accurate and timely analysis. Additionally, nested loops are used in statistical analysis, such as calculating correlation coefficients or performing regression analysis on multi-variable datasets.

  • Data Processing

In data processing, nested loops are often employed to transform and manipulate complex data structures. For instance, in image processing, nested loops can iterate over pixels in a two-dimensional grid, applying filters or transformations to enhance or modify the image. Similarly, in text processing, nested loops can traverse through characters and words in a document, performing tasks like tokenization, stemming, or sentiment analysis. These applications highlight the versatility of nested loops in handling diverse data processing tasks, making them a fundamental tool in the toolkit of data scientists and engineers.

Performance Optimization Techniques

  • Identifying Bottlenecks in Nested Loops

Nested loops can significantly impact the performance of a program, especially when dealing with large datasets. Identifying bottlenecks in nested loops involves understanding the complexity and execution flow of the loops. Profiling tools can help pinpoint which parts of the code are consuming the most resources. By analyzing the loop’s execution time, memory usage, and iterations, developers can identify inefficient operations and optimize them. This process often involves breaking down the nested loops into simpler components and examining each part for potential improvements.

  • Loop Unrolling and Other Optimization Techniques

Loop unrolling is a common optimization technique that reduces the overhead associated with loop control. By increasing the number of operations within a single iteration, loop unrolling decreases the number of iterations needed, which can lead to significant performance gains. Other optimization techniques include minimizing the use of expensive operations within loops, using efficient data structures, and parallelizing the loop’s execution where possible. These techniques aim to reduce the overall computational load and improve the program’s runtime efficiency.

  • Real-World Examples of Optimized Nested Loops

Real-world examples of optimized nested loops can be found in various applications, such as image processing , scientific computing, and database querying. For instance, in image processing, optimizing nested loops can accelerate tasks like convolution and filtering. By applying techniques like loop unrolling and memory access optimization, developers can achieve faster processing times. Similarly, in scientific computing, optimized nested loops can enhance the performance of simulations and numerical computations. These examples demonstrate how targeted optimization techniques can lead to substantial improvements in real-world applications.

Common Mistakes and How to Avoid Them

  • Infinite Loops

Infinite loops are a common mistake in programming where a loop runs indefinitely, causing the program to freeze or crash. This typically happens due to incorrect loop conditions or failure to update loop variables. To avoid infinite loops, ensure that your loop conditions are properly defined and that loop variables are updated correctly within the loop. Use debugging tools to step through your code and verify the loop’s behavior. Regularly test your code to catch any infinite loops early in the development process.

  • Off-by-One Errors

Off-by-one errors occur when a loop iterates one time too many or one time too few, usually due to incorrect initialization or termination conditions. These errors can lead to incorrect results or program crashes. To avoid off-by-one errors, carefully check the loop boundaries and ensure that they match the intended number of iterations. Use clear and consistent indexing practices and leverage tools like debuggers to monitor the loop’s execution. Writing test cases that cover edge conditions can also help catch these errors.

  • Tips for Debugging Nested Loops

Nested loops can be particularly challenging to debug due to their complexity and potential for multiple layers of errors. To effectively debug nested loops, start by isolating the outer loop and verifying its correctness before moving to the inner loops. Use print statements or a debugger to track the values of loop variables at each level. Simplify the loops by breaking them into smaller functions if possible. Additionally, ensure that the termination conditions for each loop are correctly defined and tested.

By understanding and addressing these common mistakes, you can write more reliable and efficient code. Carefully defined loop conditions, correctly updated loop variables, and thorough testing minimize infinite loops and off-by-one errors, while a systematic approach to debugging nested loops (isolating each loop and tracking variable values) keeps multi-level iteration under control. Applying these practices consistently leads to more robust and maintainable programs.

  • Q: What are nested loops in Python?

Nested loops in Python involve placing one loop inside another. This allows for more complex iteration, such as iterating over multi-dimensional arrays or creating patterns.

  • Q: How do you write a nested loop in Python?

A nested loop in Python can be written using the for or while loops, where an inner loop is placed within the body of an outer loop, executing the inner loop completely for each iteration of the outer loop.

  • Q: What are some common use cases for nested loops in Python?

Nested loops in Python are commonly used for tasks like matrix operations, pattern printing, and working with multi-dimensional data structures like lists of lists.

  • Q: What are nested loops in Java?

Nested loops in Java involve placing one loop inside another. They are used to perform complex iterations, such as traversing two-dimensional arrays or generating pattern-based outputs.

  • Q: How do you write a nested loop in Java?

In Java, a nested loop is written by placing one for, while, or do-while loop inside the body of another loop, enabling the inner loop to execute completely for each iteration of the outer loop.

  • Q: What are common use cases for nested loops in Java?

Nested loops in Java are used in algorithms like bubble sort, matrix manipulation, and pattern printing, where multiple levels of iteration are necessary.
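The ideas in these FAQs are language-independent. As a concrete reference, here is the same nested-loop structure written in C, the language most of this page focuses on: the outer loop selects a row, and the inner loop runs completely for each outer iteration, producing a star-pyramid pattern (print_triangle is an illustrative name):

```c
#include <stdio.h>

/* Nested-loop pattern printing: the outer loop picks the row, the inner
 * loop prints that many stars.  Returns the total number of stars
 * printed, i.e. 1 + 2 + ... + n. */
int print_triangle(int n) {
    int stars = 0;
    for (int row = 1; row <= n; row++) {       /* outer: one pass per row */
        for (int col = 1; col <= row; col++) { /* inner: `row` stars */
            putchar('*');
            stars++;
        }
        putchar('\n');
    }
    return stars;
}
```

For n = 4 this prints four rows containing 1, 2, 3, and 4 stars, and returns 10, the total star count.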



Getting Started with Data Structures in C


What are Data Structures in C?

Have you begun to learn C Programming? Are you aware of the data structures used in C language? Data Structures in C provide a method for storing and organizing data in the computer memory. A data structure is a fundamental building block for all critical operations performed on data. Effective utilization of the data structures leads to program efficiency.

In this C tutorial, we'll delve deep into the data structures used in the C language and understand the various types of data structures with examples. By the end of the tutorial, you'll be able to differentiate data structures based on their characteristics. Let's get started.

Types of Data Structures in C

Data structures in C fall into two broad classes: primitive (the built-in types such as int, char, and float) and non-primitive. Non-primitive data structures are further divided into linear structures (arrays, linked lists, stacks, and queues) and non-linear structures (trees and graphs).

1. Arrays

An array is a collection of elements of the same data type stored in contiguous memory locations. There are two types of arrays in the C language:


  • Single-dimensional arrays: Here the elements are stored in a single row in contiguous memory locations. They are also known as 1D arrays.
  • Multi-dimensional arrays: Here the elements are stored across multiple rows and columns, as in 2D and 3D arrays.
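A minimal sketch of a single-dimensional array in C (sum_marks is an illustrative helper): the five elements occupy contiguous memory, so element i is reached in constant time.

```c
/* Single-dimensional array: 5 elements stored contiguously.
 * Element i lives at address &marks[0] + i, so access is O(1). */
int sum_marks(void) {
    int marks[5] = {90, 86, 89, 76, 91};
    int total = 0;
    for (int i = 0; i < 5; i++)   /* indices run 0 .. 4 */
        total += marks[i];
    return total;
}
```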

2. Linked Lists

A Linked List is a linear data structure consisting of a series of connected nodes randomly stored in the memory. Here, each node consists of two parts, the first part is the data and the second part contains the pointer to the address of the next node. The pointer of the last node of the linked list consists of a null pointer, as it points to nothing. The elements are stored dynamically in the linked list.

Representation of Linked Lists
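A minimal sketch of how a singly linked list is typically declared and traversed in C (make_node and list_length are illustrative helpers, not part of any standard API):

```c
#include <stdlib.h>

/* A node holds data plus a pointer to the next node; the last node's
 * pointer is NULL.  This mirrors the description above: nodes can live
 * anywhere in memory and are stitched together by pointers. */
struct Node {
    int data;
    struct Node *next;
};

/* Build a node on the heap (hypothetical helper for this sketch). */
struct Node *make_node(int value, struct Node *next) {
    struct Node *n = malloc(sizeof *n);
    n->data = value;
    n->next = next;
    return n;
}

/* Walk the list until the NULL terminator and return its length. */
int list_length(const struct Node *head) {
    int len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}
```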

Graphs

A graph is a non-linear data structure consisting of a set of vertices connected by edges. There are two ways to represent a graph in C:

Example of Adjacency Matrix Representation of Graph in C
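A minimal sketch of such a program, following the accompanying description (a matrix of size 10, a function to add an edge, and a routine to print the matrix; printMatrix is an illustrative name):

```c
#include <stdio.h>

#define V 10            /* matrix size, as in the description */

int adj[V][V];          /* adjacency matrix; globals are zero-initialised */

/* Add an undirected edge between vertices u and v. */
void addEdge(int u, int v) {
    adj[u][v] = 1;
    adj[v][u] = 1;
}

/* Print the top-left n x n corner of the adjacency matrix. */
void printMatrix(int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            printf("%d ", adj[i][j]);
        printf("\n");
    }
}
```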

The above program demonstrates the creation of a graph using an adjacency matrix. First, we initialize a matrix of size 10. Then we define a function to add an edge to the graph. Finally, we print the adjacency matrix.

Example of Adjacency List Representation of Graph in C
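A minimal sketch built around the createGraph(), addEdge(), and printGraph() functions that the accompanying walkthrough names (the struct layout here is an assumption of this sketch):

```c
#include <stdio.h>
#include <stdlib.h>

/* Adjacency-list representation: each vertex owns a linked list of its
 * neighbours. */
struct AdjNode {
    int vertex;
    struct AdjNode *next;
};

struct Graph {
    int numVertices;
    struct AdjNode **lists;   /* one list head per vertex */
};

/* Allocate the graph and its (initially empty) adjacency lists. */
struct Graph *createGraph(int vertices) {
    struct Graph *g = malloc(sizeof *g);
    g->numVertices = vertices;
    g->lists = calloc(vertices, sizeof *g->lists);  /* all heads NULL */
    return g;
}

/* Add a directed edge src -> dest by pushing dest onto src's list. */
void addEdge(struct Graph *g, int src, int dest) {
    struct AdjNode *node = malloc(sizeof *node);
    node->vertex = dest;
    node->next = g->lists[src];
    g->lists[src] = node;
}

/* Display the adjacency list of each vertex. */
void printGraph(const struct Graph *g) {
    for (int v = 0; v < g->numVertices; v++) {
        printf("%d:", v);
        for (struct AdjNode *p = g->lists[v]; p != NULL; p = p->next)
            printf(" -> %d", p->vertex);
        printf("\n");
    }
}
```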

The above code defines structures for nodes and lists. The createGraph() function initializes the graph, addEdge() adds edges between vertices and printGraph() displays the adjacency list of each vertex.

Tree data structure forms a hierarchy containing a collection of nodes such that each node of the tree stores a value and a list of references to other nodes (the "children"). The topmost node of the tree is called the root node from which the tree originates, and the nodes below it are called the child nodes.


Syntax to Declare a Tree in C
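A typical way to declare a binary tree node in C looks like the following (count_nodes is an illustrative helper added for completeness; a general tree would keep a list of child pointers instead of exactly two):

```c
#include <stddef.h>

/* A binary tree node: a value plus pointers to the left and right
 * children.  Leaves have both child pointers set to NULL. */
struct TreeNode {
    int data;
    struct TreeNode *left;
    struct TreeNode *right;
};

/* Count the nodes in a (possibly empty) tree: root, then both subtrees. */
int count_nodes(const struct TreeNode *root) {
    if (root == NULL)
        return 0;
    return 1 + count_nodes(root->left) + count_nodes(root->right);
}
```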

Some popular types of trees in data structures are the binary tree, binary search tree, AVL tree, and B-tree. In a binary tree, each node has at most two children, conventionally called the left and right child.

Choosing the Right Data Structure in C

  • Understand Requirements: Start by thoroughly understanding the requirements of your application like the type of data to be stored, the frequency and nature of data operations, memory constraints, and expected performance metrics.
  • Analyze Operations: Analyze the operations that need to be performed on the data, including searching, sorting, updating, and traversing, and how frequently each one occurs.
  • Consider Time and Space Complexity: Evaluate the time and space complexity of the type of operations for various data structures.
  • Size of Data Set: Different data structures and algorithms have different performance characteristics when operating on different size data sets.

In this article, we covered the prominent data structures in C with examples: the two broad classifications of data structures (primitive and non-primitive) and the linear and non-linear non-primitive structures in depth. Enroll in our C Programming Course to learn C programming in complete detail.

Q1. What are data structures in C?

Q2. What is the function of data structures in C?


C++ is a high-performance and general-purpose programming language that has been widely used in various industries and research fields. It is a compiled language that allows developers to create efficient and robust applications for different platforms. With its unique features like memory management, object-oriented programming, and generic programming, C++ has been the go-to language for many industries and research domains. In this blog, we will discuss some real-world applications of C++ and their use cases.

Gaming Industry

The gaming industry is one of the biggest consumers of C++. The language’s high-performance and low-level control make it ideal for game development. C++ allows developers to create games that can run on different platforms, including desktop, mobile, and gaming consoles. Some popular games developed using C++ include Call of Duty, Grand Theft Auto, and World of Warcraft.

Operating Systems

C++ has played a significant role in developing operating systems like Windows and MacOS. The language’s low-level programming features make it possible to write operating system components that require high performance and efficient memory management. Large parts of Windows, for instance, are written in a mix of C and C++. (The Linux kernel, by contrast, is written almost entirely in C.)

Robotics

Robotics is another area where C++ is heavily used. The language’s real-time processing capabilities make it ideal for developing robot control software. C++ allows developers to create efficient and reliable control algorithms for robots. Some applications of C++ in robotics include autonomous driving, industrial automation, and smart factories.

Finance

C++ is widely used in the finance industry to develop high-performance trading systems, risk management tools, and other financial applications. The language’s speed and low-level control make it ideal for real-time trading systems that handle large volumes of data. Examples of C++ use in finance include the Bloomberg Terminal and many proprietary trading platforms.

Scientific Computing

C++ is a popular language for scientific computing, particularly in high-performance computing applications. The language’s low-level control and efficient memory management make it ideal for developing scientific simulations, numerical analysis, and other computational applications. Some popular scientific computing libraries written in C++ include Boost, Armadillo, and Eigen.

Image and Video Processing

C++ is widely used in image and video processing applications, particularly in the computer vision field. The language’s low-level control and speed make it ideal for developing real-time computer vision applications. Some examples of C++ applications in image and video processing include OpenCV, VTK, and ITK.

Aerospace

The aerospace industry is another area where C++ is heavily used. The language’s high performance and low-level control make it ideal for developing flight control software, avionics systems, and other aerospace applications. Examples include flight software for NASA’s Mars rovers and SpaceX’s Falcon 9 rocket.

Machine Learning and Artificial Intelligence

C++ is becoming increasingly popular in the field of machine learning and artificial intelligence. The language’s speed and low-level control make it ideal for developing complex algorithms that require high-performance computing. Some popular machine learning frameworks written in C++ include TensorFlow and Caffe.

Electronic Design Automation (EDA)

EDA is another industry where C++ is widely used. The language’s low-level control and efficient memory management make it ideal for developing electronic design automation software. Examples include tools from Cadence Design Systems and Synopsys.

Automotive Industry

The automotive industry is another area where C++ is heavily used. The language’s real-time processing capabilities make it ideal for developing automotive software that requires high performance and reliability. Some examples of C++ applications in the automotive industry include engine control units, infotainment systems, and autonomous driving.

Video Game Engines

Apart from developing games, C++ is also heavily used in developing video game engines. Game engines provide game developers with a framework to create games quickly and efficiently. Some popular game engines written in C++ include Unreal Engine and CryEngine.

Banking and Trading Systems

C++ is widely used in developing banking and trading systems. The language’s speed and low-level control make it ideal for developing high-performance trading platforms and risk management tools. Examples include trading and matching systems used at institutions such as the New York Stock Exchange and Goldman Sachs.

Computer Graphics

C++ is a popular language for developing computer graphics applications. The language’s low-level control and speed make it ideal for developing real-time rendering engines and graphical user interfaces. Much of this work is built on graphics APIs such as OpenGL and DirectX.

Medical Imaging

C++ is also widely used in medical imaging applications. The language’s efficient memory management and real-time processing capabilities make it ideal for developing medical imaging software that can handle large volumes of data. Some examples of C++ applications in medical imaging include OsiriX and 3D Slicer.

Conclusion:  C++ is a versatile language that has a wide range of applications in various industries and research domains. Its speed, memory management, and low-level control make it ideal for developing high-performance and reliable applications in fields ranging from gaming to finance to aerospace. As technology continues to evolve, it is likely that C++ will continue to play a significant role in shaping the future of these industries.

Take your C++ skills to the next level with LearnTube’s comprehensive online courses. LearnTube is a safe and reliable platform that provides a variety of powerful learning tools, including a dedicated app and a WhatsApp bot, to enhance your learning experience. Whether you are a beginner or an advanced learner, LearnTube offers a broad range of C++ courses, from introductory to advanced certifications. Browse our website today to explore the extensive selection of courses available on LearnTube and elevate your C++ proficiency to new heights.

Team LearnTube



C Programs: Practicing and solving problems is the best way to learn anything. Here, we have provided 100+ C programming examples in different categories: basic C programs, the Fibonacci series in C, strings, arrays, base conversion, pattern printing, pointers, and more. These C programs cover the most frequently asked interview questions, from basic to advanced level.

C Programs

C Program Topics:

  • Basic C Programs
  • Control Flow Programs
  • Pattern Printing Programs
  • Functions Programs
  • Arrays Programs
  • Strings Programs
  • Conversions Programs
  • Pointers Programs
  • Structures and Unions Programs
  • File I/O Programs
  • Date and Time Programs
  • More C Programs

   

C Program – Basic
  • C Program to Print Your Own Name  
  • C Program to Print an Integer Entered By the User
  • C Program to Add Two Numbers
  • C Program to Check Whether a Number is Prime or Not
  • C Program to Multiply two Floating-Point Numbers  
  • C Program to Print the ASCII Value of a Character
  • C Program to Swap Two Numbers
  • C Program to Calculate Fahrenheit to Celsius
  • C Program to Find the Size of int, float, double, and char
  • C Program to Add Two Complex Numbers  
  • C Program to Print Prime Numbers From 1 to N  
  • C Program to Find Simple Interest
  • C Program to Find Compound Interest
  • C Program for Area And Perimeter Of Rectangle  
C Program – Control Flow
  • C Program to Check Whether a Number is Positive, Negative, or Zero
  • C Program to Check Whether Number is Even or Odd
  • C Program to Check Whether a Character is Vowel or Consonant  
  • C Program to Find Largest Number Among Three Numbers
  • C Program to Calculate Sum of Natural Numbers  
  • C Program to Print Alphabets From A to Z Using Loop
  • C Program to Check Leap Year
  • C Program to Find Factorial of a Number
  • C Program to Make a Simple Calculator 
  • C Program to Generate Multiplication Table  
  • C Program to Print Fibonacci Series
  • C Program to Find LCM of Two Numbers
  • C Program to Check Armstrong Number
  • C Program to Display Armstrong Numbers Between 1 to 1000  
  • C Program to Display Armstrong Number Between Two Intervals  
  • C Program to Reverse a Number
  • C Program to Check Whether a Number is a Palindrome or Not  
  • C Program to Display Prime Numbers Between Intervals
  • C Program to Check whether the input number is a Neon Number
  • C Program to Find All Factors of a Natural Number
  • C program to  Sum of Fibonacci Numbers at Even Indexes up to N Terms  
C Program – Pattern Printing
  • C Program to Print Simple Pyramid Pattern 
  • C Program to Print Given Triangle  
  • C Program to Print 180° Rotation of Simple Pyramid
  • C Program to Print Inverted Pyramid  
  • C Program to Print Number Pattern
  • C Program to Print Character Pattern  
  • C Program to Print Continuous Character Pattern
  • C Program to Print Hollow Star Pyramid
  • C Program to Print Inverted Hollow Star pyramid  
  • C Program to Print Hollow Star Pyramid in a Diamond Shape
  • C Program to Print Full Diamond Shape Pyramid
  • C Program to Print Pascal’s Pattern Triangle Pyramid  
  • C Program to Print Floyd’s Pattern Triangle Pyramid  
  • C Program to Print Reverse Floyd pattern Triangle Pyramid  
C Program – Functions
  • C Program to Check Prime Number By Creating a Function  
  • C Program to Display Prime Numbers Between Two Intervals Using Functions  
  • C Program to Find All Roots of a Quadratic Equation
  • C Program to Check Whether a Number can be Expressed as Sum of Two Prime Numbers
  • C Program to Find the Sum of Natural Numbers using Recursion  
  • C Program to Calculate the Factorial of a Number Using Recursion 
  • C Program to Find G.C.D Using Recursion
  • C Program to Reverse a Stack using Recursion
  • C Program to Calculate Power Using Recursion
C Program – Arrays
  • C Program to Print a 2D Array
  • C Program to Find the Largest Element in an Array
  • C Program to Find the Maximum and Minimum in an Array
  • C Program to Search an Element in an Array (Binary search)
  • C Program to Calculate the Average of All the Elements Present in an Array  
  • C Program to Sort an Array using Bubble Sort
  • C Program to Sort an Array using Merge Sort
  • C Program to Sort an Array Using Selection Sort 
  • C Program to Sort an Array Using Insertion Sort
  • C Program to Sort the Elements of an Array in Descending Order
  • C Program to Sort the Elements of an Array in Ascending Order 
  • C Program to Remove Duplicate Elements From a Sorted Array
  • C Program to Merge Two Arrays  
  • C Program to Remove All Occurrences of an Element in an Array  
  • C Program to Find Common Array Elements   
  • C Program to Copy All the Elements of One Array to Another Array
  • C Program For Array Rotation 
  • C Program to Sort the 2D Array Across Rows
  • C Program to Check Whether Two Matrices Are Equal or Not  
  • C Program to Find the Transpose
  • C Program to Find the Determinant of a Matrix
  • C Program to Find the Normal and Trace  
  • C Program to Add Two Matrices
  • C Program to Multiply Two Matrices
  • C Program to Print Boundary Elements of a Matrix  
  • C Program to Rotate Matrix Elements  
  • C Program to Compute the Sum of Diagonals of a Matrix  
  • C Program to Interchange Elements of First and Last in a Matrix Across Rows  
  • C Program to Interchange Elements of First and Last in a Matrix Across Columns  
C Program – Strings
  • C Program to Add or Concatenate Two Strings
  • C Program to Add 2 Binary Strings
  • C Program to Get a Non-Repeating Character From the Given String
  • C Program to check if the string is palindrome or not
  • C Program to Reverse an Array or String
  • C program to Reverse a String Using Recursion
  • C Program to Find the Length of a String
  • C Program to Sort a String
  • C Program to Check For Pangram String
  • C Program to Print the First Letter of Each Word  
  • C Program to Determine the Unicode Code Point at a Given Index  
  • C Program to Remove Leading Zeros  
  • C Program to Compare Two Strings
  • C Program to Compare Two Strings Lexicographically  
  • C Program to Insert a String into Another String
  • C Program to Split a String into a Number of Sub-Strings  
C Program – Conversions
  • C Program For Boolean to String Conversion  
  • C Program For Float to String Conversion
  • C Program For Double to String Conversion  
  • C Program For String to Long Conversion
  • C Program For Long to String Conversion
  • C Program For Int to Char Conversion  
  • C Program For Char to Int Conversion  
  • C Program For Octal to Decimal Conversion  
  • C Program For Decimal to Octal Conversion
  • C Program For Hexadecimal to Decimal Conversion  
  • C Program For Decimal to Hexadecimal Conversion  
  • C Program For Decimal to Binary Conversion 
  • C Program For Binary to Decimal Conversion
C Program – Pointers
  • How to Return a Pointer from a Function in C
  • How to Declare a Two-Dimensional Array of Pointers in C?
  • C Program to Find the Largest Element in an Array using Pointers
  • C Program to Sort an Array using Pointers
  • C Program to Sort a 2D Array of Strings
  • C Program to Check if a String is a Palindrome using Pointers
  • C Program to Create a Copy of a Singly Linked List using Recursion
C Program – Structures and Unions
  • C Program to Store Information of Students Using Structure
  • C Program to Store Student Records as Structures and Sort them by Name
  • C Program to Add N Distances Given in inch-feet System using Structures
  • C Program to Add Two Complex Numbers by Passing Structure to a Function
  • C Program to Store Student Records as Structures and Sort them by Age or ID
  • Read/Write Structure to a File in C 
  • Flexible Array Members in a Structure in C
C Program – File IO
  • C Program to Create a Temporary File
  • C Program to Read/Write Structure to a File
  • C Program to Rename a file
  • C Program to Make a File Read-Only
  • C program to Compare Two Files and Report Mismatches
  • C Program to Copy One File into Another File  
  • C Program to Print all the Patterns that Match Given Pattern From a File
  • C Program to Append the Content of One Text File to Another
  • C Program to Read Content From One File and Write it Into Another File
  • C Program to Read and Print all Files From a Zip File  
C Program – Date and Time
  • C Program to Format time in AM-PM format 
  • C program to Print Digital Clock with the Current Time
  • C Program to Display Dates of Calendar Year in Different Formats
  • C Program to Display Current Date and Time
  • C Program to Maximize Time by Replacing ‘_’ in a Given 24-Hour Format Time
  • C Program to Convert the Local Time to GMT
  • C Program to Convert Hours into Minutes and Seconds
C Program – More C Programs
  • C Program to Show Runtime exceptions  
  • C Program to Show Types of errors  
  • C Program to Show Unreachable Code Error  
  • C Program to Find Quotient and Remainder 
  • C Program to Find the Initials of a Name 
  • C Program to Draw a Circle in Graphics
  • Printing Source Code of a C Program Itself

FAQs on C Program

Q1: What is C programming?

C is a structured, high-level, and general-purpose programming language, developed in the early 1970s by Dennis Ritchie at Bell Labs. C language is considered as the mother language of all modern programming languages, widely used for developing system software, embedded software, and application software.

Q2: How do I write a “Hello, World!” program in C?

To write a “Hello, World!” program in C, you can use the following code:

    #include <stdio.h>

    int main() {
        printf("Hello, World!\n");
        return 0;
    }

This code uses the printf function to display the “Hello, World!” message on the screen.

Q3: Why should you learn C Programming?

There are many reasons to learn C programming: its versatility, efficiency, and portability; its wide use in industry; its role as a foundation for other languages; and the employment opportunities it opens up.


Case Study Research Method in Psychology

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Learn about our Editorial Process

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Case studies are in-depth investigations of a person, group, event, or community. Typically, data is gathered from various sources using several methods (e.g., observations & interviews).

The case study research method originated in clinical medicine (the case history, i.e., the patient’s personal history). In psychology, case studies are often confined to the study of a particular individual.

The information is mainly biographical and relates to events in the individual’s past (i.e., retrospective), as well as to significant events that are currently occurring in his or her everyday life.

The case study is not itself a single research method; rather, researchers select whichever methods of data collection and analysis will generate material suitable for a case study.

Freud (1909a, 1909b) conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.

This makes it clear that the case study is a method that should only be used by a psychologist, therapist, or psychiatrist, i.e., someone with a professional qualification.

There is an ethical issue of competence. Only someone qualified to diagnose and treat a person can conduct a formal case study relating to atypical (i.e., abnormal) behavior or atypical development.


Famous Case Studies

  • Anna O – One of the most famous case studies, documenting psychoanalyst Josef Breuer’s treatment of “Anna O” (real name Bertha Pappenheim) for hysteria in the late 1800s using early psychoanalytic theory.
  • Little Hans – A child psychoanalysis case study published by Sigmund Freud in 1909 analyzing his five-year-old patient Herbert Graf’s house phobia as related to the Oedipus complex.
  • Bruce/Brenda – Gender identity case of the boy (Bruce) whose botched circumcision led psychologist John Money to advise gender reassignment and raise him as a girl (Brenda) in the 1960s.
  • Genie Wiley – Linguistics/psychological development case of the victim of extreme isolation abuse who was studied in 1970s California for effects of early language deprivation on acquiring speech later in life.
  • Phineas Gage – One of the most famous neuropsychology case studies analyzes personality changes in railroad worker Phineas Gage after an 1848 brain injury involving a tamping iron piercing his skull.

Clinical Case Studies

  • Studying the effectiveness of psychotherapy approaches with an individual patient
  • Assessing and treating mental illnesses like depression, anxiety disorders, PTSD
  • Neuropsychological cases investigating brain injuries or disorders

Child Psychology Case Studies

  • Studying psychological development from birth through adolescence
  • Cases of learning disabilities, autism spectrum disorders, ADHD
  • Effects of trauma, abuse, deprivation on development

Types of Case Studies

  • Explanatory case studies : Used to explore causation in order to find underlying principles. Helpful for doing qualitative analysis to explain presumed causal links.
  • Exploratory case studies : Used to explore situations where an intervention being evaluated has no clear set of outcomes. It helps define questions and hypotheses for future research.
  • Descriptive case studies : Describe an intervention or phenomenon and the real-life context in which it occurred. It is helpful for illustrating certain topics within an evaluation.
  • Multiple-case studies : Used to explore differences between cases and replicate findings across cases. Helpful for comparing and contrasting specific cases.
  • Intrinsic : Used to gain a better understanding of a particular case. Helpful for capturing the complexity of a single case.
  • Collective : Used to explore a general phenomenon using multiple case studies. Helpful for jointly studying a group of cases in order to inquire into the phenomenon.

Where Do You Find Data for a Case Study?

There are several places to find data for a case study. The key is to gather data from multiple sources to get a complete picture of the case and corroborate facts or findings through triangulation of evidence. Most of this information is likely qualitative (i.e., verbal description rather than measurement), but the psychologist might also collect numerical data.

1. Primary sources

  • Interviews – Interviewing key people related to the case to get their perspectives and insights. The interview is an extremely effective procedure for obtaining information about an individual, and it may be used to collect comments from the person’s friends, parents, employer, workmates, and others who have a good knowledge of the person, as well as to obtain facts from the person him or herself.
  • Observations – Observing behaviors, interactions, processes, etc., related to the case as they unfold in real-time.
  • Documents & Records – Reviewing private documents, diaries, public records, correspondence, meeting minutes, etc., relevant to the case.

2. Secondary sources

  • News/Media – News coverage of events related to the case study.
  • Academic articles – Journal articles, dissertations etc. that discuss the case.
  • Government reports – Official data and records related to the case context.
  • Books/films – Books, documentaries or films discussing the case.

3. Archival records

Searching historical archives, museum collections and databases to find relevant documents, visual/audio records related to the case history and context.

Public archives like newspapers, organizational records, photographic collections could all include potentially relevant pieces of information to shed light on attitudes, cultural perspectives, common practices and historical contexts related to psychology.

4. Organizational records

Organizational records offer the advantage of often having large datasets collected over time that can reveal or confirm psychological insights.

Of course, privacy and ethical concerns regarding confidential data must be navigated carefully.

However, with proper protocols, organizational records can provide invaluable context and empirical depth to qualitative case studies exploring the intersection of psychology and organizations.

  • Organizational/industrial psychology research : Organizational records like employee surveys, turnover/retention data, policies, incident reports etc. may provide insight into topics like job satisfaction, workplace culture and dynamics, leadership issues, employee behaviors etc.
  • Clinical psychology : Therapists/hospitals may grant access to anonymized medical records to study aspects like assessments, diagnoses, treatment plans etc. This could shed light on clinical practices.
  • School psychology : Studies could utilize anonymized student records like test scores, grades, disciplinary issues, and counseling referrals to study child development, learning barriers, effectiveness of support programs, and more.

How do I Write a Case Study in Psychology?

Follow specified case study guidelines provided by a journal or your psychology tutor. General components of clinical case studies include: background, symptoms, assessments, diagnosis, treatment, and outcomes. Interpreting the information means the researcher decides what to include or leave out. A good case study should always clarify which information is the factual description and which is an inference or the researcher’s opinion.

1. Introduction

  • Provide background on the case context and why it is of interest, presenting background information like demographics, relevant history, and presenting problem.
  • Compare briefly to similar published cases if applicable. Clearly state the focus/importance of the case.

2. Case Presentation

  • Describe the presenting problem in detail, including symptoms, duration, and impact on daily life.
  • Include client demographics like age and gender, information about social relationships, and mental health history.
  • Describe all physical, emotional, and/or sensory symptoms reported by the client.
  • Use patient quotes to describe the initial complaint verbatim. Follow with full-sentence summaries of relevant history details gathered, including key components that led to a working diagnosis.
  • Summarize clinical exam results, namely orthopedic/neurological tests, imaging, lab tests, etc. Note actual results rather than subjective conclusions. Provide images if clearly reproducible/anonymized.
  • Clearly state the working diagnosis or clinical impression before transitioning to management.

3. Management and Outcome

  • Indicate the total duration of care and number of treatments given over what timeframe. Use specific names/descriptions for any therapies/interventions applied.
  • Present the results of the intervention, including any quantitative or qualitative data collected.
  • For outcomes, utilize visual analog scales for pain, medication usage logs, etc., if possible. Include patient self-reports of improvement/worsening of symptoms. Note the reason for discharge/end of care.

4. Discussion

  • Analyze the case, exploring contributing factors, limitations of the study, and connections to existing research.
  • Analyze the effectiveness of the intervention, considering factors like participant adherence, limitations of the study, and potential alternative explanations for the results.
  • Identify any questions raised in the case analysis and relate insights to established theories and current research if applicable. Avoid definitive claims about physiological explanations.
  • Offer clinical implications, and suggest future research directions.

5. Additional Items

  • Thank specific assistants for writing support only. No patient acknowledgments.
  • References should directly support any key claims or quotes included.
  • Use tables/figures/images only if substantially informative. Include permissions and legends/explanatory notes.

Strengths

  • Provides detailed (rich qualitative) information.
  • Provides insight for further research.
  • Permits investigation of otherwise impractical (or unethical) situations.

Case studies allow a researcher to investigate a topic in far more detail than might be possible if they were trying to deal with a large number of research participants (nomothetic approach) with the aim of ‘averaging’.

Because of their in-depth, multi-sided approach, case studies often shed light on aspects of human thinking and behavior that would be unethical or impractical to study in other ways.

Research that only looks into the measurable aspects of human behavior is not likely to give us insights into the subjective dimension of experience, which is important to psychoanalytic and humanistic psychologists.

Case studies are often used in exploratory research. They can help us generate new ideas (that might be tested by other methods). They are an important way of illustrating theories and can help show how different aspects of a person’s life are related to each other.

The method is, therefore, important for psychologists who adopt a holistic point of view (i.e., humanistic psychologists ).

Limitations

  • Lacking scientific rigor and providing little basis for generalization of results to the wider population.
  • Researchers’ own subjective feelings may influence the case study (researcher bias).
  • Difficult to replicate.
  • Time-consuming and expensive.
  • The volume of data, together with time restrictions, can limit the depth of analysis possible within the available resources.

Because a case study deals with only one person/event/group, we can never be sure if the case study investigated is representative of the wider body of “similar” instances. This means the conclusions drawn from a particular case may not be transferable to other settings.

Because case studies are based on the analysis of qualitative (i.e., descriptive) data , a lot depends on the psychologist’s interpretation of the information she has acquired.

This means that there is a lot of scope for observer bias, and it could be that the subjective opinions of the psychologist intrude in the assessment of what the data means.

For example, Freud has been criticized for producing case studies in which the information was sometimes distorted to fit particular behavioral theories (e.g., Little Hans ).

This is also true of Money’s interpretation of the Bruce/Brenda case study (Diamond, 1997) when he ignored evidence that went against his theory.

Breuer, J., & Freud, S. (1895). Studies on hysteria. Standard Edition 2: London.

Curtiss, S. (1981). Genie: The case of a modern wild child.

Diamond, M., & Sigmundson, K. (1997). Sex reassignment at birth: Long-term review and clinical implications. Archives of Pediatrics & Adolescent Medicine, 151(3), 298-304.

Freud, S. (1909a). Analysis of a phobia of a five-year-old boy. In The Pelican Freud Library (1977), Vol. 8, Case Histories 1, pp. 169-306.

Freud, S. (1909b). Bemerkungen über einen Fall von Zwangsneurose (Der "Rattenmann"). Jb. psychoanal. psychopathol. Forsch., I, pp. 357-421; GW, VII, pp. 379-463; Notes upon a case of obsessional neurosis, SE, 10: 151-318.

Harlow, J. M. (1848). Passage of an iron rod through the head. Boston Medical and Surgical Journal, 39, 389-393.

Harlow, J. M. (1868). Recovery from the passage of an iron bar through the head. Publications of the Massachusetts Medical Society, 2(3), 327-347.

Money, J., & Ehrhardt, A. A. (1972). Man & Woman, Boy & Girl: The differentiation and dimorphism of gender identity from conception to maturity. Baltimore, Maryland: Johns Hopkins University Press.

Money, J., & Tucker, P. (1975). Sexual signatures: On being a man or a woman.

Further Information

  • Case Study Approach
  • Case Study Method
  • Enhancing the Quality of Case Studies in Health Services Research
  • “We do things together” A case study of “couplehood” in dementia
  • Using mixed methods for evaluating an integrative approach to cancer care: a case study


Related Articles

  • Qualitative Data Coding
  • What Is a Focus Group?
  • Cross-Cultural Research Methodology In Psychology
  • What Is Internal Validity In Research?
  • What Is Face Validity In Research? Importance & How To Measure
  • Criterion Validity: Definition & Examples


COMMENTS

  1. Top 25 C Projects with Source Code in 2023

    Advanced C Projects With Source Code. 20. Dino Game. Description: Dino Game is currently one of the most played games, as it is available on most personal computers through the Chrome browser. Dino Game is a simple 2D game in which the dino runs and jumps over hurdles.

  2. PDF A case study of C source code verification: the Schorr-Waite algorithm

    case study. One feature of this annotation language, shared with JML, is that annotations follow a syntax similar to C, so that a C programmer may learn it quite easily. Once a C program is annotated, verification is done by running CADUCEUS on its source code in a way similar to a classical compiler, but resulting in the generation of so-

  3. Switch Statement in C

    The working of the switch statement in C is as follows: Step 1: The switch variable is evaluated. Step 2: The evaluated value is matched against all the present cases. Step 3A: If the matching case value is found, the associated code is executed. Step 3B: If the matching code is not found, then the default case is executed if present.

  4. switch...case in C Programming

    How does the switch statement work? The expression is evaluated once and compared with the values of each case label. If there is a match, the corresponding statements after the matching label are executed. For example, if the value of the expression is equal to constant2, statements after case constant2: are executed until break is encountered ...

  5. C How to Program: With Case Studies in Applications and Systems

    For courses in computer programming. A user-friendly, code-intensive introduction to C programming with case studies introducing applications and system programming. C How to Program is a comprehensive introduction to programming in C. Like other texts of the Deitels' How to Program series, the book's modular presentation serves as a detailed, beginner source of information for college students ...

  6. PDF C How to Program, Ninth Edition with Case Studies Introducing ...

    1. Introduction to Computers and C Intro to Hardware, Software & Internet; Test-Drive Microsoft Visual Studio, Apple Xcode, GNU gcc & GNU gcc in Docker 2. Intro to C Programming Input, Output, Types, Arithmetic, Decision Making, Secure C 3. Structured Program Development Algorithm Development, Problem Solving, if, if/else, while, Secure C 4 ...

  7. C Tutorial

    C is a general-purpose, procedural, high-level programming language used in the development of computer software and applications, system programming, games, and more. C language was developed by Dennis M. Ritchie at the Bell Telephone Laboratories in 1972. It is a powerful and flexible language which was first developed for the programming of ...

  8. Using case studies to teach the c programming language to beginning

    Abstract. The primary purpose of this descriptive study was to investigate the effects of using the case study method with critical thinking questions to teach the C programming language to beginning programmers. To this end, the author completely redesigned Programming I, a beginning C programming language course, and then taught the course to ...

  9. Numerical C: Applied Computational Programming with Case Studies

    Dispatched in 3 to 5 business days. Free shipping worldwide - see info. This book teaches applied numerical computing using the C programming language, starting with a quick primer on the C programming language and its SDK. It then dives into progressively more complex applied math formula for computational methods using C with examples.

  10. C How to Program: With Case Studies in Applications and Systems

    C How to Program is a user-friendly, code-intensive introduction to C programming with case studies introducing applications and system programming. Like other texts of the Deitels' How to Program series, the book's modular presentation serves as a detailed beginner source of information for college students looking to embark on a career in ...

  11. 15+ Exciting C Projects Ideas With Source Code

    Q. Is C good for big projects? A. C is indeed suitable for large projects. Programming in C requires a great deal more discipline than most modern programming languages. C aids in the learning of programming fundamentals, and because it is a procedural language, it necessitates a large amount of hand-written code in comparison to its competitors.

  12. Programming case studies

    Programming case studies. A case study consists of a programming problem, one or more solutions to the problem, and a narrative description of the process used by an expert to produce the solutions. Rationale for case studies and ideas for using them in a programming course are laid out in the following papers: "The Case for Case Studies of Programming Problems", Marcia C. Linn and ...

  13. Programming case study: Encouraging cross-disciplinary projects

    Programming case study: Encouraging cross-disciplinary projects. Google Classroom. To give fellow teachers an idea for how they can teach our curriculum in a classroom setting, we are creating case studies. Here's one case study of how teacher Ellen Reller uses our curriculum in her classroom in Lowell High School in California.

  14. Innovative Classroom Activity with Flipped Teaching for Programming in

    The following are the student learning outcomes list for programming in C course. On the completion of the course, the students should be able to: SLO1: Analyze the problem statement. SLO2: Choose the appropriate C programming constructs to solve the problems. SLO3: Demonstrate the advantages and disadvantages of specific techniques to be used.

  15. C How to Program: With Case Studies in Applications and Systems

    C How to Program: With Case Studies in Applications and Systems Programming, Global Edition.

  16. CASE Study FOR C PROGRAMMING EXERCISE

    CASE STUDY. Movie Ticket Booking System. People nowadays like doing things online, for example shopping, paying bills, studying, and much more. To make life easier and save time, the Movie Ticket Booking System was introduced. This system saves people time: there is no need to go to the cinema to buy a ticket.

  17. A case study investigating programming students' peer ...

    This case study aims to contribute to the programming education in schools by investigating how students learn in an online programming while involved in peer review of codes. The study subsequently examines students' perceptions of the pedagogical, social and technical design of the online programming learning environment.

  18. C--CASE STUDIES

    Read chapter C--CASE STUDIES: TRB's second Strategic Highway Research Program (SHRP 2) Report S2-L02-RR-2: Guide to Establishing Monitoring Programs for...

  19. How C++ Is Used in Embedded Systems: Applications and Case Studies

    Performance Optimization: C++ allows for fine-grained control over system resources, making it ideal for performance-critical embedded applications. Object-Oriented Approach: The object-oriented nature of C++ facilitates modular and reusable code, enhancing productivity in embedded system development. Hardware Interaction: With its capability ...

  20. Nested Loops in C Programming: Examples and Use Cases

    Example 3: Pattern Printing (e.g., Pyramids, Diamonds) Pattern printing, such as creating pyramids or diamond shapes, is another common application of nested loops in C programming. The outer loop controls the number of rows, while the inner loop manages the number of spaces and stars printed on each row.

  21. PDF THE LARGE INTEGER CASE STUDY

    Case Study in C++ A Manual for Students The AP Program wishes to acknowledge and to thank Owen Astrachan of Duke University for developing this case study and the accompanying documentation. Please note that reproduction of this document is permitted for face-to-face teaching purposes only.

  22. C (programming language)

    C (pronounced / ˈ s iː / - like the letter c) is a general-purpose computer programming language.It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential.By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, but its use in application software has ...

  23. Data Structures in C

    Types of Data Structures in C. Primitive Data Structures: These data types are predefined in the C programming language. They store data of only one type. Data types like short, integer, float, character, and double come into this category. Non-Primitive Data Structures: They can store data of more than one type.

  24. C++ Applications in Real-World Scenarios, Examples and Use Cases

    C++ is a popular language for developing computer graphics applications. The language's low-level control and speed make it ideal for developing real-time rendering engines and graphical user interfaces. Some examples of C++ applications in computer graphics include OpenGL and DirectX. Medical Imaging.

  25. Learn C Programming Language Tutorial

    2) C as a system programming language. A system programming language is used to create system software. C language is a system programming language because it can be used to do low-level programming (for example driver and kernel). It is generally used to create hardware devices, OS, drivers, kernels, etc. For example, Linux kernel is written in C.

  26. Tuskegee Syphilis Study

    The Tuskegee Study of Untreated Syphilis in the Negro Male (informally referred to as the Tuskegee Experiment or Tuskegee Syphilis Study) was a study conducted between 1932 and 1972 by the United States Public Health Service (PHS) and the Centers for Disease Control and Prevention (CDC) on a group of nearly 400 African American men with syphilis. The purpose of the study was to observe the ...

  27. C Programs

    C Programs. C Programs: Practicing and solving problems is the best way to learn anything. Here, we have provided 100+ C programming examples in different categories like basic C Programs, Fibonacci series in C, String, Array, Base Conversion, Pattern Printing, Pointers, etc.

  28. Case study

    A case study is an in-depth, detailed examination of a particular case (or cases) within a real-world context. For example, case studies in medicine may focus on an individual patient or ailment; case studies in business might cover a particular firm's strategy or a broader market; similarly, case studies in politics can range from a narrow happening over time like the operations of a specific ...

  29. Case Study Research Method in Psychology

    Case studies are in-depth investigations of a person, group, event, or community. Typically, data is gathered from various sources using several methods (e.g., observations & interviews). The case study research method originated in clinical medicine (the case history, i.e., the patient's personal history). In psychology, case studies are ...

  30. Insights

    The Record: U.S. audio listening trends powered by Nielsen and Edison Research. The Record from Nielsen delivers a quarterly look at how U.S. audiences spend their time with ad-supported audio media. Load More. Discover the latest Nielsen insights based on our robust data and analytics to connect and engage with today's audiences.