Guide to Conducting a Feasibility Study

So you’re thinking of launching a new venture? Entering a new market? Launching a new product? It’s estimated that only one in fifty business ideas is actually commercially viable, so you’ll want to understand the viability of any proposed project before you invest your time, energy and money into it. That’s why you need a feasibility study.

Why do you need a feasibility study?

With such a low success rate for new business ventures, a feasibility study is the best way to learn whether you have an idea that could work, and it guards against wasting further investment. If the results are positive, the outcome of the feasibility study can be used as the basis for a full business plan, allowing you to proceed with a clearer view of the risks involved and to move forward more quickly. If it’s negative, then you’ve skilfully avoided wasting time and money on a venture that wouldn’t have worked out.

What is a feasibility study?  

A feasibility study aims to make a recommendation as to the likely success of a venture. At the heart of any feasibility study is a hypothesis or question that you want to answer. Examples include “is there demand for new product or product feature X”, “should we enter market Y” and “should we launch new venture Z”.

How to conduct a feasibility study?  

Once you’ve got a clear hypothesis or question that you want to answer, you need to look at five areas that will impact the feasibility of your idea. Let’s look at each of these in turn:

Market Feasibility

Is the market in question attractive? Are there high barriers to entry? Is it of a size that will support your ambitions? Is it growing? Are there any regulatory or legislative requirements to enter or participate in the market?

Technical Feasibility

What technical skills, knowledge, or equipment are required? Do you have, or could you source, the technical expertise required? Do you fully understand the technical requirements underpinning your hypothesis? Could you manufacture or develop the product or service with the resources you have available?

Business Model Feasibility

How will the idea make money? How will you attract users? What costs will you have to pay? Have you modelled the financials? Do you have access to the funding needed? What legal entity structure would you need?

Management Model Feasibility

Who will lead the venture? Do you have the skills and expertise required to manage and operate the venture, product, or market entry? Does the team have the time needed to deliver the venture? If not, can people with the right skills be recruited, or are those skills hard to find?

Exit Feasibility

Do you have a plan to exit the venture and do you need one?

When completing a feasibility study, each of the above areas should have a recommendation as to whether the idea is feasible from that specific perspective, factoring in the resources you have available. The study should conclude with an overall recommendation, based on the analysis, as to whether the venture is or isn’t feasible, together with the key data points that underpin that recommendation.

Remember that a great feasibility study should not just give you a go / no-go decision. It should provide either a springboard to move forward, highlighting the key areas to focus on to achieve success, or a useful analysis highlighting the key obstacles that make the venture unfeasible, which can inform any future ideas. Even if the answer is no, it’s not a wasted effort; the analysis will leave you better informed for future decisions.

A feasibility study is an essential tool for anyone looking at a new venture. It’s very easy to get excited by a new idea or proposition and steam ahead, spending time and money, without having a clear view as to whether it’s viable or not. A feasibility study should be your first stop to maximise the returns on your time, energy and investment.

Best of luck with your feasibility studies!

Chris Purcell @ Prussel & Co


How to conduct a feasibility study: Templates and examples

Julia Martins

Conducting a feasibility study is an important step in successful project management. By evaluating the viability of a proposed project, a feasibility study helps you identify potential challenges and opportunities, ensuring you make informed decisions. In this guide, we’ll walk you through how to conduct a feasibility study with practical templates and real-world examples, designed for project managers seeking to optimize their project planning process.

It can be exciting to run a large, complex project that has a huge potential impact on your organization. On the one hand, you’re driving real change. On the other hand, failure is intimidating. 

What is a feasibility study? 

A feasibility study—sometimes called a feasibility analysis or feasibility report—is a way to evaluate whether or not a project plan could be successful. A feasibility study evaluates the practicality of your project plan in order to judge whether or not you’re able to move forward with the project. 

It does so by answering two questions: 

Does our team have the required tools or resources to complete this project? 

Will there be a high enough return on investment to make the project worth pursuing? 

Benefits of conducting a feasibility study

There are several key benefits to conducting a feasibility study before launching a new project:

Confirms market opportunities and the target market before investing significant resources

Identifies potential issues and risks early on

Provides in-depth data for better decision making on the proposed project's viability

Creates documentation on expected costs and benefits, including financial analysis

Obtains stakeholder buy-in by demonstrating due diligence

Feasibility studies are important for projects that represent significant investments for your business. Projects with a large potential impact on your presence in the market may also require a feasibility assessment.

As the project manager, you may not be directly responsible for driving the feasibility study, but it’s important to know what these studies are. By understanding the different elements that go into a feasibility study, you can better support the team driving the feasibility study and ensure the best outcome for your project.

When should you conduct a feasibility analysis?

A feasibility study should be conducted after the project has been pitched but before any work has actually started. The study is part of the project planning process. In fact, it’s often done in conjunction with a SWOT analysis or project risk assessment, depending on the specific project.

Feasibility studies help: 

Confirm market opportunities before committing to a project

Narrow your business alternatives

Create documentation about the benefits and disadvantages of your proposed initiative

Provide more information before making a go-or-no-go decision

You likely don’t need a feasibility study if:

You already know the project is feasible

You’ve run a similar project in the past

Your competitors are succeeding with a similar initiative in the market

The project is small, straightforward, and has minimal long-term business impact

Your team ran a similar feasibility analysis within the past three years

One thing to keep in mind is that a feasibility study is not a project pitch. During a project pitch, you’re evaluating whether or not the project is a good idea for your company and whether the goals of the project are in line with your overall strategic plan. Typically, once you’ve established that the project is a good idea, you'll run a feasibility study to confirm that the project is possible with the tools and resources you have at your disposal. 

Types of feasibility studies

There are five main types of feasibility studies: technical feasibility, financial feasibility, market feasibility (or market fit), operational feasibility, and legal feasibility. Most comprehensive feasibility studies will include an assessment of all five of these areas.

Technical feasibility

A technical feasibility study reviews the technical resources available for your project. This study determines if you have the right equipment, enough equipment, and the right technical knowledge to complete your project objectives. For example, if your project plan proposes creating 50,000 products per month, but you can only produce 30,000 products per month in your factories, this project isn’t technically feasible.
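
To make this concrete, the arithmetic behind a capacity check like this is simple. Here’s a minimal sketch in Python, using the figures from the example above; the function name and structure are our own illustration, not a standard feasibility tool:

    # Minimal technical-feasibility capacity check (illustrative only).
    def is_technically_feasible(required_per_month: int, capacity_per_month: int) -> bool:
        # Feasible only if existing capacity covers the project's requirement.
        return capacity_per_month >= required_per_month

    # Figures from the example above: the plan needs 50,000 units per month,
    # but the factories can produce only 30,000.
    print(is_technically_feasible(required_per_month=50_000, capacity_per_month=30_000))  # False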

Financial feasibility

Financial feasibility describes whether or not your project is fiscally viable. A financial feasibility report includes a cost-benefit analysis of the project. It also forecasts an expected return on investment (ROI) and outlines any financial risks. The goal at the end of the financial feasibility study is to understand the economic benefits the project will drive. 

Market feasibility

The market feasibility study is an evaluation of how your team expects the project’s deliverables to perform in the market. This part of the report includes a market analysis, a market competition breakdown, and sales projections.

Operational feasibility

An operational feasibility study evaluates whether or not your organization is able to complete this project. This includes staffing requirements, organizational structure, and any applicable legal requirements. At the end of the operational feasibility study, your team will have a sense of whether or not you have the resources, skills, and competencies to complete this work. 

Legal feasibility

A legal feasibility analysis assesses whether the proposed project complies with all relevant legal requirements and regulations. This includes examining legal and regulatory barriers, necessary permits, licenses, or certifications, potential legal liabilities or risks, and intellectual property considerations. The legal feasibility study ensures that the project can be completed without running afoul of any laws or incurring undue legal exposure for the organization.

Feasibility assessment checklist

Most feasibility studies are structured in a similar way. These documents serve as an assessment of the practicality of a proposed business idea. Creating a clear feasibility study helps project stakeholders during the decision making process. 

The essential elements of a feasibility study are: 

An executive summary describing the project’s overall viability

A description of the product or service being developed during this project

Any technical considerations, including technology, equipment, or staffing

The market survey, including a study of the current market and the marketing strategy

The operational feasibility study, evaluating whether your team’s current organizational structure can support this initiative

The project timeline

Financial projections based on your financial feasibility report

6 steps to conduct a feasibility study

You likely won’t be conducting the feasibility study yourself, but you will probably be called on to provide insight and information. To conduct a feasibility study, hire a trained consultant or, if you have an in-house project management office (PMO), ask if they take on this type of work. In general, here are the steps they’ll take to complete this work:

1. Run a preliminary analysis

Creating a feasibility study is a time-intensive process. Before diving into the feasibility study, it’s important to evaluate the project for any obvious and insurmountable roadblocks. For example, if the project requires significantly more budget than your organization has available, you likely won’t be able to complete it. Similarly, if the project deliverables need to be live and in the market by a certain date but won’t be available for several months after that, the project likely isn’t feasible either. These types of large-scale obstacles make a feasibility study unnecessary because it’s clear the project is not viable.

2. Evaluate financial feasibility

Think of the financial feasibility study as the projected income statement for the project. This part of the feasibility study clarifies the expected project income and outlines what your organization needs to invest—in terms of time and money—in order to hit the project objectives. 

During the financial feasibility study, take into account whether or not the project will impact your business's cash flow. Depending on the complexity of the initiative, your internal PMO or external consultant may want to work with your financial team to run a cost-benefit analysis of the project. 
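
To see the arithmetic involved, here’s a minimal cost-benefit and ROI sketch in Python. All figures are hypothetical placeholders, and the calculation is deliberately simplified; a real analysis would, for example, discount future cash flows:

    # Simplified cost-benefit / ROI sketch. All figures are hypothetical.
    projected_income = [40_000, 55_000, 70_000]  # expected income per year
    projected_costs = [60_000, 20_000, 20_000]   # investment plus running costs per year

    net_benefit = sum(projected_income) - sum(projected_costs)
    roi = net_benefit / sum(projected_costs)  # return on investment

    print(f"Net benefit: {net_benefit}")  # Net benefit: 65000
    print(f"ROI: {roi:.0%}")              # ROI: 65%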

3. Run a market assessment

The market assessment, or market feasibility study, is a chance to identify the demand in the market. This study offers a sense of expected revenue for the project and any potential market risks you could run into. 

The market assessment, more than any other part of the feasibility study, is a chance to evaluate whether or not there’s an opportunity in the market. During this study, it’s critical to evaluate your competitors’ positions and analyze demographics to get a sense of how the project will go.

4. Consider technical and operational feasibility

Even if the financials are looking good and the market is ready, this initiative may not be something your organization can support. To evaluate operational feasibility, consider any staffing or equipment requirements this project needs. What organizational resources—including time, money, and skills—are necessary in order for this project to succeed? 

Depending on the project, it may also be necessary to consider the legal impact of the initiative. For example, if the project involves developing a new patent for your product, you will need to involve your legal team and incorporate that requirement into the project plan.

5. Review project points of vulnerability

At this stage, your internal PMO team or external consultant has looked at all four elements of your feasibility study—financials, market analysis, technical feasibility, and operational feasibility. Before running their recommendations by you and your stakeholders, they will review and analyze the data for any inconsistencies. This includes ensuring the income statement is in line with your market analysis. Similarly, now that they’ve run a technical feasibility study, are any liabilities too big of a red flag? (If so, create a contingency plan!)

Depending on the complexity of your project, there won’t always be a clear answer. A feasibility analysis doesn’t provide a black-and-white decision for a complex problem. Rather, it helps you come to the table with the right questions—and answers—so you can make the best decision for your project and for your team.

6. Propose a decision

The final step of the feasibility study is an executive summary touching on the main points and proposing a solution. 

Depending on the complexity and scope of the project, your internal PMO or external consultant may share the feasibility study with stakeholders or present it to the group in order to field any questions live. Either way, with the study in hand, your team now has the information you need to make an informed decision.

Feasibility study examples

To better understand the concepts behind feasibility assessments, here are two hypothetical examples demonstrating how these studies can be applied in real-world scenarios.

Example 1: New product development

A consumer goods company is considering launching a new product line. Before investing in new product development, they conduct a feasibility study to assess the proposed project.

The feasibility study includes:

Market research to gauge consumer interest, assess competitor offerings, and estimate potential market share for the target market.

Technological considerations, including R&D requirements, production processes, and any necessary patents or certifications.

In-depth financial analysis projecting sales volumes, revenue, costs, and profitability over a multi-year period.

Evaluation of organizational readiness, including the skills of the current management team and staff to bring the new product to market.

Assessment of legal feasibility to ensure compliance with regulations and identify any potential liability issues.

The comprehensive feasibility study identifies a promising market opportunity for the new business venture. The company decides to proceed with the new project, using the feasibility report as a template for their business development process. The study helps secure funding from key decision-makers, setting this start-up product initiative up for success.

Example 2: Real estate development deal

A property developer is evaluating the feasibility of purchasing land for a new residential community. They commission a feasibility study to determine the viability of this real estate development project.

The feasibility assessment covers:

Detailed analysis of the local housing market, including demand drivers, comparable properties, pricing, and absorption rates.

Site planning to assess the property's capacity, constraints, and technological considerations.

In-depth review of legal feasibility, including zoning, permitting, environmental regulations, and other potential legal hurdles.

Financial analysis modeling various development scenarios and estimating returns on investment.

Creation of an opening day balance sheet projecting the assets, liabilities, and equity for the proposed project.

Sensitivity analysis to evaluate the impact of changes in key assumptions on the project's scope and profitability, as sketched after this list.
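
To illustrate that last point, here is a minimal one-variable-at-a-time sensitivity analysis in Python; the profit model, assumption names, and figures are all hypothetical, not taken from the scenario above:

    # Hypothetical sensitivity analysis: shock one key assumption at a time
    # and observe the effect on projected profit. All numbers are invented.
    def projected_profit(units_sold, price, cost_per_unit, fixed_costs):
        return units_sold * (price - cost_per_unit) - fixed_costs

    base = dict(units_sold=200, price=350_000, cost_per_unit=280_000,
                fixed_costs=8_000_000)  # base-case profit: 6,000,000

    for name, factor in [("price", 0.9), ("units_sold", 0.8), ("cost_per_unit", 1.1)]:
        scenario = dict(base)
        scenario[name] *= factor  # shock a single assumption
        print(f"{name} x{factor}: profit = {projected_profit(**scenario):,.0f}")

Runs like this show which assumptions the projected returns are most sensitive to; here, a 10% drop in price wipes out the profit entirely, which is the kind of finding that motivates a revised scope or a phased approach.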

The feasibility study concludes that while the real estate start-up is viable, it carries significant risk. Based on these findings, the developer makes an informed decision to move forward, but with a revised project scope and a phased approach to mitigate risk. The comprehensive feasibility analysis proves critical in guiding this major investment decision.

Which phase of the project management process involves feasibility studies?

Feasibility studies are a key part of the project initiation and planning phases. They are typically conducted after a project has been conceptualized but before significant resources are invested in detailed planning and execution.

The purpose of a feasibility assessment is to objectively evaluate the viability of a proposed project, considering factors such as technical feasibility, market demand, financial costs and benefits, legal requirements, and organizational readiness. By thoroughly assessing these aspects, a feasibility study helps project stakeholders make an informed go-or-no-go decision.

While feasibility studies are a critical tool in the early stages of project management, they differ from other planning documents like project charters, business cases, and business plans. Here's a closer look at these key differences:

Feasibility study vs. project charter

A project charter is a relatively informal document to pitch your project to stakeholders. Think of the charter as an elevator pitch for your project objectives, scope, and responsibilities. Typically, your project sponsor or executive stakeholders review the charter before ratifying the project. 

A feasibility study should be implemented after the project charter has been ratified. This isn’t a document to pitch whether or not the project is in line with your team’s goals—rather, it’s a way to ensure the project is something you and your team can accomplish.

Feasibility study vs. business case

A business case is a more formalized version of the project charter. While you’d typically create a project charter for small or straightforward initiatives, you should create a business case if you are pitching a large, complex initiative that will make a major impact on the business. This longer, more formal document will also include financial information and typically involve more senior stakeholders. 

After your business case is approved by relevant stakeholders, you'll run a feasibility study to make sure the work is doable. If you find it isn’t, you might return to your executive stakeholders and request more resources, tools, or time in order to ensure your business case is feasible.

Feasibility study vs. business plan

A business plan is a formal document outlining your organization’s goals. You typically write a business plan when founding your company or when your business is going through a significant shift. Your business plan informs a lot of other business decisions, including your three- to five-year strategic plan.

As you implement your business and strategic plan, you’ll invest in individual projects. A feasibility study is a way to evaluate the practicality of any given individual project or initiative.

Achieve project success with Asana

Are you done with your feasibility study? You’re ready to run a project! Set your project up for success by tracking your progress with a work management tool like Asana. From the small stuff to the big picture, Asana organizes work so teams know what to do, why it matters, and how to get it done.



Research Article

Defining Feasibility and Pilot Studies in Preparation for Randomised Controlled Trials: Development of a Conceptual Framework

* E-mail: [email protected]

Affiliation Centre for Primary Care and Public Health, Queen Mary University of London, London, United Kingdom

Affiliation Department of Mathematics and Statistics, Lancaster University, Lancaster, Lancashire, United Kingdom

Affiliation School of Health and Related Research, University of Sheffield, Sheffield, South Yorkshire, United Kingdom

Affiliation Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada

Affiliation Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, Oxfordshire, United Kingdom

Affiliation Centre of Academic Primary Care, University of Aberdeen, Aberdeen, Scotland, United Kingdom

  • Sandra M. Eldridge, 
  • Gillian A. Lancaster, 
  • Michael J. Campbell, 
  • Lehana Thabane, 
  • Sally Hopewell, 
  • Claire L. Coleman, 
  • Christine M. Bond


  • Published: March 15, 2016
  • https://doi.org/10.1371/journal.pone.0150205

Abstract

We describe a framework for defining pilot and feasibility studies focusing on studies conducted in preparation for a randomised controlled trial. To develop the framework, we undertook a Delphi survey; ran an open meeting at a trial methodology conference; conducted a review of definitions outside the health research context; consulted experts at an international consensus meeting; and reviewed 27 empirical pilot or feasibility studies. We initially adopted mutually exclusive definitions of pilot and feasibility studies. However, some Delphi survey respondents and the majority of open meeting attendees disagreed with the idea of mutually exclusive definitions. Their viewpoint was supported by definitions outside the health research context, the use of the terms ‘pilot’ and ‘feasibility’ in the literature, and participants at the international consensus meeting. In our framework, pilot studies are a subset of feasibility studies, rather than the two being mutually exclusive. A feasibility study asks whether something can be done, should we proceed with it, and if so, how. A pilot study asks the same questions but also has a specific design feature: in a pilot study a future study, or part of a future study, is conducted on a smaller scale. We suggest that to facilitate their identification, these studies should be clearly identified using the terms ‘feasibility’ or ‘pilot’ as appropriate. This should include feasibility studies that are largely qualitative; we found these difficult to identify in electronic searches because researchers rarely used the term ‘feasibility’ in the title or abstract of such studies. Investigators should also report appropriate objectives and methods related to feasibility; and give clear confirmation that their study is in preparation for a future randomised controlled trial designed to assess the effect of an intervention.

Citation: Eldridge SM, Lancaster GA, Campbell MJ, Thabane L, Hopewell S, Coleman CL, et al. (2016) Defining Feasibility and Pilot Studies in Preparation for Randomised Controlled Trials: Development of a Conceptual Framework. PLoS ONE 11(3): e0150205. https://doi.org/10.1371/journal.pone.0150205

Editor: Chiara Lazzeri, Azienda Ospedaliero-Universitaria Careggi, ITALY

Received: August 13, 2015; Accepted: February 10, 2016; Published: March 15, 2016

Copyright: © 2016 Eldridge et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Due to a requirement by the ethics committee that the authors specified when the data will be destroyed, the authors are not able to give unlimited access to the Delphi study quantitative data. These data are available from Professor Sandra Eldridge. Data will be available upon request to all interested researchers. Qualitative data from the Delphi study are not available because the authors do not have consent from participants for wider distribution of this more sensitive data.

Funding: The authors received small grants from Queen Mary University of London (£7495), University of Sheffield (£8000), NIHR RDS London (£2000), NIHR RDS South East (£2400), Chief Scientist Office Scotland (£1000). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: All authors have completed the ICMJE disclosure form at http://www.icmje.org/coi_disclosure.pdf and declare support from the following organisations that might have an interest in the submitted work – Queen Mary University of London, Sheffield University, NIHR, Chief Scientist Office Scotland; financial relationships with NIHR, MRC, EC FP7, Canadian Institute for Health Research, Wiley, who might have an interest in the submitted work in the previous three years. No other relationships or activities have influenced the submitted work. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Introduction

There is a large and growing number of studies in the literature that authors describe as feasibility or pilot studies. In this paper we focus on feasibility and pilot studies conducted in preparation for a future definitive randomised controlled trial (RCT) that aims to assess the effect of an intervention. We are primarily concerned with stand-alone studies that are completed before the start of such a definitive RCT, and do not specifically cover internal pilot studies which are designed as the early stage of a definitive RCT; work on the conduct of internal pilot studies is currently being carried out by the UK MRC Network of Hubs for Trials Methodology Research. One motivating factor for the work reported in this paper was the inconsistent use of terms. For example, in the context of RCTs ‘pilot study’ is sometimes used to refer to a study addressing feasibility in preparation for a larger RCT, but at other times it is used to refer to a small scale, often opportunistic, RCT which assesses efficacy or effectiveness.

A second, related, motivating factor was the lack of agreement in the research community about the use of the terms ‘pilot’ and ‘feasibility’ in relation to studies conducted in preparation for a future definitive RCT. In a seminal paper in 2004 reviewing the literature in relation to pilot and feasibility studies conducted in preparation for an RCT [1], Lancaster et al reported that they could find no formal guidance as to what constituted a pilot study. In the updated UK Medical Research Council (MRC) guidance on designing and evaluating complex interventions published four years later, feasibility and pilot studies are explicitly recommended, particularly in relation to identifying problems that might occur in an ensuing RCT of a complex intervention [2]. However, while the guidance suggests possible aims of such studies, for example, testing procedures for their acceptability, estimating the likely rates of recruitment and retention of subjects, and the calculation of appropriate sample sizes, no explicit definitions of a ‘pilot study’ or ‘feasibility study’ are provided. In 2010, Thabane and colleagues presented a number of definitions of pilot studies taken from various health related websites [3]. While these definitions vary, most have in common the idea of conducting a study in advance of a larger, more comprehensive, investigation. Thabane et al also considered the relationship between pilot and feasibility, suggesting that feasibility should be the main emphasis of a pilot study and that ‘a pilot study is synonymous with a feasibility study intended to guide the planning of a large scale investigation’. However, at about the same time, the UK National Institute for Health Research (NIHR) developed definitions of pilot and feasibility studies that are mutually exclusive, suggesting that feasibility studies occurred slightly earlier in the research process and that pilot studies are ‘a version of the main study that is run in miniature to test whether the components of the main study can all work together’. Arain et al. felt that the NIHR definitions were helpful, and showed that studies identified using the keyword ‘feasibility’ had different characteristics from those identified as ‘pilot’ studies [4]. The NIHR wording for pilot studies has been changed more recently to ‘a smaller version of the main study used to test whether the components of the main study can all work together’ (Fig 1). Nevertheless, it still contrasts with the MRC framework guidance that explicitly states: ‘A pilot study need not be a “scale model” of the planned main-stage evaluation, but should address the main uncertainties that have been identified in the development work’ [2]. These various, sometimes conflicting, approaches to the interpretation of the terms ‘pilot’ and ‘feasibility’ exemplify differences in current usage and opinion in the research community.

Fig 1. https://doi.org/10.1371/journal.pone.0150205.g001

While lack of agreement about definitions may not necessarily affect research quality, it can become problematic when trying to develop guidance for research conduct because of the need for clarity over what the guidance applies to and therefore what it should contain. Previous research has identified weaknesses in the reporting and conduct of pilot and feasibility studies [1,3,4,7], particularly in relation to studies conducted in preparation for a future definitive RCT assessing the effect of an intervention or therapy. While undertaking research to develop guidance to address some of the weaknesses in reporting these studies, we became convinced by the current interest in this area, the lack of clarity, and the differences of opinion in the research community, that a re-evaluation of the definitions of pilot and feasibility studies was needed. This paper describes the process and results of this re-evaluation and suggests a conceptual framework within which researchers can operate when designing and reporting pilot/feasibility studies. Since our work on reporting guidelines focused specifically on pilot and feasibility studies in preparation for an RCT assessing the effect of some intervention or therapy, we restrict our re-evaluation to these types of pilot and feasibility studies.

The process of developing and validating the conceptual framework for defining pilot and feasibility studies was, to a large extent, integral to the development of our reporting guidelines, the core components of which were a large Delphi study and an international expert consensus meeting focused on developing an extension of the 2010 CONSORT statement for RCTs [8] to randomised pilot studies. The reporting guidelines, Delphi study and consensus meeting are therefore referred to in this paper. However, the reporting guidelines will be reported separately; this paper focuses on our conceptual framework.

Developing a conceptual framework—Delphi study

Following research team discussion of our previous experience with, and research on, pilot and feasibility studies we initially produced mutually exclusive definitions of pilot and feasibility studies based on, but not identical to, the definitions used by the NIHR. We drew up two draft reporting checklists based on the 2010 CONSORT statement [8], one for what we had defined as feasibility studies and one for what we had defined as pilot studies. We constructed a Delphi survey, administered on-line by Clinvivo [9], to obtain consensus on checklist items for inclusion in a reporting guideline, and views on the definitions. Following user-testing of a draft version of the survey with a purposive sample of researchers active in the field of trials and pilot studies, and a workshop at the 2013 Society for Clinical Trials Conference in Boston, we further refined the definitions, checklists, survey introduction and added additional questions.

The first round of the main Delphi survey included: a description and explanation of our definitions of pilot and feasibility studies including examples (Figs 2 and 3); questions about participants’ characteristics; 67 proposed items for the two checklists and questions about overall appropriateness of the guidelines for feasibility or pilot studies; and four questions related to the definitions of feasibility and pilot studies: How appropriate do you think our definition for a pilot study conducted in preparation for an RCT is? How appropriate do you think our definition for a feasibility study conducted in preparation for an RCT is? How appropriate is the way we have distinguished between two different types of study conducted in preparation for an RCT? How appropriate are the labels ‘pilot’ and ‘feasibility’ for the two types of study we have distinguished? Participants were asked to rate their answers to the four questions on a nine-point scale from ‘not at all appropriate’ to ‘completely appropriate’. There was also a space for open comments about the definitions. The second round included results from the first round and again asked for further comments about the definitions.

Fig 2. https://doi.org/10.1371/journal.pone.0150205.g002

Fig 3. https://doi.org/10.1371/journal.pone.0150205.g003

Participants for the main survey were identified as likely users of the checklist including trialists, methodologists, statisticians, funders and journal editors. Three hundred and seventy potential participants were approached by email from the project team or directly from Clinvivo. These were individuals identified based on personal networks, authors of relevant studies in the literature, members of the Canadian Institute of Health Research, Biostatistics section of Statistics Society of Canada, and the American Statistical Society. The International Society for Clinical Biostatistics and the Society for Clinical Trials kindly forwarded our email to their entire membership. There was a link within the email to the on-line questionnaire. Each round lasted three weeks and participants were sent one reminder a week before the closure of each survey. The survey took place between August and October 2013. Ethical approval was granted by the ScHARR research ethics committee at the University of Sheffield.

Developing a conceptual framework—Open meeting and research team meetings

The results of the Delphi survey pertaining to the definitions of feasibility and pilot studies were presented to an open meeting at the 2nd UK MRC Trials Methodology Conference in Edinburgh in November 2013 [13]. Attendees chose their preferred proposition from four propositions regarding the definitions, based variously on our original definitions, the NIHR and MRC views of pilot and feasibility studies and different views expressed in the Delphi survey. At a subsequent two-day research team meeting we collated the findings from the Delphi survey and the open meeting, and considered definitions of piloting and feasibility outside the health research context found from on-line searches using the terms ‘pilot definition’, ‘feasibility definition’, ‘pilot study definition’ and ‘feasibility study definition’ in Google. We expected all searches to give a very large number of hits and examined the first two pages of hits only from each search. From this, we developed a conceptual framework reflecting consensus about the definitions, types and roles of feasibility and pilot studies conducted in preparation for an RCT evaluating the effect of an intervention or therapy. To ensure we incorporated the views of all researchers likely to be conducting pilot/feasibility studies, two qualitative researchers joined the second day of the meeting which focused on agreeing this framework. Throughout this process we continually referred back to examples that we had identified to check that our emerging definitions were workable.

Validating the conceptual framework—systematic review

To validate the proposed conceptual framework, we identified a selection of recently reported studies that fitted our definition of pilot and feasibility studies, and tested a number of hypotheses in relation to these studies. We expected that approximately 30 reports would be sufficient to test the hypotheses. We conducted a systematic review to identify studies that authors described as pilot or feasibility studies, by searching Medline via PubMed for studies that had the words ‘pilot’ or ‘feasibility’ in the title. To increase the likelihood that the studies would be those conducted in preparation for a randomised controlled trial of the effect of a therapy or intervention we limited our search to those that contained the word ‘trial’ in the title or abstract. For full details of the search strategy see S1 Fig.
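
For illustration, a PubMed query consistent with this description would take roughly the following form (this is a reconstruction, not the published search strategy, which is given in S1 Fig):

    (pilot[Title] OR feasibility[Title]) AND trial[Title/Abstract]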

To focus on current practice, we selected the 150 most recent studies from those identified by the electronic search. We did not exclude protocols since we were primarily interested in identifying the way researchers characterised their study and any possible future study and the relationship between them; we expected investigators to describe these aspects of their studies in a similar way in protocols and reports of findings. Two research team members independently reviewed study abstracts to assess whether each study fitted our working definition of a pilot or feasibility study in preparation for an RCT evaluating the effect of an intervention or therapy. Where reviewers disagreed, studies were classed as ‘possible inclusions’ and disagreements resolved by discussion with referral to the full text of the paper as necessary. Given the difficulty of interpreting some reports and to ensure that all research team members agreed on inclusion, the whole team then reviewed relevant extracted sections of the papers provisionally agreed for inclusion. We recognised that abstracts of some studies might not include appropriate information, and therefore that our initial abstract review could have excluded some relevant studies; we explored the extent of this potential omission of studies by reviewing the full texts of a random sample of 30 studies from the original 150. Since our prime goal was to identify a manageable number of relevant studies in order to test our hypotheses rather than identify all possible relevant studies we did not include any additional studies as a result of this exploratory study.

We postulated that the following hypotheses would support our conceptual framework:

  • The words ‘pilot’ and ‘feasibility’ are both used in the literature to describe studies undertaken in preparation for an RCT evaluating the effect of an intervention or therapy
  • It is possible to identify a subset of studies within the literature that are RCTs conducted in preparation for a larger RCT which evaluates the effect of an intervention or therapy. Authors do not use the term ‘pilot trial’ consistently in relation to these studies.
  • Within the literature it is not possible to apply unique mutually exclusive definitions of pilot and feasibility studies in preparation for an RCT evaluating the effect of an intervention or therapy that are consistent with the way authors describe their studies.
  • Amongst feasibility studies in preparation for an RCT which evaluates the effect of an intervention or therapy it is possible to identify some studies that are not pilot studies as defined within our conceptual framework, but are studies that acquire information about the feasibility of applying an intervention in a future study.

In order to explore these hypotheses, we categorised included studies into three groups that tallied with our framework (see results for details): randomised pilot studies, non-randomised pilot studies, feasibility studies that are not pilot studies. We also extracted data on objectives, and the phrases that indicated that the studies were conducted in preparation for a subsequent RCT.

Validating the conceptual framework—Consensus meeting

We also took an explanation and visual representation of our framework to an international consensus meeting primarily designed to reach consensus on an extension of the 2010 CONSORT statement to randomised pilot studies. There were 19 invited participants with known expertise, experience, or interest in pilot and feasibility studies, including representatives of CONSORT, funders, journal editors, and those who had been involved in writing the NIHR definitions of pilot and feasibility studies and the MRC guidance on designing and evaluating complex interventions. This made it an ideal forum in which to discuss the framework as well. The project was not concerned with any specific disease and was methodological in design; no patients or members of the public were involved.

Ninety-three individuals, including chief investigators, statisticians, trial managers, clinicians, research assistants and a funder, participated in the first round of the Delphi survey and 79 in the second round. Over 70% of participants in the first round felt that our definitions, the way we had distinguished between pilot and feasibility studies, and the labels ‘pilot’ and ‘feasibility’ were appropriate. However, these four items had some of the lowest appropriateness ratings in the survey and there were a large number of comments both in direct response to our four survey items related to appropriateness of definitions, and in open comment boxes elsewhere in the survey. Some of these comments are presented in Fig 4. Some participants commented favourably on the definitions we had drawn up (quote 1) but others were confused by them (quote 2). Several compared our definitions to the NIHR definitions pointing out the differences (quote 3) and suggesting this might make it particularly difficult for the research community to understand our definitions (quote 4). Some expressed their own views about the definitions (quote 5); largely these tallied with the NIHR definitions. Others noted that both the concept of feasibility and the word itself were often used in relation to studies which investigators referred to as pilot studies (quote 6). Others questioned whether it was practically and/or theoretically possible to make a distinction between pilot and feasibility studies (quote 6, quote 7), suggesting that the two terms are not mutually exclusive and that feasibility was more of an umbrella term for studies conducted prior to the main trial. Some participants felt that, using our definitions, feasibility studies would be less structured and more variable and therefore their quality would be less appropriately assessed via a checklist (quote 8). These responses regarding definitions mirrored what we had found in the user-testing of the Delphi survey, the Society for Clinical Trials workshop, and differences of opinion already apparent in the literature. In the second round of the survey there were few comments about definitions.

Fig 4. https://doi.org/10.1371/journal.pone.0150205.g004

There was a wide range of participants in the open meeting, including senior quantitative and qualitative methodologists, and a funding body representative. The four propositions we devised to cover different views about definitions of pilot and feasibility studies are shown in Fig 5. Fourteen out of the fifteen attendees who voted on these propositions preferred propositions 3 or 4, based on comments from the Delphi survey and the MRC guidance on designing and evaluating complex interventions respectively. Neither of these propositions implied mutually exclusive definitions of pilot and feasibility studies.

Fig 5. https://doi.org/10.1371/journal.pone.0150205.g005

Definitions of feasibility outside the health research context focus on the likelihood of being able to do something. For example, the Oxford on-line dictionary defines feasibility as: ‘The state or degree of being easily or conveniently done’ [14] and a feasibility study as: ‘An assessment of the practicality of a proposed plan or method’ [15]. Some definitions also suggest that a feasibility study should help with decision making, for example [16]: ‘The feasibility study is an evaluation and analysis of the potential of a proposed project. It is based on extensive investigation and research to support the process of decision making’. Outside the health research context the word ‘pilot’ has several different meanings but definitions of pilot studies usually focus on an experiment, project or development undertaken in advance of a future wider experiment, project or development. For example the Oxford on-line dictionary describes a pilot study as: ‘Done as an experiment or test before being introduced more widely’ [17]. Several definitions carry with them ideas that the purpose of a pilot study is also to facilitate decision making, for example ‘a small-scale experiment or set of observations undertaken to decide how and whether to launch a full-scale project’ [18] and some definitions specifically mention feasibility, for example: ‘a small scale preliminary study conducted in order to evaluate feasibility’ [19].

In keeping with these definitions not directly related to the health research context, we agreed that feasibility is a concept encapsulating ideas about whether it is possible to do something and that a feasibility study asks whether something can be done, should we proceed with it, and if so, how. While piloting is also concerned with whether something can be done and whether and how we should proceed with it, it has a further dimension; piloting is implementing something, or part of something, in a way you intend to do it in future to see whether it can be done in practice. We therefore agreed that a pilot study is a study in which a future study, or part of a future study, is conducted on a smaller scale to ask the question whether something can be done, should we proceed with it, and if so, how. The corollary of these definitions is that all pilot studies are feasibility studies but not all feasibility studies are pilot studies. Within the context of RCTs, the focus of our research, the ‘something’ in the definitions can be replaced with ‘a future RCT evaluating the effect of an intervention or therapy’. Studies that address the question of whether the RCT can be done, should we proceed with it and if so how, can then be classed as feasibility or pilot studies. Some of these studies may, of course, have other objectives but if they are mainly focusing on feasibility of the future RCT we would include them as feasibility studies. All three studies used as examples in our Delphi survey [10–12] satisfy the definition of a feasibility study. However, a study by Piot et al, which we encountered while developing the Delphi study, does not. This study is described as a pilot trial in the abstract but the authors present only data on effectiveness and although they state that their results require confirmation in a larger study it is not clear that their pilot study was conducted in preparation for such a larger study [20]. On the other hand, Palmer et al ‘performed a feasibility study to determine whether patient and surgeon opinion was permissive for a Randomised Controlled Trial (RCT) comparing operative with non-operative treatment for FAI [femoroacetabular impingement]’ [12]. Heazell et al describe the aim of their randomised study as ‘to address whether a randomised controlled trial (RCT) of the management of RFM [reduced fetal movement] was feasible’ [10]. Their study was piloting many of the aspects they hoped to implement in a larger trial of RFM, thus making this also a pilot study, whereas the study conducted by Palmer et al, which comprised a questionnaire to clinicians and the seeking of patient opinion, is not a pilot study but is a feasibility study.

Within our framework, some important studies conducted in advance of a future RCT to evaluate the effect of a therapy or intervention are not feasibility studies. For example, a systematic review, usually an essential pre-requisite for such an RCT, normally addresses whether the future RCT is necessary or desirable, not whether it is feasible. To reflect this, we developed a comprehensive diagrammatical representation of our framework for studies conducted in preparation for an RCT which, for completeness, includes, on the left hand side, early studies that are not pilot and feasibility studies, such as systematic reviews and, along the bottom, details of existing or planned reporting guidelines for different types of study (S2 Fig).

Validating the conceptual framework—Systematic review

From the 150 most recent studies identified by our electronic search, we identified 27 eligible reports (Fig 6). In keeping with our working definition of a pilot or feasibility study, to be included the reports had to show evidence that investigators were addressing at least some feasibility objectives and that the study was in preparation for a future RCT evaluating the effect of an intervention. Ideally we would have stipulated that the primary objective of the study should be a feasibility objective but, given the nature of the reporting of most of these studies, we felt this would be too restrictive.

Fig 6. https://doi.org/10.1371/journal.pone.0150205.g006

The 27 studies are reported in Table 1 and results relating to the terminology that authors used are summarised in Table 2. Results in Table 2 support our first hypothesis that the words ‘pilot’ and ‘feasibility’ are both used in the literature to describe studies undertaken in preparation for a randomised controlled trial of effectiveness; 63% (17/27) used both terms somewhere in the title or abstract. The table also supports our second hypothesis that amongst the subset of feasibility studies in preparation for an RCT that are themselves RCTs, authors do not use the term ‘pilot trial’ consistently in relation to these studies; of the 18 randomised studies only eight contained the words ‘pilot’ and ‘trial’ in the title. Our third hypothesis, namely that it is not possible to apply unique mutually exclusive definitions of pilot and feasibility studies in preparation for an RCT that are consistent with the way authors describe their studies, is supported by the characteristics of studies presented in Table 1 and summarised in Table 2. We could find no design or other features (such as randomisation or presence of a control group) that distinguished between those that investigators called feasibility studies and those that they called pilot studies. However, the fourth hypothesis, that amongst studies in preparation for an RCT evaluating the effect of an intervention or therapy it is possible to identify some studies that explore the feasibility of a certain intervention or acquire related information about the feasibility of applying an intervention in a future study but are not pilot studies, was not supported; we identified no such studies amongst those reported in Table 1. Nevertheless, we had identified two prior to carrying out the review [10,15].

Table 1. https://doi.org/10.1371/journal.pone.0150205.t001

Table 2. https://doi.org/10.1371/journal.pone.0150205.t002

Out of our exploratory sample of 30 study reports for which we reviewed full texts rather than only titles and abstracts, we identified 10 that could be classed as pilot or feasibility studies using our framework. We had already identified four of these in our sample reported in Table 1, but had failed to identify the other six. As expected, this was because key information needed to identify them as pilot or feasibility studies, such as the fact that they were in preparation for a larger RCT or that the main objectives were to do with feasibility, was not included in the abstract. Thus our assumption that an initial screen using only abstracts resulted in the omission of some pilot and feasibility studies was correct.

International consensus meeting participants agreed with the general tenets of our conceptual framework including the ideas that all pilot studies are feasibility studies but that some feasibility studies are not pilot studies. They suggested that any definitive diagrammatic representation should more strongly reflect non-linearity in the ordering of feasibility studies. As a result of their input we produced a new, simplified, diagrammatical representation of the framework (Fig 7) which focuses on the key elements represented inside an oval shape on our original diagram, omits the wider context outside this shape, and highlights some features, including the non-linearity, more clearly.

Fig 7. https://doi.org/10.1371/journal.pone.0150205.g007

The finalised framework

Fig 7 represents the framework. The figure indicates that where there is uncertainty about future RCT feasibility, a feasibility study is appropriate. Feasibility is thus an overarching concept within which we distinguish between three distinct types of study. Randomised pilot studies are those studies in which the future RCT, or parts of it, including the randomisation of participants, is conducted on a smaller scale (piloted) to see if it can be done. Thus randomised pilot studies can include studies that for the most part reflect the design of a future definitive trial but, if necessary due to remaining uncertainty, may involve trying out alternative strategies, for example, collecting an outcome variable via telephone for some participants and on-line for others. Within the framework randomised pilot studies could also legitimately be called randomised feasibility studies. Two-thirds of the studies presented in Table 1 are of this type.

Non-randomised pilot studies are similar to randomised pilot studies; they are studies in which all or part of the intervention to be evaluated and other processes to be undertaken in a future trial is/are carried out (piloted) but without randomisation of participants. These could also legitimately be called by the umbrella term, feasibility study. These studies cover a wide range from those that are very similar to randomised pilot studies except that the intervention and control groups have not been randomised, to those in which only the intervention, and no other trial processes, are piloted. One-third of studies presented in Table 1 are of this type.

Feasibility studies that are not pilot studies are those in which investigators attempt to answer a question about whether some element of the future trial can be done but do not implement the intervention to be evaluated or other processes to be undertaken in a future trial, though they may be addressing intervention development in some way. Such studies are rarer than the other types of feasibility study and, in fact, none of the studies in Table 1 were of this type. Nevertheless, we include these studies within the framework because they do exist; the Palmer study [ 15 ] in which surgeons and patients were asked about the feasibility of randomisation is one such example. Other examples might be interviews to ascertain the acceptability of an intervention, or questionnaires to assess the types of outcomes participants might think important. Within the framework these studies can be called feasibility studies but cannot be called pilot studies since no part of the future randomised controlled trial is being conducted on a smaller scale.

Investigators may conduct a number of studies to assess the feasibility of an RCT to test the effect of any intervention or therapy. While it may be most common to carry out what we have referred to as feasibility studies that are not pilot studies before non-randomised pilot studies, and non-randomised pilot studies prior to randomised pilot studies, the process of feasibility work is not necessarily linear and such studies can in fact be conducted in any order. For completeness the diagram indicates the location of internal pilot studies.
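To make the three categories concrete, the framework can be read as a simple decision rule. The sketch below is our illustrative rendering, not part of the original framework; the function name and boolean inputs are assumptions chosen for clarity.

```python
def classify_feasibility_study(implements_trial_processes: bool,
                               randomised: bool) -> str:
    """Map a study done in preparation for an RCT onto the framework's three types.

    implements_trial_processes: True if the intervention or other processes of
        the future trial are actually carried out on a smaller scale.
    randomised: True if participants are randomised.
    """
    if not implements_trial_processes:
        # e.g. interviews about the acceptability of randomisation
        return "feasibility study that is not a pilot study"
    if randomised:
        # could equally be called a randomised feasibility study
        return "randomised pilot study"
    return "non-randomised pilot study"

print(classify_feasibility_study(implements_trial_processes=True, randomised=True))
print(classify_feasibility_study(implements_trial_processes=False, randomised=False))
```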

There are diverse views about the definitions of pilot and feasibility studies within the research community. We reached consensus over a conceptual framework for the definitions of these studies in which feasibility is an overarching concept for studies assessing whether a future study, project or development can be done. For studies conducted in preparation for an RCT assessing the effect of a therapy or intervention, three distinct types of study come under the umbrella of feasibility studies: randomised pilot studies, non-randomised pilot studies, and feasibility studies that are not pilot studies. Thus pilot studies are a subset of feasibility studies. A review of the literature confirmed that it is not possible to apply mutually exclusive definitions of pilot and feasibility studies in preparation for such an RCT that are consistent with the way authors describe their studies. For example, Lee et al [31], Boogerd et al [22] and Wolf et al [38] all describe randomised studies exploring the feasibility of introducing new systems (a brain-computer interface memory training game, an online interactive treatment environment and a bed-exit alarm, respectively), but Lee et al title their study 'A Randomized Control Pilot Study', with the word 'feasibility' used in the abstract and text, while the study by Boogerd et al is titled 'Teaming up: feasibility of an online treatment environment for adolescents with type 1 diabetes', and Wolf et al describe their study as a pilot study without using the word 'feasibility'.

Our re-evaluation of the definitions of pilot and feasibility studies was conducted over a period of time with input, via a variety of media, from multi-disciplinary and international researchers, publishers, editors and funders. It was to some extent a by-product of our work developing reporting guidelines for such studies. Nevertheless, we were able to gather a wide range of expert views, and the iterative nature of the development of our thinking has been an important part of obtaining consensus. Other parallel developments, including the recent establishment of the new Pilot and Feasibility Studies journal [48], suggest that our work is, indeed, timely. We encountered several difficulties in reviewing empirical study reports. Firstly, it was sometimes hard to assess whether studies were planned in preparation for an RCT or whether the authors were conducting a small study and simply commenting on the fact that a larger RCT would be useful. Secondly, objectives were sometimes unclear, and/or effectiveness objectives were often emphasised in spite of recommendations that pilot and feasibility studies should not focus on effectiveness [1, 4]. In identifying relevant studies we erred on the side of inclusiveness, acknowledging that getting these studies published is not easy and that there are, as yet, no definitive reporting guidelines for investigators to follow. Lastly, our electronic search was unable to identify any feasibility studies that were not pilot studies according to our definitions. Subsequent discussion with qualitative researchers suggested that this is because such studies are often not described as feasibility studies in the title or abstract.

Our framework is compatible with the UK MRC guidance on complex interventions, which suggests a 'feasibility and piloting' phase as part of the work to design and evaluate such interventions without any explicit distinction between pilot and feasibility studies. In addition, although our framework has a different underlying principle from that adopted by the UK NIHR, the NIHR definition of a pilot study is not far from the subset of studies we have described as randomised pilot studies. Although there appears to be increasing interest in pilot and feasibility studies, as far as we are aware no other funding bodies specifically address the nature of such studies. The National Institutes of Health (NIH) in the USA does, however, routinely require published pilot studies before considering funding applications for certain streams, and the Canadian Institutes of Health Research routinely have calls for pilot or feasibility studies in different clinical areas to gather evidence necessary to determine the viability of new research directions determined by their strategic funding plans. These approaches highlight the need for clarity regarding what constitutes a pilot study.

There are several previous reviews of empirical pilot and feasibility studies [ 1 , 4 , 7 ]. In the most recent, reviewing studies published between 2000 and 2009 [ 7 ], the authors identified a large number of studies, described similar difficulty in identifying whether a larger study was actually being planned, and similar lack of consistency in the way the terms ‘pilot’ and ‘feasibility’ are used. Nevertheless, in methodological work, many researchers have adopted fairly rigid definitions of pilot and feasibility studies. For example, Bugge et al in developing the ADEPT framework refer to the NIHR definitions and suggest that feasibility studies ask questions about ‘whether the study can be done’ while pilot trials are ‘(a miniature version of the main trial), which aim to test aspects of study design and processes for the implementation of a larger main trial in the future’ [ 49 ]. Although not explicitly stated, the text seems to suggest that pilot and feasibility studies are mutually exclusive. Our work indicates that this is neither necessary nor desirable. There is, however, general agreement in the literature about the purpose of pilot and feasibility studies. For example, pilot trials are ‘to provide sufficient assurance to enable a larger definitive trial to be undertaken’ [ 50 ], and pilot studies are ‘designed to test the performance characteristics and capabilities of study designs, measures, procedures, recruitment criteria, and operational strategies that are under consideration for use in a subsequent, often larger, study’ [ 51 ], and ‘play a pivotal role in the planning of large-scale and often expensive investigations’ [ 52 ]. Within our framework we define all studies aiming to assess whether a future RCT is do-able as ‘feasibility studies’. Some might argue that the focus of their study in preparation for a future RCT is acceptability rather than feasibility, and indeed, in other frameworks, such as the RE-AIM framework [ 53 ], feasibility and acceptability are seen as two different concepts. However, it is perfectly possible to explore the acceptability of an intervention, of a data collection process or of randomisation in order to determine the feasibility of a putative larger RCT. Thus the use of the term ‘feasibility study’ for a study in preparation for a future RCT is not incompatible with the exploration of issues other than feasibility within the study itself.

There are numerous previous studies in which the investigators review the literature and seek the counsel of experts to develop definitions and clarify terminology. Most of these relate to clinical or physiological definitions [54-56]. A few explorations of definitions relate to concepts such as quality of life [57]. Implicit in much of this work is that from time to time definitions need rethinking as knowledge and practice move on. From an etymological point of view this makes sense. In fact, the use of the word 'pilot' to mean something that is a prototype of something else only appears to emerge in the middle of the twentieth century, and the first use of the word in relation to research design that we could find was in 1947, in a pilot survey [58]. Thus we do not have to look very far back to see changes in the use of one of the words we have been dealing with in developing our conceptual framework. We hope what we are proposing here is helpful in the early twenty-first century to clarify the use of the words 'pilot' and 'feasibility' in a health research context.

We suggest that researchers view feasibility as an overarching concept, with all studies done in preparation for a main study open to being called feasibility studies, and with pilot studies as a subset of feasibility studies. All such studies should be labelled ‘pilot’ and/or ‘feasibility’ as appropriate, preferably in the title of a report, but if not certainly in the abstract. This recommendation applies to all studies that contribute to an assessment of the feasibility of an RCT evaluating the effect of an intervention. Using either of the terms in the title will be most helpful for those conducting future electronic searches. However, we recognise that for qualitative studies, authors may find it convenient to use the terms in the abstract rather than the title. Authors also need to describe objectives and methods well, reporting clearly if their study is in preparation for a future RCT to evaluate the effect of an intervention or therapy.

Though the focus of this work was on the definitions of pilot and feasibility studies, and extensive recommendations for the conduct of these studies are outside its scope, we suggest that in choosing what type of feasibility study to conduct investigators should pay close attention to the major uncertainties that exist in relation to the trial or intervention. A randomised pilot study may not be necessary to address these; in some cases it may not even be necessary to implement an intervention at all. Similarly, funders should look for a justification for the type of feasibility study that investigators propose. We have also highlighted the need for better reporting of these studies. The CONSORT extension for randomised pilot studies that our group has developed is important in helping to address this need and will be reported separately. Nevertheless, further work will be necessary to extend or adapt these reporting guidelines for use with non-randomised pilot studies and with feasibility studies that are not pilot studies. There is also more work to be done in developing good practice guidance for the conduct of pilot and feasibility studies.

Supporting Information

S1 Fig. Search strategy to identify studies that authors described as pilot or feasibility studies.

https://doi.org/10.1371/journal.pone.0150205.s001

S2 Fig. Initial comprehensive diagrammatic representation of framework.

https://doi.org/10.1371/journal.pone.0150205.s002

Acknowledgments

We thank Alicia O’Cathain and Pat Hoddinot for discussions about the reporting of qualitative studies, and consensus participants for their views on our developing framework. Claire Coleman was funded by a National Institute for Health Research (NIHR) Research Methods Fellowship. This article presents independent research funded by the NIHR. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

Author Contributions

Conceived and designed the experiments: SE GL MC LT SH CB. Performed the experiments: SE GL MC LT SH CB CC. Analyzed the data: SE GL MC LT SH CB CC. Contributed reagents/materials/analysis tools: SE GL MC LT SH CB. Wrote the paper: SE GL MC LT SH CB CC.

  • 5. National Institute for Health Research. NIHR Evaluation, Trials and Studies | Glossary 2015. Available: http://www.nets.nihr.ac.uk/glossary/feasibility-studies. Accessed 2015 Mar 17.
  • 6. National Institute for Health Research. NIHR Evaluation, Trials and Studies | Pilot studies 2015. Available: http://www.nets.nihr.ac.uk/glossary/pilot-studies. Accessed 2015 Mar 17.
  • 9. CLINVIVO. Clinvivo Limited 2015. Available: http://www.clinvivo.com/. Accessed 2015 Apr 9.
  • 13. In Conference Ltd. 2nd Clinical Trials Methodology Conference | 18–19 November 2013, EICC, Edinburgh, Scotland 2013. Available: http://www.methodologyconference2013.org.uk/. Accessed 2015 Mar 17.
  • 14. Oxford Dictionaries. Oxford Dictionaries | feasibility 2015. Available: http://www.oxforddictionaries.com/definition/english/feasibility. Accessed 2015 Mar 17.
  • 15. Oxford Dictionaries. Oxford Dictionaries | feasibility study 2015. Available: http://www.oxforddictionaries.com/definition/english/feasibility-study. Accessed 2015 Mar 17.
  • 16. Wikipedia. Feasibility study 2015. Available: http://en.wikipedia.org/wiki/Feasibility_study. Accessed 2015 Mar 17.
  • 17. Oxford Dictionaries. Oxford Dictionaries | pilot 2015. Available: http://www.oxforddictionaries.com/definition/english/pilot. Accessed 2015 Mar 17.
  • 18. Collins. Collins English Dictionary | pilot study 2015. Available: http://www.collinsdictionary.com/dictionary/english/pilot-study. Accessed 2015 Mar 17.
  • 19. Wikipedia. Pilot experiment 2015. Available: http://en.wikipedia.org/wiki/Pilot_experiment. Accessed 2015 Mar 17.

A tutorial on pilot studies: the what, why and how

  • Lehana Thabane 1 , 2 ,
  • Jinhui Ma 1 , 2 ,
  • Rong Chu 1 , 2 ,
  • Ji Cheng 1 , 2 ,
  • Afisi Ismaila 1 , 3 ,
  • Lorena P Rios 1 , 2 ,
  • Reid Robson 3 ,
  • Marroon Thabane 1 , 4 ,
  • Lora Giangregorio 5 &
  • Charles H Goldsmith 1 , 2  

BMC Medical Research Methodology volume 10, Article number: 1 (2010)


Pilot studies for phase III trials - which are comparative randomized trials designed to provide preliminary evidence on the clinical efficacy of a drug or intervention - are routinely performed in many clinical areas. Also commonly known as "feasibility" or "vanguard" studies, they are designed to assess the safety of treatment or interventions; to assess recruitment potential; to assess the feasibility of international collaboration or coordination for multicentre trials; and to increase clinical experience with the study medication or intervention for the phase III trials. They are the best way to assess feasibility of a large, expensive full-scale study, and in fact are an almost essential pre-requisite. Conducting a pilot prior to the main study can enhance the likelihood of success of the main study and potentially help to avoid doomed main studies. The objective of this paper is to provide a detailed examination of the key aspects of pilot studies for phase III trials including: 1) the general reasons for conducting a pilot study; 2) the relationships between pilot studies, proof-of-concept studies, and adaptive designs; 3) the challenges of and misconceptions about pilot studies; 4) the criteria for evaluating the success of a pilot study; 5) frequently asked questions about pilot studies; 6) some ethical aspects related to pilot studies; and 7) some suggestions on how to report the results of pilot investigations using the CONSORT format.

1. Introduction

The Concise Oxford Thesaurus [1] defines a pilot project or study as an experimental, exploratory, test, preliminary, trial or try-out investigation. Epidemiology and statistics dictionaries provide similar definitions of a pilot study as a small scale

" ... test of the methods and procedures to be used on a larger scale if the pilot study demonstrates that the methods and procedures can work" [ 2 ];

"...investigation designed to test the feasibility of methods and procedures for later use on a large scale or to search for possible effects and associations that may be worth following up in a subsequent larger study" [ 3 ].

Table 1 provides a summary of definitions found on the Internet. A closer look at these definitions reveals that they are similar to the ones above in that a pilot study is synonymous with a feasibility study intended to guide the planning of a large-scale investigation. Pilot studies are sometimes referred to as "vanguard trials" (i.e. pre-studies) intended to assess the safety of treatment or interventions; to assess recruitment potential; to assess the feasibility of international collaboration or coordination for multicentre trials; to evaluate surrogate marker data in diverse patient cohorts; to increase clinical experience with the study medication or intervention, and identify the optimal dose of treatments for the phase III trials [ 4 ]. As suggested by an African proverb from the Ashanti people in Ghana " You never test the depth of a river with both feet ", the main goal of pilot studies is to assess feasibility so as to avoid potentially disastrous consequences of embarking on a large study - which could potentially "drown" the whole research effort.

Feasibility studies are routinely performed in many clinical areas. It is fair to say that every major clinical trial had to start with some piloting or a small scale investigation to assess the feasibility of conducting a larger scale study: critical care [ 5 ], diabetes management intervention trials [ 6 ], cardiovascular trials [ 7 ], primary healthcare [ 8 ], to mention a few.

Despite their noted importance, the reality is that pilot studies receive little or no attention in scientific research training. Few epidemiology or research textbooks cover the topic with the necessary detail. In fact, we are not aware of any textbook that dedicates a chapter on this issue - many just mention it in passing or provide a cursory coverage of the topic. The objective of this paper is to provide a detailed examination of the key aspects of pilot studies. In the next section, we narrow the focus of our definition of a pilot to phase III trials. Section 3 covers the general reasons for conducting a pilot study. Section 4 deals with the relationships between pilot studies, proof-of-concept studies, and adaptive designs, while section 5 addresses the challenges of pilot studies. Evaluation of a pilot study (i.e. how to determine if a pilot study was successful) is covered in Section 6. We deal with several frequently asked questions about pilot studies in Section 7 using a "question-and-answer" approach. Section 8 covers some ethical aspects related to pilot studies; and in Section 9, we follow the CONSORT format [ 9 ] to offer some suggestions on how to report the results of pilot investigations.

2. Narrowing the focus: Pilot studies for randomized studies

Pilot studies can be conducted in both quantitative and qualitative studies. Adopting a similar approach to Lancaster et al . [ 10 ], we focus on quantitative pilot studies - particularly those done prior to full-scale phase III trials. Phase I trials are non-randomized studies designed to investigate the pharmacokinetics of a drug (i.e. how a drug is distributed and metabolized in the body) including finding a dose that can be tolerated with minimal toxicity. Phase II trials provide preliminary evidence on the clinical efficacy of a drug or intervention. They may or may not be randomized. Phase III trials are randomized studies comparing two or more drugs or intervention strategies to assess efficacy and safety. Phase IV trials, usually done after registration or marketing of a drug, are non-randomized surveillance studies to document experiences (e.g. side-effects, interactions with other drugs, etc) with using the drug in practice.

For the purposes of this paper, our approach to utilizing pilot studies relies on the model for complex interventions advocated by the British Medical Research Council - which explicitly recommends the use of feasibility studies prior to Phase III clinical trials, but stresses the iterative nature of the processes of development, feasibility and piloting, evaluation and implementation [ 11 ].

3. Reasons for Conducting Pilot Studies

Van Teijlingen et al . [ 12 ] and van Teijlingen and Hundley [ 13 ] provide a summary of the reasons for performing a pilot study. In general, the rationale for a pilot study can be grouped under several broad classifications - process, resources, management and scientific (see also http://www.childrens-mercy.org/stats/plan/pilot.asp for a different classification):

Process: This assesses the feasibility of the steps that need to take place as part of the main study. Examples include determining recruitment rates, retention rates, etc.

Resources: This deals with assessing time and budget problems that can occur during the main study. The idea is to collect some pilot data on such things as the length of time to mail or fill out all the survey forms.

Management: This covers potential human and data optimization problems such as personnel and data management issues at participating centres.

Scientific: This deals with the assessment of treatment safety, determination of dose levels and response, and estimation of treatment effect and its variance.

Table 2 summarizes this classification with specific examples.

4. Relationships between Pilot Studies, Proof-of-Concept Studies, and Adaptive Designs

A proof-of-concept (PoC) study is defined as a clinical trial carried out to determine if a treatment (drug) is biologically active or inactive [ 14 ]. PoC studies usually use surrogate markers as endpoints. In general, they are phase I/II studies - which, as noted above, investigate the safety profile, dose level and response to new drugs [ 15 ]. Thus, although designed to inform the planning of phase III trials for registration or licensing of new drugs, PoC studies may not necessarily fit our restricted definition of pilot studies aimed at assessing feasibility of phase III trials as outlined in Section 2.

An adaptive trial design refers to a design that allows modifications to be made to a trial's design or statistical procedures during its conduct, with the purpose of efficiently identifying clinical benefits/risks of new drugs or increasing the probability of success of clinical development [16]. The adaptations can be prospective (e.g. stopping a trial early due to safety, futility or efficacy at interim analysis); concurrent (e.g. changes in eligibility criteria, hypotheses or study endpoints) or retrospective (e.g. changes to the statistical analysis plan prior to locking the database or revealing treatment codes to trial investigators or patients). Piloting is normally built into adaptive trial designs by determining a priori decision rules to guide the adaptations based on cumulative data. For example, data from interim analyses could be used to refine sample size calculations [17, 18]. This approach is routinely used in internal pilot studies - which are primarily designed to inform sample size calculation for the main study, with recalculation of the sample size as the key adaptation. Unlike other phase III pilots, an internal pilot investigation does not usually address any other feasibility aspects - because it is essentially part of the main study [10, 19, 20].
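As a minimal sketch of the key adaptation described above, the snippet below recalculates a two-arm trial's per-group sample size from the variability observed at an internal pilot's interim analysis, using the standard normal-approximation formula for comparing two means; all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def per_group_n(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-arm comparison of means (normal approximation):
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2."""
    z = NormalDist()
    return math.ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
                     * sd ** 2 / delta ** 2)

print(per_group_n(delta=0.5, sd=1.0))  # planning assumption: 63 per group
print(per_group_n(delta=0.5, sd=1.3))  # interim data show more variability: 107 per group
```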

Nonetheless, we need to emphasize that whether or not a study is a pilot depends on its objectives. An adaptive method is used as a strategy to reach that objective; both a pilot and a non-pilot could be adaptive.

5. Challenges of and Common Misconceptions about Pilot Studies

Pilot studies can be very informative, not only to the researchers conducting them but also to others doing similar work. However, many of them never get published, often because of the way the results are presented [ 13 ]. Quite often the emphasis is wrongly placed on statistical significance, not on feasibility - which is the main focus of the pilot study. Our experience in reviewing submissions to a research ethics board also shows that most of the pilot projects are not well designed: i.e. there are no clear feasibility objectives; no clear analytic plans; and certainly no clear criteria for success of feasibility.

In many cases, pilot studies are conducted to generate data for sample size calculations. This seems especially sensible in situations where there are no data from previous studies to inform this process. However, it can be dangerous to use pilot studies to estimate treatment effects, as such estimates may be unrealistic or biased because of the limited sample sizes. Therefore, if not used cautiously, results of pilot studies can potentially mislead sample size or power calculations [21], particularly if the pilot study was done to see if there is likely to be a treatment effect in the main study. In Section 6, we provide guidance on how to proceed with caution in this regard.

There are also several misconceptions about pilot studies. Below are some of the common reasons that researchers have put forth for calling their study a pilot.

The first common reason is that a pilot study is a small single-centre study. For example, researchers often state lack of resources for a large multi-centre study as a reason for doing a pilot. The second common reason is that a pilot investigation is a small study that is similar in size to someone else's published study. In reviewing submissions to a research ethics board, we have come across sentiments such as

So-and-so did a similar study with 6 patients and got statistical significance - ours uses 12 patients (double the size)!

We did a similar pilot before (and it was published!)

The third most common reason is that a pilot is a small study done by a student or an intern - which can be completed quickly and does not require funding. Specific arguments include

I have funding for 10 patients only;

I have limited seed (start-up) funding;

This is just a student project!

My supervisor (boss) told me to do it as a pilot .

None of the above arguments qualifies as a sound reason for calling a study a pilot. A study should only be conducted if the results will be informative; studies conducted for the reasons above may result in findings of limited utility, which would be a waste of the researchers' and participants' efforts. The focus of a pilot study should be on assessment of feasibility, unless it was powered appropriately to assess statistical significance. Further, there is a vast number of poorly designed and reported studies. Assessment of the quality of a published report may be helpful to guide decisions about whether the report should be used to guide the planning or design of new studies. Finally, if a trainee or researcher is assigned a project as a pilot, it is important to discuss how the results will inform the planning of the main study. In addition, clearly defined feasibility objectives and a rationale to justify piloting should be provided.

Sample Size for Pilot Studies

In general, sample size calculations may not be required for some pilot studies. It is important that the sample for a pilot be representative of the target study population. It should also be based on the same inclusion/exclusion criteria as the main study. As a rule of thumb, a pilot study should be large enough to provide useful information about the aspects that are being assessed for feasibility. Note that PoC studies require sample size estimation based on surrogate markers [22], but they are usually not powered to detect meaningful differences in clinically important endpoints. The sample used in the pilot may be included in the main study, but caution is needed to ensure the key features of the main study are preserved in the pilot (e.g. blinding in randomized controlled trials). We recommend that if any pooling of pilot and main study data is considered, it should be planned beforehand and described clearly in the protocol, with clear discussion of the statistical consequences and methods. The goal is to avoid or minimize the potential bias that may occur due to multiple testing issues or any other opportunistic actions by investigators. In general, pooling, when done appropriately, can increase the efficiency of the main study [23].

As noted earlier, a carefully designed pilot study may be used to generate information for sample size calculations. Two approaches may be helpful to optimize information from a pilot study in this context. First, consider eliciting qualitative data to supplement the quantitative information obtained in the pilot; for example, consider having some discussions with clinicians using the approach suggested by Lenth [24] to elicit additional information on possible effect size and variance estimates. Second, consider creating a sample size table for various values of the effect or variance estimates to acknowledge the uncertainty surrounding the pilot estimates.
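The second suggestion, a sample size table across plausible values of the effect and variance estimates, can be generated mechanically. The sketch below assumes a two-arm comparison of means at 5% two-sided alpha and 80% power; the grid of effect sizes and standard deviations is illustrative only.

```python
import math
from statistics import NormalDist

def n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    # two-arm comparison of means, normal approximation
    z = NormalDist()
    return math.ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
                     * sd ** 2 / delta ** 2)

# tabulate to acknowledge the uncertainty surrounding the pilot estimates
sds = (0.8, 1.0, 1.2)
print("delta", *(f"sd={s}" for s in sds))
for delta in (0.3, 0.4, 0.5):
    print(f"{delta:5}", *(f"{n_per_arm(delta, s):6}" for s in sds))
```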

In some cases, one could use a confidence interval (CI) approach to estimate the sample size required to establish feasibility. For example, suppose we had a pilot trial designed primarily to determine adherence rates to a standardized risk assessment form to enhance venous thromboprophylaxis in hospitalized patients, and suppose it was also decided a priori that the criterion for success would be: the main trial would be 'feasible' if the risk assessment form is completed for ≥ 70% of eligible hospitalized patients.
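The worked numbers for this example do not survive in the text above, so the sketch below illustrates the general CI approach only: choose n so that the 95% CI around the anticipated adherence rate is acceptably narrow, then check whether the observed CI clears the 70% criterion. The anticipated rate, half-width and observed counts are all invented for illustration.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ~1.96

def n_for_halfwidth(p_anticipated: float, half_width: float) -> int:
    # normal-approximation n so the 95% CI is roughly p_hat +/- half_width
    return math.ceil(Z95 ** 2 * p_anticipated * (1 - p_anticipated) / half_width ** 2)

def wald_ci(successes: int, n: int) -> tuple:
    # simple Wald 95% CI for an observed proportion
    p = successes / n
    hw = Z95 * math.sqrt(p * (1 - p) / n)
    return p - hw, p + hw

print(n_for_halfwidth(0.80, 0.05))   # 246 if ~80% adherence is anticipated
lo, hi = wald_ci(190, 240)           # hypothetical pilot result
print(f"observed 95% CI ({lo:.3f}, {hi:.3f}); feasible if the lower bound >= 0.70")
```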

6. How to Interpret the Results of a Pilot Study: Criteria for Success

It is always important to state the criteria for success of a pilot study. The criteria should be based on the primary feasibility objectives. These provide the basis for interpreting the results of the pilot study and determining whether it is feasible to proceed to the main study. In general, the outcome of a pilot study can be one of the following: (i) Stop - main study not feasible; (ii) Continue, but modify protocol - feasible with modifications; (iii) Continue without modifications, but monitor closely - feasible with close monitoring; and (iv) Continue without modifications - feasible as is.

For example, the Prophylaxis of Thromboembolism in Critical Care Trial (PROTECT) was designed to assess the feasibility of a large-scale trial with the following criteria for determining success [ 25 ]:

98.5% of patients had to receive study drug within 12 hours of randomization;

91.7% of patients had to receive every scheduled dose of the study drug in a blinded manner;

90% or more of patients had to have lower limb compression ultrasounds performed at the specified times; and

> 90% of necessary dose adjustments had to have been made appropriately in response to pre-defined laboratory criteria .

In a second example, the PeriOperative Epidural Trial (POET) Pilot Study was designed to assess the feasibility of a large, multicentre trial with the following criteria for determining success [ 26 ]:

one subject per centre per week (i.e., 200 subjects from four centres over 50 weeks) can be recruited;

at least 70% of all eligible patients can be recruited;

no more than 5% of all recruited subjects cross over from one modality to the other; and

complete follow-up in at least 95% of all recruited subjects.
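Pre-specified criteria such as PROTECT's and POET's lend themselves to a mechanical check at the end of the pilot. In the sketch below the thresholds follow the PROTECT list above, while the observed rates are invented for illustration.

```python
# (description, required threshold, observed proportion) -- observed values are hypothetical
criteria = [
    ("study drug within 12 h of randomisation",  0.985, 0.992),
    ("every scheduled dose given blinded",       0.917, 0.951),
    ("ultrasounds performed at specified times", 0.90,  0.88),
    ("dose adjustments made appropriately",      0.90,  0.95),
]

all_met = True
for description, threshold, observed in criteria:
    met = observed >= threshold
    all_met = all_met and met
    print(f"{description}: {observed:.1%} vs >= {threshold:.1%} -> {'met' if met else 'NOT met'}")

print("feasible as is" if all_met else "review protocol: some feasibility criteria not met")
```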

7. Frequently asked questions about pilot studies

In this section, we offer our thoughts on some of the frequently asked questions about pilot studies. These could be helpful not only to clinicians and trainees, but to anyone who is interested in health research.

Can I publish the results of a pilot study?

- Yes, every attempt should be made to publish.

Why is it important to publish the results of pilot studies?

- To provide information about feasibility to the research community to save resources being unnecessarily spent on studies that may not be feasible. Further, having such information can help researchers to avoid duplication of efforts in assessing feasibility.

- Finally, researchers have an ethical and scientific obligation to attempt publishing the results of every research endeavor. However, our focus should be on feasibility goals. Emphasis should not be placed on statistical significance when pilot studies are not powered to detect minimal clinically important differences. Such studies typically do not show statistically significant results - remember that underpowered studies (with no statistically significant results) are inconclusive, not negative since "no evidence of effect" is not "evidence of no effect" [ 27 ].

Can I combine data from a pilot with data from the main study?

- Yes, provided the sampling frame and methodologies are the same. This can increase the efficiency of the main study - see Section 5.

Can I combine the results of a pilot with the results of another study or in a meta-analysis?

- Yes, provided the sampling frame and methodologies are the same.

- No, if the main study is reported and it includes the pilot study.

Can the results of the pilot study be valid on their own, without the existence of the main study?

- Yes, if the results show that it is not feasible to proceed to the main study or there is insufficient funding.

Can I apply for funding for a pilot study?

- Yes. Like any grant, it is important to justify the need for piloting.

- The pilot has to be placed in the context of the main study.

Can I randomize patients in a pilot study?

- Yes. For a phase III pilot study, one of the goals could be to assess how a randomization procedure might work in the main study or whether the idea of randomization might be acceptable to patients [ 10 ]. In general, it is always best for a pilot to maintain the same design as the main study.

How can I use the information from a pilot to estimate the sample size?

- Use with caution, as results from pilot studies can potentially mislead sample size calculations.

- Consider supplementing the information with qualitative discussions with clinicians - see section 5; and

- Create a sample size table to acknowledge the uncertainty of the pilot information - see section 5.

Can I use the results of a pilot study to treat my patients?

- Not a good idea!

- Pilot studies are primarily for assessing feasibility.

What can I do with a failed or bad pilot study?

- No study is a complete failure; it can always be used as a bad example! However, it is worth making clear that a pilot study that shows the main study is not likely to be feasible is not a failed (pilot) study. In fact, it is a success - because you avoided wasting scarce resources on a study destined for failure!

8. Ethical Aspects of Pilot Studies

Halpern et al. [28] stated that conducting underpowered trials is unethical. However, they proposed that underpowered trials are ethical in two situations: (i) small trials of interventions for rare diseases, which require documenting explicit plans for including results with those of similar trials in a prospective meta-analysis; and (ii) early-phase trials in the development of drugs or devices, provided they are adequately powered for defined purposes other than randomized treatment comparisons. Pilot studies of phase III trials (dealing with common diseases) are not addressed in their proposal. It is therefore prudent to ask: is it ethical to conduct a study whose feasibility cannot be guaranteed with a high probability of success?

It seems unethical to consider running a phase III study without having sufficient data or information about its feasibility. In fact, most granting agencies require data on feasibility as part of their assessment of the scientific validity for funding decisions.

There is, however, one important ethical aspect of pilot studies that has received little or no attention from researchers, research ethics boards and ethicists alike. This pertains to the obligation that researchers have to patients or participants in a trial to disclose the feasibility nature of pilot studies. This is essential given that some pilot studies may not lead to further studies. A review of the commonly cited research ethics guidelines - the Nuremberg Code [29], Helsinki Declaration [30], the Belmont Report [31], ICH Good Clinical Practice [32], and the International Ethical Guidelines for Biomedical Research Involving Human Subjects [33] - shows that pilot studies are not addressed in any of these guidelines. Canadian researchers are also encouraged to follow the Tri-Council Policy Statement (TCPS) [34], but it too does not address how pilot studies need to be approached. It seems to us that, given the special nature of feasibility or pilot studies, the disclosure of their purpose to study participants requires special wording that informs them of the definition of a pilot study, the feasibility objectives of the study, and the criteria for success of feasibility. To fully inform participants, we suggest using the following wording in the consent form:

" The overall purpose of this pilot study is to assess the feasibility of conducting a large study to [state primary objective of the main study]. A feasibility or pilot study is a study that... [state a general definition of a feasibility study]. The specific feasibility objectives of this study are ... [state the specific feasibility objectives of the pilot study]. We will determine that it is feasible to carry on the main study if ... [state the criteria for success of feasibility] ."

9. Recommendation for Reporting the Results of Pilot Studies

Adapted from the CONSORT Statement [9], Table 3 provides a checklist of items to consider including in a report of a pilot study.

Title and abstract

Item #1: The title or abstract should indicate that the study is a "pilot" or "feasibility" study.

As the foremost summary of the contents of any report, it is important for the title to clearly indicate that the report is for a pilot or feasibility study. This would also be helpful to other researchers during electronic searches about feasibility issues. Our quick search of PubMed [on July 13, 2009], using the terms "pilot" OR "feasibility" OR "proof-of-concept", revealed 24,423 (16%) hits of studies that had these terms in the title or abstract, compared with 149,365 hits that had these terms anywhere in the text.

Item #2: Scientific background for the main study and explanation of rationale for assessing feasibility through piloting

The rationale for initiating a pilot should be based on the need to assess feasibility for the main study. Thus, the background of the main study should clearly describe what is known or not known about important feasibility aspects to provide context for piloting.

Item #3: Participants and setting of the study

The description of the inclusion-exclusion or eligibility criteria for participants should be the same as in the main study. The settings and locations where the data were collected should also be clearly described.

Item #4: Interventions

Give precise details of the interventions intended for each group and how and when they were actually administered (if applicable); state clearly if any aspects of the intervention are assessed for feasibility.

Item #5: Objectives

State the specific scientific primary and secondary objectives and hypotheses for the main study and the specific feasibility objectives. It is important to clearly indicate the feasibility objectives as the primary focus for the pilot.

Item #6: Outcomes

Clearly define primary and secondary outcome measures for the main study. Then, clearly define the feasibility outcomes and how they were operationalized - these should include key elements such as recruitment rates, consent rates, completion rates, variance estimates, etc. In some cases, a pilot study may be conducted with the aim to determine a suitable (clinical or surrogate) endpoint for the main study. In such a case, one may not be able to define the primary outcome of the main study until the pilot is finished. However, it is important that determining the primary outcome of the main study be clearly stated as part of feasibility outcomes.

Item #7: Sample Size

Describe how sample size was determined. If the pilot is a proof-of-concept study, is the sample size calculated based on primary/key surrogate marker(s)? In general if the pilot is for a phase III study, there may be no need for a formal sample size calculation. However, the confidence interval approach may be used to calculate and justify the sample size based on key feasibility objective(s).

Item #8: Feasibility criteria

Clearly describe the criteria for assessing success of feasibility - these should be based on the feasibility objectives.

Item #9: Statistical Analysis

Describe the statistical methods for the analysis of primary and secondary feasibility outcomes.

Item #10: Ethical Aspects

State whether the study received research ethics approval. Describe how informed consent was handled - given the feasibility nature of the study.

Item #11: Participant Flow

Describe the flow of participants through each stage of the study (use of a flow diagram is strongly recommended; see CONSORT [9] for a template). Describe deviations from the pilot study protocol as planned, with reasons for the deviations. State the number of exclusions at each stage and the corresponding reasons for exclusion.

Item #12: Recruitment

Report the dates defining the periods of recruitment and follow-up.

Item #13: Baseline Data

Report the baseline demographic and clinical characteristics of the participants.

Item #14: Outcomes and Estimation

For each primary and secondary feasibility outcome, report the point estimate of effect and its precision (e.g., 95% CI), if applicable.

Item #15: Interpretation

Interpretation of the results should focus on feasibility, taking into account the stated criteria for success of feasibility, study hypotheses, sources of potential bias or imprecision (given the feasibility nature of the study) and the dangers associated with multiplicity - repeated testing on multiple outcomes.

Item #16: Generalizability

Discuss the generalizability (external validity) of the feasibility aspects observed in the study. State clearly what modifications in the design of the main study (if any) would be necessary to make it feasible.

Item #17: Overall evidence of feasibility

Discuss the general results in the context of overall evidence of feasibility. It is important that the focus be on feasibility.

10. Conclusions

Pilot or vanguard studies provide a good opportunity to assess the feasibility of large full-scale studies. Pilot studies are the best way to assess the feasibility of a large, expensive full-scale study, and in fact are an almost essential pre-requisite. Conducting a pilot prior to the main study can enhance the likelihood of success of the main study and potentially help to avoid doomed main studies. Pilot studies should be well designed, with clear feasibility objectives, clear analytic plans and explicit criteria for determining success of feasibility. They should be used cautiously for determining treatment effects and variance estimates for power or sample size calculations. Finally, they should be scrutinized the same way as full-scale studies, and every attempt should be made to publish the results in peer-reviewed journals.


Waite M: Concise Oxford Thesaurus. 2002, Oxford, England: Oxford University Press, 2


Last JM, editor: A Dictionary of Epidemiology. 2001, Oxford University Press, 4

Everitt B: Medical Statistics from A to Z: A Guide for Clinicians and Medical Students. 2006, Cambridge University Press: Cambridge, 2


Tavel JA, Fosdick L, ESPRIT Vanguard Group. ESPRIT Executive Committee: Closeout of four phase II Vanguard trials and patient rollover into a large international phase III HIV clinical endpoint trial. Control Clin Trials. 2001, 22: 42-48. 10.1016/S0197-2456(00)00114-8.


Arnold DM, Burns KE, Adhikari NK, Kho ME, Meade MO, Cook DJ: The design and interpretation of pilot trials in clinical research in critical care. Crit Care Med. 2009, 37 (Suppl 1): 69-74. 10.1097/CCM.0b013e3181920e33.


Computerization of Medical Practice for the Enhancement of Therapeutic Effectiveness. Last accessed August 8, 2009, [ http://www.compete-study.com/index.htm ]

Heart Outcomes Prevention Evaluation Study. Last accessed August 8, 2009, [ http://www.ccc.mcmaster.ca/hope.htm ]

Cardiovascular Health Awareness Program. Last accessed August 8, 2009, [ http://www.chapprogram.ca/resources.html ]

Moher D, Schulz KF, Altman DG, CONSORT Group (Consolidated Standards of Reporting Trials): The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. J Am Podiatr Med Assoc. 2001, 91: 437-442.

Lancaster GA, Dodd S, Williamson PR: Design and analysis of pilot studies: recommendations for good practice. J Eval Clin Pract. 2004, 10: 307-12. 10.1111/j..2002.384.doc.x.


Craig N, Dieppe P, Macintyre S, Michie S, Nazareth I, Petticrew M: Developing and evaluating complex interventions: the new Medical Research Council guidance. BMJ. 2008, 337: a1655-10.1136/bmj.a1655.


Van Teijlingen ER, Rennie AM, Hundley V, Graham W: The importance of conducting and reporting pilot studies: the example of the Scottish Births Survey. J Adv Nurs. 2001, 34: 289-295. 10.1046/j.1365-2648.2001.01757.x.

Van Teijlingen ER, Hundley V: The Importance of Pilot Studies. Social Research Update. 2001, 35-[ http://sru.soc.surrey.ac.uk/SRU35.html ]

Lawrence Gould A: Timing of futility analyses for 'proof of concept' trials. Stat Med. 2005, 24: 1815-1835. 10.1002/sim.2087.

Fardon T, Haggart K, Lee DK, Lipworth BJ: A proof of concept study to evaluate stepping down the dose of fluticasone in combination with salmeterol and tiotropium in severe persistent asthma. Respir Med. 2007, 101: 1218-1228. 10.1016/j.rmed.2006.11.001.

Chow SC, Chang M: Adaptive design methods in clinical trials - a review. Orphanet J Rare Dis. 2008, 3: 11-10.1186/1750-1172-3-11.

Gould AL: Planning and revising the sample size for a trial. Stat Med. 1995, 14: 1039-1051. 10.1002/sim.4780140922.

Coffey CS, Muller KE: Properties of internal pilots with the univariate approach to repeated measures. Stat Med. 2003, 22: 2469-2485. 10.1002/sim.1466.

Zucker DM, Wittes JT, Schabenberger O, Brittain E: Internal pilot studies II: comparison of various procedures. Statistics in Medicine. 1999, 18: 3493-3509. 10.1002/(SICI)1097-0258(19991230)18:24<3493::AID-SIM302>3.0.CO;2-2.

Kieser M, Friede T: Re-calculating the sample size in internal pilot designs with control of the type I error rate. Statistics in Medicine. 2000, 19: 901-911. 10.1002/(SICI)1097-0258(20000415)19:7<901::AID-SIM405>3.0.CO;2-L.

Kraemer HC, Mintz J, Noda A, Tinklenberg J, Yesavage JA: Caution regarding the use of pilot studies to guide power calculations for study proposals. Arch Gen Psychiatry. 2006, 63: 484-489. 10.1001/archpsyc.63.5.484.

Yin Y: Sample size calculation for a proof of concept study. J Biopharm Stat. 2002, 12: 267-276. 10.1081/BIP-120015748.

Wittes J, Brittain E: The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med. 1990, 9: 65-71. 10.1002/sim.4780090113.

Lenth R: Some Practical Guidelines for Effective Sample Size Determination. The American Statistician. 2001, 55: 187-193. 10.1198/000313001317098149.

Cook DJ, Rocker G, Meade M, Guyatt G, Geerts W, Anderson D, Skrobik Y, Hebert P, Albert M, Cooper J, Bates S, Caco C, Finfer S, Fowler R, Freitag A, Granton J, Jones G, Langevin S, Mehta S, Pagliarello G, Poirier G, Rabbat C, Schiff D, Griffith L, Crowther M, PROTECT Investigators. Canadian Critical Care Trials Group: Prophylaxis of Thromboembolism in Critical Care (PROTECT) Trial: a pilot study. J Crit Care. 2005, 20: 364-372. 10.1016/j.jcrc.2005.09.010.

Choi PT, Beattie WS, Bryson GL, Paul JE, Yang H: Effects of neuraxial blockade may be difficult to study using large randomized controlled trials: the PeriOperative Epidural Trial (POET) Pilot Study. PLoS One. 2009, 4 (2): e4644-10.1371/journal.pone.0004644.


Altman DG, Bland JM: Absence of evidence is not evidence of absence. BMJ. 1995, 311: 485.

Halpern SD, Karlawish JH, Berlin JA: The continuing unethical conduct of underpowered clinical trials. JAMA. 2002, 288: 358-362. 10.1001/jama.288.3.358.

The Nuremberg Code, Research ethics guideline 2005. Last accessed August 8, 2009, [ http://www.hhs.gov/ohrp/references/nurcode.htm ]

The Declaration of Helsinki, Research ethics guideline. Last accessed December 22, 2009, [ http://www.wma.net/en/30publications/10policies/b3/index.html ]

The Belmont Report, Research ethics guideline. Last accessed August 8, 2009, [ http://ohsr.od.nih.gov/guidelines/belmont.html ]

The ICH Harmonized Tripartite Guideline-Guideline for Good Clinical Practice. Last accessed August 8, 2009, [ http://www.gcppl.org.pl/ma_struktura/docs/ich_gcp.pdf ]

The International Ethical Guidelines for Biomedical Research Involving Human Subjects. Last accessed August 8, 2009, [ http://www.fhi.org/training/fr/Retc/pdf_files/cioms.pdf ]

Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans, Government of Canada. Last accessed August 8, 2009, [ http://www.pre.ethics.gc.ca/english/policystatement/policystatement.cfm ]

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/10/1/prepub


Acknowledgements

Dr Lehana Thabane is clinical trials mentor for the Canadian Institutes of Health Research. We thank the reviewers for insightful comments and suggestions which led to improvements in the manuscript.

Author information

Authors and affiliations

Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada

Lehana Thabane, Jinhui Ma, Rong Chu, Ji Cheng, Afisi Ismaila, Lorena P Rios, Marroon Thabane & Charles H Goldsmith

Biostatistics Unit, St Joseph's Healthcare Hamilton, Hamilton, ON, Canada

Lehana Thabane, Jinhui Ma, Rong Chu, Ji Cheng, Lorena P Rios & Charles H Goldsmith

Department of Medical Affairs, GlaxoSmithKline Inc., Mississauga, ON, Canada

Afisi Ismaila & Reid Robson

Department of Medicine, Division of Gastroenterology, McMaster University, Hamilton, ON, Canada

Marroon Thabane

Department of Kinesiology, University of Waterloo, Waterloo, ON, Canada

Lora Giangregorio


Corresponding author

Correspondence to Lehana Thabane .

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

LT drafted the manuscript. All authors reviewed several versions of the manuscript, read and approved the final version.


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Cite this article

Thabane, L., Ma, J., Chu, R. et al. A tutorial on pilot studies: the what, why and how. BMC Med Res Methodol 10, 1 (2010). https://doi.org/10.1186/1471-2288-10-1





A feasibility study testing four hypotheses with phase II outcomes in advanced colorectal cancer (MRC FOCUS3): a model for randomised controlled trials in the era of personalised medicine?

T S Maughan, A M Meade, R A Adams, S D Richman, R Butler, D Fisher, R H Wilson, B Jasani, G R Taylor, G T Williams, J R Sampson, M T Seymour, L L Nichols, S L Kenny, A Nelson, C M Sampson, E Hodgkinson, J A Bridgewater, D L Furniss, M J Pope, J K Pope, M Parmar, P Quirke & R Kaplan

British Journal of Cancer volume 110, pages 2178–2186 (2014)


Background:

Molecular characteristics of cancer vary between individuals. In future, most trials will require assessment of biomarkers to allocate patients into enriched populations in which targeted therapies are more likely to be effective. The MRC FOCUS3 trial is a feasibility study to assess key elements in the planning of such studies.

Patients and Methods:

Patients with advanced colorectal cancer were registered from 24 centres between February 2010 and April 2011. With their consent, patients' tumour samples were analysed for KRAS/BRAF oncogene mutation status and topoisomerase 1 (topo-1) immunohistochemistry. Patients were then classified into one of four molecular strata; within each stratum patients were randomised to one of two hypothesis-driven experimental therapies or a common control arm (FOLFIRI chemotherapy). A 4-stage suite of patient information sheets (PISs) was developed to avoid patient overload.

Results:

A total of 332 patients were registered and 244 randomised. Among randomised patients, biomarker results were provided within 10 working days (w.d.) in 71%, 15 w.d. in 91% and 20 w.d. in 99%. DNA mutation analysis was 100% concordant between two laboratories. Over 90% of participants reported excellent understanding of all aspects of the trial. In this randomised phase II setting, omission of irinotecan in the low topo-1 group was associated with an increased response rate, and addition of cetuximab in the KRAS, BRAF wild-type cohort was associated with longer progression-free survival.

Conclusions:

Patient samples can be collected and analysed within workable time frames and with reproducible mutation results. Complex multi-arm designs are acceptable to patients with good PIS. Randomisation within each cohort provides outcome data that can inform clinical practice.


Cancer is the product of a somatic evolutionary process, in which successive advantageous genetic and epigenetic alterations drive the progression of the disease ( Greaves and Maley, 2012 ). Although current knowledge indicates many similar changes in different cancers, the number of possible combinations of changes even within a given anatomical/histological type such as colorectal cancer (CRC) is very large ( The Cancer Genome Atlas Network, 2012 ). This raises a major challenge in the search for effective therapies that target the properties of any given cancer, especially for advanced disease where clonal evolution and the selective pressure of prior therapies drive increasing diversity and resistance to subsequent therapy ( Sequist et al, 2011 ; Gerlinger et al, 2012 ). This emerging understanding of the heterogeneity of cancer is a major challenge to clinical trialists and demands new methodologies for testing novel therapies.

Fundamental to this challenge is the identification of biomarkers that help enrich the evaluated population for benefit from a specific therapy. In CRC, the use of epidermal growth factor receptor (EGFR)-targeted therapy has led to the discovery of the importance of KRAS and recently NRAS mutations ( Douillard et al, 2013 ) in prediction of lack of response to that therapy and association of BRAF mutation with a particularly poor prognosis in advanced CRC (ACRC; Lievre et al, 2006 ; Karapetis et al, 2008 ; Maughan et al, 2011 ). Further biomarker candidates under evaluation as potentially predicting lack of benefit from anti-EGFR therapy are PI3K mutations and loss of PTEN expression ( De Roock et al, 2010 ; Seymour et al, 2013 ).

This paper reports the results of the MRC FOCUS3 trial (ISRCTN83171665), a randomised feasibility trial for the selection of therapy for patients with ACRC based on their KRAS and BRAF mutation status as well as their topoisomerase 1 (topo-1) expression status.

Materials and Methods

Trial design.

Patients were registered on the day they provided written consent for the release of a tumour sample. Upon determination of their biomarker status, patients were allocated to one of four molecular subgroups for randomisation: (1) low topo-1 expression levels and both KRAS and BRAF wild type, (2) low topo-1 and either KRAS- or BRAF-activating mutations, (3) high topo-1 and both KRAS and BRAF wild type and (4) high topo-1 and either KRAS or BRAF mutations. These randomisation subgroups correspond to the prior hypotheses that: (1) in patients with low topo-1 tumours, fluorouracil (FU) alone is similarly effective and therefore preferable to the irinotecan/FU combination ( Braun et al, 2008 ); (2) in patients with KRAS/BRAF wild-type tumours, anti-EGFR therapy improves outcomes ( Van Cutsem et al, 2009 ); (3) in patients with high topo-1 tumours, addition of oxaliplatin to irinotecan/FU improves outcomes ( Braun et al, 2008 ) and (4) in patients with KRAS/BRAF-mutated tumours, anti-VEGF therapy might improve outcomes. There was no specific rationale for a biologically targeted therapy in patients with KRAS mutations; however, there were data suggesting benefit of bevacizumab ( Ince et al, 2005 ).
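To make the allocation rule concrete, the following sketch (Python; a hypothetical helper written for this guide, not the trial's software) maps the two biomarker results onto the four randomisation subgroups described above.

```python
# Hypothetical helper mirroring the four molecular strata described in the text;
# illustrative only, not FOCUS3 trial code.
def molecular_stratum(topo1_high: bool, kras_braf_mutant: bool) -> int:
    """Map the two biomarker results onto randomisation subgroups 1-4."""
    if not topo1_high:
        return 2 if kras_braf_mutant else 1  # low topo-1: wild type -> 1, mutant -> 2
    return 4 if kras_braf_mutant else 3      # high topo-1: wild type -> 3, mutant -> 4

assert molecular_stratum(topo1_high=False, kras_braf_mutant=False) == 1
assert molecular_stratum(topo1_high=True, kras_braf_mutant=True) == 4
```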

Patients were randomised centrally by the MRC CTU via telephone using minimisation and allocated in a 1 : 1 : 1 ratio to the control arm (A) common to each of the four subgroups or one of two experimental regimens ( Figures 1 and 2 ). If either molecular test failed, patients could still be randomised in a 1 : 1 ratio based on the results available ( Figure 1 ). Treatment allocation was not masked. Randomisation was stratified by standard clinical prognostic factors.
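Minimisation is a dynamic allocation method: each incoming patient is assigned, with high probability, to the arm that would leave the stratification factors best balanced. The sketch below shows the general Pocock–Simon idea in Python; the factor names, the 0.8 probability and the scoring rule are assumptions for illustration, not the FOCUS3 algorithm.

```python
import random
from collections import defaultdict

def minimise(patient, arms, counts, p_deterministic=0.8):
    """Pocock-Simon-style minimisation: score each arm by how many previously
    allocated patients share this patient's factor levels, then allocate to
    the least-loaded arm with probability p_deterministic."""
    imbalance = {arm: sum(counts[(factor, level, arm)]
                          for factor, level in patient.items())
                 for arm in arms}
    best = min(imbalance, key=imbalance.get)
    arm = best if random.random() < p_deterministic else random.choice(arms)
    for factor, level in patient.items():
        counts[(factor, level, arm)] += 1  # update running totals
    return arm

counts = defaultdict(int)
for _ in range(12):
    minimise({"centre": "Leeds", "WHO_PS": 1}, ["A", "B", "C"], counts)
```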

Figure 1: Trial design.

Figure 2: Diagram in patient information sheet 1, given to patients to explain the tests carried out on their tumour sample.

Eligibility criteria were age ⩾ 18 years, colorectal adenocarcinoma, inoperable metastatic or locoregional RECIST measurable disease, no previous chemotherapy for metastases, WHO performance status 0–2 and good organ function ( Maughan and Meade, 2010 ). Written informed consent for both molecular testing and randomisation was required.

Outcome measures and sample size

The primary outcome measures for FOCUS3 were process outcomes: in this national multi-site setting, how frequently the target of ⩽10 w.d. could be met between the date of registration and (1) the provision of results to the investigator and (2) randomisation.

The target sample size was 240 patients; if >226 tumour blocks were processed within 10 w.d., we could reliably state that ⩾90% of samples could be analysed within that time frame. If <206 blocks were processed within 10 w.d., we could reliably exclude a turnaround rate of 90% (i.e., the upper 95% confidence limit would exclude 90%).
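These thresholds follow from exact (Clopper–Pearson) binomial confidence limits. A minimal sketch, assuming SciPy is available, reproduces the reasoning; the helper is illustrative, and only the 240/226/206 figures come from the trial.

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

print(clopper_pearson(227, 240))  # >226 of 240: lower 95% limit sits above 0.90
print(clopper_pearson(205, 240))  # <206 of 240: upper 95% limit falls below 0.90
```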

Secondary outcome measures included toxicity, response rates (RRs) and progression-free survival (PFS) of the different regimens within each molecular subgroup; reproducibility of biomarker results and attitudes of patients to the study design, the consent process and refusal rates for trial entry.

Informed consent and patients' attitudes to the trial design

A staged set of patient information sheets (PISs) was developed with input from patients, carers and nursing staff. PIS1 explained the need for further analyses of tumour tissue using a very simple diagram and no technical details (see Figure 2). PIS2, given to patients before the results of their molecular tests were known, covered the general issues of a three-arm RCT and treatment side-effects. PIS3, in four specific versions (a–d) describing the three-arm randomisation for each of the four molecular subtypes (1–4), was given to patients before randomisation. PIS4, in versions a–e, contained full details of the five treatment regimens (A–E).

Patient understanding of the information was captured on a questionnaire delivered immediately following their reading of the stage 2 PIS.

Attitudes of participants to trial entry, understanding and experience, particularly regarding the proposed 2-week period for tumour testing before treatment allocation, were evaluated by one-to-one semi-structured interviews using interpretative phenomenological analysis in a subgroup of randomised patients ( Smith and Osborn, 2003 ).

Sample collection and analysis process

The clinical research nurse (CRN) at the recruiting hospital requested the patient's diagnostic formalin-fixed, paraffin-embedded (FFPE) tumour block. Histopathology agreements were in place between the MRC and all diagnostic hospitals outlining the trial rationale and stressing the importance of sending blocks promptly to the central laboratories. The MRC CTU team actively tracked samples throughout the biomarker analysis process. Upon reconfirmation of eligibility, and with their consent, patients were randomised.

Biomarker analysis

KRAS codons 12, 13 and 61 and BRAF codon 600 were each analysed by pyrosequencing of DNA extracted from macro-dissected FFPE sections (details in Supplementary Appendix ).

Topo-1 protein expression was identified using a topo-1 antibody (NCL-TOPO1; Leica, Wetzlar, Germany; details in Supplementary Appendix ). Each case was scored on the basis of the percentage of positive tumour cells (<10% scored low, >10% high).

Quality assurance of biomarker analysis

Fifty samples were blinded and exchanged between the two laboratories before the trial and analysed for KRAS and BRAF mutation status. Throughout the trial both laboratories took part in external quality assessment (UK NEQAS) for KRAS . Topo-1 IHC was compared between laboratories.

Interventions and assessments

The five treatment regimens were all based on the 2-weekly FOLFIRI regimen – folinic acid and irinotecan followed by bolus and infusional 5-fluorouracil (5-FU; Douillard et al, 2000 ): (A) control: FOLFIRI; (B) omits irinotecan: LV5FU2; (C) adds oxaliplatin: FOLFOXIRI (FOLFIRI and oxaliplatin); (D) FOLFIRI plus cetuximab and (E) FOLFIRI plus bevacizumab. Doses in (C) were dependent on patient age and WHO performance status (PS). The chemotherapy regimens FOLFIRI and LV5FU2 are internationally recognised acronyms. The actual regimens used in FOCUS3 were established in the UK ( Cheeseman et al, 2002 ; Leonard et al, 2002 ). They have been used in large numbers of patients, have been shown to be both efficacious and safe ( Seymour et al, 2007 ) and will be referred to as FOLFIRI and LV5FU2 in this paper. The FOLFIRI regimen consisted of an IV infusion of irinotecan 180 mg m −2 over 30 min, followed by an IV infusion of 350 mg d,l-folinic acid (or 175 mg l-folinic acid) over 2 h. A 400 mg m −2 IV bolus injection of 5-FU was then administered over 5 min, followed by a 2400 mg m −2 5-FU IV infusion over 46 h. For the LV5FU2 regimen, irinotecan was omitted and the 5-FU IV infusion dose was increased to 2800 mg m −2 . There were three different FOLFOXIRI regimens, prescribed based on the patient's age and WHO PS. The regimen for patients aged 70 years or less and with PS=0–1 contained 180 mg m −2 irinotecan and 85 mg m −2 oxaliplatin, a 400 mg m −2 5-FU bolus and a 2400 mg m −2 5-FU infusion. The individual components were reduced to 80% of full dose for patients ⩾ 70 years or PS=2, and to 60% for patients ⩾ 70 years and PS=2. In arm D, cetuximab was administered before chemotherapy as an IV dose of 500 mg m −2 , whereas in arm E bevacizumab was administered first as a 5 mg kg −1 IV infusion. All of the regimens are described in detail in the FOCUS3 protocol ( Maughan and Meade, 2010 ).
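Because the FOLFOXIRI doses are body-surface-area based with protocol-defined reduction tiers, the rule can be stated compactly. The sketch below is purely illustrative (the protocol, not this code, governs boundary cases, and nothing here is clinical guidance); the mg m−2 figures are those quoted in the paragraph above.

```python
# Illustrative only: FOLFOXIRI dose tiers as described in the text, scaled by
# body surface area (BSA). Not clinical software.
FULL_DOSE_MG_PER_M2 = {"irinotecan": 180, "oxaliplatin": 85,
                       "5FU_bolus": 400, "5FU_infusion": 2400}

def folfoxiri_doses(age_years: int, who_ps: int, bsa_m2: float) -> dict:
    elderly, less_fit = age_years >= 70, who_ps == 2
    # full dose; 80% if elderly *or* PS 2; 60% if elderly *and* PS 2
    factor = 0.6 if (elderly and less_fit) else 0.8 if (elderly or less_fit) else 1.0
    return {drug: round(mg_m2 * factor * bsa_m2)
            for drug, mg_m2 in FULL_DOSE_MG_PER_M2.items()}

print(folfoxiri_doses(age_years=74, who_ps=1, bsa_m2=1.8))  # 80% tier
```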

If molecular results were not confirmed by 2 weeks, patients could have one cycle of LV5FU2 before randomisation. Treatment continued for at least 24 weeks or until disease progression on treatment.

Patient symptoms were scored using the National Cancer Institute Common Toxicity Criteria for Adverse Events version 3.0. Serious adverse events (SAEs) and deaths, together with an assessment of causality, were reported continuously and were reassessed by an experienced oncologist on behalf of the MRC.

CT scans were performed within 5 weeks before the start of treatment and then every 12 weeks on treatment, and were evaluated using RECIST (v1.1) criteria. Responses were not confirmed by repeat scans and external radiological review was not undertaken.

Statistical methods

Analyses were conducted according to a predefined statistical analysis plan, which was approved by the FOCUS3 Trial Management Group (TMG) before database lock (first analysed in August 2011; data updated for final analysis in May 2012).

For each of the co-primary process outcomes, an exact binomial 95% confidence interval was calculated around the result. Exploratory analyses of the efficacy end points were planned in relation to the four hypotheses stated above (Trial Design), which in each case involved factorial analysis of two relevant molecular subgroups, as illustrated in Figure 1 . Time-to-event curves for analysis of PFS were estimated using the Kaplan–Meier method. All statistical analyses were carried out using Stata version 12 (StataCorp, College Station, TX, USA).
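As an illustration of these two analysis components, the sketch below computes an exact binomial confidence interval and fits a Kaplan–Meier curve. It assumes SciPy and the third-party lifelines package; the survival data are invented for the example, while the 180/244 proportion matches the process outcome reported below.

```python
from scipy.stats import binomtest
from lifelines import KaplanMeierFitter

# Exact binomial 95% CI around a process outcome, e.g. 180 of 244 within target
ci = binomtest(180, 244).proportion_ci(confidence_level=0.95, method="exact")
print(f"{180/244:.0%} (95% CI {ci.low:.0%} to {ci.high:.0%})")  # 74% (68% to 79%)

# Kaplan-Meier estimate of progression-free survival (illustrative data)
months_to_event = [3.1, 5.4, 7.2, 9.9, 12.0, 14.3]
progressed      = [1,   1,   0,   1,   0,    1]   # 0 = censored
kmf = KaplanMeierFitter()
kmf.fit(months_to_event, event_observed=progressed, label="FOLFIRI (arm A)")
print(kmf.median_survival_time_)
```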

Results

Between February 2010 and April 2011, 332 patients from 24 centres in the UK were registered for the FOCUS3 trial.

Topo-1 status was determined in 306 patients (92%) and was highly expressed (2–3) in 244 (73%). KRAS and BRAF status were determined in 319 patients (96%), of whom 117 (37%) had a KRAS mutation alone, 25 (8%) BRAF mutation alone, 1 (<1%) both mutations, 169 (53%) were double wild type and 7 (2%) had a BRAF mutation but inconclusive KRAS status. No association was seen between topo-1 expression and KRAS / BRAF mutation status ( Table 1 ).

Of patients registered, 288 were eligible for randomisation, and ultimately 244 (85%) were randomised. The reasons why patients were not randomised are described in Figure 3 (Consort Diagram). The main baseline characteristics and treatment allocation of all randomised patients are shown in Table 2 (and in Supplementary Tables 1 and 2 ) and Figure 3 . The distribution of KRAS/BRAF and Topo-1 status both at registration and randomisation is shown in Table 1 .

Figure 3: CONSORT diagram.

Primary process outcomes

The two co-primary process outcome measures were not met. Of those patients randomised, 180 (74%) had their biomarker results within 10 w.d. of registration (95% CI=68%, 79%). However, the results for 225 patients (92%) were available to investigators within 15 w.d. of registration (95% CI=88%, 95%). The interval between registration and randomisation was less than or equal to 10 w.d. in only 70 (29%) patients (95% CI=23%, 35%), which suggests that delays due to clinical issues (such as visit scheduling after results were available) had a greater impact on timelines than delays in biomarker analysis ( Supplementary Table 3 ).

Reproducibility of biomarker results

100% concordance was achieved in the DNA mutation analysis results obtained between the two reference laboratories. Initial crossing over of topo-1 samples between the laboratories produced consistent results, although there was a higher proportion of 'high' expressing tumours than was observed in FOCUS. The Cardiff centre was not able to fully adopt the previously validated Leeds laboratory topo-1 protocol, and early in the trial it became clear that the protocols adopted at the two centres were not giving the uniformly consistent results required for trial purposes. All subsequent sample testing for KRAS , BRAF and topo-1 was therefore performed at Leeds.

Patient understanding

In all, 90–95% of participants self-reported that they either fully or mostly understood all aspects of the trial (see Figure 4). The areas least well understood were the need to wait 2 weeks before the start of treatment, how treatment was allocated and what happens during treatment.

Figure 4: Patient understanding of the consent process. Q1: understanding of PIS2. Q2: understanding of why the tumour was tested. Q3: understanding of the different treatments. Q4: understanding of why you had to wait 2 weeks. Q5: understanding of how treatment was allocated. Q6: understanding of what happens during treatment. Q7: understanding of the request to give blood, complete a questionnaire and take part in an interview.

Qualitative research

In-depth interviews with 14 randomised patients were analysed using interpretative phenomenological analysis and will be published in full elsewhere. The dominant issue for the majority of participants was that they were discussing the trial immediately following diagnosis of ACRC; this was a greater concern than trial entry itself. Two of the fourteen interviewees experienced delays with results from tumour testing, causing significant distress. The majority of patients expressed no concern about tumour-testing times but highlighted distress caused by earlier delays during diagnosis and treatment.

Relationships with family were key to ongoing practical and emotional support and particularly relevant to the decision to enrol on the trial and the processing of information. The multiple roles of the CRN emerged in relation to recruitment and the ongoing care of participants in the trial. Reasons for enrolling in FOCUS3 related to altruism, perception of the trial as offering personalised treatment and better care, finding a cure for cancer and being the only option available.

Treatment and follow-up

Of the 244 randomised patients, 4 did not commence treatment—2 from arm A and 2 from arm E. Of the remaining 240, two patients (0.8%) received a single initial cycle of LV5FU2 alone before commencing their allocated regimens. Full-dose FOLFOXIRI was initiated in the 86% of patients with high topo-1 who were <70 years and PS 0–1; the remainder commenced at lower doses as per protocol. The median number of cycles of treatment delivered was 12 (IQR=7–13).

Efficacy outcomes

Efficacy outcomes were assessed in May 2012 when the median duration of follow-up was 15.2 months (IQR=12.6–18.8 months).

In patients with low topo-1 (B vs A, n=30), the 12-week RR was 60% with LV5FU2 alone and 47% with FOLFIRI, supporting the original hypothesis that irinotecan does not add benefit in this group. There was no evidence of a difference in PFS.

There was no improvement in RR (40% vs 45%) or in PFS (HR=1.08 (0.67–1.76)) with the addition of oxaliplatin (n=127) to FOLFIRI (C vs A). The complex randomisation algorithm resulted in a gender imbalance, with more males in this group, which has uncertain relevance.

In patients with KRAS and BRAF wild type (D vs A, n=92), the addition of cetuximab to FOLFIRI was associated with an increased RR (44% vs 66%) and improved PFS (HR=0.44 (0.23–0.82)), consistent with the results of the phase III CRYSTAL trial ( Van Cutsem et al, 2009 , 2011 ).

For the addition of bevacizumab to FOLFIRI in patients with KRAS or BRAF mutations (E vs A, n=72), an increased RR was observed (47% vs 33%). No PFS benefit was observed.

Kaplan–Meier survival curves are presented in Figure 5 and 12-week RR data are summarised in Table 3 .

Figure 5: Treatment comparisons – progression-free survival.

Toxicity observed was as expected for the LV5FU2, FOLFIRI, FOLFIRI+cetuximab and FOLFIRI+bevacizumab regimens. The anticipated increased toxicity of the FOLFOXIRI regimen was minimal, with only 27% grade 3+ neutropenia. This may be due to the reduced dosing schedule in the elderly/less fit patients (n=9 of 127) previously described ( Supplementary Table 4 ).

Discussion

The primary objective of FOCUS3 was to assess the feasibility of undertaking a complex biomarker-driven trial in a national multicentre setting. Although the study did not meet either of its ambitious pre-specified co-primary process outcome measures, the trial has shown that complex prospective biomarker-driven RCTs are possible on a substantial scale across the United Kingdom. Extra resources are required in the reference pathology laboratories to undertake the biomarker analyses, but within investigator sites and the trials office there is no requirement for special dedicated staff.

Potentially eligible patients were necessarily approached for consent at precisely the time when they had recently learned of the life-threatening status of their disease; our qualitative research showed this was the dominating concern in their minds. That we achieved our target patient number from 24 centres in 1 year demonstrated that the strategy for explaining the trial was successful and that, even under difficult circumstances, complex trials can be attractive to patients. Our four-step consent procedure was developed in consultation with patients and carers and was praised by the research ethics committee. The responses to the questionnaire administered after patients had read their stage 2 PIS showed high levels of understanding of the trial. The subsequent steps in the consent process, with specific patient consent forms for each molecular cohort and for each treatment, avoided information overload and provided only that information that was specifically relevant to the particular patient.

The logistics of retrieving FFPE blocks from the diagnostic hospitals were a major concern. Prior written agreement, a modest (£15) fee for retrieval and detailed sample tracking by CTU personnel minimised delay. The critical lesson was the need for excellent communication between all parties in the chain: from CRN to pathologist to the central laboratories to the coordinating trials unit.

A delay in reporting analysis results back to the MRC CTU was observed in 22 cases and was distressing to some patients. The delays were due to insufficient tumour in the block (n=4), unexpected technical difficulties (n=6) and initial testing that was inconclusive or failed (n=12). This was mitigated by allowing patients (n=2) to start cycle one of chemotherapy using the infusional 5-FU and folinic acid backbone, which was common to all treatment protocols, and then adding in the relevant additional agents for cycle 2 once the biomarker results were available.

Overall, the most important laboratory issue was reproducibility of IHC results. Although 100% concordance was achieved in the calling of KRAS and BRAF mutations between the two laboratories, it proved very difficult to perform and report the topo-1 IHC staining intensity in a sufficiently comparable way. Owing to technical and manpower-based organisational limitations, it was not possible to completely replicate the manual staining methodology adopted initially by the Leeds laboratory in the Cardiff laboratory, where an automated staining platform was used. Even differences between staining protocols that were deemed inconsequential contributed to this lack of consistency. For future studies, contributing diagnostic centres will use the same antibodies, protocol and automated staining platform. Detailed guidance on scoring and blinded replication in contributing centres, with face-to-face comparison of discrepantly scored sections, have been implemented for IHC tests in FOCUS4. On-trial quality assurance by double reading of slides will ensure comparability of evaluation.

This trial was structured so that we could address four distinct hypotheses, any or all of which might be the subject of a subsequent phase III trial. Our first hypothesis, arising from the observation in the earlier FOCUS trial that patients with low topo-1 expression appear to gain no benefit from the addition of irinotecan to LV5FU2 ( Seymour et al, 2007 ; Braun et al, 2008 ), was supported and remains an intriguing one. Only 30 patients were randomised to this comparison because of the lower than expected rate of low topo-1 expression, but the high RR (60%) in the LV5FU2 only treated patients suggests further work in this area might be rewarding.

The second hypothesis proposed that patients with high topo-1 expression, who alone in FOCUS gained benefit from either irinotecan or oxaliplatin in comparison to 5-FU ( Braun et al, 2008 ), may derive additional benefit from the triple chemotherapy regimen. With the protocol-specified dose reductions, the regimen was well tolerated. However, in contrast to the international literature ( Falcone et al, 2007 , 2013 ), although patients had a minimally higher RR, there was no hint of a PFS benefit.

The third hypothesis, tested in 92 patients with KRAS and BRAF wild-type tumours, was that the addition of cetuximab would increase efficacy. This recapitulated the CRYSTAL study ( Van Cutsem et al, 2009 , 2011 ), and benefits in PFS and RR were observed.

Finally, our fourth hypothesis for patients with KRAS or BRAF mutations (72 patients) was based on the limited data that bevacizumab retains efficacy in these patients ( Ince et al, 2005 ). No benefits on either RR or PFS were observed.

The FOCUS4 trial programme ( Kaplan et al, 2013 ) has recently opened to recruitment, building on many of the lessons learned in FOCUS3. Patient and clinician enthusiasm for biomarker-stratified trials and the rapid accrual observed in FOCUS3 have encouraged us to be optimistic in our predicted recruitment targets: 2400 registered patients with over 1500 randomised into multiple biomarker-directed comparisons in 4 years for FOCUS4. Staged PISs have been designed, with information given at the time of registration limited to that which is necessary for consent for release of tumour blocks, plus a minimal outline of the protocol, so as to avoid information overload. Detailed quality assurance work has been undertaken between the two biomarker reference laboratories, especially for the IHC tests (PTEN and mismatch repair proteins). In FOCUS4, the allocation by biomarker to specific comparisons occurs for patients with stable or responding disease after 4 months of first-line chemotherapy. Knowing that in FOCUS3 we completed biomarker analysis in 99% of patients within 20 w.d. of consent, the FOCUS4 logistics (registration of patients up to 12 weeks into their first-line chemotherapy) should facilitate accrual. Detailed engagement with pathologists in referring hospitals and a relatively small (£15) payment per case enabled rapid release of blocks for central analysis in FOCUS3, and the same pattern has been used in FOCUS4. Perhaps most important is the strength of the teamworking established through FOCUS3, including patient representatives, clinicians, biomarker experts (including histopathologists, immunohistochemists, geneticists and technicians), statisticians, research nurses, pharmacists, trial managers and data managers. To this, we have added research network managers to ensure improved patient transfers between district general hospitals and experimental cancer medicine centres, which are required in FOCUS4 for some patients randomised to the novel agent combinations being studied.

The FOCUS3 trial was a feasibility study designed to address the challenges of patient acceptability and technical logistics, and to test a novel design for examining the predictive role of biomarkers for first-line therapy of ACRC. We have shown that such studies are feasible and very well received by participants. The central trial design concepts have been taken forward into a major UK trial programme, FOCUS4 – molecular selection of therapy in CRC: a molecularly stratified RCT programme, which opened to accrual in January 2014 ( Kaplan et al, 2013 ).


Braun MS, Richman SD, Quirke P, Daly C, Adlard JW, Elliott F, Barrett JH, Selby P, Meade AM, Stephens RJ, Parmar MK, Seymour MT (2008) Predictive biomarkers of chemotherapy efficacy in colorectal cancer: results from the UK MRC FOCUS trial. J Clin Oncol 26 : 2690–2698.


Cheeseman SL, Joel SP, Chester JD, Wilson G, Dent JT, Richards FJ, Seymour MT (2002) A 'modified de Gramont' regimen of fluorouracil, alone and with oxaliplatin, for advanced colorectal cancer. Br J Cancer 87 : 393–399.

De Roock W, Claes B, Bernasconi D, De Schutter J, Biesmans B, Fountzilas G, Kalogeras KT, Kotoula V, Papamichael D, Laurent-Puig P, Penault-Llorca F, Rougier P, Vincenzi B, Santini D, Tonini G, Cappuzzo F, Frattini M, Molinari F, Saletti P, De Dosso S, Martini M, Bardelli A, Siena S, Sartore-Bianchi A, Tabernero J, Macarulla T, Di Fiore F, Gangloff AO, Ciardiello F, Pfeiffer P, Qvortrup C, Hansen TP, Van Cutsem E, Piessevaux H, Lambrechts D, Delorenzi M, Tejpar S (2010) Effects of KRAS, BRAF, NRAS, and PIK3CA mutations on the efficacy of cetuximab plus chemotherapy in chemotherapy-refractory metastatic colorectal cancer: a retrospective consortium analysis. Lancet Oncol 11 : 753–762.

Douillard JY, Cunningham D, Roth AD, Navarro M, James RD, Karasek P, Jandik P, Iveson T, Carmichael J, Alakl M, Gruia G, Awad L, Rougier P (2000) Irinotecan combined with fluorouracil compared with fluorouracil alone as first-line treatment for metastatic colorectal cancer: a multicentre randomised trial. Lancet 355 : 1041–1047.

Douillard JY, Oliner KS, Siena S, Tabernero J, Burkes R, Barugel M, Humblet Y, Bodoky G, Cunningham D, Jassem J, Rivera F, Kocákova I, Ruff P, Błasińska-Morawiec M, Šmakal M, Canon JL, Rother M, Williams R, Rong A, Wiezorek J, Sidhu R, Patterson SD (2013) Panitumumab-FOLFOX4 treatment and RAS mutations in colorectal cancer. N Engl J Med 369 (11): 1023–1034.

Falcone A, Cremolini C, Masi G, Lonardi S, Zagonel V, Salvatore L, Trenta P, Tomasello G, Ronzoni M, Ciuffreda L, Zaniboni A, Tonini G, Buonadonna A, Valsuani C, Chiara S, Carlomagno C, Boni C, Marcucci L, Boni L, Loupakis F (2013) FOLFOXIRI/bevacizumab (bev) versus FOLFIRI/bev as first-line treatment in unresectable metastatic colorectal cancer (mCRC) patients (pts): Results of the phase III TRIBE trial by GONO group. J Clin Oncol 31 (suppl): abstr 3505.


Falcone A, Ricci S, Brunetti I, Pfanner E, Allegrini G, Barbara C, Crino L, Benedetti G, Evangelista W, Fanchini L, Cortesi E, Picone V, Vitello S, Chiara S, Granetto C, Porcile G, Fioretto L, Orlandini C, Andreuccetti M, Masi G (2007) Phase III trial of infusional fluorouracil, leucovorin, oxaliplatin, and irinotecan (FOLFOXIRI) compared with infusional fluorouracil, leucovorin, and irinotecan (FOLFIRI) as first-line treatment for metastatic colorectal cancer: the Gruppo Oncologico Nord Ovest. J Clin Oncol 25 : 1670–1676.

Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A, Tarpey P, Varela I, Phillimore B, Begum S, Mcdonald NQ, Butler A, Jones D, Raine K, Latimer C, Santos CR, Nohadani M, Eklund AC, Spencer-Dene B, Clark G, Pickering L, Stamp G, Gore M, Szallasi Z, Downward J, Futreal PA, Swanton C (2012) Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 366 : 883–892.

Greaves M, Maley CC (2012) Clonal evolution in cancer. Nature 481 : 306–313.

Ince WL, Jubb AM, Holden SN, Holmgren EB, Tobin P, Sridhar M, Hurwitz HI, Kabbinavar F, Novotny WF, Hillan KJ, Koeppen H (2005) Association of k-ras, b-raf, and p53 status with the treatment effect of bevacizumab. J Natl Cancer Inst 97 : 981–989.

Kaplan R, Maughan T, Crook A, Fisher D, Wilson R, Brown L, Parmar M (2013) Evaluating many treatments and biomarkers in oncology: a new design. J Clin Oncol 31 : 4562–4568.


Karapetis CS, Khambata-Ford S, Jonker DJ, O'callaghan CJ, Tu D, Tebbutt NC, Simes RJ, Chalchal H, Shapiro JD, Robitaille S, Price TJ, Shepherd L, Au HJ, Langer C, Moore MJ, Zalcberg JR (2008) K-ras mutations and benefit from cetuximab in advanced colorectal cancer. N Engl J Med 359 : 1757–1765.

Leonard P, Seymour MT, James R, Hochhauser D, Ledermann JA (2002) Phase II study of irinotecan with bolus and high dose infusional 5-FU and folinic acid (modified de Gramont) for first or second line treatment of advanced or metastatic colorectal cancer. Br J Cancer 87 : 1216–1220.

Lievre A, Bachet JB, Le Corre D, Boige V, Landi B, Emile JF, Cote JF, Tomasic G, Penna C, Ducreux M, Rougier P, Penault-Llorca F, Laurent-Puig P (2006) KRAS mutation status is predictive of response to cetuximab therapy in colorectal cancer. Cancer Res 66 : 3992–3995.

Maughan TS, Adams RA, Smith CG, Meade AM, Seymour MT, Wilson RH, Idziaszczyk S, Harris R, Fisher D, Kenny SL, Kay E, Mitchell JK, Madi A, Jasani B, James MD, Bridgewater J, Kennedy MJ, Claes B, Lambrechts D, Kaplan R, Cheadle JP (2011) Addition of cetuximab to oxaliplatin-based first-line combination chemotherapy for treatment of advanced colorectal cancer: results of the randomised phase 3 MRC COIN trial. Lancet 377 : 2103–2114.

Maughan TS, Meade AM (2010) FOCUS 3 (CR12) protocol: a study to determine the feasibility of molecular selection of therapy using KRAS, BRAF and topo-1 in patients with metastatic or locally advanced colorectal cancer. Available at: http://www.ctu.mrc.ac.uk/plugins/StudyDisplay/protocols/FOCUS%203%20Protocol%20and%20appendices%20v4%200%20Nov%202010.pdf .

Sequist LV, Waltman BA, Dias-Santagata D, Digumarthy S, Turke AB, Fidias P, Bergethon K, Shaw AT, Gettinger S, Cosper AK, Akhavanfard S, Heist RS, Temel J, Christensen JG, Wain JC, Lynch TJ, Vernovsky K, Mark EJ, Lanuti M, Iafrate AJ, Mino-Kenudson M, Engelman JA (2011) Genotypic and histological evolution of lung cancers acquiring resistance to EGFR inhibitors. Sci Transl Med 3 : 75ra26.

Seymour MT, Brown SR, Middleton G, Maughan T, Richman S, Gwyther S, Lowe C, Seligmann JF, Wadsley J, Maisey N, Chau I, Hill M, Dawson L, Falk S, O'callaghan A, Benstead K, Chambers P, Oliver A, Marshall H, Napp V, Quirke P (2013) Panitumumab and irinotecan versus irinotecan alone for patients with KRAS wild-type, fluorouracil-resistant advanced colorectal cancer (PICCOLO): a prospectively stratified randomised trial. Lancet Oncol 14 : 749–759.

Seymour MT, Maughan TS, Ledermann JA, Topham C, James R, Gwyther SJ, Smith DB, Shepherd S, Maraveyas A, Ferry DR, Meade AM, Thompson L, Griffiths GO, Parmar MK, Stephens RJ (2007) Different strategies of sequential and combination chemotherapy for patients with poor prognosis advanced colorectal cancer (MRC FOCUS): a randomised controlled trial. Lancet 370 : 143–152.

Smith JA, Osborn M (2003) Interpretative phenomenological analysis. In: Qualitative Psychology . A Practical Guide to Research Methods Smith JA, (ed), pp 51–80. Sage: London.

The Cancer Genome Atlas Network (2012) Comprehensive molecular characterization of human colon and rectal cancer. Nature 487 : 330–337.

Van Cutsem E, Köhne CH, Láng I, Folprecht G, Nowacki M, Cascinu S, Shchepotin I, Maurel J, Cunningham D, Tejpar S, Schlichting M, Zubel A, Celik I, Rougier P, Ciardiello F (2011) Cetuximab plus irinotecan, fluorouracil, and leucovorin as first-line treatment for metastatic colorectal cancer: updated analysis of overall survival according to tumor KRAS and BRAF mutation status. J Clin Oncol 29 : 2011–2019.

Van Cutsem E, Kohne CH, Hitre E, Zaluski J, Chang Chien CR, Makhson A, D'haens G, Pinter T, Lim R, Bodoky G, Roh JK, Folprecht G, Ruff P, Stroh C, Tejpar S, Schlichting M, Nippgen J, Rougier P (2009) Cetuximab and chemotherapy as initial treatment for metastatic colorectal cancer. N Engl J Med 360 : 1408–1417.


Acknowledgements

We are indebted to the 332 patients and their families who participated in FOCUS3.

The design of the Medical Research Council (MRC) FOCUS3 trial was conceived and developed by the National Cancer Research Institute (NCRI) advanced colorectal cancer group. The trial was funded by the MRC. Additional support was provided by Merck KGaA (free cetuximab), Pfizer and Roche (educational research grants for the MRC colorectal research portfolio). The topo-1 antibody was provided free from Leica. Laboratory work in Leeds was also supported by funding from Yorkshire Cancer Research and the Leeds Experimental Cancer Medicines Centre. All tumour samples from patients who consented for future CRC research are stored at the Wales Cancer Bank.

The MRC was the overall sponsor of the study. FOCUS3 was approved by the Medicines and Healthcare Regulatory Agency (MHRA) on 12 June 2009 and Research Ethics Committee for Wales on 26 May 2009. The trial was coordinated by the MRC Clinical Trials Unit (CTU) following the principles of GCP, conducted with a Trial Management Group (TMG), monitored by a Data Monitoring Committee (DMC) and overseen by an independent Trial Steering Committee. Data collection at UK sites was supported by staff funding from the National Cancer Research Networks. All statistical analyses were performed at the MRC CTU. The trial is registered as an International Standard Randomised Controlled trial, number ISRCTN83171665.

Trial Management Group : TS Maughan (chair), R Adams, RH Wilson, MT Seymour, B Jasani, R Butler, S Richman, P Quirke, AM Nelson, GT Williams, G Taylor, H Grabsch, I Frayling, J Sampson, E Hodgkinson, P Rogers, M Pope and MRC CTU staff.

MRC Clinical Trials Unit: AM Meade, R Kaplan, D Fisher, SL Kenny, JK Mitchell, LL Nichols, L Harper, K Letchemanan, M Parmar.

Data Monitoring Committee: AM Meade, R Kaplan, D Fisher, TS Maughan, MT Seymour.

Trial Steering Committee: C Parker (current chair), R Rudd, J Whelan.

Sponsor: Medical Research Council.

Clinical Investigators (Institution—(number of patients contributed)): Bridgewater J, King J, Aggarwal A, Harinarayanan S, Melcher L, Karp Stephen (North Middlesex Hospital (32)), Furniss D, Wadsley J, Walkington L, Simmons T, Hornbuckle J, Pledge S, Clenton S (Weston Park Hospital (30)), Roy R, Dhadda A (Castle Hill Hospital (26)), Adams R, Maughan T, Jones R, Brewster A, Iqbal N, Arif, Crosby T (Velindre Hospital (23)), Falk S, Garadi K, Hopkins K (Bristol Haematology and Oncology Centre (18)), Seymour M, Swinson D, Anthoney A, (St James’ University Hospital, Leeds (18)), Leonard P, Mohamed M, (Whittington Hospital (14)), Benstead K, Farrugia D, Shepherd S (Cheltenham General Hospital (11)), Blesing C, Hyde K, Grant W (Great Western Hospital (10)), Lowdell C, Cleator S, Riddle P, Kenny L, Ahmad R (Charing Cross Hospital (9)), Hill M, Bhattacharjee P, Sevitt T, Summers J, Shah R (Maidstone Hospital (9)), Whillis D, Nicholls A, Ireland H, Macgregor C (Raigmore Hospital (8)), Sizer B, Basu D (Essex County Hospital (7)), Dent J, Hofmann U (Huddersfield Royal Infirmary (6)), Roy R, Butt M, Iqbal M (Diana, Princess of Wales Hospital (6)), Dent J (Calderdale Royal Hospital (6)), Hickish T, Osborne R (Poole Hospital (3)), Hickish T, Astras G, Purandare L (Royal Bournemouth Hospital (2)), Tahir S, Srinivasan G (Broomfield Hospital (2)), Gollins S, Kodavatiganti R (Wrexham Maelor Hospital (2)), Bale C, Mullard A, Fuller C, Williams R, Stuart N (Ysbyty Gwynedd (1)), Gollins S, Neupane R (Glan Clwyd Hospital (1)), Bessell E, Potter V (Nottingham University Hospital (0)), Tsang D (Southend University Hospital (0)).

In addition to the above-named individuals, we acknowledge the contributions of a large number of clinicians, research nurses, data managers and other clinical and support staff at the participating centres.

Author information

M J Pope and J K Pope: Malcolm and Janet Pope are Consumer Representatives; they also represent Velindre Hospital, Patient Liaison Group, Cardiff CF14 2TL, UK

Authors and Affiliations

CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford OX3 7DQ, UK

T S Maughan

MRC Clinical Trials Unit at UCL, Institute of Clinical Trials and Methodology, London, WC2B 6NH, UK

A M Meade, D Fisher, L L Nichols, S L Kenny, M J Pope, J K Pope, M Parmar & R Kaplan

Cardiff University and Velindre Cancer Centre, Cardiff, UK

Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, LS9 7TF, UK

S D Richman, G R Taylor & P Quirke

University Hospital of Wales, Cardiff, CF14 4XW, UK

Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast, BT9 7AE, UK

Institute of Cancer and Genetics, Cardiff University, Cardiff, CF14 4XN, UK

B Jasani, G T Williams & J R Sampson

St James’s Institute of Oncology, University of Leeds, Leeds, LS9 7TF, UK

M T Seymour

Wales Cancer Trials Unit, Cardiff University, Cardiff, CF14 4YS, UK

A Nelson & C M Sampson

Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, S5 7AU, UK

E Hodgkinson & D L Furniss

UCL Cancer Institute, London, WC1E 6BT, UK

J A Bridgewater

Department of Oncology, Castle Hill Hospital, East Riding of Yorkshire, HU16 5JQ, UK


Corresponding author

Correspondence to A M Meade .


Supplementary Information accompanies this paper on British Journal of Cancer website


Cite this article: Maughan, T., Meade, A., Adams, R. et al. A feasibility study testing four hypotheses with phase II outcomes in advanced colorectal cancer (MRC FOCUS3): a model for randomised controlled trials in the era of personalised medicine? Br J Cancer 110, 2178–2186 (2014). https://doi.org/10.1038/bjc.2014.182


  • colorectal cancer
  • multi-arm trials
  • personalised medicine


Pilot Study in Research: Definition & Examples

Julia Simkus, Saul McLeod, PhD, and Olivia Guy-Evans, MSc (Simply Psychology)

A pilot study, also known as a feasibility study, is a small-scale preliminary study conducted before the main research to check the feasibility or improve the research design.

Pilot studies can be very important before conducting a full-scale research project, helping design the research methods and protocol.

How Does it Work?

Pilot studies are a fundamental stage of the research process. They can help identify design issues and evaluate a study’s feasibility, practicality, resources, time, and cost before the main research is conducted.

It involves selecting a few people and trying out the study on them. It is possible to save time and, in some cases, money by identifying any flaws in the procedures designed by the researcher.

A pilot study can help the researcher spot ambiguities or confusion in the information given to participants, as well as problems with the task devised.

Sometimes the task is too hard, and the researcher may see a floor effect: none of the participants can score well or complete the task, so all performances are low.

The opposite effect is a ceiling effect, when the task is so easy that all achieve virtually full marks or top performances and are “hitting the ceiling.”
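One way to screen pilot data for floor and ceiling effects is to count how many scores pile up at the scale's extremes. The sketch below uses a roughly 15% cut-off, which is a common rule of thumb rather than a fixed standard; the scores are invented.

```python
def floor_ceiling_check(scores, min_score, max_score, threshold=0.15):
    """Flag a floor/ceiling effect when too many scores sit at an extreme."""
    n = len(scores)
    share_at_floor = sum(s == min_score for s in scores) / n
    share_at_ceiling = sum(s == max_score for s in scores) / n
    return {"floor_effect": share_at_floor > threshold,
            "ceiling_effect": share_at_ceiling > threshold}

# Most pilot participants scored the maximum of 10: the task looks too easy.
print(floor_ceiling_check([10, 10, 9, 10, 8, 10], min_score=0, max_score=10))
```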

Piloting also enables researchers to estimate an appropriate sample size, budget accordingly, and improve the study design before performing a full-scale project.

Pilot studies also provide researchers with preliminary data to gain insight into the potential results of their proposed experiment.

However, pilot studies should not be used to test hypotheses since the appropriate power and sample size are not calculated. Rather, pilot studies should be used to assess the feasibility of participant recruitment or study design.
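The power arithmetic makes this concrete. A quick sketch with statsmodels (framing the comparison as a two-sample t-test; the effect size and group sizes are illustrative assumptions) shows that even a medium effect needs far more participants than a pilot provides:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Participants per group needed to detect a medium effect (d = 0.5) at 80% power
print(round(analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)))  # ~64

# Power of a typical 10-per-group pilot for the same effect: very low
print(round(analysis.power(effect_size=0.5, nobs1=10, alpha=0.05), 2))  # ~0.18
```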

By conducting a pilot study, researchers will be better prepared to face the challenges that might arise in the larger study. They will be more confident with the instruments they will use for data collection.

Multiple pilot studies may be needed in some studies, and qualitative and/or quantitative methods may be used.

To avoid bias, pilot studies are usually carried out on individuals who are as similar as possible to the target population but not on those who will be a part of the final sample.

Feedback from participants in the pilot study can be used to improve the experience for participants in the main study. This might include reducing the burden on participants, improving instructions, or identifying potential ethical issues.

Experiment Pilot Study

In a pilot study with an experimental design , you would want to ensure that your measures of the variables of interest are reliable and valid.

You would also want to check that you can effectively manipulate your independent variables and that you can control for potential confounding variables.

A pilot study allows the research team to gain experience and training, which can be particularly beneficial if new experimental techniques or procedures are used.

Questionnaire Pilot Study

It is important to conduct a questionnaire pilot study for the following reasons:
  • Check that respondents understand the terminology used in the questionnaire.
  • Check that emotive questions are not used, as they make people defensive and could invalidate their answers.
  • Check that leading questions have not been used as they could bias the respondent’s answer.
  • Ensure that the questionnaire can be completed in a reasonable amount of time. If it’s too long, respondents may lose interest or not have enough time to complete it, which could affect the response rate and the data quality.

By identifying and addressing issues in the pilot study, researchers can reduce errors and risks in the main study. This increases the reliability and validity of the main study’s results.

Pilot studies can serve many purposes, including:
  • Assessing the practicality and feasibility of the main study
  • Testing the efficacy of research instruments
  • Identifying and addressing any weaknesses or logistical problems
  • Collecting preliminary data
  • Estimating the time and costs required for the project
  • Determining what resources are needed for the study
  • Identifying the necessity to modify procedures that do not elicit useful data
  • Adding credibility and dependability to the study
  • Pretesting the interview format
  • Enabling researchers to develop consistent practices and familiarize themselves with the procedures in the protocol
  • Addressing safety issues and management problems

Limitations

Pilot studies also have limitations:
  • They require extra costs, time, and resources.
  • They do not guarantee the success of the main study.
  • Contamination can occur (i.e., if data from the pilot study or pilot participants are included in the main study results).
  • Funding bodies may be reluctant to fund a further study if the pilot study results are published.
  • They do not have the power to assess treatment effects due to small sample size.

Examples

  • Viscocanalostomy: A Pilot Study (Carassa, Bettin, Fiori, & Brancato, 1998)
  • WHO International Pilot Study of Schizophrenia (Sartorius, Shapiro, Kimura, & Barrett, 1972)
  • Stephen LaBerge of Stanford University ran a series of experiments in the 1980s investigating lucid dreaming. In 1985, he performed a pilot study demonstrating that time perception in lucid dreams is the same as during wakefulness: participants entered a state of lucid dreaming and counted out ten seconds, signaling the start and end with pre-determined eye movements measured with the electrooculogram (EOG).
  • Negative Word-of-Mouth by Dissatisfied Consumers: A Pilot Study (Richins, 1983)
  • A pilot study and randomized controlled trial of the mindful self‐compassion program (Neff & Germer, 2013)
  • Pilot study of secondary prevention of posttraumatic stress disorder with propranolol (Pitman et al., 2002)
  • In unstructured observations, the researcher records all relevant behavior without a predetermined system. There may be too much to record, and the behaviors recorded may not necessarily be the most important, so this approach is usually used as a pilot study to see what types of behavior would be recorded.
  • Perspectives of the use of smartphones in travel behavior studies: Findings from a literature review and a pilot study (Gadziński, 2018)

Further Information

  • Lancaster, G. A., Dodd, S., & Williamson, P. R. (2004). Design and analysis of pilot studies: recommendations for good practice. Journal of evaluation in clinical practice, 10 (2), 307-312.
  • Thabane, L., Ma, J., Chu, R., Cheng, J., Ismaila, A., Rios, L. P., … & Goldsmith, C. H. (2010). A tutorial on pilot studies: the what, why and how. BMC Medical Research Methodology, 10 (1), 1-10.
  • Moore, C. G., Carter, R. E., Nietert, P. J., & Stewart, P. W. (2011). Recommendations for planning pilot studies in clinical and translational research. Clinical and translational science, 4 (5), 332-337.

Carassa, R. G., Bettin, P., Fiori, M., & Brancato, R. (1998). Viscocanalostomy: a pilot study. European journal of ophthalmology, 8 (2), 57-61.

Gadziński, J. (2018). Perspectives of the use of smartphones in travel behaviour studies: Findings from a literature review and a pilot study. Transportation Research Part C: Emerging Technologies, 88 , 74-86.

In J. (2017). Introduction of a pilot study. Korean Journal of Anesthesiology, 70 (6), 601–605. https://doi.org/10.4097/kjae.2017.70.6.601

LaBerge, S., LaMarca, K., & Baird, B. (2018). Pre-sleep treatment with galantamine stimulates lucid dreaming: A double-blind, placebo-controlled, crossover study. PLoS One, 13 (8), e0201246.

Leon, A. C., Davis, L. L., & Kraemer, H. C. (2011). The role and interpretation of pilot studies in clinical research. Journal of psychiatric research, 45 (5), 626–629. https://doi.org/10.1016/j.jpsychires.2010.10.008

Malmqvist, J., Hellberg, K., Möllås, G., Rose, R., & Shevlin, M. (2019). Conducting the Pilot Study: A Neglected Part of the Research Process? Methodological Findings Supporting the Importance of Piloting in Qualitative Research Studies. International Journal of Qualitative Methods. https://doi.org/10.1177/1609406919878341

Neff, K. D., & Germer, C. K. (2013). A pilot study and randomized controlled trial of the mindful self‐compassion program. Journal of Clinical Psychology, 69 (1), 28-44.

Pitman, R. K., Sanders, K. M., Zusman, R. M., Healy, A. R., Cheema, F., Lasko, N. B., … & Orr, S. P. (2002). Pilot study of secondary prevention of posttraumatic stress disorder with propranolol. Biological psychiatry, 51 (2), 189-192.

Richins, M. L. (1983). Negative word-of-mouth by dissatisfied consumers: A pilot study. Journal of Marketing, 47 (1), 68-78.

Sartorius, N., Shapiro, R., Kimura, M., & Barrett, K. (1972). WHO International Pilot Study of Schizophrenia1. Psychological medicine, 2 (4), 422-425.

van Teijlingen, E. R., & Hundley, V. (2001). The importance of pilot studies. Social Research Update, (35).



The role and interpretation of pilot studies in clinical research (Leon, Davis, & Kraemer, 2011)

Affiliation.

  • 1 Weill Cornell Medical College, Department of Psychiatry, Box 140, 525 East 68th Street, New York, NY 10065, USA. [email protected]
  • PMID: 21035130
  • PMCID: PMC3081994
  • DOI: 10.1016/j.jpsychires.2010.10.008

Pilot studies represent a fundamental phase of the research process. The purpose of conducting a pilot study is to examine the feasibility of an approach that is intended to be used in a larger scale study. The roles and limitations of pilot studies are described here using a clinical trial as an example. A pilot study can be used to evaluate the feasibility of recruitment, randomization, retention, assessment procedures, new methods, and implementation of the novel intervention. A pilot study is not a hypothesis testing study. Safety, efficacy and effectiveness are not evaluated in a pilot. Contrary to tradition, a pilot study does not provide a meaningful effect size estimate for planning subsequent studies due to the imprecision inherent in data from small samples. Feasibility results do not necessarily generalize beyond the inclusion and exclusion criteria of the pilot design. A pilot study is a requisite initial step in exploring a novel intervention or an innovative application of an intervention. Pilot results can inform feasibility and identify modifications needed in the design of a larger, ensuing hypothesis testing study. Investigators should be forthright in stating these objectives of a pilot study. Grant reviewers and other stakeholders should expect no more.
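A brief simulation illustrates the imprecision the abstract describes: with 15 participants per arm, the estimated standardised effect size scatters so widely around its true value that powering a subsequent trial on it would be unreliable. (A sketch using NumPy; all numbers are invented for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
true_d, n_per_arm = 0.3, 15
estimates = []
for _ in range(2000):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(true_d, 1.0, n_per_arm)
    pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
    estimates.append((treated.mean() - control.mean()) / pooled_sd)

# The middle 95% of pilot-based estimates spans roughly -0.4 to 1.0
print(np.percentile(estimates, [2.5, 97.5]))
```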


  • Member Benefits
  • Communities
  • Grants and Scholarships
  • Student Nurse Resources
  • Member Directory
  • Course Login
  • Professional Development
  • Organizations Hub
  • ONS Course Catalog
  • ONS Book Catalog
  • ONS Oncology Nurse Orientation Program™
  • Account Settings
  • Help Center
  • Print Membership Card
  • Print NCPD Certificate
  • Verify Cardholder or Certificate Status

ONS Logo

  • Trouble finding what you need?
  • Check our search tips.

hypothesis feasibility study

Oncology Nursing Forum, Number 5, September 2018

Feasibility Studies: What They Are, How They Are Done, and What We Can Learn From Them

Anne M. Kolenic

Nursing clinical research is a growing field, and as more nurses become engaged in conducting clinical research, a feasibility study may be the first type of study they encounter. Understanding what feasibility studies are, how to conduct them, and the importance of properly reporting their outcomes is vital to the continued advancement of nursing science.


Implementation of health-promoting retail initiatives in the Healthier Choices in Supermarkets Study—qualitative perspectives from a feasibility study

Katrine Sidenius Duus, Tine Tjørnhøj-Thomsen & Rikke Fredenslund Krølner

BMC Medicine, volume 22, article number 349 (2024). Open access, published 2 September 2024.

Improving food environments like supermarkets has the potential to affect customers’ health positively. Scholars suggest researchers and retailers collaborate closely on implementing and testing such health-promoting interventions, but knowledge of the implementation of such interventions is limited. We explore the implementation of four health-promoting food retail initiatives selected and developed by a partnership between a research institution, a large retail group, and a non-governmental organisation.

The four initiatives included downsizing of bags for pick’n’ mix sweets and soda bottles at the check-out registers, shelf tags promoting healthier breakfast cereal options, and replacing a complimentary bun with a banana offered to children. The initiatives were implemented for 6 weeks (or longer if the store manager allowed it) in one store in Copenhagen, Denmark. Data were collected through observations, informal interviews with customers, and semi-structured interviews with retailers. We conducted a thematic analysis of transcripts and field notes inspired by process evaluation concepts and included quantitative summaries of selected data.

Two out of four initiatives were not implemented as intended. The implementation was delayed due to delivery issues, which also resulted in soda bottles not being downsized as intended. The maintenance of the shelf tags decreased over time. Retailers expressed different levels of acceptability towards the initiatives, with a preference for the complimentary banana for children. This was also the only initiative noticed by customers with both positive and negative responses. Barriers and facilitators of implementation fell into three themes: Health is not the number one priority, general capacity of retailers, and influence of customers and other stakeholders on store operation.

Conclusions

The retailers’ interests, priorities, and general capacity influenced the implementation of the initiatives. Retailers’ acceptability of the initiatives was mixed despite their involvement in the pre-intervention phase. Our study also suggests that customer responses towards health-promoting initiatives, as well as cooperation with suppliers and manufacturers in the development phase, may be decisive for successful implementation. Future studies should explore strategies to facilitate implementation that can be applied prior to and during the intervention.


What we eat affects our health and well-being [ 1 ]. Diet is associated with obesity, cancers [ 2 ], and mental well-being [ 3 ], and a healthy diet has been associated with lower all-cause mortality [ 4 ]. One important factor in improving diet is creating a food environment that supports healthy eating [ 5 , 6 ]. In modern societies, such as Denmark, supermarkets are the main source of food [ 7 ]. Supermarkets therefore have a significant influence on what food we buy and potentially also what we eat [ 7 , 8 , 9 ]. Studies report associations between the concentration of supermarkets in a neighbourhood and overweight and obesity [ 10 ], and between the healthfulness of supermarkets and people’s diets [ 11 , 12 ]. Moreover, unhealthy food and beverage products are promoted more often than healthy ones in, for example, supermarkets [ 9 , 13 , 14 ]. This indicates a need to explore whether and how health-promoting initiatives can be implemented in supermarkets, and whether customers respond to such initiatives as intended.

Studies show that health-promoting interventions in supermarkets can encourage customers to purchase healthier products [ 7 , 9 , 15 , 16 , 17 ]. Reviews and a meta-analysis have concluded that the most effective type of initiative in supermarket settings is price changes—the evidence points to the positive effect of reduced prices in increasing purchases of healthier products, especially fruit and vegetables [ 7 , 17 ]. Even though price reductions seem to be effective, they appear more challenging to implement due to retailers’ drive for profit and low willingness to finance such price cuts [ 7 , 18 ]. There is some evidence that nudges in terms of product information and positioning, as well as altering the number of available products, can affect which products are purchased [ 15 , 16 ]. However, the quality of this evidence is low. Overall, most studies exploring the effect of interventions in supermarkets have been conducted in the USA and other high-income countries [ 15 , 16 ], in controlled settings, or have applied a weak study design, such as non-randomised studies [ 16 , 17 ]. To our knowledge, only a few studies have been conducted in Denmark [ 19 , 20 , 21 , 22 , 23 , 24 , 25 ]. These studies represent different designs and types of interventions: reformulation of private-label products to reduce calorie content [ 24 ], informational claims to promote low-salt foods [ 23 ], nudges via signs to promote sales of fruit and vegetables [ 22 ], positioning (shelf-space management) of dairy products [ 20 ], replacement of sugar confectionery with fruit and healthy snacks at the checkout [ 19 ], discounts on fruit and vegetables combined with space management [ 25 ], and structural changes in supermarkets combined with education of supermarket employees as part of a multicomponent intervention [ 21 ] (the three latter studies report from the same project). All but one study [ 23 ] found an effect of the applied intervention strategies, although mostly small or modest. This calls for more studies in real-life settings and investigations of why some interventions have the desired effect while others do not. Lack of effect may be explained by 1) customers not noticing the initiatives or not finding them relevant [ 19 , 23 ], 2) customers buying other products instead of, or in addition to, the promoted intervention products [ 20 , 24 ], 3) the shelf-organising effect [ 20 ], or 4) theory failure with regard to customer behaviour [ 22 ].

Several studies have explored facilitators and barriers to the implementation of health-promoting interventions in supermarkets. Reviews show that implementation is supported if the retailer is receptive to innovation, feels responsible for community health, and receives financial support or subsidies [ 26 ]. Furthermore, implementation is supported if the intervention provides the retailers with knowledge of health promotion and business skills [ 26 , 27 ]. Other facilitators include compatibility with context and customers’ needs, positive customer responses to the initiative, the prospect of improved public image, establishment of partnerships, low retailer effort requirements, and increased profit or sales [ 26 , 27 ]. Health-promoting interventions in supermarkets are hindered by high customer demand for unhealthy products and lower demand for healthy products, constraints of store infrastructure, challenges in product supply, high staff turnover, and lack of time [ 26 , 27 ]. Other barriers are doubt regarding changing customers’ behaviour, poor communication between collaborators [ 26 ], high running costs, and risk of spoilage [ 26 , 27 ].

Middle et al. [ 26 ] conclude that the underlying mechanism behind barriers and facilitators of implementation is the (mis)alignment of retailers’ and intervention researchers’ interests. The authors therefore suggest close collaboration between intervention researchers and retailers to work towards an alignment of interests and to resolve or avoid misalignment, a suggestion supported by Gupta et al. [ 27 ]. However, more knowledge of how such collaborative efforts affect the implementation of healthy food retail interventions is needed.

The aim of this study is to explore the implementation, acceptability, and feasibility of four different health-promoting food retail initiatives to increase customers’ purchase of healthy food and beverages, which were selected and developed together with food retailers: 1) Promotion of healthier breakfast cereals and products using shelf tags, 2) downsizing of sodas sold at the checkout desks, 3) downsizing of bags for the pick’n’ mix sweets, 4) replacement of a complimentary bun for children with a banana. The study has three research objectives:

1. To document the implementation and sustainment of the initiatives over time

2. To explore the retailers’ and customers’ responses to and acceptability of the initiatives

3. To investigate barriers and facilitators of implementation and sustainment of the initiatives.

Setting and the initiatives

This study was conducted in Denmark during 2020 and 2021, 2 years marked by two major societal events: first the coronavirus disease pandemic and later the start of the Russia-Ukraine war. Both events heavily influenced the circumstances of everyday life, including the opportunities for conducting research and running businesses. The specific influences on this study are described in the findings and discussion sections.

In this study, we collaborated with the retailer Salling Group, which holds 34.2% of the market share of grocery retailers in Denmark [ 28 ]. Salling Group is owned by the Salling Foundations and has no shareholders—all profits go to reinvestment in the business and donations to sports (amateur and professional), charity, education, and research. Salling Group owns three national supermarket chains: føtex, Netto and Bilka, alongside other businesses. For the feasibility test, we collaborated with føtex, which owns over 100 stores all over Denmark, including 23 stores called føtex food. føtex (except føtex food) offers both groceries and many different non-food products (e.g. textiles, cosmetics, toys, electronics, and home accessories).

The initiatives were selected and developed over approximately 2 years by a partnership including a group of researchers at the National Institute of Public Health, University of Southern Denmark, consultants from the Danish Cancer Society, and employees at the Corporate Social Responsibility (CSR) department in Salling Group, the marketing department at føtex, and two store managers (hereafter referred to collectively as ‘the retailers’). The process involved in-person meetings, desk research (the use of existing material [ 29 ]), visits to the test store, and a prototype test of three suggested initiatives. The researchers initiated the collaboration and were responsible for designing the research study and for data collection and analyses. The retailers hosted the site of the feasibility test, contributed to the selection and development of initiatives, and co-managed the practical part of the study. The Danish Cancer Society was recruited by the research project to develop the initiatives. A detailed description of the collaboration and development process is reported elsewhere (Duus et al. unpublished).

The feasibility test ended up including four initiatives: 1) Promotion of healthier breakfast cereals and products using shelf tags, 2) downsizing of soda sold at the checkout desks, 3) downsizing of bags for the pick’n’ mix sweets, 4) replacement of a complimentary bun for children with a banana (suggested by the retailers). The initiatives were based on a compromise between the willingness of the retailers and the interest and ideas of the remaining partners rather than on what the literature suggests are the most effective strategies (Duus et al.  unpublished ). Detailed descriptions of the initiatives and the rationale behind them are found in Table 1 .

The prototype test showed that 1) it was important to have a sign informing customers about the initiative offering a free banana to children instead of the usual free bun, to create a better understanding of the changed offer; 2) the promotional shelf tags needed weekly maintenance, as some would fall off; and 3) it was difficult to sustain an initiative promoting ready-to-serve salads and ready-to-cook vegetables next to different fresh meats, as it met resistance among the staff due to being an additional task and led to more product waste (customers did not expect to find these products next to the meat and therefore might not notice them). The learnings from the prototype test led to modifications of the implementation plan and the discarding of the latter initiative. The prototype test also made us aware of how quickly the selection of food offered and the layout of the store changed over time, which the researcher therefore paid extra attention to during subsequent data collection. Moreover, the researcher made sure to update the list of products that should have a shelf tag a few weeks before the implementation, to include newly offered products.

The føtex marketing department developed a script to inform the staff at the test store about the feasibility test, explaining and showing each initiative and the aim of the study overall. This was sent to the store manager after being reviewed by the researchers. The store manager was responsible for informing all relevant staff about the implementation and maintenance of the initiatives. The føtex marketing department also made sure to inform the relevant suppliers. Employees at the test store and brand staff from a brewery (who stock the coolers at the check-out desks) implemented the initiatives in the store. The research group did not correct or maintain the initiatives in the store after they were launched; however, the researchers monitored it and reported back to the retailers, either at meetings or by email.

Overall study design

The four initiatives were implemented in the test store for 6 weeks (or longer if the store manager allowed it), starting in September 2021. A føtex store in central Copenhagen (the capital city of Denmark) was chosen as the test store. The store was chosen partly for pragmatic reasons, as the research institute is based in Copenhagen, and partly by Salling Group’s decision, as the store already featured the new layout to which all stores were being converted (it was also the store where the prototype test was conducted).

We designed a qualitative study involving participant observations and interviews to evaluate the feasibility of the initiatives. The methods were designed to explore the partnership and collaboration (the aim of another publication; Duus et al. unpublished), as well as the implementation of the initiatives [ 30 ]. In the design of this study, we were inspired by McGill et al.'s (2020) two-phase framework for qualitative process evaluation from a complex systems perspective. This framework suggests an evaluation that looks at changes over time, starting with phase 1, a static system description and hypothesis generation about how the system might change when the intervention is introduced, followed by phase 2, an adaptive evaluation approach to the system undergoing change, which follows emerging findings [ 31 ].

Data collection

In-store observations

During October and November 2020, we mapped the store layout and customer flow in the test store as part of the static system description. Over 3 weeks, three research assistants performed 12 participant observations totalling 1005 min. The observations followed an observation guide covering 1) the physical setting (e.g. the layout, placement of products, signs, and pictures); 2) the people (e.g. who are the customers? Are people shopping alone or together with others? How do they move around the store? What are the staff doing?); and 3) short interviews with customers (if possible) about their shopping at the particular store and their thoughts about its layout. The research team’s access to the store was approved by the store manager, and research assistants wore a key chain with a sign showing their name and affiliation during the observations. During this data collection period, face masks became mandatory in supermarkets due to the coronavirus disease pandemic. As the implementation was delayed until approximately 1 year after this static description was completed, one participant observation was performed in the test store at the end of August 2021, just before the initiative implementation, to document any major changes in the store layout and selection. Key lessons from these observations about the test supermarket and customers’ behaviour in the store included knowledge on 1) the route around the store, 2) the different amounts of time spent at the store, 3) interactions with objects (e.g. products and phones), 4) interactions with children, 5) the behaviour of the staff, and 6) sensory impressions (Additional file 1). These lessons informed our subsequent data generation and assisted in contextualising our analysis.

The first author monitored the implementation process through participant observations of status meetings (n = 2) and correspondence via email and phone with the store manager and the contact person at føtex. In-store participant observations were conducted during and after the feasibility test period, September 2021–May 2022 (n = 25, ~1795 min in total; see Additional file 2). These observations focused on documenting the presence of the initiatives as well as customers’ and staff’s responses to them. Access to the store was once again approved by the store manager, and the researcher wore a key chain. During the in-store participant observations, we conducted informal interviews with customers (see Additional file 2 for examples of questions), each lasting a maximum of 5 min. The first author would approach people and ask if they were interested in answering a brief question. She introduced herself by her first name and workplace and explained that she was doing a research project about shopping patterns. The participant observations were documented by taking notes and photos. Handwritten notes were digitised and written up at the first opportunity after leaving the store.

Qualitative interviews

Between November 2021 and February 2023, the first author conducted four semi-structured interviews with retailers ( n  = 3) who had been involved in the study (Table 2 ) to explore their views on the initiatives and the implementation process. Interview guides were used in all interviews alongside different prompts (e.g. timelines and documents). Interview guides were tailored to each participant’s specific role and involvement in the development and implementation of the initiatives. Besides questions related to the initiatives and the implementation effort, the guides included questions about the informants’ background and motivation for the project (personally and professionally), their view on their role and scope for action (individually and organisationally) and their perception of the collaboration with the other organisations. After the participants’ consent was given verbally right before the interview, the interviews were recorded and later transcribed verbatim.

To explore the level of implementation (research objective I), all field notes and photos taken during and after the feasibility test were reviewed to assess whether the initiatives were present and to what degree (e.g. x out of x possible tags).

To explore the perception of the initiatives among employees and customers (research objective II) and to identify barriers and facilitators for implementing the initiatives (research objective III), we followed a thematic analysis inspired by Braun and Clarke [ 32 ]. Firstly, field notes and interview transcripts were read thoroughly and openly coded by writing keywords in the margin of the material, with a focus on the two research objectives. After the initial coding, the codes were summarised into broader themes by writing them into a document with short descriptions, and were revised against data excerpts and the full empirical material. The themes drew on the process evaluation concepts of acceptability, responsiveness [ 30 ], motivation, general capacity to implement [ 33 ], and commercial viability [ 34 ]. Lastly, the themes were named, and the final analysis was written up.

We have structured the presentation of study findings as follows: Firstly, we present the implementation of the initiatives overall. Secondly, we present the implementation of each initiative, customers’ responses to them, and the retailers’ perspectives. Lastly, we present the overall facilitators and barriers to the implementation of the initiatives.

Implementation of the initiatives

The implementation of the initiatives faced several challenges. Firstly, we found that not all preparations for the implementation were finished in time for the scheduled day. On the scheduled day, the retailer decided to push back the implementation by 1 week. The main reason was that there had been some misunderstandings around the ordering of the smaller sodas. The partners were informed that the smaller soda would be a 330 ml can instead of the 375 ml bottle, at a price of DKK 10.00 (~1.3 euros). The 500 ml bottle usually sold from the coolers cost DKK 16.00 (~2.2 euros). The Danish Cancer Society and the research group had two concerns about this: 1) the use of a can instead of a bottle would make the interpretation of the results very difficult, as the bottle and the can serve two different functions for the customer—the can would be consumed all at once, whereas the bottle with the screw lid could be saved for later after being opened; 2) the price was too low—the price per litre would be lower for the smaller sodas than it had been for those they replaced. No changes were made despite these concerns.
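
To make the pricing concern concrete, the arithmetic works out as follows. Here is a minimal sketch in Python, using only the sizes and prices reported above; the variable names are ours:

```python
# Price-per-litre comparison for the soda downsizing initiative.
# Sizes and prices are those reported above (DKK = Danish kroner).
can_price_dkk, can_volume_l = 10.00, 0.330        # proposed 330 ml can
bottle_price_dkk, bottle_volume_l = 16.00, 0.500  # usual 500 ml bottle

can_per_litre = can_price_dkk / can_volume_l            # ~30.3 DKK/L
bottle_per_litre = bottle_price_dkk / bottle_volume_l   # 32.0 DKK/L

print(f"330 ml can:    {can_per_litre:.1f} DKK per litre")
print(f"500 ml bottle: {bottle_per_litre:.1f} DKK per litre")
# The smaller can is cheaper per litre, which is the pricing signal
# the Danish Cancer Society and the research group objected to.
```

In other words, the downsizing at this price actually lowered the unit price, working against the intent of the initiative.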

Secondly, just days before the implementation, the retailers informed the other partners that they would stick with cans for the test of smaller-sized sodas and that the cans would now be 250 ml. They acknowledged that neither the size nor the packaging was optimal, but the optimal 375 ml bottle was simply not possible. Additionally, they informed the researchers that they could no longer find the new bags produced for the pick’n’mix sweets display.

These challenges led to a delay of the implementation of the initiatives by 1 week, but also a staggered implementation, where the initiatives were implemented when ready (the soda initiative 2 weeks later and the bags for pick’n’ mix sweets 8 weeks later). The retailers agreed to push back the end day correspondingly, upholding the 6 weeks of implementation. Table 3 shows an overview of the implementation of the four initiatives according to the day and week of the feasibility test period.

Smaller product sizes of sodas at the checkout desk

As seen from Table 3, we observed the implementation of a smaller product size of the targeted sodas in all coolers, except the one at the bakery, in the week leading up to the agreed date. We thereafter observed a full implementation of 250 ml cans during the first 2 weeks of implementation. During the third week and the beginning of the fourth week, we observed a mix of 250 and 330 ml cans, or only 330 ml cans. The store manager explained that this was probably due to non-delivery by the supplier. At the end of the fourth week and for the last 2 weeks, we observed a full implementation of 250 ml cans. As the targeted size of the initiative was a 375 ml bottle, the initiative was not implemented as intended. After the 6-week feasibility test period, we observed that the smaller 250 ml cans remained available in all coolers for at least eight more weeks. As expected, the presentation of the coolers fluctuated over the period. On stocking days (Monday, Wednesday, and Friday), the coolers would look neat and full, while on other days they would appear emptier or messier.

Customer responsiveness

We observed very few customers who bought any products from the coolers, and we did not get to talk to any customers about the initiative. However, the observations in the store showed no distinct change in customers’ behaviour around the coolers nor expressions of discontent or excitement with the initiative. In an interview with the store manager, he explained that he believed customers had not noticed the change.

Retailer perspectives

The store manager was positive about the initiative, but from his perspective, the decision to implement it should be made at the procurement level and by the suppliers. However, he did have opinions on how it should be implemented: the price needed to be fair relative to the product it replaced. Moreover, he drew attention to the fact that it was the supplier’s personnel who stocked the products rather than his own. The store manager was therefore not surprised that the employees at the store had little to say about the initiative. The føtex representative (B) was also positive about the initiative and expressed in the interview that the chain would be willing to implement it—if they found it to be the ‘right thing’ to do. However, the representative also emphasised the importance of agreeing with the suppliers, which is a time-consuming process and ‘not done in just six months’.

Shelf tags for breakfast cereal products

From the first day of the implementation, some tags were missing, and one tag was consistently misplaced (Table 3). During the first 3 weeks, 10% (n = 3) of the tags were missing. This proportion progressively increased, reaching 23% by the end of the fifth week. In the sixth week, the proportion first fell to 16% but then rose again, ending at 26%. In the weeks after the implementation period, the tags stayed present but slowly came off. Approximately 6 months later, three (10%) of the tags were still present. Throughout the feasibility test, we observed that the presentation of the area varied, which is to be expected in a busy supermarket. At times, the area looked messy; boxes would block access to some products, products would be sold out, some would change packaging, and new products would be introduced to the selection.
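
For readers following the numbers: the total number of tags is not stated above, but a denominator of about 31 reproduces every reported percentage (10% ≈ 3/31, 23% ≈ 7/31, and so on). A small sketch of the tally under that assumption; the per-week missing counts below are hypothetical values inferred from the percentages, not data reported by the study:

```python
# Missing-tag tally over the feasibility test.
# ASSUMPTION: 31 tags in total. This is not stated above, but it is
# a denominator that reproduces every reported percentage exactly.
TOTAL_TAGS = 31

# Hypothetical missing-tag counts per observation point, chosen to
# match the reported proportions (10%, 23%, 16%, 26%).
missing_counts = {
    "weeks 1-3": 3,      # 3/31  -> 10%
    "end of week 5": 7,  # 7/31  -> 23%
    "early week 6": 5,   # 5/31  -> 16%
    "end of week 6": 8,  # 8/31  -> 26%
}

for point, missing in missing_counts.items():
    share = 100 * missing / TOTAL_TAGS
    print(f"{point}: {missing}/{TOTAL_TAGS} tags missing ({share:.0f}%)")
```

The takeaway in the text is unaffected by the exact denominator: tag maintenance declined steadily once hanging the tags stopped being anyone’s routine task.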

When we asked customers about the tags, we learned that they had been unaware of them and that some believed that it was not something they would use—some did not know the meaning of the labels on the tags, while others did not find the labels relevant for them.

[The tags] don’t matter. My wife is pretty health conscious, so we don’t use those, let alone know with such a thing as breakfast cereal. (Male customer)

From our observations of customer behaviour in the breakfast products and cereals department, we find two interesting groups: those who shop alone and those who shop together with others (primarily children). These groups seem to practise different behaviours.

Among those who do their grocery shopping by themselves, we find two subgroups: 1) those who have planned or know exactly what they want to buy, and 2) those who decide at the store. For the first sub-group, we observed that some showed this by practising a behaviour where they would walk quickly and purposefully towards the shelves and quickly pick up a product. Others would look determined to find a specific product, as the fieldnote excerpt illustrates:

A woman stands looking at the muesli. She first grabs an orange bag on the bottom shelf, then a more yellow one next door and puts the first one back on the shelf. She inspects the bag she took. She starts to look around the shelves more and reaches for a bag that has a pinker look on the top shelf. She puts it back and reaches into the space next to it, where there are a few bags at the very back, but she has difficulty reaching them. A man comes by, notices the woman, and offers to help her. The woman indicates a yes, and the man reaches up and grabs a bag ‘that's the one!’ says the woman as the man hands her the bag.

Another example was a man who kept looking back and forth between some muesli and granola products and his phone before he eventually chose a product. It is unknown whether the man was looking at a specific note, a text request from his family, or a picture on his phone, yet what was on his phone seemed to determine the product he bought. Overall, this group seemed very unlikely to be influenced by the tags, as they had made their choice already before they entered the store.

For the second sub-group, those who seemed to make their decision in the store, we observed that some would just stop and glance at the products without choosing one before moving on with their shopping. Others would look more randomly at the selection than those described above, walk back and forth in the aisle, compare different products and read the info on the back of the products.

For those who shopped together with others (most often children), we observed that when adults shopped with children, the choices of the child and the choices of the adult often conflicted. In one example of a child and a woman who looked at breakfast cereal products, the child was initially allowed to pick a product and asked for different chocolate variants, which all featured cartoon figures; however, the woman rejected all of the child’s choices. In the interaction, the child was met with demands from the woman regarding the attributes of the products: they could not contain chocolate or sugar. In the end, it was the woman who chose a product based on her experience of the child’s preferences and her criteria. In similar situations, we did observe an attempt at compromising between the adult’s and the child’s criteria, which was explained by this woman:

I ask them [woman and boy aged about 10] what they look for when choosing breakfast cereals. The woman looks at the boy and says, ‘Well, what are we looking for?’. The boy does not answer but looks at her and me and smiles. The woman herself replies, ‘Something we can agree on. Something he likes but is not too unhealthy, either’. I ask her what she considers unhealthy. She waffles for a bit and then replies, ‘Yes, but he wants that Lions cereal, for example, and I don’t want him to have that. So something that’s not de facto sweets’. She takes the box of granola that they have chosen [Paulún's blueberry/lemon granola] out of the basket, looks at it and says, ‘So we chose this one. There's probably also a lot of fructose and caramelised stuff in it, but yeah.’

This illustrates the high impact children had on the choices of breakfast products, but also how the parents tried to control and negotiate the final choice.

Retailer perspective

The store manager had little faith in the effectiveness of the shelf tags:

The thing about tagging cereals, I don't think that makes the slightest difference. The reason why I’m sceptical in that regard is that it’s a mixture of what I do on a daily basis. It’s especially the behavioural patterns of our customers, but also how I act as a customer myself to a degree. I don't think shelf tags with the whole grain label or anything like that; in my experience it hasn’t changed things much. (Store manager)

His view on the effect of the initiative was in line with our observations of the customers in the store. Furthermore, the store manager explained that it was difficult to maintain the initiative, as it was not part of the employees’ daily routine. This also explained why the tags lingered after the test period—it was simply not part of the usual protocol either to hang them up or to take them down. This perspective was shared by the føtex representative (B), who also highlighted the cost of this maintenance.

Contrary to the store manager’s scepticism, the føtex representative (B) was more positive about the initiative:

I think it’s a good initiative. We work a lot with tags and labels in general. [...] I think making it transparent to the consumer is really interesting because there’s nothing wrong with buying a box of Nesquick cereal every once in a while. At least we should not claim it’s the wrong thing to do. But you just have to be clear about what you’re buying, and I think those labels help with that. (føtex representative (B))

She explained that the initiative was highly compatible with their usual strategies. However, she also explained in the interview that a barrier to using shelf tags to promote the buying of certain products was that the chain was trying to reduce the printed material they used in their stores as part of their CSR strategy and to reduce costs.

Replacement of the complimentary bun for children with a banana

The complimentary banana was fully implemented throughout the feasibility test period except for 1 day of observation, when the signs were not visible (Table 3). The initiative, together with its sign, also remained in place for at least 10 weeks after the implementation period. Furthermore, the store manager informed the researcher that they would continue to provide bananas to customers requesting them as an act of customer service. From the observations, we do find that the presentation of the initiative changed throughout the period. At first, the bananas were placed in a cardboard box on the display counter, which was later replaced with a nicer-looking basket. The number of bananas and their colour also fluctuated from day to day, which would be expected given the delivery of the bananas and how often they were restocked. However, unlike the buns, we never observed the bananas being unavailable, making it a reliable offer no matter the time of day.

We observed two ways in which the complimentary offer for children was brought up:

  • 1) A customer would ask for the ‘bun for children’. Here we saw two responses from the staff: (a) the customer would be offered the bun with no mention of the banana, or (b) the staff would inform the customer that they no longer offered buns but offered a banana instead. Customers had two primary responses to the latter message: (i) the customer rejected the offer and decided to buy a bun or another item instead (the child was often included in this decision), or (ii) the customer accepted the offer and received the banana. In some cases, the child did not accept the offer, and the customer compensated by buying a bun or another product for the child.

  • 2) The staff would spontaneously offer the complimentary banana to customers making a purchase. In this case, the customers almost always reacted positively and accepted the offer.

The following excerpt illustrates why some customers rejected the offer:

A woman with a child of about 1 year old in a stroller walks up to the bakery and asks for a children's bun. The child has already noticed the buns from the moment they arrive and sits pointing at the buns through the glass window and babbling. The shop assistant says that there are no children's buns but bananas and points to the sign. The woman replies, ‘I’d like to buy a bun, then’. The assistant takes the bun and enters it into the till, while the woman says, ‘Bananas are so messy’. The assistant smiles and says, ‘Well yeah, I'll pass that on’. The woman replies, ‘It's just that the banana is rather a bother’, and the assistant replies, ‘But I think we’ll be offering [the buns] again eventually’.

Thus, adults rejected the offer because eating a banana was a messier process than eating a bun. During meetings and interviews, the retailer also highlighted this as the main reason for rejections of the offer, especially among those with younger children. Another reason for rejection was that the parents neither appreciated the offer nor perceived a need to offer their children a banana instead of a bun.

This initiative was the most successful and interesting one in the eyes of the store manager.

I’d like to highlight the banana for kids, which is clearly the initiative I found most customers were pleased with. (Store manager)

Many customers responded positively to the new offer, which was emphasised as a marker of success. It was also the reason why the initiative continued after the 6-week period, and the store manager explained that they would continue to give bananas to those who asked for them.

The following excerpt illustrates what the bun meant to føtex and the chain’s relationship with its customers.

The children's bun has been around for donkey’s years, and it’s become ingrained in parents and kids alike that you can get them in føtex. So, we’re quite interested in learning how many people would actually, if presented with the alternative, choose something else, like, for example, the banana. I’m quite surprised by that – we can't track it, unfortunately – but off the top of my head, up to 40 to 50 percent actually choose the banana. I find that very interesting. (føtex representative (B))

Thus, it came as a surprise that the initiative was so well received. However, despite the positive experiences with the initiative, the retailers also commented on the cost. They highlighted that the banana was more expensive than the bun, and that if it were to be offered in all stores, it would have to be prioritised at the executive level as an additional expenditure. In that case, the banana would only be an alternative to the bun and not a replacement. This was rationalised by the retailers’ attitude of not making choices on behalf of the customers.

Smaller bags for pick’n’ mix sweets

This initiative was not implemented until 8 weeks after the initial implementation date. It was fully implemented for five of the six weeks; during the third week, we observed that the old, larger bags had been hung in front of the new smaller bags. Two weeks and again four and a half months after the feasibility test, the smaller bags could still be found behind the larger bags—however, it is unlikely that these would have been used, as the obvious choice would have been the bag at the very front. As with the other areas, the presentation and stocking of this area also fluctuated.

We did not get any direct reactions from customers about the smaller bag. However, our observations showed that customers who bought pick’n’mix sweets used different strategies to decide on the amount of sweets. Some showed signs of visually assessing the amount of sweets in the bag; these were the customers we would expect the initiative to influence. We often observed this strategy among adults with children, where the adult would visually assess the amount and tell the child when they had picked enough.

Those with very young children would walk alongside the child and select the sweets for them, and some adults would encourage the choice of the child by pointing out different variants and commenting on the appearance of the sweets.

Other strategies were to mix according to a pre-defined number of pieces or volume:

A boy of about 10 and a girl of about 8 come over and mix sweets. They repeatedly weigh the bag while doing so. A woman comes over, and the girl says, ‘Hello mummy!’ The woman says, ‘Don’t forget to weigh it’. She then grabs a bag herself and begins to mix sweets. The boy asks the girl, ‘Did you weigh it?’. The girl walks over to the scales and says, ‘I think I’ve got enough’. However, she does not close the bag, and she begins to walk around somewhat restlessly, then says, ‘I don’t know what to pick. I’m still [a few] grammes short’.

An interesting aspect of the situation above is that the girl expressed that she was satisfied with what she had chosen, but she felt that she had to meet the prespecified weight and, therefore, tried to find more sweets to put in her bag. Such strategies undermine the mechanism which the initiative was trying to influence.

Overall, the retailers were positive about this initiative. The føtex representative (B) highlighted that this initiative was interesting as it was a stealth initiative, compared to the initiatives with the sodas, and would change the behaviour of the customers without them noticing. In her opinion, this was not a problem, as people paid per gram.

The store manager had a clear requirement for the implementation: it should be easy for both staff and customers to use. This perspective was backed up by the føtex representative (B), who said:

If there’s something that doesn’t work for us, it’s... if it doesn’t work for our customers, that’s what we need to solve first. (føtex representative (B))

This shows how one success criterion of the retailers is customer satisfaction, which we elaborate on later (See: Influence of customers and other stakeholders on store operation).

The initiative was much delayed, one reason being that it was challenging to create a new bag that would work in the store. This resulted in many different bags being ordered in large quantities due to the agreements with the suppliers, which had been very costly for the retailer.

The føtex representative (B) also reflected on what the potential evidence of an effect would mean to the retailer:

Then we’ll have to wait and see if people buy fewer sweets. And of course, this is something that we must take into account because it’s no secret that part of being a responsible business is to make a profit. And if we sell fewer sweets, then we make less money. (føtex representative (B))

This shows how health and financial profit were seen as opposites and how the success of the initiative would not necessarily lead to it being viewed favourably, as it would negatively affect their profit. Any implementation in the chain would, therefore, have to be a strategic decision.

Facilitators and barriers

In the sections above, we have focused on the four specific initiatives. In the following, we present analytical findings that cut across the initiatives and elucidate what facilitated and hampered their implementation overall. We have organised our findings under three headings: Health is not the number one priority; General capacity of the retailer; and Influence of customers and other stakeholders on store operation.

Health is not the number one priority

In this section, we present the retailers’ motivation for and interest in engaging in the project and working with health and health promotion and what drives and/or curbs this motivation. In our understanding of motivation, we draw on Scaccia et al. [ 33 ] and view motivation as incentives and disincentives that contribute to the desirability of using an initiative focusing on health.

We find that the retailers expressed motivation for working with health and health promotion, which at first seemed to be based on interest. The retailer representatives explained how they personally were interested in health and wanted to learn more, but also that the organisation had an interest in health, especially among children and young people, and wanted to contribute to preventive health activities, for example by financially supporting local sports clubs. According to one retailer representative, this was because physical activity and healthy eating promote happier customers, as well as happy employees. The argument points to retailers’ focus on customer satisfaction (see: Influence of customers and other stakeholders on store operation). The focus on the customers relates to another motivating factor: working with health was also seen as a relative advantage, in that customers increasingly demand healthier products and alternatives. Lastly, we found that part of the motivation for working with health was a feeling of obligation, rooted in a sense of social responsibility:

I would say, in purely business and commercial terms, we are, indeed, a commercial business that was created to make money. There’s no ignoring that (laughs). So, of course, this is our main KPI [key performance indicator]. But that being said, we also agree that we have a social responsibility because we are as big as we are. We make a lot of foodstuffs available to the Danes, as do many of our colleagues in our industry, so there is no doubt that we have a role to play in terms of what we make available. (føtex representative (A))

According to the excerpt, this obligation was rooted in the size of the organisation and, thereby, its major influence on people’s selection of food products. However, the excerpt also highlights that health was not their first priority; profit was. This point was repeatedly made by the retailers, which reinforces its validity: they were a business and had to make a profit to keep their operation running, which limited what could be implemented. The store manager even expressed that he perceived running a supermarket and promoting public health as incompatible goals and something he had never seen reconciled in a real-life supermarket.

However, from the interviews with the retailers and our fieldwork, it seemed that this was not completely black and white, as the retailers were willing to give up profit in some cases. An example is the concealment of tobacco products in all of Salling Group’s supermarket chains, voluntarily implemented in 2018, which led to a significant decrease in profit from tobacco products.

After all, the Salling Group pioneered this with tobacco products. I'm proud of that, but I also think it’s the right thing to do. My personal opinion is that it was the absolutely correct move they chose to make, by making it harder to market a product that is obviously bad for my health. We’re not there with pick‘n’mix sweets just yet, in that we would claim they’re bad for your health, but the mindset in terms of; that is, upholding the mindset when it comes to cigarettes is something that we, as an industry, can easily support in close cooperation with, among others, yourselves [researchers] and the industry. (Store manager)

Risk seemed to be the driver. If the retailer was convinced that the risk was real or big enough, then they were willing to give up some of their profits because it was the ‘right thing to do’, and they would have the courage and power to do so. It was mentioned by all three informants that they did not believe in bans, limitations or hiding of products, as this interfered with the customer’s freedom of choice. This viewpoint was a barrier to the implementation of all initiatives that used strategies that would minimise or reduce the availability of a product. Yet, as with the tobacco products, we found other examples where this restriction of choice was justified by the retailer. One example was that the føtex chain only sold organic bananas. From a sign in the store, this was because:

‘we want to avoid the spray agent chlorpyrifos. Among other things, it is suspected of harming the development of children and foetuses. We can’t live with that suspicion and therefore you can only buy organic bananas in the future’

As with the cigarettes, the argument here was the health risks. In the interview with the store manager about restricting choices, animal welfare and political reasons (e.g. Russia’s warfare against Ukraine) were mentioned as other arguments for doing so.

So, despite an immediate motivation for working with health, the retailer also expressed how other interests and priorities could hinder or displace the work with health.

General capacity of the retailer

This section presents our findings relating to the general capacity of the retailer in the form of resources, organisational size, and culture. General capacity is understood as the readiness or ability to implement any new initiative [ 33 ].

Through the interviews with the føtex representative (B) and from working together with the retailer during the project, we found that the retailer seemed both accustomed and willing to implement new initiatives. In the current study, they covered all expenses related to the development of materials for the test and were also willing to risk some of their profit for a short period of time. The føtex representative (B) highlighted this high level of available resources several times in the interview:

I have some leverage, so when we do something, we don’t do it by halves. What I find most motivating, and I can say that with complete peace of mind, is that if the Salling Group says they’re going to do something, or if føtex says they’re going to do something or says they want to win this particular battle, then we win it, and then we do it to the full. [...] So when we say, for example, with this health project, that ‘we want to work with health,’ then we do want to work with health, and we’re going to make a difference in health, too. (føtex representative (B))

In this excerpt, she expressed that the sheer size of the company allowed them to push any agenda if they wanted to. However, this also underlines that this capacity depends on the retailer’s willingness: a willingness that did not extend to many of the initiatives that the researcher, based on the literature, expected to have the greatest effect.

Even though the size of the company came with many available resources, the retailer also explained how that same size had worked against the project in several ways:

What I think made it difficult for us to get through with some of these things let's just take the sodas, in that case, we have a private label collaborator who has production facilities, and when they press the ‘Salling sodas’ button, it doesn't just spew out a few thousand bottles, but millions. So saying ‘can't we just try to reduce the size and give it a try.’ It's a giant setup, so it’s not possible to do that at a whim. You’d need to get a whole or half chain on board that can help sell such volumes because otherwise, the costs would go through the roof. (føtex representative (A))

What this excerpt explains is that even changes that appeared small would take tremendous effort and be very costly, due to the size of the organisation.

Another challenge to the implementation was embedded in the retailer’s organisational culture. The føtex representative (B) explained in the interview that conflicting goals between employees made implementing new initiatives difficult and time-consuming. A further barrier to implementing the initiatives was high staff turnover at the retailer. In an interview, a føtex representative explained that people often shifted between different positions in the organisation, which resulted in the project falling between two stools, leading to misunderstandings of agreements and changes in attitudes towards the initiatives.

In summary, we find that the retailers could, in some respects, have a strong general capacity to implement new initiatives, given their available resources and experience with implementing new initiatives. Nevertheless, this study shows that this capacity was not utilised, owing to a lack of willingness. Moreover, we find that the size and organisational culture of the retailer hampered the implementation of the initiatives.

Influence of customers and other stakeholders on store operation

The last section reports on the influence of customers on the retailer’s willingness to implement the initiatives, and the influence of other stakeholders, especially producers, on what can be implemented.

We found that customers’ reactions and attitudes were decisive for the retailer when implementing any new initiative, as indicated in the sections above. According to the retailer, the customer was the focus when designing the layout of the store:

We are in very close dialogue with our clients, we do quantitative surveys and we do focus groups, we do in-depth interviews. And in that context, we're trying to understand, when you're shopping, how do you go about it. Is it easy for you to find the items you are looking for? And based on the responses, we try to adapt our stores to make things easy for our customers. (føtex representative (A))

The same representative also mentioned that she thought it would have strengthened the project to have interviewed customers as part of the development process, emphasising the weight the retailers put on customers’ attitudes. The retailers highlighted the importance of customer satisfaction and convenience in the shopping experience as a barrier to implementing certain initiatives, such as changing the placement of products. However, these same factors also proved to be facilitators for other initiatives, such as the tags for breakfast products and the complimentary banana for children, as demonstrated above.

Other important stakeholders for the supermarkets were the suppliers of their products, as well as government actors (e.g. the Danish Veterinary and Food Administration). For both downsizing initiatives, the suppliers of the products (sodas and bags for sweets) were key to successful implementation. In an interview, the store manager explained the huge role some of these suppliers play in the daily operation of the store and the chain.

After all, we’ve got a chain agreement that our head office has made with the breweries. I don’t get to decide which items are in our refrigerators. [...] The tricky thing is that we’re not only dealing with føtex or the Salling Group. We also have to do with some other, equally large companies that are also just coming in. Plus, I have people here X times a week to service their particular area. [...] [Another thing] that proved tricky, as far as I recall, was that the alternatives offered, people felt strongly about those because the breweries made some strategic choices, and because of those, some of the items that we might be able to stock, they didn't want to sell those. (Store manager)

This excerpt illustrates how suppliers such as the breweries, as shown earlier, influenced the implementation and affected the decisions made by the retailer.

This section indicates that even when the retailer is convinced that a given initiative would be worth implementing in their supermarket, the suppliers often must agree as well, and, finally, the customers must also welcome it.

Discussion

In this study, we have explored the implementation, acceptability, and feasibility of four different health-promoting food retail initiatives aimed at customers in a real-life supermarket setting, using different qualitative methods.

We found that (i) two initiatives (downsizing of bags for the pick'n'mix sweets and the complimentary banana for children) were implemented to a high degree, although delivery issues delayed their launch relative to the planned date, especially for the bags. The downsizing of soda bottles was not implemented as intended; the size and packaging deviated from the original plan due to delivery failure. Moreover, implementation of the shelf-tag initiative decreased over the feasibility test, as it required more continuous maintenance. We found that all initiatives lingered after the feasibility test; however, only the banana for children was somewhat sustained for a period, to accommodate customer demand.

(ii) The retailers expressed different levels of acceptability towards the initiatives, and different representatives sometimes showed different levels of acceptability towards the same initiative, such as the tags on the breakfast products. The most well-received initiative was the banana for children, which is somewhat unsurprising, as it was the retailers themselves who suggested including it; the positive response they got from customers further supported the retailers' positive attitude towards the initiative. We also found that many customers responded well to this initiative; however, we also observed a group that did not accept it and preferred the bun over the banana. Customers did not seem to notice the remaining initiatives, yet we did observe customer behaviours that would probably work against the suggested mechanisms of some of them.

(iii) In general, we describe three themes of barriers and facilitators that influence the implementation and possible sustainment of the initiatives: Health is not the number one priority, General capacity of the retailer, and Influence of customers and other stakeholders on store operation. Firstly, we found the retailers were motivated to work with health, from both a personal and a professional perspective. The motivation was rooted in a feeling of social responsibility, as well as in health initiatives being viewed as a relative advantage, due to demand and making customers happier. Still, other priorities, such as profit and maintaining customers' 'free choice', challenged the motivation to implement such initiatives. Secondly, the retailer showed a high level of available resources, which supported their general capacity to implement the initiatives; however, the large size of the organisation and its culture proved to be barriers to implementation. Lastly, the analysis showed that the influence of both customers and other stakeholders was crucial to the implementation, both in terms of what is possible and in terms of what the retailers would be interested in and prioritise.

Our findings are similar to those of others [26, 35]. Winkler et al. [35] found that even though supermarket actors found health-promoting initiatives meaningful to engage in, their engagement was challenged by a business mindset, practical routines, and structural requirements. Thus, despite the involvement of retailers in the development, selection and implementation of the initiatives, studies suggest that healthy food retail initiatives still encounter some fundamental barriers to implementation, such as economic considerations or the view on customers' free choice. However, our results also indicate that it might be possible to persuade food retailers to remove products or restrict choices if the evidence or the argument that it is the right thing to do is sufficiently strong, as with organic bananas or tobacco products. This has also been the case for another retailer in Denmark, which has decided that all its stores should be tobacco and nicotine-free by the end of 2028 to reduce the number of smokers [36]. Another solution is to identify win-win initiatives, of which the complimentary banana for children was arguably an example (if we consider the banana a healthier alternative) and which other studies have identified as well [35, 37].

Even though the four initiatives were implemented in this study (albeit two not as intended), and we found them to be somewhat acceptable to the retailers, we must still highlight that these initiatives represent a very small portion of the initiatives first suggested and entail several compromises from what the researchers had initially planned (Duus et al., unpublished). Moreover, the customers' responses to the initiatives were mixed, and in some cases their behaviour indicated that the initiatives would have little effect. Compared with studies testing similar initiatives, we find that: 1) shelf tags alone are unlikely to change food purchases [38] and are likely to contribute to disparities in food purchases, as not all customers know nutrition labels or have the literacy to read and understand them [39]; 2) smaller bags for pick'n'mix sweets could be successfully implemented and, based on results from another study, might decrease the volume of sweets sold [40]. Moreover, others have shown that customers are willing to buy smaller product options [41]. Taken together, this suggests that voluntary engagement with researchers might not suffice to make the changes to the supermarket environment needed to support population health. This view has also been put forward by Winkler et al. [35], and the Lancet series on commercial determinants of health presents an even more critical perspective on engagement with commercial actors such as food retailers [42, 43]. Here, they warn against how commercial actors use partnerships with researchers, among others, as a tool to improve their reputation and credibility [42].

In our collaborative process with the retailer, we experienced many challenges. We did not succeed in aligning retailers' and researchers' interests, which scholars have suggested is a prerequisite for implementing healthy food retail interventions in supermarkets [26, 27]. This underlines the importance of the pre-intervention phase, as described by Hawe, Shiell, and Riley [44], which is fundamental to successful implementation. During the pre-intervention phase, relationships between different people or agencies are often established, and these relationships may play a crucial role in the implementation and in explaining why some interventions work and others do not [44]. In line with this, another study has suggested exploring which implementation strategies might promote the uptake of evidence-based interventions among food retailers [45]. They found that, contrary to many other studies, the intervention in their study was compatible with the interests of the store managers to whom it was presented; these store managers had a strong feeling of social responsibility towards the communities they operated in [45].

Strengths and limitations

The investigation of the feasibility test was strengthened by the use of different methods, process evaluation concepts, and a broad view including the delivery and presentation of the initiatives as well as customer and retailer perspectives. We obtained the retailer perspective primarily at a strategic level; we had planned to conduct focus group interviews with staff at the test store to get an operational-level perspective on the initiatives and the implementation process, but no staff wanted to participate in an interview. The store manager explained that this was probably due to three things: 1) they had no interest in the study, or they were tired of it; 2) the recruitment was done too late (approximately 2 months after the feasibility test period); and 3) the staff were overworked as a result of understaffing due to the coronavirus disease pandemic. Future studies will also analyse sales data to evaluate whether sales of the products we intervened on changed. However, with the available data, we will not be able to analyse whether the initiatives change people's eating patterns or whether they influence people differently in terms of their socioeconomic factors or other characteristics.

A thorough needs assessment [46] among supermarket customers, testing the initiatives' assumptions and customers' food purchase patterns, would have strengthened the study. However, this was not possible within the timeframe and funding scheme, so the development drew primarily on existing knowledge and the experience of the retailer and the Danish Cancer Society. Furthermore, the store visits conducted during the development of the initiatives also provided a few customer perspectives, which led to the exclusion of some ideas (Duus et al., unpublished).

Furthermore, we learned two methodological lessons from the in-store observations: 1) All observers felt 'in the way' and needed to be in almost constant movement so as not to interfere with the order in the store. The observers also felt self-conscious and needed to legitimise their presence by wearing a 'visitor' sticker on their shirts or their university identification card. These feelings were amplified by the governmental advice on social distancing and the requirement to wear face masks in grocery stores, introduced during the observation period. 2) Relatedly, the observers found it challenging to approach customers for the short interviews because they felt they were invading people's private space; hence, only five interviews were conducted. This was especially challenging when wearing face masks, as it was impossible to produce and read non-verbal signals (e.g. smiles) and difficult to hear what people were saying.

Implications for future studies and practice

This study presents an investigation of the implementation of healthy food retail initiatives for supermarkets that were developed and selected together with retailers, as suggested by the literature. It suggests that the implementation of such initiatives is possible and, to some degree, can be achieved to a high level. Yet the quality of the initiatives was rather low, and some were not implemented as intended. Moreover, we still encountered some of the same barriers and limitations as earlier studies that did not use collaborative strategies in the pre-intervention phase. Some of this may be due to challenges such as high staff turnover at the retailer and the lack of a shared understanding, as shown in another study (Duus et al., unpublished). Future studies must explore this further.

One lesson for future studies is to identify initiatives that customers appreciate, as this is important to retailers, underlining a needs assessment as an important first step in intervention development [30, 46]. Furthermore, future studies should involve a broader range of stakeholders, including manufacturers and suppliers, in the development of the initiatives, as they have significant power over what can be implemented. Future studies would also benefit from identifying and testing implementation strategies that can facilitate the implementation of this type of intervention in this setting.

Conclusions

We performed a qualitative investigation of the implementation, acceptability, and feasibility of four different healthy food retail initiatives aimed at customers in a real-life supermarket setting, which had been developed and selected together with retailers. Only two of the four initiatives were implemented as intended, and the perspectives of retailers and customers were mixed or unclear. Altogether, the study highlights the challenges of implementing healthy food retail initiatives despite early involvement of retailers in their selection and design. Adding to the challenges of implementation, the initiatives also represent a compromise between the interests of the researchers and the retailers, made to uphold the partnership and complete the funded research project, and they do not represent what the literature suggests as the most effective strategies. Future studies should further examine the impact and pitfalls of including retailers (or other commercial actors) in the development and selection of healthy food retail initiatives and try to identify implementation strategies that facilitate successful implementation.

Availability of data and materials

The data generated and analysed during the current study are not publicly available due to their sensitive and confidential nature but are available from the corresponding author upon reasonable request.

Abbreviations

CSR: Corporate Social Responsibility

KPI: Key Performance Indicator

References

1. Healthy diet. WHO. 2020. https://www.who.int/news-room/fact-sheets/detail/healthy-diet. Accessed 20 July 2023.

2. Greenwald P, Clifford CK, Milner JA. Diet and cancer prevention. Eur J Cancer. 2001 May 1;37(8):948–65.

3. Firth J, Gangwisch JE, Borsini A, Wootton RE, Mayer EA. Food and mood: how do diet and nutrition affect mental wellbeing? BMJ. 2020 Jun 29;369:m2382.

4. English LK, Ard JD, Bailey RL, Bates M, Bazzano LA, Boushey CJ, et al. Evaluation of Dietary Patterns and All-Cause Mortality. JAMA Netw Open. 2021 Aug 31;4(8):e2122277.

5. Swinburn B, Caterson I, Seidell J, James W. Diet, nutrition and the prevention of excess weight gain and obesity. Public Health Nutr. 2004 Feb;7(1a):123–46.

6. Brug J. Determinants of healthy eating: motivation, abilities and environmental opportunities. Fam Pract. 2008 Dec 1;25(suppl_1):i50–5.

7. Adam A, Jensen JD. What is the effectiveness of obesity related interventions at retail grocery stores and supermarkets?—a systematic review. BMC Public Health. 2016 Dec;16(1):1247.

8. Ball K, Timperio AF, Crawford DA. Understanding environmental influences on nutrition and physical activity behaviors: where should we look and what should we count? Int J Behav Nutr Phys Act. 2006 Sep 26;3(1):33.

9. Sonntag D, Schneider S, Mdege N, Ali S, Schmidt B. Beyond Food Promotion: A Systematic Review on the Influence of the Food Industry on Obesity-Related Dietary Behaviour among Children. Nutrients. 2015;7(10):8565–76.

10. Viola D, Arno PS, Maroko AR, Schechter CB, Sohler N, Rundle A, et al. Overweight and obesity: Can we reconcile evidence about supermarkets and fast food retailers for public health policy? J Public Health Policy. 2013 Aug;34(3):424–38.

11. Black C, Moon G, Baird J. Dietary inequalities: What is the evidence for the effect of the neighbourhood food environment? Health Place. 2014 May;27:229–42.

12. Vogel C, Ntani G, Inskip H, Barker M, Cummins S, Cooper C, et al. Education and the Relationship Between Supermarket Environment and Diet. Am J Prev Med. 2016 Aug;51(2):e27–34.

13. Chandon P, Wansink B. Does food marketing need to make us fat? A review and solutions. Nutr Rev. 2012;70(10):571–93.

14. Bennett R, Zorbas C, Huse O, Peeters A, Cameron AJ, Sacks G, et al. Prevalence of healthy and unhealthy food and beverage price promotions and their potential influence on shopper purchasing behaviour: a systematic review of the literature. Obes Rev. 2020 Jan;21(1).

15. Harbers MC, Beulens JWJ, Rutters F, de Boer F, Gillebaart M, Sluijs I, et al. The effects of nudges on purchases, food choice, and energy intake or content of purchases in real-life food purchasing environments: a systematic review and evidence synthesis. Nutr J. 2020 Dec;19(1):103.

16. Hollands GJ, Carter P, Anwer S, King SE, Jebb SA, Ogilvie D, et al. Altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption. Cochrane Public Health Group, editor. Cochrane Database Syst Rev. 2019 Sep 4.

17. Slapø H, Schjøll A, Strømgren B, Sandaker I, Lekhal S. Efficiency of In-Store Interventions to Impact Customers to Purchase Healthier Food and Beverage Products in Real-Life Grocery Stores: A Systematic Review and Meta-Analysis. Foods. 2021 May;10(5):922.

18. Gravlee CC, Boston PQ, Mitchell MM, Schultz AF, Betterley C. Food store owners' and managers' perspectives on the food environment: an exploratory mixed-methods study. BMC Public Health. 2014 Dec;14(1):1031.

19. Winkler LL, Christensen U, Glümer C, Bloch P, Mikkelsen BE, Wansink B, et al. Substituting sugar confectionery with fruit and healthy snacks at checkout – a win-win strategy for consumers and food stores? A study on consumer attitudes and sales effects of a healthy supermarket intervention. BMC Public Health. 2016 Nov 22;16(1):1184.

20. Adam A, Jensen JD, Sommer I, Hansen GL. Does shelf space management intervention have an effect on calorie turnover at supermarkets? J Retail Consum Serv. 2017 Jan;34:311–8.

21. Toft U, Buch-Andersen T, Bloch P, Reinbach HC, Jensen BB, Mikkelsen BE, et al. A Community-Based, Participatory, Multi-Component Intervention Increased Sales of Healthy Foods in Local Supermarkets—The Health and Local Community Project (SoL). Int J Environ Res Public Health. 2023 Jan;20(3):2478.

22. Bauer JM, Aarestrup SC, Hansen PG, Reisch LA. Nudging more sustainable grocery purchases: Behavioural innovations in a supermarket setting. Technol Forecast Soc Change. 2022 Jun;179:121605.

23. Denver S, Christensen T, Nordström J. Consumer preferences for low-salt foods: a Danish case study based on a comprehensive supermarket intervention. Public Health Nutr. 2021;24(12):3956–65.

24. Jensen JD, Sommer I. Reducing calorie sales from supermarkets – 'silent' reformulation of retailer-brand food products. Int J Behav Nutr Phys Act. 2017 Aug 23;14(1):104.

25. Toft U, Winkler LL, Mikkelsen BE, Bloch P, Glümer C. Discounts on fruit and vegetables combined with a space management intervention increased sales in supermarkets. Eur J Clin Nutr. 2017 Apr;71(4):476–80.

26. Middel CNH, Schuitmaker-Warnaar TJ, Mackenbach JD, Broerse JEW. Systematic review: a systems innovation perspective on barriers and facilitators for the implementation of healthy food-store interventions. Int J Behav Nutr Phys Act. 2019 Dec;16(1):108.

27. Gupta A, Alston L, Needham C, Robinson E, Marshall J, Boelsen-Robinson T, et al. Factors Influencing Implementation, Sustainability and Scalability of Healthy Food Retail Interventions: A Systematic Review of Reviews. Nutrients. 2022 Jan;14(2):294.

28. Denmark: market share of grocery retailers 2020. https://www.statista.com/statistics/565747/market-share-of-selected-grocery-retailers-in-denmark/. Accessed 19 July 2023.

29. Moore N, editor. Desk research. In: How to Do Research: The Practical Guide to Designing and Managing Research Projects. Facet; 2006. p. 106–11.

30. Schultz Petersen K, Maindal HT, Ledderer L, Overgaard C. Komplekse interventioner: Udvikling, test, evaluering og implementering. Aalborg Universitetsforlag; 2022.

31. McGill E, Marks D, Er V, Penney T, Petticrew M, Egan M. Qualitative process evaluation from a complex systems perspective: A systematic review and framework for public health evaluators. PLoS Med. 2020 Nov 2;17(11):e1003368.

32. Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006 Jan;3(2):77–101.

33. Scaccia JP, Cook BS, Lamont A, Wandersman A, Castellow J, Katz J, et al. A practical implementation science heuristic for organizational readiness: R = MC2. J Community Psychol. 2015 Apr;43(4):484–501.

34. Blake MR, Backholer K, Lancsar E, Boelsen-Robinson T, Mah C, Brimblecombe J, et al. Investigating business outcomes of healthy food retail strategies: A systematic scoping review. Obes Rev. 2019 Oct;20(10):1384–99.

35. Winkler LL, Toft U, Glümer C, Bloch P, Buch-Andersen T, Christensen U. Involving supermarkets in health promotion interventions in the Danish Project SoL. A practice-oriented qualitative study on the engagement of supermarket staff and managers. BMC Public Health. 2023 Apr 18;23(1):706.

36. Lidl Danmark. https://om.lidl.dk/ansvarlighed/vi-fremmer-sundheden/udfasning-af-tobak. Accessed 7 March 2024.

37. Blake MR, Sacks G, Zorbas C, Marshall J, Orellana L, Brown AK, et al. The 'Eat Well @ IGA' healthy supermarket randomised controlled trial: process evaluation. Int J Behav Nutr Phys Act. 2021 Dec;18(1):36.

38. Vandevijvere S, Berger N. The impact of shelf tags with Nutri-Score on consumer purchases: a difference-in-difference analysis of a natural experiment in supermarkets of a major retailer in Belgium. Int J Behav Nutr Phys Act. 2021 Nov 18;18(1):150.

39. Robertson A, Lobstein T, Knai C. Obesity and socio-economic groups in Europe: Evidence review and implications for action. 2007.

40. Mørck CJ. Nyt forsøg afslører: Bland selv-posens størrelse gør en stor forskel. 2024. https://www.cancer.dk/nyheder-og-fortaellinger/2024/nyt-forsoeg-afsloerer-bland-selv-posens-stoerrelse-goer-en-stor-forskel/. Accessed 4 July 2024.

41. Vandenbroele J, Slabbinck H, Van Kerckhove A, Vermeir I. Curbing portion size effects by adding smaller portions at the point of purchase. Food Qual Prefer. 2018 Mar;64:82–7.

42. Gilmore AB, Fabbri A, Baum F, Bertscher A, Bondy K, Chang HJ, et al. Defining and conceptualising the commercial determinants of health. Lancet. 2023 Apr;401(10383):1194–213.

43. Lacy-Nichols J, Nandi S, Mialon M, McCambridge J, Lee K, Jones A, et al. Conceptualising commercial entities in public health: beyond unhealthy commodities and transnational corporations. Lancet. 2023 Apr;401(10383):1214–28.

44. Hawe P, Shiell A, Riley T. Theorising Interventions as Events in Systems. Am J Community Psychol. 2009;43(3–4):267–76.

45. Brimblecombe J, Miles B, Chappell E, De Silva K, Ferguson M, Mah C, et al. Implementation of a food retail intervention to reduce purchase of unhealthy food and beverages in remote Australia: mixed-method evaluation using the consolidated framework for implementation research. Int J Behav Nutr Phys Act. 2023 Feb 17;20(1):20.

46. Skivington K, Matthews L, Simpson SA, Craig P, Baird J, Blazeby JM, et al. A new framework for developing and evaluating complex interventions: update of Medical Research Council guidance. BMJ. 2021 Sep 30;374:n2061.


Acknowledgements

We want to thank all the participating retail group and supermarket staff members involved in this project and the implementation process. We appreciate the time and effort you have dedicated to this project and your openness. Furthermore, we want to acknowledge the customers who took the time to share their opinions with us during their daily grocery shopping.

We acknowledge Johanne Aviaja Rosing, Louise Ayoe Sparvath Brautsch, and Carl Johannes Middelboe for their assistance in conducting the pre- and post-intervention observations.

Funding

Open access funding provided by University of Southern Denmark. This study is funded by the Danish Cancer Society, grant no. R274-A16920. The first author (Katrine Sidenius Duus) has also received a Faculty Scholarship from the Faculty of Health Sciences at the University of Southern Denmark to support the completion of her PhD thesis, of which this study is part.

Author information

Authors and Affiliations

The National Institute of Public Health, University of Southern Denmark, Copenhagen, Denmark

Katrine Sidenius Duus, Tine Tjørnhøj-Thomsen & Rikke Fredenslund Krølner


Contributions

KSD, RFK, and TTT contributed to the funding acquisition and the study conception and design. Data generation and analyses were performed by KSD, who also wrote the first draft of the manuscript. RFK and TTT commented on previous versions of the manuscript and contributed to writing the final manuscript, which KSD wrote up. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Katrine Sidenius Duus.

Ethics declarations

Ethics approval and consent to participate

This study has been approved by the SDU Research & Innovation Organization (notification no. 11.136). All informants who participated in interviews received written and verbal information about the aim of the study and were informed that participation was voluntary and that their information would be used for research purposes only and treated confidentially. By participating, they consented to their data being used for research. Data from the observations and documents were handled confidentially and with caution to protect sensitive information that could identify individuals.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.


Supplementary Information

Supplementary material 1.

Supplementary material 2.


Cite this article

Duus, K.S., Tjørnhøj-Thomsen, T. & Krølner, R.F. Implementation of health-promoting retail initiatives in the Healthier Choices in Supermarkets Study—qualitative perspectives from a feasibility study. BMC Med 22, 349 (2024). https://doi.org/10.1186/s12916-024-03561-2


Received: 24 May 2024

Accepted: 14 August 2024

Published: 02 September 2024

DOI: https://doi.org/10.1186/s12916-024-03561-2


Keywords

  • Implementation
  • Qualitative research
  • Health promotion
  • Supermarkets
  • Involvement
  • Intervention



  • Methodology
  • Open access
  • Published: 03 February 2021

Determining sample size for progression criteria for pragmatic pilot RCTs: the hypothesis test strikes back!

  • M. Lewis   ORCID: orcid.org/0000-0001-5290-7833 1 , 2 ,
  • K. Bromley 1 , 2 ,
  • C. J. Sutton 3 ,
  • G. McCray 1 , 2 ,
  • H. L. Myers 2 &
  • G. A. Lancaster 1 , 2  

Pilot and Feasibility Studies volume 7, Article number: 40 (2021)


The current CONSORT guidelines for reporting pilot trials do not recommend hypothesis testing of clinical outcomes, on the basis that a pilot trial is under-powered to detect such differences and that this is the aim of the main trial. They state that primary evaluation should focus on descriptive analysis of feasibility/process outcomes (e.g. recruitment, adherence, treatment fidelity). Whilst the argument for not testing clinical outcomes is justifiable, the same does not necessarily apply to feasibility/process outcomes, where differences may be large and detectable with small samples. Moreover, there remains much ambiguity around sample size for pilot trials.

Many pilot trials adopt a 'traffic light' system for evaluating progression to the main trial, determined by a set of criteria set up a priori. We construct a hypothesis testing approach for binary feasibility outcomes built around this system: it tests against being in the RED zone (unacceptable outcome) under an expectation of being in the GREEN zone (acceptable outcome), and the sample size is chosen to give high power to reject being in the RED zone if the GREEN zone holds true. Pilot point estimates falling in the RED zone will be statistically non-significant, and estimates in the GREEN zone will be significant; the AMBER zone designates a potentially acceptable outcome, where statistical tests may be significant or non-significant.

For example, in relation to treatment fidelity, if we assume the upper boundary of the RED zone is 50% and the lower boundary of the GREEN zone is 75% (designating unacceptable and acceptable treatment fidelity, respectively), the sample size required for analysis, given 90% power and one-sided 5% alpha, would be around n = 34 (intervention group alone). Observed treatment fidelity in the range of 0–17 participants (0–50%) falls into the RED zone and is statistically non-significant, 18–25 (51–74%) falls into AMBER and may or may not be significant, and 26–34 (75–100%) falls into GREEN and is significant, indicating acceptable fidelity.
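
To make this example concrete, the zone boundaries and the power can be checked directly in base R. This is a minimal sketch of our own (not the paper's code); because the n = 34 above derives from the normal approximation with continuity correction, the exact binomial power comes out slightly below the nominal 90%:

n     <- 34      # intervention-arm sample size from the example
R_UL  <- 0.50    # upper boundary of the RED zone (null value)
G_LL  <- 0.75    # lower boundary of the GREEN zone (alternative value)
alpha <- 0.05    # one-sided type I error

# One-sided exact p-values P(X >= x | p = R_UL) for every possible count x
x      <- 0:n
p_vals <- 1 - pbinom(x - 1, n, R_UL)

# A_C: the smallest count giving a significant result (p < alpha)
A_C <- x[min(which(p_vals < alpha))]
A_C                              # 23 of 34, i.e. significance from ~68% upwards

# Exact power: probability of a significant result if the GREEN value holds
1 - pbinom(A_C - 1, n, G_LL)     # roughly 0.88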

In general, several key process outcomes are assessed for progression to a main trial; a composite approach would require appraising the rules of progression across all these outcomes. This methodology provides a formal framework for hypothesis testing and sample size indication around process outcome evaluation for pilot RCTs.

Background

The importance and need for pilot and feasibility studies is clear: “A well-conducted pilot study, giving a clear list of aims and objectives … will encourage methodological rigour … and will lead to higher quality RCTs” [ 1 ]. The CONSORT extension to external pilot and feasibility trials was published in 2016 [ 2 ] with the following key methodological recommendations: (i) investigate areas of uncertainty about the future definitive RCT; (ii) ensure primary aims/objectives are about feasibility, which should guide the methodology used; (iii) include assessments to address the feasibility objectives which should be the main focus of data collection and analysis; and (iv) build decision processes into the pilot design whether or how to proceed to the main study. Given that many trials incur process problems during implementation—particularly with regard to recruitment [ 3 , 4 , 5 ]—the need for pilot and feasibility studies is evident.

One aspect of pilot and feasibility studies that remains unclear is the required sample size. There is no consensus but recommendations vary from 10 to 12 per group through to 60–75 per group depending on the main objective of the study. Sample size may be based on precision of a feasibility parameter [ 6 , 7 ]; precision of a clinical parameter which may inform main trial sample size—particularly the standard deviation (SD) [ 8 , 9 , 10 , 11 ] but also event rate [ 12 ] and effect size [ 13 , 14 ]; or, to a lesser degree, for clinical scale evaluation [ 9 , 15 ]. Billingham et al. [ 16 ] reported that the median sample size of pilot and feasibility studies is around 30–36 per group but there is wide variation. Herbert et al. [ 17 ] reported that targets within internal as opposed to external pilots are often slightly larger and somewhat different, being based on percentages of the total sample size and timeline rather than any fixed sample requirement.

The need for a clear directive on sample size of studies is of utmost relevance. The CONSORT extension [ 2 ] reports that "Pilot size should be based on feasibility objectives and some rationale given" and states that a "confidence interval approach may be used to calculate and justify the sample size based on key feasibility objective(s)". Specifically, item 7a (How sample size was determined: Rationale for numbers in the pilot trial) qualifies: "Many pilot trials have key objectives related to estimating rates of acceptance, recruitment, retention, or uptake … for these sorts of objectives, numbers required in the study should ideally be set to ensure a desired degree of precision around the estimated rate". Item 7b (When applicable, explanation of any interim analyses and stopping guidelines) is generally an uncommon scenario for pilot and feasibility studies and is not given consideration here.

A key aspect of pilot and feasibility studies is to inform progression to the main trial, which has important implications for all key stakeholders (funders, researchers, clinicians and patients). The CONSORT extension [ 2 ] states that “decision processes about how to proceed needs to be built into the pilot design (which might involve formal progression criteria to decide whether to proceed, proceed with amendments, or not to proceed)” and authors should present “if applicable, the pre-specified criteria used to judge whether or how to proceed with a future definitive RCT; … implications for progression from pilot to future definitive RCT, including any proposed amendments”. Avery et al. [ 18 ] published recommendations for internal pilots emphasising a traffic light (stop-amend-go/red-amber-green) approach to progression with focus on process assessment (recruitment, protocol adherence, follow-up) and transparent reporting around the choice of trial design and the decision-making processes for stopping, amending or proceeding to a main trial. The review of Herbert et al. [ 17 ] reported that the use of progression criteria (including recruitment rate) and traffic light stop-amend-go as opposed to simple stop-go is increasing for internal pilot studies.

A common misuse of pilot and feasibility studies has been the application of hypothesis testing for clinical outcomes in small under-powered studies. Arain et al. [ 19 ] claimed that pilot studies were often poorly reported with inappropriate emphasis on hypothesis testing. They reviewed 54 pilot and feasibility studies published in 2007–2008, of which 81% incorporated hypothesis testing of clinical outcomes. Similarly, Leon et al. [ 20 ] stated that a pilot is not a hypothesis testing study: safety, efficacy and effectiveness should not be evaluated. Despite this, hypothesis testing has been commonly performed for clinical effectiveness/efficacy without reasonable justification. Horne et al. [ 21 ] reviewed 31 pilot trials published in physical therapy journals between 2012 and 2015 and found that only 4/31 (13%) carried out a valid sample size calculation on effectiveness/efficacy outcomes but 26/31 (84%) used hypothesis testing. Wilson et al. [ 22 ] acknowledged a number of statistical challenges in assessing potential efficacy of complex interventions in pilot and feasibility studies. The CONSORT extension [ 2 ] re-affirmed many researchers’ views that formal hypothesis testing for effectiveness/efficacy is not recommended in pilot/feasibility studies since they are under-powered to do so. Sim’s commentary [ 23 ] further contests such testing of clinical outcomes stating that treatment effects calculated from pilot or feasibility studies should not be the basis of a sample size calculation for a main trial.

However, when the focus of analysis is on confidence interval estimation for process outcomes, this does not give a definitive basis for acceptance/rejection of progression criteria linked to formal powering. The issue in this regard is that precision focuses on alpha ( α , type I error) without clear consideration of beta (β, type II error) and may therefore not reasonably capture true differences if a study is under-powered. Further, it could be argued that hypothesis testing of feasibility outcomes (as well as addressing both alpha and beta) is justified on the grounds that moderate-to-large differences (‘process-effects’) may be expected rather than small differences that would require large sample numbers. Moore et al. [ 24 ] previously stated that some pilot studies require hypothesis testing to guide decisions about whether larger subsequent studies can be undertaken, giving the following example of how this could be done for feasibility outcomes: asking the question “Is taste of dietary supplement acceptable to at least 95% of the target population?”, they showed that sample sizes of 30, 50 and 70 provide 48%, 78% and 84% power to reject an acceptance rate of 85% or lower if the true acceptance rate is 95% using a 1-sided α = 0.05 binomial test. Schoenfeld [ 25 ] advocates that, even for clinical outcomes, there may be a place for testing at the level of clinical ‘indication’ rather than ‘clinical evidence’. He suggested that preliminary hypothesis testing for efficacy could be conducted with high alpha (up to 0.25), not to provide definitive evidence but as an indication as to whether a larger study should be conducted. Lee et al. [ 14 ] also reported how type 1 error levels other than the traditional 5% could be considered to provide preliminary evidence for efficacy, although they did stop short of recommending doing this by concluding that a confidence interval approach is preferable.

Current recommendations for sample sizes of pilot/feasibility studies vary, have a single rather than a multi-criterion basis, and do not necessarily link directly to formal progression criteria. The purpose of this article is to introduce a simple methodology that allows sample size derivation and formal testing of proposed progression cut-offs, whilst offering suggestions for multi-criterion assessment, thereby giving clear guidance and sign-posting for researchers embarking on a pilot/feasibility study to assess uncertainty in feasibility parameters prior to a main trial. The suggestions within the article do not directly apply to internal pilot studies built into the design of a main trial, but given the similarities to external randomised pilot and feasibility studies, many of the principles outlined here for external pilots might also extend to some degree to internal pilots of randomised and non-randomised studies.

The proposed approach focuses on estimation and hypothesis testing of progression criteria for feasibility outcomes that are potentially modifiable (e.g. recruitment, treatment fidelity/ adherence, level of follow up). Thus, it aligns with the main aims and objectives of pilot and feasibility studies and with the progression stop-amend-go recommendations of Eldridge et al. [ 2 ] and Avery et al. [ 18 ].

Hypothesis concept

Let R_UL denote the upper RED zone cut-off and G_LL denote the lower GREEN zone cut-off. The concept is to set up hypothesis testing around progression criteria that tests against being in the RED zone (designating unacceptable feasibility: 'STOP') based on an alternative of being in the GREEN zone (designating acceptable feasibility: 'GO'). This is analogous to the zero difference (null) and clinically important difference (alternative) in a main superiority trial. Specifically, we are testing against R_UL when G_LL is hypothesised to be true:

Null hypothesis: the true feasibility outcome (ε) is not greater than the upper RED stop limit (R_UL)

Alternative hypothesis: the true feasibility outcome (ε) is greater than R_UL

The test is a 1-tailed test with suggested alpha (α) of 0.05 and beta (β) of 0.05, 0.1 or 0.2, dependent on the required strength of evidence of the test. An example of a feasibility outcome might be percentage recruitment uptake.
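
In symbols (our notation, following the definitions above), the test and its required operating characteristics are:

H_0 : \varepsilon \le R_{UL} \quad \text{versus} \quad H_1 : \varepsilon > R_{UL}

\Pr(\text{reject } H_0 \mid \varepsilon = R_{UL}) \le \alpha, \qquad \Pr(\text{reject } H_0 \mid \varepsilon = G_{LL}) \ge 1 - \beta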

Progression rules

Let E denote the observed point estimate (ranging from 0 to 1 for proportions, or 0–100% for percentages). Simple 3-tiered progression criteria would follow as:

E ≤ R_UL [p value non-significant (p ≥ α)] -> RED (unacceptable: STOP)

R_UL < E < G_LL -> AMBER (potentially acceptable: AMEND)

E ≥ G_LL [p value significant (p < α)] -> GREEN (acceptable: GO)
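
As a sketch, this rule set maps one-to-one onto a small R helper (the function and argument names are ours):

progression_signal <- function(E, R_UL, G_LL) {
  # E, R_UL and G_LL are proportions in [0, 1]
  if (E <= R_UL) return("RED: unacceptable - STOP")
  if (E >= G_LL) return("GREEN: acceptable - GO")
  "AMBER: potentially acceptable - AMEND"
}

progression_signal(E = 20/34, R_UL = 0.50, G_LL = 0.75)   # "AMBER: ..."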

Sample size

Table 1 displays a quick look-up grid of sample sizes across a range of anticipated proportions for R_UL and G_LL, for a one-sample, one-sided 5% alpha test with typical 80% and 90% (as well as 95%) power, using the normal approximation method with continuity correction (see Appendix for the corresponding mathematical expression, derived from Fleiss et al. [26]). Table 2 is the same look-up grid for the binomial exact approach, with sample sizes derived using G*Power version 3.1.9.7 [27]. Clearly, as the difference between the proportions R_UL and G_LL increases, the sample size requirement is reduced.
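
The tables themselves are not reproduced here, but binomial-exact entries of this kind can be regenerated in a few lines of base R. This is our own sketch; note that exact binomial power is non-monotone (saw-toothed) in n, so a careful implementation would also confirm that power stays above the target for neighbouring values of n:

exact_sample_size <- function(R_UL, G_LL, alpha = 0.05, beta = 0.10, n_max = 500) {
  for (n in 2:n_max) {
    x      <- 0:n
    p_vals <- 1 - pbinom(x - 1, n, R_UL)    # P(X >= x) under the null
    sig    <- which(p_vals < alpha)
    if (length(sig) == 0) next              # no significant count possible yet
    A_C    <- x[min(sig)]                   # significance cut-point
    power  <- 1 - pbinom(A_C - 1, n, G_LL)  # power at the GREEN value
    if (power >= 1 - beta) return(c(n = n, A_C = A_C, power = round(power, 3)))
  }
  stop("no n <= n_max achieves the requested power")
}

exact_sample_size(0.50, 0.75)   # first n crossing 90% power for this scenario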

Multi-criteria assessment

We recommend that progression for all key feasibility criteria be considered separately, with overall progression determined by the worst-performing criterion: RED if at least one signal is RED, AMBER if no signal falls into RED but at least one falls into AMBER, and GREEN if all signals fall into the GREEN zone. Hence, a GREEN signal to 'GO' across the set of individual criteria indicates that progression to a main trial can take place without any necessary changes. A signal to 'STOP' and not proceed to a main trial is recommended if any of the observed estimates are 'unacceptably' low (i.e. fall within the RED zone). Otherwise, where neither 'GO' nor 'STOP' is signalled, the design of the trial will need amending in response to subpar performance on one or more of the criteria.

Sample size requirements across multi-criteria will vary according to the designated parameters linked to the progression criteria, which may be set at different stages of the study on different numbers of patients (e.g. those screened, eligible, recruited and randomised, allocated to the intervention arm, total followed up). The overall size needed will be dictated by the requirement to power each of the multi-criteria statistical tests. Since these tests will yield separate conclusions in regard to the decision to 'STOP', 'AMEND' or 'GO' across all individual feasibility criteria, there is no need to consider a multiple testing correction with respect to alpha. However, researchers may wish to increase power (and hence, sample size) to ensure adequate power to detect 'GO' signals across the collective set of feasibility criteria. For example, powering at 90% across three criteria (assumed independent) will ensure a collective power of 73% (i.e. 0.9^3 = 0.729), which may be considered reasonable, but 80% power across five criteria will reduce the power of the combined test to 33% (0.8^5). The final three columns of Table 1 cover the sample sizes required for 95% power, which may address collective multi-criteria assessment when considering keeping a high overall statistical power.
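
The collective power figures quoted above are simply products of the per-criterion powers under independence:

0.9^3   # = 0.729, i.e. ~73% across three criteria each powered at 90%
0.8^5   # = 0.328, i.e. ~33% across five criteria each powered at 80%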

Further expansion of AMBER zone

Within the same sample size framework, the AMBER zone may be further split to indicate whether 'minor' or 'major' amendments are required according to the significance of the p value. Consider a 2-way split in the AMBER zone denoted by cut-off A_C, which indicates the threshold for statistical significance, where an observed estimate below the cut-point will result in a non-significant result and an estimate at or above the cut-point a significant result. Let AMBER_R denote the region of the AMBER zone adjacent to the RED zone, between R_UL and A_C, and AMBER_G denote the region of the AMBER zone between A_C and G_LL, adjacent to the GREEN zone. This would draw on two possible levels of amendment ('major' AMEND and 'minor' AMEND) and the re-configured approach would follow as:

R_UL < E < G_LL and p ≥ α {R_UL < E < A_C} -> AMBER_R (major AMEND)

R_UL < E < G_LL and p < α {A_C ≤ E < G_LL} -> AMBER_G (minor AMEND)

In Tables 1 and 2, in relation to designated sample sizes for different R_UL and G_LL and specified α and β, we show the corresponding cut-points for statistical significance (p < 0.05), both in absolute terms of sample number (n) [A_C] and as a percentage of the total sample size [A_C%].

A motivating example (aligned to the normal approximation approach) is presented in Table 3, which illustrates a pilot trial with three progression criteria. Table 4 presents the sample size calculations for the example scenario following the 3-tiered approach, and Table 5 gives the sample size calculations for the example scenario using the extended 4-tiered approach. Cut-points for the feasibility outcomes relating to the shown sample sizes are also presented, showing the RED, AMBER and GREEN zones for each of the three progression criteria.

The overall sample size requirement should be dictated by the multi-criteria approach. This is illustrated in Table 4, where we have three progression criteria, each with a different denominator population. For recruitment uptake, the denominator denotes the total number of children screened and the numerator the number of children randomised; for follow-up, the denominator is the number of children randomised and the numerator the number of those randomised who are successfully followed up; and lastly, for treatment fidelity, the denominator is the number allocated to the intervention arm and the numerator the number of children who were administered the treatment correctly by the dietician. In the example, in order to meet the individual ≥ 90% power requirement for all three criteria, we would need: (i) for recruitment, 78 children to be screened; (ii) for treatment fidelity, 34 children in the intervention arm; and (iii) for follow-up, 44 children to be randomised. To determine the overall sample size for the whole study, we base our decision on the criterion that requires the largest numbers: treatment fidelity, which requires 68 to be randomised (34 in the intervention arm). We cannot base our decision on the 78 required to be screened for recruitment, because this would give an expected number of only 28 randomised (i.e. 35% of 78). If we expect 35% recruitment uptake, then we need to inflate the 68 to be randomised to 195 (1/0.35 × 68) children to be screened (rounded to 200). This would give 99.9%, 90% and 98.8% power for criteria (i), (ii) and (iii), respectively (assuming 68 of the 200 screened are randomised), giving a very reasonable collective 88.8% power of rejecting the null hypotheses over the three criteria if the alternative hypotheses (for acceptable feasibility outcomes) hold true in each case.
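
The arithmetic of this worked example can be reproduced directly (our sketch, using the figures quoted in the text):

n_intervention <- 34                     # treatment fidelity drives the size
n_randomised   <- 2 * n_intervention     # 68 under 1:1 allocation
uptake         <- 0.35                   # expected recruitment uptake
n_screen       <- n_randomised / uptake  # 194.3 -> 195; rounded to 200 in the text
ceiling(n_screen)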

Inherent in our approach are the probabilities around sample size, power and the hypothesised feasibility parameters. For example, taking the cut-offs for treatment fidelity as a feasibility outcome from Table 4 (ii), we set a lower GREEN zone limit of G_LL = 0.75 ('acceptable', the hypothesised alternative value) and an upper RED zone limit of R_UL = 0.5 ('not acceptable', the hypothesised null value) for rejecting the null for this criterion, based on 90% power and a 1-sided 5% significance level (alpha). Figure 1 presents the normal probability density functions for ε under the null and alternative hypotheses. In the illustration, this would imply through normal sampling theory that if G_LL holds true (i.e. the true feasibility outcome (ε) = G_LL) there would be the following:

A probability of 0.1 (the type II error probability β) of the estimate falling within the RED/AMBER_R zones (i.e. the blue shaded area under the curve to the left of A_C, where the test result will be non-significant (p ≥ 0.05))

A probability of 0.4 of it falling in the AMBER_G zone (i.e. the area under the curve to the right of A_C but below G_LL)

A probability of 0.5 of the estimate falling in the GREEN zone (i.e. G_LL and above).

[Figure 1: Illustration of power using the 1-tailed hypothesis testing against the traffic light signalling approach to pilot progression. E, observed point estimate; R_UL, upper limit of RED zone; G_LL, lower limit of GREEN zone; A_C, cut-off for statistical significance (at the 1-sided 5% level); α, type I error; β, type II error]

If R_UL (the null) holds true (i.e. the true feasibility outcome (ε) = R_UL), there would be the following:

A probability of 0.05 (the one-tailed type I error probability α) of the statistic/estimate falling in the AMBER_G/GREEN zones (i.e. the pink shaded area under the curve to the right of A_C, where the test result will be significant (p < 0.05), as shown within Fig. 1)

A probability of 0.45 of it falling in the AMBER_R zone (i.e. to the left of A_C but above R_UL)

A probability of 0.5 of the estimate falling in the RED zone (i.e. R_UL and below)

Figure 1 also illustrates how changing the sample size affects the sampling distribution and the power of the analysis around the set null value (at R_UL) when the hypothesised alternative (G_LL) is true. The figure emphasises the need for a large enough sample to safeguard against under-powering of the pilot analysis (as shown in the last plot, which has a wider bell shape than the first two plots and where the size of the beta probability is increased).

Figure 2 plots the probabilities of making each type of traffic light decision as functions of the true parameter value (focused on the recruitment uptake example from Table 5 (i)). Additional file 1 presents the R code for reproducing these probabilities and enables readers to insert different parameter values.

[Figure 2: Probability of traffic light given true underlying probability of an event, using the example from Table 5 (i). Two plots are presented: a relating to the normal approximation approach and b relating to the binomial exact approach. Based on n = 200, R_UL = 40 and G_LL = 70]
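
The published code for these probabilities is provided in Additional file 1; a binomial-exact variant can be sketched as follows (our own re-implementation, reading the caption's R_UL = 40 and G_LL = 70 as counts out of the n = 200 screened):

traffic_light_probs <- function(p, n = 200, r_ul = 40, g_ll = 70) {
  red   <- pbinom(r_ul, n, p)           # P(X <= r_ul): RED decision
  green <- 1 - pbinom(g_ll - 1, n, p)   # P(X >= g_ll): GREEN decision
  c(RED = red, AMBER = 1 - red - green, GREEN = green)
}

# Decision probabilities at the RED boundary, mid-AMBER and the GREEN boundary
sapply(c(0.20, 0.275, 0.35), traffic_light_probs)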

Discussion

The methodology introduced in this article provides an innovative formal framework for sample size derivation, aligning the sample size requirement with the progression criteria, with the intention of providing greater transparency in the progression process and full engagement with the standard aims and objectives of pilot/feasibility studies. Through the use of both the alpha and beta parameters (rather than alpha alone), the method ensures rigour and the capacity to address the progression criteria by ensuring there is adequate power to detect an acceptable threshold for moving forward to the main trial. As several key process outcomes are assessed in parallel and in combination, the method embraces a composite multi-criterion approach that appraises signals for progression across all the targeted feasibility measures. The methodology extends beyond the requirement for 'sample size justification but not necessarily sample size calculation' [28].

The focus of the strategy reported here is on process outcomes, which align with the recommended key objectives of primary feasibility evaluation for pilot and feasibility studies [2, 24] and the necessary targets to address key issues of uncertainty [29]. The concept of justifying progression is key. Charlesworth et al. [30] developed a checklist for intended use in decision-making on whether pilot data could be carried forward to a main trial. Our approach builds on this philosophy by introducing a formalised hypothesis test approach to address the key objectives and pilot sample size. Though the suggested sample size derivation focuses on the key process objectives, other objectives may also be important, e.g. assessment of the precision of clinical outcome parameters. In that case, researchers may wish to ensure that the size of the study suitably covers the needs of those evaluations; e.g. to estimate the SD of the intended clinical outcome, the overall sample size may be boosted to cover this additional objective [10]. This tallies with the review by Blatch-Jones et al. [31], which reported that testing recruitment, determining the sample size and numbers available, and intervention feasibility were the most commonly used targets of pilot evaluations.

Hypothesis testing in pilot studies, particularly in the context of the effectiveness/efficacy of clinical outcomes, has been widely criticised due to the improper purpose and lack of statistical power of such evaluations [2, 20, 21, 23]. Hence, pilot evaluations of clinical outcomes are not expected to include hypothesis testing. Since the main focus here is on feasibility, the scope of the testing reported in this article is different and, importantly, relates back to the recommended objectives of the study whilst also aligning with nominated progression criteria [2]. Hence, there is clear justification for this approach. Further, for the simple 3-tiered approach, hypothesis testing is somewhat hypothetical: there is no need to physically carry out a test, since the zonal positioning of the observed sample estimate for the feasibility outcome will determine the decision in regard to progression, adding to the simplicity of the approach.

The link between the sample size and the need to adequately power the study to detect a meaningful feasibility outcome gives this approach extra rigour over the confidence interval approach. It is this sample size-power linkage that is key to the determination of the respective probabilities of falling into the different zones and is a fundamental underpinning of the methodological approach. In the same way as for a key clinical outcome in a main trial, where the emphasis is not just on alpha but also on beta, thereby addressing the capacity to detect a clinically significant difference, our approach ensures there is sufficient capacity to detect a meaningful signal for progression to a main trial if it truly exists. A statistically significant finding in this context will at least provide evidence to reject RED (signifying a decision to STOP), and in the 4-tiered case it would fall above AMBER_R (a decision to major-AMEND); hence, the estimate will fall into AMBER_G or GREEN (signifying a decision to minor-AMEND or GO, respectively). The importance of adequately powering the pilot trial to address a feasibility criterion can be simply illustrated. For example, take R_UL as 50% and G_LL as 75% with two different sample sizes of n = 25 and n = 50: the former would have 77.5% power of rejecting RED on the basis of a 1-sided 5% alpha level, whereas the larger sample size would have 97.8% power of rejecting RED. So, if G_LL holds true, there would be a 20% higher probability of rejecting the null and being in the AMBER_G/GREEN zone for the larger sample, giving an increased chance of progressing to the main trial. It will be necessary to carry out the hypothesis test for the extended 4-tier approach if the observed statistic (E) falls in the AMBER zone, to determine statistical significance or not, which will inform whether the result falls into the 'minor' or 'major' AMBER sub-zone.
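
The 77.5% and 97.8% figures can be reproduced with the one-sample normal-approximation power formula with continuity correction (a sketch, assuming the standard form of that formula):

power_cc <- function(n, p0, p1, alpha = 0.05) {
  # one-sided power to reject H0: p = p0 when the truth is p1 (> p0)
  delta_cc <- abs(p1 - p0) - 1 / (2 * n)   # continuity-corrected difference
  z <- (delta_cc * sqrt(n) - qnorm(1 - alpha) * sqrt(p0 * (1 - p0))) /
       sqrt(p1 * (1 - p1))
  pnorm(z)
}

power_cc(25, 0.50, 0.75)   # ~0.775
power_cc(50, 0.50, 0.75)   # ~0.978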

We provide recommended sample sizes within a look-up grid relating to perceived likely progression cut-points, to give researchers quick access to retrievable sample sizes. For a likely set difference between the hypothesised null and alternative proportions of 0.15 to 0.25, when α = 0.05 and β = 0.1, the corresponding total sample size requirements for the normal approximation approach with continuity correction take the range 33 to 100 (median 56) [similarly, 33–98 (median 54) for the binomial exact method]. Note, for treatment fidelity/adherence/compliance particularly, the marginal difference could be higher, e.g. ≥ 25%, since in most situations we would anticipate and hope to attain a high value for the outcome whilst being prepared to make necessary changes within a wide interval of below-par values (providing the value is not unacceptably low). As this relates to an arm-specific objective (relating to evaluation of the intervention only), a usual 1:1 pilot will require twice the size; hence, the arm-specific sample size powered for detecting a ≥ 25% difference from the null would be about 34 (or lower), as depicted in our illustration (Table 4 (ii), equating to n ≤ 68 overall for a 1:1 pilot; intervention and control arms). Hence, we expect that typical pilot sizes of around 30–40 randomised per arm [16] would likely fit with the methodology proposed within this manuscript (the number needed for screening being extrapolated upward of this figure), but if a smaller marginal difference (e.g. ≤ 15%) is to be tested then these sample sizes may fall short. We stress that the overall required sample size needs to be carefully considered and determined in line with the hypothesis testing approach across all criteria, ensuring sufficiently high power. In this paper, we have made recommendations regarding various sample sizes based on both the normal approximation (with continuity correction) and binomial exact approaches; these are conservative compared with the normal approximation without continuity correction.

Importantly, the methodology outlines the necessary multi-criterion approach to the evaluation of pilot and feasibility studies. If all progression criteria perform as well as anticipated (signalling 'GO' on every criterion), then the recommendation of the pilot/feasibility study is that all criteria meet their desired levels, no adjustment is needed, and the main trial can proceed without amendment. However, if the worst signal (across all measured criteria) is AMBER, then adjustment will be required for the criteria falling within that signal; consequently, those criteria may need subsequent re-assessment to re-evaluate processes in line with their updated performance. If one or more of the feasibility statistics fall within the RED zone, this signals 'STOP' and concludes that a main trial is not feasible on the basis of those criteria. Collectively appraising progression across all feasibility outcomes in this way is conservative, as the power of the collective will be lower than the power of each separate test; hence, we recommend setting the individual power high (for example, 90–95%) so that the collective power remains sufficient (e.g. at least 70 or 80%) to detect true 'GO' signals across all the feasibility criteria.
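
As a rough illustration of why high individual power is needed, if the criteria were treated as independent tests (an assumption made here purely for illustration; correlated criteria would behave differently), the collective power is simply the product of the individual powers:

```r
# Illustrative only: collective power across k independent criteria,
# each tested with the same individual power.
collective_power <- function(pow, k) pow^k

collective_power(0.90, 3)  # 0.729 — three criteria at 90% power each
collective_power(0.95, 3)  # 0.857 — three criteria at 95% power each
```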

In this article, we also expand the possibilities for progression criteria and hypothesis testing by sub-dividing the AMBER zone, at the significance threshold A_C, according to the p value. This may work well when the AMBER zone has a wide range, and it is intended to provide a useful, workable indication of the level of amendment ('minor' (non-substantive) or 'major' (substantive)) required to progress to the main trial. Examples of substantive amendments include study re-design with possible re-appraisal and change of statistical parameters, inclusion of several additional sites, adding further recruitment methods, significant reconfiguration of exclusions, major change to the method of delivering the trial intervention to ensure enhanced treatment fidelity/adherence, enhanced measures to systematically ensure greater patient compliance with allocated treatment, and additional mode(s) of collecting and retrieving data (e.g. electronic data collection in addition to postal questionnaires). Minor amendments include small changes to the protocol and methodology, e.g. adding one or two sites to attain a slightly higher recruitment rate, using occasional reminders in regard to the treatment protocol, and adding a further reminder process to boost follow-up. For the most likely parametrisation of α = 0.05/β = 0.1, the AMBER zone division will fall roughly at the midpoint. However, researchers can choose this major/minor cut-point based on decisive arguments about how major and minor amendments align with the outcome in question, and this should be factored into the sample size determination for the pilot. In this regard, a smaller sample size will move A_C upwards (due to increased standard error/reduced precision) and hence increase the size of the AMBER_R zone relative to AMBER_G, whereas a larger sample size will shift A_C downwards and do the opposite, increasing the ratio of AMBER_G:AMBER_R. From Table 1, for smaller sample sizes (related to 80% power) the AMBER_R zone makes up 56–69% of the total AMBER zone across the presented scenarios, falling to 47–61% for medium samples (related to 90% power) and 41–56% for larger samples (related to 95% power) for the same scenarios. Beyond our proposed 4-tier approach, other ways of indicating the level of amendment could include reviewing the point and interval estimates or evaluating posterior probabilities via a Bayesian approach [ 14 , 32 ].
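
One simple way to locate A_C for a given design is to find the smallest observed result that is statistically significant against R_UL; the sketch below does this with the exact binomial tail probability. The name amber_cut is ours, and values in Table 1 derive from the paper's own parametrisations rather than this code.

```r
# Sketch: A_C as the smallest observed proportion whose one-sided exact
# binomial p-value against H0: p = R_UL falls below alpha.
amber_cut <- function(n, p0 = 0.50, alpha = 0.05) {
  x    <- 0:n
  pval <- 1 - pbinom(x - 1, n, p0)  # P(X >= x | p = p0)
  min(x[pval < alpha]) / n
}

amber_cut(25)  # 0.72 for n = 25
amber_cut(50)  # 0.64 for n = 50 — A_C moves downwards as n grows,
               # enlarging AMBER_G relative to AMBER_R
```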

The methodology illustrated here focuses on feasibility outcomes expressed as percentages/proportions, which is likely to be the most common form for progression criteria under consideration. However, the steps introduced can be readily adapted to any feasibility outcome in a numerical format, e.g. the rate of recruitment per month per centre or the count of centres taking part in the study. We also note that in the examples presented in the paper (recruitment, treatment fidelity and percent follow-up), high proportions are acceptable and low ones are not; this would not hold for, say, adverse events, where a reversed scale is required.

Biased sample estimates are a concern as they may result in a wrong decision being made. This systematic error is over and above the possibility of an erroneous decision arising from sampling error; the latter may be reduced through an increased pilot sample size. A positive bias will inflate/overestimate the feasibility sample estimate in favour of progressing, whereas a negative bias will deflate/underestimate it towards the null and stopping. Both are problematic for opposite reasons: the former may inform researchers that the main trial can 'GO' ahead when in fact it will struggle to meet key feasibility targets, whereas the latter may caution against progression when in reality the feasibility targets of a main trial would be met. For example, in regard to the choice of centres (and hence practitioners and participants), a common concern is that the selection of feasibility trial centres may not be a fair and representative sample of the 'population' of centres to be used for the main trial. The host centre (typically used in pilot studies) may recruit far better than others (positive bias), thus exaggerating the signal to progress and the expected recruitment for the main trial. Beets et al. [ 33 ] 'define "risk of generalizability biases" as the degree to which features of the intervention and sample in the pilot study are NOT scalable or generalizable to the next stage of testing in a larger, efficacy/effectiveness trial … whether aspects like who delivers an intervention, to whom it is delivered, or the intensity and duration of the intervention during the pilot study are sustained in the larger, efficacy/effectiveness trial.' As in other types of studies, safeguards against bias should be addressed through appropriate pilot study design and conduct.

Issues relating to progression criteria for internal pilots may differ from those for external pilots and non-randomised feasibility studies. The consequence of a 'STOP' within an internal pilot may be more serious for stakeholders (researchers, funders, patients), as it would end the planned continuation into the main trial phase, whereas less is at stake for a negative external pilot. By contrast, a 'GO' signal may work the other way, with a clear and immediate gain for the internal pilot, whereas for an external pilot the researchers would still need to apply for and secure the necessary funding and approvals to undertake the intended main trial. The chances of falling into the different traffic-light zones are also likely to differ between the two designs. External pilot and feasibility studies are arguably more likely to have estimates falling in and around the RED zone than internal pilots, reflecting greater uncertainty in the processes for the former and greater confidence in the mechanisms for trial delivery for the latter. To counter this, however, there are often large challenges with recruitment within internal pilot studies, where the target population is usually spread over more diverse sites than may be expected for an external pilot. Despite this possible imbalance, the interpretation of zonal indications remains consistent for external and internal pilot studies. As such, our recommendations in this article are aligned to the requirements of external pilots, though the methodology may to a degree similarly apply to internal pilots (and, further, to non-randomised studies that can include progression criteria, such as longitudinal observational cohorts, with the omission of the treatment fidelity criterion).

Conclusions

We propose a novel framework that represents a paradigm shift towards formally testing progression criteria in pilot and feasibility studies. The outlined approach ensures rigorous and transparent reporting in line with CONSORT recommendations for the evaluation of STOP-AMEND-GO criteria and presents clear progression signposting that should aid decision-making and inform stakeholders. The targeted progression criteria focus on recommended pilot and feasibility objectives, particularly recruitment uptake, treatment fidelity and participant retention, and these criteria guide the methodology for sample size derivation and statistical testing. This methodology is intended to provide a more definitive and rounded structure for pilot and feasibility design and evaluation than currently exists. Sample size recommendations will depend on the nature and cut-points of the multiple key pre-defined progression criteria and should ensure a sufficient sample size for other feasibility objectives, such as reviewing the precision of clinical parameters to better inform the main trial size.

Availability of data and materials

Not applicable.

Abbreviations

α: Significance level (Type I error probability)

AMBER_G: AMBER sub-zone adjacent to the GREEN zone (within the 4-tiered approach)

AMBER_R: AMBER sub-zone adjacent to the RED zone (within the 4-tiered approach)

A_C: Statistical-significance threshold within the AMBER zone; an observed estimate below the cut-point gives a non-significant result (p ≥ 0.05) and estimates at or above the cut-point are significant (p < 0.05)

A_C%: A_C expressed as a percentage of the sample size

β: Type II error probability

E: Estimate of feasibility outcome

F: True feasibility parameter

G_LL: Lower limit of the GREEN zone

n: Sample size (n_s = number of patients screened; n_r = number of patients randomised; n_i = number of patients randomised to the intervention arm only)

Power: 1 − β (the complement of the Type II error probability)

R_UL: Upper limit of the RED zone

Lancaster GA, Dodd S, Williamson PR. Design and analysis of pilot studies: recommendations for good practice. J Eval Clin Pract. 2004;10(2):307–12.

Eldridge SM, Chan CL, Campbell MJ, Bond CM, Hopewell S, Thabane L, et al. CONSORT 2010 statement: extension to randomised pilot and feasibility trials. Pilot Feasibility Stud. 2016;2:64.

McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, et al. What influences recruitment to randomised controlled trials? A review of trials funded by two UK funding agencies. Trials. 2006;7:9.

Sully BG, Julious SA, Nicholl J. A reinvestigation of recruitment to randomised, controlled, multicenter trials: a review of trials funded by two UK funding agencies. Trials. 2013;14:166.

Walters SJ, Bonacho Dos Anjos Henriques-Cadby I, Bortolami O, Flight L, Hind D, Jacques RM, et al. Recruitment and retention of participants in randomised controlled trials: a review of trials funded and published by the United Kingdom Health Technology Assessment Programme. BMJ Open. 2017;7(3):e015276.

Julious SA. Sample size of 12 per group rule of thumb for a pilot study. Pharm Stat. 2005;4:287–91.

Thabane L, Ma J, Chu R, Cheng J, Ismaila A, Rios LP, et al. A tutorial on pilot studies: the what, why and how. BMC Med Res Methodol. 2010;10:1.

Browne RH. On the use of a pilot sample for sample size determination. Stat Med. 1995;14:1933–40.

Hertzog MA. Considerations in determining sample size for pilot studies. Res Nurs Health. 2008;31(2):180–91.

Sim J, Lewis M. The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency. J Clin Epidemiol. 2012;65(3):301–8.

Whitehead AL, Julious SA, Cooper CL, Campbell MJ. Estimating the sample size for a pilot randomised trial to minimise the overall trial sample size for the external pilot and main trial for a continuous outcome variable. Stat Methods Med Res. 2016;25(3):1057–73.

Teare MD, Dimairo M, Shephard N, Hayman A, Whitehead A, Walters SJ. Sample size requirements to estimate key design parameters from external pilot randomised controlled trials: a simulation study. Trials. 2014;15:264.

Cocks K, Torgerson DJ. Sample size calculations for pilot randomized trials: a confidence interval approach. J Clin Epidemiol. 2013;66(2):197–201.

Lee EC, Whitehead AL, Jacques RM, Julious SA. The statistical interpretation of pilot trials: should significance thresholds be reconsidered? BMC Med Res Methodol. 2014;14:41.

Johanson GA, Brooks GP. Initial scale development: sample size for pilot studies. Edu Psychol Measurement. 2010;70(3):394–400.

Billingham SA, Whitehead AL, Julious SA. An audit of sample sizes for pilot and feasibility trials being undertaken in the United Kingdom registered in the United Kingdom Clinical Research Network database. BMC Med Res Methodol. 2013;13:104.

Herbert E, Julious SA, Goodacre S. Progression criteria in trials with an internal pilot: an audit of publicly funded randomised controlled trials. Trials. 2019;20(1):493.

Avery KN, Williamson PR, Gamble C, O’Connell Francischetto E, Metcalfe C, Davidson P, et al. Informing efficient randomised controlled trials: exploration of challenges in developing progression criteria for internal pilot studies. BMJ Open. 2017;7(2):e013537.

Arain M, Campbell MJ, Cooper CL, Lancaster GA. What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Med Res Methodol. 2010;10:67.

Leon AC, Davis LL, Kraemer HC. The role and interpretation of pilot studies in clinical research. J Psychiatr Res. 2011;45(5):626–9.

Horne E, Lancaster GA, Matson R, Cooper A, Ness A, Leary S. Pilot trials in physical activity journals: a review of reporting and editorial policy. Pilot Feasibility Stud. 2018;4:125.

Wilson DT, Walwyn RE, Brown J, Farrin AJ, Brown SR. Statistical challenges in assessing potential efficacy of complex interventions in pilot or feasibility studies. Stat Methods Med Res. 2016;25(3):997–1009.

Sim J. Should treatment effects be estimated in pilot and feasibility studies? Pilot Feasibility Stud. 2019;5:107.

Moore CG, Carter RE, Nietert PJ, Stewart PW. Recommendations for planning pilot studies in clinical and translational research. Clin Transl Sci. 2011;4(5):332–7.

Schoenfeld D. Statistical considerations for pilot studies. Int J Radiat Oncol Biol Phys. 1980;6(3):371–4.

Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions. 3rd ed. New York: John Wiley & Sons; 2003. p. 32.

Faul F, Erdfelder E, Lang AG, Buchner A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods. 2007;39:175–91.

Julious SA. Pilot studies in clinical research. Stat Methods Med Res. 2016;25(3):995–6.

Lancaster GA. Pilot and feasibility studies come of age! Pilot Feasibility Stud. 2015;1(1):1.

Charlesworth G, Burnell K, Hoe J, Orrell M, Russell I. Acceptance checklist for clinical effectiveness pilot trials: a systematic approach. BMC Med Res Methodol. 2013;13:78.

Blatch-Jones AJ, Pek W, Kirkpatrick E, Ashton-Key M. Role of feasibility and pilot studies in randomised controlled trials: a cross-sectional study. BMJ Open. 2018;8(9):e022233.

Willan AR, Thabane L. Bayesian methods for pilot studies. Clin Trials. 2020;17(4):414–9.

Beets MW, Weaver RG, Ioannidis JPA, Geraci M, Brazendale K, Decker L, et al. Identification and evaluation of risk of generalizability biases in pilot versus efficacy/effectiveness trials: a systematic review and meta-analysis. Int J Behav Nutr Phys Act. 2020;17:19.

Acknowledgements

We thank Professor Julius Sim, Dr Ivonne Solis-Trapala, Dr Elaine Nicholls and Marko Raseta for their feedback on the initial study abstract.

KB was supported by a UK 2017 NIHR Research Methods Fellowship Award (ref RM-FI-2017-08-006).

Author information

Authors and Affiliations

Biostatistics Group, School of Medicine, Keele University, Room 1.111, David Weatherall Building, Keele, Staffordshire, ST5 5BG, UK

M. Lewis, K. Bromley, G. McCray & G. A. Lancaster

Keele Clinical Trials Unit, Keele University, Keele, Staffordshire, UK

M. Lewis, K. Bromley, G. McCray, H. L. Myers & G. A. Lancaster

Centre for Biostatistics, School of Health Sciences, University of Manchester, Manchester, UK

C. J. Sutton

Contributions

ML and CJS conceived the original methodological framework for the paper. ML prepared draft manuscripts. KB and GMcC provided examples and illustrations. All authors contributed to the writing and provided feedback on drafts, along with steers and suggestions for updating the article. All authors read and approved the final manuscript.

Corresponding author

Correspondence to M. Lewis.

Ethics declarations

Ethics approval and consent to participate, consent for publication, and competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

R code used for Fig. 2.

Mathematical formulae for derivation of sample size

The required sample size may be derived using a normal approximation to binary response data with a continuity correction, via Fleiss et al. [ 26 ], provided the convention np > 5 and n(1 − p) > 5 holds true:

$$n = \frac{\left[z_{1-\alpha}\sqrt{R_{UL}\,(1 - R_{UL})} + z_{1-\beta}\sqrt{G_{LL}\,(1 - G_{LL})}\right]^{2}}{\left(G_{LL} - R_{UL}\right)^{2}}$$

with the continuity-corrected size

$$n_{cc} = \frac{n}{4}\left(1 + \sqrt{1 + \frac{2}{n\left(G_{LL} - R_{UL}\right)}}\right)^{2}$$

where R_UL = upper limit of the RED zone; G_LL = lower limit of the GREEN zone; z_{1−α} = standard normal deviate for the one-sided significance level (Type I error probability α); and z_{1−β} = standard normal deviate corresponding to power 1 − β (Type II error probability β).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Lewis, M., Bromley, K., Sutton, C.J. et al. Determining sample size for progression criteria for pragmatic pilot RCTs: the hypothesis test strikes back! Pilot Feasibility Stud 7, 40 (2021). https://doi.org/10.1186/s40814-021-00770-x

Received: 23 April 2020

Accepted: 07 January 2021

Published: 03 February 2021

DOI: https://doi.org/10.1186/s40814-021-00770-x


Keywords: Outcome and process assessment; Sample size; Statistics


Part 1. Overview Information

National Institutes of Health ( NIH )

National Institute on Aging ( NIA )

National Institute on Alcohol Abuse and Alcoholism ( NIAAA )

National Institute of Arthritis and Musculoskeletal and Skin Diseases ( NIAMS )

Eunice Kennedy Shriver National Institute of Child Health and Human Development ( NICHD )

National Institute of Neurological Disorders and Stroke ( NINDS )

R15 Research Enhancement Award (REA)

  • April 4, 2024  - Overview of Grant Application and Review Changes for Due Dates on or after January 25, 2025. See Notice NOT-OD-24-084 .
  • August 31, 2022 - Implementation Changes for Genomic Data Sharing Plans Included with Applications Due on or after January 25, 2023. See Notice  NOT-OD-22-198 .
  • August 5, 2022 - Implementation Details for the NIH Data Management and Sharing Policy. See Notice  NOT-OD-22-189 .

See Part 2, Section III. 3. Additional Information on Eligibility.

The purpose of this HEAL Initiative program is to: (1) support basic and mechanistic pain research from R15-eligible undergraduate-focused institutions, health professional schools, or graduate schools; (2) promote integrated, interdisciplinary research partnerships between Principal Investigators (PIs) from R15-eligible institutions and investigators from U.S. domestic institutions; and (3) enhance the pain research environment at the R15-eligible institution for health professional, undergraduate and/or graduate students through active engagement in pain research.

Applications in response to this notice of funding opportunity (NOFO) should include plans to accomplish these goals. Specifically, applications should include a rigorous plan for conducting basic and mechanistic pain research in the Research Strategy section of the application. In addition, a research partnership between the PI’s institution and at least one investigator from a separate U.S. domestic institution that provides resources and/or expertise that will enhance the proposed pain research program must be included in a separate Team Management Plan. The proposed partnership will be a sub-award agreement(s) with at least one partnering institution, which does not need to be R15-eligible. The budget of all sub-awards must not exceed one third of the total budget. Furthermore, applications must include a Facilities & Other Resources document that demonstrates active involvement of health professional students, undergraduate and/or graduate students from the R15-eligible institution(s) in the proposed pain research projects.

This Notice of Funding Opportunity (NOFO) requires a Plan for Enhancing Diverse Perspectives (PEDP).

Letter of Intent Due Date: 30 days before the application due date.

Application Due Dates, Review, and Award Cycles:

New: November 19, 2024 | Renewal/Resubmission/Revision (as allowed): Not Applicable | AIDS (New/Renewal/Resubmission/Revision, as allowed): December 18, 2024 | Scientific Merit Review: March 2025 | Advisory Council Review: May 2025 | Earliest Start Date: July 2025
New: October 28, 2025 | Renewal/Resubmission/Revision (as allowed): October 28, 2025 | AIDS (New/Renewal/Resubmission/Revision, as allowed): November 24, 2025 | Scientific Merit Review: March 2026 | Advisory Council Review: May 2026 | Earliest Start Date: July 2026
New: October 27, 2026 | Renewal/Resubmission/Revision (as allowed): October 27, 2026 | AIDS (New/Renewal/Resubmission/Revision, as allowed): November 23, 2026 | Scientific Merit Review: March 2027 | Advisory Council Review: May 2027 | Earliest Start Date: July 2027

All applications are due by 5:00 PM local time of applicant organization. 

Applicants are encouraged to apply early to allow adequate time to make any corrections to errors found in the application during the submission process by the due date.

No late applications will be accepted for this Notice of Funding Opportunity (NOFO).

Not Applicable

It is critical that applicants follow the instructions in the Research (R) Instructions in the How to Apply - Application Guide, except where instructed to do otherwise (in this NOFO or in a Notice from the NIH Guide for Grants and Contracts).

Conformance to all requirements (both in the How to Apply - Application Guide and the NOFO) is required and strictly enforced. Applicants must read and follow all application instructions in the How to Apply - Application Guide as well as any program-specific instructions noted in Section IV. When the program-specific instructions deviate from those in the How to Apply - Application Guide , follow the program-specific instructions.

Applications that do not comply with these instructions may be delayed or not accepted for review.

There are several options available to submit your application through Grants.gov to NIH and Department of Health and Human Services partners. You must use one of these submission options to access the application forms for this opportunity.

  • Use the NIH ASSIST system to prepare, submit and track your application online.
  • Use an institutional system-to-system (S2S) solution to prepare and submit your application to Grants.gov and eRA Commons to track your application. Check with your institutional officials regarding availability.
  • Use Grants.gov Workspace to prepare and submit your application and eRA Commons to track your application.

Part 2. Full Text of Announcement

Section I. Notice of Funding Opportunity Description

Applications in response to this notice of funding opportunity (NOFO) should include plans to accomplish these goals. Specifically, applications should include a rigorous plan for conducting basic and mechanistic pain research projects in the Research Strategy section of the application. In addition, a research partnership between the PI’s institution and at least one investigator from a separate U.S. domestic institution that provides resources and/or expertise that will enhance the proposed pain research program must be included in a separate Team Management Plan. The proposed partnership will be a sub-award agreement(s) with at least one partnering institution, which does not need to be R15-eligible. The budget of all sub-awards must not exceed one third of the total budget. Furthermore, applications must include a Facilities & Other Resources document that demonstrates active involvement of health professional students, undergraduate and/or graduate students from the R15-eligible institution(s) in the proposed pain research projects.

The National Institutes of Health (NIH) Helping to End Addiction Long-term® Initiative, or NIH HEAL Initiative®, bolsters research across NIH to (1) improve treatment for opioid misuse and addiction and (2) enhance pain management. More information about the NIH HEAL Initiative is available at https://heal.nih.gov/ . Research shows that diverse teams working together and capitalizing on innovative ideas and distinct perspectives outperform homogeneous teams. Scientists and trainees from diverse backgrounds and life experiences bring different perspectives, creativity, and individual enterprise to address complex scientific problems. See the Notice of NIH’s Interest in Diversity ( NOT-OD-20-031 ) for more details. Promoting diversity in the pain research workforce is crucial to promoting future scientific advances in this area and to achieving the NIH HEAL Initiative’s workforce development goals. The initiative has funded multiple pain workforce enhancement programs that support early-career investigators. Despite these efforts, the NIH HEAL Initiative can benefit from additionally supporting R15-eligible institutions that involve undergraduate, graduate, or health professional school/college students in pain research.

Since Fiscal Year (FY) 1985, NIH has made a special effort to stimulate research at educational institutions that provide baccalaureate and/or advanced degrees for a significant number of the nation’s research scientists but that have not been major recipients of NIH support. NIH has implemented two parent award programs, the Academic Research Enhancement Award (AREA) program ( PAR-21-155 ) and the Research Enhancement Award Program (REAP) ( PAR-22-060 ), to provide research experiences to health professional, undergraduate and/or graduate students pursuing biomedical or behavioral research at U.S. higher education institutions. Utilizing these two programs will further promote a diverse pain research workforce. This Pain Research Enhancement Program (“PREP”) will further support meritorious collaborative pain research at the designated educational levels within the NIH HEAL Initiative, using the NIH Research Enhancement Award programs as a guide. Specifically, this NOFO aims to support new scientific solutions to the national opioid public health crisis by establishing new research partnerships that will lead to research experiences for undergraduate, graduate, and health professional students, further enhancing the pool of potential participants in the pain research pipeline.

Program Objectives:

The purpose of this HEAL Initiative program is to: (1) support basic and mechanistic pain research from R15-eligible undergraduate-focused institutions, health professional schools, or graduate schools; (2) promote integrated, interdisciplinary research partnerships between Principal Investigators (PIs) from R15-eligible institutions and investigators from U.S. domestic institutions; and (3) enhance the pain research environment at the R15-eligible institution for health professional, undergraduate and/or graduate students through active engagement in pain research. Successful applications will include plans detailing how they intend to accomplish all three goals. Please refer to Section III for specific R15 eligibility information. Although preliminary data are not required for an R15 application, they may be included if available. The scientific foundation for the proposed research should be based on published research and/or any available preliminary data.

Objective 1: Develop Small-Scale Basic and Mechanistic Pain Research Projects

Proposed research projects should be hypothesis driven and use a rigorous scientific design to generate research data/evidence and advance scientific knowledge. Applications should include objectives that are attainable within the 3-year grant period.

Pain research projects may include, but are not limited to, the study of: nociception and/or pain processing in non-pain populations, acute pain, cancer pain, chemotherapy-induced neuropathy, chronic pain, diabetic neuropathy, eye pain, gynecologic pain, headache, musculoskeletal pain, myofascial pain, obstetric pain, osteoarthritis, pain conditions across the lifespan (including in the context of aging), pain co-occurring with substance use disorders (SUDs), painful disorders of the orofacial region, painful neuropathy, post-stroke pain, post-surgical pain, sickle cell pain, and/or visceral pain. Innovative topics that propose interdisciplinary, mechanistic pain research are considered a high program priority under this initiative.

Projects may focus on basic pain research with pre-clinical (e.g., animal or in silico) models or involve research participants (e.g., observational studies, epidemiological studies, secondary data analyses, or device development). Alternatively, investigators may propose a mechanistic and/or “Basic Experimental Studies involving Humans” (BESH) clinical trial as described below. Clinical trials designed primarily to determine the safety, tolerability, and/or clinical efficacy of an intervention will be considered non-responsive to this NOFO and withdrawn without review.

For this NOFO, only the following types of clinical trials will be supported:

  • Basic Experimental Studies with Humans (BESH), defined as basic research studies involving humans that seek to understand the fundamental aspects of phenomena
  • Mechanistic trials, defined as studies designed to understand a biological or behavioral process, the pathophysiology of a disease, or the mechanism of action of an intervention (i.e., how an intervention works, but not whether it works or is safe)

NIH defines a clinical trial as a research study in which one or more human subjects are prospectively assigned to one or more interventions (which may include placebo or other control) to evaluate the effects of those interventions on health-related biomedical or behavioral outcomes ( https://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-015.html ). For further clarification on how NIH defines the different types of clinical trials, please refer to the following resources:

  • NOT-OD-15-015: Notice of Revised NIH Definition of Clinical Trial
  • NIH's Definition of a Clinical Trial
  • Decision Tree for NIH Clinical Trial Definition
  • Guidance for Basic Experimental Studies with Humans (BESH) Funding Opportunities
  • NIH Definition of Clinical Trial Case Studies

Objective 2: Promote Integrated, Interdisciplinary Research Partnerships

A second key objective of this NOFO is to promote new research partnerships between investigators at R15-eligible institutions and investigators at separate (legally distinct) domestic research institutions. Collaborating investigators can contribute a multitude of research expertise that aligns with the proposed research projects and/or resources that can be shared to enhance the proposed research. Applications must propose a collaboration with at least one sub-award holder from a separate U.S. domestic research institution, and details of how the collaboration will enhance the R15 research program must be described. Applications are permitted to have a subaward to a non-R15-eligible institution. However, it is expected that the PD/PI(s) from the R15-eligible institution(s) will lead the proposed project and complete most of the research at the R15-eligible institution. As such, a PI from an R15-eligible institution must serve as the contact program director (PD)/PI for the project. Additionally, no more than one third of the total budget for the project may be used by the identified sub-award institution.

Applications that propose new interdisciplinary partnerships are considered a high program priority under this NOFO. Interdisciplinary partnerships could include, but are not limited to, any two or more areas of research expertise from the following:

  • Clinical pain management (e.g., nonpharmacologic or pharmacologic interventions)
  • Clinical pain research
  • Preclinical/basic pain biology and modeling
  • Specific disease and/or pathological conditions (either human or preclinical models)
  • Animal behavior
  • Artificial intelligence
  • Data science

In addition, a Team Management Plan is required as part of Objective 2. Studies of team science have highlighted the need for effective management structures to achieve program goals. Many resources exist to aid in developing effective team-based programs (e.g., the National Cancer Institute Collaboration and Team Science Field Guide ). The Team Management Plan focuses on management of the whole team/key personnel. Because teams will likely include individuals from widely divergent scientific backgrounds, teams must have a shared vision and a defined plan for communication and management of shared responsibilities, interpersonal interactions, and professional credit. The Team Management Plan should be included as an attachment (three pages maximum) to the application. It should address how the research team, including the PI from the R15-eligible institution and collaborator(s), will work together to accomplish program objectives. See the application instructions for “Other Attachments” on the SF424(R&R) Other Project Information form in Section IV.2 Instructions for Application Submission for details. The Team Management Plan should address the following points:

  • Organizational structure and team composition and roles
  • Shared leadership, contributions, and distributed responsibility for decision-making
  • Resource sharing and allocation
  • Credit assignment and/or intellectual property (IP) rights
  • Coordination and communication plans
  • Intra-team data sharing, archiving, and preservation

Objective 3: Enhance the Research Environment by Engaging Students

The third objective of this program is to enhance the pain research environment at the R15-eligible institution by engaging and providing research opportunities to health professional, undergraduate and/or graduate students. A Facilities & Other Resources document is required to describe how the proposed research will enhance the pain research environment at the R15-eligible institution. Two-thirds of the proposed research project team should comprise personnel from the R15-eligible institution(s), including health professional, graduate, or undergraduate students from the primary R15-eligible institution. Although the proposed research project must be led by the identified PD/PI, applications with strong and innovative student engagement are a high program priority. If participating students have not yet been identified, the number and academic stage of those to be involved should be provided. Applications should identify which aspects of the proposed research will include student participation. Student involvement may include participation in the design of experiments, collection and analysis of data, execution and troubleshooting of experiments, participation in research meetings, and discussion of future research directions. When applicable, it is highly desirable that student participation also include presentation of research at local and/or national meetings (including the HEAL Annual Scientific Meeting and the “Positively Uniting Researchers of Pain to Opine, Synthesize, & Engage” (PURPOSE) meeting), publication of journal articles, and collaborative interactions. By engaging in these activities and collaborating on pain-focused research projects at early stages of training, students will be better prepared and motivated to pursue careers in pain research. Please see Section III for a list of eligible students.

This NOFO aims to support pain research grants, not training or fellowship programs. As such, applications should not include training plans such as didactic training or non-research activities related to professional development. Likewise, applications should not include independent student research projects. For applications that propose a clinical trial, the PD/PI must be the responsible individual of record for oversight of the trial, though students can take part in all components of a clinical trial. Oversight includes (but is not limited to): interacting with relevant Institutional Review Board (IRB) staff; reviewing all informed consent documents; reporting potential serious adverse events; and maintaining responsibility for patient safety. However, students can gain experience in all these components in conjunction with the individual leading the trial. Applications submitted to this NOFO may include investigators additional to those outlined above, including collaborators, consultants, or other individuals such as high school students, post-baccalaureate participants, postdoctoral fellows, or clinical fellows. However, involvement of such individuals does not fulfill the goal of enhancing the R15-eligible institutional environment, and they should account for less than one third of the overall proposed project team.

Additional Information

Non-responsiveness Criteria:

Applications deemed to be non-responsive will not proceed to review and will be withdrawn. Applications with one or more of the following characteristics are considered non-responsive to this NOFO:

  • Research that does not address the NIH HEAL Initiative mission to enhance pain management.
  • Failure to describe a proposed research plan and specific aims primarily led by a PI from an R15-eligible institution.
  • Omission of a domestic research partnership and accompanying sub-award(s), or inclusion of sub-award(s) that account for more than one third of the total project budget.
  • Failure to include the required Facilities & Other Resources document and Other Attachments, including a Team Management Plan and letters of support (a letter of support from the identified subaward holder(s) and a letter of support from the R15-eligible institution’s provost). Please see Section IV.2 “Instructions for Application Submission” for details.
  • Proposing a clinical trial addressing safety, tolerability, efficacy, and/or effectiveness of pharmacologic, behavioral, biologic, surgical, or device (invasive or noninvasive) interventions.

Contacting Program Officers Prior to Submission

Applicants are strongly encouraged to consult with program staff as plans for an application are being developed.

Rigor and Reproducibility

NIH strives for rigor and transparency in all research it funds. For this reason, the NIH HEAL Initiative explicitly emphasizes the NIH application instructions related to rigor and transparency ( https://grants.nih.gov/policy/reproducibility/guidance.htm ) and provides additional guidance from individual NIH institutes and centers (ICs) to the scientific community. For example, the biological rationale for the proposed experiments must be based on rigorous and robust supporting data, which means that data should be collected via methods that minimize the risk of bias and be reported in a transparent manner. If previously published or preliminary studies do not meet these standards, applicants should address how the current study design addresses the deficiencies in rigor and transparency. Proposed experiments should likewise be designed in a manner that minimizes the risk of bias and ensures validity of experimental results.

Proposed research projects should incorporate adequate methodological rigor where applicable, including but not limited to a clear rationale for the chosen model(s) and primary/secondary endpoint(s), clear descriptions of tools and parameters, blinding, randomization, adequate sample size, prespecified inclusion/exclusion criteria, appropriate handling of missing data and outliers, appropriate controls, pre-planned analyses, and appropriate quantitative techniques.

Applications should also clearly indicate the exploratory vs. confirmatory components of the study, consider study limitations, and plan for transparent reporting of all methods, analyses, and results so that other investigators can evaluate the quality of the work and potentially perform replications. NIH intends to maximize the impact of NIH HEAL Initiative-supported projects through broad and rapid data sharing and immediate access to publications ( https://heal.nih.gov/about/public-access-data ). Guidelines for complying with the HEAL Public Access and Data Sharing Policy can be found at  https://heal.nih.gov/data/complying-heal-data-sharing-policy . More details about NIH HEAL Initiative data sharing are described in Section IV.

Clinical Trial Accrual Policy:

For applications that are proposing to conduct a clinical trial, a series of clinical recruitment milestones detailing completion of the clinical trial and providing contingency plans to proactively confront potential delays or disturbances in attaining the clinical recruitment milestones must be included along with a study timeline in the PHS Human Subjects and Clinical Trials Information form. Continuation of the award is conditional upon satisfactory progress, availability of funds, and scientific priorities of the NIH HEAL Initiative. If, at any time, recruitment falls significantly below the projected milestones for recruitment, NIH will consider ending support and negotiating an orderly phaseout of the award. NIH retains the option of periodic external peer review of progress. NIH program staff will closely monitor progress at all stages for milestones, accrual, and safety.  

Expected Activities of Coordination

NIH HEAL Initiative awardees are strongly encouraged to cooperate and coordinate their activities. It is expected that NIH HEAL Initiative awardees will do so post-award by participating in PD/PI meetings, including:

NIH HEAL Initiative Scientific Meeting Attendance

Applicants and students are highly encouraged to attend the annual NIH HEAL Initiative Scientific Meetings. The NIH HEAL Initiative hosts an annual meeting of more than 800 NIH HEAL Initiative-funded researchers across the initiative’s research portfolio and career stage spectrum, NIH staff, people with lived and living experience, community partners advising initiative-funded projects, advocacy groups, and other stakeholders to

  • Share research advances and cutting-edge science
  • Discover opportunities, challenges, and approaches to build on the initiative’s progress
  • Connect and explore collaboration with other NIH HEAL Initiative-funded researchers and collaborators to enhance initiative-funded research.

Annual National Pain Scientists Career Development Program (PURPOSE) Meeting

Applicants and students are also highly encouraged to enroll in the HEAL Initiative: Positively Uniting Researchers of Pain to Opine, Synthesize, and Engage (PURPOSE) network and attend its annual meetings. Details can be found at https://painresearchers.com . The HEAL R24 Coordinating Center for National Pain Scientists works to improve collaboration between basic, translational, and clinical researchers who do not regularly work together. One function of the Center is to organize an annual meeting for established scientists as well as early-career pain investigators. This annual meeting facilitates the creation of a network of pain research mentors and mentees, fosters communication between scientists and clinicians of different disciplines, and provides enhanced mentorship, leadership courses, and any additional training that might be helpful for early-career scientists. R15 recipients are encouraged to attend the annual PURPOSE meeting, either virtually or in person.

See Section VIII. Other Information for award authorities and regulations.

Plan for Enhancing Diverse Perspectives (PEDP)

The NIH recognizes that teams comprised of investigators with diverse perspectives working together and capitalizing on innovative ideas and distinct viewpoints outperform homogeneous teams. There are many benefits that flow from a scientific workforce rich with diverse perspectives, including: fostering scientific innovation, enhancing global competitiveness, contributing to robust learning environments, improving the quality of the research, advancing the likelihood that underserved populations participate in and benefit from research, and enhancing public trust. To support the best science, the NIH encourages inclusivity in research guided by the consideration of diverse perspectives. Broadly, diverse perspectives can include but are not limited to the educational background and scientific expertise of the people who perform the research; the populations who participate as human subjects in research studies; and the places where research is done. This NOFO requires a Plan for Enhancing Diverse Perspectives (PEDP), which will be assessed as part of the scientific and technical peer review evaluation. Assessment of applications containing a PEDP is based on the scientific and technical merit of the proposed project. Consistent with federal law, the race, ethnicity, or sex (including gender identity, sexual orientation, or transgender status) of a researcher, award participant, or trainee will not be considered during the application review process or when making funding decisions. Applications that fail to include a PEDP will be considered incomplete and will be administratively withdrawn before review. The PEDP will be submitted as Other Project Information as an attachment (see Section IV). Applicants are strongly encouraged to read the NOFO instructions carefully and view the available PEDP guidance materials.

Investigators proposing NIH-defined clinical trials may refer to the Research Methods Resources website for information about developing statistical methods and study designs.

Section II. Award Information

Grant: A financial assistance mechanism providing money, property, or both to an eligible entity to carry out an approved project or activity.

The OER Glossary and the How to Apply - Application Guide provide details on these application types. Only those application types listed here are allowed for this NOFO.

Optional: Accepting applications that either propose or do not propose clinical trial(s).

Need help determining whether you are doing a clinical trial?

The NIH HEAL Initiative intends to commit an estimated total of $1.25 million to fund up to three awards per year for FY 2025, FY 2026, and FY 2027. Support for this funding opportunity is contingent upon annual NIH appropriations and the submission of a sufficient number of meritorious applications.

Applicants may request up to $375,000 in direct costs for the entire project period. No more than one third of total project costs may go to non-R15-eligible institutions; for example, a project requesting the full $375,000 in direct costs could allocate at most $125,000 to sub-awards at non-R15-eligible institutions. Annual inflationary increases are not allowed.

The scope of the proposed project should determine the project period. The maximum project period is 3 years. 

NIH grants policies as described in the NIH Grants Policy Statement will apply to the applications submitted and awards made from this NOFO.

Section III. Eligibility Information

1. eligible applicants eligible organizations higher education institutions public/state controlled institutions of higher education private institutions of higher education the following types of higher education institutions are always encouraged to apply for nih support as public or private institutions of higher education: hispanic-serving institutions historically black colleges and universities (hbcus) tribally controlled colleges and universities (tccus) alaska native and native hawaiian serving institutions asian american native american pacific islander serving institutions (aanapisis) in addition, applicant organizations must meet the following criteria at the time of submission: the applicant organization must be an accredited public or nonprofit private school that grants baccalaureate or advanced degrees in health professions (see section below for more details) or biomedical and behavioral sciences. the application must be submitted by the eligible organization with a unique entity identifier (such as uei or duns) and a unique nih era institutional profile file (ipf) number. at the time of application submission, determination of eligibility will be based in part on nih institutional support. a year is defined as a federal fiscal year: from october 1 through september 30.   note that collaborating subawardees do not need to adhere to the r15 eligibility criteria stated above. however, they must be separate legal entities that fulfill the terms of an eligible subaward agreement. for this particular nofo, they must also be u.s. domestic institutions. more details can be found at https://grants.nih.gov/policy/subawards . undergraduate focused institutions: at the time of application submission, all the non-health professional components of the institution combined must not have received support from the nih totaling more than $6 million per year (in both direct and f&a/indirect costs) in 4 of the last 7 years. for institutions composed of multiple schools and colleges, the $6 million funding limit is based on the amount of nih funding received by all the non-health professional schools and colleges within the institution as a whole. note that all activity codes are included in this calculation except the following: c06, s10, and all activity codes starting with a g. help determining the organization funding level can be found at https://grants.nih.gov/grants/funding/determing-organization-funding-levels-r15-eligibility.pdf    an academic component is any school/college that is not a health professional school or college. a qualifying academic component (i.e., school/college) within an institution (e.g., school of arts and sciences) has greater undergraduate student enrollment than graduate student enrollment. all types of health professional schools and colleges are not eligible to apply and are not considered in this calculation.  for institutions with multiple campuses, eligibility can be considered for each individual campus (e.g., main, satellite, etc.) only if separate ueis and nih ipf numbers are established for each campus. for institutions that use one uei or nih ipf number for all campuses, eligibility is determined for all campuses (e.g., main, satellite, etc.) combined.   health professional and graduate schools   at the time of application submission, all components of the institution combined must not have received support from the nih totaling more than $6 million per year (in both direct and f&a/indirect costs) in 4 of the last 7 years. 
for institutions composed of multiple schools and colleges, the $6 million funding limit is based on the amount of nih funding received by all of the schools and colleges within the institution as a whole. note that all activity codes are included in this calculation except the following: c06, s10, and all activity codes starting with a g. a graduate school offers advanced degrees, beyond the undergraduate level, in an academic discipline including m.a., m.s., and ph.d. degrees. health professional schools and colleges are accredited institutions that provide education and training leading to a health professional degree, including but not limited to: b.s.n., m.s.n., d.n.p., m.d., d.d.s., d.o., pharm.d., d.v.m., o.d., d.p.t., d.c., n.d., d.p.m., m.o.t., o.t.d., d.p.t., m.s.-s.l.p., c.sc.d., s.l.p.d., au.d., m.s.p.o., m.s.a.t., and m.p.h. eligible health professional schools/colleges may include schools or colleges of nursing, medicine, dentistry, osteopathy, pharmacy, veterinary medicine, public health, optometry, allied health, chiropractic, naturopathy, podiatry, rehabilitation medicine, physical therapy, orthotics and prosthetics, kinesiology, occupational therapy, and psychology. accreditation must be provided by a body approved for such purpose by the secretary of education. for institutions with multiple campuses, eligibility can be considered for each individual campus (e.g., main, satellite, etc.) only if a unique identifier number and nih ipf number are established for each campus. for institutions that use one identifier number or nih ipf number for all campuses, eligibility is determined for all campuses (e.g., main, satellite, etc.) together. additional eligibility guidance a signed letter is required from the provost or similar official with institution-wide responsibility verifying the eligibility of the applicant institution at the time of application submission according to the eligibility criteria indicated above. see the application instructions for “other attachments” on the sf424(r&r) other project information form in section iv.2 instructions for application submission. final eligibility will be validated by nih prior to award. to assist in determining eligibility, organizations are encouraged to use the nih report website under nih awards by location & organization . a prep application must provide evidence of a subaward to a separate institution , and the grantee may partner with a non-r15-eligible institution. however, applicants should keep the goals of the prep in mind when preparing the application, which include strengthening the research environment of eligible institutions and engaging students from eligible institutions in pain research. it is expected that the project, and two-thirds of the total project budget, will be directed by the pd(s)/pi(s) at r15-eligible institution(s). a letter of support from each collaborator is required verifying the research collaboration at the time of application submission according to the eligibility criteria indicated above. the letter(s) should detail how the proposed research partnership will help to accomplish the proposed pain research project, enhance the r15-eligible institution’s research program, and promote synergy from an integrated, interdisciplinary research partnership(s) among the multiple proposed institutions. see the application instructions for “other attachments” on the sf424(r&r) other project information form in section iv.2 instructions for application submission. 
foreign organizations non-domestic (non-u.s.) entities (foreign organizations) are not eligible to apply. non-domestic (non-u.s.) components of u.s. organizations are not eligible to apply. foreign components, as defined in the nih grants policy statement , are allowed.  required registrations applicant organizations applicant organizations must complete and maintain the following registrations as described in the how to apply - application guide to be eligible to apply for or receive an award. all registrations must be completed prior to the application being submitted. registration can take 6 weeks or more, so applicants should begin the registration process as soon as possible. failure to complete registrations in advance of a due date is not a valid reason for a late submission, please reference nih grants policy statement section 2.3.9.2 electronically submitted applications for additional information system for award management (sam) – applicants must complete and maintain an active registration, which requires renewal at least annually . the renewal process may require as much time as the initial registration. sam registration includes the assignment of a commercial and government entity (cage) code for domestic organizations which have not already been assigned a cage code. nato commercial and government entity (ncage) code – foreign organizations must obtain an ncage code (in lieu of a cage code) in order to register in sam. unique entity identifier (uei) - a uei is issued as part of the sam.gov registration process. the same uei must be used for all registrations, as well as on the grant application. era commons - once the unique organization identifier is established, organizations can register with era commons in tandem with completing their grants.gov registrations; all registrations must be in place by time of submission. era commons requires organizations to identify at least one signing official (so) and at least one program director/principal investigator (pd/pi) account in order to submit an application. grants.gov – applicants must have an active sam registration in order to complete the grants.gov registration. program directors/principal investigators (pd(s)/pi(s)) all pd(s)/pi(s) must have an era commons account.  pd(s)/pi(s) should work with their organizational officials to either create a new account or to affiliate their existing account with the applicant organization in era commons. if the pd/pi is also the organizational signing official, they must have two distinct era commons accounts, one for each role. obtaining an era commons account can take up to 2 weeks. eligible individuals (program director/principal investigator) any individual(s) with the skills, knowledge, and resources necessary to carry out the proposed research as the program director(s)/principal investigator(s) (pd(s)/pi(s)) is invited to work with their organization to develop an application for support. individuals from diverse backgrounds, including individuals from underrepresented racial and ethnic groups, individuals with disabilities, and women are always encouraged to apply for nih support. see, reminder: notice of nih's encouragement of applications supporting individuals from underrepresented ethnic and racial groups as well as individuals with disabilities , not-od-22-019 . 
For institutions/organizations proposing multiple PDs/PIs, visit the Multiple Program Director/Principal Investigator Policy and submission details in the Senior/Key Person Profile (Expanded) Component of the How to Apply - Application Guide.

To be eligible for support under a PREP grant, the PD(s)/PI(s) must meet the following additional criteria:

  • Each PD/PI must have a primary appointment at either an R15-eligible institution, including professional or graduate schools, undergraduate-focused organizations, or a college within the applicant institution, as defined in "Eligible Organizations," above. If proposing multiple PD(s)/PI(s), each PD/PI must be at an R15-eligible institution.
  • Each PD/PI may not be the PD/PI of an active NIH research grant, including another R15 grant, at the time of award of a PREP grant, although they may be one of the key personnel for an active NIH grant held by another PD/PI.
  • Each PD/PI may not be awarded support under more than one R15 grant at a time, although he or she may have support under successive new or renewal grants.

2. Cost Sharing

This NOFO does not require cost sharing as defined in the NIH Grants Policy Statement Section 1.2 Definition of Terms.

3. Additional Information on Eligibility

Number of Applications

Applicant organizations may submit more than one application, provided that each application is scientifically distinct.

The NIH will not accept duplicate or highly overlapping applications under review at the same time, per NIH Grants Policy Statement Section 2.3.7.4 Submission of Resubmission Application . This means that the NIH will not accept:

  • A new (A0) application that is submitted before issuance of the summary statement from the review of an overlapping new (A0) or resubmission (A1) application.
  • A resubmission (A1) application that is submitted before issuance of the summary statement from the review of the previous new (A0) application.
  • An application that has substantial overlap with another application pending appeal of initial peer review (see  NIH Grants Policy Statement 2.3.9.4 Similar, Essentially Identical, or Identical Applications ).

Section IV. Application and Submission Information

1. Requesting an Application Package

The application forms package specific to this opportunity must be accessed through ASSIST, Grants.gov Workspace or an institutional system-to-system solution. Links to apply using ASSIST or Grants.gov Workspace are available in Part 1 of this NOFO. See your administrative office for instructions if you plan to use an institutional system-to-system solution.

2. Content and Form of Application Submission

It is critical that applicants follow the instructions in the Research (R) Instructions in the  How to Apply - Application Guide  except where instructed in this notice of funding opportunity to do otherwise. Conformance to the requirements in the How to Apply - Application Guide is required and strictly enforced. Applications that are out of compliance with these instructions may be delayed or not accepted for review.

Letter of Intent

Although a letter of intent is not required, is not binding, and does not enter into the review of a subsequent application, the information that it contains allows IC staff to estimate the potential review workload and plan the review.

By the date listed in Part 1. Overview Information , prospective applicants are asked to submit a letter of intent that includes the following information:

  • Descriptive title of proposed activity
  • Name(s), address(es), and telephone number(s) of the PD(s)/PI(s)
  • Names of other key personnel
  • Participating institution(s)
  • Number and title of this funding opportunity

The letter of intent should be sent to:

Jessica McKlveen, PhD
National Center for Complementary & Integrative Health (NCCIH)
Telephone: 301-594-8018
Email: [email protected]

Page Limitations

All page limitations described in the How to Apply – Application Guide and the Table of Page Limits must be followed.

The following section supplements the instructions found in the How to Apply – Application Guide and should be used for preparing an application to this NOFO.

SF424(R&R) Cover

All instructions in the How to Apply - Application Guide must be followed.

SF424(R&R) Project/Performance Site Locations

SF424(R&R) Other Project Information

Facilities & Other Resources (Required):

  • A profile of the scientific background, academic level, and expertise of the students of the applicant institution, and any information on or estimate of the number who have obtained a health professional baccalaureate or advanced degree and gone on to obtain an academic or professional doctoral or other advanced degree in the health-related sciences during the last 5 years.
  • Description of plans to build a broad team of prospective researchers, including students, with a variety of backgrounds, expertise, and skills, and to arrive at major decisions, accounting for different points of view. Personnel from the primary R15-eligible institution(s) should compose a two-thirds majority of the project team.
  • Description of the special characteristics of the applicant institution that make it appropriate for a PREP grant awarded through this NOFO to: (1) support the efforts by R15-eligible principal investigators (PIs) at undergraduate-focused institutions or health professional schools and graduate schools to conduct small-scale basic and mechanistic pain research projects; (2) promote integrated, interdisciplinary research partnerships between R15-eligible PIs and additional investigators from U.S. domestic institutions; and (3) enhance the pain research environment at the R15-eligible institution for health professional students or undergraduate and/or graduate students by actively engaging them in the proposed pain research projects.
  • Description of the likely impact of a PREP grant on the ability of the PD(s)/PI(s) to engage students in research.
  • Description of the likely impact of a PREP grant on the research environment of the applicant institution.
  • Description of the likely impact of the PREP grant on the ability of health professional or undergraduate and/or graduate students at the institution to gain experience conducting biomedical research.
  • Description of the resources of the grantee institution available for the proposed research (e.g., equipment, supplies, laboratory space, release time, matching funds).
  • Although the majority of the research project should be conducted at the R15-eligible institution, the use of special facilities or equipment at another institution is permitted. For any proposed research sites other than the applicant institution, provide a brief description of these resources and the access students will need and will have to them.

Applications without a Facilities & Other Resources document will be withdrawn. 

Other Attachments:

Applications that fail to include the following three required ‘other’ attachments will be considered incomplete and will be withdrawn.

1. Team Management Plan (Required; three pages maximum):

A key goal of this program is to establish new research partnerships among R15-eligible investigators and other domestic research centers, programs, or institutions with complementary research expertise and/or resources. To ensure that prospective research teams fit the goals of the PREP, a team management plan is required. Applications with team management plans that exceed the three-page limit will be withdrawn.

As an “Other Attachment” entitled Team-Management-Plan.pdf, applications should describe how the research collaborators will function to accomplish program objectives. Team management approaches raised in the subsections listed below should be described in the plan. Note that a “Multiple PD/PI Leadership Plan” may also be submitted as a separate attachment, and if it is included the information in that plan should not be duplicated here. Whereas the Multiple PD/PI Leadership Plan focuses on leadership by and interactions across the PD/PIs, the Team Management Plan focuses on management of the whole team/key personnel. Applicants are encouraged to consult resources to aid in developing effective team-based programs (see e.g., the  NCI Collaboration and Team Science Field Guide ).

Organizational structure and team composition: The Team Management Plan should clearly show the organizational structure and composition of the proposed project team. Two-thirds of the proposed research project team should be health professional students, graduate students, or undergraduates from the primary R15-eligible institution. The plan should describe a management structure based on project objectives that effectively promotes the proposed research. The structure should account for team composition, institutional resources, and policies that conform with PREP objectives outlined in Section I.

Shared leadership, contributions, and distributed responsibility for decision-making: The Team Management Plan should include a description of how the proposed collaborators will work together to direct the overall scientific team to leverage the diverse perspectives, expertise, and skills of the team members to successfully accomplish the goals of the project. One key consideration is that teams employing multidisciplinary approaches and having diverse areas of intellectual and technical expertise are more productive if the process for making decisions incorporates different points of view. The Team Management Plan should describe how major decisions will be made or how conflicts will be resolved.

Resource sharing and allocation across the team: Applications should describe management and decision-making processes that promote collective input for allocation of program resources with flexibility when resources may need to be dynamically reallocated to achieve programmatic goals. A plan for how intra-team, institutional, and regional resources that are integral to the team goals will be shared and made accessible to team members should also be included.

Credit assignment: A plan for how credit and IP will be shared, especially with the R15 institution’s students, should be included. Methods for attributing contributions to publications should be described to enable individual professional assessment in joint projects.

Coordination and communication plans: Practical aspects should be described, including frequency and logistics of real-time communication across all key personnel, consultants, scholars, early-stage investigators, etc., and other significant contributors regardless of effort level.

An important and meaningful impact of team science may come from shaping the next generation of pain scientists. Because of the interdisciplinary expertise of the research groups, students are exposed to and can learn a variety of scientific approaches and methodologies, resulting in multifaceted early-stage investigators. Plans for how student trainees will be immersed in and benefit from different approaches taken by the collective team program should be described. This could include shared mentorship, inter-laboratory meetings, all-hands tutorials, shared meeting and document space, inter-laboratory visits, and student presentations.

2. Provost Letter(s) of Support: The application must include a PDF-formatted letter named "ProvostLetter.pdf" (without quotation marks). For MPI applications, a signed provost letter is required from each involved institution. The letter must be signed by the provost or similar official with institution-wide responsibility attesting to the following information:

For Undergraduate-Focused Institutions:

  • The eligible academic component(s) (i.e., the college/school level) must have more undergraduates than graduate students as of the date of submission.
  • All the non-health professional components of the institution together have received support from the NIH totaling no more than $6 million per year (in both direct and F&A/indirect costs) in 4 of the last 7 years, as described in Section III, "Eligible Organization".
  • Validation that the PD/PI has (or in the case of a multiple PD/PI application that all PD(s)/PI(s) have) a primary appointment at the qualifying component (i.e., the college/school level).  

For Health Professional and Graduate Schools:

  • The eligible academic component(s) (i.e., the college/school level) must be a health professional or graduate school that awards health professional baccalaureate or advanced degrees in biomedical and/or biobehavioral sciences.
  • All components of the institution together have received support from NIH totaling no more than $6 million per year (in both direct and F&A/indirect costs) in 4 of the last 7 years, as described in Section III, “Eligible Organization.”
  • Validation that the PD/PI has (or in the case of a multiple PD/PI application that all PD(s)/PI(s) have) a primary appointment at the qualifying component (i.e., the college/school level).

3. Collaborator Letter(s) of Support: Applications must include additional PDF-formatted letter(s) from collaborating subaward holder(s) named "CollaboratorLetter_Initials.pdf" (without quotation marks). For multiple collaborators, a signed letter is required from each involved collaborator. Note that collaborators do not need to meet the R15-eligibility criteria outlined above. The letter should demonstrate the collaborator's willingness to collaborate with the study lead as well as briefly outline their contributions to the project that will result in a well-integrated, interdisciplinary research approach to the understanding of pain. If the proposed collaboration is a new research partnership among investigators, this information should also be included.

Plan for Enhancing Diverse Perspectives (PEDP)

  • In an "Other Attachment" entitled "Plan for Enhancing Diverse Perspectives," all applicants must include a summary of actionable strategies to advance the scientific and technical merit of the proposed project through expanded inclusivity.
  • Applicants should align their proposed strategies for PEDP with the research strategy section, providing a holistic and integrated view of how enhancing diverse perspectives and inclusivity are buoyed throughout the application.
  • The PEDP will vary depending on the scientific aims, expertise required, the environment and performance site(s), as well as how the project aims are structured. Key elements to include in the PEDP are:
  • Actionable strategies using defined approaches for the inclusion of diverse perspectives in the project;
  • Description of how the PEDP will advance the scientific and technical merit of the proposed project;
  • Anticipated timeline of proposed PEDP activities;
  • Evaluation methods for assessing the progress and success of PEDP activities.

Examples of items that advance inclusivity in research and may be appropriate for a PEDP can include, but are not limited to:

  • Partnerships with different types of institutions and organizations (e.g., research-intensive; undergraduate-focused; HBCUs; emerging research institutions; community-based organizations).
  • Project frameworks that enable communities and researchers to work collaboratively as equal partners in all phases of the research process.
  • Outreach and planned engagement activities to enhance recruitment of individuals from diverse groups as human subjects in clinical trials, including those from underrepresented backgrounds.
  • Description of planned partnerships that may enhance geographic and regional diversity.
  • Outreach and recruiting activities intended to diversify the pool of applicants for research training programs, such as outreach to prospective applicants from groups underrepresented in the biomedical sciences, for example, individuals from underrepresented racial and ethnic groups, those with disabilities, those from disadvantaged backgrounds, and women.
  • Plans to utilize the project infrastructure (i.e., research and structure) to enhance the research environment and support career-advancing opportunities for junior, early- and mid-career researchers.
  • Transdisciplinary research projects and collaborations among researchers from fields beyond the biological sciences, such as physics, engineering, mathematics, computational biology, computer and data sciences, as well as bioethics.

Examples of items that are not appropriate in a PEDP include, but are not limited to:

  • Selection or hiring of personnel for a research team based on their race, ethnicity, or sex (including gender identity, sexual orientation, or transgender status).
  • A training or mentorship program limited to certain researchers based on their race, ethnicity, or sex (including gender identity, sexual orientation, or transgender status).

For further information on the Plan for Enhancing Diverse Perspectives (PEDP), please see PEDP guidance materials .

SF424(R&R) Senior/Key Person Profile

R&R or Modular Budget

  • The total budget for all years of the proposed project must be requested in Budget Period 1. Do not complete Budget Periods 2 or 3. They are not required and will not be accepted with the application.
  • Applicants submitting an application with direct costs of $250,000 or less (total for all years, excluding consortium Facilities and Administrative [F&A] costs) must use the Modular Budget.
  • Applicants submitting an application with direct costs of $250,001 - $375,000 (total for all years, excluding consortium Facilities and Administrative [F&A] costs) must use the R&R Budget. (The two budget-format thresholds are illustrated in the sketch following this list.)
  • Students must be compensated for their participation in the laboratory's research, in accordance with institutional policies. Student salaries can be requested in the R15 budget, or other resources at the university can be used to pay them for their participation. Undergraduate students who are compensated from the R15 grant or other institutional funds should receive at least the national minimum wage. Compensation through course credit hours towards graduation is allowable but must be justified. If universities/colleges provide room and board for summer research students, details must be provided in the application.
  • NIH does not fund stipends for undergraduates on R15 awards.
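The budget rules above reduce to a pair of simple checks. The following is a minimal, purely illustrative Python sketch, assuming dollar amounts are already totaled across all years; the function names (`budget_format`, `subaward_within_cap`) are this document's assumptions, not part of any NIH form or system:

```python
# Illustrative sketch of the budget thresholds in this NOFO; not an official tool.

def budget_format(total_direct_costs: float) -> str:
    """Choose the budget form from total direct costs for all years,
    excluding consortium F&A costs, per the thresholds above."""
    if total_direct_costs <= 250_000:
        return "Modular Budget"
    if total_direct_costs <= 375_000:
        return "R&R Budget"
    raise ValueError("Request exceeds the maximum allowed under this NOFO")

def subaward_within_cap(subaward_total: float, total_budget: float) -> bool:
    """Section V notes that the budget of all sub-awards must not
    exceed one third of the total budget."""
    return subaward_total <= total_budget / 3

# Example: a $240,000 request uses the Modular Budget; $300,000 requires R&R.
assert budget_format(240_000) == "Modular Budget"
assert budget_format(300_000) == "R&R Budget"
```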

Budget Justification:

Personnel Justification: Since a primary objective of the PREP is to expose students to, and incorporate them into, multidisciplinary pain research, PD(s)/PI(s) must include health professional or undergraduate and/or graduate students from the applicant institution/applicant component in the proposed research. Students from the R15-eligible institution should compose the majority of the research team (two-thirds or more). Indicate the aspects of the proposed research in which students will participate. If participating students have not yet been identified, the number and academic level of those to be involved should be provided. For collaborators or consultants on the project, provide additional budget information, including their names, their organizational affiliations, and the services they will perform.
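As a purely illustrative aside, the two-thirds composition guideline above is easy to verify mechanically; the `team` structure and function name in this sketch are assumptions of this document, not NIH terminology:

```python
# Illustrative check of the two-thirds team-composition guideline; names are assumed.
from fractions import Fraction

def meets_two_thirds_guideline(team: list[tuple[str, bool]]) -> bool:
    """`team` pairs each member's name with whether they are a student
    from the R15-eligible institution; True if such students make up
    at least two thirds of the proposed research team."""
    r15_students = sum(1 for _, is_r15_student in team if is_r15_student)
    return Fraction(r15_students, len(team)) >= Fraction(2, 3)
```

Using exact fractions avoids floating-point surprises when the team size is not divisible by three.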

PEDP implementation costs: Applicants may include allowable costs associated with PEDP implementation (as outlined in the Grants Policy Statement section 7): https://grants.nih.gov/grants/policy/nihgps/html5/section_7/7.1_general.htm.

R&R Subaward Budget

PHS 398 Cover Page Supplement

PHS 398 Research Plan

All instructions in the  How to Apply - Application Guide must be followed, with the following additional instructions:

Research Strategy:  

The research strategy must address how the proposed project intends to accomplish all three objectives of this program, including: (1) supporting the efforts by R15-eligible principal investigators (PIs) at undergraduate-focused institutions or health professional schools and graduate schools to conduct small-scale basic and mechanistic pain research; (2) promoting integrated, interdisciplinary research partnerships between R15-eligible PIs and additional investigators from U.S. domestic institutions; and (3) enhancing the pain research environment at the R15-eligible institution for health professional students or undergraduate and/or graduate students by actively engaging them in the proposed pain research projects.

Applications should include a detailed description of a research approach that will produce rigorous data that can be disseminated and advance our basic and mechanistic understanding of pain. Additionally, the research strategy should detail how the proposed research partnership includes sufficient integrative pain expertise and related resources and/or institutional infrastructure that increase the likelihood of success. The application should detail how the proposed scientific research, program, and research partnership will have a substantial effect on strengthening the research environment at the applicant's institution.

Applications should provide details on how the research project will be directed by the R15-eligible PI and how two-thirds of the research project will be conducted at the R15-eligible institution. The research strategy should detail how the research team will recruit additional prospective investigators, including students, from a range of backgrounds, skills, and expertise for the broad pool of researchers who may apply to participate and contribute to the project. Applications should include details about how the investigators will cooperate and coordinate their activities with other HEAL investigators at PD/PI meetings, including (but not limited to) other investigators in the R15 program and the HEAL Annual Scientific and PURPOSE meetings. Proposed PD/PI(s) should include evidence of experience supervising students in previous research efforts, as well as describe any innovative approaches to engage students in the proposed pain research project. Applications should provide additional details outlining student involvement in the research project by addressing the following questions:

  • How will students engage in conducting hands-on rigorous research?
  • How will students participate in research activities such as planning, execution, and/or analysis of the research?
  • Are there any additional plans for student involvement, such as presentation at local or national meetings, participation in publication of research findings, and development of, or participation in, collaborative activities?
  • How will the project provide students with adequate opportunities to improve their research capabilities and support their progress toward a future career in pain research?
  • Note: The purpose of this program is to support pain research projects, not student training. Formal training plans (e.g., non-research activities, didactic training, seminars) should not be provided, although a brief description of activities related to enhancing students' research capabilities and progress (e.g., the use of individual development plans) is permitted. Furthermore, applications should not include independent student research projects.

Resource Sharing Plan : Individuals are required to comply with the instructions for the Resource Sharing Plans as provided in the  How to Apply - Application Guide .

Other Plan(s): 

All instructions in the How to Apply - Application Guide must be followed, with the following additional instructions:

  • All applicants planning research (funded or conducted in whole or in part by NIH) that results in the generation of scientific data are required to comply with the instructions for the Data Management and Sharing Plan. All applications, regardless of the amount of direct costs requested for any one year, must address a Data Management and Sharing Plan. 

The NIH HEAL Initiative has additional requirements that must be addressed in the Data Management and Sharing Plan. All HEAL-generated data must be shared through the HEAL Initiative Data Ecosystem following HEAL’s compliance guidance ( https://heal.nih.gov/data/complying-heal-data-sharing-policy ). Specifically, HEAL applicants must include:

  • Plans to submit data and metadata (and code, if applicable) to a HEAL-compliant data repository ( https://www.healdatafair.org/resources/guidance/selection ) and follow requirements of the selected repository.
  • Plans to register your study with the HEAL platform within one year of award ( https://heal.github.io/platform-documentation/study-registration/ ).
  • Plans to submit HEAL-defined study-level metadata within one year of award ( https://github.com/HEAL/heal-metadata-schemas/blob/main/for-investigators-how-to/study-level-metadata-fields/study-metadata-schema-for-humans.pdf and https://heal.github.io/platform-documentation/slmd_submission/ ).
  • Plans to submit data dictionaries to the HEAL Data Ecosystem, if applicable.
  • HEAL pain clinical studies must include a plan to use HEAL core Common Data Elements (CDEs) ( https://heal.nih.gov/data/common-data-elements ). NIH HEAL Initiative clinical studies that are using copyrighted questionnaires are required to obtain licenses for use prior to initiating data collection. Licenses must be shared with the HEAL CDE team and the program officer prior to use of copyrighted materials.
  • To the extent possible, all other (non-pain) HEAL studies conducting clinical trials or research involving human subjects are expected to use questionnaires from the HEAL Common Data Elements (CDE) Program ( https://heal.nih.gov/data/common-data-elements ), if applicable and relevant to their research.
  • Studies using CDEs, regardless of whether they are part of the HEAL repository, will be required to report which questionnaires are being used.
  • To the extent possible, NIH HEAL Initiative awardees are expected to integrate broad data sharing consent language into their informed consent forms.

The NIH HEAL Initiative has developed additional details and resources to fulfill these requirements ( https://www.healdatafair.org/resources/road-map ). Budgeting guidance for data sharing can be found in NOT-OD-21-015 and the NIH Scientific Data Sharing site .

Appendix:  Only limited Appendix materials are allowed. Follow all instructions for the Appendix as described in the How to Apply - Application Guide .

  • No publications or other material, with the exception of blank questionnaires or blank surveys, may be included in the Appendix.

PHS Human Subjects and Clinical Trials Information

When involving human subjects research, clinical research, and/or NIH-defined clinical trials (and when applicable, clinical trials research experience) follow all instructions for the PHS Human Subjects and Clinical Trials Information form in the How to Apply - Application Guide , with the following additional instructions:

If you answered “Yes” to the question “Are Human Subjects Involved?” on the R&R Other Project Information form, you must include at least one human subjects study record using the Study Record: PHS Human Subjects and Clinical Trials Information form or Delayed Onset Study record.

Study Record: PHS Human Subjects and Clinical Trials Information

Section 2 - Study Population Characteristics

2.5 Recruitment and Retention Plan

Describe the following: 

  • Recruitment milestones; 
  • The planned recruitment methods, including use of contact lists (participants and/or sites), databases or other pre-screening resources, advertisements, outreach, media / social media and referral networks or groups;
  • If there are known participant or study-related barriers to accrual or participation (based on literature or prior experience), please list these barriers and describe plans to address them to optimize success; 
  • Contingency plans for participant accrual if enrollment significantly lags behind accrual benchmarks;
  • Participant retention and adherence strategies; and
  • Possible competition from other trials for study participants.

2.7 Study Timeline

Include a table or graph of the overall study timeline. This is expected to be a visual representation (such as a Gantt chart) of recruitment milestones and key project management activities. A narrative is not expected in this section.

The study timeline should include recruitment milestones that need to be met throughout the life cycle of the clinical trial to ensure its success, and the subtasks that will be used to reach the recruitment milestones. In the timeline, the study duration is expected to be displayed in months. The timeline should include, but is not limited to, the following:

(a) When the study opens to enrollment
(b) When recruitment milestones (see below) are met
(c) What subtasks are needed to reach each of the recruitment milestones
(d) When final transfer of the data will occur
(e) When analysis of the study data will occur
(f) When the primary study manuscript will be submitted for publication
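For illustration only, the following minimal matplotlib sketch renders such a Gantt-style timeline; every milestone name and month span below is a hypothetical placeholder rather than a prescribed value:

```python
# Hypothetical Gantt-style study timeline; milestones and durations are placeholders.
import matplotlib.pyplot as plt

milestones = [  # (label, start month, duration in months)
    ("Study opens to enrollment", 0, 3),
    ("25% enrollment milestone", 3, 6),
    ("100% enrollment milestone", 9, 12),
    ("Final data transfer", 21, 3),
    ("Study data analysis", 24, 6),
    ("Primary manuscript submitted", 30, 3),
]

fig, ax = plt.subplots(figsize=(8, 3))
for row, (label, start, length) in enumerate(milestones):
    ax.barh(row, length, left=start)  # one horizontal bar per milestone
ax.set_yticks(range(len(milestones)))
ax.set_yticklabels([label for label, _, _ in milestones])
ax.invert_yaxis()  # first milestone at the top
ax.set_xlabel("Study month")
ax.set_title("Overall study timeline (illustrative)")
plt.tight_layout()
plt.savefig("study_timeline.png")
```

Any charting tool is acceptable; the expectations above are only that the visual shows study months, recruitment milestones, and key project-management activities.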

Delayed Onset Study

Note: Delayed onset does NOT apply to a study that can be described but will not start immediately (i.e., delayed start). All instructions in the How to Apply - Application Guide must be followed.

PHS Assignment Request Form

3. Unique Entity Identifier and System for Award Management (SAM)

See Part 2, Section III.1 for information regarding the requirement for obtaining a unique entity identifier and for completing and maintaining active registrations in the System for Award Management (SAM), NATO Commercial and Government Entity (NCAGE) Code (if applicable), eRA Commons, and Grants.gov.

4. Submission Dates and Times

Part 1 contains information about key dates and times. Applicants are encouraged to submit applications before the due date to ensure they have time to make any application corrections that might be necessary for successful submission. When a submission date falls on a weekend or Federal holiday, the application deadline is automatically extended to the next business day.

Organizations must submit applications to Grants.gov (the online portal to find and apply for grants across all Federal agencies). Applicants must then complete the submission process by tracking the status of the application in the eRA Commons, NIH's electronic system for grants administration. NIH and Grants.gov systems check the application against many of the application instructions upon submission. Errors must be corrected and a changed/corrected application must be submitted to Grants.gov on or before the application due date and time. If a changed/corrected application is submitted after the deadline, the application will be considered late. Applications that miss the due date and time are subject to the NIH Grants Policy Statement Section 2.3.9.2 Electronically Submitted Applications.

Applicants are responsible for viewing their application before the due date in the eRA Commons to ensure accurate and successful submission.

Information on the submission process and a definition of on-time submission are provided in the How to Apply – Application Guide .

5. Intergovernmental Review (E.O. 12372)

This initiative is not subject to intergovernmental review.

6. Funding Restrictions

All NIH awards are subject to the terms and conditions, cost principles, and other considerations described in the NIH Grants Policy Statement .

Pre-award costs are allowable only as described in the NIH Grants Policy Statement Section 7.9.1 Selected Items of Cost .

Applications must be submitted electronically following the instructions described in the How to Apply - Application Guide . Paper applications will not be accepted.

Applicants must complete all required registrations before the application due date. Section III. Eligibility Information contains information about registration.

For assistance with your electronic application or for more information on the electronic submission process, visit How to Apply – Application Guide . If you encounter a system issue beyond your control that threatens your ability to complete the submission process on-time, you must follow the Dealing with System Issues guidance. For assistance with application submission, contact the Application Submission Contacts in Section VII .

Important reminders:

All PD(s)/PI(s) must include their eRA Commons ID in the Credential field of the Senior/Key Person Profile form . Failure to register in the Commons and to include a valid PD/PI Commons ID in the credential field will prevent the successful submission of an electronic application to NIH. See Section III of this NOFO for information on registration requirements.

The applicant organization must ensure that the unique entity identifier provided on the application is the same identifier used in the organization’s profile in the eRA Commons and for the System for Award Management. Additional information may be found in the How to Apply - Application Guide .

See more tips for avoiding common errors.

Applications must include a PEDP, submitted as an attachment under Other Project Information. Applications that fail to include a PEDP will be considered incomplete and will be administratively withdrawn before review.

Upon receipt, applications will be evaluated for completeness and compliance with application instructions by the Center for Scientific Review, and for responsiveness by components of participating organizations, NIH. Applications that are incomplete, non-compliant, and/or nonresponsive will not be reviewed.

In order to expedite review, applicants are requested to notify the NCCIH Referral Office by email at [email protected] when the application has been submitted. Please include the NOFO number and title, PD/PI name, and title of the application.

Recipients or subrecipients must submit any information related to violations of federal criminal law involving fraud, bribery, or gratuity violations potentially affecting the federal award. See Mandatory Disclosures, 2 CFR 200.113 and NIH Grants Policy Statement Section 4.1.35 .

Send written disclosures to the NIH Chief Grants Management Officer listed on the Notice of Award for the IC that funded the award and to the HHS Office of Inspector General Grant Self-Disclosure Program at [email protected].

Post Submission Materials

Applicants are required to follow the instructions for post-submission materials, as described in the policy.

The following post-submission materials will be accepted: Team Management Plan (e.g., due to the hiring, replacement, or loss of an investigator).

Section V. Application Review Information

1. Criteria

Only the review criteria described below will be considered in the review process.  Applications submitted to the NIH in support of the NIH mission are evaluated for scientific and technical merit through the NIH peer review system.

For this particular NOFO, note the following:

The purpose of this HEAL Initiative program is to (1) support the efforts by R15-eligible principal investigators (PIs) at primarily undergraduate-focused institutions or health professional schools and graduate schools to conduct small-scale basic and mechanistic pain research projects; (2) promote integrated, interdisciplinary research partnerships between R15-eligible PIs and investigators from U.S. domestic institutions; and (3) enhance the pain research environment at the R15-eligible institution for health professional students or undergraduate and/or graduate students by actively engaging them in the proposed pain research projects.

Applications in response to this notice of funding opportunity (NOFO) should include plans to accomplish these goals. Specifically, applications should include a rigorous plan for conducting basic and mechanistic pain research projects in the Research Strategy section of the application. In addition, a research partnership between the PI's institution and at least one investigator from a separate U.S. domestic institution that provides resources and/or expertise that will enhance the proposed pain research program must be included in a separate Team Management Plan. The proposed partnership will be formalized as sub-award agreement(s) with at least one partnering institution, which does not need to be R15-eligible. The budget of all sub-awards must not exceed one third of the total budget. Furthermore, applications must include a Facilities & Other Resources document that demonstrates active involvement of health professional students or undergraduate and/or graduate students from the R15-eligible institution(s) in the proposed pain research projects.

Although preliminary data are not required for an R15 application, they may be included if available. The scientific foundation for the proposed research should be based on published research and/or any available preliminary data.

A proposed Clinical Trial application may include study design, methods, and intervention that are not by themselves innovative but address important questions or unmet needs. Additionally, the results of the clinical trial may indicate that further clinical development of the intervention is unwarranted or lead to new avenues of scientific investigation.

Reviewers will provide an overall impact score to reflect their assessment of the likelihood for the project to exert a sustained, powerful influence on the research field(s) involved, in consideration of the following review criteria and additional review criteria (as applicable for the project proposed). As part of the overall impact score, reviewers should consider and indicate how the Plan for Enhancing Diverse Perspectives affects the scientific merit of the project.

Reviewers will consider each of the review criteria below in the determination of scientific merit and give a separate score for each. An application does not need to be strong in all categories to be judged likely to have major scientific impact. For example, a project that by its nature is not innovative may be essential to advance a field.

Significance

Does the project address an important problem or a critical barrier to progress in the field? Is the prior research that serves as the key support for the proposed project rigorous? If the aims of the project are achieved, how will scientific knowledge, technical capability, and/or clinical practice be improved? How will successful completion of the aims change the concepts, methods, technologies, treatments, services, or preventative interventions that drive this field?

In addition, for applications involving clinical trials:

Are the scientific rationale and need for a clinical trial to test the proposed hypothesis or intervention well supported by preliminary data, clinical and/or preclinical studies, or information in the literature or knowledge of biological mechanisms? For trials focusing on clinical or public health endpoints, is this clinical trial necessary for testing the safety, efficacy or effectiveness of an intervention that could lead to a change in clinical practice, community behaviors or health care policy? For trials focusing on mechanistic, behavioral, physiological, biochemical, or other biomedical endpoints, is this trial needed to advance scientific understanding?

Specific to this NOFO:

Taking into consideration the type of R15-eligible institution the application has been submitted from, if funded, will this grant have a substantial effect on strengthening the research environment at the applicant institution and exposing students to research?

Does the project adequately describe how the research partnership will advance our understanding of pain conditions? 

If the aims of the project are achieved, will the project yield rigorous data that can be disseminated and is likely to be important to the field?

Will the proposed collaboration appropriately improve the R15 institutional environment in a manner to support more students to engage in pain research at that institution?

Investigator(s)

Are the PD(s)/PI(s), collaborators, and other researchers well suited to the project? If Early Stage Investigators or those in the early stages of independent careers, do they have appropriate experience and training? If established, have they demonstrated an ongoing record of accomplishments that have advanced their field(s)? If the project is collaborative or multi-PD/PI, do the investigators have complementary and integrated expertise; are their leadership approach, governance and organizational structure appropriate for the project?

With regard to the proposed leadership for the project, do the PD/PI(s) and key personnel have the expertise, experience, and ability to organize, manage and implement the proposed clinical trial and meet milestones and timelines? Do they have appropriate expertise in study coordination, data management and statistics? For a multicenter trial, is the organizational structure appropriate and does the application identify a core of potential center investigators and staffing for a coordinating center?

Does the application provide details about how the research project will be directed by the R15-eligible PI and how two-thirds of the research project will be conducted at the R15-eligible institution?

Is it clear how the applicant intends to recruit additional prospective investigators, including students, from a range of backgrounds, skills, and expertise for the pool of researchers who may apply to address the proposed scientific problem?

Will the combined scientific expertise (of the proposed collaborative research team) likely result in a well-integrated, interdisciplinary research approach to the understanding of pain?

Does the team of investigators include sufficient integrative pain expertise for the proposed research?

How appropriate is the PD/PI's experience in supervising and engaging students in research?

Does the application include details about how the investigators will cooperate and coordinate their activities with other HEAL investigators at PD/PI meetings, including (but not limited to) other investigators in the R15 program, the HEAL Annual Scientific and PURPOSE meetings?

Team Management Plan (Attachment):

How fair and adequate are the governance processes for decision making, conflict resolution, and resource allocation outlined in the plan? 

How effective is the plan for team leadership and management with sufficient examples of distributed responsibility?

How well would the program leadership create a sustainable environment for maintaining cohesiveness, productivity, and shared vision?

How adequate are the management plans for shared professional credit?

If shared research resources will be utilized, how adequate are the plans for resource sharing and allocation to ensure that all team members will have the access they require?

How well does the plan include examples of team coordination and communication?

How clearly does the plan include details about which personnel are available at the R15-eligible institution(s), including health professional, graduate, or undergraduate students, who would compose a two-thirds majority of the project team, and how they would contribute to the research project?

How well does the management plan outline how the collaborative partnership will supervise and engage students?

Innovation

Does the application challenge and seek to shift current research or clinical practice paradigms by utilizing novel theoretical concepts, approaches or methodologies, instrumentation, or interventions? Are the concepts, approaches or methodologies, instrumentation, or interventions novel to one field of research or novel in a broad sense? Is a refinement, improvement, or new application of theoretical concepts, approaches or methodologies, instrumentation, or interventions proposed?

Does the design/research plan include innovative elements, as appropriate, that enhance its sensitivity, potential for information or potential to advance scientific knowledge or clinical practice?

Does the proposed research include innovative interdisciplinary pain research topics?

Is the proposed research partnership a new collaboration between investigators?

Are innovative approaches for engaging health professional or undergraduate and/or graduate students in research proposed?

Approach

Are the overall strategy, methodology, and analyses well-reasoned and appropriate to accomplish the specific aims of the project? Have the investigators included plans to address weaknesses in the rigor of prior research that serves as the key support for the proposed project? Have the investigators presented strategies to ensure a robust and unbiased approach, as appropriate for the work proposed? Are potential problems, alternative strategies, and benchmarks for success presented? If the project is in the early stages of development, will the strategy establish feasibility and will particularly risky aspects be managed? Have the investigators presented adequate plans to address relevant biological variables, such as sex, for studies in vertebrate animals or human subjects?

If the project involves human subjects and/or NIH-defined clinical research, are the plans to address 1) the protection of human subjects from research risks, and 2) inclusion (or exclusion) of individuals on the basis of sex/gender, race, and ethnicity, as well as the inclusion or exclusion of individuals of all ages (including children and older adults), justified in terms of the scientific goals and research strategy proposed?

Does the application adequately address the following, if applicable:

Study Design

Is the study design justified and appropriate to address primary and secondary outcome variable(s)/endpoints that will be clear, informative and relevant to the hypothesis being tested? Is the scientific rationale/premise of the study based on previously well-designed preclinical and/or clinical research? Given the methods used to assign participants and deliver interventions, is the study design adequately powered to answer the research question(s), test the proposed hypothesis/hypotheses, and provide interpretable results? Is the trial appropriately designed to conduct the research efficiently? Are the study populations (size, gender, age, demographic group), proposed intervention arms/dose, and duration of the trial, appropriate and well justified?

Are potential ethical issues adequately addressed? Is the process for obtaining informed consent or assent appropriate? Is the eligible population available? Are the plans for recruitment outreach, enrollment, retention, handling dropouts, missed visits, and losses to follow-up appropriate to ensure robust data collection? Are the planned recruitment timelines feasible and is the plan to monitor accrual adequate? Has the need for randomization (or not), masking (if appropriate), controls, and inclusion/exclusion criteria been addressed? Are differences addressed, if applicable, in the intervention effect due to sex/gender and race/ethnicity?

Are the plans to standardize, assure quality of, and monitor adherence to, the trial protocol and data collection or distribution guidelines appropriate? Is there a plan to obtain required study agent(s)? Does the application propose to use existing available resources, as applicable?

Data Management and Statistical Analysis

Are planned analyses and statistical approach appropriate for the proposed study design and methods used to assign participants and deliver interventions? Are the procedures for data management and quality control of data adequate at clinical site(s) or at center laboratories, as applicable? Have the methods for standardization of procedures for data management to assess the effect of the intervention and quality control been addressed? Is there a plan to complete data analysis within the proposed period of the award?

Taking into consideration the type of R15-eligible institution the application has been submitted from, how suitable are the plans for ensuring that students are well integrated into the research program?

How will this project provide students with a high-quality research experience focused on the execution, analysis, and reporting of the study? 

Would students have adequate opportunities to present at national or local meetings, publish research findings, and/or participate in other collaborative activities? 

Would the proposed research project provide adequate opportunities for students to improve their research capabilities and support their progress toward a biomedical research career? 

Environment

Will the scientific environment in which the work will be done contribute to the probability of success? Are the institutional support, equipment and other physical resources available to the investigators adequate for the project proposed? Will the project benefit from unique features of the scientific environment, subject populations, or collaborative arrangements?

If proposed, are the administrative, data coordinating, enrollment and laboratory/testing centers, appropriate for the trial proposed?

Does the application adequately address the capability and ability to conduct the trial at the proposed site(s) or centers? Are the plans to add or drop enrollment centers, as needed, appropriate?

If international site(s) is/are proposed, does the application adequately address the complexity of executing the clinical trial?

If multi-sites/centers, is there evidence of the ability of the individual site or center to: (1) enroll the proposed numbers; (2) adhere to the protocol; (3) collect and transmit data in an accurate and timely fashion; and, (4) operate within the proposed organizational structure?

Does the "Facilities & Other Resources" attachment describe strong and innovative approaches to how students or trainees will participate in the research project?

Does the application demonstrate appropriate plans to recruit health professional or undergraduate and/or graduate students from diverse backgrounds to participate in the research project?

Does the application provide a plan to aid students at the R15-eligible institution/academic component to pursue careers in the biomedical sciences?

Do(es) the PD/PI(s) have sufficient time and institutional support to conduct the proposed project?

Is there synergy to be gained from the integrated, interdisciplinary research partnership(s) among the multiple proposed institutions?

Additional Review Criteria

As applicable for the project proposed, reviewers will evaluate the following additional items while determining scientific and technical merit, and in providing an overall impact score, but will not give separate scores for these items.

Specific to applications involving clinical trials

Is the study timeline described in detail, taking into account start-up activities, the anticipated rate of enrollment, and planned follow-up assessment? Is the projected timeline feasible and well justified? Does the project incorporate efficiencies and utilize existing resources (e.g., CTSAs, practice-based research networks, electronic medical records, administrative database, or patient registries) to increase the efficiency of participant enrollment and data collection, as appropriate?

Are potential challenges and corresponding solutions discussed (e.g., strategies that can be implemented in the event of enrollment shortfalls)?

Specific to this NOFO: Are the clinical trial recruitment milestones feasible given the proposed study timeline?

For research that involves human subjects but does not involve one of the categories of research that are exempt under 45 CFR Part 46, the committee will evaluate the justification for involvement of human subjects and the proposed protections from research risk relating to their participation according to the following five review criteria: 1) risk to subjects, 2) adequacy of protection against risks, 3) potential benefits to the subjects and others, 4) importance of the knowledge to be gained, and 5) data and safety monitoring for clinical trials.

For research that involves human subjects and meets the criteria for one or more of the categories of research that are exempt under 45 CFR Part 46, the committee will evaluate: 1) the justification for the exemption, 2) human subjects involvement and characteristics, and 3) sources of materials. For additional information on review of the Human Subjects section, please refer to the Guidelines for the Review of Human Subjects .

When the proposed project involves human subjects and/or NIH-defined clinical research, the committee will evaluate the proposed plans for the inclusion (or exclusion) of individuals on the basis of sex/gender, race, and ethnicity, as well as the inclusion (or exclusion) of individuals of all ages (including children and older adults) to determine if it is justified in terms of the scientific goals and research strategy proposed. For additional information on review of the Inclusion section, please refer to the Guidelines for the Review of Inclusion in Clinical Research .

The committee will evaluate the involvement of live vertebrate animals as part of the scientific assessment according to the following three points: (1) a complete description of all proposed procedures including the species, strains, ages, sex, and total numbers of animals to be used; (2) justifications that the species is appropriate for the proposed research and why the research goals cannot be accomplished using an alternative non-animal model; and (3) interventions including analgesia, anesthesia, sedation, palliative care, and humane endpoints that will be used to limit any unavoidable discomfort, distress, pain and injury in the conduct of scientifically valuable research. Methods of euthanasia and justification for selected methods, if NOT consistent with the AVMA Guidelines for the Euthanasia of Animals, are also required but are found in a separate section of the application. For additional information on review of the Vertebrate Animals Section, please refer to the Worksheet for Review of the Vertebrate Animals Section.

Reviewers will assess whether materials or procedures proposed are potentially hazardous to research personnel and/or the environment, and if needed, determine whether adequate protection is proposed.

For Resubmissions, the committee will evaluate the application as now presented, taking into consideration the responses to comments from the previous scientific review group and changes made to the project.

Not applicable. 

Not applicable.  

Additional Review Considerations

As applicable for the project proposed, reviewers will consider each of the following items, but will not give scores for these items, and should not consider them in providing an overall impact score.

Reviewers will assess whether the project presents special opportunities for furthering research programs through the use of unusual talent, resources, populations, or environmental conditions that exist in other countries and either are not readily available in the United States or augment existing U.S. resources.

Reviewers will assess the information provided in this section of the application, including 1) the Select Agent(s) to be used in the proposed research, 2) the registration status of all entities where Select Agent(s) will be used, 3) the procedures that will be used to monitor possession use and transfer of Select Agent(s), and 4) plans for appropriate biosafety, biocontainment, and security of the Select Agent(s).

Reviewers will comment on whether the Resource Sharing Plan(s) (e.g., Sharing Model Organisms ) or the rationale for not sharing the resources, is reasonable.

For projects involving key biological and/or chemical resources, reviewers will comment on the brief plans proposed for identifying and ensuring the validity of those resources.

Reviewers will consider whether the budget and the requested period of support are fully justified and reasonable in relation to the proposed research.

2. Review and Selection Process

Applications will be evaluated for scientific and technical merit by (an) appropriate Scientific Review Group(s) convened by NCCIH, in accordance with NIH peer review policies and practices, using the stated review criteria. Assignment to a Scientific Review Group will be shown in the eRA Commons.

As part of the scientific peer review, all applications will receive a written critique.

Applications may undergo a selection process in which only those applications deemed to have the highest scientific and technical merit (generally the top half of applications under review) will be discussed and assigned an overall impact score. Appeals of initial peer review will not be accepted for applications submitted in response to this NOFO.

Applications will be assigned on the basis of established PHS referral guidelines to the appropriate NIH Institute or Center. Applications will compete for available funds with all other recommended applications submitted in response to this NOFO. Following initial peer review, recommended applications will receive a second level of review by the appropriate national Advisory Council or Board. The following will be considered in making funding decisions:

  • Scientific and technical merit of the proposed project, including the PEDP, as determined by scientific peer review.
  • Availability of funds.
  • Relevance of the proposed project to program priorities.

Please note that reviewers will not consider race, ethnicity, age, or sex (including gender identity, sexual orientation, or transgender status) of a researcher, award participant, or trainee, even in part, in providing critiques, scores, or funding recommendations. NIH will not consider such factors in making its funding decisions.

If the application is under consideration for funding, NIH will request "just-in-time" information from the applicant as described in the NIH Grants Policy Statement Section 2.5.1 Just-in-Time Procedures. This request is not a Notice of Award nor should it be construed to be an indicator of possible funding.

Prior to making an award, NIH reviews an applicant's federal award history in SAM.gov to ensure sound business practices. An applicant can review and comment on any information in the Responsibility/Qualification records available in SAM.gov. NIH will consider any comments by the applicant in the Responsibility/Qualification records in SAM.gov to ascertain the applicant's integrity, business ethics, and performance record of managing Federal awards per 2 CFR Part 200.206 "Federal awarding agency review of risk posed by applicants." This provision will apply to all NIH grants and cooperative agreements except fellowships.

3. Anticipated Announcement and Award Dates

After the peer review of the application is completed, the PD/PI will be able to access his or her Summary Statement (written critique) via the  eRA Commons . Refer to Part 1 for dates for peer review, advisory council review, and earliest start date.

Information regarding the disposition of applications is available in the  NIH Grants Policy Statement Section 2.4.4 Disposition of Applications .

Section VI. Award Administration Information

1. Award Notices

A Notice of Award (NoA) is the official authorizing document notifying the applicant that an award has been made and that funds may be requested from the designated HHS payment system or office. The NoA is signed by the Grants Management Officer and emailed to the recipient’s business official.

In accepting the award, the recipient agrees that any activities under the award are subject to all provisions currently in effect or implemented during the period of the award, other Department regulations and policies in effect at the time of the award, and applicable statutory provisions.

Recipients must comply with any funding restrictions described in  Section IV.6. Funding Restrictions . Any pre-award costs incurred before receipt of the NoA are at the applicant's own risk.  For more information on the Notice of Award, please refer to the  NIH Grants Policy Statement Section 5. The Notice of Award and NIH Grants & Funding website, see  Award Process.

Individual awards are based on the application submitted to, and as approved by, the NIH and are subject to the IC-specific terms and conditions identified in the NoA.

ClinicalTrials.gov: If an award provides for one or more clinical trials, then by law (Title VIII, Section 801 of Public Law 110-85), the "responsible party" must register and submit results information for certain “applicable clinical trials” on the ClinicalTrials.gov Protocol Registration and Results System Information Website ( https://register.clinicaltrials.gov ). NIH expects registration and results reporting of all trials, whether required under the law or not. For more information, see https://grants.nih.gov/policy/clinical-trials/reporting/index.htm

Institutional Review Board or Independent Ethics Committee Approval: Recipient institutions must ensure that all protocols are reviewed by their IRB or IEC. To help ensure the safety of participants enrolled in NIH-funded studies, the recipient must provide NIH copies of documents related to all major changes in the status of ongoing protocols.

Data and Safety Monitoring Requirements: The NIH policy for data and safety monitoring requires oversight and monitoring of all NIH-conducted or -supported human biomedical and behavioral intervention studies (clinical trials) to ensure the safety of participants and the validity and integrity of the data. Further information concerning these requirements is found at http://grants.nih.gov/grants/policy/hs/data_safety.htm and in the application instructions (SF424 (R&R) and PHS 398).

Investigational New Drug or Investigational Device Exemption Requirements: Consistent with federal regulations, clinical research projects involving the use of investigational therapeutics, vaccines, or other medical interventions (including licensed products and devices for a purpose other than that for which they were licensed) in humans under a research protocol must be performed under a Food and Drug Administration (FDA) investigational new drug (IND) or investigational device exemption (IDE).

2. Administrative and National Policy Requirements

The following Federal wide and HHS-specific policy requirements apply to awards funded through NIH:

  • The rules listed at 2 CFR Part 200 , Uniform Administrative Requirements, Cost Principles, and Audit Requirements for Federal Awards.
  • All NIH grant and cooperative agreement awards include the NIH Grants Policy Statement as part of the terms and conditions in the Notice of Award (NoA). The NoA includes the requirements of this NOFO. For these terms of award, see the NIH Grants Policy Statement Part II: Terms and Conditions of NIH Grant Awards, Subpart A: General and Part II: Terms and Conditions of NIH Grant Awards, Subpart B: Terms and Conditions for Specific Types of Grants, Recipients, and Activities .
  • HHS recognizes that NIH research projects are often limited in scope for many reasons that are nondiscriminatory, such as the principal investigator’s scientific interest, funding limitations, recruitment requirements, and other considerations. Thus, criteria in research protocols that target or exclude certain populations are warranted where nondiscriminatory justifications establish that such criteria are appropriate with respect to the health or safety of the subjects, the scientific study design, or the purpose of the research. For additional guidance regarding how the provisions apply to NIH grant programs, please contact the Scientific/Research Contact that is identified in Section VII under Agency Contacts of this NOFO.

All federal statutes and regulations relevant to federal financial assistance, including those highlighted in  NIH Grants Policy Statement Section 4 Public Policy Requirements, Objectives and Other Appropriation Mandates.

Recipients are responsible for ensuring that their activities comply with all applicable federal regulations.  NIH may terminate awards under certain circumstances.  See  2 CFR Part 200.340 Termination and  NIH Grants Policy Statement Section 8.5.2 Remedies for Noncompliance or Enforcement Actions: Suspension, Termination, and Withholding of Support . 

3. Data Management and Sharing

Consistent with the 2023 NIH Policy for Data Management and Sharing, when data management and sharing is applicable to the award, recipients will be required to adhere to the Data Management and Sharing requirements as outlined in the NIH Grants Policy Statement. Upon approval of a Data Management and Sharing Plan, recipients are required to implement the plan as described.

HEAL Data Sharing Requirements

NIH intends to maximize the impact of NIH HEAL Initiative-supported projects through broad and rapid data sharing. All NIH HEAL Initiative award recipients, regardless of the amount of direct costs requested for any one year, are required to comply with the HEAL Public Access and Data Sharing Policy. NIH HEAL Initiative award recipients must follow all requirements and timelines developed through the HEAL Initiative Data Ecosystem ( https://heal.nih.gov/about/heal-data-ecosystem ), as described in the initiative’s  compliance guidance (See “Already Funded” section:  https://heal.nih.gov/data/complying-heal-data-sharing-policy ):   

1. Select a HEAL-compliant data repository ( https://www.healdatafair.org/resources/guidance/selection )

  • Data generated by NIH HEAL Initiative-funded projects must be submitted to study-appropriate, HEAL-compliant data repositories to ensure the data is accessible via the HEAL Initiative Data Ecosystem.
  • Some repositories require use of specific data dictionaries or structured data elements, so knowing your repository’s requirements up front can help reduce the burden of preparing data for submission.
  • HEAL-funded awardees must follow the requirements of their selected repository.

2. Within one year of award,  register your study with the HEAL platform ( https://heal.github.io/platform-documentation/study-registration/ )

  • This process will connect the platform to information about your study and data, including metadata, and identify the selected repository. HEAL requests initial submission within one year of award, with annual updates, and to be updated in accordance with any release of study data.

3.  Within one year of award, submit HEAL-specific study-level metadata.

  • Some of the required study-level metadata ( https://github.com/HEAL/heal-metadata-schemas/blob/main/for-investigators-how-to/study-level-metadata-fields/study-metadata-schema-for-humans.pdf ) will be autopopulated as part of the registration process.  

4. Submit data and metadata (and code, if applicable) to HEAL-compliant repository

  • At the completion of the study and/or when prepared to make the final data deposits in the repositor(ies) of choice, ensure your  study registration ( https://heal.github.io/platform-documentation/study-registration/ ) is complete.
  • Submit data dictionaries to the HEAL data ecosystem, if applicable.
  • The NIH HEAL Initiative expects data sharing timelines to align with timeline requirements stated in the Final NIH Policy for Data Management and Sharing ( NOT-OD-21-013 ).

5. Additional Requirements for NIH HEAL Initiative studies conducting clinical research or research involving human subjects.

These studies must meet the following additional requirements:

  • NIH HEAL Initiative trials that are required to register in clinicaltrials.gov should reference support from and inclusion in the NIH HEAL Initiative by including the standardized term “the HEAL Initiative ( https://heal.nih.gov/ )” in the Study Description Section.
  • Studies that wish to use questionnaires not already included in the HEAL CDE repository should consult with their program official and the HEAL CDE team. New questionnaires will be considered for inclusion in the repository on a case-by-case basis and only when appropriate justification is provided.
  • NIH HEAL Initiative clinical studies that are using copyrighted questionnaires are required to obtain licenses for use prior to initiating data collection. Licenses must be shared with the HEAL CDE team and the program officer prior to use of copyrighted materials. For additional information, visit the HEAL CDE Program ( https://heal.nih.gov/data/common-data-elements ).
  • To the extent possible, all other (nonpain) HEAL studies conducting clinical trials or research involving human subjects are expected to use questionnaires from the HEAL CDE Program ( https://heal.nih.gov/data/common-data-elements ) if applicable and relevant to their research.

Additional details, resources, and tools to assist with data-related activities can be found at https://www.healdatafair.org .  Budgeting guidance for data sharing can be found in  NOT-OD-21-015 and the  NIH Scientific Data Sharing site .

All data collected as part of the NIH HEAL Initiative are collected under a Certificate of Confidentiality and are entitled to the protections thereof. Institutions that receive data and/or materials from this award for performance of activities under this award are required to use the data and/or materials only as outlined by the NIH HEAL Initiative, in a manner that is consistent with applicable state and Federal laws and regulations, including any informed consent requirements and the terms of the institution’s NIH funding, including NOT-OD-17-109 and 42 U.S.C. 241(d). Failure to adhere to this criterion may result in enforcement actions.

4. Reporting

Progress reports for multi-year funded awards are due annually on or before the anniversary of the budget/project period start date of the award. The reporting period for a multi-year funded award progress report is the calendar year preceding the anniversary date of the award. Information on the content of the progress report and instructions on how to submit the report using the RPPR are posted at http://grants.nih.gov/grants/policy/myf.htm

  • Recipients will provide updates at least annually on implementation of the PEDP.


Report and ensure immediate public access to HEAL-funded publications

Publications resulting from NIH HEAL Initiative-funded studies must be immediately publicly available upon publication. 

  • For manuscripts published in journals that are not immediately open access, authors should arrange with journals in advance to pay for immediate open access. 
  • Costs to ensure manuscripts are immediately publicly available upon publication should be included in budget requests. 

Prior to publication, the NIH HEAL Initiative expects investigators to alert their program officers of upcoming manuscripts to ensure coordination of communication and outreach efforts.

Award recipients and their collaborators are required to acknowledge NIH HEAL Initiative support by referencing in the acknowledgment sections of any relevant publication:

“This research was supported by the National Institutes of Health through the NIH HEAL Initiative ( https://heal.nih.gov ) under award number [include specific grant/contract/award number; with NIH grant number(s) in this format: R01GM987654].” 

A final RPPR, invention statement, and the expenditure data portion of the Federal Financial Report are required for closeout of an award, as described in the NIH Grants Policy Statement Section 8.6 Closeout . NIH NOFOs outline intended research goals and objectives. Post award, NIH will review and measure performance based on the details and outcomes that are shared within the RPPR, as described at 2 CFR Part 200.301.

Section VII. Agency Contacts

We encourage inquiries concerning this funding opportunity and welcome the opportunity to answer questions from potential applicants.

eRA Service Desk (Questions regarding ASSIST, eRA Commons, application errors and warnings, documenting system problems that threaten submission by the due date, and post-submission issues)

Finding Help Online:  https://www.era.nih.gov/need-help  (preferred method of contact) Telephone: 301-402-7469 or 866-504-9552 (Toll Free)

General Grants Information (Questions regarding application instructions, application processes, and NIH grant resources) Email:  [email protected]  (preferred method of contact) Telephone: 301-480-7075

Grants.gov Customer Support (Questions regarding Grants.gov registration and Workspace) Contact Center Telephone: 800-518-4726 Email:  [email protected]

Alex Tuttle, Ph.D. National Center for Complementary and Integrative Health (NCCIH) Phone: 301-814-6115 Email:  [email protected]

Mark Egli, Ph.D. National Institute on Alcohol Abuse and Alcoholism (NIAAA) Phone: 301-594-6382 E-mail: [email protected]

Rebecca N. Lenzi, Ph.D. National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) Phone: (301) 402-2446 E-mail: [email protected]

Rene Etcheberrigaray, M.D. National Institute on Aging (NIA) Phone: 301-451-9798 Email: [email protected]

Susan Marden, PhD, RN Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Telephone: 301-435-6838 Email: [email protected]  

Elizabeth Sypek, PhD National Institute of Neurological Disorders and Stroke (NINDS) Email:  [email protected]

Examine your eRA Commons account for review assignment and contact information (information appears 2 weeks after the submission due date).

Debbie Chen National Center for Complementary and Integrative Health (NCCIH) Phone: 301-594-3788 Email:  [email protected]

Judy Fox National Institute on Alcohol Abuse and Alcoholism (NIAAA) Telephone: 301-443-4704 Email:  [email protected]

Erik Edgerton National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) Phone: 301-594-7760 E-mail: [email protected]

Ryan Blakeney National Institute on Aging (NIA) Phone: 301-451-9802 Email: [email protected]

Margaret Young Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Telephone: 301-642-4552 Email: [email protected]

Section VIII. Other Information

Recently issued trans-NIH policy notices may affect your application submission. A full list of policy notices published by NIH is provided in the NIH Guide for Grants and Contracts . All awards are subject to the terms and conditions, cost principles, and other considerations described in the NIH Grants Policy Statement .

Awards are made under the authorization of Sections 301 and 405 of the Public Health Service Act as amended (42 USC 241 and 284) and under Federal Regulations 42 CFR Part 52 and 2 CFR Part 200.

Determining sample size for progression criteria for pragmatic pilot RCTs: the hypothesis test strikes back!

1 Biostatistics Group, School of Medicine, Keele University, Room 1.111, David Weatherall Building, Keele, Staffordshire ST5 5BG UK

2 Keele Clinical Trials Unit, Keele University, Keele, Staffordshire UK

C. J. Sutton

3 Centre for Biostatistics, School of Health Sciences, University of Manchester, Manchester, UK

H. L. Myers

G. A. Lancaster

Associated Data

Not applicable.

The current CONSORT guidelines for reporting pilot trials do not recommend hypothesis testing of clinical outcomes, on the basis that a pilot trial is under-powered to detect such differences and that this is the aim of the main trial. They state that primary evaluation should focus on descriptive analysis of feasibility/process outcomes (e.g. recruitment, adherence, treatment fidelity). Whilst the argument for not testing clinical outcomes is justifiable, the same does not necessarily apply to feasibility/process outcomes, where differences may be large and detectable with small samples. Moreover, there remains much ambiguity around sample size for pilot trials.

Many pilot trials adopt a ‘traffic light’ system for evaluating progression to the main trial determined by a set of criteria set up a priori. We construct a hypothesis testing approach for binary feasibility outcomes focused around this system that tests against being in the RED zone (unacceptable outcome) based on an expectation of being in the GREEN zone (acceptable outcome) and choose the sample size to give high power to reject being in the RED zone if the GREEN zone holds true. Pilot point estimates falling in the RED zone will be statistically non-significant and in the GREEN zone will be significant; the AMBER zone designates potentially acceptable outcome and statistical tests may be significant or non-significant.

For example, in relation to treatment fidelity, if we assume the upper boundary of the RED zone is 50% and the lower boundary of the GREEN zone is 75% (designating unacceptable and acceptable treatment fidelity, respectively), the sample size required for analysis given 90% power and one-sided 5% alpha would be around n = 34 (intervention group alone). Observed treatment fidelity in the range of 0–17 participants (0–50%) will fall into the RED zone and be statistically non-significant, 18–25 (51–74%) fall into AMBER and may or may not be significant and 26–34 (75–100%) fall into GREEN and will be significant indicating acceptable fidelity.

In general, several key process outcomes are assessed for progression to a main trial; a composite approach would require appraising the rules of progression across all these outcomes. This methodology provides a formal framework for hypothesis testing and sample size indication around process outcome evaluation for pilot RCTs.

Supplementary Information

The online version contains supplementary material available at 10.1186/s40814-021-00770-x.

The importance and need for pilot and feasibility studies is clear: “A well-conducted pilot study, giving a clear list of aims and objectives … will encourage methodological rigour … and will lead to higher quality RCTs” [ 1 ]. The CONSORT extension to external pilot and feasibility trials was published in 2016 [ 2 ] with the following key methodological recommendations: (i) investigate areas of uncertainty about the future definitive RCT; (ii) ensure primary aims/objectives are about feasibility, which should guide the methodology used; (iii) include assessments to address the feasibility objectives, which should be the main focus of data collection and analysis; and (iv) build decision processes into the pilot design on whether or how to proceed to the main study. Given that many trials incur process problems during implementation—particularly with regard to recruitment [ 3 – 5 ]—the need for pilot and feasibility studies is evident.

One aspect of pilot and feasibility studies that remains unclear is the required sample size. There is no consensus but recommendations vary from 10 to 12 per group through to 60–75 per group depending on the main objective of the study. Sample size may be based on precision of a feasibility parameter [ 6 , 7 ]; precision of a clinical parameter which may inform main trial sample size—particularly the standard deviation (SD) [ 8 – 11 ] but also event rate [ 12 ] and effect size [ 13 , 14 ]; or, to a lesser degree, for clinical scale evaluation [ 9 , 15 ]. Billingham et al. [ 16 ] reported that the median sample size of pilot and feasibility studies is around 30–36 per group but there is wide variation. Herbert et al. [ 17 ] reported that targets within internal as opposed to external pilots are often slightly larger and somewhat different, being based on percentages of the total sample size and timeline rather than any fixed sample requirement.

The need for a clear directive on sample size of studies is of utmost relevance. The CONSORT extension [ 2 ] reports that “Pilot size should be based on feasibility objectives and some rationale given” and states that a “confidence interval approach may be used to calculate and justify the sample size based on key feasibility objective(s)”. Specifically, item 7a (How sample size was determined: Rationale for numbers in the pilot trial) qualifies: “Many pilot trials have key objectives related to estimating rates of acceptance, recruitment, retention, or uptake … for these sorts of objectives, numbers required in the study should ideally be set to ensure a desired degree of precision around the estimated rate”. Item 7b (When applicable, explanation of any interim analyses and stopping guidelines) is generally an uncommon scenario for pilot and feasibility studies and is not given consideration here.

A key aspect of pilot and feasibility studies is to inform progression to the main trial, which has important implications for all key stakeholders (funders, researchers, clinicians and patients). The CONSORT extension [ 2 ] states that “decision processes about how to proceed needs to be built into the pilot design (which might involve formal progression criteria to decide whether to proceed, proceed with amendments, or not to proceed)” and authors should present “if applicable, the pre-specified criteria used to judge whether or how to proceed with a future definitive RCT; … implications for progression from pilot to future definitive RCT, including any proposed amendments”. Avery et al. [ 18 ] published recommendations for internal pilots emphasising a traffic light (stop-amend-go/red-amber-green) approach to progression with focus on process assessment (recruitment, protocol adherence, follow-up) and transparent reporting around the choice of trial design and the decision-making processes for stopping, amending or proceeding to a main trial. The review of Herbert et al. [ 17 ] reported that the use of progression criteria (including recruitment rate) and traffic light stop-amend-go as opposed to simple stop-go is increasing for internal pilot studies.

A common misuse of pilot and feasibility studies has been the application of hypothesis testing for clinical outcomes in small under-powered studies. Arain et al. [ 19 ] claimed that pilot studies were often poorly reported with inappropriate emphasis on hypothesis testing. They reviewed 54 pilot and feasibility studies published in 2007–2008, of which 81% incorporated hypothesis testing of clinical outcomes. Similarly, Leon et al. [ 20 ] stated that a pilot is not a hypothesis testing study: safety, efficacy and effectiveness should not be evaluated. Despite this, hypothesis testing has been commonly performed for clinical effectiveness/efficacy without reasonable justification. Horne et al. [ 21 ] reviewed 31 pilot trials published in physical therapy journals between 2012 and 2015 and found that only 4/31 (13%) carried out a valid sample size calculation on effectiveness/efficacy outcomes but 26/31 (84%) used hypothesis testing. Wilson et al. [ 22 ] acknowledged a number of statistical challenges in assessing potential efficacy of complex interventions in pilot and feasibility studies. The CONSORT extension [ 2 ] re-affirmed many researchers’ views that formal hypothesis testing for effectiveness/efficacy is not recommended in pilot/feasibility studies since they are under-powered to do so. Sim’s commentary [ 23 ] further contests such testing of clinical outcomes stating that treatment effects calculated from pilot or feasibility studies should not be the basis of a sample size calculation for a main trial.

However, when the focus of analysis is on confidence interval estimation for process outcomes, this does not give a definitive basis for acceptance/rejection of progression criteria linked to formal powering. The issue in this regard is that precision focuses on alpha ( α , type I error) without clear consideration of beta (β, type II error) and may therefore not reasonably capture true differences if a study is under-powered. Further, it could be argued that hypothesis testing of feasibility outcomes (as well as addressing both alpha and beta) is justified on the grounds that moderate-to-large differences (‘process-effects’) may be expected rather than small differences that would require large sample numbers. Moore et al. [ 24 ] previously stated that some pilot studies require hypothesis testing to guide decisions about whether larger subsequent studies can be undertaken, giving the following example of how this could be done for feasibility outcomes: asking the question “Is taste of dietary supplement acceptable to at least 95% of the target population?”, they showed that sample sizes of 30, 50 and 70 provide 48%, 78% and 84% power to reject an acceptance rate of 85% or lower if the true acceptance rate is 95% using a 1-sided α = 0.05 binomial test. Schoenfeld [ 25 ] advocates that, even for clinical outcomes, there may be a place for testing at the level of clinical ‘indication’ rather than ‘clinical evidence’. He suggested that preliminary hypothesis testing for efficacy could be conducted with high alpha (up to 0.25), not to provide definitive evidence but as an indication as to whether a larger study should be conducted. Lee et al. [ 14 ] also reported how type 1 error levels other than the traditional 5% could be considered to provide preliminary evidence for efficacy, although they did stop short of recommending doing this by concluding that a confidence interval approach is preferable.

Current recommendations for sample sizes of pilot/feasibility studies vary, have a single rather than a multi-criterion basis, and do not necessarily link directly to formal progression criteria. The purpose of this article is to introduce a simple methodology that allows sample size derivation and formal testing of proposed progression cut-offs, whilst offering suggestions for multi-criterion assessment, thereby giving clear guidance and sign-posting for researchers embarking on a pilot/feasibility study to assess uncertainty in feasibility parameters prior to a main trial. The suggestions within the article do not directly apply to internal pilot studies built into the design of a main trial, but given the similarities to external randomised pilot and feasibility studies, many of the principles outlined here for external pilots might also extend to some degree to internal pilots of randomised and non-randomised studies.

The proposed approach focuses on estimation and hypothesis testing of progression criteria for feasibility outcomes that are potentially modifiable (e.g. recruitment, treatment fidelity/ adherence, level of follow up). Thus, it aligns with the main aims and objectives of pilot and feasibility studies and with the progression stop-amend-go recommendations of Eldridge et al. [ 2 ] and Avery et al. [ 18 ].

Hypothesis concept

Let R UL denote the upper RED zone cut-off and G LL denote the lower GREEN zone cut-off. The concept is to set up hypothesis testing around progression criteria that tests against being in the RED zone (designating unacceptable feasibility—‘ STOP ’) based on an alternative of being in the GREEN zone (designating acceptable feasibility—‘ GO ’). This is analogous to the zero difference (null) and clinically important difference (alternative) in a main superiority trial. Specifically, we are testing against R UL when G LL is hypothesised to be true:

  • Null hypothesis: True feasibility outcome ( ε ) not greater than the upper “RED” stop limit ( R UL )
  • Alternative hypothesis: True feasibility outcome ( ε ) is greater than R UL

The test is a 1-tailed test with suggested alpha ( α ) of 0.05 and beta (β) of 0.05, 0.1 or 0.2, dependent on the required strength of evidence of the test. An example of a feasibility outcome might be percentage recruitment uptake.
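In practice this is a one-sample, one-sided test of a proportion against R UL. As a minimal sketch in R (the language of the authors' supplementary code), borrowing the treatment-fidelity counts from the abstract (26 of 34 intervention participants treated per protocol, R UL = 0.50), the exact form of the test is a single call:

```r
# One-sided exact binomial test of H0: true rate <= R_UL (0.50 here)
# against H1: true rate > R_UL; counts are the abstract's fidelity example
binom.test(x = 26, n = 34, p = 0.50, alternative = "greater")
# p-value ~ 0.0015 (< 0.05), so the RED-zone null is rejected
```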

Progression rules

Let E denote the observed point estimate (ranging from 0 to 1 for proportions, or from 0 to 100% for percentages). Simple 3-tiered progression criteria would then follow as below (a short R sketch of the classification appears after the list):

  • E ≤ R UL [ P value non-significant ( P ≥ α )] -> RED (unacceptable—STOP)
  • R UL < E < G LL -> AMBER (potentially acceptable—AMEND)
  • E ≥ G LL [ P value significant ( P < α )] -> GREEN (acceptable—GO)
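At analysis time these rules need no explicit computation beyond locating E; a minimal R sketch of the classification (the function name classify_3tier is ours, not the paper's):

```r
# Classify an observed proportion E against the a priori zone limits
classify_3tier <- function(E, R_UL, G_LL) {
  if (E <= R_UL)      "RED (STOP)"
  else if (E >= G_LL) "GREEN (GO)"
  else                "AMBER (AMEND)"
}
classify_3tier(0.70, R_UL = 0.50, G_LL = 0.75)  # "AMBER (AMEND)"
```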

Sample size

Table 1 displays a quick look-up grid for sample size across a range of anticipated proportions for R UL and G LL for a one-sample, one-sided 5% alpha with typical 80% and 90% (as well as 95%) power for the normal approximation method with continuity correction (see Appendix for the corresponding mathematical expression; derived from Fleiss et al. [ 26 ]). Table 2 is the same look-up grid for the binomial exact approach, with sample sizes derived using G*Power version 3.1.9.7 [ 27 ]. Clearly, as the difference between proportions R UL and G LL increases, the sample size requirement is reduced.
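As a hedged sketch of how the normal-approximation entries can be reproduced in R: the standard one-sample formula plus the usual 1/|G LL − R UL| continuity correction recovers the Table 1 sample sizes up to rounding (the published grid's rounding conventions may differ slightly):

```r
# One-sample, one-sided sample size via the normal approximation with
# continuity correction; p0 = R_UL (null), p1 = G_LL (alternative)
n_progression <- function(p0, p1, alpha = 0.05, power = 0.90) {
  za <- qnorm(1 - alpha)
  zb <- qnorm(power)
  n0 <- (za * sqrt(p0 * (1 - p0)) + zb * sqrt(p1 * (1 - p1)))^2 / (p1 - p0)^2
  n0 + 1 / abs(p1 - p0)  # continuity correction
}
n_progression(0.50, 0.75)  # ~34.4; Table 1 gives 34 (fidelity example)
n_progression(0.20, 0.35)  # ~78.3; Table 1 gives 78 (recruitment example)
```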

Table 1 Sample size and significance cut-points for (G LL − R UL) differences for a one-sample test, power (80%, 90%, 95%) and 1-tailed 5% significance level, based on the normal approximation (with continuity correction)

| R UL (%) | G LL (%) | n (80% power) | A C (A C %) | n (90% power) | A C (A C %) | n (95% power) | A C (A C %) |
|---|---|---|---|---|---|---|---|
| 10 | 20 | 79 | 12.3 (15.6) | 111 | 16.3 (14.7) | 143 | 20.2 (14.1) |
| 15 | 25 | 101 | 21.1 (20.8) | 140 | 28.0 (20.0) | 179 | 34.7 (19.4) |
| 15 | 30 | 49 | 11.5 (23.4) | 68 | 15.0 (22.1) | 87 | 18.5 (21.3) |
| 20 | 30 | 119 | 31.0 (26.0) | 165 | 41.5 (25.1) | 209 | 51.3 (24.6) |
| 20 | 35 | 57 | 16.4 (28.7) | 78 | 21.4 (27.5) | 99 | 26.3 (26.6) |
| 20 | 40 | 34 | 10.6 (31.3) | 46 | 13.7 (29.7) | 59 | 16.9 (28.6) |
| 25 | 35 | 134 | 41.7 (31.2) | 185 | 55.9 (30.2) | 234 | 69.4 (29.7) |
| 25 | 40 | 63 | 21.4 (34.0) | 86 | 28.1 (32.7) | 109 | 34.7 (31.8) |
| 25 | 45 | 37 | 13.6 (36.7) | 51 | 17.8 (35.0) | 64 | 21.7 (33.9) |
| 25 | 50 | 25 | 9.8 (39.2) | 33 | 12.3 (37.4) | 42 | 15.1 (36.0) |
| 30 | 40 | 146 | 52.9 (36.2) | 201 | 71.0 (35.3) | 253 | 87.9 (34.7) |
| 30 | 45 | 68 | 26.6 (39.1) | 93 | 35.2 (37.8) | 117 | 43.3 (37.0) |
| 30 | 50 | 39 | 16.4 (42.1) | 54 | 21.7 (40.3) | 67 | 26.3 (39.2) |
| 30 | 55 | 26 | 11.6 (44.8) | 35 | 15.0 (42.7) | 44 | 18.2 (41.4) |
| 30 | 60 | 18 | 8.6 (47.8) | 25 | 11.3 (45.1) | 30 | 13.1 (43.8) |
| 35 | 45 | 155 | 64.0 (41.3) | 212 | 85.6 (40.4) | 267 | 106.3 (39.8) |
| 35 | 50 | 71 | 31.5 (44.3) | 97 | 41.7 (43.0) | 121 | 51.0 (42.1) |
| 35 | 55 | 41 | 19.4 (47.3) | 56 | 25.5 (45.5) | 69 | 30.7 (44.4) |
| 35 | 60 | 27 | 13.5 (50.1) | 36 | 17.3 (48.1) | 44 | 20.6 (46.8) |
| 35 | 65 | 19 | 10.1 (53.0) | 25 | 12.7 (50.7) | 31 | 15.2 (49.1) |
| 40 | 50 | 160 | 74.2 (46.4) | 219 | 99.5 (45.4) | 275 | 123.4 (44.9) |
| 40 | 55 | 73 | 36.1 (49.4) | 99 | 47.6 (48.1) | 124 | 58.6 (47.2) |
| 40 | 60 | 42 | 22.0 (52.4) | 56 | 28.4 (50.8) | 70 | 34.7 (49.6) |
| 40 | 65 | 27 | 15.0 (55.5) | 36 | 19.2 (53.4) | 44 | 22.9 (52.1) |
| 40 | 70 | 19 | 11.1 (58.5) | 25 | 14.0 (56.1) | 30 | 16.4 (54.7) |
| 45 | 55 | 163 | 83.8 (51.4) | 222 | 112.1 (50.5) | 278 | 138.7 (49.9) |
| 45 | 60 | 74 | 40.3 (54.5) | 100 | 53.2 (53.2) | 124 | 64.9 (52.3) |
| 45 | 65 | 42 | 24.2 (57.6) | 56 | 31.3 (55.9) | 69 | 37.8 (54.9) |
| 45 | 70 | 27 | 16.4 (60.7) | 36 | 21.1 (58.6) | 44 | 25.2 (57.3) |
| 45 | 75 | 19 | 12.1 (63.8) | 24 | 14.8 (61.7) | 29 | 17.5 (60.2) |
| 50 | 60 | 162 | 91.5 (56.5) | 220 | 122.2 (55.5) | 275 | 151.5 (55.0) |
| 50 | 65 | 73 | 43.5 (59.6) | 98 | 57.1 (58.3) | 121 | 69.5 (57.5) |
| 50 | 70 | 41 | 25.8 (62.8) | 55 | 33.6 (61.1) | 67 | 40.2 (60.0) |
| 50 | 75 | 27 | 17.8 (65.8) | 34 | 21.8 (64.1) | 42 | 26.3 (62.7) |
| 55 | 65 | 159 | 97.8 (61.5) | 214 | 129.7 (60.6) | 267 | 160.2 (60.0) |
| 55 | 70 | 71 | 45.9 (64.7) | 94 | 59.6 (63.4) | 117 | 73.2 (62.6) |
| 55 | 75 | 40 | 27.2 (67.9) | 52 | 34.5 (66.3) | 64 | 41.7 (65.2) |
| 60 | 70 | 152 | 101.1 (66.5) | 204 | 133.9 (65.6) | 253 | 164.6 (65.1) |
| 60 | 75 | 68 | 47.4 (69.8) | 89 | 61.0 (68.5) | 109 | 73.8 (67.7) |
| 60 | 80 | 38 | 27.8 (73.1) | 48 | 34.4 (71.6) | 59 | 41.6 (70.5) |
| 65 | 75 | 142 | 101.6 (71.6) | 189 | 133.6 (70.7) | 234 | 164.1 (70.1) |
| 65 | 80 | 63 | 47.2 (74.9) | 81 | 59.7 (73.7) | 99 | 72.2 (72.9) |
| 65 | 85 | 34 | 26.7 (78.5) | 44 | 33.8 (76.8) | 52 | 39.5 (75.9) |
| 70 | 80 | 129 | 98.9 (76.6) | 170 | 128.8 (75.8) | 209 | 157.2 (75.2) |
| 70 | 85 | 56 | 44.8 (80.1) | 72 | 56.8 (78.9) | 87 | 67.9 (78.1) |
| 75 | 85 | 113 | 92.3 (81.7) | 147 | 118.9 (80.9) | 179 | 143.8 (80.3) |
| 75 | 90 | 48 | 40.9 (85.3) | 60 | 50.5 (84.2) | 71 | 59.3 (83.5) |
| 80 | 90 | 93 | 80.7 (86.8) | 119 | 102.4 (86.0) | 143 | 122.3 (85.5) |

R UL upper limit of RED zone (expressed as percentage of total sample), G LL lower limit of GREEN zone (expressed as percentage of total sample), A C AMBER-statistical significance threshold (within the AMBER zone) where an observed estimate below the cut-point will result in a non-significant result ( p ≥ 0.05) and figures at or above the cut-point will be significant ( p < 0.05) (%, as a percentage of n )

Sample sizes were derived using the normal approximation to the binomial distribution (with continuity correction) formula given in the Appendix , which by convention is stable for np > 5 and n (1 − p ) > 5.

For this approach, A C % is calculated from the 1-sided upper 95% confidence limit for the null proportion: 100% × ( R UL + z 1−α √(( R UL (1 − R UL ))/ n )) [e.g. for R UL = 20% v G LL = 35%, n = 78, power 90%: A C % = 100% × (0.2 + 1.645√((0.2(1 − 0.2))/78)) = 27.5%. In the example this is expressed as a proportion (0.275)]

The A C values do not account for the continuity correction (− 0.5 deduction) which would need to be applied to the observed count from a study prior to cross-checking against the A C cut-offs provided here

Table 2 Sample size and significance cut-points for (G LL − R UL) differences for a one-sample test, power (80%, 90%, 95%) and 1-tailed 5% significance level, based on the binomial exact test

| R UL (%) | G LL (%) | n (80% power) | A C (A C %) | n (90% power) | A C (A C %) | n (95% power) | A C (A C %) |
|---|---|---|---|---|---|---|---|
| 10 | 20 | 78 | 13 (16.7) | 109 | 17 (15.6) | 135 | 20 (14.8) |
| 15 | 25 | 101 | 22 (21.8) | 136 | 28 (20.6) | 176 | 35 (19.9) |
| 15 | 30 | 48 | 12 (25.0) | 64 | 15 (23.4) | 85 | 19 (22.4) |
| 20 | 30 | 116 | 31 (26.7) | 160 | 41 (25.6) | 204 | 51 (25.0) |
| 20 | 35 | 56 | 17 (30.4) | 77 | 22 (28.6) | 98 | 27 (27.6) |
| 20 | 40 | 35 | 12 (34.3) | 47 | 15 (31.9) | 60 | 18 (30.0) |
| 25 | 35 | 129 | 41 (31.8) | 179 | 55 (30.7) | 230 | 69 (30.0) |
| 25 | 40 | 62 | 22 (35.5) | 83 | 28 (33.7) | 107 | 35 (32.7) |
| 25 | 45 | 36 | 14 (38.9) | 49 | 18 (36.7) | 62 | 22 (35.5) |
| 25 | 50 | 26 | 11 (42.3) | 33 | 13 (39.4) | 42 | 16 (38.1) |
| 30 | 40 | 144 | 53 (36.8) | 193 | 69 (35.8) | 248 | 87 (35.1) |
| 30 | 45 | 67 | 27 (40.3) | 93 | 36 (38.7) | 114 | 43 (37.7) |
| 30 | 50 | 39 | 17 (43.6) | 53 | 22 (41.5) | 67 | 27 (40.3) |
| 30 | 55 | 25 | 12 (48.0) | 36 | 16 (44.4) | 44 | 19 (43.2) |
| 30 | 60 | 17 | 9 (52.9) | 25 | 12 (48.0) | 28 | 13 (46.4) |
| 35 | 45 | 148 | 62 (41.9) | 206 | 84 (40.8) | 262 | 105 (40.1) |
| 35 | 50 | 68 | 31 (45.6) | 96 | 42 (43.8) | 119 | 51 (42.9) |
| 35 | 55 | 41 | 20 (48.8) | 53 | 25 (47.2) | 68 | 31 (45.6) |
| 35 | 60 | 26 | 14 (53.8) | 36 | 18 (50.0) | 45 | 22 (48.9) |
| 35 | 65 | 19 | 11 (57.9) | 24 | 13 (54.2) | 29 | 15 (51.7) |
| 40 | 50 | 158 | 74 (46.8) | 214 | 98 (45.8) | 268 | 121 (45.1) |
| 40 | 55 | 71 | 36 (50.7) | 94 | 46 (48.9) | 119 | 57 (47.9) |
| 40 | 60 | 42 | 23 (54.8) | 56 | 29 (51.8) | 67 | 34 (50.7) |
| 40 | 65 | 28 | 16 (57.1) | 34 | 19 (55.9) | 45 | 24 (53.3) |
| 40 | 70 | 19 | 12 (63.2) | 25 | 15 (60.0) | 28 | 16 (57.1) |
| 45 | 55 | 154 | 80 (51.9) | 220 | 112 (50.9) | 269 | 135 (50.2) |
| 45 | 60 | 70 | 39 (55.7) | 98 | 53 (54.1) | 119 | 63 (52.9) |
| 45 | 65 | 42 | 25 (59.5) | 54 | 31 (57.4) | 68 | 38 (55.9) |
| 45 | 70 | 25 | 16 (64.0) | 36 | 22 (61.1) | 44 | 26 (59.1) |
| 45 | 75 | 16 | 11 (68.8) | 23 | 15 (65.2) | 29 | 18 (62.1) |
| 50 | 60 | 158 | 90 (57.0) | 213 | 119 (55.9) | 268 | 148 (55.2) |
| 50 | 65 | 69 | 42 (60.9) | 93 | 55 (59.1) | 119 | 69 (58.0) |
| 50 | 70 | 37 | 24 (64.9) | 53 | 33 (62.3) | 67 | 41 (61.2) |
| 50 | 75 | 23 | 16 (69.6) | 33 | 22 (66.7) | 42 | 27 (64.3) |
| 55 | 65 | 150 | 93 (62.0) | 210 | 128 (61.0) | 262 | 158 (60.3) |
| 55 | 70 | 70 | 46 (65.7) | 92 | 59 (64.1) | 114 | 72 (63.2) |
| 55 | 75 | 37 | 26 (70.3) | 50 | 34 (68.0) | 62 | 41 (66.1) |
| 60 | 70 | 143 | 96 (67.1) | 197 | 130 (66.0) | 248 | 162 (65.3) |
| 60 | 75 | 62 | 44 (71.0) | 85 | 59 (69.4) | 107 | 73 (68.2) |
| 60 | 80 | 36 | 27 (75.0) | 45 | 33 (73.3) | 60 | 43 (71.7) |
| 65 | 75 | 133 | 96 (72.2) | 180 | 128 (71.1) | 230 | 162 (70.4) |
| 65 | 80 | 55 | 42 (76.4) | 75 | 56 (74.7) | 98 | 72 (73.5) |
| 65 | 85 | 31 | 25 (80.6) | 42 | 33 (78.6) | 52 | 40 (76.9) |
| 70 | 80 | 119 | 92 (77.3) | 164 | 125 (76.2) | 204 | 154 (75.5) |
| 70 | 85 | 49 | 40 (81.6) | 69 | 55 (79.7) | 85 | 67 (78.8) |
| 75 | 85 | 103 | 85 (82.5) | 139 | 113 (81.3) | 176 | 142 (80.7) |
| 75 | 90 | 45 | 39 (86.7) | 55 | 47 (85.5) | 70 | 59 (84.3) |
| 80 | 90 | 82 | 72 (87.8) | 112 | 97 (86.6) | 135 | 116 (85.9) |

R UL upper limit of RED zone (expressed as percentage of total sample), G LL lower limit of GREEN zone (expressed as percentage of total sample), A C AMBER-statistical significance threshold (within the AMBER zone) where an observed estimate below the cut-point will result in a non-significant result ( p ≥ 0.05) and figures at or above the cut-point will be significant ( p < 0.05) (%, expressed as a percentage of sample size ( n ))
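Entries in Table 2 can be cross-checked directly from the binomial distribution. The helper below is our own sketch (not from the paper): it finds the smallest count significant at one-sided α and the exact power at G LL, reproducing the R UL = 20% v G LL = 35%, 90% power row:

```r
# Exact one-sided test: smallest significant count (A_C) and power at p1
exact_power <- function(n, p0, p1, alpha = 0.05) {
  x_crit <- qbinom(1 - alpha, n, p0) + 1  # smallest x with P(X >= x | p0) <= alpha
  c(A_C = x_crit, power = 1 - pbinom(x_crit - 1, n, p1))
}
exact_power(77, p0 = 0.20, p1 = 0.35)  # A_C = 22 (28.6% of n), power ~ 0.90
```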

Multi-criteria assessment

We recommend that progression for all key feasibility criteria should be considered separately, and hence overall progression would be determined by the worst-performing criterion, e.g. RED if at least one signal is RED, AMBER if none of the signals fall into RED but at least one falls into AMBER, and GREEN if all signals fall into the GREEN zone. Hence, the GREEN signal to ‘GO’ across the set of individual criteria will indicate that progression to a main trial can take place without any necessary changes. A signal to ‘STOP’ and not proceed to a main trial is recommended if any of the observed estimates are ‘unacceptably’ low (i.e. fall within the RED zone). Otherwise, where neither ‘GO’ nor ‘STOP’ are signalled, the design of the trial will need amending, as indicated by subpar performance on one or more of the criteria.
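A sketch of this worst-signal rule in R (names ours):

```r
# Overall progression decision = worst signal across all criteria
overall_signal <- function(signals) {
  if (any(signals == "RED"))        "RED (STOP)"
  else if (any(signals == "AMBER")) "AMBER (AMEND)"
  else                              "GREEN (GO)"
}
overall_signal(c("GREEN", "AMBER", "GREEN"))  # "AMBER (AMEND)"
```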

Sample size requirements across multi-criteria will vary according to the designated parameters linked to the progression criteria, which may be set at different stages of the study on different numbers of patients (e.g. those screened, eligible, recruited and randomised, allocated to the intervention arm, total followed up). The overall size needed will be dictated by the requirement to power each of the multi-criteria statistical tests. Since these tests will yield separate conclusions in regard to the decision to ‘STOP’, ‘AMEND’ or ‘GO’ across all individual feasibility criteria, there is no need to consider a multiple testing correction with respect to alpha. However, researchers may wish to increase power (and hence, sample size) to ensure adequate power to detect ‘GO’ signals across the collective set of feasibility criteria. For example, powering at 90% across three criteria (assumed independent) will ensure a collective power of 73% (i.e. 0.9^3 = 0.729), which may be considered reasonable, but 80% power across five criteria will reduce the power of the combined test to 33% (0.8^5 = 0.328). The final three columns of Table 1 cover the sample sizes required for 95% power, which may address collective multi-criteria assessment when a high overall statistical power is to be maintained.

Further expansion of AMBER zone

Within the same sample size framework, the AMBER zone may be further split to indicate whether ‘minor’ or ‘major’ amendments are required, according to the significance of the p value. Consider a 2-way split in the AMBER zone denoted by cut-off A C, which indicates the threshold for statistical significance, where an observed estimate below the cut-point will result in a non-significant result and an estimate at or above the cut-point a significant result. Let AMBER R denote the region of the AMBER zone adjacent to the RED zone, between R UL and A C, and AMBER G denote the region of the AMBER zone between A C and G LL, adjacent to the GREEN zone. This draws on two possible levels of amendment (‘major’ AMEND and ‘minor’ AMEND) and the re-configured approach follows as:

  • R UL < E < G LL and P ≥ α { R UL < E < A c } -> AMBER R (major AMEND)
  • R UL < E < G LL and P < α { A c ≤ E < G LL } -> AMBER G (minor AMEND)

In Tables 1 and 2, in relation to the designated sample sizes for different R UL and G LL and specified α and β, we show the corresponding cut-points for statistical significance ( p < 0.05), both in absolute terms of sample number ( n ) [ A C ] and as a percentage of the total sample size [ A C % ].
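Under the normal-approximation version, A C follows directly from R UL, n and α, so the 4-tier classification can be sketched as follows (function name ours; boundary conventions as in the rules above):

```r
# 4-tier classification: AMBER is split at the significance threshold
# A_C = R_UL + z_(1 - alpha) * sqrt(R_UL * (1 - R_UL) / n)
classify_4tier <- function(E, n, R_UL, G_LL, alpha = 0.05) {
  A_C <- R_UL + qnorm(1 - alpha) * sqrt(R_UL * (1 - R_UL) / n)
  if (E <= R_UL)     "RED (STOP)"
  else if (E < A_C)  "AMBER_R (major AMEND)"
  else if (E < G_LL) "AMBER_G (minor AMEND)"
  else               "GREEN (GO)"
}
classify_4tier(0.70, n = 34, R_UL = 0.50, G_LL = 0.75)  # "AMBER_G (minor AMEND)"
# A_C = 0.641 here, matching the treatment-fidelity case illustration below
```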

A motivating example (aligned to the normal approximation approach) is presented in Table 3, which illustrates a pilot trial with three progression criteria. Table 4 presents the sample size calculations for the example scenario following the 3-tiered approach, and Table 5 gives the sample size calculations for the example scenario using the extended 4-tiered approach. Cut-points for the feasibility outcomes relating to the shown sample sizes are also presented to show RED, AMBER and GREEN zones for each of the three progression criteria.

Motivating example—feasibility trial for oral protein energy supplements as flavoured drinks to improve nutritional status in children with cystic fibrosis

A feasibility trial is being set up to see whether children aged 2 to 15 years with cystic fibrosis will take oral protein energy supplements as flavoured drinks to improve their nutritional status, compared to receiving dietary advice alone. Children are to be randomised in a 1:1 allocation ratio using a parallel two-arm design. The research team wants to be sure they can meet three feasibility objectives before they go ahead and plan the main trial: reasonable recruitment uptake, high treatment fidelity (i.e. extent to which dietician practitioners comply with the treatment protocol) and adequate retention of children at follow up. The team asks their senior statistician to help them decide on an appropriate methodology including pilot sample size. The statistician suggests a traffic light approach incorporating hypothesis testing of the feasibility outcomes.

Together, the team devise three progression criteria that should be met before the main trial can be considered feasible as follows:

a. At least 35% of the children screened as eligible should be recruited but the trial will not be feasible if recruitment uptake is 20% or less.

b. A high level of treatment fidelity should be maintained, with 75% or more children being given the correct treatment plan by the dietician; but if 50% or fewer children are given the plan as specified in the protocol then the trial is not feasible.

c. 85% or more of the children should be retained in the study at follow up, with 65% or less retention indicating that the main trial is not feasible.

The decision criteria and required sample size around these are detailed through two possible approaches within Table 4 (simple 3-tier approach) and Table 5 (extended 4-tier approach). The statistician is to use the normal approximation method (with continuity correction) for the sample size calculation and analysis.

Case illustration (standard 3-tiered approach)

A two-arm parallel design (1:1 allocation to intervention and control arms) with three key feasibility objectives, to assess (i) recruitment uptake (percent of screened patients recruited), (ii) treatment fidelity and (iii) participant retention (follow up). Hypothesis testing incorporates alpha (1-sided) = 5% and power = 90%. The normal approximation method is used.

Assume the progression criteria (and affiliated sample size requirements) for each are as follows:

(i) Recruitment uptake ≤ 20% (RED zone) and ≥ 35% (GREEN zone) { R UL = 20%, G LL = 35%}

→ Required sample size n s = 78 [total screened patients]

(ii) Treatment fidelity ≤ 50% (RED zone) and ≥ 75% (GREEN zone) { R UL = 50%, G LL = 75%}

→ Required sample size n i = 34 [intervention arm only]

(iii) Follow up: ≤ 65% (RED zone), ≥ 85% (GREEN zone) { R UL = 65%, G LL = 85%}

→ Required sample size n r = 44 (total randomised participants with 22 per arm)

The sample sizes across criteria (i)–(iii) are at different levels—(i) is at the level of screened patients, whereas (ii)–(iii) are at the level of randomised patients. To meet criterion (i), we need n s ≥ 78 (although we will screen n s = 200 (i.e. (1/0.35) × n r , rounded up to 200), where 0.35 is the expected proportion uptake of the total number screened), and for (ii)–(iii), we need n r = 68 (34 per arm, based on (ii)).

Taking each of the objectives in turn (and the updated sample sizes to meet the multi-criteria objectives), we express progression criteria for the three objectives as follows:

(i) Recruitment uptake [required n s ≥ 78; expected n s = 200; maximum n s = 340 (i.e. (1/0.2) × n r )]

• E ≤ 0.2 [ P ≥ 0.05] -> RED (STOP)

• 0.2 < E < 0.35 -> AMBER (AMEND)

• E ≥ 0.35 [ P < 0.05] -> GREEN (GO)

Signals for expected n s = 200:

0 to 40 (RED), > 40 to < 70 (AMBER) and 70 to 200 (GREEN) {i.e. 0.2 × 200 = 40; 0.35 × 200 = 70}

(ii) Treatment fidelity [ n i = 34 (intervention arm only)]

• E ≤ 0.5 [ P ≥ 0.05] -> RED (STOP)

• 0.5 < E < 0.75 -> AMBER (AMEND)

• E ≥ 0.75 [ P < 0.05] -> GREEN (GO)

Signals for n i = 34:

0 to 17 (RED), > 17 to < 25.5 (AMBER) and 25.5 to 34 (GREEN) {i.e. 0.5 × 34 = 17; 0.75 × 34 = 25.5}

(iii) Follow up [ n r = 68 (intervention and control arms)]

• E ≤ 0.65 [ P ≥ 0.05] -> RED (STOP)

• 0.65 < E < 0.85 -> AMBER (AMEND)

• E ≥ 0.85 [ P < 0.05] -> GREEN (GO)

Signals for n r = 68:

0 to 44.2 (RED), > 44.2 to < 57.8 (AMBER) and 57.8 to 68 (GREEN) {i.e. 0.65 × 68 = 44.2; 0.85 × 68 = 57.8}

[Note: The continuity correction (− 0.5 deduction) needs to be applied to the observed count from the study for each criterion prior to assessing into which signal band it falls]

In accordance with the multi-criteria aim, the decision to proceed would be based on the worst signal

➢ If signal = RED for (i) or (ii) or (iii) -> overall signal is RED

➢ Else, if no signal is RED but signal = AMBER for (i) or (ii) or (iii) -> overall signal is AMBER

➢ Else, if signals = GREEN for (i) and (ii) and (iii) -> overall signal is GREEN

R UL upper limit of RED zone, G LL lower limit of GREEN zone, n s number of screened patients who are eligible to be randomised, n r number of eligible patients randomised, n i number of patients randomised to the intervention arm

Case illustration (re-visited using 4-tiered approach)

Taking each of the objectives in turn, we re-express the progression criteria for the three objectives according to the 4-tiered approach, as follows:

(i) Recruitment uptake [expected n s = 200]

• E ≤ 0.2 [ P ≥ 0.05] -> RED (STOP)

• 0.2 < E < 0.35 -> AMBER (AMEND)//{ A C = 0.247 (i.e. 0.2 + 1.645√(0.2 × 0.8/200))}*

o 0.2 < E < 0.247 [ P ≥ 0.05] -> AMBER R (AMEND-major)

o 0.247 ≤ E < 0.35 [ P < 0.05] -> AMBER G (AMEND-minor)

• E ≥ 0.35 [ P < 0.05] -> GREEN (GO)

Signals for n s = 200:

0 to 40 (RED), > 40 to < 49.4 (AMBER R ), 49.4 to < 70 (AMBER G ) and 70 to 200 (GREEN) {i.e. 0.2 × 200 = 40, 0.247 × 200 = 49.4, 0.35 × 200 = 70}

(ii) Treatment fidelity [ n i = 34 (intervention arm only)]

• E ≤ 0.5 [ P ≥ 0.05] -> RED (STOP)

• 0.5 < E < 0.75 -> AMBER (AMEND)//{ A C = 0.641 (i.e. 0.5 + 1.645√(0.5 × 0.5/34))—as shown in Table 1}*

o 0.5 < E < 0.641 [ P ≥ 0.05] -> AMBER R (AMEND-major)

o 0.641 ≤ E < 0.75 [ P < 0.05] -> AMBER G (AMEND-minor)

• E ≥ 0.75 [ P < 0.05] -> GREEN (GO)

Signals for n i = 34:

0 to 17 (RED), > 17 to < 21.79 (AMBER R ), 21.79 to < 25.5 (AMBER G ) and 25.5 to 34 (GREEN) {i.e. 0.5 × 34 = 17, 0.641 × 34 = 21.794, 0.75 × 34 = 25.5}

(iii) Follow up [ n r = 68 (intervention and control arms)]

• E ≤ 0.65 [ P ≥ 0.05] -> RED (STOP)

• 0.65 < E < 0.85 -> AMBER (AMEND)//{ A C = 0.745 (i.e. 0.65 + 1.645 × √(0.65 × 0.35/68))}*

o 0.65 < E < 0.745 [ P ≥ 0.05] -> AMBER R (AMEND-major)

o 0.745 ≤ E < 0.85 [ P < 0.05] -> AMBER G (AMEND-minor)

• E ≥ 0.85 [ P < 0.05] -> GREEN (GO)

Signals for n r = 68:

0 to 44.2 (RED), > 44.2 to < 50.66 (AMBER R ), 50.66 to < 57.8 (AMBER G ) and 57.8 to 68 (GREEN) {i.e. 0.65 × 68 = 44.2, 0.745 × 68 = 50.66, 0.85 × 68 = 57.8}

[Note: The continuity correction (-0.5 deduction) needs to be applied to the observed count from the study for each criterion prior to assessing into which signal band it falls]

In accordance with the multi-criteria aim, the decision to proceed would be based on the worst signal (as in Table 4)

n s number of screened patients who are eligible to be randomised, n r number of eligible patients randomised, n i number of patients randomised to the intervention arm

* A C is calculated from the 1-sided upper 95% confidence limit for the null proportion: R UL + z 1−α √(( R UL (1 − R UL ))/ n ) where z 1−α = 1.645 (for 1-sided 5% significance test)

Overall sample size requirement should be dictated by the multi-criteria approach. This is illustrated in Table 4, where we have three progression criteria, each with a different denominator population. For recruitment uptake, the denominator denotes the total number of children screened and the numerator the number of children randomised; for follow-up, the denominator is the number of children randomised, with the numerator being the number of those randomised who are successfully followed up; and lastly, for treatment fidelity, the denominator is the number allocated to the intervention arm, with the numerator being the number of children who were administered the treatment correctly by the dietician. In the example, in order to meet the individual ≥ 90% power requirement for all three criteria, we would need: (i) for recruitment, the number to be screened to be 78; (ii) for treatment fidelity, the number in the intervention arm to be 34; and (iii) for follow up, the number randomised to be 44. In order to determine the overall sample size for the whole study, we base our decision on the criterion that requires the largest numbers, which is the treatment fidelity criterion requiring 68 to be randomised (34 per arm). We cannot base our decision on the 78 required to be screened for recruitment because this would give only an expected number of 28 randomised (i.e. 35% of 78). If we expect 35% recruitment uptake, then we need to inflate the total of 68 (randomised) to 195 (1/0.35 × 68) children to be screened (rounded to 200). This would give 99.9%, 90% and 98.8% power for criteria (i), (ii) and (iii), respectively (assuming 68 of the 200 screened are randomised), giving a very reasonable collective 88.8% power of rejecting the null hypotheses over the three criteria if the alternative hypotheses (for acceptable feasibility outcomes) are true in each case.
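The collective figure quoted above is simply the product of the per-criterion powers under the stated independence assumption:

```r
# Collective power across the three criteria (assumed independent)
prod(c(0.999, 0.90, 0.988))  # ~0.888, i.e. the quoted 88.8%
```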

Inherent in our approach are the probabilities around sample size, power and hypothesised feasibility parameters. For example, taking the cut-offs for treatment fidelity as a feasibility outcome from Table 4 (ii), we set a lower GREEN zone limit of G LL = 0.75 (“acceptable” (hypothesised alternative value)) and an upper RED zone limit of R UL = 0.5 (“not acceptable” (hypothesised null value)) for rejecting the null for this criterion, based on 90% power and a 1-sided 5% significance level (alpha). Figure 1 presents the normal probability density functions for ε under the null and alternative hypotheses. In the illustration, normal sampling theory implies that if G LL holds true (i.e. true feasibility outcome ( ε ) = G LL ) there would be the following:

  • A probability of 0.1 (type II error probability β) of the estimate falling within RED/AMBER R zones (i.e. blue shaded area under the curve to the left of A C where the test result will be non-significant ( p ≥ 0.05))
  • Probability of 0.4 of it falling in the AMBER G zone (i.e. area under the curve to the right of A C but below G LL )
  • Probability of 0.5 of the estimate falling in the GREEN zone (i.e. G LL and above).

Fig. 1 Illustration of power using the 1-tailed hypothesis testing against the traffic light signalling approach to pilot progression. E , observed point estimate; R UL , upper limit of RED zone; G LL , lower limit of GREEN zone; A C , cut-off for statistical significance (at the 1-sided 5% level); α , type I error; β , type II error

If R UL (the null) holds true (i.e. true feasibility outcome ( ε ) = R UL ), there would be the following:

  • A probability of 0.05 (one-tailed type I error probability α ) of the statistic/estimate falling in the AMBER G /GREEN zones (i.e. pink shaded area under the curve to the right of A C where the test result will be significant ( p < 0.05), as shown within Fig. 1)
  • Probability of 0.45 of it falling in the AMBER R zone (i.e. to the left of A C but above R UL )
  • Probability of 0.5 of the estimate falling in the RED zone (i.e. R UL and below)

Figure 1 also illustrates how changing the sample size affects the sampling distribution and power of the analysis around the set null value (at R UL ) when the hypothesised alternative ( G LL ) is true. The figure emphasises the need for a large enough sample to safeguard against under-powering of the pilot analysis (as shown in the last plot, which has a wider bell-shape than the first two plots and where the size of the beta probability is increased).

Figure 2 plots the probabilities of making each type of traffic light decision as functions of the true parameter value (focused on the recruitment uptake example from Table 5 (i)). Additional file 1 presents the R code for reproducing these probabilities and enables readers to insert different parameter values.
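Additional file 1 carries the authors' own code; as an independent minimal sketch of the binomial exact variant for the Fig. 2 setting (n = 200 with RED/GREEN count boundaries 40 and 70, per the signal bands given earlier):

```r
# P(RED), P(AMBER), P(GREEN) as functions of the true event probability p,
# using the exact binomial: counts <= 40 are RED, 41-69 AMBER, >= 70 GREEN
light_probs <- function(p, n = 200, red_max = 40, green_min = 70) {
  c(RED   = pbinom(red_max, n, p),
    AMBER = pbinom(green_min - 1, n, p) - pbinom(red_max, n, p),
    GREEN = 1 - pbinom(green_min - 1, n, p))
}
round(sapply(c(0.20, 0.28, 0.35), light_probs), 3)
```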

Fig. 2 Probability of traffic light decision given the true underlying probability of an event, using the example from Table 5 (i). Two plots are presented: a relating to the normal approximation approach and b relating to the binomial exact approach. Based on n = 200, R UL = 40 and G LL = 70

The methodology introduced in this article provides an innovative formal framework and approach to sample size derivation, aligning sample size requirement to progression criteria with the intention of providing greater transparency to the progression process and full engagement with the standard aims and objectives of pilot/feasibility studies. Through the use of both alpha and beta parameters (rather than alpha alone), the method ensures rigour and capacity to address the progression criteria by ensuring there is adequate power to detect an acceptable threshold for moving forward to the main trial. As several key process outcomes are assessed in parallel and in combination, the method embraces a composite multi-criterion approach that appraises signals for progression across all the targeted feasibility measures. The methodology extends beyond the requirement for ‘sample size justification but not necessarily sample size calculation’ [ 28 ].

The focus of the strategy reported here is on process outcomes, which align with the recommended key objectives of primary feasibility evaluation for pilot and feasibility studies [ 2 , 24 ] and necessary targets to address key issues of uncertainty [ 29 ]. The concept of justifying progression is key. Charlesworth et al. [ 30 ] developed a checklist for intended use in decision-making on whether pilot data could be carried forward to a main trial. Our approach builds on this philosophy by introducing a formalised hypothesis test approach to address the key objectives and pilot sample size. Though the suggested sample size derivation focuses around the key process objectives, it may also be the case that other objectives are also important, e.g. assessment of precision of clinical outcome parameters. In this case, researchers may also wish to ensure that the size of the study suitably covers the needs of those evaluations, e.g. to estimate the SD of the intended clinical outcome, then the overall sample size may be boosted to cover this additional objective [ 10 ]. This tallies with the review by Blatch-Jones et al. [ 31 ] who reported that testing recruitment, determining the sample size and numbers available, and the intervention feasibility were the most commonly used targets of pilot evaluations.

Hypothesis testing in pilot studies, particularly in the context of effectiveness/efficacy of clinical outcomes, has been widely criticised due to the improper purpose and lack of statistical power of such evaluations [ 2 , 20 , 21 , 23 ]. Hence, pilot evaluations of clinical outcomes are not expected to include hypothesis testing. Since the main focus is on feasibility the scope of the testing reported here is different and importantly relates back to the recommended objectives of the study whilst also aligning with nominated progression criteria [ 2 ]. Hence, there is clear justification for this approach. Further, for the simple 3-tiered approach hypothesis testing is somewhat hypothetical: there is no need to physically carry out a test since the zonal positioning of the observed sample statistic estimate for the feasibility outcome will determine the decision in regard to progression; thus adding to the simplicity of the approach.

The link between the sample size and need to adequately power the study to detect a meaningful feasibility outcome gives this approach the extra rigour over the confidence interval approach. It is this sample size-power linkage that is key to the determination of the respective probabilities of falling into the different zones and is a fundamental underpinning to the methodological approach. In the same way as for a key clinical outcome in a main trial where the emphasis is not just on alpha but also on beta thereby addressing the capacity to detect a clinically significant difference, similarly, our approach is to ensure there is sufficient capacity to detect a meaningful signal for progression to a main trial if it truly exists. A statistically significant finding in this context will at least provide evidence to reject RED (signifying a decision to STOP) and in the 4-tiered case it would fall above AMBER R (decision to major-AMEND); hence, the estimate will fall into AMBER G or GREEN (signifying a decision to minor-AMEND or GO, respectively). The importance of adequately powering the pilot trial to address a feasibility criterion can be simply illustrated. For example, if we take R UL as 50% and G LL as 75% but with two different sample sizes of n = 25 and n = 50; the former would have 77.5% power of rejecting RED on the basis of a 1-sided 5% alpha level whereas the larger sample size would have 97.8% power of rejecting RED. So, if G LL holds true, there would be 20% higher probability of rejecting the null and being in the AMBER G /GREEN zone for the larger sample giving an increased chance of progressing to the main trial. It will be necessary to carry out the hypothesis test for the extended 4-tier approach if the observed statistic ( E ) falls in the AMBER zone to determine statistical significance or not, which will inform whether the result falls into the ‘minor’ or ‘major’ AMBER sub-zones.

We provide recommended sample sizes within a look-up grid of perceived likely progression cut-points, giving researchers quick access to retrievable sample sizes. For a difference between the hypothesised null and alternative proportions of 0.15 to 0.25, with α = 0.05 and β = 0.1, the corresponding total sample size requirements under the normal approximation with continuity correction range from 33 to 100 (median 56) [similarly, 33 to 98 (median 54) for the binomial exact method]. Note, for treatment fidelity/adherence/compliance in particular, the marginal difference could be higher, e.g. ≥ 25%, since in most situations we would anticipate and hope to attain a high value for the outcome whilst being prepared to make necessary changes across a wide interval of below-par values (provided the value is not unacceptably low). As this relates to an arm-specific objective (evaluation of the intervention only), a usual 1:1 pilot will require twice the size; hence, the arm-specific sample size powered to detect a ≥ 25% difference from the null would be about 34 or lower, as depicted in our illustration (Table 4(ii)), equating to n ≤ 68 overall for a 1:1 pilot (intervention and control arms). We therefore expect that typical pilot sizes of around 30–40 randomised per arm [ 16 ] would fit the methodology proposed in this manuscript (the number needed for screening being extrapolated upward of this figure), but if a smaller marginal difference (e.g. ≤ 15%) is to be tested then these sample sizes may fall short. We stress that the overall required sample size needs to be carefully considered and determined in line with the hypothesis testing approach across all criteria, ensuring sufficiently high power. In this paper, we have made sample size recommendations based on both the normal approximation (with continuity correction) and the binomial exact approach; both are conservative compared with the normal approximation without continuity correction.
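For the binomial exact method referenced above, the required sample size can be found by searching for the smallest n whose exact one-sided test achieves the target power; the following is a sketch under our assumptions, not the authors' code:

```python
from scipy.stats import binom

def exact_sample_size(r_ul, g_ll, alpha=0.05, beta=0.10, n_max=500):
    """Smallest n for which the exact one-sided binomial test of
    H0: p = R_UL attains power >= 1 - beta when the true parameter is G_LL.

    Note: exact binomial power is non-monotone ('sawtooth') in n, so some
    authors instead require the power condition to hold for all larger n.
    """
    for n in range(2, n_max + 1):
        # smallest critical count c with P(X >= c | n, R_UL) <= alpha
        c = 0
        while binom.sf(c - 1, n, r_ul) > alpha:  # sf(c - 1) = P(X >= c)
            c += 1
        if binom.sf(c - 1, n, g_ll) >= 1 - beta:  # power at p = G_LL
            return n
    return None

# e.g. a 25-point margin (R_UL = 0.50, G_LL = 0.75) at alpha = 0.05, beta = 0.10
print(exact_sample_size(0.50, 0.75))  # ~33, cf. the 33-98 range quoted above
```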

Importantly, the methodology outlines the necessary multi-criterion approach to the evaluation of pilot and feasibility studies. If all progression criteria perform as well as anticipated (signalling 'GO' according to every criterion), then the recommendation of the pilot/feasibility study is that all criteria meet their desired levels with no need for adjustment and the main trial can proceed without amendment. However, if the worst signal across all measured criteria is AMBER, then adjustment will be required for the criteria falling within that signal, and those criteria may need subsequent re-assessment to re-evaluate processes in line with their updated performance. If one or more of the feasibility statistics fall within the RED zone, this signals 'STOP' and concludes that a main trial is not feasible on the basis of those criteria. This approach of collectively appraising progression across all feasibility outcomes assessed against their criteria will be conservative, since the power of the collective will be lower than the individual power of the separate tests; hence, it is recommended that the power of the individual tests is set high enough (for example, 90–95%) to ensure the collective power is sufficiently high (e.g. at least 70 or 80%) to detect true 'GO' signals across all the feasibility criteria.
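As a rough guide to this collective power, if the individual criterion tests can be treated as independent (an assumption made purely for illustration), the collective power is the product of the individual powers:

```python
from math import prod

def collective_power(individual_powers):
    """Probability that all criteria correctly signal 'GO', assuming
    independence between the individual tests (illustrative assumption)."""
    return prod(individual_powers)

print(round(collective_power([0.90] * 3), 3))  # 0.729: three criteria at 90% power
print(round(collective_power([0.95] * 3), 3))  # 0.857: three criteria at 95% power
```

This illustrates why individual powers of around 90–95% are needed to keep the collective power at or above the 70–80% suggested above.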

In this article, we also expand the possibilities for progression criteria and hypothesis testing by sub-dividing the AMBER zone according to the statistical significance of the result. This may work well when the AMBER zone spans a wide range and is intended to provide a useful, workable indication of the level of amendment ('minor' (non-substantive) or 'major' (substantive)) required to progress to the main trial. Examples of substantial amendments include: study re-design with possible re-appraisal and change of statistical parameters; inclusion of several additional sites; adding further recruitment methods; significant reconfiguration of exclusions; major change to the method of delivering the trial intervention to ensure enhanced treatment fidelity/adherence; enhanced measures to systematically ensure greater patient compliance with allocated treatment; and additional mode(s) of collecting and retrieving data (e.g. electronic data collection in addition to postal questionnaires). Minor amendments include small changes to the protocol and methodology, e.g. adding one or two sites to attain a slightly higher recruitment rate, occasional reminders regarding the treatment protocol, and a further reminder process to boost follow-up. For the most likely parametrisation of α = 0.05/β = 0.1, the AMBER zone division will fall roughly at the midpoint. However, researchers can choose this major/minor cut-point based on decisive arguments about how major and minor amendments align to the outcome in question; this should be factored into the sample size determination for the pilot. In this regard, a smaller sample size will move A_C upwards (due to increased standard error/reduced precision) and hence increase the size of the AMBER_R zone relative to AMBER_G, whereas a larger sample size will shift A_C downwards and do the opposite, increasing the ratio of AMBER_G:AMBER_R. From Table 1, for smaller sample sizes (related to 80% power) the AMBER_R zone makes up 56–69% of the total AMBER zone across the presented scenarios, falling to 47–61% for samples related to 90% power and 41–56% for larger samples related to 95% power. Beyond our proposed 4-tier approach, other ways of indicating the level of amendment could include evaluation and review of the point and interval estimates, or evaluation of posterior probabilities via a Bayesian approach [ 14 , 32 ].
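The dependence of the AMBER_R:AMBER_G split on sample size can be illustrated by locating A_C as the smallest observed proportion that reaches one-sided significance against R_UL. The sketch below uses the continuity-corrected normal approximation with illustrative cut-points, so the resulting percentages will not exactly match the scenarios of Table 1:

```python
from math import sqrt
from scipy.stats import norm

def amber_cut_point(n: int, r_ul: float, alpha: float = 0.05) -> float:
    """Approximate A_C: smallest observed proportion significant (one-sided)
    against H0: p = R_UL, using the continuity-corrected normal test."""
    z = norm.ppf(1 - alpha)
    return r_ul + z * sqrt(r_ul * (1 - r_ul) / n) + 1 / (2 * n)

R_UL, G_LL = 0.50, 0.75
for n in (25, 50):
    a_c = amber_cut_point(n, R_UL)
    share = (a_c - R_UL) / (G_LL - R_UL)  # fraction of AMBER occupied by AMBER_R
    print(f"n={n}: A_C={a_c:.3f}, AMBER_R share={share:.0%}")
# n=25: A_C=0.684, AMBER_R share=74%
# n=50: A_C=0.626, AMBER_R share=51%
```

Consistent with the text, the smaller sample pushes A_C upwards and enlarges AMBER_R relative to AMBER_G.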

The methodology illustrated here focuses on feasibility outcomes expressed as percentages/proportions, which is likely to be the most common form for progression criteria under consideration. However, the steps introduced can be readily adapted to any feasibility outcome in a numerical format, e.g. the rate of recruitment per month per centre or the count of centres taking part in the study. Also, in the examples presented in the paper (recruitment, treatment fidelity and percentage follow-up), high proportions are acceptable and low ones are not; the reverse scale would be required for outcomes such as adverse events.

Biased sample estimates are a concern as they may lead to a wrong decision. This systematic error is over and above the possibility of an erroneous decision arising from sampling error; the latter may be reduced by increasing the pilot sample size. A positive bias will inflate the feasibility estimate in favour of progressing, whereas a negative bias will deflate it towards the null and stopping. Both are problematic for opposite reasons: the former may inform researchers that the main trial can 'GO' ahead when in fact it will struggle to meet key feasibility targets, whereas the latter may caution against progression when in reality the feasibility targets of a main trial would be met. For example, in regard to the choice of centres (and hence practitioners and participants), a common concern is that the selection of feasibility trial centres might not be a fair and representative sample of the 'population' of centres to be used for the main trial. The host centre (likely used in pilot studies) may recruit far better than others (positive bias), thus exaggerating the signal to progress and the subsequent recruitment to the main trial. Beets et al. [ 33 ] 'define "risk of generalizability biases" as the degree to which features of the intervention and sample in the pilot study are NOT scalable or generalizable to the next stage of testing in a larger, efficacy/effectiveness trial … whether aspects like who delivers an intervention, to whom it is delivered, or the intensity and duration of the intervention during the pilot study are sustained in the larger, efficacy/effectiveness trial.' As in other types of studies, safeguards against bias should be addressed through appropriate pilot study design and conduct.

Issues relating to progression criteria for internal pilots may differ from those for external pilots and non-randomised feasibility studies. The consequence of a 'STOP' within an internal pilot may be more serious for stakeholders (researchers, funders, patients) as it would end the planned continuation into the main trial phase, whereas less is at stake for a negative external pilot. By contrast, a 'GO' signal may work the other way, with a clear and immediate gain for the internal pilot, whereas for an external pilot the researchers would still need to apply for and obtain the necessary funding and approvals to undertake the intended main trial. The chances of falling into the different traffic-light zones are also likely to differ between the two designs. External pilot and feasibility studies are possibly more likely to have estimates falling in and around the RED zone than internal pilots, reflecting the greater uncertainty in processes for the former and the greater confidence in the mechanisms for trial delivery for the latter. To counter this, however, there are often substantial recruitment challenges within internal pilot studies, where the target population is usually spread over more diverse sites than might be expected for an external pilot. Despite this possible imbalance, the interpretation of zonal indications remains consistent for external and internal pilot studies. As such, our recommendations in this article are aligned to the requirements of external pilots, though the methodology may to a degree similarly apply to internal pilots (and, further, to non-randomised studies that can include progression criteria, including longitudinal observational cohorts, with the omission of the treatment fidelity criterion).

Conclusions

We propose a novel framework that represents a paradigm shift towards formally testing feasibility progression criteria in pilot and feasibility studies. The outlined approach ensures rigorous and transparent reporting in line with CONSORT recommendations for the evaluation of STOP-AMEND-GO criteria and presents clear progression sign-posting, which should aid decision-making and inform stakeholders. The targeted progression criteria focus on recommended pilot and feasibility objectives, particularly recruitment uptake, treatment fidelity and participant retention, and these criteria guide the methodology for sample size derivation and statistical testing. This methodology is intended to provide a more definitive and rounded structure for pilot and feasibility design and evaluation than currently exists. Sample size recommendations will depend on the nature of, and cut-points for, the multiple key pre-defined progression criteria and should also ensure a sufficient sample size for other feasibility objectives, such as reviewing the precision of clinical parameters to better inform the main trial size.

Acknowledgements

We thank Professor Julius Sim, Dr Ivonne Solis-Trapala, Dr Elaine Nicholls and Marko Raseta for their feedback on the initial study abstract.

Abbreviations

Alpha (α): Significance level (type I error probability)
AMBER_G: AMBER sub-zone adjacent to the GREEN zone (within the 4-tiered approach)
AMBER_R: AMBER sub-zone adjacent to the RED zone (within the 4-tiered approach)
A_C: AMBER statistical significance threshold (within the AMBER zone); an observed estimate below this cut-point gives a non-significant result (p ≥ 0.05) and an estimate at or above it is significant (p < 0.05)
E%: E expressed as a percentage of the sample size
Beta (β): Type II error probability
E: Estimate of the feasibility outcome
F: True feasibility parameter
G_LL: Lower limit of the GREEN zone
n: Sample size (n_S = number of patients screened; n_R = number of patients randomised; n_I = number of patients randomised to the intervention arm only)
Power: 1 − β (1 − type II error probability)
R_UL: Upper limit of the RED zone

Mathematical formulae for derivation of sample size

The required sample size may be derived using the normal approximation to binary response data with a continuity correction, following Fleiss et al. [ 26 ], provided the convention np > 5 and n(1 − p) > 5 holds true:

$$n = \frac{\left(z_{1-\alpha}\sqrt{R_{UL}\,(1-R_{UL})} + z_{1-\beta}\sqrt{G_{LL}\,(1-G_{LL})}\right)^{2}}{\left(G_{LL}-R_{UL}\right)^{2}} + \frac{1}{G_{LL}-R_{UL}}$$

where R_UL = upper limit of the RED zone (null proportion); G_LL = lower limit of the GREEN zone (alternative proportion); z_{1−α} = standard normal deviate corresponding to the one-sided significance level α (type I error probability); z_{1−β} = standard normal deviate corresponding to power 1 − β (β = type II error probability); and the final term 1/(G_LL − R_UL) is the continuity correction.
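For convenience, the formula above translates directly into code. This sketch uses the standard Fleiss-type one-sample form and is not the authors' own implementation; rounding conventions may differ slightly from the published tables:

```python
from math import ceil, sqrt
from scipy.stats import norm

def fleiss_sample_size(r_ul: float, g_ll: float,
                       alpha: float = 0.05, beta: float = 0.10) -> int:
    """Sample size via the normal approximation with continuity correction
    (one-sided test of H0: p = R_UL against H1: p = G_LL)."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    n_uncorrected = ((z_a * sqrt(r_ul * (1 - r_ul))
                      + z_b * sqrt(g_ll * (1 - g_ll))) ** 2
                     / (g_ll - r_ul) ** 2)
    return ceil(n_uncorrected + 1 / (g_ll - r_ul))  # continuity correction

print(fleiss_sample_size(0.50, 0.75))  # 35 (about 34-35 for a 25-point margin)
```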

Authors’ contributions

ML and CJS conceived the original methodological framework for the paper. ML prepared draft manuscripts. KB and GMcC provided examples and illustrations. All authors contributed to the writing and provided feedback on drafts and steer and suggestions for article updating. All authors read and approved the final manuscript.

Funding

KB was supported by a UK 2017 NIHR Research Methods Fellowship Award (ref RM-FI-2017-08-006).

Availability of data and materials

Ethics approval and consent to participate, consent for publication, and competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
