A practical guide on conducting a literature search in PubMed

Hello, everyone! Today I want to share a skill I’ve been developing that has been very helpful in my day-to-day work whenever I need to learn more about a topic in the scientific literature. That skill is performing a good bibliographic search on literature databases, one of the main ones being PubMed. I will also provide a very simple Excel spreadsheet that I use to build the search strings I need.

My post will follow this flow:

1 - Introduction: the importance of a good literature search
2 - Establishing a research question and focus
3 - Basic concepts: MeSH terms, boolean operators and special characters
4 - Performing an adequate and straightforward literature search in PubMed

1 - Introduction: the importance of a good literature search

There exist two forms of knowledge: one is knowing the subject matter directly, and the other is knowing where to locate information about it.

As of the date I began writing this topic (October 1, 2023), typing the code “all[sb]” into PubMed revealed that the platform contained 36,272,745 publications. Of particular interest to our forum, I also decided to search for the number of articles published related to Parkinson’s Disease, finding 138,030. Below, I will also place some graphs illustrating how quickly these publications are growing year after year.
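If you would rather script this count than type it into the website, here is a minimal sketch that builds a request to NCBI’s E-utilities ESearch endpoint. The function names are my own (hypothetical), and I am assuming the JSON response keeps the hit count under "esearchresult" and "count". The numbers you get today will of course differ from the ones above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term: str) -> str:
    """Build an ESearch URL that asks PubMed only for the number of hits."""
    params = {"db": "pubmed", "term": term, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_count(term: str) -> int:
    """Run the query (needs internet access) and return the hit count.

    Assumption: the JSON payload stores the count as a string under
    payload["esearchresult"]["count"].
    """
    with urlopen(build_count_url(term), timeout=30) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


if __name__ == "__main__":
    # Only prints the URL; call fetch_count("all[sb]") yourself to hit the API.
    print(build_count_url("all[sb]"))
```

Keeping the URL builder separate from the network call makes it easy to check the request before sending it.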

As you can see, a very large number of publications is added to PubMed every year, including on Parkinson’s Disease. Publications on Parkinson’s Disease are growing even faster than PubMed as a whole, which shows the relevance the disease has been gaining. And let it be clear: these results apply only to PubMed. Several other databases are also useful for research, such as Embase, Web of Science, and Cochrane.

Now, I pose the relevant question to you: how do I find the information I want in an ever-growing ocean of articles like this? Well, this is the purpose of this topic.

2 - Establishing a research question and focus

Many factors affect the type of information you wish to obtain, such as your pre-existing knowledge of the subject and the focus of the information you want. Basically, keep one thing in mind when you start a literature review: is the question I want to answer, or the topic I want to learn about, general or specific?

For example, if you have very little knowledge of Parkinson’s genetics and wish to get an initial understanding of the subject, it is advisable to start with narrative literature reviews published in quality scientific journals rather than with more specific articles. Therefore, you should use keywords or article filters that preferably point you to these review articles. In cases like this, where the search is more introductory than specific, even a simple search on Google or Google Scholar with free terms may answer your question without the need to use PubMed (example in Parkinson’s genetics). The same approach applies to systematic reviews and meta-analyses, which address a slightly narrower research question (example considering levodopa-induced dyskinesia).

On the other hand, if your desire is to answer a very specific research question or you wish to evaluate methodologies or individual results of other articles of your interest in a specific area, you will probably benefit from elaborating a more detailed search mechanism to use in PubMed or other platforms (example: what articles exist that use machine learning to discover subtypes of Parkinson’s Disease?). However, one observation: sometimes, you may locate the articles of your specific interest in literature reviews that have already been made.

Keep in mind that performing a good literature search requires knowing which key concepts you need to unite in order to find what you want. This can be a straightforward and intuitive process or a more elaborate construction. Intuitively, you could simply unite some of the terms associated with PD with the terms associated with its treatment if you want to study treatments for PD (e.g. treatment, levodopa, dopamine agonists etc). Alternatively, you could derive key concepts from more complex and structured protocols such as the frameworks PEO (Population, Exposure and Outcome), PICOS (Population/Problem/Phenomenon, Intervention, Comparison, Outcome, Study design) or SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) (reference).


Extracted from Watson M. How to undertake a literature search: a step-by-step guide. Br J Nurs. 2020 Apr 9;29(7):431-435. doi: 10.12968/bjon.2020.29.7.431. PMID: 32279549.

3 - Basic concepts: MeSH terms, boolean operators and special characters

MeSH Terms

MeSH (Medical Subject Headings) is a controlled vocabulary developed by the National Library of Medicine. It is used to index articles in PubMed and facilitate more accurate search results. MeSH terms standardize the terminology, allowing for more effective literature searches. For instance, searching for “Myocardial Infarction” as a MeSH term will also retrieve articles whose authors wrote “Heart Attack” instead, as long as those articles were indexed under that heading.

Boolean Operators

Boolean operators (AND, OR, NOT) are used to refine PubMed searches.

AND: Narrows the search by retrieving articles that contain both terms. For example, “Parkinson’s Disease AND Genetics” will only show articles that discuss both topics.
OR: Broadens the search by retrieving articles that contain either term. For example, “Parkinson’s Disease OR Alzheimer’s Disease” will show articles discussing either condition.
NOT: Excludes articles containing a specific term from the search results. For example, “Parkinson’s Disease NOT Genetics” will exclude articles discussing the genetic aspects of Parkinson’s Disease.
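
One way to picture the three operators is as set operations on the lists of articles each term retrieves. The sketch below uses made-up article IDs purely for illustration; it is not how PubMed is implemented.

```python
# Toy article IDs (invented for this example) retrieved by each term.
parkinson = {101, 102, 103, 104}
genetics = {103, 104, 105}
alzheimer = {104, 106}

both = parkinson & genetics      # AND: articles found by both terms
either = parkinson | alzheimer   # OR: articles found by at least one term
without = parkinson - genetics   # NOT: Parkinson's articles minus genetics ones

print(sorted(both))     # [103, 104]
print(sorted(either))   # [101, 102, 103, 104, 106]
print(sorted(without))  # [101, 102]
```

Notice that AND can only shrink the result, OR can only grow it, and NOT removes articles that may still be relevant (101 and 102 survive, but 103 and 104 are lost even though they are about Parkinson’s Disease).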

Special Characters

Special characters like square brackets [ ], quotation marks “”, and asterisks * can enhance search precision.

Square Brackets: Used to specify fields, e.g., Author[au] searches for a specific author.
Quotation Marks: Used for exact phrase searches, e.g., “Parkinson’s Disease” will search for that exact phrase.
Asterisks: Used for truncation, e.g., Genet* will search for Genetics, Genetic, Genetically, etc.
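
If you build search strings programmatically, these three characters can be wrapped in tiny helpers. This is only a sketch: the helper names are my own, and the startswith check merely imitates what truncation does to a word list; it is not PubMed’s actual matching.

```python
def phrase(text: str) -> str:
    """Wrap text in double quotes for an exact phrase search."""
    return f'"{text}"'


def tagged(term: str, tag: str) -> str:
    """Append a field tag in square brackets, e.g. [au] or [mh]."""
    return f"{term}[{tag}]"


def truncate(stem: str) -> str:
    """Append an asterisk so that any ending of the stem is accepted."""
    return f"{stem}*"


print(tagged(phrase("Parkinson Disease"), "mh"))  # "Parkinson Disease"[mh]
print(truncate("Genet"))                          # Genet*

# Imitating truncation on a small word list:
words = ["Genetics", "Genetic", "Genetically", "Gene", "Genome"]
print([w for w in words if w.lower().startswith("genet")])
# ['Genetics', 'Genetic', 'Genetically']
```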

4 - Performing an adequate and straightforward literature search in PubMed

Here, I will outline the steps I take when I want to run literature searches in PubMed, mostly on specific topics. I will also provide an Excel spreadsheet with a formula for creating simple yet useful literature searches.

4.1 - Define the topic of interest: Prevalence of Parkinson’s Disease in Latin American Countries (specific)
4.2 - Define the key concepts you need to address. In this case: (1) Parkinson’s Disease and (2) Prevalence/Epidemiology and (3) Latin American Countries
4.3 - Now let’s build the search using a spreadsheet I created (this is optional; you can do it any way you want). Below is a blank example of what it looks like:

This sheet has some useful hyperlinks at the top. In order, from left to right: the PubMed search tool, the MeSH terms search tool, search filter resources (some predefined advanced filters that you can use in specific situations) and the Yale MeSH Analyzer (a tool to which you give up to 20 PMIDs and which returns a summary of the keywords they use).

The sheet has room for 5 key concepts, which can be joined by the AND, OR and NOT boolean operators. Each concept has a space for MeSH names and one for free search terms or words. The MeSH name column is optional, so you can leave it blank and just provide your free words or terms.

4.4 - To build the search string for my first key concept (“Parkinson’s Disease”), we should access the MeSH website and look for Parkinson’s Disease, as shown below.

After clicking on it, a description of PD is displayed and, below it, you can find the entry terms that lead to this MeSH term.

Now, we can use some of these entry terms as additional words to find our desired articles. In my experience, this retrieves more publications than using only the MeSH term, because some authors do not include this term in their keywords. You can also include other words that come to mind if they are not listed here, which extends the search even further. Below, I show my final selection:

You will notice two things: (1) the MeSH term is written exactly as it exists and (2) I haven’t used all the entry terms available, and I’ve put all the selected words in quotation marks (“”). I did this because I’ve noticed that using some of the terms listed there necessarily covers the others, and because I want these terms to be found exactly as they are written. For example, an article that has “primary” and “parkinsonism” written in different sentences would be included if I hadn’t put them between quotation marks.
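
To illustrate why the quotation marks matter, here is a toy comparison between “all words appear somewhere” and “the words appear together as a phrase”. The abstract text and function names are invented for this example, and PubMed’s real matching is more sophisticated than this; treat it only as a sketch of the idea.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches_all_words(query: str, text: str) -> bool:
    """Unquoted behaviour (simplified): every word appears somewhere."""
    tokens = tokenize(text)
    return all(word in tokens for word in tokenize(query))


def matches_phrase(query: str, text: str) -> bool:
    """Quoted behaviour (simplified): the words appear side by side, in order."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


# Invented abstract where the two words sit in different sentences.
abstract = "Primary outcomes were motor scores. Drug-induced parkinsonism was excluded."

print(matches_all_words("primary parkinsonism", abstract))  # True
print(matches_phrase("primary parkinsonism", abstract))     # False
```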

4.5 - Now let’s repeat the process for the concept of “Prevalence/Epidemiology”. In this specific case, I think it is worth using MeSH terms for both Prevalence and Epidemiology, so I will explore both.

Notice I’ve used an * near the end of “Prevalence” in order for the word “Prevalences” to also appear (which is cited as an entry term for Prevalence in MeSH).

Below is the selection I thought would work best and how the terms are entered in the sheet.

Notice that I’ve put both inside the same key concept and that, for Epidemiology to be recognized as a MeSH term, I added the [mh] code after it, in addition to its free term.

4.6 - Now let’s repeat the process for the concept of Latin America. In this case, I will ask my old friend “ChatGPT” to list the names of all Latin American countries, to be used in addition to the Latin America terms present in MeSH, so that the search is more comprehensive. You could also try finding those terms in already established search filters such as the ones I mentioned earlier.

Note that this MeSH term does not include a lot of different entry terms.

It is advisable to review what ChatGPT tells you. In this case, however, it seems pretty appropriate!

Note that I’ve put all the Latin American countries’ names in a single cell. That is not a problem, as the formula works the same way.

4.7 - Final search string: at the bottom of the sheet, you can find the combination of all the formulas, as shown below:
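
If you prefer code to spreadsheets, the same idea (OR inside each key concept, AND between concepts) can be sketched in a few lines of Python. This is not the formula from my sheet, just an equivalent under simple assumptions: the class and function names are hypothetical, and the term lists are shortened examples rather than my full selection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """One key concept: optional MeSH names plus free search terms."""
    mesh_terms: List[str] = field(default_factory=list)
    free_terms: List[str] = field(default_factory=list)

    def to_query(self) -> str:
        # MeSH names are quoted and tagged with [mh]; free terms are used as typed.
        parts = [f'"{name}"[mh]' for name in self.mesh_terms] + self.free_terms
        return "(" + " OR ".join(parts) + ")"


def build_search(concepts: List[Concept], operator: str = "AND") -> str:
    """Join all non-empty concepts with a single boolean operator."""
    filled = [c for c in concepts if c.mesh_terms or c.free_terms]
    return f" {operator} ".join(c.to_query() for c in filled)


# Shortened, illustrative term lists (not the complete ones from the sheet).
concepts = [
    Concept(["Parkinson Disease"], ['"Parkinson\'s Disease"', '"Primary Parkinsonism"']),
    Concept(["Prevalence", "Epidemiology"], ["Prevalence*", "Epidemiology"]),
    Concept(["Latin America"], ['"Latin America"', "Brazil", "Mexico", "Argentina"]),
]

search = build_search(concepts)
print(search)
```

The printed string can be pasted directly into the PubMed search box. Empty concepts are skipped, which mirrors leaving unused key-concept rows blank in the sheet.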

4.8 - Evaluating search results. This link shows the search results I obtained: 558 articles. However, I noticed that a great deal of these articles are not related to what I want to know and only mention the words “Prevalence” or “Epidemiology” tangentially in their abstracts. Therefore, I decided to remove the free search terms related to prevalence and epidemiology, which yielded a much better result of 81 articles that are related to what I want (link).
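
The refinement above boils down to keeping only the MeSH terms in one key concept. As a rough sketch (with shortened, illustrative term lists and a hypothetical helper name, not my exact search strings), the broad and narrow versions differ like this:

```python
def or_group(terms):
    """Join the terms of one key concept with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


disease = or_group(['"Parkinson Disease"[mh]', '"Parkinson\'s Disease"'])
region = or_group(['"Latin America"[mh]', "Brazil", "Mexico"])

# Broad: MeSH terms plus free words, which also match passing mentions in abstracts.
broad_outcome = or_group(['"Prevalence"[mh]', '"Epidemiology"[mh]', "Prevalence*", "Epidemiology"])
# Narrow: MeSH terms only, so articles must have been indexed under these headings.
narrow_outcome = or_group(['"Prevalence"[mh]', '"Epidemiology"[mh]'])

broad = " AND ".join([disease, broad_outcome, region])
narrow = " AND ".join([disease, narrow_outcome, region])

print(broad)
print(narrow)
```

The trade-off is the usual one: the narrow version is more precise, but it can miss recent articles that have not been indexed with MeSH terms yet.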

Concluding remarks

Well, that’s what I had to say! I hope this has been useful to you. If you want to access, copy or download the Excel spreadsheet I used for this demonstration, click here (redirects to a Google Drive file). If you have any suggestions or questions, please let me know!
