Natural Language Processing: Media Sentiment as an Indicator of Overtourism

by Seth Borko + Skift Team - Apr 2019

Skift Research Take

Overtourism is one of the biggest challenges faced by the travel industry. Quantifying it helps diagnose and fight the issue. Our new method of measuring local sentiment toward tourists can build upon and complement existing overtourism metrics.

Report Overview

Overtourism manifests itself differently in every locale, but most destinations recognize the importance of local sentiment in understanding the problem. To evaluate this aspect of overtourism, Skift Research constructed an index that analyzes the text of a large set of over 17,000 local media reports and measures the frequency of stories that indicate a negative press sentiment toward arriving tourists.

Our theory is that struggles with overtourism will show up in the form of negative stories reported in media outlets. By measuring the level of negative tourism stories reported in the local press, we aim to create an index that can indicate overtourism. Higher results on the index indicate that communities are struggling with the negative impact of overtourism.

To test our new methodology, we focused on Iceland as a case study. We found that our new gauge of negative media sentiment worked well as an indicator of overtourism in Iceland. While there can be no one all-encompassing metric to gauge overtourism, our measurement of local media sentiment adds new information to the conversation that was previously difficult to quantify. It also gives us the ability to tie changes in overtourism back to underlying variations in the economy and visitor flows.

What You'll Learn From This Report

  • Why we believe it’s important to quantify local sentiment in tourism destinations
  • Our proprietary measure of negative media sentiment toward tourists
  • Case study applying our overtourism gauge to Iceland and comparing it with existing data points
  • Negative media sentiment in Barcelona from 2015–2018
  • Description of methodology and process

Executive Summary

Tourism is one of the world’s fastest growing industries driven by changes in cultures, demographics, and incomes. This growth is beneficial for the most part, but has brought challenges as well, namely overtourism. The term “overtourism” was coined by Skift and quickly adopted by the industry. We defined the term “as a new construct to look at potential hazards to popular destinations worldwide, as the dynamic forces that power tourism often inflict unavoidable negative consequences if not managed well.”

Overtourism manifests itself differently in every locale but most destinations recognize the importance of local sentiment in understanding the problem. To understand this aspect of overtourism, Skift Research constructed an index that analyzes the text of a large set of local media reports and measures the frequency of stories that indicate a negative press sentiment towards arriving tourists.

Our theory is that struggles with overtourism will show up in the form of negative stories reported in media outlets. By measuring the level of negative tourism stories reported in the local press, we aim to create an index which can indicate overtourism. Higher results on the index indicate that communities are struggling with the negative impact of overtourism.

Analyzing news sentiment complements polling data because news stories are published daily or weekly, allowing us to update our results with a higher frequency than a poll, potentially even in real time. It also allows us to construct a backwards-looking time series based on archived articles.

We tested our theory by focusing on Iceland as a case study and found our new gauge works well as an indicator of overtourism. The index rises during the peak summer months of maximum tourist arrivals, correlates well with other data points (e.g. visitor arrivals and tourism employment), and tracks well with traditional opinion polls.

We believe that measuring local media sentiment vis-à-vis tourism adds new insights to the already well-researched discussion of overtourism.


Overtourism Overview

The term “overtourism” was coined by Skift and quickly adopted by the industry. We defined the term “as a new construct to look at potential hazards to popular destinations worldwide, as the dynamic forces that power tourism often inflict unavoidable negative consequences if not managed well.” In practice it is a qualitative descriptor, meaning different things to different people and used in an “I’ll know it when I see it” sense.

In long-standing holiday destinations and major cities, residents and governments are struggling to diagnose and treat overtourism. Venice will start charging day-trippers a tourist tax, while Thailand is closing the beach made famous by Leonardo DiCaprio to allow it time to recover from the effects of large crowds. Dubrovnik hopes to stem crowding issues by better coordinating and dispersing cruise ship arrivals.

Overtourism is a complex issue with many underlying indicators. Stakeholders are diverse and include governments, residents, natural resources, and the visitors themselves. McKinsey, in a December 2017 study, lays a strong framework for identifying issues caused by overtourism. It highlights five major problems: 1) alienated local residents, 2) degraded tourist experiences, 3) overloaded infrastructure, 4) damage to nature, and 5) threats to culture and heritage.

This report focuses on the first problem: Alienated local residents. McKinsey highlights several crucial quantitative metrics in this regard including the density of tourism (measured by the number of visitors) and tourism intensity (measured by the number of visitors per resident). In addition to these quantitative metrics, local residents’ sense of well-being should also be taken into consideration, as it’s an important indicator of quality of life that local governments are responsible for. Yet, the qualitative nature of sentiment makes it hard to measure.

We constructed an index that analyzes the text of a large set of local media reports and measures the frequency of stories that indicate a negative press sentiment towards arriving tourists.

While this does not measure local sentiment directly, we believe that in many cases local media can act as an effective proxy for resident sentiment.

Most destinations recognize the importance of local sentiment in understanding overtourism and one traditional approach to quantify this is opinion polling. These polls are useful and, in fact, McKinsey suggests this approach. It writes, “the sustainable number of visitors per square kilometer or per capita will differ from one destination to another. A more precise picture should be established through regular surveys of residents.”

Some limitations of surveys are that they can be expensive and time consuming to complete, especially if we hope to scale them across many different destinations. If we want to understand how overtourism changes over time, the logistical burden grows as we would need to field polls at a regular interval. And many destinations might only decide to start polling once it becomes clear that overtourism is an issue. There is no way to go back in time to poll past opinions, making it hard to understand what the pre-overtourism baseline of sentiment looked like.

Analyzing news sentiment complements polling data because stories are published daily or weekly, allowing us to update our results with a higher frequency than a poll, potentially even in real time. It also allows us to construct a backwards-looking time series based on archived articles. This longer time series could help us search for relationships with existing quantitative time series data with greater confidence.

And finally, as most news articles are publicly available, or available behind a paywall for a relatively small fee, this methodology can theoretically be scaled to multiple destinations cheaper and with fewer logistical headaches than a poll.

To test our new methodology, we focus on Iceland as a case study. Iceland was an early example of the destabilizing trend of overtourism. International visits to Iceland exploded starting around 2012 and have barely slowed since, reshaping the nation’s economy in the process.

We find that our new gauge of negative media sentiment works well as an indicator of overtourism. The index rises during the peak summer months of maximum tourist arrivals, as one would predict. It correlates well with other data points that are established as being important inputs into the overtourism discussion, such as visitor arrivals, visitor dispersion, and tourism employment. Our tourism media sentiment index also tracks well with traditional opinion polls.

We also began the process of expanding our index to new locations, beginning with Barcelona. The indicator shows steadily growing negative sentiment in Barcelona from 2015 through 2017, which is what we would expect to see based on our understanding of the local situation there.


Case Study: Negative Media Sentiment Towards Tourism in Iceland

We created an index to track negative media sentiment towards tourism in Iceland. The index is based on the frequency of stories published containing words that indicate tourism problems as a share of total articles published. Our theory is that struggles with overtourism will show up in the form of negative stories reported in media outlets. By measuring the level of negative tourism stories reported in the local press, we aim to create an index which can indicate overtourism. Higher results on the index indicate that communities are struggling with the negative impact of overtourism.

We tested this theory by comparing the sentiment data against tourist inputs in Iceland and found strong correlation between negative tourism sentiment and heavier tourist influx.

More details on how these were calculated are included in our methodology section below.

Data Collection

For an in-depth case study in how media text analysis can track negative tourism sentiment we decided to focus on Iceland. Iceland makes a good example because it 1) is a widely recognized case in overtourism, 2) has very high standards for tourism data collection, and 3) has high-quality English language journalism. We collected all news articles published by three English-language Icelandic media publications: The Iceland Monitor, The Reykjavík Grapevine, and Iceland Review.

  1. The Iceland Monitor is the English-language sister of Morgunblaðið, which is the largest and most-read news website in Iceland. The Iceland Monitor had an average of 337 million monthly visitors over the last six months through February 2019 according to SimilarWeb. We collected the text of 7,755 articles published on The Iceland Monitor from January 2015 through March 2019.
  2. The Reykjavík Grapevine is, by its own description, the Icelandic tourist market’s most prevalent publication. It is published in English 21 times a year: monthly during the off season and biweekly during high season. It had an average of 345 million monthly visitors over the last six months through February 2019 according to SimilarWeb. We collected the text of 4,140 articles published on The Reykjavík Grapevine from January 2015 through March 2019.
  3. Iceland Review is the longest running magazine in Iceland, in continuous print since 1963, according to its website. It had an average of 253 million monthly visitors over the last six months through February 2019 according to SimilarWeb. We collected the text of 5,158 articles published on Iceland Review from February 2015 through March 2019.

Time Series Analysis

In sum, we collected 17,053 articles from January 2015 through March 2019. We then aggregated and analyzed the full text of each story to determine if they reported on a negative aspect of tourism, as per our methodology discussed elsewhere in this report. Finally, to reduce noise in the data, the time series was smoothed with a six-month moving average.
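The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the report: the function names are hypothetical, articles are assumed to arrive as (publication date, flag) pairs, and the moving average is assumed to be a trailing one that uses a shorter window at the start of the series, since the report does not say how the window is aligned.

```python
from collections import defaultdict
from datetime import date


def monthly_negative_share(articles):
    """Share of flagged (overtourism) articles per calendar month.

    `articles` is an iterable of (published_date, is_overtourism) pairs.
    Returns a dict keyed by (year, month), in chronological order.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for published, is_overtourism in articles:
        key = (published.year, published.month)
        totals[key] += 1
        if is_overtourism:
            flagged[key] += 1
    return {key: flagged[key] / totals[key] for key in sorted(totals)}


def trailing_moving_average(values, window=6):
    """Trailing mean over the last `window` points.

    Assumption: the first few points are averaged over however many
    observations exist so far, rather than being dropped.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

To produce a smoothed series, pass the values of `monthly_negative_share(...)` in chronological order to `trailing_moving_average`.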

We began by examining each website independently, charting the share of negative tourist stories as a percentage of all published stories. The higher the result of this charting, the greater the level of overtourism being reported in that media outlet.

The Iceland Review, which has a lower publication frequency and is more focused on tourism, reports a higher share of such stories. The Iceland Monitor on the other hand, which has a greater focus on business and politics, has a lower share of overtourism stories.

Exhibit 1: Each media source has a different baseline but similar trends

While each source has its own baseline level of negative tourism sentiment, we find that they tend to exhibit similar overall trends. To us, this could indicate that, although each has its own style, they are all still covering core underlying changes in sentiment and behavior.

For our final measurement of negative tourism sentiment in Iceland, we aggregated each of these sources. The exhibit below shows negative tourism stories as a share of all published stories in our three Icelandic publications over time.

Our data show a steady upward trend from 2015 through 2017, punctuated by high summer peaks. Overtourism appears to have stabilized in 2018 relative to 2017, though the overall level is still elevated.

Exhibit 2: Combining all three outlets gives us an overall sentiment gauge for Iceland

Comparison to Opinion Polling

How does this measure of tourism media sentiment compare to traditional methods of measuring sentiment such as opinion polling? The Icelandic Tourist Board has been collecting data on tourism since 2009 and polling residents on their attitudes towards foreign visitors since 2014.

The question that best tracks local views on overtourism, in our opinion, is one that asks Icelanders whether they agree with the statement that “tourist pressure on Icelandic nature is too high.” Those agreeing rose from 63% in 2014 to 79% in 2017, before declining to 75% in 2018.

Exhibit 3: Iceland runs opinion polls to gauge local sentiment

To get an approximate sense of our data on an annual basis, we resampled our daily, aggregated Icelandic data to a yearly level and then compared it to Icelandic opinion polls. The units of each measurement are different, and therefore this is not a direct comparison, but overlaying the two time series on different axes shows that both methods exhibit a similar trend. Skift’s measure of negative media sentiment towards tourism increases from 15.4% in 2015 up to 19.4% in 2017 and then moderates in 2018.

In both cases, the trend is for a steady deterioration of sentiment that peaks in 2017 and then moderates in 2018, although local negative sentiment remains at a heightened level.
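As a rough sketch of this comparison, the Python below pools article counts by year and then reduces any annual series to the direction of its year-over-year changes, which is all that a comparison across different units can support. The function names and the input layout are hypothetical, and pooling counts by year is an assumption: the report does not specify how the yearly resampling was done.

```python
def yearly_share(monthly_counts):
    """Pool monthly (flagged, total) article counts into a share per year.

    `monthly_counts` maps (year, month) -> (flagged, total).
    """
    flagged_by_year = {}
    total_by_year = {}
    for (year, _month), (flagged, total) in monthly_counts.items():
        flagged_by_year[year] = flagged_by_year.get(year, 0) + flagged
        total_by_year[year] = total_by_year.get(year, 0) + total
    return {y: flagged_by_year[y] / total_by_year[y] for y in sorted(total_by_year)}


def directions(series):
    """Sign of each year-over-year change: +1 up, -1 down, 0 flat.

    `series` maps year -> value; units do not matter, only direction.
    """
    years = sorted(series)
    changes = []
    for prev, curr in zip(years, years[1:]):
        delta = series[curr] - series[prev]
        changes.append((delta > 0) - (delta < 0))
    return changes
```

Two series are directionally consistent over a common span of years when `directions` returns the same list for both.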

Exhibit 4: Our media sentiment gauge is directionally accurate relative to traditional polls that directly measure local tourism sentiment

While it is hard to draw firm conclusions from just four data points, this comparison gives us moderate confidence that our media sentiment indicator is directionally accurate versus traditional opinion polling methodologies in Iceland.

Drivers of Overtourism in Iceland

In order to better understand potential drivers of overtourism in Iceland, we ran a regression between our measures of overtourism with changes in on-the-ground economic and touristic trends.

More specifically, this regression was between the absolute number of negative tourism articles (overtourism articles) published in Icelandic media outlets and five different tourism factors. The data series we measured, and the impacts we believe they represent, are:

  1. Impact of overall tourism levels: The number of foreign visitor overnight stays in Iceland (STAYS)
  2. Impact of alternative accommodations: The number of overnight stays by foreign visitors in registered alternative accommodations. Registered accommodations exclude unlisted homesharing rentals which are difficult to track in official government statistics but do include registered apartment stays (ALT)
  3. Impact of visitor geographic dispersion: The number of overnight stays by foreign visitors in locations outside the capital region of Iceland (GEO)
  4. Impact of building out tourism infrastructure: The supply of registered hotel rooms available for tourists (ROOMS)
  5. Impact of tourism on the local economy: The number of Icelandic jobs in the tourism sector (JOBS)

Each of these data series was accessed from Statistics Iceland on a monthly basis and then normalized as a year-on-year growth rate. We also took the year-on-year growth rate of the smoothed six-month moving average of published overtourism articles. The regression has fairly high explanatory power with an R-squared of 60%. This means three-fifths of the change in our negative media sentiment can be explained by the above factors. The regression yields an equation of:

Exhibit 5: Regression outputs of our local media negative tourism sentiment model in Iceland

The regression coefficients in this analysis allow us to quantify and rank the biggest drivers of overtourism. The coefficients are scalars that magnify how deeply these impacts are felt. Positive coefficients increase overtourism while negative coefficients combat its effects.
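For readers who prefer symbols, the model described in this section can be written in the generic form below. This is a sketch of the structure only. The intercept and error term are assumptions about the specification, and no coefficient values are supplied here other than what the text states: the STAYS coefficient is about 2.0, and the other four are negative. ARTICLES denotes the six-month moving average of overtourism articles, and g_t is the year-on-year growth rate of a monthly series.

```latex
\begin{align*}
g_t(X) &= \frac{X_t - X_{t-12}}{X_{t-12}} \\[4pt]
g_t(\text{ARTICLES}) &= \beta_0
  + \beta_{\text{STAYS}}\, g_t(\text{STAYS})
  + \beta_{\text{ALT}}\, g_t(\text{ALT})
  + \beta_{\text{GEO}}\, g_t(\text{GEO}) \\
  &\quad + \beta_{\text{ROOMS}}\, g_t(\text{ROOMS})
  + \beta_{\text{JOBS}}\, g_t(\text{JOBS})
  + \varepsilon_t
\end{align*}
```

Because every series enters as a growth rate, each coefficient reads as the percentage-point change in the growth of negative articles associated with a one-percentage-point change in the growth of that factor, holding the others fixed.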

Overall Tourism Levels

The absolute number of visitors arriving in Iceland is, unsurprisingly, the biggest factor hurting sentiment towards tourists. Foreign overnight visitor stays have exploded from 2.6 million in 2012 to 7.8 million in 2018. The coefficient of 2.0 means that every 1% increase in visitors leads to a 2% increase in negative articles about tourists.

Exhibit 6: The level and growth of foreign overnight stays in Iceland

Increases in all of the remaining factors (stays in alternative accommodations, geographic dispersion of tourists, hotel room supply, and tourism employment) actually combat the negative perception of tourists by Icelanders and should be pursued further by the government.

Alternative Accommodations

Perhaps surprisingly, the most important factor for fighting overtourism in our regression model is the growth of visitors that stay in registered alternative accommodations. We note one caveat which is that we do not have adequate data on unregistered alternative accommodations, such as peer-to-peer apartment rentals on sites like Airbnb, which tend to have a bad reputation among locals. However, a large portion of the alternative accommodations that we can track are still apartments and whole homes available for rent (just the ones that owners or property managers bother to register). This category also includes campgrounds.

Registered alternative accommodations have grown from 0.4 million overnight stays in 2008 to 2.7 million in 2018. These alternative accommodations now account for one-third of all room nights in the country, up from one-quarter in 2008.

Exhibit 7: The level and growth of foreign overnight stays at registered alternative accommodations in Iceland

We have several theories as to why alternative accommodations are so counterintuitively helpful in fighting overtourism in Iceland. For starters, there is a much higher ratio of alternative to traditional accommodations in more rural parts of Iceland. It is likely that alternative accommodations create a strong hospitality network in further afield parts of the country and thereby encourage tourists to leave the crowds and explore outside of the main tourist hubs. In this way, alternative accommodations encourage geographic dispersion, which we discuss further below.

Exhibit 8: Alternative accommodations a much greater proportion of supply stock outside of Reykjavik

We also suspect that the stereotype of the hard-partying Airbnb-er, while grounded in truth, may be overplayed. Skift Research’s 2019 Experiential Traveler Survey found that 53% of travelers who self-report as preferring the “hotel experience,” also preferred to keep to popular areas and activities compared with a 40% rate for travelers that prefer “the Airbnb experience.”

Exhibit 9: Guests staying in alternative accommodations over hotels more likely to explore off the beaten path

Our research suggests that guests who stay in vacation rentals are more likely to seek out local experiences. Many visitors in Reykjavík’s hotels, on the other hand, are transferring through to continental Europe with a one-night stopover in Iceland.

This means that in many instances, alternative accommodation guests are staying longer, traveling further throughout the country, and engaging more with the local culture. This could potentially make these guests a more powerful boost to the local Icelandic economy than those staying in hotels, and thereby engender more positive sentiment.

Geographic Dispersion

The next most important factor for fighting overtourism in Iceland in our model is the geographic dispersion of visitors. This will not be a new idea for many destinations which have already begun to step up the marketing of ‘undertouristed’ locales.

Iceland has done a nice job of dispersing tourists to date. 40% of foreigner hotel room nights are now spent outside of the capital, compared to just 25% in 2008.

Exhibit 10: Visitors to Iceland are increasingly exploring outside the capital region

This strategy comes with its own challenges. For instance, more travelers outside of Reykjavik put greater pressure on remote environments. We also understand that, given Iceland’s remote nature, dispersion can increase the demand for search and rescue when wayward tourists get trapped in snowstorms.

But overall, our model indicates that geographic dispersion is a worthwhile strategy to offset overtourism struggles. Many experts, including the authors of the McKinsey report cited above, concur. Dispersing tourists throughout Iceland allows for their economic boost to be more widely felt at the same time that it takes pressure off of congested city center tourism infrastructure. Tourists traveling further afield are also, in our opinion, more likely to engage with locals and be conscientious visitors.

Tourism Infrastructure and Economics

Other strategies that can help fight overtourism include building out adequate hospitality infrastructure and making sure that tourism economics lead to business growth and job creation through smart legislation and regulation.

We believe, based on our model, that adding new hotel supply has been helpful in fighting negative tourism sentiment in Iceland. We are aware that some destinations have taken an opposite approach, such as Amsterdam, which has moved to suspend new hotel development. But these strategies are highly localized. Iceland, which was somewhat unprepared for its tourism boom, has needed to add hotel supply to keep up with demand. Lack of available hotel supply is also a contributing factor in leading visitors to unregistered accommodations which provoke local outrage.

Tourism in Iceland has become big business. In 2017, tourism directly contributed $1.8 billion (ISK 221 billion) to the economy, or 8.6% of total GDP. That is up from a 3.5% share of GDP in 2009. The tourism sector employs an average of 28,000 people throughout the year. But that spikes to a seasonal high of 32,000 jobs during the summer peak, representing just under 9% of the country’s population.

A Multipronged Approach to Fighting Overtourism Is Key

Ultimately, all tourism issues are local and our analysis is specific to Iceland. One of the benefits of creating a proxy measure of overtourism is it allows us to untangle the many contributing factors to overtourism and show how they come together in a way unique to Iceland.

The most important conclusion to be drawn from our simple overtourism model is the importance of pursuing all of these strategies, and others, in tandem. Consider again the coefficients on each of these variables. The 2.0 attached to foreign visits is greater than the absolute value of any negative coefficient, and in fact greater than the absolute value of the next two combined.

This means that a 1% increase in geographic dispersion would not be enough to offset the negative sentiment impact of a 1% increase in foreign visits. One would need a 10% increase in tourism sector jobs to offset the negative sentiment impact of just a 1% increase in foreign visitors.
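The arithmetic behind these offsets follows directly from the linear form of the model. The sketch below is illustrative only: the function name is hypothetical, and the -0.2 used in the example is not a reported coefficient but the value implied by the statement that a 10% rise in tourism jobs offsets a 1% rise in foreign visits at a visits coefficient of 2.0.

```python
def offsetting_growth(beta_stays, beta_offset, stays_growth_pct=1.0):
    """Growth (in %) needed in a mitigating factor to cancel the sentiment
    impact of `stays_growth_pct` growth in foreign visits.

    Solves beta_stays * stays_growth + beta_offset * x = 0 for x.
    `beta_offset` must be negative, i.e. a factor that reduces
    negative sentiment in the model.
    """
    if beta_offset >= 0:
        raise ValueError("beta_offset must be negative to offset anything")
    return -beta_stays * stays_growth_pct / beta_offset


# Implied example: a coefficient of about -0.2 requires roughly 10% growth
# in the mitigating factor for every 1% growth in foreign visits.
required = offsetting_growth(beta_stays=2.0, beta_offset=-0.2)
```

The same function shows why no single lever is enough: any factor whose coefficient is smaller than 2.0 in absolute value has to grow faster than visits just to hold sentiment flat.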

What our model makes clear is that no one anti-overtourism strategy should be pursued in isolation. And despite the specificity of this model to Iceland, we believe that this finding applies to most destinations. A given locale may of course have a preferred technique, but it should be implemented as part of a broader, overarching strategy designed to combat overtourism.


Expanding Media Tourism Sentiment to Additional Locations

We believe that it should be possible to expand this methodology to additional locations. We began by expanding to Barcelona, another well-known poster child for overtourism. To measure media sentiment towards tourism in Barcelona, we collected data from two news sources.

  1. Ara, the English language version of Diari Ara. Ara is the third most read daily newspaper in Catalonia.
  2. CatalanNews, an English-language subsidiary of the Catalan News Agency (Agència Catalana de Notícies, ACN). The ACN news agency has been operating in the region since 1999.

Altogether we collected 7,356 articles from January 2015 through March 2019.

Our data shows dramatic year-on-year increases in overtourism issues in Barcelona in 2016 and early 2017. This is followed, in our data, by a nearly as precipitous decline in late 2017 and 2018. It is interesting to note that in June 2015, Barcelona elected its current mayor, Ada Colau, who ran on a platform of curbing tourism excess in the city.

Colau implemented new policies including a crackdown on illegal apartment rentals, a temporary moratorium on new hotel development, and entry fees for certain popular attractions.

These actions seem to have helped the situation, based on our data. We see little seasonality in the 2018 time series data, a contrast with the sharp summer peaks that Barcelona experienced in 2016 and 2017. At the very least, media coverage of the topic has slowed significantly.

Exhibit 11: Our overall media tourism sentiment gauge for Barcelona

Comparing Iceland to Barcelona

Comparing the two datasets, it would appear that both Iceland and Barcelona experience similar amounts of negative sentiment towards tourists. This makes sense as both are well publicized examples of how destinations have struggled with surging tourist crowds. We note here that as our sources are collected from local media organizations, we believe that our data is somewhat insulated from the press coverage these cities have received at global news organizations.

It is hard to say which destination truly has it worse with regards to overtourism, as this is just one measure of local sentiment, not to mention the difficulty of comparing a city to a country. But we note that in 2017, Iceland experienced a truly bad year of sentiment towards tourists.

Exhibit 12: Comparing our negative tourism media sentiment gauge in Iceland and Barcelona

Both destinations also saw sentiment improve in 2018. To us, this is evidence that awareness of the issue and smart tourism policy can be effective. Tourist crowds can develop seemingly overnight as destinations go viral, as was the case with Dubrovnik’s star turn on Game of Thrones. And once the throngs have arrived, it seems impossible to get them to leave. But this is not the case. Though it takes time, effort, and political will, many destinations are improving their tourism problems to the benefit of all.

City residents and activists should not despair over tourist crowds. Instead they should carefully diagnose the problem, measuring local sentiment amongst many other available metrics for understanding overtourism. From there they can work to implement smart policies and regulations to target key pain points in an efficient way.


Further Research and Conclusions

There is plenty more work left to be done on this topic. Future areas of research include whether we can expand this methodology to include more destinations and to compare it with a broader range of underlying economic data points. We would also like to examine whether it is possible to make our sentiment model more sophisticated by refining our data collection and analysis techniques.

One interesting avenue to explore would be to refine our wordlists to move beyond merely negative sentiment surrounding tourists but to attempt to tie it back to specific problem areas — such as housing prices, noise levels, or pollution. We would also look to include the sentiment of destination social media and travel review websites in future iterations of this work.

While there can be no one all-encompassing metric to gauge overtourism, we believe that our measurement of local media sentiment adds new information to the conversation that was previously difficult to quantify. It also gives us the ability to tie changes in overtourism back to underlying variations in the economy and visitor flows. Our analysis demonstrates the need for multipronged responses to high levels of tourism if destinations hope to be effective in combating negative local sentiment.


Full Methodology

Prior Research

Sentiment analysis, a subset of natural language processing, is a growing field for academics and businesses alike. Services such as Crimson Hexagon are popping up to offer brands the ability to track consumer sentiment on social media.

These services are gaining mainstream traction but, for the purposes of measuring overtourism, they are too focused on how tourists feel about a destination. Instead, we wanted to understand how locals feel about visitors. Social media often gives the sentiment of the outside looking in, while we wanted to understand the inside view looking out.

When designing for this perspective, we modeled our approach around prior research done by Baker, Bloom, and Davis who created an Economic Policy Uncertainty Index (EPU). This approach has been highly successful. It has been cited in over 2,000 papers, is useful in forecasting economic output, and correctly spikes at times of high uncertainty such as the Gulf War, 9/11, or the Great Recession.

Exhibit 13: Economic Policy Uncertainty Index tracks well with moments of high volatility

The index, in part, tracks the sentiment of major U.S. newspapers. The index is created by scanning every news article in major papers for words that fall into three buckets, one each for “economics,” “policy,” and “uncertainty.” Articles with words from all three buckets are counted and then normalized by the total volume of all articles published in a given month. From the authors’ paper:

“Our index reflects the frequency of articles in 10 leading US newspapers that contain the following triple: ‘economic’ or ‘economy’; ‘uncertain’ or ‘uncertainty’; and one or more of ‘congress’, ‘deficit’, ‘Federal Reserve’, ‘legislation’, ‘regulation’ or ‘White House’.”

As we suggested in our background, the authors were able to extend this methodology back in time and across countries. They were also able to extend this methodology to new policy categories, such as by creating an index of migration fear.

We effectively recreated a methodology along these lines that extends the policy categories to tourism and uncertainty.

Primary Methodology

In this section, we will describe our methodology using the example of data collected from The Iceland Monitor as our source of news. We recreated this methodology across all of the other media outlets that we analyzed.

Data Collection

We began by running a web scraper to collect every article published by the paper and available online. For each article, we collected the headline, date published, brief description, story category, URL, and full article text.

We collected the full text for 7,451 stories published on the Iceland Monitor between August 2014 and July 2018. The Iceland Monitor published an average of 160 stories per month over this timeframe.

Data Processing

Next, we created two wordlists. The first wordlist pertained to tourism while the second pertained to problems or uncertainty. We then searched the full text of all 7,451 stories to see which contained text from both the tourism and the uncertainty wordlists. All words were stemmed (that is, reduced to a root form by removing suffixes) so that our search includes plural, conjugated, and other forms of these words (e.g., “Travel” includes “Traveled,” “Traveler,” “Traveling,” and “Travels”).

Tourism Wordlist: abroad, foreign, foreigners, overseas, passenger, tourism, tourist, tourists, travel, traveler, traveller, travelling
Problem/Uncertainty Wordlist: accident, angry, arrest, arrested, backlash, blame, blamed, complaints, concern, confront, confrontation, crime, crowd, crowded, damage, danger, dangerous, dead, death, deface, destroy, die, died, disputes, disregard, disrespect, disturbance, emergency, fallout, frightened, frustrate, frustration, hospital, illegal, injure, injured, interfere, intervene, irresponsible, litter, lost, missing, negative, nuisance, offend, outrage, overcrowd, pickpocket, police, pollution, problem, problems, protests, rubbish, ruin, tension, theft, thief, thieves, trash, trashed, uncomfortable, vandal, vandalise, vandalism, vandalize, warn

We considered articles that had words from both the tourism and problem/uncertainty sets to be overtourism articles.
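The flagging step can be illustrated with a minimal Python sketch. It rests on several assumptions: the wordlists are abbreviated, stemming is approximated by prefix matching (a token counts if it begins with a wordlist entry), and the function names are hypothetical. It is not the code used for this report.

```python
import re

# Abbreviated versions of the two wordlists, for illustration only.
TOURISM_WORDS = ("abroad", "foreign", "overseas", "passenger",
                 "tourism", "tourist", "travel")
PROBLEM_WORDS = ("accident", "crowd", "damage", "danger", "litter",
                 "problem", "police", "vandal", "warn")


def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def has_match(text, wordlist):
    """True if any token begins with a wordlist entry.

    Assumption: prefix matching stands in for real stemming, so
    "travel" also catches "traveled", "traveler", and "travels".
    """
    return any(tok.startswith(wordlist) for tok in tokens(text))


def classify(text):
    """Return the tourism flag, the problem flag, and their conjunction."""
    tourism = has_match(text, TOURISM_WORDS)
    problem = has_match(text, PROBLEM_WORDS)
    return {"tourism": tourism, "problem": problem,
            "overtourism": tourism and problem}
```

Prefix matching is deliberately crude: short entries such as “die” would also catch unrelated words like “diet,” so a production version would likely use a proper stemmer.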

Example of an article with only a tourism flag

Headline: WOW air promises new US destination in 2016
Description: The CEO of Icelandic low-cost airline WOW air has said the airline expects to announce at least one more US destination in 2016 with “a lot more” to follow in the future.

Example of article with only a problem/uncertainty flag

Headline: Total destruction of building in yesterday’s blaze
Description: The Reykjavik Fire Department finds it unusual how quickly yesterday’s fire spread. The property in Miðtún 4 in Garðabær, a municipality of Reykjavik is completely destroyed.

Example of an ‘overtourism’ article with both a tourism and a problem/uncertainty flag

Headline: English signs become the norm in Reykjavik
Description: Professor of Icelandic Eirikur Rognvaldsson says that English signs in the Reykjavik city centre are a part of a more extensive problem involving the Icelandic language.

Then for each month we compared the number of overtourism articles to the total number of articles published to compute a monthly measure of overtourism frequency. The higher the score, the worse the level of perceived overtourism by the media.
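As a rough sketch of this calculation, the Python below computes the monthly share of flagged articles. The input format (pairs of a month label and a boolean flag) and the function name are assumptions made for illustration.

```python
from collections import Counter


def monthly_frequency(articles):
    """Share of flagged articles per month.

    articles: iterable of (month, is_overtourism) pairs, where month is
    a label such as "2016-07" (a hypothetical input format).
    """
    totals = Counter()
    flagged = Counter()
    for month, is_overtourism in articles:
        totals[month] += 1
        if is_overtourism:
            flagged[month] += 1
    # Normalize by all articles published that month, flagged or not.
    return {month: flagged[month] / totals[month] for month in sorted(totals)}
```

Dividing by the total article count, rather than using the raw count of flagged stories, keeps the measure comparable across months in which the paper published more or fewer stories.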

Data Analysis

Below are the raw scores showing the monthly frequency of overtourism articles as a share of all articles published in the month for the Iceland Monitor.

Exhibit 14: Raw Iceland Monitor overtourism articles frequency data

To make the data easier to interpret, we smoothed the results with a 6-month moving average.
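A moving average of this kind can be sketched as follows. The report does not say whether the window is trailing or centered; this sketch assumes a trailing window and leaves the first five months empty.

```python
def moving_average(values, window=6):
    """Trailing moving average over a list of monthly values.

    Assumption: the window covers the current month and the window - 1
    months before it; earlier positions are returned as None.
    """
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window: i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed
```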

The below exhibit shows the preliminary results of our analysis of overtourism sentiment in Iceland based on the text of the Iceland Monitor. The higher the index level, the worse the overtourism sentiment.

Exhibit 15: Smoothed Iceland Monitor overtourism index

These trended results show a seasonal summer peak that overlays a long-term trend towards even worse levels of overtourism.

If we seasonally adjust and rebase the index, the long-term upward trend in overtourism (i.e., worsening overtourism conditions) becomes more apparent as the summer peaks become less apparent.
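One simple way to seasonally adjust and rebase a monthly series is sketched below. This is an assumption for illustration only: it divides each value by a calendar-month factor (that month's average relative to the overall average) and then scales the series so the first value equals 100. The report does not specify which adjustment technique was used.

```python
from collections import defaultdict


def seasonally_adjust(series):
    """Divide each value by its calendar month's seasonal factor.

    series: list of (label, value) pairs with labels like "2016-07".
    Assumption: ratio-to-monthly-mean adjustment, chosen for simplicity.
    """
    overall_mean = sum(value for _, value in series) / len(series)
    by_month = defaultdict(list)
    for label, value in series:
        by_month[label[-2:]].append(value)  # group by calendar month
    factors = {m: (sum(vs) / len(vs)) / overall_mean
               for m, vs in by_month.items()}
    return [(label, value / factors[label[-2:]]) for label, value in series]


def rebase(series, base=100.0):
    """Scale the series so that its first value equals `base`."""
    first = series[0][1]
    return [(label, base * value / first) for label, value in series]
```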

Exhibit 16: Seasonal adjustments show declining sentiment reported in the Iceland Monitor

This chart also suggests to us, however, that the rate of change in overtourism in Iceland has slowed. There was a big increase in dissatisfaction with tourism in 2016, but it is possible that in the intervening years Iceland has put new measures and infrastructure in place to better manage its tourists.

Alternative Methodologies

We explored several alternative methodologies before ultimately selecting the method we describe above.

Method #2: Word Count in Travel Segment Only

The Iceland Monitor tags all the articles we collected to one of five categories: news, culture and living, politics and society, nature and travel, or eat and drink. There are also opinion and weather sections, though we excluded these articles.

In this modified methodology, the core process is the same, but we only draw source stories from the nature and travel category. We calculate overtourism from the frequency of only nature and travel overtourism stories to create our second index.

Exhibit 17: Analyzing only travel stories yields similar pattern but different magnitudes

The result is that the trends tend to be similar, but the magnitudes are different. The travel-only methodology yields exactly the same median overtourism score as the primary methodology (median = 126), but does so with greater volatility (i.e., higher highs and lower lows), likely due to the fewer data points available for calculations (N = 1,664 for travel-only vs. N = 7,451 for the full data set).

Ultimately, we discarded this methodology for four reasons: a) it tracks fairly well with our primary methodology, at a 60% correlation, and so does not add much extra informational value; b) it is not scalable, as not every news source will have an in-depth local travel section; c) travel-only gives us fewer data samples to work with; and d) some potential overtourism stories are overlooked because the editors of the Iceland Monitor tag them as news, not nature and travel.
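For readers who want to compare two index series in the same way, a correlation can be computed as in the sketch below. It assumes the Pearson coefficient on two equal-length monthly series; the report does not state which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A result of 0.60 from a function like this would correspond to the 60% figure quoted above, under the Pearson assumption.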

Method #3: Google Sentiment Analysis

In this methodology, we collected all Iceland Monitor articles tagged to the nature and travel category as in method #2. However, rather than analyzing our own wordlists, we fed the full text of these articles to Google’s “Cloud Natural Language” software platform. This is a professional natural language processing tool developed by Google and based on its proprietary machine learning tech.

The Google Cloud Natural Language Software takes the text of an article and generates a sentiment score based on a proprietary algorithm. The score ranges from -1 (very negative) to +1 (very positive).

After processing in the Google Cloud, every Iceland Monitor “nature and travel” article was assigned a sentiment score. We then averaged the sentiment score for all travel articles in the month.

Then, as in our primary method, we smoothed the data. For easy comparison to methods #1 and #2, we have inverted the data here so that they all read the same way: higher scores imply greater levels of overtourism.
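The aggregation step can be sketched in Python as follows. The sketch assumes the per-article scores (between -1 and +1) have already been obtained from the sentiment service and are supplied as (month, score) pairs. It also assumes that inversion means flipping the sign, which is one simple way to make higher values read as worse sentiment.

```python
from collections import defaultdict


def monthly_inverted_sentiment(scored_articles):
    """Average sentiment per month, with the sign flipped.

    scored_articles: iterable of (month, score) pairs, where score is
    in [-1, 1] (a hypothetical input format).
    Assumption: inversion is done by negating the monthly average.
    """
    by_month = defaultdict(list)
    for month, score in scored_articles:
        by_month[month].append(score)
    return {month: -sum(scores) / len(scores)
            for month, scores in sorted(by_month.items())}
```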

Exhibit 18: The full range of potential overtourism sentiment measures that we considered

The dataset, perhaps unsurprisingly, does not fit as well with the first two methods, which are based on word frequency rather than a machine learning algorithm. However, we do note that despite different absolute levels, the trend is somewhat similar. Method three shows 1) summer seasonality, 2) worsening overtourism in 2016, and 3) recent improvements. However, in all three cases the trend is far fainter than in the first two methods.

This Google method has a fair correlation (53%) with the second methodology, which makes sense as both only look at articles drawn from the travel section of the Iceland Monitor. There is no correlation between the primary method and this third method.

This method is arguably more sophisticated than our primary methodology. But we ultimately passed on it for several reasons.

  1. The Google Natural Language Processing (NLP) algorithm is a black box and so it is difficult to audit and understand how it assigns sentiment scores. This both disadvantages us, the researchers, and makes it harder to communicate our findings.
  2. The algorithm is trained to be general purpose and is not travel specific.
  3. The algorithm will assign a sentiment score to any article we feed it. That means we need a way of tagging articles as being relevant to travel/tourism. In the case of the Iceland Monitor, this problem is solved by only drawing from articles in the travel category. But as discussed above, not every media organization will have a travel category tag. This means that in order to scale this methodology we would have to devote additional work to pre-tagging articles as travel related before sentiment scoring.


Further Reading