
Objective of the Challenge
As part of this data analysis module, you will explore a new approach to data processing, “Data Visualization”.
Your goal is to analyze an original dataset as a team and tell a story with charts, as you would in a “Data Visualization” (or “DataViz”) competition.
The goal is not to perform complex “mathematical” demonstrations but to tell a comprehensible and interesting story for everyone. Therefore, place particular importance on this “story” you are going to tell, and on the design of your charts and presentation materials.
What is Data Visualization?
Data visualization is the process of representing information graphically to make it easier to understand, analyze, and communicate. In tourism, hospitality, and food studies, data visualization helps professionals interpret trends, customer behaviors, sales performance, and operational efficiency.
By visualizing tourism data, we move beyond raw numbers and extract insights that drive better decision-making. Whether it’s identifying peak travel seasons, targeting the right audience, or optimizing pricing and marketing strategies, data visualization empowers the tourism and hospitality industry to make smarter, data-driven choices.
Description of the Challenge
Tourism in the Occitanie region in 2018
Throughout the year, thousands of tourists stay overnight in our beautiful region.
Here you will find a unique dataset that locates these tourists and counts their overnight stays. It gives you:
- The accommodation capacities (hotel, camping, etc.) of each department
- The origin of the tourists, whether from a French department or abroad
- The weather and the main cultural events for each day
Some rules of the game
- You can use any tools you wish to explore these data and build chart-based visuals, such as Excel, SPSS, PSPP, Tableau (https://www.tableau.com/), Observable HQ (https://observablehq.com/), or any Python library, as well as any presentation medium for your results, such as PowerPoint, Canva, or Adobe PDF.
- You must provide a list of the tools used to create the charts. You can use any type of data analysis tool.
- You can perform all types of calculations based on this dataset
- The visual analysis must be delivered as a PDF and should not exceed the equivalent of 2 A4 pages, or 3 screenshots if it was created on the web.
- You will add all the necessary contextual elements to comment on the chart(s).
- You are not required to use all the data.
- Apart from base maps, you are not allowed to use data other than those provided.
Data for the challenge
- The overnight stay volumes were constructed by a mobile phone operator from phone call data. These data were provided by the Regional Tourism Committee (CRT).
- The data regarding accommodation capacities were constructed by TDV from data provided by the Regional Tourism Committee (CRT).
- The data regarding events were constructed by TDV from data provided by the Regional Tourism Committee (CRT).
- The weather data come from a website providing historical weather data for many cities in France and around the world.
- The geometry data of the departments are included only in the geojson file. This format is suitable for those who wish to use mapping tools such as the free software QGIS or JavaScript libraries such as d3.js.
- The cell phone location data are not raw data but the result of innovative processing (adjustment, segmentation, anonymization) carried out by the telephone operator with the participation of tourism stakeholders. The “volume of overnight stays” data are therefore statistical estimates.
- The datasets are usable in this framework following the agreement of Mr. Alain Otteinheimer, President of the Toulouse Dataviz association, director of DataSens.
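For those who prefer Python to QGIS or d3.js, a GeoJSON file can also be inspected with the standard `json` module. The sketch below is only an illustration: the inline sample, the property names `code` and `nom`, and the coordinates are assumptions, not the actual content of the challenge file.

```python
import json

# Tiny stand-in for a departments GeoJSON file. The property names
# ("code", "nom") and the coordinates are made up for illustration.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"code": "31", "nom": "Haute-Garonne"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[1.0, 43.0], [2.0, 43.0], [2.0, 44.0], [1.0, 43.0]]]}},
    {"type": "Feature",
     "properties": {"code": "34", "nom": "Hérault"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[3.0, 43.0], [4.0, 43.0], [4.0, 44.0], [3.0, 43.0]]]}}
  ]
}
"""

def index_features(geojson_text, key="code"):
    """Return {property value: geometry} for every feature in a FeatureCollection."""
    collection = json.loads(geojson_text)
    return {f["properties"][key]: f["geometry"] for f in collection["features"]}

geometries = index_features(SAMPLE_GEOJSON)
for code, geometry in sorted(geometries.items()):
    print(code, geometry["type"], len(geometry["coordinates"][0]), "points")
```

Indexing the geometries by department code makes it easy to attach a value (for example a yearly total of overnight stays) to each shape before drawing a map.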
The exhaustive description of the data can be found in the following GitHub repository: https://github.com/ToulouseDataViz/Hackaviz2020/blob/master/README.md
The data comprise several files:
Synthetic and easy-to-access data: Nuitées.xls and .CSV
- 365 lines and 15 columns
- Overnight stays per day summarized by department
The most detailed but not the simplest to exploit: par_origines.xlsx and .csv
- 493,235 lines and 8 columns
- Per day with all the details
Capacities crossed with overnight stays, an optional complement to the other two files:
- capacites.xlsx, .csv, and .geojson
- 13 lines and 61 columns
- Per week in categories of overnight stays by department
It is possible to create beautiful visualizations from just one of these three data files, the simplest being nuitées which is an aggregate of par_origines.
The more expert will manage to combine the three, but it is not certain that the most beautiful story needs all this data.
The important thing is to tell a beautiful story with quality charts.
Details of the Files and Download
Nuitées
Aggregation of data from the par_origines file. For each day of the year (365 lines / 15 columns) :
- Date
- Number of overnight stays in department 09
- Number of overnight stays in department 11
- Number of overnight stays in department 12
- Number of overnight stays in department 30
- Number of overnight stays in department 31
- Number of overnight stays in department 32
- Number of overnight stays in department 34
- Number of overnight stays in department 46
- Number of overnight stays in department 48
- Number of overnight stays in department 65
- Number of overnight stays in department 66
- Number of overnight stays in department 81
- Number of overnight stays in department 82
- Number of overnight stays in the Occitanie region
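Since Nuitées is described as an aggregation of par_origines, the relationship between the two files can be sketched in a few lines of Python. The rows, field order, and figures below are invented for illustration (the real headers are documented in the GitHub README), and treating the regional column as the sum of the departments is an assumption.

```python
from collections import defaultdict

# Hypothetical par_origines-like rows: (date, origin, destination dept, volume).
rows = [
    ("2018-07-14", "75", "31", 1200),
    ("2018-07-14", "GB", "31", 300),
    ("2018-07-14", "75", "34", 2500),
    ("2018-07-15", "69", "34", 1800),
]

# Sum volumes per (date, destination department), whatever the origin.
stays = defaultdict(lambda: defaultdict(int))
for date, _origin, dest, volume in rows:
    stays[date][dest] += volume

# One line per day, one column per department, plus a regional total
# (assumed here to be the sum of the departments).
departments = sorted({dest for _, _, dest, _ in rows})
nuitees = []
for date in sorted(stays):
    line = {"date": date}
    for dept in departments:
        line[dept] = stays[date].get(dept, 0)
    line["region"] = sum(line[dept] for dept in departments)
    nuitees.append(line)

for line in nuitees:
    print(line)
```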
par_origines
For each day of the year 2018 (532,399 lines / 8 columns) :
- Date
- Department or country of origin of the tourists
- Destination department in Occitanie
- Volume of overnight stays in the destination department
- Status of the holidays of the department of origin:
- 0: not on vacation,
- 1: on vacation,
- 2: not specified
- Noon temperature (solar) of the destination department
- Qualitative status of the weather in the destination department:
- 0: very unfavorable weather,
- 1: unfavorable weather,
- 2: correct weather,
- 3: favorable weather,
- 4: ideal weather
- Number of major events in the destination department
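The numeric codes listed above are easier to work with once mapped to labels. The following is a minimal sketch using only the standard library; the labels come from the description, while the records and their layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Code tables as listed in the file description.
HOLIDAY_STATUS = {0: "not on vacation", 1: "on vacation", 2: "not specified"}
WEATHER_STATUS = {
    0: "very unfavorable",
    1: "unfavorable",
    2: "correct",
    3: "favorable",
    4: "ideal",
}

# Hypothetical records: (volume of overnight stays, holiday code, weather code).
records = [(1200, 1, 4), (800, 0, 2), (1500, 1, 4), (400, 0, 0), (600, 2, 2)]

# Average volume per weather label.
volumes_by_weather = defaultdict(list)
for volume, _holiday, weather in records:
    volumes_by_weather[WEATHER_STATUS[weather]].append(volume)
mean_by_weather = {label: mean(v) for label, v in volumes_by_weather.items()}

# Share of the total volume coming from origins that are on vacation.
total = sum(volume for volume, _, _ in records)
on_vacation = sum(
    volume for volume, holiday, _ in records
    if HOLIDAY_STATUS[holiday] == "on vacation"
)
vacation_share = on_vacation / total

print(mean_by_weather)
print(f"share from origins on vacation: {vacation_share:.0%}")
```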
capacités
For each department (13 lines / 61 columns) :
- Department
- Name of the department
- Population of the department
- Number of places (people) in collective accommodation
- Number of places (people) in rental accommodation
- Number of places (people) in outdoor accommodation
- Number of places (people) in hotel accommodation
- Total number of places (people)
- Number of overnight stays for each week, from week 1 to week 53 (one column per week)
Additional data :
Coding of departments, coding of countries, and list of events.
Step-by-Step Guide: How to Perform Data Visualization with Unstructured Tourism Data
📌 Introduction: What is Data Visualization?
Data visualization is the process of turning raw data into meaningful visual insights. In the Data Visualization Challenge organized by Marketeur Expert, we have tourism-related data collected without a predefined purpose. Our goal is to make the data speak by identifying trends, patterns, and relationships that can help the tourism and hospitality industry make better decisions.
🔍 Step 1: Understanding the Available Data
Before we visualize anything, we must explore what kind of data we have. In this challenge, we have datasets related to overnight stays in the Occitanie region, with multiple dimensions:
Dataset | Description |
---|---|
Nuitées (Overnight stays) | Daily number of overnight stays in each department and in the region. |
Pays (Countries) | List of countries from which tourists originate. |
Départements (Departments) | Administrative divisions of France where stays are recorded. |
Événements (Events) | Special events that may influence tourism activity. |
Par Origines (By Origins) | Breakdown of overnight stays by department or country of origin. |
Capacités (Capacities) | Accommodation capacity (number of places) available in each department. |
💡 Key Questions to Ask at This Stage:
- What kind of variables do we have? (e.g., dates, locations, number of stays)
- Are there missing values or inconsistencies in the data?
- Can we combine different datasets to get deeper insights?
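A first pass over a file can answer the first two questions directly. Here is a minimal sketch with Python's standard `csv` module; the inline sample and its column names (`dept_31`, `dept_34`) are hypothetical stand-ins, not the real headers.

```python
import csv
import io

# Hypothetical extract in the spirit of the daily file; real headers differ.
SAMPLE_CSV = """date,dept_31,dept_34
2018-01-01,1000,2000
2018-01-02,,2100
2018-01-03,1100,
2018-01-04,1050,2200
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)

# Which variables do we have, and how many values are missing in each?
columns = reader.fieldnames
missing = {col: sum(1 for row in rows if row[col] == "") for col in columns}

print("columns:", columns)
print("rows:", len(rows))
print("missing values per column:", missing)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file object; the rest of the logic stays the same.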
📊 Step 2: Formulating Key Questions
Since the data was collected without a specific purpose, we need to define what we want to discover. Here are some important questions we can explore:
1️⃣ Trends & Seasonality
- How do overnight stays fluctuate throughout the year?
- What are the peak tourism months in each region?
🎯 Visualization Suggestion → Line Chart to show trends over time.
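Before drawing the line chart, daily figures are usually grouped by month to make the seasonal pattern visible. A possible sketch, with made-up daily totals:

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily totals (date, overnight stays) for illustration only.
daily = [
    (date(2018, 1, 15), 1000),
    (date(2018, 1, 16), 1200),
    (date(2018, 7, 14), 5000),
    (date(2018, 7, 15), 5400),
    (date(2018, 8, 1), 6000),
]

# Sum overnight stays per calendar month.
monthly_total = defaultdict(int)
for day, stays in daily:
    monthly_total[day.month] += stays

# The peak month is the one with the largest total.
peak_month = max(monthly_total, key=monthly_total.get)
for month in sorted(monthly_total):
    print(f"{month:02d}: {monthly_total[month]}")
print("peak month:", peak_month)
```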
2️⃣ Popular Tourist Destinations
- Which regions attract the most visitors?
- Are there differences in domestic vs. international tourism?
🎯 Visualization Suggestion → Bar Chart or Heat Map of France to highlight popular areas.
3️⃣ Visitor Origins & Market Segmentation
- Which countries send the most tourists to France?
- How do visitor preferences vary by nationality?
🎯 Visualization Suggestion → Pie Chart or Stacked Bar Chart to show the proportion of different nationalities.
4️⃣ Impact of Events on Tourism
- Do major events (festivals, sports tournaments) increase overnight stays?
- Can we see a spike in bookings around these events?
🎯 Visualization Suggestion → Before & After Line Chart to compare overnight stays before and after an event.
5️⃣ Supply vs. Demand for Accommodation
- Are there regions where hotel capacity is insufficient compared to demand?
- Where should new accommodations be developed?
🎯 Visualization Suggestion → Scatter Plot comparing available accommodation capacity vs. actual overnight stays.
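One rough way to compare supply and demand is to divide the overnight stays of a week by the number of person-nights the available places could host over that week. This indicator and all figures below are assumptions made for illustration; they are not part of the dataset.

```python
# Hypothetical figures: total places (people) and overnight stays for one week.
capacity = {"31": 50_000, "34": 200_000, "48": 20_000}
weekly_stays = {"31": 280_000, "34": 700_000, "48": 42_000}

def pressure(stays, places, nights=7):
    """Overnight stays divided by the person-nights the places could host."""
    return stays / (places * nights)

ratios = {dept: pressure(weekly_stays[dept], capacity[dept]) for dept in capacity}

# Departments under the most pressure first.
for dept, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"department {dept}: {ratio:.2f}")
```

Each (capacity, overnight stays) pair is one point of the suggested scatter plot; the ratio simply summarizes how far a point sits from the others.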
🛠 Step 3: Cleaning & Preparing Data
Before we create visualizations, we must ensure the data is clean and usable. This involves:
✅ Removing duplicate records
✅ Fixing missing values (e.g., filling gaps with averages)
✅ Standardizing data formats (e.g., ensuring dates are uniform)
✅ Merging datasets where needed (e.g., combining visitor origins with overnight stays)
💡 Example: If we find that some tourist stays don’t have an associated department, we could look at the closest available data points to fill the gaps.
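The first three cleaning steps can be sketched with the standard library alone. The records are invented, and filling gaps with the overall average is only the simplest of the options mentioned above; other strategies may suit the data better.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw records: mixed date formats, one duplicate, one missing value.
raw = [
    ("2018-07-01", 1000),
    ("01/07/2018", 1000),   # same day, written day/month/year
    ("2018-07-02", None),   # missing value
    ("2018-07-03", 1400),
]

def parse_date(text):
    """Accept ISO (YYYY-MM-DD) or French-style (DD/MM/YYYY) dates."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

# 1. Standardize dates, 2. drop exact duplicates (keeping the first occurrence).
seen, cleaned = set(), []
for text, stays in raw:
    record = (parse_date(text), stays)
    if record not in seen:
        seen.add(record)
        cleaned.append(record)

# 3. Fill missing values with the average of the known ones.
average = mean(stays for _, stays in cleaned if stays is not None)
cleaned = [(day, stays if stays is not None else average) for day, stays in cleaned]
print(cleaned)
```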
📌 Step 4: Choosing the Right Visualizations
Once the data is cleaned, we select the best visualization type to answer each question:
Question | Best Visualization Type |
---|---|
How does tourism vary over time ? | Line Chart |
What are the most visited regions ? | Heat Map or Bar Chart |
Where do tourists come from ? | Pie Chart or Stacked Bar Chart |
Do events influence tourism ? | Before/After Comparison Chart |
Is there enough hotel capacity ? | Scatter Plot or Heat Map |
🎨 Step 5: Creating the Visualizations
Now, we can use tools such as:
- Excel (for basic charts)
- Tableau / Power BI (for interactive dashboards)
- Python (Matplotlib, Seaborn) (for advanced visualizations)
- Google Data Studio, now Looker Studio (for online reporting)
💡 Example:
If we want to see the impact of a major music festival on overnight stays in a given department, we could:
- Filter data for that department.
- Compare overnight stays before, during, and after the festival.
- Plot a line graph to show the trend.
- Add annotations to highlight key dates.
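The before/during/after comparison can be sketched as follows. The daily values and event dates are made up, and the plotting helper assumes matplotlib is installed; it is defined but not called, so the numeric part runs with the standard library alone.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical daily overnight stays around a made-up three-day event.
start = date(2018, 7, 10)
values = [900, 950, 1000, 1600, 1700, 1650, 1100, 1000, 950]
series = {start + timedelta(days=i): v for i, v in enumerate(values)}
event_start, event_end = date(2018, 7, 13), date(2018, 7, 15)

def phase(day):
    """Classify a day relative to the event window."""
    if day < event_start:
        return "before"
    return "during" if day <= event_end else "after"

# Average overnight stays in each phase.
averages = {
    name: mean(v for d, v in series.items() if phase(d) == name)
    for name in ("before", "during", "after")
}
print(averages)

def plot_event_window(path="event_window.png"):
    """Draw the series with the event shaded; requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    days = sorted(series)
    fig, ax = plt.subplots()
    ax.plot(days, [series[d] for d in days], marker="o")
    ax.axvspan(event_start, event_end, alpha=0.2)  # highlight the event
    ax.annotate("event", xy=(event_start, series[event_start]))
    ax.set_xlabel("date")
    ax.set_ylabel("overnight stays")
    fig.autofmt_xdate()
    fig.savefig(path)
```

Calling `plot_event_window()` writes the annotated line chart to a PNG file; the shaded band plays the role of the annotation that highlights key dates.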
📢 Step 6: Interpreting the Insights
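Once visualizations are created, we must analyze the patterns to draw meaningful conclusions. Some examples of the kind of insight you might formulate (the places and figures are illustrative, not drawn from the challenge data):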
🔍 Seasonality Insight:
“Summer months (July-August) see a 40% increase in overnight stays in coastal regions.”
📌 Action: Hotels can increase rates during peak months.
🔍 Visitor Origin Insight:
“British tourists represent the largest foreign group in Normandy.”
📌 Action: Tourism boards can create English-language marketing campaigns.
🔍 Event Impact Insight:
“The Cannes Film Festival causes a 60% spike in overnight stays.”
📌 Action: Hotels can offer special packages during the festival.
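Statements such as "a 40% increase" rest on a simple relative-change calculation, which is worth making explicit so that the baseline is always stated. A short sketch with hypothetical figures:

```python
def percent_change(baseline, observed):
    """Relative change from baseline to observed, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100

# Hypothetical monthly averages of daily overnight stays.
off_season, summer = 50_000, 70_000
print(f"{percent_change(off_season, summer):.0f}% increase")
```

Choosing a different baseline (the yearly average rather than the off-season average, for instance) changes the percentage, so the comparison period should appear in the chart caption.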
✅ Step 7: Communicating & Using the Insights
Finally, we present our findings in a clear and engaging way:
- Dashboards for hotel managers
- Reports for tourism boards
- Infographics for marketing teams
A good data visualization should be:
🎯 Clear – Avoid too much text, focus on visuals.
📊 Accurate – Ensure data is correct and up-to-date.
📢 Actionable – Provide insights that help make decisions.
🚀 Conclusion: Turning Data into Action
Even when data is collected without a clear purpose, we can still make it speak by:
✔️ Exploring the dataset
✔️ Asking the right questions
✔️ Cleaning and organizing the data
✔️ Choosing the best visualizations
✔️ Extracting meaningful insights
✔️ Using insights to improve tourism strategies
Examination Modalities
Your work will be evaluated in one of two formats, at your choice:
- A group oral presentation lasting at most 10 minutes
- OR
- A video presentation of your DataViz, including your commentary, lasting at most 10 minutes, to be uploaded to the submission area on this page no later than the evening before the exam date. The submission area for videos will open at a later date.
In both cases, the oral presentation or the viewing of the video will be followed by questions for a maximum duration of 5 minutes.
Evaluation Criteria
The work will be evaluated against several criteria, including the following:
Evaluation of Visualizations
Criteria | Points |
---|---|
Ability of the visualization to clarify the data | 5 |
Ability of the visualization to be easily understood | 5 |
Choice of colors appropriate to the message to be conveyed | 5 |
Ability of the visualization to faithfully transcribe the data (no choice of scales or added effects that could mislead the audience) | 5 |
Evaluation of the Oral Presentation
Criteria | Points |
---|---|
Ability to showcase the subject (dynamism of the presentation, ability to arouse interest) | 5 |
Quality of the presentation materials (care taken in producing them) | 5 |
Quality of the responses to questions | 5 |
Ability to explain the work carried out | 5 |
Please note, this is not a test of “statistical performance” but of creativity, originality, and the search for the best way to “illuminate” the data. If you attempted a complex analysis without succeeding, still explain at the end of your presentation what you wanted to do and how you went about it.
Oral Presentation Times
The times for oral presentations will be defined later.
Sources of Inspiration
https://www.dataviz-inspiration.com/
https://www.awwwards.com/websites/data-visualization/
Some Tools
https://www.tableau.com/fr-fr/academic/teaching
https://observablehq.com/pricing
Python and Some Libraries
https://geopandas.org/en/stable/
https://python-visualization.github.io/folium/
Some Tutorials 😉
Google Colaboratory & Pandas
Have fun!
This challenge is published with the permission of the Toulouse Dataviz association (https://toulouse-dataviz.fr/)