Short heading goes here

Short description goes here so it complements the heading.

""
Table of contents
Listen to article Loading…
Share

Video

test

---

Video BG

Wide Image

Booking.com team photo

Table 1

Label | Details
Example A | Short description for A.
Example B | Longer description that wraps; no horizontal scroll.
Example C | Another line with more content to test wrapping.

Table 2

Booking.com Flagship AI Solutions

Solution | Challenge it solves | How it works
Smart Filters | Traditional search relied on drop-down menus and checkboxes, limiting travelers to a small number of filters. | Uses GPT-4o mini to understand prompts like “sunset views” or “great gym.” Goes beyond predefined filters by analyzing reviews, images, and listing details. Surfaces more relevant results, driving engagement and conversions.
Property Q&A | Many travelers have specific questions about properties that aren’t easily answered in a static listing. | OpenAI’s LLMs were fine-tuned on user content and property descriptions. Handles queries like “Is there a crib available?” or “Is the pool open in winter?” Adapts to ambiguity in pet-policy definitions.
AI Review Summaries | Travelers often struggle to sift through thousands of reviews when comparing properties. | GPT-4o mini analyzes and summarizes reviews into themes (cleanliness, location, amenities), generating concise summaries that speed decisions and boost confidence (see the sketch after this table).
Help Me Reply | Hosts need to manage guest communications efficiently and cut response times. | Auto-generates responses and templates via OpenAI’s models; hosts track replies with a reply-score metric.
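
To make the AI Review Summaries row concrete, here is a minimal sketch of what a theme-based summary call could look like with OpenAI's Python SDK. The sample reviews and the system prompt are illustrative assumptions; Booking.com's actual pipeline and prompts are not described in this page.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical sample reviews; real input would come from the property's review store.
reviews = [
    "Spotless rooms and the front desk staff were lovely.",
    "Great location, five minutes from the beach, but the gym is tiny.",
    "Wi-Fi kept dropping, though breakfast was excellent.",
]

# Ask GPT-4o mini to condense the reviews into the themes named in the table above.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "Summarize the following guest reviews into themes "
                "(cleanliness, location, amenities) in two or three sentences."
            ),
        },
        {"role": "user", "content": "\n".join(reviews)},
    ],
)

print(response.choices[0].message.content)
```

The same call shape would work for Smart Filters by swapping the system prompt for one that maps a free-text request like “sunset views” onto structured filter tags.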

Table 3

Model Evaluation Scores

Benchmark | GPT-4.5 | GPT-4o
GPQA (science) | 71.4% | 53.6%
AIME ’24 (math) | 36.7% | 9.3%
MMMLU (multilingual) | 85.1% | 81.5%
MMMU (multimodal) | 74.4% | 69.1%
SWE-Lancer Diamond (coding)* | 32.6% ($186,125) | 23.3% ($138,750)
SWE-Bench Verified (coding)* | 38.0% | 30.7%

* Numbers shown represent best internal performance.

Table 4

Model Evaluation: GPT-4 Comparison

Benchmark | GPT-4.5 | GPT-4o | Baseline
GPQA (science) | 71.4% | 53.6% | 50.0%
AIME ’24 (math) | 36.7% | 9.3% | 25.0%
MMMLU (multilingual) | 85.1% | 81.5% | 70.0%
MMMU (multimodal) | 74.4% | 69.1% | 60.0%
SWE-Lancer Diamond (coding)* | 32.6% | 23.3% | 10.0%
SWE-Bench Verified (coding)* | 38.0% | 30.7% | 20.0%

* Numbers shown represent best internal performance.

Block Quotes:

This doesn't look anything like what I want


FAQ

Short question goes here

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique.

Short heading goes here