In this blog post, I'll talk about access logs, something you deal with regularly when operating and maintaining web servers.
In recent years, nginx has overtaken Apache in global web server market share. I'd like to explain how to locate, view, and configure nginx access logs.
Test environment
Linux
OS: AlmaLinux release 9.2 (VirtualBox 7.0.12)
Middleware: nginx (1:1.20.1-14.el9_2.1.alma.1), HTTP(80)
Browser
Chrome: 120.0.6099.217 (Official Build) (64-bit)
Test page
Domain: example.com
※ The site is accessed by editing the hosts file, since it runs in a local environment.
HTML: index.html (for the top page), FAQ.html (for the FAQ page)
The location of nginx access logs and log examples
The default location for access logs is "/var/log/nginx/access.log". If you just want to take a quick look, the "less" command is recommended because it is lightweight, even on large files.
1️⃣ Log entry when accessing the URL example.com (index.html)
192.168.33.1 - - [17/Jan/2024:08:47:50 +0000] "GET / HTTP/1.1" 200 37 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"
2️⃣ Log entry when following the internal link from index.html to FAQ.html
192.168.33.1 - - [17/Jan/2024:08:50:33 +0000] "GET /FAQ.html HTTP/1.1" 200 34 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "-"
I set up a site in my local environment that accepts requests for example.com, then extracted the log entries generated by accessing it from a browser.
These logs are from accessing the example.com top page (index.html) and then navigating to the FAQ page (FAQ.html).
The leading IP address and timestamp are straightforward, but the remaining parts may be harder to understand, so I'll explain them by comparing them with the configuration items.
Log Format
The main configuration file for nginx is "/etc/nginx/nginx.conf".
Within this file, the "log_format" directive defines the format of the access logs.
※ The output destination of the access logs is also defined in this file.
less /etc/nginx/nginx.conf
~Some excerpts~
http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
The part "log_format main" defines the format name as "main".
What follows specifies the content to output: nginx variables, combined with literal characters such as hyphens, square brackets, and double quotes that shape how each line is displayed.
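Because a format string is just variables separated by literal characters, it can be turned mechanically into a parser. The following is a minimal Python sketch, not an official nginx tool; it assumes each variable appears only once, is written as `$name` (not `${name}`), and that no value contains the literal separator that follows it. `format_to_regex` and `parse_line` are hypothetical helper names.

```python
import re
from typing import Dict, Optional

# The "main" format from /etc/nginx/nginx.conf, with its three string pieces joined.
MAIN_FORMAT = (
    '$remote_addr - $remote_user [$time_local] "$request" '
    '$status $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"'
)


def format_to_regex(log_format: str):
    """Build a compiled regex with one named group per nginx variable in log_format."""
    parts = []
    pos = 0
    for m in re.finditer(r"\$(\w+)", log_format):
        # Literal text between variables (spaces, hyphens, brackets, quotes) must match exactly.
        parts.append(re.escape(log_format[pos:m.start()]))
        # Each variable's value is matched lazily, up to the next literal separator.
        parts.append(f"(?P<{m.group(1)}>.*?)")
        pos = m.end()
    parts.append(re.escape(log_format[pos:]))
    return re.compile("".join(parts))


MAIN_PATTERN = format_to_regex(MAIN_FORMAT)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return {variable name: value} for one log line, or None if it does not match."""
    m = MAIN_PATTERN.fullmatch(line.rstrip("\n"))
    return m.groupdict() if m else None
```

Applied to the FAQ.html log entry above, this sketch should yield "GET /FAQ.html HTTP/1.1" for `request` and "http://example.com/" for `http_referer`, which is exactly the variable-by-value mapping walked through in the table below.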
Log Format Explanation / Comparison with Access Log 2️⃣
| Log Format | Content | Access Log 2️⃣ Value | Remarks |
|---|---|---|---|
| $remote_addr | IP address of the connecting client | 192.168.33.1 | The client's IP for direct requests; when the request comes through an LB, the LB's IP is recorded instead. |
| - | Hyphen used as a separator | - | |
| $remote_user | Username from Basic authentication | - (blank) | Basic authentication is usually enabled only during development or maintenance, so this is typically blank. |
| [$time_local] | [Local time when processing completed + UTC offset] | [17/Jan/2024:08:50:33 +0000] | "+0000" is the offset from UTC: "+0000" means UTC, and "+0900" would mean JST (Japan Standard Time). |
| "$request" | "Request line" (method, request path, HTTP version) | "GET /FAQ.html HTTP/1.1" | A request to GET (retrieve) FAQ.html over HTTP/1.1 was received. |
| $status | Status code | 200 (Success) | |
| $body_bytes_sent | Bytes sent to the client | 34 (bytes) | Size of the response body (here, FAQ.html), not counting the response headers. |
| "$http_referer" | "Referer" (URL of the linking page) | "http://example.com/" | The FAQ page was reached from the top page. "-" (blank) means no referrer was sent, e.g. the URL was entered directly. |
| "$http_user_agent" | "User agent" (browser and OS information) | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" | Access from Windows using a Chrome-based browser. |
| "$http_x_forwarded_for" | "X-Forwarded-For" (originating client IP) | "-" | When the request passes through a proxy or LB, the originating client's IP address (as reported by that proxy or LB) is recorded. |
A lot of information can be obtained
Summarized more concisely, the table above tells us:
IP: 192.168.33.1
Username: None (not authenticated)
Access Time: January 17, 2024, 08:50:33 UTC (17:50:33 in Japan Standard Time, which is 9 hours ahead)
Destination: The site's FAQ page (FAQ.html)
Response Status: Success (200)
Data Size: 34 bytes
Access Source: The top page (http://example.com/)
Environment: A Chrome-based browser on Windows (as self-reported by the browser)
Via LB or Proxy: No (the X-Forwarded-For field is empty)
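Two of these summary items need a little post-processing: splitting the request line, and converting the logged time to JST. Here is a minimal Python sketch using only the standard library; `split_request` and `to_jst` are hypothetical helper names, and it assumes Python runs in an English or C locale so that `%b` parses month names such as "Jan".

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

JST = timezone(timedelta(hours=9), "JST")


def split_request(request: str) -> Tuple[str, str, str]:
    """Split a request line like 'GET /FAQ.html HTTP/1.1' into method, path, protocol.

    Raises ValueError for malformed request lines (nginx logs those too).
    """
    method, rest = request.split(" ", 1)
    path, protocol = rest.rsplit(" ", 1)
    return method, path, protocol


def to_jst(time_local: str) -> datetime:
    """Parse $time_local (e.g. '17/Jan/2024:08:50:33 +0000') and convert it to JST."""
    parsed = datetime.strptime(time_local, "%d/%b/%Y:%H:%M:%S %z")  # timezone-aware
    return parsed.astimezone(JST)
```

For the FAQ.html entry, `to_jst("17/Jan/2024:08:50:33 +0000")` gives 17:50:33 on January 17, 2024 in JST, i.e. the logged UTC time plus nine hours.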
In this way, quite a lot of information can be obtained from access logs.
By aggregating this information, it is possible to investigate access trends and whether access is malicious or not.
Even the default log format is extremely useful, so let's make the most of it!
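As a small illustration of that kind of aggregation, the Python sketch below counts requests per client IP, per status code, and per path. It is an assumption-laden sketch: it uses a simplified regex that reads only the leading fields shared by the "main" and "combined" formats and silently skips lines it cannot parse; `summarize` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Leading fields common to "main" and "combined":
# IP, "-", user, [time], "METHOD PATH PROTOCOL", status.
LEADING = re.compile(
    r'(?P<ip>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def summarize(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count requests per client IP, status code, and path; skip unparseable lines."""
    counts = {"ip": Counter(), "status": Counter(), "path": Counter()}
    for line in lines:
        m = LEADING.match(line)
        if m is None:
            continue  # e.g. malformed request lines
        counts["ip"][m["ip"]] += 1
        counts["status"][m["status"]] += 1
        counts["path"][m["path"]] += 1
    return counts
```

With a real log, passing an open file object for /var/log/nginx/access.log to `summarize` and then calling `counts["ip"].most_common(10)` lists the busiest clients, which is a common first step when checking for suspicious access.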
Glossary
Basic Authentication
This is a simple authentication feature that requires entering a predetermined username and password.
Because it is so simple, it is intended for temporary use cases, such as during site construction or emergency maintenance.
Especially over HTTP (port 80), the credentials are sent only Base64-encoded, which is effectively plaintext (unencrypted), making this weak from a security standpoint. Therefore, even for temporary use, it is recommended to serve the site exclusively over HTTPS (port 443), where data is encrypted in transit.
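To see why this matters, recall that under RFC 7617 the browser sends an `Authorization: Basic` header whose value is the Base64 encoding of "username:password", and Base64 is an encoding, not encryption. A minimal Python sketch (the credentials are made-up examples, and UTF-8 is assumed for the text):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a client sends for Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value: str) -> str:
    """Recover 'username:password' from the header; no key or secret is needed."""
    scheme, token = header_value.split(" ", 1)
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    return base64.b64decode(token).decode("utf-8")
```

Anyone who can observe the header over plain HTTP can reverse it just as easily. The username half is what nginx records as $remote_user.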
Referer
This is the URL of the page that linked to the page being accessed.
This is a mechanism where, if you open the homepage from a Google search, Google's URL is recorded in the log. Similarly, if you open the FAQ page from the site's homepage, the homepage's URL is logged.
The term is actually a misspelling of the English word "referrer" (the source of a reference). The misspelling was adopted as-is when the HTTP specification was written and is still used that way today, which makes for an amusing bit of history.
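The cases described above (a visit from a search engine, a link within the same site), plus the "-" seen in the first log entry, can be told apart from the $http_referer field. A minimal Python sketch, assuming the site's own host name is example.com; `classify_referer` is a hypothetical helper, and note that browsers may send only the origin or omit the header entirely depending on their referrer policy.

```python
from urllib.parse import urlsplit


def classify_referer(referer: str, own_host: str = "example.com") -> str:
    """Classify a $http_referer value as 'direct', 'internal', or 'external'."""
    if referer in ("", "-"):
        # No Referer header was sent: URL typed in, bookmark, or a suppressing policy.
        return "direct"
    host = urlsplit(referer).hostname or ""  # hostname is already lower-cased
    return "internal" if host == own_host.lower() else "external"
```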
HTTP status codes
For a rough reading, only the first digit of the three-digit code matters; I'll omit the individual codes, since listing them all would be lengthy.
2xx: Success response
3xx: Redirect response
4xx: Client error response
5xx: Server error response
As shown above, the first digit indicates the general class of the response.
The codes we see most often are 200 (success), 302 (temporary redirect), 404 (the requested resource does not exist), and 503 (the server is temporarily unable to handle the request).
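The first-digit rule above translates directly into integer division by 100. A minimal Python sketch; `describe_status` is a hypothetical name, and the reason phrases come from Python's standard `http.HTTPStatus` enum, which does not know nginx's non-standard codes such as 499.

```python
from http import HTTPStatus

CLASSES = {2: "Success", 3: "Redirect", 4: "Client error", 5: "Server error"}


def describe_status(code: int) -> str:
    """Describe a status code by its first digit, plus the standard reason phrase."""
    category = CLASSES.get(code // 100, "Other")  # 1xx and anything unexpected
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # e.g. nginx's non-standard 499
    return f"{code} {phrase} ({category})"
```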
User-Agent
The term "user agent" refers to the software used for communication with a website.
Typically, accessing websites involves using a web browser, so the term has come to refer to the information about the browser (along with additional information such as the operating system, etc.) that the user is using.
X-Forwarded-For
X-Forwarded-For is an HTTP request header that a load balancer (LB) or proxy adds to tell the web server the originating client's IP address.
When the client (user) communicates with the web server through a load balancer or proxy, the web server sees, and logs as $remote_addr, the IP address of the load balancer or proxy rather than that of the original client.
For this reason, it is a de facto standard for the LB or proxy to store the source IP in the "X-Forwarded-For" header.
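Putting $remote_addr and $http_x_forwarded_for together, the original client IP can be estimated as in this minimal Python sketch (`client_ip` is a hypothetical helper, and the addresses in the example are documentation-range placeholders). It assumes the header was set by your own trusted LB or proxy, because a client can send a forged X-Forwarded-For itself. In production, nginx's ngx_http_realip_module can do this substitution natively.

```python
def client_ip(remote_addr: str, x_forwarded_for: str) -> str:
    """Best guess at the original client IP from $remote_addr and $http_x_forwarded_for."""
    if x_forwarded_for in ("", "-"):
        return remote_addr  # no LB/proxy header: the direct peer is the client
    # The header is a comma-separated list "client, proxy1, proxy2"; the first
    # entry is the original client, but only if a trusted LB/proxy set the header.
    return x_forwarded_for.split(",")[0].strip()
```

For example, `client_ip("10.0.0.5", "203.0.113.7, 10.0.0.4")` returns "203.0.113.7", while the FAQ.html entry above, whose X-Forwarded-For is "-", simply yields its $remote_addr.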
Side note: Why the log format is given the name "main"
Why do we define names? Because the log format to be used is specified by name when configuring the log output.
less /etc/nginx/nginx.conf
~Some excerpts~
access_log  /var/log/nginx/access.log  main;
The "access_log" directive specifies where the log file is written, and its last parameter names the log format to use. Because a format is defined in one place (log_format) and used in another (access_log), it needs a name so it can be referenced.
This also means that multiple formats can be defined.
For example, a simplified format that omits unnecessary information could be defined as "easy", and a format with more detailed information as "detailed".
This allows you to use different definitions for different domains and environments.
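To see how one set of request values produces different lines under different formats, here is a minimal Python sketch that mimics nginx's variable substitution. The "easy" format string is a hypothetical example of the simplified format mentioned above (it would correspond to something like `log_format easy '$remote_addr [$time_local] "$request" $status';` in nginx.conf). The sketch only handles `$name` (not `${name}`) and writes "-" for variables with no value, mirroring how nginx logs unset values such as $remote_user.

```python
import re
from typing import Mapping

MAIN = ('$remote_addr - $remote_user [$time_local] "$request" '
        '$status $body_bytes_sent "$http_referer" '
        '"$http_user_agent" "$http_x_forwarded_for"')
# Hypothetical simplified format, like the "easy" example in the text.
EASY = '$remote_addr [$time_local] "$request" $status'


def render(log_format: str, values: Mapping[str, str]) -> str:
    """Substitute each $variable with its value, or "-" if the value is missing."""
    return re.sub(r"\$(\w+)", lambda m: values.get(m.group(1), "-"), log_format)
```

Rendering the FAQ.html request with `EASY` keeps just the IP, time, request line, and status, while `MAIN` fills every variable it knows nothing about with "-".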
What happens if a format name is not specified in the access_log directive?
You may come across configurations where no format name is specified. This is not a problem: the syntax check passes and logging works normally.
If access_log has no format name, the predefined "combined" format is used by default, even though it is not written anywhere in the conf file:
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
This predefined format is documented in the official nginx documentation (ngx_http_log_module).
It differs subtly from the "main" format written in the default conf file: "$http_x_forwarded_for" is not included at the end.
By the way, this default "combined" definition in nginx shares the same name and output format as in Apache.
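The difference between "combined" and "main" can be checked mechanically by listing the variables each format string uses. A minimal Python sketch; the two strings are copied from the definitions above, and `variables` is a hypothetical helper that only recognizes the `$name` form.

```python
import re
from typing import List

MAIN = ('$remote_addr - $remote_user [$time_local] "$request" '
        '$status $body_bytes_sent "$http_referer" '
        '"$http_user_agent" "$http_x_forwarded_for"')
COMBINED = ('$remote_addr - $remote_user [$time_local] '
            '"$request" $status $body_bytes_sent '
            '"$http_referer" "$http_user_agent"')


def variables(log_format: str) -> List[str]:
    """List the nginx variable names used in a log_format string, in order."""
    return re.findall(r"\$(\w+)", log_format)
```

Comparing the two lists shows that "main" is "combined" with exactly one extra trailing variable, http_x_forwarded_for.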
Summary
Apache logs are checked often, and plenty of information about them is available.
On the other hand, there are fewer opportunities to work with nginx than with Apache, so I thought an article summarizing its access logs would be handy.
Personally, I find nginx logs easier to understand, and I prefer nginx's log format specification to Apache's.
I hope this article provides some useful knowledge to its readers.
Thank you very much.
Reference
Module ngx_http_log_module
Module ngx_http_core_module
The 'Basic' HTTP Authentication Scheme
Referer
HTTP Response status code
User agent
X-Forwarded-For
This blog post is translated from a blog post written by Nakamura on our Japanese website, Beyond Co.