Tuesday, March 20, 2018

"no space left on device"

What should you do if you get the message "no space left on device" on some file system (/u01, for example)?


1.

Start with inode analysis:
# date
# df -i

and look at the Inodes, IUsed, IFree and IUse% values in the output.
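When there are many mount points, the df -i output can be filtered automatically. This is only a minimal sketch: inodes_over is a hypothetical helper name, it assumes the six-column layout printed by df -iP, and it assumes mount points contain no spaces.

```shell
# Print mount points whose inode usage is at or above a limit (in percent).
# Reads `df -iP` output on stdin. inodes_over is a hypothetical helper name.
inodes_over() {
    awk -v limit="$1" '
        NR > 1 {                        # skip the header line
            used = $5
            sub(/%/, "", used)          # "97%" -> "97"; a "-" counts as 0 below
            if (used + 0 >= limit) print $6, $5
        }'
}

# Usage: df -iP | inodes_over 90
```

Any file system it prints is a candidate for the per-directory file count described next.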

If 100% of the inodes are in use, then a lot of files were probably created recently in one of the sub-directories.
To find the biggest directories (by number of files):
# find /u01 -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n | tail -50
You'll get the 50 biggest directories.
Delete the redundant files.
Try to do it without the rm command: use adrci, or find with the -mtime +30 option.
Deleting many files takes a long time, so run these commands in screen.

Adrci example:

[oracle@edbadm01 trace]$ adrci
adrci> show homes
adrci> set home diag/rdbms/irbis/irbis1
adrci> purge -age 60
       - Purge diagnostic data older than <mins> minutes from the ADR home
Or
adrci> purge -size 10485760000
       - Purge diagnostic data from the ADR home until the size of the home reaches <bytes> bytes (approx. 10G)

Adrci can remove any number of files without the annoying "/bin/rm: Argument list too long" message.
Also, this program keeps working after your terminal window is accidentally closed (or after a network failure).
But adrci is a command from the Oracle Home and it works with Oracle trace files only.

Find example:
Use find with the -mtime option (quote the pattern so that the shell does not expand it):
# date; find /u01/app -name '*aud' -mtime +30 -type f -delete
The third way is rm. It is the least recommended way because you should use rm to delete closed files only. Don't use rm to remove files opened by any process. Check files with the fuser command before removing them.
It's very easy to make a mistake using rm, so use adrci or find if possible.
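Because find with -delete is hard to undo, it can help to list the candidates first and delete only after checking the list. The sketch below assumes GNU find; purge_old and its --delete switch are hypothetical names used for illustration, not part of any tool.

```shell
# purge_old DIR PATTERN DAYS [--delete]
# Lists regular files under DIR that match PATTERN and are older than DAYS days.
# Files are removed only when the fourth argument is --delete.
purge_old() {
    dir=$1
    pattern=$2
    days=$3
    action=${4:-}
    if [ "$action" = "--delete" ]; then
        find "$dir" -xdev -type f -name "$pattern" -mtime +"$days" -delete
    else
        find "$dir" -xdev -type f -name "$pattern" -mtime +"$days" -print
    fi
}

# Usage:
#   purge_old /u01/app '*.aud' 30            # preview only
#   purge_old /u01/app '*.aud' 30 --delete   # really delete
```

The preview and the delete use the same conditions, so the list you checked is the list that gets removed (apart from files that became old enough in between).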


2.

If the previous step found many free inodes (so nothing prevents creating a file),
then we need to investigate free space in the file system:
# date; df -h

We need to understand how much free space the file system of interest has.
If there is no free space (Size = Used, Use% = 100%), then we need to delete some files.

Start by analyzing the biggest directories (by total file size):
# date; du -m /u01/app/ | sort -rn | less

Usually I choose the biggest directory and delete files in it.
Then I choose the next biggest directory and so on ... until I have enough space.

Usually I use 2 commands to navigate through the directory tree:
# du -sm * | sort -rn  - to understand which sub-directory is the biggest
and
# cd some_dir - to go to this directory
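The same navigation can be done without changing directory. This is a sketch assuming GNU du (for --max-depth); top_dirs is a hypothetical helper name.

```shell
# top_dirs DIR [N]
# Prints the N largest entries one level below DIR, sizes in MB, biggest first.
# The first line is always the total for DIR itself.
top_dirs() {
    du -xm --max-depth=1 "${1:-.}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage: top_dirs /u01/app 10
```

The -x option keeps du on one file system, so other file systems mounted below the starting directory do not distort the numbers.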

It may be useful to quickly find big files in your file system:
# find /u01 -size +50M
Don't forget about the -mtime option; it is a good addition for pointing out old files.
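Both conditions can be combined and the result sorted by size. A minimal sketch, assuming GNU find (for -printf); big_old_files is a hypothetical helper name.

```shell
# big_old_files DIR SIZE DAYS
# Lists regular files under DIR bigger than SIZE (find syntax, e.g. 50M)
# and older than DAYS days, as "size_in_bytes<TAB>path", biggest first.
big_old_files() {
    find "$1" -xdev -type f -size +"$2" -mtime +"$3" -printf '%s\t%p\n' 2>/dev/null |
        sort -rn
}

# Usage: big_old_files /u01 50M 30
```

Nothing is deleted here; the output is only a list of candidates to inspect.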


3.

If you've found enough free inodes and enough free space
but you still get "no space left on device", then this means that some process
has a big file open, this file was removed with the rm command, and the process is still running and keeps the file open.
The space of such a file is released only when the process closes it.

Hint:
If a big file needs to be emptied, truncate it with the > redirection instead of removing it with rm:

[oracle@ed04dbadm01 log]$ cd /u01/app/oracle/diag/rdbms/killme122/km1221/trace/
[oracle@ed04dbadm01 trace]$ ls -l alert_km1221.log
-rw-r----- 1 oracle asmadmin 430376 Mar 20 08:00 alert_km1221.log
[oracle@ed04dbadm01 trace]$ > alert_km1221.log
[oracle@ed04dbadm01 trace]$ ls -l alert_km1221.log
-rw-r----- 1 oracle asmadmin 0 Mar 20 11:11 alert_km1221.log


If you use rm to delete a big file that a process still has open, you will get an unexplained loss of disk space.

To find out which files are open, use the lsof command (list open files).
It shows all open files for all processes in the system, with the PIDs and names of the processes.
Its output produces many lines on the screen, so I use lsof | less.
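If lsof is not installed, the files that were removed but are still held open can be found through /proc on Linux. This is only a sketch: deleted_open_files is a hypothetical helper name, it relies on Linux marking such links with the suffix " (deleted)", and without root it sees only your own processes.

```shell
# deleted_open_files PREFIX
# Prints "PID<TAB>/proc/PID/fd/N<TAB>target" for every open file descriptor
# whose target was under PREFIX and has already been removed.
deleted_open_files() {
    prefix=$1
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or already closed descriptors are skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$prefix"*" (deleted)")
                pid=${fd#/proc/}        # strip the leading /proc/
                pid=${pid%%/*}          # keep only the PID part
                printf '%s\t%s\t%s\n' "$pid" "$fd" "$target"
                ;;
        esac
    done
}

# Usage: deleted_open_files /u01/app
```

The first column gives the PIDs to restart. If a process cannot be restarted, truncating through the /proc path in the second column (: > /proc/PID/fd/N) usually releases the space in the same way as the > hint above, but treat that as an assumption and try it on a test system first.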

Analysing open files is a difficult case, but you can solve this problem easily and quickly if it is possible
- to kill the process[es] (kill -9) which have open files in our file system (# lsof | grep /u01/app )
  or
- to restart or shut down the process[es] which have open files in our file system (# lsof | grep /u01/app )
  or
- to restart the OS (operating system, i.e. the server).

After this you will see free space in the file system.

4.
If the previous recommendations did not help, then take a new disk and extend your file system ;)
This may not be a joke.
If you have inspected all your files and found that all of them are necessary (no extra files and no free space),
then it is time to grow this file system.

Good luck!
