The previous version of the code that creates the directories for the individual NEB images was overly restrictive about the input it accepted. That problem has been partially addressed: the program now starts copying the REACTANT and PRODUCT files only after the Route Card. The Route Card is the line that starts with '#' in any Gaussian input file. Other suggestions for making the program's input easier to handle are of course welcome.
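As a rough illustration of locating the Route Card, here is a minimal sketch in C. The function names find_route_card and after_route_card are hypothetical and are not taken from the program itself. The sketch also assumes the Route Card fits on a single line and that leading spaces or tabs before the '#' should be tolerated.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the start of the first line whose first
 * non-blank character is '#', or NULL if there is no such line.
 * Assumption: the Route Card occupies a single line. */
const char *find_route_card(const char *text)
{
    const char *line = text;

    while (line != NULL && *line != '\0') {
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++;                        /* skip leading blanks */
        if (*p == '#')
            return line;
        line = strchr(line, '\n');      /* find the end of this line */
        if (line != NULL)
            line++;                     /* step past the newline */
    }
    return NULL;
}

/* Return a pointer to the text that follows the Route Card line,
 * or NULL if the input contains no Route Card. */
const char *after_route_card(const char *text)
{
    const char *route = find_route_card(text);
    const char *end;

    if (route == NULL)
        return NULL;
    end = strchr(route, '\n');
    return end != NULL ? end + 1 : route + strlen(route);
}
```

Returning NULL when no '#' line exists lets the caller refuse to create an image whose input would have no route card at all.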
Current Issues
The bulk of this post concerns some of the program's current issues, which are proving particularly difficult to resolve. The main problem so far centers on the submission and execution of the Gaussian09 input files. Everything presented here comes from a single run of the program, which produced the errors in question. This run should be taken as representative of the issues the program currently faces.
In simplest terms, the issue is that submitting the Gaussian files produces inconsistent results. Sometimes the files in the Image directories execute normally when they are submitted automatically and produce the desired results. More commonly, Gaussian reports an "Error termination", claiming that it could not find the route card.
The Code:
...
...
(This is the thread function. I created it to automatically submit all the jobs in each directory after all the initial files have been created.)
void *SubmitJob(void *threadID)
{
    long tID;
    int k;
    int pass;
    k = 0;
    tID = (long)threadID;
    pass = tID;
    int c;
    char ImageArray[pass][50];
    char WrittingArray[7][100];
    c = 0;
    /* Create the relevant names for the various directories */
    while(c < pass)
    {
        sprintf(ImageArray[c], "Image_%d", c + 1);
        c++;
    }
    c = 0;
    while(c < pass)
    {
        k = chdir(ImageArray[c]);
        if(k != 0)
        {
            printf("Directory Change Failed!");
        }
        else
        {
            /* This is the critical, and troublesome, step, where the
               Gaussian job is submitted in its directory */
            system("qsub g09.sh");
            printf("Job Submitted \n");
            sleep(1);
        }
        c++;
        k = chdir("../");
    }
    pthread_exit(NULL);
}
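Two details of the thread function may be worth a second look. chdir() changes the working directory of the whole process rather than of one thread, and the return value of system() is never checked, so a failed qsub still prints "Job Submitted". The sketch below shows one possible alternative, not the program's actual code. It lets the child shell started by system() do the directory change, and it reports a non-zero status. The helper names build_submit_command and submit_image are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write "cd Image_<n> && qsub g09.sh" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int build_submit_command(char *buf, size_t size, int image)
{
    int n = snprintf(buf, size, "cd Image_%d && qsub g09.sh", image);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Submit the job for one image directory.
 * Returns 0 only if the shell command reported success. */
int submit_image(int image)
{
    char cmd[64];
    int status;

    if (build_submit_command(cmd, sizeof cmd, image) != 0)
        return -1;

    /* The cd happens inside the child shell, so the working
     * directory of this process is left untouched. */
    status = system(cmd);
    if (status != 0) {
        fprintf(stderr, "Submission failed for Image_%d (status %d)\n",
                image, status);
        return -1;
    }
    return 0;
}
```

Because g09.sh already carries the "#$ -cwd" directive, a job submitted this way should still run in its own Image directory.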
...
...
(The code here is almost entirely the same as in the earlier version, up to the point just after the files are created. At that point all the counters are reset and the bash script g09.sh is copied into each directory for the submission process.)
//Reset all counters
j = 0;
g = 0;
c = 0;
char CommandArray[3+i][100];
sprintf(CommandArray[0], "g03 <Gaussian03.com> output.log");
while(c < i)
{
    sprintf(CommandArray[c], "cp g09.sh Image_%d", c+1);
    system(CommandArray[c]);
    c++;
}
c = 0;
pthread_t threads[i];
int rc;
long t;
t = i;
/* Here is where the call to the thread is made to start going
   through the directories to submit the jobs */
rc = pthread_create(&threads[t], NULL, SubmitJob, (void *)t);
if (rc)
{
    printf("ERROR; return code from pthread_create() is %d\n", rc);
}
c = 0;
pthread_exit(NULL);
return 0;
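One thing stands out in the fragment above. threads is declared with i elements, so its valid indices run from 0 to i - 1, yet the call passes &threads[t] with t = i, which is one past the end of the array. Since only a single thread is created, a single pthread_t is enough. The sketch below is a suggestion rather than the program's code. It starts one worker, passes the image count as the argument, and waits for the worker with pthread_join. The names run_worker, record_count and images_seen are hypothetical, and record_count merely stands in for SubmitJob.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

long images_seen = 0;   /* set by the stand-in worker below */

/* Hypothetical stand-in for SubmitJob: it only records the
 * image count it was given. */
void *record_count(void *arg)
{
    images_seen = (long)(intptr_t)arg;
    return NULL;
}

/* Start one worker thread with the image count as its argument
 * and wait for it to finish. Returns 0 on success, otherwise the
 * error code reported by pthread_create or pthread_join. */
int run_worker(void *(*worker)(void *), long n_images)
{
    pthread_t thread;
    int rc;

    rc = pthread_create(&thread, NULL, worker, (void *)(intptr_t)n_images);
    if (rc != 0) {
        fprintf(stderr, "pthread_create failed with code %d\n", rc);
        return rc;
    }
    return pthread_join(thread, NULL);
}
```

Joining the thread also lets main return normally once every job has been submitted, instead of ending with pthread_exit.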
The Bash Script:
I have altered the original g09.sh script slightly so that it also writes a file called "Status" alongside the output. The file is meant to be easy for the program to read: its first letter is either 'I' (Incomplete) or 'C' (Completed). That gives the program a convenient value to switch on when it reads through the submitted directories to check whether each job has finished. I may have to change the current implementation, though, so that the program doesn't accidentally try to read the file while it is being written.
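On the program's side, reading the Status file could look something like the sketch below. This is an illustration under assumptions. The names read_status and job_status are hypothetical, and treating a missing or empty file as "unknown" is a choice made here so that the caller can simply check again later if it happens to look at the file while the script is rewriting it.

```c
#include <stdio.h>

enum job_status { JOB_UNKNOWN, JOB_INCOMPLETE, JOB_COMPLETED };

/* Classify a job by the first character of its Status file.
 * A missing, empty, or unrecognised file yields JOB_UNKNOWN so
 * that the caller can retry later. */
enum job_status read_status(const char *path)
{
    FILE *fp = fopen(path, "r");
    int first;

    if (fp == NULL)
        return JOB_UNKNOWN;

    first = fgetc(fp);          /* EOF if the file is empty */
    fclose(fp);

    switch (first) {
    case 'I':
        return JOB_INCOMPLETE;
    case 'C':
        return JOB_COMPLETED;
    default:
        return JOB_UNKNOWN;
    }
}
```

A caller would pass a path such as "Image_1/Status" and keep polling for as long as the result is anything other than JOB_COMPLETED.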
#! /bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -m abes
#$ -pe mpich 2
#$ -notify
#
# Necessary variables
. /share/apps/bin/bashrc
. /share/apps/bin/an_functions.sh
# Gaussian 09
export g09root="/share/apps/gaussian"
. $g09root/g09/bsd/g09.profile
## added by hhe
#. $g09root/g09/bsd/g09.login
# Folder where the files are located
export INIT_DIR="$PWD"
# Name of the Gaussian 09 input file
export INAME="Gaussian03"
export ARRAY_JOB=""
# Prepare to run Gaussian 09
## changed by hhe
#export GAUSS_SCRDIR="/misc/hhe1"
export GAUSS_LFLAGS=' -vv -opt "Tsnet.Node.lindarsharg: ssh"'
LINDAWORKERS=$(cat $PE_HOSTFILE | grep -v "catch_rsh" | awk -F '.' '{ print $1}' | tr '\n' ',' | sed 's/,$//')
# Calculation specific information
export LOCATION=`hostname | awk -F '.' '{print $1}'`
cat << EndOfFile > $INIT_DIR/job_info.${JOB_ID}${ARRAY_JOB}
Job ID : $JOB_ID
Username : $USER
Primary group : hhe-users
Login node : $SGE_O_HOST
Working directory : $PWD
Program : Gaussian 09 (parallel)
Input file : t
Exclusive access : No
Array job : No
Task ID range : Not applicable
Dependent job ID : None specified
SMS notification : No
# of hosts : $NHOSTS
# of processors : $NSLOTS
Parent node : $LOCATION
Worker nodes : `cat $TMP/machines | sed q`
`cat $TMP/machines | sed '1d' | sed 's/^/ /'`
Job submission time : `sge_jst $JOB_ID `
Job start time : `date -R`
EndOfFile
# Start the timer
TIME_START=$(date +%s)
# My addition to the script
cat << EndOfFile > $INIT_DIR/Status
Incomplete
EndOfFile
# Prepend input deck with necessary information and run
# Gaussian 09 (parallel)
## commented by hhe
#( sed -i '/%chk=/d' ${INAME}${ARRAY_JOB}.com; echo %NProcShared=${NSLOTS}; echo %LindaWorkers=${LINDAWORKERS}; echo %chk=${INAME}${ARRAY_JOB}.chk; cat ${INAME}${ARRAY_JOB}.com ) | $g09root/g09/g09 >& ${INAME}_NP${NSLOTS}${ARRAY_JOB}.log
#changed hhe
$g09root/g09/g09 < ${INAME}.com > ${INAME}_NP${NSLOTS}${ARRAY_JOB}.log
# End the timer
TIME_END=$(date +%s)
# Delete the core* files
rm -f ${INIT_DIR}/core*
rm -f ${INIT_DIR}/g09.sh.o${JOB_ID}${ARRAY_JOB}
# Calculate time difference
TIME_TOTAL=`time2dhms $(( $TIME_END - $TIME_START ))`
cat << EndOfFile >> $INIT_DIR/job_info.${JOB_ID}${ARRAY_JOB}
Job end time : `date -R`
Total run time : $TIME_TOTAL
EndOfFile
rm -f ${INIT_DIR}/g09.sh.po${JOB_ID}${ARRAY_JOB}
rm -f ${INIT_DIR}/Status
# The Status file is removed and then replaced with a new one containing the word "Completed"
cat << EndOfFile > $INIT_DIR/Status
Completed
EndOfFile
Results
After compiling the code with gcc -pthread -o (the -pthread flag is needed because the program uses threads), I obtained the following termination status for each image:
- Image_1: Error termination (Route card not found)
- Image_2: Error termination (Route card not found)
- Image_3: Error termination (Route card not found)
- Image_4: Error termination (Route card not found)
- Image_5: Normal termination
- Image_6: Normal termination
- Image_7: Normal termination
- Image_8: Normal termination
- Image_9: Normal termination
- Image_10: Normal termination
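A termination status like the ones listed above could also be collected by the program itself, by scanning each log for the phrases "Normal termination" and "Error termination" that Gaussian writes at the end of a run. The following is a minimal sketch of that idea. The names classify_log and termination are hypothetical, and the path to the log file is left to the caller.

```c
#include <stdio.h>
#include <string.h>

enum termination { TERM_NONE, TERM_NORMAL, TERM_ERROR };

/* Scan a Gaussian log for its termination message.
 * Returns TERM_ERROR as soon as an error line is found,
 * TERM_NORMAL if only normal terminations were seen, and
 * TERM_NONE if the file is missing or has no such line yet. */
enum termination classify_log(const char *path)
{
    FILE *fp = fopen(path, "r");
    enum termination result = TERM_NONE;
    char line[512];

    if (fp == NULL)
        return TERM_NONE;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, "Error termination") != NULL) {
            result = TERM_ERROR;
            break;
        }
        if (strstr(line, "Normal termination") != NULL)
            result = TERM_NORMAL;
    }
    fclose(fp);
    return result;
}
```

TERM_NONE covers both a job that has not started and one that is still running, so it is best read as "no verdict yet".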
This leads me to conclude that there must be a problem in passing the files to Gaussian, either in the 'qsub' step or in the 'g09.sh' script. However, I am not very familiar with either of them. Perhaps there is a workaround that I am not aware of. I hope to find a fix for this problem within the next week. If you have any comments or suggestions, please let me know. Thank you!
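One possibility that may be worth ruling out is that an input file is still partly held in the program's stdio buffer when its job starts. This is only a guess, since the code that writes the input files is not shown here. A file written through a FILE pointer is only guaranteed to have been handed to the operating system once it has been flushed or closed. The sketch below closes the file and checks the result before reporting success. The name write_input_file is hypothetical.

```c
#include <stdio.h>

/* Write text to path and make sure it has left the stdio buffer
 * before returning. Returns 0 on success, -1 on any error. */
int write_input_file(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    int ok;

    if (fp == NULL)
        return -1;

    ok = (fputs(text, fp) != EOF);

    /* fclose flushes the buffer; a failure here means the data
     * may not have been written out completely. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

If the files are already closed before the submission thread starts, this is not the cause and the search has to continue elsewhere.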