Artificial Resume Deciphering Intelligent Software ( ARDIS )
Artificial Resume Generating Intelligent Software ( ARGIS )
-------------------------------------------------------------------------------------------------------
Note : Most of what I envisaged in the following notes , 20 years ago , must have materialized by now !
But it did lead to the launch of www.3pJobs.com on 14 Nov 1997 !
Date written : 01 Dec 1996
Uploaded : 03 Nov 2016
--------------------------------------------------------------------------------------------------------------------------------
What are these software programs ? What will they do ? How will they help us ? How will they help our clients / candidates ?
ARDIS :
This software will break up / dissect a Resume into its different constituents , such as :
# Physical information ( data ) about a candidate ( Executive )
# Academic information about a candidate
# Employment Record ( Industry / Function / Products / Services wise )
# Salary
# Achievements / Contributions
# Attitudes / Attributes / Skills / Knowledge
# His preferences with respect to Industry / Function / Location
In fact , if every candidate were to fill in our EDS ( Executive Data Sheet ) , the info would automatically fall into the " proper " slots / fields , since our EDS forces a candidate to " dissect " himself into various compartments .
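A minimal sketch ( in Python , purely my illustration ) of what those " slots / fields " might look like as a record structure ; the field names simply follow the list above and are assumptions , not an ARDIS specification :

```python
# Illustrative record structure for one dissected resume ; field names follow the
# constituents listed above . PEN ( Permanent Executive Number ) is described later on .
from dataclasses import dataclass, field

@dataclass
class ExecutiveRecord:
    pen: str = ""                                             # Permanent Executive Number
    physical_info: dict = field(default_factory=dict)         # age , location , etc.
    academic_info: list = field(default_factory=list)         # qualifications
    employment_record: list = field(default_factory=list)     # industry / function / products
    salary: str = ""
    achievements: list = field(default_factory=list)          # achievements / contributions
    attitudes_skills_knowledge: list = field(default_factory=list)
    preferences: dict = field(default_factory=dict)           # industry / function / location
```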
But getting every applicant / executive to fill in our standardized EDS is next to impossible - and may not even be necessary .
Executives ( who have already spent a lot of time and energy preparing / typing their bio-data ) are most reluctant to sit down once more and spend a lot of time again to furnish us the SAME information / data in the neatly arranged blocks of our EDS . For them , this duplication is a WASTE of TIME !
The EDS is designed for our ( information handling / processing / retrieving ) convenience , and that is exactly the way the executive perceives it ! Even if he is vaguely conscious that filling in the EDS would help him in the long run , he does NOT see any IMMEDIATE BENEFIT from doing so - hence the reluctance .
We , too , have a problem - a " COST / TIME / EFFORT " problem .
If we are receiving 100 bio-data each day ( this should happen soon ) , to whom should we send our EDS , and to whom NOT ?
This can be decided only by a SENIOR executive / consultant , who goes through each and every bio data , DAILY , and reaches a conclusion as to :
* which resumes are of " interest " and need to be sent an EDS
* which resumes are " marginal " or not of immediate interest , where we need not spend the time / money / energy of sending an EDS
We may not be able to employ a number of Senior / Competent Consultants who can scrutinize all incoming bio-data and take this decision on a DAILY basis ! That , by itself , would be a costly proposition .
So ,
On the one hand > we have the time / cost / energy / effort of sending an EDS to everyone ,
On the other hand > we have the time / cost of several Senior Consultants to separate the " chaff " from the " wheat "
NEITHER IS DESIRABLE !
But , from each bio data received daily , we still need to DECIPHER , and drop into relevant slots / fields , the RELEVANT DATA / INFORMATION , which would enable us to :
# Match a candidate's profile with a " Client Requirement Profile " against specific requests ( see the matching sketch after this list )
# Match a candidate's profile against " Specific Vacancies " that any Corporation ( client or not ) may post on our VACANCY BULLETIN BOARD ( un-advertized vacancies )
# Match a candidate's profile against the " Most Likely Companies who are likely to hire / need such an executive " , using our CORPORATE DATABASE , which will contain info such as the PRODUCTS / SERVICES of each and every Company
# Convert each bio data received into a RE-CONSTITUTED BIO DATA ( Converted Bio data ) , to enable us to send it out to any client / non-client organization at the click of a mouse
# Generate ( for commercial / profitable exploitation ) such by-product services as :
* Compensation Trends
* Organization Charts
* Job Descriptions ...etc
# Permit a candidate to log into our DATABASE and remotely modify / alter his bio data
# Permit a client ( or a non-client ) to log into our DATABASE and remotely conduct a SEARCH
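A minimal sketch ( my illustration , not a specification ) of the profile-matching idea in the first item above : score a candidate against a " Client Requirement Profile " by the overlap of their keyword sets . The sample keywords and the scoring rule ( simple set overlap ) are assumptions :

```python
# Score a candidate profile against a client requirement profile by keyword overlap .
def match_score(candidate_keywords: set, requirement_keywords: set) -> float:
    """Fraction of overlap between the two keyword sets ( 0.0 = no match , 1.0 = identical )."""
    if not candidate_keywords or not requirement_keywords:
        return 0.0
    common = candidate_keywords & requirement_keywords
    return len(common) / len(candidate_keywords | requirement_keywords)

candidate = {"marketing", "steel", "mumbai", "exports"}
requirement = {"marketing", "steel", "exports", "general-manager"}
print(round(match_score(candidate, requirement), 2))   # 3 common words out of 5 distinct -> 0.6
```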
ARDIS is required on the assumption that , for a long time to come , " TYPED BIO DATA " would form a major source of our database .
Other sources , such as :
* Duly filled in EDS ( hard copy )
* EDS on a floppy
* Downloading the EDS over the Internet ( or Dial-Up phone lines ) , and uploading it after filling in ( like Intellimatch ) ,
will continue to play a minor role in the foreseeable future .
HOW WILL ARDIS WORK ?
Step # 1
Receive typed Bio Data
Step # 2
Scan bio data
Step # 3
Create BIT-MAP image
Step # 4
Using OCR , convert to ASCII ( using PageMaker )
Convert to English characters ( by comparison )
Step # 5
OWR / Optical Word Reader
Convert to English language WORDS , to create a Directory of Keywords ( using ISYS )
Compare with the KEY-WORDS stored in the WORD DIRECTORY of " Most Frequently Used " WORDS in 3,500 converted bio-data ( ISYS analysis )
Step # 6
OPR / Optical Phrase Reader
Pick out " Phrases " and create a DIRECTORY of " Key Phrases " ( ARDIS )
* Detect " Pre-fixes " & " Suffixes " used with each KEY WORD that go to make up the " Most Frequently Used PHRASES "
* Calculate " Occurrence Frequency "
* Calculate " Probability " of each Occurrence
* Create " Phrase Directories " for comparison
Step # 7
OSR / Optical Sentence Reader
Pick out " Sentences " & create a Directory of " KEY SENTENCES "
( the most commonly used VERBS / ADVERBS / PREPOSITIONS , with each " Key Phrase " , make up the Directory of KEY SENTENCES )
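A minimal sketch ( my illustration , assuming plain ASCII text out of Step # 4 ) of the word-counting in Steps # 5 and # 6 : tokenize the OCR'd text of one bio data and keep a running " keyword directory " across all the bio data scanned so far :

```python
# Build a running directory of keywords from OCR'd bio-data text .
import re
from collections import Counter

keyword_directory = Counter()   # running counts across ALL scanned bio data

def add_bio_data(ocr_text: str) -> Counter:
    """Count the words in one OCR'd bio data and fold them into the running directory."""
    words = re.findall(r"[A-Za-z]+", ocr_text.lower())
    counts = Counter(words)
    keyword_directory.update(counts)
    return counts

# Usage : feed each scanned bio data as plain text
add_bio_data("Major achievement attained : reduced production cost by twelve percent .")
print(keyword_directory.most_common(5))   # the " Most Frequently Used " words so far
```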
TO RECAPITULATE :
ARDIS will ,
* Recognize " Characters "
* Convert to " WORDS "
* Compare with the 6,258 key words which we have found in 3,500 converted Bio Data ( using ISYS ) .
If a " Word " has not already appeared ( > 10 times ) in those 3,500 bio data , then its " chance " ( probability ) of occurring in the next bio data is very very small indeed .
But even then , the ARDIS software will store in memory each " Occurrence " of each Word ( old or new / first time or a thousandth time ) ,
and will continuously calculate its " Probability of Occurrence " as :
P ( word ) = [ No of occurrences of the given word so far ] divided by [ Total no of occurrences of ALL the words in the entire population so far ]
So that , by the time we have SCANNED 10,000 bio data , we would have literally covered ALL the words that have even a small PROBABILITY of OCCURRENCE !
So , with each new bio data " scanned " , the " probability of occurrence " of each word gets more and more accurate !
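A minimal sketch ( my illustration ) of that running calculation , using the same formula as above ; the function names are assumptions :

```python
# Continuously updated " Probability of Occurrence " of each word .
from collections import Counter

word_counts = Counter()
total_words = 0

def observe(words):
    """Fold one scanned bio data ( given as a list of words ) into the running counts."""
    global total_words
    word_counts.update(words)
    total_words += len(words)

def probability(word: str) -> float:
    """P = occurrences of the given word so far / total occurrences of ALL words so far."""
    return word_counts[word] / total_words if total_words else 0.0

observe(["major", "achievement", "attained", "major", "contribution"])
print(round(probability("major"), 3))   # 2 / 5 = 0.4 with this tiny sample
```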
The same logic will hold for :
* KEY PHRASES
* KEY SENTENCES
The " Name of the Game " is : Probability of Occurrence
As someone once said :
If you allow 1000 monkeys to keep on hammering the keys of 1000 type-writers for 1000 years , you will find , at the end , that between them they have " re-produced " the entire literary works of Shakespeare !
But today , if you store into a Super Computer ,
* all the words appearing in the English language ( incl Verbs / Adverbs / Adjectives ..etc )
* the " Logic " behind the construction of the English language ,
then , I am sure , the Super Computer could reproduce the entire works of Shakespeare in 3 MONTHS !
And , as you would have noticed , ARDIS is a " SELF LEARNING " type of software !
The more it reads ( scans ) , the more it learns ( memorizes words , phrases & even sentences ) .
Because of its SELF LEARNING / SELF CORRECTING / SELF IMPROVING capability , ARDIS gets better & better equipped to detect , in a scanned bio data ,
* Spelling Mistakes ( wrong WORD )
* Context Mistakes ( wrong Prefix or Suffix )
* Preposition Mistakes ( wrong PHRASE )
* Verb / Adverb Mistakes ( wrong SENTENCE )
With minor variations , ALL Thoughts , Words ( written ) , Speech ( spoken ) and Actions keep on " repeating " , again and again and again .
It is this REPETITIVENESS of Words , Phrases , and Sentences in Resumes that we plan to exploit .
In fact , by examining & memorizing the several hundred ( or thousand ) " Sequences " in which the words appear , it should be possible to " Construct " the " Grammar " , ie : the logic behind the sequences .
I suppose this is the manner in which the experts were able to unravel the " meaning " of the hieroglyphic inscriptions on Egyptian tombs .
They learned a completely strange / obscure language by studying the " Repetitiveness " & " Sequential " occurrence of unknown characters .
===============================================================
Added on 11 JULY 2022 :
Extract ( dated 18 May 2021 ) :
LaMDA’s conversational skills have been years in the making. Like many recent language models, including BERT and GPT-3, it’s built on Transformer, a neural network architecture that Google Research invented and open-sourced in 2017. That architecture produces a model that can be trained to read many words (a sentence or paragraph, for example), pay attention to how those words relate to one another and then predict what words it thinks will come next.
But unlike most other language models, LaMDA was trained on dialogue. During its training, it picked up on several of the nuances that distinguish open-ended conversation from other forms of language. One of those nuances is sensibleness. Basically: Does the response to a given conversational context make sense? For instance, if someone says:
“I just started taking guitar lessons.”
You might expect another person to respond with something like:
“How exciting! My mom has a vintage Martin that she loves to play.”
That response makes sense, given the initial statement. But sensibleness isn’t the only thing that makes a good response. After all, the phrase “that’s nice” is a sensible response to nearly any statement, much in the way “I don’t know” is a sensible response to most questions. Satisfying responses also tend to be specific, by relating clearly to the context of the conversation. In the example above, the response is sensible and specific.
LaMDA builds on earlier Google research, published in 2020, that showed Transformer-based language models trained on dialogue could learn to talk about virtually anything. Since then, we’ve also found that, once trained, LaMDA can be fine-tuned to significantly improve the sensibleness and specificity of its responses.
==============================================================
HOW TO BUILD DIRECTORIES OF " PHRASES " ?
From 6252 words , let us pick any word , say : ACHIEVEMENT
Now we ask the software to scan the Directory containing the 3500 converted Bio Data , with the instruction that every time the word " Achievement " is spotted , the software will immediately spot / record the " prefix " .
The software will record ALL the words that appeared before " Achievement " , as also the " Number of times " each of these prefixes appeared .
Word = ACHIEVEMENT
--------------------------------------------------------------------------------------------------------------------------------
Prefix found .................... No of times found ( Occurrence ) .......... Probability of Occurrence
--------------------------------------------------------------------------------------------------------------------------------
* Major ......................... 10 ........................................ 10 / 55 = 0.182
* Minor ......................... 9 ......................................... 9 / 55 = 0.164
* Significant ................... 8 ......................................... 8 / 55 = 0.145
* Relevant ...................... 7 ......................................... 7 / 55 = 0.127
* True .......................... 6 ......................................... 6 / 55 = 0.109
* Factual ....................... 5 ......................................... 5 / 55 = 0.091
* My ............................ 4 ......................................... 4 / 55 = 0.073
* Typical ....................... 3 ......................................... 3 / 55 = 0.055
* Collective .................... 2 ......................................... 2 / 55 = 0.036
* Approximate ................... 1 ......................................... 1 / 55 = 0.018
--------------------------------------------------------------------------------------------------------------------------------
TOTAL NO OF OCCURRENCES ......... 55 ........................................ ( Total Probability ) 1.000
--------------------------------------------------------------------------------------------------------------------------------
As more and more bio data are scanned ,
* The Number of " Prefixes " will go on increasing
* The Number of " Occurrences " of each prefix will also go on increasing
* The overall " population size " will also go on increasing
* The " Probability of Occurrence " of each prefix will go on getting more and more accurate , ie : more and more representative
This process can go on and on and on ( as long as we keep on scanning bio data ) .
But the " Accuracy Improvements " will decline / taper off once a sufficiently large number of prefixes ( to the word ACHIEVEMENT ) have been accumulated . Saturation will take place !
The whole process can be repeated with the WORDS that appear as " SUFFIXES " to the word " ACHIEVEMENT " ,
and the probability of occurrence of each " Suffix " also determined .
Word = ACHIEVEMENT
--------------------------------------------------------------------------------------------------------------------------------
Suffix .......................... No of Times Found ......................... Probability of Occurrence
--------------------------------------------------------------------------------------------------------------------------------
* Attained ...................... 20 ........................................ 20 / 54
* Reached ....................... 15 ........................................ 15 / 54
* Planned ....................... 10 ........................................ 10 / 54
* Targeted ...................... 5 ......................................... 5 / 54
* Arrived ....................... 3 ......................................... 3 / 54
* Recorded ...................... 1 ......................................... 1 / 54
--------------------------------------------------------------------------------------------------------------------------------
TOTAL OF ALL OCCURRENCES ........ 54 ( Population Size ) .................... Total Probability 1.000
--------------------------------------------------------------------------------------------------------------------------------
Having figured out the " Probabilities of Occurrence " of each of the prefixes and each of the suffixes ( to a given word - in this case , ACHIEVEMENT ) , we could next tackle the issue of " a given combination of prefix and suffix " ,
eg : What is the probability of :
* Prefix = " Major " / Word = ACHIEVEMENT / Suffix = " Attained " ?
Why is all of this Statistical exercise required ?
If we wish to stop at merely " Deciphering "
a resume , then I don't think , we need to go through this
For mere " Deciphering " , all we need is to
create a KNOWLEDGE BASE of :
* Skills
* Knowledge
* Attitudes
* Attributes
* Industries
* Companies
* Functions
* Edu
Qualifications
* Products /
Services
* Names ...etc
Having created the " knowledge base " ,
simply scan a bio data , recognize " words " , compare with the words
contained in the " knowledge base " , find CORRESPONDENCE /
EQUIVALENCE , and allot / file each scanned word into respective " Fields
" against each PEN ( Permanent Executive Number )
PRESTO !
You have dissected and stored the MAN , in appropriate boxes !
Our EDS has these " boxes " . The problem is manual data entry .
The data entry operator
- searches for the appropriate " word " in the appropriate " EDS Box " and transfers it to the appropriate screen .
To eliminate this manual ( time consuming ) operation , we need ARDIS .
We already have a DATABASE of 6500 words .
All we need to do is to write down against each word whether it is a :
* Skill
* Attribute
* Knowledge
* Edu
* Product
* Company
* Location
* Industry
* Function etc
The moment we do this , what was a mere " Database " becomes a " Knowledge Base " , ready to serve as a " COMPARATOR " .
And as each NEW bio data is scanned , it will throw up words for which there is no " Clue " .
Each such NEW word will have to be manually " Categorized " and added to the " Knowledge base " .
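A minimal sketch ( my illustration ) of that " COMPARATOR " : each scanned word is looked up in the categorized word list and filed into the matching field , and unknown words are set aside for manual categorization . The sample knowledge-base entries are assumptions ; the categories follow the list above :

```python
# File scanned words into fields via a categorized knowledge base ; collect unknown words .
KNOWLEDGE_BASE = {
    "welding": "Skill",
    "metallurgy": "Knowledge",
    "mumbai": "Location",
    "steel": "Industry",
    "marketing": "Function",
}

def file_words(scanned_words):
    """Return ( fields , unknown_words ) for one scanned bio data."""
    fields = {}      # category -> words filed under it
    unknown = []     # words with no " Clue " , to be categorized manually
    for w in scanned_words:
        category = KNOWLEDGE_BASE.get(w.lower())
        if category:
            fields.setdefault(category, []).append(w)
        else:
            unknown.append(w)
    return fields, unknown

fields, unknown = file_words(["Welding", "Mumbai", "Marketing", "Xylography"])
print(fields)    # {'Skill': ['Welding'], 'Location': ['Mumbai'], 'Function': ['Marketing']}
print(unknown)   # ['Xylography'] -> needs manual categorization
```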
Then what is the advantage of calculating , for
* each WORD
* each SUFFIX
* each PREFIX
* each PHRASE
* each SENTENCE ,
its probability of occurrence ?
The ADVANTAGES are :
# 1
Detect " unlikely " prefixes / suffixes
Suppose ARDIS detects " Manor Achievement " .
ARDIS finds that the " probability " of
* " Manor " as a prefix to ACHIEVEMENT is 0.00009 ( say , NIL ) ,
hence the CORRECT prefix has to be
* " Major " ( and not " Manor " ) , for which the probability is ( say ) ... 0.4056
# 2
ARDIS detects the words " Mr HANOVAR " .
It recognizes this as a spelling mistake and corrects it automatically to " Mr HONAVAR " .
OR ,
it reads the place of birth as " KOLHAPURE " .
It recognizes it as " KOLHAPUR " - or vice versa , if it says " My name is KOLHAPUR " .
# 3
Today , while scanning ( using OCR ) , when a mistake is detected , it gets highlighted on the screen or an asterisk / underline starts blinking .
This draws the attention of the operator , who manually corrects the " mistake " after consulting a dictionary or his own knowledge base .
Once ARDIS has calculated the probabilities of lakhs of words , and even the probabilities of their " Most likely sequence of occurrence " , then hopefully the OCR can " self-correct " any word or phrase , without operator intervention .
So the scanning accuracy of OCR should eventually become 100 % , and not 75 % - 85 % as at present .
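A minimal sketch ( my illustration , not ARDIS itself ) of the " self-correction " idea in # 1 - # 3 : when a scanned prefix has a near-zero probability in the directory , swap in the closest-spelled prefix that actually occurs often . The threshold value and the use of simple edit-distance matching are assumptions :

```python
# Correct an " unlikely " prefix by combining probability of occurrence with spelling similarity .
from difflib import get_close_matches

# Running prefix directory for the word ACHIEVEMENT ( counts as in the table above )
PREFIX_COUNTS = {"major": 10, "minor": 9, "significant": 8, "relevant": 7, "true": 6}
TOTAL = sum(PREFIX_COUNTS.values())

def correct_prefix(scanned: str, threshold: float = 0.001) -> str:
    """If the scanned prefix is ' unlikely ' , replace it with the most probable near-match."""
    prob = PREFIX_COUNTS.get(scanned.lower(), 0) / TOTAL
    if prob >= threshold:
        return scanned                       # already a plausible prefix , leave it alone
    candidates = get_close_matches(scanned.lower(), PREFIX_COUNTS.keys(), n=3)
    if not candidates:
        return scanned                       # nothing similar enough ; flag for the operator
    # Among the near-matches , prefer the one with the highest probability of occurrence
    return max(candidates, key=lambda w: PREFIX_COUNTS[w])

print(correct_prefix("Manor"))   # -> 'major' , since " Manor Achievement " is unlikely
```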
# 4
Eventually , we want that
- a bio data is scanned , and automatically
- re-constitutes itself into our converted BIO DATA FORMAT .
This is the concept of ARGIS ( Artificial Resume Generating Intelligent Software ) .
Here again , the idea is to eliminate the manual data entry of the entire bio data - our Ultimate Goal .
But ARGIS is not possible without first installing ARDIS , and that too with the calculation of the " Probability of Occurrence " as THE MAIN FEATURE of the software .
By studying and memorizing and calculating the " Probability of Occurrence " of lakhs of words / phrases / sentences , ARDIS actually " learns " English grammar through " Frequency of Usage " .
And it is this Knowledge Base which enables ARGIS to re-constitute a bio data ( in our format ) in a GRAMMATICALLY CORRECT way .
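A minimal sketch ( my illustration of the ARGIS idea , not its design ) of how learned word-sequence probabilities could re-constitute text : repeatedly choose the most probable next word . The toy bigram counts stand in for the full directories of key words / phrases / sentences :

```python
# Re-constitute a short phrase by always following the most frequent next word .
BIGRAM_COUNTS = {
    "major":       {"achievement": 10, "contribution": 4},
    "achievement": {"attained": 20, "planned": 10},
    "attained":    {"in": 12},
}

def reconstitute(start: str, length: int = 4) -> str:
    """Generate a phrase of up to `length` words from the learned next-word counts."""
    words = [start]
    for _ in range(length - 1):
        next_counts = BIGRAM_COUNTS.get(words[-1])
        if not next_counts:
            break
        words.append(max(next_counts, key=next_counts.get))
    return " ".join(words)

print(reconstitute("major"))   # -> 'major achievement attained in'
```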