Hiring and recruiting the right talent has always been a challenge for enterprises across the globe, irrespective of their size, industry, or brand value. With flourishing markets and growing job opportunities, HR professionals are flooded with resumes, and managing them has become an arduous task. In earlier days, recruiters managed the entire process manually, filling cabinets with the resumes they received over time. Because resume information goes out of date quickly, manual sorting and shortlisting slow down the whole hiring and recruitment process.

Resume parsing was introduced to simplify the recruitment process through automation, leveraging Machine Learning and Natural Language Processing. It helps HR professionals manage resume information intelligently and removes the headache of handling each resume manually. Incorporating resume parsing has helped enterprises streamline the entire recruitment process and hire the right talent for the right job efficiently.

What is resume parsing?

Resume parsing is a technology that automatically processes resumes by extracting the available data and organizing it in a structured format. In addition to standard fields such as name, work experience, email, contact number, education, and skills, custom fields can be added to capture information that the traditional resume format might not include.
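The Python sketch below shows one possible shape for the structured record a parser fills in. The class name `CandidateProfile`, its field names, and the `to_row()` helper are hypothetical illustrations and are not taken from the RAP solution. Custom fields live in a separate dictionary, so new ones can be added without changing the class.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CandidateProfile:
    """Hypothetical structured record produced by a resume parser."""
    name: str = ""
    emails: list[str] = field(default_factory=list)
    phones: list[str] = field(default_factory=list)
    years_of_experience: float | None = None
    education: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)
    # Custom fields for information outside the traditional resume format,
    # e.g. {"notice_period": "30 days"}.
    custom: dict[str, str] = field(default_factory=dict)

    def to_row(self) -> dict[str, str]:
        """Flatten the profile into one string per column (e.g. for a report)."""
        row = {
            "name": self.name,
            "emails": "; ".join(self.emails),
            "phones": "; ".join(self.phones),
            "years_of_experience": (
                "" if self.years_of_experience is None else str(self.years_of_experience)
            ),
            "education": "; ".join(self.education),
            "skills": "; ".join(self.skills),
        }
        # Custom fields become extra columns; standard fields win on a name clash.
        for key, value in self.custom.items():
            row.setdefault(key, value)
        return row
```

Keeping standard fields as typed attributes makes them easy to validate. The `custom` dictionary absorbs whatever extra information a given client wants to capture.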

Why Resume Auto Parser (RAP)?

Combing through resumes for specific requirements across multiple parameters has always been a nightmare for recruiters. The task becomes even less efficient because candidates' experience and skills keep growing, and only a few tools let recruiters keep track of these changes without the candidates' active involvement.

As part of a recent engagement with a large enterprise, Experion developed a customized Resume Parser solution that helps the client’s HR department add or update a candidate profile automatically whenever a candidate submits a new or updated resume.
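The add-or-update behavior can be sketched as an "upsert" keyed on the candidate's email address. This is an assumption for illustration only: the article does not say how RAP matches a new resume to an existing profile. The function `upsert_profile` and the plain-dictionary store are hypothetical.

```python
from __future__ import annotations


def _normalize(email: str) -> str:
    return email.strip().lower()


def upsert_profile(store: dict[str, dict], parsed: dict) -> str:
    """Add a parsed resume as a new profile or merge it into an existing one.

    Returns "added", "updated" or "skipped" (no email to match on).
    """
    emails = [_normalize(e) for e in parsed.get("emails") or []]
    if not emails:
        return "skipped"  # no identity to match on; left for manual review
    # Reuse an existing profile if any of the resume's emails is already known.
    key = next((e for e in emails if e in store), None)
    if key is None:
        store[emails[0]] = dict(parsed)
        return "added"
    existing = store[key]
    for field_name, value in parsed.items():
        # Overwrite only with non-empty values so a sparse update
        # does not erase details captured from an earlier resume.
        if value not in (None, "", [], {}):
            existing[field_name] = value
    return "updated"
```

Matching on email is only one possible identity rule. A production system might also compare phone numbers or flag likely duplicates for manual review.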

How does the Resume Auto Parser (RAP) Solution work?

The RAP solution is based on pattern recognition, encapsulating the patterns commonly observed in resumes of technical professionals, especially in the IT domain. The pilot version of the solution developed by Experion can capture the following candidate information from a large repository of resumes:

1) Name
2) Telephone number
3) Email
4) Years of experience
5) Technical skills
6) Languages spoken
7) Hobbies, etc.

The solution can examine any number of resumes and generate a report with the above details. The input file can be in DOC, PDF, or JPEG format. The output is a CSV file, which can be viewed and formatted in MS Excel.
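A CSV report of the extracted fields can be produced with Python's standard `csv` module, as in the minimal sketch below. The column names and the `write_report` helper are hypothetical. List-valued fields, such as several phone numbers or skills, are joined with "; " so each candidate stays on a single row.

```python
import csv
import io

# Hypothetical report columns, one per field the pilot captures.
COLUMNS = ["name", "phones", "emails", "years_of_experience",
           "technical_skills", "languages", "hobbies"]


def _cell(value) -> str:
    """Render one field as CSV-friendly text; lists are joined with '; '."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return "; ".join(str(v) for v in value)
    return str(value)


def write_report(profiles, out) -> None:
    """Write one CSV row per parsed resume to a text stream."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for profile in profiles:
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow({col: _cell(profile.get(col)) for col in COLUMNS})


if __name__ == "__main__":
    buffer = io.StringIO()
    write_report([{"name": "Asha Rao",
                   "phones": ["123-456-7890", "+91-123-456-7890"],
                   "years_of_experience": 6}], buffer)
    print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` documentation recommends. Using `encoding="utf-8-sig"` as well helps Excel recognize non-ASCII names correctly.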

Challenges Identified

1) The diversity of resume designs and file formats, coupled with a lack of annotated data, makes data-intensive approaches impractical.

2) In several instances, candidate-name recognition using tools such as spaCy did not yield accurate results, because the model (‘en_core_web_sm’) was not accurate at identifying Indian names and surnames. Training the model on an available dataset of Indian names improved performance but still fell short of the desired accuracy.

Document Parsing

Resumes are mostly saved in DOC or PDF format. Occasionally a resume arrives as an image, especially when it is a screenshot. The RAP solution is designed to parse resumes in DOC, PDF, JPG, and PNG formats.
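The accepted formats need different handling: text extraction for DOC and PDF, and OCR for images. One approach is to route each file by its leading "magic" bytes rather than trusting the file extension. The sketch below assumes that strategy. The `route` function and the "text"/"ocr" labels are hypothetical, and the extractors themselves are out of scope.

```python
from __future__ import annotations

from pathlib import Path

# Leading "magic" bytes of the formats mentioned in the article.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "doc"),  # legacy Word (OLE2 compound file)
    (b"PK\x03\x04", "docx"),  # any ZIP container; assumed to be .docx here
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
]
TEXT_KINDS = {"pdf", "doc", "docx"}
IMAGE_KINDS = {"png", "jpg", "jpeg"}


def detect_kind(header: bytes) -> str | None:
    """Identify a file type from its first bytes, or return None if unknown."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return None


def route(header: bytes, filename: str) -> str:
    """Return "text" or "ocr" (hypothetical labels for the two extraction paths)."""
    # Trust the signature first; fall back to the extension only if it is unknown.
    kind = detect_kind(header) or Path(filename).suffix.lower().lstrip(".")
    if kind in TEXT_KINDS:
        return "text"
    if kind in IMAGE_KINDS:
        return "ocr"
    raise ValueError(f"unsupported resume format: {filename}")


def route_file(path: str) -> str:
    """Read just enough of the file to check its signature, then route it."""
    with open(path, "rb") as f:
        return route(f.read(8), path)
```

A scanned PDF contains only images, so it would still need OCR even though it is routed to "text" here.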

RAP Architecture

The overall architecture of Experion’s RAP solution combines pattern recognition, expert systems, and regular expressions to undertake intensive text analytics.

a) Pattern Recognition

Pattern recognition is used to capture the candidate's name. The pattern for capturing candidate names was identified by observing a considerable number of resumes, and it identifies names accurately in most cases. Similarly, another pattern was identified and applied to isolate the overall work experience.
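The article does not publish the actual patterns. The following is therefore only a hypothetical sketch of pattern-based name and experience capture using Python's `re` module.

- `NAME_LINE_RE` accepts two to four capitalized tokens, allowing initials such as "R. K. Sharma".
- `years_of_experience` collects phrases like "8+ years of experience" or "Total Experience: 9.5 yrs". It returns the largest value, on the assumption that the overall figure is the largest one mentioned.

```python
from __future__ import annotations

import re

# Hypothetical name pattern: 2-4 capitalized tokens, allowing initials ("R. K. Sharma").
NAME_LINE_RE = re.compile(r"^[A-Z][A-Za-z.'-]*(?:\s+[A-Z][A-Za-z.'-]*){1,3}$")

# A number such as 8 or 6.5 (not the tail of a year like 2015), then "years"/"yrs".
_NUM = r"\b(\d{1,2}(?:\.\d{1,2})?)"
_UNIT = r"\s*\+?\s*(?:years?|yrs?)\b"
EXPERIENCE_PATTERNS = [
    # "8+ years of experience", "6.5 yrs of IT experience" (same sentence only)
    re.compile(_NUM + _UNIT + r"[^.\n]{0,40}?\bexperience", re.IGNORECASE),
    # "Total Experience: 4 years"
    re.compile(r"\bexperience\s*[:\-]?\s*" + _NUM + _UNIT, re.IGNORECASE),
]


def looks_like_name(line: str) -> bool:
    """True if a line has the shape of a personal name."""
    return bool(NAME_LINE_RE.match(line.strip()))


def years_of_experience(text: str) -> float | None:
    """Largest experience figure mentioned in the text, or None if there is none."""
    values = [float(m.group(1))
              for pattern in EXPERIENCE_PATTERNS
              for m in pattern.finditer(text)]
    return max(values) if values else None
```

A title line such as "Curriculum Vitae" also matches `NAME_LINE_RE`. This is one reason the relevant text has to be isolated before the name pattern is applied.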

b) Expert System

An expert system is used on a smaller scale, mostly to isolate the text from which the candidate's name and overall years of experience are fetched.
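Isolating that text can be expressed as a small ordered rule base in the spirit of an expert system. Each rule pairs a condition with an action, and the first rule that fires decides what happens to a line. The rules, section headings, and title words below are illustrative assumptions, not Experion's actual knowledge base. The function `isolate_name_region` is hypothetical.

```python
from __future__ import annotations

from typing import Callable

SECTION_HEADINGS = {"summary", "profile", "objective", "experience",
                    "work experience", "education", "skills", "technical skills"}
TITLE_WORDS = {"resume", "curriculum vitae", "cv", "biodata", "bio-data"}

# Each rule: (description, condition on a stripped line, action).
RULES: list[tuple[str, Callable[[str], bool], str]] = [
    ("a section heading ends the header block",
     lambda line: line.lower().rstrip(":") in SECTION_HEADINGS, "stop"),
    ("blank lines carry no name", lambda line: not line, "skip"),
    ("document titles are not names",
     lambda line: line.lower() in TITLE_WORDS, "skip"),
    ("lines with '@' or digits are contact details or dates",
     lambda line: "@" in line or any(ch.isdigit() for ch in line), "skip"),
]


def isolate_name_region(lines: list[str], max_lines: int = 10) -> list[str]:
    """Return the lines near the top of a resume that may hold the candidate's name."""
    kept = []
    for raw in lines[:max_lines]:
        line = raw.strip()
        # The first rule whose condition holds decides the action ("keep" by default).
        action = next((act for _, cond, act in RULES if cond(line)), "keep")
        if action == "stop":
            break
        if action == "keep":
            kept.append(line)
    return kept
```

Keeping the rules as data makes each decision traceable to a named rule. When a new resume layout causes a miss, a rule can be added or reordered without touching the loop.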

c) Regular Expression

Python regular expressions (regex) are used to extract candidates' email IDs and contact numbers. The captured phone numbers are also normalized so that they all follow a consistent format.

For example:

  • 1234567890 is transformed to 123-456-7890
  • 91 1234567890 is transformed to +91-123-456-7890

The RAP solution can also extract more than one contact number and email ID from a resume.
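The following Python sketch extracts and normalizes email IDs and phone numbers using the `re` module. The patterns are simplified assumptions, not the RAP solution's actual expressions:

- The email pattern covers common addresses.
- The phone pattern covers the 3-3-4 digit layouts in the examples above, with an optional country code of up to three digits.

Results are de-duplicated while preserving order, so every distinct number and address in a resume is returned once.

```python
import re

# Simplified, assumed patterns (not the RAP solution's actual expressions).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional country code (1-3 digits, optional "+"), then 3-3-4 digits separated
# by nothing, a space or a hyphen, with no further digits on either side.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?(\d{1,3})[ -]?)?(\d{3})[ -]?(\d{3})[ -]?(\d{4})(?!\d)"
)


def extract_emails(text: str) -> list[str]:
    """All distinct email IDs, in order of appearance."""
    return list(dict.fromkeys(m.group(0) for m in EMAIL_RE.finditer(text)))


def extract_phones(text: str) -> list[str]:
    """All distinct phone numbers, normalized to [+CC-]XXX-XXX-XXXX."""
    numbers = []
    for m in PHONE_RE.finditer(text):
        country, area, prefix, line = m.groups()
        local = f"{area}-{prefix}-{line}"
        numbers.append(f"+{country}-{local}" if country else local)
    return list(dict.fromkeys(numbers))
```

For example, the text "Phone: 1234567890 / 91 1234567890" yields "123-456-7890" and "+91-123-456-7890". Other layouts, such as 5-5 digit groupings, would need additional patterns.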

How did Experion’s Resume Auto Parser (RAP) solution benefit recruiters?

  • Examine any number of resumes to extract and summarize the required information in an easily readable tabular format.
  • Handle common file formats such as DOC, PDF, JPG, and PNG.
  • Fetch candidate information from screenshots.
  • Present profiles sorted by overall years of experience, which enables recruiters to match their requirements to a profile accurately.
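Ranking profiles by overall experience is a plain sort. The only design decision is how to treat resumes where no experience figure was found. The hypothetical `sort_by_experience` below places those profiles last instead of treating them as zero years, so they are not mistaken for entry-level candidates.

```python
from __future__ import annotations


def sort_by_experience(profiles: list[dict]) -> list[dict]:
    """Most experienced first; profiles without a figure go last, in input order."""
    def key(profile: dict) -> tuple[bool, float]:
        years = profile.get("years_of_experience")
        # False sorts before True, so known values come first; negate for descending.
        return (years is None, -years if years is not None else 0.0)
    # sorted() is stable, so ties keep their original order.
    return sorted(profiles, key=key)
```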

Areas of improvement

Although the pattern recognition performs well, it still needs fine-tuning. In rare cases, the expert system fails to isolate the relevant text when recognizing a candidate name.

Reference & Courtesy


Our code is a substantially improved and refurbished version of the above.