
[GSoC] Week-3/4 Configuring Extractors and Mappings for Hindi

Published: at 05:00 AM

This post covers the third and fourth weeks (17-21 June) of the GSoC coding period, during which the main aim was to configure several existing extractors and mappings for Hindi.


Extractor configuration

I configured several existing extractors for Hindi. Some of the code changes are shown below:

(Screenshots: the code before and after the changes.)

All of these extractors have also been added to the properties file:

extractors.hi=.MappingExtractor,.HomepageExtractor,.DisambiguationExtractor,.TopicalConceptsExtractor,.ImageExtractorNew,.AnchorTextExtractor,.CommonsResourceExtractor
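To illustrate how a line like the one above can be read, here is a minimal Python sketch that splits it into a language code and a list of extractor class names. It assumes that a leading dot is shorthand for a default extractor package, taken here to be `org.dbpedia.extraction.mappings`; that package name and the helper `parse_extractor_line` are assumptions for illustration, not the framework's own (Scala) loading code.

```python
from __future__ import annotations

# Minimal sketch: parse one "extractors.<lang>=..." line from a properties file
# and expand the leading-dot shorthand into fully qualified class names.
# ASSUMPTION: ".Foo" means "org.dbpedia.extraction.mappings.Foo".
DEFAULT_PACKAGE = "org.dbpedia.extraction.mappings"  # assumed default package


def parse_extractor_line(line: str) -> tuple[str, list[str]]:
    """Return (language code, list of fully qualified extractor class names)."""
    key, _, value = line.partition("=")
    key = key.strip()
    prefix = "extractors."
    if not key.startswith(prefix):
        raise ValueError(f"not an extractor line: {key!r}")
    lang = key[len(prefix):]
    classes = []
    for name in value.split(","):
        name = name.strip()
        if not name:
            continue  # tolerate trailing or doubled commas
        # Expand ".Foo" with the assumed package; keep full names unchanged.
        classes.append(DEFAULT_PACKAGE + name if name.startswith(".") else name)
    return lang, classes


if __name__ == "__main__":
    line = ("extractors.hi=.MappingExtractor,.HomepageExtractor,"
            ".DisambiguationExtractor,.TopicalConceptsExtractor,"
            ".ImageExtractorNew,.AnchorTextExtractor,.CommonsResourceExtractor")
    lang, classes = parse_extractor_line(line)
    print(lang)
    for cls in classes:
        print(cls)
```

Keeping fully qualified names untouched means a line can mix the short form with classes from other packages.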

Learning and Next Steps

  1. Challenges in Server Setup for Viewing Statistics: I ran into an issue setting up the server to view statistics. It was quickly resolved with help from another GSoC contributor, Meti, which was a good example of teamwork and mutual support.

  2. Plan to Configure the Abstract Extractor for Hindi: Next, I plan to configure the abstract extractor to support Hindi, adapting it to the specifics of the Hindi language.

  3. Increase Hindi Mapping Coverage: Following my mentors' suggestions, I will manually add mappings to increase coverage for Hindi, which should make the Hindi dataset more complete.
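To make "coverage" concrete, here is a minimal Python sketch of one way to measure it. It assumes coverage is reported two ways: the share of distinct templates that have a mapping, and the share of template uses (occurrences) whose template has a mapping. The function `mapping_coverage` and the template names in the usage are hypothetical; this is not DBpedia's statistics code.

```python
from __future__ import annotations

# Minimal sketch of a mapping-coverage metric.
# ASSUMPTION: coverage = fraction of templates (and of template occurrences)
# whose template has a mapping. Illustration only, not DBpedia's own code.


def mapping_coverage(occurrences: dict[str, int], mapped: set[str]) -> tuple[float, float]:
    """Return (template coverage, occurrence coverage) as fractions in [0, 1].

    occurrences: template name -> number of times it is used in the wiki
    mapped: template names that already have a mapping
    """
    if not occurrences:
        return 0.0, 0.0
    total_templates = len(occurrences)
    total_uses = sum(occurrences.values())
    mapped_templates = sum(1 for t in occurrences if t in mapped)
    mapped_uses = sum(n for t, n in occurrences.items() if t in mapped)
    template_cov = mapped_templates / total_templates
    occurrence_cov = mapped_uses / total_uses if total_uses else 0.0
    return template_cov, occurrence_cov


if __name__ == "__main__":
    # Hypothetical template names and counts, for illustration only.
    counts = {"Infobox A": 60, "Infobox B": 30, "Infobox C": 10}
    t, o = mapping_coverage(counts, {"Infobox A"})
    print(f"template coverage {t:.2f}, occurrence coverage {o:.2f}")
    # prints: template coverage 0.33, occurrence coverage 0.60
```

Under this measure, mapping a frequently used template raises occurrence coverage far more than mapping a rarely used one, so ordering manual work by occurrence count is a reasonable heuristic.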