GSoC 2017 : Week 7 & 8

Status update for the 7th and 8th week

July 28, 2017 Krishanu Konar


Okay, it has been some time since I last posted an update on my project. The last 2 weeks have probably been the busiest of my life: working on my GSoC project, last-minute preparation for the upcoming campus placement drive and, of course, the exams and interviews of various companies. It was a long, lifeless, sleep-deprived, caffeine-ridden mindf*ck of an experience, and the most intense time I’ve ever been through in my life.

Thankfully, after a few failed tests and an ample amount of disappointment, I finally cleared the initial rounds, and eventually ended up clearing all 5 rounds of a certain reputed company, culminating in a job offer. I was happy, but I still had a big task ahead: completing the Summer of Code. So, I took a “deserved” nap and started working on my project again.

So, by the end of week 6, I was supposed to complete the user-defined mappings part, since I had to start implementing JSONpedia library integration from week 7. Custom mapping rules were finalized last week, and this week’s major challenge was adding the custom mapper functions. This was a bit more complicated than simply adding new rules, as I had to store settings that could be run as an independent mapper function. So, after some planning, I came up with a basic, yet powerful procedure. I’d store all the features related to a mapper function in a separate settings file (custom_mappers.json), which would be read by a single generic function that acts as the mapper. Since most of the project was completely modular, this was possible.

The idea was to isolate all the steps in the triple extraction process, implement each of them separately and combine them into a complete mapper function. So, we keep dictionary entries in a JSON file, each representing a mapper function. The first task is identifying section headers, followed by finding keywords to identify subsections and the ontology classes/properties for those keywords. We’ll also give the user the power to select the extractor functions used in the process, letting them choose the trade-off between the quality and the quantity of the extraction. A sample mapper function’s settings would look something like this:

{
    "MUSIC_GENRE_MAPPER": {
        "headers": {
            "en": ["bands", "artists"]
        },
        "extractors": [1, 2, 3, 4],
        "ontology": {
            "en": {
                "default": "notableArtist",
                "artist": "notableArtist",
                "band": "notableBand",
                "Subgenre": "SubGenre",
                "division": "SubGenre",
                "festivals": "relatedFestivals"
            }
        },
        "years": "Yes"
    }
}
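To make the role of each field a bit more concrete, here is a minimal sketch of how settings like these might be consumed. It is not the project's actual code: the helper names `matches_header` and `resolve_property`, and the substring-matching rule, are assumptions made for illustration, and the settings are inlined as a string instead of being read from custom_mappers.json.

```python
import json

# Hypothetical inline copy of the sample settings; in the project these
# would live in custom_mappers.json.
SETTINGS_JSON = """
{
    "MUSIC_GENRE_MAPPER": {
        "headers": {"en": ["bands", "artists"]},
        "extractors": [1, 2, 3, 4],
        "ontology": {
            "en": {
                "default": "notableArtist",
                "artist": "notableArtist",
                "band": "notableBand",
                "Subgenre": "SubGenre",
                "division": "SubGenre",
                "festivals": "relatedFestivals"
            }
        },
        "years": "Yes"
    }
}
"""

CUSTOM_MAPPERS = json.loads(SETTINGS_JSON)


def matches_header(mapper_name, lang, section_title):
    """Return True if the title contains one of the mapper's header keywords."""
    headers = CUSTOM_MAPPERS[mapper_name]["headers"].get(lang, [])
    title = section_title.lower()
    return any(header.lower() in title for header in headers)


def resolve_property(mapper_name, lang, section_title):
    """Return the property whose keyword appears in the title, else the default."""
    ontology = CUSTOM_MAPPERS[mapper_name]["ontology"][lang]
    title = section_title.lower()
    for keyword, prop in ontology.items():
        # Assumption: a keyword matches when it is a substring of the title.
        if keyword != "default" and keyword.lower() in title:
            return prop
    return ontology["default"]
```

With this sketch, a section titled "Notable bands" would be picked up by the mapper and mapped to notableBand, while a title with no matching keyword falls back to the "default" entry.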

Then, a common method was written that made use of these settings to run the extraction process. The mapper settings are dynamically reloaded each time a new setting is added, so that they can be used by rulesGenerator. After that, we also had to make sure that the select mapper identifies the user-defined mapper functions. The following snippet does the trick:

is_custom_map_fn = False
try:
    # eval(domain) looks up a predefined mapping variable named after the
    # domain; it raises NameError when no such variable exists.
    if lang in eval(domain):
        domain_keys = eval(domain)[lang]  # e.g. ['bibliography', 'works', ..]
    else:
        print("The language provided is not available yet for this mapping")
        return 0
except NameError:  # not a predefined mapper, so look for a user-defined one
    if domain not in CUSTOM_MAPPERS:
        print("Cannot find the domain's mapper function!!")
        print("You can add a mapper function for this mapping using rulesGenerator.py and try again...\n")
        return 0
    else:
        is_custom_map_fn = True
        domain_keys = CUSTOM_MAPPERS[domain]["headers"][lang]
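For comparison, the same selection can be sketched as a standalone function. This is not the project's code: it swaps the eval lookup for a plain dictionary lookup, and `PREDEFINED_MAPPERS`, `select_domain_keys` and the sample entries are hypothetical names used only for illustration.

```python
# Hypothetical stand-ins for the predefined and user-defined mapper settings.
PREDEFINED_MAPPERS = {
    "BIBLIOGRAPHY": {"en": ["bibliography", "works"]},
}
CUSTOM_MAPPERS = {
    "MUSIC_GENRE_MAPPER": {"headers": {"en": ["bands", "artists"]}},
}


def select_domain_keys(domain, lang):
    """Return (domain_keys, is_custom_map_fn).

    domain_keys is None when the domain is unknown or the language is
    not available for that domain.
    """
    if domain in PREDEFINED_MAPPERS:
        # Predefined mappers map a language code straight to header keywords.
        return PREDEFINED_MAPPERS[domain].get(lang), False
    if domain in CUSTOM_MAPPERS:
        # Custom mappers keep their header keywords under "headers".
        return CUSTOM_MAPPERS[domain]["headers"].get(lang), True
    return None, False
```

A dictionary lookup like this avoids evaluating the domain name as code, at the cost of having to register every predefined mapper in one place.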

And with that, the custom mapper was ready. A lot of implementation details have been omitted here. If you’re interested, you can head over to the GitHub page and check out the code ;)

Week 8’s job was to look at the JSONpedia Live service and figure out a way to use the JSONpedia library in the project, so that we could get rid of the dependency on the live web service. So, I downloaded the JSONpedia library and went through the code for a while. I also had to come up with a plan to use this library, which is written in Java, in my list-extractor program, which is written in Python.

After giving it some thought, I came up with the idea of retrieving the JSON representation using the library, then using json.loads() to load that string into a dictionary and working normally with that dictionary. So, I decided that I could run the library independently, pipe its output into a Python string variable and use it.

So, I wrote a Java wrapper for the JSONpedia library that takes command-line arguments, parses them, makes the appropriate calls to JSONpedia and prints the output to stdout. This way, I can fork a subprocess that computes this and pipes the result to the Python code, and hence we can integrate it. I used JCommander to parse the command-line parameters to the wrapper, which emulates the Live query. I’m still working on this process.
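On the Python side, the plan could look roughly like the sketch below. It is an illustration under assumptions, not the finished integration: `fetch_json` is a hypothetical helper, and since the real wrapper's jar name and flags are not shown here, a small Python one-liner stands in for the Java process so the example can run on its own.

```python
import json
import subprocess
import sys


def fetch_json(command):
    """Run an external command and parse what it prints to stdout as JSON."""
    # check_output raises CalledProcessError if the command exits non-zero.
    raw = subprocess.check_output(command)
    return json.loads(raw.decode("utf-8"))


# Stand-in for the Java wrapper: any program that prints JSON to stdout
# works the same way. The real call would be a "java ..." command instead.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'title': 'Rock music'}))",
]

if __name__ == "__main__":
    page = fetch_json(STAND_IN_COMMAND)
    print(page["title"])
```

Once the output is a dictionary, the rest of the extractor should not need to know whether it came from the live service or from the local library.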

The coming week is the evaluation week, and I’ll continue working on this irrespective of the result. Hope I pass though, fingers crossed!

You can follow my project on GitHub here.
