-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
MRG: Use mffpy for EGI MFF event reading
#13932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 7 commits
e465273
5a2248e
e8a58ad
b10ea86
31d8824
b815651
f4119cf
2e3d57b
60f9ece
0a6250b
fb2f8a5
a9a3d99
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Use mffpy-backed parsing for EGI MFF event tracks while tolerating nanosecond timestamps that some files store in ``beginTime``, by `Pragnya Khandelwal`_. |
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3,12 +3,11 @@ | |||||||||||
| # License: BSD-3-Clause | ||||||||||||
| # Copyright the MNE-Python contributors. | ||||||||||||
|
|
||||||||||||
| from datetime import datetime | ||||||||||||
| from glob import glob | ||||||||||||
| from os.path import basename, join, splitext | ||||||||||||
| from os.path import basename, splitext | ||||||||||||
|
|
||||||||||||
| import numpy as np | ||||||||||||
|
|
||||||||||||
| from ...fixes import _parse_mffpy_datetime | ||||||||||||
| from ...utils import _soft_import, _validate_type, logger, warn | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
|
|
@@ -23,7 +22,9 @@ def _read_events(input_fname, info): | |||||||||||
| Header info array. | ||||||||||||
| """ | ||||||||||||
| n_samples = info["last_samps"][-1] | ||||||||||||
| mff_events, event_codes = _read_mff_events(input_fname, info["sfreq"]) | ||||||||||||
| mff_events, event_codes = _read_mff_events( | ||||||||||||
| input_fname, info["sfreq"], info["meas_dt_local"] | ||||||||||||
| ) | ||||||||||||
| info["n_events"] = len(event_codes) | ||||||||||||
| info["event_codes"] = event_codes | ||||||||||||
| events = np.zeros([info["n_events"], info["n_segments"] * n_samples]) | ||||||||||||
|
|
@@ -35,65 +36,99 @@ def _read_events(input_fname, info): | |||||||||||
| return events, info, mff_events | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
| def _read_mff_events(filename, sfreq): | ||||||||||||
| """Extract the events. | ||||||||||||
| def _read_mff_events(filename, sfreq, start_time): | ||||||||||||
| """Extract the events with mffpy. | ||||||||||||
|
|
||||||||||||
| Parameters | ||||||||||||
| ---------- | ||||||||||||
| filename : path-like | ||||||||||||
| File path. | ||||||||||||
| sfreq : float | ||||||||||||
| The sampling frequency | ||||||||||||
| start_time : datetime | ||||||||||||
| The recording start time used as the event anchor. | ||||||||||||
| """ | ||||||||||||
| orig = {} | ||||||||||||
| for xml_file in glob(join(filename, "*.xml")): | ||||||||||||
| xml_type = splitext(basename(xml_file))[0] | ||||||||||||
| et = _parse_xml(xml_file) | ||||||||||||
| if et is not None: | ||||||||||||
| orig[xml_type] = et | ||||||||||||
| xml_files = orig.keys() | ||||||||||||
| xml_events = [x for x in xml_files if x[:7] == "Events_"] | ||||||||||||
| for item in orig["info"]: | ||||||||||||
| if "recordTime" in item: | ||||||||||||
| start_time = _ns2py_time(item["recordTime"]) | ||||||||||||
| break | ||||||||||||
| # Use defusedxml to parse Events XML directly (avoid mffpy's strict | ||||||||||||
| # datetime parsing which may include nanosecond fractions). We still use | ||||||||||||
| # mffpy.Reader for locating the Events.xml files inside the MFF. | ||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||
| _soft_import("mffpy", "reading EGI MFF data") | ||||||||||||
| _soft_import("defusedxml", "reading EGI MFF data") | ||||||||||||
| import defusedxml.ElementTree as DET | ||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason you're using
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks so much for taking a look! There are two reasons for this: MNE-Python has a strict internal policy requiring Because of the 9-digit fractional timestamp bug we discussed in issue BEL-Public/mffpy#138 calling
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The struggle is real as we say. This might be something more for @scott-huberty and @drammock to ponder beyond this GSOC. The MFF reader in mffpy currently used for reading epochs or evoked files is via The pros of keeping the mffpy way of things is just how easy some things are like returning typed objects (.sensors, .epochs, .events), I suppose we could have a conversation with BEL and other mffpy maintainers about switching the current My thesis is currently: "if the point of this project is to use mffpy to parse MFF files, then you should use mffpy's way of doing things and if those need changing, then we should probably change those upstream". Perhaps the compromise could be using the XML.* functions for this with the goal of changing things upstream in mffpy.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm - okay faster than expected I have a mostly working
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @drammock / @larsoner : 1) I assume it's ok if one of our dependencies parses XML files using and 2) If mffpy devs cut a release after the aforementioned timestamp bug is fixed upstream... Is there any chance that MNE is comfortable with putting a lower pin on mffpy's latest release, so that we do not have to keep our shim around? 😁
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pins are fine for newly introduced dependencies. It's only an issue if we move from no pin to pin, or bump the pin.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
we use defusedxml in our code because it closes some vulnerabilities present in the standard library. We cannot force dependencies to do likewise, but can suggest/encourage it upstream. IIRC, lxml has similar safeguards to defusedxml, but is a much more heavyweight package (does more stuff). I'd need to look into it to be sure though. If a required dependency used lxml then I'd actually consider switching us to use it too --- why have 2 libs installed if one will suffice? But mffpy will be an optional dep (right?) so that logic doesn't necessarily apply, at least not as strongly, esp if there are salient differences in security or install size or API convenience.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Edit: sorry for the in Felicity of my last message, I just saw that @pmolfese is already looking into what it would take to switch to defusedxml upstream. Thanks! LMK if I can be of help there.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
OK - we already depend on
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I looked through the MFFPy commits since 0.10.0 (2024 june) and it looks like all bugfixes / CI / maintenance stuff since then. So the risk of a pin breaking user code is low. No objection to setting lower pin of 0.11 when it's released. |
||||||||||||
| import mffpy | ||||||||||||
|
|
||||||||||||
| reader = mffpy.Reader(filename) | ||||||||||||
| try: | ||||||||||||
| files_list = sorted(reader.directory.listdir()) | ||||||||||||
| except Exception: | ||||||||||||
| files_list = [] | ||||||||||||
| tracks = [] | ||||||||||||
| for xml_name in files_list: | ||||||||||||
| stem = splitext(basename(xml_name))[0] | ||||||||||||
| if not stem.startswith("Events"): | ||||||||||||
| continue | ||||||||||||
| with reader.directory.filepointer(stem) as fp: | ||||||||||||
| try: | ||||||||||||
| root = DET.parse(fp).getroot() | ||||||||||||
| except Exception: | ||||||||||||
| # fallback: try reading as bytes and parse string | ||||||||||||
| try: | ||||||||||||
| fp.seek(0) | ||||||||||||
| txt = fp.read() | ||||||||||||
| root = DET.fromstring(txt) | ||||||||||||
| except Exception as exc2: | ||||||||||||
| warn( | ||||||||||||
| f"Could not parse the XML file {xml_name}: {exc2}", | ||||||||||||
| RuntimeWarning, | ||||||||||||
| ) | ||||||||||||
| continue | ||||||||||||
| # identify eventTrack root (namespace-insensitive) | ||||||||||||
| if _ns(root.tag) == "eventTrack": | ||||||||||||
| tracks.append(root) | ||||||||||||
|
|
||||||||||||
| markers = [] | ||||||||||||
| code = [] | ||||||||||||
| for xml in xml_events: | ||||||||||||
| for event in orig[xml][2:]: | ||||||||||||
| event_start = _ns2py_time(event["beginTime"]) | ||||||||||||
| start = (event_start - start_time).total_seconds() | ||||||||||||
| if event["code"] not in code: | ||||||||||||
| code.append(event["code"]) | ||||||||||||
| marker = { | ||||||||||||
| "name": event["code"], | ||||||||||||
| "start": start, | ||||||||||||
| "start_sample": int(np.trunc(start * sfreq)), | ||||||||||||
| "end": start + float(event["duration"]) / 1e9, | ||||||||||||
| "chan": None, | ||||||||||||
| } | ||||||||||||
| markers.append(marker) | ||||||||||||
| events_tims = dict() | ||||||||||||
| for ev in code: | ||||||||||||
| trig_samp = list( | ||||||||||||
| c["start_sample"] for n, c in enumerate(markers) if c["name"] == ev | ||||||||||||
| ) | ||||||||||||
| events_tims.update({ev: trig_samp}) | ||||||||||||
| for root in tracks: | ||||||||||||
| # each child 'event' element | ||||||||||||
| for event_el in root.findall("{*}event"): | ||||||||||||
| # extract fields by tag name ignoring namespace | ||||||||||||
| ev = {} | ||||||||||||
| for child in event_el: | ||||||||||||
| tag = _ns(child.tag) | ||||||||||||
| ev[tag] = child.text | ||||||||||||
| # parse times and duration | ||||||||||||
| event_start = _parse_mffpy_datetime( | ||||||||||||
| ev.get("beginTime"), tzinfo=start_time.tzinfo | ||||||||||||
| ) | ||||||||||||
| if event_start is None: | ||||||||||||
| continue | ||||||||||||
| start_sec = (event_start - start_time).total_seconds() | ||||||||||||
| code_str = ev.get("code", "") | ||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are probably a couple checks you could add:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are really great suggestions. Because this is the very first step of my GSoC project, my mentors requested that I keep this initial micro-PR strictly confined to 1:1 functional parity with MNE's legacy reader. The legacy reader didn't enforce the 4-character limit or map the labels, so I want to avoid expanding the scope just yet. |
||||||||||||
| if code_str not in code: | ||||||||||||
| code.append(code_str) | ||||||||||||
| # duration in xml is typically in nanoseconds | ||||||||||||
| duration = None | ||||||||||||
| if ev.get("duration") is not None: | ||||||||||||
| try: | ||||||||||||
| duration = int(ev.get("duration")) / 1e9 | ||||||||||||
| except Exception: | ||||||||||||
| duration = None | ||||||||||||
| markers.append( | ||||||||||||
| { | ||||||||||||
| "name": code_str, | ||||||||||||
| "start": start_sec, | ||||||||||||
| "start_sample": int(np.trunc(start_sec * sfreq)), | ||||||||||||
| "end": start_sec + (duration if duration is not None else 0.0), | ||||||||||||
| "chan": None, | ||||||||||||
| } | ||||||||||||
| ) | ||||||||||||
|
|
||||||||||||
| events_tims = { | ||||||||||||
| ev: [marker["start_sample"] for marker in markers if marker["name"] == ev] | ||||||||||||
| for ev in code | ||||||||||||
| } | ||||||||||||
| return events_tims, code | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
| def _parse_xml(xml_file: str) -> list[dict[str, str]] | None: | ||||||||||||
| """Parse XML file.""" | ||||||||||||
| defusedxml = _soft_import("defusedxml", "reading EGI MFF data") | ||||||||||||
| try: | ||||||||||||
| xml = defusedxml.ElementTree.parse(xml_file) | ||||||||||||
| except defusedxml.ElementTree.ParseError as e: | ||||||||||||
| warn(f"Could not parse the XML file {xml_file}: {e}") | ||||||||||||
| return | ||||||||||||
| root = xml.getroot() | ||||||||||||
| return _xml2list(root) | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
| def _xml2list(root): | ||||||||||||
| """Parse XML item.""" | ||||||||||||
| output = [] | ||||||||||||
|
|
@@ -150,15 +185,6 @@ def _xml2dict(root): | |||||||||||
| return output | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
| def _ns2py_time(nstime): | ||||||||||||
| """Parse times.""" | ||||||||||||
| nsdate = nstime[0:10] | ||||||||||||
| nstime0 = nstime[11:26] | ||||||||||||
| nstime00 = nsdate + " " + nstime0 | ||||||||||||
| pytime = datetime.strptime(nstime00, "%Y-%m-%d %H:%M:%S.%f") | ||||||||||||
| return pytime | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
| def _combine_triggers(data, remapping=None): | ||||||||||||
| """Combine binary triggers.""" | ||||||||||||
| new_trigger = np.zeros(data.shape[1]) | ||||||||||||
|
|
||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is now merged. If you pull against the
developbranch you should be able to test it against your code without the monkey patch.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pmolfese, just pulled the develop branch and tested it locally—it works perfectly! No more crashes on the nanosecond timestamps. Thank you so much for turning that around so fast!
I'll leave the shim in this PR for now just to keep the MNE CI happy (since it pulls from PyPI), but I'll open a quick PR to rip it out as soon as you guys cut the next official release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I requested a new release from mffpy: BEL-Public/mffpy#141
However even if they do cut a release, we will need to keep the shim if MNE decides against pinning to the latest release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PragnyaKhandelwal see #13932 (comment) - we are good to bump the lower pin (when mffpy cuts a new release).
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for digging into the commit history @drammock, and thanks for the update @scott-huberty!
Since
mffpyhasn't officially published v0.11 toPyPIyet, how would you prefer to handle this PR?We can merge it as-is (with the shim), and I can open a quick follow-up PR to bump the pin and remove the shim the day the release drops. Or, we can just leave this PR open and do it all right here once v0.11 is live.
Happy to do whichever fits the workflow best!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PragnyaKhandelwal I suppose that we can merge it with the shim and remove it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scott-huberty should the xml parse be switched out to the
mffpyone first? Or make that a follow up PR?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmolfese do you mean should we use
lxmlfor our shim instead ofdefusedxml?That is a good question..Given the discussion at BEL-Public/mffpy#139, I guess I would vote for a follow up PR that is entirely dedicated to swapping out all uses of
defusedxmlforlxmlin our codebase.And hopefully we get a new
mffpyrelease soon and can remove our shim, which would make that follow up PR a little simpler!Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scott-huberty - I meant more to use the
mffpymethods to get the XML info (like events) which abstract away the actual XML parser. For example, I might do something like the following to get event codes, labels, and onset times:Obviously can be done after initial merge as well. And of course happy to help with other conversions to
lxmlbut I suspect for EGI files, the mffpy functions should wrap that functionality entirely. They have methods for doing events, epochs, etc. Should abstract away some complexity and lay the burden on other (likely my) shoulders to make sure they're working with any updates or iterations of MFF.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah Ok! Yes, I think that ideally we should let
mffpydo the XML parsing, and we just use their API to get the events, etc.@PragnyaKhandelwal after BEL cuts v0.11, can you make sure that we rely on
mffpyAPI to extract the events, particularly in our function_read_mff_events? Ideally we should not have to do any xml parsing ourselves.