# Parsing evtx to json

## The data type within the event

The structure of each evtx event is quite straightforward: the data is stored under two nodes, `Event/System` and `Event/EventData`.

The tricky part is that some values are stored as XML attributes and others as element text. Here is how I handled that.
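To see the difference concretely, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` (not the xmltodict approach this article uses) on two elements copied from the sample event below: `Provider` keeps its data in attributes, while `EventID` keeps it as element text.

```python
import xml.etree.ElementTree as ET

# Two <System> children copied from the sample event
provider = ET.fromstring(
    "<Provider Name='Microsoft-Windows-Security-Auditing' "
    "Guid='{54849625-5478-4994-a5ba-3e3b0328c30d}' />"
)
event_id = ET.fromstring("<EventID>4624</EventID>")

# Attribute-style node: the values live in .attrib and there is no text
print(provider.attrib['Name'])  # Microsoft-Windows-Security-Auditing
print(provider.text)            # None

# Text-style node: no attributes, the value is the element text
print(event_id.attrib)          # {}
print(event_id.text)            # 4624
```

Any converter therefore has to look in two different places depending on the node.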

## Resolve the dynamic XML DOM

To parse the event XML, I chose `xmltodict`. It converts the XML into nested dictionaries, so you can reach a specific node directly by key; here there are only two nodes we care about.

### Load the XML

With xmltodict, start with a plain `with open` statement and pass the file content to `.parse()`. The whole event is then available as a dictionary.

```python
import xmltodict

# Read the exported event XML and convert it into nested dictionaries
with open('./sample.xml') as fd:
    evts = xmltodict.parse(fd.read())
```

### Read the System node

Within System, we can see the challenge: some data is stored as attributes, such as `Provider`, and some is stored as plain text, such as `EventID`.

To access the System node, simply index the dictionary by key:

```python
system = evts['Event']['System']
```

Since it has already been converted to a dictionary, let's see what we get by iterating over all of its keys and values once. The idiomatic way to iterate over the keys and values of a dictionary is `.items()`:

```python
for k, v in system.items():
    print(k, ":", v)
```

```markup
Provider : OrderedDict([('@Name', 'Microsoft-Windows-Security-Auditing'), ('@Guid', '{54849625-5478-4994-a5ba-3e3b0328c30d}')])
EventID : 4624
Version : 2
Level : 0
Task : 12544
Opcode : 0
Keywords : 0x8020000000000000
TimeCreated : OrderedDict([('@SystemTime', '2021-03-30T09:56:42.9750901Z')])
EventRecordID : 15086
Correlation : OrderedDict([('@ActivityID', '{e4dd31b4-2473-0001-3232-dde47324d701}')])
Execution : OrderedDict([('@ProcessID', '780'), ('@ThreadID', '4836')])
Channel : Security
Computer : FAKEVICTIM45B1
Security : None
```

So there are two types of values here: **OrderedDict** and **str**, plus `None` for the empty `<Security />` element. If you check with `type()`, you will see that **OrderedDict** is the class from Python's **collections** module.
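A quick way to confirm this is to print the type of each value. The sketch below rebuilds three System entries by hand as an `OrderedDict`, matching the printed output above, so it runs without the sample file. One hedge worth knowing: newer xmltodict releases may return plain `dict` instead of `OrderedDict`, so `isinstance(v, dict)` is the safer test, since `OrderedDict` is a subclass of `dict`.

```python
from collections import OrderedDict

# Hand-built subset of the xmltodict output printed above
system = OrderedDict([
    ('Provider', OrderedDict([
        ('@Name', 'Microsoft-Windows-Security-Auditing'),
        ('@Guid', '{54849625-5478-4994-a5ba-3e3b0328c30d}'),
    ])),
    ('EventID', '4624'),
    ('Security', None),  # empty <Security /> element
])

for k, v in system.items():
    # Type name, and whether a dict-based check would catch it
    print(k, type(v).__name__, isinstance(v, dict))
# Provider OrderedDict True
# EventID str False
# Security NoneType False
```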

Taking a closer look at the Provider node, the **OrderedDict** serves as a wrapper for both of its attributes, `Name` and `Guid`, with xmltodict marking each attribute key with an `@` prefix:

```markup
<Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-a5ba-3e3b0328c30d}' />
```

### Make it more JSON-like

Now that we know the dictionary still carries XML conventions (such as the `@` prefix on attribute names), the goal is to turn it into something more JSON-like:

```javascript
{'Provider': 
    {
        'Name': 'Microsoft-Windows-Security-Auditing', 
        'Guid': '{54849625-5478-4994-a5ba-3e3b0328c30d}'
    }, 
'EventID': '4624'
...
...
}
```

With the data structure understood, this is easy to implement by branching on the type of each value:

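```python
_evtDict = {}
for k, v in system.items():
    # Attribute-style node, e.g. Provider -> {'@Name': ..., '@Guid': ...}.
    # isinstance() also matches plain dict, which newer xmltodict versions may return.
    if isinstance(v, dict):
        _evtAttr = {}
        for k1, v1 in v.items():
            # Drop the leading '@' ('#' for '#text') added by xmltodict
            _evtAttr[k1[1:]] = v1
        _evtDict[k] = _evtAttr
    else:
        # Text node (str), e.g. EventID, or empty element (None), e.g. Security
        _evtDict[k] = v
```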

## The example evtx event I converted

```markup
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
    <System>
        <Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-a5ba-3e3b0328c30d}' />
        <EventID>4624</EventID>
        <Version>2</Version>
        <Level>0</Level>
        <Task>12544</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8020000000000000</Keywords>
        <TimeCreated SystemTime='2021-03-30T09:56:42.9750901Z' />
        <EventRecordID>15086</EventRecordID>
        <Correlation ActivityID='{e4dd31b4-2473-0001-3232-dde47324d701}' />
        <Execution ProcessID='780' ThreadID='4836' />
        <Channel>Security</Channel>
        <Computer>FAKEVICTIM45B1</Computer>
        <Security />
    </System>
    <EventData>
        <Data Name='SubjectUserSid'>S-1-5-18</Data>
        <Data Name='SubjectUserName'>FAKEVICTIM45B1$</Data>
        <Data Name='SubjectDomainName'>WORKGROUP</Data>
        <Data Name='SubjectLogonId'>0x3e7</Data>
        <Data Name='TargetUserSid'>S-1-5-18</Data>
        <Data Name='TargetUserName'>SYSTEM</Data>
        <Data Name='TargetDomainName'>NT AUTHORITY</Data>
        <Data Name='TargetLogonId'>0x3e7</Data>
        <Data Name='LogonType'>5</Data>
        <Data Name='LogonProcessName'>Advapi  </Data>
        <Data Name='AuthenticationPackageName'>Negotiate</Data>
        <Data Name='WorkstationName'>-</Data>
        <Data Name='LogonGuid'>{00000000-0000-0000-0000-000000000000}</Data>
        <Data Name='TransmittedServices'>-</Data>
        <Data Name='LmPackageName'>-</Data>
        <Data Name='KeyLength'>0</Data>
        <Data Name='ProcessId'>0x2fc</Data>
        <Data Name='ProcessName'>C:\Windows\System32\services.exe</Data>
        <Data Name='IpAddress'>-</Data>
        <Data Name='IpPort'>-</Data>
        <Data Name='ImpersonationLevel'>%%1833</Data>
        <Data Name='RestrictedAdminMode'>-</Data>
        <Data Name='TargetOutboundUserName'>-</Data>
        <Data Name='TargetOutboundDomainName'>-</Data>
        <Data Name='VirtualAccount'>%%1843</Data>
        <Data Name='TargetLinkedLogonId'>0x0</Data>
        <Data Name='ElevatedToken'>%%1842</Data>
    </EventData>
</Event>
```
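The same idea carries over to EventData. With xmltodict, the repeated `<Data Name='...'>value</Data>` children should come back as a list under `evts['Event']['EventData']['Data']`, each item holding an `@Name` attribute and a `#text` value. The sketch below assumes that shape; the input list is hand-built from a few lines of the sample above, not real parser output, and `flatten_event_data` is a hypothetical helper name. It folds the list into a single Name to value dict.

```python
import json


def flatten_event_data(data):
    """Turn xmltodict's <Data Name='...'> nodes into {Name: value}.

    Assumes each item looks like {'@Name': ..., '#text': ...}.
    """
    # A single <Data> child comes back as a dict rather than a list
    if isinstance(data, dict):
        data = [data]
    # An empty <Data Name='X' /> has no '#text', so fall back to None
    return {item['@Name']: item.get('#text') for item in data}


# Hand-built stand-in for evts['Event']['EventData']['Data']
data = [
    {'@Name': 'SubjectUserSid', '#text': 'S-1-5-18'},
    {'@Name': 'TargetUserName', '#text': 'SYSTEM'},
    {'@Name': 'LogonType', '#text': '5'},
    {'@Name': 'ProcessName', '#text': r'C:\Windows\System32\services.exe'},
]

event_data = flatten_event_data(data)
print(json.dumps(event_data, indent=4))
```

Keying by `Name` keeps the output flat, which suits most JSON log pipelines; events whose `<Data>` elements carry no `Name` attribute would need a different fallback.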

