How to Use Map in Python to Read Mixed Data From a Line
Introduction
A tab-delimited file is a well-known and widely used text format for data exchange. By using a structure similar to that of a spreadsheet, it also allows users to present data in a way that is easy to understand and share across applications - including relational database management systems.
The IANA standard for tab-separated values requires the first line of the file to contain the field names. Additionally, all other lines (which represent separate records) must have the same number of columns.
Other formats, such as comma-separated values, often pose the challenge of having to escape commas, which are frequent inside text (as opposed to tabs).
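The two rules above (field names on the first line, the same number of columns on every other line) can be checked with a few lines of Python. This is only a minimal sketch: the field names and the second record are made up for illustration, and the text is held in a string rather than read from a file.

```python
# Hypothetical sample: a header line followed by two records.
# The field names and the second record are invented for this sketch.
sample = (
    "First name\tLast name\tAge\n"
    "Trey\tBurke\t23\n"
    "Sample\tPlayer\t30\n"
)

lines = sample.splitlines()
header = lines[0].split("\t")                        # field names from the first line
records = [line.split("\t") for line in lines[1:]]   # one list per record

# Every record should have as many fields as the header.
is_consistent = all(len(record) == len(header) for record in records)

print(header)
print(is_consistent)
```

Splitting by hand like this is fine for a quick sanity check, but it does not handle quoted fields.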
Opening Files with Python
Before we dive into processing tab-separated values, we will review how to read and write files with Python. The following example uses the open() built-in function to open a file named players.txt located in the current directory:
    with open('players.txt') as players_data:
        players_data.read()
python
The open() function accepts an optional parameter that indicates how the file will be used. If not present, read-only mode is assumed. Other alternatives include, but are not limited to, 'w' (open for writing in truncate mode) and 'a' (open for writing in append mode).
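To make the difference between these modes concrete, here is a small sketch that writes to a throwaway file. The file name demo.txt and the temporary directory are assumptions chosen for the illustration, not part of the original example.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # 'w' creates the file or truncates it
        f.write("first\n")
    with open(path, "w") as f:   # opening with 'w' again discards "first"
        f.write("second\n")
    with open(path, "a") as f:   # 'a' keeps the content and adds to the end
        f.write("third\n")
    with open(path) as f:        # no mode given, so read-only mode is used
        content = f.read()

print(repr(content))
```

Only the lines written after the last 'w' open survive, which is why append mode is the safer choice when adding records to an existing file.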
After pressing Enter twice to execute the suite that reads players.txt, we will see tabs (\t) between fields and newlines (\n) as record separators, as shown in Fig. 1:
Although we will be primarily concerned with extracting information from files, we can also write to them. Again, note the use of \n at the beginning to indicate a new record and \t to separate fields:
    with open('players.txt', 'a') as players_data:
        players_data.write('\n{}\t{}\t{}\t{}\t{}\t{}\t{}'.format('Trey', 'Burke', '23', '1.85', '2013', '79.4', '23.2'))
python
Although the format() method helps with readability, there are more efficient ways to handle both reading and writing - all available within the same module in the standard library. This is especially important if we are dealing with large files.
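As a rough illustration of what such a method can look like for writing, the sketch below uses csv.writer from the csv module (introduced in the next section) to produce the same tab-separated record without building the format string by hand. An in-memory io.StringIO buffer stands in for the file, and lineterminator is set to '\n' here only to keep the output easy to inspect; both choices are assumptions of this sketch.

```python
import csv
import io

buffer = io.StringIO()  # stands in for a file opened with newline=''

# delimiter and lineterminator are standard csv formatting parameters.
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Trey", "Burke", "23", "1.85", "2013", "79.4", "23.2"])

written = buffer.getvalue()
print(repr(written))
```

The writer inserts the tabs and the line ending itself, and it would also quote any field that happened to contain a tab or a newline.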
Introducing the CSV Module
Although it was named after comma-separated values, the CSV module can manage parsed files regardless of the field delimiter - be it tabs, vertical bars, or just about anything else. Additionally, this module provides two classes to read from and write data to Python dictionaries (DictReader and DictWriter, respectively). In this guide we will focus on the former exclusively.
First off, we will import the CSV module with the statement import csv.
Next, we will open the file in read-only mode, instantiate a CSV reader object, and use it to read one row at a time:
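A quick way to see that the delimiter is just a parameter is to parse the same record twice, once separated by tabs and once by vertical bars. The sample strings below are made up for the sketch, and io.StringIO is used in place of real files.

```python
import csv
import io

# Hypothetical one-line samples that differ only in their delimiter.
tab_text = "Trey\tBurke\t23\n"
bar_text = "Trey|Burke|23\n"

tab_rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
bar_rows = list(csv.reader(io.StringIO(bar_text), delimiter="|"))

# Both readers yield the same list of fields.
print(tab_rows)
print(tab_rows == bar_rows)
```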
    import csv

    with open('nba_games_november2018_visitor_wins.txt', newline='') as games:
        game_reader = csv.reader(games, delimiter='\t')
        for game in game_reader:
            print(game)  # each row is a list of strings
python
Although it is not strictly necessary in our case, we will pass newline='' as an argument to the open() function as per the module documentation. If our file contains newlines inside quoted fields, this ensures that they will be processed correctly.
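The following sketch shows the situation that newline='' is meant to handle: a field with a line break inside it. The file name notes.txt, the temporary directory, and the sample row are all assumptions made for the illustration.

```python
import csv
import os
import tempfile

# The second field contains an embedded newline.
row = ["Trey Burke", "line one\nline two"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    # The writer quotes the field because it contains a line break.
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerow(row)

    # With newline='', the reader sees the line break exactly as written.
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))

print(rows)
```

The file spans two physical lines, yet the reader returns a single record with the line break preserved inside the second field.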
Fig. 2 shows that each row was read into a list after the csv.reader suite was executed:
Although this undoubtedly looks much better than our previous version where tabs and new lines were mixed with the actual content, there is still room for improvement.
The DictReader Class
To begin, we will create an empty list, games_list = [], where we will store each game as a separate dictionary.
Finally, we will repeat the same code as above with two minor changes: we use csv.DictReader instead of csv.reader and, instead of printing each row, we add it to games_list. If you are using Python 3.5 or older, you can omit dict() and use games_list.append(game) instead. In Python 3.6 and 3.7, DictReader returns each row as an ordered dictionary, so dict() is used to turn it into a regular one for better readability and easier manipulation. From Python 3.8 onward, rows are regular dictionaries again, so the call is harmless but optional.
    import csv

    games_list = []  # one dictionary per game
    with open('nba_games_november2018_visitor_wins.txt', newline='') as games:
        game_reader = csv.DictReader(games, delimiter='\t')
        for game in game_reader:
            games_list.append(dict(game))
python
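To see what DictReader produces without needing the games file, here is a sketch that reads a small in-memory sample instead. Only the 'Visitor score' column name comes from this guide; the other column names, the team names, and the scores are placeholders invented for the illustration.

```python
import csv
import io

# Made-up sample; only the 'Visitor score' column name comes from the guide.
sample = (
    "Visitor\tVisitor score\tHome\tHome score\n"
    "Team A\t131\tTeam B\t117\n"
    "Team C\t112\tTeam D\t104\n"
)

games_list = []
for game in csv.DictReader(io.StringIO(sample), delimiter="\t"):
    games_list.append(dict(game))  # keys are taken from the header line

print(games_list[0])
```

Note that every value is a string, including the scores, so numeric comparisons need an explicit conversion.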
We can go one step further and use a list comprehension to return only those games where the visitor score was greater than 130. The following statement creates a new list called visitor_big_score_games and populates it with each game inside games_list where the condition is true:
    visitor_big_score_games = [game for game in games_list if int(game['Visitor score']) > 130]
python
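The int() call in that condition matters because the CSV module returns every field as a string. The sketch below, using made-up placeholder games, contrasts a text comparison with a numeric one.

```python
# Placeholder data for illustration; scores are strings, as csv returns them.
games_list = [
    {"Visitor": "Team A", "Visitor score": "131"},
    {"Visitor": "Team C", "Visitor score": "112"},
    {"Visitor": "Team E", "Visitor score": "95"},
]

# Comparing strings goes character by character, so "95" sorts after "130".
text_comparison = [g["Visitor"] for g in games_list if g["Visitor score"] > "130"]

# Converting to int first compares the scores as numbers.
numeric_comparison = [g["Visitor"] for g in games_list if int(g["Visitor score"]) > 130]

print(text_comparison)
print(numeric_comparison)
```

The text comparison wrongly keeps the game with 95 points, while the numeric comparison keeps only the game that actually exceeds 130.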
Now that we have a list of dictionaries, we can write it to a spreadsheet as explained in Importing Data from Microsoft Excel Files with Python or manipulate it otherwise. Another option consists of writing the list converted to a string into a plain text file named visitor_big_score_games.json for distribution in JSON format:
    import json

    with open('visitor_big_score_games.json', 'w') as games:
        # json.dumps() returns a JSON string; str() would write a Python repr instead
        games.write(json.dumps(visitor_big_score_games))
python
The write() function requires a string as an argument. That is why we had to convert the entire list into a string before performing the write operation.
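How the list is converted to a string decides whether the resulting file can be parsed as JSON. A small sketch, with a single made-up game as sample data, compares str() with json.dumps() from the standard library:

```python
import json

# Placeholder data for illustration.
visitor_big_score_games = [{"Visitor": "Team A", "Visitor score": "131"}]

as_repr = str(visitor_big_score_games)         # Python literal with single quotes
as_json = json.dumps(visitor_big_score_games)  # JSON text with double quotes

# JSON text can be loaded back into an equal list.
round_trip = json.loads(as_json)

# The Python repr is rejected by a JSON parser.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

print(as_json)
print(repr_is_valid_json)
```

If the file is meant to be consumed by other JSON tools, json.dumps() (or json.dump() straight into the file object) is the safer conversion.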
If you just want to view the list, not turn it into a spreadsheet or a JSON file, you can alternatively use pprint() to display it in a user-friendly format as shown in Fig. 3:
    import pprint as pp
    pp.pprint(visitor_big_score_games)
python
As you can see, the possibilities are endless and the only limit is our imagination!
Summary
In this guide we learned how to import and manipulate data from tab-delimited files with Python. This is a highly valuable skill not only for data scientists, but for web developers and other IT professionals as well.
Source: https://www.pluralsight.com/guides/importing-data-from-tab-delimited-files-with-python