11.1 Overview
A typical data analysis involves not only writing and executing code, but also writing text and displaying images that help tell the story of the analysis. Ideally, we would like to interleave these three media, with the text and images serving as narration for the code and its output. In this chapter, we will show you how to accomplish this using Jupyter notebooks, a common coding platform in data science. Jupyter notebooks do exactly what we need: they let you combine text, images, and (executable!) code in a single document. In this chapter, we focus on using Jupyter notebooks to program in R and write text via a web interface. These skills are essential to getting your analysis running; think of it like getting dressed in the morning! Note that we assume you already have Jupyter set up and ready to use. If that is not the case, please first read Chapter 13 to learn how to install and configure Jupyter on your own computer.
11.2 Chapter learning objectives
By the end of the chapter, readers will be able to do the following:
- Create new Jupyter notebooks.
- Write, edit, and run R code in a Jupyter notebook.
- Write, edit, and view text in a Jupyter notebook.
- Open and view plain text data files in Jupyter.
- Export Jupyter notebooks to other standard file types (e.g., .html, .pdf).
11.3 Jupyter
Jupyter is a web-based interactive development environment for creating, editing, and executing documents called Jupyter notebooks. Jupyter notebooks are documents that contain a mix of computer code (and its output) and formattable text. Because they combine these two analysis artifacts in a single document (the code is not separate from the output or the written report), notebooks are one of the leading tools for creating reproducible data analyses. A reproducible data analysis is one where you can reliably and easily re-create the same results when analyzing the same data. Although this sounds like something that should always be true of any data analysis, in reality it often is not; one needs to make a conscious effort to perform data analysis in a reproducible manner. An example of what a Jupyter notebook looks like is shown in Figure 11.1.
Figure 11.1: Screenshot of Jupyter Notebook.
11.3.1 Accessing Jupyter
One of the easiest ways to start working with Jupyter is to use a web-based platform called JupyterHub. JupyterHubs often have Jupyter, R, a suite of R packages, and collaboration tools installed, configured, and ready to use. JupyterHubs are usually created and provisioned by organizations, and require authentication to gain access. For example, if you are reading this book as part of a course, your instructor may have a JupyterHub already set up for you to use! Jupyter can also be installed on your own computer; see Chapter 13 for instructions.
11.4 Code cells
The sections of a Jupyter notebook that contain code are referred to as code cells. A code cell that has not yet been executed has no number inside the square brackets to the left of the cell (Figure 11.2). Running a code cell executes all of the code it contains, and the output (if any exists) is displayed directly underneath the code that generated it. Outputs may include printed text or numbers, data frames, and data visualizations. Cells that have been executed also have a number inside the square brackets to the left of the cell. This number indicates the order in which the cells were run (Figure 11.3).
Figure 11.2: A code cell in Jupyter that has not yet been executed.
Figure 11.3: A code cell in Jupyter that has been executed.
11.4.1 Executing code cells
Code cells can be run independently or as part of executing the entire notebook using one of the "Run all" commands found in the Run or Kernel menus in Jupyter. Running a single code cell independently is a workflow typically used when editing or writing your own R code. Executing an entire notebook is a workflow typically used to ensure that your analysis runs in its entirety before sharing it with others, and when using a notebook as part of an automated process.
To run a code cell independently, the cell first needs to be activated by clicking on it with the cursor. Jupyter indicates that a cell has been activated by highlighting it with a blue rectangle to its left. After the cell has been activated (Figure 11.4), it can be run either by pressing the Run button in the toolbar or by using the keyboard shortcut Shift + Enter.
Figure 11.4: An activated cell that is ready to be run. The blue rectangle to the cell's left (annotated by a red arrow) indicates that it is ready to be run. The cell can be run by clicking the run button (circled in red).
To execute all of the code cells in an entire notebook, you have three options:
1. Select the Run >> Run All Cells menu item.
2. Select the Kernel >> Restart Kernel and Run All Cells… menu item (Figure 11.5).
3. Click the "restart the kernel and run all cells" button in the toolbar.
All of these commands result in all of the code cells in a notebook being run. However, there is a slight difference between them. In particular, only options 2 and 3 above will restart the R session before running all of the cells; option 1 will not restart the session. Restarting the R session means that all objects previously created by running cells before this command was issued are deleted. In other words, restarting the session and then running all cells (options 2 or 3) emulates how your notebook code would run if you completely restarted Jupyter before executing your entire notebook.
Figure 11.5: Restarting the R session can be accomplished by clicking Restart Kernel and Run All Cells…
11.4.2 The kernel
The kernel is a program that executes the code inside your notebook and outputs the results. Kernels for many different programming languages have been created for Jupyter, which means that Jupyter can interpret and execute code in many different programming languages. To run R code, your notebook needs an R kernel. In the top right of the window, you can see a circle that indicates the status of your kernel. If the circle is empty, the kernel is idle and ready to execute code. If the circle is filled in, the kernel is busy running some code.
You may run into problems where your kernel is stuck for an excessive amount of time, your notebook is very slow and unresponsive, or your kernel loses its connection. If this happens, try the following steps:
- At the top of your screen, click Kernel, then Interrupt Kernel.
- If that doesn't help, click Kernel, then Restart Kernel... If you do this, you will have to run your code cells again from the start of your notebook up to where you paused your work.
- If that still doesn't help, restart Jupyter. First, save your work by clicking File at the top left of your screen, then Save Notebook. Next, if you are accessing Jupyter using a JupyterHub server, click Hub Control Panel in the File menu. Choose Stop My Server to shut it down, then the My Server button to start it back up. If you are running Jupyter on your own computer, click Shut Down in the File menu, then start Jupyter again. Finally, navigate back to the notebook you were working on.
11.4.3 Creating new code cells
To create a new code cell in Jupyter (Figure 11.6), click the + button in the toolbar. By default, all new cells in Jupyter start out as code cells, so after this, all you have to do is write R code within the new cell you just created!
Figure 11.6: New cells can be created by clicking the + button, and are by default code cells.
11.5 Markdown cells
Text cells inside a Jupyter notebook are called Markdown cells. Markdown cells are rich formatted text cells, which means you can bold and italicize text, create subject headers, create bulleted and numbered lists, and more. These cells are given the name "Markdown" because they use the Markdown language to specify the rich text formatting. You do not need to learn Markdown to write text in the Markdown cells in Jupyter; plain text will work just fine. However, you might eventually want to learn a bit about Markdown so that you can create nicely formatted analyses. See the additional resources at the end of this chapter to find out where you can start learning Markdown.
11.5.1 Editing Markdown cells
To edit a Markdown cell in Jupyter, you need to double-click on the cell. Once you do this, the unformatted (or unrendered) version of the text will be shown (Figure 11.7). You can then use your keyboard to edit the text. To view the formatted (or rendered) text (Figure 11.8), click the Run button in the toolbar, or use the Shift + Enter keyboard shortcut.
Figure 11.7: A Markdown cell in Jupyter that has not yet been rendered and can be edited.
Figure 11.8: A Jupyter Markdown cell displayed with rich text formatting.
11.5.2 Creating new Markdown cells
To create a new Markdown cell in Jupyter, click the + button in the toolbar. By default, all new cells in Jupyter start out as code cells, so the cell format needs to be changed for the cell to be recognized and rendered as a Markdown cell. To do this, click on the cell with your cursor to ensure it is activated. Then click on the drop-down box in the toolbar that says "Code" and change it from "Code" to "Markdown" (Figure 11.9).
Figure 11.9: New cells are by default code cells. To create Markdown cells, the cell format must be changed.
11.6 Saving your work
As with any file you work on, it is critical to save your work often so you don't lose your progress! Jupyter has an autosave feature, where open files are saved periodically. The default for this is every two minutes. You can also manually save a Jupyter notebook by selecting Save Notebook from the File menu, by clicking the disk icon on the toolbar, or by using a keyboard shortcut (Control + S for Windows, or Command + S for macOS).
11.7 Best practices for running a notebook
11.7.1 Best practices for executing code cells
As you might know (or at least imagine) by now, Jupyter notebooks are great for interactively editing, writing, and running R code; this is what they were designed for! Consequently, Jupyter notebooks are flexible with regard to the order in which code cells are executed. This flexibility means that code cells can be run in any arbitrary order using the Run button. But this flexibility has a downside: it can lead to Jupyter notebooks whose code cannot be executed in a linear order (from top to bottom of the notebook). A nonlinear notebook is problematic because a linear order is the conventional way code documents are run, and others will have this expectation when running your notebook. Finally, if the code is used in some automated process, it will need to run in a linear order, from top to bottom of the notebook.
The most common way to inadvertently create a nonlinear notebook is to rely solely on the Run button to execute cells. For example, suppose you write some R code that creates an R object, say a variable named j. When you execute that cell and create j, it will continue to exist until it is deliberately deleted with R code, or until the Jupyter notebook's R session (i.e., kernel) is stopped or restarted. It can also be referenced in another, distinct code cell (Figure 11.10). Together, this means that you could then write a code cell further up in the notebook that references j and execute it without error in the current session (Figure 11.11). This could also be done successfully in future sessions if, and only if, you run the cells in the same unconventional order. However, this unconventional order is difficult to remember, and it is not the order in which others would expect your code to be executed. Thus, in the future, it would lead to errors when the notebook is run in the conventional linear order (Figure 11.12).
Figure 11.10: Code that was written out of order, but not yet executed.
Figure 11.11: Code that was written out of order, and was executed using the run button in a nonlinear order without error. The order of execution can be traced by following the numbers to the left of the code cells; their order indicates the order in which the cells were executed.
Figure 11.12: Code that was written out of order, and was executed in a linear order using "Restart Kernel and Run All Cells…" This resulted in an error at the execution of the second code cell, and the remaining code cells in the notebook were not run.
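The kind of failure shown in Figure 11.12 can be reproduced in plain R. The sketch below is a minimal illustration, not the notebook from the figures: it assumes a session in which j has not yet been created, treats each commented section as a stand-in for one notebook cell, and uses tryCatch so that the error message is captured rather than halting execution. The names first_attempt and second_attempt are hypothetical.

```r
# Simulate a fresh session: make sure j does not exist yet.
if (exists("j", inherits = FALSE)) rm(j)

# "Cell 1" (written higher in the notebook) uses j before it is created.
# tryCatch returns the error message as text instead of stopping.
first_attempt <- tryCatch(
  j + 1,
  error = function(e) conditionMessage(e)
)
print(first_attempt)

# "Cell 2" (written lower in the notebook) creates j.
j <- 5

# Rerunning "cell 1" now succeeds, but only because "cell 2" ran first.
second_attempt <- j + 1
print(second_attempt)
```

Run from top to bottom in a fresh session, the first attempt yields an error message while the second yields a number, which is the difference between the linear and nonlinear execution orders described above.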
You can also accidentally create a nonfunctioning notebook by creating an object in a cell and later deleting that cell. In such a scenario, the object exists only for that one particular R session and will not exist once the notebook is restarted and run again. If that object is referenced in another cell in the notebook, an error will occur when the notebook is run again in a new session.
These events may not negatively affect the current R session while the code is being written; but as you might now see, they will likely lead to errors when the notebook is run in a future session. Regularly executing the entire notebook in a fresh R session will help guard against this. If you restart your session and new errors pop up when you run all of your cells in linear order, you will at least be aware that there is an issue. Knowing this sooner rather than later will allow you to fix the issue and ensure your notebook can be run linearly from start to finish.
We recommend, as a best practice, running the entire notebook in a fresh R session at least 2-3 times within any period of work. Critically, you must do this in a fresh R session by restarting your kernel. We recommend using either the Kernel >> Restart Kernel and Run All Cells… command from the menu or the corresponding button in the toolbar. Note that the Run >> Run All Cells menu item will not restart the kernel, so it is not sufficient to guard against these errors.
11.7.2 Best practices for including R packages in notebooks
Most data analyses these days depend on functions from external R packages that are not built into R. One example is the tidyverse metapackage that we rely on heavily in this book. This package provides us access to functions like read_csv for reading data, select for subsetting columns, and ggplot for creating high-quality graphics.
As mentioned earlier in the book, external R packages need to be loaded before the functions they contain can be used. Our recommended way to do this is via library(package_name). But where should this line of code be written in a Jupyter notebook? One idea could be to load the library right before the function is used in the notebook. However, although this technically works, it creates hidden, or at least non-obvious, R package dependencies when others view or try to run the notebook. These hidden dependencies can lead to errors when the notebook is executed on another computer if the needed R packages are not installed. Additionally, if the data analysis code takes a long time to run, uncovering the hidden dependencies that need to be installed so that the analysis can run without error can take a great deal of time.
Therefore, we recommend that you load all R packages in a code cell near the top of the Jupyter notebook. Loading all your packages at the start ensures that every package is loaded before its functions are called, assuming the notebook is run in a linear order from top to bottom as recommended above. It also makes it easy for others viewing or running the notebook to see which external R packages are used in the analysis, and hence which packages they should install on their computer to run the analysis successfully.
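One way to set up such a first cell is sketched below. The vector required_packages is a hypothetical name, and the two packages listed ship with R so that the sketch runs anywhere; in a real analysis you would list the packages your notebook actually uses (for example, tidyverse). The check with requireNamespace is an optional extra rather than part of the recommendation above: it reports every missing package at once instead of failing on the first one.

```r
# First code cell of the notebook: declare and load every package up front.
required_packages <- c("stats", "utils")  # placeholder list; replace with your own

# Find packages that are not installed, without attaching anything yet.
is_installed <- vapply(
  required_packages,
  requireNamespace,
  logical(1),
  quietly = TRUE
)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  stop(
    "Install these packages before running the notebook: ",
    paste(missing_packages, collapse = ", ")
  )
}

# Attach each package so its functions can be called in the cells below.
for (package_name in required_packages) {
  library(package_name, character.only = TRUE)
}
```

For a notebook with only one or two dependencies, a plain sequence of library calls in the first cell serves the same purpose with less code.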
11.7.3 Summary of best practices for running a notebook
- Write code so that it can be executed in a linear order.
- As you write code in a Jupyter notebook, run the notebook in a linear order and in its entirety often (2-3 times every work session) via the Kernel >> Restart Kernel and Run All Cells… command from the Jupyter menu or the corresponding button in the toolbar.
- Write the code that loads external R packages near the top of the Jupyter notebook.
11.8 Exploring data files
It is essential to preview data files before you try to read them into R, to see whether or not there are column names, what the delimiters are, and whether there are lines you need to skip. In Jupyter, you can preview data files stored as plain text files (e.g., comma- and tab-separated files) in their plain text format (Figure 11.14) by right-clicking on the file's name in the Jupyter file explorer, selecting Open with, and then selecting Editor (Figure 11.13). If you do not specify that the data file should be opened with an editor, Jupyter will render a nice table for you. You will then not be able to see the column delimiters, and therefore you will not know which function to use, which arguments to use, or which values to specify for them.
Figure 11.13: Opening data files with the editor in Jupyter.
Figure 11.14: Data file displayed in the editor in Jupyter.
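A similar preview can be done from a code cell. The sketch below writes a small, made-up comma-separated file to a temporary location (the file, the name data_path, and the contents are purely illustrative) and then prints its first lines as raw text with readLines, which shows the header row and delimiter exactly as they are stored.

```r
# Create a small example file; in practice, point data_path at your own file.
data_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "name,score",
    "a,1",
    "b,2"
  ),
  data_path
)

# Read the first few lines as plain text, without parsing them into columns.
first_lines <- readLines(data_path, n = 3)
print(first_lines)

# A quick check of which delimiter appears in the header line.
has_comma <- grepl(",", first_lines[1], fixed = TRUE)
has_tab <- grepl("\t", first_lines[1], fixed = TRUE)
print(c(comma = has_comma, tab = has_tab))
```

Because readLines returns each line as an unparsed string, nothing is hidden by table rendering, which makes it a reasonable complement to opening the file in the editor.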
11.9 Exporting to a different file format
In Jupyter, viewing, editing, and running R code is done in the Jupyter notebook file format, with the file extension .ipynb. This file format is not easy to open and view outside of Jupyter. Thus, to share your analysis with people who do not commonly use Jupyter, it is recommended that you export your executed analysis as a more common file type, such as an .html file or a .pdf. We recommend exporting the Jupyter notebook after executing the analysis so that you can also share the outputs of your code. Note, however, that your audience will not be able to run your analysis using a .html or .pdf file. If you want your audience to be able to reproduce the analysis, you must provide them with the .ipynb Jupyter notebook file.
11.9.1 Exporting to HTML
Exporting to .html results in a shareable file that anyone can open using a web browser (e.g., Firefox, Safari, Chrome, or Edge). The .html output produces a document that is visually similar to what the Jupyter notebook looked like inside Jupyter. One point of caution is that if there are images in your Jupyter notebook, you will need to share the image files along with the .html file for them to be visible.
11.9.2 Exporting to PDF
Exporting to .pdf results in a shareable file that anyone can open using many programs, including Adobe Acrobat, Preview, web browsers, and many more. The benefit of exporting to PDF is that it is a standalone document, even if the Jupyter notebook included references to image files. Unfortunately, the default settings result in a document that looks quite different from the Jupyter notebook. The font, page margins, and other details will appear different in the .pdf output.
11.10 Creating a new Jupyter notebook
At some point, you will want to create a new, fresh Jupyter notebook for your own project instead of viewing, running, or editing a notebook that was started by someone else. To do this, navigate to the Launcher tab and click on the R icon under the Notebook heading. If no Launcher tab is visible, you can get a new one by clicking the + button at the top of the Jupyter file explorer (Figure 11.15).
Figure 11.15: Clicking on the R icon under the Notebook heading creates a new Jupyter notebook with the R kernel.
Once you have created a new Jupyter notebook, be sure to give it a descriptive name, as the default file name is Untitled.ipynb. You can rename files by first right-clicking on the file name of the notebook you just created, and then clicking Rename. This makes the file name editable. Use your keyboard to change the name. Pressing Enter or clicking anywhere else in the Jupyter interface saves the changed file name.
We recommend not using white space or non-standard characters in file names. Doing so will not prevent you from using the file in Jupyter. However, these sorts of things become troublesome as you start to do more advanced data science projects that involve repetition and automation. We recommend naming files using lowercase characters and separating words with a dash (-) or an underscore (_).
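As an illustration of this naming convention, the sketch below converts an arbitrary file name into one that uses only lowercase letters, digits, dashes, and underscores. The function clean_file_name is a hypothetical helper written for this example, not part of Jupyter or of any R package; it relies only on the tools package that ships with R, and the choice to replace every run of other characters with a single dash is one reasonable option among several.

```r
# Hypothetical helper: turn a file name into lowercase words joined by dashes.
clean_file_name <- function(name) {
  extension <- tools::file_ext(name)        # e.g., "ipynb"
  stem <- tools::file_path_sans_ext(name)   # name without the extension
  stem <- tolower(stem)
  # Replace each run of characters other than a-z, 0-9, or _ with one dash.
  stem <- gsub("[^a-z0-9_]+", "-", stem)
  # Drop any dashes left at the very start or end.
  stem <- gsub("^-+|-+$", "", stem)
  if (nzchar(extension)) paste0(stem, ".", extension) else stem
}

cleaned <- clean_file_name("My Analysis (Final).ipynb")
print(cleaned)
```

Here the example name becomes my-analysis-final.ipynb, which contains no spaces or parentheses and is therefore easier to refer to from scripts and automated processes.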
11.11 Additional resources
- The JupyterLab documentation is a good next place to look for more information about working in Jupyter notebooks. This documentation goes into significantly more detail about all of the topics we covered in this chapter, as well as more advanced topics.
- If you are keen to learn about the Markdown language for rich text formatting, two good places to start are CommonMark's Markdown cheat sheet and Markdown Guide.